Oxford Handbook of Nucleic Acid Structure
Edited by
Stephen Neidle
The CRC Biomolecular Structure Unit, The Institute of Cancer Research, Sutton, Surrey, UK
OXFORD UNIVERSITY PRESS
Great Clarendon Street, Oxford OX2 6DP
Oxford University Press is a department of the University of Oxford and furthers the University's aim of excellence in research, scholarship, and education by publishing worldwide in
Oxford New York
Athens Auckland Bangkok Bogota Buenos Aires Calcutta Cape Town Chennai Dar es Salaam Delhi Florence Hong Kong Istanbul Karachi Kuala Lumpur Madrid Melbourne Mexico City Mumbai Nairobi Paris Sao Paulo Singapore Taipei Tokyo Toronto Warsaw
and associated companies in Berlin Ibadan
Oxford is a registered trade mark of Oxford University Press
Published in the United States by Oxford University Press Inc., New York
© Oxford University Press, 1999
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press. Within the UK, exceptions are allowed in respect of any fair dealing for the purpose of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms and in other countries should be sent to the Rights Department, Oxford University Press, at the address above.
This book is sold subject to the condition that it shall not, by way of trade or otherwise, be lent, re-sold, hired out, or otherwise circulated without the publisher's prior consent in any form of binding or cover other than that in which it is published and without a similar condition including this condition being imposed on the subsequent purchaser.
A catalogue record for this book is available from the British Library
Library of Congress Cataloging in Publication Data
Oxford handbook of nucleic acid structure / edited by S. Neidle.
Includes bibliographical references and index.
1. Nucleic acids--Structure. 2. X-ray crystallography. 3. Nuclear magnetic resonance spectroscopy. I. Neidle, Stephen.
QD433.5.S77094 1998 547'.790442-dc21 98-34431
ISBN 0 19 850038 6 (Hbk)
Typeset by EXPO Holdings, Malaysia
Printed in Great Britain by Bookcraft (Bath) Ltd, Midsomer Norton, Avon
Preface
The study of nucleic acid structure is now some 45 years old. It has grown into a vast multifaceted field, which continues to play a key role in furthering our understanding of gene regulation and expression, and of ways for intervening with these processes. It has become a fertile meeting-ground for crystallographers, NMR spectroscopists, and theoreticians, and now even has its own database for structure deposition and study.
It is a truism that nucleic acids are conformationally more complex than proteins. This complexity, which to some extent has been masked by the simplicity of the classic DNA double helix, is shown by the readiness of both oligonucleotides and polynucleotides to be structurally responsive to changes in local environment. Such conformational plurality may be caused by water molecules and counterions, or by ligand (drug, protein) binding, and can be highly sequence dependent: as shown by the ability of particular sequences to undergo bending and deformation. The extent of local and global alterations in nucleic acid conformation is constrained by base pairing; once significant stretches of non-helical regions are present, then nucleic acid sequences are capable of folding into altogether more complex, non-linear structures, which typically involve extensive non-Watson-Crick base-base interactions. Our knowledge of these structures is still rather rudimentary.
The chapters of this book describe in detail the variety of DNA and RNA nucleic acid structural types discovered to date, all of which ultimately depend on the conformational plurality of individual nucleotide repeating units. Their underlying conformational and structural properties were extensively studied in the two decades following the elucidation of the structure of DNA itself.
NMR and crystallographic structural studies were almost entirely confined to mononucleosides and nucleotides up to the mid-1970s. A number of studies at that time focused on the backbone conformations and sugar puckers evident from these monomers, which provided valuable information on the range of conformations likely to be accessible to oligo- and polynucleotides, and on possible correlations between them. This, the early phase of nucleic acid structural studies, produced atomic-resolution (c. 0.7 Å) single-crystal analyses of a large number of nucleobases, mononucleosides, and mononucleotides. These have provided highly accurate geometric data for the five standard DNA/RNA bases, as well as for the rare bases occurring in some RNAs and for several protonated bases. This body of data is available for individual structures in the small molecule Cambridge Crystallographic Database, and has recently been collated and statistically analysed by the Nucleic Acid Database in order to produce standardized sets of values. The availability of this data is of particular importance for fibre diffraction, single-crystal, and NMR refinements of poly- and oligo-nucleotide structures and their complexes, all of which rely on accurate geometries for the definition of reliable constraints and restraints. The parameterization of force fields to be used in molecular dynamics simulation studies similarly requires the inclusion of high quality geometric data.
The development of automated chemical synthesis of defined sequence DNA (and, more recently, RNA) oligonucleotides has undoubtedly made a key contribution to
the many major advances in nucleic acid structure since the early 1970s. At the same time, advances in both crystallographic and NMR methodology, together with computing and visualization developments, have enabled increasingly complex structures to be analysed effectively. It is perhaps invidious to select highlights of the past 25 years, but structure determinations of tRNA, the Dickerson-Drew dodecamer, Z-DNA, ribozymes, and telomeric DNAs, all represent significant landmarks. What about the next two decades? History tells us that in this field, as in many others, prediction is foolhardy. However, some trends are already apparent. Thus, it is clear that the patterns of folding in complex RNA structures represent a major challenge. DNA itself still has much to reveal. As crystallographic and NMR data become more accurate, features such as hydration and mobility (including sequence dependency), will become better defined. DNA folding, including that of catalytic and aptamer DNA, has yet to be explored at a molecular level.
This handbook has its origins in an earlier short introductory monograph on DNA structure. Feedback from numerous colleagues suggested that there is a need for a comprehensive survey and work of reference for both DNA and RNA structure, at an advanced level. It is no longer possible for one person to emulate the excellent (1984) text by Wolfram Saenger, such has been the growth in this field since then. I have been fortunate in being able to persuade so many of my colleagues of this need, and to contribute to this volume. All the major topics concerned with 'native' structures are represented.
There is no explicit discussion on either protein- or drug-nucleic acid complexes; these, if covered at the same level, would require separate volumes, such is the quantity of information on them.
The book is set out in a systematic manner, progressing through the polymorphs of double helical DNA through to the higher order organizations of triplexes, quadruplexes, and junctions, then on to RNA structures in their various degrees of complexity. The two principal tools of molecular structure determination, X-ray crystallography and nuclear magnetic resonance, have been given equal weight in the book. Authors have been encouraged to be comprehensive, but not encyclopaedic, and not to shy away from controversy. It is to be hoped that the reader will arrive at a balanced view of the complementarity of these two approaches, as well as of their current scope and limitations.
I am very grateful to a number of friends and colleagues for their wisdom and helpful advice during this project, especially Helen Berman and Dick Dickerson. I have also been fortunate in a remarkable set of contributors, who have not only put much effort into their individual chapters, but worked together to provide coherence and minimal overlap between chapters. Thanks are due to my editors at Oxford University Press, who have been instrumental in guiding the contributors (and me) through the many minefields of a multi-author volume.
Surrey
September 1998
S. N.
Contents
Plate section falls between pages 174 and 175
List of contributors
Abbreviations

1. Polynucleotide secondary structures: an historical perspective
Struther Arnott
Introduction
The DNA duplex: discovery and definition
Expansion
Discrimination and exploration
Polymorphism
Homopolymers
Polyoligonucleotide duplexes
Envoi
Appendix: further details of fibrous polynucleotide structures together with some comments
References

2. Base and base pair morphologies, helical parameters, and definitions
Richard Lavery and Krystyna Zakrzewska
Introduction
Nucleic acid bases
Base pairing
Helical parameter definitions
Helical parameter calculations
Examples of helical analysis
Analysing nucleic acid dynamics
Conclusions
References

3. The nucleic acid database: a research and teaching tool
Helen M. Berman, Christine Zardecki, and John Westbrook
Introduction
The infrastructure of the NDB
Production characteristics of the NDB
Practical uses of the NDB for research and training
Prospects
References

4. Simulation of nucleic acid structure
Jennifer L. Miller, Thomas E. Cheatham III, and Peter A. Kollman
Force fields for nucleic acids
Introduction to simulation methods
Applications of molecular mechanics and dynamics to nucleic acid systems
References

5. A-DNA duplexes in the crystal
Markus C. Wahl and Muttaiya Sundaralingam
Introduction
The A-DNA conformation
A-DNA crystal packing
Sequence-structure relationships
Interconversions between A-, B-, and Z-forms
Chemical modifications of backbone and bases
Mispairs
A-DNA deformability
A-DNA interaction with ligands
Comparison with solution studies
Conclusions
References

6. Helix structure and molecular recognition by B-DNA
Richard E. Dickerson
Introduction
Early sequence-structure correlations
Molecular properties of B-DNA
Differences between individual base steps
DNA behaviour in crystals and in protein:DNA complexes
Roll/slide/twist correlations in protein:DNA complexes
Summary and conclusions
References

7. The single-crystal structures of Z-DNA
Beth Basham, Brandt F. Eichman, and P. Shing Ho
Introduction
The prototypical Z-DNA structure of d(CGCGCG)
Sequence and substituent effects on the structure and stability of Z-DNA
Summary: sequence effects on the structure and stability of Z-DNA
References

8. Standard DNA duplexes and RNA:DNA hybrids in solution
Uli Schmitz, Forrest J.H. Blocker, and Thomas L. James
Introduction
Data and methods for high resolution structure determination
DNA duplex structures
RNA:DNA hybrid structures
Outlook for the future
References

9. Nucleic acid hydration
Helen M. Berman and Bohdan Schneider
Introduction
Macroscopic studies
Structural analyses of nucleic acid hydration
Summary
References

10. Single-crystal X-ray diffraction studies on the non-Watson-Crick base associations of mismatches, modified bases, and non-duplex oligonucleotide structures
William N. Hunter and Tom Brown
Introduction
Mismatches
Pairings with modified bases
Non-Watson-Crick associations stabilize higher order structures
References

11. DNA mismatches in solution
Shan-Ho Chou and Brian R. Reid
Introduction
Mismatch pairing in antiparallel GA, GGA, and GGGA repeats
Mismatches between parallel-stranded CGA triplets and their repeats
Tandem sheared G:A mismatches separated by Watson-Crick base pairs
Sheared G:A mismatches closing single-residue hairpin loops
Sheared G:A mismatches closing two-residue hairpin loops
Conclusion
References

12. Structures of nucleic acid triplexes
Edmond Wang and Juli Feigon
Introduction
Structures of parallel triplexes
Structures of antiparallel triplexes
PNA triplex structures
Conclusion
References

13. Structures of guanine-rich and cytosine-rich quadruplexes formed in vitro by telomeric, centromeric, and triplet repeat disease DNA sequences
Dinshaw J. Patel, Serge Bouaziz, Abdelali Kettani, and Yong Wang
Introduction
Telomeric sequence G quadruplexes
G:C:G:C tetrad-containing quadruplexes
i-motif quadruplexes containing intercalated C:CH+ mismatch pairs
Future directions
References

14. DNA bending by adenine-thymine tracts
Donald M. Crothers and Zippora Shakked
Global and spectroscopic properties of DNA curvature induced by A-tracts
X-ray crystallographic studies
The stereochemical basis of A-tract-dependent curvature
References

15. Structures and interactions of helical junctions in nucleic acids
David M.J. Lilley
The occurrence of helical junctions in biology
Approaches to the study of branched nucleic acids
The four-way DNA junction
The three-way DNA junction
The four-way RNA junction
Interaction between DNA junctions and proteins
Some final conclusions
References

16. DNA higher-order structures
Wilma K. Olson
Overview
DNA supercoiling
Computational issues
Equilibrium structures
Summary
References

17. Crystallographic structures of RNA oligoribonucleotides and ribozymes
Benoit Masquida and Eric Westhof
Introduction
Crystallization
Oligoribonucleotide crystals
Catalytic RNAs
Conclusions
References

18. RNA structure in solution
Jacek Nowakowski and Ignacio Tinoco, Jr
Introduction
RNA structural elements
Secondary structures
Tertiary structures, interactions between secondary structures
References

19. Transfer RNA
John G. Arnez and Dino Moras
Introduction
The free tRNA
tRNA in aminoacylation
tRNA in protein synthesis
Perspectives
References

Index
Contributors
John G. Arnez: Laboratoire de Biologie Structurale, Institut de Genetique et de Biologie Moleculaire et Cellulaire, CNRS/INSERM/ULP, 1, rue L. Fries-BP 163, F-67404 Illkirch, France
Struther Arnott: The University, St. Andrews, Fife KY16 9AR, Scotland
Beth Basham: Department of Biochemistry and Biophysics, ALS 2011, Oregon State University, Corvallis, OR 97331, USA
Helen M. Berman: Department of Chemistry, Rutgers University, Piscataway, NJ 08854-8087, USA
Forrest J.H. Blocker: Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94143-446, USA
Tom Brown: Department of Chemistry, University of Southampton, Southampton, SO17 1BJ, UK
Serge Bouaziz: Cellular Biochemistry and Biophysics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10021, USA
Thomas E. Cheatham III: Laboratory for Structural Biology, MGSL/DCRT/12A2041, National Institutes of Health, Bethesda, MD 20814, USA
Shan-Ho Chou: Institute of Biochemistry, National Chung-Hsing University, Taichung 40227, Taiwan
Donald M. Crothers: Department of Chemistry, Yale University, New Haven, CT 06520, USA
Richard E. Dickerson: Molecular Biology Institute, University of California at Los Angeles, Los Angeles, CA 90025-1570, USA
Brandt F. Eichman: Department of Biochemistry and Biophysics, ALS 2011, Oregon State University, Corvallis, OR 97331, USA
Juli Feigon: Department of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095, USA
P. Shing Ho: Department of Biochemistry and Biophysics, ALS 2011, Oregon State University, Corvallis, OR 97331, USA
William N. Hunter: Department of Biochemistry, University of Dundee, Dundee, DD1 5EH, UK
Thomas L. James: Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94143-446, USA
Abdelali Kettani: Cellular Biochemistry and Biophysics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10021, USA
Peter A. Kollman: Department of Pharmaceutical Chemistry, Box 0446, University of California, San Francisco, San Francisco, CA 94143, USA
Richard Lavery: Laboratoire de Biochimie Theorique, CNRS UPR 9080, Institut de Biologie Physico-Chimique, 13, Rue Pierre et Marie Curie, Paris 75005, France
David M.J. Lilley: CRC Nucleic Acid Structure Research Group, Department of Biochemistry, The University, Dundee DD1 4HN, UK
Benoit Masquida: Institut de Biologie Moleculaire et Cellulaire, Centre National de la Recherche Scientifique, UPR 9002, 15, rue R. Descartes, F-67084 Strasbourg, France
Jennifer L. Miller: Department of Pharmaceutical Chemistry, Box 0446, University of California, San Francisco, San Francisco, CA 94143, USA
Dino Moras: Laboratoire de Biologie Structurale, Institut de Genetique et de Biologie Moleculaire et Cellulaire, CNRS/INSERM/ULP, 1, rue L. Fries-BP 163, F-67404 Illkirch, France
Jacek Nowakowski: Department of Chemistry and Molecular Biology, Scripps Research Institute, La Jolla, CA 92037, USA
Wilma K. Olson: Department of Chemistry, Rutgers, State University of New Jersey, New Brunswick, NJ 08903, USA
Dinshaw J. Patel: Cellular Biochemistry and Biophysics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10021, USA
Brian R. Reid: Departments of Chemistry and Biochemistry, University of Washington, Seattle, WA 98195, USA
Uli Schmitz: Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94143-446, USA
Bohdan Schneider: Heyrovsky Institute of Physical Chemistry, Academy of Sciences of the Czech Republic, 18223 Prague, Czech Republic
Zippora Shakked: Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
Muttaiya Sundaralingam: Ohio State University, Biological Macromolecular Structure Center, Departments of Chemistry and Biochemistry and The Ohio State Biochemistry Program, 012 Rightmire Hall, 1060 Carmack Road, Columbus, OH 43210, USA
Ignacio Tinoco, Jr: Department of Chemistry, University of California, Berkeley and Structural Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720-1460, USA
Markus C. Wahl: Ohio State University, Biological Macromolecular Structure Center, Departments of Chemistry and Biochemistry and The Ohio State Biochemistry Program, 012 Rightmire Hall, 1060 Carmack Road, Columbus, OH 43210, USA
Edmond Wang: Department of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095, USA
Yong Wang: Cellular Biochemistry and Biophysics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10021, USA
John Westbrook: Department of Chemistry, Rutgers University, Piscataway, NJ 08854-8087, USA
Eric Westhof: Institut de Biologie Moleculaire et Cellulaire, Centre National de la Recherche Scientifique, UPR 9002, 15, rue R. Descartes, F-67084 Strasbourg, France
Christine Zardecki: Department of Chemistry, Rutgers University, Piscataway, NJ 08854-8087, USA
Krystyna Zakrzewska: Laboratoire de Biochimie Theorique, CNRS UPR 9080, Institut de Biologie Physico-Chimique, 13, Rue Pierre et Marie Curie, Paris 75005, France
Abbreviations
aa-tRNA  aminoacylated tRNA
aaRS  aminoacyl-tRNA synthetase
AMP  adenosine monophosphate
APP  alternating pyrimidine-purine
APT  antiparallel triplex
ATP  adenosine triphosphate
bHLH  basic helix-loop-helix
bZIP  basic leucine zipper
CAP  catabolite activator protein
COSY  correlated spectroscopy
CS  cationic strength
CSD  Cambridge Structural Database
DIF  dimeric irregularity function
dn  dinucleotide
dzaX  7-deaza-2'-deoxyxanthosine
edA  1,N6-ethenoadenosine
EF  elongation factor
FMN  flavin mononucleotide
g  gauche
GDP  guanosine 5'-diphosphate
GTP  guanosine 5'-triphosphate
HETCOR  heteronuclear correlated spectroscopy
HPLC  high performance liquid chromatography
HTH  helix-turn-helix
IHF  integration host factor
IR  infrared
ISPA  isolated spin-pair approximation
MD  molecular dynamics
MG  magnesium only form of d(CGCGCG)
MGSD  magnesium and spermidine form of d(CGCGCG)
MGSP  magnesium and spermine form of d(CGCGCG)
mmCIF  macromolecular crystallographic information file
MMD  multiple molecular dynamics
mRNA  messenger RNA
NDB  Nucleic Acid Database
NMR  nuclear magnetic resonance
NOE  nuclear Overhauser effect
NOESY  NOE spectroscopy
nt  nucleotide
O6MeG  O6-methylguanine
O8A  8-oxoadenine
O8G  8-oxoguanine
PAGE  polyacrylamide gel electrophoresis
PDB  Protein Data Bank
PME  particle mesh Ewald
PNA  peptide nucleic acid
ppm  parts per million
PT  parallel triplex
r  rotation
RESP  restrained electrostatic potential
rMC  restrained Monte Carlo
rMD  restrained molecular dynamics
rms  root mean square
rmsd  root mean square difference
RNAase  ribonuclease
RNP  ribonucleoprotein
RRE  Rev response element
rRNA  ribosomal RNA
SAS  solvent-accessible surfaces
SFE  solvent free energy
SP  spermine only form of d(CGCGCG)
SQL  structure query language
t  trans
t  translation
TAR  trans-activation response
TBP  TATA-binding protein
tRNA  transfer RNA
UV  ultraviolet
WWW  world-wide web
1
Polynucleotide secondary structures: an historical perspective
Struther Arnott
The University, St. Andrews, Fife KY16 9AR, Scotland
1. Introduction
In this chapter I shall describe the fibre-derived X-ray analyses upon which studies of polynucleotide helical conformations mainly depended from 1950 to 1980. The first of these three decades started off with the dramatic events that showed that DNA, the large complex polymer within whose primary structure genetic information was stored, had an unexpectedly simple secondary structure. Soon it became clear that it could have two secondary structures and for much of the 1950s the efforts of molecular biophysicists were concentrated on putting the details of these two allomorphs beyond cavil. In the 1960s, when as much effort was put into RNA structures as DNA structures, it became evident that polynucleotide double helices belonged to two sets of secondary structures related to the original two eponymous DNA allomorphs, A and B. In the same decade the technology of X-ray diffraction analysis of fibres became more sophisticated so that by the 1970s the fine details of synthetic polynucleotide duplexes of defined sequence could begin to be explored routinely. This exploration, and the emerging parallel studies of oligonucleotides in single crystals, uncovered a third set of helical allomorphs, Z, of opposite hand to the two original sets that had become familiar during the previous 20 years or so. These two sets of investigations also promoted speculations that the base sequences within helices might be emphasized by characteristic conformations and morphological wrinkles on the surfaces of helices. There are, indeed, wrinkles on the surfaces both of polymer duplex helices and of quasi-helical oligomer duplexes, but whether they are of much significance biologically in DNA remains to be established.
DNA is obviously very plastic and this is important for its role as the substrate in many interactions. Fibres, metaphorically and literally, are the continuous thread in the story of DNA (1) and related polynucleotides, from before 1950 right up to the present day. The important polynucleotide secondary structures are all helical whether they are single-, double-, triple-, or quadruple-stranded. Long helices are more likely to be ordered in oriented fibres than in large single crystals. (Who has yet crystallized a quasi-helical oligonucleotide with 20 or 30 residues?) Helices imply a motif contained within one pitch length, which is repeated linearly along one polymer molecule. The process of spinning a fibre orients such polymer molecules with their repeated motifs at least parallel to one another. These ordered arrays make X-ray diffraction analyses possible, mainly because the X-rays scattered by them are greatly amplified versions of the
Fig. 1.1. Samples of fibre diffraction diagrams of B-type DNAs that span the range and indicate the diversity of structures. (a) Classical B-DNA in a fibrous specimen where the molecules are oriented parallel to the fibre axes and are locally microcrystalline, with the result that the diffraction pattern is like that from a rotating single crystal of B-DNA. The intensity fingerprint indicates a tenfold helix of pitch 33.7 Å. The average twist per residue is therefore t = +36.0° and the average axial rise per nucleotide h = 3.37 Å. (b) Classical C-DNA in a fibre in which the molecules are merely uniaxially oriented. This was an early example of non-integral DNA helices since the molecules have 28 nucleotide pairs in three pitches and therefore t = +38.6°. In this case h = 3.31 Å. (c) Classical D-DNA in a uniaxially oriented fibre of poly d(AAT):poly d(ATT). The molecular helices have an eightfold screw with t = +45.0° and h = 3.05 Å.
Fig. 1.2. Mutually perpendicular projections of segments of B-type polynucleotide duplexes corresponding to the diffraction patterns of Fig. 1.1. All the helices are right-handed, the chains antiparallel, and in each duplex all the nucleotide conformations are identical. Thus the molecular symmetries are: (a) B-DNA, 10₁22; (b) C-DNA, 28₃2; (c) D-DNA, 8₁22. Morphologically an open and deep major groove is the persistent property of these allomorphs, but as t increases from 36.0 to 45.0° and h declines from 3.37 to 3.05 Å, the almost as deep minor grooves close. At the same time the inclination of the base pairs becomes more negative.
scattering from a single motif. The diffraction patterns from uniaxially oriented fibres give mainly non-Bragg distributions of continuous intensity along layer lines (2). Good examples are shown in Figs 1.1c and 1.5a,b. In this respect they are different from the spotty, Bragg patterns given by crystals where a motif is repeated in a regular three-dimensional array. The diffraction consequence of such three-dimensional regularity is a three-powered amplification of the repeated motif's scattering pattern in specific directions. This amplification is a benefit that usually outweighs the corresponding extinction of the scattering pattern in the many directions that do not obey the Bragg conditions. X-ray diffraction analyses of merely oriented systems can be just as illuminating as analyses of fully crystalline systems: the structural studies of tobacco mosaic virus (3) and of bacteriophage Pf1 (4) have demonstrated this amply, as have the analyses of fibres of the synthetic DNA:RNA hybrids (5) that provide more non-Bragg X-ray diffraction (Fig. 1.5a,b) than Bragg diffraction. With nucleic acids there are often even more favourable situations when the uniaxially oriented systems are, in addition, microcrystalline and therefore provide only Bragg-type data (e.g. Figs 1.1a and 1.3a,b). Using contemporary methods of measuring intensities, current structure determinations of repeated oligonucleotide sequences in fibres that are both uniaxially oriented and polycrystalline can compete with single-crystal analyses of oligonucleotides, except in the few cases of the latter where exceptional crystal perfection (6) provides an unusually rich set of high resolution data.
To study oligonucleotide systems only in crystals is needlessly remote from polymer structures when the object of the study is to determine the effect of sequence on local conformations on a naked polymer. Certainly, in terms of the secondary structures of Watson-Crick base-paired duplexes, there have been no discoveries with oligonucleotides that have overturned previous, fibre-derived insights with respect to the prevalent right-handed helical conformations. The one true novelty to emerge from oligonucleotide crystallography was the exotic left-handed helical conformations (Z-DNA) available to oligo(dGC):oligo(dGC) (7,8) and later recognized in certain polymers also with alternating purine-pyrimidine (9) base sequences. High resolution, single-crystal analyses are also essential when visualizations of the precise interactions between specific oligonucleotide sequences and adducts are needed (10), or when the subtle adjustments in local structure required to accommodate a mismatched base pair have to be scrutinized (e.g. ref. 11).
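The helical parameters quoted in the caption of Fig. 1.1 are linked by simple arithmetic: the twist per residue t is 360° times the number of turns in the repeat divided by the number of residues in the repeat, and the axial rise per residue h is the pitch times the same ratio. The short Python sketch below is only an illustration of that arithmetic; the function names are hypothetical and are not taken from the chapter, and the pitch values it can derive for C-DNA and D-DNA are computed from the quoted h values rather than quoted in the text.

```python
def twist_per_residue(residues_per_repeat, turns_per_repeat=1):
    """Average helical twist t in degrees per residue."""
    return 360.0 * turns_per_repeat / residues_per_repeat


def rise_from_pitch(pitch, residues_per_repeat, turns_per_repeat=1):
    """Average axial rise h per residue, in the same length unit as the pitch."""
    return pitch * turns_per_repeat / residues_per_repeat


def pitch_from_rise(rise, residues_per_repeat, turns_per_repeat=1):
    """Helix pitch implied by a given axial rise per residue."""
    return rise * residues_per_repeat / turns_per_repeat


if __name__ == "__main__":
    # B-DNA: tenfold helix, pitch 33.7 Å (values from the Fig. 1.1 caption)
    print("B-DNA t =", twist_per_residue(10), "h =", rise_from_pitch(33.7, 10))
    # C-DNA: 28 nucleotide pairs in three pitches, h = 3.31 Å
    print("C-DNA t =", twist_per_residue(28, 3), "pitch =", pitch_from_rise(3.31, 28, 3))
    # D-DNA: eightfold screw, h = 3.05 Å
    print("D-DNA t =", twist_per_residue(8), "pitch =", pitch_from_rise(3.05, 8))
```

Passing the number of turns explicitly is what lets the same functions handle a non-integral helix such as C-DNA, where the structure repeats exactly only after three pitches.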
2. The DNA duplex: discovery and definition
It may be too procrustean to squeeze the progression of knowledge about polynucleotide secondary structures into exact decades, but there is a certain convenience in doing so. In the 'dark age' before 1950, diffraction patterns of oriented DNA existed. These were confusing because, as we can now see, they were of poorly ordered mixtures of the A and B allomorphs of DNA. Nevertheless, their very existence for a polymer containing complex base sequences encouraged the hope that these diverse sequences might be accommodated in a very simple framework. Maurice Wilkins' first achievement (12) was a clean pattern of the commonest allomorph of DNA, later called B (Fig. 1.1a). Rosalind Franklin's main contribution (13) was the discovery that
DNA was dimorphic (Fig. 1.3c). Interestingly, she named her later discovered form A and the prior Wilkins' form B, perhaps because, in her hands, the uniaxially oriented fibres of the former were always of the 'superior' polycrystalline type whereas those of the latter were polycrystalline only accidentally. The precise experimental circumstances that would provide, routinely, oriented and polycrystalline B patterns (Fig. 1.1a) had to await Wilkins' meticulous further experiments. Meanwhile, both A and B patterns helped Watson and Crick (14) to the conclusion that DNA had helical secondary structures and provided the dimensions and symmetries that were imposed upon their first DNA models, which incorporated antiparallel, duplex, right-handed helices (e.g. Figs 1.2 and 1.4) and isomorphous A:T and G:C pairs. However, it was these isomorphous, complementary pairs that were the key revelation that was immediately exploited in order to understand the molecular biology of genes. To begin with the helical frameworks were incidental and even an embarrassment: the fact that the two helical chains were intertwined posed the difficult problem of visualizing unwinding during replication or transcription; also, the coordinates of all the atoms in the helical models (15) allowed diffraction intensity distributions to be calculated and these were found at once to be seriously different from those observed. This provoked a series of challenges to the Watson and Crick conjecture by (notably) Donahue (e.g. ref. 16).
The response by Wilkins and his group (17,18) was a decade of painstaking refinements of the original model, which contrived to preserve the original base-pairing hypothesis while remedying the initial very poor fit with the diffraction data. The fit of the original Crick and Watson model, incidentally, was so poor that the residual error, as measured by the crystallographers' R-factor, was about 0.80, a value so large as to indicate to conventional chemical crystallographers a structure so erroneous as to be beyond rescue. Ironically, Wilkins' rescue was possible because of the polymorphism of DNA. The original Crick and Watson model for B-DNA (15) was, unwittingly, what we would now call an A structure. It had reasonable stereochemistry but incorporated, not C2'-endo-puckered, but C3'-endo-puckered furanose rings. Such duplexes have base pairs 4 Å nearer the helix periphery than in B-DNA, a major difference in the distribution of electron density that led to the incompatibility of the calculated with the observed diffraction pattern. In the 1950s there were no well-developed protocols for melding low resolution diffraction data with stereochemical restraints and constraints. Consequently, the refinement of models to prove the Watson-Crick conjecture was a labour-intensive, manual process that persisted until 1960, accompanied as it was by the equally slow processes of obtaining purer DNA specimens, better methods of spinning DNA fibres and of collecting higher resolution X-ray data.
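The R-factor quoted above can be made concrete with a short sketch. It assumes the conventional crystallographic definition, R = Σ| |Fo| − |Fc| | / Σ|Fo|, summed over observed and calculated structure amplitudes; the function name and the sample amplitudes are purely illustrative and are not data from the studies cited.

```python
def r_factor(observed, calculated):
    """Conventional residual R = sum(| |Fo| - |Fc| |) / sum(|Fo|).

    `observed` and `calculated` are equal-length sequences of structure
    amplitudes. The values used below are made up for illustration only.
    """
    if len(observed) != len(calculated):
        raise ValueError("observed and calculated must have the same length")
    denominator = sum(abs(fo) for fo in observed)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(fo) - abs(fc))
                    for fo, fc in zip(observed, calculated))
    return numerator / denominator


# Hypothetical amplitudes: a perfect fit gives R = 0, a very poor one R = 0.8.
perfect = r_factor([10.0, 20.0, 30.0], [10.0, 20.0, 30.0])
poor = r_factor([10.0, 10.0], [2.0, 18.0])
print(perfect, poor)
```

On this scale a value near 0.80, as reported for the first model, means the summed discrepancies are almost as large as the summed observed amplitudes themselves.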
Nevertheless, by the end of this 'decade of discovery and definition' two distinct allomorphs for DNA duplexes had been defined (17,18), the B- and A-forms, which were most obviously distinguished by the position of the base pairs: astride the helix axis in the former but noticeably displaced (dx = −4 Å) in the latter. An immediate consequence of this are the equally distinctive groove structures: in B the major and minor grooves are equally deep (Fig. 1.2a), whereas in A the major groove is a relatively deep chasm, contrasting with the minor groove, which is merely a shallow depression (Fig. 1.4c). Other features of the base pairs in both structures were their mild propeller distortion from complete
Fig. 1.3. Samples of fibre diffraction diagrams of A-type polynucleotide duplexes which span the range and indicate the diversity of structures of this genus. In all three cases shown the molecules in the fibres are uniaxially oriented and microcrystalline. (a) The 12-fold helix (A'-RNA) diffraction fingerprint observed with RNA duplexes and DNA:RNA hybrids indicates (t, h) = (±30.0°, 3.00 Å). (b) The 11-fold helix (A-RNA) fingerprint observed with RNA duplexes indicates (t, h) = (±32.7°, 2.81 Å). (c) The fingerprint of classical A-DNA also indicates an 11-fold helix with (t, h) = (±32.7°, 2.56 Å).
Polynucleotide secondary structures: an historical perspective
Fig. 1.4. Mutually perpendicular projections of the range of A-type duplex helices corresponding to Fig. 1.3. All are regular and right-handed and have identical antiparallel chains and therefore their molecular symmetries are: (a) A'-RNA, 12₁22; (b) A-RNA, 11₁2; (c) A-DNA, 11₁2. The common molecular feature of these double helices is their shallow minor grooves and very deep major grooves. In (a), where h is maximum, the major groove is also wide open, but in (c), axially the most compact conformational variant, the major groove is essentially closed.
coplanarity and the large inclination of about 20° in A, associated with the shorter (2.56 Å) rise per residue, compared with the essentially 0° inclination in B, which has a longer (3.37 Å) rise per residue. The helical twist in A (32.7°) is also lower than that in B (36.0°). Towards the end of the 1950s a third (19) and a fourth (20) allomorph, C and D, were also discovered, both B-like in structure (Fig. 1.2) but with reduced rises per residue (3.31 and 3.05 Å, respectively) and increased helical twists (38.6–40.0 and 45.0°, respectively). These discoveries heralded the next decade (1960s) which may be thought of as the 'decade of expansion and exploration'.
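The twist and rise values quoted above fix the other familiar helix descriptors by simple arithmetic: the number of residues per turn is 360° divided by the twist, and the pitch is that number multiplied by the rise per residue. The following is a minimal sketch of that arithmetic; the function and dictionary names are illustrative, and only the A, B, and D values stated in the text are used (C is omitted because its twist is given as a range).

```python
def helix_parameters(twist_deg, rise_angstrom):
    """Return (residues per turn, pitch in angstroms) for a regular helix.

    residues per turn = 360 / |twist|; pitch = residues per turn * rise.
    """
    if twist_deg == 0:
        raise ValueError("twist must be non-zero for a helix")
    residues_per_turn = 360.0 / abs(twist_deg)
    pitch = residues_per_turn * rise_angstrom
    return residues_per_turn, pitch


# (twist in degrees, rise per residue in angstroms) as quoted in the text.
ALLOMORPHS = {
    "A": (32.7, 2.56),
    "B": (36.0, 3.37),
    "D": (45.0, 3.05),
}

for name, (twist, rise) in ALLOMORPHS.items():
    n, pitch = helix_parameters(twist, rise)
    print(f"{name}: {n:.2f} residues per turn, pitch {pitch:.1f} A")
```

With these inputs B comes out as a tenfold helix, A as very nearly elevenfold, and D as eightfold, which matches the descriptions of these allomorphs elsewhere in the chapter.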
3. Expansion
By the 1960s it was evident that there might be many polynucleotide structures to determine and that, therefore, computerized model building (21) had to take over from manual procedures and valuable analytical methods, such as least-squares (21,22) refinements and Fourier syntheses of electron density (23,24), that were commonplace in orthodox X-ray diffraction analyses of crystal structures had to be adapted for further studies. While this was in train an important event occurred in the discovery and determination of the structures of two allomorphs of duplex RNA (24,25), both A-type (Figs 1.3a,b and 1.4a,b), which immediately extended the range of polymorphism of this set of right-handed polynucleotide helices and showed that the range of helical twists available to A structures was only 30.0–32.7° (cf. 36.0–45.0° available to B structures), but that rises per residue might be just as variable for A structures (2.56–3.00 Å) as for B (3.05–3.37 Å). It was also realized explicitly that the distinctive morphologies of the A and B structures correlated with C3'-endo furanose rings in the former versus C2'-endo rings in the latter (21), and that these conformations were the origin of the very negative dx displacements of the base pairs.
A quarter of a century later, and after more than a hundred very expensive oligonucleotide crystal structure determinations, it has had to be concluded, reluctantly (26), that: (i) B-like structures have a mean helical twist (and standard deviation) of 36.1° (5.9°) and a mean axial rise per base pair (and standard deviation) of 3.37 Å (0.46 Å); (ii) A-like structures have mean values for helical twist and rise per residue of 31.1° and 2.90 Å, respectively; and (iii) the most persistent morphological feature differentiating the two families is the 4–5 Å relative base pair displacement that gives rise to their distinctive groove structures! Rarely in the history of scientific endeavour has so much effort by so many investigators provided so few new insights of significance.
4. Discrimination and exploration
The introduction of automatic least-squares refinement to X-ray diffraction analysis of polymers in fibres (21) in the 1960s not only allowed easier and faster production of the polynucleotide models with the best coordinates, but also provided a means of discriminating between alternative structural hypotheses. Suppose Watson and Crick had been aware that for their first model of B-DNA they would have to consider left-handed as well as right-handed duplex helices, and that furanose rings could be C2'-endo as well as C3'-endo. They should have found it necessary and possible to cobble together four versions of a DNA model each with isomorphous A:T and G:C pairs. The rise per residue of 3.37 Å would not have been very discriminating, nor would a helical twist of ±36°. These generous dimensions result in a fairly open structure for B-DNA and, therefore, none of the initial models would have been embarrassing stereochemically. Since they would also be building isolated molecules that did not have to fit into a tight unit cell, another source of discrimination would have been absent. Only when they had to fit the X-ray intensities optimally, while maintaining viable stereochemistry, would it have been found that the two right- and left-handed models with C2'-endo rings were noticeably superior to the right- and left-handed models with C3'-endo rings. The best right-handed double helix with C2'-endo rings might have been somewhat superior to the best left-handed structure, but could only have been judged to be significantly superior by applying statistical tests, such as those that were only later introduced by Walter Hamilton (27), to the best least-squares models of each kind. During the 'decade of discrimination' (1970s) the possibility of least-squares optimized models of polynucleotides, and the existence of Hamilton's tests, removed much of the uncertainty that had come to be associated with the fibre diffraction analysis of polynucleotides. This uncertainty would not have arisen so acutely if meticulous experimental studies of fibrous polynucleotide systems had been commonplace in laboratories other than that of Maurice Wilkins.
Unfortunately, they were not. Encouraged by the Watson and Crick model-building coup, which owed little to local experimental effort, many other analyses of fibrous polynucleotide systems were undertaken with just as little experimental investment, but with much less insight. Deservedly, most of the conclusions from these forays were wrong, but from these failures grew an understandable lack of confidence in fibre studies of polynucleotides, which, during the 1960s accumulated an appalling negative record: no fibrous nucleic acid structure produced by a laboratory not of Maurice Wilkins' school survived critical re-examination: the model for B-DNA by Crick and Watson (15) turned out to be a model for a member of the A-family; Rich's three-stranded model (28) for polyinosinic acid should have been four-stranded (29); the double-stranded model of Langridge and Rich (30) for polycytidylic acid should have been single-stranded (31); and Mitsui et al. produced a left-handed model for D-DNA (32), which is, in fact, right-handed (33). The point is not that one can easily be wrong in modelling a fibrous structure, but that with today's technology scrupulously applied, most gross errors are detectable if enough effort is invested in alternative structures.
5. Polymorphism
Polymorphism in polynucleotide helices has a number of aspects: How polymorphous are duplexes containing isomorphous Watson-Crick A:T and G:C base pairs, no matter what the base sequence is? How polymorphous are they when a particular base sequence is monotonously repeated along the polymer? Further questions arise when one chain is RNA but the other is DNA; when triplex helices occur in which a
Fig. 1.5. DNA:RNA hybrid duplexes with general base-sequences are observed in fibres generally to have structures isomorphous with various DNA:DNA and RNA:RNA duplexes that have identical antiparallel chains. However, unique diffraction patterns are obtained with (a) poly d(I):poly r(C) that indicate tenfold helices with (t, h) = (36.0°, 3.13 Å) and with (b) poly d(A):poly r(U) that indicate 11-fold helices with (t, h) = (32.7°, 3.06 Å). These patterns are reminiscent of the one obtained from (c) the triplex helices of poly r(A):poly r(U):poly r(U) that indicate 11-fold helices with (t, h) = (32.7°, 3.05 Å).
Fig. 1.6. The DNA:RNA hybrid structures corresponding to Fig. 1.5a,b turn out to be heteromerous, i.e. their chemically distinct chains are also conformationally distinct, as are all three chains in the RNA:RNA:RNA triplex. In (a) poly d(I):poly r(C), and in (b), poly d(A):poly r(U), the poly d(R) chains have B-type conformations and the poly r(Y) chains A-type conformations. In (c), poly r(U):poly r(A):poly r(U), the poly r(U) chain that is Watson-Crick base-paired with the poly r(A) chain is A-type, but the Hoogsteen base-paired poly r(U) chain is B-type, as is the poly r(A) chain itself. The duplex and triplex compound helices are shown in mutually perpendicular projections in both disaggregated and aggregated forms.
Watson-Crick duplex of special sequence has a third strand attached that involves non-Watson-Crick base-base interactions; and when duplexes, triplexes, and quadruplexes are studied in which none of the base-base interactions can be isomorphous with the classical Watson-Crick base pairs. All these situations began to be explored before the 1970s but it was only when the technology of fibre diffraction analysis had been systematized that they could be explored scrupulously and reasonably rapidly. An additional non-trivial requirement was better data from better fibres, which could be contrived only after there was ready availability, and in quantity, of truly polymeric homopolynucleotides and polyoligonucleotides of well-defined sequence. Discrimination is a persistent feature of polynucleotide structure analyses in fibres and of oligonucleotide analyses in single crystals. As the precision of analyses becomes finer, the issues move on from questions of the handedness of helices, and from questions of one ring pucker or another, to whether a conformational wrinkle on the surface of a helix is real, and, if real, is its existence predetermined by primary structure or merely an accident of local crystal interactions or the effect of an odd cation or two? How many blobs of electron density represent real water molecules and, if real, are they important and, if important, are they truly important to molecular biologists rather than merely comforting to crystallographers worried by less-than-atomic resolution data? To anticipate the detailed conclusions of the 'decade of discrimination and polymorphism' (1970s) it should be said that polynucleotide helices have turned out to be much less polymorphic than a polymer chemist might have supposed.
Any nucleotide residue has six variable conformation angles in its phosphate diester backbone and each of these angles has two or three regions of variation. In addition, there are two regions of variation available to bases at their glycosylic attachments. The naive expectation has to be that polynucleotide helices should be very polymorphic. Even if it is insisted that bases are 'stacked', it is not obvious that the expected polymorphism should be reduced to merely three classes; namely, the original right-handed A and B chains that incorporate either C3'-endo or C2'-endo furanose rings, and the unique, left-handed Z chains that incorporate the two kinds of rings alternately! Nor is it obvious that requiring a few hydrogen bonds in Watson-Crick or any other base pairing would seriously limit further macropolymorphism. Yet, this does seem to be the case. This is not to say that micropolymorphism does not exist. It does: not all chains of the A-, B-, or Z-types are identical to one another; nor need the two chains in any particular A-, B-, or Z-duplex be identical to one another, nor even similar, since duplexes with A and B chains exist, as do triplexes that incorporate mixtures of A and B chains (Fig. 1.6). It is also the case that local nucleotide conformations in oligonucleotides sometimes vary, apparently in a sequence-dependent way. Much of the extent and limits of these polymorphisms have been revealed in polynucleotide fibres. These conclusions have been confirmed and a few of them have been extended by detailed analyses of oligonucleotides in single crystals.
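The 'naive expectation' described above can be put into rough numbers. The sketch below simply multiplies the regions available to each of the six backbone angles by the two glycosylic regions, assuming (as a deliberate oversimplification that the text does not make) that all angles vary independently; the function name and defaults are illustrative.

```python
def naive_conformer_count(regions_per_angle, backbone_angles=6,
                          glycosyl_regions=2):
    """Count combinations of conformational regions for one nucleotide.

    Assumes every backbone angle has the same number of regions and that
    all angles are independent, which real nucleotides do not satisfy.
    """
    return (regions_per_angle ** backbone_angles) * glycosyl_regions


# Two or three regions per backbone angle, as stated in the text.
lower = naive_conformer_count(2)
upper = naive_conformer_count(3)
print(lower, upper)
```

Even this crude count gives between about a hundred and well over a thousand combinations per residue, which is the scale against which the observed three chain classes (A, B, and Z) look so restricted.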
6. Homopolymers
Uniaxially oriented fibres of poly A, poly U, poly (thioU), poly C, poly G, poly I, poly X have all been fabricated. The diffraction patterns of the first three polymers
have all been interpreted as deriving from double-stranded molecules and that of poly I from a triple-stranded molecule (28). This pattern and that of the essential fibres of poly G have since been shown to arise from quadruplexes (29) with A-like polynucleotide chains. Oriented fibres of poly C can also be polycrystalline and are now firmly established as containing single, not double, strands of A-like poly C helices (31). No satisfactory analyses of poly A or poly U fibres have been completed. Poly (thioU) and poly X give surprisingly similar diffraction patterns that are even more surprisingly similar to A-DNA (34,35) and must therefore correspond to duplex arrangements of identical right-handed, antiparallel chains with conformations in right-handed, 11-fold helices with a 2.56 Å rise per residue! Apparently, such identical, antiparallel, sugar-phosphate chains can, by mutual rotation about their common helix axis, contrive duplex structures that can accommodate purine-purine (X:X), purine-pyrimidine (A:T or G:C), or pyrimidine-pyrimidine (s2U:s2U) base pairs without any significant conformational readjustment. This truly remarkable result has important implications for the lack of genetic specificity implicit in polynucleotide secondary structures by themselves. To emphasize how adept polynucleotide helices of conventional conformation are at accommodating exotic base sequences with complementary (but non-Watson-Crick) base pairs, one only has to consider the structures of duplexes and triplexes containing mixtures of homopolynucleotides such as poly I:poly A:poly I, where there are two kinds of purine-purine pairs and yet the polynucleotide strands are conformationally conventionally A-type (36), albeit not conformationally identical.
Other triple-stranded homopolymer systems, such as poly U:poly A:poly U (37) (Figs 1.7 and 1.8) and poly dT:poly dA:poly dT (38), have also been investigated. These contain both Watson-Crick and Hoogsteen base pairs. Originally it was assumed (36–38) that all the chains would be A-type, i.e. the structures would be merely an A-type Watson-Crick duplex with the third strand, also A-type, filling the wide, deep major groove. Comprehensive review (39) of alternative models with the best least-squares results, Hamilton-tested, has shown that poly dT:poly dA:poly dT in fact has a structure with all chains B, but poly U:poly A:poly U has an A:B:B structure. The original conjecture that poly I:poly A:poly I has an A:A:A triplex has, however, survived rescrutiny. It always had to be thinkable that DNA:RNA hybrids might have a heteromerous duplex structure with two conformationally non-identical strands. In fact, DNA:RNA hybrids most often have fibrous structures isomorphous with A-DNA or A'-RNA (5,40) (Figs 1.3 and 1.4) and must, therefore, form duplexes with polynucleotide chains that are conformationally identical despite their chemical difference. That heteromerous structures indeed exist has been demonstrated with synthetic DNA:RNA hybrids where the chains are homopolymers, like poly dA:poly rU and the related, but not isomorphous, poly dI:poly rC (Figs 1.7 and 1.8). In each of these duplexes the DNA strand is B-type and the RNA strand A-type (40). It was originally thought that the unique (B') diffraction pattern of poly dA:poly dT (38) (Fig.
1.7c) was also the consequence of just such an heteromerous structure (41), but more intensive analyses of a variety of crystal forms of poly dA:poly dT (42,43), poly dA:poly dU (44), and poly d(AI):poly d(CT) (45) have shown that all these structures, although heteromerous with two non-identical polynucleotide strands, contain two B-type strands (Fig. 1.8c).
Fig. 1.7. Fibre diffraction patterns from (a) calf thymus DNA, (b) poly d(GC):poly d(GC), and (c) poly d(A):poly d(T). The similarity of the intensity distributions indicates that they all derive from structures that are analogous to the tenfold helices of classical B-DNA. The rather similar patterns in (a) and (b) suggest that the differences between 'average' B-DNA and B poly d(GC):poly d(GC), while significant, are also subtle. The more distinctive pattern of poly d(A):poly d(T) in (c) leads one to anticipate some markedly different conformation.
Fig. 1.8. Mutually perpendicular views of: (a) the 'average' (calf thymus) B-DNA structure with molecular symmetry 10₁22; (b) the B-form of poly d(GC):poly d(GC) which has 5₁22 symmetry, i.e. a right-handed helical duplex with identical antiparallel chains, each of which is a fivefold helix of dinucleotides with GpC conformationally distinct from CpG; (c) the so-called B'-form of poly d(A):poly d(T) where the molecular symmetry is 10₁, and there is no dyad axis of symmetry relating the two chains, i.e. the poly d(A) and poly d(T) chains have the same pitch and symmetry but the nucleotides in the different chains do not have the same conformations.
7. Polyoligonucleotide duplexes
Following on from the polymononucleotides, chemically, the simplest synthetic polynucleotides are the polydinucleotides with alternating, self-complementary base sequences, poly d(GC):poly d(GC) and poly d(AT):poly d(AT), both of which, in different ways, turned out to be very important in extending the range of DNA polymorphism (Figs 1.7–1.12). As mentioned before, poly d(AT):poly d(AT) was important for its B-like, D structure, which strictly is a fourfold helix of dinucleotides (46), but to a good approximation is an eightfold helix, with twist = 45.0°, and with a reduced rise per residue (3.02 Å) compared with tenfold helical B (3.37 Å). Unlike the classical B structure, the base pairs are inclined, but in the opposite sense to A. This D structure, with C, broke the classical B monopoly and indicated that the twists per residue in B structures could vary markedly, and that the variation could be expected to be upwards from the classical value of 36.0°. Poly d(AT):poly d(AT) also forms orthodox B helices (47) and, reluctantly, classical A helices. The rarity of A helices for this polymer and their complete absence in poly d(A):poly d(T) re-emphasizes an older discovery that (AT)-rich DNAs find the B→A transition more difficult than (GC)-rich DNAs. Poly d(GC):poly d(GC) can be obtained (and in fibres of well-washed DNA, always is) in the A or B forms (47): the A form is classical, a regular 11-fold helix with no conformational evidence of the underlying polydinucleotide sequence; not so the B form which has both a crystal structure that is different from native B-DNA (Fig.
1.7b) and contains fivefold helices of dinucleotides, despite the generally close resemblance of its diffraction pattern to the classical B form of DNA. The root of the difference lies in the different local conformations in GpC and CpG where the conformations (ε, ζ) are (g, t) and (t, t), rather than both (t, t) as they are, on average, in native B-DNA (46). This apparently sequence-related wrinkle (Fig. 1.14b) was the first detected in a polymeric DNA. A more modest version of the same wrinkle is present in the D forms of poly d(AT):poly d(AT) and its isomorph, poly d(IC):poly d(IC) (Figs 1.11 and 1.12). There is also an interesting variant of the D form of poly d(AT):poly d(AT) which has a hexanucleotide structural repeat (40) (Figs 1.11 and 1.12) because successive AT nucleotides have all their (ε, ζ) conformations successively, but not identically, (t, g⁻), but successive TA nucleotides are (g⁻, t), (g⁻, t), and (t, t). In other words the nondescript conformation, (t, t), is intruded every sixth nucleotide in place of the
Fig. 1.9. Fibre diffraction patterns from two forms of polymeric Z-DNA: (a) from poly d(GC):poly d(GC), a sixfold helix of pitch 43.5 Å, (t, h) = (±60.0°, 7.25 Å); (b) from poly d(As4T):poly d(As4T), a sevenfold helix of pitch 53.2 Å, (t, h) = (±51.4°, 7.60 Å).
Fig. 1.10. Mutually perpendicular projections of segments of the two polynucleotide duplexes that correspond to the diffraction patterns of Fig. 1.9. Both are left-handed helices with antiparallel chains in which the unit of structure is a dinucleotide: (a) has molecular symmetry 6₅22; (b) has 7₆2. The morphologies of both are compact and quasi-cylindrical.
Fig. 1.11. Fibre diffraction patterns obtained from a variety of B-type D-DNA structures: (a) the screw-disordered form of poly d(AAT):poly d(ATT) in which (t, h) = (45.0°, 3.01 Å); (b) the tetragonal polycrystalline form of poly d(RY):poly d(RY) in which (t/2, h/2) = (45.0°, 3.02 Å); and (c) a pleomeric form of poly d(AT):poly d(AT) in which the conformational asymmetric unit is a hexanucleotide and (t/6, h/6) = (45.0°, 3.08 Å).
Fig. 1.12. Mutually perpendicular projections of segments of the D helices that furnished the diffraction patterns in Fig. 1.11. The regular 8₁22 helix of average mononucleotides in (a) is fairly closely mimicked by the 4₁22 helix of dinucleotides in (b), but less so in the 4₃ helix of hexanucleotides in (c), as is evident when one views the overall morphologies perpendicular to the helix axes. Then, the distinctive surfaces are more apparent than when one contemplates the projections parallel to the helix axes.
discriminating conformations (t, g⁻) for (purine, pyrimidine) steps and (g⁻, t) for (pyrimidine, purine) steps. The important messages to be taken from this structure are that not every variation of sequence produces a wrinkle and that only some wrinkles may be diagnostic of sequences. Thus, when one comes to examine detailed conformations in various B-type polymer structures, such as poly d(GGT):poly d(ACC) (48, 46) (Figs 1.13 and 1.14), poly d(AG):poly d(CT) (46), poly d(AI):poly d(CT) (45), and poly d(AATT):poly d(AATT) (49), one does indeed find that the nondescript (t,
Fig. 1.13. Various fibre diffraction patterns of B-type C-DNA: (a) the classical pattern obtained with calf thymus DNA where (t, h) = (38.6°, 3.30 Å); (b) a pattern obtained with poly d(AG):poly d(CT) where the dinucleotide duplex repeat is very evident in the meridional diffraction at 6.52 Å and where the helices have ninefold screw symmetry with (t/2, h/2) = (40.0°, 3.26 Å); (c) obtainable with poly d(GGT):poly d(ACC), indicates threefold helices where (t/3, h/3) = (40.0°, 3.31 Å).
Fig. 1.14. Mutually perpendicular projections of segments of: (a) (classical) C-DNA, symmetry 28₃2; (b) poly d(AG):poly d(CT), symmetry 9₂; and (c) poly d(GGT):poly d(ACC), symmetry 3₁. The views down the helix axes emphasize best how much the surfaces of these helices would 'feel' different to exploring interactants.
t) conformations are quite common. The discriminating (g⁻, t) and (t, g⁻) conformations for (ε, ζ) also occur, and may indeed represent a conformational language of likely nucleotide sequences. The morphological consequences of this language may be braille-like wrinkles on the surface of DNA, but so far all the evidence indicates that
this language has a sloppy vocabulary and that it is impressionistic rather than precise, just as one would expect from a potentially rather polymorphic polymer that is most often merely a substrate. The most dramatically new allomorphs of DNA, the left-handed forms, called trivially Z, were discovered during the 1970s, also with alternating purine-pyrimidine base sequences. The first allomorph was detected in an exotic variant of poly d(AT):poly d(AT), namely poly d(s4TA):poly d(s4TA), by Saenger et al. (9) (Fig. 1.9b). It has a structure (Fig. 1.10b) which is a sevenfold helix of dinucleotides (i.e. the helix twist is ±51.4°) with an axial rise per dinucleotide that is 7.60 Å. Unfortunately, Saenger et al. did not even contemplate seriously a Watson-Crick base-paired structure for their exotic new complex, far less a left-handed duplex, and so a great opportunity went unrecognized until pointed out by Arnott et al. (8) when they discovered a similar novel diffraction pattern (Fig. 1.9a) for poly d(GC):poly d(GC) in an old fibre that earlier had been shown to contain B-DNA duplex helices. Their new allomorph was a sixfold helix of dinucleotides, with, therefore, a helix twist of ±60.0°. Its axial rise per dinucleotide was 7.25 Å. Unfortunately for these researchers too, the new allomorph had already been visualized from a single-crystal analysis of oligo d(GC):oligo d(GC) (7) and shown to be, unprecedentedly, left-handed. Even so, the fibre structures (Fig.
1.10a,b) attes t to tw o importan t conclusions : first , Z-DNA s are also polymorphic; an d secondly, the B to Z transitio n ca n take place in a not ver y wet o r plastic fibre, suggest ing tha t inversion o f helix sens e involves a mechanism wit h limite d loca l melting, base unstacking, and rotation, followe d by total rotations of individual quasi-cyclindical mol ecules. All of this could conceivabl y take place in the hydrate d soli d state.
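The twists quoted for the two left-handed fibre forms follow directly from the fold of each helix: a helix that completes one turn in n repeating units has a twist of magnitude 360°/n per unit and a pitch of n times the axial rise. A minimal Python sketch of that arithmetic is given here; the function name and variable names are illustrative assumptions rather than anything defined in the text, and the sign of the twist (the handedness) is not determined by this calculation.

```python
def helix_from_fold(units_per_turn, rise_per_unit):
    """Return (twist magnitude in degrees, pitch) for a helix that
    completes exactly one turn in `units_per_turn` repeating units.

    Assumption: one turn per repeat; handedness is not considered here.
    """
    twist = 360.0 / units_per_turn
    pitch = units_per_turn * rise_per_unit
    return twist, pitch


# Values quoted in the text; rise per dinucleotide in angstroms.
sevenfold = helix_from_fold(7, 7.60)  # poly d(s4TA) : poly d(s4TA) form
sixfold = helix_from_fold(6, 7.25)    # poly d(GC) : poly d(GC) form
```

Under these assumptions the sevenfold form has a pitch of about 53.2 Å and the sixfold form about 43.5 Å.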
8. Envoi
In the 1980s and 1990s, fibre diffraction analyses of polymers have largely given way to single-crystal analyses of oligonucleotides. It would be a pity if the former were extinguished altogether. The structures of polymer molecules are not subject to end effects, nor are they terrorized by lattices; the sizes, shapes, and space groups of their lattices are more likely to reflect their intrinsic dimensions and symmetries rather than the reverse. Certainly, interactions of polynucleotides with drugs and the like may be visualized more precisely in high-resolution single-crystal analyses, but it could be that subsequent direct measurements in a polymeric system of the effects of the interactions would be more convincing than extrapolatory modelling. Even if such collaborations do not evolve, it would be a denial of an important pioneering era in the history of molecular biology to disguise or diminish how much information about nucleic acid secondary structures was distilled from X-ray studies of fibres in the third quarter of this century.
9. Appendix: further details of fibrous polynucleotide structures together with some comments
A comprehensive survey of fibrous polynucleotide studies was prepared by Chandrasekaran and Arnott in the mid-1980s and published (50) in 1989. Some of these results are reproduced here but with a different emphasis and with revisions of certain complex structures that have been reviewed since, such as the double- and triple-stranded helices where each strand in the complex has a different conformation from the other(s).
9.1 Fibre diffraction analysis
The number, quality, and resolving power of the X-ray diffraction intensities from fibrous specimens are rarely sufficient for the relative atomic positions in the diffracting molecules to be established independently with useful accuracy. However, as with crystallography of oligonucleotides, there are systematic schemes for augmenting these data with non-controversial stereochemical information, which certainly includes the primary structure of the polymer and the most probable values of its bond lengths and angles. Further metrical constraints may be provided by the dimensions and symmetry of the unit cell, by the requirements that non-bonded atoms should never be less than certain distances apart, and by the requirement that hydrogen-bonded and polar interactions should be characterized by a narrow range of distances. The meeting together of these rather different kinds of data can lead to very detailed structures in which most of the atomic positions are defined to within a few hundredths of a nanometer, which is a precision adequate for identifying the critical interactions within and between molecules.
How far one proceeds varies from case to case, since there are a great many kinds of partially ordered systems of helical molecules, each giving rise to different types of fibre diffraction patterns in which both continuous intensity and Bragg maxima occur.
If we wish to analyse quantitatively a diffraction pattern, we of course must succeed in modelling not only the molecular structure, but also the molecular packing. This is true for any diffraction pattern, but for fibre diffraction patterns there is additional complexity because the modes of packing are more varied and complex than in single crystals. With fibrous structures, solving the X-ray phase problem, and arbitration between plausible alternative models devised to provide the initial solution of this problem, is more of an issue than with crystallographic analyses, where multiple isomorphous replacement can lead to an unbiased experimental solution. Although a direct or experimental solution of the X-ray phase problem is not usually possible for fibrous structures, the extensive symmetry of helical molecules means that the molecular asymmetric unit is commonly a relatively small chemical unit such as a few nucleotides. It is therefore not difficult to fabricate a preliminary model that provides an approximate solution to the phase problem and then to refine this model to provide a 'best' solution. This process, however, provides no assurance that the solution is unique. Other stereochemically plausible models may have to be considered. Fortunately, the linked-atom least-squares approach (21,22) provides a very good framework for objective arbitration: independent refinements of competing models provide the best model of each kind; the final values of the residuals provide measures of the acceptability of various models; and these measures of relative acceptability can be compared using standard statistical tests (27) and the decision made whether or not a particular model is significantly superior to any other. This approach has been consistently applied to the structures detailed in this Appendix.
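The arbitration step described above compares the final residuals of independently refined models. The text does not define the residual it uses, so the sketch below assumes one conventional form, R = Σ|Fo − Fc| / Σ Fo over matched structure amplitudes, purely for illustration; the function name and all the numbers are hypothetical, and the significance test of reference 27 is deliberately not implemented.

```python
def residual(observed, calculated):
    """Conventional residual R = sum(|Fo - Fc|) / sum(Fo).

    Assumption: this form of the residual is chosen for illustration only;
    the source does not state which residual is minimized.
    """
    if len(observed) != len(calculated):
        raise ValueError("observed and calculated lists must match in length")
    total = sum(observed)
    if total == 0:
        raise ValueError("sum of observed amplitudes must be non-zero")
    return sum(abs(o - c) for o, c in zip(observed, calculated)) / total


# Hypothetical amplitudes for two competing, separately refined models.
f_obs = [10.0, 20.0, 30.0, 40.0]
model_a = [11.0, 19.0, 33.0, 38.0]
model_b = [14.0, 15.0, 36.0, 33.0]

r_a = residual(f_obs, model_a)
r_b = residual(f_obs, model_b)
```

A lower residual on its own does not show that one model is significantly better; that decision rests on the statistical comparison the text cites.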
Table 1.1. List of nucleic acid structures
Structure   Description   References
1   A-DNA (calf thymus)   18, 50
2   A-DNA poly d(ABr5U) : poly d(ABr5U)   50
3   A-DNA (calf thymus) : poly d(A1T2C3G4G5A6A7T8G9G10T11) : poly d(A1C2C3A4T5T6C7C8G9A10T11)   52
4   B-DNA (calf thymus)   51, 53
5   B-DNA poly d(GC) : poly d(GC)   50
6   B-DNA (calf thymus) poly d(C1C2C3C4C5) : poly d(G6G7G8G9G10)   50
7   C-DNA (calf thymus)   54
8   C-DNA poly d(GGT) : poly d(ACC)   54
9   C-DNA poly d(G1G2T3) : poly d(A4C5C6)   54
10   C-DNA poly d(AG) : poly d(CT)   50
11   C-DNA poly d(A1G2) : poly d(C3T4)   50
12   D-DNA poly d(AAT) : poly d(ATT)   55, 50
13   D-DNA poly d(IC) : poly d(IC) or poly d(AT) : poly d(AT)   52
14   D-DNA poly d(A1T2A3T4A5T6) : poly d(A1T2A3T4A5T6)   47
15   Z-DNA poly d(GC) : poly d(GC)   8, 50
16   Z-DNA poly d(As4T) : poly d(As4T)   8, 50
17   L-DNA (calf thymus) poly d(RY) : poly d(RY)   56, 50
18   B'-DNA α poly d(A) : poly d(T)   57
19   B'-DNA β2 poly d(A) : poly d(T)   58
20   A-RNA poly(A) : poly(U)   59, 50
21   A'-RNA poly(I) : poly(C)   59, 50
22   Poly(A) : poly d(T)   60, 50
23   Poly d(G) : poly(C)   60, 50
24   Poly d(I) : poly(C)   61
25   Poly d(A) : poly(U)   61
26   Poly(X) : poly(X) (10-fold)   35
27   Poly(X) : poly(X) (11-fold)   35
28   Poly(s2U) : poly(s2U) (symmetric base pair)   34
29   Poly(s2U) : poly(s2U) (asymmetric base pair)   34
30   Poly d(C) : poly d(I) : poly d(C)   62
31   Poly d(T) : poly d(A) : poly d(T)   63, 62
32   Poly(U) : poly(A) : poly(U) (11-fold)   62
33   Poly(U) : poly(A) : poly(U) (12-fold)   63
34   Poly(I) : poly(A) : poly(I)   63, 62
35   Poly(I) : poly(I) : poly(I) : poly(I)   64, 29
36   Poly(C) or poly(mC) or poly(eC)   50, 31, 64
37   B'-DNA β2 poly d(A) : poly d(U)   43
38   B'-DNA β1 poly d(A) : poly d(T)   45
39   B'-DNA β2 poly d(AI) : poly d(CT)   45
40   B'-DNA β1 poly d(AI) : poly d(CT)   49
41   B'-DNA poly d(AATT) : poly d(AATT)   45
9.2 The structures and tables
The development of the methodologies for analysing fibre diffraction patterns proceeded concurrently with the discovery of new patterns and with the availability of more powerful computers. Consequently, some structures in the earlier literature are flawed in having no hydrogen atoms and in retaining more steric compression than need be tolerated now. Among the 41 structures listed in Table 1.1, with the exception of a few (7, 30, 33, 35, and 36), this has been remedied in that the models presented here come either from recent analyses of new structures or modern re-refinements of older models.
For each structure, the helix symmetry (P_Q) and the unit-cell dimensions are given in Table 1.2; under repeating unit, n, is listed the number of nucleotides in one, two, or three chains that constitute the molecular asymmetric unit. In some duplexes the two chains are (or are assumed to be) antiparallel and identical. This implies that there is a diad axis perpendicular to the screw axis. Formally, this is indicated as 2P_Q. When P is an even integer (as in B-DNA), there is necessarily another diad perpendicular to the first at half a pitch along the helix axis. This situation is indicated formally by 22P_Q.
The conformation angles are listed in Table 1.3. If more than one chain is involved in the molecular asymmetric unit of a structure, it is indicated by chain 1, 2, etc., immediately after the structure number. The angles α, β, γ, δ, ε, and ζ are the backbone conformation angles at bonds P-O5', O5'-C5', C5'-C4', C4'-C3', C3'-O3', and O3'-P, respectively; the glycosidic conformation, χ, is the conformation at the C1'-N bond; the endocyclic conformation angles of the sugar rings are ν0,...,ν4.
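Each conformation angle named above is a torsion angle about one bond, fixed by the positions of four consecutive atoms. As an illustrative sketch only, such an angle could be computed as follows; the function names and the test coordinates are hypothetical, and the sign convention assumed is the usual one in which the angle is positive when the far bond is rotated clockwise from the near bond, viewed along the central bond.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def conformation_angle(p1, p2, p3, p4):
    """Torsion angle in degrees, in the range -180 to 180, about the
    bond p2-p3 for four consecutive atom positions p1, p2, p3, p4."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals to the two planes
    b2_len = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b2) / b2_len
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: central bond along z, far atom rotated by +60 degrees.
phi = math.radians(60.0)
gauche_plus = conformation_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                                 (0.0, 0.0, 1.0),
                                 (math.cos(phi), math.sin(phi), 1.0))
```

For α, for example, the four atoms would be those flanking the P-O5' bond; which atoms define each angle should be taken from the nomenclature the tables follow.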
The dispositions and shapes of base pairs (Table 1.4) are of some interest and in this presentation the older descriptions are provided to allow comparison with reference 50. The radial shift d and the lateral shear s are the orthogonal components of the displacement of a base pair from the helix axis in the xy-plane that is perpendicular to it. The propeller twist, θP, of the two bases in a pair is defined like a conformation angle. The angles between base normals and the helix axis, γ1 and γ2, are equal or similar in most structures. The tilt of the whole base pair is θT, while θR is the roll angle of the ith base pair. The relative roll ΔθR = θR(i-1) - θR(i) is also of interest, as, of course, is t, the local helical twist. All these parameters are defined in Millane et al. (40).
The dimensions of grooves in Watson-Crick base-paired 'smooth' duplexes, wherein only one nucleotide per chain constitutes the molecular asymmetric unit, are given in Table 1.5. These have been calculated following Arnott (66).
The orientation of the phosphate group relative to the helix axis in each of the structures is provided in Table 1.6. θ1 and θ2 are, respectively, the angles that the P-O1 and P-O2 bonds make with the helix axis. Similarly, θ3 and θ4 are, respectively, the angles that the line O1...O2 and the bisector of the O1-P-O2 plane make with the helix axis.
Finally, Table 1.7 shows the mean values for many morphological and conformational features of polynucleotide helices derived from single-crystal diffraction analyses of oligonucleotides (26).
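Two of the quantities defined above reduce to very simple calculations: the angle a bond vector makes with the helix axis (as for θ1 and θ2) and the relative roll between successive base pairs. The sketch below is an illustration under stated assumptions: the helix axis is taken to lie along z, the coordinates and roll values are hypothetical, and the function names are not from the text.

```python
import math


def angle_to_axis(start, end, axis=(0.0, 0.0, 1.0)):
    """Angle in degrees (0 to 180) between the vector start->end and the
    helix axis. Assumption: the helix axis defaults to the z direction."""
    v = [e - s for s, e in zip(start, end)]
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_axis = math.sqrt(sum(a * a for a in axis))
    if norm_v == 0.0 or norm_axis == 0.0:
        raise ValueError("zero-length vector")
    cos_theta = sum(a * b for a, b in zip(v, axis)) / (norm_v * norm_axis)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def relative_roll(roll_angles):
    """Relative roll theta_R(i-1) - theta_R(i) for successive base pairs."""
    return [roll_angles[i - 1] - roll_angles[i]
            for i in range(1, len(roll_angles))]


# Hypothetical P and O1 positions (nm) and hypothetical roll angles (degrees).
theta_1 = angle_to_axis((0.0, 0.0, 0.0), (0.1, 0.0, 0.1))
delta_roll = relative_roll([1.0, 3.0, 0.0])
```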
Table 1.2. Molecular and crystal structures. Number of nucleotides in the molecular asymmetric unit n, helix symmetry P_Q, unit cell dimensions a, b, c, α, β, γ. For structure description and references see Table 1.1
Structure 1 2 3 4
5 6 7
8 9
10 11 12 13
n 1
2 11 1 2 5+5 1 1 3+3 2 2+2 1 2
14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
6 2 2 2
37 38 39 40 41
1+1
35 36
1 +1
1+1 1+1 1 1 1 1+1 1+1 1 1 1+1
1 +1 1+1 + 1 1+1+1 1+1+1 1+1 + 1 1+1+1
1 1
1+1
2+2
2+2 . 4
P 11 11 1 10
5 2 28
9 3 9 9 8 4 4 4 6 7 1 10 10 11 12 11 45 10 11 10 11 11 11 11 12 11 12 12 23 6 6 6 10 10 5 5 5
Q 1
2 1 1 1 1 3
1 1 2 2 1 1 1 3 -1 -1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2
a (nm)
b (nm)
c (nm)
n
a
B (°)
7 (°)
2.17 2.23 2.17 3.08 3.79 3.08 3.50
3.22 3.34 3.34 2.20 2.20 1.95 1.69 1.70 1.72 1.91 1.77 2.00 2.32 1.87 3.97 3.94 2.36 2.32 2.32 2.48 2.11 2.35 2.15 2.15
3.99 4.14 3.99 2.24 3.61 2.24 3.50 2.02 3.34 3.34 2.20 2.20 1.95 1.69 1.70 1.72 1.91 1.77 1.15 2.32 3.55 3.97 3.94 2.36 2.32 2.32 2.48 2.11 2.35 3.73 3.73
90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0
96.8 90.0 96.8 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0
90.0 90.0 90.0 90.0 90.0 90.0 120.0 90.0 120.0 120.0 120.0 120.0 120.0 90.0 90.0 90.0 120.0 120.0 90.0 120.0 90.0 120.0 120.0 120.0 120.0 120.0 120.0 120.0 120.0 90.0 90.0
4.95 4.58 2.71 4.03 2.79 2.32 1.58 1.65 1.84 1.86 1.93 1.93 3.11
4.95 4.58 2.71 4.03 2.79 2.32 2.16 2.19 3.49 2.27 2.32 2.32 2.26
2.80 5.60 2.80 3.37 3.36 3.37 9.24 9.33 2.98 2.98 5.87 5.87 2.41 2.42 2.43 7.40 4.35 5.32 1.02 3.32 3.23 3.09 3.60 2.81 11.32 3.13 3.37 3.01 2.77 2.86 2.86 3.48 3.84 3.35 3.65 3.97 7.84 1.86 1.89 1.89 3.20 3.24 3.21 3.21 3.39
90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0
90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0 90.0
120.0 120.0 120.0 120.0 120.0 120.0 90.0 90.0 90.0 99.9 98.7 98.7 90.0
-
-
—
—
-
The two entries for structure 13 are successively for poly d(IC) : poly d(IC) and poly d(AT) : poly d(AT). The three entries for structure 37 are successively for poly (C), poly (mC), and poly (eC).
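The helix symmetry and the c repeat tabulated above fix the average twist and axial rise per repeating unit. The sketch below assumes the usual reading of the symbol P_Q, namely P repeating units in Q turns over one c repeat, so that the twist is 360°·Q/P and the rise is c/P. The function name is an illustrative assumption, and the example inputs are the values that appear to belong to structures 1, 4, 7, and 15 in the table as extracted.

```python
def screw_parameters(p, q, c_nm):
    """Average twist (degrees) and axial rise (nm) per repeating unit for a
    helix with P units in Q turns along a c repeat of c_nm.

    Assumption: P_Q is read as 'P units in Q turns'; a negative Q then
    gives a negative (left-handed) twist.
    """
    if p <= 0:
        raise ValueError("P must be a positive integer")
    return 360.0 * q / p, c_nm / p


# Example inputs read from Table 1.2 (assumed row assignments).
a_dna = screw_parameters(11, 1, 2.80)   # structure 1
b_dna = screw_parameters(10, 1, 3.37)   # structure 4
c_dna = screw_parameters(28, 3, 9.24)   # structure 7
z_dna = screw_parameters(6, -1, 4.35)   # structure 15, dinucleotide repeat
```

With these inputs the twists come out at about 32.7°, 36.0°, 38.6°, and -60.0°, which agree with the local twists listed for the corresponding smooth helices and with the sixfold left-handed helix of dinucleotides described earlier.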
Table 1.3. Conformation angles. Conformation angles (°) in the nucleotide backbone (α to ζ), about the glycosyl bond (χ), and the endocyclic conformation angles in the sugar ring (ν0 to ν4). For the definition of the angles see text. For the references see Table 1.1
Structure Chain Nucleotide α 1 2
N A U Al T2 C3 G4 G5 A6 A7 T8 G9 G10 T11
3
4 5 6
N C
1C
2G
7 8 9
1G 2A
10 11
1A 2C
12 13
G
l C2 C3 C4 C5 6 G7 G8 G9 G10 N N l G2 T3 4 C5 C6 A G l G2 3 T4 N C
-52 -58 -58 -71 -67 -66 -68 -66 -69 -68 -65 -68 -68 -68 -30 -30 -66 -59 -64 -50 -46 -75 -32 -61 -44 -84 -47 -37 -65 -57 -62 -63 -64 -69 -61 -63 -83 -90 -50 -82 -64 -59 -51
B
y
S
e
C
X
Vo
V1
"2
V3
V4
175 42 79 -148 -75 -157 8 -3 4 44 -40 2 1 176 47 81 -149 -75 -157 2 -2 6 39 -38 2 3 173 48 77 -150 -72 -156 4 -3 0 43 -41 2 3 -177 -50 -79 -154 -64 -152 -1 -24 -38 -40 2 6 176 56 76 -161 -68 -153 -5 -22 -38 -42 3 0 -174 46 80 -151 -70 -150 3 -2 7 39 -39 2 3 174 58 78 -156 -73 -156 -1 -25 39 -40 2 6 178 52 83 -152 -65 -157 5 -2 8 38 -36 2 0 -175 50 81 -153 -67 -152 0 -2 4 37 -38 2 4 180 52 78 -155 -69 -150 0 -2 5 39 -40 2 5 180 51 80 -154 -66 -151 0 -2 4 38 -39 2 4 173 59 79 -160 -69 -156 -5 -2 0 35 -39 2 8 -178 52 82 -146 -70 -155 -1 -22 36 -37 2 4 175 55 80 -160 -67 -153 -6 -19 35 -39 2 9 136 31 143 -141 -161 -98 -33 4 5 -40 23 6 126 47 143 -85 -169 -97 -35 5 3 -48 30 3 145 22 147 -156 -158 -74 -16 3 7 -42 34 -1 2 173 51 137 -92 172 -105 -40 4 7 -36 15 1 6 128 41 125 -163 -102 -110 -38 3 7 -23 2 2 2 173 39 143 -150 171 -90 -33 4 4 -38 20 8 120 68 127 -151 -106 -134 -36 3 7 -24 4 2 0 -173 52 140 -173 -102 -101 -29 3 9 -33 17 8 174 39 153 -175 -95 -108 -6 2 5 -34 31 -1 6 180 43 144 -145 -136 -103 -30 4 2 -37 20 6 154 32 146 -72 158 -98 -31 4 4 -39 23 5 158 27 153 -161 -129 -95 -15 3 3 -38 30-10 152 47 141 -176 -147 -109 -33 4 2 -36 18 9 -160 37 157 161 -106 -97 -4 2 5 -35 33 -1 9 134 43 145 -100 -179 -84 -24 3 9 -38 26 - 2 149 55 147 -148 -151 -96 -11 2 8 -33 27-11 143 67 144 -166 -108 -104 -9 -3 0 25 -10 168 57 138 -141 -158 -100 -25 3 6 -32 18 4 124 44 141 -95 170 -95 -32 4 5 -39 22 6 125 51 133 -96 -152 -113 -34 4 1 -32 14 1 3 128 45 139 -96 174 -81 -33 4 4 -38 20 8 110 98 146 -148 -89 -137 -23 3 7 -36 23 0 -173 39 146 -166 -155 -88 -23 3 7 -36 23 0 147 88 147 -153 -164 -108 -31 4 5 -41 24 4 153 48 147 -141 -113 -122 -13 2 9 -33 26 - 8 131 62 147 -162 -87 -113 -23 3 8 -38 25 - 2 173 36 131 -95 145 -91 -44 4 8 -35 11 2 1 156 64 145 -163 -131 -l02 -13 3 6 -42 36 -1 5 140 61 146 -128 -141 -115 -28 4 1 -40 25 2
Table 1.3. Continued
Structure Chain Nucleotide α I -7
14
15 16 17 18 19
1 2
20 21 22 23 24 25 26 27 28 2 29 30 31
32 33
1 1 2 1 2 3 1 2 3 1 2 3 1 2
34
3 1
Al T2 A3 T4 A5 T6 G C A T R Y A T A T A U N N G C I C A U X X U U U U C I C T A T U A U U A U 1 -
B 6 14
26
y
8
e
£
\
v0
v1
V2 v3 v4
8 14 8 -15 2 -15 4 -10 5 -2 8 4 3 -4 2 2 7 0
138 78 141 -169 -104 -118 -10 30 -37 32 -15 178 74 140 -96 -160 -103 -23 42 -42 30 -5 124 85 143 172 -101 -101 -10 31 -38 33 -15 -176 74 138 -114 -160 -105 -20 38 -38 25 -5 126 81 141 -179 -96 -99 -11 30 -36 31 -13 171 68 130 -152 -149 -111 -31 43 -37 20 7 179-174 95 -104 -65 59 -4-11 21 -24 17 -137 51 138 -97 82 -154 -28 36 -31 16 7 -175 -179 93 -107 -61 61 -8 -9 21 -25 21 -137 49 133 -98 79 -149 -32 37 -29 11 13 -162 180 76 171 125 26 -3 -24 39 -42 29 -133 -139 147 -84 98 167 -37 53 -48 29 5 127 35 137 -127 -166 -107 -39 49 -42 21 11 138 46 133 -144 -148 -111 -44 50 -37 13 19 135 43 136 -135 -156 -113 -49 56 -42 16 20 147 40 143 -146 -147 -116 -38 49 -40 20 11 179 55 82 -154 -71 -161 2 -25 37 -37 22 178 51 83 -152 -173 -161 2 -25 37 -37 22 177 61 77 -153 -70 -163 -3 -23 38 -42 28 -153 48 83 180 -50 -155 3 -26 37 -36 21 176 46 83 -148 -78 -167 3 -26 37 -36 21 178 47 83 -148 -78 -167 3 -26 37 -36 21 180 63 134 -169 -106 -119 -32 37 10 14 169 72 86 -146 -75 -155 4 -25 35 -34 19 -176 51 130 -174 -101 -121 -36 38 -27 7 18 180 60 84 -153 -72 -160 8 -30 39 -36 17 171 63 87 -142 -80 -156 7 -27 36 -33 17 -179 51 80 -153 -70 -163 3 -27 39 -38 22 169 41 77 -147 -76 -157 0 -26 40 -41 26 172 37 80 -148 -77 -156 3 -28 41 -40 24 172 37 77 -146 -77 -162 -3 -24 39 -41 28 163 41 80 -148 -80 -149 4 -29 42 -40 23 176 51 83 -155 -71 -158 3 -26 37 -36 21 172 72 83 -151 -73 -157 3 -26 37 -36 21 178 54 83 -153 -72 -156 3 -26 37 -36 21 131 28 135 -114 -162 -117 -46 53 -42 17 18 155 41 127 -158 -128 -113 -49 51 -34 6 27 154 28 135 -149 -135 -111 -43 49 -36 12 19 177 62 83 -146 -78 -166 5 -30 41 -37 21 -167 74 138 -174 -110 -123 -42 50 -39 14 19 -178 26 132 -170 -101 -131 -39 44 -31 7 21 171 23 83 -156 -75 -154 3 -26 37 -36 21 -66 -179 53 83 -163 -67 -149 3 -26 37 -36 21 -40 167 37 83 -149 -83 -156 3 -26 37 -36 21 4 0 131 52 80 -120 -114 -173 4 -28 40 -38 22 -64 -84 -103 -78 -38 -72 52 -140 58 -139 82 -60 -36 -40 -42 -43 -69 -64 -70 -85 -58 -60 -81 -82 -69 -74 -75 -66 -46 -45 -43 -43 -61 
-82 -65 -48 -40 -38 -74 -99 -44 -28
Table 1.3. Continued
Structure Chain Nucleotide
2 3 35
36 37
1
2 38
1
39
1
2
2 40
1
2 41
A I I
C A U A T Al I2 C3 T4 Al 12 C3 T4 Al A2 T3 T4
a
B
y
-74 -75 -103
179 178 176
-78 -53 -58 -45 -41 -74 -53 -47 -43 -44 -78 -46 -71 -44 -44 -45 -56
173 137 146 128 136
63 82 -160 -68 64 82 -155 -72 92 83 -156 -69 64 83 -125 -67 49 136 -133 -150 66 122 -157 -120 37 139 -119 -170 38 141 -133 -160 70 139 -174 -124 53 127 -169 -105 58 138 -162 -97 28 131 -175 -139 25 144 -150 -137 58 149 -106 -173 32 138 -160 -128 75 143 -134 -148 24 129 176 -132 59-140 -163 -98 27 143 -166 -135
177
165 161 164 161 132 168 135 125 163 172 172
8
50 159
e
£
x -168 -163 -169 -161 -117 -127 -109 -115 -126 -123 -134 -124 -111 -109 -108 -101 -101 -115
vn
1 3 3 3
-51 -45 -48 -42 -28 -44 -44 -29 -40 -39 -46 -46 -38 -17 -98 -37 -96 168 -96 -22
v1
v0
v3
-23 -26 -26 -26 56 46 56 52 36 45 46 36 52 54 57 54 40 28 48 44
36 38 37 37 -40 -28 -43 -42 -31 -29 -29 -30 -44 -47 -45 -41 -28 -28 -41 -47
-35 -36 -36 -36 13 3 19 19 16 4 5 17 22 27 21 17 7 18 21 36
v4 22 21 21 21 23 26 17 14 8 25 22 9 11 7 15 17 19 -1 10 -9
9.3 Commentary
What is evident from the 41 structures listed in Table 1.1 is the wide coverage of polynucleotide helices that is provided by fibre diffraction analysis. Most of them are Watson-Crick paired duplexes, but not all; some base sequences from native material are, in effect, random, but some are special in the extreme: homopolymers, for example.
The diverse crystal structures in Table 1.2 attest to another important consideration and that is the range of environments inhabited by the different molecular helices. Fibres of polynucleotides, like single crystals of oligonucleotides, are awash with water, some of it firmly bound and contributing to the diffraction in a cooperative, crystal-like fashion, but a great deal of it more indifferently distributed from cell to cell in a more liquid-like fashion. The point is that the polynucleotide helices examined in fibres are not only unperturbed by the end-effects that have to be suspected in crystals of polynucleotide fragments, but are unlikely to be perturbed by lateral packing effects in their spacious fibrous environments. In these environments microcrystallinity is an option for molecular packing but is not obligatory; in many of the fibres the constituent molecules are merely uniaxially oriented. In studies of oligonucleotide crystals there has been selection for only those conformations that have ended up in crystals.
Beneath the diversity of structure apparent in Tables 1.1 and 1.2 there is the much more conservative framework indicating that all the nucleotides belong to one of two major genera, A or B, and a very few belong to a third genus Z. The very common A
Table 1.4. Base pair orientations and helical twists. Base pair positions, orientations and helical twists in the Watson-Crick base-paired duplexes. For definition of symbols see text. For the references see Table 1.1
Structure Nucleotide 1
2
3
4 5 6
7 8 9
10 11 12 13 14
15
N A U Al T2 C3 G4 G5 A6 A7 T8 A9 A10 T11 N C G Cl C2 C3 C4 C5 N N Gl
G2 T3 A G Al G2 N C I Al T2 A3 T4 A5 T6 G C
d (nm)
s (nm)
o
0.48 0.46 0.46 0.48 0.43 0.47 0.47 0.51 0.51 0.47 0.47 0.43 0.48 0.48 -0.02 -0.05 -0.05 -0.06 0.02 -0.09 -0.02 0.07 -0.05
0.00 0.00 0.00 0.03 -0.01 -0.03 -0.04 -0.01 0.01 0.04 0.03 0.01 -0.03 0.00 0.00 -0.01 0.01 -0.03 -0.01 0.00 0.00 0.00 0.00 0.00 0.02 0.08 0.00 0.01 -0.01 0.09 0.04 0.00 -0.02 0.02 -0.09 -0.22 -0.01 0.01 0.23 0.09 -0.26 0.26
-10.5 -11.9 -11.9 -13.0 -11.0 -10.6 -10.2 -13.2 -13.2 -10.2 -10.6 -11.0 -13.0 -9.0 -15.1 -14.2 -14.2 -1.8 -23.9 5.6 -18.6 -17.3 -1.8 -18.5 -19.8 -28.6 -12.1 -17.3 -17.3 -5.3 -23.7 -21.0 -16.9 -16.9 -14.5 -27.3 -21.4 -21 .4 -27.3 -14.5 8.3 8.3
-0.29 -0.21 -0.21 -0.27 -0.18 -0.18 -0.25 -0.14 -0.17 -0.19 -0.19 -0.39 -0.23 -0.09 -0.09 -0.23 -0.39 -0.30 -0.30
y1
72
23.2 23.2 22.4 23.2 23.2 22.4 22.5 24.1 25.2 25.4 25.4 25.6 25..4 26.1 25.0 25.0 25.0 24.1 26.1 25.4 25.6 25.4 25.4 25.2 24.0 22.5 22.8 22.8 8.1 8.1 9.1 9.2 9.1 9.2 1.3 2.3 18.4 5.8 6.1 11.0 17.5 1.1 11.3 6.3 8.2 8.2 11.2 11.2 12.2 12.1 16.4 18.6 9.1 12.7 20.9 11.0 11.0 20.9 11.0 10.1 16.8 25.4 16.6 16.6 17.8 16.3 16.3 17.8 20.1 18.0 19.1 22.7 23.3 20.0 20.0 23.3 22.7 19.1 18.0 20.1 6.9 1.4 1.4 6.9
0 22.6 22.0 22.0 22.2 24.7 25.0 25.2 24.1 25.0 25.2 25.0 24.7 22.2 22.4 2.8 5.7 5.7 1.3 1.6 4.1 -0.4 -1.5 -8.2 -6.4 7.1 10.2 8.7 -11.0 -11.0 -10.1 -16.3 -13.0 -14.8 -14.8 -17.5 -15.8 -18.7 -18.7 -15.8 -17.5 0.1 0.1
(o)o
R
t
0.0 0.0 32.7 -1.6 3.1 32.7 1.6 -3.1 31.8 -2.8 2.8 33.8 -0.4 -2.4 31.3 -0.4 0.0 32.4 -1.7 1.3 31.6 0.1 -1.8 33.7 -0.1 0.2 34.5 1.7 -1.8 33.7 0.4 1.3 31.6 0.0 32.6 0.4 2.8 -2.4 31.3 0.0 2.8 33.8 0.0 0.0 36.0 0.0 29.5 0.0 0.0 42.5 0.0 -1.0 3.5 35.7 6.4 -7.3 37.1 36.5 7.4 -1.0 8.2 -0.8 34.2 2.5 5.7 36.5 0.0 38.6 0.0 0.0 0.0 40.0 -0.1 3.4 39.2 42.9 1.4 -1.5 37.9 3.3 -1.9 42.4 9.1 -18.2 18.2 37.6 -9.1 46.0 1.8 -9.3 9.3 34.0 -7.5 0.0 45.0 0.0 44.8 1.5 -3.0 -1.5 3.0 45.2 49.6 2.8 -5.6 -2.8 5.6 36.5 54.4 3.3 -6.1 6.6 38.7 -3.3 54.4 2.8 -6.1 5.6 36.5 -2.8 2.8 -5.6 -10.7 -2.8 5.6 -49.3
Table 1.4. Continued
Structure Nucleotide
16
17 18 19 20 21 22 23 24 25 30 31 32 33 34 37 38 39 40 41
A T R Y A A A I A G I A I A A A A A A A I A I Al A2 T3 T4
0P
d (nm)
s (nm)
(°)
Ti O
(°)
-0..25 -0,.25 0..12 0.,12 0. 03 0. 08 0.,44 0,,51 0,,50 0.,51 0. 25 0.,38 0,,33 0.31 0.,48 0.,39 0 .25 0 .09 0,.03 0,,15 0,.07 0..11 0.,11 -0,,03 0..02 0..02 -0..03
-0.25 0.25 -0.18 0.18 -0.03 -0.01 0.01 0.00 0.00 0.00 0.00 -0.04 0.04 0.02 0.00 0.07 0.00 0.01 0.01 0.04 0.04 -0.07 0.04 0.00 0.01 -0.01 0.00
-8.6 -8.6 -21.1 -21.1 -22.0 -15.1 -2.1 2.3 10.5 16.1 -14.3 -4.3 -13.3 -5.5 -1.0 -8.8 -10.1 -21.5 -12.0 -12.3 -5.4 -19.8 -22.3 -8.9 -20.4 -20.4 -8.9
1.7 7.8 6.2 16.9 8.0 12.9 15.6 10.6 20.2 18.0 14.9 12.7 12.9 17.9 17.2 13.4 6.9 11.2 10.6 12.1 12.8 2.0 4.6 6.7 6.3 15.4 15.4
7.8 1.7 16.9 6.2 16.1 6.9 15.5 10.6 20.2 18.0 19.5 13.3 8.8 13.1 17.2 10.6 5.1 12.1 8.3 9.6 9.5 21.7 17.7 15.4 15.4 6.3 6.7
y2
0T
n
0R
(°)
A0R
n
t (°)
-1.5 3.4 -6.8 -8.1 -1.5 -3.4 6.8 -43.4 -4.0 5.8 -11.7 12.0 -4.0 -5.8 11.7 -12.0 -4.7 -4.5 0.0 36.0 -5.9 4.0 0.0 36.0 -15.5 -0.8 0.0 32.7 -10.6 0.0 0.0 30.0 19.5 0.0 0.0 32.7 16.1 0.0 0.0 32.0 14.9 -5.4 0.0 36.0 12.7 -2.0 0.0 32.7 8.2 4.0 0.0 32.7 7.3 13.5 9.7 30.0 17.2 -0.1 6.0 32.0 10.6 5.1 0.0 30.0 3.1 0.4 0.0 30.0 -4.4 -0.5 0.0 36.0 7.2 -1.8 0.0 36.0 -8.8 2.2 4.6 35.0 -8.5 6.8 -4.6 37.0 0.5 -11.8 5.3 31.0 0.7 -6.5 -5.3 40.0 -1.8 10.9 -21.7 37.0 3.4 4.8 6.0 39.0 3.4 -4.8 9.7 27.0 -1.8 -10.9 6.0 39.0
and B conformations each aggregate into right-handed helices; only the Z conformations aggregate into left-handed helices.
The diagnostic conformational difference between A and B was long ago identified as the sugar ring pucker which is C3'-endo in A structures and C2'-endo in B. This translates into there being a very different set of endocyclic conformation angles (ν0,...,ν4) for the furanose rings [cf. (A-DNA) structure 1 and (B-DNA) structure 4, in Table 1.3]. More simply, one can use δ (which is equivalent to ν3), which is of the order of 80° in A and 140° in B. Associated with the different furanose conformations are different values for χ, the glycosidic conformation, which has a (60°) greater magnitude in A than in B. Other local nucleotide conformational differences between A and B are evident in ζ (O3'-P) which in A is invariably g- with a mean value of -80° but in B can be g- or t with a mean value of -120° but a wide range. The neighbouring conformation angle, ε (C3'-O3'), is t in A (mean value -160°) but can be t or
Table 1.5. Groove dimensions. Dimensions of major and minor grooves in 'smooth' Watson-Crick base-paired duplexes, i.e. those in which all the nucleotides are assumed to have the same conformations
Structure   Major width (nm)   Major depth (nm)   Minor width (nm)   Minor depth (nm)
1   0.22   1.30   1.11   0.26
4   1.16   0.85   0.60   0.82
7   1.05   0.76   0.48   0.79
8   1.10   0.50   0.35   0.72
12   0.96   0.62   0.08   0.74
18   1.38   0.90   0.29   0.71
19   1.41   0.96   0.28   0.70
20   0.47   1.29   1.08   0.33
21   0.89   1.44   1.05   0.34
22   0.27   1.36   1.09   0.29
23   0.46   1.36   1.11   0.25
24   0.65   1.17   0.95   0.53
25   0.87   1.31   0.93   0.45
26   0.07   1.14   1.38   0.39
27   0.08   1.36   1.24   0.28
28   0.36   1.27   1.00   0.24
30   0.79   1.22   1.02   0.44
31   1.57   1.39   0.62   0.67
32   0.85   1.43   0.95   0.39
33   0.85   1.27   1.08   0.41
34   0.87   1.28   1.26   0.55
37   1.38   0.96   0.30   0.72
38   1.40   0.89   0.27   0.71
Table 1.6. Phosphate group orientations. Phosphate group orientations relative to helix-axis in fibrous polynucleotide structures
Structure   Nucleotide   θ1 (°)   θ2 (°)   θ3 (°)   θ4 (°)
1   N   145   89   61   140
2   A   140   95   66   144
2   U   139   96   67   143
3   A1   132   105   76   151
3   T2   136   99   71   147
3   C3   137   101   71   151
3   G4   140   94   66   142
3   G5   142   96   66   147
3   A6   136   103   73   154
3   A7   135   101   72   148
3   T8   138   98   69   147
3   G9   137   99   71   147
Table 1.6. Continued
Structure Chain
4 5 6
1
2
7
8 9
1 2
10 11
1 2
12 13 14
Nucleotide
G10 T11 N C G C1 C2 C3 C4 C5 G6 G7 G8 G9 G10 N N G1 G2 T3 A4 C5 C6 A G Al G2 C3 T4 N C I Al T2
15 16 17 18 19
3
1
A3 T4 A5 T6 G C A T R Y A T A
03
0
(°)
134 130 59 67 18 82 25 73 67 59 95 75 66 18 70 93 18 51 53 64 19 21 20 66 47 52 65 23 64 62 56 36 62 64 14 73 22
106 104 139 128 124 115 136 117 135 133 128 125 118 131 125 123 118 133 136 121 115 128 108 121 117 112 143 137 101 133 135 122 131 129 109 126 122
69
117
28 109 27 118 42 90 48 54 50
95 36 99 38 140 101 140 135 136
04
n
o
76 76 138 126 153 109 163 116 130 135 108 119 121 162 123 107 147 141 141 125 143 156 138 123 132 126 136 167 112 132 138 143 132 129 140 121 149 118 125 48 128 42 153 96 148 140 143
156 147 103 102 68 105 80 98 108 100 131 107 93 74 102 124 63 93 96 94 60 73 54 96 78 77 111 79 76 102 98 75 100 101 52 106 68 96 41 63 46 73 91 101 95 97 94
Table 1.6. Continued Structure
Chain
A
20
U N N
21 22 23 24 25 26 27 28 29 30 31 32 33 34
1
2 1 2 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
35 36 37
1
38
2 1
39
2 1
40
1 2
41
Nucleotide
G C I C A U X X U U U U C
I
C T A T U A U U A U I A I I C A U A T Al 12 C3 T4 Al 12 C3 T4 Al A2 T3
T4
fli (°) 137 137 135 135 144 144 92 124 95 136 135 141 140 141 147 130 130 123 135 57 73 83 143 102 100 133 125 134 117 124 131 117 141 54 69 43 59 81 73 93 72 68 37 47 64 30 90 66 80
02
03
(°)
(°)
(
99 97 99 108 89 89 130 106 128 99 97 96 92 91 85 100 101 108 97 136 131 128 92 130 123 94 108 92 91 107 101 116 91 131 131 136 138 133 127 124 127 132 128 127 133 142 127 124 119
70 69 71 76 61 61 111 80 108 70 70 67 65 63 57 74 75 82 70 138 124 116 63 105 103 69 81 67 75 81 74 89 63 137 127 149 138 120 122 108 123 128 147 139 131 168 111 125 113
146 144 144 162 139 139 129 142 132 145 141 147 139 139 135 140 142 144 142 100 110 118 141 143 133 135 146 134 117 144 143 141 138 94 106 89 102 119 107 125 106 106 80 86 103 86 125 99 106
04
Table 1.7. Comparison of helix parameters for A-, B-, and Z-DNA derived from crystal structures (from ref. 26)
1. Base step parameters
Helix
Step
Roll A0R (° )
B A Z
NpN NpN RpY YpR
0.6 6.3 5.8 -5.8
Tilt (°)
Cup (°)
Slide (nm)
Twist f (° )
Rise h (nm)
Rad(P) (nm)
0.0
36.1 31.1 -50.6 -9.4
0.34 0.29 0.35 0.39
0.94 0.95 0.73 0.63
X d (nm)
s (nm)
y
P-P (nm)
0.0 0.0 0.0
-12.5 12.5
0.04 -0.16 -0.11 0.54
0R(°)
Tip
Incl 0T(°)
Prop 0p (° )
Buck (°)
0.0 11.0 -2.9 2.9
2.4 12.0 -6.2 -6.2
-11.1 -8.3 -1.3 -1.3
-0.2 -2.4 6.2 -6.2
-
2. Base pair parameters
Helix
Base
B A Z
N N R Y
0.08 0.41 0.30 0.30
0.01 0.23 -0.23
0.88-1.40 0.12-1.59 0.77 1.37
3. Main chain conformation angles (°)
Helix   Nucleotide   α   β   γ   δ   ε   ζ   χ   Sugar
B   N   -65   167   51   129   -157   -120   -103   2'-endo
A   N   -73   173   64   78   -151   -77   -165   3'-endo
Z   R   48   179   -170   100   -104   -69   67   3'-endo
Z   Y   -137   -139   55   138   -94   80   -159   2'-endo
g- in B, producing, therefore, a mean value lower in magnitude. The effect of changes in ε and ζ is to alter greatly the orientation of the phosphate groups. The 'wrinkles' on many B helices usually take the form of altered phosphate orientations and therefore distinctly different values for β and/or ζ (cf. structures 4 and 5, and structures 12, 13, and 14).
The two Z structures (15 and 16) emerge from alternating purine-pyrimidine (RY) poly(dinucleotides). The R nucleotide is not only C3'-endo but the glycosidic angle is syn (+60°) rather than the usual anti (- 0 to -160°). The Y nucleotide is C2'-endo with an anti glycosidic conformation. All the other conformational angles in the two nucleotides are different also.
When one examines the summary of oligonucleotide structures in Table 1.7, what is depressing or reassuring, depending on one's vantage point, is that nothing dramatically new has been revealed about the common conformations of nucleotides in helices. Much has been made of the fine morphological differences from one structure to another. Older terms such as tilt and twist have been given new meanings and now incline and propeller, and bases and base pairs also roll, cup, slide, tip, and buckle as, up to a point, they must in lower symmetry arrangements. Nevertheless, it is hard not to conclude that only one revelation of significance has emerged from oligonucleotide crystal studies, i.e. the possibility of the existence of Z structures in polynucleotides containing some alternating purine and pyrimidine nucleotides.
Acknowledgements
The presidency of Scotland's oldest university, St Andrews, founded 1410-1413 AD, is not the best vantage point from which to write a review of polynucleotide structure, even an historical one. I am, therefore, very indebted to my long-time friend and former colleague Professor R. Chandrasekaran of Purdue University for keeping me aware of recent developments. His own contributions to polynucleotide structure determinations are substantial and the extent to which he has inherited the Wilkins' tradition of meticulous fibre diffraction studies is amply illustrated by the many useful fibre structures of polysaccharides as well as polynucleotides that are emerging from his laboratory.
2 Base and base pair morphologies, helical parameters, and definitions
Richard Lavery and Krystyna Zakrzewska
Laboratoire de Biochimie Théorique, CNRS UPR 9080, Institut de Biologie Physico-Chimique, 13, Rue Pierre et Marie Curie, Paris 75005, France
1. Introduction
As time passes, the complexity of nucleic acid structure and conformation continues to increase rapidly. The beautifully regular double helix of Watson and Crick has lost its symmetry with the appearance of major base sequence effects and local perturbations caused by base modifications, mispairing, bulges, and abasic sites. In addition, larger scale deformations such as curvature and groove width variations have come to light and are particularly important for understanding drug and protein binding. The standard duplex has also been joined by an ever-growing collection of new structures, including triple and quadruple helices, parallel duplexes, mutually intercalated duplexes, stem loops, and three- and four-branch junctions. This growth has also been fuelled by rapid progress involving RNAs, which has revealed a host of complex tertiary conformations, and also by the creation of novel oligonucleotides destined to bind to specific DNA or RNA targets as part of the antigene and antisense strategies for artificial genetic control.
This increase in complexity requires a parallel effort in developing the means for describing and analysing the new structures. This need exists on several different levels: in classifying the basic elements of the structures (strand direction, pairing schemes, etc.), in describing the detail of conformation (notably, to enable structures to be compared in a quantitative way), and in dealing with data including conformational dynamics (such as the trajectories generated in increasingly realistic MD simulations). This chapter attempts to summarize the present state of affairs in each of these areas and to point out the difficulties that still exist.
2. Nucleic acid bases
The standard nucleic acid bases are illustrated in Fig. 2.1. In the case of DNA, they comprise two purines (abbreviated Pur or R), adenine (Ade or A) and guanine (Gua or G), each containing two fused rings with five and six atoms, respectively, and two pyrimidines (Pyr or Y), thymine (Thy or T) and cytosine (Cyt or C), each containing a single six-atom ring. Within RNA, thymine is replaced by uracil (Ura or U), which differs only in the lack of a methyl group at position 5. The figure shows the standard notation of the base atoms, and their geometries are listed in Table 2.1 (which, for
Table 2.1. Standard base geometries [taken from the most recent fibre coordinates for canonical B-DNA (53)]. For reference, backbone bond lengths have also been included.

(a) Bond lengths (Å)

Adenine
N1-C2     1.332    N1-C6     1.346    C2-N3     1.315    C2-H2     1.000
N3-C4     1.349    C4-C5     1.365    C4-N9     1.370    C5-C6     1.404
C5-N7     1.388    C6-N6     1.341    N6-H61    1.000    N6-H62    1.000
N7-C8     1.297    C8-N9     1.366    C8-H8     1.000

Guanine
N1-C2     1.381    N1-C6     1.402    N1-H1     1.000    C2-N2     1.335
C2-N3     1.331    N2-H21    1.000    N2-H22    1.000    N3-C4     1.359
C4-C5     1.375    C4-N9     1.378    C5-C6     1.419    C5-N7     1.394
C6-O6     1.228    N7-C8     1.311    C8-N9     1.378    C8-H8     1.000

Thymine
N1-C2     1.374    N1-C6     1.370    C2-O2     1.219    C2-N3     1.381
N3-C4     1.380    N3-H3     1.000    C4-O4     1.233    C4-C5     1.444
C5-C6     1.343    C5-C7     1.500    C6-H6     1.000    C7-H71    1.090
C7-H72    1.090    C7-H73    1.090

Cytosine
N1-C2     1.392    N1-C6     1.360    C2-O2     1.237    C2-N3     1.358
N3-C4     1.339    C4-N4     1.324    C4-C5     1.433    N4-H41    1.000
N4-H42    1.000    C5-C6     1.357    C5-H5     1.000    C6-H6     1.000

Backbone
P-O1'     1.480    P-O2'     1.480    P-O3'     1.600    P-O5'     1.600
O3'-C3'   1.422    O5'-C5'   1.440    C1'-C2'   1.525    C1'-O4'   1.419
C1'-H1'   1.090    C1'-N9    1.490    C2'-C3'   1.529    C2'-H2''  1.090
C2'-H2'   1.090    C3'-C4'   1.529    C3'-H3'   1.090    C4'-C5'   1.516
C4'-O4'   1.457    C4'-H4'   1.091    C5'-H5'   1.090    C5'-H5''  1.090

(b) Bond angles (°)

Adenine
C2'-C1'-N9    113.71    O4'-C1'-N9    108.11    H1'-C1'-N9    109.46
C2-N1-C6      118.83    N1-C2-N3      129.18    N1-C2-H2      115.41
N3-C2-H2      115.42    C2-N3-C4      110.82    N3-C4-C5      126.69
N3-C4-N9      127.20    C5-C4-N9      106.11    C4-C5-C6      117.11
C4-C5-N7      110.48    C6-C5-N7      132.41    N1-C6-C5      117.38
N1-C6-N6      119.12    C5-C6-N6      123.50    C6-N6-H61     119.99
C6-N6-H62     120.02    H61-N6-H62    119.99    C5-N7-C8      103.97
N7-C8-N9      113.83    N7-C8-H8      123.06    N9-C8-H8      123.11
C1'-N9-C4     126.00    C1'-N9-C8     128.39    C4-N9-C8      105.60

Guanine
C2'-C1'-N9    113.71    O4'-C1'-N9    108.11    H1'-C1'-N9    109.46
C2-N1-C6      125.23    C2-N1-H1      117.37    C6-N1-H1      117.39
N1-C2-N2      116.05    N1-C2-N3      123.30    N2-C2-N3      120.65
C2-N2-H21     119.99    C2-N2-H22     120.01    H21-N2-H22    120.00
C2-N3-C4      112.25    N3-C4-C5      128.51    N3-C4-N9      125.07
C5-C4-N9      106.43    C4-C5-C6      119.31    C4-C5-N7      110.61
C6-C5-N7      130.07    N1-C6-C5      111.40    N1-C6-O6      119.80
C5-C6-O6      128.80    C5-N7-C8      103.75    N7-C8-N9      113.99
N7-C8-H8      123.01    N9-C8-H8      122.99    C1'-N9-C4     125.60
C1'-N9-C8     129.18    C4-N9-C8      105.22

Thymine
C2'-C1'-N1    113.71    O4'-C1'-N1    108.11    H1'-C1'-N1    109.46
C1'-N1-C2     117.09    C1'-N1-C6     120.84    C2-N1-C6      122.07
N1-C2-O2      122.93    N1-C2-N3      115.43    O2-C2-N3      121.64
C2-N3-C4      126.40    C2-N3-H3      116.78    C4-N3-H3      116.82
N3-C4-O4      120.55    N3-C4-C5      114.09    O4-C4-C5      125.36
C4-C5-C6      120.75    C4-C5-C7      117.53    C6-C5-C7      121.72
N1-C6-C5      121.26    N1-C6-H6      119.39    C5-C6-H6      119.35
C5-C7-H71     109.49    C5-C7-H72     109.45    C5-C7-H73     109.45
H71-C7-H72    109.49    H71-C7-H73    109.50    H72-C7-H73    109.45

Cytosine
C2'-C1'-N1    113.71    O4'-C1'-N1    108.10    H1'-C1'-N1    109.46
C1'-N1-C2     117.80    C1'-N1-C6     121.05    C2-N1-C6      121.15
N1-C2-O2      118.85    N1-C2-N3      118.70    O2-C2-N3      122.45
C2-N3-C4      120.63    N3-C4-N4      118.32    N3-C4-C5      121.55
N4-C4-C5      120.13    C4-N4-H41     120.01    C4-N4-H42     119.99
H41-N4-H42    120.00    C4-C5-C6      116.89    C4-C5-H5      121.56
C6-C5-H5      121.55    N1-C6-C5      121.08    N1-C6-H6      119.48
C5-C6-H6      119.44
Amino and methyl hydrogens are named by adding 1, 2, or 3 to the parent atom number; thus, G(N2) carries the hydrogens H21 and H22. The methyl group of thymine is numbered C7.
completeness, also provides the bond lengths within the phosphodiester backbone). Since all the bases contain conjugated rings, their most stable conformations are planar. They can, nevertheless, undergo non-planar deformations as a result of thermal agitation, steric strain, or the presence of other species.
In addition to these standard bases many others are found within nucleic acids. These may occur naturally, as in the case of RNAs that contain both modified bases (m2G, m7G, m1A, m5C, wybutine, etc.) and unconventional linkages (e.g. pseudouracil) (1). Other unusual bases are the result of chemical modifications (see below), while still others are deliberately introduced into oligonucleotides with specific goals in mind. This is the case for efforts aimed at generating so-called 'universal' bases which could recognize more than one paired partner and thus be very useful in designing antisense or antigene oligonucleotides (2). The reader is referred to Wolfram Saenger's book on nucleic acids for an overview of modified base structures (1).
Amongst the various chemical modifications that the bases can undergo, protonation and methylation merit consideration. Protonation occurs most readily at C(N3), A(N1), G(N7), and T(O4). Such changes considerably modify the charge distribution within the conjugated bases and also modify the pairing schemes they can adopt. A well-known example of such modifications involves cytosine, for which protonation
Fig. 2.1. Standard nucleic acid bases.
at N3 is a necessary step in the formation of G:C+ Hoogsteen pairing within triple helices (3) or the formation of the novel i motif, which is comprised of two mutually intercalated C:C+ parallel duplexes (4). The influence of the protonation on the base geometry is generally limited and local in nature, as shown by crystallographic (5) and quantum chemical studies (6).
Base methylation plays an important biological role, since it functions as a genetic control mechanism (7). The most prominent reaction occurs at C(C5), mainly in CpG sequences. The next most important site involves the external proton of A(N6). Methylation has an important effect on interactions with proteins and, for example, generally protects from endonucleases (although some enzymes of this class actually require methylated bases to function).
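The planarity of the bases noted above can be checked numerically against Table 2.1(b): in a planar ring of n atoms the interior angles sum to 180(n - 2)°, and the three angles around a planar trigonal atom sum to 360°. The following is a minimal sketch using the adenine values from the table; the dictionary layout and the function names are illustrative assumptions, not part of the original text.

```python
# Selected adenine bond angles (degrees) transcribed from Table 2.1(b).
ADENINE_ANGLES = {
    "C2-N1-C6": 118.83, "N1-C2-N3": 129.18, "C2-N3-C4": 110.82,
    "N3-C4-C5": 126.69, "C4-C5-C6": 117.11, "N1-C6-C5": 117.38,
    "C5-C4-N9": 106.11, "C4-C5-N7": 110.48, "C5-N7-C8": 103.97,
    "N7-C8-N9": 113.83, "C4-N9-C8": 105.60, "N3-C4-N9": 127.20,
}

# Interior angles of the six-membered and five-membered rings of adenine.
SIX_RING = ["C2-N1-C6", "N1-C2-N3", "C2-N3-C4",
            "N3-C4-C5", "C4-C5-C6", "N1-C6-C5"]
FIVE_RING = ["C5-C4-N9", "C4-C5-N7", "C5-N7-C8", "N7-C8-N9", "C4-N9-C8"]
# The three angles meeting at the ring-fusion atom C4.
AROUND_C4 = ["N3-C4-C5", "N3-C4-N9", "C5-C4-N9"]


def angle_sum(names):
    """Sum the tabulated angles with the given names."""
    return sum(ADENINE_ANGLES[name] for name in names)


def planar_ring_sum(n_atoms):
    """Interior-angle sum (degrees) of a planar polygon with n_atoms vertices."""
    return 180.0 * (n_atoms - 2)


if __name__ == "__main__":
    print("six-membered ring:", round(angle_sum(SIX_RING), 2))
    print("five-membered ring:", round(angle_sum(FIVE_RING), 2))
    print("around C4:", round(angle_sum(AROUND_C4), 2))
```

Sums that deviate from the planar values by more than the rounding of the table would point to a transcription error or to a genuinely non-planar geometry.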
3. Base pairing
Standard Watson-Crick base pairs are formed by specific recognition between a purine and a pyrimidine base: adenine with thymine (or uracil) and guanine with cytosine (Fig. 2.2). These combinations lead to virtually identical base pair geometries, as illustrated in Fig. 2.3. This identity was the basis of the realization that it is possible to build a regular double helix with an arbitrary base sequence and it was also the basis for understanding the replication of the genetic code. A:T pairs are maintained by two hydrogen bonds, while G:C pairs have three bonds. For isolated pairs this leads to stronger binding in the latter case (G:C -21 kcal/mol versus A:T -13 kcal/mol, measured in vacuum, ref. 8), and, in general, G:C pairs are less easily deformed or broken within DNA than A:T pairs. It should, however, be recalled that base pairing is much
Fig. 2.2. Schematic views of various types of base pairing.
Fig. 2.3. Standard Watson–Crick pairing geometries: A:T (top), G:C (bottom).
weaker in water than in vacuum [the values are unknown in water, since isolated bases prefer to stack rather than to pair, but G:C pairing in chloroform (9) is reduced to -5.8 kcal/mol].
It should be noted that canonical Watson-Crick base pairs involve bases in the keto and amino forms. In parallel with the appearance of the double helical model for DNA, Watson and Crick proposed that tautomerism from keto to enol and from amino to imino forms could be at the origin of the point mutations necessary to power evolution (10). As shown in Fig. 2.4, such tautomerism permits the formation of A:C and G:T pairs with overall geometries very close to those of the canonical base pairs. Much effort has since been put into attempts to demonstrate the importance of such mechanisms for mutagenesis (11). The present state of knowledge, however, suggests that point mutations occur most frequently as a result of the formation of G:T (Fig. 2.2) and A+:C wobble pairs rather than of tautomeric forms. This is supported by an increasing number of crystallographic structures containing mispairs (12; see also Chapter 10). It is also clear, today, that the flexibility of the double helix allows it to accommodate R:R and Y:Y pairs whose C1'-C1' separations (when the interaction involves the Watson-Crick faces) are, respectively, much wider (12.5 Å) and much narrower (8.4 Å) than those of the canonical base pairs (11 Å). In the case of R:R mispairing it is, however, also possible to diminish these steric constraints by changing to syn conformations (13,14), while NMR data show that Y:Y pairs can be extended by water bridging (15).
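The steric argument above depends only on whether each partner is a purine (R) or a pyrimidine (Y). As a small illustration, the sketch below classifies a pair and looks up the approximate C1'-C1' separation quoted in the text for pairing through the Watson-Crick faces; the function names and the lookup table are assumptions introduced here for illustration.

```python
# Approximate C1'-C1' separations (angstroms) quoted in the text for pairs
# interacting through their Watson-Crick faces.
C1_C1_SEPARATION = {"R:Y": 11.0, "R:R": 12.5, "Y:Y": 8.4}

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def base_kind(base):
    """Return 'R' for a purine and 'Y' for a pyrimidine."""
    if base in PURINES:
        return "R"
    if base in PYRIMIDINES:
        return "Y"
    raise ValueError(f"unknown base: {base!r}")


def pair_class(base1, base2):
    """Classify a pair as 'R:Y', 'R:R', or 'Y:Y', independent of order."""
    kinds = sorted(base_kind(b) for b in (base1, base2))
    return ":".join(kinds)


def expected_separation(base1, base2):
    """Approximate C1'-C1' distance for the pair, from the values above."""
    return C1_C1_SEPARATION[pair_class(base1, base2)]


if __name__ == "__main__":
    for pair in [("A", "T"), ("G", "A"), ("C", "T")]:
        print(pair, pair_class(*pair), expected_separation(*pair))
```

The values apply only to the unrelieved Watson-Crick-face geometry; syn conformations or water bridging, as described above, change the effective separation.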
Fig. 2.4. Base pairs formed by non-standard base tautomers. Left: A:C* and A*:C involving imino forms. Right: G:T* and G*:T involving enol forms.
Despite the primordial importance of the standard Watson-Crick pairs, the ways that bases can be assembled by hydrogen bonding are remarkably varied. As the range of known nucleic acid conformations has grown, more and more structures containing non-canonical base interactions have been generated (see examples in Fig. 2.2). The most important alternative pairings are probably the Hoogsteen and reversed Hoogsteen schemes, which occur notably within triple helices (see Chapter 12). These pairings involve either purines or pyrimidines interacting with the sites on purine bases that are not involved in Watson-Crick hydrogen bonding (N7 and O6 in the case of guanine, N7 and N6 in the case of adenine). This explains why Watson-Crick and Hoogsteen (or reversed Hoogsteen) pairing can coexist within a triplex (Fig. 2.5). It should be noted that Watson-Crick pairing can also be 'reversed' in the case of certain bases. This occurs notably in parallel-stranded DNA (16). Most unusual pairs are less stable than their canonical cousins, but they are often sterically advantageous, being adapted to narrow or wide strand separations or to particular backbone orientations. Examples of unusual pairings occur commonly within the complex folded structures of RNAs, within loops, within mispairs, and within chemically modified nucleic acids.
Lastly, with certain bases, it is also possible to form four-stranded (or quadruplex) structures, as shown in Fig. 2.6, involving G tetrads (17-19; see also Chapter 13). It is also interesting to note that two identical base pairs can form favourable interactions between their major groove faces (20). This type of interaction is the basis of a four-stranded structure that could play a role in homologous recombination (21).
It is possible to describe these multiple pairing schemes in an ordered way. The first important step in this direction came from the work of Rose et al. (22), who remarked that the nucleic acid bases have two distinct faces (because they have no twofold
Fig. 2.5. Schematic views of base triplets: T.A×T, C.G×C+, C.G×G, T.A×A. The dot signifies Watson-Crick pairing between the first two strands and the cross either Hoogsteen (TAT, CGC+) or reverse Hoogsteen pairing (CGG, TAA) between the first and third strands.
symmetry axis in their π-plane). This point, which became important when discussing the difference between B- and Z-DNA (23), led to the idea that these faces should be distinguished when describing pairing interactions. A unique definition can be made using the right-hand rule, with the fingers of the right hand pointing around the shortest distance from the glycosidic bond to the Watson-Crick pairing edge of the base. The direction of the thumb then indicates a unique face which we will conventionally colour white, the opposing face being black. To make a simplified diagram that shows not only base orientation, but also strand orientation, we draw a rectangle for the base (longer for purines than for pyrimidines), add a line representing the
Fig. 2.6. Schematic views of base quartets: (top) G4 with four identical reverse Hoogsteen pairings, and (bottom) (AT)2, where two Watson-Crick A:T pairs interact through their major groove faces.
glycosidic bond, and add a small circle representing the strand: a white circle means that the 5'→3' direction points upwards, while a black circle means that it points downwards. For canonical nucleotides, with an anti conformation around the glycosidic bond, a white circle will always accompany a white base face, whereas, for syn nucleotides, the circle and the base face will have opposite colours.
We can use this scheme to classify duplex structures fully (Fig. 2.7). Taking into account the Watson-Crick (W), reversed Watson-Crick (C), Hoogsteen (H), and reversed Hoogsteen (R) pairing schemes, two nucleotides may be paired in four different ways. Since, in addition, each nucleotide can be in one of two possible states, there are a total of 16 distinct combinations. In fact, as the figure shows, there are
Fig. 2.7. Diagrammatic representations of double-stranded nucleic acid helices: W, Watson-Crick; C, reversed Watson-Crick; R, reversed Hoogsteen; H, Hoogsteen.
actually only 14 unique classes because of two degeneracies created by the pseudo-dyad symmetry of W and C pairing (note that each base pair in Fig. 2.7 is oriented so that the left-hand base, or lower base, shows its white face). Each structural family can be defined by a notation consisting of a letter to specify the base pairing (W, C, H, or R), a prefix indicating whether the strand directions are parallel (+) or antiparallel (-), and a suffix specifying whether the left-hand (or lower) nucleotide is of type 'a' or type 's' (this index can be dropped in the case of the degenerate pairs +Wa/+Ws and -Ca/-Cs).
If we consider the classical B-DNA duplex (corresponding to the diagram in the top left-hand corner of Fig. 2.7), the combination of Watson-Crick pairing and anti nucleotides automatically leads to antiparallel strands. The same result can be obtained with reverse Hoogsteen pairing, which also has one white and one black base face exposed, but not with reversed Watson-Crick or Hoogsteen pairing. This result points to the utility of such a scheme. It links together three factors: the type of pairing, the anti/syn conformation of the nucleotides, and the strand directions. This can be very useful in building nucleic acid structures since, if any one of these data is absent, its nature can be deduced from the other two (24,25): when the base pair faces are of a common colour, parallel strands imply nucleotides of the same type and antiparallel strands imply nucleotides of different types. The opposite is true when two colours
appear on the base pair faces. One must, however, be cautious concerning one point: the strand direction referred to here is local. If we attempt to apply these rules to Z-DNA, Watson-Crick pairing combined with a syn purine base and an anti pyrimidine base would imply parallel strands. In fact, this is true on a local level, despite the fact that, macroscopically, Z-DNA is an antiparallel duplex. This apparent conflict is created by the strong zigzag in the phosphodiester backbone which gave this conformation its name (26).
A number of the conformational families shown in Fig. 2.7 are already known. The family -Wa corresponds to B-DNA (or A-DNA) with antiparallel strands, Watson-Crick hydrogen bonding, and anti nucleotides. The family -Ws corresponds to Z-DNA. The representation of -Ws makes it clear that base pairs have to be turned over in passing from the B to the Z conformation, since to align the strand directions between the first two families in the figure it is necessary to invert the -Ws diagram around a horizontal axis, leading to a base pair with a black face on the left and a white face on the right. Changing the nucleotide stereochemistry at C1' by using α-nucleotides is one route to new Watson-Crick families. This change diminishes the steric hindrance associated with syn conformation pyrimidines. It is thus not surprising that an all α-nucleotide duplex belonging to the family -Ws can be formed (27). The final family that can be made with Watson-Crick base pairs, +W, has also been observed in parallel-stranded duplexes where one strand is again composed of α-nucleotides in the syn conformation (28). For the reversed Watson-Crick duplexes only the +Ca family is currently known. It is found in parallel-stranded DNA (16) and in the unusual four-stranded i motif structure (4).
Hoogsteen and reversed Hoogsteen pairings are seen within triple helices and, indeed, it is possible to extend this classification scheme to both triplex and quadruplex structures (24). Figure 2.8 shows an example of this for the 16 triplex families that can be built from Watson-Crick duplexes. Each of these triplexes is named on the basis of its two constituent duplexes. The first family, built from a -Wa Watson-Crick duplex and a -Ha Hoogsteen duplex, thus becomes -Wa-Ha. In fact, the nucleotide type indicated for the second base pair can be dropped since it must be identical to that of the first pair. The nucleotide type refers to the left-hand, or lower, nucleotides for the constituent duplexes and is necessarily common to any pair of duplexes forming a triplex.
The best known triple helix, made by adding a Hoogsteen-bonded thymidine strand to a poly(dA):poly(dT) double helix (29), corresponds to the family -Wa+H since all nucleotides are anti and the Hoogsteen-bound poly(dT) strand is parallel to the adenosine strand of the duplex. An identical triple helix family, CGC+, can also be formed under acidic conditions by adding a protonated cytosine strand to a poly(dG):poly(dC) duplex, again using Hoogsteen hydrogen bonding (see also Fig. 2.5). The only other way to form an all anti triple helix starting from a Watson-Crick duplex is the family -Wa-R, which has indeed been experimentally observed for T.A×A and C.G×G triple helices (30,31), where the two purine strands form an antiparallel, reversed Hoogsteen duplex (see Fig. 2.5). A related family containing syn nucleotides in the third strand, -Wa+R, is also known to exist when α-thymidine nucleotides, which can easily adopt the syn conformation, are built into the third strand of a T.A×T triplex (32).
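The rule linking pairing type, nucleotide type, and local strand direction for duplexes can be written down in a few lines. The sketch below is an illustration under stated assumptions: the letters 'a' and 's' are taken to stand for the two nucleotide states (anti and syn), W and R pairs are treated as exposing two face colours and C and H pairs a common colour (as the B-DNA example in the text implies), and the function and variable names are hypothetical. Enumerating all combinations reproduces the 16 combinations and 14 unique families mentioned in the text.

```python
from itertools import product

# Assumption drawn from the text: Watson-Crick (W) and reversed Hoogsteen (R)
# pairs expose one white and one black face; reversed Watson-Crick (C) and
# Hoogsteen (H) pairs expose faces of a common colour.
TWO_COLOUR_FACES = {"W": True, "R": True, "C": False, "H": False}

# W and C pairing have pseudo-dyad symmetry, which makes the two
# mixed-type arrangements equivalent (degenerate).
PSEUDO_DYAD = {"W", "C"}


def strand_sense(pairing, left, right):
    """Return '+' (parallel) or '-' (antiparallel) local strand directions.

    pairing: one of 'W', 'C', 'H', 'R'; left/right: 'a' (anti) or 's' (syn).
    """
    same_type = left == right
    if TWO_COLOUR_FACES[pairing]:
        # Two colours: same nucleotide type implies antiparallel strands.
        return "-" if same_type else "+"
    # Common colour: same nucleotide type implies parallel strands.
    return "+" if same_type else "-"


def family_name(pairing, left, right):
    """Name a duplex family: prefix (+/-), pairing letter, left-nucleotide type."""
    sense = strand_sense(pairing, left, right)
    if pairing in PSEUDO_DYAD and left != right:
        return f"{sense}{pairing}"  # degenerate pair, suffix dropped
    return f"{sense}{pairing}{left}"


ALL_COMBINATIONS = list(product("WCHR", "as", "as"))
FAMILIES = sorted({family_name(p, l, r) for p, l, r in ALL_COMBINATIONS})

if __name__ == "__main__":
    print(len(ALL_COMBINATIONS), "combinations,", len(FAMILIES), "unique families")
    print(FAMILIES)
```

With these assumptions, Watson-Crick pairing of two anti nucleotides gives -Wa (B-DNA or A-DNA), while a syn/anti Watson-Crick pair comes out as locally parallel, in line with the caution about Z-DNA given above.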
Fig. 2.8. Diagrammatic representations of triple-stranded nucleic acid helices based on Watson-Crick duplexes.
Forming the triple helices shown in the second column of Fig. 2.8 seems unlikely since the only known form of the -Ws duplex is Z-DNA, in which the major groove face of the base pairs is sterically hindered. In contrast, starting from a parallel-stranded Watson-Crick duplex +W (formed using an α-nucleotide pyrimidine strand) one could form triple helices belonging to the families +Wa+H or +Wa-R, which only require syn conformations in the Watson-Crick-bound pyrimidine strand. It is possible to continue this classification to deal with other triplexes, for example based on reversed Watson-Crick duplexes, and also with quadruplexes. The reader may refer to an earlier publication for details (24).
4. Helical parameter definitions Although a completely detaile d descriptio n o f the conformatio n of a nucleic acid fragment require s 3 N Cartesia n coordinate s fo r N atoms , i t i s possibl e t o reduc e significantly thi s number o f variables. Since bond lengths an d valence angles vary only slightly, on e ca n describe a conformation successfull y usin g only torsio n angles . Sinc e the intracycli c torsio n angle s o f th e suga r ring s ar e dependen t o n on e another , i t is actually easie r t o describ e rin g puckerin g usin g th e well-know n pseudo-rotationa l variables (1 ) phase (P ) an d amplitud e (A). I n thi s case , a nucleotide ca n b e describe d by a tota l o f eigh t variables : th e backbon e torsion s a (P-O5') , B (O5'-C5') , y (C5'-C4') , e(C3'-O3'), an d f(O3'-P); th e suga r conformation give n b y P and A (which als o fix the 5 torsion aroun d C4'—C3'); and the glycosidic bond x (Cl'—N 9 for purines or Cl'—N l fo r pyrimidines). However , these parameter s are not ver y helpfu l for judging th e overal l shap e of the molecule . Sinc e nuclei c acid s often for m helica l structures i t i s clearly usefu l t o b e abl e t o describ e thei r helica l geometr y i n a mor e direct way. Such parameters have been employe d sinc e the ver y first nuclei c acid conformations wer e obtained , bu t becaus e thes e conformation s resulte d fro m fibr e diffraction experiments , which average ou t bas e sequence information , onl y perfectl y helical conformations were considered . For regula r helices , th e helica l axi s can b e locate d rathe r easily . When th e tail s of difference vector s joining symmetricall y relate d atom s (fo r example , successiv e Cl ' sugar atoms within a n oligonucleotide strand ) are brought together , thei r head s generate a plane an d th e helica l axi s is defined by th e perpendicula r t o thi s plane. 
A point on the helical axis may be located by joining the heads of successive vectors (which lie on a circle around the axis) and projecting perpendiculars to these lines into the plane described. The helical axis must lie at the intersection of these perpendiculars (33).
Once the helical axis is known, the position of the base pairs can be described in terms of the rise and twist between successive pairs, leading also to the pitch of the helix and the number of base pairs per turn. If a reference axis system is defined for each base pair [this is often taken as the line joining the R(C8)-Y(C6) atoms], it is possible to fix the distance of the base pair from the axis and the inclination of the base pair with respect to the axis. The calculation of such parameters has been described in detail by Struther Arnott (34).
When the first single-crystal nucleic acid conformations appeared, it was clear that such descriptions were insufficient. The conformation of the famous oligomer d(CGCGAATTCGCG)2 (35) showed that base sequence effects led to a deformed double helix with non-planar base pairs, fluctuating rise and twist values, and a kinked helical axis. If such deformations remain small it is still reasonable to look for an optimal linear axis to describe the structure. This can be done with the method described above, but using an eigenvalue approach to find the shortest principal axis of the ellipsoidal cloud now formed by the heads of the vectors. A point on the axis is similarly found by looking for the barycentre of the dispersed intersection points of the perpendiculars to the projected vectors (33).
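The eigenvalue approach just described can be sketched in a few lines. This is only an illustrative sketch, assuming NumPy and a hypothetical function name `helix_axis_direction`; it finds the axis direction alone (not a point on the axis) and is not taken from any of the programs discussed in this chapter.

```python
import numpy as np

def helix_axis_direction(points):
    """Estimate the helix axis direction from symmetry-related atoms
    (for example, successive C1' atoms of one strand)."""
    pts = np.asarray(points, dtype=float)
    # Difference vectors joining successive atoms; bringing their tails
    # together means treating them as points (heads) around one origin.
    diffs = np.diff(pts, axis=0)
    # Scatter matrix of the cloud of heads about its barycentre.
    centred = diffs - diffs.mean(axis=0)
    scatter = centred.T @ centred
    # eigh returns eigenvalues in ascending order, so column 0 is the
    # shortest principal axis, i.e. the normal to the (near-)plane.
    _, eigvecs = np.linalg.eigh(scatter)
    axis = eigvecs[:, 0]
    # Orient the axis along the overall direction of travel (a choice
    # made here for convenience, not part of the method itself).
    if axis @ diffs.sum(axis=0) < 0:
        axis = -axis
    return axis
```

For a perfectly regular helix the heads lie exactly in a plane, so the smallest eigenvalue is zero; for a deformed helix it measures how far the cloud departs from planarity.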
Observed deformations, however, led to the need for an increased number of parameters for describing the base pairs, such as the propeller twist angle, formed by the contra-rotation of the bases around their long axes, or slide, characterizing the lateral
displacement of successive base pairs. Such parameters were still generally calculated with respect to the R(C8)-Y(C6) axis, or, in some cases, with respect to the line joining the glycosidic C1'-C1' atoms. In the period following the appearance of high resolution oligomer conformations, new parameters were added to the existing set as the need was felt, without much attention being paid to coherence in definitions, names, or sign conventions. Since different groups used different parameters and calculation techniques it became extremely difficult to compare existing structures and it was clearly necessary to review the situation. This review was carried out at a meeting in Cambridge in 1988, where an effort was made to define, and name, a complete set of parameters for describing helical nucleic acids (36). Figure 2.9 shows these parameters, which were divided into three families: base pair-axis parameters, intra-base pair parameters, and inter-base pair parameters. Each family contained the geometrically required combination of rotations (r) and translations (t), 2t+2r in the first case, to position a body with respect to a vector, and 3t+3r in the other cases, which positions two bodies with respect to one another. Both base pair-axis and inter-base pair parameters can be further broken down into parameters describing individual base positions. Although it is important to have complete families of parameters for mathematical reasons, it is clear that they are not all equally interesting. Certain parameters show only small variability within standard nucleic acid conformations (notably, shear, stretch, stagger, and opening), but even these parameters can become important in describing the growing number of deformed nucleic acid conformations (see Section 6).
The Cambridge meeting fixed the names (and abbreviations) for all parameters and defined a right-handed axis reference system. The orientations of these axes, which set the positive direction for translational variables, are shown in Fig. 2.9 and notably have the pseudo-dyad axis pointing towards the major groove. The sign of all rotational parameters was chosen to correspond to right-handed rotation around the associated axes. Finally, rules were invented for building up compound parameters from the underlying parameters referring to individual bases. Thus, base pair parameters such as propeller are obtained by adding the base tip of the left-hand strand to that of the right-hand strand (left and right refer to the orientation shown in Fig. 2.9, with the viewer looking into the minor groove). Other parameters, such as buckle, are obtained by subtracting the inclination of the right-hand strand from that of the left-hand strand. These definitions are given in Table 2.2. Note that it is necessary to take into account the fact that the parameters Ydisp and tip refer to an axis that points towards the backbone of each strand (36,37). Rules also define the derivation of base pair step parameters, which are obtained by subtracting the value for the lower base pair from that of the upper base pair (again with the nucleic acid oriented as shown). With these rules, all the parameters in Fig. 2.9 (and in the various publications resulting from the Cambridge meeting) are positive. Since 1988 these parameters have been almost completely respected, although a disagreement has arisen concerning buckle, which is defined by Dickerson with a reverse sign (Fig. 2.9 would show a negative buckle in this case).
A new parameter, termed cup, has also been introduced to characterize the space created when two successive base pairs buckle away from one another.
Fig. 2.9. Helical parameters. Translations are shown in the upper part of the diagram and rotations in the lower part. Each section contains base pair-axis, intrabase pair, and interbase pair parameters, respectively.
Table 2.2. Helicoidal parameter names and definitions for base pair values

Name              Family      Code  Symbol  Base pair value
X-displacement    Base-axis   XDP   dx      (dxL + dxR)/2
Y-displacement                YDP   dy      (dyL - dyR)/2
Inclination                   INC   η       (ηL + ηR)/2
Tip                           TIP   θ       (θL - θR)/2
Shear             Intra-base  SHR   Sx      dxL - dxR
Stretch                       STR   Sy      dyL + dyR
Stagger                       STG   Sz      Σ(DzL - DzR)
Buckle                        BKL   κ       ηL - ηR
Propeller                     PRP   ω       θL + θR
Opening                       OPN   σ       Σ(ΩL - ΩR)
Shift             Inter-base  SHF   Dx      dx(i) + Ax - dx(i-1)
Slide                         SLD   Dy      dy(i) + Ay - dy(i-1)
Rise                          RIS   Dz
Tilt                          TLT   τ       η(i) + ηA - η(i-1)
Roll                          ROL   ρ       θ(i) + θA - θ(i-1)
Twist                         TWT   Ω
Axis X-disp.      Axis        AXD   Ax
Axis Y-disp.                  AYD   Ay
Axis inclination              AIN   ηA
Axis tip                      ATP   θA
The definition column above indicates how compound parameters (base pair, interbase) are built up. A full geometrical definition of the parameters can be found in an earlier reference (42). Stagger and opening for the ith base pair are, respectively, sums of differences in rise and twist.
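As a small illustration of how these combination rules work in practice, the sketch below builds a few base pair values from per-strand base-axis values, using only the rules stated explicitly in the text and table (averaging for Xdisp, the sign change for Ydisp and tip, the sum for propeller, the difference for buckle, and the step rule for shift). The function names and dictionary keys are hypothetical and chosen only for this example.

```python
def base_pair_values(left, right):
    """Combine per-strand base-axis parameters into base pair values.

    `left` and `right` are dicts with keys 'xdisp', 'ydisp', 'inc', 'tip'
    for the left- and right-hand strand bases (hypothetical layout).
    """
    return {
        # Xdisp: plain average of the two strands.
        "xdisp": (left["xdisp"] + right["xdisp"]) / 2,
        # Ydisp and tip refer to an axis pointing towards each strand's
        # backbone, hence the sign change when averaging.
        "ydisp": (left["ydisp"] - right["ydisp"]) / 2,
        "tip": (left["tip"] - right["tip"]) / 2,
        # Buckle: left-strand inclination minus right-strand inclination.
        "buckle": left["inc"] - right["inc"],
        # Propeller: sum of the two base tips.
        "propeller": left["tip"] + right["tip"],
    }

def global_step_value(upper, axis_term, lower):
    """Step value such as shift: dx(i) + Ax - dx(i-1)."""
    return upper + axis_term - lower
```

For example, two bases with tips of -6 and -8 degrees give a propeller of -14 degrees but a base pair tip of only +1 degree, which shows why the sign convention for each strand matters.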
5. Helical parameter calculations
Despite the importance of defining names and sign conventions for helical parameters, it is still necessary to define how they are to be calculated. This was not determined by the Cambridge meeting, which invited those interested to compare the behaviour of the various existing methods (38-42). Since then the situation has evolved to some extent. First, most programs have been revised to respect the Cambridge recommendations and basic criteria such as the independence of parameters with respect to the direction in which an oligomer is analysed. Secondly, while certain programs have continued to be used relatively frequently, others have more or less disappeared. However, the overall choice of methods has hardly diminished since new programs have also been proposed (43-45). In addition, an important question has arisen concerning the need for defining a helical axis in the case of irregular conformations. This question is linked to two fundamentally different ways of describing nucleic acid conformations, which are termed global and local. The global approach is an extension of
the analysis of regular helices. It maintains the notion of a helical axis and parameters that position the bases or base pairs with respect to this axis. The difficulty with this approach arises from the fact that a linear helical axis is no longer appropriate for many conformations. A possible solution is to calculate individual linear axes for helical segments that are more or less straight, but this choice is necessarily subjective.
The alternative local approach to helical parameters abandons the notion of a helical axis and calculates parameters that describe the junctions linking one base or base pair to the next along the nucleic acid fragment. The difference between the local and global approaches has been nicely illustrated by Calladine and Drew (46) in their discussion of the transition between the B- and A-forms of DNA. From the global viewpoint, this transition consists of moving the base pairs away from the axis towards the minor groove (negative displacement), inclining them with respect to the helical axis (positive inclination), and reducing the helical rise and twist. In contrast, from the local viewpoint, the same transition is described as sliding successive base pairs over one another in the direction of their long axes and creating a roll wedge between them. Using either local or global parameters, A- and B-DNA are distinguished from one another, but by different means. In the global case, the distinction is mainly on the level of the base pair-axis parameters, while in the local case, inter-base pair parameters must be used.
Note that while both the local approach and the global approach calculate inter-base pair parameters, these parameters can differ quite dramatically if the conformations analysed do not have base pairs centred on the axis, as in canonical B-DNA. This is owing to the fact that global inter-base pair parameters are calculated after deconvolution of base pair-axis parameters (37). It should be added, in this connection, that the global parameters shift, slide, tilt, and roll are all indicative of dissimilarity between the two strands of the double helix and these parameters are zero by definition for duplexes where dyadic symmetry exists between the two strands. These problems are reconsidered in Section 6, where we present several examples of helical analysis for regular and irregular conformations.
There has been no final decision regarding local versus global parameters and, indeed, both can be useful. Local parameter algorithms have the advantage that they avoid the difficulty of defining axes; they also reflect more closely the models developed for describing intrinsically curved DNA, which only attempt to define the twist, tilt, and roll of successive inter-base pair steps. They also yield parameters that only depend on the geometry of the given dinucleotide step. In contrast, global parameters depend on the conformation of the whole nucleic acid fragment analysed, but they have the major advantage of enabling the extent and the location of curvature to be calculated. (It must be stressed that non-zero local roll or tilt angles certainly do not imply curvature, as shown by the local description of regular A-DNA discussed above.) Since axis bending is a common feature of nucleic acid conformations, notably within protein-nucleic acid complexes, the availability of a defined axis has often become a determining argument for the use of global parameters. Global parameters finally have the advantage of distinguishing more easily between the different families of helical conformation.
We present below a brief summary of the various analysis techniques that have been developed to date.
5.1 'Newhelix' (38)
'Newhelix' remains a popular DNA analysis program. It originated from the 'Modhelix' code written by Rabinovich, Reich, and Shakked at the Weizmann Institute of Science on the basis of routines coming from the Helib library of Rosenberg and Dickerson. It is basically a global parameter approach, but it is restricted to calculating an optimal linear helical axis using the technique described in Section 4. Base pair parameters are defined with respect to an R(C8)-Y(C6) reference vector, and thus helical twist becomes the angle between successive reference vectors projected on to the plane perpendicular to the helical axis, while rise is the distance between successive R(C8)-Y(C6) vectors, projected on to the helical axis. Similarly, slide is the relative movement of the base pairs along the direction defined by the averaged R(C8)-Y(C6) vectors for the step in question.
Roll and tilt have two definitions in 'Newhelix'. The original definition involves calculating the angles between the successive base pair normals, which are then resolved into the pseudo-dyad and perpendicular directions based on the helical axis reference system. This technique is appropriate only for small angles. A more recent definition involves a preliminary removal of helical twist for the base pair step in question, to avoid parameter dependencies. The two sets of values for roll and tilt are related by formulae that involve tip and inclination (45). Lastly, propeller is calculated as the angle between the two base normals projected into the plane normal to the R(C8)-Y(C6) vector, and buckle is obtained by projection into a plane defined by the R(C8)-Y(C6) vector and the bisector of the two base normals.
When local kinking is suspected, the molecule can be divided into segments for which individual straight-line axes are calculated. This simple technique nevertheless introduces a degree of subjectivity. The authors of 'Newhelix' have approached the analysis of axis deformation with a supplementary program named 'Bend' (47) which calculates the bending at a given base pair as the angle formed between the mean normal vectors belonging to the base pairs 'i' steps before and after the test pair (i = 1, 2,...). Global curvature is measured as the angle between normal vectors averaged over 10 successive base pairs in order to attenuate local conformational irregularities.
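The twist and rise definitions given above for a linear axis can be illustrated with a short sketch. It is not the 'Newhelix' code: the function name and argument layout are hypothetical, the axis is assumed to be already known, and the position of each R(C8)-Y(C6) vector is taken to be its midpoint, which is an assumption made here for simplicity.

```python
import numpy as np

def twist_and_rise(axis, c8_lower, c6_lower, c8_upper, c6_upper):
    """Twist (degrees) and rise for one base pair step about a given axis."""
    u = np.asarray(axis, dtype=float)
    u = u / np.linalg.norm(u)
    c8_lower, c6_lower, c8_upper, c6_upper = (
        np.asarray(x, dtype=float)
        for x in (c8_lower, c6_lower, c8_upper, c6_upper)
    )
    # R(C8)-Y(C6) reference vectors of the two base pairs.
    v_lower = c6_lower - c8_lower
    v_upper = c6_upper - c8_upper
    # Project both vectors on to the plane perpendicular to the axis.
    p_lower = v_lower - (v_lower @ u) * u
    p_upper = v_upper - (v_upper @ u) * u
    # Signed angle from lower to upper, positive for right-handed
    # rotation about the axis.
    twist = np.degrees(
        np.arctan2(u @ np.cross(p_lower, p_upper), p_lower @ p_upper)
    )
    # Rise: separation of the vector midpoints measured along the axis.
    mid_lower = (c8_lower + c6_lower) / 2
    mid_upper = (c8_upper + c6_upper) / 2
    rise = (mid_upper - mid_lower) @ u
    return twist, rise
```

Because both quantities are measured relative to one overall axis, any kink in the real structure is absorbed into the step values, which is the source of the subjectivity mentioned above when the molecule is cut into straight segments.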
5.2 'von Kitzing/Diekmann' (39)
This program is a local parameter approach. Base plane normals are defined by the atoms of the six-membered rings of the purine and pyrimidine bases. Base pair normals are taken as perpendicular to the least-squares plane of the two bases, which can be weighted to take into account the difference between purines and pyrimidines. The base pair reference vector is then defined as the projection of the R(C8)-Y(C6) vector or of the C1'-C1' vector on to the base pair plane. Decomposing the angle between the base normals along the reference vector and perpendicular to this direction then leads to the buckle and propeller angles.
The relative positions of successive base pairs are defined in a local sense using wedge angles between the base pair normals, which are decomposed along, or perpendicularly to, an axis obtained by averaging the reference vectors of the base pairs involved. This leads to roll and tilt values. The translational parameters rise, 'long axis
slide' (slide), and 'short axis slide' (shift) are also calculated with respect to an axis system averaged over the two base pairs involved. In a second, so-called cylinder, approach (termed 'screw axis' by other authors), a local helical axis is defined so that the passage from one base pair axis system to the next can be obtained by rotation around and translation along this axis. The orientation of the base pair reference axes with respect to the cylinder axis is then used to define 'cylinder' roll, tilt, and twist angles, as well as the related translational parameters. These six cylindrical parameters are identical to those used by Arnott et al. for regular helices (34).
Axis curvature is estimated by bringing the successive base pair normals, or cylinder twist axes, to a common origin. The heads of these vectors then lie on a unit sphere, whose surface can be mapped on to a plane using a Mercator projection. A bent helix will appear as a pathway across this plane.
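The 'cylinder' or 'screw axis' idea can be made concrete with a generic rigid-body calculation. The sketch below is a standard textbook construction rather than the algorithm of any program discussed here. It assumes that the passage from one base pair frame to the next is given as a rotation matrix `R` and translation `t` acting as x' = R x + t, and that the rotation angle is neither 0 nor 180 degrees; the function name is hypothetical.

```python
import numpy as np

def screw_axis(R, t):
    """Screw (local helical) axis of the rigid motion x' = R x + t.

    Returns the unit axis direction, the rotation angle about it (radians),
    the translation along it, and the point on the axis nearest the origin.
    """
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float)
    # Rotation angle from the trace of the rotation matrix.
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    # Axis direction from the antisymmetric part of R.
    n = np.array([R[2, 1] - R[1, 2],
                  R[0, 2] - R[2, 0],
                  R[1, 0] - R[0, 1]]) / (2.0 * np.sin(angle))
    # Translation along the axis (the local 'rise').
    d = n @ t
    # A point c on the axis satisfies (I - R) c = t - d n; the matrix is
    # singular along n, so take the minimum-norm least-squares solution.
    c, *_ = np.linalg.lstsq(np.eye(3) - R, t - d * n, rcond=None)
    return n, angle, d, c
```

In this picture the rotation angle plays the role of a local twist and the translation along the axis that of a local rise, while the axis directions of successive steps are the vectors that can be brought to a common origin to visualize curvature.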
5.3 'Tung/Soumpasis' (40)
This method uses inertial axes to create the base (or base pair) reference systems. These axes are obtained by diagonalization of the moment of inertia tensor, with the origin being taken as the centre of mass of the base (or base pair). This approach has the advantage of being applicable to both unusual bases and unusual pairing schemes, but requires detailed corrections to avoid apparent irregularity within regular helices resulting simply from variations in the chemical structure of successive base pairs.
The position of successive base pairs is defined by a translation along the difference vector joining the origins of the two reference axis systems and by a 3 × 3 rotation matrix. Translational parameters are obtained by projecting the difference vector on to a mean axis system between the base pairs, and rotational parameters are obtained from the rotation matrix, which is decomposed into three Euler angles. Parameters for the bases within a base pair are obtained in a similar way.
A straight global axis can also be obtained in this method by diagonalizing the moment of inertia tensor for the entire molecule or for a subset of selected atoms, such as C1' or P, to avoid artefacts related to the chemical structures of the bases. Curvature is again approached by looking at the angles formed between successive base pair normals or by calculating straight-line axes for segments of the molecule, as in the case of 'Newhelix'.
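The construction of an inertial reference system can be sketched as follows. This is a generic illustration of diagonalizing a moment of inertia tensor, assuming NumPy and a hypothetical function name; it does not include the corrections for chemical structure mentioned above and is not the published program.

```python
import numpy as np

def inertial_frame(coords, masses):
    """Centre of mass, principal moments and principal axes of a set of atoms."""
    x = np.asarray(coords, dtype=float)
    m = np.asarray(masses, dtype=float)
    # Origin of the reference system: the centre of mass.
    com = (m[:, None] * x).sum(axis=0) / m.sum()
    r = x - com
    # Moment of inertia tensor: sum of m * (|r|^2 * 1 - r r^T).
    tensor = (m * (r ** 2).sum(axis=1)).sum() * np.eye(3) - (m[:, None] * r).T @ r
    # Diagonalization; eigh returns moments in ascending order and the
    # corresponding axes as the columns of the second array.
    moments, axes = np.linalg.eigh(tensor)
    return com, moments, axes
```

For a planar base the axis with the largest moment is the normal to the base plane, while the two in-plane axes depend on how the mass is distributed, which is why chemically different bases give slightly different frames even in a regular helix.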
5.4 'Bansal' (41)
This method is similar to that of von Kitzing and Diekmann. The base pair reference vector is again chosen as R(C8)-Y(C6), whose midpoint determines the base pair origin. A mean plane perpendicular to the averaged base normal is then calculated. Wedge parameters are used to describe the relative orientation of two base pairs in terms of an axis system calculated by averaging the base pair reference planes and midpoints. Propeller and buckle angles are decomposed using the mean base pair normal and the base pair reference vector.
The method also determines local helical axes for each base pair step within the molecule and related 'helical' (otherwise termed cylinder or screw axis) parameters.
A straight global axis can be calculated as a least-squares fit to the successive local axis reference points and an idea of curvature is obtained by plotting the path of the local helical axes in a plane perpendicular to the global axis.
5.5 'Babcock/Olson' (43,44)
This method, which also calculates local helical parameters, employs a full three-dimensional rotation matrix for relating the positions of bases and base pairs. Bases are considered to rotate around a chosen pivot point and the authors have carefully considered the effect of the choice of this point on the dependence between helical parameters. The principal axis of each base passes through R(C8) or Y(C6) and is parallel to the C1'-C1' direction in the corresponding, ideal Watson-Crick pair. A perpendicular vector lies in the base plane, pointing towards the major groove, and passes through the midpoint of the ideal C1'-C1' vector. As in 'Curves' (see Section 5.7), a set of reference bases is available for fitting to experimental coordinates to avoid the effects of base deformation. Unusual geometries, such as syn bases, are dealt with via specially adapted reference systems.
Interbase pair parameters are calculated with respect to a coordinate frame, which is defined as the half-way rotated and translated system between the two base pair references. Parameters correspond to simultaneous rotations around the three axes in the coordinate frame and are decomposed by a formalism adapted from rigid body dynamics. In addition, the unfortunately termed 'local helical parameters' (again related to earlier 'cylinder' or 'screw axis' approaches) are calculated from the screw axis linking successive base pairs.
Intrabase pair parameters are derived from a half-way-rotated reference frame derived from the two base reference systems. Both translational (shear, stretch, stagger) and rotational (buckle, propeller, opening) parameters are therefore made up from two half movements on either side of the reference system.
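The notion of a half-way rotated and translated frame can be illustrated with a generic construction. The sketch below is not the Babcock/Olson formalism: it simply halves the single axis-angle rotation that carries one frame into the other and averages the two origins, which is an assumption made here for illustration. Frames are taken as 3 × 3 matrices whose columns are the axis vectors, and all names are hypothetical.

```python
import numpy as np

def rotation_from_axis_angle(n, angle):
    """Rodrigues formula for a rotation of `angle` (radians) about unit vector n."""
    K = np.array([[0.0, -n[2], n[1]],
                  [n[2], 0.0, -n[0]],
                  [-n[1], n[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def mid_frame(R1, o1, R2, o2):
    """Frame half-way between (R1, o1) and (R2, o2)."""
    R1 = np.asarray(R1, dtype=float)
    R2 = np.asarray(R2, dtype=float)
    # Relative rotation expressed in the first frame: R2 = R1 @ Q.
    Q = R1.T @ R2
    angle = np.arccos(np.clip((np.trace(Q) - 1.0) / 2.0, -1.0, 1.0))
    if angle < 1e-12:
        half = np.eye(3)          # frames already parallel
    else:
        n = np.array([Q[2, 1] - Q[1, 2],
                      Q[0, 2] - Q[2, 0],
                      Q[1, 0] - Q[0, 1]]) / (2.0 * np.sin(angle))
        half = rotation_from_axis_angle(n, angle / 2.0)
    # Half the rotation, and the midpoint of the two origins.
    origin = (np.asarray(o1, dtype=float) + np.asarray(o2, dtype=float)) / 2.0
    return R1 @ half, origin
```

Measuring parameters in such a middle frame treats the two base pairs symmetrically, so the values do not depend on the direction in which the step is read.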
5.6 'El Hassan/Calladine' (45)
This recent local parameter approach (named CEHS: Cambridge University Engineering Department Helix Computation Scheme) shares many common features with the algorithms already described and, for many parameters, gives results similar to 'Newhelix'. Its authors feel strongly that only local parameters are useful for understanding nucleic acid structure. Making a new analysis approach at this late stage is justified by criticisms of a subset of earlier methods ('Newhelix', 'Babcock/Olson', 'von Kitzing/Diekmann'). At a base pair level, the standard R(C8)-Y(C6) vector and the mean base pair normal are used to define the reference axis system. For individual bases, axes parallel to R(N1-C4) or Y(N3-C6) are chosen. Parameters are based on Euler angles with one approximation: twist is treated normally, but roll and tilt are grouped into a single rotation about a 'RollTilt' axis in the xy-plane of the reference system between successive base pairs. In a similar way, base pair parameters are separated into a principal propeller twist angle and a grouped opening buckle rotation around a common axis.
5.7 'Curves' (42)
'Curves' was created with the aim of obtaining a global description of nucleic acid conformation. Its development was guided by the desire to extend the approach applicable to regular helical geometries to the description of irregular systems, without losing the notion of a helical axis. In the case of a perfect helix, 'Curves' automatically yields a straight-line axis. In this case, every monomer has the same relative position and orientation with respect to the axis, and consecutive monomers are related by a fixed rotation around, and translation along, the axis.
'Curves' extends these notions to irregular conformations by introducing a least-squares optimization procedure based on a function that mathematically describes departures from ideal helical symmetry. First, it is required that the helical axis should be as straight as possible. If the overall axis is broken down into segments, with one segment per nucleotide (or nucleotide pair), then these segments should ideally be aligned and the reference points on each axis should not be laterally displaced from one another. These two criteria, which are, respectively, rotational and translational in nature, can be expressed as sums of squares, with each term referring to a dinucleotide step within the nucleic acid. Next, it is required that successive nucleotides should, as far as possible, have identical orientations with respect to their local helical axis systems. The translational and rotational differences between successive nucleotides (or nucleotide pairs) again lead to two terms which can be squared and summed over the nucleic acid fragment. This procedure leads to a function with four terms that describes the helical irregularity.
If we now consider the parameters that position the individual axis segments as variables (two translations and two rotations with respect to a reference nucleotide at each level), it is possible to search for the variables that minimize the irregularity function. This set of variables then defines the optimal axis describing the given nucleic acid conformation.
Several remarks can be made concerning this approach. As already mentioned, the optimization procedure means that the analysis of a helically regular conformation will automatically lead to a straight axis, since all terms of the irregularity function can simultaneously become zero. In the case of irregular conformations, the axis will be chosen so as to minimize both deformation of the axis and irregular positioning of the bases with respect to this axis. This choice will be optimal in a least-squares sense. It reveals the presence of axial deformation and/or base mispositioning, without any subjective decisions having to be made. It also avoids local changes in helical conformation being incorrectly interpreted as axis curvature. It should also be added that, after optimization, the value of the irregularity function is in itself a useful measure. This value can be broken down into contributions from each dinucleotide step (DIF: dimeric irregularity function), yielding a valuable guide to the location of deformation 'hot spots' within the structure (see Section 6).
It is also important to note that 'Curves' is founded on individual bases and not on base pairs. 'Curves' defines each nucleotide by a base-fixed reference axis system whose origin lies beyond the Watson-Crick base pairing face and whose z-axis is perpendicular to the base plane.
This choice was made so that the reference axis systems would be centred on the helical axis within the canonical B-DNA conformation (Xdisp = Ydisp = 0). Since each nucleotide has its own reference axis system,
a duplex can be treated in two ways: an optimal helical axis can be calculated for the duplex, or axes can be generated independently for each strand, showing up disparities between them. Similarly, it is easy to treat three- or four-stranded systems, without having to change the reference axes, and it is easier to treat modified nucleic acids containing bulges, abasic sites, or mispairing. Thus, helical deformations resulting from the unorthodox orientation of one or more bases (for example, caused by a transition from anti to syn conformation) can be avoided by excluding the corresponding bases from the helical axis optimization procedure. Once the unperturbed axis is known, the position of these bases can, nevertheless, be calculated.
It should be noted that care must be taken in determining the base-fixed reference axes in cases where the bases themselves may be deformed. This can arise in low resolution X-ray or NMR conformations, but is most common within conformations coming from molecular dynamics trajectories, where the effect of thermal agitation can lead to major out-of-plane deformations. In such cases it is better to fit an ideal base conformation optimally to the given coordinates before calculating the reference axes.
Although the 'Curves' algorithm was specifically made for global helicoidal analysis, it also calculates local helical parameters, maintaining the nucleotide level approach and the base-fixed reference axis systems described above. Having both global and local parameters available from a single analysis allows a deeper understanding of the conformation in hand and allows easier comparison with other methods, which have almost exclusively chosen the local approach. It is also possible to use 'Curves' to determine an optimal linear axis.
As well as providing numerical parameters describing helical conformation, 'Curves' creates a graphical file including a spline-fitted curve describing the optimal helical axis and a simplified ribbon and plate representation of the nucleic acid backbones and bases (see examples in Section 6). The program can finally carry out an analysis of groove geometry based on spline-fitted curves running through chosen backbone atoms. This approach leads to a continuous measurement of groove width and depth, and also of helical diameter (48).
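To give a feeling for the kind of least-squares measure on which the axis optimization rests, the toy sketch below evaluates only the two axis-straightness terms (misalignment of successive axis segments and lateral displacement of their reference points) for a given set of segments. It is a deliberately simplified stand-in, not the 'Curves' irregularity function: the two base-positioning terms, any weighting, and the optimization itself are omitted, and the function name and inputs are hypothetical.

```python
import numpy as np

def axis_irregularity(points, directions):
    """Toy two-term measure of how far a segmented axis is from a straight line.

    `points` are the reference points of the axis segments and `directions`
    the segment direction vectors, one per nucleotide level.
    """
    p = np.asarray(points, dtype=float)
    u = np.asarray(directions, dtype=float)
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    rotational = 0.0
    translational = 0.0
    for i in range(1, len(p)):
        # Rotational term: successive segments should be aligned.
        rotational += np.sum((u[i] - u[i - 1]) ** 2)
        # Translational term: the step between reference points should
        # have no component perpendicular to the mean segment direction.
        mean_dir = u[i] + u[i - 1]
        mean_dir = mean_dir / np.linalg.norm(mean_dir)
        step = p[i] - p[i - 1]
        lateral = step - (step @ mean_dir) * mean_dir
        translational += lateral @ lateral
    return rotational, translational
```

Both terms vanish for a straight axis, and the per-step contributions inside the loop are the analogue of a step-by-step breakdown of the total irregularity.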
6. Examples of helical analysis
The fundamental differences between the methods described above can be illustrated by looking at possible helical axis definitions within an irregular structure. Figure 2.10 shows a theoretically generated DNA dodecamer with a 50 Å radius of curvature. Using 'Curves' it is possible to mimic the ways the various analysis schemes distribute this irregularity. First, it is possible to insist on a linear axis and to locate its optimal orientation for the molecule. This is the approach adopted by 'Newhelix'. Secondly, one can calculate separate helical axes for each successive base pair step without worrying about longer range continuity. This corresponds to the so-called 'screw axis' or 'cylinder' approaches discussed above. Thirdly, one can look for an optimal curved axis as is normally done with the 'Curves' algorithm. These choices clearly have an effect on the resulting helical parameters, particularly when they are compounded by different base reference systems and translation/rotation definitions.
Fig. 2.10. Different analyses of a curved DNA dodecamer, using a global linear axis (left), local helical axes for each base pair step (centre), and a curved global axis (right).
For a comparison of the numerical results obtained by the analysis programs, the reader is referred to a recent study by Elgavish and Harvey (49). These authors have compared the programs discussed above (excepting the most recent program, 'CEHS') for a number of test oligomers, including two B-DNAs, an A-DNA, and two different studies of an Okazaki fragment. The results bring to light several points worthy of note. First, the various programs can disagree dramatically for given helicoidal parameters, including such fundamental values as rise and twist. This is particularly visible when the structure differs significantly from canonical B-DNA, and, in the case of irregular fragments, these disagreements can lead to qualitatively different structural descriptions. Secondly, the programs fall into families based on the differences in base reference system and algorithm described above. Thus, 'Newhelix' and 'Bansal' often agree closely and also show strong correlations with 'von Kitzing/Diekmann' for certain parameters. 'Babcock/Olson' and 'Curves' (local) parameters also agree closely, with exceptions for rise and an offset in slide. The 'Tung/Soumpasis' program stands apart from the others in many cases (a consequence of the authors' preference for the use of inertial axes). It is also recalled that 'Newhelix' and 'Bansal' use a different sign convention for buckle compared with the other programs, and tilt also has an inverse sign in 'Tung/Soumpasis'. Lastly, 'Curves' is the only program to propose an optimally curved axis for irregular fragments and coherent sets of local and global parameters.
In order to give a better feeling for the sense of helical parameters and, notably, for the difference between local and global parameters, we present below a number of analyses using 'Curves'.
Fig. 2.11. Regular DNA conformations: (i) A-DNA, (ii) B-DNA, (iii) Z-DNA. The three views shown are, respectively, (a) along the axis, (b) perpendicular to the axis, and (c) with an inclined axis to show the groove profiles.
6.1 Regular conformations
We begin with the basic allomorphic forms of DNA (and RNA), which are presented in Table 2.3 (see also Chapters 1 and 5). This table includes both global and local helical parameters (the latter being denoted by the prefix 'L-') and, for completeness, the backbone conformations. (Only parameters referring to base pairs are presented, and those that remain close to zero for regular helices, namely shear, stretch, and stagger, have been excluded.) Different fibre conformations are given for A- (50,51) and B-DNA (51-53) and for A-RNA (54), while the Z conformation is represented by the idealized ZI- and ZII-forms (55). The overall shape of these helically regular duplexes can be compared in Fig. 2.11, where three different projections illustrate the variations in diameter, base position, helicity, and groove geometry.
Since these conformations are well known, they are useful for a first comparison of global and local parameters. Table 2.3 shows little difference between these two types of parameters for B-DNA, where the bases lie close to the helical axis and are only slightly inclined. The difference, however, becomes much clearer for the A and Z conformations. For A-DNA, as discussed in Section 5, the missing Xdisp and inclination values are replaced in the local parameter description by negative slide and positive roll. One can also note that the shorter and broader nature of the A-DNA helix is visible in the global Xdisp and rise values, whereas the local rise shows, as it should, a value close to that in B-DNA, linked only to local stacking interactions. It is also important to recall that, in the global description, non-zero values of shift, slide, tilt, and roll all signify differences between the two strands of a duplex (see the definitions given in Table 2.2). These parameters vanish for A and B helices, which have identical strands (homonomous conformations), but not for Z-DNA, owing to its dinucleotide repeat symmetry. Global shift, slide, tilt, and roll (unlike their local equivalents) are thus a
Table 2.3. Helical and backbone parameters for allomorphic conformations of DNA and RNA. (Translations in Å, rotations in degrees. The prefix L- distinguishes local parameters. For Z-DNA, values are given for CG/GC pairs and CpG/GpC steps, respectively.)
Parameter B-DNA (52) B-DNA (51) B-DNA (53) A-DNA (50) A-DNA (51) A-RNA (54) ZI-DNA (26,55)
X-disp Y-disp Inclin Tip Buckle Prop Open
-0.71 0.0 -5.9 0.0 0.0 3.7 -4.1
Shift Slide Rise Tilt Roll Twist
0.0 0.0 3.38 0.0 0.0 36.0
L-Shift L-Slide L-Rise L-Tilt L-Roll L-Twist
0.0 -0.76 3.32 0.0 -3.6 35.8
(26,55)
-0.18 0.0 2.7 0.0 0.1 -15.1 0.4
-5.43 0.0 19.1 0.0 0.0 13.7 -4.6
-5.28 0.0 20.7 0.0 0.0 -7.5 0.0
-5.3 -1.16 -2.46 1.95 -2.32 0.0 -1.95 2.32 15.8 14.5 4.2 0.0 -178.5 178 .5 -178.2 178.2 4.9 -4 .9 0.0 6.3 -6.3 14.5 1.1 -0.8 -4.2 -0.1 5.6
0.0 0.0 3.38 0.0 0.0 36.0
0.0 0.0 3.38 0.0 0.0 36.0
0.0 0.0 2.56 0.0 0.0 32.7
0.0 0.0 2.56 0.0 0.0 32.7
0.0 0.0 2.81 0.0 0.0 32.7
0.0 3.90 -3.90 4.73 2.66 0.0 -2.9 2 .9 -8.9 -49 .2
0.0 4.63 -4.63 4.35 3.08 0.0 -3.6 3.6 -3.7 -56.3
0.0 0.08 3.38 0.0 0.9 35.6
0.0 0.04 3.38 0.0 1.7 36.0
0.0 -2.08 3.42 0.0 10.7 30.9
0.0 -1.92 3.44 0.0 11.4 30.7
0.0 -2.13 3.52 0.0 8.9 31.5
0.0 5.11 -1.87 3.58 3.21 0.0 -9 .2 -5.1 -8.7 -47 .7
0.0 5.13 -1.54 4.00 3.21 0.0 -4.2 -0.5 -3.6 -56.2
0.0 0.0 1.5 0.0 0.0 -13.3 0.0
-46.9 -40.7 -146.0 135.6 36.4 37.4 y s 156.4 139.5 e 155.0 -133.2 -95.1 -156.9 L -97.9 -101.9 X Phase 191.6 154.8 Amplitude 36.3 39.7 a B
ZII-DNA
-29.9 136.3 31.1 143.4 -140.8 -160.5 -98.0 154.3 45.9
-84.6 -74.8 -152.1 -179.1 45.5 58.9 82.6 78.2 177.7 -155.0 -46.4 -67.1 -154.3 -158.9 13.1 18.3 38.9 41.6
-62.1 71.7 -137 ,.4 92.4 145.9 -179.9 -176.0 -169. 1 -167.0 163.0 47.4 175.5 60.0 156.9 66.4 83.5 140.2 103.4 146.9 93.4 -151.7 -92.2 -101. 8 -100.5 -178. 7 -73.6 78.3 -53, .3 73.6 55.5 -165.9 -161.2 63 .3 -147.4 62.9 13.4 156.8 13,.2 163.4 50.4 39.0 38.9 17 .1 41.0 26.6
guide to strand asymmetry (heteronomous conformations). It is also worth noting that the base inversion involved in the B→Z transition is clearly visible only in the global helical parameter tip, with a value close to 180° for the Z conformation.
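The remark that the A-form helix is shorter than the B-form can be checked with a small calculation on the global rise and twist. The sketch below is only an illustration: the function name `helix_geometry` is our own, and the numerical values are the global Rise and Twist entries as we read them from Table 2.3, on the assumption that the flattened columns follow the order of the table header.

```python
def helix_geometry(rise, twist):
    """Return (base pairs per turn, pitch in Å) for a regular helix.

    rise  -- global rise per base pair step, in Å
    twist -- global twist per base pair step, in degrees
    """
    if twist == 0:
        raise ValueError("twist must be non-zero for a helix")
    per_turn = 360.0 / abs(twist)
    return per_turn, per_turn * rise


# Global rise (Å) and twist (degrees), assumed column assignment from Table 2.3.
fibres = {
    "B-DNA": (3.38, 36.0),
    "A-DNA": (2.56, 32.7),
    "A-RNA": (2.81, 32.7),
}

for name, (rise, twist) in fibres.items():
    per_turn, pitch = helix_geometry(rise, twist)
    print(f"{name}: {per_turn:.1f} base pairs per turn, pitch {pitch:.1f} Å")
```

With these inputs the B-form gives 10 base pairs per turn, while the A-forms give about 11 base pairs per turn with a shorter pitch, in line with the global rise values discussed above.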
6.2 Irregular conformations
We will now look at some more irregular conformations, beginning with the famous dodecamer CGCGAATTCGCG (35; Protein Data Bank (PDB) entry 1BNA), which clearly revealed base sequence effects within the double helix. Because this oligomer
often serves as a reference, we present a rather complete set of helical parameters in Table 2.4. Figure 2.12 shows a molecular graphic and the simplified representation generated by 'Curves', where the axis is shown, the backbone is represented by a ribbon (passing through P, bisecting C3'-C4' and oriented by the phosphate anionic oxygens), and the bases are replaced by rectangles (completed by a line to their reference points, which touch their Watson-Crick partner and lie on the helical axis in a canonical B conformation).
The analysis of the dodecamer shows the magnitude of sequence-induced variations within crystallographic B-DNAs, typically of the order of 20° for rotational parameters and 1 Å for translational parameters. The kinking of the axis within the dodecamer is visible in the figure and is characterized by the angle formed between successive helical axis vectors, by the tilt and roll angles, and by the DIF values (see Section 5), which measure the overall irregularity of each dinucleotide step. One can also note large propeller values and strong buckling on either side of the central AATT sequence. The centre of the oligomer also shows positive opening values linked to the
Fig. 2.12. Molecular graphic (left) and 'Curves' schematic view (right) of the B-DNA dodecamer d(CGCGAATTCGCG)2 (35).
Table 2.4. Helical analysis of the dodecamer d(CGCGAATTCGCG)2 (35, PDB entry 1BNA) obtained using 'Curves'
(a) Global base pair-axis and intra-base pair parameters
Base pair
X-disp
Y-disp
Inclin
Tip
C1 G2 C3
-0.66 -0.78 -0.54 -0.75 -0.77 -0.74 -0.55 -0.59
5.9 4.3 3.9 4.3
0.6 0.6
-0.9 -4.8 -5.4 -2.5
-0.2 -1.2 -1.1
-1.3 -4.1
0.2
-0.4
G4 A5 A6 T7 T8 C9 G10 C11 G12
-0.42 -0.01 -0.51 -0.12
0.16 0.09 0.01 0.26 0.22 0.16 -0.01 0.00 -0.08 0.28 0.26 0.13
Average
-0.54
0.12
0.1
-4.2 -1.6 -2.3 -1.7 -0.6
1.8 2.3 5.7
Buckle
Propel
Opening
3.7
-14.4 -10.6 -3.9 -11.7 -18.2 -19.6 -18.4 -19.7 -19.3 -6.2 -19.8
-3.0 -3.0 -1.6
-4.5 -7.5 10.1
4.7 3.2 0.6
-1.7 -10.8
2.4
-3.9
7.2 0.3
0.5
-13.4
0.4 3.7 7.3
10.0 2.8 0.7 0.4
-5.0 -2.5
0.8
(b) Global and local inter-base pair parameters
Duplex
Rise Til
C1/G2 G2/C3 C3/G4 G4/A5 A5/A6 A6/T7 T7/T8 T8/C9 C9/G10 G10/C11 C11/G12
3.49 -0. 3.08 1. 3.36 -5. 3.33 -2. 3.39 1. 3.36 3. 3.39 2. 3.23 -0. 3.62 -3. 3.22 1.
Average
3.37 -0.3
3.56 -3.
t
Roll
Twist
L-Rise
L-Tilt
2
2.3 -8.0 7.1 0.8 2.0 -0.5 2.3 -0.1 4.9 -13.1 -2.1
42.7 36.0 27.6 39.8 35.2 34.6 35.1 38.1 32.2 38.6 34.8
3.62 3.48 3.14 3.40 3.32 3.31 3.33 3.38 3.28 3.57 3.19
-3.4
-0.4
35.9
3.37
2 8 0 3 2 9 5 7 3 2 7
(c) Axis bend and DIF (dimeric irregularity function)
Duplex
Angle
Diff
C1/G2 G2/C3 C3/G4 G4/A5 A5/A6 A6/T7 T7/T8 T8/C9 C9/G10 G10/C11 C11/G12
2.7 3.2 4.8 1.5 2.1 2.4 1.1 0.6 1.5 6.5 1.3
0.55 0.54 1.28 0.48 0.25 0.43 0.22 0.34 1.69 2.06
1.51
L-Roll
L-Twist
3.3
-13.2 -3.0
42.8 36.1 26.5 40.0 35.3 34.6 35.3 39.0 31.1 38.9 34.4
-0.2
-0.3
35.8
6.0
1.0 3.2
-5.3
2.1 2.9 0.8
-3.3
-3.1 -0.7
-2.9 -5.2
9.0 2.1 0.7 0.1
-0.8
4.8
decreased minor groove width. Lastly, note that since this structure is clearly a B-DNA, there are only small differences between local and global parameters.
The same is not true if we move away from the B domain, for example with the hybrid decamer d(GGGTATACGC):r(GCG)d(TATACCC) (56, PDB entry 1OFX). The conformation of this oligomer is shown in Fig. 2.13 and analysed in Table 2.5. It is globally closer to an A conformation, with a strong negative Xdisp and strong positive inclination for all but the first two base pairs. Less obvious is the kink within the structure, which is largely concentrated at the T4-A5 step following the 5'-hybrid:DNA-3' junction. Note that such junctions are now generally thought (57) to be more perturbing than 5'-DNA:hybrid-3' junctions. The global and local inter-base pair parameters for this structure differ significantly, notably in describing the A-like form, which again leads, in the local analysis, to negative slide, positive roll, and increased rise. One can also note that the T4-A5 kink (towards the major groove) is assimilated into a very strong local roll.
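The axis kinks discussed for both oligomers are described in part by the angle formed between successive helical axis vectors. The following sketch shows that single geometric step, assuming the axis directions at consecutive base pair steps are already available as three-component vectors. It is not the 'Curves' algorithm itself, and the names `bend_angle` and `bends`, as well as the toy axis, are hypothetical.

```python
import math


def bend_angle(u, v):
    """Angle in degrees between two axis direction vectors of any length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("axis vectors must be non-zero")
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


def bends(axis_vectors):
    """Bend angle at every junction between successive axis vectors."""
    return [bend_angle(u, v) for u, v in zip(axis_vectors, axis_vectors[1:])]


# Hypothetical axis: two collinear segments followed by a 10 degree kink.
kink = math.radians(10.0)
axis = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, math.sin(kink), math.cos(kink))]
print([round(angle, 1) for angle in bends(axis)])
```

A straight axis gives angles close to zero at every junction, so a kink shows up as an isolated large value, much as in the axis bend column of Table 2.4.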
Fig. 2.13. Molecular graphic (left) and 'Curves' schematic view (right) of a hybrid d(GGGTATACGC):r(GCG)d(TATACCC) oligomer (56).
Table 2.5. Helical analysis of the hybrid decamer d(GGGTATACGC):r(GCG)d(TATACCC) (56, PDB entry 1OFX) obtained using 'Curves'

(a) Selected base pair-axis and intra-base pair parameters

Base pair   X-disp   Inclination   Buckle   Propel   Opening
G1          -4.53        6.2         -2.0     -4.8     -1.7
G2          -4.21        8.0         -9.1    -10.5      3.1
G3          -4.42       11.7        -11.3    -19.3      2.7
T4          -4.54       12.8          8.1     -9.9      1.9
A5          -4.27       14.6          2.1    -16.2      7.3
T6          -3.97       13.6          6.7    -23.8     12.2
A7          -4.02       10.9         13.0    -21.0      3.6
C8          -4.17       12.3         18.2    -14.6      0.6
G9          -4.20       13.3         -5.2    -14.7     -1.8
C10         -3.96       11.9         -4.3     10.6      0.4
Average     -4.23       11.5          1.6    -12.4      2.8

(b) Global and local inter-base pair parameters

Duplex    Slide   Rise   Roll   Twist   L-Slide   L-Rise   L-Roll   L-Twist
G1/G2     -0.25   3.23    1.2    38.5    -2.75     3.56      5.7     38.6
G2/G3      0.01   2.98    1.7    34.2    -2.01     3.39      7.4     34.3
G3/T4     -0.41   2.57   -2.3    22.4    -1.57     2.92      2.7     20.7
T4/A5      0.20   2.11   14.9    33.0    -1.56     3.55     22.3     29.5
A5/T6      0.01   2.94   -1.1    30.8    -1.39     3.41      7.1     29.3
T6/A7      0.17   2.26    8.0    35.3    -1.61     3.11     15.4     32.5
A7/C8     -0.67   3.07   -2.6    27.9    -2.00     3.39      3.1     27.9
C8/G9     -0.07   3.00    8.0    31.6    -1.55     3.91     14.6     31.0
G9/C10    -0.29   3.51   -6.0    32.4    -1.80     3.80      1.0     31.7
Average   -0.15   2.85    2.4    31.8    -1.80     3.45      8.8     30.6
It is also interesting to consider a very perturbed duplex, d(CGCAGAATTCGCG)2, which contains both bulges and opened base pairs (58, PDB entry 1D31). Both A4 adenines in this oligomer are bulged bases, but while A4 in the first strand is excluded from the helix, the equivalent base in the second strand maintains stacking. Two such oligomers interact head-to-tail in the crystal, the excluded adenine filling the space opposite the stacked adenine. In addition, the terminal G:C pair is disrupted and the bases point outwards (Fig. 2.14). Using the options in 'Curves', it is possible to ignore the positions of the excluded A4 and the two opened bases during the global helical analysis, revealing a more or less straight helical axis and typical B-DNA helical parameters (⟨Xdisp⟩ = 0.6 Å, ⟨Ydisp⟩ = 0.8 Å, ⟨Incl⟩ = 0.9°, ⟨Rise⟩ = 3.4 Å, ⟨Twist⟩ = 36.6°). However, it is also possible to characterize the local deformations, by showing, for example, that the stacked and unstacked bulge sites both have similar twist values (40.5 and 39.7°), but clearly differ in rise (3.8 and 6.5 Å). The disrupted base pair can
also be fully characterized, notably by a spectacular opening of 208°. Note that a local parameter analysis is not well adapted to dealing with structures containing major local deformations.
Fig. 2.14. Molecular graphic (left) and 'Curves' schematic view (right) of a B-DNA oligomer d(CGCAGAATTCGCG)2 containing two adenine bulges and opened bases (58).
Fig. 2.15. Molecular graphic (left) and 'Curves' schematic view (right) of the tetraplex (TTGGGGT)4 containing both G4 and T4 tetrads (59).
We finally consider a multistranded conformation (Fig. 2.15). A parallel-stranded tetraplex, d(TTGGGGT)4, containing both T and G tetrads, has been chosen for this purpose (59, PDB entry 201D). This structure has a rather regular overall conformation, with bases displaced roughly 2.9 Å towards the minor groove side. Twist angles vary from 24 to 36°, with the largest value between the first and second G tetrads. It should be noted that Hoogsteen base pairing leads to unusual base-base parameters (e.g. G tetrads have ⟨Shear⟩ = -5.8 Å, ⟨Stretch⟩ = 2.8 Å and ⟨Opening⟩ = -90°), but this does not perturb the inter-base pair parameters or the characterization of tetrad deformations, such as the buckling of the terminal T tetrads (roughly -18° for the first tetrad, versus roughly 37° for the last tetrad).
Today, more and more irregular DNA conformations are being analysed within protein:DNA complexes. Since the deformations induced by protein binding can be
severe, it is often difficult to understand their nature without the help of a detailed helical analysis. The interested reader might look at the example provided by the complex between DNA and the TATA box-binding protein (60,61). A visual inspection of this complex shows DNA to be very significantly bent away from the protein and also helically unwound, but a global helical parameter analysis goes further in revealing a new, virtually regular, helical conformation in the protein-bound region (62). The new conformation shows a striking resemblance to A-DNA, differing only in strongly, positively inclined base pairs resulting from less negative glycosidic torsions.
Fig. 2.16. An example of the output from 'Dials and Windows' showing the temporal fluctuations of selected helical and backbone parameters within a B-DNA oligomer. The time axis points upwards within the rectangular diagrams and is radial for the circular diagrams. The data shown cover an 850 ps simulation in water (67). See also p. 70.
7. Analysing nucleic acid dynamics
Dynamic conformational information is available from both experimental and theoretical studies (Chapter 4), and treating this type of information poses a number of new problems. The principal problem in analysing the trajectories that result from molecular dynamics simulations is that a very large mass of information must be made readable. Nucleic acid simulations are today generally carried out in water (typically in boxes containing roughly 5000 water molecules) and often last for one or more nanoseconds. Since structures are typically saved about every 0.5 ps, this means that the complete trajectory is represented by roughly 2000 sets of coordinates and velocities, each of which contains roughly 15 000 atoms. With double-precision data this represents about 1.5 GB of information to be processed. Although the mass of data is not the same, NMR studies can also lead to a large number of structures compatible with the spectroscopic measurements, whose dispersion contains valuable information on the dynamics of the molecule in solution (Chapter 8). Although not representing a time series, such data also need to be presented in a comprehensible fashion.
A first step in the analysis is to use molecular graphics to generate optimally superposed structures or an animation of the time evolution. This gives a good overall impression of the conformational changes taking place during the trajectory, but is not adapted to extracting any quantitative information. Such information can be obtained by plotting the time evolution of individual conformational variables such as backbone angles, interatomic distances, sugar puckers, or helicoidal parameters, but there are a very large number of such variables. To overcome this problem, the 'Dials and Windows' program developed at Wesleyan University (63) provides a compact representation that plots time series using rectangular 'windows' for translational variables and circular 'dials' for rotational variables, with the time axis running vertically upwards in the former case and radially outwards in the latter. (This has the advantage of avoiding problems owing to the cyclic nature of torsion angles, but the disadvantage that variations in parameters occurring early in the trajectory are somewhat compressed.) Reference values, typically for the A- and B-forms of DNA, enable the range of variations to be estimated. An example of the output from 'Dials and Windows' is shown in Fig. 2.16. This output can be adjusted interactively on a graphics workstation and enables many elements of an entire trajectory to be viewed on a single page. Although 'Dials and Windows' uses 'Curves' to obtain the helicoidal parameters and axis variables, similar techniques can be used with any other analysis approach and many authors have developed their own programs.
On a more global level, a number of measurements can be useful, such as axis bending, persistence length (64), or rms differences with known conformations (65). In the latter case, it is particularly informative to follow the evolution of a trajectory with respect to two or more reference conformations. This sort of triangulation often gives very good insight into the basic nature of a complex conformational pathway. A useful extension of rms calculations consists of building a two-dimensional matrix where every conformation saved along the trajectory is compared with every other conformation (66,67). The results can be represented in terms of shading, with darker squares referring to smaller rms differences (Fig. 2.17). One would expect such a plot
Fig. 2.17. A two-dimensional rms plot showing that a conformational state is visited twice during the 850 ps simulation of a B-DNA oligomer (67). Structures along the trajectory (separated by 10 ps) are compared with one another. Darker shading indicates smaller rms values (with white areas corresponding to all values > 2.2 Å).
to show a dark band close to the diagonal, since neighbouring points along the trajectory will generally be related to one another, but if, in addition, off-diagonal dark zones appear, there is clear evidence for the reoccurrence of a given conformation (which may be taken as evidence for the existence of a conformational substate).
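The two-dimensional rms comparison can be sketched in a few lines. The example below is a minimal illustration, not the procedure used in refs 66 and 67: it assumes that all saved structures have already been optimally superposed (no fitting is performed), and the function names `rmsd`, `rms_matrix`, and `revisits`, together with the toy two-atom trajectory, are hypothetical.

```python
import math


def rmsd(a, b):
    """Root-mean-square difference between two coordinate sets.

    Both structures are lists of (x, y, z) tuples with the same atom order,
    and are assumed to be already superposed.
    """
    if len(a) != len(b):
        raise ValueError("structures must contain the same number of atoms")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(a, b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(a))


def rms_matrix(frames):
    """Symmetric matrix comparing every saved frame with every other frame."""
    n = len(frames)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(frames[i], frames[j])
    return matrix


def revisits(matrix, cutoff, min_gap):
    """Frame pairs at least min_gap apart whose rms difference is below cutoff.

    These correspond to the off-diagonal dark zones of the shaded plot.
    """
    n = len(matrix)
    return [(i, j) for i in range(n) for j in range(i + min_gap, n)
            if matrix[i][j] < cutoff]


# Toy trajectory of two-atom "structures": state A, state B, state B, state A.
state_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
state_b = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
frames = [state_a, state_b, state_b, state_a]

matrix = rms_matrix(frames)
print(revisits(matrix, cutoff=0.5, min_gap=2))  # [(0, 3)]
```

In the toy case the first and last frames are identical, so the pair (0, 3) is reported as a revisited state, whereas neighbouring frames are deliberately excluded by `min_gap` because they form the expected dark band along the diagonal.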
8. Conclusions
This chapter has summarized present approaches to describing the structure of nucleic acids. The enormous growth in the variety of such structures has posed a number of problems that are not yet completely solved. At the simplest level of description (number of strands, strand direction, syn/anti conformation, and base pairing) some order has been introduced, but challenges to this order continue to appear, such as the
triad DNA proposed by Kuryavyi and Jovin (68) or the related 'adenosine platforms' (69) found in RNA. At the level of helical conformation, although a number of different analysis schemes still coexist, there is a better level of understanding of the meaning of helical parameters and, in particular, of the differences between local and global parameters. Non-helical structures, however, continue to pose problems and more work is needed on loops, multi-arm junctions, and the range of baroque architectures of RNA. Lastly, while some useful steps have been made in the analysis of dynamic data, problems persist and are particularly challenging for the organizers of structural databases.
References
1. Saenger, W. (1984) Principles of Nucleic Acid Structure. Springer-Verlag, New York.
2. Helene, C. and Toulme, J.J. (1990) Biochim. Biophys. Acta 1049, 99.
3. Sun, J.S., Garestier, T. and Helene, C. (1996) Curr. Opin. Struct. Biol. 6, 327.
4. Leroy, J.L. and Gueron, M. (1995) Structure 3, 101.
5. Taylor, R. and Kennard, O. (1982) J. Mol. Struct. 78, 1.
6. Jiang, S.-P., Raghunathan, G., Ting, K.-L., Xuan, J.C. and Jernigan, R.L. (1994) J. Biomol. Struct. Dynamics 12, 367.
7. Jost, J.P. and Saluz, H.P. (eds) (1993) DNA Methylation: Molecular Biology and Biological Significance. Birkhauser, Basel.
8. Yanson, I.K., Teplitsky, A.B. and Sukhodub, L.F. (1979) Biopolymers 18, 1149.
9. Williams, L., Chawla, B. and Shaw, B. (1987) Biopolymers 26, 591.
10. Watson, J.D. and Crick, F.H.C. (1953) Nature 171, 964.
11. Morgan, A.R. (1993) TIBS 18, 160.
12. Hartmann, B. and Lavery, R. (1996) Quart. Rev. Biophys. 29, 309.
13. Leonard, G.A., Thomson, J., Watson, W.P. and Brown, T. (1990) Proc. Natl. Acad. Sci. USA 87, 9573.
14. Kennard, O. (1985) J. Biomol. Struct. Dynamics 3, 205.
15. Jaishree, T.N. and Wang, A.H.J. (1993) Nucl. Acids Res. 16, 3839.
16. Rippe, K. and Jovin, T.M. (1992) Meth. Enzymol. 211, 199.
17. Aboul-ela, F., Murchie, A.I.H. and Lilley, D.M.J. (1992) Nature 360, 280.
18. Kang, C., Zhang, X., Ratliff, R., Moyzis, R. and Rich, A. (1992) Nature 356, 126.
19. Smith, F.W. and Feigon, J. (1992) Nature 356, 164.
20. Kettani, A., Kumar, R.A. and Patel, D.J. (1995) J. Mol. Biol. 254, 638.
21. Lebrun, A. and Lavery, R. (1996) J. Biomol. Struct. Dynamics 13, 459.
22. Rose, I.A., Hanson, K.R., Wilkinson, K.D. and Wimmer, M.J. (1980) Proc. Natl. Acad. Sci. USA 77, 2439.
23. Harvey, S.C. (1983) Nucl. Acids Res. 11, 4867.
24. Lavery, R., Zakrzewska, K., Sun, J.S. and Harvey, S.C. (1992) Nucl. Acids Res. 20, 5011.
25. Westhof, E. (1992) Nature 358, 459.
26. Wang, A.H.J., Quigley, G.J., Kolpak, F.J., Crawford, J.L., van Boom, J.H., van der Marel, G. and Rich, A. (1979) Nature 282, 680.
27. Morvan, F., Rayner, B., Imbach, J.L., Chang, D.K. and Lown, J.W. (1987) Nucl. Acids Res. 15, 4241.
28. Sun, J.S., Francois, J.-C., Lavery, R., Saison-Behmoaras, T., Montenay-Garestier, T., Thuong, N.T. and Helene, C. (1988) Biochemistry 27, 6039.
29. Arnott, S. and Selsing, E. (1974) J. Mol. Biol. 88, 509.
30. Broitman, S.L., Im, D.D. and Fresco, J.R. (1987) Proc. Natl. Acad. Sci. USA 84, 5120.
31. Pilch, D.S., Levensen, C. and Shafer, R.H. (1991) Biochemistry 30, 6081.
32. Sun, J.S., Mergny, J.-L., Lavery, R., Montenay-Garestier, T. and Helene, C. (1991) J. Biomol. Struct. Dynamics 9, 411.
33. Rosenberg, J.M., Seeman, N.C., Day, R.O. and Rich, A. (1976) Biochem. Biophys. Res. Commun. 69, 979.
34. Arnott, S. (1970) Progr. Biophys. Mol. Biol. 21, 265.
35. Drew, H.R., Wing, R.M., Takano, T., Broka, C., Tanaka, S., Itakura, K. and Dickerson, R.E. (1981) Proc. Natl. Acad. Sci. USA 78, 2179.
36. Dickerson, R.E., Bansal, M., Calladine, C.R., Diekmann, S., Hunter, W.N., Kennard, O., Lavery, R., Nelson, H.C.M., Olson, W.K., Saenger, W., Shakked, Z., Sklenar, H., Soumpasis, D.M., Tung, C.-S., von Kitzing, E., Wang, A.H.-J. and Zhurkin, V.B. (1989) J. Mol. Biol. 205, 787.
37. Lavery, R. and Sklenar, H. (1990) in Structure and Methods, Vol. 2, DNA Protein Complexes and Proteins (Sarma, R.H. and Sarma, M.H., eds), p. 412. Adenine Press, New York.
38. Fratini, A.V., Kopka, M.L., Drew, H.R. and Dickerson, R.E. (1982) J. Biol. Chem. 257, 14686.
39. von Kitzing, E. and Diekmann, S. (1987) Eur. Biophys. J. 15, 13.
40. Soumpasis, D.M., Tung, C.-S. and Garcia, A.E. (1991) J. Biomol. Struct. Dynamics 8, 867.
41. Bhattacharyya, D. and Bansal, M. (1989) J. Biomol. Struct. Dynamics 6, 635.
42. Lavery, R. and Sklenar, H. (1989) J. Biomol. Struct. Dynamics 6, 655.
43. Babcock, M.S., Pednault, E.P.D. and Olson, W.K. (1994) J. Mol. Biol. 237, 125.
44. Babcock, M.S. and Olson, W.K. (1994) J. Mol. Biol. 237, 98.
45. El Hassan, M.A. and Calladine, C.R. (1995) J. Mol. Biol. 251, 648.
46. Calladine, C.R. and Drew, H.R. (1984) J. Mol. Biol. 178, 773.
47. Goodsell, D.S. and Dickerson, R.E. (1994) Nucl. Acids Res. 22, 5497.
48. Stofer, E. and Lavery, R. (1993) Biopolymers 34, 337.
49. Elgavish, T. and Harvey, S.C. (1998) in preparation.
50. Arnott, S. and Hukins, D.W.L. (1972) Biochem. Biophys. Res. Commun. 47, 1504.
51. Arnott, S., Chandrasekaran, R., Birdsall, D.L., Leslie, A.G.W. and Ratliff, R.L. (1980) Nature 283, 743 (and coordinates communicated to our laboratory by S. Arnott).
52. Arnott, S. and Hukins, D.W.L. (1973) J. Mol. Biol. 81, 93.
53. Chandrasekaran, R. and Arnott, S. (1996) J. Biomol. Struct. Dynamics 13, 1015.
54. Arnott, S., Hukins, D.W.L., Dover, S.D., Fuller, W. and Hodgson, A.R. (1973) J. Mol. Biol. 81, 107.
55. Wang, A.H.-J., Quigley, G.J., Kolpak, F.J., van der Marel, G., van Boom, J.H. and Rich, A. (1981) Science 211, 171.
56. Egli, M., Usman, N., Zhang, S. and Rich, A. (1992) Proc. Natl. Acad. Sci. USA 89, 534.
57. Nishizaki, T., Iwai, S., Ohkubo, T., Kojima, C., Nakamura, H., Kyogoku, Y. and Ohtsuka, E. (1996) Biochemistry 35, 4016.
58. Joshua-Tor, L., Frolow, F., Appella, E., Hope, H., Rabinovich, D. and Sussman, J.L. (1992) J. Mol. Biol. 225, 397.
59. Wang, Y. and Patel, D.J. (1995) J. Mol. Biol. 251, 76.
60. Kim, Y., Geiger, J.H., Hahn, S. and Sigler, P.B. (1993) Nature 365, 512.
61. Kim, J.L., Nikolov, D.B. and Burley, S.K. (1993) Nature 365, 520.
62. Guzikevich-Guerstein, G. and Shakked, Z. (1996) Nature Struct. Biol. 3, 32.
63. Ravishankar, G., Swaminathan, S., Beveridge, D.L., Lavery, R. and Sklenar, H. (1989) J. Biomol. Struct. Dynamics 6, 669.
64. Prevost, C., Louise-May, S., Ravishankar, G., Lavery, R. and Beveridge, D.L. (1992) Biopolymers 33, 335.
65. Goodfellow, J.M., de Souza, O.N., Parker, K. and Cruzeiro-Hansson, L. (1993) in Computer Simulation of Biomolecular Systems, Vol. 2 (van Gunsteren, W.F., Weiner, P.K. and Wilkinson, A.J., eds), p. 483. Escom, Leiden.
66. McConnell, K.J., Nirmala, R., Young, M.A., Ravishankar, M.A. and Beveridge, D.L. (1994) J. Am. Chem. Soc. 116, 4461.
67. Flatters, D., Young, M.A., Beveridge, D.L. and Lavery, R. (1997) J. Biomol. Struct. Dynamics 14, 757.
68. Kuryavyi, V.V. and Jovin, T.M. (1995) Nature Genetics 9, 339.
69. Cate, J.H., Gooding, A.R., Podell, E., Zhou, K., Golden, B.L., Szewczak, A.A., Kundrot, C.E., Cech, T.R. and Doudna, J.A. (1996) Science 273, 1696.
3 The Nucleic Acid Database: a research and teaching tool
Helen M. Berman, Christine Zardecki, and John Westbrook
Department of Chemistry, Rutgers University, Piscataway, NJ 08854-8087
'-endo range. Also, the base pair stacking resembles that of A-DNA, while the intrastrand phosphate-phosphate distance (average 6.5 Å), the rise per residue (average 3.2 Å), and the inclination (average 2.6°) are reminiscent of the B-form. Analogously, the A-form duplex d(CTCTAGAG) (115) resembles the DNA structure in the trp repressor-operator complex (116). These analyses point out the difficulty of assigning a DNA, in complex with a protein, to a pure structural class and emphasize the apparent structural continuum of helical conformations. In the recently determined crystal structures of the TATA box-binding protein complexed to different TATA box elements (19,20), the DNA is found to be highly distorted, and resembles an A-DNA (short intrastrand phosphate-phosphate distances, low twist angles, high roll) with a very large inclination (average about 50°) (21). It is also intriguing that the founding member of the zinc finger family, TFIIIA, can recognize both the 5S RNA gene and the 5S RNA molecule, and that its cognate DNA element crystallizes in the A-form (65).
The deformations detected in the duplexes of A-DNA crystal structures owing to lattice forces may also be achieved, with even greater severity, when ligands, like proteins, interact at particular sequences. The structure of the CCGG tetramer sequence, a restriction endonuclease recognition site, suggested a possible identification scheme based on such DNA deformability (11). The corresponding enzyme(s) could make out the overall A-form structure and, subsequent to binding, 'test' the deformability of the sequence, which could be followed by introduction of the cuts.
10. Comparison with solution studies
While the A-form is a commonly adopted conformation in crystals, the majority of the duplexes convert to the B-form in aqueous environments (2,62), as shown, for
example, for a DNA octamer (113) or the TFIIIA binding site (65,117). The belief that B-DNA constitutes the biologically 'active' form of DNA is therefore strengthened by solution studies but not necessarily by single-crystal work. However, certain sequences, like poly(dG):poly(dC), clearly prefer the A-form even in solution (62), confirming X-ray crystallographic studies of GC-rich oligomers (60). It has also been suggested that the helical differences in solution are more subtle than in the crystal and that intermediate forms may exist (55). Possibly, there are also equilibria between different helical forms in solution (2,3). Furthermore, it has been reported that long DNA sequences can exhibit the A-form in solution in the presence of certain cations (63). In some cases it is found that cobalt hexammine is an essential ingredient promoting the formation of A-DNA crystals (12). Similarly, we may expect certain counterions in vivo to favour the A-form. It should also be kept in mind that the cell nucleus is akin to a 'semicrystalline' environment, with highly compacted DNA, reminiscent of tightly packed A-DNA crystals.
Even though DNA may commonly adopt the B-form in solution, the A-form is the natural conformation of RNA and chimeric duplexes. While a large body of structural data is available for A-DNA, far fewer chimera and RNA structures have been determined to date, mostly because of initial problems with their synthesis and purification. It is therefore of interest to see whether the lessons learned from A-DNAs can be transferred to RNA and chimeric molecules.
11. Conclusions
Our understanding of the biophysics of nucleic acids has been enhanced by crystal structure analyses of oligonucleotides. The right-handed A-DNAs comprise a structural family separated from the B-like cluster by the bimodal distribution of the sugar pucker (C3'-endo in A-DNA, C2'-endo in B-DNA), as well as their large x-displacement and the inclination of the base pairs. In other words, they constitute a continuum of right-handed duplex conformations, i.e. the same calf thymus DNA fibre can be interconverted between the A- and B-forms, depending on the relative humidity. The existence of this structural continuum suggests that A-DNA represents a biologically relevant conformation, although transcriptionally silent. The A-form is characterized by a more hydrophobic shallow groove compared with B-DNA. Therefore, a DNA ligand, such as a protein, that demands such a groove architecture for complex formation could evoke a switch from B- to A-DNA. A-DNA also seems to be favoured over B-DNA in certain water-deficient crystalline environments. Genomic DNA must be intricately folded and compacted in order to fit into the tight space provided by eukaryotic nuclei. It is therefore not unreasonable to consider the aggregate state of this DNA as paracrystalline, in which the DNA may favour the A-form at particular sequences, possibly with packing similar to that observed in A-DNA crystals. Therefore, the observed sequence and packing effects in the crystal structures may have biological relevance.
Acknowledgements
Support from the National Institutes of Health (NIH grants GM-17378 and GM-49547), the Endowment from an Ohio Regents Eminent Scholar Chair, and an OSU Presidential Fellowship (to M.C.W.) are gratefully acknowledged.
References
1. Franklin, R.E. and Gosling, R.G. (1953) Acta Cryst. 6, 673.
2. Frederick, C.A., Quigley, G.J., Teng, M.K., Coll, M., van der Marel, G.A., van Boom, J.H., Rich, A. and Wang, A.H.J. (1989) Eur. J. Biochem. 181, 295.
3. Kennard, O. and Hunter, W.N. (1989) Biophys. J. 22, 327.
4. Wahl, M.C. and Sundaralingam, M. (1997) Biopolymers 44, 45.
5. Fuller, W., Wilkins, M.H.F., Wilson, H.R. and Hamilton, L.D. (1965) J. Mol. Biol. 12, 60.
6. Arnott, S., Dover, S.D. and Wonacott, A.J. (1969) Acta Cryst. B25, 2192.
7. Arnott, S. and Hukins, D.W.L. (1972) Biochem. Biophys. Res. Commun. 47, 1504.
8. Sundaralingam, M. and Ban, C. (1993) In Aspects of Crystallography in Molecular Biology, New Age International Limited, Publishers, New Delhi, India.
9. Shakked, Z., Rabinovich, D., Cruse, W.B., Egert, E., Kennard, O., Sala, G., Salisbury, S.A. and Viswamitra, M.A. (1981) Proc. R. Soc. (London) B213, 479.
10. Wang, A.H.-J., Fujii, S., van Boom, J.H. and Rich, A. (1982) Proc. Natl. Acad. Sci. USA 79, 3968.
11. Conner, B.N., Yoon, C., Dickerson, J.L. and Dickerson, R.E. (1984) J. Mol. Biol. 174, 663.
12. Bingman, C.A., Zon, G. and Sundaralingam, M. (1992) J. Mol. Biol. 227, 738.
13. Verdaguer, N., Aymami, J., Fernandez-Forner, D., Fita, I., Huynh-Dinh, T., Igolen, J. and Subirana, J.A. (1991) J. Mol. Biol. 221, 623.
14. Mooers, B.H., Schroth, G.P., Baxter, W.W. and Ho, P.S. (1995) J. Mol. Biol. 249, 772.
15. O'Brien, E.J. and MacEwan, A.W. (1970) J. Mol. Biol. 48, 243.
16. Arnott, S., Hukins, D.W.L., Dover, S.D., Fuller, W. and Hodgson, A.R. (1973) J. Mol. Biol. 81, 107.
17. Haran, T.E., Shakked, Z., Wang, A.H.-J. and Rich, A. (1987) J. Biomol. Struct. Dynamics 5, 199.
18. Wahl, M.C. and Sundaralingam, M. (1995) Curr. Opin. Struct. Biol. 5, 282.
19. Kim, Y., Geiger, J.H., Hahn, S. and Sigler, P.B. (1993) Nature 365, 512.
20. Kim, J.L., Nikolov, D.B. and Burley, S.K. (1993) Nature 365, 520.
21. Guzikevich-Guerstein, G. and Shakked, Z. (1996) Nature Struct. Biol. 3, 32.
22. Sundaralingam, M. (1973) Jerus. Symp. Quant. Chem. Biochem. 5, 417.
23. Yathindra, N. and Sundaralingam, M. (1973) Biopolymers 12, 297.
24. Wang, A.H.J., Fujii, S., van Boom, J.H., van der Marel, G.A., van Boeckel, S.A. and Rich, A. (1982) Nature 299, 601.
25. Egli, M., Usman, N., Zhang, S. and Rich, A. (1992) Proc. Natl. Acad. Sci. USA 89, 534.
26. Egli, M., Usman, N. and Rich, A. (1993) Biochemistry 32, 3221.
27. Ban, C., Ramakrishnan, B. and Sundaralingam, M. (1994) J. Mol. Biol. 236, 275.
28. Ban, C., Ramakrishnan, B. and Sundaralingam, M. (1994) Nucl. Acids Res. 22, 5466.
29. Horton, N.C. and Finzel, B.C. (1996) J. Mol. Biol. 264, 521.
30. Nunn, C.M. and Neidle, S. (1996) J. Mol. Biol. 256, 340.
31. Ramakrishnan, B. and Sundaralingam, M. (1993) Biochemistry 32, 11458.
32. Jain, S. and Sundaralingam, M. (1989) J. Biol. Chem. 264, 12720.
33. Shakked, Z., Guerstein-Guzikevich, G., Eisenstein, M., Frolow, F. and Rabinovich, D. (1989) Nature 342, 456.
34. Ramakrishnan, B. and Sundaralingam, M. (1993) J. Biomol. Struct. Dynamics 11, 11.
35. Heinemann, U. (1991) J. Biomol. Struct. Dynamics 8, 801.
36. Jain, S., Zon, G. and Sundaralingam, M. (1991) Biochemistry 30, 3567.
37. Ramakrishnan, B. and Sundaralingam, M. (1993) J. Mol. Biol. 231, 431.
38. Dock-Bregeon, A.C., Chevrier, B., Podjarny, A., Johnson, J., de Bear, J.S., Gough, G.R., Gilham, P.T. and Moras, D. (1989) J. Mol. Biol. 209, 459.
39. Tippin, D.B. and Sundaralingam, M. (1996) Acta Cryst. D52, 997.
40. Michel, F. and Westhof, E. (1990) J. Mol. Biol. 216, 585.
41. Jaeger, L., Michel, F. and Westhof, E. (1994) J. Mol. Biol. 236, 1271.
42. Pley, H.W., Flaherty, K.M. and McKay, D.B. (1994) Nature 372, 111.
43. Dickerson, R.E., Goodsell, D.S. and Neidle, S. (1994) Proc. Natl. Acad. Sci. USA 91, 3579.
44. Sponer, J. and Kypr, J. (1991) J. Mol. Biol. 221, 761.
45. Shakked, Z., Rabinovich, D., Kennard, O., Cruse, W.B., Salisbury, S.A. and Viswamitra, M.A. (1983) J. Mol. Biol. 166, 183.
46. Calladine, C.R. (1982) J. Mol. Biol. 161, 343.
47. Dickerson, R.E. (1983) J. Mol. Biol. 166, 419.
48. Fratini, A.V., Kopka, M.L., Drew, H.R. and Dickerson, R.E. (1982) J. Biol. Chem. 257, 14686.
49. Lavery, R. and Sklenar, H. (1988) J. Biomol. Struct. Dynamics 6, 63.
50. Bhattacharyya, D. and Bansal, M. (1990) J. Biomol. Struct. Dynamics 8, 539.
51. Babcock, M.S., Pednault, E.P.D. and Olson, W. (1993) J. Biomol. Struct. Dynamics 11, 597.
52. Sponer, J. and Kypr, J. (1990) in Theoretical Biochemistry and Molecular Biophysics, Vol. 1: DNA (Beveridge, D.L. and Lavery, L., eds), Vol. 1, pp. 271-284. Adenine Press, New York.
53. Thota, N., Li, X.H., Bingman, C. and Sundaralingam, M. (1993) Acta Cryst. D49, 282.
54. Olson, W.K. (1982) Nucl. Acids Res. 10, 777.
55. Heinemann, U., Lauble, H., Frank, R. and Blocker, H. (1987) Nucl. Acids Res. 15, 9531.
56. Lauble, H., Frank, R., Blocker, H. and Heinemann, U. (1988) Nucl. Acids Res. 16, 7799.
57. Jain, S., Zon, G. and Sundaralingam, M. (1987) J. Mol. Biol. 197, 141.
58. Heinemann, U., Alings, C. and Bansal, M. (1992) EMBO J. 11, 1931.
59. Arnott, S. and Selsing, E. (1974) J. Mol. Biol. 88, 551.
60. McCall, M., Brown, T. and Kennard, O. (1985) J. Mol. Biol. 183, 385.
61. Langridge, R. (1969) J. Cell. Physiol. 74 (Suppl. 1), 1.
62. Benevides, J.M., Wang, A.H.J., Rich, A., Kyogoku, Y., van der Marel, G.A., van Boom, J.H. and Thomas, G.J., Jr. (1986) Biochemistry 25, 41.
63. Robinson, H. and Wang, A.H.-J. (1996) Nucl. Acids Res. 24, 676.
64. Pavletich, N.P. and Pabo, C.O. (1993) Science 261, 1701.
65. McCall, M., Brown, T., Hunter, W.N. and Kennard, O. (1986) Nature 322, 661.
66. Steitz, T.A. (1990) Q. Rev. Biophys. 23, 205.
67. Rabinovich, D., Haran, T., Eisenstein, M. and Shakked, Z. (1988) J. Mol. Biol. 200, 151.
68. Doucet, J., Benoit, J.-P., Cruse, W.B.T., Prange, T. and Kennard, O. (1989) Nature 337, 190.
69. Selsing, E., Wells, R.D., Alden, C.J. and Arnott, S. (1978) J. Biol. Chem. 254, 5417.
70. Wahl, M.C., Rao, S.T. and Sundaralingam, M. (1996) Biophys. J. 71, 2857.
71. Bingman, C., Li, X., Zon, G. and Sundaralingam, M. (1992) Biochemistry 31, 12803.
72. Bingman, C., Jain, S., Zon, G. and Sundaralingam, M. (1992) Nucl. Acids Res. 20, 6637.
73. Ban, C., Ramakrishnan, B. and Sundaralingam, M. (1996) Biophys. J. 71, 1215.
74. Ban, C. and Sundaralingam, M. (1996) Biophys. J. 71, 1222.
75. Tippin, D.B., Ramakrishnan, B. and Sundaralingam, M. (1996) J. Mol. Biol. 270, 247.
76. Wang, A.H.J., Quigley, G.J., Kolpak, F.J., Crawford, J.L., van Boom, J.H., van der Marel, G. and Rich, A. (1979) Nature 282, 680.
77. Behe, M. and Felsenfeld, G. (1981) Proc. Natl. Acad. Sci. USA 78, 1619.
78. Fujii, S., Wang, A.H.-J., van der Marel, G.A., van Boom, J.H. and Rich, A. (1982) Nucl. Acids Res. 10, 7879.
79. Heinemann, U., Rudolph, L.N., Alings, C., Morr, M., Heikens, W., Frank, R. and Blocker, H. (1991) Nucl. Acids Res. 19, 427.
80. Frederick, C.A., Saal, D., van der Marel, G.A., van Boom, J.H., Wang, A.H.J. and Rich, A. (1987) Biopolymers 26, S145.
81. Ramakrishnan, B. and Sundaralingam, M. (1995) Biophys. J. 69, 553.
82. Tippin, D.B. and Sundaralingam, M. (1997) J. Mol. Biol. 267, 1171.
83. Hunter, W.N., Kneale, G., Brown, T., Rabinovich, D. and Kennard, O. (1986) J. Mol. Biol. 190, 605.
84. Crick, F.H.C. (1966) J. Mol. Biol. 19, 548.
85. Kneale, G., Brown, T., Kennard, O. and Rabinovich, D. (1985) J. Mol. Biol. 186, 805.
86. Westhof, E. (1987) Annu. Rev. Biophys. 17, 125.
87. Betzel, C., Lorenz, S., Furste, J.P., Bald, R., Zhang, M., Schneider, T.R., Wilson, K.S. and Erdmann, V.A. (1994) FEBS Lett. 351, 159.
88. Cruse, W.B.T., et al. (1989) Nucl. Acids Res. 17, 55.
89. Chen, X., Ramakrishnan, B., Rao, S.T. and Sundaralingam, M. (1994) Nature Struct. Biol. 1, 169.
90. Jain, S., Zon, G. and Sundaralingam, M. (1989) Biochemistry 28, 2360.
91. Wilcock, D.J., Adams, A., Cardin, C.J. and Wakelin, L.P.G. (1996) Acta Cryst. D52, 481.
92. Neidle, S., Berman, H. and Shieh, H.S. (1980) Nature 288, 129.
93. Kopka, M.L., Fratini, A.V., Drew, H.R. and Dickerson, R.E. (1983) J. Mol. Biol. 163, 129.
94. Westhof, E., Prange, T., Chevrier, B. and Moras, D. (1985) Biochimie 67, 811.
95. Kennard, O., et al. (1986) J. Biomol. Struct. Dynamics 3, 623.
96. Berman, H.M. (1986) Ann. N.Y. Acad. Sci. 482, 166.
97. Westhof, E. (1988) Int. J. Biol. Macromol. 9, 185.
98. Berman, H.M., Sowri, A., Ginell, S. and Beveridge, D. (1988) J. Biomol. Struct. Dynamics 5, 1101.
99. Schneider, B., Cohen, D. and Berman, H.M. (1992) Biopolymers 32, 725.
100. Schneider, B., Cohen, D.M., Schleifer, L., Srinivasan, A.R., Olson, W.K. and Berman, H.M. (1993) Biophys. J. 65, 2291.
101. Eisenstein, M. and Shakked, Z. (1995) J. Mol. Biol. 248, 662.
102. Wahl, M.C. and Sundaralingam, M. (1997) TIBS 22, 97.
103. Tippin, D.B. and Sundaralingam, M. (1997) Biochemistry 26, 536.
104. Eisenstein, M., Hope, H., Haran, T.E., Frolow, F., Shakked, Z. and Rabinovich, D. (1988) Acta Cryst. B44, 625.
105. Eisenstein, M., Frolow, F., Shakked, Z. and Rabinovich, D. (1990) Nucl. Acids Res. 18, 3185.
106. Gao, Y.-G., Robinson, H., van Boom, J.H. and Wang, A.H.-J. (1995) Biophys. J. 69, 559.
107. Saenger, W., Hunter, W.N. and Kennard, O. (1986) Nature 324, 385.
108. Shakked, Z., Guzikevich-Guerstein, G., Frolow, F., Rabinovich, D., Joachimiak, A. and Sigler, P.B. (1994) Nature 368, 469.
109. Brennan, R.G., Westhof, E. and Sundaralingam, M. (1986) J. Biomol. Struct. Dynamics 3, 649.
110. Gao, Y.G., Sriram, M. and Wang, A.H. (1993) Nucl. Acids Res. 21, 4093.
111. Tari, L.W. and Secco, A.S. (1995) Nucl. Acids Res. 23, 2065.
112. Wemmer, D.E., Srivenugopal, K.S., Reid, B.R. and Morris, D.R. (1985) J. Mol. Biol. 185, 457.
113. Clark, G.R., Brown, D.G., Sanderson, M.R., Chwalinski, T., Neidle, S., Veal, J.M., Jones, R.L., Wilson, W.D., Zon, G., Carman, E. and Stuart, D.I. (1990) Nucl. Acids Res. 18, 5521.
114. Weston, S.A., Lahm, A. and Suck, D. (1992) J. Mol. Biol. 226, 1237.
115. Hunter, W.N., Langlois D'Estaintot, B. and Kennard, O. (1989) Biochemistry 28, 2444.
116. Otwinowski, Z., Schevitz, R.W., Zhang, R.G., Lawson, C.L., Joachimiak, A., Marmorstein, R.Q., Luisi, B.F. and Sigler, P.B. (1988) Nature 355, 321.
117. Aboul-ela, F., Varani, G., Walker, G.T. and Tinoco, I., Jr. (1988) Nucl. Acids Res. 16, 3559.
6 Helix structure and molecular recognition by B-DNA
Richard E. Dickerson
Molecular Biology Institute, University of California at Los Angeles, Los Angeles, CA 90025-1570, USA
1. Introduction
One day in 1980, Francis Crick arrived in Pasadena from the Salk Institute to visit James Olds of the Biology Division at Caltech. While on campus, he came by my laboratory to see graduate student Horace Drew's new DNA dodecamer structure, CGCGAATTCGCG, the first single-crystal X-ray structure analysis of B-DNA. By chance the only people in the laboratory that afternoon were Horace, my long-time scientific colleague Mary Kopka, and myself. The four of us huddled around contoured plexiglas sheets stacked atop a light box in a darkened room, in that pre-computer graphics age. Horace pointed out with pride both the similarities to the Watson-Crick-Arnott fibre structure for B-DNA, and the differences produced by local base sequence, which only a single-crystal analysis could show. Crick looked on with interest, and then made a casual remark that had far more impact on Horace than anything that I ever said to him, either before or since. When Horace concluded his presentation, Crick smiled and said to him, 'So that's what it looks like.' Metaphorically speaking, Horace inflated with pleasure and floated around the ceiling of the room like a balloon. Crick's remark was quoted endlessly to all comers for days.
At one level, Francis Crick was only being his usual courteous self, complimenting a graduate student who was thrilled to be in the same room with him, let alone receive kind words about his research. But there is another sense in which Crick's remark was prescient, and quite justified. Fibre diffraction, of necessity, yields the averaged helix structure, and does not show the effects of base sequence except when that sequence affects the overall averaged structure. Single-crystal structures can reveal the local, base-to-base variations in helix structure that are a consequence of the particular DNA sequence in question.
2. Early sequence-structure correlations
The first 10 years of single-crystal DNA structure analysis, roughly 1979-1988, were a euphoric search for the rules that governed sequence-structure relationships. For B-DNA this was the 'decade of the dodecamer', since virtually all the structures solved were variants of the Drew dodecamer, CGCGAATTCGCG (1-5). This was the case because the CG-rich ends kept the helix from unravelling, and hydrogen bonded overlap of the final two base pairs at each end with neighbouring helices provided a strong crystalline scaffolding while preserving the identity of the 10 base pair crystallographic and helical repeats. Early single-crystal A-DNA structures (see Chapter 5) tended to be octamers containing many C-C and G-G steps. Z-DNA structures (see Chapter 7) were limited to variants of the alternating pyrimidine-purine sequence: CGCG..., with a price being paid in helix stability for each deviation from pyrimidine-purine alternation, or even for substitution of A for G and T for C.
CGCxxxxxxGCG was the 'magic' sequence for B-DNA dodecamers. Providing that these outer C:G base pairs and their intermolecular overlap were preserved, the central six base pairs could be varied almost at will, to yield crystals that were isomorphous with the Drew structure and could be solved by simple molecular replacement. To date more than 75 dodecamer crystal structures isomorphous with Drew's have been solved (Table 6.1a), all in the same orthorhombic P2₁2₁2₁ space group, with essentially identical unit cell dimensions and crystal packing. In the vast majority of cases, these were solved by taking as a first approximation the phases of the Drew dodecamer or one of its cousins. Twenty-four of these structure analyses have been of DNA alone (Table 6.1a.1). But the rugged crystals of Drew-like sequences also proved to be excellent scaffolding for drug-binding studies; 39 different analyses have been carried out of complexes with 18 different drug molecules that bind within the minor groove (Table 6.1a.2), as well as 16 studies of base mismatches (Table 6.1a.3). In contrast to the richness and variety of these results, only eight dodecamers have been solved independently in space groups other than orthorhombic P2₁2₁2₁ (Table 6.1b).
During this first 'dodecamer decade', several attempts were made to draw up rules connecting local helix structure with local base sequence. The obvious parameters to consider were helical twist angle, the roll angle between successive base pairs, the rise from one base pair to another along the helix axis, the inclination of a base pair away from perpendicularity to the helix axis, the lateral displacement of the base pair away from that same axis, and the propeller twist between bases in a base pair (Fig. 6.1 and ref. 120; see also Chapter 2). Indeed, these parameters can be used to describe the distinguishing characteristics of the three DNA helix families: A, B, and Z. An idealized Arnott fibre-derived B-DNA (Chapter 1) has a mean 36° twist, 3.38 Å rise, zero roll, -6.0° inclination, +0.23 Å x-displacement, and -4.4° propeller twist.¹ In contrast, ideal A-DNA has a 33° twist, 2.56 Å rise, 6° roll, 21° inclination, -4.5 Å x-displacement, and -7.5° propeller twist. Similar values were observed in the early single-crystal analyses, except that inclination in A-DNA oligomers tended to be smaller, around 13°, and propeller twist for both A- and B-DNA turned out to be a highly variable and sequence-dependent quantity. The fact that base pairs in A-DNA are displaced off-axis by 4.5 Å means that the minor groove is shallower than the major, whereas in B-DNA, where base pairs sit squarely on the helix axis (0.23 Å mean displacement), major and minor grooves are of equal depth.
1 By a misplaced attention to torsion angle sign consistency, the propeller shown in Fig. 6.1 was defined as positive by the 1988 Cambridge Convention (120). Unfortunately, propeller twist in DNA is almost always in the opposite direction, because this facilitates stacking of bases along each individual strand of helix. As a consequence of this 1988 sign decision, the field has been cursed with a steady stream of negative propeller values.
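The idealized twist and rise values quoted above fix two other familiar helix quantities, the number of base pairs per turn and the helical pitch. The following is a minimal sketch of that arithmetic, assuming a perfectly regular helix; the names `HELIX_MODELS` and `helix_repeat` are illustrative only and do not come from the text.

```python
# Idealized fibre-model values quoted in the text
# (twist in degrees per step, rise in angstroms per step).
HELIX_MODELS = {
    "B-DNA": {"twist_deg": 36.0, "rise_A": 3.38},
    "A-DNA": {"twist_deg": 33.0, "rise_A": 2.56},
}


def helix_repeat(twist_deg, rise_A):
    """Return (base pairs per turn, pitch in angstroms) for a regular helix."""
    bp_per_turn = 360.0 / twist_deg   # steps needed for one full rotation
    pitch = bp_per_turn * rise_A      # axial advance over one full turn
    return bp_per_turn, pitch


for name, params in HELIX_MODELS.items():
    bp, pitch = helix_repeat(params["twist_deg"], params["rise_A"])
    print(f"{name}: {bp:.1f} bp per turn, pitch {pitch:.1f} A")
```

Under these assumptions the B-form comes out at 10 base pairs and about 33.8 Å per turn, and the A-form at roughly 10.9 base pairs and 27.9 Å per turn. Real oligomer structures deviate from these averages step by step, which is the subject of the sections that follow.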
Table 6.1. X-Ray analyses of B-DNA helices and their complexes with minor groove binding drug molecules

(a) Dodecamers, isomorphous with CGCGAATTCGCG, space group P2₁2₁2₁

1. Oligonucleotides alone

Sequence                                      Z   Ubp  Date, Institution      Ref.      NDB No.     I.D. No.
CGCGAATTCGCG                                  4   12   1980 UCLA              1-5,6     BDL001      101
CGCGAATTCGCG                                  4   12   1982 UCLA              7         BDL002      102
CGCGAATTCGCG                                  4   12   1987 Strasbourg        8         BDL020      103
CGCGAATTCGCG                                  4   12   1985 Berkeley          9         BDL005      104
CGCGAATTVCGCG                                 4   12   1982 UCLA              10,11     BDLB03      105
CGCGAATTbr5CGCG                               4   12   1982 UCLA              6,10,11   BDLB04      106
CGCGAm6ATTCGCG                                4   12   1988 MIT               12        BDLB13      107
CGCGAATTm5CGCG                                4   12   1997 Cambridge         13        BDLB73      108
CGCGAATlVCGCG                                 4   12   1997 Cambridge         13        BDLB74      109
CGm5CGAATTm5CGCG                              4   12   1997 Cambridge         13        BDLB72      110
CGCGAAUUCGCG                                  4   12   1997 Cambridge         13        BDL075      111
CGCGAAm6"Tm6°TCGCG                            4   12   1997 Northwestern      14        BDLS79      112
CGCGAAh6aTOH6oTCGCG                           4   12   1997 Northwestern      14        BDLS80      113
CGCGAASSCGCG                                  4   12   1996 Manchester        15        BDLS67      114
CGCGATATCGCG                                  4   12   1997 Weizmann          16        pending     114a
CGCAAAAAAGCG                                  4   12   1987 Cambridge         6,17      BDL006      115
CGCAAAAATGCG                                  4   12   1989 Yale              6,18      BDL015      116
CGCAAATTTGCG                                  4   12   1992 Inst. Can. Res.   6,19      BDL038      117
CGCAIATm5CTGCG                                4   12   1997 Weizmann          16        pending     117a
CGCATATATGCG                                  4   12   1988 UCLA              20        BDL007      118
CGCGTTAACGCG                                  4   12   1991 Ohio State        21,22     BDL059      119
CGTGAATTCACG                                  4   12   1991 UCLA              6,23      BDL029      120
CGTGAATTCACG                                  4   12   1991 Rutgers           24        BDL028      121
CGCGAAAACGCG/CGCGTT/TTCGCG (nicked strand)    4   12   1990 MIT               25        BDL021.32   122
Table 6.1. Continued Sequence
Space group
z
2. DNA : Drug complexes Cisplatin CGCGAATTCGCG/Cis 4 Netropsin: +Py-Py+ CGCGAATlVCGCG/N 4 CGCGAAllVCGCG/N 4 CGCGAATTCGCG/N 4 CGCe6GAATTCGCG/N 4 CGCAAATTTGCG/N 4 CGCGATATCGCG/N 4 CGCGTTAACGCG/N 4 Lexitropsin: +Im-Py+ CGCGAATTCGCG/L 4 Distamycin: °Py-Py-Py+
CGCAAATTTGCG/D CGCGAATTCGCG/D
Hoechst 33258 (par a -OH o n phenyl ring A) CGCGAATTCGCG/H 4 CGCGAATTCGCG/H 4 CGCGAATTCGCG/H 4 CGCGAATTCGCG/H 4 CGCGAATTCGCG/H 4 CGCGAATTCGCG/H 4 CGCAAATTTGCG/H 4 CGCAAATTTGCG/H 4 CGCGAATTCGCG/H 4 CGCe'GAATTCGCG/H 4
Ubp
Date, Institution
NDB No.
I.D. No.
DDL017
123
27,28 29 30 30 31 32 33
GDLB05 GDLB31 GDL018 GDLB17 GDL014 GDL001,4 GDL030
124 125 126 127 128 129 130
Ref.
1
2
1984 UCLA
1 1 1 1 1 1 1
2 2 2 2 2 2 2
1985 UCLA 1995 UCLA 1992 Illinois 1992 Illinois 1993 MIT 1989 MI T 1995 Ohio Stat e
1
2
1995 UCLA
34
GDL037,8
131
1987 MI T 1996 Cambridge
35 13
GDL003 GDLB41
132 133
1987 UCLA 1988 MI T 1991 UCLA 1991 UCLA 1991 UCLA 1989 MI T 1994 Inst. Can. Res. 1994 MI T 1992 Illinois 1992 Illinoi s
36 37 38 38 38 39 40 41 42 42
GDL006 GDL002 GDL010,11 GDL012 GDL013 GDL007(?) GDL028 GDL026 GDL022 GDLB19
134 135 136 137 138 139 140 141 142 143
4 4
12 12 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2
26
Table 6.1. Continued Sequence grou
Space p
Z
Ubp
Date, Institutio n
Meta-OH(N) Hoech t 3325 8 (met a -O H on ring A) CGCGAATTCGCG/H "in " 12 4 1996 Inst . Can. Res . CGCGAATTCGCG/H "out " 12 1996 Inst . Can. Res . 4 Hoechst 3334 2 (par a -OEt o n ring A) 4 CGCGAATTCGCG/H 12 1992 Illinoi s 12 CGCe6GAATTCGCG/H 4 1992 Illinois Bis-benziniidazole compound (imidazol e for piperazine on Hoechs t 33258) CGCGAATTCGCG/B 4 12 1995 Inst. Can. Res . Berenil CGCGAATTCGCG/B 4 12 1990 Inst . Can. Res . CGCGAATTCGCG/B 4 1992 Inst. Can. Res . 12 CGCGAATTm5CGCG/B 4 12 1997 Cambridg e DAPI CGCGAATTCGCG/D 4 1989 UCL A 12 2,5-Bis(4-guanylphenyl)furan (bereni l analogue) CGCGAATTCGCG/F 4 12 1996 Inst . Can. Res . 2,5-Bis{ [4-(N-isopropyl)amidino]phenyl}furan (bereni l analogue) CGCGAATTCGCG/F 4 12 1996 Inst . Can. Res . 2,5-Bis{[4-(N-cyclopropyl)amidino]phenyl}furan (bereni l analogue) 4 12 1997 Inst . Can. Res . CGCGAATTCGCG/F Pentamidine 4 12 1992 Inst . Can. Res . CGCGAATTCGCG/P •y-Oxapentamidine CGCGAATTCGCG/P 4 12 1994 Inst . Can. Res . Propamidine 4 12 CGCGAATTCGCG/P 1993 Inst . Can. Res . 12 1995 Inst . Can. Res . CGCGAATTCGCG/P 4
NDB No.
I.D. No.
43 43
GDL047 GDL048
144 145
42 42
GDLB21 GDLB20
146
44
GDL033
148
45 46 13
GDL009 GDL016 GDLB42
149 150 151
47
GDL008
152
48
GDL036
153
49
GDL044
154
49
GDL045
155
50
GDL015
156
51
GDL027
157
52 53
GDL023 GDL032
158 159
Ref.
147
Table 6.1. Continued Sequence
Space group
z
Ubp
SN6999 CGCe6GAATTCGCG/S 4 1 2 Tribiz o r tris-benzimidazole (extended Hoechst 33258 analogue) CGCAAATTTGCG/T 4 1 2 3. Mismatc h oligonucleotides (mismatche s underlined) CGCGAATTGGCG 4 CGCGAATTAGCG 4 CGCGAATTe6AGCG 4 CGCGAATTo8AGCG 4 CGCGAATTTGCG 4 CGCm6GAATTTGCG 4 CGCAAATTGGCG 4 CGCAAGCTGGCG 4 CGCAAATTo8GGCG 4 CGCAAATTCGCG 4 CGCAAATTIGCG 4 CGCIAATTAGCG 4 CGCIAATTCGCG 4 CGCm2IAATTCGCG 4 CGAGAATTCm6GCG 4 CGTGAATTCm6GCG 4
Date, Institution
Ref.
NDB No.
I.D. No.
1993 Illinoi s
54
GDLB24
160
1996 Inst . Can . Res .
55
GDL039
161
56
BDL046 BDL012 BDLB54 BDLB33 BDL009 BDLB26 BDL014 BDL022 BDLB56 BDL011 BDLB41 BDLB10 BDLB40 BDLB53 BDLB58
162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177
74
DDLB57
1
75 6,76
BDL070 BDL042
12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
1993 Inst . Can. Res . 1986 Cambridg e 1994 Mancheste r 1992 Mancheste r 1985 Cambridg e 1990 Edinburg h 1989 Mancheste r 1990 Inst . Can. Res . 1994 Edinburgh 1986 Cambridg e 1992 Edinburgh 1987 Cambridg e 1992 Thos . Jeff. U . 1993 Illinoi s 1994 Rutger s 1995 Rutger s
I1
12
1995 MI T
!2 24
24 12
1996 Barcelona 1993 Manchester
57,58
59 60 61 62
63,64 6,65
66 67 68 69 70 71 72 73
—
(b) Dodecamers: other space groups CCUCTGGTCTCC P GGAGACCAGAGG CGCTCTAGAGCG P2 CGTAGATCTACG C
2 3
Table 6.1. Continued Sequence CGCGAAAAAACG
ACCGGCGCCACA ACCGGCGCCACA ACCGCCGGCGCC GCCGCCGGCGCC
NDB
I.D. No.
Space group
Z
Ubp
Date, Institution
Ref.
P2j2,2 R3 R3 R3 R3
4 9 9 9 9
24 12 12 12 12
1993 Yal e 1989 Strasbour g 1989 Strasbour g 1989 Strasbour g 1989 Strasbour g
6,77 78-80 79,80 79,80 79,80
BDL047 BDL018(?) BDL034(?) BDL035(?) -
4 5 6 7 8
C2 C2 C2 C2 C2 C2 C2 C2 C2 C2 P2,
4 4 4 4 4 4 4 4 4 4 2
5 5 5 5 5 5 5 10 10 20 10
1987 UCL A 1992 UCL A 1991 UCL A 1992 UCL A 1989 Berli n 1991 MI T 1995 MI T 1994 UCL A 1997 Inst . Can . Res . 1997 UCL A
81,82
1997 Ne w York U.
86 87 88 89 90 91 92
BDJ008 BDJ019 BDJB44 BDJ017 BDJS30(3)
BDJB57 BDJ060 BDJ069 BDJ081 UDJ060
9 10 11 12 13 14 15 16 17 18 19
P212,2, P2,2121 P2,2,21 P212,2, P2.2.2, P2,2121
4 4 4 4 4 4
10 10 10 10 10 10
1991 UCL A 1992 UCL A 1992 UCL A 1992 UCL A 1993 UCL A 1997 UCL A
93 94 95 95 96 97
BDJ025 BDJ031 BDJ037 BDJ036 BDJ051 pending
20 21 22 23 24 24a
P2,212,
4
10
1996 Cambridg e
98
UDJ049
25
No.
(c) Decamer s CCAAGATTGG CCAAIATTGG CCAACGTTGG, M g CCAACJTTGG, C a OOACjCjCyQ J T OGr
CCAGGCaraCTGG CCAo8G_CGCTGG CTCTCGAGAG CGCAATTGCG CAAAGAAAAG \CGACGATCGT\ \TGCTAGCAGC\ CGATCGATCG. M g CGATTAATCG. M g CGATATATCG. M g CGATATATCG. C a CATGGCCATG. C a CATGGCCATG. C a
(+ di-imidazole Lexitropsin) XGGCCAATTGGX XGGTTAACCGGX
83,84 85
Table 6.1. Continued Sequence CGCAATTGCG CCIIICCCGG CGATCGVATCG CGATGCm"ATCG CCAACITTGG, M g CCATTAATGG. M g CCACTAGTGG CCAACGTTGG/A (+ anthramycin) CCAGGCm5CTGG CCAGGCm5C_TGG CCAGGCm5CTGG CCAAGCTTGG CCGGCGCCGG
Space group
Z
Ubp
12,2,2, P3, P3221 P3221 P3221 P3221 P3221 P3221
4 3 6 6 6 6 6 6
5 10 10 10 10 10 10 5
1995 Inst. Can. Res . 1997 Weizmann 1992 UCLA 1993 UCLA 1992 UCLA 1994 UCLA 1994 Weizmann 1993 UCL A
P6 P6 P6 P6 R3
6 6 6 6 9
10 10 10 10 10
1992 Berli n 1993 Berlin 1993 Berlin 1993 UCLA 1992 Berlin
P2,2,21 C2
4 4
16 8
1996 Barcelona 1992 Kansas
P4,22 P4,22 P4,22 P4,22 P4,22 C2
8 8 8 8 8 4
4 4 4 4 4 4
1994 Ohio State 1995 Ohio State 1995 Ohio Stat e 1997 Ohi o Stat e 1997 Ohio State 1997 Ohio State
Date, Institution
Ref.
NDB No.
I.D. No.
99 16 100 85 101 102 103
UDJ031 pending BDJB48 BDJB43 BDJ055 GDJB29
26 26a 28 29 30 31 32 33
104,105 106 106 107 108
BDJB27 BDJB49 BDJB50 BDJ052 BDJ039
34 35 36 37 38
75 109
BDH071 DDH037(?)
51 52
110 22 22 111 111 111
GDHB25 GDHB34 GDHB35 -
53 54 55 56 57 58
(d) Octamer s CGCTAGCG GAAGCTTC/Act D (actinomycin D) Side-by-side distamycins ICICICIC/2Dst IcICICIC/2Dst IcIcICIC/2Dst 1CITACIC ICATATIC ICATATIC
Table 6.1. Continued Sequence
Space group
Z
Ubp
Date, Institution
Ref.
NDB No.
I.D. No.
(e) Othe r oligonucleotide lengths CGCGAAATTTACGCG CGCAGAATTCGCG \GCGAATTCG\ \GCTT AAGCG\ /GCGTACGCG/ /GCGCATGCG/ \CGGTGG\ \CCACCG\ CTCGAG GpsCGpsCGpsC
C2
14 12 8
1988 NCI/Weiz. 1988 Weizmann 1996 Cambridge
112
P212,21
8 4 4
113,114 115
UD0009(?) UDM010 UDI030
61 62 63
P4.2.2,
8
8
1997 Cambridge
116
UDI047
64
P6,22
12
6
1995 Manitoba
117
BDF062
65
P6222 P2,2,2,
12 4
3 6
1996 Ohio Stat e 1987 Cambridg e
118 119
BDF068 BDFP24
66 67
1222
Z = number of asymmetric units per cell; Ubp = number of base pairs per asymmetric unit; NDB No. = Nucleic Acid Database serial number; br5 = 5-bromo; m5 = 5-methyl; e6 = 6-ethyl; m6'α = 6'-α-methyl; h6'α = 6'-α-hydroxyl; o8 = 8-oxo; ara = arabinosyl; ps = phosphorothioate; A, T, G, C = DNA; a, U, g, c = RNA; Py = pyrrole; Im = imidazole.
Fig. 6.1. Definitions of the most useful local variables in B-DNA helices: twist, roll, tilt, rise, and slide from one base pair to the next, and propeller within one base pair. (From ref. 120.)
In the simplest model, considering only the step between two base pairs, tables were constructed of 'the 10 twist angles of DNA', assuming that each of the 10 unique base pair steps in a DNA duplex would be associated with a unique value of twist (121). Calladine realized that the value of a helix parameter at a given step could be influenced by the steps preceding and following, and drew up a set of principles involving three adjacent base steps or four base pairs (122). Dickerson codified these principles into 'Calladine's Rules' and showed that the rules explained correctly the five A helices and lone B helix that were known at the time (123). The isolated base pair step was superseded by the tetrad as the fundamental unit of sequence and structure. All seemed for the best with this best of all possible helices.
However, this candid optimism crumbled during the following years, which can be termed the 'decamer decade', 1988 to the present (Table 6.1c). Advances in DNA synthesis methods made it easier to prepare many new sequences in crystallizable quantities. It was discovered that B-DNA decamers, with chains just one helical turn long, would stack atop one another in a manner that simulated repeating, endless helices running through the crystal (81). At first this appeared to open the way for rapid completion of the tetrad model of sequence-structure relationships. But as more sequences were examined, the same tetrad was observed to exhibit different helix parameters in different settings (84). Indeed, the same decamer sequence exhibited different local parameters in different crystalline environments (85,100). The tetrad model was evidently of little more validity than the old 'ten twist angle' model. What was going on? Did DNA have no fixed structure?
More structure analyses were required to suggest the answer. In the simple 'one sequence/one structure' picture that everyone had been assuming, a DNA duplex was a rigid object, its geometry fixed uniquely by its base sequence. The structural polymorphism observed in the newer crystal structures suggested an alternative and discouraging possibility: perhaps the DNA duplex has no definite local structure, but is only a shapeless mass of duplex spaghetti. The individual structures that were observed by X-ray crystallography might then be no more than accidents produced by local crystal packing forces, without general applicability.²
But, as more crystal structures became available, a third and even more challenging pattern began to emerge. Some base pair steps such as C-A could exhibit twist angles of as little as 29° or as much as 54°, with many intermediate values. In contrast, A-A steps showed only a narrow range of variation around 36°, except when paired with a high twist C-A step in the sequence CAA, in which case the A-A step could be forced down to around 29° (83). C-G and T-A steps could either be straight, or kinked sharply into the open major groove, whereas purine-purine steps generally displayed little or no bending. In short, different base sequences were found to display quite different degrees of variation with respect to twisting, bending, and other local helix deformations (84). It became apparent that the issue was not one of sequence-determined rigid local structure, but rather of a sequence-based differential deformability of the helix. This chapter deals with what we have learned about sequence-based differential deformability of the B-DNA helix, and how this deformability is utilized in the recognition of DNA sequence by drugs and proteins.
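The count of 10 unique base pair steps, and the larger space opened up by the tetrad model, both follow from the fact that a sequence read along one strand of a duplex describes the same object as its reverse complement read along the other strand. The sketch below enumerates the distinct words under that equivalence; the function names are illustrative and not taken from the text.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def unique_duplex_words(length):
    """Distinct duplex sequences of a given length.

    A word and its reverse complement are counted once, because they
    describe the same double-stranded fragment.
    """
    seen = set()
    for letters in product("ACGT", repeat=length):
        word = "".join(letters)
        # Keep one canonical representative of each strand pair.
        seen.add(min(word, reverse_complement(word)))
    return sorted(seen)


steps = unique_duplex_words(2)    # base pair steps (two base pairs)
tetrads = unique_duplex_words(4)  # tetrads (four base pairs, three steps)
print(len(steps), steps)
print(len(tetrads))
```

Run as written, this gives 10 steps and 136 tetrads. The jump from 10 to 136 shows how much more crystallographic data a complete tetrad table would have required.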
3. Molecular properties of B-DNA
The first B-DNA structure, that of the Drew dodecamer, CGCGAATTCGCG (1), illustrated several features of B-DNA that have become canonical from subsequent work.
1. The mean twist angle of base steps is centred around 36°, but with wide latitude at individual steps: from less than 20° to more than 55°.
2. Sugar pucker, although centred around the C2'-endo conformation expected from fibre diffraction models of B-DNA, is broadly distributed from C4'-exo, through O4'-endo and C1'-exo, to the expected C2'-endo and beyond. By contrast, sugar conformations in A-DNA oligomers are much more closely clustered around their expected centre, C3'-endo (see Figs 3 and 4 of ref. 124). This greater variability in B helix sugar pucker may reflect a greater malleability of the B helix, making it especially suitable for involvement in the molecular recognition process, and hence more suitable as a medium for storage and control of genetic information.
3. A:T base pairs exhibit greater variability of propeller twisting than do G:C pairs. This is undoubtedly a consequence of A:T pairs having only two hydrogen bonds rather than three, and hence offering less resistance to twisting of the base pair about its long axis.
4. Minor groove width is more variable in regions of successive A:T base pairs than in G:C regions. In the absence of groove-binding drugs, A:T regions of the minor
² This opinion has become almost a mantra in some circles, embodied in phrases such as 'an artefact of crystal packing', usually preceded by 'only'. One major question to be addressed by this chapter is whether crystal packing is artefactual, or informative.
Fig. 6.2. A 2:1 complex of a di-imidazole lexitropsin, an analogue of netropsin, with the B-DNA decamer CATGGCCATG, illustrating widening of the minor groove by insertion of side-by-side drug molecules. (From ref. 97.)
groove are narrower than G:C regions. This follows immediately from the difference in propeller twist, since an increase in propeller magnitude brings C1' atoms on opposite strands closer together and narrows the groove (see Fig. 9 of ref. 10). But regions of A:T base pairs are also capable of opening their minor groove enough to accommodate two planar drug molecules side by side (125, 126), as G:C regions can do without significant groove widening (Fig. 6.2).
5. Narrow A:T regions of the minor groove are filled with an ordered zigzag spine of hydration, in which a first hydration shell bridges adenine N3 and thymine O2 atoms diagonally across the groove, while a second hydration shell bridges these waters, giving each of them a local tetrahedral environment (see bottom of Fig. 6.3). Wider minor groove regions exhibit a somewhat less regular double ribbon of water molecules coordinated to base N and O atoms and to sugar O4' atoms (see top of Fig. 6.3). These water molecules are displaced by a minor groove binding drug, and the entropy of break-up of the spine contributes significantly to the free energy of binding (127).
Fig. 6.3. Stereoview of the B-DNA decamer CAAAGAAAAG, showing a well-formed zigzag spine of hydration down the narrow AAAA region of the minor groove (bottom), and two less regular ribbons of hydration along the walls of the wider CAAA region of the groove (top, rear). (From ref. 91.)
6. Bending of the B-DNA duplex nearly always involves the easy deformation of roll, rather than the energetically unfavourable tilt (Fig. 6.1). As with stacked planks in a lumber yard, rolling the base pairs about their long axes is relatively easy, whereas pulling the two planks apart at one end requires more energy (128, 129). The easier roll deformation is that which compresses the more open major groove, and this is defined as positive roll. But in certain cases the observed bending involves negative roll, with compression of the minor groove instead.
B-DNA can be bent and twisted in ways that allow it to be wound around large proteins, to be supercoiled, and to be recognized by smaller proteins. But its susceptibility to bending and twisting is keyed to base sequence. Local physical properties of the helix are an expression of the underlying base sequence, in a manner that is useful for control. Base sequence does not define a fixed, static deformation, but rather a differential deformability of one region of helix vs. another. An analogy has been made to the human arm. At the elbow it can bend but not twist, at both the forearm and upper arm it can twist but not bend, and both bending and twisting are possible at the wrist. In combination, these possibilities allow a broad range of motion. But a determined effort to create a bend in the middle of the forearm would lead to disaster. Just
as an ergonomic engineer needs to know the inherent structural capabilities of different regions of the human body, so we need to know the inherent deformabilities of different sequences of B-DNA.
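As a small arithmetic aside on item 1 of the list in this section: if every step had the same twist, the helix would repeat every 360°/twist base pairs, so the 36° mean corresponds to 10 base pairs per turn. The sketch below is only an illustration of that relation. The function name is hypothetical and the uniform-twist assumption is an idealization that real sequences, as the chapter stresses, do not obey.

```python
def base_pairs_per_turn(twist_deg: float) -> float:
    """Idealized helical repeat for a uniform twist angle (degrees per step)."""
    if twist_deg <= 0:
        raise ValueError("twist must be positive")
    return 360.0 / twist_deg


if __name__ == "__main__":
    # Mean B-DNA twist and the approximate extremes quoted in the text.
    for twist in (20.0, 36.0, 55.0):
        repeat = base_pairs_per_turn(twist)
        print(f"twist {twist:4.1f} deg -> {repeat:.1f} bp per turn")
```

The spread between the two extremes is a reminder of how much local winding can vary from step to step around the same average.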
4. Differences between individual base steps
Bases in DNA are of two types: single-ring pyrimidines (Y) and double-ring purines (R). Ten different base steps are possible in a helix composed of complementary hydrogen-bonded chains. Three of these involve a Y-R step from a pyrimidine to a purine: T-A, C-G and C-A/T-G. (T-G is identical to C-A from the viewpoint of the other backbone chain.) Three purine-pyrimidine or R-Y steps exist: A-T, G-C and A-C/G-T, and four purine-purine (or pyrimidine-pyrimidine) steps: A-A/T-T, A-G/C-T, G-A/T-C, and G-G/C-C. Examples of each of these three classes from actual single-crystal B-DNA structures are shown in Figs 6.4-6.6.
To date, several generalizations have emerged from X-ray crystallography of DNA oligomers and their complexes with drugs and proteins. These of course will be subject to later verification and improvement; it takes more data to establish a trend or a probability than to discover an immutable law (provided that one exists). The following initial observations are presented as innate propensities or tendencies, not mandatory conformations. That is, the previously mentioned C-A step is more capable of exhibiting a large 55° twist angle than any other step, but this does not mean that every C-A will show such a large twist. However, one can be confident that such a large twist is unlikely at an A-T step.
(a) Y-R steps exhibit very little ring-ring overlap between adjacent base pairs (Fig. 6.4). Instead, outlying polar N or O atoms from one base pair are stacked against polarizable aromatic rings on the other. As a consequence, T-A, C-A, and C-G steps are weak, and are natural fracture points for the helix. They are especially susceptible to large twist and slide deformations, and to bending via positive roll.
This is a useful feature in the bending of a B-DNA duplex by proteins such as the Lac repressor (130), PurR (131), TATA-binding protein (TBP) (132-135), γδ-resolvase (136), and others. In all these examples, the protein opens up the minor groove and forces the DNA to bend away from it, compressing the broad major groove. Lac and Pur accomplish this with the aid of an extremely large roll of more than +40° at a C-G step (Fig. 6.7). Human TATA-binding protein (TBP) opens the minor groove and induces a 100° bend in the sequence TATATATA by inserting phenylalanine rings into the two outermost T-A steps, giving them again a positive roll of more than 40° (Fig. 6.8a). Indeed, almost every local maximum of roll in γδ-resolvase is a Y-R step: T-A, C-A/T-G, or C-G (Fig. 6.8b).
(b) In contrast, R-Y and R-R steps exhibit extensive ring-ring overlap from one base pair to the next (Figs 6.5 and 6.6). Indeed, the base pairs in R-R steps seem almost to pivot around stacked purines as a hinge, with greater ring-ring separation at the pyrimidine end. Pyrimidine O2 atoms in the minor groove, and O4 or N4 atoms in the major groove, stack firmly over the six-membered ring of a neighbouring pyrimidine. These intimate stacking contacts all tend to keep base pairs parallel, and to give R-R steps smaller roll, slide, and twist deformations.
Fig. 6.4. Representative examples of the three pyrimidine-purine, or Y-R steps, showing little direct overlap between rings in adjacent base pairs. Individual roll (°), slide (Å) and twist (°) values are given as [R/S/T]. (a) C-A/T-G step from CCAACGTTGG, [-6.15/+2.59/50.8] (83). Large slide, with O2 of pyrimidines stacked against six-membered rings of neighbouring purines. (b) C-A/T-G step from CGCATATATGCG, [+3.65/+0.43/36.1] (20). Small slide, with pyrimidine O2 stacked against five-membered rings of purines. (c) T-A step from CGATATATCG, [-1.51/+1.05/43.7] (95). High twist but intermediate slide, with pyrimidine O2 stacked between purine rings. (d) T-A step from CGATTAATCG, [+9.11/+0.53/31.1] (94). Low twist. (e) C-G step from CGCATATATGCG, [-3.13/+0.34/37.8] (20).
Fig. 6.5. Representative examples of the four purine-purine, or R-R steps, illustrating extensive ring-ring overlap, especially on the purine end. [R/S/T] as before. (a) A-A/T-T step from CGCGAATTCGCG, [+0.31/-0.31/35.8] (3). (b) A-A/T-T step from CCAACITTGG, [-0.82/+0.08/34.7] (85). (c) A-G/C-T step from CCAGGCCTGG, [+5.55/+0.94/23.8] (86). (d) G-A/T-C step from CGCGAATTCGCG, [+2.67/-0.10/40.7] (3). (e) G-G/C-C step from CCAGGCCTGG, [+4.06/+0.74/36.9] (86).
Fig. 6.6. Representative examples of the three purine-pyrimidine, or R-Y steps, showing more ring-ring overlap than in Y-R steps. [R/S/T] as before. (a) A-C/G-T step from CCAACGTTGG, [-1.99/-0.28/29.9] (83). (b) A-T step from CGATTAATCG, [-1.84/-0.49/35.3] (94). (c) A-T step from CGATATATCG, [+5.27/+0.04/25.0] (95). (d) G-C step from GCGCGC, [+1.31/+1.05/36.9] (119).
A-A steps are especially resistant to bending, probably because of the interlocking of base pairs with high propeller twist. To return to our lumber yard analogy, it is more difficult to push over a stack of nested sawhorses than a stack of planks. The unbent character of successive A-A steps is visible on both sides of the central C-G bend in the PurR complex (Fig. 6.7b), and will be encountered in several other protein:DNA complexes, as well as in all crystal structures of DNA alone (139, 140).
(c) The tetrad concept, or the idea that behaviour of a central step is influenced by the steps flanking it to either side, still has a potential validity that has not been
Fig. 6.7. Roll plots for two protein:DNA complexes illustrating bending at C-G steps: (a) Lac repressor (131), and (b) PurR (132). Tilt is also plotted in (a) to illustrate its negligible importance by comparison with roll. Short A-tracts in (b) are indicated by filled dots. Helix analysis using Richard Lavery's 'Curves' program (137) from the Nucleic Acid Database at Rutgers (138).
tested sufficiently because of the paucity of DNA structure data. For example, G-G-C-C steps are capable of large positive roll deformations at the G-C centre, even though G-C in general does not exhibit large roll. In particular, the sequence TGGCCA is observed to bend at the centre, with good parallel stacking of flanking TGG and CCA segments. The effect is so strong that it occurs both in the centre of the decamer CATGGCCATG (96), and across the gap between two stacked decamer helices in crystals of general sequence CCAxxxxTGG/CCAxxxxTGG, whether the central xxxx segment is ACGT (83), ACIT (85), AGAT (81), GGCC (86), TTAA (101), or CTAG (102). This illustrates the truism that the structure of the DNA helix is determined primarily by the stacking of base pairs, with the role of the sugar-phosphate backbone being that of stabilizing the helix against disruption from outside.
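The counting argument that opens this section (ten distinct steps, falling into three Y-R, three R-Y, and four R-R classes) can be checked mechanically. The following Python sketch is an illustration only, with hypothetical function names. It treats a step and its reading on the complementary strand as the same step, and it files pyrimidine-pyrimidine steps under R-R, as the text does.

```python
from itertools import product

PURINES = {"A", "G"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_step(step: str) -> str:
    """The same step read 5'->3' along the complementary strand."""
    return "".join(COMPLEMENT[base] for base in reversed(step))


def step_class(step: str) -> str:
    """Classify a dinucleotide step as 'Y-R', 'R-Y' or 'R-R'."""
    first = "R" if step[0] in PURINES else "Y"
    second = "R" if step[1] in PURINES else "Y"
    label = f"{first}-{second}"
    # A Y-Y step is an R-R step seen from the other strand.
    return "R-R" if label == "Y-Y" else label


def unique_steps() -> dict:
    """Map each distinct step, e.g. ('CA', 'TG'), to its class."""
    steps = {}
    for a, b in product("ACGT", repeat=2):
        step = a + b
        key = tuple(sorted((step, partner_step(step))))
        steps.setdefault(key, step_class(step))
    return steps


if __name__ == "__main__":
    for (step, partner), cls in sorted(unique_steps().items(), key=lambda kv: kv[1]):
        name = step if step == partner else f"{step}/{partner}"
        print(f"{cls}: {name}")
```

Self-complementary steps (T-A, C-G, A-T, G-C) map onto themselves, which is why the sixteen possible dinucleotides collapse to ten rather than eight.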
Fig. 6.8. Roll plots for two protein:DNA complexes illustrating bending at T-A steps: (a) human TATA-binding protein (135), and (b) γδ-resolvase (136). Tilt is also plotted in (a) to show its almost total insignificance when compared with roll. Short A-tracts in (b) are indicated by filled dots. Y-R steps in both (a) and (b) are marked by vertical lines.
5. DNA behaviour in crystals and in protein:DNA complexes
The sequence-induced behaviour of DNA in crystal structures has been analysed in detail by El Hassan and Calladine (141), who have studied 400 base pair steps from 24 A-DNA structures and 36 of the B-DNA structures of Table 6.1. They, like previous investigators (142, 143), find that the most sequence-sensitive local helix parameters are roll (R), slide (S), and twist (T) from one base pair to the next, and propeller twist within a given base pair. Their conclusions from the systematic analysis of DNA conformations in the crystal will not be repeated here; but they generally confirm and extend the principles enunciated above. This chapter will build on their work, and carry it to another level.
An objection has been raised in the past as to the biological relevance of DNA crystal structures, especially when these appear to differ from results measured on
Table 6.2. Representative X-ray analyses of 63 protein:DNA complexes, indicating local roll, slide, and twist behaviour
Protein | DNA Binding Sequence | I.D. No. | Ref.

A. PROKARYOTIC H-T-H PROTEINS

Lambda repressor, (25°), concave. I.D. No. 1, Ref. 144.
[A-T-A-C-C-A-C-T-G-G-C-G-G-T-G-A-T-A-T]
R:  1  1  o  1  1  o  o -2  o -1 -1  1  2  2  o  o  1  o
S:  o  1 -o -o  2 -2 -o  3  o -o  1 -2 -2  1  o -o  o -o
T:  1  1 -o -o -o -o -o  2 -1  o  o -2 -o -o  o -o  1 -o

Lambda repressor, (25°), concave. I.D. No. 2, Ref. 145.
[A-T-A-C-C-A-C-T-G-G-C-G-G-T-G-A-G-A-T]
R:  1  2  o  1  1  o  1 -1  o -1 -o  1  o  3 -o -o  1  o
S:  o  1 -o -o  1 -2 -o  2  o -o  1 -1 -1  o -o -o  o -1
T:  1  1 -o -1 -o -o -o  3 -1  o  o -2 -o -o  o -1  1 -o

434 repressor (OR1), (42°) [rc = 65 Å], concave. I.D. No. 4, Ref. 146.
[A-G-T-A-C-A-A-A-C-T-T-T-C-T-T-G-T-A-T]
R:  o  o -o  o  o  1 -1  o  o -1 -o  2 -1  1  1 -o -o  o
S:  o  o -1 -1  o  1  o -1 -1  1  o -o  o  1  o -o -o -o
T:  1  o -o  o -1 -o  o -1  o  2  o  o -1  o -o -1  1  o

434 repressor (OR2), (35°), concave. I.D. No. 5, Ref. 147.
[A-T-A-C-A-A-T-G-T-A-T-C-T-T-G-T-T-T]
R: -o  1  1  1  1 -o  o -1 -o -o  1  o  o  1 -o  o  o
S:  o  o -1 -o  o  o  o -1  1 -o -1  o  1 -o -1 -1  o
T:  1  1 -o -o -1  o  2 -1  2 -o -o  2 -o -o -1  o -o

434 repressor (OR3), (42°), concave. I.D. No. 6, Ref. 148.
[A-G-T-A-C-A-G-T-T-T-T-T-C-T-T-G-T-A-T]
R:  1  o  o  o  o  o  o -o  o -o -o -o  o  o  1 -o -o -o
S: -o  o  o -o -o  o -o -o -o  o -o -o -o  1 -o -o  o -o
T: -o  o  1 -o -o  o  o  1 -o  1 -o  1 -o  o -1 -o  o -o
434 cro protein (OR!) [A-G-T-A-C-A-A-A-C-T-T-T-C-T-T-G-T-A-T
14
9
(27°) R concave S
:o : -oT: -o-
o- o o l-o-o- o o o o- l o- o 2 o-o-o o o o- l 1 o o- l o-o-o- o l-o-o- o o o o o-2- o 1 o- o oo o o-o-o- l o o
)7
CAP protein, 90°, concave. I.D. No. 8, Ref. 150.
[C-G-A-A-A-A-G-T-G-T-G-A-C-A-T-A-T|G-T-C-A-C-A-C-T-T-T-T-C-G]
R:  1 -o -o -o  o  o  o -o -o  7 -o -o  o  1 -2  o  1 -o -o  5 -o  o  o -o -o  o -1  1 -1
S:  1  1 -o -1 -1 -o -1  2 -o  o  o -2 -o -o  2 -2  o -2  2  2 -o  1 -2 -1 -o -o  o  1  1
T:  o  1  1 -o  o  o -o  1 -o -2 -o  o -o -1  1 -2  1 -o  o -2 -1 -o  o -o -o  1  o  1 -1

CAP protein, 90°, concave. I.D. No. 9, Ref. 151.
[C-G-A-A-A-A-G-T-G-T-G-A-C|A-T-A-T-G-T-C-A-C-A-C-T-T-T-T-C-G]
R:  o -o  o -4 -o -2  2 -2  1  9 -o -2 -1  2 -1 -1  o  o -o  6 -o -1 -o -1 -o -3  o  1  1
S:  1  1  o -1 -o -1 -1  2  o -o  2 -1  o  o  o -1 -o -1  2  1 -o  2 -2 -o -1 -o -o  o  2
T: -1  o  o  o  o -o -o  2 -1 -2 -o -o  o -o -1 -o  2 -2  1 -3 -1  1  1 -2 -o  3 -o -o -2
10 13
0
11 13
1
12 15
2
14 15
3
Lac represser
(60')
convex
[A-A-T-T-G-T-G-A-G-C-G-C-T-C-A-C-A-A-T-T) R: -6- 4 o o- l 1-2-2- 1 9-2-0- 1 2- 1 1- 1 3 3 S: -5- 1 4-1- 0 2 1 o-l l-o- o l-o-l- o 2-2-1 T: -7- 9 2- 2 2 o 3- 5 l-2-o- o 2-3- 3 3 o 2- 8
PurR represser
45°
convex
R: 1-
S: -2-
T: Trp represser
(28°)
concave
HIN recombinas e straight
R: S: T: R: S: T:
R: S: T:
TA-C-G-A-A-A-A-C-G-T-T-T-T-C-G-T) 1 o-o- o o-o 8- 0 o-o- o o-l 1
0 o-p o-o-l l-l-o o-o o-o- 2
2 2-1-o-o-o-l-o-l-o-o-d- l 2- 2
[G-T-A-C-T-A-G-T-T-A-A-C-T-A-G-T-A-C) -0-0-0 1 1 1-1-0 0-1- 1 1 1 l-o-l-o -o l-l-o- l o- l 1 2 1- 1 o-l-o-l l- o -1 l-o-o-o- o o o- o o o-o-o-o- o l- o -o-o-o 1 2 o-o-o-o-o- o o 2 l-o- o o -o l-l-o- o o- l 1 2 1- 1 o-o-l-l l- o -1 l-o-o-o-o- o o- o o-o- o o-l-l 2- 2 TG-T-T-T-T-T-G-A-T-A-A-G-A) 3 2 2 o l 2 1 o l o 3 -0 - o o o o- l 1 o-l 1- 1 1- 2 -l-o o-o- o o l- o l-o-o- o
y6 resolvase rC-A-G-T-G-T-C-C-G-A-T-A-A-T-T-T-A-T-A-A-A
60° R : - 1 1- 1 l- o o o o-o- o l-o oconvex S : o-2- l 1 o-o- o l-o-o-l 1 o T: o-1-o-o-O-o-
l1
o-l- l o
| T-T-A-T-C-G-G-A-C-A-C-T-G1 5 13
o6 o 2 1 1 o o-2-o-o-o-o- o l- o 1 o o-o-l- o o o o-o-o- o o- o o-o- o o-l o
-o-o-o-o-o-l- o -
6
o o- o o o- 2 o 1- 2 1- 2 1
B. EUKARYOTI C H-T-H PROTEIN S
Engrailed homeodomai n straight R
: S: T:
MATcc2 (Yeast) straight
MATal/a2 (Yeast) 60' concave Even-skipped straight
Oct-1/POU straight
R: S: T:
ri-T-T-G-C-C-A-T-G-T-A-A-T-T-A-C-C-T-A-A) o o o o-l- o o- l o-o-o- o o 1 o 1 o o o o-o-o-l o 2- 2 2- o o o-l- o o-o- l o l- o o-l-o o- 2 l- o 2- 1 o o-o-o-o- o o- l 1- 2
16 15
4
rC-A-T-G-T-A-A-T-T-C-A-T-T-T-A-C-A-C-Q-C) l-o l- o o o-l- o o o- o o o-o- o 1-2 2 1 l-o o- o o o- l o o 1- 1 o 1 o-o- o o o- o
17 15
5
1 0-0- 1 l-
O O- O l-l-O O
O O-O-l 1- 1 1
rC-A-T-G-T-A-A-T-T-T-A-T-T-A-C-A-T-C-A)
1 o 2 o o l- o o-2-l-l- o 1 o 2- o o 1 l-o-o-l o o- o o- o o-l- o 2- 1 o-l- o 1 o-o o- 2 1 o- l l- o 2- 1 o-o- l o-o-o- o
18 15
6
R: S: T:
[T-A-A-T-T-G-A-A-T-T) O-O-O O-O- l l-O 1 2-0-1-1 1 1-0-0- 0 1-0 0- 0 1 1- 0 O 1
19 15
7
R: S: T:
20 15
8
R: oS: -
T: -
FG-T-A-T-G-C-A-A-A-T-A-A-G-G) o 1 1- 2 2-1- 0 o o 1 2- o 1 1- 1 loo o-o- l l-o-O- o
o 1- 1 o- o o o-o- o l-o-o o
Paired (prd)protein [C-G-T-C-A-C-G-G-T-T-G-A-C )2 20° R : -11 o-o- 2 1 o l-1-o-o-o concave S : -o-lo 1- 1 2-2-1- O 2 2- 1 T: 2-oQ l-o o-o-o- o o o- l
2 15
9
Paired (prd) protein [A-T-A-A-T-C-T-G-A-T-T-A-C
)2
3 16
0
[G-T-A-A-T-C-T-G-A-T-T-A-C) 2
3 16
0
Pul ETS-domai n FA-A-A-A-G-G-G-G-A-A-G-T-G-G-G )2 (407) R : -o-ol o 1 1 o 1 o o- l oo o concave S :o o-o-o- l o l- o o-l- l 1 o- o Tj oo o o-1-l-o- l o 1- 1 o- o o R: -o-o-l o l o l l o o o o-o o S: o o-o-o-l o l-o o-l-l 1 o-l T: ooo o-o-l-o-l o l-l-o 1-1
4 16
1
RAP1 DN A domain [c-G-C-A-C-A-C-C-C-A-C-A-C-A-C-C-A-G )2 20° R : -1- 1 o- o o- o 1 1 o- l o o o o o-o-o concave S :1 o o-l-o-o- 2 o 2- o 2-1-1-1- O o o T: o l-o- o l-o-2- l l- o l-1-o-l- o l- o
5 16
2
21° R
concave S
: -o-
o o o o o 2- o l- o o
:o T: o
o-o-l-o- o l-o-l- o l- o o o-l-o- l o-o- o o o- o
R: -o-oS: -
T: -
o
o o o 1 1 o o-o-o- o
o 1 o-l- o o o-o- l o l- o
o o l-1-o-o-o-o- l 1 o- o
Protein C. ZINC-BINDIN G PROTEIN S Zif268 (Cys2His2Zn) x 3 R straight S
YY1 zinc finger domain (Cys2His2Zn) x 4 R straight S Tram track (Cys2His2Zn) x 2 straight
P53 tumor suppresso r (Cys3HisZn) straight Glucocorticoid recepto r (Cys4Zn) x 2 R straight S Glucocorticoid recepto r (Cys 4 Zn)x2 R straight S
26 16
3
rA-G-G-G-T-C-T-C-C-A-T-T-T-T-G-A-A-G-C-Gl o-o o- l o o- l o 1 o- o l- o 2-o- o 1-1 o o o-o- l o- o o- o o-o- o o-o-o o-Q-o-2 2
28 16
4
[T-A-A-T-A-A-G-G-A-T-A-A-C-G-T-C-C-G) 0-1-1 0 0 1 1 0- 0 0 0- 1 1-0- 0 0 1 l-o-o l-o-o- l o-o-o-o- l l- o o o-o
29 16
5
rT-T-C-C-T-A-G-A-C-T-T-G-C-C-C-A-A-T-T-A) - 1 1 1 2 o- l 1- 1 1- 1 o o- l o o-o-o- o 2 -o-l o- 2 1 o-o-l- o o l-l-o- o l-o- l o 1 -2-0-0-1-1 4-2- 0 o-o o- l o- o l-o- l l- o
29a 16
6
[C-A-G-A-A-C-A-T-C-G-A-T-G-T-T-C-T-G) -o o o- o 1 1 o- l l- o o o-o-o- l o- o o-o-o o-o-o- l o l-o-l- o o o-o- o o
30 16
7
[T-C-C-A-G-A-A-C-A-T-G-T-T-C-T-G-G-A) -o 1- 2 1 o- o o 2 o 2 o- o o 1- 2 l- o -1 o 3 o o o-1-o-l-o-l- o o o 3 o-l -o-l 2- 1 o o-o-o-l-o- o o 1- 2 2-l- o
31 16
8
: T:
: :
T:
R: S: T: R: S: T: R: S: T:
: : T: : : T:
:
[G-C-G-T-G-G-G-C-G-T) o-o o-o- o o-l-o- o -0 0- 0 O- l O- O l-O -1 0- 1 0- 1 0- 1 0- 1
-1 o-o- o o- l o- o o-o- o o- l o- o 1-O- 3 4
O O O 1-O-O-O-O- 2 O
O- l O-
O O-O- O
o-o-l-o 1 o o-o- o o o- o l- o o- o o o o- o l-o-o- l o-o- o o- o 1 o- o o- o -o o- o l- o o o-o- 2 o o-l- o o-o-o- l
o o- l o- o o-2- o o-o-l o o- o o- l 1
Protein Estrogen receptor (Cys4Zn) x 2 straight
PPR1 (Cys6Zn2) straight
R:
S: T: R: S: T: R: S: T:
D. LEUCIN E ZIPPER AND RELATED
GCN4 (bZIP) straight
GCN4 (bZIP) straight
R: S: T:
R: S: T:
[C-A-G-G-T-C-A-C-A-G-T-G-A-C-C-T-G)
o o o o-o-o- o o l- o o- o o o-o- o o-o-l-l 1 2-2 o-o-2 2 l-l-l- o 1
No. Ref .
32 16
9
34 17
0
36 17
1
37 17
2
o-o l- o o-o- o 3-0-1 1 o-l 1 o-o
o-o o-o- o o-o 1 l-o-o- o o l-o-o l-o-o-l 1 2-1 o-o-l 2 l-l-l- o 1 o-o o- o l-o- l 2 o- l o-o- o 1-1 o [T-C-G-G-C-A-A-T-T-G-C-C-G-A)
o-o-o 1 o- l o 1 o l-o- o 3 -1 l-o- o o-o-l-o o- o o o- l o-o o- 2 1 2-1 1 o-o o-l- l
[G-G-A-G-A-T-G-A-C-G-T-C-A-T-C-T-C-C)
o 1 1 o o 1 l- o 3- o 1 1 o o 1 1 o 1 o- o o- l 1 o-l 1- 1 o 1- 1 o- o o 1
-o o- o 1- 1 1-1-1 o-l- l 1- 1 l- o o- o
[G-G-A-G-A-T-G-A-C-G-T-C-A-T-C-T-C-C) oil 10 1 1-0 2-0 1 l-o 1 1 1 0
-o o-o-o- l o-o- l l-l- o o-l-o-o o- o -1 o- l 1- 1 l-o-o-o-o- o 1- 1 1- 1 o- l
Fos & Jun
(bZIP) straight
MyoD (bHLH) straight
R: S: T: R: S: T:
[A-T-G-G-A-T-G-A-G-T-C-A-T-A-G-G-A-G-A) -1 1
No. Ref. 38 17
3
41 17
4
44 17
5
5 17
6
O O 0 O- O O- O 1 0 O-l- O O- O O-O
- 0 0 0 0-
0 1-0-0-1- 0 1- 1 0 1 0 1- 1 1
-1 o- o o- l l-o- o o-o 1- 1 o 1 o o- l 3 o-o-o-o lo o o- l 1 1 o o o o o-o-l o o o o- l l-o-o-l- o o- l l- o o- l o o o-l o 1- 1 l- o o-o- o o- l 2- 1 2-1- 0 2
[T-C-A-A-C-A-G-C-T-G-T-T-G-A)
R: S: T: R: S: T:
1-1 0- 0 1 0 1 0 1- 0 0- 0 0
-1 2-o- l 1-1-1-1 1-1- 1 2-o -1 l-o- o l-o-l- o l-o- o l- o
o-l o- o 1 o 1 o l- o o- l o
-o 2-1- 1 1-1-1-1 l-l- o 2-0 -1 l-o- o l-o-l-o l-o- o 1-1
E. OTHE R SPECIFIC PROTEIN / DNA COMPLEXE S EcoRI restriction enzym e straight
R: S: T: R: S: T:
re-G-C-G-A-A-T-T-C-6-C-Gl 1 0 1-0 6-9 6- 0 1 0 1 o-o-o-o o o o-o-o- o o -l-o c-o- 2 o-2-o o-o- l 1 o l-o 6- 9 6- 0 1 o 1 o-o-o-o o o o-o-o- o o -l-o O-O- 2 o-2- o o-o- l
Bov. papillomavirus- 1 E2-DNA [C-G-A-C-C-G-A-C-G-T-c-G-G-T-c-G )4 68° R : o-oo 1 2-0-1-1-1- 0 2 l-o- o o concave S :1 1-1- 1 1- 1 o 2 o- l 1-1- 1 1 1 T: 1 l-o-o- o 1 o 1 o l-o-o- o 1- 1
Protein Met J represser straight (NDB) [2x25° (paper)] Arc represse r 50' R concave S EcoRV restriction enzyme straight
EcoRV restriction enzyme 50° convex
. Ref.
[T-A-G-A-C-G-T-C-T-A-G-A-C-G-T-C-T) 4 R: o l- o o o o- o o o l- o o o o- o o S: l-l-oo o- o 0- 1 l-l-o- o o- o o- l T: l-l-oo o- l o- l l-l-o- o o- l o- l
: :
T:
[A-T-A-G-T-A-G-A-G-T-G-C-T-T-C-T-A-T-C-A-T) 4 o 3 1 o- 2 2 o-o-o-l- 2 o- o 1 1- 3 1 1 1 o
6 17
7
7 17
8
9 17
9
8 17
9
-o-l o-l 1 o-o-o- o 1 o-o-o- o o o- l o-o-l
-1-2 1-2 2-2-o-l- o o o- o o-2- o o-o-o-o-o R: S: T: R: S: T: R: S: T: R: S: T:
[C-G-A-G-C-T-c-G) (non-cognat 2 o o-o-o-o- l -l-o o-o- o 1 o o-o-o-o o- o 1 . 1-1 o- o o o- o -o l-o-o-o- l 2
e sequence) 4
-o 1- 1 o o-o- o
[G-G-G-A-T-A-T-C-C-O (cognat e sequence) 4 -o-o-o 2 9 1- 1 l- o 1 o-o-l-o-o- l o 1 o-o 1-3-2- 3 o- o o o o- o 1 9 l- o o o l-o-o-o-o-o-o-o 1 o-o 2-4-2- 4 2- 0 o
Protein EcoRV restriction enzyme
(80°)
convex
R: S: T: R: S: T:
No. Ref.
[A-A-G-A-T-A-T-C-T-T) 1 1 - 0 2 9 2 1 34
80°
convex
TATA-binding protei n (TBP) (Arabidopsis)
80°
convex
R: S: T: R: S: T: R: S: T:
R: S: T:
180
1 o-o- l o-o o 2 4
-l-o 0-1-4- 2 l-l- o 0 1 - 1 1 9 1 1 2 3 1 o-o-l-o- o o 1 3 -o-o 1-2-3- 1 o-l- l
IHF rG-C-C-A-A-A-A-A-A-G-C-A-T-T-G-C-T-T-A-T-C-A-A-T-T-T-G-T-T-G-C-A-C~C5 160° R :o 1 o-o-l-l-o-o-o-o- o 1 9-3-o-o-l-l-l- l o 9 1 1 o 2-1-1- 3 o- 2 o- o concave S : -oo 2-o- o o-o-l-o-o l-o- l 3-o-o-l-o- l 2 l-o-l-o-o-o-2- o 3- o 2-1-O T: -oo o- o o o-o- o l- o o- 4 1- 1 o o- o l-o-o-o-l-l-l- o o o- o 2- 1 l- o o TATA-binding protein (TBP) (Saccharomyces)
50
fG-T-A-T-A-T-A-A-A-A-C-G. . . , -o 9 3 1 5 2 1 7-o-o- o -2-1-2 2- o l-o-o-l-3- l 0-2-3-1-5-1-1-3 o- o 1 -o 9 3 o 5 2 1 7-o-o- Q -2-1-1 2- o l-o-l-l-3- l 0-2-3-1-4-1-1-2 o- o 1 TG-C-T-A-T-A-A-A-A-G-G-G-C-A) -0-4 8 2- 1 2 l- o 6-2-1-2- 1 -3 1-3- 1 2 o o-o-l-o- o o 1 -2 2-2-1-1-2-1- 0 l-o-l- o o -2-2 7 1- 1 2 l- o 6-2-0-1- 1 -2 1-2- 2 2 o o o-l-Q-o- 2 2 -1 l-l-l-l-2-l-o- o o o- 3 o
1 18
1
52 13
2
53
133
Protein TATA-binding protein (TBP) (Human) 80° convex
R: S: T:
TATA-binding protein (TBP) (Human) 80° convex
R: S: T:
TFIIA /TBP/DNA H T complex 80° convex TFIIB /TBP/DNA HF complex 80° R convex S Kappa B P50 homodimer straight
rC-T-G-C-T-A-T-A-A-A-A-G-G-C-T-G) -1 o-l- l 8 2- 1 2 2- 0 6-1-2-0- 1
No. Ref.
54 13
4
55 13
5
56 18
2
57 18
3
61 18
4
o o-l-o-2- 2 2 o o o- l o-l- o 1 o-o-o O-O-2-1-2-1-O- 1 o-o- o 2 [C-G-T-A-T-A-T-A-T-A-C-G)
l - o 8 5 1 4 6 2 9 20 2-0-1-1 2- 2 3-1-2- 1 2 o 1-3-3-1-6-1-2-4-3- 0
TT-G-T-A-T-G-T-A-T-A-T-A-A-A-A-C) R:-2-2-0-0-0-2-0-2 7 1- 1 1 o- l 3 S: 2 o-o-o- o D-o-o-2- 2 3- o 1 o- 4 T:-l-o o-2- o o-l-o O-2-1-2-0- 0 o TG-G-C-T-A-T-A-A-A-A-G-G-G-C-T-G) -o-2-l 6 l - o 2 1 2 4-0-1-2-0- 4 -1-1 1-2- 1 3-0 o-o-o-o-2-o- l 5 o-l o-l-l-o-2-l-l- o o-o- l o o
: : T: R: S: T:
rG-G-G-A-A-T-T-C-C-C) O l o o o o l l l
2-l-o o- o o-o- o o
-l-l-o o o o-l-l- l
PVUII endonuclease [G-A-C-C-A-G-C-T-G-G-T-C straight R : 11
3 l-
o 4- o 2 3 o 1
)6
3 18
5
S: o o-l-o- l o-2-o- l o 2 T: -2-0-1-0-2-0-3-1-00o
Notes: Overall bend in DNA helix is given at left below the protein name, as well as whether the protein sits primarily on the concave or convex side of the bend produced in the DNA. Values in parentheses have been measured approximately from figures, when no angle is cited explicitly in the text. Hence they are to be considered merely suggestive of the true bend angle. rc = radius of curvature of a smoothly bent DNA helix. All Y-R steps are indicated in bold face. A-tracts, consisting of four or more successive AT base pairs without a disruptive T-A step, are underlined. 86 of the 157 A-A/T-T steps, or 55%, are found in A-tracts. For distributions of other step types, see Table 3. Observed roll, slide, and twist, as calculated by the 'Curves' program, are indicated below each step of the sequence. As shown below, Roll is numbered in 5° intervals centered around zero, Slide is numbered in 0.5 Å intervals centered around zero, and Twist is numbered in 5° intervals centered around 35°.
Range of the given variable:
Symbol   Roll            Slide               Twist
-1       -10° to -5°     -1.0 Å to -0.5 Å    25° to 30°
-o       -5° to 0°       -0.5 Å to 0.0 Å     30° to 35°
o        0° to 5°        0.0 Å to 0.5 Å      35° to 40°
1        5° to 10°       0.5 Å to 1.0 Å      40° to 45°
2        10° to 15°      1.0 Å to 1.5 Å      45° to 50°
3        15° to 20°      1.5 Å to 2.0 Å      50° to 55°
4        20° to 25°      2.0 Å to 2.5 Å      55° to 60°
5        25° to 30°      2.5 Å to 3.0 Å      60° to 65°
All steps greater than 9 are represented simply by a 9. Sequences followed by two sets of R/S/T values generally represent two independent molecules per asymmetric unit. For a more extended analysis using the new FREEHELIX program, see references 191 and 192.
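The notes to Table 6.2 describe two mechanical rules: how a roll, slide, or twist value is reduced to a one-character symbol, and what counts as an A-tract. The Python sketch below shows one way to express both. It is a reading of the legend, not the authors' own procedure: the function names are hypothetical, the bin edges (lower bound inclusive) are an assumption, and capping large negative values at -9 is assumed by symmetry with the stated cap at 9.

```python
import math


def step_symbol(value: float, centre: float, width: float) -> str:
    """Reduce one helix parameter to a Table 6.2 style symbol.

    Assumed bins: 'o' is [centre, centre + width), '-o' is
    [centre - width, centre), '1' and '-1' are the next bins out,
    and anything beyond 9 is written as 9.
    """
    k = math.floor((value - centre) / width)
    if k >= 0:
        return "o" if k == 0 else str(min(k, 9))
    k = -k - 1  # bin -1 becomes '-o', bin -2 becomes '-1', and so on
    return "-o" if k == 0 else "-" + str(min(k, 9))


def encode_step(roll: float, slide: float, twist: float) -> tuple:
    """Symbols for roll (5 deg bins), slide (0.5 A bins), twist (5 deg bins about 35)."""
    return (step_symbol(roll, 0.0, 5.0),
            step_symbol(slide, 0.0, 0.5),
            step_symbol(twist, 35.0, 5.0))


def find_a_tracts(sequence: str, min_length: int = 4) -> list:
    """Return (start, end) index pairs, end exclusive, of A-tracts.

    An A-tract here is a run of at least min_length A or T bases
    that contains no T-A step; a T-A step splits the run in two.
    """
    seq = sequence.upper()
    tracts, start = [], None
    for i, base in enumerate(seq):
        if base not in "AT":
            if start is not None and i - start >= min_length:
                tracts.append((start, i))
            start = None
        elif start is None:
            start = i
        elif seq[i - 1] == "T" and base == "A":
            if i - start >= min_length:
                tracts.append((start, i))
            start = i
    if start is not None and len(seq) - start >= min_length:
        tracts.append((start, len(seq)))
    return tracts


if __name__ == "__main__":
    # [R/S/T] values quoted for the C-A/T-G step of Fig. 6.4(a).
    print(encode_step(-6.15, 2.59, 50.8))
    # Decamer of Fig. 6.3: only the AAAA run qualifies as an A-tract.
    print(find_a_tracts("CAAAGAAAAG"))
```

Under these assumptions the alternating sequence CGATATATCG contains no A-tract at all, because every T-A step interrupts the run before it reaches four base pairs.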
[*late [. I 1 \\lnr.oii o: A-1l\A. ;,ii (.i i'. .'.iul A: I \V LIISHII t JH k b.i--.1 ; xn^ wiih Ui-.-ir p;:^it>.f :: \Ji-.iUiin -iu-v mJu .iii-il ,L- In ilrn^i'ii lio:iil linings .u- .KVi-i'l'irv Mi: OKiv--, i'iL :iiMr.it:i!ii at tin: Ki-o •.\u:^. Alnn.^r .ill '::\-.\r:•••.:•.• n L'i!n,i:in;. ^-L". ..;n :ln- h.iii-> ,i:\- iv.-iipi^ -jxn1].!; ri.i;' ;hc .ii'i'.i-- LI: il:i- ^luilnv. JJIDOM- ili.n .-.iv [•lni'it'ii .'V i.:"i-1.11 n!rn;iils. \V.IUT> 111 lilt' •.ri.illmv t;ii«!Vi' .nv s:un\ I) ir ^IVCi:. In I In- -.kvp ::1OUV'.- thi' \\ iu-;'> i iiiir.ii'riiisi. rhr ^]:usp!:.iii- :iniiv:ir ovysjciis ,iu- blur. rliii^L1 iniiuyc:! lii>ilJ:[]Lr x> pn^):nHis (i ;ir 7 in [•ir-.ii:'v in- ii-ii. .'.iiii :lu- Vi.itL'r ii-.iM.-nik1- ':':\ ,trit:n;: pn~iiu-.ii . : .1 |!v--.in;.;i:n^ .1:1- S/U-ATI in \flUnv NiiU1 l :!;• riiri.iiiL!i:> lr> ,lr.i::..:ii of [In 1 :• iii-.-iJ.'.! ;-.rin.|.' ir- : l i v n r i u - .'ilnn l::iik.'H lru\. id W.icor L:n'lL'i".i]i'> hriii^i^^ ^-oiisc'.-.iiix.. 1 O i l ' |i|i,>-iiii.-.ri- ovv. 1 -,-!! .ico:ns -'blue 1 uip.'ili.-r \\".ili \\,'.u-i^ :u-J--,unii; ihi- O - l ' f \ 4 p-.^-.rn'ii ,-t'p',nuiiJiiu.- :\,'llov. i iir -.lu- N~ ]u^i:im. IH'^-.IVIIKX -'ivdi t-^rin p t -:;!.-.i;:-ri.i! .iv.iys. .;i: IV-n.ii'.on.il M . i t c - - r.r : . - i I lie- llV..--.U'.'.,-H hoil-.lil:.; 7ii t l ; c .'iki.-- i.'" -,ll:"-:•, ^ i t . t . t i T A t ' x . x ' : :.li 1 "!.:. I:-,M.( :(:rt ,t-( A ' < . ( ;t;: -3~;.
Plate II. Structure of integration host factor protein (IHF), complexed with B-DNA (181). The DNA is bent sharply at the top by insertion of loops into the minor groove, and has two straight segments that pack along the protein to left and right. (From ref. 181.)
Plate III. Structure of d(CGCGCG) as Z-DNA. (A) The two stacked hexanucleotide duplexes in the crystal structure of d(CGCGCG) are shown as a stereodiagram. The upper duplex is shown as a CPK model using the van der Waals radii of each atom to define spheres for each atom. The lower duplex is shown as a stick model, with the backbone phosphates traced with a ribbon to show the zigzag nature of Z-DNA. The nucleotides are numbered from the 5'- to the 3'-terminus of each strand, 1-6 for one strand and 7-12 for the complementary strand. The d(CpG) dinucleotide in the anti-p-syn stacking arrangement (B) and the d(GpC) dinucleotide in the syn-p-anti stacking arrangement (C) are shown looking down the helix axis. Hydrogen bonds are shown as dashed lines connecting the bases of each base pair.
Plate V. Comparison between the crystal structure of [...] and [...] duplex [...] containing terminal C:G and A:T Watson-Crick pairs. One strand is shown in a space-filling van der Waals form, while the other strand is shown in stick-bond form to demonstrate the interstrand stacking [...]. The interstrand base stacking between the [...] unpaired [...] residues from opposite strands can be seen at the interface of the space-filling and stick-bond models.
Plate XIII. X-ray crystal structure of a PNA:DNA:PNA (parallel) triplex [...]. (a) View into the [...] groove of the duplex. (b) View down the helical axis. After [...].
Plate XIX. A-tract-containing duplexes. All duplexes are 12 bp long. A-tracts are shown in yellow and the rest in [...]. From left to right: [...].
(<χ> = 208°), and the more compact syn conformation (<χ> = 67°) (Table 7.3), with the base essentially sitting on top of the deoxyribose ring (Plate III). The steric inhibition to pyrimidines adopting the syn conformation imposes the characteristic APP sequence motif commonly associated with Z-DNA. This alternating pattern of anti/syn nucleotide conformations, however, is strictly adhered to in all the crystal structures of Z-DNA, including those containing out-of-alternation sequences and non-Watson-Crick base pairs. Thus, it is the alternation in the backbone and not the sequence that defines Z-DNA.
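The anti/syn alternation described above can be checked directly from the glycosidic angles. The following is a minimal sketch rather than part of the original analysis: it assumes the common convention in which χ is reduced to 0-360°, with syn covering 0 ± 90° and anti covering 180 ± 90°, and it uses the χ values listed for one strand in Table 7.3. The function name `glycosidic_class` is hypothetical.

```python
def glycosidic_class(chi_deg: float) -> str:
    """Classify a glycosidic torsion angle chi (degrees) as 'syn' or 'anti'.

    Assumed convention (not stated in the text): chi is reduced to [0, 360);
    'anti' spans 180 +/- 90 degrees and 'syn' spans 0 +/- 90 degrees.
    """
    chi = chi_deg % 360.0
    return "anti" if 90.0 <= chi <= 270.0 else "syn"


# chi values (degrees) for residues 1-6 of d(CGCGCG), as listed in Table 7.3
strand1 = [("C1", 208.5), ("G2", 57.2), ("C3", 202.4),
           ("G4", 52.5), ("C5", 214.8), ("G6", 79.1)]

pattern = [glycosidic_class(chi) for _, chi in strand1]

# True when every nucleotide differs in class from its 3' neighbour
alternates = all(a != b for a, b in zip(pattern, pattern[1:]))

for (name, chi), cls in zip(strand1, pattern):
    print(f"{name}: chi = {chi:5.1f}  ->  {cls}")
print("strictly alternating:", alternates)
```

Under this convention every cytosine falls in the anti range and every guanine in the syn range, which is the alternation the text identifies as the defining feature of Z-DNA.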
Fig. 7.1. Comparison of (A) the guanine nucleoside in the syn conformation and (B) the cytosine in the anti conformation of d(CGCGCG) as Z-DNA. The nitrogens and oxygens of each base, along with the atoms of the furanose ring of the 2'-deoxyribose sugar, are labelled. The arrows show the carbons in the sugar rings that define the C3'-endo and C2'-endo sugar puckers for the syn guanine (A) and anti cytosine (B). The rotation about the glycosidic bonds that defines the syn and anti conformations of each nucleoside is labelled χ.
Table 7.3. Helical parameters of d(CGCGCG) crystallized in the presence of spermine⁴⁺ and Mg²⁺ [4]

Base   χ       Sugar pucker     Base   χ       Sugar pucker
C1     208.5   C2'-endo         G12    79.5    C2'-endo
G2     57.2    C3'-endo         C11    203.9   C2'-endo
C3     202.4   C2'-endo         G10    64.7    C3'-endo
G4     52.5    C3'-endo         C9     200.1   C2'-endo
C5     214.8   C2'-endo         G8     69.5    C3'-endo
G6     79.1    C2'-endo         C7     217.7   C2'-endo
<χ cytosine> = 207.9 ± 7.1      <χ guanine> = 67.1 ± 11.1

Step                   Twist (Ω)      Rise (Dz)    Roll (ρ)      Tilt (τ)
d(CpG) step
(C1:G12)/(G2:C11)      -8.5           3.8          -3.0          6.9
(C3:G10)/(G4:C9)       -9.1           3.8          3.6           1.1
(C5:G8)/(G6:C7)        -10.6          4.3          -2.1          0.7
Average                -9.4 ± 1.1     4.0 ± 0.3    -0.5 ± 3.6    2.9 ± 3.5
d(GpC) step
(G2:C11)/(C3:G10)      -48.8          3.7          -0.8          -0.6
(G4:C9)/(C5:G8)        -51.4          3.6          0.4           0.2
Average                -50.1 ± 1.8    3.7 ± 0.1    -0.2 ± 0.8    -0.2 ± 0.6

Base pair   Tip (θ)      Inclination (η)   Buckle (κ)   Propeller twist (ω)   Displacement dx   Displacement dy
C1:G12      3.0          6.9               0.3          0.8                   3.3               2.5
G2:C11      2.1          7.5               4.8          2.1                   3.1               1.9
C3:G10      -1.5         6.4               2.8          5.6                   3.1               2.2
G4:C9       -1.1         6.6               5.9          3.4                   3.1               2.4
C5:G8       1.0          7.3               0.1          0.6                   3.5               2.0
G6:C7       0.9          7.7               4.4          3.2                   3.4               1.9
Average     0.7 ± 1.8    7.1 ± 0.5         3.1 ± 2.4    2.6 ± 1.9             3.3 ± 0.2         2.2 ± 0.3

All parameters were calculated with NASTE. All values are in degrees, except rise (Dz) and displacement (dx, dy), which are in Å.
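The averages quoted in Table 7.3 can be reproduced from the individual entries. The sketch below is an illustration, not the NASTE program: it assumes the quoted uncertainties are sample standard deviations (n - 1 denominator), which is the assumption that matches the tabulated values, and it uses only the χ angles listed in the table.

```python
from statistics import mean, stdev

# Glycosidic angles chi (degrees) for all 12 residues, from Table 7.3
chi = {
    "C1": 208.5, "G2": 57.2, "C3": 202.4, "G4": 52.5, "C5": 214.8, "G6": 79.1,
    "C7": 217.7, "G8": 69.5, "C9": 200.1, "G10": 64.7, "C11": 203.9, "G12": 79.5,
}

cytosine = [v for k, v in chi.items() if k.startswith("C")]
guanine = [v for k, v in chi.items() if k.startswith("G")]

# stdev() is the sample standard deviation (assumed to be what the table reports)
chi_c = (mean(cytosine), stdev(cytosine))
chi_g = (mean(guanine), stdev(guanine))

print(f"<chi cytosine> = {chi_c[0]:.1f} +/- {chi_c[1]:.1f}")
print(f"<chi guanine>  = {chi_g[0]:.1f} +/- {chi_g[1]:.1f}")
```

The same two calls can be applied to any column of the table, for example the twist values of the three d(CpG) steps.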
In an antiparallel DNA double helix, there is a major groove and a minor groove. B-DNA has a deep major groove and shallow minor groove. Z-DNA has a minor groove that is a deep narrow crevice, which brings the phosphate groups of opposite strands closer together than in A- or B-DNA. In contrast, the major groove of Z-DNA is more a convex surface than a true groove and, consequently, exposes more atoms to solvent than would be expected for B-DNA (Plate III).
The conformations of the deoxyribose sugars along the phosphoribose backbone are strongly affected by the alternating anti/syn structure of Z-DNA. These sugar conformations are defined by how the furanose ring is distorted (or puckered) from planarity (Fig. 7.1). In B-DNA, the sugars adopt the C2'-endo conformation in which the C2' carbon sits above the plane (towards the base) formed by the C1', O1', and C4' atoms (see Chapter 1). In A-DNA, the sugar puckers are C3'-endo. The deoxyriboses of Z-DNA alternate between C3'-endo for nucleotides in a syn and C2'-endo for nucleotides in an anti conformation (Table 7.3). This alternating pattern is seen for all sequences, including non-APP sequences that place pyrimidines syn. In the crystal structure of d(CGCGCG), as well as other Z-DNA sequences, the 3'-terminal dG nucleotide has a C2'-endo sugar, even though the guanine is syn (Table 7.3). This end-effect and other exceptions to sugar pucker alternation most likely reflect distortions induced by the crystal lattice rather than any inherent sequence effect.

The phosphate backbone linking the sugars of each nucleotide shows two different conformations, ZI and ZII. The ZI conformation is characterized by a pattern of alternating torsion angles along the backbone (α to ζ). The ZII form shows exceptions to this alternating pattern (most prominently at α, β, and γ), usually between the fourth and fifth base pairs of one strand of the hexamer duplex. The ZII conformation rotates the phosphate out and away from the minor groove crevice at this nucleotide. This differentiates one strand from the other in the crystal for most of the structures in which the asymmetric unit is the DNA duplex; however, the bases are not dramatically affected by these deviations. The ZII pattern has been attributed to crystal packing effects, and has been suggested to be stabilized by a specific pattern of waters at the interface between Z-DNA duplexes (24). The ZI pattern is generally considered to be representative of the average structure of the Z-DNA backbone, while the existence of the ZII pattern reflects the degree of flexibility in the backbone of an otherwise rigid structure.

This adherence to a characteristic zigzagged pattern in the backbone, with the nucleotides always in the alternating anti/syn conformation even for non-APP and non-Watson-Crick base pairs, suggests that Z-DNA is very rigid in its conformity to a structural and not to a sequence pattern. The repeating unit of Z-DNA is therefore a dinucleotide (dn) with very well-defined geometries. The zigzag pattern of the backbone results in the stacking of the base pairs in two different arrangements. A d(CpG) dinucleotide places the pyrimidines in the anti conformation 5' to syn purines along each chain (an anti-p-syn step), while the alternative d(GpC) dinucleotide has a syn purine stacked over an anti pyrimidine (a syn-p-anti step) (Plate IIIb, c). In the anti-p-syn dinucleotide, the bases of the pyrimidines from opposite strands are actually stacked, while the purines stack over the deoxyriboses of the adjacent pyrimidines along the same strands. It has been suggested that this latter stacking is stabilized by a favourable electrostatic interaction between the π electrons of the purine and the non-bonding electrons of the O4' oxygen of the sugar ring (74). The syn-p-anti step places the six-membered ring of the purine over the adjacent pyrimidine of the same strand.
Thus, although an argument can be made that the anti-p-syn stacking arrangement is more stable, it is difficult to compare the base stacking interactions accurately because they are so different between the two stacking modes. There may be a difference imposed by the solvent interactions with Z-DNA, as will be discussed later. For now,
we will treat the anti-p-syn dinucleotide as the repeating unit in Z-DNA. The sequence d(CGCGCG) can therefore be thought of as three stacked repeats of d(CpG) dinucleotides in the anti-p-syn conformation, with the interfaces being the syn/anti arrangements. Thus, the overall shape of Z-DNA remains fairly consistent across all the Z-DNA crystal structures that have been determined. Factors such as sequence and solvent interactions affect the details of the structure, which are best described by the helical parameters.
2.2 The helix structure of d(CGCGCG)

We will compare the helical parameters of the various Z-DNA structures in order to elucidate the effect of any particular factor on the conformation. A set of standard definitions for helical parameters has previously been established by the Cambridge Workshop (75; see also Chapter 2); however, some of these parameters have a special meaning for Z-DNA. We have therefore developed an algorithm, NASTE (nucleic acid structure evaluation), to calculate various helical parameters specifically for Z-DNA. The algorithm first transposes all base pairs to a common frame of reference, defined by the helix axis and the normal from the helix axis to the base pair's long axis. The helical parameters of each base pair and each base step within the structures are calculated from this frame of reference. The effects of cations, sequence, base modifications, and crystal packing forces on these parameters will be discussed in this chapter.

The helical parameters for the anti-p-syn and syn-p-anti base steps of Z-DNA include the rise (Dz) and helical twist (Ω) between each base pair. The single-crystal structures of Z-DNA are long and narrow, with an average helical rise (<Dz>) of 3.8 Å and a width of ~20 Å. By comparison, the <Dz> and width of B-DNA are 3.4 Å and ~24 Å, respectively. In Z-DNA, the average helical twist (<Ω>) is -30° per base pair; thus, each base pair of Z-DNA is underwound on average by -66° relative to B-DNA (<Ω> = 36°). For the remainder of this discussion we will only be comparing Z-DNA structures; thus the terms 'overwound' and 'underwound' will refer to more and less left-handed twists (negative Ω), respectively. The repeat unit of Z-DNA, however, is the dinucleotide and thus it is more accurate to compare Ω for the distinct dinucleotide repeats in the structure. The d(CpG) step in d(CGCGCG) is characterized by <Ω> = -9.4 ± 1.1°, while for the d(GpC) step <Ω> = -50.1 ± 1.8° (Table 7.3), to give a total <Ω> = -59.5° for the sum of the dinucleotide steps (or <Ω> = -29.8° per base step).

Roll (ρ) and tilt (τ) describe the angles between adjacent base pairs along their long and short axes, respectively. Positive roll indicates that the bases open towards the major groove, and positive tilt indicates that the angle opens towards the leading strand, which is defined as the strand containing the first nucleotide (Plate III). NASTE's assignments of these parameters are consistent with the Cambridge conventions (75). The d(CpG) steps in Z-DNA have a greater average roll (<ρ> = -0.5 ± 3.6°) than the d(GpC) steps (<ρ> = -0.2 ± 0.8°) (Table 7.3). The average roll for all steps is -0.4 ± 2.6°. Tilt differs slightly between d(CpG) and d(GpC) steps. The average tilt is <τ> = 1.7 ± 2.9° for d(CGCGCG). By comparison, A-DNA tends to have a much larger roll (<ρ> = 6.3°) (76), and B-DNA tends to have only small degrees of roll (<ρ> = 0.6°) and tilt (<τ> = 0.0°) (76).
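The twist arithmetic in this section is easy to verify. The sketch below only restates the numbers given in the text and in Table 7.3; the variable names are illustrative, and the final step assumes, as the text does, that the per-base-pair twist is rounded to -30° before being compared with B-DNA.

```python
# Average helical twist per step type for d(CGCGCG), from Table 7.3 (degrees)
twist_cpg = -9.4     # d(CpG), anti-p-syn step
twist_gpc = -50.1    # d(GpC), syn-p-anti step
twist_b_dna = 36.0   # B-DNA average quoted in the text

# The dinucleotide is the repeat unit, so the two step twists are summed
twist_per_dinucleotide = twist_cpg + twist_gpc

# Averaged over the two steps of the repeat
twist_per_base_step = twist_per_dinucleotide / 2.0

# Underwinding relative to B-DNA, using the rounded per-base-pair value
unwinding_vs_b = round(twist_per_base_step) - twist_b_dna

print(f"twist per dinucleotide repeat: {twist_per_dinucleotide:.2f} deg")
print(f"twist per base step:           {twist_per_base_step:.2f} deg")
print(f"relative to B-DNA:             {unwinding_vs_b:.0f} deg per base pair")
```

The negative signs throughout reflect the left-handed sense of the Z-DNA helix.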
The structures of DNA duplexes are additionally described by how each base pair is rotated along its long and short axes (the rotational helical parameters) and translated along these axes (the displacement) relative to the helix axis. The rotational helical parameters (tip and inclination) are calculated as the angle between the perpendicular to the base plane (the base normal) and the helix axis. Tip (θ) measures the rotation around the long axis of the base. In our comparisons, a positive value for tip indicates that the base pair is rotated towards the major groove. Inclination (η) measures the rotation around the short axis of the base pair, with a positive inclination reflecting a rotation towards the second strand. The average tip observed in Z-DNA (Table 7.3) is <θ> = 0.7 ± 1.8° and is greater than that observed for B-DNA (<θ> = 0°) (76), but much less than the average tip observed in A-DNA (<θ> = 11.0°) (76). Likewise, Z-DNA has more inclination (<η> = 7.1 ± 0.5°) (Table 7.3) than B-DNA (<η> = 2.4°) (76), but less than A-DNA (<η> = 12.0°) (76).

Like helical rise, displacement is a measure of translation, but in this case, translation of the position of the base pair relative to the helix axis. Displacement of the helix axis from the short axis at the centre of the base is the x-displacement (dx), while that along the long axis is the y-displacement (dy). Positive values of dx reflect a translation towards the minor groove and positive dy towards the leading strand. Base pairs in B-DNA are essentially centred on the helix axis and therefore show little or no displacement; however, in Z-DNA, the helix axis is displaced by ~4 Å into the minor groove. This can be separated into average values of dx = 3.3 ± 0.2 Å and dy = 2.2 ± 0.3 Å, respectively, for d(CGCGCG) (Table 7.3). By comparison, the helix axis of A-DNA is highly displaced towards the major groove (with dx = -4.5 Å and [...]). [...] The propeller twist of Z-DNA (<ω> = 2.6 ± 1.9°) is less than that of B-DNA (<ω> = 11.0°) (17) and A-DNA (<ω> = 8.3°) (76). The average buckle is similar between Z-DNA (Table 7.3) (<κ> = 3.1 ± 2.4°) and A-DNA (<κ> = 2.4°) (76), but greater than in B-DNA (<κ> = 0.2°) (76).

In summary, Z-DNA is a long, narrow, double helix in which the planes of the base pairs all lie essentially perpendicular to the helix axis, with the helix axis lying in the minor groove. The alternating helical twist angles reflect the distinct difference between the d(CpG) and d(GpC) dinucleotide steps.
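The ~4 Å displacement of the helix axis quoted in this section is consistent with the two tabulated components. As a minimal sketch, assuming that dx and dy are measured along perpendicular axes (the short and long axes of the base pair) so that they combine as the sides of a right triangle:

```python
import math

# Average displacement components for d(CGCGCG), from Table 7.3 (angstroms)
dx = 3.3   # along the short axis of the base pair
dy = 2.2   # along the long axis of the base pair

# Magnitude of the total displacement of the helix axis from the base-pair centre
total_displacement = math.hypot(dx, dy)

print(f"total displacement = {total_displacement:.2f} angstroms")
```

The result is just under 4 Å, in line with the approximate value given in the text.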
2.3 The solvent structure of d(CGCGCG)

Both the deep narrow minor groove crevice and the convex major groove surface are important sites for Z-DNA interactions with solvent and with metal complexes. The most immediately obvious site of interaction at the major groove surface is the N7 nitrogen of the guanine bases. This is the most accessible nucleophilic group of the surface, and has been found to form covalent adducts with transition metals [e.g. copper(II) (48,53) and platinum(II) (61)]. Perhaps more important in terms of their effect on the stability of Z-DNA, however, are the hydrogen bonding interactions. The potential hydrogen bonding groups in Z-DNA are basically the same as those
present in B-DNA, with the exception that the N3 nitrogens of the adenine and guanine bases are not normally accessible in the minor groove crevice of Z-DNA. The hydrogen bonding groups interact with water molecules, with ligands of solvated magnesium and sodium complexes [Mg(H₂O)₆²⁺ and Na(H₂O)ₙ⁺, for n = 5-7 (reviewed in refs 77 and 78)], with the hexammine complexes of cobalt and ruthenium [Co(NH₃)₆³⁺ (42) and Ru(NH₃)₆³⁺ (41)], and with the polyamines spermine (39) and spermidine (40). We will focus first on the water structure and then on cation interactions and their effects on Z-DNA structure.

The solvent organization at the major groove surface and minor groove crevice of Z-DNA has been extensively studied by Gessner et al. (79) for three crystal forms of d(CGCGCG) (the forms crystallized with only magnesium, only spermine, and mixed magnesium/spermine solutions). The features that are common to these three crystal structures probably represent the typical organization of solvent around the d(CG) base pairs of Z-DNA.

There are two conserved patterns of water interactions observed at the major groove surface (Fig. 7.2). These regular solvent motifs connect cytosines to cytosines and guanines to guanines across the strands. In the first motif, two waters bridge adjacent cytosines on opposite strands of the anti-p-syn steps [the d(CpG) dinucleotides]. This appears to be the more stable pattern of waters. The bridging water molecules are very well ordered, as indicated by their low temperature factors (average = 16.8 ± 5.1 Å²), and fall into very well-defined geometries (with average water-cytosine hydrogen bond distances of 2.99 ± 0.15 Å, water-water distances of 2.94 ± 0.34 Å, and angles of 92.4 ± 6.5° for cytosine-water-water). The waters are not disrupted by either magnesiums or spermines in the crystal, even though hydrated magnesium complexes are located in close proximity to the connected cytosines.

The second motif at the major groove surface is formed by single waters that directly connect two guanine bases on the opposite strands of the syn-p-anti steps. These are less regular in structure than those that bridge the cytosines (with average hydrogen bond distances of 3.05 ± 0.36 Å between the waters and the O6 oxygen of the guanines, and guanine-water-guanine angles of 69.5 ± 5.7°). They are also readily displaced by hydrated magnesium complexes and spermine. Thus, this set of bridging waters at the syn-p-anti steps is less regular and apparently less stable than that at the anti-p-syn steps.

The minor groove crevice is lined by a continuous network of well-ordered water molecules. There are typically at least two water molecules lying in the plane of each d(CG) base pair (77). These form hydrogen bonds to the O2 keto oxygen of the cytosine and the N2 amino group of the guanine bases. The interconnected waters bound to the O2 oxygens of the cytosine bases form a continuous network, referred to as the spine of hydration. Similar spines of waters are observed in the minor grooves of B-DNA structures (80). The significance of regular networks of waters in the minor groove of DNA duplexes has previously been discussed for B-DNA (81) and is further analysed in Chapter 9. The basic conclusions from NMR studies on exchange between DNA-bound and bulk solvent were that these spines exist in solution in B-DNA (82,83) and thus can be treated as an integral part of the DNA structure (81). These same concepts are likely to apply to the hydration spine in the more rigid Z-DNA structures.
Fig. 7.2. Solvent interactions with d(CGCGCG) as Z-DNA. (A) Stereodiagram comparing the solvent structures at the major groove surface and the minor groove crevice of Z-DNA. The waters that interact at the major groove surface are shown on the upper duplex. Hydrogen bonds between each water are shown as solid lines, while hydrogen bonds from each water to the DNA surface are shown as dotted lines. Waters that bridge the stacked cytosines of the d(CpG) dinucleotide (through the N4 amino groups of the bases) are shown as dark spheres (labelled W and Wc), while waters that bridge the stacked guanines of the d(GpC) dinucleotide (through the O6 oxygen of the bases) are shown as open circles (labelled W CA). Waters that interact with the minor groove crevice are shown in the lower duplex. Dark spheres represent the spine of hydration that links the cytosines (through interactions with the O2 oxygen of the bases), while those that link the guanine N2 amino groups to the phosphoribose backbone are shown as open circles. The solvent interactions with (B) the d(CpG) and (C) the d(GpC) dinucleotide steps are shown looking down the helix axis. In addition to the labels described above, W G;1 represents the waters that link the syn guanine bases to the phosphoribose backbone. Waters that are hydrogen bonded to cytosines are shown as dark spheres, while those to guanine are shown as light spheres.
The waters at the guanine bases are significant in that they bridge the N2 amino groups to the phosphate oxygens of the backbone (Fig. 7.2b). This interaction may be important for stabilizing the syn conformation of the guanine bases. Any perturbation to the solvent interactions in the major groove surface and minor groove crevice caused by various base substituent groups will affect the stability of Z-DNA.
2.4 Cation effects on the structure of d(CGCGCG)

Z-DNA has been crystallized in the presence of several different types of cations, including magnesium and the polyamines spermine and, most recently, spermidine. The effect that cations have on the structure of Z-DNA is significant because the cations help to stabilize the left-handed structure in solution by screening, and thus shielding, the negatively charged phosphates. The phosphate-phosphate distances are closer in Z-DNA than in either A- or B-DNA because of the narrow minor groove crevice. The effect on the stability of Z-DNA is dependent on both the concentration and the charge of the cation, with higher charged ions being more effective at stabilizing this conformation. The stabilization of Z-DNA in solution follows the trend spermine⁴⁺ > spermidine³⁺ > Mg²⁺ > Na⁺ (14). In particular, the stability of different sequences as Z-DNA is dependent on the cation strength of a solution (CS = Σ Zi²[Mi], where Zi is the charge and [Mi] is the concentration of the cation type i). This relationship has been used as a quantitative method to predict the solutions for crystallizing different sequences as Z-DNA (22).

The polyamines have been extensively studied because they are known to aid in DNA condensation and to prevent thermal denaturation of the duplex (84). Levels of polyamines are highly dependent on the cell cycle, and are perturbed in cancer cells (reviewed in ref. 85). For these reasons, polyamine binding has been of interest not only to biologists, but also to crystallographers because of the analogy between crystallization and condensation.

Four crystal structures of d(CGCGCG) have been analysed to determine the effect of polyamines and magnesium on the structure of Z-DNA.
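The cation strength defined above, CS = Σ Zi²[Mi], is a simple charge-weighted sum and can be evaluated for any mixture of cations. The sketch below is illustrative only: the function name `cation_strength` is hypothetical, and the concentrations are invented for the example and are not crystallization conditions taken from the text.

```python
def cation_strength(cations):
    """Return CS = sum(Z_i**2 * [M_i]) for (charge, molar concentration) pairs."""
    return sum(charge ** 2 * conc for charge, conc in cations)


# Hypothetical solution: concentrations chosen for illustration only
solution = [
    (4, 0.002),   # spermine4+, 2 mM
    (2, 0.015),   # Mg2+, 15 mM
    (1, 0.050),   # Na+, 50 mM
]

cs = cation_strength(solution)
print(f"CS = {cs:.3f} M")

# Because charge enters as Z squared, 2 mM of a 4+ cation contributes
# as much as 32 mM of a monovalent cation.
spermine_only = cation_strength([(4, 0.002)])
sodium_equiv = cation_strength([(1, 0.032)])
print(f"{spermine_only:.3f} M vs {sodium_equiv:.3f} M")
```

The quadratic dependence on charge is what makes the polyamines so much more effective than sodium at a given concentration, in line with the stabilization trend quoted above.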
These structures include the magnesium only (MG) form (37), the spermine only (SP) form (38), the mixed magnesium and spermine (MGSP) form (4), and the mixed magnesium and spermidine (MGSD) form (40). Although all these crystals were grown in the presence of sodium ions, it is the interactions of the multivalent cations (specifically Mg²⁺ versus the two polyamines) that will be discussed here. The reference d(CGCGCG) structure to which we have been referring is the original MGSP form (4). The structures of the DNA in the MG, MGSP, and MGSD forms are nearly identical in all respects, except for the ligand interactions (Table 7.4). Thus, although the polyamines are more effective at stabilizing Z-DNA in solution, the crystal structures appear to be determined by the presence of magnesium. We will, therefore, treat the MG form as the reference for comparison, with the realization that the MGSP and MGSD forms are very similar to this.

The observed lattice of the SP crystal is different from that of the other d(CGCGCG) crystals, suggesting that this DNA structure is significantly different from the reference structure (Table 7.4). The DNA in the SP lattice is rotated by 70° around the helix axis, shifted by 3 Å along the helix axis, and rotated around the
Table 7.4. Effec t o f cations on the Z-DNA structure of d(CGCGCG ) Crystal form
Twist (O) (C1:G12)/(G2:C11) (G2:C11)/(C3?G10) (C3:G10)/(G4:C9) (G4:C9)/(C5:G8) (C5:G8)/(G6:C7) Average d(CpG) Average d(GpC) Rise (Dz) (C1:G12)/(G2:C11) (G2:C11)/(C3:G10) (C3:G10)/(G4:C9) (G4:C9)/(C5:G8) (C5:G8)/G6:C7) Average Roll (p) (C1:G12)/G2:C11) (G2:C11)/(C3:G10) (C3:G10)/G4:C9) (G4:C9)/C5:G8) (C5:G8)/(G6:C7) Average Inclination (n ) C1:G12 G2:C11 C3:G10 G4:C9 C5:G8 G6:C7 Average x-displacement (dx) C1:G12 G2:C11 C3:G10 G4:C9 C5:G8 G6:C7 Average
MG
MGSP
MGSD
SP
-9.2 -48.9 -9.4 -50.8 -12.2 -10.3 ± 1.7 -49.9 ± 1.3
-8.5 -48.8 -9.1 -51.4 -10.6 -9.4 ± 1.1 -50.1 ± 1.8
-9.0 -48.8 -8.7 -51.6 -11.6 -9.8 ± 1.6 -50.2 ± 2.0
-11.6 -47.7 -11.7 -49.3 -12.3 -11.9 ±0.4 -48.5 ± 1.1
3.8 3.6 3.9 3.5 4.1
3.8 ± 0.3
3.9 3.7 3.8 3.6 4.1 3.8 ± 0.2
3.9 3.7 3.4 3.6 3.4
3.8 ± 0.2
3.8 3.7 3.8 3.6 4.3
-0.8 -1.5 -1.1
-3.0 -0.8
-1.6 -0.9 -2.1
-4.6
0.3 3.6
3.6 0.4
0.0 1.4
3.6 + 0.2 2.5 1.2
-1.9
0.1 ± 2.1
-2.1 -0.4 ± 2.6
-0.6 ± 1.4
0.6 ± 3.9
6.0 5.9 6.8 7.4 8.1 7.4
6.9 7.5 6.4 6.6 7.3 7.7
6.2 7.1 7.5 7.5 9.3 8.1
5.7 4.1 1.4 1.4 2.3 0.6
5.6
6.9 ± 0.9
7.1 ± 0.5
7.6 ± 1.0
2.8 ± 1.9
-3.0 -3.1 -3.3 -3.3 -3.5 -3.4 -3.3 ± 0.2
-3.3 -3.1 -3.1 -3.1 -3.5 -3.4 -3.3 ± 0.2
-3.1 -3.1 -3.2 -3.2 -3.5 -3.4 -3.3 ± 0.2
-4.8 -4.8 -3.9 -3.4 -3.8 -4.2 -4.2 ± 0.6
All values are in degrees, except rise (Dz) and displacement (dx), which are in Å. MG refers to the crystal grown in the presence of Mg²⁺ only (37), MGSP is the crystal grown with Mg²⁺ and spermine (4), MGSD is the crystal grown with Mg²⁺ and spermidine (40), and SP is the crystal grown with spermine only (38).
intramolecular pseudo-twofold axis compared with the DNA in the MG and MGSP crystals (38). The most obvious differences in the Z-DNA structure of the SP form are the shorter ⟨Dz⟩ (3.6 Å) and larger ⟨dx⟩ (-4.2 Å) compared with the MG structure (3.8 and -3.3 Å, respectively) (Table 7.4). As a result, the SP structure of Z-DNA is shorter and wider than the reference conformation. This is probably a result of the binding of spermine to the major and minor grooves. The ⟨Ω⟩ shows that the d(CpG) steps are overwound (by ~2°) while the d(GpC) steps are underwound (by ~1.0°) in the SP structure relative to the reference MG structure (Table 7.4). The compensating under- and over-winding of the dinucleotide steps renders the overall ⟨Ω⟩ of the structure identical to that of the other Z-DNA structures. Finally, the SP form shows a slight increase in roll and a dramatic decrease in the inclination of the base pairs (Table 7.4).
In order to understand the effect of the cations on Z-DNA stability, we must first characterize the specific interactions that the cations and their ligands make with the DNA. Starting with the reference MG form, there are four unique hydrated magnesium clusters that were observed to bind the DNA duplex (Fig. 7.3). This
Fig. 7.3. Comparison of the cation interactions between the magnesium only (MG), mixed magnesium/spermine (MGSP), spermine only (SP), and mixed magnesium/spermidine (MGSD) forms of d(CGCGCG). Views perpendicular to (top) and down the helix axes (bottom) of each structure are shown. In the structures of the polyamines (spermine and spermidine), the nitrogen atoms are shown as spheres. In the MG form of the structure, each unique magnesium ion (waters not shown) is labelled as MgA, MgB, MgC, and MgD. The two unique spermine molecules of the MGSP form are labelled Sp1 and Sp2, while the single magnesium (which is symmetry related to MgC of the MG form) is labelled MgC'. Although there is only one unique spermine in the SP form, it makes three different interactions with each duplex. These three types of interactions are shown. Finally, the single unique spermidine (Sd), the three magnesiums (identical to MgB, MgC, and MgD of the MG form), and the cation identified as a sodium (labelled as Na, but similar in position to MgA of the MG form) are shown for the MGSD form of d(CGCGCG).
Table 7.5. Hydrogen bonding contacts of the four unique magnesium ions in the MG form of d(CGCGCG)

Cation   Residue   Atom
MgA      G6(s)     PO, O4'
         G8        PO, N2
         C9        PO
         G10       PO
MgB      G4(s)     O6
         C5(s)     PO
         G6        PO
         G8(s)     N7
         G10       PO
         C11       PO(w)
MgC      C1(s)     N4(w)
         C5        PO(w)
         G6(s)     N7, O6(w)
         C9(s)     N4, PO(w)
         MgD(w)
MgD      C1(s)     N4
         G6        O6
         C9(s)     N4, PO
         G10(s)    O6, N7(w)
         G12(s)    O6
         MgC(w)

Interactions of the ion with adjacent, symmetry-related residues are indicated by the designation (s), while (w) indicates that the contact is mediated through a coordinating water molecule.
does not entirely neutralize the net -10 charge of the phosphoribose backbone in d(CGCGCG), requiring either one additional magnesium or two sodium ions that cannot be observed in the crystal structure. The observed ions, however, represent the specific interactions.
The hexahydrated MgA (Table 7.5) makes six hydrogen bonding contacts with the DNA. It interacts with a phosphate oxygen of G8, C9, G10, and of the G6 of a neighbouring duplex, as well as with the N2 of G8 and the O4' of the G6 of the neighbouring duplex. MgB (Table 7.5) is also hexahydrated and makes contacts with a phosphate oxygen of G6, G10, and C11, although the contact with the C11 oxygen is mediated by a water molecule. Three contacts are made with neighbouring duplexes. These are with the O6 of G4, the phosphate oxygen of C5, and the N7 of G8.
Mg²⁺ complexes MgC and MgD (Table 7.5) are linked together and share two water ligands. One of these ligands binds to N4 of C1 (a neighbouring duplex) and additional water molecules mediate contacts with the N4 of C9 and the O6 of G6 on a neighbouring duplex. Additionally, an MgC ligand makes contact with the N7 of the same G6. Additional contacts of MgC ligands include water-mediated interactions with the phosphate oxygen of C5 and the N4 of C1 in a neighbouring duplex. MgD has additional interactions with the DNA, specifically with the O6 of G10 and a
water-mediated contact with the C9 phosphate oxygen. It also makes contacts with two other duplexes, namely the O6 of G12 of one duplex and the O6 of G6 of another duplex. Thus, these Mg²⁺ complexes not only provide intramolecular stabilization of Z-DNA, but also stabilize the crystal through intermolecular interactions.
In the MGSD form, there is a single unique spermidine per duplex that displaces MgA, and the remaining three divalent cations are unperturbed (Fig. 7.3). The
Table 7.6. Hydrogen bonding contacts of polyamines and magnesium with the MGSP, MGSD, and SP forms of d(CGCGCG) a
Structure
MGSPb
Cation Residu Mgc G Sp1 G
Sp2 Cl(s
MGSDc
SPd
MgB Mgc MgD Sd C3(s
Sp C
e 6 4 C5(s) G6(s) G8 G8(s) C9(s) G10(s) ) G2 G2(s) C3 G10 C11 C11(s) G12
Atom N7 N7 PO PO, PO(w), O4' O6 PO(w) PO PO(w) 5'-OH N7 N7, O6(w) O6, N4(w) O6, N7 N4 PO O6
MG contact equivalent C(s)
B,C A,B A
A,C,D A,B D — D B
) G6 G12
PO(w) PO(w) P0(w)
3 G8 C9(s) G10(s) C11(s) G12(s)
PO N7 PO N7, O 6 PO PO
C D A — B A,C,D D B -
a Sp and Sd denote the polyamines spermine⁴⁺ and spermidine³⁺, respectively. PO denotes a phosphate oxygen. Interactions of the ion with adjacent, symmetry-related residues are indicated by the designation (s), while (w) indicates that the contact is mediated through a coordinating water molecule. 'MG contact equivalent' refers to the analogous Mg²⁺ complex of the MG structure.
b MGSP refers to the crystal grown in the presence of Mg²⁺ and spermine.
c MGSD is the crystal structure of Z-DNA with Mg²⁺ and spermidine.
d SP is the structure that was crystallized only with spermine.
spermidine itself interacts with the phosphoribose backbone of two adjacent duplexes in the crystal lattice. These are all mediated by water bridges between the amino nitrogens of the ligand and the oxygens of the phosphates. Specifically, these interactions (Table 7.6) are with phosphate oxygens of C3, G6, and G12. The C3 interaction is with a neighbouring duplex. The interaction with G6 is equivalent to a contact made by MgA in the other structures. Interestingly, the amino groups of a truncated analogue [N-(2-aminoethyl)-1,4-diaminobutane] of spermidine [spermidine is N-(3-aminopropyl)-1,4-diaminobutane] bind directly to the phosphates, and show direct interactions with the bases at the major groove surface (40).
Similarly, in the mixed MGSP form, one of the original Mg²⁺ clusters remains in place, but in this case the complexes MgA, MgB, and MgD are displaced by the polyamines (Fig. 7.3). There are two spermines per duplex in the asymmetric unit, each interacting with three DNA duplexes. Spermine 1 makes two contacts with the DNA (Table 7.6), one with the N7 of G4 and the other with the O6 of G8. The remainder of the interactions are with other duplexes. These interactions are with the phosphate oxygens of C5, G6, and C9 and the O4' of G6. There are also water-mediated contacts to the phosphate oxygens of G6, G8, and G10. The amino groups of spermine 2 also make numerous contacts with the DNA and neighbouring duplexes (Table 7.6). Direct interactions include those with the N7 of G2, the O6 of G10 and G12, and the N4 of C3 and C11 (although the interaction with C3 is water mediated). Interactions with other duplexes include the 5'-OH of C1, the phosphate oxygen of C11, the N7 of G2, and a water-mediated contact with the O6 of G2.
The interactions of spermine with the DNA are similar to those observed in the MG structure and, in fact, many of the spermine contacts are equivalent to those seen with Mg²⁺ in the MG form (Table 7.6).
In the SP crystal, there is one spermine per duplex. Each spermine interacts with three different DNA molecules, and three spermines interact with each DNA molecule (Fig. 7.3). This large number of interactions between the polyamine and the DNA is consistent with spermine's ability to condense DNA (38). The spermines interact with the DNA as follows (Fig. 7.3): one binds in the major groove of the DNA, the second interacts with the phosphates along the minor groove, and the third interacts with only the C9 and G10 of the DNA. Direct contacts are made with the phosphate oxygen of C3 and the N7 of G8. Interactions with neighbouring duplexes include hydrogen bonds with the phosphate oxygens of C9, C11, and G12 as well as interactions with the N7 and O6 of G10 (Table 7.6). These interactions are in common with interactions observed between all four Mg²⁺ and the DNA in the MG structure (Table 7.6). However, the interactions between spermine and the DNA duplex in the SP form versus the mixed cation MGSP form are not identical.
In summary, the cations make similar contacts with the DNA across the different structures, and display both intra- and inter-duplex interactions, which often involve the coordination of bridging water molecules. When comparing the concentration of cations required to crystallize each of these forms of d(CGCGCG), it became evident that spermine had twice the effect expected relative to other cations. We therefore simply increased the effective CS for spermine on Z-DNA crystallization by a factor of 2 (this has already been incorporated into the CS values in Table 7.2). This is an
empirical observation, but may be related to the base-specific interactions of this polyamine with Z-DNA.
The binding of spermine to supercoil-induced Z-DNA in closed circular DNA plasmids (86) appears to be consistent with that observed in the crystal. The association constant of spermine for Z-DNA [1.5 × 10⁷ M⁻¹ for d(CpG) and 1.2 × 10⁸ M⁻¹ for d(CpA/TpG) dinucleotides] is > 100-fold greater than that for B-DNA (1.4 × 10⁵ M⁻¹), consistent with the stabilizing effect that this polyamine has on the left-handed conformation. The size of the spermine-binding site for Z-DNA was determined to be 10.4 d(CG) base pairs. This is larger than that observed in the crystal structure of the SP form of d(CGCGCG) (1 spermine per duplex, or 6 bp/spermine). However, the crystal structure may overestimate the number of ligands actually bound to the DNA. The temperature factors for the spermines are about twice those observed for the DNA, even at -100°C. This suggests that the spermine sites may not be fully occupied and, therefore, the number of ligands bound per duplex is likely to be significantly less than 1. This would give an overall binding site size that is more consistent with the results from the solution studies.
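As a rough check on the "> 100-fold" statement, the quoted association constants can be compared directly. The short Python sketch below computes the ratio of the Z-DNA and B-DNA constants and converts it into an approximate free-energy difference using ΔΔG = -RT ln(K_Z/K_B). The temperature of 298.15 K and the function name `binding_preference` are assumptions made for illustration; they are not taken from the binding study (86).

```python
import math

# Association constants of spermine quoted in the text (units: M^-1).
K_Z_CPG = 1.5e7      # Z-DNA, d(CpG) dinucleotides
K_Z_CPA_TPG = 1.2e8  # Z-DNA, d(CpA/TpG) dinucleotides
K_B = 1.4e5          # B-DNA

R_KCAL = 1.987e-3    # gas constant in kcal mol^-1 K^-1
T_ASSUMED = 298.15   # K; assumed temperature, not stated in the text


def binding_preference(k_z, k_b, temperature=T_ASSUMED):
    """Return (fold preference for Z-DNA, ddG in kcal/mol)."""
    fold = k_z / k_b
    ddg = -R_KCAL * temperature * math.log(fold)
    return fold, ddg


for label, k_z in (("d(CpG)", K_Z_CPG), ("d(CpA/TpG)", K_Z_CPA_TPG)):
    fold, ddg = binding_preference(k_z, K_B)
    print(f"{label}: {fold:.0f}-fold, ddG = {ddg:.1f} kcal/mol")
```

Under these assumptions the d(CpG) ratio comes out just above 100, in line with the statement in the text; the free-energy values are only as good as the assumed temperature.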
2.5 Length effects on the structure of d(CpG) sequences as Z-DNA
The structures of alternating d(CpG) dinucleotides as Z-DNA have been determined for five different lengths of duplexes, from a single dinucleotide in the structure of d(CpG) to five dinucleotides in d(GCGCGCGCGC) (Table 7.2). A comparison of lengths shorter than that of the hexamer d(CGCGCG) allows us to determine whether the conformation of the anti-p-syn stacking in d(CpG) is inherent to this dinucleotide in the absence of significant flanking base pairs. Comparisons of longer sequences address the questions of whether the structure of d(CGCGCG), or any hexanucleotide, can indeed be extrapolated to longer and even infinite lengths of Z-DNA, and whether the anti-p-syn dinucleotide of d(CpG) is the stable repeating unit of Z-DNA.
The overall conformations of the structures in this comparison are all very similar to one another and to d(CGCGCG), with just a few exceptions. One interesting feature that is common to all structures is the presence of the ZII backbone conformation. The crystal lattice interactions that are associated with this perturbation in the reference d(CGCGCG) structure are not identical across this set of structures. It is unclear, then, which specific crystal lattice interactions are directly responsible for this backbone conformation.
Within this set of nine d(CpG)n Z-DNA sequences (Table 7.2), the heptamers d(GCGCGCG)/d(CCGCGCG) and d(GCGCGCG)/d(TCGCGCG) do not have blunt ends (44). These structures are essentially that of d(CGCGCG) as Z-DNA, with nucleotides dangling from the 5'-ends of each strand. These orphaned nucleotides pair between adjacent duplexes to form reverse Watson-Crick d(GC) and reverse wobble d(GT) base pairs that are sandwiched between two stacked d(CGCGCG) Z-DNA duplexes. These can therefore be treated as variations on the reference d(CGCGCG) structure in which the Z-DNA pattern is disrupted at the ends, serving as a true indicator of end-effects. In this case, the overall structure of the Z-DNA duplexed region is remarkably similar to that of the reference d(CGCGCG) in all
respects (Table 7.7). The terminal base pairs of the duplex region (C1:G12 and G6:C7, where the nucleotides are numbered according to the duplex Z-DNA regions only, ignoring the 5'-overhangs) show a larger buckle than found in any of the d(CG) base pairs of d(CGCGCG); but, otherwise, all helical parameters are reproduced, including the average helical twist at each dinucleotide step and even the C2'-endo sugar pucker of the 3'-terminal guanine that breaks the alternating sugar conformation along each strand. The notable exceptions are the shorter rise and greater buckle in the corresponding terminal d(CG) base pairs in the heptamer structures. Thus, these appear to be true end-distortions associated with the lack of a Z-DNA-like d(GpC) step between duplexes in the crystal lattice.
Unlike d(CGCGCG), the short duplexes of d(CG) and d(CGCG) do not stack end-to-end to form pseudo-continuous helices. There are no internal d(CpG) steps in either structure, and only a single internal d(GpC) step in the tetramer structure. We would expect, therefore, that these structures are essentially 'all ends'. The dimer has a twist, rise, and tilt comparable with the first d(CpG) step of the heptamer, making it less left-handed and shorter than comparable steps in d(CGCGCG) (Table 7.7). Thus the structure of d(CpG) is that of unconstrained Z-DNA ends. There are additional distortions to the dimer, such as the significant roll and propeller twist between and within the base pairs. These, however, may be related to the unusual ammonium cation present in this crystal that is not present in other Z-DNA structures.
The d(CpG) steps in the tetramer structure of d(CGCG) are the most overwound of all the structures, with ⟨Ω⟩ = -13°. The single d(GpC) step, however, is significantly underwound, compensating for the overwound d(CpG) and resulting in an ⟨Ω⟩ = -29.4° per base step that is almost identical to that of the d(CGCGCG) structures (Table 7.7). It should be noted that only the high salt, orthorhombic form of d(CGCG) (35) was available for this comparison. However, the structure of d(CCGCGG) has the 5'-terminal cytosine nucleotide flipped out to an extrahelical conformation and, thus, can be treated as a tetramer of four central Z-DNA base pairs, analogous to the treatment of the heptamers as six Z-DNA duplex base pairs with unusual ends. With respect to Ω, the tetramer within d(CCGCGG) is more similar to d(CGCGCG), particularly the MGSP form, than to that of d(CGCG). The similarity between the tetramer structures lies in the high negative roll of both the d(CpG) and d(GpC) base steps, a high negative tilt in the d(CpG) steps, large variations in the tip and inclination at each base pair, and large variations in propeller twist and buckle within each base pair (Table 7.7). These distortions are evidently associated with this short length of the duplex and, again, may reflect the structure of Z-DNA ends, as opposed to the internal dinucleotides that one would expect in longer sequences.
There are a number of octanucleotide Z-DNA structures that have been determined, including that of d(CGCGCGCG), but they are all in disordered lattices. The only reliable parameter that we can determine from this structure is the average helical rise per base pair (3.6 Å/bp), which was calculated from the length of the helical axis (the crystallographic c-axis) of 43.6 Å for six base pairs (45). This is shorter than the average for the alternating d(CG) tetramer and hexamer sequences.
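The compensation between overwound d(CpG) and underwound d(GpC) steps described above for the d(CGCG) tetramer can be illustrated with a small calculation. The Python sketch below averages the two step twists over one dinucleotide repeat, using the average values listed in Tables 7.4 and 7.7. The dictionary layout and the function name `twist_per_base_step` are illustrative choices, and a simple mean over one d(CpG)/d(GpC) repeat is assumed to be the quantity meant by "per base step".

```python
# Average twist (degrees) at d(CpG) and d(GpC) steps, taken from
# Tables 7.4 and 7.7 of this chapter.
STEP_TWIST = {
    "d(CGCGCG) MG":   (-10.3, -49.9),
    "d(CGCGCG) MGSP": (-9.4, -50.1),
    "d(CGCGCG) SP":   (-11.9, -48.5),
    "d(CGCG)":        (-13.0, -45.8),
}


def twist_per_base_step(cpg, gpc):
    """Mean twist per base step over one d(CpG)/d(GpC) dinucleotide repeat."""
    return (cpg + gpc) / 2.0


for name, (cpg, gpc) in STEP_TWIST.items():
    print(f"{name:16s} {twist_per_base_step(cpg, gpc):6.1f} deg per step")
```

For d(CGCG) this gives -29.4° per step, the value quoted in the text, and the hexamer forms all fall within about one degree of -30°, even though their individual step twists differ.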
The longest Z-DNA duplex crystal structure solved to date is that of the decamer d(GCGCGCGCGC). This sequence is unusual in that it starts with a guanine
Table 7.7. Helical base step and base pair parameters of d(CG)n sequences that crystallize as Z-DNA a
d(CG) d(CGCG)
b
d(CCGCGG)
d(CGCGCG) MGSP
d(GCGCGCG)/ d(TCGCGCG)
-7.8 -9.2
-8.5 -9.1 -10.6 -9.4 ± 1. 1
-7.4 -10.3 -10.6 -9.4 ± 1. 8
-6.5 -10.5 -9.8 -8.9 ± 2. 1
3.8 3.8 4.3
3.4 4.1 3.8
3.4 3.9 3.9
d(GCGCGCG)/ d(GCGCGCGCGC) d(CCGCGCG)
d(CpG) steps
Twist (Ω) (C1:G12)/(G2:C11) (C3:G10)/(G4:C9) (C5:G8)/(G6:C7) Average
Rise (Dz) (C1:G12)/(G2:C11) (C3:G10)/(G4:C9) (C5:G8)/(G6:C7) Average
Roll (ρ) (C1:G12)/(G2:C11) (C3:G10)/(G4:C9) (C5:G8)/(G6:C7) Average d
Tilt (τ) (C1:G12)/(G2:C11) (C3:G10)/(G4:C9) (C5:G8)/(G6:C7) Average d
-7.4 -12.
3 -13.6
-7.4 -13.
0 ± 0. 9
-8.5 ± 1. 0
8
3.8 3.8
8 ± 0. 1
3.8
3.2 3. 3.2 3.
-13.7 -7.
3.7
1 -0.7
13.7 3.
9 ± 4. 5
-5.8 -10.
1 -3.4
5.8 6.
8 ± 4. 7
-9.0 -4.0 6.5 ± 3. 5
-8.4
0.7
4.6 ± 5. 4
4.0 ± 0. 3
-3.0
3.6
-2.1 2.9 ± 0. 8
6.9 1.1 0.7
3.8 ± 0. 4
0.6 4.7 2.2
2.5 ± 2. 1
-5.8
0.2
-9.7 -9.7
3.9
3.7 ± 0. 3
3.9
0.9
3.2
-3.7
2.0
2.2 ± 1. 4
-4.4
0.6
2.9 ± 3. 5
-2.7 2.9 ± 2. 8
-0.1 1.7 ± 2. 4
-48.8 -51.4 -50.1 ± 1. 8
-47.5 -47.1 -47.3 ± 0. 3
-47.5 -48.0 -47.8 ± 0. 4
3.2 0.0 0.0
d(GpC) steps
Twist (Ω) (G2:C11)/(C3:G10) (G4:C9)/(C5:G8) Average
-45.8
-44.0
-45.8
-44.0
-50.3 -50.3
c
Table 7.7. Continued d(CG) d(CGCG)
Rise (Dz) (G2:C11)/(C3:G10) (G4:C9)/(C5:G8) Average
Roll (ρ) (G2:C11)/(C3:G10) (G4:C9)/(C5:G8) Average d
Tilt (τ) (G2:C11)/(C3:G10) (G4:C9)/(C5:G8) Average d
b
d(CCGCGG) d(CGCGCG ) d(GCGCGCG) / d(GCGCGCG)/ d(GCGCGCGCGC) MGSP d(TCGCGCG ) d(CCGCGCG)
3.7
3.7
3.7
3.7
-4.2
-6.1
4.2
6.1
4.3
-1.5
4.3
1.5
1 8 2.1 -7.4
9.0 3.0 -1.0 -6.2
3.7 3.6 3.710.1
3.6 3.7 3.7+10.1
3.7 3.6 3.7+10.1
3.2
-0.8 0.4 0.6+10.8
-1.8 -3.7 2.8+11.3
-1.8 0.1 0.9+11.3
-3.2
-0.6 0.2 0.4+10.6
1.9 1.0 1.5+10.6
1.1 2.6 1.9+11.1
0.0
3.0 2.1 -1.5 -1.1 1.0 0.9 1.6+10.8
-0.6 1.2 -3.6 0.1 -2.1 4.6 2.0+11.8
-0.9 0.9 -2.8 -2.9 -0.9 5.5 2.3+1.8
-1.6 1.6
5.8 3.8 4.0 5.0
4.4 3.2 3.8 6.4
-2.7 -2.7
3.2
3.2
0.0
Base pairs
Tip (θ) C1:G12 G2:C11 C3:G10 G4:C9 C5:G8 G6:C7 Average d
Inclination (η) C1:G12 G2:C11 C3:G10 G4:C9
13.7 7. 1.7 2.
7.7+8.5 4.9+2. 5.8 10. 2.2 5.
1 9 9.2 4.9
8
4.8+13.5 8.4 7.0 6.2 7.4
6.9 7.5 6.4 6.6
1.6
c
Table 7.7. Continued d(CG) C5:G8 G6:C7 Average d
Propeller twist (ω) C1:G12 G2:C11 C3:G10 G4:C9 C5:G8 G6:C7 Average d
Buckle (κ) C1:G12 G2:C11 C3:G10 G4:C9 C5:G8 G6:C7 Average d
d(CGCG)b d(CCGCGG) d(CGCGCG) d(GCGCGCG)/ d(GCGCGCG)/ d(GCGCGCGCGC)c MGSP d(TCGCGCG) d(CCGCGCG)
4.0 ± 2. 5 7.
5 ± 2. 5
10.6 1. -15.7 2.
9 0 3.6 4.2
13.1 ± 13.6 2. 4.6 8. 1.2 -2.
2.9 ± 2. 4 5.
9 ±1.2 5 8 8.9 -2.0 6 ± 3.7
7.3 ± 0.9 3.2 -7.7 3.7 0.9 3.9 ± 2.8 2.2 -6.6 4.9 -1.5 3.8 ± 2.4
7.3 7.7 7.1 ± 0.5
7.7 5.4 5.3 ± 1.4
6.4 2.6 4.5 ± 1. 6
0.8 2.1 5.6 3.4 0.6 3.2 2.6 ± 1.9
-0.7 2.2 -0.9 1.2 -3.6 5.4 2.3 ± 1.8
6.3 0.1 -0.7 -0.9 2.2 0.1 1.7 ± 2 .4
0.3 -4.8 2.8 -5.9 0.1 4.4 3.1 ± 2.4
10.4 -5.9 0.1 -0.7 8.2 -8.3 5.6 ± 4.3
10.0 -3.5 0.0 -2.1 3.8 -1.3 3.5 ±3.5
2.7 1.5 1.5
1.5
-10.5 10.5
10.5
a Underlined sequences denote the d(CpG) dinucleotides in the standard Z-DNA duplex. The numbering of residues refers only to those nucleotides in the duplex. All values are in degrees, except rise (Dz) and displacement (dx), which are in Å. MGSP refers to the form of d(CGCGCG) crystallized in the presence of magnesium and spermine.
b d(CGCG) refers to the high salt orthorhombic form [6].
c Only one value for each type of base step in the d(GCGCGCGCGC) decamer is shown. These values are repeated throughout the decamer because of the dinucleotide asymmetric unit.
d Averages of the magnitudes of roll, tilt, tip, inclination, propeller twist, and buckle are shown, and were calculated by Ave = (Σ|qi|)/n, where qi is the value of that parameter at base step i and n is the number of values averaged.
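The magnitude average described in the last footnote of Table 7.7 can be sketched in a few lines of Python. The example takes one six-value row as it appears in the extracted table (-0.7, 2.2, -0.9, 1.2, -3.6, 5.4; printed average 2.3 ± 1.8) and averages the absolute values. The function name `magnitude_average` is hypothetical, and the use of the sample standard deviation for the ± term is an assumption; it is chosen only because it reproduces the printed value for this row.

```python
import statistics


def magnitude_average(values):
    """Mean and sample standard deviation of |q_i| over all listed values."""
    magnitudes = [abs(q) for q in values]
    return statistics.mean(magnitudes), statistics.stdev(magnitudes)


# One six-value row as printed in Table 7.7 (reported average: 2.3 ± 1.8).
row = [-0.7, 2.2, -0.9, 1.2, -3.6, 5.4]
mean, sd = magnitude_average(row)
print(f"{mean:.1f} ± {sd:.1f}")  # prints 2.3 ± 1.8
```

Averaging magnitudes rather than signed values keeps alternating positive and negative distortions from cancelling, which a signed mean of the same row (about 0.6) would largely hide.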
nucleotide and thus there are more d(GpC) steps (5) than d(CpG) steps (4). Shorter alternating d(GpC) sequences that have been solved crystallographically [d(Gm⁵CGCGC) and d(Gm⁵CGm⁵CGCGC)] were in the A-form (87). The unmethylated versions of these hexamer and octamer sequences crystallize, but are highly disordered, with the octamer showing a strong Bragg reflection at 3.4 Å resolution, suggesting that it is probably in the B-form. Thus, it appears that Z-DNA is not the preferred form in alternating d(GC)n sequences until n ≥ 5 dinucleotides. This is consistent with the solution studies of Quadrifoglio et al. (88), which showed that oligonucleotides of d(GC)n are left-handed only in longer sequences (n > 7 dinucleotides), while shorter sequences (3 < n < 7 dinucleotides) remain right-handed even under dehydrating conditions. Thus, it does appear that the d(CpG) step is the significant determinant for Z-DNA formation, and in oligonucleotides where the d(GpC) steps would be dominant, the left-handed conformation is not stable. In longer sequences, the numbers of destabilizing d(GpC) and stabilizing d(CpG) dinucleotides become equalized, allowing Z-DNA to form.
Unfortunately, the structure of d(GCGCGCGCGC) shows positional disorder and therefore end-effects could not be distinguished from the remainder of the structure. Still, the average values for the helical parameters can be compared to the averages for the shorter DNA lengths. For the most part, this decamer is very similar to the hexamers. The most interesting deviation is that the rise at the d(GpC) step is significantly shorter (by ~0.5 Å) than in the hexamer or tetramer structures, to give ⟨Dz⟩ = 3.2 Å (Table 7.7). When compared with the ⟨Dz⟩ of the octamer structure, which is intermediate between the shorter (tetramer and hexamer sequences) and this longer sequence, the compressed rise appears to be length dependent and suggests that the shorter sequences have an elongated d(GpC) step. Alternatively, the shorter ⟨Dz⟩ of the octa- and deca-nucleotide structures may be related to the crystal lattice since both are in disordered hexagonal space groups. In support of this, the disordered d(CGCG) tetramers crystallize in hexagonal space groups and show shorter rises (3.61 to 3.67 Å) when determined from the lengths of the helix axes (36).
When comparing all these lengths of alternating d(CG) sequences as Z-DNA, the crystal structures of the hexanucleotides appear indeed to be a reasonable model for long, and perhaps even infinite, lengths of Z-DNA. The helical twist for Z-DNA is very consistent, at approximately -30° per base step for all lengths. The rise at the Z-DNA stabilizing d(CpG) steps is ~3.8 Å, but may be slightly exaggerated in the shorter sequences at the d(GpC) step. The base pairs are all nearly perpendicular to the helix axis, with very little distortion to the base pair plane (as would be expected for this rigid structure). It is also clear that the base pairs at the ends of a Z-DNA stretch (as typified by the dimer and tetramer structures, and the ends of the heptamer structures) are more variable in structure.
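The consistent twist of about -30° per base step translates directly into a helical repeat. The Python sketch below converts an average twist into base pairs per turn and, with an average rise, into an approximate helical pitch. The function names are hypothetical, and treating the rise as uniform over all steps is a simplifying assumption; the two rise values used are the ~3.8 Å hexamer average and the 3.6 Å/bp octamer average quoted in this section.

```python
def base_pairs_per_turn(twist_deg):
    """Number of base steps needed for one full 360° turn of the helix."""
    return 360.0 / abs(twist_deg)


def helical_pitch(twist_deg, rise_angstrom):
    """Approximate pitch (Å), assuming the same rise at every base step."""
    return base_pairs_per_turn(twist_deg) * rise_angstrom


TWIST = -30.0  # degrees per base step; negative sign denotes a left-handed helix
for rise in (3.8, 3.6):
    print(f"rise {rise} Å: {base_pairs_per_turn(TWIST):.0f} bp/turn, "
          f"pitch ≈ {helical_pitch(TWIST, rise):.1f} Å")
```

With these inputs the repeat is 12 base pairs (six dinucleotides) per turn, and the estimated pitch shifts by a little over 2 Å between the two rise values, which gives a feel for how the shorter rise in the longer sequences would compress the helix.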
3. Sequence and substituent effects on the structure and stability of Z-DNA
The tendency of dinucleotides to form Z-DNA is as follows: d(m⁵CpG) > d(CpG) > d(CpA)/d(TpG) > d(TpA) (77). In order to understand the structural basis behind
Fig. 7.4. Definitions and structures of variations in d(CG)- (top) and d(TA)- (bottom) type base pairs. Substituents at the C5 carbon of the pyrimidine bases are labelled RA, while those at the C2 carbon of the purine bases are RB. In the d(CG)-type base pairs, substituents at the C5 carbon of the cytosine base form 5-methylcytosine and 5-bromocytosine. Removing the amino group at the C2 carbon of guanine forms the unusual inosine nucleotide. In the d(TA)-type base pairs, removing the methyl group at the C5 carbon of thymine forms the unusual deoxynucleotide uridine, while adding an amino group to the C2 carbon of adenine forms the unusual 2-aminoadenine (or diaminoadenine) nucleotide.
these trends, DNAs containing many different sequences and base modifications have been crystallized as Z-DNA. We will focus the discussion here on base modifications that both stabilize and destabilize the Z conformation in terms of the substituent groups that are added, deleted, or replaced in the standard bases of cytosine, thymine, guanine, and adenine (Fig. 7.4).
In this analysis of sequences that have been crystallized as Z-DNA, we compare hexanucleotide sequences that have been crystallized in the same crystal lattice to determine which structural features are sequence versus crystal packing effects. The impact that sequence has on the stability of Z-DNA will be addressed by considering two measures of its stability relative to B-DNA. These are the solvent free energies (SFEs) and the cationic strength (CS) of the crystallization solution. SFEs are estimated from the solvent-accessible surfaces (SAS) calculated for the DNA molecule and, therefore, reflect the energy associated with the DNA in an aqueous environment. The difference in SFE for a sequence in the Z-form versus the B-form (ΔSFE_Z-B) is indicative of the sequence's stability as Z-DNA (89). The other measure of Z-DNA stability that is relevant to the sequences in a single crystal relies on the recognition that the amount of salt required to convert a sequence to Z-DNA from B-DNA depends on the sequence's inherent stability as Z-DNA relative to B-DNA.
Indeed, the quantity of salt (particularly the cations, defined as the cation strength, or CS) required to crystallize sequences as Z-DNA was observed to be related to the relative stability of that sequence as Z-DNA (as estimated from ΔSFE_Z-B), and can be used to predict quantitatively the conditions for obtaining these crystals (22). This relationship can be attributed to the requirement that sequences undergo a transition from B- to Z-DNA along the crystallization pathway. Thus, in this analysis of sequence effects on DNA structure and stability, we will focus on the effects that various substituent groups have on the structure (both the DNA conformation and the solvent structure), ΔSFE_Z-B, and the crystallization conditions for the various sequences that have been crystallized as Z-DNA.
3.1 Effects of cytosine methylation on Z-DNA structure
Methylation of cytosine at the C5 carbon of the base (m⁵C) (Fig. 7.4) has been studied extensively because of its effect on DNA transcription (90). The effect that methylation has on Z-DNA has been studied both in solution (14) and in various crystal structures. These studies have shown that methylation stabilizes the Z-DNA conformation relative to the B-form. Using circular dichroism spectroscopy to monitor salt and alcohol titrations, Behe and Felsenfeld (14) showed that poly[d(m⁵CpG)] requires less salt or alcohol to convert to Z-DNA than the unmethylated poly[d(CpG)]. This stabilization is associated with the effect that the methyl group has on the hydrophobicity of Z- and B-DNA, as reflected in the ability of cations of the Hofmeister series to induce Z-DNA in poly[d(CpG)] and poly[d(m⁵CpG)] (26). In the Hofmeister series, the cations would be expected to follow the trend Mg²⁺ > Li⁺ > Na⁺ > K⁺ > NH₄⁺ in affecting the transition if, indeed, the hydrophobic effect is significant (91).
The crystallization of methylated and unmethylated d(CGCGCG) reflects the stabilizing effect of cytosine methylation on Z-DNA. Methylated sequences require less salt to crystallize than unmethylated sequences. The sequence d(CGCGCG) was crystallized from a solution with a CS = 2.0 M, whereas the methylated sequence d(m⁵CGm⁵CGm⁵CG) required CS = 0.57 M cations (Table 7.2).
The crystal structure of d(m⁵CGm⁵CGm⁵CG) (18) showed that the methyl groups reside in protected and recessed pockets at the major groove surface formed by the base and sugar of the adjacent guanine nucleotide. Thus, by burying the methyl into a hydrophobic pocket, this group forms a hydrophobic patch that is less accessible to solvent in Z-DNA than in B-DNA. In addition, the methyl group is involved in favourable contacts with the base and sugar (18).
The methyl group, however, should not be viewed as simply a substituent added to the d(CGCGCG) structure. It also affects the structure of Z-DNA, as is evident from the analysis of sequences having different degrees of cytosine methylation. In this analysis, we compare the sequence d(CGCGCG), which contains the fully unmethylated d(CG) base pairs, to the sequences d(m⁵CGm⁵CGm⁵CG) and d(CGCGm⁵CG), in which the d(CG) base pairs are fully methylated and hemimethylated, respectively. Each d(CpG) and d(GpC) dinucleotide step was analysed for helical twist, rise, roll, tilt, propeller twist, buckle, and x-displacement (Table 7.8).
Table 7.8. Comparisons of helical parameters for modified d(CpG) dinucleotides d(m5CpG), d(Br5CpG), and d(CpI)

                    d(CGCGCG)     d(m5CGm5CGm5CG)  d(CGCGm5CG)   d(Br5CGBr5CGBr5CG)  d(CGCGBr5CG)  d(CGCICG)a
d(CpG) steps
Twist (Ω)
(C1:G12)/(G2:C11)   -9.2          -14.4            -10.8         -14.5               -8.9          -11.8
(C3:G10)/(G4:C9)    -9.4          -14.5            -12.1         -11.8               -11.6         -11.9
(C5:G8)/(G6:C7)     -12.2         -16.1            -14.6         -14.8               -13.9         -12.3
Average             -10.3 ± 1.7   -15.0 ± 1.0      -12.5 ± 1.9   -13.7 ± 1.7         -11.5 ± 2.5   -12.0 ± 0.2
Rise (Dz)
(C1:G12)/(G2:C11)   3.8           3.9              3.9           3.9                 3.8           3.6
(C3:G10)/(G4:C9)    3.9           3.7              3.8           3.5                 4.0           2.1
(C5:G8)/(G6:C7)     4.1           3.9              3.9           3.9                 3.8           3.4
Average             3.9 ± 0.2     3.8 ± 0.1        3.9 ± 0.1     3.8 ± 0.2           3.9 ± 0.1     3.0 ± 0.8
Roll (ρ)
(C1:G12)/(G2:C11)   -0.8          0.1              1.5           0.0                 -2.2          -4.6
(C3:G10)/(G4:C9)    -1.1          -2.6             -2.9          -2.1                2.1           -5.8
(C5:G8)/(G6:C7)     3.6           1.5              2.3           3.9                 1.6           3.4
Average             0.6 ± 2.6     -0.3 ± 2.1       0.3 ± 2.8     0.6 ± 3.0           0.5 ± 2.4     -2.3 ± 5.0
Tilt (τ)
(C1:G12)/(G2:C11)   -6.0          -7.1             -7.4          -9.2                -6.4          1.1
(C3:G10)/(G4:C9)    0.9           -0.1             0.7           0.6                 -0.3          -1.7
(C5:G8)/(G6:C7)     -0.8          -1.0             1.5           0.3                 -2.3          0.0
Average             -2.0 ± 3.6    -2.7 ± 3.8       -1.7 ± 4.9    -2.8 ± 5.6          -3.0 ± 3.1    -0.2 ± 1.4

d(GpC) steps
Twist (Ω)
(G2:C11)/(C3:G10)   -48.9         -43.6            -48.4         -45.4               -47.9         -49.2
(G4:C9)/(C5:G8)     -50.8         -44.5            -47.0         -46.0               -48.6         -48.2
Average             -49.9 ± 1.3   -44.1 ± 0.6      -47.7 ± 1.0   -45.7 ± 0.4         -48.3 ± 0.5   -48.7 ± 0.7
Rise (Dz)
(G2:C11)/(C3:G10)   3.6           3.8              3.6           3.4                 3.6           3.3
(G4:C9)/(C5:G8)     3.5           3.8              3.7           3.9                 3.6           2.3
Average             3.6 ± 0.1     3.8 ± 0.0        3.7 ± 0.1     3.7 ± 0.4           3.6 ± 0.0     2.8 ± 0.7
Roll (ρ)
(G2:C11)/(C3:G10)   -1.5          -4.6             -0.1          -4.3                -1.4          4.7
(G4:C9)/(C5:G8)     0.3           -2.4             1.1           0.0                 0.3           4.4
Average             -0.6 ± 1.3    -3.5 ± 1.6       0.5 ± 0.8     -2.2 ± 3.0          -0.6 ± 1.2    4.6 ± 0.2
Tilt (τ)
(G2:C11)/(C3:G10)   0.1           0.8              2.3           5.9                 -0.5          -0.5
(G4:C9)/(C5:G8)     0.6           0.3              0.3           1.3                 0.1           -0.6
Average             0.4 ± 0.4     0.6 ± 0.4        1.3 ± 1.4     3.6 ± 3.3           -0.2 ± 0.4    -0.6 ± 0.0

Base pairs
Propeller twist (ω)
C1:G12              1.1           2.0              0.6           6.6                 2.2           1.5
G2:C11              3.2           3.4              0.9           6.5                 0.4           1.7
C3:G10              0.9           4.8              1.2           4.4                 0.5           5.0
G4:C9               1.5           1.2              0.4           0.7                 5.7           1.0
C5:G8               0.5           0.3              2.6           5.9                 0.2           1.9
G6:C7               2.7           2.1              2.1           0.7                 0.4           2.0
Average             1.7 ± 1.1     2.3 ± 1.6        1.3 ± 0.9     4.1 ± 2.8           1.6 ± 2.2     2.2 ± 1.4
Buckle (κ)
C1:G12              1.9           6.2              4.4           11.6                2.2           5.3
G2:C11              3.5           4.8              0.8           3.5                 3.5           8.7
C3:G10              3.0           2.1              1.5           2.2                 0.6           8.7
G4:C9               2.4           5.7              3.4           10.3                0.8           8.3
C5:G8               2.0           5.3              4.1           4.2                 4.1           3.6
G6:C7               0.0           3.8              3.5           0.6                 4.0           0.7
Average             2.1 ± 1.2     4.7 ± 1.5        3.0 ± 1.5     5.4 ± 4.5           2.5 ± 1.6     5.9 ± 3.3
x-displacement (dx)
C1:G12              -3.0          -3.6             -3.1          -3.7                -3.2          -3.8
G2:C11              -3.1          -3.6             -3.2          -3.6                -3.3          -3.3
C3:G10              -3.3          -3.4             -3.1          -3.4                -3.3          -2.1
G4:C9               -3.3          -3.5             -3.2          -3.4                -3.3          -2.3
C5:G8               -3.5          -3.8             -3.8          -3.7                -3.9          -3.4
G6:C7               -3.4          -3.6             -3.5          -3.7                -3.6          -3.4
Average             -3.3 ± 0.2    -3.6 ± 0.1       -3.3 ± 0.3    -3.6 ± 0.1          -3.4 ± 0.3    -3.0 ± 0.7

All values are in degrees, except rise (Dz) and displacement (dx), which are in Å.
a Parameters for d(CGCICG) were taken directly from the tables in ref. 56.
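The 'Average' rows of Table 7.8 appear consistent with an arithmetic mean and a sample standard deviation (n - 1 denominator) taken over the individual steps. That reading is an assumption; the table itself does not state how the spread was computed. A minimal Python sketch that recomputes the d(CpG) twist averages for four of the columns under that assumption:

```python
from statistics import mean, stdev

# d(CpG)-step helical twist values (degrees), transcribed from Table 7.8.
cpg_twist = {
    "d(CGCGCG)": [-9.2, -9.4, -12.2],
    "d(m5CGm5CGm5CG)": [-14.4, -14.5, -16.1],
    "d(CGCGm5CG)": [-10.8, -12.1, -14.6],
    "d(Br5CGBr5CGBr5CG)": [-14.5, -11.8, -14.8],
}


def summarize(values):
    """Return (mean, sample standard deviation), each rounded to one decimal."""
    return round(mean(values), 1), round(stdev(values), 1)


# Assumption: the tabulated "±" is a sample standard deviation (n - 1).
summary = {seq: summarize(vals) for seq, vals in cpg_twist.items()}

for seq, (avg, sd) in summary.items():
    print(f"{seq:20s} {avg:6.1f} ± {sd:.1f}")
```

Under this assumption the recomputed values agree with the tabulated averages for these four sequences, which is a quick way to check a transcription of the table.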
The most significant effect of methylation on d(CpG) steps is on the helical twist (Table 7.8). In its fully unmethylated form, ⟨Ω⟩ = -10.3 ± 1.7°, whereas the fully methylated dinucleotide, d(m5CpG/m5CpG), is overwound by ~5° with ⟨Ω⟩ = -15.0 ± 1.0°. This has been attributed to unfavourable steric contacts between the methyl group and the C2' carbon of the deoxyribose of the neighbouring guanine (18). In the d(GpC) steps, methylation affects both the twist and the roll, independent of the dinucleotide's location in the sequence. In the unmethylated d(GpC/GpC) dinucleotide, ⟨Ω⟩ = -49.9 ± 1.3°, whereas the fully methylated d(Gpm5C/Gpm5C) is underwound by 5.8° with ⟨Ω⟩ = -44.1 ± 0.6°. Thus, there is a compensating over- and under-winding of the d(CpG) and d(GpC) steps so that the overall structure of methylated Z-DNA remains relatively unperturbed [⟨Ω⟩ = -30.3° per base step in the unmethylated and ⟨Ω⟩ = -29.8° per base step for the fully methylated d(CpG) sequences]. This suggests, once again, that the primary determinant of Z-DNA structure is the d(CpG) step, with the d(GpC) steps acting to compensate for perturbations to the structure.
At the base pair level, buckle and x-displacement are the only parameters that are significantly affected by methylation. In this case, d(CG) base pairs have an average buckle of 2.1 ± 1.2° and an average x-displacement of -3.3 ± 0.2 Å, whereas d(m5CG) base pairs have an average buckle of 4.7 ± 1.5° and an average x-displacement of -3.6 ± 0.1 Å (Table 7.8). This again is associated with steric interactions between the substituent and the neighbouring guanine nucleotide.
Studies on the hemimethylated dinucleotides d(m5CpG/CpG) and d(Gpm5C/GpC) show that each methyl group acts independently to affect the structure and stability of Z-DNA.
The structures of the two hemimethylated d(m5CpG/CpG) and d(Gpm5C/GpC) steps in the sequence d(CGCGm5CG) are intermediate between those observed for the corresponding unmethylated and fully methylated dinucleotide steps (Table 7.8). This suggests that the hemimethylated form is a true conformational intermediate. This is evident when comparing, for example, the helical twist at the two d(m5CpG/CpG) steps to the average of the corresponding dinucleotides at each position. The helical twist between base pairs d(C1:G12) and d(G2:m5C11) is -10.8°, while the average for the corresponding base pairs in the unmethylated and methylated structures is ⟨Ω⟩ = -11.8°. Likewise, base pairs d(m5C5:G8) and d(G6:C7) at the opposite end of the duplex have Ω = -14.6° compared with ⟨Ω⟩ = -14.2° for the unmethylated and methylated analogues. The position effects are probably a result of the crystal packing forces of the lattice. Similarly, ⟨Ω⟩ for the d(Gpm5C/GpC) steps is intermediate between the analogous unmethylated and methylated structures.
We would expect hemimethylation of cytosines to have an intermediate effect on the stability of Z-DNA. Indeed, this is the case. The sequence d(CGCGm5CG) was crystallized from a solution with CS = 1.26 M, which is intermediate between that required for the unmethylated, d(CGCGCG) (2.0 M), and the fully methylated, d(m5CGm5CGm5CG), sequence (0.57 M). The CS we predict for crystallizing this hemimethylated sequence is 1.52 M, assuming equal contributions from each methyl group. A salt titration of 24 alternating base pairs of d(CG) showed that the midpoint for the transition from B- to Z-DNA in the unmethylated sequence occurred at ~1.25 M MgCl2, while the fully methylated sequence was predominantly Z-DNA in
Fig. 7.5. Titration of unmethylated, methylated, and hemimethylated d(CpG) dinucleotides with MgCl2 to induce the formation of Z-DNA (60). The unmethylated sequence d(CG)12 (squares), fully methylated sequence d(m5CG)12 (circles), and hemimethylated sequence d(m5CGCG)3(CGCG)3 (diamonds) were titrated with 0.0-4.0 M MgCl2. The formation of Z-DNA was monitored by following the ratio of light absorbed at 260 nm versus 290 nm (A260/A290 ratio). The conformations of the DNA at the beginning and end of the titration were confirmed to be that of B-DNA and Z-DNA, respectively, by circular dichroism spectroscopy.
the absence of added salt (Fig. 7.5). This salt-induced transition in the sequence d(m5CGCGm5CGCGm5CGCGCGCGCGCGCGCG), which has six true hemimethylated dinucleotide steps, had a midpoint for the transition at ~1 M MgCl2 (60). Again, we would predict a midpoint at 0.9 M for equal contributions from each methyl group. Thus, the hemimethylated dinucleotides represent a true intermediate, both structurally and thermodynamically, between fully unmethylated and fully methylated dinucleotides.
The Z-DNA stabilizing effect from cytosine methylation is likely to be associated with the hydration of the DNA structure. The only notable effect of cytosine methylation on the solvent structure in the crystal, however, is that the water that is hydrogen bonded to the cytosine N4 nitrogen at the major groove surface is slightly displaced away from the methyl group (18). Otherwise, the arrangement of waters around the Z-DNA structure remains unperturbed. This is not entirely surprising because the methyl group actually sits recessed in a pocket of the major groove surface and thus is largely inaccessible to solvent.
The calculated solvent free energies show that methylation makes the Z-DNA surface more hydrophobic; however, the ΔSFE(Z-B) indicates that methylation works to stabilize Z-DNA primarily by destabilizing B-DNA [by making its surface even more hydrophobic (Table 7.9)]. Methylating the cytosines of Z-DNA increases the SFEZ by 1.3 kcal/mol/bp and thus we would expect this to destabilize the left-handed conformation. In contrast, there is an even greater destabilization of B-DNA for these sequences (ΔSFE(Z-B) = 0.29 kcal/mol/dn for the unmethylated sequence, but -0.87 kcal/mol/dn for the methylated analogue).
The SFEZ and ΔSFE(Z-B) for the hemimethylated dinucleotides in d(CGCGm5CG) are again intermediate between the corresponding values for d(CGCGCG) and d(m5CGm5CGm5CG) (Table 7.9).
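The predicted CS of 1.52 M quoted above for the hemimethylated hexamer can be reproduced by a linear interpolation between the unmethylated and fully methylated values. The sketch below is one plausible reading of "equal contributions from each methyl group", counting one methylated cytosine out of three per strand in d(CGCGm5CG); the function name and the counting scheme are illustrative assumptions, not the authors' stated procedure.

```python
def interpolate_cs(cs_unmodified, cs_fully_modified, n_modified, n_sites):
    """Linearly interpolate the cation strength (CS, in M) needed for crystallization.

    Assumes every methyl group lowers CS by the same amount.
    """
    per_group = (cs_unmodified - cs_fully_modified) / n_sites
    return cs_unmodified - n_modified * per_group


# CS values quoted in the text (M). Assumption: one of the three cytosines
# per strand is methylated in d(CGCGm5CG).
predicted_cs = interpolate_cs(2.0, 0.57, n_modified=1, n_sites=3)
observed_cs = 1.26

print(f"predicted CS = {predicted_cs:.2f} M, observed CS = {observed_cs:.2f} M")
```

The observed value (1.26 M) falls below this linear estimate but still lies between the two reference sequences, which is the sense in which the text calls the hemimethylated form an intermediate.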
Table 7.9. Solvent free energies of various dinucleotides as Z- and B-DNA

Dinucleotide (anti-p-syn)   SFEZ (kcal/mol/dn)   SFEB (kcal/mol/dn)   ΔSFE(Z-B) (kcal/mol/dn)
d(CpG)                      -12.97               -13.26               0.29
d(m5CpG)/(CpG)              -12.58               -12.08               -0.50
d(m5CpG)                    -10.28               -9.41                -0.87
d(TpA)                      -9.90                -8.53                1.35
d(UpA)                      -11.50               -10.64               0.86
d(ApT)a                     -8.17                -9.45                1.28
d(GpC)/d(Gpm5C)a            -12.08               -11.62               -0.46
d(CpC)/d(GpG)a              -12.26               -12.90               0.64

SFEZ and SFEB are the solvent free energies for the dinucleotide in the Z and B conformations, respectively. SFEZ was calculated from the crystal structure containing that dinucleotide step and SFEB was calculated from idealized B-DNA models. ΔSFE(Z-B) is the free energy difference for the dinucleotide step in the Z form versus the B form.
a Dinucleotide out-of-alternation (e.g. the first base pair of the step is anti, followed by syn).
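As a small worked check of the methylation argument, ΔSFE(Z-B) can be recomputed as SFEZ minus SFEB for the three d(CpG)-type rows of Table 7.9, together with the per-base-pair change in SFEZ on full methylation. This is a sketch limited to those three rows; the Z-minus-B sign convention is taken from the table footnote, and dividing a per-dinucleotide value by two to obtain a per-base-pair value is an assumption about how the 1.3 kcal/mol/bp figure was obtained.

```python
# (SFE_Z, SFE_B) in kcal/mol/dn, transcribed from Table 7.9.
sfe = {
    "d(CpG)": (-12.97, -13.26),
    "d(m5CpG)/(CpG)": (-12.58, -12.08),
    "d(m5CpG)": (-10.28, -9.41),
}

# Free energy difference for the Z form versus the B form (Z minus B).
delta_sfe = {name: round(z - b, 2) for name, (z, b) in sfe.items()}

# Change in SFE_Z on full methylation, expressed per base pair.
# Assumption: one dinucleotide step (dn) corresponds to two base pairs.
methylation_cost_per_bp = (sfe["d(m5CpG)"][0] - sfe["d(CpG)"][0]) / 2

for name, value in delta_sfe.items():
    print(f"{name:16s} dSFE(Z-B) = {value:+.2f} kcal/mol/dn")
print(f"SFE_Z change on methylation = {methylation_cost_per_bp:.2f} kcal/mol/bp")
```

The sign change from positive to negative ΔSFE(Z-B) across the three rows is the quantitative content of the statement that methylation favours Z-DNA mainly by penalizing B-DNA.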
3.2 Effects of cytosine bromination on Z-DNA structure
The effective radius (~2 Å) and hydrophobicity of a bromine atom are very similar to those of a methyl group. We would therefore expect bromination of cytosines to have a similar effect in stabilizing Z-DNA, and on the structure of Z-DNA. The two Z-DNA sequences that have been crystallized that contain a brominated C5 of cytosine (Fig. 7.4) are the fully brominated sequence d(Br5CGBr5CGBr5CG) (19) and the hemibrominated sequence d(CGCGBr5CG) (63).
The effect of cytosine bromination on the stability of Z-DNA is equivalent to or greater than that of cytosine methylation. Poly(Br5CpG) is constitutively in the Z-form even in the absence of alcohols and high concentrations of added salts (15). Like d(m5CGm5CGm5CG), d(Br5CGBr5CGBr5CG) required very little salt to crystallize. The enhanced stability of brominated Z-DNA compared with even the methylated form may result from the smaller perturbation of the structure.
Comparison of base dinucleotide parameters reveals trends similar to those seen for methylation (Table 7.8). Specifically, the d(CpG/CpG) dinucleotide has an average twist of -10.3 ± 1.7°, while the d(Br5CpG/Br5CpG) dinucleotide is overwound by 3.4° to -13.7 ± 1.7°. Bromination, therefore, has a similar, but less dramatic, effect on Z-DNA structure than methylation. This smaller perturbation on the structure may be a result of differences in the interactions with adjacent nucleotides between the spherically shaped bromine atom and the tetrahedral methyl group. As with the methylation effect on twist, the overwinding of the anti-p-syn step in brominated steps is compensated at the d(GpC) step to give no net difference in the helical twist of the hexanucleotide structures.
In d(CGCGCG), ⟨Ω⟩ = -30.3° per base step, and the brominated structure has ⟨Ω⟩ = -30.2° per base step. Unlike other Z-DNA structures, the helical twist of d(Br5CGBr5CGBr5CG) is not position dependent (Table 7.8), suggesting that the conformation of the fully brominated sequence is less affected by the crystal lattice. Additionally, only the ZI backbone conformation is present in this fully
brominated structure. Finally, bromination does not appear to have any effect on the base parameters of the tip, inclination, propeller twist, buckle, and x-displacement.
Unlike methylation, it is not clear if hemibromination represents an intermediate between fully brominated and fully unbrominated dinucleotides (Table 7.8). The helical twist (Ω = -8.9°) for the hemibrominated dinucleotide at one end is similar to that of d(CpG/CpG) (-9.2°). However, Ω = -13.9° for this same hemibrominated dinucleotide at the opposite end, and is intermediate between Ω = -12.2° and -14.8° observed for the dinucleotides d(CpG/CpG) and d(Br5CpG/Br5CpG), respectively. Additionally, the ⟨Ω⟩ = -48.3° for the hemibrominated d(GpBr5C/GpC) steps is identical to the average for d(GpC/GpC) and d(GpBr5C/GpBr5C). These observations are consistent with the hemibrominated sequence representing an intermediate conformation except at one terminal dinucleotide.
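The position-by-position comparison used in Sections 3.1 and 3.2 can be written as a simple test of whether the twist of a hemi-modified terminal d(CpG) step lies between the values of the unmodified and fully modified steps at the same position. The sketch below uses the terminal-step twists from Table 7.8; the helper names are illustrative, and "intermediate" is taken here to mean lying within the closed interval between the two reference values, which is an assumption about how the term is meant.

```python
def is_intermediate(value, end_a, end_b):
    """True when value lies between the two reference values (inclusive)."""
    low, high = sorted((end_a, end_b))
    return low <= value <= high


# Terminal d(CpG) step twists (degrees) from Table 7.8, keyed by step position.
unmodified = {"C1:G12/G2:C11": -9.2, "C5:G8/G6:C7": -12.2}
fully_methylated = {"C1:G12/G2:C11": -14.4, "C5:G8/G6:C7": -16.1}
hemimethylated = {"C1:G12/G2:C11": -10.8, "C5:G8/G6:C7": -14.6}
fully_brominated = {"C1:G12/G2:C11": -14.5, "C5:G8/G6:C7": -14.8}
hemibrominated = {"C1:G12/G2:C11": -8.9, "C5:G8/G6:C7": -13.9}


def check(hemi, full):
    """Test each terminal step of a hemi-modified duplex against its references."""
    return {
        step: is_intermediate(hemi[step], unmodified[step], full[step])
        for step in hemi
    }


methyl_result = check(hemimethylated, fully_methylated)
bromo_result = check(hemibrominated, fully_brominated)

print("hemimethylated:", methyl_result)
print("hemibrominated:", bromo_result)
```

With these inputs both hemimethylated terminal steps pass the test, whereas the hemibrominated duplex passes at only one end, matching the qualified conclusion in the text.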
3.3 Effect of the N2 amino of guanine on the structure and stability of Z-DNA
Removing the amino group at the N2 position of guanine (to form inosine, dI) (Fig. 7.4) would be expected to destabilize Z-DNA. This would eliminate one hydrogen bond within the base pair but, perhaps more importantly, would affect the spine of hydration in the minor groove of the Z-DNA duplex. The minor groove water that bridges this N2 amino group to the phosphate oxygens of the backbone is thought to be important for stabilizing the syn conformation of the guanine bases (77). The two published structures of inosine-containing Z-DNA are for the octamer sequence d(CGCICICG) (64) and the hexamer d(CGCICG) (56). The coordinates of neither of these structures were available for analysis by our program, but some helical parameters could be gleaned from the data presented in the published papers.
The structure of d(CGCICICG) was disordered in the crystal and thus specific parameters for the d(CpI) and d(IpC) steps could not be distinguished from those of the d(CpG) and d(GpC) steps. The values reported are therefore averages for the respective dinucleotides. The average rise {3.6 Å for the d[CpG(I)] and 3.7 Å for d[G(I)pC]} and helical twist {16.5° for the d[CpG(I)] and 43.5° for d[G(I)pC]} of this structure are more similar to the tetramer d(CGCG) and the disordered octanucleotide d(CGCGCGCG) than to the parent hexanucleotide structures. This, again, may be related more to the hexagonal space group of these crystals than to any intrinsic structural property of Z-DNA.
The structure of d(CGCICG) shows a crystal lattice and conformation that is similar to the SP form of d(CGCGCG).
The minor groove of the duplex is 0.6 Å narrower than the standard MGSP structure, but it was not clear whether this is primarily localized at the d(CI) base pairs, or averaged over the structure. The water structure of d(CGCICG) was said to be similar to that of the spermine form of d(CGCGCG), including the continuous spine connecting the O2 oxygens of the cytosines along the minor groove crevice. This suggests that the N2 amino group is not absolutely essential to ordering the waters in the crevice. Still, the bridge from the purine to the phosphate cannot be made. The SFEs calculated suggest that d(CI) base pairs are less stable as Z-DNA by 0.30 kcal/mol/bp compared with d(TA) base pairs. The sequence
d(CICGCG) required CS = 4.2 to crystallize as Z-DNA (22), the highest salt concentration required for any APP sequence.
3.4 The structure and stability of d(TpA) dinucleotides in Z-DNA
The observation that d(TpA) dinucleotides can be incorporated into the structure of Z-DNA extends the range of sequences that can adopt the left-handed conformation. Although this is an APP dinucleotide, it does not promote the formation of Z-DNA, and must be flanked by methylated d(m5CpG) dinucleotides to crystallize, as in the sequence d(m5CGTAm5CG). The structure of d(m5CGm5CGm5CG) therefore serves as the reference when analysing this d(TpA)-containing structure. The destabilization of Z-DNA in the crystals by the d(TpA) dinucleotide is reflected in the CS for the crystallization of d(m5CGTAm5CG) (1.3 M) compared with that for d(m5CGm5CGm5CG) (0.57 M) (Table 7.2).
The overall structure of d(m5CGTAm5CG) is indeed more similar to d(m5CGm5CGm5CG) than to d(CGCGCG) in all respects (Tables 7.8 and 7.10). Differences in the structural details are attributed to replacing the central d(m5CpG) dinucleotide with d(TpA). The helical twist is reduced by 1.7°, approaching that of d(CGCGCG). This is associated with a sliding of the d(TA) base pairs towards each other. This sliding is localized, however, to only the d(TpA) dinucleotide, since the d(GpT) and d(Apm5C) steps show increases in Ω of 0.5° each to compensate. The d(TpA) dinucleotide is also significantly compressed (Dz = 3.3 Å) compared with any d(CpG) dinucleotide.
The destabilization of Z-DNA by d(TpA) dinucleotides appears to be associated with the presence of the methyl group at the major groove surface of the thymine base and the absence of an N2 amino group from the minor groove crevice of the adenine base (Fig. 7.4), both of which perturb the solvation around the d(TA) base pairs (89).
The cytosines of d(CpG) dinucleotides are bridged by a well-defined pattern of waters at the major groove surface (79). In contrast, the solvent structure at the d(TpA) dinucleotide major groove surface can best be described as a set of disordered waters and/or cation complexes, with no specific hydrogen bonding pattern to the thymine bases (51).
In comparison, the structure of d(m5CGUAm5CG) helps to pinpoint the role of the thymine methyl group in the instability of d(TpA) dinucleotides as Z-DNA. In this deoxyuridine-containing structure of Z-DNA (Fig. 7.4), the twist angle between the central d(UA) base pairs approaches the values of Ω for the d(m5CpG) dinucleotides (Tables 7.8 and 7.10). This appears to result from the coupling of the two stacked uridine bases by a [Mg(H2O)6]2+ complex. This complex is analogous to the waters that bridge the stacked cytosines at the major groove surface of the d(CpG) dinucleotides. The thymine methyls in the structure of d(m5CGTAm5CG) evidently disrupt these interactions. This was suggested by the lower ΔSFE(Z-B) calculated for the d(UpA) compared with the d(TpA) dinucleotides (Table 7.9).
Interestingly, the solvent in the minor groove crevice is also perturbed by the C5 methyl of the thymines in d(m5CGTAm5CG). The two well-ordered waters typically observed at each d(CG) base pair in Z-DNA (Fig. 7.2) could not be located at either d(TA) base pair (51). Thus, the spine of hydration in the minor groove crevice is
Table 7.10. Helical parameters for d(A), d(T), d(U), and d(D)-containing sequences

                    d(m5CGTAm5CG)   d(m5CGUAm5CG)   d(CGTDCG)     d(CDCGTG)     d(CDUDCG)
d(CpG) steps
Twist (Ω)
(C1:G12)/(G2:C11)   -16.1           -14.8           -13.5         -7.5          -13.5
(C3:G10)/(G4:C9)    -12.8           -13.8           -7.3          -12.4         -7.3
(C5:G8)/(G6:C7)     -14.9           -17.0           -13.5         -11.9         -13.5
Average             -14.6 ± 1.7     -15.2 ± 1.6     -11.4 ± 3.6   -10.6 ± 2.7   -11.4 ± 3.6
Rise (Dz)
(C1:G12)/(G2:C11)   3.9             3.8             4.2           4.0           4.2
(C3:G10)/(G4:C9)    3.3             3.4             3.6           3.9           3.6
(C5:G8)/(G6:C7)     3.9             3.8             4.3           3.8           4.3
Average             3.7 ± 0.3       3.7 ± 0.2       4.0 ± 0.4     3.9 ± 0.1     4.0 ± 0.4
Roll (ρ)
(C1:G12)/(G2:C11)   0.1             0.6             -9.4          0.5           -9.4
(C3:G10)/(G4:C9)    -1.4            -0.3            -6.6          2.6           -6.6
(C5:G8)/(G6:C7)     1.1             1.4             -5.8          0.3           -5.8
Average             -0.1 ± 1.3      0.6 ± 0.9       -7.3 ± 1.9    1.1 ± 1.3     -7.3 ± 1.9
Tilt (τ)
(C1:G12)/(G2:C11)   7.3             6.2             -9.7          3.9           -9.7
(C3:G10)/(G4:C9)    -2.4            3.8             0.1           -0.1          0.1
(C5:G8)/(G6:C7)     -1.1            -1.7            1.7           1.7           1.7
Average             1.3 ± 5.3       2.8 ± 4.1       -2.6 ± 6.2    1.8 ± 2.0     -2.6 ± 6.2

d(GpC) steps
Twist (Ω)
(G2:C11)/(C3:G10)   -44.9           -44.7           -42.0         -49.7         -42.0
(G4:C9)/(C5:G8)     -44.2           -45.8           -42.0         -47.5         -42.0
Average             -44.6 ± 0.5     -45.3 ± 0.8     -42.0 ± 0.0   -48.6 ± 1.6   -42.0 ± 0.0
Rise (Dz)
(G2:C11)/(C3:G10)   3.9             3.9             3.6           3.6           3.6
(G4:C9)/(C5:G8)     3.8             3.6             3.6           3.7           3.6
Average             3.9 ± 0.1       3.8 ± 0.2       3.6 ± 0.0     3.7 ± 0.1     3.6 ± 0.0
Roll (ρ)
(G2:C11)/(C3:G10)   -5.3            4.0             -3.1          -0.6          -3.1
(G4:C9)/(C5:G8)     -5.2            6.5             -1.9          -3.1          -1.9
Average             -5.3 ± 0.1      5.3 ± 1.8       -2.5 ± 0.8    -1.9 ± 1.8    -2.5 ± 0.8
Tilt (τ)
(G2:C11)/(C3:G10)   -0.6            -1.1            -1.8          1.8           -1.8
(G4:C9)/(C5:G8)     0.6             0.7             0.2           0.3           0.2
Average             0.0 ± 0.8       -0.2 ± 1.3      -0.8 ± 1.4    1.1 ± 1.1     -0.8 ± 1.4

Base pairs
Propeller twist (ω)
C1:G12              0.7             2.8             5.7           4.2           5.7
G2:C11              4.1             7.9             0.8           0.8           0.8
C3:G10              2.6             4.0             0.2           3.0           0.2
G4:C9               2.0             1.3             0.2           2.9           0.2
C5:G8               1.6             0.4             0.7           1.1           0.7
G6:C7               2.2             2.0             5.7           2.6           5.7
Average             2.2 ± 1.1       3.1 ± 2.7       2.2 ± 2.7     2.4 ± 1.3     2.2 ± 2.7
Buckle (κ)
C1:G12              5.5             7.7             2.9           0.9           2.9
G2:C11              7.5             7.2             1.2           1.8           1.2
C3:G10              5.1             2.0             6.9           1.0           6.9
G4:C9               8.8             4.3             6.9           1.8           6.9
C5:G8               2.1             0.3             1.2           5.2           1.2
G6:C7               2.8             2.5             2.8           6.3           2.8
Average             5.3 ± 2.6       4.0 ± 3.0       3.7 ± 2.6     2.8 ± 2.3     3.7 ± 2.6
x-displacement (dx)
C1:G12              -3.8            -3.9            -3.7          -3.1          -3.7
G2:C11              -3.7            -3.6            -3.1          -3.1          -3.1
C3:G10              -3.4            -3.3            -2.8          -3.3          -2.8
G4:C9               -3.4            -3.2            -2.9          -3.3          -2.9
C5:G8               -3.8            -3.9            -3.3          -3.7          -3.3
G6:C7               -3.8            -3.9            -4.0          -3.4          -4.0
Average             -3.7 ± 0.2      -3.6 ± 0.3      -3.3 ± 0.5    -3.3 ± 0.2    -3.3 ± 0.5

All values are in degrees, except rise (Dz) and displacement (dx), which are in Å. Parameters for the reference sequences d(CGCGCG) and d(m5CGm5CGm5CG) are shown in Table 7.8.
disrupted at each d(TA) base pair. This may contribute to the reduced stability of d(TpA) dinucleotides as Z-DNA. The water network in the minor groove of B-DNA is continuous even at the d(TA) base pairs (80). In this case, the waters are hydrogen bonded to the N3 nitrogen of the purine ring, which is largely inaccessible in Z-DNA. Thus, there are no waters that bridge the N2 amino group of the purine to the phosphate backbone to stabilize the syn conformation, as was observed with the d(CpG) dinucleotides in Z-DNA.
The ordered hydration in the minor groove, however, is restored to the d(CpG)-like spine at d(UA) base pairs of d(m5CGUAm5CG) (56). This apparently results from a widening of the minor groove caused by the coupled binding of the uridine bases by the magnesium-water complex at the major groove surface. There are two waters at each d(UA) base pair. One water is directly hydrogen bonded to the O2 of the uridine base, while the second connects this water to the phosphoribose backbone of the opposite strand. Thus, although no water directly connects the adenine base to the backbone, there may still be a degree of stabilization of the syn conformation conferred by the pyrimidine-water-water-phosphate bridge. This would suggest that d(UpA) dinucleotides are more stable as Z-DNA than d(TpA) dinucleotides. Indeed, the sequence d(m5CGUAm5CG) was crystallized in solutions having CS = 0.31 M, which is less than half of that required to crystallize d(m5CGTAm5CG).
The magnesium complex of d(m5CGUAm5CG) can be displaced from the major groove by binding copper ions to the purines (53). The result is that the minor groove crevice of the d(UpA) dinucleotide becomes narrower, although not as narrow as that of the d(TpA) dinucleotide.
The effect of this on the spine of hydration is that the four waters at the d(UpA) dinucleotide are perturbed, but not displaced. One water remains hydrogen bonded to the O2 and in the plane of the uridine base. The second water for each base pair, however, is pushed out of plane and, therefore, cannot form the pyrimidine-water-water-phosphoribose bridge of the native d(UpA) structure. This displacement effectively isolates the cluster of four waters at the d(UpA) dinucleotides from those of the neighbouring d(CpG) dinucleotides. Thus, although the number of waters in the spine remains unchanged, its continuity along the minor groove and across the helix becomes disrupted by removing the magnesium complex at the major groove surface.
To see how perturbations to the major groove surface affect the stacking of the bases and the water structure of Z-DNA, we start with the d(UpA) dinucleotide of the copper-soaked structure, which has a minor groove crevice that is intermediate in width (Fig. 7.6). Introducing a magnesium complex at the major groove surface slides the base pairs to provide a wider crevice that can accommodate the four water molecules in the plane of the d(UA) base pairs. Methylating the uridine bases, on the other hand, prevents the binding of this magnesium complex and slides the base pairs in the opposite direction to narrow the crevice. The narrower crevice prevents the waters from forming a well-ordered network at the d(TA) base pairs. Thus, the major and minor grooves of Z-DNA cannot be treated as two isolated domains of the structure.
Perturbations to one side are transmitted through the double helix to the other side of the duplex.
The other substituent that affects the stability of d(TpA) dinucleotides as Z-DNA is the N2 amino, or, more precisely, the lack of this group on the adenine bases. The
Fig. 7.6. Comparison of the solvent structures and widths of the minor groove crevice of d(U[T]pA) dinucleotides in Z-DNA. Shown are the structures of d(m5CGUAm5CG) (25), d(m5CGUAm5CG) soaked with copper [(53), d(m5CGUAm5CG)*], and d(m5CGTAm5CG) (51). The top base pair of each dinucleotide is shown with thick bonds and labelled in bold, while the lower base pairs are shown as thin bonds and labelled in standard type. Waters that interact with the top base pairs are shown as filled circles, while those interacting with the lower base pairs are open circles [the circle with a cross in the structure of d(m5CGUAm5CG)* sits between the two base pairs]. Widths of the minor groove crevice are measured between the O3' oxygens, and between the closest oxygens of the phosphate group of the dinucleotides. The methyl groups of the thymines in d(m5CGTAm5CG) are stippled.
unusual base 2-aminoadenine [or diaminopurine, d(D)] (Fig. 7.4) has been used to probe the effect of this group on Z-DNA structure and stability. Introducing this additional amino group to adenines apparently stabilizes Z-DNA. The sequence d(CGTDCG) was crystallized from solutions with CS = 1.1 M (61). These conditions are comparable to those of d(m5CGTAm5CG), even though the cytosines are not methylated. There are two potential means by which dD stabilizes Z-DNA. The first effect would be the introduction of an additional hydrogen bond to the base pair, making d(TD) more akin to d(CG) base pairs. Since Z-DNA is a more rigid helix than B-DNA (77,78), this would affect the difference in conformational entropy between the two DNA forms for the modified base pair. The second effect would be to place an additional hydrogen bonding function into the minor groove crevice to accommodate the waters of the hydration spine. This has been more extensively studied, and thus will be the focus of this discussion on d(TpD) dinucleotides.
The structure of the sequence d(CGTDCG) was solved in an unusual space group for Z-DNA, P3221 (61). Although in a completely different lattice arrangement from other Z-DNA hexanucleotides, its structure shows many of the same features as standard Z-DNA (Tables 7.8 and 7.10). It is, however, slightly underwound (the average helical twist is ~8° more positive) compared with d(CGCGCG), with most of this distortion associated with the d(GpC) steps [being approximately 7-9° less negative than comparable steps of d(CGCGCG)] and at one of the terminal d(CpG) steps (in this case 4.7° overwound in the left-handed direction). The minor groove is narrower as a result of appreciable negative roll at nearly all dinucleotide steps of the helix.
These distortions may arise from the crystal lattice in that the terminal base pairs are not stacked end-to-end to form essentially continuous strands of Z-DNA as in the 'standard' hexamer crystals. The duplexes pack perpendicular to and against the major groove surface of the neighbouring duplex. This general lattice is similar to A-DNA
packing modes, except that the ends of the duplexes pack against the minor groove in the crystals of A-DNA hexanucleotides (87). Thus, this structure may show more 'end-effects' than would normally be observed. There are, however, some sequence-dependent features.
Despite these distortions, the first hydration shell is again nearly identical to that of d(CGCGCG), if the waters at the interface between helices are ignored. The narrower minor groove crevice shifts the spine of hydration, but does not apparently 'squeeze' any water out as in the d(TpA) dinucleotides. Thus, it is clear that the N2 amino group of the purine does play a significant role in defining the regular pattern of this water network. Both the crystallization conditions and salt titrations followed by circular dichroism spectroscopy show that d(TpD) dinucleotides are more stable as Z-DNA than are d(TpA), but less so than d(CpG) (23). Under dehydrating conditions, however, the hexamer d(TDTDTD) forms A-DNA instead of Z-DNA, as measured by circular dichroism. The flanking d(CpG) dinucleotides in d(CGTDCG) are required to induce d(TpD) to form Z-DNA, although the cytosine bases do not need to be methylated.
All this taken together suggests that demethylating the thymine and adding an amino group to the adenine [as in a d(UpD) dinucleotide] would greatly enhance the stability of Z-DNA relative to the standard d(TpA) dinucleotide, to the point where it should behave more like a d(CpG) base pair. Indeed, the structure of d(CGUDCG) (24) most closely resembles that of the MG and MGSP forms of d(CGCGCG) in terms of the DNA conformation and the solvent interactions at the major groove surface and minor groove crevice. The CS for crystallization of this sequence as Z-DNA was identical to that of d(CGCGCG).
I t would b e interesting to extrapolat e from thi s t o determin e whethe r d(UDUDUD) , a s opposed t o d(TDTDTD) , woul d form Z-DN A in solution or in a crystal.
3.5 d(CpA)/d(TpG) dinucleotides in Z-DNA
One of the most prevalent simple, repeating sequences found in eukaryotic genomes is the alternating pattern of d(CpA)/d(TpG) dinucleotides (92-94). These APP sequences are thought to form Z-DNA. Studies on Z-DNA formed in negatively supercoiled plasmids indicate that the order of stability for APP dinucleotides is d(CpG) > d(CpA)/d(TpG) > d(TpA) (77,78). The thermodynamic propensity of a d(CpA)/d(TpG) dinucleotide to form Z-DNA is not, however, simply an average of the d(CpG) and d(TpA) dinucleotides. The first conversion of a d(CG) base pair in the standard d(CpG) dinucleotide to a d(TA) base pair is not as destabilizing as the second. Is this reflected in the crystal structure?
The single-crystal structure of the sequence d(CACGTG) has been solved to ~2.5 Å resolution (49), which is one of the lowest resolution structures of Z-DNA. The structure shows two features that may contribute to the lower propensity of d(CpA)/d(TpG) dinucleotides to form Z-DNA. One is that the lack of an N2 amino group on the adenine base reduces the stacking surface and thus results in poorer stacking interactions at the d(ApC) steps as opposed to the d(GpC) steps. This cannot be the major contributor, since only the d(ApC) step at the A8/C9 positions shows this poorer stacking. The d(ApC) step at A2/C3 compensates by placing the phosphate of C3 in the ZII conformation. This displaces the A2 purine so that its six-membered ring lies directly on top of the cytosine base.
The other effect is observed in the solvent structure of the minor groove. Although the minor groove crevice of this sequence is identical in width to that of d(CGCGCG), there were no ordered solvent molecules located at or near the adenine bases in the groove (49). The suggestion here was that the N2 amino group that is missing from the adenine base contributes to the disruption of the spine of hydration. As with the d(TpA) dinucleotide, the bridge from the purine base to the phosphoribose backbone, which appears to be important for stabilizing the purine in the syn conformation, is lost. In support of this proposition, the structure of d(CDCGTG) (23) shows the same organization of water molecules in the minor groove as does d(CGCGCG). In addition, the structure of d(CDCGTG) is identical to the MGSP form of d(CGCGCG) in all respects (Tables 7.8 and 7.10). This would contribute to the lower stability of d(CpA)/d(TpG) dinucleotides.
We had argued above with the structure of d(m5CGUAm5CG), however, that a wide minor groove, even in the absence of the N2 amino group on the purine, allows waters to organize into the well-ordered spine in Z-DNA. Why in this case, where the widths of the minor groove crevice of d(CpA)/d(TpG) are identical to those of d(CpG) and d(CpD)/d(TpG), were no ordered waters located near the adenines? It may be that the waters are less populated and thus could not be observed at the lower resolution of this structure. The structure of d(CACGCG)/d(CGCGTG) has been solved to 1.6 Å resolution (50), where one could expect to observe less populated solvent molecules.
However, this asymmetric sequence shows orientational disorder about the dyad axis of the duplex; therefore, it would be difficult to assign solvent structure definitively at the d(TA) base pairs. These base pairs effectively overlap in the electron density maps. The question therefore remains unanswered. If a higher resolution structure of a d(CpA)/d(TpG) dinucleotide does indeed show the same type of pyrimidine-water-water-phosphoribose bridge as was observed with the d(UpA) step, then we can start to understand why introducing the first d(TA) base pair into a dinucleotide is not as destabilizing to Z-DNA as the second.
3.6 Out-of-alternation structures
Z-DNA can tolerate dinucleotides that do not follow the APP rule for its formation (that is, they are out-of-alternation, and place pyrimidine bases in the disfavoured syn conformation). The crystal structures of d(Br5CGATBr5CG) and d(m5CGATm5CG) were the first to indicate that the APP rule could be violated (52), and the structure of the brominated sequence was the one reported. In the structure of d(Br5CGATBr5CG), both thymine bases of the central dinucleotide adopt the syn conformation while the complementary adenines are anti. Still, the backbone conformation is remarkably similar to that of d(CGCGCG) (Tables 7.8 and 7.11). The twist angle (Ω) for the anti-p-syn step of the d(ApT) dinucleotide is -9°, while all the syn-p-anti steps are -49°. All nucleotides in the anti conformation have C2'-endo sugar puckers, while a majority of those that are syn have C3'-endo puckers. Exceptions to this rule were at the guanines at the 3'-end of each strand. Thus, the alternating sugar conformations remain even when the pyrimidines are syn.
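The similarity of the helix can be illustrated with a small calculation. Because the repeat unit of Z-DNA is a dinucleotide, the twist per repeat is the sum of the anti-p-syn and syn-p-anti twists, and the number of base pairs per turn follows from it. The sketch below is only illustrative: the function name is hypothetical, the standard values are the average twists listed for d(m5CGm5CGm5CG) in Table 7.11, and the out-of-alternation values are the -9° and -49° quoted above for the d(ApT) dinucleotide.

```python
# Twist angles in degrees for the two step types of the dinucleotide repeat.
# Standard values: averages for d(m5CGm5CGm5CG) in Table 7.11.
# Out-of-alternation values: quoted in the text for d(Br5CGATBr5CG).
TWIST = {
    "d(m5CGm5CGm5CG), standard APP": {"anti_p_syn": -15.0, "syn_p_anti": -44.1},
    "d(Br5CGATBr5CG), d(ApT) step": {"anti_p_syn": -9.0, "syn_p_anti": -49.0},
}


def base_pairs_per_turn(anti_p_syn: float, syn_p_anti: float) -> float:
    """Base pairs per 360 degree turn for a helix with a dinucleotide repeat."""
    twist_per_dinucleotide = abs(anti_p_syn + syn_p_anti)
    # Two base pairs are traversed for every dinucleotide repeat.
    return 2 * 360.0 / twist_per_dinucleotide


for name, twist in TWIST.items():
    print(f"{name}: {base_pairs_per_turn(**twist):.1f} base pairs per turn")
```

Both cases come out close to 12 base pairs per turn: the smaller anti-p-syn twist is largely offset by the larger syn-p-anti twist.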
Table 7.11. The effects of out-of-alternation base steps on the helical structure of Z-DNAᵃ

                        d(m5CGm5CGm5CG)  d(m5CGGCm5CG)  d(m5CGGGm5CG)/   d(m5CGGGm5CG)/   d(Br5CGATBr5CG)ᵇ
                                                        d(m5CGCCm5CG)    d(m5CGCm5CCG)
d(CpG) steps
Twist (Ω)
  (C1:G12)/(G2:C11)     -14.4            -13.6          -13.6            -13.2            -13.0
  (C3:G10)/(G4:C9)      -14.5            -11.4          -12.4            -12.2            -9.0
  (C5:G8)/(G6:C7)       -16.1            -14.8          -14.7            -14.7            -12.0
  Average               -15.0 ± 1.0      -13.3 ± 1.7    -13.6 ± 1.2      -13.4 ± 1.2      -11.0 ± 2.0
Rise (Dz)
  (C1:G12)/(G2:C11)     3.9              4.0            3.9              4.0
  (C3:G10)/(G4:C9)      3.7              3.6            3.6              3.8
  (C5:G8)/(G6:C7)       3.9              3.6            3.8              4.0
  Average               3.8 ± 0.1        3.7 ± 0.2      3.8 ± 0.2        3.9 ± 0.1
Roll (ρ)
  (C1:G12)/(G2:C11)     0.1              4.3            2.0              2.2
  (C3:G10)/(G4:C9)      -2.6             0.4            -0.9             0.2
  (C5:G8)/(G6:C7)       1.5              -0.3           -0.3             -2.0
  Average               -0.3 ± 2.1       1.5 ± 2.5      0.3 ± 1.5        0.2 ± 2.1
Tilt (τ)
  (C1:G12)/(G2:C11)     -7.1             3.4            -1.9             5.1
  (C3:G10)/(G4:C9)      -0.1             5.1            2.0              1.8
  (C5:G8)/(G6:C7)       -1.0             -3.9           -1.7             2.5
  Average               -2.7 ± 3.8       1.5 ± 4.8      -0.5 ± 2.2       3.2 ± 1.7

d(GpC) steps
Twist (Ω)
  (G2:C11)/(C3:G10)     -43.6            -46.6          -46.8            -47.0
  (G4:C9)/(C5:G8)       -44.5            -46.8          -46.8            -47.3
  Average               -44.1 ± 0.6      -46.7 ± 0.1    -46.8            -47.2 ± 0.2      -49.0
Rise (Dz)
  (G2:C11)/(C3:G10)     3.8              3.7            3.6              3.6
  (G4:C9)/(C5:G8)       3.8              3.7            3.8              3.8
  Average               3.8              3.7            3.7 ± 0.1        3.7 ± 0.1
Roll (ρ)
  (G2:C11)/(C3:G10)     -4.6             5.6            0.2              1.3
  (G4:C9)/(C5:G8)       -2.4             1.3            -0.1             2.1
  Average               -3.5 ± 1.6       3.5 ± 3.0      0.0 ± 0.2        1.7 ± 0.6
Tilt (τ)
  (G2:C11)/(C3:G10)     0.8              -1.4           -1.3             -0.1
  (G4:C9)/(C5:G8)       0.3              0.9            -0.1             -0.8
  Average               0.6 ± 0.4        -0.3 ± 1.6     -0.7 ± 0.8       -0.4 ± 0.5

Base pairs
Propeller twist (ω)
  C1:G12                2.0              5.0            0.9              2.2
  G2:C11                3.4              5.8            1.8              1.7
  C3:G10                4.8              2.4            3.0              4.5
  G4:C9                 1.2              3.1            2.2              1.7
  C5:G8                 0.3              1.1            2.3              1.0
  G6:C7                 2.1              1.3            2.8              0.8
  Average               2.3 ± 1.6        3.1 ± 1.9      2.2 ± 0.7        2.0 ± 1.3
Buckle (κ)
  C1:G12                6.2              4.7            2.6              1.5
  G2:C11                -4.8             -0.5           -0.8             -3.1
  C3:G10                2.1              14.1           14.8             13.9
  G4:C9                 -5.7             -12.6          -5.4             -4.0
  C5:G8                 5.3              0.6            2.3              1.9
  G6:C7                 -3.8             -3.5           -3.2             -1.0
  Averageᶜ              4.7 ± 1.5        6.0 ± 5.9      4.8 ± 5.1        4.2 ± 4.9
x-displacement (dx)
  C1:G12                -3.6             -3.8           -3.7             -3.5
  G2:C11                -3.6             -3.7           -3.7             -3.4
  C3:G10                -3.4             -3.3           -3.3             -3.2
  G4:C9                 -3.5             -3.4           -3.4             -3.4
  C5:G8                 -3.8             -4.0           -3.9             -3.9
  G6:C7                 -3.6             -4.0           -3.7             -3.6
  Average               -3.6 ± 0.1       -3.7 ± 0.3     -3.6 ± 0.2       -3.5 ± 0.2

ᵃ Base step and base pair parameters are shown for crystallized Z-DNA structures containing out-of-alternation base pairs (underlined). All values are in degrees, except rise (Dz) and displacement (dx), which are in Å.
ᵇ Values shown for d(Br5CGATBr5CG) are from ref. 52.
ᶜ Averages for base pair buckle were calculated from the magnitudes of the values listed (= (Σ|κi|)/i, where κi is the buckle at base pair i).
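The reason for averaging buckle magnitudes rather than signed values can be seen in a short sketch. The buckle values below are those listed in Table 7.11 for two of the structures; the function name is hypothetical, and the use of the sample standard deviation for the ± term is an assumption (it is the choice that reproduces the tabulated uncertainties).

```python
from statistics import mean, stdev

# Base pair buckle (degrees), C1:G12 through G6:C7, from Table 7.11.
BUCKLE = {
    "d(m5CGm5CGm5CG)": [6.2, -4.8, 2.1, -5.7, 5.3, -3.8],
    "d(m5CGGCm5CG)": [4.7, -0.5, 14.1, -12.6, 0.6, -3.5],
}


def buckle_summary(values):
    """Return (mean, sample standard deviation) of the buckle magnitudes."""
    magnitudes = [abs(v) for v in values]
    return mean(magnitudes), stdev(magnitudes)


for name, values in BUCKLE.items():
    avg, spread = buckle_summary(values)
    # The signed mean is shown only to illustrate how opposite signs cancel.
    print(f"{name}: |buckle| = {avg:.2f} +/- {spread:.2f}, signed mean = {mean(values):.2f}")
```

Because buckle alternates in sign along the helix, the signed mean is close to zero even when individual base pairs are strongly buckled, as at the two out-of-alternation base pairs of d(m5CGGCm5CG).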
Fig. 7.7. Comparison of the out-of-alternation bases in the structures of (A) d(m5CGATm5CG) (52) and (B) d(m5CGGGm5CG)/d(m5CGCCm5CG) (59). Shown are the dinucleotide stacks of the out-of-alternation base pairs. The pyrimidine bases that are in the disfavoured syn conformation are highlighted by the stippled rings. The top base pairs of the stacks are shown as solid atoms and bonds, while the bottom base pairs are in open atoms and bonds. (A) Views down the helix axis of the syn-p-anti, anti-p-syn, and syn-p-anti arrangements of the out-of-alternation base pairs in d(m5CGATm5CG) are shown. The structure shows that the syn thymine is unstacked and protrudes away from the major groove surface for the d(G2:C11)/d(A3:T10), d(A3:T10)/d(T4:A9), and d(T4:A9)/d(G6:C7) stacked base pairs. The purines at the two out-of-alternation base pairs stack on top of each other in the d(A3:T10)/d(T4:A9) stack. (B) The anti-p-syn and syn-p-anti stacking of d(CG) base pairs are shown down the helix axis and perpendicular to the axis of the d(m5CGGGm5CG)/d(m5CGCCm5CG) structure. In the views down the axis, the single syn cytosine is shown to be unstacked also and protruding away from the major groove surface. The view along the helix shows the out-of-alternation d(G3:C10*) base pair. The C10* base is buckled with respect to the G3 plane.
The significant effects of the out-of-alternation base pairs on the structure of Z-DNA are seen in the stacking of the bases (Fig. 7.7). The purine bases nearly completely overlap in the anti-p-syn stack, even more so than the pyrimidine bases of the standard APP sequences. The thymine bases, however, are completely unstacked in both the d(ApT) and d(GpA) steps and, therefore, protrude out from the major groove surface and into the solvent.
The organization of solvent in the minor groove is different from that of the APP d(m5CGTAm5CG) structure. In this latter case, no ordered waters were observed at the d(TA) base pairs. The d(AT) base pairs of the out-of-alternation structure do support ordered waters, but in a slightly different arrangement than in d(CGCGCG). In this case, the N3 nitrogen of adenine is accessible, as it is in B-DNA.
There are several questions that were left unanswered by this structure. Why are syn pyrimidines unstable in Z-DNA? The supercoil-induced B-Z transition free energy (ΔG°T) for the APP dinucleotide d(CpA)/d(TpG) is 1.3 kcal/mol (10), while that for the non-APP dinucleotide d(TpC)/d(GpA) is 2.5 kcal/mol (13) (Table 7.1). Thus, placing a single syn thymine requires 1.2 kcal/mol. The original explanation was that pyrimidines are sterically inhibited from adopting the syn conformation because of collisions between the base and the deoxyribose (95,96). The intramolecular distances from the thymine to the sugar in the d(Br5CGATBr5CG) structure, however, are only slightly shorter than those of guanines syn to their sugars. It is unclear whether the stacking of bases accounts for this destabilizing effect since, although the thymines are poorly stacked, the adenines show better stacking interactions. It is more likely that the protrusion of the out-of-alternation thymines into the solvent makes the difference. This will be discussed in greater detail later.
The other questions remaining are whether a single base pair that is out-of-alternation is more or less stable than two adjacent out-of-alternation base pairs in a non-APP dinucleotide and, finally, whether out-of-alternation d(TA) base pairs are more or less stable than d(GC). These questions can potentially be addressed by studying the structures of non-APP d(GpC) sequences.
Only recently have structures of Z-DNA hexanucleotides been solved that place d(CG) base pairs out-of-alternation. The first was the non-self-complementary sequence d(m5CGGGm5CG)/d(m5CGCCm5CG) (59), which has a single syn cytosine base (underlined). Like the d(ApT)-containing structure, this cytosine protrudes into the major groove, but the base pair is significantly buckled (Fig. 7.7). This distortion to the base pair, which relieves the steric strain of placing the cytosine syn, appears to be induced by the methyl group of an adjacent cytosine.
We had proposed that in the absence of methylation of the flanking d(CpG) dinucleotides, the pyrimidine base would slide away from the ribose to relieve the steric strain, much like the thymines do in the d(Br5CGATBr5CG) structure (52). In the refined structure, the steric energy was calculated to be essentially identical between this out-of-alternation structure and the standard structure of d(m5CGm5CGm5CG). The structure of d(m5CGGGm5CG)/d(m5CGCm5CCG), however, shows the out-of-alternation d(CG) base pair with essentially the same high buckle, even in the absence of the methyl group of the adjacent cytosine. Similarly, both base pairs that are out-of-alternation in the structure of d(m5CGGCm5CG) show this same buckling (Table 7.11). Thus this distortion to the base plane is inherent to out-of-alternation base pairs, regardless of the flanking base pairs. It may simply be that the syn pyrimidine base is not sandwiched by the base and deoxyriboses of the two flanking base pairs, as is the standard guanine base.
The syn cytosine affects the solvent structure at both the major groove surface and minor groove crevice. In the minor groove of the out-of-alternation d(CG) base pair, a water is hydrogen bonded to the N2 amino group of the guanine base and no waters are observed bound to the now inaccessible O2 oxygen of the cytosine, as in the non-APP d(ApT) dinucleotides. In addition, there is no pattern of ordered waters around this d(CG) base pair. This may, however, be associated with the orientational disorder of this non-self-complementary sequence.
The difference in stability between a standard d(CG) base pair and an out-of-alternation d(GC) base pair was estimated from supercoiled ccDNA studies to be 1.7 kcal/mol/bp [ΔG°T for an APP d(CpG) dinucleotide (dn) is 0.7 kcal/mol/dn (9), while that for a d(CpC)/d(GpG) dinucleotide is 2.4 kcal/mol/dn (13)]. We believe that these solvent rearrangements play a role in this destabilization of Z-DNA.
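The two energetic penalties quoted in this section are simple differences between B-Z transition free energies. The sketch below reproduces that arithmetic using only the ΔG°T values given in the text; the dictionary and function names are hypothetical, and pairing each non-APP dinucleotide with its APP reference follows the comparisons made above.

```python
# Supercoil-induced B-Z transition free energies (kcal/mol per dinucleotide)
# as quoted in the text.
B_TO_Z_FREE_ENERGY = {
    "d(CpG)": 0.7,          # APP (9)
    "d(CpA)/d(TpG)": 1.3,   # APP (10)
    "d(TpC)/d(GpA)": 2.5,   # non-APP, one syn thymine (13)
    "d(CpC)/d(GpG)": 2.4,   # non-APP, one syn cytosine (13)
}


def out_of_alternation_cost(non_app: str, app_reference: str) -> float:
    """Extra free energy of a non-APP dinucleotide relative to its APP reference."""
    return B_TO_Z_FREE_ENERGY[non_app] - B_TO_Z_FREE_ENERGY[app_reference]


syn_thymine = out_of_alternation_cost("d(TpC)/d(GpA)", "d(CpA)/d(TpG)")
syn_cytosine = out_of_alternation_cost("d(CpC)/d(GpG)", "d(CpG)")
print(f"syn thymine:  {syn_thymine:.1f} kcal/mol")
print(f"syn cytosine: {syn_cytosine:.1f} kcal/mol")
```

On these numbers, placing a cytosine syn costs about 0.5 kcal/mol more than placing a thymine syn.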
Perhaps the two most dramatic examples of the out-of-alternation structures are the sequences d(CCCGGG) and d(m5CGGCm5CG). Both resemble sequences that one might expect to form A-DNA instead of Z-DNA. Indeed, the reverse of the latter sequence, the hexamer d(GCCGGC), and its methylated analogue d(Gm5CCGGC) have been crystallized as A-DNA (87). The structure of d(CCCGGG) has not been published in detail and thus we cannot discuss it in this review (47). We have recently completed the structure of d(m5CGGCm5CG) and find it to be nearly identical to the structure of d(CGCGCG) at the level of the DNA (Table 7.11). The one major exception is in the high buckle of the base pairs that are out-of-alternation (as discussed above). The other important structural perturbation is found in the solvent structure. At the major groove surface, the waters that bridge each cytosine are not as apparent, even at the flanking anti-p-syn d(CpG) dinucleotides. At the central out-of-alternation anti-p-syn d(GpC) dinucleotide, the two stacked guanines, however, show analogous solvent structures to those of the standard stacked cytosines in d(CGCGCG). For the flanking anti-p-syn d(CpG) dinucleotides in the minor groove, the waters that link the guanine N2 amino groups to the phosphoribose backbone were still observed and thus help to stabilize these in the syn conformation. The spine of hydration that links the cytosines in the minor groove, however, is no longer present. At the two central out-of-alternation d(GC) base pairs, the syn cytosines are not at all accessible to solvent in the minor groove crevice. The two stacked guanines, however, are bridged by two waters that are analogous to the waters that normally form the spine that bridges the central cytosine bases in d(CGCGCG). This may help to increase the stability of the two base pairs that are out-of-alternation if they occur adjacent to each other as opposed to being separated in a sequence. Thus, although the DNA structure is not dramatically affected in this very unlikely Z-DNA sequence, the water interactions are.
4. Summary: sequence effects on the structure and stability of Z-DNA
The nucleotide sequence affects not only the structure, but also the stability of Z-DNA. We have concentrated on how the major and minor grooves are affected, as well as the related solvent rearrangements at these surfaces, because these are the classical explanations given for whether a DNA duplex conformation is stable or not. The characterization of Z-DNA sequences that contain d(m5CG), d(CG), d(CI), d(UA), d(TA), and d(TD) base pairs in various combinations suggests that there are several distinct factors important for Z-DNA stability. Amination of the purine base at the C2 position helps to stabilize Z-DNA. Removing the N2 amino group from guanine destabilizes Z-DNA in d(CG) sequences, while adding this group to adenine helps to stabilize the structure in d(TA)-containing sequences. Methylation at the C5 position of pyrimidine bases has both a stabilizing and a destabilizing effect on Z-DNA. Z-DNA is stabilized by methylation of cytosines, as in d(m5CG), and also when thymines are demethylated to form deoxyuridine. This apparently contradictory effect of methylation depends on its position relative to the amino and keto groups of the base pairs in the major groove.
We should stress, however, that comparisons of Z-DNA structures alone cannot provide an accurate account of the factors that stabilize a sequence in this form. These same parameters must be compared with the reference B-DNA structures of these sequences. Even then, however, it is not entirely clear how all these various factors contribute to the ability or inability of certain sequences to adopt the left-handed form of the duplex. For example, if one simply compares the spine of hydration in the narrow minor groove across the various Z-DNA structures, we see that this spine is disrupted by narrow minor grooves, the lack of an amino group contributed by the purine, and base pairs that violate the alternating pyrimidine-purine sequence motif for the anti-p-syn dinucleotide stacking. It is also clear that solvent interactions at the major groove (e.g. cation complexes that bridge stacked adjacent bases) will also affect the structure of the minor groove and its hydration spine. Whether this facilitates or hinders the formation of Z-DNA depends on one's point of view. One can argue that solvent interactions are stabilizing since waters can form a direct bridge from the N2 amino group (if present) of the purine base to the DNA backbone, which would help to hold the base in the syn conformation. Furthermore, all this discussion says nothing about the effect of this amino group on the spine of hydration in the minor groove of B-DNA. However, a well-structured water network can be argued to be destabilizing to either B- or Z-DNA from the perspective of the reduced entropy of the solvent structure (24).
Thus, although the large data set of single-crystal structures for different sequences and substituent groups as Z-DNA provides a wealth of structural information, the details may not tell us much about the stability of this unusual conformation if we are confined to these qualitative comparisons.
One approach that does utilize the crystal structures to study and predict the effects of sequence or substituent groups on Z-DNA stability is to calculate solvent free energies (SFEs) from the structures, and to compare these to SFEs for the same sequences as B-DNA (ΔSFE_Z-B). In this case, the reference B-DNA state is treated explicitly. Unfortunately, not all the various substituent modifications are well represented in B-DNA crystal structures; however, the SFEs calculated from B-DNA models constructed using idealized parameters appear to represent accurately the free energy for hydrating this form, even when compared with the conformations of sequences in single crystals (89,97).
For the standard APP sequences, we can derive a thermodynamic cycle (26) to elucidate how each base substituent affects the hydration and stability of Z-DNA. For example, deamination of the guanine in d(CG) base pairs to form d(CI) has an energetic cost of +1.6 kcal/mol, whereas amination of d(TA) to form d(TD) favours Z-DNA by -1.4 kcal/mol (Fig. 7.8). This underscores the importance of the amino group in the minor groove and is consistent with its role in coordinating water molecules to form the spine of water molecules that traverse the minor groove (51). It also explains the apparently contradictory effect of methylation on the stability of Z-DNA, with methylation of cytosines favouring the left-handed form and the thymine methyl disfavouring this form.
This is not intuitive, but when the SASs of each surface type are compared for B- and Z-DNA, the effects become more apparent (Table 7.12). Methylation of cytosine does have the effect of increasing the overall exposed hydrophobic surface for the d(CpG) dinucleotides; however, this increase is significantly greater for B-DNA than for Z-DNA and thus increases the relative stability of the left-handed form. It is now becoming evident that cytosine methylation destabilizes B-DNA, allowing the formation of A-DNA in crystals (87), and increasing the frequency of cytosine deamination in solution (98).
Fig. 7.8. Effect of substituent groups on the differences in solvent free energies (ΔSFE) and the stability (ΔΔG°T) of dinucleotides in Z-DNA versus B-DNA. A thermodynamic cycle is shown for the addition, removal, or replacement of various substituent groups, starting with the most stable dinucleotide as Z-DNA, d(m5CpG), to the least stable [d(CpI) and d(TpA)] and back to d(m5CpG). The ΔSFE are shown for each dinucleotide, while the effects of the change in the substituent on the stability of Z-DNA (ΔΔG°T) are shown for each modification step.
We can also make some predictions concerning the base pairs that are out-of-alternation. From the SFE calculations (Table 7.9), we can see that the d(ApT) as an out-of-alternation anti-p-syn dinucleotide is predicted to be less stable as Z-DNA compared with the analogous d(GpC) out-of-alternation dinucleotide. This appears to be associated primarily not with the out-of-alternation steps themselves [in this case, the d(ApT) step is actually more stable], but with how each out-of-alternation base pair affects the flanking base pairs. Finally, a single d(CG) that is out-of-alternation is predicted to be only slightly less destabilized as Z-DNA compared with the d(GpC) dinucleotide. Thus, we would expect that placing the cytosines of two adjacent base pairs in a syn conformation is more favourable than having them separated.
Upon putting all of this together in the context of the crystallography of Z-DNA, it became evident that the SFE calculations are useful as an analytical tool for predicting the target salt concentrations for obtaining crystals of this conformation (22). A
Table 7.12. Solvent accessible surface areas (Å²) of dinucleotide steps as B- and Z-DNA

                              Base atoms                           Ribose atoms
Conformation  Dinucleotide       C      CH3(C5)  O      N      N2     C'      O'     P       Total
(B/Z)
B             d(TpA)             43.6   44.8     32.6   55.6   -      182.8   51.4   132.8   543.6
Z             d(TpA)             46.0   46.2     27.0   50.8   -      188.2   42.0   133.6   533.8
B             d(TpD)             28.8   44.8     29.8   46.8   26.0   183.8   43.8   132.8   536.6
Z             d(TpD)             33.3   46.3     27.4   49.7   21.5   170.9   41.8   133.2   524.1
B             d(CpG)             49.4   -        31.0   59.1   23.6   185.5   47.2   132.6   528.4
Z             d(CpG)             56.3   -        44.2   48.7   19.6   184.0   47.4   132.1   532.3
B             d(CpI)             64.8   -        38.0   65.2   -      197.7   52.3   132.6   550.6
Z             d(CpI)             71.4   -        47.2   48.8   -      199.4   46.8   133.0   546.6
B             d(UpA)             63.2   -        39.2   57.6   -      190.0   51.4   133.6   535.0
Z             d(UpA)             68.6   -        37.8   57.0   -      194.6   42.0   133.8   533.8
B             d(m5CG)            26.9   48.3     31.1   46.6   30.3   195.7   42.3   127.1   548.2
Z             d(m5CG)            31.4   50.5     37.7   43.8   20.2   180.8   40.6   141.9   547.0
B             d(ApT)ᵃ            33.0   55.1     35.1   58.4   -      185.4   48.0   127.7   542.7
Z             d(ApT)ᵃ            55.4   80.3     24.6   37.2   -      171.8   48.9   131.0   549.2
B             d(GpC)ᵃ            34.3   -        28.3   57.9   28.1   187.8   40.3   128.8   505.6
Z             d(GpC)ᵃ            75.3   -        21.6   70.5   20.8   177.2   47.1   132.5   544.9
B             d(GpG)/d(CpC)ᵃ     32.8   -        28.5   60.0   27.7   191.3   41.5   128.3   510.1
Z             d(GpG)/d(CpC)ᵃ     67.2   -        25.5   62.0   20.6   183.6   40.9   135.9   535.7

ᵃ Out-of-alternation dinucleotide step.
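The statement that cytosine methylation exposes more hydrophobic surface in B-DNA than in Z-DNA can be checked directly against Table 7.12. The sketch below is a rough illustration under one explicit assumption that is not stated in the text: the hydrophobic surface is taken to be the sum of the carbon-derived columns, that is, base carbons (C), the C5 methyl group (CH3) and ribose carbons (C'). The function names are hypothetical.

```python
# Carbon-derived surface areas (square angstroms) from Table 7.12.
# A dinucleotide without a C5 methyl group is entered with CH3 = 0.0.
SAS = {
    ("B", "d(CpG)"): {"C": 49.4, "CH3": 0.0, "C'": 185.5},
    ("Z", "d(CpG)"): {"C": 56.3, "CH3": 0.0, "C'": 184.0},
    ("B", "d(m5CG)"): {"C": 26.9, "CH3": 48.3, "C'": 195.7},
    ("Z", "d(m5CG)"): {"C": 31.4, "CH3": 50.5, "C'": 180.8},
}


def hydrophobic_sas(conformation: str, dinucleotide: str) -> float:
    """Assumed hydrophobic surface: sum of all carbon-derived columns."""
    return sum(SAS[(conformation, dinucleotide)].values())


def methylation_increase(conformation: str) -> float:
    """Change in assumed hydrophobic surface on going from d(CpG) to d(m5CG)."""
    return hydrophobic_sas(conformation, "d(m5CG)") - hydrophobic_sas(conformation, "d(CpG)")


for conformation in ("B", "Z"):
    print(f"{conformation}-DNA: {methylation_increase(conformation):+.1f} square angstroms")
```

With this definition the increase is about 36 Å² for B-DNA and about 22 Å² for Z-DNA, in line with the argument that methylation penalizes the B-form more than the Z-form. A different partition of atoms into hydrophobic and hydrophilic classes would change the numbers.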
comparison of the CS for crystallization of the current Z-DNA sequences (Table 7.2) shows a strong correlation to the ΔSFE_Z-B for these sequences (Fig. 7.9). This relationship apparently arises from the stabilization of the Z- versus the B-form as both the salt and alcohol concentrations in the crystallization set-ups are increased. The pathway for crystallization, therefore, directs the DNA to the left-handed form in solution, while avoiding various amorphous precipitant forms along the way.
The shortcoming of the SFE approach in studying stability is that we do not utilize any of the detailed information on solvent interactions gleaned from the high resolution single-crystal structures. The general hydration parameters from the SFE calculations should somehow be related to these specific patterns of water structure. This is perhaps where Z-DNA may play its most significant role in physical biochemistry. The accumulated structural and thermodynamic data for all these various sequences can provide a benchmark for the development of molecular force fields. It serves much the same function as the hydrogen atom to physical chemists. The properties of Z-DNA are now very well understood; next, we need to develop the theories to explain the properties. Once developed, these same principles should be generally applicable to the study and prediction of all classes of biological macromolecules.
Fig. 7.9. Relationship between the effective cation concentration (log cation strength or logCS) of the crystallization solutions and the difference in solvent free energy between Z-DNA and B-DNA (ΔΔSFE_Z-B) for sequences crystallized as Z-DNA. The logCS that could be determined (Table 7.2) for all sequences (that contain base pairs of the type defined in Fig. 7.4) are plotted relative to the ΔΔSFE_Z-B calculated (Table 7.9) for these sequences. The open circle represents the sequence d(CGCICG), which was crystallized by the hanging drop method of vapour diffusion. The line represents the best linear fit of the data for sequences with SFE between -0.4 and +0.4 kcal/mol/dn (slope = 1.36, y-intercept = 0.05, R = 0.93). The plot asymptotes at both high and low values for logCS. At the high end, the salt concentrations reach the point of saturation in the crystallization solutions, while the low end represents the minimum amount of cations required to crystallize the DNAs (approximately equal to the concentration of mononucleotide equivalents in the DNA).
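The linear fit reported for Fig. 7.9 can be turned into a rough predictor of the crystallization conditions for a new sequence. The sketch below assumes that logCS is the dependent variable and the solvent free energy difference (kcal/mol/dn) the independent one, which the caption implies but does not state outright; the function name is hypothetical. The fit applies only within the range over which it was made.

```python
# Parameters of the linear fit quoted in the caption of Fig. 7.9.
SLOPE = 1.36
Y_INTERCEPT = 0.05
FIT_RANGE = (-0.4, 0.4)  # kcal/mol/dn, range of the fitted data


def predicted_log_cs(delta_sfe: float) -> float:
    """Estimate logCS from the Z-B solvent free energy difference.

    Raises ValueError outside the fitted range, where the plot levels off
    and the linear relationship no longer applies.
    """
    low, high = FIT_RANGE
    if not low <= delta_sfe <= high:
        raise ValueError(f"{delta_sfe} kcal/mol/dn is outside the fitted range {FIT_RANGE}")
    return SLOPE * delta_sfe + Y_INTERCEPT


for delta_sfe in (-0.4, 0.0, 0.4):
    print(f"SFE difference {delta_sfe:+.1f} -> logCS {predicted_log_cs(delta_sfe):+.2f}")
```

Refusing to extrapolate is a deliberate choice here: beyond the fitted range the observed logCS is limited by salt saturation at the high end and by the minimum cation requirement at the low end.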
Acknowledgements
This work has been supported by grants from the National Science Foundation (MCB972824), the National Institutes of Health (R05GM54538A), and the Environmental Health Sciences Center at Oregon State University (NIEHS ES00210). We would like to thank Mason Kwong and Christine Nguyen for their help with this project.
References
1. Watson, J.D. and Crick, F.H.C. (1953) Nature 171, 737.
2. Franklin, R.E. and Gosling, R.G. (1953) Nature 172, 156.
3. Pohl, F.M. and Jovin, T.M. (1972) J. Mol. Biol. 67, 375.
4. Wang, A.H.-J., Quigley, G.J., Kolpak, F.J., Crawford, J.L., van Boom, J.H., van der Marel, G. and Rich, A. (1979) Nature 282, 680.
5. Wang, A.H.-J., Quigley, G.J., Kolpak, F.J., van der Marel, G., van Boom, J.H. and Rich, A. (1979) Science 211, 171.
6. Drew, H.R., Takano, T., Tanaka, S., Itakura, K. and Dickerson, R.E. (1980) Nature (London) 286, 755.
7. Shakked, D., Rabinovich, D., Cruse, W.B.T., Egert, E., Kennard, O., Sals, G., Salisbury, S.A. and Viswamitra, M.A. (1981) Proc. R. Soc. (London) B 213, 479.
8. Wing, R.M., Drew, H.R., Takano, T., Broka, C., Tanaka, S., Itakura, K. and Dickerson, R.E. (1980) Nature 287, 755.
9. Peck, L.J. and Wang, J.C. (1983) Proc. Natl. Acad. Sci. USA 80, 6206.
10. Vologodskii, A.V. and Frank-Kamenetskii, M.D. (1984) J. Biomol. Struct. Dynamics 1, 1325.
11. Ellison, M.J., Feigon, J., Kelleher, R.J., III, Wang, A.H.-J., Habener, J.F. and Rich, A. (1986) Biochemistry 25, 3648.
12. McLean, M.J., Lee, J.W. and Wells, R.D. (1988) J. Biol. Chem. 263, 7378.
13. Ellison, M.J., Kelleher, R.J., III, Wang, A.H.-J., Habener, J.F. and Rich, A. (1985) Proc. Natl. Acad. Sci. USA 82, 8320.
14. Behe, M. and Felsenfeld, G. (1981) Proc. Natl. Acad. Sci. USA 78, 1619.
15. Moller, A., Nordheim, A., Kozlowski, S.A., Patel, D. and Rich, A. (1984) Biochemistry 23, 54.
16. Jovin, T.M., McIntosh, L.P., Arndt-Jovin, D., Zarling, D.A., Robert-Nicoud, M., van de Sande, J.H. and Jorgenson, K.F. (1983) J. Biomol. Struct. Dynamics 1, 21.
17. Sagi, J., Szemzo, A., Otvos, L., Vorlickova, M. and Kypr, J. (1991) Int. J. Biol. Macromol. 13, 329.
18. Fujii, S., Wang, A.H.-J., van der Marel, G., van Boom, J.H. and Rich, A. (1982) Nucl. Acids Res. 10, 7879.
19. Chevrier, B., Dock, A.C., Hartmann, B., Leng, M., Moras, D., Thuong, M.T. and Westhof, E. (1986) J. Mol. Biol. 188, 707.
20. Vorlickova, M. and Sagi, J. (1991) Nucl. Acids Res. 21, 2343.
21. Wang, L. and Keiderling, T.A. (1993) Nucl. Acids Res. 21, 4127.
22. Ho, P.S., Kagawa, T.F., Tseng, K., Schroth, G.P. and Zhou, G. (1991) Science 254, 1003.
23. Coll, M., Wang, A.H.-J., van der Marel, G.A., van Boom, J.H. and Rich, A. (1986) J. Biomol. Struct. Dynamics 4, 157.
24. Schneider, B., Ginell, S.L., Jones, R., Gaffney, B. and Berman, H.M. (1992) Biochemistry 31, 9622.
25. Zhou, G. and Ho, P.S. (1990) Biochemistry 29, 7229.
26. Kagawa, T.F., Howell, M.L., Tseng, K. and Ho, P.S. (1993) Nucl. Acids Res. 21, 5978.
27. McDonnell, N.B. and Preisler, R.S. (1989) Biochem. Biophys. Res. Commun. 164, 426.
28. Preisler, R.S., Chen, H.H., Colombo, M.F., Choe, Y., Short, B.J.J. and Rau, D.C. (1995) Biochemistry 34, 14400.
29. Tereshko, V. and Milinina, L. (1990) J. Biomol. Struct. Dynamics 7, 827.
30. Feuerstein, B.G., Williams, L.D., Basu, H.S. and Marton, L.J. (1991) J. Cell. Biochem. 46, 37.
31. Thomas, T.J., Gunnia, U.B. and Thomas, T. (1991) J. Biol. Chem. 266, 6137.
32. Thomas, T.J. and Thomas, T. (1994) Biochem. J. 298, 485.
33. Rahmouni, A.R. and Wells, R.D. (1989) Science 246, 358.
34. Ramakrishnan, B. and Viswamitra, M.A. (1988) J. Biomol. Struct. Dynamics 6, 511.
35. Drew, H.R. and Dickerson, R.E. (1981) J. Mol. Biol. 152, 723.
36. Crawford, J.L., Kolpak, F.J., Wang, A.H.-J., Quigley, G.J., van Boom, J.H., van der Marel, G.A. and Rich, A. (1980) Proc. Natl. Acad. Sci. USA 77, 4016.
37. Gessner, R.V., Frederick, C.A., Quigley, G.J., Rich, A. and Wang, A.H.-J. (1989) J. Biol. Chem. 264, 7921.
38. Egli, M., Williams, L.D., Gao, Q. and Rich, A. (1991) Biochemistry 30, 11388.
39. Bancroft, D., Williams, L.D., Rich, A. and Egli, M. (1994) Biochemistry 33, 1073.
40. Ohishi, H., Nakanishi, I., Inubushi, K., van der Marel, G.A., van Boom, J.H., Rich, A., Wang, A.H.-J., Hakoshima, T. and Tomita, K. (1996) FEBS Lett. 391, 153.
41. Ho, P.S., Frederick, C.A., Saal, D., Wang, A.H.-J. and Rich, A. (1987) J. Biomol. Struct. Dynamics 4, 521.
42. Gessner, R.V., Quigley, G.J., Wang, A.H.-J., van der Marel, G.A., van Boom, J.H. and Rich, A. (1985) Biochemistry 24, 237.
43. Gao, Y.G., Sriram, M. and Wang, A.H.-J. (1993) Nucl. Acids Res. 21, 4093.
44. Mooers, B.H.M., Eichman, B.F. and Ho, P.S. (1997) J. Mol. Biol. 269, 796.
45. Fujii, S., Wang, A.H.-J., Quigley, G.J., Westerink, H., van der Marel, G., van Boom, J.H. and Rich, A. (1985) Biopolymers 24, 243.
46. Ban, C., Ramakrishnan, B. and Sundaralingam, M. (1996) Biophys. J. 7, 1215.
47. Malinina, L., Urpi, L., Salas, X., Huynh-Dinh, T. and Subirana, J.A. (1994) J. Mol. Biol. 243, 484.
48. Kagawa, T.F., Geierstanger, B.H., Wang, A.H.-J. and Ho, P.S. (1991) J. Biol. Chem. 266, 20175.
49. Coll, M., Fita, I., Lloveras, J., Subirana, J.A., Bardella, F., Huynh-Dinh, T. and Igolen, J. (1988) Nucl. Acids Res. 16, 8695.
50. Sadsivan, C. and Gautham, N. (1995) J. Mol. Biol. 248, 918.
51. Wang, A.H.-J., Hakoshima, T., van der Marel, G., van Boom, J.H. and Rich, A. (1984) Cell 37, 321.
52. Wang, A.H.-J., Gessner, R.V., van der Marel, G.A., van Boom, J.H. and Rich, A. (1985) Proc. Natl. Acad. Sci. USA 82, 3611.
53. Geierstanger, B.H., Kagawa, T.F., Chen, S.-L., Quigley, G.J. and Ho, P.S. (1991) J. Biol. Chem. 266, 20185.
54. Ginell, S.L., Kuzmich, S., Jones, R.A. and Berman, H.M.
(1990 ) Biochemistry 29 , 10461. 55. va n Meervelt , L. , Moore , M.H. , Lin , P.K.T. , Brown , D.M . an d Kennard , O . (1990 ) J. Mol. Biol. 216, 773 . 56. Kumar , V.D. and Weber, I.T . (1993 ) Nucl. Acids Res. 21, 2201 . 57. Cervi , A.R., Guy , A. , Leonard, G.A. , Teoule , R. an d Hunter, W.N. (1993 ) Nucl. Adds Res. 21, 5623 . 58. Eichman , B.F., Basham, B., Schroth , G.P. and Ho, P.S . (submitted). 59. Schroth , G.P. , Kagawa, T.F. an d Ho, P.S . (1993 ) Biochemistry 32, 13381 . 60. Bononi , J. (1994 ) MS Thesis, Oregon Stat e University, Corvallis. 61. Parkinson , G.N., Arvanitis , G.M., Lessinger, L., Ginell , S.L. , Jones, R., Gaffney , B . and Berman, H.M. (1995 ) Biochemistry 34 , 15487 . 62. Moore , M.H. , va n Meervelt , L., Salisbury , S.A. , Kong Thoo Lin , P. an d Brown , D.M . (1995) J. Mol. Biol. 251, 665 . 63. Peterson , M.R. , Harrop , S.J. , McSweeney , S.M. , Leonard , G.A. , Thompson , A.W. , Hunter, W.N. an d Helliwell, J.R. (1996 ) J. Synch. Rad. 3, 24. 64. Kumar , V.D. , Harrison , R.W. , Andrews , L.C . an d Weber, I.T . (1992 ) Biochemistry 31 , 1541. 65. Brennan , R.G., Westhof , E. an d Sundaralingam , M. (1986 ) J. Biomol. Struct. Dynamics 3 , 649. 66. Doi , M. , Inoue , M., Tomoo , K. , Ishida , T. , Ueda , Y. , Akagi , M. an d Urata , H . (1993 ) J. Am. Chem. Soc. 115, 10432 .
252
Oxford Handbook of Nucleic Acid Structure
67. Zhang , H. , va n de r Marel , G. , va n Boom , J . an d Wang , A.H.-J . (1992 ) Biopolymers 32 , 1559. 68. Teng , M. , Liaw , Y.-C., van der Marel, G.A. , van Boom, J.H. an d Wang, A.H.-J . (1989 ) Biochemistry 28 , 4923 . 69. Ho , P.S. , Frederick , C.A. , Quigley , G.J. , va n de r Marel , G.A. , va n Boom, J.H., Wang , A.H.-J. an d Rich, A. (1985) EMBO J. 4, 3617 . 70. Coll , M. , Saal , D., Frederick , C.A., Aymami , J., Rich , A . and Wang, A.H.-J. (1989) Nucl. Adds Res. 17, 911 . 71. Brown , T. , Kneale , G., Hunter, W.N . an d Kennard, O . (1986 ) Nucl. Acids Res. 14, 1801 . 72. Berman , H.M. , Olson , W.K. , Beveridge , D.L. , Westbrook , J. , Gelbin , A. , Demeny, T. , Hsieh, S.-H., Srinivasan , A.R. an d Schneider, B. (1992 ) Biophys. J. 63 , 751 . 73. Bernstein , F.C. , Koetzle , T.F., Williams , G.J.B. , Meyer , E.F. , Jr., Brice , M.D. , Rodger , J.R., Kennard , O., Shimanouchi , T. an d Tasumi, M . (1977 ) J. Mol. Biol. 112, 535 . 74. Egli , M. an d Gessner , R.V. (1995 ) Proc. Natl. Acad. Sci. USA 92 , 180 . 75. Diekmann , S . (1989) EMBO J. 8 , 1. 76. Dickerson , R.E. (1992 ) Meth. Enzymol. 211, 67 . 77. Rich , A., Nordheim, A. and Wang, A.H.-J. (1984 ) Annu. Rev. Biochem. 53, 791 . 78. Jovin , T.M. , Soumpasis , D.M. an d McIntosh , L.P . (1987 ) Annu. Rev. Phys. Chem. 38 , 521. 79. Gessner , R.V., Quigley , G.J . and Egli, M. (1994 ) J. Mol. Biol. 236, 1154 . 80. Drew , H.R . an d Dickerson, R.E. (1981 ) J. Mol. Biol. 151, 535 . 81. Berman , H.M. (1994 ) Curr. Opin. Struct. Biol. 4, 345 . 82. Kubinec , M.G. an d Wemmer, D.E . (1992 ) J. Am. Chem. Soc. 114, 8739 . 83. Liepinsh , E., Otting , G . and Wuthrich, K . (1992 ) Nucl. Acids Res. 20, 6549 . 84. Morgan , J.E., Blankenship , J.W. an d Matthews, H.R . (1986 ) Arch. Biochem. Biophys. 246 , 225. 85. Tabor , C.W . an d Tabor, H . (1984 ) Annu. Rev. Biochem. 53, 749. 86. Howell , M.L. , Schroth , G.P. an d Ho, P.S . (1996 ) Biochemistry 35 , 15373 . 87. Mooers , B.H.M. 
, Schroth , G.P., Baxter , W.W. an d Ho, P.S . (1995 ) J. Mol. Biol. 249, 772. 88. Quadrifoglio , F. , Manzini, G. and Yathindra, N. (1984 ) J. Mol. Biol. 175, 419 . 89. Kagawa , T.F., Stoddard , D., Zhou , G. and Ho, P.S . (1989 ) Biochemistry 28 , 6642 . 90. Futscher , B.W., Rice , J.C., Ho , P.S . and Dalton, W.S. (submitted). 91. Melander , W. an d Horvath, C . (1977 ) Arch. Biochem. Biophys. 183 , 200 . 92. Hamada , H. an d Kakunaga, T. (1982 ) J. Cell. Biochem. 3, 333 . 93. Hamada , H., Petrino , M.G . an d Kakunaga, T. (1982 ) Proc. Natl. Acad. Sci. USA 79 , 6465 . 94. Schroth , G.P. , Chou , P.J . and Ho, P.S . (1992) J. Biol. Chem. 267, 11846 . 95. Davies , D.B. (1978 ) Progress in NMR Spectroscopy, Vol . 12 , p. 135 . Pergamonn , Oxford . 96. Haschmeyer , A.E.V. and Rich, A. (1967 ) J. Mol. Biol. 27, 369 . 97. Basham , B., Schroth , G.P . an d Ho, P.S . (1995 ) Proc. Natl. Acad. Sci. USA 92 , 6464 . 98. Zhang , X . and Mathews, C.K. (1994 ) J. Biol. Chem. 269, 7066 .
8 Standard DNA duplexes and RNA:DNA hybrids in solution

Uli Schmitz, Forrest J. H. Blocker, and Thomas L. James
Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94143-446, USA
1. Introduction

Following the current paradigm that biological function is encoded in three-dimensional molecular structure, the last 10 years have seen great efforts in the determination of well-defined solution structures of DNA using high resolution nuclear magnetic resonance (NMR) methods and modelling tools. Since the 1970s, NMR has emerged as the method of choice for the study of biomolecules under more physiological conditions than can be created in the crystalline state. Initially, it was successfully applied to the study of nucleic acids to yield only 'low-resolution' structural insights, such as secondary structure information or qualitative differences between similar samples. With large improvements in magnetic field strengths and development of sensitive two-dimensional (2D) NMR experiments, as well as the development of large-scale synthetic methods for nucleic acids, high resolution structures of DNA could finally be tackled in the mid to late 1980s (see refs 1-3 for review). Consequently, most of the research in that period was directed towards establishing adequate methodologies for: (i) achieving complete assignments of solvent-exchangeable and non-exchangeable protons in nucleic acids (4); (ii) converting spectral observables, i.e. intensities from quantitative 1D and 2D NOE spectroscopy or coupling constants from correlated spectroscopy, into structural information (5-8); and (iii) building adequate models using the NMR-derived structural information (ref. 5 and references therein).

A spate of high resolution DNA duplex structures appeared in the early 1990s, and it became conceivable that sequence-specific structural rules could be established readily if enough high quality duplex structures could be solved.
However, prompted by DNA crystal structure results, it became evident that a sequence-specific code for structure is more complex than anticipated. On the other hand, it has become clear that the particular structure of a dinucleotide step depends on neighbouring sequences (9). This also led to the idea of a specific flexibility or malleability of certain DNA sequences. This, in turn, causes difficulties in the interpretation of NMR-derived structural data, making the conformational analysis of a particular sequence in terms of rigid helical parameters problematic.

Not surprisingly, some of the early excitement stimulated cautionary studies analysing the accuracy and limitations of the NMR-based approach (10-14). Our discussion of structures will be preceded by a critical summary of methodologies and inherent limitations, as well as their impact on the structures derived.
Despite the intrinsic limitations of NMR-derived structures, which are still not fully understood, some promising results emerged from a statistical analysis of a handful of the most accurately defined DNA duplex structures (15,16). This analysis revealed that the average helical twist of a particular set of structures is virtually identical with that measured by independent biophysical solution measurements (17,18), lending some long-awaited credibility to the NMR approach.

There has been a large variety of applications of NMR to DNA. Over 1000 studies are listed in the Medline® database (19) for the years 1990-1996 in which NMR was applied to some kind of sample containing DNA. A comparatively small number of publications report high resolution structures of unmodified, standard DNA duplexes (4%), but over 50% of all studies focus on the interaction of DNA with other molecules, e.g. proteins and peptides (9%), drugs (covalently attached or as a non-covalent complex) (40%), and cations (5%). This chapter will emphasize the structures of standard duplexes.

NMR is also an excellent tool for elucidating the dynamic properties of nucleic acids (20), since many of the NMR observables either encode conformational flexibilities, owing to their nature as time-averaged parameters, e.g. coupling constants and NOE intensities, or reflect global and local dynamic properties directly, e.g. line widths and other relaxation parameters.
2. Data and methods for high resolution structure determination

Before discussing detailed differences between high resolution NMR structures of DNA or RNA:DNA hybrids, a brief review of the methods involved in structure determination is helpful to understand some of the intrinsic problems.
2.1 NMR data and restraints for the determination of high resolution structures

Structural information used for DNA structure determination is typically in the form of distance restraints extracted from multidimensional homonuclear NOE spectra; {¹H,¹H} NOE cross-peak volumes are converted into distance restraints in a more or less quantitative fashion (21). Distance data are often augmented by torsion angles derived from ¹H-¹H or ¹H-³¹P scalar coupling constants, available from various correlated spectroscopy experiments (e.g. COSY, HETCOR). Detailed reviews are available describing acquisition of suitable data sets, proton assignments, and the generation of adequate restraint lists for the structure refinement procedure (4,5,22,23). For the present purpose, it is important to realize that the NMR-derived parameters largely represent local information, where coupling constants usually extend over three bonds and NOEs reflect distances of < 7 Å, with the higher values only in the more auspicious case of methyl groups.

It is clear that there is not enough information available from NMR data alone to define precisely the structure of a DNA duplex without making assumptions about base pairing geometries and without inclusion of an empirical chemical force field. The precision of a structure, i.e. the reproducibility of the structures generated from
the experimental data, will improve with an increasing number of distance and torsion angle restraints, as well as with the precision of these restraints. Consequently, decreased precision in the restraints, equivalent to wider distance error bounds, will require a larger number of restraints to define a structure with the same precision. For the increasingly encountered situation of isotope-labelled proteins and RNA, heteronuclear NOE spectra require a very conservative quantitative interpretation of distance restraints, e.g. using only an upper bound of 6 Å (24). However, the huge number of NOEs in multidimensional spectra of proteins and RNA compensates well for this loss in precision. Also, in these cases the focus is not usually on small sequence-specific structural details.

For DNA duplexes, however, where isotope-labelled samples and multidimensional heteronuclear NMR are just on the horizon, one can currently assign 10-15 NOEs per residue, 20 in the most favourable cases. However, the distribution is not ideal; there are relatively few restraints involving the proton-poor base moieties. These restraints are not sufficient to derive a precise structure with high accuracy unless the tightness of the distance restraints is considerably improved compared with the conservative approach in the case of isotope-labelled samples. One possibility, commonly used in protein structural work, is to group the NOE intensities semi-quantitatively (e.g. weak, medium, and large, corresponding to distances of 4-7, 3-5, and 1.8-3.5 Å, respectively) (25). Since, in DNA duplexes, the restraints are not equally distributed over the molecule and there are typically few 'long-range distances', i.e.
non-sequential, cross-strand distances, the semi-quantitative interpretation is even more prone to cause artefacts resulting from multispin effects, commonly termed spin diffusion (26,27). An approximate way of coping with spin diffusion is to extrapolate NOE intensities towards very small mixing times using NOE build-up curves (28). This procedure requires a number of NOE datasets to be analysed and still cannot properly account for spin diffusion effects (26). A systematic improvement in the restraints' accuracy requires the consideration of all structure-dependent relaxation pathways of the entire proton system, which can be done with complete relaxation matrix methods (29). With this approach, the theoretical NOE spectra, including spin diffusion effects, can be calculated for a given structure with the assumption of a motional model. In reverse, this strategy can be used to compute accurate distances from NOE intensities for unknown structures using a hybrid matrix approach, where all experimentally unobserved NOEs are taken from a model structure that is similar to the target structure (30-32). Exact details of methods differ, but the elements of the relaxation matrix are varied until a consistent fit to the experimentally observed NOEs is obtained. While programs like IRMA (32) and MORASS (31) integrate this process directly into the conformational search for the final structure via restrained molecular dynamics (rMD), the program MARDIGRAS (30) simply varies cross-relaxation rates until the best solution is found, as indicated by a minimum residual index. Interproton distances are readily obtained from the converged relaxation matrix.
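The effect of spin diffusion on a two-spin distance estimate can be illustrated with a small numerical sketch. It is not the algorithm of MARDIGRAS, IRMA, or MORASS; it only computes NOE intensities as exp(-R·t_mix) for a hypothetical three-proton system and then applies the isolated spin-pair estimate to them. The dipolar constant `Q`, the 500 MHz field, the 3 ns correlation time, the rigid isotropic motional model, and all function names are assumptions made for this illustration.

```python
import math

# Approximate dipolar constant for a 1H-1H pair, 0.1*gamma^4*hbar^2*(mu0/4pi)^2,
# expressed in Å^6 s^-2 (assumed value for this sketch).
Q = 5.7e10


def spectral_density(omega, tau_c):
    """Lorentzian spectral density for rigid isotropic tumbling."""
    return tau_c / (1.0 + (omega * tau_c) ** 2)


def relaxation_matrix(coords, tau_c, omega):
    """Build the rate matrix R (s^-1): off-diagonal sigma_ij, diagonal rho_i."""
    n = len(coords)
    rates = [[0.0] * n for _ in range(n)]
    j0 = spectral_density(0.0, tau_c)
    j1 = spectral_density(omega, tau_c)
    j2 = spectral_density(2.0 * omega, tau_c)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r6 = math.dist(coords[i], coords[j]) ** 6
            rates[i][j] = Q / r6 * (6.0 * j2 - j0)              # cross-relaxation
            rates[i][i] += Q / r6 * (j0 + 3.0 * j1 + 6.0 * j2)  # auto-relaxation
    return rates


def matmul(a, b):
    """Plain square-matrix product for small nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def noe_intensities(rates, t_mix, squarings=6, terms=20):
    """Return exp(-R * t_mix) via a Taylor series with scaling and squaring."""
    n = len(rates)
    scale = t_mix / 2 ** squarings
    step = [[-rates[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, step)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def two_spin_distance(intensity, ref_intensity, ref_distance):
    """Isolated spin-pair estimate: r = r_ref * (a_ref / a)^(1/6)."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Three collinear protons: A-B and B-C are 2.5 Å apart, A-C is 5.0 Å.
COORDS = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
OMEGA = 2.0 * math.pi * 500e6   # 500 MHz proton frequency (assumed)
TAU_C = 3e-9                    # 3 ns correlation time (assumed)


def apparent_ac_distance(t_mix):
    """A-C distance inferred from simulated intensities, calibrated on A-B."""
    a = noe_intensities(relaxation_matrix(COORDS, TAU_C, OMEGA), t_mix)
    return two_spin_distance(a[0][2], a[0][1], 2.5)


if __name__ == "__main__":
    for t_mix in (0.01, 0.1, 0.3):
        print(f"mixing time {t_mix:.2f} s: apparent A-C distance "
              f"{apparent_ac_distance(t_mix):.2f} Å (true 5.00 Å)")
```

In this toy system the relayed A-B-C pathway makes the A-C cross-peak grow faster than its direct term alone, so the two-spin estimate falls increasingly short of 5 Å as the mixing time lengthens. This is the kind of artefact that complete relaxation matrix methods are meant to remove.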
This latter route, where defining the target and meeting it are separate steps, offers the methodological advantage that no structural assumptions are implicit via the chemical force field of the search engine or the constraint imposed by the requirement that all distances must be satisfiable by a single DNA model. Therefore, a set of MARDIGRAS-derived distances might even exhibit some mutually inconsistent restraints, a situation that can
result from conformational flexibility leading to dynamically averaged, and therefore possibly inconsistent, distances. Consequently, examination of these distances may aid the recognition of conformational flexibility.

Complete relaxation matrix methods have also been implemented in different molecular dynamics programs to allow refinement directly against NOE intensities, e.g. AMBER (33), XPLOR (34), GROMOS (35), CHARMM (36), and some commercial packages such as DISCOVER (37). Back-calculation methods lead to spin diffusion-adjusted structures, when preliminary, distance geometry-derived models (38,39) or canonical starting models (40) are refined by adjusting local model geometries to match the appearance of the actual NOESY spectra. In a related approach, the program NUCFIT varies local structural parameters to match NOE build-up curves by taking into account spin diffusion and anisotropic molecular motion (12,41).

Another important difference between the various complete relaxation matrix implementations is the way they deal with the assessment of errors and propagation of those errors into the structure derived. Integrated NOE cross-peak volumes have limited accuracy, owing to peak overlap, spectral noise levels, distortion from solvent or diagonal peaks, exchange processes, incomplete proton relaxation, or baseline problems.

To obtain reliable NMR structures, it is important to use distance restraints with bounds as tight as possible, without ignoring the inherent experimental and methodological errors (26). The subsequent refinement process should then yield a final structure along with a conformational envelope reflecting the intrinsic limitations in the original data and the method.
Therefore, it is important to use a refinement procedure that utilizes upper and lower restraint bounds for obtaining DNA structures, such that the degree of accuracy is not lost in the process. The MARDIGRAS procedure uses an absolute noise error, e.g. conservatively using the size of the smallest NOE, and an additional relative error, e.g. 10-20%, in distance bounds determination. Furthermore, different motional models with varying correlation times are used (e.g. the isotropic correlation time may be varied from 2 to 4 ns for a short oligonucleotide) in combination with different starting geometries. Typically, the difference between upper and lower MARDIGRAS-derived distance bounds encountered in the DNA structural work discussed below is around 0.3 Å. However, distance determinations for a covalently modified DNA exhibiting a number of unusual, long fixed distances revealed that NOE intensity error propagation, especially for the weak NOEs, needed to be accounted for in order to reproduce all the fixed distances (42). An alternative procedure was devised, entailing MARDIGRAS calculations repeated 30-100 times using NOE intensity sets randomly perturbed with user-selected noise and relative error limits. With this modified MARDIGRAS procedure (42), the average distance restraint width increased to 0.6-1.0 Å, reflecting the significantly wider bounds for weak NOEs (e.g. 3-5 Å instead of 1-2 Å). The compensation for the broader bounds is that distances up to 7 Å could be reliably determined (42).

Even using coupling constant-derived torsion angles as structural restraints is not as straightforward as it seemed in the early 1990s (see refs 3 and 7 for review).
With respect to ³JHH coupling constants, which are commonly employed to restrain the deoxyribose moieties, it appears that one cannot rely blindly on the modified Karplus equation (7,43), which arose from the empirical correlation of hundreds of measured coupling constants and torsion angles of small cyclic molecules. For macromolecules,
dipolar effects can influence the scalar coupling constants (44,45); the effects become significant at larger correlation times. On a practical level, coupling constants are extracted from COSY-type experiments, often involving extensive peak shape analysis via simulation (46-50). Typical errors range from ±0.3 Hz for the best-defined coupling constants, i.e. JH1'H2' and JH1'H2'', to ±1 Hz or more for JH3'H4'. Model calculations have shown that dipolar contributions are small for correlation times below 5 ns (45,51), with effects smaller than experimental errors. Furthermore, the sum of the vicinal coupling constants for a particular proton, e.g. ΣH1', is not affected by dipolar relaxation (50,52) because of compensatory effects for individual coupling constants, JH1'H2'' and JH1'H2'. Altona and coworkers (6) showed that ΣH1' is a very useful parameter for assessing whether or not a particular sugar moiety adopts a rigid conformation; even the relative occupation of S- and N-type conformations can be inferred from ΣH1'. Overall, it seems that deoxyribose coupling constants can be used safely for DNA structural work when the length does not exceed 12-14 base pairs and the temperature is above 15-20°C. Lane and coworkers (50) demonstrated that a reliable deoxyribose conformational analysis can be accomplished even for a 16-mer duplex if the temperature is chosen appropriately (e.g. 50°C, yielding a correlation time of 5 ns).

³¹P-¹H coupling constants can be used to define parts of the DNA backbone (3,8). However, limited by experimental accuracy and the relative insensitivity of this coupling constant to the dihedral angle, backbone torsion angle restraints are typically implemented in a rather qualitative fashion with restraint widths from 60 to 180°.
If the NOE restraints define the relative geometry between sequential nucleotides sufficiently, extant chemical force fields, e.g. AMBER4.1 (53) (see also Chapter 4), have proven very robust in defining the phosphodiester moiety. Thus, most DNA duplex structures have been determined without explicit backbone restraints, except for those defining the sugar geometry.
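As a hedged illustration of how ΣH1' can report on the relative occupation of S- and N-type sugar conformations, the sketch below assumes a simple two-state model in which the observed sum is the population-weighted average of two limiting values. The limiting sums of 9.8 Hz (pure N) and 15.7 Hz (pure S) are illustrative defaults assumed here, not values taken from this chapter; the function names are hypothetical. The ±0.3 Hz error per coupling constant is the figure quoted above for the best-defined couplings.

```python
def s_type_fraction(sum_h1, sum_n=9.8, sum_s=15.7):
    """Fraction of S-type sugar from ΣH1' (Hz) under a two-state N/S model.

    sum_n and sum_s are assumed limiting values for pure N and pure S puckers.
    """
    fraction = (sum_h1 - sum_n) / (sum_s - sum_n)
    return min(1.0, max(0.0, fraction))  # clamp to the physical range


def s_type_range(j_h1h2p, j_h1h2pp, error_hz=0.3, **limits):
    """Propagate a per-coupling error (Hz) into lower and upper S-type fractions."""
    total = j_h1h2p + j_h1h2pp
    spread = 2.0 * error_hz  # worst case: both couplings off in the same direction
    return (s_type_fraction(total - spread, **limits),
            s_type_fraction(total + spread, **limits))


if __name__ == "__main__":
    # Hypothetical measured couplings for one residue, in Hz.
    low, high = s_type_range(9.0, 5.5)
    print(f"S-type population between {low:.0%} and {high:.0%}")
```

Even with the best-defined couplings, the propagated error spans roughly 20 percentage points in this model, which is one reason such populations are usually quoted as ranges.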
2.2 Structure refinement procedure

To obtain high resolution DNA duplex structures, NMR-derived restraints (i.e. distances or NOE intensities, and torsion angles) are used to drive a conformational search tool. This conformational search should enable all conformations consistent with the experimental data to be defined. This leads to a conformational envelope defining the structure, a source of consternation for scientists accustomed to the single structure typically reported in X-ray studies. Most commonly employed is the rMD approach [e.g. AMBER (33), GROMOS (35), XPLOR (34)], where the match of a theoretical structure with the restraint target is translated into a penalty term in the empirical force field. Hence, the conformational search ideally leads to a structure that agrees with both the chemical force field and the restraints. Similar strategies have been implemented for restrained Monte Carlo (rMC) [e.g. DNAminiCARLO (54,55)] and restrained molecular mechanics [e.g. JUMNA (56)], which become particularly powerful when internal coordinates are substituted for the Cartesian coordinates typically used for rMD. While distance geometry has also been applied to DNA structure determination, it typically produces structures that need subsequent refinement against a chemical force field to obtain energetically feasible structures.
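The penalty term mentioned above can be sketched as a flat-bottomed harmonic well, which is zero while a model distance lies between its lower and upper bounds and grows quadratically outside them. This is a minimal sketch of one common functional form, not the exact expression used by AMBER, GROMOS, or XPLOR. The force constant, the weight, the proton labels, and the function names are all hypothetical.

```python
def restraint_penalty(distance, lower, upper, force_constant=10.0):
    """Flat-bottomed harmonic penalty: zero inside [lower, upper], quadratic outside."""
    if distance < lower:
        violation = lower - distance
    elif distance > upper:
        violation = distance - upper
    else:
        return 0.0
    return force_constant * violation ** 2


def restrained_energy(force_field_energy, distances, bounds, weight=1.0):
    """Add the weighted sum of restraint penalties to the force-field energy."""
    penalty = sum(
        restraint_penalty(distances[pair], lower, upper)
        for pair, (lower, upper) in bounds.items()
    )
    return force_field_energy + weight * penalty


# Hypothetical proton pairs with model distances and restraint bounds in Å.
model_distances = {("A5:H2", "T16:H1'"): 4.6, ("C3:H6", "C3:H1'"): 3.6}
restraint_bounds = {("A5:H2", "T16:H1'"): (3.8, 4.4), ("C3:H6", "C3:H1'"): (3.4, 3.8)}

if __name__ == "__main__":
    total = restrained_energy(-250.0, model_distances, restraint_bounds, weight=2.0)
    print(f"restrained energy: {total:.2f}")
```

The `weight` argument stands for the relative weighting of force field and NMR restraints; changing it shifts the balance between the two terms without altering either one.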
For all methods, successful refinement requires convergence, which means that essentially the same target structure can be reached from different starting models using a reasonable protocol (for a detailed discussion see ref. 5). The similarity between converged structures obtained from different starting structures (the atomic rms deviation should be below ≈1 Å for a short DNA duplex) expresses the precision of the whole refinement process. It is not to be confused with the accuracy of the structure (57). Since NMR-derived restraints do not completely define a DNA duplex structure, the conformational search results will always depend somewhat on the refinement protocol, especially the relative weighting of chemical force field and NMR restraints. However, it should be noted that the structure of one duplex refined with rMC agreed within 0.5 Å of atomic rms deviation with that refined with rMD (55).

The accuracy of a refined structure can be partially inferred from comparison of the theoretical spectral data (i.e. NOE intensities and coupling constants) with the experimental data, which is similar to the procedure in X-ray crystallography that produces R-factors. Similar figures of merit can be used for NMR-derived structures as well (26,29,58,59). However, such figures of merit for NMR-derived structures also depend on the number of observables, such that it is difficult, for example, to compare NMR R-factors of different structures. None the less, NMR R-factors have become valuable parameters to indicate the progress of a refinement process.

An improved representation of final conformational parameters involves the depiction of their relative error bars.
Such error bars can be inferred from analysing larger clusters of structures obtained from different refinement runs or even from longer rMD trajectories. The latter provides a simple way to discern whether small differences in helical parameters or backbone parameters for certain residues are, indeed, significant.
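The R-factor-like figures of merit discussed in this section can be sketched as follows. Two forms are shown as assumptions: a crystallographic-style residual on the intensities themselves, and a sixth-root variant that gives weak cross-peaks more weight. The exact definitions used in the studies cited here may differ, and the function names and intensity values are hypothetical.

```python
def r_factor(observed, calculated):
    """Crystallographic-style residual: sum|I_obs - I_calc| / sum(I_obs)."""
    numerator = sum(abs(o - c) for o, c in zip(observed, calculated))
    return numerator / sum(observed)


def sixth_root_r_factor(observed, calculated):
    """Same residual on I^(1/6), which scales roughly with inverse distance."""
    obs_root = [o ** (1.0 / 6.0) for o in observed]
    calc_root = [c ** (1.0 / 6.0) for c in calculated]
    return r_factor(obs_root, calc_root)


if __name__ == "__main__":
    # Hypothetical normalized cross-peak intensities: two strong, two weak.
    observed = [0.120, 0.095, 0.010, 0.006]
    calculated = [0.115, 0.100, 0.016, 0.003]
    print(f"R   = {r_factor(observed, calculated):.3f}")
    print(f"R^x = {sixth_root_r_factor(observed, calculated):.3f}")
```

Because both sums run over the observed peaks only, the value depends on how many and which peaks enter the calculation. That is consistent with the caution above about comparing R-factors between different structures.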
2.3 Limitations in NMR-derived high resolution structures

The issues discussed thus far are closely linked to the question of how well defined the structural parameters are in DNA solution structures published to date. Although this question cannot be considered solved, some clarification is emerging.

It is known that high precision distance restraints can reproduce fine structural features fairly well (60). When a DNA crystal structure was used to simulate 2D NOE cross-peak volumes, which were subsequently used as input data for a relaxation matrix/rMD protocol, the resulting structures exhibited almost identical structural parameters, including most helical parameters. However, this study underestimated errors, especially for small NOE intensities, and assumed no peak overlap.

We note that an earlier study (10) showed that the accuracy of a back-calculation-based refinement procedure to reproduce helical parameters was more limited than in the case above (60). It has been noted (61), however, that some DNA structures calculated with back-calculation methods exhibit extreme helical features, e.g. high helical twist. A statistical analysis of a number of high resolution duplex structures (61) revealed that one of the structures determined by back-calculation methods (62) was highly overwound, exhibiting large structural deviations from what could be considered the 'average B-DNA solution structure' (61). For the rest of the structures in the
analysis, the average twist angle for all relaxation matrix/rMD-refined structures (see below) is virtually identical with independent solution data (17,18).

A reassuring result from another study was that two independent structure determinations, one with rMD and the other with rMC, for a Pribnow box duplex carried out with the same relaxation matrix-derived distances (12 restraints per residue on average), converged to essentially the same structure (see below), despite use of different refinement tools and force fields (55,63).

With respect to the number of relaxation matrix-derived distance restraints necessary to generate structures reproducibly, calculations with a particularly well-defined structure (20 distance restraints per residue on average) (64) demonstrated that practically the same final structure could be obtained when up to 20% of the distance restraints had been randomly omitted from the protocol. A 'free R-factor' analysis also suggested that those structures exhibited the same degree of accuracy. Therefore, an average number of 12-15 restraints per residue should be considered adequate, if they are reasonably accurate and well distributed over the molecule. The latter is important, especially for the sparse cross-strand restraints. An earlier study of a self-complementary DNA duplex (13) revealed significant overall structural changes when the central, cross-strand restraint between two adenine H2 [A(H2)] protons was removed. This result exemplifies the way NMR-derived restraints enforce local structure to which the rest of the molecule must adjust. Therefore, structural features associated with such constraints must be interpreted with care.
These latter results also explain why certain sequences are better defined by NMR data than others: AT-rich sequences provide NOEs involving the A(H2) protons and the thymine methyl group, which are not available in GC pairs; sequences with runs of the same nucleotide or strict alternation, e.g. (AC)4(GT)4 (47,65), exhibit too much peak overlap and therefore provide a smaller number of restraints.

The most elusive question is how well the individual conformational parameters are defined through NMR restraints. Several studies have tried to answer this question by characterizing the distribution in the structural parameters when a series of refinement protocols produced a number of structures for a particular restraint data set (10,11,13,60). Other approaches included evaluation of changes in simulated NOE intensities when structural parameters were systematically scanned (12,14) or analysis of average helical parameters and their standard deviations obtained from rMD trajectories with dramatically different weights for the NMR restraints (66).

Together, these studies indicate that intraresidue parameters, i.e. sugar pucker and glycosidic torsion angle, are well defined by NMR restraints, especially when deoxyribose coupling constants are available. On the other hand, the dependence of the base pair parameters on the NMR restraints is not clear, because the derived base pair geometry is a structural compromise between the NMR-determined structure of each strand and ideal Watson–Crick base pairing. NMR structures of DNA are typically modelled with explicit Watson–Crick base pairs (5) connecting the strands, which, other than this, are only connected through a few A(H2) cross-strand restraints.
Since the balance between such holonomic restraints and the NMR restraints is defined arbitrarily, it is often difficult to interpret the actual values for parameters that describe deviations from the ideal flat Watson–Crick base pair. In general, larger weights for the NMR restraints drive the base pair parameters towards a larger deviation from 0,
the idealized B-DNA value. For example, values for opening of -5 to -15°, or values for shear and stretch of >0.5 Å, are significant distortions of the hydrogen bonding scheme and can be structural artefacts from overfitting (5,66). (See ref. 67 and Chapter 2 for definitions of helical parameters.)

Among the helical parameters, x-displacement and inclination [as calculated by the program 'Curves' (68)] describe the position of a base pair with respect to the global helix axis and most readily distinguish between A- and B-DNA. Interestingly, x-displacement and inclination show a strong correlation and a pronounced dependence upon the weight of the NMR restraints (66); when NMR restraint weights are increased, x-displacement decreases but absolute values for inclination increase. The result is a leaner, less hollow, double helix, which deviates significantly from canonical B-DNA. Unusual values for these two parameters should be interpreted with great caution, as they are also sensitive to overfitting.

For large NMR restraint weights, some reduction in the range derived for the step parameters slide and shift, and, to a smaller extent, twist, is noticeable. However, values for the step parameters exhibited a more pronounced NMR restraint weight dependence, which seems to be stronger for the central part of the Pribnow box octamer. Systematic changes in all steps were apparent for tilt and roll, whereas the most dramatic changes for individual steps could be seen for the parameters slide, twist, and roll. Thus, even relaxation matrix-based rMD refinement will render structures with limited precision, which can be translated into a conformational envelope spanning a considerable range of helical parameter values.
Nevertheless, sequence-specific patterns for the absolute values of the helical parameters seem to be independent of refinement protocol, even when individual errors in the helical parameters are quite large (66).

Qualitatively similar effects were seen in the refinement of the nonamer d(GCAAAAACG):d(CGTTTTTC) (69), where three different rMD protocols (two different charges for the phosphate groups under in vacuo conditions, -0.3 and -1.2, versus utilization of explicit solvent with neutralized phosphates) yielded reasonable structures with low energies and NOE R-factors but significant deviations for some of the helical parameters. Despite the fact that some of these discrepancies might be a result of extreme conditions with respect to treating phosphate charges in rMD simulations, most of the sequence-dependent patterns of the helical parameters are similar for the different structures. Some gross helical features are clearly affected by use of explicit water molecules (69). Besides reduced values for twist and propeller twist, the minor groove was more compressed when no explicit solvent was used. This is in marked contrast to an earlier study, where rMD trajectories of another AT-rich duplex (70) obtained with and without explicit solvent yielded a narrower minor groove with explicit solvent. Helical parameter values in this latter study, especially with explicit solvent, were similar to other structural studies in which 'a spine of hydration' in the narrowed minor groove of AT-rich sequences was postulated (71,72). Furthermore, fluctuations of the minor groove width were reduced dramatically for rMD trajectories with explicit solvent (66,70). Therefore, the minor groove width is not well defined by NMR data and absolute values are more reliable when explicit solvent is used.
All of the above considerations regarding the accuracy of structural parameters apply only in the case where a DNA molecule exists in only one conformation. In reality, all
molecules are more or less flexible, and this flexibility presents the largest systematic problem for defining high resolution solution structures. Since NOE-derived distance restraints are calculated from time-averaged data, which may be severely biased towards shorter distances, even relatively small fractions of minor conformations can distort NOE intensities. In more pronounced situations, conformational flexibility will be manifest in mutually exclusive NOE-derived distances (16,70) or coupling constants (7,43,73). Since we barely have enough NMR data to define a single structure unambiguously, generating a well-defined description of a flexible molecule is a formidable task. Nevertheless, despite the underdefined nature of the problem, some partial solutions have been offered. These will be discussed in the context of DNA:RNA hybrid structures (see below).
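The bias of time-averaged NOE data towards short distances can be illustrated numerically. The following Python sketch assumes the simplest possible picture, in which the NOE intensity scales with the population-weighted average of r^-6 over a set of discrete conformers, and in which spin diffusion and angular effects are ignored. The function name and the two-state example (distances and populations) are hypothetical and are not taken from any of the studies cited above.

```python
def apparent_noe_distance(distances, populations):
    """Apparent distance (same units as the input) under simple r^-6 averaging.

    distances   -- interproton distance in each conformer (hypothetical values)
    populations -- relative population of each conformer (need not sum to 1)
    """
    if len(distances) != len(populations):
        raise ValueError("distances and populations must have the same length")
    total = float(sum(populations))
    # Population-weighted average of r^-6, the quantity assumed to scale the NOE.
    mean_inverse_sixth = sum(
        (p / total) * r ** -6 for r, p in zip(distances, populations)
    )
    # Convert the averaged r^-6 back into a single apparent distance.
    return mean_inverse_sixth ** (-1.0 / 6.0)


# Illustrative two-state case: 90% at 4.5 Å, 10% at 2.5 Å.
apparent = apparent_noe_distance([4.5, 2.5], [0.9, 0.1])
linear_mean = 0.9 * 4.5 + 0.1 * 2.5
```

With these illustrative numbers, a 10% population of the short-distance conformer pulls the apparent distance from 4.5 Å down to roughly 3.5 Å, well below the linear average of 4.3 Å.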
3. DNA duplex structures

The number of published NMR structures of DNA is still quite modest compared with proteins. As this chapter's focus is on double helical DNA, Table 8.1 lists those high resolution structures where the DNA contains solely standard nucleotides with Watson–Crick base pairs and no modifications. For purposes of comparison, only 39 studies that reported a description of the structure determination process along with the discussion and depiction of the full structure are included in the table. A number of excellent structural studies that focus on certain structural details, e.g. sugar pucker determination, could not be included.

Along with DNA sequence, Table 8.1 also lists type and number of experimental restraints used and gives a synopsis of refinement methods employed. Methods for the generation of distance restraints are listed, i.e. isolated spin pair approximation (ISPA) and NOE build-up curves versus a hybrid relaxation matrix method, which is either integrated into the rMD refinement (int hyb, rMD) or is independent (hyb dd); the conformational search tool is also listed, including the name of the program, i.e. distance geometry (DG), restrained MD (rMD), restrained Monte Carlo (rMC), restrained energy minimization with internal helical parameters (rEMhp), and back-calculation methods (bkcalc). If methods other than bkcalc used NOE volumes as restraints, they are listed explicitly (vol ref).

In general, it seems that the vast majority of structures from the 1990s were generated such that spin diffusion has been accounted for, either when the distance restraints were generated, or in a final refinement step against NOE volumes.
The amount of experimental information used in the refinement differs enormously from structure to structure, ranging from just a few to 20 distance restraints per nucleotide. About half of the newer structures were derived with additional torsion angle restraints, most of which constrain only the deoxyribose moiety. To enable further exploration of the structures in Table 8.1, we also provide accession numbers of the coordinates deposited at the Protein Data Bank (PDB) (URL: http://pdb.bnl.gov) (100). However, disappointingly, only nine of the listed structures are actually available at PDB. [Note that DNA crystal structures are available at the Nucleic Acid Database as well, URL: http://ndbserver.rutgers.edu (101), see Chapter 3.]

Before taking a more detailed look at the structures in Table 8.1, we note that several extensive reviews are available covering the earlier years of this field (1,2,4,61).
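As a reminder of what the ISPA entries listed in Table 8.1 imply, the isolated spin-pair approximation converts a cross-peak intensity into a distance by scaling against a reference proton pair of known separation, using the r^-6 dependence of the initial NOE build-up. The sketch below is a minimal illustration and not the procedure of any particular program named in the table; the function name is hypothetical, and the cytosine H5-H6 reference distance of 2.46 Å is an assumed typical value.

```python
def ispa_distance(intensity, ref_intensity, ref_distance):
    """Distance estimate from the isolated spin-pair approximation.

    Assumes both cross-peaks were measured at the same short mixing time,
    so that intensity is proportional to r^-6 for each isolated pair.
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("cross-peak intensities must be positive")
    # r = r_ref * (I_ref / I)^(1/6)
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Assumed reference: cytosine H5-H6, taken here as 2.46 Å (illustrative value).
CYT_H5_H6 = 2.46

# A cross-peak 64 times weaker than the reference corresponds to twice the distance.
weak_peak_distance = ispa_distance(1.0, 64.0, CYT_H5_H6)
```

Because ISPA neglects spin diffusion, distances obtained this way are only approximate, which is why most of the later structures relied on relaxation matrix methods or on a final refinement against NOE volumes.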
Table 8.1. Standard duplex DNA structures solved using NMR data

| Entry | Sequence | Exp. restraints^a (NOE/tor/nuc) | Restraint generation and refinement methods^b (names of programs) | PDB accession number | Reference |
|---|---|---|---|---|---|
| 1 | 5'-AAGTGTGACAT / 3'-TTCACACTGTA | 150/N/7 | ISPA dd; rEM | na | 74 |
| 2 | 5'-CGTACG / 3'-GCATGC | 190/N/16 | ISPA dd; rEM | na | 75 |
| 3 | 5'-CGTACG / 3'-GCATGC | 192/N/16 | ISPA dd; rMD (CHARMM) | na | 76 |
| 4 | 5'-CTGGATCCAG / 3'-GACCTAGGTC | 160/N/8 | ISPA dd; rMD (CHARMM) | na | 77 |
| 5 | 5'-GCATGC / 3'-CGTACG | 158/N/13 | ISPA dd; rMD (CHARMM) | na | 78 |
| 6 | 5'-GTTTTAAAAC / 3'-CAAAATTTTG | 200/N/10 | vol ref | na | 79 |
| 7 | 5'-GAAAATTTTC / 3'-CTTTTAAAAG | 200/N/10 | vol ref | na | 80 |
| 8 | 5'-GCGTATGTTGCG / 3'-CGCATACAACGC | 212/N/9 | ISPA dd, DG; bkcalc | na | 81 |
| 9 | 5'-CGCGAATTCGCG / 3'-GCGCTTAAGCGC | 155/N/7 | ISPA dd, DG; bkcalc (DSPACE; BKCALC) | 171D | 38 |
| 10 | 5'-GCCTGATCAGGC / 3'-CGGACTAGTCCG | 162/N/7 | ISPA dd, DG; bkcalc (DSPACE; BKCALC) | na | 82 |
| 11 | 5'-TCTATCACCG / 3'-AGATAGTGGC | 330/Y/17 | ISPA dd, rMD; vol ref (GROMOS) | 1D20 | 83 |
| 12 | 5'-GTACGTAC / 3'-CATGCATG | 244/Y/15 | ISPA dd, rMD; vol ref (GROMOS) | 1D19 | 84 |
| 13 | 5'-CATGCATG / 3'-GTACGTAC | 208/Y/13 | ISPA dd, rMD; vol ref (GROMOS) | 1D18 | 84 |
| 14 | 5'-CGCCTAATCG / 3'-GCGGATTAGC | 293/N/15 | ISPA dd, DG; bkcalc (DSPACE; BKCALC) | na | 10 |
| 15 | 5'-CGTCACGCGC / 3'-GCAGTGCGCG | 290/N/15 | ISPA dd, DG; bkcalc (DSPACE; BKCALC) | na | 10 |
| 16 | 5'-CGCTTAAGCG / 3'-GCGAATTCGC | 138/N/7 | int hyb dd; rMD (MORASS; AMBER) | na | 85 |
| 17 | 5'-GGAAATTTCC / 3'-CCTTTAAAGG | 182/Y/9 | ISPA dd; rMD (AMBER) | na | 86 |
| 18 | 5'-ACACACAC / 3'-TGTGTGTG | 220/Y/14 | hyb dd; rMD (MARDIGRAS; AMBER) | na | 65 |
| 19 | 5'-ATATATAUAT / 3'-TATATATATA | 166/N/8 | hyb dd; rMD (MARDIGRAS; AMBER) | na | 87 |
| 20 | 5'-GTATATAC / 3'-CATATATG | 197/N/12 | hybrid: MARDIGRAS rMD | 1D42 | 13 |
| 21 | 5'-CCTTAAGG / 3'-GGAATTCC | 89/Y/6 | ISPA dd; rMD (AMBER) w/ Raman torsions | na | 88 |
| 22 | 5'-CGATCG / 3'-GCTAGC | 703v/N/na | vol ref; rMM (SPEDREF, XPLOR) | na | 40 |
| 23 | 5'-GTACTGCAGTAC / 3'-CATGACGTCATG | 73/Y/3 | ISPA dd; rHP (JUMNA) | na | 56 |
| 24 | 5'-CATGACGTCATG / 3'-GTACTGCAGTAC | 95/Y/4 | ISPA dd; rHP (JUMNA) | na | 56 |
| 25 | 5'-GCGTATACGC / 3'-CGCATATGCG | na/Y/na | ISPA dd, DG; bkcalc (DSPACE; BKCALC) | 1D68 | 89 |
| 26 | 5'-GCCGTTAACGGC / 3'-CGGCAATTGCCG | na/N/na | ISPA dd, DG; bkcalc (DSPACE; BKCALC) | 132D | 62 |
| 27 | 5'-GTATAATG / 3'-CATATTAC | 184/Y/12 | hyb dd; rMD (MARDIGRAS; AMBER) | 1D70 | 63 |
|  |  |  | MDtar (AMBER) | na | 70 |
|  |  |  | rMC (DNAminiCarlo) | na | 90 |
|  |  |  | prob assess (PDQPRO) | na | 61 |
| 28 | 5'-GGATCC / 3'-CCTAGG | 80/Y/3 | hyb dd; rHP | na | 91 |
| 29 | 5'-AGCTTGCCTTGAG / 3'-TCGAACGGAACTC | 267/Y/13 | hyb dd; rMD (MARDIGRAS; AMBER) | 142D | 92 |
| 30 | 5'-GAATTTAAATTC / 3'-CTTAAATTTAAG | 250/N/10 | ISPA dd, rMD; bkcalc (DISCOVER, AMBER; BKCALC) | na | 93 |
| 31 | 5'-GGTATACC / 3'-CCATATGG | 149/N/9 | int hyb dd; rMD (IRMA; GROMOS) | na | 94 |
| 32 | 5'-AATGGAATGGAATGG / 3'-GGTAAGGTAAGGTAA | 200/Y/9 | hyb dd; rMD (AMBER) | na | 95 |
| 33 | 5'-CATTTGCATC / 3'-GTAAACGTAG | 398/Y/20 | hyb dd; rMD (MARDIGRAS; AMBER) | na | 64 |
| 34 | 5'-ACCGTTAACGGT / 3'-TGGCAATTGCCA | 190/N/10 | ISPA dd, rMD; vol ref (XPLOR) | na | 96 |
| 35 | 5'-CGGACAAGAAG / 3'-GCCTGTTCTTC | 217/N/10 | hyb dd; rMD (MARDIGRAS; AMBER) | na | 97 |
| 36 | 5'-GCAAAAACG / 3'-CGTTTTTC | 189/Y/11 | hyb dd; rMD (MARDIGRAS; AMBER) | na | 69 |
| 37 | 5'-TTTCTCCTTTCT / 3'-AAAGAGGAAAGA | 149/N/6 | int hyb dd; rMD (IRMA; GROMOS) | na | 98 |
| 38 | 5'-CGCAAAAATGCG / 3'-GCGTTTTTACGC | na/Y/na | vol ref; rMD (CHARMM) | na | 36 |
| 39 | 5'-CGAGGTTTAAACCTCG / 3'-GCTCCAAATTTGGAGC | 460/Y/14 | ISPA dd, rMD; bkcalc (DISCOVER, AMBER; BKMAT) | na | 99 |

a Number of distance restraints; use of torsion angle restraints for sugars or backbone (yes = Y, no = N); number of distance restraints per residue.
b Abbreviations for methods used to generate distance restraints and type of refinement procedure: DG, distance geometry; rMD, restrained molecular dynamics; rEM, restrained energy minimization; dd, distance derivation; ISPA, isolated spin-pair approximation; rMM, restrained molecular mechanics; int, integrated; vol ref, NOE volume refinement; MDtar, MD with time-averaged restraints; bkcalc, back-calculation; hyb dd, hybrid matrix approach for distance derivation; rHP, restrained energy minimization with internal helical parameters.
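The last field of the 'Exp. restraints' column can be related to the first field and to the sequence length. The following Python sketch assumes that the per-residue figure is simply the number of distance restraints divided by the total number of nucleotides in both strands; the function name is hypothetical, and individual entries need not follow this convention exactly, since the counts are those reported by the original authors.

```python
def restraints_per_residue(n_restraints, strand_1, strand_2):
    """Average number of distance restraints per nucleotide of a duplex.

    Assumption: every nucleotide of both strands is counted once.
    """
    n_residues = len(strand_1) + len(strand_2)
    if n_residues == 0:
        raise ValueError("at least one nucleotide is required")
    return n_restraints / n_residues


# Decamer duplex refined with 398 distance restraints (about 20 per residue).
decamer = restraints_per_residue(398, "CATTTGCATC", "GTAAACGTAG")

# Pribnow box octamer with 184 distance restraints (about 12 per residue).
octamer = restraints_per_residue(184, "GTATAATG", "CATATTAC")
```

Both values fall within or above the range of 12–15 restraints per residue suggested in Section 2.3 as adequate, provided the restraints are accurate and well distributed.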
3.1 The average DNA structure in solution as seen by NMR

The least surprising, but not at all trivial, result from Table 8.1 is that all DNA structures are reported to be in the B family. To date, no structural studies of unmodified DNA using near physiological solution conditions have found any other duplex topology. Note the many studies of the transition from the B- to the left-handed Z-form of DNA, which might be involved in some regulation of DNA transcription in vivo (102). In general, alternating purine-pyrimidine (RY) sequences, in particular GC-rich sequences, especially with specific methylations (103), could adopt a Z-form geometry. Although some solution studies suggested that even an alternating CG hexamer could undergo the B to Z transition (104), high resolution NMR studies (105) could not corroborate this for d(CGCGCGTATACGCGCG)2. In general it seems that fairly drastic, non-physiological conditions, e.g. ethanol, high concentrations of divalent cations, or perchlorate (106–108), are necessary to drive DNA towards a left-handed form.

Before considering detailed differences between the structures in Table 8.1, it is instructive to picture the overall similarities of the average B-form in solution deduced by NMR methods. Such a picture emerged from a quantitative analysis (16) of the helical parameters of nine of the structures listed in Table 8.1 (entries 11, 12, 13, 20, 25, 26, 27, 29, 33). Note that a meaningful analysis of a pool of structures requires re-examining them with the same tool for derivation of helical parameters. The analysis was also limited by the availability of coordinates for published structures. Nevertheless, some interesting features could be extracted, even for specific traits of certain nucleotide steps. Table 8.2 lists results for a number of helical parameters in juxtaposition to average solid-state data derived from crystal structures.

One of the most important results of this statistical analysis (16) was that the average local helical twist (35.3°) is in good agreement with the independent values (17,18) for the solution state (34.0–34.3°). The match is nearly perfect when a highly overwound structure derived with a particular back-calculation method (entry 26) is dropped from the average. The large twist values of that structure might be a result of refinement artefacts, since the back-calculation method did not use a full empirical force field. Several distinctions can be made between B-DNA in solution and in the solid state. First, the parameter 'roll' is positive in solution and slightly negative in the crystalline state. Positive roll, also encountered in A-form structures, is correlated with minor groove widening. Secondly, the average value for slide is slightly negative for the NMR structures, while it is positive for B-form crystal structures. Both of these features show that the NMR average tends slightly towards the A-form.

Table 8.2 also indicates increased bending anisotropy of the double helix, as would be expected in the absence of crystal packing forces. Both the mean value and standard deviation for roll of B-DNA are bigger in solution than for the X-ray average. The tilt averages to zero regardless of state, but the standard deviation is clearly smaller for the NMR average. Rolling towards the minor groove is easier to accomplish in solution than tilting, which entails unfavourable compression of one strand and stretching of the other. A similar rationale can explain the reduced values for twist and the associated standard deviation.
For the crystalline state, differential helical twist is an efficient means to reduce steric clashes, whereas increased roll can more efficiently reduce clashes in solution.

Table 8.2. Structural parameters and standard deviations for average DNA from solution and solid state data and individual nucleotide steps

| Source^a (number of values in dataset) | Twist^b (°) | Tilt (°) | Roll (°) | Shift (Å) | Slide (Å) | Rise (Å) | Propeller twist (°) |
|---|---|---|---|---|---|---|---|
| B-DNA^c | 36.1 (5.6) | 0.0 (3.6) | -0.2 (4.2) | 0.00 (0.55) | 0.21 (0.75) | 3.35 (0.24) | -13.8 (6.6) |
| A-DNA^d | 30.8 (4.8) | 0.0 (3.3) | 7.9 (5.6) | 0.00 (0.52) | -1.57 (0.38) | 3.32 (0.31) | -9.8 (5.6) |
| NMR^e (42) | 35.3 (4.2) | -0.1 (2.5) | 4.6 (7.0) | 0.00 (0.28) | -0.34 (0.46) | 3.15 (0.21) | -10.8 (7.8) |
| RR:YY^e (12) | 34.8 (2.8) | 0.0 (2.7) | 3.5 (3.9) | 0.00 (0.29) | -0.41 (0.39) | 3.16 (0.16) |  |
| RY:RY^e (15) | 37.3 (3.6) | -0.2 (2.0) | 1.4 (6.1) | 0.01 (0.28) | -0.29 (0.30) | 3.20 (0.23) |  |
| YR:YR^e (15) | 34.2 (5.4) | 0.0 (2.0) | 9.8 (7.7) | -0.01 (0.24) | -0.24 (0.59) | 3.10 (0.20) |  |

a Data are cited from ref. 61; each cell gives the mean value, followed in parentheses by the standard deviation in each individual set of data.
b Local helical parameters reported here conform to the Cambridge convention (67) and were calculated with an algorithm by Zhurkin and coworkers (109).
c Data of the survey of high resolution X-ray structures in the B-form (110).
d Data of the survey of high resolution X-ray structures in the A-form (111).
e Data of the survey of nine high resolution NMR structures (see text for description of dataset).

Another feature of the solution B-form is the pronounced non-planarity of individual base pairs. Mean values for the parameters propeller twist, buckle, and stagger deviate from solid-state values (data not shown). An interesting feature of the NMR average is the slightly compressed helix rise, for which no other systematic explanation can be offered beyond the idea that non-flat base pairs in conjunction with appropriate step parameters might stack in a more compressed way. Taken together, the rise and slide values suggest that DNA in solution is slightly shorter and fatter than anticipated from canonical B-DNA.
3.2 Sequence-dependent structural variation

Sequence-dependent structural features are difficult to determine, since for both solid and solution states the structure determination methods are not devoid of artefacts; e.g. the role of crystal packing forces in crystal structures and of the empirical force fields of the refinement tools in both methods should not be underestimated. But beyond these limitations, it is not clear what unit of a double helix uniquely defines the structural equivalent of the DNA sequence code. Although there are pronounced differences
between the 10 unique complementary dinucleotide segments and the three groups of purine-pyrimidine sequences, a host of crystal structure studies have suggested that at least the adjacent steps exert a distinct structural influence, which would yield 136 structurally unique tetramers (9). Analysis of this problem with experimental structures is not possible in the foreseeable future, but we can focus on unique features apparent in dinucleotide steps.

Earlier NMR studies (65,87) tried to elucidate such properties by studying strictly alternating sequences, assuming that the dinucleotide step features do not get swamped by the effects of the flanking base pairs. For example, in the case of d(ATATATAUAT)2 (87) alternating patterns for some helical parameters were found. These and subsequent studies also made clear that ultimate and penultimate base pairs are affected by their terminal position and hardly bear sequence-dependent features.

More definite insights came from the statistical analysis of non-terminal base pairs cited above (61), where the 10 unique dinucleotide steps were analysed utilizing 2–7 data points for each step. Despite this paucity of statistical data, a few themes emerged, especially for the three A–T steps that were the best defined. Instead of listing helical parameters for the latter, Fig. 8.1 depicts the average structures of the dyads computed from the average helical parameters (61). The parameters roll and slide constitute the biggest differences between the three sequences. For both parameters, maximum values are seen for TpA, and a minimum for ApT, with ApA assuming an intermediate value. The large slide value for TpA results in poor stacking, which can be seen easily in Fig. 8.1. This trend can be rationalized by the so-called Calladine
Fig. 8.1. Stereoviews of average conformations for the three unique A–T base steps according to the local, average helical parameters, as described in ref. 61. (The top base pair is bold; circles mark glycosidic bonds, with filled circles belonging to the top and open circles to the bottom base pair.)
rules (112), according to which purines on opposite strands relieve steric clashes by moving away from each other. The TpA step exhibits the only positive slide value among the 10 analysed dyads, while ApT shows the only negative roll value. These trends are also confirmed by independent modelling studies (113,114) and hold qualitatively for a number of B-DNA crystal structures.

TpA has the largest value for helical twist, followed by ApA and ApT. Although the absolute differences are fairly small (about 1–2°) and significant sequence heterogeneity is found for this parameter, the tendency agrees well with conformational calculations and various crystal data. Some sequences (62,89) showing huge differences between TpA and ApT steps (>10°) have been reported. Considering all results for alternating AT sequences, it becomes clear that the parameters interact to minimize base stacking for TpA and maximize stacking for ApT steps, which is in accord with the so-called 'alternating B-form' model (115,116). Most of these observations also apply to broader classes of sequences of purines and pyrimidines, with larger roll and slide values for YpR compared with RpY steps (see Table 8.2). However, the trend for twist is reversed from alternating AT, with larger values for RpY than for YpR and RpR steps. The YpR step is more compressed on average than the other two and also exhibits the largest value for the parameter cup. (For a definition of this non-standard helical parameter, see Chapters 2 and 6, and ref. 117.) These parameters lead to a unique situation for the TpG:CpA dyad, where especially the large positive roll, associated with a compressed major groove, causes a fairly localized bend. This effect is obvious for d(CATTTGCATC):d(GTAAACGTAG) depicted in Fig. 8.2 and to a lesser extent also for a tridecamer, entry 29 in Table 8.1.

Fedoroff et al. (118) took a different approach to elucidate sequence-dependent effects in the groups of sequences of purines and pyrimidines, by extracting various sequential interproton distances via back-calculation methods. Although the dependence between specific distances and the overall helical parameters is fairly complex,
Fig. 8.2. Stereoview of d(CATTTGCATC):d(GTAAACGTAG) (Table 8.1, entry 33) with a superposition of the global helix axis calculated with the program 'Curves' (68). (The 5'-end of the first strand is at the bottom in the back.) The arrow points at the centre of the TpG step, which is the centre of a bend.
the sequential H1'–H8/6 and H2'–H8/6 distances seem to follow a pattern. H1'–H8/6 distances are shorter (<3.5 Å) for RpY and RpR, and longer (>3.5 Å) for YpR and YpY steps, while shorter H2'–H8/6 distances (<2.7 Å) ensue for YpY and RpY steps and larger distances (>2.7 Å) for YpR and RpR steps. Distances involving A(H2) protons have been recognized as a good structure indicator (13,119). The cross-strand A(H2)(n)–H1'(m+1) distances, in particular, appear to depend on sequence: for CpA, TpG, and TpA, the distance is larger than 4.5 Å; in GA steps, it varies between 3.8 and 4.5 Å; and in AA steps it is between 3.7 and 4.2 Å. However, for other sequences, both the cross-strand and the sequential A(H2)(n)–H1'(m+1) distances appear to be affected by factors other than the dyad identity, and the details of these complicating factors are not yet entirely understood (119). Nevertheless, decreased A(H2)(n)–H1'(m+1) distances are usually indicative of a narrowed minor groove, found in most AT-rich NMR structures.

In general, it is clear that many more NMR structures will be needed to confirm and extend the results described in this section.
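The counts of 10 unique dinucleotide steps and 136 unique tetramers quoted at the beginning of this section follow from treating a duplex segment and its reverse complement as the same object. The Python sketch below enumerates them by brute force and also sorts the dinucleotide steps into the three purine-pyrimidine classes used in Table 8.2; the function names are hypothetical and serve only as an illustration of the counting argument.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def unique_duplex_segments(length):
    """All duplex segments of a given length, one representative per strand pair."""
    unique = set()
    for bases in product("ACGT", repeat=length):
        seq = "".join(bases)
        # A segment and its reverse complement describe the same duplex;
        # keep the alphabetically smaller of the two as the representative.
        unique.add(min(seq, reverse_complement(seq)))
    return sorted(unique)


def step_class(step):
    """Assign a dinucleotide step to RR:YY, RY:RY, or YR:YR (R = purine, Y = pyrimidine)."""
    pattern = "".join("R" if base in "AG" else "Y" for base in step)
    return {"RR": "RR:YY", "YY": "RR:YY", "RY": "RY:RY", "YR": "YR:YR"}[pattern]


steps = unique_duplex_segments(2)       # 10 unique dinucleotide steps
tetramers = unique_duplex_segments(4)   # 136 unique tetramers
class_counts = Counter(step_class(step) for step in steps)
```

The enumeration gives four RR:YY steps and three each of the RY:RY and YR:YR types, which makes plain how few independent data points per step are available from a pool of nine structures.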
3.3 Bending in NMR structures of DNA

It has been recognized for two decades now that DNA can be bent. Much research has been dedicated to describing this phenomenon structurally, to finding physical explanations for the underlying mechanics, and to understanding the biological consequences. A detailed introduction of all of the issues is beyond the scope of this section, since Chapter 14 deals with this topic. Also, a review summing up 20 years of research into DNA bending can be found in ref. 120. However, since a large number of NMR studies were geared towards understanding DNA bending, we will summarize some interesting NMR-derived results.

In general, it seems clear that DNA bending, which can span several residues or be more localized, is a special case of sequence-dependent DNA features, especially those that involve the mechanics of the intrinsic flexibility of the double helix. Several models have been proposed for sequence-dependent bending, and no general agreement has been achieved. The so-called 'static wedge models' (121,122) assume that the origin of curvature lies solely in the localized properties of the particular dinucleotide steps, where a propensity for certain roll, tilt, and twist parameters is thought to govern bending. An example that would be in good agreement with this notion has already been discussed in the section above, where a particular combination of helical parameters (tilt, twist, and, foremost, roll) for the TpG:CpA dyad gives rise to a distinct bend in several structures.

However, a host of other experimental data, including gel mobility studies, X-ray data, nucleosome positioning data, and energy calculations, gave the motivation for several 'context-dependent' models (120), which postulate cooperative interaction and distant neighbour conformations in the duplex.
These latter models have evolved around the intrinsic curvature of longer runs of adenines. The 'A-tract' model (123) postulates a negative roll for (A)n:(T)n segments, which causes bending into the minor groove. The effect of the roll motion is assumed to be more pronounced for ApA steps in the (A)n:(T)n environment compared with other surrounding sequences. In contrast, the alternative 'non-A-tract' model assumes that ApA steps
have zero roll, as seen in known B-DNA crystal structures, but postulates that other steps, on average, exhibit a positive roll value (122,124). The crucial difference between the latter models seems to lie in the precise roll values for the ApA step in (A)n:(T)n segments and mixed sequences. It has also been pointed out that when addressing these issues attention must be paid to the general conditions, e.g. concentration of ions and other materials. Definitive answers are hardly available from macroscopic techniques such as gel mobility studies, and much hope has been invested in NMR solution structures to give a more precise and more accurate picture.

However, we must bear in mind the intrinsic limitations of NMR-derived structures (see above). The combination of the underdetermined nature of the experimental NMR data and the lack of experimental long distance restraints makes it very difficult to derive a smooth curvature for a DNA NMR structure. Several studies have demonstrated that sets of NMR data can simultaneously be fitted by a curved or straight structure (14). However, the result could be dependent on sequence and on number and distribution of experimental restraints. It is important to bear in mind that some of the global helical features depend also on the non-NMR restraining information. It has been pointed out in several studies (see above) that the treatment of electrostatic interactions in particular, i.e. dealing with phosphate charges or solvent models, can have a significant influence on the structural results (5,69,125); this is pronounced when the number of restraints is low, as one would find in badly overlapped spectra from A-tract sequences.
Thus, it appears that curvature and some of its corollaries, like a certain major or minor groove width, can be an artefact of the force field used in the refinement process, and not necessarily implied by the NOE data. For example, Chuprina et al. (93) studied d(GAATTTAAATTC)2 (Table 8.1, entry 30) and suggested, on the basis of energetics, that A-tract NOE-derived distance restraints alone were not sufficient to distinguish between structures with a narrowed minor groove arising from a large propeller twist with a small inclination or from a small propeller twist with a large negative inclination.
A consistent picture of A-tract bending cannot be drawn from the 16 (A)n-rich structures listed in Table 8.1. Several refined NMR structures appear to be bent (Table 8.1: entries 7, 17, 30, and 38), but straight structures can also be found in Table 8.1 (entries 6, 33, 34, 37, and 39). Most of the A-tract sequence structures are reported to be in the B family but with a narrower minor groove (36,62,64,69,86,96,119). A larger propeller twist value than standard B-DNA seems to be another feature of A-tracts, regardless of the presence of bending (46,63,64,79,86,96,99).
A bent structure was observed for d(CGCAAAAATGCG):d(GCGTTTTTACGC) (entry 38) (36). In this case, a detailed comparison of two crystal structures obtained in the same lattice (126), a well-defined NMR structure, and extensive free molecular dynamics calculations with explicit solvent was presented. On the basis of detailed correlation analysis of the parameters roll and tilt, both crystal structures exhibited fairly straight A-tract geometries, with bends at the junctions with the flanking sequences towards minor or major groove, respectively.
The NMR structure, however, exhibited a slight, but nevertheless concerted, bend towards the major groove for the entire A-tract on top of a more pronounced bend at one of the junctions. The ambiguity of the bending direction at the crystal structures' junctions was reflected by the MD simulations, which showed an overall bend towards the major groove, besides
considerable oscillations between major and minor groove directions. Interestingly, the NMR structure was clearly more similar to one of the two forms observed in the crystal lattice. This form also exhibited a characteristic narrowing of the minor groove similar to the NMR structure. The values for propeller twist were very similar for the A-tracts of the NMR and both the crystal structures. The free MD simulations revealed extensive buckle and propeller twist dynamics, slightly larger for the A-tract than for flanking sequences. Furthermore, enhanced backbone dynamics involving torsion angles α and γ are seen within the A-tract and the junctions, which on a qualitative level rationalizes small but distinct differences in sugar pucker at the end of the A-tract indicated by interproton coupling constants. The sugar repuckering, as seen in the MD trajectories, coincides with local bending.
A detailed sugar pucker analysis (46) for [d(A)5(T)5]2 also found unique structural features at the end of the A-tract, suggesting some distortion of the regular B-form geometries. A sharp drop in the pseudo-rotation angle of the sugar moieties was observed when going from the last two As (150° < P < 180°) to the first two Ts (100° < P < 130°) at the junction. It was also noted that intraresidue H1'–H4' NOEs were much stronger for the T junction compared with all other As, which is consistent with an even lower pseudo-rotation angle (60° < P < 120°). A number of NOEs for the A at the junction were reported as being different from the other As. None of these sharp differences were observed for d(GCAAAAACG):d(CGTTTTTGC) (Table 8.1, entry 36), where all sugars were predominantly C2'-endo, typical of B-form DNA (69).
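The pseudo-rotation angle P quoted above is conventionally obtained from the five endocyclic sugar torsion angles. The following is a minimal sketch, assuming the standard Altona-Sundaralingam relation tan P = [(ν4 + ν1) − (ν3 + ν0)] / [2 ν2 (sin 36° + sin 72°)] and the idealized form νj = νmax cos(P + 144°(j − 2)). The function names and the example amplitude of 38° are illustrative assumptions and are not taken from the studies cited.

```python
import math

# sin 36 deg + sin 72 deg, the constant in the pseudorotation formula
_S = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))

def torsions_from_pucker(p_deg, nu_max_deg):
    """Idealized endocyclic torsions nu0..nu4 (degrees) for a given P and amplitude."""
    return [
        nu_max_deg * math.cos(math.radians(p_deg + 144.0 * (j - 2)))
        for j in range(5)
    ]

def pseudorotation(torsions):
    """Return (P, nu_max) in degrees from the torsions nu0..nu4 (degrees)."""
    nu0, nu1, nu2, nu3, nu4 = torsions
    num = (nu4 + nu1) - (nu3 + nu0)   # proportional to sin P
    den = 2.0 * nu2 * _S              # proportional to cos P
    p = math.degrees(math.atan2(num, den)) % 360.0
    # Amplitude from both components, which stays stable when nu2 is near zero
    nu_max = math.hypot(num, den) / (2.0 * _S)
    return p, nu_max

# Hypothetical C2'-endo-like sugar: P = 162 deg, amplitude 38 deg
p, amp = pseudorotation(torsions_from_pucker(162.0, 38.0))
print(f"P = {p:.1f} deg, nu_max = {amp:.1f} deg")
```

Using `atan2` rather than a plain arctangent places P in the correct quadrant without a separate correction for negative ν2.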
Discontinuities in helical parameters and apparent kinks have also been detected in a number of other structures with ApT junctions: d(CGCGAATTCGCG)2 (Table 8.1, entry 9) appears to be kinked at the ApT step (38) and d(CCTAAATTTGCC):d(GGCAAATTTAGG) appears to be distorted at the ApT and TpA steps (127).
Gel electrophoresis (128) and crystallization studies (129) suggested a difference between the curvature of An:Tn tracts and Tn:An tracts (130). Several Tn:An NMR studies are available involving a Trp promoter sequence (131) and endonuclease cleavage sites (62,132). For the latter, high negative propeller twist, large rise, negative buckle, and large opening values were found for the TpA step. In the sequence d(GTTTTAAAAC)2 (Table 8.1, entry 6), there is no finite distinguishable discontinuity at the TpA junction (79), whereas d(GAAAATTTTC)2 (Table 8.1, entry 7) was reported to be bent by approximately 10° with a discontinuity at the ApT junction (80). d(CCTTAAGG)2 (Table 8.1, entry 21) was found, using NMR and Raman spectroscopy, to bend towards the major groove at the TpA step. It was suggested that this TpA junction bending is the result of a hydrophobic interaction between the methyl groups of the thymines (88). Another duplex, d(GAATTTAAATTC)2 (Table 8.1, entry 30) (93), was found to be bent locally into the major groove at the TpA step.
The role of water in DNA bending has been studied recently by NMR. Large NOEs were observed between A(H2) of d(GTGGAATTCCAC)2 and hydration water (133), consistent with the presence of a 'spine of hydration' in the minor groove. In contrast, no such NOEs were detected in the d(TTAA)2 segment of d(GTGGTTAACCAC)2, indicating no tightly bound water molecules.
These results can be correlated with the larger width of the minor groove in d(TTAA)2 segments relative to d(AATT)2 segments. The spine of hydration in the minor groove of d(CGCGAATTCGCG)2 was found to be particularly stable, emphasizing the potential structural significance of bound water (133-135) (see also Chapter 9).
3.4 Conformational flexibility in DNA duplex structures
NOEs, coupling constants, line widths, and relaxation parameters are dependent upon conformational fluctuations of the molecule in solution, which may entail time-scales ranging from subnanosecond processes for small amplitude vibrations to large-scale millisecond conformational transitions. A great deal of research has been dedicated to unravelling the above issues, as many reviews have noted (20,136,137). Here, we will focus on some practical structural manifestations of conformational flexibility related to NMR-derived structures in general.
In general, the fastest, subnanosecond motions are of relatively low amplitude such that cross-relaxation rates and associated NOE intensities are not very different from those for a rigid body (137), where overall molecular tumbling governs the relaxation processes. Some NMR relaxation studies have focused on local differences in DNA flexibility indicated by measuring 1H-1H, 13C-1H, or 12C-2H relaxation parameters (137-139). The results of those and other studies are not completely consistent. For example, a proton relaxation study reported the same correlation time for the deoxyribose proton H1' and the base protons H6 and H8, based on proton spin-lattice relaxation time values (T1) (140). Some other studies (137,139) also found no significant differences between the proton relaxation behaviour of bases and deoxyriboses, implying little or no internal motion on a nanosecond time-scale. However, a more recent natural abundance 13C relaxation study (138) with DNA hexamers and octamers found small, but significant, differences between base and sugar moieties. For non-terminal residues, the 'order parameters' were around 0.8 for protonated base carbons and 0.6 for sugar carbons.
[These 'order parameters' describe the relative mobility with values from 0 to 1, going from absolute disorder to a rigid body (141).] The lowest order parameters were observed for terminal residues, with values as low as 0.2–0.3 for the HO-13C5'/3' positions. In general terms, the anisotropy of molecular motion in short DNA oligonucleotides does not seem to have a significant effect on the determination of accurate distance restraints and the ensuing high resolution structures (137). Nevertheless, for NMR structures that involved complete relaxation matrix methods, it has been shown that the match of calculated NOEs with experimental data can be improved by appropriate treatment of the aforementioned fast internal dynamics and anisotropic rotations (142,143) without invoking significant structural changes. Therefore, larger DNA fragments might require a different treatment of the relaxation as the molecular tumbling becomes more anisotropic owing to the increasingly rod-like shape.
Slower molecular motions lead to averaging of coupling constants and NOEs. For example, averaging is observed for vicinal coupling constants when the rate of torsional fluctuations exceeds roughly 10² s⁻¹. The time-scale of fast exchange, leading to averaging on the chemical shift scale, which also involves averaging of NOEs, depends on the actual chemical shift differences of the conformers involved, which, on the other hand, are expected to be no more than one ppm in DNA (137). It is clear that, with interchanging conformations, the averaging might lead to a 'virtual' NMR structure of limited value. Evidence of such averaging has been reported by several groups (6,12,48) for the sugar moieties in nucleic acid structures.
Conformational exchange processes in an intermediate realm might be manifest in unusual line widths of specific resonances. Such a situation was found for many sequences with TpA steps, which seems to be a unique example where flexibility might be a sequence-dependent feature.
3.4.1 Specific flexibility of the TpA step
Beyond the structural differences between ApT and TpA steps discussed in Section 3.1, the TpA step exhibits a unique, enhanced line broadening for the adenine base protons, especially for the usually very sharp A(H2) resonance (62,63,131,132). The line width of A(H2) is dependent upon the magnetic field strength and the temperature: the line width increases with temperature to a maximum, after which it shows the usual narrowing upon further heating. This behaviour was elucidated by T1ρ measurements (63), which revealed that the adenine base of the TpA step is involved in a relatively slow conformational exchange process on the submillisecond time-scale: 10⁻⁴ s (63), 5 × 10⁻⁵ s (144), and 10⁻⁶ to 10⁻² s (132). The A(H2) chemical shift also shows a clear temperature dependence, typical for fast exchange. Kennedy et al. (132) concluded that the effects are most likely owing to enhanced mobility of the adenine base plane with an amplitude range of 20–50°. Since large chemical shift differences between the rapidly exchanging conformations are required for rationalizing the experimental data, it seems logical that fluctuations in the ring current contributions to the chemical shifts are the main reason for the observed effects. In this vein, Kennedy et al.
made a convincing case for the TpA junction in [d(CGAGGTTTAAACCTCG)]2, where A(H2) at the TpA junction is indeed positioned closely enough beneath the aromatic plane of the following A.
Additional evidence for unique flexibility involving adenine bases came from a high resolution NMR structure of d(GTATAATG):d(CATATTAC) (Table 8.1, entry 27). Unusually short inter-residue distance restraints were observed between the two H8 protons and between the two H2 protons of the central, stacked adenines. This was rationalized by movement of the two bases relative to one another such that a short H8–H8 distance exists in one conformation and a short H2–H2 distance in the other conformation. As measured NOEs strongly reflect the shorter distance with motional averaging (see above), the measured H8–H8 and H2–H2 distances could not be satisfied by a single structure (63); this was part of the motivation for a more flexible structure refinement (see below). Note that the unique flexibility is very much a property of the TpA step and does not seem to require specific flanking sequences (145). This idea is also supported by the fact that N6-methylation of the junction adenine removes the line broadening effects completely (99).
3.4.2 Accounting for sugar flexibility in the refinement process
A known limitation of methods using NMR-derived distances and torsion angles is that they produce static models of DNA even though the structure may be dynamic. Dynamic averaging of distances and torsion angles yields single values of the restraints that are used to derive the static structure. However, the dynamic averaging is non-linear, so the precise values of the restraints reflect the structure and population of
each of the interchanging conformations. A particular situation exists for deoxyribose rings where both NOE restraints and vicinal coupling constants can be used to describe the ring conformation independently. The interproton distances that are most sensitive to sugar conformation are the H8/6–H2' and H8/6–H3' distances, which, in conjunction with the intrasugar H1'–H4' and H2"–H4' distances, usually lead to a reasonably well-defined sugar pucker. An even more precise description can be obtained from intradeoxyribose vicinal coupling constants, which relate to sugar pucker via the modified Karplus equation, parameterized according to Altona and coworkers (6,7,146).
It was noted early on that some experimentally determined sets of coupling constants were not compatible with one rigid sugar conformation in DNA duplexes (6,43,46). This led to the simplest, non-rigid model, a quickly interconverting mixture of the two energetically most favourable and crystallographically most frequently observed conformations, C2'-endo (S) and C3'-endo (N). No two-state mixture can be defined unambiguously by four or five experimental coupling constants, but with a couple of reasonable assumptions the so-called S/N mixture approximates a dynamic sugar ring. Significant effort has been devoted to extracting accurate coupling constants using various simulation procedures. Different approaches have been undertaken, where the fitting procedure was either manual (46–48) or iterative (147,148). Such methods typically yield relatively precise values for JH1'H2" and JH1'H2', but even the more elusive JH3'H2" and JH3'H2' can be extracted with error bounds of up to ±1 Hz.
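The two kinds of averaging described above behave differently in a fast-exchanging S/N mixture: NOE-derived distances average as r⁻⁶, whereas coupling constants average linearly with population. The sketch below illustrates this under stated assumptions. The limiting distances (4.2 Å for S, 3.0 Å for N) and limiting couplings (10 Hz for S, 1 Hz for N) are hypothetical round numbers chosen for illustration, not values from the cited studies, and the function names are likewise invented here.

```python
def noe_averaged_distance(distances, populations):
    """Effective distance (same units as input) from r^-6 averaging over conformers."""
    mean_r6 = sum(p * r ** -6 for r, p in zip(distances, populations))
    return mean_r6 ** (-1.0 / 6.0)

def averaged_coupling(j_s, j_n, frac_n):
    """Population-weighted (linear) average of a vicinal coupling constant in Hz."""
    return frac_n * j_n + (1.0 - frac_n) * j_s

def n_fraction_from_coupling(j_obs, j_s, j_n):
    """Invert the linear two-state average to estimate the N-conformer fraction."""
    return (j_s - j_obs) / (j_s - j_n)

# Hypothetical limiting values (assumptions for illustration only)
R_S, R_N = 4.2, 3.0    # base-to-H3' distance in S and N puckers, in angstrom
J_S, J_N = 10.0, 1.0   # a pucker-sensitive coupling in S and N puckers, in Hz
frac_n = 0.2           # 20% minor N conformer

r_eff = noe_averaged_distance([R_S, R_N], [1.0 - frac_n, frac_n])
r_lin = (1.0 - frac_n) * R_S + frac_n * R_N
j_obs = averaged_coupling(J_S, J_N, frac_n)

print(f"r^-6 averaged distance: {r_eff:.2f} A (linear mean {r_lin:.2f} A)")
print(f"averaged coupling: {j_obs:.1f} Hz, "
      f"recovered N fraction: {n_fraction_from_coupling(j_obs, J_S, J_N):.2f}")
```

With these inputs the r⁻⁶ average falls well below the linear mean, which is why a minor conformer with a short contact can dominate the apparent distance while shifting the coupling constants only in proportion to its population.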
Most studies of pucker entailing three or more coupling constants per deoxyribose have found that the coupling constants cannot be fitted simultaneously with one rigid conformation. Instead, most researchers report results (see ref. 5 and references therein) in the form of a two-state model where the populations of the two conformers and the pseudo-rotation angle of the major conformer, so far always within the S range, are varied to fit the data. Most DNA duplex studies to date indicate a small percentage of an N-type conformer (0–30% for non-terminal residues). Occasional repuckering has been observed in virtually all of the reliable free MD simulations that approach the nanosecond time-scale (149,150) (see also Chapter 4). From such simulations and other theoretical work (151), it is clear that the S range is energetically very shallow and another local energy minimum is found near the O4'-endo conformation. This shows how oversimplified the classical two-state interpretation is. On the other hand, a small amount of sugar repuckering does not seem to cause big problems in deriving an average NMR structure.
Several studies have assessed the structural implications of sugar repuckering in a DNA duplex containing the Pribnow box (Table 8.1, entry 27) by adopting flexible refinement schemes. First, the rigid, average structure of the Pribnow box octamer was determined by conventional rMD and by rMC methods with virtually the same result (see Fig. 8.3a). It was noted that the measured H8/6–H3' distances for most nucleotides were too short for the S-type pucker, and coupling constant analysis suggested the presence of a small percentage of the minor N-form conformer (63).
A shorter H8/6–H3' distance (r) in a minor N-conformer will strongly skew a measured distance to a shorter value owing to the non-linear (r⁻⁶ weighted) averaging of the interconverting conformers. The sugar conformations of the octamer were thus
Fig. 8.3. High resolution NMR structures of the Pribnow box octamer d(GTATAATG):d(CATATTAC) (Table 8.1, entry 27). (The 5'-end of the first strand is labelled. Hydrogen atoms are omitted.) (a) Stereoview of best fit heavy atom superposition of two structures determined using rMC (bold) and rMD (thin) refinement methods (see text). (The atomic rms deviation for the heavy atoms of the inner six nucleotides is 0.5 Å.) The view is into the major groove. (b) Stereoview of nine structures representing a MDtar ensemble of the Pribnow box octamer according to ref. 70. Nine snapshots are shown covering the last 90 ps of a 120 ps simulation. Note that the backbone is more flexible than the bases. Terminal residues are more disordered than inner nucleotides. The view is into the minor groove.
restrained to the range determined from the coupling constants for the major conformers.
The rigid NMR structure was then subjected to MD with time-averaged restraints (152), where the restraints are enforced as an average over the course of the trajectory rather than at each step of the MD calculations (70,153). Depending on the averaging time window, this method allows the exploration of extensive local dynamics, as several low energy conformations can be sampled driven by the restraints. In the case of the Pribnow box octamer, MDtar simulations utilized only distance restraints, including the discrepant H8/6–H3' distances. The ensuing structural ensemble could easily satisfy both the distance restraints, especially the H8/6–H2' and H8/6–H3' distances, and the coupling constants. This seems to be a direct effect of the sugar repuckering, which is feasible when using MDtar, but not conventional rMD. The conformational envelope produced with MDtar is wider than with rMD or even free MD simulations, as the restraints encode structural averaging over a time-scale much longer than an MD trajectory. We emphasize that the average conformational parameters for the MDtar ensemble are very similar to those obtained by conventional rMD refinement. This means that the overall structure of the DNA octamer does not change significantly when the sugars undergo occasional repuckering to the N-form. A detailed discussion of the conformational parameters is beyond the scope of this chapter and can be found elsewhere (70). However, the overall structural effect of the MDtar refinement can be gleaned from Fig. 8.3b, which shows a representative ensemble covering 100 ps. Note that the backbone is more disordered than the bases; also, terminal residues experience a much larger conformational range than non-terminal residues.
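The difference between enforcing a restraint at every step and enforcing it on a trajectory average can be illustrated with a toy calculation. The sketch below is not the MDtar implementation used in the work cited. It assumes a plain r⁻³ average over a whole trajectory and a flat-bottomed harmonic penalty, and the trajectory, restraint bounds, and function names are all hypothetical.

```python
def flat_bottom_penalty(r, lower, upper, k=1.0):
    """Harmonic penalty outside [lower, upper]; zero inside the bounds."""
    if r < lower:
        return k * (lower - r) ** 2
    if r > upper:
        return k * (r - upper) ** 2
    return 0.0

def time_averaged_distance(trajectory, power=3):
    """Average a distance over a trajectory as <r^-power>^(-1/power)."""
    mean = sum(r ** -power for r in trajectory) / len(trajectory)
    return mean ** (-1.0 / power)

# Toy trajectory (assumed): 80 frames in an S-like state with a 4.2 A contact,
# 20 frames in an N-like state with a 3.0 A contact.
trajectory = [4.2] * 80 + [3.0] * 20
bounds = (3.5, 4.0)  # hypothetical restraint bounds in angstrom

# Conventional restraint: penalty evaluated on every frame, then averaged
per_frame = sum(flat_bottom_penalty(r, *bounds) for r in trajectory) / len(trajectory)

# Time-averaged restraint: penalty evaluated once, on the averaged distance
on_average = flat_bottom_penalty(time_averaged_distance(trajectory), *bounds)

print(f"mean per-frame penalty: {per_frame:.3f}")
print(f"penalty on averaged distance: {on_average:.3f}")
```

In this toy case neither state satisfies the restraint on its own, yet the averaged distance lies inside the bounds, which is the sense in which a time-averaged restraint permits repuckering that a per-step restraint would suppress.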
On the other hand, it must be noted that the discrepancy between adenine sequential H8–H8 and H2–H2 restraints (see above) could not be remedied completely by the MDtar approach, suggesting that the motion required to satisfy all the restraints is not achievable in the time window for averaging presented by the MDtar simulations. Another example of this type of refinement is discussed in Section 4.1.
A different approach to dynamic refinement, termed PARSE (probability assessment via relaxation rates of a structural ensemble), was applied to the same Pribnow box octamer (16). A large pool of conformers was created through a series of restrained Monte Carlo refinements, where restraints were variously excluded or included. The modifications in the restraint sets were made to allow individual sugars to assume non-S-form puckers. In all, 60 different permutations of the distance restraint file were used. The pool of conformers (>500) was created such that at least one member had N-type pucker for each nucleotide; the pool also contained the final rMC-refined NMR structure and other similar structures. Then, probabilities for all conformers were computed using PARSE to yield the best match with all experimental cross-relaxation rates, which were derived from NOE intensities. The resulting PARSE ensemble contained 13 conformers with non-zero probabilities. All 13 conformers contain at least one deoxyribose that is flipped at least partially to the N domain. Here also, a better match with the experimental data could be achieved by allowing flexibility in the sugar region. The average structural parameters were very similar to those from the conventional rMD (63) or rMC (55) structures.
In summary, it seems that allowing flexibility in the refinement leads to better agreement with experimental data. Although the sugar dynamics ultimately lead to
Table 8.3. RNA:DNA hybrid high resolution structural studies since 1990
Entry | Sequence
1 | 5'-d(GTCACATG) / 3'-r(CAGUGUAC)
2 | 5'-d(GTGAACTT) / 3'-r(CACUUGAA)
3 | 5'-d(GCTATAApsTGG) / 3'-r(CGAUAUUACC); ps = (S) phosphorothioate
4 | 5'-d(GCTATAAprTGG) / 3'-r(CGAUAUUACC); pr = (R) phosphorothioate
5 | 5'-d(CGCGTTTTGCGC) / 3'-r(GCGCAAAACGCG)
6 | 5'-d(GGGTATACGC) / 3'-d(CCCATAT)r(GCG)
7 | 5'-r(GCCA)d(CTGC) / 3'-d(CGGTGACG)
8 | 5'-d(GGGTTTACT) / 3'-r(CCCA)d(AATGA)
9 | 5'-d(GGAGA)r(UGAC) / 3'-d(GTCATCTCC)
10 | 5'-d(CG)r(CG)d(CG) / 3'-d(GC)r(GC)d(GC)
11 | 5'-d(CG)r(C)d(TAGCG) / 3'-d(GCGAT)r(C)d(GC)
Exp.a restraints NOE/tor/nuc
322/Y/16
Restraint generation refinement methods a (names of programs) ISPA dd; DG, rMD,backcalc . 124 (DISCOVER, BIRDER ) int.vol.ref- internal par. n rMD;NUCFIT, DISCOVE R hyb matr; rMD, MDtar, 219 (MARDIGRAS.AMBER4.1)
320/Y/16
hyb matr; rMD, MDtar, (MARDIGRAS,AMBER 4.1)
Pucker analysis
Qualitative n
205/Y/13 ISPA dd; DG, rMD,backcalc . 260vb 592vb 789V*
Access number PDB d 16 a 16
Reference
2 3
d 73,16
4
219d 73,16
4
a 16
ISPA dd; DG, rMD,backcalc . 169 d 16 (DISCOVER, BIRDER ) ISPA dd; DG, rMD,backcalc . 1 gt c 167,16 (DISCOVER, BIRDER ) loka 5 2 (DISCOVER, BIRDER ) ISPA dd;rMD,vol ref 1drn 16 (XPLOR) int.vol.ref., rE M n a 17 (SPEDREF) int.vol.ref., rEM n a 17 (SPEDREF)
5 6 8
9 0 0
Table 8.3. Continued
Entry | Sequence | Exp. restraints (NOE/tor/nuc) a | Restraint generation and refinement methods a (names of programs) | PDB access number | Reference
12 | 5'-r(CGCG)d(TATA)r(CGCG) / 3'-r(CGCG)d(ATAT)r(GCGC) | na | ISPA dd; DG, rMD, backcalc. (DISCOVER, BIRDER) | 104d | 171
13 | 5'-d(CG)r(AGAU)d(GAC) / 3'-d(CCTCTACTG) | 256v b | ISPA dd; rMD, vol ref (XPLOR) | 1dhh | 169
14 | 5'-d(CGTTATAATGCG) / 3'-r(GCAAUAUUACGC) | Qualitative | Chem. shift comparison | na | 172
15 | 5'-d(CGCG)r(AAUU)d(CGCG) / 3'-d(GCGC)r(UUAA)d(GCGC) | Qualitative | Chem. shift comparison | na | 173
16 | 5'-d(CGCG)r(AUAU)d(CGCG) / 3'-d(GCGC)r(UAUA)d(GCGC) | Qualitative | Chem. shift comparison | na | 173
17 | 5'-d(CGTT)r(AUAA)d(TGCG) / 3'-d(GCAA)r(UAUU)d(AAGC) | Qualitative | Chem. shift comparison | na | 173
a For further explanation see the legend to Table 8.1.
b Volume refinement may involve above and below diagonal peaks plus some diagonal peaks; a restraints-per-nucleotide value is not readily available.
higher disorder for the backbone, the overall structural features remain the same. On the other hand, neither of the above approaches can lead to a unique solution of the problem, which makes dynamic refinement only desirable for situations where averaging artefacts are obvious (see below).
4. RNA:DNA hybrid structures
RNA:DNA hybrids are formed during essential biological processes such as transcription of DNA into RNA and the reverse transcription of viral RNA code into DNA sequences. Another strong motivation for structural studies of hybrids is to understand 'antisense' pharmaceuticals, which are generally modified DNA oligonucleotides targeted to mRNA or viral RNA. Such modified RNA:DNA hybrids are thought to be hydrolysed by largely sequence-independent, hybrid-specific RNAases, e.g. RNAase H (154,155). Understanding the interaction between hybrids and associated enzymes on a structural level should aid 'antisense' drug design.
While sequence-dependent structural features are of interest for DNA duplexes, for RNA:DNA hybrids even the gross helical structure was not established until a few years ago. The agreement between different methods was poor in earlier work (156–159), but it had already been suggested that hybrid structure in solution is different from that in the solid state. In crystal structures the introduction of ribonucleotides into short DNA oligomers drives the structure from the typical B-form to the A-form. Just one 5'-ribonucleotide was enough to drive an octamer hybrid into an A-form crystal structure (160,161). This is not the case for RNA:DNA hybrids in solution.
High resolution NMR structures are available for different constructs, where either a part of one or both strands contains ribonucleotides, or a whole strand is RNA. A compilation of NMR structural studies for hybrids is presented in Table 8.3. Note that the selection criteria were less restrictive than for the DNA duplexes in Table 8.1 since we wanted to gather high resolution structural information on all the different types of hybrids mentioned above.
4.1 RNA:DNA hybrids with one complete RNA strand
Hybrids with one complete RNA strand have been studied most extensively with, however, conflicting results. From fibre diffraction data, it became clear early on (158) that the structure of poly r(A):poly d(T) depends on the relative humidity. An A-type diffraction pattern changed with increasing humidity to one representing the RNA strand with C3'-endo sugars, but the DNA strand with B-like C3'-exo sugars. Similar results were found for poly d(A):poly r(U) and poly d(I):poly r(C) (159). More evidence for different backbone conformations for the two strands, termed 'heteronomous', came from solid state 31P NMR (174), circular dichroism (175), and Raman spectroscopy (176). However, some solution studies came to a different conclusion, with both strands assuming loose B-form geometries (156,157). A more detailed high resolution NMR study (172), where several non-exchangeable protons had been assigned, resolved most of the old discrepancies, revealing that the deoxyriboses in d(CGTTATAATGCG):r(CGCAUUAUAACG) assume sugar puckers in the general C2'-endo region, whereas the RNA strand definitely adopts the C3'-endo conformation. A detailed comparison between the hybrid and the all-DNA analogue of chemical shifts and NOE connectivity patterns in DNA versus RNA strands demonstrated clearly that the two strands assume different geometries. At the time, a high resolution structure was not determined for the dodecamer since the sequence was too long for complete and unambiguous assignments of all the required protons, especially H3' and H4' protons. A few years later, high resolution structures were presented by several groups (73,162–164) establishing the heteronomous character of the solution structure. A detailed structural analysis of the d(GTCACATG):r(CAUGUGAC) hybrid (see Fig. 8.4a) was presented by Salazar et al. (162), including a discussion of the interaction of this hybrid with RNAase H (177). From NOESY and COSY data, the authors could clearly establish the A-type character of the RNA strand through small JH1'H2' coupling constants, typical for C3'-endo pucker, and strong sequential H6/8–H2' NOEs. For the DNA strand, the NMR data clearly showed that it is neither A- nor B-form, but rather something intermediate. Strong H1'–H4' NOEs, very similar JH1'H2' and JH1'H2" coupling constants (≈6–7 Hz), and strong H3'–H4' and medium H2'–H3' COSY peaks indicated a deoxyribose conformation in the O4'-endo range (168). Before turning to the broader helical structural features, it is interesting to compare the results for the deoxyribose conformation with other studies (73,163,165), as the sugar moieties can be well defined by NMR data even without deriving a complete model structure. In general it seems that most spectra are consistent in all studies.
However, a different interpretation for the sugar conformation has been proposed. A more consistent interpretation of all the data suggests a flexible deoxyribose pucker model involving S- and N-type conformations, largely substantiated through additional information that could not be reconciled with the O4'-endo conformation (164).
As mentioned, the sugar ring cannot be defined with high precision by NOEs alone since the largest sugar pucker change-induced distance fluctuation is for H1'–H4', being 3.3 Å for C2'-endo and C3'-endo, and 2.5 Å for O4'-endo; i.e. barely larger than the typical accuracy of the experimentally determined distance (±0.2 to ±0.4 Å). We note that a quickly interconverting S/N pucker mixture with only 10 or 20% of O4'-endo conformers would give rise to a significantly stronger H1'–H4' NOE, with a sixth-root-weighted average distance of 3.0 or 2.9 Å, respectively. Pucker conformations close to O4'-endo have been found to be relatively stable with theoretical studies (151) and dynamics structure refinement of DNA (70).
Coupling constants typically augment the structural description. Here also, the accuracy of the picture increases not only with the accuracy of the coupling constants but also with how many of them have actually been determined. To distinguish clearly between a flexible pucker model and the rigid O4'-endo conformation, more coupling constants are necessary than just values for JH1'H2' and JH1'H2" and semi-quantitative assessment of JH2"H3'. Gonzalez et al. (73) extracted JH1'H2', JH1'H2", JH2'H3', and JH2"H3' with error bounds from ±0.3 Hz to ±1 Hz via simulation of COSY cross-peaks. Typical experimental values for JH1'H2', JH1'H2", and JH2"H3' (6.1–8.5, 6.0–6.5, >3–4.0 Hz) are indeed compatible with a single sugar geometry around O4'-endo. However,
values for JH2'H3' (5.5–6.6 Hz) are clearly not in agreement with the above geometry, as this coupling constant assumes a maximum value of ≈ 9.5 Hz for O4'-endo. (Minimum values of 5–6 Hz arise for C2'-endo and C3'-endo conformations.) It seems clear that such a substantial deviation cannot simply be caused by dipolar contributions to coupling constants (45) because the correlation times were > 4 ns (see Section 2.1 above). The most difficult coupling to determine is JH3'H4' because of the lack of fine structure in the COSY peak. Gao and Jeffs overcame this obstacle by acquiring {H2',H2"}-decoupled COSY spectra (165). The reported JH3'H4' values for non-terminal residues (6.1–7.5 Hz) are compatible with both a single O4'-endo geometry
Fig. 8.4. Stereoviews of high resolution NMR structures of RNA:DNA hybrids with one entire strand being RNA. (The 5'-end of the DNA strand is marked. The RNA strand is shown in bold.) (a) Structure of d(GTCACATG):r(CAGUGUAC) (Table 8.3, entry 1) with superposition of the relatively straight global helix axis, calculated with the program 'Curves' (68). (b) Structure of d(GCTATAApRTGG):r(CGAUAUUACC) (Table 8.3, entry 4) with superposition of the relatively straight global helix axis, calculated with the program 'Curves' (68). 'pR' indicates a chirally pure R phosphorothioate modification, which is indicated by a small closed circle in the structure. (c) Heavy atom best fit superposition of 10 snapshots from a MDtar ensemble for d(GCTATAApRTGG):r(CGAUAUUACC) (Table 8.3, entry 4) showing the relative flexibilities of the two strands. Note the less flexible RNA strand in the foreground. The overall geometry is not changed compared with the depiction of the conventional rMD average structure in (b). (The 10 snapshots cover the last 100 ps of 120 ps in 10 ps steps.)
(JH3'H4' = 7 Hz) and an S/N mixture (JH3'H4' = 1 Hz for C2'-endo; JH3'H4' = 8 Hz for C3'-endo). Furthermore, the COSY data of Salazar et al. (168) showed very strong H3'-H4' COSY peaks for the RNA and significantly weaker ones for the DNA, which also argues against a rigid O4'-endo conformation in light of the small coupling constant difference between C3'-endo and O4'-endo.
Another indication of flexibility can be the incompatibility of NOE-derived distances (16). The intranucleotide H6/8-H3' and H6/8-H2' distances assume their shortest values for C3'-endo and C2'-endo, respectively, assuming that the glycosidic torsion angle is adjusted to the change in sugar pucker. In the case of an S/N mixture, the average values of these two distances both become short and potentially unsatisfiable by a single conformer (16). For a decamer hybrid containing a single, chirally pure phosphorothioate modification (73), the above distances for the RNA strand assume typical C3'-endo values for the majority of residues. For the DNA strand, however, H6/8-H2' distances are closer to C2'-endo, while H6/8-H3' distances are closer to C3'-endo. The discrepancies were not dramatic, however, owing to the relatively high error bounds on some of those distances.
In light of the above, it is not surprising that rMD refinement of the hybrid yielded different results when NOE-derived distances alone were used versus
refining with additional coupling constant restraints. In the first case, a structure was obtained exhibiting good agreement with the experimental data and sugar puckers for the DNA strand around O4'-endo (see Fig. 8.4b). Refinement with both types of restraints indicates that both cannot be satisfied equally well in one model; interestingly, DNA sugar puckers were still mostly in the O4'-endo range. These findings are consistent with a flexible molecule where the average structure cannot satisfy all data. Furthermore, it must be noted that the O4'-endo conformation is the most probable compromise between satisfying all restraints and keeping a reasonable conformational energy for a single structure, as the restrained portion of the sugar ring is indeed flat. The restraints for the physically unachievable average of an equally populated S/N mixture encode a conformation with all sugar carbon atoms in one plane.
Besides conventional rMD refinement of two phosphorothioate hybrids (164), differing only in the chirality of the single phosphorothioate (Table 8.3, entries 3 and 4), Gonzalez et al. also employed the more flexible refinement strategy using time-averaged distance and coupling constant restraints (MDtar). Such a MDtar ensemble (see Fig. 8.4c) was shown to satisfy both the coupling constants and the distance restraints equally well. Structural parameters were calculated for the conventional rMD structures, as well as for long trajectories using conventional (rMD) and time-averaged restraints (MDtar). Both refinement methods have been applied to the two related, (R) and (S) chiral forms of the hybrid. Besides the reassuring result that the completely independently determined (R)- and (S)-form rMD structures are virtually identical, with slight differences only for the thioate step, the most striking result was that the average values for the helical parameters were very similar for rMD and MDtar ensembles (164). Despite the larger standard deviations for MDtar parameters, indicating a wider conformational envelope, all the sequence-specific patterns were reproduced very well compared with the rMD data.
With regard to sugar conformations, rMD and MDtar ensembles exhibited the same tight distribution around C3'-endo for the RNA strand. For the DNA strand, however, the results were different. The tight distribution for the rMD ensemble, often centred in the lower S-range, became a complex distribution pattern for the MDtar ensemble, with a widely populated S-range (O4'-endo to C2'-endo) and a significant population in the C3'-endo region (20–47%). Gonzalez et al. concluded that the overall helical appearance of the hybrid does not change significantly when going from the rMD ensemble, which only satisfies the NOE distances, to the MDtar ensemble, which also satisfies coupling constants (see Fig. 8.4b and c). This implies that forcing the deoxyribose moieties into compromise average conformations does not distort the overall structure.
For both of the high resolution hybrid structures (164,177) helical parameters were reported. Unfortunately, the sequences are quite different and different definitions for the helical parameters were used. Nevertheless, in both structures twist and rise are low, more similar to the A-form. For the thioate hybrid, the x-displacement for most steps is around −3 Å, roughly between the values of the A- and B-forms. Fedoroff et al. report some of their helical parameters independently for each strand (177), which leads to the interesting observation that the DNA strand is more susceptible to rotations about the long base pair axis (large fluctuations for roll and tip), while the RNA
strand seems to be more prone to undergo rotations about the short axis (large fluctuations for tilt and inclination).
The most important double helical feature is probably the minor groove width, because it is thought to be the primary locus for specific interactions with proteins such as RNAase H. Both studies (164,177) agree, reporting a minor groove width of 7.5–9 Å for the hybrid, compared with 11 Å for the A-form and 6 Å for the B-form.
Manual docking of RNAase H and the d(GTCACATG):r(CAGUGUAC) hybrid structures led to the interesting idea that the overall helix geometry of the hybrid, and especially the intermediate groove width, is the basis for the discrimination of RNAase H against double helical RNA or DNA (177). For the authors, the propensity of the DNA to adopt O4'-endo sugar pucker constitutes one of the key elements for the interaction. However, it is easy to see that a hybrid model with essentially the same helical geometry but a more flexible DNA strand (164) should certainly fit into the binding area of RNAase H equally well.
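The sixth-root-weighted averaging of the H1'-H4' distance discussed earlier in this section can be sketched numerically. The following minimal Python example is illustrative only: the function name `noe_average_distance` is hypothetical, the two-state fast-exchange model is an assumption, and the endpoint distances are the rounded values quoted in the text (3.3 Å for C2'-endo and C3'-endo, 2.5 Å for O4'-endo). With these rounded inputs the simple formula gives an effective distance near 3.1 Å for 10% and near 3.0 Å for 20% O4'-endo, within roughly 0.1 Å of the values quoted above; the exact figures depend on the endpoint distances assumed.

```python
def noe_average_distance(populations, distances):
    """Effective NOE distance for conformers in fast exchange.

    Uses r_eff = (sum_i p_i * r_i**-6) ** (-1/6), i.e. the populations
    weight the inverse sixth power of each conformer's distance.
    """
    if len(populations) != len(distances):
        raise ValueError("populations and distances must have equal length")
    if abs(sum(populations) - 1.0) > 1e-9:
        raise ValueError("populations must sum to 1")
    mean_inv_r6 = sum(p * r ** -6 for p, r in zip(populations, distances))
    return mean_inv_r6 ** (-1.0 / 6.0)


# Rounded H1'-H4' distances quoted in the text (in angstroms).
R_C2_OR_C3_ENDO = 3.3
R_O4_ENDO = 2.5

# Illustrative O4'-endo fractions; the remainder is treated as an S/N
# mixture in which H1'-H4' is 3.3 angstroms for both conformers.
for o4_fraction in (0.0, 0.1, 0.2, 1.0):
    r_eff = noe_average_distance(
        [1.0 - o4_fraction, o4_fraction],
        [R_C2_OR_C3_ENDO, R_O4_ENDO],
    )
    print(f"O4'-endo fraction {o4_fraction:.1f}: r_eff = {r_eff:.2f} A")
```

Because the short distance dominates the average, even a small O4'-endo population shortens the apparent distance by an amount comparable to the experimental error bounds, which is why NOEs alone discriminate poorly between a rigid and a flexible sugar.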
4.2 Okazaki-like fragments
These fragments are hybrids where only part of one strand is RNA. Okazaki fragments form during DNA replication and reverse transcription of viral genomes. RNAases ultimately remove the RNA part by cleaving exactly at the junction or, in the case of E. coli RNAase HI, just before the last step. This implies that structural discontinuities at the junction are available to guide the RNAases. For synthetic oligomers modelling Okazaki fragments, solution and solid state results differ. Crystal structures of [r(GGC)d(TATAGCC)]2 (178) and r(GCG)d(TATACCC):d(GGGTATACGC) (179) revealed A-form geometries without large disruptions at the junctions, whereas high resolution NMR results (52,167), similar to those described for hybrids above, suggested the sugar conformations to be in a 'heteronomous' structure. Indeed, for the r(GCCA)d(CTGC):d(GGTGACG) hybrid (162), which represents a substrate for the RNAase H activity of HIV-1 reverse transcriptase, the four ribonucleotides adopt C3'-endo conformations while the deoxyribonucleotides cover a wide range of the pseudorotation wheel (54° < P < 144°). The most unusual sugar pucker is reported for the deoxyribose at the junction (54° < P < 90°), which was also observed for the 3'-ends. On the other hand, DNA puckers in the hybrid part are not very different from other DNA sugars. Nevertheless, the last two base pairs of the hybrid part appeared to be the most 'heteronomous'.
Fedoroff et al. recently published two well-defined high resolution structures of Okazaki fragments from HIV-1 (167) and Moloney murine leukaemia virus (52) (see Fig. 8.5a). In general, both structures exhibit the previously described heteronomous features for the hybrid section and more regular B-form properties for the DNA duplex part. Interestingly, the discontinuities produced a clear bend associated with the junction (≈ 16° in ref. 52, ≈ 18° in ref. 167). Distinct changes are seen for some helical parameters at the junction, while other parameters describing the general helical appearance change more gradually. For example, a large negative x-displacement and a small inclination for the hybrid part change concertedly to a more pronounced positive inclination and a reduction in the x-displacement. Although the values for x-displacement give the hybrid segment some A-like appearance, inclination and other helical parameters depict a double helix
Fig. 8.5. Stereoviews of high resolution NMR structures of RNA:DNA hybrids where only a part of one strand is RNA. The global helix axis was calculated with the program 'Curves' (68). (The 5'-end of the DNA strand is marked. The RNA part is shown in bold.) (a) RNA at the 5'-end: Okazaki-like fragment r(CCCA)d(AATGA):d(GGGTTTACT) (Table 8.3, entry 8). Note the clear bend associated with the RNA:DNA junction. (b) RNA at the 3'-end: d(GGAGA)r(UGAC):d(GTCATCTCC) (Table 8.3, entry 9). The distortion in the global helix axis is minimal. (c) RNA in the middle of a DNA strand: d(CG)r(AGAU)d(GAC):3'-d(CCTCTACTG) (Table 8.3, entry 13). Note the curvature in the global helix axis at both junctions.
with a narrow minor groove, which decreases gradually from the hybrid segment towards the DNA part. Intrastrand distances between phosphate groups assume values between the A- and B-form only for the RNA segment, whereas the entire DNA part exhibits fairly B-like values.
Fedoroff and coworkers offer a side-by-side comparison of some of the helical parameters for the two hybrids (52,167). Besides obvious sequence-dependent structural effects, similar discontinuities around the DNA:RNA junction can be seen for the parameters roll, tilt, rise, and buckle. Especially for the latter, a large negative value for the junction base pair(s) seems to be a unique feature, as a similar behaviour is found not only for two other hybrids, [r(GCG)d(TATACCC):d(GGGTATACGC)] and [r(CGCG)d(TATACGCG)]2 (166), but also for the A-form crystal structure (179). For rise, a distinct increase towards the junction is apparent for both of the recent NMR structures (52,167), although the decrease in the DNA segment towards the A-form values is somewhat surprising. Most backbone torsion angles are close to typical values for either the A- or B-form. The ribonucleotides exhibit values different from the deoxyribonucleotides only for δ, ε and ζ, which describe the 3'-proximity of the sugar moieties. Furthermore, χ angles for ribonucleotides reflect the expected A-form values, whereas, for the deoxyribonucleotides, intermediate values are found, reflecting the adjustment to pucker values in the lower S-range. The structural results for the self-complementary hybrid [r(CGCG)d(TATACGCG)]2 paint a very similar picture, where an all-DNA TATA box is flanked by GC-rich hybrid segments on both sides (171). This structure was analysed as three independent segments, which on the one hand brings out the grossly different helical features, but on the other hand creates the impression that discontinuities, especially for the parameters rise, twist, buckle, x-displacement, and inclination, occur strictly at the junctions. Nevertheless, a distinct bend (≈ 23°) results, similar to the other hybrid structures. DNA sequences with alternating A:T pairs are known for their compressed minor groove (13,180). Since the hybrid segments seem to induce small groove widths for the DNA moieties as seen in the structures above, it is not surprising that for the three-segment hybrid, a seriously compressed minor groove is found for the TATA segment. The two strands get close enough in the middle of this part (closest cross-strand interphosphate distance > 5 Å) that strong interstrand H2-H1' NOEs were observed.
A different structural situation exists when the hybrid segment follows the DNA in the 3'-direction (169). A hybrid segment at the 3'-end does not have nearly the same structural impact as the 5'-counterpart (see Fig. 8.5b). Since the riboses at the junction and the 3'-end adopt some intermediate pucker values (JH1'H2' = 6 Hz) and the minor groove width is largely the same for the entire hybrid, the heteronomous
character is hardly tangible. Nevertheless, helical parameters rise, roll, tilt, and slide show unique trends for the 3'-hybrid segment.
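The pseudorotation ranges quoted in this section (roughly 54° to 144° for the deoxyribonucleotides, 54° to 90° at the junction) can be related to the conventional pucker names with a short lookup. The sketch below assumes the standard Altona-Sundaralingam convention (ref. 146), in which the pseudorotation cycle is divided into ten 36° sectors centred on P = 18° (C3'-endo), 54° (C4'-exo), 90° (O4'-endo), 126° (C1'-exo), 162° (C2'-endo), and so on. The function name `classify_pucker` is hypothetical.

```python
# Pucker names for the ten 36-degree sectors of the pseudorotation cycle,
# starting with the sector 0-36 degrees (centred on P = 18 degrees).
PUCKER_SECTORS = [
    "C3'-endo", "C4'-exo", "O4'-endo", "C1'-exo", "C2'-endo",
    "C3'-exo", "C4'-endo", "O4'-exo", "C1'-endo", "C2'-exo",
]


def classify_pucker(phase_deg):
    """Return the pucker name for a pseudorotation phase angle in degrees."""
    phase = phase_deg % 360.0
    # The trailing modulo guards against a floating-point result of 360.0.
    sector = int(phase // 36.0) % 10
    return PUCKER_SECTORS[sector]


# Illustrative phase angles spanning the range quoted for the DNA residues.
for p in (18.0, 54.0, 90.0, 126.0, 162.0):
    print(f"P = {p:5.1f} deg -> {classify_pucker(p)}")
```

Read this way, the range quoted for the deoxyribonucleotides runs from C4'-exo through O4'-endo to C1'-exo, and stops just short of the C2'-endo sector typical of B-form DNA.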
4.3 RNA inserted into DNA sequences
Although no biological role has been assigned to hybrids where a short RNA segment is inserted into DNA sequences, such constructs are interesting to complete a systematic structural picture. Qualitative studies on hybrids with double helical RNA inserts (172) found the DNA segments in B-form geometries, while the RNA section is essentially A-form, based on the sugar pucker criterion and a chemical shift comparison between the hybrid and the all-DNA analogue. The RNA base pair at the junction was described as heteronomous since the 5'-ribose does not assume typical A-form pucker, in contrast to the 3'-ribose. Also, the DNA base pair at the junction is heteronomous in that the 3'-deoxyribose exhibits B-form pucker and the 5'-sugar assumes some intermediate state. A high resolution structure of this type of hybrid confirmed some of the above observations. However, it is questionable whether the results for [d(GC)r(GC)d(GC)]2 (170) can really be compared with the data for the above dodecamers, since every base pair must be considered terminal or at the junction. Nevertheless, the structure of the GC hybrid is reported to be between the A- and B-form, with large negative x-displacement but small tilt values. All riboses adopt C3'-endo pucker whereas only the 5'-deoxyribose at the junction is C2'-endo, and all others assume an intermediate S/N value. JH1'H2' and JH1'H2" were interpreted as S/N mixtures with 40–75% S population. The same authors also report the structure of a self-complementary octamer where only one sugar is changed into a ribose (170). Whereas the deoxyriboses are all in the S regime, the ribonucleotide adopts A-form pucker, which does not seem to perturb the overall B-form geometry.
Nishizaki et al. reported a detailed structure of a hybrid nonamer (169) where four ribonucleotides are placed in the middle of one strand (see Fig. 8.5c). In accord with the earlier qualitative interpretation (173), it seems that overlapped H2'/H2" proton resonances for the 5'-deoxyribose and non-C3'-endo puckers for the first ribose are unique features for the DNA to RNA transition. Overall, the structure of the one-strand insertion hybrid (169) is reported to be closer to the A-form than to a B-form geometry, although the helical parameters presented show many fluctuations without giving clear A- or B-form tendencies. The minor groove width, however, is clearly larger for the hybrid section, which is very different from all other hybrid structures discussed here. In this RNA insert hybrid structure, bends can be seen for both junctions.
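The interpretation of JH1'H2' and JH1'H2" in terms of an S/N equilibrium can be illustrated with a two-state estimate. The sketch below assumes fast exchange between one N-type and one S-type conformer and uses the empirical sum rule commonly attributed to Rinkel and Altona (ref. 6), pS ≈ (Σ1' − 9.8)/5.9 with Σ1' = JH1'H2' + JH1'H2". The limiting sums (9.8 and 15.7 Hz), the function name `s_fraction_from_sum_j1`, and the example couplings are assumptions for illustration, not values taken from the studies discussed here.

```python
def s_fraction_from_sum_j1(j_h1h2p, j_h1h2pp, sum_n=9.8, sum_s=15.7):
    """Estimate the S-type fraction from JH1'H2' and JH1'H2'' (in Hz).

    Assumes a two-state N/S equilibrium in fast exchange, so that the
    observed sum of the two couplings is a population-weighted average of
    the limiting sums for pure N (sum_n) and pure S (sum_s). The default
    limits are assumed, illustrative values.
    """
    sigma = j_h1h2p + j_h1h2pp
    fraction = (sigma - sum_n) / (sum_s - sum_n)
    # Clamp to the physically meaningful interval [0, 1].
    return min(1.0, max(0.0, fraction))


# Hypothetical inputs: both couplings equal, between 6 and 7 Hz.
for j in (6.0, 6.5, 7.0):
    p_s = s_fraction_from_sum_j1(j, j)
    print(f"JH1'H2' = JH1'H2'' = {j:.1f} Hz -> about {100 * p_s:.0f}% S")
```

With both couplings at 6 to 7 Hz, similar to the values reported earlier in this chapter for the DNA strand of a hybrid duplex, this estimate gives roughly 37 to 71% S, of the same order as the 40–75% S population quoted for the insert hybrid. A full analysis would fit all measured couplings simultaneously rather than rely on a single sum.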
5. Outlook for the future
The discussion of DNA and DNA:RNA hybrid structures above has shown the potential and the limitations of the NMR-based approach for high resolution structures. Even the most accurate DNA duplex structures are somewhat dependent on the choice of refinement method, and this suggests caution in using specific values of the derived structural parameters, especially helical parameters, and comparing them with the results of other studies. Nevertheless, when features can be compared, the structural results of different studies are generally found to be in accord. In our opinion,
most of the recent DNA and DNA:RNA hybrid structures represent good structural models that capture a great deal of the sequence-dependent structural traits. With current NMR computational methods, reliable insights about structural features should be expected for nucleic acid systems that deviate from a standard duplex. The influence on structural features exerted by a distinct, localized modification should be revealed readily through NMR methods. Such systems might include modifications in backbone and nucleotides, mismatches, unusual base pairs, and bulged nucleotides; some of these are reviewed in Chapters 10-13. For standard DNA duplex structures, we can expect improvements via isotopic labelling techniques similar to developments in protein and RNA structure determination. 15N- and 13C-labelled precursors are now becoming available for the chemical synthesis of DNA (181) and for the enzymatic preparation of samples (182). Furthermore, selective deuteration at several positions of the sugar moieties (183,184) is creating unique possibilities for the observation of only a portion of a larger system without losing the full structural context.
With respect to the accuracy of any NMR-derived structure, the biggest limitation comes from the potential flexibility of biological macromolecules, which might lead to average structures with artefacts. The availability of isotope-labelled DNA samples should also provide a handle for addressing the flexibility problem by measuring relaxation properties of the heteronuclei, similar to earlier studies with very concentrated natural abundance samples (138,185). Although most cases of conformational flexibility present an underdefined system in terms of the information available from NMR, future refinement methods need to address these issues more systematically. Besides the tools mentioned in this chapter, other methods for generating multiple conformers in accord with NMR data or finding the combination of conformers best satisfying NMR data have already been reported.
References
1. van den Ven, F.J.M. and Hilbers, C.W. (1988) Eur. J. Biochem. 178, 1.
2. Patel, D.J., Shapiro, L. and Hare, D. (1987) Q. Rev. Biophys. 20, 35.
3. Wijmenga, S.S., Mooren, M.M.W. and Hilbers, C.W. (1994) in NMR Macromolecules, (ed. Roberts, G.C.K.), p. 217. Oxford University Press.
4. Feigon, J., Sklenar, V., Wang, E., Gilbert, D.E., Macaya, R.F. and Schultze, P. (1992) Meth. Enzymol. 211, 235.
5. Schmitz, U. and James, T.L. (1995) Meth. Enzymol. 261, 1.
6. Rinkel, L.J. and Altona, C. (1987) J. Biomol. Struct. Dynamics 4, 621.
7. van Wijk, J., Huckriede, B.D., Ippel, J.H. and Altona, C. (1992) Meth. Enzymol. 211, 286.
8. Kim, S.-G., Lin, L.-J. and Reid, B.R. (1992) Biochemistry 31, 3564.
9. Dickerson, R. (1992) Meth. Enzymol. 211, 67.
10. Metzler, W.J., Wang, C., Kitchen, D.B., Levy, R.M. and Pardi, A. (1990) J. Mol. Biol. 214, 711.
11. Pardi, A., Hare, D.R. and Wang, C. (1988) Proc. Natl. Acad. Sci. USA 85, 8785.
12. Lane, A.N. (1990) Biochim. Biophys. Acta 1049, 189.
13. Schmitz, U., Pearlman, D.A. and James, T.L. (1991) J. Mol. Biol. 221, 271.
14. Ulyanov, N.B., Gorin, A.A., Zhurkin, V.B., Chen, B., Sarma, M.H. and Sarma, R.H. (1992) Biochemistry 31, 3918.
15. Ulyanov, N.B. and James, T.L. (1994) Appl. Magn. Reson. 7, 21.
16. Ulyanov, N.B., Schmitz, U., Kumar, A. and James, T.L. (1995) Biophys. J. 68, 13.
17. Peck, L.J. and Wang, J.C. (1981) Nature 292, 375.
18. Rhodes, D. and Klug, A. (1981) Nature 292, 378.
19. Melvyl (1984) Registered Trademark of the Regents of the University of California. Internet address: melvyl.ucop.edu.
20. Lane, A.N. (1994) Meth. Enzymol. 261, 413.
21. Macura, S. and Ernst, R.R. (1980) J. Mol. Phys. 41, 95.
22. Gorenstein, D.A. (1992) Meth. Enzymol. 211, 254.
23. James, T.L. (1991) Curr. Opin. Struct. Biol. 1, 1042.
24. Allain, F.H.T., Gubser, C.C., Howe, P.W.A., Nagai, K., Neuhaus, D. and Varani, G. (1996) Nature 380, 646.
25. Wuthrich, K. (1986) NMR of Proteins and Nucleic Acids. Wiley, New York.
26. Thomas, P.D., Basus, V.J. and James, T.L. (1991) Proc. Natl. Acad. Sci. USA 88, 1237.
27. Borgias, B.A. and James, T.L. (1989) Meth. Enzymol. 176, 169.
28. Kumar, A., Ernst, R.R. and Wuthrich, K. (1981) J. Am. Chem. Soc. 103, 3654.
29. Keepers, J.W. and James, T.L. (1984) J. Magn. Reson. 57, 404.
30. Borgias, B.A. and James, T.L. (1990) J. Magn. Reson. 87, 475.
31. Post, C.B., Meadows, R.P. and Gorenstein, D.G. (1990) J. Am. Chem. Soc. 112, 6796.
32. Boelens, R., Koning, T.M.G., van der Marel, G.A., van Boom, J.H. and Kaptein, R. (1989) J. Magn. Reson. 82, 290.
33. Pearlman, D.A., Case, D.A., Caldwell, J.C., Seibel, G.L., Singh, U.C., Weiner, P. and Kollman, P.A. (1990) AMBER, version 4.0. University of San Francisco, San Francisco.
34. Brunger, A.T. (1992) X-PLOR, Version 3.1: A System for X-ray Crystallography and NMR. Yale University Press, New Haven.
35. de Vlieg, J., Boelens, R., Scheek, R.M., Kaptein, R. and van Gunsteren, W.F. (1986) Isr. J. Chem. 27, 181.
36. Young, M.A., Srinivasan, J., Goljer, I., Kumar, S., Beveridge, D.L. and Bolton, P.H. (1995) Meth. Enzymol. 261, 121.
37. Molecular Simulations Inc. (1995) Discover, InsightII. San Diego, CA.
38. Nerdal, W., Hare, D.R. and Reid, B.R. (1989) Biochemistry 28, 10008.
39. Nibedita, R., Kumar, R.A., Majumdar, A. and Hosur, R.V. (1992) J. Biomol. NMR 2, 477.
40. Robinson, H. and Wang, A.H.J. (1992) Biochemistry 31, 3524.
41. Lane, A.N. (1990) Biochim. Biophys. Acta 1049, 205.
42. Liu, H., Spielmann, H.P., Ulyanov, N.B., Wemmer, D.E. and James, T.L. (1995) J. Biomol. NMR 6, 390.
43. Altona, C. (1982) Rec. Trav. Chim. Pays-Bas 101, 413.
44. Harbison, G.S. (1993) J. Am. Chem. Soc. 115, 3026.
45. Zhu, L., Reid, B.R., Kennedy, M. and Drobny, G.P. (1994) J. Mag. Res. Ser. A 111, 195.
46. Celda, B., Widmer, H., Leupin, W., Chazin, W.J., Denny, W.A. and Wuthrich, K. (1989) Biochemistry 28, 1462.
47. Gochin, M., Zon, G. and James, T.L. (1990) Biochemistry 29, 11161.
48. Schmitz, U., Zon, G. and James, T.L. (1990) Biochemistry 29, 2357.
49. Macaya, R., Wang, E., Schultze, P., Sklenar, V. and Feigon, J. (1992) J. Mol. Biol. 225, 755.
50. Conte, M.R., Bauer, C.J. and Lane, A.N. (1996) J. Biomol. NMR 7, 190.
51. Schmidt, P. and Griesinger, C. (1994) unpublished data.
52. Salazar, M., Fedoroff, O.Y. and Reid, B.R. (1996) Biochemistry 35, 8126.
53. Cornell, W., Cieplak, P., Bayly, C.I., Gould, I.R. and Kollman, P.A. (1996) J. Am. Chem. Soc. 118, 2309.
54. Zhurkin, V.B., Ulyanov, N.B., Gorin, A.A. and Jernigan, R.L. (1991) Proc. Natl. Acad. Sci. USA 88, 7046.
55. Ulyanov, N., Schmitz, U. and James, T. (1993) J. Biomol. NMR 3, 547.
56. Mauffret, O., Hartmann, B., Convert, O., Lavery, R. and Fermandjian, S. (1992) J. Mol. Biol. 227, 852.
57. James, T.L. (1994) Meth. Enzymol. 239, 416.
58. Gonzalez, C., Rullmann, J.A.C., Bonvin, M.J.J., Boelens, R. and Kaptein, R. (1991) J. Magn. Reson. 91, 659.
59. Withka, J.M., Srinivasan, J. and Bolton, P.H. (1992) J. Magn. Reson. 98, 611.
60. Kaluarachchi, K., Meadows, R.P. and Gorenstein, D.G. (1992) Biochemistry 30, 8785.
61. Ulyanov, N.B. and James, T.L. (1995) Meth. Enzymol. 261, 90.
62. Kim, S.-G. and Reid, B.R. (1992) Biochemistry 31, 12103.
63. Schmitz, U., Sethson, I., Egan, W.M. and James, T.L. (1992) J. Mol. Biol. 227, 510.
64. Weisz, K., Shafer, R.H., Egan, W. and James, T.L. (1994) Biochemistry 33, 354.
65. Gochin, M. and James, T.L. (1990) Biochemistry 29, 11172.
66. Schmitz, U. and James, T.L. (1993) in Structural Biology: The State of the Art, (Sarma, R.H. and Sarma, M.H., eds), Vol. 2, p. 251. Adenine Press, Schenectady.
67. Diekmann, S. (1989) EMBO J. 8, 1.
68. Lavery, R. and Sklenar, H. (1990) CURVES 3.0, Helical Analysis of Irregular Nucleic Acids. Laboratory for Theoretical Biochemistry CNRS, Paris, France 1990.
69. Leijon, M., Zdunek, J., Fritzsche, H., Sklenar, H. and Graslund, A. (1995) Eur. J. Biochem. 234, 832.
70. Schmitz, U., Ulyanov, N.B., Kumar, A. and James, T.L. (1993) J. Mol. Biol. 234, 373.
71. Kopka, M.L., Fratini, A.V., Drew, H.R. and Dickerson, R.E. (1983) J. Mol. Biol. 163, 129.
72. Chuprina, V.P. (1987) Nucl. Acids Res. 15, 293.
73. Gonzalez, C., Stec, W., Kobylanska, A., Hogrefe, R.I., Reynolds, M. and James, T.L. (1994) Biochemistry 33, 11062.
74. Clore, G.M. and Gronenborn, A.M. (1985) EMBO J. 4, 829.
75. Clore, G.M., Gronenborn, A., Moss, D. and Tickle, I. (1985) J. Mol. Biol. 185, 219.
76. Nilsson, L., Clore, G.M., Gronenborn, A.M., Brunger, A.T. and Karplus, M. (1986) J. Mol. Biol. 188, 455.
77. Nilges, M., Clore, G.M., Gronenborn, A.M., Brunger, A.T., Karplus, M. and Nilsson, L. (1987) Biochemistry 26, 3734.
78. Nilges, M., Clore, G.M. and Gronenborn, A.M. (1987) Biochemistry 26, 3718.
79. Gupta, G., Sarma, M.H. and Sarma, R.H. (1988) Biochemistry 27, 7909.
80. Sarma, M.H., Gupta, G. and Sarma, R.H. (1988) Biochemistry 27, 3423.
81. Nerdal, W., Hare, D.R. and Reid, B.R. (1988) J. Mol. Biol. 201, 717.
82. Banks, K.M., Hare, D.R. and Reid, B.R. (1989) Biochemistry 28, 6996.
83. Baleja, J.D., Pon, R.T. and Sykes, B.D. (1990) Biochemistry 29, 4828.
84. Baleja, J.D., Germann, M.W., van de Sande, J.H. and Sykes, B.D. (1990) J. Mol. Biol. 215, 411.
85. Powers, R., Jones, C.R. and Gorenstein, D.G. (1990) J. Biomol. Struct. Dynamics 8, 253.
86. Katahira, M., Sugeta, H. and Kyogoku, Y. (1990) Biochemistry 29, 7214.
87. Kerwood, D.J., Zon, G. and James, T.L. (1991) Eur. J. Biochem. 197, 583.
88. Ito, N., Nakamura, H., Sumikawa, H. and Nagashima, N. (1991) J. Mol. Struct. 242, 119.
89. Cheng, J.-W., Chou, S.-H., Salazar, M. and Reid, B.R. (1992) J. Mol. Biol. 228, 118.
90. Ulyanov, N.B., Sarma, M.H., Zhurkin, V.B. and Sarma, R.H. (1993) Biochemistry 32, 6875.
91. Ulyanov, N.B., Gorin, A.A., Zhurkin, V.B., Chen, B.C., Sarma, M.H. and Sarma, R.H. (1992) Biochemistry 31, 3918.
92. Mujeeb, A., Kerwin, S.M., Kenyon, G.L. and James, T.L. (1993) Biochemistry 32, 13419.
93. Chuprina, V.P., Sletten, E. and Fedoroff, O. (1993) J. Biomol. Struct. Dynamics 10, 693.
94. Shapiro, L., Nilges, M. and Eriksson, M. (1993) Acta Chem. Scand. 47, 43.
95. Catasti, P., Gupta, G., Garcia, A.E., Ratliff, R., Hong, L., Yau, P., Moyzis, R.K. and Bradbury, E.M. (1994) Biochemistry 33, 3819.
96. Radha, P.K., Madan, A., Nibedita, R. and Hosur, R.V. (1995) Biochemistry 34, 5913.
97. Feng, B. and Stone, M.P. (1995) Chem. Res. Toxicol. 8, 821.
98. Sodano, P., Hartmann, B., Rose, T., Wain-Hobson, S. and Delepierre, M. (1995) Biochemistry 34, 6900.
99. Lingbeck, J., Kubinec, M.G., Miller, J., Reid, B.R., Drobny, G.P. and Kennedy, M.A. (1996) Biochemistry 35, 719.
100. Bernstein, F.C., Koetzle, T.F., Williams, G.J., Meyer, E.E., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T. and Tasumi, M. (1977) J. Mol. Biol. 112, 535.
101. Berman, H.M., Olson, W.K., Beveridge, D.L., Westbrook, J., Gelbin, A., Demeny, T., Hsieh, S.-H., Srinivasan, A.R. and Schneider, B. (1992) Biophys. J. 63, 751.
102. Herbert, A., Lowenhaupt, K., Spitzner, J., Berger, I. and Rich, A. (1995) in Biological Structure and Dynamics, (Sarma, R.H. and Sarma, M.H., eds), Vol. 2, p. 189. Adenine Press, Schenectady.
103. Orbons, L.P. and Altona, C. (1986) Eur. J. Biochem. 160, 141.
104. Klysik, J., Stirdivant, S.M., Larson, J., Hart, P.A. and Wells, R.D. (1981) Nature 290, 672.
105. Patel, D.J., Kozlowski, S.A., Hare, D.R., Reid, B., Ikuta, S., Lander, N. and Itakura, K. (1985) Biochemistry 24, 926.
106. Ikuta, S. and Wang, Y.S. (1989) Nucl. Acids Res. 17, 4131.
107. Vorlickova, M. (1995) Biophys. J. 69, 2033.
108. Riazance-Lawrence, J.H. and Johnson, W.C., Jr. (1992) Biopolymers 32, 271.
109. Ulyanov, N.B., Gorin, A.A. and Zhurkin, V.B. unpublished results.
110. Gorin, A.A., Zhurkin, V.B. and Olson, W.K. (1995) J. Mol. Biol. 247, 34.
111. Gorin, A.A., Zhurkin, V.B. and Olson, W.K. (unpublished results, cited from Ulyanov and James, Meth. Enzymol. 261, 90).
112. Calladine, C.R. (1982) J. Mol. Biol. 161, 343.
113. Ulyanov, N.B. and Zhurkin, V.B. (1984) J. Biomol. Struct. Dynamics 2, 361.
114. Poncin, M., Piazzola, D. and Lavery, R. (1992) Biopolymers 32, 1077.
115. Yoon, C., Prive, G.G., Goodsell, D.S. and Dickerson, R.E. (1988) Proc. Natl. Acad. Sci. USA 85, 6332.
116. Yuan, H., Quintana, J. and Dickerson, R. (1992) Biochemistry 31, 8009.
117. Yanagi, K., Prive, G.G. and Dickerson, R.E. (1991) J. Mol. Biol. 217, 201.
118. Fedoroff, O.Y., Reid, B.R. and Chuprina, V.P. (1994) J. Mol. Biol. 235, 325.
119. Chuprina, V.P., Lipanov, A.A., Fedoroff, O.Y., Kim, S.-G., Kintanar, A. and Reid, B.R. (1991) Proc. Natl. Acad. Sci. USA 88, 9087.
120. Olson, W.K. and Zhurkin, V.B. (1996) Biological Structure and Dynamics, (Sarma, R.H. and Sarma, M.H., eds), Vol. 2, p. 341. Adenine Press, Schenectady.
121. Bolshoy, A., McNamara, P., Harrington, R.E. and Trifonov, E.N. (1991) Proc. Natl. Acad. Sci. USA 88, 2312.
122. Calladine, C.R., Drew, H.R. and McCall, M.J. (1988) J. Mol. Biol. 201, 127.
123. Crothers, D.M., Haran, T.E. and Nadeau, J.G. (1990) J. Biol. Chem. 265, 7093.
124. Goodsell, D.S., Kaczor-Grzeskowiak, M. and Dickerson, R.E. (1994) J. Mol. Biol. 239, 79.
125. Allain, F.H.T. and Varani, G. (1995) J. Mol. Biol. 250, 333.
Standard DNA duplexes and RNA:DNA hybrids in solution 29
3
126. DiGabriele , A.D., Sanderson , M.R. an d Steitz, T.A. (1989 ) Proc. Nad. Acad. Sri. USA 86 , 1816. 127. Fawthrop , S.A. , Yang, J.C. an d Fisher, J. (1993) Nucl. Acids Res. 21, 4860. 128. Hagerman , P.J. (1986 ) Nature 321, 449 . 129. Koo , H.-S . an d Crothers, D.M . (1988 ) Proc. Natl. Acad. Sri . USA 85 , 1763 . 130. Sanghani , S.R., Zakrzewska , K., Harvey, S.C . and Lavery, R. (1996 ) Nucl. Acids Res. 24, 1632. 131. Lefevre , J.F., Lane , A.N. an d Jardetzky, O. (1987 ) Biochemistry 26 , 5076 . 132. Kennedy , M.A. , Nuutero , S.T. , Davis , J.T. , Drobny , G.P . an d Reid , B.R . (1993 ) Biochemistry 32 , 8022 . 133. Liepinsh , E. , Leupin, W . an d Otting, G. (1994) Nucl. Acids Res. 22, 2249 . 134. Kubinec , M.G. an d Wemmer, D.E . (1992 ) J. Am. Chem. Soc. 114, 8739 . 135. Liepinsh , E., Otting, G. and Wuthrich, K . (1992) Nucl. Adds Res. 20, 6549 . 136. Kearns , D.R. (1984 ) Crit. Rev. Biochem. 15 , 237 . 137. Lane , A. (1993 ) Progr. NMR Spectrosc. 25 , 481 . 138. Borer , P.N. , LaPlante , S.R. , Kumar , A. , Zanatta , N. , Martin , A. , Hakkinen , A . an d Levy, G.C. (1994 ) Biochemistry 33 , 2441 . 139. Alam , T.M., Orban , J. an d Drobny, G.P . (1991 ) Biochemistry 30 , 9229 . 140. Reid , B.R., Banks , K., Flynn, P. and Nerdal, W. (1989 ) Biochemistry 28, 10001 . 141. Lipari , G. and Szabo, A. (1982 ) J. Am. Chem. Soc. 104, 4546 . 142. Withka , J.M., Swaminathan , S., Srinivasan, J., Beveridge , D.L . an d Bolton, P.H . (1992 ) Science 255, 597 . 143. Koning , T.M.G. , Boelens, R. , va n der Marel, G.A. , va n Boom, J.H. an d Kaptein, R . (1991) Biochemistry 30 , 3787 . 144. Lane , A., Bauer, C.J. and Frenkiel, T.A. (1993 ) Eur. Biophys. J. 21 , 425 . 145. McAteer , K. , Ellis, P.D. an d Kennedy, M.A. (1995 ) Nucl. Adds Res. 23, 3962 . 146. Altona , C. and Sundaralingam, M . (1972 ) J. Am. Chem. Soc. 94, 8205. 147. Macaya , R.F., Schultze , P. and Feigon, J. (1992 ) J. Am. Chem. Soc. 114, 781 . 148. Emsley , L., Dwyer, T.J. , Spielmann , H.P. 
an d Wemmer, D.E . (1993 ) J. Am. Chem. Soc. 115, 7765 . 149. Beveridge , D., Swaminathan , S. , Ravishanker, G. , Withka, J., Srinivasan , J., Prevost , C. , Louise-May, S. , Langley, D., DiCapua , F . and Bolton, P.H . (1993 ) in Water and Biological Macromolecules, (Westhof , E., ed.) , p. 143 . CRC Press , Boca Raton . 150. Cheatha m III , T.E. an d Kollman, P.A . (1996 ) J. Mol. Biol. 259, 434 . 151. Gorin , A.A., Ulyanov, N.B . an d Zhurkin, V.B . (1990 ) Mol. Biol. 24, 1036 . 152. Torda , A.E. , Scheek , R.M. an d va n Gunsteren , W.F. (1991 ) i n Computational Aspects of the Study of Biological Macromolecules by Nuclear Magnetic Resonance Spectroscopy, (Hoch , J.C., ed.), p.219. Plenu m Press , New York . 153. Schmitz , U. , Kumar , A. and James, T.L . (1992 ) J. Am. Chem. Soc. 114, 10564 . 154. Nakamura , H. , Oda , Y. , Iwai , S. , Inoue , H. , Ohtsuka , E. , Kanaya , S. , Kimura , S. , Katsuda, C. , Katayanagi , K., Morikawa , K. , Miyashiro , H . an d Ikehara , M. (1991 ) Proc. Natl. Acad. Sri. USA 88 , 11535 . 155. Oda , Y. , Iwai , S. , Ohtsuka , E. , Ishikawa , M. , Ikehara , M . an d Nakamura , H . (1993 ) Nucl. Acids Res. 21, 4690 . 156. Reid , D.G. , Salisbury , S.A., Brown , T. , Williams , D.H. , Vasseur , J.J., Rayner , B . and Imbach, J.L. (1983) Eur. J. Biochem. 135, 307 . 157. Gupta , G. , Sarma, M.H. an d Sarma, R.H. (1985 ) J. Mol. Biol. 186, 463 . 158. Zimmerman , S.B . and Pheiffer, B.H . (1981 ) Proc. Natl. Acad. Sri. USA 78 , 78 . 159. Arnott , S. , Chandrasekaran , R., Millane, R.P . an d Park, H.S . (1986 ) J. Mol. Biol. 188 , 631.
294
Oxford Handbook of Nucleic Acid Structure
160. Egli , M., Usman , N . an d Rich, A. (1993 ) Biochemistry 32 , 3221 . 161. Ban , C., Ramakrishnan , B . and Sundaralingam, M . (1994 ) J. Mol. Biol. 236, 275 . 162. Salazar , M. , Fedoroff , O.Y. , Miller , J.M. , Ribeiro , N.S . an d Reid , B.R . (1993 ) Biochemistry 32 , 4207 . 163. Lane , A.N., Ebel , S . and Brown, T. (1993 ) Eur. J. Biochem. 215, 297 . 164. Gonzalez , C., Stec , W., Reynolds , M . an d James, T.L. (1995 ) Biochemistry 34 , 4969 . 165. Gao , X. an d Jeffs, P.W . (1994 ) J. Biomol. NMR 4 , 367 . 166. Salazar , M., Fedoroff , O., Zhu , L . and Reid, B.R. (1994 ) J. Mol. Biol. 241, 440 . 167. Fedoroff , O. , Salazar , M. and Reid, B.R. (1996 ) Biochemistry 35 , 11070 . 168. Salazar , M., Champoux , J.J. an d Reid, B.R. (1993 ) Biochemistry 32 , 739 . 169. Nishizaki , T. , Iwai , S. , Ohkubo , T. , Kojima , C. , Nakamura , H. , Kyogoku , Y . an d Ohtsuka, E. (1996 ) Biochemistry 35 , 4016 . 170. Jaishree , T.N., va n der Marel, G.A. , van Boom, J.H. an d Wang, A.H. (1993 ) Biochemistry 32, 4903 . 171. Zhu , L. , Salazar, M. an d Reid, B.R. (1995 ) Biochemistry 34 , 2372 . 172. Chou , S.-H. , Flynn, P . and Reid, B.R. (1989 ) Biochemistry 28 , 2435 . 173. Chou , S.-H. , Flynn , P., Wang, A . and Reid, B. (1991 ) Biochemistry 30 , 5248 . 174. Shindo , H. an d Matsumoto, U . (1984 ) J. Biol. Chem. 259, 8682 . 175. Steely , H.T., Gray , D.M . an d Ratcliff, R.L . (1986 ) Nucl. Acids Res. 24, 10071 . 176. Benevides , F.C. , Koetzle , T.F. , Williams , G.J.B. , Meyer , E.F. , Brice , M.D. , Rodgers , J.R., Kennard , O., Shimanouchi , T . an d Tasume, M . (1988 ) Biochemistry 27 , 3868. 177. Fedoroff , O.Y. , Salazar , M. and Reid, B.R. (1993 ) J. Mol. Biol. 233, 509 . 178. Wang , A.H.J. , Fujii , S. , Van Boom , J.H. , Va n de r Marel , G.A. , Va n Boeckel , C.A.A . and Rich , A. (1982 ) Nature 299, 601 . 179. Egli , M. , Usman , N., Zhang , S . and Rich, A. (1992 ) Proc. Nad. Acad. Sci. USA 89 , 534 . 180. Amott , S. , Chandrasekaran, R. 
, Puigjaner , L.C., Walker , J.K., Hall , I.H. , Birdsall , D.L . and Ratcliff , R.L. (1983 ) Nucl. Acids Res. 11, 1457 . 181. Tate , S., Ono, A . and Kainosho, M . (1994 ) J. Am. Chem. Soc. 116, 5977 . 182. Zimmer , D.P. an d Crothers, D.M . (1995 ) Proc. Natl. Acad. Sci. USA 92 , 3091 . 183. Yamakage , S.I., Maltseva, T.V., Nilson , F.P. , Foldesi , A. and Chattopadhyaya , J. (1993 ) Nucl. Acids Res. 21, 5005 . 184. Agback , P. , Maltseva , T.V. , Yamakage , S.I. , Nilson , F.P. , Foldesi , A . an d Chattopadhyaya, J. (1994) Nucl. Adds Res. 22, 1404 . 185. LaPlante , S.R. , Zanatta , N. , Hakkinen , A. , Wang , A.H . an d Borer , P.N . (1994 ) Biochemistry 33 , 2430 .
9
Nucleic acid hydration
Helen M. Berman1 and Bohdan Schneider2
1 Department of Chemistry, Rutgers University, Piscataway, NJ 08854-8087, USA
2 J. Heyrovsky Institute of Physical Chemistry, Academy of Sciences of the Czech Republic, 18223 Prague, Czech Republic
1. Introduction
It is perhaps only a small exaggeration to say that the timing of the birth of modern molecular biology was dependent on selecting the DNA sample with the correct water content. Franklin and Gosling (1) first observed that as the humidity of the sample increased, the characteristics of the fibre diffraction pattern changed. The low humidity A-form was apparently more crystalline and was, therefore, the initial focus of their attention. However, the high humidity B-form was more interpretable because it yielded the characteristic helical diffraction pattern. Once attention was given to this form, the double helical structure of DNA was discovered (2).

Fibre diffraction (3,4) studies established that the B-form of DNA is the long, slender, right-handed helix that has now become an icon of biology. A-DNA is shorter and squatter, with the bases inclined to the helix axis. These studies also confirmed that the presence of ions and solvent plays a very strong role in determining which conformation a given DNA will adopt. The challenge is to find out why this is so.

Early solution studies introduced the concept of hydration shells and differentiated this water from the bulk solvent (5,6). Numerous experimental and theoretical studies have given further insight into the effects of sequence and environment on DNA structure. The ability to crystallize short, defined sequences of nucleic acids has made it possible to visualize the bound water using the methods of X-ray crystallography. Newly developed NMR techniques have allowed us to obtain a dynamic picture of the waters and their interactions with nucleic acids.

The importance of water in macromolecular recognition is well appreciated, if not fully understood.
The careful balance between the enthalpic contribution of hydrogen bonding and the entropic consequences of disrupting those bonds drives the interactions between nucleic acids and other molecules including drugs and proteins (7).

This chapter will summarize the results of some recent studies of the behaviour of nucleic acids in solution, and then present the current state of our knowledge about the structure of water around nucleic acids as derived from X-ray, NMR, and theoretical analyses. Reviews of some of the earlier work can be found in several sources (8–15).
2. Macroscopic studies
Many different studies on the behaviour of DNA and RNA in solution and in fibres have provided data about how water influences the behaviour of these molecules. The results of thermodynamic studies of nucleic acid duplexes (16) and their complexes with drugs (7,17) have been interpreted in terms of the influence of hydration on the structure and interactions of nucleic acids. Changes in entropy in particular have been correlated with transition or binding events that induce ordered solvent to be released to the bulk medium. Magnetic densimetric techniques have been used to measure ligand binding-induced volume changes, which have been found to correlate with entropy changes (18). In these studies, volume increases correspond to entropy increases which, in turn, are thought to relate to the release of bound water; volume contractions are proposed to be associated with net hydration.

By judicious choice of samples, it is possible to relate the macroscopic behaviour to specific microscopic properties, such as sequence and conformation. Calorimetric and densimetric measurements (19) suggest that B-form homoduplexes are more hydrated than their A-form counterparts. The formation of bulged DNA is accompanied by volume contractions, indicating that there is more coulombic hydration (20). These experiments suggest that favourable changes in the thermodynamics of hydration compensate for the otherwise destabilizing effect of the bulge. The observation that the volume contracts more when distamycin binds to alternating AT polymers than when it binds to homopolymers was interpreted as an indication of higher hydration of homopolymeric duplexes. In addition, it was suggested that drug complexation is accompanied by an increase in hydration, which may result from the strengthening of the hydrogen bonded water network by the hydrophobic groups of the ligand (21).
Other thermodynamic experiments have led to the conclusions that parallel DNA is less hydrated than antiparallel DNA (22), and that AT homopolymers have different hydration properties than their GC counterparts (23).

Both the partial molar volume and the partial molar adiabatic compressibility are extremely sensitive to solute hydration (24). These have been measured for DNA alone and complexed with netropsin using densimetric and newly developed ultrasonic techniques (25,26). It was found that the coefficient of adiabatic compressibility of the first hydration shell is significantly different from that of bulk water. It was also shown that duplexes with 55–60% AT compositions exhibit the weakest hydration; increases or decreases in AT content from this range lead to enhanced hydration. Although all of the B-DNA sequences studied showed the same total quantity of water, the fact that poly(dA):poly(dT) homopolymers are thought to be more hydrated than alternating poly(dAdT):poly(dAdT) copolymers may be a consequence of the fact that they have stronger DNA–water interactions. However, this possibility is not confirmed by the results of the volumetric measurements; both types of AT duplexes exhibit similar values for both the partial molar volume and the partial molar compressibility. Thus, further studies are required to understand better the hydration properties of these DNA polynucleotides.

Osmotic stress measurements have given another view about the role of water in intermolecular interactions (27,28). Osmotic stress is applied to an array of DNA molecules and their intermolecular separations are measured by diffraction techniques.
The results of the experiments are interpreted to mean that cations between helices reorganize the water in such a way as to balance repulsive forces with long-range attractive hydration forces. The concept of positive hydration forces may lead to a simpler physical model than the one implied by the more traditional concepts about hydrophobic interactions.

Osmotic stress has also been used to study the interactions of EcoRI and DNA (29). It has been known for some time that under some conditions, the enzyme has reduced sequence specificity; this has been called 'star' activity. When the DNA cleavage reaction was measured in the presence of several osmolytes, the 'star' activity was demonstrated to be directly related to osmotic pressure. At high pressures, the water activity is lowered and, with it, the specificity. The interpretation of these results is that the bound water at the DNA–protein interface is key to the molecular specificity.
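Osmotic stress data of this kind are commonly turned into an estimate of the number of water molecules coupled to a reaction. The chapter does not give the relation, so the following is only a minimal sketch under a stated assumption: the linkage relation d(ln K)/d(osmolality) = -ΔNw/55.5, where K is an equilibrium (or apparent rate) constant, ΔNw is the change in the number of osmolyte-excluded waters (products minus reactants), and 55.5 mol/kg is the molality of pure water. The function name and the data values are hypothetical and serve only to illustrate the fit.

```python
import numpy as np

WATER_MOLALITY = 55.5  # mol of water per kg of water


def waters_from_osmotic_stress(osmolality, ln_k):
    """Estimate delta_n_w from the slope of ln K against osmolality.

    Assumes the linkage relation d(ln K)/d(osmolality) = -delta_n_w / 55.5.
    A negative delta_n_w means that water is released in the reaction.
    Returns (delta_n_w, ln K extrapolated to zero osmolality).
    """
    osmolality = np.asarray(osmolality, dtype=float)
    ln_k = np.asarray(ln_k, dtype=float)
    # Degree-1 polyfit returns the coefficients as [slope, intercept].
    slope, intercept = np.polyfit(osmolality, ln_k, 1)
    return -slope * WATER_MOLALITY, intercept


# Hypothetical, noise-free data for a process that releases 30 waters.
osm = np.array([0.0, 0.5, 1.0, 1.5, 2.0])  # osmolal
true_delta_n_w = -30.0
ln_k = 2.0 - (true_delta_n_w / WATER_MOLALITY) * osm

delta_n_w, ln_k0 = waters_from_osmotic_stress(osm, ln_k)
print(f"estimated change in bound water: {delta_n_w:.1f}")
```

With real measurements the points scatter about the line, and the uncertainty of the slope, rather than the slope alone, determines how many waters can be claimed.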
3. Structural analyses of nucleic acid hydration
3.1 Early studies and methods of analysis
The early crystal structure determinations of dinucleoside phosphates (30–32) demonstrated the presence of ordered water in crystals and led to the concept of water involvement in the recognition process (33). The structure of a dinucleoside phosphate complexed to the small molecule drug proflavine showed an elegant pentagonal network of water molecules reminiscent of clathrate structures around hydrophobic small molecules (34). This structure provided a test bed for many subsequent theoretical studies (35–37). The results of these early structural analyses gave some important insights into the hydration of nucleic acids. However, it was the observation of the spine of hydration in the first B-DNA structure (38) that made it necessary to consider seriously the concept of water as an integral part of nucleic acids.

The first structural studies of the hydration of nucleic acids were done using X-ray crystallographic methods, which allow us to observe a time-averaged view of the atoms in a crystal. Therefore, if a water molecule is exchanging between one site on a molecule and the bulk solvent, it will be observed on the electron density map; the rate of exchange does not affect its observation. On the other hand, if a water molecule occupies multiple sites it will be difficult to observe. In recent years, NMR methods have been developed that not only allow the determination of macromolecular structures, but also provide a view of hydration structure around these molecules. Unlike diffraction methods, which give a time-averaged view, NMR methods provide a dynamic view and thus depend on the rate of exchange of the water molecule with the bulk solvent.
If the rate of exchange is slow, the water molecule will be observed; if the water molecule is rapidly exchanging even between a single site and the surrounding water, it will not be detectable. Both methods are thus complementary and offer us insight into the characteristics of the hydration structure of nucleic acids. In the following discussion, we give the results of both methods for the various types of nucleic acid structures.
3.2 B-DNA
There are a few distinctive hydration motifs observed in B-DNA helices. The spine of hydration first seen in the crystal structure of a dodecamer containing the EcoRI restriction site sequence d[GAATTC]2 (BDL001) (39) has been observed in many other structures of B-DNA. The spine is found in the minor groove of AATT regions. First shell waters hydrogen bonded to the purine N3 or pyrimidine O2 atoms are bridged by second shell waters (Fig. 9.1a). Additional contacts are made with the O4' atoms of the sugars. In crystal structures of oligonucleotides with wider minor grooves, such as d[CCAACGTTGG]2 (BDJ019) (40), double rows of waters can be accommodated into the minor groove (Fig. 9.1b).

There are also distinctive hydration patterns seen in the major groove. In d[CGATCGATCG]2 (BDJ025) (41) (Fig. 9.1c), for example, a spine of first shell waters interconnects the hydrophilic atoms in the CGAT sequence. In this particular example, the hydration patterns around the two GA steps show differences that are directly correlated with the differences in their helical twist.

Fig. 9.1. The hydration patterns observed in B-DNA. The hydrogen bonds are shown as dashed lines. (a) The spine of hydration in the minor groove of d[CGCGAATTCGCG]2 (BDL001) (39). The first shell waters are hydrogen bonded to the purine N3 and pyrimidine O2 atoms and are shown as large spheres. The second shell waters bridge the first shell waters and are shown as small spheres. (b) The double row of waters in the minor groove of d[CCAACGTTGG]2 (BDJ019) (40). The waters making links between the base-attached waters are shown as smaller spheres. (c) Major groove hydration in d[CGATCGATCG]2 (BDJ025) (41). Two waters attached to guanine make hydrogen bonds to a water attached to thymine. When the helical twist is lower than that shown, another hydrogen bond forms between the waters attached to guanine N7 and adenine N6, thus forming a more extensive network. (d) Phosphate hydration in d[CTCTCGAGAG]2 (BDJ060) (47). Hydration spheres of the top and middle phosphates are linked by a hydrogen bond, while waters hydrating O1P and O2P of the same phosphate are far apart. Dotted lines show ~3.6 Å long contacts between waters hydrogen bonded to O2P and the C6 atoms of adjacent pyrimidine bases.

Systematic analyses of the patterns of hydration around the bases, sugars, and phosphates (42–44) contained in DNA crystal structures have led to a much clearer understanding of the basis of the networks in DNA double helices. When all bases of a single type, with their associated water molecules, are superimposed to create hydrated building blocks, there are clusters of water associated with each polar atom (Fig. 9.2a). If these building blocks are modelled into known B-DNA structures, the hydration patterns in the grooves are reproduced (43). Furthermore, building blocks created from decamer structures alone could be used to model the hydration patterns in a dodecamer sequence (45) (Fig. 9.3). This means that the hydration patterns around the bases are local. In that same study, it was demonstrated that, in principle, the spine of hydration can be formed by both AATT and GGCC sequences. If GGCC could form the same conformation as AATT, resulting in the narrow minor groove dimensions, then it too could nucleate a spine. Thus, the hydration pattern of the central AATT in the dodecamer structure is as much a function of the local base conformation as it is of the hydrogen bonding potential of the base.

Fig. 9.2. Pseudo-electron densities of water around bases in B-, A-, and Z-type DNA conformations. The major groove is on the upper left side of each base; the minor groove on the lower right side. (a) The purines have two principal hydration sites in the major groove and one in the minor groove; pyrimidines have one such site in each groove. Low but significant densities near pyrimidine atom C6 reflect the existence of waters trapped between phosphate atom O2P and pyrimidine C6 (see Fig. 9.1d). (b) A-DNA base hydration is similar to that in B-DNA. Hydration in the major groove of A-DNA is more extensive than in the minor groove. The difference is quite striking for guanine. (c) Hydration sites of bases in the Z conformation are different from both right-handed conformations. In the minor groove, guanine N2 rather than N3 is hydrated, and cytosine has two hydration sites. In the major groove, hydration of guanine is quite distinct with four non-planar hydration sites, two hydrating O6, and two N7.

Fig. 9.3. A stereogram of the pseudo-electron density for the d[CGCGAATTCGCG]2 (BDL001) (39) dodecamer structure. The five strongest densities in the front reproduce the spine of hydration.

In contrast to the hydration geometry around the bases, which shows very strong clustering, the hydration around phosphates is more variable and conformation dependent. Analyses of known crystal structures and theoretical analyses suggest that each charged oxygen can be surrounded by up to three water molecules which can be arranged in a cone of hydration (46). An example of the types of water patterns that can be formed around the phosphate backbone is shown in the structure of d[CTCTCGAGAG]2 (BDJ060) (47) (Fig. 9.1d).

Of considerable interest is whether the hydration seen in crystal structures can be observed in solution (48). NMR methods using a combination of nuclear Overhauser (NOESY) and rotating frame nuclear Overhauser spectroscopies (ROESY) (49) have been successfully employed to study hydrated DNA (50–52). These studies confirm the existence of the spine of hydration in the minor groove of DNA containing AATT segments. A very recent study showed that while in some sequences minor groove hydration of TTAA segments is kinetically destabilized (51), there are sequences where this is not the case (52). It has been assumed that the width of the minor groove is directly related to the stability of the spine and that the TTAA segment would have a wide groove. However, the authors point out that NMR methods do not give accurate information about groove width and there are not enough X-ray structures to be able to predict the groove width of a particular sequence because, among other things, we do not, as yet, know the effects of flanking sequences. More studies are needed to 'confirm this putative connection between hydration lifetimes, minor groove width, and nucleotide sequence' (52).

Theoretical studies of DNA hydration have been reviewed elsewhere (9,53). One very recent molecular dynamics simulation gives the very stimulating result that in over half the trajectory, a sodium ion, rather than a water molecule, is found in the A-T step of the minor groove (54). This type of geometry was observed in the high resolution crystal structure of ApU (32) but it has never been seen, as yet, in DNA oligomer crystals. The lower resolution of these structures makes it difficult to distinguish sodium ions from water molecules, especially if the sites are not fully occupied. The results from the theoretical analysis strongly suggest that at least some of the water molecules found in X-ray structures are actually ions and/or the hydration sites are partially occupied by ions. Further experimental analyses at much higher resolution are needed to resolve this issue.
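The building-block analysis described in this section, in which all bases of one type are superimposed together with their associated waters and the pooled waters are examined for clusters, can be sketched in a few lines. This is not the procedure of refs 42–45; it is a minimal illustration that assumes a least-squares (Kabsch) superposition on the base atoms and a simple count of waters per grid voxel as a stand-in for the pseudo-electron density. All coordinates, the grid spacing, and the function names are hypothetical.

```python
import numpy as np


def kabsch(mobile, target):
    """Rotation and translation that best superimpose mobile onto target.

    Both inputs are (N, 3) arrays of matched atoms; the transformed
    coordinates are mobile @ rotation.T + translation.
    """
    mobile_centre = mobile.mean(axis=0)
    target_centre = target.mean(axis=0)
    covariance = (mobile - mobile_centre).T @ (target - target_centre)
    u, _, vt = np.linalg.svd(covariance)
    # Force a proper rotation (determinant +1), never a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = target_centre - rotation @ mobile_centre
    return rotation, translation


def pool_waters(fragments, reference):
    """Bring the waters of every (base_atoms, waters) pair into the reference frame."""
    pooled = []
    for base_atoms, waters in fragments:
        rotation, translation = kabsch(base_atoms, reference)
        pooled.append(waters @ rotation.T + translation)
    return np.vstack(pooled)


def density_grid(points, half_width=8.0, spacing=0.5):
    """Count pooled waters per voxel of a cubic grid centred on the origin."""
    edges = np.arange(-half_width, half_width + spacing, spacing)
    counts, _ = np.histogramdd(points, bins=(edges, edges, edges))
    return counts, edges


# Hypothetical planar "base" (four atoms) with one hydration site, in angstroms.
rng = np.random.default_rng(0)
reference = np.array([[0.0, 0.0, 0.0], [1.4, 0.0, 0.0],
                      [2.1, 1.2, 0.0], [0.7, 2.4, 0.0]])
site = np.array([3.25, 1.25, 0.75])
spacing = 0.5

# Twenty copies of the base in random orientations, each with a slightly
# displaced water, mimic the same base type seen in different crystals.
fragments = []
for _ in range(20):
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1.0  # turn an improper rotation into a proper one
    shift = rng.uniform(-20.0, 20.0, size=3)
    water = site + rng.normal(scale=0.1, size=3)
    fragments.append((reference @ q.T + shift, (water @ q.T + shift)[None, :]))

pooled = pool_waters(fragments, reference)
counts, edges = density_grid(pooled, spacing=spacing)
peak = np.unravel_index(np.argmax(counts), counts.shape)
peak_centre = np.array([edges[i] + spacing / 2 for i in peak])
print("densest voxel centre:", peak_centre)
```

In this toy case the densest voxel falls at the hydration site that was built into the data. With real structures, the height of each peak reflects how often the site is occupied across the sample of bases.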
3.3 A-DNA
One of the earliest oligonucleotide crystal structures to be reported, d[GGBr5UABr5UACC]2 (ADHB11) (55), contains fused pentagonal rings of water in the major groove (Fig. 9.4). Since this first observation, numerous A-DNA structures have been determined and many exhibit complex and interesting hydration patterns.

Fig. 9.4. Ordered water network in d[GGBr5UABr5UACC]2 (ADHB11) (55). Water molecules associated with bridges are shown as large spheres; other waters as smaller spheres.

In some cases, as in d[GGGTACCC]2 (ADH030, ADH031), temperature strongly affects the hydration pattern (56). On the other hand, in d[GGGCGCCC]2 (ADH057), this is not the case. Here, the structure of the duplex was determined at three different temperatures and the hydration patterns were very similar (57). Only the hydration of the phosphate backbone is less conserved.

In an analysis of five A-DNA crystal structures that contain CG in the central part of their sequence, certain common features were observed (57). The most striking is a chain of water molecules in the minor groove that interconnect the central CG to the backbone atoms of symmetry-related molecules (Fig. 9.5a). In these same structures, there are water-mediated groove–groove and groove–backbone interactions (Fig. 9.5b). In two cases, the second type of water bridge is also involved in pentagonal networks in the crystal.

Fig. 9.5. (a) Water network at the CG step in an A-DNA structure d[GGGGCCCC]2. (b) Intermolecular water-mediated groove–groove and groove–backbone interactions in A-DNA structures. (From ref. 57 with permission.)

Systematic analysis of the base hydration in A-DNA duplexes shows the same type of tight clustering of waters around the base heteroatoms exposed to solvent (43) (Fig. 9.2b). The individual hydration sites for bases in the A and B conformations are very similar, with the major difference being in the relative occupancies of the waters in the major and minor grooves. In B-DNA, the waters are localized equally well in both grooves, whereas in A-DNA more waters are found in the major groove than in the minor groove.

Some studies of the phosphate hydration show that, as in B-DNA, the hydration sites are less well conserved (57). In many, but certainly not all, A-DNA structures, there are water bridges between adjacent phosphates in a strand. This feature led to the concept of the 'economy of hydration' (58) which is suggested as a driving force in the B to A transition when the humidity is lowered.

3.4 Z-DNA
Z-DNA duplexes show very distinctive hydration patterns. A spine of hydration is formed in the very deep minor groove. A network of water molecules is formed by water molecules connected to O2 atoms of cytosines from opposite strands which are further hydrogen bonded to second shell water molecules (Fig. 9.6a). In a detailed analysis of the crystal structure of d[CGCGCG]2 (59), it was shown that in the convex major groove there are bridges between the two guanine O6 atoms at GpC steps from opposite strands and between the two cytosine N4 atoms in CpG steps (Fig. 9.6b).

Fig. 9.6. (a) Spine of hydration in the minor groove of a Z-DNA duplex, d[CGU'ACG]2 (ZDFB31) (60). Water molecules hydrogen bonded to the bases are drawn as large spheres, other waters as smaller spheres. Note that some waters hydrogen bond to phosphate oxygens. (b) Cross-strand O6-w-O6 and N4-w-w-N4 water bridges in the major groove of a Z-DNA helix d[CGCGCG]2 (59). Water molecules associated with bridges are shown as large spheres; other waters as smaller spheres.

In addition to the intrahelix networks found in Z duplexes, there are bridges that connect the helices in the crystal. An analysis of these bridges shows that their presence may be related to the ZI/ZII conformation found in step 4-5 of many Z-DNA structures (Fig. 9.7) (60).

Fig. 9.7. Interhelical water bridges between Z helices. The water bridges (labelled A-E) occur between the phosphates in the ZII conformation at step 4-5 and adjacent helices. (From ref. 60 with permission.)

As in the other DNA helix types, the hydration sites around the bases are tightly clustered. The structure of the hydration shell around the bases in Z-DNA is very different and more complex than that in B- or A-DNA structures. In the minor groove, the primary hydration site of guanine is N2, rather than N3 as in the A- and B-conformations. In the major groove, cytosine has one localized hydration site in the base plane (in a position similar to B-DNA). Hydration of purines is concentrated into three major and one lesser sites, all of which lie outside the base plane (43).
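Water bridges of the kind discussed in the preceding sections (between adjacent phosphates in A-DNA, or the cross-strand O6-w-O6 bridges in Z-DNA) are usually located in a refined structure by a geometric search. The sketch below is one simple way to do this and is not the method used in the cited studies: it assumes that a heavy-atom distance of at most 3.4 Å between a water oxygen and a polar atom counts as a hydrogen bond, and it ignores hydrogen positions and angles. The cutoff, the atom labels, and the coordinates are hypothetical.

```python
import numpy as np


def find_water_bridges(waters, polar_atoms, labels, cutoff=3.4):
    """List (water_index, label_a, label_b) for waters contacting two polar atoms.

    waters and polar_atoms are (N, 3) and (M, 3) coordinate arrays in angstroms;
    labels holds one name per polar atom. The cutoff is an assumed
    heavy-atom hydrogen-bond distance limit.
    """
    waters = np.asarray(waters, dtype=float)
    polar_atoms = np.asarray(polar_atoms, dtype=float)
    # Distance from every water to every polar atom, shape (N, M).
    dist = np.linalg.norm(waters[:, None, :] - polar_atoms[None, :, :], axis=-1)
    bridges = []
    for w, row in enumerate(dist):
        close = np.flatnonzero(row <= cutoff)
        # Each pair of polar atoms contacted by the same water is one bridge.
        for i in range(len(close)):
            for j in range(i + 1, len(close)):
                bridges.append((w, labels[close[i]], labels[close[j]]))
    return bridges


# Hypothetical coordinates: two guanine O6 atoms 5 A apart and a distant phosphate oxygen.
polar_atoms = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 8.0, 0.0]]
labels = ["G2:O6", "G8:O6", "C3:O2P"]
waters = [[2.5, 1.3, 0.0],      # about 2.8 A from both O6 atoms
          [10.0, 10.0, 10.0]]   # bulk water, no contacts

bridges = find_water_bridges(waters, polar_atoms, labels)
print(bridges)
```

Two-water bridges such as N4-w-w-N4 would need the same distance test applied between water oxygens as well, which this sketch leaves out.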
3.5 RNA In DN A i t is clear that th e wate r molecule s bonde d t o th e base s ar e a n integra l par t o f the structure . I n RN A th e pictur e i s far more complex , wit h th e hydratio n patterns of the suga r phosphat e backbon e playin g a dominan t structura l role . In a comparativ e stud y o f fou r t R N A crysta l structures, i t ha s been show n tha t th e sugar group s ar e muc h mor e hydrate d tha n thos e i n DN A (61) . Althoug h th e struc tures ar e at relativel y low resolution , mor e tha n 40 % o f th e wate r site s are th e sam e i n all fou r structures . Th e helica l stern s have repetitive hydration patterns, man y o f whic h involve th e O2 ' hydroxy l groups . Th e unusua l bas e pair s foun d i n abundanc e i n t R N A exhibi t wate r bridge s betwee n th e bas e an d th e backbon e atoms . O f mos t interest i s the fac t tha t wate r site s are conserve d i n th e loo p area s and a t th e site s o f th e tertiary interactions . Th e author s o f th e stud y conclud e tha t wate r molecule s ma y indeed b e relate d t o th e stabilizatio n of thes e interactions . More recen t hig h resolutio n studie s of RNA duplexe s als o demonstrat e th e divers e roles tha t wate r play s i n thes e structures . I n on e stud y o f tw o RN A octamer s (62,63) , the 2'-O H group s ar e hydroge n bonde d t o wate r molecule s an d for m a repetitiv e hydration patter n i n th e mino r groov e (Fig . 9.8). Th e majo r groov e als o ha s a net work o f hydroge n bond s tha t involve s th e wate r molecules , th e phosphat e oxygens , and th e hydrophili c bas e atoms . Th e author s sugges t tha t th e hydratio n o f th e 2'-OH grou p ma y contribut e t o th e greate r rigidit y o f A - R N A duplexe s compare d with A-DNA . I n man y way s the wate r pattern s seen i n thi s structur e are analogou s t o
Fig. 9.8. An example of a water bridge in the minor groove of an RNA helix involving the O2' hydroxyl of the ribose sugar (62).
those observed in the highly hydrated structure of collagen in which the hydroxyl group on the hydroxyproline (64) appears to play a synergistic role with the water molecules in stabilizing the conformation.
Water is also involved in the base mismatches that are seen in RNA structures that contain internal loops. In the G:U pairs, waters bridge the N2 of the guanine and the O2' hydroxyl in the minor groove (65). In G:T pairs in DNA, the water bridges the N2 and the O2 atoms (66,67). The U:C pairs are even more unusual because there is only one hydrogen bond between the base atoms in the pair. The second base-base link is mediated by a water bridge.
3.6 Drug-nucleic acid complexes
The structure of d(CpG)-proflavine provided the first example of an ordered water network in a DNA-drug complex (34) (Fig. 9.9). The water molecules hydrogen bonded with the base and drug heteroatoms in each complex associate with water molecules in symmetry-related complexes to form the pentagonal arrays characteristic of this crystal (68). The intriguing quality of these networks, as well as the high resolution of the structure analysis, made this particular structure a benchmark for several theoretical analyses that, to varying degrees, were able to reproduce the experimental results (35-37).
Fig. 9.9. Pentagonal water network in the crystal structure of d(CpG)-proflavine. (From ref. 34 with permission.)
While the water molecules observed in the d(CpG)-proflavine structure are perhaps more important in the crystalline interactions, there are now examples in which water
Fig. 9.10. Water-mediated bridges between three DNA bases and four amino acid side chains in the trp repressor-operator complex (73).
is situated at the interface between the drug and the DNA. This is seen in intercalated complexes with daunomycin analogues (69,70) as well as in complexes between DNA and groove binders (71,72).
What is not clear at this point is whether the presence of these water molecules at the interaction site is fortuitous or whether they play a role in recognition and specificity. Further solution studies, as well as theoretical calculations, will be needed to determine this.
3.7 Protein-DNA complexes
The importance of water in mediating protein-DNA interactions was first demonstrated in the crystal structure of a complex between the trp repressor and its target DNA (73). There is only one direct base contact but there are four water-mediated contacts involving three base pairs (base pairs 5, 6, 7), four amino acid residues, and three water molecules in each of the symmetrical half-sites (Fig. 9.10). Recent mutagenesis studies of this system (74) show that if the G6 is changed to A, affinity is diminished. However, this is reversed if A5 is simultaneously changed to G. This is explained by consideration of hydrogen bonding patterns involving the water (Fig. 9.11).
In a comparative study of the crystal structures of an uncomplexed decamer containing the six base pair recognition site and the DNA found in the trp repressor-DNA complex (75), it was found that there are 10 conserved water molecules in the major groove. These conserved water molecules include the three that are involved in the protein interactions, and the authors conclude that these waters are an integral part of the DNA.
Since the first observation of water-mediated protein-DNA interactions, others have been observed (76,77) in crystals of DNA-protein complexes.
NMR investigations have also indicated that water-mediated interactions between DNA and proteins exist in solution (78). A combination of NMR and molecular dynamics simulation of an Antennapedia homeodomain-DNA complex provides further insight into the role of
Fig. 9.11. (a) Water mediating the interactions between the amino acid amide group and G6 and A5 in the wild type trp repressor-operator complex. (b) The same bridge in the G5:A6 double mutant. (From ref. 74 with permission.)
water at the protein-DNA interface (79). In this case, the situation is more complex than in the trp system in that water-mediated contacts coexist with direct contacts, leading to several different contact geometries. The authors suggest that the specificity is a result of the rapid interconversion of the ensemble of structures. This has the interesting consequence of reducing the entropic cost of complex formation.
4. Summary
There is an accumulating body of evidence suggesting that water plays a key role in modulating the conformations, interactions, and recognition properties of nucleic acids. Physical and biochemical methods continue to be developed that provide strong circumstantial evidence that the hydration characteristics of nucleic acids must always be taken into account in trying to understand their macroscopic behaviour.
Structures of nucleic acids derived by X-ray methods have displayed a variety of networks of waters associated with the molecules and their complexes. NMR methods have confirmed that at least some of these water molecules are kinetically stable. Systematic analysis of hydrated structures has allowed us to determine the likely positions of the waters associated with the bases, as well as the locus of those associated with phosphate groups. The fact that it is possible to predict the positions of waters in the grooves of nucleic acids with known conformations is yet another indication that water should be considered an integral part of nucleic acids.
The next challenge is to use this knowledge about hydration, which has been derived from structural studies, to produce physical models for the solution and biochemical behaviour of nucleic acids. Although it is not possible to do this now, it is not overly optimistic to think that this goal is achievable in the foreseeable future.
Acknowledgements
We wish to acknowledge the wonderful and stimulating discussions we have had over the years with David Beveridge, Ken Breslauer, and Stephen Neidle, who continue to influence our thinking about hydration. We thank Christine Zardecki for her help with this manuscript and Eric Plum, T.V. Chalikian, and Rachel Kramer for reviewing the manuscript. We are also grateful for the continued funding of this work by the NIH.
References
1. Franklin, R.E. and Gosling, R.G. (1953) Nature 171, 740.
2. Watson, J.D. and Crick, F.H.C. (1953) Nature 171, 737.
3. Arnott, S. (1970) Progr. Biophys. Mol. Biol. 21, 267.
4. Arnott, S., Campbell Smith, P.J. and Chandrasekaran, R. (1976) in CRC Handbook of Biochemistry and Molecular Biology: Nucleic Acids, (Fasman, G.D., ed.), pp. 411-422. CRC Press, Cleveland.
5. Saenger, W. (1983) Principles of Nucleic Acid Structure, Springer Advanced Texts in Chemistry, (Cantor, C.R., ed.). Springer-Verlag, Berlin.
6. Texter, J. (1978) Progr. Biophys. Mol. Biol. 33, 83.
7. Breslauer, K.J., Remeta, D.P., Chou, W.-Y., Ferrante, R., Curry, J., Zaunczkowski, D., Snyder, J. and Marky, L.A. (1987) Proc. Natl. Acad. Sci. USA 84, 8922.
8. Westhof, E. (1987) Int. J. Biol. Macromol. 9, 186.
9. Westhof, E. and Beveridge, D.L. (1989) Water Sci. Rev. 24.
10. Westhof, E. (1993) Water and Biological Macromolecules. Topics in Molecular and Structural Biology, (Westhof, E., ed.). CRC Press, Boca Raton.
11. Westhof, E. (1988) Annu. Rev. Biophys. Biophys. Chem. 17, 125.
12. Berman, H.M. (1986) in Computer Simulation of Chemical and Biomolecular Systems, (Beveridge, D.L. and Jorgensen, W.L., eds), pp. 166-178. New York Academy of Science, New York.
13. Berman, H.M. (1991) Curr. Opin. Struct. Biol. 1, 423.
14. Berman, H.M. (1994) Curr. Opin. Struct. Biol. 4, 345.
15. Jeffrey, G.A. and Saenger, W. (eds) (1991) Hydrogen Bonding in Biological Structures. Springer-Verlag, New York.
16. Breslauer, K.J. (1991) Curr. Biol. 1, 416.
17. Marky, L.A. and Breslauer, K.J. (1987) Proc. Natl. Acad. Sci. USA 84, 4359.
18. Rentzeperis, D., Marky, L.A. and Kupke, D.W. (1992) J. Phys. Chem. 96, 9612.
19. Rentzeperis, D., Kupke, D.W. and Marky, L.A. (1993) Biopolymers 33, 117.
20. Zieba, K., Chu, T.M., Kupke, D.W. and Marky, L.A. (1991) Biochemistry 30, 8018.
21. Rentzeperis, D., Kupke, D.W. and Marky, L.A. (1992) Biopolymers 32, 1065.
22. Rentzeperis, D. and Marky, L.A. (1993) J. Am. Chem. Soc. 115, 1645.
23. Remeta, D.P., Mudd, C.P., Berger, R.L. and Breslauer, K.J. (1993) Biochemistry 32, 5064.
24. Chalikian, T.V., Sarvazyan, A.P. and Breslauer, K.J. (1994) Biophys. Chem. 51, 89.
25. Chalikian, T.V., Sarvazyan, A.P., Plum, G.E. and Breslauer, K.J. (1994) Biochemistry 33, 2394.
26. Chalikian, T.V., Plum, G.E., Sarvazyan, A.P. and Breslauer, K.J. (1994) Biochemistry 33, 8629.
27. Rau, D.C. and Parsegian, V.A. (1992) Biophys. J. 61, 246.
28. Rau, D.C. and Parsegian, V.A. (1992) Biophys. J. 61, 260.
29. Robinson, C.R. and Sligar, S.G. (1993) J. Mol. Biol. 234, 302.
30. Rosenberg, J.M., Seeman, N.C., Kim, J.J.P., Suddath, F.L., Nicholas, H.B. and Rich, A. (1973) Nature 243, 150.
31. Rosenberg, J.M., Seeman, N.C., Day, R.O. and Rich, A. (1976) J. Mol. Biol. 104, 145.
32. Seeman, N.C., Rosenberg, J.M., Suddath, F.L., Kim, J.J.P. and Rich, A. (1976) J. Mol. Biol. 104, 109.
33. Seeman, N.C., Rosenberg, J.M. and Rich, A. (1976) Proc. Natl. Acad. Sci. USA 73, 804.
34. Neidle, S., Berman, H. and Shieh, H.S. (1980) Nature 288, 129.
35. Swaminathan, S., Beveridge, D.L. and Berman, H.M. (1990) J. Phys. Chem. 92, 4660.
36. Kim, K.S., Corongiu, G. and Clementi, E. (1983) J. Biomol. Struct. Dynamics 1, 263.
37. Hummer, G., Garcia, A.E. and Soumpasis, D.M. (1995) Biophys. J. 68, 1639.
38. Drew, H.R. and Dickerson, R.E. (1981) J. Mol. Biol. 151, 535.
39. Drew, H.R., Wing, R.M., Takano, T., Broka, C., Tanaka, S., Itakura, K. and Dickerson, R.E. (1981) Proc. Natl. Acad. Sci. USA 78, 2179.
40. Prive, G.G., Yanagi, K. and Dickerson, R.E. (1991) J. Mol. Biol. 217, 177.
41. Grzeskowiak, K., Yanagi, K., Prive, G.G. and Dickerson, R.E. (1991) J. Biol. Chem. 266, 8861.
42. Schneider, B., Cohen, D. and Berman, H.M. (1992) Biopolymers 32, 725.
43. Schneider, B., Cohen, D.M., Schleifer, L., Srinivasan, A.R., Olson, W.K. and Berman, H.M. (1993) Biophys. J. 65, 2291.
44. Umrania, Y., Nikjoo, H. and Goodfellow, J.M. (1995) Int. J. Radiat. Biol. 67, 145.
45. Schneider, B. and Berman, H.M. (1995) Biophys. J. 69, 2661.
46. Westhof, E. (1993) in Water and Biological Macromolecules, (Westhof, E., ed.), pp. 226-243. CRC Press, Boca Raton.
47. Goodsell, D.S., Grzeskowiak, K. and Dickerson, R.E. (1995) Biochemistry 34, 1022.
48. Kochoyan, M. and Leroy, J.L. (1995) Curr. Opin. Struct. Biol. 5, 329.
49. Otting, G., Liepinsh, E. and Wuthrich, K. (1991) Science 254, 974.
50. Kubinec, M.G. and Wemmer, D.E. (1992) J. Am. Chem. Soc. 114, 8739.
51. Liepinsh, E., Leupin, W. and Otting, G. (1994) Nucl. Acids Res. 22, 2249.
52. Jacobson, A., Leupin, W., Liepinsh, E. and Otting, G. (1996) Nucl. Acids Res. 24, 2911.
53. Jayaram, B. and Beveridge, D.L. (1996) Annu. Rev. Biophys. Biomol. Struct. 25, 367.
54. Young, M.A., Jayaram, B. and Beveridge, D.L. (1997) J. Am. Chem. Soc. 119, 59.
55. Kennard, O., Cruse, W.B.T., Nachman, J., Prange, T., Shakked, Z. and Rabinovich, D. (1986) J. Biomol. Struct. Dynamics 3, 623.
56. Eisenstein, M., Frolow, F., Shakked, Z. and Rabinovich, D. (1990) Nucl. Acids Res. 18, 3185.
57. Eisenstein, M. and Shakked, Z. (1995) J. Mol. Biol. 248, 662.
58. Saenger, W., Hunter, W.N. and Kennard, O. (1986) Nature 324, 385.
59. Gessner, R.V., Quigley, G.J. and Egli, M. (1994) J. Mol. Biol. 236, 1154.
60. Schneider, B., Ginell, S.L., Jones, R., Gaffney, B. and Berman, H.M. (1992) Biochemistry 31, 9622.
61. Westhof, E., Dumas, P. and Moras, D. (1988) Biochimie 70, 145.
62. Egli, M., Portmann, S. and Usman, N. (1996) Biochemistry 35, 8489.
63. Portmann, S., Usman, N. and Egli, M. (1995) Biochemistry 34, 7569.
64. Bella, J., Brodsky, B. and Berman, H.M. (1995) Structure 3, 893.
65. Holbrook, S.R., Cheong, C., Tinoco, Jr, I. and Kim, S.-H. (1991) Nature 353, 579.
66. Hunter, W.N., Brown, T., Kneale, G., Anand, N.N., Rabinovich, D. and Kennard, O. (1987) J. Biol. Chem. 262, 9962.
67. Kneale, G., Brown, T., Kennard, O. and Rabinovich, D. (1985) J. Mol. Biol. 186, 805.
68. Schneider, B., Ginell, S.L. and Berman, H.M. (1992) Biophys. J. 63, 1572.
69. Moore, M.H., Hunter, W.N., d'Estaintot, B.L. and Kennard, O. (1989) J. Mol. Biol. 206, 693.
70. Wang, A.H.-J., Ughetto, G., Quigley, G.J. and Rich, A. (1987) Biochemistry 26, 1152.
71. Brown, D.G., Sanderson, M.R., Skelly, J.V., Jenkins, T.C., Brown, T., Garman, E., Stuart, D.I. and Neidle, S. (1990) EMBO J. 9, 1329.
72. Sriram, M., van der Marel, G.A., Roelen, H.L.P.F., van Boom, J.H. and Wang, A.H.-J. (1992) Biochemistry 31, 11823.
73. Otwinowski, Z., Schevitz, R.W., Zhang, R.-G., Lawson, C.L., Joachimiak, A., Marmorstein, R.Q., Luisi, B.F. and Sigler, P.B. (1988) Nature 335, 321.
74. Joachimiak, A., Haran, T. and Sigler, P. (1994) EMBO J. 13, 367.
75. Shakked, Z., Guzikevich-Guerstein, G., Frolow, F., Rabinovich, D., Joachimiak, A. and Sigler, P.B. (1994) Nature 368, 469.
76. Hirsch, J.A. and Aggarwal, A.K. (1995) EMBO J. 14, 6280.
77. Wilson, D.S., Guenther, B., Desplan, C. and Kuriyan, J. (1995) Cell 82, 709.
78. Qian, Y.Q., Otting, G. and Wuthrich, K. (1993) J. Am. Chem. Soc. 115, 1189.
79. Billeter, M., Guntert, P., Luginbuhl, P. and Wuthrich, K. (1996) Cell 85, 1057.
10 Single-crystal X-ray diffraction studies on the non-Watson-Crick base associations of mismatches, modified bases, and nonduplex oligonucleotide structures
William N. Hunter1,* and Tom Brown2
1 Department of Biochemistry, University of Dundee, Dundee, DD1 5EH, UK
2 Department of Chemistry, University of Southampton, Southampton, SO17 1BJ, UK
1. Introduction
The replication of DNA must occur with a high degree of precision in order for genetic information to be faithfully transmitted from one generation to the next. Watson and Crick recognized that a complementary base pairing scheme in duplex DNA could contribute to such a mechanism (1). In this way, purines interact with pyrimidines so that guanine (G) pairs with cytosine (C) and adenine (A) pairs with thymine (T) to form what are termed Watson-Crick base pairs (Fig. 10.1).
The very specific manner in which the Watson-Crick base pairs are formed contributes stability to an oligonucleotide structure and a particular arrangement of functional groups for interaction with enzymes and proteins by, for example, specific hydrogen bonding patterns (2). However, given that the human genome is estimated to contain around 10^9 base pairs it is hardly surprising that mistakes can and do occur during the replication process. Given the redundancy in the genetic code, not every alteration of the DNA sequence will lead to a change in the gene product but a single error in a triplet may be carried through and eventually lead to a serious mutation. Errors can be introduced via non-Watson-Crick base pairs, termed mismatches or mispairs. Alternatively, damage to DNA can produce bases with altered chemical properties capable of scrambling the genetic code (3). Some mutations may confer an evolutionary advantage, but in general the propagation of such mistakes must not be allowed and a complicated protein recognition and repair system plays a key role in maintaining the fidelity of replication (4).
Structural investigations of the proteins involved in this recognition of mistakes in DNA, and subsequent repair, represent one of the most exciting subjects in structural biology (5 and references therein). Studies on these enzymes follow on from research in a number of laboratories directed towards the biophysical characterization of mismatches and modified bases in DNA and RNA and their biological implications. Crystallographic studies have provided structural detail to complement thermodynamic studies on the stability of the mismatches or base pairs involving chemically modified components (6). In addition to a description of mismatch pairings in DNA, a number of studies on RNA fragments, triplexes, quadruplexes, and a novel loop assembly have highlighted the important role of non-Watson-Crick base associations (see Chapter 17). This can involve an extension from two bases, interacting with each other using a specific pattern of hydrogen bonds, to three- and four-base assemblies.
Our aim in this chapter is to highlight the crystallographic results on base associations (NMR studies are the subject of Chapter 11). Although we concentrate on mismatches in duplex DNA, some mention is made of other examples involving RNA, triplexes, and quadruplexes. However, the reader is directed elsewhere in this volume for more detailed coverage of RNA (Chapter 17) and higher order DNA structures (Chapters 12 and 13).
Fig. 10.1. The Watson-Crick base pairs G:C (top) and A:T (bottom). In all figures the hydrogen bonds are represented by dashed lines.
*Corresponding author.
2. Mismatches
There is a competition between the Watson-Crick A:T or G:C pairs and eight non-Watson-Crick alternatives that are called mismatches or mispairs. These are the purine-pyrimidine G:T and A:C pairings, the purine-purine G:G, A:A, and G:A pairings, and, finally, the pyrimidine-pyrimidine C:C, T:T, and C:T mismatches. The incorporation of non-Watson-Crick base pairs in duplex DNA is one of the most common errors that occur during the replication process. Mutagenic pathways are
Fig. 10.2. Mutagenic pathways. Transition and transversion mutations starting from A:T or G:C base pairs.
divided into transition and transversion paths. The former invokes purine-pyrimidine mismatches, the latter purine-purine or pyrimidine-pyrimidine mispairs. Figure 10.2 presents the mutagenic pathways starting from both A:T and G:C pairs.
The theory of mispair formation, initially proposed by Watson and Crick (7), extended by Topal and Fresco (8), and reviewed by Strazewski and Tamm (9), relies on the involvement of rare tautomer forms of the bases. The mismatches involving these tautomers could be sterically equivalent to Watson-Crick base pairs and unlikely to distort or perturb the duplex in which they are formed.
The crystallographic study of mispairs cannot give any information on the occurrence of rare tautomers during the replication process. However, these studies do define the structure of the oligonucleotide hosting the mispair, thus serving to characterize any localized perturbations of structure, the hydrogen bonding patterns linking
the bases, the influence of neighbouring bases, and clues about how recognition and subsequent repair of mismatches may occur. One of the main conclusions from mismatch studies is that there is no need to invoke the presence of rare tautomers in mismatch formation and stability.
The crystallographic study of mismatches has in general used complementary sequences known to form well-ordered systems into which the mispairs have been engineered. The most common framework has been the Drew-Dickerson dodecamer duplex (10). This sequence, which crystallizes readily in the B-form, is d(CGCGAATTCGCG). Other templates have been A-form DNA octamers and Z-form hexamers (11). In each case a duplex containing two mispairs has been formed. There are two main benefits in this approach. It maximizes the likelihood of getting well-ordered single crystals for the analysis and it means that there is a native Watson-Crick structure that can be used for comparative purposes.
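The count of eight mismatches given at the start of this section, and their division between the transition and transversion pathways, can be checked by simple enumeration. The short Python sketch below is illustrative only: the function and variable names are hypothetical, and the assignment of a pathway to each mispair follows the purine/pyrimidine rule stated above rather than any published software.

```python
from itertools import combinations_with_replacement

PURINES = {"A", "G"}
# The two Watson-Crick pairs of DNA, stored as unordered pairs.
WATSON_CRICK = {frozenset("AT"), frozenset("GC")}


def base_class(base):
    """Classify a DNA base as a purine or a pyrimidine."""
    return "purine" if base in PURINES else "pyrimidine"


def enumerate_mismatches(bases="ACGT"):
    """Map every unordered non-Watson-Crick pair to (pair class, pathway)."""
    result = {}
    for b1, b2 in combinations_with_replacement(bases, 2):
        if frozenset((b1, b2)) in WATSON_CRICK:
            continue  # skip A:T and G:C
        c1, c2 = sorted((base_class(b1), base_class(b2)))
        # Rule taken from the text: purine-pyrimidine mispairs belong to
        # transition pathways, like-with-like mispairs to transversions.
        pathway = "transition" if c1 != c2 else "transversion"
        result[f"{b1}:{b2}"] = (f"{c1}-{c2}", pathway)
    return result


if __name__ == "__main__":
    for pair, (kind, pathway) in enumerate_mismatches().items():
        print(f"{pair}  {kind:<22} {pathway}")
```

Pairs are treated as unordered, so G:T and T:G are counted once; the labels are printed in alphabetical base order (for example A:G rather than G:A).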
2.1 Purine-pyrimidine base pairs
The first mismatch pair to be characterized was the G:T in an A-form octamer (12,13). Subsequently, this was studied in different sequence environments and in different DNA forms (14-16). This type of purine-pyrimidine pairing adopts what is termed the wobble configuration, which was first proposed by Crick to explain G:U pairing at the third codon position during codon-anticodon interactions (17). The purine is shifted towards the DNA minor groove and the pyrimidine towards the major groove. The bases maintain the major tautomeric forms and create two interbase hydrogen bonds (Fig. 10.3a). Well-ordered solvent molecules bridge functional groups on the bases in both major and minor grooves and confer additional stability to the pairing. G:Br5U and G:F5U pairs (where uracil contains a bromine or fluorine at the 5 position) have also been characterized in Z-form hexamers (18,19) and wobble G:U pairs, plus attendant solvent molecules, observed in a fragment of 5S rRNA (20).
Inosine (I) is a guanine analogue that lacks the 2-amino group. This base is commonly found in tRNA where it is able to pair with A, C, and U in codon-anticodon interactions. It is an important base since the ability to pair with three other bases contributes to the degeneracy of the genetic code. Inosine occurs rarely in DNA, as a result of deamination of deoxyadenosine, where it is potentially mutagenic. A specific glycosylase is available to remove it from DNA. The I:T pair (21) assumes a similar structure to the G:T pair, although the loss of N2 on the minor groove side of the duplex removes the possibility of a stabilizing water bridge between the bases in that groove.
A:C pairing also displays a similar structure to the G:T, but there are two arrangements that could be invoked to explain the formation of two hydrogen bonds linking the bases (22,23; Fig. 10.3b, c). A solvent molecule can link the bases on the major groove side to aid stability, but not on the minor groove side. The adenine is either protonated or in a rare tautomeric form. Energetic considerations suggested the former and biophysical characterization of A:C mispairs using NMR and UV melting methods over a wide pH range subsequently supported this proposal (24). It is perhaps more appropriate to denote this base pair as A+:C.
Fig. 10.3. (a) The G:T 'wobble' pair; (b) the A+:C pair; and (c) the A:C pair with the purine in the imino form.
2.2 Purine-purine base pairs
Both A:G and G:G pairs have been characterized in duplex B-DNA. The A:A pairing will be discussed in the context of non-duplex DNA later. The G:A pairing has attracted particular interest since biochemical studies have identified such mismatches as being repaired with much less efficiency than other mispairs (25). A structural explanation has been sought.
Fig. 10.4. Four examples of a G:A pair highlight the variability of this mismatch. (a) G(anti):A(anti); (b) G(anti):A(syn); (c) A+(anti):G(syn); (d) G(anti):A(anti) amino.
Crystallographic and NMR studies have identified four G:A configurations in DNA (26-29, Fig. 10.4). The form of the mispair that is observed has been shown to depend on a number of factors such as pH, salt concentration, and, in particular, the sequence environment in which the mismatch is located. The dependence of the G:A conformation on the adjacent sequence can be rationalized in part by dipole-dipole interactions with adjacent bases (28). Hydrogen bonding using a functional group provided by an adjacent base can also be important and this is clear in the example of the G(anti):A(anti) pairing. The presence of an inter-base-pair hydrogen bond between the amino N2 of guanine and the O2 of an adjacent thymine on the opposing strand has been noted (26). Presumably, without an O2 in this position that is ready to participate in hydrogen bonding, some other G:A conformation could be preferred. The G(anti):A(anti) mismatch also produces a bulge in the duplex structure as the backbone is forced apart to accommodate the purine-purine pair in which each base adopts the anti conformation. When one of the bases is in the syn conformation this bulging effect is not observed. The key point about studies on the G:A mispair is that the variability of conformations that can be observed would present quite a challenge to an enzyme recognition and repair system and this may be an important factor in the poor recognition and repair of the G:A mismatch.
In the RNA duplex r(CGCGAAUUAGCG) there are two A(anti):G(anti) base pairs and evidence to suggest the same degree of variability as that observed in DNA (30).
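The statement that such duplexes contain two mispairs follows directly from the sequences: when a strand pairs with an antiparallel copy of itself, any base that breaks the self-complementarity produces a mismatch at two symmetry-related positions. The following Python sketch illustrates this under simple assumptions (blunt-ended, fully aligned antiparallel pairing; the helper names are hypothetical). It says nothing about the anti or syn conformation of the bases, which only the structure determination can reveal.

```python
# Watson-Crick partners for DNA (T) and RNA (U), as ordered (base, partner) pairs.
WATSON_CRICK = {
    ("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}


def self_duplex_pairs(seq):
    """Pair each base of a strand with an antiparallel copy of the same strand.

    Returns (position, base, partner) tuples with positions numbered from 1
    at the 5' end; position i faces position n + 1 - i of the second strand.
    """
    n = len(seq)
    return [(i + 1, seq[i], seq[n - 1 - i]) for i in range(n)]


def mispairs(seq):
    """List the positions whose partner is not the Watson-Crick complement.

    Wobble pairs such as G:U are reported too, since they are not
    Watson-Crick pairs.
    """
    return [
        (pos, f"{a}:{b}")
        for pos, a, b in self_duplex_pairs(seq)
        if (a, b) not in WATSON_CRICK
    ]


if __name__ == "__main__":
    print(mispairs("CGCGAATTCGCG"))  # native Drew-Dickerson dodecamer
    print(mispairs("CGCGAAUUAGCG"))  # RNA dodecamer discussed above
```

For the native dodecamer the list is empty, whereas the RNA sequence gives entries at positions 4 and 9, corresponding to the two G:A pairs of the duplex.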
A careful investigation of the hydrogen bonding possibilities suggests that the A(anti):G(anti) pairing uses a conventional hydrogen bond formed between N6 and O6 and what is termed a reverse, three-centre hydrogen bond in which the lone pair on N1 is shared with the N-H groups of the guanine N1 and N2. In this way the destabilizing effects of having unsatisfied hydrogen bonding functional groups can be avoided.
The structural variation observed for the G:A mismatch also applies to I:A pairs (31-33). This variability may help explain the mutagenicity of inosine. UV melting studies indicate that inosine-containing mismatches are surprisingly stable (33). Most other mismatches have a tendency to destabilize the DNA duplex and produce local melting effects that can open up the duplex. Repair enzymes can use this physical property of the mismatch duplex to recognize incorrect base pairing. Local destabilization could also assist the flipping out of mismatched bases for excision. The phenomenon of base flipping as part of the protein recognition and repair process has been noted on the basis of crystallographic studies (5).
There has only been a single structure for the homopurine G:G mismatch. It shows a G(anti):G(syn) arrangement (34). The details are slightly different for the two mispairs in the DNA duplex and two hydrogen bonding schemes have been put forward (Fig. 10.5). G:G transversion mismatches are readily repaired and in this case the authors note that the sugar-phosphate backbone is distorted in comparison to the native duplex.
2.3 Pyrimidine-pyrimidine base pairs
These mismatches have proven difficult to characterize when they are incorporated in duplex DNA, but there are some examples of C:U and U:U associations in duplex
Fig. 10.5. Two slightly different G:G mismatches have been observed in a B-form dodecamer duplex. Although they are both G(anti):G(syn) the details of the hydrogen bonding vary.
RNA. The C:U mispair has been observed in r(GGACUUCGGUCC) (35). In this case there is a single hydrogen bond between the bases involving C(N4) and U(O4) and a bridging solvent linking the two N3 groups (Fig. 10.6).
The U:U pair is polymorphic. What are called cis U:U wobble pairs have been observed in two RNA dodecamer structures (36,37). These are also discussed in Chapter 17. The U:U pairs are held together with two hydrogen bonds (Fig. 10.7a), and although an ordered solvent is not observed in both crystal structures, this pair has
Fig. 10.6. The U:C mispair observed in RNA. W represents a water molecule that bridges the pyrimidines.
Fig. 10.7. Two forms (cis and trans) of the U:U pairing as observed in RNA structures.
what appears to be an attractive site to bring in a water molecule in both the major and minor groove sides. This would be similar to the G:T mismatch discussed above. The nonameric sequence r(GCUUCGGC)d(Br5U) has a similar U:U pair at the end of one of the helices, which is disordered (38). The hexanucleotide sequence r(UUCGCG) crystallizes with a tetranucleotide duplex involving C:G pairs and two U:U pairs formed by the overhanging bases (39). There is a conventional hydrogen bond between N3 and O4 but also a C-H···O hydrogen bond between C5 and O4 (Fig. 10.7b). The importance and occurrence of C-H···O hydrogen bonds in nucleic acid structure has been discussed recently (40). This type of interaction has been invoked in this particular type of U:U pair and occurs in a number of base-base interactions involving modified bases, and also in triplex formation.
3. Pairings with modified bases
In addition to the pressures of carrying out replication involving a large number of bases, the genetic code is constantly pressured by chemical and physical forces in the environment or generated in cells during the normal course of metabolism. Carcinogenic chemicals, ultraviolet light, ionizing radiation, and reactive oxygen
species are all capable of inducing modifications to DNA (3,4). Of particular interest are alterations to the purines.
Guanine can be methylated by alkylnitrosoureas to form O6-methylguanine (O6MeG), which is potentially very damaging since it alters the hydrogen bonding potential of the base, thereby promoting G to A transition mutations. The O6MeG:T mispair could then be selected during replication in preference to an O6MeG:C pair.
The structure of an O6MeG:C pair has been determined at physiological pH (41-43) and is shown to adopt a wobble conformation (Fig. 10.8a). A highly specific enzyme, O6-methylguanine methyltransferase, which is able to repair this particular alteration by excising the methyl group, has evolved to control this aspect of damage to DNA.
Fig. 10.8. (a) The O6MeG:C pair, which resembles the G:T mismatch. (b) The O6MeG:C+ pairing, which resembles a Watson-Crick base pair. (c) The O6MeG:T mismatch, which also resembles a Watson-Crick pair. (d) The G(anti):edA pair, where edA is ethenoA. (e) The A(anti):O8G(syn) and (f) G(anti):O8A(syn) pairings, where O8G and O8A represent 8-oxoG and 8-oxoA, respectively.
Chemical damage is not induced solely by alkylating agents, but by many other carcinogens as well. Adenine can react with vinyl chloride to produce 1,N6-ethenoadenosine (edA). The structure of the G:edA pairing has been determined (44) and the association is depicted in Fig. 10.8d. There are two obvious hydrogen bonds, and a C-H···O hydrogen bond has been invoked between the H8 and O6 of G to alleviate the destabilization of an unsatisfied hydrogen bond acceptor in the pair. Unlike other non-Watson-Crick pairings, there is significant alteration in the conformation of the sugar-phosphate backbone when edA is incorporated into the duplex. Such perturbation could represent a signal for the recognition and repair of this modified base by 3-methyladenine-DNA glycosylase.
Both purines can undergo oxidation at the 8 position to produce 8-oxoadenine (O8A) and 8-oxoguanine (O8G). The bases are predominantly in the keto form. Whilst modification at the 8 position does not affect the hydrogen bonding patterns on functional groups used in G:C and A:T pairs, the presence of the O8 and N7(H) does promote other hydrogen bonding possibilities and a syn conformation about the glycosidic bond. This is noted in the structures of O8G:A and O8A:G pairings (45,46; Fig. 10.8e). The presence of the highly mutagenic O8G lesion in genomic DNA can produce a G to T transversion mutation via an intermediate O8G:A base pair. The thermodynamic stability of this pair, in addition to the pseudo-symmetry about the glycosidic bonds, perhaps explains why it is not readily recognized by proof-reading enzymes. O8A is not particularly mutagenic and the O8A:G pairing, whilst again showing a syn/anti pair, is asymmetric about the glycosidic bonds, a structural feature that may make it easier to recognize and repair.
This pairing is held together by four bifurcated hydrogen bonds resulting from two reverse, three-centred hydrogen bonding systems. Such an arrangement helps to stabilize the duplex, since it allows all functional groups in the mismatched pair to fulfil their hydrogen bonding capacity.
The structural studies on duplexes containing mismatches or modified bases have clearly indicated that DNA has sufficient flexibility to incorporate these with ease. The sugar-phosphate backbone makes small adjustments as required and any distortions are highly localized. Biophysical characterization including UV melting studies indicates that when non-Watson-Crick associations are involved there is more often than not a reduction in Tm. This can be ascribed to localized destabilization of the duplex structure. The recognition and repair of mistakes in the DNA duplex is thus likely to occur at a very localized level. It will involve a combination of structural and thermodynamic effects such as distortions to the furanose-phosphate backbone, the disposition of functional groups able to participate in hydrogen bonding interactions with specific enzyme residues, and localized melting effects.
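The two mutational outcomes named in this section (G to A for O6MeG, G to T for O8G) differ in kind: a transition keeps the base within its class (purine or pyrimidine), whereas a transversion swaps classes. The short Python sketch below only illustrates that bookkeeping; the function name `classify_substitution` is an assumption introduced here and is not taken from the text.

```python
# Illustrative sketch only: classifies a single-base substitution.
# The helper name and its interface are hypothetical.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(original: str, mutant: str) -> str:
    """Return 'transition' or 'transversion' for a DNA base substitution."""
    for base in (original, mutant):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unknown base: {base!r}")
    if original == mutant:
        raise ValueError("original and mutant bases are identical")
    pair = {original, mutant}
    # Same chemical class (both purines or both pyrimidines) -> transition.
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    # O6MeG mispairing with T leads to G -> A.
    print("G->A:", classify_substitution("G", "A"))
    # O8G pairing with A leads to G -> T.
    print("G->T:", classify_substitution("G", "T"))
```

The classification says nothing about the structural basis of the mispair; it simply restates the purine/pyrimidine rule that underlies the terms used above.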
4. Non-Watson-Crick associations stabilize higher order structures
There is a requirement for non-Watson-Crick base interactions in some aspects of nucleic acid structure, in particular where large assemblies are involved. Such interactions are important in the stabilization of large RNA structures, for example, tRNA (reviewed in Chapter 19) and more recently shown in ribozyme structures (Chapter 17; 47 and references therein, 48). RNA structures are detailed in Chapter 17 and we shall confine ourselves to some comments on DNA triplexes, quadruplexes, and two loops.
Fig. 10.9. Two examples of base triplets that involve one Watson-Crick base pair interacting with a third base: (a) CGC+ and (b) TAT triads.
4.1 Triple helices
A triplex is a duplex onto which a third strand is attached, for example by binding in the major groove. The three-stranded structure has been implicated in genetic recombination, and the design of molecular fragments able to form and stabilize a designated triplex is an area of interest with prospects for antigene therapy.
Fig. 10.10. Two examples of G tetrads: (a) the G(anti) and (b) the G(anti):G(syn) tetrad.
Crystallographic studies of a nonamer (49) then a decamer (50) with a sequence designed to form an overhanging base have produced models for both parallel and antiparallel triplexes. Two types of C:G:C triplet are formed by crystal lattice contacts, which on the basis of model building can be extended to provide two distinct types of triplex (50).
A fully formed triplex structure has been characterized by the crystallographic analysis of a peptide nucleic acid-DNA complex (51). This molecule utilizes both T:A:T and C:G:C triplets to create a unique triplex called the P-form helix. The use of a nuclease-resistant backbone, as in this case, in combination with a design strategy targeting triplex formation, opens up new possibilities in the area of antisense therapeutic agents. In this example Watson-Crick pairs are supplemented by a Hoogsteen base pair involving the purine interacting with a pyrimidine in the major groove. Two types of triplet associations are depicted in Fig. 10.9.
4.2 Quadruplexes
The terminal segments of eukaryotic chromosomes are called telomeres. These sections of the chromosome have been implicated in replication processes and in stability (52-55). They have an unusual sequence which involves repeating tracts of guanines. The guanines are able to self-associate as tetrads or quartets (Fig. 10.10) and, under the influence of specific cations, this type of G-rich DNA is able to form a range of parallel and antiparallel quadruplexes. The structures of d(GGGGTTTTGGGG) (56) and d(TGGGGT) (57) have been determined. In the first case, each strand forms an intramolecular hairpin stabilized by G:G pairs. Two hairpins associate in an antiparallel manner to create a stack of four guanine tetrads. The glycosyl bonds alternate between syn and anti. In the case of d(TGGGGT), the strands in the tetraplex are all parallel to each other and the glycosyl bonds are all in an anti conformation. Each quadruplex binds a cation: the antiparallel structure binds potassium, and the parallel quadruplex binds sodium, either at the centre of or between the G quartets.
A series of crystal structures has been determined that are stabilized by intercalating hemiprotonated C:C+ pairs. This pairing is shown in Fig. 10.11 and involves three hydrogen bonds linking the cytosines. The sequences that provide these structures
Fig. 10.11. The C:C+ pairing.
include d(CCCC) (58), d(CCCT) (59), d(CCCAAT) (60), and d(TAACCC) (61). This last example also involves Hoogsteen A(syn):T pairs. In each case a tetraplex is formed that can be thought of as a combination of two parallel duplexes, intercalated with opposite polarity.
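As a rough aid to reading the quadruplex sequences quoted in this section, the following Python sketch locates guanine tracts and counts the tetrads that could stack when four tracts come together. It rests on a deliberately simple assumption that is not stated in the text: every guanine in a tract contributes to exactly one tetrad, so the stack height equals the shortest tract. The function names are hypothetical.

```python
# Illustrative sketch only. Assumes each guanine in a tract joins one
# tetrad; real quadruplex topologies need not obey this.
import re


def g_tracts(sequence: str) -> list[str]:
    """Return runs of two or more consecutive guanines in a DNA sequence."""
    return re.findall(r"G{2,}", sequence.upper())


def tetrad_count(sequence: str, strands: int) -> int:
    """Estimate the number of stacked G tetrads formed by `strands` copies."""
    tracts = g_tracts(sequence) * strands
    if len(tracts) != 4:
        raise ValueError("a G tetrad stack needs exactly four G tracts")
    # The shortest tract limits how many complete tetrads can form.
    return min(len(tract) for tract in tracts)


if __name__ == "__main__":
    # Two hairpins of d(GGGGTTTTGGGG) supply four G tracts.
    print(tetrad_count("GGGGTTTTGGGG", strands=2))
    # Four parallel strands of d(TGGGGT) supply four G tracts.
    print(tetrad_count("TGGGGT", strands=4))
```

For d(GGGGTTTTGGGG) with two strands this reproduces the stack of four tetrads described above; the value for d(TGGGGT) follows from the same assumption rather than from a figure given in the text.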
4.3 A unique loop structure
The structure of d(GCATGCT) has been determined to high resolution (62). The asymmetric unit is a single strand that folds back upon itself to create a loop structure not previously seen in structures of DNA. The stem of the loop is formed from the two GpC steps. However, the hydrogen bond donor and acceptor groups used in Watson-Crick G:C base pairs are positioned end-on rather than in the normal head-to-head fashion noted in hairpin loop structures (63,64). Dimerization using a crystallographic twofold axis leads to the formation of an extensive network of hydrogen bonds formed by Watson-Crick pairing and, in addition, by the G:C base pairs interacting with each other on what can be termed the minor groove side of the pair.
A:A and T:T base pairs are clearly important for the stability of this unusual DNA structure. The A:A pair is formed using a symmetric N6-N7 amino hydrogen bonded conformation, similar to that observed in yeast tRNAPhe (2). This purine-purine pairing assists dimerization of the loop through the hydrogen bonds and it also assists the association of two loop dimers by the base stacking of adjacent A:A pairs.
The T:T pair contributes mainly to stabilizing the crystal lattice. It is formed by a symmetric hydrogen bonding of the extruding thymine N3 and O2 atoms with an equivalent thymine of a symmetry-related loop. The crystal structure of a Z-form stem hairpin loop has also created a T:T pair owing to lattice interactions (64); the sequence is d(CGCGCGTTTTCGCGCG). The loop contains four thymines. The T:T pair formed between one loop and a symmetry-related loop is such that the rare enol tautomeric form must be present for one of the bases.
It remains unclear whether the quadruplex and loop structures that have been characterized are of direct biological relevance. What has been shown to be important is the use of non-Watson-Crick base associations (base pairs, triplets, and quartets) to help create such structures. It is tempting to suggest that the tight packaging of nucleic acids in, for example, viral genomes and chromosomes could well utilize similar structural motifs.
Acknowledgements
Financial support was provided by the Biotechnology and Biological Sciences Research Council (BBSRC), the Engineering and Physical Sciences Research Council (EPSRC), and, in particular, the Wellcome Trust.
References
1. Watson, J.D. and Crick, F.H.C. (1953) Nature 171, 737.
2. Saenger, W. (1984) Principles of Nucleic Acid Structure. Springer-Verlag, New York.
3. Loft, S. and Poulsen, H.E. (1996) J. Mol. Med. 74, 297.
4. Modrich, P. (1987) Annu. Rev. Biochem. 56, 435.
5. Vassylyev, D.G. and Morikawa, K. (1997) Curr. Opin. Struct. Biol. 7, 103.
6. Brown, T., Hunter, W.N. and Leonard, G.A. (1993) Chem. Brit. 6, 484.
7. Watson, J.D. and Crick, F.H.C. (1953) Nature 171, 964.
8. Topal, M.D. and Fresco, J.R. (1976) Nature 263, 290.
9. Strazewski, P. and Tamm, C. (1990) Angew. Chem. Intl. Ed. Engl. 29, 36.
10. Wing, R.M., Drew, H.R., Takano, T., Broka, C., Takana, S., Itakura, K. and Dickerson, R.E. (1980) Nature 287, 755.
11. Kennard, O. and Hunter, W.N. (1991) Angew. Chem. Intl. Ed. Engl. 30, 1254.
12. Brown, T., Kennard, O., Kneale, G. and Rabinovich, D. (1985) Nature 315, 604.
13. Hunter, W.N., Kneale, G., Brown, T., Rabinovich, D. and Kennard, O. (1986) J. Mol. Biol. 190, 605.
14. Kneale, G., Brown, T., Kennard, O. and Rabinovich, D. (1985) J. Mol. Biol. 186, 805.
15. Hunter, W.N., Brown, T., Kneale, G., Anand, N.N., Rabinovich, D. and Kennard, O. (1987) J. Biol. Chem. 262, 9962.
16. Ho, P.S., Frederick, C.A., Quigley, G., van der Marel, G.A., van Boom, J.H., Wang, A.H.-J. and Rich, A. (1985) EMBO J. 4, 3617.
17. Crick, F.H.C. (1966) J. Mol. Biol. 19, 548.
18. Brown, T., Kneale, G., Hunter, W.N. and Kennard, O. (1986) Nucl. Acids Res. 14, 1801.
19. Coll, M., Saal, D., Frederick, C.A., Aymami, J., Rich, A. and Wang, A.H.-J. (1989) Nucl. Acids Res. 17, 911.
20. Betzel, C., Lorenz, S., Furste, J.P., Bald, R., Zhang, M., Schneider, T., Wilson, K.S. and Erdmann, V.A. (1994) FEBS Lett. 351, 159.
21. Cruse, W.B.T., Aymami, J., Kennard, O., Brown, T., Jack, A.G.C. and Leonard, G.A. (1989) Nucl. Acids Res. 17, 55.
22. Hunter, W.N., Brown, T., Anand, N.N. and Kennard, O. (1986) Nature 320, 552.
23. Hunter, W.N., Brown, T. and Kennard, O. (1987) Nucl. Acids Res. 15, 6589.
24. Brown, T., Leonard, G.A., Booth, E.D. and Kneale, G. (1990) J. Mol. Biol. 221, 437.
25. Fersht, A.R., Knill-Jones, J.W. and Tsui, W.C. (1982) J. Mol. Biol. 156, 37.
26. Prive, G.G., Heinemann, U., Kan, L.S., Chandrasegaran, S. and Dickerson, R.E. (1987) Science 238, 498.
27. Brown, T., Hunter, W.N., Kneale, G.G. and Kennard, O. (1986) Proc. Natl. Acad. Sci. USA 83, 2402.
28. Brown, T., Leonard, G.A., Booth, E.D. and Chambers, J. (1989) J. Mol. Biol. 207, 455.
29. Hunter, W.N., Brown, T. and Kennard, O. (1986) J. Biomol. Struct. Dynamics 4, 173.
30. Leonard, G.A., McAuley-Hecht, K., Abel, S., Lough, D.M., Brown, T. and Hunter, W.N. (1994) Structure 2, 483.
31. Corfield, P.W.R., Hunter, W.N., Brown, T., Robinson, P. and Kennard, O. (1987) Nucl. Acids Res. 15, 7935.
32. Webster, G.D., Sanderson, M.R., Skelly, J.V., Neidle, S., Swann, P.F., Li, B.F. and Tickle, I. (1990) Proc. Natl. Acad. Sci. USA 87, 6693.
33. Leonard, G.A., Booth, E., Hunter, W.N. and Brown, T. (1992) Nucl. Acids Res. 20, 4753.
34. Skelly, J.V., Edwards, K.J., Jenkins, T.C. and Neidle, S. (1993) Proc. Natl. Acad. Sci. USA 90, 804.
35. Holbrook, S.R., Cheong, C., Tinoco, I. and Kim, S.H. (1991) Nature 353, 579.
36. Baeyens, K.J., De Bondt, H.L. and Holbrook, S.R. (1995) Nature Struct. Biol. 2, 56.
37. Lietzke, S.E., Barne, C.L., Bergland, J.A. and Kundrot, C.E. (1996) Structure 4, 917.
38. Cruse, W.B.T., Saludjian, P., Biala, E., Strazewski, P., Prange, T. and Kennard, O. (1994) Proc. Natl. Acad. Sci. USA 91, 4160.
39. Wahl, M.C., Rao, S.T. and Sundaralingam, M. (1996) Nature Struct. Biol. 3, 24.
40. Leonard, G.A., McAuley-Hecht, K., Brown, T. and Hunter, W.N. (1995) Acta Cryst. D51, 136.
41. Leonard, G.A., Thomson, J.B., Watson, W.P. and Brown, T. (1990) Proc. Natl. Acad. Sci. USA 87, 9573.
42. Ginell, S.L., Vojtechovsky, J., Gaffney, B., Jones, R. and Berman, H.M. (1994) Biochemistry 33, 3487.
43. Vojtechovsky, J., Eaton, M.D., Gaffney, B., Jones, R. and Berman, H.M. (1994) Biochemistry 34, 16632.
44. Leonard, G.A., McAuley-Hecht, K.E., Gibson, N.J., Brown, T., Watson, W.P. and Hunter, W.N. (1994) Biochemistry 33, 4755.
45. Leonard, G.A., Guy, A., Brown, T., Teoule, R. and Hunter, W.N. (1992) Biochemistry 31, 8415.
46. McAuley-Hecht, K.E., Leonard, G.A., Gibson, N.J., Thomson, J.B., Watson, W.P., Hunter, W.N. and Brown, T. (1994) Biochemistry 33, 10266.
47. Scott, W.G. and Klug, A. (1996) TIBS 21, 220.
48. Cate, J.H., Gooding, A.R., Podell, E., Zhou, K., Golden, B.L., Kundrot, C.E., Cech, T.R. and Doudna, J.A. (1996) Science 273, 1678.
49. van Meervelt, L., Dautant, A., Gallois, B., Precigoux, G. and Kennard, O. (1995) Nature 374, 742.
50. Vlieghe, D., van Meervelt, L., Dautant, A., Gallois, B., Precigoux, G. and Kennard, O. (1996) Science 273, 1702.
51. Betts, L., Josey, J.A., Veal, J.M. and Jordan, S.R. (1995) Science 270, 1838.
52. Sen, D. and Gilbert, W. (1988) Nature 334, 364.
53. Sunquist, W.I. and Klug, A. (1989) Nature 342, 825.
54. Williamson, J.R., Raghuraman, M.K. and Cech, T.R. (1989) Cell 59, 871.
55. Smith, F.W. and Feigon, J. (1992) Nature 356, 164.
56. Kang, C., Zhang, X., Ratcliff, R., Moyzis, R. and Rich, A. (1992) Nature 356, 126.
57. Laughlin, G., Murchie, A.I.H., Norman, D.G., Moore, M.H., Moody, P.C.E., Lilley, D.M.J. and Luisi, B. (1994) Science 265, 520.
58. Chen, L., Cai, L., Zhang, X. and Rich, A. (1994) Biochemistry 33, 13540.
59. Kang, C., Berger, I., Lockshin, C., Ratcliff, R., Moyzis, R. and Rich, A. (1994) Proc. Natl. Acad. Sci. USA 91, 11636.
60. Berger, I., Kang, C., Fredian, A., Ratcliff, R., Moyzis, R. and Rich, A. (1995) Nature Struct. Biol. 2, 416.
61. Kang, C., Berger, I., Lockshin, C., Ratcliff, R., Moyzis, R. and Rich, A. (1995) Proc. Natl. Acad. Sci. USA 92, 3874.
62. Leonard, G.A., Zhang, S., Peterson, M.R., Harrop, S.J., Helliwell, J.R., Cruse, W.B.T., Langlois d'Estaintot, B., Kennard, O., Brown, T. and Hunter, W.N. (1995) Structure 3, 335.
63. Chattopadhyaya, R., Ikuta, S., Grzeskowiak, K. and Dickerson, R.E. (1988) Nature 334, 175.
64. Chattopadhyaya, R., Grzeskowiak, K. and Dickerson, R.E. (1990) J. Mol. Biol. 211, 189.
11 DNA mismatches in solution
Shan-Ho Chou1 and Brian R. Reid2
1 Institute of Biochemistry, National Chung-Hsing University, Taichung, 40227, Taiwan
2 Department of Chemistry and Biochemistry, University of Washington, Seattle WA 98195, USA
1. Introduction
The DNA double helix, with its complementary G:C and A:T Watson-Crick base pairing, is a remarkably efficient device for the storage and expression of information and the stable transmission of this information through successive generations. Although normal, or Watson-Crick, base pairing is mediated through hydrogen bonding between A and T residues and between G and C residues, the double helix is also stabilized by a variety of other 'stacking' interactions which obviously differ from one sequence to the next. In the process of copying each of the two strands of DNA to produce two identical double helices, i.e. daughter cells with the same genetic composition, incorrect or mismatch pairings (G or C with T or A, or with themselves) inevitably occur. Such errors are detected and corrected first by proof-reading at the replicative DNA polymerase level and, secondly, by DNA mismatch repair systems that operate in vivo to excise and correct, post-replicatively, those nucleotide misincorporations that have escaped proof-reading (1). This double line of defence serves to reduce the overall level of error propagation between generations to one in about 10^11 base pairs.
In addition to mismatch pairing of standard bases produced by enzymatic errors, abnormal pairing involving non-standard bases that have been modified by chemical agents, or by ionizing radiation, are also excised and corrected by the post-replicative repair enzyme system. Failure to repair such aberrant mismatches leads to the introduction of mutations in the progeny cell DNA molecules, with potentially fatal consequences that include cancer and genetic diseases.
Neglecting for the moment protonation at acidic pH and base pair orientation, there are eight possible mismatch pairings, each of which is equally likely to pass on a mutation to a daughter duplex, yet these different mismatches are repaired with quite different efficiencies. The efficiency of correction/repair depends on whether the mispairing is of the Pu:Pu, Py:Py or Pu:Py type (2,3), as well as on the sequence of the flanking base pairs (4), implying the recognition of discrete structural features of the duplex surrounding the error. It would therefore appear obvious that an understanding of the repair mechanism and the recognition of mispaired bases by post-replicative repair enzymes (5) at the molecular level will require reasonably detailed studies of the structures of the corresponding mismatched base pairs in a variety of sequence contexts.
Although standard Watson-Crick pairing tends to optimize hydrogen bonding, the possibility of other energetically equivalent, non-standard hydrogen bonding schemes
between two bases has long been recognized (6) and theoretical calculations have estimated that several such mismatch pairings should be energetically favourable as isolated base pairs (7), thus suggesting that not all abnormal or mismatch pairings should be assumed, a priori, to be destabilizing. Such 'isolated base pair' calculations obviously ignore important nearest neighbour stacking effects, such as dipole-dipole and van der Waals interactions, and it is to be expected that any given mismatch will be uniquely sensitive to the surrounding sequence context. It is therefore tempting to speculate whether the more stable 'mismatch' sequences should be considered 'abnormal' and whether non-standard base pairing might actually exist in vivo and carry out important biological functions. Particularly intriguing in this respect are the long stretches of tandemly repeated simple oligonucleotide sequences, known as 'satellite DNAs', found in eukaryotic chromosomes (8). The telomeres of chromosomes are another example; telomeric tandem repeats occur at the covalent ends of chromosomes and form 'abnormal' tetrad structures involving G:G pairing (for reviews see ref. 9 and Chapter 13).
Several studies on the solution structure of different DNA sequences containing a variety of isolated single mismatch base pairs have been carried out using NMR methods (for a recent review, see ref. 10). However, several of these attempts failed to obtain detailed NMR structures of the mismatch site because the particular mismatch frequently caused destabilization of the DNA duplex, and often led to the formation of equilibrium mixtures of multiple interconverting structures.
The latter problem is particularly troublesome in NMR structure determination and, with improvements in force field parameters, may be better investigated by molecular dynamics methods to probe rapid transitions between metastable states. In this chapter we will restrict ourselves to a discussion of well-defined, non-interconverting, stable base pairs involving non-complementary (in the Watson-Crick sense) bases. Particular emphasis will be placed on purine-purine mispairing and, where possible, we will also attempt to discuss the possible biological implications of these unusual structural motifs.
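The count of eight mismatch pairings quoted in the Introduction, and their division into Pu:Pu, Py:Py, and Pu:Py types, can be checked by direct enumeration. The Python sketch below does this while ignoring strand orientation and protonation, as the text does; the function names are assumptions made for illustration.

```python
# Illustrative sketch only: enumerates unordered pairs of the four
# standard DNA bases and removes the two Watson-Crick pairs.
from itertools import combinations_with_replacement

BASES = "ACGT"
PURINES = {"A", "G"}
WATSON_CRICK = {frozenset("AT"), frozenset("GC")}


def mismatch_class(pair: tuple[str, str]) -> str:
    """Label a pair as Pu:Pu, Pu:Py, or Py:Py."""
    purine_count = sum(base in PURINES for base in pair)
    return {2: "Pu:Pu", 1: "Pu:Py", 0: "Py:Py"}[purine_count]


def enumerate_mismatches() -> list[tuple[str, str]]:
    """Return (pair label, class) for every non-Watson-Crick pairing."""
    mismatches = []
    for first, second in combinations_with_replacement(BASES, 2):
        if frozenset((first, second)) in WATSON_CRICK:
            continue  # A:T and G:C are the complementary pairs
        mismatches.append((f"{first}:{second}", mismatch_class((first, second))))
    return mismatches


if __name__ == "__main__":
    for label, kind in enumerate_mismatches():
        print(label, kind)
    print("total:", len(enumerate_mismatches()))
```

Ten unordered pairs can be drawn from four bases; removing A:T and G:C leaves the eight mismatches, of which three are Pu:Pu, three are Py:Py, and two are Pu:Py.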
2. Mismatch pairing in antiparallel GA, GGA, and GGGA repeats
Tandem polypurine repeat sequences of the type d(G1-3A)n are highly represented and widely distributed throughout mammalian genome satellite DNA sequences (11). Such sequences have been implicated in gene regulation as well as genetic recombination (12). Binding proteins specific for the complementary single-stranded d(TC)n and for d(GA)n DNA sequences have also been identified recently (13,14). DNA sequences with the purines on one strand and the pyrimidines on the other are structurally polymorphic and there is increasing evidence that they can form unusual structures that differ markedly from normal B-form DNA. Alternating d(GA)n sequences are perhaps the best studied example and there are reports that such sequences, in the absence of the complementary strand, form antiparallel duplexes with themselves (15), as well as parallel-stranded, self-paired duplexes (16) and tetraplexes (17). Furthermore, d(TC)n:d(GA)n repeat sequences appear to serve as pause or arrest signals in DNA
replication and amplification (18,19), perhaps as a result of forming non-canonical structures.
Although the formation of parallel-stranded (16) or antiparallel (15) double helical structures for self-paired d(GA)n sequences has been inferred from native gel electrophoresis studies, the precise base pairing geometry of these proposed structures has been difficult to determine unambiguously. To date, no detailed NMR structural studies have been reported on this tandem repeat, probably because its small dinucleotide repeat nature produces highly overlapped proton spectra which may be further broadened by interconversion between multiple conformations. Two different types of pairing geometry have been indirectly deduced for d(GA)n sequences under different solution conditions using accessibility to chemical modification by DEPC (diethylpyrocarbonate) and DMS (dimethyl sulfate) as structural probes (15,16). The former reagent is used to distinguish between single-stranded and base paired regions of polynucleotides since the predominant reaction of DEPC is to carbethoxylate the N7 atoms of unpaired purine residues, with adenines being much more susceptible than guanines. Conversely, DMS methylates the N7 position of paired and unpaired guanines but can be used to probe the type of base pairing, since the guanine N7 is unreactive towards DMS when it participates in hydrogen bonding, as it does, for example, in a Hoogsteen base pair (20,21).
Using 52-residue DNAs in which the first (5') 11 residues and the last (3') 11 residues were autocomplementary and were separated by an intervening 30-residue stretch of 15 GA repeats, Huertas et al.
(15) were able to show that such sequences formed fold-back hairpin structures, with hyperreactivity to DEPC (single-strandedness) confined to a hexanucleotide loop at the centre of the (GA)15 run. They therefore concluded that the first (GA)6 dodecanucleotide must be base paired to the last (GA)6 dodecanucleotide in an antiparallel fashion. All guanines were found to be DMS-susceptible, indicating no Hoogsteen pairing to G(N7) atoms. Because the adenines in the descending arm of the stem, i.e. the second (3') (GA)6 run, were somewhat less reactive towards DEPC than those in the first (5') ascending arm (GA)6 run, a pairing scheme consisting of G(anti):A(syn) pairs alternating with G(anti):A(anti) pairs was proposed for the (GA)n:(GA)n repeats in the double-stranded stem of this hairpin (15). Based on these conclusions, a molecular model of an antiparallel, right-handed duplex containing G(anti):A(anti) base pairs interleaved with G(anti):A(syn) base pairs was constructed by computer modelling; all the deoxyribose sugars could be successfully incorporated into this model in the normal C2'-endo conformation.
Using similar DEPC and DMS probes of N7 accessibility, as well as excimer fluorescence of 5' pyrene-labelled shorter oligonucleotides of the type (GA)7.5 and (GA)12.5, Rippe et al. (16) came to completely different conclusions about GA:GA pairing in bimolecular homoduplexes of GA repeats. The excimer fluorescence studies indicated that the 5'-pyrene labels were at the same end of the presumed bimolecular duplexes, thus indicating parallel-stranded structures; the formation and stability of these structures did not require acidic conditions and thus did not appear to involve protonated bases.
Based on this information, the authors succeeded in constructing a right-handed, parallel-stranded duplex with a register in which G(syn):G(syn) base pairs alternated with symmetrical A(anti):A(anti) base pairs (16). However, the sugar puckers could only be incorporated into this model in the less usual (for DNA) C3'-endo conformation. The quite different purine-purine pairing schemes in these two models for self-paired (GA)n:(GA)n homoduplexes are shown in Fig. 11.1; the validity of these models remains to be tested at atomic resolution by more powerful techniques such as solution NMR or X-ray crystallography, with the former being preferable by virtue of the avoidance of possible lattice packing artefacts.
Fig. 11.1. The various types of purine-purine pairings discussed in this chapter.
The structural properties of d(GGA)n and d(GGGA)n repeat sequences are even less well understood. The formation of four-stranded tetraplex structures by d(GGA)n repeat sequences has been proposed on the basis of cation-stabilization thermal melting studies (17). Recently, the demonstration of intramolecular hairpin formation by the Drosophila centromeric dodeca-satellite DNA sequence (22) and by d(GGA)n repeats, as well as d(GGGA)n repeat sequences, has shed some light on their pairing (23). In the evolutionarily conserved dodeca-satellite 5'-d(GTACGGGACCGA)n repeats of Drosophila centromeres, the G-rich strand alone has been shown to form a fold-back structure, based on non-denaturing gel electrophoresis, electron microscopy, accessibility to chemical modification, and thermal denaturation studies (22). The central GGGA tract of the 12-mer repeat, and particularly the formation of G:A pairs, was found to be critical for the stability of the intramolecular hairpin forms. However, the alignment and precise geometry of the purine-purine pairing is not known and three different registers for a d(GGGA) tract interacting with a second d(GGGA) tract were proposed, namely a -2 register with a (GA)2 motif, a (GGA)2 motif register, and even a (GGGA)2 motif alignment (22).
By the same token, in d(GGA)n and d(GGGA)n direct tandem repeat sequences, the types of purine-purine pairings and their precise geometry are also unclear.
However, an interesting conclusion has been drawn by Huertas and Azorin (23) on the basis of chemical modification studies; namely, that pairing between d(GGA)n sequences is stabilized by G:A pairing of some kind, while pairing between d(GGGA)n repeats involves only G:G and A:A pairs, and not G:A pairs.
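The logic by which DEPC and DMS reactivities were read in this section can be summarized as a small decision rule. The Python sketch below is a qualitative simplification made for illustration: it treats reactivity as a yes/no observation, whereas the experiments compare relative reactivities, and the function name `interpret_n7_probing` is hypothetical.

```python
# Illustrative sketch only: a yes/no reading of purine N7 probing.
# Real footprinting data are graded, not boolean.


def interpret_n7_probing(base: str, depc_reactive: bool, dms_reactive: bool) -> list[str]:
    """Return qualitative inferences for one purine residue."""
    if base not in ("A", "G"):
        raise ValueError("these probes report on purine N7 atoms only")
    notes = []
    # DEPC carbethoxylates N7 of unpaired purines; adenine reacts far
    # more readily than guanine, so a silent guanine is uninformative.
    if depc_reactive:
        notes.append("unpaired (single-stranded)")
    elif base == "A":
        notes.append("base paired")
    # DMS methylates guanine N7 whether or not the base is paired,
    # unless N7 itself is engaged in a hydrogen bond.
    if base == "G":
        if dms_reactive:
            notes.append("N7 free")
        else:
            notes.append("N7 hydrogen bonded (e.g. Hoogsteen)")
    return notes


if __name__ == "__main__":
    print(interpret_n7_probing("A", depc_reactive=True, dms_reactive=False))
    print(interpret_n7_probing("G", depc_reactive=False, dms_reactive=True))
```

Read this way, the observation that all guanines in the hairpin stem were DMS-susceptible corresponds to the "N7 free" branch, which is the basis for excluding Hoogsteen pairing to G(N7) in that model.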
3. Mismatches between parallel-stranded CGA triplets and their repeats
The above discussions suggest that two different conformations for self-paired d(GA)n repeat sequences may exist under different conditions. Unfortunately, detailed three-dimensional structures, determined either by X-ray crystallographic or NMR solution methods, are not yet available for these repeats. An interesting related observation is that cytidine residues, which can pair with themselves under mildly acidic conditions to form stable C+:C pairs, have been found to help in aligning GA:GA pairing to form the parallel-stranded (CGA)2 motif, first reported by Wang and co-workers (24-26) and later confirmed by Patel's group (27).
The first experimental demonstration of parallel alignment between strands in DNA duplexes was the X-ray structure of crystals of d(CG)2 grown under acidic conditions; the nucleotide residues in these crystals were found to pair via C+:C and G:G homobase pair mismatches (28,29). More recently, Gueron and co-workers (30) have shown that in mildly acidic solution even simple dCn-containing sequences also form parallel-stranded duplexes containing C+:C mismatches; these
parallel-stranded duplexes dimerize intercalatively in an antiparallel orientation to form the four-stranded structure known as the 'i-motif'. Sequences containing one or more CGA triplets were found to adopt distinct structures at pH values below 5.0, which are in a slow exchange equilibrium with the neutral pH form (24). NMR studies of the self-complementary oligodeoxyribonucleotides CGATCG, TCGATCGA, and CGATCGATCG revealed that the neutral forms of these oligomers adopt an antiparallel canonical B-form DNA structure, while their acidic forms are right-handed, parallel-stranded duplexes containing symmetrical C+:C, G:G, A:A, and T:T homobase pairs instead of Watson-Crick pairs (24). The parallel-stranded (CGA)2 motif is crucial to the formation of such structures, which were proposed to be stabilized by strong interstrand GA stacking, as well as by hemiprotonated C+:C pairing.

In fact, the simple tetranucleotide d(TCGA) had been studied by NMR much earlier (31); based on the observation of several shifted proton resonances, it was suggested that this sequence forms a non-B-form DNA duplex at low temperature. The temperature-dependent transitions of this non-B-form structure could be duplicated reversibly by titration to acidic pH values, and a protonated antiparallel (TCGA)2 duplex model containing a G:C+ Hoogsteen base pair was proposed (32). However, the NMR data, which can readily distinguish between the syn and anti orientations of the glycosidic bond, do not support the syn conformation proposed for the guanosines in this model.

The structure of the d(TCGA)2 duplex was solved more recently by Patel's group using more extensive NMR data (27), and was found to form hemiprotonated C+:C pairs, as well as G:G and A:A homopurine pairs, as expected in a parallel-stranded d(CGA)2 motif.
The one-dimensional imino proton spectrum exhibited a resonance at approximately 15 ppm, which is a characteristic of C+:C pairing and reflects the mildly acidic pH conditions. In addition, imino proton resonances were observed at 10.2 and 11.3 ppm. The resonance at 10.2 ppm is characteristic of a non-hydrogen bonded, but slowly exchanging, guanosine imino proton, as occurs, for example, in G:A pairs that are in the sheared, or side-by-side, geometry (33-35). Similarly, the imino proton at 11.3 ppm is characteristic of a thymidine imino proton that is not hydrogen bonded but exchanges slowly with water (36), either as a result of restricted solvent accessibility or the reduced hydroxide/buffer exchange catalysis at lower pH values. The terminal thymidine residues in the d(TCGA)2 sequence thus may not be paired via hydrogen bonding, since the imino proton is the only potential hydrogen bond donor in deoxythymidine; and, indeed, in the NMR structure reported for this parallel-stranded duplex, the thymine base is oriented towards the sugar moiety of the thymidine on the opposite strand (27). It is not clear whether the lack of T:T pairing is an inherent property of the parallel-stranded (TCGA)2 duplex or merely a reflection of the thymines being terminal residues, especially since mismatched T:T pairing has been proposed when the TCGA tetranucleotide is embedded in the centre of a longer sequence (24).

Another important structural feature of sequences forming such parallel-stranded duplexes is strong interstrand G/A stacking, which has some similarities to the interstrand G/G and A/A stacking in antiparallel-stranded tandem sheared G:A pairs in
d(PyGAPu)2 motifs (see below). A comparison between parallel-stranded 5'-(GA)2 stacking and antiparallel-stranded 5'-(GA)2 stacking is shown in Plate IVa. One GpA strand is shown as a space-filling, van der Waals representation, while the second GpA strand is represented in stick-bond form; the space-filling strands have the same orientation, with the guanosine residues on top and the adenosine residues below. The glycosidic bonds of the bases in both duplexes all have the same anti conformation, while the phosphate backbone conformation is quite different for the parallel and antiparallel cases. The torsion angles of the sugar phosphate backbone connecting the guanosine and adenosine residues exhibit an ε(g−)ζ(t) configuration in the antiparallel duplex, while in the parallel-stranded duplex they are both trans in an ε(t)ζ(t) configuration (27). Important differences between the parallel and antiparallel structures occur in the strands shown in stick-bond form. While the guanosine (coloured brown) is located on the bottom in the antiparallel duplex, it is on top in the parallel-stranded duplex. Furthermore, in the parallel-stranded duplex the H8 protons of the purines point into the narrow or 'minor' groove, whereas in the antiparallel duplex they are located in the wide, major groove. It is clear from this Plate that excellent interstrand stacking of both the G/G and A/A type occurs in the antiparallel motif, while only the G/A type of interstrand stacking is observed in the parallel motif. Plate IVb compares the parallel 5'-CGA-3':5'-CGA-3' motif with an antiparallel 5'-CGA-3':3'-GAG-5' duplex containing a (GA)2 motif, viewed from the side instead of end-on. The reference 5'-d(CGA)-3' strands are again shown in van der Waals representation in the same orientation for ease of comparison.
The parallel and antiparallel nature of the two motifs is apparent from this figure. In the parallel motif, the excellent intrastrand C/G and interstrand G/A stacking can easily be seen, while in the antiparallel-stranded (GA)2 motif, the interstrand stacks are of the G/G and A/A type, even though the intrastrand C/G stacking is similar.

It is also worthwhile to compare the G:G and A:A pairing geometry in the two parallel-stranded duplexes, namely the proposed d(GA)n tandem repeat (16) and the parallel-stranded d(TCGA) duplex (27). As can be seen from Figure 11.1b and c, while the A:A pairings between the two parallel duplexes are similar (they superimpose when flipped over horizontally), the G:G pairings are quite different. In the parallel-stranded d(GA)n repeat, the G:G bases pair symmetrically via their N1H and O6 atoms and both guanosine residues adopt the syn glycosidic conformation, while in the d(CGA) motif, the G:G bases pair through their N2H and N3 atoms and are in the anti conformation.

Another point worth noting is that it is now well established (see below) that CGA sequences in CGAG contexts (or TGA sequences in TGAA contexts) form antiparallel duplexes containing two tandem sheared G:A pairs [the (PyGAPu)2 motif] flanked by Watson-Crick pairs (33-35). It is therefore interesting that, in complete contrast to CGAG, a single change to CGAT with a 3'-pyrimidine should result in a parallel duplex with C+:C and homopurine base pairs under acidic conditions (24). This argues for an important structural role for the purine following the tandem G:A pairs in the antiparallel (GA)2 motif. It is not known at this point whether CGAG or TGAA sequences can form parallel duplexes under acidic conditions, but experiments are in progress to investigate this point.
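
The sequence-context requirement described above (a GA step preceded by a pyrimidine and followed by a purine) can be sketched as a simple scan over a DNA strand. The function name `find_pygapu_sites` is hypothetical and introduced only for illustration; the scan reports candidate sequence contexts and makes no prediction about whether a given oligonucleotide actually adopts the sheared geometry.

```python
# Minimal sketch: locate Py-G-A-Pu tetranucleotide contexts in a DNA strand.
# The helper name is an assumption made for illustration, not a published tool.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def find_pygapu_sites(seq):
    """Return 0-based start positions of every Py-G-A-Pu window in seq (5'->3')."""
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 3):
        five_prime, g, a, three_prime = seq[i:i + 4]
        if (five_prime in PYRIMIDINES and g == "G"
                and a == "A" and three_prime in PURINES):
            sites.append(i)
    return sites


if __name__ == "__main__":
    # CGAG and TGAA satisfy the context; CGAT (3'-pyrimidine) does not.
    for context in ("CGAG", "TGAA", "CGAT"):
        print(context, find_pygapu_sites(context))
```

Applied to a full strand, the same scan flags each GA step that sits in a PyGAPu context; a CGAT context, with its 3'-pyrimidine, is not reported.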
4. Tandem sheared G:A mismatches separated by Watson-Crick base pairs
The sequences described above in Sections 2 and 3 all form duplexes containing continuous runs of adjacent mismatched base pairs, i.e. no intervening normal Watson-Crick base pairs are involved. We will now discuss small, stable mismatched motifs containing tandem sheared G:A base pairs that are quite stable when flanked by, and embedded in, normal Watson-Crick base-paired duplexes. Perhaps the most remarkable and unusual feature of these duplexes is that the 'destabilization effect' of each mismatched G:A pair does not accumulate progressively. Instead, they contribute significantly to the stabilization of adjacent G:A mismatches and flanking normal base pairs to form very stable duplexes (33,37), but only in certain sequence contexts (35).
4.1 Tandem sheared G:A mismatches in the [Py(GA)Pu]2 motif: sequence dependence
Non-standard base pairing, including sheared or side-by-side G:A pairing, has long been recognized as a theoretical alternative to standard Watson-Crick pairing in nucleic acids (6,7,38). However, interest in the actual existence and remarkable stability of G:A pairing came from the finding of Wilson and colleagues (39) that certain purine-rich oligodeoxynucleotide sequences could pair with themselves to form duplexes of similar stabilities to those formed in the presence of the complementary pyrimidine-rich strand. Based on sequence alignment, the self-paired homoduplex was proposed to consist of two adjacent G:A pairs separated from another two G:A pairs by two intervening Watson-Crick pairs, and flanked by two Watson-Crick pairs at each end. NMR studies, combined with the effects of replacing guanosine residues with inosines, led to a model in which guanine paired with adenine via G(N2H) to A(N7) and A(N6H) to G(N3) hydrogen bonds (33). Such sheared tandem 5'-GA:GA-3' pairs could be incorporated into an antiparallel duplex model with little distortion from a standard B-form DNA backbone configuration (33).

In the five years following this pioneering study, several structural and thermodynamic investigations of tandem G:A mismatched pairs have been reported (34,35,40-49). Using characteristic chemical shift signatures for the sheared geometry, it was shown from 1D NMR studies that the formation of tandem sheared G:A pairs was sensitive to the orientation of the flanking Watson-Crick pairs, requiring a PyGAPu context on each of the antiparallel strands (35).
Some quite unusual (for DNA) cross-strand NOEs (33,34,40,49) produced a set of distance restraints that led to the determination of fairly high precision structures for DNA duplexes containing one or more [Py(GA)Pu]2 motifs; for example, 15 refined structures exhibiting pairwise rmsd values of 0.96 ± 0.34 Å (40).

To illustrate the gross structural differences between duplexes containing tandem sheared G:A pairs and normal duplexes containing Watson-Crick pairs, these two structures are presented in simplified ring-and-arrow form in Fig. 11.2. As can be seen from the side view into the major groove, the DNA containing the tandem sheared G:A pairs has two kinks in the backbone; part of the cause of this kinking is the result of changes in the backbone torsion angles from the BI conformation of
Fig. 11.2. The unusual structure of the d(GCGAATGAGC)2 decamer duplex (a and c) containing two sheared (PyGAPu)2 motifs (40) compared with the B-DNA (51) crystal structure (b and d) containing normal G:C and A:T Watson-Crick pairs. The phosphate backbones are represented by an arrow ribbon to illustrate the two kinks caused by the BI→BII phosphodiester transition between the two sheared G:A base pairs. (a) Also shows the other unusual structural characteristic of DNAs containing sheared (PyGAPu)2 motifs, namely that the bases do not follow the backbone spiral but stack vertically down each side of the cylinder owing to the cross-over between interstrand and intrastrand stacking, with none of the base twisting that occurs in B-DNA (b). This phenomenon is more easily seen in the end-on views in (c) and (d), where the clear-cut separation of the stacking down the right and left sides can be seen in (c), while the bases are radially distributed in the B-DNA helix shown in (d). This view also demonstrates that the cross-section of the DNA duplexes containing (PyGAPu)2 motifs is not circular, as in B-DNA, but has an elongated oval shape.
B-form DNA to a BII configuration at the GpA phosphodiester link (40,50). Furthermore, the adenosine now swings away from intrastrand stacking on the preceding guanine and participates in cross-strand stacking with the adenine of the opposite strand. The change in phosphodiester conformation results in a downfield shift of the GpA 31P resonance (35,50). Interestingly, this backbone rearrangement to produce cross-strand Pu/Pu stacking is restricted to juxtaposed GpA dinucleotides and does not occur with juxtaposed ApG, ApA, or GpG sequences (S.-H. Chou and B.R. Reid, unpublished observations).

The tandem sheared G:A DNA also has a wider minor groove than B-DNA (Fig. 11.2) (51), which has a very smooth backbone trace. Figures 11.2c and d show end-on views of these duplexes and the base stacking patterns are obviously dramatically different for the two duplexes. While the B-DNA duplex (Fig. 11.2d) adopts the usual intrastrand base stacking, with c. 30-40° of twist per step, the tandem sheared G:A-containing DNA exhibits a combination of intrastrand and interstrand stacking, resulting in two clear-cut sides to the base stacking. The high stability of the tandem sheared G:A pair-containing DNA is obviously a result of the extensive cross-strand purine/purine base stacking and the increased intrastrand stacking of the GpA:GpA dinucleotides with their Watson-Crick nearest neighbours. An increase in diameter is also associated with the tandem sheared G:A-containing DNA duplex (Fig. 11.2c). Plate V shows a comparison between the B-DNA duplex d(GCGAATTCGC)2 (51) and the d(GCGAATGAGC)2 duplex containing tandem G:A pairs (40). To better compare the two, the terminal base pairs of the B-form crystal structure have been removed so that both sequences now have 10 base pairs.
From the figure, it can be seen that the B-DNA structure (left) has regular base stacking with a smooth backbone trace, while the d(GCGAATGAGC)2 duplex containing the (PyGAPu)2 motif, although still a right-handed double helix, has a quite different appearance; residue 8A has swung from intrastrand stacking to an interstrand stack with residue 4A of the opposite strand and, in a similar fashion, 7G now stacks with residue 3G of the opposite strand. This excellent cross-strand G/G and A/A stacking is indicated by a red arrow in the bottom half of Plate V. The accompanying change in the phosphate backbone from a BI to a BII conformation causes a kink in the otherwise smooth backbone trace, as indicated by the blue arrow in the top half of Plate V.

Major effects on the stability and structure of G:A mismatch pairs as a result of changing the immediately adjacent Watson-Crick pairs have been revealed by thermodynamic (33-37) and 31P NMR studies (35). While thermodynamic studies reveal that DNA sequences containing the [PyGAPu]2 motif have stabilities comparable to those of fully Watson-Crick base-paired duplexes (32,37), NMR studies indicate that changes to a different, non-sheared, G:A pairing geometry (or even to non-paired GA bulges) occur when antiparallel GpA dinucleotides are juxtaposed in PuGAPy contexts (35). Thus, the head-to-head Ganti:Aanti geometry, with a hydrogen bonded G imino proton, that occurs in the (AGAT)2 context switches to the more stable sheared tandem G:A pairing in either (CGAG)2 or (TGAA)2 contexts, while no duplex is formed at all in a (GGAC)2 context. This dramatic change in G:A pairing from head-to-head to side-by-side geometry is clearly revealed in the NMR spectrum by a characteristic shifting of the guanosine imino proton resonance from 12.4 ppm (hydrogen bonded) to 10.1 ppm (not hydrogen bonded) (35). The 31P resonances connecting the 5'-GpA-3'
residues are also found to shift from −2 ppm to −3 ppm in the BII conformation (35). It is also interesting to note that tandem adjacent GA:GA pairs in the non-self-complementary DNA sequences 5'-GGACGACATC:GATGGAGTCC-3' were also found to adopt sheared pairing geometry (48). The flanking neighbour stacking interactions PyGA:GAPu were thus proposed to contain the minimal essential elements for the formation and stabilization of the sheared (GA)2 motif (48). However, the full structure of the CGAC:GGAG duplex was not determined and further studies are still needed to clarify and explain fully the context requirements of this motif.

Interest in the relevance and biological function of the (GA)2 motif stems from the finding that the sequence requirements for the replication origin of the single-stranded DNA virus φX174 suggest that it contains two adjacent sheared G:A pairs in a (GA)2 motif (52). A unique sequence within a hairpin region in the φX174 genome was found to be the binding site for the protein n', which is a pre-priming DNA replication enzyme of E. coli (52). It has been suggested that recognition of this hairpin sequence is the signal that leads to the initiation of [...] or alternatively (GGAAT)n, repeats (8) that have recently been shown to be localized at the centromeres of human chromosomes (62). This repeat, which can also be considered to be a (TGGAA)n repeat by simple phase-shifting on the purine strand, is highly conserved among all eukaryotic species and is a high affinity ligand for specific nuclear proteins; the affinity is comparable to other highly selective protein-DNA interactions, such as the lac repressor-operator DNA interaction (62). These observations have led to the suggestion that the (TGGAA)n repeat may be a component of the functional human centromere.
An extremely interesting aspect of this repeat is the fact that the purine-rich strand alone forms homoduplexes that have the same thermal stability as the Watson-Crick duplex formed in the presence of the complementary pyrimidine-rich strand. Several groups have subsequently investigated the structure of the unusual duplex formed by this self-paired repeat. Jaishree and Wang (64) used the phase-shifted variant C(AATGG) sequence as a model of the (AATGG)n tandem repeat. Unfortunately, the additional C residue (which does not occur in the natural repeat) at the 5'-terminus reset the pairing register and forced the C(AATGG) duplex into a configuration with two non-adjacent head-to-head Ganti:Aanti pairs separated by two Watson-Crick A:T pairs. In a separate structural study of this repeating pentamer, Catasti et al. (64) carried out NMR studies on the self-paired duplexes formed by (AATGG)n sequences (where n = 2 or 3) and derived a solution structure in which the repeating motif contained a G:G base pair sandwiched between two sheared G:A pairs. However, the Gsyn:Ganti mismatch pair that they proposed is not compatible with their own NMR data in that there were no strong intranucleotide G(H8) to G(H1') NOEs that would be diagnostic of their proposed Gsyn conformation; a fast flip-flop interconversion between Ganti:Gsyn and Gsyn:Ganti pairing had to be proposed as an ad hoc rationalization for this discrepancy. Furthermore, their assignment of the critical guanosine H3' and H4' protons (64), which are actually upfield shifted by c. 2 ppm owing to an unusual stacking arrangement in the (GGA)2 motif (41,44), also appears to be incorrect. The structure of this repeat was finally solved by Chou et al. (41) using a tandem repeat of the pentamer sequence with a TGGAA phase, i.e.
(GTGGAATGGAAC)2. The fact that the 'G:G pair' was, in fact, not paired at all but was intercalated, was established by guanosine to inosine substitutions. This led to the detection of many unusual and informative NOEs from the inosine H2 proton; for example, in the 3G-4I-5A:8G-9I-10A segment of the duplex, the detection of 4I(H2)-9I(H8) and 9I(H2)-4I(H8) NOEs, together with the absence of 'nearest neighbour' 4I(H8)-5A(H8) and 9I(H8)-10A(H8) NOEs, is incompatible with I:I pairing and establishes the intercalative stacking arrangement of the two I residues on each other. This conclusion was also complemented by many additional and unexpected types of NOEs in this region, as summarized in Fig. 11.3. The unusual H4' chemical shifts and the C3'-endo sugar conformations of the unpaired guanosine residues were confirmed by DQF-COSY and 31P-1H correlation experiments (44).
Fig. 11.3. The NOE connectivity pattern for the antiparallel d(GIA)2 motif. Guanosine residues were replaced by inosines at the central unpaired purine to exploit the extra I(H2)-related, through-space connectivities, which were found to be critical in solving the structure of this unusual motif (41). The detectable NOE connectivities are indicated by solid lines, while those expected in normal DNA, but not detectable experimentally, are indicated by dashed lines. These data are only consistent with 4I/9I intercalating and stacking on each other between the bracketing sheared G:A pairs.
Plate VI compares a standard B-form (GCGAATTCGC)2 crystal structure (left) with the structure of the NMR-derived (TGGAATGGAA)2 duplex containing the pericentromeric TGGAA repeat (right). In this figure, one strand is shown in space-filling display and the other strand in stick-bond form. This emphasizes the excellent cross-strand stacking between the unpaired intercalated guanosine residue and the guanosine residue of the sheared G:A pair, as shown by the parallel interface between the space-filling strand and the stick-bond strand in the bottom half of the duplex. Owing to its obviously different gross morphology compared with standard B-DNA (the major groove is much wider and the minor groove is much narrower), the mode of interaction of this novel duplex with proteins, i.e. isolated HeLa cell nuclear extracts (62), can also be expected to be quite different to that of normal DNA. The d(GGA)2 motif contains a grid of 16 hydrogen bond donors and acceptors, i.e. the N2H-N1H-O6-N7 atoms of the four co-stacked guanine residues, that are exposed to the exterior in the major groove (41,44). Whether or not the self-paired (TGGAA)2 repeat is actually formed in vivo is not yet established but the exposed four-guanine 'sticky patch' is repeated twice per turn, facing opposite sides of the duplex, and could perhaps be responsible for the highly condensed nature of DNA at the centromere, and may even participate in the capture of chromosomes by means of the centromere during mitosis.

The participation of the analogous r(GGA)2 motif in RNA function is less clear, but it should be noted that in the folding of tRNA, the G57 base of the rT loop intercalates between the G19:C56 and G18:Ψ55 tertiary base pairs to form a continuous G19-G57-G18-m1A58 stack (65). This four-purine stack is one of the most important stabilizing interactions in tRNA folding.
4.3 Sheared G:A mismatches in the [Py(GAA)Pu]:[Py(GA)Pu] motif
After the discovery of sheared tandem G:A pairing in the antiparallel (GA)2 and double-guanine intercalative (GGA)2 motifs, it became of interest whether single G intercalation between sheared G:A pairs could also occur, and whether adenosine residues could replace guanosine residues in these intercalations. The biological relevance of this question stems from the fact that a potential antiparallel GAA:GA motif could occur in a highly conserved region at the 3'-termini of single-stranded rodent parvovirus genomes (66,67). A Y-shaped double hairpin fold-back structure was proposed for this conserved sequence that juxtaposes a GA dinucleotide opposite an antiparallel GAA triplet in all four parvovirus sequences, suggesting some essential function for this element, which is located in the region of the genome where initiation of DNA replication occurs. An unpaired bubble structure was originally proposed for this mismatch region, but it is interesting to note that it is 'constrained' in that it is resistant to mung bean endonuclease, which is a single-stranded DNA-cleaving enzyme (67).

We have carried out NMR studies of this potentially important motif that indicate that it does not form an unpaired bubble, but instead forms a G:A-bracketed single-A stack intercalated motif in solution (S.-H. Chou, L. Zhu and B.R. Reid, unpublished results), which explains the resistance of this motif to mung bean endonuclease cleavage. The structure of the 5'-(CGAGTACGAAG)2 11-mer duplex, containing two GAA:GA motifs separated by four Watson-Crick pairs, has been determined (S.-H. Chou, L. Zhu and B.R. Reid, in preparation), and is shown alongside the B-DNA crystal structure of 5'-(GCGAATTCGC)2 in Plate VII. The unpaired adenosine that is intercalated between antiparallel sheared G:A pairs is shown in blue (on both strands) and can be seen to stack very well with the guanine residues of both of the flanking G:A pairs. Interestingly, since the adenine following the sheared G:A guanine now stacks on it, there is now no need for this GpA phosphodiester to switch into a BII configuration to permit cross-strand G/G and A/A stacking. The backbone has now reverted to the B-DNA type with no kink and, unlike the (GA)2 motif, no unusually shifted phosphorus resonances are observed in the 1H-31P correlation spectrum of the GAA:GA motif.
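
The BI and BII phosphodiester states referred to throughout this section are distinguished by the ε and ζ backbone torsion angles. The sketch below assigns a torsion to the g+, g− or trans rotamer range and labels a phosphate as BI or BII from the sign of ε − ζ, a widely used convention. The function names, the 120°-wide rotamer windows and the example angles are illustrative assumptions, not values taken from the structures discussed here.

```python
# Minimal sketch: rotamer and BI/BII assignment from backbone torsions (degrees).
# Names, window boundaries and example angles are illustrative assumptions.


def wrap180(angle):
    """Map any angle in degrees onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def rotamer(angle):
    """Assign a torsion to g+ (around +60), g- (around -60) or t (around 180)."""
    a = wrap180(angle)
    if 0.0 <= a < 120.0:
        return "g+"
    if -120.0 <= a < 0.0:
        return "g-"
    return "t"


def phosphate_state(epsilon, zeta):
    """Label a phosphate BI when (epsilon - zeta) is negative, otherwise BII."""
    return "BI" if wrap180(epsilon - zeta) < 0.0 else "BII"


if __name__ == "__main__":
    # Illustrative BI-like and BII-like (epsilon, zeta) pairs, in degrees.
    for eps, zeta in ((-169.0, -108.0), (-114.0, -174.0)):
        print(rotamer(eps), rotamer(zeta), phosphate_state(eps, zeta))
```

With these illustrative inputs the first pair is labelled ε(t)ζ(g−), i.e. BI, and the second ε(g−)ζ(t), i.e. BII, which matches the ε(g−)ζ(t) description given for the GpA step of the antiparallel (GA)2 motif.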
5. Sheared G:A mismatches closing single-residue hairpin loops
5.1 The (GCA) motif and (GNA) motifs
There has been considerable interest in the structure of small hairpin loops in connection with the discovery of the expansion of tandem triplet repeats in the target genes of several genetic diseases that show anticipation (68,69). The formation of hairpin fold-back structures by either the pyrimidine-rich strand or the purine-rich strand (or both) of these repeated triplets has been suggested to be part of a proposed replicative slippage mechanism for the expansion of the triplet repeats (70). Several gel electrophoretic studies on the formation of fold-back hairpins by such repeats have been reported recently (71-75), including proposals for the formation of loops putatively closed by
A:A or G:G pairs, but the type and register of the mismatch pairings and the actual structure of the base pairs in the stem, as well as the structure of the loop in such hairpins, remain unclear at this point. Earlier structural and thermodynamic studies led to the original conclusion that oligonucleotide hairpins containing fewer than three nucleotides in the loop were sterically impossible (76) and the optimal hairpin loop size in DNA hairpins was considered to be 4-5 residues (77). However, these conclusions were found to require revision when later studies established that the stability of DNA hairpins increased as the size of the loop was reduced, with trinucleotide loops (especially TTT or AAA) being the most stable (78). The nature of the closing pair at the top of the base-paired stem has a major influence on loop stability, and in 1994 Hirao et al. (79) reported that the DNA heptanucleotide d(GCGAAGC) forms an extraordinarily stable fold-back structure that is resistant to nucleases and heat. Based on NMR studies, the authors reported a compact hairpin model with a three-base pair stem closed by a sheared G:A pair and a 'mobile' loop consisting of a single adenosine (79).

In completely separate studies on variants of the d(GGA)2 motif in d(TGGAA)n repeat sequences, the present authors, together with Leiming Zhu, investigated the solution structure of d(TGCAA) sequences, expecting them to form intercalative (GCA)2 motifs analogous to the (GGA)2 motifs described above, since thermal denaturation studies of (GCAAT)6 had shown that it has almost the same melting temperature as the (GGAAT)6 sequence (62).
However, to our surprise, they did not form intercalative motifs, and the decamer CAATGCAATG instead formed an unusual stable hairpin with a four-base-paired stem and a single-cytidine loop closed by a sheared G:A pair (80). Studies of the remaining two NAATGNAATG variants, namely AAATGAAATG and TAATGTAATG, revealed that neither formed a single stable structure. Instead, they both established an equilibrium mixture of hairpins containing a single-residue 'tight-turn' loop closed by a sheared G:A pair [the d(GNA) motif] and bimolecular duplexes containing intercalative d(GNA)2 motifs (42); these two quite different conformations were found to be in slow exchange on the NMR time-scale for both decamer sequences. Thus, GNA triplets exhibit remarkably different folding and interaction properties that depend on the identity of the N residue. When N = G, d(NAATGNAATG) sequences have a strong propensity to form duplexes containing an intercalative d(GGA)2 motif (c. 80% of the population). However, when N = C, such decamers form exclusively hairpins containing a single-C tight-turn loop, i.e. the d(GCA) loop motif. Finally, when N = T or A, the decamers both exhibit slow exchange hairpin-duplex equilibria, with a stronger tendency to form single-residue, tight-loop hairpins (c. 80%) than bimolecular intercalative duplexes (c. 20%) under NMR conditions (42).

The fact that the GCA triplet exclusively forms tight-turn hairpins with single C loops may well be of biological relevance in modulating the folding of pericentromeric DNA since TGCAA is the most common variant in (TGGAA)n runs (81).
We have recently shown that, while (G)TGGAATGGAATGGAA(C) sequences form antiparallel duplexes containing three intercalative (GGA)2 motifs, a single change to (G)TGGAATGCAATGGAA(C) results in the exclusive formation of hairpins containing a (GGA)2 motif in the stem and a (GCA) motif tight-turn loop (45). This extraordinary hairpin-promoting capability of (GCA) triplets in the middle of (TGGAA)n runs would be expected to form multi-arm fold-back structures which
may be related to the condensation of human centromeres. Plate VIII shows the hairpin structure of such a d(TGGAATGCAATGGAA) sequence in two different views (45). In the major groove view, the grid of 16 hydrogen bond donors and acceptors of the four well-stacked guanosines can be clearly seen just below the centre of the right view, while in the minor groove view, on the left, the excellent base stacking in the GCA tight-loop (in which the carbons are blue) is evident. Furthermore, the deoxyribose of residue 8C and the base of residue 9A (in which the carbons are blue) are also 'stacked', as shown at the top of the right view. The deoxyribose H4' proton of residue 8C is coloured yellow in this figure to reveal its direct stacking over the 9A base, which explains its unusually upfield chemical shift of c. 1.8 ppm (45,80). The stacking interaction of the deoxyribose of the loop cytidine residue with the adenine base of the closing sheared G:A pair now explains how this motif can form such small hairpin loops containing only one nucleotide. In a loop closed by a normal Watson-Crick pair, the C5' atom of the ascending strand and the C3' atom of the descending strand are too far apart to be bridged by a single nucleotide, and loops bridging the ends of such stems require a minimum of two nucleotides (77). However, the sheared geometry of the closing G:A pair swings the ends of the two stem strands closer together and this, combined with the interaction of the loop residue sugar ring with the closing G:A pair, is sufficient to permit bridging by a single nucleotide.
5.2 (AAA) and (GAG) motifs
Given the requirement for sheared G:A pairs in closing single-nucleotide loops, an interesting question became whether this function could be carried out by other Pu:Pu combinations in sheared geometry. We have now extended this closing pair motif to A:A and G:G pairs. The DNA sequences d(GTACAAAGTAC) and d(GTACGAGGTAC) also form hairpins with analogous tight-turn loops, containing a single adenosine residue, that are closed by sheared A:A and G:G pairs, respectively; the solution structure of the d(GTACAAAGTAC) 11-mer hairpin has been rigorously determined by NMR distance geometry methods (43). Because of the small molecular size of this undecamer, its well-resolved NMR spectra, and abundant distance constraints from the A(H2) protons and stereospecifically assigned H5'/H5" protons, the rmsd between 30 distance geometry structures was only c. 1.15 Å before energy minimization. The backbone ε, β and γ torsion angles were also constrained from 31P-1H correlation experiments combined with the in-plane 'W' rule (82,83). The ζ and α dihedral angles were excluded from the trans domain, based on the observation of no unusually upfield-shifted 31P resonances (84). These backbone torsion angle restraints were found to be quite useful for converging the distance geometry structures for this single-A loop hairpin. The structure of the d(GTACAAAGTAC) hairpin is shown in two different views in Plate IX. In the left view, the kink in the backbone of the loop region is indicated by a blue arrow and is brought about mainly by a change in torsion angles from ζ(g−), α(g−) to ζ(g+), α(g+) at the turn, and by a change in the γ torsion angle of residue 7A from gauche+ to trans.
The excellent 5A/6A base stacking and 6A deoxyribose/7A stacking in the major groove is clear in the right view, while the clear distinction between the major and minor grooves is evident in the left view.
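The sequence pattern shared by the single-residue loop hairpins described above (a Watson-Crick stem flanking a three-nucleotide turn whose outer residues can form a sheared G:A, A:A or G:G closing pair) can be sketched as a simple scan. This is only an illustration: the function name, the `min_stem` parameter and the stem-counting rule are assumptions introduced here, and a pattern match does not show that a sequence actually adopts this fold in solution.

```python
# Hypothetical helper: locate candidate single-residue tight-turn loops.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Closing pairs discussed in the text, written as (5' residue, 3' residue).
SHEARED_CLOSING_PAIRS = {("G", "A"), ("A", "A"), ("G", "G")}


def single_residue_loop_candidates(seq, min_stem=2):
    """Return (index, triplet, stem_length) for each candidate turn.

    index is the 0-based position of the 5' residue of the closing pair;
    stem_length counts contiguous Watson-Crick pairs flanking the triplet.
    """
    seq = seq.upper()
    hits = []
    for i in range(min_stem, len(seq) - min_stem - 2):
        five, three = seq[i], seq[i + 2]
        if (five, three) not in SHEARED_CLOSING_PAIRS:
            continue
        # Extend the stem outward from the closing pair.
        stem = 0
        while (
            i - 1 - stem >= 0
            and i + 3 + stem < len(seq)
            and COMPLEMENT[seq[i - 1 - stem]] == seq[i + 3 + stem]
        ):
            stem += 1
        if stem >= min_stem:
            hits.append((i, seq[i:i + 3], stem))
    return hits


if __name__ == "__main__":
    for s in ("GTACAAAGTAC", "GTACGAGGTAC"):
        print(s, single_residue_loop_candidates(s))
```

For both undecamers the scan reports the central triplet (AAA or GAG) flanked by a four-base-pair GTAC stem, matching the hairpins described in the text.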
Owing to the unusual stacking interaction between the 3'-adenine base and the deoxyribose ring of the central nucleotide in the d(GNA) and d(AAA) tight-turn motifs, as well as in the d(GNA)2 intercalative motifs, the H4' proton of this sugar experiences extraordinary upfield shifts of c. 2 ppm into the H2'/H2" spectral region (41-43). This is due to the 'stacking' of the unpaired N residue sugar directly over the strong ring current of the following adenine base, producing a maximal upfield ring current shift on this H4' proton (see the right views in Plates VIII and IX). This special stacking of deoxyribose sugar rings with their O4' and H4' atoms directly on top of an adjacent base is not without precedent, and has been observed at CpG steps in Z-DNA (89), in the crystal structures of self-intercalated dimers of cyclic d(ApAp) (85), and in the crystal structures of DNA complexes with drugs (86-88). The stabilizing forces in such interactions have been discussed recently and are believed to arise from O4'-CH hydrogen bonding and an n→π* interaction (89).
5.3 Sheared pairs and single-residue loops in biology
It should be noted that the single-residue loop hairpins discussed above are required for promoter recognition by the N4 virion RNA polymerase (90,91) and are extensively used by the single-stranded DNA parvoviruses in forming the special 'rabbit's ears' fold-back structure at their 3'-terminus that is necessary for their replication (66,67). In the coliphage N4, the phage-coded RNA polymerase is unable to transcribe normal double-stranded DNA. However, it can transcribe, accurately and efficiently, single-stranded, promoter-containing templates. It was later proposed that a hairpin is formed at the promoter region which, after forming an 'activated promoter' with E.
coli single-stranded-binding proteins, can be recognized by virion RNA polymerase. Analysis of a large series of mutant promoters revealed that a particular template secondary structure, specifically a hairpin with a 5-7 bp stem and a three-base loop, was required for recognition of the promoter by the RNA polymerase. These hairpin structures contain a core sequence consisting of either 5'-GCGAAGC or 5'-GCGGAGC. The sequence 5'-GCGAAGC has already been found to be extraordinarily stable (79), existing predominantly as a compact hairpin with a single-A residue loop that is closed by a sheared G:A pair (42,79). The 5'-GCGGAGC sequence should also be able to form a similar hairpin structure with a single-G loop, or at least a hairpin-duplex equilibrium, if it is embedded in a relatively long sequence. By itself, this heptamer is more likely to form a duplex structure with an intercalated d(GGA)2 motif (41,42,44). The single-residue hairpin loop is even more prevalent in the single-stranded DNA parvoviruses (66,67). The double-hairpin secondary structure proposed on the basis of sequence data contains a GAA single-A loop motif at the end of one hairpin and a proposed CAG loop at the end of the other hairpin of the Y-shaped 'rabbit's ear' at the 3'-terminus, as well as an AAA loop capping the proposed hairpin at the 5'-end of the genome, as shown at the top of Fig. 11.4. Careful examination of the parvovirus sequence reveals that the proposed hairpin containing the CAG loop can actually be rearranged into a hairpin with a tight-turn d(GCA) motif that contains the same number of G:C pairs in the stem, as shown at the bottom of Fig. 11.4. A consequence of this rearrangement is that there will now be two unpaired nucleotides
Fig. 11.4. The originally proposed secondary structure of the parvovirus genome (upper) (66,67) and the revision suggested in this chapter (lower). Note the reversed numbering system (starting from the 3' end) used by the authors. The 3'-end of the genome contains a Y-shaped 'rabbit ear' structure, in which a triplet d(GAA) loop motif is present (shown in bold). The d(CAG) loop proposed in the upper diagram can be rearranged into an alternative hairpin containing a d(GCA) loop and two unpaired residues at the three-way junction (lower figure) that we believe would be a more stable form, because of the strong hairpin-promoting ability of the d(GCA) motif and the reported stabilizing effect of two unpaired bases at the three-way junction (93,94). A conserved GAA:GA motif is present at positions 91, 90, 89 and positions 26, 25. The structure of this motif has recently been solved and contains an unpaired A residue intercalated between two sheared G:A pairs. A d(AAA) loop motif is also present near the 5'-end of the genome. The parvovirus genome thus contains several variants of sheared Pu:Pu pairing motifs, which may serve important biological recognition functions in this single-stranded DNA virus.
(G45-C46 or G70-C71; note the inverted numbering from the 3'-end) in the three-way junction region. We believe that the latter folding will be the more stable form because of the strong hairpin-promoting ability of the GCA triplet motif and, furthermore, the two unpaired residues would stabilize the DNA three-way junction (92,93). If this rearranged model is correct, then the fold-back secondary structure of single-stranded DNA parvoviruses has made great use of sheared mismatch G:A and A:A pairing by forming a single-C loop d(GCA) motif, a single-A loop d(GAA) motif, a single-A loop d(AAA) motif, and an intercalated GAA:GA motif in the stem of the Y-shaped 3'-fold-back (see Fig. 11.4). Because these motifs are so well conserved in all parvovirus genomes, these special structures and their motifs are probably important for the autonomous DNA replication of these viruses.
6. Sheared GA mismatches closing two-residue hairpin loops
Since the sugar-sugar geometry in sheared Pu:Pu pairs allows them to be easily bridged by a single-nucleotide loop, one would surmise that two-residue loops closed by sheared G:A pairs should be at least as stable and, indeed, such loop structures in DNA do occur. Hirao et al. (94) first reported a rather stable structure for the short single-stranded DNA sequence GCGAAAGC, which moves unexpectedly faster than other DNA octamers during electrophoresis in denaturing polyacrylamide gels containing 7 M urea. At that time a hairpin structure consisting of a two-G:C base-paired stem with a GAAA loop was proposed. A related result was obtained later when Sandusky et al. (95) reported that the DNA 15-mer d(TGCGGCAGCAACAGC) reverse transcript of T-cell leukaemia virus 2 was found not to be suitable as a DNA sequencing primer. This pentadecamer was found to have the electrophoretic mobility of a nonamer on denaturing acrylamide gels in 8 M urea. An NMR-derived structural model indicated that this sequence (with a melting temperature of c. 75°C) forms a hairpin structure with three Watson-Crick pairs in a 4 bp stem, a two-residue loop closed by a sheared G:A pair, and five dangling bases (95). Studies on a separate series of DNA oligomers indicated that 5'-CGXYAG sequences have melting temperatures 18-20°C higher than DNA oligomers with 5'-CAXYGG sequences (95). The identities of the XY loop residues were found to play a negligible role in determining the stability of these DNA hairpins. The orientation dependence of the flanking Watson-Crick context was implicated by the observation that 5'-GGXYAC hairpins were found to have c. 20°C lower melting temperatures than 5'-CGXYAG sequences.
The two-residue loop GCGAAAGC sequence has also been found at the replication origin of the single-stranded DNA phage G4 (96). However, the hairpin structure proposed by the authors contains a five-residue AAAGC loop closed by a G:C pair, instead of the two-residue AA loop closed by a G:A pair that is formed by the isolated GCGAAAGC octamer sequence. Hirao et al. (97) constructed a series of hairpins containing different stem lengths, and found that at least eight base pairs in the stem were required to maintain the AAAGC loop hairpin structure proposed by Sims et al. (96). When the number of base pairs in the stem was fewer than eight, the secondary structure rearranged to form the hairpin with a two-residue AA loop closed by a sheared G:A pair (97). In fact, they found that the 'GAAA loop' hairpin with only four Watson-Crick pairs in the stem was even more stable than the AAAGC loop hairpin containing eight Watson-Crick pairs in the stem. This suggests that the sheared G:A-closed, two-residue loop is a very stable motif that readily forms stable hairpin structures when flanked by self-complementary sequences. In the case of RNA, similar stable, two-residue r(GNRA) loop hairpins closed by sheared G:A pairs have also been found both in solution (98) and in the crystalline state (99).
7. Conclusion
From the above discussion, it is clear that d(PyGNAPu) motifs containing sheared G:A pairs have established a special niche of their own in the Watson-Crick based B-DNA world. When there is no N residue between adjacent G:A pairs, such sequences
form the rather stable d(PyGAPu)2 motif. A similar stable sheared d(PyGAPu):d(PyAAPu) antiparallel duplex motif has also been found (100). When N = C, T, or A, such sequences form d(PyGCAPu), d(PyGTAPu), or d(PyGAAPu) tight-turn single-residue loop motifs, although bimolecular d(PyGTAPu)2 and d(PyGAAPu)2 intercalative duplex forms are also detected when N = T or A. The d(PyGCAPu) case is special in that no intercalative d(PyGCAPu)2 duplex is observed. For the N = A case, stable d(PyAAAPu) and d(PyGAGPu) tight-loop motifs can also be formed. The two-residue, G:A-closed d(PyGAAAPu) hairpin loop motif has also been found to be stable. In contrast, when N = G, the stable, symmetrical bimolecular d(PyGGAPu)2 motif is formed, with two unpaired guanosine residues intercalated between adjacent sheared G:A pairs. When N = A on one strand only, a stable non-symmetrical d(PyGAAPu):d(PyGAPu) motif with only a single unpaired adenosine residue intercalated between adjacent sheared G:A pairs can be formed. It is important to note that, except for closing single- or double-residue hairpin loops, no single isolated sheared G:A pairs have been found in duplexes or hairpin stems. Only adjacent pairs of sheared G:A pairs in (PyGAPu)2 motifs are found to be stable and compatible with flanking B-DNA geometry. Although G:A mismatches with other types of geometry can coexist with Watson-Crick B-DNA, they all introduce some sort of destabilization to the parent B-form duplexes. The cross-strand stacking, either in the form of base-base or base-deoxyribose stacking, is a characteristic of these tandem pairings and is thus of critical importance to the stabilization of these mismatched DNA sequences.
It will be of interest in the future to ascertain to what extent, if any, these Pu:Pu mismatches participate in the structure and expansion of GCA repeats in spinobulbar muscular atrophy (101), myotonic dystrophy (102) and Huntington's disease (103), in the structure and expansion of GCG repeats in Fragile X mental retardation (104), and in the structure and expansion of GAA repeats in Friedreich's ataxia (105).
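The dependence of the d(PyGNAPu) motif on the identity of N, as summarized in this conclusion, can be restated as a small lookup combined with a sequence scan. The sketch below is only a bookkeeping aid: the names `REPORTED_FORMS` and `find_pygnapu` are assumptions introduced here, and the table records the forms reported in this chapter for short oligomers under NMR conditions rather than predicting behaviour in other sequence contexts.

```python
import re

# Forms reported in this chapter for d(PyGNAPu), keyed by the N residue.
REPORTED_FORMS = {
    "C": ("tight-turn hairpin",),
    "T": ("tight-turn hairpin", "intercalative duplex"),
    "A": ("tight-turn hairpin", "intercalative duplex"),
    "G": ("intercalative duplex",),
}

# Py-G-N-A-Pu; the lookahead lets overlapping matches be reported.
PYGNAPU = re.compile(r"(?=([CT]G([ACGT])A[AG]))")


def find_pygnapu(seq):
    """Return (index, five_mer, reported_forms) for each PyGNAPu match."""
    seq = seq.upper()
    return [
        (m.start(), m.group(1), REPORTED_FORMS[m.group(2)])
        for m in PYGNAPU.finditer(seq)
    ]


if __name__ == "__main__":
    for s in ("CAATGCAATG", "TGGAATGGAATGGAA"):
        for hit in find_pygnapu(s):
            print(s, hit)
```

Applied to CAATGCAATG the scan finds the single TGCAA site, and applied to TGGAATGGAATGGAA it finds the three TGGAA sites discussed earlier in the chapter.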
Acknowledgements
The National Chung-Hsing University, Taiwan is thanked for financial support (to S.H.C.) and B.R.R. gratefully acknowledges the support of NIH grants GM32681 and GM52883.
References
1. Loeb, L.A. and Kunkel, T.A. (1982) Ann. Rev. Biochem. 52, 429.
2. Kramer, B., Kramer, W. and Fritz, H.-J. (1984) Cell 38, 879.
3. Dohet, C., Wagner, R. and Radman, M. (1985) Proc. Natl. Acad. Sci. USA 82, 503.
4. Fazakerley, G.V., Quignard, E., Woisard, A., Guschlbauer, W., van der Marel, G.A., van Boom, J.H., Jones, M. and Radman, M. (1986) EMBO J. 5, 3697.
5. Su, S.-S., Lahue, R.S., Au, K.G. and Modrich, P. (1988) J. Biol. Chem. 263, 6829.
6. Donohue, J. (1956) Proc. Natl. Acad. Sci. USA 42, 60.
7. Poltev, V.I. and Shulyupina, N.V. (1986) J. Biomol. Struct. Dynamics 3, 739.
8. Prosser, J., Frommer, M., Paul, C. and Vincent, P.C. (1986) J. Mol. Biol. 187, 145.
9. Zakian, V.A. (1995) Science 270, 1601.
10. Fazakerley, G.V. and Boulard, Y. (1995) Meth. Enzymol. 261, 145.
11. Manor, H., Rao, B.S. and Martin, R.G. (1988) J. Mol. Evol. 27, 96.
12. Palecek, E. (1991) CRC Crit. Rev. Biochem. Mol. Biol. 26, 151.
13. Yee, H.A., Wong, A.K., van de Sande, J.H. and Rattner, J.B. (1991) Nucl. Acids Res. 19, 949.
14. Aharoni, A., Baran, N. and Manor, H. (1993) Nucl. Acids Res. 21, 5221.
15. Huertas, D., Bellsolell, L., Casasnovas, J.M., Coll, M. and Azorin, F. (1993) EMBO J. 12, 4029.
16. Rippe, K., Fritsch, V., Westhof, E. and Jovin, T.M. (1992) EMBO J. 11, 3777.
17. Lee, J.S. (1990) Nucl. Acids Res. 18, 6057.
18. Baran, N., Lapidot, A. and Manor, H. (1991) Proc. Natl. Acad. Sci. USA 88, 507.
19. Lapidot, A., Baran, N. and Manor, H. (1989) Nucl. Acids Res. 17, 883.
20. Sen, D. and Gilbert, W. (1988) Nature 334, 364.
21. Sundquist, W.I. and Klug, A. (1989) Nature 342, 825.
22. Ferrer, N., Azorin, F., Villasante, A., Gutierrez, C. and Abad, J.P. (1995) J. Mol. Biol. 245, 8.
23. Huertas, D. and Azorin, F. (1996) Biochemistry 35, 13125.
24. Robinson, H., van der Marel, G.A., van Boom, J.H. and Wang, A.H.-J. (1992) Biochemistry 31, 10510.
25. Robinson, H. and Wang, A.H.-J. (1993) Proc. Natl. Acad. Sci. USA 90, 5224.
26. Robinson, H., van Boom, J.H. and Wang, A.H.-J. (1994) J. Am. Chem. Soc. 116, 1565.
27. Wang, Y. and Patel, D.J. (1994) J. Mol. Biol. 242, 508.
28. Cruse, W.B.T., Egert, E., Kennard, O., Sola, G.B., Salisbury, S.A. and Viswamitra, M.A. (1983) Biochemistry 22, 1833.
29. Coll, M., Solans, X., Font-Altaba, M. and Subirana, J.A. (1987) J. Biomol. Struct. Dynamics 4, 797.
30. Gehring, K., Leroy, J.-L. and Gueron, M. (1993) Nature 363, 561.
31. Reid, D.G., Salisbury, S.A., Brown, T. and Williams, D.H. (1985) Biochemistry 24, 4325.
32. Topping, R.T., Stone, M.P., Brush, C.K. and Harris, T.M. (1988) Biochemistry 27, 7216.
33. Li, Y., Zon, G. and Wilson, W.D. (1991) Proc. Natl. Acad. Sci. USA 88, 26.
34. Chou, S.-H., Cheng, J.-W. and Reid, B.R. (1992) J. Mol. Biol. 228, 138.
35. Cheng, J.-W., Chou, S.-H. and Reid, B.R. (1992) J. Mol. Biol. 228, 1037.
36. Hare, D.R. and Reid, B.R. (1986) Biochemistry 25, 5341.
37. Ebel, S., Lane, A.N. and Brown, T. (1992) Biochemistry 31, 12083.
38. Saenger, W. (1984) Principles of Nucleic Acid Structure. Springer, New York.
39. Wilson, W.D., Dotrong, M.H., Zuo, E.T. and Zon, G. (1988) Nucl. Acids Res. 16, 5137.
40. Chou, S.-H., Cheng, J.-W., Fedoroff, O. and Reid, B.R. (1994) J. Mol. Biol. 241, 467.
41. Chou, S.-H., Zhu, L. and Reid, B.R. (1994) J. Mol. Biol. 244, 259.
42. Chou, S.-H., Zhu, L. and Reid, B.R. (1996) J. Mol. Biol. 259, 445.
43. Chou, S.-H., Zhu, L., Gao, Z., Cheng, J.-W. and Reid, B.R. (1996) J. Mol. Biol. 264, 981.
44. Zhu, L., Chou, S.-H. and Reid, B.R. (1995) J. Mol. Biol. 254, 623.
45. Zhu, L., Chou, S.-H. and Reid, B.R. (1996) Proc. Natl. Acad. Sci. USA 93, 12159.
46. Lane, A., Martin, S.R., Ebel, S. and Brown, T. (1992) Biochemistry 31, 12087.
47. Lane, A., Ebel, S. and Brown, T. (1994) Eur. J. Biochem. 220, 717.
48. Katahira, M., Sato, H., Mishima, K., Uesugi, S. and Fujii, S. (1993) Nucl. Acids Res. 21, 5418.
49. Green, K.L., Jones, R.L., Li, Y., Robinson, H., Wang, A.H., Zon, G. and Wilson, W.D. (1994) Biochemistry 33, 1053.
50. Chou, S.-H., Cheng, J.-W., Fedoroff, O.Y., Chuprina, V.P. and Reid, B.R. (1992) J. Am. Chem. Soc. 114, 3114.
51. Wing, R., Drew, H., Takano, T., Broka, C., Tanaka, S., Itakura, K. and Dickerson, R.E. (1980) Nature 287, 755.
52. Shlomai, J. and Kornberg, A. (1980) Proc. Natl. Acad. Sci. USA 77, 799.
53. Freemont, P.S., Lane, A.N. and Sanderson, M.R. (1991) Biochem. J. 278, 1.
54. Steitz, T.A. (1990) Q. Rev. Biophys. 23, 205.
55. Pley, H.W., Flaherty, K.M. and McKay, D.B. (1994) Nature 372, 68.
56. Scott, W.G., Finch, J.T. and Klug, A. (1995) Cell 81, 991.
57. Uhlenbeck, O.C. (1987) Nature 328, 596.
58. Nowak, R. (1994) Science 263, 608.
59. Blackburn, E.H. (1991) Nature 350, 569.
60. Moyzis, R.K., Torney, D.C., Meyne, J., Buckingham, J.M., Wu, J.R., Burks, C., Sirotkin, K.M. and Goad, W.B. (1989) Genomics 4, 273.
61. Williamson, J.R., Raghuraman, M.K. and Cech, T.R. (1989) Cell 59, 871.
62. Grady, D.L., Ratliff, R.L., Robinson, D.L., McCanlies, E.G., Meyne, J. and Moyzis, R.K. (1992) Proc. Natl. Acad. Sci. USA 89, 1695.
63. Jaishree, T.N. and Wang, A.H.-J. (1994) FEBS Lett. 347, 99.
64. Catasti, P., Gupta, G., Garcia, A.E., Ratliff, R., Hong, L., Yau, P., Moyzis, R.K. and Bradbury, E.M. (1994) Biochemistry 33, 3819.
65. Rich, A., Quigley, G.J. and Wang, A.H.-J. (1979) in Stereodynamics of Molecular Systems (Sarma, R.H. ed.), pp. 315-330. Pergamon Press, New York.
66. Astell, C.R., Chow, M.B. and Ward, D.C. (1985) J. Virol. 54, 171.
67. Astell, C.R., Smith, M., Chow, M.B. and Ward, D.C. (1979) Cell 17, 691.
68. Bates, G. and Lehrach, H. (1994) BioEssays 16, 277.
69. Sutherland, G.R. and Richards, R.I. (1995) Proc. Natl. Acad. Sci. USA 92, 3636.
70. Richards, R.I. and Sutherland, G.R. (1994) Nature Genet. 6, 114.
71. Gacy, A.M., Goellner, G., Juranic, N., Macura, S. and McMurray, C.T. (1995) Cell 81, 533.
72. Yu, A., Dill, J., Wirth, S.S., Huang, G., Lee, V.H., Haworth, I.S. and Mitas, M. (1995) Nucl. Acids Res. 23, 2706.
73. Yu, A., Dill, J. and Mitas, M. (1995) Nucl. Acids Res. 23, 4055.
74. Chen, F.-M. (1991) Biochemistry 30, 4472.
75. Mitas, M., Yu, A., Dill, J. and Haworth, I.S. (1995) Biochemistry 34, 12803.
76. Tinoco, I.J., Uhlenbeck, O.C. and Levine, M.D. (1971) Nature 230, 362.
77. Haasnoot, C.A.G., Hilbers, C.W., van der Marel, G.A., van Boom, J.H., Singh, U.C., Pattabiraman, N. and Kollman, P.A. (1986) J. Biomol. Struct. Dynamics 3, 843.
78. Germann, M.W., Kalisch, B.W., Lundberg, P., Vogel, H.J. and van de Sande, J.H. (1990) Nucl. Acids Res. 18, 1489.
79. Hirao, I., Kawai, G., Yoshizawa, S., Nishimura, Y., Ishido, Y., Watanabe, K. and Miura, K. (1994) Nucl. Acids Res. 22, 576.
80. Zhu, L., Chou, S.-H., Xu, J. and Reid, B.R. (1995) Nature Struct. Biol. 2, 1012.
81. Vissel, B., Nagy, A. and Choo, K.H.A. (1992) Cytogenet. Cell Genet. 61, 81.
82. Sarma, R.H., Mynott, R.J., Wood, D.J. and Hruska, F.E. (1973) J. Am. Chem. Soc. 95, 6457.
83. Altona, C. (1982) Rec. Trav. Chim. Pays-Bas 101, 413.
84. Gorenstein, D.G., Schroeder, S.A., Fu, J.M., Metz, J.T., Roongta, V. and Jones, C.R. (1988) Biochemistry 27, 7223.
85. Frederick, C.A., Coll, M., van der Marel, G.A., van Boom, J.H. and Wang, A.H.-J. (1988) Biochemistry 27, 8350.
86. Teng, M.-K., Usman, N., Frederick, C.A. and Wang, A.H.-J. (1988) Nucl. Acids Res. 16, 2671.
87. Kopka, M.L., Yoon, C., Goodsell, D., Pjura, P. and Dickerson, R.E. (1985) Proc. Natl. Acad. Sci. USA 82, 1376.
88. Coll, M., Frederick, C.A., Wang, A.H.-J. and Rich, A. (1987) Proc. Natl. Acad. Sci. USA 84, 8385.
89. Egli, M. and Gessner, R.V. (1995) Proc. Natl. Acad. Sci. USA 92, 180.
90. Glucksmann-Kuis, M.A., Malone, C., Markiewicz, P. and Rothman-Denes, L.B. (1992) Cell 70, 491.
91. Glucksmann-Kuis, M.A., Dai, X., Markiewicz, P. and Rothman-Denes, L.B. (1996) Cell 84, 147.
92. Leontis, N.B., Kwok, W. and Newman, J.S. (1991) Nucl. Acids Res. 19, 759.
93. Leontis, N.B., Hills, M.T., Piotto, M., Ouporov, I.V., Malhotra, A. and Gorenstein, D.G. (1994) Biophys. J. 68, 251.
94. Hirao, I., Nishimura, Y., Naraoka, T., Watanabe, K., Arata, Y. and Miura, K. (1989) Nucl. Acids Res. 17, 2223.
95. Sandusky, P., Wooten, E.W., Kurochkin, A.V., Kavanaugh, T., Mandecki, W. and Zuiderweg, E.R. (1995) Nucl. Acids Res. 23, 4717.
96. Sims, J., Capon, D. and Dressler, D. (1979) J. Biol. Chem. 254, 12615.
97. Hirao, I., Ishida, M., Watanabe, K. and Miura, K. (1990) Biochim. Biophys. Acta 1087, 199.
98. Heus, H.A. and Pardi, A. (1991) Science 253, 191.
99. Pley, H.W., Flaherty, K.M. and McKay, D.B. (1994) Nature 372, 111.
100. Maskos, K., Gunn, B.M., LeBlanc, D.A. and Morden, K.M. (1993) Biochemistry 32, 3583.
101. La Spada, A.R., Wilson, E.M., Lubahn, D.B., Harding, A.E. and Fischbeck, K.H. (1991) Nature 352, 77.
102. Brook, J.D., McCurrach, M.E., Harley, H.G., Buckler, A.J., Church, D., Aburatani, H., Hunter, K., Stanton, V.P., Thirion, J.P., Hudson, T. et al. (1992) Cell 68, 799.
103. The HD Collaborative Research Group (1993) Cell 72, 971.
104. Verkerk, A.J.M.H., Pieretti, M., Sutcliffe, J.S., Fu, Y.H., Kuhl, D.P., Pizzuti, A., Reiner, O., Richards, S., Victoria, M.F., Zhang, P.P. et al. (1991) Cell 65, 905.
105. Campuzano, V., Montermini, L., Molto, M.D., Pianese, L., Cossee, M., Cavalcanti, F., Monros, E., Rodius, F., Duclos, F., Monticelli, A. et al. (1996) Science 271, 1423.
12 Structures of nucleic acid triplexes
Edmond Wang and Juli Feigon
Department of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095, USA
1. Introduction
1.1 Historical background
During the past 10 years, interest in triple-stranded nucleic acids (triplexes) has been considerable. Hundreds of research papers have been published on various aspects of triple-stranded nucleic acids: these include studies ranging from stability and sequence requirements to potential biological roles and pharmaceutical uses. Although the idea of triple-stranded nucleic acids seems contemporary, it actually dates back to 1953, when Linus Pauling proposed a three-stranded model for standard DNA that placed the phosphate backbones in the centre of a helix and the bases facing outwards (1,2). This model was considered unlikely for electrostatic reasons, and the correct double-stranded structure was proposed by Watson and Crick (3). A few years later, Felsenfeld and coworkers discovered that nucleic acids actually could form a three-stranded structure (4,5). Using RNA homopolymers, they demonstrated that poly r(A) and poly r(U), besides forming a duplex (presumably with Watson-Crick-type base pairs), could also form a complex with a stoichiometry of 2:1 [poly r(U)]:[poly r(A)]. They proposed that the second poly r(U) strand was binding in one of the grooves of the duplex and base pairing to either the adenine strand, the uracil strand, or both strands, but was most likely Hoogsteen base pairing (6) to the adenine strand. Additional triple-stranded RNAs were soon discovered. Poly r(C) was shown to form a 1:2 complex with poly r(G) (7) or with guanosine oligonucleotides (8,9). Lipsett demonstrated that a duplex of Watson-Crick base pairs was formed initially, and that the second guanine strand bound to this duplex.
Lipsett proposed that the second guanine strand bound to the duplex in the major groove via Hoogsteen pairing to the first guanine strand, forming G:GC triplets (see footnote 1). Interestingly, if the pH was lowered to pH 5 or 6, poly r(C) was shown to form a 2:1 complex with guanosine oligonucleotides preferentially (9,10), rather than a 1:2 complex. Lipsett proposed that the second poly r(C) strand was protonated at the N3 (imino) positions, enabling it to form Hoogsteen base pairs with the poly r(G) strand, to form C+:GC triplets (9). The formation of triple-stranded DNA structures was subsequently demonstrated for poly d(I):poly d(I):poly d(C) (11,12) and poly d(T):poly d(T):poly d(A) (13).
Footnote 1. In the triplet convention used here, the base preceding the ':' is the third strand base. The base following the ':' is the Watson-Crick strand base that is involved in the (reverse) Hoogsteen pairing with the third strand base. The last base is the other Watson-Crick base.
Mixtures of RNA and DNA polymers were also shown to form triplexes, such as poly d(T):poly r(A):poly d(T) (14) and poly d(I):poly d(I):poly r(C) (12). Many additional triplexes were formed from polymers of modified bases of both DNA and RNA (reviewed in refs 15 and 16). These studies, although limited to homonucleotide polymers, demonstrated that triplex formation was sequence dependent [e.g. poly d(T) would form a triplex with poly d(A):poly d(T), but not with poly d(G):poly d(C)]. It was not initially recognized that many of these triplexes could be isomorphous structures. In 1968, Morgan and Wells showed that a stable triplex could be constructed from the mixed sequence polynucleotide poly r(UC):poly d(GA):poly d(TC) (17). This important result led to the realization that the sequence requirements for triplex formation could be generalized to homopurine-homopyrimidine sequences (at least in the case of a homopyrimidine third strand). Furthermore, these results indicated that at least two of the triplets (U:AT and C+:GC) were likely to be isosteric. Morgan and Wells also found they could inhibit RNA polymerase by using an exogenous RNA strand to target a duplex DNA sequence, suggesting a possible biological role for triple-stranded nucleic acids.
1.2 Overview of triplex motifs and triplet base-pairing schemes
A triplex is formed by the binding of a third nucleic acid strand in the major groove of a duplex nucleic acid. The duplex must generally be composed of a homopurine-homopyrimidine sequence (for reviews, see refs 18-24). There are two types of triplexes that can be distinguished by the orientation and composition of their third strand.
I n thi s review , w e defin e th e tw o type s o f triplexe s as : (i) parallel motif triplexes , als o know n a s the pyrimidine , o r YRY , moti f triplexes ; an d (ii ) th e antiparallel motif triplexes , also known a s the purine, o r RRY, moti f triplexes. The paralle l motif i s generally characterize d by a homopyrimidine thir d stran d that binds paralle l t o th e homopurin e stran d o f the duple x (centra l strand of th e triplex) . This moti f has two canonica l triplets: a T:AT triplet , whic h i s formed when a thymine in th e thir d stran d Hoogstee n bas e pairs with a n adenin e i n th e duple x (Fig . 12.la) , and a C+:GC triplet, whic h i s formed when a protonated cytosin e i n th e thir d strand Hoogsteen bas e pairs with a guanine in the duple x (Fig . 12.1b) . The thir d stran d cytosine is protonated a t the N3(imino ) position ; thu s parallel triplex formation ha s a p H dependence an d i s favoured by low p H (9,25,26) . Triplexe s o f the paralle l moti f wil l be referre d to a s PTs (paralle l triplexes). The antiparalle l moti f i s characterize d b y a homopurin e thir d stran d tha t bind s antiparallel t o th e homopurin e stran d o f th e duple x (centra l stran d o f th e triplex) . This moti f ha s thre e canonica l triplets : a G:G C triplet , whic h i s forme d whe n a guanine in the third stran d reverse Hoogsteen bas e pairs with a n guanine in the duple x (Fig. 12.2a) ; a n A:A T triplet , whic h i s formed whe n a n adenin e i n th e thir d stran d reverse Hoogstee n bas e pairs with a n adenine i n th e duple x (Fig . 12.2b) ; and a T:A T triplet (whic h is different fro m th e T:A T triple t in th e paralle l motif), which i s formed when a thymine i n th e thir d stran d reverse Hoogstee n bas e pairs with a n adenin e i n the duple x (Fig . 12.2c) . Unlik e th e canonica l triplet s i n th e paralle l motif , th e thre e canonical triplets in the antiparalle l motif ar e not isosteri c (Fig. 
12.3), leading to possible backbone distortions when the triplets are intermixed. The antiparallel triplexes, unlike the parallel triplexes, are not pH dependent. Triplexes of the antiparallel motif will be referred to as APTs (antiparallel triplexes).
The sequence definition of the parallel and antiparallel motifs is somewhat complicated by the fact that a third strand composed of a mixture of guanines and thymines can switch polarity from antiparallel to parallel depending on the ratio of guanines to thymines and on the number of GpT and TpG steps (27-29). Also, antiparallel sequences can sometimes be forced parallel (30), and parallel sequences can sometimes be forced antiparallel (21,31).
Fig. 12.1. Triplet base-pairing schemes for the parallel triplex motif: (a) T:AT canonical triplet, (b) C+:GC canonical triplet, (c) G:TA mismatch triplet, (d) T:CG mismatch triplet, (e) 7G:GC triplet, and (f) D3 base. The mismatch triplets are the ones for which high resolution triplex structures containing them have been determined.
Fig. 12.2. Triplet base-pairing schemes for the antiparallel triplex motif: (a) G:GC canonical triplet, (b) A:AT canonical triplet, (c) T:AT canonical triplet, and (d) T:CG mismatch triplet.
1.3 Biological significance of triplex formation
Triplexes readily form under physiological conditions, but it remains unclear what biological roles triplexes play in vivo, if any. In this section, we give a brief overview of possible biological roles and evidence for the formation of triplexes in vivo. More extensive discussions can be found in other reviews (24,32).
Fig. 12.3. Isosteric comparison of the base triplets. Only the C1' and N1 (pyrimidines) or N9 (purines) atoms are shown as the tails and heads of small arrows, indicating the orientation of the glycosidic bond. The Watson-Crick base pairs for each triplet have been superimposed to illustrate the relative position of the third strand base. The mismatch triplets are connected by dashed lines. (a) Superposition of the canonical triplets for both the parallel and antiparallel triplex motifs. (b) Superposition of the canonical and mismatch triplets for the parallel triplexes. (c) Superposition of the canonical and mismatch triplets for the antiparallel triplexes.
Transcriptional regulation is a possible and obvious role for triplexes. In fact, from the first discovery of triple-stranded nucleic acids, it was suggested that a biologically important three-stranded complex could be constructed from single-stranded RNA and duplex DNA (4). Also, early on, triplexes were shown to be stable under physiological conditions and to inhibit various enzymes such as RNA polymerase (17), DNAase I (33), and RNAase (33). An early proposal by Miller and Sobell ingeniously hypothesized that certain repressors may be ribonucleoproteins where the sequence specificity is conferred by a complementary mRNA capable of forming a triple-stranded complex with DNA (34).
One important point to consider about the biological relevance of triplexes is that triplex formation requires a run of purines in one strand (and pyrimidines in the other). This requirement would appear to restrict triplexes to a minor role in vivo. However, homopurine-homopyrimidine tracts turn out to be statistically three to four times over-represented in eukaryotic (35) and eukaryotic viral genomes (36) (for review see ref. 18). Homopurine-homopyrimidine tracts are not, however, over-represented in prokaryotic (35) or bacteriophage genomes (36), implying that triplexes may have a biological role in eukaryotes, but not in prokaryotes. Many homopurine-homopyrimidine tracts are found upstream of genes (for examples, see refs 37-40) or within genes (41,42), consistent with the hypothesis that triplexes play a role in transcriptional regulation. These homopurine-homopyrimidine tracts are often hypersensitive to single-stranded nucleases (43-47), indicating that they may adopt a non-B-DNA conformation.
In 1986, Frank-Kamenetskii and coworkers made a discovery that had important implications for the in vivo existence of triplexes, and consequently sparked much of the renewed interest in triplexes. They showed that an intramolecular triplex could be formed at homopurine-homopyrimidine mirror repeat sequences in negatively supercoiled plasmids (25,48,49). Their proposed triple-stranded structure also explained the S1 nuclease hypersensitivity of these sequences (25,48,49). Many of the homopurine-homopyrimidine sequences discovered so far are in fact mirror repeats (50), strongly suggesting that triplexes do form in vivo. These triple-stranded structures, dubbed H-DNA (see footnote 2), are created when one half of the mirror repeat dissociates into separate homopurine and homopyrimidine single strands, followed by the homopyrimidine strand folding back onto the remaining duplex half of the mirror repeat and binding in the major groove to form a parallel triplex (Fig. 12.4). The remainder of the homopurine strand is single-stranded and accounts for the S1 nuclease sensitivity.
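The folding scheme described here depends on the tract being both homopurine-homopyrimidine and mirror symmetric along one strand. As an illustration only, the following Python sketch scans a single strand for such candidate tracts. The function names, the minimum arm length, and the permitted spacer length are assumptions chosen for the example rather than values taken from the studies cited, and a hit indicates only sequence symmetry, not that H-DNA actually forms.

```python
def base_class(base):
    """Return 'R' for purines, 'Y' for pyrimidines, None otherwise."""
    if base in "AG":
        return "R"
    if base in "CT":
        return "Y"
    return None


def find_mirror_repeats(seq, min_arm=6, max_spacer=4):
    """Find homopurine or homopyrimidine mirror repeats on one strand.

    Returns (start, end, arm_length, spacer_length) tuples, with start
    inclusive and end exclusive. min_arm and max_spacer are illustrative
    defaults (assumptions), not experimentally derived limits.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for left in range(n - 1):            # last base of the left arm
        cls = base_class(seq[left])
        if cls is None:
            continue
        for spacer in range(max_spacer + 1):
            right = left + spacer + 1    # first base of the right arm
            if right >= n:
                break
            # The spacer must stay within the same purine/pyrimidine class.
            if any(base_class(b) != cls for b in seq[left + 1:right]):
                break
            arm = 0
            # Grow both arms outwards while the bases mirror each other.
            while (left - arm >= 0 and right + arm < n
                   and seq[left - arm] == seq[right + arm]
                   and base_class(seq[left - arm]) == cls):
                arm += 1
            if arm >= min_arm:
                hits.append((left - arm + 1, right + arm, arm, spacer))
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence: a 12-base purine mirror repeat with flanks.
    example = "TTC" + "AAGGAG" + "GAGGAA" + "CAT"
    print(find_mirror_repeats(example))
```

The scan reports overlapping candidates rather than merging them, which keeps the logic short; a practical search would probably also filter by tract length and composition.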
It is also possible for the homopurine strand to fold back, forming an antiparallel triplex (*H-DNA) (51-53). In fact, there are numerous possible H-DNA-related structures (for reviews see refs 24 and 54). Divalent cations appear to be required for the favourable formation of *H-DNA over H-DNA (51-53). In vivo footprinting results support the hypothesis that H- and *H-DNA exist in living cells (55,56).
Fig. 12.4. Intramolecular folding scheme for H-DNA and *H-DNA illustrating the parallel and antiparallel orientation of the third strands. The homopyrimidine strand is in grey and the homopurine strand is in black. The thin solid lines represent Watson-Crick base pairing and the thin dashed lines represent (reverse) Hoogsteen base pairing.
Footnote 2: The 'H' indicates a proton because the original sequences required low pH; alternatively, the 'H' stands for 'Hoogsteen' or 'hinged'.
The requirement for protonation at N3 of cytosines in the third strand of PTs suggests that only small amounts of triplex may exist at neutral pH. However, several groups have demonstrated that parallel triplexes can be formed at physiological pH (57-60), and can be further stabilized at neutral pH by replacing the third strand cytosines with the naturally occurring 5-methylcytosine (m5C) (61-64). H-DNA can also be stabilized at physiological pH by increasing the negative superhelical density (25,49,58), and has been shown to form at physiological pH and superhelical density (60). Although the antiparallel triplexes are not pH dependent, their base triplets are not isosteric (Fig. 12.3), which makes them less stable.
The identification of triplex-binding proteins provides some evidence that triplexes are actually used in vivo. Two triplex-binding proteins with apparent molecular weights of 55 kDa have been purified from HeLa cells (65,66). Both proteins preferentially bind triple-stranded DNA over duplex DNA, but they have differing binding affinities and sequence specificities, indicating that they are different proteins (65,66). Immunostaining of mouse and human chromosomes with monoclonal antibodies that specifically recognize triplexes revealed a strong correlation with chromosome banding patterns, which suggests that triplex formation is cell cycle dependent and may play a role in chromosome condensation and organization (67-69). Direct binding of the two antibodies to nuclei was also shown to inhibit cell growth, specifically at the end of S phase and during G2 (70), while control antibodies had no effect.
This further suggests a role for triplexes in chromosome condensation.
Several investigations have addressed the question of whether triplexes are involved in transcriptional regulation. For example, a homopurine-homopyrimidine sequence was constructed within a β-galactosidase gene without altering the amino acid sequence by taking advantage of codon degeneracy (71). In E. coli, the total enzyme activity was reduced roughly 80% relative to the wild type sequence. Truncated transcripts were also isolated that were of the length predicted between the start site and the putative intramolecular triplex site. Another set of experiments indicates that triplexes may regulate transcription via a trans-acting factor. In mouse cells, poly d(G) sequences upstream of a gene were found to act as enhancers (72). However, the enhancement was strongly dependent on the length of the poly d(G) tract; d(G)27-30 enhances transcription, whereas d(G)35 does not. In vitro, when poly d(G) tracts are inserted into a mildly supercoiled plasmid, tracts 32 bp or longer form H-DNA, while tracts 30 bp or shorter do not. Furthermore, if another plasmid with a poly d(G) tract is cotransformed with the first, then the second plasmid can reduce expression of the first if the poly d(G) tract is 30 bp, but not 35 bp (72). This suggests that in vivo the longer poly d(G) tracts are forming intramolecular triplexes and blocking a trans-acting transcription factor.
Triplexes may also play a role in homologous recombination. For example, homologous recombination was induced between two direct repeats by active transcription in vivo when a poly d(G):poly d(C) sequence was inserted between them (73). This effect was proposed to be caused by formation of *H-DNA, which may then bring two remote sequences together to stimulate homologous recombination (73).
Similarly, using an in vivo plasmid-plasmid recombination assay, it has been demonstrated that plasmids containing potential H-DNA (or *H-DNA)-forming sequences undergo increased recombination, while those plasmids containing nearly identical sequences that are unlikely to form intramolecular triplexes have no effect (74). The single strand produced by H-DNA formation may be acting as an invading single strand in homologous recombination (74).
Recently, a palindromic homopurine-homopyrimidine sequence required for the lytic replication of the Epstein-Barr viral genome has been studied and shown to be capable of forming *H-DNA (75). Mutations in the sequence inhibit both replication and *H-DNA formation. Surprisingly, complementary mutations that restore the palindrome also restore replication and *H-DNA formation. This result suggests that it is not the sequence, but the palindrome (and its resulting structure) that is important for replication. This is the strongest evidence yet of a biological role for triple-stranded nucleic acids.
1.4 Triplexes as therapeutics
The second area of research that has caused the resurgence of interest in nucleic acid triplexes is in the use of triplexes as potential therapeutics. This work has been the motivation for many of the studies on sequence specificity and alternate triplets in the structures reviewed here. Here we present a brief overview of efforts to target particular sequences of duplex DNA through triplex formation, with an emphasis on modifications used to help extend and improve sequence specificity (for more complete reviews, see refs 76-78).
The general idea behind most potential pharmaceutical applications of triplexes is the targeting of sequences within or upstream of a particular gene via triplex formation, in order to block transcription and thus repress protein production at the DNA level. This strategy is sometimes called the antigene strategy and is analogous to the antisense strategy, except that the antigene strategy operates at the transcriptional level instead of the translational level. Many researchers have demonstrated the feasibility of the antigene strategy in vitro by inhibiting transcription of specific genes (for examples, see refs 79-81). In addition to blocking RNA polymerase, a number of other DNA binding proteins can also be inhibited, such as DNA polymerase (82-84), various endonucleases (85-88), methylase (89), NF-κB (88), and other transcription factors (89-91). The antigene strategy has also been shown to work in vivo (92-96). An antigene RNA oligonucleotide has been constitutively expressed from a vector and shown to reach a high steady state concentration in vivo (97).
Another pharmaceutical application is to create artificial nucleases by coupling triplex-forming oligonucleotides to DNA cleaving reagents, such as Cu(II)-phenanthroline (98,99), Fe(II)-EDTA (57,100), or an azidoproflavine derivative (101). A photoactive nuclease has also been produced using ellipticine (102,103). These artificial nucleases are much more specific than naturally occurring nucleases because their recognition sequences are potentially much longer, although the cleavage position is less precise. They have been used to cleave a single site in the bacteriophage λ genome (104) and a yeast chromosome (105). Because of their ability to generate large DNA fragments, these artificial nucleases are potentially useful in chromosome mapping.
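The link between recognition length and specificity can be made concrete with a rough estimate. The sketch below assumes a random sequence with equal base frequencies, which real genomes and homopurine targets do not strictly satisfy. The function name and the site lengths are illustrative, and the λ genome size of about 48.5 kb is supplied here as background rather than taken from the text.

```python
def expected_sites(site_length, genome_length):
    """Expected occurrences of one specific site of site_length bp on one
    strand of a random sequence with equal base frequencies (an assumption)."""
    return genome_length * 0.25 ** site_length


if __name__ == "__main__":
    lambda_genome_bp = 48_502  # approximate size of the bacteriophage lambda genome
    # Compare a typical 6 bp restriction site with longer triplex target sites.
    for n in (6, 8, 15, 18):
        print(n, expected_sites(n, lambda_genome_bp))
```

Under these assumptions a 6 bp site is expected roughly a dozen times in a genome of this size, whereas a 15 bp site is expected far less than once, which is consistent with single-site cleavage by a triplex-directed nuclease.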
In vitro triplex applications include uses as an artificial ligase (106), as a sequence-specific mutagen (107-109), as an agent for the purification and isolation of specific double-stranded DNA sequences (110-114), and as an agent to purify PCR products (115). Triplex-forming oligonucleotides have also been used as sequence probes, as in Southern blotting, except that they hybridize to double-stranded DNA (116). By covalently linking proteins to single-stranded oligonucleotides, triplex formation has been used to target proteins to specific DNA sequences (117).
A major shortcoming of triplexes as therapeutics is that their formation requires a homopurine-homopyrimidine sequence. The usefulness of the antigene strategy would be greatly expanded if one could form triplets with all four base pairs. In an effort to target base pair inversions within a homopurine-homopyrimidine sequence, several groups have investigated the stability and selectivity of alternate triplets in both the parallel motif (118-125) and the antiparallel motif (126,127). Even in vitro selection techniques have been employed to identify mismatch triplets (117).
For the parallel motif, every investigation found that the canonical T:AT and C+:GC triplets are the most stable. Non-canonical bases in the third strand have essentially the same effect on triplex stability that mismatches do on DNA duplexes. Thus, an AT base pair is most effectively recognized by a T, although an A can also form a reasonably stable A:AT triplet (117,118,122,123). Similarly, a GC base pair is most effectively recognized by a protonated C; A+:GC, T:GC, and G:GC triplets can also form, depending on the conditions, but are considerably less stable (117,118,120-123).
Several studies have investigated the stability of triplets formed by third strand recognition of TA and CG base pairs (where the pyrimidine base is in the 'homopurine' strand). It was found that to recognize a TA base pair, a G in the third strand forms the most stable triplet (119,122-125) under most conditions, but in some cases a C:TA triplet is more stable (120). A CG base pair is recognized by both a T and a C, although the resulting triplets are not very stable, and results vary as to which triplet is more stable (120,124,125). Thus, for the parallel motif, all four base pairs can be targeted, but at a cost in triplex stability when TA and, especially, CG base pairs are involved. In addition, some specificity is lost since a T can recognize both an AT and a CG base pair, and a C can recognize both a GC and a CG base pair. The effect of these alternate triplets on triplex stability depends on the sequence context, i.e. which triplets are neighbouring, and the pH (128-131).
For the antiparallel motif, the canonical G:GC, A:AT, and T:AT triplets were found to be by far the most stable (126,127). However, an A can also bind to a GC base pair, and T can bind to a CG base pair. For a TA base pair, there are no stable triplets (a T:TA triplet is the least destabilizing) (126,127). Thus, the lack of a stable TA-containing triplet means that the antiparallel motif is more restricted than the parallel motif in the sequences that can be targeted. In addition, a large amount of specificity is lost since a T can recognize both an AT and a CG base pair and an A can recognize both an AT and a GC base pair. This means that although T:AT and A:AT are the most stable triplets, both the T and the A can form reasonably stable alternate triplets.
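The recognition rules summarized above can be collected into a small lookup table. The following Python sketch is illustrative only: it picks one third strand base per target base (the most stable choice reported above under most conditions), flags positions that need an alternate triplet, and writes the third strand 5'->3' according to the strand orientation of each motif. The names `PARALLEL`, `ANTIPARALLEL`, and `design_third_strand` are hypothetical, and the sketch ignores sequence context, pH, and the polarity switching of mixed G/T third strands.

```python
# Third strand base chosen for each base in the purine-rich duplex strand.
# "canonical" and "alternate" follow the stability ranking given in the text.
PARALLEL = {
    "A": ("T", "canonical"),   # T:AT
    "G": ("C", "canonical"),   # C+:GC (the cytosine must be protonated)
    "T": ("G", "alternate"),   # G:TA
    "C": ("T", "alternate"),   # T:CG (C:CG is also reported)
}
ANTIPARALLEL = {
    "G": ("G", "canonical"),   # G:GC
    "A": ("A", "canonical"),   # A:AT (T:AT is also canonical)
    "C": ("T", "alternate"),   # T:CG
    # "T" is absent: no stable triplet is reported for a TA base pair.
}
MOTIFS = {"parallel": PARALLEL, "antiparallel": ANTIPARALLEL}


def design_third_strand(purine_strand, motif="parallel"):
    """Return (third_strand_5to3, alternate_positions).

    purine_strand is the purine-rich duplex strand written 5'->3'.
    alternate_positions lists the 0-based target positions that need a
    less stable alternate triplet.
    """
    table = MOTIFS[motif]
    bases, alternates = [], []
    for i, base in enumerate(purine_strand.upper()):
        if base not in table:
            raise ValueError(
                f"no stable {motif} triplet for base {base!r} at position {i}")
        third, kind = table[base]
        bases.append(third)
        if kind == "alternate":
            alternates.append(i)
    third_strand = "".join(bases)
    if motif == "antiparallel":
        # The third strand runs antiparallel to the purine strand,
        # so reverse it to write it 5'->3'.
        third_strand = third_strand[::-1]
    return third_strand, alternates


if __name__ == "__main__":
    print(design_third_strand("AAGGAGA"))                  # pure homopurine target
    print(design_third_strand("AAGTAGA"))                  # one TA inversion
    print(design_third_strand("AAGGAGA", "antiparallel"))
```

Raising an error for a TA base pair in the antiparallel motif mirrors the statement that no stable triplet is available there; a more permissive design tool might instead insert the least destabilizing base and report the position.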
Another tactic to avoid the sequence restrictions is to simply bypass a homopurine-homopyrimidine inversion site by inserting an abasic residue in the third strand. Abasic substitutions generally yield stable triplexes in both the parallel (124,132) and the antiparallel motif (133), but decrease specificity (124). An imidazole has also been used at an inversion site with some success (134). Several nucleotide derivatives and synthetic bases have also been studied. One purine derivative, 7-deaza-2'-deoxyxanthosine (dzaX), was used as a T analogue in APTs (135). Unlike a T:AT triplet, a dzaX:AT triplet is isosteric with a G:GC triplet. The dzaX:AT-containing triplex was found to be 100-fold more stable than the equivalent T:AT-containing triplex. Another purine derivative, deoxynebularine, was found to recognize both CG and AT base pairs in APTs (136). Two synthetic bases, 3-(2-deoxy-β-D-ribofuranosyl)-2-methyl-8-(N'-n-butylureido)naphthyl[1,2]imidazole and 1-(2-deoxy-β-D-ribofuranosyl)-4-(3-benzamidophenyl)imidazole (D3), have been designed to recognize CG base pairs in PTs by forming specific hydrogen bonds with both the guanine and cytosine (137,138). The latter base (D3) was shown to intercalate and mimic a complete triplet instead of hydrogen bonding to a Watson-Crick base pair (139,140).
For target sequences that consist of a homopurine tract followed by a homopyrimidine tract, one could envision binding two oligonucleotides: one to each homopurine tract on opposite strands, linked together at the junction where they meet. This would effectively allow triplexes to be targeted to a wider range of sequences. This alternative strand-targeting strategy has been tested by several groups, using either a 5'-5' linkage (141) or a 3'-3' linkage (142), and has been found to be effective in forming stable triplexes. Another method for alternate strand triplex formation comes from the realization that third strands of PTs and APTs bind to their homopurine strands in opposite orientations. Therefore, alternate strands can be targeted by alternating the triplex motif, without changing the third strand polarity, and without the use of unnatural 5'-5' or 3'-3' linkages.
Such triplexes do indeed form stable complexes (143-146), greatly enhancing the sequence space that can be targeted by triplexes. An interesting variation of the alternate strand triplexes takes advantage of the fact that APTs can switch polarity depending on the TpG and GpT content (27-29). Thus, certain sequences can be targeted on alternate strands using solely the antiparallel motif (27). A significant limitation of all of the cross-over triplexes is that a longer total sequence is required to form a stable triplex.
Although the parallel motif can more successfully and specifically target a larger sequence space than the antiparallel motif, the parallel motif has the disadvantage of being considerably less stable at physiological pH owing to the need to protonate the cytosines. Much research has been devoted to reducing the pH dependence of PTs. The most common method of increasing triplex stability at neutral pH is the substitution of m5C for cytosine (28,61,62,89,147,148). An alternative is to use uncharged C+ analogues, such as pseudo-isocytidine (149,150), 1-(2-deoxy-β-D-ribofuranosyl)-3-methyl-5-amino-1H-pyrazolo[4,3-d]pyrimidin-7-one (P1) (151-153), N7-glycosylated guanine (154), 8-oxoadenine (155,156), or 4-amino-5-methyl-2,6-pyrimidione (157). A G:GC triplet in a parallel motif has also been used instead of a C+ (28).
Triplexes can also be stabilized by the use of intercalating agents (158). However, mismatches are also stabilized (159), so some specificity may be lost. Most of these intercalating agents also bind preferentially to triplexes over duplexes. Some intercalators, such as echinomycin and actinomycin D, have no effect on triplex stability and may even destabilize triplexes (160).
When conjugated to the end of a triplex-forming oligonucleotide, the intercalating agent anchors the oligonucleotide, which greatly improves triplex stability in both parallel (147,159-161) and antiparallel (161) motifs.
The minor groove binding drugs netropsin and berenil destabilize triplexes (162-164), which suggests that the minor groove environment may be significantly different in triplexes and duplexes.
Another method of stabilizing triplexes is to attach a cross-linking reagent to the third strand and induce a covalent linkage to the duplex. Several alkylating agents have been shown to cross-link to one of the duplex strands (165,166). For example, psoralen, an intercalator, can dramatically stabilize triplexes by cross-linking to both strands of the duplex (167-169). Psoralen-cross-linked triplexes have even been demonstrated in vivo, and shown to inhibit transcription more effectively than the free oligonucleotide. However, the psoralen-triplex-mediated inhibition may be abolished in only a few hours by cellular DNA repair systems (170).
Finally, in order to improve various pharmaceutical characteristics, such as resistance to degradation, increased stability, or increased cellular uptake, researchers have investigated triplex formation using modified backbones (171). The most common target for modification is the phosphates, which have been replaced with phosphorothioates (172-175), methyl phosphonates (176,177), or guanidinium groups (178). Oligonucleotides containing phosphorothioate or methyl phosphonate linkages form triplexes, but are less stable than their DNA counterparts (179-182). Mixed guanidinium-DNA triplexes are extremely stable (with melting temperatures as high as 100°C) because of the favourable interactions between the positively charged guanidinium moieties and the negatively charged phosphates (178).
Other modifications involve the ribose sugars, such as replacing the 2'-hydroxyl in RNA with a 2'-O-methyl (149,183,184), or replacing the ribose with a riboacetal group (185) or a bicyclic ring structure (186). These modifications result in triplexes that are more stable than their DNA counterparts (185-188). The entire backbone has been replaced by a peptide-like structure, a so-called peptide nucleic acid or PNA (189-191) (for review, see ref. 192). When targeted to duplex DNA, PNAs bind to their complementary DNA strand via strand invasion, followed by a second PNA strand binding in the major groove, to form a PNA:DNA:PNA triplex (189,193). These complexes are extremely stable.
2. Structures of parallel triplexes
2.1 Background
Of the two triplex motifs, the PTs are the best characterized. Early researchers had speculated on the base-pairing schemes of the canonical T:AT and C+:GC triplets (9,10,34,194). In 1973, the first direct structural information about PTs was provided by X-ray fibre diffraction experiments on poly rU:poly rA:poly rU and its DNA equivalent (195-197). The X-ray diffraction data illuminated the correct base-pairing scheme of the U:AU and T:AT triplets (Fig. 12.1a). The T:AT (U:AU) triplet consists of a standard Watson-Crick AT base pair with a second thymine binding in the major groove via Hoogsteen hydrogen bonding to the adenine in a parallel orientation. The third strand has an anti glycosidic conformation.
The fibre diffraction data indicated that the triple helices had a helical rise similar to B-DNA but a low twist and a deep major groove similar to A-DNA (Table 12.1; ref.
Table 12.1. Structural data on parallel triplexes
Triplex sequence^a | Method | DNA:RNA composition | χ angle | Sugar pucker^b | Rise^c (Å) | Twist^c (°) | x-disp^c (Å) | Inclin^c (°) | Reference (see notes)
S N S'
3.4 2.6 3.1 [3.3] 3.3 3.5 [3.4] 3.5 [3.4] 3.1 [3.1] 3.2 [3.2] 3.0 [3.3] [3.2] 3.3 [3.3] [3.2]
36 33 31 [31] 31 31 [31] 32 [32] 29 [29] 31 [31] 30 [28] [30] 30 [30] [33]
-0.7 -5.4 -4.0
-6.0 19.1 2.4
1
4.9 3.8
2 3
2.8
4
5.3
5
-1.9
6
13.9 [-5.0] [5.0] 5.4 [8.5] [10.0]
7 8 8 9
3.2
[32] 29
B-DNAd A-DNAd YRY1
NMR
D:DD
anti anti anti
YRY2 GTA
NMR NMR
D:DD D:DD
anti anti
Sf Ss
TCG
NMR
D:DD
anti
Sh
N7G
NMR
D:DD
anti
S
DTA
NMR
D:DD
anti
S'
PAT (TAT) 12 Mixed Poly(T:AT)'
NMR X-ray X-ray X-ray
D:DD D:DD D:DD D:DD
anti anti anti anti
Poly(C+:IC) Mixed (AG)3 Poly(C+:GC) Poly(C+:GC) Poly(T:AT) Poly(T:AT) (GA)n UAT Mixed Mixed Poly(C+:GC) Poly(U:AT) (GA), Poly(T:AT) Poly(U:AU) Poly(U:AT) Poly(C+:GC) Poly(T:AU) Mixed Mixed
X-ray FTIR FTIR FTIR/Raman FTIR FTIR FTIR Gel NMR FTIR FTIR FTIR FTIR Gel FTIR X-ray FTIR FTIR FTIR NMR NMR/FTIR
D:DD D:DDn D:DD? D:DD D:DD D:DD D:DD D:DD R:DD R:DD5 R:DD R:DD R:DD R:DD D:RD R:DR R:RD D:RR D:RR R:RR R:RR
anti anti
anti
anti
anti anti
k
s
S S N" N" S&N" S&N« S&N' S&N' S S S S&N' S&N" N S&N S&N N™ S&N N N N N
-2.8 -1.8 [-1.9] -2.2 [-2.1] -3.1 [-2.9] -1.2 [-1.4] -3.0 [2.6]^l [2.5]^l -3.6
-2.0
-3.2
[32] [3.0]
[33]
[12.0]
9 10 11 12 13 14 15 16 17 10 18 13 15 16 15 9 15 13 15 19 20,21
Table 12.1. Continued
Triplex sequence^a | Method | DNA:RNA composition | χ angle | Sugar pucker^b | Rise^c (Å) | Twist^c (°) | x-disp^c (Å) | Inclin^c (°) | Reference (see notes)
Poly(U:AU)^w | X-ray | R:RR | anti | N^m | [3.0] | [30] | - | [12.0] | 9
Poly(U:AU)^x | X-ray | R:RR | anti | N^m | [3.0] | [33] | - | [12.0] | 9
Mixed | FTIR | R:RR | | N | | | | | 18
Poly(C+:GC) | FTIR | R:RR | | N | | | | | 13
Poly(U:AU) | FTIR | R:RR | | N | | | | | 15
Notes:
^a The table is grouped by DNA, DNA:RNA hybrid, and RNA triplexes. The high resolution NMR structures are listed in bold by the names used in the text; otherwise, the compositions of the triplexes are given as triplets [e.g. (T:AT)n] or as the sequence in the purine strand [e.g. (AG)n] or as 'mixed' for more complex sequences.
^b S-type sugar pucker is a C2'-endo conformation. N-type sugar pucker is a C3'-endo conformation.
^c Helical parameters are calculated using 'Curves' v5.1 with a linear helical axis for the duplex alone. Values in [ ] are the parameters given in the references and may have been calculated using a different method and/or helical axis.
^d The standard A-DNA, B-DNA, and triplex parameters were calculated from structures created in Insight 95 (Biosym). The Biosym parameters are from X-ray fibre diffraction data (refs 22-24).
^e The pyrimidine strands have some N-type character.
^f The pyrimidine strands have some N-type character, especially the cytosines.
^g The guanine in the G:TA mismatch triplet is N-type.
^h The thymine in the T:CG mismatch triplet is N-type.
^i The sugar puckers are generally S-type, except for one thymine which is adjacent to the intercalation site.
^j The thymine methyl groups in the third strand have been replaced by propyne groups.
^k The cytosines in the third strand have some N-type character.
^l The authors report a positive displacement from the helical axis, which probably represents a negative x-displacement in the standard helical parameter convention.
^m The sugar conformations were assumed based on the helical structure.
^n The cytosines in the third strand are either unmodified or have been replaced with 5-MeC.
^o The sugars in the T:AT triplets are S-type, while the sugars in the C+:GC triplets have a ratio of 1:2 S-:N-type.
^p The cytosines in neither, either, or both pyrimidine strands have been replaced with 5-MeC.
^q The purines are all S-type, but the overall triplex has a ratio of 2:1 S-:N-type.
^r The guanines are all S-type and the cytosines in both pyrimidine strands are N-type.
^s The riboses in the third strand are 2'-O-methylated.
^t The sugar pucker in the duplex pyrimidine strand is S-type, while the purine strand and the RNA third strand are N-type.
^u The sugar puckers are mostly N-type with some S-type. Binding of the RNA third strand changes the DNA duplex sugar puckers from all S-type to mostly N-type.
^v NOE cross-peak patterns in the NMR data are typical of A-form helices (ref. 25).
^w Data were collected at 92% relative humidity.
^x Data were collected at 75% relative humidity.
References for Table 12.1:
1. Bornet, O. and Lancelot, G. (1995) J. Biomol. Struct. Dyn. 12, 803-14.
2. Tarkoy, M., Phipps, A.K., Schultze, P. and Feigon, J. (1998) Biochemistry 37, 5810-19.
3. Radhakrishnan, I. and Patel, D.J. (1994) Structure 2, 17-32.
4. Radhakrishnan, I. and Patel, D.J. (1994) J. Mol. Biol. 241, 600-19.
5. Koshlap, K.M., Schultze, P., Brunar, H., Dervan, P.B. and Feigon, J. (1997) Biochemistry 36, 2659-68.
6. Wang, E., Koshlap, K.M., Gillespie, P., Dervan, P.B. and Feigon, J. (1996) J. Mol. Biol. 257, 1052-69.
7. Phipps, A.K., Tarkoy, M., Schultze, P. and Feigon, J. (1998) Biochemistry 37, 5820-30.
8. Liu, K., Sasisekharan, V., Miles, H.T. and Raghunathan, G. (1996) Biopolymers 39, 573-89.
9. Arnott, S., Bond, P.J., Selsing, E. and Smith, P.C.J. (1976) Nucleic Acids Res. 3, 2459-70.
10. Dagneaux, C., Liquier, J. and Taillandier, E. (1995) Biochemistry 34, 16618-23.
11. Fang, Y., Bai, C., Wei, Y., Lin, S.B. and Kan, L. (1995) J. Biomol. Struct. Dyn. 13, 471-82.
12. Ouali, M., Letellier, R., Adnet, F., Liquier, J., Sun, J.-S., Lavery, R. and Taillandier, E. (1993) Biochemistry 32, 2098-103.
13. Akhebat, A., Dagneaux, C., Liquier, J. and Taillandier, E. (1992) J. Biomol. Struct. Dyn. 10, 577-88.
14. Howard, F.B., Miles, H.T., Liu, K., Frazier, J., Raghunathan, G. and Sasisekharan, V. (1992) Biochemistry 31, 10671-7.
15. Liquier, J., Coffinier, P., Firon, M. and Taillandier, E. (1991) J. Biomol. Struct. Dyn. 9, 437-5.
16. Shin, C. and Koo, H.S. (1996) Biochemistry 35, 968-72.
17. Gotfredsen, C.H., Schultze, P. and Feigon, J. (1998) J. Am. Chem. Soc. 120, 4281-9.
18. Liquier, J., Taillandier, E., Klinck, R., Guittet, E., Gouyette, C. and Huynh-Dinh, T. (1995) Nucleic Acids Res. 23, 1722-8.
19. Holland, J.A. and Hoffman, D.W. (1996) Nucleic Acids Res. 24, 2841-8.
20. Klinck, R., Guittet, E., Liquier, J., Taillandier, E., Gouyette, C. and Huynh-Dinh, T. (1994) FEBS Lett. 355, 297-300.
21. Klinck, R., Liquier, J., Taillandier, E., Gouyette, C., Huynh-Dinh, T. and Guittet, E. (1995) Eur. J. Biochem. 233, 544-53.
22. Arnott, S., Hukins, D.W. and Dover, S.D. (1972) Biochem. Biophys. Res. Comm. 48, 1392-9.
23. Arnott, S. and Hukins, D.W. (1972) Biochem. Biophys. Res. Comm. 47, 1504-9.
24. Arnott, S. and Selsing, E. (1974) J. Mol. Biol. 88, 509-21.
25. Heus, H.A. and Pardi, A. (1991) J. Am. Chem. Soc. 113, 4360-1.
9). Consequently, it was concluded that the structure was A-DNA-like, and, therefore, the sugars must adopt a C3'-endo conformation (195-197). Attempts to crystallize oligomer (instead of polymer) DNA have been unsuccessful to date, and have also yielded fibre-type diffraction results (198,199). The data indicate similar helical parameters, but with a significantly lesser x-displacement of the duplex from the helical axis (Table 12.1; ref. 8).
Arnott et al. also collected data on a poly dC:poly dI:poly dC triplex (197). The base-pairing scheme for a C:IC triplet presumably involves a protonated third strand C and would be predicted to be similar to a C+:GC triplet. It was already known that the T:AT and C+:GC triplets were likely to be isosteric (17). Some early NMR experiments provided the first direct evidence for the existence of the protonated cytosine in the C+:GC triplet (200,201). However, more recent NMR studies have provided the first definitive proof for not only the protonated cytosine, but also the details of the C+:GC base-pairing scheme (26) (Fig. 12.1b). Rajagopal and Feigon (202) were able to observe the protonated cytosine imino proton directly and to define the base-pairing scheme from magnetization transfer pathways. This and subsequent NMR studies (26,59,202,203) also confirmed the base-pairing scheme of the T:AT triplet.
2.2 DNA parallel triplexes
2.2.1 Helix morphology
NMR studies have provided the few high resolution structures of PTs to date (see Section 2.5). Although many of these structures contain a mismatch triplet or modified bases, they still have several features in common.
(a) The sugar puckers are generally S-type (with the exception of some sugars of bases involved in alternate/mismatch triplets). However, the cytosines have some N-type character.
(b) The base pair axial rises are in the typical range of B-DNA.
(c) The triplexes are generally slightly underwound as indicated by the low helical twists, even when compared with A-DNA.
(d) The x-displacement from the helical axis is intermediate between A- and B-DNA.
(e) The inclination of the base pairs is small, similar to B-DNA.
(f) The third strand nucleotides are in the anti conformation.
An important distinction between the more recent NMR results and the early X-ray results is that the conformations of the sugars are generally C2'-endo (S-type) (121,204-206) as opposed to the early assumption that they were C3'-endo (N-type) (195-197). This is supported by IR data as well (Table 12.1). Of the helical features, only the helical twist is similar to A-DNA. The x-displacement is roughly -2 Å, which is greater than that of B-DNA but is not as dramatic as the -5.4 Å x-displacement found in A-DNA. Visually, the triplexes resemble B-DNA more than A-DNA (Plate X). The structural characteristics of triplexes can probably best be interpreted
within the context of a B-form DNA duplex with a nucleic acid ligand binding in the major groove. For example, the x-displacement from the helical axis is quite significant and can be considered a consequence of having to accommodate the third strand in the major groove. If one assumes that the sugar pucker of DNA prefers the S-type conformation and that the Hoogsteen base pairing of a DNA third strand tends to maintain the B-DNA-like rise and base pair inclination, then the increased x-displacement can only be accommodated by an unwinding of the helix. These are precisely the results observed experimentally.
A comment on helical parameters³ is in order here (see Chapter 2 for a detailed discussion of this topic). Depending on the program and analysis method used, the helical parameters can vary significantly. Table 12.2 illustrates how the helical parameters can vary depending on how the global axis is calculated. For this review, the program 'Curves', version 5.1 (208,209), was used to evaluate the helical parameters for all triplexes for which coordinates were available. The Watson-Crick duplex of each triplex is used as the reference point, with the third strand essentially being considered a ligand; this makes direct comparison to A- and B-DNA most meaningful. Therefore, the helical parameters reported here represent only the duplex portions of the triplexes. A linear helical axis was used for the analysis in Tables 12.1 and 12.3 to allow a direct comparison between triplexes.
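The link between helical twist and unwinding can be made concrete with simple arithmetic. The following is a minimal sketch, not part of the original analysis (which used 'Curves'): it assumes an idealized, perfectly regular helix and takes the rise and twist of the standard B-DNA, A-DNA, and triplex models from Table 12.2. The function and dictionary names are illustrative only.

```python
# Minimal sketch: helical repeat and pitch of an idealized regular helix.
# Rise (Å) and twist (degrees) are the standard-model values listed in Table 12.2.
STANDARD_FORMS = {
    "B-DNA": (3.4, 36.0),
    "A-DNA": (2.6, 33.0),
    "Triplex": (3.3, 30.0),
}


def helix_repeat(rise_angstrom, twist_deg):
    """Return (base pairs per turn, pitch in Å) for a regular helix."""
    if twist_deg <= 0:
        raise ValueError("twist must be positive")
    bp_per_turn = 360.0 / twist_deg       # steps needed for one full turn
    pitch = bp_per_turn * rise_angstrom   # axial length of one full turn
    return bp_per_turn, pitch


def summarize(forms=STANDARD_FORMS):
    """Map each helix name to its (base pairs per turn, pitch)."""
    return {name: helix_repeat(rise, twist) for name, (rise, twist) in forms.items()}


if __name__ == "__main__":
    for name, (bp, pitch) in summarize().items():
        print(f"{name}: {bp:.1f} bp/turn, pitch {pitch:.1f} Å")
```

Under these assumptions a twist of 30° corresponds to 12 base pairs per turn, compared with 10 base pairs per turn for a twist of 36°, which is what is meant here by an underwound helix.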
2.3 RNA parallel triplexes
Since RNA duplexes are A-form, it seems likely that RNA triplexes will also adopt an A-form conformation. Although an RNA triplex structure has yet to be solved, there is much data to support this prediction. The original triplex structure of poly r(U):poly r(A):poly r(U) by Arnott and coworkers has a lower rise and a greater base pair inclination than their DNA triplex, i.e. more A-form-like (195-197) (Table 12.1; ref. 9). Chemical cleavage with Fe(II)-EDTA produces slightly different cleavage patterns for A- or B-form helices (210). The cleavage pattern for an all-RNA triplex is consistent with an A-form structure (210). Results from IR studies of PTs composed entirely of RNA (Table 12.1) show that only N-type sugar puckers are present (211,212). NMR studies on RNA triplexes have also found only N-type sugar puckers (213,214). More importantly, certain NMR cross-peak patterns are diagnostic of A-form helices (215,216) and these patterns are observed in the NMR spectra of these RNA triplexes (213,214).
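The N-type and S-type labels used throughout this chapter can be expressed as a simple classification. The sketch below assumes the conventional pseudorotation phase angle P as the descriptor, which is not discussed in this chapter, with C3'-endo near P = 18° and C2'-endo near P = 162°. The hemisphere cut-offs at 90° and 270° and the function names are assumptions chosen for illustration, not a method taken from the studies cited here.

```python
# Minimal sketch: label sugar puckers as N-type or S-type from an assumed
# pseudorotation phase angle P (degrees). The 90°/270° cut-offs are assumptions.


def classify_pucker(phase_deg):
    """Return 'N' (C3'-endo hemisphere) or 'S' (C2'-endo hemisphere)."""
    p = phase_deg % 360.0  # fold any angle into [0, 360)
    return "N" if (p < 90.0 or p >= 270.0) else "S"


def fraction_s(phases):
    """Fraction of sugars in a list of phase angles that are S-type."""
    if not phases:
        raise ValueError("need at least one phase angle")
    labels = [classify_pucker(p) for p in phases]
    return labels.count("S") / len(labels)


if __name__ == "__main__":
    print(classify_pucker(18.0))    # C3'-endo region
    print(classify_pucker(162.0))   # C2'-endo region
    print(fraction_s([18.0, 162.0, 162.0]))
```

A population of sugars described in the text as having a 2:1 ratio of S- to N-type pucker would correspond to an S-type fraction of two thirds in this representation.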
2.4 DNA:RNA hybrid parallel triplexes
Relatively little is known about the structure of triplexes formed from combinations of RNA and DNA. X-ray fibre diffraction data on poly r(U):poly d(A):poly r(U) indicate that its structure has more in common with the all-RNA polymer triplexes than with the all-DNA polymer triplexes (197) (Table 12.1). IR data of mixed
³ The helical parameters conform to the conventions defined at the 1988 EMBO workshop on DNA curvature and bending (207).
Table 12.2. Effect of axis calculation on helical parameters
Structureᵃ          Riseᵇ (Å)      Twistᵇ (°)     x-Dispᵇ (Å)     Inclᵇ (°)
B-DNAᶜ              3.4 (3.4)      36 (36)        -0.7 (-0.7)     -5.9 (-5.9)
A-DNAᶜ              2.6 (2.6)      33 (33)        -5.4 (-5.4)     19.1 (19.1)
Triplexᶜ            3.3 (3.3)      30 (30)        -3.6 (-3.6)     5.4 (5.4)
Parallel D:DD
YRY1                3.1 (3.3)      31 (31)        -4.0 (-3.2)     2.4 (-3.6)
YRY2                3.3 (3.4)      31 (31)        -2.8 (-2.5)     4.9 (0.9)
GTA                 3.5 (3.4)      31 (31)        -1.8 (-2.0)     3.8 (4.1)
TCG                 3.5 (3.4)      32 (32)        -2.2 (-2.1)     2.8 (-0.2)
N7G                 3.1 (3.6)      29 (28)        -3.1 (-1.3)     5.3 (-12.5)
DTA                 3.2ᵈ (3.2)ᵈ    31ᵉ (31)ᵉ      -1.2 (-1.4)     13.9 (0.2)
PAT                 3.0 (3.3)      30 (29)        -3.0 (-1.6)     -1.9 (-1.6)
Parallel R:DD
UAT                 3.2 (3.2)      29 (29)        -2.0 (-1.9)     -3.2 (-6.1)
Antiparallel D:DD
RRY                 3.6 (3.6)      30 (30)        -2.1 (-1.9)     -1.3 (-4.4)
ᵃ High resolution NMR structures discussed in the text.
ᵇ Helical parameters are calculated using 'Curves' v5.1 with a linear helical axis applied to the duplex alone. Values in parentheses were calculated using a best-fit curved axis (refs 1 and 2 below).
ᶜ The standard A-DNA, B-DNA, and triplex parameters were calculated from structures created in Insight 95 (Biosym). The Biosym parameters are from X-ray fibre diffraction data (refs 3-5 below).
ᵈ Calculated excluding the base step at the D3 intercalation site (ref. 6 below).
ᵉ Calculated excluding the 5' base step (with respect to the purine strand) and the base step at the D3 intercalation site (ref. 6 below).
1. Lavery, R. and Sklenar, H. (1988) J. Biomol. Struct. Dyn. 6, 63-91.
2. Lavery, R. and Sklenar, H. (1989) J. Biomol. Struct. Dyn. 6, 655-67.
3. Arnott, S. and Hukins, D.W. (1972) Biochem. Biophys. Res. Comm. 47, 1504-9.
4. Arnott, S., Hukins, D.W. and Dover, S.D. (1972) Biochem. Biophys. Res. Comm. 48, 1392-9.
5. Arnott, S. and Selsing, E. (1974) J. Mol. Biol. 88, 509-21.
6. Wang, E., Koshlap, K.M., Gillespie, P., Dervan, P.B. and Feigon, J. (1996) J. Mol. Biol. 257, 1052-69.
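As an illustration of the axis dependence summarized in Table 12.2, the following sketch ranks the structures by the difference between the x-displacement obtained with a linear axis and that obtained with a best-fit curved axis. The numbers are transcribed from the x-displacement column of Table 12.2 as read here; the dictionary and function names are hypothetical, and this is not how the original values were derived.

```python
# Minimal sketch: how strongly does the choice of helical axis change the
# x-displacement? Values (Å) are (linear axis, best-fit curved axis),
# transcribed from the x-displacement column of Table 12.2.
X_DISPLACEMENT = {
    "YRY1": (-4.0, -3.2),
    "YRY2": (-2.8, -2.5),
    "GTA": (-1.8, -2.0),
    "TCG": (-2.2, -2.1),
    "N7G": (-3.1, -1.3),
    "DTA": (-1.2, -1.4),
    "PAT": (-3.0, -1.6),
    "UAT": (-2.0, -1.9),
    "RRY": (-2.1, -1.9),
}


def axis_sensitivity(values=X_DISPLACEMENT):
    """Return (name, |linear - curved|) pairs, largest difference first."""
    diffs = {
        name: round(abs(linear - curved), 1)  # one decimal, as in the table
        for name, (linear, curved) in values.items()
    }
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, diff in axis_sensitivity():
        print(f"{name}: {diff:.1f} Å")
```

With these values the largest differences are found for N7G (1.8 Å) and PAT (1.4 Å), whereas most of the other structures differ by 0.3 Å or less between the two axis definitions.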
Fig. 12.5. Intramolecular folding pathway from single-stranded to duplex to triplex. The thin solid lines represent Watson-Crick base pairing and the thin dashed lines represent (reverse) Hoogsteen base pairing. The arrows are drawn 5' to 3' from tail to head.
DNA:RNA triplexes show mixtures of S- and N-type sugar puckers (211,212,217) (Table 12.1), indicating that these triplex structures may be a mixture of B- and A-forms, depending on RNA content. Chemical cleavage of mixed DNA:RNA triplexes with Fe(II)-EDTA produces two families of cleavage patterns corresponding to B- and A-form helices (210). Both the IR and chemical cleavage studies find a general trend towards increasing A-form characteristics with increasing RNA content. The latter study also finds that the identity of the purine strand, DNA or RNA, correlates with the helical conformation, B- or A-form, respectively. This finding is also supported by a recent NMR structure of an R:DD intramolecular triplex (UAT triplex in Table 12.1), which is most similar to B-form with S-type sugar puckers for the DNA strands (218). Interestingly, several studies have found that the D:RD and D:RR strand combinations do not form stable triplexes (184,219-221). This result, combined with the idea that the identity of the purine strand determines the helical conformation, suggests that the A-form triplex typical of all-RNA triplexes cannot accommodate a DNA third strand.
2.5 High resolution structures
All of the high resolution DNA triplex structures solved to date are composed of a single strand that folds to form an intramolecular triplex, as first demonstrated by Sklenar and Feigon (59) (Fig. 12.5). This provides a convenient model system for studying triplex structures, since it guarantees the correct stoichiometry and eliminates most potential problems caused by the formation of alternative structures. The first NMR-based model structure of an intramolecular triplex was published in 1992 (206). This model structure used distance restraints derived from NMR data, but was refined from a starting structure based on the Arnott fibre diffraction structures (197). Subsequently published high resolution structures of triplexes have also been calculated from starting structures (A- and/or B-DNA), or have been calculated from distance geometry generated starting structures.
There are currently five published high resolution parallel motif DNA triplex structures, all of which were solved using NMR (Table 12.1). Three additional triplex structures have recently been solved in our laboratory. All the sequences are given in Fig. 12.6. Two of the structures are composed entirely of canonical T:AT and C+:GC triplets (222,223) and one contains P:AT and C+:GC triplets (224), where the P is a thymine with a propyne group at the 5 position instead of a methyl group. These structures are hereafter referred to as YRY1, YRY2, and PAT, respectively. Other triplexes incorporate a modified base or a single mismatch triplet, such as a G:TA
Fig. 12.6. Sequence schematics for the high resolution intramolecular triplex structures. The thin solid lines represent Watson-Crick base pairing and the thin dashed lines represent (reverse) Hoogsteen base pairing. The arrows are drawn 5' to 3' from tail to head. The sequences for the eight parallel triplexes are on the top and the two antiparallel triplexes are on the bottom.
triplet (GTA ) (225) , a T:CG triple t (TCG ) (226) , a l-(2-deoxy-D-ribofuranosyl)-4 (3-benzamido)phenyl-imidazole (D 3) targete d t o a TA base pair (DTA ) (227) , and a n N7-glycosylated guanin e targeted to a GC bas e pair (N7G ) (228) . The fou r structures currently in the Brookhaven PD B databan k (229) (http://www.pdb.bnl.gov) ar e GTA (accession #149d), TCG (#177d) , DTA (#lwan), an d N7G (#lgn7) . While none hav e been publishe d to date, we have recently determined a high reso lution DNA:RN A hybri d triple x structur e b y NMR . Thi s triple x (UA T triple x i n Table 12.1 ) i s composed o f an RNA thir d strand bound t o a DNA duple x (218). Before discussin g th e structures , w e not e tha t NM R refinemen t ha s certai n strengths and weaknesses. NMR i s very good a t determining sugar pucker and distin guishing betwee n a syn o r anti glycosidi c conformation . Also , i n theory , NM R ca n accurately defin e the backbon e angle s B, y, S, and E. However, i n practice , B and y require assignmen t of the H5 ' an d H5 " proton s an d E is often ambiguou s owin g t o
the periodicity of the coupling constants. The α and ζ backbone angles cannot be determined reliably by NMR, except qualitatively through the phosphorus chemical shift (230). Therefore, backbone angles determined by NMR may be influenced more by the energy potentials used in the structure calculations than by real data and must be taken with the proverbial grain of salt. Some helical parameters are reasonably well defined by NMR, such as rise and twist. Other parameters are probably poorly defined by NMR, such as propeller twist.
The method used for NMR structure determination is also an important consideration. Starting from model structures tends to lead to smaller root mean squared deviation (rmsd) than distance geometry. Precision (manifested as rmsd) is often mistaken for accuracy. The rmsd can be manipulated by refining 1000 structures and showing only the three lowest energy structures. There are other factors, such as the inclusion of explicit water molecules or ions during the calculation. Since methods for structure determination of nucleic acids have evolved during the time period in which these structures were solved, the reader is advised to consider the methodology used when evaluating the finer details of the structures. (See also Chapter 8).
2.5.1 Canonical triplexes, YRY1 and YRY2
The two 'canonical' triplexes, YRY1 and YRY2 (Plate X and Fig. 12.6), are the basis of comparison for the other mismatch/modified triplexes. The triplexes form regular helices whose general parameters have already been described (Table 12.1). It should be noted that these helical parameters are measured for the duplex portions of the triplexes. Interestingly, the sugar puckers of YRY1 and YRY2 are not completely S-type.
Only the purines of YRY1 are completely S-type, while many of the pyrimidines have a partial N-type character (222). In studies of YRY2 (223) and in previous NMR studies of a related triplex (205,206), the cytosines in both pyrimidine strands have significant N-type character. This observation is supported by an IR study in which the T:AT triplets are S-type, but the C+:GC triplets have a 1:2 ratio of S- to N-type sugar pucker (217). Another IR study has found that the purines are completely S-type and that there is an overall 2:1 ratio of S- to N-type sugar puckers (231).
2.5.2 The GTA triplex
Two of the high resolution triplex structures, GTA and TCG, contain a single mismatch triplet. The G:TA triplet was identified by chemical probing and UV spectroscopy as the most stable mismatch triplet (119,122,123). The detailed base-pairing scheme was first ascertained by NMR (204,232) and is illustrated in Fig. 12.1c. The guanine is base paired to the thymine via a single hydrogen bond between the G [H2(2)] amino proton and the T(O4) oxygen. When the G:TA triplet is compared to either the T:AT or C+:GC triplets (Figs 12.1a, b, c and 12.3b), it can be seen that it is not isosteric with the canonical triplets. The positions of the third strand sugars are all roughly the same in the three triplets, but the orientation of the guanine glycosidic bond in the G:TA triplet is quite different from the orientation of the corresponding glycosidic bonds in the canonical triplets.
In the GTA triplex structure (225) (Fig. 12.6), this difference induces several localized structural perturbations. Measurement of the twist in the third strand (as opposed to the duplex) reveals that the 5'-base step is dramatically overwound while the
3'-base step is dramatically underwound. This is a direct effect of the orientation of the guanine glycosidic bond relative to those of the neighbouring third strand bases. This altered twist affects the base stacking in the third strand. The guanine is stacked completely over the 5'-base and has almost no overlap with the 3'-base. The twist of the 5'-base step in the duplex⁴ is also overwound but the 3'-base step is unaffected. The G:TA guanine sugar adopts an N-type pucker, apparently to reduce the backbone distortion caused by the unusual orientation of the guanine glycosidic bond (204,232). The G:TA guanine is tilted out of plane with the TA base pair towards the 3'-triplet, possibly forming a weak hydrogen bond between the guanine amino H2(1) proton and the O4 oxygen of the duplex thymine of the 3'-triplet (225), although there is no direct NMR evidence for such a hydrogen bond.
2.5.3 The TCG triplex
The other mismatch-containing triplex is the TCG triplex (Fig. 12.6), which contains a single T:CG triplet centred within canonical T:AT and C+:GC triplets (226). A thymine in the third strand was found to form the most stable triplet with a CG base pair (122,125). The detailed base-pairing scheme of the T:CG triplet was shown to involve a single hydrogen bond between the third strand thymine O2 oxygen and the Watson-Crick cytosine H2(2) amino proton (226) (Fig. 12.1d). Surprisingly, both the position and orientation of the thymine glycosidic bond are very similar to the guanine glycosidic bond of the G:TA triplet (Fig. 12.3b). As a consequence, the structural perturbations observed in the TCG triplex are very similar to those observed in the GTA triplex.
Both triplexes have the same third strand twist perturbations: a large overwinding in the base step that is 5' to the third strand mismatch base, and a large underwinding in the 3'-base step (225,226). All the sugar puckers are generally S-type with the exception of the thymine of the T:CG triplet, which has an N-type sugar pucker, exactly like the guanine in the G:TA triplet. In fact, the two triplexes have nearly identical backbone positions (not shown) and have remarkably similar helical parameters (Table 12.1).
2.5.4 The N7G triplex
The N7G triplex (228) (Fig. 12.6) contains a guanosine that is glycosylated at the N7 position (⁷G) instead of at the N9 position (Fig. 12.1e). This base was designed to be an uncharged analogue of a protonated cytosine, thereby allowing a C+:GC triplet to be replaced with a ⁷G:GC triplet, and to increase the stability of PTs at physiological pH (154). The designed base-pairing scheme of the ⁷G:GC triplet (154) (Fig. 12.1e) was confirmed by the NMR structure (228). The helical parameters for the N7G triplex are similar to the other PTs, except it has a slightly smaller twist and a slightly larger x-displacement. These two changes may be related since a larger x-displacement will produce a smaller twist (all else being equal). Interestingly, the twist and rise between base pairs are inversely correlated, and display an alternating high and low pattern (228). The ApG and ApA base pair steps have a low twist and a large rise, and the
⁴ References to the 5'-end of the triplex (or duplex) as a whole are with respect to the central purine strand, e.g. a reference to the triplet at the 5'-end of a triplex refers to the triplet containing the base at the 5'-end of the purine strand.
GpA base pair steps have a high twist and a small rise. This is the same sequence effect observed in duplex DNA (233,234). There are no unusual helical parameters associated with the ⁷G:GC triplet. Overall the N7G triplex is a very regular structure (Plate XIa).
When the ⁷G:GC triplet is superimposed with a C+:GC triplet from the N7G triplex, the C1' carbons of the ⁷G and C+ occupy similar positions (Plate XIb and Fig. 12.3b). However, as in the GTA and TCG triplexes, the third strand glycosidic bond of the mismatch triplet is oriented differently from the canonical third strand glycosidic bonds, although the difference is smaller for the ⁷G:GC triplet. This perturbation may be partially responsible for the lower stability at pH 5.2 of the N7G triplex relative to a triplex containing a C+:GC triplet (228). However, at pH 7, the distortion caused by the ⁷G:GC triplet apparently is less destabilizing than the inability to protonate a C+:GC triplet (154).
2.5.5 The DTA triplex
The DTA triplex (227) (Fig. 12.6) has a novel synthetic base, D3, designed to form specific hydrogen bonds with a CG base pair (138). However, chemical footprinting studies found that the D3 base recognizes both TA and CG base pairs, forming D:TA and D:CG triplets (138). NMR experiments revealed that the D3 base was not base pairing via hydrogen bonds, but was intercalating instead (139) (Plate XII).
As expected, a large rise is observed at the intercalation site to accommodate the D3 base. Concomitantly, a large unwinding of the helix is also found at the intercalation site (227). If these distortions caused by the D3 base intercalation are disregarded, then the DTA triplex has helical parameters that are similar to the other PTs (Table 12.1).
Another similarity is that the DTA triplex generally has S-type sugar puckers, except for some cytosines in the third strand, as has been observed for other PTs (205,217,222). The third strand thymine that is 5' to the D3 base also adopts an N-type sugar pucker. This conformation positions the D3 base directly over the 3' triplet at the intercalation site, allowing the D3 base to mimic a triplet (Plate XII). The ability of the D3 base to mimic a triplet was unexpected, and suggests that greater triplex stability can be achieved by designing a synthetic base that more closely mimics a triplet.
2.5.6 The propyne triplex
The final DNA triplex, PAT (Fig. 12.6), is one in which the thymine methyl groups in the third strand have been replaced with propyne groups (224). These propyne-modified bases have been shown to enhance both duplex and triplex stability (235). The helical parameters of the PAT triplex conform, in general, to other DNA PTs (Table 12.1). However, the 5'-end of the triplex has a significant inclination (6.2°) and makes this end of the triplex resemble A-DNA visually. An opposite inclination (-5.3°) is observed in the 3'-end of the triplex, making this end of the triplex resemble B-DNA visually. The unusual variation in inclination might be an effect of the propyne groups in combination with the triplex sequence. The sequence of the third strand is 5'-PCPCPCPP-3'. The increase of hydrophobicity resulting from the propyne groups may dehydrate one of the grooves and/or alter the stacking interactions, possibly inducing an A-form inclination, while the two sequential propyne groups at the
3'-end of the helix apparently can produce some steric clash, which induces a reverse inclination.
2.5.7 The DNA:RNA triplex
The structure of a triplex (UAT) (Fig. 12.6) composed of a DNA duplex and an RNA third strand has been solved by NMR (218). Surprisingly, the replacement of a DNA third strand with an RNA third strand appears to have very little effect on the triplex structure. The helical parameters are very much like the DNA PTs (Table 12.1). In fact, the inclination may be more B-form than other PTs studied. The sugar pucker conformations for the RNA strand were difficult to evaluate and could only be qualitatively determined (218). Normally, A-form sugar puckers in RNA are indicated by small H1' to H2' coupling constants leading to weak or non-existent cross-peaks in a correlation experiment. In the UAT triplex, the RNA cross-peaks were fairly intense, indicating significant S-type sugar pucker, but the sugar conformation could not be unambiguously determined from the single coupling constant. This apparently contradicts IR studies on R:DD triplexes (212,217), where the sugars have significant N-type sugar puckers (Table 12.1). However, distinctly weaker cross-peaks were observed for the RNA cytosines in the UAT triplex, indicating that they had a more N-type sugar pucker. This may be the preferred conformation for the cytosines in a triplex, since NMR studies on DNA triplexes have also found the cytosines to be of greater N-type character (204-206), and an IR study on an R:DD triplex composed solely of C+:GC triplets also finds only N-type sugar puckers (211) (Table 12.1).
3. Structures of antiparallel triplexes
3.1 Background
Much less is known about the structure of APTs than PTs. Early UV studies of poly r(C) combined with poly r(G) or oligo r(G) detected a 1:2 complex (7-9). In hindsight, these complexes were undoubtedly forming APTs. APTs were not recognized as a separate triplex motif until recently. In fact, poly r(A):poly r(A):poly r(U) was not discovered until 1987 (236), and the corresponding DNA polymers were not shown to form a triplex until 1995 (237). The A:AU triplex has been the most difficult to characterize and so far has only been found when the poly r(A) strand is ~30-150 bases long (236). The discovery of the three canonical triplets (G:GC, A:AT, and T:AT) sparked the realization that APTs constituted a separate triplex motif (238). The antiparallel strand orientation was also only recently determined (238-241).
For triplexes in which the third strand is antiparallel to the duplex purine strand, there are at least two possible base-pairing schemes for the canonical triplets (238). NMR studies have defined the actual base-pairing scheme as being reverse Hoogsteen with an anti glycosidic bond conformation (242,243) (Fig. 12.2). An important difference between the PT canonical triplets and the APT canonical triplets is that the PT canonical triplets are isosteric, while the APT canonical triplets are not (Fig. 12.3a).
3.2 DNA antiparallel triplexes
3.2.1 Helix morphology
There are currently two high resolution NMR structures of DNA APTs (244,245) and an X-ray structure of two stacked G:GC triplets (246). Although the number of structures is limited, some generalities can still be ventured. Overall, the DNA APTs greatly resemble the DNA PTs both in their helical parameters (Tables 12.1 and 12.3) and in their appearance (Plate X). The base pair rise and x-displacement are similar to those of the PTs. The helical twists of one NMR structure (244) and the X-ray structure (246) are comparable to the PTs, but the NMR structure with the T:CG mismatch (245) has a much larger twist (Table 12.3). Results from gel migration studies, which apparently can measure the twist with high precision, support the smaller twist (247) (Table 12.3).
The inclination for the DNA APTs is small and slightly more negative than the DNA PTs and more similar to B-DNA. However, there are only two values of inclination, and only one that we could verify (Table 12.3). The sugar puckers in these structures are predominantly S-type. This is confirmed by IR studies on DNA APTs (30,248) (Table 12.3). One IR study on an APT composed solely of G:GC triplets suggests that the guanine duplex strand may be N-type (248).
3.3 RNA antiparallel triplexes
Very little is known about the structure of RNA or hybrid RNA:DNA APTs. The only structural information comes from an IR/Raman study of poly r(G):poly d(G):poly d(C), which finds that both guanine strands adopt N-type sugar puckers while the cytosine strand appears to be S-type (248).
3.4 High resolution structures
The two high resolution NMR DNA APT structures are composed of (1) canonical G:GC and T:AT triplets (hereafter referred to as RRY) (244) or (2) G:GC and T:AT triplets, plus a single T:CG triplet (TCG) (245). The crystal structure of a triplex (GGC) was extrapolated from two G:GC triplets formed at the ends of a DNA duplex with two overhanging guanines that bind to two GC base pairs from a symmetry-related duplex (246). The coordinates for RRY (accession #134d-136d) and GGC (#272d) are in the PDB database. A high resolution structure of an A:AT triplet has not yet been solved, although the details of the base-pairing scheme have been determined by NMR (243) (Fig. 12.2b).
3.4.1 Canonical triplex, RRY
The RRY triplex (244) (Plate X and Fig. 12.6) has helical parameters (Table 12.3) very similar to the DNA PTs. However, a number of sequence effects are found in the RRY triplex originating from the fact that the G:GC and T:AT triplets are not isosteric (Fig. 12.3). If twist and rise are measured for the third strand, then the TpG base steps are underwound and have a large rise. Conversely, the GpT base steps are overwound and have a small rise (244). The GpG base steps have an average twist and rise, typical of other DNA triplexes. No TpT base step occurs in the RRY triplex, but
Table 12.3. Structural data on antiparallel triplexes
Triplex sequence^a: B-DNA^d, A-DNA^d, RRY, Mixed^f, TCG, GGC, (A:AT)10, Poly(G:GC), (GA)n, Poly(G:GC)
DNA:RNA composition: D:DD, D:DD, D:DD, D:DD, D:DD, D:DD, D:DD, R:DD
Method: NMR, NMR, NMR, X-ray, FTIR, FTIR/Raman, Gel, FTIR/Raman
χ angle: anti, anti, anti, anti^i, anti^i, anti, anti, anti
Sugar pucker^b: S, N, y, S, S, S^h, S, S&N^j, S&N^k
Rise^c (Å): 3.4, 2.6, 3.6, [3.6], [3.3]^g, [3.3]
Twist^c (°): 36, 33, 30, [30], [38], [30], [32]
x-disp^c (Å): -0.7, -5.4, -2.1, [-1.9], [-2.9], [-1.5]
Inclin^c (°): -6.0, 19.1, -1.3, [-4.8]
Reference (see notes): 1, 2, 3, 4, 5, 6, 7, 6
^a The table is grouped by DNA and DNA:RNA hybrid triplexes. The high resolution NMR structures are listed in bold by the names used in the text; otherwise, the composition of the triplexes are given as triplets [e.g. (T:AT)n] or as the sequence in the purine strand [e.g. (AG)n] or as 'mixed' for more complex sequences.
^b S-type sugar pucker is a C2'-endo conformation. N-type sugar pucker is a C3'-endo conformation.
^c Helical parameters are calculated using 'Curves' v5.1 with a linear helical axis for the duplex alone. Values in [ ] are the parameters given in the references and may have been calculated using a different method and/or helical axis.
^d The standard A-DNA and B-DNA parameters were calculated from structures created in Insight 95 (Biosym). The Biosym parameters are from X-ray fibre diffraction data (refs 8 and 9 below).
^e The sugar puckers in the final structures are generally S-type, although no data on the sugar conformations were obtained and no direct restraints were used.
^f Contains a T:CG mismatch triplet.
^g The helical parameters for the TCG triplex were not independently calculated because the triplex coordinates were not available.
^h Five of the six sugars are S-type. One cytosine is N-type.
^i Only the glycosidic angle of the guanines could be determined.
^j Both the poly (dC) duplex strand and the poly (dG) third strand are S-type, while the poly (dG) duplex strand is N-type.
^k The poly (dC) strand is S-type, while both the poly (dG) and poly (rG) strands are N-type.
1. Radhakrishnan, I. and Patel, D.J. (1993) Structure 1, 135-52.
2. Dittrich, K., Gu, J., Tinder, R., Hogan, M.E. and Gao, X. (1994) Biochemistry 33, 4111-20.
3. Ji, J., Hogan, M.E. and Gao, X. (1996) Structure 4, 425-35.
4. Vlieghe, D., Van Meervelt, L., Dautant, A., Gallois, B., Precigoux, G. and Kennard, O. (1996) Science 273, 1702-5.
5. Dagneaux, C., Gousset, H., Shchyolkina, A.K., Ouali, M., Letellier, R., Liquier, J., Florentiev, V.L. and Taillandier, E. (1996) Nucleic Acids Res. 24, 4506-12.
6. Ouali, M., Letellier, R., Sun, J.-S., Akhebat, A., Adnet, F., Liquier, J. and Taillandier, E. (1993) J. Amer. Chem. Soc. 115, 4264-70.
7. Shin, C. and Koo, H.S. (1996) Biochemistry 35, 968-72.
8. Arnott, S. and Hukins, D.W. (1972) Biochem. Biophys. Res. Comm. 47, 1504-9.
9. Arnott, S., Hukins, D.W. and Dover, S.D. (1972) Biochem. Biophys. Res. Comm. 48, 1392-9.
since such a base step would consist of like triplets, the twist and rise are expected to be typical of other DNA triplexes. Similar sequence effects might be expected for mixtures of G:GC and A:AT triplets.
The x-displacement of the duplex base pairs is larger for the GC base pairs than for the AT base pairs, which correlates with the size of the third strand base. Evidently, the increased size of a G:GC triplet with respect to a T:AT triplet is partially accommodated by displacing the duplex strand.
The sugar conformations of the RRY triplex are unclear from the NMR data. The NOE cross-peak intensities for the H6,H8-H3' cross-peaks are quite strong, indicating partial N-type sugar pucker. However, the coupling patterns for the H1'-H2',2'' cross-peaks are neither standard S-type nor N-type (244).
3.4.2 The TCG triplex
Studies on alternate triplets within an APT motif have revealed that a T:CG triplet is the most stable mismatch triplet, although it is significantly less stable than the canonical triplets (126,249). The TCG triplex (Fig. 12.6) is comprised of canonical G:GC and T:AT triplets and a single T:CG triplet (245). The triplex has helical parameters similar to other APTs and PTs (Table 12.3), except for local distortions about the T:CG triplet. The NMR structure reveals that the thymine of the T:CG triplet interacts with the CG base pair via a single hydrogen bond from the C[H4(2)] amino proton to the T(O4) oxygen (Fig. 12.2d), rather than the T(O2) oxygen, as previously predicted (127).
Incorporation of the T:CG triplet has some effect on the third strand conformation. The width of the groove that is formed by the third strand and the (predominantly) purine strand is much wider near the T:CG triplet (245).
The helical twist of the third strand is generally the same as the duplex except at the two base steps that involve the thymine of the T:CG triplet. The base step that is 5' to the thymine, a GpT step, is extremely underwound (5.8°), and the base step that is 3' to the thymine, a TpG step, is extremely overwound (67.4°) (245). This sequence effect on the twist is the reverse of what is observed in the RRY triplex, where GpT steps are overwound and TpG steps are underwound (244). We can speculate that had the thymine O2 been involved in the hydrogen bond instead of the O4, then the sequence effect on the twist would match the trend found in the canonical triplex. However, by using the O4 oxygen, the thymine sugar is placed further from the helical axis, which more closely matches the position of a guanine sugar in a G:GC triplet (Figs 12.2d and 12.3c).
Comparison of the T:CG triplet in this APT to the T:CG triplet in the PT shows that they differ in their thymine sugar puckers. The thymine sugar in the APT remains in the same conformation as the other sugars, S-type (245), while the thymine sugar in the PT is N-type (226). The two triplets also differ in the atoms involved in the hydrogen bond. The APT thymine utilizes the O4 oxygen, while the PT thymine utilizes the O2 oxygen. Remarkably, if the two triplets are superimposed, the O4 atom of the APT thymine perfectly superimposes on the O2 atom of the PT thymine. In fact, the thymine base in the APT is perfectly related to the PT thymine by a twofold rotation about a pseudo-symmetry axis that runs through the N3 and C6 atoms (245).
This symmetry operation places the C1' atom of the APT thymine in the same position as the PT methyl group, which places the APT sugar further from the helical axis and closer to the analogous position of the guanine sugar in a G:GC triplet. Thus, the
APT and PT T:CG triplets adopt the same base-pairing scheme, except that their thymines utilize pseudo-symmetry-related carbonyl oxygens for the hydrogen bonding, and each thymine sugar is positioned most favourably for the particular triplex motif.
3.4.3 The GGC triplex
The final triplex is extrapolated from a 2.0 Å resolution crystal structure of two tandem G:GC triplets on the end of a duplex (246). The helical parameters generally conform to the helical parameters of the RRY and TCG triplexes (Table 12.3). In the two base triplets from which the triplex structure was calculated, all of the sugars are S-type, except for one of the Watson-Crick paired cytosines. However, the structure of the triplex may be influenced by the duplex/triplex junction and a triplex/triplex junction, where a pair of antiparallel G:GC triplets interact with parallel G:GC triplets. The GGC structure provides very precise detail of the G:GC triplet.
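
The pseudo-twofold relation between the APT and PT thymines described in Section 3.4.2 can be checked on a toy model. The sketch below is an illustration under stated assumptions, not a reconstruction of the published coordinates. It builds a planar thymine with the ring atoms on a regular hexagon and the exocyclic atoms pointing radially outward, then applies a 180° rotation about the line through N3 and C6. The ring radius, the approximate bond lengths, the atom label C7 for the methyl carbon and all function names are assumptions chosen for the example.

```python
import math

RING_RADIUS = 1.39  # angstroms; idealized regular hexagon (assumption)
RING_ATOMS = ["N1", "C2", "N3", "C4", "C5", "C6"]  # placed 60 degrees apart
# Exocyclic atoms: name -> (ring atom it is bonded to, approximate bond length)
EXOCYCLIC = {
    "C1'": ("N1", 1.47),  # glycosidic bond to the sugar
    "O2": ("C2", 1.23),
    "O4": ("C4", 1.23),
    "C7": ("C5", 1.50),   # thymine methyl carbon
}

def build_thymine():
    """Planar, idealized thymine with substituents pointing radially outward."""
    angles = {name: math.radians(60.0 * k) for k, name in enumerate(RING_ATOMS)}
    atoms = {
        name: (RING_RADIUS * math.cos(a), RING_RADIUS * math.sin(a), 0.0)
        for name, a in angles.items()
    }
    for name, (parent, bond) in EXOCYCLIC.items():
        a, r = angles[parent], RING_RADIUS + bond
        atoms[name] = (r * math.cos(a), r * math.sin(a), 0.0)
    return atoms

def rotate_180(point, axis_start, axis_end):
    """Rotate a point by 180 degrees about the line through axis_start and axis_end."""
    direction = [e - s for s, e in zip(axis_start, axis_end)]
    length = math.sqrt(sum(d * d for d in direction))
    unit = [d / length for d in direction]
    rel = [p - s for p, s in zip(point, axis_start)]
    along = sum(r * u for r, u in zip(rel, unit))
    # For a half turn about unit vector u, R(v) = 2 (u . v) u - v
    return tuple(s + 2.0 * along * u - r for s, u, r in zip(axis_start, unit, rel))

def twofold_image(atoms):
    """Apply the twofold rotation about the N3-C6 axis to every atom."""
    return {name: rotate_180(xyz, atoms["N3"], atoms["C6"]) for name, xyz in atoms.items()}

if __name__ == "__main__":
    thymine = build_thymine()
    image = twofold_image(thymine)
    print("O4 image to O2 distance:", round(math.dist(image["O4"], thymine["O2"]), 3))
    print("C1' image to methyl distance:", round(math.dist(image["C1'"], thymine["C7"]), 3))
```

In this idealized geometry the rotated O4 lands exactly on the O2 position, and the rotated C1' lands within the small difference between the two assumed bond lengths of the methyl carbon position. That mirrors the relationship described in the text, although real bases deviate from a regular hexagon.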
4. PNA triplex structures
A crystal structure of a triplex composed of a homopurine DNA strand and a hairpin homopyrimidine peptide (or polyamide) nucleic acid (PNA) strand (193) has been solved (Plate XIII). PNAs are nucleic acids in which the phosphodiester backbone has been replaced with a peptide backbone (189). When targeted to duplex DNA, they form more stable triplexes than their DNA counterparts (189). In addition, strand invasion by the PNA displaces the homopyrimidine strand of the duplex, forming a 2:1 PNA:DNA triplex (189,250). The crystal structure of the PNA:DNA triplex is composed of both T:AT and C+:GC canonical triplets (193). Based on the triplet composition, this triplex is of the parallel motif, where the N-terminus is analogous to the 5'-end of a DNA strand.
The most striking characteristic of the PNA:DNA triplex is the cavity down the centre of the helix caused by the large X-displacement (-6.8 Å) (Plate XIII). This structure differs significantly from both A- and B-DNA. The sugar puckers of the DNA strand are all N-type as in A-DNA, the rise and inclination are both similar to B-DNA, the x-displacement is larger than either A- or B-DNA, the twist is much smaller than either A- or B-DNA, and the glycosidic bonds are all in the anti conformation, as in both A- and B-DNA. Since the helix is neither A- nor B-form, the helix has been called P-form (193).
Interestingly, the Hoogsteen PNA strand and the DNA strand are extremely close together and share extensive van der Waals contacts (193). There is a series of hydrogen bonds between the amides of the PNA backbone and the O1P phosphate oxygens of the DNA backbone. These hydrogen bonds probably account for the increased stability of the PNA triplex.
5. Conclusion
In summary, triplex structure can be viewed in the context of a duplex structure that has been perturbed to accommodate the binding of a third strand 'ligand' in the major groove. For DNA triplexes, the duplex structure is a B-DNA structure, and the perturbations that are observed are an increased (negative) X-displacement and an unwinding of the helix. For RNA triplexes, there is less direct structural information. However, IR and fibre diffraction data indicate that the structures are A-form with some perturbations. Both the parallel and antiparallel triplexes adopt similar structures. However, the parallel triplexes have a more regular backbone in the third strand because their canonical triplets are isosteric.
Acknowledgments
The authors thank Charlotte Gotfredsen, A. Kathryn Phipps, Markus Tarkoy, and Peter Schultze for unpublished work discussed here. This work was supported by NIH grant GM 37254 (to J.F.).
References
1. Pauling, L. and Corey, R.B. (1953) Nature 171, 346.
2. Pauling, L. and Corey, R.B. (1953) Proc. Natl. Acad. Sci. USA 39, 84.
3. Watson, J.D. and Crick, F.H. (1953) Nature 171, 737.
4. Felsenfeld, G., Davies, D.R. and Rich, A. (1957) J. Am. Chem. Soc. 79, 2023.
5. Felsenfeld, G. and Rich, A. (1957) Biochim. Biophys. Acta 26, 457.
6. Hoogsteen, K. (1959) Acta Cryst. 12, 822.
7. Fresco, J.R. (1963) in Some Investigations on the Secondary and Tertiary Structure of Ribonucleic Acids (Fresco, J.R., ed.), pp. 121. Academic Press, Inc., New York.
8. Lipsett, M.N. (1963) Biochem. Biophys. Res. Commun. 11, 224.
9. Lipsett, M.N. (1964) J. Biol. Chem. 239, 1256.
10. Howard, F.B., Frazier, J., Lipsett, M.N. and Miles, H.T. (1964) Biochem. Biophys. Res. Commun. 17, 93.
11. Inman, R.B. (1964) J. Mol. Biol. 10, 137.
12. Chamberlin, M.J. and Patterson, D.L. (1965) J. Mol. Biol. 12, 410.
13. Riley, M., Maling, B. and Chamberlin, M.J. (1966) J. Mol. Biol. 20, 359.
14. Rich, A. (1960) Proc. Natl. Acad. Sci. USA 46, 1044.
15. Felsenfeld, G. and Miles, H.T. (1967) Annu. Rev. Biochem. 36, 407.
16. Michelson, A.M., Massoulie, J. and Guschlbauer, W. (1967) Progr. Nucl. Acid Res. Mol. Biol. 6, 83.
17. Morgan, A.R. and Wells, R.D. (1968) J. Mol. Biol. 37, 63.
18. Wells, R.D., Collier, D.A., Hanvey, J.C., Shimizu, M. and Wohlrab, F. (1988) FASEB J. 2, 2939.
19. Cheng, Y.K. and Pettitt, B.M. (1992) Progr. Biophys. Mol. Biol. 58, 225.
20. Sun, J.S. and Helene, C. (1993) Curr. Opin. Struct. 3, 345.
21. Lu, G. and Ferl, R.J. (1993) Int. J. Biochem. 25, 1529.
22. Radhakrishnan, I. and Patel, D.J. (1994) Biochemistry 33, 11405.
23. Plum, G.E., Pilch, D.S., Singleton, S.F. and Breslauer, K.J. (1995) Annu. Rev. Biophys. Biomol. Struct. 24, 319.
24. Frank-Kamenetskii, M.D. and Mirkin, S.M. (1995) Annu. Rev. Biochem. 64, 65.
25. Lyamichev, V.I., Mirkin, S.M. and Frank-Kamenetskii, M.D. (1987) J. Biomol. Struct. Dynamics 5, 275.
26. Rajagopal, P. and Feigon, J. (1989) Nature 339, 637.
27. Sun, J.S., De Bizemont, T., Duval-Valentin, G., Montenay-Garestier, T. and Helene, C. (1991) C. R. Acad. Sci. III 313, 585.
28. Giovannangeli, C., Rougee, M., Garestier, T., Thuong, N.T. and Helene, C. (1992) Proc. Natl. Acad. Sci. USA 89, 8631.
29. de Bizemont, T., Duval-Valentin, G., Sun, J.S., Bisagni, E., Garestier, T. and Helene, C. (1996) Nucl. Acids Res. 24, 1136.
30. Dagneaux, C., Gousset, H., Shchyolkina, A.K., Ouali, M., Letellier, R., Liquier, J., Florentiev, V.L. and Taillandier, E. (1996) Nucl. Acids Res. 24, 4506.
31. Dagneaux, C., Liquier, J. and Taillandier, E. (1995) Biochemistry 34, 14815.
32. Soyfer, V.N. and Potaman, V.N. (1996) Triple-Helical Nucleic Acids. Springer-Verlag, New York.
33. Murray, N.L. and Morgan, A.R. (1973) Can. J. Biochem. 51, 436.
34. Miller, J.H. and Sobell, H.M. (1966) Proc. Natl. Acad. Sci. USA 55, 1201.
35. Behe, M.J. (1995) Nucl. Acids Res. 23, 689.
36. Beasty, A.M. and Behe, M.J. (1988) Nucl. Acids Res. 16, 1517.
37. Gillies, S.D., Folsom, V. and Tonegawa, S. (1984) Nature 310, 594.
38. Fowler, R.F. and Skinner, D.M. (1986) J. Biol. Chem. 261, 8994.
39. de Martynoff, G., Pohl, V., Mercken, L., van Ommen, G.J. and Vassart, G. (1987) Eur. J. Biochem. 164, 591.
40. Gee, J.E., Yen, R.L., Hung, M.C. and Hogan, M.E. (1994) Gene 149, 109.
41. Belland, R.J. (1991) Mol. Microbiol. 5, 2351.
42. Vasquez, K.M., Wensel, T.G., Hogan, M.E. and Wilson, J.H. (1995) Biochemistry 34, 7243.
43. Elgin, S.C. (1981) Cell 27, 413.
44. Mace, H.A., Pelham, H.R. and Travers, A.A. (1983) Nature 304, 555.
45. Nickol, J.M. and Felsenfeld, G. (1983) Cell 35, 467.
46. Cantor, C.R. and Efstratiadis, A. (1984) Nucl. Acids Res. 12, 8059.
47. Evans, T. and Efstratiadis, A. (1986) J. Biol. Chem. 261, 14771.
48. Lyamichev, V.I., Mirkin, S.M. and Frank-Kamenetskii, M.D. (1986) J. Biomol. Struct. Dynamics 3, 667.
49. Mirkin, S.M., Lyamichev, V.I., Drushlyak, K.N., Dobrynin, V.N., Filippov, S.A. and Frank-Kamenetskii, M.D. (1987) Nature 330, 495.
50. Schroth, G.P. and Ho, P.S. (1995) Nucl. Acids Res. 23, 1977.
51. Kohwi, Y. and Kohwi-Shigematsu, T. (1988) Proc. Natl. Acad. Sci. USA 85, 3781.
52. Kohwi, Y. (1989) Nucl. Acids Res. 17, 4493.
53. Bernues, J., Beltran, R., Casasnovas, J.M. and Azorin, F. (1989) EMBO J. 8, 2087.
54. Mirkin, S.M. and Frank-Kamenetskii, M.D. (1994) Annu. Rev. Biophys. Biomol. Struct. 23, 541.
55. Karlovsky, P., Pecinka, P., Vojtiskova, M., Makaturova, E. and Palecek, E. (1990) FEBS Lett. 274, 39.
56. Kohwi, Y., Malkhosyan, S.R. and Kohwi-Shigematsu, T. (1992) J. Mol. Biol. 223, 817.
57. Moser, H.E. and Dervan, P.B. (1987) Science 238, 645.
58. Htun, H. and Dahlberg, J.E. (1988) Science 241, 1791.
59. Sklenar, V. and Feigon, J. (1990) Nature 345, 836.
60. Collier, D.A. and Wells, R.D. (1990) J. Biol. Chem. 265, 10652.
61. Lee, J.S., Woodsworth, M.L., Latimer, L.J.P. and Morgan, A.R. (1984) Nucl. Acids Res. 12, 6603.
62. Povsic, T.J. and Dervan, P.B. (1989) J. Am. Chem. Soc. 111, 3059.
63. Xodo, L.E., Manzini, G., Quadrifoglio, F., van der Marel, G.A. and van Boom, J.H. (1991) Nucl. Acids Res. 19, 5625.
64. Hanvey, J.C., Williams, E.M. and Besterman, J.M. (1991) Antisense Res. Dev. 1, 307.
65. Kiyama, R. and Camerini-Otero, R.D. (1991) Proc. Natl. Acad. Sci. USA 88, 10450.
66. Guieysse, A.L., Praseuth, D. and Helene, C. (1997) J. Mol. Biol. 267, 289.
67. Lee, J.S., Burkholder, G.D., Latimer, L.J.P., Haug, B.L. and Braun, R.P. (1987) Nucl. Acids Res. 15, 1047.
68. Burkholder, G.D., Latimer, L.J.P. and Lee, J.S. (1988) Chromosoma 97, 185.
69. Agazie, Y.M., Lee, J.S. and Burkholder, G.D. (1994) J. Biol. Chem. 269, 7019.
70. Agazie, Y.M., Burkholder, G.D. and Lee, J.S. (1996) Biochem. J. 316, 461.
71. Sarkar, P.S. and Brahmachari, S.K. (1992) Nucl. Acids Res. 20, 5713.
72. Kohwi, Y. and Kohwi-Shigematsu, T. (1991) Genes Dev. 5, 2547.
73. Kohwi, Y. and Panchenko, Y. (1993) Genes Dev. 7, 1766.
74. Rooney, S.M. and Moore, P.D. (1995) Proc. Natl. Acad. Sci. USA 92, 2141.
75. Portes-Sentis, S., Sergeant, A. and Gruffat, H. (1997) Nucl. Acids Res. 25, 1347.
76. Chubb, J.M. and Hogan, M.E. (1992) Trends Biotechnol. 10, 132.
77. Gee, J.E. and Miller, D.M. (1992) Am. J. Med. Sci. 304, 366.
78. Helene, C. (1991) Anticancer Drug Des. 6, 569.
79. Cooney, M., Czernuszewicz, G., Postel, E.H., Flint, S.J. and Hogan, M.E. (1988) Science 241, 456.
80. Young, S.L., Krawczyk, S.H., Matteucci, M.D. and Toole, J.J. (1991) Proc. Natl. Acad. Sci. USA 88, 10023.
81. Duval-Valentin, G., Thuong, N.T. and Helene, C. (1992) Proc. Natl. Acad. Sci. USA 89, 504.
82. Hacia, J.G., Dervan, P.B. and Wold, B.J. (1994) Biochemistry 33, 6192.
83. Samadashwily, G.M. and Mirkin, S.M. (1994) Gene 149, 127.
84. Krasilnikov, A.S., Panyutin, I.G., Samadashwily, G.M., Cox, R., Lazurkin, Y.S. and Mirkin, S.M. (1997) Nucl. Acids Res. 25, 1339.
85. Francois, J.C., Saison-Behmoaras, T., Thuong, N.T. and Helene, C. (1989) Biochemistry 28, 9617.
86. Maher, L.J., III, Dervan, P.B. and Wold, B.J. (1990) Biochemistry 29, 8820.
87. Hanvey, J.C., Shimizu, M. and Wells, R.D. (1990) Nucl. Acids Res. 18, 157.
88. Grigoriev, M., Praseuth, D., Robin, P., Hemar, A., Saison-Behmoaras, T., Dautry-Varsat, A., Thuong, N.T., Helene, C. and Harel-Bellan, A. (1992) J. Biol. Chem. 267, 3389.
89. Maher, L.J., III, Wold, B. and Dervan, P.B. (1989) Science 245, 725.
90. Gee, J.E., Blume, S., Snyder, R.C., Ray, R. and Miller, D.M. (1992) J. Biol. Chem. 267, 11163.
91. Reddoch, J.F. and Miller, D.M. (1995) Biochemistry 34, 7659.
92. Orson, F.M., Thomas, D.W., McShan, W.M., Kessler, D.J. and Hogan, M.E. (1991) Nucl. Acids Res. 19, 3435.
93. Postel, E.H., Flint, S.J., Kessler, D.J. and Hogan, M.E. (1991) Proc. Natl. Acad. Sci. USA 88, 8227.
94. Lu, G. and Ferl, R. (1992) J. Plant Mol. Biol. 19, 715.
95. Helm, C.W., Shrestha, K., Thomas, S., Shingleton, H.M. and Miller, D.M. (1993) Gynecol. Oncol. 49, 339.
96. Ing, N.H., Beekman, J.M., Kessler, D.J., Murphy, M., Jayaraman, K., Zendegui, J.G., Hogan, M.E., O'Malley, B.W. and Tsai, M.J. (1993) Nucl. Acids Res. 21, 2789.
97. Noonberg, S.B., Scott, G.K., Garovoy, M.R., Benz, C.C. and Hunt, C.A. (1994) Nucl. Acids Res. 22, 2830.
98. Francois, J.C., Saison-Behmoaras, T., Chassignol, M., Thuong, N.T. and Helene, C. (1989) J. Biol. Chem. 264, 5891.
99. Francois, J.C., Saison-Behmoaras, T., Barbier, C., Chassignol, M., Thuong, N.T. and Helene, C. (1989) Proc. Natl. Acad. Sci. USA 86, 9702.
100. Boidot-Forget, M., Chassignol, M., Takasugi, M., Thuong, N.T. and Helene, C. (1988) Gene 72, 361.
101. Le Doan, T., Perrouault, L., Praseuth, D., Habhoub, N., Decout, J.L., Thuong, N.T., Lhomme, J. and Helene, C. (1987) Nucl. Acids Res. 15, 7749.
102. Perrouault, L., Asseline, U., Rivalle, C., Thuong, N.T., Bisagni, E., Giovannangeli, C., Le Doan, T. and Helene, C. (1990) Nature 344, 358.
103. Le Doan, T., Perrouault, L., Asseline, U., Thuong, N.T., Rivalle, C., Bisagni, E. and Helene, C. (1991) Antisense Res. Dev. 1, 43.
104. Strobel, S.A., Moser, H.E. and Dervan, P.B. (1988) J. Am. Chem. Soc. 110, 7927.
105. Strobel, S.A. and Dervan, P.B. (1990) Science 249, 73.
106. Luebke, K.J. and Dervan, P.B. (1992) Nucl. Acids Res. 20, 3005.
107. Havre, P.A. and Glazer, P.M. (1993) J. Virol. 67, 7324.
108. Havre, P.A., Gunther, E.J., Gasparro, F.P. and Glazer, P.M. (1993) Proc. Natl. Acad. Sci. USA 90, 7879.
109. Wang, G., Levy, D.D., Seidman, M.M. and Glazer, P.M. (1995) Mol. Cell Biol. 15, 1759.
110. Roberts, R.W. and Crothers, D.M. (1991) Proc. Natl. Acad. Sci. USA 88, 9397.
111. Ito, T., Smith, C.L. and Cantor, C.R. (1992) Nucl. Acids Res. 20, 3524.
112. Ito, T., Smith, C.L. and Cantor, C.R. (1992) Proc. Natl. Acad. Sci. USA 89, 495.
113. Ito, T., Smith, C.L. and Cantor, C.R. (1992) Genet. Anal. Tech. Appl. 9, 96.
114. Sonti, S.V., Griffor, M.C., Sano, T., Narayanswami, S., Bose, A., Cantor, C.R. and Kausch, A.P. (1995) Nucl. Acids Res. 23, 3995.
115. Vary, C.P. (1992) Clin. Chem. 38, 687.
116. Olivas, W.M. and Maher, L.J., III (1994) Biotechniques 16, 128.
117. Pei, D.H., Ulrich, H.D. and Schultz, P.G. (1991) Science 253, 1408.
118. Letai, A.G., Palladino, M.A., Fromm, E., Rizzo, V. and Fresco, J.R. (1988) Biochemistry 27, 9108.
119. Griffin, L.C. and Dervan, P.B. (1989) Science 245, 967.
120. Belotserkovskii, B.P., Veselkov, A.G., Filippov, S.A., Dobrynin, V.N., Mirkin, S.M. and Frank-Kamenetskii, M.D. (1990) Nucl. Acids Res. 18, 6621.
121. Macaya, R.F., Gilbert, D.E., Malek, S., Sinsheimer, J. and Feigon, J. (1991) Science 254, 270.
122. Sun, J.S., Mergny, J.L., Lavery, R., Montenay-Garestier, T. and Helene, C. (1991) J. Biomol. Struct. Dynamics 9, 411.
123. Mergny, J.L., Sun, J.S., Rougee, M., Montenay-Garestier, T., Barcelo, F., Chomilier, J. and Helene, C. (1991) Biochemistry 30, 9791.
124. Horne, D.A. and Dervan, P.B. (1991) Nucl. Acids Res. 19, 4963.
125. Yoon, K., Hobbs, C.A., Koch, J., Sardaro, M., Kutny, R. and Weis, A.L. (1992) Proc. Natl. Acad. Sci. USA 89, 3840.
126. Beal, P.A. and Dervan, P.B. (1992) Nucl. Acids Res. 20, 2773.
127. Greenberg, W.A. and Dervan, P.B. (1995) J. Am. Chem. Soc. 117, 5016.
128. Kiessling, L.L., Griffin, L.C. and Dervan, P.B. (1992) Biochemistry 31, 2829.
129. Colocci, N., Distefano, M.D. and Dervan, P.B. (1993) J. Am. Chem. Soc. 115, 4468.
130. Volker, J. and Klump, H.H. (1994) Biochemistry 33, 13502.
131. Colocci, N. and Dervan, P.B. (1995) J. Am. Chem. Soc. 117, 4781.
132. Ebbinghaus, S.W., Gee, J.E., Rodu, B., Mayfield, C.A., Sanders, G. and Miller, D.M. (1993) J. Clin. Invest. 92, 2433.
133. Mayfield, C. and Miller, D. (1994) Nucl. Acids Res. 22, 1909.
134. Gee, J.E., Revankar, G.R., Rao, T.S. and Hogan, M.E. (1995) Biochemistry 34, 2042.
135. Milligan, J.F., Krawczyk, S.H., Wadwani, S. and Matteucci, M.D. (1993) Nucl. Acids Res. 21, 327.
136. Stilz, H.U. and Dervan, P.B. (1993) Biochemistry 32, 2177.
137. Zimmerman, S.C. and Schmitt, P. (1995) J. Am. Chem. Soc. 117, 10769.
138. Griffin, L.C., Kiessling, L.L., Beal, P.A., Gillespie, P. and Dervan, P.B. (1992) J. Am. Chem. Soc. 114, 7976.
139. Koshlap, K.M., Gillespie, P., Dervan, P.B. and Feigon, J. (1993) J. Am. Chem. Soc. 115, 7908.
140. Wang, E., Koshlap, K.M., Gillespie, P., Dervan, P.B. and Feigon, J. (1996) J. Mol. Biol. 257, 1052.
141. Ono, A., Chen, C.N. and Kan, L.S. (1991) Biochemistry 30, 9914.
142. Horne, D.A. and Dervan, P.B. (1990) J. Am. Chem. Soc. 112, 2435.
143. Jayasena, S.D. and Johnston, B.H. (1992) Biochemistry 31, 320.
144. Jayasena, S.D. and Johnston, B.H. (1992) Nucl. Acids Res. 20, 5279.
145. Beal, P.A. and Dervan, P.B. (1992) J. Am. Chem. Soc. 114, 4976.
146. Washbrook, E. and Fox, K.R. (1994) Biochem. J. 301, 569.
147. Sun, J.S., Francois, J.C., Montenay-Garestier, T., Saison-Behmoaras, T., Roig, V., Thuong, N.T. and Helene, C. (1989) Proc. Natl. Acad. Sci. USA 86, 9198.
148. Collier, D.A., Thuong, N.T. and Helene, C. (1991) J. Am. Chem. Soc. 113, 1457.
149. Ono, A., Ts'o, P.O.P. and Kan, L.S. (1991) J. Am. Chem. Soc. 113, 4032.
150. Ono, A., Ts'o, P.O.P. and Kan, L.S. (1992) J. Org. Chem. 57, 3225.
151. Koh, J.S. and Dervan, P.B. (1992) J. Am. Chem. Soc. 114, 1470.
152. Radhakrishnan, I., Patel, D.J., Priestley, E.S., Nash, H.M. and Dervan, P.B. (1993) Biochemistry 32, 11228.
153. Priestley, E.S. and Dervan, P.B. (1995) J. Am. Chem. Soc. 117, 4761.
154. Hunziker, J., Priestley, E.S., Brunar, H. and Dervan, P.B. (1995) J. Am. Chem. Soc. 117, 2661.
155. Krawczyk, S.H., Milligan, J.F., Wadwani, S., Moulds, C., Froehler, B.C. and Matteucci, M.D. (1992) Proc. Natl. Acad. Sci. USA 89, 3761.
156. Jetter, M.C. and Hobbs, F.W. (1993) Biochemistry 32, 3249.
157. Xiang, G.B., Soussou, W. and McLaughlin, L.W. (1994) J. Am. Chem. Soc. 116, 11155.
158. Thuong, N.T. and Helene, C. (1993) Angew. Chem. Int. Ed. Eng. 32, 666.
159. Stonehouse, T.J. and Fox, K.R. (1994) Biochim. Biophys. Acta 1218, 322.
160. Collier, D.A., Mergny, J.L., Thuong, N.T. and Helene, C. (1991) Nucl. Acids Res. 19, 4219.
161. Fox, K.R. (1994) Nucl. Acids Res. 22, 2016.
162. Durand, M., Thuong, N.T. and Maurizot, J.C. (1992) J. Biol. Chem. 267, 24394.
163. Park, Y.W. and Breslauer, K.J. (1992) Proc. Natl. Acad. Sci. USA 89, 6653.
164. Durand, M., Thuong, N.T. and Maurizot, J.C. (1994) J. Biomol. Struct. Dynamics 11, 1191.
165. Fedorova, O.S., Knorre, D.G., Podust, L.M. and Zarytova, V.F. (1988) FEBS Lett. 228, 273.
166. Povsic, T.J. and Dervan, P.B. (1990) J. Am. Chem. Soc. 112, 9428.
167. Takasugi, M., Guendouz, A., Chassignol, M., Decout, J.L., Lhomme, J., Thuong, N.T. and Helene, C. (1991) Proc. Natl. Acad. Sci. USA 88, 5602.
168. Giovannangeli, C., Thuong, N.T. and Helene, C. (1992) Nucl. Acids Res. 20, 4275.
169. Grigoriev, M., Praseuth, D., Guieysse, A.L., Robin, P., Thuong, N.T., Helene, C. and Harel-Bellan, A. (1993) Proc. Natl. Acad. Sci. USA 90, 3501.
170. Degols, G., Clarenc, J.P., Lebleu, B. and Leonetti, J.P. (1994) J. Biol. Chem. 269, 16933.
171. Nielsen, P.E. (1995) Annu. Rev. Biophys. Biomol. Struct. 24, 167.
172. Latimer, L.J., Hampel, K. and Lee, J.S. (1989) Nucl. Acids Res. 17, 1549.
173. Kim, S.G., Tsukahara, S., Yokoyama, S. and Takaku, H. (1992) FEBS Lett. 314, 29.
174. Tsukahara, S., Kim, S.G. and Takaku, H. (1993) Biochem. Biophys. Res. Commun. 196, 990.
175. Hacia, J.G., Wold, B.J. and Dervan, P.B. (1994) Biochemistry 33, 5367.
176. Callahan, D.E., Trapane, T.L., Miller, P.S., Ts'o, P.O. and Kan, L.S. (1991) Biochemistry 30, 1650.
177. Reynolds, M.A., Arnold, L.J., Jr., Almazan, M.T., Beck, T.A., Hogrefe, R.I., Metzler, M.D., Stoughton, S.R., Tseng, B.Y., Trapane, T.L., Ts'o, P.O. and Woolf, T.M. (1994) Proc. Natl. Acad. Sci. USA 91, 12433.
178. Browne, K.A., Dempcy, R.O. and Bruice, T.C. (1995) Proc. Natl. Acad. Sci. USA 92, 7051.
179. Kibler-Herzog, L., Kell, B., Zon, G., Shinozuka, K., Mizan, S. and Wilson, W.D. (1990) Nucl. Acids Res. 18, 3545.
180. Kibler-Herzog, L., Zon, G., Whittier, G., Mizan, S. and Wilson, W.D. (1993) Anticancer Drug Des. 8, 65.
181. Alunni-Fabbroni, M., Manfioletti, G., Manzini, G. and Xodo, L.E. (1994) Eur. J. Biochem. 226, 831.
182. Xodo, L., Alunni-Fabbroni, M., Manzini, G. and Quadrifoglio, F. (1994) Nucl. Acids Res. 22, 3322.
183. Shimizu, M., Koizumi, T., Inoue, H. and Ohtsuka, E. (1994) Bioorg. Med. 4, 1029.
184. Wang, S. and Kool, E.T. (1995) Nucl. Acids Res. 23, 1157.
185. Jones, R.J., Swaminathan, S., Milligan, J.F., Wadwani, S., Froehler, B.C. and Matteucci, M.D. (1993) J. Am. Chem. Soc. 115, 9816.
186. Tarkoy, M., Bolli, M. and Leumann, C. (1994) Helv. Chim. Acta 77, 716.
187. Escude, C., Sun, J.S., Rougee, M., Garestier, T. and Helene, C. (1992) C. R. Acad. Sci. III 315, 521.
188. Shimizu, M., Konishi, A., Shimada, Y., Inoue, H. and Ohtsuka, E. (1992) FEBS Lett. 302, 155.
189. Nielsen, P.E., Egholm, M., Berg, R.H. and Buchardt, O. (1991) Science 254, 1497.
190. Egholm, M., Buchardt, O., Christensen, L., Behrens, C., Freier, S.M., Driver, D.A., Berg, R.H., Kim, S.K., Norden, B. and Nielsen, P.E. (1993) Nature 365, 566.
191. Kim, S.K., Nielsen, P.E., Egholm, M., Buchardt, O., Berg, R.H. and Norden, B. (1993) J. Am. Chem. Soc. 115, 6477.
192. Nielsen, P.E., Egholm, M. and Buchardt, O. (1994) Bioconjug. Chem. 5, 3.
193. Betts, L., Josey, J.A., Veal, J.M. and Jordan, S.R. (1995) Science 270, 1838.
194. Miles, H.T. (1964) Proc. Natl. Acad. Sci. USA 51, 1104.
195. Arnott, S. and Bond, P.J. (1973) Nature New Biol. 244, 99.
196. Arnott, S. and Selsing, E. (1974) J. Mol. Biol. 88, 509.
197. Arnott, S., Bond, P.J., Selsing, E. and Smith, P.J.C. (1976) Nucl. Acids Res. 3, 2459.
198. Liu, K., Miles, H.T., Parris, K.D. and Sasisekharan, V. (1994) Nature Struct. Biol. 1, 11.
199. Liu, K., Sasisekharan, V., Miles, H.T. and Raghunathan, G. (1996) Biopolymers 39, 573.
200. Kallenbach, N.R., Daniel, W.E., Jr. and Kaminker, M.A. (1976) Biochemistry 15, 1218.
201. Geerdes, H.A.M. and Hilbers, C.W. (1977) Nucl. Acids Res. 4, 207.
202. Rajagopal, P. and Feigon, J. (1989) Biochemistry 28, 7859.
203. de los Santos, C., Rosen, M. and Patel, D. (1989) Biochemistry 28, 7282.
204. Radhakrishnan, I., Patel, D.J., Veal, J.M. and Gao, X.L. (1992) J. Am. Chem. Soc. 114, 6913.
205. Macaya, R.F., Schultze, P. and Feigon, J. (1992) J. Am. Chem. Soc. 114, 781.
206. Macaya , R., Wang , E. , Schultze , P., Sklenar , V. and Feigon, J . (1992 ) J. Mol. Bid. 225 , 755. 207. Anonymou s (1989 ) EMBO J. 8 , 1. 208. Lavery , R. an d Sklenar, H. (1988 ) J. Biomol. Struct. Dynamics 6 , 63. 209. Lavery , R. an d Sklenar, H. (1989 ) J. Biomol. Struct. Dynamics 6, 655 . 210. Han , H . an d Dervan, P.B. (1994 ) Nucl. Acids Res. 22, 2837. 211. Akhebat , A. , Dagneaux , C. , Liquier , J. an d Taillandier , E . (1992 ) J. Biomol. Struct. Dynamics 10 , 577 . 212. Liquier , J., Taillandier , E. , Klinck , R. , Guittet , E. , Gouyette , C . an d Huynh-Dinh, T . (1995) Nucl. Acids Res. 23, 1722 . 213. Klinck , R. , Liquier , J. , Taillandier , E. , Gouyette , C. , Huynhdinh , T . an d Guittet , E . (1995) Eur.J. Biochem. 233, 544 . 214. Holland , J.A. an d Hoffman, D.W . (1996 ) Nud. Acids Res. 24, 2841 . 215. Heus , H.A. an d Pardi, A. (1991) J. Am. Chem. Soc. 113, 4360 . 216. Wiithrich , K . (1986 ) NMR of Proteins and Nucleic Adds. John Wiley & Sons, Ne w York . 217. Dagneaux , C. , Liquier , J. an d Taillandier, E . (1995 ) Biochemistry 34 , 16618 . 218. Gotfredsen , C.H., Schultze , P. and Feigon, J. (1998 ) J. Am. Chem. Soc. 120, 4281 . 219. Roberts , R.W. an d Crothers, D.M . (1992 ) Science 258, 1463 . 220. Escude , C., Francois , J.C., Sun , J.S., Ott , G. , Sprinzl , M. , Garestier , T. an d Helene, C . (1993) Nucl. Acids Res. 21, 5547 . 221. Han , H . an d Dervan, P.B . (1993 ) Proc. Natl. Acad. Sri. USA 90 , 3806 . 222. Bornet , O . an d Lancelot, G. (1995 ) J. Biomol. Struct. Dynamics 12 , 803 . 223. Tarkoy , M. , Phipps , A.K., Schultze, P. and Feigon, J. (1998 ) Biochemistry 37 , 5810 . 224. Phipps , A.K., Tarkoy , M. , Schultze , P. and Feigon, J. (1998 ) Biochemistry 37 , 5820 . 225. Radhakrishnan , I . and Patel, DJ. (1994 ) Structure 2, 17. 226. Radhakrishnan , I . and Patel, D.J. (1994 ) J. Mol. Biol. 241, 600 . 227. Wang , E. , Koshlap, K.M., Gillespie , P. , Dervan, P.B . an d Feigon, J. (1996 ) J. Mol. Biol. 257, 1052 . 
228. Koshlap, K.M., Schultze, P., Brunar, H., Dervan, P.B. and Feigon, J. (1997) Biochemistry 36, 2659.
229. Bernstein, F.C., Koetzle, T.F., Williams, G.J., Meyer, E.E., Jr., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T. and Tasumi, M. (1977) J. Mol. Biol. 112, 535.
230. Roongta, V.A., Jones, C.R. and Gorenstein, D.G. (1990) Biochemistry 29, 5245.
231. Fang, Y., Bai, C., Wei, Y., Lin, S.B. and Kan, L. (1995) J. Biomol. Struct. Dynamics 13, 471.
232. Wang, E., Malek, S. and Feigon, J. (1992) Biochemistry 31, 4838.
233. Yanagi, K., Prive, G.G. and Dickerson, R.E. (1991) J. Mol. Biol. 217, 201.
234. Quintana, J.R., Grzeskowiak, K., Yanagi, K. and Dickerson, R.E. (1992) J. Mol. Biol. 225, 379.
235. Froehler, B.C., Wadwani, S., Terhorst, T.J. and Gerrard, S.R. (1992) Tetrahedron Lett. 33, 5307.
236. Broitman, S.L., Im, D.D. and Fresco, J.R. (1987) Proc. Natl. Acad. Sci. USA 84, 5120.
237. Howard, F.B., Miles, H.T. and Ross, P.D. (1995) Biochemistry 34, 7135.
238. Beal, P.A. and Dervan, P.B. (1991) Science 251, 1360.
239. Durland, R.H., Kessler, D.J., Gunnell, S., Duvic, M., Pettitt, B.M. and Hogan, M.E. (1991) Biochemistry 30, 9246.
240. Chen, P.M. (1991) Biochemistry 30, 4472.
241. Pilch, D.S., Levenson, C. and Shafer, R.H. (1991) Biochemistry 30, 6081.
242. Radhakrishnan, I., de los Santos, C. and Patel, D.J. (1991) J. Mol. Biol. 221, 1403.
243. Radhakrishnan, I., de los Santos, C. and Patel, D.J. (1993) J. Mol. Biol. 234, 188.
244. Radhakrishnan, I. and Patel, D.J. (1993) Structure 1, 135.
245. Ji, J., Hogan, M.E. and Gao, X. (1996) Structure 4, 425.
246. Vlieghe, D., Van Meervelt, L., Dautant, A., Gallois, B., Precigoux, G. and Kennard, O. (1996) Science 273, 1702.
247. Shin, C. and Koo, H.S. (1996) Biochemistry 35, 968.
248. Ouali, M., Letellier, R., Sun, J.S., Akhebat, A., Adnet, F., Liquier, J. and Taillandier, E. (1993) J. Am. Chem. Soc. 115, 4264.
249. Durland, R.H., Rao, T.S., Revankar, G.R., Tinsley, J.H., Myrick, M.A., Seth, D.M., Rayford, J., Singh, P. and Jayaraman, K. (1994) Nucl. Acids Res. 22, 3233.
250. Nielsen, P.E., Egholm, M. and Buchardt, O. (1994) J. Mol. Recogn. 7, 165.
NOTE added in proof: This review covers the published literature and work from the Feigon laboratory through May, 1997. References to unpublished work from that time have been updated.
13 Structures of guanine-rich and cytosine-rich quadruplexes formed in vitro by telomeric, centromeric, and triplet repeat disease DNA sequences
Dinshaw J. Patel, Serge Bouaziz, Abdelali Kettani, and Yong Wang
Cellular Biochemistry and Biophysics Program, Memorial Sloan-Kettering Cancer Center, New York, NY 10021, USA
1. Introduction
DNA sequences can adopt higher order architectures beyond duplex alignments, and research in this area is increasingly addressing the structural and energetics issues related to DNA triplexes (reviewed in refs 1–3 and Chapter 12), quadruplexes (4,5), and junctions (6,7, and Chapter 15). The range of strand directionalities and pairing alignments within these multistranded structures provides novel DNA architectures associated with molecular recognition and function.
The structure of DNA quadruplexes formed by guanine-rich DNA segments is of great interest currently, since it affects processes ranging from the architecture of telomeric and centromeric sites, to the potential pairing alignments during genetic recombination events. The initial efforts in this area have focused on monovalent cation-coordinated G quadruplexes formed by the stacking of planar G:G:G:G tetrads (8–12; reviewed in 13). This chapter focuses on recent structural insights into G quadruplex architecture that have emerged from crystallographic and solution NMR studies. The observed structural polymorphism is related to the relative strand directionality and to the distribution of syn/anti guanines along individual strands and around G tetrads in G quadruplexes (for earlier structural reviews see refs 4 and 14). This chapter also summarizes recent structures of quadruplexes containing G:C:G:C tetrads adopted by triplet repeat disease and related sequences. It also discusses recent structural efforts that have defined the role of monovalent cations sandwiched between G tetrads in stabilizing the G quadruplex fold and, in addition, identified the molecular basis associated with monovalent cation-dependent folding of loop domains of quadruplexes.
Cytosine-rich sequences have been shown to form quadruplexes at acidic pH, designated i-motifs, through antiparallel alignment of a pair of mutually intercalated parallel-stranded C:CH+ mismatch-containing duplexes (15). This chapter outlines the range of i-motif structures adopted by telomeric and centromeric sequences and the role of flanking sequences in directing the overall folding topology of the i-motif quadruplex.
2. Telomeric sequence G quadruplexes
Telomeres are nucleic acid:protein complexes found at the ends of linear chromosomes. They are involved in chromosomal 3'-end replication without truncation, in chromosomal organization and in protection of chromosomal termini against degradation, and in the anchoring of chromosomes to the nuclear envelope (reviewed in ref. 16). They contain tandem repeats of guanines and cytosines on partner strands together with guanine-rich segment overhangs at the 3'-ends, in species as divergent as ciliates, yeast, and humans. The critical functional role of telomere sequence follows directly from the observation that mutated telomeric sequences induce telomere length instability and subsequent death of the organism (17). The folding topologies of such G-rich tandem repeats are of considerable interest since they have the potential to form G quadruplexes in vitro.
The fundamental unit of the G quadruplex is the G tetrad (18–20), which involves a cyclized, hydrogen-bonded, square planar alignment of four guanines, as shown in Fig. 13.1. Adjacent guanines around the G tetrad are paired through their Watson–Crick and Hoogsteen edges, resulting in four electronegative carbonyl groups being directed towards the interior of the tetrad. G quadruplex formation has an absolute requirement for monovalent K+ and Na+ cations (21–24), with the monovalent cation-binding sites presumably positioned in the interior of the quadruplex between stacked G tetrads (11). G quadruplex architecture is to some extent dependent on the nature of the monovalent cation (25–27), with K+ cations generating the most stable G quadruplexes (reviewed in 28).
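The cyclic pairing within a G tetrad can be made concrete with a short sketch. The text states only that the Watson–Crick edge of one guanine pairs with the Hoogsteen edge of its neighbour; the specific atom pairs used here (N1-H to O6 and N2-H to N7) are the standard textbook assignment and are an assumption added for illustration, as are the residue labels and the function name.

```python
def tetrad_hbonds(residues):
    """List the hydrogen bonds of a cyclic G tetrad as (donor, acceptor) pairs.

    Assumption: each guanine donates N1-H and N2-H from its Watson-Crick
    edge to O6 and N7 on the Hoogsteen edge of the next guanine in the cycle.
    """
    bonds = []
    count = len(residues)
    for index, donor in enumerate(residues):
        acceptor = residues[(index + 1) % count]  # wrap around to close the cycle
        bonds.append((f"{donor}:N1-H", f"{acceptor}:O6"))
        bonds.append((f"{donor}:N2-H", f"{acceptor}:N7"))
    return bonds


if __name__ == "__main__":
    for donor, acceptor in tetrad_hbonds(["G1", "G2", "G3", "G4"]):
        print(f"{donor} ... {acceptor}")
```

Under this assignment each tetrad carries eight hydrogen bonds, and the four O6 carbonyl acceptors are the groups that point towards the interior of the tetrad.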
Early efforts at determining the folding topologies of G quadruplexes based on chemical modification, base analogue substitution, and cross-linking experiments (9–12,29) have been supplemented by X-ray and NMR approaches that provide atomic resolution views of the folding architecture in the crystalline and solution states, respectively. These G quadruplex structures are presented below and categorized according to the relative strand directionalities and syn/anti distribution around individual G tetrads.
Fig. 13.1. A schematic drawing of the G:G:G:G tetrad pairing alignment.
2.1 G quadruplexes containing anti:anti:anti:anti G tetrad alignments and parallel strand directionalities
Some of the earliest efforts at structure determination of G quadruplexes stabilized by G tetrads focused on sequences containing single dGn repeats. Such dGn sequences with non-guanine flanking bases have provided the necessary structural information on the architecture of parallel-stranded G quadruplexes in solution and crystalline states.
2.1.1 Solution structure of d(NG4N) quadruplexes
NMR-based studies of single guanine-rich repeat d(T2AG3T), d(T2G4T) (30), d(TG4T) (31), and d(TG3T) (32) sequences lacking 3'-terminal guanines provided the initial evidence for formation of parallel-stranded G quadruplexes containing only anti glycosidic torsion angles in K+-containing solution. These studies also established that the guanine imino protons of the internal G tetrads exchanged very slowly with solvent water (30). By contrast, sequences ending with 3'-terminal guanines tend to aggregate by forming higher order multistranded structures, as probed by gel mobility and methylation protection experiments (33,34) and NMR spectral parameters (30).
The solution structures of all parallel-stranded G quadruplexes have been solved through combined NMR and molecular dynamics studies of the sequences d(T2G4T) (35), d(T4G4) (36), and d(TG4T) (37). The structures are well defined within the guanine-rich segments, but underdefined at the thymine segments. A view looking into one of the four equivalent grooves of the solution structure of the G4 segment of the d(T2G4T) quadruplex is shown in Fig. 13.2a (35). The structure is right-handed with all residues adopting anti glycosidic torsion angles and S-type (C2'-endo) sugar pucker conformations.
The four G tetrads, which approach coplanarity, are stacked on each other, with the overlap of the central tetrads shown in Fig. 13.2b.
Fig. 13.2. (a) A view of the NMR-based solution structure of the four-stranded d(T2G4T) quadruplex (35). Two of the strands, directed towards the viewer, are shown with filled bonds and the other two, directed away from the viewer, are shown with open bonds. (b) Stacking between adjacent internal G:G:G:G tetrads in the solution structure of the d(T2G4T) quadruplex (35).
2.1.2 Energetics of the d(TG3T) quadruplex in solution
The energetics for the order–disorder transition of d(TG3T) quadruplexes in monovalent cation solution have been measured using optical (38) and calorimetric (32) experiments. The calorimetric studies on the d(TG3T) quadruplex in K+ solution yield values of ΔG° = -9.6 kJ/mol of tetrad, ΔH° = -87.8 kJ/mol of tetrad, and ΔS° = -259 J/K mol of tetrad at 25°C (32). These data establish that the stability of G quadruplexes reflects a favourable enthalpic contribution to formation.
2.1.3 Crystal structure of the d(TG4T) quadruplex
The crystal structure of the d(TG4T) sequence in the presence of Na+ cation was solved initially at 1.2 Å resolution (39) and refined further to 0.95 Å (40). There are four parallel-stranded G quadruplexes in the asymmetric unit of this crystallographic structure, with pairs of G quadruplexes stacked end-to-end in a head-to-head (5' to 5') orientation through their terminal G tetrads. The crystal structure of the G4 segment of the d(TG4T) G quadruplex is shown in Fig. 13.3a together with the overlap geometry between stacked central G tetrads, in Fig. 13.3b. The terminal thymines are less well defined and not involved in the stacking with the G tetrads of the G quadruplex. The Na+ cations are well defined in this 0.95 Å high resolution crystal structure, and their positioning ranges from coordination sites associated with inwardly directed guanine O6 atoms located between G tetrad planes, to sites located within G tetrad planes (40). Bound water molecules can also be identified at this high resolution and are clustered around the backbone phosphates in the helical grooves. The basic architecture of the parallel-stranded G quadruplex segments is the same in the crystal (Fig. 13.3a) (40) and in solution (Fig. 13.2a) (35), as are the base pair overlaps between adjacent G tetrads in the crystal (Fig. 13.3b) and in solution (Fig. 13.2b).
Fig. 13.3. (a) A view of the X-ray crystal structure of the four-stranded d(TG4T) quadruplex (39,40). Two of the strands, directed towards the viewer, are shown with filled bonds and the other two, directed away from the viewer, are shown with open bonds. (b) Stacking between adjacent internal G:G:G:G tetrads in the crystal structure of the d(TG4T) quadruplex (39,40).
2.1.4 Solution structure of the r(UG4U) quadruplex
Guanine-rich sequences are also detected in RNA, suggesting the potential for RNA G quadruplex formation. Indeed, guanine-rich sequences have been identified in E. coli 5S RNA, where they are known to aggregate into a tetrameric form in the presence of K+ cation (41). An NMR and molecular dynamics-based characterization of the r(UG4U) sequence in K+ solution established formation of a right-handed G quadruplex (Fig. 13.4a) containing all anti glycosidic torsion angles and stabilized by four stacked G tetrads (42). The majority of the sugar puckers adopted N-type (C3'-endo) or partially N-type sugar pucker conformations. This structural study also identified formation of a U tetrad (shown schematically in Fig. 13.4b) which stacks on the adjacent G tetrad (42).
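Returning to the calorimetric values quoted in Section 2.1.2 for the d(TG3T) quadruplex, a brief consistency check against ΔG° = ΔH° - TΔS° is a useful way to read them. The sketch below assumes the entropy term is negative (-259 J/K mol of tetrad, an unfavourable entropy of association), since only a negative ΔS° can reconcile the quoted ΔG° and ΔH°; the function names are illustrative.

```python
def gibbs_kj(dh_kj, ds_j_per_k, temp_k):
    """Free energy in kJ/mol from enthalpy (kJ/mol) and entropy (J/K mol)."""
    return dh_kj - temp_k * ds_j_per_k / 1000.0


def implied_entropy_j(dg_kj, dh_kj, temp_k):
    """Entropy in J/K mol implied by a given free energy and enthalpy."""
    return (dh_kj - dg_kj) / temp_k * 1000.0


TEMP_K = 298.15  # 25 degrees C

if __name__ == "__main__":
    # Values per mole of tetrad, as quoted in Section 2.1.2 (ref. 32).
    dg_from_parts = gibbs_kj(-87.8, -259.0, TEMP_K)
    ds_from_dg = implied_entropy_j(-9.6, -87.8, TEMP_K)
    print(f"dG from dH and dS: {dg_from_parts:.1f} kJ/mol")
    print(f"dS implied by dG and dH: {ds_from_dg:.0f} J/K mol")
```

The recomputed free energy comes out near -10.6 kJ/mol against the quoted -9.6 kJ/mol; a difference of this size is what one would expect from rounding of the reported values. The comparison also makes the main point explicit: a large favourable enthalpy is offset by an unfavourable entropy of almost the same magnitude at 25°C.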
2.1.5 Self-assembly of guanine-rich telomeric sequences into larger superstructures
The Tetrahymena telomere d(G4T2G4) sequence has been shown by gel electrophoresis to assemble spontaneously into larger superstructures in monovalent cation solution (43). These superstructures, called G wires, have been imaged by scanning probe microscopy (44) and exhibit characteristics of long, linear polymers of G tetrad-stabilized, parallel-stranded DNA (43).
2.2 G quadruplexes containing syn:anti:syn:anti G tetrad alignments and antiparallel directionalities of adjacent strands
The structure of the d(G4T4G4) sequence, which contains two tandem guanine-rich segments within the sequence context of the Oxytricha telomeric d(T4G4)n repeat, has been solved in both crystalline (45) and solution (46,47) states. The folding architecture of the G quadruplex formed through dimerization of a pair of d(G4T4G4) segments is distinct between the X-ray (45) and NMR (46,47) structures, as defined by the relative alignment of adjacent strands, the syn/anti distribution of guanine glycosidic bonds around individual G tetrads, and the loop connectivities (lateral versus diagonal). The results of the X-ray structure of the d(G4T4G4) G quadruplex are reported in this section.
Fig. 13.4. (a) A view of the NMR-based solution structure of the four-stranded r(UG4U) quadruplex (42). Two of the strands, directed towards the viewer, are shown with filled bonds and the other two, directed away from the viewer, are shown with open bonds. (b) Alignment around the U:U:U:U tetrad involving the U1 residue in the solution structure of the r(UG4U) quadruplex (42).
2.2.1 Crystal structure of the Oxytricha telomere d(G4T4G4) quadruplex
The X-ray structure of crystals of d(G4T4G4) grown from K+ solution and solved at 2.5 Å resolution establishes formation of a pair of hairpins oriented in a head-to-tail alignment, with G4 segments connected by lateral loops, as shown schematically in Fig. 13.5a (45). Adjacent strands are aligned antiparallel to each other with alternating syn–anti–syn–anti alignments of guanines along individual G4 segments and syn:anti:syn:anti alignments of guanines around individual G tetrads. A view of the structure of this G quadruplex is shown in Fig. 13.6. A twofold axis of symmetry relates the two halves of the G quadruplex, resulting in two symmetric wide grooves and two symmetric narrow grooves. A K+ cation-binding site was associated with electron density between the two central G tetrads of the G quadruplex (45). The stacking patterns between adjacent G tetrads at G(syn)–G(anti) and G(anti)–G(syn) steps are shown in Fig. 13.7a,b, respectively.
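The link between strand directionality and the syn/anti pattern around a tetrad, which organizes Sections 2.1 to 2.3, can be sketched in a few lines. The rule used here is a simplifying assumption made for illustration rather than a statement taken from the text: neighbouring guanines in a tetrad share a glycosidic conformation when their strands are parallel and differ when their strands are antiparallel. The direction labels, function names, and the choice of anti for the first strand are likewise illustrative.

```python
FLIP = {"anti": "syn", "syn": "anti"}

# The three tetrad alignments used as section headings in this chapter.
CATEGORIES = {
    ("anti", "anti", "anti", "anti"): "anti:anti:anti:anti",
    ("syn", "anti", "syn", "anti"): "syn:anti:syn:anti",
    ("syn", "syn", "anti", "anti"): "syn:syn:anti:anti",
}


def tetrad_pattern(directions, first="anti"):
    """Propagate syn/anti around a tetrad from strand directions ('up'/'down').

    Assumed rule: parallel neighbours keep the conformation,
    antiparallel neighbours switch it.
    """
    pattern = [first]
    for previous, current in zip(directions, directions[1:]):
        same = previous == current
        pattern.append(pattern[-1] if same else FLIP[pattern[-1]])
    return pattern


def classify(pattern):
    """Match a cyclic syn/anti pattern to one of the three categories."""
    pattern = list(pattern)
    for shift in range(len(pattern)):
        rotated = tuple(pattern[shift:] + pattern[:shift])
        if rotated in CATEGORIES:
            return CATEGORIES[rotated]
    return "unclassified"


if __name__ == "__main__":
    examples = {
        "all strands parallel": ["up", "up", "up", "up"],
        "all neighbours antiparallel": ["up", "down", "up", "down"],
        "one parallel, one antiparallel neighbour": ["up", "up", "down", "down"],
    }
    for label, directions in examples.items():
        print(label, "->", classify(tetrad_pattern(directions)))
```

Under this assumed rule, four parallel strands give the all-anti tetrad of Section 2.1, strictly alternating strands give the syn:anti:syn:anti tetrad described above, and strands with one parallel and one antiparallel neighbour give the syn:syn:anti:anti tetrad of Section 2.3.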
Fig. 13.5. (a) Schematic of the X-ray crystal structure-based folding topology adopted by the quadruplex formed through head-to-tail dimerization of the two-repeat Oxytricha telomere d(G4T4G4) sequence in K+ solution (45). The T4 loops are of the lateral type. The syn guanines are shown as hatched rectangles while anti guanines are shown as open rectangles. (b) Schematic of the NMR solution structure-based folding topology adopted by the intramolecular quadruplex formed by the d(G2T2G2TGTG2T2G2) sequence in K+ solution (49,50). All three loops are of the lateral type.
Fig. 13.6. A view of the 2.5 Å X-ray crystal structure of the two-repeat Oxytricha telomere d(G4T4G4) quadruplex formed through head-to-tail dimerization of a pair of hairpins in K+ solution (45). One d(G4T4G4) hairpin is shown with filled bonds while the other is shown with open bonds. The T4 loops are of the lateral loop type.
2.2.2 Solution structure of the thrombin-binding d(G2T2G2TGTG2T2G2) DNA aptamer quadruplex
A DNA aptamer with the consensus d(G2T2G2TGTG2T2G2) sequence was identified through in vitro selection based on its ability to bind α-thrombin (48). This thrombin-binding aptamer contains four G2 steps with the potential to form an intramolecular G quadruplex in monovalent cation solution. Indeed, two groups independently established that the NMR parameters of the d(G2T2G2TGTG2T2G2) sequence in K+ solution (49,50) were consistent with formation of a G quadruplex with antiparallel alignment of adjacent strands, alternating syn–anti alignments along individual G2 steps and syn:anti:syn:anti alignments of guanines around individual G tetrads, as shown schematically in Fig. 13.5b. The T2, TGT, and T2 loops were all of the lateral type with a T:T wobble mismatch formed between the second thymines in the two T2 loops (51). The thrombin-binding G quadruplex in K+ solution is sufficiently stable, despite containing only two stacked G tetrads, to permit the single inosine for guanine substitutions necessary for distinguishing between alternative folding topologies (49). The NMR data have been quantitatively analysed to provide the solution structure of the d(G2T2G2TGTG2T2G2) G quadruplex (51,52). The solution structure of this G quadruplex is shown in Fig. 13.8 (51). The overlaps between adjacent G tetrads are shown in Fig. 13.9a and between the T:T mismatch and the G tetrad in Fig. 13.9b (51).
A combination of NMR and electron spin resonance (ESR) methods has been used to identify paramagnetic manganese divalent cation-binding sites on the thrombin-binding d(G2T2G2TGTG2T2G2) G quadruplex (53). These divalent Mn cation-binding sites are located one per minor groove of the G quadruplex.
Fig. 13.7. Views down the helix axis showing stacking between adjacent G:G:G:G tetrads at (a) G(syn)–G(anti) and (b) G(anti)–G(syn) steps in the crystal structure of the Oxytricha telomere d(G4T4G4) quadruplex (45). Individual G tetrads are drawn with either filled or open bonds.
Fig. 13.8. A view of the NMR-based solution structure of the intramolecularly folded d(G2T2G2TGTG2T2G2) quadruplex in Na+ solution (51). Two of the guanine-containing G2 steps, directed towards the viewer, are shown with filled bonds and the other two guanine-containing G2 steps, directed away from the viewer, are shown with open bonds. The three loop segments (T3–T4, T7–G8–T9, and T12–T13) are shown with hatched bonds and the bases in these segments have been deleted in the interests of clarity. All three loops are of the lateral type.
Fig. 13.9. Views down the helix axis showing (a) stacking between adjacent G:G:G:G tetrads at the G(syn):G(anti) step and (b) stacking between the G:G:G:G tetrad and a T:T mismatch in the solution structure of the intramolecularly folded d(G2T2G2TGTG2T2G2) quadruplex in Na+ solution (51).
2.2.3 Crystal structure of the thrombin-binding d(G2T2G2TGTG2T2G2) DNA aptamer quadruplex bound to thrombin
The crystal structure of d(G2T2G2TGTG2T2G2) with Na+ as counterion and bound to thrombin has been solved at 2.9 Å resolution (54). The bound DNA in the crystalline complex forms a G quadruplex (54) with an architecture where the strand runs in an opposite direction to that shown schematically in Fig. 13.5b. The intermolecular interface in the complex involves the heparin-binding site and fibrinogen-binding exosite on two different thrombins, and the loop segments in the G quadruplex (54). Interestingly, even though both X-ray (54) and NMR (51,52) methods have identified the same structure for the G quadruplex core containing two stacked G tetrads, they disagree with respect to the orientation of the connecting loops, as pointed out recently (55). This may reflect ambiguities in the X-ray structure of the complex in identifying the less well-defined electron densities in the loop-connecting segments.
2.2.4 Solution structures of insulin-linked polymorphic d(G4TGTG4) and d(G4TGTG4ACAG4TGTG4) quadruplexes
The human insulin gene contains a guanine-rich region that contains tandem repeats of the d(ACAG4TGTG4) sequence (56). The solution structures of both d(G4TGTG4) and d(G4TGTG4ACAG4TGTG4) sequences have been characterized by NMR in Na+-containing aqueous solution (57). The authors concluded that d(G4TGTG4) forms a G quadruplex through head-to-tail dimerization of hairpins containing TGT lateral loops, while d(G4TGTG4ACAG4TGTG4) forms an intramolecular G quadruplex containing TGT, ACA, and TGT lateral loops (57). There is reason to reserve judgement on these conclusions since the authors did not undertake inosine for guanine substitutions to identify individual guanine residues involved in quadruplex formation definitively (see refs 49,58,59), an approach that has proved invaluable in distinguishing between G(syn):G(anti):G(syn):G(anti) and G(syn):G(syn):G(anti):G(anti) tetrad alignments. The concern outlined above related to the proposed solution structures of the G quadruplexes formed by the insulin-linked polymorphic region sequences (57) could be resolved following completion of inosine for guanine substitution experiments along the lines reported earlier for the structure of the Oxytricha telomeric d(G4T4G4T4G4T4G4) sequence in Na+-containing solution, where a proposed model of the folding topology (60) had to be corrected following inosine substitution experiments (61,62).
2.2.5 Dimeric RNA G quadruplex models
It has been suggested that guanine-rich regions may be involved in dimerization of retroviral RNAs through G quadruplex formation (63).
Presumably, the proposed quadruplex involves formation of an intramolecular hairpin within G:A-rich segments, which can then dimerize through intermolecular association. This quadruplex model has been challenged subsequently, since an alternative dimerization site has been identified in HIV-1 (64,65) which does not involve G quadruplex formation.
2.2.6 Intramolecular RNA G quadruplex models
An intramolecular G quadruplex fold has also been postulated for a guanine-rich segment adjacent to an endonucleolytic cleavage site in insulin-like growth factor II mRNA (66). Chemical and enzymatic probing experiments have been interpreted in terms of the formation of a unimolecular G quadruplex conformation in Na+ and K+, but not in Li+, cation-containing solution.
2.3 G quadruplexes containing syn:syn:anti:anti G tetrad alignments and both parallel and antiparallel directionalities of adjacent strands
The relative alignment of strands around dimeric G quadruplexes is defined by the type of connecting loop linking the Gn segments. A key discovery was the identification of diagonal loops: initially in the Oxytricha telomere d(G4T4G4) dimeric G quadruplex (46), and subsequently in the human telomere unimolecular G quadruplex d[AG3(T2AG3)3] (59) and the Oxytricha telomere unimolecular G quadruplex d[G4(T4G4)3] (46,61,62).
2.3.1 Diagonal loops in G quadruplexes
An NMR study of the Oxytricha telomere d(G4T4G4) sequence in Na+-containing solution identified formation of a G quadruplex with a folding topology (46) distinctly different from the corresponding topology for the same sequence observed in the crystalline state (45). This folding topology was verified from additional NMR measurements, including the inosine for guanine substitutions necessary for unambiguous spectral assignments (67).
The folding topology of the d(G4T4G4) quadruplex in Na+ solution involves head-to-tail alignment of a pair of d(G4T4G4) segments containing diagonal connecting T4 loops, as shown schematically in Fig. 13.10a (46). The formation of diagonal connecting loops affects both the directionality of adjacent strands around the G quadruplex and the syn/anti distribution of guanines around individual G tetrads. Specifically, individual strands have both a parallel and an antiparallel neighbour around the G quadruplex, syn–anti–syn–anti orientations are observed for guanines along individual G4 segments and syn:syn:anti:anti alignments are observed for guanines around individual G tetrads (46). The hydrogen bond directionalities alternate between clockwise and anticlockwise orientations between adjacent stacked G tetrads in the quadruplex. The diagonal loop G quadruplex contains a twofold symmetry axis with one wide, one narrow, and two medium grooves.
This diagonal loop-containing d(G4T4G4) G quadruplex architecture is quite stable since the imino protons from the internal G tetrads exhibit very slow exchange rates on transfer from H2O to D2O solution (46).
2.3.2 Solution structure of the human telomere d[AG3(T2AG3)3] quadruplex
The sequence of the human telomere repeat d(T2AG3)n contains one less guanine than the corresponding d(T4G4)n Oxytricha and d(T2G4)n Tetrahymena telomeric repeats. The odd number of guanines in the human telomere repeat raises interesting questions about its folding topology and these have been addressed in a solution structure determination of the four AG3 repeat human telomere d[AG3(T2AG3)3] quadruplex in Na+-containing solution (59). These structural efforts have been complemented by chemical footprinting and base substitution studies on d(TnAG3)4 sequences, where n = 2 and 4, which fold into intramolecularly folded G quadruplexes (68).
This structural characterization, which reported the first high resolution solution structure of a diagonal loop-containing G quadruplex, was undertaken on the d[AG3(T2AG3)3] sequence, since the d(T2AG3)4 sequence gave poor quality NMR spectra, presumably owing to conformational heterogeneity. The resonances in the d[AG3(T2AG3)3] 22-mer sequence were assigned after an in-depth analysis of NOE connectivities and on the basis of dU for T and partially successful inosine for guanine substitutions (59). The solution structure was solved by a combined NMR and molecular dynamics study including intensity-based refinement.
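The compact sequence notation used throughout this chapter, such as d[AG3(T2AG3)3], is easy to misread when counting residues. The sketch below is one way to expand it into a full sequence and number the residues; the function names and the parsing rules (a digit repeats the preceding base or bracketed group, and a leading d or r before a bracket is ignored) are assumptions made for illustration.

```python
import re

GROUP = re.compile(r"\(([^()]*)\)(\d*)")   # innermost bracketed group plus count
BASE = re.compile(r"([A-Za-z])(\d+)")      # single base followed by a count


def expand(notation):
    """Expand compact notation, e.g. 'd[AG3(T2AG3)3]', into a plain sequence."""
    text = notation.replace("[", "(").replace("]", ")")
    if text[:2] in ("d(", "r("):
        text = text[1:]  # drop the deoxy/ribo prefix
    while "(" in text:
        text, replaced = GROUP.subn(
            lambda m: expand(m.group(1)) * int(m.group(2) or 1), text
        )
        if replaced == 0:
            raise ValueError(f"unbalanced brackets in {notation!r}")
    return BASE.sub(lambda m: m.group(1) * int(m.group(2)), text)


def numbered(sequence):
    """Label residues from the 5'-end, e.g. ['A1', 'G2', ...]."""
    return [f"{base}{position}" for position, base in enumerate(sequence, start=1)]


if __name__ == "__main__":
    human = expand("d[AG3(T2AG3)3]")
    print(human, len(human), human.count("G"))
    print(numbered(human)[4:7])
```

For d[AG3(T2AG3)3] this gives a 22-residue sequence containing 12 guanines, in line with the 22-mer and the 12 guanine residues described in the text, and the first loop comes out as T5, T6, A7, matching the loop numbering used in the caption to Fig. 13.11.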
Fig. 13.10. (a) Schematic of the NMR solution structure-based folding topology adopted by the quadruplex formed through head-to-tail dimerization of the two-repeat Oxytricha telomere d(G4T4G4) sequence in Na+ solution (46). The T4 loop is of the diagonal type. The syn guanines are shown as hatched rectangles, while anti guanines are shown as open rectangles. (b) Schematic of the NMR solution structure-based folding topology adopted by the intramolecular quadruplex formed by the four-repeat human telomere d[AG3(T2AG3)3] quadruplex in Na+ solution (59). The central T2A loop is of the diagonal type. (c) Schematic of the NMR solution structure-based folding topology adopted by the intramolecular quadruplex formed by the four-repeat Oxytricha telomere d[G4(T4G4)3] quadruplex in Na+ solution (61,62). The central T4 loop is of the diagonal type.
The foldin g topology o f the d[AG3(T 2AG3)3] quadruplex i n Na + solutio n i s shown schematically i n Fig . 13.10 b an d th e solutio n structur e i s shown i n Fig . 13.1 1 (59) . The solutio n structur e contain s thre e stacke d G tetrad s involvin g al l 1 2 guanin e residues i n th e sequence . Th e firs t an d thir d TT A loop s ar e of the latera l type, whil e the critica l centra l TTA loop is of the diagona l type . Thes e loop connectivities defin e the stran d orientations such that individual strands have both a parallel and an antipar allel neighbour, as seen schematically in Fig . 13.10b . There is one wide , tw o medium , and one narro w groove i n thi s quadruplex (59). The guanin e glycosidi c torsio n angle s alternate between anti an d syn (starting with an anti alignment at G2) along the entire length of the d[AG3(T 2AG3)3] sequence , an d the alternatio n remain s in registr y despit e the intervenin g TT A loo p segments . Th e guanines adop t syn:syn:anti:anti glycosidi c torsio n angle s aroun d individua l G tetrads with th e hydroge n bondin g directionalitie s alternatin g betwee n clockwis e an d anti clockwise orientation s betwee n adjacen t stacke d G tetrads , a s seen schematicall y i n Fig. 13. 1 Ob (59). The overla p geometrie s betwee n adjacen t G tetrad s a t G(syn)-G(anti) an d G(anti)—G(syn) step s in th e solutio n structur e ar e shown i n Fig . 13.12a,b , respectivel y
404
Oxford Handbook of Nucleic Acid Structure
Fig. 13.11. A vie w o f th e NMR-base d solutio n structur e o f th e intramolecularl y folded four-repeat human telomer e d[AG3(T2AG3)3] quadruplex in Na + solutio n (59) . Three o f the guanine-containin g G3 steps are shown with filled bonds while the remainin g guanine-containing G3 steps are shown with open bonds. Th e thre e loo p segment s (T5-T6-A7 , T11-T12-A13 , an d T17-T18-A19 ) ar e show n wit h hatched bond s and th e base s i n thes e segments have been delete d in th e interest s o f clarity . Th e centra l T11-T12-A13 loop is of the diagonal type.
(59). Base overlap between stacked G tetrads primarily involves the guanine five-membered rings at G(syn)-G(anti) steps (Fig. 13.12a) and the guanine six-membered rings at G(anti)-G(syn) steps (Fig. 13.12b). Three of the four adenines in the sequence are stacked on adjacent G tetrads, while the fourth is tilted relative to the G tetrad plane. These stacking alignments involving loop adenine residues must contribute to the stabilization of the tertiary fold of the d[AG3(T2AG3)3] quadruplex. None of the thymines or adenines are involved in base pairing in the structure of the G quadruplex. The very slow exchange observed for imino protons of the internal G tetrads in the d(G4T4G4) (46) and d[G4(T4G4)3] (46,61,62) G quadruplexes is not observed for the internal G tetrad in the d[AG3(T2AG3)3] G quadruplex (59), reflecting its marginal stability.

2.3.3 Energetics of the human telomere d(T2AG3)4 quadruplex in solution

The thermodynamic parameters for human telomere G quadruplex formation have been determined from the concentration dependence of optical melting curves for d(T2AG3)4 in Na+ and K+ solution (69). The estimated values are ΔG° = -3.3 (-7.1) kJ/mol of tetrad and ΔH° = -54.3 (-66.9) kJ/mol of tetrad in Na+ (K+) solution. These model-dependent thermodynamic parameters for the d(T2AG3)4 G quadruplex
Fig. 13.12. Views down the helix axis showing stacking between adjacent G:G:G:G tetrads at (a) G(syn)-G(anti) and (b) G(anti)-G(syn) steps in the NMR-based solution structure of the intramolecularly folded four-repeat human telomere d[AG3(T2AG3)3] quadruplex (59). Individual G:G:G:G tetrads are drawn with either filled or open bonds.
(69) are a factor of two lower than their model-independent calorimetric counterparts for the d(TG3T) (32) and d(G2T5G2) (70) G quadruplexes. The origin of this discrepancy is not clear. It should be kept in mind that sequences such as d(T2AG3)4 can, potentially, adopt a distribution of intramolecular folding topologies, and a meaningful evaluation of the energetics must be accompanied by a rigorous characterization of the conformational state(s) under consideration.

2.3.4 Solution structure of the Oxytricha telomere d(G4T4G4) quadruplex

The details of the solution structure of the d(G4T4G4) quadruplex in Na+ solution have emerged from a combined NMR and molecular dynamics analysis of the spectral data (47). A view of the high resolution d(G4T4G4) G quadruplex solution structure containing diagonal loops is shown in Fig. 13.13 (47). The guanine sugar puckers are of the S-type in the d(G4T4G4) G quadruplex. The symmetry-related T4 loop conformations are well defined, with the first and third thymines stacked over the terminal G tetrad planes, the second thymine stacked over the first thymine, and the last thymine looped out and somewhat disordered. The G tetrad overlaps at G(syn)-G(anti) and
Fig. 13.13. A view of the NMR-based solution structure of the two-repeat Oxytricha telomere d(G4T4G4) quadruplex formed through head-to-tail dimerization of a pair of hairpins in Na+ solution (47). One d(G4T4G4) hairpin is shown with filled bonds while the other is shown with open bonds. The T4 loops are of the diagonal type.
G(anti)-G(syn) steps in the refined solution structure of the Oxytricha d(G4T4G4) G quadruplex (47) are similar to those reported for these steps in the refined solution structure of the human d[AG3(T2AG3)3] G quadruplex (59). The structure of the lateral loop-containing d(G4T4G4) G quadruplex in the crystalline state (45) and the diagonal loop-containing d(G4T4G4) G quadruplex in solution (46) are directly compared in Plate XIV.

More recent studies have established that the diagonal loop-containing fold of the d(G4T4G4) G quadruplex is observed both in Na+ and K+-containing solution (71). Specific proton markers associated with the diagonal loop-linked d(G4T4G4) G quadruplex underwent small shifts as average resonances on proceeding from Na+ to K+-containing solution. The monovalent cation selectivity of the diagonal loop-containing d(G4T4G4) G quadruplex was attributed to the greater energetic cost of Na+ dehydration relative to K+ dehydration (71).

2.3.5 Solution structure of the Oxytricha telomere d[G4(T4G4)3] quadruplex

The Oxytricha telomere d[G4(T4G4)3] has the potential to fold into an intramolecular G quadruplex stabilized by four stacked G tetrads and three connecting T4 loops. The d[G4(T4G4)3] sequence in Na+ solution gives a surprisingly well-resolved imino proton spectrum corresponding to one predominant conformation (46). The first attempt at determining the solution structure of the d[G4(T4G4)3] quadruplex claimed to differentiate a folding topology favouring a lateral central loop over the alternative possibility of a diagonal central loop (60). This conclusion appeared to be questionable given the tentative nature of key guanine proton assignments and the paucity of details related to the computational protocols.
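The condensed sequence notation used throughout this chapter, such as d[G4(T4G4)3] or d[AG3(T2AG3)3], can be expanded mechanically, which makes the residue numbering of guanine tracts and connecting loops easy to check. The following is a minimal sketch of that bookkeeping; the function names are hypothetical, and the assumption that every run of two or more guanines forms a tetrad-building segment is a simplification that is not guaranteed to hold for every sequence.

```python
import re


def expand_shorthand(shorthand):
    """Expand notation such as 'G4(T4G4)3' into a full 5'-to-3' base string."""
    # Repeat parenthesised groups first: '(T4G4)3' -> 'T4G4T4G4T4G4'.
    group = re.compile(r"\(([^()]*)\)(\d+)")
    while group.search(shorthand):
        shorthand = group.sub(lambda m: m.group(1) * int(m.group(2)), shorthand)
    # Then repeat single bases: 'G4' -> 'GGGG'.
    return re.sub(r"([ACGT])(\d+)", lambda m: m.group(1) * int(m.group(2)), shorthand)


def g_tracts(sequence, min_len=2):
    """Return 1-based (first, last) residue numbers of each guanine run.

    Assumption: any run of at least `min_len` guanines is treated as a tract.
    """
    pattern = "G{%d,}" % min_len
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, sequence)]


def loops_between(tracts):
    """Return 1-based (first, last) residue numbers of segments linking tracts."""
    return [
        (prev_last + 1, next_first - 1)
        for (_, prev_last), (next_first, _) in zip(tracts, tracts[1:])
    ]


if __name__ == "__main__":
    for name in ("G4(T4G4)3", "AG3(T2AG3)3"):
        seq = expand_shorthand(name)
        tracts = g_tracts(seq)
        print(name, seq, len(seq), tracts, loops_between(tracts))
```

For d[G4(T4G4)3] this gives four G4 tracts linked by three four-residue segments, consistent with the four stacked G tetrads and three T4 loops described above. The tract search is purely sequence-based and says nothing about whether a given loop is lateral or diagonal.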
These uncertainties were resolved independently by two groups who solved the solution structure of the d[G4(T4G4)3] quadruplex in Na+ solution based on an in-depth NMR and molecular dynamics computational approach (61,62). One of the groups incorporated six individual inosine for guanine substitutions (61), while the other used one inosine for guanine substitution and extensive comparison with related data on the d(G4T4G4) quadruplex (62). These studies identified key assignment errors in the earlier NMR study (60) and ruled out the proposed central lateral loop in the intramolecularly folded G quadruplex.

The folding topology of the intramolecularly folded d[G4(T4G4)3] quadruplex in Na+ solution is shown schematically in Fig. 13.10c (61,62) and its solution structure is shown in Fig. 13.14 (61). The structure is stabilized by four stacked G tetrads with a central diagonal T4 loop and two lateral T4 loops. The strand directionalities, guanine syn/anti alignments along individual strands and around G tetrads, and groove dimensions are the same in the G quadruplexes formed through dimerization of d(G4T4G4) hairpins (Fig. 13.10a) (46) and through intramolecular folding of the d[G4(T4G4)3] sequence (Fig. 13.10c) (61,62).

A comparison of the folding schematics of the four guanine repeat Oxytricha telomere d[G4(T4G4)3] (Fig. 13.10c) (61,62) and human telomere d[AG3(T2AG3)3] (Fig. 13.10b) (59) quadruplexes establishes common elements in the folding topologies. Indeed, the three lower G tetrads in the d[G4(T4G4)3] quadruplex (Fig. 13.10c) exhibit the same structural features as the three tetrads in the d[AG3(T2AG3)3] quadruplex (Fig. 13.10b). These studies emphasize the importance of this folding topology
Fig. 13.14. A view of the NMR-based solution structure of the intramolecularly folded four-repeat Oxytricha telomere d[G4(T4G4)3] quadruplex in Na+ solution (61). Two of the guanine-containing G4 steps, directed towards the viewer, are shown with filled bonds while the other two guanine-containing G4 steps, directed away from the viewer, are shown with open bonds. The three loop segments (T5-T6-T7-T8, T13-T14-T15-T16, and T21-T22-T23-T24) are shown with hatched bonds and the bases in these segments have been deleted in the interests of clarity. The central T13-T14-T15-T16 loop is of the diagonal type.
(Fig. 13.10b,c) for the solution structures of intramolecularly folded G quadruplexes, which is defined by a central diagonal loop (46).

2.3.6 Solution structure of the d(G2T4CG2) quadruplex

The NMR parameters characteristic of G quadruplex formation were initially identified from a heteronuclear NMR study of the d(G2T4CG2) sequence in Na+ solution (58). The structure of this sequence, which contains a pair of G2 repeats, has been solved recently, with G quadruplex formation through dimerization of a pair of antiparallel d(G2T4CG2) hairpins (72). The T4C loops are of the diagonal type, which in turn defines the strand directionalities and the guanine syn/anti alignments along individual G2 segments and around G tetrads. Thus, the d(G2T4CG2) quadruplex containing two G tetrads (58,72) and the d(G4T4G4) quadruplex containing four G tetrads (Fig. 13.10a) (46,47) adopt the same folding topology.
2.3.7 Energetics of the d(G2T5G2) quadruplex in solution

The corresponding energetics for the order-disorder transition of the d(G2T5G2) quadruplex in Na+ solution have been measured calorimetrically and yield values of ΔG° = -15.9 kJ/mol of tetrad and ΔH° = -117.0 kJ/mol of tetrad at 25°C (70). These calorimetric parameters once again stress the importance of enthalpic contributions to the stability of G quadruplexes formed through alignment of a pair of diagonal loop-containing segments.

2.3.8 Solution structure of the d(G3T4G3) quadruplex

The d(G3T4G3) sequence shows well-resolved NMR spectra in monovalent cation solution (73), with the spectral properties indicative of formation of an asymmetric G quadruplex through dimerization of a pair of d(G3T4G3) segments. Detailed NMR studies by two groups (74,75), including a molecular dynamics-based refinement (76), have identified the folding topology of the d(G3T4G3) quadruplex, which is shown schematically in Fig. 13.15. The solution structure of this G quadruplex contains several unusual features which are discussed below.

This G quadruplex, which contains three stacked G tetrads, forms through head-to-tail dimerization of a pair of d(G3T4G3) segments, with the directionality of the four strands defined by the diagonal alignment of the T4 loops (74-76). The dimer is asymmetric, as reflected in the 5'-syn-syn-anti-(loop)-syn-anti-anti alignments along one strand and 5'-syn-anti-anti-(loop)-syn-syn-anti alignments along the other, as shown
Fig. 13.15. Schematic of the NMR solution structure-based folding topology adopted by the quadruplex formed through head-to-tail dimerization of the two-repeat d(G3T4G3) sequence in Na+ solution (74-76). The T4 loop is of the diagonal type.
schematically in Fig. 13.15. Each strand has both a parallel and an antiparallel neighbour and this is accompanied by syn:syn:anti:anti alignments around individual G tetrads (74-76).

It is interesting that one of two possible arrangements of diagonal loop folds for segment dimerization is favoured for formation of both the d(G3T4G3) quadruplex (74-76) and the d(G4T4G4) quadruplex (46,47). This preference has been attributed to the predominance of a specific intermediate in the folding pathway to G quadruplex formation (46).

2.3.9 Energetics of the d(G3T4G3) quadruplex in solution

The thermodynamic parameters for bimolecular G quadruplex formation have been determined from the concentration dependence of optical melting curves for d(G3T4G3) in Na+ and K+ solutions (73). The estimated values are ΔG° = -10.9 (-16.7) kJ/mol of tetrad, ΔH° = -96 (-133) kJ/mol of tetrad and ΔS° = -288 (-393) J/K mol of tetrad in Na+ (K+) solution. These model-dependent thermodynamic parameters for the d(G3T4G3) G quadruplex (73) compare favourably with their model-independent calorimetric counterparts for the d(TG3T) (32) and d(G2T5G2) (70) G quadruplexes.
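The three quantities quoted for d(G3T4G3) are linked by the standard relation ΔG° = ΔH° - TΔS°, so their mutual consistency can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the text does not state the reference temperature for these values, and 298.15 K is used purely as an assumed starting point. Because the quoted numbers are rounded, small differences from the tabulated ΔG° are expected.

```python
def delta_g(delta_h_kj, delta_s_j_per_k, temp_k):
    """Free energy (kJ/mol) from enthalpy (kJ/mol) and entropy (J/K mol)."""
    return delta_h_kj - temp_k * delta_s_j_per_k / 1000.0


def implied_temperature(delta_g_kj, delta_h_kj, delta_s_j_per_k):
    """Temperature (K) at which the three quoted values are mutually consistent."""
    return (delta_h_kj - delta_g_kj) * 1000.0 / delta_s_j_per_k


# Values per tetrad for d(G3T4G3) as quoted in the text (73).
reported = {
    "Na+": {"dG": -10.9, "dH": -96.0, "dS": -288.0},
    "K+": {"dG": -16.7, "dH": -133.0, "dS": -393.0},
}

if __name__ == "__main__":
    assumed_t = 298.15  # assumption: 25 C; not stated for these values in the text
    for cation, v in reported.items():
        dg_calc = delta_g(v["dH"], v["dS"], assumed_t)
        t_implied = implied_temperature(v["dG"], v["dH"], v["dS"])
        print(f"{cation}: dG at {assumed_t} K = {dg_calc:.1f} kJ/mol; "
              f"quoted dG implies T = {t_implied:.1f} K")
```

Under these assumptions the quoted values in both cations are self-consistent at a temperature slightly below 25°C (roughly 295 to 296 K), which is within what rounding of ΔH° and ΔS° could produce. The same functions can be applied to the calorimetric d(G2T5G2) values to estimate the entropy term that the text does not list.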
2.4 A G quadruplex containing a double chain reversal loop, syn:syn:syn:anti and anti:anti:anti:syn G tetrads, and unequal strand directionalities

The G quadruplex structures presented above contained either lateral or diagonal central loops which defined the strand directionalities and the G(syn)/G(anti) distribution along given strands and around individual G tetrads. These quadruplexes contained even numbers of G(syn)/G(anti) residues around a given G tetrad and equal numbers of strands pointing in opposite directions. An exception to these rules has emerged following structure determination of the Tetrahymena telomere d(T2G4)n G quadruplex.

2.4.1 Solution structure of the Tetrahymena telomere d(T2G4)4 quadruplex

The Tetrahymena telomere d(T2G4)n sequence differs from its Oxytricha telomere d(T4G4)n counterpart in having two fewer thymines per repeat that can potentially influence the loop topology involved in chain reversal. It is also conceivable that some of the guanines could participate in chain reversal, making it unclear as to the number of G tetrads stabilizing Tetrahymena telomere d(T2G4)n quadruplexes.

Initial efforts to address this issue focused on the four repeat Tetrahymena d(T2G4)4 sequence in Na+-containing solution, which was studied by non-denaturing gel electrophoresis, chemical footprinting, UV cross-linking, and NMR experiments (29). The data were interpreted in terms of an intramolecularly folded G quadruplex stabilized by three G tetrads, three T2G lateral loops, and between 4 and 6 syn-guanines in the folded structure (29).

The same sequence has been investigated further based on additional NMR characterization, combined with intensity-restrained molecular dynamics computations (77).
The study focused on the predominant conformation exhibiting narrow NMR resonances in the presence of a broad spectral envelope indicative of aggregated species.
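The two rules stated at the start of Section 2.4 (even numbers of syn and anti guanines around a tetrad, and equal numbers of strands running in each direction) can be expressed as simple counting checks. The sketch below is only an illustration of that bookkeeping: the function names are hypothetical, and representing strand directions as +1/-1 values in an arbitrary order is an assumption made for this example, not a convention from the chapter.

```python
from collections import Counter


def glycosidic_counts(pattern):
    """Return (n_syn, n_anti) for a tetrad pattern such as 'syn:syn:anti:anti'."""
    counts = Counter(pattern.split(":"))
    unknown = set(counts) - {"syn", "anti"}
    if unknown:
        raise ValueError("unexpected labels: %s" % sorted(unknown))
    return counts["syn"], counts["anti"]


def has_even_split(pattern):
    """True when both the syn and the anti counts around the tetrad are even."""
    n_syn, n_anti = glycosidic_counts(pattern)
    return n_syn % 2 == 0 and n_anti % 2 == 0


def strand_imbalance(directions):
    """Absolute excess of strands in one direction; 0 means equal numbers.

    `directions` holds +1 or -1 per strand (hypothetical encoding).
    """
    return abs(sum(directions))


if __name__ == "__main__":
    for tetrad in ("syn:syn:anti:anti", "syn:anti:syn:anti",
                   "syn:syn:syn:anti", "anti:anti:anti:syn"):
        print(tetrad, glycosidic_counts(tetrad), has_even_split(tetrad))
    print(strand_imbalance([+1, -1, +1, -1]))  # two up, two down
    print(strand_imbalance([+1, +1, +1, -1]))  # three strands one way, one opposed
```

The first two patterns are those of the lateral and diagonal loop-containing quadruplexes discussed earlier and satisfy the even-count rule; the syn:syn:syn:anti and anti:anti:anti:syn patterns named in the heading of Section 2.4 do not.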
Fig. 13.16. Schematic of the NMR solution structure-based folding topology adopted by the intramolecular four-repeat Tetrahymena telomere d(T2G4)4 quadruplex in Na+ solution (77). The T19-T20 loop segment forms a double chain reversal. The syn guanines are shown as hatched rectangles while anti guanines are shown as open rectangles.
The folding topology of the Tetrahymena telomere d(T2G4)4 quadruplex is shown schematically in Fig. 13.16 and its solution structure is shown in Fig. 13.17. The structure is unprecedented in terms of the syn/anti distribution along individual guanine stretches and around individual G tetrads, the directions of the four strands around the G quadruplex, and the presence of a loop involved in a double chain reversal (77).

The solution structure contains three G tetrads connected by three loop segments. The first, four-base GT2G lateral loop is followed by a second, three-base T2G lateral loop, and then by a third, two-base T2 loop involved in double chain reversal, as shown schematically in Fig. 13.16. Since the double chain reversal T2 loop connects two strands that are aligned in parallel, the overall G quadruplex contains three of the four strands aligned in one direction and the remaining strand aligned in the opposite direction (77). This results in four unique grooves around the G quadruplex, one of which is spanned by the T2 loop. The two lateral loops are stabilized through formation of a wobble G:T base pair which stacks over the adjacent G tetrad in the structure of the G quadruplex.

Furthermore, the guanines adopt either syn-anti-anti or syn-syn-anti patterns along individual strands and syn:syn:syn:anti and anti:anti:anti:syn patterns around individual G tetrads within the G quadruplex, as shown schematically in Fig. 13.16 (77). There are two unique G-G steps in the solution structure of the Tetrahymena telomere d(T2G4)4 G quadruplex with distinct stacking patterns. The G(syn)-G(anti) steps have an overlap pattern (Fig. 13.18a) that is similar to what has been observed for related steps in other
Fig. 13.17. (a) A view of the NMR-based solution structure of the intramolecularly folded four-repeat Tetrahymena telomere d(T2G4)4 quadruplex in Na+ solution (77). Two of the guanine-containing G3 steps, directed towards the viewer, are shown with filled bonds while the other two guanine-containing G3 steps, directed away from the viewer, are shown with open bonds. The three loop segments (G6-T7-T8-G9, T13-T14-G15, and T19-T20) are shown with hatched bonds and the bases in these segments have been deleted in the interests of clarity. The T19-T20 loop is of the double chain reversal type. (b) A close-up of the double chain reversal loop involving T19-T20, which connects G16-G17-G18 and G21-G22-G23 segments that are aligned in parallel in the solution structure of the intramolecularly folded four-repeat Tetrahymena telomere d(T2G4)4 quadruplex (77).
Fig. 13.18. Views down the helix axis showing stacking between adjacent G:G:G:G tetrads at (a) G(syn)-G(anti) and (b) G(syn)-G(syn) steps in the NMR-based solution structure of the intramolecularly folded four-repeat Tetrahymena telomere d(T2G4)4 quadruplex (77). Individual G:G:G:G tetrads are drawn with either filled or open bonds.
G quadruplexes, while the G(anti)-G(anti) or G(syn)-G(syn) steps exhibit a stacking pattern (Fig. 13.18b) similar to that previously observed in all-parallel-stranded G quadruplexes (35). The two adjacent anti:anti:anti:syn G tetrads have the same clockwise hydrogen bond directionalities, in contrast to the anticlockwise hydrogen bond directionality of the syn:syn:syn:anti G tetrad, as shown schematically in Fig. 13.16 (77).

The solution structure of the d(T2G4)4 in Na+ solution (77) is in good agreement with the footprinting and cross-linking experiments reported previously (12,29).

The Tetrahymena telomere d(T2G4)n sequence differs from its human telomere d(T2AG3)n counterpart in that a single G in the former sequence is replaced by an A in the latter sequence. This small difference results in distinctly different folding topologies for the Tetrahymena (Fig. 13.16) and human (Fig. 13.10b) G quadruplexes, with differences in strand directionalities, guanine syn/anti distributions along strands and around G tetrads, and in the number of bases and orientations of the connecting loop segments. The solution structures of the human (59) and Tetrahymena (77) telomere G quadruplexes are compared directly in Plate XV.

2.5 Telomeric sequence G quadruplexes containing G tetrads and base triads

The terminal G tetrads of a G quadruplex can potentially serve as templates for the stepwise annealing of novel stacked, multistranded pairing alignments. Such alignments could be unusual base mismatches, base triples and tetrads, and, as is shown in
Fig. 13.19. Schematic of the NMR solution structure-based folding topology adopted by the four-stranded single repeat Bombyx mori telomere analogue d(TAG2) quadruplex in Na+ solution (78). This folding topology contains stacked A:(A:T) triads and G:G:G:G tetrads.
an example below, base triads. Such a concept provides an approach for the construction of novel multistranded structures emanating from a G tetrad foundation.

2.5.1 Solution structure of the Bombyx mori telomere d(T2AG2) quadruplex

The Bombyx mori telomere d(T2AG2)n sequence differs from the human telomere d(T2AG3)n sequence in having one fewer guanine in the repeat. The single repeat d(T2AG2) sequence and its truncated d(TAG2) version give exceptionally well-resolved NMR spectra in Na+-containing solution, exhibiting imino proton resonances between 11 and 12 ppm characteristic of G tetrad formation (78).

Single guanine-rich repeat segments are known to form parallel-stranded G quadruplexes containing anti-glycosidic torsion angles at the guanine residues. By contrast, both d(TAG2) and d(T2AG2) contain a syn-guanine at the 5'-G residue, ruling out formation of a parallel-stranded G quadruplex. The d(TAG2) sequence [also the d(T2AG2) sequence] forms a twofold, symmetric, four-stranded G quadruplex which is shown schematically in Fig. 13.19 (78). This G quadruplex contains two stacked syn:syn:anti:anti G tetrads, with individual strands having both a parallel and an antiparallel neighbour around the quadruplex (Fig. 13.19). The solution structure of the G quadruplex is shown in Fig. 13.20.
Fig. 13.20. A view of the NMR-based solution structure of the four-stranded single repeat Bombyx mori telomere analogue d(TAG2) quadruplex in Na+ solution (78). Two of the strands, directed towards the viewer, are shown with filled bonds while the other two strands, directed away from the viewer, are shown with open bonds.
Fig. 13.21. (a) A schematic of the A:(A:T) triad containing a T1-A2 platform that was identified in the solution structure of the Bombyx mori telomere analogue d(TAG2) quadruplex (78). (b) A view down the helix axis showing the overlap between the A:(A:T) triad and the G:G:G:G tetrad in the solution structure of the Bombyx mori telomere analogue d(TAG2) quadruplex (78).
The two G tetrads are capped by novel (T:A):A triads, shown schematically in Fig. 13.21a, where an A residue hydrogen bonds to the minor groove edge of a Watson-Crick T:A base pair. The (T:A):A triad (Fig. 13.21a) contains a T-A base platform, where two sequential bases are aligned in the same plane (78). The concept of base triads had been postulated earlier on the basis of modelling studies (79), while base platforms were initially observed experimentally at three A-A steps in the crystal structure of the P4-P6 domain of the Tetrahymena self-splicing group I ribozyme (80). The overlap geometry between the (T:A):A triad and the G tetrad is shown in Fig. 13.21b.
2.6 G quadruplex recognition

The unique folding topologies associated with individual families of G quadruplex architectures make them attractive targets for ligands ranging from small organic molecules to proteins. There is a limited literature on small molecule recognition and a more extensive literature on protein recognition of G quadruplexes, and these results are presented below from a structural perspective.

2.6.1 Small molecules complexed to G quadruplexes

There has been considerable interest in identifying small molecules that target G quadruplexes and are capable of forming site-specific stable complexes. Both ethidium bromide (81) and carbocyanine dyes (82) bind to G quadruplexes, but these efforts have not provided the specific complexes necessary for structural characterization. More recently, DNA aptamers containing guanine-rich repeats capable of G quadruplex formation have been identified based on their ability to target anionic porphyrin ligands (83,84). The structure of this family of complexes will be of considerable interest given that the dimensions of the porphyrin ligand are comparable to those of the G tetrad.

2.6.2 Therapeutic potential of G quadruplexes

Three examples point to the potential of G quadruplex-based therapeutics, as reflected by the ability of this architecture to target functional proteins. Thus, a combinatorially selected, parallel-stranded G quadruplex was shown to be a potent inhibitor of HIV envelope-mediated cell fusion (85). The molecular basis of this recognition remains undefined at present.

The crystal structure of the thrombin-binding intramolecularly folded d(G2T2G2TGTG2T2G2) DNA aptamer complexed to thrombin has been solved to 2.9 Å resolution (54).
Molecular recognition involves ionic and hydrophobic interactions between loop segments of the G quadruplex fold and distinct regions (putative heparin-binding site and fibrinogen exosite) on two different thrombin molecules.

A DNA oligomer containing tandem guanine repeats and capable of intramolecularly folded G quadruplex formation in K+ solution has been shown to be amongst the most active inhibitors of HIV integrase (86-88). The molecular characterization of this G quadruplex in the absence and presence of bound HIV integrase will be of great interest since the K+ cation-folded loop domain of the G quadruplex has been shown to be involved in targeting the binding site on the HIV integrase (89,90).

2.6.3 Proteins that target G quadruplexes

Recent studies have identified a number of proteins that either facilitate DNA G quadruplex formation (91-93) or bind to parallel-stranded DNA G quadruplexes (94-96), including a nuclease that cleaves DNA 5' to the G quadruplex fold (97,98). In addition, a cytoplasmic exoribonuclease has recently been shown to target RNA G quadruplexes preferentially (99). Currently, nothing is known about the molecular basis of G quadruplex-protein recognition in these systems. Several of these complexes represent attractive and challenging structural characterization projects.
2.7 Biological relevance of G tetrad-containing G quadruplexes

Sequences other than telomeres contain guanine repeats. These include immunoglobulin switch regions (10), insulin-linked polymorphic regions associated with diabetes mellitus (56), retinoblastoma susceptibility genes (100), and the control region of c-myc (101). These sequences form G quadruplexes in vitro but it remains to be established whether such quadruplexes play a biological role in vivo. There is some indirect evidence suggesting a potential biological role for G quadruplexes. Thus, both the β subunit of the Oxytricha telomere-binding protein (91,92,102) and the yeast Rap1 protein (93) exhibit molecular chaperone function in their ability to accelerate G quadruplex formation. Similarly, mutations in the yeast KEM1 gene, which encodes a nuclease specific for G quadruplex DNA, have been shown to affect meiosis and mitosis (97). More research is needed to address definitively the issues related to potential biological roles for G quadruplexes.
3 G:C:G:C tetrad-containing quadruplexes

3.1 Triplet repeat disease sequence quadruplexes containing G:C:G:C tetrads formed through alignment of major groove edges of Watson-Crick G:C pairs

The discovery of the expansion of d(CGG)n:d(CCG)n repeats associated with the fragile X syndrome (103-106) has stimulated spectroscopic and footprinting efforts to delineate the potential folding topologies adopted by such sequences. Indeed, it has been shown that the d(CGG)n repeat (n = 7) forms a stable quadruplex structure which is suggested to be of the all-parallel-stranded type, and that this process is facilitated by methylation of the cytosine residues (107).

3.1.1 Solution structure of the d(GCG2T3GCG2) quadruplex containing CG2 fragile X syndrome triplet repeats

The d(GCG2T3GCG2) sequence contains both guanines and cytosines with the potential of forming tetrads containing a mixture of G and C residues. The d(GCG2T3GCG2) sequence in Na+ solution exhibits exceptionally well-resolved, narrow resonances corresponding to formation of a single conformation (108). The NMR resonances were assigned definitively with the aid of inosine for guanine and uracil for thymine substitutions and the structure was solved by molecular dynamics calculations including intensity-based refinements. The resulting quadruplex forms through head-to-tail dimerization of a pair of d(GCG2T3GCG2) hairpins, as shown schematically in Fig. 13.22a. The structure of this quadruplex is shown in Fig. 13.23 (108).
The twofold symmetry in this quadruplex required the use of a sum-averaging protocol in the XPLOR molecular dynamics program (109) to overcome uncertainties associated with intramolecular versus intermolecular NOE contributions between pairs of protons (110,111). The connecting T3 loops are of the lateral type, with adjacent strands aligned in an antiparallel orientation around the quadruplex (108). The outer tetrads are of the G(syn):G(anti):G(syn):G(anti) type (see Fig. 13.1), while the inner tetrads are of the G(anti):C(anti):G(anti):C(anti) type, as shown schematically in
Fig. 13.22. (a) Schematic of the NMR solution structure-based folding topology adopted by the quadruplex formed through head-to-tail dimerization of the d(GCG2T3GCG2) sequence in Na+ solution (108). This topology contains outer G:G:G:G tetrads and inner G:C:G:C tetrads. The T3 loop is of the lateral type. The syn guanines are shown as hatched rectangles while anti guanines are shown as open rectangles. (b) Schematic of the NMR solution structure-based folding topology adopted by the quadruplex formed through head-to-tail dimerization of the d(G3CT4G3C) sequence in Na+ solution (116). This topology contains outer G:C:G:C and inner G:G:G:G tetrads. The T4 loop is of the lateral type.
Fig. 13.24a. The stacking between the outer G:G:G:G and inner G:C:G:C tetrads is shown in Fig. 13.25a. This result represented the first experimental demonstration of a G:C:G:C tetrad involving pairing along the major groove edges of Watson-Crick G:C base pairs (108) (for earlier models, see refs 112-114). Both cytosine exocyclic amino protons are hydrogen bonded in this major groove-aligned G:C:G:C tetrad (Fig. 13.24a), which is consistent with both cytosine amino protons resonating at c. 9 ppm in the NMR spectrum. Furthermore, the observed NOEs between the
Fig. 13.23. A view of the NMR-based solution structure of the d(GCG2T3GCG2) quadruplex formed through head-to-tail dimerization of a pair of hairpins in Na+ solution (108). The tetrad segments of one d(GCG2T3GCG2) hairpin are shown with filled bonds while those of the other are shown with open bonds. The T3 loops are of the lateral type and are shown by hatched bonds.
guanine H8 and cytosine H5 protons across the Watson-Crick G:C base pairs of the tetrad provide key restraints defining the alignment in the central G:C:G:C tetrads in the solution structure of the d(GCG2T3GCG2) quadruplex (108).

3.1.2 Solution structure of the d(G3CT4G3C) quadruplex formed by G3C repeats observed in adeno-associated viral DNA

The adeno-associated virus, a human parvovirus, is unique amongst eukaryotic DNA viruses in its ability to integrate site specifically into a defined region of chromosome 19 (reviewed in ref. 115). The G3C sequence has been identified both in adeno-associated virus (as islands) and in chromosome 19 (as tandem repeats) and could play a role in the mechanism of site-specific integration. The NMR spectrum of the d(G3CT4G3C) sequence, which contains two G3C segments separated by a Tn linker (n = 3 or 4), exhibits a set of resonances corresponding to a predominant conformation in Na+ solution (116). The NMR resonances in the d(G3CT4G3C) sequence were assigned unambiguously with the aid of site specifically incorporated 15N-labelled guanines [inosine for guanine substitutions did not work in this case owing to destabilization of the d(G3CT4G3C) structure on inosine substitution] and the structure was solved using the sum-averaging routine during both distance and intensity refined molecular dynamics calculations (116). The folding topology of the d(G3CT4G3C)
Fig. 13.24. Schematic drawings of G:C:G:C tetrad pairing alignments involving dimerization of Watson-Crick G:C base pairs along (a) their major groove edges (108) and (b) their minor groove edges (118).
quadruplex structure in Na+ solution is shown schematically in Fig. 13.22b. This quadruplex forms through head-to-tail dimerization of a pair of d(G3CT4G3C) hairpins involving connecting T4 lateral loops and individual strands running antiparallel to each other around the quadruplex. This quadruplex also contains a pair of separated G:C:G:C tetrads formed through major groove alignment of a pair of Watson-Crick G:C base pairs, as shown previously in Fig. 13.24a. The structure of this quadruplex is shown in Fig. 13.26 (116). The base overlaps between the outer
Fig. 13.25. Views down the helix axis showing stacking between adjacent G:G:G:G (filled bonds) and G:C:G:C (open bonds) tetrads in: (a) the solution structure of the d(GCG2T3GCG2) quadruplex in Na+ solution (108); and (b) the solution structure of the d(G3CT4G3C) quadruplex in Na+ solution (116).
G(anti):C(anti):G(anti):C(anti) and inner G(syn):G(anti):G(syn):G(anti) tetrads are shown in Fig. 13.25b.

The above studies on the folding topologies of quadruplexes formed through dimerization of the d(GCG2T3GCG2) (Fig. 13.22a) (108) and d(G3CT4G3C)
Fig. 13.26. A view of the NMR-based solution structure of the d(G3CT4G3C) quadruplex formed through head-to-tail dimerization of a pair of hairpins in Na+ solution (116). The tetrad segment of one d(G3CT4G3C) hairpin is shown with filled bonds while the other is shown with open bonds. The T4 loops are of the lateral type and are shown by hatched bonds.
(Fig. 13.22b) (116) sequences establish the prevalence of G:C:G:C tetrad formation (Fig. 13.24a) and that such tetrads can be either adjacent (Fig. 13.22a) or separated (Fig. 13.22b) from each other in the quadruplex, depending on sequence.

3.1.3 A Na+ to K+ cation-dependent conformational switch in the loop-spanning segment of a G3C repeat-containing quadruplex

The role of Na+ versus K+ in stabilizing DNA quadruplexes has been one of considerable interest. The most favourable situation for a structural analysis of monovalent cation-dependent conformations would be one where distinct NMR spectra were observable for a quadruplex in Na+ solution on the one hand and in K+ solution on the other, and, in addition, interconversion between these distinct quadruplex conformations was slow on the NMR time-scale. The NMR spectrum of the d(G3CT4G3C) sequence in K+ solution exhibits a set of resonances corresponding to a predominant conformation (117) that is distinct from its predominant conformational counterpart in Na+ solution (116). Furthermore, the distinct conformations of the d(G3CT4G3C) sequence in Na+ and K+ solutions are in slow exchange in solutions containing a mixture of these monovalent cations.

The solution structure of the d(G3CT4G3C) quadruplex in K+ solution has been solved (Plate XVIb) (117) and, together with the corresponding quadruplex structure in Na+ solution (Plate XVIa) (116), defines the molecular basis of the Na+ to K+ cation-dependent conformational switch. Both Na+ and K+ cation-dependent conformations of the d(G3CT4G3C) quadruplexes exhibit certain common structural features, which include head-to-tail dimerization of symmetry-related hairpins, antiparallel alignment of adjacent strands, and stacked adjacent G(syn):G(anti):G(syn):G(anti) tetrads in the central core of the quadruplexes. The two quadruplex conformations differ in the conformations of the T4 loops (Fig. 13.27a,b for Na+ and K+ conformations, respectively), the relative alignment of opposing Watson-Crick G:C base pairs across
Fig. 13.27. The fold of the T5-T6-T7-T8 hairpin loop in the solution structure of the d(G3CT4G3C) quadruplex formed through head-to-tail dimerization of a pair of hairpins in: (a) Na+ solution (116) and (b) K+ solution (117).
Fig. 13.28. The alignment of opposing Watson-Crick G:C base pairs along their major groove edges in the solution structure of the d(G3CT4G3C) quadruplex formed through head-to-tail dimerization of a pair of hairpins in: (a) Na+ solution (116) and (b) K+ solution (117). Note the role of the potentially bound K+ cation in coordinating to the O6 and N7 acceptor atoms of guanines whose Hoogsteen edges are directed towards each other in (b).
Fig. 13.29. A model of the K+ cation buried within the T6-T7-T8-G9 loop segment in the solution structure of the d(G3CT4G3C) quadruplex formed through head-to-tail dimerization of a pair of hairpins in K+ solution (117).
the major groove (Fig. 13.28a,b for Na+ and K+ conformations, respectively), and the total number of potential monovalent cation-binding sites (116,117). Single K+-binding cavities were proposed within each of the symmetry-related T3G loop-spanning segments (Fig. 13.29), resulting in two additional potential monovalent cation-binding sites in the K+-stabilized d(G3CT4G3C) quadruplex (Plate XVIb) relative to its Na+-stabilized counterpart (Plate XVIa). The major groove edges of opposing guanines from Watson-Crick G:C base pairs are bridged by potential coordinated K+ cations in the d(G3CT4G3C) quadruplex conformation in K+ solution (Fig. 13.28b) (117), in contrast to the G:C:G:C tetrad formation in Na+ solution (Fig. 13.28a) (116).

The solution structure of the K+-stabilized d(G3CT4G3C) quadruplex defines the principles involved in potential K+ coordination within a T3G segment, resulting in a defined loop architecture whose outwardly pointing functional groups can provide a unique folded topology that can target potential receptor sites (117). Indeed, the biological significance of this result is likely to be related to the independent demonstration of K+-selective folding of loop domains within intramolecular G quadruplexes, with these uniquely folded loops responsible for the potent oligonucleotide inhibitory activity against HIV integrase (86-88).
3.2 Quadruplexes containing G:C:G:C tetrads formed through alignment of minor groove edges of Watson-Crick G:C pairs

The examples above defined the alignment associated with the pairing of two Watson-Crick G:C base pairs through their major groove edges to form G:C:G:C tetrads (Fig. 13.24a) which are stabilized through their participation with G:G:G:G tetrads in quadruplex formation (108,116). Such G:C:G:C tetrad formation (Fig. 13.24a) is facilitated by the glycosidic bonds being directed towards the four corners of the tetrad, as they are in G:G:G:G tetrad formation (Fig. 13.1). An interesting issue relates to whether G:C:G:C tetrads can also form through alignment of the minor groove edges of two Watson-Crick G:C base pairs. In this case, pairs of glycosidic bonds would be directed towards each other and steric constraints may require departures from base
Fig. 13.30. (a) Schematic of the X-ray crystallographic structure-based folding topology adopted by the quadruplex formed through head-to-head dimerization of the d(GCATGCT) sequence (118). This topology contains a pair of G:C:G:C tetrads flanked on one side by a reversed A:A mismatch. The G:C:G:C tetrads involve alignment across the minor groove edges of Watson-Crick G:C base pairs. The A-T loops are of the lateral type. Reproduced with permission of Structure. (b) A view down the helix axis showing the stacking between the adjacent reversed A:A mismatch (filled bonds) and the G:C:G:C tetrad (open bonds) in the X-ray structure of the d(GCATGCT) quadruplex (118).
planarity around this alternative G:C:G:C tetrad alignment. Recent X-ray structures of specific G:C- (118) and A:T- (119) containing sequences, which are described below, have provided molecular views defining the alignment of Watson-Crick G:C pairs (and Watson-Crick A:T pairs) along their minor groove edges.

3.2.1 Crystal structure of the d(GCATGCT) quadruplex

The 1.8 Å X-ray structure of the d(GCATGCT) sequence has defined a new quadruplex architecture (118). The structure involves head-to-head dimerization of a pair of hairpins, as shown schematically in Fig. 13.30a, with the structure shown in Fig. 13.31a. The quadruplex structure contains two stacked G:C:G:C tetrads and one A:A mismatch. The quadruplex fold contains a twofold element of symmetry with adjacent strands running antiparallel to each other, all glycosidic torsion angles in the anti range, and all sugar puckers in the C2'-endo range.

Formation of Watson-Crick G:C pairs involves cross-strand alignment of guanines and cytosines, with further pairing of the minor groove edges of the G:C pairs through two hydrogen bonds to form the G:C:G:C tetrads shown schematically in Fig. 13.24b (118). The bases in the G:C:G:C tetrad are not coplanar but are tilted by c. 30°. There is extensive stacking between adjacent G:C:G:C tetrads in the core of the quadruplex through overlap of the cytosine pyrimidine rings and the guanine six-membered rings, as shown in Fig. 13.31b.

The adenines form an A:A mismatch through cross-strand alignment involving a pair of hydrogen bonds along their major groove Hoogsteen edges. This A:A mismatch, involving A residues in the TA loops, anchors the quadruplex architecture. The stacking between the A:A mismatch and the G:C:G:C tetrad is shown in Fig. 13.30b (118). In addition, the purine ring of G5 stacks over the sugar ring of A3 in a van der Waals interaction similar to that which has been observed previously in the crystal structure of Z-DNA.

The sugar-phosphate backbones of these G:C:G:C tetrads formed through minor groove alignment in the d(GCATGCT) quadruplex (118) are distinct from those observed for the G:C:G:C tetrads formed through major groove alignment in the d(GCG2T3GCG2) (108) and d(G3CT4G3C) (116) quadruplexes presented earlier in this chapter. There is a close juxtaposition of the backbone phosphates of C2 and C6, which are coordinated to a cation in the structure of the d(GCATGCT) quadruplex. Furthermore, two molecules of the d(GCATGCT) quadruplex are aligned in the crystallographic lattice through T:T mismatch formation involving the looped out T residue of the AT loop (118).

3.2.2 Crystal structure of the d quadruplex

The novel architecture that defines the structure of the d(GCATGCT) quadruplex (118) presented above has recently been observed in the d DNA oligomer as well (cyclized in this case). A key feature common to both sequences is the separation of complementary 5'-purine-pyrimidine dinucleotide steps within the d(..RYNYRYN..) sequence context, where R is a purine, Y is a pyrimidine, and N is any nucleotide.

The high resolution X-ray structure of the cyclic octanucleotide d establishes quadruplex formation through dimerization (119). This structure involves
Fig. 13.31. (a) A view of the 1.8 Å X-ray crystallographic structure of the d(GCATGCT) quadruplex formed through head-to-head dimerization of a pair of hairpins (118). One strand is shown with darkened bonds while the other strand is shown with open bonds. The bases in the G:C:G:C tetrads depart significantly from planarity. (b) A view down the helix axis showing the stacking between the adjacent G:C:G:C tetrads (filled and open bonds, respectively) in the X-ray structure of the d(GCATGCT) quadruplex (118).
cross-strand formation of Watson-Crick A:T base pairs involving the A-T steps, with the minor groove edges of these A:T base pairs directed towards each other. The A:T base pairs are inclined by c. 32° within each layer of the quadruplex. Since A:T pairs
Fig. 13.32. (a) A view of the X-ray crystallographic structure of the d quadruplex (119). The bases in the A:T:A:T tetrads depart significantly from planarity. (b) A view down the helix axis showing the stacking between the adjacent A:T:A:T tetrads (filled and open bonds, respectively) in the X-ray structure of the d quadruplex (119).
cannot dimerize through hydrogen bond alignments involving their minor groove edges (which contain only acceptor atoms), a sodium ion occupies the centre of the quadruplex and is coordinated to the thymine O2 oxygens of four A:T paired thymines, as shown in Fig. 13.32a (119). The stacking in the central core of the d quadruplex is shown in Fig. 13.32b.
There is a striking similarity in the crystallographic structures of the central core of the d(GCATGCT) quadruplex (118) in Plate XVIIa and the central core of the d quadruplex (119) in Plate XVIIb. It has been proposed that this quadruplex architecture containing common structural elements, called a bi-loop motif, could play a role in biological processes involved in strand exchange (119).
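
The d(..RYNYRYN..) sequence context described in section 3.2.2 can be expressed as a simple pattern search. The following is a minimal sketch, not a method taken from refs 118 or 119: the function names are hypothetical, and the only sequence checked is d(GCATGCT), which is the one written out in full in the text.

```python
import re
from typing import List

# Base classes as defined in the text: R = purine, Y = pyrimidine, N = any nucleotide.
IUPAC = {"R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def motif_to_regex(motif: str) -> str:
    """Translate a motif such as 'RYNYRYN' into a regular expression."""
    return "".join(IUPAC.get(symbol, symbol) for symbol in motif)


def find_motif(sequence: str, motif: str = "RYNYRYN") -> List[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # d(GCATGCT): G(R) C(Y) A(N) T(Y) G(R) C(Y) T(N)
    print(find_motif("GCATGCT"))
```

A match only indicates that a sequence fits the stated context; it says nothing about whether the sequence actually folds into a bi-loop quadruplex.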
3.3 Other potential purine-containing tetrads

The demonstration of major groove-aligned G:C:G:C tetrad (108,116) and minor groove-aligned G:C:G:C tetrad (118) formation, in addition to the long-established formation of G:G:G:G tetrads (8), suggests that other purine-containing tetrad alignments may also stabilize quadruplex formation. Possibly the most interesting of these are tetrads containing G and A purine residues which have the potential to align through the major groove edges of either G(anti):A(anti) or G(anti):A(syn) mismatch pairs to form G:A:G:A tetrads (see models proposed in ref. 68). The identification of G:A:G:A tetrads and determination of their alignment geometry represent a future challenge. This goal may be approachable based on the reported equilibrium between duplex and quadruplex states for d(AG)10 at neutral pH (120).
3.4 Biological relevance of quadruplexes containing G:C:G:C tetrads

The phase of the d(CGG)n fragile X syndrome triplet repeat can be either CGG, GGC, or GCG. The ability of d(GCG2T3GCG2), which contains GCG and CG2 repeats (108), and d(G3CT4G3C), which contains G2C repeats (116), to form G quadruplexes stabilized by G:C:G:C and G:G:G:G tetrads suggests a potential biological role for G:C:G:C tetrads. Such G:C:G:C tetrad-containing G quadruplex structures could serve as potential blockage sites for the progress of replication forks (121) and might account for the blockage of the fragile X locus observed experimentally (122).

The key demonstration establishing formation of G:C:G:C tetrads through alignment of Watson-Crick G:C base pairs along either their major groove (108,116) or minor groove (118) edges has potential implications in genetic recombination. Homologous DNA segments could be brought into register through G:C:G:C (and A:T:A:T) tetrad formation as a first step prior to the onset of strand exchange mediated through a pair of Holliday junction cross-over sites.
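
The statement that the d(CGG)n repeat can be read in three phases, and that the two hairpin-forming sequences contain particular phases, can be checked with a short script. This is an illustrative sketch only: the helper names are hypothetical, and the one-letter strings are simply the expanded forms of d(GCG2T3GCG2) and d(G3CT4G3C).

```python
from typing import List


def repeat_phases(unit: str) -> List[str]:
    """All cyclic rotations (phases) of a repeat unit, in order of shift."""
    return [unit[i:] + unit[:i] for i in range(len(unit))]


def phases_present(sequence: str, unit: str = "CGG") -> List[str]:
    """Phases of the repeat unit that occur as substrings of the sequence."""
    return [phase for phase in repeat_phases(unit) if phase in sequence]


# Expanded one-letter forms of the two sequences discussed in the text.
HAIRPIN_108 = "GCGGTTTGCGG"   # d(GCG2T3GCG2)
HAIRPIN_116 = "GGGCTTTTGGGC"  # d(G3CT4G3C)

if __name__ == "__main__":
    print(repeat_phases("CGG"))
    print(phases_present(HAIRPIN_108))
    print(phases_present(HAIRPIN_116))
```

The substring test is deliberately crude: it reports which phases of the triplet appear in each oligomer, which is all the text claims, and makes no statement about folding.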
4 i-motif quadruplexes containing intercalated C:CH+ mismatch pairs

4.1 Four-stranded i-motif quadruplexes

The formation of C:CH+ mismatch pairs for poly C at acidic pH was proposed over three decades ago. Furthermore, the X-ray fibre diffraction pattern of poly C was interpreted in terms of a parallel-stranded C:CH+ mismatch-containing duplex (123,124). Direct evidence for formation of C:CH+ pairs (Fig. 13.33a) in parallel-stranded DNA duplexes emerged following the structural characterization of the parallel-stranded d(CA) duplex at acidic pH in the crystalline state (125), and the d(TCGA) duplex at acidic pH in solution (126,127). Subsequently, solution NMR studies have identified a higher order quadruplex structure involving C:CH+ pairs as the basic repeat unit (15). This quadruplex architecture, called the i-motif (15), has added a new dimension to our understanding of multistranded nucleic acid structures.

Fig. 13.33. (a) A schematic drawing of the reversed C:CH+ mismatch pairing alignment. (b) A view down the helix axis showing the stacking between the adjacent C:CH+ mismatch pairs in the NMR-based solution structure of the d(TC5) i-motif quadruplex at acidic pH (15). One C:CH+ mismatch is shown with filled bonds and the other with open bonds.

4.1.1 Solution structure of d(TC5) i-motif quadruplex

The NMR spectra of d(TC5) at acidic pH exhibit an unusual set of chemical shifts and NOE patterns consistent with the formation of a folded higher order solution structure (15). A concentration-dependent study of d(TC5) at acidic pH by gel electrophoresis established that this sequence forms a four-stranded quadruplex at mM concentrations (15). A single set of resonances was observed for d(TC5) at acidic pH, consistent with formation of a four-stranded quadruplex with a fourfold element of symmetry.

The observation of imino proton resonances between 15 and 16 ppm established the formation of C:CH+ mismatch pairs. A set of diagnostic NOEs was observed between sugar H1' protons on partner strands for the d(TC5) quadruplex (15), a feature not observed for right-handed antiparallel double helical DNA. Furthermore, a set of critical NOEs of a non-sequential nature was identified, which reflected the order of base pair stacking within the quadruplex. These NOEs exhibited the sequential pattern T1-C6-C2-C5-C3-C4 in the d(T1-C2-C3-C4-C5-C6) quadruplex and provided critical restraints for structure determination.

The folding topology of the four-stranded d(TC5) quadruplex (15) is shown in a schematic view in Fig. 13.34. It consists of two parallel-stranded C:CH+ mismatch paired duplexes that are interlocked through interdigitation of C:CH+ pairs from individual duplexes that are aligned antiparallel to each other. Individual strands are aligned antiparallel to their neighbours and adjacent C:CH+ mismatch pairs are approximately orthogonal to each other (Fig. 13.34).
Fig. 13.34. A schematic of the NMR solution structure-based folding topology adopted by the i-motif quadruplex formed by four strands of d(TC5) in acidic pH solution (15). Two parallel-stranded C:CH+ paired duplexes interdigitate into each other in an antiparallel orientation. (Reproduced with permission of Nature.)
Fig. 13.35. A view of the NMR-based solution structure of the four-stranded d(TC5) i-motif quadruplex in acidic pH solution (15). One parallel-stranded duplex is shown using filled bonds while the other, aligned antiparallel to the first, is shown using open bonds.
The solution structure of the four-stranded d(TC5) i-motif quadruplex is shown in Fig. 13.35 (15). The quadruplex is right-handed with a c. 16° twist between mismatch pairs. The i-motif quadruplex contains a pair of opposing wide grooves and a pair of opposing narrow grooves. There is a pairwise association of sugar-phosphate backbones, which results in close van der Waals contacts between sugar rings spanning the minor groove. This architecture explains the strong sugar H1'-sugar H1' NOEs across the minor groove that are characteristic of the i-motif quadruplex (15).

The base overlap alignments between adjacent face-to-face stacked C:CH+ mismatch pairs are shown in Fig. 13.33b (15). There is no overlap between the cytosine rings themselves but, rather, there is overlap between the exocyclic amino groups and between the exocyclic carbonyl groups (Fig. 13.33b). There is a reversed orientation of the amino and carbonyl dipoles between adjacent stacked C:CH+ pairs and a maximal separation of the cytosine N3 nitrogens in this overlap pattern. The C:CH+ mismatch is of the reversed type with one hydrogen bonded and one exposed amino proton for individual cytosines in the pair. The exchange characteristics of the cytosine imino and amino protons in the i-motif quadruplex imply imino proton hopping between cytosines within the C:CH+ pair at a rate of > 80 000 s⁻¹ (15,128).

The seminal discovery of the i-motif C:CH+ quadruplex was unanticipated and emerged from an in-depth and long-standing attempt at understanding the hydrogen exchange properties of dCn-containing sequences at acidic pH (15). It quickly became apparent that other dCn-rich sequences, in addition to d(TC5), also adopt this quadruplex architecture at acidic pH (128).

4.1.2 Solution structure of d(TC2) and d(m5CCT) i-motif quadruplexes

The solution structure determination of the d(TC5) i-motif quadruplex (15) was followed by a higher resolution structure determination of the simpler d(TC2) i-motif quadruplex in acidic pH solution (129). The latter NMR studies identified additional NOEs characteristic of the i-motif quadruplexes, beyond the previously identified strong sugar H1'-H1' cross-peaks between adjacent strands across the narrow groove (15). The most critical newly identified restraints included strong NOEs observed between the cytosine amino protons and sugar H2',2'' protons on adjacent strands across the wide groove (129). The solution structure of the d(TC2) i-motif quadruplex established that sequences containing as few as two successive cytosines are sufficient for formation of an i-motif quadruplex (129).

The d(m5CCT) sequence forms two i-motif quadruplexes of comparable proportions in equilibrium under acidic pH conditions (129). The analysis of the NMR data established that one of these conformers was the maximally intercalated i-motif quadruplex similar to its d(TC2) counterpart, while the other involved a shift in registry of the intercalated C:CH+ mismatch pairs, resulting in a partial loss of intercalation contributions.
The latter conformer presumably reflects relief of methyl group steric clashes in the fully intercalated d(m5CCT) i-motif quadruplex.

4.1.3 Solution structure of the d(m5CCTC2) i-motif quadruplex

A more recent solution structural study has addressed the issue of whether intervening residues such as T:T mismatches can be accommodated within an i-motif quadruplex containing intercalated C:CH+ mismatch pairs. The solution structure determination of the d(m5CCTC2) i-motif quadruplex has definitively addressed this issue and come up with an unanticipated answer (130).

The solution structure of the d(m5CCTC2) i-motif quadruplex shown in Fig. 13.36 establishes that the thymine bases of one parallel-stranded duplex component intercalate as a symmetrical T:T mismatch pair between C:CH+ mismatch pairs, while those on the other parallel-stranded duplex component are unpaired and loop out into solution (130). Furthermore, the interconversion between paired and looped out thymine bases can be monitored by NMR and occurs at a rate of 1.4 s⁻¹ at 0°C with an activation energy of 94 kJ mol⁻¹ (130). This opening-closing process is concerted and occurs without disruption of the entire i-motif quadruplex. Interestingly, the interconversion rate increases to 40 s⁻¹ at 0°C with a reduced barrier of 55 kJ mol⁻¹ for the d(m5CCUC2) i-motif quadruplex where a U has replaced the internal T residue. Thus the swinging of the pyrimidine residue associated with the opening-closing process is impeded by the methyl group. These studies represent an elegant example of a bistable DNA motif with broken symmetry which has been
Fig. 13.36. A view of the NMR-based solution structure of the four-stranded d(m5CCTC2) i-motif quadruplex in acidic pH solution (130). One parallel-stranded duplex is shown using filled bonds while the other, aligned antiparallel to the first, is shown using open bonds. The pair of looped out thymine residues can be clearly seen positioned in the grooves.
characterized both structurally and using hydrogen exchange measurements (130). It is also important to emphasize that the main features of the i-motif quadruplex are maintained despite incorporation of the T:T mismatch pair between C:CH+ mismatch pairs.

4.1.4 Base pair opening in the i-motif quadruplex

Hydrogen exchange of imino protons in dCn sequences at acidic pH is limited by base mismatch opening of the i-motif quadruplex (128). The lack of an effect of added catalysts on the hydrogen exchange of the imino protons of C:CH+ mismatch pairs (128) must reflect the predominant contribution of intrinsic catalysis across the C-N3H+...C-N3 pair in the i-motif quadruplex. The measured C:CH+ mismatch pair lifetimes are two orders of magnitude longer than the corresponding values for Watson-Crick pairs in B-form DNA (128). This could reflect the intercalation of the C:CH+ mismatch pairs within the structure of the i-motif quadruplex. A free energy value of -8.5 kJ mol⁻¹ per C:C+ mismatch pair was deduced for formation of the d(TC5) i-motif quadruplex from single strands (128).
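
The thymine opening-closing kinetics quoted in section 4.1.3 (1.4 s⁻¹ with a 94 kJ mol⁻¹ barrier for the T-containing quadruplex, 40 s⁻¹ with 55 kJ mol⁻¹ for the U-containing one, both at 0°C) can be turned into lifetimes and, under a simple Arrhenius assumption, rough estimates at other temperatures. The sketch below is illustrative only: the source reports rates at 0°C alone, and a temperature-independent activation energy is an assumption made here, not a result from ref. 130. The function names are hypothetical.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def lifetime(rate_per_s: float) -> float:
    """Mean lifetime (s) of a state that interconverts at the given first-order rate."""
    return 1.0 / rate_per_s


def arrhenius_extrapolate(k_ref: float, ea_kj_mol: float,
                          t_ref_c: float, t_new_c: float) -> float:
    """Rate at t_new_c from a reference rate, ASSUMING Arrhenius behaviour
    with a temperature-independent activation energy."""
    t_ref = t_ref_c + 273.15
    t_new = t_new_c + 273.15
    exponent = -(ea_kj_mol * 1e3 / R_GAS) * (1.0 / t_new - 1.0 / t_ref)
    return k_ref * math.exp(exponent)


if __name__ == "__main__":
    # Values quoted in the text at 0 degrees C.
    for label, k0, ea in [("d(m5CCTC2)", 1.4, 94.0), ("d(m5CCUC2)", 40.0, 55.0)]:
        k10 = arrhenius_extrapolate(k0, ea, 0.0, 10.0)  # hypothetical 10 degrees C
        print(f"{label}: lifetime at 0 C = {lifetime(k0):.3f} s, "
              f"extrapolated rate at 10 C = {k10:.1f} s^-1")
```

Because the thymine-containing quadruplex has the larger barrier, its estimated rate rises more steeply with temperature than that of the uracil analogue; any extrapolated number should be treated as an order-of-magnitude guide.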
4.1.5 Crystal structures of d(C4) and d(C3T) i-motif quadruplexes

The publication of the NMR-based solution structure of the d(TC5) i-motif quadruplex (15) has, in turn, stimulated efforts to elucidate the structure of the i-motif quadruplex in the crystalline state. These efforts have been quite successful, starting with the 2.3 Å crystal structure of d(C4) (131) and the 1.4 Å crystal structure of d(C3T) shown in Fig. 13.37 (132). There is good agreement between the helical features of the i-motif quadruplex architecture of the NMR-based solution structures (15,129,130) and X-ray-based crystal structures (131,132).

Thus, the average right-handed helical twists of 12.4° and 17.1° observed in the crystal structures of d(C4) (131) and d(C3T) (132) i-motif quadruplexes, respectively, compare favourably with the helical twist of 16° reported in the solution structure of the d(TC5) i-motif quadruplex (15). In addition, the overlap geometries between adjacent C:CH+ tetrads are very similar between the solution (15) and crystal (131,132) structures of the i-motif quadruplex. The base stacking distance between successive C:CH+ base mismatches is 3.1 Å in the two crystal structures of the i-motif quadruplexes (131,132), which is consistent with the same meridional spacing in the X-ray fibre diffraction pattern of polycytidylic acid (123).

It has been pointed out from the crystal structures that the i-motif quadruplex has a flat and ribbon-shaped architecture with very wide grooves along two sides and very narrow grooves at the ends. Furthermore, there is a complementarity in the fit between the zigzag pathway of the sugar-phosphate backbones of adjacent antiparallel strands at the narrow end of the twisted ribbon (131,132). This close packing is reflected in the strong NOE between the sugar H1' protons on adjacent strands observed in the solution structures of i-motif quadruplexes (15), which is readily explained by the observed separation of c. 3.1 Å between these proton pairs in the crystal structures (131,132).

The glycosidic torsion angles are in the high anti range while there is considerable variation in the sugar puckers within the crystal structures of the i-motif quadruplexes (131,132). There is also considerable asymmetry in the phosphate positions, as reflected by the spread in phosphorus-phosphorus separations across the wide and narrow grooves in the crystal structures of the i-motif quadruplexes. Several opposing phosphate groups on one strand in the wide groove extend away from the centre of the molecule, while those on the opposing strand in this groove bend over towards each other.

The C:CH+ mismatch pairs are well defined in the 1.4 Å crystal structure of the d(C3T) i-motif quadruplex (Fig. 13.37) (132), with central N-H...N heteroatom distances of 2.74 Å and N-H...O heteroatom distances of 2.77 Å. Furthermore, the exocyclic amino group of each cytosine is hydrogen bonded to a water molecule with N-H...O heteroatom distances of 3.00 Å. There are 59 solvent molecules in the asymmetric unit of the d(C3T) i-motif quadruplex, with a small subset bridging cytosine exocyclic amino groups and phosphate oxygens on adjacent strands. Several sodium cations have been identified in the high resolution crystal structure of d(C3T), with their octahedral coordination spheres containing water molecules and phosphate oxygens (132).

In summary, very similar internal i-motif, intercalated C:CH+ architectures have been determined for the d(TC5) solution structure (see Plate XVIIIa) (15) and d(C3T) crystal structure (Plate XVIIIb) (132).
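
The helical twists compared in section 4.1.5 can be put on a common footing by converting each to the number of stacked mismatch pairs per helical turn and, using the 3.1 Å stacking distance, to an approximate helical pitch. This is a back-of-the-envelope sketch under stated assumptions: the function names are hypothetical, the 3.1 Å rise is the crystallographic value and is applied to the solution structure only for comparison, and the twist is treated as uniform along the helix.

```python
def steps_per_turn(twist_deg: float) -> float:
    """Number of stacked pairs needed to complete one 360 degree turn."""
    return 360.0 / twist_deg


def pitch_angstrom(twist_deg: float, rise_angstrom: float = 3.1) -> float:
    """Helical pitch (in angstroms) for a uniform twist and rise per stacked pair."""
    return rise_angstrom * steps_per_turn(twist_deg)


# Average twists quoted in the text (reference numbers in parentheses).
TWISTS = {
    "d(C4) crystal (131)": 12.4,
    "d(C3T) crystal (132)": 17.1,
    "d(TC5) solution (15)": 16.0,
}

if __name__ == "__main__":
    for label, twist in TWISTS.items():
        print(f"{label}: {steps_per_turn(twist):.1f} pairs per turn, "
              f"pitch of about {pitch_angstrom(twist):.0f} angstroms")
```

On this estimate all three structures need roughly 20 to 30 stacked pairs per turn, which underlines how slowly the i-motif winds.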
Fig. 13.37. A view of the 1.4 Å crystal structure of the four-stranded d(C3T) i-motif quadruplex (132). One parallel-stranded duplex is shown using filled bonds while the other, aligned antiparallel to the first, is shown using open bonds.
4.1.6 Crystal structures of d(C3A2T) and the human telomere d(TA2C3) i-motif quadruplexes

The duplex segment of human telomeres contains d(C3TA2)n cytosine-rich and d(T2AG3)n guanine-rich repeats on complementary strands. It was therefore of great interest to determine whether the cytosine-rich segments of d(C3TA2)n repeats [or d(TA2C3)n repeats depending on the phase] can form i-motif quadruplexes and, in addition, elucidate the folding topology of the TA2 segment. Considerable progress has been made towards these goals with the publication of the crystal structure of the single repeat human telomere d(TA2C3) i-motif quadruplex at 1.9 Å resolution (133) and the crystal structure of a sequence variant, the d(C3A2T) i-motif quadruplex, at 2.0 Å resolution (134). The cytosine segments form i-motif quadruplexes in both structures (133,134) with helical parameters similar to those reported for the earlier crystal structure of the d(C3T) i-motif quadruplex solved to very high resolution (132). The A:T-rich segments in the d(TA2C3) and d(C3A2T) i-motif quadruplexes adopt novel folding topologies and these are discussed below.

The crystal structure of the terminal segments of the four-stranded d(TA2C3) i-motif quadruplex is shown in Fig. 13.38 (133). The 5'-TA2 segments exhibit different conformations, with one of them adopting a novel tight loop fold in which the 5'- and 3'-ends of adjacent strands are brought into close proximity. This folded segment
Fig. 13.38. (a) A view of the 1.9 Å crystal structure of the four-stranded d(TA2C3) i-motif quadruplex (133). One parallel-stranded duplex is shown using filled bonds while the other, aligned antiparallel to the first, is shown using open bonds. (b) A view emphasizing the stacking of A2 on the T1:A3 Hoogsteen pair, which is in turn stacked on the C:CH+ mismatch pair in the crystal structure of the d(TA2C3) i-motif quadruplex (133).
is stabilized by formation of a Hoogsteen T:A base pair between the thymine and the 3'-adenine, which in turn stacks over the terminal C:C+ mismatch pair. The central adenine of this TA2 segment stacks on the other side over the Hoogsteen T:A base pair and caps the end of the i-motif quadruplex (Fig. 13.38). This structure, with its novel TA2-folded segment, provides insight into the potential folding topologies of d(TA2C3)n (n = 2 and 4) i-motif quadruplexes. Most importantly, isomorphous crystals of d(TA2C3) can be grown between pH 5.5 and 7.5, suggesting that the stability of the crystal lattice has raised the apparent pKa for hemiprotonation of the C:C+ mismatch pair (133).
The crystal structure of the terminal segments of the four-stranded d(C3A2T) i-motif quadruplex is shown in Fig. 13.39 (134). An asymmetric A(anti):A(dinat) mismatch pair stacks over the terminal C:CH+ mismatch pair (Fig. 13.40b) and extends the i-motif architecture by one step in either direction. This asymmetric A:A mismatch, which involves pairing through the Watson-Crick and Hoogsteen edges of the adenines, stacks in turn over a symmetrical A(anti):A(syn) mismatch (Fig. 13.40a), which involves pairing along the Watson-Crick edges of both adenines. Each of these two distinct A:A mismatches participates in an A:A:T base triple with a thymine from
Fig. 13.39. A view of the 2.0 Å crystal structure of the four-stranded d(C3A2T) i-motif quadruplex (134). One parallel-stranded duplex is shown using filled bonds while the other, aligned antiparallel to the first, is shown using open bonds.
Fig. 13.40. Views down the helix axis showing the stacking between adjacent (a) A4:A4 and A5:A5 mismatch pairs and (b) A4:A4 and C1:C1H+ mismatch pairs in the crystal structure of the d(C3A2T) i-motif quadruplex (134).
a symmetry-related i-motif in the crystallographic asymmetric unit (134). Isomorphous crystals of d(C3A2T) could also be grown over the pH range 5.0 to 7.5.
4.2 i-Motif quadruplexes formed through dimerization of loop containing segments
Several groups have investigated the folding topologies of d(CnNmCn) sequences at acidic pH with the understanding that such sequences can fold back to form C:CH+ mismatch pairs, which in turn can dimerize to form i-motif quadruplexes (135,136). High resolution NMR has been used more recently to determine the solution structures of i-motif quadruplexes formed through dimerization of d(CnNmCn) sequences at acidic pH and the available results are outlined below.
4.2.1 Solution structure and opening kinetics of the d(m5CCT3AC2) i-motif quadruplex
The d(m5CCT3AC2) sequence gives well-resolved proton NMR spectra and NOE patterns characteristic of i-motif quadruplex formation (137). The concentration dependence of the equilibrium between multimer and single strand conformers
Fig. 13.41. A schematic of the NMR-based solution structure-based folding topology adopted by the i-motif quadruplex formed by: (a) the head-to-tail dimerization of a pair of d(m5CCT3AC2) hairpins in acidic pH solution (137); and (b) the head-to-head dimerization of a pair of d(m5CCT4C2) hairpins in acidic pH solution (137). Reproduced with permission of Structure.
established i-motif quadruplex formation through dimerization. An apparent pKa of c. 6.5 was estimated for d(m5CCT3AC2) i-motif quadruplex formation. The solution structure was solved by a combined NMR and molecular dynamics structural characterization including intensity refinement. The folding topology consists of an i-motif quadruplex core containing intercalated C:CH+ outer pairs and m5C:CH+ inner pairs linked at opposite ends by T3A loops that span the wide groove, as shown schematically in Fig. 13.41a (137). The adenine residues in both loops stack on the outer C:CH+ pairs, thus extending the stacking beyond the central i-motif core. The solution structure of the d(m5CCT3AC2) i-motif quadruplex is shown in Fig. 13.42.
Hydrogen exchange kinetics of base mismatch opening establish that the lifetime is 1 ms at 15°C, with an activation energy of 60 kJ mol-1, for the outer C:CH+ mismatch pairs in the d(m5CCT3AC2) i-motif quadruplex (137). This number is one order of magnitude longer than the corresponding mismatch lifetimes of terminal C:CH+ pairs in the d(TC2) i-motif quadruplex (129). This could reflect the contributions of the T3A loop to the stability of this outer C:CH+ pair in the d(m5CCT3AC2) i-motif quadruplex. By contrast, the mismatch lifetime is three orders of magnitude longer at
Fig. 13.42. A view of the NMR-based solution structure of the d(m5CCT3AC2) i-motif quadruplex in acidic pH solution (137). One strand is shown using filled bonds and the other strand is shown with open bonds.
1 s at 15°C, with an activation energy of 100 kJ mol-1, for the inner m5C:CH+ pairs in the d(m5CCT3AC2) i-motif quadruplex (137). This number is comparable with those determined for the internal C:CH+ mismatch lifetimes in the d(TC2) i-motif quadruplex (129). The exchange characteristics of the thymine imino protons also suggest that the loop is closed by a Hoogsteen-like alignment involving the loop-closing T and A residues bridged by a bound water molecule.
The d(m5CCT4C2) sequence, where the A residue is replaced by T, also forms an i-motif quadruplex through dimerization, except that the loops are positioned on the same side of the i-motif (Fig. 13.41b) (137). This result emphasizes the striking change in folding topology of the i-motif quadruplexes associated with a switch in a single loop residue.
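The lifetimes and activation energies quoted above for the outer and inner pairs of d(m5CCT3AC2) can be used to gauge how strongly the opening kinetics depend on temperature. The sketch below assumes simple Arrhenius behaviour with a temperature-independent activation energy, which is an assumption made for illustration and not a result stated in (137). The function name and the choice of 25°C as the target temperature are likewise hypothetical.

```python
import math

R_GAS = 8.314  # gas constant, J mol^-1 K^-1


def lifetime_at(temp_c, tau_ref_s, ea_kj_mol, ref_temp_c=15.0):
    """Extrapolate a base-pair lifetime to another temperature.

    Assumes Arrhenius behaviour with constant activation energy:
    tau(T) = tau_ref * exp[(Ea / R) * (1/T - 1/T_ref)], T in kelvin.
    """
    t = temp_c + 273.15
    t_ref = ref_temp_c + 273.15
    exponent = (ea_kj_mol * 1000.0 / R_GAS) * (1.0 / t - 1.0 / t_ref)
    return tau_ref_s * math.exp(exponent)


# Reference values at 15 °C as quoted in the text for d(m5CCT3AC2)
outer_25 = lifetime_at(25.0, tau_ref_s=1e-3, ea_kj_mol=60.0)   # outer C:CH+ pairs
inner_25 = lifetime_at(25.0, tau_ref_s=1.0, ea_kj_mol=100.0)   # inner m5C:CH+ pairs

print(f"outer pairs at 25 °C: {outer_25 * 1e3:.2f} ms")
print(f"inner pairs at 25 °C: {inner_25:.2f} s")
```

Within this simplified model a 10°C rise shortens the outer-pair lifetime to somewhat under half of its 15°C value and the inner-pair lifetime to roughly a quarter, reflecting the larger activation energy of the inner pairs.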
4.2.2 Solution structure of the insulin minisatellite repeat d(C4TGTC4) i-motif quadruplex
The insulin minisatellite sequence located upstream of the human insulin gene (138) exhibits polymorphism in both repeat length and sequence. The pyrimidine-rich d(C4ACAC4TGT)n strand contains C4 segments in the repeat element. A combined NMR and molecular dynamics study has been undertaken to define the solution structure of the d(C4TGTC4) domain at acidic pH (139). The folding topology reflects formation of an i-motif quadruplex through dimerization of fold-back segments, with the TGT turns positioned at opposite ends of the twofold symmetric quadruplex. The pH dependence of i-motif quadruplex formation exhibits an apparent pKa of 6.5. There is some concern about the robustness of the refinements based on the listed statistics for the refined structures of the d(C4TGTC4) i-motif quadruplex. Thus, the five refined structures exhibit an unusually large number of NOE violations (42 violations, > 0.5 Å and < 1.0 Å) (139) and this discrepancy needs further clarification.
4.2.3 Solution structure of the centromeric α satellite repeat d(TC3GT3C2A) i-motif quadruplex
The centromeric CENP-B protein is known to target the d(TC3GT3C2A2CGA2G)n box repeat of α satellite DNA located at the centromeric regions of human chromosomes (140). The NMR-based solution structure of the d(TC3GT3C2A) sequence at acidic pH has been determined to high resolution and shown to form an i-motif quadruplex through dimerization of a pair of fold-back segments, with the GT3 turns positioned on the same side of the twofold symmetric quadruplex (141). The two hairpin turns positioned at one end of the i-motif quadruplex interact with each other through formation of a novel T:G:G:T tetrad.
This T:G:G:T tetrad alignment involves the dimerization of two wobble G:T pairs through pairing of their guanine minor groove edges, as shown schematically in Fig. 13.43. This structure exhibits excellent
Fig. 13.43. A schematic drawing of the T:G:G:T tetrad pairing alignment observed in the NMR-based solution structure of the d(TC3GT3C2A) i-motif quadruplex at acidic pH (141).
refinement statistics with both low pairwise rmsd values (0.441 ± 0.14 Å) and a low number of NOE violations (two violations, > 0.2 Å) (141).
4.3 Intramolecularly folded i-motif quadruplexes
Several groups have attempted to generate intramolecularly folded i-motif quadruplexes from DNA sequences containing four Cn repeats under acidic pH conditions (139, 141-145). The structural characterization of such an intramolecularly folded i-motif represents a challenge because of complications from conformational heterogeneity. Initially, some progress was made on the human telomere d[(C3TA2)3C3] i-motif quadruplex system (142). More recently, a high resolution structure of the intramolecularly folded d(m5CCT3C2T3AC2T3C2) i-motif quadruplex has been solved (146). These results are summarized briefly below.
4.3.1 Human telomere d[(C3TA2)3C3] i-motif quadruplex
Two groups have recently investigated the potential formation of intramolecularly folded i-motif quadruplexes by the human telomere d(C3TA2)4 sequence and its variants under acidic pH conditions (142, 143). One of these groups used UV absorbance melting curves, chemical modification, and non-denaturing gel electrophoresis to monitor the folded state of d(C3TA2)4 at acidic pH (143). The other group used UV absorbance and gel filtration and, in addition, monitored the characteristic NOE patterns to establish intramolecular i-motif quadruplex formation for d[(C3TA2)3C3] at acidic pH (142). The NMR resonances were marginally resolved and appear to contain more than one folded conformer for d[(C3TA2)3C3] at acidic pH (142). Hence, current efforts are focused on designing variants of the human telomere d(C3TA2)4 sequence, with the aim of obtaining improved NMR spectra corresponding to a single conformation necessary for a high resolution structure determination of an intramolecularly folded i-motif quadruplex.
4.3.2 Solution structure of the d(m5CCT3C2T3AC2T3C2) i-motif quadruplex
Thermal denaturation, gel filtration, and NMR studies have also been used to demonstrate formation of an intramolecularly folded i-motif quadruplex by the d(C2T3C2T4C2T3C2) sequence at acidic pH (145). Thus, as few as eight cytosines and a total of four intercalated C:CH+ mismatch pairs are sufficient to form an intramolecular i-motif quadruplex.
A significant step forward in our understanding of the i-motif quadruplex has resulted from recent NMR studies of the d(m5CCT3C2T3AC2T3C2) sequence at neutral pH (146). The NMR parameters are consistent with formation of an intramolecularly folded i-motif quadruplex, with the folding topology shown in Fig. 13.44. A view of the solution structure of the d(m5CCT3C2T3AC2T3C2) i-motif quadruplex is shown in Fig. 13.45. This structure is formed at neutral pH with a pKa of 7.45 for the midpoint of the transition.
This i-motif quadruplex structure contains four contiguously stacked C:CH+ mismatch pairs capped at one end by a T3A loop that spans the wide groove and at the other end by two spatially proximal T3 loops that span the two narrow grooves (Fig. 13.44). The stacking within the C:CH+ i-motif is extended in one direction by
Fig. 13.44. A schematic of the NMR solution structure-based folding topology adopted by the intramolecular i-motif quadruplex formed by the d(m5CCT3C2T3AC2T3C2) sequence (146). Reproduced with permission of J. Mol. Biol.
a propeller-twisted reverse Hoogsteen T:A mismatch pair and in the other direction by a T:T mismatch pair involving thymines from the spatially proximal T3 loops (Fig. 13.44) (146).
4.4 Potential biological relevance of the i-motif quadruplex
To date there is no direct evidence to support a biological role for the intercalated C:CH+ mismatch paired i-motif quadruplex. A primary concern is the requirement for acidic pH to favour i-motif quadruplex formation. The pKa for cytosine N3 protonation is 4.3 at the monomer level, but this pKa increases to 6.5 for several of the i-motif quadruplexes studied to date (137,139). However, the intramolecularly folded d(m5CCT3C2T3AC2T3C2) i-motif quadruplex exhibits a pKa of 7.45, consistent with i-motif formation at neutral pH (146). Indeed, crystals of the four-stranded i-motif quadruplex can be grown from solutions at pH values up to 7.5 (133,134). These results suggest that the requirement for slightly acidic pH conditions may not be an issue for intramolecularly folded i-motif quadruplexes (146) and could also be overcome by other factors such as superhelical stress or complex formation with potential proteins that target the i-motif quadruplex.
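The practical meaning of the three pKa values quoted above can be illustrated with the Henderson-Hasselbalch relation. The sketch below treats each case as a single, non-cooperative titrating site, which is a deliberate simplification: the apparent pKa values for the quadruplexes are transition midpoints, and real i-motif folding transitions may be considerably sharper. The function name, the dictionary labels, and the choice of pH 7.0 are illustrative assumptions.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a single titrating site that is protonated at a given pH.

    Henderson-Hasselbalch for a non-cooperative site:
    f = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# pKa values quoted in the text; the labels are descriptive only
PKA_VALUES = {
    "cytosine N3, monomer": 4.3,
    "intermolecular i-motifs (apparent)": 6.5,
    "intramolecular d(m5CCT3C2T3AC2T3C2) (apparent)": 7.45,
}

NEUTRAL_PH = 7.0  # illustrative choice of pH
for label, pka in PKA_VALUES.items():
    f = fraction_protonated(NEUTRAL_PH, pka)
    print(f"{label}: {f:.3f} protonated at pH {NEUTRAL_PH}")
```

Within this simplified picture, a site with a pKa of 4.3 is almost entirely unprotonated at pH 7, whereas a transition midpoint of 7.45 leaves the protonated (folded) form in the majority, in line with the argument that strictly acidic conditions may not be required for the intramolecular fold.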
Fig. 13.45. A view of the NMR-based solution structure of the i-motif quadruplex d(m5CCT3C2T3AC2T3C2) (137). Alternate C:CH+ pairs are shown by filled and open bonds. The loop segments are shown by hatched bonds. The bases of residues T3, T4, T9, T10, T14, and T15 have been deleted in the interests of clarity.
The i-motif could have potential therapeutic efficacy based on the ability of phosphodithioate dCn to inhibit HIV-1 integrase (147). A protein has been identified that binds to the vertebrate cytosine-rich telomeric d(C3TA2)n sequence (148). However, no protein that binds with high specificity and affinity to the i-motif quadruplex has been isolated to date, and perhaps more time is needed to pursue this goal given that the i-motif (15) was only discovered five years ago.
5 Future directions
DNA quadruplexes have the potential to play a critical role in self-recognition involving systems ranging from chromosomal pairing to recombination. The repertoire of tetrad alignments is currently limited, with the emphasis on G:G:G:G and G:C:G:C
alignments. Future efforts should be directed towards the identification and characterization of potential A:T:A:T and G:A:G:A tetrad alignments and the identification of sequence contexts that favour such pairing alignments. The success associated with the interdigitated, reversed protonated C:C mismatch pair-stabilized i-motif quadruplex formation could possibly be extended to the identification and characterization of potential i-motifs containing reversed protonated A:C and reversed A:A mismatch pairs. It should also be possible to extend the limited repertoire of base triad alignments by designing sequences where potential base triad alignments are stabilized through stacking with adjacent G:G:G:G tetrads.
There is a critical need to characterize structurally G quadruplexes and i-motif quadruplexes complexed with ligands ranging from small molecules to saccharides, peptides, and proteins. The diversity associated with the four grooves of different dimensions, together with novel loop folding topologies in the case of intramolecularly folded quadruplexes, makes these higher order nucleic acid architectures attractive targets for therapeutic intervention.
Coordinates deposition
We have prepared tables listing the structures discussed in this chapter with currently available PDB (Protein Data Bank) accession numbers for deposited coordinates. The accession numbers for guanine-rich G:G:G:G-containing G quadruplex-forming sequences, for guanine-rich G:C:G:C-containing (and related) quadruplex-forming sequences, and for cytosine-rich interdigitated C:CH+ mismatch-containing i-motif quadruplex-forming sequences are listed in Tables 13.1-13.3.
Table 13.1. A listing of NMR and X-ray based structures of guanine-rich G:G:G:G-containing G quadruplex-forming sequences along with PDB accession number for deposited coordinates

Section  Sequence                           Conditions             Ref  Accession no.
2.1.1    d(T2G4T)                           Na+, solution          35   139d
2.1.3    d(TG4T)                            Na+, crystal, 0.95 Å   40   352d
2.1.4    r(UG4U)                            K+, solution           42   1rau
2.2.1    d(G4T4G4)                          K+, crystal, 2.5 Å     45   1d59
2.2.2    d(G2T2G2TGTG2T2G2)                 K+, solution           51   148d
         d(G2T2G2TGTG2T2G2)                 Na+, K+, solution      52   1qdf
         d(G2T2G2TGTG2T2G2) plus thrombin   Na+, crystal, 2.9 Å    54   1hut
2.2.3    d[AG3(T2AG3)3]                     Na+, solution          59   143d
2.3.2    d(G4T4G4)                          Na+, solution          47   156d
2.3.4    d[G4(T4G4)3]                       Na+, solution          61   201d
2.3.5    d[G4(T4G4)3]                       Na+, solution          62   230d
2.3.8    d(G3T4G3)                          Na+, solution          76   1fqp
2.4.1    d(T2G4)4                           Na+, solution          77   186d
2.5.1    d(TAG2)                            Na+, solution          78   -
Table 13.2. A listing of NMR and X-ray based structures of guanine-rich G:C:G:C-containing (and related) quadruplex-forming sequences along with PDB accession numbers for deposited coordinates

Section  Sequence        Conditions                 Ref  Accession no.
3.1.1    d(GCG2T3GCG2)   Na+, solution              108  1a6h
3.1.2    d(G3CT4G3C)     Na+, solution              116  1a8n
3.1.3    d(G3CT4G3C)     K+, solution               117  1a8w
3.2.1    d(GCATGCT)      Li+, Mg2+, X-ray, 1.8 Å    118  184d
3.2.2    d               Na+, Ba2+, X-ray           119  284d
Table 13.3. A listing of NMR and X-ray based structures of cytosine-rich interdigitated C:CH+ mismatch-containing i-motif quadruplex-forming sequences along with PDB accession numbers for deposited coordinates

Section  Sequence                Conditions                    Ref  Accession no.
4.1.1    d(TC5)                  Solution, low pH              15   225d
4.1.2    d(TC2)                  Solution, low pH              129  105d
         d(m5CCT)                Solution, low pH              129  106d
4.1.3    d(m5CCTC2)              Solution, low pH              130  1rme
4.1.5    d(C4)                   Crystal, 2.3 Å                131  190d
         d(C3T)                  Crystal, 1.4 Å                132  191d
4.1.6    d(TA2C3)                Crystal, 1.9 Å, pH 5.5-7.5    133  200d
         d(C3A2T)                Crystal, 2.0 Å, pH 5.0-7.5    134  241d
4.2.1    d(m5CCT3AC2)            Solution, low pH              137  1bae
4.3.2    d(m5CCT3C2T3AC2T3C2)    Solution, neutral pH          146  1a83
Acknowledgements
The DNA quadruplex research in our laboratory is funded by NIH grant GM 34504. We thank Drs R. Ajay Kumar and Andrey Gorin for helpful discussions. We thank Drs Jean-Louis Leroy and Maurice Gueron of the Ecole Polytechnique, Palaiseau, France, for providing a preprint and the coordinates of their solution structure of the intramolecularly folded i-motif quadruplex (146) prior to publication.
References
1. Sun, J.S. and Helene, C. (1993) Curr. Opin. Struct. Biol. 3, 345.
2. Radhakrishnan, I. and Patel, D.J. (1994) Biochemistry 33, 11405.
3. Plum, G.E., Pilch, D.S., Singleton, S.F. and Breslauer, K.J. (1995) Annu. Rev. Biophys. Biomol. Struct. 24, 319.
4. Rhodes, D. and Giraldo, R. (1995) Curr. Opin. Struct. Biol. 5, 311.
5. Pilch, D.S., Plum, G.E. and Breslauer, K.J. (1995) Curr. Opin. Struct. Biol. 5, 334.
6. Lilley, D.M.J. and Clegg, R.M. (1993) Annu. Rev. Biophys. Biomol. Struct. 22, 299.
7. Altona, C., Pikkemaat, J.A. and Overmars, F.J.J. (1996) Curr. Opin. Struct. Biol. 6, 305.
8. Gellert, M., Lipsett, M.N. and Davies, D.R. (1962) Proc. Natl. Acad. Sci. USA 48, 2013.
9. Henderson, E.R., Moore, M. and Malcolm, B.A. (1990) Biochemistry 29, 732.
10. Sen, D. and Gilbert, W. (1988) Nature 334, 364.
11. Sundquist, W.I. and Klug, A. (1989) Nature 342, 825.
12. Williamson, J.R., Raghuraman, M.K. and Cech, T.R. (1989) Cell 59, 871.
13. Guschlbauer, W., Chantot, J.F. and Thiele, D. (1990) J. Biomol. Struct. Dynamics 8, 491.
14. Williamson, J.R. (1994) Annu. Rev. Biophys. Biomol. Struct. 23, 703.
15. Gehring, K., Leroy, J.-L. and Gueron, M. (1993) Nature 363, 561.
16. Blackburn, E.H. and Szostak, J.W. (1984) Annu. Rev. Biochem. 53, 163.
17. Yu, G.L., Bradley, J.D., Attardi, L.D. and Blackburn, E.H. (1990) Nature 344, 126.
18. Arnott, S., Chandrasekaran, R. and Marttila, C.M. (1974) Biochem. J. 141, 537.
19. Zimmerman, S.B., Cohen, G.H. and Davies, D.R. (1975) J. Mol. Biol. 92, 181.
20. Sasisekharan, V., Zimmerman, S.B. and Davies, D.R. (1975) J. Mol. Biol. 92, 171.
21. Pinnavaia, T.J., Marshall, C.L., Mettler, C.M., Fisk, C.L., Miles, H.T. and Becker, E.D. (1978) J. Am. Chem. Soc. 100, 3625.
22. Howard, F.B. and Miles, H.T. (1982) Biochemistry 21, 6736.
23. Hardin, C.C., Henderson, E., Watson, T. and Prosser, J.K. (1991) Biochemistry 30, 4460.
24. Xu, Q., Deng, H. and Braunlin, W.H. (1993) Biochemistry 32, 13130.
25. Sen, D. and Gilbert, W. (1990) Nature 344, 410.
26. Hardin, C.C., Watson, T., Corregan, M. and Bailey, C. (1992) Biochemistry 32, 833.
27. Miura, T., Benevides, J.M. and Thomas, G.J., Jr (1995) J. Mol. Biol. 248, 233.
28. Williamson, J.R. (1993) Curr. Opin. Struct. Biol. 3, 357.
29. Henderson, E., Hardin, C.C., Walk, S.K., Tinoco, I., Jr and Blackburn, E.H. (1987) Cell 51, 899.
30. Wang, Y. and Patel, D.J. (1992) Biochemistry 31, 8112.
31. Aboul-ela, F., Murchie, A.I.H. and Lilley, D.M. (1992) Nature 360, 280.
32. Jin, R., Gaffney, B.L., Wang, C., Jones, R.A.
and Breslauer, K.J. (1992) Proc. Natl. Acad. Sci. USA 89, 8832.
33. Sen, D. and Gilbert, W. (1992) Biochemistry 31, 65.
34. Sen, D. and Gilbert, W. (1991) Curr. Opin. Struct. Biol. 1, 435.
35. Wang, Y. and Patel, D.J. (1993) J. Mol. Biol. 234, 1171.
36. Gupta, G., Garcia, A.E., Guo, Q., Lu, M. and Kallenbach, N.R. (1993) Biochemistry 32, 7098.
37. Aboul-ela, F., Murchie, A.I.H., Norman, D.G. and Lilley, D.M. (1994) J. Mol. Biol. 243, 458.
38. Guo, Q., Lu, M. and Kallenbach, N.R. (1993) Biochemistry 32, 3596.
39. Laughlan, G., Murchie, A.I., Norman, D.G., Moore, M.H., Moody, P.C., Lilley, D.M. and Luisi, B. (1994) Science 265, 520.
40. Phillips, K., Dauter, Z., Murchie, A.I.H., Lilley, D.M.J. and Luisi, B. (1997) J. Mol. Biol. 273, 171.
41. Kim, J., Cheong, C. and Moore, P.B. (1991) Nature 351, 331.
42. Cheong, C. and Moore, P.B. (1992) Biochemistry 31, 8406.
43. Marsh, T.C. and Henderson, E. (1994) Biochemistry 33, 10718.
44. Marsh, T.C., Vesenka, J. and Henderson, E. (1995) Nucl. Acids Res. 23, 696.
45. Kang, C., Zhang, X., Ratliff, R., Moyzis, R. and Rich, A. (1992) Nature 356, 126.
46. Smith, F.W. and Feigon, J. (1992) Nature 356, 164.
47. Schultze, P., Smith, F.W. and Feigon, J. (1994) Structure 2, 221.
48. Bock, L.C., Griffin, L.C., Latham, J.A., Vermaas, E.H. and Toole, J.J. (1992) Nature 355, 564.
49. Macaya, R.F., Schultze, P., Smith, F.W., Roe, J.A. and Feigon, J. (1993) Proc. Natl. Acad. Sci. USA 90, 3745.
50. Wang, K.Y., McCurdy, S., Shea, R.G., Swaminathan, S. and Bolton, P.H. (1993) Biochemistry 32, 1899.
51. Schultze, P., Macaya, R.F. and Feigon, J. (1994) J. Mol. Biol. 235, 1532.
52. Wang, K.Y., Krawczyk, S.H., Bischofberger, N., Swaminathan, S. and Bolton, P.H. (1993) Biochemistry 32, 11285.
53. Marathias, V.M., Wang, K.Y., Kumar, S., Pham, T.Q., Swaminathan, S. and Bolton, P.H. (1996) J. Mol. Biol. 260, 378.
54. Padmanabhan, K., Padmanabhan, K.P., Ferrara, J.D., Sadler, J.E. and Tulinsky, A. (1993) J. Biol. Chem. 268, 17651.
55. Kelly, J.A., Feigon, J. and Yeates, T.O. (1996) J. Mol. Biol. 256, 417.
56. Hammond-Kosack, M.C., Dobrinski, B., Lurz, R., Docherty, K. and Kilpatrick, M.W. (1992) Nucl. Acids Res. 20, 231.
57. Catasti, P., Chen, X., Moyzis, R.K., Bradbury, E.M. and Gupta, G. (1996) J. Mol. Biol. 264, 534.
58. Wang, Y., de los Santos, C., Gao, X., Greene, K., Live, D. and Patel, D.J. (1991) J. Mol. Biol. 222, 819.
59. Wang, Y. and Patel, D.J. (1993) Structure 1, 263.
60. Wang, K.Y., Swaminathan, S. and Bolton, P.H. (1994) Biochemistry 33, 7517.
61. Wang, Y. and Patel, D.J. (1995) J. Mol. Biol. 251, 76.
62. Smith, F.W., Schultze, P. and Feigon, J. (1995) Structure 3, 997.
63. Sundquist, W.I. and Heaphy, S. (1993) Proc. Natl. Acad. Sci. USA 90, 3393.
64. Skripkin, E., Paillart, J.-C., Marquet, R., Ehresmann, B. and Ehresmann, C. (1994) Proc. Natl. Acad. Sci. USA 91, 4945.
65. Paillart, J.-C., Skripkin, E., Ehresmann, B., Ehresmann, C. and Marquet, R. (1996) Proc. Natl. Acad. Sci. USA 93, 5572.
66. Christiansen, J., Kofod, M. and Nielsen, F.C. (1994) Nucl. Acids Res. 22, 5709.
67. Smith, F.W. and Feigon, J. (1993) Biochemistry 32, 8682.
68. Murchie, A.I. and Lilley, D.M. (1994) EMBO J. 13, 993.
69. Balagurumoorthy, P. and Brahmachari, S.K. (1994) J. Biol. Chem. 269, 21858.
70. Jin, R., Breslauer, K.J., Jones, R.A.
and Gaffney, B.L. (1990) Science 250, 543.
71. Hud, N.V., Smith, F.W., Anet, F.A.L. and Feigon, J. (1996) Biochemistry 35, 15383.
72. Bouaziz, S. and Patel, D.J. (1998) submitted.
73. Scaria, P.V., Shire, S.J. and Shafer, R.H. (1992) Proc. Natl. Acad. Sci. USA 89, 10336.
74. Strahan, G.D., Shafer, R.H. and Keniry, M.A. (1994) Nucl. Acids Res. 22, 5447.
75. Smith, F.W., Lau, F.W. and Feigon, J. (1994) Proc. Natl. Acad. Sci. USA 91, 10546.
76. Keniry, M.A., Strahan, G.D., Owen, E.A. and Shafer, R.H. (1995) Eur. J. Biochem. 233, 631.
77. Wang, Y. and Patel, D.J. (1994) Structure 2, 1141.
78. Kettani, A., Bouaziz, S., Wang, W., Jones, R.A. and Patel, D.J. (1997) Nature Struct. Biol. 4, 382.
79. Kuryavyi, V.V. and Jovin, T.M. (1995) Nature Genetics 9, 339.
80. Cate, J.H., Gooding, A.R., Podell, E., Zhou, K., Golden, B.L., Szewczak, A.A., Kundrot, C.E., Cech, T.R. and Doudna, J.A. (1996) Science 273, 1696.
81. Guo, Q., Garcia, A.E., Guo, Q., Lu, M. and Kallenbach, N.R. (1993) Biochemistry 31, 2451.
82. Chen, Q., Kuntz, I.D. and Shafer, R.H. (1996) Proc. Natl. Acad. Sci. USA 93, 2635.
83. Li, Y., Geyer, C.R. and Sen, D. (1996) Biochemistry 35, 6911.
84. Li, Y. and Sen, D. (1997) Biochemistry 36, 5589.
85. Wyatt, J.R., Vickers, T.A., Roberson, J.L., Buckheit, R.W., Jr, Klimkait, T., DeBaets, E., Davis, P.W., Rayner, B., Imbach, J.L. and Ecker, D.J. (1994) Proc. Natl. Acad. Sci. USA 91, 1356.
86. Rando, R.F., Ojwang, J., Elbaggari, A., Reyes, G.R., Tinder, R., McGrath, M.S. and Hogan, M.E. (1995) J. Biol. Chem. 270, 1754.
87. Bishop, J.S., Guy-Caffey, J.K., Ojwang, J.O., Smith, S.R., Hogan, M.E., Cossum, P.A., Rando, R.F. and Chaudhary, M. (1996) J. Biol. Chem. 271, 5698.
88. Mazumdar, A.D., Neamati, N., Ojwang, J.O., Sunder, S., Rando, R.F. and Pommier, Y. (1996) Biochemistry 5, 13762.
89. Jing, N., Gao, X., Rando, R.F. and Hogan, M.E. (1997) J. Biomol. Struct. Dynamics 15, 573.
90. Jing, N., Rando, R.F., Pommier, Y. and Hogan, M.E. (1997) Biochemistry 36, 12498.
91. Fang, G. and Cech, T.R. (1993) Biochemistry 32, 11646.
92. Fang, G. and Cech, T.R. (1993) Cell 4, 875.
93. Giraldo, R. and Rhodes, D. (1994) EMBO J. 13, 2411.
94. Walsh, K. and Gualberto, A. (1992) J. Biol. Chem. 267, 13714.
95. Weisman-Shomer, P. and Fry, M. (1993) J. Biol. Chem. 268, 3306.
96. Schierer, T. and Henderson, E. (1994) Biochemistry 33, 2240.
97. Liu, Z. and Gilbert, W. (1994) Cell 77, 1083.
98. Frantz, J.D. and Gilbert, W. (1995) J. Biol. Chem. 270, 9413.
99. Bashkirov, V.I., Scherthan, H., Solinger, J.A., Buerstedde, J.-M. and Heyer, W.-D. (1997) J. Cell Biol. 136, 761.
100. Murchie, A.I. and Lilley, D.M. (1992) Nucl. Acids Res. 20, 49.
101. Simonsson, T., Pechinka, P. and Kubista, M. (1998) Nucl. Acids Res. 26, 1167.
102. Zahler, A.M., Williamson, J.R., Cech, T.R. and Prescott, D.M. (1991) Nature 350, 718.
103. Caskey, C.T., Pizzuti, A., Fu, Y.H., Fenwick, R.G. and Nelson, D.L. (1992) Science 256, 784.
104. Sinden, R.R. and Wells, R.D. (1992) Curr. Opin. Biotech. 3, 612.
105. Nelson, D.L. (1995) Sem. Cell. Biol. 6, 5.
106. Sutherland, G.R. and Richards, R.I. (1995) Proc. Natl. Acad. Sci. USA 92, 3636.
107. Fry, M. and Loeb, L.A. (1994) Proc. Natl. Acad. Sci. USA 91, 4950.
108. Kettani, A., Kumar, R.A. and Patel, D.J. (1995) J. Mol.
Biol. 254, 638.
109. Brunger, A. (1992) X-PLOR. A System for X-ray Crystallography and NMR. Yale University Press, New Haven.
110. Nilges, M., Habazettl, J., Brunger, A.T. and Holak, T.A. (1991) J. Mol. Biol. 219, 499.
111. Nilges, M. (1995) J. Mol. Biol. 245, 645.
112. O'Brien, E.J. (1967) Acta Cryst. 23, 92.
113. McGavin, S. (1971) J. Mol. Biol. 55, 293.
114. Mitas, M., Yu, A., Dill, J. and Haworth, I.S. (1995) Biochemistry 34, 12803.
115. Berns, K.I. and Linden, R.M. (1995) Bioessays 17, 237.
116. Kettani, A., Bouaziz, S., Gorin, A., Zhao, H., Jones, R. and Patel, D.J. (1998) J. Mol. Biol. 282, 619.
117. Bouaziz, S., Kettani, A. and Patel, D.J. (1998) J. Mol. Biol. 282, 637.
118. Leonard, G.A., Zhang, S., Peterson, M.R., Harrop, S.J., Helliwell, J.R., Cruse, W.B., d'Estaintot, B.L., Kennard, O., Brown, T. and Hunter, W.N. (1995) Structure 3, 335.
119. Salisbury, S.A., Wilson, S.E., Powell, H.R., Kennard, O., Lubini, P., Sheldrick, G.M., Escaja, N., Alazzouzi, E., Granada, A. and Pedroso, E. (1997) Proc. Natl. Acad. Sci. USA 94, 5515.
120. Shiber, M.C., Braswell, E.H., Klump, H. and Fresco, J.R. (1996) Nucl. Acids Res. 24, 5004.
121. Hansen, R.S., Gartler, S.M., Scott, C.R., Chen, S.H. and Laird, C.D. (1992) Hum. Mol. Genet. 1, 571.
122. Hansen, R.S., Canfield, T.K., Lamb, M.M., Gartler, S.M. and Laird, C.D. (1993) Cell 73, 1403.
123. Langridge, R. and Rich, A. (1963) Nature 298, 725.
124. Hartman, K.A. and Rich, A. (1965) J. Am. Chem. Soc. 87, 2033.
125. Cruse, W.B., Egert, E., Kennard, O., Sala, G.B., Salisbury, S.A. and Viswamitra, M.A. (1983) Biochemistry 12, 1833.
126. Robinson, H., van der Marel, G., van Boom, J.H. and Wang, A.H. (1992) Biochemistry 31, 10510.
127. Wang, Y. and Patel, D.J. (1994) J. Mol. Biol. 242, 508.
128. Leroy, J.-L., Gehring, K., Kettani, A. and Gueron, M. (1993) Biochemistry 32, 6019.
129. Leroy, J.-L. and Gueron, M. (1995) Structure 3, 101.
130. Nonin, S. and Leroy, J.-L. (1996) J. Mol. Biol. 261, 399.
131. Chen, L., Cai, L., Zhang, X. and Rich, A. (1994) Biochemistry 33, 13540.
132. Kang, C.-H., Berger, I., Lockshin, C., Ratliff, R., Moyzis, R. and Rich, A. (1994) Proc. Natl. Acad. Sci. USA 91, 11636.
133. Kang, C.-H., Berger, I., Lockshin, C., Ratliff, R., Moyzis, R. and Rich, A. (1995) Proc. Natl. Acad. Sci. USA 92, 3874.
134. Berger, I., Kang, C.-H., Fredian, A., Ratliff, R., Moyzis, R. and Rich, A. (1995) Nature Struct. Biol. 2, 416.
135. Rohozinski, J., Hancock, J.M. and Keniry, M.A. (1994) Nucl. Acids Res. 22, 4653.
136. Ahmed, S. and Henderson, E. (1992) Nucl. Acids Res. 20, 507.
137. Nonin, S., Phan, A.T. and Leroy, J.-L. (1997) Structure 5, 1231.
138. Bell, G.I., Karam, J.H. and Rutter, W.J. (1981) Proc. Natl. Acad. Sci. USA 78, 5759.
139. Catasti, P., Chen, X., Deaven, L.L., Moyzis, R.K., Bradbury, E.M. and Gupta, G. (1997) J. Mol. Biol. 272, 369.
140. Masumoto, H., Masukata, H., Muro, Y., Nozaki, N. and Okazaki, T. (1989) J. Cell. Biol. 109, 1963.
141. Gallego, J., Chou, S.-H. and Reid, B.R. (1997) J. Mol. Biol. 273, 840.
142. Leroy, J.-L., Gueron, M., Mergny, J.-L. and Helene, C. (1994) Nucl. Acids Res. 22, 1600.
143. Ahmed, S., Kintanar, A. and Henderson, E. (1994) Nature Struct. Biol. 1, 83.
144. Manzini, G.
, Yathindra, N. an d Xodo, L.E . (1994) Nud. Adds Res. 22, 4634 . 145. Mergny , J.-L., Lacroix , L. , Han , X. , Leroy , J.-L. an d Helene , C . (1995 ) J. Am. Chem. Soc. 117 , 8887 . 146. Han , X., Leroy , J.-L. an d Gueron, M . (1998 ) J. Mol. Biol. 278, 949 . 147. Marshall , W.S., Beaton , G. , Stein , C . A. , Matsukura , M. an d Caruthers , M.H . (1992 ) Proc. Natl. Acad. Sci. USA 89 , 6265 . 148. Marsich , E., Piccini , A., Xodo, L.E. and Manzini, G. (1996 ) Nud. Acids Res. 24, 4029 .
14 DNA bending by adenine-thymine tracts
Donald M. Crothers¹ and Zippora Shakked²
¹Department of Chemistry, Yale University, New Haven, CT 06520, USA
²Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
1. Global and spectroscopic properties of DNA curvature induced by A-tracts
1.1 Identification of A-tracts as the primary source of DNA curvature
Fifteen years have elapsed since the observations of Marini et al. (1), which associated DNA bending or curvature with the anomalously slow electrophoretic mobility and fast overall rotational relaxation observed for DNA restriction fragments from the kinetoplast body of Leishmania tarentolae. Confirmation of increased curvature soon followed, using techniques such as electric birefringence decay (2) and electron microscopy (3). The sequences responsible for bending were identified as tracts of oligo (dA):oligo (dT), each about half a helical turn long, repeated in phase with the DNA helical screw (4). The experiment on which this conclusion was based relied on the slow electrophoretic mobility of a molecule containing a bend at its centre, compared with a circularly permuted sequence variant in which the bend is at the end. A simple rule of thumb in interpreting such experiments is that the shorter the end-to-end distance in molecules of equal contour length, the slower the electrophoretic mobility (5). Gel electrophoretic methods for characterizing DNA bending have been reviewed by Crothers and Drak (6).
The importance of the observed phasing of the A-tracts was confirmed by experiments that compared the mobilities of DNA ligation ladders containing A-tracts at variable phasings (7,8). Repetition of A-tracts in phase with the helical repeat of DNA causes their effects to be additive, leading to a circular shape; if the phase match is only approximate, a left- or right-handed superhelix results, which is of higher mobility than the planar circle. Since repetition of A-tracts at 1.5 helical turn phasing results in no observed curvature, hyperflexibility in a plane associated with A-tracts can be ruled out as a source of the electrophoretic anomaly and fast rotational relaxation (8).
The dominance of A-tracts as the primary source of DNA curvature is indicated by experiments such as two-dimensional gel electrophoresis (9) and selection amplification (10), both of which yielded a number of molecules containing phased A-tracts. The latter experiments also assigned a role to C-A (T-G) dinucleotide steps in conferring reduced electrophoretic mobility. Recent amplification experiments starting with genomic DNA and selecting for molecules easily bent to form nucleosome core particles also revealed a role for repeated C-A steps, and for short (n = 3–4) A-tracts (11).
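The phasing argument can be made concrete with a simple planar vector sum. The sketch below is illustrative only and is not the analysis used in the cited experiments: it treats each A-tract as a small bend whose direction rotates with the helical screw, and it assumes six tracts, a bend of 18° per tract (one of the estimates quoted in section 1.5), and first-order additivity of bends. The function name `net_bend` and all parameter values are hypothetical choices.

```python
import cmath
import math


def net_bend(n_tracts, bend_deg, phasing_turns):
    """First-order estimate of the net in-plane bend, in degrees.

    Each tract contributes a bend vector of length bend_deg whose
    direction advances by 360 * phasing_turns degrees per tract.
    """
    total = 0j
    for k in range(n_tracts):
        direction = 2 * math.pi * phasing_turns * k
        total += bend_deg * cmath.exp(1j * direction)
    return abs(total)


# Assumed illustration values: six tracts, 18 degrees per tract.
in_phase = net_bend(6, 18.0, 1.0)       # one tract per helical turn
out_of_phase = net_bend(6, 18.0, 1.5)   # one tract per 1.5 helical turns
near_phase = net_bend(6, 18.0, 1.05)    # slightly mismatched phasing

print(f"in phase:     {in_phase:.1f} degrees")
print(f"1.5 turns:    {out_of_phase:.1f} degrees")
print(f"near phase:   {near_phase:.1f} degrees")
```

Under these assumptions the in-phase bends add to the full 108°, the bends repeated every 1.5 helical turns cancel, and a slight phase mismatch lowers the in-plane sum. The model says nothing about the out-of-plane superhelical path itself, and a sum as large as 108° is outside the small-angle regime, so the numbers are only a qualitative guide.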
Given that systematic DNA bending or curvature is associated with A-tracts that are phased with the DNA helical repeat, the questions that remain can be divided into two categories: what are the global properties of the bend, specifically its direction and magnitude; and what is the structural basis for curvature at the molecular level? Earlier reviews of this general subject have been provided by Hagerman (12) and Crothers et al. (13).
1.2 Direction of A-tract bends
The first indication of the direction of the DNA bend induced by A-tracts was provided by the experiments of Koo et al. (8), who measured the mobility of molecules in which A-tracts alternated with T-tracts and compared them with the values observed when all of the A-tracts were on the same strand. Since the mobilities were nearly the same, one can conclude that the overall direction of curvature of an A-tract is little affected by rotation about the pseudo-dyad axis that runs through the centre of the tract, thus interchanging the A and T strands. In other words, the vector that bisects the bend angle is parallel to the pseudo-dyad axis running between major and minor grooves at the centre of the A-tract. This result allowed Koo et al. (8) to conclude that the bend is towards either the major or the minor groove at the centre of the A-tract. Based on fibre diffraction studies of poly (dA):poly (dT) (see below), they proposed a model in which the bend is towards the minor groove at a locus at or near the centre of the A-tract. The structural basis for the bend cannot be established by these experiments, but they do allow exclusion of specific models, for example, that the bend in solution is due to the large roll angle immediately adjacent to the A-tract, as observed in crystals and NMR structures (see below). However, models with positive roll distributed over the adjacent base pairs, or negative roll in the A-tract, or tilt of appropriate sign at the junctions are consistent with these results.
Gel electrophoresis methods can be used to determine the direction of the A-tract bend by comparing the mobility of constructs in which the A-tracts are at variable phasings relative to a bend of known direction. When the two bends are in the same direction, the curvature is maximal, the end-to-end distance is minimized, and the mobility reaches a minimum (14,15). (There are some exceptions at high gel percentage; see ref. 16.) Zinkel and Crothers (14) used the DNA bend induced by E. coli CAP protein as a standard, and concluded that the A-tract bend is towards the minor, not the major, groove, at or near the centre of the A-tract.
1.3 Polarity and imperfect dyad symmetry of A-tracts
Many solution experiments indicate that the structure of A-tracts varies from the 5'- to the 3'-end, implying that the dyad symmetry deduced by Koo et al. (8) from A- and T-tract interchange is imperfect. The experiments of Koo et al. suggested imperfect symmetry, but the observations of Hagerman (17) were decisive: multimers of the form (A₄T₄N₂)ₙ are highly curved, whereas those of the form (T₄A₄N₂)ₙ are straight. This is not consistent with a fully dyad symmetric structure for an A-tract; the structural basis for the difference in curvature might reside in the different characteristics of the central base pair steps, A-T versus T-A (see below).
Other solution experiments supporting an imperfect dyad include the hydroxyl radical footprinting results of Burkhoff and Tullius (18), which showed a progressive narrowing of the minor groove in the 5' to 3' direction. NMR experiments (19,20) revealed a steady shift towards lower field, totalling about one ppm, of the imino proton resonances in A-tracts, as one moves from the 5'- to the 3'-end of the tract. NOE measurements also provided evidence for narrowing of the A-tract minor groove over the first three base steps, followed by a region of approximately constant width in longer A-tracts (see below). The structural basis that gives rise to these observations remains a matter for conjecture, as discussed below.
1.4 Temperature dependence of A-tract structure and curvature
Early studies of A-tracts by electrophoretic methods revealed that the mobility anomaly is strongly reduced at elevated temperatures (21,22); reviewed by Breslauer (23). A premelting structural change in poly (dA):poly (dT) can be detected by UV absorbance (24) and CD spectroscopy (25). From the width of the transition curve, centred around 30–40°C, both groups estimated an apparent or van't Hoff enthalpy of about 20 kcal/mol. Chan et al. (26) used CD and scanning calorimetry to characterize the transition in a molecule containing phased A-tracts. They suggest that the transition follows a two-state model, since isoelliptic points are observed. The calorimetric result, about 4.4 kcal/mol per A-A dinucleotide step, together with their estimate of 16 kcal/mol for the van't Hoff enthalpy from the width of the CD transition curve, can be used to estimate a length of about 5 bp for the cooperative unit in the premelting transition. This result, together with data such as those reported by Haran and Crothers (27), shows unambiguously that formation of the aberrant A-tract structure is cooperative, with an entire A-tract of 5 bp undergoing the transition as an effective cooperative unit.
Recent temperature-dependent resonance Raman studies of the premelting structural transition in poly (dA):poly (dT) by Chan et al. (28) provide important evidence concerning the underlying physical phenomenon. They concluded from deconvolution of the resonance Raman spectrum that a thymine C4=O carbonyl stretching frequency, normally observed at 1684–1686 cm⁻¹ in poly (dA-dT) at 5 and 55°C, and in poly (dA):poly (dT) at 55°C, is anomalously red-shifted to about 1679 cm⁻¹ in poly (dA):poly (dT) at 5°C. A similar anomaly, although smaller in scale, is also observed for the temperature dependence of a vibrational mode assigned to the adenine amino group. These observations strongly favour associating the low-temperature form of poly (dA):poly (dT) with the A-tract structure having propeller twisted base pairs with bifurcated hydrogen bonds, which has been observed in crystals of molecules containing A-tracts (see below). Formation of an extra (bifurcated) hydrogen bond to the thymine carbonyl is consistent with the observed reduction in the force constant for its stretching vibration. Thus one can now, with considerably increased confidence, associate disappearance of this structural feature at elevated temperature with the premelting transition of poly (dA):poly (dT) and accompanying loss of DNA bending in solution.
Another temperature-dependent feature of A-tract structure is the downfield shifted position of the thymine imino proton resonances, particularly for those base
pairs near the 3'-end of the A-tract (the 5'-end of the T-tract) (19,20). This could arise from strengthening the N-H...N hydrogen bond in the propeller twisted state. The imino proton chemical shifts move progressively to higher field as temperature is increased in the range of the premelting transition. The observed narrow minor groove, particularly towards the 3'-end of the A-tract, is also a reasonable consequence of this structural feature.
Thus there is now persuasive solution spectroscopic evidence for associating the low-temperature, bent state of DNA containing A-tracts with the structure having propeller twisted base pairs in the A-tract. The cooperative and two-state character of the thermal transition means that the A-tract tends to convert as a unit into a structure lacking propeller twisting, presumably one that more closely resembles B-DNA. In order to yield a single imino proton resonance position for each base pair, the structures must equilibrate on a time-scale faster than 100 µs. However, the average extent of propeller twisting along the A-tract does not seem to be uniform in solution, but apparently increases from the 5'- to the 3'-end. Since the A-tract tends to act as a cooperative unit, this cannot be explained by a higher occupancy of the high-temperature state by base pairs at the 5'-end of the A-tract. It is more likely that the extent of propeller twisting in the low-temperature state in solution is greater for base pairs near the 3'-end of the A-tract than for those near the 5'-end.
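The estimate of a cooperative unit of about 5 bp quoted earlier in this section can be retraced with one line of arithmetic. The reading below is an interpretation, since the steps are not spelled out in the text: it assumes that the ratio of the van't Hoff enthalpy to the calorimetric enthalpy per step gives the number of dinucleotide steps that convert together, and that n steps span n + 1 base pairs. The symbols are introduced here for illustration only.

```latex
\[
n_{\text{steps}} \approx \frac{\Delta H_{\mathrm{vH}}}{\Delta H_{\mathrm{cal}}}
  = \frac{16\ \text{kcal/mol}}{4.4\ \text{kcal/mol per A-A step}} \approx 3.6,
\qquad
n_{\text{bp}} = n_{\text{steps}} + 1 \approx 4.6 \approx 5 .
\]
```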
1.5 Bend magnitude
Estimates of the extent of bending produced by phased A-tracts have varied from about 11° per tract using gel electrophoresis (29) to about 28° from the rate of cyclization in ligation ladder experiments (30). Measurement of rotational relaxation gave 18° (31), as did computer simulation of the experiments on the relative rate of cyclization versus dimerization of DNA fragments containing phased A-tracts at 25°C (32). The uncertainty in this angle is estimated to be about 10%. Comparative electrophoresis experiments reveal that the curvature is modulated by only about ±10% by changes in the nature of the DNA sequence between the A-tracts (33). The temperature dependence of bending, as well as the effects of ionic conditions, should also be taken into account when estimating the bend angle.
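To give a feel for what these bend angles imply, the following sketch converts a bend per tract into the number of perfectly phased tracts needed to close a planar circle, and the corresponding DNA length. It is a geometric illustration under stated assumptions, not a model of the cyclization experiments: it assumes one A-tract per helical turn, a helical repeat of 10.5 bp (a typical solution value that is not taken from this chapter), and no thermal fluctuations. The function names are hypothetical.

```python
def tracts_to_close_circle(bend_per_tract_deg):
    """Number of in-phase tracts whose bends sum to 360 degrees."""
    return 360.0 / bend_per_tract_deg


def circle_length_bp(bend_per_tract_deg, helical_repeat_bp=10.5):
    """Contour length in bp, assuming one tract per helical turn.

    helical_repeat_bp = 10.5 is an assumed value, not from the source.
    """
    return tracts_to_close_circle(bend_per_tract_deg) * helical_repeat_bp


# The three estimates of the bend per tract quoted in the text.
for bend in (11.0, 18.0, 28.0):
    n = tracts_to_close_circle(bend)
    length = circle_length_bp(bend)
    print(f"{bend:4.0f} deg/tract: {n:5.1f} tracts, about {length:4.0f} bp")
```

With 18° per tract the idealized circle closes after 20 tracts, or 210 bp under the assumed helical repeat; the spread between the 11° and 28° estimates changes that length by more than a factor of two.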
1.6 Structural evidence from NMR spectroscopy
Intrinsic curvature of DNA of the observed magnitude requires only small deviations from the normal B-DNA structure. For example, the observations could result from systematic roll of about -6° in the A-tracts or +6° in the DNA segment between the A-tracts. This amount of roll is approximately equal to the rms fluctuations in the angle between adjacent base planes that result from thermal motion. NMR structures reflect information contained in a set of proton-proton vectors, as well as scalar coupling constants. So far, the structural resolution that has been achieved for nucleic acids has not been sufficient to yield a definitive solution structure that explains the global curvature. However, the NMR data contain a number of interesting features that must ultimately be explained by a definitive structural model.
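The equivalence of the two roll patterns mentioned above can be checked with a first-order vector sum, in which each base pair step contributes its roll as a vector rotated by the cumulative helical twist. This is a minimal sketch under assumptions that are not from the source: exactly 10 steps per helical turn (36° twist per step), an A-tract occupying five consecutive steps, and small-angle additivity. The function name `bend_from_rolls` is hypothetical.

```python
import cmath
import math


def bend_from_rolls(rolls_deg, twist_deg=36.0):
    """Return (magnitude, direction) in degrees of the summed roll vectors.

    Step k is rotated by k * twist_deg relative to step 0.
    """
    total = 0j
    for k, roll in enumerate(rolls_deg):
        total += roll * cmath.exp(1j * math.radians(twist_deg * k))
    return abs(total), math.degrees(cmath.phase(total))


# One helical repeat: steps 0-4 form the A-tract, steps 5-9 the spacer.
negative_roll_in_tract = [-6.0] * 5 + [0.0] * 5
positive_roll_in_spacer = [0.0] * 5 + [6.0] * 5

mag_a, dir_a = bend_from_rolls(negative_roll_in_tract)
mag_b, dir_b = bend_from_rolls(positive_roll_in_spacer)

print(f"-6 deg roll in A-tract: {mag_a:.1f} deg towards {dir_a:.0f} deg")
print(f"+6 deg roll in spacer:  {mag_b:.1f} deg towards {dir_b:.0f} deg")
```

Within this approximation both patterns give the same bend of about 19° per helical repeat in the same direction, because the spacer lies half a turn away from the A-tract and the sign of the roll is reversed. That is why global measurements of bend direction and magnitude cannot by themselves distinguish between the two.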
A series of NMR studies of poly (dA):poly (dT) and oligonucleotides containing A-tracts has been reported (19,20,34–43). It is generally agreed that the structure is a member of the B family, but with some of the deoxyribose rings showing deviations from the standard C2'-endo conformation (pseudo-rotation angle, P = 150–180°). Inferred P values of some of the residues, particularly dT, fall in the range 90–130° (38,40). It is also generally agreed that there is an unusually strong cross-strand NOE between adenine H2 and deoxyribose H1', which reflects a narrowed minor groove and is probably associated with propeller twisting of the base pairs. Minor groove width seems to decrease along the A-tract from 5' to 3'. Imino protons in the A-tracts have unusually long lifetimes, with the shortest lifetime corresponding to the residue at the 5'-end of the tract (19,39). As discussed above, the imino protons vary in chemical shift depending on the length of the tract and position in it.
Particular attention has been paid to conformational features at the junctions between the A-tracts and the adjacent DNA (36,40,43). However, these findings are not consistent with A-tract-induced bending, since the direction of the proposed bending in these molecules, namely, helix axis deflections corresponding to roll at the A-tract junctions, is not in agreement with the bend direction deduced from the electrophoresis experiments (14). Indeed, bending by roll at the junction does not satisfy the requirement for being unaltered in direction when the A-tract is rotated about its central dyad axis to interchange the A- and T-tracts.
2. X-ray crystallographic studies
2.1 Fibre diffraction studies of poly (dA):poly (dT)
Fibres of poly (dA):poly (dT) were first analysed by Arnott and Selsing (44), who obtained two X-ray patterns different from the classical A and B patterns of general sequence DNA. One pattern (α), obtained at above 85% relative humidity, indicated tenfold symmetry with a rise per base pair of 3.29 Å, and a second pattern (β), obtained below 77% relative humidity, indicated tenfold symmetry with a rise of 3.24 Å per base pair. Analysis of the polycrystalline β pattern yielded a heteronomous structure where each chain has a different conformation, the adenine chain adopting a conformation with C3'-endo-puckered sugar rings characteristic of the A-DNA family, and the thymine chain adopting a conformation with C2'-endo-puckered rings characteristic of the B-DNA family (45). Diffraction patterns similar to those of poly (dA):poly (dT) have also been observed with poly (dI):poly (dC) and with poly (dA-dI):poly (dT-dC) (46).
An X-ray analysis of fibres of the Ca²⁺ salt of poly (dA):poly (dT) indicated a symmetric structure in which the two chains are conformationally identical with a B-DNA-type backbone (47). The revised analysis of the Na⁺ salt of the polymer studied previously (see above) yielded a structure that is only slightly heteronomous and fairly similar to the Ca²⁺ structure (47).
In a more recent analysis of the sodium salt of the homopolymer (48), several constraints were introduced in the refinement of the model in order to maintain conformational parameters close to those observed in the crystal structure of the A₃T₃-containing dodecamer (49).
The unique and common features of the fibre-based structures of poly (dA):poly (dT), which distinguish them most from general sequence B-DNA, are negative inclination of the base pairs with respect to the helix axis (average -6°), high propeller twisting of the base pairs (average -26°) and a very narrow minor groove (average 3.4 Å). In the model of Aymami et al. (48), the large propeller twisting is associated with bifurcated hydrogen bonds across the major groove. These features and the exceptionally narrow minor groove are also characteristic of short A-tracts studied by single-crystal X-ray crystallography (see below).
2.2 Crystal structures of A-tracts and related sequences
2.2.1 Helical conformations
Crystallographic studies of short DNA oligomers have been carried out over the past two decades, demonstrating that the structure of the DNA double helix is dependent on both the base sequence and the environment (50; Chapter 6 and references cited therein). A special effort has been directed towards the elucidation of A-tract-containing duplexes in an attempt to reveal the structural basis of A-tract-induced curvature (49,51–54). These studies have shown that A-tract DNA assumes a conformation in which the helix axis is straight, the base pairs are perpendicular to the helix axis and the helical periodicity is close to 10 base pairs per turn (Table 14.1). The sugar pucker within the A-tract regions reflects a broad range of conformations, as for the other B-type structures. However, the resolution of the diffraction data of the A-tract structures (1.9–2.6 Å) is not sufficient to allow accurate determination of the sugar conformations.
Two structural features specific to A-tracts were observed in the various crystal structures: an exceptionally narrow minor groove and highly propeller-twisted base pairs (Table 14.1). The extent of propeller twisting of the A:T base pairs was found to
Table 14.1. Average helical parameters of A-tracts and related sequencesᵃ

Sequenceᵇ      A-tract/I-tract   Helix twist (°)   Roll angle (°)   Propeller twist (°)   Minor groove width (Å)   Reference
CGCGAATTCGCG   AATT              34.9              -1.6             -17.0                 4.1                      55
CGTGAATTCACG   AATT              34.7              -2.7             -15.9                 3.7                      56,57
CGCAAAAAAGCG   AAAAAA            36.2              -0.7             -19.8                 3.7                      51
CGCAAAAATGCG   AAAAAT            35.7              -0.3             -18.2                 3.4                      52
CGCGAAAAAACG   AAAAAA            35.1               0.5             -21.5                 3.7                      53
CGCAAATTTGCG   AAATTT            36.1               0.8             -16.6                 4.7                      54
CGCIAATTCGCG   IAATTC            36.8              -0.6             -17.1                 4.1                      58
CGCAIATMTGCG   AIATMT            36.2              -1.0             -19.2                 3.5                      59
CCIIICCCGG     IIICCC            35.8              -0.2             -12.6                 3.7                      59

ᵃ Adapted from reference 60. In cases of multiple sites or different studies of the same sequence, the values correspond to the average of individual averages.
ᵇ I = inosine, M = 5-methylcytosine.
Fig. 14.1. Schematic representation of potential cross-strand interactions in (a) AAA, (b) AIA, and (c) III. Watson-Crick hydrogen bonds are shown as heavy lines and cross-strand bifurcated hydrogen bonds as broken lines.
be sufficiently large to result in interstrand bifurcated hydrogen bonds between adjacent A and T bases across the major groove, as illustrated schematically in Fig. 14.1a.
It has been proposed that the high propeller twisting associated with bifurcated hydrogen bonds observed in the crystal structures might be important for the distinctive ability of A-tracts to induce DNA curvature (49,51). However, the roles of both propeller twist and cross-strand interactions were subsequently challenged on the basis of the observation that curvature is only weakly affected by replacing some of the A:T base pairs with I:C or I:M (M = 5-methylcytosine) base pairs (e.g. AAIAA and AIAIA), whereas curvature decreases abruptly for pure inosine tracts (I-tracts) (61,62). Since a bifurcated hydrogen bond does not seem to be supported by an I:C base pair that is flanked by A:T pairs (Fig. 14.1b), it appeared unlikely that the proposed hydrogen bond could be the principal component stabilizing the A-tract structure.
In an attempt to identify the structural features of A-tract and A-tract-like regions, and to distinguish how they differ from other AT-rich sequences, X-ray crystallography and gel electrophoresis studies of several oligomers incorporating A:T, I:C, or I:M base pairs have been performed recently (59). The X-ray crystallographic analysis demonstrated that an alternating purine region of the type -AIA- is structurally similar to a pure A-tract in that both are characterized by a remarkably uniform stacking geometry associated with high propeller twisting of the base pairs (Table 14.1). Close interstrand contacts at the major groove between amino groups across A-I base pair steps were observed. This interaction appears to stabilize the geometry of such steps and makes them compatible with A-A steps (see below).
In contrast to A-tract and A-tract-like regions, I-tract regions of the type IIICCC display a variable pattern of base stacking geometry and significantly lower propeller twisting (Table 14.1) with no indication of close interstrand interactions across I-I steps (i.e. between inosine carbonyl groups and cytosine amino groups). The inosine runs, however, share two features in common with A-tracts: a helical repeat of 10 bp/turn and a narrow minor groove occupied by a spine of hydration (Table 14.1).
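The comparisons drawn from Table 14.1 in the preceding paragraphs can be reproduced directly from the tabulated averages. The sketch below transcribes the table and computes the helical repeat implied by the mean helix twist, together with the mean propeller twist of the tracts that contain adenine versus the pure I-tract. Grouping the rows by whether the tract contains an A is a choice made here for illustration, and the variable names are hypothetical.

```python
from statistics import mean

# (sequence, tract, helix twist, roll, propeller twist, minor groove width);
# angles in degrees, width in angstroms, values transcribed from Table 14.1.
TABLE_14_1 = [
    ("CGCGAATTCGCG", "AATT", 34.9, -1.6, -17.0, 4.1),
    ("CGTGAATTCACG", "AATT", 34.7, -2.7, -15.9, 3.7),
    ("CGCAAAAAAGCG", "AAAAAA", 36.2, -0.7, -19.8, 3.7),
    ("CGCAAAAATGCG", "AAAAAT", 35.7, -0.3, -18.2, 3.4),
    ("CGCGAAAAAACG", "AAAAAA", 35.1, 0.5, -21.5, 3.7),
    ("CGCAAATTTGCG", "AAATTT", 36.1, 0.8, -16.6, 4.7),
    ("CGCIAATTCGCG", "IAATTC", 36.8, -0.6, -17.1, 4.1),
    ("CGCAIATMTGCG", "AIATMT", 36.2, -1.0, -19.2, 3.5),
    ("CCIIICCCGG", "IIICCC", 35.8, -0.2, -12.6, 3.7),
]

# Illustrative grouping: tracts containing adenine versus the pure I-tract.
a_like = [row for row in TABLE_14_1 if "A" in row[1]]
i_tract = [row for row in TABLE_14_1 if "A" not in row[1]]

mean_twist = mean(row[2] for row in TABLE_14_1)
helical_repeat = 360.0 / mean_twist          # base pairs per turn
mean_propeller_a = mean(row[4] for row in a_like)
propeller_i = i_tract[0][4]

print(f"helical repeat:           {helical_repeat:.2f} bp/turn")
print(f"propeller twist, A-like:  {mean_propeller_a:.1f} deg")
print(f"propeller twist, I-tract: {propeller_i:.1f} deg")
```

The mean twist corresponds to a helical repeat just above 10 bp per turn for all entries, while the propeller twist of the I-tract is several degrees smaller in magnitude than the average for the adenine-containing tracts, in line with the text. A single I-tract entry is of course too little to support any statistical statement.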
The majority of the duplexes incorporating A-tracts (4–6 bp long) display an overall asymmetric bend as a result of crystal packing interactions, the extent of bending (10–22°) depending on the temperature and crystallization conditions used (63). The direction of the bending, found to be localized at the GC-rich region or at the junction between the A-tract and the flanking GC-rich segment, is about 90° away from that deduced for phased A-tracts by gel electrophoresis (14). It should be emphasized that the bending observed in crystallized oligomers is not necessarily related to that observed in solution. This bending appears to be induced at flexible sites by crystal packing effects (50). However, the short A-tracts flanked by G:C base pairs are unbent and rather resistant to deformations that might be caused by crystal forces. Unlike the structural uniformity of A-tracts, regions of alternating A and T bases of the type (AT)ₙ (n = 2–3) are conformationally polymorphic (59,64,65).
The base-stacking patterns displayed by the homopurine steps of the type A-A, A-I, and I-A are very similar (59). The propeller-twisted conformation observed in such steps should be supported by the various components of base-stacking interactions, but other factors may also contribute to the stability of this conformation. Interstrand electrostatic interactions between amino groups and carbonyl oxygens across A-A steps (Fig. 14.2a) or between amino groups across A-I steps (Fig. 14.2b) could stabilize the high propeller twist and thus confer structural invariance to such regions.
Attractive interactions between functional groups across the major groove are also likely to occur in other steps, such as A-T (59,66). In this case, the interactions are between the amino groups of the adenine bases across the groove, as seen in several crystal structures and illustrated in Fig. 14.2c. This interaction might explain, in part, the relative structural uniformity of such steps, in contrast to the large variability of T-A steps observed in the crystal structures (compiled in 67–70 and Chapter 6). Crystal structure data on A-T steps show that they adopt a small roll angle and considerable propeller twisting when adjacent to short A-tracts. These features are compatible with those observed for A-A steps. Hence, the introduction of an A-T step within an A-tract (49,54–59) does not disrupt the conformational uniformity and stability of such regions. In contrast to A-T steps, T-A steps separating short A-tracts are characterized by a positive roll (i.e. bending into the major groove) and modest propeller twisting (71,72). Thus, the insertion of an incompatible hinge like a T-A step into an A-tract can disrupt structural uniformity and optimal base stacking, unlike the effect of an A-T step insertion. The different structural effects of the two insertions correspond with markedly distinct melting behaviours (73) and gel migration data (17,27).

Fig. 14.2. Stereoscopic drawings of base pair steps showing propeller-twisted base pairs with bifurcated hydrogen bonds at the major groove. Watson-Crick hydrogen bonds are shown as dotted lines. Bifurcated hydrogen bonds are shown as broken lines and are between bases at the 5'-ends of the two strands. (a) A-A/T-T step: the bifurcated hydrogen bond is between the amino hydrogen of an adenine base and the carbonyl oxygen of a thymine base (taken from the crystal structure of CGCAAAAAAGCG, ref. 51). (b) A-I/M-T step: the bifurcated hydrogen bond is between the amino hydrogen of a 5-methylcytosine base and the amino nitrogen of an adenine base. (c) A-T step: the bifurcated hydrogen bond is between the amino nitrogen of one adenine base and the amino hydrogen of another one. (b) and (c) were taken from the crystal structure of CGCAIATMTGCG (59).
X-ray and gel migration studies have shown that single substitutions of I:C or I:M base pairs within A-tracts have little consequence for either local or global structural properties. However, there are clear differences in the behaviour of I:M or I:C versus A:T base pairs, which become more pronounced as additional substitutions are made. Phased runs of I:M or I:C base pairs display only a small fraction of the curvature seen for A:T pairs (59 and references therein).
The X-ray study of CCIIICCCGG has shown that inosine stretches display low propeller twisting (Table 14.1). As a result, interstrand distances at the major groove between opposing amino and carbonyl groups are relatively long (average 3.6 Å) compared with the equivalent N...O contacts within A-tracts (average 3.2 Å). It therefore appears that this region is not stabilized by a network of bifurcated hydrogen bonds. The low propeller twisting and lack of interstrand interactions between inosine and cytosine bases across I-I steps have been suggested as underlying causes for the variable pattern of their base-stacking geometries, in contrast to the relatively uniform base-stacking geometry of A-tracts (59). Therefore, it is likely that I-tracts in solution adopt a structure that is more variable than that of A-tracts and closer to general sequence B-DNA. This may explain the large reduction in macroscopic curvature for I-tracts with respect to A-tracts.
Gel migration studies have shown that methylation of cytosines has a weak effect on curvature in cases where inosine bases are adjacent to adenine base pairs. However, there appears to be a cooperative effect of the methyl group in the case of I:M base pairs, since the curvature increases significantly for pure I-tracts as a result of such a modification (59). No structural data are available on methylated I-tracts to explain this observation at the molecular level.
2.2.2 Hydration patterns
In several of the A-tract-containing helices, a single spine of hydration was observed in the minor groove, spanning the 4–6 A:T base pairs where the groove width is remarkably narrow (3–4 Å). The spine consists of first and second shell hydration molecules, as illustrated schematically in Fig. 14.3a. The first shell molecules link the cross-strand minor groove acceptor atoms, N3 of purine bases and O2 of pyrimidine bases, which are positioned at nearly identical sites in the minor groove. The second shell molecules interact with the first water shell to form a zigzag structure. This characteristic hydration was first observed in the central region of the B-DNA dodecamer CGCGAATTCGCG (74).
Fig. 14.3. Schematic representation of idealized minor groove hydration (a) and major groove hydration (b), where B denotes any base. Acceptor and donor atoms of the bases are shown as big circles with the corresponding atom names. Water molecules are shown as small circles, where first and second hydration shell molecules are denoted by 1 and 2, respectively. Hydrogen bonds are shown as broken lines.
A spine of hydration has also been observed in the CCIIICCCGG decamer and in several other B-DNA helices in regions where the minor groove is narrow (59; Chapter 9).
Unlike the minor groove, where common hydration patterns were observed [a single spine for a narrow groove and a double ribbon for a wide groove, reviewed by Berman (75,76; Chapter 9)], the B-DNA major groove has not revealed any common hydration motif. The possibility of a unique hydration motif that is specific to A-tracts and related sequences was demonstrated by the crystal structure of CGCAIATMTGCG (59). The major groove hydration of this dodecamer indicated the existence of a continuous chain formed by first and second shell molecules along the major groove, as illustrated schematically in Fig. 14.3b.
The cross-strand water-mediated interactions in the minor groove link bases that are neighbouring in the 3' direction, whereas the water-mediated major groove contacts link cross-strand bases neighbouring in the 5' direction (Fig. 14.3). These hydration patterns can therefore stabilize the propeller-twisted conformation. In this manner the specific hydration contributes to the unique stability and structural uniformity of A-tract regions.
Based on the crystal structure data and recent observations of the effect of MPD (2-methyl-2,4-pentanediol) on gel mobility of DNA fragments incorporating phased A-tracts (77,78), it has been suggested that any disruption of the minor and major groove hydration by dehydrating agents such as MPD would lead to a more flexible structure that is similar to that of general sequence DNA, and thus would reduce A-tract-dependent curvature (59).
2.3 A-tracts in protein-DNA complexes
Several crystal structures of protein-DNA complexes have been determined where the DNA target incorporates short A-tracts. These A-tracts are of the kind An and AnTm. In several of these complexes the DNA target is severely deformed and the A-tract is bent at the minor groove (79-86). The A-tract bending is achieved by a combination of local roll and tilt angles, resulting in negatively inclined base pairs with respect to the helix axis. The contribution of the short A-tracts (4-6 base pairs) to the overall DNA curvature is modest, ranging from 4 to 13°, whereas the major contribution is achieved by major groove compression at the flanking regions (Table 14.2 and refs 79-86). Like the unbound A-tracts, the minor grooves of the complexed A-tracts are narrow, the average helix twist is close to 36° or slightly overwound, and the base pairs display a large propeller twisting (Table 14.2).
The structural similarity between the various A-tract-containing helices is illustrated in Plate XIX. Two representatives of the complexed A-tract-containing fragments (A6, with the IHF protein, and A5, with the DNA-binding domain of the 434 repressor protein, Table 14.2) are displayed together with the fibre structure of poly(dA):poly(dT) derived by Alexeev et al. (47) and the crystal structure of an A6-containing dodecamer (51). Also shown for comparison is the fibre-based structure of the general sequence B-DNA helix (87). The minor grooves of the A-tract regions are narrow and the base pairs are highly propeller twisted. In contrast, the minor groove of the general sequence B-DNA is wide and the base pairs are essentially flat. The four A-tract helices differ in the degree of inclination of the base pairs, which display gradual change from -6° in poly(dA):poly(dT) through -4 and -2° in the complexed A5 and A6 regions, to nearly 0° in the A6 region of the unbound oligomer.
Table 14.2. Average helical parameters of A-tracts complexed to proteins (a)
Protein        A-tract  Helix twist (°)  Roll angle (°)  Propeller twist (°)  Minor groove width (Å)  A-tract bending (°)  Reference
CAP            AAAA     37.5             -0.1            -21.8                3.7                     9.4                  79,80
434 repressor  AAAAA    36.5             -1.5            -16.6                4.0                     9.8                  81
NF-κB          AATT     38.0             -3.0            -16.6                3.5                     6.4                  82
SRF            AATT     37.8             -5.0            -14.7                3.4                     8.8                  83
MATa1/MATα2    AATTT    34.7             -2.6            -15.0                3.3                     13.1                 84
Oct-1 POU      AAAT     36.2             -2.7            -16.2                4.5                     8.9                  85
IHF            AAAAAA   35.5             -2.0            -15.7                3.4                     4.2                  86
(a) Adapted from reference 60 (as for Table 14.1).
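The summary statements made about Table 14.2 (average twist close to 36° or slightly higher, A-tract bending between about 4 and 13°) can be checked with a few lines of arithmetic. The following is only a sketch: the values are transcribed from the table, the row assignment assumes that every column lists the proteins in the same order, and the names `TABLE_14_2` and `summarize` are introduced here purely for illustration.

```python
from statistics import mean

# Rows transcribed from Table 14.2 (assumed row order): protein, A-tract,
# helix twist, roll, propeller twist (degrees), minor groove width (Å),
# A-tract bending (degrees).
TABLE_14_2 = [
    ("CAP", "AAAA", 37.5, -0.1, -21.8, 3.7, 9.4),
    ("434 repressor", "AAAAA", 36.5, -1.5, -16.6, 4.0, 9.8),
    ("NF-kB", "AATT", 38.0, -3.0, -16.6, 3.5, 6.4),
    ("SRF", "AATT", 37.8, -5.0, -14.7, 3.4, 8.8),
    ("MATa1/MATalpha2", "AATTT", 34.7, -2.6, -15.0, 3.3, 13.1),
    ("Oct-1 POU", "AAAT", 36.2, -2.7, -16.2, 4.5, 8.9),
    ("IHF", "AAAAAA", 35.5, -2.0, -15.7, 3.4, 4.2),
]


def summarize(rows):
    """Return column means and the range of A-tract bending angles."""
    bends = [row[6] for row in rows]
    return {
        "mean_twist": mean(row[2] for row in rows),
        "mean_roll": mean(row[3] for row in rows),
        "mean_propeller": mean(row[4] for row in rows),
        "mean_groove_width": mean(row[5] for row in rows),
        "bend_range": (min(bends), max(bends)),
    }


summary = summarize(TABLE_14_2)
for name, value in summary.items():
    print(name, value)
```

With these numbers the mean twist comes out at about 36.6°, the mean roll is negative, and the bending range is 4.2 to 13.1°, in line with the description in the text.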
3. The stereochemical basis of A-tract-dependent curvature
The key question is what kind of mechanism, at the molecular level, is causing the observed macroscopic curvature of phased, 4-6 base pair-long A-tracts. Since no single structure explains the whole phenomenon, it is necessary at present to rely on models, several of which have been proposed (8,13,88-94). They generally conform to the gel migration data, which suggest that the centre of curvature is towards the minor groove of the A-tracts and/or towards the major groove of the intervening general sequences (14). However, they differ substantially in the details of the stereochemical origin of curvature.
The crystal structure, spectroscopic, and gel migration data support a model where macroscopic curvature of A-tract-containing DNA and related stretches is correlated with a unique structure conferred by a narrow minor groove, propeller-twisted base pairs, cross-strand bifurcated hydrogen bonds, and characteristic hydration.
The various oligonucleotide crystal structures show that A-tract DNA is straight. Here, we use the term 'straight DNA' to mean a structure where the base pairs are perpendicular to the straight helix axis so that the roll and tilt angles between adjacent base pairs are essentially zero. This should be clearly distinguished from other structures, where the helix axis is straight but the base pairs are uniformly inclined with respect to a plane that is perpendicular to this axis and the base steps have essentially no variation in roll and tilt angles. In general, positive inclination of the base pairs is associated with a positive roll and a narrow major groove, whereas negative inclination is accompanied by a negative roll and a narrow minor groove. Examples are the fibre structure of poly(dA):poly(dT), where the base pairs are negatively inclined (see above), and the fibre and crystal structures of A-DNA, where the base pairs are positively inclined to the helix axis (87,95; Chapter 5 and references therein). If the latter type of DNA segment were joined with a 'straight' one, a change in DNA trajectory would result at the junction between the two, as demonstrated recently in the DNA complexed to the TATA-box binding protein (96).
Since the crystal structures of short A-tracts are straight in the above sense and exhibit little structural variation amongst a number of crystal structures, it has been proposed that bending must occur outside such regions (51,63). Related variants of the straight A-tract model suggest that a gentle roll-induced writhe is a property of all B-DNA sequences to a varying degree, with A-A steps exhibiting an average roll close to zero (88,89). In such a manner, the global curvature of DNA containing short A-tracts in phase with the helical repeat is a consequence of a net positive roll (i.e. major groove compression) accumulated in the intervening sequences. Several studies of B-DNA crystal structures have shown that GC-rich and general sequences can bend into the major groove (63,97). A recent crystal structure has demonstrated that a general B-DNA sequence can adopt a roll-induced writhe so that the base pairs are uniformly and positively inclined to the helix axis by nearly 7° (98). It should be noted, however, that the average roll angle determined from B-DNA crystal structures has been estimated to be near zero (69).
On the other hand, the fibre diffraction studies of the homopolymer dA:dT and the crystal structure studies of A-tracts bound to proteins indicate the possibility of a 'bent' A-tract structure; i.e. the base pairs are negatively inclined to the helix axis. In this manner, the global curvature of phased A-tracts separated by 'straight' B-DNA segments is a consequence of a net negative roll (i.e. minor groove compression) at the A-tract regions. This model, originally called the 'junction model', was proposed by Koo et al. (8).
The straight A-tract model and the bent A-tract model present the two extreme views of a scientific controversy lasting for more than a decade. The true stereochemical mechanism probably lies somewhere between the two extremes. Indeed, we have become increasingly of the opinion that A-tract curvature may be delocalized, in the sense that there are contributions from negative roll in the A-tracts and positive roll in the adjacent DNA segments, and there may even be small tilt contributions at the junctions. It should also be kept in mind that overall curvature may be the result of anisotropic bendability. Both A-tracts and B-DNA segments may be essentially straight in their lowest energy state, but if bending excursions that compress the minor groove in the A-tracts and the major groove in B-DNA are less costly energetically than motions in the opposite directions, the average result will be curvature of the molecule in solution. If the effect is operative in both sequences, the average excess roll in the preferred direction need be only about 3° to explain the magnitude of the observed global curvature. Experimental verification of such a small effect is an imposing challenge. Further studies are needed to establish the mechanism and relative contributions of A-tracts and the adjacent sequences to the observed macroscopic curvature.
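The argument that negative roll in the A-tracts and positive roll in the intervening sequences add up, rather than cancel, when the two are phased with the helical repeat can be illustrated with a simple vector sum. The sketch below is not taken from the chapter: it assumes a 10-step repeat with a uniform twist of 36°, treats each roll as a small bend whose direction rotates with the cumulative twist (a first-order approximation), and splits the repeat into five A-tract steps and five general-sequence steps. The function name `net_bend_per_turn` is hypothetical.

```python
import cmath
import math


def net_bend_per_turn(rolls_deg, twist_deg=36.0):
    """First-order vector sum of roll angles over consecutive base steps.

    Each roll is treated as a small bend whose direction rotates with the
    cumulative helical twist. The magnitude of the sum approximates the
    net bend, in degrees, contributed by the listed steps.
    """
    total = 0j
    for step, roll in enumerate(rolls_deg):
        total += roll * cmath.exp(1j * math.radians(step * twist_deg))
    return abs(total)


# Assumed repeat: five A-tract steps with roll -3 degrees followed by
# five general-sequence steps with roll +3 degrees.
phased = [-3.0] * 5 + [3.0] * 5
# Control: the same roll at every step has no preferred direction.
uniform = [3.0] * 10

print(round(net_bend_per_turn(phased), 1))   # 19.4
print(round(net_bend_per_turn(uniform), 1))  # 0.0
```

Under these assumptions an excess roll of only 3° per step, of opposite sign in the two halves of the repeat, accumulates to a bend of roughly 19° per helical turn, whereas a uniform roll averages out. The numbers depend on the assumed twist and on how the repeat is partitioned, so they should be read as an order-of-magnitude illustration only.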
Acknowledgements
This work was supported by grants from the National Institutes of Health (GM-21966 to D.M.C.) and the Israel Science Foundation administered by the Israel Academy of Sciences and Humanities (to Z.S.).
References
1. Marini, J.C., Levene, S.D., Crothers, D.M. and Englund, P.T. (1982) Proc. Natl. Acad. Sci. USA 79, 7664.
2. Hagerman, P.J. (1984) Proc. Natl. Acad. Sci. USA 81, 1763.
3. Griffith, J., Bleyman, M., Rauch, C.A., Kitchin, P.A. and Englund, P.T. (1986) Cell 46, 717.
4. Wu, H.-M. and Crothers, D.M. (1984) Nature 308, 509.
5. Lumpkin, O.J. and Zimm, B.H. (1982) Biopolymers 21, 2315.
6. Crothers, D.M., Drak, J., Kahn, J.D. and Levene, S.D. (1992) Meth. Enzymol. 212B, 3.
7. Hagerman, P.J. (1985) Biochemistry 24, 7033.
8. Koo, H.-S., Wu, H.-M. and Crothers, D.M. (1986) Nature 320, 501.
9. Anderson, J.N. (1986) Nucl. Acids Res. 14, 8513.
10. Beutel, B.A. and Gold, L. (1992) J. Mol. Biol. 228, 803.
11. Widlund, H.R., Cao, H., Simonsson, S., Magnusson, E., Simonsson, T., Nielsen, P.E., Kahn, J.D., Crothers, D.M. and Kubista, M. (1997) J. Mol. Biol. 267, 807.
12. Hagerman, P.J. (1990) Annu. Rev. Biochem. 59, 755.
13. Crothers, D.M., Haran, T.E. and Nadeau, J.G. (1990) J. Biol. Chem. 265, 7093.
14. Zinkel, S.S. and Crothers, D.M. (1987) Nature 328, 178.
15. Salvo, J.J. and Grindley, N.D.F. (1988) EMBO J. 7, 3609.
16. Drak, J. and Crothers, D.M. (1991) Proc. Natl. Acad. Sci. USA 88, 3074.
17. Hagerman, P.J. (1986) Nature 321, 449.
18. Burkoff, A.M. and Tullius, T.D. (1987) Cell 48, 935.
19. Leroy, J.L., Cherretier, E., Kochoyan, M. and Gueron, M. (1988) Biochemistry 27, 8894.
20. Nadeau, J.G. and Crothers, D.M. (1989) Proc. Natl. Acad. Sci. USA 86, 2622.
21. Marini, J.C., Effron, P.N., Goodman, T.C., Singleton, C.K., Wells, R.D., Wartell, R.M. and Englund, P.T. (1984) J. Biol. Chem. 259, 8974.
22. Diekmann, S. (1987) Nucl. Acids Res. 15, 247.
23. Breslauer, K.J. (1991) Curr. Opin. Struct. Biol. 1, 416.
24. Herrera, J.E. and Chaires, J.B. (1989) Biochemistry 28, 1993.
25. Chan, S.S., Breslauer, K.J., Hogan, M.E., Kessler, D.J., Austin, R.H., Ojemann, J., Passner, J.M. and Wiles, N.C. (1990) Biochemistry 29, 6161.
26. Chan, S.S., Breslauer, K.J., Austin, R.H. and Hogan, M.E. (1993) Biochemistry 32, 11776.
27. Haran, T.E. and Crothers, D.M. (1989) Biochemistry 28, 2763.
28. Chan, S.S., Austin, R.H., Mukerji, I. and Spiro, T.G. (1997) Biophys. J. 72, 1512.
29. Calladine, C.R., Drew, H.R. and McCall, M.J. (1988) J. Mol. Biol. 201, 127.
30. Ulanovsky, L.E., Bodner, M., Trifonov, E.N. and Choder, M. (1986) Proc. Natl. Acad. Sci. USA 83, 862.
31. Levene, S.D., Wu, H.-M. and Crothers, D.M. (1986) Biochemistry 25, 3988.
32. Koo, H.-S., Drak, J., Rice, J.A. and Crothers, D.M. (1990) Biochemistry 29, 4227.
33. Haran, T.E., Kahn, J.D. and Crothers, D.M. (1994) J. Mol. Biol. 244, 135.
34. Behling, R.W. and Kearns, D.R. (1986) Biochemistry 25, 3335.
35. Behling, R.W., Rao, S.N., Kollman, P. and Kearns, D.R. (1987) Biochemistry 26, 4674.
36. Katahira, M., Sugeta, H., Kyogoku, Y., Fujii, S., Fujisawa, R. and Tomita, K. (1988) Nucl. Acids Res. 16, 8619.
37. Gupta, G., Sarma, M.H. and Sarma, R.H. (1988) Biochemistry 27, 7909.
38. Celda, G., Widmer, H., Leupin, W., Chazin, W.J., Denny, W.A. and Wüthrich, K. (1989) Biochemistry 28, 1462.
39. Moe, J.G. and Russu, I.M. (1990) Nucl. Acids Res. 18, 821.
40. Searle, M.S. and Wakelin, L.P. (1990) Biochim. Biophys. Acta 1049, 69.
41. Katahira, M., Sugeta, H. and Kyogoku, Y. (1990) Nucl. Acids Res. 18, 613.
42. Chen, S.M., Leupin, W. and Chazin, W.J. (1992) Int. J. Biol. Macromol. 14, 57.
43. Young, M.A., Srinivasan, J., Goljer, I., Kumar, S., Beveridge, D.L. and Bolton, P.H. (1995) Meth. Enzymol. 261, 121.
44. Arnott, S. and Selsing, E. (1974) J. Mol. Biol. 88, 509.
45. Arnott, S., Chandrasekaran, R., Hall, I.H. and Puigjaner, L.C. (1983) Nucl. Acids Res. 11, 4141.
46. Leslie, A.G.W., Arnott, S., Chandrasekaran, R. and Ratliff, R.L. (1980) J. Mol. Biol. 143, 49.
47. Alexeev, D.G., Lipanov, A.A. and Skuratovskii, I.Y. (1987) Nature 325, 821.
48. Aymami, J., Coll, M., Frederick, C.A., Wang, A.H.-J. and Rich, A. (1989) Nucl. Acids Res. 17, 3229.
49. Coll, M., Frederick, C.A., Wang, A.H.-J. and Rich, A. (1987) Proc. Natl. Acad. Sci. USA 84, 8385.
50. Shakked, Z. (1991) Curr. Opin. Struct. Biol. 1, 446.
51. Nelson, H.C.M., Finch, J.T., Luisi, B.F. and Klug, A. (1987) Nature 330, 221.
52. DiGabriele, A.D., Sanderson, M.R. and Steitz, T.A. (1989) Proc. Natl. Acad. Sci. USA 86, 1816.
53. DiGabriele, A.D. and Steitz, T.A. (1993) J. Mol. Biol. 231, 1024.
54. Edwards, K.J., Brown, D.G., Spink, N., Skelly, J.V. and Neidle, S. (1992) J. Mol. Biol. 226, 1161.
55. Dickerson, R.E. and Drew, H.R. (1981) J. Mol. Biol. 149, 761.
56. Larsen, T.A., Kopka, M.L. and Dickerson, R.E. (1991) Biochemistry 30, 4443.
57. Narayana, N., Ginell, S.L., Russu, I.M. and Berman, H.M. (1991) Biochemistry 30, 4449.
58. Xuan, J.-C. and Weber, I.T. (1992) Nucl. Acids Res. 20, 5457.
59. Shatzky-Schwartz, M., Arbuckle, N.D., Eisenstein, M., Rabinovich, D., Bareket-Samish, A., Haran, T.E., Luisi, B.F. and Shakked, Z. (1997) J. Mol. Biol. 267, 595.
60. Shatzky-Schwartz, M. (1997) PhD Thesis. Weizmann Institute of Science, Israel.
61. Koo, H.-S. and Crothers, D.M. (1987) Biochemistry 26, 3745.
62. Diekmann, S., Mazzarelli, J.M., McLaughlin, L.W., von Kitzing, E. and Travers, A.A. (1992) J. Mol. Biol. 225, 729.
63. Dickerson, R.E., Goodsell, D. and Kopka, M.L. (1996) J. Mol. Biol. 256, 108.
64. Yoon, C., Prive, G.G., Goodsell, D.S. and Dickerson, R.E. (1988) Proc. Natl. Acad. Sci. USA 85, 6332.
65. Yuan, H., Quintana, J.R. and Dickerson, R.E. (1992) Biochemistry 31, 8009.
66. Sponer, J. and Kypr, J. (1994) Int. J. Biol. Macromol. 16, 3.
67. Shakked, Z., Guzikevich-Guerstein, G., Frolow, F., Rabinovich, D., Joachimiak, A. and Sigler, P.B. (1994) in Structural Biology: the State of the Art (Sarma, R.H. and Sarma, M.H., eds), Vol. 1, pp. 199-216. Adenine Press, New York.
68. Suzuki, M. and Yagi, N. (1995) Nucl. Acids Res. 23, 2083.
69. Gorin, A.A., Zhurkin, V.B. and Olson, W.K. (1995) J. Mol. Biol. 247, 34.
70. El Hassan, M.A. and Calladine, C.R. (1997) Phil. Trans. R. Soc. Lond. A 355, 43.
71. Goodsell, D.S., Kaczor-Grzeskowiak, M. and Dickerson, R.E. (1994) J. Mol. Biol. 239, 79.
72. Balendrian, K., Rao, S.T., Sekharudu, C.Y., Zon, G. and Sundaralingam, M. (1995) Acta Cryst. D51, 190.
73. Park, Y.W. and Breslauer, K.J. (1991) Proc. Natl. Acad. Sci. USA 88, 1551.
74. Drew, H.R. and Dickerson, R.E. (1981) J. Mol. Biol. 151, 535.
75. Berman, H.M. (1991) Curr. Opin. Struct. Biol. 1, 423.
76. Berman, H.M. (1994) Curr. Opin. Struct. Biol. 4, 345.
77. Sprous, D., Zacharias, W., Wood, Z.A. and Harvey, S.C. (1995) Nucl. Acids Res. 23, 1816.
78. Dlakic, M., Park, K., Griffith, J.D., Harvey, S.C. and Harrington, R.E. (1996) J. Biol. Chem. 271, 17911.
79. Schultz, S.C., Shields, G.C. and Steitz, T.A. (1991) Science 253, 1001.
80. Parkinson, G., Wilson, C., Gunasekera, A., Ebright, Y.W., Ebright, R.H. and Berman, H.M. (1996) J. Mol. Biol. 260, 395.
81. Rodgers, D.W. and Harrison, S.C. (1993) Structure 1, 227.
82. Ghosh, G., Van Duyne, G., Ghosh, S. and Sigler, P.B. (1995) Nature 373, 303.
83. Pellegrini, L., Tan, S. and Richmond, T.J. (1995) Nature 376, 490.
84. Li, T., Stark, M.R., Johnson, A.D. and Wolberger, C. (1995) Science 270, 262.
85. Klemm, J.D., Rould, M.A., Aurora, R., Herr, W. and Pabo, C.O. (1994) Cell 77, 21.
86. Rice, P.A., Yang, S.-W., Mizuuchi, K. and Nash, H.A. (1996) Cell 87, 1295.
87. Chandrasekaran, R. and Arnott, S. in Landolt-Bornstein, New Series, Group VII (Biophysics) (Saenger, W., ed.), Vol. 1b, pp. 31-170. Springer-Verlag, Berlin.
88. Calladine, C.R., Drew, H.R. and McCall, M.J. (1988) J. Mol. Biol. 201, 127.
89. Maroun, R.C. and Olson, W.K. (1988) Biopolymers 27, 585.
90. De Santis, P., Palleschi, A., Savino, M. and Scipioni, A. (1990) Biochemistry 29, 9269.
91. Bolshoy, A., McNamara, P., Harrington, R.E. and Trifonov, E.N. (1991) Proc. Natl. Acad. Sci. USA 88, 2312.
92. Zhurkin, V.B., Ulyanov, N.B., Gorin, A.A. and Jernigan, R.L. (1991) Proc. Natl. Acad. Sci. USA 88, 7046.
93. Olson, W.K., Marky, N.L., Jernigan, R.L. and Zhurkin, V.B. (1993) J. Mol. Biol. 232, 530.
94. Goodsell, D.S. and Dickerson, R.E. (1994) Nucl. Acids Res. 22, 5497.
95. Haran, T.E. and Shakked, Z. (1988) J. Mol. Struct. (Theochem.) 179, 367.
96. Guzikevich-Guerstein, G. and Shakked, Z. (1996) Nature Struct. Biol. 4, 32.
97. Goodsell, D.S., Kopka, M.L., Cascio, D. and Dickerson, R.E. (1993) Proc. Natl. Acad. Sci. USA 90, 2930.
98. Rozenberg, H., Rabinovich, D., Frolow, F., Hegde, R.S. and Shakked, Z. (1998) Proc. Natl. Acad. Sci. USA, in press.
15 Structures and interactions of helical junctions in nucleic acids
David M. J. Lilley
CRC Nucleic Acid Structure Research Group, Department of Biochemistry, The University, Dundee DD1 4HN, UK
1. The occurrence of helical junctions in biology
Helical junctions in nucleic acids are branch points where double helical segments intersect with axial discontinuities, such that strands are exchanged between the different helical sections. While bulges can be brought into this definition, we will restrict our attention to helical junctions in this chapter, of which the most common are three- or four-way junctions (Fig. 15.1). These can be perfect junctions, where every base is paired with its Watson-Crick complement, or they can contain mismatches or unpaired bases; the latter can have significant effects on the folding of the structures in some cases. A systematic nomenclature exists for the unambiguous description of different junctions (1).
Fig. 15.1. Helical junctions in nucleic acids. The junctions of biological significance are three- and four-way branch points. Junctions can be perfectly base paired, or they can be modified by the addition of unpaired bases. The nomenclature used is the IUPAB scheme explained in ref. 1.
Helical junctions are quite common in RNA species. For example, if we look at the secondary structure of a rRNA species we will find examples of three- and four-way junctions. They are seldom perfect, however, and one or more single-stranded bases are often present at the point of strand exchange. A number of functional catalytic RNA molecules are based around helical junctions, such as the hairpin ribozyme (2), which is a four-way junction in the tobacco ringspot viral RNA, and the hammerhead ribozyme, which can be regarded as an imperfect three-way junction (3,4) (see also Chapter 17).
In the case of DNA, the main biological significance of branched helical species is as intermediates in DNA rearrangements of various kinds, notably in recombination events. The four-way junction has been proposed to be the central intermediate in homologous genetic recombination (5-11), created by strand invasion between two homologous DNA molecules. In the integrase class of site-specific recombination events there is good evidence for a four-way junction intermediate (12-16). DNA junctions can also arise in other ways, including the replication of DNA, as exemplified by bacteriophage T4 (17).
DNA junctions are substrates for proteins involved in the later stages of genetic recombination. Proteins accelerate the process of branch migration, and ultimately resolve the branch point to recreate two independent duplex species. Such proteins recognize their DNA substrates at the level of tertiary structure, a process that should reflect molecular recognition of DNA structure on a relatively large scale. More recently it has become apparent that, as well as recognizing branched DNA structure, such proteins also alter the very structure that they recognize in many cases.
One question that we might usefully pose in this review is whether we can establish some general folding principles for helical branch points in nucleic acids. Two candidate principles offer themselves at this stage, and we will return to these at the end of the chapter to see how well they bear up.
• Coaxial helical stacking. The formation of branch points potentially involves unstacking and exposure of base pairs to solvent. Coaxial stacking of helical arms maximizes base stacking interactions, and thus folding based on coaxial stacking might be expected. An early example of this can be seen in the tertiary structure of tRNA (18,19). Coaxial stacking can create alternative conformers, the relative stabilities of which are usually dependent on local sequence.
• Ion-dependent folding. Nucleic acids are highly charged polyelectrolytes. Thus, their folding is going to be quite different in principle from that of proteins. Phosphate-phosphate repulsion will tend to keep the structure extended in the absence of charge neutralization, and thus metal ions will play an important role in the folding. The folding may, in turn, create specific ion-binding pockets, and such site-bound ions can themselves be very important in the function of the nucleic acid, notably in ribozyme catalysis.
2. Approaches to the study of branched nucleic acids
Helical junctions are extended species, and the analysis of their structure generally requires the description of conformation on a relatively large scale. Initially it is the global structure that is analysed, and information about the relative configurations of helical arms in space and the angles between the helical axes is sought. At such early stages of the investigation high resolution methods such as NMR spectroscopy are not appropriate, and the complexity and size of the structures makes their application difficult. Techniques are required that are sensitive to distances over a relatively long range (e.g. 20-100 Å), and that can report on the relative disposition of entire helical arms. Two approaches have been particularly valuable, namely comparative gel electrophoresis and fluorescence resonance energy transfer.
2.1 Comparative gel electrophoresis
Gel electrophoresis has been extensively applied to the study of nucleic acid structure, and has provided a large body of valuable data despite the relative simplicity of the approach. For example, electrophoresis provided many of the key observations in the analysis of sequence-directed DNA curvature, and has continued to provide great insight into DNA structures of various kinds. The problem inherent in the technique is the lack of a detailed physical understanding lying behind the method. Yet, despite this drawback, valuable contributions have been made towards our understanding of important structures in both DNA and RNA. The essential observation made in many systems is that deviations from linearity in double-stranded nucleic acids result in anomalously slow migration in polyacrylamide gels (20-23), and that the fragments migrate most slowly when the sequence causing the axial deformation is centrally located (24). Various theories can provide at least qualitative agreement with experimental data (25-27). Most are based upon the idea of nucleic acid reptation (28), in which the nucleic acid is considered to move through the gel in a tube created by the matrix, under the influence of the electric field. Lumpkin and Zimm (26) derived a relationship between the rate of migration (μ) and the end-to-end distance of the molecule:
$$\mu = \frac{Q}{\zeta} \left\langle \frac{h_x^2}{L^2} \right\rangle$$
where Q is the charge on the molecule, ζ is the frictional coefficient for translation along the tube, L is the contour length of the molecule, and h_x is the component of the end-to-end vector h in the direction of the electric field. The brackets indicate an average over an ensemble of configurations. The dependence on end-to-end distance can explain the sensitivity to shape, since this will be reduced by curvature or kinking, and thus such fragments will migrate more slowly. Using Monte Carlo methods to generate an ensemble of chain trajectories, Levene and Zimm (27) calculated the behaviour of curved DNA fragments under electrophoresis in polyacrylamide. They found it necessary to include cross-interaction between the bendability of the DNA and the elastic properties of the gel matrix to obtain a good fit with experimental data. Calladine and coworkers (29,30) have taken a different approach to explain the reduced mobility of curved DNA, calculating the probability of the cylindrical envelope of a superhelix intersecting randomly located gel fibres. The cylindrical radius expands with the curvature, increasing the probability of obstruction to forward motion.
Gel electrophoresis is very powerful in the analysis of the global structures of DNA junctions. It was demonstrated over a decade ago that such species exhibited anomalously slow migration in polyacrylamide (31), and that the mobility depended on the metal ions present (32). In the application of comparative gel electrophoresis to branched DNA, a set of subspecies is created having two arms that are significantly longer than the remaining arm(s). This can be done by ligating reporter arms on to a junction core (33), or perhaps more easily by shortening the arms (typically from 40 to 12 bp) by restriction cleavage (34). In the case of four-way junctions, there are six different species with two long arms, while in the case of three-way junctions there are three. The electrophoretic mobilities of the two-long-arm species in polyacrylamide are compared, and the results analysed on the assumption that faster mobility reflects a larger angle between the long arms. In this way we can derive an overall shape for the branched molecule. This comes from comparison of the mobilities of a set of similar species, and relies on symmetry and shape arguments; thus the lack of a fully developed physical basis for electrophoresis need not prevent a qualitative picture of the global structure from emerging. Indeed, our experience using this approach for the study of a number of different branched species indicates that it is very powerful if used carefully, and comparisons with independent techniques have always confirmed the conclusions from the electrophoresis.
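The counting argument (six two-long-arm species for a four-way junction, three for a three-way junction) and the link between inter-arm angle and end-to-end distance can be sketched in a few lines. This is an illustration only, under assumptions that are not part of the text: the two long arms are treated as rigid rods of equal length meeting at a point, the junction is taken to be planar, and the arm labels B, H, R, X and the 60° crossing angle are hypothetical choices for the example.

```python
from itertools import combinations
import math


def long_arm_species(arms):
    """All two-long-arm species obtainable from a junction with the given arms."""
    return list(combinations(arms, 2))


def relative_end_to_end_sq(angle_deg):
    """(h/L)^2 for two rigid arms, each of length L/2, meeting at angle_deg."""
    return math.sin(math.radians(angle_deg) / 2.0) ** 2


def inter_arm_angle(dir_a_deg, dir_b_deg):
    """Smallest angle (0 to 180 degrees) between two in-plane arm directions."""
    diff = abs(dir_a_deg - dir_b_deg) % 360.0
    return min(diff, 360.0 - diff)


# Hypothetical planar X-shaped junction: arms B and X are coaxial, arms H
# and R are coaxial, and the two axes cross at 60 degrees.
directions = {"B": 0.0, "X": 180.0, "H": 60.0, "R": 240.0}

for arm_a, arm_b in long_arm_species(sorted(directions)):
    angle = inter_arm_angle(directions[arm_a], directions[arm_b])
    print(arm_a + arm_b, angle, round(relative_end_to_end_sq(angle), 2))
```

For this assumed geometry the six species fall into three pairs with inter-arm angles of 60°, 120° and 180°, so that a slow, intermediate and fast pair of mobilities would be expected if mobility increases with end-to-end distance as described above. A different pattern of mobilities would point to a different arrangement of the arms, which is the basis of the comparative method.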
2.2 Fluorescence resonance energy transfer
Fluorescence methods can contribute significantly to our understanding of the structure and dynamics of macromolecules (35-40). In conjunction with modern solid phase synthetic methods for both DNA and RNA, and the variety of fluorophores now available (41), it has become a powerful method for obtaining distance information in folded nucleic acids.
In fluorescence resonance energy transfer (FRET) experiments, two different fluorophores (e.g. fluorescein and tetramethyl rhodamine) are coupled to known positions in the macromolecule. In the case of nucleic acids, the 5'-termini of individual strands provide a convenient location in many applications. Upon excitation of the donor (fluorescein in the above example), dipolar coupling between the transition moments of the fluorophores leads to a transfer of excitation from the donor to the acceptor, reducing the fluorescent quantum yield and lifetime of the donor and increasing the fluorescent emission from the acceptor. Because of the dipolar coupling, the efficiency of the energy transfer depends on the inverse sixth power of the distance between the dyes, and thus the efficiency of energy transfer (E) is greater for short separations and falls off as the distance is increased, i.e.
$$E = \frac{1}{1 + (R/R_0)^6}$$
where R is the distance and R_0 is the distance at which energy transfer is 50% efficient.
The most sensitive way to observe energy transfer is to measure the enhanced emission from the acceptor. Since the emission from the acceptor also contains a component from direct excitation, this must be normalized (40), and this allows the efficiency of the transfer to be calculated. The most reliable results derived from FRET have been acquired by synthesizing a series of DNA molecules that differ only in the positions where the donor and acceptor molecules are attached to the DNA molecules (42,43). In this way we can map relative distances within an ensemble of DNA molecules that have the same global structure except at the local positions of the dye molecules. The conclusions are therefore drawn from comparisons between the energy transfer efficiencies measured from a series of isomeric or very similar molecules, rather than the determination of absolute distances. This removes many uncertainties that might be present, such as those arising from inexact knowledge of the orientation parameter κ² and R_0. We have applied the FRET method to the study of a series of DNA duplexes of length varying between 8 and 20 bp (44). Overall the FRET efficiency reduced with increasing length of the helix as expected, but, in addition, we observed the cylindrical geometry of the DNA as a sinusoidal modulation of the efficiency. Good agreement was found between the experimental data and the calculated values based on dipolar energy transfer and a knowledge of the geometry of double-stranded DNA. In another study, we observed an increasing kinking of DNA and RNA duplexes as bulges of different sizes were introduced into the centre of the molecule (45); the efficiency of FRET between fluorophores attached to the two 5'-termini increased as the end-to-end distance shortened as a result of kinking.
As applied to branched nucleic acids, the FRET approach requires the attachment of fluorophore donor-acceptor pairs to the 5'-termini of pairs of arms of a junction with arms of equal length. Thus, for a four-way junction, six different pairwise labelled species are prepared, and the efficiencies of energy transfer are measured under a given set of conditions. This then provides a measure of the relative end-to-end distances between the different arms, and from this the global structure may be deduced.
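The inverse sixth-power dependence described in this section is usually written in the Förster form E = 1/(1 + (R/R_0)^6). The sketch below simply evaluates that expression and its inverse; the function names and the value of 50 Å used for R_0 are assumptions chosen for illustration, not values taken from the text.

```python
def fret_efficiency(distance, r0):
    """Forster transfer efficiency for a donor-acceptor separation.

    `distance` and `r0` must be in the same units; `r0` is the separation
    at which transfer is 50% efficient.
    """
    if distance < 0 or r0 <= 0:
        raise ValueError("distance must be >= 0 and r0 must be > 0")
    return 1.0 / (1.0 + (distance / r0) ** 6)


def distance_from_efficiency(efficiency, r0):
    """Invert the Forster expression to recover the separation."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


R0 = 50.0  # hypothetical Forster distance in Å, for illustration only

for separation in (25.0, 50.0, 75.0, 100.0):
    print(separation, round(fret_efficiency(separation, R0), 3))
```

The steep fall-off around R = R_0 is what makes the method useful for ranking the end-to-end distances of a set of closely related labelled species, even when absolute distances are uncertain.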
3. The four-way DNA junction
The structure of the four-way DNA junction has been extensively studied in the last decade.
3.1 The global structure of the four-way DNA junction
The four-way (4H) junction can exist in a number of different structures, and undergoes ion-dependent folding transitions (Fig. 15.2). In the absence of added cations the
Fig. 15.2. Ion-dependent folding of the four-way DNA junction into the stacked X-structure. The four-way junction in DNA exists as an open extended structure in the absence of added metal ions. Upon addition of ions (e.g. 100 µM magnesium ions) the junction undergoes a folding transition based on the coaxial stacking of helical arms in pairs. There are two alternative conformers of this structure, which differ in the choice of stacking partners. The folding creates two different kinds of strand. The continuous strands turn about the helical axis of the stacked helices, while the exchanging strands pass from a helix in one coaxial stack to the other at the exchange point. In the antiparallel structure the continuous strands run in opposite directions (their chemical polarity is indicated by the arrow heads).
structure is unfolded; the arms remain unstacked and fully extended in a square configuration (46). Upon addition of sufficient metal ions (such as > 100 µM magnesium ions) the four-way DNA junction undergoes a precise folding via the coaxial stacking of pairs of helical arms, to generate the stacked X structure. The essential features of this structure are as follows.
• The arms of the junction associate in pairs by helix-helix stacking. Two stereochemically equivalent conformers are possible (34), depending upon the choice of stacking partners. The relative stability of stacking conformers depends on local sequence.
• The two pairs of stacked helices are rotated, rather like opening a pair of scissors. This minimizes electrostatic repulsion without disturbing the helix-helix stacking.
• The twofold symmetry of the structure generates two sets of inequivalent strands in the structure. The members of one pair (the continuous strands) are related by a helix axis that passes continuously through the point of strand exchange. The other pair (the exchanging strands) pass between the two coaxial stacks at the point of strand exchange.
• The exchanging strands are disposed about the smaller angle of the X structure, and do not cross. This generates an approximately antiparallel alignment of the continuous strands of the DNA helices (34,42,47). The two coaxial helical stacks lie across each other with a right-handed sense (42), allowing a favourable juxtaposition between DNA strands and grooves (see Fig. 15.3); the alignment is best for a small angle of about 60°. Similar strand-groove alignment has been observed between DNA duplexes packed into crystal lattices (48,49). If the backbone of one of the exchanging strands of the four-way junction is interrupted by a covalent discontinuity (nick), the helical pairs appear to disengage (while remaining stacked) and take up a new angle of crossing of about 90° (50).
• The structure presents two sides of different character. This arises because the four base pairs at the point of strand exchange are oriented in the same direction. On one side of the junction (the major groove side) the point of strand exchange has major groove characteristics, while the other side (the minor groove side) has minor groove characteristics.
• The structure can accommodate single base mismatches without extensive disruption to the global structure (51). Some mismatches do not appear to destabilize the structure significantly, while others elevate the concentration of ions required to permit folding into the stacked X structure.
The global structure is consistent with all available experimental evidence. The first indication of the stacked X structure came from the analysis of the overall shape by means of comparative gel electrophoretic experiments (34). Data for one example are shown in Fig. 15.4. Three pairs of mobilities are observed, i.e. slow, intermediate, and fast, consistent with a twofold symmetrical X-shaped structure. The fast mobility of the BX and HR species indicates that for this junction folding occurs by pairwise stacking
Fig. 15.3. The stacked X-structure of the four-way DNA junction. The illustration uses a ribbon to indicate the path of the backbones in the right-handed, antiparallel stacked X-structure (42). The two sides of the structure are not equivalent. The right side of the junction presents major groove edges of the base pairs at the point of strand exchange, while at the left side the minor groove edges are presented.
of B on X and H on R arms. However, when the central sequence was altered, results indicating the formation of the alternative stacking conformer were obtained (34). The slow mobility of the BH and RX species indicates that the B-centre-H and R-centre-X angles were small; this tells us that the b and r strands turn about the small angle of X, i.e. the relative polarity of the h and x strands is antiparallel. These conclusions were supported and extended by FRET studies (42,43), which found the largest efficiency of energy transfer for the vectors BH and RX in junction 3. This confirmed the antiparallel structure, and studies of other junctions confirmed the formation of alternative stacking isomers for different sequences. Further experiments in which one of the fluorophores was moved around the arms to map the juxtaposition of helical faces indicated that the stacked X structure was right-handed (42). The structure is consistent with other experiments. Seeman and coworkers (52) studied the accessibility of the ribose-phosphate backbone of a four-way junction (of different base sequence from those above) to attack by hydroxyl radicals, and concluded that the structure was twofold symmetrical. Using the same junction sequence, Cooper and Hagerman (53) compared the rotational dynamics of species with pairwise extended arms by means of transient electric birefringence. Their results were consistent with an antiparallel X-shaped structure. Time-resolved fluorescence measurements indicate that there is some scissoring motion of the arms of the junction (54).
Fig. 15.4. Analysis of the global structure of the four-way DNA junction in the presence of magnesium ions by comparative gel electrophoresis. The junction comprises four arms (each of length 40 bp) labelled B, H, R, and X, generated by the association of the strands b, h, r, and x (each of length 80 nt). By means of selective restriction enzyme cleavage, the six possible species with two shortened arms (reduced to 15 bp) are generated, and their electrophoretic mobility in polyacrylamide compared. The species are named by their two long arms, e.g. the species BH has shortened R and X arms. The pattern of mobilities generated in magnesium ions can be described by slow, intermediate, fast, fast, intermediate, slow, and may be explained by the stacked X-structure. Thus the angles subtended between the long arms are acute, obtuse, linear, linear, obtuse, acute, in good agreement with the pattern of electrophoretic mobilities.
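The reasoning that links a proposed global shape to a pattern of gel mobilities can be made explicit with a small sketch. It treats each arm as a direction in a plane, computes the angle between the two long arms of each of the six species, and ranks those angles within one junction (smallest angle slowest, largest fastest), following the qualitative rule stated in the caption above. The planar idealization, the 60° crossing angle (taken from the "about 60°" quoted in the text) and the function names are assumptions made for illustration only.

```python
from itertools import combinations

ARMS = ("B", "H", "R", "X")

def pair_angles(directions):
    """Angle (0-180 degrees) between each pair of arms, in the order
    BH, BR, BX, HR, HX, RX. `directions` maps arm name -> in-plane angle."""
    angles = {}
    for a, b in combinations(ARMS, 2):
        diff = abs(directions[a] - directions[b]) % 360.0
        angles[a + b] = min(diff, 360.0 - diff)
    return angles

def mobility_pattern(angles):
    """Rank the angles within one junction: smallest -> slow, largest -> fast."""
    levels = sorted({round(v, 6) for v in angles.values()})
    names = {1: ["equal"],
             2: ["slow", "fast"],
             3: ["slow", "intermediate", "fast"]}[len(levels)]
    label = dict(zip(levels, names))
    return [label[round(v, 6)] for v in angles.values()]

# Stacked X with B on X and H on R stacking, axes crossing at 60 degrees.
stacked_x = {"B": 0.0, "X": 180.0, "H": 60.0, "R": 240.0}
# Alternative conformer: B on H and R on X stacking.
alternative = {"B": 0.0, "H": 180.0, "X": 60.0, "R": 240.0}
# Extended, unstacked square (no added ions).
square = {"B": 0.0, "H": 90.0, "R": 180.0, "X": 270.0}

for name, geometry in [("stacked X", stacked_x),
                       ("alternative conformer", alternative),
                       ("extended square", square)]:
    print(name, mobility_pattern(pair_angles(geometry)))
```

Under these assumptions the stacked X geometry gives slow, intermediate, fast, fast, intermediate, slow for the species BH, BR, BX, HR, HX, RX, while the alternative conformer gives the mirror-image pattern, which is how the choice of stacking partners can be read from the gel.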
3.2 The role of metal ions in the structure of the four-way DNA junction
Metal ions play a critical role in the structure of the four-way DNA junction. In the absence of added cations the junction is unable to undergo folding to form the stacked X structure, but remains in an extended conformation with no coaxial stacking of helical arms. This is indicated by many different experiments. Comparative gel electrophoretic experiments show that the junction adopts a structure with approximately square symmetry in the absence of metal ions (34,46) (Fig. 15.5) and this is confirmed by FRET experiments (55). Thymine bases are reactive to addition by osmium
tetroxide in the extended structure of the junction under low salt conditions (34). A variety of ions are able to bring about the folding (46). Group II metals (e.g. magnesium and calcium) fold the junction at concentrations greater than about 100 µM, while complex ions and polyamines are more efficient; 2 µM [Co(NH3)6](III) or 25 µM spermine are sufficient to promote folding. Group I metal ions, such as sodium or potassium, bring about at least a partial folding of the junction (43), but very high concentrations are required and the junction-proximal helical termini remain accessible to addition by osmium tetroxide (46). The ability of monovalent ions to achieve something like the correct folded geometry overall suggests that site-specific binding is not required for these processes. However, uranyl-induced photocleavage experiments indicate the presence of a specific ion-binding site near the point of strand exchange in the folded junction (56) (Fig. 15.6). Experiments in which selected phosphate groups were electrically neutralized by replacement with methyl phosphonates (46) revealed that repulsion between phosphates at the point of strand exchange was very significant, as might be expected. Folding the junction probably generates an electronegative cleft that binds divalent ions with increased affinity, whereupon the central bases become inaccessible to osmium tetroxide.
Fig. 15.5. Analysis of the global structure of the four-way DNA junction in the absence of added ions by comparative gel electrophoresis. An equivalent set of six species with two long and two short arms used for the analysis in Fig. 15.4 was electrophoresed in a polyacrylamide gel in the presence of 1 mM EDTA. In marked contrast to the pattern of mobilities observed in the presence of magnesium ions, the pattern in the absence of added ions can be described by slow, fast, slow, slow, fast, slow. This is in good agreement with the extended, square geometry of the junction under these conditions, giving the angles between the long arms of 90°, 180°, 90°, 90°, 180°, and 90°.
Fig. 15.6. Location of an ion-binding site in the four-way DNA junction by uranyl-induced photocleavage. In this method, a nucleic acid is irradiated with light at 420 nm in the presence of uranyl ion (UO₂²⁺), whereupon the (deoxy)ribose-phosphate backbone can be broken in the vicinity of the binding site (126). Identification of cleavage sites thus localizes any specific ion binding sites. The selectivity of the probing can be increased by inclusion of citrate ion, which suppresses non-specific reaction. A four-way junction with the central sequence shown was assembled from four strands, one of which (strand b) was radioactively [5'-32P] labelled (50). The same strand was also hybridized to its complement, to give a perfect duplex species for comparison. The radioactive b strand was subjected to a formate (A + G) sequencing reaction (left track). The duplex species was irradiated in Tris-HCl, pH 7.2 (middle track), giving an even level of cleavage along the length of the duplex. The junction species was photoreacted in 50 mM Tris-HCl, pH 7.2, 0.75 mM citrate (right track). The sequence at the centre of strand b is indicated on the left, and the arrows indicate the point of strand exchange. Note the pronounced photocleavage observed around the point of strand exchange in the four-way junction.
3.3 The local stereochemistry of the point of strand exchange in the four-way DNA junction
There have been a number of attempts to model the stereochemistry of the exchange point of the four-way junction (47,49,57), but, experimentally, this must be approached by NMR or crystallography. The latter has been hampered by the lack of suitable quality crystals to date, but despite the almost heroic scale of the problem, significant progress has been made in solution by 1H NMR in the laboratories of Chazin (58-60) and Altona (61). While full structural determination has not yet been achieved, clear evidence has been obtained for a number of aspects of the structure. Thus, the overall DNA geometry is essentially B-like, with no evidence of broken base pairing at the point of strand exchange. Critically, evidence for base-base stacking across the exchange point has been obtained for several junctions (59,61), and a sequence-dependent stacking conformer bias has been observed (60).
4. The three-way DNA junction
The three-way junction provides a test of the generality of the stereochemical principles established with the four-way junction.
4.1 The perfectly base paired three-way junction
The first three-way junctions studied in DNA were constructed analogously to the usual four-way junctions, such that three helices were connected without the intervention of unpaired bases (3H junctions, see Fig. 15.1). Comparative gel electrophoretic experiments (62) indicated that the three angles between the arms of such a perfect three-way DNA junction were much closer to being equal than were the six angles relating the arms of the four-way junction. This was later supported by FRET experiments, where the three end-to-end distances of a three-way junction were found to be closely similar (63). This suggested that the arms fail to undergo the kind of pairwise stacking exhibited by four-way junctions, which was consistent with the permanent reactivity of thymine bases even at high magnesium concentrations (62). Simple model building leads one to expect this result; if we attempt to construct a three-way junction by fusing an additional helix to a broken phosphodiester linkage in one strand of a duplex, we must insert at least the width of the minor groove into the space previously occupied by just a single phosphate group. This is not normally possible, at least if full base pairing is maintained. This conclusion has been partially questioned for other sequences (64), and the structure is probably not fully symmetrical. But while the angles are probably not exactly 120° between each pair of arms, the differences still appear to be smaller than the variation observed between the angles of the four-way junction, and the extended unstacked structure is likely to be broadly correct for most sequences.
4.2 The effect of unpaired bases
The perfect 3H three-way junction is unable to satisfy the principles outlined at the start, namely, that helical junctions tend to undergo coaxial helical stacking and
Fig. 15.7. Analysis of the global structure of three-way DNA junctions by comparative gel electrophoresis. In order to analyse the structure of three-way junctions we compare the three species with one shortened arm. The 3HSn junctions are based on a sequence with three arms, H, R, and X, and n unpaired adenine bases (n = ?, 3, or 5) on the r strand, lying opposite the H arm as shown. For the perfectly paired 3H junction (n = 0) the mobilities of the three two-long-arm species are closely similar under all conditions (not shown). However, this is clearly not the case for the 3HSn junctions (66). In the absence of added ions (upper), electrophoretic mobility patterns described by slow, fast, slow are obtained, where the difference between fast and slow becomes greater as n increases. This is simply interpreted in terms of a widening of the angle containing the unpaired bases, i.e. between the R and X arms. In the presence of 1 mM magnesium ions (lower) the behaviour is more complex, and is consistent with a model where there is coaxial stacking between the H and X arms (not possible in the perfect 3H junction), and a reduction of the angle between the H and R arms as n increases.
ion-dependent folding. The rigid framework of the fully paired three-way junction effectively removes the possibility of such folding. However, this stereochemical restraint could be relaxed if some additional conformational flexibility were provided by the addition of a single-stranded region between the helical arms, creating a 3HSn junction (see Fig. 15.1 for an example of a 3HS2 junction). It had been shown that such bulged three-way junctions had increased stability in gel electrophoresis (65), and, using electrophoresis (66) and FRET (63), we have demonstrated that such junctions undergo a magnesium-dependent conformational change in which the angles between arms become markedly different (Fig. 15.7). These results can be interpreted in terms of the formation of a structure in which two arms are now coaxially stacked, while the third subtends an angle that is set by the number of unpaired bases. This global structure is also consistent with recent FRET studies (63), in which the distance between the ends of the two helices became increasingly shorter as the number of unpaired bases is increased. Changes in helix-helix lengths in three-way junctions with the introduction of unpaired bases were also observed by time-resolved FRET measurements (67). The distinct conformation of bulged junctions can also explain the lowered rates of cyclization of DNA containing a bulged junction, compared with those carrying a perfectly paired junction (68). Thus we find that once the structural restraints imposed by the perfect three-way junction are removed, three- and four-way junctions exhibit the same general principles of folding.
If electrostatic repulsion and steric factors can be reduced sufficiently, then coaxial helix-helix stacking will drive the folding process, resulting in a stacked conformation.
Three-way DNA junctions containing two unpaired bases (3HS2 junctions) have been the subject of two studies by nuclear magnetic resonance (NMR). Junctions of different sequence were studied independently by two groups (69-72). Both studies found structures based upon coaxial stacking of two helices, with the third helix unstacked and extended away from the point of strand exchange (Plate XX). Closer examination of the two NMR structures reveals that they are very different. Like the four-way junction, there are two conformers possible for the three-way junction, which differ in the choice of stacking partners. However, in marked contrast to the four-way junction, these are not stereochemically equivalent structures, and are therefore unlikely to be equally stable. In one structure the polarity of the bulge sequence is 3' to 5' as it leaves the stacked helices (conformer I), while in the other it is 5' to 3' (conformer II) (Fig. 15.8). The structure solved by the Leontis laboratory is an example of conformer I, while that solved by Rosen and Patel is conformer II. A more recent NMR study of two further 3HS2 junctions by Altona and coworkers (73) revealed additional examples of conformer II structures. We have studied a number of different sequences by comparative gel electrophoresis and FRET, and have found that they fold into conformer I on addition of magnesium ions. Nevertheless, when we studied the same sequence as that investigated by Rosen and Patel we found that this adopted the alternative stacking conformer (74), in complete agreement with the NMR analysis.
Thus, despite the stereochemical differences between the two structures, both can be adopted, and the relative stability is clearly governed by local DNA sequence. In our experience the formation of conformer II is relatively rare, yet thermal stability measurements indicate that the Rosen-Patel sequence is the most
Fig. 15.8. Alternative stacking conformers formed by bulged three-way junctions. Comparative gel electrophoretic analysis of two different 3HS2 junctions (74). The junction on the left is based on the same central sequence as those analysed in Fig. 15.7, while the junction on the right is based on a sequence studied by Rosen and Patel (70); the central sequences are presented above the autoradiographs. Note that the electrophoretic patterns are virtually mirror images, indicating that the change in central sequence has provoked a change in structure. The left mobility pattern indicates a stacking of H and X arms, while that on the right requires a different model, i.e. I 1 on I I stacking. These alternative conformers are not stereochemically equivalent. Note that the polarity of the strand running through the A2 bulge is opposite in the two structures. The structure deduced from this experiment for the Rosen-Patel sequence is completely consistent with the NMR study (see Plate XX).
stable three-way junction that we have examined; this is reflected in both a higher melting temperature compared with other sequences and folding in the presence of just 30 mM sodium ions.
As we would expect, given the formal stereochemical difference between the conformers, the two NMR structures contain significant differences. The path of the backbone of the bulged section of the Leontis structure (isomer I) (69) is relatively looped compared with that of the Rosen-Patel structure (isomer II) (70), where the backbone passes quite smoothly from the stacked helices to the unstacked arm. The unstacked helix of the Leontis structure is largely coplanar, and lies at approximately 90° to the stacked helices, although the angle is probably not well determined by the available NMR data in any of the structures. In the Rosen-Patel structure, the third arm is less coplanar and, in addition, it is bent back at an acute angle, just as our electrophoretic data would indicate. Interestingly, the overall folding of this junction is remarkably similar to that which would be derived by the removal of one helical arm from the right-handed stacked X structure of the four-way DNA junction (42,47). As discussed above, the four-way DNA junction appears to be stabilized by the juxtaposition of the backbone of one stacked helix in the major groove of the other, and a similar feature may be observed in the Rosen-Patel structure (70), where the backbone of stem II is located in the major groove of stem III.
Thus we find that the three-way junction can exhibit many of the same folding properties exhibited by the four-way junction, provided a little extra conformational flexibility is added.
Three-way junctions undergo ion-dependent folding by pairwise coaxial stacking of helices, into one of two alternative conformers determined by local sequence.
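The contrast drawn in section 4 between the perfect and the bulged three-way junction can be sketched with elementary geometry. For two arms of equal length L meeting at an angle θ, the end-to-end distance is 2L sin(θ/2), so near-equal angles give three similar distances, whereas coaxial stacking of two arms gives one long distance and two that depend on the position of the third arm. The planar model, unit arm length and the particular angles used below are hypothetical illustrations, not measured values; the function names are likewise assumptions.

```python
import math

def end_to_end(arm_length, angle_deg):
    """Distance between the ends of two equal arms meeting at angle_deg."""
    return 2.0 * arm_length * math.sin(math.radians(angle_deg) / 2.0)

def three_way_distances(angle_hr, angle_rx, arm_length=1.0):
    """End-to-end distances for a planar three-way junction with arms H, R, X.

    The three inter-arm angles are taken to sum to 360 degrees (planar
    assumption), so the H-X angle follows from the other two.
    """
    angles = {"HR": angle_hr, "RX": angle_rx, "HX": 360.0 - angle_hr - angle_rx}
    return {pair: end_to_end(arm_length, min(a, 360.0 - a))
            for pair, a in angles.items()}

# Perfect 3H junction idealized as three 120-degree angles.
perfect = three_way_distances(120.0, 120.0)

# Bulged junction with H stacked on X (H-X angle 180 degrees); the H-R
# angle is reduced in steps to mimic an increasing number of unpaired bases.
folded = [three_way_distances(hr, 180.0 - hr) for hr in (90.0, 70.0, 50.0)]

print("perfect:", {k: round(v, 3) for k, v in perfect.items()})
for d in folded:
    print("bulged: ", {k: round(v, 3) for k, v in d.items()})
```

In this idealization the stacked pair always shows the longest end-to-end distance (2L), and the H to R distance shortens as the third arm is bent back, which is the qualitative trend that the electrophoretic and FRET comparisons are used to detect.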
5. The four-way RNA junction
Given the importance of backbone-groove interactions in the folding of the four-way DNA junction, it might be expected that four-way RNA junctions might fold differently, since RNA adopts an A-form helix with substantially different geometry from the B-form helix of DNA. We have recently examined the global structure of a number of 4H RNA junctions of different central sequence, using the comparative gel electrophoresis technique, modified for the analysis of RNA. We initially examined two different RNA junctions with sequences equivalent to junctions that we had studied extensively in DNA (75). From the electrophoretic analysis it was quickly apparent that there were both similarities and differences compared with the DNA equivalents (Fig. 15.9). The RNA junctions apparently fold by coaxial helical stacking, and even seem to exhibit the same choice of stacking partners as the same sequences in DNA. However, the global structure is different, and responds to changes in ionic conditions in a very different way. The general structure of the RNA junction in the presence of moderate (e.g. 1 mM) magnesium ion concentrations is a 90° cross of helical stacks, i.e. a structure that is neither parallel nor antiparallel. One of the biggest surprises came when we performed the analysis of global structure in the absence of added metal ions. In marked contrast to DNA junctions, the RNA species did not suffer loss of coaxial stacking but tended to rotate into a parallel-stranded form. The parallel distortion was rather sequence-dependent, but
Fig. 15.9. Analysis of the global structure of the four-way RNA junction by comparative gel electrophoresis. The species with two long and two short arms must be prepared differently from the corresponding DNA species because of the difficulty in synthesizing very long RNA molecules, and the impossibility of shortening arms by restriction cleavage. The molecules analysed therefore had RNA cores of 10 bp in each arm, and the remaining portion of each arm comprised DNA. The six species were prepared by synthesis of each of the component strands. The electrophoretic analysis was performed analogously to that on DNA four-way junctions (e.g. Fig. 15.4), in the presence of 1 mM (upper), 500 µM (middle), or 100 µM (lower) magnesium ions. In the presence of 1 mM magnesium ions the electrophoretic mobility pattern can be described as slow, slow, fast, fast, slow, slow, and is explained by a model based on coaxial stacking of B on X and H on R arms, where the angle between the two axes is 90° (75). This gives angles between the long arms of 90°, 90°, 180°, 180°, 90°, and 90°, as shown. On reduction of the magnesium ion concentration the electrophoretic mobility pattern changes to intermediate, slow, fast, fast, slow, intermediate, and can be interpreted in terms of a rotation of the two axes to give a structure in which the continuous strands are parallel.
was the first time a parallel orientation had been observed for any nucleic acid four-way junction. By contrast, when the junction was placed in 0.5 mM calcium ions, or elevated concentrations (5 mM or higher) of magnesium ions, the junctions rotated in the opposite direction to adopt an antiparallel structure.
Thus, the conversion from DNA to RNA has significant consequences for the global folding of the four-way junction. Some of the differences are likely to derive from the formation of an A-form helix by RNA, where the similarity in the widths of the major and minor grooves suggests that this backbone-groove juxtaposition will be less favourable. If the thermodynamic advantage of strand-groove alignment is denied the junction, then the balance of other steric and electrostatic factors may result in a new global conformational minimum free energy. This appears to be the case for the RNA junction in the presence of 1 mM magnesium ions. The absence of a transition to an unstacked extended structure in the absence of added ions contrasts strongly with the behaviour of DNA junctions, and suggests that overall electrostatic repulsion in the RNA junction is lower.
There are a number of cases of four-way junctions occurring in places that suggest an important biological role. A good example is found in the U1A snRNA, that is involved in splicing of mRNA. The central sequence of this junction is shown in Fig. 15.10. The junction sequence is conserved in mammalian, avian, and amphibian sequences (76,77), and is perfectly base paired for at least three base pairs in each arm, except for the single G:A mismatch located at the point of strand exchange. We analysed the global structure of a junction in which the central RNA core was based upon the U1 sequence, including the G:A mismatch.
We found that this adopted a folded structure based on coaxial helical stacking in the conformer in which the adenine base of the G:A mismatch was located on the continuous strand (Ac stacking conformer). This was in good agreement with the results of Krol et al. (78), based on differences in sensitivity to ribonuclease V1. We found that the two stacks subtended 90° under all ionic conditions tested. Interestingly, the G:A mismatch did not appear to destabilize the structure, nor did it influence the global structure adopted, since its 'repair' to either G:C or T:A did not alter the overall conformation. While the G:A mismatch is conserved in the sequences of many U1 snRNA species, it is replaced by an A:U base pair in the U1 snRNA of Drosophila melanogaster. We analysed the global structure of a junction in which the RNA sequence flanking the point of strand exchange was based on the Drosophila sequence, and found that the junction folded in the same way as the mammalian sequence. Once again the structure was based on coaxial stacking of arms in the Ac stacking conformer, with perhaps a little extra rotation in the antiparallel direction. This suggests that there is conservation of three-dimensional structure by the different U1 snRNA species that transcends changes in sequence.
Another biological example of a four-way RNA junction can be found in the hairpin ribozyme of the tobacco ringspot virus (2,79). This ribozyme is usually studied in the form of a nicked duplex containing two bulged regions, one of which contains the scissile phosphodiester bond. The essential sequences are largely located in the two bulges, and evidence suggests that these two regions associate to generate the active site for self-cleavage.
In the natural viral sequence, the proposed secondary structure places the two bulges on successive arms of the four-way junction, and thus it would seem probable that the junction should fold in such a way that the two bulges
488
Oxford Handbook of Nacleic Acid Structure
Fig. 15.10. T h e four-way junction i n U I A s n R N A . U 1 A snRNA c o n t a i n s t h e tou r w a y j u n c t i o n shown (76) , whic h i s perfec t apart from a GA mismatch. Th e sequenc e i s wel l conserve d i n mammals , birds, an d anrphibian s (77) . Comparativ e ge ] electrophoresi s i n th e resenc e o f 1 ni M magnesiu m ion s gives a slow , slow , fast, fast, slow, slow patter n o f mobilitie s tha t i s consisten t with th e stacke d geomert y i n the conformer illustraced to the righ t (75).
would b e brough t together . W e hav e .analysedthe globa l structur e o f th e tobacc ringspot viru s junctio n (i n th e absenc e o f th e bulge s themselves) , an d foun d tha t i t naturally adopt s th e stackin g isome r tha t place s th e would-b e bulge-containin g arm s on opposit e stack s (126) . Moreover , a s th e concentratio n o f magnesiu m ion s wa s raised, th e junction adopte d a progressivel y mor e atiparallel conformation , whereb y the potentia l bulge s woul d b e brough t clos e together . Thu s th e tobacc o ringspo t virus junction ha s exactly th e propensit y require d i f the bulge s are to b e associate d t o generate th e activ e ribozyme ,
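The mobility pattern quoted for the U1 junction can be rationalized with a small sketch. It rests on assumptions that are not spelled out in this section: that the four arms are labelled B, H, R, and X, that the six species with two long arms are run in the order BH, BR, BX, HR, HX, RX, that a species whose two long arms are coaxially stacked migrates fast, and that all other species migrate slowly when the two stacks subtend about 90°.

```python
from itertools import combinations

# Assumed arm labels; their order fixes the order of the six long-arm species.
ARMS = ("B", "H", "R", "X")


def mobility_pattern(stacked_pairs):
    """Predict relative gel mobilities for the six two-long-arm species.

    stacked_pairs: the two pairs of arms that stack coaxially,
    e.g. (("B", "X"), ("H", "R")).
    Simplifying assumption: colinear long arms -> "fast"; long arms on
    different stacks (about 90 degrees apart) -> "slow".
    """
    coaxial = {frozenset(pair) for pair in stacked_pairs}
    return [
        (a + b, "fast" if frozenset((a, b)) in coaxial else "slow")
        for a, b in combinations(ARMS, 2)
    ]


if __name__ == "__main__":
    for stacking in ((("B", "X"), ("H", "R")), (("B", "H"), ("R", "X"))):
        print(stacking, [m for _, m in mobility_pattern(stacking)])
```

Under these assumptions, stacking of B on X (and H on R) reproduces the slow, slow, fast, fast, slow, slow pattern, whereas the alternative stacking would give fast, slow, slow, slow, slow, fast, so the two conformers are distinguishable on the gel.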
6. Interaction between DNA junctions and proteins
Four-way DNA junctions are the substrates for an important class of proteins that exhibit fundamentally structure-selective binding. It is an exciting challenge to understand the manner of the recognition of DNA structure by proteins.
6.1 A class of structure-selective proteins
Enzymes that exhibit selectivity for DNA junctions are probably a ubiquitous class of proteins. These can be junction-specific nucleases (i.e. resolving enzymes) or proteins involved in other processes such as the acceleration of branch migration. They have been isolated from a wide variety of sources, from bacteriophage to mammals, and these are summarized in Table 15.1. In Escherichia coli the resolution of four-way junctions is carried out by RuvC (80-82), an enzyme of 172 amino acids. This has been extensively studied (83-85) and the structure is known (86). The gene encoding another resolving enzyme (RusA) has also been found in E. coli (87,88); however, this is carried by a prophage and is constitutively repressed. The RuvAB complex facilitates branch migration in E. coli (89,90). The junction-selective component of this is a tetramer of RuvA, the crystal structure of which has recently been solved (91). RecG is another E. coli protein that exhibits branch migration-facilitating activity (92).

Some bacteriophages encode junction-resolving enzymes whose physiological role appears to be the resolution of branches that are left following replication of DNA. The best characterized is endonuclease VII from phage T4 (93), the product of gene 49. The enzyme cleaves isolated four-way junctions of various sequence in vitro (34,94), as well as supercoil-stabilized cruciform structures (95,96). We have expressed endonuclease VII from a synthetic gene, and constructed a number of site-directed mutants (97). The protein appears to have a modular construction. The N-terminal section contains four cysteine residues that coordinate a single zinc ion (97). In the centre of this 39 amino acid, autonomously folding region lies a cluster of histidine and acidic residues, a number of which appear to be required for the catalysis of DNA cleavage (98). At the C-terminus is a section that is 47% identical to a region of the
Table 15.1. Junction-resolvase and binding activities isolated from various sources. See text for references.

Source               Enzyme             Gene    Size (amino acids)   Cleavage specificity   Structure determined
Junction-resolving enzymes
Bacteriophage T4     Endonuclease VII   49      157
Bacteriophage T7     Endonuclease I     3       149
Lambdoid prophage    RusA               rusA    120
E. coli              RuvC               ruvC    172                  3' to TT               Yes
Yeast                Endonuclease X1
Yeast                CCE1               CCE1    353                  3' to CT
Calf thymus
CHO cells
Vaccinia
Branch migration proteins
E. coli              RuvA               ruvA                         N/A                    Yes
E. coli              RecG               recG                         N/A
T4 repair enzyme endonuclease V. The structure of the latter enzyme is known, and the region of similarity is a helix and turn (99); interestingly, when the sequence from endonuclease V was used to replace the corresponding section of endonuclease VII, the resulting chimeric enzyme had suffered no detectable loss in its selectivity for the cleavage of DNA junctions (97). Lying between the N- and C-terminal sections is a section with weak similarity to T7 endonuclease I, and we have isolated one mutant in this region that lacks catalytic activity but retains the full selectivity for binding to DNA junctions (100). Phage T7 possesses a similar resolvase activity, called endonuclease I (101-103), that is the product of gene 3. We have isolated a number of catalytically deficient mutants of endonuclease I that retain their structural selectivity for binding to DNA junctions (104).

At least two different resolving activities have been isolated from Saccharomyces cerevisiae. An as yet poorly characterized activity called endonuclease X1 was isolated (105), which cleaved isolated four-way junctions (106). A different activity (variously called CCE1, MGT1, or endonuclease X2) cleaved the four-way junctions of supercoil-stabilized cruciform structures and figure-eight molecules (107). This has recently been cloned and expressed and studied in greater detail (108). Although encoded by the nuclear CCE1 gene, CCE1 enzyme is targeted to the mitochondrion (109). It is believed to play an important role in resolving junctions left in mitochondrial DNA, without which segregation is hindered; cce1 mutants display a raised incidence of petite cells and an increased frequency of junctions in mtDNA (110).

Junction-resolving enzyme activity has also been isolated from higher eukaryotic cells. West and coworkers (111,112) have isolated proteins that cleave synthetic DNA junctions with a specificity comparable to that of the phage enzymes. An activity has also been reported to be encoded by vaccinia virus (113).
6.2 Structure-selective recognition of DNA junctions
The resolving enzymes cleave DNA junctions in a very precise manner. Thus T4 endonuclease VII will, in general, cleave at just two phosphodiester bonds within a given four-way junction (Fig. 15.11). These enzymes bind DNA junctions in dimeric form (100,108) and the complexes migrate as discrete retarded species in polyacrylamide electrophoresis. A number of nuclease-defective mutants of T7 endonuclease I (104), T4 endonuclease VII (100), and yeast CCE1 (M.F. White and D.M.J. Lilley, unpublished data) retain their selectivity for binding to DNA junctions, showing that the binding and catalytic functions are divisible. In general the junction-interacting proteins exhibit a substantial selectivity for the structure of branched species. Thus the protein-junction complexes cannot be displaced by 1000-fold excesses of duplex DNA of the same sequence (83,100,104,108). In another experiment, tethering was used to constrain the structure of a junction of constant sequence into alternate forms (114), which were cleaved by T4 endonuclease VII. It was found that the cleavage pattern depended on the structure of the junction (115), showing that structure rather than sequence was the important element.

Fig. 15.11. Cleavage of a four-way DNA junction by a resolving enzyme. A junction was [5'-32P] radioactively uniquely labelled in the b, h, r, and x strands, generating four different species for analysis. Each was incubated with endonuclease VII of bacteriophage T4 and the products analysed by sequencing gel electrophoresis (tracks labelled 1). Piperidine formate (A + G) and hydrazine (C + T) sequencing reactions were performed for each radioactive junction species (tracks labelled R and Y, respectively) to provide sequence markers. Endonuclease VII induces single cleavages into the b and r strands, at the arrowed positions on the insert junction.

In every case studied, it has been found that binding of resolving enzymes to DNA is totally dependent on structure, and independent of base sequence. However, the subsequent cleavage of the junctions can exhibit sequence selectivity for some of the enzymes. While this is a relatively weak preference in the case of the phage enzymes, the sequence selectivity is considerably stronger for RuvC of E. coli (116) (cleavage 3' to TT) and for CCE1 of yeast (108) (cleavage 3' to CT). RuvC has been stated to require DNA junctions with a degree of homology (117), such that they can branch migrate. However, this is probably a consequence of the sequence-selectivity cleavage filter, such that a junction that can branch migrate provides more chances of displaying the preferred sequence in the required place relative to the point of strand exchange.
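The 'sequence filter' argument can be illustrated with a toy calculation. The sketch below is hypothetical throughout: it treats one strand as a string, lets the point of strand exchange sit after any nucleotide within a homologous region, and assumes, purely for illustration, that cleavage needs the preferred dinucleotide immediately 5' to the exchange point. The function name, the example sequence, and the migration window are invented and are not taken from the studies cited.

```python
def cleavable_positions(strand, site="TT", offset=0):
    """Return exchange-point positions at which `site` is correctly placed.

    A position i means the point of strand exchange lies after nucleotide i
    (counting from 1). The site must end `offset` nucleotides 5' of that
    point; offset=0 is an illustrative assumption, not a measured value.
    """
    k = len(site)
    return [
        i
        for i in range(k + offset, len(strand) + 1)
        if strand[i - offset - k:i - offset] == site
    ]


if __name__ == "__main__":
    strand = "GCATTGCCTTAG"          # invented sequence
    hits = cleavable_positions(strand)

    fixed_point = 7                  # immobile junction: one possible position
    window = range(3, 11)            # mobile junction: positions it can visit

    print("immobile junction cleavable:", fixed_point in hits)
    print("mobile junction chances:", sum(1 for i in window if i in hits))
```

In this toy case the immobile junction never presents the dinucleotide in the assumed place, while the mobile junction does so at some of the positions it visits, which is the sense in which branch migration gives 'more chances' of satisfying a sequence-selective cleavage step.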
6.3 Manipulation of junction structure by proteins
Quite recently it has emerged that as well as recognizing the structure of DNA junctions, the resolving enzymes also distort that structure in general. This has been shown for T7 endonuclease I (104), T4 endonuclease VII (100), RuvC (85), and CCE1 (118). While each of these enzymes distorts the global structure of the junction, the resulting structure is different in every case. Perhaps the most extreme is that of CCE1, where the resulting structure imposed on the DNA is very close to that of the extended square conformation, just like that of the free junction in the absence of added ions. However, the CCE1-junction complex exists in this extended structure with or without added metal ions. The open centre of the CCE1-junction complex can be demonstrated by the accessibility of thymine bases at the point of strand exchange to attack by potassium permanganate (118).

Distortion of junction structure is not restricted to the resolving enzymes. RuvA also distorts the structure into something very close to an extended square structure (119), and this can be readily rationalized in terms of the recently determined crystal structure of the protein (91). RuvA is a tetrameric junction-selective protein that acts in concert with two hexameric rings of RuvB to facilitate branch migration of junctions. The compact folded structure of the junction suggests that branch migration might require significant disruption of the structure, and recent measurements of the rates of branch migration under conditions where the junction is expected to be folded into the stacked X structure indicate that the process is indeed slow. Panyutin and Hsieh have observed that the rate of branch migration is slower by a factor of 1000 in magnesium, compared with that found in sodium (120). If the structure could therefore be opened, the rate of the exchange of base pairing should be increased, and thus the distortion imposed by RuvA would be expected to facilitate the process.
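To give a feel for the size of this effect, a 1000-fold difference in rate can be converted into an apparent difference in activation free energy. This is a back-of-envelope sketch only: it assumes simple transition-state behaviour with an unchanged pre-exponential factor and a temperature of 25 °C, neither of which comes from the measurements cited (120).

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal mol^-1 K^-1


def barrier_difference(rate_ratio, temperature_k=298.15):
    """Apparent difference in activation free energy (kcal/mol).

    Uses ddG = R * T * ln(rate_ratio); the temperature default is an
    assumption made for illustration.
    """
    return GAS_CONSTANT * temperature_k * math.log(rate_ratio)


if __name__ == "__main__":
    print(f"{barrier_difference(1000.0):.1f} kcal/mol")
```

On these assumptions the factor of 1000 corresponds to roughly 4 kcal/mol, which gives a rough scale for the energetic cost of disrupting the folded structure during each step of migration.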
7. Some final conclusions
Branched nucleic acids undergo folding transitions to generate folded conformations. At the outset we proposed two general features of these folding processes: that metal ions would be an important effector in the conformational transitions, and that coaxial helix-helix stacking would be a common feature of the folded states. We can now look back over the available data to see how well these principles bear up.

In general, branched nucleic acids undergo metal ion-induced folding transitions, driven by the reduction in electrostatic repulsion. The importance of electrostatic interactions is clearly seen in the four-way DNA junction (4H), where selective phosphate neutralization can switch the folding between alternative conformations. In the absence of added metal ions the four-way DNA junction is completely unfolded. Surprisingly, however, this is not true for the corresponding RNA junction, which remains folded even under very low salt conditions. Nevertheless, the global conformation of the four-way RNA junction is responsive to the nature of the metal ions
present, and can change between parallel and antiparallel forms. The nature of the ion binding that leads to conformational change is not yet fully resolved. In general, divalent ions like magnesium are much more efficient than monovalent ions like sodium, and specific ion-binding sites have been revealed in the four-way DNA junction by uranyl-induced photocleavage reactions. Yet, in some circumstances at least, partial folding can be induced by monovalent ions, for which site binding can probably be excluded in these systems. Probably a combination of site binding and more general overall charge neutralization is important in general.

Coaxial stacking of pairs of helices is seen to be a very common feature of the folding of branched nucleic acids. Folding of four-way junctions in both DNA and RNA is based on pairwise coaxial stacking, and in each case this generates alternative conformers based on the two possible choices of stacking partners. The choice seems largely determined by the bases flanking the point of strand exchange. Usually one form is thermodynamically favoured over the other, although there are examples of junctions that exhibit no strong isomer bias.

The three-way DNA junction is an interesting case that challenges, but ultimately obeys, these general folding principles. The perfect three-way junction (3H) does not appear to change conformation with addition of metal ions, nor does it undergo coaxial helical stacking. This is a result of the rigid framework of the backbone, which would require loss of base pairing to permit helix stacking. However, when extra unpaired bases are added (3HS B junctions) the situation is completely changed. The extra conformational freedom allows the junctions to undergo metal ion-induced folding via pairwise coaxial stacking, and once again two (now stereochemically inequivalent) stacking isomers are possible. It really is a case of the exception that proves the rule.

In principle the general folding characteristics established for the model junctions should be applicable to natural helical junctions. The hammerhead ribozyme provides an interesting example of ion-induced folding in a slightly more complex three-way RNA junction. The core of this self-cleaving RNA species is a HS1HS7HS3 junction, and the folded structure has been determined in two crystallographic studies (121,122). We have found that in the absence of added metal ions the hammerhead core is unfolded and extended, and upon addition of divalent metal ions it undergoes a two-stage folding process (123,124). The first step (occurring at about 1 mM magnesium ions) involves the coaxial alignment of two of the helical arms, leaving the rest of the core relatively unstructured. In the second stage (occurring at about 1 mM magnesium ions), the probable catalytic core folds, causing a rotation of the remaining helical arm in space. This happens over the same range of magnesium ion concentration that leads to the activation of ribozyme activity, and must generate a conformation that facilitates the trajectory into the transition state of the SN2 cleavage reaction; this would require colinear alignment of the attacking 2' oxygen atom, the phosphorus atom, and the leaving oxygen atom. There is metal ion participation in the cleavage reaction (125), and the folding would be expected to generate some kind of electronegative binding site for one or more metal ions; using uranyl-induced photocleavage we have detected a high affinity metal ion-binding site within the proposed catalytic core (123).

Thus, many of the folding principles established in DNA and RNA junctions do appear to have general validity, and can be usefully applied to natural and functional nucleic acids.
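The in-line requirement of the SN2 mechanism mentioned above can be made concrete as an angle test. The sketch below computes the angle formed by the attacking oxygen, the phosphorus, and the leaving oxygen from Cartesian coordinates; 180° corresponds to perfect colinear alignment. The function name and the coordinates in the example are invented for illustration and are not taken from the crystal structures (121,122).

```python
import math


def inline_angle(nucleophile, phosphorus, leaving_group):
    """Angle in degrees at phosphorus between nucleophile and leaving group.

    A value of 180 means the three atoms are colinear (fully in-line).
    Each argument is an (x, y, z) tuple in any consistent length unit.
    """
    v1 = [a - p for a, p in zip(nucleophile, phosphorus)]
    v2 = [b - p for b, p in zip(leaving_group, phosphorus)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cosine))


if __name__ == "__main__":
    phosphorus = (0.0, 0.0, 0.0)
    leaving_oxygen = (1.6, 0.0, 0.0)       # hypothetical coordinates
    in_line_oxygen = (-3.0, 0.0, 0.0)      # hypothetical, opposite the leaving group
    off_line_oxygen = (0.0, 3.0, 0.0)      # hypothetical, perpendicular approach

    print(round(inline_angle(in_line_oxygen, phosphorus, leaving_oxygen), 1))
    print(round(inline_angle(off_line_oxygen, phosphorus, leaving_oxygen), 1))
```

A conformation far from 180° would need to rearrange before cleavage could proceed by this mechanism, which is why the ion-induced folding is expected to bring the three atoms towards alignment.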
Acknowledgements
It is a pleasure to thank many of my past and present colleagues for collaborations on the structures of branched nucleic acids, especially Derek Duckett, Alastair Murchie, Robert Clegg, Gurminder Bassi, Richard Pohler, Jon Welch, Marie-Jo Giraud-Panis, Malcolm White, Niels-Erik Møllegaard, and Eberhard von Kitzing. I thank Dr D. Patel for providing coordinates and the Cancer Research Campaign for financial support.
Note added in proofs 23 August 1998
DNA junctions
A major topic of interest in DNA junctions in the last 18 months has been the demonstration of exchange between stacking conformers. Miick et al. (127) used a combination of NMR and time-resolved FRET measurements to demonstrate the presence of an exchanging population in four-way junctions, depending on central base sequence. While most junction sequences are strongly biased towards one particular stacking conformer, we found a new junction sequence that adopted both conformers in about equal population, with interconversion between them (128). To our surprise, we observed relatively long-range influences of sequence on the relative conformer population.
DNA junction-protein interaction
This remains a very active area that has seen considerable progress in the last 18 months. Some of this has been reviewed by us in White et al. (129). A new junction-resolving enzyme has been discovered in Schizosaccharomyces pombe (130-132), with properties closely similar to CCE1 of S. cerevisiae. The sequence specificity for cleavage of DNA junctions has been studied in depth for CCE1 (133). A tetranucleotide consensus cleavage sequence 5'-ACT↓A has been identified, although specificity is determined mainly by the central CT dinucleotide. All the junction-resolving enzymes studied to date bind in dimeric form to DNA junctions, consistent with the bilateral resolution reaction. However, subunit exchange reactions in free solution vary widely, and we have recently found that in contrast to most of these enzymes, the exchange rate for endonuclease I of phage T7 is extremely slow (134). Using heterodimeric mutant forms of T4 endonuclease VII, we showed that the two subunits act independently in their cleavage reactions (135). However, both cleavages normally occur within the lifetime of the enzyme-junction complex, leading to the bilateral cleavage required for productive resolution of the junction. While recognising the structure of the four-way junction, all the resolving enzymes appear to distort the global geometry of the junction, and this has been recently extended to the lambdoid enzyme RusA (136,137). In addition to the junction-resolving enzymes, a number of other proteins interact with four-way DNA junctions with some degree of selectivity. In some cases at least, the biological relevance of this interaction is questionable. In general, the HMG-box proteins exhibit selective interaction with DNA junctions. We have recently shown that HMG boxes of diverse origin bind to junctions in the open-square conformation (138), and have suggested that the primary site of interaction is the widened minor groove at the point of strand exchange.
RNA junctions and ribozymes
The global conformation of the four-way RNA junction has been studied using FRET (139). This has confirmed the general folding principles, including the stacking conformers adopted by junction 1 and the U1 snRNA junction. The hairpin ribozyme has been studied in its natural conformation as a four-way junction (126). The ribozyme was found to be active in this form, and the level of activity could be modulated by altering the structure of the junction. FRET studies showed that the ribozyme adopts the stacking conformer that places the unpaired loops (the A and B loops) on opposite stacked helical pairs. Addition of magnesium, calcium or strontium ions induces a change of conformation, in which the helices rotate in an antiparallel direction, leading to a close association between the arms carrying the unpaired loops (1,140). This is presumed to generate the active site that leads to the cleavage reaction.

The ion-induced two-stage folding of the hammerhead ribozyme has also been extensively studied using FRET (141). The results indicate two sequential single-ion-induced folding events that most likely correspond to the formation of domain II and domain I respectively (see Chapter 17 by Masquida and Westhof in this Volume).
References
1. Lilley, D.M.J., Clegg, R.M., Diekmann, S., Seeman, N.C., von Kitzing, E. and Hagerman, P. (1995) Eur. J. Biochem. 230, 1.
2. Hampel, A. and Tritz, R. (1989) Biochemistry 28, 4929.
3. Forster, A.C. and Symons, R.H. (1987) Cell 49, 211.
4. Haseloff, J.P. and Gerlach, W.L. (1988) Nature 334, 585.
5. Holliday, R. (1964) Genet. Res. 5, 282.
6. Broker, T.R. and Lehman, I.R. (1971) J. Mol. Biol. 60, 131.
7. Orr-Weaver, T.L., Szostak, J.W. and Rothstein, R.J. (1981) Proc. Natl. Acad. Sci. USA 78, 6354.
8. Potter, H. and Dressler, D. (1976) Proc. Natl. Acad. Sci. USA 73, 3000.
9. Potter, H. and Dressler, D. (1978) Proc. Natl. Acad. Sci. USA 75, 3698.
10. Sigal, N. and Alberts, B. (1972) J. Mol. Biol. 71, 789.
11. Sobell, H.M. (1972) Proc. Natl. Acad. Sci. USA 69, 2483.
12. Kitts, P.A. and Nash, H.A. (1987) Nature 329, 346.
13. Nunes-Duby, S.E., Matsumoto, L. and Landy, A. (1987) Cell 50, 779.
14. Hoess, R., Wierzbicki, A. and Abremski, K. (1987) Proc. Natl. Acad. Sci. USA 84, 6840.
15. Jayaram, M., Crain, K.L., Parsons, R.L. and Harshey, R.M. (1988) Proc. Natl. Acad. Sci. USA 85, 7902.
16. McCulloch, R., Coggins, L.W., Colloms, S.D. and Sherratt, D.J. (1994) EMBO J. 13, 1844.
17. Kemper, B. and Janz, E. (1976) J. Virol. 18, 992.
18. Kim, S.-H., Quigley, G.J., Suddath, F.L., McPherson, A., Sneden, D., Kim, J.J., Weinzierl, J. and Rich, A. (1973) Science 179, 285.
19. Jack, A., Ladner, J.E. and Klug, A. (1976) J. Mol. Biol. 108, 619.
20. Marini, J.C., Levene, S.D., Crothers, D.M. and Englund, P.T. (1982) Proc. Natl. Acad. Sci. USA 79, 7664.
21. Diekmann, S. and Wang, J.C. (1985) J. Mol. Biol. 186, 1.
22. Hagerman, P.J. (1985) Biochemistry 24, 7033.
23. Koo, H.-S., Wu, H.-M. and Crothers, D.M. (1986) Nature 320, 501.
24. Wu, H.-M. and Crothers, D.M. (1984) Nature 308, 509.
25. Lerman, L.S. and Frisch, H.L. (1982) Biopolymers 21, 995.
26. Lumpkin, O.J. and Zimm, B.H. (1982) Biopolymers 21, 2315.
27. Levene, S.D. and Zimm, B.H. (1989) Science 245, 396.
28. de Gennes, P.G. (1971) J. Chem. Phys. 55, 572.
29. Calladine, C.R., Drew, H.R. and McCall, M.J. (1988) J. Mol. Biol. 201, 127.
30. Calladine, C.R., Collis, C.M., Drew, H.R. and Mott, M.R. (1991) J. Mol. Biol. 221, 981.
31. Gough, G.W. and Lilley, D.M.J. (1985) Nature 313, 154.
32. Diekmann, S. and Lilley, D.M.J. (1987) Nucl. Acids Res. 14, 5765.
33. Cooper, J.P. and Hagerman, P.J. (1987) J. Mol. Biol. 198, 711.
34. Duckett, D.R., Murchie, A.I.H., Diekmann, S., von Kitzing, E., Kemper, B. and Lilley, D.M.J. (1988) Cell 55, 79.
35. Weber, G. (1953) Adv. Protein Chem. 8, 415.
36. Steiner, R.F. (ed.) (1983) Excited States in Biopolymers. Plenum Press, New York.
37. Lakowicz, J.R. (1983) Principles of Fluorescence Spectroscopy. Plenum Press, New York.
38. Jameson, D.M. and Reinhart, G.D. (eds) (1989) Fluorescent Biomolecules: Methodologies and Applications. Plenum Press, New York.
39. Lakowicz, J.R. (ed.) (1991) Topics in Fluorescence Spectroscopy: Vol. 3, Biochemical Applications. Plenum Press, New York.
40. Clegg, R.M. (1992) Meth. Enzymol. 211, 353.
41. Haugland, R.P. (1996) Molecular Probes: Handbook of Fluorescent Probes and Research Chemicals. Molecular Probes, Eugene.
42. Murchie, A.I.H., Clegg, R.M., von Kitzing, E., Duckett, D.R., Diekmann, S. and Lilley, D.M.J. (1989) Nature 341, 763.
43. Clegg, R.M., Murchie, A.I.H., Zechel, A., Carlberg, C., Diekmann, S. and Lilley, D.M.J. (1992) Biochemistry 31, 4846.
44. Clegg, R.M., Murchie, A.I.H., Zechel, A. and Lilley, D.M.J. (1993) Proc. Natl. Acad. Sci. USA 90, 2994.
45. Gohlke, C., Murchie, A.I.H., Lilley, D.M.J. and Clegg, R.M. (1994) Proc. Natl. Acad. Sci. USA 91, 11660.
46. Duckett, D.R., Murchie, A.I.H. and Lilley, D.M.J. (1990) EMBO J. 9, 583.
47. von Kitzing, E., Lilley, D.M.J. and Diekmann, S. (1990) Nucl. Acids Res. 18, 2671.
48. Timsit, Y., Westhof, E., Fuchs, R.P.P. and Moras, D. (1989) Nature 341, 459.
49. Goodsell, D.S., Grzeskowiak, K. and Dickerson, R.E. (1995) Biochemistry 34, 1022.
50. Pohler, J.R.G., Duckett, D.R. and Lilley, D.M.J. (1994) J. Mol. Biol. 238, 62.
51. Duckett, D.R. and Lilley, D.M.J. (1991) J. Mol. Biol. 221, 147.
52. Churchill, M.E., Tullius, T.D., Kallenbach, N.R. and Seeman, N.C. (1988) Proc. Natl. Acad. Sci. USA 85, 4653.
53. Cooper, J.P. and Hagerman, P.J. (1989) Proc. Natl. Acad. Sci. USA 86, 7336.
54. Eis, P.S. and Millar, D.P. (1993) Biochemistry 32, 13852.
55. Clegg, R.M., Murchie, A.I.H., Zechel, A. and Lilley, D.M.J. (1994) Biophys. J. 66, 99.
56. Møllegaard, N.E., Murchie, A.I.H., Lilley, D.M.J. and Nielsen, P.E. (1994) EMBO J. 13, 1508.
57. Srinivasan, A.R. and Olson, W.K. (1994) Biochemistry 33, 9389.
58. Chen, S.M., Heffron, F. and Chazin, W.J. (1993) Biochemistry 32, 319.
59. Chen, S.M. and Chazin, W.J. (1994) Biochemistry 33, 11453.
60. Carlstrom, G. and Chazin, W.J. (1996) Biochemistry 35, 3534.
61. Pikkemaat, J.A., van den Elst, H., van Boom, J.H. and Altona, C. (1994) Biochemistry 33, 14896.
62. Duckett, D.R. and Lilley, D.M.J. (1990) EMBO J. 9, 1659.
63. Stühmeier, F., Welch, J.B., Murchie, A.I.H., Lilley, D.M.J. and Clegg, R.M. (1997) Biochemistry 36, 13530.
64. Lu, M., Guo, Q. and Kallenbach, N.R. (1991) Biochemistry 30, 5815.
65. Leontis, N.B., Kwok, W. and Newman, J.S. (1991) Nucl. Acids Res. 19, 759.
66. Welch, J.B., Duckett, D.R. and Lilley, D.M.J. (1993) Nucl. Acids Res. 21, 4548.
67. Yang, M.S. and Millar, D.P. (1996) Biochemistry 35, 7959.
68. Shlyakhtenko, L.S., Appella, E., Harrington, R.E., Kutyavin, I. and Lyubchenko, Y.L. (1994) J. Biomol. Struct. Dynamics 12, 131.
69. Leontis, N.B., Hills, M.T., Piotto, M., Malhotra, A., Nussbaum, J. and Gorenstein, D.G. (1993) J. Biomol. Struct. Dynamics 11, 215.
70. Rosen, M.A. and Patel, D.J. (1993) Biochemistry 32, 6576.
71. Rosen, M.A. and Patel, D.J. (1993) Biochemistry 32, 6563.
72. Ouporov, I.V. and Leontis, N.B. (1995) Biophys. J. 68, 266.
73. Overmars, F.J.J., Pikkemaat, J.A., van den Elst, H., van Boom, J.H. and Altona, C. (1996) J. Mol. Biol. 255, 702.
74. Welch, J.B., Walter, F. and Lilley, D.M.J. (1995) J. Mol. Biol. 251, 507.
75. Duckett, D.R., Murchie, A.I.H. and Lilley, D.M.J. (1995) Cell 83, 1027.
76. Branlant, C., Krol, A. and Ebel, J.-P. (1981) Nucl. Acids Res. 9, 841.
77. Guthrie, C. and Patterson, B. (1988) Annu. Rev. Genet. 22, 387.
78. Krol, A., Westhof, E., Bach, M., Luhrmann, R., Ebel, J.-P. and Carbon, P. (1990) Nucl. Acids Res. 18, 3803.
79. Feldstein, P.A., Buzayan, J.M. and Bruening, G. (1989) Gene 82, 53.
80. Connolly, B. and West, S.C. (1990) Proc. Natl. Acad. Sci. USA 87, 8476.
81. Connolly, B., Parsons, C.A., Benson, F.E., Dunderdale, H.J., Sharples, G.J., Lloyd, R.G. and West, S.C. (1991) Proc. Natl. Acad. Sci. USA 88, 6063.
82. Iwasaki, H., Takahagi, M., Shiba, T., Nakata, A. and Shinagawa, H. (1991) EMBO J. 10, 4381.
83. Bennett, R.J., Dunderdale, H.J. and West, S.C. (1993) Cell 74, 1021.
84. Bennett, R.J. and West, S.C. (1995) Proc. Natl. Acad. Sci. USA 92, 5635.
85. Bennett, R.J. and West, S.C. (1995) J. Mol. Biol. 252, 213.
86. Ariyoshi, M., Vassylyev, D.G., Iwasaki, H., Nakamura, H., Shinagawa, H. and Morikawa, K. (1994) Cell 78, 1063.
87. Sharples, G.J., Chan, S.N., Mahdi, A.A., Whitby, M.C. and Lloyd, R.G. (1994) EMBO J. 13, 6133.
88. Mahdi, A.A., Sharples, G.J., Mandal, T.N. and Lloyd, R.G. (1996) J. Mol. Biol. 257, 561.
89. Iwasaki, H., Takahagi, M., Nakata, A. and Shinagawa, H. (1992) Genes Dev. 6, 2214.
90. Muller, B., Tsaneva, I.R. and West, S.C. (1993) J. Biol. Chem. 268, 17179.
91. Rafferty, J.B., Sedelnikova, S.E., Hargreaves, D., Artymiuk, P.J., Baker, P.J., Sharples, G.J., Mahdi, A.A., Lloyd, R.G. and Rice, D.W. (1996) Science 274, 415.
92. Lloyd, R.G. and Sharples, G.J. (1993) EMBO J. 12, 17.
93. Kemper, B. and Garabett, M. (1981) Eur. J. Biochem. 115, 123.
94. Mueller, J.E., Kemper, B., Cunningham, R.P., Kallenbach, N.R. and Seeman, N.C. (1988) Proc. Natl. Acad. Sci. USA 85, 9441.
95. Mizuuchi, K., Kemper, B., Hays, J. and Weisberg, R.A. (1982) Cell 29, 357.
96. Lilley, D.M.J. and Kemper, B. (1984) Cell 36, 413.
97. Giraud-Panis, M.-J.E., Duckett, D.R. and Lilley, D.M.J. (1995) J. Mol. Biol. 252, 596.
98. Giraud-Panis, M.-J.E. and Lilley, D.M.J. (1996) J. Biol. Chem. 271, 33148.
99. Morikawa, K., Matsumoto, O., Tsujimoto, M., Katayanagi, K., Ariyoshi, M., Doi, T., Ikehara, M., Inaoka, T. and Ohtsuka, E. (1992) Science 256, 523.
100. Pohler, J.R.G., Giraud-Panis, M.-J.E. and Lilley, D.M.J. (1996) J. Mol. Biol. 260, 678.
101. Center, M.S. and Richardson, C.C. (1970) J. Biol. Chem. 245, 6285.
102. Sadowski, P.D. (1971) J. Biol. Chem. 246, 209.
103. de Massey, B., Studier, F.W., Dorgai, L., Appelbaum, F. and Weisberg, R.A. (1984) Cold Spring Harbor Symp. Quant. Biol. 49, 715.
104. Duckett, D.R., Giraud-Panis, M.-J.E. and Lilley, D.M.J. (1995) J. Mol. Biol. 246, 95.
105. West, S.C. and Korner, A. (1985) Proc. Natl. Acad. Sci. USA 82, 6445.
106. West, S.C., Parsons, C.A. and Picksley, S.M. (1987) J. Biol. Chem. 262, 12752.
107. Symington, L. and Kolodner, R. (1985) Proc. Natl. Acad. Sci. USA 82, 7247.
108. White, M.F. and Lilley, D.M.J. (1996) J. Mol. Biol. 257, 330.
109. Kleff, S., Kemper, B. and Sternglanz, R. (1992) EMBO J. 11, 699.
110. Lockshon, D., Zweifel, S.G., Freeman-Cook, L.L., Lorimer, H.E., Brewer, B.J. and Fangman, W.L. (1995) Cell 81, 947.
111. Elborough, K.M. and West, S.C. (1990) EMBO J. 9, 2931.
112. Hyde, H., Davies, A.A., Benson, F.E. and West, S.C. (1994) J. Biol. Chem. 269, 5202.
113. Stuart, D., Ellison, K., Graham, K. and McFadden, G. (1992) J. Virol. 66, 1551.
114. Kimball, A., Guo, Q., Lu, M., Cunningham, R.P., Kallenbach, N.R., Seeman, N.C. and Tullius, T.D. (1990) J. Biol. Chem. 265, 6544.
115. Bhattacharyya, A., Murchie, A.I.H., von Kitzing, E., Diekmann, S., Kemper, B. and Lilley, D.M.J. (1991) J. Mol. Biol. 221, 1191.
116. Shah, R., Bennett, R.J. and West, S.C. (1994) Cell 79, 853.
117. Benson, F.E. and West, S.C. (1994) J. Biol. Chem. 269, 5195.
118. White, M.F. and Lilley, D.M.J. (1997) J. Mol. Biol. 266, 122.
119. Parsons, C.A., Stasiak, A., Bennett, R.J. and West, S.C. (1995) Nature 374, 375.
120. Panyutin, I.G. and Hsieh, P. (1994) Proc. Natl. Acad. Sci. USA 91, 2021.
121. Pley, H.W., Flaherty, K.M. and McKay, D.B. (1994) Nature 372, 68.
122. Scott, W.G., Finch, J.T. and Klug, A. (1995) Cell 81, 991.
123. Bassi, G., Møllegaard, N.E., Murchie, A.I.H., von Kitzing, E. and Lilley, D.M.J. (1995) Nature Struct. Biol. 2, 45.
124. Bassi, G.S., Murchie, A.I.H. and Lilley, D.M.J. (1996) RNA 2, 756.
125. Dahm, S.C. and Uhlenbeck, O.C. (1991) Biochemistry 30, 9464.
126. Murchie, A.I.H., Thomson, J.B., Walter, F. and Lilley, D.M.J. (1998) Molecular Cell 1, 873.
127. Miick, S.M., Fee, R.S., Millar, D.P. and Chazin, W.J. (1997) Proc. Natl. Acad. Sci. USA 94, 9080.
128. Grainger, R.J., Murchie, A.I.H. and Lilley, D.M.J. (1998) Biochemistry 37, 23.
129. White, M.F., Giraud-Panis, M.-J.E., Pohler, J.R.G. and Lilley, D.M.J. (1997) J. Mol. Biol. 269, 647.
130. White, M.F. and Lilley, D.M.J. (1997) Mol. Cell Biol. 17, 6465.
131. Whitby, M.C. and Dixon, J. (1997) J. Mol. Biol. 272, 509.
132. Oram, M., Keeley, A. and Tsaneva, I. (1998) Nucl. Acids Res. 26, 594.
133. Schofield, M.J., Lilley, D.M.J. and White, M.F. (1998) Biochemistry 37, 7733.
134. Parkinson, M.J. and Lilley, D.M.J. (1997) J. Mol. Biol. 270, 169.
135. Giraud-Panis, M.-J.E. and Lilley, D.M.J. (1997) EMBO J. 16, 2528.
136. Giraud-Panis, M.-J.E. and Lilley, D.M.J. (1998) J. Mol. Biol. 278, 117.
137. Chan, S.N., Vincent, S.D. and Lloyd, R.G. (1998) Nucl. Acids Res. 26, 1560.
138. Pohler, J.R.G., Norman, D.G., Bramham, J., Bianchi, M.E. and Lilley, D.M.J. (1998) EMBO J. 17, 817.
139. Walter, F., Murchie, A.I.H., Duckett, D.R. and Lilley, D.M.J. (1998) RNA 4, 719.
140. Walter, F., Murchie, A.I.H., Thomson, J.B. and Lilley, D.M.J. (1998) Biochemistry, in press.
141. Bassi, G.S., Murchie, A.I.H., Walter, F., Clegg, R.M. and Lilley, D.M.J. (1997) EMBO J. 16, 7481.
16 DNA higher-order structures
Wilma K. Olson
Department of Chemistry, Rutgers, State University of New Jersey, New Brunswick, NJ 08903, USA
1. Overview
The packaging of DNA within the close confines of the cell imposes a higher-order structure on the long, thread-like molecule. The chain must fold within a highly crowded environment as well as adopt arrangements that allow for correct recognition and processing of the genetic message. This organizational structure, which is too unwieldy for direct molecular characterization, can only be inferred from the physical properties of relevant model systems. Isolated DNA supercoils with intertwined double helical strands constitute one such useful model. The well-known interplay between long-range structure and local twisting of the supercoil can be used to drive the folding of DNA around proteins and other packaging agents. The long-range association between interwound strands is relevant to the close packing of DNA, while the local structural changes provide insight into the transient opening of the double helix during biological processes.
This chapter starts with a general discussion of DNA supercoiling, including the topological constraints on the chain molecule and the known biological significance of the supercoiled state. Following a brief review of the intrinsic flexibility of the double helix, and the combined elastic rod/polyelectrolyte character of the chain, we then turn to the models and computational approaches used to deduce the structure of supercoiled DNA. The survey covers novel mathematical representations of the double helical axis, classic parameterization of DNA as an elastic rod, typical energy minimization and dynamics protocols, and efficient numerical solutions of the equations of equilibrium. Section 4 details the equilibrium structures and general structural principles gleaned from a variety of systems, starting with the uncharged, naturally straight, isotropic rod as a point of reference. The examples point to the role of the ionic environment, as measured by different non-bonded energy terms, and the effects of bound proteins on the configuration of the idealized rod. The final section illustrates how it is becoming possible to study the influence of realistic chemical features, such as anisotropic bending, natural curvature, and enhanced bending flexibility, on DNA supercoiling. The chapter concludes with a discussion of the large-scale structural changes observed in dynamical studies and a brief commentary on various perspectives of supercoiled structure.
2. DNA supercoiling
Closed loops of double-stranded DNA are ubiquitous in nature, occurring in systems ranging from plasmids, bacterial chromosomes, and viral genomes, which form single closed loops (1,2), to eukaryotic chromosomes and other linear DNAs, which appear to be organized into topologically constrained domains by DNA-binding proteins or other cellular attachments (3,4). The topological constraints in the latter systems are determined by the spacing of the bound residues along the contour of the chain and the imposed turns and twists of DNA in the intermolecular complexes (5-8). As long as the ends of the DNA stay in place and the duplex remains unbroken, the linking number, Lk, or number of times the two strands of the double helix wrap around one another, is conserved. [While the linking number is conventionally associated with a closed duplex (9), a conserved quantity similar to Lk can also be defined (I. Tobias, unpublished data) for a spatially anchored linear DNA.] These constraints in Lk underlie the well-known supercoiling of DNA, i.e. the deformation of native three-dimensional structure manifested by a higher-order folding of the chain axis and compensatory coiling of the complementary strands. In other words, the stress induced by positioning the ends of the polymer in locations other than the natural (relaxed) state perturbs the overall shape and/or local twisting of the intervening parts of the chain. These structural distortions are the nucleic acid counterparts of the tertiary folding of helical segments in proteins (e.g. coiled coils, twisted sheets), but the changes in structure are spread over a much larger molecular scale in DNA.

2.1 Topological constraints
The interdependence of secondary and tertiary structure in supercoiled DNA is expressed in mathematical terms using White's equation (10),

\[ Lk = Tw + Wr \tag{16.1} \]

In the absence of strand breaks, Lk has a fixed value which can be decomposed into a contribution Wr called the writhing number, which describes the folding of the helix axis, and the total twisting of the two strands, Tw. These two parameters are differential geometric quantities that vary continuously with the shape of the duplex, but their sum, Lk, a topological property, stays constant when the chain ends are spatially constrained. The writhing number, an accounting of pairwise spatial interactions along the helical axis (9), is zero for planar configurations and for out-of-plane symmetric arrangements. Non-zero values are obtained only when the DNA axis is distorted to a non-planar asymmetric arrangement. The writhing number, however, is not a unique characterization of tertiary structure and may be the same for very different spatial arrangements, such as the nicked, circular DNA shown in Fig. 16.1 with a short fragment wrapped in a superhelical pathway around a cylindrical 'phantom' protein or the unrestrained interwound structure that results when the chain is ligated and the protein is removed. The local disposition of chemical residues in different structures with the same writhing number is also quite different.

Fig. 16.1. Nicked solenoidal (116) and unnicked interwound (107) configurations of supercoiled DNA with the same magnitude of the writhing number (−1.7).

2.2 Biological importance
The linking number constraint in supercoiled DNA provides a structural basis for comprehending the helical unwinding implicated in significant biological processes such as replication and transcription. For example, the binding of different polymerase enzymes to DNA at the starting point of replication is enhanced in negatively supercoiled chains (11), where the linking number is less than that in the native molecule (Lk0) and the double helix is subjected to a persistent internal strain that tends to unwind regions of local structure. Conversely, the opening of DNA generated upon its complexation with RNA polymerase creates topological subdomains on either side of the moving enzyme, the nucleic acid segments behind the protein assembly adopting a negative ΔLk, where ΔLk = Lk − Lk0, and those ahead of it having a positive value (12) (see Fig. 16.2 for a computer-generated representation of the base pair structure in a looped segment of such a DNA). A global response to these locally induced changes in ΔLk, where the unwound residues behind the polymerase convert into configurations with negative writhing number and those ahead of it fold into arrangements of positive Wr (13), helps to account for the uptake of other proteins on the DNA. Specifically, the negatively writhed structures are expected to facilitate the reassembly of DNA on nucleosomes behind the polymerase, while the positively writhed forms may enhance their disassembly. It is well known that the association of the histone proteins with DNA on the nucleosome forces ~140 bp of the double helix into a left-handed superhelix (Wr < 0) (14,15), that nucleosome formation occurs preferentially on negatively rather than positively supercoiled DNA (16), and that positive supercoiling alters nucleosome structure compared to negative coiling (17,18). The positive supercoiling ahead of a moving polymerase may similarly facilitate the uptake of topoisomerases like E. coli DNA gyrase, which removes added superhelical stress and wraps 120-150 bp in a right-handed pathway (Wr > 0) around an aggregate of proteins (19-22).

Fig. 16.2. Computer-generated illustration at the base pair level of the topological subdomains created in an anchored DNA loop by the action of enzymes such as RNA polymerase and certain topoisomerases. Underwound segments (ΔLk = −1) lie behind the phantom protein on the right end of the loop and overwound segments (ΔLk = 1) on the left end. Image based on unpublished computer simulations by S.C. Pedersen.
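The bookkeeping behind White's equation can be illustrated with a few lines of Python. This is a minimal sketch rather than anything taken from the chapter: the helical repeat of 10.5 bp per turn and the 2100 bp plasmid are assumed values chosen for illustration, the function names are hypothetical, and the relaxed reference state is assumed to have zero writhe, so that the change in twist is ΔTw = ΔLk − Wr.

```python
HELICAL_REPEAT_BP = 10.5  # assumed relaxed B-DNA repeat (bp per turn); not given in the text


def relaxed_linking_number(n_bp, helical_repeat=HELICAL_REPEAT_BP):
    """Lk0 of the relaxed duplex: one link per helical turn."""
    return n_bp / helical_repeat


def linking_difference(lk, lk0):
    """Delta Lk = Lk - Lk0; negative for underwound (negatively supercoiled) DNA."""
    return lk - lk0


def twist_change(delta_lk, writhe):
    """Delta Tw from Lk = Tw + Wr, assuming the relaxed reference state has Wr = 0."""
    return delta_lk - writhe


lk0 = relaxed_linking_number(2100)        # hypothetical 2100 bp plasmid: 200 turns
d_lk = linking_difference(198, lk0)       # two turns removed before ring closure
d_tw = twist_change(d_lk, writhe=-1.7)    # writhe magnitude quoted for Fig. 16.1
print(lk0, d_lk, round(d_tw, 3))
```

In this illustration most of the linking deficit is taken up by writhe, leaving only a small residual change in twist.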
2.3 DNA conformation and flexibility
The manner in which a DNA fragment responds to superhelical stress depends on the native structure and intrinsic flexibility of the chain sequence. A structural code embedded in the DNA base pair sequence helps to organize the folding and determines the flexibility of the long polymer molecule. Some DNAs, for example, form natural superhelices that help to organize the folding of supercoiled states (23-25), while other sequences appear to resist folding deformation (26-29). As pointed out in preceding chapters in this volume, the double helix bends anisotropically at the dinucleotide level. Neighbouring base pairs preferentially roll about their long axes, and hence into the major and minor grooves of the structure, rather than tilt about their short (dyad) axes (30-32). Moreover, the growing database of X-ray crystal structures (33) shows that the bending and twisting of individual base pair steps depend on sequence, with some dimers acting as natural wedges that change the direction of the helical axis and other sequences acting as sites of under- or over-winding (34). The degree of twisting observed in the X-ray structures is further tied to the degree of bending and the base pair displacement, with the unwinding of adjacent residues inducing deformations into the major groove and the lateral displacement of base pairs along their long axes (34-38), i.e. an increase in roll and a decrease in slide. Furthermore, the local chain stiffness is sequence dependent, with certain residues adapting more easily to imposed stress. For example, severe protein-induced bends of DNA occur predominantly at pyrimidine-purine steps (39-41), the dimers expected to be the most deformable on the basis of steric (34,42) and energetic (30-32,43) arguments.
2.4 DNA as an elastic rod
The influence of fixed ends and enzymatic activity on the overall folding of DNA is analogous to the changes in topology seen in the manipulation of physical models, such as the looping and self-interwinding that results when the free end is rotated and/or translated with respect to the anchored end of a stiff rubber cord or guitar string. Mechanical methods commonly used to analyse these elastic materials are at once applicable to the study of spatially constrained DNA. Importantly, the double helix shares critical material features with the thin, circular elastic rods treated in classical 19th century models (44-47). Supercoiled DNA is clearly longer than it is wide (~20 Å diameter). Furthermore, because of the strong hydrogen bonding and stacking interactions of the constituent base pairs, the DNA molecule is naturally very stiff. The bending, twisting, and stretching of adjacent residues are so limited that chains of 150 bp are almost fully extended, with the computed root-mean-square end-to-end distance equal to roughly 85% of the total contour length (48). The deformations of DNA can thus be described in terms of Kirchhoff's rod model with two bending contributions (κ1 and κ2), the twist density (κ3), and the axial extension (ε) at all points s along the chain contour L. The elastic energy of such a system is given by eqn 16.2, where the angles are components of the vector κ(s) = [κ1, κ2, κ3] describing the angular rotation of local coordinate frames embedded in cross-sections of the rod at s and s + ds, and ε reflects the displacement of adjacent frames along the axis of the rod. The parameters E and a comprising the stretching constant are the Young's modulus and cross-sectional area, respectively. If the rod is divided into a set of discrete elements and the spacing between planar slabs, Δs, is taken as equal to the typical 3.4 Å distance between residues in B-DNA, κ1Δs and κ2Δs approximate the so-called roll and tilt angles, κ3Δs the base pair twist angle, and εΔs the per residue axial rise (8,49). Lateral/shear displacements in the base pair plane (i.e. slide and shift) are not treated in this scheme. As evident from eqn 16.2, the interdependence of angular and translational variables is also omitted in the model.
Until very recently, supercoiled DNA was always approximated as a naturally straight, inextensible rod that bends with equal likelihood in all directions, i.e. A1 = A2 = A, κ1° = κ2° = 0 in eqn 16.2. At this level of simplification, the bending energy reduces to a function of the curvature of the helical axis, κ = (κ1² + κ2²)^(1/2), and the twisting contribution simplifies to a function of the writhing number, the imposed value of ΔLk, and the total contour length (50), as expressed in eqn 16.3. This formulation takes advantage of the fact that the twist density is uniform in the equilibrium configurations of a naturally straight rod. The computational advantages of omitting individual base pairs in this treatment are obvious (i.e. the energetic profile is a function of the duplex axis alone and there is no need to locate individual base pairs), but do not necessarily justify the erroneous representation of base pair structure. The simplification is, of course, necessary in most analytical schemes. The exact results provided by the latter studies serve as critical benchmarks for numerical methods aimed at modelling the double helix at a more realistic level.
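For orientation, the rod energy cited as eqn 16.2 and its isotropic reduction, eqn 16.3, can be sketched in the standard Kirchhoff form. The symbols A1, A2, κi, κi°, ε, E, a, Wr, ΔLk, and L follow the text. The energy symbol U, the twisting constant C, and the intrinsic twist density κ3° are assumed names, and the exact grouping of terms in the original equations may differ from this sketch.

```latex
% General rod energy (cf. eqn 16.2); U, C and \kappa_3^{\circ} are assumed symbols
U = \frac{1}{2}\int_0^L \left[ A_1(\kappa_1-\kappa_1^{\circ})^2
      + A_2(\kappa_2-\kappa_2^{\circ})^2
      + C(\kappa_3-\kappa_3^{\circ})^2
      + E a\,\varepsilon^2 \right] \mathrm{d}s

% Naturally straight, inextensible, isotropic rod (cf. eqn 16.3):
% A_1 = A_2 = A, \kappa_1^{\circ} = \kappa_2^{\circ} = 0, \varepsilon = 0.
% With a uniform excess twist density 2\pi\,\Delta Tw / L and \Delta Tw = \Delta Lk - Wr,
% the twist term integrates to (C/2)\,L\,(2\pi\,\Delta Tw/L)^2:
U = \frac{A}{2}\int_0^L \kappa^2\, \mathrm{d}s
    + \frac{2\pi^2 C}{L}\,(\Delta Lk - Wr)^2 ,
\qquad \kappa = (\kappa_1^2 + \kappa_2^2)^{1/2}
```

The second line makes explicit why the twisting contribution depends only on Wr, ΔLk, and L once the twist density is uniform.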
2.5 Polyelectrolyte character of DNA
As a polyelectrolyte with a net negative charge at every nucleotide residue, double helical DNA is profoundly affected by its ionic environment. Salt effects are particularly important in supercoiled DNA where parts of the chain that are distant in linear sequence may come into close contact. Explicit atomic level treatment of supercoiled DNA, however, is still beyond the capabilities of even the most sophisticated computers. The size limitation problems confronting simulations of supercoiled molecules necessitate the use of primitive models where the chain backbone is reduced to an approximate atomic representation. At the simplest level, the DNA is modelled by hard sphere excluded volume terms that only crudely mimic the electrostatic repulsions of contacted segments (51), while the most detailed models to date (52,53) assign a point charge to each nucleotide residue and use an implicit representation of solvent (Debye charge screening). A number of intermediate schemes (54-57) avoid explicit counting of charged residues by dividing the chain into longer segments of uniform charge density. Recent theoretical work (58,59) points to potential attractive forces stabilizing the association of closely spaced charged rods. These interactions, which are thought to reflect the shared counterion atmosphere of the rods, help to account for the spontaneous aggregation at high salt concentrations of short DNA fragments with increased concentration of polymer and may be relevant to both the long-range contacts brought about by supercoiling and the cholesteric liquid crystal organization of DNA in some organisms (e.g. bacteria, dinoflagellates, mitochondria) (60-62).
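The Debye charge screening mentioned above can be sketched as follows. This is an illustrative example under stated assumptions, not the parameterization used in the cited models: a 1:1 salt, a relative permittivity of 78.4 for water at 298.15 K, a full unit charge on each residue (no counterion condensation correction), and hypothetical function names.

```python
import math

# Physical constants in SI units
E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
EPS_0 = 8.8541878128e-12     # vacuum permittivity, F/m
N_A = 6.02214076e23          # Avogadro constant, 1/mol


def debye_length_angstrom(salt_molar, temperature=298.15, eps_r=78.4):
    """Debye screening length (in angstroms) for a 1:1 salt; eps_r is an assumed value."""
    ions_per_m3 = salt_molar * 1000.0 * N_A   # mol/L -> ions of each sign per m^3
    lam = math.sqrt(EPS_0 * eps_r * K_B * temperature
                    / (2.0 * ions_per_m3 * E_CHARGE ** 2))
    return lam * 1e10


def screened_pair_energy_kT(r_angstrom, salt_molar, temperature=298.15, eps_r=78.4):
    """Debye-Hueckel repulsion, in units of kT, between two unit point charges."""
    bjerrum = E_CHARGE ** 2 / (4.0 * math.pi * EPS_0 * eps_r * K_B * temperature) * 1e10
    lam = debye_length_angstrom(salt_molar, temperature, eps_r)
    return bjerrum / r_angstrom * math.exp(-r_angstrom / lam)


lam_high = debye_length_angstrom(0.1)            # screening length at 0.1 M salt
e_high = screened_pair_energy_kT(20.0, 0.1)      # two charges 20 angstroms apart, 0.1 M
e_low = screened_pair_energy_kT(20.0, 0.01)      # same pair at 0.01 M
print(round(lam_high, 2), e_high, e_low)
```

Lowering the salt concentration lengthens the screening length, so the repulsion between the same pair of charges grows, which is the qualitative salt dependence the section describes.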
3. Computational issues
3.1 Equilibrium vs. dynamic structures
The logical first step in understanding the global folding of supercoiled DNA is to identify the configurations of minimum energy (i.e. equilibrium states). These states must compromise the natural twisting and bending of the chain in order to keep the ends in place and to avoid long-range self-contacts. The forces that satisfy the boundary conditions are initially unknown in rod models but can be determined along with the complete set of structural variables (typically Euler parameters tied to the bending and twisting components of individual reference frames) that minimize the energy of the constrained DNA. External forces, such as those that might be associated with binding proteins or an electric field, can also be included in the total energy. The equilibrium configurations of the system are then obtained by numerical solution of a set of non-linear algebraic equations (8,49,63,64). Other treatments of supercoiled DNA, by contrast, add explicit terms to the potential energy (65-68) or include clever representations of the chain axis (69-72) to satisfy the structural constraints on chain ends. The minimum energy states in such studies are identified by a guided search of configuration space, typically via simulated annealing or other acceleration procedures in Monte Carlo and molecular dynamics studies (70,73-75) or with derivatives of the energy in direct minimizations (71,72).
In general, the thermal fluctuations of the double helix as a whole must be considered alongside the equilibrium structures (76). These entropic effects become especially significant when the DNA is large compared with the persistence length, a classical measure of the distance over which the direction of the chain is maintained (77). In B-DNA this distance is about 500 Å (78), assuming that the measured chain dimensions can be interpreted in terms of the isotropic rod model. The equivalence of this value with the contour length of a ~150 bp duplex is thus a rough indicator of the chain length at which global flexibility starts to become important. The important issue in sufficiently long DNA is how the energy differences between local minima, and the barriers between them, compare with the thermal energy, kT. Part of this problem can be addressed with techniques like Monte Carlo sampling (54,66,67,79) or by Brownian (80-84), Langevin (71,85), and molecular (73-75) dynamics simulations, as well as with analytical theory (86,87). The Monte Carlo method, if care is taken to generate a representative sample of configuration space, will uncover fine details of the global states accessible through thermal fluctuations, while the dynamical studies, because they are based on numerical integration of the equations of motion, will give additional insight into the pathways of overall structural change. The classical rod models, while employed to date almost exclusively in studies of DNA statics, are routinely applied to a great variety of dynamical problems in engineering mechanics. Applications of classical rod dynamics to DNA are just beginning to appear (87-91).
A variety of theoretical and computational approaches are therefore required to deal with the various aspects of DNA supercoiling. For quantitative and qualitative predictions of the effect of local structural changes on the global features of DNA that is long compared with the persistence length, the inclusion of thermal fluctuations is essential, and Monte Carlo or dynamical methods must be used. There are situations, however, when the inclusion of thermal fluctuations is not essential, such as in short stretches of a long molecule (e.g. loops of DNA anchored at their ends by proteins). If one expects the most important information to be contained in the structural details of the equilibrium states (e.g. the path of DNA on the nucleosome or the rotational positioning of bent DNA sequences), one should turn to minimization methods or to one of the numerical or analytical approaches recently developed on the basis of classical rod theory.
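The role of the persistence length as a yardstick for global flexibility can be checked with a short calculation. The sketch below uses the Kratky-Porod worm-like chain expression for the mean-square end-to-end distance, which is an assumption made here for illustration and is not necessarily the method behind the values computed in ref. 48. The 3.4 Å rise and the 500 Å persistence length are the values quoted in the text; the function name is hypothetical.

```python
import math


def wlc_rms_end_to_end(contour_length, persistence_length):
    """Root-mean-square end-to-end distance of a Kratky-Porod worm-like chain."""
    big_l, p = contour_length, persistence_length
    mean_square = 2.0 * p * big_l * (1.0 - (p / big_l) * (1.0 - math.exp(-big_l / p)))
    return math.sqrt(mean_square)


RISE_PER_BP = 3.4            # angstroms per base pair in B-DNA (value used in the text)
PERSISTENCE_LENGTH = 500.0   # angstroms (value quoted for B-DNA)

contour = 150 * RISE_PER_BP  # contour length of a 150 bp duplex
ratio = wlc_rms_end_to_end(contour, PERSISTENCE_LENGTH) / contour
long_ratio = wlc_rms_end_to_end(2000 * RISE_PER_BP, PERSISTENCE_LENGTH) / (2000 * RISE_PER_BP)
print(round(ratio, 3), round(long_ratio, 3))
```

For the 150 bp chain the ratio comes out at about 0.855, in line with the roughly 85% extension quoted for chains of this length, while the 2000 bp chain is far from fully extended.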
3.2 Chain representations
The representation of closed chain molecules with ends confined to a fixed separation and orientation is a long-standing problem in polymer physical chemistry that can be attacked from several points of view. In one approach the configurations of unconstrained linear molecules that meet certain spatial criteria (normally a set of distances and angles between chain ends) are collected through exhaustive simulation studies (92). This method, however, is not practical for studies of the preferred geometry and intrinsic flexibility of supercoiled DNA. The probability of identifying specific configurations from random sampling of the unconstrained chain is so low that it is difficult to accumulate a meaningful set of appropriate states. The method is nevertheless very useful for simulations of the kinetics of chain cyclization (i.e. ring closure) (93-95) or the formation of closed loops (79,96,97). A second way to study supercoiled DNA is to start with a configuration that meets the desired structural criteria and allow the system to deform subject to some potential function. The major difficulty in such simulations is the preservation of the constraints on chain ends. Individual Cartesian coordinates must be moved in small concerted steps, or internal torsions and valence angles varied in a highly correlated fashion, to maintain the fixed configuration of chain ends (66,98-100). Alternatively, one can introduce explicit energy terms that force the chain ends to a given position. Elastic potentials with no physical significance are typically employed in Cartesian simulations to keep one or more interatomic distances within a desired range (65,73-75,101). Both approaches are computationally intensive.
Two less computationally demanding methods, one using curve fitting techniques and the other involving Euler parameters, can also be taken to identify the preferred configurations of supercoiled DNA. The former method employs simple mathematical formulations, i.e. piecewise B-spline curves or finite Fourier series representations, that automatically satisfy the end-to-end limitations on the constrained DNA axis. These expressions, with a small number of independent variables (the vertices of a polygonal representation of the smoothly folding chain in the case of the B-spline and a set of coefficients for the Fourier series), have been used in numerous simulations of DNA modelled as a naturally straight isotropic rod (5,52,55,72,102-105). The Euler parameters are unknowns determined in the elastic rod treatment of supercoiled DNA (8,49,63,106,107). Both representations aid rapid optimization of chain configuration.
The degree of computed chain movement depends on both the length of achievable simulations and the finest level of chain representation. Large-scale polymer motions become apparent if the DNA is simplified and the number of independent variables is thereby reduced. The treatment of supercoiled molecules frequently entails reduction of the polymer to a sequence of virtual bonds, each of which may sometimes span several helical turns (66,108). The use of such rigid units is justified in short, stiff fragments up to a few helical turns and in very long chains, i.e. of 2000 bp or more according to direct computations of the Gaussian limit for idealized B-DNA duplexes (48), where the extended bonds correspond to hypothetical Kuhn segments (109,110). The representation of intermediate length [O(10²) bp] DNA as rigid repeating units can be misleading in that chains of this length are flexible enough on the global scale that the mean end-to-end distances differ by 15% or more from the static rod approximation (48,111,112). Furthermore, the bending 'corrections' needed to relate such long segments to the observed persistence length of DNA are exaggerated (112,113) and beyond the limited angular range over which the elastic rod approximation is valid. The global folding is also quite irregular in simplified models generated from extended polymer links (66,67,82).
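The statement that a piecewise B-spline automatically satisfies the closure constraint can be illustrated with a small sketch. The code below evaluates a closed, uniform cubic B-spline from a polygon of controlling points using the standard uniform blending functions; these are an assumption here, since the indexing and form used in the cited studies may differ, and the function names and the square test polygon are hypothetical.

```python
def bspline_basis(u):
    """Uniform cubic B-spline blending functions at mesh parameter 0 <= u <= 1."""
    return (
        (1 - u) ** 3 / 6,
        (3 * u ** 3 - 6 * u ** 2 + 4) / 6,
        (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6,
        u ** 3 / 6,
    )


def closed_bspline_point(control, i, u):
    """Point at parameter u on segment i of the closed curve defined by the polygon."""
    n = len(control)
    weights = bspline_basis(u)
    # Indices wrap around, so the curve closes without any explicit constraint.
    neighbours = [control[(i + k - 1) % n] for k in range(4)]
    return tuple(
        sum(w * p[axis] for w, p in zip(weights, neighbours))
        for axis in range(3)
    )


# Hypothetical planar control polygon (a square) used only for illustration
square = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)]

start = closed_bspline_point(square, 0, 0.0)   # beginning of the first segment
end = closed_bspline_point(square, 3, 1.0)     # end of the last segment
print(start, end)
```

The first and last points coincide, so the ring closes by construction, and neither lies on a vertex of the polygon: the controlling points steer the curve without being part of it.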
3.3 Curve fitting techniques: B-splines and finite Fourier series
The main advantage of B-spline parameterization of a closed curve, r = Σ r_i(u), is the direct control of the chain pathway provided by the choice of independent parameters, p, called controlling points (114). The order-four (cubic) curves with regional segments r_i(u) are sufficient for the calculation of topological and energetic parameters of a naturally straight, isotropic rod. The coefficients in the expression for each segment assure the smooth connection between successive curve segments and the continuity in first and second derivatives needed to evaluate eqn 16.3. The increments of the mesh parameter, u, which varies between zero and unity, determine the level of structural representation (i.e. virtual bond lengths). In other words, the location of individual residues is implicitly determined by the equations of the closed curve, with the number of computational variables sharply reduced compared with that necessary for explicit specification of all chain units.
A subset of controlling points can be fixed during the course of computation to simulate effects of local rigidity within the DNA (103,115,116). In some of the simulations reported below, a set of points describing a superhelix of appropriate proportions is used to model the presence of a protein rigidly bound to DNA. The B-spline procedure, however, has two drawbacks. The complexity of the curve is limited by the number of controlling points: more variables are needed to represent more convoluted pathways. In addition, the controlling points simply guide, but do not lie on, the curve that they define. Only in the limit of an infinite number of controlling points is it possible to represent specific spatial features.
Fourier analysis corrects for the deficiency in B-spline configurational control and provides a direct connection between experimental measurement and computer simulation. Virtually any target function or set of coordinates (e.g. an electron micrographic tracing) can be transformed into a finite Fourier series, the simplicity of which can be exploited for structural manipulation and analysis (5,72,104,105). An expression of the form […] corresponds to the difference between a given starting structure and an arbitrary chain configuration. The vectorial coefficients, a_m and b_m, are the independent variables that determine the folding of the helix axis, while the increments of the contour parameter, 0 < u […]

[…] (s)/2] and u = [u1(s), u2(s), u3(s)], avoids singularities and the computational costs associated with the trigonometric parameterization (49). The elements of T(s) are further related to the components of κ(s), the parameters used in eqn 16.2 to monitor the bending and twisting of the rod. The κ(s) values are given by the scalar products of the d_i with their derivatives with respect to arc length, dd_i/ds = κ(s) × d_i, e.g. κ1 = d3 · dd2/ds = 2(q1′q4 + q2′q3 − q3′q2 − q4′q1). Thus, optimization of the energy of the constrained DNA ultimately yields the base pair axes.
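The relation between the Euler parameters and the bending component κ1 can be checked numerically. The sketch below is illustrative only and rests on assumptions not confirmed by the text: q4 is taken as the scalar Euler parameter, the local axes d1, d2, d3 are taken as the columns of the usual rotation matrix built from (q1, q2, q3, q4), and the test path is a hypothetical rod bent uniformly about d1. It compares κ1 obtained from the Euler parameters with κ1 obtained directly as d3 · dd2/ds by finite differences.

```python
import math


def directors(q):
    """Local axes d1, d2, d3 (rotation-matrix columns) from Euler parameters q1..q4."""
    q1, q2, q3, q4 = q   # q4 is assumed to be the scalar component
    d1 = (q1 * q1 - q2 * q2 - q3 * q3 + q4 * q4, 2 * (q1 * q2 + q3 * q4), 2 * (q1 * q3 - q2 * q4))
    d2 = (2 * (q1 * q2 - q3 * q4), -q1 * q1 + q2 * q2 - q3 * q3 + q4 * q4, 2 * (q2 * q3 + q1 * q4))
    d3 = (2 * (q1 * q3 + q2 * q4), 2 * (q2 * q3 - q1 * q4), -q1 * q1 - q2 * q2 + q3 * q3 + q4 * q4)
    return d1, d2, d3


def kappa1(q, dq):
    """kappa_1 = 2(q1'q4 + q2'q3 - q3'q2 - q4'q1), with dq the arc-length derivative of q."""
    q1, q2, q3, q4 = q
    p1, p2, p3, p4 = dq
    return 2.0 * (p1 * q4 + p2 * q3 - p3 * q2 - p4 * q1)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


OMEGA = 0.2  # hypothetical uniform bending rate about d1 (radians per unit arc length)


def q_of_s(s):
    return (math.sin(OMEGA * s / 2), 0.0, 0.0, math.cos(OMEGA * s / 2))


def dq_of_s(s):
    return (OMEGA / 2 * math.cos(OMEGA * s / 2), 0.0, 0.0, -OMEGA / 2 * math.sin(OMEGA * s / 2))


s, h = 1.3, 1e-6
k_formula = kappa1(q_of_s(s), dq_of_s(s))

# Central-difference estimate of dd2/ds, then projection on d3
d2_plus, d2_minus = directors(q_of_s(s + h))[1], directors(q_of_s(s - h))[1]
dd2 = tuple((a - b) / (2 * h) for a, b in zip(d2_plus, d2_minus))
k_frames = dot(directors(q_of_s(s))[2], dd2)
print(k_formula, k_frames)
```

Both routes return the imposed bending rate, which is a useful sanity check on the sign convention before the expression is used inside an energy function.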
3.5 Energy minimization procedures
Minimum energy forms of supercoiled DNAs can be identified using stochastic (e.g. Monte Carlo), deterministic (e.g. direct minimization), and iterative methods. The Monte Carlo schemes entail random variation of independent chain parameters (e.g. polygonal vertices, B-spline controlling points, Fourier coefficients) with configurational acceptance based on the standard Metropolis criterion (118). The simplicity of the algorithm and the ease of programming are counterbalanced by the long times required to identify the global energy minimum. Monte Carlo simulations (5,103,119) can be carried out at a fixed (high) temperature with the repetition of successful downhill moves to accelerate convergence (70), or gradually over a series of temperatures in a simulated annealing scheme (120).
Direct optimization methods entail computation of the energy and its first and second derivatives with respect to the three components of the independent variables (55,72,102,104,105). The requisite programming is more demanding than the Monte Carlo method, but the computational speed is significantly enhanced; see Table 5.1 in ref. 116 for timings.
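The Monte Carlo scheme with Metropolis acceptance and a simulated annealing schedule can be sketched in a few lines. This is a generic illustration, not the protocol of the cited studies: the quadratic `toy_energy` is a hypothetical stand-in for the elastic energy of a chain, the two variables stand in for independent chain parameters such as controlling-point coordinates, and the temperature schedule and step size are arbitrary choices.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng):
    """Metropolis criterion: accept downhill moves, uphill ones with exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def anneal(energy, x0, temperatures, steps_per_t, step_size, seed=0):
    """Random single-variable moves over a decreasing series of temperatures."""
    rng = random.Random(seed)
    x, e = list(x0), energy(x0)
    best_x, best_e = list(x), e
    for kT in temperatures:
        for _ in range(steps_per_t):
            trial = list(x)
            k = rng.randrange(len(trial))               # pick one independent variable
            trial[k] += rng.uniform(-step_size, step_size)
            e_trial = energy(trial)
            if metropolis_accept(e_trial - e, kT, rng):
                x, e = trial, e_trial
                if e < best_e:                          # remember the lowest state seen
                    best_x, best_e = list(x), e
    return best_x, best_e


def toy_energy(x):
    """Hypothetical stand-in for the chain energy; minimum of zero at x = (1, 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


start = [4.0, -3.0]
best_x, best_e = anneal(toy_energy, start, [2.0, 1.0, 0.5, 0.1, 0.01], 2000, 0.5)
print(best_x, best_e)
```

The high-temperature stages let the search cross barriers, while the final low-temperature stages refine the lowest state found, which mirrors the trade-off between simplicity and run time noted above.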
3.6 Elastic equilibrium conditions
Iterative procedures are used in solving the set of non-linear differential equations of equilibrium for a spatially constrained DNA rod (8,49,63,106). The equations follow from the equilibrium conditions, dF/ds + f = 0 and dM/ds − (1 + ε)F × […]
Crystallographic structures of RNA oligoribonucleotides and ribozymes
Fig. 17.1. (a) Stereoview of the standard G(anti):A(anti) base pair within a regular helix in structure 3.1.3. (b) Stereoview of a sheared G:A pair stacked on a trans Watson-Crick/Hoogsteen A:A pair in structure 3.2.4.
cross each other at an angle of 57° and the contacts involve the hydroxyl groups of one molecule with the pyrimidine O2 and the purine N3 or N2 atoms of the other (see below and Fig. 17.12). In total, 18 direct intermolecular contacts are observed, which can be divided into four ribose-phosphate, five ribose-base, and nine ribose-ribose interactions. In the terminal base pairs, the O3' hydroxyl groups participate.
3.1.5 The r(C4G4) helix in two crystal lattices
The self-complementary RNA octamer, r(C4G4), was studied in two crystal forms, a rhombohedral one (at 1.8 Å) and a hexagonal one (at 2.8 Å) (12). For the rhombohedral form, the data extended to 1.46 Å with synchrotron radiation. The helices are very similar in the two forms and both are close to the standard fibre RNA helix. In the middle of the helix, the 5'C-3'G step presents a pronounced interstrand stacking of the purine rings, as is common in RNA helices. In the rhombohedral crystal form, adjacent double helices stack head-to-tail and form infinite columns (the local pseudo-twist angle between duplexes is 4°). In the hexagonal crystal form, the helices stack head-to-head (they are related by a twofold axis), which leads to a pseudo-right-handed superhelix. The packing contacts in the hexagonal form are restricted to stacking interactions between terminal base pairs, while in the rhombohedral crystal the helices are interlocked with insertions of the sugar-phosphate backbone of one helix in the shallow groove of another helix. In the latter case, the contacts are made essentially by hydrogen bonds between hydroxyl and phosphate groups. The different packing contacts might explain the lower resolution of the hexagonal form.
The RNA hydration was studied in detail in a noteworthy article (50). The O1P phosphate oxygen atoms, the pro-Rp oxygen atoms, are systematically bridged by water molecules on both strands (51). These bridging water molecules are themselves linked to a string of bridged water molecules binding to hydrophilic atoms in the deep groove (N4, N7). On average, each O2' hydroxyl group is hydrated by two water molecules (about the same level as the pro-Rp oxygen atoms). Except for the terminal base pairs, a water molecule bound to the O2' atom bridges to the exocyclic O2(Y) or ring N3(R) atoms, as is frequently seen. Around the O2' hydroxyl groups, water molecules cluster into four regions, indicating that the bound water molecules possess additional contacts to the surrounding polar atoms, like O3', O4', and O2P (pro-Sp) atoms. The average distance between O2' of residue (i) and O4' of residue (i + 1) is 3.68 Å, longer than that of a typical hydrogen bond. It was therefore concluded that water molecules are better acceptors than the ring ribose O4' atoms. The water structure in the deep groove is highly organized and displays pentagonal arrangements. In the shallow groove, at the packing contacts, the hydrophilic atoms of one duplex (especially the O2' hydroxyl group) can replace a water molecule of the hydration network. Interestingly, compared with the same sequence with deoxyribose sugars, the ribo-oligomer is strongly stabilized (ΔTm = 25.5°) and the stabilization is enthalpy driven.
3.1.6 The alternating purine-pyrimidine r(GUAUAUA)d(C) helix
In this duplex, the 3'-terminal residue contains a deoxyribose sugar and not a ribose. Crystals were never obtained in that case, an observation that is not understood (14). The crystals belong to the rhombohedral space group R3 (one of the previous structures belongs to R32) with head-to-tail packing of helices with a negligible pseudo-twist angle at the junction. Each duplex is surrounded by three other duplexes and possesses three types of environment, two of which present packing contacts whereby the sugar-phosphate of one duplex faces the shallow groove of a neighbouring one.
Within each duplex, the roll angles alternate between the large positive values at UpA steps (13.3°) and the small values at ApU steps (3.6°) with, in both cases, large negative values for the propeller twist [-18(3)°]. Eleven of the 14 hydroxyl groups are hydrated and four of them directly contact the ring O4' atom of the next residue. In a couple of instances, two-water bridges link the O2' to the O2(Y)/N3(R) or the O2' to the O4'.
3.2 Helices with unusual internal base pairs
The observation that three families of tetraloops, the -GNRA-, the -UNCG-, and the -CUUG- tetraloops, are overwhelmingly present in large RNAs like ribosomal RNAs or self-splicing introns (52,53) encouraged investigators to attempt to crystallize them. Some of these tetraloops have been analysed by NMR methods in solution (54-56; see Chapter 18). However, since such hairpin loops are attached to an RNA duplex with Watson-Crick complementarity, at the high RNA and salt concentrations typical of crystallization conditions, they tend to form intermolecular duplexes with non-canonical base pairs in their middle instead of intramolecular hairpin loops. This led to structural information on non-canonical base pairs, albeit sometimes in somewhat unnatural environments.
3.2.1 The helix with two U:C mismatches between two G:U wobble pairs
The dodecamer GGACUUCGGUCC crystallizes with a twofold axis between the two central U:C base pairs (15). There is an additional twofold axis between adjacent dodecamers so that they stack in a head-to-tail fashion with a pseudo-twist angle of 16.1° and a rise of 2.12 Å. The helical parameters (32.1° and 2.93 Å) are typical of RNA helices, with the C1'-C1' distance at the U:C base pair increased by 1 Å and the angle between the glycosyl bonds decreased by 15°. The U:C base pair contains only one direct hydrogen bond, between O4(U) and N4(C), with the two ring N3 nitrogen atoms bridged by a water molecule. Interestingly, two-water bridges occur between the N4(C) [or the O4(U)] and the pro-Rp anionic phosphate oxygen of the attached 5'-phosphate group (see Fig. 17.2a). Such a two-water bridge occurs also in the G:U pair where it involves the N7(G). In the G:U pair, a water molecule links the N2(G) and the O2'(U), instead of the N2(G) and O2(T) in DNA G:T base pairs (57). The water molecules in the deep groove of the G:U and U:C base pairs have isotropic B factors about twice as high as those in the shallow groove. The width of the deep groove, normally around 4 Å, is almost doubled in the present structure, while the width of the shallow groove, normally around 11 Å, is almost unchanged at 9 Å. In addition, it is worth noting that the hydration pattern of the G:U remains qualitatively unchanged when the nature of the flanking base pairs changes, as seen in recent crystal structures (58,59) where tandems of alternate G:U pairs occur. In the shallow groove, a water molecule is present that contacts the N2 of the G together with the O2' and O2 atoms of the U. This pattern of hydration is typical of G:U pairs in crystals and in molecular dynamics simulations (see ref. 60 for discussion). The sequence order, 5'-UG-3' or 5'-GU-3', mainly affects the twist angle between the tandem G:U pairs by increasing it to 38.1°, or decreasing it to 25.3°, respectively.
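To put the helical parameters quoted above into more familiar terms, the twist and rise can be converted into residues per turn and helical pitch. This is a small arithmetic sketch; the function names are hypothetical and the only inputs are the 32.1° twist and 2.93 Å rise given in the text.

```python
def residues_per_turn(twist_deg):
    """Number of base pairs needed for one full 360 degree turn."""
    return 360.0 / twist_deg

def pitch(twist_deg, rise_angstrom):
    """Helical pitch in angstroms: rise per base pair times base pairs per turn."""
    return residues_per_turn(twist_deg) * rise_angstrom

twist, rise = 32.1, 2.93  # values quoted for the dodecamer helix
print(round(residues_per_turn(twist), 1), round(pitch(twist, rise), 1))
```

The same two functions can be applied to the tandem G:U twist angles (38.1° and 25.3°) to see how strongly the sequence order changes the local winding.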
Fig. 17.2. (a) Stereoview of a U:C pair in structure 3.2.1 (U in back) with the surrounding solvent molecules (dark spheres). The sheared G:A pair in the tetraloop -GAAA-. The sheared G:A pair as in Fig. 17.1b. An example of a sheared A:A pair from structure 4.2 (PDB ID number: 1gid) to illustrate the isostericity between a sheared A:A pair and a G:A pair.
Fig. 17.6. Stereoviews of the magnesium binding sites around the hammerhead structure (69). In (a), the full structure is represented. In (b), only the core structure is shown in the same orientation as that in (a). The nucleophilic 2'-hydroxyl hydrolysing the phosphate group is marked by a black arrow. The waters of hydration are those observed after equilibration of molecular dynamics simulations of the crystallographic structure of the RNA, and the magnesium positions identified by crystallography are fixed as described in ref. 70.
4.2 The P4-P6 domain of group I introns
The largest RNA structure ever solved contains, per asymmetric unit, two molecules of the P4-P6 domain (154 nucleotides) of a group I intron within a gene of the large ribosomal subunit of Tetrahymena thermophila, and was solved at 2.5-2.8 Å resolution (25). The secondary structure consists of eight helices called P4, P5, P5a, P5b, P5c, P6, P6a, and P6b separated by junctions J4/5, J5/5a, J5a/5b, J5b/5c, J6/6a, and
Fig. 17.7. The secondary (left) and tertiary (right) structures of structure 4.2 (25) (PDB ID number: 1gid), the P4-P6 domain of the Tetrahymena thermophila group I intron. Empty squares between bases indicate non-canonical base pairings. Some important three-dimensional motifs are indicated.
J6a/6b (Fig. 17.7). The P4-P6 domain is connected to the ribozyme core by two junctions: at the 5'-end J3/4 and at the 3'-end J6/7 (Fig. 17.7). This remarkable crystallographic achievement brings a wealth of stunning interactions, contacts, and new motifs (26,71). Besides, since the Tetrahymena ribozyme has been extensively studied in solution, comparisons between the crystal and solution data can be made. Furthermore, sequence comparisons have led to a model structure of the catalytic core (72), and, more recently, of the full intron (73).
Helices P6b, P6a, P6, P4, and P5 form one helical domain and helices P5a and P5b form another stack. Between the two stacked columns, there is a 150° turn made by the internal loop J5/5a so that the two helical domains are packed side by side (overall length 110 Å, width 50 Å, and thickness 25 Å). Helices P4 and P6 stack on top of each other with the 5'-entering strand J3/4 binding into the shallow groove of P6 and the 3'-leaving strand J6/7 binding into the deep groove of P4, as predicted (72,74). However, the secondary structure of helix P6 is not as expected (either because there is a crystal contact just below it or because of the presence of an additional and unnatural G at the 5'-end which forms a non-native additional base pair) since it is the 5'-end stretch that base pairs to P6(5') and not P6(3'), with the 3' dangling part forming triples in the deep groove of the unnatural helix. The internal loop J4/5 (Fig. 17.8a; see also Fig. 17.5) is important because of its predicted role in recognition of the G:U base pair in the substrate helix (not present in the crystal structure, where, instead, two symmetrically related J4/5 loops interact with each other). It consists of a tandem of
sheared A:A pairs with a third adenine stacked between the last A:A pair and a G:U pair [it is important to note that a sheared A:A pair is isosteric with a sheared G:A pair with a C2-H...N7 hydrogen bond, instead of the more classical N2(G)...N7(A) hydrogen bond (see Fig. 17.5)].
Fig. 17.8. Stereoviews of important motifs from structure 4.2. (a) The J4/5 junction motif. Note the internally stacked single adenine between the G:U pair and a sheared A:A pair. (b) The J5/5a junction which forms the bend between the two helical domains. Note the central one-hydrogen-bond C:C pair. (c) The A-rich bulge. (d) The three-way junction between P5a, P5b, and P5c.
The turn between the two stacked columns of helices is surprising (Fig. 17.8b). The last base pair of P5 is a G:C pair since the following expected A:U is not formed. Instead, the U bulges out and the A stacks in. On the other side, the first base pair in P5a is a cis G:A base pair, as in tRNAs, followed by a G:C pair, with the adenine preceding the G:A pair bulging out. Thus, on each strand, there is a bulging base. In this way, two cytosine residues face each other (forming possibly a trans C:C pair similar to the Calcutta base pair of structure 3.3.2 with O2...H-N4 and N3...H-C5 hydrogen bonds) and three adenine residues come close with one base stacked on a possible trans Watson-Crick/Hoogsteen A:A pair (N6...N7 and N1...N6). In P5a, there is an asymmetric bulge, the A-rich bulge (Fig. 17.8c), which is also important for the assembly of the two helical domains. Before the A-rich bulge, the last base pair is a G:C pair followed by a bulging G and, surprisingly, a cis Hoogsteen A:U pair. After the adenine residue of the Hoogsteen pair, an adenine stacks below the sugar of the first base pair occurring after the A-rich bulge. The A-rich bulge continues with an outside bulging U and two stacked adenine residues, the last one of which stacks below the bulging G preceding the cis Hoogsteen U(anti):A(syn) pair, the latter constituting the single occurrence of a syn base in published RNA X-ray structures. Within the closed loop formed by the A-rich bulge, two magnesium ions have been identified (5.4 Å apart) in interaction with anionic phosphate oxygens of residues of the loop (in which the phosphates point towards the interior and the bases towards the exterior). The A-rich bulge plays an important role in the contact between the two domains via A183 and A184, which bind in the shallow groove to the riboses of base pairs C109:G212/G110:C211 in helix P4, forming a ribose zipper (see below and Fig. 17.12c).
The three-way junction between P5a, P5b, and P5c (Fig. 17.8d) is of a new kind and unlike the one present in the hammerhead structure. Although the overall impression of the P5abc domain is that of a helical column, the helical axes of P5a and P5b are not colinear at the three-way junction. Helix P5c points clearly to the side. The tandem sheared A:G pairs (non-alternating) are instrumental for the left-handed positioning of helix P5a towards P5b. The junction between P5c and P5a is highly unusual. After helix P5c, two residues point towards the sugar-phosphate of the two guanine bases implicated in the tandem G:A pairs, the following A residue stacks under the last base pair of P5a, with the last U bulging out.
Analysis of the structure has revealed several ion-binding sites. Figure 17.9 shows two examples. Interestingly, the magnesium ions bind in the deep groove of the RNA helices with a preference for guanine N7 and O6 atoms, and especially stacked G:U pairs (non-alternating, see Fig. 17.9a). The alternating G:U pairs bind only cobalt hexammines, and not magnesium ions.
4.3 RNA:RNA interaction motifs
Although all RNA molecules have a well-characterized secondary structure, large RNA molecules with a biological function such as recognition or catalysis require a tertiary structure. It is a puzzle to understand how such large and highly charged molecules are able to fold into compact structures, often by themselves without the help of proteins, as in several autocatalytic RNAs (75,76). The arrangements between
Fig. 17.9. Stereoviews of two magnesium ion binding sites (also occupied by osmium hexammine) in the P4-P6 domain. Both occur in the deep groove. (a) Binding to two non-alternating G:U pairs. (b) Binding to three adjacent guanine residues.
RNA molecules in the crystal packing reveal possible contacts. The RNA:RNA interaction motifs will be described here with special emphasis on those that have been observed recurrently. Large ribozymes, such as group I introns, rely heavily on two recognition motifs: the loop-loop motif and that between the GNRA family of tetraloops that interact specifically with shallow groove sides of regular or irregular helices (73). The loop-loop motif was first seen in the crystal structure of yeast tRNAAsp in which two anticodon loops with almost self-complementarity (-GUC-) form a small helix with a central U:U pair (35,77). The recognition motif involving the GNRA tetraloops was first predicted on the basis of sequence comparisons (72) and proved by a swap with a pseudo-knot motif in a group I intron (78). The more sophisticated recognition motif of the GAAA tetraloop family was discovered by in vitro selection experiments (79). Both types of recognition motifs have now been observed by X-ray crystallography: the -GNRA- motif to a shallow groove of G:C pairs in the hammerhead structures, and the GAAA motif to an internal loop in the P4-P6 domain. In crystal packing arrangements, a frequent contact is made between ribose rings of adjacent helices; this motif, termed the ribose zipper motif, has also been seen intramolecularly in the P4-P6 domain. Finally, the crystal structure of the P4-P6 domain led to the discovery of the structurally stunning A:A platforms.
4.3.1 The -GNRA- helix-loop motif
The -GNRA- tetraloops, especially -GNAA- and -GNGA- tetraloops, interact, respectively, with two consecutive C:G pairs (noted 5'-CC:GG) and a C:G stacked on a U:A pair (5'-CU:GA) so that the fourth residue of the loop (always A) binds to the second guanine of the helix (Fig. 17.10; see also Fig. 17.4) and the third residue of the loop, if an A, binds to G and, if a G, binds to A. The recognition occurs in the shallow groove of the helix and the chirality is such that the interacting bases of the loop are parallel to the purine bases of the helix (Fig. 17.10). With the Hoogsteen side of the bases in the loop oriented towards the inside of the loop, the recognition can only occur with the Watson-Crick face of the loop bases. The hydrogen bonding scheme does not involve the N6 amino group of loop adenines but, instead, N1(A) and N3(A), which, for the third base of the loop, bind to N2(G) and O2'(C), and, for the fourth base of the loop, to O2'(G) and N2(G) (see Fig. 17.5). Between the third and fourth adenine of the loop, there is therefore a rotation of about 30°.
The -GAAA- tetraloops also specifically bind an 11-nucleotide internal loop with a complex structure (see Fig. 17.11). It starts with two C:G pairs, followed by a bulged U, a trans Hoogsteen A:U pair, and an A:A platform. The three adenine residues of the loop interact with the 11-nucleotide motif and form a stack of four adenine bases with the first A of the A:A platform. Within the 11-nucleotide motif, the bulging U folds back and forms a one-hydrogen-bond contact with the first A of the A:A platform. The second base of the loop forms a trans Watson-Crick A:A symmetrical pair (N6...N1) with the adenine of the 11-nucleotide motif involved in the Hoogsteen pair. The third adenine base of the loop interacts with three hydroxyl groups (one to the G of the loop, one to that of the bulging U, and one to the G of the second G:C pair in the 11-nucleotide motif). Finally, the third adenine of the loop forms the same network of hydrogen bonds as the fourth A of the -GNRA- tetraloop interacting with C:G pairs.
4.3.2 The A:A platform motif
This motif is unexpected because two consecutive A residues stay at about the same level and present a pronounced translational shift, with the N3 of the 5'-adenine facing the Hoogsteen sites of the following adenine residue (Fig. 17.10). It has been remarked that, following the A:A platform, a G:U pair is generally found with the G 3' to the As. In the L5c loop, a non-canonical A:U pair with a single hydrogen bond between O4(U) and N6(A) is found instead of the G:U pair. In the latter case, the segment CG of the loop self-pairs between the two molecules of the asymmetric unit, forming a small intermolecular loop-loop helix. It is interesting to remark that, in the present model of the full intron, the same L5c loop forms an intramolecular loop-loop contact with loop L2 (73).
4.3.3 The ribose zipper motif
This motif is dominated by contacts involving the hydroxyl O2' group of two strands. It is seen in various forms in crystal packing contacts. In the ribose zipper motif, the O2' of one residue hydrogen bonds with the O2' and the N3(R) [or the O2(Y)] of an adjacent residue (Fig. 17.12).
Fig. 17.10. The -GNRA-/helix recognition motif (from top to bottom). Stereoview of a -GNRA- tetraloop as modelled on the basis of chemical probing experiments (80). The rms deviation between the modelled structure and the X-ray structure is 1.54 Å. Left, an idealized sheared G:A pair. Ribbon diagram illustrating the recognition potential of the -GNRA- tetraloop with the shallow groove of helices. Below, the idealized binding of a loop adenine residue and a helical G:C pair (78) and the crystallographically observed contact (81) with the intermolecular packing contact at the left.
Fig. 17.11. Stereoview of the GAAA tetraloop motif and its 11-nucleotide internal loop receptor, with two specific triple interactions.
5. Conclusions
Our knowledge of RNA structure and folding has increased considerably in recent years. Besides small RNA fragments, three large crystal structures are now available (tRNAs, hammerhead ribozymes, and the P4-P6 domain). Recurrent three-dimensional motifs, which can be either structural or folding, have been detected (the U-turn, the A:A platform, the ribose zipper, the G:A tandem, and the -GNRA-/shallow groove or the GAAA/internal loop contact). Together with the concept of hierarchical folding of large RNAs, the existence of recurrent RNA motifs has led to the RNA tectonics view, according to which large RNA structures can be decomposed into modules and assembled from them (82). At the atomic level, however, the variability in precise contact is subtle. For example, it is worth comparing the variability in the sheared G:A pairs (see Fig. 17.5), where the N1(G) is at times free and at other times engaged in hydrogen bonding (see Fig. 17.13, the tRNASer structure). In Fig. 17.13, a triple interaction between the deep groove of helix D and residues from the variable loop is shown in three different tRNAs. In tRNA[…], residue 46 forms a Watson-Crick/Hoogsteen pair with residue 22 and the phosphate of residue 9 binds to 46 and 13 (which pairs to 22). However, in tRNASer, it is residue
Fig. 17.12. Stereoviews of three ribose zippers. (a) In structure 3.1.1 (39) (PDB ID number: […]). (b) In structure 3.1.4 (11) (PDB ID number: 1sdr). (c) In structure 4.2 (25) (PDB ID number: 1gid).
9 that presents its Hoogsteen sites to the Watson-Crick sites of residue 13, which itself forms a sheared G:A pair with residue 22 (at the same time, it is the phosphate of residue 22 that binds to residue 9). Identical overall topological arrangements are thus coupled to microheterogeneities in the specific atomic contacts between residues underlying the stability of the global tertiary fold.
The important experimental observation is that topologically distinct molecules share quasi-identical three-dimensional micromotifs. These frequently observed motifs may have been selected during biological evolution because they are able to accommodate, within their folding, variability and heterogeneity. The building and assembly of a three-dimensional database of these motifs could therefore be a considerable help to scientists dealing with RNA for which X-ray or NMR structure models are not available.
Fig. 17.13. Similarities and differences in triple contacts in four RNAs. This triple contact occurs in the deep groove of the D helix. The triple in tRNAAsp is shown to illustrate the similarity of the contacts between A14 and A21 and that between A13 and A22 in tRNA[…]. There is no hydrogen bond between N1(A14) and O2'(A21).
Acknowledgements
B. M. is supported by a Bourse Docteur CNRS-Rhône-Poulenc Rorer. We thank Dr Thomas Hermann for supplying Fig. 17.6 and Quentin Vicens for compiling the tables. E. W. is thankful to the Institut Universitaire de France for support.
References
1. Milligan, J.F. and Uhlenbeck, O.C. (1990) Meth. Enzymol. 180, 51.
2. Chamberlin, M. and Ryan, T. (1982) The Enzymes 15, 85.
3. Price, S.R., Ito, N., Oubridge, C., Avis, J.M. and Nagai, K. (1995) J. Mol. Biol. 249, 398.
4. Scaringe, S.A., Francklyn, C. and Usman, N. (1990) Nucl. Acids Res. 18, 5433.
5. Usman, N., Ogilvie, K.K., Jiang, M.-Y. and Cedergren, R.J. (1987) J. Am. Chem. Soc. 109, 7845.
6. Ogilvie, K.K., Usman, N., Nicoghosian, K. and Cedergren, R.J. (1988) Proc. Natl Acad. […]

[…] C:G > U:A > A:U (53). The most stable double mismatch 5'-UG/GU stabilizes a duplex by ΔG° (37°C) = -4.8 kcal/mol and the least stable AA/AA destabilizes the duplex by ΔG° (37°C) = +3.0 kcal/mol. There is no evidence of mismatch-induced bending in RNA helices. To illustrate some general features of mismatch geometry, we discuss in detail the structures of G:U, G:A, AH+:C, and G:G pairs.
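The free energy increments quoted above can be translated into the factor by which a mismatch shifts the duplex formation equilibrium, using K = exp(-ΔG°/RT). The sketch below is only an illustration of that standard relation; the function name is hypothetical and the two ΔG° values are the ones given in the text.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)

def equilibrium_factor(delta_g_kcal, temp_c=37.0):
    """Return exp(-dG/RT) for a free energy change given in kcal/mol."""
    temp_k = temp_c + 273.15
    return math.exp(-delta_g_kcal / (R_KCAL * temp_k))

# Values quoted in the text for the most and least stable double mismatches.
for label, dg in [("5'-UG/GU", -4.8), ("AA/AA", +3.0)]:
    print(label, dg, equilibrium_factor(dg))
```

At 37°C the product RT is about 0.62 kcal/mol, so the 7.8 kcal/mol spread between the two mismatches corresponds to several orders of magnitude in the equilibrium constant.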
RNA structure in solution
Fig. 18.4. Non-Watson-Crick base pairs observed in solution structures of RNA. (a) Sheared G:A pair (18,58); (b) G:A (59); (c) A:A (60,61); (d)-(f) G:G mismatches (29,30,64); (g) reverse Hoogsteen A:U pair (60,61); (h) protonated A+:C pair (62,68); (i) wobble G:U pair (36,72); (j) water-mediated U:C pair (72); (k) protonated C+:C pair (53,75); (l) U:U mismatch (53,75).
3.4.1 G:U wobble pair
G:U mismatches are very common. Replacement of standard Watson-Crick base pairs by 'wobble' G:U pairs (Fig. 18.4i) perturbs the A-form helix only slightly (54). The distance between the C1' atoms across the minor groove is increased from about 10.6 Å to 12.8 Å (6), and the stacking, twist, and rise of the G:U pair are slightly changed (36). This perturbation can produce sites that facilitate ligand binding. A single G:U pair forms a preferential site for binding Mn2+ ions in the major groove of an RNA helix, as indicated by broadening of the NMR resonances caused by the paramagnetic ion (55). Magnesium ions presumably bind in a similar fashion. X-ray diffraction has revealed binding pockets for Co3+-hexammine and Os3+-hexammine in the major groove of an RNA helix with two adjacent G:U base pairs (56).
3.4.2 G:A mismatch
G:A pairs commonly occur in RNAs as a tandem mismatch. The stability and structure of tandem G:A mismatches depend on the closing Watson-Crick base pairs (57). The solution structure of the RNA duplex r(GGCGAGCC)2 shows that the G:A pair is in a 'sheared' conformation with hydrogen bonds between G amino and A N7, and G N3 and A amino (Fig. 18.4a) (58). There is a strong cross-strand G-G and A-A stacking which contributes to the high stability of the motif. When the closing Watson-Crick base pairs are changed, the same tandem mismatch in r(GCGGACGC)2 forms an imino-hydrogen bonded structure (Fig. 18.4b) (59) with intrastrand G-A stacking. The two motifs create very different distortions of the A-form helical geometry. The helix is much wider for the GGAC than for the CGAG motif, with the distances between G and A phosphates on opposite strands being 20.4 and 12.5 Å, respectively (the regular A-form distance between opposite strand phosphates is 17.5 Å). The 5'-GA-3' step is underwound (21°) at the GGAC mismatch and overwound (81°) at the CGAG.
G:A pairs are abundant in biological RNAs. A sheared G:A motif was found in the solution structure of the GCAA hairpin loop (18), the loop E family (see Section 3.5.1) (60,61), and loop A of the hairpin ribozyme (62). The imino-hydrogen bonded G:A pair was observed in a Rev response element (RRE) RNA (63,64) and in the crystal structure of r(CGCGAAUUAGCG) (65). Both G:A hydrogen bonding patterns are present in the structure of a flavin mononucleotide (FMN) aptamer (66).
Functionality of the sheared G:A pairs is attributed to the availability of the Watson-Crick faces of the bases for additional hydrogen bonding. An exposed N7 of G can also form a divalent metal-binding site, as is the case in the highly conserved tandem G:A pair seen in the crystal structure of the hammerhead ribozyme (26,67).
3.4.3 AH+:C pair
The protonated A:C pair is geometrically similar to the G:U wobble pair. It has been observed in a lead-dependent ribozyme (68), loop A of the hairpin ribozyme (62), and in the structure of a small hairpin loop (69). There are hydrogen bonds formed between the A amino and N3 of C, and between the protonated N1 of A and O2 of C (Fig. 18.4h). The evidence for the protonation of adenine N1 comes from the change in chemical shift of the C2 carbon, which can be monitored as a function of pH (62,68). The pH titration curves from these studies indicate that the pKa of the N1
nitrogen is significantly shifted and has a value of 6.2–6.4 (free adenosine has a pKa near 4). Similar protonated forms of adenine were observed in a crystal structure of ApA dimers (6). The ribose rings of both mismatched nucleotides are in the usual C3'-endo conformation and the pair is incorporated into a helix without significant distortions of the A-form geometry. The AH+:C pair is 2 kcal/mol (ΔG°) less stable than an A:U base pair at 37°C (69).

3.4.4 G:G pairs

Three different hydrogen bonding patterns have been observed for G:G mismatch pairs (Fig. 18.4d-f). G:G mismatches are common in structures of aptamers identified through in vitro selection for binding of various ligands. The structure of an ATP aptamer (29,30) revealed two different G:G pairs bonded as shown in Fig. 18.4d and e. A G:G pair (Fig. 18.4d) was observed in the arginine/citrulline and flavin mononucleotide (FMN) aptamers (66,70). The imino-hydrogen bonded G:G pair (Fig. 18.4f) is present in the RRE RNA internal loop (63,64). G:G mismatches easily dimerize in G-rich RNA sequences, forming ultra-stable tetrameric structures (G quartets) consisting of four hydrogen bonded guanine bases (71).

3.4.5 Other mismatches

Many other non-Watson-Crick base pairs have been found in RNA. An A:A pair and reverse Hoogsteen A:U pair (Fig. 18.4c and g) are formed in the eukaryotic 5S rRNA loop E (61) and in the sarcin/ricin loop from 28S rRNA (60). The structure of the U:C mismatch shown in Fig. 18.4j was found in crystals of RNA duplexes containing the internal loop sequence 5'-UUCG (72,73). The U:C pair involves two hydrogen bonds, one directly between the pyrimidine bases and another one mediated through a bridging water molecule.
Incorporation of solvent into the hydrogen bond network spreads the bases apart and ensures a good fit of the U:C mismatch to the A-form helical geometry. NMR studies of a duplex r(GGACUCGUCC)2 suggest that the structure of the U:C pair in solution is similar to the crystal structure (74). The structures of U:U and C:C+ mismatches shown in Fig. 18.4l and k are strongly supported by one-dimensional NMR data and thermodynamic studies of short duplexes containing these pairs (53,75).

3.4.6 Mismatch summary

RNA bases have the ability to form hydrogen bonded pairs in any combination (for the full list of possible base pairs with two hydrogen bonds see ref. 6, or Appendix I of ref. 76). Mismatches are stabilized by inter- or intra-strand stacking interactions and hydrogen bond networks. Introduction of mismatches into an RNA helix does not change the global A-form geometry to a large extent.
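
The pKa shift reported for the AH+:C pair in Section 3.4.3 can be converted into a protonated fraction using the Henderson-Hasselbalch relation for a single titratable site. The Python sketch below is illustrative only. The function name and the sampled pH values are assumptions made for this example, while the pKa values (about 4 for free adenosine, 6.2–6.4 within the pair) are the ones quoted in the text.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Fraction of a single titratable site that is protonated at a given pH.

    Henderson-Hasselbalch: [AH+] / ([A] + [AH+]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# pKa values quoted in the text; the labels are only for display.
SITES = {
    "free adenosine N1 (pKa ~4)": 4.0,
    "AH+:C pair, lower value (pKa 6.2)": 6.2,
    "AH+:C pair, upper value (pKa 6.4)": 6.4,
}

if __name__ == "__main__":
    # The two pH points are arbitrary illustrative choices, not values from the text.
    for label, pka in SITES.items():
        for ph in (5.5, 7.0):
            frac = protonated_fraction(ph, pka)
            print(f"{label:36s} pH {ph:.1f}: {frac:.3f} protonated")
```

Under this simple one-site model, about a fifth of adenines with a pKa of 6.4 would still carry the N1 proton at pH 7, compared with roughly 0.1% for free adenosine.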
3.5 Internal loops

An internal loop contains nucleotides that cannot form Watson-Crick pairs on both strands of a regular RNA duplex (Fig. 18.1). If the number of unpaired nucleotides is the same on each strand, the internal loop is symmetric. According to this definition, the single and double mismatches discussed in Section 3.4 constitute the smallest symmetrical internal loops. Stabilities and structures of loops vary significantly depending on the loop size and sequence. UV melting studies of internal loops containing unpaired adenines showed that symmetric loops were more stable than asymmetric loops of the same size (77). RNA duplexes containing asymmetric loops A5,An and U5,Un (n ≠ 5) had slower electrophoretic gel mobilities than corresponding symmetric loops, or a regular RNA duplex (78). Slower electrophoretic mobilities can be a consequence of an intrinsically bent conformation or higher flexibility of asymmetric loops, but more detailed structural analysis is needed to distinguish between these effects.

In solution, internal loops can be flexible and disordered, but many have well-defined rigid structures, with non-Watson-Crick base pairs, base-sugar hydrogen bonding, and extended stacking interactions. In order to illustrate the structural complexity of RNA internal loops, we will describe the structures of two classes of internal loops. The first class, represented by loop E from 5S rRNA and the hairpin ribozyme loop A, constitutes loops that are structurally ordered and relatively rigid in solution. NMR spectra of oligoribonucleotides containing these loops normally have well-resolved, sharp proton resonance lines. Internal loops belonging to the second class are disordered and flexible in solution by themselves, but become structured upon binding an external ligand. Examples of the latter class include the ATP aptamer and the 3'-UTR regulatory element of human U1A protein.

3.5.1 Structured loops

Loop E family. Asymmetric internal loops of nine nucleotides with the sequence
are found in several biologically important RNAs. This motif is highly conserved in eukaryotic 5S rRNAs (loop E) and 23S/28S rRNAs (sarcin-ricin loop). It also occurs in viroid RNAs and in the hairpin ribozyme. Evidence from UV cross-linking, chemical modification studies, and NMR spectroscopy indicates that the structures of these loops are very similar and thus can be categorized as a single family (Fig. 18.5). Detailed NMR studies of loop E (61) and the sarcin-ricin loop (60) showed that this motif contained several non-Watson-Crick base pairs and a single bulged base. In the structure of loop E, a sheared G:A mismatch (Fig. 18.4a) is stacked on a reverse Hoogsteen A:U pair (Fig. 18.4g) and a non-conserved G residue is bulged out of the helix. The loop is closed by A:A and U:U pairs. All ribose residues are C3'-endo (A-form like) except for the bulged G and adjacent A residues. The backbone of the loop is severely distorted at the G:A/U:A step (Fig. 18.5). Electrophoretic gel mobility measurements on RNA duplexes containing a eubacterial loop E indicate that this symmetric (seven nucleotides in each strand) loop introduces a directional bend and an increased helical twist in the A-form geometry (78). In summary, the loop E-like structure is highly ordered and roughly resembles a continuous A-form helix. Its diverse functionalities are most likely accomplished by the exposed sides of non-standard base pairs, which are accessible for intermolecular binding (79).
Fig. 18.5. (a) Family of loop E-like sequences from different RNAs. (b) Stereoview of the three-dimensional structure of loop E from 5S RNA (61).
Hairpin ribozyme loop A. Loop A of the hairpin ribozyme is a symmetric internal loop of eight nucleotides that contains the cleavage site (80). The structure of the loop solved by NMR (62) shows that the guanine 3' to the cleavage site forms a sheared G:A base pair and that the cytosine residue immediately 5' to the cleavage site is involved in a protonated AH+:C base pair. The loop is stabilized by extended A-form stacking between residues adjacent to the cleavage site and by several cross-strand base-to-sugar hydrogen bonds that are formed by residue G8. They include a hydrogen bond from the G8 carbonyl oxygen to the 2'-OH of G20 and from the G8 amino or imino protons to the O4' of U21. The structure of the loop has an overall A-form helical shape with a widened major groove.

3.5.2 Flexible loops

ATP aptamer. The technique of in vitro selection has been used to isolate RNA aptamers that bind to biological cofactors with high affinity and selectivity (81,82). An aptamer for ATP (or AMP) was found to contain a 12-nucleotide asymmetrical RNA loop flanked by double helical regions (Fig. 18.6a) (28). Two high resolution solution structures are available for this motif, revealing several unusual properties (29,30). In the absence of exogenous AMP, NMR spectra of the aptamer showed that only Watson-Crick base pairs in the flanking helices are formed while the loop itself is largely unstructured. Upon addition of AMP, sharp resonances of all imino hydrogens in the loop appeared, indicating formation of a structured core with an extensive network of hydrogen bonds. In the complex, AMP is tightly docked in a binding pocket, forming a sheared G:A base pair with the residue G8 (Fig. 18.6a).
This base pair, along with residues A9 and A10, forms a GNRA tetraloop fold (see Section 3.3) which is stabilized by stacking on a G11:G7 pair (Fig. 18.4d). Yet another non-Watson-Crick pair, G30:G17 (Fig. 18.4e), forms in the binding pocket, providing a stacking platform for residues A12 and U16. The backbone of the 11-base loop forms the shape of a Greek letter ζ, with the middle arch corresponding to the AMP binding site (29). The entire motif is stabilized by extensive base-base stacking and hydrogen bonding within the loop. The AMP-loop complex introduces a bend of about 100° between the two helical stems according to the NMR structure (30).

3'-UTR RNA. An asymmetrical RNA loop of eight nucleotides (Fig. 18.6b) is involved in regulation of expression of the human U1A protein. The structures of the free loop and the loop bound to a ribonucleoprotein (RNP) domain have been solved by NMR spectroscopy (83,84). The structure of the free RNA indicates that the single-stranded loop region contains local stacking interactions in the context of a generally flexible structure (84). Protein binding orders the internal loop and changes the overall shape of the RNA. In the complex, the RNA is severely bent, with the single-stranded nucleotides positioned across the surface of a four-stranded β sheet. There are no base-base hydrogen bonds formed in the loop, but most of the residues are involved in stacking interactions. The RNA-protein interface is highly structured and consists of extensive intermolecular hydrogen bonds and hydrophobic interactions (83).

Other examples of RNA internal loops that become ordered upon binding to an external ligand include aptamers for flavin mononucleotide (FMN) (66), arginine/citrulline (70), RRE Rev internal loop (64), and an aminoglycoside binding site from E. coli 16S rRNA (85).
Fig. 18.6. RNA internal loops involved in intermolecular binding. (a) The ATP aptamer (29,30). Dotted lines represent NOE contacts used in structure determination. The figure is from ref. (29). (b) The structure of the UTR control element of the human U1A protein, free (left) and bound (right) to its target peptide (83,84).

3.5.3 Internal loops summary
Internal loops are extremely important in the function of RNA molecules. Depending on their size and sequence, internal loops may introduce sites of local flexibility and bending in the RNA double helix. Many internal loops form compact structures with non-Watson-Crick base pairs, sugar-base, and sugar-sugar interactions. Loops involved in protein recognition or ligand binding are often disordered and flexible in solution but become structured upon binding to ligand.
3.6 Bulge loops

A bulge loop is defined as one or more nucleotides that interrupt one strand of an otherwise continuous Watson-Crick-paired double helix (Fig. 18.1). The stabilities of RNA bulges depend on the size and the sequence of the unpaired region and, additionally, on the sequence of adjacent Watson-Crick base pairs (86). UV melting experiments showed that the stability of bulges containing unpaired adenosines or uridines depended on the sequence of the flanking Watson-Crick base pairs. For instance, a loop of three As was more stable by about 2 kcal/mol of free energy when placed between 5'-C-A3-C instead of 5'-G-A3-G adjacent nucleotides (86).

It is well established that bulges bend the A-form double helix (87-89). The extent of bending depends on several factors including the size and sequence of the bulge, the sequences of flanking base pairs, and the presence of divalent metal ions (89-91). Transient electric birefringence measurements on RNA duplexes containing single bulges of a sequence An or Un (where n = 1-6) showed that the magnitude of helix bending increased with increasing size of the bulge. In the absence of Mg2+, for both An and Un series, the angle increment varied from ~20° to ~8° per added nucleotide as n was increased from 1 to 6. The total value of the bend ranged from 7° to 93° (89). In all cases studied, uridine bulges induced smaller bends than adenosine bulges of the same size. The effects of mixed-sequence bulges on helix bending have not yet been studied systematically (91).

3.6.1 Single-nucleotide bulges
An NMR structure of a single adenosine bulge in the stem of a hairpin loop showed that the unpaired base was intercalated into the helix, creating a small kink in an otherwise normal A-form helix (43). The helix axis was bent away from the bulge to allow base stacking on the strand opposite the unpaired A. The intercalated adenine was also stabilized by stacking on adjacent Watson-Crick pairs. The bulge region was more dynamic than the remaining part of the helix, as evidenced by a mixed C2'/C3'-endo conformation of ribose sugar puckers and by broad imino resonances from flanking Watson-Crick base pairs.

NMR studies of a duplex r(CUGGUGCGG):r(CCGCCCAG), which contains a single unpaired uridine residue, provided evidence that the extra U was looped out of the helix (92). Model building studies indicated that an extrahelical residue did not introduce significant bending into the duplex.

A larger number of structural studies exist for DNA single-nucleotide bulges. The equilibrium between the stacked-in and looped-out conformations of single-nucleotide bulges in DNA is dependent on temperature, the unpaired residue, and the sequence of the adjacent base pairs (93).

3.6.2 TAR element from HIV

The TAR element (trans-activation response element) from the HIV-1 genome consists of a six-nucleotide hairpin loop and a stem with a three-nucleotide bulge
Fig. 18.7. The structure of the HIV-1 TAR nucleotide bulge (94,95). (a) Bent conformation of the TAR bulge. (b) TAR bulge bound to argininamide.
(Fig. 18.7). The solution structures of HIV-1 TAR elements bound to argininamide or to a 37 amino acid peptide (ADP-1) have been solved by NMR, providing detailed structures of the bulge in a free and complexed RNA (94,95). In the unbound RNA, the nucleotides in the bulge (U23-U25, Fig. 18.7) are flexible but stack as evidenced by intranucleotide NOEs, consistent with the helical geometry. The stacked structure within the bulge induces bending in the helix axis (90,96). The conformation of the bulge changes significantly upon ligand binding. The stacking between nucleotides A22-U23-C24 is disrupted, and the A-form stems flanking the bulge stack coaxially. In the argininamide-TAR complex, U23 forms a major groove base triple with the U38:A27 pair, and the argininamide is positioned below the triple, forming hydrogen bonds with G26 (94). The ADP-1-TAR structure does not provide evidence for the U23:U38:A27 triple formation, but also positions an arginine residue within hydrogen bonding distance from G26-N7. Binding of both the Tat peptide and argininamide straightens the bend introduced by the bulge in the unbound RNA (96).

3.6.3 Bulge loops summary
Unpaired nucleotides in an RNA bulge loop can be positioned inside or outside the helix. The incorporation of unpaired bases into the duplex introduces a directional bend into a regular A-form helix. The amount of bending depends on the size and the sequence of the bulge, and the presence of divalent metal ions. Bulges in which all residues are looped out of the helix allow coaxial stacking of the helical stems and do not bend the helix. Larger bulge loops can form complex binding pockets that serve as RNA-RNA or protein-RNA recognition sites.
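
The definitions given in Sections 3.5 and 3.6 (an internal loop has unpaired nucleotides on both strands and is symmetric when the counts are equal, whereas a bulge interrupts only one strand) can be written as a small classifier. This Python sketch is a hypothetical illustration. The function name, argument names, and the "continuous helix" label for the zero/zero case are assumptions introduced here, not terminology taken from the chapter.

```python
def classify_loop(unpaired_strand_1: int, unpaired_strand_2: int) -> str:
    """Name a duplex interruption from the number of unpaired nucleotides on each strand."""
    if unpaired_strand_1 < 0 or unpaired_strand_2 < 0:
        raise ValueError("counts of unpaired nucleotides cannot be negative")
    if unpaired_strand_1 == 0 and unpaired_strand_2 == 0:
        # Assumed label: nothing interrupts the Watson-Crick duplex.
        return "continuous helix"
    if unpaired_strand_1 == 0 or unpaired_strand_2 == 0:
        # Only one strand is interrupted (Section 3.6).
        return "bulge loop"
    if unpaired_strand_1 == unpaired_strand_2:
        # Equal counts on both strands (Section 3.5); a single mismatch is the 1/1 case.
        return "symmetric internal loop"
    return "asymmetric internal loop"


if __name__ == "__main__":
    examples = {
        "single mismatch (1/1)": (1, 1),
        "eubacterial loop E (7/7)": (7, 7),
        "A5,A3 loop (5/3, illustrative n)": (5, 3),
        "TAR three-nucleotide bulge (3/0)": (3, 0),
    }
    for name, counts in examples.items():
        print(f"{name}: {classify_loop(*counts)}")
```

The classifier uses strand counts only. It says nothing about whether a given loop is ordered or flexible, which, as the text stresses, depends on sequence and on bound ligands.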
3.7 Junctions

RNA junctions are broadly defined as regions where two or more interconnected double helical stems come together (Fig. 18.1). Many types of junctions are possible depending on the number of stems and the size of the branch region (Fig. 18.8). Junctions play an important role in positioning helical domains at specific angles, thus determining global shapes of RNA molecules.

An important force stabilizing multibranched junctions is provided by coaxial stacking between helices. A coaxial stack is formed when the terminal base pairs of two helices are in van der Waals contact, forming a straight and quasi-continuous helical domain. The free energy of an end-to-end stacking between two duplexes follows essentially the same sequence dependence as Watson-Crick pairing in a continuous helix, but is usually more favourable. UV melting studies showed that the stacked interfaces can contribute an extra −0.6 to −1.6 kcal/mol of free energy relative to the equivalent nearest neighbour pairs in a continuous helix (97). In more complicated junctions, the stacked stems are often additionally stabilized by hydrogen bonds to unpaired nucleotides in the junction. Based on several structures discussed later in this section, it seems that relatively rigid coaxial stacks are formed at junctions containing an even number of branches. It is difficult to predict which pair of helices will stack coaxially from the nucleotide sequence alone.

RNA and DNA junctions are sites of extra counterion association owing to a high phosphate charge density (98). Junctions from tRNA, group I introns, and the hammerhead ribozyme form specific metal-binding pockets as determined by X-ray diffraction (26,56,67,99).
The release of bound counterions upon changes in junction geometry can be an important factor determining stability and the function of RNA junctions (100,101).

3.7.1 Two-way junctions

A highly conserved two-way junction is part of the catalytic core of self-splicing group I introns. The junction consists of two double helical stems (P4 and P6) flanked by single-stranded overhangs at the 3' and 5' ends of the branch point (Fig. 18.8a). Comparative sequence analysis and a large amount of biochemical data available on group I introns led to a three-dimensional structural model (102). In the Michel-Westhof model, the P4 and P6 stems stack coaxially, forming a continuous helical domain. The right-handed rotation between the stacked helices places the nucleotides of the single strands in opposite RNA grooves where they can form hydrogen bonds with the stems. Two residues from the 5' single-stranded end bind in the minor groove forming base triples with the P4 stem, and two nucleotides from the 3' unpaired strand form base triples in the major groove of the P6 stem. The Michel-Westhof model of the P4/P6 region has been shown to be essentially correct by the crystal structure of a 154-nucleotide P4/P6 domain (25).

Oligonucleotide models of the P4/P6 junction have been studied in solution. An NMR structure of a small RNA oligonucleotide containing shortened versions of P4 and P6 stems and the 5' overhang showed that the stems formed a coaxial stack in solution. The rotation at the junction of the helices was right-handed and almost twice as large as the rotation between two Watson-Crick base pairs in a regular
Fig. 18.8. Multibranched RNA junctions. (a) Two-stem P4/P6 junction from Tetrahymena thermophila group I intron. (b) Three-stem junction from 5S rRNA. (c) Three-stem junction of hammerhead ribozyme. (d) Four-stem junction from tRNAPhe.
A-form duplex (103). As expected from the Michel-Westhof model, nucleotides from the 5' overhang formed nucleoside triples in the minor groove of P4 (104). The term 'nucleoside triple' is used because the hydrogen bonding involves a ribose as well as the bases. An NMR structure of the junction containing both 3' and 5' overhangs
showed an entirely different conformation. When the 3' unpaired nucleotides were included in the model, the stems did not stack coaxially and the nucleoside triples in the minor groove of P4 did not form (105). With the 3' overhang, the junction was bent with the two helices rotated in a left-handed fashion. Structural analysis of junction mutants with shortened 3'-ends showed that one unpaired nucleotide at the overhang was sufficient to change the conformation of the molecule (105). This study clearly illustrates the sensitivity of global RNA structure to minor changes in the nucleotide sequence.

3.7.2 Four-way junctions

The best structurally characterized RNA junction is the four-way junction from transfer RNAs (Fig. 18.8d). Crystal structures of several tRNAs reveal this junction as a rigid structure of two pairs of coaxially stacked helices; the acceptor stem is stacked coaxially on the T stem, and the D stem is stacked on the anticodon helix (99) (for more details see Chapter 19). The two helical regions are roughly perpendicular to each other and create an overall L-shape for the molecule. Numerous studies have shown that the L-shaped geometry of tRNA is also present in solution. The unpaired nucleotides at the junction form several tertiary contacts with the stems, stabilizing the geometry. Several specific metal-binding pockets are also formed in the tRNA junction region.

The L-shaped structure of tRNA is created by separating the two stacked helical domains with unpaired nucleotides at the junction. Another well-characterized example of nucleic acid four-way junctions (DNA Holliday junctions) also consists of two coaxial stacks, but, in the absence of intervening unpaired nucleotides at the branch point, the stems assume a symmetric X shape (106).
3.7.3 Three-way junctions

An odd number of helices at the junction poses several structural questions. Are there coaxial stacks formed between the stems, and if so, what is the spatial relationship of the remaining helix with respect to the stacked domains? Some insights into these questions were provided by low resolution solution studies of two RNA molecules, the 5S ribosomal RNA and the hammerhead ribozyme. Both of these RNAs contain central three-way junctions with several unpaired nucleotides (Fig. 18.8b and c).

The central junction of 5S rRNA, also known as loop A, forms a binding site for the transcription factor IIIA. Transient electric birefringence measurements provided evidence that the 5S rRNA junction from Sulfolobus acidocaldarius contains two colinear stems, I and V (Fig. 18.8b). The third stem (helix II) was found to be relatively unconstrained and free to reorient with respect to the I-V axis (107). An entirely different result was obtained from chemical modification data and computer modelling of loop A from E. coli and Xenopus laevis, which supported a colinear, stacked arrangement of helices II and V (108,109). The two alternative stacking arrangements may not differ greatly in free energy and thus may coexist in solution with different ratios depending on the nucleotide sequence from a particular organism. It has been postulated that interconversion between two forms of 5S rRNA might be of functional significance (110).

A junction between three short helices forms an active site of the hammerhead ribozyme (Fig. 18.8c). In addition to the conserved stem sequences, there are several
unpaired nucleotides at the branch point that are necessary for the catalytic activity. In the crystal structures of the ribozyme, the three stems form an overall Y shape with helices I and II forming the upper fork (24,26). Stem II stacks directly on stem III, forming a pseudo-continuous helix. The junction is stabilized by an array of hydrogen bonds from the unpaired nucleotides. The geometry of the hammerhead ribozyme measured in solution by fluorescence resonance energy transfer (FRET) led to the same Y-shaped conformation of the junction found in the crystal structure (111).

3.7.4 Junctions summary

Structures of multibranched junctions often determine the global shapes of biologically functional RNA molecules. The conformations of RNA junctions are difficult to predict and depend on the number of stems and the size of the branch region. An important element of junction structure and stability is provided by coaxial stacking between double helical branches. Coaxial stacks stabilize junctions and join the shorter stems, forming quasi-continuous elongated domains. RNA junctions often form complex structures with multiple metal-binding pockets and serve as sites for protein-RNA and RNA-RNA recognition.
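
To give a feel for the size of the coaxial stacking contribution quoted in Section 3.7 (an extra −0.6 to −1.6 kcal/mol), a free energy difference can be converted into a fold change in an equilibrium constant using K = exp(−ΔG°/RT). The Python sketch below is illustrative. The function name is hypothetical, and the choice of 37°C is an assumption made here (it is the temperature the chapter uses for the AH+:C comparison, not one stated for the stacking measurements).

```python
import math

# Molar gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def stability_fold_change(ddg_kcal_per_mol: float, temp_celsius: float = 37.0) -> float:
    """Fold change in an equilibrium constant for a free energy difference.

    A negative ddG (extra stabilization) gives a factor greater than 1,
    following K2/K1 = exp(-ddG / (R*T)).
    """
    temp_kelvin = temp_celsius + 273.15
    return math.exp(-ddg_kcal_per_mol / (R_KCAL_PER_MOL_K * temp_kelvin))


if __name__ == "__main__":
    # Range of extra stacking free energy quoted in the text (ref. 97).
    for ddg in (-0.6, -1.6):
        factor = stability_fold_change(ddg)
        print(f"ddG = {ddg:+.1f} kcal/mol -> about {factor:.1f}-fold at 37 C")
```

Under the assumed temperature, the quoted range corresponds to roughly a 2.6-fold to 13-fold shift in the equilibrium towards the stacked arrangement.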
4. Tertiary structures, interactions between secondary structures

The secondary structure motifs that have been described can interact (mainly by base pairing) to form tertiary structure. The base pairs formed in secondary structures can be represented by drawing the sequence in a circle with non-crossing lines joining the paired bases (42,126). Lines representing the contacts that characterize tertiary interactions cross the secondary structure lines. This distinction is important in methods to predict structure. Secondary structures can be considered as a sum of structural elements. The non-crossing of the base-base interactions means that the structural elements are independent. Tertiary structures involve base pairs between parts of the secondary structures and thus make them highly dependent. Other definitions of secondary and tertiary structure are also used.

4.1 Base, nucleoside, and nucleotide triples

When a single-stranded nucleotide interacts with nucleotides that are already involved in a base pair, a triple is formed. If the hydrogen bonding involves only the bases it is called a base triple. If base-ribose or ribose-ribose hydrogen bonding is present, we have a nucleoside triple. Phosphate involvement as a hydrogen bond acceptor creates a nucleotide triple. Triple interactions help to orient different regions of secondary structure and stabilize the global three-dimensional folds of large RNAs.

4.1.1 Triple helices

In addition to regular Watson-Crick double helices, some nucleic acid sequences form stable three-stranded complexes (for a review of DNA triple helices see Chapter 12). RNA triple helices consist of two strands forming an A-form Watson-Crick duplex
and the third strand bound in either the major or the minor groove of the helix. Although triple helices are stabilized by extensive stacking between repeating base triples, they only form at high ionic strength conditions that overcome the unfavourable electrostatic repulsions between negatively charged phosphates. For example, a regular poly r(A):poly r(U) duplex converts into a stable three-stranded poly r(U):poly r(A):poly r(U) structure upon addition of magnesium, or at higher concentration of monovalent cations (greater than 0.1 M Na+) (112). Fibre diffraction studies on this triplex showed that the extra poly r(U) strand was parallel to poly r(A) and bound in the major groove of the Watson-Crick duplex, forming an array of U:AU base triples (Fig. 18.9a) (113). Other polyribonucleotides also form triple helices in solution. Poly r(C) and poly r(G) have been shown to associate at low pH to form a poly r(C+):poly r(G):poly r(C) triplex (114) with the protonated poly r(C+) strand bound in the major groove (Fig. 18.9b). Evidence for poly r(G):poly r(G):poly r(C) and poly r(A):poly r(A):poly r(U) triple helices was found by using agarose-linked polyribonucleotide affinity columns (115) and the formation of a poly r(A):poly r(G):poly r(C) triple helix has been shown by UV mixing curve experiments (116). The formation of poly r(A):poly r(A):poly r(U) and poly r(A):poly r(G):poly r(C) is dependent on the length of the polynucleotide strands participating in the triplex formation. The A:AU triplex forms only when poly r(A) strands are 28–150 nucleotides in length, whereas the size of poly r(U) has no effect on the triplex stability (117). The formation of A:GC triple helices depends on the length of the poly r(C) strand.
This triple helix forms readily when the average length of poly r(C) is 100 nucleotides, but does not form when the average length is 500 nucleotides (116).

The two major groove triples, U:AU and C+:GC, have isomorphic structures and therefore can form simultaneously in mixed pyrimidine-purine-pyrimidine sequences. This triple-stranded motif is also stable when one or two of the participating strands are substituted with deoxyribonucleotides (118,119).

A structure of a small unimolecular RNA triple helix containing several alternating U:AU and C+:GC base triples has been investigated by NMR (120). The sequence of this molecule was based on a DNA oligonucleotide that had been shown previously to form an intramolecular triple helix in solution (121). At pH 4.8, the NMR data showed formation of four U:AU and three C+:GC major groove base triples. Each of the third strand pyrimidines formed two Hoogsteen hydrogen bonds with the Watson-Crick bound purines (Fig. 18.9a and b). Strong evidence for the formation of C+:GC triples was provided by the presence of downfield-shifted imino resonances from protonated N3 of the Hoogsteen-bound cytosines (122). All of the nucleotides involved in base triples had the A-form C3'-endo sugar conformation, indicating relative rigidity of the structure. NMR studies of an intramolecular RNA triplex of slightly different sequence also showed formation of alternating major groove U:AU and C+:GC triples (123).

4.1.2 Isolated triples
Isolated triples have been found or predicted in a variety of large RNAs, including tRNAs and group I introns (102,124). Single-nucleotide triples often occur at the interface of coaxially stacked helices within bulges, internal loops, or junctions. If the
Fig. 18.9. Triple interactions observed in RNA molecules. Major groove base triples: (a) (A:U):U triple (113,120). (b) (G:C):C+ triple (120). (c) and (d) (U:A):A and (C:G):G triples from yeast tRNAPhe. (e) Minor groove nucleoside (C:G):A triple (103). (f) Nucleotide triple (A:U):G from FMN aptamer (66).
Fig. 18.10. Placement of single-stranded overhangs in different grooves of RNA at the junction of two helices. A right-handed rotation at the junction of the helices positions the 5'-single strand in the minor groove of the helix, and the 3'-single strand enters the major groove. This figure is useful in visualizing structures of pseudoknots, kissing hairpins, and two-stem junctions.
stacked helices continue a right-handed twist at the junction, the unpaired nucleotides at the 5'-end of each duplex enter the minor groove of the other helix. Similarly, each 3' single-stranded end will be placed in the major groove of the opposite helix (Fig. 18.10). Several base triples observed in different RNA structures are consistent with this simple rule. For example, two G:GC base triples (Fig. 18.9d) observed in crystal structures of tRNAPhe are formed at the junction of two stacked helices. The single-stranded G45 and G46 from the 3'-end of the anticodon stem enter the major groove of the D-stem and form base triples with G10:C25 and G22:C13 Watson-Crick pairs, respectively (Fig. 18.8d). Similarly, two sets of simultaneous major and minor groove triples have been proposed to form at the P4/P6 junction from group I introns (125). An NMR structure of a model of this junction confirmed the formation of two minor groove nucleoside triples, A:GC and U:GU, although their structure was different to that proposed by Michel and Westhof (103,104). In the A:GC triple (Fig. 18.9e), the N1 of the single-stranded A forms a hydrogen bond to the 2'-hydroxyl of a Watson-Crick paired G, but no base-base contacts were detected.
A single major groove U:AU base triple was formed in the structure of TAR RNA (see Section 3.6) upon binding of the argininamide ligand (94). The geometry of this triple is identical to the major groove triples seen in U:AU triple helices (Fig. 18.9a).
A well-defined G:AU nucleotide triple was identified in the structure of the FMN aptamer solved by NMR (66). The triple is formed upon FMN binding and is involved in generating the intercalation site pocket. A unique feature of this triple is that none of the bases are involved in Watson-Crick pairing.
The triple is formed
between a reverse Hoogsteen A:U pair and a G residue (Fig. 18.9f). Besides a single hydrogen bond with the Hoogsteen-paired uracil, the external G is in close proximity to phosphate oxygens, possibly forming an additional hydrogen bond (66).
4.2 Pseudoknots
A pseudoknot forms when a single strand pairs to a hairpin loop; two loops and two stems result (see Fig. 18.11a). The name pseudoknot was proposed (126) because if each stem contained more than 11 base pairs, and thus made a complete turn, and if the ends were linked, a topological knot would result. In 1982 experimental evidence was obtained for a pseudoknot structure in turnip yellow mosaic virus (127). Pseudoknots are found in all types of RNA and have a wide variety of biological functions; several reviews describe their importance (128-130).
In Fig. 18.11a a general pseudoknot is shown with two stems and three loops. This figure represents a wide variety of possible pseudoknots if we allow any one of the three loops to have zero length, or if we allow them to fold into further secondary structures, such as hairpins. The simplest pseudoknot has loop 1.5 with zero length; this is the so-called H-type pseudoknot. The two stems can stack coaxially on each other to form a quasi-continuous helix. Because of the right-handed winding of A-form helices, loop 1 crosses the deep major groove of stem 2, whereas loop 2 crosses the shallow minor groove of stem 1. The minimum loop lengths for a given number of base pairs in each stem can be estimated from A-form geometry (Fig. 18.11b) (131). A minimum loop 1 length of one or two nucleotides occurs when stem 2 is seven base pairs long. Loop 2 must be longer. With four base pairs in stem 1 at least three nucleotides are needed in loop 2, and the loop length increases rapidly with stem length. These estimates are based on standard A-form structure, so bending or unusual twisting of the helices can lead to different results.
Experimental studies have been done on an H-type pseudoknot with three base pairs in stem 1 and five base pairs in stem 2. The effect of loop lengths (with Us in the loops) on the pseudoknot stability relative to its constituent hairpins was determined (132). Magnesium ion preferentially stabilizes the pseudoknot with respect to its hairpins. In 5 mM Mg2+ a minimum of three nucleotides in loop 1 was needed for the five base pairs in stem 2, and a minimum of four nucleotides was needed for the three base pairs in stem 1. These results are consistent with the estimates based on A-form stem geometry. The pseudoknot is only marginally more stable than its constituent hairpins; a decrease in standard free energy of only 1.5 to 2 kcal/mol at 37°C results when the pseudoknot forms.
The structure of the H-type pseudoknot was found to have the two stems coaxially stacked, with only minor distortion in helical stacking at the junction of the two stems (133). Right-handed winding continues at the stem-stem junction with an increase in the winding angle, which helps relieve the crowding of the two loops at the junction. The phosphates from the loops and stems are very close at the stem-stem interface, and may provide the binding site for the Mg2+ ions required for pseudoknot formation. Surprisingly, no evidence for base triple formation was seen between the loops and stems. Although model building can place loop 1 in the major groove and loop 2 in the minor groove of the stems, no NMR evidence for loop-stem interaction was seen.
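To put the quoted stability of 1.5 to 2 kcal/mol in perspective, the free energy change can be converted into an equilibrium constant with K = exp(-ΔG°/RT). The sketch below is illustrative only: it assumes a simple two-state equilibrium between the constituent hairpins and the pseudoknot, which the text does not state, and the function names are hypothetical.

```python
import math

R_KCAL = 1.98720e-3  # gas constant in kcal/(mol K)


def equilibrium_constant(delta_g_kcal, temp_celsius=37.0):
    """Return K = exp(-dG/RT) for a standard free energy change in kcal/mol."""
    temp_kelvin = temp_celsius + 273.15
    return math.exp(-delta_g_kcal / (R_KCAL * temp_kelvin))


def fraction_pseudoknot(delta_g_kcal, temp_celsius=37.0):
    """Fraction in the pseudoknot form, ASSUMING a two-state
    hairpin <-> pseudoknot equilibrium (an assumption of this sketch)."""
    k = equilibrium_constant(delta_g_kcal, temp_celsius)
    return k / (1.0 + k)


if __name__ == "__main__":
    # Free energy decreases quoted in the text, written as negative dG values.
    for dg in (-1.5, -2.0):
        print(dg, equilibrium_constant(dg), fraction_pseudoknot(dg))
```

Under these assumptions the equilibrium constant is on the order of 10 to 30, so a noticeable fraction of molecules would remain in the hairpin forms, in line with the description of the pseudoknot as only marginally more stable.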
Fig. 18.11. (a) Drawing of a general pseudoknot. (b) Distance (in Å) across the major and minor grooves of an A-form RNA helix as a function of the number of base pairs (131). The distances were calculated using coordinates from fibre diffraction studies. Indicated on the right-hand side of the graph are the number of nucleotides necessary to cross the indicated distance (assuming that a nucleotide is able to span 7 Å).
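The conversion used on the right-hand axis of Fig. 18.11b can be sketched as a simple rounding-up calculation: the number of nucleotides needed is the groove-crossing distance divided by the span of one nucleotide, rounded up to a whole nucleotide. The 7 Å span comes from the caption; the function name is hypothetical, and the distances themselves would have to be read from the figure or from helix coordinates.

```python
import math

NUCLEOTIDE_SPAN_ANGSTROM = 7.0  # span per nucleotide assumed in the caption


def min_loop_nucleotides(distance_angstrom, span=NUCLEOTIDE_SPAN_ANGSTROM):
    """Smallest whole number of nucleotides whose combined span covers
    the given groove-crossing distance (both in angstroms)."""
    if distance_angstrom < 0 or span <= 0:
        raise ValueError("distance must be >= 0 and span must be > 0")
    return math.ceil(distance_angstrom / span)


if __name__ == "__main__":
    # About 10 angstroms is the major-groove distance the chapter quotes
    # for 6 or 7 base pairs.
    print(min_loop_nucleotides(10.0))
```

As the text notes, such estimates hold only for standard A-form geometry; bending or unusual twisting of the helices changes the distances.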
Thermodynamic and structural studies have been done on pseudoknots from gene 32 mRNAs from T2, T4, and T6 bacteriophages (134,135). These pseudoknots bind the gene 32 protein to autoregulate the translation of its mRNA. The pseudoknots are stabilized by Mg2+ and have coaxially stacked stems. Stem 2 contains seven base pairs and is spanned by a loop of only one nucleotide, the minimum predicted for standard A-form geometry. Stem 1 contains four or five base pairs and is spanned by loops of five or seven nucleotides, respectively. There was a hint from NOEs of loop-stem interactions, but no definite structure could be deduced.
If loop 1.5 (see Fig. 18.11a) is non-zero the direct coaxial stacking of the stems is interrupted. Pseudoknots with a loop 1.5 of one nucleotide (one nucleotide interrupts the stacking of the stems) are important in the programmed frameshifting used by several retroviruses to synthesize vital enzymes (136). The structure of the pseudoknot required for frameshifting in mouse mammary tumour virus (137) shows that the adenylate residue between the stems causes a bend in the pseudoknot [Plate XXI (top)]. Removing the intervening nucleotide produces a linear structure with coaxially stacked stems (138). If two nucleotides intervene between the stems (loop 1.5 contains two nucleotides) the stems are not coaxial; instead they are displaced relative to each other (139). It is important to realize that the number of nucleotides in loops 1, 1.5, and 2 cannot be deduced simply from the sequence. Whether base pairs form at the ends of the stems, or the bases are part of the loops, must be determined experimentally.
4.3 Loop-loop and loop-helix interactions
These interactions can include any combination of hairpin loops, internal loops, bulges, and helices. These tertiary interactions are important in folding RNA molecules into the specific compact forms required for their biological functions.
4.3.1 Kissing hairpins
Kissing hairpins are formed by base pairing between complementary hairpin loops (Fig. 18.1) (140). They are involved in naturally occurring antisense control of biological function (141). The best-studied example is the control of ColE1 plasmid replication in E. coli (142). A kissing hairpin complex forms as the first step in the hybridization of the complementary RNAs. Formation of the loop-loop interaction is faster than the conversion of the complex to the more stable duplex. The latter process is subsequently catalysed by a protein. The thermodynamics (143) and structure (144) of the kissing complex between the RNA I and RNA II stem loops of the ColE1 plasmid have been studied in detail. Imino proton spectra showed that all seven base pairs of the loop-loop helix formed and that the stem base pairs were not disrupted. Two-dimensional NMR NOESY spectra indicated continuous stacking of the base pairs on the 3'-side of each stem. In addition to NMR data, electrophoretic gel mobility experiments showed that the complex was bent. A model consistent with the NMR and electrophoresis results was obtained (144).
The structure of a kissing complex between the HIV TAR hairpin loop and its complement (145) is shown in Plate XXI (bottom). All six nucleotides of each loop form base pairs in the loop-loop helix. As in the ColE1 complex, the two stems plus the loop-loop helix form a quasi-continuous bent helix.
The formation of a helix by all the nucleotides that are part of the loop of a stem-loop structure means that a single phosphodiester group must join the bases at the beginning and end of the helix. The shortest distance between the ends of an A-form helix is across the major groove; for 6 or 7 base pairs the distance is about 10 Å (Fig. 18.11b) (131). Although this distance is too long for a phosphate group, bending the helix towards the major groove, and increasing winding angles and propeller twists (145), allows the formation of the complex. Two phosphates (one from each hairpin) bridge the major groove of the
loop-loop helix. The phosphate cluster makes a likely Mg2+-binding site. The helix distortions may be part of the recognition mechanism for the Rom (or Rop) protein, which specifically binds kissing hairpins (146).
4.3.2 Loop-helix
The GAAA tetraloop in a hammerhead ribozyme forms an intermolecular contact with the minor groove of stem II of another hammerhead molecule in the crystal structure (24). Only one out of three GAAA tetraloops present in the unit cell is involved in the loop-helix interaction. Remarkably, the structures of the bound and unbound tetraloops are identical, and also closely resemble the structure of the GCAA tetraloop solved in solution by NMR (18). In the complex, the tetraloop stem and the target helix are almost parallel, forming a 31° angle between the helix axes. The third and fourth adenines of the tetraloop form minor groove triples with two consecutive C:G base pairs; each A forms four hydrogen bonds with its target C:G pair. In each triple, only one hydrogen bond is formed between the bases; the other three involve 2'-hydroxyl hydrogen bonds.
The P4/P5/P6 domain of the Tetrahymena thermophila group I intron contains two loop-helix interactions (25); a GNRA tetraloop binds to its internal loop receptor and an A-rich bulge hydrogen bonds to a helix. These interactions hold two helical domains in close and specific contact. The X-ray structure of a group I ribozyme domain (147) shows that the three As of the GAAA tetraloop stack on two adjacent As in the receptor loop and hydrogen bond in the minor groove of the adjacent helix. The hydrogen bonding provides the sequence specificity between the tetraloop receptor and the tetraloop.
Their diverse binding capabilities, and the fact that GNRA loops are present in exceptional abundance in natural RNAs (16), suggest that the GNRA tetraloop family may act as a general long-distance docking motif for RNA-RNA recognition (32).
4.4 Prediction of structure
The ultimate goal of methods to predict macromolecular structure is to calculate a high resolution structure from the base sequence, the solvent conditions (salt concentration, pH, etc.), and the temperature. No experiments are done, only calculations. We are far from this goal. Here we will describe methods available for obtaining possible RNA secondary structures, and for modelling their three-dimensional structures. Useful general reviews of this subject are available (148,149).
Secondary and tertiary structure can be obtained from sequence alone by phylogenetic comparison of many RNA molecules with the same function from different species. The sequences are first aligned using invariant and homologous sequence regions as guides. Then covariation of bases is used to establish Watson-Crick base pairs. For example, if an A in one species changes to a C, and a U in the same species changes to a G, they potentially covary. A detailed secondary structure can be constructed if enough sequences are available. Similarly, if there is covariation of a base pair with a third base, tertiary structure interactions can be established. When hundreds of sequences are available, very detailed structures can be determined (150,102). In the following sections we will describe methods that require only one sequence.
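The covariation test described above can be sketched as a scan over pairs of alignment columns. This is a minimal illustration, not the procedure used in the cited studies: it assumes the sequences are already aligned to equal length, it accepts only strict Watson-Crick pairs, and the function name and toy alignment are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def covarying_pairs(alignment, min_variants=2):
    """Return column pairs (i, j) that are Watson-Crick complementary in
    every sequence and show at least `min_variants` different pair types.

    Invariant columns are excluded on purpose: without a change in both
    positions there is no covariation to observe.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for i in range(length):
        for j in range(i + 1, length):
            pairs = {(seq[i], seq[j]) for seq in alignment}
            if pairs <= WATSON_CRICK and len(pairs) >= min_variants:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Toy alignment invented for illustration (not real RNA sequences).
    toy = ["GAUUUC", "CCUUGG", "AGUUCU"]
    print(covarying_pairs(toy))
```

With only a handful of sequences, chance agreement is likely; the requirement for many sequences in the text reflects this.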
4.4.1 Secondary structure
The free energy is a minimum for a system (such as a solution of RNA molecules in a buffer) at equilibrium at constant temperature and pressure. Therefore, if we can calculate the free energies of different RNA secondary structures, we can predict which will actually occur, i.e. the one with the lowest free energy. Algorithms to calculate free energies of RNA secondary structures are based on the nearest-neighbour hypothesis (5). The free energy of a secondary structure is calculated as a sum of: (a) negative (favourable) contributions from adjacent pairs (Watson-Crick and G:U nearest neighbours); and (b) positive (unfavourable) contributions from forming mismatches, loops, and bulges. The free energy values are obtained from experimental data on equilibrium constants for double strand formation, hairpin loop formation, etc. as a function of sequence. The calculated free energy is approximate because of uncertainties in the measured free energies of the structural elements, the need to extrapolate to other loop sequences and loop sizes, the assumption of additivity of the thermodynamic values, and so forth. Thus, algorithms to predict secondary structure must provide not only the optimal structure, but also many possible suboptimal structures (151). A comparison of the thermodynamic predictions of base paired duplexes with those established by extensive phylogenetic comparisons showed about 90% agreement (152). As more reference thermodynamic data are obtained (153), including junctions and extra stable loop sequences, the thermodynamic prediction of secondary structure should improve.
The effects of solvent (Na+, K+, Mg2+ concentrations, for example) and temperature need to be explored further. The reference thermodynamic data are mainly available for 1 M Na+.
This was chosen to avoid the hydrolytic effect of Mg2+, but to provide sufficient ionic strength to shield electrostatic repulsion of the phosphates. Free energy values are given for 37°C, but enthalpy and entropy values needed for obtaining free energies at other temperatures are also available.
4.4.2 Tertiary structure
Prediction of tertiary structure from a single sequence is extremely difficult. The strategy is to search for possible base-base interactions among the secondary structure elements. For example, pseudoknots can be predicted by considering further base pairing of the loops and single-stranded regions of the calculated secondary structure. Presumably, eventually, specific RNA structure receptors, such as the tetraloop receptor (25), will be established. At present, however, tertiary structures are nearly completely based on phylogenetic sequence information, chemical reactivity, and spectroscopic measurements.
4.4.3 Three-dimensional structure
Modelling three-dimensional structures for RNA from the sequence is based on building up the structure from measured structures of model RNAs. RNA double strand helices are essentially A-form, so helices obtained from thermodynamics or phylogenetics can be modelled accurately. The three-dimensional structures of any of the tetraloop families, or loop-E-like sequences described above, can be added. Other sequences can be modelled from a database of possible mononucleotide conformations (154,155). There are restrictions on the seven torsion angles that specify the conformation of each nucleotide, and more constraints are imposed by each particular loop size, or by the requirements of mismatch formation, or a base triple. All these constraints can be used to calculate possible three-dimensional structures for a given sequence. A test of this method for tRNAPhe gave encouraging results (156).
Many other methods for calculating folding of nucleic acids are being actively developed as described in ref. 149.
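The additive nearest-neighbour free energy described in Section 4.4.1, together with the need to report suboptimal structures, can be sketched as follows. This is only an illustration of the bookkeeping: the energy values are placeholders invented for the example and are not measured parameters, the structure descriptions are hypothetical, and real algorithms enumerate structures by dynamic programming rather than from a hand-written list.

```python
# Placeholder values in kcal/mol, invented for illustration only.
# They are NOT experimental nearest-neighbour parameters.
STACK_ENERGY = {"GC/CG": -3.0, "AU/UA": -1.0, "GU/UG": -0.5}
LOOP_PENALTY = {"hairpin4": 4.0, "bulge1": 3.0}


def structure_free_energy(stacks, loops,
                          stack_table=STACK_ENERGY, loop_table=LOOP_PENALTY):
    """Sum favourable stacking terms and unfavourable loop terms."""
    favourable = sum(stack_table[s] for s in stacks)
    unfavourable = sum(loop_table[name] for name in loops)
    return favourable + unfavourable


def rank_structures(candidates, window=1.0):
    """Return (name, free energy) for the optimal structure and for every
    suboptimal structure within `window` kcal/mol of it, lowest first."""
    if not candidates:
        return []
    energies = {name: structure_free_energy(stacks, loops)
                for name, (stacks, loops) in candidates.items()}
    best = min(energies.values())
    kept = [(name, dg) for name, dg in energies.items() if dg - best <= window]
    return sorted(kept, key=lambda item: item[1])


# Hypothetical candidate structures for one sequence.
CANDIDATES = {
    "A": (["GC/CG", "GC/CG", "AU/UA"], ["hairpin4"]),
    "B": (["GC/CG", "AU/UA"], ["hairpin4", "bulge1"]),
    "C": (["GC/CG", "GC/CG", "GU/UG"], ["hairpin4"]),
}

if __name__ == "__main__":
    print(rank_structures(CANDIDATES, window=1.0))
```

Reporting every structure inside an energy window, rather than the minimum alone, reflects the point made in the text that the calculated free energies are approximate.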
Acknowledgements
We gratefully acknowledge Dr Kevin Luebke for reading the manuscript and making very useful comments. The work on RNA in our laboratory has been supported by the National Institutes of Health and the Department of Energy. We thank Dr Juli Feigon and Dr Gabriele Varani for providing us with figures.
References
1. Herschlag, D. (1995) J. Biol. Chem. 270, 20871.
2. Sauer, K. (ed.) (1995) Biochemical Spectroscopy, Vol. 246, Methods in Enzymology. Academic Press, San Diego.
3. Warshaw, M.M. and Tinoco, Jr, I. (1966) J. Mol. Biol. 20, 29.
4. Altona, C. (1982) Recl. Trav. Chim. Pays-Bas 101, 413.
5. Turner, D.H., Sugimoto, N. and Freier, S.M. (1988) Annu. Rev. Biophys. Biophys. Chem. 17, 167.
6. Saenger, W. (1984) Principles of Nucleic Acid Structure. Springer-Verlag, New York.
7. Varani, G. and Tinoco, Jr, I. (1991) Q. Rev. Biophys. 24, 479.
8. Allain, F.H.-T. and Varani, G. (1996) Progr. Nucl. Magn. Reson. Spectrosc. 29, 54.
9. Gast, F.U. and Hagerman, P.J. (1991) Biochemistry 30, 4268.
10. Weeks, K.M. and Crothers, D.M. (1993) Science 261, 1574.
11. Hall, K., Cruz, P., Tinoco, Jr, I., Jovin, T.M. and van de Sande, J.H. (1984) Nature 311, 584.
12. Davis, P.W., Adamiak, R.W. and Tinoco, Jr, I. (1990) Biopolymers 29, 109.
13. Wang, A.H., Quigley, G.J., Kolpak, F.J., Crawford, J.L., van Boom, J.H., van der Marel, G. and Rich, A. (1979) Nature 282, 680.
14. Tinoco, Jr, I., Davis, P., Hardin, C.C., Puglisi, J.D., Walker, G.T. and Wyatt, J. (1987) Cold Spring Harbor Symp. Quant. Biol. 52, 135.
15. Noller, H.F. (1984) Annu. Rev. Biochem. 53, 119.
16. Woese, C.R., Winker, S. and Gutell, R.R. (1990) Proc. Natl. Acad. Sci. USA 87, 8467.
17. Varani, G. (1995) Annu. Rev. Biophys. Biomol. Struct. 24, 379.
18. Heus, H.A. and Pardi, A. (1991) Science 253, 191.
19. Jucker, F.M. and Pardi, A. (1995) Biochemistry 34, 14416.
20. Jucker, F.M., Heus, H.A., Yip, P.F., Moors, E.H.M. and Pardi, A. (1996) J. Mol. Biol.
21. Orita, M., Nishikawa, F., Shimayama, T., Taira, K., Endo, Y. and Nishikawa, S. (1993) Nucl. Acids Res. 21, 5670.
22. Szewczak, A.A. and Moore, P.B. (1995) J. Mol. Biol. 247, 81.
23. SantaLucia, Jr, J., Kierzek, R. and Turner, D.H. (1992) Science 256, 217.
24. Pley, H.W., Flaherty, K.M. and McKay, D.B. (1994) Nature 372, 111.
25. Cate, J.H., Gooding, A.R., Podell, E., Zhou, K., Golden, B.L., Kundrot, C.E., Cech, T.R. and Doudna, J.A. (1996) Science 273, 1678.
26. Scott, W.G., Finch, J.T. and Klug, A. (1995) Cell 81, 991.
27. Quigley, G.J. and Rich, A. (1976) Science 194, 796.
28. Sassanfar, M. and Szostak, J.W. (1993) Nature 364, 550.
29. Dieckmann, T., Suzuki, E., Nakamura, G.K. and Feigon, J. (1996) RNA 2, 628.
30. Jiang, F., Kumar, R.A., Jones, R.A. and Patel, D.J. (1996) Nature 382, 183.
31. Gluck, A., Endo, Y. and Wool, I.G. (1992) J. Mol. Biol. 226, 411.
32. Jaeger, L., Michel, F. and Westhof, E. (1994) J. Mol. Biol. 236, 1271.
33. Tuerk, C., Gauss, P., Thermes, C., Groebe, D.R., Guild, N., Stormo, G., Gayle, M., d'Auberton-Carafa, Y., Uhlenbeck, O.C., Tinoco, Jr, I., Brody, E.N. and Gold, L. (1988) Proc. Natl. Acad. Sci. USA 85, 1364.
34. Antao, V.P., Lai, S.Y. and Tinoco, Jr, I. (1991) Nucl. Acids Res. 19, 5901.
35. Varani, G., Cheong, C. and Tinoco, Jr, I. (1991) Biochemistry 30, 3280.
36. Allain, F.H.-T. and Varani, G. (1995) J. Mol. Biol. 250, 333.
37. Selinger, D., Liao, X. and Wise, J.A. (1993) Proc. Natl. Acad. Sci. USA 90, 5409.
38. Molinaro, M. and Tinoco, Jr, I. (1995) Nucl. Acids Res. 23, 3056.
39. James, J.K. and Tinoco, Jr, I. (1993) Nucl. Acids Res. 21, 3287.
40. Jacobson, H. and Stockmayer, W.H. (1950) J. Chem. Phys. 18, 1600.
41. Gralla, J. and Crothers, D.M. (1973) J. Mol. Biol. 73, 497.
42. Chastain, M. and Tinoco, Jr, I. (1991) Prog. Nucleic Acid Res. Mol. Biol. 41, 131.
43. Borer, P.N., Lin, Y., Wang, S., Roggenbuck, M.W., Gott, J.M., Uhlenbeck, O.C. and Pelczer, I. (1995) Biochemistry 34, 6488.
44. Mirmira, S.R. and Tinoco, Jr, I. (1996) Biochemistry 35, 7664.
45. Tuerk, C. and Gold, L. (1990) Science 249, 505.
46. Mirmira, S.R. and Tinoco, Jr, I. (1996) Biochemistry 35, 7675.
47. Davis, P.W., Thurmes, W. and Tinoco, Jr, I. (1993) Nucl. Acids Res. 21, 537.
48. Jaeger, J.A. and Tinoco, Jr, I. (1993) Biochemistry 32, 12522.
49. Fountain, M.A., Serra, M.J., Krugh, T.R. and Turner, D.H. (1996) Biochemistry 35, 6539.
50. Huang, S., Wang, Y.X. and Draper, D.E. (1996) J. Mol. Biol. 258, 308.
51. Schweisguth, D.C. and Moore, P.B. (1996) J. Mol. Biol. 267, 505.
52. Sugimoto, N., Kierzek, R., Freier, S.M. and Turner, D.H. (1986) Biochemistry 25, 5755.
53. Wu, M., McDowell, J.A. and Turner, D.H. (1995) Biochemistry 34, 3204.
54. Crick, F.H.C. (1966) J. Mol. Biol. 19, 548.
55. Allain, F.H.-T. and Varani, G. (1995) Nucl. Acids Res. 23, 341.
56. Cate, J.H. and Doudna, J.A. (1996) Structure 4, 1221.
57. Walter, A.E., Wu, M. and Turner, D.H. (1994) Biochemistry 33, 11349.
58. SantaLucia, Jr, J. and Turner, D.H. (1993) Biochemistry 32, 12612.
59. Wu, M. and Turner, D.H. (1996) Biochemistry 35, 9677.
60. Szewczak, A.A., Moore, P.B., Chan, Y.-L. and Wool, I.G. (1993) Proc. Natl. Acad. Sci. USA 90, 9581.
61. Wimberly, B., Varani, G. and Tinoco, Jr, I. (1993) Biochemistry 32, 1078.
62. Cai, Z. and Tinoco, Jr, I. (1996) Biochemistry 35, 6026.
63. Peterson, R.D., Bartel, D.P., Szostak, J.W., Horvath, S.J. and Feigon, J. (1994) Biochemistry 33, 5357.
64. Battiste, J.L., Mao, H., Rao, N.S., Tan, R., Muhandiram, D.R., Kay, L.E., Frankel, A. and Williamson, J.R. (1996) Science 273, 1547.
65. Leonard, G.A., McAuley-Hecht, K.E., Ebel, S., Lough, D.M., Brown, T. and Hunter, W.N. (1994) Structure 2, 483.
66. Fan, P., Suri, A.K., Fiala, R., Live, D. and Patel, D.J. (1996) J. Mol. Biol. 258, 480.
67. Pley, H.W., Flaherty, K.M. and McKay, D.B. (1994) Nature 372, 68.
68. Legault, P. and Pardi, A. (1994) J. Am. Chem. Soc. 116, 8390.
69. Puglisi, J.D., Wyatt, J.R. and Tinoco, Jr, I. (1990) Biochemistry 29, 4215.
70. Yang, Y., Kochoyan, M., Burgstaller, P., Westhof, E. and Famulok, M. (1996) Science 272, 1343.
71. Cheong, C. and Moore, P.B. (1992) Biochemistry 31, 8406.
72. Holbrook, S.R., Cheong, C., Tinoco, Jr, I. and Kim, S.-H. (1991) Nature 353, 579.
73. Cruse, W.B.T., Saludjian, P., Biala, E., Strazewski, P., Prange, T. and Kennard, O. (1994) Proc. Natl. Acad. Sci. USA 91, 4160.
74. Lewis, H.A. (1995) PhD Thesis. University of California, Berkeley.
75. SantaLucia, Jr, J., Kierzek, R. and Turner, D.H. (1991) Biochemistry 30, 8242.
76. Gesteland, R.F. and Atkins, J.F. (eds) (1993) The RNA World. Cold Spring Harbor Laboratory Press, Cold Spring Harbor.
77. Peritz, A.E., Kierzek, R., Sugimoto, N. and Turner, D.H. (1991) Biochemistry 30, 6428.
78. Tang, R.S. and Draper, D.E. (1994) Biochemistry 33, 10089.
79. Wimberly, B. (1994) Nature Struct. Biol. 1, 820.
80. Burke, J.M. (1994) Nucl. Acids Mol. Biol. 8, 105.
81. Burgstaller, P. and Famulok, M. (1994) Angew. Chem. Int. Ed. Engl. 33, 1084.
82. Joyce, G.F. (1994) Curr. Opin. Struct. Biol. 4, 331.
83. Allain, F.H.-T., Gubser, C.C., Howe, P.W., Nagai, K., Neuhaus, D. and Varani, G. (1996) Nature 380, 646.
84. Gubser, C.C. and Varani, G. (1996) Biochemistry 35, 2253.
85. Fourmy, D., Recht, M.I., Blanchard, S.C. and Puglisi, J.D. (1996) Science 274, 1367.
86. Longfellow, C.E., Kierzek, R. and Turner, D.H. (1990) Biochemistry 29, 278.
87. Bhattacharyya, A., Murchie, A.I.H. and Lilley, D.M.J. (1990) Nature 343, 484.
88. Tang, R.S. and Draper, D.E. (1990) Biochemistry 29, 5232.
89. Zacharias, M. and Hagerman, P.J. (1995) J. Mol. Biol. 247, 486.
90. Riordan, F.A., Bhattacharyya, A., McAteer, S. and Lilley, D.M. (1992) J. Mol. Biol. 226, 305.
91. Luebke, K.J. and Tinoco, Jr, I. (1996) Biochemistry 35, 11677.
92. van den Hoogen, Y.T., van Beuzekom, A.A., de Vroom, E., van der Marel, G.A., van Boom, J.H. and Altona, C. (1988) Nucl. Acids Res. 16, 5013.
93. Joshua-Tor, L., Frolov, F., Appella, E., Hope, H., Rabinovich, D. and Sussman, J.L. (1992) J. Mol. Biol. 225, 397.
94. Puglisi, J.D., Tan, R., Calnan, B.J., Frankel, A.D. and Williamson, J.R. (1992) Science 257, 76.
95. Aboul-ela, F., Karn, J. and Varani, G. (1995) J. Mol. Biol. 253, 313; (1996) Nucl. Acids Res. 24, 3974.
96. Zacharias, M. and Hagerman, P.J. (1995) Proc. Natl. Acad. Sci. USA 92, 6052.
97. Walter, A.E., Turner, D.H., Kim, J., Lyttle, M.H., Muller, P., Mathews, D.H. and Zuker, M. (1994) Proc. Natl. Acad. Sci. USA 91, 9218.
98. Olmsted, M.C. and Hagerman, P.J. (1994) J. Mol. Biol. 243, 919.
99. Holbrook, S.R., Sussman, J.L., Warrant, R.W. and Kim, S.-H. (1978) J. Mol. Biol. 123, 631.
100. Weidner, H. and Crothers, D.M. (1977) Nucl. Acids Res. 4, 3401.
101. Dahm, S.C. and Uhlenbeck, O.C. (1991) Biochemistry 30, 9464.
102. Michel, F. and Westhof, E. (1990) J. Mol. Biol. 216, 585.
103. Chastain, M. and Tinoco, Jr, I. (1992) Biochemistry 31, 12733.
104. Chastain, M. and Tinoco, Jr, I. (1993) Biochemistry 32, 14220.
105. Nowakowski, J. and Tinoco, Jr, I. (1996) Biochemistry 35, 2577.
106. Murchie, A.I.H., Clegg, R.M., von Kitzing, E., Duckett, D.R., Diekmann, S. and Lilley, D.M.J. (1989) Nature 341, 763.
107. Shen, Z. and Hagerman, P.J. (1994) J. Mol. Biol. 241, 415.
108. Westhof, E., Romby, P., Romaniuk, P.J., Ebel, J.-P., Ehresmann, C. and Ehresmann, B. (1989) J. Mol. Biol. 207, 417.
109. Brand, C., Romby, P., Westhof, E., Ehresmann, C. and Ehresmann, B. (1991) J. Mol. Biol. 221(1), 293.
110. Stahl, D.A., Luehrsen, K.R., Woese, C.R. and Pace, N.R. (1981) Nucl. Acids Res. 9, 6129.
111. Tuschl, T., Gohlke, C., Jovin, T.M., Westhof, E. and Eckstein, F. (1994) Science 266, 785.
112. Felsenfeld, G., Davies, D.R. and Rich, A. (1957) J. Am. Chem. Soc. 79, 2023.
113. Arnott, S. and Bond, P.J. (1973) Nature New Biol. 244, 99.
114. Thiele, D. and Guschlbauer, W. (1971) Biopolymers 10, 143.
115. Letai, A.G., Palladino, M.A., Fromm, E., Rizzo, V. and Fresco, J.R. (1988) Biochemistry 27, 9108.
116. Chastain, M. and Tinoco, Jr, I. (1992) Nucl. Acids Res. 20, 315.
117. Broitman, S.L., Im, D.D. and Fresco, J.R. (1987) Proc. Natl. Acad. Sci. USA 84, 5120.
118. Roberts, R.W. and Crothers, D.M. (1992) Science 258, 1463.
119. Han, H. and Dervan, P.B. (1993) Proc. Natl. Acad. Sci. USA 90, 3806.
120. Klinck, R., Liquier, J., Taillandier, E., Gouyette, C. and Tam, H.-D. (1995) Eur. J. Biochem. 233, 544.
121. Sklenar, V. and Feigon, J. (1990) Nature 345, 836.
122. de los Santos, C., Rosen, M. and Patel, D. (1989) Biochemistry 28, 7282.
123. Holland, J.A. and Hoffman, D.W. (1996) Nucl. Acids Res. 24, 2841.
124. Gautheret, D., Damberger, S.H. and Gutell, R.R. (1995) J. Mol. Biol. 248, 27.
125. Michel, F., Ellington, A.D., Couture, S. and Szostak, J.W. (1990) Nature 347, 578.
126. Studnicka, G.M., Rahn, G.M., Cummings, I.W. and Salser, W.A. (1978) Nucl. Acids Res. 5, 3365.
127. Rietveld, K., van Poelgeest, R., Pleij, C.W.A., van Boom, J.H. and Bosch, L. (1982) Nucl. Acids Res. 10, 1929.
128. Pleij, C.W. (1990) TIBS 15, 143.
129. Puglisi, J.D., Wyatt, J.R. and Tinoco, Jr, I. (1991) Acc. Chem. Res. 24, 152.
130. ten Dam, E., Pleij, K. and Draper, D. (1992) Biochemistry 31, 11665.
131. Pleij, C.W., Rietveld, K. and Bosch, L. (1985) Nucl. Acids Res. 13, 1717.
132. Wyatt, J.R., Puglisi, J.D. and Tinoco, Jr, I. (1990) J. Mol. Biol. 214, 455.
133. Puglisi, J.D., Wyatt, J.R. and Tinoco, Jr, I. (1990) J. Mol. Biol. 214, 437.
134. Qiu, H., Kaluarachchi, K., Du, Z., Hoffman, D.W. and Giedroc, D.P. (1996) Biochemistry 35, 4176.
135. Du, Z., Giedroc, D.P. and Hoffman, D.W. (1996) Biochemistry 35, 4187.
136. Brierley, I. (1995) J. Gen. Virol. 76, 1885.
137. Shen, L.X. and Tinoco, Jr, I. (1995) J. Mol. Biol. 247, 963.
138. Chen, X., Kang, H., Shen, L.X., Chamorro, M., Varmus, H.E. and Tinoco, Jr, I. (1996) J. Mol. Biol. 260, 479.
139. Kang, H., Hines, J.V. and Tinoco, Jr, I. (1996) J. Mol. Biol. 259, 135.
140. Eguchi, Y. and Tomizawa, J.I. (1991) J. Mol. Biol. 220, 831.
141. Wagner, E.G.H. and Simons, R.W. (1994) Annu. Rev. Microbiol. 48, 713.
142. Tomizawa, J.I., Eguchi, Y. and Itoh, T. (1991) Annu. Rev. Biochem. 60, 631.
143. Gregorian, Jr, R.S. and Crothers, D.M. (1995) J. Mol. Biol. 248(5), 968.
602
Oxford Handbook of Nucleic Acid Structure
144. Marino , J.P. , Gregorian , Jr , R.S. , Csankovszki , G . an d Crothers , D.M . (1995 ) Science 268, 1448 . 145. Chang , K. -Y and Tinoco, Jr, I . (1997 ) J. Mol. Biol. 269, 5 2 146. Predki , P.P. , Nayak, L.M., Gottlieb , M.B . an d Regan, L. (1995 ) Cell 80, 41. 147. Gate , J.H. , Gooding , A.R. , Podell , E. , Zhou , K. , Golden , B.L. , Szewczak , A.A. , Kundrot, C.E. , Cech , T.R . an d Doudna, J.A. (1996 ) Science 273, 1696 . 148. Jaeger , J., SantaLucia , Jr, J. an d Tinoco, Jr, I . (1993) Annu. Rev. Biochem. 62, 255 . 149. Louise-May , S. , Auffinger, P . an d Westhof, E. (1996 ) Curr. Opin. Struct. Biol. 6, 268 . 150. Schnare , M.N. , Damberger , S.H. , Gray , M.W . an d Gutell , R.R . (1996 ) J. Mol. Biol. 256, 701 . 151. Zuker , M . (1989 ) Science 244, 48 . 152. Jaeger , J.A., Turner , D.H. an d Zuker, M. (1989 ) Proc. Natl. Acad. Sci. USA 86, 7706. 153. Turner , D.H . (1996 ) Curr. Opin. Struct. Biol. 6(3), 299. 154. Gautheret , D., Major , F . and Cedergren, R . (1993 ) J. Mol. Biol. 229, 1049 . 155. Gautheret , D . and Cedergren, R . (1993 ) FASEB J. 7, 97. 156. Major , F., Gautheret, D. an d Cedergren , R . (1993 ) Proc. Natl. Acad. Sci. USA 90, 9408 .
19 Transfer RNA
John G. Arnez and Dino Moras
Laboratoire de Biologie Structurale, Institut de Génétique et de Biologie Moléculaire et Cellulaire, CNRS/INSERM/ULP, 1, rue L. Fries, BP 163, F-67404 Illkirch, France
1. Introduction
Transfer RNA (tRNA) is the key intermediate in the process of protein synthesis. It is a link between genetic information contained in nucleic acids and its expression in the protein world. The molecule possesses two important ends; one interacts with the codon of the messenger RNA (mRNA) through three specific nucleotides called the anticodon, while the other end serves as the attachment point for the amino acid and is subsequently linked to a growing polypeptide chain during protein synthesis on the ribosome. Thus, the molecule adapts the amino acids to the genetic code.

Its existence as the adaptor molecule was initially postulated at a tie club meeting by Francis Crick after the determination of the three-dimensional structure of DNA. It was discovered in 1957 by Hoagland et al. (1). The first tRNA nucleotide sequence, that of yeast tRNAAla, was determined by Holley et al. in 1965 (2), who also first proposed the clover-leaf representation of its secondary structure. Later it was found that all tRNA sequences can be folded in such a structure. Currently, the primary structures of 2700 tRNAs are known (3). In 1966 Crick proposed the wobble hypothesis for the reading of the triplet codons on mRNA by tRNA anticodons (4). The first nucleotide modifications in tRNA were isolated in 1959 (5,6). In 1969, Bernhardt and Darnell (7) reported that tRNAs are transcribed as part of larger precursor RNAs, and in 1971 the first precursor sequence, that of E. coli p-tRNATyr, was elucidated by Altman and Smith (8). The precursors are then processed to give mature tRNAs; the processing pathways have since been elucidated in many organisms and many of the enzymes involved have been isolated (reviewed in refs 9 and 10). Fraser and Rich (11) and Sprinzl and Cramer (12) noted, in 1975, that amino acids are specifically attached to either the 2'-OH or the 3'-OH of the 3'-terminal adenosine of a tRNA, depending on the aminoacylation system. A functional correlation for this observation was found in 1990 with the partition of aminoacyl-tRNA synthetases into two classes (13).

The first three-dimensional structure of a tRNA was that of yeast tRNAPhe and was determined in the early 1970s, by two groups concurrently, one headed by A. Rich at MIT, Cambridge, USA (14) and the other led by A. Klug at the MRC in Cambridge, UK (15), using X-ray crystallography. Subsequently, the crystal structures of a few other tRNAs were determined: those of yeast tRNAAsp (16), E. coli initiator tRNAMetf (17), and yeast initiator tRNAMeti (18). In addition, X-ray crystal structures have been determined of tRNA complexed with cognate aminoacyl-tRNA synthetases for the E. coli glutamine (19), yeast aspartic acid (20), and T. thermophilus serine (21) systems, and with the E. coli elongation factor Tu (22). Most recently, tRNA has been observed in the E. coli ribosome, using cryo-electron microscopy (23,24).

It was noted very early that tRNA is a substrate for many enzymes. First, tRNA genes are transcribed by RNA polymerase in prokaryotes (25) and by RNA polymerase III in eukaryotes (26); the transcripts are precursor molecules that have 5' and 3' extensions in addition to the sequences that correspond to tRNAs and are processed by a series of specific nucleases to give mature tRNAs. The 5'-end is specifically cleaved by ribonuclease P, a ribonucleoprotein that contains a catalytic RNA subunit and a helper protein cofactor (9). The 3'-end is processed by a variety of nucleases, and the integrity of the CCA terminus is maintained by a terminal nucleotidyl transferase (10). Many nucleotides are modified by specific modifying enzymes during and after maturation. Then amino acids are attached to the 3' adenosine by their cognate aminoacyl-tRNA synthetases. Aminoacyl-tRNAs are bound by the elongation factor Tu (EF-Tu) in prokaryotes and 1α (eEF-1α) in eukaryotes and carried by this factor to the ribosomal A site, where the anticodon interacts with the codon on the messenger RNA and the aminoacylated 3'-CCA end interacts with peptidyl transferase. It is translocated to the P site once the amino acid is incorporated into the growing polypeptide. Initiator tRNAs are different from the majority of tRNAs, called elongators, in that they possess certain features that are specific for initiation factors and against elongation factors; they bind to the P site of the ribosome when these are assembled for protein synthesis. They are usually charged with methionine.

In this chapter we discuss the structure of cytoplasmic tRNA at several stages of its cellular translational 'career'. Although these molecules are seldom free in solution, i.e. uncomplexed to another molecule, be it a protein or a ribonucleoprotein particle, the structures of three tRNAs were determined in their 'free', i.e. uncomplexed, states. The first part of the chapter will thus focus on the three-dimensional (crystal) structures of these tRNAs. The second important milestone in their cellular activity is aminoacylation. Three cases of tRNAs bound to their cognate aminoacyl-tRNA synthetases are known in structural detail and they are described in the second part of this chapter. The third part is devoted to tRNAPhe complexed with the elongation factor Tu, which takes it to the ribosome that is already synthesizing a polypeptide chain. Finally, tRNAs have been observed on a ribosome by electron microscopy. The relevant structures are summarized in Table 19.1.
2. The free tRNA
Transfer RNAs are 73-93 nucleotides long and can be folded into a similar clover-leaf secondary structure (2,3,27). There are constant features that are present in all tRNAs (Fig. 19.1) and a number of semi-conserved residues, i.e. constant purines or pyrimidines, that are concentrated in the D (dihydrouridine) and T (thymidine) arms. All base pairs in the stems, with few exceptions, are of the Watson-Crick type. The acceptor stem comprises seven base pairs and four additional residues at the 3' extremity that are not base paired; the last three of these are CCA. The amino acid is attached to the ribose of the 3'-terminal adenosine. The T arm is the most highly conserved stem-loop structure; the helical stem consists of five base pairs and the loop
Table 19.1. High-resolution structures of tRNAs, tRNA-binding proteins, and tRNA-protein complexes

tRNA/protein                     Organism                Resolution (Å)   Reference

Transfer RNA
tRNAPhe                          S. cerevisiae           2.5              32, 30
tRNAAsp                          S. cerevisiae           3.0              16, 39
tRNAMetf                         E. coli                 3.5              17
tRNAMeti                         S. cerevisiae           3.0              18

Aminoacyl-tRNA synthetases
GlnRS:tRNAGln:ATP                E. coli                 2.8, 2.5         19, 92
GluRS                            T. thermophilus         2.5              120
TyrRS:TyrAMP                     B. stearothermophilus   2.3              121
TrpRS:TrpAMP                     B. stearothermophilus   2.9              122
MetRS                            E. coli                 2.3              118
AspRS:tRNAAsp:ATP                S. cerevisiae           2.7              20, 103
AspRS:AspAMP                     T. thermophilus         2.8              158
LysRS:Lys                        E. coli                 2.8              125
SerRS                            E. coli                 2.5              109
SerRS:ATP; SerRS:SerAMP          T. thermophilus         2.5              111
SerRS:tRNASer:SerAMP             T. thermophilus         2.7              21, 112
HisRS:HisAMP                     E. coli                 2.6              128
HisRS:HisOH:ATP                  E. coli                 2.8              159
HisRS:His                        T. thermophilus         2.7              160
GlyRS                            T. thermophilus         2.75             127
PheRS                            T. thermophilus         2.9              129

Elongation factors
EF-Tu:GDP                        E. coli                 2.5           }
EF-Tu:GppNHp                     T. thermophilus         1.7           }  135, 136, 137, 138
EF-Tu:GppNHp                     T. aquaticus            2.5           }
EF-Tu:Phe-tRNAPhe:GTP            T. aquaticus            2.7              22
EF-Tu:EF-Ts                      E. coli                 2.5              139
EF-G:GDP                         T. thermophilus         2.7              161
EF-G                             T. thermophilus         2.85             162

tRNA-modifying enzymes
tRNA-guanine transglycosylase    Zymomonas mobilis       1.85             50
Met-tRNAMetf formyltransferase   E. coli                 2.0              132

Ribosomal particle
Ribosome                         E. coli                 23               151
Ribosome:tRNA                    E. coli                 25, 20           23, 24
Fig. 19.1. Clover-leaf diagram of the secondary structure of a generalized tRNA. The conserved residues are marked in capital letters; the conserved purines as R and pyrimidines as Y. Rmod stands for a heavily modified R. Variant residues in fixed positions are indicated by circles. Elements of variable size are drawn as bold dots. Some conserved tertiary interactions are shown by connecting lines.
contains seven nucleotides whose sequence is TΨCRANY, where Ψ is a pseudouridine, N can be any nucleotide, R is a purine, and Y a pyrimidine. The stem ends with a G:C base pair on the T loop side. The anticodon stem is built of five base pairs and the loop has seven nucleotides. The three central bases constitute the anticodon and thus vary according to the accepting activity of the tRNA. The D arm is more variable; its stem is three or four base pairs long and the D loop may have 7-11 residues. The D loop contains some conserved residues, such as two invariable Gs and an A at the beginning of the loop. There is a conserved U at position 8, between the acceptor and D stems. The variable loop is the most variable element, ranging in length from 4 to 21 bases; however, most of the variable loops are short. The structures of tRNAs were recently reviewed by Dirheimer et al. (28).
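The T loop consensus lends itself to a small pattern check. The following is a minimal sketch, assuming a one-letter code in which P stands for pseudouridine (Ψ) and modified bases are written as their parent nucleotide; both conventions, the function names, and the two test loops are illustrative choices rather than anything prescribed in this chapter.

```python
# Degenerate symbols used in the text: R = purine, Y = pyrimidine, N = any.
# "P" stands in for pseudouridine; this one-letter code is an assumption
# made for the sketch.
DEGENERATE = {
    "R": set("AG"),
    "Y": set("CUT"),
    "N": set("ACGUTP"),
}

T_LOOP_CONSENSUS = "TPCRANY"  # T-Psi-C-R-A-N-Y written with P for pseudouridine


def matches_consensus(loop, consensus=T_LOOP_CONSENSUS):
    """Return True if every loop residue is allowed by the consensus symbol."""
    if len(loop) != len(consensus):
        return False
    return not mismatches(loop, consensus)


def mismatches(loop, consensus=T_LOOP_CONSENSUS):
    """List (loop position, residue, consensus symbol) for each violation."""
    return [
        (i + 1, residue, symbol)
        for i, (residue, symbol) in enumerate(zip(loop, consensus))
        # Symbols without a degenerate meaning must match literally.
        if residue not in DEGENERATE.get(symbol, {symbol})
    ]


if __name__ == "__main__":
    print(matches_consensus("TPCGAUC"))   # illustrative loop obeying the consensus
    print(mismatches("APCGAAA"))          # illustrative loop with A at positions 1 and 7
```

Reporting the individual mismatches, rather than only a yes/no answer, makes it easy to see which of the seven loop positions departs from the conserved pattern.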
The two tRNAs whose structures were determined first, tRNAPhe and tRNAAsp, are both elongator tRNAs, i.e. they participate in the elongation cycle of protein synthesis. The structures of two initiator tRNAs have also been determined, one prokaryotic and one eukaryotic. They both possess distinct features that enable them to help initiate protein synthesis on the ribosome. All these structures were solved by X-ray crystallography. The three-dimensional structure of yeast tRNASer was deduced from biochemical data using the structural framework of the two elongator tRNAs.
2.1 Yeast tRNAPhe
Yeast tRNAPhe is considered the canonical molecule since it was the first known tRNA structure. The numbering of all tRNA sequences is based on that of tRNAPhe. Its crystal structure was determined in the 1970s by two groups (14,15,29-34). Several reviews were written on its structure in the same period (27,35-37). The clover-leaf secondary structure contains the constant features described above (Fig. 19.2a). It has a wobble G4:U69 base pair in the acceptor stem. The D stem comprises four base pairs, and the D loop contains eight nucleotides. The variable loop is small, extending over five nucleotides.

The molecule is folded into an L-shaped structure (Fig. 19.2b), with the two limbs nearly perpendicular to each other, and is 20 Å thick. The main structural element is the A-form RNA double helix, which has 11 base pairs per turn (Table 19.2). The principal characteristics of this form are a wide and shallow minor groove, a deep major groove, and base pairs tilted relative to the helix axis (see Chapters 1 and 17). The segments having this structure correspond to the base-paired portions of the clover-leaf. The acceptor stem stacks onto the T stem; this combination forms one limb of the L structure. The anticodon and D stems stack to form the other limb of the L.
Table 19.2. Average helical parameters of A-RNA, tRNAPhe and tRNAAsp (a)

Stem            Twist/residue (°)   Rise/residue (Å)   Residues/turn

A-RNA           32.7                2.8                11.0

tRNAPhe
  D             35.8                2.36               10.1
  Anticodon     32.6                2.68               11.0
  T             31.1                2.71               11.6
  Acceptor      33.6                2.51               10.7

tRNAAsp
  D             33.0                2.63               10.9
  Anticodon     32.1                2.52               11.2
  T             34.3                2.03               10.5
  Acceptor      32.5                2.62               11.1

(a) Adapted from ref. 163.
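The columns of Table 19.2 are not independent: the number of residues per turn is 360° divided by the twist per residue. The sketch below recomputes that column from the listed twist values and also derives the helical pitch (rise per residue multiplied by residues per turn). The pitch is not given in the table and is included only as an illustration of the geometry; the function and variable names are hypothetical.

```python
# (twist in degrees per residue, rise in Å per residue, listed residues per turn)
# Values as given in Table 19.2.
TABLE_19_2 = {
    "A-RNA": (32.7, 2.8, 11.0),
    "tRNAPhe D": (35.8, 2.36, 10.1),
    "tRNAPhe anticodon": (32.6, 2.68, 11.0),
    "tRNAPhe T": (31.1, 2.71, 11.6),
    "tRNAPhe acceptor": (33.6, 2.51, 10.7),
    "tRNAAsp D": (33.0, 2.63, 10.9),
    "tRNAAsp anticodon": (32.1, 2.52, 11.2),
    "tRNAAsp T": (34.3, 2.03, 10.5),
    "tRNAAsp acceptor": (32.5, 2.62, 11.1),
}


def residues_per_turn(twist_deg):
    """A full turn is 360 degrees, so residues/turn = 360 / twist."""
    return 360.0 / twist_deg


def pitch(twist_deg, rise):
    """Axial advance per full turn, in the same length unit as the rise."""
    return rise * residues_per_turn(twist_deg)


if __name__ == "__main__":
    for name, (twist, rise, listed) in TABLE_19_2.items():
        computed = residues_per_turn(twist)
        print(f"{name:18s} {computed:5.2f} residues/turn "
              f"(listed {listed}), pitch {pitch(twist, rise):5.1f} Å")
```

Every recomputed value rounds to the figure listed in the table, which confirms the assignment of the three numeric columns.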
Fig. 19.2. Yeast tRNAPhe. (a) Clover-leaf representation. The tertiary interactions are shown by connecting lines. (b) Three-dimensional fold. The backbone is shown as a stick rendering and the phosphate atoms are traced as a thick black line. (From ref. 124, by permission of Oxford University Press.)
The two limbs are held together at the elbow, which is structurally the most complex part of the tRNA. It is stabilized by tertiary interactions between the D and T loops and strongly anchored by what is known as the augmented D helix. The latter is formed by the helical portion of the D stem, the two residues between the acceptor and D stems, namely U8 and A9, and the flanking residues of the variable loop. These bases form tertiary base triples with the D helix. Most tertiary interactions involve hydrogen bonds between bases that result in base pairs or base triples that are not of the Watson-Crick type (Fig. 19.3).

Starting at the bottom of the acceptor stem, as seen in the clover-leaf diagram, and moving towards the D stem, residue U8 makes a reverse Hoogsteen interaction with A14; the backbones are antiparallel. The following residue, A9, interacts with base A23 in a symmetric fashion, and the backbones are parallel; A23 pairs in the Watson-Crick manner with U12. Proceeding along the strand, G10, which is the first residue in the D stem, forms a standard Watson-Crick base pair with C25 and a tertiary interaction with G45 of the variable loop, whose backbone runs parallel with that of G10. Base pair C11:G24 is not involved in any tertiary interactions with bases, but wedges in between two triples, changing the helix axis as a result. U12, as already mentioned above, is involved in a base triple with A23 and A9. It is followed by C13, which base pairs in the standard Watson-Crick fashion with G22; the latter interacts in a non-standard and asymmetric fashion with G46, which is part of the variable loop and whose backbone runs antiparallel to that of G22. A14 forms the 5'-flank of the D loop and associates with U8. G15 interacts in a reverse Watson-Crick fashion with C48 of the variable loop; their backbones run parallel to each other. This is also known as the Levitt pair, for it had been predicted by Levitt (38) before the crystal structure was determined. Two interactions between the D and T loops follow: the non-standard G18:Ψ55 pair, and the Watson-Crick pair G19:C56. An intra-T loop strut is formed by the reverse Hoogsteen base pair T54:A58. The transition from the D stem to the anticodon stem is marked by a 24° kink in the helical axis between the two, introduced by the hinge formed by the symmetrical heteropurine base pair G26:A44.

Furthermore, several bases form hydrogen bonds to the backbone. C11 of the D stem contacts the 2'-OH of A9. A21, which flanks the D loop on the 3'-end, interacts with the 2'-OH of the ribose of U8. Ψ55 contacts the phosphate of residue 58. G57, which follows the TΨC in the T loop, hydrogen bonds with the 2'-OH of the riboses of residues 18 and 55, and with the O4' of residue 19. The base of G57 also intercalates between base pairs G18:Ψ55 and G19:C56 and thus enhances the stability of the junction. Most bases of the tRNA are engaged in stacking interactions, which provide additional stabilization of the tertiary structure. Only D16, D17, and G20 of the D loop, and U47 of the variable loop, point into the solvent and do not engage in stacking interactions.

The anticodon loop is similar to the T loop in that both contain seven residues. They also have similar conformations of the backbone, which makes a sharp turn, and a U residue at the bend (U33 and Ψ55) that stabilizes the bend by interacting with a phosphate moiety on the opposite strand of the loop. This turn was dubbed the uridine, or U turn (Fig. 19.4). The U base terminates the hydrophobic stack emanating from the anticodon stem by making a van der Waals contact with a phosphate group.
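The base-base contacts listed above for yeast tRNAPhe can be held in a small graph, from which the base triples of the augmented D helix fall out as connected groups of three residues. This is a minimal sketch: the contact list is transcribed from the description in this section, only base-base contacts are included (base-backbone hydrogen bonds such as that of A21 with the ribose of U8 are left out), and the function names are hypothetical.

```python
from collections import defaultdict

# (residue 1, residue 2, pairing type) for yeast tRNAPhe, as described in the text.
CONTACTS = [
    ("U8", "A14", "reverse Hoogsteen"),
    ("A9", "A23", "symmetric A:A"),
    ("U12", "A23", "Watson-Crick"),
    ("G10", "C25", "Watson-Crick"),
    ("G10", "G45", "tertiary"),
    ("C11", "G24", "D-stem base pair"),
    ("C13", "G22", "Watson-Crick"),
    ("G22", "G46", "non-standard, asymmetric"),
    ("G15", "C48", "reverse Watson-Crick (Levitt pair)"),
    ("G18", "Ψ55", "non-standard"),
    ("G19", "C56", "Watson-Crick"),
    ("T54", "A58", "reverse Hoogsteen"),
    ("G26", "A44", "symmetrical heteropurine"),
]


def interaction_groups(contacts):
    """Group residues that are linked, directly or indirectly, by base-base contacts."""
    graph = defaultdict(set)
    for first, second, _kind in contacts:
        graph[first].add(second)
        graph[second].add(first)

    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over one connected component
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups


def base_triples(contacts):
    """Return the connected groups that contain exactly three residues."""
    return [group for group in interaction_groups(contacts) if len(group) == 3]


if __name__ == "__main__":
    for triple in base_triples(CONTACTS):
        print(sorted(triple))
```

With this contact list the three triples recovered are A9:A23:U12, G10:C25:G45, and C13:G22:G46, with the C11:G24 pair left on its own between them, as in the description of the augmented D helix.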
Fig. 19.3. Some representative tertiary interactions in tRNAPhe. G19:C56 is not shown, since it is a standard Watson-Crick base pair. (From ref. 124, by permission of Oxford University Press.)
Fig. 19.4. Stereoview of the anticodon U turn in tRNAPhe.
The acceptor stem contains a wobble G4:U69 base pair, which introduces a series of rotations in the backbone that result in the displacement of phosphate 5 by about 2 Å from what would be its normal position in a standard double helix. The conformation is stabilized by a water molecule.

There are four hexacoordinated Mg2+ binding sites in tRNAPhe (Fig. 19.5), two on the back side of the elbow, one in the augmented helix region, and the fourth in the anticodon loop. While some of the metal coordination sites are filled by phosphate oxygens, most direct binding is carried out by water molecules, which then interact mostly with the phosphates of the tRNA, although some of them are liganded by nucleotide bases.

2.2 Yeast tRNAAsp
The structure of yeast tRNAAsp (16,39) is globally similar to that of tRNAPhe. Its clover-leaf secondary structure shows the constant features as indicated above (Fig. 19.6a). As for the variable features, it possesses a three-base pair D stem, a 10-nucleotide D loop, and its variable loop is shorter than that of tRNAPhe, consisting of four residues. While the D stem contains no G:C base pairs, both the acceptor and anticodon stems are rich in them.

The overall folding of tRNAAsp (Fig. 19.6b) is the same as that of tRNAPhe; both have similar L-shaped structures and are 20 Å thick. However, the conformation of tRNAAsp is more open, resembling a boomerang. The angle between the helical axes of the acceptor-T stem helix and the anticodon-D stem limb is about 10° more obtuse. The double helical segments are based on the RNA-type double helix (Table 19.2). The relative positions of the D and T loops are different as well. Similarly to tRNAPhe, the transition from the D stem to the anticodon stem is marked by a 2.5° break in the helical axes between the two, introduced by the hinge formed by the mismatched symmetrical purine-purine base pair G26:A44.

Tertiary interactions (Fig. 19.7) are for the most part similar to those found in tRNAPhe. Unlike tRNAPhe, all bases of the variable loop participate in such contacts; the shorter variable loop induces a different interaction of the base of A21, which interacts with the base of A14 and also contacts the ribose of U8. U8 and A14 are
Fig. 19.5. Stereoviews of three magnesium ion-binding sites in tRNAPhe: (a) in the anticodon loop, (b) in the augmented D helix, and (c) in the D loop.
Fig. 19.6. Yeast tRNAAsp. (a) Clover-leaf representation. The tertiary interactions are shown by connecting lines. (b) Three-dimensional fold. The backbone is shown as a stick rendering and the phosphate atoms are traced as a thick black line. (From ref. 85, with the permission of Cold Spring Harbor Laboratory Press.)
Fig. 19.7. Some representative tertiary interactions in tRNAAsp. The following are not shown, since they are very similar to the ones in tRNAPhe: A9:A23:U12, G18:C56, and T54:A58.
engaged in a reverse Hoogsteen base pairing. As in tRNAPhe, A9 interacts in a symmetrical fashion with base A23, which in turn pairs in the Watson-Crick manner with U12; G10 forms a wobble base pair with U25 and a tertiary interaction with G45. Base pair U11:A24 is not involved in any tertiary interactions with bases, but forms a wedge between the two neighbouring triples. Unlike tRNAPhe, Ψ13 forms a wobble pair with G22; the latter interacts with G46, also in a non-standard way. The Levitt pair in this case is the reverse Watson-Crick A15:U48 base pair; in tRNAPhe this is a reverse Watson-Crick G:C pairing. Owing to the different relative positions of the invariant Gs in the D loop, the interactions between the D and T loops are slightly different from those in tRNAPhe. G17 interacts with Ψ55 and G18 forms a Watson-Crick base pair with C56. A57 forms a backbone contact with the ribose of Ψ55, and intercalates between base pairs G17:Ψ55 and G18:C56, much as residue 57 does in tRNAPhe. Ψ55 also interacts with the phosphate of A58. An intra-T loop strut is formed by the reverse Hoogsteen base pair T54:A58. Residues D16, D19, and G20 of the D loop project into the solvent and do not participate in stacking interactions.

The anticodon loop of tRNAAsp has the same fold as that in tRNAPhe. It has the U turn structure and the same stacking pattern. In the crystal, the anticodons of two tRNAAsp molecules interact via their self-complementary sequences in a two-fold symmetrical fashion. This duplex formation is most likely responsible for the wider angle between the anticodon and acceptor stems, as suggested by solution studies (40).
2.3 E. coli initiator tRNAMetf
The structure of E. coli tRNAMetf (17) is globally similar to those of elongator tRNAPhe and tRNAAsp. In fact, its structure was solved by molecular replacement using yeast tRNAPhe as the search model. Overall, it is 77 nucleotide residues long, and its secondary structure is a clover-leaf (Fig. 19.8) that shows most of the constant
Fig. 19.8. Clover-leaf representation of the E. coli initiator tRNAMetf.
features indicated above, with the exception that the first base pair of the acceptor stem is a C:A mismatch. As for the variable features, it possesses a four-base pair D stem and a nine-nucleotide D loop. Its variable loop consists of five residues and is thus the same length as that of tRNAPhe. All helical stems are rich in G:C base pairs.

The overall folding of tRNAMetf is the same L-shaped structure as that of the elongator tRNAs. The conformation of tRNAMetf is more akin to that of tRNAPhe, since the helical axes of the acceptor-T stem helix and the anticodon-D stem limb are nearly orthogonal. As in the two elongator tRNAs, the double helical segments are based on the RNA-type double helix. A mismatched terminal pair of bases of the acceptor stem makes the acceptor end more flexible, which is reflected in the more curved conformation.

Most of the tertiary interactions observed in tRNAPhe appear to be present in tRNAMetf, although some differences do exist. The base of residue A57 intercalates between the bases of G18 and G19. The nucleotide does not make as many backbone interactions as its analogue in tRNAPhe, G57. The D loop is one base longer and more tightly organized than that of tRNAPhe; it is also folded towards the core. Residues C17 and U17a do not extend into the solvent but are closer together and the bases stack on each other.

Outside of the core, the main differences lie at the end of the acceptor arm and the anticodon loop. The terminal base pair of the acceptor stem is a mismatch. The anticodon loop is superficially similar to that of tRNAPhe in its nearly identical stacking. However, the orientation of U33 is dramatically different. The base points into the solvent, whereas it is stacked in tRNAPhe. Thus, it cannot hydrogen bond with the phosphate of residue 36, but the ribose does. Hence the phosphate is in a slightly different position, which results in a marked shift away from the loop in the position of the phosphate moiety of nucleotide 35.
2.4 Yeast initiator tRNAMeti
The structure of yeast tRNAMeti (18,41) is globally similar to those of elongator tRNAPhe and tRNAAsp. Overall, it is 75 nucleotide residues long, and its secondary structure is a clover leaf that exhibits all of the constant features indicated above (Fig. 19.9a). As for the variable features, it possesses a four-base pair D stem and a shorter, seven-nucleotide, D loop. Its variable loop consists of five residues and is thus the same length as that of tRNAPhe.

The overall folding of tRNAMeti is the same L-shaped structure as that of the elongator tRNAs (Fig. 19.9b). In tRNAMeti, like tRNAMetf, the helical axes of the acceptor-T stem helix and the anticodon-D stem limb are nearly orthogonal and the double helical segments are of standard RNA type.

All of the tertiary interactions seen in tRNAPhe are present. The U8:A14:A21 and C13:G22:m7G46 triples, the G15:C48 reverse Watson-Crick, and the m22G26:A44 symmetrical heteropurine interactions are essentially identical in the two tRNAs. In tRNAMeti, G18 interacts with U55 instead of the Ψ found at the same position in tRNAPhe. Other interactions are very similar. The G10:C25 pair interacts with U45 instead of a G, and the essentially homologous triple G9:C23:G12 replaces the A9:A23:U12 of tRNAPhe.
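The element sizes quoted for the four crystal structures can be cross-checked against the chain lengths by simple bookkeeping over the clover-leaf. The sketch below is an illustration under stated assumptions: the constant elements take the sizes given for the generalized clover-leaf, the two residues between the acceptor and D stems are counted as a linker, and a single residue (position 26, the partner of A44 in the hinge pair) is assumed to link the D and anticodon stems. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CloverLeaf:
    # Variable elements, taken from the description of each tRNA.
    d_stem_bp: int
    d_loop_nt: int
    variable_loop_nt: int
    # Constant elements of the generalized clover-leaf.
    acceptor_stem_bp: int = 7
    three_prime_tail_nt: int = 4     # unpaired 3' end, the last three being CCA
    acceptor_d_linker_nt: int = 2    # positions 8 and 9
    d_anticodon_linker_nt: int = 1   # position 26 (assumption of this sketch)
    anticodon_stem_bp: int = 5
    anticodon_loop_nt: int = 7
    t_stem_bp: int = 5
    t_loop_nt: int = 7

    def length(self) -> int:
        """Total number of nucleotides: two per base pair plus all unpaired residues."""
        paired = 2 * (self.acceptor_stem_bp + self.d_stem_bp
                      + self.anticodon_stem_bp + self.t_stem_bp)
        unpaired = (self.three_prime_tail_nt + self.acceptor_d_linker_nt
                    + self.d_loop_nt + self.d_anticodon_linker_nt
                    + self.anticodon_loop_nt + self.variable_loop_nt
                    + self.t_loop_nt)
        return paired + unpaired


TRNAS = {
    "yeast tRNAPhe": CloverLeaf(d_stem_bp=4, d_loop_nt=8, variable_loop_nt=5),
    "yeast tRNAAsp": CloverLeaf(d_stem_bp=3, d_loop_nt=10, variable_loop_nt=4),
    "E. coli tRNAMetf": CloverLeaf(d_stem_bp=4, d_loop_nt=9, variable_loop_nt=5),
    "yeast tRNAMeti": CloverLeaf(d_stem_bp=4, d_loop_nt=7, variable_loop_nt=5),
}

if __name__ == "__main__":
    for name, leaf in TRNAS.items():
        print(f"{name:18s} {leaf.length()} nucleotides")
```

Under these assumptions the totals for tRNAMetf and tRNAMeti come out at the 77 and 75 residues stated in the text, and all four molecules fall inside the 73-93 nucleotide range given for tRNAs in general.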
Fig. 19.9. Yeast initiator tRNAMeti. (a) Clover-leaf representation. The tertiary interactions are shown by connecting lines. Novel interactions are shown by bold lines. (b) Three-dimensional fold. The backbone is shown as a stick rendering and the phosphate atoms are traced as a thick black line.
The unique features of tRNAMeti cluster in a region of the core of the three-dimensional structure, giving rise to a unique contiguous surface. They form a substructure specific for eukaryotic initiator tRNAs that is characterized by a shortened D loop, A20 instead of a D, and A54 instead of the T in elongator tRNAs. These lead to some novel tertiary interactions (Fig. 19.10). The A54:A58 strut is analogous to that seen in elongators, although different in the nature of the bases. The asymmetric homopurine pair shifts the position of the backbone at residue 58 slightly. The nearly invariant pyrimidine at position 60 in the T loop is replaced by an A. The substructure is stabilized by a network of hydrogen bonds. Residue A20 of the D loop interacts with G57, A59, and A60 in the T loop; this interaction is sequence specific and forms a strong bridge between the two loops. It seems to fill the role of the Mg2+
Fig. 19.10. Some representative tertiary interactions in yeast tRNAMeti. The following interactions are not shown, since they are very similar to the ones in tRNAPhe: U8:A14:A21, G22:C13:m7G46, G15:C48, m22G26:A44, G18:U55, G19:C56, and A54:A58.
coordinated in the same region of tRNAPhe. These elements represent a functional differentiation within the common tRNA fold.

The tRNA shows yet another unique feature, which is a novel modification, a 5'-phosphoribosyl group attached through a glycosidic bond to the O2' of the ribose of residue A64. It appears on the surface in the minor groove and is accessible to solvent and other molecules. The phosphoryl group interacts with the base of the neighbouring residue 63. Its role seems to be a rejection signal for elongation factors.

The anticodon arm is not defined well enough in the electron density map to furnish detailed structural information (18). The sequence is distinct from that of elongator tRNAMetm; the invariant G:C base pairs of the anticodon stem appear essential for the initiation function and are not found in elongators (42).
2.5 Yeast tRNASer in solution
The clover-leaf secondary structure of yeast tRNASer (Fig. 19.11a) has the standard constant features, a three-base pair D stem, a 10-nucleotide D loop, and a large variable loop. The latter is built of a four-base pair stem and a three-nucleotide loop, and is flanked by one nucleotide at the anticodon stem and two residues at the T stem. Its structure in solution was probed with a variety of chemical agents (43), along with those of tRNAPhe and tRNAAsp. The sequence and the resulting comparison of protection patterns were combined with the three-dimensional folding of tRNAPhe and tRNAAsp to obtain a model of tRNASer. The coordinates of the structure of tRNAAsp were used for the actual model, for it has a more similar D loop.

The resulting model (Fig. 19.11b) has the classical tRNA L shape with the extra arm nearly in the plane of the two limbs of the L. There are slight differences in the anticodon loop. The interactions within the T loop are maintained. The variable stem and loop are characterized by tight folding, with a three-nucleotide mini loop capping a four-base pair stem. It is joined to the body of the tRNA in a fashion more akin to that of tRNAAsp. The large variable loop engenders some replacements in the tertiary interactions in the augmented D helix, while some are preserved. Base pair G10:C25 does not seem to interact with base 45, for the latter is engaged in base pairing within the variable loop; the N7 of G10 is accessible to chemical agents. Residue 9 is likely to interact with base pair 12:23 in a different way. The base of residue G47:9, which is analogous to residue 46 in tRNAAsp or tRNAPhe, stacks between bases G9 and A21, which locks the variable stem in its position relative to the body of tRNASer.
2.6 Comparison in solution of yeast tRNAPhe and tRNAAsp
The structures of yeast tRNAPhe and tRNAAsp were probed by chemical modification (44). The principal differences were observed in the acceptor stem (namely in purines 4, 71, and 73), in residue A21 of the D loop, and in residue G45 of the variable loop. The N7 of A21 was found to be reactive in tRNAAsp and unreactive in tRNAPhe. The movement of residue A46 towards the interior of the molecule in tRNAAsp and the absence of residue 47 result in a different shape of the variable loop and expose the N7 of A21; the group is protected in tRNAPhe by the modified m7G46. The tertiary
Fig. 19.11. Yeast tRNASer. (a) Clover-leaf representation. (b) Three-dimensional fold as modelled on the basis of chemical modification experiments. The backbone is shown as a stick rendering and the phosphate atoms are traced as a thick black line.
interaction U8:A14:A21 is different in the two tRNAs. The N7 of G45 is reactive in tRNAPhe and protected in tRNAAsp because of the different stacking of residue 9 between bases 45 and 46. In the acceptor stem, G4 is reactive in tRNAAsp and protected in tRNAPhe, whereas the situation is reversed for the N7 of G71; this occurs because of differences in stacking interactions. When a purine is stacked between two pyrimidines, the N7 is reactive, otherwise it is not; it is also unreactive when it is involved in tertiary interactions. Residues G18, G19, and G34 have N7 exposed and are reactive; they are located in loops. The solution structures largely agree with the crystal structures.
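The stacking rule stated above can be written as a small predicate. The following is a minimal sketch in Python and is not part of the original probing study: the function name `n7_predicted_reactive` and its arguments are hypothetical, the two stacking neighbours must be supplied from a three-dimensional model (they are not necessarily the sequence neighbours), modified bases are not handled, and the rule is a qualitative summary rather than a quantitative predictor.

```python
# Qualitative N7 reactivity rule for purines, as summarized in the text:
# reactive only when stacked between two pyrimidines and not engaged in
# a tertiary interaction. All names here are illustrative assumptions.

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CUT")  # T included for ribothymidine (e.g. T54)


def n7_predicted_reactive(base, stacked_on, stacked_under,
                          in_tertiary_interaction=False):
    """Return True if the purine N7 is predicted to be chemically reactive.

    base           -- the probed residue, 'A' or 'G'
    stacked_on     -- base stacked on one face of the purine
    stacked_under  -- base stacked on the other face
    in_tertiary_interaction -- True if the N7 takes part in a tertiary contact
    """
    if base not in PURINES:
        raise ValueError("N7 probing applies to purines (A or G) only")
    if in_tertiary_interaction:
        return False
    return stacked_on in PYRIMIDINES and stacked_under in PYRIMIDINES


if __name__ == "__main__":
    print(n7_predicted_reactive("G", "C", "U"))  # stacked between pyrimidines
    print(n7_predicted_reactive("G", "A", "U"))  # one purine neighbour
```

Passing the stacking partners explicitly, rather than deriving them from the sequence, keeps the sketch honest about the fact that stacking is a property of the fold.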
2.7 General principles of tRNA structure
The common feature of all these tRNAs is the overall L structure. Although there are some differences in the details of tertiary interactions, the RNA chain follows the same fold. The two helical arms of the L, built through stacking of the acceptor and T stems in one case and the anticodon and D stems in the other, are both based on the A-form RNA helix. Although their relative orientation with respect to each other may vary from one tRNA to another, it is very close to being orthogonal.
Levitt correctly predicted some interactions before any of the structures were determined (38), namely U8:A14, 9:12:23, G15:C48, 18:55, and 19:56. He also postulated some that were not found in the crystal structures; for example, he had A21 pairing with T54. While his prediction of the two limbs resulting from the stacking of the acceptor stem on the T stem and of the D stem on to the anticodon stem was correct, their relative orientation in the form of a sausage instead of an L was not.
This discussion has so far focused on cytoplasmic tRNAs. Organelles, i.e. mitochondria and chloroplasts, also possess their own translation machineries and pools of tRNAs (45). Mitochondrial tRNAs have some unique features (46). In some cases, they have truncated clover-leaf structures, i.e. a portion of the D or T arm may be absent. In principle, they can still fold in the manner of their cytoplasmic counterparts.
2.8 Nucleotide modifications in tRNA
In addition to the four standard ribonucleotides, tRNA, like many other RNAs, possesses modified nucleotides. It is the most extensively modified RNA species in the cell and possesses the greatest variety of such nucleotides. This subject has been recently reviewed in considerable detail (47-49). The patterns are similar in all phyla, which reflects common evolutionary origins. However, some modifications are specific to certain phylogenetic domains and/or species. Eukaryotic tRNAs are more extensively modified than prokaryotic and mitochondrial tRNAs.
There are more than 80 different types of modifications in all tRNAs. A tRNA species may possess a number of modified bases, all of which, with the exception of the Q base, are introduced post-transcriptionally by a variety of enzymes. There are at least 45 different modification enzymes in a bacterial cell, whose genes represent approximately 1% of the genome. By comparison, about 0.25% of the genome is used to encode the tRNA substrates. A pathway of several of these enzymes may be needed
to produce one modified nucleoside. To date, 17 out of about 45 modification enzyme genes have been identified in E. coli (47). The structure of one of these enzymes, tRNA-guanine transglycosylase from Zymomonas mobilis (50), which changes the guanine in the wobble position of tRNAAsn, tRNAAsp, tRNAHis, and tRNATyr to a hypermodified base, queuine, is based on an eight-stranded (β/α)8 barrel core; the parallel β strands are connected by simple helices, multiple helices, or even more elaborate combinations of helices and strands. The enzyme contains a zinc (Zn) binding motif that is implicated in tRNA binding.
Modifications are introduced in a stepwise fashion at different stages during and after processing of tRNA precursors, in an intricate interplay of pathways; the timing depends on the processing stage, substrate concentration, and the activity of a given processing enzyme. As modified nucleotides inhibit RNase P activity, the 5' cleavage occurs early. Methylation of ribose moieties occurs in almost mature tRNA. In eukaryotes, some reactions take place in the nucleus while others occur in the cytoplasm (47).
Chemically, any single modification can add or enhance certain properties of a nucleotide base or sugar, which may include the introduction of transient or permanent charges, alteration or restriction of nucleoside or phosphodiester conformation, hindrance of canonical or non-canonical base pairing, facilitation of metal ion coordination, rearrangement of water structure, and formation of new interactions leading to new conformations and chemistries. Modifications thus extend the pool of functional groups in a nucleic acid beyond the four standard bases.
They may be relatively simple, such as methylation (as in rT), thiolation (as in 4-thioU), or glycosidic bond substitution (as in pseudouridine, Ψ), or more complex, involving additions of amino acids or heterocyclic functional groups. However, even a simple methylation may alter hydrophobicity, inhibit Watson-Crick base pairing, or introduce a charge when added on to a heterocyclic nitrogen (e.g. N7) (49).
Structurally, modified and unmodified tRNAs are similar, either in solution (51,52) or in a complex with a protein (53). However, unmodified tRNAs are not as stable, as indicated by their lower melting temperatures (53-56) and by their chemical and enzymatic accessibility (55,57). Modifications thus enhance the stability of tRNA structure. Uridine modifications are very widespread, representing a large proportion of all modifications. The most frequently encountered are the D, Ψ, and thioU nucleotides (49). Ψ appears to stabilize the structure by reordering neighbouring water molecules (53). D (dihydrouridine) is a non-aromatic (saturated) version of U and is found in the D loop and sometimes in the variable loop. It alters the sugar pucker to C2'-endo and restricts backbone conformation (58,59). Thiouridines, such as 2-thio- and 4-thio-U, restrict nucleotide conformation (60). Methylations are also involved in structural stabilization through enhancement of metal binding and base stacking, restriction of conformational flexibility, and reordering of water (61,62).
Most modifications are not essential for aminoacylation, which has been demonstrated by a number of biochemical studies performed with unmodified tRNAs obtained by transcription in vitro. In E. coli, most unmodified tRNAs accept their cognate amino acids. Examples include tRNAVal (52), tRNAHis (63), tRNAGln (64), and tRNAPhe (65).
There are three notable exceptions, tRNAIle, tRNAGlu, and tRNALys. The mnm5s2U34 is a key determinant of tRNAGlu identity (66,67). The absence of the
same modification in tRNALys reduces the rate of aminoacylation by two orders of magnitude (68). Aminoacylation of tRNAIle is similarly reduced when the lysidine modification of C at the wobble position 34, k2C34, is replaced by a C (69). The kinetic parameters of most aminoacylations differ slightly when unmodified tRNAs are used, compared with modified tRNA; modifications may modulate interactions with aaRS (aminoacyl-tRNA synthetase). A notable exception is tRNAAsp from E. coli, where the unmodified species can also be charged by ArgRS (70). Thus, modifications can constitute antideterminants, but not in all cases.
Modifications also play an important role in the way tRNAs interact with the ribosome and associated translation (initiation and elongation) factors. For example, 2'-O-ribosyladenosine (phosphate) at position 64 of eukaryotic initiator tRNAMet is likely to be a negative determinant for acceptance by the elongation factor eEF-1α (71-73). Moreover, modified nucleotides may strengthen tRNA-ribosome association (49,74). Furthermore, modified nucleotides at the wobble position of the anticodon (residue 34) modulate codon reading by enhancing the conformational flexibility or rigidity of the nucleotide; this extends or restricts the wobble read-out of the corresponding codon nucleotide (48).
3. tRNA in aminoacylation
The common structural fold shared by tRNAs enables them to interact with tRNA processing enzymes and the protein synthesis apparatus. However, they show certain distinguishing features that are recognized by a cognate aminoacyl-tRNA synthetase (aaRS) and rejected by a non-cognate aaRS; these features, named identity determinants, were first identified in tRNASer (75). They are distributed in differential patterns in different sets of tRNAs and comprise the necessary and sufficient elements for recognition by the cognate aaRS and rejection by non-cognate aaRS, i.e. the identity of a given set of isoacceptor tRNAs. They are located primarily in the anticodon loop, the acceptor arm, and a few base pairs in the T and D stems (76). Biochemical analyses using in vivo and in vitro techniques have led to the elucidation of the identity determinants for a number of tRNAs (77) by using two approaches: identity swapping and transplantation of identity elements. In the former, minimal changes are introduced into a tRNA such that it becomes recognized by the new aaRS. The experiment must also prove that the introduced elements constitute the identity of the new system (78). In the latter method, variants of a particular tRNA are synthesized and analysed for their capacity as substrates for the aaRS involved (79). Since efficient aminoacylation depends on the overall conformation of the tRNA as well as on the presence of the elements, tRNAs obtained in such a way are not optimized for the new amino acid acceptance (80).
Aminoacyl-tRNA synthetases (aaRSs) catalyse the esterification of the amino acid to one of the hydroxyl groups of the 3'-terminal adenosine of the tRNA via an aminoacyl-adenylate intermediate. The energy for the reaction is supplied by the hydrolysis of ATP (81).
Each amino acid may be specified by several isoacceptor tRNA species, while, in general, there is one aaRS for each amino acid (81,82). Several reviews have been published on the subject of aaRSs (83-86).
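The two-step course of the aminoacylation reaction described above can be written out explicitly. The scheme below is the standard textbook form and is given as a reminder rather than taken from this chapter; aa denotes the amino acid, and the release of inorganic pyrophosphate (PP_i) in the first step is an addition not stated in the text.

```latex
\begin{align}
\text{aa} + \text{ATP} &\longrightarrow \text{aa-AMP} + \text{PP}_i
  && \text{(activation: aminoacyl-adenylate formation)} \\
\text{aa-AMP} + \text{tRNA} &\longrightarrow \text{aa-tRNA} + \text{AMP}
  && \text{(transfer to the 2'- or 3'-OH of the terminal adenosine)}
\end{align}
```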
Fig. 19.12. Glutaminyl-tRNA synthetase:tRNAGln complex from E. coli. The acceptor end-binding domain is in light grey and the anticodon-binding module is in dark grey. The tRNA is drawn with its phosphate chain traced as a thick line. (From ref. 85, with the permission of Cold Spring Harbor Laboratory Press.)
Although they catalyse what is essentially the same reaction, aaRSs are a diverse family of enzymes, whose quaternary structures can be monomeric, dimeric, and even tetrameric. Yet these enzymes can be grouped into two classes of ten members each (13,20), which are correlated to two structural and functional solutions to the organization of the active site domain. The active site domains of class I aaRSs contain the Rossmann fold nucleotide-binding motif, an alternating α-β structure with a central parallel β sheet, and show signature amino acid sequences HIGH and KMSKS. These aaRSs esterify the amino acid to the 2'-OH of the 3'-terminal ribose. The active site modules of class II aaRSs are based on an antiparallel β sheet and have three concatenated homologous sequence motifs, 1, 2, and 3 (87,88); the latter two motifs form the catalytic site, while motif 1 is involved in the dimer interface, as these aaRSs are obligate dimers (88). These enzymes esterify the amino acid to the 3'-OH, with the exception of phenylalanyl-tRNA synthetase (PheRS), which acylates the 2'-OH (11-13,87). To the active site core domains that define the class, which typically consist of about 300 to 400 residues, are attached polypeptide modules that lead to different sizes and tRNA specificities of aaRS (88,89).
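The class distinctions just described lend themselves to a compact tabular summary. The following Python sketch is one illustrative way to encode them; the names `AaRSClass`, `ENZYME_CLASS`, and `acylation_site` are hypothetical, and only enzymes whose class is stated in this chapter (GlnRS, AspRS, SerRS, PheRS) are listed, so the table is deliberately incomplete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AaRSClass:
    """Features of one aaRS class, restricted to what the text states."""
    name: str
    active_site_fold: str
    signature_motifs: tuple
    default_acylation_site: str


CLASS_I = AaRSClass(
    name="I",
    active_site_fold="Rossmann fold (central parallel beta sheet)",
    signature_motifs=("HIGH", "KMSKS"),
    default_acylation_site="2'-OH",
)

CLASS_II = AaRSClass(
    name="II",
    active_site_fold="antiparallel beta sheet",
    signature_motifs=("motif 1", "motif 2", "motif 3"),
    default_acylation_site="3'-OH",
)

# Only enzymes whose class is given in the chapter; not a complete list.
ENZYME_CLASS = {
    "GlnRS": CLASS_I,
    "AspRS": CLASS_II,
    "SerRS": CLASS_II,
    "PheRS": CLASS_II,
}

# PheRS is the exception noted in the text: class II, but acylates the 2'-OH.
ACYLATION_EXCEPTIONS = {"PheRS": "2'-OH"}


def acylation_site(enzyme):
    """Return the ribose hydroxyl that the named enzyme esterifies."""
    cls = ENZYME_CLASS[enzyme]
    return ACYLATION_EXCEPTIONS.get(enzyme, cls.default_acylation_site)


if __name__ == "__main__":
    for name in ENZYME_CLASS:
        print(name, ENZYME_CLASS[name].name, acylation_site(name))
```

Keeping the PheRS exception in a separate mapping, instead of giving it a class of its own, mirrors the way the text presents it: a class II enzyme by structure with an atypical site of acylation.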
3.1 tRNAGln complexed with glutaminyl-tRNA synthetase
Glutaminyl-tRNA synthetase (GlnRS) is a class I aaRS. The enzyme from Escherichia coli is a monomer of 553 amino acid residues and has a molecular weight of 63 kDa (90). It is an elongated protein consisting of two major modules: the active site module consists of the parallel β sheet nucleotide-binding fold (the Rossmann fold) into which is inserted the acceptor-binding subdomain, and the anticodon-binding module comprises two β barrels (19) (Fig. 19.12). In the active site, the two motifs characteristic of class I aaRS, HIGH and MSK, interact with each other, forming a surface that binds the ATP molecule in an extended conformation. The 2'-OH of tRNAGln and the α-phosphate of ATP are within hydrogen bonding distance (91). GlnRS binds its cognate tRNAGln in what is considered a class I-characteristic mode: the acceptor arm of the tRNA interacts with the active site domain on the minor groove side, and the variable loop faces the solvent. The interface between the two extends over 2700 Å2 (92).
The clover-leaf secondary structure of tRNAGln (Fig. 19.13) shows all the constant features and relatively typical variable features. It possesses a three-base pair D stem and a nine-nucleotide D loop. Its variable loop consists of five residues. All stems are GC-rich. Its structure was solved in complex with GlnRS and it is assumed that its uncomplexed structure resembles that of tRNAPhe. Its overall folding is the same as that of tRNAPhe, giving rise to the classical L-shaped structure that is 20 Å thick. However, both limbs of tRNAGln have undergone dramatic conformational change as they are induced to fit the enzyme.
The terminal base pair of the acceptor stem is unravelled to facilitate the bending of the 3'-terminal CCA into the active site. The anticodon is spread out so as to maximize interactions with the protein.
The core of tRNAGln is very similar to that of tRNAPhe and possesses most tertiary interactions found in the latter (Fig. 19.14). The 4-thioU:A14 pairing is enhanced through a base-base contact with residue A21; in addition to the contact made by the base of A21 to the ribose of U8 also seen in tRNAPhe, a similar contact exists between A14 and A21. The 12:23:9 triple is similar, although the nature of the bases, C12:G23:C9, is different. Unlike in tRNAPhe, base pair G10:C25 forms no tertiary contact with A45; the latter, however, forms a twofold symmetrical purine-purine pair with A13, which also interacts with A22. As in tRNAPhe, the G15:C48 is a reverse Watson-Crick base pair. There is no residue 17, which makes the D arm shorter than that of tRNAPhe. The G18:U55 and G19:C56 interactions between the D and T loops are the same as in tRNAPhe, as is the internal T loop pair T54:A58. The base of C20 contacts that of G19 and the ribose of G57. The base of G57 is stacked between those of G18 and G19. The mismatched purine-pyrimidine pair, A26:C44, at the
Fig. 19.13. E. coli tRNAGln. Clover-leaf representation. The tertiary interactions are shown by connecting lines.
bottom of the augmented D helix replaces a purine-purine pair G26:A44 of tRNAPhe. The bases of C16 and U46 project into the solvent.
There are three main regions in tRNAGln that interact with GlnRS: the acceptor arm, part of the D arm, and the anticodon loop (92) (Fig. 19.15). Biochemical analyses performed in vitro (64,93) and in vivo (94-96), in conjunction with analysis of the three-dimensional structure of the complex, have localized the identity of tRNAGln to the acceptor stem and the anticodon, with one element in the D stem, G10. In addition to the residues that are directly involved in protein-RNA interactions, tRNAGln possesses nucleotides that enable it to adopt the conformation that facilitates its binding to GlnRS. These residues are in the acceptor stem (G73 and base pair U1:A72) and in the anticodon loop (2'mU32, U33, m2A37, and Ψ38) (19,92).
The three terminal base pairs in the acceptor arm of tRNAGln are the principal recognition elements for GlnRS, and the enzyme uses two loops and an α helix to interact directly with them. The first loop, tipped with Leu-136, denatures base pair
Fig. 19.14. Some representative tertiary interactions in tRNAGln. The following interactions are not shown, since they are very similar to the ones in tRNAPhe: m2G18:Ψ55, G19:C56, and T54:A58. G10:C25 is a standard Watson-Crick base pair and does not participate in a base triple; therefore, it is not shown.
U1:A72, which facilitates the bending of the 3'-terminal CCA into the active site. This bend is stabilized by an intramolecular interaction within the tRNA: the exocyclic amino group of G73 hydrogen bonds with the phosphate moiety of residue 72 (Fig. 19.16a). The second loop (residues 179-184) lines up the backbone so that the peptide oxygen of Pro-181 hydrogen bonds with the exocyclic amino group of G2 and the peptide nitrogen of Ile-183 forms a water-mediated contact with C71 (Fig. 19.16b). Residue Asp-235 of the α helix interacts directly with G3 and contacts C70 through a water molecule (Fig. 19.16c). The helix extends into the active site (19).
Fig. 19.15. E. coli tRNAGln: (a) clover-leaf representation and (b) three-dimensional fold, as it appears in complex with GlnRS. Interactions between the tRNA and the enzyme are indicated as follows: the bases in direct contact with the protein are circled in (a) and drawn in solid black in (b); the bases that form water-mediated contacts with the protein are boldface in (a) and drawn in dark grey in (b); the residues that enable GlnRS to induce a deformation in the tRNA in a sequence-dependent manner, so as to facilitate its binding, are boxed in (a) and in light grey in (b). The segments of the backbone that interact with GlnRS are marked by asterisks in (a) and drawn as large grey spheres in (b). (From refs 85 and 124, with the permission of Cold Spring Harbor Laboratory Press and Oxford University Press, respectively.)
Fig. 19.16. Sequence-specific interactions between GlnRS and the acceptor arm of tRNAGln: (a) intramolecular interaction between G73 and the phosphate of A72; (b) interactions with base pair 2:71; interactions with base pair 3:70 by the (c) wild type and (d) mutant D235N GlnRS. (From ref. 164, with the permission of Cambridge University Press.)
Mutating residue 235 to Asn (Fig. 19.16d) or Gly results in changed interactions with base pair G3:C70, i.e. two direct hydrogen bonds or altered water structure, respectively (97). The GlnRS enzymes harbouring these mutations, which were isolated using an in vivo suppression screen (98,99), exhibit a slightly altered ability to glutaminylate wild type tRNAGln, while their ability to discriminate against a non-cognate U3:A70 base pair is lowered, which manifests itself in incorrect acylation of the amber suppressor derived from tRNATyr (supF) with glutamine.
The anticodon bases of tRNAGln are essential recognition elements for GlnRS, as was shown very early by Seno et al. (100) and is seen in the crystal structure (92)
Fig. 19.17. (a) An additional non-Watson-Crick base pair in the anticodon loop of tRNAGln. Sequence-specific interactions between GlnRS and the anticodon loop of tRNAGln: bases (b) 34, (c) 35, and (d) 36. (From ref. 164, with the permission of Cambridge University Press.)
(Fig. 19.17). The anticodon loop undergoes a dramatic conformational change whereby the anticodon stem is extended by two non-Watson-Crick-type base pairs, which are not present in free tRNAPhe (Fig. 19.17a). The three anticodon bases are splayed out so that they bind to complementary pockets in the C-terminal domain of GlnRS (Fig. 19.17b-d). The C34 binding cleft can accommodate both the C34 of tRNAGln2 and the 2-thio-U34 of tRNAGln1, the two isoacceptors. However, the U35- and G36-binding pockets are highly specific for these two bases. The three pockets share very similar structural arrangements. A polypeptide segment of 5 or 6 residues contains at least one positively charged residue that makes a salt bridge with the adjacent phosphate, while the aliphatic part of its side chain packs against either the base or the ribose. Each base is recognized through direct hydrogen bonding with the side chains or backbone of the peptide (92).
3.2 tRNAAsp complexed with aspartyl-tRNA synthetase
Aspartyl-tRNA synthetase (AspRS) from yeast is a class II aaRS. The yeast enzyme is an α2 dimer of two 557 residue, 63 kDa monomers (101). It is a compact, diamond-shaped dimer of two elongated monomers. Each AspRS subunit consists of two
Fig. 19.18. Aspartyl-tRNA synthetase:tRNAAsp complex from yeast. One monomer is in light grey and the other is in dark grey. The tRNA is drawn as a phosphate chain trace in a thick black and grey line. (From ref. 85, with the permission of Cold Spring Harbor Laboratory Press.)
modules connected by a hinge (Fig. 19.18). The N-terminal domain is a five-stranded β barrel (20) that has a topology similar to such unrelated proteins as staphylococcal nuclease, verotoxin, and ribosomal protein S17. The motif is called the OB fold and is implicated in the binding of either oligonucleotides or oligosaccharides (102). The C-terminal module is the larger of the two domains and contains the catalytic site, which is composed of an antiparallel β sheet flanked by α helices, a topology characteristic of class II aaRSs. The N-terminal domain of one subunit interacts primarily with the C-terminal domain of the other. Most of the dimer interface is between the C-terminal core modules. Motif 1 and part of motif 2 form the dimer interface. Motifs 2 and 3 interact with the 3'-terminal CCA of tRNAAsp, the amino acid, and ATP; the ATP adopts a bent conformation and binds in a manner characteristic of class II aaRSs (20,103). The ribose of the 3'-terminal adenosine is positioned in such a way that the 3'-OH can accept Asp from aspartyl-adenylate (104).
The AspRS dimer binds tRNAs in a symmetrical fashion. Each monomer is complexed to a molecule of tRNAAsp in what is considered a class II-characteristic mode. The acceptor arm of the tRNA interacts with the protein on the major groove side, and the variable loop side faces the protein. The buried surface has an area of 2500 Å2, which represents 20% of the solvent-accessible surface of tRNAAsp (103).
Since the structure of 'free', i.e. uncomplexed, tRNA is also known, a direct comparison of tRNAAsp in the two states is possible (Figs 19.6 and 19.19). Both limbs have undergone a protein-induced fit via a substantial conformational change; however, the change is most dramatic in the anticodon arm.
The core region is virtually unchanged; all the interactions observed in the uncomplexed tRNAAsp are maintained.
There are three regions in tRNAAsp that form contacts with AspRS, of which each contains at least one putative identity element (103) (Fig. 19.19). They are located in the acceptor stem, the D stem, and the anticodon, while the bases that interact directly with the protein are in the acceptor stem and the anticodon loop (20,103,105). The three anticodon bases and residue G73 of the acceptor stem were found to be the main identity determinants, and base pair G10:C25 of the D stem is an accessory element. Yeast AspRS ignores the nature of the terminal base pair in the acceptor stem of tRNAAsp (106), whereas the second base pair is a minor identity element in E. coli (107). Some residues enable the tRNA to adopt the conformation that facilitates its binding to AspRS but are not directly involved in protein-RNA interactions. They are G37 in the anticodon loop and base pair G10:C25 in the D stem; the latter stabilizes the conformation of the D stem near an important AspRS contact (20).
The acceptor stem of tRNAAsp is positioned by motifs 1 and 2. The backbone of the motif 2 loop interacts with the base of G73 and the first base pair of the tRNA, which is undisrupted. The 3'-terminal GCCA of the tRNA is in a helical conformation and interacts directly with the helices and loops of the protein that form part of the active site pocket (20). Two other loops contact C75 and A76. Most direct contacts involve the same subunit; only the phosphate of U1 interacts with Lys-293 of the other subunit (103).
The anticodon bases of tRNAAsp are essential recognition elements for AspRS.
The arm interacts with the N-terminal module on the major groove side and undergoes a protein-induced conformational change. This results in the bulging out
Fig. 19.19. Yeast tRNAAsp: (a) clover-leaf representation and (b) three-dimensional fold, as it appears in complex with AspRS. Interactions between the tRNA and the enzyme are indicated as follows: the bases in direct contact with the protein are circled in (a) and drawn in solid black in (b); the residues that enable AspRS to induce a deformation in the tRNA in a sequence-dependent manner, so as to facilitate binding, are boxed in (a) and in light grey in (b). The segments of the backbone that interact with AspRS are marked by asterisks in (a) and drawn as large grey spheres in (b). (From refs 85 and 124, with the permission of Cold Spring Harbor Laboratory Press and Oxford University Press, respectively.)
of residue mG37, which shortens and bends the anticodon stem-loop; the residue forms an intramolecular hydrogen bond with the phosphate of residue 25 via its exocyclic amino group and thus stabilizes the conformation (Fig. 19.20a). The three anticodon bases are unstacked and spread out to maximize contacts with the protein; they are recognized by direct hydrogen bonding between the side chains or backbone segments of the enzyme and the hydrogen bonding groups of the bases (20,103) (Fig. 19.20b-d).
Fig. 19.20. (a) Intramolecular interaction between G37 and the phosphate of residue 25. Sequence-specific interactions between AspRS and the anticodon loop of tRNAAsp: bases (b) 34, (c) 35, and (d) 36. (From ref. 164, with the permission of Cambridge University Press.)
Fig. 19.21. Seryl-tRNA synthetase:tRNASer complex from T. thermophilus. One monomer is in light grey and the other is in dark grey. The tRNA is drawn as a phosphate chain trace in thick black. The portion of the tRNA that was not seen in the electron density map and was modelled is shown as a light grey trace. (From ref. 85, with the permission of Cold Spring Harbor Laboratory Press.)
3.3 tRNASer complexed with seryl-tRNA synthetase
Seryl-tRNA synthetase (SerRS) is a class II aaRS. The enzyme from E. coli is an α2 dimer of 48 kDa subunits (108,109). Its counterpart from T. thermophilus is very similar (110). SerRS is a compact dimer with two helical appendages. Each monomer consists of two modules. The first 100 N-terminal residues form a 60 Å antiparallel coiled coil of two α helices. The core active site domain is made of a seven-stranded, mostly antiparallel, β sheet surrounded by α helices, a topology characteristic of class II aaRSs. All of the dimer interface is between the core modules; motif 1 and a portion of motif 2 constitute an important part of it (109). Motifs 2 and 3 form part of the active site platform, which interacts with ATP, seryl-adenylate (111), and with the acceptor end of tRNASer (112) in a characteristic class II fashion. The tRNA binds across both subunits of the dimer; the major groove of the acceptor arm faces the active site domain of one subunit, whereas the variable arm and core of tRNASer interact with the N-terminal appendage of the other subunit (21,112) (Fig. 19.21).
The clover-leaf secondary structure of tRNASer from T. thermophilus appears to be very similar to that from E. coli described above (Fig. 19.22). In the core, many tertiary interactions are altered owing to the presence of the long variable arm, which removes the variable loop bases that are available for base triple formation in the augmented D helix of tRNAPhe, tRNAAsp, and tRNAGln (Fig. 19.23). As a result, the D
stem base pairs, C10:G25 and C12:G23, do not participate in tertiary base-mediated interactions. The U8:A14:A21 interaction is analogous to the one observed in tRNAAsp. Residue G9 interacts with a different pair, the mismatched G13:A22. The Levitt pair, G15:C48, is buttressed by the intra-D loop contact between G15 and D20A. The D loop lacks residue 17, but it possesses two additional residues between C20 and A21. The interactions between the D and T loops seen in other tRNAs so far, namely G18:Ψ55, G19:C56, and the base of G57 intercalating between G18 and G19, are preserved, as is the internal T loop strut T54:A58. The bases of U16 and C20 project into the solvent. Since tRNASer comprises a large variable arm, it has introduced a feature that buttresses the arm and anchors it to the body of the tRNA. The base of G20B stacks upon the first base pair of the variable arm, A45:U47Q, and engages in van der Waals interactions with the edges of the bases of C48 and A21, while its sugar moiety interacts with C48. The usual mismatched base pair 26:44 is a twisted Watson-Crick A26:U44 pair; the base of residue 26 can also conceivably
Fig. 19.22. T. thermophilus tRNASer. Clover-leaf representation. The tertiary interactions are shown by connecting lines.
Fig. 19.23. Some representative tertiary interactions in tRNASer. The following interactions are not shown, since they are very similar to the ones in tRNAPhe: G18-Ψ55, G19-C56, and T54-A58; U8:A14:A21 is the same as in tRNAAsp. C10:G25 is a standard Watson-Crick pair and does not participate in a base triple; therefore, it is not shown. Also shown is the stacking interaction of the variable arm onto G20B and the edges of bases A21 and C48.
interact with G43. The long variable loop inserts into the body at an angle, so that the molecule is not entirely flat.
The most striking feature of the seryl system is that SerRS does not interact with the anticodon of its cognate tRNA at all (21, 113, 114), since the tRNAs aminoacylated by the enzyme, five tRNASer isoacceptors and the tRNASec, possess a variety of anticodon sequences (109). There are four areas on the tRNA that it recognizes (Fig. 19.24): the 3'-end of the acceptor stem, the part of the anticodon stem at the base of the variable loop, part of the TΨC loop, and the base-paired portion of the long variable arm, as has been shown by chemical footprinting and enzymatic probes
Fig. 19.24. T. thermophilus tRNASer: (a) clover-leaf representation and (b) three-dimensional fold, as it appears in complex with SerRS. The bases in direct contact with the protein are circled in (a) and drawn in solid black in (b). The segments of the backbone that interact with SerRS are marked by asterisks in (a) and drawn as large grey spheres in (b). (From ref. 85, with the permission of Cold Spring Harbor Laboratory Press.)
of tRNASer (113,114) and confirmed by X-ray crystallographic analysis of SerRS complexed with tRNASer from T. thermophilus (21,112). Eight bases that are located in the acceptor and D arms were found to constitute the identity of tRNASer (75,115), including the discriminator base, G73, and the first three base pairs of the acceptor stem. In addition, the length of the variable arm is an important factor (116).
The acceptor stem is recognized primarily by the motif 2 loop, which, in SerRS, is the longest in all the known class II aaRSs, such that it extends further down the major groove of the acceptor stem. It changes its conformation upon tRNA binding. Phe-262 forms van der Waals contacts with the hydrophobic edges of bases U68 and C69 and thus favours pyrimidines at those positions (Fig. 19.25a). Ser-261 interacts directly with G2 and possibly with C71; the backbone carbonyl oxygen of Phe-262 interacts with C71 as well (Fig. 19.25b). This is the most significant base-specific interaction. The discriminator base G73 is selected by Glu-258, which hydrogen bonds to the exocyclic 2-amino group. The protein interacts with the backbone from residue 66 to 71 (112) (Fig. 19.25a).
An important recognition feature of tRNASer is the long variable arm, which interacts with the long, coiled-coil, N-terminal domain of the other monomer of SerRS. This protein module undergoes an induced change in its orientation and is stabilized upon tRNA binding (21); it also interacts with the T loop. There are very few contacts between the protein and nucleotide bases. One involves the tertiary base pair G19:C56; the peptide oxygen of Ala-555 hydrogen bonds to the exocyclic 2-amino group of G19 (Fig. 19.25c). The base pair stacks upon Pro-59 and Val-58.
There is one notable interaction between the coiled-coil of SerRS and the minor groove of the variable arm of tRNASer: Gln-545 interacts with both G47A and C47N (112) (Fig. 19.25d). SerRS makes many backbone interactions but few base-specific contacts with tRNASer. It thus seems to recognize the unique shape rather than the sequence of its cognate tRNA (21,112,117).
3.4 Other aaRS systems and general principles
The modes of binding of tRNA to class I and class II aaRSs are mirror images of each other. The class I mode is characterized by the variable loop of the tRNA facing the solvent; the core domain of the enzyme interacts with the minor groove of the acceptor stem and the CCA terminus of the tRNA is distorted upon binding. The class II mode of binding is characterized by the variable loop of the tRNA facing the protein; the core domain of the enzyme interacts with the major groove of the acceptor helix. In addition, class I aaRSs are mostly monomeric, with the exception of TyrRS and TrpRS, while class II enzymes are mostly dimers.
The principles governing the acceptor arm binding can be extended to other aaRSs of the same class. In the case of class I aaRSs, the principles seen in the GlnRS:tRNAGln:ATP complex were shown to apply to two other aaRSs of known structure, MetRS (118,119) and GluRS (120). The active site domains of these two aaRSs are very similar to that of GlnRS, whereas the anticodon-binding domains are helical structures, unlike the double β barrel of GlnRS. TyrRS (121) and TrpRS (122) are both obligate dimers and are very similar to each other. Their active sites share the
Fig. 19.25. Sequence-specific interactions between SerRS and tRNASer in the (a), (b) acceptor stem, (c) D-T loop, and (d) variable loop.
Rossmann fold with the other three class I enzymes. A model has been proposed for tRNATyr binding to TyrRS (123) that bears more resemblance to the class II mode of binding; however, the binding can conceivably occur in a class I fashion (124). In the case of class II aaRSs, the principles exemplified by AspRS and SerRS were shown to apply to other aaRSs of known structure. LysRS belongs to the same subgroup as AspRS and the structures of the two enzymes are very similar (125). Therefore, LysRS would be expected to bind its cognate tRNA in the same manner. This has been shown for the anticodon portion of the in vitro transcript of tRNALys (126). GlyRS (127) and HisRS (128) share the active site fold with other class II aaRSs; they have a similar anticodon-binding C-terminal domain, which is different from that of AspRS. They were shown to bind their cognate tRNAs in a fashion similar to that of AspRS: in HisRS a simple superposition of the AspRS:tRNAAsp complex brings the 3'-OH of the tRNA within 3 Å of the carbonyl carbon of histidyl-adenylate (128). PheRS is a dimer of class II dimers (129); in each of the two dimers one monomer is inactive. The PheRS tetramer thus binds two tRNAPhe molecules.
Many aaRSs have been studied in complexes with amino acids, ATP, aminoacyl adenylates, and analogues. Class I aaRSs bind ATP in an extended conformation, characteristic of other ATP-binding proteins, whereas class II aaRSs bind it in a new, bent conformation. These two distinct ATP conformations give rise to different angles of attack by the amino acid at the α-phosphate, which results in two distinct adenylate conformations.
Furthermore, the tRNAs bind in different modes, positioning the 2'-OH of the terminal ribose in class I aaRSs and the 3'-OH of the terminal ribose in class II aaRSs in line to pick up the amino acid from the adenylate (86). In both classes, tRNA specificity results from idiosyncratic interaction with the cognate aaRS;
Fig. 19.26. Conformational changes in tRNA upon binding to its cognate aaRS. Superposition of (a) tRNAGln as bound to GlnRS (black) and free tRNAPhe (light grey), and (b) tRNAAsp as bound to AspRS (black) and uncomplexed (light grey). (From refs 85 and 124, with the permission of Cold Spring Harbor Laboratory Press and Oxford University Press, respectively.)
this includes direct base pair-protein contacts, backbone interactions, and sequence-dependent deformability.
Both tRNAGln and tRNAAsp undergo dramatic conformational changes that are induced by their cognate aaRS to ensure complementary fit of their binding surfaces (Fig. 19.26). Both anticodon loops bend inwards, unstacking the anticodon bases so as to maximize their interactions with the protein. Other concomitant changes in the loop and stem aid in the process (19,20). In the acceptor stem of tRNAGln, the 3'-terminal CCA bends into the active site, which is facilitated by the melting of the U1:A72 base pair (19). In contrast, the acceptor arm and the CCA terminus of tRNAAsp remain helical upon binding to the active site of AspRS (20,103). Conformational changes induced in tRNASer by SerRS (21,112) are minimal, as the anticodon is not bound at all. The adjustments in the acceptor stem are probably of the same magnitude as seen in AspRS; these are difficult to ascertain since there is no reference structure of uncomplexed tRNASer, which is different from tRNAPhe.
In prokaryotes, such as E. coli, aminoacylated initiator tRNAMetf (Met-tRNAMetf) is further modified before it enters the initiation stage of protein synthesis. This modification, the transfer of a formyl group from N-10 formyl-tetrahydrofolate to the amino group of the methionine esterified to the 3'-end of the tRNA, is carried out by methionyl-tRNAMetf formyltransferase. The enzyme is highly specific for initiator tRNAMetf and discriminates against elongator tRNAMetm (130). The key recognition element is the mismatched C1:A72 base pair in the acceptor stem (131).
The protein has two domains, an N-terminal domain that contains a Rossmann fold and a β barrel C-terminal domain that resembles the anticodon-binding domain of AspRS. This domain and the flexible loop inserted in the N-terminal nucleotide-binding fold are implicated in tRNA binding. The N-terminal domain contains the active site. The modular organization of this enzyme is similar to that of aaRSs (132).
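The contrast between the class I and class II binding modes described in this section can be collected into a small lookup table. The sketch below is only an illustrative summary of statements made in the text; the class name `BindingMode`, the dictionary `AARS_BINDING_MODES`, and the helper `differing_features` are hypothetical names introduced here, and the `examples` field simply lists enzymes that the text assigns to each class.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BindingMode:
    """Features of tRNA binding by one aaRS class, as stated in the text."""
    variable_loop_faces: str   # what the tRNA variable loop faces
    acceptor_groove: str       # acceptor-stem groove contacted by the core domain
    atp_conformation: str      # conformation of bound ATP
    ribose_hydroxyl: str       # terminal ribose hydroxyl positioned for transfer
    typical_assembly: str      # usual oligomeric state
    examples: tuple            # enzymes named in the text (illustrative, not exhaustive)


# Hypothetical summary table; every value paraphrases a statement in section 3.4.
AARS_BINDING_MODES = {
    "I": BindingMode("solvent", "minor", "extended", "2'-OH", "mostly monomeric",
                     ("GlnRS", "MetRS", "GluRS", "TyrRS", "TrpRS")),
    "II": BindingMode("protein", "major", "bent", "3'-OH", "mostly dimeric",
                      ("AspRS", "SerRS", "LysRS", "GlyRS", "HisRS", "PheRS")),
}


def differing_features(a: BindingMode, b: BindingMode) -> list:
    """Return the names of the fields in which two binding modes differ."""
    da, db = asdict(a), asdict(b)
    return [name for name in da if da[name] != db[name]]


if __name__ == "__main__":
    for name in differing_features(AARS_BINDING_MODES["I"], AARS_BINDING_MODES["II"]):
        print(name,
              getattr(AARS_BINDING_MODES["I"], name),
              getattr(AARS_BINDING_MODES["II"], name),
              sep=" | ")
```

The table deliberately records only the general tendencies; exceptions noted in the text, such as the dimeric class I enzymes TyrRS and TrpRS, are not encoded.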
4. tRNA in protein synthesis
4.1 Phe-tRNAPhe bound to the elongation factor Tu
Once aminoacylated, a tRNA (aa-tRNA) is transported to the ribosome and positioned in the ribosomal A site by a protein known as the elongation factor (EF)-Tu in prokaryotes and eEF-1α in eukaryotes. This factor also ensures that the anticodon of the aa-tRNA recognizes the correct exposed codon of the messenger RNA. Its function is regulated by binding of GTP and GDP. It is active, i.e. capable of binding aa-tRNA, only when GTP is bound; once it positions the aa-tRNA in the A site of the elongating ribosome, the GTP is hydrolysed and the resulting EF-Tu:GDP is released from the ribosome. At this point its affinity for aa-tRNA is substantially reduced, and the factor needs to be recycled. Since GDP dissociates from EF-Tu at a very slow rate, another protein factor, EF-Ts, is needed for this recycling step. It accelerates the rate of exchange of GTP for GDP (133,134).
EF-Tu is a monomer of 405 residues with a molecular mass of 45 kDa. Its three-dimensional structure has been analysed in several functional states: as an inactive complex with GDP (135,136), as an active complex with the slowly hydrolysing GTP analogue GppNHp (137,138), as a ternary complex with Phe-tRNAPhe and a GTP analogue (22), and as a complex with the guanine nucleotide exchange factor EF-Ts (139).
EF-Tu consists of three domains (Fig. 19.27). Domain I is a β sheet of five parallel strands and one antiparallel strand surrounded on both sides by six major α helices. It contains a guanine nucleotide-binding site; hence it is also known as the G domain. The structure is similar to that of ras-p21 (135,137,138). Domains II and III are composed exclusively of antiparallel β sheets, each forming a β barrel. A large intramolecular movement occurs during the transition from the inactive GDP- to the active GTP-bound form (137,138). Domains II and III move as a rigid unit relative to domain I by a distance that exceeds one-third of the molecular diameter; the angle between the two units changes by about 90°. This results in a transition from a tight and mostly polar interface between domains I and II in the active form, to a substantial cavity separating the two domains in the inactive form.
Fig. 19.27. Phe-tRNAPhe complexed with the elongation factor Tu. The tRNA is drawn as a phosphate trace in solid black. The spheres indicate the portions of the backbone contacting the protein. There are no significant base interactions.
The acceptor arm of aminoacylated tRNA binds to all three domains of EF-Tu, while the anticodon arm does not interact with the protein at all (Fig. 19.27). The aminoacylated CCA terminus is fixed in a narrow cleft between domains I and II (22), which is lined with several positively charged residues and is present only in the GTP-bound form (137,138). The amino acid-binding pocket can accommodate any one of the standard 20 amino acids. The protein interacts primarily with the sugar-phosphate backbone; the 5'-end of the acceptor helix also interacts with the junction of the three domains. The overall shape of the protein resembles that of the EF-G:GDP form. The tRNA itself changes its conformation only slightly upon binding to EF-Tu (22).
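As a rough plausibility check on the figures quoted above for EF-Tu (405 residues, 45 kDa), the following sketch estimates a protein's molecular mass from its residue count. The mean residue mass of about 110 Da is a common rule of thumb assumed here for illustration and is not a value given in this chapter; the function name `estimate_mass_kda` is likewise hypothetical.

```python
# Rule-of-thumb estimate of protein molecular mass from residue count.
# ASSUMPTION: an average residue mass of ~110 Da (a common approximation,
# not a value taken from this chapter).
MEAN_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(n_residues: int,
                      mean_residue_da: float = MEAN_RESIDUE_MASS_DA) -> float:
    """Return the approximate molecular mass in kDa for a chain of n_residues."""
    if n_residues < 0:
        raise ValueError("residue count must be non-negative")
    return n_residues * mean_residue_da / 1000.0


if __name__ == "__main__":
    # EF-Tu: 405 residues, quoted as 45 kDa in the text.
    print(f"EF-Tu estimate: {estimate_mass_kda(405):.1f} kDa")
```

Under this assumption the estimate for 405 residues falls within about 1 kDa of the quoted 45 kDa, so the two numbers in the text are mutually consistent.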
4.2 tRNA in the ribosome
The ultimate destination of aminoacylated tRNAs is the ribosome, where the amino acid is incorporated into a growing polypeptide according to the genetic message on the mRNA; the process occurs in three phases, initiation, elongation, and termination. The ribosome is a large RNA-protein complex that contains, in all species, a small and a large subunit (140). Each subunit is a complex between one or more large ribosomal RNA (rRNA) molecules and a number of relatively small, predominantly basic proteins. Ribosomes from prokaryotic organisms such as E. coli consist of 30S and 50S subunits, comprising 16S rRNA and 21 proteins, and 5S and 23S rRNA and 32 proteins, respectively (141). Eukaryotic ribosomes, such as those from yeast, are larger and are made of 40S and 60S subunits, which comprise 18S rRNA and about 30 proteins, and 5S, 5.8S, and 28S rRNA and about 40 proteins, respectively (142).
Crystals of the particle and individual subunits have been available for some time; the determination of its three-dimensional structure by X-ray crystallography is a challenging long-term goal (143). Low resolution techniques such as electron microscopy, neutron scattering and diffraction, and chemical probing (144,145) have furnished much information on the structural organization of the ribosome and its subunits. Neutron scattering experiments have yielded a map of the relative locations of all ribosomal proteins in the E. coli ribosome (146,147). The structure of the ribosome and its interaction with its substrates, mRNA and tRNA, have been probed extensively by biochemical methods (134,148,149). Recently, the overall structure of
the E. coli particle has been reconstructed from cryoelectron microscopic images at 23-25 Å resolution. In the structure, the small subunit possesses a channel and the large subunit a bifurcating tunnel. The channel may accommodate the incoming mRNA, while the tunnel may serve as the exit pathway for the nascent peptide (150,151).
In addition to information on the overall structure of the ribosome, cryoelectron microscopy has pin-pointed three tRNA molecules bound to the A, P, and E sites of the ribosome (23) in what was an average structure and does not represent any physiological state of the ribosome, since only two tRNA sites are occupied at a time. However, the arrangement of tRNAs was determined at 20 Å resolution in two functional states of elongation, before and after translocation (24). Since there were no gross overall conformational changes between the two states at this resolution, which were also isomorphous to the vacant state, difference electron densities between the two states and relative to the vacant particles revealed the differential occupancies of the three sites and some other morphological changes. In the pre-translocational ribosomes, densities were observed corresponding to tRNAs in the A and P sites, while occupation by tRNA of the P and E sites was seen in the post-translocational state. As the P site is occupied in both states, it was not seen in a difference map between the two states. The A site was shown very clearly, while the density corresponding to the E site was more diffuse, probably reflecting the larger conformational heterogeneity of the site.
In both the A and P site tRNAs, a thin line of density corresponding to the 3'-CCA terminus points towards the putative peptidyl transferase region of the 50S subunit, while the regions corresponding to the anticodon arms lie in the neck of the 30S subunit, the putative decoding region (24).
5. Perspectives
Transfer RNA is structurally and functionally a very versatile molecule. It can interact with many other molecules and serve as a substrate for many enzymes. The overall general features are used by enzymes such as tRNA precursor 5'- and 3'-processing nucleases, some modification enzymes, proteins such as translation factors, and ribonucleoprotein particles such as the ribosome. In addition, tRNAs possess certain distinguishing features that constitute their identity; these are recognized, within their common context, by specialized enzymes such as aminoacyl-tRNA synthetases, Met-tRNAMetf formyltransferase, Glu-tRNAGln and Asp-tRNAAsn amidotransferases (152) and many modification enzymes. All these general and specific encounters between tRNAs and associated molecules constitute an extensive structural and functional puzzle, only a few pieces of which we have begun to fathom, as we have seen in this chapter.
Many larger RNAs, such as those from some plant viruses and virusoids, are capable of structurally and functionally mimicking the versatility of tRNA (153). They do so at their 3'-termini, since these ends can be processed by RNase P and tRNA nucleotidyl transferase, undergo aminoacylation, and interact with elongation factors. However, they do not participate in protein synthesis. Their primary role is to aid in viral replication. They may have co-evolved with tRNAs and associated molecules from common ancestors, as suggested by the genomic tag hypothesis of Weiner and Maizels (154).
Although the principal role of tRNAs in the cell is to take part in the message-directed protein synthesis, they are not confined to that purpose alone. They can participate in other cellular processes, such as priming reverse transcription (155) and regulation of gene expression (156), which reflect the roles played by the tRNA-like viral RNAs. In addition, they are involved in various other metabolic pathways, such as porphyrin biosynthesis (157).
The simple and sophisticated structure of tRNA, with its overall L-shape and two functional ends, one for mRNA codon reading and the other for amino acid attachment and transfer, makes it an adaptor molecule par excellence. It also makes it very adaptable to the many molecules it meets and associates with during its cellular career.
Acknowledgements
We thank S. Cusack for the latest atomic coordinates of the SerRS:tRNASer complex from T. thermophilus. All figures were made with the program MOLSCRIPT (165).
References
1. Hoagland, M.B., Zamecnik, P.C. and Stephenson, M.L. (1957) Biochim. Biophys. Acta 24, 215.
2. Holley, R.W., Apgar, J., Everett, G.A., Madison, J.T., Marquisse, M., Merrill, S.H., Penwick, J.R. and Zamir, R. (1965) Science 147, 1462.
3. Sprinzl, M., Steegborn, C., Hübel, F. and Steinberg, S. (1996) Nucl. Acids Res. 24, 68.
4. Crick, F.H.C. (1966) J. Mol. Biol. 19, 548.
5. Dunn, D.B. (1959) Biochim. Biophys. Acta 34, 286.
6. Smith, J.D. and Dunn, D.B. (1959) Biochem. J. 72, 294.
7. Bernhardt, D. and Darnell, Jr, J.E. (1969) J. Mol. Biol. 42, 43.
8. Altman, S. and Smith, J.D. (1971) Nature New Biol. 233, 35.
9. Altman, S., Kirsebom, L. and Talbot, S. (1995) in tRNA: Structure, Biosynthesis, and Function, (Soll, D. and RajBhandary, U., eds), p. 67. American Society for Microbiology, Washington, DC.
10. Deutscher, M.P. (1995) in tRNA: Structure, Biosynthesis, and Function, (Soll, D. and RajBhandary, U., eds), p. 51. American Society for Microbiology, Washington, DC.
11. Fraser, T.H. and Rich, A. (1975) Proc. Natl. Acad. Sci. USA 72, 3044.
12. Sprinzl, M. and Cramer, M. (1975) Proc. Natl. Acad. Sci. USA 72, 3049.
13. Eriani, G., Delarue, M., Poch, O., Gangloff, J. and Moras, D. (1990) Nature 347, 203.
14. Kim, S.H., Suddath, F.L., Quigley, G.J., McPherson, A., Sussman, J.L., Wang, A.H.J., Seeman, N.C. and Rich, A. (1974) Science 185, 435.
15. Robertus, J.D., Ladner, J.E., Finch, J.T., Rhodes, D., Brown, R.S., Clark, B.F.C. and Klug, A. (1974) Nature 250, 546.
16. Moras, D., Comarmond, M.B., Fischer, J., Weiss, R., Thierry, J.C., Ebel, J.P. and Giege, R. (1980) Nature 288, 669.
17. Woo, N.H., Roe, B.A. and Rich, A. (1980) Nature 286, 346.
18. Basavappa, R. and Sigler, P.B. (1991) EMBO J. 10, 3105.
19. Rould, M.A., Perona, J.J., Soll, D. and Steitz, T.A. (1989) Science 246, 1135.
20. Ruff, M., Krishnaswamy, S., Boeglin, M., Poterszman, A., Mitschler, A., Podjarny, A., Rees, B., Thierry, J.-C. and Moras, D. (1991) Science 252, 1682.
21. Biou, V., Yaremchuk, A., Tukalo, M. and Cusack, S. (1994) Science 263, 1404.
22. Nissen, P., Kjeldgaard, M., Thirup, S., Polekhina, G., Reshetnikova, L., Clark, B.F.C. and Nyborg, J. (1995) Science 270, 1464.
23. Agrawal, R.K., Penczek, P., Grassucci, R.A., Li, Y., Leith, A., Nierhaus, K.H. and Frank, J. (1996) Science 271, 1000.
24. Stark, H., Orlova, E.V., Rinke-Appel, J., Junke, N., Mueller, F., Rodnina, M., Wintermeyer, W., Brimacombe, R. and van Heel, M. (1997) Cell 88, 19.
25. Inokuchi, H. and Yamao, F. (1995) in tRNA: Structure, Biosynthesis, and Function, (Soll, D. and RajBhandary, U., eds), p. 17. American Society for Microbiology, Washington, DC.
26. Sprague, K.U. (1995) in tRNA: Structure, Biosynthesis, and Function, (Soll, D. and RajBhandary, U., eds), p. 31. American Society for Microbiology, Washington, DC.
27. Sigler, P.B. (1975) Annu. Rev. Biophys. Bioeng. 4, 477.
28. Dirheimer, G., Keith, G., Dumas, P. and Westhof, E. (1995) in tRNA: Structure, Biosynthesis, and Function, (Soll, D. and RajBhandary, U., eds), p. 93. American Society for Microbiology, Washington, DC.
29. Suddath, F.L., Quigley, G.J., McPherson, A., Sneden, D., Kim, J.J., Kim, S.H. and Rich, A. (1974) Nature 248, 20.
30. Quigley, G.J., Wang, A., Seeman, N.C., Suddath, F.L., Rich, A., Sussman, J.L. and Kim, S.H. (1975) Proc. Natl. Acad. Sci. USA 72, 4866.
31. Quigley, G.J. and Rich, A. (1976) Science 194, 796.
32. Ladner, J.E., Jack, A., Robertus, J.D., Brown, R.S., Rhodes, D., Clark, B.F.C. and Klug, A. (1975) Proc. Natl. Acad. Sci. USA 72, 4414.
33. Jack, A., Ladner, J.E. and Klug, A. (1976) J. Mol. Biol. 108, 619.
34. Sussmann, J.L., Holbrook, S.R., Warrant, R.W., Church, G.M. and Kim, S.H. (1978) J. Mol. Biol. 123, 607.
35. Rich, A. and RajBhandary, U.L. (1976) Annu. Rev. Biochem. 45, 805.
36. Rich, A. (1977) Acc. Chem. Res. 10, 388.
37. Kim, S.-H. (1978) Adv. Enzymol. 46, 279.
38. Levitt, M. (1969) Nature 224, 759.
39. Westhof, E., Dumas, P. and Moras, D. (1985) J. Mol. Biol. 184, 119.
40. Moras, D., Dock, A.C., Dumas, P., Westhof, E., Romby, P., Ebel, J.P. and Giege, R. (1986) Proc. Natl. Acad. Sci. USA 83, 932.
41. Schevitz, R., Podjarny, A.D., Krishnanmachari, N., Hughes, J.J., Sigler, P.B. and Sussman, J.L. (1979) Nature 278, 188.
42. Seong, B.L. and RajBhandary, U.L. (1987) Proc. Natl. Acad. Sci. USA 84, 334.
43. Dock-Bregeon, A.C., Westhof, E., Giege, R. and Moras, D. (1989) J. Mol. Biol. 206, 707.
44. Romby, P., Moras, D., Dumas, P., Ebel, J.P. and Giege, R. (1987) J. Mol. Biol. 195, 193.
45. Martin, N.C. (1995) in tRNA: Structure, Biosynthesis, and Function, (Soll, D. and RajBhandary, U., eds), p. 127. American Society for Microbiology, Washington, DC.
46. Watanabe, K. and Osawa, S. (1995) in tRNA: Structure, Biosynthesis, and Function, (Soll, D. and RajBhandary, U., eds), p. 225. American Society for Microbiology, Washington, DC.
47. Bjork, G.R. (1995) in tRNA: Structure, Biosynthesis, and Function, (Soll, D. and RajBhandary, U., eds), p. 165. American Society for Microbiology, Washington, DC.
48. Yokoyama, S. and Nishimura, S. (1995) in tRNA: Structure, Biosynthesis, and Function, (Soll, D. and RajBhandary, U., eds), p. 207. American Society for Microbiology, Washington, DC.
49. Agris, P.F. (1996) Progr. Nucl. Acid Res. Mol. Biol. 53, 79.
50. Romier, C., Reuter, K., Suck, D. and Ficner, R. (1996) EMBO J. 15, 2850.
51. Hall, K.B., Sampson, J.R., Uhlenbeck, O.C. and Redfield, A.G. (1989) Biochemistry 28, 5794.
52. Chu, W.C. and Horowitz, J. (1989) Nucl. Acids Res. 17, 7241.
53. Arnez, J.G. and Steitz, T.A. (1994) Biochemistry 33, 7560.
54. Sampson, J.R. and Uhlenbeck, O.C. (1988) Proc. Natl. Acad. Sci. USA 85, 1033.
55. Perret, V., Garcia, A., Puglisi, J., Grosjean, H., Ebel, J.P., Florentz, C. and Giege, R. (1990) Biochimie 72, 735.
56. Derrick, W.B. and Horowitz, J. (1993) Nucl. Acids Res. 21, 4948.
57. Beresten, S., Jahn, M. and Soll, D. (1992) Nucl. Acids Res. 20, 1523.
58. Emerson, J. and Sundaralingam, M. (1980) Acta Cryst. B36, 537.
59. Cadet, J., Ducolumb, R. and Hruska, F.E. (1980) Biochim. Biophys. Acta 563, 206.
60. Agris, P.F., Sierzputowska-Gracz, H., Smith, W., Malkiewicz, A., Sochacka, E. and Nawrot, B. (1992) J. Am. Chem. Soc. 114, 2652.
61. Chen, Y., Sierzputowska-Gracz, H., Guenther, R., Everett, K. and Agris, P.F. (1993) Biochemistry 32, 10249.
62. Agris, P.F., Malkiewicz, A., Brown, S., Kraszewski, A., Nawrot, B., Sochacka, E., Everett, K. and Guenther, G. (1995) Biochimie 77, 125.
63. Himeno, H., Hasegawa, T., Ueda, T., Watanabe, K., Miura, K. and Shimizu, M. (1989) Nucl. Acids Res. 17, 7855.
64. Jahn, M., Rogers, M.J. and Soll, D. (1991) Nature 352, 258.
65. Sampson, J.R., Behlen, L.S., DiRenzo, A.B. and Uhlenbeck, O.C. (1992) Biochemistry 31, 4164.
66. Sylvers, L.A., Rogers, K.C., Shimizu, M., Ohtsuka, E. and Soll, D. (1993) Biochemistry 32, 3836.
67. Rogers, K.C., Crescenzo, A.T. and Soll, D. (1995) Biochimie 77, 66.
68. Tamura, K., Himeno, H., Asahara, H., Hasegawa, T. and Shimizu, M. (1992) Nucl. Acids Res. 20, 2335.
69. Muramatsu, T., Nishikawa, K., Nemoto, F., Kuchino, Y., Nishimura, S., Miyazawa, T. and Yokoyama, S. (1988) Nature 336, 179.
70. Perret, V., Garcia, A., Grosjean, H., Ebel, J.-P., Florentz, C. and Giege, R. (1990) Nature 344, 787.
71. Desgres, J., Keith, G., Kuo, K.C. and Gehrke, C. (1989) Nucl. Acids Res. 17, 868.
72. Kiesewetter, S., Ott, G. and Sprinzl, M. (1990) Nucl. Acids Res. 18, 4677.
73. Forster, C., Chakraburtty, K. and Sprinzl, M. (1993) Nucl. Acids Res. 21, 5679.
74. Koval'chuke, O.V., Potapov, A.P., El'skaya, A.V., Potapov, V.K., Krinetskaya, N.F., Dolinnaya, N.G. and Shabarova, Z.A. (1991) Nucl. Acids Res. 19, 4199.
75. Normanly, J., Ogden, R.C., Horvath, S.J. and Abelson, J. (1986) Nature 321, 213.
76. McClain, W.H. and Nicholas, H.B.J. (1987) J. Mol. Biol. 194, 635.
77. Schulman, L.H. (1991) Progr. Nucl. Acid Res. Mol. Biol. 41, 23.
78. Schulman, L.H. and Pelka, H. (1988) Science 242, 765.
79. Normanly, J. and Abelson, J. (1989) Annu. Rev. Biochem. 58, 1029.
80. Perret, V., Florentz, C., Puglisi, J.D. and Giege, R. (1992) J. Mol. Biol. 226, 323.
81. Schimmel, P. and Soll, D. (1979) Annu. Rev. Biochem. 48, 601.
82. Yarus, M. (1972) Nature New Biol. 239, 106.
83. Carter, Jr, C.W. (1993) Annu. Rev. Biochem. 62, 715.
84. Meinnel, T., Mechulam, Y. and Blanquet, S. (1995) in tRNA: Structure, Biosynthesis, and Function, (Soll, D. and RajBhandary, U., eds), p. 251. American Society for Microbiology, Washington, DC.
85. Arnez, J.G. and Moras, D. (1998) RNA Structure and Function, (Grunberg-Manago, M. and Symons, R.W., eds), p. 465. Cold Spring Harbor Laboratory Press, Cold Spring Harbor.
86. Arnez, J.G. and Moras, D. (1997) TIBS 22, 211.
87. Moras, D. (1992) TIBS 17, 159.
88. Delarue, M. and Moras, D. (1993) BioEssays 15, 1.
89. Jasin, M., Regan, L. and Schimmel, P. (1983) Nature 306, 441.
90. Hoben, P., Royal, N., Cheung, A., Yamao, F., Biemann, K. and Soll, D. (1982) J. Biol. Chem. 257, 11644.
91. Perona, J.J., Rould, M.A. and Steitz, T.A. (1993) Biochemistry 32, 8758.
92. Rould, M.A., Perona, J.J. and Steitz, T.A. (1991) Nature 352, 213.
93. Hayase, Y., Jahn, M., Rogers, M.J., Sylvers, L.A., Koizumi, M., Inoue, H., Ohtsuka, E. and Soll, D. (1992) EMBO J. 11, 4159.
94. Ghysen, A. and Celis, J.E. (1974) J. Mol. Biol. 83, 333.
95. Knowlton, R.G., Soll, L. and Yarus, M. (1980) J. Mol. Biol. 139, 705.
96. Rogers, M.J. and Soll, D. (1988) Proc. Natl. Acad. Sci. USA 85, 6627.
97. Arnez, J.G. and Steitz, T.A. (1996) Biochemistry 35, 14725.
98. Inokuchi, H., Hoben, P., Yamao, F., Ozeki, H. and Soll, D. (1984) Proc. Natl. Acad. Sci. USA 81, 5076.
99. Perona, J.J., Swanson, R.N., Rould, M.A., Steitz, T.A. and Soll, D. (1989) Science 246, 1152.
100. Seno, T., Agris, P.F. and Soll, D. (1974) Biochim. Biophys. Acta 349, 328.
101. Amiri, I., Mejdoub, H., Hounwanou, N., Boulanger, Y. and Reinbolt, J. (1985) Biochimie 67, 607.
102. Murzin, A.G. (1993) EMBO J. 12, 861.
103. Cavarelli, J., Rees, B., Ruff, M., Thierry, J.C. and Moras, D. (1993) Nature 362, 181.
104. Cavarelli, J., Eriani, G., Rees, B., Ruff, M., Boeglin, M., Mitschler, A., Martin, F., Gangloff, J., Thierry, J.C. and Moras, D. (1994) EMBO J. 13, 327.
105. Rudinger, J., Puglisi, J.D., Putz, J., Schatz, D., Eckstein, F., Florentz, C. and Giege, R. (1992) Proc. Natl. Acad. Sci. USA 89, 5882.
106. Putz, J., Puglisi, J.D., Florentz, C. and Giege, R. (1991) Science 252, 1696.
107. Nameki, N., Tamura, K., Himeno, H., Asahara, H., Hasegawa, T. and Shimizu, M. (1992) Biochem. Biophys. Res. Commun. 189, 856.
108. Hartlein, M., Madern, D. and Leberman, R. (1987) Nucl. Acids Res. 15, 1005.
109. Cusack, S., Berthet-Colominas, C., Hartlein, M., Nassar, N. and Leberman, R. (1990) Nature 347, 249.
110. Fujinaga, M., Berthet, C.C., Yaremchuk, A.D., Tukalo, M.A. and Cusack, S. (1993) J. Mol. Biol. 234, 222.
111. Belrhali, H., Yaremchuk, A., Tukalo, M., Berthet-Colominas, C., Rasmussen, B., Bosecke, P., Dial, O. and Cusack, S. (1995) Structure 3, 341.
112. Cusack, S., Yaremchuk, A. and Tukalo, M. (1996) EMBO J. 15, 2834.
113. Dock-Bregeon, A.C., Garcia, A., Giege, R. and Moras, D. (1990) Eur. J. Biochem. 188, 283.
114. Schatz, D., Leberman, R. and Eckstein, F. (1991) Proc. Natl. Acad. Sci. USA 88, 6132.
115. Normanly, J., Ollick, T. and Abelson, J. (1992) Proc. Natl. Acad. Sci. USA 89, 5680.
116. Himeno, H., Hasegawa, T., Ueda, T., Watanabe, K. and Shimizu, M. (1990) Nucl. Acids Res. 18, 6815.
117. Asahara, H., Himeno, H., Tamura, K., Nameki, N., Hasegawa, T. and Shimizu, M. (1994) J. Mol. Biol. 236, 738.
118. Brunie, S., Zelwer, C. and Risler, J.L. (1990) J. Mol. Biol. 216, 411.