GENETICS – RESEARCH AND ISSUES
MOLECULAR POLYMORPHISM OF MAN: STRUCTURAL AND FUNCTIONAL INDIVIDUAL MULTIFORMITY OF BIOMACROMOLECULES
SERGEI D. VARFOLOMYEV AND
GENNADY E. ZAIKOV EDITORS
Nova Science Publishers, Inc. New York
Copyright © 2011 by Nova Science Publishers, Inc. All rights reserved.
LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA
Molecular polymorphism of man : structural and functional individual multiformity of biomacromolecules / [edited by] Sergei D. Varfolomyev and Gennady E. Zaikov.
p. ; cm.
Includes bibliographical references and index.
ISBN 978-1-61324-929-1 (eBook)
1. Genetic polymorphisms. 2. Phenotypic plasticity. I. Varfolomeyev, S. D. II. Zaikov, Gennadii Efremovich.
[DNLM: 1. Polymorphism, Genetic. 2. Genetics, Population. 3. Genomics. QU 500 M718 2009]
QH447.6.M65 2009
611'.01816--dc22
2009021411
Published by Nova Science Publishers, Inc., New York
CONTENTS
Foreword
S. D. Varfolomyev and G. E. Zaikov
Chapter 1. Human Enzymes – Genetic, Proteomic and Catalytic Polymorphism
S. D. Varfolomyev, I. N. Kurochkin and I. A. Gariev
Chapter 2. Polymorphism of Tumor-Suppressor Genes and Genetic Control of Carcinogenesis
M. M. Aslanyan, S. S. Litvinov, E. S. Tsyrendorzhieva and V. A. Tarasov
Chapter 3. Association of Candidate Genes Polymorphism with Asthma in Bashkortostan Republic of Russia
E. K. Khusnutdinova, A. S. Karunas, U. U. Fedorova and I. R. Gilyazova
Chapter 4. Genes and Languages: Are There Correlations between mtDNA Data and Geography of Altay and Ural Languages
E. Khusnutdinova and I. Kutuev
Chapter 5. Common and Special Features of the Human Ribosomal DNA
Natalia S. Kupriyanova and Alexei P. Ryskov
Chapter 6. Ethnic Genomics of the East European Human Populations
S. A. Limborska, D. A. Verbenko, A. V. Khrunin and P. A. Slominsky
Chapter 7. Retroelement Insertion Polymorphism and Modulation of Human Gene Activity
I. Z. Mamedov, S. V. Ustyugova, F. L. Amosova and Y. B. Lebedev
Chapter 8. Biomedical Aspects in Investigations of Biochemical Polymorphism of Actins and Some Actin-Binding Proteins
S. S. Shishkin, L. I. Kovalev, I. N. Krakhmaleva, M. A. Kovaleva, L. S. Eremina and V. O. Popov
Chapter 9. Molecular Mechanisms of Adaptation: Stress and Aggression
A. G. Tonevitsky, N. V. Maluchenko, J. V. Shchegolkova, M. A. Kulikova, M. A. Timofeeva, O. V. Sysoeva, V. A. Shleptsova and A. I. Grigoriev
Chapter 10. Ethnogenomics: The Genetic History of Humans Written in Chromosomal DNA Markers
L. A. Zhivotovsky and E. K. Khusnutdinova
Index
FOREWORD
S.D. Varfolomeev and G.E. Zaikov
"One need not be a groundless optimist to believe that fifty years from now the 'biological code', the chemical encryption of hereditary features, will be decoded and read. From that moment, man will become the absolute sovereign of living matter."
V.A. Engelgardt, 1957
"Man will not become the lord of nature until he becomes the lord of himself."
G. Hegel, 1807
Academician V.A. Engelgardt predicted rather accurately the timing of an outstanding event in the history of mankind: the decoding of the human genome. Considering that this prediction was made just three and a half years after DNA was discovered to be the informational molecule carrying genetic information, one can only marvel at his sagacity and scientific intuition. Over fifty years, science made a tremendous experimental and theoretical breakthrough that, by the beginning of the new millennium, had yielded the structure of human genomic DNA as well as the genomes of many microorganisms, plants and animals. At present, genomic DNA structures have been determined for more than 500 organisms. This work continues, and its intensity and the volume of information obtained are growing like an avalanche. Beyond deep admiration for the achievements of science and this display of the power of the global human intellect, what does it give modern society? The decoding of genomes has many consequences, and many of them will continue to unfold over the coming years or decades. The decoding of the human genome creates a qualitatively new state in the development of modern science, technology and medicine. One of the basic results of the Human Genome Project that is already understood is the foundation it lays for investigating the genome of every individual and detecting differences at the gene and protein levels. A chemical-biological approach based on highly efficient physical methods makes possible detailed molecular genetic typing of the population and the investigation of genetic polymorphism and of the individual features of enzymatic and molecular-receptor processes in every person. Achievements in human genomics and proteomics, chemical enzymology,
bioinformatics and medical genetics form the basis of modern investigations and of numerous practical applications. The accuracy and efficiency of modern analytical methods make it feasible to set the task of obtaining a genetic and proteomic molecular portrait of every individual and of detecting individual differences at the genetic and protein levels. In the coming decade, post-genomic and proteomic investigations will lead to significant changes in many spheres of social life. A qualitatively new molecular medicine, based on determining the ultimate causes of many diseases, is now being established. Predisposition to many diseases and their development are genetically determined. Building on post-genomic and proteomic studies, new branches of medicine are emerging, such as cardiogenomics, oncogenomics, neurogenomics and pharmacogenomics, which rest on the objective appraisal and reduction of the risks of cardiovascular disease and cancer and on the forecasting of neurodegenerative processes and aging. Today this is referred to as the creation and development of individualized medicine based on a person's molecular-genetic and proteomic portrait. Occupational guidance and the study of personal dispositions in various spheres of activity may be based on molecular-genetic analysis. Molecular genetic typing is the foundation for a reasoned determination of a person's potential occupational abilities. The study of the polymorphism of genes that define physical, psychological and intellectual human characteristics seems to be of crucial importance. In the modern post-genomic process, one of the main goals is the creation of a unified platform for genetic analysis and a basis for genotyping the population. The functional reserves of the human organism are defined to a significant degree by the genotypes of the parents. At present in developed countries, and in the near future in Russia, a system is being developed for estimating the risks and abilities of children based on the genetic portrait of their parents.
It is expected that this system will be fully functioning within the coming decade. Genetic forecasting of pathology risk and human abilities, against the background of many social factors, is the material basis for the transition to a genetically healthy population. Post-genomic projects have many special applications; in particular, the approaches developed provide full and unambiguous identification of an individual from ultralow trace quantities of biological material. The molecular genetic approach is becoming the foundation for many human sciences. Analysis of the structural features of genomic DNA passed from generation to generation is fundamental to the modern approach to studying the origin and evolution of ethnic groups. New fields of science, ethnogenomics and ethnogeography, have appeared. Post-genomic development of science touches many spheres of life in modern society. Based on molecular concepts created by modern physical, chemical and molecular biological methods, and relying on modern information technologies that use an advanced mathematical apparatus, this field creates socially significant products that affect the development of society as a whole. The problem of studying human molecular polymorphism is interdisciplinary and of interest to investigators in various fields. Recognizing the multilevel and interdisciplinary nature of the problem, in 2006 the Russian Academy of Sciences and M.V. Lomonosov Moscow State University established a joint project aimed at solving many problems centered on the study of the multiformity of human biomacromolecules. This book is the result of the first stage of that project. Researchers from many scientific organizations in Russia took part in its preparation.
An understanding of the current methodological level of investigations of the variety of one and the same gene, at the level of genomic DNA and of the expressed protein structures, seems to be of fundamental importance. The methodology includes various approaches and physical observation methods that allow detection of differences in gene structure, including at the level of single nucleotide substitutions. Detailed analysis and the application of modern highly sensitive, precise and high-throughput methods based on mass spectrometry of biomacromolecules are important in this field. A number of key systems, such as the full set of enzymes or the ribosomal RNAs of the human organism, require analysis and understanding of the molecular variety that affects human molecular physiology. Questions about the role of structural modifications and single substitutions in the structure and function of macromolecules may be answered at both the experimental and the theoretical stages of investigation. The important role of modern computational and information technologies in analyzing the role of single substitutions that are heritable and stable in the human population should be emphasized. Human predisposition to some diseases is genetically determined. Drawing on the literature and on the authors' own experimental data, this book discusses the role of genetic and proteomic polymorphism in the development of ischemic heart disease, liver diseases, bronchial asthma, recurrent pregnancy loss and other conditions. Investigations of the genetic control of carcinogenesis and mutagenesis are of the greatest importance and interest. At present, the genetic aspects of psychology, type of psychological behavior, individual personality and psychoemotional behavior are subjects of molecular-genetic and proteomic analysis. Molecular genetic methods of personal identification are of high practical significance.
A special article in this book is devoted to aspects of forensic genetics. Remarkable results have recently been obtained on the genetic history of mankind as recorded in certain DNA markers. A polyethnic population structure is a specific feature of Russia. Molecular genetic methods help clarify complex problems in the interrelations of different ethnic groups in Russia. This hard work rests on the shoulders of Russian researchers in ethnic genomics and genogeography. Current results in this field are presented in the final section of this book. Thus, the reader receives a book that introduces the world of ideas and results oriented toward the post-genomic development of society. The book is devoted to the molecular polymorphism of man, which is based on the structural multiformity of biomacromolecules. The achievements of many branches of modern science (physics and chemistry, biology and medicine, mathematics and informatics) and of many human sciences intersect in this field in intricate and remarkable ways. The field is highly socially significant because it touches every individual. The coming decade of the post-genomic era will bring many interesting and unexpected achievements. The Editors thank Dr. Valeriya I. Naidich for technical help during the preparation of this volume.
S.D. Varfolomeev, MSU Professor, Corresponding Member of RAS
G.E. Zaikov, Professor, Institute of Biochemical Physics RAS
In: Molecular Polymorphism of Man Editors: S. D. Varfolomyev, G. E. Zaikov
ISBN: 978-1-60741-843-6 © 2011 Nova Science Publishers, Inc.
Chapter 1
HUMAN ENZYMES – GENETIC, PROTEOMIC AND CATALYTIC POLYMORPHISM
S.D. Varfolomeev, I.N. Kurochkin and I.A. Gariev
N.M. Emanuel Institute of Biochemical Physics, RAS
M.V. Lomonosov Moscow State University, Chemical Department, Moscow, Russia
ABSTRACT
Various aspects of the phenomenon of enzyme molecular polymorphism are considered. The state of research in the physical and structural chemistry of enzymes is analyzed with regard to the role of particular protein structure elements in catalytic activity and in the formation of the protein's tertiary structure. The capabilities of molecular mechanics for studying the effects of structural variations on a catalytic site, and of quantum-mechanical calculations of the elementary acts of the catalytic cycle at relatively small changes in the distances between catalytic groups, are discussed. Bioinformatics approaches to the study of catalytic site structures are considered. Special attention is given to the analysis of the capabilities of protein databases that provide information on human enzymes at the genetic and structural levels. Databases and databanks of genes, proteins and single amino acid replacements, and a database for studying associations of genetic polymorphism with diseases, are considered. In addition, theoretical methods for predicting the effects of single replacements on protein structure and function are considered. An approach is developed that applies the entropic portrait of a family of homologous proteins to the detection of conserved and variable parts of the polypeptide chain that are important for structure and catalysis. Using several of the most physiologically important enzymes as examples (acetyl- and butyrylcholinesterase, paraoxonase, carboxylesterase, alcohol dehydrogenase, alkaline phosphatase, protein phosphatase, angiotensin-converting enzyme, cyclooxygenase, catalase, peroxidase, superoxide dismutase), the manifestation of molecular polymorphism of these biomacromolecules in three-dimensional structures, in entropic images and in particular functions of the organism is considered.
ABBREVIATIONS
SNP: single nucleotide polymorphism
PON: paraoxonase
HDL: high density lipoproteins
DNA: deoxyribonucleic acid
RNA: ribonucleic acid
ACHE: acetylcholinesterase
BCHE: butyrylcholinesterase
CE: carboxylesterase
ADH: alcohol dehydrogenase
ALPL: alkaline phosphatase
ACE: angiotensin-converting enzyme
COX: cyclooxygenase
MPO: myeloperoxidase
EPX: eosinophilic peroxidase
TPO: thyroid peroxidase
SOD: superoxide dismutase
OPC: organophosphoric compounds
INTRODUCTION
Investigation of human molecular polymorphism is one of the most important and socially valuable modern post-genomic projects. The concentrated efforts of the world scientific community culminated in an amazing achievement: the decoding of the structure of genomic DNA, the informational molecule that defines the structure and functioning of biomacromolecules and molecular machines in the human organism. One of the expected, but nevertheless extremely important, results of the project is the demonstration that mankind has one genome, yet every particular individual has different genes. The multiformity of the structures of each particular gene at the DNA level, which has already arisen and is still arising, and which is inherited and passed on from generation to generation, has led to a huge accumulation of differences between individuals at the molecular level, that is, to human molecular polymorphism. Understanding these molecular features of every individual has many important consequences. Within the multiformity of biomacromolecules, enzymes occupy a special place.
1. Enzymes form the framework of metabolism. Although they are rather few in the human genome (about 3,000 among a total of about 30,000 identified genes), enzymes determine the rates, directions and stationary concentrations of all chemical reagents that participate in the functioning of the organism. Therefore, changes in the structure of any enzyme or in the level of its expression may contribute significantly to the behavior of the whole system. This is especially important if the enzyme is rate-limiting in a complex sequence of transformations along a metabolic pathway. Structural variations may affect many properties of the enzyme: catalytic activity, stability, membrane activity, and transport and targeting within the cell. All these properties are fundamentally important and may significantly affect processes in the cell. Genetic and proteomic modification of an enzyme is a modification of metabolism.
2. At present, enzymes are the best-studied class of proteins. For a wide variety of enzymes, three-dimensional structures at atomic resolution have been obtained, active sites have been identified, and well-founded concepts of the molecular mechanisms of catalysis have been formed [1]. The structural antinomy of enzymatic catalysis is that the multiformity of enzymes with different primary structures is served by a rather limited number of catalytic sites. Apparently, nature found a set of atomic combinations with catalytic activity, learned to form their three-dimensional structure from polypeptide chains with various amino acid sequences, and came to use them in various enzymes and organisms. The observed structural convergence of proteins to a limited number of active sites creates quite favourable prerequisites for solving a general problem: obtaining comprehensive information about all the enzyme catalytic sites that exist in nature.
3. The information and computation technologies developed to date make it possible to raise the question of reconstructing the full three-dimensional structure of a protein from knowledge of its primary structure. In the case of enzymes, this procedure is essentially unambiguous by virtue of the structural antinomy of enzymatic catalysis and the commonality of active site structures in enzymes from different sources. The methods developed allow reconstruction of protein structure by homology. This is a fundamentally important stage in the study of enzymes, because in modern science the basic structural information is presented by primary sequences, mostly owing to the development of genomic investigations. Thus, at present the question may be raised of the structural reconstruction of all human enzymes from genomic data, with detection and analysis of possible molecular polymorphisms.
4. Modern chemical enzymology has developed an extensive and reliable set of methods for measuring the catalytic activity of enzymes of all classes. This provides a basis for studying the polymorphism of enzymes at the post-translational level through their functional activity. It is fundamentally important and interesting to investigate the molecular polymorphism of enzymes at the two end points of molecular expression of information: at the level of the gene and at the level of its end product. Many interesting observations and surprising findings may be expected.
5. The study of the molecular polymorphism of enzymes will make possible a molecular interpretation of the physiological features of the organism. It is known that the principal and best-known genetic diseases, such as Følling's disease (phenylketonuria), are connected with the polymorphism of enzymes. It is evident that finer distinctions in enzyme structure define the physiological features of a given individual, such as physical and mental capacity and predisposition to particular diseases. It seems fundamentally important to search for correlations along the chain: gene structure, enzyme activity, physiological response. The totality of such data will form a fundamentally new basis for molecular physiology.
6. Enzymes are targets for a great number of modern pharmaceuticals. The study of the structural polymorphism of enzymes lays the groundwork for understanding the mechanisms of action of a particular pharmaceutical on different individuals. Understanding these mechanisms is a necessary stage in the formation of individualized medicine and a way to create improved pharmaceuticals.
PHYSICAL AND STRUCTURAL CHEMISTRY OF ENZYMES: FROM THE PRIMARY STRUCTURE TO ELEMENTARY ACTS OF CATALYSIS
Nearly 200 years have passed since 1814, when K.S. Kirchhoff, a natural scientist from Saint Petersburg, discovered that a biological substance can accelerate a chemical reaction. Since then, the understanding of biological (enzymatic) catalysis has evolved considerably, and today it is an area that is deeply understood at the fundamental level and widely applied in various fields of human activity. In 1836, Jöns Jakob Berzelius, one of the founders of modern chemistry and the father of catalysis, wrote: "In plants and animals thousands of catalytic processes between tissues and fluids proceed, implementing a great many chemical syntheses from a single primary material" [2, 3]. Hence, from the very beginning of the investigations, it was clear that enzymatic catalysis demonstrates a number of outstanding properties:
1. Catalysis by enzymes is, as a rule, 10^12 to 10^15 times more effective than catalysis by "classical" chemical catalysts such as the hydrogen ion.
2. Protein molecules, the material carriers of enzymatic activity, are able to "recognize" reagent molecules and carry out reactions selectively with molecules of a certain structure.
The latter property is extremely important in biological systems because it provides a directed flow of chemical transformations in complex multicomponent biological mixtures. The past 20–30 years have seen historic advances in the understanding of enzymatic catalysis and the development of multiple applications of enzymes.
These achievements rest on the colossal amount of information about protein structure, progress in studying the kinetics and mechanisms of enzyme-catalyzed reactions, the creation of methods for manipulating protein structure by modifying the corresponding gene, and the use of modern computational and information methods. The last point deserves emphasis: the creation, development and active use of methods for storing and processing large volumes of information, together with high-volume and high-speed computation, have ensured success in this field of science and technology. The comprehensive study of enzymes and their application has enabled the origin, establishment, development and success of entire fields of modern natural science. An enzyme is a precise, high-performance tool for carrying out complex reactions that require accuracy in the cleavage and synthesis of particular chemical bonds. Genetic engineering, which provides opportunities for the directed genetic modification of organisms and for obtaining proteins with useful properties at the required level, is based on the ability of enzymes to hydrolyze selectively
particular bonds in nucleic acid molecules and synthesize, if necessary, phosphodiester bonds and link separate DNA and RNA chains.
Chemical Kinetics and Structural Investigations – The Basis for Notions About the Mechanisms of Enzymatic Catalysis
Significant progress has now been made in understanding the mechanisms of enzyme operation. Systematic and wide-ranging investigation of enzymes has made a systematic approach to this natural phenomenon possible. Modern concepts of the mechanism of enzyme action rest on two fundamentally important lines of research:
1. Investigation of kinetic schemes of the process, with identification of the points at which substrates enter the catalytic cycle, detection and determination of the chemical nature of labile intermediate compounds, and analysis of the various states of the active site.
2. Study of the structure of enzymes and their active sites.
The creation of experimentally grounded concepts of catalytic mechanisms is based on the use of structural data and on the study of reaction kinetics with identification of the intermediate compounds participating in the mechanism. Formal kinetic analysis of enzyme-catalyzed reactions has been the subject of intensive investigation in recent decades. Its basic principles and results are presented in a series of monographs, textbooks and manuals [4-8]. This work has resulted in a comprehensive analysis of various kinetic schemes that connect reaction rates with the concentrations of the participants in the catalytic process (substrates, active sites, intermediates) and describe how the catalytic process develops in time. In most cases, in the study of both stationary and non-stationary processes, the analysis is considerably simplified by the structural homogeneity of the enzyme's active sites and by the linearity, with respect to enzyme concentration, of the main equations describing the elementary stages of the catalytic process.
This makes the formal kinetic description of the reaction easier and allows an adequate comparison of theoretical equations with experimental data. The conclusion of formal kinetic analysis is that the catalytic cycle of enzyme action is a multistage chemical and physical process involving a set of labile intermediates, which include both chemical intermediates and conformers of catalytic groups, substrates and intermediates. Formal kinetic analysis of various kinetic schemes, together with appropriate experimental study in both stationary and non-stationary modes, has yielded detailed information about the number and nature of intermediates and the kinetic characteristics of their formation and transformation [1].
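The linearity in enzyme concentration mentioned above can be illustrated with the simplest textbook kinetic scheme. The sketch below assumes a single-intermediate Michaelis-Menten scheme, which is far simpler than the multistage cycles the chapter describes; the function name and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def michaelis_menten_rate(e_total, substrate, k_cat, k_m):
    """Steady-state rate v = k_cat * [E]0 * [S] / (K_M + [S]).

    Assumes the simplest scheme E + S <-> ES -> E + P with a single
    intermediate; real catalytic cycles have more stages.
    """
    if e_total < 0 or substrate < 0:
        raise ValueError("concentrations must be non-negative")
    return k_cat * e_total * substrate / (k_m + substrate)


# Hypothetical parameters chosen only for illustration.
K_CAT = 100.0  # turnover number, 1/s
K_M = 1e-4     # Michaelis constant, mol/L
S = 5e-4       # substrate concentration, mol/L

v_single = michaelis_menten_rate(1e-9, S, K_CAT, K_M)
v_double = michaelis_menten_rate(2e-9, S, K_CAT, K_M)

# Doubling the total enzyme concentration doubles the rate (linearity in [E]0).
print(math.isclose(v_double, 2 * v_single))
```

Because the rate is proportional to the total enzyme concentration at any fixed substrate level, measured rates can be normalized per unit of enzyme before theory and experiment are compared.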
Bioinformatics in the Study of Catalytic Site Structures
For the majority of enzymes, the active site, which is a collection of functional amino acid side groups, can be divided quite reliably into two fundamentally important functional areas: sorption and catalytic. The sorption area forms the complex between substrate and enzyme and is responsible for substrate selection and the specificity of the enzyme. The catalytic area, which as a rule comprises acids and bases, metal ions and prosthetic groups, performs the key stages of substrate activation and chemical transformation. The focus on protein structures over the past decade allows one to state with some confidence that a significant part of the basic catalytic sites existing in nature has now been identified and studied. This confidence is based on the extensive experience of chemical enzymology, which draws on a large array of structural data and reveals related catalytic site structures in many enzymes [1]. Hence, enzymes of various classes may contain identical or almost identical catalytic site structures. At present, the primary structures of more than a million proteins are known, including proteins encoded by the genomes of man, animals, plants and bacteria. The database of three-dimensional structures includes over 30,000 structures, 5,000 of which are basic structures. A significant part of these proteins are enzymes. Given that the enzyme classification comprises over 3,730 entries, the statement that the majority of catalytic structures are already known finds confirmation. Bioinformatics methods are highly significant for the study of enzymes and the mechanisms of their action [9-15]. The ability to compare the structures of many proteins using modern information technologies makes it possible to determine their common, functionally important elements, in particular the active sites of enzymes.
This approach is based on a natural phenomenon: the totality of proteins with catalytic activity, which is nearly infinite in structural variety, has catalytic sites formed by a rather limited number of structures. Using computer technology, we have developed two methods for identifying the functional groups that compose the active site of an enzyme. The structure of the catalytic site may be determined from data on the primary amino acid sequence of the protein [11, 12]. Alternatively, given the three-dimensional structure of the protein at atomic resolution, a catalytic site can also be detected even if it has not been identified before [15]. From the primary amino acid sequence, information technologies allow the catalytic groups of the enzyme to be determined. This approach is based on comparing the primary structures of proteins known to carry a given enzymatic activity and detecting common, so-called "conservative", positions that contain the same amino acid. Modern computer databases contain significant information on the structure of proteins and enzymes, covering both the primary amino acid sequence of the polypeptide chain and atomic-resolution protein structures. The largest volumes of information concern primary structures, that is, the amino acid sequences of proteins. Genome investigations have supplied this field with giant volumes of information, which continue to increase exponentially. It is known that the primary sequence defines all the structural hierarchies of the protein. The following scientific problem is obvious: how can we predict the complete three-dimensional structure of a protein, including the structure of its active site, based on the
knowledge of the sequence of amino acids? At present, at least two approaches are used to solve this task. The first approach proceeds from theoretical calculation of the energy of the protein globule, searching for structures with minimal energy [16]. It requires a large amount of computation. Moreover, in most cases the task may not have a unique solution, although progress in this field may be expected to lead to practically valuable results. The second approach is much simpler: it builds a structure by homology. It relies on the fact that many proteins, especially proteins from different sources that perform the same or a similar function, are rather similar and have a high degree of homology. Since the complete three-dimensional structure is known for many proteins, these structures can be used to build the structure of a homologue, which significantly simplifies the calculations. To detect the amino acids forming the active site of an enzyme, and the structure-forming amino acids that "unite" the side chains of the amino acids in the catalytic site, an approach based on comparing amino acid sequences of homologous proteins from a broad family (see below) was used. The procedure of comparing primary structures of proteins with a high degree of homology (30% or higher) is called "alignment". If any protein from the selection is taken as the base, the other homologous proteins can be "strung" onto it by computer in order to compare each position across the whole selection under study. The alignment algorithm automatically accounts for "insertions" (the presence of an additional sequence not common to the set) and "deletions" (skipped chain fragments). Alignment results in a matrix of probabilities of occurrence of each amino acid at each position of the polypeptide chain selected as the base.
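The matrix of occurrence probabilities described above can be illustrated with a minimal sketch. It assumes the alignment has already been computed and is supplied as equal-length strings with "-" marking gaps; the function names, the toy sequences and the 0.9 threshold are hypothetical choices for illustration and are not taken from the authors' software.

```python
from collections import Counter

def position_frequencies(aligned):
    """For each alignment column, return {residue: relative frequency}, ignoring gaps."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    matrix = []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]  # '-' marks an insertion/deletion gap
        total = len(residues)
        counts = Counter(residues)
        matrix.append({res: n / total for res, n in counts.items()} if total else {})
    return matrix

def conserved_positions(matrix, threshold=0.9):
    """0-based indices of columns where a single residue reaches the threshold frequency."""
    return [i for i, freqs in enumerate(matrix)
            if freqs and max(freqs.values()) >= threshold]

# Toy alignment of four hypothetical sequences (not real enzymes).
toy = ["GDSGG", "GDSAG", "GDS-G", "ADSGG"]
matrix = position_frequencies(toy)
print(conserved_positions(matrix))
```

Columns in which one residue dominates are the candidates for the conservative positions discussed in the text; a real analysis would use a family of a hundred or more homologues.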
This approach allows detection of positions at which an amino acid is preserved with high probability, the so-called conservative positions typical of all proteins in the selection. The distribution of functions among conservative amino acids is fairly clear. The carboxylic groups of aspartic and glutamic acids, the imidazole group of histidine and the guanidine group of arginine form three-dimensional catalytic structures that perform coordinated nucleophilic-electrophilic catalysis. The elementary act of catalysis is associated with a relatively small proton transfer along the hydrogen bond line by 1-1.5 Å, or with bond polarization by an electrophilic agent. As a result, the reactivity of the substrate molecule or active site group increases by a factor of $10^{6}$-$10^{7}$. For example, proton transfer from water, which brings the reagent structure in the transition state closer to the hydroxyl ion, increases its reactivity in a nucleophilic substitution reaction by a factor of $10^{7}$. If the process proceeds in a concerted manner and each of the substrates is activated in this way, a total acceleration of the reaction by a factor of $10^{12}$-$10^{15}$ may be expected. Catalytically important amino acids are spread over the whole polypeptide chain of the protein. During folding and formation of the three-dimensional structure, the catalytically important functional groups are brought together in space, forming the three-dimensional structure of the catalytic site. Reliable folding of the polypeptide chain is provided by structure-forming amino acids (glycine, proline and cysteine) at the appropriate positions. Hence, catalytically important amino acids may be located in a different order along the polypeptide chain.
S. D. Varfolomeev , I. N. Kurochkin and I. A. Gariev
The main structural paradox of enzymatic catalysis is that the enormous number of enzymatic reactions, carried out by a virtually infinite number of proteins differing in primary structure, relies on a rather limited number of catalytic sites. For example, the largest class of enzymes, the hydrolases, accounts for about one third of the presently known enzymes (about 1,100 of the 3,700 enzymes in the classification). Each enzyme in turn has a biological variety, since in biological systems these proteins are represented by a virtually infinite set of amino acid sequence variants. However, there are only five basic catalytic sites that activate the molecules and carry out the catalytic cycle [13]. The variety of reactions (about 1,100 for hydrolases) is associated with the action of the sorption site, which forms the substrate complex and orients the reactive site of the substrate relative to the catalytic groups of the protein. Thus, catalytic sites of enzymes are unique structures formed during evolution, apparently as a result of sorting through many intermediate structural variants, and genetically fixed as indexed positions of catalytically important amino acids and of structure-forming amino acids that bring the functional groups of the catalytic amino acids to the required place in space with the required steric fit. The structural unity of catalytic site composition also appears when the active sites of enzymes that are not hydrolases are compared with those of hydrolases. Analysis shows that the active sites of enzymes of other classes are composed of the same nucleophilic-electrophilic elements as the active sites of hydrolases. Reactive sites of molecules are activated by the same nucleophilic-electrophilic mechanisms, frequently using the same three-dimensional structures.
The structural unity of enzyme active sites becomes even more obvious when the active sites of hydrolases are compared with the corresponding structures of synthetases, which form bonds and do not use a water molecule as a reagent. The transition from hydrolytic to synthetic reactions is achieved by replacing activation of water with activation of, for instance, a carbohydrate hydroxyl group, with full preservation of the active site structure [12]. The identification of catalytic and structure-forming amino acids from the primary sequence of the polypeptide chain described above, together with the structural unity of the active sites of enzymes of different classes, has many consequences. It is evident that catalytically active proteins may be composed of a set of amino acids much smaller than the twenty existing in nature. Most positions in the polypeptide chain may be substituted by other amino acids without significant change in the catalytic function of the protein. What matters is the presence of certain critical amino acids in particular positions: either residues with acid-base properties that form the catalytic site, or structure-forming amino acids that gather the catalytic groups in a particular region of space. The traditional methods for modifying enzyme properties are site-directed mutagenesis and directed evolution. The results of the analysis show that free variation of amino acids is possible only in positions that are not conservative. Otherwise, modification of the enzyme is very likely to cause loss of its enzymatic activity.
Active Sites of Enzymes: Geometrical Invariants and Observed Values of Parameters
At present, structural studies of enzymes have provided investigators in the field of catalysis with a broad set of atomic-resolution structures. The PDB structural database currently holds more than 30,000 structures, and the amount of data grows by 20% annually. Hence, building structures by combinatorial methods with regard to homology is becoming a separate approach to obtaining three-dimensional structures of proteins and enzymes. This makes it possible to pose and solve general problems that shape our notions of catalysis as a natural phenomenon. The task addressed here was to design an automated procedure that detects active sites as a configuration of atoms of definite structure, together with the associated problem of estimating the permissible variations of interatomic distances and bond angles under which the catalytic function is conserved. In other words, the question of interest is how "strict" the structural requirements on the configuration of atoms are for the catalytic function to be implemented. The answer may be obtained by statistical comparison of the structures of enzymes that possess the same active site but differ in source, specificity and structure of the protein molecule. Serine hydrolases, whose catalytic site includes a serine-histidine-aspartic acid triad, were selected as the model object. Enzymes of this class are the best studied in terms of catalytic mechanism, specificity and structural organization of the active site [6, 17]. In the PDB database, these enzymes are represented by 1,284 proteins. Of these, 1,256 structures were obtained by X-ray structural analysis and 28 by proton magnetic resonance.
The primary analysis shows that enzymes performing catalysis with the help of the Ser-His-Asp triad may differ considerably in the configuration of the atoms forming the catalytic site. Statistical analysis of the variations observed in functioning enzymes was performed, on the one hand, to detect the limits of these variations and, on the other hand, to create an automatic computerized procedure for identifying catalytic sites of this type from three-dimensional protein structures. The latter implies designing a computer "pattern" with permissible parameter values, which would allow catalytic sites to be detected within the complex configuration of atoms of a protein molecule. For the design of the computer "pattern", the simple procedure of comparing structures by root-mean-square deviation of atoms proved unacceptable, because it does not take into account the multitude of specific structural differences associated with the location of atoms in space. A procedure for detecting local identity of atoms in protein structures, based on geometrical invariants, has been developed. The geometrical invariants were the distances between the key atoms, planar angles between three atoms belonging to two or three catalytic groups, and planar angles formed by two vectors, each built on atoms of the same catalytic residue. Data on enzymes with reliably known structures served as the "training" set, and the minimal and maximal values of each parameter were taken as the limits.
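The idea of a "pattern" built from geometrical invariants with minimum and maximum limits can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: only two distances and one planar angle are used, the atom labels (ser_O, his_N1, his_N2, asp_O) are hypothetical, and the coordinates are invented toy values in ångströms.

```python
import math

def angle(a, b, c):
    """Planar angle in degrees at vertex b formed by the points a-b-c."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def invariants(site):
    """Geometrical invariants of one triad; keys of `site` are hypothetical atom labels."""
    return {
        "d_ser_his": math.dist(site["ser_O"], site["his_N1"]),
        "d_his_asp": math.dist(site["his_N2"], site["asp_O"]),
        "ang_ser_his_asp": angle(site["ser_O"], site["his_N1"], site["asp_O"]),
    }

def learn_bounds(training_sites):
    """Take the minimal and maximal value of every invariant over the training set."""
    rows = [invariants(s) for s in training_sites]
    return {k: (min(r[k] for r in rows), max(r[k] for r in rows)) for k in rows[0]}

def matches(site, bounds):
    """True if every invariant of the candidate lies within the learned limits."""
    values = invariants(site)
    return all(lo <= values[k] <= hi for k, (lo, hi) in bounds.items())

# Invented toy coordinates, for illustration only.
training = [
    {"ser_O": (0, 0, 0), "his_N1": (3.0, 0, 0), "his_N2": (5.0, 0, 0), "asp_O": (5.0, 3.0, 0)},
    {"ser_O": (0, 0, 0), "his_N1": (2.8, 0, 0), "his_N2": (4.8, 0, 0), "asp_O": (4.8, 2.6, 0)},
]
bounds = learn_bounds(training)
candidate = {"ser_O": (0, 0, 0), "his_N1": (2.9, 0, 0), "his_N2": (4.9, 0, 0), "asp_O": (4.9, 2.8, 0)}
distorted = {"ser_O": (0, 0, 0), "his_N1": (6.0, 0, 0), "his_N2": (8.0, 0, 0), "asp_O": (8.0, 3.0, 0)}
print(matches(candidate, bounds), matches(distorted, bounds))
```

Because distances and angles do not depend on the orientation of the molecule, such a check needs no structural superposition, which is the practical advantage over comparison by root-mean-square deviation.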
This analysis provides fundamentally important information on the structural features of enzyme active sites, yields the permissible values of the geometrical invariants, enables a procedure for automatic identification of active sites in a protein molecule, and reveals important features of the structural reorganization of the active site in the catalytic cycle.
Molecular Mechanics of Protein in the Study of Active Sites
Recently, new opportunities have emerged in the study of the molecular mechanisms of enzymatic catalysis. They are associated with the use of molecular mechanics and quantum chemistry methods to calculate the behavior of the protein molecule during the catalytic act. The development of effective programs describing proteins within classical mechanics has given investigators the opportunity to calculate the energy properties of a protein molecule and the behavior of protein chains under various conditions [18-21]. Of principal interest are calculations of protein interaction with low-molecular-weight ligands (substrates, inhibitors), i.e., the study of the so-called docking process, as well as computer mutagenesis, the calculation of changes in the protein molecule upon replacement of one amino acid by another. Computer mutagenesis is fundamentally important for the rational design of proteins with directed changes in their properties. Molecular mechanics and computer mutagenesis methods have helped to study the behavior of enzyme active sites upon replacement of conservative and nonconservative glycines by other amino acids. It has been shown that replacement of conservative glycines by alanines seriously disturbs the geometry of active sites [22].
Quantum-Chemical Calculations of Elementary Acts of the Catalytic Cycle
A qualitative advance in understanding the elementary acts of enzymatic reactions is associated with the use of quantum mechanics and quantum chemistry methods to describe the individual acts of the catalytic cycle. Fairly reliable quantum-chemical calculations became possible because a catalytic site can be singled out in the protein molecule and its behavior described by quantum chemistry methods, while the rest of the protein molecule is described by molecular mechanics methods (the QM-MM approximation) [23-25]. The methods developed allow the calculation of multidimensional potential energy surfaces, the study and identification of stationary points (global and local minima, transition states), and the calculation of energy profiles of chemical reactions. All this forms a new level of understanding of the molecular mechanism of the reaction. Quantum-chemical calculations have made it possible to solve many problems connected with a detailed understanding of the mechanisms of enzymatic reactions. For example, since Blow et al. determined the three-dimensional structure of α-chymotrypsin, it has been commonly assumed that the hydroxyl group of serine in the Ser-His-Asp triad is activated by a double proton transfer: serine hydroxyl group → histidine imidazole group → deprotonated carboxylic group of aspartic acid. This mechanism based on the seemingly obvious proton transfer from the positively charged imidazole ion to the negatively charged
carboxylic group of aspartic acid was adopted to explain the action of the catalytic triad (see above) in serine hydrolases. However, quantum-chemical calculations of energy profiles performed within the QM-MM approximation demonstrated that, when the environment of the carboxylic group is taken into account, the activation of the serine hydroxyl involves the transfer of a single proton. The role of the carboxylic group of aspartic acid is to orient the imidazole group in the active site space [25]. At present, fairly well-grounded ideas have been developed on the nature of enzyme specificity, i.e., the ability of enzymes to "recognize" the structure of an effective substrate or inhibitor. The basic physicochemical model holds that the interaction of the substrate or inhibitor with the sorption area of the active site decreases the free energy of activation at the limiting stage of the process. The mechanism by which the free energy of activation decreases owing to strain on the reacting bond has been validated experimentally. Substrate selection through conformational changes of the active site induced by interaction of a "good", i.e., specific, substrate with the active site groups (the so-called "induced fit") has also been validated theoretically and experimentally [1]. The "induced fit" mechanism was suggested by Koshland [26] and has now been confirmed by quantum-chemical calculations [27]. For serine hydrolases, the O-N distance in the free enzyme is systematically 0.1-0.2 Å longer than in the enzyme-substrate (enzyme-inhibitor) complex. This suggests that complex formation between the substrate and the active site causes a change in the latter that moves the system along the reaction coordinate. As shown by quantum-chemical calculations, the energy required for the O-N proton transfer in the enzyme-substrate complex is approximately 3 kcal/mol higher than in the free enzyme.
Since the proton transfer is a component of the activation barrier (the total barrier height is 9.6 kcal/mol), the data obtained show that the substrate causes structural changes in the active site that significantly affect the course of the enzymatic reaction (the reaction rate constant increases approximately 150-fold). Quantum-chemical calculations demonstrate the high "sensitivity" of the catalytic function of the enzyme to relatively small changes in the active site geometry. This may be important for structural polymorphism of the enzyme. A replacement that does not affect the active site directly and is located far from the catalytic locus can change the distances between the groups important for catalysis and, therefore, lead to catalytic polymorphism. One feature of the current state of research in the physical and structural chemistry of enzymatic catalysis should be emphasized. A systematic approach to investigating and understanding the origin of enzyme catalysis has now taken shape. Knowledge of the primary structure, i.e., the sequence of amino acids in the polypeptide chain, provides sufficient information to proceed through the successive hierarchical levels of understanding the process, down to the elementary acts at the catalytic site. Modern information methods make it possible to detect the functional groups forming the catalytic site and the key structure-forming amino acids. If homologues of a particular protein with atomic-resolution structures are available, fairly reliable information on the three-dimensional structure of the protein under study can be obtained. This allows the active site geometry to be determined. Subsequent application of molecular mechanics and
quantum chemistry methods to calculating the interaction between the catalytic groups and the substrate provides information on the energy of the elementary acts for the various possible reaction mechanisms and participants of the catalytic process and, therefore, gives an idea of the molecular mechanism of catalysis and the physical nature of the acceleration of a chemical reaction by an enzyme.
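The sensitivity of the rate constant to small energy changes, noted above in connection with the 3 kcal/mol and 150-fold figures, can be checked with a short calculation. The sketch below assumes simple exponential (Arrhenius-type) scaling of the rate constant with barrier height and a temperature of 298.15 K; both are assumptions made here for illustration and are not stated in the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def rate_ratio(delta_barrier_kcal, temperature_k=298.15):
    """Factor by which a rate constant changes when the barrier changes by
    delta_barrier_kcal (kcal/mol), assuming k ~ exp(-E / RT)."""
    return math.exp(delta_barrier_kcal / (R_KCAL * temperature_k))

# A 3 kcal/mol change in barrier height near room temperature.
print(round(rate_ratio(3.0)))
```

Under these assumptions a change of 3 kcal/mol corresponds to a factor somewhat above one hundred, the same order of magnitude as the approximately 150-fold change quoted for the rate constant.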
BANKS AND DATABASES GIVING INFORMATION ON HUMAN ENZYMES AT GENETIC AND STRUCTURAL LEVELS
Two unrelated persons differ, on average, at about one position per 1,300 nucleotide pairs [28]. The human genome comprises $3.2 \times 10^{9}$ nucleotide pairs, so two persons differ, on average, at about $2.5 \times 10^{6}$ positions. Calculating the total number of variations that may be observed in the human population is a more complicated task. The rarer a mutation is, the smaller the chance of detecting it by comparing the genomes of two individuals. An accurate estimate of the total number of SNPs in the human genome requires, firstly, data on the distribution of mutation frequencies and, secondly, a chosen lower bound on the frequency of the variations to be counted. By convention, only mutations with a minor allele frequency (MAF) of 1% or higher are considered SNPs [29]. Using estimates of the distribution of mutation frequencies, one can show that the human population has about 11 million SNPs [30].
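The arithmetic behind these estimates, and the 1% threshold on minor allele frequency, can be written out as a short sketch. The function names and the allele counts in the usage lines are hypothetical; the genome size and the figure of one difference per 1,300 pairs are those given in the text.

```python
GENOME_SIZE_BP = 3.2e9        # nucleotide pairs in the human genome (from the text)
BP_PER_DIFFERENCE = 1300      # one difference per ~1,300 pairs between unrelated persons

# Expected number of positions at which two unrelated persons differ.
expected_differences = GENOME_SIZE_BP / BP_PER_DIFFERENCE

def minor_allele_frequency(allele_counts):
    """Frequency of the rarest allele at one site, from {allele: count}."""
    total = sum(allele_counts.values())
    return min(allele_counts.values()) / total

def is_snp(allele_counts, threshold=0.01):
    """Apply the conventional rule: a variation counts as a SNP if MAF >= 1%."""
    return len(allele_counts) >= 2 and minor_allele_frequency(allele_counts) >= threshold

print(f"{expected_differences:.2e}")
print(is_snp({"A": 980, "G": 20}), is_snp({"A": 999, "G": 1}))
```

The first figure reproduces the estimate of roughly 2.5 million differing positions; the second line shows that a variant seen in 2% of sampled chromosomes passes the threshold, whereas one seen in 0.1% does not.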
BANKS AND DATABASES OF GENES, PROTEINS AND SNP
GenBank1 [31] is one of the primary and best-known data banks providing information on nucleotide sequences. It was established to give convenient access to experimental sequencing data. Owing to this data bank, modern publications list only identification codes (GenBank Accession Numbers), and the nucleotide sequences themselves can be retrieved by the code. Stored sequences vary greatly in length: they may represent the result of a single sequencing run or the genome of an entire organism. For example, the complete DNA sequence of E. coli is accessible under identification code AE014075. GenBank is maintained by NCBI (National Center for Biotechnology Information) at NIH (National Institutes of Health, USA). Since the mid-1990s, three leading data banks, GenBank, EMBL (European Molecular Biology Laboratory) and DDBJ (DNA Data Bank of Japan), have united their efforts and arranged data exchange. Therefore, at present, the content of these banks is identical.
Genome Database2 [32]. The results of sequencing performed within the Human Genome Project are stored in GenBank, which, by definition, does not analyze the data. The results are annotated in a separate project, Genome Database, maintained at the same NCBI center. Annotation of a sequence includes identification of genes and description of their exon-intron structure, links to databases of protein and RNA sequences, description of local features (the presence of regions enriched with repeats, the presence of unique STS sequences (Sequence-Tagged Sites)), references to homologous sequences of other organisms, and much more. There are analogous projects: Genome Browser3 (University of California, Santa Cruz) and Ensembl4 of the European Bioinformatics Institute.
1 http://www.ncbi.nlm.nih.gov/Genbank/
2 http://www.ncbi.nlm.nih.gov/Genomes/
dbSNP5 [33]. During the Human Genome Project, the DNA of several volunteers was sequenced [34]. Therefore, nucleotide sequences from the Genome database do not belong to any particular person; they are a generalized consensus for a group of people. The differences between individuals found during sequencing, first of all SNPs, formed a new database, dbSNP, which is also maintained by NCBI. The database contains records of two types: variations determined experimentally by a particular research group, and variations that combine the submissions of several experimental groups (because different groups may describe the same mutation independently). Identification codes of the first type begin with the letters ss (submitted SNP), and those of the second type with the letters rs (reference SNP). For every variation, its location in the chromosome and its flanking sequences (25 nucleotide pairs or longer each) are given. It is indicated whether the mutation lies in an intergenic interval or in an intron or exon of a gene, and whether it leads to a change in the amino acid sequence of the protein. If studies involving many people were performed, data on the frequency of occurrence of every allele may be presented. At present (November 2006), the database contains data on 11,961,761 human reference SNPs, and it is still growing. Besides dbSNP, which is the largest general-purpose database, there are other projects, for example HGBase6 [40] and JSNP7 [41]; the latter is dedicated to detecting SNPs in drug metabolism genes observed among Japanese citizens.
SwissProt8 [35] is the largest database on proteins.
The year 2006 marks the twentieth year of the work of the Swiss Institute of Bioinformatics on annotating proteins on the basis of data from the literature. At present, the project is carried on jointly by the Swiss Institute of Bioinformatics and the European Bioinformatics Institute, and its name has been changed to UniProt. For each protein described in the database, the name and synonyms, the gene name, the source organism, and the full amino acid sequence with its annotation (post-translational modification sites, disulfide bonds, signal peptides, etc.) are given. Biological activity is shown for proteins, and EC numbers, required cofactors, catalytic site residues and the catalyzed reaction are shown for enzymes. Finally, cross-references to other databases (over 70) are provided, including GenBank and Protein Data Bank (if the three-dimensional structure of the protein is known). For more than 2,000 human proteins, data on the diseases with which these proteins are associated and references to the OMIM database are given (see below). Since summer 2006, data on mutations that lead to replacement of a single amino acid residue have been shown. At present, over 28,500 variations are listed, half of which are associated with diseases or predispositions; for ~30% of the variations, references to dbSNP are given. Each mutation has an annotation page with references to the literature, its location in the sequence and in the protein structure (if known), and the conservation of the position at which the mutation is observed across the set of homologous proteins.
3 http://genome.ucsc.edu/
4 http://www.ensembl.org/
5 http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=snp
6 http://hgbase.cgr.ki.se/
7 http://snp.ims.u-tokyo.ac.jp/
8 http://www.expasy.org/sprot/
Protein Data Bank9 [36] is the main repository of experimentally obtained three-dimensional structures of proteins and nucleic acids, usually determined by X-ray diffraction and NMR spectroscopy. By November 2006, the bank included 36,642 structures for 15,130 unique proteins (several structures may be obtained for the same protein, and all of them are stored in the bank). On the basis of PDB, the Structure/MMDB (Molecular Modelling DataBase)10 was created at NCBI. The records in this database may differ from PDB in the numbering of amino acid residues. Note that, to show the position of an SNP in the three-dimensional structure of a protein, the dbSNP database refers directly to the Structure database.
MutDB11 [43]. Finally, let us mention a small but convenient web service for SNP visualization. Software located on this server allows online viewing of three-dimensional structures of proteins from PDB with the mutations taken from the SwissProt and dbSNP databases marked on them.
DATABASES FOR THE STUDY OF ASSOCIATIONS OF GENETIC POLYMORPHISM AND DISEASES
As already mentioned, the number of possible SNPs is about $11 \times 10^{6}$. Determining the SNPs associated with a particular disease is a highly time-consuming task. In many cases, the set of candidate SNPs may be reduced if the disease is known to be linked to a particular part of a chromosome or, better still, to a definite gene.
OMIM12 (Online Mendelian Inheritance in Man) [42] is the oldest database (maintained since the 1960s) associating hereditary diseases with genes. The database is compiled manually from data published in the medical literature, many of which were obtained by studying the genealogy of families whose members developed the disease. Since the database was created before the discovery of SNPs, as a rule it shows no data on the correlation of diseases with individual SNPs. Moreover, many genetic diseases described in the database are caused not by SNPs but, for example, by the loss of chromosome fragments or by mutations detected in only a few patients (by the definition of SNP, the frequency of the minor allele must be at least 1%). Database records are descriptive texts and are not suitable for automated analysis. Despite these disadvantages, the database contains ample information and bibliographical references, and many studies detecting SNPs associated with diseases have used it as a starting point.
Modern databases associating diseases with individual SNPs also exist, for example GAD13 (Genetic Association Database) [44] and HGMD14 (Human Gene Mutation Database) [12]. These databases are built from literature data, which are manually brought to a consistent form. The GAD database gives the names of the gene and the disease, references to the publications from which the data were taken, and the statistical confidence of the correlation and the sample size (if they are mentioned in the publication). References to many databases, including PubMed, dbSNP and HapMap, are also provided.
At present, the database contains information on 2,850 genes and 5,633 diseases/phenotypic characteristics. Since summer 2006, the HGMD database has required user registration and is available to non-academic users for a fee.
9 http://www.pdb.org
10 http://www.ncbi.nlm.nih.gov/Structure/
11 http://www.mutdb.org/
12 http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
13 http://geneticassociationdb.nih.gov/
14 http://www.hgmd.org/
HapMap (Haplotype Map) [37]. For many diseases there is evidence of a hereditary predisposition, but no causal relation to any single gene has been established. Therefore, the only way to detect SNPs associated with such diseases is to analyze all possible SNPs (a genome-wide association study). Fortunately, the scope of work may be significantly reduced in this case, too. For many pairs of SNPs a correlation is observed, i.e., the presence of a given allele at one SNP implies, with high probability, the presence of a particular allele at the other SNP. Therefore, the number of SNP sets (haplotypes) observed in the population is much smaller than the number of all possible combinations. A special international project was dedicated to detecting the possible haplotypes, and its results are presented in the HapMap database. It was found that, to account for 95% of all SNPs, it is enough to check about 700,000 SNP markers. If the study is limited to a single population (Central European or Asian, for example), the number of SNPs to be checked is reduced to 300,000. For example, the expression levels of more than 40 genes were analyzed (the RNA quantity was determined in 300 persons) [38]. For more than half of the genes, SNPs were detected that correlated, with statistical significance, with the gene expression level. Note that correlation between SNPs complicates the interpretation of an association between an SNP and the phenotype. For example, it may turn out that a variation described in the literature does not by itself cause a significant change in protein properties such as stability or catalytic properties. It merely occurs together with another variation, which is the true cause of the phenotype change (altering stability or the RNA expression level, for example).
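The correlation between pairs of SNPs mentioned above is commonly quantified by the squared correlation coefficient r² between alleles on the same chromosome. The text does not name a specific measure, so the choice of r² here is an assumption, as are the function name and the toy haplotypes; alleles are coded 0/1 for illustration.

```python
def r_squared(haplotypes):
    """Squared allelic correlation between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) pairs, alleles coded 0 or 1,
    one pair per sampled chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n           # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n           # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom

# Toy samples: fully correlated SNPs versus independent SNPs.
linked = [(1, 1)] * 5 + [(0, 0)] * 5
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(r_squared(linked), r_squared(independent))
```

When r² is close to 1, typing one SNP of the pair is enough to infer the other, which is why a reduced set of marker SNPs can stand in for the full set.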
Theoretical Forecasting Methods for SNP Effect on the Structure and Function of Protein
Given the large number of possible SNPs, it would be extremely useful to have methods for evaluating the functional significance of SNPs in order to reduce the number of candidate SNPs requiring experimental verification. Since the factors affecting the RNA expression level or stability are not yet clear enough, most investigators focus on predicting the effect of the relatively small group of nsSNPs (SNPs leading to replacement of amino acid residues) on the functional characteristics of proteins. Three groups of methods can be distinguished: methods that consider the conservation and amino acid composition of the position at which the replacement occurred across the set of homologous proteins; methods based on analysis of changes in the three-dimensional structure of the protein; and combined methods, which frequently use software trained on test data (machine-learning methods). The authors of many of the methods described below have analyzed polymorphism data and published the results as their own computer databases. Methods that analyze amino acid sequences are based on the assumption that an amino acid replacement at a position that is conservative across homologous proteins of different organisms will be more significant (and unfavorable) than a replacement at a nonconservative position. The methods of different authors differ in how homologous proteins are detected, how multiple alignments are built and which statistical functions are used to quantify the conservation of a position. For nonconservative positions, the amino acid composition is also frequently taken into account:
for instance, a mutation producing an amino acid residue that occurs at this position in some homologue is assumed to be less significant than a mutation producing a residue absent from this position in all homologous proteins. The best-known methods are SIFT15 (Sorting Intolerant From Tolerant) [45, 46], PolyPhen16 (Polymorphism Phenotyping) [47, 48] and Panther17 [22]. The authors of the latter work not only developed a computational method but also checked its predictions experimentally on cultured human cells with 17 variants of the ABCA1 gene; for 16 of them the theoretical and experimental data coincided. The general disadvantage of methods based on analysis of the three-dimensional structure of proteins is that this structure must be available. On the other hand, these methods make it possible not only to find potentially unfavorable mutations but also to explain their cause: disruption of the packing of the hydrophobic core of the protein and loss of stability, changes in protein-protein contact sites, etc. Such methods rely on an expert set of rules for identifying potentially unfavorable mutations (examples of the rules: replacement of a small amino acid residue inside the protein globule by a bulky one; replacement of a cysteine participating in a disulfide bond by another residue; loss of a hydrogen bond, etc.). SNPs in proteins with known three-dimensional structure are then checked against these rules. This approach was used to build the SNPs3D18 [50, 51] and LS-SNP19 [52] databases. Analysis of mutations for which the literature reports an association with diseases indicated that about 83% of the mutations reduce protein stability, whereas only 5% affect residues participating in ligand binding or catalysis [50]. Finally, there are methods that use both structural data and conservation data to predict the role of a mutation [53, 54].
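The kind of expert rules listed above, together with the sequence-based rule about residues seen in homologues, can be sketched as a small rule checker. This is an illustration only and does not reproduce SIFT, PolyPhen, SNPs3D or LS-SNP: the grouping of residues into "small" and "bulky", the function name and the input flags (buried, in_disulfide, seen_in_homologues) are all assumptions introduced here.

```python
# Illustrative residue groupings (one-letter codes); an assumption, not a published scale.
SMALL = {"G", "A", "S"}
BULKY = {"F", "W", "Y", "R", "K", "M", "L", "I", "H"}

def flag_mutation(wild, mutant, buried=False, in_disulfide=False, seen_in_homologues=None):
    """Return the list of rules that mark a substitution as potentially unfavorable.

    wild, mutant        one-letter residue codes before and after the replacement
    buried              True if the position lies inside the protein globule
    in_disulfide        True if the wild-type cysteine forms a disulfide bond
    seen_in_homologues  set of residues observed at this position in homologues,
                        or None if no alignment is available
    """
    reasons = []
    if buried and wild in SMALL and mutant in BULKY:
        reasons.append("small buried residue replaced by a bulky one")
    if wild == "C" and in_disulfide and mutant != "C":
        reasons.append("loss of a disulfide-forming cysteine")
    if seen_in_homologues is not None and mutant not in seen_in_homologues:
        reasons.append("residue not observed at this position in homologues")
    return reasons

print(flag_mutation("G", "W", buried=True))
print(flag_mutation("C", "S", in_disulfide=True, seen_in_homologues={"C"}))
print(flag_mutation("A", "S", seen_in_homologues={"A", "S"}))
```

An empty list means that none of the rules fired; it does not mean the substitution is known to be neutral.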
An interesting group is formed by methods that use sets of mutations (neutral and experimentally confirmed unfavorable ones) to train machine classifiers, such as SVM (support vector machine) [55, 56], decision trees [57] and random forest [58]. The advantage of training samples composed of actual data is independence from a human-made set of rules. Moreover, it becomes possible to find out which factors are the most important for discriminating between neutral and unfavorable mutations. For instance, it has been shown [59] that a set of 32 considered criteria may be reduced to only two criteria without loss of prediction accuracy. One structural parameter (the solvent-accessible area of the amino acid residue) and one of the position conservatism parameters based on multiple alignment of superfamily proteins were found to be the most important.

Entropic image of a family of homologous proteins: a way to detect conservative (important for structure and catalysis) and variable parts of the polypeptide chain.

When discussing the structure of active sites of enzymes, two structural components of the active site should be distinguished:
1. the sorption subsite responsible for binding, fixation and orientation of substrates; the properties of this site define the specificity of the enzyme;
2. the catalytic subsite carrying out chemical transformation of substrate molecules and usually using general acid-base catalysis for this purpose.

15 http://blocks.fhcrc.org/~pauline/
16 http://www.bork.embl-heidelberg.de/PolyPhen/
17 http://www.pantherdb.org/tools/csnpScoreForm.jsp
18 http://www.SNPS3D.org
19 http://www.salilab.org/LS-SNP
Human Enzymes – Genetic, Proteomic and Catalytic Polymorphism
One may expect that within one superfamily the sorption subsite responsible for enzyme specificity is represented by a broad variation of protein structure matching the variation of substrate structures. At the same time, catalytic sites, the number of types of which is rather limited, seem to be conservative elements of the structure. To confirm this suggestion, a bioinformatic approach based on comparison of the amino acid sequences of proteins of one large family was used [10, 11]. The results of sequence alignment for several large families of enzymes represented in the HSSP database²⁰ were analyzed. Enzyme families were selected based on the following criteria:
1. the number of analyzed family representatives must exceed 100, which provides statistical confidence of the results;
2. families of enzymes from different classes (oxidoreductases, hydrolases, isomerases, etc.) shall be selected for analysis;
3. if possible, the selected enzymes shall have active site structures and catalysis mechanisms determined with a high degree of confidence.
Usually an alignment is presented as large tables obtained by superimposing a protein sequence on a sequence taken as the base. Conservative elements of the sequence are determined visually, by comparison. Obviously, this method becomes uninformative and extremely unsatisfactory if more than three to five proteins are compared. It may be automated by characterizing quantitatively the conservatism of each amino acid position in the sequence. One quantitative criterion of the position conservatism of each amino acid in the protein sequence may be a statistical criterion in the form of the Shannon entropy. Note that in information theory, the Shannon entropy is one of the most important functions. This function was introduced as a measure of the uncertainty that characterizes any event with a definite probability. Hence, information may be defined as a measure of the amount of uncertainty that is removed after receipt of a message.
Formally, the quantity of information is given by the difference between the informational entropies before and after the experiment (message receipt). The informational entropy (Shannon's entropy) seems a highly suitable function for comparison of related proteins with various amino acid sequences. The sequence alignment procedure in relation to a reference protein consists of arranging the sequences one above another, fixing the homologous parts and detecting and eliminating inserts. Thus, by comparison of a large number of proteins, the probability of occurrence of one or another amino acid in every position of the protein sequence may be calculated with adequate accuracy. This probability is determined as the relative frequency of occurrence of amino acid j in a particular position i. For each position in the protein sequence, the entropy function may be calculated over all 20 amino acids:
$$H_j = -\sum_i p_{ij} \log_2 p_{ij}.$$
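The entropy defined above can be illustrated with a minimal Python sketch. It assumes that the sum runs over the amino acid types observed at one alignment position and that gap symbols are ignored; the function names, the gap convention and the toy alignment are illustrative assumptions and are not taken from the HSSP data.

```python
import math
from collections import Counter


def column_entropy(column, gap_chars="-."):
    """Shannon entropy (in bits) of one alignment column.

    Assumption: gap symbols are skipped and probabilities are the
    relative frequencies of the remaining residues.
    """
    residues = [aa for aa in column if aa not in gap_chars]
    if not residues:
        return math.nan  # nothing observed at this position
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def entropy_profile(alignment):
    """Entropy of every column of a list of equally long aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    return [column_entropy([seq[i] for seq in alignment]) for i in range(length)]


# Hypothetical toy alignment: columns 1-2 are conserved, columns 3-4 vary.
toy_alignment = ["GDSA", "GDTA", "GDSV", "GDTL"]
profile = entropy_profile(toy_alignment)
for position, h in enumerate(profile, start=1):
    print(position, round(h, 3))
```

With this convention a fully conserved column gives zero entropy, while a column in which all 20 amino acids are equally frequent gives the maximum value, log2(20) bits.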
An important feature of this function is that its terms approach zero for events with both high ($p_{ji} \to 1$) and low ($p_{ji} \to 0$) probabilities. Therefore, from the calculated Shannon entropy, the positions in the protein sequence that are common (absolutely conservative) for a given j-th amino acid across the whole large family of proteins may be determined. This is a position in which the probability of occurrence of this amino acid approaches unity, whereas for the remaining amino acids it approaches zero. High Shannon entropies are typical of positions in the sequence with high variability of amino acids, and low entropies are typical of amino acids with conservative positions in the amino acid sequence. In the extreme case, at $p_{ji} \to 1$ (absolute conservatism), $H_i \to 0$.

20 http://www.sander.embl-heidelberg.de/hssp/

Using the Shannon entropy criterion, superfamilies of proteins were analyzed. Families unite proteins of various structures and origins. For example, the analyzed trypsin family consists of more than 1,200 proteins, including such enzymes as chymotrypsin, kallikrein, plasmin, hypostatin, neuropsin, coagulation factors IX and X, thrombocyte aggregation proteases, activator of hepatocyte growth factor, elastase, transmembrane tryptase, thrombin and many others (see the entropic portraits of enzymes below). It is of interest to consider the most conservative amino acids, for which $H_i = 0$ (or approaches zero). The analysis revealed the following regularities.
1. Amino acids that form the catalytically active site always appear as conservative elements in the alignment of amino acid sequences of enzymes. It is known that in acid proteases of the pepsin type the catalytic site includes the carboxylic groups of two aspartic acid residues, Asp32 and Asp215 [17]. In the alignment of amino acid sequences of the pepsin protein family these aspartic acids manifest themselves as conservative positions with minimal Shannon entropy.
2. When amino acid sequences of enzymes are compared, glycine and aspartic acid are most frequently observed as absolutely conservative amino acids. The result that glycine is the most conservative amino acid is somewhat unexpected.
The second is aspartic acid; together, glycine and aspartic acid give ~50% of all conservative amino acids. Amino acids were rated by the manifestation of conservatism in the studied families. For this purpose, for every amino acid the frequency of its occurrence as a conservative element ($H_j = 0$) was determined and normalized to the total number of conservative positions for all amino acids in the studied families. Figure 1 shows the rating of amino acid conservatism. As follows from the data presented, glycine, aspartic acid, cysteine, proline and histidine are the conservative amino acids occurring most frequently in the sequences, giving ~70% of all conservative positions in enzymes. Methionine and isoleucine are infrequently conservative. It is reasonable to subdivide the most conservative amino acids into two fundamentally different groups: 1. amino acids participating in elementary acts of substrate molecule activation as acids and bases (aspartic acid and histidine); 2. amino acids forming the architectonics of the active site (glycine, cysteine, proline).
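The rating procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a position is counted as conservative only when every sequence of a family carries the same residue (zero entropy, no gaps), and the two toy families are hypothetical rather than taken from the analyzed data.

```python
from collections import Counter


def conservatism_rating(families, gap_chars="-."):
    """Share of each amino acid among absolutely conservative positions.

    `families` is a list of alignments; each alignment is a list of
    equally long aligned sequences. Counts are pooled over all families
    and normalized to the total number of conservative positions.
    """
    conserved = Counter()
    for alignment in families:
        for column in zip(*alignment):
            # One residue type and no gaps: p = 1, hence zero entropy.
            if len(set(column)) == 1 and column[0] not in gap_chars:
                conserved[column[0]] += 1
    total = sum(conserved.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in conserved.most_common()}


# Hypothetical toy families (not real enzyme alignments).
family_a = ["GDSAG", "GDTAG", "GDSVG"]
family_b = ["HC-G", "HCAG", "HCLG"]
rating = conservatism_rating([family_a, family_b])
print(rating)
```

In this toy example glycine accounts for half of the six conservative positions, which mirrors the kind of ranking shown in Figure 1 without reproducing its values.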
Figure 1. Frequency f of amino acid occurrence as conservative elements in the structure of enzymes (a) and in nature (b).
The bioinformatic approach used demonstrates the outstanding role of aspartic acid and histidine in the functioning of the active sites of enzymes. The principle underlying the functioning of active sites is the coordinated action of the nucleophilic and electrophilic components of the active site, which makes high acceleration of reactions possible. It is obvious that aspartic acid is fundamentally important for these processes. The ionized form of the carboxylic group of aspartic acid is a powerful nucleophilic reagent in water molecule activation in proton transfer processes (pepsin, lysozyme, α-chymotrypsin). Aspartic acid is also of principal importance for the formation of complexes of the metals forming the active site of metal-dependent enzymes. In the protonated form, the carboxylic group of aspartic acid is a proton donor, thus performing the function of an electrophilic agent. The role of glycine in the formation and functioning of active sites is not as obvious as that of aspartic acid. Clearly, conservative glycine residues do not play a significant role in the chemical acts of molecule activation in the catalytic cycle. Having no substituents at the α-C atom, glycine lacks a pronounced chemical function. Nevertheless, glycine residues in the protein structure are of high importance. The fact that conservative glycine residues are of principal significance for enzymatic catalysis follows from experiments in which these conservative residues were replaced site-specifically by other amino acids. As a rule, this led to complete loss or a strong decrease of the enzyme activity. It is apparent that conservative glycine residues are fundamentally important for two functions.
1. Being the unique amino acid with the most energetically free rotation around the Cα-N and Cα-C bonds of the peptide chain (the φ and ψ angles by Ramachandran), glycine plays the role of a nodal point providing a possibility to change the direction of the polypeptide chain when amino acid residues are "assembled" into the active site. Thus, the presence of conservative glycine residues allows explanation of the structural paradox of enzymatic catalysis. This paradox is that absolutely identical active sites are "assembled" from various polypeptide chains. A general feature of these chains is the presence of conservative glycine residues and of factors stabilizing the "assembled" structure, for example, disulfide bonds (taking the third place in the conservatism rating, cysteine also demonstrates a high level of conservatism). It is interesting to note that conservative glycine residues in enzymes are usually strongly "reversed" by the ψ angle (rotation around the Cα-C bond of the amino acid).
2. Conservative glycines may play the role of conformational "hinges" providing mobility of the active site. This is confirmed by the fact that in many cases conservative glycine residues are detected near catalytically active groups. For hydrolases from various families, the following motifs are conservative, for example: Asp215XGly217 (pepsin); Asp170XXGly173 (thermolysin); Asp32XGly34, His63Gly64, Gly119XSer221 (subtilisin), Gly173XSer177 (trypsin); His76Gly77, Ser153XGly155, Gly175XAsp177 (lipases). In the enzymes mentioned, the Asp, Ser and His amino acids are parts of the active site structure.
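Motifs of the kind listed above (a catalytic residue separated from glycine by a fixed number of arbitrary residues X) can be located in a sequence with a regular expression. The following Python sketch is only an illustration: the helper name `find_motif` and the test sequence are hypothetical, and the positions it reports are positions in that toy sequence, not the enzyme numbering used in the text.

```python
import re


def find_motif(sequence, first, gap, second):
    """1-based positions where `first` is followed, after `gap` arbitrary
    residues, by `second` (e.g. Asp-X-Gly is first='D', gap=1, second='G').

    A zero-width lookahead is used so that overlapping hits are reported.
    """
    pattern = re.compile(
        f"(?={re.escape(first)}.{{{gap}}}{re.escape(second)})"
    )
    return [match.start() + 1 for match in pattern.finditer(sequence)]


# Hypothetical sequence in one-letter code (not a real enzyme).
toy_sequence = "AADTGSSDAAGH"

print(find_motif(toy_sequence, "D", 1, "G"))  # Asp-X-Gly
print(find_motif(toy_sequence, "D", 2, "G"))  # Asp-X-X-Gly
print(find_motif(toy_sequence, "H", 0, "G"))  # His-Gly
```

Applied to aligned members of a family, such a search would only indicate candidate "hinge" positions; whether a glycine is actually conservative still has to be judged from the entropy of its alignment column.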
It is of interest that for some active sites the values of the φ and ψ angles of the amino acids composing the catalytic site lie beyond the limits of the energetically "relaxed" ones. This follows, for example, from the Ramachandran map for the amino acids forming the active site of α-chymotrypsin (His57, Asp102, Ser195). The active site of this enzyme is conformationally strained (the φ and ψ values fall within the energetically unfavorable area). The transformation of the primary substrate into final products in enzymatic catalysis involves many intermediates with structures different from that of the primary substrate. Glycine residues of the active site may play the role of "relaxing" elements, which perform conformational adjustment of the site for the subsequent elementary act. Cysteine and proline (the fourth and the fifth positions in the amino acid conservatism rating) play an important role in forming the architecture of the active site. As is known, proline is the unique amino acid which unfolds the polypeptide chain. Apparently, the role of cysteine residues is that the required structure of the active site, composed of different parts of the polypeptide chain, frequently remote from one another, is "fixed" by the formation of a chemical bond, a disulfide bridge. For many enzymes, this completes the formation of the active site architecture. The methodology developed is fundamentally important for studying the molecular polymorphism of human enzymes. The noticeable number of single replacements observed in the structural gene elements of proteins and enzymes should be reflected in the properties of these proteins.
Molecular Polymorphism of Human Enzymes
Enzymes are the largest and the most highly specialized class of protein molecules. They are the basis for the realization of the molecular mechanisms by which genes act. Enzymes catalyze thousands of chemical reactions which, ultimately, form the cellular metabolism. In this connection, molecular polymorphism of enzymes has a significant effect on the human status, the features of human behavior, and reactions to external impacts. Taking several physiologically most important enzymes as examples, manifestations of the molecular polymorphism of these biomacromolecules at the level of three-dimensional structures, entropic portraits and separate functions of the organism will be considered.
Acetylcholinesterase
In mammals, acetylcholinesterase (ACHE, EC 3.1.1.7) is observed, besides the central nervous system, in peripheral tissues such as sympathetic and parasympathetic ganglia, parasympathetic endings of organs, motor endings of effector neurons, and sweat glands. In blood, ACHE is mostly present in the membranes of erythrocytes. Moreover, depending on the species, different quantities of ACHE may be contained in the plasma. The biological role of ACHE is associated with the regulation of cholinergic neurotransmission, as it catalyzes acetylcholine hydrolysis in the synaptic cleft.
Recently, great attention has been paid to ACHE physiological functions not related to cholinergic transmission [59, 60]. The role of ACHE in the stimulation of nerve and muscle cell proliferation has been shown. ACHE may be a marker of early cell differentiation. Recently, correlations have been determined between changes in the activity of this enzyme and the development of some widespread diseases, in particular, diseases of the cardiovascular system [61-64], atherosclerosis [65], and Parkinson's and Alzheimer's diseases [61, 66]. The human ACHE gene consists of 6 exons. As a result of alternative splicing, three different polypeptide chains are formed, which determine the set of isoforms of this enzyme (ACHE-T, ACHE-H, ACHE-R); these isoforms have the same catalytic properties and differ only in their distribution in tissues. Until 2005, four ACHE gene polymorphisms had been described, and only two of them were associated with clinical implications. The first of the clinically significant polymorphisms is located in the distal promoter of the ACHE gene, and its implications are associated with increased sensitivity to anticholinesterase preparations and, possibly, with manifestation of the Gulf War syndrome. The second polymorphism is associated with the His353Asn replacement and leads to the occurrence of an altered form of ACHE on the erythrocyte surface. The 3D ACHE structure with the position of this amino acid replacement indicated is shown in figure 2. This mutation defines the occurrence of the so-called YT-2 blood group instead of the native YT-1 one. This circumstance must be taken into account when choosing donor-recipient pairs for blood transfusion.
Figure 2. The structure of human acetylcholinesterase. The place of the His353Asn amino acid replacement resulting from a single mutation of the enzyme gene is indicated.
Table 1. Amino acid replacements resulting from single mutations in the ACHE gene

AA replacement   HBn(1)   Sn(2)   Charge       HBm(1)   Sm(2)   Charge after   HBm/HBn   Sm/Sn
(SwissProt                        in norm(3)                    mutation(3)
numbering)
Arg34Gln         -0.59    99      +            -0.91    93      0              1.54      0.94
Glu344Gly        -1.22    82      -            -0.67    64      0              0.55      0.78
Gly57Arg         -0.67    64      0            -0.59    99      +              0.88      1.55
His353Asn        -0.64    83      +            -0.92    88      0              1.44      1.06
Pro561Arg        -0.49    82      0            -0.59    99      +              1.20      1.21
Pro592Arg        -0.49    82      0            -0.59    99      +              1.20      1.21

HBn, HBm: hydrophobic property of the amino acid (AA) in norm and after mutation. Sn, Sm: probability of AA occurrence on the globule surface in norm and after mutation.
(1) Hydrophobic property in relative units on the OHM scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
(2) The probability that over 5% of the AA molecule surface contacts the solution surrounding the protein.
(3) Amino acid charge at pH 6-7.
Figure 3. Entropic image of human acetylcholinesterase. Red points indicate the places of the described amino acid replacements resulting from single mutations of the enzyme gene.
The physiological importance of ACHE combined with the small number of determined genetic polymorphisms of this enzyme led to the statement that "virtually each mutation in ACHE must be dangerous." In essence, this statement indicates that the greater part of the ACHE protein globule is important for the manifestation of its function. In 2005, a large-scale investigation of ACHE gene polymorphism in different ethnic groups was performed in Israel [67], and 13 SNPs, 10 of them new, were detected. In three cases, the presence of previously detected mutations, Pro592Arg, His353Asn and the synonymous replacement Pro477Pro, was confirmed. Three of the newly detected SNPs are nonsynonymous, i.e., lead to replacement of amino acids in the enzyme structure (Arg34Gln, Gly57Arg, Glu344Gly), four more SNPs occur in the untranslated region, and one is in intron 2; two mutations lead to synonymous replacements of amino acids. Seventeen haplotypes and 5 ethnospecific alleles have been identified. The authors of this investigation suggested a hypothesis about the expression of the detected mutations under stress and medication. In 2003, an investigation performed in Spain [68] revealed a relation between decreased ACHE activity and the Pro561Arg replacement in patients with Alzheimer's disease. Analysis of the disposition of mutations in the 3D ACHE structure shows that all amino acid replacements found are remote from the active site of the enzyme and cause no significant effect on the catalytic activity. Figure 3 shows the ACHE entropic image indicating four of the six described amino acid replacements. The determined replacements fall within high Shannon entropy values, which is an additional indication of their weak effect on the catalytic activity of the enzyme, defined by the low-entropy (conservative) amino acids that form the active site structure. Summary table 1 shows the amino acid replacements due to single mutations of the ACHE gene leading to phenotypic manifestations. Table 1 also shows the values of the hydrophobic property and the probability of occurrence of the amino acids on the protein globule surface in norm and after mutation. Table 1 shows that for ACHE all substituted amino acids are hydrophilic, as are the amino acids replacing them. Hence, upon replacement of amino acids the value of the hydrophobic property changes insignificantly (HBm/HBn ratio), as does the probability of contact of the primary and mutant amino acids with the solution surrounding the protein (Sm/Sn).
Butyrylcholinesterase
Butyrylcholinesterase (BCHE, EC 3.1.1.8) is contained in various tissues of mammals: liver, heart, vascular endothelium, nervous system and blood plasma. At present, there are many hypotheses describing the possible biological role of this enzyme in developing and mature organisms. It was shown that BCHE plays the key role in neurogenesis [69]. In mature organisms poisoned by low doses of organophosphorus compounds or carbamates, BCHE conceivably performs a protective function [70], binding some part of the toxicant that has entered the organism and thus decreasing its acute toxicity. BCHE participates in the metabolism of a wide spectrum of endogenous and exogenous substrates and in the biotransformation of xenobiotics containing an ester group. Thus, the main metabolic path for cocaine elimination from the human organism is its hydrolysis by BCHE [71]. Inactive precursors of many drugs are activated metabolically by hydrolysis in the presence of BCHE. This effect was demonstrated for the antitumor agent irinotecan [72], the antiasthmatic preparation bambuterol [73], and a series of radioprotective agents, O-acyl serotonin derivatives [74]. Of special clinical interest is the BCHE activity in the organism when the neuromuscular relaxant succinylcholine and analogous compounds hydrolyzed by this enzyme are used. A low level of BCHE activity, which may be caused by its inhibition by anticholinesterase substances, in particular during Alzheimer's disease therapy or contact with insecticides [75], or by the presence of genetic variants of BCHE in man, leads to an anomalous reaction to the application of such relaxants: the duration of their action increases, which leads to asphyxia [76-78].
Thus, the therapeutic effect of many drugs depends on the level of butyrylcholinesterase activity in the organism. The use of each particular preparation and its dosage shall be strictly individual. That is why, in the case of treatment with drugs based on carboxylic esters and at the first contact of a person with an anticholinesterase preparation, the level of butyrylcholinesterase activity in the organism should be determined, which may prevent undesirable, frequently irreversible and dangerous consequences for the organism. Besides the usual form (U), BCHE of the human blood plasma has about 9 phenotypic isoforms existing due to various genetic mutations [79]. The exact number of BCHE isoforms has not been determined, because it has not been confirmed that the newly described forms differ from the previously known ones. By now, about 20 phenotypes have been described, but only 10 of them may be identified clearly with the use of standard biochemical methods. They are: "atypical", "silent" (S), fluoride-resistant (F), as well as K (Kalow) and J (James), which are BCHE homo- and heterozygous variants [80]. Qualitatively, they are distinguished by their inhibition in the presence of fluoride ions, dibucaine (used in medicine as a local anesthetic) and (2-hydroxy-5-phenylbenzyl)-trimethylammonium bromide (Ro-02-0683) [79-81]. From the clinical point of view, the "atypical" isoform of BCHE, for which the Asp98Gly and, in some cases, Asp98His replacement is typical, is of the highest importance. These mutations are located in the peripheral anionic site of the enzyme and result in a 10-fold decrease of the binding ability for positively charged substrates. People with such a genotype are characterized by an anomalous response to injection of the short-acting relaxants succinyldicholine and mivacurium chloride, which is expressed in 2-5-hour asphyxia (apnea) after relaxant elimination (against 3-5-minute asphyxia in norm) and durable paralysis [82].
Moreover, they are more sensitive to the impact of anticholinesterase preparations, including toxic organophosphorus compounds, which causes a high risk of acute or delayed neurotoxic effects in them at contact with such compounds. For the "fluoride-resistant" BCHE isoforms of the first and the second type, the amino acid replacements Thr271Met and Gly418Val were described, respectively [59]. In Japan, two more replacements, Leu335Pro and Leu358Ile, typical of this BCHE type were detected [83]. These mutant forms of the enzyme are characterized by high fluoride and low dibucaine numbers. For this BCHE form, the anomalous response to injection of relaxants is weaker than for the "atypical" form and is manifested by 30-minute asphyxia after relaxant elimination. The "K-form" of BCHE is characterized by the Ala567Thr amino acid replacement and an associated low dibucaine number [84]. This mutation is observed in patients with Alzheimer's disease and other kinds of dementia. For BCHE in the "J-form", a mutation was detected [85] that led to the Glu525Val replacement and manifested itself in a decrease of the catalytic activity of the enzyme and atypical responses to injection of relaxants. Recently, mutations were determined that led to a change in BCHE level and stability. For example, in Japan the Tyr156Cys amino acid replacement was detected [86], which led to a significant decrease of BCHE activity. In India, a BCHE form with the Leu335Pro replacement was determined [87], which led to a very low level of the enzyme due to destabilization of its structure inducing rapid decay of the enzyme in the organism. The spatial disposition of the amino acid replacement points due to BCHE gene mutations is shown in figure 4. As for ACHE, all detected amino acid replacements are observed in the highly variable part of the entropic image of the enzyme (see figure 5). In this case, however, such transformations of the protein globule lead to noticeable changes in catalytic properties (decrease of the affinity of the anionic site for positively charged substrates) and regulatory properties (occurrence of fluoride resistance, decrease of the dibucaine number), and to destabilization of the BCHE structure as well. Analysis of the summary table of replacements (see table 2) for BCHE shows that replacements of both hydrophilic and hydrophobic AA are typical of this enzyme. Note that in 7 of 9 cases, the AA transformation upon gene mutation increases the hydrophobic property. Strong changes in the AA hydrophobic property index (by 10 times for Tyr156Cys and by 2.5 times for Leu335Pro) reduce the enzyme stability. The replacing amino acids either impart a positive charge to the protein structure or do not change it.
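The arithmetic behind these statements can be checked directly from the values in Table 2. The short Python sketch below recomputes the HBm/HBn and Sm/Sn ratios and counts the replacements that increase the hydrophobic property; the data structure and names are illustrative, and the "fold change" is taken here, as an assumption, to be the ratio of the absolute hydrophobicity values before and after mutation.

```python
# Values transcribed from Table 2: (replacement, HBn, HBm, Sn, Sm).
TABLE_2 = [
    ("Ala567Thr", -0.40, -0.28, 62, 77),
    ("Asp98Gly", -1.31, -0.67, 85, 64),
    ("Asp98His", -1.31, -0.64, 85, 83),
    ("Glu525Val", -1.22, 0.91, 82, 46),
    ("Gly418Val", -0.67, 0.91, 64, 46),
    ("Leu335Pro", 1.22, -0.49, 55, 82),
    ("Leu358Ile", 1.22, 1.25, 55, 40),
    ("Thr271Met", -0.28, 1.02, 77, 60),
    ("Tyr156Cys", 1.67, 0.17, 85, 55),
]


def summarize(rows):
    """Recompute the derived columns of the table for every replacement."""
    summary = {}
    for name, hb_n, hb_m, s_n, s_m in rows:
        summary[name] = {
            "hb_ratio": round(hb_m / hb_n, 2),            # HBm/HBn
            "s_ratio": round(s_m / s_n, 2),               # Sm/Sn
            "more_hydrophobic": hb_m > hb_n,              # index increased
            "fold_change": round(abs(hb_n) / abs(hb_m), 2),
        }
    return summary


summary = summarize(TABLE_2)
more_hydrophobic = [n for n, row in summary.items() if row["more_hydrophobic"]]

print(len(more_hydrophobic), "of", len(TABLE_2), "replacements raise hydrophobicity")
for name in ("Tyr156Cys", "Leu335Pro"):
    print(name, summary[name]["fold_change"])
```

The two replacements that lower the hydrophobic property in this calculation, Leu335Pro and Tyr156Cys, are the same ones the text associates with reduced enzyme stability.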
Figure 4. The structure of human butyrylcholinesterase. The places of the described amino acid replacements resulting from single mutations of the enzyme gene are indicated.
Table 2. Amino acid replacements resulting from single mutations in the BCHE gene

AA replacement   HBn(1)   Sn(2)   Charge       HBm(1)   Sm(2)   Charge after   HBm/HBn   Sm/Sn
(SwissProt                        in norm(3)                    mutation(3)
numbering)
Ala567Thr        -0.40    62      0            -0.28    77      0              0.70      1.24
Asp98Gly         -1.31    85      -            -0.67    64      0              0.51      0.75
Asp98His         -1.31    85      -            -0.64    83      +              0.49      0.98
Glu525Val        -1.22    82      -            0.91     46      0              -0.75     0.56
Gly418Val        -0.67    64      0            0.91     46      0              -1.36     0.72
Leu335Pro        1.22     55      0            -0.49    82      0              -0.40     1.49
Leu358Ile        1.22     55      0            1.25     40      +              1.02      0.73
Thr271Met        -0.28    77      0            1.02     60      0              -3.64     0.78
Tyr156Cys        1.67     85      0            0.17     55      0              0.10      0.65

HBn, HBm: hydrophobic property of the amino acid (AA) in norm and after mutation. Sn, Sm: probability of AA occurrence on the globule surface in norm and after mutation.
(1) Hydrophobic property in relative units on the OHM scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
(2) The probability that over 5% of the AA molecule surface contacts the solution surrounding the protein.
(3) Amino acid charge at pH 6-7.
Figure 5. Entropic image of human butyrylcholinesterase. Red points indicate the places of the described amino acid replacements resulting from single mutations of the enzyme gene.
Paraoxonases
An important group of esterases comprises paraoxonases (phosphoric ester hydrolases, organophosphate hydrolases, A-esterases, phosphate triesterases) [59]. Paraoxonase (PON1) is a member of a protein family that also includes PON2 and PON3, whose genes are clustered on the long arm of human chromosome 7 (q21.22). Paraoxonases demonstrate high homology and have ~65% identity by amino acids [88]. Mammals have three paraoxonase genes. They are highly conservative and demonstrate 79-95% identity by amino acids and 81-95% identity by nucleotides among various species [88-90], which suggests an important physiological role. PON-like proteins may be found in all species of animals, and even in fungi and bacteria. Figure 6 shows the structure of rabbit PON, which has 84% homology with human paraoxonase.
Figure 6. The structure of rabbit paraoxonase. The places of the described amino acid replacements resulting from single mutations of the enzyme gene are indicated.
PON1 is synthesized in the liver, from where it is secreted into the plasma, where it is strongly bound to high density lipoproteins (HDL) [91, 92]. Extremely high paraoxonase activity is observed in the liver and plasma of rats, over 50% of the total paraoxonase being present in the plasma [93].
Initially, PON1, the best studied member of the PON gene family, was described as an A-esterase, an organophosphate hydrolase. PON1 was named after paraoxon, the first and one of the best studied substrates. Only PON1 possesses paraoxonase activity, whereas all three PONs possess arylesterase and lactonase activity. To provide stability and manifestation of enzymatic activity, PON1 requires the presence of Ca2+ [94-96]. Recently [97], PON1 activity was studied using over 50 substrates belonging to three different classes: esters, phosphotriesters and lactones. Another enzyme related to the PON family is the diisopropyl fluorophosphatase of mammals, for which identity with senescence marker protein-30 is currently indicated. This enzyme also hydrolyzes soman and sarin, but not paraoxon [90-100]. PON1 plays the key role in OPC detoxication. Serum A-esterases may hydrolyze active metabolites (oxons) of some OPC (paraoxon, chlorpyrifos-oxon, diazinon-oxon, pirimiphos-methyl-oxon) or phosphorylic compounds (tabun, DFP, sarin, soman, dichlorvos), playing the central role in the detoxication of these compounds and in determining their toxicity. Compared with animals having high paraoxonase activity (rats and especially rabbits), animals with a low content of this enzyme (birds) are more sensitive to the action of some OPC [94, 101]. Recent data obtained in experiments on animals [95] convincingly demonstrated the main role of paraoxonases in the detoxication of thionephosphorylic OPC, whose metabolic pathway is P450/PON1. The low paraoxonase level in young animals explains, at least partly, their high sensitivity to OPC [94, 95, 102]. In a series of pathologies associated with atherosclerosis, PON1 activity is decreased. Hence, an inverse dependence between paraoxonase activity and the risk of cardiovascular diseases is observed [103-105], which testifies to the important clinical value of this enzyme.
PON1 is the protein component of high density lipoproteins which, most likely, defines their antioxidant properties [106-108]. It has been shown that decreased PON1 activity (towards paraoxon and diazoxon) is one of the risk factors for the development of some cardiovascular diseases [109]. In human populations, serum paraoxonase manifests substrate-dependent polymorphism and high variability of the PON1 level in the blood plasma among individuals. The PON1 polymorphism associated with the Gln192Arg replacement determines different catalytic activity in relation to some organophosphorus substrates [110] and increases predisposition to coronary artery diseases and mortality among women in the second part of life [111]. Moreover, an effect of this mutation on the ability to adapt and on the rate and quality of aging is indicated. It is of interest that a mutation in the neighboring position, Gln191Arg, has no significant effect on the risk of coronary artery disease development, but significantly decreases the catalytic activity of PON1 [112]. The polymorphism in position 108 (C/T) of the PON1 gene makes the main contribution to differences in the level of PON1 expression and, apparently, significantly affects PON activity in plasma. These two factors largely determine the individual sensitivity to OPC action. The Leu55Met and Met54Leu polymorphisms are associated with an increased risk of diabetes development, the effect of glucose on metabolism, and tolerance to insulin [113-117]. For PON2, the Cys311Ser polymorphism was detected, with which the risks of Alzheimer's disease and vascular dementia development are associated [118].
S. D. Varfolomeev , I. N. Kurochkin and I. A. Gariev
The positions of the identified amino acid replacements on the entropic image of paraoxonases (see Figure 7) show that all of them fall within the high-entropy zone. A summary of the detected AA replacements is given in Table 3.
Figure 7. Entropic image of rabbit paraoxonase, which has 84% homology with human paraoxonase. Red points mark the positions of amino acid replacements that result from single mutations in the gene of the human enzyme.
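The entropic images in this chapter mark each residue position by its Shannon entropy. As a minimal sketch, assuming the entropy is computed column by column over a multiple alignment of homologous sequences, the calculation could look as follows. The function names and the toy alignment are hypothetical illustrations, not the authors' procedure or paraoxonase data.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residue frequencies in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    # Sum p * log2(1/p) over the residues observed in the column.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def positional_entropy(aligned_sequences):
    """Entropy for every position of equally long, pre-aligned sequences."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    return [
        column_entropy([seq[i] for seq in aligned_sequences])
        for i in range(length)
    ]


# Hypothetical toy alignment (assumption, for illustration only).
toy_alignment = ["MAKL", "MARL", "MSKL", "MTQL"]
entropies = positional_entropy(toy_alignment)
```

In this toy case the fully conserved first and last columns have zero entropy, while the two variable columns have higher values; positions of the latter kind correspond to the high-entropy zones discussed in the text.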
Carboxylesterases
Mammalian carboxylesterases (CE) (EC 3.1.1.1) are a large group of enzymes localized in the endoplasmic reticulum and cytosol of cells of many tissues [59]. The highest carboxylesterase activity is observed in liver microsomes; rather high activity is also typical of plasma. CE is also present in the small intestine and colon, stomach, brain, monocytes and macrophages [119]. This group of enzymes catalyzes the hydrolysis of lipophilic substrates containing ester, thioester and amide bonds [120, 121]. The broad substrate specificity of CE enables the cell to process a wide spectrum of ester compounds. CE participate in the detoxication and metabolic activation of various drugs, natural toxicants and carcinogens. Many exogenous substances are CE substrates, including cocaine, capsaicin, palmitoyl-CoA, haloperidol, imidapril, salicylates, steroids, etc. [122-126]. For CE of the EST2 type, the Arg206His polymorphism, typical of Japan and classified as affecting drug metabolism in humans, was found [127].
Human Enzymes – Genetic, Proteomic and Catalytic Polymorphism
Alcohol Dehydrogenases
Mammalian alcohol dehydrogenases (ADG; alcohol:NAD+ oxidoreductase, EC 1.1.1.1) are dimers consisting of subunits with a molecular weight of about 40,000 that contain Zn2+ ions [128]. The 3D structure of ADG I (ADG2,2 allele) is shown in Figure 8.
Figure 8. Structure of alcohol dehydrogenase 1 (ADG2,2 allele). The positions of amino acid replacements that result from single mutations in the enzyme gene are indicated.
In the presence of nicotinamide adenine dinucleotide (NAD), ADG catalyzes the oxidation of alcohols and acetals to aldehydes and ketones. Every subunit has regions that participate in forming the binding sites for the substrate and the coenzyme NAD. ADG classification was originally based on differences in electrophoretic mobility. At present, six ADG classes are distinguished. The subunits forming the enzyme may be encoded by identical or different genes. For instance, human ADG I is represented by multiple isoforms, which are pairwise combinations of the three basic subunits (α, β and γ), subdivided in turn into the variants α, β1, β2, β3, γ1 and γ2 (ADG1, ADG1,2, ADG2,2, ADG3,2, ADG1,3, ADG2,3 alleles). ADG of classes II, III and IV consist of pairs of identical subunits, ππ, χχ and σσ, respectively. The isoenzyme spectrum of ADG in the liver reflects pathological changes in the organism, which is used for diagnostic purposes. The substrate specificity of the ADG classes differs significantly. Their ability to oxidize ethanol is assessed first of all: over a wide range of concentrations, this is the function of ADG I and ADG IV. ADG II makes only a very limited contribution to ethanol oxidation. Of special note is the relatively low ability of ADG I and ADG IV to oxidize methanol; ADG II and ADG III have no such activity at all. The role of ADG II and ADG III in the detoxication of alcohols is mostly associated with the oxidation of long-chain alcohols. An important special function of ADG III is the glutathione-dependent oxidation of formaldehyde. ADG substrates include many compounds involved in the synthesis of endogenous neuroregulators and hormones, as well as their catabolites: in particular, catabolites of catecholamines and serotonin, many steroid hormone metabolites, intermediates of cholesterol and bile acid synthesis, and retinol [128].
The evolution and occurrence of polymorphisms in the ADG structure have attracted a broad range of investigators, not only for fundamental reasons but also because of significant differences in ADG between races and ethnic groups, which are linked to human tolerance of ethanol [128]. Marked differences in ADG I activity were detected between Europeoids and Mongoloids (in Mongoloids, highly active isoforms are often detected). Among them is the Arg47His polymorphism of atypical ADG I (ADG2,2 allele) [129-133]. The activity of this enzyme toward ethanol is 50-100 times higher than that of the ADG I isozymes typical of Europeoids. Of course, this is only a general, statistically reliable pattern, and various isoenzyme spectra exist within individual nations and ethnic groups. On the entropic image of this isoenzyme (see Figure 9), the amino acid replacement occurs in a region of rather high Shannon entropy. The β3 polymorphism of the β subunit of human ADG I, Arg369Cys (ADG3,2 allele), weakens binding of the NAD cofactor and possibly reduces the risk of alcoholism [134]. The Gly78Stop mutation in the ADG I structure (ADG3,2 allele) increases the risk of Parkinson's disease [135]. It is noteworthy that ADG I, which shows the highest evolutionary variability in the form of new isoenzymes, possesses significant ethanol-oxidizing activity [128]. A summary of the detected amino acid replacements for various ADG I allele forms is given in Table 4.
Alkaline Phosphatase
Alkaline phosphatase (ALPL) (EC 3.1.3.1) is an enzyme of the hydrolase class that catalyzes the hydrolysis of phosphoric esters in the organism. Its function is to maintain the phosphate level required for various biochemical processes and to support phosphate transport into the cell. The enzyme consists of two identical subunits, which function alternately, and contains tightly bound Zn atoms. Its molecular weight is 80,000. The ALPL structure is shown in Figure 10. The spatial arrangement of the polypeptide chains is known, and the reaction with the substrate has been found to proceed via an enzyme phosphorylation step (footnote 1). Alkaline phosphatase is widespread in human tissues, especially in the intestinal mucosa, osteoblasts, bile duct walls in the liver, placenta and lactating mammary gland. The highest ALPL concentration is observed in bone tissue (osteoblasts), hepatocytes, renal tubule cells, intestinal mucosa and placenta. ALPL participates in processes associated with bone growth; therefore, its activity in the serum of children is higher than in adults. Bone alkaline phosphatase is produced by osteoblasts, large mononuclear cells located on the bone matrix surface at sites of intense bone formation. Apparently, owing to the extracellular location of the enzyme during calcification, a direct relation can be observed between bone disease and the appearance of the enzyme in blood serum. In children, the alkaline phosphatase level remains high until adulthood. Increased alkaline phosphatase activity accompanies rickets of any etiology, Paget's disease and bone changes related to hyperparathyroidism. The enzyme activity rises rapidly in osteosarcoma, bone metastases, myeloma and megakaryoblastoma with bone involvement. Alkaline phosphatase activity increases significantly in cholestasis. In contrast to aminotransferases, the alkaline phosphatase level remains normal or increases only slightly in viral hepatitis.
In one third of patients with jaundice and hepatic cirrhosis, increased alkaline phosphatase activity was observed. Extrahepatic biliary obstruction is accompanied by a sharp increase in enzyme activity. Increased alkaline phosphatase activity is observed in 90% of patients with primary liver cancer and with liver metastases. Its activity rises sharply in acute alcohol intoxication against a background of chronic alcoholism. It may also increase with hepatotoxic drugs (tetracycline, paracetamol, phenacetin, 6-mercaptopurine, salicylates, etc.). In the first week of illness, about half of patients with glandular fever show increased alkaline phosphatase activity. Women taking contraceptives containing estrogen and progesterone may develop cholestatic jaundice and increased alkaline phosphatase activity. Extremely high activity of this enzyme is observed in women with preeclampsia as a consequence of placental damage. Low alkaline phosphatase activity in pregnant women indicates insufficient development of the placenta. Besides the above-mentioned diseases and states, alkaline phosphatase activity increases in the following cases: increased bone tissue metabolism (during fracture healing), primary and secondary hyperparathyroidism, osteomalacia, renal rickets caused by vitamin D-resistant rickets combined with secondary hyperparathyroidism, cytomegalovirus infection in children, extrahepatic sepsis, ulcerative colitis, regional ileitis, enteric bacterial infections and thyrotoxicosis.
1 http://obi.img.ras.ru/humbio/endocrinology/emptv/x00e3bd3.htm#0014b260.htm
Figure 9. Entropic image of alcohol dehydrogenase 1. Red points mark the positions of amino acid replacements that result from single mutations in the enzyme gene.
Reduced enzyme activity is observed in hypothyroidism, scurvy, pronounced anemia and hypophosphatasia, a heritable disease caused by insufficient ALPL activity, characterized by rickets-like changes in the skeleton and urinary excretion of phosphoethanolamine, and inherited in an autosomal recessive manner. All these diseases are related to mutations of the ALPL gene, which is localized in chromosome region 1p36.1-34 and comprises 12 exons distributed over more than 50 kb (footnote 2). Considering the particular importance of ALPL in the early development of the human organism, special attention is paid to obtaining information on molecular polymorphisms of this enzyme starting from the perinatal period. An extended and updated database (184 mutations for children of different ages and adults) is available on a dedicated site (footnote 3). At the same time, the SNPs that investigators have strongly associated with unusual clinical responses should be noted. The Ala179Thr replacement increases the risk of hypophosphatasia (in this case, bone demineralization and premature loss of teeth) in children [136]. Hypophosphatasia in children is also induced by the Arg71Cys and Gly334Asp replacements [137-139]. An increased risk of tooth loss is observed for the Pro108Leu [140, 141] and Ala116Thr replacements. The Tyr263His mutation, observed in Japan [142], induces a risk of bone fragility and osteoporosis in women at menopausal age. A high frequency of the Glu191Lys mutation, related to the risk of moderate hypophosphatasia, is typical of European countries.
2 http://obi.img.ras.ru/humbio/har/00238766.htm
Figure 10. The structure of human alkaline phosphatase.
The positions of the identified amino acid replacements against the entropic image of human ALPL are shown in Figure 11. Importantly, mutations leading to a detectable decrease of ALPL activity in blood are observed in both high- and low-entropy zones. Table 5 summarizes the described amino acid replacements resulting from single ALPL gene mutations that correlate with significant phenotypic manifestations.
3 http://www.sesep.uvsq.fr/database_hypo/Mutation.html
Table 3. Amino acid replacements resulting from single mutations in PON genes

| AA replacement type, SwissProt numeration | Hydrophobic property in norm (HBn)¹ | Probability of AA occurrence on the globule surface in norm (Sn)² | AA charge in norm³ | Hydrophobic property after mutation (HBm)¹ | Probability of AA occurrence on the globule surface after mutation (Sm)² | AA charge after mutation³ | HBm/HBn | Sm/Sn |
|---|---|---|---|---|---|---|---|---|
| Met54Leu (PON1) | 1.02 | 60 | 0 | 1.22 | 55 | 0 | 1.20 | 0.92 |
| Gln192Arg (PON1) | -0.91 | 93 | 0 | -0.59 | 99 | + | 0.65 | 1.06 |
| Leu55Met (PON1) | 1.22 | 55 | 0 | 1.02 | 60 | 0 | 0.84 | 1.09 |
| Cys311Ser (PON2) | 0.17 | 55 | 0 | -0.55 | 78 | 0 | -3.24 | 1.42 |

¹ Hydrophobic property in relative units on the OHM scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
² Probability that over 5% of the AA molecule surface is in contact with the solution surrounding the protein.
³ Amino acid charge at pH 6-7.
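The last two columns of Table 3 are simple quotients of the tabulated values. A minimal sketch of that arithmetic is given below, assuming the ratios are the after-mutation value divided by the value in norm and rounded to two decimals. The class and field names are hypothetical, and the PON2 row is labelled Cys311Ser as in the running text.

```python
from dataclasses import dataclass


@dataclass
class ReplacementRow:
    """One row of Table 3; field names are illustrative, not from the source."""
    label: str
    hb_norm: float  # HBn, hydrophobic property in norm
    s_norm: float   # Sn, surface probability in norm (%)
    hb_mut: float   # HBm, hydrophobic property after mutation
    s_mut: float    # Sm, surface probability after mutation (%)

    def hb_ratio(self) -> float:
        return round(self.hb_mut / self.hb_norm, 2)

    def s_ratio(self) -> float:
        return round(self.s_mut / self.s_norm, 2)


# Values copied from Table 3.
TABLE_3 = [
    ReplacementRow("Met54Leu (PON1)", 1.02, 60, 1.22, 55),
    ReplacementRow("Gln192Arg (PON1)", -0.91, 93, -0.59, 99),
    ReplacementRow("Leu55Met (PON1)", 1.22, 55, 1.02, 60),
    ReplacementRow("Cys311Ser (PON2)", 0.17, 55, -0.55, 78),
]

hb_ratios = [row.hb_ratio() for row in TABLE_3]
s_ratios = [row.s_ratio() for row in TABLE_3]
```

A negative HBm/HBn value, as for the PON2 replacement, indicates that the residue changes from hydrophobic to hydrophilic (or the reverse), while Sm/Sn above 1 indicates a residue more likely to be exposed on the globule surface.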
Table 4. Amino acid replacements resulting from single mutations in the ADG I gene

| AA replacement type, SwissProt numeration (allele) | Hydrophobic property in norm (HBn)¹ | Probability of AA occurrence on the globule surface in norm (Sn)² | AA charge in norm³ | Hydrophobic property after mutation (HBm)¹ | Probability of AA occurrence on the globule surface after mutation (Sm)² | AA charge after mutation³ | HBm/HBn | Sm/Sn |
|---|---|---|---|---|---|---|---|---|
| Arg369Cys (ADG3,2) | -0.59 | 99 | + | 0.17 | 55 | 0 | -0.29 | 0.56 |
| Arg47His (ADG2,2) | -0.59 | 99 | + | -0.64 | 83 | + | 1.08 | 0.84 |
| Gly78Stop (ADG3,2) | -0.67 | 64 | 0 | | | | 0.00 | 0.00 |

¹ Hydrophobic property in relative units on the OHM scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
² Probability that over 5% of the AA molecule surface is in contact with the solution surrounding the protein.
³ Amino acid charge at pH 6-7.
Table 5. Amino acid replacements resulting from single mutations in the ALPL gene

| AA replacement type, SwissProt numeration | Hydrophobic property in norm (HBn)¹ | Probability of AA occurrence on the globule surface in norm (Sn)² | AA charge in norm³ | Hydrophobic property after mutation (HBm)¹ | Probability of AA occurrence on the globule surface after mutation (Sm)² | AA charge after mutation³ | HBm/HBn | Sm/Sn |
|---|---|---|---|---|---|---|---|---|
| Tyr263His | 1.67 | 85 | 0 | -0.64 | 83 | + | -0.38 | 0.98 |
| Pro108Leu | -0.49 | 82 | 0 | 1.22 | 55 | 0 | -2.49 | 0.67 |
| Arg71Cys | -0.59 | 99 | + | 0.17 | 55 | 0 | -0.29 | 0.56 |
| Glu191Lys | -1.22 | 82 | - | -0.67 | 97 | + | 0.55 | 1.18 |
| Gly334Asp | -0.67 | 64 | 0 | -1.31 | 85 | - | 1.96 | 1.33 |
| Ala179Thr | -0.4 | 62 | 0 | -0.28 | 77 | 0 | 0.70 | 1.24 |
| Ala116Thr | -0.4 | 62 | 0 | -0.28 | 77 | 0 | 0.70 | 1.24 |

¹ Hydrophobic property in relative units on the OHM scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
² Probability that over 5% of the AA molecule surface is in contact with the solution surrounding the protein.
³ Amino acid charge at pH 6-7.
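The replacement labels used in the tables (for example Arg47His or Gly78Stop) follow a regular pattern: wild-type residue, position, mutant residue or Stop. The sketch below shows one way such labels could be parsed and checked, which also catches garbled labels in which a digit has been replaced by a letter. The names `parse_replacement` and `charge_at_ph_6_7` are hypothetical, and the charge assignment simply follows the convention visible in the tables (Arg, Lys and His positive; Asp and Glu negative; all others neutral).

```python
import re
from typing import NamedTuple

AMINO_ACIDS = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}

# Charge convention as used in the tables of this chapter (assumption).
CHARGED = {"Arg": "+", "Lys": "+", "His": "+", "Asp": "-", "Glu": "-"}


class Replacement(NamedTuple):
    wild_type: str
    position: int
    mutant: str  # three-letter code or "Stop"


_LABEL = re.compile(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2}|Stop)")


def parse_replacement(label: str) -> Replacement:
    """Split a label such as 'Arg47His' into residue, position and mutant."""
    match = _LABEL.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"not a valid replacement label: {label!r}")
    wild_type, position, mutant = match.groups()
    if wild_type not in AMINO_ACIDS:
        raise ValueError(f"unknown wild-type residue: {wild_type!r}")
    if mutant != "Stop" and mutant not in AMINO_ACIDS:
        raise ValueError(f"unknown mutant residue: {mutant!r}")
    return Replacement(wild_type, int(position), mutant)


def charge_at_ph_6_7(residue: str) -> str:
    """Return '+', '-' or '0'; a stop codon has no residue, so return ''."""
    if residue == "Stop":
        return ""
    return CHARGED.get(residue, "0")


parsed = parse_replacement("Arg47His")
nonsense = parse_replacement("Gly78Stop")
```

Here `parsed` holds the residue pair and position of the ADG I replacement, and a label with a letter in the position field is rejected with a `ValueError` instead of being silently accepted.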
Figure 11. Entropic image of human alkaline phosphatase. Red points mark the positions of amino acid replacements that result from single mutations in the enzyme gene.
Protein Phosphatases
Protein phosphatases (EC 3.1.3.48) are a large group of enzymes that dephosphorylate protein substrates and affect in diverse ways the functional activity of other enzymatic systems and cell function. Dephosphorylation is as important as phosphorylation and, accordingly, protein phosphatases are integral components of the signaling systems controlled by protein kinases. In the eukaryotic cell, about 30% of proteins are phosphorylated. Thus, the reversible phosphorylation of proteins catalyzed by protein kinases and protein phosphatases regulates many intracellular processes. Since the
early 1980s, protein phosphatases have been actively studied, and by the mid-1990s about 200 intracellular and 30 receptor phosphatases had been described. It is suggested that about 100-120 catalytic phosphatase subunits and a much larger number of regulatory subunits are encoded in the mammalian genome. The combination of these subunits determines the diversity of phosphatases and testifies to the large number of targets and regulated functions. There are several alternative classifications of protein phosphatases. For instance, two large groups of these enzymes are distinguished [143-146]: 1. intracellular low-molecular phosphatases; 2. high-molecular phosphatases associated with surface receptors. Among intracellular protein phosphatases, two large classes are distinguished: serine-threonine phosphatases and tyrosine phosphatases. Serine-threonine phosphatases are in turn divided into two large classes. Phosphatases of the first class are inhibited by two thermostable and acid-resistant proteins, called inhibitor 1 and inhibitor 2 [147-149]. The first class includes phosphatase 1A (PP1), which can dephosphorylate the alpha-subunit of phosphorylase kinase (EC 2.7.1.38); this reaction is inhibited by heparin and protamine. Representatives of the second class dephosphorylate the beta-subunit of phosphorylase kinase and are insensitive to protamine and heparin [150, 151]. Phosphatases 2A, 2B and 2C belong to this class. Phosphatases 1A and 2A are sensitive to common specific inhibitors, okadaic acid and microcystin-LR, which have no effect on phosphatase 2C [152, 153]. Phosphatase PP2B (calcineurin) is inhibited by the immunosuppressive drugs cyclosporin and FK506 and is a heterodimer consisting of a catalytic subunit A (59 kDa) and a Ca2+-binding regulatory subunit B (19 kDa) [154, 155].
Besides the catalytic domain, the catalytic subunit of the enzyme comprises a calmodulin-binding site [156, 157] and a C-terminal autoinhibitory domain, whose removal leads to permanent activation of calcineurin (CaN) [157, 158]. In 1991, Monkanen et al. [159] isolated a third type of phosphatase, PP3, with a molecular mass of 36 kDa, sensitive to protamine and insensitive to heparin. Tyrosine phosphatases dephosphorylate substrates at tyrosine residues. They are represented by the first discovered phosphatase 1B and by T-cell phosphatase. Phosphatase 1B is an ion-dependent, vanadate- and molybdate-sensitive phosphatase [160]; in the literature, the name placental phosphatase is also frequently used [161, 162]. This enzyme, with a molecular mass of 37 kDa, contains 321 amino acid residues. In cells, phosphatase 1B blocks the phosphorylation of ribosomal protein S6 by S6 kinase and other insulin-induced effects, i.e., it is an antagonist of the insulin receptor tyrosine protein kinase [163]. The phosphatase 1B gene is located on the long arm of chromosome 20; the gene product is a 50 kDa protein, which is converted to the 37 kDa enzyme by proteolysis [164]. This phosphatase is localized in the endoplasmic reticulum and is anchored to the membrane by its C-terminal fragment [165]. T-cell phosphatase (TCP) has a molecular mass of 48 kDa, and its carboxy-terminal sequence (11 kDa) comprises 200 amino acid residues [166]. A TCP substrate is the intracellular protein pp34 (a serine-threonine kinase), which in the resting cell is phosphorylated at Tyr15 and whose dephosphorylation regulates the onset of mitosis [167]. TCP dephosphorylates synthetic peptides that reproduce the C-terminal phosphorylation site of pp60c-src (Tyr525) and the proposed autophosphorylation sites of pp60c-src (Tyr416) and p51frg (Tyr412), thereby possibly affecting their functional activity [168]. TCP is homologous to phosphatase
1B: 65% at the nucleotide sequence level and 74% at the amino acid level. The TCP gene is localized on chromosome 18 in man and mouse [169].
VH1-like phosphatases (of dual specificity). Dual-specificity phosphatases can dephosphorylate both tyrosine and serine/threonine residues. They are also homologous to Cdc25, a regulator of the yeast cell cycle. They activate the cyclin-dependent kinase Cdc2 (CDK1) by dephosphorylating neighboring threonine and tyrosine residues. A classification of protein phosphatases into three families, PPP, PPM and PTP phosphatases, has also been described. The PPP and PPM families include phosphoserine- and phosphothreonine-specific enzymes; PPM is the family of magnesium-activated phosphatases. PTP are phosphotyrosine-specific and dual-specificity phosphatases.
PPP phosphatases. The PPP family comprises protein phosphatases of the PP2A, PP1, PP4, PP6, PP2B, PP5 and PP7 types.
PPM phosphatases are magnesium (Mg2+)-activated phosphatases. The PPM family comprises phosphoserine- and phosphothreonine-specific enzymes.
PTP phosphatases are phosphotyrosine-specific. In contrast to serine-threonine phosphatases, which are oligomers whose subunit composition determines substrate specificity, all tyrosine phosphatases are monomeric enzymes. They are divided into two groups: transmembrane (receptor-like) and cytosolic. Transmembrane receptor-like PTP phosphatases are classified by the structure of their extracellular domains, which may consist of either extremely short or branched chains. The branched chains resemble the ligand-binding domains of adhesion molecules (fibronectin type), which have a very broad range of physiological functions. Based on this similarity, it has been suggested that the extracellular domains of phosphatases also act as receptors.
However, neither ligands for these receptors nor a signaling system associated with them have been discovered yet. Cytosolic PTP phosphatases are classified according to their domain structure. Their substrates are nuclear and cytoskeletal proteins. An important subclass consists of SHP-1 and SHP-2, which possess SH2 domains. The SH2 domain is a compact globular domain that interacts with proteins containing a phosphorylated tyrosine residue within a definite amino acid sequence. The SH2 domain comprises 100 amino acid residues. It is found in the cytoplasmic sequences of the receptors of many growth factors (in the part where the receptor is autophosphorylated at tyrosine residues), phospholipase C and GAP protein [170]. The function of this domain is, in particular, to direct the enzyme to its substrate and to facilitate the enzyme-substrate interaction. Other cytosolic phosphatases are characterized by the presence of PEST sequences (Pro-Glu/Asp-Ser/Thr) in the C-terminal half of the molecule.
Physiological functions of protein phosphatases. Serine-threonine phosphatases are antagonists of serine-threonine kinases and play a specific role in signal transduction inside the cell. For instance, 10 min after stimulation of T-lymphocytes by the alternative pathway via CD2, dephosphorylation of a cytosolic protein (19 kDa) is observed, but 2-4 h later it is phosphorylated again at serine residues [171-173]. The initial interest in tyrosine protein phosphatases was prompted by the hope that they might be antitumor agents, because transforming effects are associated with activation of tyrosine protein kinases. It has been found, however, that some phosphatases actually amplify protein kinase signals. Phosphatases 1A and 2A inhibit gene expression induced by the AP-1 (activator protein-1) transcription factor [174, 175]. They are able to inhibit the
initiation factor-2 and elongation factor-2 [176]. Under the effect of okadaic acid, an inhibitor of these phosphatases, the formation of mRNA of the jun, c-fos and fra-1 genes [174] and of the gene encoding interleukin 2 [177] increases significantly. Phosphatases 1A and 2A regulate the action of cytotoxic lymphocytes on target cells [178]. Low doses of okadaic acid intensify, and high doses inhibit, the response of cytotoxic lymphocytes. In activated T-cells, phosphatase PP2B dephosphorylates nuclear factors of activated T-cells, which leads to their transfer into the nucleus, where they interact with other transcription factors. As found in the 1990s, the phosphatase PP2B signal is sufficient, and in some cases necessary, for excessive heart growth (hypertrophy). Polymorphic forms of protein phosphatases are rather widely described. For example, the following replacements in the structure of the catalytic subunit of PP1 phosphatase are typical of Japan: Glu310Stop, Leu1098Stop, Leu1086His, Gln1062Lys, Thr330Asn, Glu674Stop and Pro1017Ala. These replacements are observed in 7% of the population and lead to an increased risk of lung cancer. The Phe229Leu, Gly639Cys and Glu275Val replacements occur in 14% of the Japanese population and increase the risk of ovarian carcinoma and of colon and stomach cancer [179]. The Arg883Ser replacement was observed in 27% of the Pima ethnic group living in Arizona, who are prone to obesity and diabetes. This mutation is associated with a reduced fasting plasma insulin concentration and a high level of insulin-mediated glucose uptake in response to insulin injection. In this ethnic group, a polymorphism of the 3'-UTR (untranslated) sequence of the PP1 phosphatase gene (frequency of occurrence 0.44) was also observed. It leads to a 10-fold decrease in the enzyme expression level and is associated with insulin resistance and type 2 diabetes [180].
The nonsynonymous replacement Asp905Tyr in the amino acid sequence of the PP1 catalytic unit correlates with changes in insulin secretion and, as a consequence, with the occurrence of insulin resistance in skeletal muscles. Moreover, a noticeable correlation between genotypes characterized by the Asp905Tyr polymorphism and the risk of Alzheimer's disease was observed [181]. Replacements in the structure of the PP2A beta-subunit that increase the risk of breast cancer (Gly90Asp) and colon cancer (Gly15Ala, Leu499Ile, Val498Glu, Val500Gly, Ser365Pro) were found [182, 183]. Table 6 summarizes the detected phenotypically significant mutations in the PP1 and PP2A genes.
Table 6. Amino acid replacements resulting from single mutations in protein phosphatase genes

| Enzyme (gene) | AA replacement type, SwissProt numeration | Hydrophobic property in norm (HBn)¹ | Probability of AA occurrence on the globule surface in norm (Sn)² | AA charge in norm³ | Hydrophobic property after mutation (HBm)¹ | Probability of AA occurrence on the globule surface after mutation (Sm)² | AA charge after mutation³ | HBm/HBn | Sm/Sn |
|---|---|---|---|---|---|---|---|---|---|
| PP1 (PPP1R3 gene) | Arg883Ser | -0.59 | 99 | + | -0.55 | 78 | 0 | 0.93 | 0.79 |
| PP1 (PPP1R3 gene) | Asp905Tyr | -1.31 | 85 | - | 1.67 | 85 | 0 | -1.27 | 1.00 |
| PP1 (PPP1R3 gene) | Gln1062Lys | -0.91 | 93 | 0 | -0.67 | 97 | + | 0.74 | 1.04 |
| PP1 (PPP1R3 gene) | Glu275Val | -1.22 | 82 | - | 0.91 | 46 | 0 | -0.75 | 0.56 |
| PP1 (PPP1R3 gene) | Glu310Stop | -1.22 | 82 | - | | | | 0.00 | 0.00 |
| PP1 (PPP1R3 gene) | Glu674Stop | -1.22 | 82 | - | | | | 0.00 | 0.00 |
| PP1 (PPP1R3 gene) | Gly639Cys | -0.67 | 64 | 0 | 0.17 | 55 | 0 | -0.25 | 0.86 |
| PP1 (PPP1R3 gene) | Leu1086His | 1.22 | 55 | 0 | -0.64 | 83 | + | -0.52 | 1.51 |
| PP1 (PPP1R3 gene) | Leu1098Stop | 1.22 | 55 | 0 | | | | 0.00 | 0.00 |
| PP1 (PPP1R3 gene) | Phe229Leu | 1.92 | 50 | 0 | 1.22 | 55 | 0 | 0.64 | 1.10 |
| PP1 (PPP1R3 gene) | Pro1017Ala | -0.49 | 82 | 0 | -0.4 | 62 | 0 | 0.82 | 0.76 |
| PP1 (PPP1R3 gene) | Thr330Asn | -0.28 | 77 | 0 | -0.92 | 88 | 0 | 3.29 | 1.14 |
| PP2A (PPP2R1B gene) | Gly15Ala | -0.67 | 64 | 0 | -0.4 | 62 | 0 | 0.60 | 0.97 |
| PP2A (PPP2R1B gene) | Gly90Asp | -0.67 | 64 | 0 | -1.31 | 85 | - | 1.96 | 1.33 |
| PP2A (PPP2R1B gene) | Leu499Ile | 1.22 | 55 | 0 | 1.25 | 40 | 0 | 1.02 | 0.73 |
| PP2A (PPP2R1B gene) | Ser365Pro | -0.55 | 78 | 0 | -0.49 | 82 | 0 | 0.89 | 1.05 |
| PP2A (PPP2R1B gene) | Val498Glu | 0.91 | 46 | 0 | -1.22 | 82 | - | -1.34 | 1.78 |
| PP2A (PPP2R1B gene) | Val500Gly | 0.91 | 46 | 0 | -0.67 | 64 | 0 | -0.74 | 1.39 |

¹ Hydrophobic property in relative units on the OHM scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
² Probability that over 5% of the AA molecule surface is in contact with the solution surrounding the protein.
³ Amino acid charge at pH 6-7.

Angiotensin-Converting Enzyme
Angiotensin-converting enzyme (ACE) (EC 3.4.15.1) is a metalloproteinase that contains a zinc atom in the active site and is activated by chloride ions [184]. Determination of the primary structure of ACE from human endothelial cells revealed high internal homology between two large domains (357 amino acid residues each) of a single polypeptide chain (1277-1278 residues) [185]. This ACE form is synthesized in all somatic cells except the testes [186, 187]. Each domain contains an active site and a zinc atom; both domains are catalytically active but not equivalent. The active sites of the domains differ in peptide hydrolysis rates, in the rate of inhibition by specific ACE inhibitors and in the profile of activation by chloride ions [188-190]. In contrast to the N-domain, the activity of the C-domain depends on chloride ion concentration. In the absence of chloride ions, the C-domain loses its activity; maximal activity is observed at 200-800 mM, depending on the substrate used. In the absence of chloride ions, the N-domain retains activity and is fully activated at a rather low concentration (10-15 mM). These differences are thought to be of considerable physiological significance. It is suggested that in the organism the N-domain may specifically hydrolyze some physiologically important substrates, such as the negative regulator of hematopoiesis, the peptide AcSDKP [191], and luliberin [190]. Conversion of the heptapeptide enkephalin precursor Tyr1-Gly2-Gly3-Phe4-Met5-Arg6-Phe7 to enkephalin is also performed mostly by the N-domain; most probably, the N-domain also plays the predominant role in the conversion of this heptapeptide to (Met5)-enkephalin in vivo [192]. At the same time, Leu5-enkephalin and Met5-enkephalin are degraded faster by the C-domain. Angiotensin I and bradykinin are hydrolyzed by both domains, angiotensin I being degraded somewhat faster by the C-domain. Specific ACE inhibitors used clinically reduce the activity of both domains [193], with somewhat different efficiency that is generally due to differences in dissociation rate. Depending on whether an inhibitor interacts predominantly with one of the two active sites, its biological effect as a drug may vary. ACE plays an important role in the regulation of blood pressure and electrolyte balance by hydrolyzing angiotensin I to angiotensin II. Angiotensin II is a potent vasopressor and aldosterone-stimulating peptide that sustains cardiovascular homeostasis. The effect of ACE on the cardiovascular system is in many ways genetically determined. A connection has been observed between ACE gene polymorphism, ACE activity in blood and tissues, and an increased risk of some cardiovascular diseases.
When the ACE gene was cloned, it was found that in intron 16 a DNA fragment (Alu repeat) of 287 base pairs is either present (Insertion, I) or absent (Deletion, D) [194, 195]. A correlation between D alleles and the ACE level in blood, lymph and tissues was observed. The serum ACE level in healthy men homozygous for the D allele (the DD genotype was observed in approx. 36% of men) was almost twice as high as in those homozygous for the I allele (II genotype, observed in about 17%) and intermediate in heterozygotes with the ID genotype (47%). The ACE level in the human heart was also associated with the gene polymorphism [196]. ACE gene D-alleles are assumed to be a risk factor for acute myocardial infarction [197], hypertension, coronary vessel spasm [198], left ventricular hypertrophy, extravasations and a high risk of atherosclerosis [199]. In patients homozygous for the D-allele, increased tone of vascular smooth muscle is observed [200]. I-alleles are associated with increased endurance of athletes under physical loads (runners, rowers, climbers) [201]. Genetic predisposition to cardiovascular diseases, including acute ischemia [202], is observed in men who, according to common criteria, have low risk factors (the usual risk factors are excessive body mass, hypercholesterolemia, lipoproteinemia, etc.) [198, 203, 204]. A study of 4,773 patients with diabetes demonstrated that ACE gene D-alleles were associated with the risk of nephropathy as a complication of the underlying disease, but not with diabetic retinopathy (2,010 patients were examined). This was observed for both achrestic and usual diabetes [205]. Based on an analysis of 145 reports, which included examinations of 5,000 patients, a group of investigators from several European countries concluded that D-alleles are associated with an increased risk of coronary vessel
disease, acute myocardial infarction, extravasation and diabetic nephropathy, especially in atherosclerotic diseases, but not with hypertension [206]. However, this did not apply to malignant hypertension, for which a connection between D-alleles and disease risk was observed [207]. In the malignant form, the ACE DD genotype was observed three times more frequently than in the benign form. The A-239-T SNP detected in the ACE gene promoter increases the risk of Alzheimer's disease and reduces the risk of acute myocardial infarction among Europeans. The occurrence of G in position 20218042 and A in position 20221743 increases the risk of Alzheimer's disease 45-fold in the Arab-Israeli group. At the same time, the A-11599-G replacement in exon 7 reduces the risk of this disease for Europeans. The simultaneous presence of the A-262-T and A-11860-G mutations in exon 17 increases the risk of elevated arterial pressure in Africans. The A-240-T replacement in the gene promoter increases the risk of breast cancer in Chinese women.
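The genotype shares quoted above for healthy men (DD about 36%, ID 47%, II about 17%) can be converted into allele frequencies and compared with Hardy-Weinberg expectations. The following is only an illustrative sketch under the assumption that these three percentages describe one sample; the function names are hypothetical and no such analysis is reported in the cited studies.

```python
def allele_frequencies(dd: float, id_: float, ii: float) -> tuple[float, float]:
    """Return (frequency of D, frequency of I) from genotype shares or counts."""
    total = dd + id_ + ii
    freq_d = (dd + id_ / 2) / total  # each heterozygote carries one D allele
    return freq_d, 1.0 - freq_d


def hardy_weinberg_expected(freq_d: float, freq_i: float) -> dict[str, float]:
    """Genotype shares expected under Hardy-Weinberg equilibrium."""
    return {
        "DD": freq_d ** 2,
        "ID": 2 * freq_d * freq_i,
        "II": freq_i ** 2,
    }


# Genotype shares as quoted in the text for healthy men.
observed = {"DD": 0.36, "ID": 0.47, "II": 0.17}

freq_d, freq_i = allele_frequencies(observed["DD"], observed["ID"], observed["II"])
expected = hardy_weinberg_expected(freq_d, freq_i)
```

With these inputs the D allele frequency comes out near 0.6 and the expected genotype shares lie close to the observed ones, so the quoted distribution is roughly what random mating would produce.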
Cyclooxygenase
Cyclooxygenase (prostaglandin endoperoxide synthase, prostaglandin-H-synthase, PG-endoperoxide synthase, PG-synthase, PES; EC 1.14.99.1) catalyzes the conversion of polyunsaturated fatty acids to PG-endoperoxide (PGH)¹, which is the general precursor of the other prostaglandins and thromboxane [208, 209]. During this conversion the enzyme performs two catalytic reactions: the cyclooxygenase reaction, in which 15-hydroperoxy-PG-endoperoxide (PGG) is formed, and the peroxidase reaction, in which PGG undergoes two-electron reduction to PGH. Both activities are associated with one protein molecule [210-212]. Cyclooxygenase is an integral membrane protein, mostly observed in microsomal membranes [213]. Studies of subcellular localization indicate that this enzyme is associated with the endoplasmic reticulum. In these cells cyclooxygenase was also detected in the nuclear fraction (nuclear membranes) and in plasma membranes [214, 215]. Cyclooxygenase is a homodimer with a subunit molecular weight of about 72 kDa [210, 211, 216]. The enzyme contains 2 to 3 oligosaccharides, Man9(GlcNAc)2 and Man6(GlcNAc)2, per protein subunit [211, 217]. The molecular weight calculated from the primary structure, without the oligosaccharide residues, equals 65.5 kDa [218-220]. Cyclooxygenase is also a hemoprotein: a hemin-subunit complex with a stoichiometric ratio of 1:1 is formed. It has been shown that the iron atom forms a complex with the protein via the His309 residue [221, 222]. The Tyr385 residue is important for the mechanism of catalysis [223-225]. Blocking of the Tyr residue by a specific agent [223] and site-directed replacement of Tyr by Phe [224] led to complete loss of the cyclooxygenase, but not the peroxidase, activity of PG synthase.
Cyclooxygenase in its two isoforms (COX-1 and COX-2) plays an important physiological role in homeostatic and compensatory-restorative processes by regulating prostaglandin synthesis from arachidonic acid. Both isoforms of the enzyme are present everywhere but are unequally distributed among organs and tissues, and they differ functionally [208, 209].
1 http://obi.img.ras.ru/humbio/endocrinology/00133206.htm
Human Enzymes – Genetic, Proteomic and Catalytic Polymorphism
COX-1 is the constitutive enzyme, permanently expressed in cells. COX-1 predominates in the mucosa of the gastrointestinal tract, where it performs a cytoprotective function; in thrombocytes, where it is linked to their aggregation properties; and in kidney cells and some other organs. The constitutive properties of COX-1 are explained by its control of the synthesis of thromboxane A2, prostaglandin E2 and prostacyclin. Insufficient COX-1 activity induces damage to the gastrointestinal mucosa and a risk of gastropathies, up to and including dangerous bleeding and perforation. Nonsteroidal anti-inflammatory drugs suppress COX-1 activity. Suppression of thrombocyte COX-1 activity, accompanied by a reduced ability of thrombocytes to aggregate, increases the risk of hemorrhage in clotting disorders and vasculitis, but is desirable in heart diseases and some other pathological states.
COX-2 is the inducible enzyme, whose expression may be triggered by inflammation mediators; it dominates in the brain, genital organs, kidneys, and the mononuclear leukocytes of blood (monocytes) and tissues (macrophages). In the kidneys, it is one of the important enzymes controlling water and sodium reabsorption and, through it, other functions. COX-2 affects blood circulation by stimulating the synthesis of the vasodilator prostacyclin I2. COX-2 has been found to promote the spread of malignant neoplasms and the development of inflammatory processes in joints and muscles.
A polymorphism of the COX-1 gene promoter, A-842-G, that leads to resistance to aspirin was found [226]. Eighteen polymorphisms of the COX-1 gene were detected, 7 of them nonsynonymous (Arg8Trp, Pro17Leu, Arg53His, Lys185Thr, Gly230Ser, Leu237Met, Lys341Arg) [227, 228]. Four nucleotide replacements and one deletion in the intron sequence were additionally identified. Phenotypic manifestations were detected for only three of the identified replacements. The Leu237Met replacement (for its spatial position in the COX-1 structure see figure 12) increases the risk of colon cancer in Caucasians [227]. At the same time, the Arg8Trp and Pro17Leu replacements in the signal peptide significantly reduce sensitivity to aspirin in European patients [228]. Five of the seven amino acid replacements that result from COX-1 gene polymorphisms are shown against the entropic image (see figure 13) of rabbit COX-1, a close homologue of human COX-1 (84% identity). The data indicate that the detected mutations may be located in areas of both low and high Shannon entropy.
The T-8473-C polymorphism in the COX-2 gene stop codon significantly increases the risk of lung cancer [229]. The Val511Ala mutation is associated with intestinal cancer [230]. Figure 14 shows the spatial position of this replacement against the structure of mouse COX-2, a close homologue of human COX-2 (88% identity). The replacement is located far from the active site, in an area of high Shannon entropy (see figure 15). A COX-2 gene polymorphism (replacement of guanine by cytosine at position 765 of the promoter) reduces COX-2 expression, and its detection allows genetic determination of the risk of acute myocardial infarction and atherothrombotic ischemic stroke [231, 232].
Table 7 summarizes the information on the cyclooxygenase replacements described. It should be noted that the phenotypically significant replacements in COX-1 are replacements by hydrophobic amino acids. In all cases, replacements by hydrophilic amino acids give no significant phenotypic effect.
Table 7. Amino acid replacement as a result of singular mutations in cyclooxygenase genes

| AA replacement type, SwissProt numeration | Hydrophobic property in norm (HBn)1 | Probability of AA occurrence on the globule surface in norm (Sn)2 | AA charge in norm3 | Hydrophobic property after mutation (HBm)1 | Probability of AA occurrence on the globule surface after mutation (Sm)2 | AA charge after mutation3 | HBm/HBn | Sm/Sn |
|---|---|---|---|---|---|---|---|---|
| COX-1 (replacements with phenotypic manifestations) | | | | | | | | |
| Arg8Trp | -0.59 | 99 | + | 0.5 | 73 | 0 | -0.85 | 0.74 |
| Leu237Met | 1.22 | 55 | 0 | 1.02 | 60 | 0 | 0.84 | 1.09 |
| Pro17Leu | -0.49 | 82 | 0 | 1.22 | 55 | 0 | -2.49 | 0.67 |
| COX-1 (replacements without phenotypic manifestations) | | | | | | | | |
| Arg53His | -0.59 | 99 | + | -0.64 | 83 | + | 1.08 | 0.84 |
| Gly230Ser | -0.67 | 64 | 0 | -0.55 | 78 | 0 | 0.82 | 1.22 |
| Lys185Thr | -0.67 | 97 | + | -0.28 | 77 | 0 | 0.42 | 0.79 |
| Lys341Arg | -0.67 | 97 | + | -0.59 | 99 | + | 0.88 | 1.02 |
| COX-2 | | | | | | | | |
| Val511Ala | 0.91 | 46 | 0 | -0.4 | 62 | 0 | -0.44 | 1.35 |

1 Hydrophobic property in relative units on the OMH scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
2 The probability that over 5% of the AA molecule surface is in contact with the solution surrounding the protein.
3 Amino acid charge at pH 6-7.
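The last two columns of Table 7 are plain ratios of the per-residue values before and after the replacement. The sketch below recomputes them in Python for the replacements with phenotypic manifestations and for the COX-2 replacement. The dictionaries hold only the values printed in the table for the residues involved, and the function name is an illustrative choice rather than anything defined in the text.

```python
# Hydrophobic property (HB, relative units) and surface-occurrence probability (S)
# for the residues that occur in the selected rows; values copied from Table 7.
HB = {"Arg": -0.59, "Trp": 0.50, "Leu": 1.22, "Met": 1.02,
      "Pro": -0.49, "Val": 0.91, "Ala": -0.40}
S = {"Arg": 99, "Trp": 73, "Leu": 55, "Met": 60,
     "Pro": 82, "Val": 46, "Ala": 62}


def replacement_ratios(name: str) -> tuple[float, float]:
    """Return (HBm/HBn, Sm/Sn) for a replacement written like 'Arg8Trp'."""
    wild_type, mutant = name[:3], name[-3:]
    hb_ratio = HB[mutant] / HB[wild_type]
    s_ratio = S[mutant] / S[wild_type]
    return round(hb_ratio, 2), round(s_ratio, 2)


for name in ("Arg8Trp", "Leu237Met", "Pro17Leu", "Val511Ala"):
    print(name, *replacement_ratios(name))
```

A negative HBm/HBn value means that the replacement changed the sign of the hydrophobic property, that is, a hydrophilic residue was replaced by a hydrophobic one or the reverse.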
Figure 12. Cyclooxygenase-1 structure with indication of Leu237Met amino acid replacement.
S. D. Varfolomeev , I. N. Kurochkin and I. A. Gariev
Figure 13. Entropic image of cyclooxygenase-1. Red points mark the positions of the described amino acid replacements that result from single mutations in the enzyme gene.
Figure 14. Cyclooxygenase-2 structure with indication of the Val511Ala amino acid replacement.
Figure 15. Entropic image of cyclooxygenase-2. The red point marks the position of the Val511Ala amino acid replacement.
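The text does not state how the entropic images were computed. One common way to obtain a per-position Shannon entropy is to take a multiple sequence alignment of homologous proteins and calculate the entropy of the residue distribution in each column: low values indicate conserved positions, high values indicate variable ones. The Python sketch below illustrates that approach only. The toy alignment is invented for the example, gaps are not treated specially, and none of this should be read as the authors' procedure.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residues found in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def entropy_profile(alignment: list[str]) -> list[float]:
    """Entropy of every column of an alignment of equal-length sequences."""
    return [column_entropy("".join(col)) for col in zip(*alignment)]


# Hypothetical toy alignment of four homologous fragments (illustration only)
toy_alignment = ["MKVLA", "MKILA", "MRVLG", "MKALS"]
profile = entropy_profile(toy_alignment)

for position, value in enumerate(profile, start=1):
    print(f"position {position}: {value:.2f} bits")
```

In this toy case the first and fourth columns are fully conserved (entropy 0), while the third and fifth columns, with three different residues among four sequences, reach 1.5 bits.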
Catalase
Catalase (EC 1.11.1.6) is an enzyme of the hydroperoxidase group; it catalyzes the redox reaction in which 2 hydrogen peroxide molecules form water and oxygen¹. Catalase is widespread in the cells of animals, plants and microorganisms; it belongs to the chromoproteins, which have oxidized heme as the prosthetic (nonprotein) group. A typical heme catalase has a large molecular mass (250-300 kDa) and extremely high catalytic activity: nearly every collision of its macromolecule with the substrate ends in substrate degradation. The four subunits of the catalase molecule are folded so that the N-terminal sequence of the polypeptide chain of each subunit passes through a loop that links the heme-containing domain of one subunit with the domain containing the helical sequence of the neighboring subunit. Like the hemoglobin family, heme-containing catalases have their own unique spatial organization, which has remained invariant during evolution. All heme catalases studied have the same packing of the polypeptide chain, which is called the catalase type of folding. The specificity of catalase toward the reducing substrate is low; therefore, catalase may catalyze not only H2O2 decomposition but also the oxidation of lower alcohols. The function of catalase is the degradation of toxic hydrogen peroxide formed during various oxidative processes in the organism. This enzyme is present in many cells (including erythrocytes and liver cells).
Polymorphisms of the catalase gene increase the risk of some diseases. For example, the replacement of G by A at position 5 of intron 4 creates a risk of acatalasemia (hereditary absence or low level of catalase in blood, leading to frequently recurring infections and gingivitis; this disease, Takahara's disease, is most widespread among Japanese) [233, 234]. A deletion polymorphism of the catalase gene has been described. It is associated with the risk of aniridia (characterized by full or partial hypoplasia of the iris accompanied by cataract, corneal opacity, glaucoma, etc.; the pathology occurs with a frequency of 1 per 64,000 to 1 per 96,000 births) [235]. An SNP at position 844 of the start codon of the catalase gene is associated with an increased risk of hypertension in Chinese [236].
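The reaction named at the start of this section, in which two hydrogen peroxide molecules form water and oxygen, can be written as a balanced equation. It is given here in LaTeX notation as a reading aid; it is the standard catalase stoichiometry and is not quoted from the source.

```latex
\[
2\,\mathrm{H_2O_2} \;\longrightarrow\; 2\,\mathrm{H_2O} + \mathrm{O_2}
\]
```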
Myeloperoxidase
Myeloperoxidase (MPO; EC 1.11.1.7) is the general name of the enzymes of the peroxidase subclass contained in blood cells of the myeloid lineage. MPO is a heme-containing protein with a molecular weight of 150 kDa². The MPO molecule consists of two heterodimers, each with heme incorporated in its structure, linked by a disulfide bond. Each heterodimer consists of a 59 kDa and a 13.5 kDa subunit linked by a disulfide bridge. Figure 16 shows the 3D structure of the enzyme.
1 http://obi.img.ras.ru/humbio/Biochem/0009552f.htm#empty/x0088854.htm
2 http://obi.img.ras.ru/humbio/har/0038a3ab.htm
Figure 16. Myeloperoxidase structure.
Hydrogen peroxide, chloride and the myeloperoxidase of neutrophils form a system that produces extremely toxic substances: hypochlorite and molecular chlorine. These substances oxidize and halogenate various components of bacteria and tumor cells, and in very high concentrations they may damage tissues. Myeloperoxidase gives pus its green color and possibly participates in the suppression of inflammation by inactivating chemoattractants and suppressing the motor activity of phagocytes. Myeloperoxidase deficiency is the most widespread disturbance of neutrophil function. The disease is inherited in an autosomal recessive manner. Its occurrence is about 1:2000. In the absence of other diseases that weaken the defenses of the organism, primarily uncompensated diabetes, myeloperoxidase deficiency is not manifested. Its activity is compensated by other antimicrobial systems of phagocytes; for example, hydrogen peroxide formation increases. The bactericidal action of neutrophils is delayed but does not disappear completely. When myeloperoxidase deficiency is combined with diabetes, resistance to infections is significantly reduced. Acquired myeloperoxidase deficiency is observed in acute myeloblastic and acute myelomonoblastic leukemia.
The following polymorphisms: Arg569Trp [237], Tyr173Cys [238], Met251Thr [239], Ala166Val [240], Leu406Trp [240], as well as deletions in exons 3 and 9 [240], lead to various forms of MPO deficiency in cells of the myeloid lineage. The positions of the identified amino acid replacements against the entropic image are shown in figure 17. The G-463-A polymorphism in the MPO gene promoter significantly increases the risk of Alzheimer's disease in men and affects hormone replacement therapy in the development of atherosclerosis [241, 242].
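The two figures given for myeloperoxidase deficiency, autosomal recessive inheritance and an occurrence of about 1:2000, allow a rough estimate of how common the deficient allele and its carriers are. The Python sketch below does this under the Hardy-Weinberg assumption of random mating, which is introduced here for illustration and is not stated in the text; the function name is likewise illustrative.

```python
import math


def recessive_carrier_estimate(affected_share: float) -> tuple[float, float]:
    """Estimate (allele frequency q, carrier share 2pq) for a recessive trait.

    Assumes Hardy-Weinberg proportions, so the affected share equals q**2.
    """
    q = math.sqrt(affected_share)
    carriers = 2 * q * (1 - q)
    return q, carriers


# Occurrence of about 1:2000, as quoted in the text
q, carriers = recessive_carrier_estimate(1 / 2000)
print(f"allele frequency q = {q:.4f}")
print(f"carrier share 2pq = {carriers:.4f}")
```

Under this assumption the deficient allele has a frequency of roughly 0.022, and about 4% of people would carry one copy of it.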
Figure 17. Entropic image of myeloperoxidase. Red points mark the positions of two of the described amino acid replacements that result from single mutations in the enzyme gene.
Eosinophilic Peroxidase
Eosinophilic peroxidase (EPX; EC 1.11.1.7) is the general name of the enzymes of the peroxidase subclass contained in eosinophils. EPX is a heme-containing dimeric protein with a total molecular mass of 70 kDa, consisting of a light (15 kDa) and a heavy (55 kDa) subunit linked to one another by a disulfide bridge. EPX participates in such pathological processes as acute respiratory distress syndrome (the main cause of mortality of patients after surgery) and the recently described autoimmune syndrome X, in which blood vessels are damaged. EPX gene mutations that lead to a deficiency of its activity have been described. These are a mutation leading to the Arg286His replacement and the occurrence of an inserted segment at the intron-exon 10 junction [243]. Figures 18 and 19 show the position of the amino acid replacement in the EPX structure and its position against the entropic image, respectively. Importantly, the replacement lies in a low-entropy (conserved) region, which should significantly affect the catalytic properties of the enzyme.
Figure 18. The structure of eosinophilic peroxidase. The positions of the described amino acid replacements that result from single mutations in the enzyme gene are indicated.
Figure 19. Entropic image of eosinophilic peroxidase. The red point marks the position of the Arg286His replacement that results from a single mutation in the enzyme gene.
Thyroid Peroxidase
Thyroid peroxidase (TPO) belongs to the class of oxidoreductases for which the electron acceptor is hydrogen peroxide. This enzyme is contained only in the thyroid gland, which gives the gland a property unique among tissues: the ability to oxidize iodide. TPO is an essential participant in the biosynthesis of thyroid hormones. Several mutations have been described for this peroxidase that lead to amino acid replacements and, as a consequence, to an increased probability of defects in the system of iodide organification. These replacements are: Ile447Phe [244], Tyr453Asp [245], Gly590Ser [245], Arg648Gln [246], Trp693Arg [247], Glu799Lys [247], as well as the Tyr/Asp and Glu/Lys substitutions in exons 9 and 14 [245].
Table 8 summarizes the information about amino acid replacements in the structure of human peroxidases that are in some way associated with phenotypic manifestations. It may be noted that for TPO, in 5 cases out of 6, the replacing amino acid is at least as likely to be in contact with the solution surrounding the protein as the amino acid in the norm (Sm/Sn of 1.00 or higher).
Table 8. Amino acid replacement as a result of singular mutations in human peroxidase genes

| AA replacement type, SwissProt numeration | Hydrophobic property in norm (HBn)1 | Probability of AA occurrence on the globule surface in norm (Sn)2 | AA charge in norm3 | Hydrophobic property after mutation (HBm)1 | Probability of AA occurrence on the globule surface after mutation (Sm)2 | AA charge after mutation3 | HBm/HBn | Sm/Sn |
|---|---|---|---|---|---|---|---|---|
| Myeloperoxidase (MPO) | | | | | | | | |
| Ala166Val | -0.4 | 62 | 0 | 0.91 | 46 | 0 | -2.28 | 0.74 |
| Arg569Trp | -0.59 | 99 | + | 0.5 | 73 | 0 | -0.85 | 0.74 |
| Leu406Trp | 1.22 | 55 | 0 | 0.5 | 73 | 0 | 0.41 | 1.33 |
| Met251Thr | 1.02 | 60 | 0 | -0.28 | 77 | 0 | -0.27 | 1.28 |
| Tyr173Cys | 1.67 | 85 | 0 | 0.17 | 55 | 0 | 0.10 | 0.65 |
| Eosinophilic peroxidase (EPX) | | | | | | | | |
| Arg286His | -0.59 | 99 | + | -0.64 | 83 | + | 1.08 | 0.84 |
| Thyroid peroxidase (TPO) | | | | | | | | |
| Arg648Gln | -0.59 | 99 | + | -0.91 | 93 | 0 | 1.54 | 0.94 |
| Glu799Lys | -1.22 | 82 | - | -0.67 | 97 | + | 0.55 | 1.18 |
| Gly590Ser | -0.67 | 64 | 0 | -0.55 | 78 | 0 | 0.82 | 1.22 |
| Ile447Phe | 1.25 | 40 | + | 1.92 | 50 | 0 | 1.54 | 1.25 |
| Trp693Arg | 0.5 | 73 | 0 | -0.59 | 99 | + | -1.18 | 1.36 |
| Tyr453Asp | 1.67 | 85 | 0 | -1.31 | 85 | - | -0.78 | 1.00 |

1 Hydrophobic property in relative units on the OMH scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
2 The probability that over 5% of the AA molecule surface is in contact with the solution surrounding the protein.
3 Amino acid charge at pH 6-7.
Superoxide Dismutase (SOD)
Superoxide dismutase (superoxide:superoxide oxidoreductase, EC 1.15.1.1) is represented by a family of metalloenzymes that catalyze the dismutation of superoxide radicals. Superoxide dismutases are the main enzymes playing the key role in the disposal of free radicals and in the protection of the cell against oxidative damage¹. The SOD activity of mammalian organs differs by tens of times. The highest Cu,Zn- and Mn-SOD activity was observed in liver. High Cu,Zn-SOD activity is observed in erythrocytes, which allows blood to be used as a source for extraction and purification of the enzyme. At present, three superoxide dismutase isoenzymes are known in man. Table 9 gives a brief characterization of these three enzymes.
The physiological function of SOD is associated with the protection of cells against free-radical damage. Under conditions of normal metabolism, superoxide dismutases maintain the concentration of superoxide radicals at a particular steady level. In the literature, special attention is paid to changes in erythrocyte SOD activity during aging of the organism and in some diseases, such as hemolytic anemia, ischemia, and some neurolytic diseases. Superoxide radicals play a significant role in the development of inflammatory processes. These investigations resulted in the use of SOD as an anti-inflammatory agent (orgotein, peroxynorm).

Table 9. Human superoxide dismutase characteristics

| Nomenclature | SOD1 | SOD2 | SOD3 |
|---|---|---|---|
| Cofactor | Cu2+, Zn2+ | Mn2+ | Cu2+, Zn2+ |
| Localization | Cytoplasm | Mitochondria | Intercellular space |
| Structure | Homodimer | Homotetramer | Homotetramer |
| Molecular mass (kDa) | 32 | 84 | 120 |
| Number of amino acids | 153 | 153 | 222 |
| Number of exons | 5 | 1 | 1 |
| Gene | 21q22.1 | 6q25 | 4p15.2 |
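The dismutation of superoxide radicals mentioned above is the reaction in which one superoxide radical is oxidized to oxygen while another is reduced to hydrogen peroxide. The overall stoichiometry, written in LaTeX notation as a reading aid (standard textbook form, not quoted from the source), is:

```latex
\[
2\,\mathrm{O_2^{\bullet -}} + 2\,\mathrm{H^+} \;\longrightarrow\; \mathrm{H_2O_2} + \mathrm{O_2}
\]
```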
SOD-1
SOD-1 possesses the highest activity among all superoxide dismutases. The activity of this enzyme is independent of the pH of the medium in the range 5-9. Figure 20 shows the 3D structure of SOD-1. The Cu-binding site of the enzyme comprises 4 His, and the Zn-binding site comprises 2 His and 1 Asp. Arg-141, which is essential for the catalytic activity, and the disulfide bridge between Cys58 and Cys160, the only one in each subunit, are conserved. Each SOD-1 subunit has a barrel-shaped structure (beta-barrel) formed by 8 antiparallel beta-strands and contains 3 protruding external loops. The dimer is a prolate ellipsoid (33 x 67 x 36 Å). About 5% of the external surface of each subunit is occupied by the contact zone. It involves the first and terminal pairs of beta-strands of the beta-barrel and regions of two loops in residue sequences 47-82 and 100-112. The beta-barrel is asymmetric: beta-strands 5 to 8 are shorter and have fewer hydrogen bonds than strands 1-4. The loops differ in size and structure. The largest loop comprises a disulfide bridge and the region of the Zn-binding site. The disulfide bond covalently links the large loop and the beginning of beta-strand 8. The second loop has a small alpha-helical region.
1 http://obi.img.ras.ru/humbio/proteins/000fc6e4.htm
Figure 20. Superoxide dismutase-1 structure. The positions of the described amino acid replacements that result from single mutations in the enzyme gene are indicated.
The distance between the Cu atoms of the two active sites equals 33.8 Å. The spatial separation of the two active sites and their apparent identity suggest that the strong dimer interaction provides structural stability of SOD rather than enzymatic function. Amino acid residues His61 and Arg141 are important for the catalytic act. Cu(II) and Zn(II) are located at the bottom of a deep narrow channel, 6.3 Å apart: Zn is fully buried, while Cu is more open and accessible to the solvent. The side chain of His-61 forms the bridge between Cu and Zn. The Cu ligands are His-44, His-46, His-61 and His-118; the Zn ligands are His-61, His-69, His-78 and Asp-81. The position of the metal-binding residues is stabilized by a network of hydrogen bonds. The molecular surface of the active site channel is formed by 18 amino acid residues.
Table 10. Amino acid replacement as a result of singular mutations in human superoxide dismutase genes

| AA replacement type, SwissProt numeration | Hydrophobic property in norm (HBn)1 | Probability of AA occurrence on the globule surface in norm (Sn)2 | AA charge in norm3 | Hydrophobic property after mutation (HBm)1 | Probability of AA occurrence on the globule surface after mutation (Sm)2 | AA charge after mutation3 | HBm/HBn | Sm/Sn |
|---|---|---|---|---|---|---|---|---|
| SOD-1 | | | | | | | | |
| Ala112Gly | -0.4 | 62 | 0 | -0.67 | 64 | 0 | 1.68 | 1.03 |
| Ala145Thr | -0.4 | 62 | 0 | -0.28 | 77 | 0 | 0.70 | 1.24 |
| Ala4Val | -0.4 | 62 | 0 | 0.91 | 46 | 0 | -2.28 | 0.74 |
| Ala95Thr | -0.4 | 62 | 0 | -0.28 | 77 | 0 | 0.70 | 1.24 |
| Arg46His | -0.59 | 99 | + | -0.64 | 83 | + | 1.08 | 0.84 |
| Asp90Ala | -1.31 | 85 | - | -0.4 | 62 | 0 | 0.31 | 0.73 |
| Asp96Asn | -1.31 | 85 | - | -0.92 | 88 | 0 | 0.70 | 1.04 |
| Cys6Phe | 0.17 | 55 | 0 | 1.92 | 50 | 0 | 11.29 | 0.91 |
| Glu100Gly | -1.22 | 82 | - | -0.67 | 64 | 0 | 0.55 | 0.78 |
| Glu21Lys | -1.22 | 82 | - | -0.67 | 97 | + | 0.55 | 1.18 |
| Gly12Arg | -0.67 | 64 | 0 | -0.59 | 99 | + | 0.88 | 1.55 |
| Gly16Ser | -0.67 | 64 | 0 | -0.55 | 78 | 0 | 0.82 | 1.22 |
| Gly37Arg | -0.67 | 64 | 0 | -0.59 | 99 | + | 0.88 | 1.55 |
| Gly41Asp | -0.67 | 64 | 0 | -1.31 | 85 | - | 1.96 | 1.33 |
| Gly41Ser | -0.67 | 64 | 0 | -0.55 | 78 | 0 | 0.82 | 1.22 |
| Gly72Ser | -0.67 | 64 | 0 | -0.55 | 78 | 0 | 0.82 | 1.22 |
| Gly85Arg | -0.67 | 64 | 0 | -0.59 | 99 | + | 0.88 | 1.55 |
| Gly93Ala | -0.67 | 64 | 0 | -0.4 | 62 | 0 | 0.60 | 0.97 |
| Gly93Arg | -0.67 | 64 | 0 | -0.59 | 99 | + | 0.88 | 1.55 |
| Gly93Cys | -0.67 | 64 | 0 | 0.17 | 55 | 0 | -0.25 | 0.86 |
| His43Arg | -0.64 | 83 | + | -0.59 | 99 | + | 0.92 | 1.19 |
| His80Ala | -0.64 | 83 | + | -0.4 | 62 | 0 | 0.63 | 0.75 |
| Ile104Phe | 1.25 | 40 | + | 1.92 | 50 | 0 | 1.54 | 1.25 |
| Ile113Thr | 1.25 | 40 | + | -0.28 | 77 | 0 | -0.22 | 1.93 |
| Ile151Thr | 1.25 | 40 | + | -0.28 | 77 | 0 | -0.22 | 1.93 |
| Leu106Val | 1.22 | 55 | 0 | 0.91 | 46 | 0 | 0.75 | 0.84 |
| Leu112Stop | 1.22 | 55 | 0 | | | | | |
| Leu144Ser | 1.22 | 55 | 0 | -0.55 | 78 | 0 | -0.45 | 1.42 |
| Leu38Val | 1.22 | 55 | 0 | 0.91 | 46 | 0 | 0.75 | 0.84 |
| Leu84Phe | 1.22 | 55 | 0 | 1.92 | 50 | 0 | 1.57 | 0.91 |
| Leu84Val | 1.22 | 55 | 0 | 0.91 | 46 | 0 | 0.75 | 0.84 |
| Phe45Cys | 1.92 | 50 | 0 | 0.17 | 55 | 0 | 0.09 | 1.10 |
| Ser134Asn | -0.55 | 78 | 0 | -0.92 | 88 | 0 | 1.67 | 1.13 |
| SOD-2 | | | | | | | | |
| Ala16Val | -0.4 | 62 | 0 | 0.91 | 46 | 0 | -2.28 | 0.74 |
| SOD-3 | | | | | | | | |
| Arg213Gly | -0.59 | 99 | + | -0.67 | 64 | 0 | 1.14 | 0.65 |

1 Hydrophobic property in relative units on the OMH scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
2 The probability that over 5% of the AA molecule surface is in contact with the solution surrounding the protein.
3 Amino acid charge at pH 6-7.
Figure 21. Entropic image of superoxide dismutase-1. Red points mark the positions of the described amino acid replacements that result from single mutations in the enzyme gene.
Investigations detected SOD-1 gene mutations in motor neuron diseases: in 14-25% of cases of familial amyotrophic lateral sclerosis [248, 249] and in 5-7% of patients with the sporadic form of amyotrophic lateral sclerosis. In total, over 60 mutations in this gene have been described to date [250]. Among them, 52 are missense mutations, in which the replacement of one nucleotide by another does not change the length of the polypeptide molecule. Among the replacements detected, 33 are nonsynonymous; they are listed in summary table 10 [249, 251-275] and shown against the SOD-1 entropic image (see figure 21). In addition, deletion or insertion mutations located in codons 126-133, as well as a splicing mutation at the 3' end of intron 4, were identified. These mutations lead to a translational frameshift and a change in polypeptide length. Analysis of the locations of the replacements on the entropic image of the enzyme indicates that replacements in both high- and low-entropy zones are typical of SOD-1. Interestingly, a significant decrease in the catalytic activity of the enzyme has not been described for all replacements in the zone of low Shannon entropy. A replacement by a negatively charged amino acid is observed in only two of the 33 cases.
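Statements about which residues are replaced most often can be checked directly against the list of replacements. The Python sketch below parses the 33 SOD-1 replacement names as they are listed in Table 10 and tallies the wild-type residue of each. The regular expression and the variable names are illustrative choices, and the tally covers SOD-1 only.

```python
import re
from collections import Counter

# The 33 SOD-1 replacements as listed in Table 10
SOD1_REPLACEMENTS = """
Ala112Gly Ala145Thr Ala4Val Ala95Thr Arg46His Asp90Ala Asp96Asn Cys6Phe
Glu100Gly Glu21Lys Gly12Arg Gly16Ser Gly37Arg Gly41Asp Gly41Ser Gly72Ser
Gly85Arg Gly93Ala Gly93Arg Gly93Cys His43Arg His80Ala Ile104Phe Ile113Thr
Ile151Thr Leu106Val Leu112Stop Leu144Ser Leu38Val Leu84Phe Leu84Val
Phe45Cys Ser134Asn
""".split()

# Wild-type residue, position, then the new residue (or 'Stop')
NAME = re.compile(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2,3})")


def parse(name: str) -> tuple[str, int, str]:
    """Split 'Gly93Ala' into ('Gly', 93, 'Ala')."""
    match = NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized replacement name: {name}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


wild_type_counts = Counter(parse(name)[0] for name in SOD1_REPLACEMENTS)

for residue, count in wild_type_counts.most_common():
    print(residue, count)
```

For this list, glycine is the most frequently replaced residue (10 of 33 entries), and asparagine does not occur as a replaced residue.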
SOD-2
Besides SOD-1, the level of free radicals in the cell is also controlled by the Mn-dependent SOD-2. Numerous investigations suggest that mitochondrial SOD-2 plays an important role in protecting the cell against stress [276, 277]. A detailed study of the SOD-2 gene in patients with both familial and sporadic amyotrophic lateral sclerosis detected no mutations associated with development of the disease. However, some investigators recently identified the Ala9Val polymorphism in the SOD-2 gene and found that the allele encoding alanine is associated with an increased risk of developing motor neuron disease [277]. This polymorphism is also of interest because it most likely affects the cellular localization of the enzyme rather than its activity. Such a mutation changes the secondary structure by disrupting the alpha-helix, whose integrity is important for transport of the enzyme from the cytoplasm to the mitochondrial matrix [278, 279]. The Ala16Val replacement decreases the processing efficiency and, as a consequence, creates a risk of idiopathic cardiomyopathy [280].
SOD-3
For SOD-3, a polymorphism has been described that leads to the amino acid replacement Arg213Gly [281]. With this mutation, a 10-fold increase in the SOD-3 content of plasma and an increased risk of ischemic heart disease are observed.
CONCLUSION
The structural and catalytic polymorphism of human enzymes has a significant effect on the metabolism of the human organism and on the risk of many diseases. Single base replacements in enzyme genes may lead to amino acid replacements in both low-entropy and high-entropy positions of the biocatalyst molecule, with replacements observed more frequently in the high-entropy positions of proteins. Frequently, polymorphic modifications of amino acids in both high- and low-entropy positions have no effect on the catalytic functions of the enzyme but are tightly associated (entangled) with the development of some diseases. Glycine is subject to polymorphic replacements more frequently than other amino acids. In the set of enzymes considered, asparagine is not subject to polymorphic replacements at all. The approaches to detecting the amino acid residues involved in the formation of the catalytic sites of an enzyme, together with the theoretical approaches to analyzing changes in the coordinates of these amino acids upon single replacements, both developed on the basis of bioinformatic methods, give hope for the creation of highly efficient methods for predicting variations of the main functions of enzymes when the structure of the genes encoding them changes.
ACKNOWLEDGMENT
The authors are thankful to V.V. Mal'gin, Head of the Laboratory of Pharmacology, IPAS RAS, and G.F. Makhaeva, Leading Scientist, IPAS RAS, for providing material for the description of the properties and physiological role of esterases (ACHE, BCHE, CE, paraoxonases). The authors sincerely thank G.V. Dubacheva, M.S. Osipova, M.V. Porus and L.G. Sokolovskaya for help in collecting information from the literature on existing polymorphisms of enzyme genes and for cooperation in the technical preparation of the manuscript.
REFERENCES
[1] Varfolomeev SD, Chemical Enzymology, 2005, M.: Publ. Centre "Akademia", 472.
[2] Soloviev YI, Kurennoi VI, Yakob Bercelius. Life and Activity. 1980, M.: Nauka, 319.
[3] Shamin AN, Biocatalysis and Biocatalysts (Historical sketch), 1971, M.: Nauka, 193.
[4] Webb L, Inhibitors of Enzymes and Metabolism, 1966, M.: Mir, 1066.
[5] Dixon M, Webb E, Enzymes, 1982, M.: Mir, 392.
[6] Berezin IV, Martinek K, Fundamentals of Physical Chemistry of the Enzymatic Catalysis, 1977, M.: Vysshaya Shkola, 280.
[7] Varfolomeev SD, Zaitsev SI, Kinetic Methods in Biochemical Surveys, 1982, M.: Publ. Moscow State Univ., 345.
[8] Varfolomeev SD, Gurevich KG, Biokinetics, 1999, M.: Fair-Press, 720.
[9] Varfolomeev SD, Pozhitkov A.E., Vestn. Mosk. Univ., Ser. 2, Khimia, 41, (2000), 147-156.
[10] Varfolomeev SD, Gurevich KG, Poroinov VV, Sobolev BN, Fomenko AE, Dokl. RAN, (2001), 379, 548-550.
[11] Varfolomeev SD, Gurevich KG, Izv. Akad. Nauk, Ser. Khim., (2001), 10, 1629-1637.
[12] Varfolomeev SD, Mendeleev Comm. 5. 2004. P. 185-189.
[13] Varfolomeev SD, Uporov IV, Gariev IV, Uspekhi Khimii, (2005), 74, 67-83.
[14] Varfolomeev SD, Chemical and Biological Kinetics. New Horizons, (2005), M.: Khimia, 2, 175-213.
[15] Gariev I.V., Varfolomeev S.D., Bioinformatics. 2006, 22, 2574-2576.
[16] Finkelstein A, Physics of Protein, (2002), M.: Nauka.
[17] Antonov VK, Chemistry of Proteolysis, (1991), M.: Nauka, 504.
[18] McKerell A.D., Wiyrkiewicz-Kuczera J., Karplus M. // J. Am. Chem. Soc. (1995). 117. 11946-11975.
[19] McKerell A.D., Bashford D., Bellot M., Dunbrack K.L., Evanseck J.D., Field M.J, Fisher S, Gao J, Guo H, Ha S, Joseph-MacCarthy D., Kuchnir L, Kuczera K, Lan F.T.K, Smith J.C, Store R, Straub J, Watanabe M, Wiyrkiewicz-Kuczera J, Yin D, Karplus M. // J. Phys. Chem. 1998. 102. P. 3586-3616.
[20] Schelenkrich M, Brickmann J, McKerell A.D, Karplus M, (1996) in: A Molecular Perspective from Computation and Experiment, Merz KM, Roux B, eds, Birkhauser, p. 31-81.
[21] Cornell W.D, Cieplak P, Bayly C.I, Gould I.R, Merz K.M, Ferguson D.M, Spellmeyer D.S, Fox T, Coldwell J.W, Kollman P.A. // J. Am. Chem. Soc. 1995. 117. 5179-5197.
[22] Varfolomeev SD, Uporov IV, Fedorov EV, Biokhimia (2002) 67, 1328-1340.
[23] Warshel A, Levitt M. // J. Mol. Biol. 103, 227. (1976).
[24] Nemukhin A.V, Grigorenko B.L, Topol I.A, Burt S.K. // J. Comp. Chem. 24. 2003. P. 1410.
[25] Nemukhin A.V, Grigorenko B.L, Rogov A.V, Topol I.A, Burt S.K. // Theor. Chem. Acc. 111, 36. (2004).
[26] Koshland D.E. // Proc. Nat. Acad. Sci. USA. 44, 98. (1958).
[27] Nemukhin A.V, Gariev I.A, Rogov A.V, Varfolomeev S.D. (2006) // Mendeleev Communications, 16, 290-292.
[28] The International SNP Map Working Group (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. // Nature, 409, 928-933.
[29] Brookes AJ (1999) Gene. 234(2), 177-86.
[30] Kruglyak L, Nickerson DA. (2001) Nat. Genet. 27, 234-236.
[31] Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2006) Nucleic Acids Res. 34, D16-20.
[32] Wolfsberg TG, Wetterstrand KA, Guyer MS, Collins FS, Baxevanis AD. (2003) Nat. Genet., 35 (Supp 1), 4.
[33] Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. (2001) Nucleic Acids Res. 29, 308-11.
[34] International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. // Nature, 409(6822), 860-921.
[35] Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS. (2005) Nucleic Acids Res. 33, D154-9.
[36] Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. (2000) Nucleic Acids Res. 28, 235-42.
[37] Altshuler D, Brooks LD, Chakravarti A, Collins FS, Daly MJ, Donnelly P; International HapMap Consortium. (2005) Nature. 437, 1299-320.
[38] Cheung VG, Spielman RS, Ewens KG, Weber TM, Morley M, Burdick JT. (2005) Nature. 437, 1365-9.
[39] Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NS, Abeysinghe S, Krawczak M, Cooper DN. (2003) Hum. Mutat. 21, 577-81.
[40] Brookes AJ, Lehvaslaiho H, Siegfried M, Boehm JG, Yuan YP, Sarkar CM, Bork P, Ortigao F. (2000) Nucleic Acids Res. 28, 356-60.
[41] Iida A, Saito S, Sekine A, Takahashi A, Kamatani N, Nakamura Y. (2006) Cancer Sci. 97, 16-24.
[42] McKusick, V.A.: Mendelian Inheritance in Man. A Catalog of Human Genes and Genetic Disorders. // Baltimore: Johns Hopkins University Press, 1998 (12th edition).
[43] Dantzer J, Moad C, Heiland R, Mooney S. (2005) Nucleic Acids Res. 33, W311-4.
[44] Becker KG, Barnes KC, Bright TJ, Wang SA. (2004) Nat. Genet. 36, 431-2.
[45] Ng PC, Henikoff S. (2001) Genome Res. 11, 863-74.
[46] Ng PC, Henikoff S. (2002) Genome Res. 12, 436-46.
[47] Ramensky V, Bork P, Sunyaev S. (2002) Nucleic Acids Res. 30, 3894-900.
[48] Sunyaev S, Kondrashov FA, Bork P, Ramensky V. (2003) Hum. Mol. Genet. 12, 3325-30.
[49] Brunham LR, Singaraja RR, Pape TD, Kejariwal A, Thomas PD, Hayden MR. (2005) PLoS Genet. 1, e83.
[50] Wang Z, Moult J. (2001) Hum. Mutat. 17, 263-70.
[51] Yue P, Melamud E, Moult J. (2006) BMC Bioinformatics. 7, 166.
[52] Karchin R, Diekhans M, Kelly L, Thomas DJ, Pieper U, Eswar N, Haussler D, Sali A. (2005) Bioinformatics. 21, 2814-20.
[53] Yue P, Moult J. (2006) J. Mol. Biol. 356, 1263-74.
[54] Sunyaev S, Ramensky V, Koch I, Lathe W 3rd, Kondrashov AS, Bork P. (2001) Hum. Mol. Genet. 10, 591.
Human Enzymes – Genetic, Proteomic and Catalytic Polymorphism [55] [56] [57] [58] [59]
[60]
[61] [62] [63] [64] [65] [66] [67] [68] [69] [70] [71] [72] [73] [74] [75] [76] [77] [78] [79] [80] [81] [82] [83]
67
Yue P, Li Z, Moult J.. (2005) J. Mol. Biol. 353, 459-73. Karchin R, Kelly L, Sali A.. (2005) Рас. Symp. Biocomput., 397-408. Krishnan VG, Westhead DR.. (2003) Bioinformatics. 19,2199-209. Bao L, Cui Y. (2005) Bioinformatics. 21, 2185-90. Sokolovskaya LG, Sigolaeva LV, Eremenko AV, Kurochkin IN, Makhaeva GF, Malyigina VV, Zyikova IE, Kholstjv VI, Zavialova NV, Varfolomeev SD, Chemical and Biological Safety (2004) 1-2 (13-14), 21-31. Makhaeva, G., Filonenko, I., Fomicheva, S., Malygin, V. "Esterase profiles" of O,Odialkyl-0-dimethyl-chloroformimino phosphates in prediction of their toxic effects. Toxicol. Lett., 1996, v.88, Suppl.l, p.25. Small D. H., Michaelson S., Sbema G. Neurochem. Int., 1996, v. 28, p. 453-483. Billecke S, Draganov D, Counsell R. et al. Drug Metab. Dispos., 2000, v. 28, № 11, p. 1355-1342. La Du B. N, Billecke S, Hsu C. et al. Drug Metab. Dispos., 2001, v. 4, № 11, p. 566569. Antikainen M, Murtomaki S, Syvanne M. et al. J.Clin.Invest., 1996, v. 98, № 4, p. 883885. Blatter Garin M.-C, James R. W, Dussoix Ph. et al. J. Clin. Invest, 1997, v. 99, № 1, p. 62-66. Nicholls D. P, Maxwell A. P, Hasselwander O. et al. Atherosclerosis., 1997, v. 134, № 1-2, p.212. 7. Yamomoto M, Kondo I. Brain Res., 1998, v. 806, p. 271-273. Hasin Y, Avidan N, Bercovich D, Korzyn A.D, Silman I, Beckmann J.S, Sussman J.L. (2005), Current Alzheimer Research, 2,207-218. Clarimon J, Bertranpetit J, Calafell F, Boada M, Tarraga L, Comas D. J. Neurol. (2003) 250: 956-961. Barbosa M, Rios O, Velasquez M. et al. Surg. Neurol., 2001, v. 55, p. 106-112. Zhan C. G, Zheng F, Landry D.W. J. Am. Chem. Soc, 2003, v. 125, № 9, p. 2462-2474. Guemei A. A, Cottrell J, Band R. et al. Cane. Chem. Pharmacol., 2001, v. 47, p. 283290. Tunek A, Hjertberg E, Mogensen J.V. Biochem. Pharmacol., 1991, v. 41, № 3, p. 345348. Makhaeva G. F, Suvorov N. N, Ginodman L. M, Antonov V. K. Bioorg. Chem., 1977, v. 3, p. 1384-1399. Richardson, R.J. (1995) J. Tox. Env. Health, 44, 135-165. La Du B.N. 
et al (1990) Clin. Biochem., 23,423-431. Lockridge, O. (1990) Pharmacol. Therap., 47, 35-60. Pirmohamed, M, and Park, B.K. (2001) TIPS, 22 (6), 298-305. Simeon-Rudolf V, Reiner E, Evans R.T. et al. Chem. Biol. Inter., 1999, v. 119-120, p. 165-71. Simeon-Rudolf V, Kovarik Z, Skrinjaric-Spoljar M, Evans R.T. Chem. Biol. Inter., 1999, v. 119-120, p. 159-164. Satoh T, Hosokava M. Toxicol. Let., 1995, v. 82/83, p. 447-52. Pirmohamed M, Park K. Trends Pharmacol. ScL, 2001, v. 22, № 6, p.298-305. Sudo, K.; Maekawa, M.; Akizuki, S.; Magara, Т.; Ogasawara, H.; Tanaka, T, Biochem. Biophys. Res. Commun. 240: 372-375, 1997.
68
S. D. Varfolomeev , I. N. Kurochkin and I. A. Gariev
[84] Barrels CF, Jensen FS, Lockridge O, van der Spek AF, Rubinstein HM, Lubrano T, La Du BN. Am. J. Hum. Genet. 1992 May; 50(5): 1086-103. [85] Barrels CF, James K, La Du BN. Am. J. Hum .Genet. 1992 May; 50(5): 1104-14. [86] Hidaka, K.; Iuchi, I.; Tomita, M.; Watanabe, Y.; Minatogawa, Y.; Iwasaki, K.; Gotoh, K.; Shimizu, C, Ann. Hum. Genet. 61: 491-496, 1997. [87] Manoharan I, Wieseler S, Layer PG, Lockridge O, Boopathy R, Pharmacogenet. Genomics. 2006 Jul. 16(7): 461-8. [88] Primo-Parmo, S.L., Sorenson, R.C., Teiber, J., La Du, B. (1996) Genomics, 33, 498507. [89] La Du B.N., Aviram, M, Billecke S. et al, (1999) Chem. Biol. Inter., 119-120, 379-88. [90] La Du, B.N., Draganov, D. (2004) First International Conference "Paraoxonases - Basic and Clinical Directions of Current Research", Ann Arbor, MI, USA, April 23-24, 2004. wvw.umich.edu/pons-conference/ files/abstract_book.pdf. [91] Sorenson, RC, Bisgaier, C.L., Aviram, M., Hsu, C, Billecke, S., La Du B.N. (1999) Arterioscler. Thromb. Vasc. Biol, 19(9), 2214-2225. [92] Deakin, S., Leviev, I., Gomaraschi, M. et al (2002) J. Biol. Chem., 277(6), 4301-4308. [93] Pellin, M.C., Moretto, A., Lotti, M., Vilanova, E. (1990) Neurotoxicol. Teratol., 12, 611-614. [94] Costa, L.G., Li, W.F., Richter, R.J., Shih, D.M., Lusis, A., Furlong, C.E. (1999) Chem.Biol. Inter., 119-120, 429-438. [95] Costa, L.G., Richter, R.J., Li, W.-F., Cole, Т., Guzzetti, M., Furlong, C.E. (2003) Biomarkers, 8 (1), 1-12. [96] Costa, L.G. (2004) First International Conference "Paraoxonases - Basic and Clinical Directions of Current Research,", Ann Arbor, MI, USA, April 23-24, 2004. www.umich.edu/pons-conference/ files/abstract_book.pdf. [97] Khersonsky, O., Tawlik, D.S,. (2005) Biochemistry, Epub. ahead of print. March 26, 2005; doi:10.1021/bi047440d. [98] La Du, B.N. (1996) Nature Medicine, 2 (11), 1186-1187. [99] Karanth, S. and Pope, C. (2000) Toxicol. Sci., 58, 282-89. 
[100] Satoh, Т., Taylor, P., Bosron, W.P., Sanghani, S.P., Hosokawa, M., LaDu, B. (2002) Drug Metab. Dispos., 30 (5), 488-493. [101] La Du„. B.N. (1990) Clin. Biochem., 23, 423-431. [102] Karanth, S. and Pope, C. (2003) Int. J. Toxicol., 22, 429-433. [103] Billecke S., Draganov D., Counsell R. et al. (2000) Drug Metab. Dispos., 28(11), 13551342. [104] La Du, B.N., Billecke, S., Hsu, C, Haley, R.W., Broomfield, C.A. (2001) Drug Metab. Dispos., 4 (11), 566-569. [105] Fuhrman, В., Aviram, M. (2002) Ann. NY Acad. Sci., 957, 321-324. [106] Mackness, M.I., Arrol, S., Abbott, C.A., Durrington, P.N. (1993) Atherosclerosis, 104, 125-135. [107] Mackness, M.I, Durrington, P.N, Ayub, A, Mackness, B. (1999) Chem. Biol. Interact., 119/120, 389-397. [108] Nguyen S.D, Sok D-E. (2003; Biochem. J. Oct 15, 375(Pt 2): 275-285. [109] Jarvic, R.P, Hatsukami, T.S, Carlson, C, Richter, R.J, Jampsa, R, Brohy V.H. et al (2003) Arterioscter. Thromb.Vasc.Biol, 23, 1465-1471.
Human Enzymes – Genetic, Proteomic and Catalytic Polymorphism
69
[110] Furlong, C.E, Cole, T.B, Jarvik, J.P, Costa, L.G. (2002), Pharmacogenomics, 3 (3), 341-348. [111] Humbert, R.; Adler, D. A.; Disteche, С. M.; Hassett, C; Omiecinski, C. J.; Furlong, С. E. Nature Genet. 3: 73-76, 1993. [112] Antikainen, M.; Murtomaki, S.; Syvanne, M.; Pahlman, R.; Tahvanainen, E.; Jauhiainen, M.; Frick, M. H.; Ehnholm, C, J. Clin. Invest. 98: 883-885, 1996. [113] Brophy, V. H.; Jampsa, R. L.; Clendenning, J. В.; McKinstry, L. A.; Jarvik, G. P.; Furlong, С. E, Am. J. Hum. Genet. 68: 1428-1436, 2001. [114] Garin, M.-C. В.; James, R. W.; Dussoix, P.; Blanche, H.; Passa, P.; Froguel, P.; Ruiz, J, J. Clin. Invest. 99: 62-66, 1997. [115] Kao, Y.-L.; Donaghue, K.; Chan, A.; Knight, J.; Silink, M, J. Clin. Endocr. Metab. 83: 2589-2592, 1998. [116] Deakin, S.; Leviev, I.; Nicaud, V.; Meynet, M.-C. В.; Tiret, L.; James, R.W, J. Clin. Endocr. Metab. 87: 1268-1273, 2002. [117] Barbieri, M.; Bonafe, M.; Marfella, R.; Ragno, E.; Giugliano, D.; Franceschi, C; Paolisso, G, J. Clin. Endocr. Metab. 87: 222-225, 2002. [118] Janka Z, Juhasz A, Rimanoczy A A, Boda K, Marki-Zay J, Kalman J, Mmol. Psychiatry. 2002;7(1):110-2. [119] Saton T, Hosokawa M. Toxicol. Let, 1995, v. 82/83, p. 439-445. [120] Xie M, Yang F, Liu L. et al. Drug Metab. Dispos., 2002, v. 30, Is. 5, p. 541-547. [121] Saboori A. M, Newcombe D. S. J. Biol. Chem., 1990, v. 265, № 32, p. 19792-19799. [122] McRee D. Chem. Biol, 2003, v. 10, p. 295-297. [123] Brzezinski M.R, Spink B.J, Dean R.A. et al. Drug. Metab. Dispos., 1997, v. 25, p. 1089-1096. [124] Park Y.H, Lee S.S. Biochem. Mol. Biol. Int., 1994, v.34, p. 351-360. [125] Nambu K, Miyazaki H, Nakanishi Y. et al. Biochem. Pharmacol., 1987, v. 36, p. 17151722. [126] Takai S, Matsuda A, Usami Y. et al. Biol. Pharm. Bull., 1997, v.20, p.869-873. [127] Aritoshi Iida, 1 Susumu Saito,2 Akihiro Sekine,3 Atsushi Takahashi,4 Naoyuki Kamatani4 and Yusuke Nakamura1,2,5,6, Cancer Sci. 2006 Jan; 97(1): 16-24. 
[128] Ashmarin IP, Uspekhi Biologicheskoi Khimii, (2003), 43, 3-18. [129] Shea SH, Wall TL, Carr LG, Li TK, Behav. Genet. 2001 Mar; 31(2): 231-9. [130] Osier M, Pakstis AJ, Kidd JR, Lee JF, Yin SJ, Ко НС, Edenberg HJ, Lu RJ3, Kidd KK, Am. J. Hum. Genet. 1999 Apr; 64(4): 1147-57. [131] Ogurtsov PP, Garmash IV, Miandina GI, Guschin AE, Itkes AV, Moiseev VS, Addict. Biol. 2001 Sep; 6(4): 377-383. [132] Suzuki Y, Fujisawa M, Ando F, Niino N, Ohsawa I, Shimokata H, Ohta S, Neurology. 2004 Nov 9; 63(9): 1711-3. [133] Chai YG, Oh DY, Chung EK, Kim GS, Kim L, Lee YS, Choi IG, Am. J. Psychiatry. 2005 May; 162(5):1003-5. [134] Burnell, J. C; Carr, L. G.; Dwulet, F. E.; Edenberg, H. J.; Li, T.-K.; Bosron, W. F., Biochem. Biophys. Res. Commun. 146: 1227-1233, 1987. [135] Buervenich, S.; Carmine, A.; Gaiter, D.; Shahabi, H. N.; Johnels et. al., Arch. Neurol. 62: 74-78, 2005. [136] Weiss M.J., Cole D.E., Ray K., Whyte M.P., Lafferty M.A., Mulivor R.A., Harris H. (1988), Proc. Natl. Acad. Sci. USA, 85, 7666-7669.
70
S. D. Varfolomeev , I. N. Kurochkin and I. A. Gariev
[137] Henthorn P.S., Whyte M.P. (1992), Clin. Chem, 1992, 38, 2501-2505. [138] Henthorn P.S., Raducha M., Fedde K.N., Lafferty M.A., Whyte M.P. (1992) Proc. Natl. Acad. Sci. USA, 89, 9924-9928. [139] Greenberg C.R., Taylor C.L., Haworth J.C., Seargeant L.E., Philipps S., Triggs-Raine В., Chodirker B.N. (1993) Genomics, 17, 215-217. [140] Herasse M., Spentchian M., Taillandier A., Keppler-Noreuil K., Fliorito A.N., Bergoffen J., Wallerstein R., Muti C, Simon-Bouy В., Mornet E. (2003) J. Med. Genet, 40, 605-609. [141] Hu J.C., Plaetke R., Mornet E., Zhang C, Sun X., Thomas H.F., Simmer J.P. (2000) Eur. J. Oral Sci, 108, 189-194. [142] Goseki-Sone M., Sogabe N., Fukushi-Irie M., Mizoi L., Orimo H., Suzuki Т., Nakamura H., Orimo H., Hosoi T. (2004) J. Bone Miner Res, 20, 773-782. [143] Cool D.E., Tonks N.K., Charbonneau H., Walsh K.A., Fischer E.H., KrebsE.G. (1989) Proc. Natl. Acad. Sci. USA, 86,5257-5261. [144] Kaplan R. , Morse B. , Hubner K. , Croce C. , Howk E., Ravera M., Ricca G. , Jaye M. , Schlessinger J. (1990) Proc. Natl. Acad. Sci. USA, 87, 7000-7004. [145] Ramachandran C. , Aebersold R. , Tonks N.K. , Pot D.A. (1992) Biochemistry, 31, 4232-4238. [146] Hunter T. (1989) Cell, 58, 1013-1016. [147] Chan CP., McNall S.J., Krebs E.G. , Fischer E.H. (1988) Proc. Natl.Acad. Sci. USA, 85, 6257-6261. [148] Webb L, Inhibitors of Enzymes and Metabolism, 1966, M.: Mir, 862. [149] Thomas M.L. (1989) Аппи. Rev. Immunol, 7, 339-369. [150] Cohen P., Cohen P.T.W. (1989) J. Biol. Chem., 264, 21435-21438. [151] Monkanen R.E, Zwiller J, Daily S.L. , Khatra B.S, Dukelow M, Boynton A.L. (1991) J. Biol. Chem., 266,6614-6619. [152] Honkanen R.E, Zwiller J, Moore R.E, Daily S.L, Khatra P.S, Dukelow M, Boynton A.L. (1990) J. Biol. Chem., 265, 19410-19404. [153] Metcalfe S., Milner J. (1990) Immunol. Lett., 26. 177-182. [154] Klee C.B, Draetta OF., and Hubbard M. (1988) J. Adv. Enzymol., 61, 149-200. [155] Kakalis L. (1995) FEBS Lett., 362, 55. [156] Sikkink R. et al. 
(1995) Biochemistry, 34, 8348-8356. [157] Hubbard M.J., and Klee C.B. (1989) Biochemistry, 28, 1868-1874. [158] Hashimoto Y, Perrino B.A., Sodelring T.R. (1990) J. Biol. Chem., 265, 1924-1927. [159] Monkanen R.E, Zwiller J, Daily S.L, Khatra B.S, Dukelow M, Boynton A.L. (1991) J. Biol. Chem., 266, 6614-6619. [160] Chernoff J, Schievella A.R, Jost C.A, Erikson R.L, Neel B.G. (1990) Proc. Natl. Acad. Sci. USA., 87, 2735-2739. [161] Charbonneau H, Tonks N.K., Kumar S, Dilts CD, Harrylock M, Cool E, Krebs E.G., Fischer E.H, Walsh K.A. (1989) Proc. Natl. Acad. Sci. USA., 86, 5252-5256. [162] Tonks N.K, Diltz CD, Fischer E.H. (1988) J. Biol. Chem., 263, 6731-6737. [163] Cicirelli M.F, Tonks N.K, Diltz CD, Weiel J.E, Fischer E.H, Krebs G. (1990) Proc. Natl. Acad. Sci. USA., 87, 5514-5518. [164] Brown-Shimer S, Johnson K.A, Lawrence J.B, Jonson C, Breskin A, Green N.R, Hill D.E. (1990) Proc. Natl. Acad. Sci. USA., 87, 5148-5152. [165] Frangioni J.V, Beahm P.H, Shifrin V, Jost C.A, Neel B.G. (1992) Cell, 68, 545-560.
Human Enzymes – Genetic, Proteomic and Catalytic Polymorphism
71
[166] Cool D.E, Tonks N.K, Charbonneau H, Fischer E.H, Krebs E.G. (1990) Proc. Natl. Acad. Sci. USA., 87, 7280-7284. [167] Gould K.L, Moreno S, Tonks N.K, Nurse P. (1990) Science, 250, 1573-1576. [168] Ruzzene M, Donella-Deana A, Marin O, Perich J.W, Ruzza P, Borin G, Calderan A, Pinna LA. (1993) Eur. J. Biochem., 211, 289-295. [169] Sakaguehi AY, Sylvia V.L., Martinez L, Lalley PA, Shows T.B, Han E.S, Smith E.A, Chosh Choudhury G. (1991) Cytogenet. Cell. Genet., 58,2014-2015. [170] Shlessinger J, Ullrich A. (1992) Neuron, 9, 383-391. [171] Samstag Y, Badler A, Meuer S.C (1990) Immunobiology, 181, 149-150. [172] Samstag Y, Bader A, Meuer S.C. (1991) J. Immunol., 147, 788-794. [173] Samstag Y, Henning S.W, Badler A, Meuer S.C. (1992) Int. Immunol., 4, 1255-1262. [174] Thevenin С, Kim S.-J., Kehre J.H. (1991) J. Biol. Chem., 266, 9363-9366. [175] Chedid M., Ioza B.K., Brooks J.W., Mizel S.B (1991) J. Immunol, 147, 867-873. [176] Redpath N.T., Proud C.G. (1990) Biochem. J., 272, 175-180. [177] Richards F.M., Milner J., Metcalfe S. (1992) Immunology., 76, 642-647. [178] Taffs R.E., Redegeld F.A., Sitkovsky M.V. (1991) J. Immunol, 147, 722-728. [179] Kohno Т., Takakura S., Yamada Т., Okamoto A., Tanaka Т., Yokota J., Cancer Res., 59, 4170-4174. [180] Xia J., Scherer S.W., Cohen P.T., Majer M., Xi Т., Norman R.A., Knowler W.C., Bogardus C. and Prochazka M. (1998) Diabetes, 47, 1519-1524. [181] Liolitsa D., Powell J., Lovestone S. (2002) J. Neurol. Neurosurg. Psychiatry., 73, 2616. [182] Esplin E.D., Ramos P., Martinez В., Tomlinson G.E., Mumby M.C., Evans G.A. (2006) Genes Chromosomes Cancer., 45, 182-90. [183] Takagi Y., Futamura M., Yamaguchi K., Aoki S., Takahashi Т., Saji S. Gut, 41, 268-71. [184] Eliseeva YE, Zh. Bioorgan. Khim. (1998) 24, 262-270. [185] Soubrier F., Alhenc-Gelas F., Hubert C, Allegrini J., John M., Tregear G., Corvol P. (1988) Pros. Natl. Acad Sci. USA., 85, 9386-9390. [186] Antonov VK, Chemistry of proteolysis, (1991) M.: Nauka. [187] Hooper N.M. 
(1991) Int. J. Biochem., 23, 641-647. [188] Williams T.A., Soubrier F., Corvol P. (1996) Zinc Metalloproteases in Health and Disease/ Ed. N.M. Hooper. L.: Taylor & Francis, 1996. P. 83- 104. [189] Wei L., Alhenc-Gelas F., Corvol P., Clauser E. (1991) J. Biol. Chem., 266, 9002-9008. [190] Jaspard E., Wei L., Alhenc-Gelas F. (1993) J. Biol. Chem., 268, 9496-9503. [191] Rousseau A., Michaud A., Chauvet M.-T., Lenfant M., Corvol P. (1995) J. Biol. Chem., 270, 3656-3661. [192] Deddish P.A., Jackman H.L., Skidgel R.A., Erdos E.G. (1997) Biochem. Pharmacol, 53, 1459-1463. [193] Wei L., Clauser E., Alhenc-Gelas F., Corvol P. (1992) J. Biol. Chem., 267, 1339813405. [194] Rigat В., Hubert C, Alhenc-Gelas F., Cambien F. et al. (1990) J. Clin. Invest., 86, 13431346. [195] Rigat В., Hubert C, Corvol P., Soubrier F. (1992) Nucleic Acids Res., 20, 1433. [196] Danser A.H.,Schalekamp M.A., Bax W.A. et al. (1995) Circulation, 92, 1387-1388. [197] Arbustini E, Grasso M, Fasani R, Kiersy C. et al. (1995) Brit. Heart J., 74, 584-591. [198] Oike Y., Hata A, Ogata Y, Numata Y. et al. (1995) J. Clin. Invest, 96, 2975-2979.
72
S. D. Varfolomeev , I. N. Kurochkin and I. A. Gariev
[199] Arbustini E, Grasso M, Fasani R., Kiersy C. et al. (1995) Brit Heart J., 74,584-591. [200] Prasad A, Narayanan S, Waclawiw M.A, Epstein N, Quyyumi A.A. J. Am. Coll. Cardiol, 36, 1579-1586. [201] Woods D.R, Humphries S.E, Montgomery H.E. (2000) Trends Endocrinol. Metab., 11, 416-420. [202] Tiret L., Blanc H,Ruidavets J.B, Arveiler D. et al. (1998 ) J. Hypert, 16, 37-44. [203] Cambien F, Poirier O, Lecerf L. et. al. Nature. 1992. V.359. P.641-644. [204] Lindpaintner K, Pfeffer M.A., Kreutz R. et al. (1995) N. Engl. J. Med, 332, 706-711. [205] Fujisawa T, Ikegami H, Kawaguchi Y, Hamada Y. et al. (1998) Diabetologia, 41,47-53. [206] Staessen J.A, Wang Ji G, Ginocchio G., Petrov V. et al. (1997) J. Hypertension, 15, 1579-1592. [207] Stefansson B, Ricksten A, Rymo L, Aurell M, Herlitz H. (2000) Blood Press, 9, 104109. [208] Varfolomeev SD, Mevkh AT, Prostaglandins – Molecular Bioregulators (1985) M.: Publ. Moscow State Univ., 308. [209] Sergeeva MG, Varfolomeeva AT, Cascade of Arachidonic Acid (2006) M.: Narodnoe Obrazovanie, 255. [210] Miyamoto Т, Ogino N, Yamamoto S, Hayaishi О. (1976) J. Biol. Chem., 251, 26292636. [211] Van der Ouderaa P.O., Buytenhek M, Nugteren D.H, Van Dorp D.A. (1977) Bioch. Bioph. Acta (L), 487, 315-331. [212] Pagels W.R, Sachs R.J, Marnett L.J, DeWitt D.L, Day J.S, Smith W.L. (1983)/. Biol. Chem., 258, 6517-6525. [213] Van der Ouderaa F.J.G, Buytenhek M. (1982) Methods Enzymol, 1982, 86, 60-68. [214] Smith W.L, DeWitt D.L, Allen M.L. (1983) J. Biol. Chem., 1983, 258, 4922-4926. [215] Rollins Т.Е., Smith W.L. (1980)7. Biol. Chem., 255,4872-4876. [216] Van der Ouderaa F.J, Buytenhek M, Slikkerveer F.J, Van Dorp D.A. (1979) Biochim. Biophys. Acta, 1979, 572, 29-41. [217] Mutsaers J.H.G.M, Van Halbeek H, Kamerling J.P, Vliegenthart J.F.G. (1985) Eur. J. Biochem., 147, 569-574. [218] De Witt D.L., Smith D.L. (1988) Proc. Natl. Acad. Sci. USA, 85, 1412-1416. [219] Merlie J.P., Fagan D., Mudd J., Needleman P. (1988) J. Biol. 
Chem., 263, 3550-3553. [220] Yokoyama C, Takai Т., Tanabe R. (1988) FEBSLett., 231, 347-351. [221] Lambeir A.M., Markey СМ., Dunford H.B., Marnett L.J. (1985) J. Biol. Chem., 260, 14894-14896. [222] Kulmacz R.J., Tsai A.L., Palmer G. (1987) J. Biol. Chem., 262, 10524-10531. [223] Kulmacz R.J., Ren Y., Tsai A.L., Palmer G. (1990) Biochemistry, 29, 8760-8771. [224] Shimokawa Т., Kulmacz R.J., DeWitt D.L., Smith W.L. (1990) J. Biol. Chem., 265, 20073-20076. [225] Dietz R„ Nastainczyk W., Ruf H.H. (1988) Eur. J Biochem., 171, 321-328. [226] Lepantalo A., Mikkelsson J. (2006) Thromb. Haemost., 95(2), 253-259. [227] Goodman J.E., Bowman E.D. (2004) Carcinogenesis, 25(12), 2467-2472. [228] Maree A.O., Curtin R.J. (2005) J. Thromb. Haemost., 3(10), 2340-2345. [229] S. Zienolddiny, Campa D. (2004) Carcinogenesis, 25(2), 229-235. [230] Lin H.J., Lakkides K.M. (2002) Cancer Epidemiol. Biomarkers Prev., 11(11), 13051315.
Human Enzymes – Genetic, Proteomic and Catalytic Polymorphism
73
[231] Konheim E.L. (2003) Hum. Genet., 113(5), 377-381. [232] Cipollone F., Toniato E. (2004) JAMA, 291(18), 2221-2228. [233] Wen J. K., Osumi Т., Hashimoto Т., Ogata M. (1990) Molec. Biol, 211, 383-393. [234] Kishimoto Y., Murakami Y., Hayashi K., Takahara S., Sugimura Т., SekiyaT. (1992) Hum. Genet., 88,487-490. [235] Mannens M., Slater R. M., Heyting C, Bliek J., Hoovers J., Bleeker-Wagemakers E. M., Voute P. A., Coad N., Frants R. R., Pearson P. L. (1987) Cytogenet. Cell Genet., 46, 655. [236] Jiang Z., Akey J. M., Shi J., Xiong M., Wang Y., Shen Y., Xu X., Chen H., Wu H., Xiao J., Lu D., Huang W., Jin L. (2001) Hum. Genet, 109, 95-98. [237] Nauseef W. M., Brigham S., Cogley M. (1994) J. Biol. Chem., 269, 1212-1216. [238] DeLeo F. R., Goedken M., McCormick S. J., Nauseef W. M. (1998) J. Clin. Invest., 101, 2900-2909. [239] Romano M., Dri P., Dadalt L., Patriarca P., Baralle F. E. (1997) Blood, 90, 4126-4134. [240] Marchetti C, Patriarca P., Solero G. P., Baralle F. E., Romano M. (2004) Hum. Mutat., 23, 496-505. [241] Reynolds W. F, Hiltunen M, Pirskanen M, Mannermaa A, Helisalmi S, Lehtovirta M, Alafuzoff I, Soininen H. (2000) Neurology, 55, 1284-1290. [242] Makela R, Dastidar P, Jokela H, Saarela M, Punnonen R, Lehtimaki T.J. Clin. Endocr. Metab., 88, 3823-3828. [243] Romano M., Patriarca P., Melo C, Baralle F. E, Dri P. (1994) Proc. Nat. Acad. Sci., 91, 12496-12500. [244] Bikker H, Baas F, De Vijlder J. J. M. (1997) J. Clin. Endocr. Metab., 82, 649-653. [245] Bikker H., Vulsma T, Baas F, de Vijlder J. J. M. (1995) Hum. Mutat., 6, 9-16. [246] Pannain S, Weiss R. E, Jackson С. E, Dian D, Beck J. C, Sheffield VC, Cox N, Refetoff S. (1999) J. Clin. Endocr. Metab., 84, 1061-1071. [247] Fugazzola L, Cerutti N, Mannavola D, Vannucchi G, Fallini C, Persani L., Beck-Peccoz P. (2003) J. Clin. Endocr. Metab., 88, 3264-3271. [248] De Belleroche J, Leigh P.N, Clifford Rose F. (1997) Familial motor neuron disease. In: Leigh P.N, Swash M. (eds). Motor neuron disease. 
// Springier-Verlag, London, 35-51. [249] Rosen D. R, Siddique T, Patterson D, Figlewicz D. A, Sapp P, Hentati A, Donaldson D, Goto J, O'Regan J. P, Deng H.-X, Rahmani Z, Krizus A, McKenna-Yasek D, Cayabyab A, Gaston S. M, Berger R, Tanzi R. E, Halperin J. J, Herzfeldt B, Van den Bergh R, Hung W.-Y, Bird T, Deng G, Mulder D. W, Smyth C, Laing N. G, Soriano E, PericakVance M. A, Haines J, Rouleau G. A, Gusella J. S, Horvitz H. R, Brown R. H, Jr. (1993) Nature, 362, 59-62. [250] Andersen P.M. (1997) Amyotrophic lateral sclerosis and CuZn-superoxide dismutase. // Umea Universitet. [251] Borchelt D. R, Lee M. K, Slunt H. S, Guarnieri M, Xu Z.-S, Wong P.C, Brown R. H, Jr., Price D. L, Sisodia S. S, Cleveland D. W. (1994) Proc. Nat. Acad. Sci., 91, 82928296. [252] Kawamata J, Hasegawa H, Shimohama S, Kimura J, Tanaka S, Ueda K. (1994) Lancet, 343, 1501. [253] Jones С. T, Brock D. J. H, Chancellor A. M, Warlow С. P, Swingler R.J. (1993) Lancet, 342, 1050-1051.
74
S. D. Varfolomeev , I. N. Kurochkin and I. A. Gariev
[254] Hayward C, Swingler R. J, Simpson S. A, Brock D. J. H. (1996) Am. J. Hum. Genet., 59, 1165-1167. [255] Kikugawa K, Nakano R, Inuzuka T, Kokubo Y, Narita Y, Kuzuhara S, Yoshida S, Tsuji S. (1997) Neurogenetics, 1,113-115. [256] Deng H.-X, Hentati A, Tainer J. A, Iqbal Z, Cayabyab A, Hung W.-Y, Getzoff E. D, Ни P, Herzfeldt B, Roos R. P, Warner C, Deng G, Soriano E, Smyth C, Parge H. E, Ahmed A, Roses A. D, Hallewell R. A, Pericak-Vance M. A, Siddique T. (1993) Science, 261, 1047-1051. [257] Aoki M, Ogasawara M., Matsubara Y., Narisawa K., Nakamura S., Itoyama Y., Abe K. (1993) Nature Genet., 5, 323-324. [258] Aoki M., Ogasawara M., Matsubara Y., Narisawa K., Nakamura S., Itoyama Y., Abe K. (1994) J. Neurol. Sci., 126, 77-83. [259] Andersen P. M., Nilsson P., Ala-Hurula V., Keranen M.-L., Tarvainen I., Haltia Т., Nilsson L., Binzer M., Forsgren L., Marklund S. L. (1995) Nature Genet., 10, 61-66. [260] Aguirre Т., Matthijs G., Robberecht W., Tilkin P., Cassiman J.-J. (1999) Europ. J. Hum. Genet., 7, 599-602. [261] Hand С. K., Mayeux-Portas V., Khoris J., Briolotti V., Clavelou P., Camu W., Rouleau G. A. (2001) Ann. Neurol, 49, 267-271. [262] Ikeda M., Abe K., Aoki M., Sahara M., Watanabe M., Shoji M., St. George-Hyslop P. H., Hirai S., Itoyama Y. (1995) Neurology, 45, 2038-2042. [263] Sapp P. C, Rosen D. R., Hosier B. A., Esteban J., Mckenna-Yasek D., O'Regan J. P., Horvitz H. R., Brown R. H., Jr. (1995) Neuromusc. Disord, 5, 353-357. [264] Morita M., Aoki M., Abe K., Hasegawa Т., Sakuma R., Onodera Y., Ichikawa N., Nishizawa M., Itoyama Y. (1996) Neurosci. Lett., 205, 79-82. [265] Kostrzewa M., Damian M. S., Muller U. (1996) Hum. Genet., 98, 48-50. [266] Jones С. Т., Swingler R. J., Brock D. J. H. (1994) Hum. Molec. Genet., 3, 649-650. [267] Watanabe M., Aoki M., Abe K., Shoji M., Iizuka Т., Ikeda Y., Hirai S, Kurokawa K., Kato Т., Sasaki H., Itoyama Y. (1997) Hum. Mutat., 9, 69-71. 
[268] Aoki M., Abe K., Houi K., Ogasawara M., Matsubara Y., Kobayashi Т., Mochio S., Narisawa K., Itoyama Y. (1995) Ann. Neurol, 37, 676-679. [269] Kawamata J., Shimohama S., Takano S., Harada K., Ueda K., Kimura J. (1997) Hum. Mutat., 9, 356-358. [270] Zu J. S., Deng H.-X., Lo T. P., Mitsumoto H., Ahmed M. S., Hung W.-Y., Cai Z.-J., Tainer J. A., Siddique T. (1997) Neurogenetics, 1, 65-71. [271] Orrel R. W., Marklund S. L., de Belleroche J. S. (1997) J. Neurol. Sci., 153, 46-49. [272] Penco S., Schenone A., Bordo D., Bolognesi M., Abbruzzese M., Bugiani O., Ajmar F., Garre C. (2001) Neurology, 53, 404-406. [273] Gellera C, Castellotti В., Riggio M. C, Silani V., Morandi L., Testa D., Casali C., Taroni F., Di Donate S., Zeviani M., Mariotti C. (2001) Neuromusc. Disord., 11,404410. [274] Elshafey A., Lanyon W. G., Connor J. M. (1994) Hum. Molec. Genet., 3, 363-364. [275] Alexander M. D., Traynor B. J., Miller N., Corr В., Frost E., McQuaid S., Brett F. M., Green A., Hardiman O. (2002) Ann. Neurol, 52, 680-683. [276] Bowler С, Alliotte T, De Loose M. et al. (1989) EMBO J, 8, 31-38. [277] Landeghem G.F., Tabatabaie P, Beckman G, Beckman L, Andersen P. (1999) Europ. J. of Neurol., 6, 639-644.
Human Enzymes – Genetic, Proteomic and Catalytic Polymorphism
75
[278] Rosenblum J.S., Gilula N.B., Lerner R.A. (1996) Proc. Natl. Acad. Sci. USA, 93, 44714473. [279] Shimoda-Matsubayashi S, Matsumine H, Kobayashi T. et al. (1996) Biochem. Biophys. Res. Commun, 226, 561-565. [280] Hiroi S, Harada H, Nishi H, Satoh M, Nagai R, Kimura A. (1999) Biochem. Biophys. Res. Commun., 261, 332-339. [281] Sandstrom J., Nilsson P., Karlsson K, Marklund S. L. (1994) J. Biol. Chem., 269, 19163-19166.
In: Molecular Polymorphism of Man Editors: S. D. Varfolomyev, G. E. Zaikov
ISBN: 978-1-60741-843-6 © 2011 Nova Science Publishers, Inc.
Chapter 2
POLYMORPHISM OF TUMOR-SUPPRESSOR GENES AND GENETIC CONTROL OF CARCINOGENESIS
M.M. Aslanyan*1,2, S.S. Litvinov1, E.S. Tsyrendorzhieva2 and V.A. Tarasov2
1 Biological Department, M.V. Lomonosov Moscow State University, Russia
2 N.I. Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
ABSTRACT
Cancer is one of the fundamental problems of general biology, encompassing cell growth and differentiation, cell and tissue metabolism, the immune response, mutation, and the repair of genetic damage. All these processes are under strict genetic control, which maintains the homeostasis of cells and tissues in a normal organism. The development of malignant tumors is a multistep, genetically controlled process that can be treated as the microevolution of individual cell clones within a tumor. Tumor progression is determined by the occurrence of mutations in one or several cooperatively functioning genes and by the selection of mutant clones. The decoding of the primary structure of the human genome has made it possible to characterize not only the structural and functional organization of the major genes controlling carcinogenesis but also their genetic polymorphism. These genes can be provisionally divided into two large categories: proto-oncogenes and tumor-suppressor genes. Heterozygotes for mutant alleles of suppressor genes, the so-called germline mutants, have an increased hereditary predisposition to cancer. Hence, heterozygosity for mutant alleles can be used to diagnose individual susceptibility to cancer. Analysis of 21 polymorphic sites in 14 tumor-suppressor genes has identified genotypes that differ tens of times in the risk of breast cancer development in women.
INTRODUCTION
The process of tumor development is called carcinogenesis or oncogenesis. A fundamental collective monograph, "CARCINOGENESIS", was published in 2004 under the editorship of Professor D.G. Zaridze; the authors of the works presented in it are noted Russian specialists in oncology [1]. Our task is to outline the genetic aspects of oncogenesis briefly and in easily understood terms.

Tumor formation is a multistep, genetically controlled process that can be treated as the microevolution of individual cell clones [2, 3]. Tumor progression is determined by the occurrence of mutations in one or several cooperatively functioning genes and by the selection of mutant cell clones. The decoding of the primary structure of the human genome has made it possible to establish the structural and functional organization and the polymorphic variants of the major genes controlling carcinogenesis. These genes can be provisionally divided into two large categories: proto-oncogenes and tumor-suppressor genes [2], [3], [4], [5]. For oncogenes, the carcinogenic event is the enhancement of their expression. For tumor-suppressor genes, it is the loss of normal function caused by heritable recessive point mutations or deletions. More than 100 tumor-suppressor genes have been described in the human genome [6]. Heterozygotes for mutant alleles of suppressor genes, the so-called germline mutants, have an increased genetic predisposition to cancer. Hence, heterozygosity for mutant alleles can be used to diagnose individual susceptibility to cancer.

Early in the 20th century, the search for the causes of tumor formation was crowned with success. P. Rous discovered the chicken sarcoma virus (1911). Later, the rabbit papilloma virus (R. Shope, 1932) and the murine mammary tumor virus (J. Bittner, 1936) were discovered. In 1946, L. Zilber formulated the virus-genetic theory of cancer [7]. According to this theory, a virus is the factor that transforms a normal cell into a tumor cell. It was subsequently demonstrated that tumoral transformation by a virus can occur through different pathways: 1) through enhancement of cell proliferation by viral oncogenes; 2) through changes in the structure or expression of genes caused by integration of the virus into the cell genome. A great number of human oncogenic viruses are now known.

The mutational theory of cancer was developed in parallel with the virus-genetic theory. It addresses the accumulation of mutations in somatic cells. As a result of spontaneous mutational events, normal cells can acquire selective advantages in the rate of growth and mutation. The frequency of somatic mutations in normal cells is constant, but the mutation rate can increase under the action of mutagens or as a result of mutations in the DNA repair system.

Changes in the normal cell phenotype can also occur without changes in the nucleotide sequence of the genome. Such events are epigenetic and are associated with disturbances in the regulation of gene expression. The epigenetic theory of cancer, suggested in the 1980s, postulated a direct relation between DNA methylation and changes in the expression of proto-oncogenes and suppressor genes in the course of tumor development [8]. The above theories are therefore not mutually exclusive, and the mechanisms they describe can operate at different stages of cancer progression.
FORMS OF CANCER
Two forms of cancer are distinguished today: familial and sporadic. The familial form is found among members of one family. The disease shows a clear-cut hereditary pattern, and the probability of its occurrence in healthy members of the family is several times higher than in the case of sporadic cancer. The hereditary factor determining the development of familial cancer can be transmitted according to Mendel's laws.

The sporadic form of cancer also has a genetic component, but its structure may be much more complicated. In this case, a tumor develops under the action of not one but several hereditary mutations. The study of polymorphisms of genes involved in carcinogenesis has revealed combinations of polymorphic variants that increase the probability of cancer development. Owing to combinative variation, such complex genotypes are broken up by segregation as early as the next generation. This explains the absence of the disease among relatives of patients with sporadic cancer.
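The point about complex genotypes breaking up in the next generation can be illustrated with a small calculation. The sketch below is not taken from the chapter; it rests on simplifying assumptions: the risk genotype requires a particular variant at each of several unlinked loci, one parent is heterozygous at every locus, the other parent carries none of the variants, and segregation is Mendelian. The function name `transmission_probability` is hypothetical.

```python
from fractions import Fraction


def transmission_probability(n_loci: int) -> Fraction:
    """Chance that one child receives the risk variant at every locus.

    Assumption (illustrative only): the parent is heterozygous at each of
    n_loci unlinked loci, so each variant is passed on with probability 1/2
    independently of the others.
    """
    if n_loci < 0:
        raise ValueError("n_loci must be non-negative")
    return Fraction(1, 2) ** n_loci


# The more loci the risk combination involves, the less likely it is
# to be passed on intact to any given child.
for n in (1, 3, 5):
    print(n, transmission_probability(n))
```

Under these assumptions, a single-locus predisposition reaches half of the children, whereas a five-locus combination reaches only about one child in 32. This is consistent with the familial clustering of monogenic cancer and the apparent lack of clustering in the sporadic form.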
MODELS OF CANCER DEVELOPMENT
In the 1970s, Alfred Knudson proposed the "two-hit" model of retinoblastoma development [9]. He suggested that two mutations are sufficient for the development of retinoblastoma and that, in the case of bilateral retinoblastoma, carriers of the disease have one inherited mutation. According to Knudson's model, carriers of a germline mutation are one step closer to retinoblastoma than individuals without it. Thus, in the first group, retinoblastoma develops considerably more often. Later, the RB gene inactivated during tumor development was mapped and cloned. Based on Knudson's two-hit model, other researchers [10] assumed that, since inactivation of both copies of the RB gene leads to tumoral degeneration of cells, the introduction of a functional RB copy would restore the normal phenotype. The introduction of the RB gene decreased tumor malignancy but in no way affected the rate of cell growth. It followed that RB gene mutations alone were insufficient for retinoblastoma development. It was established later that 60% of retinoblastoma patients had a 6p isochromosome. These two findings necessitated a revision of the "two-hit" model. The "multihit" model of retinoblastoma formation was thus proposed [11]. It suggested three steps in the process of tumor development. The first two steps (or "hits") correspond to the two-hit model of Knudson. Inactivation of both alleles of the RB gene normally induces cell apoptosis. Hence, a mutation in the apoptotic pathway is required for the continued existence of the tumor. Most likely, it is isochromosomes 6p and 1q that permit cells to avoid apoptosis and gain an advantage in growth [10]. The multihit model explains the origin of monoclonal tumors (i.e., tumors originating from a single clone). The study of polymorphic sites in the X chromosome showed, however, that by no means all tumors are monoclonal in nature (for example, plexiform neurofibroma).
To explain this phenomenon, the "recruitment" model was elaborated. This model implies that tumor cells (represented in small quantities) stimulate the growth of normal cells and promote their rapid
M. M. Aslanyan, S. S. Litvinov, E. S. Tsyrendorzhieva et al.
transformation. Indeed, tumor cells are able to produce growth factors that make them and the cells surrounding the tumor independent of mitogenic signals. Despite differences between the current models of cancer development, all of them present carcinogenesis as a multistep, genetically determined process.
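The quantitative intuition behind Knudson's two-hit model, namely that a carrier of a germline mutation is "one step closer" to a tumor, can be illustrated with a minimal sketch. It is not taken from the cited works: the function name, the number of cells at risk and the per-allele hit probability are all hypothetical, and hits are assumed to be independent events.

```python
import math


def prob_at_least_one_tumor(n_cells: int, hit_prob: float, hits_needed: int) -> float:
    """Chance that at least one of n_cells collects every required hit.

    Simplifying assumption: hits are independent and each occurs with the
    same probability hit_prob per cell, so one cell is transformed with
    probability hit_prob ** hits_needed.
    """
    p_cell = hit_prob ** hits_needed
    # 1 - (1 - p_cell) ** n_cells, written so that a tiny p_cell is not rounded away
    return -math.expm1(n_cells * math.log1p(-p_cell))


# Hypothetical values, chosen only to show the contrast between the two groups.
N_CELLS = 1_000_000   # cells at risk (assumed)
HIT_PROB = 1e-5       # chance of inactivating one RB allele in a cell (assumed)

sporadic = prob_at_least_one_tumor(N_CELLS, HIT_PROB, hits_needed=2)
hereditary = prob_at_least_one_tumor(N_CELLS, HIT_PROB, hits_needed=1)

print(f"no inherited mutation (two hits needed): {sporadic:.6f}")
print(f"germline carrier (one hit needed):       {hereditary:.6f}")
```

Under these assumed numbers the carrier group is far more likely to develop at least one tumor, which is the qualitative point of the model; the multihit model would add further required events to `hits_needed`.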
MULTISTEP PATTERN OF TUMOR GROWTH
Different tumors develop in accordance with general principles. In the course of development, the tumor focus undergoes changes that are revealed by histological and cytogenetic methods. The whole process can be divided into several steps: initiation, promotion, angiogenesis, invasion and metastasis. The first step, initiation, is characterized by cell changes on the genetic or epigenetic level. It is difficult to fix the time when this step begins because genomic changes do not lead to serious shifts in the cell phenotype. In the course of ontogenesis, normal tissue homeostasis is maintained in the organism. The cells are subjected to stabilizing selection, which permits them to reproduce only at a moderate rate (specific to each tissue type). The mechanism controlling the rate of proliferation involves extracellular and intracellular growth signals. The main strategy of tumor cells is to escape this control. An increase in the rate of division induces programmed death (apoptosis) of cells. Therefore, mutations permitting cells to avoid apoptosis are found at the very early stages of carcinogenesis. Theoretically, tumor degeneration of cells can be provoked by 1-2 mutations, but the mutation process does not end there. Analysis of the genome of intestinal tumor cells showed that there are about 11,000 mutations per cell [12]. This suggests that mutagenesis in them proceeds at a very high rate and gives rise to a pool of genetically heterogeneous cells. Such heterogeneity of cancer cells is revealed on both the molecular and cytological levels by the FISH method (figure 2) [13]. In the absence of stabilizing selection factors, clonal selection occurs: as a result of spontaneous mutagenesis, cells acquire an advantage in growth rate and displace normal cells and cells similar to themselves. Every new advantage permits the cell to displace its ancestors.
Such a mutator phenotype is expressed in the case of disturbances in the systems of repair and replication. Thus, the two main characteristics of tumor cells are uncontrolled growth and accelerated mutagenesis. The next step of tumor progression is promotion. At this step, the population of tumor cells rapidly increases. Tumor cells are able to divide over an indefinitely long period, whereas normal cells divide only 50-60 times, after which replicative aging and death follow [14]. But these processes do not affect all cells. Stem and cancer cells are capable of restoring the length of the terminal regions of chromosomes (telomeres) owing to the activity of the enzyme telomerase. Telomerase is a complex of proteins and ribonucleic acids. The catalytic subunit of telomerase is TERT (a reverse transcriptase). The activity of telomerase, which protects cells from replicative aging, has been revealed in different types of tumors. However, telomerase was not found in 10% of tumors, and nevertheless no shortening of telomeres was observed in them. This phenomenon has been termed "alternative lengthening of telomeres". In such cells, the maintenance of telomere length is associated with instability of minisatellite repeats and with recombination between telomeres.
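The link between telomere shortening and the limit of 50-60 divisions can be sketched as a simple counter. The telomere lengths and the loss per division below are hypothetical round numbers picked so that the result falls in the range quoted in the text; they are not measurements from the cited works.

```python
def divisions_until_arrest(start_bp: int, loss_per_division_bp: int,
                           critical_bp: int, telomerase_gain_bp: int = 0,
                           max_divisions: int = 1000) -> int:
    """Count divisions until the telomere shrinks to the critical length.

    telomerase_gain_bp models re-extension of the telomere after each
    division; max_divisions stops the loop for cells that never arrest.
    """
    length = start_bp
    divisions = 0
    while length > critical_bp and divisions < max_divisions:
        length += telomerase_gain_bp - loss_per_division_bp
        divisions += 1
    return divisions


# Hypothetical parameters (assumed for illustration only).
somatic = divisions_until_arrest(start_bp=10_000, loss_per_division_bp=100,
                                 critical_bp=4_500)
with_telomerase = divisions_until_arrest(start_bp=10_000, loss_per_division_bp=100,
                                         critical_bp=4_500, telomerase_gain_bp=100)

print("somatic cell divisions before arrest:", somatic)
print("divisions simulated with telomerase: ", with_telomerase)
```

With telomerase restoring what each division removes, the simulated cell never reaches the critical length and the loop ends only at the `max_divisions` cap, which stands in for unlimited division.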
TERT gene expression is regulated by the c-Myc protein. In somatic cells, telomerase is present in residual amounts. TERT gene expression is partially inhibited by the BRCA1 protein, which can bind to c-Myc and inhibit the activity of c-Myc-dependent promoters. In cells with an inactive BRCA1 gene, the expression of TERT is increased by 30%. Such an increase in TERT gene expression, and even overexpression of this gene, is not sufficient for the transformation of normal cells into tumor cells. But tumoral transformation of cells becomes possible when TERT is combined with an overexpressed H-RAS gene or the T antigen of the SV40 virus (expressed separately, they do not cause transformation). It can be suggested on this basis that telomere maintenance by itself cannot induce oncogenesis but is necessary for unlimited division of cancer cells [16]. Uncontrolled growth of cells leads to tissue hardening. Such growth continues until the tumor reaches just several millimeters in diameter. At this point, necrotic processes begin in the center of the growth focus because of a deficiency of oxygen (hypoxia) and glucose. Thus, at this stage the tumor size is limited by hypoxia. On the one hand, such conditions may be a signal for a cell to enter apoptosis, but there is an alternative pathway for further tumor progression: angiogenesis. Slight hypoxia activates genes that increase glycolysis and adapt cells to the altered conditions. In tumor cells, one of the key genes activated in response to hypoxia is HIF-1 (hypoxia-inducible factor 1). This gene encodes a transcription factor. In complex with other transcription factors and coactivators, the HIF-1 protein induces expression of such genes as lactate dehydrogenase A, erythropoietin and vascular endothelial growth factor (VEGF).
Expression of the latter and of other similar genes induced by hypoxia is needed for the formation of a vascular system (angiogenesis) around the tumor. In this way, necrosis of tumor cells is prevented and their growth is accelerated [17]. Studies of the nucleotide sequence of the HIF1 gene revealed its polymorphic regions. Two single-nucleotide substitutions (A-2500T in the 5' noncoding region and a substitution of proline by serine in codon 582) showed an association with an adaptive response to oxygen deficiency: the basal level of oxygen in tissues changed in response to physical training [18]. Two polymorphic sites were found to be associated with prostate cancer (substitution of proline-582 by serine) [19] and with renal cell carcinoma (substitutions of proline-582 by serine and alanine-588 by threonine) [20]. But the activation of HIF-1 expression alone is not sufficient for angiogenesis to be induced. In an adult organism, angiogenesis is actively suppressed by the PTEN gene, and its inactivation is therefore a necessary event for further development of a tumor. In different types of tumors, the loss of the chromosome 10 region containing PTEN was observed in 23% (breast tumors) to 54% (glioblastomas) of cases [21]. On the other hand, angiogenesis is influenced by growth factors. A striking example of such influence is the stimulation of the HIF-1 gene by the insulin-like growth factor IGF1. This way of activating angiogenesis is also under negative control by PTEN. Another way to escape this control is mutational activation of genes suppressed by PTEN (PDK, AKT) [21]. Activated AKT kinase can affect such processes as metabolism, cell cycle, repair and regulation of the size of cells and organs. The loss of negative control over AKT may be a pathway for survival of tumor cells, since this protein can directly activate the expression of the above-mentioned HIF-1 gene, i.e., this signal pathway can induce angiogenesis [22].
Apart from growing more rapidly than normal cells, tumor cells use glycolysis as the main source of energy. On reaching a large size, the tumor becomes a "trap" for nutrients. The whole organism begins to work to maintain the tumor. In most cases of cancer, tumor cells with defective mitochondria are found. Such cells derive energy from glycolysis rather than from oxidative phosphorylation. Oxygen deficiency enhances glycolysis in normal cells, but a total absence of oxygen, and consequently of oxidative phosphorylation, causes apoptosis. Oxidative phosphorylation is possible only with a fully functional electron-transport chain. A full chain consists of 87 proteins, of which 13 are encoded by the mitochondrial genome. Even with a normal respiratory chain, spontaneous leakage of electrons (mainly from complexes I and III) and generation of superoxide radicals can occur [23]. Mutations affecting the integrity of the chain lead to a sharp increase in the quantity of reactive oxygen species. A peculiarity of mitochondrial mutations is that they are not subject to selection. Because two types of mitochondria (normal and mutant) are simultaneously present in cells, a defect in the respiratory chain manifests itself only after their segregation (figure 1).
Figure 1. A scheme of carcinogenesis induction as a result of a mitochondrial mutation.
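The segregation of normal and mutant mitochondria described above can be pictured with a small random-partition simulation. This is only a sketch under stated assumptions: every mitochondrion is duplicated before division, one daughter cell receives a random half, no selection acts on either type, and the numbers of mitochondria and generations are hypothetical.

```python
import random


def next_generation(mutant: int, total: int, rng: random.Random) -> int:
    """Duplicate every mitochondrion, then pass a random half to one daughter."""
    pool = [True] * (2 * mutant) + [False] * (2 * (total - mutant))
    daughter = rng.sample(pool, total)   # sampling without replacement
    return sum(daughter)                 # number of mutant mitochondria inherited


def follow_lineage(mutant: int, total: int, generations: int, seed: int = 0) -> list[int]:
    """Track the mutant count along a single cell lineage."""
    rng = random.Random(seed)
    history = [mutant]
    for _ in range(generations):
        mutant = next_generation(mutant, total, rng)
        history.append(mutant)
    return history


# Hypothetical cell with 20 mitochondria, 5 of them mutant (assumed values).
history = follow_lineage(mutant=5, total=20, generations=50, seed=1)
print("mutant mitochondria per generation:", history)
```

Because the partition is random, the mutant fraction drifts from one division to the next; a lineage that happens to reach all-mutant or all-normal stays there, which is the sense in which the defect shows itself only after segregation.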
A constant oxidative stress imparts to cells a mutator phenotype characteristic of tumors. A normal cell cannot exist in such conditions for a long time. Its further existence is possible only when the pathways of apoptosis are blocked. Thus, mutations of genes encoding mitochondrial proteins can initiate carcinogenesis or participate in its progression. The occurrence of secondary foci of carcinogenesis as a result of metastasis is the final step in tumor development. Cancer cells spread in the body through the circulatory or
lymphatic system. Multiple tumors occurring in different organs disturb their functions and lead to a lethal outcome. The step of invasion, as well as the subsequent step of metastasis, is under complex genetic control. A chain of consecutive changes in the structure of tumor tissue is required for its realization. First, to acquire the capacity for migration, cancer cells must break their connection with the tissue, which is maintained through cellular contacts. Genes controlling the process of cell adhesion represent the first group of metastasis suppressors, including genes encoding calcium-dependent cadherins such as E-cadherin. This glycoprotein is involved not only in the formation of intercellular contacts (in the form of cadherin bridges) but also in the transmission of antigrowth signals. After disruption of intercellular connections, tumor cells must penetrate into the circulatory or lymphatic system. But normal tissues surrounding the tumor and the vascular walls still remain a barrier for tumor cells. Matrix metalloproteases (MMP) help them to overcome this barrier. These proteases belong to the families of transmembrane and secreted proteins. These enzymes, secreted by tumor cells, degrade the components of the intercellular matrix such as fibrin, type IV collagen (a basic matrix component), etc. The next step is metastasis, the integration of tumor cells into another tissue. To grow in a foreign tissue, they have to adapt to the new microenvironment. The process involves integrins, which belong to a family of heterodimeric transmembrane glycoproteins (more than 200). Although integrins normally participate in different processes, from proliferation to the regulation of apoptosis, their function in the process of adaptation is to anchor tumor cells in the new tissue [24]. After that, the integration of cancer cells ends and the formation of a secondary focus of tumor development begins.
This stage presents a major concern because numerous metastases make the surgical methods of treatment insufficient.
GENES INVOLVED IN CARCINOGENESIS
In the course of cancer development, genes involved in this process change their functional status. By the character of the changes, the genes are divided into proto-oncogenes and tumor-suppressor genes. The protein products of proto-oncogenes are components of a signal system that controls the processes of cell proliferation and differentiation (the family of G-proteins, integrins). The carcinogenic event for proto-oncogenes is their activation, which occurs in three ways. The first way is a mutation in the coding or regulatory region of a proto-oncogene; the second is proto-oncogene amplification (the family of MYC genes; the N-MYC oncogene reaches 200 copies per cell in retinoblastomas and neuroblastomas); the third involves chromosome rearrangements (the Philadelphia chromosome, detected in chronic myeloid leukemia, arises in 95% of cases from a translocation of a fragment of chromosome 9 onto chromosome 22 with the formation of a chimeric gene, BCR-ABL) (figure 2 A). After activation, the proto-oncogene becomes an oncogene. The result of proto-oncogene activation is usually the enhancement of proliferation. The oncogene dominates over the remaining normal copy of the proto-oncogene.
Suppressors are genes involved in the negative control of the cell cycle (genes RB, P53, P21) and genes maintaining genomic stability (RAD51, BRCA1, BRCA2, ATM, MLH1, MSH2, PMS2). Inactivation of both alleles of a suppressor gene is required for induction of carcinogenesis, i.e., in this case the normal allele of the gene dominates over the mutant one.
Figure 2. Karyotypic analysis of human malignant cells. Left (A) – the karyotype of a patient with chronic myelocytic leukemia (translocation of a fragment of chromosome 22 to chromosome 9). Right (B) – the karyotype of a patient with bladder carcinoma demonstrating a great number of chromosomal fragments and chromosomal rearrangements.
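The contrast drawn above between the two gene classes (one activated copy of a proto-oncogene is enough, whereas both copies of a suppressor gene must be inactivated) can be stated as two small predicates. The allele labels and function names are hypothetical and serve only to make the dominance relationships explicit.

```python
def oncogene_is_driving(alleles: tuple[str, str]) -> bool:
    """An activated copy dominates over the remaining normal copy."""
    return any(allele == "activated" for allele in alleles)


def suppressor_is_lost(alleles: tuple[str, str]) -> bool:
    """A normal copy dominates, so both copies must be inactivated."""
    return all(allele == "inactivated" for allele in alleles)


# Illustrative genotypes (labels are assumptions, not notation from the source).
print(oncogene_is_driving(("activated", "normal")))    # one hit is enough
print(suppressor_is_lost(("inactivated", "normal")))   # one hit is not enough
print(suppressor_is_lost(("inactivated", "inactivated")))
```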
TUMOR-SUPPRESSOR GENES
Inactivation of a suppressor gene is easily revealed by the loss of expression of one of the alleles of a heterozygous locus (loss of heterozygosity). Such losses are often associated with the loss of a chromosomal fragment. One of the regions most often lost very early in tumor development is locus p13.1 on chromosome 17 [25]. This locus carries one of the most important tumor-suppressor genes, P53. Its contribution to preventing the transformation of normal cells into cancer cells is difficult to overestimate. The rate of P53 gene inactivation in tumors of different types is higher than 50%. The P53 gene controls such vitally important processes as repair, recombination, cell cycle and apoptosis. It can participate in these processes both directly (in repair and recombination) and through activation of other genes involved in them (regulation of cell cycle and apoptosis). The P53 gene is exceptional in that 80% of all its mutations are amino acid substitutions (the frequency of missense mutations in other suppressor genes is about 10%). The P53 protein is small (393 amino acids) and highly conserved. Therefore, almost all missense mutations lead to disturbances in its functions. Inherited P53 gene mutations result
in the early development of the Li-Fraumeni syndrome, which is characterized by an elevated frequency of all tumor types in the organism. In 1992, Vogelstein and Kinzler [26] postulated that a tetrameric form of the P53 protein is necessary for the normal functioning of cells. They proposed five possible mechanisms of P53 gene inactivation:
1. loss of one or both gene copies as a result of a deletion, leading to reduced expression of cell growth inhibitors;
2. a truncated protein (due to nonsense mutations, frame-shift mutations or mutations disturbing splicing) that prevents P53 oligomerization, since the oligomerization domain is located at the C-terminus of the protein (figure 4);
3. missense mutations producing a dominant negative effect on the protein functions;
4. degradation of P53 caused by interaction with the human papilloma virus E6 oncoprotein;
5. degradation of P53 resulting from overexpression of its antagonist, the MDM2 gene.
Numerous DNA lesions induce an increase in P53 gene expression. The P53 protein activates transcription of the P21WAF1 gene, which encodes an inhibitor of cyclin-dependent kinases. This protein inhibits the activity of cyclin-kinase complexes, thus arresting the cell in the G1 phase (figure 3). A turning point in passing the checkpoint between the G1 and S phases of the cell cycle is RB protein phosphorylation. RB was the first tumor-suppressor gene to be discovered. Its function is negative control of the cell cycle. In normal cells, the gene is expressed during the whole cell cycle. Complete phosphorylation of the protein occurs when the cell enters the S phase. In the hypophosphorylated state, the protein binds to the free E2F transcription factor (figure 3). In complex with RB, this factor is unable to activate transcription of genes responsible for proliferation. This results in the arrest of the cell in G1 (figure 3) [27]. The RB-E2F complex binds to deacetylases of the HDAC family (HDAC1, HDAC2).
Such a triple complex sits on nucleosomes and preserves the "closed" form of chromatin in the region of promoters of cell proliferation genes and thus impedes their transcription. With RB protein phosphorylation, the complex disintegrates and the structure of chromatin becomes "open" under the action of acetylases [28]. RB phosphorylation is mediated by cyclin-dependent kinases. In early G1, the CDK4 and CDK6 kinases bound to cyclin D phosphorylate the C-terminal region of the protein, and the deacetylase is thus released. At the next stage, the CDK2 kinase bound to cyclin E continues phosphorylation and the RB-E2F complex disintegrates (figure 3). As a result of a mutation in the regulatory region of the cyclin D1 gene, its expression may increase. An excess of cyclin D competitively binds to RB instead of E2F, leading to tumor formation, since E2F becomes able to stimulate cell proliferation constantly.
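The chain of events at the G1/S checkpoint described above (DNA damage, P53, P21, cyclin-dependent kinases, RB, E2F) can be traced with a deliberately simplified Boolean sketch. It ignores growth signals, partial phosphorylation and HDAC binding, and the function and parameter names are hypothetical.

```python
def g1_s_checkpoint(dna_damage: bool, p53_functional: bool = True,
                    rb_functional: bool = True,
                    cyclin_d_excess: bool = False) -> str:
    """Follow the P53 -> P21 -| CDK -> RB -| E2F chain as on/off states."""
    p53_active = dna_damage and p53_functional
    p21_expressed = p53_active            # P53 activates P21 transcription
    cdk_active = not p21_expressed        # P21 inhibits cyclin-kinase complexes
    # Hypophosphorylated RB holds E2F; active CDKs phosphorylate RB, and
    # excess cyclin D occupies RB in place of E2F (as stated in the text).
    rb_holds_e2f = rb_functional and not cdk_active and not cyclin_d_excess
    e2f_free = not rb_holds_e2f
    return "enter S phase" if e2f_free else "arrest in G1"


print(g1_s_checkpoint(dna_damage=True))                          # intact checkpoint
print(g1_s_checkpoint(dna_damage=True, p53_functional=False))    # P53 lost
print(g1_s_checkpoint(dna_damage=True, rb_functional=False))     # RB lost
print(g1_s_checkpoint(dna_damage=True, cyclin_d_excess=True))    # cyclin D excess
```

In this toy model, only the fully intact pathway arrests a damaged cell in G1; removing any single link lets E2F drive the cell into S phase, which mirrors why each of these genes can act as a point of failure.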
Figure 3. The mechanism of cell cycle blocking by P53 protein in response to stress conditions.
An alternative pathway of tumor transformation of cells is associated with the loss of P53 gene expression. This event may be caused by overexpression of the MDM2 gene. The protein encoded by this gene binds to the P53 protein, inducing its ubiquitination and subsequent degradation by proteasomes. A polymorphism in the MDM2 gene promoter (substitution of T by G at nucleotide –309) enhances expression of the gene. Cells with the genotypes 309 G/G and G/T enter apoptosis less frequently under the action of radiation. Hence, the probability of their malignant transformation is higher than that of cells with the 309 T/T genotype [20]. Carriers of these genotypes (G/G and G/T) have a high risk of lung cancer [30]. Another important function of the P53 protein is negative control of homologous recombination. P53 binds to RAD51 (a homologous recombination protein) and inhibits recombination if the length of a completely homologous region does not exceed 200 bp. In this way, the stability of a genome enriched with different types of repeats is maintained. A very small amount of functional P53 protein is sufficient for recombination inhibition. Conversely, a small amount of the dominant negative mutant P53 protein is sufficient to remove the 200 bp barrier to homologous recombination. It is of interest that some hot spots of mutagenesis in the P53 gene are located in regions responsible for the interaction of P53 with RAD51 (figure 4). Indeed, a lot of chromosome aberrations are observed in cancer cells. Thus, tumor initiation or progression can occur through mutations increasing the frequency of recombination [31].
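For a two-allele polymorphism such as the T/G substitution in the MDM2 promoter mentioned above, the share of a population carrying the risk genotypes (G/G and G/T) can be estimated from the allele frequency. The sketch below assumes Hardy-Weinberg proportions, which the source does not state, and uses a hypothetical G allele frequency.

```python
def hardy_weinberg(freq_g: float) -> dict[str, float]:
    """Genotype proportions expected for a given G allele frequency."""
    if not 0.0 <= freq_g <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    freq_t = 1.0 - freq_g
    return {
        "T/T": freq_t ** 2,
        "G/T": 2.0 * freq_g * freq_t,
        "G/G": freq_g ** 2,
    }


def carrier_fraction(freq_g: float) -> float:
    """Share of individuals with at least one G allele (G/T or G/G)."""
    genotypes = hardy_weinberg(freq_g)
    return genotypes["G/T"] + genotypes["G/G"]


# Hypothetical allele frequency, used only to show the calculation.
for genotype, share in hardy_weinberg(0.3).items():
    print(f"{genotype}: {share:.2f}")
print(f"carriers of G: {carrier_fraction(0.3):.2f}")
```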
Figure 4. Frequencies of somatic mutations in the coding part of P53 gene in all tumor types in projection on the protein domain structure.
In the case of DNA damage, P53 activation can induce either a delay of the cell cycle, during which the damage is repaired, or apoptosis. This choice is strongly influenced by proto-oncogenic proteins of the MYC family [32]. MYC binds to the promoter of the P21/WAF1 gene and blocks the induction of its transcription by P53. Thus, the P53-dependent response is switched from a cell cycle delay to apoptosis. In a critical situation, P53 can trigger the program of apoptosis through two pathways. The first is the activation of proapoptotic genes (BAX, APAF-1) directly involved in apoptosis. The second is that the P53 protein is able to bind and inhibit antiapoptotic proteins (Bcl-2, Bcl-xL) [33]. The central part in the realization of the program of apoptosis is played by the caspase pathway (a chain of proteolytic reactions), which is activated by cytochrome C. In normal cells, cytochrome C is present only in mitochondria. Its release into the cytoplasm is controlled by the family of Bcl-2 proteins, which includes activators (Bax, Bad, Bid) and inhibitors (Bcl-2, Bcl-xL) of apoptosis. The proteins of this family are present in the outer membranes of mitochondria, and the ratio of activators to inhibitors determines the fate of the cell [34]. The P53 protein binds to the protein inhibitors of apoptosis through its DNA-binding domain. Thus, mutations in this domain can simultaneously inactivate two pathways of apoptosis. The coding part of the P53 gene has two polymorphic regions with substitutions in the protein amino acid sequence (substitutions of proline-47 by serine and arginine-72 by proline). Both polymorphisms are associated with a low apoptotic response to DNA damage [35], [36], [37]. The serine-47 polymorphic variant is rare in the human population, in contrast
to the codon 72 polymorphism. The frequency of the proline-72 allele was found to increase in the population as the equator is approached. Although this allele is associated with a low apoptotic response its wide distribution in the equatorial regions of the Earth is explained by a higher resistance to UV-radiation of its carriers [38]. P53 protein activation is achieved through the sensor system which signals about DNA damage. This system includes such proteins as ATM, CHEK2 and ATR. ATM gene mutations result in the development of ataxia-telangiectasia. This disease is characterized by defects in the immune system, hypersensitivity to radioactivity, genomic instability and a high risk of leukemia. With occurrence of DNA damage ATM phosphorylates P53 and CHEK2 (figure 5) [27]. Activated CHEK2 also phosphorylates P53, and the interaction of P53 with MDM2 and consequently P53 degradation are inhibited [27]. Phosphorylation of MDM2 by ATM leads to the same result.
Figure 5. A scheme of interaction of the products of tumor-suppressor genes in the case of DNA damage.
DNA damage occurring in the S phase can be repaired owing to a temporary delay in the beginning of the G2 phase. An active Cdc25C protein is needed for proceeding to G2, but DNA damage provokes its phosphorylation by CHEK2. Inactivated Cdc25C moves from the nucleus into the cytoplasm, arresting the cell cycle in the S phase. In turn, the dephosphorylated Cdc25C protein activates the cyclin-dependent kinase CDK2, promoting the progression of G2 and subsequent mitosis (figure 2) [27]. The second sensor of DNA damage is the ATR gene. In contrast to ATM, ATR responds not only to double-stranded DNA breaks but also to damage caused by UV radiation and to replication errors. Activated ATR can control both G1 and G2 [27]. In parallel with blocking
the cell cycle, ATR and ATM activate DNA repair proteins. One such activated protein is BRCA1 (figure 5). BRCA1 is the central component of the BASC protein complex. This complex includes more than ten proteins [39]. BRCA1 and BRCA2 form the core of the complex, on the surface of which all the other proteins (constant or temporary components of the complex) are located (figure 6) [40]. When double-stranded DNA breaks occur, the BRCA1 protein is phosphorylated by the ATM kinase and the whole BASC complex moves to the region of damage. The double-stranded breaks are repaired via homologous recombination. This process directly involves proteins of the complex: MRE11, NBS1, RAD50 and RAD51 (a homologous recombination protein). In the absence of DNA damage, BRCA2 binds to RAD51 and inhibits its activity [41]. Germline mutations in the BRCA1 and BRCA2 genes are associated with the hereditary form of breast and ovarian cancers. In addition to the above-mentioned proteins, the BASC complex includes components of the mismatch repair system (repair of unpaired bases): MSH2, MSH6 and MLH1 [39]. Proteins of this system correct replication errors and inhibit spontaneous recombination between DNA regions of low homology. As in the case of the P53 protein, mutations of the MSH2, MSH6 and MLH1 proteins can result in microsatellite instability [42]. Thus, proteins of the BASC complex play an important role in maintaining genomic stability.
PROTO-ONCOGENES
One of the first oncogenes to be discovered was v-Myc of the chicken sarcoma virus. Later, its homologs were found in the human genome: c-Myc, N-MYC and L-MYC. The most important in the process of carcinogenesis is c-Myc. Overexpression of this gene was observed in 90% of cases of reproductive system cancer, in 80% of breast cancer cases, in 70% of intestinal cancer cases and in 50% of liver carcinoma cases. The c-Myc gene encodes a protein that is a transcription factor. In normal cells, the protein regulates the expression of genes responsible for cell proliferation and is also involved in such processes as apoptosis, metabolism, differentiation and adhesion [43]. In normal cells, c-Myc is weakly expressed, but its expression sharply increases in response to growth factors that appear at a certain time of the cell cycle. The functions of the c-Myc and N-MYC genes are negatively regulated by the RB protein, but when RB binds to some transforming proteins of DNA-containing viruses (T antigen of the SV40 virus, adenovirus E1A, E7 of human papilloma virus types 16 and 18), the MYC genes are activated and cell proliferation increases [27]. In turn, c-Myc gene overexpression induces apoptosis via the P53 protein [44]. Despite this, the c-Myc oncogene is overexpressed in cancer cells, suggesting changes in genes controlling programmed cell death. The c-Myc protein induces the expression of genes that are required for a cell to enter the S phase. These are genes encoding D-type cyclins, cyclin E and the cyclin-dependent kinase CDK4. Cyclin-dependent kinases activated by interaction with cyclins promote cell progression from G1 into S.
Expression of MYC genes is regulated by proto-oncogenes of the RAS family. Oncogenic mutations in genes of this family are found in different human tumors with a frequency from 10% to 95% [45]. The N-RAS gene is most susceptible to mutations. The RAS family is composed of genes encoding GTP-binding proteins that are anchored in the cell plasma membrane. Their normal function is the modulation of an extracellular growth signal on its way from the tyrosine kinase receptor to effector proteins (Ral-GEF, Rafs, PI3-K and MEKK) [45]. It was established long ago that RAS and MYC cooperate in the induction of carcinogenesis [46]. The activity of MYC is modulated by the RAS protein through two independent mechanisms. The first is triggered by the Raf-1 protein, followed by a cascade of activation of signal kinases (MEK and ERK/MAPK) that leads to MYC phosphorylation at the serine in position 62 (figure 7). This modification stabilizes the protein and thus prolongs its half-life. The second mechanism of regulation is realized through the phosphoinositide-3-kinase pathway (PI3K-AKT). This mechanism leads to the inhibition of glycogen synthase kinase-3 (GSK-3) activity and prevents MYC phosphorylation at the threonine in position 58 (figure 7). MYC phosphorylation at this amino acid residue is a necessary event for rapid proteolytic degradation of the protein. Thus, both regulatory pathways prolong MYC protein activity. Mutant proteins of the RAS and MYC families induce uncontrolled proliferation of cells and the development of resistance to apoptotic signals [47].
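Why a longer half-life translates into more MYC protein can be shown with first-order decay. The sketch assumes constant synthesis and exponential degradation, which is a modeling choice rather than a statement from the source, and the half-lives and synthesis rate are hypothetical.

```python
import math


def remaining_fraction(minutes: float, half_life_min: float) -> float:
    """Fraction of protein left after a given time with no new synthesis."""
    return 0.5 ** (minutes / half_life_min)


def steady_state_level(synthesis_per_min: float, half_life_min: float) -> float:
    """Level at which constant synthesis balances first-order degradation."""
    decay_constant = math.log(2) / half_life_min
    return synthesis_per_min / decay_constant


# Hypothetical half-lives: unstabilized MYC versus MYC stabilized by RAS signaling.
baseline = steady_state_level(synthesis_per_min=1.0, half_life_min=30.0)
stabilized = steady_state_level(synthesis_per_min=1.0, half_life_min=60.0)
print(f"steady-state ratio (stabilized / baseline): {stabilized / baseline:.1f}")
print(f"fraction left after 30 min, baseline:   {remaining_fraction(30, 30):.2f}")
print(f"fraction left after 30 min, stabilized: {remaining_fraction(30, 60):.2f}")
```

In this model the steady-state level is proportional to the half-life, so either mechanism (promoting phosphorylation at serine 62 or preventing it at threonine 58) raises the amount of active MYC without any change in its synthesis.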
Figure 6. The domain structure of BRCA1, BRCA2 and sites of protein-protein interaction.
Figure 7. A scheme of MYC activity regulation by RAS protein under the action of extracellular growth signals.
The cooperative interaction of the Myc and Ras proto-oncogenes was studied in transgenic mice. Both genes, fused with the promoter of the murine mammary tumor virus (MMTV), were transferred into the mouse genome. The result of overexpression of the Ras and Myc proto-oncogenes was an increased frequency of mammary and salivary gland tumors as well as lymphoid tissue tumors. It was also found that mice with both transgenes were distinguished by a higher rate of tumor formation than mice with one of the transgenes (figure 8A) [46]. A similar synergistic interaction was obtained for the oncogenes Wnt and HER-2 (figure 8B). The character of the tumors was similar to that of tumors formed in mice with the oncogenes Myc and Ras. This result is explained by the fact that the Wnt and Myc genes are components of one signal pathway and HER-2 and Ras of another. When transgenic mice had only one oncogene, either Wnt or c-Myc, the frequency of somatic mutations in the Ras gene in tumors was elevated [48]. Thus, germline mutations predetermine which mutational events will subsequently be advantageous for tumor progression.
Figure 8. Kinetics of tumor development in female mice carrying two oncogenic transgens myc and rasD – individually and in combination (A) and in a similar experiment with oncogenic transgens Wnt1 and Her-2 (B). The results demonstrate a cooperative effect of tumor induction in the case of two mutations.
M. M. Aslanyan, S. S. Litvinov, E. S. Tsyrendorzhieva et al.
GENES OF BIOTRANSFORMATION SYSTEM
Detoxication genes are those responsible for the metabolism, degradation, detoxication and excretion of xenobiotics and other chemical compounds. It is the polymorphic variants of these genes that determine the individual reactions of the organism to different chemicals and foodstuffs [49]. Numerous epidemiological studies indicate that many common diseases, including different forms of cancer, are associated to a variable degree with unfavorable environmental factors, among which smoking and low-quality foodstuffs are the most serious. On entering the organism, most foreign compounds (xenobiotics) do not produce a direct biological effect. To be cleared from the body, xenobiotics undergo enzymatic transformation (biotransformation). Detoxication of toxic cell metabolites and xenobiotics proceeds in liver cells in two stages. Reactions of the first stage are catalyzed by the monooxygenase system, the components of which are incorporated in the membranes of the endoplasmic reticulum. The reactions of oxidation, reduction or hydrolysis are the first step in the removal of hydrophobic molecules from the body. They transform substances into polar water-soluble metabolites. The main enzyme of the first stage of detoxication is cytochrome P-450. Many isoforms of this enzyme have been identified and assigned to several families according to their properties and functions. Thirteen subfamilies of cytochrome P-450 are known in mammals [50]. It is considered that enzymes of families I-IV are involved in the biotransformation of xenobiotics and the rest metabolize endogenous compounds (steroid hormones, prostaglandins, fatty acids, etc.) [51]. At the first stage of biotransformation, hydroxyl, carboxyl, thiol and amino groups are formed, after which the molecule can undergo further transformation and be removed from the body. Besides cytochrome P-450, the first stage of biotransformation involves cytochrome b5 and cytochrome reductase.
At the first stage of biotransformation many drugs are converted into active forms and produce the desired therapeutic effect. However, some xenobiotics are not detoxicated by the monooxygenase system and instead become more reactive. The metabolic products of foreign compounds formed at the first stage of biotransformation undergo further detoxication through a series of second-stage reactions. The resulting compounds are more polar and therefore easily removed from cells. The predominant process is conjugation catalyzed by glutathione-S-transferase, sulfotransferase and UDP-glucuronyltransferase. Conjugation with glutathione, which gives rise to mercapturic acids, is commonly regarded as the main mechanism of detoxication [52]. The most widespread enzymes of the second phase of biotransformation belong to the superfamily of glutathione-S-transferases (GST). The enzymes of this family catalyze the conjugation of reduced glutathione (GSH) with many electrophilic substances, participate in the metabolism of prostaglandins and leukotrienes and in the transport of steroid hormones, and play an important part in protecting cells from carcinogenic compounds. Genes encoding detoxication enzymes are highly polymorphic, and the frequencies of their polymorphic variants show significant, historically evolved differences between populations, ethnic groups and races [53, 54]. Polymorphism of these genes is associated with changes in, or a complete loss of, the activity of the enzymes they encode. The risk of onset of different
cancers increases in the case of unfavorable combinations of functionally defective variants of several genes involved in different phases of detoxication. A high activity of phase 1 cytochromes in combination with a low activity of phase 2 enzymes is the most unfavorable precursor of cancer and other multifactorial diseases. For instance, the combination of homozygosity for full deletions of the GSTM1 and GSTT1 genes with homozygosity for the MspI polymorphism in the CYP1A1 gene is associated with a high risk of breast cancer. The genes of the biotransformation system are the most sensitive to interaction with environmental factors (foodstuffs, drugs, alcohol, tobacco and narcotics) and thus determine the predisposition of specific genotypes to different diseases.
GENETIC SUBDIVISION OF THE HUMAN POPULATION ASSOCIATED WITH THE RISK OF TUMOR DEVELOPMENT
Since 2001, our team, in cooperation with the Laboratory of Clinical Genetics of the Russian Research Oncological Center of the Russian Academy of Medical Sciences, has been studying the relationship between polymorphism of genes controlling cell division, repair and apoptosis and the development of sporadic forms of breast cancer in women. Table 1 presents the characteristics of the twenty-one polymorphic sites studied in the work. In most cases nucleotide substitutions at these sites lead to amino acid substitutions. Blood DNA samples from a group of women with breast cancer (151 patients) and from a control group (191 individuals) have been analyzed. Polymorphism was studied by two methods: PCR-RFLP and PCR with modifying primers (dCAPS). The genes under study control the synthesis of proteins constituting the multiprotein complex BASC [39]. The key role in this complex is played by the proteins BRCA1, BRCA2, P53, etc. There are good grounds to believe that this complex is essential for recognizing structural changes in DNA and for regulating postreplicative repair [55]. It has been established that the loss of function of a set of genes encoding the proteins of the complex has a profound impact on the development of malignant tumors in different organs, including the breast, ovaries, prostate, large intestine, pancreas and stomach [56]. Amino acid substitutions in certain domains of the interacting proteins of the BASC complex can reduce its functional activity and destabilize the genome. We found previously that the genetic structure of the cohort of women with breast cancer differs from that of the control group. In addition, genotypes characterized by an increased risk of breast cancer and genotypes with a low risk of tumor formation have been identified [57-59].
Analysis of 34 polymorphic sites in 18 tumor-suppressor genes has revealed genotypes whose risk of breast cancer in women differs by several tens of times (table 2). The established associations of polymorphic alleles with a high risk of tumor formation may account for 35-40% of cases of sporadic breast cancer in women.
Table 1. Genes and polymorphic sites studied in the work

| Gene | Exon | Codon | Nucleotide position in mRNA | Nucleotide substitution | Amino acid substitution | Symbol |
|---|---|---|---|---|---|---|
| DNA repair genes | | | | | | |
| BRCA1 (exons 11, 13, 16) | | 356 | 1186 | A>G | Gln>Arg | A |
| | | 693 | 2196 | G>A | Asp>Asn | - |
| | | 694 | 2201 | T>C | Ser>Ser | S1 |
| | | 771 | 2430 | T>C | Leu>Leu | S2 |
| | | 1038 | 3232 | A>G | Glu>Gly | S3 |
| | | 1040 | 3238 | G>A | Ser>Asn | - |
| | | 1183 | 3667 | A>G | Lys>Arg | S4 |
| | | 1431 | 4410 | T>C | Ser>Pro | S5 |
| | | 1436 | 4427 | T>C | Ser>Ser | S6 |
| | | 1605 | 4932 | T>C | Leu>Leu | - |
| | | 1613 | 4956 | A>G | Ser>Gly | S7 |
| | | 1628 | 5002 | T>C | Met>Thr | - |
| BRCA2 (exons 10, 11, 27) | | 372 | 1342 | A>C | Asn>His | M |
| | | 991 | 3199 | A>G | Asn>Asp | - |
| | | 1420 | 4486 | G>T | Tyr>Asp | - |
| | | 1915 | 5972 | C>T | Thr>Met | - |
| | | 3412 | 10462 | A>G | Ile>Val | - |
| RAD51 | 5' untranslated region | | 135 | C>G | - | R |
| NBS1 | 5 | 185 | 663 | C>G | Gln>Glu | T |
| MSH6 | 1 | 39 | 172 | G>A | Gly>Glu | W |
| MSH3 | 23 | 1036 | 3386 | A>G | Thr>Ala | E |
| XRCC1 | 10 | 399 | 1744 | G>A | Arg>Gln | X |
| OGG1 | 7 | 326 | 1245 | C>G | Ser>Cys | U |
| APEX1 | 6 | 148 | 776 | T>G | Asp>Glu | V |
| BRIP1 | 19 | 919 | 2896 | C>T | Pro>Ser | P |
| Genes of biotransformation | | | | | | |
| CYP11B1 | 8 | 386 | 4536 | T>C | Ala>Val | - |
| GSTT1 | Gene deletion | | | | | - |
| GSTM1 | Gene deletion | | | | | - |
| Genes of cell cycle control and apoptosis | | | | | | |
| P53 | 4 | 72 | 12139 | G>C | Arg>Pro | J |
| P53 | 6 | 213 | 890 | A>G | Arg>Arg | - |
| BARD1 | 7 | 557 | 1743 | G>C | Cys>Ser | - |
| P21 | 2 | 31 | 187 | C>A | Ser>Arg | Q |
| Gene of regulation of a cell response to hypoxia | | | | | | |
| HIF1A | 12 | 582 | 1772 | C>T | Pro>Ser | - |
| Housekeeping gene | | | | | | |
| MTHFR | 5 | 222 | 849 | C>T | Ala>Val | F |
Table 2. Association of polymorphic sites for which the risk of tumor development is statistically significantly different from the average populational one

| № | Combination of genes | Genotype* | Proportion (%), case | Proportion (%), control | χ² | P | OR |
|---|---|---|---|---|---|---|---|
| Low risk | | | | | | | |
| 1 | RAD51 NBS1 P21 | Rr tt QQ | 0.7 | 6.3 | 7.29 | 0.007 | 0.14 |
| 2 | BRCA1 P53 | SS jj | 0.7 | 6.8 | 8.11 | 0.004 | 0.13 |
| 3 | APEX P53 | Vv jj | 0 | 5.2 | 8.14 | 0.004 | 0.06 |
| High risk | | | | | | | |
| 1 | BRCA1 MSH6 | ss Ww | 7.3 | 1.0 | 8.97 | 0.003 | 6.2 |
| 2 | OGG1 NBS1 MSH6 | UU TT Ww | 10.6 | 2.6 | 9.31 | 0.002 | 4.1 |
| 3 | BRCA2 NBS1 MSH6 | Mm TT Ww | 6.6 | 1.0 | 7.74 | 0.005 | 5.6 |
| 4 | BRCA1 OGG1 NBS1 MSH6 | AA UU TT Ww | 9.3 | 1.0 | 12.19 | 0.0005 | 9.7 |
| 5 | APEX BRCA2 MSH6 | VV Mm Ww | 6.0 | 0.5 | 8.78 | 0.003 | 8.5 |
| 6 | NBS1 APEX XRCC1 | TT VV XX | 5.3 | 0.5 | 7.50 | 0.006 | 7.5 |
| 7 | RAD51 XRCC1 BRCA2 | Rr XX Mm | 8.6 | 2.1 | 7.58 | 0.006 | 4.1 |

* Frequent alleles in genotypes are designated by capital letters and rare alleles by small letters.
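The text does not state how the odds ratios in table 2 were calculated. As a hedged illustration only, the sketch below assumes a 2x2 case-control table with 0.5 added to every cell (the Haldane-Anscombe correction), which keeps the estimate finite when no patient carries the genotype, as in the low-risk row with 0% of cases. The carrier counts are assumptions: they are reconstructed by rounding the tabulated percentages against the group sizes of 151 patients and 191 controls given in the text, and the function name `odds_ratio` is introduced here for illustration.

```python
def odds_ratio(case_pos, case_total, ctrl_pos, ctrl_total, correction=0.0):
    """Odds ratio for a 2x2 case-control table.

    `correction` is added to every cell; 0.5 gives the Haldane-Anscombe
    estimator, which stays finite when a cell count is zero.
    """
    a = case_pos + correction                 # carriers among cases
    b = case_total - case_pos + correction    # non-carriers among cases
    c = ctrl_pos + correction                 # carriers among controls
    d = ctrl_total - ctrl_pos + correction    # non-carriers among controls
    return (a * d) / (b * c)


# Group sizes are taken from the text; carrier counts are ASSUMED,
# reconstructed from the rounded percentages in table 2.
N_CASES, N_CONTROLS = 151, 191
rows = {
    "low risk 1 (Rr tt QQ)": (1, 12),   # 0.7% of cases, 6.3% of controls
    "low risk 3 (Vv jj)": (0, 10),      # 0% of cases, 5.2% of controls
    "high risk 1 (ss Ww)": (11, 2),     # 7.3% of cases, 1.0% of controls
}

for label, (case_pos, ctrl_pos) in rows.items():
    estimate = odds_ratio(case_pos, N_CASES, ctrl_pos, N_CONTROLS, correction=0.5)
    print(f"{label}: OR = {estimate:.2f}")
```

Under these assumptions the three estimates round to 0.14, 0.06 and 6.2, in line with the corresponding rows of table 2; not every row is reproduced this way, so the correction should be read as a plausible reconstruction rather than the documented procedure.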
Table 3. Association of polymorphic sites with an elevated risk of breast cancer in enlarged examined groups (456 samples from patients and 299 control samples)

| Combination of genes | Genotype | Proportion (%), case | Proportion (%), control | OR | 95% confidence interval, lower | 95% confidence interval, upper | χ² | P |
|---|---|---|---|---|---|---|---|---|
| Low risk | | | | | | | | |
| NBS1 OGG1 | tt UU | 5.92 | 10.70 | 0.53 | 0.40 | 0.69 | 5.28 | 0.022 |
| APEX1 MTHFR | vv FF | 8.99 | 14.05 | 0.60 | 0.48 | 0.76 | 4.20 | 0.040 |
| APEX1 OGG1 | Vv UU | 12.94 | 18.73 | 0.64 | 0.53 | 0.79 | 3.98 | 0.046 |
| NBS1 P53 | tt Jj | 3.73 | 7.02 | 0.51 | 0.37 | 0.72 | 3.90 | 0.048 |
| NBS1 APEX1 | tt Vv | 3.73 | 7.02 | 0.51 | 0.37 | 0.72 | 3.90 | 0.048 |
| High risk | | | | | | | | |
| MTHFR OGG1 | ff Uu | 5.04 | 1.00 | 5.24 | 2.82 | 9.73 | 8.56 | 0.003 |
| APEX1 P21 | VV QQ | 24.56 | 16.39 | 1.66 | 1.37 | 2.01 | 5.66 | 0.017 |
| APEX1 BRCA2 | VV Yy | 12.72 | 7.02 | 1.93 | 1.48 | 2.52 | 5.60 | 0.018 |
| APEX1 BRCA1 | VV Ss | 14.47 | 8.70 | 1.78 | 1.39 | 2.27 | 4.95 | 0.026 |
| APEX1 XRCC1 | VV XX | 12.50 | 7.69 | 1.71 | 1.32 | 2.22 | 3.94 | 0.047 |
| NBS1 MSH6 | TT Ww | 15.57 | 8.70 | 1.94 | 1.52 | 2.47 | 6.64 | 0.010 |
| NBS1 RAD51 | TT Rr | 14.25 | 8.70 | 1.75 | 1.37 | 2.23 | 4.63 | 0.031 |
| NBS1 P21 | TT QQ | 37.72 | 28.43 | 1.52 | 1.30 | 1.79 | 4.58 | 0.032 |
| NBS1 OGG1 | TT UU | 29.82 | 21.74 | 1.53 | 1.29 | 1.82 | 4.43 | 0.035 |
| MSH3 OGG1 | ee UU | 6.36 | 3.01 | 2.19 | 1.48 | 3.23 | 4.03 | 0.045 |
| MSH6 BRCA2 | Ww Yy | 13.38 | 8.03 | 1.77 | 1.37 | 2.28 | 4.59 | 0.032 |
| RAD51 P53 | Rr Jj | 12.72 | 7.36 | 1.83 | 1.41 | 2.39 | 4.90 | 0.027 |
To improve the reliability of the results, the number of samples was later increased to 456 samples from women with cancer and 299 control samples. Genes analyzed in the larger groups are presented in bold type in table 1. Analysis of the distribution of genotype frequencies revealed statistically significant differences for the MTHFR gene. The frequency of the rare 222 Val/Val homozygotes was 9.4% in the group of patients and 4.7% in the control group (OR=2.12, P=0.02). Analysis of combinations of two genes identified a large number of genotypes associated with a high or low risk of breast cancer. The results are presented in table 3. Analysis of polymorphic variants of tumor-suppressor genes and proto-oncogenes makes it possible to identify molecular markers associated with cancer development and to develop diagnostic tools for early detection of the disease.
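The MTHFR comparison can be retraced with a short calculation. The sketch below is an illustration under stated assumptions, not the authors' procedure: the carrier counts (43 of 456 patients, 14 of 299 controls) are reconstructed from the rounded percentages, the test is assumed to be a Pearson chi-square without continuity correction, and the confidence interval uses the Woolf (logarithmic) method, which the text does not confirm. The function name `two_by_two_stats` is introduced here for illustration.

```python
import math


def two_by_two_stats(a, b, c, d):
    """Summary statistics for a 2x2 case-control table.

    a, b: carriers and non-carriers among cases
    c, d: carriers and non-carriers among controls
    """
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table, no continuity correction
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Woolf 95% confidence interval on the log odds ratio (assumed method)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return odds_ratio, chi2, p_value, ci


# ASSUMED counts: 9.4% of 456 patients and 4.7% of 299 controls, rounded
cases_val_val, cases_total = 43, 456
ctrl_val_val, ctrl_total = 14, 299

odds_ratio, chi2, p_value, ci = two_by_two_stats(
    cases_val_val, cases_total - cases_val_val,
    ctrl_val_val, ctrl_total - ctrl_val_val,
)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, P = {p_value:.3f}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```

With these reconstructed counts the odds ratio comes out at about 2.12 and the P value at about 0.016, which rounds to the reported P=0.02.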
CONCLUSION
It is now known that a combination of polymorphic variants of genes can underlie many polygenic diseases, such as atherosclerosis, osteoporosis, ischemia, diabetes, etc. [49]. Cancer is a classical example of a polygenic multifactorial disease. Its development is associated with the successive occurrence of mutations in a number of genes, both proto-oncogenes and tumor-suppressor genes [4]. An essential achievement in understanding the molecular mechanisms of carcinogenesis was the discovery of genes that suppress the formation of tumors and whose loss removes the negative regulation of cell proliferation. For instance, the development of breast cancer is controlled by many genes (BRCA1, BRCA2, P53, P21, RAD50, RAD51, Rb, MSH2, etc.). The key genes are BRCA1 and BRCA2. They encode multifunctional proteins, mutant forms of which determine predisposition to breast and ovarian cancer. Oncogenesis in individuals with a germline mutation in BRCA occurs upon inactivation of the corresponding wild-type allele in a somatic cell. Hundreds of mutations have been found in tumor-suppressor genes; most of them are missense mutations, represented by tens of polymorphic variants evolutionarily fixed in human populations. Knowledge of the nucleotide sequences of the main tumor-suppressor genes, and consequently of the amino acid sequences and the configuration of their protein products, makes it possible to establish the sites that determine the tertiary structure of the product. Knowledge of the sites where the gene products interact within a complex can facilitate the identification of haplotypes for polymorphic loci that correlate well with the development of certain types of tumors. Two forms of cancer are distinguished at present: familial and sporadic. The familial form is found among the members of one family, but it constitutes only 10% of the total incidence of cancer.
The familial form demonstrates a clear-cut hereditary pattern, and the probability of its occurrence in healthy members of the family is several times higher than in the case of sporadic cancer. The sporadic form of cancer also has a genetic component, but its structure may be much more complicated. In this case, a tumor develops under the action of not one hereditary mutation but several. The study of polymorphism of genes involved in carcinogenesis has revealed combinations of their
polymorphic states that increase the probability of cancer development. Owing to combinative variability, such genotypes are broken up by segregation as early as the next generation. This explains the absence of the disease among relatives of patients with sporadic cancer. In the analysis of twenty-one polymorphic sites in nine tumor-suppressor genes, we have identified associations of specific genotypes with a high risk of sporadic breast cancer that can determine the development of a malignant tumor in 30-35% of cases [58, 59].
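The argument that a risk genotype is broken up in the next generation can be made concrete with a small calculation. The sketch below is only an illustration under simplifying assumptions that are not taken from the study: the loci are unlinked, the second parent is drawn at random from the population, and the allele frequencies are hypothetical values chosen for the example. The function name `prob_child_repeats_genotype` is likewise introduced here.

```python
def prob_child_repeats_genotype(parent_genotype, allele_freq):
    """Probability that a child carries the same multilocus genotype as one parent.

    parent_genotype: two-character strings, one per locus, e.g. "Rr"
    allele_freq: population frequency of each allele symbol
    Assumes unlinked loci and a second parent drawn at random.
    """
    prob = 1.0
    for first, second in parent_genotype:
        # The known parent passes either allele with probability 1/2; the
        # child matches only if the random parent supplies the other one.
        prob *= 0.5 * allele_freq[second] + 0.5 * allele_freq[first]
    return prob


# HYPOTHETICAL allele frequencies, for illustration only
freqs = {"R": 0.8, "r": 0.2, "X": 0.6, "x": 0.4, "M": 0.7, "m": 0.3}

# A three-locus genotype written in the notation of table 2
risk_genotype = ["Rr", "XX", "Mm"]
print(round(prob_child_repeats_genotype(risk_genotype, freqs), 3))
```

With these illustrative frequencies only about 15% of the children of a carrier would inherit the complete three-locus combination, and the share falls further with every additional locus, which is consistent with the rarity of the disease among relatives of patients with sporadic cancer.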
REFERENCES
[1] Cancerogenesis [Russian], D.G. Zaridze ed., 2004, Meditsina, Moscow, 576 p.
[2] Baranova A.V., Yankovsky N.K. Geny-supressory opuholevogo rosta. Molekulyarnaya biologiya [Russian], 1998, t. 32, № 2, pp. 206-218.
[3] Gar'kavceva R.F., Gar'kavcev I.V. Molekulyarno-geneticheskie aspekty zlokachestvennyh novoobrazovanii. Vestnik Rossiiskoi Akademii medicinskih nauk [Russian], 1998, t. 2, pp. 38-44.
[4] Kiselev F.L. Geny stabilizacii DNK i kancerogenez. Molekulyarnaya biologiya [Russian], 1998, t. 32, № 2, pp. 197-205.
[5] Karp Gerald. Cell and molecular biology: Concepts and experiments, p. 705-720.
[6] Devilee P., Cleton-Jansen A.M., Cornelisse C.J. Ever since Knudson. Trends in Genetics, v. 17, № 10, p. 569-573.
[7] Zil'ber L.A. Virusnaya teoriya proishozhdeniya zlokachestvennyh opuholei. M., Medgiz, 1946, 72 p.
[8] Holliday R. Epigenetics: A Historical Overview. Epigenetics, 2006, v. 1, № 2, p. 76-80.
[9] Knudson A.G. Mutation and Cancer: Statistical Study of Retinoblastoma. Proceedings of the National Academy of Sciences, 1971, v. 68, № 4, p. 820-823.
[10] Tucker T., Friedman J.M. Pathogenesis of hereditary tumors: beyond the "two-hit" hypothesis. Clinical Genetics, 2002, v. 62, p. 345-357.
[11] Gallie B.L., Campbell C., Devlin H., Duckett A., Squire J.A. Developmental basis of retinal-specific induction of cancer by RB mutation. Cancer Research, 1999, v. 59, p. 1731-1735.
[12] Boland C.R., Ricciardiello L. How many mutations does it take to make a tumor? Proceedings of the National Academy of Sciences, 1999, v. 96, № 26, p. 14675-14677.
[13] Aplan P.D. Causes of oncogenic chromosomal translocation. Trends in Genetics, v. 22, № 1, p. 46-55.
[14] Sharpless N.E., DePinho R.A. Telomeres, stem cells, senescence, and cancer. The Journal of Clinical Investigation, 2004, v. 113, № 2, p. 160-168.
[15] Muntoni A., Reddel R.R. The first molecular details of ALT in human tumor cells. Human Molecular Genetics, 2005, v. 14, i. 2, p. R191-R196.
[16] Xiong J., Fan S., Meng Q., Schramm L., Wang C., Bouzahza B., Zhou J., Zafonte B., Goldberg I.D., Haddad B.R., Pestell R.G., Rosen E.M. BRCA1 Inhibition of Telomerase Activity in Cultured Cells. Molecular and Cellular Biology, 2003, v. 23, № 23, p. 8668-8690.
[17] Kunz M., Ibrahim S.M. Molecular responses to hypoxia in tumor cells. Molecular Cancer, 2003, v. 2, p. 23-28.
[18] Prior S.J., Hagberg J.M., Phares D.A., Brown M.D., Fairfull L., Ferrell R.E., Roth S.M. Sequence variation in hypoxia-inducible factor 1alpha (HIF1A): association with maximal oxygen consumption. Physiological Genomics, 2003, v. 15, p. 20-26.
[19] Chau C.H., Permenter M.G., Steinberg S.M., Retter A.S., Dahut W.L., Price D.K., Figg W.D. Polymorphism in the hypoxia-inducible factor 1alpha gene may confer susceptibility to androgen-independent prostate cancer. Cancer Biology and Therapy, 2005, v. 4, i. 11, p. 1222-1225.
[20] Ollerenshaw M., Page T., Hammonds J., Demaine A. Polymorphisms in the hypoxia inducible factor-1alpha gene (HIF1A) are associated with the renal cell carcinoma phenotype. Cancer Genetics and Cytogenetics, 2004, v. 153, i. 2, p. 122-126.
[21] Engelman J.A., Luo J., Cantley L.C. The evolution of phosphatidylinositol 3-kinases as regulators of growth and metabolism. Nature Reviews Genetics, 2006, v. 7, p. 606-619.
[22] Su J.D., Mayo L.D., Donner D.B., Durden D.L. PTEN and phosphatidylinositol 3-kinase inhibitors up-regulate P53 and block tumor-induced angiogenesis: Evidence for an effect on the tumor and endothelial compartment. Cancer Research, 2003, v. 63, p. 3585-3592.
[23] Carew J.S., Huang P. Mitochondrial defects in cancer. Molecular Cancer, 2002, v. 1:9.
[24] Keleg S., Büchler P., Ludwig R., Büchler M.W., Friess H. Invasion and metastasis in pancreatic cancer. Molecular Cancer, 2003, v. 2:14.
[25] Garnis C., Buys T.P., Lam W.L. Genetic alteration and gene expression modulation during cancer progression. Molecular Cancer, 2004, v. 3:9.
[26] Vogelstein B., Kinzler K.W. P53 function and dysfunction. Cell, 1992, v. 70, p. 523-526.
[27] Hakem R., Mak T.W. Animal models of tumor-suppressor genes. Annual Review of Genetics, 2001, v. 35, p. 209-241.
[28] Luo R.X., Dean D.C. Chromatin remodeling and transcriptional regulation. Journal of the National Cancer Institute, 1999, v. 91, № 15, p. 1288-1294.
[29] Harris S.L., Gil G., Robins H., Hu W., Hirshfield K., Bond E., Bond G., Levine A.J. Detection of functional single-nucleotide polymorphisms that affect apoptosis. Proceedings of the National Academy of Sciences, 2005, v. 102, № 45, p. 16297-16302.
[30] Lind H., Zienolddiny S., Ekstrom P.O., Skaug V., Haugen A. Association of a functional polymorphism in the promoter of the MDM2 gene with risk of non-small cell lung cancer. International Journal of Cancer, 2006, v. 119, № 3, p. 718-721.
[31] Bertrand P., Saintigny Y., Lopez B.S. P53's double life: transactivation-independent repression of homologous recombination. Trends in Genetics, 2004, v. 20, i. 6, p. 235-243.
[32] Seoane J., Le H.V., Massague J. Myc suppression of the p21(Cip1) Cdk inhibitor influences the outcome of the P53 response to DNA damage. Nature, 2002, v. 419, p. 729-734.
[33] Mihara M., Erster S., Zaika A., Petrenko O., Chittenden T., Pancoska P., Moll U.M. P53 has a direct apoptogenic role at the mitochondria. Molecular Cell, 2003, v. 11, p. 577-590.
[34] Li B., Dou Q.P. Bax degradation by the ubiquitin/proteasome-dependent pathway: Involvement in tumor survival and progression. Proceedings of the National Academy of Sciences, 2000, v. 97, № 8, p. 3850-3855.
[35] Bergamaschi D., Samuels Y., Sullivan A., Zvelebil M., Breyssens H., Bisso A., Del Sal G., Syed N., Smith P., Gasco M., Crook T., Lu X. iASPP preferentially binds P53 proline-rich region and modulates apoptotic function of codon 72-polymorphic P53. Nature Genetics, 2006, v. 38, № 10, p. 1133-1141.
[36] Dumont P., Leu J.I., Della Pietra A.C. 3rd, George D.L., Murphy M. The codon 72 polymorphic variants of the P53 tumor suppressor protein demonstrate marked differences in apoptotic potential. Nature Genetics, 2003, v. 33, № 3, p. 357-365.
[37] Li X., Dumont P., Della Pietra A., Shetler C., Murphy M.E. The codon 47 polymorphism in P53 is functionally significant. Journal of Biological Chemistry, 2005, v. 280, № 25, p. 24245-24251.
[38] McGregor J.M., Harwood C.A., Brooks L., Fisher S.A., Kelly D.A., O'nions J., Young A.R., Surentheran T., Breuer J., Millard T.P., Lewis C.M., Leigh I.M., Storey A., Crook T. Relationship between P53 codon 72 polymorphism and susceptibility to sunburn and skin cancer. Journal of Investigative Dermatology, 2002, v. 119, № 1, p. 84-90.
[39] Wang Y., Cortez D., Yazdi P., Neff N., Elledge S.J., Qin J. BASC, a super complex of BRCA1-associated proteins involved in the recognition and repair of aberrant DNA structures. Genes & Development, 2000, v. 14, p. 927-939.
[40] Hedenfalk I.A., Ringner M., Trent M.J., Borg A. Gene Expression in Inherited Breast Cancer. Advances in Cancer Research, 2002, v. 84, p. 1-34.
[41] Liu Y., West S.C. Distinct functions of BRCA1 and BRCA2 in double-strand break repair. Breast Cancer Research, 2002, v. 4, p. 9-13.
[42] Banerjea A., Ahmed S., Hands R.E., Huang F., Han X., Shaw P.M., Feakins R., Bustin S.A., Dorudi S. Colorectal cancers with microsatellite instability display mRNA expression signatures characteristic of increased immunogenicity. Molecular Cancer, 2004, v. 3:21.
[43] Dang C.V. c-Myc target genes involved in cell growth, apoptosis, and metabolism. Molecular and Cellular Biology, 1999, v. 19, p. 1-11.
[44] Prendergast G.C. Mechanisms of apoptosis by c-Myc. Oncogene, 1999, v. 18, p. 2967-2987.
[45] Reuter C.W., Morgan M.A., Bergmann L. Targeting the Ras signaling pathway: a rational, mechanism-based treatment for hematologic malignancies? Blood, 2000, v. 96, № 5, p. 1655-1669.
[46] Sinn E., Muller W., Pattengale P., Tepler I., Wallace R., Leder P. Coexpression of MMTV/v-Ha-ras and MMTV/c-myc genes in transgenic mice: synergistic action of oncogenes in vivo. Cell, 1987, v. 49, № 4, p. 465-475.
[47] Bachireddy P., Bendapudi P.K., Felsher D.W. Getting at MYC through RAS. Clinical Cancer Research, 2005, v. 11, № 12, p. 4278-4281.
[48] Podsypanina K., Li Y., Varmus H.E. Evolution of somatic mutations in mammary tumors in transgenic mice is influenced by the inherited genotype. BMC Medicine, 2004, v. 2:24.
[49] Baranov V.S. Geneticheskie osnovy predraspolozhennosti k nekotorym chastym mul'tifaktorial'nym zabolevaniyam. Medicinskaya genetika [Russian], 2004, t. 3, № 03.
[50] Grosso L.M., Triche E.W., Belanger K., Benowitz N.L., Holford T.R., Bracken M.B. Caffeine Metabolites in Umbilical Cord Blood, Cytochrome P-450 1A2 Activity, and Intrauterine Growth Restriction. American Journal of Epidemiology, 2006, v. 163, № 11, p. 1035-1041.
[51] Hedenmalm K., Guzey C., Dahl M.L., Yue Q.Y., Spigset O. Risk factors for extrapyramidal symptoms during treatment with selective serotonin reuptake inhibitors, including cytochrome P-450 enzyme, and serotonin and dopamine transporter and receptor polymorphisms. Journal of Clinical Psychopharmacology, v. 26, № 2, p. 192-197.
[52] Satianegara G., Rogers P.L., Rosche B. Comparative studies on enzyme preparations and role of cell components for (R)-phenylacetylcarbinol production in a two-phase biotransformation. Biotechnology and Bioengineering, 2006, v. 94, i. 6, p. 1189-1195.
[53] Nebert D.W., Carvan M.J. Ecogenetics: from biology to health. Toxicology and Industrial Health, 1997, v. 13, p. 163-192.
[54] Nebert D.W. Polymorphisms in drug-metabolizing enzymes: what is their clinical significance and why do they exist? The American Journal of Human Genetics, 1997, v. 60, p. 265-271.
[55] Welcsh P.L., Owens K.N., King M.C. Insights into the functions of BRCA1 and BRCA2. Trends in Genetics, 2000, v. 16, № 2, p. 69-74.
[56] Arason A., Barkardottir R., Egilsson V. Linkage analysis of chromosome 17q markers and breast-ovarian cancer in Icelandic families, and possible relationship to prostate cancer. American Journal of Human Genetics, 1993, v. 52, p. 711-717.
[57] Tarasov V.A., Aslanyan M.M., Tsyrendorzhiyeva E.S., Gar'kavtseva R.F., Lyubchenko L.N., Altukhov Yu.P. The dependence of the risk of breast cancer in women on their genotype. Doklady Biological Sciences, 2004, t. 398, pp. 391-394.
[58] Tarasov V.A., Aslanyan M.M., Tsyrendorzhiyeva E.S., Garkavtseva R.F., Lyubchenko L.N., Altukhov Yu.P., Mel'nik V.A. Population Genetic Analysis of the Association Between the BRCA1 and P53 Gene Polymorphisms and the Risk of Sporadic Breast Cancer. Russian Journal of Genetics, 2005, v. 41, № 8.
[59] Tarasov V.A., Aslanyan M.M., Tsyrendorzhiyeva E.S., Litvinov S.S., Gar'kavtseva R.F., Altukhov Yu.P. Genetically determined subdivision of human populations with respect to the risk of breast cancer in women. Doklady Biological Sciences, 2006, v. 406, № 1-6, pp. 66-69.
In: Molecular Polymorphism of Man Editors: S. D. Varfolomyev, G. E. Zaikov
ISBN: 978-1-60741-843-6 © 2011 Nova Science Publishers, Inc.
Chapter 3
ASSOCIATION OF CANDIDATE GENES POLYMORPHISM WITH ASTHMA IN BASHKORTOSTAN REPUBLIC OF RUSSIA
E. K. Khusnutdinova*, A. S. Karunas, U. U. Fedorova and I. R. Gilyazova
Institute of Biochemistry and Genetics, Ufa Science Center, Russian Academy of Sciences, Russia
ABSTRACT
Asthma is a chronic inflammatory disease of the airways, most probably caused by an interaction of genetic and environmental factors. It is one of the most widespread and severe chronic disorders among both children and adults. To reveal genetic risk factors for asthma development in Bashkortostan Republic of Russia, we examined associations between asthma and polymorphisms of cytokine genes, the β2-adrenergic receptor gene (ADRB2), the monocyte differentiation antigen gene (CD14) and the gene for a disintegrin and metalloproteinase domain 33 (ADAM33). The study sample included 156 asthma patients and 169 nonasthmatic subjects of Russian, Tatar and Bashkir ethnic origin, all residents of Bashkortostan Republic. The investigation revealed genetic markers of increased and decreased risk of asthma development in Bashkortostan Republic. Significant genetic variation among the ethnic groups of asthma patients has been demonstrated.
ABBREVIATIONS
ISAAC - International Study of Asthma and Allergy in Childhood
PCR - polymerase chain reaction
RFLP-analysis - restriction fragment length polymorphism analysis
OR - odds ratio
IL4 - interleukin 4
IgE - immunoglobulin E
IL4Rα - α-chain of receptor IL-4R
IL9 - interleukin 9
IL10 - interleukin 10
TNFA - tumor necrosis factor alpha
ADRB2 - β2-adrenergic receptor
CD14 - monocyte differentiation antigen (receptor for lipopolysaccharide)
ADAM33 - metalloprotease domain 33
* [email protected]
INTRODUCTION
Asthma is one of the most common chronic diseases, in which many cells and cellular elements play a role [1]. Inflammation in asthma contributes to airway hyperresponsiveness, airflow limitation and respiratory symptoms: coughing (especially at night or in the early morning), wheezing, shortness of breath or rapid breathing, and chest tightness. Asthma is one of the most widespread and socially significant diseases; its prevalence has not stabilized at a constant high level and has increased dramatically over the past two decades. According to the World Health Organization, 100-150 million people all over the world suffer from asthma [2]. Asthma prevalence varies strongly between countries, both among children (from 6.6% in Denmark to 34.0% in New Zealand and 42.2% in Austria) and among adults (from 2.7-4.0% in Germany, Spain and France to 12.0% in England and Australia) [1]. Recent studies under the ISAAC program (International Study of Asthma and Allergy in Childhood), performed in several cities of Russia in 1993-1998, demonstrated a high asthma prevalence among children 7-8 years old: 16.9% in Moscow and 10.6-11.1% in Irkutsk; a lower asthma frequency was shown among children 13-14 years old: from 8% in Moscow to 12.1% in Irkutsk. Asthma prevalence among adults in Russia varies from 5.6% in Irkutsk to 7.3% in St. Petersburg. The data obtained with ISAAC questionnaires differ from the official statistics, according to which the prevalence of asthma in Russia is 0.66% (Ministry of Public Health of the Russian Federation, 2002) [3]. A recent statistical report (2005) estimated the prevalence of asthma in Bashkortostan Republic at 0.776%, which is comparable to the official data on asthma prevalence in Russia. The pathogenesis of asthma is complex.
The majority of researchers consider that predisposition to asthma is caused by a combination of allelic gene variants in the genotype of an individual, which produces an adverse hereditary background that is realized in interaction with environmental factors (allergens, pollutants, occupational sensitizers, respiratory infections, smoking, etc.). Genome-wide linkage studies of asthma predisposition are conducted all over the world. Considerable effort has been expended on whole-genome screens aimed at detecting genetic loci that contribute to susceptibility to complex human diseases. More than a hundred genes encoding proteins whose function is closely connected with asthma pathogenesis - cytokine and chemokine genes and their receptors, human leukocyte antigen (HLA) genes, genes of inflammatory mediators and their receptors, the β2-adrenergic receptor gene,
xenobiotic biotransformation genes, etc. - have shown significant evidence of linkage to asthma [4-8]. More than 20 genome-wide studies have identified genetic regions involved in the disease, such as 5q31.1-33, 6p12-21.2, 11q12-13, 12q14-24.1, 16p12.1-11.2, Xq28/Yq12, and DNA-loci, associated with asthma development, were revealed. Recently, positional cloning allowed to identify five5 genes, which functions are still unknown: ADAM33, DPP10, PHF11, GPRA, and SPINK5 [6, 9]. Thus, numerous molecular-genetic studies of asthma, combining positional cloning and fine mapping studies indicated new data concerning genetic bases and pathology of asthma, but there are still a lot of unclear questions in etiology and pathogenesis of the disease. High prevalence and steady growth of asthma all over the world define the important social, economic and medical value of the problem, necessity to study mechanisms of asthma development to create effective methods of diagnostics, prevention and pathogenetic therapy taking into consideration ethnic origin and genetic features of each patient. Molecular-genetic studies of asthma in Russia are not numerous and are devoted, basically, to xenobiotic detoxication genes and cytokine gene polymorphisms research in asthma development [10-13]. In the laboratory of human molecular genetics at the Institute of Biochemistry and Genetics (Ufa Science Centre, Russian Academy of Sciences) GSTM1, chemokine receptor of macrophages (chemokine receptor 5 (CCR5)) and angiotensinconverting enzyme in children with atopic asthma from Bashkortostan Republic, and their relatives have been performed. Polymorphic variants of glutathione-S-transferase М1 gene and angiotensin-converting enzyme, being genetic markers of the increased and decreased risk for atopic asthma development are detected [14]. The study included DNA samples of asthma patients and healthy donors from Bashkortostan Republic. 
The purpose of the present research was to analyze polymorphic variants of cytokine genes (IL-4, the α-chain of the IL-4 receptor, IL-9, IL-10, TNF-α), the ADRB2 receptor gene, CD14, and ADAM33 in asthma patients and nonasthmatic subjects. Because Bashkortostan is home to a variety of heterogeneous populations, and genetic risk factors for disease differ significantly between ethnic groups, the study samples were drawn from three widespread populations of Bashkortostan: Russians, Tatars and Bashkirs. The DNA collection included 156 patients (60 Russians, 33 Tatars, 16 Bashkirs and 47 patients from marriages of mixed ethnic origin). The mean age of the asthma patients was 46,5 years. Asthma was diagnosed according to the principles proposed by GINA (Global Strategy for Asthma Management and Prevention) and the guideline "Asthma. A manual for doctors in Russia" on the basis of clinical and laboratory examination data, spirometry, radiography, and skin and allergy test results. The control group consisted of healthy individuals of Russian (58 individuals), Tatar (61 individuals) and Bashkir (50 individuals) ethnic origin from Bashkortostan Republic. Informed consent was obtained from each participant.

Genomic DNA was isolated from peripheral blood leukocytes by standard phenol-chloroform extraction [15]. Fragments of the examined loci were amplified by polymerase chain reaction (PCR) in a "Tercik" thermal cycler ("DNA-technology" company, Moscow). PCR was carried out in a 25 µL reaction mixture containing 2,5 µL of 10× Taq buffer (67 mM Tris-HCl (pH 8,8), 16,6 mM (NH4)2SO4, 1,5 mM MgCl2, 0,01% Tween-20), 0,1 µg of genomic DNA, 1,0 mM of each dNTP, 1 unit of Taq DNA polymerase ("Silex" company, Moscow) and 5-10 pM of each locus-specific primer.
Table 1. Polymorphisms, primer nucleotide sequences and nomenclature of the analyzed DNA loci
Gene, chromosomal localization | Polymorphism (restriction enzyme), gene localization, reference | Primers | Alleles (fragment lengths, bp)
ADRB2, 5q31-32 | Arg16Gly (NcoI RFLP), exon 1; Holloway J. et al., 2000 [41] | 5'-CCT TCT TGC TGG CAC CCC AT-3'; 5'-GGA AGT CCA AAA CTC GCA CCA-3' | *Arg16 (308); *Gly16 (292, 16)
ADRB2, 5q31-32 | Gln27Glu (BbvI RFLP), exon 1; Holloway J. et al., 2000 [41] | as above | *Gln27 (259, 49); *Glu27 (308)
IL4, 5q31-33 | -590C>T (AvaII RFLP), promoter region; Noguchi E. et al., 1998 [22] | 5'-TAA ACT TGG GAG AAC ATG GT-3'; 5'-TGG GGA AAG ATA GAG TAA TA-3' | IL4*C (177, 18); IL4*T (195)
IL4Rα, 16p12 | Ile50Val (MslI RFLP), exon 3; Kauppi P. et al., 2001 [56] | 5'-CTGTTGCTATGACCCCACCT-3'; 5'-AGGTGACCAGCCTAACCCAG-3' | *Ile50 (308); *Val50 (254, 54)
IL4Rα, 16p12 | Gln576Arg (MspI RFLP), exon 12; Kauppi P. et al., 2001 [56] | 5'-CCCCCACCACCAGTGGCTACC-3'; 5'-CCAGGAATGAGGTCTTGGAA-3' | *Gln576 (221); *Arg576 (204, 17)
IL9, 5q31-33 | Thr113Met (NcoI RFLP), exon 5; Walley A. et al., 2001 [19] | 5'-GGC TGC TTG GCT CTA CAT C-3'; 5'-ATT TAG AGT AGC TTA CTT G-3' | Thr113 (269); Met113 (172, 97)
IL10, 1q31-32 | -627A/C (RsaI RFLP), promoter region; Hang L. et al., 2003 [33] | 5'-CCTAGGTCACAGTGACGTGG-3'; 5'-GGTGAGCACTACCTGACTAGC-3' | IL10*A (236, 176); IL10*C (412)
TNFA, 6p21.3 | -308G>A (NcoI RFLP), promoter region; Karplus T.M. et al., 2002 [57] | 5'-AGG CAA TAG GTT TTG AGG GCC AT-3'; 5'-TCC TCC CTG CTC CGA TTC GG-3' | TNFA*G (87, 20); TNFA*A (107)
CD14, 5q31.1 | -159C/T (AvaII RFLP), promoter region; Baldini M. et al., 1999 [45] | 5'-GTGCCAACAGATGAGGTTCAC-3'; 5'-GCCTCTGACAGTTTATGTAATC-3' | CD14*C (497); CD14*T (353, 144)
ADAM33, 20q13 | 7575G/A, intron 6, F+1 (MspI RFLP); Cheng L. et al., 2004 [58] | 5'-GGGGAGCCCTCCAAATCAGAAGAGCC-3'; 5'-AGTGGAAGCTGCTGGGCTT-3' | ADAM33*G; ADAM33*A
The investigated loci, primer sequences and sizes of the amplified fragments are listed in Table 1. Nucleotide changes were identified by restriction fragment length polymorphism (RFLP) analysis (Table 1). RFLP products were resolved on a 7% polyacrylamide gel, stained with ethidium bromide and visualized under UV light. Statistical analysis was performed on an IBM PC (Pentium 4) using the Statistica 6.0 program [16] and Microsoft Excel.
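The genotype-calling step of the RFLP analysis can be illustrated with a minimal sketch. The fragment lengths below are copied from Table 1 for three of the loci; the dictionary layout, the function name `call_genotype` and the 50 bp visibility cut-off are assumptions made for illustration and are not part of the published protocol. The sketch also assumes that the observed bands have already been matched to their nominal sizes.

```python
# Fragment patterns (bp) copied from Table 1 for three of the loci.
# The data structure, function name and size cut-off are illustrative assumptions.
RFLP_PATTERNS = {
    "IL4 -590C>T": {"C": {177, 18}, "T": {195}},
    "IL9 Thr113Met": {"Thr113": {269}, "Met113": {172, 97}},
    "CD14 -159C>T": {"C": {497}, "T": {353, 144}},
}


def call_genotype(locus, observed_bands, min_visible_bp=50):
    """Infer a genotype from band sizes already matched to nominal lengths.

    Fragments shorter than min_visible_bp are assumed (hypothetically) to be
    too small to score on the gel and are ignored.
    """
    observed = set(observed_bands)
    present = []
    for allele, fragments in RFLP_PATTERNS[locus].items():
        scorable = {size for size in fragments if size >= min_visible_bp}
        if scorable <= observed:  # every scorable band of this allele is seen
            present.append(allele)
    if not present:
        raise ValueError(f"bands {sorted(observed)} match no allele of {locus}")
    if len(present) == 1:
        return (present[0], present[0])  # homozygote
    return tuple(present)  # heterozygote


print(call_genotype("IL4 -590C>T", [177]))        # only the cut product: C/C
print(call_genotype("IL4 -590C>T", [177, 195]))   # cut and uncut products: C/T
print(call_genotype("CD14 -159C>T", [353, 144]))  # only the cut products: T/T
```

A heterozygote shows the bands of both alleles, so the function reports every allele whose scorable fragments are all present on the gel.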
THE ASSOCIATION ANALYSIS OF POLYMORPHIC VARIANTS OF THE INTERLEUKIN-4 GENE AND THE α-CHAIN OF THE IL-4 RECEPTOR (IL-4Rα) WITH ASTHMA

IL-4 is a pleiotropic cytokine that plays a critical role in the induction and maintenance of allergy and airway inflammation. A large cluster of cytokine genes (IL-3, IL-4, IL-5, IL-9, IL-13, and granulocyte-macrophage colony-stimulating factor) has been mapped to chromosome 5q31-34, a region to which asthma and associated clinical signs have also been linked [17-19]. The β2-adrenergic receptor gene and other genes connected with asthma development (CD14, glucocorticoid receptor 1, fibroblast growth factor) are also located in this chromosomal region. IL-4 plays a major role in the development of allergic inflammation and in IgE production. It exerts its biological effects by binding to the IL-4 receptor complex, which leads to high serum IgE levels. IL-4 also activates cellular adhesion molecules in the vascular endothelium, which results in migration of T cells, monocytes, basophils and eosinophils to the site of inflammation [20].

In 1995, Rosenwasser L.J. et al. showed that a polymorphism in the promoter of the IL4 gene, a cytosine-to-thymine transition at position -590 (-590C>T), was associated with asthma and an increased total IgE level in an American population [21]. Subsequently, several studies reported associations of IL4 gene polymorphic variants with asthma in Japanese and European populations [22, 23]. In 2003, Kabesch M. et al. performed a complete screening of the IL-4 gene in asthmatic children of German ethnic origin and revealed 16 polymorphisms, fourteen of which had not been reported previously; they established an association of ten closely linked IL-4 gene polymorphisms with asthma and elevated serum IgE levels [24]. We analyzed the IL4 gene promoter polymorphism (-590C>T) in asthma patients and healthy individuals from Bashkortostan Republic. The results are summarized in Table 2.
The analysis of heterogeneity revealed statistically significant differences within the control group between Tatars and Russians (χ2=7,7; p=0,009 and χ2=8,65; p=0,009 for allele and genotype frequencies, respectively) and between Tatars and Bashkirs (χ2=9,21; p=0,002 and χ2=9,33; p=0,008, respectively). The IL4*C/*C genotype frequency was significantly higher in Tatars (67,21%) than in Russians (41,38%; χ2=8,005; p=0,0046) and Bashkirs (44%; χ2=6,03; p=0,014). The IL4*T/*T genotype was found at a frequency of 14% in Bashkirs, which was higher than in Tatars (1,64%; χ2=6,28; p=0,012) and Russians (6,9%; χ2=1,48; p=0,22). The IL4*C allele was the most frequent in all examined groups, with the highest frequency observed in Tatars (82,79%) versus 67,24% in Russians and 65% in Bashkirs. According to literature data, the IL4*C allele is prevalent in European populations
Table 2. Allele and genotype frequency distributions of the IL-4 -590C>T polymorphism in asthma patients and healthy individuals
Each allele and genotype cell gives ni; pi±sp (CI), %.
Group | Allele C | Allele T | N1 | Genotype CC | Genotype CT | Genotype TT | N2
Control group
Russians | 78; 67,24±4,36 (57,91-75,67) | 38; 32,76±4,36 (24,33-42,09) | 116 | 24; 41,38±6,47 (28,6-55,07) | 30; 51,72±6,56 (38,22-65,05) | 4; 6,9±3,33 (1,91-16,73) | 58
Tatars | 101; 82,79±3,42 (74,9-89,02) | 21; 17,21±3,42 (10,98-25,1) | 122 | 41; 67,21±6,01 (54-78,69) | 19; 31,15±5,93 (19,9-44,29) | 1; 1,64±1,63 (0,04-8,8) | 61
Bashkirs | 65; 65±4,77 (54,82-74,27) | 35; 35±4,77 (25,73-45,18) | 100 | 22; 44±7,02 (29,99-58,75) | 21; 42±6,98 (28,19-56,79) | 7; 14±4,91 (5,82-26,74) | 50
Control (in whole) | 244; 72,19±2,44 (67,08-76,9) | 94; 27,81±2,44 (23,1-32,92) | 338 | 87; 51,48±3,84 (43,68-59,23) | 70; 41,42±3,79 (33,91-49,24) | 12; 7,1±1,98 (3,72-12,07) | 169
Asthma patients group
Russians | 89; 74,17±4 (65,38-81,72) | 31; 25,83±4 (18,28-34,62) | 120 | 33; 55±6,42 (41,61-67,88) | 23; 38,33±6,28 (26,07-51,79) | 4; 6,67±3,22 (1,85-16,2) | 60
Tatars | 46; 69,7±5,66 (57,15-80,41) | 20; 30,3±5,66 (19,59-42,85) | 66 | 16; 48,48±8,7 (30,8-66,46) | 14; 42,42±8,6 (25,48-60,78) | 3; 9,09±5 (1,92-24,33) | 33
Bashkirs | 24; 75±7,65 (56,6-88,54) | 8; 25±7,65 (11,46-43,4) | 32 | 8; 50±12,5 (24,65-75,35) | 8; 50±12,5 (24,65-75,35) | 0 | 16
Asthma patients (in whole) | 224; 71,79±2,55 (66,45-76,72) | 88; 28,21±2,55 (23,28-33,55) | 312 | 80; 51,28±4 (43,16-59,35) | 64; 41,03±3,94 (33,22-49,17) | 12; 7,69±2,13 (4,04-13,05) | 156
Note (here and below): ni - number of alleles or genotypes observed; N1 - number of chromosomes examined; N2 - number of individuals examined; pi - allele (genotype) frequency; sp - standard error of pi; CI - 95% confidence interval.
(70-82%), whereas the IL4*T allele is found at a high frequency (54-70%) in African American and East Asian populations [11, 22-25]. No statistically significant differences in allele and genotype frequency distributions were found between the combined groups of asthma patients and controls (p>0,05). However, statistically significant differences were revealed between asthma patients of Tatar ethnic origin and control Tatars (χ2=4,8; p=0,05). The IL4*T allele frequency in Tatar asthma patients was significantly higher than in the Tatar control group: 30,3% versus 17,21% (χ2=4,3; p=0,038). The risk of asthma development was assessed with the two-sided χ2 test with Yates's correction. The odds ratio (OR) for carriers of the IL4*T allele was 2,09 (95% CI 1,03-4,23). Moreover, our results showed a trend toward association of the IL4*T/*T and IL4*C/*T genotypes with asthma: these genotypes were observed more frequently in asthma patients than in controls (9,09% versus 1,64% and 42,42% versus 31,15%, respectively). The IL4*C/*C genotype, on the contrary, was found at a lower frequency in patients (48,48%) than in the control group of Tatars (67,21%); OR=0,46 (95% CI 0,18-1,09), χ2=3,15; p=0,076. In asthma patients of Russian ethnic origin, there was a tendency toward an increased IL4*C/*C genotype frequency: 55% in Russian patients versus 41,38% in controls (OR=1,73 (0,84-3,59), χ2=2,19; p=0,13). No statistically significant differences were found between patients with moderate and severe forms of the disease when the allele and genotype frequencies of the examined locus were compared.
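The allele-level comparison in Tatars can be retraced from the counts in Table 2 (patients: 20 T and 46 C alleles; controls: 21 T and 101 C alleles). The sketch below is an illustration rather than the authors' actual procedure: the function names are hypothetical, the confidence interval uses the log-based (Woolf) approximation, and the source does not state which interval method was used. With these counts, the uncorrected Pearson statistic and the Woolf interval should come out close to the reported χ2=4,3, OR=2,09 and 95% CI 1,03-4,23, while the Yates-corrected statistic is somewhat lower.

```python
import math


def chi2_2x2(a, b, c, d, yates=False):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # Yates's continuity correction
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval (assumed method)."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Tatars, Table 2: allele counts (T, C) in patients and controls
a, b = 20, 46    # patients: T, C
c, d = 21, 101   # controls: T, C

chi2_plain = chi2_2x2(a, b, c, d)
chi2_yates = chi2_2x2(a, b, c, d, yates=True)
p_plain = p_value_1df(chi2_plain)
odds_ratio, ci_low, ci_high = odds_ratio_ci(a, b, c, d)

print(f"chi2 = {chi2_plain:.2f} (with Yates: {chi2_yates:.2f}), p = {p_plain:.3f}")
print(f"OR = {odds_ratio:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```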
Previous reports have associated the -590C>T polymorphism of the IL-4 gene with asthma severity and, in particular, the -590*T allele and the -590*T/*T genotype with reduced FEV1 [25, 26]. We therefore analyzed the genotype and allele frequencies of this locus in relation to FEV1 (forced expiratory volume in 1 s), one of the basic spirometry parameters, and to the level of lung function (forced vital capacity, FVC) according to spirometry data (Table 3). A tendency toward association of the IL4*T allele with lower FEV1 and a lower level of lung function was revealed in asthmatic subjects. In patients with FEV1 of 20-39% of the normal value, the IL4*C/*T genotype frequency was 52,94%, considerably higher than the 21,05% observed in patients with FEV1 above 80% (χ2=3,88; p=0,049). The heterozygous genotype was significantly more frequent (50%) in patients with a lung function value of 40-59% than in patients with a value of 80% and higher (25,45%; χ2=6,63; p=0,01). The IL4*C/*C genotype, on the contrary, was found more frequently in patients with better-preserved lung function, i.e., with higher values of FEV1 and lung capacity. Our results agree with the data of other researchers on the association of the IL4 -590T allele with asthma severity [25, 26].

Thus, the analysis of the -590C>T polymorphism of the IL4 gene showed significant differences between healthy individuals of Tatar and Russian ethnic origin as well as between those of Tatar and Bashkir ethnic origin. The IL4*T allele was found to be a marker of increased risk of asthma in Tatars, and the IL4*C/*T genotype was shown to be associated with a low level of lung function.

IL-4 acts through the IL-4 receptor (IL-4R), which consists of two subunits, the α chain (IL4RA) and the γ chain. IL4RA is a functionally significant component that plays an essential role in IgE production.
The IL4RA gene is located on chromosome 16p (16p12.1), a region reported to be linked to asthma [18]. More than thirty single-nucleotide polymorphisms (SNPs) have been identified in the coding region of the IL4RA gene. The majority of these SNPs (about thirteen) are located in exon 12 and result in
Table 3. Allele and genotype frequency distributions of the -590C>T polymorphism of the IL4 gene in asthma patients with different spirometry indices
Each allele and genotype cell gives ni; pi±sp, %.
Spirometry index | Allele C | Allele T | N1 | Genotype CC | Genotype CT | Genotype TT | N2
FEV1 (forced expiratory volume)
20-39% | 46; 67,65±5,67 | 22; 32,35±5,67 | 68 | 14; 41,18±8,44 | 18; 52,94±8,56 | 2; 5,88±4,03 | 34
40-59% | 82; 73,21±4,18 | 30; 26,79±4,18 | 112 | 31; 55,36±6,64 | 20; 35,71±6,4 | 5; 8,93±3,81 | 56
60-79% | 67; 71,28±4,67 | 27; 28,72±4,67 | 94 | 23; 48,94±7,29 | 21; 44,68±7,25 | 3; 6,38±3,56 | 47
80% and higher | 30; 78,95±6,61 | 8; 21,05±6,61 | 38 | 13; 68,42±10,66 | 4; 21,05±9,35 | 2; 10,53±7,04 | 19
FVC (forced vital capacity)
40-59% | 66; 68,75±4,73 | 30; 31,25±4,73 | 96 | 21; 43,75±7,16 | 24; 50±7,22 | 3; 6,25±3,49 | 48
60-79% | 74; 69,81±4,46 | 32; 30,19±4,46 | 106 | 24; 45,28±6,84 | 26; 49,06±6,87 | 3; 5,66±3,17 | 53
80% and higher | 84; 76,36±4,05 | 26; 23,64±4,05 | 110 | 35; 63,64±6,49 | 14; 25,45±5,87 | 6; 10,91±4,2 | 55
amino acid substitutions. Approximately 14 SNPs of the IL4RA gene are considered to be polymorphisms [27, 28]. An association of the Ile50Val and Gln576Arg polymorphisms of the IL4RA gene with atopy and severe asthma has been reported in several recent works [23, 26]. C. Ober et al. investigated nonsynonymous substitutions in the IL4RA gene in asthma families of various ethnic origins (Hutterites, outbred whites, blacks from Chicago and Baltimore, Hispanics); all population samples showed evidence of association with atopy and asthma, but the alleles and haplotypes showing the strongest evidence differed between the groups [27]. We performed an association analysis of two polymorphisms of the IL4RA gene (Ile50Val and Gln576Arg) with asthma in Bashkortostan Republic.

The analysis of the Ile50Val polymorphism of the IL4RA gene did not reveal statistically significant differences between asthma patients and controls (Table 4). Comparison of allele and genotype frequencies among Russian, Tatar and Bashkir nonasthmatic subjects showed differences between Russians and Tatars (χ2=2,7; p=0,05 and χ2=4,64; p=0,09 for alleles and genotypes, respectively). The frequency of the homozygous IL4RA*Ile50/*Ile50 genotype was significantly lower in Russians (24,14%) than in Tatars (42,62%; χ2=4,55; p=0,033) and lower than in Bashkirs (38%; χ2=2,43; p=0,11). The heterozygous genotype was found at a frequency of 58,62% in Russians, 42,62% in Tatars (χ2=3,04; p=0,08) and 40% in Bashkirs (χ2=3,72; p=0,05). The IL4RA*Ile50 allele was the most prevalent in all examined groups, but it was found at a lower frequency in Russians (53,45% of chromosomes) than in Tatars (63,93%) and Bashkirs (58%).
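The heterogeneity comparison between Russian and Tatar controls can be sketched as a Pearson χ2 test on a contingency table of counts taken from Table 4. This is an illustrative reconstruction, not the authors' Statistica workflow; the function names are hypothetical, and the p-value helper covers only the one and two degrees of freedom needed here. The genotype and allele statistics obtained this way should land close to the reported χ2=4,64 and χ2=2,7.

```python
import math


def chi2_table(rows):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df


def p_value(chi2, df):
    """Upper-tail chi-square probability (closed forms for df = 1 and df = 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")


# Control groups, Table 4 (IL4RA Ile50Val)
genotypes = [[14, 34, 10],   # Russians: Ile/Ile, Ile/Val, Val/Val
             [26, 26, 9]]    # Tatars
alleles = [[62, 54],         # Russians: Ile50, Val50
           [78, 44]]         # Tatars

geno_chi2, geno_df = chi2_table(genotypes)
allele_chi2, allele_df = chi2_table(alleles)
print(f"genotypes: chi2 = {geno_chi2:.2f}, df = {geno_df}, p = {p_value(geno_chi2, geno_df):.3f}")
print(f"alleles:   chi2 = {allele_chi2:.2f}, df = {allele_df}, p = {p_value(allele_chi2, allele_df):.3f}")
```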
The analysis of genotype and allele frequencies of the IL4RA gene in patients and controls with regard to their ethnic origin showed that the IL4RA*Ile50 allele and the IL4RA*Ile50/*Ile50 genotype were more frequent in Russian asthma patients (64,17% and 40%) than in the Russian control group (53,45% and 24,14%). The odds ratio for carriers of the homozygous IL4RA*Ile50/*Ile50 genotype was 2,1 (95% CI 0,98-4,5), χ2=3,4; p=0,05. The frequency of the IL4RA*Ile50/*Ile50 genotype was higher in patients with severe asthma (42,19%) than in those with the moderate form of the disease (30,43%; χ2=2,28; p=0,131). In patients of Russian ethnic origin, the frequency of the IL4RA*Ile50/*Ile50 genotype was 46,15% in severe asthma versus 35,29% in moderate asthma and 24,14% in controls of the same ethnic origin. The odds ratio for severe asthma development in Russians was 2,69 (1,01-7,16), χ2=2,69; p=0,043.

The analysis of allele and genotype frequencies together with spirometry data showed a correlation between the IL4RA*Ile50/*Ile50 genotype frequency and FEV1 and the level of lung function. The IL4RA*Ile50/*Ile50 genotype frequency was almost twice as high (52,94%) in asthma patients with a low FEV1 value (20-39%) as in patients with a high FEV1 value (80% and more; 28,57%) (χ2=3,87; p=0,048). The frequency of this genotype in patients with a lung capacity of 40-59% was also higher than in patients with a high lung capacity value (43,75% versus 27,27%; χ2=3,06; p=0,08). The IL4RA*Ile50 allele was found at a frequency of 75% in patients with low FEV1 (20-39%) and at a frequency of 57,14% in patients with high FEV1 values (80% and higher) (χ2=3,82; p=0,05). Our data are consistent with the results of H. Mitsuyasu et al., who revealed an association of the IL4RA*Ile50 allele with allergic asthma and found that the IL4RA*Ile50 variant produces a threefold increase in the response to IL-4 compared with IL4RA*Val50, owing to increased activity of the receptor subunit [29].
Table 4. Allele and genotype frequency distributions of the Ile50Val polymorphism of the IL4Rα gene in asthma patients and healthy individuals
Each allele and genotype cell gives ni; pi±sp (CI), %.
Group | Allele Ile50 | Allele Val50 | N1 | Genotype Ile50/Ile50 | Genotype Ile50/Val50 | Genotype Val50/Val50 | N2
Control group
Russians | 62; 53,45±4,63 (43,95-62,76) | 54; 46,55±4,63 (37,24-56,05) | 116 | 14; 24,14±5,62 (13,87-37,17) | 34; 58,62±6,47 (44,93-71,4) | 10; 17,24±4,96 (8,59-29,43) | 58
Tatars | 78; 63,93±4,35 (54,75-72,43) | 44; 36,07±4,35 (27,57-45,25) | 122 | 26; 42,62±6,33 (30,04-55,94) | 26; 42,62±6,33 (30,04-55,94) | 9; 14,75±4,54 (6,98-26,17) | 61
Bashkirs | 58; 58±4,94 (47,71-67,8) | 42; 42,00±4,94 (32,2-52,29) | 100 | 19; 38±6,86 (24,65-52,83) | 20; 40±6,93 (26,41-54,82) | 11; 22±5,86 (11,53-35,96) | 50
Controls (in whole) | 198; 58,58±2,68 (53,12-63,88) | 140; 41,42±2,68 (36,12-46,88) | 338 | 59; 34,91±3,67 (27,75-42,61) | 80; 47,34±3,84 (39,62-55,15) | 30; 17,75±2,94 (12,31-24,36) | 169
Asthma patients
Russians | 77; 64,17±4,38 (54,9-72,71) | 43; 35,83±4,38 (27,29-45,1) | 120 | 24; 40±6,32 (27,56-53,46) | 29; 48,33±6,45 (35,23-61,61) | 7; 11,67±4,14 (4,82-22,57) | 60
Tatars | 40; 60,61±6,01 (47,81-72,42) | 26; 39,39±6,01 (27,58-52,19) | 66 | 11; 33,33±8,21 (17,96-51,83) | 18; 54,55±8,67 (36,35-71,89) | 4; 12,12±5,68 (3,4-28,2) | 33
Bashkirs | 17; 53,12±8,82 (34,74-70,91) | 15; 46,88±8,82 (29,09-65,26) | 32 | 4; 25±10,83 (7,27-52,38) | 9; 56,25±12,4 (29,88-80,25) | 3; 18,75±9,76 (4,05-45,65) | 16
Asthma patients (in whole) | 193; 61,86±2,75 (56,22-67,27) | 119; 38,14±2,75 (32,73-43,78) | 312 | 55; 35,26±3,83 (27,79-43,3) | 83; 53,21±3,99 (45,06-61,23) | 18; 11,54±2,56 (6,98-17,62) | 156
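The allele counts, frequencies and standard errors in Table 4 follow from the genotype counts. The sketch below shows that arithmetic for the Russian control group. It assumes the usual binomial standard error, sqrt(p(1-p)/N), which agrees with the tabulated values for this row; the function names are hypothetical. The confidence intervals are not recomputed, because the source does not state how they were obtained.

```python
import math


def allele_counts(n_hom1, n_het, n_hom2):
    """Allele counts and chromosome total from genotype counts at a biallelic locus."""
    count1 = 2 * n_hom1 + n_het
    count2 = 2 * n_hom2 + n_het
    return count1, count2, count1 + count2


def percent_with_se(count, total):
    """Frequency (%) and its binomial standard error (%), an assumed formula."""
    p = count / total
    return 100 * p, 100 * math.sqrt(p * (1 - p) / total)


# Russian controls, Table 4: Ile50/Ile50, Ile50/Val50, Val50/Val50
ile, val, n_chromosomes = allele_counts(14, 34, 10)
ile_freq, ile_se = percent_with_se(ile, n_chromosomes)   # allele level, N1
hom_freq, hom_se = percent_with_se(14, 58)                # genotype level, N2

print(f"Ile50: {ile} of {n_chromosomes} chromosomes, {ile_freq:.2f} +/- {ile_se:.2f} %")
print(f"Ile50/Ile50: {hom_freq:.2f} +/- {hom_se:.2f} %")
```

Allele-level values use the number of chromosomes (N1 = 2 x N2) as the denominator, whereas genotype-level values use the number of individuals (N2).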
Table 5. Allele and genotype frequency distributions of the Ile50Val polymorphism of the IL4RA gene in asthma patients with different spirometry indices
Each allele and genotype cell gives ni; pi±sp, %.
Spirometry index | Allele Ile50 | Allele Val50 | N1 | Genotype Ile50/Ile50 | Genotype Ile50/Val50 | Genotype Val50/Val50 | N2
FEV1 (forced expiratory volume)
20-39% | 51; 75±5,25 | 17; 25±5,25 | 68 | 18; 52,94±8,56 | 15; 44,12±8,52 | 1; 2,94±2,9 | 34
40-59% | 67; 59,82±4,63 | 45; 40,18±4,63 | 112 | 18; 32,14±6,24 | 31; 55,36±6,64 | 7; 12,5±4,42 | 56
60-79% | 51; 56,67±5,22 | 39; 43,33±5,22 | 90 | 13; 28,89±6,76 | 25; 55,56±7,41 | 7; 15,56±5,4 | 45
80% and higher | 24; 57,14±7,64 | 18; 42,86±7,64 | 42 | 6; 28,57±9,86 | 12; 57,14±10,8 | 3; 14,29±7,64 | 21
FVC (forced vital capacity)
40-59% | 67; 69,79±4,69 | 29; 30,21±4,69 | 96 | 21; 43,75±7,16 | 25; 52,08±7,21 | 2; 4,17±2,89 | 48
60-79% | 63; 59,43±4,77 | 43; 40,57±4,77 | 106 | 19; 35,85±6,59 | 25; 47,17±6,86 | 9; 16,98±5,16 | 53
80% and higher | 63; 57,27±4,72 | 47; 42,73±4,72 | 110 | 15; 27,27±6,01 | 33; 60±6,61 | 7; 12,73±4,49 | 55
Thus, the analysis of the Ile50Val polymorphism of the IL4RA gene showed statistically significant differences between healthy individuals of Russian and Tatar ethnic origin. The results also demonstrated an association of the IL4RA*Ile50/*Ile50 genotype with asthma and its severity in Russian patients, and an association of this genotype and the IL4RA*Ile50 allele with lung function abnormalities.

The analysis of allele and genotype frequencies of the Gln576Arg polymorphism of the IL4RA gene revealed no significant differences between asthma patients and controls (p>0,05). The IL4RA*Gln576 allele was prevalent both in asthma patients (79,81% of chromosomes) and in healthy individuals (81,31% of chromosomes). The most frequent genotype, IL4RA*Gln576/*Gln576, was found in 62,18% of patients and 63,91% of controls. The IL4RA*Arg576/*Arg576 genotype was rare in both patients (2,56%) and controls (1,78%). No significant differences in genotype and allele frequencies were observed between patients of different ethnic origins or with different forms and severity of the disease and the corresponding controls. Thus, the present work found no association of the Gln576Arg polymorphism of the IL4RA gene with asthma in Bashkortostan Republic.

The analysis of genotype combinations of the two IL4RA gene polymorphisms (Ile50Val and Gln576Arg) revealed eight of the nine possible combinations; the combination *Val50/*Val50-*Arg576/*Arg576 was not found. The most prevalent combinations were *Ile50/*Val50-*Gln576/*Gln576 (32,05% in patients versus 27,22% in controls), *Ile50/*Ile50-*Gln576/*Gln576 (21,15% versus 23,08%, respectively) and *Ile50/*Val50-*Gln576/*Arg576 (19,87% versus 19,53%). No statistically significant differences in the frequency distributions of genotype combinations were found between the patients with asthma and the controls (p>0,05).
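The count of nine possible genotype combinations follows from pairing the three genotypes at each of the two biallelic loci. A minimal sketch of how such combinations might be enumerated and tabulated is given below; the function name is hypothetical, and the five-person sample is invented toy data used only to show the call, not study data.

```python
from collections import Counter
from itertools import product

ILE50VAL = ("Ile50/Ile50", "Ile50/Val50", "Val50/Val50")
GLN576ARG = ("Gln576/Gln576", "Gln576/Arg576", "Arg576/Arg576")

# Every pairing of one genotype per locus: 3 x 3 = 9 combinations
ALL_COMBINATIONS = list(product(ILE50VAL, GLN576ARG))


def combination_frequencies(individuals):
    """Percentage of individuals carrying each two-locus genotype combination."""
    counts = Counter(individuals)
    total = len(individuals)
    return {combo: 100 * counts[combo] / total for combo in ALL_COMBINATIONS}


# Hypothetical toy sample (not study data): one tuple per individual
toy_sample = [
    ("Ile50/Val50", "Gln576/Gln576"),
    ("Ile50/Val50", "Gln576/Gln576"),
    ("Ile50/Ile50", "Gln576/Gln576"),
    ("Ile50/Val50", "Gln576/Arg576"),
    ("Val50/Val50", "Gln576/Gln576"),
]

frequencies = combination_frequencies(toy_sample)
observed = [combo for combo, freq in frequencies.items() if freq > 0]
print(f"{len(ALL_COMBINATIONS)} possible combinations, {len(observed)} observed in the toy sample")
```

Combinations absent from the sample, such as *Val50/*Val50-*Arg576/*Arg576 in the study, simply receive a frequency of zero.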
Furthermore, when patients were subdivided into groups according to ethnic origin, form and severity of the disease, there were also no significant differences in the frequencies of genotype combinations (p>0,05).

Taking into consideration that the IL4 and IL4RA proteins interact, we investigated whether combinations of IL4 -590C>T and IL4RA Ile50Val genotypes were associated with asthma. We did not detect differences in the overall distribution of genotype combinations of these polymorphic loci between patients and controls (p>0,05). Eight types of genotype combinations were found, three of which were the most frequent: IL4*C/*C-IL4RA*Ile50/*Val50, IL4*C/*T-IL4RA*Ile50/*Val50 and IL4*C/*C-IL4RA*Ile50/*Ile50. Comparison of the frequency of the heterozygous genotype combination (IL4*C/*T-IL4RA*Ile50/*Val50) between patients and nonasthmatic subjects of the same ethnicity revealed opposite tendencies in Russians and Tatars. In subjects of Russian ethnic origin, this combination was detected in 31,03% of nonasthmatic individuals but in only 15% of patients (OR=0,39 (0,16-0,96), χ2=4,29; p=0,038). On the contrary, it was more frequent in asthma patients of Tatar ethnic origin (27,27%) than in controls (14,75%) (OR=2,17 (0,76-6,15), χ2=2,17; p=0,14). The combination IL4*C/*C-IL4RA*Ile50/*Ile50 was more frequent in patients with severe asthma (23,44%) than in the moderate asthma group (10,87%; χ2=4,43; p=0,035).

Thus, the case-control study of polymorphic variants of the IL4 and IL4RA genes in Bashkortostan Republic showed that the IL4*T allele is a risk marker for asthma in Tatars; the IL4RA*Ile50/*Ile50 genotype is a risk factor for asthma in Russians and is associated with the severe form of the disease; the heterozygous genotype combination IL4*C/*T-IL4RA*Ile50/*Val50 is a protective factor against asthma development in Russians; and the IL4*C/*T and IL4RA*Ile50/*Ile50 genotypes are associated with lung function abnormalities.
ASSOCIATION ANALYSIS OF THR113MET POLYMORPHISM OF THE INTERLEUKIN-9 GENE WITH ASTHMA

The interleukin-9 (IL9) gene is one of the cytokine genes located on chromosome 5q31-34, and its product plays an important role in the development of the allergic inflammatory process. Taking into account the location of the IL9 gene, the fact that IL9 gene overexpression in transgenic mice results in the development of an asthmatic phenotype, and the significantly higher expression of IL-9 mRNA and immunoreactivity in bronchial biopsies of asthmatics [30], we investigated the Thr113Met polymorphism of the IL9 gene in asthma patients and nonasthmatic individuals from Bashkortostan Republic. No statistically significant differences in allele and genotype frequency distributions of the Thr113Met polymorphism were found either between the overall groups of asthma patients and controls or between ethnically subdivided groups of patients and controls. The frequency of the IL9*Met113 allele was low in all examined groups: 14,66% in Russian nonasthmatic individuals, 13,11% in Tatars and 9% in Bashkirs. Its frequency in asthma patients was 16,67% in Russians, 16,42% in Tatars and 12,5% in Bashkirs. The IL9*Met113/*Met113 genotype was observed once in the control group (in a Russian subject), twice in asthma patients of Russian ethnic origin and once in a patient of Tatar ethnic origin. The distribution of allele frequencies in the examined ethnic groups was similar to that in the Finnish population, where the IL9*Met113 allele was found at a frequency of 15% in the whole sample and 13% in patients with asthma [31], and to the results of a study of asthma patients in Tomsk, where the allele frequency was 18,06% in healthy individuals and 18,63% in asthma patients [10]. The analysis of allele and genotype frequencies of this polymorphism in patients with different forms and severity of the disease also revealed no significant differences.
Thus, our results did not reveal an association between the Thr113Met polymorphism of the IL9 gene and asthma in Bashkortostan Republic. They support the findings of two previous studies, performed by Laitinen T. in the Finnish population and by Freidin M. in the Russian population of Tomsk [10, 31].
ASSOCIATION ANALYSIS OF -627C>A PROMOTER POLYMORPHISM OF THE INTERLEUKIN-10 GENE WITH ASTHMA

IL-10 is a cytokine that participates in both immunoproliferative and inflammatory responses and is therefore considered to be involved in the pathogenesis of asthma. The anti-inflammatory effect of IL-10 is exerted through inhibition of the synthesis of proinflammatory cytokines, chemokines and inflammatory enzymes by macrophages and human polymorphonuclear leukocytes. Low production of IL-10 has been found in alveolar macrophages and peripheral blood mononuclear cells of asthma patients. Taking into consideration literature data on altered IL-10 synthesis in asthma patients, the influence of promoter region polymorphisms on IL-10 gene expression, and the association of this polymorphism with asthma [32, 33], we investigated the potential relationship between asthma and the -627C>A promoter polymorphism of the interleukin-10 gene (1q31-32) in Bashkortostan Republic.

The analysis of allele and genotype frequencies revealed no statistically significant differences between the groups of patients and controls (p>0,05). The most frequent genotype in both groups was IL10*C/*C, found in 55,13% of asthma patients and 49,11% of healthy individuals. The IL10*C and IL10*A alleles were found at frequencies of 74,68% and 25,32% in patients with asthma and 71,6% and 28,4% in controls, respectively. Statistically significant differences were observed between nonasthmatic subjects of Tatar and Bashkir ethnic origin (χ2=5,92; p=0,015 for alleles and χ2=7,18; p=0,02 for genotypes). The frequency of the IL10*C allele was higher in Tatars (77,87%) than in Bashkirs (63%). The IL10*C/*C genotype was also found at a higher frequency in Tatars (57,38%) than in Bashkirs (36%) (χ2=6,24; p=0,013). The IL10*C/*A and IL10*A/*A genotypes were more frequent in Bashkirs than in Tatars (54% versus 40,98%; χ2=1,87; p=0,17 and 10% versus 1,64%; χ2=2,3; p=0,13, respectively). None of the examined ethnic groups of patients showed statistically significant differences in allele and genotype frequency distributions compared to nonasthmatic subjects of the same ethnic origin. Comparison of patients with different asthma severity revealed an increased frequency of the IL10*A allele (30,47% versus 21,74%; χ2=3,04; p=0,081) and of the IL10*C/*A genotype (45,31% versus 34,78%; χ2=1,76; p=0,18) in patients with severe asthma, although these differences did not reach statistical significance. S. Lim and co-workers previously demonstrated that the -627A allele/ATA haplotype (4000, -1200, and -627 polymorphisms) of the IL-10 gene was associated with low IL-10 expression in severe asthmatics [34]. A low level of IL-10 gene expression favors inflammatory, immune-mediated and profibrotic mechanisms of bronchial cell reaction.
Thus, the analysis of the -627C>A promoter polymorphism of the IL-10 gene revealed statistically significant differences in allele and genotype frequencies between healthy individuals of Tatar and Bashkir ethnic origin. Our data indicate that this IL-10 gene polymorphism is not associated with asthma in Bashkortostan Republic.
ASSOCIATION ANALYSIS OF POLYMORPHIC LOCUS -308G>A OF THE TUMOR NECROSIS FACTOR α (TNFA) GENE WITH ASTHMA

The association between polymorphisms of the tumor necrosis factor α gene (TNFA; 6p21.1-21.3) and asthma has been studied intensively in different countries. Tumor necrosis factor α is of particular interest because of its involvement in the inflammatory reaction and its elevated concentration in the airways, blood, sputum, bronchoalveolar lavage fluid and alveolar macrophages of symptomatic subjects [35, 36]. The studies reported so far remain contradictory regarding the effects of TNFA polymorphisms on asthma. The association between the guanine (G)-to-adenine (A) polymorphism at position -308 of the TNF promoter and asthma has been tested in 19 published studies that included individuals with asthma of different age groups and from a range of ethnic backgrounds. An association between the TNFA*A allele and asthma was reported in seven of these studies. Two studies showed an association between the wild-type TNFA*G allele and asthma. In nine additional studies, the authors reported no association between asthma and this single-nucleotide polymorphism [37].
Table 6. Allele and genotype frequency distributions of the -627C>A polymorphism of the IL10 gene in asthma patients and healthy individuals
Each allele and genotype cell gives ni; pi±sp (CI), %.
Group | Allele C | Allele A | N1 | Genotype CC | Genotype CA | Genotype AA | N2
Control group
Russians | 84; 72,41±4,15 (63,34-80,3) | 32; 27,59±4,15 (19,7-36,66) | 116 | 30; 51,72±6,56 (38,22-65,05) | 24; 41,38±6,47 (28,6-55,07) | 4; 6,9±3,33 (1,91-16,73) | 58
Tatars | 95; 77,87±3,76 (69,46-84,88) | 27; 22,13±3,76 (15,12-30,54) | 122 | 35; 57,38±6,33 (44,06-69,96) | 25; 40,98±6,3 (28,55-54,32) | 1; 1,64±1,63 (0,04-8,8) | 61
Bashkirs | 63; 63±4,83 (52,76-72,44) | 37; 37±4,83 (27,56-47,24) | 100 | 18; 36±6,79 (22,92-50,81) | 27; 54±7,05 (39,32-68,19) | 5; 10±4,24 (3,33-21,81) | 50
Control group (in whole) | 242; 71,6±2,45 (66,47-76,35) | 96; 28,4±2,45 (23,65-33,53) | 338 | 83; 49,11±3,85 (41,35-56,9) | 76; 44,97±3,83 (37,32-52,8) | 10; 5,92±1,82 (2,87-10,61) | 169
Asthma patients
Russians | 93; 77,5±3,81 (68,98-84,62) | 27; 22,5±3,81 (15,38-31,02) | 120 | 35; 58,33±6,36 (44,88-70,93) | 23; 38,33±6,28 (26,07-51,79) | 2; 3,33±2,32 (0,41-11,53) | 60
Tatars | 46; 69,7±5,66 (57,15-80,41) | 20; 30,3±5,66 (19,59-42,85) | 66 | 16; 48,48±8,7 (30,8-66,46) | 14; 42,42±8,6 (25,48-60,78) | 3; 9,09±5 (1,92-24,33) | 33
Bashkirs | 25; 78,12±7,31 (60,03-90,72) | 7; 21,88±7,31 (9,28-39,97) | 32 | 10; 62,5±12,1 (35,43-84,8) | 5; 31,25±11,59 (11,02-58,66) | 1; 6,25±6,05 (0,16-30,23) | 16
Asthma patients (in whole) | 233; 74,68±2,46 (69,47-79,41) | 79; 25,32±2,46 (20,59-30,53) | 312 | 86; 55,13±3,98 (46,97-63,09) | 61; 39,1±3,91 (31,4-47,23) | 9; 5,77±1,87 (2,67-10,67) | 156
Polymorphism -308G>A of the TNFA gene showed no significant association with asthma when the asthmatic subjects were compared with the nonasthmatic group. The TNFA*G allele was prevalent in both groups (87,18% and 89,05%, respectively). Analysis of heterogeneity revealed no significant differences in allele and genotype frequency distributions between subgroups separated by ethnic origin (p>0,05). The frequency of the prevalent genotype TNFA*G/*G varied from 77,59% in Russians to 80,33% in Tatars; the heterozygous genotype TNFA*G/*A was also found at a high frequency, whereas the genotype TNFA*A/*A was found only in the control group of Tatars (3,28%). There was also no statistically significant increase in the prevalence of the TNFA -308G>A polymorphism, either in the different ethnic groups from the Bashkortostan Republic compared with the corresponding controls or in relation to disease form and severity. In summary, the results of this study suggest that the -308G>A polymorphism of the TNFA gene is not a risk factor for the development of asthma in the populations of the Bashkortostan Republic, which is consistent with the data of many researchers [37].
ASSOCIATION ANALYSIS OF β2-ADRENERGIC RECEPTOR GENE POLYMORPHISMS (ARG16GLY AND GLN27GLU) WITH ASTHMA
Many asthma studies focus on the β2-adrenergic receptor gene (ADRB2) because its product interacts directly with β2-agonists and plays a central role in the β2-agonist pathway. The β2-adrenergic agonists are the most potent bronchodilators for the treatment of asthma. Altered functional activity of the β2-adrenergic receptor has been found in experimental animal models and in patients with severe asthma [38]. A total of 13 polymorphisms have been identified in the intronless ADRB2 gene, located on chromosome 5q31-32, four of which result in changes of amino-acid residues 16, 27, 34, and 164 [39, 40]. Two closely linked polymorphisms, Arg16Gly and Gln27Glu, are the most widespread in European populations [40-42]. According to preliminary studies, these polymorphisms are associated with increased bronchial responsiveness [42], total serum IgE levels, nocturnal and childhood asthma, and severe asthma [40, 41]. Current research in asthma pharmacogenetics has highlighted associations between SNPs in the β-adrenergic receptors and a modified response to regular inhaled β-agonist treatment (e.g., albuterol) [43]. We analyzed the Arg16Gly and Gln27Glu polymorphisms of the ADRB2 gene in asthma patients and nonasthmatic individuals from the Bashkortostan Republic to investigate the possible influence of ADRB2 polymorphisms on the development of asthma. The analysis of the Arg16Gly polymorphism showed no significant differences between asthma patients and controls (p>0,05) (Table 7). When the subject groups were subdivided by ethnic origin, we observed that allele frequencies differed significantly between healthy individuals of Tatar and Bashkir ethnic origin (χ2=4,29; p=0,038), whereas the difference in genotype frequencies did not reach significance (χ2=4,57, p=0,1).
A significantly lower frequency of the homozygous genotype ADRB2*Gly16/*Gly16 was observed in Tatars (26,23%) compared with Bashkirs (44%, χ2=3,85, p=0,049) and Russians (43,1%, χ2=3,75, p=0,05). The frequency of the ADRB2*Gly16 allele in Tatars (53,28%) was similar to the average worldwide frequency (54,8%); in Russians (62,93%) and Bashkirs (67%) this allele was more frequent than the worldwide average and also more frequent than in populations of Caucasian (60,7%) and Asian (46%) descent [40]. No statistically significant differences were observed in asthmatic subjects of Russian, Tatar and Bashkir ethnic origin compared with nonasthmatic individuals of the same ethnic origin (p>0,05). A tendency toward increased frequencies of the ADRB2*Gly16 allele and of ADRB2*Gly16 homozygotes was found in patients with severe asthma compared with the moderate asthma group: the frequency of ADRB2*Gly16/*Gly16 homozygotes was higher in the severe asthma group (45,31%) than in the moderate asthma group (33,7%, χ2=2,15, p=0,11). Pharmacogenetic investigations have demonstrated that carriers of the homozygous genotype ADRB2*Gly16/*Gly16 have a less effective bronchodilator response to exogenously administered β2-agonist therapy than carriers of the ADRB2*Arg16/*Arg16 genotype [43, 44], which may contribute to severe asthma. In conclusion, we have found statistically significant differences between nonasthmatic individuals of Tatar and Bashkir ethnic origin and an increased frequency of the ADRB2*Gly16/*Gly16 genotype in the group of patients with severe asthma. This finding is consistent with other studies that found an association of the ADRB2*Gly16/*Gly16 genotype with asthma severity.

Table 7. Allele and genotype frequency distributions of the Arg16Gly polymorphism of the ADRB2 gene in asthma patients and healthy individuals
Each cell gives the number of observations, then the frequency in % ± standard error, with the confidence interval in parentheses. N1 is the number of chromosomes and N2 the number of individuals.

| Group | Allele Arg16 | Allele Gly16 | N1 | Genotype Arg16/Arg16 | Genotype Arg16/Gly16 | Genotype Gly16/Gly16 | N2 |
|---|---|---|---|---|---|---|---|
| Control group, Russians | 43; 37,07±4,48 (28,29-46,53) | 73; 62,93±4,48 (53,47-71,71) | 116 | 10; 17,24±4,96 (8,59-29,43) | 23; 39,66±6,42 (27,05-53,36) | 25; 43,1±6,5 (30,16-56,77) | 58 |
| Control group, Tatars | 57; 46,72±4,52 (37,64-55,97) | 65; 53,28±4,52 (44,03-62,36) | 122 | 12; 19,67±5,09 (10,6-31,84) | 33; 54,1±6,38 (40,85-66,94) | 16; 26,23±5,63 (15,8-39,07) | 61 |
| Control group, Bashkirs | 33; 33±4,7 (23,92-43,12) | 67; 67±4,7 (56,88-76,08) | 100 | 5; 10±4,24 (3,33-21,81) | 23; 46±7,05 (31,81-60,68) | 22; 44±7,02 (29,99-58,75) | 50 |
| Control group (in whole) | 133; 39,35±2,66 (34,11-44,78) | 205; 60,65±2,66 (55,22-65,89) | 338 | 27; 15,98±2,82 (10,8-22,39) | 79; 46,75±3,84 (39,04-54,56) | 63; 37,28±3,72 (29,97-45,04) | 169 |
| Asthma patients, Russians | 47; 39,17±4,46 (30,39-48,5) | 73; 60,83±4,46 (51,5-69,61) | 120 | 10; 16,67±4,81 (8,29-28,52) | 27; 45±6,42 (32,12-58,39) | 23; 38,33±6,28 (26,07-51,79) | 60 |
| Asthma patients, Tatars | 31; 46,97±6,14 (34,56-59,66) | 35; 53,03±6,14 (40,34-65,44) | 66 | 8; 24,24±7,46 (11,09-42,26) | 15; 45,45±8,67 (28,11-63,65) | 10; 30,3±8 (15,59-48,71) | 33 |
| Asthma patients, Bashkirs | 15; 46,88±8,82 (29,09-65,26) | 17; 53,12±8,82 (34,74-70,91) | 32 | 4; 25±10,83 (7,27-52,38) | 7; 43,75±12,4 (19,75-70,12) | 5; 31,25±11,59 (11,02-58,66) | 16 |
| Asthma patients (in whole) | 125; 40,06±2,77 (34,58-45,73) | 187; 59,94±2,77 (54,27-65,42) | 312 | 29; 18,59±3,11 (12,82-25,59) | 67; 42,95±3,96 (35,06-51,11) | 60; 38,46±3,9 (30,79-46,58) | 156 |

The analysis of the Gln27Glu polymorphism of the ADRB2 gene revealed no significant differences between the total groups of asthma patients and controls from the Bashkortostan Republic. The overall frequency of the most widespread allele, ADRB2*Gln27, was similar in subjects without asthma and in asthmatic individuals (58,58% and 58,01%, respectively; Table 8), which is highly consistent with that of European populations [40-42]. However, statistically significant differences were revealed between nonasthmatic subjects of Russian and Bashkir ethnic origin (χ2=8,23, p=0,019) and between controls of Tatar and Bashkir ethnic origin (χ2=7,18, p=0,027).
The frequency of the heterozygous genotype ADRB2*Gln27/*Glu27 in Bashkirs (62%) was higher than in Russians (36,21%, χ2=7,16, p=0,0074) and Tatars (39,34%, χ2=5,64, p=0,017). The ADRB2*Glu27 allele was also found at a higher frequency in Bashkirs (49% of chromosomes) than in Russians (38,79%, χ2=2,28, p=0,13) and Tatars (37,7%, χ2=2,86, p=0,09). The analysis of allele and genotype frequencies revealed differences between Russian asthma patients and Russian controls: the frequency of the ADRB2*Gln27/*Glu27 genotype was higher in asthma patients (53,33%) than in healthy donors (36,21%) (OR=2,01 (0,96-4,2), χ2=3,5; p=0,054). In patients of Bashkir ethnic origin the genotype ADRB2*Gln27/*Gln27 was significantly more frequent (50%) than in nonasthmatic subjects of the same ethnic origin (20%) (OR=4,0 (1,2-13,28), χ2=5,5; p=0,019). When the examined groups were subdivided by disease form and severity, no statistically significant difference was observed. Thus, our investigation of the Gln27Glu polymorphism of the ADRB2 gene showed significant differences in allele and genotype frequencies between nonasthmatic subjects of Russian and Bashkir ethnic origin, as well as between those of Tatar and Bashkir ethnic origin. The ADRB2*Gln27/*Glu27 genotype was associated with asthma in Russians, whereas the ADRB2*Gln27/*Gln27 genotype was associated with the disease in Bashkirs. The analysis of genotype combinations of the two ADRB2 polymorphisms showed differences between control subjects of Russian and Bashkir ethnic origin (χ2=12,68; p=0,068). The proportion of heterozygous genotype combinations was higher in Bashkirs (36%) than in Russians (18,97%, χ2=3,97; p=0,046), as was the proportion of the combination of *Gly16/*Gly16 and *Gln27/*Glu27 (26% versus 13,79%, χ2=2,55; p=0,11).
The analysis of genotype combination frequencies found no differences between patients and controls, either in the overall samples or in the ethnically subdivided groups.
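The pairwise genotype comparisons reported for the ADRB2 gene can be checked against the counts in Table 7. The sketch below assumes a Pearson χ2 test on a 2x2 table without continuity correction. The chapter does not name the exact test variant, but this choice reproduces the reported χ2=3,85 for Gly16/Gly16 in Tatar versus Bashkir controls. The helper name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value for one degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Gly16/Gly16 carriers versus all other genotypes among controls (Table 7):
# Tatars 16 of 61 individuals, Bashkirs 22 of 50 individuals.
chi2, p = chi_square_2x2(16, 61 - 16, 22, 50 - 22)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

With these counts the statistic comes out close to the published value, and the p-value falls just below 0,05, in line with the p=0,049 given in the text.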
Table 8. Allele and genotype frequency distributions of the Gln27Glu polymorphism of the ADRB2 gene in asthma patients and healthy donors
Each cell gives the number of observations, then the frequency in % ± standard error, with the confidence interval in parentheses. N1 is the number of chromosomes and N2 the number of individuals.

| Group | Allele Gln27 | Allele Glu27 | N1 | Genotype Gln27/Gln27 | Genotype Gln27/Glu27 | Genotype Glu27/Glu27 | N2 |
|---|---|---|---|---|---|---|---|
| Control group, Russians | 71; 61,21±4,52 (51,72-70,11) | 45; 38,79±4,52 (29,89-48,28) | 116 | 25; 43,1±6,5 (30,16-56,77) | 21; 36,21±6,31 (23,99-49,88) | 12; 20,69±5,32 (11,17-33,35) | 58 |
| Control group, Tatars | 76; 62,3±4,39 (53,07-70,91) | 46; 37,7±4,39 (29,09-46,93) | 122 | 26; 42,62±6,33 (30,04-55,94) | 24; 39,34±6,25 (27,07-52,69) | 11; 18,03±4,92 (9,36-29,98) | 61 |
| Control group, Bashkirs | 51; 51±5 (40,8-61,14) | 49; 49±5 (38,86-59,2) | 100 | 10; 20±5,66 (10,03-33,72) | 31; 62±6,86 (47,17-75,35) | 9; 18±5,43 (8,58-31,44) | 50 |
| Control group (in whole) | 198; 58,58±2,68 (53,12-63,88) | 140; 41,42±2,68 (36,12-46,88) | 338 | 61; 36,09±3,69 (28,86-43,83) | 76; 44,97±3,83 (37,32-52,8) | 32; 18,93±3,01 (13,33-25,67) | 169 |
| Asthma patients, Russians | 68; 56,67±4,52 (47,31-65,68) | 52; 43,33±4,52 (34,32-52,69) | 120 | 18; 30±5,92 (18,85-43,21) | 32; 53,33±6,44 (40-66,33) | 10; 16,67±4,81 (8,29-28,52) | 60 |
| Asthma patients, Tatars | 39; 59,09±6,05 (46,29-71,05) | 27; 40,91±6,05 (28,95-53,71) | 66 | 14; 42,42±8,6 (25,48-60,78) | 11; 33,33±8,21 (17,96-51,83) | 8; 24,24±7,46 (11,09-42,26) | 33 |
| Asthma patients, Bashkirs | 23; 71,88±7,95 (53,25-86,25) | 9; 28,12±7,95 (13,75-46,75) | 32 | 8; 50±12,5 (24,65-75,35) | 7; 43,75±12,4 (19,75-70,12) | 1; 6,25±6,05 (0,16-30,23) | 16 |
| Asthma patients (in whole) | 181; 58,01±2,79 (52,32-63,55) | 131; 41,99±2,79 (36,45-47,68) | 312 | 55; 35,26±3,83 (27,79-43,3) | 71; 45,51±3,99 (37,53-53,67) | 30; 19,23±3,16 (13,37-26,3) | 156 |
The investigation of β2-adrenergic receptor gene polymorphisms revealed significant interethnic differences between healthy individuals of Tatar and Bashkir ethnic origin in the allele and genotype frequency distributions of the Arg16Gly polymorphic variant. The investigation of the Gln27Glu polymorphism of the ADRB2 gene showed statistically significant differences between nonasthmatic individuals of Russian and Bashkir ethnic origin and between those of Tatar and Bashkir ethnic origin. The ADRB2*Gln27/*Glu27 genotype was shown to be associated with asthma in Russians (OR=2,01 (0,96-4,2)), and the ADRB2*Gln27/*Gln27 genotype in Bashkirs (OR=4,0 (1,2-13,28)). A trend toward an increased frequency of the ADRB2*Gly16/*Gly16 genotype was found in patients with severe asthma.
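The odds ratios quoted above follow directly from the genotype counts in Table 8. As a rough sketch, the function below computes an odds ratio with a Woolf (log-based) confidence interval at z=1,96. The interval method is an assumption, since the chapter does not say which one was used, although it reproduces the published limits. The name `odds_ratio_ci` is illustrative.

```python
import math


def odds_ratio_ci(case_pos, case_total, ctrl_pos, ctrl_total, z=1.96):
    """Odds ratio with a Woolf (log-normal) confidence interval.

    case_pos / ctrl_pos are carriers of the genotype among patients / controls.
    """
    a, b = case_pos, case_total - case_pos
    c, d = ctrl_pos, ctrl_total - ctrl_pos
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Russians, Gln27/Glu27: 32 of 60 patients versus 21 of 58 controls (Table 8).
print(odds_ratio_ci(32, 60, 21, 58))
# Bashkirs, Gln27/Gln27: 8 of 16 patients versus 10 of 50 controls (Table 8).
print(odds_ratio_ci(8, 16, 10, 50))
```

The Russian interval includes 1, which matches the borderline p=0,054 reported for that comparison, whereas the Bashkir interval lies entirely above 1.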
ASSOCIATION ANALYSIS OF -159C>T POLYMORPHISM OF CD14 GENE AND ASTHMA
CD14 is the receptor for lipopolysaccharide and other bacterial wall-derived components and is expressed mainly on the surface of macrophages, dendritic cells and neutrophils. In 1999, Baldini et al. described a C-to-T single-nucleotide polymorphism at position -159 in the promoter of CD14. They found that children with the TT genotype had significantly higher soluble CD14 (sCD14) and lower total IgE concentrations in serum than individuals with the CT and CC genotypes [45]. A high level of CD14 in serum may stimulate the Th1 immune response and lower the IgE level. This polymorphic marker was also found to be associated with skin test positivity to aeroallergens [46, 47], higher serum IgE [48, 49] and altered CD14 expression in asthmatic subjects in some populations [50]. To test whether this variant in the promoter of the CD14 gene is related to asthma in the Bashkortostan Republic, we conducted a genetic association study, the results of which are shown in Table 9. Statistically significant differences between nonasthmatic subjects of Bashkir and Tatar ethnic origin were found in the allele frequencies of the -159C>T polymorphism of the CD14 gene (χ2=4,5; p=0,034); for genotype frequencies the difference was χ2=5,13; p=0,07. The CD14*C allele accounted for 61% of chromosomes in Bashkirs versus 46,72% in Tatars and 56,9% in Russians; the CD14*C/*C genotype was found in 34% of Bashkirs versus 18,03% of Tatars (χ2=3,71; p=0,053) and 29,3% of Russians (χ2=0,27; p=0,6). No significant differences between asthma patients and healthy controls were found when allele and genotype frequencies were analyzed. We also could not find any association between the CD14 gene polymorphism and the various ethnic groups examined. However, when the patients and controls were subdivided according to their ethnic origin, we found differences between Russian patients and controls and between Tatar patients and controls.
The frequency of the CD14*T/*T genotype was twofold higher in asthmatics of Russian ethnic origin than in controls (30% versus 15,52%), OR=2,33 (0,94-5,73), χ2=3,5; p=0,051. In summary, the present study found significant interethnic differences in allele and genotype frequencies of the -159C>T polymorphism of the CD14 gene between Tatar and Bashkir nonasthmatic individuals. Moreover, the CD14*T/*T genotype of this polymorphism was shown to be a risk marker for asthma development in Russians.
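The allele frequencies quoted for CD14 are tied to the genotype counts by simple counting: each homozygote contributes two copies of an allele and each heterozygote one. A minimal illustration, using the Bashkir control counts from Table 9 (the function name `allele_counts` is hypothetical):

```python
def allele_counts(n_cc, n_ct, n_tt):
    """Count C and T alleles from diploid genotype counts."""
    c = 2 * n_cc + n_ct  # each CC carries two C alleles, each CT one
    t = 2 * n_tt + n_ct  # each TT carries two T alleles, each CT one
    return c, t


# Bashkir controls (Table 9): 17 CC, 27 CT, 6 TT individuals (N2 = 50).
c, t = allele_counts(17, 27, 6)
n1 = c + t  # number of chromosomes, N1 = 2 * N2
print(c, t, n1, f"{100 * c / n1:.0f}%")  # prints "61 39 100 61%"
```

This reproduces the 61 C alleles on 100 chromosomes (61%) reported for Bashkirs, and the same check works for any row of the tables.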
Table 9. Allele and genotype frequency distributions of the -159C>T polymorphism of the CD14 gene in asthma patients and healthy donors
Each cell gives the number of observations, then the frequency in % ± standard error, with the confidence interval in parentheses. N1 is the number of chromosomes and N2 the number of individuals.

| Group | Allele C | Allele T | N1 | Genotype CC | Genotype CT | Genotype TT | N2 |
|---|---|---|---|---|---|---|---|
| Control group, Russians | 66; 56,9±4,6 (47,38-66,06) | 50; 43,1±4,6 (33,94-52,62) | 116 | 17; 29,31±5,98 (18,09-42,73) | 32; 55,17±6,53 (41,54-68,26) | 9; 15,52±4,75 (7,35-27,42) | 58 |
| Control group, Tatars | 57; 46,72±4,52 (37,64-55,97) | 65; 53,28±4,52 (44,03-62,36) | 122 | 11; 18,03±4,92 (9,36-29,98) | 35; 57,38±6,33 (44,06-69,96) | 15; 24,59±5,51 (14,46-37,29) | 61 |
| Control group, Bashkirs | 61; 61±4,88 (50,73-70,6) | 39; 39±4,88 (29,4-49,27) | 100 | 17; 34±6,7 (21,21-48,77) | 27; 54±7,05 (39,32-68,19) | 6; 12±4,6 (4,53-24,31) | 50 |
| Control group (in whole) | 184; 54,44±2,71 (48,96-59,84) | 154; 45,56±2,71 (40,16-51,04) | 338 | 45; 26,63±3,4 (20,13-33,96) | 94; 55,62±3,82 (47,79-63,25) | 30; 17,75±2,94 (12,31-24,36) | 169 |
| Asthma patients, Russians | 60; 50±4,56 (40,74-59,26) | 60; 50±4,56 (40,74-59,26) | 120 | 18; 30±5,92 (18,85-43,21) | 24; 40±6,32 (27,56-53,46) | 18; 30±5,92 (18,85-43,21) | 60 |
| Asthma patients, Tatars | 34; 51,52±6,15 (38,88-64,01) | 32; 48,48±6,15 (35,99-61,12) | 66 | 10; 30,3±8 (15,59-48,71) | 14; 42,42±8,6 (25,48-60,78) | 9; 27,27±7,75 (13,3-45,52) | 33 |
| Asthma patients, Bashkirs | 21; 65,62±8,4 (46,81-81,43) | 11; 34,38±8,4 (18,57-53,19) | 32 | 6; 37,5±12,1 (15,2-64,57) | 9; 56,25±12,4 (29,88-80,25) | 1; 6,25±6,05 (0,16-30,23) | 16 |
| Asthma patients (in whole) | 171; 54,81±2,82 (49,1-60,42) | 141; 45,19±2,82 (39,58-50,9) | 312 | 50; 32,05±3,74 (24,81-39,9) | 71; 45,51±3,99 (37,53-53,67) | 35; 22,44±3,34 (16,15-29,8) | 156 |
Table 10. Allele and genotype frequency distributions of the 7575G>A polymorphism of the ADAM33 gene in asthma patients and healthy donors
Each cell gives the number of observations, then the frequency in % ± standard error, with the confidence interval in parentheses. N1 is the number of chromosomes and N2 the number of individuals.

| Group | Allele G | Allele A | N1 | Genotype GG | Genotype GA | Genotype AA | N2 |
|---|---|---|---|---|---|---|---|
| Control group, Russians | 63; 63±4,83 (52,76-72,44) | 37; 37±4,83 (27,56-47,24) | 100 | 17; 34±6,7 (21,21-48,77) | 29; 58±6,98 (43,21-71,81) | 4; 8±3,84 (2,22-19,23) | 50 |
| Control group, Tatars | 68; 57,63±4,55 (48,19-66,67) | 50; 42,37±4,55 (33,33-51,81) | 118 | 16; 27,12±5,79 (16,36-40,27) | 36; 61,02±6,35 (47,44-73,45) | 7; 11,86±4,21 (4,91-22,93) | 59 |
| Control group, Bashkirs | 83; 66,94±4,22 (57,92-75,12) | 41; 33,06±4,22 (24,88-42,08) | 124 | 29; 46,77±6,34 (33,98-59,88) | 25; 40,32±6,23 (28,05-53,55) | 8; 12,9±4,26 (5,74-23,85) | 62 |
| Control group (in whole) | 214; 62,57±2,62 (57,21-67,72) | 128; 37,43±2,62 (32,28-42,79) | 338 | 62; 36,26±3,68 (29,06-43,94) | 90; 52,63±3,82 (44,87-60,3) | 19; 11,11±2,4 (6,82-16,81) | 171 |
| Asthma patients, Russians | 63; 54,31±4,63 (44,81-63,59) | 53; 45,69±4,63 (36,41-55,19) | 116 | 15; 25,86±5,75 (15,26-39,04) | 33; 56,9±6,5 (43,23-69,84) | 10; 17,24±4,96 (8,59-29,43) | 58 |
| Asthma patients, Tatars | 39; 65±6,16 (51,6-76,87) | 21; 35±6,16 (23,13-48,4) | 60 | 12; 40±8,94 (22,66-59,4) | 15; 50±9,13 (31,3-68,7) | 3; 10±5,48 (2,11-26,53) | 30 |
| Asthma patients, Bashkirs | 17; 53,12±8,82 (34,74-70,91) | 15; 46,88±8,82 (29,09-65,26) | 32 | 4; 25±10,83 (7,27-52,38) | 9; 56,25±12,4 (29,88-80,25) | 3; 18,75±9,76 (4,05-45,65) | 16 |
| Asthma patients (in whole) | 176; 59,06±2,85 (53,24-64,7) | 122; 40,94±2,85 (35,3-46,76) | 298 | 49; 32,89±3,85 (25,42-41,05) | 78; 52,35±4,09 (44,02-60,59) | 22; 14,77±2,91 (9,49-21,5) | 149 |
Association of Candidate Genes Polymorphism with Asthma…
ASSOCIATION ANALYSIS OF THE 7575G>A POLYMORPHISM OF ADAM33 GENE WITH ASTHMA
The gene encoding a disintegrin and metalloprotease domain 33 (ADAM33), located on chromosome 20p13, has been identified as a susceptibility gene for asthma using a positional cloning strategy [51]. Gene expression studies indicate that ADAM33 is expressed in bronchial smooth muscle and other muscle tissues and that its biological role includes myogenesis and bronchial hyperresponsiveness [51, 52]. Positive associations between SNPs of the ADAM33 gene and asthma have been demonstrated in African American, Hispanic, German, Korean and white populations [53]. Polymorphisms of the ADAM33 gene were subsequently associated with an excess decline in lung function in asthma patients [54, 55]. We failed to detect significant evidence of an association between asthma and the 7575G>A polymorphism in intron 6 of the ADAM33 gene in the overall samples of asthmatics and controls (Table 10). The heterozygous genotype ADAM33*G/*A was prevalent in both examined groups: 52,63% in controls and 52,35% in asthma patients. The ADAM33*G allele was found on 62,57% of chromosomes in control subjects and 59,06% in asthma patients. The distributions of genotypes and alleles for this polymorphism differed between Tatar asthma patients and controls of the same ethnic origin, although the difference did not reach statistical significance: the ADAM33*G/*G genotype was more frequent in asthmatics (40%) than in healthy controls (27,12%), OR=1,79 (95% CI 0,7-4,5), χ2=1,53, p=0,1. A tendency toward an association between the ADAM33 polymorphism and asthma severity was also observed: the frequency of the ADAM33*G/*G genotype was increased in severe asthmatics (40,98%) compared with patients with moderate asthma (27,59%) (χ2=2,9, p=0,08). Thus, the results of the investigation point to an increased frequency of the ADAM33*G/*G genotype in asthma patients of Tatar ethnic origin.
CONCLUSION
The analysis of asthma susceptibility genes in the Bashkortostan Republic has demonstrated significant genetic variation between ethnic groups in allele and genotype frequencies of the IL4, IL4RA, IL10, ADRB2 and CD14 genes and in the genetic risk factors for disease development. The genotypes IL4RA*Ile50/*Ile50, ADRB2*Gln27/*Glu27 and CD14*T/*T are shown to increase the risk of asthma in Russians. The IL4*T allele of the IL4 gene has shown significant evidence of association with the disease in the Tatar ethnic group. The genotype ADRB2*Gln27/*Gln27 is a risk factor for asthma development in Bashkirs. Moreover, the genotypes IL4*C/*T and IL4RA*Ile50/*Ile50 are associated with a low level of lung function. Significant ethnic differences have also been demonstrated in the risk factors for asthma development across the genes involved in the disease, which emphasizes the importance of, and the need for, a population genetics approach in such investigations. The high prevalence and steady growth of asthma all over the world is a global medical and social problem that still requires intensive investigation and deeper insight into asthma pathophysiology. An important goal of asthma research is to understand the genetic and environmental triggers and the genetic markers of increased and decreased risk of asthma development. Genetic studies may improve the understanding of asthma and lead to new methods to prevent, diagnose, and treat this disease.
REFERENCES
[1] Global Strategy for asthma management and prevention (GINA). Moscow, Atmosphere, 2002, 160 p.
[2] The WHO global programme. Report of a consultation to review progress and develop future activities. Geneva, World Health Organization, 2000.
[3] Chuchalin A.G. Manual on diagnostics, treatment and preventive maintenance of asthma. Moscow, OOO «NTC KVAN», 2005, 37 p.
[4] Ober C., Hoffjan S. Asthma genetics 2006: the long and winding road to gene discovery. Genes and Immunity, 2006, v. 7(2), pp. 95-100.
[5] Cookson W., Moffatt M. Making sense of asthma genes. N. Engl. J. Med., 2004, v. 351(17), pp. 1794-1796.
[6] Gao P.S., Huang S.K. Genetic aspects of asthma. Panminerva Med., 2004, v. 46, pp. 121-134.
[7] Palmer L.J., Cookson W.O. Genomic Approaches to Understanding Asthma. Genome Res., 2000, v. 10(9), pp. 1280-1287.
[8] Malerba G., Pignatti P.F. A review of asthma genetics: gene expression studies and recent candidates. J. Appl. Genet., 2005, v. 46(1), pp. 93-104.
[9] Kere J., Laitinen T. Positionally cloned susceptibility genes in allergy and asthma. Curr. Opin. Immunol., 2004, v. 16(6), pp. 689-694.
[10] Freidin M.B., Puzyrev V.P., Ogorodova L.M., Salyukova O.A., Kamaltynova E.M., Kulmanakova I.M., Petrovskaya Yu.A. Analysis of the Association between the T113M Polymorphism of the Human Interleukin 9 Gene and Bronchial Asthma. Russian Journal of Genetics, 2000, v. 36(4), p. 453.
[11] Freidin M.B., Puzyrev V.P., Ogorodova L.M., Kobyakova O.S., Kulmanakova I.M. Polymorphism of the Interleukin- and Interleukin Receptor Genes: Population Distribution and Association with Atopic Asthma. Russian Journal of Genetics, 2002, v. 38(12), p. 1452.
[12] Ivaschenko T.E., Sideleva O.G., Petrova M.A., Gembitskaya T.E., Orlov A.V., Baranov V.S. Genetic Determinants of Predisposition to Bronchial Asthma. Russian Journal of Genetics, 2001, v. 37(1), p. 94.
[13] Ivaschenko T.E., Sideleva O.G., Baranov V.S. Glutathione-S-transferase micro and theta gene polymorphisms as new risk factors of atopic bronchial asthma. J. Mol. Med., 2002, v. 80(1), pp. 39-43.
[14] Etkina I.A. Clinical and genetic associations in asthma children. Abstract of a thesis: 03.00.15, Ufa, 2000, 250 p.
[15] Mathew C.C. The isolation of high molecular weight eukaryotic DNA. In: Methods in molecular biology, ed. Walker J.M. N.Y., Humana Press, 1984, v. 2, pp. 31-34.
[16] StatSoft, Inc. (2001). STATISTICA (data analysis software system), version 6. www.statsoft.com.
[17] Marsh D.G., Neely J.D., Breazeale D.R. et al. Linkage analysis of IL4 and other chromosome 5q31.1 markers and total serum immunoglobulin E concentration. Science, 1994, v. 264, pp. 1152-1156.
[18] Daniels S.E., Bhattacharrya S., James A. et al. A genome-wide search for quantitative trait loci underlying asthma. Nature, 1996, v. 383, pp. 247-250.
[19] Walley A.J., Wiltshire S., Ellis C.M., Cookson W.O. Linkage and allelic association of chromosome 5 cytokine cluster genetics markers with atopy and asthma associated traits. Genomics, 2001, v. 72, pp. 15-20.
[20] Steinke J.W., Borish L. Th2 cytokines and asthma. Interleukin-4: its role in the pathogenesis of asthma, and targeting it for asthma treatment with interleukin-4 receptor antagonists. Respir. Res., 2001, v. 2, pp. 66-70.
[21] Rosenwasser L.J., Klemm D.J., Dresback J.K. et al. Promoter polymorphisms in the chromosome 5 gene cluster in asthma and atopy. Clin. Exp. Allergy, 1995, v. 25(2), pp. 74-78.
[22] Noguchi E., Shibasaki M., Arinami T. et al. Association of asthma and the interleukin-4 promoter gene in Japanese. Clin. Exp. Allergy, 1998, v. 28, pp. 449-453.
[23] Beghe B., Barton S., Rorke S. et al. Polymorphisms in the interleukin-4 and interleukin-4 receptor α chain genes confer susceptibility to asthma and atopy in a Caucasian population. Clin. Exp. Allergy, 2003, v. 33, pp. 1111-1117.
[24] Kabesch M., Tzotcheva I., Carr D., Hofler C., Weiland S.K., Fritzsch C., von Mutius E., Martinez F.D. A complete screening of the IL4 gene: novel polymorphisms and their association with asthma and IgE in childhood. J. Allergy Clin. Immunol., 2003, v. 112(5), pp. 893-898.
[25] Burchard E.G., Silverman E.K., Rosenwasser L.J. et al. Association between a sequence variant in the IL-4 gene promoter and FEV(1) in asthma. Am. J. Respir. Crit. Care Med., 1999, v. 160, pp. 919-922.
[26] Sandford A.J., Chagani T., Zhu S. et al. Polymorphisms in the IL4, IL4RA, and FCERIB genes and asthma severity. J. Allergy Clin. Immunol., 2000, v. 106(1 Pt 1), pp. 135-140.
[27] Ober C., Leavitt S., Tsalenko A. et al. Variation in the interleukin 4-receptor alpha gene confers susceptibility to asthma and atopy in ethnically diverse populations. Am. J. Hum. Genet., 2000, v. 66, pp. 517-526.
[28] Hytonen A.M., Lowhagen O., Arvidsson M. et al. Haplotypes of the interleukin-4 receptor alpha chain gene associate with susceptibility to and severity of atopic asthma. Clin. Exp. Allergy, 2004, v. 34(10), pp. 1570-1575.
[29] Mitsuyasu H., Yanagihara Y., Mao X. et al. Cutting edge: dominant effect of Ile50Val variant of the human IL-4 receptor alpha-chain in IgE synthesis. J. Immunol., 1999, v. 162(3), pp. 1227-1231.
[30] Shimbara A., Christodoulopoulos P., Soussi-Gounni A. et al. IL-9 and its receptor in allergic and nonallergic lung disease: increased expression in asthma. J. Allergy Clin. Immunol., 2002, v. 105, pp. 108-115.
[31] Laitinen T., Kauppi P., Ignatius J. et al. Genetic control of serum IgE levels and asthma: linkage and linkage disequilibrium studies in an isolated population. Hum. Mol. Genet., 1997, v. 6(12), pp. 2069-2076.
[32] Borish L., Aarons A., Rumbyrt J. et al. Interleukin-10 regulation in normal subjects and patients with asthma. J. Allergy Clin. Immunol., 1996, v. 97(6), pp. 1288-1296.
[33] Hang L., Hsia T., Chen W. et al. Interleukin-10 Gene -627 Allele Variants, Not Interleukin-I Beta Gene and Receptor Antagonist Gene Polymorphisms, Are Associated With Atopic Bronchial Asthma. J. Clin. Lab. Anal., 2003, v. 17, pp. 168-173.
[34] Lim S., Crawley E., Woo P., Barnes P.J. Haplotype associated with low interleukin-10 production in patients with severe asthma. Lancet, 1998, Jul 11; v. 352(9122), pp. 113-117.
[35] Gosset P., Tsicopoulos A., Wallaert B. et al. Increased secretion of TNF and IL-6 by alveolar macrophages consecutive to the development of the late asthmatic reaction. J. Allergy Clin. Immunol., 1991, v. 88, pp. 561-571.
[36] Cembrzynska-Nowak M., Szklarz E., Inglot A.D. et al. Elevated release of TNF and IFN by bronchoalveolar leukocytes from patients with bronchial asthma. Am. Rev. Respir. Dis., 1993, v. 147, pp. 291-295.
[37] Gao J., Shan G., Sun B. et al. Association between polymorphism of tumour necrosis factor alpha-308 gene promoter and asthma: a meta-analysis. Thorax, 2006, v. 61(6), pp. 466-471.
[38] Bai T.R. Abnormalities in airway smooth muscle in fatal asthma. A comparison between trachea and bronchus. Am. Rev. Respir. Dis., 1991, v. 143(2), pp. 441-443.
[39] Reihsaus E., Innis M., MacIntyre N., Liggett S.B. Mutations in gene encoding for the beta 2-adrenergic receptor in normal and asthmatic subjects. Am. J. Respir. Cell Mol. Biol., 1993, v. 8, pp. 334-339.
[40] Contopoulos-Ioannidis D.G., Manoli E.N., Ioannidis J.P.A. Meta-analysis of the association of b2-adrenergic receptor polymorphisms with asthma phenotypes. J. Allergy Clin. Immunol., 2005, v. 115(5), pp. 963-972.
[41] Holloway J.W., Dunbar P.R., Riley G.A. et al. Association of beta2-adrenergic receptor polymorphisms with severe asthma. Clin. Exp. Allergy, 2000, v. 30, pp. 1097-1103.
[42] Ulbrecht M., Hergeth M.T., Wjst M. et al. Association of beta(2)-adrenoreceptor variants with bronchial hyperresponsiveness. Am. J. Respir. Crit. Care Med., 2000, v. 161, pp. 469-474.
[43] Palmer L.J., Silverman E.S., Weiss S.T., Drazen J.M. Pharmacogenetics of Asthma. Am. J. Respir. Crit. Care Med., 2002, v. 165(7), pp. 861-866.
[44] Cho S.H., Oh S.Y., Bahn J.W. et al. Association between bronchodilating response to short-acting beta-agonist and non-synonymous single-nucleotide polymorphisms of beta-adrenoceptor gene. Clin. Exp. Allergy, 2005, v. 35(9), pp. 1162-1167.
[45] Baldini M., Lohman I.C., Halonen M. et al. A Polymorphism in the 5' flanking region of the CD14 gene is associated with circulating soluble CD14 levels and with total serum immunoglobulin E. Am. J. Respir. Cell Mol. Biol., 1999, v. 20, pp. 976-983.
[46] Koppelman G.H., Reijmerink N.E., Colin Stine O. et al. Association of a promoter polymorphism of the CD14 gene and atopy. Am. J. Respir. Crit. Care Med., 2001, v. 163, pp. 965-969.
[47] Buckova D., Holla L.I., Schuller M. CD14 promoter polymorphisms and atopic phenotypes in Czech patients with IgE-mediated allergy. Allergy, 2003, v. 58(10), pp. 1023-1026.
[48] Gao P.S., Mao X.Q., Baldini M. et al. Serum total IgE levels and CD14 on chromosome 5q31. Clin. Genet., 1999, v. 56, pp. 164-165.
[49] Sharma M., Batra J., Mabalirajan U. et al. Suggestive evidence of association of C159T functional polymorphism of the CD14 gene with atopic asthma in northern and northwestern Indian populations. Immunogenetics, 2004, v. 56(7), pp. 544-547.
[50] Zdolsek H.A., Jenmalm M.C. Reduced levels of soluble CD14 in atopic children. Clin. Exp. Allergy, 2004, v. 34(4), pp. 532-539.
[51] Van Eerdewegh P., Little R.D., Dupuis J. et al. Association of the ADAM33 gene with asthma and bronchial hyperresponsiveness. Nature, 2002, v. 418, pp. 426-430.
[52] Shapiro S.D., Owen C.A. ADAM-33 surfaces as an asthma gene. N. Engl. J. Med., 2002, v. 347(12), pp. 936-938.
[53] Lee J.H., Park H.S., Park S.W. et al. ADAM33 polymorphism: association with bronchial hyper-responsiveness in Korean asthmatics. Clin. Exp. Allergy, 2004, v. 34, pp. 860-865.
[54] Jongepier H., Boezen H.M., Dijkstra A. et al. Polymorphisms of the ADAM33 gene are associated with accelerated lung function decline in asthma. Clin. Exp. Allergy, 2004, v. 34(5), pp. 757-760.
[55] Van Diemen C.C., Postma D.S. et al. A disintegrin and metalloprotease 33 polymorphisms and lung function decline in the general population. Am. J. Respir. Crit. Care Med., 2005, v. 172(3), pp. 329-333.
[56] Kauppi P., Lindblad-Toh K., Sevon P. et al. A second-generation association study of the 5q31 cytokine gene cluster and the interleukin-4 receptor in asthma. Genomics, 2001, v. 77(1-2), pp. 35-42.
[57] Karplus T.M., Jeronimo S.M., Chang H. et al. Association between the Tumor Necrosis Factor locus and the Clinical outcome the Leishmania chagasi Infection. Infection and Immunity, 2002, v. 70(12), pp. 6919-6925.
[58] Cheng L., Enomoto T., Hirotaw T. et al. Polymorphisms in ADAM33 are associated with allergic rhinitis due to Japanese cedar pollen. Clin. Exp. Allergy, 2004, v. 34, pp. 1192-1201.
In: Molecular Polymorphism of Man Editors: S. D. Varfolomyev, G. E. Zaikov
ISBN: 978-1-60741-843-6 © 2011 Nova Science Publishers, Inc.
Chapter 4
GENES AND LANGUAGES: ARE THERE CORRELATIONS BETWEEN MTDNA DATA AND GEOGRAPHY OF ALTAY AND URAL LANGUAGES
E. Khusnutdinova and I. Kutuev
Institute of Biochemistry and Genetics of Ufa Science Centre of Russian Academy of Sciences, Russia
INTRODUCTION Correlation between social and biological features in humans is one the most interesting questions in anthropology. Biological features are inherited, but social features could even change during the lifespan. Even Darwin was interested in possible correlations between inherited features and languages, saying that if we create the human genealogy and group human races, this will let us to make the best language classification [1]. Cavalli-Sforza was one of the first investigators who dedicated his work to researches to analysis of correlations between genes and languages. He inspired many investigators to follow this way [2,3, 4, 5, 6]. He noticed that as linguistic so and genetic evolutions are quite similar in their nature and they are consecutive divergence. After the division of two populations, differentiation of languages and genes starts. Of course, the speed of these different evolutions are different but they should be correlated [6-9]. Many researchers have not accepted correlation between genes and languages and as an example they pointed to Turkic speakers, in which language interrelations are really doesn‘t correlate with races. Among Turkic speakers, there are as Caucasian (Turks, Gagauzes, Azeris) so mongoloid (Yakuts, Dolgans, Tuvinians, Tofalars) populations, as well as mixed populations (Turkmens, Uzbeks, Kirgizis, Kazakhs, Karakalpaks). But in this case, it should be noticed that in early human evolution, time correlation between language and the race was much stronger, and further migrations, leading to admixture erased this these signals [10, 11]. Altaic speakers have underwent strong race transformation. Most of the Altaic speaking people were Caucasians as well as other people belonging to Nostratic language. But many people moving from west to east were assimilated by local mongoloid peoples. South-western
Turkic speakers are still Caucasoid. The ancestors of Mongol and Tungus speakers, as well as the ancestors of Koreans and Japanese, lost their Caucasoid features as they moved eastward. The same applies to the Eskimo and Aleut [10, 11]. Some peoples of the Uralic language family acquired Mongoloid features to varying extents. Most Uralic speakers are Caucasoid; only Khants, Mansis and Yukagirs are Mongoloid [10, 11].
PHYLOGENETIC ANALYSIS OF MTDNA LINEAGES
Over many centuries, population admixture, large-scale migrations, assimilations and even extinctions of many tribes and peoples took place in the Eurasian steppe belt. The ethnogenesis of the peoples inhabiting this region is the result of admixture of peoples from Europe and Asia; two waves of migration met here. For a long time the steppe belt was a place where many modern peoples formed and where populations of different origin and culture interacted with each other (Ugric peoples of Siberia, Finns of Eastern Europe, peoples of the Near East, Turkic speakers of South Siberia and Altay, Slavic peoples of Eastern and Western Europe, and nomadic Mongols) [11]. The traces of all these demographic processes are imprinted in the genes of the modern peoples inhabiting the Eurasian steppe belt.
Modern populations of the Eurasian steppe belt are very diverse both in language and in physical anthropological type. The region is inhabited by Uralic-, Altaic- and Indo-European-speaking peoples. Modern population genetic research is based on the analysis of polymorphic markers, which now offer very high resolution and are a powerful tool for analysing demographic processes in populations and for reconstructing past admixture and migrations. The most powerful tools are mtDNA and Y chromosome analysis. Unlike nuclear DNA, which is inherited from both parents and in which genes are rearranged by recombination, mtDNA usually passes from parent to offspring unchanged. Although mtDNA also recombines, it does so with copies of itself within the same mitochondrion. Because of this, and because the mutation rate of animal mtDNA is higher than that of nuclear DNA, mtDNA is a powerful tool for tracking ancestry through females (matrilineage) and has been used in this role to track the ancestry of many species back hundreds of generations. Human mtDNA can also be used to identify individuals.
mtDNA contains 37 genes, all of which are essential for normal mitochondrial function. Thirteen of these genes provide instructions for making enzymes involved in oxidative phosphorylation. The remaining genes provide instructions for making molecules called transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs), which are chemical cousins of DNA. These types of RNA help assemble protein building blocks (amino acids) into functioning proteins [12]. In sexually reproducing organisms, mitochondria are normally inherited exclusively from the mother. The fact that mitochondrial DNA is maternally inherited enables researchers to trace maternal lineage far back in time. (Y chromosomal DNA, paternally inherited, is used in an analogous way to trace the agnate lineage.) This is accomplished in humans by sequencing one or more of the hypervariable control regions (HVR1 or HVR2) of the mitochondrial DNA. HVR1 consists of about 440 base pairs. These 440 base pairs are then compared to the control regions of other individuals (either specific people or subjects in a database) to determine
maternal lineage. The concept of Mitochondrial Eve is based on the same type of analysis, which attempts to discover the origin of humanity by tracking lineages back in time [13].
Because the base sequence of animal mtDNA changes rapidly, it is useful for assessing genetic relationships of individuals or groups within a species and also for identifying and quantifying the phylogeny (evolutionary relationships) among different species, provided they are not too distantly related. To do this, biologists determine and then compare the mtDNA sequences from different individuals or species. Data from the comparisons are used to construct a network of relationships among the sequences, which provides an estimate of the relationships among the individuals or species from which the mtDNAs were taken. This approach has limits imposed by the rate of mtDNA sequence change. In animals, the rapid rate of change makes mtDNA most useful for comparisons of individuals within species and of species that are closely or moderately closely related, among which the number of sequence differences can easily be counted. As species become more distantly related, the number of sequence differences becomes very large; changes accumulate on top of earlier changes until an accurate count becomes impossible. Because mtDNA is not highly conserved and mutates rapidly, biologists can determine and then compare mtDNA sequences among different species and use the comparisons to build an evolutionary tree for the species examined [14-25].
A large body of data on mtDNA variability in many human populations is now available [20, 26-49]. At the same time, many populations of Russia, with its complex genetic structure and very high linguistic diversity, have not been included in these studies. To date, some data on Russian and some Siberian populations have been obtained.
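The sequence comparison step described above can be illustrated with a minimal Python sketch that counts the positions at which aligned sequences differ. The sample names and the ten-base fragments are hypothetical toy values, not HVS-I data from this study, and the function names are introduced here only for illustration.

```python
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def difference_table(sequences):
    """Return {(name1, name2): differences} for every unordered pair."""
    return {
        (n1, n2): pairwise_differences(sequences[n1], sequences[n2])
        for n1, n2 in combinations(sorted(sequences), 2)
    }


# Hypothetical toy fragments (assumption): NOT real mtDNA data.
toy = {
    "sample_1": "ACCTGATTCA",
    "sample_2": "ACCTGGTTCA",
    "sample_3": "ATCTGGTTCG",
}

table = difference_table(toy)
for pair, n_diff in table.items():
    print(pair, n_diff)
```

A raw count of this kind is only meaningful while differences are few; once changes start to overwrite earlier changes, as noted above, the count underestimates the true number of substitutions.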
Populations of the Volga-Ural region, Central Asia and the Caucasus are still a blank spot in gene geography studies. Analysis of mtDNA HVS-I in combination with coding-region analysis is a very effective and reliable approach for investigating these regions and for understanding and reconstructing the historical events that led to their present-day genetic landscape. Because of their complex history and their location at the border of Europe and Asia, the North Caucasus, the Volga-Ural region and Central Asia are among the most interesting regions for population analysis. Two waves of migration meet here, one from Europe and one from Asia. Located between Europe and Asia, the Volga-Ural region has throughout its history been a place of interaction of many peoples and tribes [50, 51]. The North Caucasus is situated between the Black and Caspian Seas. With more than 50 distinct peoples and dozens of distinct languages, the Caucasus is one of the most complex linguistic and ethnic regions in the world. It is also one of the most important migration corridors from Africa to Eurasia [52]. Central Asia is likewise a complex region at the border between Europe and Asia. Numerous anthropological and ethnographic studies of Central Asian populations have demonstrated their close relationships with populations of the Volga-Ural region, especially with the Turkic-speaking Bashkirs [51].
MATERIALS AND METHODS
Blood samples were collected in 1993-2004 in the Volga-Ural region, the Caucasus and Central Asia from healthy unrelated individuals after informed consent had been obtained (table 1). DNA was extracted by the phenol-chloroform method [53].
HVS-I was sequenced between nucleotide positions (nps) 16024 and 16400 of the revised Cambridge Reference Sequence in all DNA samples [12, 54]. RFLP analysis of diagnostic mtDNA positions was performed, and mtDNA haplogroups were assigned to each sample using published criteria [27, 28, 32, 33, 42, 55-58]. Factor analysis was performed in Statistica v.7 [59].
RESULTS AND DISCUSSION
Most of the lineages found in the investigated populations (80%) belong to the western Eurasian haplogroups H, I, J, K, T, U, V, W and X [27, 28, 42, 55, 57, 60-64] (figure 1). The remaining samples belong to eastern Eurasian lineages [33, 44, 65-68].
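As an illustration of how the share of western and eastern Eurasian lineages in a sample might be tallied, consider the following minimal Python sketch. The western set is the haplogroup list given above. The eastern set is an assumption assembled from haplogroups that this chapter names as eastern Eurasian (A, C, D, F, G and Z) and is not a complete list. Assigning a sub-haplogroup by its first letter is a deliberate simplification, and the sample labels are hypothetical.

```python
from collections import Counter

# Western Eurasian haplogroups as listed in the text.
WESTERN = {"H", "I", "J", "K", "T", "U", "V", "W", "X"}
# Assumption: an incomplete eastern Eurasian set taken from haplogroups
# the chapter mentions as eastern (A, C, D, F, G, Z).
EASTERN = {"A", "C", "D", "F", "G", "Z"}


def lineage_shares(haplogroups):
    """Return the fraction of western, eastern and other lineages."""
    if not haplogroups:
        raise ValueError("at least one haplogroup label is required")
    counts = Counter()
    for label in haplogroups:
        root = label[0].upper()  # simplification: U4 -> U, G2a -> G
        if root in WESTERN:
            counts["western"] += 1
        elif root in EASTERN:
            counts["eastern"] += 1
        else:
            counts["other"] += 1
    total = len(haplogroups)
    return {key: counts[key] / total for key in ("western", "eastern", "other")}


# Hypothetical sample of ten assigned haplogroups (NOT study data).
sample = ["H", "H", "U4", "T1", "J1", "D", "G2a", "F", "H", "L3"]
shares = lineage_shares(sample)
print(shares)
```

Real haplogroup assignment relies on the diagnostic positions cited in the methods; the first-letter rule here only serves to show the bookkeeping.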
Figure 1. Phylogenetic tree of mtDNA clusters. [Tree diagram not reproduced. Region labels: Africa, Eastern Eurasia, Western Eurasia. Haplogroup labels: L1, L2, L3, M, M1, N, R, A, B, C, D, E, F, G, H, I, J, K, T, U, U6, V, W, X, Y, Z.]
Geographic analysis of haplogroup frequencies in Altaic-speaking populations revealed a gradient of increasing frequency of eastern Eurasian lineages from west to east. The frequency of eastern Eurasian mtDNA lineages varies from 1% in Gagauzes to 99% in Yakuts and Dolgans. We found no correlation between mtDNA haplogroup frequencies and the geographical distribution of Turkic languages. Moreover, it is quite obvious that
the linguistic affiliation of populations (for example, by language subgroup) plays a much smaller role than geography (figure 2).
Figure 2. Maternal lineages of western and eastern Eurasian origin among 18 Turkic-speaking populations.
The same applies to Uralic speakers. The frequency of eastern Eurasian lineages varies from 0% in Estonians to 80% in Nganasans. The only exceptions are Khants, Mansis and Selkups, in which a relatively high frequency of common western Eurasian lineages (60-70%) was found. We found a high frequency of haplogroup U4 and a low frequency of haplogroup W, which is typical of populations of the Volga-Ural region. These data most likely indicate gene flow from west to east rather than from east to west [69-71].
One of the advantages of mtDNA is that it allows coalescence times to be estimated. An analysis based on coalescent theory seeks to estimate the time elapsed between the introduction of a mutation and the observed distribution of an allele or gene in a population. This period equals the time since the most recent common ancestor existed. Under genetic drift alone, every finite set of genes or alleles has a "coalescent point" at which all descendants converge on a single ancestor (i.e., they "coalesce"). This fact can be used to derive the rate of fixation of a neutral allele (that is, one not under any form of selection) for a population of varying size (provided that it is finite and nonzero). Because the effect of natural selection is stipulated to be negligible, the probability at any given time that an allele will ultimately become fixed at its locus is simply its frequency p in the population at that time.
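The statement that a neutral allele is fixed with probability equal to its current frequency p can be checked numerically. The following is a minimal sketch, assuming a Wright-Fisher model of pure drift in a population of constant size; the model choice, the function name and all parameter values are assumptions made for illustration and are not taken from the analyses in this chapter.

```python
import random


def fixation_fraction(pop_size, start_freq, replicates, seed=0):
    """Estimate the fixation probability of a neutral allele by simulation.

    Each generation, every one of the pop_size gene copies draws its
    parent at random from the previous generation (pure genetic drift).
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        count = round(start_freq * pop_size)  # copies of the allele
        while 0 < count < pop_size:
            p = count / pop_size
            # Binomial sampling of the next generation.
            count = sum(1 for _ in range(pop_size) if rng.random() < p)
        if count == pop_size:
            fixed += 1
    return fixed / replicates


# Hypothetical parameters: 20 gene copies, allele starting at p = 0.25.
estimate = fixation_fraction(pop_size=20, start_freq=0.25,
                             replicates=2000, seed=1)
print("simulated fixation fraction:", estimate)
```

With enough replicates the simulated fraction should lie close to the starting frequency of 0.25, whatever small population size is chosen.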
The coalescence time for haplogroup H in the Volga-Ural region is 20,036±4,250 years, which corresponds to the time of population re-expansion in the Urals after the glacial maximum. The coalescence times for haplogroups J1 and T1 in Caucasus populations were 30,000 and 20,000 years, respectively, which exceed the boundary of the Holocene several times over. Thus, there are reasons to believe that these two "marker" haplotypes were part of the Caucasus gene pool long before the Neolithic period began. The alternative explanation is that the Neolithic influx from Upper Mesopotamia into the Caucasus was so massive that it carried with it much of the pre-existing diversity.
The principal component analysis (figure 4) based on mtDNA haplogroup frequencies in populations of the Volga-Ural region, the North Caucasus and Central Asia explains 52.8% of the variability in haplogroup frequencies. The results are consistent with the east-west gradient of mtDNA haplogroup frequencies along the Eurasian steppe belt.
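The principle of a principal component analysis of haplogroup frequencies can be sketched as follows. This is not the procedure used in this study, which was carried out in Statistica; it is a minimal illustration using a singular value decomposition in NumPy. The population names, the three haplogroup columns and all frequencies are hypothetical.

```python
import numpy as np


def pca(freqs, n_components=2):
    """Return component scores and the explained variance fractions."""
    data = np.asarray(freqs, dtype=float)
    centered = data - data.mean(axis=0)  # centre each haplogroup column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]


# Hypothetical frequencies (NOT study data); columns: H, U, D.
populations = ["west_1", "west_2", "east_1", "east_2"]
freqs = [
    [0.45, 0.10, 0.02],
    [0.40, 0.12, 0.05],
    [0.10, 0.03, 0.40],
    [0.08, 0.02, 0.45],
]

scores, explained = pca(freqs, n_components=2)
for name, row in zip(populations, scores):
    print(name, row)
print("explained variance fractions:", explained)
```

In this toy example the first component separates the two hypothetical western populations from the two eastern ones, which mirrors the kind of geographic clustering discussed in the text. The sign of each component is arbitrary.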
Figure 3. Maternal lineages of western and eastern Eurasian origin among 16 Uralic-speaking populations.
Populations on the plot cluster according to geography rather than linguistic affiliation. The proximity of Nogays and Bashkirs on the plot can be explained by the high percentage of eastern Eurasian lineages in both populations. The high frequency of eastern Eurasian lineages in Nogays is not surprising, since they moved to and settled in the Caucasus quite recently [69]. Analysis of three principal components (64.9%) did not reveal major changes (figure 5). In the projection on the third component, Udmurts are located quite far
from other populations. This can be explained by the relatively high frequency of haplogroup T in this population (0.238). The majority of mtDNA haplogroups found in populations of the Volga-Ural region are of western Eurasian origin. Among European populations, the frequency of lineages of eastern Eurasian origin is highest in Eastern Europe. The high frequency of haplogroups G, D, C, Z and F in Turkic-speaking Bashkirs, Uralic-speaking Udmurts and Permian Komis demonstrates gene flow from Siberia and Central Asia to the Volga-Ural region [72].
Figure 4. Principal component analysis (2 dimensions) of mtDNA haplogroup frequencies in populations of Volga-Ural, Central Asia and the Caucasus.
Figure 5. Principal component analysis (3 dimensions) of mtDNA haplogroup frequencies in populations of Volga-Ural, Central Asia and the Caucasus. [Plot not reproduced. Axes: PC1 (30.6%), PC2 (22.3%), PC3 (12.0%). Populations plotted: Karachays, Kumyks, Chuvashes, Maris, Tatars, Nogays, Komi Syrian, Mordvinians, Bashkirs, Uighurs, Kazakhs, Uzbeks, Komi Permyan, Udmurts.]
Bashkirs from the Perm oblast of Russia have a high percentage of haplogroups F (11.1%), D (13.9%) and G2a (6.9%) [69]. These haplogroups are typical of populations of Central Asia [65, 67]. This suggests that Bulgars, Ugric [73] and Central Asian [74] peoples played a major role in the ethnogenesis of this subpopulation.
An interesting finding is the high frequency of western Eurasian lineages in Uighurs (~45%). This is notable because the population is surrounded by Mongols, Chinese, Kirghiz and Altaians, in which the frequency of western Eurasian lineages is less than 15% [46, 49, 67, 75, 76]. Detailed analysis revealed a high frequency of typical Anatolian, Iranian and South Caucasus lineages, which suggests that at least part of the gene pool of modern Uighurs is shared with the Indo-Iranian nomads of Neolithic times.
It is now quite obvious that a so-called proto-Altaic or proto-Uralic genetic substrate does not exist. Even low-resolution analysis based on haplogroup frequency data demonstrates great differences between the westernmost (Gagauzes) and easternmost (Dolgans) Turkic speakers. When all the populations inhabiting Eurasia (including those belonging to other language families) are analysed, the genetic landscape does not change drastically. This means that the modern genetic landscape was formed mainly by demographic processes in the populations inhabiting this vast area. Altaic and Uralic speakers inhabiting the European part of the continent are characterized by a high frequency of western Eurasian lineages; those inhabiting the Asian part are characterized by a high frequency of eastern Eurasian lineages. The only exceptions are Kalmyks and Nogays living in the North Caucasus [77, 78]. The low level of eastern Eurasian mtDNA lineages in Gagauzes, Turks, Azeris and Kumyks supports the hypothesis of a recent westward expansion of Turkic languages.
At the same time, it is possible that the gene pool of the proto-Turkic people consisted mostly of western Eurasian lineages, and that subsequent admixture with populations rich in eastern Eurasian lineages led to a drastic increase of the latter in Turkic-speaking populations living in the Asian part of Eurasia.
There are a few exceptions to the west-east gradient of mtDNA haplogroup frequencies. One is the Nogays, who inhabit the northern part of Dagestan and Kabardino-Balkaria. The frequency of eastern Eurasian haplogroups in Nogays is up to 40%, whereas in the neighbouring populations (Kumyks, Karachays and Balkars) it is lower than 7% [79]. This can be explained by the history of the Nogays, who are remnants of the Nogay Horde, which comprised Turkic, Ugric and Mongol tribes. The Nogays formed as an ethnic group quite recently (14th-15th centuries) [78]. Another exception is the Kalmyks, who came to the North Caucasus some three centuries ago [77]. Interestingly, the mtDNA lineages found in other Caucasus populations do not belong to clade A, whose frequency in Nogays and Kalmyks is up to 6% [79]. This shows that eastern Eurasian lineages penetrated into the European part of the continent through mass migrations of Mongols, while admixture of nomadic Mongols with the autochthonous populations of the Caucasus was insignificant.
A further exception, among Uralic speakers, is the Khants, Mansis and Selkups, in which up to 70% of lineages are western Eurasian [80]. This observation is not the result of admixture with Russians: detailed analysis of the haplogroup spectra shows a pattern typical of Finno-Ugric populations. This points to recent mass circumpolar migrations, which is also supported by the phylogeography of Y chromosomal haplogroup N3 [81-83]. The frequency of eastern Eurasian lineages in the westernmost Baltic Finnic populations is less than 1% [71, 84, 85].
Geographic location and demographic processes, rather than linguistic or cultural barriers, played the major role in shaping the modern genetic landscape [69]. This means that Uralic and Altaic speakers inhabiting the same region (Volga-Ural) have much more in common than Uralic speakers from distant regions of Eurasia. Similar results have been obtained earlier for other populations [71, 85-87].
CONCLUSION
The modern ethnic landscape of the Eurasian steppe belt is diverse and was formed over several thousand years. Arrivals and departures, assimilations, admixture and migrations of numerous tribes formed the heterogeneous picture of the present-day region. A priori, we expected that peoples speaking languages of the same language family or subgroup would share at least a slight common genetic pattern. In the present study of Altaic and Uralic speakers, we found no common patterns among speakers within the same language family or language group. We conclude that geography rather than linguistic affiliation plays the major role in the genetic relationships of Altaic- and Uralic-speaking populations.
REFERENCES
[1] Darwin C. The origin of species. London, John Murray, 1859.
[2] Barbujani G., Sokal RR. Zones of sharp genetic change in Europe are also linguistic boundaries. Proc. Natl. Acad. Sci. U.S.A. 1990, vol. 87, № 5, pp. 1816-9.
[3] Villems R., Adojaan M., Kivisild T., Metspalu E., Parik J., Pielberg G., Rootsi S., Tambets K., Tolk HV. Reconstruction of maternal lineages of Finno-Ugric speaking people and some remarks on their paternal inheritance. In: Wiik K., Julku K., editors. The roots of peoples and languages of Northern Eurasia I. Turku: Societas Historiae Fenno-Ugricae; 1998. pp. 180-200.
[4] Diamond J., Bellwood P. Farmers and their languages: the first expansions. Science. 2003, vol. 300, № 5619, pp. 597-603.
[5] Arnaiz Villena A., Martinez Laso J., Alonso Garcia J. The correlation between languages and genes: the Usko-Mediterranean peoples. Hum. Immunol. 2001, vol. 62, № 9, pp. 1051-61.
[6] Sokal RR. Genetic, geographic, and linguistic distances in Europe. Proc. Natl. Acad. Sci. U.S.A. 1988, vol. 85, № 5, pp. 1722-6.
[7] Cavalli-Sforza LL., Menozzi P., Piazza A. The history and geography of human genes. Princeton, N.J., Princeton University Press, 1994, xi, 541, 518 p.
[8] Cavalli-Sforza LL. Genes, peoples, and languages. Proc. Natl. Acad. Sci. U.S.A. 1997, vol. 94, № 15, pp. 7719-7724.
[9] Cavalli-Sforza LL., Piazza A., Menozzi P., Mountain J. Reconstruction of human evolution: bringing together genetic, archaeological, and linguistic data. Proc. Natl. Acad. Sci. U.S.A. 1988, vol. 85, № 16, pp. 6002-6.
[10] Puchkov P.I. Divergence of languages and the problem of correlation between languages and races. In: Peoples and religions of the world. Moscow, 1998.
[11] Tishkov V.A. Peoples and religions of the world. Moscow. The Big Russian Encyclopedia, 1998.
[12] Anderson S., Bankier AT., Barrell BG., de Bruijn MH., Coulson AR., Drouin J., Eperon IC., Nierlich DP., Roe BA., Sanger F., Schreier PH., Smith AJ., Staden R., Young IG. Sequence and organization of the human mitochondrial genome. Nature. 1981, vol. 290, № 5806, pp. 457-65.
[13] Horai S. Evolution and the origins of man: clues from complete sequences of hominoid mitochondrial DNA. Southeast Asian J. Trop. Med. Public Health. 1995, vol. 26, № Suppl 1, pp. 146-54.
[14] Giles RE., Blanc H., Cann HM., Wallace DC. Maternal inheritance of human mitochondrial DNA. Proc. Natl. Acad. Sci. U.S.A. 1980, vol. 77, № 11, pp. 6715-9.
[15] Ward RH., Frazier B., Dew Jager K., Paabo S. Extensive mitochondrial diversity within a single Amerindian tribe. Proc. Natl. Acad. Sci. U.S.A. 1991, vol. 88, pp. 8720-8724.
[16] Torroni A., Sukernik RI., Schurr TG., Starikorskaya YB., Cabell MF., Crawford MH., Comuzzie AG., Wallace DC. mtDNA variation of aboriginal Siberians reveals distinct genetic affinities with Native Americans. Am. J. Hum. Genet. 1993, vol. 53, № 3, pp. 591-608.
[17] Wallace DC. Mitotic segregation of mitochondrial DNAs in human cell hybrids and expression of chloramphenicol resistance. Somat. Cell Mol. Genet. 1986, vol. 12, № 1, pp. 41-9.
[18] Wallace DC. Structure and evolution of organelle genomes. Microbiol. Rev. 1982, vol. 46, № 2, pp. 208-40.
[19] Ashley R., Peterson E., Abbo H., Gold D., Corey L. Comparison of monoclonal antibodies for rapid detection of cytomegalovirus in spin-amplified plate cultures. J. Clin. Microbiol. 1989, vol. 27, № 12, pp. 2858-60.
[20] Forster P. Ice Ages and the mitochondrial DNA chronology of human dispersals: a review. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2004, vol. 359, № 1442, pp. 255-264.
[21] Excoffier L. Evolution of human mitochondrial DNA: evidence for departure from a pure neutral model of populations at equilibrium. J. Mol. Evol. 1990, vol. 30, № 2, pp. 125-39.
[22] Mishmar D., Ruiz Pesini E., Golik P., Macaulay V., Clark AG., Hosseini S., Brandon M., Easley K., Chen E., Brown MD., Sukernik RI., Olckers A., Wallace DC. Natural selection shaped regional mtDNA variation in humans. Proc. Natl. Acad. Sci. U.S.A. 2003, vol. 100, № 1, pp. 171-176.
[23] Kivisild T., Shen P., Wall DP., Do B., Sung R., Davis KK., Passarino G., Underhill PA., Scharfe C., Torroni A., Scozzari R., Modiano D., Coppa A., de Knijff P., Feldman MW., Cavalli-Sforza LL., Oefner PJ. The role of selection in the evolution of human mitochondrial genomes. Genetics. 2005.
[24] Wallace DC., Brown MD., Lott MT. Mitochondrial DNA variation in human evolution and disease. Gene. 1999, vol. 238, № 1, pp. 211-30.
[25] Horai S., Hayasaka K. Intraspecific nucleotide sequence differences in the major noncoding region of human mitochondrial DNA. Am. J. Hum. Genet. 1990, vol. 46, № 4, pp. 828-42.
[26] Kivisild T., Rootsi S., Metspalu M., Mastana S., Kaldma K., Parik J., Metspalu E., Adojaan M., Tolk HV., Stepanov V., Gölge M., Usanga E., Papiha SS., Cinnioglu C., King R., Cavalli-Sforza L., Underhill PA., Villems R. The genetic heritage of the earliest settlers persists both in Indian tribal and caste populations. Am. J. Hum. Genet. 2003, vol. 72, pp. 313-332.
[27] Achilli A., Rengo C., Battaglia V., Pala M., Olivieri A., Fornarino S., Magri C., Scozzari R., Babudri N., Santachiara Benerecetti AS., Bandelt HJ., Semino O., Torroni A. Saami and Berbers: an unexpected mitochondrial DNA link. Am. J. Hum. Genet. 2005, vol. 76, № 5, pp. 883-886.
[28] Achilli A., Rengo C., Magri C., Battaglia V., Olivieri A., Scozzari R., Cruciani F., Zeviani M., Briem E., Carelli V., Moral P., Dugoujon JM., Roostalu U., Loogvali EL., Kivisild T., Bandelt HJ., Richards M., Villems R., Santachiara Benerecetti AS., Semino O., Torroni A. The molecular dissection of mtDNA haplogroup H confirms that the Franco-Cantabrian glacial refuge was a major source for the European gene pool. Am. J. Hum. Genet. 2004, vol. 75, № 5, pp. 910-8.
[29] Cann RL., Stoneking M., Wilson AC. Mitochondrial DNA and human evolution. Nature. 1987, vol. 325, № 6099, pp. 31-6.
[30] Derbeneva OA., Starikovskaia EB., Volod'ko NV., Wallace DC., Sukernik RI. [Mitochondrial DNA variation in Kets and Nganasans and the early peoples of Northern Eurasia]. Genetika. 2002, vol. 38, № 11, pp. 1554-60.
[31] Derenko MV., Grzybowski T., Malyarchuk BA., Dambueva IK., Denisova GA., Czarny J., Dorzhu CM., Kakpakov VT., Miscicka Sliwka D., Wozniak M., Zakharov IA. Diversity of mitochondrial DNA lineages in South Siberia. Ann. Hum. Genet. 2003, vol. 67, № 5, pp. 391-411.
[32] Finnila S., Lehtonen MS., Majamaa K. Phylogenetic network for European mtDNA. Am. J. Hum. Genet. 2001, vol. 68, № 6, pp. 1475-1484.
[33] Kivisild T., Helle-Viivi T., Parik J., Yiming WS., Surinder SP., Bandelt HS., Villems R. The emerging limbs and twigs of the East Asian mtDNA tree. Mol. Biol. Evol. 2002, vol. 19, № 10, pp. 1737-1751 (erratum 20:162).
[34] Kong QP., Yao YG., Sun C., Bandelt HJ., Zhu CL., Zhang YP. Phylogeny of East Asian mitochondrial DNA lineages inferred from complete sequences. Am. J. Hum. Genet. 2003, vol. 73, № 3, pp. 671-676.
[35] Loogvali EL., Roostalu U., Malyarchuk BA., Derenko MV., Kivisild T., Metspalu E., Tambets K., Reidla M., Tolk HV., Parik J., Pennarun E., Laos S., Lunkina A., Golubenko M., Barac L., Pericic M., Balanovsky OP., Gusar V., Khusnutdinova EK., Stepanov V., Puzyrev V., Rudan P., Balanovska EV., Grechanina E., Richard C., Moisan JP., Chaventre A., Anagnou NP., Pappa KI., Michalodimitrakis EN., Claustres M., Golge M., Mikerezi I., Usanga E., Villems R. Disuniting uniformity: a pied cladistic canvas of mtDNA haplogroup H in Eurasia. Mol. Biol. Evol. 2004, vol. 21, № 11, pp. 2012-21.
[36] Macaulay V., Hill C., Achilli A., Rengo C., Clarke D., Meehan W., Blackburn J., Semino O., Scozzari R., Cruciani F., Taha A., Shaari NK., Raja JM., Ismail P., Zainuddin Z., Goodwin W., Bulbeck D., Bandelt HJ., Oppenheimer S., Torroni A.,
Richards M. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science. 2005, vol. 308, № 5724, pp. 1034-6.
[37] Malyarchuk BA., Derenko MV. Mitochondrial DNA variability in Russians and Ukrainians: Implications to the origin of the Eastern Slavs. Ann. Hum. Genet. 2001, vol. 65, № 1, pp. 63-78.
[38] Malyarchuk BA., Grzybowski T., Derenko MV., Czarny J., Drobnic K., Miscicka Sliwka D. Mitochondrial DNA variability in Bosnians and Slovenians. Ann. Hum. Genet. 2003, vol. 67, № Pt 5, pp. 412-25.
[39] Metspalu M., Kivisild T., Metspalu E., Parik J., Hudjashov G., Kaldma K., Serk P., Karmin M., Behar DM., Gilbert MTP., Endicott P., Mastana S., Papiha SS., Skorecki K., Torroni A., Villems R. Most of the extant mtDNA boundaries in South and Southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans. BMC Genet. 2004, vol. 5, № 1, pp. 26.
[40] Pakendorf B., Wiebe V., Tarskaia LA., Spitsyn VA., Soodyall H., Rodewald A., Stoneking M. Mitochondrial DNA evidence for admixed origins of central Siberian populations. Am. J. Phys. Anthropol. 2003, vol. 120, № 3, pp. 211-24.
[41] Quintana Murci L., Chaix R., Wells S., Behar D., Sayar H., Scozzari R., Rengo C., Al Zahery N., Semino O., Santachiara Benerecetti AS., Coppa A., Ayub Q., Mohyuddin A., Tyler Smith C., Mehdi Q., Torroni A., McElreavey K. Where West meets East: The complex mtDNA landscape of the Southwest and Central Asian corridor. Am. J. Hum. Genet. 2004, vol. 74, pp. 827-845.
[42] Richards M., Macaulay V., Hickey E., Vega E., Sykes B., Guida V., Rengo C., Sellitto D., Cruciani F., Kivisild T., Villems R., Thomas M., Rychkov S., Rychkov O., Rychkov Y., Golge M., Dimitrov D., Hill E., Bradley D., Romano V., Cali F., Vona G., Demaine A., Papiha S., Triantaphyllidis C., Stefanescu G. Tracing European founder lineages in the Near Eastern mtDNA pool. Am. J. Hum. Genet. 2000, vol. 67, № 5, pp. 1251-1276.
[43] Schurr TG., Wallace DC. Mitochondrial DNA diversity in Southeast Asian populations. Hum. Biol. 2002, vol. 74, № 3, pp. 431-52.
[44] Starikovskaya EB., Sukernik RI., Derbeneva OA., Volodko NV., Ruiz Pesini E., Torroni A., Brown MD., Lott MT., Hosseini SH., Huoponen K., Wallace DC. Mitochondrial DNA diversity in indigenous populations of the southern extent of Siberia, and the origins of Native American haplogroups. Ann. Hum. Genet. 2005, vol. 69, № Pt 1, pp. 67-89.
[45] Tambets K., Rootsi S., Kivisild T., Help H., Serk P., Loogvali EL., Tolk HV., Reidla M., Metspalu E., Pliss L., Balanovsky O., Pshenichnov A., Balanovska E., Gubina M., Zhadanov S., Osipova L., Damba L., Voevoda M., Kutuev I., Bermisheva M., Khusnutdinova E., Gusar V., Grechanina E., Parik J., Pennarun E., Richard C., Chaventre A., Moisan JP., Barac L., Pericic M., Rudan P., Terzic R., Mikerezi I., Krumina A., Baumanis V., Koziel S., Rickards O., De Stefano GF., Anagnou N., Pappa KI., Michalodimitrakis E., Ferak V., Furedi S., Komel R., Beckman L., Villems R. The Western and Eastern Roots of the Saami - the Story of Genetic "Outliers" Told by Mitochondrial DNA and Y Chromosomes. Am. J. Hum. Genet. 2004, vol. 74, № 4, pp. 661-82.
[46] Yao YG., Zhang YP. Phylogeographic analysis of mtDNA variation in four ethnic populations from Yunnan Province: new data and a reappraisal. J. Hum. Genet. 2002, vol. 47, pp. 311-318.
[47] Kivisild T., Reidla M., Metspalu E., Rosa A., Brehm A., Pennarun E., Parik J., Geberhiwot T., Usanga E., Villems R. Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am. J. Hum. Genet. 2004, vol. 75, № 5, pp. 752-70.
[48] Kivisild T., Rootsi S., Metspalu M., Metspalu E., Parik J., Kaldma K., Usanga E., Mastana S., Papiha SS., Villems R. The genetics of language and farming spread in India. In: Bellwood P., Renfrew C., editors. Examining the farming/language dispersal hypothesis. Cambridge: The McDonald Institute for Archaeological Research; 2003. pp. 215-222.
[49] Comas D., Plaza S., Wells RS., Yuldaseva N., Lao O., Calafell F., Bertranpetit J. Admixture, migrations, and dispersals in Central Asia: evidence from maternal DNA lineages. Eur. J. Hum. Genet. 2004, vol. 12, № 6, pp. 495-504.
[50] Alexeev V.P. Geography of human races. Moscow. 1974. 351 p.
[51] Kuzeev R.G. Peoples of Volga and Urals. Moscow. 1985, 308 p.
[52] Kosven M.O. Peoples of the Caucasus. Ed. Odr. Moscow. 1960. 612 p.
[53] Sambrook J., Fritsch EF., Maniatis T. Molecular cloning: a laboratory manual. Cold Spring Harbor, NY, Cold Spring Harbor Laboratory Press, 1989.
[54] Andrews RM., Kubacka I., Chinnery PF., Lightowlers RN., Turnbull DM., Howell N. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat. Genet. 1999, vol. 23, № 2, pp. 147.
[55] Torroni A., Huoponen K., Francalacci P., Petrozzi M., Morelli L., Scozzari R., Obinu D., Savontaus ML., Wallace DC. Classification of European mtDNAs from an analysis of three European populations. Genetics. 1996, vol. 144, № 4, pp. 1835-50.
[56] Richards MB., Macaulay VA., Bandelt HJ., Sykes BC. Phylogeography of mitochondrial DNA in western Europe. Ann. Hum. Genet. 1998, vol. 62, № Pt 3, pp. 241-60.
[57] Macaulay VA., Richards MB., Hickey E., Vega E., Cruciani F., Guida V., Scozzari R., Bonne Tamir B., Sykes B., Torroni A. The emerging tree of West Eurasian mtDNAs: a synthesis of control-region sequences and RFLPs. Am. J. Hum. Genet. 1999, vol. 64, № 1, pp. 232-249.
[58] Tanaka M., Cabrera VM., Gonzalez AM., Larruga JM., Takeyasu T., Fuku N., Guo LJ., Hirose R., Fujita Y., Kurata M., Shinoda K., Umetsu K., Yamada Y., Oshida Y., Sato Y., Hattori N., Mizuno Y., Arai Y., Hirose N., Ohta S., Ogawa O., Tanaka Y., Kawamori R., Shamoto Nagai M., Maruyama W., Shimokata H., Suzuki R., Shimodaira H. Mitochondrial genome variation in eastern Asia and the peopling of Japan. Genome Res. 2004, vol. 14, № 10a, pp. 1832-1850.
[59] StatSoft Inc. STATISTICA (data analysis software system), version 7. 2004.
[60] Richards M., Corte Real H., Forster P., Macaulay V., Wilkinson Herbots H., Demaine A., Papiha S., Hedges R., Bandelt HJ., Sykes B. Paleolithic and neolithic lineages in the European mitochondrial gene pool. Am. J. Hum. Genet. 1996, vol. 59, № 1, pp. 185-203.
[61] Richards M., Macaulay V., Torroni A., Bandelt HJ. In search of geographical patterns in European mitochondrial DNA. Am. J. Hum. Genet. 2002, vol. 71, № 5, pp. 1168-74.
142
E. Khusnutdinova and I. Kutuev
[62] Metspalu E., Kivisild T., Kaldma K., Parik J., Reidla M., Tambets K., Villems R., The Trans-Caucasus and the Expansion of the Caucasoid-Specific Human Mitochondrial DNA. In: Papiha S., Deka R., Chakraborty R., editors. Genomic Diversity: Application in Human Population Genetics. New York: Kluwer Academic / Plenum Publishers; 1999. P., 121-134. [63] Torroni A., Bandelt HJ., D'Urbano L., Lahermo P., Moral P., Sellitto D., Rengo C., Forster P., Savontaus M., L., Bonne Tamir B., Scozzari R., mtDNA analysis reveals a major late Paleolithic population expansion from southwestern to northeastern Europe. Am. J. Hum. Genet. 1998, vol. 62, № 5, pp. 1137-52. [64] Torroni A., Richards M., Macaulay V., Forster P., Villems R., Norby S., Savontaus M., L., Huoponen K., Scozzari R., Bandelt HJ., mtDNA haplogroups and frequency patterns in Europe. Am. J. Hum. Genet. 2000, vol. 66, № 3, pp. 1173-7. [65] Comas D., Calafell F., Mateu E., Perez Lezaun A., Bosch E., Martinez Arias R., Clarimon J., Facchini F., Fiori G., Luiselli D., Pettener D., Bertranpetit J., Trading genes along the silk road: mtDNA sequences and the origin of Central Asian populations. Am. J. Hum. Genet. 1998, vol. 63, № 6, pp. 1824-38. [66] Horai S., Murayama K., Hayasaka K., Matsubayashi S., Hattori Y., Fucharoen G., Harihara S., Park K., S., Omoto K., Pan IH., mtDNA polymorphism in East Asian Populations, with special reference to the peopling of Japan. Am. J. Hum. Genet. 1996, vol. 59, № 3, pp. 579-90. [67] Kolman C., Sambuughin N., Bermingham E., Mitochondrial DNA analysis of Mongolian populations and implications for the origin of New World founders. Genetics. 1996, vol. 142, № 4, pp. 1321-34. [68] Schurr T., G., Sukernik R., I., Starikovskaya Y., B., Wallace DC., Mitochondrial DNA variation in Koryaks and Itel'men: population replacement in the Okhotsk Sea-Bering Sea region during the Neolithic. Am. J. Phys. Anthropol. 1999, vol. 108, № 1, pp. 1-39. 
[69] Bermisheva M., Viktorova T., Tambets K., Villems R., Khusnutdinova E. Diversity of mtDNA in peoples of Volga-Ural region of Russia. Molecular biology. 2002, Vol. 36, pp.905-906. [70] Tambets K., Tolk HV., Kivisild T., Metspalu E., Parik J., Reidla M., Voevoda M., Damba L., Bermisheva M., Khusnutdinova E., Golubenko M., Stepanov V., Puzyrev V., Usanga E., Rudan P., Beckmann L., Villems R., Complex signals for population expansions in Europe and beyond. In: BellwoodPRenfrewC, editors. Examining the farming/language dispersal hypothesis, McDonald Institute for Archaeological Research Monograph Series. Cambridge: Cambridge University Press; 2003. P., 449458. [71] Villems R., Rootsi S., Tambets K., Adojaan M., Orekhov V., Khusnutdinova E., Yankovsky N., Archaeogenetics of Finno-Ugric speaking populations. In: JulkuK, editors. The Roots of Peoples and Languages of Northern Eurasia IV. Oulu: Societas Historiae Fenno-Ugricae; 2002. P., 271-284. [72] Khusnutdinova E., Khidiatova I., Fatkhlislamova R., Viktorova T., Restriction polymorphism of mtDNA HVSI in populations of Volga-Ural region. Genetika. 1999, Vol. 5, P. 586-592. [73] Kuzeev R.G. The origin of Bahskirs, Moscow. 1974, 570 P. [74] Asfandiarov A., Asfandiarova K., History of Bashkir villages of Perm and Sverdlov region, Ufa, 1999. 253 P.
Genes and Languages
143
[75] Yao YG., Kong QP., Bandelt HJ., Kivisild T., Zhang YP., Phylogeographic differentiation of mitochondrial DNA in Han Chinese. Am. J. Hum. Genet. 2002, vol. 70, № 3, pp. 635-651. [76] Yao YG., Kong QP., Wang CY., Zhu CL., Zhang YP., Different matrilineal contributions to genetic structure of ethnic groups in the silk road region in china. Mol. Biol. Evol. 2004, vol. 21, № 12, pp. 2265-80. [77] Bakunin V.M. Description of Kalmyk peoples, especially Torgout, of their actions, Khans and masters. Elista. 1995. 153 P. [78] Kereitov R. Ethnic history of Nogays. Stavropol. 1999. 176 P. [79] Bermisheva M.A., Kutuev I. A., Korshunova T.Yu., Dubova N.A., Villems R., Khusnutdinova E.K. Phylogeographic Analysis of Mitochondrial DNA in the Nogays: A Strong Mixture of Maternal Lineages from Eastern and Western Eurasia Molecular Biology. 2004, Vol.38, p. 516-523. [80] Derbeneva OA., Starikovskaya EB., Wallace DC., Sukernik RI., Traces of early Eurasians in the Mansi of northwest Siberia revealed by mitochondrial DNA analysis. Am. J. Hum. Genet. 2002, vol. 70, № 4, pp. 1009-14. [81] Derenko MV., Malyarchuk BA., Denisova GA., Dorzhu ChM., Karamchakova ON., Luzina FA., Lotosh EA., Dambueva JK., Ondar UN., Zakharov JA., Polymorphism of the Y-Chromosome Diallelic Loci in Ethnic Groups of the Altai-Sayan Region. Russian Journal of Genetics. 2002, vol. 38, №, pp. 309-314. [82] Underhill PA., Inferring Human History: Clues from Y-Chromosome Haplotypes. Cold Spring Harbor Symposia on Quantitative Biology. Volume LXVIII: Cold Spring Harbor Laboratory Press; 2003. P., 487-493. [83] Underhill PA., Passarino G., Lin AA., Shen P., Lahr Mirazon M., Foley R., Oefner PJ., Cavalli Sforza LL., The phylogeography of Y., chromosome binary haplotypes and the origins of modern human populations. Ann Hum Genet., 2001, vol. 65, № 1, pp. 4362. [84] Sajantila A., Paabo S., Language replacement in Scandinavia. Nat. Genet. 1995, vol. 11, № 4, pp. 359-360. 
[85] Kittles RA., Perola M., Peltonen L., Bergen AW., Aragon R., A., Virkkunen M., Linnoila M., Goldman D., Long JC., Dual origins of Finns revealed by Y., chromosome haplotype variation. Am. J. Hum. Genet. 1998, vol. 62, № 5, pp. 1171-9. [86] Sajantila A., Lahermo P., Anttinen T., Lukka M., Sistonen P., Savontaus ML., Aula P., Beckman L., Tranebjaerg L., Gedde Dahl T., Issel Tarver L., DiRienzo A., Paabo S., Genes and languages in Europe: an analysis of mitochondrial lineages. Genome Res. 1995, vol. 5, № 1, pp. 42-52. [87] Rosser ZH., Zerjal T., Hurles ME., Adojaan M., Alavantic D., Amorim A., Amos W., Armenteros M., Arroyo E., Barbujani G., Beckman G., Beckman L., Bertranpetit J., Bosch E., Bradley DG., Brede G., Cooper G., Corte Real H., B., Knijff de P., Decorte R., Dubrova YE., Evgrafov O., Gilissen A., Glisic S., Golge M., Hill EW., Jeziorowska A., Kalaydjieva L., Kayser M., Kivisild T., Kravchenko SA., Krumina A., Kucinskas V., Lavinha J., Livshits LA., Malaspina P., Maria S., McElreavey K., Meitinger T., A., Mikelsaar A., V Mitchell RJ., Nafa K., Nicholson J., Norby S., Pandya A., Parik J., Patsalis P., C., Pereira L., Peterlin B., Pielberg G., Prata M., J., Previdere C., Roewer L., Rootsi S., Rubinsztein D., C., Saillard J.,
144
E. Khusnutdinova and I. Kutuev Santos FR., Stefanescu G., Sykes BC., Tolun A., Villems R., Tyler Smith C., Jobling MA., Y-chromosomal diversity in Europe is clinal and influenced primarily by geography, rather than by language. Am. J. Hum. Genet. 2000, vol. 67, № 6, pp. 152643.
In: Molecular Polymorphism of Man
Editors: S. D. Varfolomyev, G. E. Zaikov
ISBN: 978-1-60741-843-6
© 2011 Nova Science Publishers, Inc.
Chapter 5
COMMON AND SPECIAL FEATURES OF THE HUMAN RIBOSOMAL DNA
Natalia S. Kupriyanova and Alexei P. Ryskov
The Institute of Gene Biology, Russian Academy of Sciences, Moscow 119334, Russia
ABSTRACT
Ribosomes are among the most ancient and important cell organelles and have structural features common to all modern organisms. Ribosomal DNA (rDNA) in all vertebrate genomes exists in the form of abundant discrete clusters. Genes within human rDNA clusters are tandemly repeated in a head-to-tail fashion and occur at multiple chromosomal locations, as in other vertebrates. Tandemly arranged rDNA repeats make up the so-called nucleolus organizers (NORs), specific chromosomal regions where nucleoli form during mitotic telophase. Each ribosomal RNA (rRNA) gene consists of coding regions for 18S, 5.8S and 28S rRNA and a ribosomal intergenic spacer (rIGS). The coding region, formed by the 5' external transcribed spacer (5'ETS), 18S rDNA, internal transcribed spacer 1 (ITS1), 5.8S rDNA, internal transcribed spacer 2 (ITS2), 28S rDNA and the 3'ETS, is transcribed as a long precursor (pre-rRNA). The variable rDNA regions differ in size and sequence from organism to organism, within organisms, and within individual species. Length-variable regions exist upstream of the promoter, downstream of the terminator and, at least in higher primates, in the central part of the rIGS. The rIGS harbors the gene promoter and terminator, the spacer promoter and terminator and, in humans and apes, many Alu retroposons, together with many sequence motifs that can adopt alternative structures. Subtelomeric DNA regions adjacent to the 5' ends of the rDNA clusters on all human NOR+ chromosomes are surprisingly conserved, suggesting that they help to maintain rDNA conservation. The multi-copy, multi-cluster arrangement of the ribosomal genes makes the evolution of these gene systems very complicated, involving different mechanisms of concerted evolution. Connections have been detected between ectopic nucleolus locations, rDNA copy number, methylation status and transcription activity on the one hand and some neurological and hereditary diseases, as well as ageing and cancer, on the other. The vast extent of sequence heterogeneity, coupled with the central role rRNA plays in protein translation, makes the rDNA an interesting model system. The next few years should yield the results of human rDNA sequence analysis in connection with other experimental data and lead to an understanding of the role of various sequence motifs within the gene.
INTRODUCTION
Cells grow and divide. Some cells, such as neurons and oocytes, grow without dividing. Some cells, such as those of developing zygotes, divide without growing. For most cells, however, growth and division are coupled, thereby maintaining cell size within narrow limits. Cell growth requires the synthesis of proteins, and the synthesis of proteins requires ribosomes. Thus, ultimately, the control of cell growth must involve the control of ribosome synthesis [1]. Genes that code for the nucleic acids and proteins forming ribosomes, and genes that service their function, the maturation of their transcripts and the activation of mature products, together form a vast polygenic complex, and their coordinated operation is vital for the viability of individual cells and whole organisms. The mechanisms of human rDNA replication, transcription and generation of variability are far from clear. They are being studied, together with related problems, in a number of laboratories throughout the world [2-10]. In Russia, rDNA structure and function in vertebrates were reviewed in 1982 [11] and in 2001 [12]. Here, we discuss the intra-chromosomal, inter-chromosomal and evolutionary variability of the rDNA clusters along with their 5'-adjacent regions, some mechanisms regulating rDNA expression, and the status of rDNA in different physiological states of the human organism: in ageing and in various diseases, including Werner's syndrome, schizophrenia, rheumatoid arthritis, Hodgkin's disease and other types of blood cancer.
RIBOSOMAL DNA ORGANIZATION AND QUANTITATIVE EVALUATION OF ITS CONTENT IN THE HUMAN GENOME
The tandem structure of the rRNA genes has been demonstrated by various methods: their isolation as a distinct band of characteristic density upon centrifugation in a CsCl gradient, direct observation of rDNA transcription (loops of the lampbrush chromosomes) during gametogenesis in several organisms [13,14], a restriction pattern specific for tandem repeats, and resolution of large DNA fragments by pulsed-field gel electrophoresis [15]. Although evidence against the tandem structure of the rDNA clusters has occasionally been reported, it has all subsequently been refuted. It is therefore now safe to say that the great bulk of the rDNA repeats form arrays in all known genomes. Human rDNA consists of tandemly arranged clusters located on the p-arms of the five acrocentric chromosomes (13, 14, 15, 21 and 22). These clusters make up the so-called nucleolus organizers (NORs), specific chromosomal regions where nucleoli form during mitotic telophase. Nucleoli disappear during mitosis, and the NORs then appear as secondary subtelomeric constrictions in the condensed chromosomes. In interphase, the number of NORs often decreases as a result of their fusion.
The rRNA genes are repeated many times in the genomes of higher organisms. The number of rDNA repeating units differs markedly between classes of eukaryotes. The rDNA copy number is relatively low and varies within a narrow range (200-500 copies) in insects, birds and mammals. The situation is different in fish, amphibians and plants. In these classes, species, even related ones, vary in ploidy and in rDNA copy number, and the copy number is not necessarily associated with genome size. Possibly, polyploidy contributes to genome variation and promotes evolution [16]. The accuracy of an estimate of the rDNA repeat number depends on the precision of the method applied. In the earliest experiments, the number of rDNA copies in a human diploid genome was estimated as ~400 by saturation hybridization of nuclear DNA immobilized on nitrocellulose filters with labeled rRNA. With the method of Veiko et al. [17], the repeat number varies from 390 to 580 per diploid genome in humans. The most up-to-date method for determining DNA copy number is real-time PCR. Chromosome maps of the acrocentric short arms are infrequently studied, due in large part to their paucity of transcribed genes and their high concentration of functionally unambiguous repetitive elements. According to FISH and confocal laser scanning microscopy, rDNA clusters account for about 10% of the short arm of the acrocentrics and are isolated from other genes by long (about 10 Mb) satellite sequences [18]. In addition to rDNA, the nucleolus includes large heterochromatin blocks containing non-ribosomal repeats. Dispersed repeats and microsatellites, which predominate in heterochromatin, tend to form unusual three-dimensional structures, which possibly play an important role in nucleolar organization and in genome evolution [19]. The above methods have also provided data on the spatial arrangement of individual components in the nucleolus.
Thus, the centromere and the rDNA cluster lie close together at the periphery of the nucleus, which explains recombination between rDNA and the pericentromeric alphoid satellites of acrocentric chromosomes [18]. An rDNA repeating unit consists of the transcribed region (ribosomal operon) and the ribosomal intergenic spacer (rIGS) (figure 1). In eukaryotic cells, the rRNA genes are transcribed by RNA polymerase I (Pol I) in the nucleolus to produce a large (40S-47S in various organisms) rRNA precursor (pre-rRNA). A pre-rRNA molecule contains the 5' external transcribed spacer (5'-ETS), 18S rRNA, the left internal transcribed spacer (ITS1), 5.8S rRNA, the right internal transcribed spacer (ITS2), 28S rRNA and the 3' external transcribed spacer (3'-ETS). Mature products (28S, 18S and 5.8S rRNAs) result from specific nuclease cleavage of pre-rRNA. Evolution is associated with elongation of the 18S and 28S rRNA genes and of the transcribed spacers, but the general structure of the transcribed region remains constant. In mammals, 28S rRNA is about 1.5 kb longer than in yeast and consists of alternating conserved and variable regions. Insertions and nucleotide substitutions in the variable regions are phenotypically neutral. Intraspecific variation of the transcribed spacers is comparable with that of 28S rDNA. This parameter has been characterized for the human transcribed spacers [4] and for ITS1 and ITS2 of higher primates [20].
Figure 1. The arrangement of the human ribosomal DNA repeats. The 18S, 28S and 5.8S rRNA genes are shown as open rectangles. The curved arrows marked 't' denote the transcription start points. The transcribed and non-transcribed regions are denoted by thin and thick lines, respectively.
As mentioned above, the transcribed regions in the rDNA alternate with rIGSs. The rIGS size increases noticeably from primitive to higher eukaryotes, from 10% of the total rDNA repeating unit in yeast to as much as 70% in mammals. The rIGS size varies both between and within individuals (inter- and intragenomic polymorphism) [3, 5, 21]. The variation mostly concerns the repeat number in arrays whose units range from 2-6 nt (microsatellite clusters) to several thousand nucleotides (blocks of large repeats). The spectra of microsatellite repeats in mammalian rIGSs partly overlap, although some microsatellite variants are species-specific [22]. As part of the "Human Genome" program, we searched for highly polymorphic microsatellite markers in the cosmid library of human chromosome 13 by probing its ordered filters with labeled oligonucleotides composed of different microsatellite motifs [23, 24]. It turned out that three of the seven motifs used (TCC, GACA, GA) are detected in the majority (70-80%) of cases in the rIGS and are often represented repeatedly in the same cosmid inserts. The clusters formed by GAC and GACT repeats are detected with the same frequency in the rIGS and in the rest of chromosome 13, whereas clusters formed by the TCG and GATG motifs are practically absent from the rIGS. These results suggest that the nucleotide sequences of the NORs evolve at least partly independently of the bulk of the genome. We later detected a large number of microsatellite clusters formed by (TTGC)n in the rIGS of higher primates [25, 26]. This motif is absent from the rIGS of the more primitive primates studied so far. An oligonucleotide probe homologous to this microsatellite may be used to detect human rDNA and to investigate structural polymorphism in the human rIGS (figure 2). The rIGS polymorphism based on differences in the number of repeating elements was earlier thought to have no phenotypic expression.
Now it is clear that the repeat number in certain arrays located upstream of the promoter affects the intensity of rRNA gene expression and, consequently, the protein-synthesizing capacity of a cell and the general status of an organism, as most clearly demonstrated in plants [27].
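The kind of motif screen described above can be imitated in silico. The sketch below is a minimal illustration in Python, assuming a plain DNA string as input. The function name `find_microsatellite_clusters`, the `min_repeats` threshold and the toy sequence are hypothetical and are not taken from the original experiments, which relied on hybridization with labeled oligonucleotides rather than on sequence scanning.

```python
import re

# Motifs named in the text; only one strand is scanned in this sketch.
MOTIFS = ["TCC", "GACA", "GA", "GAC", "GACT", "TCG", "GATG", "TTGC"]

def find_microsatellite_clusters(seq, motifs=MOTIFS, min_repeats=4):
    """Return (motif, start, n_repeats) for every perfect tandem run of a
    motif that is at least `min_repeats` units long (0-based coordinates).
    The threshold of 4 units is an arbitrary illustrative choice."""
    seq = seq.upper()
    hits = []
    for motif in motifs:
        pattern = re.compile("(?:%s){%d,}" % (re.escape(motif), min_repeats))
        for m in pattern.finditer(seq):
            n_repeats = (m.end() - m.start()) // len(motif)
            hits.append((motif, m.start(), n_repeats))
    return sorted(hits, key=lambda h: h[1])

# Toy sequence invented for illustration: a (TTGC)5 run and a (GA)6 run.
toy = "ACGT" + "TTGC" * 5 + "CCATG" + "GA" * 6 + "TTT"
for motif, start, n in find_microsatellite_clusters(toy):
    print(motif, start, n)
```

Only perfect repeats are reported here; real microsatellite clusters often contain interruptions, so a practical screen would need a more tolerant matching rule.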
Figure 2. Southern hybridization of nuclear DNA isolated from the lymphocytes of four higher primate species with the labeled (TTGC)4 probe. The DNA was digested with EcoRI restriction endonuclease. 1 - Homo sapiens; 2 - Pan troglodytes; 3 - Gorilla gorilla; 4 - Pongo pygmaeus.
Polymorphisms of another common type are point nucleotide substitutions, which occur throughout rDNA and especially in the region upstream of the promoter. Such polymorphisms have long been known, are best studied in man and higher primates, and were first identified as restriction fragment length polymorphisms (RFLPs) in patterns obtained with EcoRI, HindII, NotI, HinfI and HindIII [21, 28, 29]. A high concentration of nucleotide substitutions in a region associated with rDNA transcription has been assumed to affect the specificity of transcription factor binding and thereby to contribute to the molecular mechanisms of speciation [2]. Many point nucleotide substitutions have been found in the 5S rDNA and 5S rRNA of X. laevis and Misgurnus fossilis; 5S rRNA is a constitutive component of ribosomes and forms hydrogen bonds with 18S rRNA [30]. The sequences determined are so heterogeneous that there is no way to distinguish major variants among them. More likely, there is a certain 5S rRNA consensus from which individual molecules differ by a number of substitutions. The role of 5S rRNA in ribosomes is still unclear, and it is of interest to analyze whether the sequence variation of 5S rRNA is consistent with that of the other rRNAs and is associated with their interaction [31].
Our studies of cloned rDNA variants isolated from the cosmid library of human chromosome 13 revealed some disproportion in the representation of different rDNA regions [32]. We have also shown nonrandom cleavage of human rDNA by Sau3A or its isoschizomer MboI under mild hydrolysis conditions. The hypersensitive cleavage sites were found to be located in the rIGS, especially in the regions about 5-5.5 and 11 kb upstream of the rRNA transcription start point. This finding is based on sequence mapping of the 5' and 3' ends of the rDNA inserts in randomly selected cosmid clones generated in the course of constructing the human chromosome 13 cosmid library. It suggests that some Sau3A sites in native rDNA are hypersensitive to the action of Sau3A restriction endonuclease (figure 3). To test this suggestion, an experimental procedure was developed that included exhaustive EcoRI digestion of genomic DNA, followed by treatment with Sau3A (or MboI) endonuclease at a low enzyme concentration for different periods of time and subsequent blot hybridization with a specific labeled rDNA fragment [32]. There is a body of data on chromatin sites hypersensitive to endonuclease action. A detailed structural analysis of mouse pre-promoter rDNA chromatin, for example, has revealed hypersensitive sites in the origin of replication, the enhancer repeats, the spacer promoters, the two replication-providing sites and so on [7, 9]. However, hypersensitive cleavage sites on naked DNA have been described only once, for micrococcal endonuclease acting on SV40 DNA [33]. Our results show that the methylation status and supercoiled state of the rIGS regions have no effect on the sensitivity of the cleavage sites. However, all primary cleavage sites are adjacent to, or lie within, Alu retroposons. Alu elements harbor a number of regulatory elements, including a Pol III promoter and terminator and "hot spots" of DNA recombination [34-36]. These data suggest a possible role of neighboring sequences in determining the nuclease accessibility of the Sau3A sites.
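Where the sequence of a repeat is available, the positions of all candidate Sau3A (MboI) and EcoRI sites can be listed computationally before any mapping experiment. The following Python sketch assumes the standard recognition sequences GATC (Sau3A and its isoschizomer MboI) and GAATTC (EcoRI); the function names and the toy sequence in the usage note are hypothetical. Such a scan finds every site in the sequence and says nothing about which sites are hypersensitive, which can only be established experimentally.

```python
# Recognition sequences (both palindromic, so scanning one strand is enough).
SITES = {"Sau3A/MboI": "GATC", "EcoRI": "GAATTC"}

def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions

def fragment_lengths(seq, site, cut_offset=0):
    """Fragment lengths from a complete digest of a linear molecule.

    `cut_offset` is the cut position within the site on the top strand:
    0 for Sau3A/MboI (cut before GATC), 1 for EcoRI (G^AATTC).
    """
    cuts = [p + cut_offset for p in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
```

For an invented 23-nt sequence such as `"AAGATCTTTGAATTCCCGATCGG"`, `find_sites(seq, SITES["Sau3A/MboI"])` lists the two GATC sites and `fragment_lengths(seq, SITES["EcoRI"], cut_offset=1)` gives the two fragments of a complete EcoRI digest. A partial digest, as used in the experiments, would produce a mixture of these and longer fragments.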
Figure 3. (A) A general map of the human ribosomal DNA repeat. EcoRI cleavage sites producing fragments A, B, C and D are indicated by vertical arrows. The positions of oligonucleotide probes r1, r2, r3, r4 and r5 used for the identification of EcoRI fragments are denoted by vertical bars under the main line; clusters of (ttgc)n hybridizing with r3 are denoted by one bold bar; t, transcription start point. The location of each cosmid insert on the rDNA map is shown above the main rDNA line by numbered horizontal lines. Insert ends that reveal no homology with human rDNA are indicated by dotted lines. (B) An expanded scheme of fragment C with all Sau3A sites denoted by vertical lines under the main horizontal line and numbered in the direction upstream of the transcription start point. The predominant Sau3A sites hypersensitive to the action of Sau3A are shown by hammer symbols above the line; the Alu elements are indicated by horizontal arrows according to their direction. The location of cdc 27 is shown by a horizontal bracket.
Besides unique nucleotide sequences and micro- and minisatellite clusters, the pre-promoter region of the human rIGS contains three pairs of collinear Alu retroposons. In our experiments, PCR amplification was used to search for new structural variations in the human rIGS. It turned out that PCR amplification of the two rIGS regions containing collinear Alu repeats separated by microsatellite clusters (Alu1-Alu2 and Alu3-Alu4) yields, contrary to expectation, two PCR products: the expected one and a shortened one (figures 4, 5). All our results on cloning and sequencing of the PCR products unambiguously indicate that the shorter fragments do not exist in native genomic DNA but are formed during the PCR [37, 38]. The shortened fragments lack one Alu element and the sequences between the collinear Alu pair. The presence of specific nucleotide variations in Alu1-Alu4 makes it possible to map the stop points of PCR amplification. They lie in the most conserved part (the Alu "core") of the retroposons. It seems plausible that the Alu "core", being a part of the Pol III promoter, can display an elevated affinity for Taq polymerase, arresting its movement along the DNA strand. This leads to premature termination of the reaction and to the formation of hybrid shortened fragments. In any case, our results indicate that great care should be exercised in interpreting comparative PCR data for complex loci, such as the rIGS, that are generally used in evolutionary or comparative studies.
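The size relationship between the two products can be made concrete with a toy model. The Python sketch below assumes a simplified picture in which the shortened product is a hybrid molecule that keeps a single chimeric Alu and loses the sequence between the two collinear copies, in line with the description above. The function name, the coordinates and the lengths are invented for illustration and are not taken from the rIGS sequence.

```python
def pcr_product_lengths(amplicon_len, alu1, alu2):
    """Toy model of the two products obtained from a collinear Alu pair.

    amplicon_len : length (bp) of the full primer-to-primer amplicon
    alu1, alu2   : (start, end) of the two Alu copies inside the amplicon,
                   0-based, end-exclusive, with alu1 upstream of alu2

    If synthesis stops inside one Alu copy and resumes at the equivalent
    point of the other, the stretch between those two points is missing
    from the product. For copies of equal length that stretch equals
    alu2_start - alu1_start, i.e. one Alu plus the spacer between the pair.
    """
    (s1, e1), (s2, e2) = alu1, alu2
    if not (0 <= s1 < e1 <= s2 < e2 <= amplicon_len):
        raise ValueError("Alu copies must be ordered and lie inside the amplicon")
    expected = amplicon_len
    shortened = amplicon_len - (s2 - s1)
    return expected, shortened

# Hypothetical numbers: a 1800 bp amplicon with two 300 bp Alu copies
# separated by a 600 bp spacer.
print(pcr_product_lengths(1800, (200, 500), (1100, 1400)))
```

The model predicts only fragment sizes; whether a shortened band reflects a genuine genomic variant or an amplification artifact still has to be checked by cloning and sequencing, as was done in the experiments described.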
Figure 4. Scheme of the human rDNA repeat. (a) A, B, C and D are EcoRI-defined segments; t, transcription start point. (b) Expanded BamHI-EcoRI section of fragment C, where Alu elements are shown by numbered arrows and the deleted part of the 1.8 kb fragment containing the 90 bp md cluster is shown by a gray line; the positions of the specific and mcs probes and of the P1 and P2 primers are denoted by triangles, asterisks and letters, respectively. (c) A putative mechanism by which the shortened PCR products arise.
Figure 5. Electrophoresis in 0.8% agarose of the PCR products obtained by amplification of the Alu3-md/mcs-Alu4 region. (a) UV; (b) Southern blot probed with the human rIGS-specific oligonucleotide marker and (c) with the microsatellite marker (ttgc)4. Lanes 1-4: genomic DNA of unrelated individuals; lane 5: rIGS-containing cosmid DNA isolated from the human chromosome 13 library (LA13NC01, Los Alamos). A 1 kb ladder was used as a marker.
Beyond the rDNA arrays, there is experimental evidence that dispersed rRNA pseudogenes and sequences similar to the rIGS (orphans) are abundant in eukaryotic genomes. The orphan phenomenon was first shown for insects, D. melanogaster and D. simulans [39, 40]. Amplification of a 13.5 kb rIGS region (up to 200 copies) was also observed in the mouse BALB/c line [41]. Four clones with interrupted 18S rDNA were isolated from the genomes of higher primates [42-44]. The presence of rDNA-like sequences outside their clusters has been repeatedly recorded in the human genome [4]. In the distal part of human chromosome 22, some rDNA-like segments were detected, including those homologous to the 28S rDNA and the rIGS [5], whereas the part proximal to the rDNA cluster completely lacked them [45]. Some features of the structure and genomic distribution of the pseudogenes suggest that in the majority of cases they most probably neither belong to nor interact with genetically active rDNA clusters, i.e., they exist outside the nucleolar region. It has been proposed, however, that the mosaic structure of the 18S rDNA pseudogene in D. melanogaster, which includes alternating conserved and diverged regions, could imply that the 18S rDNA mutations are reversible and can be restored by gene conversion with the normal rDNA [46]. Blot-hybridization analysis of the clones harboring rIGS fragments, isolated from the cosmid library of human chromosome 13, revealed two clones with extensive (10 and 26 kb) deletions in the rIGS [47] (figure 6). The deletions were mapped by comparing the rDNA sequence from GenBank (U13369) with the sequences of the recombinant inserts from cosmid clones 36G10 and 47H2, respectively. In both cases, the 5' and 3' ends of the deletions were located in microsatellite (TC)n clusters.
Comparative FISH of genomic DNA with the insert from clone 47H2 and with the corresponding native rIGS DNA segment was performed under mild or stringent (hard) conditions. The experiments revealed intense hybridization of the 47H2 probe with all NOR+ chromosomes under mild conditions, whereas after hard washing hybridization took place only with chromosomes 13 and 21. The results demonstrate that rDNA units harboring deletions in their rIGSs may be present in the human genome (figure 7). A restriction-hybridization analysis of the 47H2 insert showed its complex nature, namely, an alternation of highly conserved rDNA regions with foreign segments. A database search detected an ETS fragment at the 5' end of the insert and showed the presence of an unidentified nucleotide sequence at its 3' end. The unsuccessful search for homology to this nucleotide sequence in the human genome is possibly connected with the exclusion of the short arms of the NOR+ chromosomes from the "Human Genome" program.
Figure 6. The positions of the extended deletions in the 36G10 (a) and 47H2 (b) cosmid clones isolated from the human chromosome 13 library (LA13NC01, Los Alamos). EcoRI cleavage sites producing fragments A, B, C and D are indicated by vertical arrows; t, transcription start point. The fragments that remained intact after the deletions are underlined.
Figure 7. FISH localization of the cosmid clone 47H2 (with an extended deletion) on human chromosomes: (a) after mild washing; (b) after hard washing.
SUBTELOMERIC AND SUBCENTROMERIC DNA AREAS NEIGHBOURING rDNA CLUSTERS IN ACROCENTRIC CHROMOSOMES
Sequences distal to the rDNA on all acrocentrics in humans and higher primates, which might have been expected to evolve independently without correcting each other, nonetheless show a high degree of uniformity [4, 5]. These results suggest that the 5'-flanking regions play an important role in maintaining the conservation and/or regulating the variability of the NOR nucleotide sequences. The regions of the short arms of acrocentric chromosomes adjacent to the rDNA clusters are often involved in recombination between rDNA repeats. Robertsonian fusion is an extreme example of such an event, in which deletion of the rDNA clusters is accompanied by fusion of the q-arms with retention of one or two centromeres. Analogous fusions can occur between homologous and nonhomologous acrocentric chromosomes in humans. Some time ago, during cloning of the site of an X/21 translocation responsible for a case of Duchenne muscular dystrophy, clones harboring the junction between the rDNA and the adjacent non-ribosomal region (DJ) were isolated [48]. By means of these clones it was established that rRNA transcription on the p-arm of the acrocentric chromosome proceeds from the telomere towards the centromere, consistent with the distal (subtelomeric) position of the 5' end of the cluster. The primary structure of 8.3 kb of the non-ribosomal region adjacent to the 5' end of the rDNA cluster was determined on human chromosome 21 [5]. Analysis of the nucleotide sequence showed that the distal segment differs from the rIGS in lacking long tracts of simple repeats and bent DNA structures. It contains some fragments homologous to the 28S rDNA and the rIGS, as well as two possible pseudogenes, 11 Alu elements, one LINE element and two MER4 fragments [5]. The nucleotide sequence of the junction between the 3' end of the rDNA cluster and non-ribosomal DNA on human chromosome 22 has also been determined. It is located in ITS1 and is a unique sequence 68 bp long, followed by a cluster of repeating 147 bp elements. This cluster is detected on all human acrocentric chromosomes and is involved in the formation of higher-order repeating units of about 6.4-6.8 kb [4].
COMPLEX STRUCTURE AND FAST EVOLUTION OF THE HUMAN SUBTELOMERES
In recent years, there has been a tendency to revise the main hypotheses of nuclear DNA evolution. According to the generally accepted models, appreciable genome reorganizations occurred rarely in evolution, no more often than once per 10 MYR [49, 50]. However, as a result of the "Human Genome" program, it has become clear that a wide range of duplications of long segments took place over the last 35 MYR, that is, during primate evolution [51]. Segmental duplications are duplications of DNA regions between 1 and about 500 kb in size showing about 95.5% similarity. Duplicated segments (duplicons) can occur in tandem or, more often, be distributed throughout the genome. Duplications can be inter- or intra-chromosomal, with inter-chromosomal duplications most often located in subtelomeric and pericentromeric regions. The structure and evolution of the segmental duplications of human chromosome 22 that are most considerable in length (>1 kb) and extent of homology (>90%) have been studied in detail [52]. One of the most interesting results of this work is the detection of a region on human chromosome 22 without homology to the chromosomal DNA of other higher primates. It is precisely the end of the nucleotide sequence detected at present in the pericentromeric region of the human genome. This result suggests that the "youngest" (from the evolutionary point of view) sequences are adjacent to chromosomal centromeres. It is commonly held that the human genome has been completely sequenced. However, subtelomeric and pericentromeric sequences and the sequences of the short arms of acrocentric chromosomes have not yet been completely determined. Duplicons located in these regions hamper the construction of contigs and thus of full-sized human genome maps. Nevertheless, a number of functionally important regions have been sequenced. Comparison of the sequences of short PCR-amplified paralogous segments from the distal rDNA regions (rDR) of non-homologous human acrocentric chromosomes revealed their striking similarity, which possibly reflects their functional importance [5]. Recently, new information has appeared about the structure and organization of human subtelomeric duplicons. The extent of nucleotide sequence divergence within subtelomeric duplicon families varies considerably, as does the organization of duplicon blocks at subtelomere alleles. Subtelomeric internal (TTAGGG)n-like tracts occur at duplicon boundaries, suggesting their involvement in the generation of the complex sequence organization. Most duplicons have copies at both subtelomere and non-subtelomere locations, but a class of duplicon blocks has been identified that is subtelomere-specific.
In addition, a group of six subterminal duplicon families is identified that, together with six single-copy telomere-adjacent segments, includes all of the (TTAGGG)n-adjacent sequence identified so far in the human genome [52]. The significant levels of nucleotide sequence divergence within many duplicon families, as well as the differential organization of duplicon blocks on subtelomere alleles, may provide opportunities for allele-specific subtelomere marker development. We have sequenced an rDR segment (~10 kb) subcloned from a cosmid library of human chromosome 13 (GenBank accession no. AF478540). Nucleotide sequence analysis showed that it is almost completely homologous to the paralogous segment from human chromosome 21. The primary structure of an additional region (~2 kb) detected toward the telomere showed 84% homology to the nucleotide sequence of a BAC clone of human chromosome 19 (GenBank accession no. AC006504). Among all the regions sequenced so far, this region of homology is the one nearest to the chromosome 19 centromere [BLASTN 2.2.18 (Mar.02-2008)]. This result provides further information about segmental duplications in the human genome (figure 8a, b). Both nucleotide sequences contain shortened Alu repeats 213 bp long [53]. This argues that the Alu fragments were present in the area under study even before the duplication. The promoter of the CD30 gene, which is associated with Hodgkin's lymphoma [54], is detected in the chromosome 19 pericentromeric region but is absent from the homologous region of chromosome 13, which suggests that the segment harboring the promoter region was inserted into chromosome 19 after the segmental duplication.
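The homology percentages quoted above (for example, the 84% figure) are measures of sequence identity over an alignment. As a minimal illustration, and not the BLASTN procedure used by the authors, the following Python sketch computes percent identity for two sequences that have already been aligned. The function name `percent_identity`, the gap symbol `-`, and the choice to skip gap columns are assumptions made for this example; alignment programs may count gap columns differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped alignment columns in which both sequences carry the same base.

    Both inputs are assumed to be rows of one pairwise alignment (equal length,
    '-' marking a gap). Columns with a gap in either row are skipped, which is a
    simplifying assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not counted in this sketch
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy input (hypothetical, not real chromosome 13 or 19 sequence):
# ten aligned bases with a single mismatch.
toy_identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
```

With real data, the two strings would be the aligned rows exported from an alignment program rather than hand-written toy sequences.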
Figure 8. A schematic comparison of the human chromosome 13 subtelomeric fragment with (a) the region of the BAC clone harboring the human chromosome 19 pericentromeric segment. (b) The localization of the homologous regions on chromosomes 13 and 19 is shown by a black rectangle.
SPECIFICITY OF THE HUMAN RDNA EVOLUTION
Human rDNA is a tandemly arranged multicopy gene family present on the p-arms of the five acrocentric chromosomes. The tandem genes on all five chromosomes are subject to concerted evolution, a process that promotes homogeneity among rDNA copies through unequal homologous exchange and gene conversion. These mechanisms can correct and eliminate new rDNA variants, and they can also promote the spread of new gene variants. Such spreading can occur throughout individual clusters and among homologous and nonhomologous chromosomes. A number of researchers have tried to establish which mechanism is more important (gene conversion or unequal crossover) and to estimate the relative frequencies of exchanges within chromatids, between sister chromatids, among homologues, and between nonhomologous chromosomes. Different experiments have yielded different answers to these questions. On the one hand, there is evidence in favor of interchromosomal exchanges (by unequal crossing over) that result in rIGS length variability on nonhomologous chromosomes [21, 55, 56], which increases over generations [29, 57]. On the other hand, evidence favoring intrachromosomal exchanges includes data implying linkage disequilibrium in human rDNA [58], the Mendelian inheritance of spacer variants in families [59], and the transmission of human rIGS variants in a syntenic fashion [21]. A number of constant and variable areas are found in the human rIGS. The variable regions are adjacent to the start and termination points of transcription. Human genomic DNA contains four major BamHI fragment variants of 3.9 kb, 4.6 kb, 5.4 kb, and 6.2 kb in the region located just downstream of the primary transcription termination site, between nucleotides 13473 and 15523 [3, 60]. The human rIGS region preceding the promoter (up to -7 kb) contains three pairs of Alu retroposons alternating with homogeneous and hierarchically organized tandem repeats (figure 9).
Although the complete human rDNA nucleotide sequence has been known for more than ten years [3], its functional organization is poorly studied compared with other vertebrate models: mouse, rat, and Xenopus laevis [61-64]. The rIGSs of the mouse, rat, and frog harbor a variable number of spacer repeats, which have been shown to act as transcription enhancer elements in conjunction with the spacer promoter. In the mouse, for example, enhancer elements (135-140 bp) that include long poly-T tails abut the promoter and span a total of about 2 kb. The Xenopus laevis and rat rIGS pre-promoter regions are organized similarly. The human rIGS lacks analogous repeats upstream of the promoter; instead, two collinear Alu retroposons, separated by about 800 bp of so-called 90 bp repeats [3] formed by hierarchically organized microsatellite motifs, are positioned about 2 kb upstream of the rRNA transcription start point. The 90 bp repeats in the human rIGS have been proposed to function as enhancers, although this has not yet been shown experimentally [5]. At the same time, the 3'-end of the Alu-like B1 retroposon nearest to the promoter in the mouse is located only 3398 bp upstream of the transcription start point [64]. The drastic difference between the organization of the rIGS pre-promoter region in humans and in other known vertebrates made it interesting to study the organization of this region in the great apes. We PCR-amplified, cloned, and sequenced rIGS fragments about 7 kb long, located upstream of the rRNA transcription start point, for Pan paniscus, Pan troglodytes, Gorilla gorilla, and Pongo pygmaeus. The sequences have been registered in EMBL/GenBank under accession nos. DQ133470, DQ133468, DQ133471, and DQ133469, respectively. Alignment of the orthologous primate nucleotide sequences reveals a high degree of similarity, with the exception of the highly repetitive region between the two Alu repeats nearest to the transcription start point [26] (figure 10).
Figure 9. A scheme of the human rDNA unit organization. The region of investigation is expanded. Long arrows denote positions of the numbered Alu elements and their directions. t – Start point of transcription. Positions of the primers (P1-P3) and oligonucleotides for screening (TI-TIII) are shown by short arrows and short bold lines.
Figure 10. Core repeating units forming the region between Alu2 and Alu1 in the great apes' rIGS. Identical or prevailing bases in vertical columns were used to form the consensus sequences. Consensus sequences are shown in bold letters. a. Homo sapiens; b. Pan paniscus; c. Pan troglodytes; d. Gorilla gorilla; e. Pongo pygmaeus.
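The consensus rule described in the caption (take the identical or prevailing base in each vertical column) can be sketched in Python as follows. This is an illustrative sketch only: the function name `column_consensus` and the toy input are hypothetical, the repeat units are assumed to be pre-aligned to equal length, and ties are resolved in favor of the base seen first in the column, a choice the source does not specify.

```python
from collections import Counter


def column_consensus(aligned_units):
    """Return the majority-rule consensus of equal-length, pre-aligned repeat units."""
    if not aligned_units:
        raise ValueError("at least one repeat unit is required")
    length = len(aligned_units[0])
    if any(len(unit) != length for unit in aligned_units):
        raise ValueError("all repeat units must be aligned to the same length")
    consensus = []
    # zip(*units) walks the alignment column by column.
    for column in zip(*aligned_units):
        # most_common(1) gives the prevailing base; when counts are equal,
        # the base encountered first in the column is returned.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Toy units built from the (TTTC) and (TTGC) motifs named in the text;
# they are not the sequences shown in the figure.
toy_consensus = column_consensus(["TTTCTTGC", "TTTCTTTC", "TTTCTTGC"])
```

In the toy input, the seventh column holds G, T, G, so the prevailing base G enters the consensus.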
Because the human rIGS has been sequenced piecemeal since the mid-1980s, there are no universally adopted designations for the Alu elements it contains. We therefore designate them by the numbers 1 to 6 in the direction away from the promoter. The Alu1 and Alu2 retroposons are separated by a ~800 bp nucleotide sequence formed by 90 bp repeats [3]. The 90 bp monomers are formed mainly by regularly alternating microsatellite clusters (TTTC)n and (TTGC)n with rare nucleotide substitutions, deletions, and insertions. Similar regions are lacking in the rIGS of the mouse, rat, and Xenopus laevis. We showed earlier that the microsatellite (TTGC)n is a specific marker sequence for the human and chimpanzee species and is absent from the rIGS of the orangutan and of some lower primates [25]. In the human genome, the estimated rate of point mutations is approximately 10^-9 mutations per nucleotide per year, while the slippage probability is about 10^-3 per repeat per generation. Our results for the great apes also show a considerably higher rate of evolutionary change in simple and microsatellite clusters than in unique DNA sequences. Thus, evolutionary repeat dynamics, consisting of elongation and shortening of repeats combined with point mutations, can be considered a starting mechanism of evolution [65-67]. A number of neighboring repeating units can be elongated or shortened as a result of unequal crossover or replication slippage. If, in the process, two (or more) adjacent units carry analogous base substitutions, they can form an internal subcluster, which will subsequently evolve according to its own dynamics. The evolution of the great apes' rIGS microsatellites is a good illustration of this scheme.
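The two rates quoted above are given in different units (per year and per generation), so a rough comparison needs a generation time. The Python sketch below puts both on a per-year scale. The 25-year generation time is an assumption introduced for this illustration and does not come from the text, and the constant and function names are hypothetical. Note also that one rate is per nucleotide and the other per repeat, so the ratio is only an order-of-magnitude guide.

```python
# Rates as quoted in the text.
POINT_MUTATION_RATE_PER_YEAR = 1e-9   # mutations per nucleotide per year
SLIPPAGE_RATE_PER_GENERATION = 1e-3   # slippage events per repeat per generation

# Assumption for illustration only; not stated in the source.
ASSUMED_GENERATION_TIME_YEARS = 25.0


def slippage_rate_per_year(rate_per_generation, generation_time_years):
    """Convert a per-generation rate to a per-year rate."""
    if generation_time_years <= 0:
        raise ValueError("generation time must be positive")
    return rate_per_generation / generation_time_years


def slippage_to_point_ratio(generation_time_years=ASSUMED_GENERATION_TIME_YEARS):
    """Ratio of the per-year slippage rate to the per-year point mutation rate."""
    per_year = slippage_rate_per_year(SLIPPAGE_RATE_PER_GENERATION, generation_time_years)
    return per_year / POINT_MUTATION_RATE_PER_YEAR


ratio = slippage_to_point_ratio()
```

Under the assumed generation time, the ratio comes out at roughly four orders of magnitude, which is consistent with the observation that microsatellite clusters change much faster than unique sequences.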
Although the 90 bp repeating units have not yet been experimentally shown to function as enhancers, their major element (CTTT)n has the potential to form triple-stranded structures, which could be involved in gene regulation. On the other hand, it is known that a fraction of MARs may colocalize with transcriptional enhancers. Classical AT-rich MARs have been proposed to anchor complexes of enhancers with transcription factors to the nuclear matrix via the cooperative binding of abundant matrix proteins to MARs [68]. The MARs/SARs distribution in the mouse rIGS, where enhancers have been experimentally mapped, suggests that transcription enhancers are adjacent to the MARs/SARs complexes but are not part of them. We therefore decided to compare the distribution of MARs/SARs regions between primates and the mouse, assuming that enhancers should be distributed similarly in these taxa. We scanned the human and mouse rIGS pre-promoter regions (~7 kb upstream of the promoter) for MARs/SARs elements using the MAR-Wiz program (figure 11). It can be suggested that rIGS evolution in the primate ancestral lineage involved divergence and elimination of the enhancer repeats nearest to the promoter. This was accompanied by active divergence of polypyrimidine clusters, which resulted in the emergence and propagation of new microsatellite motifs, with subsequent transfer of the enhancer functions to the polypyrimidine region.
Figure 11. Pattern of the MARs/SARs distribution in the rIGS pre-promoter regions (about 5 kb upstream of the promoter) for the primates and mouse.
Figure 12. A scheme of the human rDNA unit organization. The region of investigation is expanded. t – Start point of transcription. The core promoter, and the site of the universal control element binding are denoted by grey rectangles. The positions of (CCCT)n microsatellite clusters are shown by black rectangles.
Figure 13. Electrophoresis in 4% PAAG of the complexes obtained by incubating HeLa nuclear protein extracts with [γ-32P]-labeled oligonucleotides. The products of binding between the SP1 factor and the control oligonucleotide (k); between HeLa extracts and double-stranded (5'-CCCT-3')6 / (3'-AGGG-5')6 (1), single-stranded (5'-CCCT-3')6 (2), and single-stranded (5'-AGGG-3')6 (3).
Figure 14. A scheme of the human ribosomal DNA repeats. The 18S, 28S, and 5.8S rDNA regions are shown in dark gray. The vertical arrows show EcoRI restriction sites. The curved arrows marked 't' denote the transcription start points. The expanded region corresponds to the LR1-LR2 repeats (black rectangles). The variable regions are shown in a lighter color. The regions of interest are denoted as LR1var and LR2var.
In subsequent work, we used 24- to 40-mer oligonucleotides corresponding to the major microsatellite motifs from the pre-promoter region of the ribosomal DNA to search for functionally important elements by electrophoretic mobility shift assays (EMSA) with HeLa cell extract. The results showed no binding between double-stranded oligonucleotides and proteins from the extract, whereas complexes between single-stranded microsatellites and proteins were detected. The protein binding the single-stranded oligonucleotide (AGGG)10 was identified by our colleagues from the Emanuel Institute of Biochemical Physics, using mass spectrometry, as the DNA-dependent protein kinase catalytic subunit (DNA-PKcs). DNA-dependent protein kinase (DNA-PK) comprises a catalytic subunit (DNA-PKcs) and the DNA-binding protein Ku, which interacts with double- and single-stranded DNA and RNA. The DNA-PK catalytic subunit can phosphorylate many transcription factors and, among other effects, strongly represses transcription by RNA polymerase I (Pol I) [69]. The presence of the 5'-(CCCT)9-3' / 3'-(GGGA)9-5' clusters in the pre-promoter region of higher primate rDNA suggests that DNA-PKcs can bind them independently of the Ku subunit (figure 15).
Figure 15. Frequency of the (G)n and (AG)m components of the central compound microsatellite cluster with different numbers of monomer units. a - (G)n clusters; b - (AG)m clusters.
The central part of the human rIGS also contains a variable region formed by 2 kb LR repeats. The number of LR repeats is usually two but can occasionally be three [5]. Changes in the LR number lead to variability in the total rDNA length. LR1 and LR2, against a background of 88.8% similarity, harbor four short orthologous hypervariable segments enriched in microsatellite clusters (figure 16). Previous work showed that the LR1 segment between nucleotide positions 20,916 and 21,000 begins with (G)n and then contains several (AG)n/(CT)n clusters [5]. Comparison of individual clone sequences shows that this region has class-specific patterns [5].
Figure 16. The set of allele variants (A-H) detected in the LR2var region among 547 sequences taken from ten human genomes. The nucleotide sequences are shown without taking into account the n and m numbers in the (G)n(AG)m clusters. Variable nucleotides and microsatellite arrays are shown in bold. All nucleotides are numbered in both directions from the central (G)n(AG)m cluster. The number of occurrences of each variant is shown at the end of the corresponding row. Asterisks in the H1-H5 variants indicate substitutions in the poly-AG clusters, whose positions and nature are shown to the right of the corresponding rows.
In previous work, 36 copies of the rIGS hypervariable segment LR2var, located at positions 22763-23523 from the transcription start point, were cloned from an individual human genome and sequenced [70]. Comparative analysis showed that no two of the 36 inserts had identical primary structures. More recently, we studied the large-scale heterogeneity of 547 LR2var DNA segments (coordinates 22763-23523 from the transcription start point) obtained from 10 unrelated individuals. The variability of the central (G)n(AG)m microsatellite cluster consists in random changes of the numbers n and m. The number n varies between 4 and 17, and m between 13 and 30, with random combinations of the (G)n and (AG)m variants. The monomer numbers (G)8-11 and (AG)18-20 are the most abundant overall. The 31 groups of the 547 LR2var sequences are shown in figure 17. The nucleotide sequences flanking the central (G)n(AG)m cluster, disregarding the n and m numbers, are represented by two major groups (A and B) with minor variants. The nucleotide sequences of the most abundant group A (82% of all LR2var) are practically identical to the GenBank sequence (GenBank accession no. U13369).
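Counting n and m in a compound (G)n(AG)m cluster, as described above, can be illustrated with a short regular-expression sketch in Python. This is not the authors' analysis pipeline; the function name `find_g_ag_clusters`, its parameters, and the toy sequence are hypothetical. The default thresholds reuse the lower bounds reported in the text (n of at least 4, m of at least 13). The sketch also assumes the G run is not immediately preceded by another AG run, in which case the boundary between the two would be ambiguous.

```python
import re

# A run of G followed directly by a run of AG dinucleotides.
G_AG_CLUSTER = re.compile(r"(G+)((?:AG)+)")


def find_g_ag_clusters(sequence, min_n=4, min_m=13):
    """Return (start, n, m) for each (G)n(AG)m cluster meeting the thresholds.

    start is the 0-based position of the first G; n is the number of G
    monomers and m the number of AG monomers.
    """
    clusters = []
    for match in G_AG_CLUSTER.finditer(sequence.upper()):
        n = len(match.group(1))
        m = len(match.group(2)) // 2  # two bases per AG monomer
        if n >= min_n and m >= min_m:
            clusters.append((match.start(), n, m))
    return clusters


# Toy sequence (hypothetical flanks), carrying a (G)9(AG)19 cluster.
toy_sequence = "TTC" + "G" * 9 + "AG" * 19 + "CTT"
toy_clusters = find_g_ag_clusters(toy_sequence)
```

Applied to a set of cloned sequences, the returned (n, m) pairs could be tallied to obtain frequency distributions of the kind summarized in the text.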
Figure 17. An alignment of the nucleotide sequences of the H1-H5 alleles obtained from LR2var with their counterpart from LR1var. Homologies are shown by asterisks, and deletions by hyphens. Homology of the H4 and H5 3' ends with the LR2 nucleotide sequence is shown in bold.
The members of groups B (13%) and C (3%) exhibit heterogeneity upstream and downstream of the central (G)n(AG)m cluster. The sequences of the B1-B14 and C1-C4 alleles have specific features that distinguish them from the A variants. The five uncommon alleles (H1-H5) lack the central (G)n(AG)m cluster, whereas the upstream (AG)6-10 cluster characteristic of the B alleles is extended to (AG)18-32 and often contains G->C substitutions. The 3'-part of the H1-H5 sequences harbors base substitutions, deletions, and insertions. A comparison of the H1-H5 sequences with their counterparts from LR1var reveals identical segments (figure 18). The likely reason is that the two repeats, LR1 and LR2, can exchange DNA segments. Different mechanisms possibly generate variability in discrete LR2var segments. Slipped-strand mispairing of microsatellite DNA during replication is most consistent with the type of variability seen in the (G)n(AG)m clusters. In studies of minisatellite variability, a convincing body of evidence has accumulated that, along with equal exchange at crossover points, there is unequal conversion of one allele by the other. In this model, staggered nicks in DNA produce protruding single-stranded ends that can invade the allelic partner or sister chromatid [71, 72]. Most strand-invasion events are aborted after a limited extension of the broken single strands, perhaps as a result of the action of mismatch repair systems. This mechanism could readily account for highly complex, patchwork interallelic transfers [73-74]. We infer that the differences between the nucleotide sequences flanking (G)n(AG)m in the A-H groups could also arise as a result of crossovers and patchwork interallelic transfers.
An alignment of the H1-H5 sequence variants with their counterparts from LR1var and LR2var suggests that they arose as a result of gene conversion between the two LR repeats (figure 19). rDNA status may be inherited and linked to different physiological states of the human organism. The major variable region in human rDNA was mapped by Southern blot analysis downstream of the initial transcription termination site by probing with the 3'-end of the 28S rDNA [59]. Analysis of this region in 51 individual genomes revealed eight structural variants, two of which were present in all the genomes studied, while six were detected only in some genomes, in different combinations. Some structural variants were inherited as a single locus in a Mendelian fashion, which possibly reflects their clustering on individual chromosomes. These types of variability were thought to arise through unequal crossover between homologous repeats during meiosis [59]. On the other hand, a similar genomic analysis of 100 persons belonging to different generations of one family did not reveal any recombination distinguishable from ordinary meiotic segregation [21]. According to other data, children sometimes have more rDNA copies than their parents, suggesting that unequal crossover does occur [29, 57]. rDNA methylation status influences rRNA transcription activity. For example, transcription does not increase in cells with amplified rDNA that is able to bind antibodies to 5-Me-C; however, demethylation of this rDNA by 5-aza-C leads to an increase in the rRNA level [75, 76]. It was shown that the methylation status of the CpG 145 bp upstream of the transcription start point in the rat rIGS can serve as an indicator of gene activity, while methylation of the CpG 133 bp upstream of the transcription start point in the mouse rIGS prevents binding of the universal transcription factor UBF, which is crucial for formation of the Pol I transcription complex [77, 78]. It is interesting that CpG is also present at positions -145 and -135 in the human rIGS [3]. Differences in the number of rDNA monomers and in methylation status with ageing have been shown for human brain and heart and for a number of mouse tissues [79, 80]. An individual pattern of decrease in the rate of rRNA synthesis with donor age was detected in human fibroblasts by counting Ag-binding NORs [81]. Werner's syndrome (WRN), which manifests as premature ageing, is caused by mutations in a specific helicase locus. Werner's protein (WRNp) was detected in the nucleolus of replicating mammalian cells, where its appearance was linked to the transcriptional activity of the rRNA genes [82]. The increase in methylation level with ageing was considerably greater in cell cultures from patients with WRN than in control cells [83]. Treatment of a fibroblast cell culture from a patient with rheumatoid arthritis (RA) with an oxidative agent did not activate rRNA synthesis, whereas in control cells rRNA transcription activity increased by 50-80%.
The rDNA content of blood serum DNA and of DNA from leukocyte nuclei was compared in healthy donors and in patients with rheumatoid arthritis using dot hybridization [84]. The transcribed region of rDNA (13.3 kb) contains more than 200 CpG motifs capable of interacting with TLR9 receptors, which mediate the cellular immune response to CpG-rich DNA fragments. The data suggest that DNA from dead cells circulating in the peripheral blood is enriched with sequences possessing potent immunostimulatory properties. Early apoptosis is also characteristic of cells from patients with RA [84]. A comparison of rDNA copy number in individuals with schizophrenia (42) and healthy individuals (33) revealed a level about 20% higher in schizophrenia, whereas the content of satellite III DNA and histone genes was practically equal in the genomes of all persons. It was shown cytogenetically (Ag-NOR staining) that the content of active rRNA genes in the genomes of people with schizophrenia was higher than in the genomes of healthy people [85]. The extent of association of acrocentric (NOR)+ chromosomes is sometimes used for prognostic and diagnostic purposes in acute diseases, such as immunodeficiency and tumor development, on the assumption that this parameter, along with a silver staining test, reflects their functional state [86, 87]. Both human and animal malignant cells with structurally abnormal chromosomes often show variation in both the number and location of NORs. Rearrangement and possible amplification of the rRNA gene sites were detected in the human chronic myelogenous leukemia cell line K562 [88]. Karyotypic dissection of Hodgkin's disease cell lines (HDLM-1/2/3) revealed ectopic subtelomeres and ribosomal DNA at sites of multiple jumping translocations and genomic amplification [89]. Nascent pre-rRNA overexpression correlates with an adverse prognosis in alveolar rhabdomyosarcoma [90]. These and other data imply that the question of the connection between the variability and transcriptional activity of rDNA and the state of the human organism is far from resolved and that further studies are needed.
CONCLUSION
Despite the crucial importance of the proper functioning of the protein-synthesizing system for cells and for the organism as a whole, investigations in this area are far from complete. This applies to information about the links between the structural and functional organization and polymorphism of the rRNA gene regulatory region and the trans-factors that modulate gene activity. In actively growing cells, rRNA synthesis driven by RNA polymerase I accounts for 50% of total cellular RNA production, yet the dynamics of rRNA synthesis activity and the mechanisms of polysome reprogramming in human ontogenesis remain practically unknown. At the same time, data are accumulating on the multifunctional role of ribosomal proteins and on the important role of nucleolin in pre-rRNA maturation and trafficking. Recent studies have suggested that the nucleolus is involved (possibly in cooperation with the rDNA) in other important functions: growth and cell cycle control, tumorigenesis, aging, and so on. The data discussed here suggest that the study of rDNA and ribosome biogenesis calls for further investigation and can bring unexpected and important results.
REFERENCES
[1] Dipayan R., Warner J.R., What better measure than ribosome synthesis? Genes & Development, 2004, v. 18, pp. 2431-2436.
[2] Jacob S.T., Regulation of ribosomal gene transcription. Biochem. J., 1995, v. 306, pp. 617-626.
[3] Gonzalez I.L., Sylvester J.E., Complete sequence analysis of the 43-kb human ribosomal DNA repeat: analysis of the intergenic spacer. Genomics, 1995, v. 27, pp. 431-437.
[4] Gonzalez I.L., Sylvester J.E., Beyond ribosomal DNA: on towards the telomere. Chromosoma, 1997, v. 105, pp. 431-437.
[5] Gonzalez I.L., Sylvester J.E., Human rDNA: evolutionary patterns within the genes and tandem arrays derived from multiple chromosomes. Genomics, 2001, v. 27, pp. 255-263.
[6] Gonzalez I.L., Petersen R., Sylvester J.E., Independent insertion of Alu elements in the human ribosomal spacer and their concerted evolution. Mol. Biol. Evol., 1989, v. 6, pp. 413-423.
[7] Grummt I., Life on a planet of its own: regulation of RNA polymerase transcription in the nucleolus. Genes & Development, 2003, v. 17, pp. 27-35.
[8] Grummt I., Regulation of mammalian ribosomal gene transcription by RNA polymerase I. Progr. Nucleic Acid Res., 1999, v. 62, pp. 109-154.
[9] Langst G., Schatz T., Langowsky J., Grummt I., Structural analysis of mouse rDNA: coincidence between nuclease hypersensitive sites, DNA curvature and regulatory elements in the intergenic spacer. Nucleic Acids Res., 1997, v. 25, pp. 511-517.
[10] Mayer C., Bierhoff H., Grummt I., The nucleolus as a stress sensor: JNK2 inactivates the transcription factor TIF-IA and down-regulates rRNA synthesis. Genes & Development, 2005, v. 19, pp. 933-941.
[11] Nosikov V.V., Braga E.A., Structural organization of eukaryotic ribosomal genes. Itogi Nauki Tekhn.: Mol. Biol., Moscow, VINITI, 1982, pp. 110-125.
[12] Kupriyanova N.S., Conservation and variation of ribosomal DNA in eukaryotes. Mol. Biol. (Moscow), 2000, v. 34, pp. 753-765.
[13] Scheer U., Trendelenburg M.F., Krohne G., Franke W.W., Lengths and patterns of transcriptional units in the amplified nucleoli of oocytes of Xenopus laevis. Chromosoma, 1977, v. 60, pp. 147-167.
[14] Kupriyanova N., Popenko V., Eisner G., Vengerov Y., Timofeeva M., Tikhonenko A., Skryabin K., Bayev A., Organization of loach ribosomal genes (Misgurnus fossilis L.). Mol. Biol. Rep., 1982, v. 8, pp. 143-148.
[15] Srivastava A.K., Harino Y., Schlessinger D., Ribosomal DNA clusters in pulsed-field gel electrophoresis analysis of human acrocentric chromosomes. Mammal Genome, 1993, v. 4, pp. 445-450.
[16] Long E.O., Dawid I.B., Repeated genes in eukaryotes. Annu. Rev. Biochem., 1980, v. 49, pp. 727-764.
[17] Veiko N.N., Lyapunova N.A., Bogush A.V., Tsvetkova T.G., Gromova E.V., Detection of the rRNA genes number in individual human genomes. Mol. Biol. (Moscow), 1996, v. 30, pp. 1076-1086.
[18] Kaplan F.S., Murray J., Sylvester J.E., et al., The topographic organization of repetitive DNA in the human nucleolus. Genomics, 1993, v. 15, pp. 123-132.
[19] Moyzis R.K., Torney D.C., Meyne J. et al., The distribution of interspersed repetitive DNA sequences in the human genome. Genomics, 1989, v. 4, pp. 273-289.
[20] Gonzalez I.L., Sylvester J.E., Smith T.F., Stambolian D., Schmickel R.D., Ribosomal rRNA gene sequences and hominoid phylogeny. Mol. Biol. Evol., 1990, v. 7, pp. 203-219.
[21] Ranzani G.N., Bernini LJF., Crippa M., Inheritance of rDNA spacer lengths variants in men. Mol. Gen. Genet., 1984, v. 196, pp. 141-145.
[22] Nanda I., Zischler H., Epplen C., Gutlenbach M., Schmid M., Chromosomal organization of simple repeated DNA sequences used for DNA fingerprinting. Electrophoresis, 1991, v. 12, pp. 193-203.
[23] Braga E.A., Kapanadze B.I., Kupriyanova N.S., Brodyansky V.M., Netchvolodov K.K., Shkutov G.A., Ryskov A.P., Nosikov V.V., Yankovsky N.K., Analysis of the distribution of microsatellites of seven motifs within a cosmid of an ordered human chromosome 13 library. Mol. Biol. (Moscow), 1995, v. 29, pp. 1001-1010.
[24] Ryskov A.P., Kupriyanova N.S., Kapanadze B.I., Netchvolodov K.K., Pozmogova G.E., Prosnyak M.I., Yankovsky N.K., Frequency of various mini- and micro-satellite sequences in DNA of human chromosome 13. Genetika (Moscow), 1993, v. 29, pp. 1750-1754.
[25] Kupriyanova N.S., Netchvolodov K.K., Ryskov A.P., Microsatellite (ttgc)n, specific for the intergenic spacer of human and chimpanzee rDNA: use for studying the structural variations of the prepromoter region of rDNA. Mol. Biol. (Moscow), 1999, v. 33, pp. 314-318.
[26] Netchvolodov K.K., Boiko A.V., Ryskov A.P., Kupriyanova N.S., Evolutionary divergence of the pre-promotor region of ribosomal DNA in the great apes. DNA Seq., 2006, v. 17, pp. 378-91.
[27] Akhunov E.D., Chemeris A.V., Kulikov A.M., Vakhitov V.A., Functional analysis of diploid wheat promoter by transient expression. Biochim. Biophys. Acta, 2001, v. 1522, pp. 226-229.
[28] Arnheim N., Krystal M., Wilson G., Ryder O., Zimmer E., Molecular evidences for exchanges among ribosomal genes on non homologous chromosomes in man and apes. Proc. Natl. Acad. Sci., 1980, v. 77, pp. 7323-7327.
[29] Kuick R., Asakawa J.-i., Neel J.V., Kodaira M., Saton C., Thoraval D., Gonzalez I.L., Hanash S.M., Studies of the inheritance of human ribosomal DNA variants detected in two-dimensional separations of genomic restriction fragments. Genetics, 1990, v. 144, pp. 307-316.
[30] Sedman Y.E., Shostak N.G., Kupriyanova N.S., Serenkova T.I., Fengelghauer P.E., Gimalov F., Lind A.Y., Timofeeva M.Y., Intragenomic polymorphism of the 5S rRNA primary structure in a loach (Misgurnus fossilis L.). A transcriptional activity determination. Mol. Biol. (Moscow), 1989, v. 23, pp. 1295-1307.
[31] Kuo B.A., Gonzalez I.L., Gillespie D.A., Sylvester J.A., Human ribosomal RNA variants from a single individual and their expression in different tissues. Nucleic Acids Res., 1996, v. 24, pp. 4817-4824.
[32] Kupriyanova N.S., Kirilenko P.M., Netchvolodov K.K., Ryskov A.P., Preferential cleavage sites for Sau3A restriction endonuclease in human ribosomal DNA. Biochem. Biophys. Res. Com., 2001, v. 272, pp. 11-15.
[33] Nedospasov S.A., Georgiev G.P., Non-random cleavage of SV40 DNA in the compact minichromosome and free in solution by micrococcal nuclease. Biochem. Biophys. Res. Comm., 1980, v. 92, pp. 532-539.
[34] Lehrman M.A., Russel D.W., Goldstein J.L., Brown M.S., Alu-Alu recombination deletes splice acceptor sites and produces secreted low density lipoprotein receptor in a subject with familial hypercholesterolemia. J. Biol. Chem., 1987, v. 262, pp. 3354-3361.
[35] Saikawa Y., Kaneda H., Yue L., Shimura S., Toma T., Kasahara Y., Yachie A., Koizumi S., Structural evidence of genomic exon-deletion mediated by Alu-Alu recombination in a human case with heme oxygenase-1 deficiency. Hum. Mutat., 2000, v. 16, pp. 178-179.
[36] Helisalmi S., Hiltunen M., Vepsalainen S., Iivonen S., Mannermaa A., Lehtovirta M., Koivisto A.M., Alafuzoff I., Soininen H., Polymorphisms in neprilysin gene affects the risk of Alzheimer's disease in Finnish patients. J. Neurol. Neurosurg. Psychiatry, 2004, v. 75, pp. 1746-1748.
[37] Shibalev D.V., Voronov A.S., Bashkirov V.N., Kupriyanova N.S., Ryskov A.P., Recombinant products detection on PCR amplification of DNA containing Alu repeats. Docl. Acad. Nauk (Moscow), 2003, v. 388, pp. 689-693.
[38] Kupriyanova N.S., Shibalev D.V., Voronov A.S., Ryskov A.P., PCR-generated artificial ribosomal DNAs from premature termination at Alu sequences. Biomol. Engineering, 2004, v. 21, pp. 21-25.
170
Natalia S. Kupriyanova and Alexei P. Ryskov
[39] Childs G., Maxon R., Cohn R.H., Kedes L.M., Orphons: dispersed genetic elements derived from tandem repetitive genes of eukaryotes. Cell, 1981, v. 23(3), pp. 651-663. [40] Lohe A.R. and Roberts P.A., An unusual Y chromosome of Drosophila simulans carrying amplified rDNA spacer without rRNA genes. Genetics, 1990, v.125, pp. 399406. [41] Kominami R. and Muramatsu M., Amplified ribosomal spacer sequence: structure and evolutionary origin. J. Mol. Biol., 1987, v.193, pp. 217-222. [42] Brownell E., Krystal M., Arnheim N., Structure and evolution of human and African ape rDNA pseudogenes. Mol. Biol. Evol., 1983, v.1, pp. 29-37. [43] Mashkova T.D., Tyumeneva I.G., Zinovieva O.L., Romanova L.Y., Jabbs E., Alexandrov I.A., Pericentromeric alpha-satellite DNA in human chromosome 21 bordering with euchromatin DNA. Mol. Biol. (Moscow), 1996, v. 30, pp. 1044-1054. [44] Burdon M.R., Leader D.P., Characterization of a human orphon 28S ribosomal DNA. Gene, 1986, v. 48, pp. 65-70. [45] Sakai K., Ohta T., Minoshima S., Kudof J., Wang Y., Jong P.J., Shimizu N., Human ribosomal RNA gene cluster: identification of the proximal end containing a novel tandem repeat sequence. Genomics, 1995, v. 25, pp. 521-526. [46] Benevolenskaya E.V., Kogan G.L., Tulin A.V., Phillipp D., Gvozdev V.A., Segmented gene conversion as a mechanism of correction of 18S rRNA pseudogene located outside of rDNA cluster in D. melanogaster. J. Mol. Evol., 1997, v. 44, pp. 646- 651. [47] Kirilenko P.M., Kupriyanova N.S., Ryskov A.P., Detection and characteristics of prolonged deletions in the ribosomal DNA cosmid clones from the human chromosome 13. Docl. Acad. Nauk. (Moscow), 2000, v. 371, pp. 60-62. [48] Worton R.G., Sutherland J., Sylvester J.E., Willard H.F., Bodrug S., Dube I., Duff C., Kean V., Ray P., Schmickel R.D., Human ribosomal RNA genes: Orientation of the tandem array and conservation of the 5‘ end. Science., 1988, v. 239, pp. 64-68. 
[49] Bailey J.A., Yavor A.M., Viggiano L., Musceo D., Horvath J.E., Archidiacono N., Schwartz S., Rocchi M., Eichler E.E., Human-specific duplication and mosaic transcripts: the recent paralogous structure of chromosome 22. Am. J. Hum. Genet., 2002, v.70, pp. 83-100. [50] Melford H.C., Trask B.J., The complex structure and dynamic evolution of human subtelomers. Nat. Rev. Genet., 2002, v. 3, pp. 91-102. [51] Horvath J.E., Bailey J.A., Locke D.P., Eischler E.E., Lessons from the human genome: transitions between euchromatin and heterohromatin. Hum. Mol. Genet., 2001, v.10, pp. 2215-2223. [52] Ambrosini A., Paul S., Hu S., Riethman H., Human subtelomeric duplicon structure and organization. Genome biology, 2007, v. 8:R151 (doi: 10.1186/gb-2007-8-7-r151). [53] Kupriyanova N.S., Shibalev D.V., Voronov A.S., Muravenko O.V., Zelenin A.V., Ryskov A.P., Segment duplications in subtelomeric regions of human chromosome 13. Mol. Biol. (Moscow)., 2003, v.17, pp. 221-227. [54] Durkop H., Oberbarnscheidt M., Latza V., Bulfone-Paus S., Krause H., Pohl T., Stein H., Structure of the Hodgkin‘s lymphoma-associated human CD30 gene and the influence of a microsatellite region on its expression in CD30(+) cell lines. Biochem. Biophys. Res. Com., 2001, v.1519, pp. 185-191.
Common and Special Features of the Human Ribosomal DNA
171
[55] Krystal M., D‘Eustachio, Ruddle F.H., Arnheim N., Human nucleolus organizers on homologous chromosomes can share the same ribosomal gene variants. Proc. Natl. Acad. Sci., 1981, v. 78, pp. 5744-5748. [56] Naylor S.L., Sakaguchi A.Y., Schmickel R.D., Woodworth-Gutal M., Shows T.B., Organization of rDNA spacer fragment variants among human acrocentric chromosomes in somatic cell hybrids. J. Mol. Appl. Genet., 1983, v. 2, pp. 137-146. [57] Schmickel R.D., Gonzalez I.L., Erickson J.M., Nucleolus organizing genes on chromosome 21: recombination and nondisjunction. Ann. N. Y. Acad. Sci., 1985, v. 450, pp. 121-131. [58] Seperack P., Slatkin M., Arnheim N., Linkage disequilibrium in human ribosomal genes: Implications of multigene family evolution. Genetics, 1988, v.119, pp. 943-949. [59] Garkavtsev I.V., Tsvetkova T.G., Yegolina N.A., Gudkov A.V., Variability of human rRNA genes inheritance and nonrandom chromosomal distribution of structural variants of nontranscribed spacer sequences, Hum. Genet.,1988, v. 81, pp.31-37. [60] Sylvester J.E., Gonzalez I.L., Mougey E.B., Structure and organization of vertebrate ribosomal DNA. The nucleolus, Mark Olson ed., 2003 Eurkah.com. [61] Cassidy B.G., Yang-Yen H.F., Rothblum L.I., Transcriptional role for the nontranscribed spacer of rat ribosomal DNA. Mol. Cell. Biol., 1986, v. 6, pp. 27662773. [62] Grummt I., Kuhn A., Bartsch I., Rosenbauer H. A., transcription terminator located upstream of the mouse rDNA initiation site affects rRNA synthesis. Cell, 1986, v.47, pp. 901-911. [63] Moss T., Boseley P.G., Birnstiel M.L., More ribosomal spacer sequences from Xenopus laevis. Nucleic Acids Res., 1980, v. 8, pp. 467-485. [64] Grozdanov P.N., Georgiev O.I., Karagyozov L.K., Complete sequence of the 45-kb mouse ribosomal DNA repeat unit: Analysis of the intergenic spacer. Genomics, 2003, v. 82, pp. 637-643. [65] Dover G., How genomic and developmental dynamics affect evolutionary processes. Bioessays, 2000, v.22, pp. 1153-1159. 
[66] Cox R., Mirkin S.M., Characteristic enrichment of DNA repeats in different genomes. Proc. Nat.l Acad. Sci., 1997, v. 94, pp. 5237-5242. [67] Borstnik B., Pumpernik D., Mutational dynamics of short tandem repeats in human genome. Europhys. Let., 2004, v. 65, pp. 290-296. [68] Boulikas T., Homeodomain protein binding sites, inverted repeats, and nuclear matrix attachment regions along the human beta-globin gene complex. Cell Biochem., 1993, v. 52, pp. 23-36. [69] Kuhn A., Gottlieb T.M., Jackson S.P., Grummt I., DNA-dependent protein kinase: a potent inhibitor of transcription by RNA polymerase I. Genes & Dev., 1995, v. 9, pp. 193-203. [70] Shibalev D.V., Voronov A.S., Firsov S.Y., Ryskov A.P., Kupriyanova N.S., Detection of intragenomic polymorphism in the LR2 region of human intergenic ribosomal spacer. Mol. Biol. (Moscow), 2004, v. 38, pp. 980-984. [71] Tamaki K., May C.A., Dubrova J.E., Jeffreys A.J., Extremely complex repeat shuffling during germline mutation at human minisatellite B6.7. Hum. Mol. Genet., 1999, v 8, pp. 879-88.
172
Natalia S. Kupriyanova and Alexei P. Ryskov
[72] Paques F., Haber J.E., Multiple pathways of double strand break-induced recombination in Saccharomyces cerevisiae. Microbiol. Mol. Biol. Rev., 1999, v. 63, pp. 349-404. [73] Buard J., Vergnaud G., Complex recombination events at the hypermutable minisatellite CEB1 (D2S90). EMBO J., 1994, v. 13, pp. 3203-3210. [74] Lam K.W., Jeffreys A.J., Processes of copy-number change in human DNA: the dynamics of {alpha}-globin gene deletion. Proc. Natl. Acad. Sci., 2006, v.103, pp. 8921-8927. [75] Ferraro M., Lavia P., Activation of human ribosomal RNA genes by 5-azacytidine. Exptl. Cell Res., 1983, v. 145, pp. 452-457. [76] Giancotti P., Grappelli C., Pogges I., Persistence of increased levels of ribosomal gene activity in CHO-K1 cells treated in vitro with demethylating agents. Mutat. Res., 1995, v. 348, pp. 187-192. [77] Stancheva I., Lucchini R., Coller T., Chromatin structure and methylation of rat rRNA genes studied by formaldehyde fixation and psoralen cross-linking. Nucleic Acids Res., 1997, v. 25, pp. 1727-1735. [78] Santoro R., Grummt I., Molecular mechanisms mediating methylation-dependent silencing of ribosomal gene transcription. Mol. Cell, 2001, v. 8, pp. 719-725. [79] Johnson R.V., Strehler B.L., Loss of genes coding for ribosomal RNA in ageing brain cells. Nature, 1972, v. 240(5381), pp. 412-414. [80] Johnson L.K., Johnson R.V., Strehler B.L., Cardiac hypertrophy, aging and changes in cardial ribosomal gene dosage in man. J. Mol. Cell Cardiol., 1975, v. 7(2), 125-133. [81] Thomas S., Mukherjee A.B., A longitudinal study of human age-related ribosomal RNA gene activity as detected by silver-stained NORs. Mech. Ageing Dev., 1996, v.92, pp. 101-109. [82] Indig F.E., Partridge J.J., von Kobbe C., Aladjem M.I., Latterich M., Bohr V.A., Werner syndrome protein directly binds to the AAA ATPase p97\VCP in an ATPdependent fashion. J. Struct. Biol., 2004, v.146, pp. 251-259. 
[83] Machwe A., Orren D.K., Bohr V.A., Accelerated methylation of ribosomal RNA genes during the cellular senescence of Werner syndrome fibroblasts. FASEB J., 2000, v.14, pp. 1715-1724. [84] Veiko N.N., Shubaeva N.O., Ivanova S.M., Lyapunova N.A., Spitkovsky D.M., Blood serum DNA in patients with rheumatoid arthritis is considerably enriched with fragments of ribosomal repeats containing immunostimulatory CpG-motifs. Bull. Exp. Biol. Med., 2006, v.142, pp. 313-316. [85] Veiko N.N., Yegolina N.A., Radzwill G.G., Nurbaev S.D., Kosyakova N.V., Shubaeva N.O., Lyapunova N.A., Quantitative analysis of repetitive sequences in human genomic DNA and detection of an elevated ribosomal repeat copy number in patients with schizophrenia (the results of molecular and cytogenetic analysis. Mol. Biol. (Moscow), 2003, v. 379, pp. 409-419. [86] Grabovskaya I.L., Glukhova L.A., Tsvetkova T.G., Kravetz I.A., Mamaeva S.E., Kutsch A.A., The use of DNA hybridization in situ for identifying chromosomal rearrangements in the karyotyping of cell lines. Mol. Biol. (Moscow), 1992. v. 34, pp. 41-46. [87] Pedrazzini E., Slavutsky I.R., Ag-NOR staining and satellite association in bone marrow cells from patients with mycosis fungoides. Hereditas, 1991. v. 123, pp. 9-15.
Common and Special Features of the Human Ribosomal DNA
173
[88] Crossen P., Godwin J., Rearrangement and possible amplification of the ribosomal RNA gene sites in the human chronic myelogenous leukemia cell line K562. Cancer Genet. Cytogenet., 1985, v.18, pp. 27-30. [89] MacLeod R.A., Spitzer D., Sylvester J.E., Kaufman M., Wernich A., Drexler H.G., Karyotypic dissection of Hodgkin's disease cell lines reveals ectopic subtelomeres and ribosomal DNA at sites of multiple jumping translocations and genomic amplification. Leukemia, 2000, v.14, pp. 1803-1814. [90] Williamson D., Lu Y-J., Fang C., Pritchard-Jones K., Shipley J., Nascent pre-rRNA overexpressionover expression correlates with an adverse prognosis in alveolar rhabdomyosarcoma. Genes, Chromosomes, and Cancer, 2006, v. 45, pp. 839-845.
In: Molecular Polymorphism of Man Editors: S. D. Varfolomyev and G. E. Zaikov
ISBN: 978-1-60741-843-6 © 2011 Nova Science Publishers, Inc.
Chapter 6
ETHNIC GENOMICS OF THE EAST EUROPEAN HUMAN POPULATIONS
S. A. Limborska*, D. A. Verbenko, A. V. Khrunin and P. A. Slominsky
Institute of Molecular Genetics, Russian Academy of Sciences, Moscow, Russia
ABSTRACT
We present the results of studies on ethnic genetics conducted at the Department of Molecular Basis of Human Genetics, Institute of Molecular Genetics, in the Russian Academy of Sciences. Many East European populations were studied for a number of DNA polymorphic markers. Detailed population characteristics of markers for the genes encoding chemokine (C-C motif) receptor type 5 (CCR5), myotonic dystrophy (DM), apolipoprotein B (APOB), tumor suppressor p53 (p53), as well as mitochondrial DNA and Y-chromosome polymorphisms, are discussed. Particular distinctions and general trends of variability in the gene pool of the populations studied are shown, providing new data on the complicated nature of the interactions and mutual influences of the wide variety of ethnic groups inhabiting this territory.
* Tel: +74991961858; Fax: +74991960221. E-mail: [email protected]
INTRODUCTION
The years around the turn of this century were marked by rapid progress in the field of human molecular genetics, initially arising from studies of human genome sequencing conducted within the framework of international and national "Human Genome" programs. These studies resulted not only in the accumulation of tremendous amounts of information on human DNA structure but also in the development of new efficient DNA typing technologies, the construction and storage of information databases and the development of methods for processing large volumes of results. Based on these advanced studies, a new area of research, genomics, has emerged, which has revolutionized modern biology. This discipline now allows the disclosure of many features of genome organization, comparisons of different organisms' genomes, the discovery of new genes and genetic elements and the detection of mutations underlying numerous inherited diseases, including some previously unknown types. Work on so many problems considerably broadened the areas of interest in molecular genetics and extended its methods and approaches to both adjacent and rather remote fields of scientific research. These include medical genetics, pharmacology, comparative biology, forensic medicine and biotechnology, as well as anthropology, archaeology and human history. In this connection, specialized branches started developing within the frame of genomics: functional genomics, comparative genomics, medical genomics, computerized genomics and, finally, ethnic genomics (which we refer to here as "ethnogenomics"), whose problems are the subject of this article.
The main goal of ethnogenomics is to investigate genomic diversity in the gene pool of separate populations, ethnic communities and ethnic territorial entities [1]. The investigation of extant human population genomes makes it possible to acquire evidence about the most remote historic events, even as far back as the origin of our species, and this is one of the most intriguing areas of ethnogenomics. To "read" this evidence, genomic markers of numerous human groups must be analyzed and the degree of their genetic relationships evaluated. Various studies have revealed a fundamental feature of the human genome: namely, its variability, or polymorphism. This feature can be revealed only when the genomes of different individuals are compared and the differences between them brought to light.
Even the first studies showed that all humans are very similar as far as the main principles of their genomic organization are concerned, yet many loci have been found allowing us to distinguish one person's genome from another's with ease. This finding provides the basis for the identification of a person by DNA testing and for establishing familial relationships (for example, paternity or maternity). Genome polymorphism generally denotes neutral genomic variations in different people. Neutral, or "silent", variations are those that do not reveal themselves phenotypically and do not affect the individual's health. The large numbers of polymorphic markers discovered during human genome sequencing provide a powerful tool for the analysis of the gene pool in terms of its dynamics, history and geography. This allows us to generate new evidence about the gene pools of various regions studied and to establish new approaches to the study of basic microevolution trends and the formation of the modern human gene pool. Several types of polymorphism have been distinguished in the human genome, including single nucleotide substitutions, insertion–deletion polymorphisms and polymorphic mini- and microsatellites. However, detailed characterization of individual markers and groups of markers is needed, so that they will be able to serve specific purposes in the future and so that problems involving various temporal and spatial parameters can be solved.
The first type of polymorphism, single nucleotide substitutions (single nucleotide polymorphisms, SNPs), is the most frequent in the genome. Typically, there is one substitution per 300–1000 nucleotides and these are "neutral" substitutions not affecting health [2, 3]. Approximately six million SNPs have already been identified in the genome and even though this amount appears enormous, it covers only about 0.2% of the whole genome of three billion nucleotides. It should be noted that this type of polymorphism has a very low mutation rate (about 2 × 10⁻⁸), meaning that a base substitution occurs very rarely, for instance, once in the course of several thousand years. The lower the mutation rate of any polymorphism, the more distant the historic events it can be used to mark. As shown below, this type of polymorphism finds application in cases that require elucidation of events that took place at very remote times.
There are other types of polymorphism in the genome, for instance, the "hypervariable" regions. Numerous regions of the genome contain tandem repeats, in which one small nucleotide sequence is repeated several times in an end-to-end fashion. For example, the gene for myoglobin bears a 33-nucleotide sequence that is repeated four times [4]. The same genomic position might contain 10 such tandem repeats in one person or 15 repeats in another. With such large individual diversity, the informative ability of these markers is very high. It should be noted that such hypervariable region differences arise much more frequently (several thousand times more often) than do SNPs. As will be shown later, investigation of this type of polymorphism allows one to test for comparatively recent events.
From the point of view of population studies, all DNA markers can be subdivided into three groups: mitochondrial DNA (mtDNA) markers, autosomal markers and Y-chromosome markers. Polymorphisms among these markers arise from microevolutionary factors (migration, selection, genetic drift and mutations). However, their modes of variability reflect differently the actions and results of these processes. Mitochondrial DNA polymorphism has long been used in population studies because mtDNA is relatively simple to isolate. The major features of these polymorphisms are the absence of recombination, a high level of variability and strict maternal inheritance.
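The figures quoted in the Introduction can be checked with a few lines of arithmetic. The sketch below is illustrative only: the genome size, SNP count, SNP spacing and repeat numbers are taken from the text, while the function names are hypothetical and introduced here for the example.

```python
GENOME_SIZE = 3_000_000_000   # nucleotides, as quoted in the text
KNOWN_SNPS = 6_000_000        # SNPs identified, as quoted in the text


def snp_fraction(snps: int, genome_size: int) -> float:
    """Fraction of genome positions that are known SNP sites."""
    return snps / genome_size


def expected_snp_count(genome_size: int, spacing: int) -> int:
    """Number of SNPs implied by one substitution per `spacing` nucleotides."""
    return genome_size // spacing


def repeat_array_length(unit_length: int, copies: int) -> int:
    """Length in nucleotides of a tandem array of identical repeat units."""
    return unit_length * copies


fraction = snp_fraction(KNOWN_SNPS, GENOME_SIZE)
print(f"{fraction:.1%} of positions")            # 0.2% of positions
print(expected_snp_count(GENOME_SIZE, 1000))     # 3000000
print(expected_snp_count(GENOME_SIZE, 300))      # 10000000
# A 33-nucleotide unit present in 10 or 15 copies gives alleles of different length.
print(repeat_array_length(33, 10), repeat_array_length(33, 15))  # 330 495
```

The six million SNPs cited in the text fall inside the range of three to ten million implied by one substitution per 300–1000 nucleotides, and the length difference between repeat alleles is what makes tandem-repeat markers easy to tell apart.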
Y-chromosome polymorphisms are complementary to mtDNA polymorphisms as they show paternal inheritance and typical absence of recombination (with the exception of the pseudoautosomal region). In practical terms, these two types of polymorphism supplement each other by supplying separate evidence about the paternal and maternal contributions to the evolution of populations. This offers hitherto unknown opportunities for population studies: namely, the possibility of tracing and comparing the histories of the paternal and maternal lineages of populations and of evaluating their relative contribution to each population's gene pool. Passed from generation to generation through only one parental line and taking no part in recombination, they allow genetic events to be reconstructed, theoretically, starting from the hypothetical ancestors of modern humans (the "Y-chromosomal Adam" and "mitochondrial Eve") and proceeding to contemporary populations. Nuclear autosomal DNA markers characterize the whole of the human genome and do not focus on the particular genetic contribution of either sex. As many researchers believe, the study of distinct types of nuclear polymorphisms makes it possible to assess many temporal events that happened in the history of a population. At present, DNA polymorphisms are being explored among many human populations of the world. Such studies reveal considerable intra- and intergroup differences in the frequencies of polymorphic DNA fragments across many geographic regions, and these differences have become one of the most important characteristics of the genetic structure of human populations.
MITOCHONDRIAL DNA POLYMORPHISMS
Mitochondrial DNA polymorphisms were among the first used for studying human populations. It should be mentioned here that almost every cell of our organism contains two genomes: the nuclear genome, which encodes our essential characteristics, and another genome located outside the nucleus in the mitochondria, whose principal role is to provide the cell with energy. Every cell bears between several dozen and several thousand mitochondria, and the genomes of all mitochondria originating from each organism are similar. The mitochondrial genome is very small (16,569 nucleotides) and carries only 37 genes encoding the proteins and RNAs needed for the functioning of the organelle. It displays a very high level of polymorphism, as mutations accumulate in it substantially faster than in the nuclear genome. Inheritance of the human mitochondrial genome is maternal, and its analysis therefore supplies evidence about the genetic history of the maternal lineage. Linkage disequilibrium between polymorphisms in mtDNA makes it possible to regard mtDNA as a single locus represented by a multitude of alleles: haplotypes, particular groups of which correspond to linked sets of particular mutations [5]. This feature of mtDNA molecules is very useful for molecular studies of evolution, as the mitochondrial gene pool includes numerous combinations that allow the temporal variability of mtDNA molecules to be traced and molecular changes imparted by the evolution of populations to be classified.
The geographic region of our interest, Northern Eurasia including the East European Plain, is presently inhabited by peoples listed as being of European and Asian origins as well as by those who combine both components. The mtDNA haplotype sets ("mitotypes") of European and Asian groups differ considerably. Moreover, Asian groups are heterogeneous, with several haplotype variants, whereas the European groups are less heterogeneous.
We studied the mtDNA of three East European populations [6], all Eastern Slavs: one Byelorussian population and two Russian populations. Our results showed that these populations bear quite a number of different mitotype variants, the most frequent being the so-called haplogroup H, typical of most European peoples. The frequency of this haplogroup in mixed populations has helped us to evaluate the European contribution within each maternal lineage.
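One common way to turn a marker frequency into a contribution estimate is a simple two-source admixture calculation. The sketch below is an assumption made for illustration, not the procedure reported by the authors: it takes the frequency of a marker such as haplogroup H in a mixed population and in two hypothetical source populations, and all numerical values are invented.

```python
def admixture_proportion(p_mixed: float, p_source_a: float, p_source_b: float) -> float:
    """Estimate the share of source A in a mixed population from one marker.

    Uses the linear two-source model p_mixed = m * p_a + (1 - m) * p_b,
    solved for m. The result is clamped to [0, 1] because sampling noise
    can push the raw estimate slightly outside that interval.
    """
    if p_source_a == p_source_b:
        raise ValueError("marker is uninformative: equal frequency in both sources")
    m = (p_mixed - p_source_b) / (p_source_a - p_source_b)
    return min(1.0, max(0.0, m))


# Hypothetical frequencies of haplogroup H (not data from the chapter):
# 0.45 in a European source, 0.05 in an Asian source, 0.25 in a mixed group.
european_share = admixture_proportion(p_mixed=0.25, p_source_a=0.45, p_source_b=0.05)
print(round(european_share, 2))  # 0.5
```

A single marker gives only a rough estimate; in practice several haplogroups would be combined, and the choice of source populations strongly affects the result.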
Y-CHROMOSOME DNA POLYMORPHISMS
The human genome also carries a system of markers that allow the evaluation of the male lineage's genetic contribution to ethnic history. The Y chromosome is found only in the male genome; it passes from father to son and retains the same genetic material and the same combinations of polymorphic markers. Thus, the structure is very stable in time, although it undergoes changes caused by spontaneous mutations. Investigation of the polymorphism of Y-chromosome markers in Europeans has pointed to their ancient origin. The study by Semino et al. [7], "The Genetic Legacy of Paleolithic Homo sapiens in Extant Europeans: a Y Chromosome Perspective", was conducted by a large international team of researchers from two American and several European laboratories, including ours. More than 1,000 men originating from 25 different regions of Europe and the Near East were examined. Analysis of 22 binary markers in the Y chromosome showed that more than 95% of the samples studied could be assigned to 10 haplotypes or historic lineages, with two of them, at that time denoted by Eu18 and Eu19, emerging in Europe during the Paleolithic. More than 50% of the European males studied belong to these ancient haplotypes. The two are related, differing by a single point substitution (mutation M17). However, their geographic distributions evolved in opposite directions. Eu18, most common among the Basques, diminishes in frequency from west to east. The age of this haplotype is estimated to be 30,000 years; thus this is likely to be the most ancient lineage in Europe, arising in the Upper Paleolithic among a population that inhabited the region of the Iberian Peninsula. The related Y-chromosomal haplotype Eu19 is distributed differently in European populations. It is not found in Western Europe, and its frequency grows eastward to reach its maximum in Poland, Hungary and Ukraine, where Eu18 is practically absent. Moreover, Ukraine shows the largest diversity of microsatellite markers associated with haplotype Eu19. These combined data allow the assumption that the expansion of this historic lineage started from this very region.
The distribution data for the two main European haplotypes suggest the following scenario. During the Last Glacial Maximum, people who occupied the northeastern and central parts of Europe were forced to migrate westward and southward. Some of them settled in the Franco–Cantabrian refuge area while the others found refuge in the Balkans. Consequently, people survived in these two widely separated, isolated regions. Some other data, including those for other DNA markers, support this reconstructed pattern. After the glacial retreat, a second settlement of Europe took place, with the Franco–Cantabrian and Balkan refuges being the main sources. Most of the other Y-chromosomal haplotypes have geographic distributions that indicate an origin in the Near East.
However, two of them, Eu7 and Eu8, also emerged in Europe during the Paleolithic, and they probably mark historic events connected with the spread of Near Eastern populations into Europe in a period before the Last Glacial Maximum. All other Y-chromosomal haplotypes emerged in Europe later. During the Neolithic, there was an expansion of a number of haplotypes from the Near Eastern region, possibly associated with the expansion of agriculture. Interestingly, a new variant of the Y chromosome was discovered in the course of this study: mutation M178, found only in the northeastern parts of Europe. This haplotype was estimated to be no more than 4,000 years old, and its distribution might reflect a comparatively recent migration of populations from the Urals. In this way, this study showed that only a little more than 20% of European males belong to those historic lineages that appeared in Europe comparatively recently in the Neolithic, following the Last Glacial Maximum. About 80% of European males belong to ancient lineages that can be traced back to the time of the Upper Paleolithic. In other words, 80% of the current European male gene pool has Paleolithic and 20% has Neolithic ancestry. Subsequent studies conducted by other authors have confirmed particular details of these results [8, 9].
The tandemly organized hypervariable Y-chromosome regions are appropriate in cases where comparatively recent events (1,000 to 2,000 years ago) are of interest. For example, we studied three groups of Eastern Slavs (the Kiev, Novgorod and Pinsk populations). The study was performed in cooperation with colleagues from Ukraine and Belarus [10, 11]. Because the divergence of Eastern Slavs is relatively recent, hypervariable regions of the Y chromosome were selected for investigation. Five polymorphic markers (DYS393, DYS392, DYS391, DYS390 and DYS19) were analyzed and their combinations (haplotypes) determined. Fourteen haplotypes were discovered, with one, 13/11/10/25/16, denoted as No. 1, being the most frequent and another, 13/11/11/24/16 (No. 2), the second most frequent (for loci DYS393/DYS392/DYS391/DYS390/DYS19, respectively). Haplotypes No. 1 and No. 2 differ at only two loci. Both are found in all three populations. Interestingly, haplotype No. 1 was most frequent in Russians, No. 2 in Ukrainians, and both haplotypes were represented almost equally in Byelorussians. Analysis of the median network showed that for Byelorussians the set of haplotypes surrounding haplotype No. 1 was similar to that in Russians and the set surrounding haplotype No. 2 was similar to that in Ukrainians. According to these characteristics, the Byelorussian population appears to be the closest to the ancestral Eastern Slavic population, with the two remaining populations being its derivatives. The same opinion is shared by some researchers engaged in neighboring fields of research, and the cited results support it at the current stage of the studies.
Using the same kind of markers, some problems of local significance can be solved. For instance, we carried out a survey of allelic polymorphisms and haplotypes for the same five microsatellites of the Y chromosome in samples from Russian men living in geographically distinct regions (Archangelsk and Kursk) of the European part of Russia [11]. With regard to differences in the culture and the mode of life of these peoples, the first sample can be referred to as Northern Russians and the second one as Southern Russians. Comparative analysis of the allelic frequencies over all loci revealed statistically significant differences between the two populations (p = 0.001). The main contributions to the differentiation were made by the DYS392 (p = 0.005) and DYS393 (p = 0.003) markers.
Allelic diversity indices calculated for these two loci were more than 1.5 times higher in the Archangelsk population, and they were close to the maximum values observed in some European populations. On the other hand, in the Kursk population, the values of Y-chromosomal allelic diversity indices in most cases were close to those for populations of the Novgorod region, Ukraine and Belarus [10, 11]. The interpopulation differences in the values of allelic diversity indices for the DYS392 and DYS393 loci resulted from the high frequency of the alleles with 14 repeats in the Archangelsk population. Major alleles with 14 repeats of the DYS392 and DYS393 loci are typical for some Northern European populations [13, 14]. Based on data on allele frequency distributions for the loci of interest, genetic distances were estimated between populations from the Archangelsk and Kursk regions and some of the European populations, including Eastern Slavic ones. Irrespective of the chosen measure of genetic distance (GST, DA, DC, (δμ)² or DSW), the population from the Archangelsk region was closer to the populations of the Finno-Ugric linguistic group (Saami and Estonians) and to the Latvians, who are geographic neighbors of the Estonians, whereas the Kursk population was always a member of a cluster formed by Eastern Slavic populations (Russians of the Novgorod region, Ukrainians and Byelorussians). A comparative pairwise analysis of haplotype frequency (using FST values as a measure of genetic similarity) confirmed the absence of notable differences between the Russian population of the Archangelsk region and the populations of Saami, Estonians and Latvians. It also showed the genetic similarity of Russians from the Kursk region with Russians from the Novgorod region and with Ukrainians and Byelorussians.
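The text does not state which diversity index was used. A minimal sketch, assuming Nei's unbiased gene diversity, h = n/(n − 1) × (1 − Σ pᵢ²), shows how such an index is computed for one Y-chromosomal locus. The function name and the allele samples are hypothetical and are not data from the study.

```python
from collections import Counter


def gene_diversity(alleles: list[int]) -> float:
    """Nei's unbiased gene diversity for one locus.

    `alleles` holds one repeat-number allele per sampled chromosome.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    counts = Counter(alleles)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical samples: repeat numbers at one locus in two populations.
sample_north = [11, 11, 11, 14, 14, 14, 13, 12]   # two common alleles
sample_south = [11, 11, 11, 11, 11, 11, 14, 13]   # one dominant allele

h_north = gene_diversity(sample_north)
h_south = gene_diversity(sample_south)
print(f"north: {h_north:.3f}, south: {h_south:.3f}, ratio: {h_north / h_south:.2f}")
```

A population in which a second allele is common alongside the usual major allele yields a markedly higher index, which is the kind of pattern described above for the 14-repeat alleles.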
Phylogenetic analysis of the most frequent Y-chromosomal haplotypes (those occurring more than once), based on the stepwise mutation model (in which neighboring haplotypes differ by only one repeat unit), demonstrated substantial differences in haplotype distributions in
median networks of the Kursk and Archangelsk populations. The median network of the Archangelsk population consisted of two haplotype groups that showed equal frequency and were separated by six single-step mutation events. In contrast, the median network of the Kursk sample displayed structural unipolarity (23 haplotypes in one part of the network versus five in the other). In addition, although the haplotypes of one of the poles of the Archangelsk median network can be integrated into the network of major haplotypes of the Kursk population, the remaining ones are neither shared by the two populations nor neighbors in their networks. To determine the possible sources of such dissimilarity in the sets of haplotypes, the major haplotypes were included in more extensive median networks along with those of Byelorussians, Ukrainians, Novgorod regional Russians, Saami and Estonians. This analysis allowed us to show that the differences between the Kursk and the Archangelsk populations were associated with a high prevalence in the latter of major haplotypes typical mainly for Finno-Ugric populations.

The specific genetic nature of people from the Archangelsk region compared with other Slavic populations was also noted in our study of mtDNA polymorphisms, when the Russians from Oshevensk were compared with the Russians from the town of Ufa and the Byelorussian population [6]. In mtDNA samples collected in Oshevensk, subcluster U5b1, which is not typical for European populations and is described in the literature as specific to the Saami population, was found at a frequency of 0.07.
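The stepwise mutation model underlying the median networks can be illustrated with the two most frequent haplotypes reported earlier in this section, No. 1 (13/11/10/25/16) and No. 2 (13/11/11/24/16). The following Python sketch counts the loci at which two haplotypes differ and the number of single-repeat steps separating them. It is only a distance calculation under the assumption that each repeat-unit change counts as one step; it does not construct a median network, and the function names are hypothetical.

```python
# Locus order as given in the text for the haplotype notation.
LOCI = ("DYS393", "DYS392", "DYS391", "DYS390", "DYS19")


def parse_haplotype(text):
    """Turn a string such as '13/11/10/25/16' into repeat counts keyed by locus."""
    repeats = [int(x) for x in text.split("/")]
    if len(repeats) != len(LOCI):
        raise ValueError("expected one repeat count per locus")
    return dict(zip(LOCI, repeats))


def differing_loci(h1, h2):
    """Loci at which the two haplotypes carry different alleles."""
    return [locus for locus in LOCI if h1[locus] != h2[locus]]


def stepwise_distance(h1, h2):
    """Total number of single-repeat steps between two haplotypes."""
    return sum(abs(h1[locus] - h2[locus]) for locus in LOCI)


hap1 = parse_haplotype("13/11/10/25/16")  # haplotype No. 1
hap2 = parse_haplotype("13/11/11/24/16")  # haplotype No. 2

diff = differing_loci(hap1, hap2)
steps = stepwise_distance(hap1, hap2)
print(diff, steps)
```

For these two haplotypes the differing loci are DYS391 and DYS390, one repeat unit each, so they are two single-step mutation events apart.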
AUTOSOMAL DNA MARKERS

Markers from the main part of the genome, which is inherited in a sex-independent manner, allow the characteristics of entire populations to be studied. These are single-locus autosomal DNA markers, comprising two distinct groups: diallelic and multiallelic markers. Diallelic markers are represented by single nucleotide substitutions and insertion–deletion polymorphisms. Multiallelic markers include tandemly organized repeated sequences of mini- and microsatellites.
Insertion–Deletion Polymorphisms of the Chemokine (C-C Motif) Receptor Type 5 (CCR5) Gene

Insertion–deletion polymorphisms in the gene for CCR5 can exemplify DNA diallelic polymorphism. CCR5 is a co-receptor for macrophage-tropic strains of the human immunodeficiency virus HIV-1 and is used by this virus for penetration into cells. The gene is localized in the p21.3 region of chromosome 3. In 1996, a 32 bp deletion was revealed in the segment of the gene coding for the second extracellular loop of the CCR5 protein. This deletion, denoted CCR5Δ32, is likely to prevent the interaction of the receptor with the virus, and individuals who are homozygous for CCR5Δ32 are resistant to HIV-1 infection. The mutant allele is found in European populations and in white Americans at frequencies of 2%–15% (mean 9%). This mutant allele is either rare or absent in populations from sub-Saharan Africa and the Far East [15, 16]. Thus, this polymorphism proved to be strongly ethnic-specific, because the frequency of the marker's allelic variants differs significantly between
human populations. The high frequency of CCR5Δ32 in some Caucasian populations raises the question of whether it is the result of random genetic drift or a consequence of selective pressure, possibly driven by an increased resistance to some infectious agents or by other factors.

The frequency of the CCR5Δ32 allele varied widely in the populations studied, from 3%–8.5% in five Asian populations (Tuvians, Uyghurs, Azerbaijanis, Kazakhs and Uzbeks) to 12%–14% in populations of Tatars, Russians and Byelorussians [17, 18]. A genotype homozygous for the mutant allele was revealed in only one individual, from the Udmurtian population. In the Volga–Urals region, the lowest frequencies of the CCR5Δ32 allele were found in the northeastern and southeastern ethnogeographic groups of Bashkirs (2.17% and 2.50%, respectively). The frequency was also low for the total population of Bashkirs (3.66%). The highest frequency of the allele was observed among Tatars (13.44%) who, like the Bashkirs, are considered to belong to the Turkic branch of the Altaic linguistic family. The mean frequency of the deletion in the Volga–Urals region was 7.02% [19].

The centers of maximal frequency are located in the northern and eastern parts of Europe. The emergence in these regions of centers with a large accumulation of the mutant allele seems unexpected, as these populations had never encountered HIV-1-related acquired immune deficiency syndrome (AIDS) before its recent emergence. Presumably, other infectious agents also make use of the CCR5 receptor for penetration into cells. In this case, selection could have resulted in the accumulation of this mutation at the focal points of infection. Regardless, the presence or absence of this particular deletion is a type of polymorphism that can be used effectively in population analysis.
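To illustrate why homozygotes for the deletion are seldom found in population samples even where the allele is relatively common, the following Python sketch converts the allele frequencies quoted above into expected genotype frequencies. It assumes Hardy–Weinberg proportions, which the text does not test, and the genotype counts passed to `allele_frequency` are hypothetical; both function names are illustrative only.

```python
def hardy_weinberg(q):
    """Expected genotype frequencies for a diallelic locus.

    q is the frequency of the deletion allele; Hardy-Weinberg
    proportions are an assumption of this sketch.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    p = 1.0 - q
    return {"wt/wt": p * p, "wt/del": 2 * p * q, "del/del": q * q}


def allele_frequency(n_wt_wt, n_wt_del, n_del_del):
    """Deletion-allele frequency estimated by direct counting of genotypes."""
    total_alleles = 2 * (n_wt_wt + n_wt_del + n_del_del)
    return (n_wt_del + 2 * n_del_del) / total_alleles


# Allele frequencies quoted in the text for the Volga-Urals region.
frequencies = {"Bashkirs": 0.0366, "Volga-Urals mean": 0.0702, "Tatars": 0.1344}

expected_homozygotes = {
    name: hardy_weinberg(q)["del/del"] for name, q in frequencies.items()
}
for name, value in expected_homozygotes.items():
    print(f"{name}: expected del/del frequency = {value:.4f}")

# Hypothetical genotype counts (not study data): 80 wt/wt, 18 wt/del, 2 del/del.
q_example = allele_frequency(80, 18, 2)
```

Under this assumption, even at the highest frequency quoted (13.44%) fewer than two individuals in a hundred would be expected to be homozygous for the deletion.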
The cartographic simulation of the distribution of the CCR5Δ32 mutation was based on our own data combined with data from the literature, using 77 populations from different regions of the Old World [20]. The frequency of the CCR5 deletion allele was highest in the populations of northeastern Europe and gradually decreased from the Baltic region in all directions. This mutant allele was either rare or absent in populations from sub-Saharan Africa and the Far East. Climatic and geographic data (annual radiation balance, average January temperature, average July temperature, total amount of insolation, altitude and the annual precipitation rate) were obtained from an atlas [21]. Spearman's rank-order correlation coefficients were computed between CCR5Δ32 frequencies and the climatic and geographic variables. Table 1 presents the correlations between the CCR5Δ32 allele frequencies and each climatic or geographic factor addressed. We found a strong positive correlation with latitude (r = 0.72) and a somewhat weaker negative correlation with longitude (r = –0.34). Our data also suggest that the annual radiation balance, total amount of insolation and the average July temperature affect the CCR5Δ32 allele frequency and its expansion throughout the world (r = –0.66, r = –0.66 and r = –0.64, respectively). The average January temperature and altitude have weaker negative effects (r = –0.50 and r = –0.26, respectively). The annual precipitation rate showed no correlation with the CCR5Δ32 allele frequency in the Old World.
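The two kinds of coefficient used in this analysis can be sketched in Python as follows. The sketch computes Spearman's coefficient as the Pearson correlation of ranks and then applies the standard first-order partial correlation formula, r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)), to hold one variable constant. The software and the exact procedure used in the study are not stated, so this is an assumption; holding several parameters constant at once would require a higher-order formula that is not shown here. All values for the five populations are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values for five populations (illustration only, not study data).
latitude = [40.0, 45.0, 50.0, 55.0, 60.0]
july_temp = [26.0, 24.0, 18.0, 20.0, 15.0]
frequency = [0.03, 0.05, 0.04, 0.10, 0.12]

r_freq_lat = spearman(frequency, latitude)
r_freq_temp = spearman(frequency, july_temp)
r_temp_lat = spearman(july_temp, latitude)

# Frequency versus temperature with latitude held constant.
r_partial = partial_correlation(r_freq_temp, r_freq_lat, r_temp_lat)
print(r_freq_lat, r_freq_temp, r_partial)
```

In this toy data set the negative rank correlation between frequency and temperature changes sign once latitude is held constant, which shows why partial coefficients are reported alongside the simple ones when the explanatory variables are themselves correlated.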
Table 1. Coefficients of correlation between CCR5Δ32 allele frequencies and climatic-geographic parameters

Columns: No. = parameter number; Spearman = coefficient of rank-order Spearman correlation; Partial (T) = coefficient of partial correlation depending on temperature parameters; Held = numbers of the parameters held constant; Partial (lat) = coefficient of partial correlation if latitude is held constant.

No.  Parameter                                    Spearman   Partial (T)  Held    Partial (lat)

Temperature parameters
1    Annual radiation balance (kcal/cm²/year)     –0.66***   –0.66        2,3,4   –0.42
2    Average January temperature (°C)             –0.50***   +0.50        1,3,4   +0.25
3    Average July temperature (°C)                –0.64***   –0.22        1,2,4   –0.09
4    Total amount of insolation (kcal/cm²/year)   –0.66***   –0.06        1,2,3   –0.22

Geographical coordinates
5    Longitude                                    –0.34**                         –
6    Latitude                                     0.72***                         –

Common parameters
7    Altitude (m)                                 –0.26*                          –0.34
8    Annual precipitation rate (mm/year)          –0.07                           –0.07

Significance levels: *** - p