ADVANCES IN PROTEIN CHEMISTRY Volume 38
EDITED BY
C. B. ANFINSEN
Department of Biology, The Johns Hopkins University, Baltimore, Maryland
JOHN T. EDSALL
Department of Biochemistry and Molecular Biology, Harvard University, Cambridge, Massachusetts
FREDERIC M. RICHARDS
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut
VOLUME 38
1986
ACADEMIC PRESS, INC. Harcourt Brace Jovanovich, Publishers
Orlando San Diego New York Austin Boston London Sydney Tokyo Toronto
COPYRIGHT © 1986 BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.
ACADEMIC PRESS, INC.
Orlando, Florida 32887
United Kingdom Edition published by
ACADEMIC PRESS INC. (LONDON) LTD.
24-28 Oval Road, London NW1 7DX
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 44-8853
ISBN 0-12-034238-3
PRINTED IN THE UNITED STATES OF AMERICA
86 87 88 89   9 8 7 6 5 4 3 2 1
CONTENTS

Regulatory and Cytoskeletal Proteins of Vertebrate Skeletal Muscle
IWAO OHTSUKI, KOSCAK MARUYAMA, AND SETSURO EBASHI
I. Introductory Remarks
II. Calcium Regulatory Proteins: Troponin and Tropomyosin
III. Connectin (Titin)
References

Mechanistic Aspects of DNA Topoisomerases
ANTHONY MAXWELL AND MARTIN GELLERT
I. Introduction
II. The Reactions of Topoisomerases
III. DNA Binding
IV. DNA Cleavage
V. DNA Reunion
VI. ATP Hydrolysis
VII. Processivity in Topoisomerase Reactions
VIII. Covalent Modification of Topoisomerases
IX. Mechanistic Models
X. Concluding Remarks
References

Molecular Mechanisms of Protein Secretion: The Role of the Signal Sequence
MARTHA S. BRIGGS AND LILA M. GIERASCH
I. Introduction
II. Historical Background
III. The Signal Sequence
IV. Components of the Secretory Apparatus
V. How Does Secretion Occur?
VI. What Are the Roles of the Signal Sequence?
VII. Recapitulation
VIII. A Model for the Initial Interactions of Signal Sequences with the Membrane
IX. Signal Sequences as Membrane-Interacting Sequences
References

Vibrational Spectroscopy and Conformation of Peptides, Polypeptides, and Proteins
SAMUEL KRIMM AND JAGDEESH BANDEKAR
I. Introduction
II. Theoretical Considerations
III. Extended Polypeptide Chain Structures
IV. Helical Polypeptide Chain Structures
V. Reverse Turns
VI. Characteristics of Polypeptide Chain Modes
VII. Vibrational Spectroscopy of Proteins
VIII. Prospects for the Future
References

AUTHOR INDEX
SUBJECT INDEX
REGULATORY AND CYTOSKELETAL PROTEINS OF VERTEBRATE SKELETAL MUSCLE
By IWAO OHTSUKI,* KOSCAK MARUYAMA,† and SETSURO EBASHI‡
*Department of Pharmacology, Faculty of Medicine, Kyushu University, Fukuoka 812, Japan
†Department of Biology, Faculty of Science, Chiba University, Chiba 260, Japan
‡National Institute for Physiological Sciences, Okazaki 444, Japan
I. Introductory Remarks
   A. Regulatory Proteins
   B. Cytoskeletal Proteins
II. Calcium Regulatory Proteins: Troponin and Tropomyosin
   A. Introduction
   B. Troponin I
   C. Troponin C
   D. Troponin T
   E. Tropomyosin
   F. Some Aspects of Calcium-Regulatory Mechanisms
   G. Structural Aspects of Troponin and Tropomyosin
III. Connectin (Titin)
   A. Preparation
   B. Content in Myofibrils
   C. Molecular Size and Shape
   D. Other Physicochemical Properties
   E. Hydrolysis
   F. Interaction with Myosin
   G. Interaction with Actin
   H. Localization
   I. Function
References
I. INTRODUCTORY REMARKS
The effective contractile machinery of vertebrate striated muscle represents an elaborate framework. The motion of myosin and actin filaments is controlled by regulatory proteins, and their position is supported by cytoskeletal proteins. Approximately 65% of the total myofibrillar protein is myosin and actin, the contractile proteins of muscle. There are a number of both regulatory and cytoskeletal proteins, as listed in Table I (for review, see Obinata et al., 1981; Maruyama, 1985a). The aim of this article is to describe the structure and function of the major regulatory proteins, troponin and tropomyosin, and also of the main cytoskeletal protein, connectin (titin). The former are perhaps the
TABLE I
Myofibrillar Structural Proteins of Rabbit Skeletal Muscle^a

Protein^b                Molecular weight (kDa)   Content (wt %)   Localization   Function
Contractile proteins
  Myosin*                520                      43               A band         Contracts with actin
  Actin*                 42                       22               I band         Contracts with myosin
Regulatory proteins
 Major
  Tropomyosin*           33 × 2                   5                I band         Binds to actin and locates troponin
  Troponin               70                       5                I band         Ca regulation
    Troponin C*          18                                                       Ca binding
    Troponin I*          21                                                       Inhibition of actin-myosin interaction
    Troponin T*          31                                                       Binding to tropomyosin
 Minor
  M protein              165
  Myomesin               185
  Creatine kinase*       42
  C protein              135
  F protein              121
  H protein              74
  I protein              50
  α-Actinin
  β-Actinin
  γ-Actinin
  eu-Actinin
  ABP (filamin)
  Paratropomyosin
Cytoskeletal proteins
  Connectin (titin)
  Nebulin
  Vinculin
  Desmin* (skeletin)
  Vimentin*
  Synemin
  Z protein
  Z-nin
≥99% export block for this protein, as measured by the relative amounts of precursor and mature forms of MBP. Not surprisingly, the second mutations that are most effective at relieving this export block are those that change the arginine residue to an uncharged residue, either the wild-type methionine, or glycine or serine. Changing Arg -19 to cysteine restored export to a lesser extent (~50-70% of wild type). This mutation lengthens the hydrophobic region by four amino acid residues (to Lys -23). Duplications in the signal-sequence coding region that result in addition of two to four uncharged residues are only slightly effective at relieving the export block; these double mutants export about 10-40% of the wild-type levels of MBP. Substitution of a more hydrophobic residue (Met or Phe) for a polar residue (Thr -11 or Ser -14, respectively) does not change the length of the core region, but increases its mean hydrophobicity. These mutations suppressed the export block to varying extents. Thus, even a charged residue can sometimes be tolerated in the hydrophobic core, if the remaining residues are sufficiently hydrophobic. From these and other data, the authors conclude that the length of the hydrophobic core may not be as important as its overall hydrophobicity. A mutant of PhoA in which Gln is introduced in place of Leu -14 in the hydrophobic core causes an export defect, which argues that overall hydrophobicity, and not just the presence of a charge, may alter signal sequence function (Michaelis et al., 1983).

Hortin and Boime (1980) studied the effect of hydrophobicity changes on secretion by incorporating a polar analog of leucine, β-DL-hydroxyleucine, into nascent chains of several eukaryotic secretory proteins. Addition of the analog to a cell-free translation, secretion, and processing system caused inhibition of translocation and processing of the proteins. Proteins with several leucine residues in the signal sequence (bovine preprolactin, rat preprolactin, and human placental prelactogen) were affected more strongly than one with few leucines in the signal sequence (the α subunit of human chorionic gonadotropin). In a similar experiment, Walter et al. (1981) found that incorporation of β-hydroxyleucine into preprolactin partially alleviated the translation block that occurs in the presence of SRP (Section IV,C,1). Incorporation of β-hydroxyleucine affects only the overall hydrophobicity of the core region, and not its length. These results indicate that decreasing the hydrophobicity of this domain is sufficient to abolish signal-sequence function.
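The distinction drawn above between the length of the hydrophobic core and its mean hydrophobicity can be made concrete with a short calculation. The sketch below is only an illustration: it assumes the Kyte-Doolittle hydropathy index as the hydrophobicity scale (the text does not say which scale, if any, the cited authors used), it takes the hydrophobic stretch of the E. coli LamB signal sequence (positions -18 to -7 in Fig. 5) as a convenient example, and the two substitutions are hypothetical, not mutants described in this chapter.

```python
# Kyte-Doolittle hydropathy index: an ASSUMED scale, chosen for illustration only.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment: str) -> float:
    """Average hydropathy of a peptide segment (one-letter codes)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment)


def substitute(segment: str, index: int, new_residue: str) -> str:
    """Return a copy of the segment with one residue replaced (0-based index)."""
    return segment[:index] + new_residue + segment[index + 1:]


# Hydrophobic stretch of the LamB signal sequence, positions -18 to -7 (Fig. 5).
LAMB_CORE = "LPLAVAVAAGVM"

# HYPOTHETICAL substitutions: same length, different mean hydrophobicity.
charged_variant = substitute(LAMB_CORE, 3, "R")      # Ala -> Arg
hydrophobic_variant = substitute(LAMB_CORE, 9, "F")  # Gly -> Phe

for name, seq in [("wild type", LAMB_CORE),
                  ("Ala->Arg (hypothetical)", charged_variant),
                  ("Gly->Phe (hypothetical)", hydrophobic_variant)]:
    print(f"{name:26s} length={len(seq):2d} mean hydropathy={mean_hydropathy(seq):.3f}")
```

All three segments have the same length, so any difference in the printed values reflects composition alone, which is the quantity the suppressor mutations discussed above appear to change.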
It has been suggested that the hydrophobic region may be involved in specific interactions, for example, with SRP (Emr et al., 1980; Silhavy et al., 1983). Clearly the mutations described above, as well as substitutions of β-hydroxyleucine for leucine, could disrupt such interactions and cause an export defect. However, the wide variation of signal sequences, and their interchangeability, argue against specific recognition of this region on the basis of sequence.
G. The Signal Peptidase Cleavage Site
The last five (in eukaryotes) or six (in prokaryotes) residues of the signal sequence are more polar than those in the hydrophobic region, and define the cleavage site for signal peptidase (von Heijne, 1984b). von Heijne (1985) called this the “c region.” There is markedly less variability in the length and sequence of this domain than in the rest of the signal sequence. Although the c region ranges in length from 10 residues in S. aureus protein A, to zero in ovine α-S2 casein (von Heijne, 1985) (in this case, part of the hydrophobic region forms the signal peptidase cleavage site), almost all of the known signal sequences have four to seven residues in this region. The effect of varying the length of this domain is not known. The cleavage site is defined primarily by the last three residues of the c region. von Heijne (1983) and Perlman and Halvorson (1983) have postulated the “-3,-1” rule, which states that the residues that occupy positions -1 and -3 of the signal sequence must have small neutral side chains. Alanine is most common by far at these sites, but cysteine, serine, threonine, and glycine are found occasionally. Position -2 is more variable, but frequently has a large aromatic or hydrophobic side chain. The remaining residues of the c region are also variable, but are usually polar, and have been predicted to favor formation of a β turn (Perlman and Halvorson, 1983) (see Section III,H). A change in the amino acid at the cleavage site (the -1 residue) can result in a lack of processing or a change in the processing site. For example, substitution of valine for alanine at the cleavage site of yeast invertase inhibits and delays processing (Schauer et al., 1985). While none of the protein is cleaved at the proper site, a small amount of cleavage occurs at an alternate site, between Ser +1 and Met +2 of the mature protein.
Similarly, incorporation of β-hydroxynorvaline in place of threonine at the processing site of rat preprolactin causes a change in cleavage site and slowed processing (Hortin and Boime, 1981a,b).
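The “-3,-1” rule stated above lends itself to a simple check. The following is a minimal sketch, not a cleavage-site predictor: it tests only whether positions -1 and -3 carry one of the small neutral residues named in the text (Ala, Cys, Ser, Thr, Gly). The function names are hypothetical, the example sequence is the wild-type E. coli LamB signal sequence from Fig. 5, and the valine variant is a hypothetical substitution in LamB made by analogy with the yeast invertase mutant, not a reported LamB mutant.

```python
# Residues with small neutral side chains, as listed in the text
# (alanine, glycine, serine, threonine, cysteine).
SMALL_NEUTRAL = set("AGSTC")


def residue_at(signal: str, position: int) -> str:
    """Residue at a signal-sequence position; -1 is the last residue before cleavage."""
    if position >= 0 or -position > len(signal):
        raise ValueError("position must be negative and within the signal sequence")
    return signal[position]


def obeys_minus3_minus1_rule(signal: str) -> bool:
    """True if positions -1 and -3 both carry small neutral residues."""
    return (residue_at(signal, -1) in SMALL_NEUTRAL
            and residue_at(signal, -3) in SMALL_NEUTRAL)


# Wild-type LamB signal sequence, positions -25 to -1 (one-letter codes, from Fig. 5).
LAMB_SIGNAL = "MMITLRKLPLAVAVAAGVMSAQAMA"

# HYPOTHETICAL variant: valine in place of alanine at position -1.
LAMB_MINUS1_VAL = LAMB_SIGNAL[:-1] + "V"

print(residue_at(LAMB_SIGNAL, -3), residue_at(LAMB_SIGNAL, -2), residue_at(LAMB_SIGNAL, -1))
print(obeys_minus3_minus1_rule(LAMB_SIGNAL))
print(obeys_minus3_minus1_rule(LAMB_MINUS1_VAL))
```

The rule is a necessary condition as stated in the text; passing it does not by itself establish that a site is cleaved, since the surrounding c region and its predicted conformation also matter.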
H. Predictions of Signal Sequence Conformation
The lack of sequence homology among signal peptides, combined with their interchangeability in vivo, has prompted searches for conformational similarities. The method of Chou and Fasman (1974a,b) for predicting conformation from primary structure has been applied (Austen, 1979). The Chou-Fasman method is based on the frequencies of occurrence of the amino acids in various types of secondary structure in water-soluble, globular proteins. Application of these data to signal sequences may be questioned, as signal sequences are probably found in a hydrophobic environment, such as the membrane or an apolar pocket in a protein. In general, the results indicate that signal sequences have a high probability of adopting α helix and β sheet in the hydrophobic region. This result is not surprising, since the residues that favor the interior of globular proteins are generally hydrophobic, and also tend to occur in α-helical and β-sheet conformations. Often both α and β structures are predicted, with one or the other being only slightly more probable. Most signal sequences are predicted to adopt a β turn in the c region, near the signal peptidase cleavage site, while the charged amino-terminal domain shows no consistent conformational preference. A conformational energy calculation carried out on the murine κ light chain signal sequence revealed a favored α-helical structure throughout the hydrophobic region (Pincus and Klausner, 1982).

The importance of signal-sequence conformation for proper function has been tested by determining the effect on activity of sequence changes that are predicted to change conformational tendencies; these are discussed below. Only in the case of the LamB mutants (see below and Section III,F) have correlations been made between the actual and predicted conformational preferences of the altered and native sequences.

Brown et al. (1984) introduced insertions of three or four amino acids into the yeast invertase signal sequence near its amino terminus. The insertions were predicted to stabilize an α helix, favor a β turn, or to destabilize both α-helix and β-sheet formation. None of these alterations prevented proper secretion of the protein. Thus, the amino-terminal domain of the signal sequence is relatively unconstrained as to conformation.

The c region is more sensitive to conformational alterations. The E. coli wild-type lipoprotein signal sequence is predicted to form a β turn at positions -7 to -4. Alanine was substituted for serine (position -6), which occurs frequently in β turns, or threonine (position -5), which is also found in β turns, but less often than serine, or both (Vlasuk et al., 1984). Substitution of Ala for Thr -5 alone did not cause a predicted loss of β-turn conformation, while replacement of Ser -6 or both Ser -6 and Thr -5 yielded a structure predicted to lack a β turn. The phenotypes of the mutants correlate with the predicted presence or absence of the turn. Those predicted to lack a β turn accumulate membrane-bound precursor lipoprotein, which is slowly processed to mature lipoprotein. Thus, the absence of a region favoring β turn in the c domain can inhibit removal of the signal sequence.

Alterations predicted to change the conformation of the hydrophobic core can have profound effects on signal sequence function. The wild-type E. coli MBP is required for use of maltose as a nutrient. Its signal sequence is predicted (Chou and Fasman, 1974a,b) to be an α helix or β structure over much of its length. A mutant that has a Pro, which disfavors both α helix and β sheet, substituted for a Leu, which favors both conformations, at position -17 is defective for MBP export (Bedouelle et al., 1980). This mutation decreases the number of hydrophobic residues capable of supporting an α helix from 18 to 10. The length of such a sequence (10 residues × 1.5 Å/residue in an α helix) is 15 Å, which is below the proposed minimum hydrophobic axis length required for signal sequence function (18 Å, Bedouelle and Hofnung, 1981b). In this strain, precursor MBP accumulates in the cytoplasm, and growth on maltose minimal medium is very slow. This mutation is as effective in preventing MBP export as other mutations that place a charged residue in the hydrophobic core of the signal sequence. A similar mutation has been found in the signal sequence of E. coli ribose-binding protein (Iida et al., 1985). Substitution of Pro for Leu -17 results in total inhibition of export. A pseudorevertant at position -15 has a Phe in place of a Ser. Restored export function could result from altered conformation of the signal sequence or from increased hydrophobicity.
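The helix-length argument made above for the MBP Pro -17 mutant is simple arithmetic, and the sketch below just reproduces it. The rise of 1.5 Å per residue and the 18 Å minimum are the values quoted in the text; the function and constant names are hypothetical, and the "minimum residue count" is merely what those two numbers imply, not a figure stated by the cited authors.

```python
import math

RISE_PER_RESIDUE = 1.5       # Å per residue along an α-helix axis (value quoted in the text)
MIN_HYDROPHOBIC_AXIS = 18.0  # Å, proposed minimum (Bedouelle and Hofnung, 1981b)


def helix_axis_length(n_residues: int, rise: float = RISE_PER_RESIDUE) -> float:
    """Axial length in Å of an α helix of n_residues."""
    return n_residues * rise


def long_enough(n_residues: int) -> bool:
    """True if a helix of n_residues meets the proposed minimum axis length."""
    return helix_axis_length(n_residues) >= MIN_HYDROPHOBIC_AXIS


# Smallest number of helical residues that reaches the proposed minimum
# (derived here from the two quoted numbers).
min_residues = math.ceil(MIN_HYDROPHOBIC_AXIS / RISE_PER_RESIDUE)

for label, n in [("wild-type MBP core", 18), ("Pro -17 mutant core", 10)]:
    print(f"{label}: {n} residues -> {helix_axis_length(n):.1f} Å, "
          f"meets minimum: {long_enough(n)}")
print("residues needed to reach the proposed minimum:", min_residues)
```

On these numbers the wild-type stretch spans 27 Å and the mutant stretch 15 Å, so only the former clears the 18 Å threshold.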
The signal sequence of E. coli LamB is shown in Fig. 5. Its hydrophobic region is predicted to adopt a largely α-helical conformation, despite the presence of helix-breaking proline and glycine residues (Bedouelle and Hofnung, 1981a; Emr and Silhavy, 1983). Because the proline and glycine are separated by seven residues, the predicted helical potential (Chou and Fasman, 1974a,b) of the peptide is not severely reduced at any one point. A mutant strain defective for LamB export was found to lack four amino acid residues in the hydrophobic core of the signal sequence (Emr et al., 1980). This deletion brings the proline and glycine to within four residues of each other. The conformation of the hydrophobic region is now expected to be random (Bedouelle and Hofnung, 1981b; Emr and Silhavy, 1983). Bedouelle and Hofnung (1981b) predicted that a second mutation changing the proline to leucine,
Wild type (positions -25 to -1; the slash marks the cleavage site before residue +1)
Met Met Ile Thr Leu Arg Lys Leu Pro Leu Ala Val Ala Val Ala Ala Gly Val Met Ser Ala Gln Ala Met Ala / Val

Deletion mutant
Met Met Ile Thr Leu Arg Lys Leu Pro Val Ala Ala Gly Val Met Ser Ala Gln Ala Met Ala / Val

Gly→Cys pseudorevertant
Met Met Ile Thr Leu Arg Lys Leu Pro Val Ala Ala Cys Val Met Ser Ala Gln Ala Met Ala / Val

Pro→Leu pseudorevertant
Met Met Ile Thr Leu Arg Lys Leu Leu Val Ala Ala Gly Val Met Ser Ala Gln Ala Met Ala / Val

FIG. 5. Wild-type and mutant E. coli λ-receptor protein signal sequences. From Emr and Silhavy (1983).
threonine, alanine, or serine, or changing the glycine to cysteine or serine, would restore export competence, as the signal sequence, though shortened, would be able to adopt an α-helical conformation. Emr and Silhavy (1983) subsequently isolated two pseudorevertants from the export-defective deletion mutant. In one, the proline residue had been replaced by leucine, and in the other, the glycine residue had been replaced by cysteine. Both strains were able to export the λ-receptor protein at 75-80% of the wild-type efficiency. The implication is clear that adoption of a regular secondary structure, in this case α helix, is necessary to signal sequence function. Hence, bacterial genetic results provide many suggestions of structure-function correlations in signal sequences. These results serve as a point of departure for studies that determine the physical properties of isolated signal sequences as a means of elucidating their mode of action (see Section VI).
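The spacing argument for the LamB mutants (proline and glycine separated by seven residues in the wild type, and brought to within four residues of each other by the deletion) can be verified directly from the sequences in Fig. 5. The sketch below is only a bookkeeping aid with hypothetical function names. The identity of the four deleted residues (Leu Ala Val Ala immediately after the proline) is inferred here by comparing the wild-type and deletion-mutant sequences in the figure, and should be read as an assumption.

```python
from typing import Optional

# Wild-type LamB signal sequence, positions -25 to -1 (one-letter codes, from Fig. 5).
WILD_TYPE = "MMITLRKLPLAVAVAAGVMSAQAMA"


def delete(seq: str, start: int, length: int) -> str:
    """Remove `length` residues beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


def residues_between(seq: str, first: str, second: str) -> Optional[int]:
    """Number of residues lying between the first `first` and the next `second`.

    Returns None if either residue is absent (e.g. after a substitution).
    """
    i = seq.find(first)
    if i < 0:
        return None
    j = seq.find(second, i + 1)
    if j < 0:
        return None
    return j - i - 1


# ASSUMPTION: the deletion removes Leu Ala Val Ala directly after Pro (inferred from Fig. 5).
DELETION_MUTANT = delete(WILD_TYPE, WILD_TYPE.index("P") + 1, 4)

# Pseudorevertants isolated from the deletion mutant (Emr and Silhavy, 1983).
PRO_TO_LEU = DELETION_MUTANT.replace("P", "L", 1)
GLY_TO_CYS = DELETION_MUTANT.replace("G", "C", 1)

print("wild type:       ", residues_between(WILD_TYPE, "P", "G"), "residues between Pro and Gly")
print("deletion mutant: ", residues_between(DELETION_MUTANT, "P", "G"), "residues between Pro and Gly")
print("Pro->Leu:        ", residues_between(PRO_TO_LEU, "P", "G"))
print("Gly->Cys:        ", residues_between(GLY_TO_CYS, "P", "G"))
```

In the deletion mutant three residues lie between the proline and the glycine, so the glycine sits four positions after the proline; in each pseudorevertant one of the two helix-breaking residues is gone, which is why the function returns None for them.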
IV. COMPONENTS OF THE SECRETORY APPARATUS
Understanding of signal sequence action requires knowledge of the species with which it might interact in vivo. Secretion of a protein requires, at the very least, a membrane, a signal sequence, and a signal peptidase. Various other components of the secretory apparatus have been isolated or implicated, such as the SRP, the SRP receptor or docking protein, ribophorins, signal-peptide receptor, and signal-peptide peptidase. This section summarizes biochemical and genetic knowledge of the components of the secretory system.
A. The Membrane
The role of the membrane in protein secretion is disputed. In some models it is regarded merely as an obstacle that plays no direct part in translocation (Blobel and Dobberstein, 1975a,b). This view implies that the secreted protein crosses the membrane through a proteinaceous pore or export apparatus, and has no contact with the membrane lipids (Gilmore and Blobel, 1985). Others see the membrane as an important participant in some parts of the secretion process. Various authors have suggested that: (1) The signal sequence initiates translocation by partitioning into the hydrophobic region of the membrane bilayer (Engelman and Steitz, 1981). (2) The membrane potential provides energy required for protein translocation (Rhoads et al., 1984; Wickner, 1980; Oxender et al., 1984; Bakker and Randall, 1984). (3) The membrane phospholipids “pull” the signal sequence and the secreted protein through the membrane by formation of small nonbilayer regions within the membrane (Rapoport, 1985; Nesmayanova, 1982). The evidence for and against these ideas is considered in this section. Note that all of the research described below has been performed on systems derived from E. coli. Although it is tempting to speculate that secretion processes in prokaryotes and eukaryotes are very similar, this is not proven, and extension of the conclusions of these experiments to eukaryotic secretion is risky at best.

Lipid Fluidity
A major part of the evidence for the participation of the membrane in protein translocation comes from studies of the effects of altered lipid fluidity on secretion. Treatment of E. coli cells with phenethyl alcohol (PEA), which greatly increases membrane fluidity, caused a decrease in secretion of the outer membrane and periplasmic proteins (Pages and Lazdunski, 1981). The amount of protein associated with the inner membrane increased; this increase, and the decrease in secretion, were reversible on removal of PEA, suggesting that secretory proteins accumulate at the inner membrane in the presence of the lipid perturbant, but can be secreted and processed when the proper lipid fluidity is restored. [PEA also dissipates proton motive force, which is thought to be required for protein secretion (Daniels et al., 1981); see below.]

Decreasing the membrane fluidity also affects secretion. Pages et al. (1978) varied fluidity by changing the temperature at which E. coli spheroplasts were grown. They found that the amount of secreted alkaline phosphatase (as a fraction of total synthesized protein) dropped dramatically from 29 to 13°C, while the amounts found in the cytoplasm and associated with the membrane increased. The “transition temperature,” as determined from Arrhenius plots, occurred at about 22°C. This result was obtained with both normal cells and a fatty acid auxotroph grown on oleic acid. Using data on alkaline phosphatase secretion from a fatty acid auxotroph grown on elaidic acid, which has a higher phase transition temperature than oleic acid, a “transition temperature” of 33°C was obtained. Using similar methods, DiRienzo and Inouye (1979) also found that reduced membrane fluidity inhibited localization of periplasmic and membrane proteins in E. coli. Membrane perturbants and temperature changes also affect translocation in an in vitro system (Rhoads et al., 1984; Chen et al., 1985).

Thus, protein secretion in E. coli is dramatically affected by the physical state of the membrane lipids. Both increased and decreased fluidity inhibit secretion. It is possible that this effect arises because the signal sequence and/or the secreted protein interact with the membrane lipids, and these interactions are perturbed when the lipid fluidity is changed. However, changes in lipid fluidity also affect other membrane functions such as active transport (DiRienzo and Inouye, 1979). Thus the effect of membrane fluidity on protein secretion may be due to altered activity of a membrane-bound part of the secretory apparatus, and may not be an indication of signal sequence-membrane interaction.

B. Signal Peptidase and Signal-Peptide Peptidase
1. Signal Peptidase
Signal peptidase is the enzyme that proteolytically removes the signal peptide from the nascent secretory protein. Signal peptidases have been found in a variety of organisms, including E. coli (Zwizinski and Wickner, 1980), hen (Lively and Walsh, 1983), pig (Fujimoto et al., 1984), dog (Zimmerman et al., 1980), and D. melanogaster (Brennan et al., 1980). E. coli has at least two different signal peptidases, one of which is specific for the lipoprotein signal peptide [lipoprotein signal peptidase (LSP or peptidase II)] (Tokunaga et al., 1982, 1984), and one of which appears to cleave all of the other signal peptides (peptidase I). Only the E. coli signal peptidase I has been purified to homogeneity (Zwizinski and Wickner, 1980; Wolfe et al., 1983b). The signal peptidases found so far all seem to be integral membrane proteins (Brennan et al., 1980; Lively and Walsh, 1983; Fujimoto et al., 1984; Wolfe et al., 1983b). The amino terminus of the E. coli enzyme appears to be an uncleaved signal peptide (Wolfe and Wickner, 1984).

In E. coli the signal peptidase has been reported to reside in approximately equal amounts in both the inner and outer membranes (Mandel and Wickner, 1979). This result is surprising, as there is no other example of a protein that is localized to both of the E. coli membranes (Silhavy et al., 1983). In light of the recognized difficulty of clean separation of the two membranes (Tommassen et al., 1985), these conclusions should be regarded as questionable. Signal peptidase activity in dog pancreas appears to be limited to the RER membrane (Jackson and Blobel, 1977).

Signal peptidase can be extracted from membranes using various detergents (Jackson, 1983). The detergent-solubilized bacterial enzyme is active (Wolfe et al., 1983b), while the canine enzyme requires phospholipid for activity (Jackson and White, 1981). The molecular weight of signal peptidase I from E. coli is about 37,000 (Wolfe et al., 1983b). The canine signal peptidase has a Stokes radius of 55 Å; if the protein is spherical this size corresponds to a molecular weight of about 300,000 (Jackson and White, 1981). On the basis of this large molecular weight it has been suggested that the enzyme is part of a complex of proteins involved in translocation (Jackson and White, 1981).

Neither the bacterial nor any of the eukaryotic signal peptidases is inhibited by the usual protease inhibitors such as tosyl-L-phenylalanine chloromethyl ketone, tosyl-L-lysine chloromethyl ketone, o-phenanthroline, and phenylmethylsulfonyl fluoride (Wolfe et al., 1983b; Stern and Jackson, 1985). Mammalian signal peptidases are also unaffected by leupeptin, pepstatin, antipain, diisopropyl fluorophosphate, and bestatin (Fujimoto et al., 1984; Stern and Jackson, 1985).

Signal peptidases appear to be endoproteases. Treatment of bovine preproteins with an extract containing the enzyme in vitro yielded fragments with molecular weights corresponding to those of the mature protein and of the signal sequence (Stern and Jackson, 1985). Signal peptides from E. coli proteins accumulate in the cell envelope under conditions that inhibit the action of signal-peptide peptidase (see below) (Hussain et al., 1982). The fragments are also seen in vitro in the absence of signal-peptide peptidase (Silver et al., 1981). Thus, the signal peptidase cleaves only at the cleavage site between the mature protein and the signal sequence.

The signal peptidase cleavage site has been described above. The interactions of the signal sequence and/or mature secretory proteins with the signal peptidase have not been studied extensively. The variability of the sequences of the signal peptidase cleavage sites and the predicted tendency of the c region of the signal sequence to fold into a β turn make it likely that the signal peptidase recognizes a secondary structure rather than a particular amino acid sequence. von Heijne (1983) has proposed a signal sequence-signal peptidase complex in which the small neutral side chains of residues -1 and -3 fit into a pocket of the protease, while Perlman and Halvorson (1983) have suggested that the predicted β turn is important for proper access of signal peptidase to the cleavage site.

Cleavage appears to take place on the lumenal side of the ER membrane, or on the outside of the E. coli inner membrane. Wolfe et al. (1983a) presented evidence that the E. coli signal peptidase is oriented such that the bulk of the protein is at the outer surface of the cytoplasmic membrane and possibly also the outer membrane. This orientation is consistent with the peptidase’s site of action. Recent data suggest that blocking cleavage of precursors causes their accumulation in the E. coli inner membrane (Dalbey and Wickner, 1985). This result implies a catalytic role for the peptidase in release of exported proteins from the membrane.
2. Signal-Peptide Peptidase
Signal-peptide peptidase activity has been found in E. coli (Zwizinski and Wickner, 1980; Hussain et al., 1982; Silver et al., 1981). The protein resides in the inner membrane, but is not identical to the signal peptidase or the lipoprotein signal peptidase. This enzyme has been purified, and has been identified as protease IV of the E. coli cell envelope (Ichihara et al., 1984). This signal-peptide peptidase is inhibited by a wide range of protease inhibitors, including antipain, leupeptin, chymostatin, and elastatinal (Hussain et al., 1982). It degrades signal peptide only after it is cleaved from the mature protein, and appears not to attack membrane proteins (Ichihara et al., 1984). Ichihara et al. (1984) have speculated that the signal-peptide peptidase is a carboxypeptidase that initiates digestion after cleavage of the signal peptide. Novak et al. (1986) have reported two cytoplasmic proteases with signal peptide hydrolyzing activity. They suggest that the membrane-resident and cytoplasmic proteases may function in combination.

C. Proteins Implicated in Eukaryotic Protein Secretion
The signal hypothesis postulates the existence of several proteins necessary for secretion. These include the components of the SRP, which is proposed to bind to the signal sequence and block further translation of the mRNA coding for the mature protein; SRP receptor, or docking protein, which relieves the translation block imposed by the SRP; ribophorins, which bind the ribosome to the ER membrane; signal peptidase and signal-peptide peptidase, discussed above; and other proteins, which form a pore or transport apparatus in the membrane. Some of these proteins, the SRP, SRP receptor, signal peptidase, and ribophorins, have been isolated from eukaryotic cell extracts and characterized.
1. The Signal-Recognition Particle
The signal-recognition particle (SRP) is an 11 S ribonucleoprotein composed of a 7 S RNA molecule and six nonidentical polypeptide chains (Walter and Blobel, 1983). It is viewed as functioning to couple the translation of secretory proteins to their translocation through the membrane (Walter and Blobel, 1981b). SRP is isolated from the microsoma1 membrane by a high-salt wash (Walter and Blobel, 1981a). It can be disassembled into its individual components, which can then be reconstituted into an active particle (Walter and Blobel, 1983). The components of the SRP have been characterized. The sequence of the 7 S RNA is known, and a predicted secondary structure has been computed (Gundelfinger et al., 1984).The regions of the RNA to which the proteins bind have been defined. Only five of the six proteins bind the RNA; these are extremely basic, asjudged by isoelectric focusing and their tight binding to acidic chromatographic materials. The sixth protein binds to two of the other proteins, and not to the RNA (Walter and Blobel, 1983). Electron microscopy indicates that the SRP is an elongated rod about 24 nm long and 5-6 nm in diameter (Andrews et al., 1985). Since the distance between the nascent chain exit site from the ribosome and the peptidyl-transfer center is about 16 nm, it is possible that the SRP exerts its effect on protein translation by binding to both the signal sequence (near the exit site) and a component of the peptidyltransfer site (Andrews et al., 1985). The components of the SRP appear to have been conserved among eukaryotes. Chimeric SRPs reconstituted from mammalian SRP proteins and 7 S RNA from either X. laevis or D . melunogaster functioned both to block synthesis of secretory proteins temporarily and to cause their translocation through microsomal membranes (Walter and Blobel, 1983). A chimeric SRP composed of 6 S RNA (described in Section IV,D) from E. coli and mammalian proteins was inactive (Walter and Blobel, 1983). 
However, mammalian SRP can recognize both prokaryotic and eukaryotic signal sequences. Secretion of E. coli β-lactamase by a canine system requires SRP (Müller et al., 1982). SRP recognizes the prokaryotic signal sequence and binds to it, resulting in a translation arrest, and assists in the protein’s cotranslational translocation into dog pancreas microsomes. The SRP recognizes polysomes synthesizing secretory proteins (Walter and Blobel, 1981a). Recent evidence from synthesis of preprolactin with 6-azidobenzoyllysyl-tRNA as a photoaffinity group supports a direct interaction between the SRP 54 kDa polypeptide and the signal sequence (Kurzchalia et al., 1986). Furthermore, construction of a secreted protein
MARTHA S . BRIGGS AND LILA M. GIERASCH
lacking its signal sequence led to loss of the SRP-imposed translation block (Weidmann et al., 1986a). In a cell-free translation system from wheat germ, SRP binding imposes a translation block, which is relieved when SRP, the ribosome, and/or the signal sequence interact with the SRP receptor at the ER membrane (Walter and Blobel, 1981b). The ability of the SRP to cause an elongation arrest has been localized to two of its six proteins (Siegel and Walter, 1985). In the presence of reconstituted SRP lacking these proteins [(-9/14) SRP], no translation arrest occurs. Secretory proteins are still translocated, although at a somewhat lower efficiency than usual. In synchronized translation experiments⁴ in the presence of (-9/14) SRP, the amount of protein translocated depends on the time of addition of microsomes. No translocation occurs if microsomes are added more than 5 minutes after translation has begun. This amount of time corresponds to synthesis of a polypeptide of maximum length 150 residues. In contrast, in translation systems containing whole SRP, 100% translocation occurs regardless of the time of microsome addition. These results demonstrate that the translation arrest imposed by SRP does in fact couple protein synthesis to export. Meyer et al. (1982) have shown that reticulocyte lysates contain an SRP-like activity that causes a synthesis arrest on addition to a wheat germ translation system. This activity does not arrest translation in the reticulocyte lysate, however. In addition, no translation arrest occurs when canine SRP is added to cell-free translation systems derived from HeLa cells or rabbit reticulocytes (Meyer, 1985). Meyer concludes that the translation arrest is peculiar to the wheat germ translation system. In these experiments, however, the translation system, SRP, and export apparatus have been derived from two or more organisms.
Although some parts of the secretion apparatus, such as signal sequences and parts of the SRP, appear to have been conserved through evolution, others, such as the SRP receptor or some ribosomal proteins, may not have been. Thus, the lack of translation arrest in a given experiment may be due to a lack of interaction between molecules that come from different species. Conversely, the translation arrest observed in the wheat germ translation system may reflect a more tolerant requirement for interaction of some component critical for secretion. Clearly, the definitive
⁴ A synchronized translation system is one in which all of the ribosomes are translating the same part of a given mRNA at the same time. Synchronization is achieved by allowing protein synthesis to proceed for a short period of time (e.g., 30 seconds) and then adding an inhibitor of further initiation. Thus, chains that are already being synthesized are completed, but no new chains are started.
MOLECULAR MECHANISMS OF PROTEIN SECRETION
experiment would involve using a translation system, SRP, and export assay from the same organism. Another possible explanation for the lack of observed export block is the difference in rate of translation in vivo and in vitro. The rate of chain elongation in vivo in eukaryotes is about 180 residues/minute, while in vitro translation proceeds at about 30 residues/minute. If an SRP-nascent chain-ribosome complex has a half-life of, e.g., 1 second, it would cause a significant pause in synthesis in vitro, but would probably not be noticed in vivo. Such a short-lived complex may be sufficient to couple translation to translocation in vivo, but not in vitro, as the time required for the ribosome to diffuse to the membrane will depend on how far it has to go. Inside the cell, the ribosome will have a much smaller distance to travel than in an in vitro translocation mixture.
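The elongation rates quoted in this section can be related to chain length with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the two rates are the approximate figures given in the text, not measured constants. It shows that a 5-minute window at the in vitro rate corresponds to a chain of about 150 residues, and how long the same chain would take at the in vivo rate.

```python
# Illustrative arithmetic only. The rates are the approximate values quoted
# in the text; the function names are hypothetical helpers for this sketch.
IN_VIVO_RATE = 180.0   # residues per minute (eukaryotic cell, approximate)
IN_VITRO_RATE = 30.0   # residues per minute (cell-free system, approximate)


def residues_made(rate_per_min, minutes):
    """Chain length reached after elongating for `minutes` at a constant rate."""
    return rate_per_min * minutes


def minutes_to_make(residues, rate_per_min):
    """Time needed to elongate a chain of `residues` at a constant rate."""
    return residues / rate_per_min


# Chain length reached in the 5-minute window of the synchronized experiments.
window_length = residues_made(IN_VITRO_RATE, 5.0)

# Time the same chain would need at the in vivo rate, in seconds.
in_vivo_seconds = minutes_to_make(window_length, IN_VIVO_RATE) * 60.0

print(f"in vitro, 5 min: {window_length:.0f} residues")
print(f"same chain in vivo: {in_vivo_seconds:.0f} s")
```

Under these assumed rates the in vivo system has a sixfold shorter time window for any event that must occur before a given chain length is reached.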
2. SRP Receptor
The SRP receptor or docking protein (Walter et al., 1981; Walter and Blobel, 1981b; Meyer et al., 1982) is an integral membrane protein of the RER (Hortsch and Meyer, 1985) with a mass of 72,000 Da (Gilmore et al., 1982; Hortsch et al., 1985). One of its functions appears to be to relieve the SRP-imposed translation arrest by binding tightly to the SRP (Walter and Blobel, 1981b). It is required for secretion even in the absence of a translation block, however, so it must have at least one additional function (Siegel and Walter, 1985). Siegel and Walter (1985) have suggested that the SRP receptor’s primary function is to direct the SRP-bound nascent polypeptides to the ER membrane where the ribosome can be bound to the membrane and translocation can occur. The SRP receptor can be cleaved into two or three domains by proteolysis (Hortsch et al., 1985). Mild treatment with elastase generates a soluble fragment of 59,000 Da and a membrane-bound domain of 14,000 Da. The 59,000-Da fragment may be capable of restoring translocation activity to microsomes depleted of SRP receptor and can release an SRP-mediated translation arrest. This result conflicts with those of Lauffer et al. (1985) on what is allegedly the same fragment. Treatment with both trypsin and elastase results in three fragments: soluble pieces of 46,000 and 13,000 Da, and a membrane anchor of 14,000 Da. The 46,000-Da fragment was not able to reconstitute the activity of microsomes lacking the SRP receptor, nor did it bind SRP. The authors concluded that the soluble 13,000-Da fragment is required for both membrane association and SRP binding by the SRP receptor. The complete primary structure of the SRP receptor has been determined (Lauffer et al., 1985). It appears that the protein has an uncleaved signal sequence near its amino terminus. Its membrane-bound region is
in the amino-terminal part of the protein, which has two hydrophobic stretches of amino acids that may insert into the membrane as a double “helical hairpin” (Lauffer et al., 1985). The remainder of the protein is quite hydrophilic. Three regions in particular have a large proportion of charged amino acids; basic residues are about twice as numerous as acidic residues. These regions resemble nucleic acid binding proteins, and may be responsible for binding to the RNA molecule of the SRP. It is unlikely that the SRP receptor binds the signal sequence. Association between polysomes and the SRP receptor appears to require the SRP as an intermediary. The extreme basicity of the SRP receptor makes it likely that it binds a highly acidic entity (Lauffer et al., 1985); no part of the signal sequence is acidic. Furthermore, Gilmore and Blobel (1983) have presented evidence that the affinity of the SRP receptor for the ribosome and the nascent polypeptide is low, both in the presence and absence of SRP.
3. Ribophorins
The low affinity of the SRP receptor for the ribosome, and the substoichiometric ratio of SRP and SRP receptor to membrane-bound polysomes, suggest that interaction of the SRP and SRP receptor with the ribosome is transient (Gilmore and Blobel, 1983). Thus, the observed binding of the ribosome to the membrane must occur by some other means. Two integral membrane glycoproteins, ribophorins I and II, have been identified (Kreibich et al., 1978, 1983). They appear to be required for translocation (Amar-Costesec et al., 1984), and may be responsible for binding of the ribosome to the membrane. Ribophorins I and II are immunologically distinct integral membrane proteins found in the RER membrane (Amar-Costesec et al., 1984; Marcantonio et al., 1982). The proteins were first isolated from rat liver, but very similar (in location, immunological behavior, and molecular weight) proteins have been found in RER membranes from other rat tissues and from other species, such as rabbit and dog (Marcantonio et al., 1982). Rat liver ribophorins I and II have molecular masses of 65,000 and 63,000 Da, respectively (Kreibich et al., 1978). Binding of ribosomes by microsomal subfractions correlates with the presence of these proteins, and they can be chemically cross-linked to ribosomes (Kreibich et al., 1983; Amar-Costesec et al., 1984). The molar ratio of the ribophorins to each other and to bound ribosomes is about one, supporting a model in which one of each of the ribophorins is required to bind one ribosome to the membrane (Marcantonio et al., 1984). Microsomal subfractions containing ribophorins can be stripped of their native ribosomes. These stripped microsomes can then bind ribosomes that are not carrying a
nascent secretory polypeptide (Amar-Costesec et al., 1984). This result suggests that the signal sequence does not play a part in ribophorin-ribosome binding.
4. Other Proteins
The signal hypothesis postulates that there are, in addition to the SRP, SRP receptor, ribophorins, and signal peptidase, other components of the secretory apparatus that are responsible for the transport of the nascent polypeptide across the membrane. To date, a protein that performs this function has not been isolated, although there is evidence for a proteinaceous site in the RER membrane that specifically binds signal peptides in the absence of ribosomes and at high ionic strength, which implies the absence of SRP (Bendzko et al., 1982; Prehn et al., 1980, 1981; Robinson et al., 1985). The activity resides at the cytoplasmic side of the ER, as signal-peptide binding is abolished by treatment with added protease (Prehn et al., 1980). Cross-linking experiments tentatively identify the binding activity with a membrane-bound protein of approximately 45,000 Da (Robinson et al., 1985). This “signal-peptide receptor” is saturable, with a Kd of 1 X 10-' M. Its function is unknown (Prehn et al., 1980; Robinson et al., 1985). A number of proteins necessary for invertase secretion have been identified in genetic studies of yeast. Most of these affect secretion events that occur after translocation of the protein into the ER lumen (Novick et al., 1980). However, two mutant strains that are temperature-sensitive for the product of genes sec53 and sec59 accumulate secretory-protein precursors bound to the ER membrane at the nonpermissive temperature (Ferro-Novick et al., 1984a). The gene products of sec53 and sec59 have not been fully characterized, but the phenotypes of the mutant strains indicate that neither codes for components of the SRP or SRP receptor (Ferro-Novick et al., 1984b). Thus sec53 and sec59 may specify new proteins in the secretion machinery.
D. Proteins Implicated in Prokaryotic Protein Secretion
Information on components of the prokaryotic protein-secretion apparatus has come primarily from genetic studies in E. coli. The chief technique has been to obtain a strain that is defective in some aspect of protein secretion and then to screen for strains in which the defect is suppressed by a compensating second-site mutation (Emr et al., 1981; Bankaitis and Bassford, 1985). Another approach has been to select for mutations that exhibit a pleiotropic export-defective phenotype (Ito et al., 1983; Oliver and Beckwith, 1981). These procedures have identified a host of genetic loci that specify proteins that affect secretion. Two of
these proteins have been partially purified and characterized. Possible protein-protein, protein-membrane, and protein-ribosome interactions have also been determined. Recent results show that the effects of suppressor mutations may be indirect and should be interpreted with caution (Strauch et al., 1986). Another potential complexity is the finding that signal sequence mutations can directly influence levels of expression, either increasing or decreasing them (Kadonaga et al., 1986; Matteucci and Lipetsky, 1986).
1. secA
The secA gene product has a pleiotropic effect on protein secretion (Oliver and Beckwith, 1981). Lack of the SecA protein abolishes synthesis of most exported proteins (Liss and Oliver, 1986), and this defect is lethal to the cells. The product of the secA gene has been identified as a 92,000-Da protein that is located at the cytoplasmic face of the inner membrane (Oliver and Beckwith, 1982a,b). This protein appears to interact with the product of the prlA (secY) gene (described below), as defects in the prlA gene can suppress a secA defect (Brickman et al., 1984; Oliver and Liss, 1985). The protein may also interact with the signal sequence of exported proteins (Kumamoto et al., 1984): In the absence of SecA, synthesis of MBP is abolished. In three mutants with nonfunctional MBP signal sequences, the elimination of SecA does not prevent MBP synthesis. Instead, precursor of MBP accumulates in the cytoplasm. Kumamoto et al. (1984) had suggested that SecA might correspond to the part of the eukaryotic SRP that is responsible for binding to the SRP receptor and relieving the translation block. However, it has recently been found (Strauch et al., 1986) that the signal sequence mutations do not restore synthesis of the same protein only; the effect appears to be nonspecific. Hence, either signal sequence mutations alter synthesis and secretion of other proteins or the secA product is induced by the presence of the mutation.
Furthermore, cyclic AMP was found to restore the level of synthesis of exported proteins in SecA amber mutants (Strauch et al., 1986). Synthesis of SecA is regulated in response to protein export (Oliver and Beckwith, 1982b). When export is inhibited, the production of SecA increases at least 10-fold to compensate for the secretion defect. Oliver and Beckwith (1982b) have speculated that the presence of precursors of secreted proteins in the cytoplasm could serve as a regulator of the expression of proteins needed for synthesis.
2. secB and secC
Strains containing mutations in the secB gene exhibit mild defects in secretion of a subset of exported proteins, including MBP, OmpF, and
LamB (Kumamoto and Beckwith, 1983, 1985). The precursors of these proteins accumulate in the cytoplasm. The gene product, a protein with an apparent mass of 12,000 Da, appears not to be essential for growth, possibly because not all exported proteins are affected. Double mutants defective in both secA and secB grow more poorly than either of the parent strains (Kumamoto and Beckwith, 1983). The effects of the two mutations are synergistic, suggesting that the gene products are part of the same export pathway, and may interact with each other. Little is known of the mechanism of the secB export defect. A mutation at the secC gene, which codes for a ribosomal protein (S15, S. Ferro-Novick and J. Beckwith, personal communication), can suppress the secretion defect of a secA mutant (Ferro-Novick et al., 1984c). The phenotype of a temperature-sensitive secC mutant is similar to that of a secA- strain. At the nonpermissive temperature, synthesis of exported proteins is blocked. The synthesis of MBP can be restored in secC mutants by mutations in the hydrophobic core of its signal sequence. The new insights into SecA mutants (Strauch et al., 1986) cloud the interpretation of these results.
3. prlA (secY)
The gene prlA was first identified as part of the E. coli secretion apparatus because mutations in the gene can restore export of signal-sequence mutants of the λ-receptor protein (Emr et al., 1981). Subsequently, Ito et al. (1983) isolated a temperature-sensitive mutant, secY, that was pleiotropically defective in protein secretion. The prlA and secY genes are almost certainly identical (Ito et al., 1984). The prlA gene is located near the promoter-distal end of the spc operon (Schultz et al., 1982). This operon consists of genes for several ribosomal proteins, as well as the prlA gene product, and the X gene, which codes for a protein of unknown function (Cerretti et al., 1983).
Since the prlA gene is part of the spc operon, it is possible that its gene product is also a ribosomal protein. However, the chromosomal locations of most ribosomal genes are known, so it is unlikely that prlA codes for a known ribosomal protein (Schultz et al., 1982). The prlA gene has been sequenced, and its gene product was identified (Ito, 1984). The protein’s molecular mass, predicted from its DNA sequence, is 49,000 Da. It has unusual properties characteristic of integral membrane proteins. Akiyama and Ito (1985) have shown that this protein resides in the cytoplasmic membrane. They have suggested, based on its highly hydrophobic nature, that PrlA may correspond to the ribophorins of eukaryotic cells. Alternatively, the protein could form a pore through the membrane for passage of the nascent polypeptide.
The prlA gene product may interact with the signal sequences of exported proteins, as it can suppress mutations in this region (Emr et al., 1981; Shultz et al., 1982). Interestingly, mutations in this gene have been shown to suppress an export defect that resulted from a change in the first amino acid in the mature exported protein (Liss et al., 1985). The allele specificity of prlA also supports the idea of direct contact between the signal sequence and PrlA. Certain mutations in prlA can suppress export defects due to certain mutations in the signal sequence, but not to others (Emr et al., 1981). Such allele specificity implies a direct interaction of PrlA with the signal sequence. In addition, as noted above, mutations in prlA can restore export function to secA mutants (Brickman et al., 1984). Thus, the PrlA protein may also interact with the secA gene product.
4. prlB, prlC, prlD
The prlB and prlC genes were also identified as sites of suppressor mutations that restore export to mutant λ-receptor proteins (Emr et al., 1981). The prlB suppressor is a mutation in a gene coding for a periplasmic ribose-binding protein (Silhavy et al., 1983). It has been suggested that prlB does not code for a part of the export machinery, but suppresses λ-receptor protein signal sequence mutations by bypassing the normal secretion pathway (Silhavy et al., 1983). The prlC mutants are similar in phenotype to the prlA mutant. The gene maps between 69 and 71 minutes on the E. coli chromosome (Emr et al., 1981). Little else is known of this suppressor. Another gene, prlD, was identified as affecting protein secretion in E. coli (Bankaitis and Bassford, 1985). Mutations in prlD suppress the export defect of a signal-sequence mutant of MBP. It can also suppress a LamB export defect. Mutations in prlD are allele-specific, which implies that the prlD gene product interacts directly with the signal sequence. There is evidence that prlD also interacts with prlA. Certain strains with mutations in both the prlA and prlD genes exhibit a general lack of protein export, and accumulate precursors of exported proteins in the cytoplasm.
5. Other Components of the Secretion Apparatus in E. coli
A number of other genes appear to affect protein secretion in E. coli. Most are poorly characterized, and could affect the export process, synthesis of exported proteins, and/or regulation of export or synthesis. A mutation in the envZ gene, which is involved in the transcriptional regulation of genes coding for OmpF and OmpC, has pleiotropic effects on
secretion (Silhavy et al., 1983). A mutation in the expA gene causes decreased secretion of periplasmic and outer-membrane proteins without affecting cytoplasmic and inner-membrane proteins (Dassa and Boquet, 1981). Oliver (1985) has identified five new genes, ssaD, ssaE, ssaF, ssaG, and ssaH, that are extragenic suppressors of mutations in the secA gene. Mutations in these genes decrease the synthesis of MBP (and presumably other exported proteins), strengthening the idea that E. coli has a mechanism for coupling protein synthesis and secretion, but complicated by the recent findings of Strauch et al. (1986). Shiba et al. (1984) have isolated a mutation that suppresses the protein-export defect of a secY mutation. The gene in which the mutation occurs was designated ssyA. The phenotype of the strain carrying the mutant ssyA gene is altered in protein synthesis as well as in export. Thus, ssyA may code for a protein that is part of both the synthetic machinery and the export apparatus. Müller and Blobel (1984a,b) have partially purified a soluble factor required for protein secretion from an in vitro translocation system derived from E. coli. As it sediments at about 12 S, the authors have suggested that it is a complex of smaller molecules. It does not contain 6 S RNA (see below), but may contain some other small RNA. Its function is unknown. There is also evidence of at least one protein at the cytoplasmic surface of the E. coli membrane that is involved in translocation: Treatment of inverted membrane vesicles with protease renders the membrane inactive for subsequent translocation (Rhoads et al., 1984; Chen et al., 1985). Escherichia coli possesses a 6 S RNA of unknown function. This molecule complexes with protein to form an 11 S particle, whose function is also unclear (Lee et al., 1978).
It was suggested that the 11 S particle is analogous to the eukaryotic SRP, and that the 6 S RNA corresponds to the eukaryotic 7 S RNA that is part of the SRP (Walter and Blobel, 1983). Lee et al. (1985) demonstrated that the 6 S RNA is not necessary for growth of E. coli or for protein export. The authors conclude that the E. coli 6 S RNA is not a component of a bacterial SRP. There is indirect evidence for the existence of an SRP-like entity in E. coli. Pages et al. (1985) have presented preliminary findings that indicate that a translation block may occur during the synthesis of pre-PhoS, a periplasmic phosphate-binding protein. It is known that the translation rate in E. coli is nonuniform. “Pause sites” (Pages et al., 1985) occur at codons complementary to uncommon tRNAs. However, a pause site in PhoS elongation corresponding to a peptide of 8 kDa is not accounted for by the presence of such codons. This peptide was subsequently converted into pre-PhoS. This intermediate could be the result of a translation block similar to that imposed by the eukaryotic SRP. There is also preliminary evidence for a protein secretion apparatus in Bacillus subtilis. Caulfield et al. (1984, 1985) have studied the “S complex,” a particle consisting of four proteins that appears to be involved in protein secretion. The complex is present on ribosomes as a small particle, essentially a third ribosomal subunit; its proteins can be cross-linked to the 50 S ribosomal subunit. In addition, a 64-kDa protein present in the S complex is protected from added protease in the presence of both ribosomes and membrane, but not by either alone. The S complex does not appear to cause an arrest of translation (Caulfield et al., 1984). The S complex aggregates to form a clathrin-like structure when it is removed from ribosomes (Caulfield et al., 1985). The authors have proposed that such a structure might serve to form a “cage” around a nascent secretory polypeptide, isolating it from the cytoplasm until it reaches the membrane. Subsequent to membrane binding, three of the proteins of the S complex dissociate and the 64-kDa protein remains associated. This latter protein may then play a role in secretion.
6. Summary
The genetic evidence presented above makes it clear that E. coli, and possibly other bacteria, possess a complex set of proteins that act in the protein-secretion process. Although it appears that at least one protein, the M13 phage coat protein, can be localized and processed in the absence of proteins other than signal peptidase (Section V,B) (Silver et al., 1981; Ohno-Iwashita and Wickner, 1983; Watts et al., 1981), most proteins of the bacterial cell envelope require the participation of a secretion apparatus for proper localization. Whether the bacterial secretion process is analogous to the eukaryotic process remains to be seen. The recent development of in vitro translocation systems derived from E. coli should facilitate research in this area (Rhoads et al., 1984; Müller and Blobel, 1984b).
V. HOW DOES SECRETION OCCUR?
Despite large amounts of evidence concerning the requirements of protein secretion, the molecular mechanism of the process is still unclear. Various models of the export mechanism have been proposed. No one hypothesis accounts for all of the data collected to date; it may be that one mechanism is insufficient to explain the many types of export that occur. In this section, some current models of protein secretion are
summarized. The models are then compared to each other, and to existing data, with reference to some of the major questions concerning protein-export mechanisms.
A. Models of Protein Secretion
1. The Signal Hypothesis
The signal hypothesis of Blobel and Dobberstein (1975a,b) is diagrammed and discussed in Section II,C; the details will not be repeated here. Its distinguishing points are (1) recognition of the signal sequence by the SRP, and subsequent translation arrest; (2) release of the translation arrest by the SRP receptor at the membrane, and association of the ribosome with the membrane; and (3) vectorial transport of the nascent protein through a proteinaceous pore. Energy for translocation is derived from elongation of the peptide chain.
2. The Membrane Trigger Hypothesis
Wickner (1980) proposed an alternative mechanism of protein secretion, called the membrane trigger hypothesis. This model proposes that the signal sequence influences the precursor protein or a domain of the precursor to fold into a conformation that can spontaneously partition into the hydrophobic part of the bilayer. In prokaryotes, the membrane potential causes the protein to traverse the bilayer. The protein then regains a water-soluble conformation, and is expelled into the medium. Signal peptidase removes the signal sequence during or after this process. Thus, secretory proteins or domains are transported across the membrane posttranslationally without the aid of a proteinaceous secretory apparatus. An energy source, such as the membrane potential, is required for secretion.
3. The Loop Model
Based on analysis of the amino acid sequences of signal peptides, Inouye and Halegoua (1980) proposed the loop model of protein secretion. While not a detailed mechanism in terms of energy source, temporal relationships of translation and translocation, and export site, these ideas have shaped subsequent thinking about the topology of export. In this mechanism, positively charged residues at the amino terminus of the signal sequence bind to the negative charges of the phosphatidylglycerol head groups at the membrane surface. The proline and glycine residues that occur in most signal sequences induce formation of a reverse turn, so that the signal peptide enters the membrane as a loop. As the peptide is elongated, the loop protrudes further into the membrane. The cleavage site is eventually located at the outer face of the cytoplasmic membrane, while the charged amino terminus remains anchored at the inner face. This idea is supported by experiments in which the gene coding for β-galactosidase is fused to the gene for prelipoprotein. Thus, the gene product is a hybrid protein consisting of β-galactosidase followed by the signal sequence and the structural sequence for lipoprotein. The lipoprotein portion of the hybrid is localized to the outer membrane, as usual. The signal sequence remains in the membrane, and β-galactosidase is found on the cytoplasmic side of the membrane (M. Inouye, personal communication). A fusion of part of the E. coli erythromycin resistance gene to the amino terminus of prelipoprotein was described above (Section III,B; Hayashi et al., 1985). Lipoprotein was secreted in this case also. These experiments substantiate the central contention of the loop model, as they confirm that the amino terminus of the signal sequence remains on the cytoplasmic side of the membrane, while the secreted protein is translocated. The recent work of Perara et al. (1986, described in Section III,B) suggests that the presignal sequence component in these hybrid systems may affect the eventual disposition of the protein product, as globin with the prolactin signal sequence on its C terminus was translocated.
4. The Helical Hairpin Hypothesis
The helical hairpin hypothesis of Engelman and Steitz (1981) and the direct transfer model of von Heijne and Blomberg (1979) share an emphasis on the thermodynamic basis of secretion. These models note the unusual hydrophobicity of signal sequences and calculate that in a helical conformation (either an α helix or a 3₁₀ helix) a signal sequence can partition into the hydrophobic part of the membrane. The helical hairpin mechanism involves two helical regions long enough to span the membrane. One is composed of the signal sequence, and the other is the first 15-25 residues of the mature protein. These two regions form a side-by-side “hairpin” structure that inserts into the membrane as a loop. The rest of the mature protein is translocated cotranslationally, with each residue passing through the membrane as part of a helical structure. This model predicts that a proteinaceous export site is not required for export and that export is initially driven by the favorable free energy of transfer of the hydrophobic signal sequence from the cytoplasm to the membrane.
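The requirement that each helical region be long enough to span the membrane can be illustrated with a rough geometric estimate. The sketch below is not taken from the cited models: it uses the standard axial rise per residue of an α helix (about 1.5 Å) and of a 3₁₀ helix (about 2.0 Å), and it assumes a hydrophobic core thickness of 30 Å, a round figure chosen here for illustration. The function name is hypothetical.

```python
import math

# Standard helix geometry: axial rise per residue, in angstroms.
RISE_ALPHA = 1.5   # alpha helix
RISE_3_10 = 2.0    # 3-10 helix

# ASSUMPTION: a round-number thickness for the hydrophobic core of a bilayer.
# This value is illustrative and is not taken from the text.
ASSUMED_CORE_THICKNESS = 30.0  # angstroms


def residues_to_span(thickness, rise_per_residue):
    """Smallest whole number of residues whose helix length covers `thickness`."""
    return math.ceil(thickness / rise_per_residue)


alpha_count = residues_to_span(ASSUMED_CORE_THICKNESS, RISE_ALPHA)
three_ten_count = residues_to_span(ASSUMED_CORE_THICKNESS, RISE_3_10)

print(f"alpha helix: {alpha_count} residues")     # 20 under the assumed thickness
print(f"3-10 helix:  {three_ten_count} residues")  # 15 under the assumed thickness
```

Under these assumptions the estimate falls within the 15-25 residue range quoted for the second arm of the hairpin; a thicker or thinner assumed core shifts the counts proportionally.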
5. The Amphiphilic Tunnel Hypothesis
Rapoport (1985) has also considered the thermodynamic aspects of protein insertion and secretion in his amphiphilic tunnel hypothesis. This
model is essentially identical to the signal hypothesis in describing the initial steps of protein secretion. The translocation process and the properties of the region through which the protein traverses the membrane are described in detail. It is assumed that the export apparatus in the membrane consists of an amphipathic tunnel that can bind both hydrophobic and hydrophilic parts of the nascent protein. The tunnel might be formed of a protein with several types of binding sites, or of lipids arranged to provide both polar and apolar regions, or both. The translocation process begins with the signal sequence binding to the hydrophobic region of the tunnel. The nascent chain enters the tunnel, and begins to fold into a low-energy conformation. Hydrophilic regions of the protein are generally not retained in the tunnel, and are expelled into the aqueous phase. Hydrophobic and amphiphilic parts remain in the membrane, either until they assemble into a polar domain and are released into the medium, or until translation is complete. When translation ends, the amphipathic tunnel disassembles, and the portions of the nascent chain remaining in the membrane are either transferred to the outer face of the membrane or retained in the bilayer, depending on the compatibility of the peptide segments with a hydrophobic environment. Thus, this hypothesis models both protein secretion and the insertion of membrane proteins.
6. A More Active Role for Membrane Lipids?
Nesmayanova (1982) has proposed a model that includes a more active role for the membrane lipids in translocating the secreted protein. Briefly, the model postulates an initial association of the signal sequence with the acid phospholipids of the inner membrane. This association neutralizes the negative charge of the lipid head groups and stimulates transbilayer movement of both the phospholipids and the signal sequence as a unit. A hydrophilic channel is formed by the lipids in a hexagonal arrangement. The secreted protein is then forced through the channel by the elongation process and by the motion of the lipids. Evidence for this model rests primarily on a correlation of lipid biosynthesis and translocation with protein synthesis and secretion. Increased synthesis of one causes increased synthesis of the other (Nesmayanova, 1982; Pagès, 1982). Both are inhibited by dissipation of the membrane potential (Bogdanov et al., 1984). Phosphatidylglycerol is present at the site of protein translocation, and may be involved in binding to the nascent chains and/or the ribosome (Bogdanov et al., 1985b). Bogdanov et al. (1985a) claim that secretion of alkaline phosphatase is accompanied by the appearance in freeze-fracture electron micrographs of “intramembrane particles” that represent areas of hexagonal lipid arrangement.
7. The Domain Model
Randall and Hardy (1984a,b) proposed a secretion mechanism that could be called the domain model. This model incorporates and modifies portions of the signal hypothesis and the membrane trigger hypothesis. As in the amphipathic tunnel model, the initial steps of secretion (those involving targeting of the translation complex to the proper membrane) are as set forth in the signal hypothesis. After recognition and membrane association, the nascent chain is elongated at the membrane surface. It does not enter the membrane, however, until most or all of the protein has been synthesized. The protein is then transported across the membrane in “domains,” and not in a vectorial manner. It is important that synthesis occur at the membrane in order to prevent folding of the domains into a translocation-incompetent conformation. The mechanism of translocation is not specified, and may or may not involve participation of a proteinaceous secretory apparatus. The energy required for secretion is derived either from the membrane potential, or from a conformational change of the exported protein. A multistep export mechanism with some features in common with the domain model has been proposed for β-lactamase (Koshland et al., 1982; Kadonaga et al., 1986).
Some of the outstanding questions concerning protein secretion are: (1) What is the nature of the translocation site? (2) Is translocation vectorial or by domains? (3) How much energy is required for secretion and where does it come from? The models described above predict the answers to these questions in various ways. In the following sections, the available experimental data on each of these questions are described and compared to the predictions of the models.
B. What Is the Nature of the Translocation Site?
Exported proteins must cross a biological membrane, which is composed largely of lipids and proteins. Is the translocation site made of lipids or protein?
The membrane trigger hypothesis, the helical hairpin hypothesis, the domain model, and the model of Nesmayanova postulate that protein translocation occurs directly through the lipid bilayer and that no proteinaceous export site is necessary. Other proteins may be needed for recognition, and signal peptidase is required for removal of the signal sequence after export. Evidence for a lipid translocation site comes primarily from experiments in reconstituted export systems in Wickner’s laboratory. It has been shown that the precursors of M13
phage coat protein and the E. coli proteins OmpA and MBP can partially insert into, and be processed by, liposomes containing no proteins other than signal peptidase (Silver et al., 1981; Ohno-Iwashita and Wickner, 1983; Ohno-Iwashita et al., 1984; Watts et al., 1981). The small size of the M13 coat protein (only 50 amino acids) (Wickner et al., 1980) may render it incapable of binding to an SRP-like particle while it is still attached to the ribosome, as 30-40 amino acid residues are required before the polypeptide begins to protrude from the ribosome (Smith et al., 1978; Randall, 1983). Consequently, synthesis of the protein would be finished, and the ribosome would have disassembled before the signal sequence became long enough to completely emerge from the ribosome. Thus, it is not surprising that secretion of the M13 coat protein is independent of species like SRP. However, secretion of MBP and OmpA is affected by mutations in proteins of the export apparatus (Liss and Oliver, 1986), and therefore should require proteins other than signal peptidase for proper export. None of the proteins was completely translocated into the interior of the liposomes, however, indicating that some protein or proteins are needed. These may or may not be part of a proteinaceous translocation site.
The signal hypothesis describes a mechanism in which the exported protein traverses the membrane through a proteinaceous pore. Gilmore and Blobel (1985) have described experiments that test the accessibility of the nascent chain to aqueous solutes. They conclude that integral membrane proteins are involved in both the initial attachment of the signal sequence to the membrane, and translocation of the exported protein through the membrane. They suggest that the nascent chain does not interact with the membrane lipids.
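The size argument made above for the M13 coat protein can be checked with simple arithmetic. The sketch below uses only figures quoted in this chapter: the 50-residue size given for the coat protein, the 30-40 residues said to be needed before the chain protrudes from the ribosome, and the 23-residue M13 signal sequence cited from Shinnar and Kaiser (1984). Treating the 50-residue figure as the full length of the translated chain, and treating the shielded length as fixed, are simplifying assumptions, and the function names are hypothetical.

```python
M13_CHAIN_LENGTH = 50       # size quoted in the text for the M13 coat protein
M13_SIGNAL_LENGTH = 23      # signal-sequence length quoted in this chapter
SHIELDED_RANGE = (30, 40)   # residues still inside the ribosome, per the text


def exposed_residues(chain_length, shielded):
    """N-terminal residues outside the ribosome once `chain_length`
    residues have been polymerized (assumes a fixed shielded length)."""
    return max(0, chain_length - shielded)


def signal_fully_exposed(chain_length, signal_length, shielded):
    """True if the whole signal sequence has emerged from the ribosome."""
    return exposed_residues(chain_length, shielded) >= signal_length


if __name__ == "__main__":
    for shielded in SHIELDED_RANGE:
        n_out = exposed_residues(M13_CHAIN_LENGTH, shielded)
        ok = signal_fully_exposed(M13_CHAIN_LENGTH, M13_SIGNAL_LENGTH, shielded)
        print(f"shielded={shielded}: exposed={n_out}, signal fully out: {ok}")
```

With these inputs the signal sequence is not fully exposed at either end of the 30-40 residue range when synthesis finishes, which is the situation the paragraph describes; a longer assumed chain length would change the outcome.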
However, there is indirect evidence in prokaryotes (from the membrane-fluidity experiments described above) that the exported proteins may contact the lipids during translocation. Bogdanov et al. (1985b) have presented results indicating that phosphatidylglycerol is present at the site of protein translocation. Two laboratories have reported that synthesis of acid phospholipids correlates with protein secretion (Nesmayanova, 1982; Pagès, 1982). Furthermore, we have found that the interaction of synthetic signal sequences with phospholipid monolayers correlates with their activity in vivo (Section VI,C). Thus there is evidence for at least some contact of the nascent protein with the membrane lipids. Although the nature of the translocation site is far from understood, it is probable that the nascent polypeptide contacts both proteins and lipids within the membrane. It may be that the translocation process is initiated by a transitory protein-protein interaction, but proceeds in a lipid environment, or vice versa. Alternatively, the translocation site may
be partially proteinaceous and partially lipid, as proposed by Rapoport (1985).
C. Is Transfer Vectorial or by Domains?
The question of vectorial or domain transfer is closely related to that of the temporal relationship between synthesis and export, which has long been a matter of controversy. The signal hypothesis and the membrane trigger hypothesis directly contradict each other on this point. The signal hypothesis requires that secretion be vectorial and consequently cotranslational, while the membrane trigger hypothesis specifies that secretion is by domains, and often posttranslational. The helical hairpin hypothesis specifies vectorial translocation, and the amphipathic tunnel model of Rapoport allows for either vectorial or domain modes of translocation. There is evidence that both co- and posttranslational translocation can occur, although for some cells or proteins secretion may proceed exclusively by one route or the other. Cleavage of the signal sequence is generally assumed to take place during or soon after translocation. A related matter is whether processing is necessary for release of the exported protein from the membrane. These points are discussed below.
In higher eukaryotes, the timing of translation and translocation is organelle-dependent. Mitochondrial, chloroplast, and peroxisomal proteins are generally translocated posttranslationally (Hay et al., 1984; Kreil, 1981; Grossman et al., 1980), while secretion of proteins into the ER lumen occurs cotranslationally in nearly all cases. Secretion in vitro occurs nearly exclusively when the transport apparatus (from microsomal membranes) is present during synthesis. Secretion is generally not observed when the transport machinery is added after synthesis has stopped (Blobel and Dobberstein, 1975b; Mostov et al., 1981). There is evidence that secretion in vitro is prevented when membranes are added after the nascent chains are about 80 residues long (Rothman and Lodish, 1977).
Thus, in nearly all eukaryotic systems studied to date, translocation into the ER lumen is tightly coupled to protein synthesis. However, recent in vitro studies have found secretion of α-mating factor in yeast to proceed posttranslationally (Hansen et al., 1986; Rothblatt and Meyer, 1986; Waters and Blobel, 1986). Also, Mueckler and Lodish (1986) have reported posttranslational translocation of the human glucose transporter in vitro. They also observed cotranslational translocation regulated by SRP. These results all indicate that alternative mechanisms may obtain depending on the species of organism, the protein, and the experimental procedure.
In E. coli, the timing of translation and translocation appears to be quite variable. Although E. coli is certainly capable of cotranslational export (Smith et al., 1977, 1978; Josefsson and Randall, 1981), there is a large body of evidence that some bacterial proteins can be secreted posttranslationally, both in vivo (Josefsson and Randall, 1981; Goodman et al., 1981; Chen et al., 1985; Ryan and Bassford, 1985) and in vitro (Muller and Blobel, 1984b; Chen et al., 1985; Chen and Tai, 1985). A number of E. coli proteins appear to be secreted both co- and posttranslationally (Josefsson and Randall, 1981; Kadonaga et al., 1985). In the case of cotranslational secretion, Josefsson and Randall (1981) found that translocation of some E. coli proteins, including MBP, arabinose-binding protein, alkaline phosphatase, and OmpA, begins only after the nascent chains reach 80% of their full length. Randall (1983) has presented evidence that entire domains of nascent chains of MBP and ribose-binding protein are translocated after their synthesis. In direct conflict with reports of posttranslational secretion are the findings of Pagès et al. (1984), who found that precursors of the periplasmic phosphate-binding protein, PhoS, that have accumulated in the cytoplasm are not exported posttranslationally, but are slowly degraded. Precursors bound to the cytoplasmic membrane could be exported, however. Ryan and Bassford (1985) reported that an E. coli strain export-defective for MBP is incapable of rapid, apparently cotranslational, export of MBP, but can secrete the protein posttranslationally. The observed posttranslational secretion is much slower than wild-type secretion, however, and in many cases a fraction of the precursor pool remains in the cytoplasm. Thus, the temporal relationship between synthesis and secretion in E. coli remains somewhat unclear. Silhavy et al. (1983) and Rhoads et al.
(1984) proposed that the coupling between the two processes is not as tight for prokaryotic secretion as it appears to be for eukaryotic secretion into the ER. The mode of secretion may be protein-specific. Some proteins [such as TEM β-lactamase (Josefsson and Randall, 1981)] may be exported primarily posttranslationally, whereas some [such as PhoS (Pagès et al., 1984) and ampC β-lactamase (Josefsson and Randall, 1981)] may be exported primarily cotranslationally in vivo. At least one protein that is secreted cotranslationally in vivo (E. coli alkaline phosphatase) (Smith et al., 1977) can be translocated posttranslationally into E. coli membrane vesicles in vitro (Chen et al., 1985). The results of Ryan and Bassford (1985), described above, suggest that cotranslational export may be the major mode of secretion of wild-type proteins in “normal” (i.e., whole and not mutated) cells, while posttranslational secretion may be a backup system for use in case of damage to the secretory apparatus.
In eukaryotes, cleavage of the signal sequence is thought to occur cotranslationally (Hortin and Boime, 1981a,b), but can occur posttranslationally in cell-free systems (Jackson and Blobel, 1977), or in cases where cleavage is inhibited by changes in the cleavage site (Schauer et al., 1985; Hortin and Boime, 1981a,b). Inhibition of cleavage may slow translocation and affect subsequent processing and transport steps (Schauer et al., 1985). Signal-sequence cleavage does not appear to be necessary for translocation to the ER lumen (Hortin and Boime, 1981a,b), although in one case the precursors were associated with the membrane (Schauer et al., 1985). Cleavage of the signal sequence in prokaryotes can occur either cotranslationally (Josefsson and Randall, 1981) or posttranslationally (Wu et al., 1983). Cleavage does not appear to be necessary for protein export, as mutants deficient in the cleavage of lipoprotein (Lin et al., 1978), MBP (Ryan et al., 1986a), or the M13 coat protein (Russell and Model, 1981) signal sequences are localized properly.
D. How Much Energy Is Required for Secretion, and Where Does It Come From?
Secreted proteins are usually hydrophilic, but they must cross the hydrophobic membrane to leave the cell. The energy barrier for this process must be lowered or compensated by some mechanism. Engelman and Steitz (1981, 1984), von Heijne (1980b), and von Heijne and Blomberg (1979) have calculated that transfer of the signal sequence in an α helix or 3₁₀ helix from the aqueous medium of the cytoplasm to the apolar region of the membrane is thermodynamically favorable. In the helical hairpin hypothesis, Engelman and Steitz (1981) show that the energy gained by insertion of the signal sequence into the bilayer is sufficient to “pull” an adjacent, more polar, helical segment into the membrane with it. Subsequently, the amino acid residues are translocated vectorially, each passing through the helical structure.
Thus, for every amino acid that enters the membrane another leaves, and little energy is required other than that of protein elongation. This scheme requires a tight coupling between protein translation and translocation. Pagès et al. (1978) have also calculated that no energy beyond that required for protein synthesis is necessary for protein secretion in E. coli, if secretion is cotranslational. The energy requirements of secretion into the ER support the idea that vectorial transport does not require an energy source. Cotranslational secretion is usually observed in the ER system, both in vivo and in vitro. The ER does not have the enzymes needed to generate an electrochemical gradient, and uncouplers and ionophores do not affect the ability of ER microsomes to translocate proteins (Rhoads et al., 1984).
An electrochemical gradient resulting in a membrane potential has been shown to be essential for protein secretion in E. coli (Daniels et al., 1981; Date et al., 1980a,b). Collapse of the transmembrane potential by addition to cells or spheroplasts of the ionophore carbonyl cyanide m-chlorophenylhydrazone, or materials that form ion-permeable pores in the membrane, such as valinomycin or colicins E1 and A, inhibited translocation and processing of several proteins of the outer membrane (LamB, OmpF, OmpA) (Enequist et al., 1981; Pagès and Lazdunski, 1982a,b), periplasm (leucine-isoleucine-valine-binding protein, leucine-specific binding protein, alkaline phosphatase, β-lactamase) (Daniels et al., 1981; Pagès and Lazdunski, 1982a,b), and inner membrane (M13 coat protein) (Date et al., 1980a,b). The role (and necessity) of the membrane potential has been debated. Wickner (1980) and Daniels et al. (1981) proposed that the membrane potential orients the signal peptide by “electrophoresing” a loop of protein into the membrane. The signal peptide in an α-helical conformation has a net dipole due both to the charged amino-terminal region, and to the alignment of the peptide bonds in the helix. The helix is positive near its amino terminus, and negative near its carboxyl terminus. The E. coli inner membrane has a net positive charge at its outer face, and a net negative charge at its inner face. Thus a helical signal peptide should orient itself within the membrane so that its amino terminus faces the cytoplasm and its carboxyl terminus faces the periplasm. In so doing, it could begin to pull the amino terminus of the mature secretory protein across the membrane. On the other hand, the membrane potential might be necessary for proper function of one or more proteins of the secretion apparatus (Rhoads et al., 1984). Such a protein might derive from the membrane potential the energy needed for “active transport” of the secreted protein.
Alternatively, an energized membrane might be required to maintain part of the secretion apparatus in an effective conformation. Another possibility is that the membrane potential is necessary to generate high-energy phosphate compounds, which are then used as energy sources for secretion. Various in vivo studies and in vitro translation/translocation experiments have refuted this idea (Bakker and Randall, 1984; Rhoads et al., 1984; Pagès and Lazdunski, 1982b); it has been noted, however, that high-energy phosphate compounds are required for protein synthesis, and their contribution to the secretion process cannot be ruled out. Chen and Tai (1985) addressed this point by designing experiments to test the energy requirements of posttranslational secretion of the E. coli proteins OmpA and alkaline phosphatase. They found that if adenosine triphosphate (ATP) or a number of other nucleotides were supplied in the absence of a transmembrane potential, the secretory proteins were translocated posttranslationally into E. coli inner-membrane vesicles. In the absence of both the transmembrane potential and added nucleotides, no translocation occurred. Thus, they concluded that ATP is essential for protein translocation and that a possible function of the membrane potential is the generation of ATP for use by the protein-secretion apparatus. Rhoads et al. (1984) noted that an energy source such as ATP or the membrane potential may be necessary only in posttranslational export processes. Supporting this idea is the finding that ATP is also required in posttranslational translocation in yeast in vitro systems (Hansen et al., 1986; Rothblatt and Meyer, 1986). Also, import of proteins into mitochondria and chloroplasts can occur posttranslationally and requires a transmembrane potential (Hay et al., 1984; Kreil, 1981). As noted above, the ER does not have a transmembrane potential and has usually displayed cotranslational secretion. Cotranslational secretion in E. coli may not require energy in addition to that supplied by protein synthesis (Pagès et al., 1978). It is clear, however, that E. coli requires an energy source for (at least) posttranslational secretion. Thus, there appears to be a correlation between the energy requirements of protein secretion and the coupling of translation and translocation.
VI. WHAT ARE THE ROLES OF THE SIGNAL SEQUENCE?
As discussed above, the roles of the signal sequence are not yet clearly defined, but its importance is well established. Among the questions raised by existing genetic and biochemical information are: With what components of the secretion apparatus does the signal sequence interact? Does the signal sequence come into direct contact with the membrane lipids? What are the principal factors (e.g., electrostatics, net hydrophobicity, amphiphilicity, conformation) influencing interactions of the signal sequence with the membrane lipids or proteins? These questions have been addressed by various studies of isolated signal sequences and precursor proteins.
A. Studies of Precursor Proteins and Isolated Signal Sequences
Genetic and biochemical data support the idea that the intrinsic nature and properties of signal sequences are related to function, and that it is not necessary to invoke the influence of the rest of the protein to explain at least part of the secretion process. For example, the ability of signal sequences to be transferred from one secreted protein to another while retaining export function argues that interactions of signal sequences with their associated proteins are less important to function than the properties of the signal sequences alone (see Section III,C). In addition, precursor proteins are often detected by their reaction with antibody to the mature protein (Lingappa et al., 1984; Takahara et al., 1985; Palva et al., 1982; Bankaitis et al., 1984; Randall, 1983). As antibodies have been used as conformational probes (Lewis et al., 1983; Berzofsky, 1985), the ability of the precursors to cross-react with antibody to mature protein indicates conformational similarities, and implies that the non-signal sequence portion of the precursor protein folds into the same conformation as the mature protein. This is also supported by the reported activity of the maltose-binding protein precursor (Ferrenci and Randall, 1979) and by its detergent-binding characteristics (Dierstein and Wickner, 1986). Biophysical and biochemical studies of precursor proteins and isolated signal sequences have provided data on the conformational properties and membrane and protein interactions of signal sequences. The results of these studies, discussed in the following sections, are yielding information on the detailed molecular mechanism of signal-sequence action.
B. Conformational Studies of Signal Sequences
Baty and Lazdunski (1979) used antibodies to demonstrate conformational homology among bacterial signal sequences. Some antibodies raised against the precursor form of E. coli alkaline phosphatase also react with mature alkaline phosphatase. These were removed by affinity chromatography; the remaining antibodies reacted with the alkaline-phosphatase signal sequence. A disproportionate amount of the antibody population raised against the alkaline-phosphatase precursor is directed against the signal sequence, suggesting that this peptide is exposed at the surface of the precursor protein. This conclusion is strengthened by the finding that antibody to alkaline-phosphatase signal sequence also binds the signal sequence of leucine-isoleucine-valine-binding protein. However, the same antibody also bound aminopeptidase, which has no signal peptide, but does have a membrane-spanning region. The authors argue that this finding implies that the signal sequences and the membrane-bound part of aminopeptidase have similar conformations. More direct determinations of signal-sequence conformations have been obtained by circular dichroism (CD) and infrared (IR) spectroscopy of synthetic signal sequences, signal-sequence fragments, and peptides resembling signal sequences. In polar solvents, especially aqueous systems, CD data indicate that the 23-residue signal sequence of phage M13 coat protein (Shinnar and Kaiser, 1984) (Fig. 6), a 19-residue peptide
[Figure 6: plot not reproduced. Horizontal axis: wavelength (nm), 180 to 250; vertical axis ticks from -30000 to 50000, axis label not recoverable.]
FIG. 6. Circular dichroism spectra of synthetic phage M13 signal peptide in pH 2.8 phosphate buffer (solid line), showing predominantly random coil structure, and upon addition of 33% hexafluoroisopropyl alcohol (dashed line), showing predominantly α-helical structure. Reprinted, with permission, from Shinnar and Kaiser (1984).
similar to that of pretrypsinogen (Austen and Ridd, 1981), and the 25-residue signal sequence of the E. coli λ-receptor protein (Briggs and Gierasch, 1984; Briggs, 1986) adopt largely random conformations. A 29-residue peptide consisting of the “prepro” region of the parathyroid hormone (the 23-residue signal sequence plus the polar 6-residue “pro” segment) adopted a predominantly β conformation (Rosenblatt et al., 1980). All of these synthetic signal peptides, as well as peptides resembling the signal sequences of lysozyme and lipoprotein (Reddy and Nagaraj, 1985), became partially α helical in polyfluorinated alcohols [trifluoroethanol (TFE) and hexafluoroisopropanol (HFIP)], which are more hydrophobic than water and have been used as models for the membrane interior. These solvents promote the formation of intramolecular hydrogen bonds, and consequently induce α-helix formation. In
contrast to the above results, Katakai and Iizuka (1984) have determined the conformations of synthetic signal-sequence fragments by CD and IR. They reported that peptides containing 13- or 14-residue fragments of three eukaryotic signal peptides (preimmunoglobulin light chain, pretrypsinogen, and pre-β-lactoglobulin) have random conformations in HFIP, become α helical in mixtures of HFIP and nonfluorinated alcohols, and adopt β structure in aqueous HFIP. These studies indicate that signal sequences adopt different conformations in polar and apolar environments. In general, the peptides have little structure in aqueous environments. An α-helical conformation is induced in the presence of apolar solvents. Thus, it has been proposed that the signal peptide undergoes a conformational change on passage from the cytoplasm to the membrane (Austen and Ridd, 1981; Rosenblatt et al., 1980; Katakai and Iizuka, 1984). It was also concluded that the active form of the signal sequence adopts an α-helical conformation in the membrane (Shinnar and Kaiser, 1984; Rosenblatt et al., 1980; Katakai and Iizuka, 1984; Reddy and Nagaraj, 1985).
These experiments have not demonstrated that the conformational preferences of signal sequences are important to their ability to export proteins. To address this problem, we synthesized the family of E. coli λ-receptor protein wild-type and mutant signal sequences (shown in Fig. 5 and described in Section III,H) and determined their conformations in various polar and apolar environments by CD (Briggs and Gierasch, 1984; Briggs, 1986). The solvents for these experiments included aqueous buffer and TFE, as described above. In addition, sodium dodecyl sulfate (SDS) micelles and phospholipid vesicles were used as membrane model systems. The conformations of the LamB signal sequence family are summarized in Table II. All of the LamB signal peptides are predominantly random in aqueous solution.
In the other solvent systems, including those designed to mimic the membrane, the functional signal peptides take on more regular structures, while the nonfunctional deletion-mutant peptide remains mostly random. The functional peptides have a strong tendency to form an α helix, but can also adopt a significant amount of β structure in some solvents. Thus, there is a clear correlation between the conformational tendencies of the signal sequences and their abilities to function in vivo. These findings support the idea that the nature and properties of the isolated signal sequence are related to function, and that it is not necessary to invoke the influence of the rest of the protein to explain at least part of the secretion process. Our CD studies in water and SDS show that the nonfunctional deletion-mutant signal peptide has significantly lower tendency to form an α
TABLE II
LamB Signal-Peptide Conformations (a,b)

Peptide and solvent             % α Helix   % β Structure   % Random
Wild type
  Buffer                             5            15            80
  40 mM SDS                         60            10            30
  50% TFE                           40            15            45
  POPE/POPG vesicles                60            25            15
Deletion mutant
  Buffer                             5            15            80
  40 mM SDS                         20            15            65
  50% TFE                           30             0            70
  POPE/POPG vesicles                30             0            70
Gly → Cys pseudorevertant
  Buffer                            10            10            80
  40 mM SDS                         35            10            55
  50% TFE                           35             5            60
  POPE/POPG vesicles                40            15            45
Pro → Leu pseudorevertant
  Buffer                            10            10            80
  40 mM SDS                         60            25            15
  50% TFE                           45            15            40
  POPE/POPG vesicles                95             0             5

(a) Based on curve-fitting of circular dichroism spectra [from Briggs (1986)] using reference spectra from Greenfield and Fasman (1969).
(b) Buffer, 5 mM Tris, pH 7.3. SDS, Sodium dodecyl sulfate; TFE, trifluoroethanol; POPE/POPG, 1-palmitoyl-2-oleoylphosphatidylethanolamine/1-palmitoyl-2-oleoylphosphatidylglycerol, 65 : 35.
helix than do the functional wild-type or revertant peptides. These results support the proposal of Emr and Silhavy (1983) that functional signal sequences must adopt an α-helical conformation at some point during secretion. Conformational analysis suggests that the presence of proline and glycine separated by only three residues in the deletion mutant disrupts the helix-forming potential of the signal's hydrophobic core. The pseudorevertants, in which one of these residues is replaced by a helix-promoting residue while maintaining the same length as the deletion-mutant signal peptide, adopt a helical conformation in micellar SDS. It should be noted that these pseudorevertants are those predicted by Bedouelle and Hofnung (1981b) based on their HAL (see Section III,F). CD spectra in lysolecithin micelles indicate that the functional
signal-peptide fragments also adopt a significant amount of β structure, but that the nonfunctional deletion-mutant peptide does not (Briggs and Gierasch, 1984). This correlation, like the conformational predictions discussed above, raised the possibility that the signal sequence may have to take on β structure or some other non-α-helical secondary structure during the secretion process.
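The correlation drawn from Table II can be made explicit by tabulating its values and comparing helix content across peptides. The sketch below transcribes the percentages from Table II, checks that every row sums to 100%, and tests whether the nonfunctional deletion mutant has the lowest α-helix content in each membrane-mimetic solvent. The data layout, the names, and the choice to compare only α-helix percentages are illustrative assumptions, not part of the original analysis.

```python
# Percentages transcribed from Table II as (alpha helix, beta structure, random).
TABLE_II = {
    "wild type": {
        "buffer": (5, 15, 80),
        "40 mM SDS": (60, 10, 30),
        "50% TFE": (40, 15, 45),
        "POPE/POPG vesicles": (60, 25, 15),
    },
    "deletion mutant": {
        "buffer": (5, 15, 80),
        "40 mM SDS": (20, 15, 65),
        "50% TFE": (30, 0, 70),
        "POPE/POPG vesicles": (30, 0, 70),
    },
    "Gly->Cys pseudorevertant": {
        "buffer": (10, 10, 80),
        "40 mM SDS": (35, 10, 55),
        "50% TFE": (35, 5, 60),
        "POPE/POPG vesicles": (40, 15, 45),
    },
    "Pro->Leu pseudorevertant": {
        "buffer": (10, 10, 80),
        "40 mM SDS": (60, 25, 15),
        "50% TFE": (45, 15, 40),
        "POPE/POPG vesicles": (95, 0, 5),
    },
}

NONFUNCTIONAL = "deletion mutant"
MEMBRANE_MIMETICS = ("40 mM SDS", "50% TFE", "POPE/POPG vesicles")


def rows_sum_to_100(table):
    """Check that every (helix, beta, random) triple adds up to 100%."""
    return all(sum(row) == 100
               for solvents in table.values()
               for row in solvents.values())


def helix_content(table, solvent):
    """Map each peptide to its percent alpha helix in one solvent."""
    return {peptide: solvents[solvent][0] for peptide, solvents in table.items()}


def mutant_is_least_helical(table, solvent):
    """True if the deletion mutant has strictly the lowest helix content."""
    helix = helix_content(table, solvent)
    others = [v for k, v in helix.items() if k != NONFUNCTIONAL]
    return helix[NONFUNCTIONAL] < min(others)


if __name__ == "__main__":
    print("rows sum to 100:", rows_sum_to_100(TABLE_II))
    for solvent in MEMBRANE_MIMETICS:
        print(solvent, helix_content(TABLE_II, solvent),
              mutant_is_least_helical(TABLE_II, solvent))
```

In this tabulation the deletion mutant is the least helical peptide in SDS, TFE, and vesicles, while in buffer all four peptides are about 80% random, in line with the statements in the text.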
C. Interactions with Lipids
There is biological evidence for the interaction of the signal sequence with lipids and with proteins of the secretion apparatus [for example, see the experiments of DiRienzo and Inouye (1979), Rhoads et al. (1984), and the Pagès group (Pagès et al., 1978; Pagès and Lazdunski, 1981), in which alteration of the membrane lipid fluidity disrupted protein secretion, as discussed in Section IV,A]. Biochemical and biophysical studies using isolated signal peptides also suggest that the signal sequences interact with both lipids and proteins (e.g., SRP and/or a signal-peptide receptor; see Section IV,D) during the export process. Interaction of synthetic signal sequences with lipid vesicles has been observed in three laboratories. Nagaraj (1984) synthesized fragments of the chicken lysozyme signal sequence labeled at the amino terminus with the fluorescent dansyl (5-dimethylamino-1-naphthalenesulfonyl) group. Short (3-8 residues) fragments of the signal sequence show increased fluorescence intensity and a blue shift in the emission maximum when small unilamellar vesicles are added to an aqueous solution of peptide, indicating that the peptides are bound to the vesicles. Longer peptides (9 and 12 residues) had fluorescence spectra characteristic of an aggregated sample. The fluorescence intensity did not increase on addition of vesicles to these peptides, nor did the emission maximum shift. However, based on the effect of temperature on the fluorescence polarization, the author concluded that all of the peptides bind to the bilayers and suggested that the binding of the longer peptides does not affect the emission spectrum since the environment of the dansyl group may not differ significantly in aggregates and in the vesicles. Shinnar and Kaiser (1984) found that addition of the M13 coat protein signal sequence to small unilamellar vesicles caused aggregation, as judged by increased light scattering of the sample.
The functional LamB signal peptides also induce vesicle aggregation at peptide to lipid ratios of about 1 : 50, while the nonfunctional deletion-mutant signal peptide does not cause aggregation, even at ratios of 1 : 10 (Briggs et al., 1985). Although the reasons for vesicle aggregation and fusion are unclear, it is apparent that there is some interaction of the signal peptide with the lipids. These observations, together with the results of Nagaraj and our
studies of signal-peptide insertion into phospholipid monolayers described below, indicate that signal sequences partition into lipid environments spontaneously. IR spectroscopy of the λ-receptor protein signal peptide in phospholipid monolayers shows that the peptide affects the packing of the lipid hydrocarbon tails (M. S. Briggs, R. A. Dluhy, D. G. Cornell, and L. M. Gierasch, unpublished results). In samples formed at the same surface pressure, the lipid tails are oriented differently in the presence and absence of signal peptide. A phospholipase assay for structural defects in phospholipid bilayers (Jain et al., 1984) indicates that the λ-receptor protein signal peptide interacts with vesicles to induce such defects. The peptides perturb the lipid structure at lower mole fractions than do various lysophospholipids. These data provide yet another indication that signal peptides interact with and perturb lipid complexes. However, Bendzko et al. (1982) found that preproinsulin does not bind to small vesicles of dimyristoylphosphatidylcholine (DMPC) or to smooth microsomal membranes, but does bind to rough microsomal membranes. In contrast, cytochrome b5, which has an N-terminal insertion sequence rather than a signal sequence, binds to rough and smooth microsomes and to DMPC vesicles. Binding of preproinsulin to rough microsomes is abolished by prior treatment of the microsomes with protease; protease treatment does not affect cytochrome b5 binding. These observations suggest that binding of preproinsulin to membranes is mediated by a specific proteinaceous receptor and not by spontaneous dissolution of the signal sequence in the bilayer. This result is in conflict with those described above. A possible explanation for this discrepancy is the following: In the studies that found interactions of signal sequences with lipid, isolated signal sequences were used. In the experiments that show no interaction with the membrane, an entire precursor protein was studied.
Synthetic signal sequences are very hydrophobic and can be quite insoluble, either precipitating (Katakai and Iizuka, 1984) or aggregating (Nagaraj, 1984) in aqueous solution. Thus the results of Nagaraj (1984) and Shinnar and Kaiser (1984) may be explained as a favorable partitioning of the signal peptide into a hydrophobic medium. In contrast, precursor proteins are often water soluble, due to the presence of the hydrophilic mature protein, and may partition more strongly into water than do isolated signal peptides. Thus, partitioning of a precursor protein into a bilayer may be less favorable. It may bind loosely via an exposed signal peptide, but a tight association with the membrane may require an interaction of the signal peptide with a specific receptor. This then explains the results of Bendzko et al. (1982) (no binding to phospholipid vesicles, but binding to ER microsomes). In cases of cotranslational secretion, the entire precursor is not found in the cytoplasm; thus studies of the isolated signal sequence may be more relevant to the situation in vivo (viz. signal sequences emerging from a ribosome/SRP complex). The question remains from these experiments whether the association of isolated signal sequences with lipids has functional significance.
We have studied the lipid interactions of three synthetic signal peptides from the previously described family of functional and nonfunctional E. coli λ-receptor protein signal sequences by surface tensiometry. Phospholipid monolayers have been used to simulate the membrane bilayer (Colacicco, 1970; Verger and Pattus, 1982; Rothfield and Fried, 1975; Fendler, 1982). A surface-active species such as a detergent or an amphiphilic peptide or protein dissolved in the aqueous phase beneath the monolayer can enter the monolayer and interact with the lipids at both the head groups and the hydrocarbon chains. Its insertion into the monolayer is indicated by an increase in the surface pressure or the surface area of the monolayer (Pethica, 1955). The result is a mixed monolayer. Adsorption of a solute to the lipid head groups without penetration into the hydrophobic region can also cause a change in surface pressure or surface area by electrostatic effects, but the change is generally smaller (1-2 versus 8 or more dyn/cm) than is observed in cases of insertion (Mayer et al., 1983). Insertion of the signal peptides into the monolayer was studied by monitoring the increase in surface pressure at constant area (Briggs et al., 1985; Briggs, 1986). The increase in surface pressure depends on the concentration of the signal peptide in the subphase. At low concentrations, the surface pressure changes very little; as concentration increases, the surface pressure rises to a plateau, indicating saturation. The dependence of the surface pressure change on peptide concentration is shown in Fig. 7.
The relative affinities of the signal peptides for the phospholipid monolayer were determined by measuring their critical insertion pressures. The tendency of a surface-active molecule to penetrate a monolayer of another is inversely proportional to the initial surface pressure of the monolayer (Verger and Pattus, 1982; Phillips and Sparks, 1980). The lipid monolayer pressure above which the penetrating molecule no longer inserts is called the critical pressure of insertion. It is obtained by measuring the dependence of the surface pressure increase on the initial monolayer surface pressure and extrapolating to a pressure increase of zero (Fig. 8). In both experiments, the wild-type and Pro → Leu pseudorevertant signal peptides behave very similarly. Particularly striking is the finding that the critical insertion pressure is 38 dyn/cm for both functional signal peptides. In contrast, the critical insertion pressure of the deletion-mutant signal peptide is 26 dyn/cm. For purposes
[Figure 7: plot with -log [peptide] on the horizontal axis.]
FIG. 7. The increase in surface pressure of phospholipid monolayers as a function of signal-peptide concentration for the various E. coli LamB synthetic signal sequences (from Briggs, 1986). A monolayer of egg phosphatidylethanolamine and egg phosphatidylglycerol (65 : 35) was spread from a benzene solution onto 5 mM Tris buffer, pH 7.3, yielding a final surface pressure of 20 dyn/cm after evaporation of the benzene. The peptide was added by injecting a concentrated solution below the lipid-water interface. The surface pressure was measured by the du Noüy ring method with a Fisher Autotensiomat equipped with a platinum-iridium ring. The plateau values are plotted as a function of the peptide concentration for the wild-type (0), Pro → Leu pseudorevertant (A), and deletion-mutant (0) peptides.
[Figure 8: plot with π_i (dyn/cm) on the horizontal axis.]
FIG. 8. Determination of critical pressures of insertion of synthetic E. coli LamB signal peptides (from Briggs, 1986). A monolayer of 1-palmitoyl-2-oleoylphosphatidylglycerol and 1-palmitoyl-2-oleoylphosphatidylethanolamine was spread as described in Fig. 7 to yield the desired initial surface pressure. Peptide was injected below the lipid surface to a final concentration of 1 µM for the wild-type and Pro → Leu pseudorevertant peptides, and 2 µM for the deletion-mutant peptide. Surface pressure plateau values (Δπ) are plotted versus the initial surface pressure (π_i) for wild-type (0), Pro → Leu pseudorevertant (A), and deletion-mutant (0) peptides.
of comparison, it is notable that the equivalent surface pressures of cell membranes are estimated to fall between these values: viz., near 34 dyn/cm (van Zoelen et al., 1977; Ter-Minassian-Saraga, 1979; Quinn and Dawson, 1969; Jackson et al., 1979). Thus, it appears that the capability of inserting into a lipid phase at physiological surface pressure is related to the intrinsic properties of the native functional signal sequences. These results show that the ability of these signal peptides to interact with phospholipid monolayers indeed correlates with their in vivo activity. The pressure increases due to the functional signal peptides (8-11 dyn/cm) are in the same range as those caused by proteins known to insert into monolayers (Bougis et al., 1981). In contrast, prothrombin, which binds only to the membrane surface, causes a pressure increase in a phospholipid monolayer of 1.9-2.3 dyn/cm (Mayer et al., 1983). These values are almost identical to those obtained for perturbation of the monolayer by the deletion-mutant signal peptide. These findings suggest that the functional and nonfunctional signal peptides interact with the monolayer in different ways. The functional signal peptides insert into the hydrocarbon region of the monolayer, while the nonfunctional peptide binds only to the head groups. In a high-salt buffer (5 mM Tris, 0.15 M NaCl, pH 7.3), the interaction of the deletion-mutant signal peptide fragment with the monolayer is abolished (Briggs et al., 1985), implying that the binding forces are electrostatic in nature. High salt also affects the adsorption of the functional signal peptide fragments. The pressure change decreases by about 5 dyn/cm. Thus, the interactions of the functional signal peptides with phospholipid monolayers are both hydrophobic and electrostatic.
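The extrapolation used to obtain a critical insertion pressure (Fig. 8) can be sketched as a straight-line fit of the pressure increase against the initial monolayer pressure, followed by solving for the initial pressure at which the increase vanishes. The sketch below is only illustrative: the function name and the four data points are hypothetical values chosen to lie on a line, not measurements read from Fig. 8, and a simple least-squares line is an assumption about how such an extrapolation might be done.

```python
def critical_insertion_pressure(initial_pressures, pressure_increases):
    """Fit a least-squares line to (pi_i, delta_pi) points and return the
    initial pressure at which the fitted pressure increase is zero."""
    n = len(initial_pressures)
    if n < 2 or n != len(pressure_increases):
        raise ValueError("need at least two matched (pi_i, delta_pi) points")
    mean_x = sum(initial_pressures) / n
    mean_y = sum(pressure_increases) / n
    sxx = sum((x - mean_x) ** 2 for x in initial_pressures)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(initial_pressures, pressure_increases))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("pressure increase must fall as initial pressure rises")
    # delta_pi = intercept + slope * pi_i = 0  ->  pi_i = -intercept / slope
    return -intercept / slope


# Hypothetical, made-up points (dyn/cm) for illustration only.
pi_i = [20.0, 25.0, 30.0, 35.0]
delta_pi = [9.0, 6.5, 4.0, 1.5]
pi_c = critical_insertion_pressure(pi_i, delta_pi)
print(f"critical insertion pressure: {pi_c:.1f} dyn/cm")
```

With real data the points would scatter about the line, so the extrapolated value carries an uncertainty that this sketch does not estimate.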
The critical insertion pressures yield a rough measurement of the point at which the forces favoring transfer of the peptide from the subphase to the monolayer are balanced by the compressional forces opposing the addition of material to the surface. The critical pressures can be multiplied by the cross-sectional area per peptide molecule, if known, to provide estimates of the energies of insertion of the peptides into the monolayer. The areas of the inserted peptides can be assumed to be between 120 and 450 Å², which represent the extremes of vertical and horizontal orientations of a helical signal peptide in the monolayer, as measured from models. The insertion energies of the functional signal peptides are thus nearly double that of the nonfunctional signal peptide, assuming the same orientation. Assuming that the cell membrane’s equivalent surface pressure is about 30 dyn/cm, one can estimate the excess energy (i.e., energy in addition to that required for peptide insertion) available from this favorable signal peptide-membrane interaction to be 1.5-5 kcal/mol, depending on orientation.* This suggests that signal peptide-lipid interactions contribute significantly to lowering energy barriers to protein translocation. The correlation of surface activity with function among these peptides may arise because of the different tendencies of these peptides to adopt secondary structures, such as the α helix, that minimize free (i.e., nonhydrogen bonded) amide groups, and thus enhance the effective hydrophobicity of the uncharged core region and the amphiphilicity of the signal peptide overall. The distinctions among the peptides are most clearly illustrated by comparing the pseudorevertant and deletion-mutant peptides. Although the hydrophobicities of their side chains are nearly equal, their lengths are the same, and their charged residues are identical, the pseudorevertant has a greater propensity to form secondary structure and interacts more strongly with phospholipid monolayers than does the deletion mutant. It is clearly desirable to correlate the conformations of signal peptides with their behavior in phospholipid environments.
D. Conformations of Isolated Signal Sequences in Membranes
The experiments described in Sections VI,A,B show that two physical properties of the synthetic LamB signal peptides correlate with their in vivo export function: tendency to adopt an α-helical conformation in hydrophobic environments, and tendency to insert into lipid monolayers. These properties may be involved in the same step in the secretion process, or in different steps. An α-helical conformation may be required to generate a structure sufficiently hydrophobic to allow monolayer insertion. Alternatively, these properties may reflect separate roles of the signal sequence in protein secretion. For instance, an α-helical conformation may be necessary for binding to a proteinaceous site, while the ability to interact with lipids may be important for another step in the secretion process.
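The excess-energy estimate given at the close of Section VI,C (a surface-pressure difference multiplied by a molecular area) can be checked with a short unit conversion. The sketch below assumes only the figures quoted in the text (38 and 30 dyn/cm; 120 and 450 Å²); the function name and constants are introduced here for illustration.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
JOULES_PER_KCAL = 4184.0      # thermochemical kilocalorie


def excess_insertion_energy_kcal_per_mol(critical_pressure_dyn_cm,
                                         membrane_pressure_dyn_cm,
                                         area_square_angstrom):
    """Work available per mole of peptide: (pi_c - pi_membrane) * area."""
    # 1 dyn/cm = 1e-3 N/m; 1 square angstrom = 1e-20 m^2
    delta_pi = (critical_pressure_dyn_cm - membrane_pressure_dyn_cm) * 1e-3
    area = area_square_angstrom * 1e-20
    joules_per_molecule = delta_pi * area
    return joules_per_molecule * AVOGADRO / JOULES_PER_KCAL


# Vertical (120 square angstroms) and horizontal (450 square angstroms)
# orientations of a helical signal peptide, as assumed in the text.
low = excess_insertion_energy_kcal_per_mol(38.0, 30.0, 120.0)
high = excess_insertion_energy_kcal_per_mol(38.0, 30.0, 450.0)
print(f"{low:.2f} to {high:.2f} kcal/mol")
```

Under these assumptions the calculation gives roughly 1.4 and 5.2 kcal/mol for the two orientations, in line with the 1.5-5 kcal/mol range quoted above.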
* The critical insertion pressure of the functional signal peptides is 38 dyn/cm, while the estimated equivalent pressure of biological membranes is only 30 dyn/cm. The difference between these values, 8 dyn/cm, can be multiplied by the peptide’s assumed molecular area to yield an estimate of the work that can be done by the peptide on insertion.
We have studied the conformations of the synthetic LamB signal peptides in phospholipid vesicles and monolayers by CD and IR spectroscopy. The CD spectra of the synthetic signal peptides in vesicles (described in Section VI,A) confirm that the functional signal peptides indeed adopt an α-helical conformation in a bilayer, while the nonfunctional peptide does not. Thus, the tendency of the signal peptides to take on an α-helical conformation appears to be directly related to their tendency to interact with lipid structures. The surface tensiometry experiments described in Section VI,C indicate that the synthetic LamB signal peptides can interact with phospholipid monolayers in two ways: by binding electrostatically to the head groups of the lipid and by inserting into the hydrocarbon region of the monolayer. By selection of the surface pressures at which the monolayer CD and IR samples are prepared, it is possible to determine the signal peptide’s conformation in each of these two binding modes. Two sets of samples were prepared. One set, in which the monolayer is spread at a surface pressure higher than the peptide’s critical insertion pressure, allows electrostatic binding, but prevents monolayer insertion. The other set of samples is prepared with the monolayer initially at a surface pressure lower than the critical insertion pressure; these conditions allow both insertion and electrostatic interactions. The procedure for depositing peptide/lipid monolayers on quartz plates or attenuated total reflectance (ATR) crystals for spectroscopy has been described by Cornell (1979, 1982). The CD and IR spectra of the wild-type signal peptide in the presence of phospholipid monolayers are shown in Figs. 9 and 10. The CD spectrum of the peptide interacting with a monolayer below its critical insertion pressure fits to a conformation of 30% α helix and 70% β structure. The IR spectrum confirms the presence of both α helix and β structure. This sample contains both peptide that is inserted into the monolayer and peptide that is electrostatically bound to the head groups, in unknown proportions. The electrostatically bound peptide alone is best fit as predominantly β structure, as judged from the CD spectrum of the peptide interacting with the monolayer at high pressure.
The IR spectrum of this sample also shows a largely β structure, and furthermore indicates that the peptide is highly oriented (Briggs et al., 1986). Since the CD spectrum of the low-pressure peptide/lipid monolayer contains contributions from both inserted and electrostatically bound peptide, and the electrostatically bound peptide is nearly entirely β structure, the inserted peptide is at least 30% α helical, and probably more. For example, if half of the peptide in the low-pressure monolayer is inserted, and half is electrostatically bound, the inserted peptide is 60% α helical and 40% β structure. There is a clear difference in the peptide’s conformation between the electrostatically bound and inserted modes of interaction with the monolayer. If these conformations also correlate with activity in vivo (experiments in progress), the difference may indicate that the signal peptides undergo a conformational change from β structure to α helix during the initial stages of protein secretion (see Section VI,A).
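The reasoning behind the "at least 30%" bound can be written as a two-population mixture: the observed helix content is the weighted average of the helix content of inserted and adsorbed peptide. The sketch below assumes, as the text does, that the adsorbed peptide contributes essentially no helix; the function name and the default argument are illustrative choices, and the inserted fraction is the unknown that the text leaves open.

```python
def inserted_helix_fraction(observed_helix, fraction_inserted, adsorbed_helix=0.0):
    """Helix content of the inserted population, from
    observed = f * inserted + (1 - f) * adsorbed, solved for 'inserted'."""
    if not 0.0 < fraction_inserted <= 1.0:
        raise ValueError("fraction_inserted must be in (0, 1]")
    helix = (observed_helix
             - (1.0 - fraction_inserted) * adsorbed_helix) / fraction_inserted
    if helix > 1.0:
        raise ValueError("inputs imply more than 100% helix; "
                         "the assumed inserted fraction is too small")
    return helix


# All peptide inserted: the lower bound equals the observed 30%.
print(inserted_helix_fraction(0.30, 1.0))
# Half inserted, half adsorbed: the example worked in the text.
print(inserted_helix_fraction(0.30, 0.5))
```

The same relation implies that, under this assumption, at least 30% of the peptide in the low-pressure sample must be inserted, since a smaller inserted fraction would require more than 100% helix.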
[Figure 9: CD plotted against λ (nm), 190-250 nm.]
FIG. 9. CD spectra of the wild-type E. coli LamB synthetic signal peptide in phospholipid monolayers. The experiment was carried out as described in Figs. 7 and 8. The solid line spectrum was obtained for films spread at pressures below the peptide’s critical pressure of insertion, and the broken line spectrum for films spread above the peptide’s critical pressure of insertion. Hence, the former represents inserted plus adsorbed peptide, and the latter is from adsorbed peptide only. Experimental details are reported in Briggs et al. (1986). Copyright 1986 by the American Association for the Advancement of Science.
An even more striking comparison can be made between the wild-type signal peptide’s conformation when adsorbed to the monolayer and its conformation in aqueous solution. In both of these environments, the peptide should be solvated by water, but its conformations are very different. The peptide is 100% β structure when adsorbed to the monolayer, while it is 80% random in aqueous buffer. Thus, it appears that contact with the lipid surface induces substantial amounts of secondary structure in a molecule that takes on little structure in an aqueous environment. This finding implies that the initial binding of a signal sequence to a membrane may induce a particular structure, which may be important to the mechanism of signal-sequence function.
[Figure 10: two IR panels, (A) and (B); horizontal axis, wavenumber (cm⁻¹).]
FIG. 10. IR spectra of the wild-type E. coli LamB synthetic signal peptide in phospholipid monolayers. (A) Peptide adsorbed to the monolayer (film formed above the critical insertion pressure of the peptide). (B) Peptide adsorbed and inserted (film formed below the critical insertion pressure). Characteristic amide I bands for α-helix (or random coil): 1660 cm⁻¹, for β-structure: 1630 cm⁻¹. The amide III band (at lower frequencies) was used to confirm that the 1660 cm⁻¹ band was due to helix and not coil. Experimental details are reported in Briggs et al. (1986). Copyright 1986 by the American Association for the Advancement of Science.
The CD spectra of the peptides in vesicles and monolayers confirm that the functional signal peptides adopt a helical conformation in the lipid phase. The monolayer CD work has allowed dissection of the modes of interaction of the wild-type signal peptide with lipids. When not inserted, the wild-type signal peptide adopts a β structure; as the signal peptides have predominantly random conformations in water, it is likely that interaction with the lipid surface induces this secondary structure. When inserted into the monolayer the wild-type peptide adopts an α-helical conformation. In fact, the affinity of the signal sequences for lipids may be due to their ability to become α helical. Adoption of a helical structure minimizes the surface exposure of the amide groups and thus enhances the hydrophobicity of the core region, and the amphiphilicity of the signal region overall.
E. Interactions with Proteins
The results described above in no way diminish the probability that various proteins are necessary for protein secretion, and that some of these proteins interact with signal sequences. Isolated signal sequences and precursor proteins have been shown to bind to proteins in the RER and to inhibit the translocation and processing of secretory proteins. For example, experiments of Rapoport and colleagues (Bendzko et al., 1982; Prehn et al., 1980, 1981) demonstrating the existence of a signal-peptide receptor in RER membranes were mentioned above (Section III,C,4). Precursors of secretory proteins were shown to bind to specific and saturable signal-peptide receptors in the RER membrane (Prehn et al., 1980). Binding is independent of ribosomes and is eliminated by prior treatment of the membrane with protease. Precursors of carp proinsulin and human placental lactogen competed for the receptors, while nonsecretory proteins did not, showing that binding is saturable and specific for exported proteins (Prehn et al., 1981).
Experiments in which precursor proteins were bound to ER membranes before addition of a cell-free translation system showed that the receptor sites identified in these studies of previously synthesized proteins are identical to those used during cotranslational export. The newly synthesized nascent chains were not translocated or processed, presumably because the signal receptors were blocked by the bound precursors (Prehn et al., 1981). Synthetic signal sequences have also been found to bind to RER membranes. Habener et al. (1978) reported that the chemically synthesized signal sequence of bovine proparathyroid hormone associated with RER microsomes to a greater extent than did a peptide fragment of the mature proparathyroid hormone. Austen and colleagues (Austen and Ridd, 1983; Austen et al., 1984; Robinson et al., 1985) have designed and
synthesized a “consensus” signal peptide that incorporates the common structural features of known signal sequences. The signal peptide associated with RER membranes from which bound ribosomes had been removed. It did not bind to smooth ER membranes. A control peptide did not bind to microsomal membranes. Binding of the signal peptide was tight (Kd = 1 X lo-’ M) and saturable. Robinson et al. (1985) have identified the receptor as a 45,000-Da microsomal protein by cross-linking experiments. The protein is released from the membrane only by high concentrations of detergents, indicating that this is an integral membrane protein. Austen et al. (1984) found that the consensus signal peptide inhibits protein translocation into RER microsomes. Synthetic preproparathyroid hormone signal sequence also decreased the translocation and processing of four prehormones in a cell-free translation system (Majzoub et al., 1980). Synthesis of the prehormones appears to be unaffected. Addition of a control peptide failed to inhibit processing. A synthetic signal peptide also affected secretion in vivo. Koren et al. (1983) injected synthetic mouse light-chain immunoglobulin signal sequence into Xenopus oocytes. The peptide competitively inhibited localization of exported and membrane proteins. Cytoplasmic proteins were unaffected. The inhibition is time-dependent, rising to a maximum of 40% at 1 hour after injection, and returning to zero after 3 hours. No inhibition was observed when signal peptide was allowed to react with anti-signal peptide antibodies prior to injection into the cell. These observations are consistent with blockage of export sites or SRP binding by the signal peptide, followed by its degradation either in the cytoplasm or in the membrane. The signal peptide also affected the rate of secretion of proteins that have been translocated into the ER. 
The transfer of the proteins from the ER to the Golgi apparatus, and via the Palade pathway to the medium, is accelerated. The acceleration is a specific effect of the injected signal peptide, as injection of detergents or other peptides failed to accelerate secretion. Prior treatment of the signal peptide with anti-signal peptide antibody also abolished the acceleration of secretion. The authors suggest that the signal peptide has a regulatory role in the posttranslocational steps of protein secretion. The ability of the LamB synthetic signal sequences to inhibit protein translocation in vitro correlates with their activity in vivo (L. Chen and P. C. Tai, personal communication). The wild-type and mutant E. coli LamB signal sequences described above (Section III,H) were added to the cell-free translation/translocation system of Chen et al. (1985). The wild-type peptide blocks translocation of OmpA and alkaline phosphatase; 50% inhibition is reached at a peptide concentration of 1-2 µM.
The Pro → Leu pseudorevertant peptide, which is functional in vivo, is at least as effective as the wild type at inhibiting translocation. The other functional signal peptide, the Gly → Cys pseudorevertant, has somewhat less inhibitory effect. In contrast, the nonfunctional deletion-mutant signal peptide has little effect on translocation, even at ~4 µM. Since addition of the signal peptides after the precursor proteins have been translocated into membrane vesicles does not expose the mature protein product to added protease, these inhibitory effects are not due to membrane disruption.
VII. RECAPITULATION
From the work reviewed in the above sections, we can summarize what is known regarding roles of the signal sequence. In eukaryotic systems, secretion of a protein across the ER requires the presence of a functional signal sequence and participation of SRP, the membrane and associated as yet undefined export apparatus, the SRP receptor, and a signal peptidase. Biochemical evidence derived from experiments in in vitro translocation systems generally supports the signal hypothesis (see Section II,C). Although there is less biochemical information on secretion in prokaryotes, genetic data indicate that the process is likely to be similar to that in eukaryotes. The development of in vitro translocation assays for bacterial systems (Muller and Blobel, 1984a,b; Rhoads et al., 1984; Chen et al., 1985) should allow a more detailed analysis of the biochemistry of prokaryotic secretion in the near future. For purposes of discussing the roles of the signal sequence, the mechanism of secretion can be viewed as separable into three parts. [Others have suggested similar divisions of the secretion process. See for example, Randall and Hardy (1984a,b).] First the complex consisting of the ribosome, mRNA, and the nascent secretory protein must be directed to the appropriate membrane.
The primary actor in this step in eukaryotes is the SRP, which serves as a delivery system (viz., interaction with SRP enhances the probability and rate of formation of a productive association with the membrane). The SRP dissociates from the synthetic machinery once this step is complete, and is not needed for subsequent parts of the secretion process. One SRP molecule may, in fact, be sufficient to target several adjacent polysomal ribosomes to a membrane as a group. Thus, the ratio of SRP to ribosomes synthesizing secretory proteins is less than one, as noted by Gilmore et al. (1982). The role of the signal sequence in this step is probably confined to
labeling the nascent protein as a secretory one, and allowing recognition and binding by SRP. However, it may also perform a regulatory function, as the differences in signal sequences could result in varying affinities for SRP, which would in turn translate into nonuniform rates of protein secretion. Once the synthetic machinery/nascent protein complex has been delivered to the membrane by SRP, it interacts with the membrane to form a translocation-competent species. It is in this process that we envision the signal sequence to play its most active and crucial part. In Section VIII we propose a model for the initial interactions of a signal sequence with the membrane. These interactions bring the signal sequence and nascent protein into the proper orientation for membrane binding and effect the initial entry of the protein into the membrane interior, readying it for the next step, which is translocation. The translocation step is probably the point at which prokaryotic and eukaryotic secretion differ most. The energy for this process may derive from different sources: from the energy of protein synthesis in eukaryotes, and from protein synthesis and/or ATP hydrolysis and/or the membrane potential in prokaryotes. [In fact there is evidence for more than one secretion pathway in E. coli. The degree of coupling between translation and translocation may also be different in prokaryotes and eukaryotes (Section V,C).] We predict that the signal sequence plays only a minor part in the translocation process. It may serve as an anchor to keep the ribosome in contact with the membrane. In some cases, however, it has been shown that removal of the signal sequence by signal peptidase occurs before translocation is complete (Sections IV,B and V,C), implying that the presence of a signal sequence is not required in this step. Posttranslational or domain translocation (Section V,C) requires a more active role for the signal sequence.
As first postulated by Wickner (1980), such a mechanism may require that the signal sequence interact with the structural sequence of the protein in order to influence its conformation and membrane binding. Indirect evidence for such an interaction comes from genetic studies of mutations in the signal sequences of the E. coli proteins LamB and MBP that can be suppressed by second-site mutations in the structural gene (Silhavy et al., 1983; Bankaitis et al., 1984). In sum, the most active role played by the signal sequence in the export process appears to be in the second step, where the initial encounter of the export-competent assembly with the membrane occurs. In Section VIII we present a model for this encounter based largely on the biophysical studies presented in Section VI.
FIG. 11. Model for initial interaction of a signal sequence with a membrane. The steps are described in the text. Note that the signal sequence is viewed as emerging from the ribosome into an aqueous environment (step 1), then interacting with the charged surface of the membrane (step 2), and subsequently inserting into the hydrophobic region of the membrane (steps 3 and 4). Conformations adopted by the signal sequence in each of these steps are proposed to be random (aqueous), β-like (membrane surface), and α helical (inserted). A transient state (step 3) is suggested wherein the extended, β-like signal peptide inserts then adopts a helical conformation. Associations with various components of the export apparatus may alter these simplified steps in vivo. Modified from Briggs et al. (1986); copyright 1986 by the American Association for the Advancement of Science.
VIII. A MODEL FOR THE INITIAL INTERACTIONS OF SIGNAL SEQUENCES WITH THE MEMBRANE
Presented below and illustrated in Fig. 11 is a model for the events that occur in vivo when a signal sequence first encounters the membrane.
Step 1. At the membrane, the signal sequence binds electrostatically (via the basic residues) to the inner-membrane surface and adopts a folded β structure with a turn near residues -7 to -10 (Inouye and Halegoua, 1980). This structure has hydrophobic character overall, and an approximate length of 34 Å (10 residues at a rise per residue of 3.4 Å).
Step 2. The signal peptide inserts into the membrane. We propose that the polar c region of the signal sequence resides transiently in the aqueous region of the periplasmic or lumenal side of the membrane.
This arrangement seems physically reasonable, and is supported by genetic data, since charged residues can be tolerated in this region, but not in the adjacent hydrophobic core (Silhavy et al., 1983).
Step 3. The hydrophobic environment of the membrane causes a conformational change to an α- (or 3₁₀) helical conformation, in which intramolecular hydrogen bonding is maximized. As a result of this conformational change, a segment of the mature protein enters the membrane, provided that the basic residues continue to anchor the amino terminus on the inner surface of the membrane. In a helical conformation, the signal sequence spans the hydrophobic part of the bilayer, and the cleavage site falls on the opposite face of the membrane.
This mechanism does not exclude the participation of proteins like the SRP and SRP receptor, as specified in the signal hypothesis. It does, however, require an exposed signal sequence, and thus is probably not consistent with the membrane trigger hypothesis, in which the signal sequence has a central role in the folding of the precursor protein. It can be accommodated in the domain model or the amphiphilic tunnel hypothesis as a first step in a folding process that occurs at or in the membrane. Like the helical hairpin hypothesis and the direct transfer model, it proposes that secretion is driven by a favorable free energy of transfer of the signal sequence from the cytoplasm to the membrane, followed by another low-energy step: the conformational change from a β sheet to an α helix. A β to α transition has been proposed in other models (Austen, 1979; Steiner et al., 1980; Rosenblatt et al., 1980). Thus, several features of this model are not new. For the first time, however, there are experimental data that support these features.
IX. SIGNAL SEQUENCES AS MEMBRANE-INTERACTING SEQUENCES
Signal sequences are characterized by exceptionally diverse and numerous roles; they are essential participants in the multistep process of protein secretion. While many details of this process are still poorly understood, it is clear that the involvement of the signal sequence includes recognition by proteins, interactions with membranes and/or membrane-resident components, facilitation of translocation, and specific cleavability. In these steps, the signal sequence probably undergoes conformational changes which themselves are required features of the mechanism. Yet despite these multifaceted functions of signal sequences, they are highly variable: between species, between related proteins, etc. Their sequence variability suggests remarkably few constraints on their evolutionary rate of change, whereas the apparent plethora of their functions
MARTHA S. BRIGGS AND LILA M. GIERASCH
would argue for highly constrained evolution. Figure 12 shows a comparison of the divergence rate of the mature chains of insulins from various species and their signal sequences, and illustrates that signal sequences do diverge rapidly with respect to other sequences. The explanation for this apparent paradox is that specific sequences are not required for signal peptide functions; it is instead the character of the signal sequence that must be preserved. Indeed, as discussed in detail in Section III of this review, the hydrophobic, ionic, polar, and conformational nature of the residues occurring in different regions of various signal sequences show strong homology, and the lengths of the characteristic regions of signal sequences are quite uniform. Table III, showing several insulin signal sequences, demonstrates that the substitutions observed between species are conservative ones.

TABLE III
Signal Sequences of Insulins^a

Species        Sequence
Human          MALWMRLLPLLALLALWGPDPAAA/
Monkey         MALWMRLLPLLALLALWGPDPAPA/
Dog            MALWMRLLPLLALLALWAPAPTRA/
Rat I          MALWMRFLPLLALLVLWEPKPAQA/
Rat II         MALWIRFLPLLALLILWEPRPAQA/
Hamster        MTLWMRLLPLLTLLVLWEPNPANA/
Chicken        MALWIRSLPLLALLVFSGPGTSYA/
Angler fish    MAALWLQSFSLLVLLVVSWPGSQA/
Salmon         MALWLQAASLLVLLALSPGVDA/
Carp           MAVWIQAGALLFLLAVSSVNA/
Hagfish        MALSPFLAAVIPLVLLLSRAPPSADT/

^a All sequences from Watson (1984). The cleavage site is indicated by a slash (/).

Signal sequences represent a larger class of polypeptide sequences which share the characteristic of performing their functions by virtue of gross structural features and physical behavior in distinct environments. Other examples most likely include transmembrane sequences, viral fusion sequences, membrane entry sequences in toxins, and signal peptides of mitochondria and chloroplasts. These sequences are unlike segments of globular proteins or hormones or receptors, which are all involved in specific recognition processes (even if only required by packing within the protein structure). What this class of sequence has in common is the capacity to interact with membranes, usually by virtue of combined hydrophobic and electrostatic interactions, and to function by adopting conformations that are environmentally determined.

The characteristics of these membrane-interacting sequences lead to two expectations, both already supported by preliminary evidence. First, since their functions do not depend strongly on specific interactions, one can fruitfully study their behavior using biophysical approaches in model systems that provide gross features of the in vivo environment. Our initial efforts to correlate biophysical studies of synthetic signal sequences with their ability to facilitate export in vivo support this expectation and encourage further work both on signal peptides and on other examples of membrane-interacting sequences. Second, signal sequences and other membrane-interacting sequences should be ready targets for genetic manipulation and de novo design. Many initial forays into these areas have been successfully accomplished (Adams and Rose, 1985; Davis and Model, 1985). It is also possible to alter a given native sequence to localize a normally cytoplasmic protein in a different cellular compartment (Hurt et al., 1984; van den Broek et al., 1985). The idea that multiple signals may exist in one protein, such as opsin (Friedlander and Blobel, 1985), which direct the topology of its incorporation in a membrane, leads to obvious applications in design of desired alterations in topology.
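The argument of this section, that signal sequences preserve gross character while their exact sequence drifts, can be illustrated with a few of the Table III sequences. The following is only a rough sketch and not an analysis from the review: the set of residues counted as hydrophobic, the ungapped position-by-position identity measure, and the helper names `longest_hydrophobic_run` and `identity_to` are all assumptions made for this example.

```python
# Illustrative sketch: compare insulin signal sequences (Table III) by
# gross character instead of by exact sequence.
# ASSUMPTION: this hydrophobic residue set is a simple choice made for the
# example; it is not a hydrophobicity scale used in the review.
HYDROPHOBIC = set("AVLIFMWC")

# Sequences copied from Table III (cleavage-site slash omitted).
SIGNALS = {
    "Human": "MALWMRLLPLLALLALWGPDPAAA",
    "Rat I": "MALWMRFLPLLALLVLWEPKPAQA",
    "Chicken": "MALWIRSLPLLALLVFSGPGTSYA",
    "Carp": "MAVWIQAGALLFLLAVSSVNA",
}


def longest_hydrophobic_run(seq):
    """Length of the longest uninterrupted stretch of hydrophobic residues."""
    best = run = 0
    for residue in seq:
        run = run + 1 if residue in HYDROPHOBIC else 0
        best = max(best, run)
    return best


def identity_to(reference, seq):
    """Fraction of identical residues, compared position by position with no
    gaps, over the length of the shorter sequence (a deliberately crude measure)."""
    n = min(len(reference), len(seq))
    same = sum(a == b for a, b in zip(reference, seq))
    return same / n


for species, seq in SIGNALS.items():
    print(species, len(seq), longest_hydrophobic_run(seq),
          round(identity_to(SIGNALS["Human"], seq), 2))
```

With these assumptions, the identity to the human sequence falls off for the more distant species while the length of the hydrophobic core run changes very little, which is the qualitative pattern the text describes. A proper comparison would use an alignment with gaps.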
Perhaps the most puzzling aspect of these emerging ideas about signal sequences is that signal sequences must also serve as the labels that inform the cellular machinery that the protein being synthesized must be delivered to the appropriate location. This step is postulated to occur via the SRP for export across the ER in eukaryotes. The SRP and other recognition assemblies must be able to bind specifically to sequences that share only the gross features described above (hydrophobicity, charge, conformation). This is a new concept in protein-based binding, and suggests that the proteins that interact with such sequences will be found to make use of novel modes of binding.

ACKNOWLEDGMENTS

We thank P. C. Tai and Tom Rapoport for critical reading of the manuscript and Gunnar von Heijne for sending us unpublished data. We are grateful to Peter Walter and Tom Silhavy for many helpful discussions. The willingness of many to send us reprints and preprints is greatly appreciated. The writing of this review was supported, in part, by grants from the National Institutes of Health (GM 27616 and GM 34962). LMG is a Fellow of the A.P. Sloan Foundation (1984-86).
REFERENCES

Adams, G. A., and Rose, J. K. (1985). Cell 41, 1007-1015.
Akiyama, Y., and Ito, K. (1985). EMBO J. 4, 3351-3356.
Amar-Costesec, A., Todd, J. A., and Kreibich, G. (1984). J. Cell Biol. 99, 2247-2253.
Andrews, D. W., Walter, P., and Ottensmeyer, F. P. (1985). Proc. Natl. Acad. Sci. U.S.A. 82, 785-789.
Austen, B. M. (1979). FEBS Lett. 103, 308-313.
Austen, B. M., and Ridd, D. H. (1981). Biochem. Soc. Symp. 46, 235-258.
Austen, B. M., and Ridd, D. H. (1983). Biochem. Soc. Trans. 11, 160-161.
Austen, B. M., Hermon-Taylor, J., Kaderbhai, M. A., and Ridd, D. H. (1984). Biochem. J. 224, 317-325.
Bakker, E. P., and Randall, L. L. (1984). EMBO J. 3, 895-900.
Bankaitis, V. A., and Bassford, P. J., Jr. (1985). J. Bacteriol. 161, 169-178.
Bankaitis, V. A., Rasmussen, B. A., and Bassford, P. J., Jr. (1984). Cell 37, 243-252.
Bassford, P. J., Silhavy, T. J., and Beckwith, J. R. (1979). J. Bacteriol. 139, 19-31.
Bassuner, R., Huth, A., Manteuffel, R., and Rapoport, T. A. (1983). Eur. J. Biochem. 133, 321-326.
Baty, D., and Lazdunski, C. (1979). Eur. J. Biochem. 102, 503-507.
Baty, D., Mercereau-Puijalon, O., Perrin, D., Kourilsky, P., and Lazdunski, C. (1981). Gene 16, 79-87.
Bedouelle, H., and Hofnung, M. (1981a). In “Membrane Transport and Neuroreceptors,” pp. 399-403. Liss, New York.
Bedouelle, H., and Hofnung, M. (1981b). In “Intermolecular Forces” (B. Pullman, ed.), pp. 361-372. Reidel, Dordrecht, The Netherlands.
Bedouelle, H., Bassford, P. J., Fowler, A. V., Zabin, I., Beckwith, J., and Hofnung, M. (1980). Nature (London) 285, 78-81.
Bendzko, P., Prehn, S., Pfeil, W., and Rapoport, T. A. (1982). Eur. J. Biochem. 123, 121-126.
Benson, S. A., and Silhavy, T. J. (1983). Cell 32, 1325-1335.
Berzofsky, J. A. (1985). Science 229, 932-940.
Blobel, G. (1980). Proc. Natl. Acad. Sci. U.S.A. 77, 1496-1500.
Blobel, G., and Dobberstein, B. (1975a). J. Cell Biol. 67, 835-851.
Blobel, G., and Dobberstein, B. (1975b). J. Cell Biol. 67, 852-862.
Blobel, G., and Sabatini, D. D. (1971). Biomembranes 2, 193-195.
Bogdanov, M. V., Kulaev, I. S., and Nesmayanova, M. A. (1984). Bioorg. Membr. (Moscow) 1, 495-502.
Bogdanov, M. V., Suzina, N. E., and Nesmayanova, M. A. (1985a). Bioorg. Membr. (Moscow) 2, 367-376.
Bogdanov, M. V., Tsfasman, I. M., and Nesmayanova, M. A. (1985b). Bioorg. Membr. (Moscow) 2, 623-629.
Bougis, P., Rochat, H., Piéroni, G., and Verger, R. (1981). Biochemistry 20, 4915-4920.
Braell, W. A., and Lodish, H. F. (1982). J. Biol. Chem. 257, 4578-4582.
Brennan, M. D., Warren, T. G., and Mahowald, A. P. (1980). J. Cell Biol. 87, 516-520.
Brickman, E. R., Oliver, D. B., Garwin, J. L., Kumamoto, C., and Beckwith, J. (1984). Mol. Gen. Genet. 196, 24-27.
Briggs, M. S. (1986). Ph.D. Thesis, Yale University, New Haven, Connecticut.
Briggs, M. S., and Gierasch, L. M. (1984). Biochemistry 23, 3111-3114.
Briggs, M. S., Gierasch, L. M., Zlotnick, A., Lear, J. D., and DeGrado, W. F. (1985). Science 228, 1096-1099.
Briggs, M. S., Cornell, D. G., Dluhy, R. A., and Gierasch, L. M. (1986). Science, in press.
Brown, P. A., Halvorson, H. O., Raney, P., and Perlman, D. (1984). Mol. Gen. Genet. 197, 351-357.
Carlson, M., and Botstein, D. (1982). Cell 28, 145-154.
Caulfield, M. P., Horiuchi, S., Tai, P. C., and Davis, B. D. (1984). Proc. Natl. Acad. Sci. U.S.A. 81, 7772-7776.
Caulfield, M. P., Furlong, D., Tai, P. C., and Davis, B. D. (1985). Proc. Natl. Acad. Sci. U.S.A. 82, 4031-4035.
Cerretti, D. P., Dean, D., Davis, G. R., Bedwell, D. M., and Nomura, M. (1983). Nucleic Acids Res. 11, 2599-2616.
Chen, L., and Tai, P. C. (1985). Proc. Natl. Acad. Sci. U.S.A. 82, 4384-4388.
Chen, L., Rhoads, D., and Tai, P. C. (1985). J. Bacteriol. 161, 973-980.
Chou, P. Y., and Fasman, G. D. (1974a). Biochemistry 13, 211-222.
Chou, P. Y., and Fasman, G. D. (1974b). Biochemistry 13, 222-245.
Colacicco, G. (1970). Lipids 5, 636-649.
Coleman, J., Inukai, M., and Inouye, M. (1985). Cell 43, 351-360.
Cornell, D. G. (1979). J. Colloid Interface Sci. 70, 167-180.
Cornell, D. G. (1982). J. Colloid Interface Sci. 88, 536-545.
Dalbey, R., and Wickner, W. (1985). J. Biol. Chem. 260, 15925-15931.
Daniels, C. J., Bole, D. G., Quay, S. C., and Oxender, D. L. (1981). Proc. Natl. Acad. Sci. U.S.A. 78, 5396-5400.
Dassa, E., and Boquet, P.-L. (1981). Mol. Gen. Genet. 181, 192-200.
Date, T., Zwizinski, C., Ludmerer, S., and Wickner, W. (1980a). Proc. Natl. Acad. Sci. U.S.A. 77, 827-831.
Date, T., Goodman, J. M., and Wickner, W. T. (1980b). Proc. Natl. Acad. Sci. U.S.A. 77, 4669-4673.
Davis, N. G., and Model, P. (1985). Cell 41, 607-614.
Dierstein, R., and Wickner, W. (1985). J. Biol. Chem. 260, 15919-15924.
Ding, J., Lory, S., and Tai, P. C. (1985). Gene 33, 313-321.
DiRienzo, J. M., and Inouye, M. (1979). Cell 17, 155-161.
Emr, S. D., and Silhavy, T. J. (1983). Proc. Natl. Acad. Sci. U.S.A. 80, 4599-4603.
Emr, S. D., Hedgpeth, J., Clément, J.-M., Silhavy, T. J., and Hofnung, M. (1980). Nature (London) 285, 82-85.
Emr, S. D., Hanley-Way, S., and Silhavy, T. J. (1981). Cell 23, 79-88.
Enequist, H. G., Hirst, T. R., Harayama, S., Hardy, S. J. S., and Randall, L. L. (1981). Eur. J. Biochem. 116, 227-233.
Engelman, D. M., and Steitz, T. A. (1981). Cell 23, 411-422.
Engelman, D. M., and Steitz, T. A. (1984). In “The Protein Folding Problem” (D. B. Wetlaufer, ed.), Vol. 89, pp. 87-114. Am. Assn. Adv. Sci., Washington, D.C.
Fendler, J. H. (1982). “Membrane Mimetic Chemistry.” Wiley (Interscience), New York.
Ferenci, T., and Randall, L. L. (1979). J. Biol. Chem. 254, 9979-9981.
Ferro-Novick, S., Novick, P., Field, C., and Schekman, R. (1984a). J. Cell Biol. 98, 35-43.
Ferro-Novick, S., Hansen, W., Schauer, I., and Schekman, R. (1984b). J. Cell Biol. 98, 44-53.
Ferro-Novick, S., Honma, M., and Beckwith, J. (1984c). Cell 38, 211-217.
Friedlander, M., and Blobel, G. (1985). Nature (London) 318, 338-343.
Fujimoto, Y., Watanabe, Y., Uchida, M., and Ozaki, M. (1984). J. Biochem. 96, 1125-1131.
Gilmore, R., and Blobel, G. (1983). Cell 35, 677-685.
Gilmore, R., and Blobel, G. (1985). Cell 42, 497-505.
Gilmore, R., Walter, P., and Blobel, G. (1982). J. Cell Biol. 95, 470-477.
Goodman, J. M., Watts, C., and Wickner, W. (1981). Cell 24, 437-441.
Greenfield and Fasman (1969). Biochemistry 8, 4108-4115.
Grossman, A. R., Bartlett, S. G., Schmidt, G. W., and Chua, N.-H. (1980). Ann. N.Y. Acad. Sci. 343, 266-274.
Gundelfinger, E. D., Di Carlo, M., Zopf, D., and Melli, M. (1984). EMBO J. 3, 2325-2332.
Habener, J. F., Rosenblatt, M., Kemper, B., Kronenberg, H. M., Rich, A., and Potts, J. T., Jr. (1978). Proc. Natl. Acad. Sci. U.S.A. 75, 2616-2620.
Hahn, V., Winkler, J., Rapoport, T. A., Liebscher, D.-H., Coutelle, Ch., and Rosenthal, S. (1983). Nucleic Acids Res. 11, 4541-4552.
Hall, M. N., Gabay, J., and Schwartz, M. (1983). EMBO J. 2, 15-19.
Hansen, W., Garcia, P. D., and Walter, P. (1986). Cell 45, 397-406.
Hay, R., Bohni, P., and Gasser, S. (1984). Biochim. Biophys. Acta 779, 65-87.
Hayashi, S., Chang, S.-Y., Chang, S., Giam, C.-Z., and Wu, H. C. (1985). J. Biol. Chem. 260, 5753-5759.
Hortin, G., and Boime, I. (1980). Proc. Natl. Acad. Sci. U.S.A. 77, 1356-1360.
Hortin, G., and Boime, I. (1981a). J. Biol. Chem. 256, 1491-1494.
Hortin, G., and Boime, I. (1981b). Cell 24, 453-461.
Hortsch, M., and Meyer, D. I. (1985). Eur. J. Biochem. 150, 559-564.
Hortsch, M., Avossa, D., and Meyer, D. I. (1985). J. Biol. Chem. 260, 9137-9145.
Hurt, E. C., Pesold-Hurt, B., and Schatz, G. (1984). FEBS Lett. 178, 306-310.
Hussain, M., Ozawa, Y., Ichihara, S., and Mizushima, S. (1982). Eur. J. Biochem. 129, 233-239.
Ichihara, S., Beppu, N., and Mizushima, S. (1984). J. Biol. Chem. 259, 9853-9857.
Iida, A., Groarke, J. M., Park, S., Thom, J., Zabicky, J. H., Hazelbauer, G. L., and Randall, L. L. (1985). EMBO J. 4, 1875-1880.
Inouye, M., and Halegoua, S. (1980). CRC Crit. Rev. Biochem. 7, 339-371.
Inouye, S., Soberon, X., Franchesini, T., Nakamura, K., Itakura, K., and Inouye, M. (1982). Proc. Natl. Acad. Sci. U.S.A. 79, 3438-3441.
Ito, K. (1984). Mol. Gen. Genet. 197, 204-208.
Ito, K., Wittekind, M., Nomura, M., Shiba, K., Yura, T., Miura, A., and Nashimoto, H. (1983). Cell 32, 789-797.
Ito, K., Cerretti, D. P., Nashimoto, H., and Nomura, M. (1984). EMBO J. 3, 2319-2324.
Jackson, R. C. (1983). In “Methods in Enzymology” (S. Colowick and N. Kaplan, eds.), Vol. 96, pp. 784-794. Academic Press, New York.
Jackson, R. C., and Blobel, G. (1977). Proc. Natl. Acad. Sci. U.S.A. 74, 5598-5602.
Jackson, R. C., and White, W. R. (1981). J. Biol. Chem. 256, 2545-2550.
Jackson, R. L., Pattus, F., and Demel, R. A. (1979). Biochim. Biophys. Acta 556, 369-387.
Jain, M. K., Streb, M., Rogers, J., and DeHaas, G. H. (1984). Biochem. Pharmacol. 33, 2541-2551.
Josefsson, L.-G., and Randall, L. L. (1981). Cell 25, 151-157.
Kadonaga, J. T., Gautier, A. E., Straw, D. R., Charles, A. D., Edge, M. D., and Knowles, J. R. (1984). J. Biol. Chem. 259, 2149-2154.
Kadonaga, J. T., Pluckthun, A., and Knowles, J. R. (1986). J. Biol. Chem. 260, 16192-16199.
Katakai, R., and Iizuka, Y. (1984). J. Am. Chem. Soc. 106, 5715-5718.
Koren, R., Burstein, Y., and Soreq, H. (1983). Proc. Natl. Acad. Sci. U.S.A. 80, 7205-7209.
Koshland, D., Sauer, R. T., and Botstein, D. (1982). Cell 30, 903-914.
Kreibich, G., Ulrich, B. L., and Sabatini, D. D. (1978). J. Cell Biol. 77, 464-487.
Kreibich, G., Marcantonio, E. E., and Sabatini, D. D. (1983). In “Methods in Enzymology” (S. Colowick and N. Kaplan, eds.), Vol. 96, pp. 520-531. Academic Press, New York.
Kreil, G. (1981). Annu. Rev. Biochem. 50, 317-348.
Kumamoto, C. A., and Beckwith, J. (1983). J. Bacteriol. 154, 253-260.
Kumamoto, C. A., and Beckwith, J. (1985). J. Bacteriol. 163, 267-274.
Kumamoto, C. A., Oliver, D. B., and Beckwith, J. (1984). Nature (London) 308, 863-864.
Kurzchalia, T. V., Wiedmann, M., Girshovich, A. S., Bochkareva, E. S., Bielka, H., and Rapoport, T. A. (1986). Nature (London) 320, 634-636.
Lane, C. D., Colman, A., Mohun, T., Morser, J., Champion, J., Kourides, I., Craig, R., Higgins, S., James, T. C., Applebaum, S. W., Ohlsson, R. I., Paucha, E., Houghton, M., Matthews, J., and Mifflin, B. J. (1980). Eur. J. Biochem. 111, 225-235.
Lauffer, L., Garcia, P. B., Harkins, R. N., Coussens, L., Ullrich, A., and Walter, P. (1985). Nature (London) 318, 334-338.
Lee, C. A., Fournier, M. J., and Beckwith, J. (1985). J. Bacteriol. 161, 1156-1161.
Lee, S. Y., Bailey, S. C., and Apirion, D. (1978). J. Bacteriol. 133, 1015-1023.
Lewis, R. M., Furie, B. C., and Furie, B. (1983). Biochemistry 22, 948-954.
Lin, J. J. C., Kanazawa, H., Ozols, J., and Wu, H. C. (1978). Proc. Natl. Acad. Sci. U.S.A. 75, 4891-4895.
Lingappa, V. R., Lingappa, J. R., and Blobel, G. (1979). Nature (London) 281, 117-121.
Lingappa, V. R., Chaidez, J., Yost, C. S., and Hedgpeth, J. (1984). Proc. Natl. Acad. Sci. U.S.A. 81, 456-460.
Liss, L. R., and Oliver, D. B. (1986). J. Biol. Chem. 261, 2299-2304.
Liss, L. R., Johnson, B. L., and Oliver, D. B. (1985). J. Bacteriol. 164, 925-928.
Lively, M. O., and Walsh, K. A. (1983). J. Biol. Chem. 258, 9488-9495.
Magner, J. A. (1982). J. Theor. Biol. 99, 831-833.
Majzoub, J. A., Rosenblatt, M., Fennick, B., Maunus, R., Kronenberg, H. M., Potts, J. T., Jr., and Habener, J. F. (1980). J. Biol. Chem. 255, 11478-11483.
Mandel, G., and Wickner, W. (1979). Proc. Natl. Acad. Sci. U.S.A. 76, 236-240.
Marcantonio, E. E., Grebenau, R. C., Sabatini, D. D., and Kreibich, G. (1982). Eur. J. Biochem. 124, 217-222.
Marcantonio, E. E., Amar-Costesec, A., and Kreibich, G. (1984). J. Cell Biol. 99, 2254-2259.
Matteucci, M., and Lipetsky, H. (1986). Biotechnology 4, 51-55.
Mayer, L. D., Nelsestuen, G. L., and Brockman, H. L. (1983). Biochemistry 22, 316-321.
Meek, R. L., Walsh, K. A., and Palmiter, R. D. (1982). J. Biol. Chem. 257, 12245-12251.
Meyer, D. I. (1985). EMBO J. 4, 2031-2033.
Meyer, D. I., Krause, E., and Dobberstein, B. (1982). Nature (London) 297, 647-650.
Michaelis, S., Guarente, L., and Beckwith, J. (1983). J. Bacteriol. 154, 356-365.
Milstein, C., Brownlee, G. G., Harrison, T. M., and Mathews, M. B. (1972). Nature (London) New Biol. 239, 117-120.
Moreno, F., Fowler, A. V., Hall, M., Silhavy, T. J., Zabin, I., and Schwartz, M. (1980). Nature (London) 286, 356-359.
Mostov, K. E., DeFoor, P., Fleischer, S., and Blobel, G. (1981). Nature (London) 292, 87-88.
Mueckler, M., and Lodish, H. F. (1986). Cell 44, 629-637.
Müller, M., and Blobel, G. (1984a). Proc. Natl. Acad. Sci. U.S.A. 81, 7421-7425.
Müller, M., and Blobel, G. (1984b). Proc. Natl. Acad. Sci. U.S.A. 81, 7737-7741.
Müller, M., Ibrahimi, I., Chang, C. N., Walter, P., and Blobel, G. (1982). J. Biol. Chem. 257, 11860-11863.
Nagaraj, R. (1984). FEBS Lett. 165, 79-82.
Nesmayanova, M. A. (1982). FEBS Lett. 142, 189-193.
Novak, P., Ray, P. H., and Der, I. K. (1986). J. Biol. Chem. 261, 420-427.
Novick, P., Field, C., and Schekman, R. (1980). Cell 21, 205-215.
Novick, P., Ferro, S., and Schekman, R. (1981). Cell 25, 461-469.
Ohno-Iwashita, Y., and Wickner, W. (1983). J. Biol. Chem. 258, 1895-1900.
Ohno-Iwashita, Y., Wolfe, P., Ito, K., and Wickner, W. (1984). Biochemistry 23, 6178-6184.
Oliver, D. B. (1985). J. Bacteriol. 161, 285-291.
Oliver, D. B., and Beckwith, J. (1981). Cell 25, 765-772.
Oliver, D. B., and Beckwith, J. (1982a). J. Bacteriol. 150, 686-691.
Oliver, D. B., and Beckwith, J. (1982b). Cell 30, 311-319.
Oliver, D. B., and Liss, L. R. (1985). J. Bacteriol. 161, 817-819.
Oxender, D. L., Landick, R., Nazos, P., and Copeland, B. R. (1984). In “Microbiology-1984” (L. Leive and D. Schlessinger, eds.), pp. 4-7. Am. Soc. Microbiol., Washington, D.C.
Pagès, J.-M. (1982). Eur. J. Biochem. 122, 381-386.
Pagès, J.-M., and Lazdunski, C. (1981). FEMS Microbiol. Lett. 12, 65-69.
Pagès, J.-M., and Lazdunski, C. (1982a). FEBS Lett. 149, 51-54.
Pagès, J.-M., and Lazdunski, C. (1982b). Eur. J. Biochem. 124, 561-566.
Pagès, J.-M., Piovant, M., Varenne, S., and Lazdunski, C. (1978). Eur. J. Biochem. 86, 589-602.
Pagès, J.-M., Anba, J., Bernadac, A., Shinagawa, H., Nakata, A., and Lazdunski, C. (1984). Eur. J. Biochem. 143, 499-505.
Pagès, J.-M., Anba, J., and Lazdunski, C. (1985). Ann. Inst. Pasteur/Microbiol. (Paris) 136A, 105-110.
Palade, G. (1975). Science 189, 347-358.
Palmiter, R. D., Gagnon, J., and Walsh, K. A. (1978). Proc. Natl. Acad. Sci. U.S.A. 75, 94-98.
Palva, I., Sarvas, M., Lehtovaara, P., Sibakov, M., and Kaariainen, L. (1982). Proc. Natl. Acad. Sci. U.S.A. 79, 5582-5586.
Paul, D. L., and Goodenough, D. A. (1983). J. Cell Biol. 96, 636-638.
Perara, E., and Lingappa, V. (1986). J. Cell Biol. 101, 2292-2301.
Perlman, D., and Halvorson, H. O. (1983). J. Mol. Biol. 167, 391-409.
Perlman, D., Halvorson, H. O., and Cannon, L. E. (1982). Proc. Natl. Acad. Sci. U.S.A. 79, 781-785.
Pethica, B. A. (1955). Trans. Faraday Soc. 51, 1402-1411.
Phillips, M. C., and Sparks, C. E. (1980). Ann. N.Y. Acad. Sci. 328, 122-137.
Pincus, M. R., and Klausner, R. D. (1982). Proc. Natl. Acad. Sci. U.S.A. 79, 3413-3417.
Prehn, S., Tsamaloukas, A., and Rapoport, T. A. (1980). Eur. J. Biochem. 107, 185-195.
Prehn, S., Nurnberg, P., and Rapoport, T. A. (1981). FEBS Lett. 123, 79-84.
Quinn, P. J., and Dawson, R. M. C. (1969). Biochem. J. 113, 791-804.
Randall, L. L. (1983). Cell 33, 231-240.
Randall, L. L., and Hardy, S. J. S. (1984a). Mod. Cell Biol. 3, 1-20.
Randall, L. L., and Hardy, S. J. S. (1984b). Microbiol. Rev. 48, 290-298.
Rapoport, T. A. (1985). FEBS Lett. 187, 1-10.
Reddy, G. L., and Nagaraj, R. (1985). Biochim. Biophys. Acta 831, 340-346.
Redman, C. M., and Sabatini, D. D. (1966). Proc. Natl. Acad. Sci. U.S.A. 56, 608-615.
Rhoads, D. B., Tai, P. C., and Davis, B. D. (1984). J. Bacteriol. 159, 63-70.
Robinson, A., Kaderbhai, M. A., and Austen, B. M. (1985). Biochem. Soc. Trans. 13, 724-726.
Rosenblatt, M., Beaudette, N. V., and Fasman, G. D. (1980). Proc. Natl. Acad. Sci. U.S.A. 77, 3983-3987.
Rothblatt, J. A., and Meyer, D. I. (1986). Cell 44, 619-628.
Rothfield, L. I., and Fried, V. A. (1975). In “Methods in Membrane Biology” (E. D. Korn, ed.), Vol. 4, pp. 277-292. Plenum, New York.
Rothman, J. E., and Lodish, H. F. (1977). Nature (London) 269, 775-780.
Russell, M., and Model, P. (1981). Proc. Natl. Acad. Sci. U.S.A. 78, 1717-1721.
Ryan, J. P., and Bassford, P. J., Jr. (1985). J. Biol. Chem. 260, 14832-14837.
Ryan, J. P., Fikes, J. D., Bankaitis, V. A., Duncan, M. C., and Bassford, P. J., Jr. (1986a). Microbiology, in press.
Ryan, J. P., Duncan, M. C., Bankaitis, V. A., and Bassford, P. J., Jr. (1986b). J. Biol. Chem. 261, 3389-3395.
Sabatini, D. D., and Blobel, G. (1970). J. Cell Biol. 45, 146-157.
Schauer, I., Emr, S. D., Gross, C., and Schekman, R. (1985). J. Cell Biol. 100, 1664-1675.
Schecter, I., McKean, D. J., Guyer, R., and Terry, W. (1975). Science 188, 160-162.
Scheele, G. (1983). In “Methods in Enzymology” (S. Colowick and N. Kaplan, eds.), Vol. 96, pp. 94-111. Academic Press, New York.
Schultz, J., Silhavy, T. J., Berman, M. L., Fiil, N., and Emr, S. D. (1982). Cell 31, 227-235.
Shiba, K., Ito, K., and Yura, T. (1984). J. Bacteriol. 160, 696-701.
Shinnar, A. E., and Kaiser, E. T. (1984). J. Am. Chem. Soc. 106, 5006-5007.
Siegel, V., and Walter, P. (1985). J. Cell Biol. 100, 1913-1921.
Silhavy, T. J., Benson, S. A., and Emr, S. D. (1983). Microbiol. Rev. 47, 313-344.
Silver, P., Watts, C., and Wickner, W. (1981). Cell 25, 341-345.
Smith, W. P., Tai, P. C., Thompson, R. C., and Davis, B. D. (1977). Proc. Natl. Acad. Sci. U.S.A. 74, 2830-2834.
Smith, W. P., Tai, P. C., and Davis, B. D. (1978). Proc. Natl. Acad. Sci. U.S.A. 75, 814-817.
Steiner, D. F., Quinn, P. S., Chan, S. J., Marsh, J., and Tager, H. S. (1980). Ann. N.Y. Acad. Sci. 343, 1-16.
Stern, J. B., and Jackson, R. C. (1985). Arch. Biochem. Biophys. 237, 244-252.
Strauch, K. L., Kumamoto, C. A., and Beckwith, J. (1986). J. Bacteriol. 166, 505-512.
Swan, D., Aviv, H., and Leder, P. (1972). Proc. Natl. Acad. Sci. U.S.A. 69, 1967-1971.
Tabe, L., Krieg, P., Strachan, R., Jackson, D., Wallis, E., and Colman, A. (1984). J. Mol. Biol. 180, 645-666.
Takahara, M., Hibler, D. W., Barr, P. J., Gerlt, J. A., and Inouye, M. (1985). J. Biol. Chem. 260, 2670-2674.
Talmadge, K., Stahl, S., and Gilbert, W. (1980a). Proc. Natl. Acad. Sci. U.S.A. 77, 3369-3373.
Talmadge, K., Kaufman, J., and Gilbert, W. (1980b). Proc. Natl. Acad. Sci. U.S.A. 77, 3988-3992.
Talmadge, K., Brosius, J., and Gilbert, W. (1981). Nature (London) 294, 176-178.
Tanford, C. (1980). “The Hydrophobic Effect.” Wiley (Interscience), New York.
Ter-Minassian-Saraga, L. (1979). J. Colloid Interface Sci. 70, 245-264.
Tokunaga, M., Loranger, J. M., Wolfe, P. B., and Wu, H. C. (1982). J. Biol. Chem. 257, 9922-9925.
Tokunaga, M., Loranger, J. M., and Wu, H. C. (1984). J. Cell. Biochem. 24, 113-120.
Tommassen, J., van Tol, H., and Lugtenberg, B. (1983). EMBO J. 2, 1275-1279.
Tommassen, J., Leunissen, J., van Damme-Jongsten, M., and Overduin, P. (1985). EMBO J. 4, 1041-1047.
van den Broek, G., Timko, M. P., Kausch, A. P., Cashmore, A. R., van Montagu, M., and Herrera-Estrella, L. (1985). Nature (London) 313, 358-363.
van Zoelen, E. J. F., Zwaal, R. F. A., Reuvers, F. A. M., Demel, R. A., and van Deenen, L. L. M. (1977). Biochim. Biophys. Acta 464, 482-492.
Verger, R., and Pattus, F. (1982). Chem. Phys. Lipids 30, 189-227.
Vlasuk, G. P., Inouye, S., Ito, H., Itakura, K., and Inouye, M. (1983). J. Biol. Chem. 258, 7141-7148.
Vlasuk, G. P., Inouye, S., and Inouye, M. (1984). J. Biol. Chem. 259, 6195-6200.
von Heijne, G. (1980a). Biochem. Soc. Symp. 46, 259-273.
von Heijne, G. (1980b). Eur. J. Biochem. 103, 431-438.
von Heijne, G. (1981). Eur. J. Biochem. 116, 419-422.
von Heijne, G. (1983). Eur. J. Biochem. 133, 17-21.
von Heijne, G. (1984a). EMBO J. 3, 2315-2318.
von Heijne, G. (1984b). J. Mol. Biol. 173, 243-251.
von Heijne, G. (1985). J. Mol. Biol. 184, 99-105.
von Heijne, G., and Blomberg, C. (1979). Eur. J. Biochem. 97, 175-181.
Walter, P., and Blobel, G. (1981a). J. Cell Biol. 91, 551-556.
Walter, P., and Blobel, G. (1981b). J. Cell Biol. 91, 557-561.
Walter, P., and Blobel, G. (1983). Cell 34, 525-533.
Walter, P., Ibrahimi, I., and Blobel, G. (1981). J. Cell Biol. 91, 545-550.
Walter, P., Gilmore, R., and Blobel, G. (1984). Cell 38, 5-8.
Waters, G. M., and Blobel, G. (1986). J. Cell Biol., in press.
Watson, M. E. E. (1984). Nucleic Acids Res. 12, 5145-5164.
Watts, C., Silver, P., and Wickner, W. (1981). Cell 25, 347-353.
Wickner, W. (1980). Science 210, 861-868.
Wickner, W., Ito, K., Mandel, G., Bates, M., Nokelainen, M., and Zwizinski, C. (1980). Ann. N.Y. Acad. Sci. 343, 384-389.
Wiedmann, M., Huth, A., and Rapoport, T. A. (1984). Nature (London) 309, 637-639.
Wiedmann, M., Huth, A., and Rapoport, T. A. (1986a). Biochem. Biophys. Res. Commun. 134, 790-796.
Wiedmann, M., Huth, A., and Rapoport, T. A. (1986b). FEBS Lett. 194, 139-145.
Wolfe, P. B., and Wickner, W. (1984). Cell 36, 1067-1072.
Wolfe, P. B., Wickner, W., and Goodman, J. M. (1983a). J. Biol. Chem. 258, 12073-12080.
Wolfe, P. B., Zwizinski, C., and Wickner, W. (1983b). In “Methods in Enzymology” (S. Colowick and N. Kaplan, eds.), Vol. 97, pp. 40-46. Academic Press, New York.
Wu, H. C., Tokunaga, M., Tokunaga, H., Hayashi, S., and Giam, C.-Z. (1983). J. Cell. Biochem. 22, 161-171.
Zimmerman, M., Ashe, B. M., Alberts, A. W., Pierzchala, P. A., Powers, J. C., Nishino, N., Strauss, A. W., and Mumford, R. A. (1980). Ann. N.Y. Acad. Sci. 343, 405-414.
Zwizinski, C., and Wickner, W. (1980). J. Biol. Chem. 255, 7973-7977.
VIBRATIONAL SPECTROSCOPY AND CONFORMATION OF PEPTIDES, POLYPEPTIDES, AND PROTEINS

By SAMUEL KRIMM* and JAGDEESH BANDEKAR†

*Biophysics Research Division and Department of Physics and †Biophysics Research Division, University of Michigan, Ann Arbor, Michigan 48109
List of Symbols
I. Introduction
II. Theoretical Considerations
    A. Introduction
    B. Normal Modes of Vibration
    C. Polypeptide Chain Geometry and Coordinates
    D. Polypeptide Force Field
    E. Band Assignments
III. Extended Polypeptide Chain Structures
    A. Introduction
    B. Antiparallel-Chain Rippled Sheet
    C. Antiparallel-Chain Pleated Sheet
IV. Helical Polypeptide Chain Structures
    A. Introduction
    B. α Helix
    C. 3₁₀ Helix
    D. 3₁ Helix
    E. L,D β Helices
V. Reverse Turns
    A. Introduction
    B. β Turns
    C. γ Turns
VI. Characteristics of Polypeptide Chain Modes
    A. Introduction
    B. Amide and Skeletal Modes of the Polypeptide Chain
VII. Vibrational Spectroscopy of Proteins
    A. Introduction
    B. Side-Chain and S-S Modes
    C. Normal Modes of Proteins
VIII. Prospects for the Future
References
LIST OF SYMBOLS

A        integrated intensity; or constant in attractive term of nonbonded potential
ab       antisymmetric bend
APPS     antiparallel-chain pleated sheet
APRS     antiparallel-chain rippled sheet
as       antisymmetric stretch
ADVANCES IN PROTEIN CHEMISTRY, Vol. 38
Copyright © 1986 by Academic Press, Inc. All rights of reproduction in any form reserved.
fY f
F G-'
h
H ib IR
JY
J
L
N, ob PI
P PED (1
ql
Q r
5 Ar r
R S
S, sb sh ss t T TDC tw U
SAMUEL KRIMM AND JAGDEESH BANDEKAR bend constant in repulsive term of nonbonded potential matrix elements relating Cartesian coordinates x, to the internal coordinate r, velocity of light constant in exponent of repulsive term of Buckingham nonbonded potential character of symmetry operation for the ath species deformation interaction constant between peptide groups i and j in perturbation treatment unit vector unit matrix electric field vector force constant associated with 9#e matrix of force constants force constant matrix in internal coordinates inverse kinetic energy matrix in internal coordinates Planck's constant transformation matrix between two coordinate bases in-plane bend infrared Jacobian element, = av:""/df, Jacobian matrix transformation matrix from normal to internal or to local symmetry coordinates _transformation matrix from normal to mass-weighted Cartesian coordinates mass of ith atom matrix (diagonal) of atomic masses number of atoms in a molecule; or number of repeat units in crystallographic repeat of a helix Avogadro's number out-of-plane bend momentum conjugate to r, number of atoms in chemical repeat unit of a helix potential energy distribution column vector of q1 to qsN mass-weighted Cartesian coordinate of ith atom, = m,%, normal coordinate (x rock internal displacement coordinate j bond stretch from equilibrium column vector with internal coordinate components interatomic separation between nonbonded atoms stretch local group symmetry coordinate i symmetric bend shoulder symmetric stretch torsion kinetic energy transition dipole coupling twist atomic mass unit
VIBRATIONAL SPECTROSCOPY OF PEPTIDES AND PROTEINS
potential energy wag weighting factor for ith frequency Cartesian coordinate of ith atom column vector with Cartesian coordinate components rotation angle about helix axis transforming one unit into an adjacent one phase angle between displacements in adjacent peptide groups of a periodic polypeptide chain structure dielectric constant in-plane angle bend from equilibrium 4lr9CW
dipole moment dipole moment derivative with respect to normal coordinate frequency in cm-1 unperturbed frequency observed amide A frequency observed amide B frequency dispersion in force constantf, dihedral angle change from equilibrium phase shift between displacements in adjacent units of a helix N, - CP dihedral angle sum of weighted squared errors CP - C, dihedral angle out-of-plane angle bend from equilibrium IR dichroism parallel to sample orientation axis IR dichroism perpendicular to sample orientation axis
I. INTRODUCTION
The vibrational spectrum of a molecule is determined by its three-dimensional structure and its vibrational force field. An analysis of this [usually infrared (IR) and Raman] spectrum can therefore provide information on the structure and on intramolecular and intermolecular interactions. The more probing the analysis, the more detailed is the information that can be obtained. While the structure and force field uniquely determine the vibrational frequencies of the molecule, the structure cannot in general be obtained directly from the spectrum. However, to a useful approximation, the atomic displacements in many of the vibrational modes of a large molecule are concentrated in the motions of atoms in small chemical groups, and these localized modes are to a good approximation transferable between molecules. Therefore, in the early studies of peptides and proteins (Sutherland, 1952), efforts were directed mainly to the identification of such characteristic frequencies and the determination of their relation to the structure of the molecule. This kind of analysis depended on empirical correlations of the spectra of chemically similar molecules,
and occasionally yielded significant insights into the dependence of the spectrum on the conformation of the polypeptide chain (Bamford et al., 1956). Detailed analyses of the vibrational spectra of macromolecules, however, have provided a deeper understanding of structure and interactions in these systems (Krimm, 1960). An important advance in this direction for proteins came with the determination of the normal modes of vibration of the peptide group in N-methylacetamide (Miyazawa et al., 1958), and the characterization of several specific amide vibrations in polypeptide systems (Miyazawa, 1962, 1967). Extensive use has been made of spectra-structure correlations based on some of these amide modes, including attempts to determine secondary structure composition in proteins (see, for example, Pezolet et al., 1976; Lippert et al., 1976; Williams and Dunker, 1981; Williams, 1983).
Polypeptide molecules of course exhibit many more vibrational frequencies than the half-dozen or so amide modes. For example, a molecule as simple as the extended form of polyglycine has about 50 bands in its IR and Raman spectra. It is clear that the information contained in the entire spectrum must therefore be a more sensitive indicator of three-dimensional structure. The only way to utilize this information fully is through a normal-mode analysis, that is, by comparing observed frequencies with those calculated for specific secondary structures. This can provide a powerful method for testing structural hypotheses in great detail. Over the years, some normal-mode calculations have provided greater insight into the spectra of particular molecules. However, these have often been based on approximate structures (i.e., a group of atoms, such as CH2 or CH3, replaced by a point mass) or have employed limited force fields.
The work in our laboratory has developed on the basis of a more systematic approach: We have endeavored to refine a vibrational force field for the polypeptide chain that is essentially transferable from one molecule to another. By starting with N-methylacetamide to obtain a force field for the peptide group (Jakeš and Krimm, 1971a,b), and building up through known polypeptide structures such as polyglycine I (Abe and Krimm, 1972a; Moore and Krimm, 1976a; Dwivedi and Krimm, 1982a), β-poly(L-alanine) (Moore and Krimm, 1976b; Dwivedi and Krimm, 1982b, 1983), and α-poly(L-alanine) (Rabolt et al., 1977; Dwivedi and Krimm, 1984a), it has been possible to develop vibrational force fields that can account for the observed spectra of these molecules with an average band-frequency error of ~5 cm$^{-1}$ (Krimm, 1983). These force fields can now serve as a basis for detailed analyses of spectral and structural questions in other polypeptide molecules.
The aim of this review is to present these recent developments in the vibrational spectroscopy of peptides, polypeptides, and proteins. We will first discuss the necessary basic aspects of normal-mode calculations. We will then give results for those polypeptide secondary structures that have been studied to date, with an evaluation of the insights obtained from these analyses. Finally, we will comment on the preliminary studies being done on proteins and the prospects for the future.
II. THEORETICAL CONSIDERATIONS
A. Introduction
Bands are observed in the IR and Raman spectra of a molecule that correspond to normal modes of vibration of that particular structure. These normal modes can be calculated from knowledge of the three-dimensional structure of the molecule and of its vibrational force field. If a reliable force field is available, it is therefore possible to predict the vibrational frequencies of a known or hypothesized structure, and to draw significant structural conclusions on the basis of comparisons with observed IR and Raman bands. In this section we review the nature of the normal-mode calculation, discuss the various factors that enter into the specification and refinement of the force field, and consider aspects of band assignments in IR and Raman spectra that are important in making comparisons between observed and calculated frequencies. These considerations underlie the results discussed in subsequent sections.
B. Normal Modes of Vibration
1. Isolated Small Molecule
The interactions of electromagnetic radiation with the vibrations of a molecule, either by absorption in the infrared region or by the inelastic scattering of visible light (Raman effect), occur with the classical normal vibrations of the system (Pauling and Wilson, 1935). The goal of our spectroscopic analysis is to show how the frequencies of these normal modes depend upon the three-dimensional structure of the molecule. We will therefore review briefly in this section the nature of the normal-mode calculation; more detailed treatments can be found in a number of references (Herzberg, 1945; Wilson et al., 1955; Woodward, 1972; Califano, 1976). We will then discuss the component parts that go into such calculations.
The normal-mode frequencies are obtained by solving an equation, the secular equation, that is the condition that must be satisfied if the molecule is to have harmonic modes of vibration. Since the constant terms in this equation are determined in part by the molecular structure, the frequency-structure correlation appears explicitly in the solution of the secular equation. These terms also depend on the potential-energy changes during the vibrations, and therefore the force field associated with such displacements from equilibrium must be known.
The secular equation arises from the solution of the equations of motion for the atoms in the molecule. The kinetic energy, T, of a molecule of N atoms is given by

$$2T = \sum_{i=1}^{3N} m_i \dot{x}_i^2 = \sum_{i=1}^{3N} \dot{q}_i^2 \qquad (1)$$

where $m_i$ are the masses, $x_i$ are Cartesian displacement coordinates (i.e., changes in Cartesian coordinates from their equilibrium values), $\dot{x}_i = dx_i/dt$, and $q_i = m_i^{1/2} x_i$ are convenient mass-weighted Cartesian displacement coordinates. Equation (1) can be expressed in matrix notation as

$$2T = \tilde{\dot{q}}\,\dot{q} \qquad (q = M^{1/2}x) \qquad (2)$$

where q is a column vector of $q_1$ to $q_{3N}$, with the tilde indicating the transpose, and M is a diagonal matrix of the atomic masses. The kinetic energy matrix is, of course, diagonal in Cartesian coordinates. The potential energy for displacement from equilibrium, in the harmonic approximation, is given by

$$2V = \sum_{i,j=1}^{3N} f_{ij}\, q_i q_j \qquad (3)$$

where

$$f_{ij} = \left( \frac{\partial^2 V}{\partial q_i\, \partial q_j} \right)_0 \qquad (4)$$

are the force constants for infinitesimal displacements. In matrix notation this becomes

$$2V = \tilde{q}\, F_q\, q \qquad (5)$$

where $F_q$ is a 3N × 3N symmetric matrix of the $f_{ij}$. Application of Lagrange's equation,

$$\frac{d}{dt}\frac{\partial T}{\partial \dot{q}_i} + \frac{\partial V}{\partial q_i} = 0 \qquad (6)$$
to Eqs. (2) and (5), together with the assumption that the $q_i$ vary harmonically with the same frequency, produces a set of linear homogeneous equations that has a nontrivial solution only if the determinant of its coefficients is set equal to zero. This determinantal equation, with the frequencies as the variables, is known as the secular equation.
The above formulation, however, is not always convenient. We generally tend to think, and to a major extent justifiably, in terms of energy changes that result from changes in bond lengths, bond angles, and dihedral angles in a molecule, that is, from displacements in internal coordinates. (Although not convenient, changes in relevant nonbonded interactions can be included in this category.) It is therefore desirable at times to recast the normal vibration problem into other than a Cartesian basis, and the internal coordinate basis is a generally useful one.
A nonlinear molecule of N atoms has 3N - 6 internal vibrational degrees of freedom, and therefore 3N - 6 normal modes of vibration (the three translational and three rotational degrees of freedom are not of vibrational spectroscopic relevance). Thus, there are 3N - 6 independent internal coordinates, each of which can be expressed in terms of Cartesian coordinates. To first order, we can write any internal displacement coordinate $r_j$ in the form

$$r_j = \sum_{i=1}^{3N} B_{ji}\, x_i \qquad (j = 1, 2, 3, \ldots, 3N - 6) \qquad (7)$$

where the coefficients $B_{ji}$ are determined by the three-dimensional geometry of the molecule. If r and x denote the column vectors whose components are the internal and Cartesian coordinates, respectively, then Eq. (7) can be written as

$$r = Bx \qquad (8)$$

where B is, in general, a (3N - 6) × 3N matrix. Since we want to invert Eq. (8), which cannot formally be done if B is not square, we can include the three translations and three rotations in the r vector, enabling us to transform from internal to Cartesian coordinates through

$$x = B^{-1} r \qquad (9)$$

where $B^{-1}$ is the inverse matrix of B. By substituting Eq. (9) into Eqs. (2) and (5) we obtain
$$2T = \tilde{\dot{r}}\, \tilde{B}^{-1} M B^{-1}\, \dot{r} = \tilde{\dot{r}}\, G^{-1}\, \dot{r} \qquad (10)$$

and

$$2V = \tilde{r}\, \tilde{B}^{-1} F_x B^{-1}\, r = \tilde{r}\, F\, r \qquad (11)$$

where

$$F_x = M^{1/2} F_q M^{1/2} \qquad (12)$$

is the force constant matrix in Cartesian coordinates, and

$$F = \tilde{B}^{-1} F_x B^{-1} \qquad (13)$$

$$G^{-1} = \tilde{B}^{-1} M B^{-1} \qquad (14)$$

are the force constant and inverse kinetic energy matrices, respectively, in internal coordinates. The $G^{-1}$ matrix has the property that the kinetic energy can be written in terms of its inverse as

$$2T = \tilde{p}\, G\, p$$

(Wilson et al., 1955), where $p_i$ is the momentum conjugate to $r_i$.
We proceed to the secular equation by recognizing that the normal-mode frequencies correspond to normal coordinates, Q, in which the kinetic and potential energies are both diagonal; that is,

$$2T = \tilde{\dot{Q}}\, \dot{Q} \qquad (15)$$

and

$$2V = \tilde{Q}\, \Lambda\, Q \qquad (16)$$

where $\Lambda$ is a diagonal matrix whose elements are the normal-mode frequency parameters $\lambda_i = 4\pi^2 c^2 \nu_i^2$, $\nu_i$ being in cm$^{-1}$. In such coordinates each mode is a simple harmonic oscillator whose energy levels are determined by the Schrödinger equation and whose interaction with radiation is thus well defined. Normal coordinates are usually defined in terms of Cartesian coordinates by the relation (Wilson et al., 1955)
$$q = \mathcal{L} Q \qquad (17)$$

so that the internal coordinates are given by

$$r = Bx = BM^{-1/2} q = BM^{-1/2} \mathcal{L} Q = LQ \qquad (18)$$

where

$$L = BM^{-1/2} \mathcal{L} \qquad (19)$$

From Eqs. (10) and (11) we see that

$$2T = \tilde{\dot{Q}}\, \tilde{L} G^{-1} L\, \dot{Q} \qquad (20)$$

and

$$2V = \tilde{Q}\, \tilde{L} F L\, Q \qquad (21)$$

which, by comparison with Eqs. (15) and (16), shows that

$$\tilde{L} G^{-1} L = E \qquad (22)$$

and

$$\tilde{L} F L = \Lambda \qquad (23)$$

where E is the unit matrix. Solving Eq. (22) for $\tilde{L} = L^{-1} G$ and substituting into Eq. (23) give

$$L^{-1} G F L = \Lambda \qquad (24)$$

and

$$G F L = L \Lambda \qquad (25)$$

Equation (25) is the most general form of the secular equation, since if we denote the $\alpha$th column of L by $L_\alpha$, we have

$$G F L_\alpha = L_\alpha \lambda_\alpha = \lambda_\alpha L_\alpha \qquad (26)$$

and thus the $\alpha$th column of L is an eigenvector of GF with eigenvalue $\lambda_\alpha$. The eigenvectors describe the forms of the normal modes, with the internal coordinate contributions being obtained for any $Q_\alpha$ from Eq. (18), that is,

$$r_i = L_{i\alpha} Q_\alpha \qquad (27)$$

The relative amplitudes of the $r_i$ in the normal mode $Q_\alpha$ are therefore given by the relative $L_{i\alpha}$. It is usually easier to visualize an eigenvector by expressing it in terms of Cartesian displacements of the atoms. This can be done, since, by combining Eqs. (9) and (18), we have

$$x = B^{-1} r = B^{-1} L Q \qquad (28)$$
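As a concrete illustration of Eqs. (25) and (37), the following Python sketch sets up the B, G, and F matrices for the two bond stretches of a hypothetical linear symmetric triatomic molecule restricted to motion along its axis, solves GFL = LΛ, and converts the eigenvalues to cm$^{-1}$. The masses, the force constants, and all function names are illustrative assumptions and are not taken from any force field discussed in this review; the closed-form 2 × 2 eigenvalue solution merely stands in for a general matrix diagonalization.

```python
import math

C_CM_PER_S = 2.99792458e10   # velocity of light c, cm/s
AMU_KG = 1.66053907e-27      # atomic mass unit, kg
N_PER_M_PER_MDYN_A = 100.0   # 1 mdyn/Angstrom expressed in N/m


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def g_matrix(b, masses):
    """G = B M^-1 B~ (Eq. 37); masses holds one entry per Cartesian coordinate."""
    b_minv = [[bij / m for bij, m in zip(row, masses)] for row in b]
    return matmul(b_minv, transpose(b))


def inverse_2x2(a):
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def solve_gf_2x2(g, f):
    """Solve GFL = L Lambda (Eq. 25) for two internal coordinates.

    Returns (eigenvalues, L), with the columns of L scaled so that
    L~ G^-1 L = E (Eq. 22). Assumes GF has a nonzero off-diagonal element.
    """
    (p, q), (r, s) = matmul(g, f)
    if abs(q) < 1e-14:
        raise ValueError("this sketch only handles coupled coordinates")
    mean = (p + s) / 2.0
    disc = math.sqrt(((p - s) / 2.0) ** 2 + q * r)
    g_inv = inverse_2x2(g)
    eigenvalues, columns = [], []
    for lam in (mean + disc, mean - disc):
        v = [q, lam - p]  # satisfies (p - lam) v0 + q v1 = 0
        gv = [sum(g_inv[i][j] * v[j] for j in range(2)) for i in range(2)]
        norm = math.sqrt(v[0] * gv[0] + v[1] * gv[1])
        eigenvalues.append(lam)
        columns.append([x / norm for x in v])
    return eigenvalues, transpose(columns)


def wavenumber(lam):
    """Convert lambda in (mdyn/A)/amu to cm^-1 using lambda = 4 pi^2 c^2 nu^2."""
    return math.sqrt(lam * N_PER_M_PER_MDYN_A / AMU_KG) / (2.0 * math.pi * C_CM_PER_S)


# Hypothetical CO2-like parameters (illustrative, not fitted values).
M_O, M_C = 15.9949, 12.0               # amu
F_STRETCH, F_INTERACTION = 16.0, 1.3   # mdyn/A

B = [[-1.0, 1.0, 0.0],   # r1 = x_C - x_O1
     [0.0, -1.0, 1.0]]   # r2 = x_O2 - x_C
G = g_matrix(B, [M_O, M_C, M_O])
F = [[F_STRETCH, F_INTERACTION], [F_INTERACTION, F_STRETCH]]
LAMBDAS, L = solve_gf_2x2(G, F)
FREQUENCIES = [wavenumber(lam) for lam in LAMBDAS]
```

For this symmetric case the eigenvectors are the antisymmetric and symmetric combinations of the two stretches, with $\lambda = (1/m_O + 2/m_C)(f - f')$ and $\lambda = (f + f')/m_O$, respectively, which provides an analytical check on the numerical solution.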
Alternatively, the secular equation in Cartesian coordinates can be solved, in which case Cartesian eigenvectors are obtained. By methods similar to those used above, the secular equation in Cartesian coordinates, analogous to Eq. (25), is

$$M^{-1} F_x \left( M^{-1/2} \mathcal{L} \right) = \left( M^{-1/2} \mathcal{L} \right) \Lambda \qquad (29)$$

where $F_x$ can be obtained from F using Eq. (13), viz.,

$$F_x = \tilde{B} F B \qquad (30)$$

The Cartesian eigenvectors are then given by

$$x = M^{-1/2} \mathcal{L} Q \qquad (31)$$

using Eq. (17).
Another convenient method of characterizing a normal mode is by the potential energy distribution (PED), which describes the relative contributions of various displacement coordinates to the total change in potential energy during the vibration. From Eq. (16) we see that when only one normal mode $Q_\alpha$ is excited, the potential energy $V_\alpha$ of the molecule is given by

$$2V_\alpha = \lambda_\alpha Q_\alpha^2 \qquad (32)$$

so that $\lambda_\alpha$ measures the potential energy change for unit displacement of $Q_\alpha$. From Eq. (23) we have

$$\lambda_\alpha = \sum_{i,j} F_{ij} L_{i\alpha} L_{j\alpha} \qquad (33)$$

and therefore

$$\sum_{i,j} \frac{F_{ij} L_{i\alpha} L_{j\alpha}}{\lambda_\alpha} = 1 \qquad (34)$$

Thus, terms such as

$$2 F_{ij} L_{i\alpha} L_{j\alpha} / \lambda_\alpha \qquad (35)$$

and

$$F_{ii} L_{i\alpha}^2 / \lambda_\alpha \qquad (36)$$

give the fractional contributions to the potential energy change during the normal vibration $Q_\alpha$ of the internal coordinate displacements $r_i r_j$ and $r_i^2$, respectively. It should be noted that the diagonal elements of the PED, which according to Eq. (36) are never negative, can add up to more than unity since the off-diagonal elements [Eq. (35)] can be negative. Although the PEDs can be given entirely in terms of the internal
coordinates, the fact that certain combinations of displacements in a normal mode can be consistently associated with particular chemical groups makes it convenient to identify such local group coordinates. Thus, the stretching vibrations of a CH2 group in general consist of a symmetric and an antisymmetric stretching of the two CH bonds; the bending of the HCH angle in a CH2 group is a characteristic localized mode of vibration involving other adjacent angles; etc. We can therefore define specific linear combinations of internal coordinates that correspond to such local group (usually symmetry) coordinates, S, and formulate the secular equation in terms of these coordinates. The PED in a normal mode will then be given in terms of the $S_i$ rather than the $r_i$.
The normal-mode problem therefore consists of solving Eq. (25) to determine the eigenvalues $\Lambda$ and the eigenvectors L. This requires knowledge of the force constant matrix F, i.e., the set of force constants $F_{ij}$ in internal coordinates, and of the kinetic energy matrix G. The latter is obtained by inversion of Eq. (14), that is,

$$G = B M^{-1} \tilde{B} \qquad (37)$$
and is seen to be determined by B, namely the geometry-dependent matrix that transforms from Cartesian to internal coordinates. The construction of the elements of the B matrix is a straightforward, if tedious, process that has been systematized (Wilson et al., 1955; Califano, 1976) and can be accomplished by computer programs. The solution of Eq. (25), namely the diagonalization of GF, can also be readily done by computer, usually after symmetrizing this product, since, although G and F are symmetric by their definition, their product in general is not (Miyazawa, 1958). [Of course, an alternative approach is to solve Eq. (29) in Cartesian coordinates. Then we only need to compute the B matrix and transform F into $F_x$ using Eq. (30).]
Obtaining the normal modes does not automatically guarantee that these vibrations will be observable in IR absorption or Raman scattering. Other conditions must also be satisfied (Wilson et al., 1955): For IR absorption, there must be a nonzero dipole moment change during the vibration; for Raman scattering, there must be a nonzero polarizability change during the vibration. Some quantitative aspects of IR absorption will be considered below (Sections II,C,2,c and II,E,2).
The computational procedure used in our laboratory is illustrated schematically in Fig. 1. The procedure starts with the input of the number of atoms in the molecule, their Cartesian coordinates and masses, and the total number of internal coordinates. The latter are given specifically in terms of the number of stretch, angle bend, linear angle bend, out-of-plane bend, and torsional coordinates. From the index that characterizes the type of internal coordinate and the atoms involved, the program computes the B matrix and G matrix elements. The (symmetrized) secular equation is then solved for the eigenvalues and eigenvectors, using programs for matrix diagonalization. In the case of a repeating helical structure or a molecular crystal (see below), the procedure is essentially the same: Although the G (or B) and F matrices are for the entire polymer or crystal, symmetry allows them to be "folded" into smaller matrices with dimensions appropriate to the repeat or asymmetric unit. In the case of a force-field refinement (see below), the scheme in Fig. 1 changes somewhat. The G matrix elements remain unchanged, but the cycle involving the F matrix is repeated, with the force constants changing, until the refinement converges to an acceptable agreement between observed and calculated frequencies.

FIG. 1. Schematic diagram of computer program to calculate normal-mode frequencies and eigenvectors. [Flowchart steps legible in the figure: read the index that determines the type of internal coordinate; read the atom numbers involved in this deformation; form unit vectors between the bonded atoms concerned; compile B-matrix row i; solve the secular equation; eigenvalues and eigenfunctions.]

FIG. 2. N-Methylacetamide structure.
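The potential energy distribution of Eqs. (33)-(36) is easily evaluated once the eigenvectors are available. The following Python sketch is a minimal illustration, assuming a hypothetical two-stretch force constant matrix and eigenvector columns of the symmetric and antisymmetric type (as would be obtained for a symmetric triatomic molecule); the function name and all numerical values are our own assumptions, not values from the force fields discussed here. Because $\lambda_\alpha$ is computed from the same column through Eq. (33), the overall scale of the column cancels.

```python
def potential_energy_distribution(f, l_alpha):
    """Return the PED (in percent) of one normal mode.

    f       -- force constant matrix F in internal coordinates
    l_alpha -- the column of L belonging to the mode (any overall scale)
    """
    size = len(f)
    # Eq. (33): lambda_alpha = sum_ij F_ij L_i,alpha L_j,alpha
    lam = sum(f[i][j] * l_alpha[i] * l_alpha[j]
              for i in range(size) for j in range(size))
    # Eq. (36): diagonal contributions F_ii L_i,alpha^2 / lambda_alpha
    diagonal = [100.0 * f[i][i] * l_alpha[i] ** 2 / lam for i in range(size)]
    # Eq. (35): off-diagonal contributions 2 F_ij L_i,alpha L_j,alpha / lambda_alpha
    off_diagonal = {
        (i, j): 200.0 * f[i][j] * l_alpha[i] * l_alpha[j] / lam
        for i in range(size) for j in range(i + 1, size)
    }
    return diagonal, off_diagonal


# Hypothetical force constants (mdyn/A): two equivalent stretches and their interaction.
F = [[16.0, 1.3],
     [1.3, 16.0]]

PED_SYMMETRIC = potential_energy_distribution(F, [1.0, 1.0])
PED_ANTISYMMETRIC = potential_energy_distribution(F, [1.0, -1.0])
```

In the antisymmetric mode of this example the interaction term is negative, so that the two diagonal contributions add up to more than 100, which is the behavior noted in connection with Eqs. (35) and (36).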
2. N-Methylacetamide
We illustrate the results of a normal-mode calculation on a small molecule by discussing the normal vibrations of N-methylacetamide (NMA), the simplest molecule containing a trans peptide group analogous to that in a polypeptide chain. A study of this molecule provides insights into the general nature of the so-called amide modes of the peptide group. Although a number of normal-mode analyses have been done on NMA (Miyazawa et al., 1958; Miyazawa, 1962; Jakeš and Schneider, 1968; Jakeš and Krimm, 1971a; Rey-Lafon et al., 1973), we discuss here a calculation (T. C. Cheam, personal communication, 1985) on a molecule of "standard geometry" (see below) using a polyglycine I [(Gly)$_n$I] force field (Dwivedi and Krimm, 1982a) [with f(NH ob, CN t) = -0.05]. The model used is shown in Fig. 2, where we have replaced the CH3 group by an equivalent point mass, an assumption that should not seriously affect the main conclusions concerning the nature of the amide modes.

TABLE I
Local Symmetry Coordinates of the Peptide Group (a)

Coordinate                                               Description
S1  = r(CC)                                              CC stretch (CC s)
S2  = r[(C2)N]                                           CN stretch (CN s)
S3  = r[N(C4)]                                           NC stretch (NC s)
S4  = r(CO)                                              CO stretch (CO s)
S5  = r(NH)                                              NH stretch (NH s)
S6  = [2θ(CCN) - θ(CCO) - θ(NCO)]/√6                     CCN deformation (CCN d)
S7  = [θ(CCO) - θ(NCO)]/√2                               CO in-plane bend (CO ib)
S8  = [2θ(CNC) - θ[(C2)NH] - θ[(C4)NH]]/√6               CNC deformation (CNC d)
S9  = [θ[(C2)NH] - θ[(C4)NH]]/√2                         NH in-plane bend (NH ib)
S10 = ω(CO) sin(CCN)                                     CO out-of-plane bend (b) (CO ob)
S11 = ω(NH) sin(CNC)                                     NH out-of-plane bend (c) (NH ob)
S12 = [τ(CCNC) + τ(CCNH) + τ(OCNC) + τ(OCNH)]/4          CN torsion (CN t)

(a) Atoms numbered as in Fig. 2.
(b) Positive: C moves in +Z.
(c) Positive: N moves in -Z.

The local symmetry coordinates of the peptide group in terms of internal-displacement coordinates (Δr = bond stretch, Δθ = in-plane angle bend, Δω = out-of-plane angle bend, Δτ = dihedral angle change)
are given in Table I. The calculated frequencies and PEDs are given in Table II, together with observed frequencies, and the Cartesian eigenvectors are presented in Fig. 3 (T. C. Cheam, personal communication, 1985). As will be seen from Table II, the agreement between observed and calculated frequencies is not too good. The main reason for this, other than the CH3 point mass approximation, is that we used a force field refined for another system which, though similar, is not completely analogous. In particular, the (Gly)$_n$I force field was refined for hydrogen-bonded groups, which are in fact reflected in the observed NMA frequencies, whereas in the present example we have calculated the modes of an isolated NMA molecule. Nevertheless, the qualitative features of the normal modes, as given in Table II and Fig. 3, should be preserved.

TABLE II
Observed and Calculated Frequencies of N-Methylacetamide

Mode                     ν_obs (a)   ν_calc     Potential energy distribution (b)
                         (cm^-1)     (cm^-1)
NH stretch               3236 S (c)  3254       NH s (100)
Amide I                  1653 S      1646       CO s (83), CN s (15), CCN d (11)
Amide II                 1567 S      1515       NH ib (49), CN s (33), CO ib (12), CC s (10), NC s (9)
Amide III                1299 M      1269       NH ib (52), CC s (18), CN s (14), CO ib (11)
NC stretch               1096 W      1070       NC s (77), CC s (17)
CN stretch, CC stretch    881 W       908       CN s (31), CC s (17), CO s (16), CNC d (14), CCN d (10)
Amide V                   725 S       721       CN t (75), NH ob (38)
Amide IV                  627 W       637       CO ib (44), CC s (34), CNC d (11)
Amide VI                  600 M       655       CO ob (85), CN t (13)
CCN deformation           436 W       498       CCN d (63), CO ib (11), CN s (8), NC s (8)
CNC deformation           289         274       CNC d (71), CO ib (19), CCN d (13)
Amide VII                 206         226       NH ob (64), CN t (15), CO ob (12)

(a) Absorbances are described as: S, strong; M, medium; W, weak. Observed by Miyazawa et al. (1958).
(b) s, Stretch; d, deformation; t, torsion; ib, in-plane bend; ob, out-of-plane bend. All contributions ≥5 included.
(c) Unperturbed by Fermi resonance.

FIG. 3. N-Methylacetamide normal vibrations. (a) NH stretch; (b) amide I; (c) amide II; (d) amide III; (e) amide IV; (f) amide V; (g) amide VI; (h) amide VII; (i) 1070; (j) 908; (k) 498; (l) 274.

a. NH Stretch. The NH stretch mode (NH s), designated amide A, is completely localized in the NH group. It is usually found as part of a Fermi resonance doublet (see Section II,D,5,d), the other component of which is designated amide B and is observed in the 3100 to 3050 cm$^{-1}$ region. Although in NMA the resonance is with the overtone of amide II, in certain conformations of the polypeptide chain NH s is in resonance with a combination of amide II modes.
b. Amide I. The amide I mode is primarily a stretching of the CO bond, together with an out-of-phase CN s component and a small contribution from CCN deformation (CCN d). [Note that, even though the eigenvector reveals a small NH in-plane bend (NH ib) contribution, this does not show up in the PED because its value of 2 is below the arbitrary cutoff of 5 used in Table II.] As a result, the dipole moment derivative ($\partial\mu/\partial Q$) has a direction that is found experimentally to be +15° to +25° from the CO bond direction in the plane of the peptide group (Bradbury and Elliott, 1963). [The positive direction is such as to rotate the moment from the CO bond direction to one closer to parallelism with the CN bond direction.] Experimental measurements on N,N'-diacetylhexamethylenediamine (Sandeman, 1955) and on silk fibroin (Suzuki, 1967) have given +17° and +19°, respectively, for this angle. The direction of the dipole moment derivative can be calculated by ab initio quantum mechanical methods (see Section VII,B,3), and it depends quite sensitively on the details of the eigenvector and therefore of the force field. In a comparison of various force fields (Cheam and Krimm, 1985), it was shown that the best prediction was given by that of Dwivedi and Krimm (1982a), which gives a value of +17°; that of Miyazawa et al. (1958) gives +9°, and that of Rey-Lafon et al. (1973) gives -19° (the latter resulting from the negligible contribution of CN s and the presence of a relatively large CO ib contribution). It should be noted that the small NH ib contribution results in a small downshift in the amide I frequency on N-deuteration (Miyazawa et al., 1958; Rey-Lafon et al., 1973).
c. Amide II. The amide II mode is an out-of-phase combination of largely NH ib and CN s with smaller PED contributions from CO ib, CC s, and NC s. The transition moment direction determined experimentally is +73° or -37° from CO (Bradbury and Elliott, 1963), the two values resulting from a sign indeterminacy. The result of about +73° from studies on N,N'-diacetylhexamethylenediamine (Sandeman, 1955) led to a favoring of the former value. The calculated transition moment direction is +69° (Cheam and Krimm, 1985), in excellent agreement with experiment and better than values of +58° and +52° predicted by other force fields. Because of the large NH ib contribution, N-deuteration has a major effect on this mode, converting it to a largely CN s mode and shifting it to ~1480 cm$^{-1}$ (Miyazawa et al., 1958; Rey-Lafon et al., 1973).
d. Amide III. The amide III mode is the in-phase combination of NH ib and CN s, with contributions from CC s and CO ib. Because of sign indeterminacy in the experimental transition moment direction (Bradbury and Elliott, 1963), values of +96° or +120° to the CO bond are possible. These are consistent with values of +137°, +118°, and +97° predicted by different force fields (Cheam and Krimm, 1985). All force fields predict the approximate cancellation of the NH ib and CN s contributions; they differ in assigning the main residual contribution to the dipole derivative, and thus its direction should be a very sensitive function of the force field.
Since there is a large contribution of NH ib to this mode, it is affected significantly by N-deuteration: The ND ib coordinate separates out (as it does in the case of amide II) into a relatively pure mode near 960 cm$^{-1}$, and the other coordinates become redistributed into other modes.
e. Skeletal Stretch. Stretching of the three skeletal bonds contributes to two fairly well-defined modes: NC s mixed with a small amount of out-of-phase CC s, observed at 1096 cm$^{-1}$, and mixed CN s and CC s with CO s, CNC d, and CCN d, observed at 881 cm$^{-1}$. In the polypeptide chain, we may expect to see these stretch contributions mixed with side-chain coordinates, perhaps giving complex modes.
f. Amide V. The amide V mode, observed at 725 cm$^{-1}$, is largely an NH out-of-plane bend (NH ob) motion with some CN torsion (CN t). The large CN t contribution to the PED is a result of the large value of the CN t force constant in the (Gly)$_n$I force field. Calculations of the
various components (Cheam and Krimm, 1985) show that NH ob contributes about 5.5 times as much as CN t to the dipole moment derivative. Of course, since NH ob is such a large component of this mode, the frequency is very sensitive to N-deuteration: The band disappears and is replaced by a (mainly) ND ob mode at 510 cm$^{-1}$.
g. Other Amide Modes and Skeletal Deformation. The amide IV mode is mainly CO ib and CC s, with a small contribution from CNC d. The phase relationships of these components are such that they subtract, leading to a low intensity for this mode. The amide VI mode is mainly CO ob in terms of PED, but in fact most of its intensity derives from the NH ob component of the motion. Modes calculated at 498 and 274 cm$^{-1}$ involve deformation of the backbone skeleton, the former being primarily CCN d and the latter primarily CNC d. The amide VII mode is another strong mixture of NH ob and CN t, but like amide V its intensity derives primarily from NH ob.
We see from this example that a normal-mode analysis provides a detailed understanding of the vibrational spectrum of a molecule of known structure and force field. We will discuss later how this technique can serve to determine structure.
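The sensitivity of NH-localized modes to N-deuteration can be rationalized with a simple estimate. The following Python sketch assumes, purely for illustration, that the NH stretch behaves as an isolated two-atom oscillator with an unchanged force constant, so that the frequency scales as the square root of the diagonal G element (the sum of reciprocal masses). This diatomic approximation and the function names are our assumptions; the deuterated frequencies quoted in the text come from full normal-mode analyses and experiment, not from this estimate, and strongly mixed modes such as amide II cannot be treated this way.

```python
import math

# Isotopic masses in atomic mass units.
MASS_N, MASS_H, MASS_D = 14.003074, 1.007825, 2.014102


def stretch_g_element(mass_a, mass_b):
    """Diagonal G element for a bond stretch: the sum of reciprocal masses."""
    return 1.0 / mass_a + 1.0 / mass_b


def isotope_ratio(mass_x, mass_light, mass_heavy):
    """nu(heavy)/nu(light) for an isolated X-H oscillator with fixed force constant."""
    g_heavy = stretch_g_element(mass_x, mass_heavy)
    g_light = stretch_g_element(mass_x, mass_light)
    return math.sqrt(g_heavy / g_light)


ND_OVER_NH = isotope_ratio(MASS_N, MASS_H, MASS_D)

# Calculated NH s frequency from Table II, scaled by the diatomic ratio.
NH_STRETCH_CALC = 3254.0  # cm^-1
ND_STRETCH_ESTIMATE = NH_STRETCH_CALC * ND_OVER_NH
```

The ratio obtained in this approximation lies slightly above the limiting value $1/\sqrt{2}$ that would apply if only the hydrogen atom moved.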
3. Isolated Helical Molecule
Many polypeptide chain structures are helical, and it is therefore desirable to have specific methods for determining the normal modes of such molecules. While actual structures are finite in length, and could be treated as "small" molecules, the theory is most simply formulated for infinite helices (Higgs, 1953a,b). It is generally assumed that, except in obvious cases, end effects for real structures are small, and that selection rules for IR and Raman activity are dominated by those for the infinite helix, viz., that only those vibrational modes can exhibit activity in which equivalent motions in each crystallographic repeat unit of the helix are in phase (Higgs, 1953a). Many authors have developed treatments of this problem (Tadokoro, 1960; Miyazawa, 1961b; Miyazawa et al., 1963; Piseri and Zerbi, 1968; Small et al., 1970; Fanconi, 1972a).
Consider an ideally infinite helical chain whose crystallographic repeat contains N chemical repeat units, each with P atoms. A screw symmetry operation transforms one chemical unit into the next, with $\alpha$ being the rotation about the helix axis and d the translation along the axis. Let $r_i^n$ denote the ith internal displacement coordinate associated with the nth chemical repeat unit. The potential energy, by analogy with Eq. (3), is given by

$$2V = \sum_{n,n'} \sum_{i,k} F_{ik}^{nn'}\, r_i^n\, r_k^{n'} \qquad (38)$$

Since the chain is periodic, it follows that the force constants $F_{ik}^{nn'}$ ($= F_{ki}^{n'n}$) depend only on the difference $|n - n'| = m$, i.e.,

$$F_{ik}^{nn'} = F_{ik}^{m} \qquad (39)$$

Substituting Eq. (39) into Eq. (38) gives

$$2V = \sum_{n;i,k} F_{ik}^{0}\, r_i^n r_k^n + \sum_{n;m \neq 0;i,k} \left( F_{ik}^{m}\, r_i^n r_k^{n+m} + F_{ik}^{-m}\, r_i^n r_k^{n-m} \right) \qquad (40)$$

The kinetic energy can similarly be written as

$$2T = \sum_{n;i,k} G_{ik}^{0}\, p_i^n p_k^n + \sum_{n;m \neq 0;i,k} \left( G_{ik}^{m}\, p_i^n p_k^{n+m} + G_{ik}^{-m}\, p_i^n p_k^{n-m} \right) \qquad (41)$$
Applying Lagrange's equation to Eqs. (40) and (41) leads to an infinite number of second-order differential equations in $r_i^{n+m}$, with an infinite number of normal-mode frequencies. However, the assumption of normal modes for the helix is equivalent to assuming that all $r_i^{n+m}$ vary with the same frequency and with a phase factor that depends only on m, viz., that

$$r_i^{n+m} = A_i \exp[i(\lambda^{1/2} t + m\phi)] \qquad (42)$$

where $A_i$ is the amplitude of the motion, which is independent of n, and $\phi$ is the phase shift between two equivalent, adjacent, internal displacement coordinates. Such a form is required by the helical symmetry (Higgs, 1953a). Substitution of Eq. (42) into the above system of differential equations reduces these to a set of 3P simultaneous homogeneous linear equations in the unknowns $A_i$. This set has a nontrivial solution only if

$$|G(\phi) F(\phi) - \lambda(\phi) E| = 0 \qquad (43)$$

where

$$G(\phi) = \sum_{m} G^{m} e^{im\phi} \qquad (44)$$

and

$$F(\phi) = \sum_{m} F^{m} e^{im\phi} \qquad (45)$$

Equation (43) has 3P characteristic roots $\lambda$ ($= 4\pi^2 c^2 \nu^2$) for each value of the phase $\phi$. We have thus effectively reduced the calculation of the infinite helix to
that of one chemical repeat. The 3P functions $\nu_i(\phi)$ are known as the dispersion relation, and the shapes of the branches, i.e., the variations of the frequencies with the phase, depend on the coupling between neighboring chemical units through the G or F matrices. Since (Kittel, 1969) $\nu(\phi) = \nu(-\phi)$ and $\nu(\phi + 2\pi) = \nu(\phi)$, only the range $0 \le \phi \le \pi$, which corresponds to half of the first Brillouin zone (Brillouin, 1953), is needed. The true Brillouin zone of a helix is 1/N times smaller than the above. The dispersion curves in this zone are obtained by folding at the zone boundary points, i.e., $\phi = m\pi/N$ ($0 \le m \le N - 1$). This results in four acoustic branches, i.e., in which $\nu \to 0$ as $\phi \to 0$, corresponding to three translations plus a rotation about the helix axis. Infrared-active normal modes occur for $\phi = 0$ and $\alpha$, and Raman-active normal modes for $\phi = 0$, $\alpha$, and $2\alpha$ (Higgs, 1953a). The density of vibrational states in the infinite set can be obtained by summing $d\phi/d\nu$ as a function of $\nu$ over the dispersion curves, i.e., determining how densely the phase angles are distributed in $\nu$ space.
The computations for a helical structure start by calculating the B matrix, which, through Eq. (37), is used to calculate the $G^m$ matrices of Eq. (44) (Small et al., 1970). The $G^0$ matrix in this equation is that for a single chemical repeat unit, except that some atoms outside this unit must be used to define the internal coordinates within the unit. In hydrogen-bonded helices, internal coordinates in one unit involve atoms in nonadjacent units, and it is therefore necessary to have coordinate transformations between different units of the helix. To calculate the B matrix we proceed as follows. The internal coordinates belonging to the nth chemical repeat unit, $r^n$, are related to the Cartesian coordinates of unit n + m, $x^{n+m}$, by

$$r^n = \sum_{m} B^{n,n+m}\, x^{n+m} \qquad (46)$$
Taking the helix axis as the $z$ axis, the $x^{n+m}$ are related to the $(x')^{n+m}$ in the rotated basis as follows:

$$(x')^{n+m} = H^{n+m}(\alpha)\,x^{n+m} \tag{47}$$

where (Goldstein, 1950)

$$H^{n+m}(\alpha) = \begin{bmatrix} \cos[(n+m)\alpha] & \sin[(n+m)\alpha] & 0 \\ -\sin[(n+m)\alpha] & \cos[(n+m)\alpha] & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{48}$$
is the transformation matrix for rotating the basis vectors. We have neglected pure translations since they do not affect internal coordinates. Substituting from Eq. (47) into Eq. (46) we get
Since $B^{n,n+m}\tilde{H}^{n+m}(\alpha)$ must be independent of $n$, we can write $B^{n,n+m}\tilde{H}^{n+m}(\alpha) = B^{0,m}H^{m}(\alpha)$. Then, to obtain $B$, we sum over all values of $m$:

$$B = \sum_m B^{0,m}H^{m}(\alpha) \tag{50}$$
It is through this relation that the dimensionality of the infinite helix is reduced to that of a chemical repeat unit. If the calculation is being done in the internal coordinate basis, the B matrices are used, through Eq. (37), to calculate the $G^m$ matrices of Eq. (44). If the elements of $F(\phi)$ are known, viz., the force constants in internal coordinates within one chemical unit, $F^0$, and any force constants that span units $m$ apart, $F^m$, then Eq. (43) can be solved for any value of $\phi$: 0, $\alpha$, and $2\alpha$ for the IR- and Raman-active modes, and for a range of values between 0 and $\pi$ in order to obtain the dispersion relation. The fact that $G(\phi)$ and $F(\phi)$ are complex is no problem: transformations exist that will convert the complex matrices into equivalent real ones (Miyazawa et al., 1963). If Cartesian coordinates are used, then $B(\phi)$ matrices, corresponding to Eq. (50) multiplied by $e^{im\phi}$, are used to calculate the $F_x(\phi)$ [Eq. (30)] and then secular equation (29) is solved.

It should be noted that there are advantages in working in Cartesian coordinates, even though the force constants are not as meaningful as in the internal-coordinate basis. First, it is much easier to include nonbonded interactions, which otherwise must be included as separate internal coordinates for each interaction. Second, whereas the $G(\phi)F(\phi)$ matrix in Eq. (43) is in general not symmetric, the $M^{-1}F_x(\phi)$ matrix of Eq. (29) is, and it is usually easier to diagonalize a symmetric matrix.

4. Molecular Crystal

The treatment thus far has been for isolated molecules, either small or one-dimensional infinite helices. In some instances crystalline intermolecular interactions are important, and it is therefore necessary to be able to compute the normal modes of a molecular crystal.
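The phase-dependent secular equation that underlies both the helix treatment above and its crystal generalization can be made concrete with a toy model. The sketch below is not the polypeptide calculation: it assumes a one-dimensional chain with two atoms per repeat unit joined by springs, and the masses, spring constants, and function names are illustrative choices only. It builds a Hermitian mass-weighted matrix (a stand-in for the Cartesian-coordinate matrix of Eq. (29)) whose off-diagonal element carries the phase factor from the neighboring repeat, and returns its two eigenvalues, which play the role of $\lambda = 4\pi^2c^2\nu^2$ in arbitrary units.

```python
import cmath
import math


def dynamical_matrix(phi, m1=12.0, m2=14.0, k_intra=6.0, k_inter=4.0):
    """Mass-weighted 2x2 matrix for a toy diatomic chain (assumed model).

    Atoms 1 and 2 of one repeat are joined by k_intra; atom 2 of repeat n
    and atom 1 of repeat n+1 are joined by k_inter. The coupling to the
    neighboring repeat enters with the phase factor exp(-i*phi).
    """
    diag = k_intra + k_inter
    off = -(k_intra + k_inter * cmath.exp(-1j * phi))
    scale = math.sqrt(m1 * m2)
    return [[diag / m1, off / scale],
            [off.conjugate() / scale, diag / m2]]


def eigenvalues(phi, **params):
    """Closed-form eigenvalues (lower, upper) of the Hermitian 2x2 matrix."""
    d = dynamical_matrix(phi, **params)
    a, c = d[0][0], d[1][1]
    mean = 0.5 * (a + c)
    root = math.sqrt((0.5 * (a - c)) ** 2 + abs(d[0][1]) ** 2)
    return mean - root, mean + root


if __name__ == "__main__":
    # Sample the half zone 0 <= phi <= pi, which is all that is needed.
    for step in range(5):
        phi = step * math.pi / 4
        lower, upper = eigenvalues(phi)
        print(f"phi = {phi:5.3f}   lower = {lower:8.5f}   upper = {upper:8.5f}")
```

In this toy chain the lower branch goes to zero as $\phi \to 0$ (an acoustic branch), and both branches are even and $2\pi$-periodic in $\phi$, which is why only the half zone needs to be scanned.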
The general theory of crystal dynamics has been presented by Born and Huang (1954), and the theory of molecular vibrations in solids has been discussed by a number of authors (Fanconi, 1972a,b; Zak, 1975; Neto et al., 1976; Decius and Hexter, 1977; Califano, 1977; Schrader, 1978). The formalism developed in the preceding section can be modified to apply to a molecular crystal. The starting point is the asymmetric unit
in a unit cell. Let $x^n$ denote the set of Cartesian coordinates of all atoms in the reference asymmetric unit. Then $x^n$ includes all atoms required to define all the internal coordinates in the asymmetric unit. Using Eq. (7), we have

$$r^n = \sum_m B^{n,n+m}\,x^{n+m} \tag{51}$$
where $r^n$ refers to the internal coordinates of the asymmetric unit $n$. The $x^{n+m}$ are related to the $(x')^{n+m}$ in the rotated basis as follows:

$$(x')^{n+m} = H^{n+m}(\alpha)\,x^{n+m} = C^{\alpha}_{n+m}\,x^{n}, \tag{52}$$
where

$$H^{n+m}(\alpha) = \begin{bmatrix} \cos[(n+m)\alpha] & \sin[(n+m)\alpha] & 0 \\ -\sin[(n+m)\alpha] & \cos[(n+m)\alpha] & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{53}$$
and $C^{\alpha}_{n+m}$ denotes the character of the character table corresponding to the symmetry operation for the $\alpha$ species (see Section II,E,2). From Eq. (52), since the H matrices are orthogonal,

$$x^{n+m} = C^{\alpha}_{n+m}\,\tilde{H}^{n+m}(\alpha)\,x^{n} \tag{54}$$
Substituting in Eq. (51) for $x^{n+m}$ gives

$$r^n = \sum_m C^{\alpha}_{n+m}\,B^{n,n+m}\,\tilde{H}^{n+m}(\alpha)\,x^{n} \tag{55}$$
[Note that the B matrix elements in Eqs. (55) and (51) refer to the same basis; in Eqs. (46) and (49) they refer to different bases.] This implies that the B matrix elements for a given species, $\Gamma^{\alpha}$, are

$$B(\Gamma^{\alpha}) = \sum_m C^{\alpha}_{n+m}\,B^{n,n+m}\,\tilde{H}^{n+m}(\alpha) \tag{56}$$
or, since $n$ is a dummy variable,

$$B(\Gamma^{\alpha}) = \sum_m C^{\alpha}_{m}\,B^{0,m}\,H^{m}(\alpha) \tag{57}$$
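The sum in Eq. (57) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' program: each block $B^{0,m}$ is taken to have only three columns (one atom per unit), $H^{m}(\alpha)$ is taken to be the rotation of Eq. (53) evaluated with $n = 0$, and the function names, the toy blocks, and the characters are hypothetical.

```python
import math


def rot_z(angle):
    """Basis-rotation matrix about z, in the form of Eq. (53)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def symmetry_adapted_b(blocks, characters, alpha):
    """Accumulate sum_m C_m * B^{0,m} * H^m(alpha), as in Eq. (57).

    blocks[m]     : hypothetical B^{0,m} block (rows = internal coordinates)
    characters[m] : character C_m of the chosen species for operation m
    """
    rows = len(next(iter(blocks.values())))
    total = [[0.0] * 3 for _ in range(rows)]
    for m, block in blocks.items():
        term = matmul(block, rot_z(m * alpha))
        for i in range(rows):
            for j in range(3):
                total[i][j] += characters[m] * term[i][j]
    return total


# Toy case: a twofold operation (alpha = pi) and two one-row blocks.
blocks = {0: [[1.0, 0.0, 0.0]], 1: [[0.5, 0.0, 0.0]]}
symmetric = symmetry_adapted_b(blocks, {0: 1, 1: 1}, math.pi)
antisymmetric = symmetry_adapted_b(blocks, {0: 1, 1: -1}, math.pi)
print(symmetric, antisymmetric)
```

The only species-dependent input is the set of characters, so the same blocks can be reused for every symmetry species.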
Once the B matrix elements are computed for all values of $m$, those belonging to a given species are computed using Eq. (57). As mentioned before, when working in internal coordinates, it is necessary to compute the corresponding $G(\Gamma^{\alpha})$ matrix elements. From Eq. (11), the potential energy may be written as
where the $F^{nm}$ are the force constant matrix elements in internal coordinates. For the $\Gamma^{\alpha}$ species,

$$r^{m} = C^{\alpha}_{m}\,r^{n} \tag{59}$$

so that
which implies that

$$F(\Gamma^{\alpha}) = \sum_m C^{\alpha}_{m}\,F^{nm} \tag{61}$$
Once the B (or G) and F matrix elements are known for a given symmetry species, the solution of the secular equation leads to the desired eigenvalues and eigenvectors. It should again be noted that the dimensionality of the B or F matrices is reduced to that of an asymmetric unit in the unit cell.

The preceding discussions have established the theoretical basis for computing the normal modes of vibration of a polypeptide chain molecule. Throughout the discussion we have assumed knowledge of the structure of the molecule and of its vibrational potential energy function. It is now necessary to examine these two kinds of inputs, and in particular to understand how we can obtain a polypeptide force field that might serve to predict the vibrational frequencies of an arbitrary chain conformation.

C. Polypeptide Chain Geometry and Coordinates
1. Standard Geometry

The secondary structure of a polypeptide chain is determined by its set of dihedral angles $\phi_i$ (rotation about the N$_i$–C$^{\alpha}_i$ bond) and $\psi_i$ (rotation about the C$^{\alpha}_i$–C$_i$ bond), once it is assumed that the peptide group is planar and has a given geometry (Pauling et al., 1951). Although bond lengths and angles in the peptide groups of various molecules may vary slightly, it is reasonable to assume standard bond lengths and angles for this group, and we have done so in our calculations. This geometry (Corey and Pauling, 1953) is given in Table III. It is less obvious, however, that the peptide group will always be planar, since relatively little energy is required for a small out-of-plane twist (Winkler and Dunitz, 1971). In fact, X-ray crystal structure analyses of peptides and proteins have shown that departures from planarity of 5–10° are not uncommon. In our refinement of a force field for the polypeptide chain we have
TABLE III
Standard Geometry of the Peptide Group

Bond lengths (Å)          Bond angles (degrees)
l(Cα–C) = 1.53            CαCN = 114.0
l(C–N)  = 1.32            CαCO = 121.0
l(N–Cα) = 1.47            CNH  = 123.0
l(C=O)  = 1.24            CNCα = 123.0
l(Cα–H) = 1.07            NCαC = NCαH = CCαH = tetrahedral
l(N–H)  = 1.00
taken the peptide group as planar, since this is applicable to NMA and to the (Gly)$_n$I, β-poly(L-alanine) [β-(Ala)$_n$], and α-poly(L-alanine) [α-(Ala)$_n$] polypeptide structures used in the refinement. Despite the slight variability in other structures, we have also generally maintained the assumption of planarity in our calculations. This is not expected to lead to serious problems in the computed frequencies. In terms of the discussion in Section II,B, we therefore seek a set of $F_{ij}$ with maximum transferability between different structures, recognizing that some changes will be necessary because of differences in hydrogen-bonding geometry and that small changes due to conformational differences can occur. We expect that the largest differences in the $\lambda_i$ will arise from the significant changes in G (or in $F_x$) brought about by the dependence of B on the three-dimensional geometry.
2. Internal and Symmetry Coordinates

The eigenvectors of polypeptide chain modes, as in the case of NMA, can be described by PEDs in terms of symmetry coordinates, which in turn are related to internal coordinates. A list of the internal coordinates for (Ala)$_n$ is given in Table IV, and the local symmetry coordinates are given in Table V (Moore and Krimm, 1976b). These serve as the general local symmetry coordinates for most polypeptide chain structures [for the particular set for (Gly)$_n$I, see Dwivedi and Krimm (1982a)].

D. Polypeptide Force Field

The development of a force field suitable for the polypeptide chain involves several steps. First, as we will see below, it is necessary to select a physically appropriate form for the potential, $V$, both for the intramolecular, $V_{intra}$, and the intermolecular, $V_{inter}$, parts. Second, we need a formalism for producing an acceptable set of force constants from the observed data; this is usually done by a least-squares procedure. Third, the molecules used, and their sequence, in the refinement process
TABLE IV
Internal Coordinates for Poly(L-alanine)

R1  = Δr(Cα–C)
R2  = Δr(C–N)
R3  = Δr(N–Cα)
R4  = Δr(C=O)
R5  = Δr(N–H)
R6  = Δr(Cα–Hα)
R7  = Δr(Cβ–H)
R8  = Δr(Cβ–H)
R9  = Δr(Cβ–H)
R10 = Δr(Cα–Cβ)
R11 = Δr(H···O)
R12 = Δr(Hα···Hα)
R13 = Δθ(Cα–C–N)
R14 = Δθ(C–N–Cα)
R15 = Δθ(N–Cα–C)
R16 = Δθ(Cα–C=O)
R17 = Δθ(N–C=O)
R18 = Δθ(C–N–H)
R19 = Δθ(Cα–N–H)
R20 = Δθ(N–Cα–Hα)
should be such as to lead to a properly convergent (if not unique) solution. We discuss these matters below, and give our current force field for a polypeptide chain with trans peptide groups. [Preliminary studies relevant to the cis peptide group have also been completed (Cheam and Krimm, 1984a,b).]

1. Intramolecular Potential Functions

No analytical form is known by which to express the potential energy of a molecule. It is therefore convenient to expand the potential in a Taylor series:

$$V = V_0 + \sum_i \left(\frac{\partial V}{\partial r_i}\right)_0 r_i + \frac{1}{2}\sum_{i,j}\left(\frac{\partial^2 V}{\partial r_i\,\partial r_j}\right)_0 r_i r_j + \cdots \tag{62}$$
and to neglect, in the harmonic approximation, terms higher than quadratic. For convenience we set $V_0 = 0$, and, if the internal coordinates are independent and the molecule is at equilibrium, the second term vanishes since all $F_i = (\partial V/\partial r_i)_0 = 0$. Thus,

$$V = \frac{1}{2}\sum_{i,j} F_{ij}\,r_i r_j \tag{63}$$
TABLE V
Symmetry Coordinates for Poly(L-alanine)

N–Cα stretch
Cα–C stretch
C–N stretch
C=O stretch
N–H stretch
Cα–Cβ stretch
Cα–Hα stretch
CH3 symmetric stretch
CH3 asymmetric stretch 1
CH3 asymmetric stretch 2
Cα–C–N deformation
C=O in-plane bend
C–N–Cα deformation
N–H in-plane bend
N–Cα–C deformation
Cβ bend 1
Cβ bend 2
Hα bend 1
Hα bend 2
CH3 asymmetric bend 1
CH3 asymmetric bend 2
CH3 rock 1
CH3 rock 2
CH3 symmetric bend
C=O out-of-plane bend
N–H out-of-plane bend
N–Cα torsion
Cα–C torsion
C–N torsion
Cα–Cβ torsion
C=O···H in-plane bend
N–H···O in-plane bend
H···O stretch
Hα···Hα stretch
C=O torsion
N–H torsion
where the

$$F_{ij} = \left(\frac{\partial^2 V}{\partial r_i\,\partial r_j}\right)_0 \tag{64}$$
are the harmonic force constants. Rarely can a complete harmonic force field be determined from the data, since there are in general only $3N - 6$ vibrational frequencies but $(3N - 6) + \tfrac{1}{2}[(3N - 6)^2 - (3N - 6)] = \tfrac{1}{2}(3N - 6)(3N - 5)$ values of the $F_{ij}$. It is therefore necessary to restrict the force field to a manageable number of $F_{ij}$, and this is usually done by assuming what is considered to be a physically reasonable model. Many such model force fields have been discussed in the literature. [For general discussions, see Herzberg (1945), Wilson et al. (1955), Woodward (1972), and Califano (1976). For discussions of the Urey-Bradley force field, see the review by Duncan (1975). For discussions of the entirely different consistent force field approach, see Lifson and Warshel (1968), Warshel et al. (1970), and Burkert and Allinger (1982).] We have chosen to use a simplified general valence force field (SGVFF), which has been defined as one “which contains the minimum possible number of interaction constants compatible with a good fit of the spectra” (Califano, 1976). Such a force field has been demonstrated to be very effective for hydrocarbons (Schachtschneider and Snyder, 1963). For this form the potential energy of Eq. (63) is written explicitly as
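$$\cdots + 2\sum_{r,\theta} F_{r\theta}\,\Delta r\,\Delta\theta + 2\sum_{\theta,\omega} F_{\theta\omega}\,\Delta\theta\,\Delta\omega + 2\sum_{\omega,\tau} F_{\omega\tau}\,\Delta\omega\,\Delta\tau \tag{65}$$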
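The counting argument that motivates restricting the force field can be checked numerically. The short sketch below simply evaluates the two counts quoted above for a nonlinear molecule of $N$ atoms; the function name is an illustrative choice, and the 12-atom case is used because N-methylacetamide (CH3CONHCH3) has 12 atoms.

```python
def harmonic_counts(n_atoms):
    """Return (vibrational frequencies, general harmonic force constants).

    Assumes a nonlinear molecule, so there are 3N - 6 internal degrees
    of freedom. The force constants are the diagonal terms plus the
    distinct off-diagonal terms of a symmetric (3N-6) x (3N-6) matrix.
    """
    n = 3 * n_atoms - 6
    diagonal = n
    off_diagonal = (n * n - n) // 2
    return n, diagonal + off_diagonal


for n_atoms in (3, 12):
    freqs, consts = harmonic_counts(n_atoms)
    print(f"N = {n_atoms:2d}: {freqs} frequencies, {consts} force constants")
```

Even for a 12-atom molecule the general harmonic field has more than fifteen times as many parameters as there are frequencies from a single isotopic species, which is why a model such as the SGVFF and additional data sources are needed.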
In Eq. (65) it is to be understood that the terms in the first line include cross-terms between similar coordinates, i.e., $\Delta r_i\,\Delta r_j$, $\Delta\theta_i\,\Delta\theta_j$, etc., although these are assumed to be physically significant only for $i$ close to $j$. A similar assumption is made for the cross-terms in the second line of Eq. (65). In both cases, experience with such force fields provides a guide as to which off-diagonal terms to include: terms in $\Delta r\,\Delta\omega$ have been found to have a minor effect on the frequencies, and terms for which the Jacobian elements (see Section II,D,3) for all modes are small are usually negligible. The units for the force constants are $F_r$, mdyn/Å; $F_{r\theta}$, mdyn; all others, mdyn Å.

Even with the reduction in number represented by Eq. (65), there still are not enough observable vibrational data for a large molecule to permit a determination of the force constants. Additional assumptions are necessary, and two in particular are very useful. First, we assume the essential transferability of comparable force constants in different molecules containing the same groups. Thus, peptide group force constants for NMA should serve as a satisfactory starting point to describe the force field for this group in the polypeptide chain (although of course differences in hydrogen bonding have to be taken into account). Second,
we assume that when isotopic changes are made, as when NH is replaced by ND, the force field is unchanged. While this is true in the harmonic approximation, molecular vibrations are in fact anharmonic. When such anharmonicities are large, as in the case of the NH s mode, we find that calculated frequencies for isotopic derivatives deviate significantly from those observed. However, nonstretch modes are affected much less, and thus both of the above assumptions result in the addition of more independent data without changing the number of $F_{ij}$. In favorable cases, such as the hydrocarbons, there can be up to 10 times as many frequencies as force constants, thus significantly overdetermining the latter; in the polypeptide case this number is closer to three.

The consistent force field (CFF) represents a different approach to increasing the ratio of observables to parameters. In this method, the total potential $V$ is parameterized to a range of properties of a set of molecules, including known equilibrium structures and energies. This approach leads to a potential function that can be used for energy minimization and molecular dynamics calculations (Lifson and Warshel, 1968; Lifson et al., 1979; Brooks et al., 1983; Levitt, 1983). However, such functions have not led to good reproduction of frequencies, perhaps because frequencies have not been given great weight in the parameterization. Although improvements have been made (Lifson and Stern, 1982), these still do not provide satisfactory frequency agreement, and we have therefore used a refined vibrational force field for the determination of structure from vibrational spectra.

2. Intermolecular Potential Functions

In analyzing the vibrational spectra of polypeptides, it is important to include certain intermolecular contributions to $V_{inter}$. We discuss here three of these contributions, two of which, hydrogen bonding and transition dipole coupling, have played a very important role in the development of our force field.

a. Nonbonded Interactions. Nonbonded atom-atom interactions may be included explicitly (as in the Urey-Bradley force field) or implicitly (as in the SGVFF) in the intramolecular force field. However, in intermolecular potentials such interactions must always be explicitly included. The functional form of such a nonbonded potential, $V_{nb}$, has been discussed by a number of authors (Kitaigorodsky, 1961, 1973, 1978; Williams, 1981). It is composed of attractive, $V_{att}$, and repulsive, $V_{rep}$, parts:

$$V_{nb} = V_{att} + V_{rep} \tag{66}$$
where $V_{att}$ is due to the long-range dispersion energy of interaction between two neutral atoms a distance $R$ apart, and has the form

$$V_{att} = -A/R^{6} \tag{67}$$
The constant A can be calculated (Slater and Kirkwood, 1931), and is given by
where $\alpha_1$ and $\alpha_2$ are the polarizabilities, $N_1$ and $N_2$ are the effective number of polarizable electrons on atoms 1 and 2, respectively, and $e$, $m$, and $h$ are the familiar physical constants. In contrast to $V_{att}$, no single analytical form has been used for the short-range exchange repulsive forces. In the more commonly used Lennard-Jones potential,

$$V_{rep} = B/R^{12} \tag{69}$$

and only one additional parameter is introduced. The Buckingham potential,

$$V_{rep} = B\exp(-CR) \tag{70}$$
is known to be more correct from quantum-mechanical calculations, but it introduces two additional parameters. A typical procedure for determining the parameters is to assume a form for the potential function (the Lennard-Jones function is most commonly used) and to calculate the constants by requiring that the functions reproduce a set of known crystal structures. Various authors have used this procedure to obtain nonbonded potentials suitable for atom-atom interactions within and between polypeptide chains (Brant and Flory, 1965a,b; Ramachandran and Sasisekharan, 1968; Scheraga, 1968; Lifson et al., 1979; Hagler et al., 1979a,b). These potentials have been quite satisfactory in predicting other structures, but they have not been uniformly successful in the calculation of intermolecular vibrations (Cheam and Krimm, 1984b).

Nonbonded interactions have generally not been included in our force field, except where an unusually close contact is present, such as H$^{\alpha}$···H$^{\alpha}$ in (Gly)$_n$I (Moore and Krimm, 1976a). The main reason is that they ordinarily have a small influence on the medium-range frequencies, and in particular no effect on amide I or amide II splittings (V. Naik and S. Krimm, unpublished results, 1985). When obviously needed, as in (Gly)$_n$I, we have introduced a force constant derived from an intermolecular potential for the specific interaction involved. Intermolecular
nonbonded terms probably should be incorporated to give a better description of low-frequency vibrations.

b. Hydrogen Bonding. The pervasive role of N–H···O=C hydrogen bonds in determining the secondary and tertiary structures of proteins needs no emphasis, certainly not since Pauling et al. (1951) noted the energy cost of ~3 kcal/mol for not maximizing hydrogen-bond formation. The geometrical properties of such bonds are very well characterized (Schuster et al., 1976; Taylor et al., 1983, 1984a,b), as are their properties in proteins (Baker and Hubbard, 1984). It is clear that these interactions, which can be intramolecular as well as intermolecular, must be explicitly incorporated in a vibrational force field. There are two main ways of handling the hydrogen-bond contribution: through an analytical potential function or with parameterized force constants. We have chosen to use the latter method, but we will comment on the former as well.

Analytical functions for the hydrogen-bond potential, $V_{hb}$, seek to describe the energy of the X–H···Y–Z interaction as a function of the geometrical parameters of this group of atoms. One of the first successful functions, designed to account for structural and spectroscopic properties of a linear X–H···Y bond, was the Lippincott-Schroeder function (Lippincott and Schroeder, 1955). This potential represents the X–H and H···Y interactions by functions of a covalent bond type, and adds terms representing the X···Y attractions and repulsions. Subsequent modifications (Schroeder and Lippincott, 1957; Chidambaram et al., 1970; Balasubramanian et al., 1970) allowed for nonlinear bonds. In order to avoid such complex descriptions, it can be argued that the hydrogen bond is fundamentally no different from other intermolecular interactions, and should therefore be describable by functions similar to those used for the nonbonded potentials (except, of course, that the constants will be different).
A number of authors have taken this approach (Lifson et al., 1979; Brooks et al., 1983; Sippl et al., 1984). A variety of other hydrogen-bond potentials has also been given.

In our force field the hydrogen bond is treated parametrically in terms of a local valence force field, with the contributing force constants being H···O s, N–H···O ib, and C=O···H ib. This approach cannot provide the degree of transferability that would be possible with an analytic function, but it is simpler to use and should not seriously affect the mid-range frequencies. In addition, this representation behaves in a physically reasonable way, in that we find that the H···O s force constant increases as the H···O distance decreases, i.e., as the hydrogen bond becomes stronger. (This is also accompanied by a decrease in the NH s force constant.) We believe that for purposes of studying the
influence of backbone conformation on the normal-mode frequencies, except perhaps in the low-frequency region, this is a satisfactory approximation.

A comment is in order about the possibility of C–H···O=C hydrogen bonds in polypeptide structures. Such a bond was suggested in the structure of polyproline II between a proline CH$_2$ group and the C=O group on an adjacent chain (Sasisekharan, 1959), but the attractive nature of this interaction was subsequently questioned (Arnott and Dover, 1968). In a revision of the structure of collagen (Ramachandran and Sasisekharan, 1965), such a bond was proposed between a backbone (glycine) CH$_2$ group and the C=O on an adjacent chain. A similar interaction was also suggested for (Gly)$_n$II (Ramachandran et al., 1966a, 1967). Strong arguments were not possible in these cases because of the poor quality of the X-ray diffraction patterns of these polypeptides. However, spectroscopic evidence supported the presence of C–H···O hydrogen bonds in (Gly)$_n$II (Krimm et al., 1967), as it has in a wide variety of other molecules (Green, 1974).

Evidence for C–H···O hydrogen bonds has also been sought from the structures of small molecules. From an early survey of X-ray crystal structures (Sutor, 1963) it was concluded that a number of short C–H···O contacts could be ascribed to hydrogen bonds, but a similar consideration of the available data challenged this interpretation (Donohue, 1968). In an in-depth analysis of 113 neutron-diffraction crystal structures (Taylor and Kennard, 1982), however, it was concluded that the majority of the surveyed short contacts “are attractive interactions, which can reasonably be described as hydrogen bonds.” It was also found that the shortest such contacts were particularly likely when the C–H was adjacent to a neutral or positively charged N atom.
Although the N atom of the peptide group has a net negative charge, a quantum-mechanical population analysis shows that the H$^{\alpha}$ atom between two peptide groups can have a net charge of +0.2 e (Hagler and Lapiccirella, 1976). This would favor an attractive interaction between this atom and the negatively charged O atom of the C=O group. As will be seen below, a detailed normal-mode analysis of crystalline (Gly)$_n$II (Dwivedi and Krimm, 1982c) provides strong evidence for the presence of C–H···O hydrogen bonds in this structure. It therefore seems likely that such hydrogen bonds can form under certain favorable conditions.

c. Transition Dipole Coupling. We noted in the early part of this section that force-field models are necessary in order to reduce the number of force constant parameters. In the case of polypeptides, it was only natural to try to limit the SGVFF description to the usual internal coordinates of a molecule plus hydrogen-bonding (and possibly nonbonded) interactions between molecules. Careful normal-mode analyses, however, have shown that the above potential energy terms are not sufficient to account for certain spectroscopic details, and that another kind of interaction involving resonance transfer of excitation, and which in general is expected to be present (Hexter, 1960), is of particular relevance for polypeptide systems. The identification of this interaction has had significant consequences in enhancing the power of vibrational spectroscopy as a tool for determination of three-dimensional structure.

The main spectroscopic observation that required explanation was the ~60 cm$^{-1}$ splitting in the infrared-active amide I (mainly CO s) modes of antiparallel-chain pleated sheet (APPS) polypeptides. Miyazawa (1960a) proposed that such splittings must be a consequence of the interactions between similar oscillators within the repeat unit of the structure, namely, the four peptide groups in the present case. He showed by a perturbation treatment that the frequencies for the four possible coupled modes would depend on the relative phases of the vibrations and the magnitudes of the interactions between peptide groups according to the relation

$$\nu(\delta,\delta') = \nu_0 + D_{10}\cos\delta + D_{01}\cos\delta' \tag{71}$$

where $\delta$ and $\delta'$ are the phase angles (0 or $\pi$) between adjacent groups in the same chain and in the neighboring chain connected by a hydrogen bond, respectively, $D_{10}$ and $D_{01}$ are the corresponding intrachain and hydrogen-bonding interchain interaction constants, and $\nu_0$ is the amide frequency in the absence of such interactions. Although this treatment provided important insights into the understanding of the spectra of polypeptides and proteins (Miyazawa, 1960a; Miyazawa and Blout, 1961; Krimm, 1962), it suffered from some basic difficulties that were resolved only when it was recognized that another term had to be added to Eq. (71), viz., one whose physical origin was due mainly to interacting dipole derivatives, i.e., transition dipoles (Krimm and Abe, 1972).

A resonance interaction can occur between two oscillators when one of them is in an excited state. The energy of this interaction is determined by that part of the total Hamiltonian that represents all pairwise coulombic interactions between electrons and nuclei in the two groups. At sufficiently large distances (probably over 3 Å) these interactions can be expanded in a multipole series, of which the first important term for a neutral system is that due to transition dipole coupling (TDC). Higher transition multipoles may be important in some cases (Cheam and Krimm, 1985), but we treat here only the TDC case (Krimm and Abe, 1972; Moore and Krimm, 1975; Cheam and Krimm, 1984c).
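For reference, the four frequencies given by the perturbation expression of Eq. (71), before any transition-dipole term is added, can be tabulated directly. The sketch below does only that; the function names are illustrative, and the numerical values of $\nu_0$, $D_{10}$, and $D_{01}$ are made-up inputs chosen for demonstration, not fitted constants from the text.

```python
import math


def coupled_frequency(nu0, d10, d01, delta, delta_prime):
    """nu(delta, delta') = nu0 + D10*cos(delta) + D01*cos(delta'), Eq. (71)."""
    return nu0 + d10 * math.cos(delta) + d01 * math.cos(delta_prime)


def four_modes(nu0, d10, d01):
    """Frequencies for the four phase combinations (each phase is 0 or pi)."""
    phases = (0.0, math.pi)
    return {(d, dp): coupled_frequency(nu0, d10, d01, d, dp)
            for d in phases for dp in phases}


# Hypothetical inputs in cm^-1, for illustration only.
modes = four_modes(nu0=1660.0, d10=8.0, d01=-18.0)
for (delta, delta_prime), nu in sorted(modes.items()):
    print(f"delta = {delta:4.2f}, delta' = {delta_prime:4.2f}: {nu:7.1f} cm^-1")
```

Because each cosine is either +1 or -1, the four values are symmetric about $\nu_0$ and the splittings depend only on the two interaction constants, which is the feature of Eq. (71) that the transition dipole coupling treatment subsequently revises.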
FIG. 4. Geometrical parameters in transition dipole coupling interaction.
Our goal is to evaluate the contribution to $V_{inter}$ due to the interaction between similar transition dipoles on peptide groups A and B. The potential for the interaction between two dipoles $\mu_A$ and $\mu_B$ a distance $R_{AB}$ apart is (Jackson, 1975)

$$V_{dd}^{AB} = (1/\varepsilon)\,|\mu_A|\,|\mu_B|\,[\hat{e}_A\cdot\hat{e}_B - 3(\hat{e}_A\cdot\hat{e}_{AB})(\hat{e}_B\cdot\hat{e}_{AB})]/R_{AB}^{3} = (1/\varepsilon)\,|\mu_A|\,|\mu_B|\,X_{AB} \tag{72}$$
where $\varepsilon$ is the dielectric constant (which we assume to be 1), $\hat{e}_A$, $\hat{e}_B$, and $\hat{e}_{AB}$ are unit vectors in the directions of the dipoles and the line joining their centers (see Fig. 4), and $X_{AB}$ is referred to as a geometrical factor. If the dipole moment is expanded in normal coordinates, then, assuming electrical harmonicity,

$$\mu = \mu_0 + \sum_i (\partial\mu/\partial Q_i)\,Q_i \tag{73}$$

and we have for the force constant for this interaction in the $\alpha$th transition
$$(\partial^2 V_{dd}^{AB}/\partial Q_\alpha^2) = (0.1)\,(\partial\mu/\partial Q_\alpha)^2\,X_{AB} \tag{74}$$
in units of mdyn Å$^{-1}$ u$^{-1}$, where u is the atomic mass unit. We can also express TDC force constants in terms of local symmetry internal coordinates:

$$F_{ij} = (0.1)\,|\partial\mu/\partial r_i|\,|\partial\mu/\partial r_j|\,X_{ij} \tag{75}$$
if we recognize that the coordinate transformation [Eq. (18)], namely
requires that we sum over terms $|\partial\mu/\partial r_i|\,|\partial\mu/\partial r_j|$. For a harmonic oscillator, the transition dipole moment, $\Delta\mu$ [in debyes (D)], is given by
where $\nu_0$ is the unperturbed frequency (in cm$^{-1}$) and $(\partial\mu/\partial Q)$ is in units of D Å$^{-1}$ u$^{-1/2}$. Thus, the frequency shift in cm$^{-1}$ for the $\alpha$th mode due to TDC is
$$\Delta\nu_\alpha = (\Delta V_{dd}^{AB}/hc) \tag{78}$$

$$\phantom{\Delta\nu_\alpha} = 5034\,(\Delta\mu_\alpha)^2\,X_{AB} \tag{79}$$
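The geometrical factor of Eq. (72) and the numerical prefactor in the frequency-shift expression above can be reproduced with a short sketch. It assumes $\varepsilon = 1$ as in the text, dipole positions in Å, and a transition moment in debyes; the prefactor is obtained here by converting D$^2$ Å$^{-3}$ to cm$^{-1}$ through $hc$ in CGS units. The function names and the two test geometries are illustrative assumptions, not configurations taken from the text.

```python
import math

H_PLANCK = 6.62607015e-27   # erg s
C_LIGHT = 2.99792458e10     # cm/s
DEBYE = 1.0e-18             # esu cm
ANGSTROM = 1.0e-8           # cm

# Energy of 1 D^2 / A^3 expressed as a wavenumber (cm^-1).
SHIFT_FACTOR = DEBYE ** 2 / ANGSTROM ** 3 / (H_PLANCK * C_LIGHT)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def geometrical_factor(dir_a, dir_b, pos_a, pos_b):
    """X_AB = [eA.eB - 3 (eA.eAB)(eB.eAB)] / R_AB^3 in A^-3, as in Eq. (72)."""
    e_a, e_b = unit(dir_a), unit(dir_b)
    sep = [b - a for a, b in zip(pos_a, pos_b)]
    dist = math.sqrt(dot(sep, sep))
    e_ab = [s / dist for s in sep]
    return (dot(e_a, e_b) - 3.0 * dot(e_a, e_ab) * dot(e_b, e_ab)) / dist ** 3


def tdc_shift(delta_mu, x_ab):
    """Frequency shift in cm^-1 for a transition moment delta_mu in debyes."""
    return SHIFT_FACTOR * delta_mu ** 2 * x_ab


# Two parallel dipoles 5 A apart: side by side, then head to tail.
side_by_side = geometrical_factor((0, 0, 1), (0, 0, 1), (0, 0, 0), (5, 0, 0))
head_to_tail = geometrical_factor((0, 0, 1), (0, 0, 1), (0, 0, 0), (0, 0, 5))
print(SHIFT_FACTOR, side_by_side, head_to_tail, tdc_shift(0.348, side_by_side))
```

The sign change between the two geometries illustrates why the shifts are sensitive to the relative arrangement of the peptide groups: the same pair of transition moments raises the frequency in one arrangement and lowers it in the other.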
It follows from Eq. (23) that $\Delta\nu_\alpha$ is also given by
Finally, we note that integrated intensities, $A$, in cm mmol$^{-1}$ are also related to transition moments by (Person and Zerbi, 1982)

$$A = (N_A\pi/3c^2)\,|\partial\mu/\partial Q|^2 = 4225.47\,|\partial\mu/\partial Q|^2 \tag{81}$$
where $N_A$ is Avogadro's number. Since $\Delta\nu$ and $A$ are measurable quantities, and the $\partial\mu/\partial r$ can be calculated by quantum-mechanical methods (Cheam and Krimm, 1985) (see Section VII,B,3), it is possible to evaluate from experiment the internal consistency of the TDC hypothesis for explaining band splittings in polypeptide systems. For (Gly)$_n$I, Moore and Krimm (1976a) found that amide I splittings could be accounted for by TDC using $|\Delta\mu| = 0.348$ D oriented at 20° to the CO bond. From Eq. (77) we find that, for $\nu_0 = 1672$ cm$^{-1}$ (Moore and Krimm, 1976a), this corresponds to
$|\partial\mu/\partial Q| = 3.466$ D Å$^{-1}$ u$^{-1/2}$, and from Eq. (81) we obtain an integrated intensity of 50,760 cm mmol$^{-1}$. Measured intensities (Chirgadze et al., 1973) fall in the range 47,000-61,000 cm mmol$^{-1}$ for ordered β structures and 30,000-51,000 cm mmol$^{-1}$ for random conformations. From ab initio calculations on a hydrogen-bonded NMA molecule (Cheam and Krimm, 1984c, 1985), values of $|\partial\mu/\partial Q|$ were found to be 3.144 D Å$^{-1}$ u$^{-1/2}$ (24° to CO) for the A, mode and 3.065 D Å$^{-1}$ u$^{-1/2}$ (29° to CO) for the B, mode. Thus, the transition dipole moment obtained from band splittings is in good agreement with experimentally measured band intensities and with values calculated by quantum-mechanical methods. It should be noted that similar agreement was obtained for TDC parameters of amide II modes (Cheam and Krimm, 1984c).

Two important conclusions emerge from these results. First, although the designation $\nu(\delta,\delta')$ can still be used to describe normal modes of the APPS structure, the formalism of Eq. (71) can no longer be considered an adequate basis for explaining the splittings, nor can it be extended in general to other structures. Since TDC interactions were summed over a sphere of radius 30 Å, and since the assumed value of $\Delta\mu$, which is in substantial agreement with that obtained from ab initio calculations, can account for the observed splittings, the contributions from valence force-field interactions represented by the $D_{10}$ and $D_{01}$ terms must be negligible. Second, since the frequency shifts depend on $X_{AB}$ in Eq. (72), which in turn is a function of the geometrical arrangement of the transition dipole moments, amide I (and amide II) splittings provide a sensitive measure of the relative spatial arrangement of peptide groups in a polypeptide chain. We will see in the various examples discussed below how this dependence manifests itself in practice. In this connection, it should be noted that the relative values of the $L_{i\alpha}$ in Eq.
(76) depend on the relative phase of the vibrations in the interacting groups: if, as in the case of the β sheet, symmetry determines this angle to be 0 or $\pi$, then only the value of $X_{AB}$ is affected; if symmetry is not operative, then the relative magnitudes of the $L_{i\alpha}$ must be obtained from the eigenvector for the overall normal mode.

3. Least-Squares Refinement

Once a force-field model has been chosen, we need a mechanism whereby a set of force constants can be selected that gives optimal prediction of a (usually larger) set of observed frequencies. This could be done by a manual trial-and-error adjustment, but a least-squares fitting procedure is more satisfactory (for discussions of these methods see Duncan, 1975; Califano, 1976; Gans, 1977; Zerbi, 1977). In the present discussion we assume that we have “reasonable” starting force constants
and that observed bands have been properly assigned to calculated normal modes. Both of these points are discussed below. Suppose that we have a starting set of arbitrary force constants $f$ and we wish to minimize the sum of the weighted squared errors:

$$\chi_0 = \sum_i w_i\,(\nu_i^{obs} - \nu_i^{calc})^2 = \tilde{\delta}W\delta \tag{82}$$
where $\delta_i = \nu_i^{obs} - \nu_i^{calc}$ and $w_i$ is a weighting factor. If we change the force constants by $\Delta f$, this leads to a change in $\nu^{calc}$ of $\Delta\nu$, and the new sum is

$$\chi = (\tilde{\delta} - \Delta\tilde{\nu})\,W\,(\delta - \Delta\nu) = \tilde{\delta}W\delta - \tilde{\delta}W\,\Delta\nu - \Delta\tilde{\nu}\,W\delta + \Delta\tilde{\nu}\,W\,\Delta\nu \tag{83}$$

We want to minimize $\chi$ not in terms of the frequencies but in terms of the force constants, which we are trying to adjust. We therefore assume (Long et al., 1963) that a linear relation holds for small changes, that is,

$$\Delta\nu = J\,\Delta f, \tag{84}$$
where J is the Jacobian matrix with elements], = au:ll'/af,, so that Eq. (83) can be written as
x
6WS
=
T h e minimum in
-
6WJ AF
+ AiJWJ Af
(85)
+ 2JWJ Af
(86)
(JWJ)-'JW(vobS- v'"')
(87)
-
AfJWS
x is given by
(axla A f ) = 0 = -JWS
- JWS
from which
Af
=
(JWJ)-lJWS
=
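As an illustration of Eq. (87), the single update step can be sketched numerically. This is a minimal sketch and not the authors' program: the function name `force_constant_step` and the use of NumPy are assumptions made here, and the weights are taken to form a diagonal matrix W.

```python
import numpy as np

def force_constant_step(J, w, nu_obs, nu_calc):
    """Return (delta_f, chi0) for one weighted least-squares step.

    delta_f = (J^T W J)^-1 J^T W delta   [Eq. (87)]
    chi0    = delta^T W delta            [Eq. (82)]
    J is the No x Nf Jacobian, w the No weights (diagonal of W).
    """
    J = np.asarray(J, dtype=float)
    W = np.diag(np.asarray(w, dtype=float))
    delta = np.asarray(nu_obs, dtype=float) - np.asarray(nu_calc, dtype=float)
    normal = J.T @ W @ J                      # the matrix (J~ W J)
    # Solve the normal equations rather than forming the inverse explicitly.
    delta_f = np.linalg.solve(normal, J.T @ W @ delta)
    chi0 = float(delta @ W @ delta)
    return delta_f, chi0
```

When the frequencies depend exactly linearly on the force constants, one such step recovers the full correction; for a real force field the step has to be repeated, since Eq. (84) holds only for small changes.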
Equation (87) permits us to calculate the $\Delta f$ that will minimize $\chi$. In practice, the procedure for getting a new set of $f$ is iterated until either a specified number of cycles is complete or $\chi$ becomes smaller than a preassigned value. The dispersions in the refined force constants after the Kth cycle are given by
where $N_o$ is the number of observed frequencies and $N_f$ is the number of force constants being varied. The $J$ matrix can be obtained once we have defined the $F$ matrix in terms of $f$, which is done through a $Z$ matrix that depends on the force field being used. Thus, with

$$F = \sum_i Z^i f_i \tag{89}$$

and using Eq. (23), $\Lambda = \tilde{L} F L$, we have
VIBRATIONAL SPECTROSCOPY OF PEPTIDES AND PROTEINS
and
The scheme that we use for such a force-constant refinement is shown in Fig. 5. We will not go into the details, but it is important to be aware of the fact that there are basic problems in the least-squares refinement of force constants (see previous references). Aside from the limitations imposed by the force-field model and the assumption of its transferability between (assumed) common structures, there are two main categories of problems. The first has to do with the handling and weighting of experimental data. We want as large a number of experimental frequencies as possible so as to reduce the dispersions in the force constants, as shown by Eq. (88). This can be achieved by using a large set of similar molecules as well as isotopic species (although the latter introduce problems associated with different anharmonicities). As seen from Eq. (82), we need to assign weights to the observed frequency differences, and this is often not a straightforward procedure. We usually assign large weights to well-assigned bands, particularly if they are not weak, but this procedure is difficult to quantify. The second kind of problem is mathematical, and there are several of these. The minimization procedure is based on the assumed linearity of the Jacobian elements [Eq. (84)], and this may not be valid if the starting force field is a “poor” one. If the determinant of the matrix (J̃WJ) is close to zero, then inverting it to get Δf [viz., Eq. (87)] can lead to large errors, even singularities; the best way to minimize this possibility is to have at least three to four times as many frequencies as force constants, distributed so that enough observed frequencies occur in a region to which a particular force constant is contributing (Zerbi, 1977). Finally, the least-squares calculation may not have a unique solution. All of these issues need careful evaluation in arriving at a satisfactory force field.

4. Valence Force Field for the Polypeptide Chain
In developing a transferable SGVFF for the polypeptide chain, we have utilized strategies based on all of the considerations discussed thus far in this article. We review these developments before presenting the details of the force field.
FIG. 5. Schematic diagram of force-field refinement program. [Flowchart boxes, in order: START; INPUT: number of force constants, number of variable force constants (Nf), number of observed frequencies (No), number of cycles (Nc), value of refinement index δ; INPUT: read F matrix elements, read G matrix elements; INPUT: read force constant names, values of observed frequencies, and weighting elements; compare observed data with computed frequencies; form the Jacobian matrix; modify the variable force constants, Δf = (J̃WJ)⁻¹J̃Wδ; convergence not possible within prescribed number of cycles; print number of cycles, refinement index δ, computed and observed frequencies.]
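The cycle summarized in Fig. 5 can be sketched as a short loop. This is only an illustrative sketch under stated assumptions, not the program of Fig. 5: the names `refine`, `numerical_jacobian`, and `toy_model` are hypothetical, the Jacobian is estimated by finite differences rather than from the L, F, and Z matrices, and the toy frequency model (with its scale factor of 1000) is invented purely to exercise the loop. The condition-number guard reflects the caveat above that a nearly singular (J̃WJ) can produce large errors in Δf.

```python
import numpy as np

def numerical_jacobian(model, f, step=1e-6):
    """Finite-difference estimate of J[k, i] = d nu_k / d f_i (an assumption;
    an analytic Jacobian would normally be used)."""
    f = np.asarray(f, dtype=float)
    nu0 = model(f)
    J = np.empty((nu0.size, f.size))
    for i in range(f.size):
        shifted = f.copy()
        shifted[i] += step
        J[:, i] = (model(shifted) - nu0) / step
    return J

def refine(model, f_start, nu_obs, w, max_cycles=20, chi_tol=1e-8, max_cond=1e10):
    """Iterate Delta f = (J^T W J)^-1 J^T W delta until chi < chi_tol
    or max_cycles is reached. Returns (f, chi, cycles_used, converged)."""
    f = np.asarray(f_start, dtype=float).copy()
    nu_obs = np.asarray(nu_obs, dtype=float)
    W = np.diag(np.asarray(w, dtype=float))
    for cycle in range(max_cycles):
        delta = nu_obs - model(f)             # compare observed and computed
        chi = float(delta @ W @ delta)
        if chi < chi_tol:
            return f, chi, cycle, True
        J = numerical_jacobian(model, f)      # form the Jacobian matrix
        normal = J.T @ W @ J
        if np.linalg.cond(normal) > max_cond:
            raise np.linalg.LinAlgError("J^T W J is nearly singular")
        f = f + np.linalg.solve(normal, J.T @ W @ delta)  # modify force constants
    delta = nu_obs - model(f)
    chi = float(delta @ W @ delta)
    return f, chi, max_cycles, chi < chi_tol

def toy_model(f):
    """Hypothetical model: each squared frequency is a fixed linear mix
    of two force constants (not a real GF calculation)."""
    A = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.8, 0.2]])
    return 1000.0 * np.sqrt(A @ np.asarray(f, dtype=float))

if __name__ == "__main__":
    f_true = np.array([5.0, 0.5])
    f_fit, chi, cycles, ok = refine(toy_model, [4.5, 0.6], toy_model(f_true), np.ones(4))
    print(f_fit, chi, cycles, ok)
```

In this toy case there are four frequencies for two force constants; a real refinement would also need the weighting choices and assignment checks discussed in the text.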
In order to have a reasonable starting set of peptide force constants, a complete analysis was done on NMA and its deuterated derivatives as well as on some nylons and their deuterated derivatives (Jakeš and Krimm, 1971a). Force constants for the hydrocarbon portions of these molecules were transferred from the elegant work of Schachtschneider and Snyder (1963) on n-paraffins. A total of 125 NMA frequencies and 306 nylon frequencies were fit with a force field of 27 transferred and 84 refined force constants, with an average error of |Δν| = 4.3 cm⁻¹, and a total of 51 frequencies having an error greater than 10 cm⁻¹. The ratio of observed frequencies to refined force constants, No/Nf, was therefore 431/84 = 5.1. This force field was then used as a starting point in refining a force field for (Gly)nI (Abe and Krimm, 1972a), with only modest success. A total of 109 frequencies of (Gly)nI and its isotopic derivatives were fit using 34 transferred or reasonably fixed and 38 refined force constants, giving No/Nf = 2.9. However, this refinement resulted in |Δν| = 9.1 cm⁻¹, with 26 frequencies having |Δν| > 10 cm⁻¹. When this force field was used as a starting point in a global refinement of (Gly)nI, β-(Ala)n, and β-poly(L-alanylglycine) [β-(AlaGly)n], much better results were obtained. In this case, 131 frequencies were fit using 74 transferred or reasonably fixed and 47 refined force constants, for No/Nf = 2.8, and we obtained |Δν| = 4.9 cm⁻¹, with 16 frequencies for which |Δν| > 10 cm⁻¹. In the most recent refinement (Dwivedi and Krimm, 1982a,b,c, 1984a), we have added (Gly)nII and α-(Ala)n frequencies as well as those from an (unpublished) analysis of β-(AlaGly)n. Frequencies of deuterated molecules were used only as guides in the initial stages of the refinement, and some force constants were allowed to vary slightly from one structure to another.
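The No/Nf ratios quoted above can be checked with a few lines of arithmetic. The sketch below only restates numbers given in the text; the dictionary labels and the function name are illustrative choices, and the comparison value of 3 is the lower end of the "three to four times" guideline cited earlier from Zerbi (1977).

```python
# (number of observed frequencies, number of refined force constants),
# as quoted in the text for each refinement
refinements = {
    "NMA and nylons": (125 + 306, 84),
    "(Gly)nI": (109, 38),
    "global (Gly)nI, beta-(Ala)n, beta-(AlaGly)n": (131, 47),
}

def frequency_to_constant_ratio(n_obs, n_refined):
    """Ratio No/Nf of observed frequencies to refined force constants."""
    return n_obs / n_refined

for name, (n_obs, n_refined) in refinements.items():
    ratio = frequency_to_constant_ratio(n_obs, n_refined)
    flag = "meets" if ratio >= 3.0 else "falls below"
    print(f"{name}: No/Nf = {n_obs}/{n_refined} = {ratio:.1f} ({flag} a ratio of 3)")
```

Only the first refinement clears the guideline comfortably; the later ones sit just under it.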
The set of 198 frequencies of the native molecules is reproduced with |Δν| = 5.0 cm⁻¹ and 13 frequencies with |Δν| > 10 cm⁻¹. This force field is given in Table VI, and is the basis for our structural analyses. For cases where the backbone conformation is the major interest, it has seemed desirable to have a force field in which the side chain is approximated by a point mass. We have refined such force fields for β-(Ala)n and α-(Ala)n, starting from the detailed force field of Table VI (Dwivedi and Krimm, 1984b). This approximate force field gives frequency and eigenvector agreement comparable to that obtained with the full calculation (see Sections III,C,1 and IV,B,1). The criteria discussed so far for judging the suitability of a force field have involved the extent of agreement between observed and calculated frequencies (we have assumed proper assignments of bands, which we discuss below). Recent ab initio calculations of dipole derivatives
TABLE VI General Valence Force Constants for Different Polypeptide Chains Valueb for different
4.523 (4.823) 4. I60 6.415 9.882 5.674 e 4.4628
4.323
5.043
4.843
* *
* *
4.409
4.409
10.029 5.830
9.955 5.752
4.323
(*I
* *
5.840 4.564
4.523
4.564 4.980 (5.280) 4.800 0.150
* (5.080)
*
*
*
0.120
0.135
0.125
*
9.62 1 5.720 5.856 4.564 4.430 4.564 4.780
0.160 0.110
0.0027 0.819 0.765
* 1.119 (0.819)
1.119
*
0.050
0.819
1.150
0.715 0.715
0.715 0.7 15 0.785 0.715
*
*
(1.093) 1.446 1.033
1.446 1.033
1.246 1.400
1.166 1.300
1.046 0.687 0.556
1.046 0.687 0.556
1.246
1.166
0.527
0.5259 (0.5759) 0.566
0.826
0.826
0.687
0.537 0.532 0.487
0.556
0.556
0.527
0.684
0.654
1.193 (1.0783) 1.306 0.933 (0.833) 1.306 0.677 0.566
(*)
* 0.684
0.537 0.532
*
0.684 0.684 0.684
TABLE V I (Continued) Valueb for different polypeptidesc,"
CC"C0 H"C"H H"C"CB HUH C@CUC@ C"H" ... 0 ib CO ... H ib CO -.* H" ib NH ... 0 ib NH ... O b ib CO ob Cob ob NH ob NC" t (2°C t CN t NHt
co t
C"C@t NC", C"C NC", C - V C"C, CN C"C, co C"C, C"C@ C"H", C"H C"H", C"Hb C"H", H" ... 0 C a p , Hn ... Ha C"C@,C"C0 CBH, C@H CN, NC" CN, CO NC", CNC" NC", NC"C NC", NC", NC", NC",
NC"H" NC"H CC"Ha CC"H
1.181 (0.8687)
*
*
(0.981) 0.584
0.5 175 (0.5275) 0.524
0.6147 (0.6647)
*
0.560
* 1.181
0.0 10
*
*
*
0.020
*
0.036
0.020
0.020
0.0506
0.621
0.657
0.687
0.587
0.159 0.037 0.037 0.680 0.0005 0.001 0.1 10 0.300 0.101 (0.300) 0.300 0.500 0.101
0.129 0.087 0.060
0.129 0.087 0.060
0.129
0.020 0.057 0.050 0.487 0.537 0.129
*
0.0003
0.00 15
0.0035
0.100
0.090 0.100
*
*
* *
* *
* *
* *
0.010
-0.015 -0.050 0.080
*
* * *
*
*
* *
0.026
*
* *
* * * (0.150)
*
*
0.100
*
* * *
* *
*
* * *
*
*
*
0.517 0.517
0.517 0.517
0.026
0.026
0.427
*
* * *
0.301
-0.0075 0.071 0.300 0.500 0.300 (0.600) 0.300 (0.600) 0.627
* * *
*
*
*
(Continued)
TABLE VI (Continued) Value*for different polypeptidesCsd Force constant?
P-(Ala).
NC', C"NH NC", NC"U
0.294 0.417 (0.717) 0.079 (0.129) 0.200
NC", H°CnCe NC", CC"C8 NC", CaC"C@ C"C, NC"C C"C, C"CN C"C, C"C0 C"C, NC"H" C"C, NC"H C"C. CC"H"
a-(Ala),
(Aib),
(Gly),I
(Gly). I1
*
* *
*
*
*
*
*
0.100
*
* *
*
*
0.026
0.026
0.205
0.205
* *
* *
*
*
* *
* * *
*
(0.217)
*
* * *
0.300 0.300 0.200 (0.300) 0.026
0.100
0.205
0.305
* 0.200
*
*
*
*
*
(0.100)
C"C, CC"H C"C, CC"U
C"C, NC"Q C"Ca, NC"H" Coca, NC"C@ Coca, CC"H" C"C0, CC"C0 Coca, H"C"C@ C"C@,C"CSH Coca(I ) , NC"Ca(2) C"Cq I ) , CC"CO(2) coca, C@C"Ce CN, C"CN CN, CNC" CN, NCO CN, CNH c o , C"C0 CO, NCO CO, C°CN NC"C, C"CN NC"C, NC"H"
0.367 (0.667) 0.079 (0.029) 0.000 0.079 0.6 17 (0.317) 0.079 0.4 17 0.415 0.353
0.300 0.300 (0.450) 0.200 0.294 0.450 0.450 0.050 (0.170) 0.000 -0.03 1
*
*
* 0.000
0.100
0.517 (0.317)
0.517
* * *
* * * * (0.600)
*
* 0.403 0.030 0.030 0.5 17
* *
* *
0.000 (0.150) 0.160
0.000
.0.150
0.160
*
*
*
*
* * *
*
*
-0.150
* *
TABLE VI (Continued) Valueb for different polypeptides'*" Force constant0 NC"C, NC"H NC"C, C"NH NC"C, NC"Ca NC"C, CCnCa NC"C, CO ob NC"C, NH ob NC"H", CC"H" NC"H", H"C"C@ NC"H", NC"H NC"H", HC"H" NC"H", NH ob NC"H, CC"H NC"H, HC"H" NC"H, NH ob NC"C@,CC"C@ NC"C@,H"C"0 trans-(NC"Q, CnCaH) gauche-(NC"C@,C"CaH) NC"Ca, CaC"C@ NC"C@,NH ob NCO, CNH C"CN, CNH C"C0, CC"Q C"C0, CC"H" C"C0, CC"H C"NH, CNH C"NH, NC"H" C"NH, NC"H C"NH, N C W CC"H", HnCuC@ CC"H", CC"H CC"H", HC"H" CC"H", CO ob CC"H, HCnHn CC"H, CO ob CC"C@,H"C"Ca CC"Ca, CO ob CC"C@,CC"C@ CC'CS, C@C"C@
P-(Ala). -0.100 0.200 (0.150) -0.041 (-0.141) -0.1725 0.1 10 0.019 0.043
0.1022
-0.041 -0.031 -0.049 0.040 0.120 0.251 0.200 0.150 0.038 0.100 0.000 0.000 (0.096)
a-(Ala),
(Aib).
* *
* 0.100
*
*
-0.073 0.160
-0.073 0.160
* *
* * *
*
*
* 0.000 -0.031 0.060
*
* 0.0065
* *
*
0.150
-0.100
-0.03 1 0.162
-0.050
(Gly). I1
-0.031
-0.031
-0.0725 0.1092
-0.0725 0.1092
0.0463 0.0615
0.000 0.06 15
0.019 0.0615 0.0456
0.0 19 0.0615 0.0456
*
*
*
0.000
* *
(Gly),,1
* *
0.200
0.0065 0.050
* * * 0.100 0.0065
*
0.031
-0.032 0.0398 0.100 0.0398 0.100
*
*
*
*
* * * 0.100 0.0065 0.05 1 0.06 1
-0.032 0.0398 0.100 0.0398 0.100
* 0.100 -0.031 (Continued)
TABLE VI (Continued) Valueb for different
C"C@H,C"C@H ~Y~TLS-(C"C@H, H"C"C@) gauche-(C"C@H,H"C"C@) trans-(C@C"C@, C"C@H) gauche-(C@C"C@, C"C@H) CNCa, C"NH CNC", NC"H" CNC", NC"H CNC", NC"C@ CO ob, NH ob NH 0 ib, NH ob CO ob, d N t NH ob? CN t
-0.045
0.122 0.100
-0.020
*
*
0.000
-0.040 0.100
* *
0.000 0.000 0.007 0.01 11 -0.1477
0.000 -0.050
* * *
0.100 0.010
*
*
*
0.000 0.000
0.000 0.270
*
0.010 0.000
0.010 -0.005
*
-0.1677
-0.1677
0.100 -0.050
*
*
*
a AB, AB bond stretch; ABC, ABC angle bend; X, Y, XY interaction; ib, in-plane bend; ob, out-of-plane bend; t, torsion.
b Units: mdyn/Å for stretch and stretch, stretch constants; mdyn for stretch, bend constants; and mdyn Å for all others.
c Values in parentheses are for force field with CH3 replaced by point mass (all other constants are the same).
d Asterisk indicates that constant is the same as for β-(Ala)n.
e Blank space indicates inapplicable or unused constant.
f Subscript b denotes constant applicable to bifurcated hydrogen bond.
(∂μ/∂Q) of the peptide group (Cheam and Krimm, 1985) indicate that intensities and orientations of ∂μ/∂Q provide a very sensitive test of a force field. For example, it was found that for NMA our (Gly)nI force field gave good reproduction of intensities of amide modes and excellent agreement with measured directions of ∂μ/∂Q for amide I and amide II modes (cf. discussion in Section II,D,2,c). Comparable agreement was not found for other force fields (Cheam and Krimm, 1985). A similar calculation gave excellent agreement with observed intensities of (Gly)nI (see Section III,B,1). We expect that in the future such intensity information will also be utilized in the early stages of force-field refinements.

E. Band Assignments

The discussion thus far has made the assumption that observed bands in the IR and Raman spectra have been properly correlated with calculated normal modes, i.e., that we know that the atomic displacements associated with an observed frequency are essentially similar to those in the eigenvector of the corresponding calculated frequency. Otherwise, for example, there is no basis for obtaining a (ν_p^obs − ν_p^calc) term in Eq. (82). Proper band assignments are therefore a vital element in a vibrational analysis, and must be determined independently of a normal-mode calculation. We review here briefly the approaches that are used to make such band assignments.
1. Group Frequencies

Over the many years of studying IR and Raman spectra of molecules, many correlations have been established, from both theoretical and experimental studies, between characteristic vibrational modes and the frequency regions in which they are found. A number of books and articles have discussed these correlations (Bellamy, 1975; Parker, 1983; Frushour and Koenig, 1975c; Spiro and Gaber, 1977; Tsuboi, 1977), and it is natural that we make general use of these in an initial approach to assigning bands. For the peptide group, the analysis of NMA has provided an important benchmark, and thus the results shown in Table II represent guidelines for identifying general regions in which certain types of vibrations are expected. In addition, we know where to expect other kinds of modes (CH stretch in the region 3000-2800 cm⁻¹, HCH bend in the region 1500-1400 cm⁻¹, etc.), how relevant frequencies will shift with the strength of hydrogen bonds, and some general effects of environmental factors. Specific questions can often be answered by studying small model compounds or simple variants of the molecule in question. What the group-frequency approach usually cannot do is to provide a specific assignment for an arbitrary band in a complex spectrum. This becomes obvious when we recognize that side-chain modes overlap regions of main-chain modes, and that multiplicity of peptide groups in a repeat unit (cf. the four in the APPS structure) in general multiplies the number of observable bands. In the latter case, it then becomes important to know with which vibrational phase angle a particular band is associated. We therefore need other methods to provide the required detailed band assignments.

2. Symmetry: Activity and Dichroism

Molecular symmetry imposes constraints on the nature of the normal modes of vibration, and these are reflected in observable characteristics in the spectrum. The theoretical basis for symmetry has been widely presented (see, for example, Wilson et al., 1955; Zak et al., 1969; Woodward, 1972), and will not be discussed here. We will only illustrate the results of its application in one case, that of (Gly)nI.
TABLE VII Symmetry Species and Selection Rules for Crystalline Polyglycine I Species Pleated sheet (D2) A B1 B2
Bs Rippled sheet (Czh) A, A, B" B,
Symmetry
46,6')
4 0 , 0) 40,9 ) v ( n , 0) 477, a)
v(0, 0) 40,a) v ( a , 0) v ( a , a)
C!Ca) 1
C2b)
Cdc)
-1 1 -1
1 1 -1 -1
1 -1 -1 1
Ca(b)
i
1 1 -1
1 -1 -1 1
-1
Number of modes
Activity"
Lattice vibrationsb
21 20 20 20
R R, WII) R, IR(I) R, IR(I)
R, Tb R T, Ta
21 20 19 21
R IR(II) IR(I) R
R, Tb R
u& 1 -1 1 -1
Ta, Tc
a R, Raman; IR, infrared. b R, rotatory; T, translatory modes.
Two structures have been proposed for (Gly)nI: an antiparallel-chain pleated sheet (APPS) and a similar rippled sheet (APRS) (see Section III,B,1). These structures have different symmetries: the APPS, with D2 symmetry, has twofold screw axes parallel to the a axis [C2^s(a)] and the b axis [C2^s(b)], and a twofold rotation axis parallel to the c axis [C2(c)]; the APRS, with C2h symmetry, has a twofold screw axis parallel to the b axis [C2^s(b)], an inversion center, i, and a glide plane parallel to the ac plane, σ_g^ac. Once these symmetry elements are known, together with the number of atoms in the repeat, it is possible to determine a number of characteristics of the normal modes: the symmetry classes, or species, to which they belong, depending on their behavior (character) with respect to the symmetry operations; the numbers of normal modes in each symmetry species, both internal and lattice vibrations; their IR and Raman activity; and their dichroism in the IR. These are given in Table VII for both structures. For the APPS structure, the modes divide into four symmetry species. The A species modes, of which there are a total of 21, are totally symmetric with respect to the symmetry operations (i.e., the characters are all 1), are only Raman-active, and include a rotatory and a translatory lattice mode. The B1 species modes, 20 in number, are antisymmetric (i.e., with character −1) with respect to C2^s(a) and C2^s(b), can exhibit activity in both Raman and IR (of parallel dichroism; see below), and include a rotatory lattice mode. The B2 and B3 species modes, both 20 in number, also exhibit activity in Raman and in IR (of perpendicular dichroism), and each set includes a translatory lattice mode. The predictions for the APRS structure are different. The four symmetry species divide up into two (Ag, Bg) whose modes are only Raman-active and two (Au, Bu) whose modes are only IR-active (this mutual exclusion is a result of the i symmetry). Also, the relative number of ν(π, 0) and ν(π, π) modes is different, and this difference is entirely in the (low-frequency) translatory lattice modes. These predictions provide important guidelines in assigning bands. Of course, an assignment must be consistent with the prediction of activity for the mode in question, but an even more useful property is the expected IR dichroism. We noted in Eq. (81) that the integrated IR absorbance in a band is proportional to (∂μ/∂Q)². If polarized radiation is incident on an oriented sample, another condition must be satisfied, viz., that there be a nonvanishing component of (∂μ/∂Q) along the direction of the electric field vector, E; in fact, the absorbance is proportional to [E · (∂μ/∂Q)]². The direction of maximum absorbance of a band with respect to the known axis of orientation of the polypeptide chain (i.e., the dichroism) can be measured experimentally. Thus, the predictions from symmetry analysis permit us to restrict, or in the case of parallel bands to identify, the symmetry species to be associated with an observed band. This can be an important tool in making band assignments.

3. Isotopic Substitution

Bands can often be assigned by studying their frequency behavior when an isotopic substitution is made in the molecule.
An obvious case is the effect of N-deuteration, which, for example, results in the replacement of an NH stretch mode near 3300 cm⁻¹ by an ND stretch mode in the 2500-2400 cm⁻¹ region; or in the disappearance of amide II, III, and V modes (the first two of which involve major contributions from NH ib and the third of which involves NH ob) and the appearance of ND modes at lower frequencies. For modes with minor contributions from NH deformation, normal-mode calculations are a very important guide in assigning bands: calculations for the N-deuterated molecule indicate explicitly the behavior of the residual mode when the NH contribution is removed, as well as how the ND contribution may mix with other modes in its spectral region, both aspects of which may be specific to the particular structure. Substitution of ND for NH is a simple procedure (often just involving treatment with D2O), and has been used frequently. Substitutions of ¹⁵N for ¹⁴N or ¹³C for ¹²C have been used much less frequently, but may be a more powerful tool for studying structure. As is discussed with respect to the γ turn (Section V,C,1), such isotopic substitution at strategic sites, although resulting in relatively smaller frequency shifts, produces shifts that are dependent on the conformation of the local region of the molecule. These subtle changes must be interpreted in terms of the results of normal-mode calculations, but they provide a deeper insight into structure as well as band assignments.

4. Overtone and Combination Bands: Fermi Resonance
All of the preceding discussions have dealt with bands that are associated with the excitation of individual normal modes, i.e., the fundamental frequencies. Although only such transitions are permitted for a harmonic oscillator, the vibrations of real molecules are anharmonic, and in such cases double excitations of a normal mode (resulting in overtone bands) and single excitations of two different normal modes (resulting in combination bands) are allowed. Analysis of such bands often leads to information on the assignments of the fundamentals, and is therefore of importance. In this connection, we need to know not only the rules for the appearance of such bands, but also that they are often perturbed by an interaction known as Fermi resonance. Overtone and combination bands belong to symmetry species determined by the species of their fundamentals. We can determine this symmetry by multiplying the characters for the fundamentals. Thus, overtones of all species have the character of the totally symmetric species, A for D2 symmetry and Ag for C2h symmetry (see Table VII). For combinations, this rule implies that, for D2 symmetry, a B1 mode combining with a B3 mode produces a combination of B2 symmetry, etc., while for C2h, Bg combining with Bu results in a band of Au symmetry, etc. (see Table VII). Overtone and combination bands are usually weak in comparison with the fundamentals. However, when the frequency of such a combination falls close to that of another fundamental of the same symmetry species, a Fermi resonance interaction occurs, which results in a sharing of intensity between the two modes as well as frequency shifts in both. This occurs, for example, in the interaction between the NH stretch mode, $\nu_A^0$, and overtones or combinations of amide II modes, $\nu_B^0$ (Miyazawa, 1960b).
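The rule of multiplying characters to find the species of an overtone or combination can be sketched in a few lines. This is an illustrative sketch only: the character values are those of the standard D2 and C2h point-group tables, while the ordering of the operations and the particular assignment of B2 and B3 to the a and b axes are assumptions made here and may differ from the convention of Table VII. The function name `product_species` is hypothetical.

```python
# Characters under the three non-identity operations of each group.
# D2: (C2(a), C2(b), C2(c)); B1 is antisymmetric to C2(a) and C2(b), as in the text.
# The B2/B3 axis labels are an assumed convention.
D2 = {
    "A":  (1, 1, 1),
    "B1": (-1, -1, 1),
    "B2": (-1, 1, -1),
    "B3": (1, -1, -1),
}

# C2h: (C2(b), i, sigma(ac))
C2H = {
    "Ag": (1, 1, 1),
    "Au": (1, -1, -1),
    "Bg": (-1, 1, -1),
    "Bu": (-1, -1, 1),
}

def product_species(table, first, second):
    """Species of a combination band: multiply characters operation by operation."""
    chars = tuple(x * y for x, y in zip(table[first], table[second]))
    for name, reference in table.items():
        if reference == chars:
            return name
    raise ValueError("product is not a species of this group")

print(product_species(D2, "B1", "B3"))    # combination in D2
print(product_species(C2H, "Bg", "Bu"))   # combination in C2h
print(product_species(C2H, "Bu", "Bu"))   # overtone: totally symmetric
```

Because all species here are one-dimensional, the product is always a single species, and every overtone comes out totally symmetric.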
From measurements on the frequencies and intensities of the observed bands, $\nu_A$ and $\nu_B$, it is possible to obtain the frequencies of the unperturbed fundamental, $\nu_A^0$, and combination, $\nu_B^0$. The relation is given by (Miyazawa, 1960b)

$$\nu_A = \tfrac{1}{2}\left[(\nu_A^0 + \nu_B^0) + s\right] \quad \text{and} \quad \nu_B = \tfrac{1}{2}\left[(\nu_A^0 + \nu_B^0) - s\right] \tag{92}$$

where $s = \nu_A - \nu_B$. It can also be shown that

$$(I_B / I_A) = (s - \delta)/(s + \delta) \tag{93}$$

where $\delta = \nu_A^0 - \nu_B^0$. Thus, measurements of the frequencies and intensities of the observed amide A and amide B bands permit a determination of $\nu_A^0$ and $\nu_B^0$, and from the latter it is usually possible to infer the fundamental frequencies involved (Krimm and Dwivedi, 1982a).
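Equations (92) and (93) can be inverted to recover the unperturbed frequencies from the observed ones. The sketch below is one way to do this, assuming only those two relations: Eq. (93) gives δ = s(1 − R)/(1 + R) with R = I_B/I_A, and adding the two parts of Eq. (92) shows that the sum of the frequencies is unchanged by the resonance. The function name and the numerical inputs are hypothetical, chosen only to resemble an amide A/amide B pair.

```python
def unperturbed_frequencies(nu_a, nu_b, intensity_ratio_b_over_a):
    """Return (nu_a0, nu_b0) from observed nu_a, nu_b and R = I_B / I_A.

    s     = nu_a - nu_b                          (observed separation)
    delta = s * (1 - R) / (1 + R)                (from Eq. 93)
    nu_a0 + nu_b0 = nu_a + nu_b                  (from Eq. 92)
    """
    s = nu_a - nu_b
    r = intensity_ratio_b_over_a
    delta = s * (1.0 - r) / (1.0 + r)
    mean = 0.5 * (nu_a + nu_b)
    return mean + 0.5 * delta, mean - 0.5 * delta

# Hypothetical observed values (cm^-1) and intensity ratio, for illustration only.
nu_a0, nu_b0 = unperturbed_frequencies(3300.0, 3100.0, 0.1)
print(round(nu_a0, 1), round(nu_b0, 1))
```

The weaker the B band relative to the A band, the closer the unperturbed frequencies lie to the observed ones; equal intensities would imply that the unperturbed levels coincide.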
III. EXTENDED POLYPEPTIDE CHAIN STRUCTURES

A. Introduction

The extended form of the polypeptide chain, with lateral hydrogen bonds stabilizing a sheet-like arrangement, was recognized as a likely structure from early X-ray diffraction studies of silk (Meyer and Mark, 1928) and stretched mammalian (β) keratin (Astbury and Street, 1931; Astbury and Woods, 1933). Neighboring chains in a sheet can be directed, in a chemical sense, in an antiparallel or parallel manner, and detailed coordinates for such stereochemically acceptable β-sheet structures were first provided by Pauling and Corey (1951a, 1953b). Experimental evidence for the antiparallel-chain β-sheet structure has been obtained from X-ray diffraction studies of a synthetic polypeptide, (Ala)n (Arnott et al., 1967), a fibrous protein, β-keratin (Fraser et al., 1969), and from oligopeptides (Rao and Parthasarathy, 1973; Fawcett et al., 1975; Tanaka and Ashida, 1980; Yamada et al., 1980; Cruse et al., 1982; Admiraal and Vos, 1984; Yamane et al., 1985; Ashida et al., 1986) and proteins (Richardson, 1981). The parallel-chain structure has been found in some oligopeptides (Marsh and Glusker, 1961; Chatterjee and Parthasarathy, 1984; Lalitha et al., 1987) and in proteins (Richardson, 1981). The protein studies, as well as theoretical considerations (Chothia, 1973; Zimmerman and Scheraga, 1977; Raghavendra and Sasisekharan, 1979; Chou et al., 1982), have shown that finite β sheets have a twist instead of being essentially planar, as they are in the extended structures of synthetic polypeptides. This twist is nearly always “right-handed” when viewed along the polypeptide chain axes (Chothia, 1973), and is a result of energy minimization (Salemme, 1983). Even more complex, cylindrical “β-barrel” arrangements occur (Richardson, 1981), and the range of topographies is quite extensive (Salemme, 1983). The β sheet thus plays an important role in the structure of proteins. In some it is the main secondary structural component (e.g., concanavalin A and Bence-Jones proteins); in others it is found in conjunction with α-helical segments; and in many proteins it occurs as a mixed sheet of parallel and antiparallel strands. To date, normal-mode analyses have only been applied to the infinite antiparallel-chain sheet structures, and this work is described in this section. It is clear that such analyses need to be extended to other types of β-sheet structures.

B. Antiparallel-Chain Rippled Sheet

1. Polyglycine I

a. Structure and Symmetry. Early X-ray diffraction studies suggested that (Gly)nI has an essentially extended chain conformation (Astbury et al., 1948; Astbury, 1949; Bamford et al., 1953). A specific model of this structure, namely the APPS developed from model-building studies (Pauling and Corey, 1951c), was believed to apply to (Gly)nI on the basis of “sufficiently good” agreement with the observed powder X-ray diffraction pattern (Pauling and Corey, 1953a). Because an oriented sample could not be obtained, no definitive structure determination existed for a long time, and the APPS structure was assumed as the basis for early analyses of IR spectra (Elliott and Malcolm, 1956; Miyazawa, 1960a, 1967; Abe and Krimm, 1972a) and in conformational energy calculations (Venkatachalam, 1968a; Hopfinger, 1971). The preparation of “single crystals” and of thin oriented films permitted Lotz (1974) to undertake for the first time an electron-diffraction analysis on an oriented structure. From considerations of unit cell symmetry and the results of conformational energy calculations (Colonna-Cesari et al., 1974), Lotz proposed that an APRS structure was a more reasonable model for (Gly)nI. In this structure, first suggested on the basis of model-building studies (Pauling and Corey, 1953b), alternate chains in the sheet consist of all L and all D residues. [Of course, this condition can be satisfied by (Gly)nI since its Cα atom is achiral.]
Although a distinction between APPS and APRS structures is difficult on the basis of calculated diffraction intensities (Lotz, 1974), it was possible to show that the APRS structure was consistent with the diffraction data and was able to account for the observed monoclinic geometry of the unit cell (the APPS structure generally giving an orthorhombic cell). The experimental evidence for APRS (Gly)nI was strengthened significantly by the results of the first vibrational analysis on this structure (Moore and Krimm, 1976a). The results showed that the ratios of differences in the three observed amide I mode frequencies could be accounted for by a TDC analysis of the proposed APRS structure, but that these observed ratios were in significant disagreement with calculated values based on the APPS structure. Other spectral features were also in better agreement with the APRS structure.
FIG. 6. Antiparallel-chain rippled sheet polyglycine I structure. (A, B) Schematic diagrams (Moore and Krimm, 1976a); (C) ORTEP drawing.
We therefore assume that (Gly)nI has an APRS structure. This structure is shown in Fig. 6, and its parameters are given in Table VIII. The angle α is that between the a axis and the projection on the ac plane of a line connecting successive Cα atoms of one chain. The quantity Δb is the shift in the b axis direction of the second chain in Fig. 6A with respect to the first; a positive value of Δb results in a decrease in the distance between nearest carbonyl oxygen atoms on adjacent chains compared to this distance for Δb = 0. We have chosen Δb = 0, even though energy
TABLE VIII
Structural Parameters of Crystalline Antiparallel-Chain Rippled Sheet Polyglycine I^a

Dihedral angles: φ = −149.9°, ψ = 146.5°
Sheet parameters: a/2 = 4.77 Å; b/2 = 3.552 Å (fiber axis); Δb = 0 Å; α = 76°
Hydrogen-bond parameters: l(H···O) = 2.12 Å; l(N···O) = 2.91 Å; θ(NH, NO) = 31.4°; ∠(NHO) = 134.4°
Intersheet parameters: c = 3.67 Å; β = 113°

a See Table III for peptide-group geometry.
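The hydrogen-bond parameters in Table VIII are related to one another by simple triangle geometry on the N, H, and O atoms, which can be checked as follows. This is a sketch under one explicit assumption: the N-H bond length is taken as 1.00 Å, since the peptide-group geometry of Table III is not reproduced here. The function name is hypothetical.

```python
import math

def hbond_geometry(r_nh, r_no, theta_deg):
    """From the N-H length, the N...O length, and the angle theta between
    the NH and NO directions, return (l(H...O), angle N-H...O in degrees)
    using the law of cosines on the N, H, O triangle."""
    theta = math.radians(theta_deg)
    r_ho = math.sqrt(r_nh**2 + r_no**2 - 2.0 * r_nh * r_no * math.cos(theta))
    cos_nho = (r_nh**2 + r_ho**2 - r_no**2) / (2.0 * r_nh * r_ho)
    return r_ho, math.degrees(math.acos(cos_nho))

# r_nh = 1.00 A is an assumed value; 2.91 A and 31.4 degrees are from Table VIII.
r_ho, angle_nho = hbond_geometry(1.00, 2.91, 31.4)
print(f"l(H...O) = {r_ho:.2f} A, angle(NHO) = {angle_nho:.1f} deg")
```

With this assumed N-H length the computed H···O distance and N-H···O angle agree with the tabulated 2.12 Å and 134.4°.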
calculations suggest a value of −0.7 Å (Colonna-Cesari et al., 1974), since this is in best agreement with the results of vibrational analysis (Moore and Krimm, 1976a) and is more consistent with the electron-diffraction studies (Lotz, 1974). It is worth noting that the hydrogen bond in the APRS structure is longer [l(N···O) = 2.91 Å] than that in the APPS structure [l(N···O) = 2.73 Å] (Moore and Krimm, 1976b; Dwivedi and Krimm, 1982b), and that it is less linear: θ(NH, NO) being 31.4° (APRS) versus 9.8° (APPS). The Hα···Hα distance is 2.61 Å. The distribution of the normal modes of the APRS structure of (Gly)nI among the symmetry species, and their optical activity, are given in Table VII. b. Vibrational Analysis. There have been a number of experimental studies of the IR (Elliott and Malcolm, 1956; Miyazawa, 1961a; Bradbury and Elliott, 1963; Suzuki et al., 1966; Krimm et al., 1967; Krimm and Kuroiwa, 1968; Fanconi, 1973) and Raman spectra (Smith et al., 1969; Small et al., 1970; Fanconi, 1973) of (Gly)nI. These have provided many kinds of information, including the effects of isotopic substitution (Suzuki et al., 1966), but only very limited data on IR dichroism because of the difficulty in obtaining oriented specimens (Bradbury and Elliott, 1963). There have also been some inelastic neutron-scattering measurements on (Gly)nI (Gupta et al., 1968). Infrared and Raman spectra of (Gly)nI are shown in Figs. 7 and 8, respectively. Early normal-mode calculations were based on the approximation of taking the CH2 group as a point mass, in some cases with computations
FIG. 7. Infrared spectrum of polyglycine I. Top, experimental spectrum (KBr pellet); bottom, plot of calculated (∂μ/∂Q)² (numbers correspond to numbered modes in Cheam and Krimm, 1985).
of the modes of only a single chain (Fukushima et al., 1963; Gupta et al., 1968), while in other cases taking the structure to be a two-dimensional sheet (Miyazawa, 1967; Fanconi, 1972a,b). The first calculation with all atoms included was done on a hydrogen-bonded sheet (Abe and Krimm, 1972a); it used the (now known to be) incorrect APPS structure but incorporated TDC in order to account for splittings in amide I and amide II modes (Krimm and Abe, 1972). The APPS structure was also used in a calculation that determined intensities as well as frequencies
FIG. 8. Raman spectrum of polyglycine I (Small et al., 1970).
(Stepanyan and Gribov, 1979). The APRS structure was the basis of a calculation (Moore and Krimm, 1976a) in which the three-dimensional structure was used to incorporate intersheet TDC (Moore and Krimm, 1975). In a subsequent analysis, the force field was further refined for maximum transferability between different polypeptide molecules (Dwivedi and Krimm, 1982a). The following discussion is based on these two papers (Moore and Krimm, 1976a; Dwivedi and Krimm, 1982a). The observed and calculated frequencies of (Gly)nI are compared in Table IX. For a detailed discussion of the assignments, and the calculations of isotopically substituted molecules, the original publications should be consulted. We consider here only some salient features of the results. The relatively high value of the unperturbed amide A frequency, ν_A^0, compared to that of the APPS structure (Moore and Krimm, 1976a; Krimm and Dwivedi, 1982a), is generally consistent with the longer hydrogen bond in (Gly)nI. The unperturbed amide B frequency, ν_B^0, is now naturally accounted for by a combination between the observed 1517 cm⁻¹ Au mode and an unobserved Bg mode calculated from TDC near 1600 cm⁻¹, substantiating an inference of such a mode derived independently from a Fermi resonance analysis (Tsuboi, 1964; Moore and Krimm, 1976a). The large observed splittings in the amide I modes are very well accounted for by the TDC interactions. Since without TDC the normal-mode calculation gives a maximum splitting of 10 cm⁻¹ (Ag, 1684; Au, 1677; Bg, 1676; Bu, 1674 cm⁻¹), whereas the observed splitting is about 50 cm⁻¹, it is evident that this interaction is of major importance in accounting for the observations (see Section II,D,2,c). The same is even more true of the amide II modes: without TDC we find Ag, 1534; Au, 1535; Bg, 1559; Bu, 1559 cm⁻¹. The relatively low observed amide II frequencies (1517 and 1515 cm⁻¹) seem to be characteristic of the APRS structure of (Gly)nI.
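The splittings quoted above are simple differences between the highest and lowest frequency of a set of modes, and they can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the variable and function names are illustrative, and the pairing of the calculated 1689 and 1643 cm⁻¹ values of Table IX with the observed 1685 and 1636 cm⁻¹ IR bands is an assumption made here for the comparison.

```python
# Frequencies (cm^-1) quoted in the text for polyglycine I.
# All names below are illustrative and do not come from the source.
AMIDE_I_WITHOUT_TDC = [1684, 1677, 1676, 1674]
AMIDE_II_WITHOUT_TDC = [1534, 1535, 1559, 1559]
AMIDE_I_OBSERVED = [1685, 1674, 1636]
# Assumption: these two Table IX entries are the calculated counterparts
# of the observed 1685 and 1636 cm^-1 IR bands.
AMIDE_I_WITH_TDC = [1689, 1643]


def max_splitting(frequencies):
    """Return the spread (highest minus lowest frequency) in cm^-1."""
    return max(frequencies) - min(frequencies)


if __name__ == "__main__":
    cases = [
        ("amide I, no TDC", AMIDE_I_WITHOUT_TDC),
        ("amide I, with TDC", AMIDE_I_WITH_TDC),
        ("amide I, observed", AMIDE_I_OBSERVED),
        ("amide II, no TDC", AMIDE_II_WITHOUT_TDC),
    ]
    for label, values in cases:
        print(f"{label}: {max_splitting(values)} cm^-1")
```

With these inputs the spread without TDC is 10 cm⁻¹ for amide I, against 49 cm⁻¹ observed, which is the gap that the TDC interaction has to close.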
The amide III region has been used as an indicator of chain conformation, although it has been pointed out (Hsu et al., 1976) that caution is advisable in this regard since these modes (NH ib plus CN s) are sensitive to side-chain composition. In (Gly)nI, NH ib is predicted to contribute from 1415 to 1152 cm⁻¹, with only calculated bands at 1304 and 1286 (observed 1295W IR) cm⁻¹ having this coordinate as the major contributor. The results on N-deuterated (Gly)nI support these assignments (Dwivedi and Krimm, 1982a), and show in particular that the 1236M cm⁻¹ IR band and its strong Raman counterpart at 1234 cm⁻¹, although in a region normally associated with amide III, should be assigned to CH2 tw rather than to an amide mode, as had been assumed (Smith et al.,
TABLE IX
Observed and Calculated Frequencies of Polyglycine I

Observed(a) (cm⁻¹; Raman and IR): 3272S(c), 2932S, 2932S, 2929W, 2929W, 2869M, 2869M, 2869VW, 2869VW, 1685M, 1674S, 1636S, 1517S, 1515W, 1460S, 1432S, 1410M, 1408W, 1341W, 1338W, 1295W, 1255M, 1234S, 1236M, 1220W, 1214W, 1162M, 1021VS, 1016M, 987W, 936M, 888W, 884M, 708S, 628W, 614M, 599W, 589W, 589M, 327W, 321W, 285W, 260W, 217W, 211W, 170W, 140M br, 112M, 82S

Calculated (cm⁻¹; species Ag, Au, Bg, Bu)    Potential energy distribution(b)
3272    NH s (98)
3272    NH s (98)
3271    NH s (98)
3271    NH s (98)
2934    CH2 as (98)
2934    CH2 as (98)
2929    CH2 as (99)
2928    CH2 as (99)
2865    CH2 ss (98)
2865    CH2 ss (98)
2861    CH2 ss (99)
2861    CH2 ss (99)
1695    CO s (77), CN s (15), CαCN d (11)
1689    CO s (75), CN s (20), CαCN d (11)
1677    CO s (74), CN s (21), CαCN d (11)
1643    CO s (69), CN s (22), CαCN d (11)
1602    NH ib (56), CN s (19), CαC s (12)
1572    NH ib (51), CαC s (16), CN s (14)
1515    NH ib (35), CN s (28), CαC s (17), CO ib (14)
1514    NH ib (35), CN s (27), CαC s (17), CO ib (14)
1454    CH2 b (66), CH2 w (16)
1454    CH2 b (65), CH2 w (17)
1441    CH2 b (96)
1439    CH2 b (96)
1415    CH2 w (41), CH2 b (31), NH ib (14)
1415    CH2 w (40), CH2 b (33), NH ib (13)
1341    CH2 w (84)
1338    CH2 w (79)
1304    NH ib (30), CO ib (19), CN s (18), CαC s (16)
1286    NH ib (39), CαC s (17), CO ib (16), CN s (12)
1253    CH2 tw (76), CH2 w (17)
1253    CH2 tw (76), CH2 w (17)
1243    CH2 tw (93)
1242    CH2 tw (92)
1213    NCα s (29), NH ib (23), CH2 w (18), CH2 tw (16), CN s (13)
1212    NCα s (29), NH ib (23), CH2 w (18), CH2 tw (15), CN s (13)
1153    NCα s (50), CαC s (13), NH ib (12)
1152    NCα s (50), CαC s (14), NH ib (12)
1015    NCα s (77), CαC s (10)
1014    NCα s (77), CαC s (10)
1002    CH2 r (45), CO s (11), CαC s (10)
1000    CH2 r (49), CO s (10)
980     CH2 r (68), CN s (10)
979     CH2 r (70), CN s (10)
946     CH2 r (29), CN s (12), CαC s (11), NCαC d (10)
940     CH2 r (25), CN s (13), CαC s (12), NCαC d (10)
890     CαC s (29), CN s (21), CH2 r (14), CO s (13)
890     CαC s (31), CN s (24), CO s (12), CH2 r (12)
768     CO ib (16), NCα s (15), CαC s (15), CN t (12), NCαC d (11)
767     CαC s (19), CO ib (17), NCα s (16), NCαC d (11), CNCα d (11)
736     CN t (63), NH···O ib (15), NH ob (11), H···O s (11)
718     CN t (75), NH···O ib (19), NH ob (16), H···O s (10)
718     CN t (79), NH ob (26), NH···O ib (23), H···O s (10)
702     CN t (79), NH ob (29), NH···O ib (25), H···O s (15)
630     CO ib (36), CO ob (24), CαC s (10)
629     CO ib (37), CO ob (23), CαC s (10)
621     CO ob (67), CαCN d (15), NH ob (14), NCαC d (10)
613     CO ob (59), CαCN d (20), NH ob (17), NCαC d (11)
589     CαCN d (47), CO ob (17)
587     CαCN d (43), CO ob (24)
580     CO ob (45), CO ib (28), CαC s (12)
579     CO ob (45), CO ib (27), CαC s (11)
323     NCαC d (21), CO ib (16), NH ob (15)
320     NCαC d (21), CO ib (18), NH ob (15)
291     CαCN d (51), NCαC d (21), NCα s (10)
290     CαCN d (56), NCαC d (19), NCα s (12)
252     CNCα d (41), CO ib (28)
250     CNCα d (41), CO ib (30), NH ob (15)
226     CNCα d (68), CO ib (10), H···O s (10)
214     CNCα d (74)
178     NH ob (70), CO ob (20)
180     NH ob (67), CO ob (15), CαCN d (12), CH2 w (12)
140     H···O s (46), CN t (46), NH ob (37), CαC t (11)
135     H···O s (29), CN t (25), NCαC d (15)
111     H···O s (78), CN t (18)
108     NCαC d (46), CαC t (16), NH ob (16), NCα t (14)
88      NH ob (43), CN t (27), NCαC d (26), Hα···Hα s (13), NCα t (12), H···O s (11)
71      CαC t (35), NCα t (24), CN t (22), NH ob (22), NH···O ib (17)
37      NH···O ib (38), CO···H ib (28), NH ob (18), H···O s (17)
31      NH···O ib (35), CO···H ib (31), CN t (21), NH ob (16), NCα t (13)
12      NH t (52), CO t (34)

a S, Strong; M, medium; W, weak; V, very; br, broad.
b s, Stretch; as, antisymmetric stretch; ss, symmetric stretch; b, angle bend; ib, in-plane angle bend; ob, out-of-plane angle bend; w, wag; tw, twist; r, rock; t, torsion; d, deformation. Only contributions of 10 or greater are included.
c Unperturbed frequency.
1969; Small et al., 1970). Although TDC makes a smaller contribution to amide III than to amide II, it is not necessarily negligible; the unperturbed frequencies of the eight modes in this region that contain NH ib contributions are 1422 (Ag), 1422 (Au), 1285 (Bg), 1278 (Bu), 1223 (Ag), 1222 (Au), 1158 (Ag), and 1157 (Au) cm⁻¹. The amide V mode, which seems to be very sensitive to chain conformation, is found as a strong IR band at 708 cm⁻¹ and is fairly well predicted by the calculation. Its assignment to the Bu(⊥) species is likely on the basis of the structure, but it could also have an Au(∥) component because of the orientation of the peptide group; this can be decided only from polarized IR spectra on an oriented sample. In Section II,D,4 we mentioned that recent ab initio calculations of dipole derivatives for the peptide group in NMA have been used to calculate intensities of IR bands in (Gly)nI (Cheam and Krimm, 1985). Such calculated intensities are shown in Fig. 7, and it can be seen that they reproduce the observed intensities quite well. This kind of agreement indicates that the force field is a very satisfactory one, since intensities are a sensitive function of the eigenvectors. While (Gly)nI is the only polypeptide so far for which intensities have been calculated, it can be expected that this technique will be used in the future to provide additional information on polypeptide chain conformation.

C. Antiparallel-Chain Pleated Sheet

1. β-Poly(L-alanine)

a. Structure and Symmetry. Early X-ray diffraction studies of β-(Ala)n clearly demonstrated the extended nature of the chains in this structure (Bamford et al., 1953, 1954). Following a suggestion by Marsh et al.
(1955a) that the sheet structure corresponds to the APPS (Pauling and Corey, 1951c, 1953b), Brown and Trotter (1956) tested various packing arrangements of such sheets but were unable to find any that gave calculated structure factors in acceptable agreement with observed intensities in their fiber-diffraction pattern. In an X-ray refinement procedure that additionally allowed Δb to be a parameter, Arnott et al. (1967) were able to find an APPS structure with statistical packing of sheets that gave good agreement with the observed intensities. In this structure, Δb, although it could not be well refined, was felt to be probably between 0 and -0.65 Å. However, Δb can be obtained relatively accurately from a TDC analysis of the amide I modes (Moore and Krimm, 1976b), and was found to be -0.27 Å. This APPS structure of β-(Ala)n is shown in Fig. 9, and its parameters are given in Table X. The bond lengths are the same as for the standard
FIG. 9. ORTEP drawing of antiparallel-chain pleated sheet of poly(L-alanine). The CH3 group is represented by a point mass.
TABLE X
Structural Parameters of Crystalline Antiparallel-Chain Pleated Sheet Poly(L-alanine)(a)

Bond lengths (Å):
  l(Cα-C) = 1.53    l(Cα-H) = 1.07
  l(C-N)  = 1.32    l(C-H)  = 1.09
  l(N-Cα) = 1.47    l(N-H)  = 1.00
  l(C=O)  = 1.24
Bond angles (degrees):
  CαCO = 121.0    CαCN = 115.4
  CNH  = 123.0    CNCα = 120.9
  All angles about Cα and Cβ tetrahedral
Dihedral angles:
  φ = -138.4°    ψ = 135.7°
Sheet parameters:
  a/2 = 4.73 Å    b/2 = 3.445 Å (fiber axis)
  Δb = -0.27 Å    α = 80°
Hydrogen-bond parameters:
  l(H···O) = 1.75 Å    l(N···O) = 2.73 Å
  θ(NH, NO) = 9.8°     γ(NHO) = 164.6°
Intersheet parameters:
  c/2 = 5.27 Å    β = 90°

a From Arnott et al. (1967).
geometry, but the CαCN and CNCα angles differ by 1.4 and 2.1°, respectively; this is not expected to have a significant effect. The hydrogen-bond length is slightly shorter than in (Gly)nI, as is the Hα···Hα distance (2.325 Å). The normal modes of APPS β-(Ala)n are distributed among the symmetry species, and have optical activity as follows (Moore and Krimm, 1976b): A[ν(0, 0)], Raman, 30; B1[ν(0, π)], Raman, IR(∥), 29; B2[ν(π, 0)], Raman, IR(⊥), 29; B3[ν(π, π)], Raman, IR(⊥), 29.

b. Vibrational Analysis. Since specimens of β-(Ala)n can be well oriented, dichroic IR spectra were soon available (Elliott, 1954). Far IR spectra have also been obtained (Itoh et al., 1968, 1969; Itoh and Katabuchi, 1972), as well as spectra of the N-deuterated molecule (Masuda et al., 1969; Dwivedi and Krimm, 1982b). Raman spectra are also available (Fanconi, 1973; Frushour and Koenig, 1974). Infrared and Raman spectra of β-(Ala)n are given in Figs. 10 and 11, respectively. The first normal-mode calculation on β-(Ala)n to incorporate all of the atoms in the structure used a force field transferred from (Gly)nI and refined for a CH3 side chain (Moore and Krimm, 1976b). This force field was subsequently adjusted slightly (Dwivedi and Krimm, 1982b, 1983), and the results of this calculation, given in Table XI, are the basis of our present discussion. (The original paper should be consulted for the corresponding analysis of the spectra of the N-deuterated molecule.) Using this detailed force field, an "approximate" force field was derived for a structure in which the CH3 group is taken as a point mass (Dwivedi and Krimm, 1984b), and its calculated frequencies are also given in Table XI (as the second of the two entries for each mode). The ν_A^0 frequency, determined from a Fermi resonance analysis (Moore and Krimm, 1976a; Krimm and Dwivedi, 1982a), is significantly lower than that in (Gly)nI: 3242-3250 vs. 3272 cm⁻¹, consistent with the shorter hydrogen bond in β-(Ala)n.
In the case of ν_B^0 it seems possible to account for its frequency of 3096-3109 cm⁻¹ by two combinations of amide II modes, B1 + B3 and A + B2, compared to only one likely combination for (Gly)nI (Moore and Krimm, 1976a; Krimm and Dwivedi, 1982a). The maximum observed amide I splitting in β-(Ala)n, 1694 - 1632 = 62 cm⁻¹, is significantly larger than that in (Gly)nI, 1685 - 1636 = 49 cm⁻¹, and this difference is very well reproduced by the calculation: 65 versus 46 cm⁻¹. (Again, of course, the amide I splittings are due to TDC. Without this interaction we calculate A, 1670; B1, 1673; B2, 1665; B3, 1670 cm⁻¹.) Three assignable amide II modes are observed in the spectra, compared to two for (Gly)nI, and these agree well with the calculated frequencies. (Without TDC the computed frequencies are A, 1545; B1,
FIG. 10. Infrared spectrum of β-poly(L-alanine). (a) Mid-infrared region. (-) Electric vector perpendicular to the direction of stretching. (---) Electric vector parallel to the direction of stretching (Elliott, 1954). (b) Far-infrared region (Itoh and Katabuchi, 1972).
1549; B2, 1550; B3, 1554 cm⁻¹.) It is interesting that the maximum predicted amide II splitting for β-(Ala)n (64 cm⁻¹) is much less than that for (Gly)nI (88 cm⁻¹), yet the observed splitting of the two lowest frequency modes of β-(Ala)n (14 cm⁻¹) is much larger than that of (Gly)nI (2 cm⁻¹) and this difference is predicted very well. Although amide III is generally assigned to the Raman bands at 1243 and 1226 cm⁻¹, and the calculation indeed shows NH ib + CN s contributions to these modes, we again see that NH ib is present in calculated frequencies that range from 1402 to 1195 cm⁻¹. Observed bands in the higher frequency region support such an assignment; for example, the 1402MW cm⁻¹ IR band disappears on N-deuteration (Dwivedi and
FIG. 11. Raman spectrum of β-poly(L-alanine) (Fanconi, 1973).
Krimm, 1982b). As we will see below, these higher frequency modes become important in characterizing β-turn structures. The amide V mode, similar to (Gly)nI, appears as a strong IR band at 706 cm⁻¹. The essential absence of any significant dichroism in this band in well-oriented specimens of β-(Ala)n (Itoh et al., 1968) suggests the near superposition of components having parallel and perpendicular polarization. As can be seen, this is supported very well by the results of the calculation, which additionally indicate that a weak Raman band at 698 cm⁻¹ should be assigned to amide V.

2. β-Poly(L-alanylglycine)

Since the Ala-Gly sequence is the major one in Bombyx mori silk, the structure of this sequential polypeptide is of importance in understanding the structure of silk. In early studies of β-(AlaGly)n it was found that the X-ray powder diffraction pattern and the IR spectrum resembled that from the crystalline component of B. mori silk. The proposed APPS structure of silk (Marsh et al., 1955b) led to a more detailed analysis of the X-ray pattern of β-(AlaGly)n (Fraser et al., 1965, 1966). This, together with conformational energy analysis (Colonna-Cesari et al., 1975),
TABLE XI
Observed and Calculated Frequencies of β-Poly(L-alanine)

Observed(a) (cm⁻¹; Raman and IR): 3242S(d), 2984S, 2980 sh (⊥), 2933S, 2934W (⊥), 2871 sh, 2874VW (⊥), 1694W (∥), 1669S, 1632VS (⊥), 1553VW, 1555MW (⊥)

Calculated(b) (cm⁻¹; species A, B1, B2, B3)    Potential energy distribution(c)
3243, 3242    NH s (97); NH s (97)
3243, 3242    NH s (97); NH s (97)
3243, 3242    NH s (97); NH s (97)
3243, 3242    NH s (97); NH s (97)
2984, -       CH3 as2 (50), CH3 as1 (49)
2984, -       CH3 as2 (55), CH3 as1 (45)
2984, -       CH3 as2 (53), CH3 as1 (46)
2984, -       CH3 as2 (52), CH3 as1 (48)
2983, -       CH3 as1 (50), CH3 as2 (49)
2983, -       CH3 as1 (55), CH3 as2 (45)
2983, -       CH3 as1 (53), CH3 as2 (46)
2983, -       CH3 as1 (51), CH3 as2 (48)
2929, -       CH3 ss (100)
2929, -       CH3 ss (100)
2929, -       CH3 ss (100)
2929, -
2877, 2875    CαHα s (98); CαHα s (98)
2877, 2875    CαHα s (98); CαHα s (98)
2866, 2864    CαHα s (99); CαHα s (99)
2866, 2864    CαHα s (99); CαHα s (99)
1698, 1698    CO s (78), CN s (14); CO s (79), CN s (13)
1695, 1695    CO s (76), CN s (19); CO s (77), CN s (20)
1670, 1670    CO s (73), CN s (21); CO s (74), CN s (22)
1630, 1630    CO s (70), CN s (21); CO s (72), CN s (21)
1592, 1593    NH ib (57), CN s (21), CαC s (10); NH ib (58), CN s (21)
1562, 1563    NH ib (53), CN s (17), CαC s (14); NH ib (53), CN s (18), CαC s (13)
1539, 1542    NH ib (48), CN s (22), CO ib (12), CαC s (11); NH ib (48), CN s (22), CO ib (11), CαC s (10)
1528, 1531    NH ib (41), CN s (26), CO ib (14), CαC s (13); NH ib (42), CN s (26), CO ib (13), CαC s (12)
1455          CH3 ab1 (44), CH3 ab2 (39)
1455          CH3 ab1 (46), ...
FIG. 28. Raman spectrum of Z-Gly-Pro-Leu-Gly-OH (J. Bandekar and S. Krimm, unpublished work).
cm⁻¹, respectively, in the standard turn (see Table XXI), and are observed nearer these values. (2) The highest observed frequency, at 1568 cm⁻¹, is significantly higher than that seen for the α helix (1545 cm⁻¹) or β sheet (1555 cm⁻¹) of (Ala)n, a feature noted for the standard turn. The amide III modes calculated at 1313, 1291, and 1281 cm⁻¹ have a large contribution from NH ib + CN s. The weak IR band at 1314 cm⁻¹ and the strong IR band at 1294 cm⁻¹ disappear on N-deuteration and are well accounted for by the first two of the above calculated frequencies. In addition, NH ib contributes to modes at 1391, 1331, 1326, and 1300 cm⁻¹, for the second of which there is an N-deuteration-sensitive band observed at 1333 cm⁻¹. The NH ob coordinate makes contributions to modes above ~500 cm⁻¹ at 609, 583, 565, and 498 cm⁻¹. A medium-intensity band in the IR at 599 cm⁻¹ is observed to disappear on N-deuteration, and is very well accounted for by the first of the above calculated modes. It is interesting that the general prediction that amide V frequencies of β turns are found below those of the α helix and β sheet is supported by the results on this molecule. The normal-mode calculations on Z-Gly-Pro-Leu-Gly-OH, together with IR and Raman spectra of this molecule, thus provide a strong basis for supporting the general conclusions drawn from a vibrational analysis
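TABLE XXV
Observed and Calculated Amide Mode Frequencies of CH3-O-Gly-Ala-Ala-Gly-O-CH3 and Observed Bands of Type I β-Turn Z-Gly-Pro-Leu-Gly-OH

Amide I
  Calculated ν, cm⁻¹ (group(b)): 1743 (5), 1688 (1), 1681 (3+2), 1659 (4), 1647 (2+3)
  Observed(a), Raman (cm⁻¹): 1741MS, 1689S, 1674W, 1656S, 1644MW
  Observed(a), infrared (cm⁻¹): 1741S, 1686S, 1673W, 1655VS, 1639VS
Amide II
  Calculated ν, cm⁻¹ (group(b)): 1579 (2+1), 1562 (3), 1544 (4), 1534 (1+2)
  Observed(a), Raman (cm⁻¹): -
  Observed(a), infrared (cm⁻¹): 1568MS, 1548MS, 1525M
Amide III
  Calculated ν, cm⁻¹ (group(b)): 1391 (2), 1331 (4), 1326 (3), 1313 (1), 1300 (3), 1291 (4), 1281 (2)
  Observed(a), Raman (cm⁻¹): 1333M, 1325W, 1316W, 1291M
  Observed(a), infrared (cm⁻¹): 1333W, 1314W, 1294S
Amide V
  Calculated ν, cm⁻¹ (group(b)): 609 (1), 583 (3), 565 (1+2), 498 (4)
  Observed(a), Raman (cm⁻¹): -
  Observed(a), infrared (cm⁻¹): 599M

a S, Strong; M, medium; W, weak; V, very.
b See footnote a, Table XXI, for general explanation of symbols.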
of standard β turns (Krimm and Bandekar, 1980). They also enhance support for the conclusion (Kawai and Fasman, 1978) that Z-Gly-Ser(OBut)-Ser-Gly-O-stearyl ester, on the basis of the similarity of its IR spectrum to that of Z-Gly-Pro-Leu-Gly-OH, probably has a type I β-turn structure. A complete vibrational analysis has not been done for a type I' β-turn molecule, but the results of a Raman study of (Leu5)-enkephalin (Han et al., 1980), which is known from X-ray diffraction analysis to form a type I' β turn (Smith and Griffin, 1978), are consistent with the predictions made for the standard turn. The dihedral angles in this molecule [(φ, ψ)2 = 59°, 25° and (φ, ψ)3 = 97°, -7°] are quite close to the standard values of 60°, 30° and 90°, 0°, respectively, thus suggesting that the predictions for the standard type I' β turn (Table XXII) are likely to be applicable. Relevant amide I modes are calculated at 1684, 1680, 1676, and 1646
cm⁻¹; there are observed Raman bands at 1676VS and 1642W cm⁻¹ for the crystalline material, quite consistent with these predictions (no IR spectra were presented on the crystalline compound). No amide II data (from IR spectra) were given, but N-deuteration-sensitive amide III bands were identified at 1325, 1282, 1271, and 1255 cm⁻¹. These bands are in good agreement with predicted modes at 1311, 1290, 1273, and 1268 cm⁻¹. No IR data were presented on amide V modes, but the above results support the general predictive capabilities of the normal-mode calculation.

b. Type II β Turn

i. Pro-Leu-Gly-NH2. The C-terminal tripeptide of oxytocin, Pro-Leu-Gly-NH2, has been shown from crystallographic studies to have a type II β-turn structure (Reed and Johnson, 1973). Its dihedral angles are ψ1(Pro) = 152.9°, φ2(Leu) = -61.2°, ψ2(Leu) = 127.8°, and φ3(Gly) = 71.8°, which are close to the standard values of (φ, ψ)2 = -60°, 120° and φ3 = 80°. The normal modes of this molecule as well as its N-deuterated derivative have been calculated, and compared in detail with Raman and IR spectra (Naik et al., 1980; Naik and Krimm, 1984a). No structural approximations were made in this case, and the force field for the peptide group was transferred from more recently refined force fields for (Gly)nI (Dwivedi and Krimm, 1982a) and β-(Ala)n (Dwivedi and Krimm, 1982b). Force constants for the prolyl moiety were transferred from (Pro)n (Johnston, 1975), for the leucyl side chain from hydrocarbons (Schachtschneider and Snyder, 1963), and for the CONH2 group from acetamide (Uno et al., 1969, 1971). For the IR and Raman spectra, and the detailed description of the normal modes, of this molecule the original publication (Naik and Krimm, 1984a) should be consulted. The 51 Raman and 46 IR bands observed below 1700 cm⁻¹ could be assigned to 68 calculated normal modes with an average error of 6 cm⁻¹.
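The statement that the crystallographic dihedral angles of Pro-Leu-Gly-NH2 are "close to the standard values" can be made quantitative by taking angular differences. The following is a minimal sketch using only the angles quoted above; the function and dictionary names are illustrative and not from the source, and the wrap-around handling is included only so that the helper also behaves sensibly for angles near ±180°.

```python
def angular_deviation(observed_deg, standard_deg):
    """Signed difference observed - standard, wrapped into (-180, 180] degrees."""
    diff = (observed_deg - standard_deg) % 360.0
    if diff > 180.0:
        diff -= 360.0
    return diff


# Dihedral angles (degrees) quoted in the text for Pro-Leu-Gly-NH2
# and for the standard type II beta turn.
OBSERVED = {"phi2": -61.2, "psi2": 127.8, "phi3": 71.8}
STANDARD = {"phi2": -60.0, "psi2": 120.0, "phi3": 80.0}


def deviations(observed, standard):
    """Return the rounded deviation of each observed angle from its standard value."""
    return {
        name: round(angular_deviation(observed[name], standard[name]), 1)
        for name in standard
    }


if __name__ == "__main__":
    for name, value in deviations(OBSERVED, STANDARD).items():
        print(f"{name}: {value:+.1f} degrees")
```

All three deviations are under 10°, which is the sense in which the structure is described as close to the standard turn.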
Comparable assignments could be made for 44 Raman and 50 IR bands observed in this region for the N-deuterated molecule. In Table XXVI we give only the results for the amide modes. The predictions for amide I and II modes are generally good, considering that force constants were transferred without further refinement. The large frequency difference between the 1680 (IR) and 1691 cm⁻¹ (Raman) bands may reflect the presence of intermolecular interactions in the crystal. In any event, we do not expect frequencies as high as these for a standard type II β turn. Their observation and prediction are undoubtedly related to the particular structure of this molecule, which emphasizes the caution required in assuming general characteristic β-turn frequencies. Incidentally, the calculation predicts that the 1658-
TABLE XXVI
Calculated and Observed Amide Frequencies of Type II β Turn Pro-Leu-Gly-NH2

             Calculated (cm⁻¹)       Observed(a) (cm⁻¹)
Mode         ν        Group(b)       Raman       Infrared
Amide I      1680     2+3            1691MW      1680VS
             1664     3+2            1664sh      1662M
             1658     4              1652S       1650sh
Amide II     1568     2              -           1565sh
             1545     3              -           1556VS
Amide III    1375     2              1375W       1370M
             1354     3              1351MW      -
             1335     2              1345M       -
             1328     2              1338M       1336M
             1266     3              1271MS      1270sh
             1237     2              1241S       1241M
Amide V      699      2              695W        687MS
             657      2              647W        645S
             603      3              614W        612MS
             575      3              571M        570MS

a See footnote a, Table XXV, for explanation of symbols.
b See footnote b, Table XXV, for explanation of symbols.
cm⁻¹ mode, with a significant contribution from NH2 r and NH2 b, will be shifted down on N-deuteration by a larger amount (18 cm⁻¹) than the amide I modes of the other peptide groups (about 5 cm⁻¹). This uniquely large shift is observed, thus making unnecessary the interpretation of this shift in terms of conformational change (Hseu and Chang, 1980). In Table XXVI we also list the observed IR and Raman bands in the amide III and V regions that weaken or shift on N-deuteration, and the calculated modes containing NH ib or NH ob, respectively, that can be assigned to them. Despite the prediction for the standard turn that amide III frequencies should occur above about 1300 cm⁻¹ (see Table XXI), we find two clearly N-deuteration-sensitive bands below, at 1271 and 1241 cm⁻¹, that are well accounted for by the calculation. This arises from the different forms of the normal modes for the two molecules, particularly the CβCγ(Leu) s contribution to the 1237-cm⁻¹ mode. A similar consideration may apply to the 687-cm⁻¹ (IR) band, which is much higher than the highest predicted mode of the standard turn: the CH2 r contribution in the latter case is absent for the tripeptide.
From the results on Pro-Leu-Gly-NH2 we can conclude that the force fields are highly reliable in their ability to reproduce observed frequencies of this type II β-turn structure. In addition, we see again that β-turn frequencies depend strongly on the specific dihedral angles and side chains associated with the turn.

ii. Z-Gly-Pro-Gly-Gly-OMe. The tetrapeptide Z-Gly-Pro-Gly-Gly-OMe is found from NMR studies to adopt a type II β conformation (Perly et al., 1983), and its spectra have been analyzed with the help of normal-mode calculations (Lagant et al., 1984a). A standard type II β-turn structure was assumed in the calculations, which were done with a Urey-Bradley force field. The observed amide frequencies and their assignments to peptide groups were as follows: amide I, 1694 cm⁻¹ (3), 1656 cm⁻¹ (2), 1639 cm⁻¹ (4); II, 1560 cm⁻¹ (4), 1540 cm⁻¹ (3); III, 1280 cm⁻¹ (4), 1255 cm⁻¹ (3). These differ from those of the type II β turn of Pro-Leu-Gly-NH2, probably mainly because of the differences in side-chain structure.

iii. Cyclo(L-Ala-Gly-Aca). The potential for structure determination through vibrational analysis is demonstrated by a study of cyclo(L-alanylglycyl-ε-aminocaproyl) [cyclo(L-Ala-Gly-Aca)], a tripeptide cyclized by a (CH2)5 chain and therefore constrained to form a β turn. The structure of this molecule was not previously known, but conformational energy calculations suggested possible low-energy structures (Némethy et al., 1981). Since four of these had energies within about 1 kcal/mol of the minimum, it was difficult to be certain which structure prevailed. Normal-mode calculations were used to analyze Raman and IR spectra of this molecule (Maxfield et al., 1981), and from this it was possible to conclude which of the two type II β turns among the four is the predominant structure in the solid state and in solution.
As a general test of the method, calculations were done on the 10 lowest energy conformations calculated for this molecule (Némethy et al., 1981); these energies and the types of bends are given in Table XXVII. The calculated amide I, II, III, and V modes were compared with observed Raman and IR bands of the parent and N-deuterated molecules in order to determine which structure gives best agreement between observed and calculated bands. Results for the amide I and V modes are shown in Fig. 29. As shown in Table XXVII, the maximum predicted splittings of the amide I mode vary with the conformation of the turn (a result of differences in the TDC contributions). On the basis of the observed splitting of ~50 cm⁻¹, conformations 3, 8, 9, and 10 could be considered possible ones, although the frequencies of 3 are in better agreement with observed bands of the solid (Fig. 29a). For amide II, observed IR bands
TABLE XXVII
Parameters Characterizing the Theoretically Computed Minimum-Energy Conformations of cyclo(L-Ala-Gly-Aca) with Trans Peptide Bonds

                                                   Maximum splitting (cm⁻¹) of amide I modes(a)
Conformation    Energy ΔE(c)    Turn type          TDC absent      TDC present
number(b)       (kcal/mol)
1               0.00            II                 10              11
2               0.74            I                  7               10
3               0.93            II                 10              45
4               1.07            III                12              21
5               1.22            III                11              21
6               1.25            III                10              18
7               1.59            I                  15              21
8               2.80            III'               13              53
9               2.96            I'                 18              57
10              3.08            I'                 11              63

a Calculated value of maximum splitting of amide I modes, with and without transition dipole coupling.
b Némethy et al. (1981).
c ΔE = E - E0, where E0 is the computed energy of conformation 1.
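The screening of conformations by amide I splitting described in the text can be sketched as a simple filter over the values of Table XXVII. This is an illustration only: the 15 cm⁻¹ acceptance window is a hypothetical choice made here (the source gives no numerical criterion), and the names are not from the original work.

```python
# Maximum amide I splittings (cm^-1) from Table XXVII, keyed by conformation
# number: (without TDC, with TDC).
MAX_AMIDE_I_SPLITTING = {
    1: (10, 11), 2: (7, 10), 3: (10, 45), 4: (12, 21), 5: (11, 21),
    6: (10, 18), 7: (15, 21), 8: (13, 53), 9: (18, 57), 10: (11, 63),
}

OBSERVED_SPLITTING = 50  # approximate observed value quoted in the text
TOLERANCE = 15           # hypothetical acceptance window, not from the source


def compatible_conformations(observed, tolerance, with_tdc=True):
    """Return conformation numbers whose predicted splitting lies within
    `tolerance` cm^-1 of the observed splitting."""
    index = 1 if with_tdc else 0
    return sorted(
        number
        for number, splittings in MAX_AMIDE_I_SPLITTING.items()
        if abs(splittings[index] - observed) <= tolerance
    )


if __name__ == "__main__":
    print(compatible_conformations(OBSERVED_SPLITTING, TOLERANCE))
    print(compatible_conformations(OBSERVED_SPLITTING, TOLERANCE, with_tdc=False))
```

With this window the filter retains conformations 3, 8, 9, and 10 when TDC is included and none when it is omitted, in line with the discussion of Table XXVII; the amide II and amide V criteria are then needed to single out conformation 3.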
show a maximum splitting of ~35 cm⁻¹; calculated splittings of conformations 1 (30 cm⁻¹) and 3 (~50 cm⁻¹) agree best with the observations. The amide V modes provide a strong selection criterion: as seen in Fig. 29b, only structure 3 predicts two modes in the region 650-550 cm⁻¹ that are compatible with the observed N-deuteration-sensitive bands. Thus, the combined evidence clearly favors conformation 3 as the likely one in the solid state. This structure has (φ, ψ)2 = -85°, 74° and (φ, ψ)3 = 132°, -62°, which are significantly different from the standard values of -60°, 120° and 80°, 0°, respectively. The angles for the type II β turn of conformation 1 are (φ, ψ)2 = -89°, 85° and (φ, ψ)3 = 81°, 74°, again emphasizing the dependence of the amide frequencies on the specific values of the dihedral angles (Krimm and Bandekar, 1980).

iv. Cyclo(L-Ala-D-Ala-Aca). A type II β turn has also been confirmed in another cyclic tripeptide, cyclo(L-Ala-D-Ala-Aca) (Bandekar et al., 1982). As in the case of cyclo(L-Ala-Gly-Aca), this conclusion resulted from a vibrational analysis of a number of conformations obtained from a conformational energy calculation. In this instance the spectroscopic study concluded that the solid-state structure corresponded predominantly to the lowest energy conformation (number 3), a structure having (φ, ψ)2 = -89°, 72° and (φ, ψ)3 = 134°, -61°, with the possibility of a
FIG. 29. (a) Calculated frequencies in the amide I region for the 10 lowest energy conformations of cyclo(L-Ala-Gly-Aca) (see Table XXVII). The observed infrared (solid bar) and Raman (open bar) bands are shown on the bottom line. Numbers above the computed frequencies represent the groups involved in the vibration (Maxfield et al., 1981). (b) Calculated frequencies in the amide V region for the 10 lowest energy conformations of cyclo(L-Ala-Gly-Aca) (see Table XXVII). The observed infrared and Raman bands occur at the same frequencies and are indicated by the shaded bars on the bottom line. Numbers above the calculated frequencies represent the groups involved in the vibration (Maxfield et al., 1981).
TABLE XXVIII
Observed and Calculated Amide Modes of Type II β Turn in cyclo(L-Ala-D-Ala-Aca)

Amide I
  Observed(a), Raman (cm⁻¹): 1669VS, 1656sh, 1643W
  Observed(a), infrared(b) (cm⁻¹): 1668W, 1656sh, 1641VS
  Calculated(c), No. 3 (cm⁻¹): 1686, 1676, 1651
  Calculated(c), No. 1 (cm⁻¹): 1683, 1662, 1646
Amide II
  Observed(a), infrared(b) (cm⁻¹): 1565sh (1565sh), 1554M (1554MS), 1546MS (1545M)
  Calculated(c), No. 3 (cm⁻¹): 1560, 1555, 1535
  Calculated(c), No. 1 (cm⁻¹): 1546, 1545, 1542
Amide III
  Observed(a), Raman (cm⁻¹): 1379MW, 1310MW, 1255M, 1240MS
  Observed(a), infrared(b) (cm⁻¹): 1303MW (1290MW), 1252M (1256M), 1236sh (1238sh)
  Calculated(c), No. 3 (cm⁻¹): 1372, 1337, 1300, 1255, 1242
  Calculated(c), No. 1 (cm⁻¹): 1370, 1339, 1299, 1262, 1237
Amide V
  Observed(a), Raman (cm⁻¹): 729W, 670W, 597MW
  Observed(a), infrared(b) (cm⁻¹): (767MW), 736sh (746M), 716MS (719MS), 697MS (700M), (668W), (650VW), (615VW), 592MW (592MW)
  Calculated(c), No. 3 and No. 1 (cm⁻¹): 773, 749, 739, 690, 660, 650, 620, 591, 573

a See footnote a, Table XXV, for explanation of symbols.
b Frequency values in parentheses were observed at low temperature.
c The numbers in the heading refer to those of the computed conformations (Bandekar et al., 1982).
small amount of another type II β turn (number 1, 0.74 kcal/mol higher in energy) also being present, having (φ, ψ)2 = -96°, 94° and (φ, ψ)3 = 85°, 57°. The observed Raman and IR bands of this molecule and the calculated frequencies of these two structures are given in Table XXVIII. Since the (φ, ψ)2 and (φ, ψ)3 values of conformations 3 of cyclo(L-Ala-D-Ala-Aca) and cyclo(L-Ala-Gly-Aca) are essentially identical, it is of interest to compare the amide modes of these two molecules to see what effect the replacement of a Gly H by an Ala CH3 has on the frequencies. The strong ~1668 (Raman) and ~1642 (IR) cm⁻¹ amide I frequencies
SAMUEL KRIMM AND JAGDEESH BANDEKAR
are identical for both molecules. Of the amide II modes, a frequency at 1566 cm⁻¹ is common to both, though this is a medium-intensity band and represents a shoulder in the Gly and Ala molecules, respectively. The other two amide II modes are quite different: 1549MW and 1533M versus 1554M and 1546MS cm⁻¹ in the Gly and Ala molecules, respectively. While both molecules have modes with NH ib near 1375 and 1305 cm⁻¹ in common, the lower frequency amide III modes are significantly different: ~1280W and ~1230M versus ~1254M and 1240MS (Raman) cm⁻¹ in the Gly and Ala molecules, respectively. The amide V modes show similar differences. These results emphasize a point made above, viz., that β turns with a Gly residue at position 3 are not good general models for β turns. It should also be noted that the frequencies of cyclo(L-Ala-D-Ala-Aca) are not in good agreement with those of a standard type II β turn with Ala in position 3 (cf. Tables XXIII and XXVIII). This is probably a consequence of the significant differences in dihedral angles of these β turns.
The only complete vibrational analysis of a type II' β-turn peptide is that of gramicidin S (Naik et al., 1984). Although the dihedral angles of (φ, ψ)₂ = 60°, -137° and (φ, ψ)₃ = -75°, -18° are not too far from the standard values of (φ, ψ)₂ = 60°, -120° and (φ, ψ)₃ = -80°, 0°, the situation is complicated by the cyclosymmetric nature of the molecule and the presence of a Pro residue in position 3. Thus, although good agreement is obtained between observed and calculated frequencies for the gramicidin S structure, there is, as expected, poorer agreement with calculated modes of a standard type II' β turn (cf. Table XXII).
c. Type III β Turn. The only complete vibrational analysis of a system that adopts a type III β turn is that of cyclo(L-Ala-L-Ala-Aca) (Bandekar et al., 1982). This molecule was found by conformational energy calculations to be likely to assume either a type I or type III β-turn structure.
Comparison of the observed Raman and IR bands with calculated normal modes showed that the type III β turn, of 0.55 kcal/mol higher energy than the minimum-energy (type I β-turn) structure, predominated in the solid state, with some increase in type I structure occurring at low temperature. The type III β turn of cyclo(L-Ala-L-Ala-Aca) has (φ, ψ)₂ = -81°, -53° and (φ, ψ)₃ = -87°, -48°, compared to the standard values of -60°, -30° and -60°, -30°, respectively. Observed and calculated frequencies of this molecule are compared in Table XXIX; the agreement is reasonably good, except for some amide V modes. The strong amide I modes at 1670 (Raman) and 1650 (IR) cm⁻¹ are well predicted, despite their frequency differences from modes of the standard structure (cf. Table XXIV). The strong amide II modes at 1543 and 1530 cm⁻¹ are well accounted for, conformations 4 and 2
TABLE XXIX Observed and Calculated Amide Modes of Type III (No. 4) and Type I (No. 2) β Turns in cyclo(L-Ala-L-Ala-Aca) Observedᵃ (cm⁻¹) Mode
Raman
Amide I
Infraredb 1682sh
1670VS 1652W 1641W
1650VS 1640sh 1574sh 1565sh 1554sh 1543s 1530s
No. 2
1686 1673 1651
1681 1674 1638
1379MW
(1572W)
(1536s)
1378MW (1377MW)
1242s 784W 757w 738W 726W 681W
1278W 1272M 1241M 781MW
723MS 682MS 595w
1578
(1554sh) ( 1548s)
1362VW 1330s
Amide V
No. 4
1585
Amide I1
Amide 111
Calculatedᶜ (cm⁻¹)
(1278W) (1272M) (1244M) (785MW) (738M) (726MS) (694W) (679MS) (595W)
1549 1537 1371 1365 1332 1281 1245 796
1548 1536 1379 1373
1271 1241 76 1 713
703 676
675
ᵃ See footnote a, Table XXV, for explanation of symbols. ᵇ Frequency values in parentheses were determined at low temperature. ᶜ The numbers in the heading refer to those of the computed conformations (Bandekar et al., 1982).
being the only ones to predict another mode below the 1543-cm⁻¹ band. All of the observed amide III modes are predicted, which is not the case for the other conformations. And while the frequency agreement for the 713- and 703-cm⁻¹ modes is not good (assuming the assignment is correct), other conformations do not even predict any bands between 775 and 696 cm⁻¹. In view of the overall frequency agreement, it is highly probable that cyclo(L-Ala-L-Ala-Aca) adopts a type III β-turn conformation.
A type III β turn has been found in the crystal structure of benzoxycarbonyl-α-aminoisobutyryl-L-prolylmethylamide (Prasad et al., 1979),
but no detailed vibrational analysis has been made of this compound. Its dihedral angles, (φ, ψ)₂ = -51°, -38° and (φ, ψ)₃ = -65°, -25°, are close to the standard values of -60°, -30° and -60°, -30°, respectively, so it might be thought that its amide modes would be close to those of the standard structure. However, the presence of the two unusual residues, as well as the urethane groups, indicates that a simple correspondence may not occur. In addition, the presence of four molecules in the unit cell is likely to complicate the spectrum. Raman spectra of this molecule in the solid state (Ishizaki et al., 1981) show bands at 1693W and 1677M, and an amide III mode at 1286W cm⁻¹. The IR spectrum in CHCl₃ (Rao et al., 1980) has bands at 1715S, 1658VS, and 1645VS cm⁻¹. The 1715- and 1693-cm⁻¹ bands are due mainly to the urethane group. The other two expected amide I modes are assignable to the 1677- and ~1650-cm⁻¹ bands (assuming in the latter case that the same type III β turn is preserved in solution). It is interesting that these frequencies are close to those of cyclo(L-Ala-L-Ala-Aca), and indeed not so far from those of the standard structure (see Table XXI). The amide III mode at 1286 cm⁻¹ may be related to the 1278-cm⁻¹ band found in the cyclic molecule. The incomplete deductions achievable in this case contrast clearly with the relatively powerful conclusions possible on the basis of a normal-mode analysis.
3. β Turns in Proteins
Efforts in the past to interpret the IR and Raman spectra of proteins have generally been based on the assumption of a three-state model, viz., components consisting of α-helix, β-sheet, and "random coil" structures. These attempts, which are based on the supposedly known characteristic frequencies of the above structures, have often led to controversial assignments (Yu et al., 1972, 1974; Spiro and Gaber, 1977; van Wart and Scheraga, 1978) or to incomplete assignments (Lord and Yu, 1970a,b; Chen and Lord, 1976; Chen et al., 1973; Frushour and Koenig, 1974; Craig and Gaber, 1977) for bands in the amide I and III regions of the Raman spectrum. As has been pointed out (Bandekar and Krimm, 1980), part of the reason for this failure is the neglect of the contributions of β turns, which we have seen are expected to be significant. In order to incorporate their presence in protein spectral analysis, it is necessary to know the characteristic frequencies of β turns. However, since a variety of such β-turn structures exist and since what one observes in proteins are β turns with dihedral angles that vary from the canonical values (Chou and Fasman, 1977; Venkatachalam, 1968a; Lewis et al., 1973), it is not easy to identify these structures by a study of model compounds, or by the assumption of characteristic frequencies
for a generalized β-turn component derived from a set of known protein structures (Williams, 1983). In this context, it is nevertheless important to know if β turns in proteins make specific contributions to the spectra.
The only protein for which such a vibrational analysis has been done is insulin (Bandekar and Krimm, 1980). This protein is a particularly suitable one for such a study, since its structure has been solved (Blundell et al., 1972), it is relatively small, with only four β turns, and Raman spectra of single crystals have been reported (Yu et al., 1974). The normal-mode calculations (Bandekar and Krimm, 1980) permit a correlation of previously unassignable bands in the Raman spectrum with β turns in the structure, as well as showing that some of the computed β-turn frequencies lie in spectral regions previously associated exclusively with α-helix modes. These results thus emphasize our previous remarks that caution must be exercised in proposing unique assignments of bands to α-helix and β-sheet structures in proteins.
The β turns in insulin are of four different types and have the following dihedral angles (Blundell et al., 1972):
A7-10: Cys-Thr-Ser-Ile (type IV), (φ, ψ)₂ = -89°, 20°; (φ, ψ)₃ = -141°, -134°.
A12-15: Ser-Leu-Tyr-Glu (type III), (φ, ψ)₂ = -76°, -40°; (φ, ψ)₃ = -69°, -47°.
B7-10: Cys-Gly-Ser-His (type II'), (φ, ψ)₂ = 84°, -107°; (φ, ψ)₃ = -92°, -24°.
B20-23: Gly-Glu-Arg-Gly (type I), (φ, ψ)₂ = -143°, 11°; (φ, ψ)₃ = -96°, -40°.
These dihedral angles were used in normal-mode calculations on the model system CH₃CO-(Ala)₄-NHCH₃. The amide I and III frequencies of the above four β turns are given in Table XXX. We note that the angles for the type I turn are quite different from the standard values, and the frequencies (though not the ranges) are also different; whereas for the type III turn the angles are closer to the standard values, and so are the frequencies (though some group assignments differ).
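The statement that the insulin type I turn departs further from its standard geometry than the type III turn can be made quantitative with a small sketch. The insulin angles and the standard type III values (-60°, -30°; -60°, -30°) are taken from the text; the standard type I values (-60°, -30°; -90°, 0°) are not quoted in this passage and are an assumption here, as is the choice of an RMS deviation as the measure. The function names are illustrative only.

```python
import math

def angle_diff(a, b):
    """Signed difference a - b in degrees, wrapped into (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d

def rms_deviation(observed, standard):
    """Root-mean-square angular deviation between two sets of dihedral angles."""
    diffs = [angle_diff(o, s) for o, s in zip(observed, standard)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Angles ordered as (phi2, psi2, phi3, psi3), in degrees.
insulin_type_I = (-143.0, 11.0, -96.0, -40.0)      # B20-23, from the text
insulin_type_III = (-76.0, -40.0, -69.0, -47.0)    # A12-15, from the text
standard_type_III = (-60.0, -30.0, -60.0, -30.0)   # standard values quoted in the text
standard_type_I = (-60.0, -30.0, -90.0, 0.0)       # ASSUMPTION: not quoted in this passage

rms_I = rms_deviation(insulin_type_I, standard_type_I)
rms_III = rms_deviation(insulin_type_III, standard_type_III)
print(f"type I   RMS deviation: {rms_I:.1f} deg")
print(f"type III RMS deviation: {rms_III:.1f} deg")
```

Under these assumptions the type I turn deviates by roughly 50° RMS and the type III turn by roughly 13°, consistent with the qualitative remark above. A single RMS number hides which angle is responsible, so it is only a first screen.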
The predicted amide I modes for groups 2-4 of the β turns center near two frequencies, 1652 ± 3 and 1680 ± 4 cm⁻¹. Bands are observed near these frequencies in the Raman spectra of single crystals of insulin, viz., at 1658 and 1681 cm⁻¹ (Yu et al., 1974). The 1658-cm⁻¹ band has been assigned (Yu et al., 1974; van Wart and Scheraga, 1978), on the basis of previous correlations, to the known 40-50% α-helix component of insulin (Blundell et al., 1972). This is a reasonable interpretation,
TABLE XXX Amide I and Amide III Frequencies of β-Turn CH₃-CO-(Ala)₄-NH-CH₃ with Insulin Dihedral Angles Amide I
P Turn
a
Amide 111
Group"
Frequency (cm-')
Group"
Frequency (cm-I)
1+2 4+5
1697 1677 1660 1656 1650 1696 1683 1674 1655 1648 1680 1677 1675 1655 1650 1684 1683 1671 1653 1646
1 + 3 4 2 3+1
1311 1302 1296 1290 1281 1310 1305 1299 1289 1283 1319 1311 1307 1289 1281 1315 1296 1290 1287 1281
5 + 4
B7-10 (type 11')
3 2+1 1+2 4 + 3 5 2+ 1 3+4 1 5 2 3 4 2+1 3+4 5 4 + 3 1+2
5
1 3+4 2+4 4+3 5 1+3 3+1 4 2 5 1 3+1 4+5 2+3 5+2
See footnote b, Table XXV, for explanation of symbols.
except that our calculations would now suggest that the β-turn component of the insulin structure also contributes in this region. The origin of the observed band at 1681 cm⁻¹ had previously been perplexing. It had been assigned (Yu et al., 1972; Spiro and Gaber, 1977) to a random-coil component, but this is difficult to support since it disappears in denatured insulin (Yu et al., 1972). van Wart and Scheraga (1978) have commented that "the shoulder at 1681 cm⁻¹ might be due to a state not encountered in model studies." The results of the normal-mode calculations make a strong case for assigning this band to the β turns in the native insulin structure. The disappearance of this band on denaturation (Yu et al., 1972) is certainly consistent with this assignment, as is its continued presence in a deuterated single crystal of insulin (Yu et al., 1974).
The predicted amide III modes of groups 2-4 fall roughly into two groups: at 1289 ± 1 cm⁻¹ and fairly uniformly distributed in the range 1311-1296 cm⁻¹. If external hydrogen bonding is not included, these frequencies are at 1280 ± 2 and 1298-1287 cm⁻¹. The observed amide
III modes in the Raman spectrum (Yu et al., 1974) are found at 1240, 1269, 1284, and 1303 cm⁻¹. These bands have been assigned as follows: 1240 cm⁻¹ to random coil and β sheet (Yu et al., 1972), 1269 and 1284 cm⁻¹ to α helix (Yu et al., 1972), and 1303 cm⁻¹ to “the α-helical category” (Yu et al., 1974). On the other hand, while essentially agreeing with the assignments of the first three bands, other authors (van Wart and Scheraga, 1978) note that “the band at 1303 cm⁻¹ cannot be assigned on the basis of what is presently known from model studies.” The reason behind this is that, although high-frequency amide III modes have been correlated with α-helix structures, the highest frequency that had been observed for such a band was 1295 cm⁻¹ in solid poly(L-lysine) HCl at 50% relative humidity (Chen and Lord, 1974). Thus amide III bands near 1300 cm⁻¹ and above cannot be correlated with α-helix, β-sheet, or unordered structures.
The calculations on insulin β turns, however, clearly indicate that the band at 1303 cm⁻¹, and probably part of that at 1284 cm⁻¹, can be assigned to β turns. In addition, normal-mode calculations on canonical β turns (Bandekar and Krimm, 1979a; Krimm and Bandekar, 1980) have shown that they have characteristic amide III modes above 1300 cm⁻¹, in a region where such modes are not found for α-helix and β-sheet structures. The 1303-cm⁻¹ band assignment is thus strongly supported by these calculated results. Whereas the 1284-cm⁻¹ band could be associated with the α helix, the fact that a band is predicted near this position for the β turns of insulin indicates that, at the very least, this band should be considered to be partly due to the presence of the latter structures.
Based on the assignment of amide III modes with frequencies near 1300 cm⁻¹ to β turns, heretofore unassignable bands in other proteins could be interpreted (Bandekar and Krimm, 1980). A band at 1300 cm⁻¹ in lysozyme (Chen et al., 1973) may also be due to β turns.
Similar assignments may be appropriate for the 1305-cm⁻¹ band in human carbonic anhydrase (Craig and Gaber, 1977), the 1305- and 1317-cm⁻¹ bands in ribonuclease (Koenig and Frushour, 1972), the 1314-cm⁻¹ band in ovalbumin (Koenig and Frushour, 1972), and the 1279-cm⁻¹ band in concanavalin A. In a study of the Raman spectra of Bence-Jones proteins (Kitagawa et al., 1979), amide III bands were observed at 1242, 1262, and 1318 cm⁻¹ in the solid state and at 1245, 1265, and 1322 cm⁻¹ in aqueous solution for the type A protein. Since crystal-structure analysis of this protein (Epp et al., 1974) shows that it contains about 50% β-sheet structure and no α helix, the strong band at 1242-1245 cm⁻¹ was assigned to the β-sheet structure by Kitagawa et al. (1979). But there were no reasonable assignments for the weak bands at 1262-1264
FIG. 30. ORTEP drawing of CH₃-CO-(L-Ala)₃-NH-CH₃ model of a γ turn. The CH₃ groups of the L-Ala residues are represented by point masses. External hydrogen bonds are included (Bandekar and Krimm, 1985a).
and 1318-1322 cm⁻¹ on the basis of the three-state model. Since this protein has nine β turns, and since bands are predicted above 1300 cm⁻¹ for β-turn types I-III (Krimm and Bandekar, 1980), it was proposed (Bandekar and Krimm, 1980) that the bands at 1262-1265 and 1318-1322 cm⁻¹ of the Bence-Jones proteins are assignable to its β-turn component. These assignments are supported by the disappearance of these three bands on thermal denaturation and on N-deuteration (Kitagawa et al., 1979). It must be noted, however, that β-turn frequencies can occur in the region 1262-1265 cm⁻¹ if the conformation results in weak hydrogen bonds (Bandekar and Krimm, 1979a; Kawai and Fasman, 1978).
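The screening argument used above (amide III bands above the highest frequency observed for an α helix, 1295 cm⁻¹, cannot be accounted for by the three-state model) can be sketched as a simple filter. The band lists are the observed amide III frequencies quoted in the text; the function name and the use of a hard cutoff are illustrative assumptions, not the authors' procedure, which rests on normal-mode calculations.

```python
# Highest amide III frequency reported for an alpha-helix band (value quoted in the text).
ALPHA_HELIX_AMIDE_III_MAX = 1295.0

def beta_turn_candidates(bands, cutoff=ALPHA_HELIX_AMIDE_III_MAX):
    """Return the amide III bands (cm^-1) lying above the cutoff.

    Such bands are not explained by alpha-helix, beta-sheet, or unordered
    structures and are therefore candidates for a beta-turn assignment.
    """
    return [b for b in bands if b > cutoff]

# Observed Raman amide III bands quoted in the text (cm^-1).
observed = {
    "insulin": [1240, 1269, 1284, 1303],
    "lysozyme": [1300],
    "human carbonic anhydrase": [1305],
    "ribonuclease": [1305, 1317],
    "ovalbumin": [1314],
    "concanavalin A": [1279],
    "Bence-Jones (solid)": [1242, 1262, 1318],
    "Bence-Jones (solution)": [1245, 1265, 1322],
}

candidates = {name: beta_turn_candidates(bands) for name, bands in observed.items()}
for name, bands in candidates.items():
    print(f"{name}: {bands}")
```

The filter is deliberately conservative: it does not flag bands such as 1284 cm⁻¹ in insulin or 1262-1265 cm⁻¹ in the Bence-Jones protein, which the text argues may also contain β-turn contributions on other grounds.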
C. γ Turns
1. Standard Turn
a. Structure. The γ turn is formed by three amino acid residues, i, i + 1, and i + 2, and is characterized by the presence of two hydrogen bonds (see Fig. 30). The 3 → 1 hydrogen bond between the CO of residue i and the NH of residue i + 2 forms a C₇ structure (Bragg et al., 1950). Since the early study in 1972 (Nemethy and Printz, 1972), the γ-turn structure has been further refined using improved energy parameters (G. Nemethy, personal communication), and three different energetically stable conformations have been proposed: (1) the γ turn (γ, a C₇ structure); (2) the mirror-related γ turn (γM, a C₇^eq structure); and (3) the inverse γ turn (γI, a C₇ structure). In γ and γM there is a second (1 → 3) hydrogen bond between the NH of residue i and the CO of residue i + 2; in γI this bond is between the CO of residue i - 1 and the NH of residue i + 3 (a 5 → 1 bond).
TABLE XXXI
Dihedral Angles for γ-Turn Structures of CH₃-CO-(L-Ala)₃-NH-CH₃ᵃ

Angleᵇ    γ Turn    γM Turn    γI Turn
           180       180        180
          -152      -155         59
            90       -53       -177
          -161       177        172
            58       -81        -77
           -74        74         68
           178      -179       -176
           -76      -154       -163
           149       158        -55
          -179      -180        179

ᵃ All angles given in degrees. ᵇ See Fig. 30 for designation of angles.
The model systems for the normal-mode calculations (Bandekar and Krimm, 1985a) were CH₃CO-(Ala)ₙ-NHCH₃ (n = 3 and 5) with external hydrogen-bonded atoms. The dihedral angles used for the various γ-turn conformations of CH₃-CO-(L-Ala)₃-NH-CH₃ are given in Table XXXI. The dihedral angles of the additional residues for the n = 5 γ-turn model were taken to correspond to those of the APPS [viz., (φ, ψ, ω)₀ = (φ, ψ, ω)₄ = -139°, 133°, 180°]. For both structures, in order to use the refined force fields, we used the bond lengths and bond angles of β-(Ala)ₙ (Dwivedi and Krimm, 1982b), while retaining the above dihedral angles. The side-chain Ala CH₃ groups were replaced by point masses at the β-carbon atom. The terminal CH₃ groups were treated completely.
b. Vibrational Analysis. The force field used for these calculations was based on a recent refinement for β-(Ala)ₙ, in the approximation of the side-chain CH₃ group taken as an equivalent point mass (Dwivedi and Krimm, 1984b). Transition dipole coupling was incorporated for amide I and II modes, using Δμeff = 0.45 D for amide I and 0.279 D for amide II. Detailed frequencies and potential energy distributions for both structures are given in the original paper (Bandekar and Krimm, 1985a). In Table XXXII we present the results for the amide modes of the n = 3 molecule; we indicate in the discussion the changes when n = 5.

TABLE XXXII
Calculated Amide Frequenciesᵃ of CH₃-CO-(L-Ala)₃-NH-CH₃ in γ-Turn Conformations

             γ                     γM                    γI
Mode         ν (cm⁻¹)  Groupᵇ     ν (cm⁻¹)  Groupᵇ     ν (cm⁻¹)  Groupᵇ
Amide I      1684      0+3        1675      0+3        1675      0+3
             1670      1          1668      2          1667      2+1
             1655      2          1656      1          1660      1+2
             1653      3+0        1654      3+0        1649      3+0
Amide II     1552      0          1551      2+1+0      1546      1+2
             1529      1+3        1540      0+2        1540      1+2
             1526      3+1        1527      3          1512      3
             1509      2          1518      1          1503      0
Amide III    1390      0+1        1387      0          1375      1
             1367      1          1352      2          1371      2
             1336      1          1331*     1          1351      3
             1327      3          1310*     2+3+1      1346      0
             1297*     3+2        1261*     0          1323      1
             1282      1          1254*     3          1308*     2+1
             1242*     0          1248*     0+1        1268      3+1
             1225*     3+2                             1243*     3+2
                                                       1232*     0
Amide V      709       3          729       2          718       1
             706       3          719       1          712       2+3
             676       1          707       3          706       3
             655       1+2        677       2          698       0+1
             608       2          643       1          644       2
             602       2          570       0
             548       0          562       0
             517       0
             493       0

ᵃ Only frequencies with an asterisk have a CN stretch contribution, in all cases appropriate to the peptide group except the 1268 cm⁻¹ band of γI, which has a CN(2) stretch contribution. ᵇ The numbers refer to the peptide groups of Fig. 30. The designation 0 + 3 indicates that both groups contribute to the mode, that of 0 being larger.

The amide I modes of all γ turns are predicted in the range 1685-1650 cm⁻¹, although for n = 5 this range is narrowed to 1675-1655 cm⁻¹. It should be noted that the lower part of this range overlaps the characteristic α-helix frequency. The possibility of distinguishing between the different γ turns clearly depends on identifying the groups contributing to different frequencies, since modes associated with groups 1 and 2 at the turn have frequencies near 1668 and 1657 cm⁻¹ for all γ turns (for n = 3 and 5). As we have shown (Bandekar and Krimm, 1985a), if isotopic substitution can be used, a distinction is possible: A calculation for a molecule with ¹⁴CO(1) indicates that, from the location of the shifted band in the original sequence of amide I frequencies plus the magnitude of the shift, it should be possible to identify the type of turn. Incidentally, this is aided by the fact that the amide II(1)
modes are also shifted by a ¹⁴CO(1) substitution (since amide II always contains CN s). The situation for the amide II modes is complicated by the different mixing of NH ib for the different turns. Thus, the frequencies for groups 1 and 2 are 1529 and 1509 cm⁻¹ in γ and 1518 and 1551 in γM, but these modes mix equally in γI to give frequencies at 1546 and 1540 cm⁻¹. (For n = 5, differential changes occur for groups 1 and 2: 1527 and 1509 cm⁻¹ for γ, 1547 and 1544 for γM, and mixed modes at 1541 and 1558, 1525 cm⁻¹ for γI, respectively.) This again suggests that backbone isotopic substitution can help to elucidate the structural origin of the frequencies, and a calculation for an ¹⁵NH(2) structure (Bandekar and Krimm, 1985a) indeed shows this to be the case: Amide II(2) drops by 12 cm⁻¹ for γ, by 25 cm⁻¹ for γM, and the equal mixing for γI is altered to give purer modes at 1542(1) and 1540(2) cm⁻¹. Backbone isotopic substitution thus provides a new dimension of conformational analysis when combined with normal-mode calculations.
As shown in Table XXXII, a relatively larger number of modes in the region 1400-1200 cm⁻¹ contain a significant NH ib contribution. The frequency distributions of these modes differ between the different γ-turn structures, and the utility of this region for structure determination will depend on the intensities of the bands in the Raman and IR spectra. A similar comment is applicable to the region of NH ob contribution (below ~730 cm⁻¹), although some of the differences are larger than for NH ib (Bandekar and Krimm, 1985a).
While calculations on canonical structures can provide useful guidelines for correlating frequencies with conformations, actual γ-turn structures will often have dihedral angles that differ from the standard values. It is therefore important to do the normal-mode analysis on the actual structure under consideration, taking appropriate account of the side chains involved.
2. γ Turns in Peptides
In order to gain confidence in the predictions for the canonical γ-turn structures, it is important to have a satisfactory vibrational analysis on a molecule with a known γ-turn structure. Such an analysis has been done for cyclo(D-Phe-L-Pro-Gly-D-Ala-L-Pro) (Bandekar and Krimm, 1985b), a cyclic peptide known from X-ray studies (Karle, 1981) to contain a γI turn. Its dihedral angles are (φ, ψ)₁ = 135°, -69°; (φ, ψ)₂ = -82°, 59°; (φ, ψ)₃ = 81°, -126°. The normal-mode calculations were done on a structure with the prolyl rings included, the side chains approximated by point masses, and external hydrogen bonds included (see Fig. 31). In order to maintain
FIG. 31. Schematic illustration of cyclo(D-Phe-L-Pro-Gly-D-Ala-L-Pro). Peptide groups are numbered as for canonical γ turn (see Fig. 30), and external hydrogen bonds are shown (Bandekar and Krimm, 1985b).
standard bond lengths and angles (so that the force field could be transferred), the dihedral angles had to be modified slightly, in most cases by less than 5° but in the cases of a dihedral angle of Pro-3 and of Pro-1 by +12° and +11°, respectively. This modified the internal hydrogen-bond lengths only slightly. The force field for the backbone was one refined for a point mass approximation (Dwivedi and Krimm, 1984b), while that for the prolyl ring was transferred from one for (Pro)ₙII (Johnston, 1975).
The observed Raman and IR amide bands, together with the calculated normal-mode frequencies, are given in Table XXXIII. The observed amide I modes are very well reproduced, including the fact that the two lowest frequencies are associated with the Pro groups. Only groups 0, 2, and 4 are expected to have amide II modes, and the observed (N-deuteration-sensitive) bands are reasonably well accounted for. A large number of modes in the region 1400-1200 cm⁻¹ are predicted to have NH ib contributions and therefore to be sensitive to N-deuteration. While many of the observed bands are weak and hard to assign definitively, they are well predicted. However, bands at 1319, 1304, 1248, and 1233 cm⁻¹ clearly weaken on N-deuteration and their frequencies are calculated very well. The amide V modes are also predicted quite well, despite the overlap with Phe modes at 750 and 700 cm⁻¹ (using an internal standard band, it could be shown that their intensities decrease on N-deuteration).
The good predictions for the cyclic molecule indicate that the predictions for the canonical structures should be reliable. As expected, the frequencies of the cyclic structure are different from those of the standard structure, a result of the cyclic nature of the former system, the differences in dihedral angles, and the presence of prolyl residues.

TABLE XXXIII
Observed and Calculated Amide Frequencies of cyclo(D-Phe-L-Pro-Gly-D-Ala-L-Pro)

Amide I
  Observedᵃ Raman (cm⁻¹): 1685M, 1664S, 1634M, 1618W
  Observedᵃ Infrared (cm⁻¹): 1689S, 1666VS, 1638MS, 1621S
  Calculated ν (cm⁻¹) (Groupᵇ): 1693 (2), 1667 (4+0), 1659 (0+4), 1640 (1+3), 1628 (3+1)
Amide II
  Observedᵃ Infrared (cm⁻¹): 1563sh, 1542MW, 1522MS
  Calculated ν (cm⁻¹) (Groupᵇ): 1561 (4+2), 1549 (2+4), 1531 (0)
Amide III
  Observedᵃ Raman (cm⁻¹): 1389VW, 1340W, 1327W, 1303W, -, 1278M, 1273M, 1252S, -
  Observedᵃ Infrared (cm⁻¹): 1386sh, 1341W, 1319MW, 1304MW, 1280sh, 1272MW, 1266W, 1248MW, 1233W
  Calculated ν (cm⁻¹) (Groupᵇ): 1386 (4), 1336 (2), 1313 (4), 1301 (0), 1293 (2), 1286 (0), 1272 (0), 1266 (4), 1249 (2), 1237 (4)
Amide V
  Observedᵃ Raman (cm⁻¹): 747W, 724VW, 701VW, 588MW
  Observedᵃ Infrared (cm⁻¹): 750M, 724W, 700M, 653sh, 593MW
  Calculated ν (cm⁻¹) (Groupᵇ): 756 (2(+Phe)), 731 (4), 655 (0), 593 (4)

ᵃ See footnote a, Table XXV, for explanation of symbols. ᵇ Group numbers refer to the peptide groups in Fig. 31.
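The claim that the four N-deuteration-sensitive bands at 1319, 1304, 1248, and 1233 cm⁻¹ are "calculated very well" can be checked against the calculated amide III frequencies of Table XXXIII. The sketch below pairs each observed band with the nearest calculated frequency; this nearest-neighbor pairing is an assumption made for illustration, since the authors assign bands through the normal-mode eigenvectors rather than by proximity alone.

```python
# Calculated amide III frequencies of the cyclic pentapeptide (cm^-1), from Table XXXIII.
calculated_amide_III = [1386, 1336, 1313, 1301, 1293, 1286, 1272, 1266, 1249, 1237]

# Observed IR bands singled out in the text as clearly weakening on N-deuteration (cm^-1).
observed_bands = [1319, 1304, 1248, 1233]

def nearest(value, candidates):
    """Return the candidate frequency closest to the given value."""
    return min(candidates, key=lambda c: abs(c - value))

# ASSUMPTION: nearest-neighbor pairing stands in for a proper eigenvector-based assignment.
pairs = [(obs, nearest(obs, calculated_amide_III)) for obs in observed_bands]
deviations = [abs(obs - calc) for obs, calc in pairs]
mean_abs_deviation = sum(deviations) / len(deviations)

for (obs, calc), dev in zip(pairs, deviations):
    print(f"observed {obs}  calculated {calc}  |difference| {dev} cm^-1")
print(f"mean absolute deviation: {mean_abs_deviation} cm^-1")
```

With this pairing the four bands are reproduced to within 6 cm⁻¹, with a mean absolute deviation of 3.5 cm⁻¹.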
VI. CHARACTERISTICS OF POLYPEPTIDE CHAIN MODES
A. Introduction
The goal of a vibrational spectroscopic study of a polypeptide molecule is to derive structural information from spectral parameters, such as band frequencies, intensities, and polarizations. In the past, the frequencies of the amide modes were the main diagnostic quantities, with structural insights being obtained from correlational studies based on observed spectra of known polypeptide chain structures. It is apparent from the preceding discussions that we now have a rigorous basis for understanding the normal modes of a polypeptide chain. Instead of speculating on the meaning of differences in the spectrum, it is now possible to provide a detailed prediction of the effects of structural changes on the normal modes. It is appropriate at this stage to examine what such calculations tell us about general characteristics of vibrational frequencies of the polypeptide chain. We do this on the basis of the structures studied thus far by normal-mode analyses. The discussion of the peptide group modes in NMA (Section II,B,2) serves as a useful background for the present considerations.
B. Amide and Skeletal Modes of the Polypeptide Chain
1. NH Stretch Mode
Although NH s is a highly localized mode, and therefore not likely to be sensitive to chain conformation, its frequency depends strongly on the strength of the N-H···O=C hydrogen bond, and it can be expected that this will be a sensitive reflection of structure and its variations. The observed NH s band, normally seen between 3310 and 3270 cm⁻¹ (Krimm and Dwivedi, 1982a), represents a modification of ν_A⁰ by at least two factors: TDC and Fermi resonance. The former must be expected since the dipole derivative for this mode is large (Cheam and Krimm, 1985); however, to date no splittings assignable to this effect have been observed, such as are seen for amide I and II modes. Fermi resonance (see Section II,E,4) results in an upward shift in frequency because of the interaction with the lower frequency overtone or combination band, ν_B⁰, involving amide II modes. In Table XXXIV we give data on observed spectra that have been analyzed, and on the assignment of the combination that interacts with the fundamental. It is interesting that, in general for helical structures, the combination interacting with the NH s
TABLE XXXIV
Observed and Unperturbed NH Stretch Frequencies and Combinationsᵃ and Their Assignments

Structure      ν_A     ν_B     ν_A⁰    ν_B⁰    Combinationᵇ           Ref.ᶜ
(Gly)ₙI        3300    3080    3272    3108    1517 + 1602 = 3119     1
(Gly)ₙII       3278    3086    3257    3108    2 × 1550 = 3100        2
β-(Ala)ₙ       3280    3072    3242    3109    1524 + 1592 = 3116     1
α-(Ala)ₙ       3307    3058    3279    3086    2 × 1545 = 3090        1
β-(GluCa)ₙ     3275    3088    3230    3133    1552 + 1576 = 3128     3
3₁₀-(Aib)ₙ     3272    3060    3261    3070    2 × 1545 = 3090        4
                       3030            3043    2 × 1531 = 3062

ᵃ In cm⁻¹. ᵇ Italicized frequencies are observed IR bands; others are calculated values. ᶜ 1, Krimm and Dwivedi (1982a); 2, Dwivedi and Krimm (1982c); 3, Sengupta et al. (1984); 4, Dwivedi et al. (1984).
fundamental is an overtone of an amide II mode, whereas for β-sheet structures it is a combination of amide II modes of different species.
Having data on an experimentally determined NH s fundamental enables us to ask how well this frequency correlates with the geometry of the hydrogen bond. [It should be noted that the varying anharmonicities in the combination bands, i.e., the differences between ν_B⁰ and the combination frequency, suggest that ν_A⁰ cannot be obtained accurately from ν_A + ν_B - 2ν_II (Fraser and MacRae, 1973, p. 205).] Incidentally, the frequency depends on both F(NH) and F(H···O), and a correlation of these force constants with hydrogen-bond geometry is therefore a more fundamental relation; however, both force constants have the expected dependences on the strengths of the hydrogen bonds [viz., 5.674 and 0.150 versus 5.830 and 0.120 for β-(Ala)ₙ and α-(Ala)ₙ, respectively]. Previous authors (Nakamoto et al., 1955; Pimentel and Sederholm, 1956) have shown that there is a dependence of ν_A or Δν ≡ ν_free - ν_A on l(N···O), and this relationship has been used to relate the NH s frequency with l(N···O) in the α helix (Fraser and MacRae, 1973, p. 205). This correlation suffers from several deficiencies. First, the original relationship was based on ν_A rather than ν_A⁰ values (Pimentel and Sederholm, 1956). Second, this relationship was derived from data on a broad range of crystalline compounds having N-H···O=C hydrogen bonds, without regard to the influences of crystal packing forces and hydrogen-bond geometry other than l(N···O). Third, it is necessary to know the frequency of the free NH group, and it is not clear whether the molecule chosen to provide this (C₂H₅CONHC₂H₅) is necessarily the appropriate representative of a polypeptide chain. It is, therefore, better to correlate l(N···O) with ν_A⁰, recognizing that this still does not provide
FIG. 32. Relationship between N···O distance [l(N···O)] in hydrogen bond and unperturbed NH stretch frequency (ν_A⁰, in cm⁻¹). From lower left, points correspond to β-(GluCa)ₙ, β-(Ala)ₙ, 3₁₀-(Aib)ₙ, and α-(Ala)ₙ.
a complete dependence on hydrogen-bond geometry (Cheam and Krimm, 1986).
In Fig. 32 we show a plot of l(N···O) versus ν_A⁰ for the non-glycine-containing polypeptides listed in Table XXXIV. [The data for (Gly)ₙI and (Gly)ₙII depart significantly from the curve of Fig. 32, the reason for which is not clear.] The hydrogen bonds involved all have HNO in the range of about 3-10° and NHO in the range of about 165-175°. Under these conditions, and in the range of l = 2.70-2.90 Å, it appears that the relationship is relatively linear. While this curve coincides at the α-(Ala)ₙ point with one given previously (Fraser and MacRae, 1973, p. 204), the latter deviates significantly at the lower end [e.g., predicting ν_A⁰ = 3180 cm⁻¹ for l(N···O) = 2.70 Å]. The relationship in Fig. 32 should permit the determination of l(N···O) from ν_A⁰ in the region of about 3200-3300 cm⁻¹, where the variation corresponds to 0.0035 Å/cm⁻¹.
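Two numerical relations from this subsection can be illustrated with the values of Table XXXIV. The sketch below assumes the textbook two-level Fermi resonance model, in which the observed pair (ν_A, ν_B) and the unperturbed pair (ν_A⁰, ν_B⁰) share the same sum and satisfy (ν_A - ν_B)² = (ν_A⁰ - ν_B⁰)² + 4W², with W the coupling. Whether this is exactly the treatment of Section II,E,4 is an assumption. The 3₁₀-(Aib)ₙ entry is omitted because it involves two combination bands. The second function uses only the quoted slope of 0.0035 Å per cm⁻¹, so it gives differences in l(N···O), not absolute distances. All names are illustrative.

```python
import math

# (nu_A, nu_B, nu_A0, nu_B0) in cm^-1, transcribed from Table XXXIV.
TABLE_XXXIV = {
    "(Gly)n I": (3300, 3080, 3272, 3108),
    "(Gly)n II": (3278, 3086, 3257, 3108),
    "beta-(Ala)n": (3280, 3072, 3242, 3109),
    "alpha-(Ala)n": (3307, 3058, 3279, 3086),
    "beta-(GluCa)n": (3275, 3088, 3230, 3133),
}

def sum_rule_residual(nu_a, nu_b, nu_a0, nu_b0):
    """Two-level model: perturbed and unperturbed pairs should have equal sums."""
    return (nu_a + nu_b) - (nu_a0 + nu_b0)

def fermi_coupling(nu_a, nu_b, nu_a0, nu_b0):
    """|W| from (nu_a - nu_b)^2 = (nu_a0 - nu_b0)^2 + 4 W^2 (two-level assumption)."""
    return 0.5 * math.sqrt((nu_a - nu_b) ** 2 - (nu_a0 - nu_b0) ** 2)

SLOPE_ANGSTROM_PER_WAVENUMBER = 0.0035  # quoted for roughly 3200-3300 cm^-1

def delta_distance(nu0_from, nu0_to):
    """Change in l(N...O), in angstroms, implied by a change in unperturbed NH s frequency."""
    return SLOPE_ANGSTROM_PER_WAVENUMBER * (nu0_to - nu0_from)

residuals = {k: sum_rule_residual(*v) for k, v in TABLE_XXXIV.items()}
couplings = {k: fermi_coupling(*v) for k, v in TABLE_XXXIV.items()}
for name in TABLE_XXXIV:
    print(f"{name}: sum-rule residual {residuals[name]:+d} cm^-1, |W| = {couplings[name]:.1f} cm^-1")

# beta-(Ala)n (3242 cm^-1) versus alpha-(Ala)n (3279 cm^-1)
print(f"l(N...O) difference: {delta_distance(3242, 3279):.3f} angstrom")
```

The tabulated values satisfy the sum rule to within 1 cm⁻¹, and the 37 cm⁻¹ difference in ν_A⁰ between β-(Ala)ₙ and α-(Ala)ₙ corresponds to a difference of about 0.13 Å in N···O distance under the quoted slope.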
2. Amide I Mode
The amide I mode in polypeptides is still predominantly CO s plus CN s, but it can also contain significant contributions from CαCN d and minor contributions from CαC s, CNCα d, Hα b, and NH ib. (The presence of the latter is mainly responsible for the downshifts in the amide I
frequency on N-deuteration.) Therefore, we cannot expect even an "unperturbed" frequency (defined, for example, as an average of the frequencies calculated without TDC) to necessarily reflect the strength of the hydrogen bond, as was true for the NH s frequency. In fact, although the F(CO) force constants of β-(Ala)ₙ and α-(Ala)ₙ of 9.822 and 10.029, respectively, are in the proper order of the respective hydrogen-bond strengths, the "unperturbed" frequencies of 1670 and 1662 cm⁻¹, respectively, are not, demonstrating this point. This is probably related to the fact that amide I in β-(Ala)ₙ is on average a CO s (74), CN s (19) mode, whereas in α-(Ala)ₙ it is essentially a CO s (82), CN s (11), CαCN d (10) mode. This eigenvector difference probably also accounts for the difference in intensities between the amide I modes of β-sheet (Chirgadze et al., 1973) and α-helix (Chirgadze and Brazhnikov, 1974) structures. The main perturbing influence on the amide I mode is TDC, although Fermi resonance interactions can occur in special cases (Dellapiane et al., 1980). For the structures analyzed thus far by normal-mode analyses, the calculated frequencies obtained with such coupling and their observed counterparts are collected together in Table XXXV. While there are common special features within structural groupings, small observed differences result from real structural differences, showing that generalized perturbation treatments (Miyazawa and Blout, 1961; Krimm, 1962) cannot provide a correct description of the actual situation. In the case of the extended chain structures, the smaller splitting for (Gly)ₙI (1685 − 1636 = 49 cm⁻¹) than for β-(Ala)ₙ (62 cm⁻¹) and β-(GluH)ₙ (69 cm⁻¹) results from different TDC interactions in the APRS as compared to the APPS structures. This also accounts for the higher frequency of the strong Raman band in (Gly)ₙI at 1674 cm⁻¹, which in APPS structures is generally found close to 1669 cm⁻¹ (Frushour et al., 1976).
The slightly lower value in β-(GluH)ₙ (1665 cm⁻¹) as compared to β-(Ala)ₙ (1669 cm⁻¹) is probably related to the slightly stronger hydrogen bond in the former (Sengupta et al., 1984), as is the lower value of the strong IR band (1624 versus 1632 cm⁻¹). (No allowance was made for this difference in hydrogen-bond strength in the calculated frequencies.) For the helical structures, it is interesting that, despite the significantly different conformations, the strong Raman band is found near 1652 cm⁻¹ for all structures. Small frequency differences may again reflect real differences in hydrogen-bond strengths: The $\nu_A^0$ value for α-(GluH)ₙ is between those of α-(Ala)ₙ and 3₁₀-(Aib)ₙ (Sengupta and Krimm, 1985), and the mean values of Raman and IR modes are also in this range.
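The amide I splittings quoted above for the extended chain structures can be reproduced from the observed band positions. This is a minimal sketch: the frequency pairs are the highest-frequency observed amide I band and the strong IR band as read here from the text and from Table XXXV, and the assignment of the 1693/1624 cm⁻¹ pair to the structure the text calls β-(GluH)ₙ is an assumption of this example. The dictionary and function names are illustrative only.

```python
# Observed amide I bands in cm^-1: (highest-frequency band, strong IR band).
# ASSUMPTION: pairs are as read from the text and Table XXXV; the labels are
# plain-text stand-ins for (Gly)nI, beta-(Ala)n and beta-(GluH)n.
OBSERVED_AMIDE_I = {
    "(Gly)nI": (1685, 1636),
    "beta-(Ala)n": (1694, 1632),
    "beta-(GluH)n": (1693, 1624),
}


def splitting(high: int, low: int) -> int:
    """Amide I splitting in cm^-1 between the two observed components."""
    return high - low


SPLITTINGS = {name: splitting(*pair) for name, pair in OBSERVED_AMIDE_I.items()}

if __name__ == "__main__":
    for name, value in SPLITTINGS.items():
        print(f"{name}: {value} cm^-1")
```

The computed differences match the 49, 62, and 69 cm⁻¹ values given in the text, with the APRS structure showing the smallest splitting.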
TABLE XXXV Observedᵃ and Calculated Frequenciesᵇ of Backbone Modes of Polypeptides
(Gly)ₙI Mode Amide I
Obs.
Calc.
1685
1689 1677 1643 1515 1514
1674
Amide II
1636 1517 (1515)
Amide III
1410
(1408) (1295) (1220) (1214) Skeletal Amide V
1162 884
708
1415 1415 1286 1213 1212 1153 890 718
β-(Ala)ₙ Obs.
α-(Ala)ₙ
α-(GluH)ₙ
3₁₀-(Aib)ₙ
Calc.
Obs.
Calc.
Obs.
Calc.
Obs.
Calc.
Obs.
Calc.
(1694) 1695 1669 1670 1632 1630 (1555) 1562 (1538) 1539 1524 1528 (1399) 1402 (1402) 1399 (1333) 1332 1243 1236 1224 1231
(1693) 1665 1624 (1597) (1568) 1560 1260 (1225) 1223
1692 1668 1630 1607 1576 1550 1249 1221 1222
1654
1654 1645
1658
1652
1657 1655
1656
1655
1657 1655
1653
1640
1647
1665 1661
(1560) 1550
1565 1551
1545 (1516)
1538 1519
1550 (1510)
1537 1517
(1531)
1380 (1333) 1283
1382 1344 1290
(1338) (1278)
1345 1287 1278 1262
1326 1299 (1283) 1287
913 706 704
956 (705)
944 713
884
889 738 669
908
706 698
a
(Gly)ₙII
Obs.
909
Calc.
β-(GluCa)ₙ
740 673
1270 1265
658 618
910 660 608
134@ 1296
924
(670) (618)
922 678 626
1545
(1339) (1313)
1480
908
694 (680)
ᵃ Raman bands, italic; IR and Raman bands, bold; IR bands, regular type. Intensities weaker than medium are in parentheses. ᵇ In cm⁻¹. Obs., observed; calc., calculated. ᶜ Overlapped mode.
1547 1533 1346 1312 1287
905 701 676
3. Amide II Mode

The amide II mode is predominantly NH ib plus CN s, but it always has a significant contribution from CαC s and smaller contributions from CO ib and NCα s (see Table XXXVI). It is always strong in the IR spectrum and weak or absent in the Raman. On N-deuteration, it disappears from the IR spectrum, with ND ib contributing to and mixing with other modes in the region 1070–900 cm⁻¹ (and appearing in the spectra usually in the region 1040–940 cm⁻¹), and CN s moving to the region 1490–1460 cm⁻¹, where it mixes with CαC s and CO ib to give modes that are usually observed in the region 1480–1465 cm⁻¹. As we have noted before, amide II is perturbed by TDC, and a collection of observed and calculated frequencies is given in Table XXXV. For the extended chain structures, the significant frequency differences seem to result from the influence of the structure on the form of the normal vibration. Thus, for the strong IR bands at 1517, 1524, and 1560 cm⁻¹ in (Gly)ₙI, β-(Ala)ₙ, and β-(GluCa)ₙ, respectively, there is a definite trend of increasing frequency with increasing relative contribution from NH ib. This contribution of course depends on the relative force constants in the cases of (Gly)ₙI and β-(Ala)ₙ, and the relative structures, particularly in the influence of the side chains, in the cases of β-(Ala)ₙ and β-(GluCa)ₙ. The fact that the observed frequencies are well reproduced suggests that these factors are properly accounted for. For the helical structures, it is interesting that the strong IR band is found consistently at 1550–1545 cm⁻¹, despite the large differences in structures. Again, there seems to be a correlation, with the relative contribution of NH ib in the bands of α-(Ala)ₙ, 3₁₀-(Aib)ₙ, and (Gly)ₙII at 1545, 1545, and 1550 cm⁻¹, respectively. [The higher frequency for α-(GluH)ₙ compared to α-(Ala)ₙ, even though the PEDs are the same, is due to the slightly stronger hydrogen bond in the former α-helix structure (Sengupta and Krimm, 1985).]
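The trend described for the extended chain structures can be examined with a simple least-squares line through the three strong IR bands. This is a rough sketch under a stated assumption: the NH ib contributions of 35, 41, and 58 are paired with the 1517, 1524, and 1560 cm⁻¹ bands on the basis of the increasing order described in the text and the first three columns of Table XXXVI, whose column headings are not legible here. The helical structures are left out because their column assignment is less certain, and the names `POINTS` and `linear_fit` are illustrative.

```python
# (NH ib contribution in the PED, observed strong IR amide II band in cm^-1).
# ASSUMPTION: this pairing follows the increasing order described in the text
# for (Gly)nI, beta-(Ala)n and beta-(GluCa)n; it is not a labeled table entry.
POINTS = [(35.0, 1517.0), (41.0, 1524.0), (58.0, 1560.0)]


def linear_fit(points):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


SLOPE, INTERCEPT = linear_fit(POINTS)

if __name__ == "__main__":
    # The intercept is the frequency extrapolated to zero NH ib contribution.
    print(f"slope = {SLOPE:.2f} cm^-1 per unit PED, intercept = {INTERCEPT:.0f} cm^-1")
```

With only three points the fit is indicative at best; the quantity of interest is the intercept, that is, the frequency the line would predict for a mode with no NH ib character.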
If the frequencies of the helical structures are plotted together with those of the extended chain structures as a function of the NH ib contribution, there is an essentially linear relation between the two quantities. Interestingly, this relation extrapolates to near 1460 cm⁻¹ when the NH ib contribution is zero, roughly where the CN s mode is found for an N-deuterated system!

4. Amide III Mode
Although the so-called amide III mode has been described as the localized counterpart to amide II (Fraser and Price, 1952), and, as we have seen, contains NH ib and CN s in NMA, the situation is in fact much more complex for the polypeptide chain. The main point is that
TABLE XXXVI Potential Energy Distributionsᵃ for Observedᵇ Strong Amide II Modes
NH ib CN s CαC s CO ib
35 28 17 14
NCα s
ᵃ Contributions ≥ 5. ᵇ In cm⁻¹.
41 26 13 14
58 18
8
55 21 11 8 6
46 33 10 11 6
46 33 10 11 6
47 29 11 11 6
NH ib is a significant component of a number of modes in the 1400- to 1200-cm⁻¹ region, mixing differently in different parts of this region and as a function of the side-chain structure. As has been noted (Hsu et al., 1976), it is therefore not possible to expect a simple general relationship between such a frequency and the backbone conformation (Lord, 1977). In NMA, the main contributions ≥10% to the PED of this mode, in addition to NH ib, come from CN s, CαC s, and CO ib (Rey-Lafon et al., 1973). Additional contributions are also made by NCα s and CO s, and the dipole moment derivative, and therefore the intensity of this mode and the orientation of ∂μ/∂Q, depends strongly on the details of the force field (Cheam and Krimm, 1985). However, in a polypeptide chain other coordinates can make major contributions, and these are influenced by the side chain. In addition, if the main criterion for an amide III mode is a significant contribution from NH ib (since CN s is certainly not a common feature), and therefore a sensitivity to N-deuteration, we must broaden our outlook to include all modes in the 1400- to 1200-cm⁻¹ region. Such observed bands, and their calculated counterparts, are given in Table XXXV, and PEDs for nonweak observed bands are given in Table XXXVII. It is clear from these results that the nature of the amide III mode depends very much on whether or not the side-chain structure involves a Cα–Hα group. For the Gly polypeptides, CH₂ modes mix extensively with NH ib. For 3₁₀-(Aib)ₙ, there is some mixing with side-chain modes. In the cases of side chains with Cα–Hα groups, the predominant contribution is from Hα b2, in which Hα moves essentially perpendicular to the HαCαCβ plane (Hα b1 being the in-plane mode). It is clear that this is the most important coupling to NH ib, as can be seen from the observed bands of β-(Ala)ₙ and α-(Ala)ₙ. It is interesting that Hα b1 makes large contributions in the case of the α helix and none in the APPS structure.
This is probably due to the fact that in the extended chain structure the HαCαCβ plane is more nearly parallel to the NH bond, making the NH ib and Hα b2 coordinates nearly parallel and therefore strongly coupled through the CN bond, whereas in the α helix the angle deviates more from parallelism, and maximum coupling requires contributions from Hα b1 as well as Hα b2. This would suggest that there may be a sensitivity of NH ib modes to the backbone φ angle, rather than to ψ (Lord, 1977), and in fact the lowest frequency NH ib mode correlates somewhat better with φ than ψ. However, such relations must presently be viewed with caution, since the lowest frequency NH ib mode is not necessarily the one containing the largest contribution from this coordinate. Certainly, associating characteristic frequency ranges
TABLE XXXVII Potential Energy Distributionsᵃ for Observedᵇ Nonweak Amide III Modes
Coordinate NH ib CN s CH₂ w CH₂ b CH₂ tw NCα s CαC s Hα b2 Hα b1 CβH₂ tw CγH₂ tw
1410
1162
1243
1224
1260
1223
1380
1283
1270
1265
1296
1280
14
12
19 13
18 11
11 15
11
13
28 6
40 9
15
10 6
41 31
5
13 12 23
8 8 24
20 24
58
29 7
50 13
19 34
24 28
17 23
10 7 22
6 14
16 15
15
14 8 15 11
28
23 13 14
CCβ w
CH₃ r CO ib CαCβ s CαCN d CβH₂ r
6
7 9
6 8
ᵃ Contributions ≥ 5.
ᵇ In cm⁻¹.
9
9
TABLE XXXVIII Potential Energy Distributionsᵃ for Observedᵇ Skeletal Stretching Modes
CαC s CN s CO s CNCα d CαCN d NCαC d CH₂ r CH₃ r CβCγ s CγCδ s CO ib
31 24 12 8
15 14 11 13 8 7
17 18 11
6 23 10 16
9 27 9 12
5
12
13 9 22 11 6
21 14
11
9 10
12
CαCβ s
18 13 11 9
8
11
7
ᵃ Contributions ≥ 5. ᵇ In cm⁻¹.
with conformation can be dangerous when we observe that bands of comparable intensities are found in comparable regions for β-(GluCa)ₙ (1260 cm⁻¹) and α-(Ala)ₙ (1265 cm⁻¹).
5. Skeletal Stretch Mode

In addition to the amide modes, all polypeptide chain conformations appear to have a characteristic skeletal stretching mode that is of relatively common origin and that gives rise to a strong Raman band, generally in the region 960–880 cm⁻¹. (The counterpart skeletal stretch mode, found near 1100 cm⁻¹ in NMA, does not show up as a characteristic band in polypeptides; rather, its NCα s contribution is distributed broadly in the region 1180–920 cm⁻¹, depending on the side-chain composition.) It is important to determine whether this mode has any sensitivity to conformation. The observed and calculated frequencies of this mode are given in Table XXXV, and the PEDs for observed bands are given in Table XXXVIII. In (Gly)ₙI and (Gly)ₙII this skeletal stretch frequency is observed at 884 cm⁻¹, and the calculated modes at ~890 cm⁻¹ have quite different PEDs. Thus, despite different structures and different eigenvectors, the frequencies are the same, and apparently insensitive to conformation. The same seems to be true of (Ala)ₙ: for β-(Ala)ₙ, the observed band is at 909 cm⁻¹ (calculated at 913 cm⁻¹), and for α-(Ala)ₙ the observed band is at 908 cm⁻¹ (calculated at 910 cm⁻¹). Even 3₁₀-(Aib)ₙ has an observed
band at 908 cm⁻¹ (calculated at 905 cm⁻¹). The only coordinate in common is CN s, and it seems as if the frequency of this mode is determined only by the side-chain composition, being 884 cm⁻¹ for Gly and 908 cm⁻¹ for Ala (or its related Aib), and is independent of main-chain conformation. This is both confirmed and modified by the results on (Glu)ₙ. For α-(Glu)ₙ this mode is found at 924 cm⁻¹ (calculated at 922 cm⁻¹), an increase from α-(Ala)ₙ that seems to indicate a dependence on the number of carbon atoms in the side chain. [This trend is continued in α-(Lys)ₙ, where the comparable band is observed at 945 cm⁻¹ (Chen and Lord, 1974; Yu et al., 1973).] However, the frequency for β-(Glu)ₙ is not the same as for α-(Glu)ₙ [as was true of α-(Ala)ₙ], being observed at 956 cm⁻¹ (calculated at 944 cm⁻¹). [The comparable frequency of β-(Lys)ₙ is 1002 cm⁻¹ (Yu et al., 1973; Frushour and Koenig, 1975b).] These results would seem to indicate that for side chains longer than CH₃ the frequency of this skeletal stretch mode may depend on main-chain conformation. Calculations on (Glu)ₙ (Sengupta and Krimm, 1987) bear this out. Sengupta and Krimm (1987) calculated the normal modes of (Glu)ₙ in conformations varying from the extended APPS structure (a 2₁ helix) through 2.4₁- and 3₁-helix structures to the α-helix conformation (a 3.6₁ helix). They used force fields for various of these calculations that were based on those for β-(Ala)ₙ, (Gly)ₙII, and α-(Ala)ₙ. Although the frequency values depend on force field, it was found that the frequency of this skeletal stretch mode varied essentially linearly with the backbone φ angle. It would appear that, for side chains longer than CH₃, there is mixing of backbone and side-chain stretching motions that depends on the "extension" of the backbone. It remains to be seen whether a linear relation with φ is valid for all side chains, but it seems clear that the frequency of this mode can be an indicator of backbone conformation.
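The observed skeletal stretch frequencies quoted in this section can be collected to show how the difference between extended and helical forms grows with side-chain length. This is a minimal sketch using only the observed values given in the text; the grouping of (Gly)ₙI with the extended forms and (Gly)ₙII with the helical forms, the 5 cm⁻¹ tolerance, and all names are choices made for this illustration, not part of the original analysis.

```python
# Observed skeletal stretch frequencies (cm^-1) quoted in the text.
# "extended": (Gly)nI and the beta forms; "helix": (Gly)nII and the alpha forms.
SKELETAL_STRETCH = {
    "Gly": {"extended": 884, "helix": 884},
    "Ala": {"extended": 909, "helix": 908},
    "Glu": {"extended": 956, "helix": 924},
    "Lys": {"extended": 1002, "helix": 945},
}


def conformation_shift(residue: str) -> int:
    """Extended-minus-helix difference in cm^-1 for one residue type."""
    bands = SKELETAL_STRETCH[residue]
    return bands["extended"] - bands["helix"]


def is_conformation_sensitive(residue: str, tolerance: int = 5) -> bool:
    """True if the difference exceeds a tolerance (hypothetical threshold)."""
    return abs(conformation_shift(residue)) > tolerance


if __name__ == "__main__":
    for residue in SKELETAL_STRETCH:
        print(residue, conformation_shift(residue), is_conformation_sensitive(residue))
```

On these numbers Gly and Ala show essentially no conformational shift, while Glu and Lys show shifts of tens of cm⁻¹, consistent with the conclusion drawn for side chains longer than CH₃.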
6. Amide V Mode

The amide V mode in NMA consists of CN t plus NH ob, although CO ob can make a small contribution (Rey-Lafon et al., 1973). In the polypeptide chain, CN t and NH ob are also the main components but other coordinates contribute significantly. Thus, the frequency of this mode depends not only on the strength of the hydrogen bond (Miyazawa, 1962), but also on the side-chain structure. We present in Table XXXIX the PEDs together with their N···O and H···O bond lengths, for observed amide V modes of structures for which the normal-mode calculations have been done.
TABLE XXXIX Potential Energy Distributionsᵃ for Observedᵇ Amide V Modes
(Gly)ₙI
β-(Ala)ₙ
Coordinate
708
l(N···O)ᵈ l(H···O)ᵈ
2.91 2.12
CN t NH ob NH···O ib CO ob CO ib NCαC d CαCN d CαC s NCα s NH t NCα t Hα b1 H···O s CβCγCδ d CγCδ s CCβ r
75
44-749
16 19 5
41-28K 20
706
698