Non-Native Prosody
Trends in Linguistics Studies and Monographs 186
Editors
Walter Bisang
Hans Henrich Hock (main editor for this volume)
Werner Winter
Mouton de Gruyter Berlin · New York
Non-Native Prosody Phonetic Description and Teaching Practice
edited by
Jürgen Trouvain Ulrike Gut
Mouton de Gruyter Berlin · New York
Mouton de Gruyter (formerly Mouton, The Hague) is a Division of Walter de Gruyter GmbH & Co. KG, Berlin.
Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.
Library of Congress Cataloging-in-Publication Data
Non-native prosody / edited by Jürgen Trouvain, Ulrike Gut.
p. cm. – (Trends in linguistics. Studies and monographs ; 186)
Includes bibliographical references and index.
ISBN 978-3-11-019524-8 (cloth : alk. paper)
1. Language and languages – Study and teaching. 2. Prosodic analysis (Linguistics) – Study and teaching. I. Trouvain, Jürgen. II. Gut, Ulrike.
P53.68.N66 2007
414'.6 – dc22
2007021894
ISBN 978-3-11-019524-8
ISSN 1861-4302

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the Internet at http://dnb.d-nb.de.

© Copyright 2007 by Walter de Gruyter GmbH & Co. KG, D-10785 Berlin
All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording or any information storage and retrieval system, without permission in writing from the publisher.
Cover design: Christopher Schneider, Berlin.
Printed in Germany.
Preface
The present volume brings together contributions by a group of researchers and teachers with a shared interest in the description and teaching of the prosody of a second language. The idea for this book was conceived at the International Workshop on “Non-native prosody: phonetic description and teaching practice”, held at the Saarland University in Saarbrücken on March 4th and 5th 2005. The two central objectives of the workshop were to stimulate research and teaching of second language prosody and to establish an interchange between researchers and language teachers. The last two decades have witnessed an increasing interest in prosody in general, yet most research on non-native speech is still restricted to segmental structures and largely disregards suprasegmental features like pitch and temporal structures; few publications as yet deal with L2 prosody. This neglect of prosody is also evident in the practical field of language teaching. Publications on foreign language instruction and also teaching materials rarely deal with prosody. Despite its pedagogically oriented historical foundation, current second language acquisition research is no longer directly concerned with pedagogic issues. As a result, communication between Second Language Acquisition (SLA) researchers and language teachers has become difficult or has ceased to exist altogether. The aim of this volume is to fill this gap and to provide a forum for exchange for both disciplines. The first part contains contributions by SLA researchers and experts in prosody. They present descriptions of non-native prosodic structures in the areas of intonation, stress, speech rhythm and vowel reduction as well as methodological considerations on research in SLA in a format accessible to teachers. This includes overviews of current theoretical models as well as findings from empirical investigations. In the second part, some of the leading teaching practitioners and developers of phonological learning materials present a variety of methods and exercises in the area of prosody. This volume is a product of scientific and practical interchange and provides a platform and incentive for further collaboration. On the one hand, research on non-native prosody can help teachers to interpret and make sense of their classroom experiences and to provide them with a broad range of pedagogic options. On the other hand, researchers
may be encouraged to investigate aspects of non-native prosody that have been shown to be of primary importance in language classrooms. We hope that this volume will contribute usefully to this dialogue and that it shows some new trends in theoretical as well as applied linguistics. On this occasion we would like to thank the Ministry of Education, Culture and Research of the Saarland for their financial support. Without this help the workshop would not have taken place. We would also like to thank our anonymous reviewer for all the valuable comments.

Saarbrücken and Freiburg, December 2006
Jürgen Trouvain and Ulrike Gut
Contents
Preface  v

Introduction

Bridging research on phonetic descriptions with knowledge from teaching practice – The case of prosody in non-native speech
Ulrike Gut, Jürgen Trouvain and William J. Barry  3

Part 1: Phonetic descriptions

An introduction to intonation – functions and models
Martine Grice and Stefan Baumann  25

Phonological and phonetic influences in non-native intonation
Ineke Mennen  53

Different manifestations and perceptions of foreign accent in intonation
Matthias Jilka  77

Rhythm as an L2 problem: How prosodic is it?
William Barry  97

Temporal patterns in Norwegian as L2
Wim van Dommelen  121

Learner corpora in second language prosody research and teaching
Ulrike Gut  145

Part 2: Teaching practice

Teaching prosody in German as a foreign language
Ulla Hirschfeld and Jürgen Trouvain  171

Metacompetence-based approach to the teaching of L2 prosody: practical implications
Magdalena Wrembel  189

Individual pronunciation coaching and prosody
Grit Mehlhorn  211

Prosodic training of Italian learners of German: the contrastive prosody method
Federica Missaglia  237

Language index  259
Index of L1–L2 combinations  260
Subject index  261
Introduction
Bridging research on phonetic descriptions with knowledge from teaching practice – The case of prosody in non-native speech

Ulrike Gut, Jürgen Trouvain and William J. Barry

1. Introduction
The phenomenon of “non-native prosody” is of interest for a variety of groups and has been seen from different perspectives and used for different purposes. These groups include foreign language teachers, teachers of these teachers, authors of learning materials, researchers, and engineers facing the problem of non-native input for automatic speech recognizers. Broadly speaking, we can divide the professional groups concerned with non-native prosody into two categories: linguists who carry out research on language data, and teachers who give language classes. Both groups have in common that they deal with real data and not simply hypothetical concepts of non-native prosody. As a simplification, one could claim that the former group considers non-native prosody in theory, and the latter group is concerned with non-native prosody in practice. The aim of this article is to show the interests and methods of both groups, to ask for common and/or distinct interests, to uncover parallels but also differences, to describe the exchange between the two groups and to show the limitations and the benefits of a “bi-lateral” exchange of insights and knowledge. In section 2, the interests and methods of the theoretical and the practical groups are presented and the current state of the exchange between these two approaches to non-native prosody is described. Section 3 illustrates the potential for exchange with examples from the area of stress, articulation rate, speech rhythm and intonation. In the last section, we will point out requirements and solutions for the mutual benefit of both groups.
2. Theoretical and practical approaches to non-native prosody
The aim of theoretical research in the area of second language (L2) prosody, as in linguistics as a whole, is to develop descriptions in the form of models and theories with predictive power. Those models and theories are based on and tested by empirical research, that is on observations and measurements of non-native speech, and are modified according to these observations. A rich choice of research methods exists which vary along the lines of the type of language data that is analysed (experimental data or spontaneous data) and the analysis method (e.g. qualitative versus quantitative; auditory or instrumental). Typically, speech elicited from non-native speakers in closely controlled conditions is analysed instrumentally (see Barry, this volume, Gut, this volume, Jilka, this volume, Mennen, this volume, van Dommelen, this volume). Based on these data, generalizations are made and formulated in models and theories of non-native prosody. Fundamental research of this type can have two main foci: a synchronic or a developmental focus. In the former, non-native prosody at one stage is described, whereas in the latter the aim is to find common developmental paths or stages in the acquisition process of language learners. Findings by theoretical researchers are disseminated in publications and conference presentations on both the national and international level, whereby “international” is often restricted to English. The aim of language teachers is to enable language learners to produce and perceive the prosody of the target language to an adequate extent, depending on the learner’s needs. This might range from minimal communicative abilities to a near-native language competence. Teachers have a wide range of methods available, including imparting theoretical knowledge, raising awareness for language structures, practical production exercises and perceptual training. Again depending on the learner’s expectations and requirements, teachers pick a combination of these methods. Typically, language teachers learned these methods in their teacher-training courses and modify and extend their repertoire with increasing teaching experience. Occasionally, teachers are encouraged to participate in further training programmes. The two groups have different expectations and conceptions about “the other side”. Some researchers are interested in seeing their findings applied in language teaching and describe implications for teaching. They envisage the application of theoretical findings in second language research to language teaching as a top-down process, with a direct link between research-
derived theory and classroom practice. Language teachers, conversely, wish to be provided with relevant teaching materials and methodologies. Both sides express dissatisfaction with each other, as reported by several authors (van Els and de Bot 1987: 153, Ellis 1997). Often, the findings of empirical research are not clear and uncontested enough to provide a straightforward guideline for teachers. Moreover, the results of empirical research are rarely disseminated or presented in a way that is meaningful and immediately accessible to language teachers. In addition, the interests of researchers do not necessarily focus on areas that are considered most conspicuous and important by teachers. Lastly, the question remains whether there is a “best method” to teach L2 prosody. Due to the constantly varying nature of the classroom, teachers, based on their experience and knowledge, apply pedagogical methods flexibly, depending on the changing dynamics of the learner community and classroom context.

The relationship between the two groups concerned with non-native prosody is, and always has been, difficult. Researchers rarely visit language classes, and teachers rarely attend scientific conferences. An exchange between the two poles “theoretical research” and “language class” is highly desirable, but there are no institutionalised platforms where the various professional groups concerned with L2 prosody can meet. At least one intermediate group of professionals can be identified: the writers of language textbooks and developers of teaching materials. Ideally, they form a bridge between theoretical research and language teaching by selecting findings, (re-)formulating them in a way that makes them accessible to both language teachers and language learners, and developing appropriate learning materials. This means that they have to be able both to interpret and assess the relevance of the theoretical research and to be aware of the requirements of language teachers. Moreover, they need to be able to transform theoretical findings into suitable exercises and come up with interesting examples. Unfortunately, very few people with these qualifications exist. In the commercial sector, language material is developed under time and financial pressure, so that, in reality, a thorough sifting of the numerous publications and conference proceedings in the area of non-native prosody is not possible.

However, even if there were sufficient professionals qualified to bridge the gap between theory and practice, in many cases they would fail because of the lack of overlap in interests between the two groups. Whereas language teachers are concerned with the acquisition of non-native prosody, researchers focus mainly on the description of individual stages. In most areas of L2 prosody research, a myriad of competing theories and models dealing with fine-grained details exists, predicting very different acquisition processes and attributing different degrees of importance to particular pedagogical strategies and learner characteristics. It is the purpose of this article to describe this gap using the problem areas of non-native stress, articulation rate, speech rhythm and intonation as examples. Furthermore, the present volume as a whole constitutes a step towards bridging the gap between theory and practice in L2 prosody and towards describing ways of achieving a mutual interchange beneficial to both sides.
3. Theoretical-practical exchange in L2 prosody
In the following sections, we will trace the gap between theoretical researchers and language teachers with the examples of non-native stress, articulation rate, speech rhythm and intonation and show where improvement in the exchange and mutual benefits are possible.

3.1. Stress

“Stress” in theory

Stress and accent, which give prominence to a syllable in a word or a word in a phrase, have been identified by many theoreticians as well as practitioners as important prosodic concepts (e.g. Fox 2001; see also Mehlhorn, this volume, Missaglia, this volume, Hirschfeld and Trouvain, this volume). However, the terms “stress” and “accent” are used in contradictory ways among researchers (cf. Grice and Baumann, this volume). Sometimes, “stress” is defined as an abstract category, the prominence of a word represented in the speaker’s mental lexicon, and “accent” as its observable, phonetic realization in actual speech (e.g. Jassem and Gibbon 1980). Others use the terms with exactly the opposite meaning (e.g. Laver 1994). We use the term “stress” here in the first sense, i.e. stress as a potential accent, and we reserve “accent” for the realized “stress” (resulting in perceived prominence) when a word is produced in an utterance. Moreover, theoretical research in the areas of stress and accent is not only characterized by terminological debates but has also generated controversies on the subjects of the appropriate mode of their description, their phonetic correlates as well as their phonological role in specific languages.
There are languages that are said not to have word stress as an abstract phonological category at all, for example Japanese (Beckman 1986). Other languages have been divided into those that have obligatory word stress and those without. Word stress can be relatively unpredictable or fixed. In the case of fixed stress, all words of a language have stress on a particular position, e.g. the last syllable (for example Turkish) or the penultimate syllable (for example Welsh). In languages with low predictability in their word stress (for example German and English), a set of phonological rules is usually needed to describe the stress patterns of words. Yet, little consensus has been reached on the appropriate description of word stress rules in these languages, and the competing proposals are typically based on abstract theoretical models that are not accessible to the uninitiated reader (e.g. Hayes 1984, Wiese 1996, Gamon 1996, Pater 2000). In addition, the term stress has been applied to two domains of phonological description: word-stress, which is a phonological property of the word, and sentence-stress, where stress is seen as a differentiating property of the utterance. In the second domain, a distinction between stress and intonation is difficult to uphold (e.g. Kingdon 1939) as the relationship between accents and pitch is very intricate. In intonation languages such as English and German, pitch is anchored to accents (see also section 3.4). Other languages differ with respect to whether “pitch” or “stress” is assumed to have precedence. In Swedish, for example, lexically stressed syllables have additional tonal information (van der Hulst and Smith 1988), whereas in Japanese, the presence of tone alone is assumed to determine the position of the prominent syllables (Abe 1998). The above-mentioned differences in terminology used to capture the prosodic differences between languages stem in part from the fact that the phonetic realization of accents can be different in different languages. In languages with “dynamic accent” such as English or German, the phonetic parameters pitch, length, loudness and articulatory precision are combined with different relative importance for the phonetic realization of stress (cf. Cruttenden 1997). In both English and German, the difference between stressed and unstressed syllables is correlated with differences in duration together with a different vowel quality, differences in pitch height and loudness. In “pitch-accent” languages (i.e. languages in which lexical words can have a distinctive tonal form) such as Swedish or Norwegian, phonetically different types of tones or pitch patterns are used to prosodically differentiate words (Gårding 1998).
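The contrast between fixed and unpredictable word stress can be made concrete with a small sketch. The fragment below is purely illustrative and is not drawn from any of the studies cited here: the rules for Turkish (word-final stress) and Welsh (penultimate stress) are the simplifications used in the text, real systems have exceptions that such rules do not capture, and the mini-lexicon and syllabifications are invented for the example.

```python
# Illustrative only: fixed-stress rules vs. a lexicon look-up for free-stress languages.
# Words are given pre-syllabified; stress is returned as a 1-based syllable index.

def fixed_stress(syllables, position):
    """Assign stress at a fixed position ('final' or 'penultimate')."""
    if position == "final":
        return len(syllables)
    if position == "penultimate":
        return max(1, len(syllables) - 1)
    raise ValueError(position)

# Hypothetical mini-lexicon for a free-stress language (German): where stress is not
# predictable, it has to be listed or derived from a set of phonological rules.
GERMAN_LEXICON = {
    ("Ba", "na", "ne"): 2,   # baNAne 'banana'
    ("Te", "le", "fon"): 1,  # TElefon, cf. the 'faux ami' Italian teLEfono
}

def word_stress(language, syllables):
    if language == "Turkish":   # simplified rule from the text: final syllable
        return fixed_stress(syllables, "final")
    if language == "Welsh":     # simplified rule from the text: penultimate syllable
        return fixed_stress(syllables, "penultimate")
    if language == "German":    # low predictability: look the word up
        return GERMAN_LEXICON[tuple(syllables)]
    raise NotImplementedError(language)

print(word_stress("Turkish", ["ki", "tap", "lar"]))   # -> 3
print(word_stress("Welsh", ["syl", "la", "bles"]))    # -> 2 (placeholder word)
print(word_stress("German", ["Te", "le", "fon"]))     # -> 1
```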
“Stress” in practice Numerous publications have shown that non-native speakers do not always produce stress on words and in sentences in a native-like manner (e.g. Backman 1979, Juffs 1990, Grosser 1997). Some authors even report “stress deafness” (Dupoux, Pallier, Sebastian and Mehler 1997): Speakers of French, a language without stress differentiation at the word level are deaf to lexical stress that Spanish speakers perceive. This “stress deafness” could affect the learning of stress-related phenomena in foreign languages. Moreover, no matter whether a researcher studies speech signals of nonnative speakers or a teacher is confronted with the oral performance of language learners in the classroom, the evidence is the same: non-native speakers of some languages have more difficulties with stress and accentuation than non-natives of other languages. This is the case whether or not the L1 and L2 involved both have word stress. Learners of English, whose native language has different word stress rules, for example, show different strategies in producing word stress and sentence stress patterns in their L2 (Archibald 1995). What is more, incorrect stress patterns often persist despite long exposure to correct forms. Thus it would appear that “stress deafness” is not merely the result of stress typology differences (as between French and Spanish). This dependency of the teaching of stress rules on the native and target languages involved requires a variety of didactic approaches. Target languages without word stress or with fixed word stress require different teaching methods than languages with unpredictable stress. When the stress systems of native and target language coincide, stress does not need to be taught at all, though attention to particular “faux amis“ must not be neglected (e.g. Spanish and Italian “teLEfono“ versus English and German “TElephon(e)“). In all other cases, current teaching methods typically focus on the creation of language awareness (see Mehlhorn, this volume and Wrembel, this volume). This is achieved by a combination of perceptual and articulatory training and knowledge input (see also Hirschfeld and Trouvain, this volume). Language awareness is also assumed to enhance the acquisition of further foreign languages. For example, it has been proposed that a native speaker of Polish who has learned in English as a first foreign language that the penultimate stress pattern of his or her native language cannot be transferred to the L2 has created phonological awareness of the importance of word stress and will increase his or her sensitivity for word stress rules in further foreign languages. Thus in the acquisition of a further lan-
Bridging research on phonetic descriptions
9
guage he or she will profit from general phonological awareness developed in the acquisition of another language, even though the two languages (of course) have different phonological systems. Naturally, the creation of language awareness presupposes a reliable phonological description of the stress rules of a particular language. Besides the phonological awareness for predicting the position of word stress a phonetic awareness is needed for the realisation of stressed syllables in contrast to unstressed syllables. It may be that the target language and mother tongue differ in how stress is realised with a mix of duration, pitch, intensity and articulatory precision. 3.2. Articulation rate "Articulation rate" in theory Listeners perceive their native language/s and those they speak with a high level of proficiency as less fast than those languages they have a poor command of or do not know at all. Abercrombie (1967: 96) puts it as follows: “Everyone who starts learning a foreign language, incidentally, has the impression that its native speakers use an exceptionally rapid tempo.” Though languages may differ in terms of rate of speech production – depending of course on speech mode and what unit is selected for measurement (including or excluding pauses, spontaneous speech or reading passage style) – there certainly appear to be differences in the way speech rate is perceived across languages. Some authors explain the false impression that an unknown language sounds faster than normal (i.e. than one’s own language) with phonological differences such as different patterns of syllabic complexity (Osser and Peng 1964). Articulation rate plays a significant role for learners of a foreign language, not only in speech comprehension but also in speech production. It is usually taken as a correlate of a speaker’s general language proficiency or fluency and is conceptualized to correlate with the fluidity, continuity, automaticity or smoothness of oral speech production. Rate of speech has been measured in many ways (cf. Trouvain 2004). This also applies in the context of language learning: Lennon (1990) measures speed rate both with words per minute unpruned and words per minute after pruning, where pruning refers to the exclusion of all repeated and self-corrected words as well as asides, i.e. comments on the narrative task itself. Towell (2002) measures the number of syllables per minute and Cucchiarini, Strik and
Boves (2000, 2002) measure the number of phonemes per time unit. In addition, the mean length of a “run” has been analysed, where a “run” is defined as a stretch of speech between pauses (e.g. Lennon 1990, Towell 2002, Cucchiarini, Strik and Boves 2000, 2002, Freed, Segalowitz and Dewey 2004), with some researchers including filled pauses in “runs” and others not. Since a run is defined by its delimitation by pauses of a certain length, it does not necessarily represent a semantic or syntactic unit in speech. A syntactically based chunking of speech is proposed by Lennon (1990) with the “t-unit”, which he defines as one main clause and all subordinate clauses. He measures the frequency and length of pauses within “t-units”, the percentage of “t-units” followed by a pause as well as the percentage and mean length of pauses at “t-unit” boundaries. The ratio between speech and the total recording time is referred to as the phonation/time ratio (Towell 2002, Cucchiarini, Strik and Boves 2000, 2002) and is measured by dividing the total duration of speech by the total duration of the recording. Finally, the amount of speech can be measured either in the total number of words produced (e.g. Freed, Segalowitz and Dewey 2004) or in the duration of speaking time per total recording time. This measurement can obviously only be used when the analysed recordings of the different speakers have a comparable length.

Experimental studies have shown that some of these quantitative measurements of articulation rate correlate with native speakers’ judgements of fluency (Lennon 1990, Cucchiarini, Strik and Boves 2000, 2002). However, it was further found that articulation rate is not constant in natural speech (Miller, Grosjean and Lomanto 1984). Even in reading passages, the articulation rate may be adjusted (by competent readers) by giving more time to sections with greater communicative weight and less time to those that are less important to the “message”. Rate variation is therefore an important concept. Hand in hand with this, of course, go all the other segmental and prosodic modifications that are associated with local temporal changes (often referred to as “local speech rate”) resulting from information weighting – from lexical stress position to function-word or particle destressing and topic and focal accenting (see e.g. Eefting 1991).

“Articulation rate” in practice

In language teaching and testing, articulation rate, or speed of delivery in an L2, is taken as an important diagnostic feature. Articulation rate, which also reflects the level of fluency of a non-native speaker, is highly correlated with the level of proficiency evaluated by native listeners (Gut 2003). When grading oral examinations, teachers are often asked to score candidates for fluency, and even in standardized testing procedures such as exams taken by the Deutscher Akademischer Austauschdienst or the British Council, candidates have to be allocated to bands with descriptions such as “fluent, virtually error free” or “not fully fluent with occasional inappropriate use of structures”.

Despite this central importance of a native-like speech rate in an L2, very few didactic methods for its acquisition seem to have been developed for language teaching. A common conviction seems to be that an increase in articulation rate merely constitutes a quasi-automatically acquired feature of the language learner’s generally improving linguistic competence. Missaglia (this volume) describes some exercises that raise the learners’ awareness of stylistic variants and the concomitant segmental and prosodic feature changes that are associated with speech rate changes. Yet, so far, there are almost no attempts to include speech-rate variation in the teaching strategy. This holds for varying the global rate of the same utterances in audio material as well as for more varied local rate changes in different text types. On the comprehension side, didactic methods focussing on articulation rate could include examples of different, situationally defined stylistic variants of key expressions (cf. for German “Phonetik Simsalabim” by Hirschfeld and Reinke 1998).
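The measures reviewed in this section are straightforward to compute once a recording has been segmented into speech and pauses. The sketch below is a minimal illustration, not a reimplementation of any of the cited studies: the input format (pause-annotated intervals with word and syllable counts) is an assumption, and decisions such as the minimum pause duration or whether filled pauses count as pauses are left open, as the text notes.

```python
# Minimal sketch of common rate and fluency measures, assuming the recording has
# already been segmented into speech runs and pauses (e.g. from a Praat TextGrid).
from dataclasses import dataclass

@dataclass
class Interval:
    start: float          # seconds
    end: float
    is_pause: bool
    n_words: int = 0      # 0 for pauses
    n_syllables: int = 0

    @property
    def duration(self):
        return self.end - self.start

def fluency_measures(intervals):
    total_time = intervals[-1].end - intervals[0].start
    speech = [i for i in intervals if not i.is_pause]
    speech_time = sum(i.duration for i in speech)
    syllables = sum(i.n_syllables for i in speech)
    words = sum(i.n_words for i in speech)
    runs = [i.n_syllables for i in speech]   # one "run" = stretch of speech between pauses
    return {
        "speech_rate_syll_per_min": 60 * syllables / total_time,    # pauses included
        "articulation_rate_syll_per_sec": syllables / speech_time,  # pauses excluded
        "words_per_min": 60 * words / total_time,
        "phonation_time_ratio": speech_time / total_time,           # speech / total duration
        "mean_length_of_run_syll": sum(runs) / len(runs),
    }

# Toy example: speech (2.0 s, 9 syllables), pause (0.6 s), speech (1.4 s, 6 syllables)
example = [
    Interval(0.0, 2.0, False, n_words=6, n_syllables=9),
    Interval(2.0, 2.6, True),
    Interval(2.6, 4.0, False, n_words=4, n_syllables=6),
]
print(fluency_measures(example))
```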
3.3. Speech rhythm

“Speech rhythm” in theory

“Speech rhythm” is a concept that has been the subject of intensive discussion and empirical investigation over many decades. In early theoretical approaches it was described as a periodic and relatively isochronous recurrence of events such as syllables in the case of the so-called “syllable-timed” languages, and feet in the case of the so-called “stress-timed” languages (Pike 1945, Abercrombie 1967). In syllable-timed languages such as French, syllables were assumed to be similar in length. Stress-timed languages, to which English was counted, in contrast, were supposed to have isochronous, i.e. regularly recurring, stress beats. Since in those languages the number of syllables between two stress beats varies, the syllables are adjusted to fit into the stress interval – hence syllable length is reported to be very variable in stress-timed languages. No convincing acoustic basis for either isochrony of feet in stress-timed languages or equal length of syllables in syllable-timed languages has ever been found (e.g. Classé 1939, Uldall 1971, Fauré, Hirst and Chafcouloff 1980, Roach 1982, Dauer 1983).

More recent approaches to measuring speech rhythm are based on the assumption that speech rhythm is a multidimensional concept which includes various phonological properties of languages. Accordingly, languages are no longer classified into discrete rhythmic classes but are assumed to be located along a continuum, though the continuum is still generally described in terms of its “syllable-timed” and “stress-timed” extremes. Dauer (1983), for example, suggested that rhythmic differences between languages are the result of phonological, phonetic, lexical, and syntactic facts such as the variety of syllable structures, phonological vowel length distinctions, the absence or presence of vowel reduction, and lexical stress. Since syllables increase in length when segments are added and closed syllables are longer than open ones, speech rhythm measured in terms of syllable-duration differences reflects the distribution of syllable complexity. So languages without complex syllables tend to have more equal syllabic durations than those with strongly varying complexity. Equally, overall differences in “rhythm” between languages reflect whether a language has vowel reduction or not; those classified as stress-timed do, though it may or may not be coded as a phonological alternation as it is in English, Danish or Portuguese. Many languages classified as syllable-timed either do not have lexical stress or realize accent by variations in pitch contour. Conversely, stress-timed languages realize word-level stress by a combination of length, pitch, loudness and quality changes, which result in clearly discernible beats, at least in deliberate or stylized production.

On the basis of this approach several phonetic measurements of “speech rhythm” have been proposed. Ramus, Nespor and Mehler (1999) segment speech into vocalic and consonantal parts and calculate the proportion of the vocalic intervals of a sentence and the standard deviation of the vocalic and consonantal intervals. Other measurements focus on local relations. Grabe and Low (2002) measure the difference in duration between successive vocalic intervals and between successive consonantal intervals. Gibbon and Gut (2001) calculate the ratio of adjacent syllable and vowel durations. These studies have succeeded in describing differences between languages (Ramus, Nespor and Mehler 1999, Grabe and Low 2002, Gut and Milde 2002) as well as between varieties of one language (Low and Grabe 1995, Gut and Milde 2002). Critics of these parametrisations, however, point out that speech rhythm is located on a higher phonological level than segments and that it consists of a coupling between intervals at a lower prosodic level with those at a higher level (Cummins 2002). Dauer (1983) and Barry (this volume) even suggest dispensing with the concept of “speech rhythm” altogether and recognizing that it is used merely as a cover term for a range of structural properties of a language.
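The global and local rhythm measures just mentioned are easy to state as formulas once an utterance has been segmented into vocalic and consonantal intervals (the labour-intensive step in the cited studies). The sketch below is a bare-bones illustration in the spirit of those proposals, not the authors' own scripts: %V and the standard deviations of interval durations follow Ramus, Nespor and Mehler (1999), the normalised pairwise variability index (nPVI) over vocalic intervals follows Grabe and Low (2002), and the toy durations are invented.

```python
# Rhythm metrics from vocalic (V) and consonantal (C) interval durations (in seconds).
# Input format is an assumption: a list of (label, duration) pairs, label "V" or "C".
from statistics import mean, pstdev

def rhythm_metrics(intervals):
    v = [d for lab, d in intervals if lab == "V"]
    c = [d for lab, d in intervals if lab == "C"]
    percent_v = 100 * sum(v) / (sum(v) + sum(c))   # %V (Ramus et al. 1999)
    delta_v, delta_c = pstdev(v), pstdev(c)        # deltaV, deltaC (Ramus et al. 1999)
    # normalised PVI over successive vocalic intervals (Grabe and Low 2002)
    npvi_v = 100 * mean(
        abs(d1 - d2) / ((d1 + d2) / 2) for d1, d2 in zip(v, v[1:])
    )
    return {"%V": percent_v, "deltaV": delta_v, "deltaC": delta_c, "nPVI-V": npvi_v}

# Toy utterance: alternating C and V intervals with made-up durations
toy = [("C", 0.08), ("V", 0.12), ("C", 0.15), ("V", 0.05),
       ("C", 0.07), ("V", 0.18), ("C", 0.20), ("V", 0.06)]
print(rhythm_metrics(toy))
```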
“Speech rhythm” in practice

The concept of “rhythm” that a theoretically unburdened language teacher (or language learner) has is probably very different from the complex definition underlying the studies mentioned in the previous section. The traditional view of a syllable-timed or stress-timed distinction lies closer to the intuitively more plausible concept of rhythm as a regular beat. This brings together music and poetry, supporting the idea of utterances in different languages potentially differing in their inherent rhythm. However, even the most competent of teachers needs to understand the factors which underlie the differences between a “rhythmically correct” and an “incorrect” rendering of an utterance she/he is offering for practice. Typology statements reflect tendencies, but teaching requires concrete utterances which encapsulate the critical features that distinguish the L2 rhythmic type from the L1 type. Though these may be easy enough to find among the communicatively useful expressions that language course books introduce, the repetitive production that is essential in order to guarantee the sense of rhythmicality may be easier in some learner groups than in others. Finally, the acquisition of rhythmic sensitivity must extend to an awareness of “utterance rhythm” as the product of “word sequence” x “context”, by varying the context in which a particular expression is practised.

3.4. Intonation

“Intonation” in theory

The term “intonation” is used in theoretical research with different scopes. In a broad definition, the term covers both linguistic and paralinguistic features such as tempo, voice quality and loudness which signal the emotional state of the speaker (cf. Fox 2001). Less broad definitions include only linguistic phenomena produced with the prosodic features tone, stress and quantity and their physiological correlates fundamental frequency, intensity, duration and spectral characteristics. The narrowest definitions of intonation are restricted to only postlexical phonological phenomena, thus excluding word stress, tone and quantity (Ladd 1996, Hirst and di Cristo 1998).

Currently, two major competing models of intonational structure are in use for the description of intonation, based on a number of fundamentally different assumptions about intonational structure and using different conventions of intonational transcription (see also Grice and Baumann, this volume). The contour-based approaches, on the one hand, take pitch movements or contours as the basis of intonational analysis. Intonational analysis in this approach is mainly carried out auditorily. Intonation is represented in detailed interlinear transcriptions which depict the properties of each syllable in terms of accentedness, pitch height and pitch movement. The autosegmental-metrical approach, on the other hand, proposes that intonation consists of sequences of minimally two and maximally three different tone levels. These tones can be realized as pitch accents, usually aligned with accented syllables, or have a delimitative function as initial or final tones of intonational phrases. Intonational analysis in this approach relies on a combination of computer-assisted instrumental and auditory techniques.

Cross-linguistic descriptions of the intonational systems of languages are still few and far between (e.g. Delattre 1965, Fox 1981, Willems 1982, Grabe 1998, Hirst and di Cristo 1998, Jun 2005). For individual languages, tone inventories and the meaning of particular pitch movements or tone combinations have been proposed (e.g. Grice, Baumann and Benzmüller 2005 for German, and Pierrehumbert and Hirschberg 1990 for American English). In these descriptions, however, the authors stress that a specific tone or pitch contour does not have an abstract meaning but may rather be associated with a specific pragmatic meaning in given contexts. As yet, very few empirical studies exist that systematically investigate the intonation of non-native speech (but see Mennen, this volume, on pitch alignment and pitch range, and Jilka, this volume, on tone inventory), but native language influences have been variously described (e.g. van Els and de Bot 1987).

“Intonation” in practice

Despite the relatively uncontroversial theoretical side of intonation, the teaching of intonation still plays a minor role in the L2 classroom. This might be due to the fact that both teachers and learners of a foreign language still underestimate the consequences which deviant intonational patterns may have in communicative and attitudinal respects. The use of visualization techniques, in which computers display intonation curves so that learners can perceive differences between their own and a native speaker’s rendition of an utterance, is often still impeded by the technical requirements in classrooms and the lack of suitable software tools (but see Herry and Hirst 2002 for a successful attempt). As for the teaching of stress rules, the creation of language awareness (see Mehlhorn, this volume) and perceptual sensitization (see Wrembel, this volume) seems to constitute a prerequisite for the production of native-like intonation by language learners. In the approach suggested by Missaglia (this volume), in contrast, the acquisition of intonation is pictured as an unconscious by-product of teaching methods that focus on larger prosodic units and imitative techniques.
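A very simple version of the intonation visualisation mentioned above can be produced with standard plotting tools. The sketch below is only an illustration of the idea, not one of the tools cited: it assumes that F0 contours for a native model and a learner rendition of the same utterance have already been extracted with a pitch tracker and are available as arrays, and it merely time-normalises and overlays the two contours so that their global shapes can be compared.

```python
# Overlay a learner's F0 contour on a native model for visual feedback.
# Assumes precomputed pitch tracks: arrays of F0 values in Hz.
import numpy as np
import matplotlib.pyplot as plt

def plot_contours(f0_native, f0_learner, title="Intonation comparison"):
    t_native = np.linspace(0, 1, len(f0_native))    # normalised time (0..1)
    t_learner = np.linspace(0, 1, len(f0_learner))  # removes overall tempo differences
    plt.figure(figsize=(8, 3))
    plt.plot(t_native, f0_native, label="native model")
    plt.plot(t_learner, f0_learner, label="learner", linestyle="--")
    plt.xlabel("normalised time")
    plt.ylabel("F0 (Hz)")
    plt.title(title)
    plt.legend()
    plt.tight_layout()
    plt.show()

# Made-up contours: a falling statement contour vs. a flatter learner rendition
f0_native = np.concatenate([np.linspace(180, 230, 40), np.linspace(230, 120, 60)])
f0_learner = np.concatenate([np.linspace(190, 200, 50), np.linspace(200, 170, 50)])
plot_contours(f0_native, f0_learner, "Sie hat ein Haus gekauft.")
```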
4. Research and practice – mutual stimulation?
In the preceding sections we illustrated the gap that exists between theoretical research on L2 prosody, on the one hand, and teaching practice in language classes on the other. In this summary we would like to suggest answers to the question of how research and practice can benefit from each other. In particular we will discuss how research results can provide the source for course book materials for language teachers, and how we picture the possible impact of state-of-the-art teaching practice on theoretical researchers. “Research and Development” should ideally comprise a double orientation – theory and application – and a continuum of activity which allows the practical implementation of the theoretical results. In the case of language teaching at the applicational end of the activity continuum, theoretical research questions can be directed towards contrastive aspects of language structure and speech patterns, as we have illustrated in this paper. Equally valid theoretical poles from which to derive applicational answers are, on the one hand, research into learning psychology and patterns of language-learning behaviour (cf. Flege and Hillenbrand 1984, Flege 1995, Strange 2002) and, on the other, research into didactics and language-teaching methodology. A comprehensive theoretical grounding of language-teaching materials clearly demands a breadth and depth of theoretical research knowledge that would go beyond anything that can be expected of anyone actively involved in teaching.
Is it illusory, then, to expect the practical exploitation of theoretical research into prosody? When the results of research consist of theoretical descriptive models, the answer is probably “Yes”. But if the descriptive models provide contrastive information about different languages, they offer a theoretically solid basis for course book authors and teachers to focus exercises on, in whichever didactic and methodological framework they subscribe to. The contrastive work done within the structuralist linguistic framework during the 1950s and 1960s on the syntax, morphology and segmental phonology of various languages is an example of how theoretical work can become established as the basis for developing practical teaching materials (e.g. Moulton 1962, Kufner 1971). However, it also illustrates the problems inherent in a theory which did not take the reality of the learning/teaching situation into consideration. Contrasting phoneme inventories ignores allophonic or other phonetic differences (e.g. vowel-quality differences) that may lie behind identical phonetic symbols. The potential of research results for practical application, therefore, depends on their being formulated in a way which is relevant to the learner’s task and understandable for the teacher.

In general, however, the direct application of research findings in the classroom must be regarded with reserve. Rather, we have shown with the examples discussed in section 3 that an intermediate step is necessary. The relevance of research findings can only be investigated in studies on actual language teaching. It is those research results that offer possibilities of direct application in other classroom situations. Yet, scientific studies on foreign language classroom practices are rare. This is especially lamentable because we believe that these kinds of investigations provide the essential link between theory and teaching practice in L2 prosody. Furthermore, they present the opportunity for research to benefit from state-of-the-art language teaching. For example, a possible focus could be whether the prosodic concepts of stress, intonation, speech rhythm and so forth employed in teaching are the same as the theoretical concepts proposed in research. Discrepancies can spark off new directions for research. Likewise, scientific results gathered on teaching prosody to non-native speakers with different native languages can be beneficial for research. Technological advances have brought the acquisition of speech produced in situ and its post-production processing and analysis within the reach of even small research teams and made non-intrusive collaboration between teachers and researchers a genuine possibility.
References

Abe, Isamu 1998
Intonation in Japanese. In: Daniel Hirst and Albert di Cristo (eds.), Intonation Systems. Cambridge: Cambridge University Press, 360–75. Abercrombie, David 1967 Elements of General Phonetics. Edinburgh: Edinburgh University Press. Archibald, John 1995 The acquisition of stress. In: John Archibald (ed.), Second Language Acquisition and Linguistic Theory, 81–109. Oxford: Blackwell. Backman, Nancy 1979 Intonation errors in second-language pronunciation of eigth Spanish-speaking adults learning English. Interlanguage Studies Bulletin Utrecht 4, 239–265. Beckman, Mary 1986 Stress and Non-stress Accent. Dordrecht: Foris. Classé, Andre 1939 The Rhythm of English Prose. Oxford: Blackwell. Cruttenden, Alan 1997 Intonation. Cambridge: Cambridge University Press (2nd edition). Cucchiarini, Catia, Helmer Strik and Lou Boves 2000 Quantitative assessment of second language learners’ fluency by means of automatic speech recognition technology. Journal of the Acoustical Society of America 107, 989–999. 2002 Quantitative assessment of second language learners’ fluency: comparisons between read and spontaneous speech. Journal of the Acoustical Society of America 111, 2862–2873. Cummins, Fred 2002 Speech Rhythm and Rhythmic Taxonomy. Proceedings of the Speech Prosody 2002 conference, Aix-en-Provence (France), 121–126. Dauer, Rebecca 1983 Stress-timing and syllable-timing reanalysed. Journal of Phonetics 11, 51–62. Delattre, Pierre 1965 Comparing the Phonetic Features of English, German, Spanish and French. Heidelberg: Groos Verlag.
Dupoux, Emmanuel, Christophe Pallier, Nuria Sebastian and Jacques Mehler 1997 A destressing ‘deafness’ in French? Journal of Memory and Language 36, 406–421. Eefting, Wieke 1991 Timing in Talking. Tempo Variation in Production and Its Role in Perception. PhD thesis, University of Utrecht. Ellis, Rod 1997 SLA research and language teaching. Oxford: Oxford University Press. van Els, Theo and Kees de Bot 1987 The role of intonation in foreign accent. The Modern Language Journal 71, 147–155. Fauré, George, Daniel Hirst and Michel Chafcouloff 1980 Rhythm in English: isochronism, pitch, and perceived stress. In: Linda Waugh and Cornelis van Schooneveld (eds.), The melody of Language, 71–79. Baltimore: University Park Press. Flege, James E. 1995 Second-language speech learning: Theory, findings, and problems. In: Winifred Strange (ed.), Speech perception and linguistic experience: Theoretical and methodological issues, 233–273. Timonium, MD: York Press. Flege, James E. and James Hillenbrand 1984 Limits on phonetic accuracy in foreign language speech production. Journal of the Acoustical Society of America 76, 708–721. Fox, Anthony 1981 Fall-rise intonation in German and English. In: Charles V.J. Russ (ed.) Contrastive Aspects of English and German, 55–72. Heidelberg: Groos Verlag. 2001 Prosodic Features and Prosodic Structure. Oxford: Blackwell. Freed, Barbara, Norman Segalowitz and Dan Dewey 2004 Context of learning and second language fluency in French: comparing regular classroom, study abroad, and intensive domestic immersion programs. Studies in Second Language Acquisition 26, 275–301. Gamon, Michael 1996 German word stress in a restricted metrical theory. Linguistische Berichte 162, 107–136. Gårding, Eva 1998 Intonation in Swedish. In: Daniel Hirst and Albert di Cristo (eds.), Intonation Systems. Cambridge: Cambridge University Press, 112–30.
Gibbon, Dafydd and Ulrike Gut 2001 Measuring speech rhythm. Proceedings of Eurospeech, Aalborg (Denmark), 91–94. Grabe, Esther 1998 Comparative Intonational Phonology: English and German. Doctoral Dissertation, Max-Planck-Institut for Psycholinguistics and University of Nijmegen. Grabe, Esther and Ee Ling Low 2002 Durational variability in speech and the rhythm class hypothesis. In: Carlos Gussenhoven and Natasha Warner (eds.), Papers in Laboratory Phonology 7, 515–546. Berlin: Mouton de Gruyter. Grice, Martine, Stefan Baumann and Ralf Benzmüller 2005 German intonation in autosegmental-metrical phonology. In: Sun-Ah Jun (ed.), Prosodic Typology, 55–83. Oxford: Oxford University Press. Grosser, Wolfgang 1997 On the acquisition of tonal and accentual features of English by Austrian learners. In: Allan James and Jonathan Leather (eds.), Second Language Speech – Structure and Process, 211–228. Berlin: Mouton de Gruyter. Gut, Ulrike 2003 Prosody in second language speech production: the role of the native language. Zeitschrift für Fremdsprachen Lehren und Lernen 32, 133–152. Gut, Ulrike and Jan-Torsten Milde 2002 The prosody of Nigerian English. Proceedings of the Speech Prosody 2002 conference, Aix-en-Provence (France), 367–370. Hayes, Bruce 1984 The phonology of rhythm in English. Linguistic Inquiry 15, 33– 74. Herry, Nadine and Daniel Hirst 2002 Subjective and objective evaluation of the prosody of English spoken by french speakers: the contribution of computer assisted learning. Proceedings of the Speech Prosody 2002 conference, Aix-en-Provence (France), 383–387. Hirschfeld, Ursula and Kerstin Reinke 1998 Phonetik Simsalabim. Ein Übungskurs für Deutschlernende. Berlin etc.: Langenscheidt. Hirst, Daniel and Albert di Cristo 1998 A survey of intonation systems. In: Daniel Hirst and Albert di Cristo (eds.), Intonation Systems, 1–44. Cambridge: Cambridge University Press.
van der Hulst, Harry and Norval Smith 1988 The variety of pitch accent systems: Introduction. In Harry van der Hulst and Norval Smith (eds), Autosegmental Studies on Pitch Accent, ix–xxiv. Dordrecht: Foris. Jassem, Wiktor and Dafydd Gibbon 1980 Re-defining English accent and stress. Journal of the International Phonetic Association 10, 2–16. Juffs, Alan 1990 Tone, syllable structure and interlanguage phonology: Chinese learner's stress errors. International Review of Applied Linguistics 28, 99–117. Jun, Sun-Ah (ed.) 2005 Prosodic Typology. Oxford: Oxford University Press. Kingdon, Roger 1939 Tonetic stress marks for English. Le maitre phonétique 68, 60– 64. Kufner, Herbert L. 1971 Kontrastive Phonologie Deutsch-Englisch. Stuttgart: Klett Ladd, D. Robert 1996 Intonational Phonology. Cambridge: Cambridge University Press. Laver, John 1994 Principles of Phonetics. Cambridge: Cambridge University Press. Lennon, Paul 1990 Investigating fluency in EFL: a quantitative approach. Language Learning 40, 387–417. Low, Ee Ling and Esther Grabe 1995 Prosodic patterns in Singapore English. Proceedings of the13th International Congress of Phonetic Sciences, Stockholm, 636– 639. Miller, Joanne L., François Grosjean and Concetta Lomanto 1984 Articulation rate and its variability in spontaneous speech. Phonetica 41, 215–225. Moulton, William G. 1962 The Sounds of English and German. Chicago: University of Chicago Press. Osser, Harry and Frederick Peng 1964 A cross cultural study of speech rate. Language & Speech 7, 120–125. Pater, Joe 2000 Non-uniformity in English secondary stress: the role of ranked and lexically specific constraints. Phonology 17, 237–274.
Pierrehumbert, Janet and Julia Hirschberg 1990 The meaning of intonational contours in discourse. In: Phil Cohen, Jerry Morgan and Martha Pollack (eds.), Intentions in Communication, 271–311. Cambridge, Mass.: MIT Press. Pike, Kenneth 1945 The Intonation of American English. Ann Arbor: University of Michigan Press. Ramus, Franck, Marina Nespor and Jacques Mehler 1999 Correlates of linguistic rhythm in the speech signal. Cognition 73, 265–292. Roach, Peter 1982 On the distinction between ‘stress-timed’ and ‘syllable-timed’ languages. In: David Crystal (ed.), Linguistic Controversies, Essays in Linguistic Theory and Practice, 73–79. London: Edward Arnold. Strange, Winifred 2002 Speech perception and language learning: Wode's developmental model of speech perception revisited. In: Petra Burmeister, Torsten Piske and Andreas Rohde (eds), An Integrated View of Language Development: Papers in Honor of Henning Wode. Trier: Wissenschaftlicher Verlag Trier. Towell, Richard 2002 Relative degrees of fluency. A comparative case study of advanced learners of French. International Review of Applied Linguistic in Language 40, 117–150. Trouvain, Jürgen 2004 Tempo Variation in Speech Production. Implications for Speech Synthesis. (Doctoral Dissertation, published as Phonus 8), Phonetics, Saarland University, Saarbrücken. Uldall, Elisabeth 1971 Isochronous stresses in R.P. In: Louis Hammerich, Rodolfo Jacobson and Eberhard Zwirner (eds.), Form and Substance, 205– 210. Copenhagen: Akademisk Forlag. Wiese, Richard 1996 The Phonology of German. Oxford: Clarendon Press. Willems, Nico 1982 English Intonation from a Dutch Point of View. Dordrecht: Foris Publications.
Part 1. Phonetic descriptions
An introduction to intonation – functions and models

Martine Grice and Stefan Baumann

This chapter provides an introduction to intonation in general, and is loosely based on an oral presentation given in the workshop “Non-native prosody: phonetic description and teaching practice” in Saarbrücken. Although intonation is particularly difficult for learners of a second language to master, it is seldom taught systematically. Although much of the early work on intonation was didactic in nature, recent studies have tended to be more experimental and/or theoretically rigorous. This has created a gap between intonation as it is used in teaching and intonation research, making it difficult for the results of such research to be of use to teachers of a second language. It is our aim to bridge this gap. We provide an overview of the main issues dealt with in current theoretical research, discussing the different forms intonation can take and the functions it can fulfill, the one of course dependent on the other. Reflecting the context of the workshop, examples are predominantly in German with English translations, accompanied where relevant by Italian equivalents.1 We then present the two currently most widespread models of intonation, which will hopefully be useful to second language teachers and textbook writers for their own research and for the preparation of course material. We also aim to facilitate reading of current primary literature on aspects of intonation, in particular on languages not dealt with here. With this, we hope that results from theoretical research will find their way into the classroom.
1. Intonation
The term ‘intonation’ has been defined in at least two different ways in the literature. A narrow definition equates intonation with ‘speech melody’, restricting it to the “ensemble of pitch variations in the course of an utterance” (‘t Hart, Collier and Cohen 1990: 10). The crucial role of pitch variations for the interpretation of utterances can be seen in the German example
utterances (1) and (2), in which the pitch contour is represented as a line above the words spoken.
(1) Sie hat ein Haus gekauft
    ‘She bought a house.’

(2) Sie hat ein Haus gekauft
    ‘She bought a house?!’

[In the original, a pitch contour line is drawn above each utterance: (rising-)falling in (1) and (falling-)rising in (2).]
The examples display exactly the same string of segments. They only differ in their intonation, making (1) a statement with a (rising-)falling contour, and (2) an echo question with a (falling-)rising contour. Pitch can be modulated in a categorical way, with the presence vs. absence, or type of pitch movement, and in a gradient way, involving e.g. variations in the way a pitch movement is realised: the extent of the rise or fall, or the pitch range within which a pitch movement is realised. The two main tasks of pitch modulation are (1) highlighting, marking prominence relations (Haus is more prominent than ein), and (2) phrasing, the division of speech into chunks. However, it is not pitch alone which is responsible for these tasks. A broader definition of intonation includes loudness, and segmental length and quality, although languages differ in the extent to which they modulate these to achieve highlighting and phrasing. Like pitch, loudness, length and quality are auditory percepts. Their articulatory and acoustic correlates are given in table 1 below, adapted from Uhmann (1991: 109), (see also Baumann 2006: 12).
Table 1. Aspects of speech contributing to intonation in its broad sense

Perception | Articulation | Acoustics
pitch (perceived scale: high – low) | quasi-periodic vibrations of vocal folds | fundamental frequency (F0); measure: Hertz (Hz)
loudness (perceived scale: loud – soft) | articulatory effort, subglottal air pressure | intensity; measure: decibel (dB)
length (perceived scale: long – short) | duration and phasing of speech gestures | duration of segments; measure: millisecond (ms)
vowel quality (perceived scale: full – reduced) | vocal tract configuration, articulatory precision | spectral quality; measure: formant values in Hz
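The acoustic correlates in the right-hand column of Table 1 can be approximated with a few lines of signal processing. The sketch below is a rough illustration on an assumed mono signal, not a description of any standard tool: duration comes from given segment boundaries, intensity from the RMS energy of the samples, and F0 from a naive autocorrelation peak; spectral quality (formant values) would require an additional analysis such as LPC and is omitted. Dedicated software such as Praat is normally used for serious measurement.

```python
# Rough measurement of duration, intensity and F0 for one labelled interval
# of a mono signal `x` sampled at `sr` Hz. Illustrative only.
import numpy as np

def measure(x, sr, start_s, end_s, fmin=75.0, fmax=400.0):
    seg = x[int(start_s * sr):int(end_s * sr)]
    duration_ms = 1000 * (end_s - start_s)
    rms = np.sqrt(np.mean(seg ** 2))
    intensity_db = 20 * np.log10(rms + 1e-10)    # dB relative to full scale
    # naive F0 estimate: highest autocorrelation peak in the plausible lag range
    seg = seg - np.mean(seg)
    ac = np.correlate(seg, seg, mode="full")[len(seg) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    f0_hz = sr / lag
    return {"duration_ms": duration_ms, "intensity_dB": intensity_db, "F0_Hz": f0_hz}

# Synthetic vowel-like test signal: 150 Hz fundamental plus one harmonic, 0.2 s at 16 kHz
sr = 16000
t = np.arange(int(0.2 * sr)) / sr
x = 0.3 * np.sin(2 * np.pi * 150 * t) + 0.1 * np.sin(2 * np.pi * 300 * t)
print(measure(x, sr, 0.0, 0.2))
```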
We now examine the two tasks of intonation, highlighting and phrasing, in more detail. 1.1. Highlighting In languages like English and German, utterance level prominence is realised on a designated syllable either by means of increased loudness and length, and unreduced vowel quality (all contributing to stress) or by means of the above, accompanied by a pitch movement (accent). This is not the case for all languages. Some languages use pitch movement without the accompanying loudness, length and vowel reduction (or at least using them to a lesser degree). English and German are referred to by Beckman (1986) as ‘stress-accent languages’, in contrast to, e.g., Japanese, which is a ‘nonstress accent language’. Both pitch movements with stress in stress-accent languages, and those without stress in non-stress-accent languages are referred to as pitch accents. In what is to follow, we concentrate on pitch accents in stress-accent languages. The notion of ‘stress’ applies to both word and utterance levels. We differentiate between ‘lexical stress’, also called ‘word stress’, denoting abstract prominences at word level, and ‘postlexical stress’, concrete promi-
28
Martine Grice and Stefan Bauman
nences at utterance level.2 Table 2 summarises the different levels of description. Table 2. Levels of description Lexical stress Postlexical stress Accent
word level, abstract, potential for concrete prominence utterance level, concrete prominence utterance level, concrete prominence
The difference between stresses and accents entails a difference in the strength or degree of (postlexical) prominence. There are at least four different degrees of prominence at utterance level, as listed in table 3. Table 3. Degrees of prominence No stress/accent Stress (equivalent to ‘force accent’ or Druckakzent) Pitch accent Nuclear pitch accent
A stressed syllable is louder, longer and more strongly articulated, with less vowel reduction than an unstressed syllable An accented syllable (i.e. a syllable bearing a pitch accent) has additional tonal movement on or near it the nuclear syllable is the last pitch accent in an intonation phrase, usually perceived as the most prominent one in the phrase
In (3) we provide an extended version of utterance (1) above. It might conceivably be produced with a nuclear pitch accent on Haus ('house'), a nonnuclear pitch accent on the first syllable of schönes ('beautiful'), and stress on the first syllable of Lena (and possibly also on -kauft). All other syllables can be thought of as unaccented. In this and later examples, pitch accents are indicated by capital letters, stresses by small capitals.
(3) LEna hat ein SCHÖnes HAUS geKAUFT. ‘Lena bought a beautiful house.’
1.2. Phrasing Speech is divided into chunks delimited by means of intonation. These chunks have been termed breath groups, sense groups, tone units, tone groups, phonological phrases or intonational phrases, to name but a few (see Cruttenden 1997: 29–37). The most obvious indicators of boundaries between intonation units are (filled and silent) pauses. The longer the pause, the stronger the perceived boundary. However, there are many cases in which a boundary is perceived although a pause is missing. This effect is often due to an abrupt change in pitch across unaccented syllables, i.e. a jump up or down in pitch which cannot be attributed to the highlighting function of intonation. It is often difficult to decide whether an intonation unit boundary is present or not, in particular when investigating spontaneous speech. In fact, transcribers across a number of approaches to intonation have often reported that they need to capture different levels of phrasing – in simple terms larger and smaller phrases. Although the British School originally had only one level of intonational phrasing (Crystal 1969, for instance), large scale corpus transcription using this model carried out by Gerry Knowles and Briony Williams led to the introduction of an additional level, the major tone unit, which was able to contain a number of (minor) tone units (Williams 1996a, b). The autosegmental-metrical model of English intonation which contributed substantially to the ToBI framework (see section 3.2.) also makes a distinction between smaller, intermediate phrases and larger, intonation phrases. It is not clear whether there is a one-to-one correspondence between the two systems in terms of their phrasing, but it is possible to say that in many cases an intermediate phrase corresponds to a tone unit/tone group and the intonation phrase to a major tone group (see Roach 1994 and Ladd 1996 for attempts at converting between the British School and autosegmental metrical models). The intuitive need for at least two different sizes of phrase can be felt when comparing utterance (3) above, which consists of only one phrase, with (4), which appears most naturally to be composed of two:
(4) Findest Du NICHT, dass Lena ein SCHÖnes HAUS gekauft hat? ‘Don’t you think that Lena has bought a beautiful house?’
The jump in pitch (and thus the phrase break) occurs between nicht and dass. Although the tonal break coincides with a syntactic break here, we stress that intonational phrases and syntactic phrases are independent, even if they of course often correspond. Other instances of larger phrases containing more than one smaller phrase are lists, as in (5).
(5) Lena hat einen ROten, einen GELben und einen BLAUen Ball.
'Lena has a red, a yellow, and a blue ball.'
In lists usually all but the last phrase end at a relatively high pitch, either as in (5) above, or with a high level pitch. The high pitch indicates that there is still at least one more item to come. After it the pitch is reset (i.e. there is a jump down), marking the beginning of the next phrase. A jump either up or down is a strong cue for a phrase break (the boundary between two phrases). 1.3. Consequences of highlighting and phrasing for the segments of speech In section 1.1. we claimed that sounds are more strongly articulated when they are stressed or accented. The strength of sounds is also affected by the position of the sound in the syllable and, in turn, of the syllable within the phrase. Below we outline what is meant by strengthening, both with respect to prominence and to phrasing, and describe another phrasal effect on the duration of sounds. An account of intonation cannot ignore these effects, as they are often consciously or unconsciously used as diagnostics for the intonational analysis itself. This is particularly the case for phrasing, where intuitions about levels of phrasing based on the pitch contour are often unclear. If we take the sound /t/, at the beginning of a stressed syllable it is stronger than it would be at the beginning of an unstressed syllable: compare /t/ realisations at the beginning of ‘tomorrow’ and ‘tomcat,’ where /t/ in ‘tomcat’ is stronger (we hear greater aspiration and a longer closure). Moreover, /t/ at the beginning of a syllable bearing a pitch accent is stronger than one at the beginning of a syllable which is stressed but bears no pitch accent: Compare initial /t/ in the word ‘tomcats’ in ‘I like
TOMCATS best’ with ‘Why not? I LIKE tomcats,’ where the former /t/ is longer and more aspirated. The strengthening of segments at the beginning of phrases (domains) is referred to as domain initial strengthening (see, e.g., Keating et al. 2003). Let us take the sound /t/ in English again. It is pronounced at the beginning of a larger phrase with greater strength than at the beginning of a smaller one. Furthermore, connected speech processes such as assimilation occur to a lesser extent across large boundaries than across small ones. This resistance to assimilation is also considered to be due to initial strengthening, in the sense that the segment preserves its identity, thus enhancing the contrast with adjacent segments (syntagmatic contrast), and possibly even enhancing a contrast with other segments which might occur in that position (paradigmatic contrast). At the ends of phrases there is a slowing down of the articulators, which is reflected in the signal as final lengthening. The larger the phrase, the greater the degree of final lengthening (inter alia, Wightman et al. 1992). Final lengthening leads to an increase in the duration of segments which is different from the increase obtained by stress and accent; the sounds are often pronounced less loudly and clearly than in stressed and accented syllables. Thus, final lengthening cannot easily be mistaken for accentual lengthening. Final lengthening has been found in a large number of languages, and is assumed to have a physiological basis, although there are language-specific, and even contour-specific differences as to the degree of final lengthening present. If a phrase break occurs across a sequence of unstressed syllables, those which are at the beginning of the second phrase are often pronounced very fast, this is referred to as anacrusis. Like an abrupt change in pitch, an abrupt change in rhythm is a strong cue for a phrase break. Now that the highlighting and phrasing tasks have been discussed, we turn to which functions they are used to express.
2. Functions of intonation
In spoken language, intonation serves diverse linguistic and paralinguistic functions, ranging from the marking of sentence modality to the expression of emotional and attitudinal nuances. It is important to identify how they are expressed in the learner's native language, so that differences between
the native and target languages are identified. It is particularly important to point out that many aspects of information structure and indirect speech acts are expressed differently across languages. Making learners aware of the existence of these functions will not only help them learn to express them, but will also help them to interpret what they hear in a more analytic way, thus reducing the danger of interpreting unexpected intonation patterns as (solely) a function of the attitude or emotional state of the speaker. We have seen that intonation analysis involves categorical decisions about whether there is stress or accent, and, if there is an accent, which type of pitch accent it is. It also involves decisions about whether a boundary is present, and if so which pitch movement or level is used to mark it. There are also many gradient aspects to intonation, such as variation in pitch height or in the exact shape of the contour (equivalent to allophonic variation in the segmental domain).

2.1. Lexical and morphological marking

Lexical and morphological marking does not belong to intonation proper but uses pitch, and to some extent also the other channels used by intonation. Categorical tonal contrasts at word level are characteristic of tone languages. Two quite different examples of tone languages are Standard Chinese, which has lexical contrasts such as the well-known example of the syllable ma with four different tonal contours, each of which constitutes a distinct lexical item (mother, hemp, horse and scold), and the West African (Niger-Congo) language Bini, which has grammatical tone: a change of tone marks the difference between tenses, e.g. low tone marking present tense and high or high-low tones marking past tense (see Crystal 1987: 172). Categorical tonal contrasts are also characteristic of so-called pitch accent languages, which may also have lexical or grammatical tone. Both Swedish and Japanese are pitch accent languages. The difference between tone languages and pitch accent languages is that the former have contrastive tone on almost all syllables, whilst the latter restrict their tonal contrasts to specific syllables, which bear a pitch accent. However, it is difficult to draw a dividing line between these two language categories (see Gussenhoven 2004: 47). In intonation languages (the most thoroughly studied of which are generally also stress accent languages) like English and German, pitch is solely a postlexical feature, i.e. it is only relevant at utterance level. All tone and
pitch accent languages have intonation in addition to their lexical and/or grammatical tone, although the complexity of their intonation systems varies considerably. 2.2. Syntactic functions As we have already pointed out, syntactic structure and intonational phrasing are strongly related, but do not have to correspond exactly. Intonation can be used to disambiguate in certain cases between two different syntactic structures. The attachment of prepositional phrases is often said to be signalled by intonation. For example, in (6), a phrase break after verfolgt tends to lead to the interpretation that it is the man with the motorbike which Rainer is following. A phrase break after Mann would tend to lead to the interpretation that Rainer is on his motorbike and is following a man whilst riding it. In the first case the prepositional phrase modifies the noun phrase (den Mann) and in the second it modifies the verb (verfolgt). This phrasing has the same effect in the English translation. (6) Rainer verfolgt den Mann mit dem Motorrad. ‘Rainer is following the man with the motorbike.’
However, it is often unnecessary to disambiguate between two readings, particularly if the context is clear. It should therefore not be expected that speakers will make such distinctions all of the time. A study on Italian and English syntactic disambiguation (Hirschberg and Avesani 2000) showed this particularly clearly, not only for prepositional phrase attachments, as in (7a), but also for ambiguously attached adverbials, as in (7b) (adapted from Hirschberg and Avesani 2000: 93). (7a) Ha disegnato un bambino con una penna. (7b) Lui le aveva parlato chiaramente.
‘lit. He drew a child with a pen’ ‘lit. He to her has spoken clearly.’
The two readings of (7b) are either that it was clear that he spoke to her (the adverbial modifies the sentence) or that he spoke to her in a clear manner (the adverbial modifies the verb).
2.3. Information structure An important linguistic function of intonation is the marking of information structure, in particular (a) the expression of givenness of entities within a chunk of discourse and (b) the division of utterances into focus and background elements. In both (a) and (b) we are dealing with a continuum rather than a dichotomy: entities are not simply given or new, but may have an intermediate status between the two extremes, just as an utterance might contain elements which are focussed to a greater or lesser degree. We deal with (a) and (b) in sections 2.3.1. and 2.3.2. respectively. 2.3.1. Givenness Degrees of givenness can be expressed through the choice of words. A clearly new discourse element can be expressed with a noun and indefinite article, as in the underlined noun phrase in (8). A clearly given one can be expressed as a pronoun, as in (9). (8) Thomas isst einen Apfel.
‘Thomas eats an apple.’
(9) A: Was ist mit dem Apfel passiert? B: Thomas hat ihn gegessen.
‘What happened to the apple?’ ‘Thomas ate it.’
An intermediate degree of givenness can be expressed by the use of a definite article, as in (10), where the word Apfel is considered to be more given than in (8), since it refers to a specific instance of an apple which has already been introduced into the discourse in some way. (10) Thomas isst den Apfel.
‘Thomas eats the apple.’
Of course, degrees of givenness can also be expressed through intonation. For example, the word Apfel in (11) receives a pitch accent and is thus more prominent than the same word in the second turn (B) in (12). In B’s turn Apfel is deaccented, which means that it does not receive an accent although it would be accented under default conditions, i.e. in an ‘all-new’ context such as in (11). (11) Thomas hat Hunger. Also isst er einen APfel. ‘Thomas is hungry so he eats an apple.’
(12) A: Hast Du gesagt, dass Thomas mit einem Apfel jongliert? B: Nein, er ISST einen Apfel. ‘Did you say Thomas is juggling with an apple? No, he’s eating an apple.’
(12) is similar to an example of Cruttenden’s (2006) for English, given in (13). (13) A: Would you like to come to dinner tonight? I’m afraid it’s only chicken. B: I don’t LIKE chicken.
Indian English, by contrast, does not deaccent, as in the example taken from Ladd (1996: 176), reproduced in (14). (14) If you don’t give me that CIgarette I will have to buy a CIgarette.
Italian is similar to Indian English in that the nuclear pitch accent tends to go on the final lexical item regardless of whether it is given or not. In (15), the nuclear accent is on casa in both cases, whereas in English it would have gone on outside and inside.3 (15) É un lavoro che si fa fuori CAsa o dentro CAsa? ‘Is it a job which you do outside the HOME or inside the HOME.’
Cruttenden (2006) refers to examples such as those in Italian and Indian English as having reaccenting. Not all types of accent are equally strong, and therefore the context sometimes dictates not only whether an accent is present or not but also which type of accent may be used. The interested reader is referred to Baumann and Grice (2006), where degrees of givenness are shown to be reflected in the type of accent used. A high accent is used for new information, and a step down in pitch onto the accented syllable for information which is not totally given but, rather, accessible. No accent at all is used for totally given information. 2.3.2. Focus The second aspect of information structure is the division of utterances into focus and background elements, based on the structure of the previous discourse and the intentions of the speaker. Although there is a relation between focus and newness on the one hand and background and givenness
on the other, the two dimensions are generally orthogonal to each other. For example, an item in focus may be given within the discourse, as the name Maria in (16) B. Compare this to (17), where Maria is both in focus and new. (16) A: Liebst Du Maria oder Anna? B: [ Ich liebe ]background [ MaRIa ]focus given given (17) A: Wen liebst Du? B: [ Ich liebe ]background [ MaRIa ]focus given new
‘Do you love Maria or Anna?’ ‘I love Maria.’
‘Whom do you love?’ ‘I love Maria.’
Both of these structures represent so-called ‘narrow focus’, that is only one element is focussed. What is important is that this element is accented irrespective of its degree of givenness. In broad focus structures, where focus extends over a number of words, the relation between focussed elements and pitch accents is less direct. In many languages, larger focus domains are marked by only one or two pitch accents, a phenomenon called focus projection (see Selkirk 1984; Uhmann 1991). The preference as to which element receives the accent, and thus serves as focus exponent, is language specific. Ladd (1996) points out that many languages place the focus exponent on the argument rather than on the predicate. For example, in (18) the accent is placed on the argument, Haus, and the following predicate, kaufen, is left unaccented. This is the case even if the argument is followed by the verb, not only in German but also in English, as in (19). (18) Ich habe kein Geld übrig. Ich muss ein HAUS kaufen. ‘I don't have any spare cash. I have to buy a HOUSE.’ (19) I don't have any spare cash. I have a HOUSE to buy.
As pointed out above, the tendency to accent the last lexical item is stronger in Italian than it is in English or German. Thus, in (20) the final word is accented despite the fact that it is a verb, as in Ladd’s (1996: 191) example. (20) Ho un libro da LEGgere.
‘I have a book to read’.
Another important influence on the accentability of words is their ‘semantic weight’. In (21) B and C the noun phrases meinen Anwalt and jemanden are both arguments and in focus (i.e. part of the broad focus domain).4 However, jemanden is semantically ‘light’, since it is an unspecific pronoun, and thus does not receive an accent (see Uhmann 1991: 200). (21) A: Was haben Sie ihrer Aussage hinzuzufügen? ‘Do you have anything to add to your evidence?’ B: Ich habe meinen ANwalt belogen. ‘I lied to my lawyer.’ C: Ich habe jemanden beLOgen. ‘I lied to someone.’
It is important to point out that there are differences even within a language as to where the nuclear accent is placed in broad focus contexts. One example of this is Greek, where the accent tends to be placed on the argument in statements but on the predicate in polar questions (Grice, Ladd and Arvaniti 2000, Arvaniti, Ladd and Mennen, to appear, more on polar questions in 2.4. below). 2.4. Speech acts Intonation is used to encode distinctions such as whether an utterance is intended as a request for information (Request) or as a request for the interlocutor to perform a particular action (Command). There are four major categories of communicative illocutionary acts: constatives, directives, commissives, and acknowledgments (Bach and Harnish 1979; Searle 1969), examples of which are statements, requests, promises, and apologies respectively. Much research has been carried out on questions, a special type of directive, and how they are marked intonationally. Although polar questions are often marked with a final rise (H% edge tone), there are a great many languages that have a rising falling pattern, constituting an LHL sequence. Intonation plays a crucial role in distinguishing polar questions from, e.g., statements if there is no distinct interrogative syntax or question particle, such as in Italian. Even in German and English it is possible to ask a question using a fragment, as in (22), in which case intonation plays the major role in disambiguating the question from a statement, providing the context does not make it entirely clear that a question seeking confirmation is being asked.
(22) mit LEna? ‘with LEna?’
Wh-questions are usually accompanied by falling intonation unless there is some additional paralinguistic meaning, such as an element of insistence or politeness. In some cases, a syntactic Wh-question in German can also be interpreted as a suggestion if uttered with a fall, as in (23).
(23) Warum ziehst du nicht nach KaliFORnien? ‘Why don’t you move to California?’
2.5. Paralinguistic functions and iconicity of intonation Intonation is often said to serve primarily an emotive function, implying an inherently iconic usage of pitch variations. Such fundamental iconicity further implies that the (paralinguistic) meaning differences in spoken language brought about by changes in pitch height are universally valid. This is, in principle, Bolinger’s view when he claims that intonation is part of a gestural complex, a relatively autonomous system with attitudinal effects that depend on the metaphorical associations of up and down – an elaborate scheme of iconism. It assists grammar – in some instances may be indispensable to it – but is not ultimately grammatical. (1985: 106)
However, Bolinger (1985: 97–98) relativises this claim by arguing that the iconicity of intonation is only ‘symptomatic’ in nature; pitch variations do not directly mirror the meaning they help to convey, as is the case – at least to a larger extent – with onomatopoeic expressions, such as bang, smash in English and klatschen, gurren in German (see Crystal (1987: 174–175) for examples of sound symbolism in many languages). Carlos Gussenhoven (2002, 2004) brought together research on the different factors affecting intonational form, which have led to claims of a universal form-function relation, and, crucially, showed how they interact. It is precisely the analysis of the interaction of the different factors which has explained apparent discrepancies in the form-function relation in crosslanguage comparisons. Gussenhoven claims that the form-function rela-
tions are based on three biological codes: the frequency code, the production (phase) code and the effort code. Each code has affective and/or informational interpretations and may have different linguistic manifestations in different languages. According to the frequency code, which was introduced by Ohala (1983, 1984), size is suggested by pitch height: since a bigger larynx (including longer vocal folds) and a longer vocal tract produce lower frequencies, low pitch is associated with larger creatures and high pitch with smaller ones. The frequency code has affective interpretations along dimensions such as dominant~submissive or impolite~polite and more informational interpretations along dimensions such as certain~uncertain or – closely related – assertive~questioning, with low pitch attributed to the first pole and high pitch to the second (Gussenhoven 2004: 80-84). The most obvious linguistic manifestation of the frequency code is the distinction between statements and polar (yes-no) questions, which is a categorical manifestation of the assertive~questioning dimension. Polar questions are marked in a great number of languages by rising or high pitch (as in example (2) versus (1) above). For many interpretations of the frequency code, it is the contour endings which are particularly important (see Ohala 1983, 1984; Gussenhoven 2004: 82). However, for a large number of languages it is not a final rise but rather an accentual rise which marks polar questions. This rise is often followed by a fall. A rising-falling contour is found in many Southern varieties of Italian (Bari, Palermo, Neapolitan; see Grice et al. 2005). This is illustrated in example (24), taken from a recording of Bari Italian (Grice et al. 2005: 370).
(24) Lo mandi a MassimiLIAno?
'Will you send it to Maximilian?'
A similar contour is also found in varieties of Hungarian, Romanian and Greek (Grice, Ladd and Arvaniti 2000), as well as in varieties of German, as shown in example (25) from a recording of a Palatinate dialect (Peters 2004: 384). Note that the rise-fall is on the final unaccented syllable, in contrast to (24), where the rise is on the accented syllable.
(25) Isch des e gute WIRTSfre::?
'Is that a good barkeeper (female)?'
The end of the contour is also important for the production code, which derives its interpretations from a gradual decrease in subglottal air pressure in the course of a breath group (Lieberman 1967, Gussenhoven 2004). One consequence of the drop in subglottal pressure is a gradual lowering of pitch (along with intensity), throughout the phrase, referred to as declination (Cohen and ‘t Hart 1967). The central linguistic interpretation of this code is finality~continuation, marked by low versus high endings. Many languages have distinct contours which they use to express nonfinality, see for example the contour in (5). However, as with questions, not all languages signal finality right at the end of a phrase. Palermo Italian, for instance, uses a rising type of accent instead (Grice 1995), although this rise is distinct from the question rise. A fall to low pitch can express varying degrees of finality, depending on the extent of the fall and the final pitch reached. At the beginning of a phrase, the relation is reversed: an initial high accent often signals a new topic, whereas a relatively low accent at the beginning marks topic continuation (in German and English; see Wichmann, House and Rietveld 2000), emulating an intake of breath and therefore increased subglottal pressure, leading to faster vibration of the vocal folds (producing higher pitch). The third biologically determined code is the effort code, which is based on the physiological phenomenon that an increased effort in producing speech leads to greater articulatory precision. This is reflected by more pronounced and wider pitch movements (see Gussenhoven 2004: 85–86). The primary informational function of this code in many languages is to express emphasis or importance achieved through gradient use of pitch height. Its most common categorical manifestation is accentuation used in the marking of focus (see section 2.3.2.) and the types of accent used to mark stages along the given~new continuum: As discussed in section 2.3.1., higher pitch is used for items which are new to the discourse, whereas a step down onto a lower pitch is used for items which are accessible to the hearer through context, but are not entirely given.
To sum up, a representative sample of prosodic functions and the means used to express them are shown schematically in Figure 1.

Figure 1. Functions of intonation and their intonational realisation. [Schema not reproduced: it arranges the categorisation of function from linguistic to paralinguistic – lexical/morphological marking (tone languages), syntactic structure, information structure (background – focus, given – new), speech acts (command, information-seeking question), and emotional state/affect/attitude (surprise/politeness/boredom) – against the intonational means of expression, ranging from categorical to gradient.]
It should be clear from the figure (and from the discussion above) that although categorical means are employed to make lexical distinctions as well as distinctions pertaining to information structure and speech acts, it is not possible to state either that categorical means are used to express only linguistic functions, or that gradient means are used only for paralinguistic functions, although this is a widespread assumption. Therefore, anyone analysing the intonational forms of a language should keep an open mind when relating form to function. Furthermore, it should not be assumed that gradient means are universally valid, since different languages interpret pitch height in different ways.
3. Models of intonation
In the literature on intonation, pitch modulation is either captured as pitch configurations (as in the British School, see section 3.1.), such as rise, fall, rise-fall and so on, or as a sequence of targets (as in autosegmental-metrical models, see section 3.2.). Targets specify only specific points in the F0 contour, represented phonologically as ‘tones,’. H(igh) tones correspond to high targets, referred to as ‘peaks,’ L(ow) tones to low targets, referred to as ‘valleys’ or ‘troughs’. These tones can be combined into composite pitch accents, LH representing a rise, and HL a fall, or boundary tone combinations, e.g. LH representing a phrase final rise. In the British School, configurations such as rise or fall are the primitives (basic units), whereas in the autosegmental-metrical approach they are derived, the basic building blocks being the levels High and Low. 3.1. British School British-style analyses (e.g. Crystal 1969; Halliday 1967; O’Connor and Arnold 1973; Tench 1996; see also Kohler 1991 for German), treat intonation in terms of dynamic pitch contours. The most important contour and the one by which tunes are classified is referred to as the ‘nuclear tone’. It starts at the ‘nucleus’ or ‘nuclear syllable’ (Halliday’s ‘tonic’), which is said to be the utterance’s most prominent syllable, and continues to the end of the phrase. The nucleus represents the only obligatory part of a ‘tone group’. Maximally, a tone group consists of a ‘prehead’ (unaccented syllables before the first pitch accent), a ‘head’ (reaching from the first pitch accented syllable to – but not including – the nuclear syllable), a nucleus (last pitch accented syllable within the tone group) and a ‘tail’ (unaccented postnuclear syllables). Postlexical stresses (or Druckakzente), i.e. secondary prominences characterised by increased length and/or loudness but lacking an abrupt pitch movement (see section 1.1.), may occur within the prehead, the head, and the tail. Example (26) shows the structure of a tone group containing all possible parts (including a potential postlexical stress on – kauft):
(26) MagdaLEna hat ein HAUS gekauft. 'Magdalena bought a house.'
[Interlinear ('tadpole') diagram not reproduced here: each syllable is marked by a dot placed between two lines representing the upper and lower limits of the speaker's pitch range, with Mag-da- forming the prehead, LE-na hat ein the head, HAUS the nucleus and ge-kauft the tail.]
The notation used in British-School analyses assigns a dot to every syllable, with stressed syllables larger than unstressed ones. Pitch accented syllables either represent turning points in a more or less smooth pitch contour (as the third syllable of Magdalena in (26)) or are characterised by a considerable pitch change within the syllable (as on Haus in (26)). The latter is indicated by a line. Due to the form of these symbols the notation has been called ‘tadpole’ notation. It has also been termed interlinear, since the transcription is placed between two lines indicating the upper and lower limit of a speaker’s pitch range. The usual method of transcription within the British School is to use tonetic stress marks for the nuclear contour, the pitch movement extending from the nucleus to the end of the phrase. This is called intralinear transcription, as in (27), where the diacritic indicates a high fall. (27) Magdalena hat ein `Haus gekauft.
‘Magdalena bought a house.’
It is also possible to mark the beginning of the head and the direction the pitch takes during the head. Online material for practicing intonation within the British School is available at http://www.eptotd.btinternet.co.uk/pow/ powin.htm. 3.2. Autosegmental-metrical models The currently most widespread phonological framework for representing intonation is termed ‘autosegmental-metrical’, starting with the work of Pierrehumbert (1980), and treated in detail in Ladd (1996), in which the term was coined. The division of utterances into phrases and the assignment of relative prominence to elements within the phrase (phrasing and
highlighting) represent the metrical aspect, which was first proposed by Liberman and Prince (1977). The association of the tones (grouped into accents – if the language has them – and boundary tones) with the metrical structure (in other words: the association of the tune with the text) represents the autosegmental aspect. The term autosegmental refers to the fact that the tune should be considered as reasonably autonomous with respect to the text – in fact they are represented as being on different tiers. A tune can thus be realised on a great many texts of different lengths and structures. However, the tune has to be anchored to the text at strategic points – these are the associations between the two tiers. The greatest advantage compared to the British School model is that tonal information can be precisely localised on single syllables and/or at the edges of phrases. In British School studies, the only direct connection between tones and text occurs on the nucleus. In most AM models, the nucleus does not have a special status. It is simply defined as the last fullyfledged pitch accent in a phrase, which means that there is no theoretical distinction between ‘prenuclear’ and ‘nuclear’ accents. A widely used autosegmental-metrical framework for the description of intonation is the ToBI (‘Tones and Break Indices’) system, which was originally developed as a transcription system for American English, but has since become a general framework for developing intonation systems. There is a transcription system for Standard German, ‘GToBI’, which is based on speech data mainly from Northern German speakers (see Grice and Baumann 2002, Grice, Baumann, and Benzmüller 2005 for an overview). A (G)ToBI record consists of at least three different levels of description, which can be thought of as corresponding to autosegmental tiers. These tiers contain labels for text, tones, and break indices. The text tier provides an orthographic transcription of the words spoken, the tones tier mirrors the perceived pitch contour in terms of tonal events such as pitch accents and boundary tones, and the break index tier marks the perceived strength of phrase boundaries. Pitch accents are associated with lexically stressed syllables, indicated by a starred (‘*’) tone placed within the limits of the accented word - generally at local F0 minima and maxima. Edge tones are assigned to phrase-final syllables, marked by ‘-’ or ‘%’ after the tone, signalling the edge of an intermediate (minor) phrase or a (major) intonation phrase, respectively (see section 1.2.). As an example, the utterance in (26), which consists of a single intonation phrase, would be transcribed in GToBI as in (28).
(28) MagdaLEna hat ein HAUS gekauft.   'Magdalena bought a house.'
          L*           H*        L-%
The first (prenuclear) accent in the phrase is realised low in the speaker’s pitch range, the second (nuclear) one high, thus transcribed L* and H*, respectively. The tonal movement before and between these targets does not have to be transcribed, since no pitch minima or maxima are reached. Rather, the target points can be thought of as being joined up by quasilinear ‘interpolation’. Finally, the falling nuclear movement is accounted for by the combination of a high accent and a low boundary tone (L-%). The combined notation of ‘-’ and ‘%’ stems from the fact that the end of each intonation phrase necessarily coincides with the end of an intermediate phrase, since a hierarchical structure is assumed.5 The original ToBI model has been extended as a general framework for developing intonation systems for a large number of languages and varieties. Complete ToBI systems including online training materials are available for English, German, Korean, Japanese and Greek. These and other ToBI systems are described in detail in a book (Jun 2005a), and training materials as well as a number of related papers can be accessed from the ToBI homepage (http://www.ling.ohio-state.edu/~tobi/). It is difficult to say which of these two models would work best teaching intonation to second language learners. The British School model is intuitively straightforward and has didactic origins. It is relatively easy to relate the transcription to an auditory impression. It is, however, very difficult to relate tonetic or interlinear transcriptions to F0 traces – something which might be a problem in an age where students have ever-increasing access to programmes which can estimate and display F0 contours. A further disadvantage of the British School model is that it is used less frequently than it used to be, so that research carried out for the purposes of preparing course materials must often be based on relatively old sources. Since pronunciation (including intonation) changes relatively quickly, both at a regional and standard level, this could be a problem, since any accompanying tapes will sound rather outdated and stilted. The autosegmental metrical model is more helpful for students who might be interested in looking at F0 contours as well as listening. Further, a knowledge of this model is indispensable for anyone wishing to search the current literature for information on a specific language, or for communication amongst or with theoretical intonation researchers.
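The idea that the contour is sparsely specified by tonal targets which are joined up by quasi-linear interpolation can be made concrete with a small sketch. Only the tone labels are taken from the GToBI transcription in (28); the target times and F0 values below are invented for illustration.

```python
# Sketch of (28) as sparse tonal targets joined by quasi-linear interpolation.
# The tone labels follow the GToBI transcription in (28); the times (s) and
# F0 values (Hz) are invented for illustration only.
import numpy as np

targets = [
    # (label, time_s, f0_hz)
    ("L*",  0.35, 120.0),   # low prenuclear accent on LE(na)
    ("H*",  0.95, 220.0),   # high nuclear accent on HAUS
    ("L-%", 1.40,  90.0),   # low phrase accent plus boundary tone at the end
]

times = np.array([t for _, t, _ in targets])
f0s   = np.array([f for _, _, f in targets])

# Sample the contour every 10 ms between the first and last target.
grid = np.arange(times[0], times[-1], 0.01)
contour = np.interp(grid, times, f0s)        # straight-line interpolation

for label, t, f in targets:
    print(f"{label:4s} at {t:.2f} s -> {f:.0f} Hz")
print(f"Interpolated contour: {len(grid)} samples, "
      f"{contour.min():.0f}-{contour.max():.0f} Hz")
```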
It must be stressed that both of the models are phonological in essence, and are therefore good for capturing the categories of the intonation system of a given language, but not suited to a detailed analysis of the finer phonetic details and gradient variation within a category. In other words, these models can be used for teaching what in segmental terms would be the 'phonemes' of a language, but not the allophonic variants.
4. Summary and conclusion
In this paper we have provided an overview of the communicative functions attributed to intonation, starting out from the two main tasks intonation performs, i.e. highlighting and the division of utterances into chunks. In the languages we examined here, highlighting is achieved by means of stress and accent. However, not all languages have pitch accents and/or lexical stress, such as Korean (Jun 2005b), which uses phrasing to indicate narrow focus. All languages make use of phrasing of some kind. Further, we have examined more specific linguistic and paralinguistic functions of intonation. At a clearly linguistic level, we have observed that intonation is not always used to disambiguate syntactically ambiguous structures but it can be in some languages in certain contexts (where disambiguation is necessary). As for information structure, givenness is expressed in some languages with deaccentuation, while in other languages there is no specific marking of givenness. Likewise, focus can be marked with certain types of accent. It is important to note, however, that not all languages use intonation to signal focus (e.g. Wolof; Rialland and Robert 2001). At the more paralinguistic level there appear to be more commonalities across languages but it is precisely these commonalities which lead to misunderstandings, since one language might interpret an utterance with high pitch as friendly (e.g. British English), whereas another might interpret the same utterance as emphatic (e.g. Dutch), a result which depends on the weighting of the frequency and effort codes (Chen 2005). Finally, we have outlined two influential models for transcribing intonation, the British School and the autosegmental-metrical approach. We have also provided links to further materials and exercises so that interested readers can hear examples in each model, and, in the autosegmental-metrical approach, in a number of languages.
Acknowledgements

We would like to thank Barbara Gili Fivela and Michelina Savino for their intuitions on Italian intonation, and Michelina Savino for providing the Italian recording.
Notes

1. The audio files of the example utterances can be found on the accompanying CD-ROM. Their numbers correspond to the numbers in the text. There is no audio file for example (14). Audio file (26) corresponds to examples (26), (27) and (28) in the text.
2. These two meanings of stress follow the British school approach, e.g. Crystal (1969). For Bolinger (1964), on the other hand, 'stress' is a strictly lexical feature, whereas 'accent' exclusively applies at the postlexical level.
3. It is important to point out that this distribution of accents in Italian is only a tendency; it is quite possible to have a nuclear accent on fuori and dentro as well.
4. They cannot be treated as entirely given, since they have not been mentioned in the immediately preceding context (here: A), and are thus candidates for pitch accents.
5. Due to the lack of a separate tonal target on the final syllable, an explicit symbol for tone immediately before the percentage '%' sign can be dispensed with. This notation is meant to increase the phonetic transparency of the contour, which used to be written as 'L-L%'.
References

Arvaniti, Amalia, D. Robert Ladd and Ineke Mennen. To appear. Tonal association and tonal alignment: Evidence from Greek polar questions and contrastive statements. Language and Speech.
Bach, Kent and Robert M. Harnish. 1979. Linguistic Communication and Speech Acts. Cambridge, MA: MIT Press.
Baumann, Stefan. 2006. The Intonation of Givenness – Evidence from German. (Linguistische Arbeiten 508.) Tübingen: Niemeyer.
Baumann, Stefan and Martine Grice. 2006. The intonation of accessibility. Journal of Pragmatics 38 (10): 1636–1657.
Beckman, Mary E. 1986. Stress and Non-Stress Accent. Dordrecht: Foris.
Blum-Kulka, Shoshana, Juliane House and Gabriele Kasper. 1989. Cross-Cultural Pragmatics: Requests and Apologies. Norwood, NJ: Ablex.
Bolinger, Dwight. 1964. Intonation: Around the edge of language. Harvard Educational Review 34: 282–296.
Bolinger, Dwight. 1985. The inherent iconism of intonation. In: John Haiman (ed.), Iconicity in Syntax, 97–108. Amsterdam and Philadelphia: John Benjamins.
Chen, Aoju. 2005. Universal and Language-Specific Perception of Paralinguistic Intonational Meaning. Utrecht: LOT.
Cohen, Antonie and Johan 't Hart. 1967. On the anatomy of intonation. Lingua 19: 177–192.
Cruttenden, Alan. 1997. Intonation. 2nd ed. Cambridge: Cambridge University Press.
Cruttenden, Alan. 2006. The de-accenting of old information: A cognitive universal? In: Giuliano Bernini and Marcia L. Schwartz (eds.), Pragmatic Organization of Discourse in the Languages of Europe, 311–356. The Hague: Mouton de Gruyter.
Crystal, David. 1969. Prosodic Systems and Intonation in English. Cambridge: Cambridge University Press.
Crystal, David. 1987. The Cambridge Encyclopedia of Language. Cambridge: Cambridge University Press.
Grice, Martine. 1995. The Intonation of Interrogation in Palermo Italian – Implications for Intonation Theory. Tübingen: Niemeyer.
Grice, Martine and Stefan Baumann. 2002. Deutsche Intonation und GToBI. Linguistische Berichte 191: 267–298.
Grice, Martine, Stefan Baumann and Ralf Benzmüller. 2005. German intonation in autosegmental-metrical phonology. In: Sun-Ah Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 55–83. Oxford: Oxford University Press.
Grice, Martine, Mariapaola D'Imperio, Michelina Savino and Cinzia Avesani. 2005. Strategies for intonation labelling across varieties of Italian. In: Sun-Ah Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 362–389. Oxford: Oxford University Press.
Grice, Martine, D. Robert Ladd and Amalia Arvaniti. 2000. On the place of phrase accents in intonational phonology. Phonology 17: 143–185.
Gussenhoven, Carlos. 1983. Focus, mode, and the nucleus. Journal of Linguistics 19: 377–417.
Gussenhoven, Carlos. 2002. Intonation and interpretation: Phonetics and phonology. In: Proceedings of the 1st International Conference on Speech Prosody, Aix-en-Provence, 47–57.
Gussenhoven, Carlos. 2004. The Phonology of Tone and Intonation. Cambridge: Cambridge University Press.
Halliday, Michael A. K. 1967. Intonation and Grammar in British English. The Hague: Mouton.
Hart, Johan 't, René Collier and Antonie Cohen. 1990. A Perceptual Study of Intonation: An Experimental-Phonetic Approach. Cambridge: Cambridge University Press.
Hirschberg, Julia and Cinzia Avesani. 2000. Prosodic disambiguation in English and Italian. In: Antonis Botinis (ed.), Intonation: Analysis, Modelling and Technology, 87–95. Dordrecht: Kluwer Academic Publishers.
Jun, Sun-Ah (ed.). 2005a. Prosodic Typology: The Phonology of Intonation and Phrasing. Oxford: Oxford University Press.
Jun, Sun-Ah. 2005b. Korean intonational phonology and prosodic transcription. In: Sun-Ah Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 201–229. Oxford: Oxford University Press.
Keating, Patricia, Taehong Cho, Cecile Fougeron and C. Hsu. 2003. Domain-initial articulatory strengthening in four languages. In: John Local, Richard Ogden and Rosalind Temple (eds.), Phonetic Interpretation (Papers in Laboratory Phonology 6), 143–161. Cambridge: Cambridge University Press.
Kohler, Klaus. 1991. Terminal intonation patterns in single-accent utterances of German: Phonetics, phonology and semantics. AIPUK 25: 115–185.
Ladd, D. Robert. 1980. The Structure of Intonational Meaning: Evidence from English. Bloomington: Indiana University Press.
Ladd, D. Robert. 1996. Intonational Phonology. Cambridge: Cambridge University Press.
Liberman, Mark and Alan Prince. 1977. On stress and linguistic rhythm. Linguistic Inquiry 8: 249–336.
Lieberman, Philip. 1967. Intonation, Perception, and Language. Cambridge, MA: MIT Press.
O'Connor, J. D. and G. F. Arnold. 1973. Intonation of Colloquial English. London: Longman.
Ohala, John J. 1983. Cross-language use of pitch: An ethological view. Phonetica 40: 1–18.
Ohala, John J. 1984. An ethological perspective on common cross-language utilization of F0 of voice. Phonetica 41: 1–16.
Peters, Jörg. 2004. Regionale Variation der Intonation des Deutschen: Studien zu ausgewählten Regionalsprachen. State doctorate thesis (Habilitationsschrift), University of Potsdam.
Pierrehumbert, Janet B. 1980. The Phonetics and Phonology of English Intonation. PhD thesis, MIT. Bloomington: Indiana University Linguistics Club.
Rialland, Annie and Stéphane Robert. 2001. The intonational system of Wolof. Linguistics 39: 893–939.
Roach, Peter J. 1994. Conversion between prosodic transcription systems: "Standard British" and ToBI. Speech Communication 15: 91–99.
Searle, John. 1969. Speech Acts: An Essay in the Philosophy of Language. Cambridge: Cambridge University Press.
Selkirk, Elisabeth. 1984. Phonology and Syntax: The Relation between Sound and Structure. Cambridge, MA: MIT Press.
Tench, Paul. 1996. The Intonation Systems of English. London: Cassell.
Uhmann, Susanne. 1991. Fokusphonologie: Eine Analyse deutscher Intonationskonturen im Rahmen der nicht-linearen Phonologie. Tübingen: Niemeyer.
Wichmann, Anne, Jill House and Toni Rietveld. 2000. Discourse constraints on F0 peak timing in English. In: Antonis Botinis (ed.), Intonation: Analysis, Modelling and Technology, 163–182. Dordrecht: Kluwer Academic Publishers.
Wightman, Colin W., Stefanie Shattuck-Hufnagel, Mari Ostendorf and Patti Price. 1992. Segmental durations in the vicinity of prosodic phrase boundaries. Journal of the Acoustical Society of America 92: 1707–1717.
Williams, Briony. 1996a. The status of corpora as linguistic data. In: Gerry Knowles, Anne Wichmann and Peter Alderson (eds.), Working with Speech: Perspectives on Research into the Lancaster/IBM Spoken English Corpus, 3–19. London and New York: Longman.
Williams, Briony. 1996b. The formulation of an intonation transcription system for British English. In: Gerry Knowles, Anne Wichmann and Peter Alderson (eds.), Working with Speech: Perspectives on Research into the Lancaster/IBM Spoken English Corpus, 38–58. London and New York: Longman.
Phonological and phonetic influences in non-native intonation

Ineke Mennen

1. Introduction
Just as poor pronunciation can make a foreign language learner very difficult to understand, poor prosodic and intonational skills can have an equally devastating effect on communication and can make conversation frustrating and unpleasant for both learners and their listeners. Language teachers have lately become more aware of this and have shifted the focus of their pronunciation teaching more towards the inclusion of suprasegmentals alongside segmentals, with a view to improving general comprehensibility (Celce-Murcia, Brinton and Goodwin 1996). It is therefore crucial for language teachers to be aware of current research findings in the area of foreign (second) language learning of prosody and intonation, the type of prosodic and intonational errors second language (L2) learners are likely to make, and in particular where these errors stem from. The focus of this chapter will be on intonation in L2 learning, but some related prosodic phenomena such as stress and rhythm will be touched upon. There is no doubt as to the importance of intonation in communication. Intonation not only conveys linguistic information, but also plays a key role in regulating discourse and is an important indicator of speaker identity, reflecting factors such as physical state, age, gender, psychological state and sociolinguistic membership. Intonation is also important for intelligibility (e.g. Laures and Weismer 1999; Maassen and Povel 1984). The use of an inappropriate intonation pattern may give rise to misunderstandings. Such misunderstandings can be major or minor depending on the context in which the intonation pattern is used. As there is no one-to-one correspondence between intonation and meaning, an appropriate meaning can often be found that fits with the 'wrong' intonation pattern. Furthermore, native listeners are used to a great deal of variation in the choice of intonation patterns, both within their regional variety and across varieties (e.g. Grabe, Kochanski, and Coleman, to appear).
Nevertheless, some patterns will clearly not be acceptable in some varieties, and the cumulative effect of continuously using slightly inappropriate intonation should not be underestimated. Given that we derive much of our impression about a speaker’s attitude and disposition towards us from the way they use intonation in speech, listeners may form a negative impression of a speaker based on the constantly occurring inappropriate use of intonation. For example, the relatively flat and low intonation of German learners of English may make them sound “bleak, dogmatic or pedantic, and as a result, English listeners may consider them uncompromising and self-opinionated” (Trim 1988, as quoted in Grabe 1998), an example which illustrates that impressions based on intonation may lead to ill-founded stereotypes about national or linguistic groups. Finally, intonational errors may contribute to the perception of foreign-accent (Jilka 2000). The aim of this chapter is to present a summary of commonly occurring problems in non-native intonation, as well as provide a reanalysis of some past and current research findings in terms of a framework of intonational analysis that separates phonological representation from phonetic implementation. Section 2 describes possible influences in non-native intonation, it explains the importance of making a distinction between intonational influence at a phonological and at a phonetic level, and it briefly summarises the model of intonation used in this chapter. In section 3, some intonational properties will be described which are likely to be affected in L2 speech production. Examples will be given of previous and current research with particular attention to phonological and phonetic influences in L2 intonation. Section 4 will discuss the implications of the reanalyses and new results for teaching and research.
2. Influences in non-native intonation
In a survey of major international journals in second language acquisition of the past 25 years carried out by Gut (this volume; personal communication), it was found that as few as 9 studies investigated intonation and tone. Only four of these studies were concerned with perception of intonation, the other five were production studies. A further search of conference proceedings and recent PhD theses revealed an additional tenfold of studies on L2 production of intonation. Most of the limited (and not very recent) studies of L2 production of intonation involve investigations of the errors made by learners from various language backgrounds when they acquire English
as an L2 (Backman 1979; Buysschaert 1990; De Bot 1986; Grover, Jamieson, and Dobrovolsky 1987; Jenner 1976; McGory 1997; Ueyama 1997; Willems 1982). These studies provide evidence that transfer or interference from the L1 is an important factor in the production of L2 intonation. Many similarities of errors were found in these studies, leading to assumptions about whether there are universal patterns in acquiring the intonational system of a second language. For example, Backman (1979) observed that the errors she found in her study of the English of Spanish learners showed remarkable similarities with errors Jenner (1976) found in his study on the English of Dutch learners. Errors in the production of L2 English intonation by speakers with different language backgrounds which appear similar across studies are:
– a narrower pitch range (Backman 1979; Jenner 1976; Willems 1982)
– problems with the correct placement of prominence (Backman 1979; Jenner 1976)
– replacement of rises with falls and vice versa (Adams and Munro 1978; Backman 1979; Jenner 1976; Lepetit 1989; Willems 1982)
– incorrect pitch on unstressed syllables (Backman 1979: too high; McGory 1997: too high; Willems 1982: no gradual rise on unaccented words preceding a fall)
– difference in final pitch rise (Backman 1979: too low; Willems 1982: too high [overshoot])
– starting pitch too low (Backman 1979; Willems 1982)
– problems with reset from low level to mid level after a boundary (Willems 1982)
– a smaller declination rate (Willems 1982)
(A sketch of how two of these dimensions might be quantified is given below.)
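Observations such as a 'narrower pitch range' or a 'smaller declination rate' only become comparable across speakers once they are quantified from an F0 track. The sketch below shows one possible operationalisation – pitch range as the 10th–90th percentile span of F0 in semitones, declination rate as the slope of a line fitted to the F0 of a phrase – which is illustrative only and is not the measure used in the studies cited above; the F0 values are invented.

```python
# Sketch: quantifying two of the error dimensions listed above from an F0 track.
# These operationalisations are illustrative, not those of the cited studies.
import numpy as np

def pitch_range_semitones(f0_hz):
    """80% span of F0 (10th to 90th percentile), expressed in semitones."""
    voiced = f0_hz[f0_hz > 0]                    # discard unvoiced frames
    lo, hi = np.percentile(voiced, [10, 90])
    return 12 * np.log2(hi / lo)

def declination_rate(times_s, f0_hz):
    """Slope of a straight line fitted to F0 over a phrase, in Hz per second."""
    voiced = f0_hz > 0
    slope, _intercept = np.polyfit(times_s[voiced], f0_hz[voiced], 1)
    return slope

# Invented 2-second phrase sampled every 10 ms: declining F0 plus an accent peak.
t = np.arange(0.0, 2.0, 0.01)
f0 = 180 - 15 * t + 40 * np.exp(-((t - 0.8) ** 2) / 0.01)

print(f"Pitch range: {pitch_range_semitones(f0):.1f} semitones")
print(f"Declination rate: {declination_rate(t, f0):.1f} Hz/s")
```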
Although it is true that some of the observed errors are similar, it should be emphasised that they all appeared in studies of English as a second language. So the similarities might be due to idiosyncrasies of the English intonational system. Furthermore, the similarities cannot be explained by developmental factors (due to the learning process) alone. For example, the fact that both Dutch and Spanish acquiring English intonation produce a smaller pitch range compared to native English speakers does not necessarily indicate that a reduction of pitch range is a universal tendency in L2 acquisition. The smaller pitch range in the data of the learners could simply be a case of transfer, since both Dutch (Jenner 1976) and Spanish
(Stockwell and Bowen 1965) are reported to have a smaller pitch range than English. It is therefore more likely that there is more than one process involved in the acquisition of L2 intonation, a conclusion which has also been reached in other fields of L2 acquisition. 2.1. Cross-linguistic analysis of intonation It should be noted that a comparison of the findings described in the previous section is not an easy task. The studies differ considerably with respect to the proficiency level of the learners, the languages under investigation, the number of subjects, and the framework or methodology used in the study. These differences in methodology prevent us from coming to any reliable conclusions about the similarities and differences between the languages investigated in these studies and the process of L2 acquisition of intonation. In order to establish intonational differences and similarities across languages which could cause the L1 and L2 intonation systems to influence one another, a generally agreed framework for analysing intonation needs to be used. Without such a model it is difficult to compare and interpret the importance of similarities and differences across languages in a reliable and uniform way. A model which has been used successfully to describe a wide range of languages (e.g. Jun 2004) and regional varieties (e.g. Grabe, Post, Nolan, and Farrar 2000; Fletcher, Grabe, and Warren 2004; Gilles and Peters 2004) is the model of intonational analysis developed by Pierrehumbert (1980) and Pierrehumbert and Beckman (1988). Mennen (2004) showed that this model can generate predictions about the degree of difficulty certain aspects of L2 intonation will present to L2 learners. Together with other studies that have begun to emerge using this model in studies of L2 intonation (Jilka 2000; Mennen 1998, 1999a, 1999b, 2004; Ueyama 1997; Jun and Oh 2000), it shows the enormous potential of this model for crosslinguistic studies. The most important principle of Pierrehumbert’s model is that it separates the phonological representation from its phonetic implementation, and intonation is viewed as consisting of a phonological and phonetic component. The phonological component consists of a set of high (H) and low (L) tones, which are further organised into pitch accents, boundary tones, and phrasal tones. The pitch accents have a starred tone to indicate their association with the stressed syllable, and can consist of a single tone (H* or L*) or a combination of two tones (e.g. HL*, H*L). Boundary tones are
indicated as H% or L% and associate with phrase margins. Phrasal tones are indicated as H- or L- (with a hyphen) and associate with the space between the last pitch accent and the boundary tone. The phonetic realisation of underlying tone sequences is usually defined along two parameters, the scaling (i.e. the f0 value) and the alignment (i.e. the temporal relation with the segmental string) of the tones (see further section 3.1).

The distinction between a phonetic and a phonological component in intonation is important as it suggests that languages can differ at both these levels. As a result, the L1 and L2 intonation systems may influence one another both at the level of phonological representation and at the level of phonetic implementation. A phonological influence would result from intonational differences in the inventory of phonological tunes, their form, and the meanings assigned to the tunes. A phonetic influence would result from a difference in the phonetic realisation of an identical phonological tune (Ladd 1997). An example of phonological influence is the use of rises where native speakers would use falls and vice versa, found in many studies of L2 intonation (e.g. Adams and Munro 1978; Backman 1979; Jenner 1976; Lepetit 1989; Willems 1982). An example of phonetic influence is the finding of a different pitch range (e.g. Mennen, this chapter) or a different slope of a rise (e.g. Ueyama 1997) compared to the monolingual norm. These types of influence roughly correspond to the types of influence evidenced at the segmental level, where phonological influence would result from cross-linguistic differences at the phonemic level (such as the use of the vowel /u/ instead of the target L2 vowel /y/ when that vowel is not in the L1 vowel inventory), and phonetic influence would result from differences in phonetic detail (such as differences in the implementation of the phonological voicing contrast: long-lag instead of short-lag voice onset times in the French productions of native speakers of English, such as those observed in Flege and Hillenbrand 1984).

Separating phonological representation from its phonetic implementation in non-native production of intonation makes it possible to determine the actual source of an L2 intonational error, beyond just establishing that it is due to interference from the L1. Once the source of the problem has been established it can be appropriately addressed by the language teacher and learner.
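To make the two levels concrete, the sketch below records a pitch accent once as a phonological label and once as a set of phonetic parameters (scaling and alignment), so that an L2 deviation can be classified as phonological (different label) or phonetic (same label, different realisation). It is purely illustrative: the class, the field names and the numbers are hypothetical and not taken from any published analysis tool.

```python
from dataclasses import dataclass

@dataclass
class PitchAccent:
    """Two-level record of a pitch accent: a phonological label plus its
    phonetic implementation (scaling and alignment), as described above."""
    label: str            # phonological category, e.g. "L*" or "L+H*"
    peak_hz: float        # scaling: f0 value of the tonal target (Hz)
    alignment_ms: float   # alignment: peak time relative to the onset
                          # of the accented syllable (ms)

# Hypothetical native vs. learner productions of "the same" rise:
native = PitchAccent(label="L+H*", peak_hz=230.0, alignment_ms=180.0)
learner = PitchAccent(label="L+H*", peak_hz=228.0, alignment_ms=60.0)

same_phonological_category = native.label == learner.label             # True
alignment_difference_ms = learner.alignment_ms - native.alignment_ms   # -120.0
# Same label but a large alignment difference: a phonetic, not a
# phonological, influence in the sense discussed in the text.
```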
3. Possible difficulties in L2 intonation
In this section a description will be given of some intonational properties which are likely to be affected in L2 speech production. Particular attention will be given to distinguishing phonetic from phonological influences in L2 intonation, where this distinction may not have been made in previous studies, and where results may have been interpreted incorrectly because no distinction has been made between phonological and phonetic influences. This section is by no means an exhaustive description of all intonational properties which can be influenced by differences between the L1 and L2 intonation systems. It is intended purely as an illustration of why it is important to distinguish between phonological and phonetic influences, and where this becomes relevant for language teachers.

3.1. Alignment

Alignment refers to the temporal relation of H and L tones with the segmental string (i.e. the timing of a peak or valley with respect to the vowels and consonants in speech). Recent research has suggested that alignment exhibits certain language- and dialect-specific characteristics, more or less like those found for voice onset time (Caramazza, Yeni-Komshian, Zurif, and Carbone 1973; Flege and Hillenbrand 1984). That is, the same phonological category may be realised (aligned) differently in different languages or dialects. Differences in alignment have been found, among others, in cross-dialectal studies of Swedish (Bruce and Gårding 1978) and Danish (Grønnum 1991), in ethnic subvarieties of Singapore English (Lim 1995), and in varieties of British English (Grabe, Post, Nolan, and Farrar 2000) and German (Atterer and Ladd 2004).

Cross-linguistic differences in alignment have not been investigated extensively. However, Ladd (1996) suggests that such differences can be found when comparing the intonation of languages. He illustrates this with an example of a certain type of fall, which he describes as "a local peak associated with the accented syllable, followed by a rapid fall to low in the speaking range, followed by a more gradual fall to the end of the phrase or utterance" (Ladd 1996: 128). This fall can occur in Italian as well as in English (or German). However, its realisation is different in these two languages. Where the peak in English (or German) is rather late (at or near the end of the stressed syllable), it is early in Italian. The following rapid fall in
English (or German) takes place between the stressed and following unstressed syllable, whereas in Italian the fall starts well before the following syllable. As a consequence, English or German learners of Italian may use their native alignment pattern when producing an Italian falling tune. In other words, the learner gets the phonological association right (i.e. the H* peak associates with the stressed syllable), but fails to produce the correct phonetic detail (i.e. the correct alignment). Figure 1 gives an example of such a mistake. As Italians would place the fall somewhere in the antepenultimate syllable, a delay of this fall may be interpreted by native Italians as a mistake in the placement of word stress, i.e. they may perceive this as stress on the penultimate, rather than on the antepenultimate syllable. So what in fact is a phonetic error is interpreted by native listeners as a phonological error. It is therefore important for language teachers to establish what the source of the error is: well-meant exercises to teach non-native speakers the correct stress placement may not be effective in this particular example, as the error is not misplaced word stress but rather a misalignment of the falling contour with the stressed syllable.
Figure 1. A schematic representation of alignment differences between non-native (left) and native (right) production of the Italian word ‘Mantova’, with a late peak in the non-native as compared to the native production.
It is for this reason that care needs to be taken when interpreting results on L2 intonation (especially when they are based on auditory observations only) which report errors in stress placement or replacement of rises with falls (e.g. Lepetit 1989; Backman 1979; Jenner 1976). Some of these errors may actually be phonetic errors (alignment errors) rather than phonological errors (misplaced stress). For example, Backman (1979), in her study on intonation errors of Venezuelan Spanish adult learners of American English, reports that the L2 learners often had problems with stress placement. However, visual inspection of some of the sample contours presented in her paper suggests that the Spanish learners tend to have an earlier alignment
of rise-falls in their L2 American English. In their utterances the F0 reaches its peak very early (before the accented syllable), and falls just before and during the beginning of the accented syllable. This may have caused the American judges to conclude that the stress was placed incorrectly (too early), since Americans would expect the falling pitch to occur much later.
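Alignment differences of the kind suspected in Backman's contours can be checked instrumentally rather than by ear. The following is a minimal sketch of such a measurement, assuming an f0 track and hand-labelled syllable boundaries are already available; the function name, the search margin and the returned measures are choices made for this illustration, not part of any of the cited studies.

```python
import numpy as np

def peak_alignment(times_s, f0_hz, syll_start_s, syll_end_s, margin_s=0.1):
    """Find the f0 peak near an accented syllable and express its timing
    relative to that syllable (negative values = peak before the onset)."""
    times = np.asarray(times_s)
    f0 = np.asarray(f0_hz)
    # search window: the accented syllable plus a margin on either side,
    # since peaks may be aligned just before or just after the syllable
    in_window = (times >= syll_start_s - margin_s) & (times <= syll_end_s + margin_s)
    peak_index = np.argmax(f0[in_window])
    peak_time = times[in_window][peak_index]
    duration = syll_end_s - syll_start_s
    return {
        "peak_delay_from_onset_ms": 1000 * (peak_time - syll_start_s),
        # 0 = syllable onset, 1 = syllable offset, >1 = after the syllable
        "relative_position_in_syllable": (peak_time - syll_start_s) / duration,
    }
```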
Figure 2. Waveform, spectrogram and F0 contour of [ˈotan epivraˈðiname to ˈvima mas] "When we slowed down our step" read as part of a statement by a native Greek speaker. The vertical lines delimit the beginning and end of the accented syllable of the prenuclear test word. The position of the peak is indicated by H and is aligned after the onset of the first postaccentual vowel.
Figure 3. Waveform, spectrogram and F0 contour of [ˈotan epivraˈðiname to ˈvima mas] "When we slowed down our step" read as part of a statement by a non-native speaker of Greek. The vertical lines delimit the beginning and end of the accented syllable of the prenuclear test word. The position of the peak is indicated by H and occurs within the accented syllable, unlike in native Greek.
There are very few studies which attempt to determine the extent to which the native alignment pattern carries over into the pronunciation of a second language. Mennen (2004) investigated how Dutch non-native speakers of Greek realised cross-linguistic differences in the alignment of a phonologically identical rise. Dutch and Greek share the same phonological structure in prenuclear rises (L+H), but the phonetic properties of the rise differ. Although in both languages the rise starts just before the accented syllable, in Dutch the peak is reached within the accented syllable whereas in Greek the peak is consistently aligned after the onset of the first postaccentual vowel. It was found that even after many years of experience with the L2 and despite their excellent command of the L2, the majority of the Dutch speakers carried over the phonetic details of their L1 rise into their pronunciation of the L2. Four out of five speakers aligned the rise considerably earlier than the native Greek speakers, as shown in Figures 2 and 3. Nevertheless, one speaker managed to align the rise as late as the native Greek speakers. Given that the subjects in this study were all very experienced with the L2 and were considered to be near-native, the findings suggest that it may be difficult – although not impossible – to learn the phonetic implementation of underlying tone sequences and that this may be acquired rather late in the acquisition process. It is conceivable that L2 learners may acquire phonological properties of intonation earlier than their phonetic implementation (as suggested by Mennen 1999, 2004; Ueyama 1997). Such implementation difficulties were also found in a study of German speakers of English who carried over native German patterns of alignment into their L2 English (Atterer and Ladd 2004), suggesting that this may be a more common phonetic error than previously thought.

As it is suggested in the literature that temporal properties of speech may influence the intelligibility of utterances produced by non-native speakers (Tajima, Port, and Dalby 1997), it is quite possible that an adjustment of peak alignment will lead to improved intelligibility and less foreign-accented speech. However, perception studies would need to be carried out to establish the relative contribution of alignment patterning to intelligibility and the perception of foreign accent.

3.2. Word stress and nuclear placement

It is generally accepted that L2 learners often have difficulty with the correct placement of word stress, especially in the initial stages of the learning
process (e.g. Adams and Munro 1978; Archibald 1992; Fokes and Bond 1989; Wenk 1985). Also, studies on the teaching of L2 prosody suggest (although based to a large extent on impressionistic observations) that word stress needs to be given special attention in the classroom (e.g. Anderson-Hsieh, Johnson, and Koehler 1992; Buysschaert 1990). Alongside difficulties with prominence within a word, L2 learners also seem to experience difficulty with the correct placement of prominence at the sentence level (e.g. Backman 1979; Jenner 1976).

Just as a language can have phonemic contrasts, like a contrast between a voiced and a voiceless stop (/d/-/t/), the prominence system within a language is also a system of contrasts. A word is produced with more acoustic salience, or prominence, in order to contrast that word with other less prominent words. Just as phonemes serve to distinguish one word from another, a system of prominence allows a speaker to contrast the relative importance of words.

Both Jenner (1976) and Backman (1979) report that language learners often move the most prominent word of the sentence (the main or nuclear accent) too far to the left in their L2 utterances. Again, it is not clear whether this is caused by a phonetic or a phonological error (as explained in the previous section). Most of the test sentences Backman (1979) presents in her study consist of monosyllabic words only. If the Spanish learners of English have aligned the rise-fall in a sentence like “I'm late” too early, with the peak occurring just before the onset of the word “late”, native Americans may have perceived this as a prominence on “I'm”. This may have led to the perception of a shift of the nuclear accent to the left.¹ For this reason, these results have to be interpreted with caution.

Another reason for questioning the results obtained in the above-mentioned studies is the fact that the use of acoustic cues to signal stress may differ across languages. Beckman (1986), for example, suggested that even though languages use the same parameters to signal stress, their relative importance is language-specific. For example, Americans use all four perceptual cues to stress (F0, duration, amplitude, and spectral coefficient) to the same extent, whereas Japanese use F0 cues to a much greater extent than other cues to stress (Beckman 1986). As a consequence, when listening to American English, Japanese listeners will rely mainly on F0 cues, and may disregard other cues to stress which should influence their perception of stress.

In production there also seem to be cross-linguistic differences in the cues used to signal stress. For example, Adams and Munro (1978) found a difference in the production of sentence stress between native and non-
native speakers of English. Adams and Munro found that the “real difference between the stress production of the two groups lay not in the mechanisms they used to signal the feature [stress], but rather in their distribution of it...” (p. 153). In a similar study Fokes and Bond (1989) found that much the same is true for word stress.

If it is true that the acoustic correlates of stress differ across languages, results of studies relying on native speakers' judgements of stress placement by non-native speakers have to be interpreted with caution. Native judges may presuppose certain acoustic cues to stress other than the ones produced by non-native speakers. It is therefore possible that the non-native speakers described in these studies do not actually produce errors in stress placement, but merely differ in the relative importance of the cues used to produce stress. A study by Low and Grabe (1999) seems to support this explanation. Their results indicate that the widely reported claim (based on native British English listener judgements) that British English and Singapore English differ in stress placement is not borne out. Their experimental data suggest that the apparent word-final stress in Singapore English (as opposed to the word-initial stress in British English) in words like flawlessly is not the result of a difference in lexical stress placement. Instead, it seems that Singapore English and British English differ in the phonetic realisation of stress, with more phrase-final lengthening and a lack of “deprominencing” in F0 in Singapore English compared to British English. As a result, Low and Grabe argue that “the location of stress (or even its presence) cannot be judged impressionistically in any cross-linguistically valid way.”

It may therefore not always be helpful to give L2 learners exercises to practise L2 stress placement, as in some cases learners may already be producing stress in the appropriate position in the word or sentence. However, they may not be producing stress using the same cues as native speakers do. It is therefore important to establish whether the difficulty the learner experiences is caused by a phonological influence from the L1 (i.e. misplaced word or sentence stress) or by a phonetic error (i.e. use of different cues to signal stress).

3.3. Pitch range

There is growing evidence that pitch range – besides other common influences such as anatomy/physiology, regional background, emotional state, and many others – is influenced by a speaker's language background (e.g.
Van Bezooijen 1995; Scherer 2000). It is thought that cultures or languages have their particular 'vocal image', which reflects socio-culturally desired personal attributes and social roles, and that speakers choose a pitch (within their anatomical/physiological range) that approximates the vocal image they want to project (Ohara 1992). Listeners are very sensitive to these features, as evidenced by a wealth of research that relates the independent contribution of pitch to a class of character types (e.g. Ladd, Silverman, Tolkmitt, Bergmann, and Scherer 1985; Patterson 2000), showing, among other things, that the wider their pitch range the more positively speakers are characterised.

There is no doubt that people hear differences in pitch range between a variety of languages. There is strong anecdotal evidence that people perceive differences between, for example, English and German – with English sounding higher and having more pitch variation than German (which is believed to be spoken with a relatively low and flat pitch). English speech (especially female speech) is often perceived as "überspannt" and "zu stark aufgedreht" (over the top) by German listeners (Eckert and Laver 1994: 145). This belief has even found its way into the German film industry, which uses German dubbing actors with a lower pitch and narrower pitch range than those of the original English actors (Eckert and Laver 1994). Such beliefs are also expressed in language descriptions and manuals. For example, Gibbon (1998) refers to a smaller pitch range in German compared to English. Conversely, Germans feel that the pitch of an English speaker's voice wanders meaninglessly if agreeably up and down (Trim 1988).

Languages are believed to differ both in the average pitch height at which they are spoken and in the range of frequencies that are usually used. Ladd (1996) refers to these dimensions of variation in terms of level (i.e. the overall pitch height) and span (i.e. the range of frequencies). Cross-linguistic comparisons of level – and to a lesser extent span – have been carried out for a wide range of languages (e.g. Braun 1994). These studies provide some evidence for the existence of language-specific differences in pitch range, and the reported differences are usually explained by assuming an influence of socio-cultural factors on pitch.

Intriguingly, while there are very few studies on bilingual production of pitch range, there is a suggestion that bilingual speakers vary their pitch range according to the language they are speaking. For example, Braun (1994) and Gfroerer and Wagner (1995) report a different level in the languages of German/Turkish bilinguals (with a higher pitch in their Turkish than in their German), and Jilka (2000) reports a difference in span but not
in level for German/American bilinguals (i.e. with a wider span in their American English).

Cross-linguistic comparisons of pitch range in L1 and L2 intonation have all been based on long-term distributional measures (statistical moments), and there appears to be no agreement in these studies as to what constitutes pitch range. For level, measures of mean f0 and median f0 have been used. For span, measures used include maximum minus minimum f0, four standard deviations around the mean, the difference between the 95th and 5th percentile (90% range), and the difference between the 90th and 10th percentile (80% range). More recent work by Patterson (2000) suggests that there are some problems with using long-term distributional properties of f0, since they assume an even distribution of f0 around the mean and their results may be affected by spurious measures (e.g. octave errors). These measures also showed a lack of correlation with listener judgments of speaker characteristics and therefore lacked perceptual validity (Patterson 2000). Furthermore, the majority of cross-linguistic studies of pitch fail to control for factors influencing f0 (including regional accent, physiology/anatomy, and type of speech materials), making it impossible to tease out the influence of the language itself.

An alternative is to link measures of span and level to specific turning points (i.e. local minima and maxima) in the f0 contour (Patterson 2000). Patterson (2000) showed that such measures better characterise pitch range than the more commonly used long-term distributional measures. Specifically, the linguistic measures were shown to be more perceptually valid in that they correlated better with listener judgments of speaker characteristics.

Scharff (2000) recorded a small set of materials, which was subsequently analysed by Mennen (this chapter) and is presented here for the first time. Span and level were investigated – using Patterson's (2000) method – in three groups of speakers: a group of twelve monolingual native speakers of German (from the area of Stuttgart), a group of ten monolingual native speakers of English (from the area of Newcastle upon Tyne), and a group of twelve German non-native speakers of English (who all lived in or around Newcastle upon Tyne). All speakers were female, between the ages of twenty and forty, and non-smokers. The non-native speakers were advanced speakers of English and had a length of residence in Britain of over five years. They were all asked to read a phonetically balanced passage (“The North Wind and the Sun” / “Der Nordwind und die Sonne”) in their respective language(s).
Figure 4. The three selected target points in each sentence of the passage. From these points span and level were calculated. Span is defined as the average of a speaker’s M minus the average of a speaker’s V (in semitones). Level is calculated as the average of a speaker’s L% (in Hertz).
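As a worked illustration of these definitions, the sketch below computes span and level from hand-measured target points (M, V and L% values in Hz). Whether averaging happens before or after the conversion to semitones is not stated in the text; this sketch converts the averaged values, and the function names and example numbers are mine.

```python
import math

def semitone_distance(f_high_hz, f_low_hz):
    """Interval between two frequencies in semitones: 12 * log2(high/low)."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)

def span_and_level(peaks_m_hz, valleys_v_hz, final_lows_hz):
    """Span = mean of non-initial accent peaks (M) minus mean of
    post-accentual valleys (V), in semitones (and in Hz for comparison);
    level = mean of sentence-final lows (L%), in Hz."""
    mean_m = sum(peaks_m_hz) / len(peaks_m_hz)
    mean_v = sum(valleys_v_hz) / len(valleys_v_hz)
    span_st = semitone_distance(mean_m, mean_v)
    span_hz = mean_m - mean_v
    level_hz = sum(final_lows_hz) / len(final_lows_hz)
    return span_st, span_hz, level_hz

# Hypothetical measurements for one speaker (values invented for illustration):
span_st, span_hz, level_hz = span_and_level(
    peaks_m_hz=[230, 224, 241], valleys_v_hz=[180, 176, 184], final_lows_hz=[168, 171, 165]
)
```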
Following Patterson (2000), measurements were taken at three selected target points in each sentence of the passage. These target points were: all non-initial accent peaks (M); all post-accentual valleys, i.e. the low pitch of unaccented words (V); and all sentence-final lows (L%). The target points are exemplified in Figure 4. From these measures the span and level were calculated for each speaker. Span was defined as the difference between the average of a speaker's non-sentence-initial peaks and their average of post-accentual valleys (i.e. M minus V). The span measures were expressed in semitones (ST) since it is suggested that this best captures pitch range variation (Nolan, Asu, Aufterbeck, Knight, and Post 2002). Level was defined as the average of a speaker's sentence-final lows (L%), and was expressed in Hertz (Hz) rather than ST (since ST is not a suitable scale for measuring level due to its logarithmic nature).

Figure 5 gives the information from Table 1 as a visual representation of the span and level measurements for all twenty-two speakers in a scattergraph. From this figure it can be seen that level and span measures are largely independent: there are clearly speakers with a narrow span but a spread of differing levels (e.g. speakers 10 and 13), and likewise speakers with very similar levels but a wide range of spans (e.g. speakers 3 and 18). Nevertheless, there is a clustering of the native German speakers at the lower end of the x-axis (representing span) in the
figure, with the native English speakers clustering mostly at the higher end of the x-axis. There are some exceptions to this pattern. Two native English speakers (13 and 15) cluster at the lower end of the x-axis (similar to the majority of the native German speakers), but they also cluster at the higher end of the y-axis with a very high level. This suggests that native English speakers may have a wider pitch span and/or a higher level than the native German speakers.

Table 1. The means of span (in ST and Hz) and level (in Hz) measurements for each of the native speakers.

SPEAKER   LANGUAGE   SPAN (ST)   SPAN (Hz)   LEVEL (Hz)
1         German     6.26        98.8        170.2
2         German     5.71        74.1        182.6
3         German     4.23        52.7        159.3
4         German     4.82        61.0        166.7
5         German     5.44        72.0        182.3
6         German     5.98        79.0        182.6
7         German     6.20        75.7        149.2
8         German     5.96        75.4        184.0
9         German     5.57        69.5        149.9
10        German     4.59        54.3        141.6
11        German     5.71        58.4        134.1
12        German     4.53        51.9        161.9
13        English    4.73        69.5        215.5
14        English    7.48        98.0        157.0
15        English    4.70        65.7        188.0
16        English    8.43        101.5       147.0
17        English    9.43        137.4       175.3
18        English    7.66        94.13       160.0
19        English    7.95        87.6        145.0
20        English    6.57        81.6        162.7
21        English    7.92        108.9       172.0
22        English    6.74        79.5        146.0

Overall   German     5.42        68.6        163.7
Overall   English    7.16        92.39       166.9
Figure 5. Level (Hz) and span measures (ST) for twelve monolingual German females and ten monolingual English females. Stars represent the measures for the native German females, plus signs represent the native English females. The numbers represent the different speakers.
Table 2 shows the averaged pitch range results for each of the non-native speakers in each of their two languages. Results for the non-native speakers showed that neither span nor level differed across the two languages of the non-native speakers (for span ST F < 1, n.s.; for span Hz F < 1, n.s.; for level Hz F < 4, n.s.), although there was a tendency for a higher level in their English which failed to reach significance due to a lack of statistical power (p=0.059). Figure 6 illustrates span and level measurements in the English and German spoken by ten non-native speakers (due to some problems with transfer of the data, data for two of the speakers had to be excluded). When inspecting this scattergraph, it becomes clear that although more than half
of the speakers have a higher level in their English than in their German, only two speakers have a wider span in their English. It thus appears that the majority of the non-native speakers are adjusting only one of the dimensions of pitch range, possibly the less common dimension of pitch range in native English – something that has, to our knowledge, never been suggested before.

Table 2. The means of span and level measurements for each of the non-native speakers. On the left are the means for their German, on the right the means for their English.

                    GERMAN                                ENGLISH
SPEAKER   SPAN (ST)   SPAN (Hz)   LEVEL (Hz)    SPAN (ST)   SPAN (Hz)   LEVEL (Hz)
1         4.36        53.2        157.8         5.93        69.6        168.0
2         5.52        70.6        148.0         8.82        110.3       148.0
3         9.12        103.3       131.2         9.02        108.6       154.0
5         7.03        74.8        128.7         6.02        67.0        146.3
6         4.19        53.7        184.0         4.09        51.0        187.0
7         6.41        67.0        133.5         5.33        58.1        141.7
8         5.84        68.3        150.7         5.74        69.8        155.0
9         5.01        56.6        148.0         5.08        57.8        149.6
10        8.20        97.4        137.0         6.39        75.7        148.3
11        8.82        113.1       135.0         4.88        59.6        163.6
Total     6.45        75.8        145.4         6.13        77.4        158.3
Figure 6 also illustrates that speakers do not all follow the same strategy in their different languages. For example, speaker 1 has a wider span and a higher level in her English, whereas speaker 3 has a higher level in her English but a span that is similar across the two languages. Speaker 11 has a higher level but a considerably narrower span in her English, as do speakers 10 and 7. Speaker 2, on the other hand, has a wider span in her English than in her German, but a similar level in both languages. Audio examples are provided for the bilingual speaker 1 in English (EB1) and German (GB1), and for speaker 8 in English (EB8) and German (GB8).

It is important to pay attention to such socio-phonetic differences in the use of pitch range between languages, particularly since they influence the way we perceive one another. Given that wider pitch ranges are generally perceived more positively, speakers of languages with a habitually nar-
rower pitch range may be perceived more negatively by speakers of languages with a wider pitch range, and vice versa. It is likely that the negative perceptions of German speakers described in section 1 could be partly due to such differences in pitch range. In order to avoid such misperceptions and misplaced stereotypes, it is important to address these differences in language pedagogy.
Figure 6. Level (Hz) and span (ST) measures for ten female German non-native speakers of English. Stars represent the measures for their German data and plus signs represent their English measures. The numbers represent the different speakers.
4. Summary and future directions
The aim of this chapter was to provide a summary of some of the most commonly occurring problems in non-native intonation, to reanalyse some past and current research findings in terms of a framework of intonational analysis that separates phonological representation from phonetic implementation, and to demonstrate the usefulness of such a distinction in L2 prosody teaching. It was suggested that L2 learners may go through different stages in the learning process and may first acquire phonological patterns of L2 intonation before they acquire the correct phonetic implementation of these patterns. This assumption was based on studies by Mennen (1999, 2004) which showed that native Dutch speakers who speak Greek near-natively were perfectly able to produce the correct phonological tonal elements but implemented these structures by using L1 phonetic regularities. This finding confirmed observations reported in a small-scale study by Ueyama (1997). Further research is necessary to verify this hypothesis quantitatively for different phonological and phonetic aspects of L2 intonation.

Examples were given throughout the chapter to illustrate that intonational errors observed in L2 speech may not be what they seem and that a perceptually similar error may in fact have different underlying causes, which can be either difficulties with the phonological structure of the L2 or with its phonetic realisation. It was emphasised that it is important for teaching purposes to distinguish between phonological and phonetic errors, so that the source of the problem can be addressed in teaching. Only by careful comparisons of the language pairs using a commonly agreed framework of intonational analysis will it be possible to establish where the errors originate from. Further analyses of different language pairs are necessary if we want to incorporate this in pronunciation pedagogy in foreign language teaching.
Notes

1. Unfortunately, it is not possible to inspect Jenner's (1976) data, as in his study no acoustic data are presented to support his conclusion.
References

Adams, Corinne and R. R. Munro. 1978. In search of the acoustic correlates of stress: Fundamental frequency, amplitude, and duration in the connected utterances of some native and nonnative speakers of English. Phonetica 35, 125–156.
Anderson-Hsieh, Janet, R. Johnson and Kenneth Koehler. 1992. The relationship between native speaker judgments of nonnative pronunciation and deviance in segmentals, prosody, and syllable structure. Language Learning 42, 529–555.
Archibald, John. 1992. Adult abilities in L2 speech: Evidence from stress. In: James Leather and Alan James (eds.), New Sounds 92. Proceedings of the 1992 Amsterdam Symposium on the Acquisition of Second-Language Speech. Amsterdam: University of Amsterdam.
Atterer, Michaela and D. Robert Ladd. 2004. On the phonetics and phonology of "segmental anchoring" of F0: evidence from German. Journal of Phonetics 32, 177–197.
Backman, Nancy Ellen. 1979. Intonation errors in second language pronunciation of eight Spanish speaking adults learning English. Interlanguage Studies Bulletin 4, 239–266.
Beckman, Mary. 1986. Stress and Non-stress Accent. Dordrecht: Foris.
Braun, Angelika. 1994. Sprechstimmlage und Muttersprache. Zeitschrift für Dialektologie und Linguistik LXI (2), 170–178.
Bruce, Gösta and Eva Gårding. 1978. A prosodic typology for Swedish dialects. In: Eva Gårding, Gösta Bruce and Robert Bannert (eds.), Nordic Prosody, 219–228. Lund: Lund University, Department of Linguistics.
Buysschaert, Joost. 1990. Learning intonation. In: James Leather and Alan James (eds.), New Sounds 92. Proceedings of the 1992 Amsterdam Symposium on the Acquisition of Second-Language Speech. Amsterdam: University of Amsterdam.
Caramazza, Alfonso, Grace H. Yeni-Komshian, E. B. Zurif and E. Carbone. 1973. The acquisition of a new phonological contrast: The case of stop consonants in French-English bilinguals. Journal of the Acoustical Society of America 54, 421–428.
Celce-Murcia, Marianne, Donna M. Brinton and Janet Goodwin. 1996. Teaching Pronunciation. A Reference for Teachers of English to Speakers of Other Languages. Cambridge: Cambridge University Press.
De Bot, Kees. 1986. The transfer of intonation and the missing data base. In: Eric Kellerman and Michael Sharwood Smith (eds.), Crosslinguistic Influences in Second Language Acquisition. New York: Pergamon Press.
Eckert, Hartwig and John Laver. 1994. Menschen und ihre Stimmen: Aspekte der vokalen Kommunikation. Weinheim: Psychologie Verlags Union.
Flege, James Emil and James Hillenbrand. 1984. Limits on phonetic accuracy in foreign language speech production. Journal of the Acoustical Society of America 76, 708–721.
Fletcher, Janet, Esther Grabe and Paul Warren. 2004. Intonational variation in four dialects of English: the high rising tune. In: Sun-Ah Jun (ed.), Prosodic Typology. The Phonology of Intonation and Phrasing, 390–409. Oxford: Oxford University Press.
Fokes, Joann and Z. S. Bond. 1989. The vowels of stressed and unstressed syllables in nonnative English. Language Learning 39, 341–373.
Gfroerer, Stefan and Isolde Wagner. 1995. Fundamental frequency in forensic speech samples. In: Angelika Braun and Jens-Peter Köster (eds.), Studies in Forensic Phonetics, 41–48. Trier: Wissenschaftlicher Verlag Trier.
Gibbon, Dafydd. 1998. German intonation. In: Daniel Hirst and Albert Di Cristo (eds.), Intonation Systems. A Survey of Twenty Languages, 78–95. Cambridge: Cambridge University Press.
Gilles, Peter and Jörg Peters. 2004. Regional Variation in Intonation. Tübingen: Niemeyer Verlag.
Grabe, Esther. 1998. Comparative Intonational Phonology: English and German. MPI Series in Psycholinguistics 7. Wageningen: Ponsen en Looien.
Grabe, Esther, Greg Kochanski and John Coleman. 2005. The intonation of native accent varieties in the British Isles – potential for miscommunication? In: Katarzyna Dziubalska-Kolaczyk and Joanna Przedlacka (eds.), English Pronunciation Models: A Changing Scene (Linguistic Insights. Studies in Language and Communication, Vol. 21). Frankfurt/Main: Peter Lang.
Grabe, Esther, Brechtje Post, Francis Nolan and Kimberley Farrar. 2000. Pitch accent realization in four varieties of British English. Journal of Phonetics 28, 161–185.
Grønnum, Nina. 1991. Prosodic parameters in a variety of regional Danish standard languages. Phonetica 47, 188–214.
Grover, Cinthia, Donald G. Jamieson and Michael B. Dobrovolsky. 1987. Intonation in English, French and German: perception and production. Language and Speech 30, 277–296.
Jenner, Bryan. 1976. Interlanguage and foreign accent. Interlanguage Studies Bulletin 1, 166–195.
Jilka, Matthias. 2000. The Contribution of Intonation to the Perception of Foreign Accent. Ph.D. dissertation, Institute of Natural Language Processing, University of Stuttgart.
Jun, Sun-Ah and Mira Oh. 2000. Acquisition of 2nd language intonation. Proceedings of the International Conference on Spoken Language Processing, Beijing, 4, 76–79.
Jun, Sun-Ah (ed.). 2004. Prosodic Typology. The Phonology of Intonation and Phrasing. Oxford: Oxford University Press.
Ladd, D. Robert. 1996. Intonational Phonology. Cambridge: Cambridge University Press.
Ladd, D. Robert, K. E. A. Silverman, F. Tolkmitt, G. Bergmann and Klaus R. Scherer. 1985. Evidence for the independent function of intonation contour type, voice quality, and f0 range in signaling speaker affect. Journal of the Acoustical Society of America 78, 435–444.
Laures, Jacqueline S. and Gary Weismer. 1999. The effects of a flattened fundamental frequency on intelligibility at the sentence level. Journal of Speech, Language and Hearing Research 42, 1148–1156.
Lepetit, Daniel. 1989. Cross-linguistic influence on intonation: French/Japanese and French/English. Language Learning 39, 397–413.
Lim, Lisa. 1995. A contrastive study of the intonation patterns of Chinese, Malay and Indian Singapore English. Proceedings of the 13th International Congress of Phonetic Sciences, Stockholm, 402–405.
Low, Ee Ling and Esther Grabe. 1999. A contrastive study of prosody and lexical stress placement in Singapore English and British English. Language and Speech 42, 39–56.
Maassen, Ben and Dirk-Jan Povel. 1984. The effect of correcting fundamental frequency on the intelligibility of deaf speech and its interaction with temporal aspects. Journal of the Acoustical Society of America 76, 1673–1681.
McGory, J. T. 1997. Acquisition of Intonational Prominence in English by Seoul Korean and Mandarin Chinese Speakers. Ph.D. dissertation, Ohio State University.
Mennen, Ineke. 1998. Second language acquisition of intonation: the case of peak alignment. In: M. C. Gruber, D. Higgins, K. Olson and T. Wysocki (eds.), Chicago Linguistic Society 34, Volume II: The Panels, 327–341. Chicago: University of Chicago.
Mennen, Ineke. 1999a. The realisation of nucleus placement in second language intonation. Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco, 555–558.
Mennen, Ineke. 1999b. Second Language Acquisition of Intonation: the Case of Dutch Near-Native Speakers of Greek. Ph.D. dissertation, University of Edinburgh.
Mennen, Ineke. 2004. Bi-directional interference in the intonation of Dutch speakers of Greek. Journal of Phonetics 32, 543–563.
Nolan, Francis, Eva Lina Asu, Margit Aufterbeck, Rachel Knight and Brechtje Post. 2002. Intonational pitch equivalence: an experimental evaluation of pitch scales. Paper presented at the BAAP Colloquium, University of Newcastle.
Ohara, Yumiko. 1992. Gender dependent pitch levels: A comparative study in Japanese and English. In: K. Hall, M. Bucholtz and B. Moonwomon (eds.), Locating Power: Proceedings of the Second Berkeley Women and Language Conference 2, 478–488. Berkeley.
Patterson, David. 2000. A Linguistic Approach to Pitch Range Modelling. Ph.D. dissertation, Department of Linguistics, University of Edinburgh.
Pierrehumbert, Janet. 1980. The Phonology and Phonetics of English Intonation. Ph.D. dissertation, MIT.
Pierrehumbert, Janet and Mary Beckman. 1988. Japanese Tone Structure. Cambridge, MA: MIT Press.
Scharff, Wiebke. 2000. Speaking Fundamental Frequency Differences in the Language of Bilingual Speakers. Unpublished Masters dissertation, Human Communication Sciences, University of Newcastle upon Tyne.
Scherer, Klaus R. 2000. A cross-cultural investigation of emotion inferences from voice and speech: implications for speech technology. Proceedings of the 6th International Conference on Spoken Language Processing, Beijing, 2, 379–382.
Stockwell, R. and D. J. Bowen. 1965. The Sounds of English and Spanish. Chicago: University of Chicago Press.
Tajima, Keiichi, Robert Port and Jonathan Dalby. 1997. Effects of temporal correction on intelligibility of foreign-accented English. Journal of Phonetics 25, 1–24.
Trim, J. L. M. 1988. Some contrastive intonated features of British English and German. In: J. Klegraf and D. Nehls (eds.), Essays on the English Language and Applied Linguistics on the Occasion of Gerhard Nickel's 60th Birthday, 235–249. Heidelberg: Julius Groos.
Ueyama, Motoko. 1997. The phonology and phonetics of second language intonation: the case of "Japanese English". Proceedings of the 5th European Conference on Speech Communication and Technology, Rhodes (Greece), 2411–2414.
Van Bezooijen, Renée. 1995. Sociocultural aspects of pitch differences between Japanese and Dutch women. Language and Speech 38, 253–265.
Wenk, Brian. 1985. Speech rhythms in second language acquisition. Language and Speech 28, 157–174.
Willems, Nico J. 1982. English Intonation from a Dutch Point of View. Dordrecht: Foris Publications.
Different manifestations and perceptions of foreign accent in intonation

Matthias Jilka

1. Introduction
The teaching of prosody to second language ("L2") learners often suffers from the problem that while an individual error can be readily pointed out and corrected, it is much more difficult to formulate general rules that provide guidance in speech production. Due to the inherent complexity of the intonation-related aspects of foreign-accented speech, knowledge about the nature of such errors and the corresponding teaching methods are not as well developed as they are with respect to the segmental aspects of a second language. This study aims to offer a general overview of those aspects of intonation and their interaction with second language acquisition that need to be taken into consideration when attempting to identify and classify intonational foreign accent. Concrete examples of such manifestations of intonational foreign accent are provided by an analysis of German productions by native speakers of American English and, vice versa, English productions by native speakers of German. All example utterances are available for listening on the enclosed CD-Rom.

It is obvious that non-native intonation can exhibit foreign accent and that such intonational characteristics also make crucial contributions to the overall impression of foreign accent. The – non-trivial – difficulty lies in two tasks: accurately identifying exactly those intonational deviations that actually constitute relevant manifestations of foreign accent, and ascertaining the relative significance of these deviations. Without this knowledge it will remain unclear which of the non-native speaker's concepts of intonational organization are actually responsible for the foreign-accented intonation. Consequently, it would be just as unclear which specific intonational characteristics should be tackled in pronunciation teaching.
For this reason, a number of insights will first be presented concerning the identification, classification and analysis of the specific characteristics of intonation that influence the determination of the prosodic phenomena responsible for the perception of foreign accent. This should create a greater awareness of the role intonation plays in foreign accent, which will be helpful especially to the community of professional language teachers, e.g. of German as a foreign language.

This study identifies four major intonation-specific factors that are assumed to determine the causes, manifestations and perception of foreign accent-related intonational deviations. Section 2 introduces the first of these factors, namely the problem of perspective, which states that our perception of any tonal event strongly depends on the chosen model of intonation description. Following this, the possible different sources of intonation errors – ranging from straightforward transfer of a tonal event from the speaker's native language to seemingly "unmotivated" deviations – are presented in section 3. The great variability of intonation, which is connected to the large number of potential contexts and (un)intended interpretations, is discussed next in section 4, also with regard to the existence of tonal deviations that are not necessarily perceived as non-native by themselves but can accumulate to create such an impression. Finally, in section 5, it is demonstrated that a common overall impression of foreignness is created by the "cooperation" of several types of tonal deviations.
2. Influence of differences in perspective
Quite obviously, the perception of a particular manifestation of intonational foreign accent is shaped by the chosen model of intonation description, as it is the medium which must express the corresponding tonal deviations. This also means that it cannot be determined with absolute certainty which model, if any, reflects the true representation of an intonational phenomenon – with the exception of those cases where it can be demonstrated that a particular model is insensitive to tonal deviations that a different form of representation has shown to be relevant. It is thus just as likely that a particular model of representation will distort the causes of intonational foreign accent as it is that the different perspectives on offer are all equally valid. In any case this problem reflects the uncertainties in identifying the nature of tonal deviations and how they might be related to each other.
This section attempts to illustrate the effects of the different philosophies of intonation models on the representation of intonation errors.

2.1. Models of intonation description

The two most widely known and used approaches to intonation description are the so-called British School (e.g., Palmer 1922, Kingdon 1958, Halliday 1967, O'Connor and Arnold 1973) and the tone sequence model (Pierrehumbert 1980), which is based on the American tradition of analyzing pitch contours as sequences of pitch levels (e.g., Pike 1945 or Trager and Smith 1951). The tone sequence model has given rise to ToBI ("Tones and Break Indices"; Silverman et al. 1992, Beckman and Ayers 1994), a system developed specifically for the transcription of prosodic phenomena. Both the British School model of intonation description and ToBI, as well as the philosophical differences between them, are discussed in more detail in Grice and Baumann (this volume). A comparison of how ToBI and the British School approach analyze and label the same original utterances should make the consequences for the perception of the same phenomena, foreign accent or not, much clearer.1

In both versions of the short utterance Tom didn't know produced by a native speaker of American English, the British School approach interprets the complete intonation contour, shown in the original in the top row of Figure 1, in terms of a single nuclear tone movement. In case A it is a high-fall (tonetic stress mark " \ ") on Tom, which corresponds in the ToBI approach to a high pitch accent (H*) on Tom, a low phrase accent (L-) to mark the end of the intermediate phrase and a low boundary tone (L%) to mark the end of the intonation phrase. The ToBI representation is thus more complex, with at least two major points of reference (the pitch accent and the boundary constellation) as opposed to just one tonal movement in the British School interpretation. This difference becomes even more pronounced in case B, where the ToBI labels mark an additional downstepped pitch accent (!H*) on know, whereas the British School approach again associates the complete contour with one nuclear tone movement, in this case a low-fall (tonetic stress mark " \ ").

Such a representation is not very adaptable and less likely to be able to reflect possible variations. Indeed the narrow transcription (second row in Figure 1), which is basically a stylized reproduction of the original contour, does not, for example, account for the typical valley between two high pitch
accents. Tellingly, this phenomenon must be expressed here using ToBI terminology; in fact it is dealt with explicitly in Pierrehumbert (1981) within the framework of the tone sequence model.
Figure 1. Two versions of the sentence 'Tom didn't know' produced with slightly different intonation patterns; comparison of the British School approach and ToBI. Top row: original contours; second row from top: British School-style narrow interlinear tonetic transcription; second row from bottom: British School-style broad transcription with tonetic stress marks; bottom row: ToBI transcription.
The intonation description provided in Pierrehumbert’s paper is actually a complex function meant to compute the appropriate contour. The tone sequence approach thus goes beyond a linguistic description, but can be
adapted for modelling intonation in the context of F0 generation and speech synthesis (see for example Jilka, Möhler and Dogil (1999) for a tool for intonation generation on the basis of ToBI labels).

2.2. Intonation models for F0 generation and speech synthesis

In the field of speech technology quite a number of approaches to intonation modelling have been developed, some of them, as mentioned above, on the basis of the tone sequence model, others rather detached from strictly linguistic perspectives. The parameter-based approach PaIntE (short for "Parametric Intonation Events"), described in Möhler and Conkie (1998), for example, is interesting as it provides an alternative perspective on the phonetic dimensions of intonation events. The events themselves are designated by a linguistically-based model such as ToBI.

F0 generation with the PaIntE approach attempts to achieve the reproduction of an identified intonation pattern by means of six descriptive parameters that determine the shape of an approximation function across a three-syllable window around the accented syllable. The parameters, as depicted in Figure 2, describe the steepness of the rise and fall of the tonal movement (parameters a1 and a2). Steepness is determined via a sigmoid function and is basically defined as inversely proportional to the duration of the rise or fall. The parameters c1 and c2 describe the amplitude of the rise and fall (the difference from valley to peak). The location of the peak, expressed in milliseconds from the start of the utterance, is represented by parameter b, while parameter d stands for the absolute pitch value (in Hertz) of the function's peak. Apart from these six main parameters it is possible to derive further parameters from the model. These can describe such aspects as the duration of the rise or fall (trise, tfall) or the position of a peak or a valley (i.e., the beginning of a rise) with respect to a particular portion of the accented syllable.

An approach like PaIntE thus offers an alternative, more detailed view of the phonetic dimension of tonal categories, as opposed to a more traditional definition of those categories simply in terms of the temporal alignment of the peak and its relative position with respect to the overall pitch range.
Figure 2. Main parameters of the approximation function in a three-syllable window around the accented syllable: a1 = steepness of rise; a2 = steepness of fall; b = temporal alignment of the peak, c1 = amplitude of rise; c2 = amplitude of fall; d = absolute peak height (adapted from Möhler 1998)
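To give a feel for how these six parameters shape a single accent, here is a rough, simplified re-creation of a PaIntE-style rise-fall as the difference between a peak value and two sigmoids. It is not the published PaIntE formula (which differs in its exact parameterisation); the fixed shift constant below is only there to keep the modelled peak close to d, and the example values are invented.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def painte_like_f0(t, a1, a2, b, c1, c2, d, shift=4.0):
    """Rise-fall approximation around a peak of height d (Hz) at time b (s):
    a rising sigmoid of steepness a1 and amplitude c1 before the peak, and a
    falling sigmoid of steepness a2 and amplitude c2 after it.  'shift' keeps
    both sigmoids close to zero at t = b, so the contour actually reaches ~d."""
    rise_still_missing = c1 * sigmoid(a1 * (b - t) - shift)
    fall_already_done = c2 * sigmoid(a2 * (t - b) - shift)
    return d - rise_still_missing - fall_already_done

# Hypothetical accent: peak of 220 Hz at 0.30 s, 60 Hz rise, 80 Hz fall,
# sampled every 10 ms over a 0.6 s window
contour = [painte_like_f0(i / 100.0, a1=40, a2=35, b=0.30, c1=60, c2=80, d=220)
           for i in range(60)]
```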
3. Sources of intonation errors
The different causes of intonational errors due to foreign accent are a major aspect in the diagnosis and correction of the intonational mistakes non-native speakers make. This section aims to help in the classification of at least some of them in order to facilitate the identification of certain types of errors. The section also discusses cases where there seems to be no obvious explanation for a particular tonal deviation. Examples of foreign accent in the German productions of native speakers of American English and in the English productions of native speakers of German are presented. The preferred transcription system is ToBI, mainly because it allows for a category-based interpretation of intonation that is compatible with the leading theories of second language acquisition (e.g., Flege 1995 or Best 1995), which are segment-based.
3.1. Transfer from the speaker's native language

Transfer is certainly the most straightforward case of foreign accent, both as far as the segmental and the suprasegmental aspects of speech are concerned. The appearance of features of a speaker's first language in his or her productions in the second language is a phenomenon that everyone is familiar with. It is similarly obvious to most people that sounds that are in some way equivalent in the two languages are affected by this process. This principle also applies both to intonation categories that occur in comparable discourse environments and to the phonetic realizations of these categories.

Transfer of a native category to the target language within a specific discourse situation

Intonational foreign accent is especially easy to recognize when a clearly defined discourse situation such as a declarative statement or a yes/no-question is produced with an inappropriate final intonation pattern. While in German and American English these two discourse situations are typically produced with very similar final tunes (i.e. combinations of the nuclear pitch accent and the phrase accents and boundary tones), there is a significant difference with respect to continuation rises. A continuation rise is meant to signal that despite the end of an intonation phrase the speaker intends to continue talking about a certain subject. Expressed with the Stuttgart ToBI system for German, this tonal movement consists in a rising nuclear pitch accent that spreads to a default boundary tone (L*H %), thus a simple rise. In American English, on the other hand, the continuation rise is typically realized by an explicit rise in the boundary constellation itself (low phrase accent followed by high boundary tone: L-H%). If this rise is preceded by a rise on the nuclear pitch accent, the resulting tonal movement is made up of a rise, a fall, and yet another rise (L+H* L-H%).

The top contour in Figure 3 demonstrates this latter intonation pattern in the German utterance Denn man hatte dort auf einem Schild schon lesen können, dass frische Butter eingetroffen sei ("you could read on a sign that fresh butter had arrived") made by an otherwise near-native-sounding American speaker. The additional rise and fall on lesen können is clearly inappropriate to the ears of native German listeners (see Jilka 2000). The F0 contour for the corresponding reading of the same sentence by a German speaker is depicted in the bottom contour of Figure 3 and shows the simple rise spreading from the nuclear pitch accent on lesen to the end of the into-
nation phrase. As stated earlier all example utterances are available for listening on the enclosed CD-Rom.
Figure 3. Example of category transfer of a continuation rise on ‘lesen können’. Top contour: American speaker’s version with additional fall and rise; bottom contour: typical German pattern
Transfer of an equivalent tonal category with a different phonetic realization

Apart from the transfer of completely different tonal categories, it is also possible to encounter this mechanism on a lower level, namely with respect to the phonetic realization of what is essentially the same category (e.g., the same type of pitch accent). In other words, a deviating phonetic realization is heard as non-native without necessarily being perceived as reflecting an altogether different tonal event. In such cases differences may, but do not have to, depend on the tonal and/or segmental context and may therefore be more subtle. The ToBI labels themselves would not reflect such contrasts. They could only be expressed in terms of the dimensions chosen to represent the labels as target points within the framework of the tone sequence model. These dimensions are typically the target's position relative to a specific area in the accented syllable (e.g. the voiced or the sonorant part) and its position within the speaker's pitch range (see e.g. Jilka, Möhler and Dogil 1999 for such an approach).

However, the PaIntE model introduced in section 2.2 offers the possibility of an alternative representation of a tonal category such as a pitch accent designated by a ToBI label. Moreover, this approach allows for a relatively comfortable statistical analysis of a greater number of parameters.
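One way to run the kind of group comparison reported in the next paragraph is to collect one PaIntE parameter value per accent for each speaker group and feed the two samples to a one-way analysis of variance. The sketch below uses scipy for this; the parameter values are invented purely for illustration and do not reproduce the chapter's data or its exact statistical procedure.

```python
from scipy.stats import f_oneway

# Hypothetical per-accent values of one PaIntE parameter (e.g. c1, the rise
# amplitude in Hz), grouped by the speakers' native language:
american_c1 = [58.0, 71.5, 63.2, 80.1, 69.4, 75.8]   # invented values
german_c1 = [41.0, 55.3, 47.8, 52.6, 44.9, 50.2]     # invented values

# One-way ANOVA across the two groups (with two groups this is equivalent
# to an unpaired t-test, F = t**2):
f_stat, p_value = f_oneway(american_c1, german_c1)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```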
An analysis of variance of rising (L*H) pitch accents in German as produced by American and German speakers, for example, shows significant differences for parameter a1, the steepness of the rise (p = 0.00015), parameter c1, the amplitude of the rise (p = 0.0000011), as well as the trise parameter, which refers to the duration of the rise (p = 0.0029). The results must be interpreted in the following way: the rises in L*H pitch accents produced by the Americans are steeper than those produced by the Germans because they have a significantly higher amplitude. The Americans’ rises are actually longer but this is outweighed by the greater amplitude. The differences in amplitude (and steepness) are not, as might be suspected, a consequence of speaker selection, i.e. the American speakers did not happen to have higher voices. Peaks are on average 216.5 Hz for the Germans and 220.8 Hz for the Americans, p = 0.5177, which is clearly not significantly different. Instead the baseline values are significantly lower for the American speakers (p = 0.0184), 169.3 Hz vs. 182.4 Hz, implying that they use wider pitch ranges. The values were based on the readings of six sentences of varying length. In those sentences the two groups of speakers (three women and one man each) produced altogether 330 realizations of the rising pitch accents. Interestingly, the Americans produced more than twice as many (221) in those very sentences as the Germans (109), a phenomenon that will be discussed further in section 5.

3.2. Individual intonation errors

Many intonation errors made by second language learners are not clearly attributable to the influence of their native languages. These errors can take any form, from the occurrence of an additional pitch accent or the lack of one, to the use of deviating categorical or phonetic realizations. The interpretation of such cases remains speculative. The idea that non-native speakers use a generally reduced, i.e., simplified prosodic inventory of default categories may have merit in some cases. This notion would be compatible with the concept of a “Basic Variety” (Klein and Perdue 1997), which states that especially in non-formal language acquisition speakers develop a primitive closed system of the target language. While that study also postulates an interaction between native and target language on the phonological level it does not discuss this in detail and concentrates mainly on how morphology, vocabulary and syntax are reduced to basic elements.
In the majority of cases, however, such simplifications cannot be considered to be convincing explanations. Figure 4, for example, illustrates the case of a native speaker of German who uttered the phrase I’m 31 years old when she introduced herself at the beginning of the recording session. The top contour in Figure 4 shows that she produced an unusual final tune by stressing the word years with a rising pitch accent (L+H*) and maintaining that high level (transcribed here by a relatively rare “plateau” H-L% boundary tone) until the end of the phrase. This intonation pattern can of course theoretically occur in both English and German and might very well be appropriate as a continuation rise in a very specific context that requires a particular focus on years old, possibly as a contrastive or alternative element. However, such an interpretation is impossible with respect to the circumstances under which the utterance was made. Therefore the intonation contour is perceived as clearly inappropriate, and there is no immediate motivation for it in terms of a transfer from German that would explain why it was produced.
Figure 4. F0 contour of the phrase ‘I’m thirty-one years old’ uttered by a native speaker of German. Top contour: original spontaneous utterance with unusual final tune; middle contour: F0 generated declarative; bottom contour: F0 generated continuation rise
F0 generated versions based on ToBI labels (Jilka, Möhler and Dogil 1999) with alternative final tunes are acceptable on the other hand, as shown in the declarative (with the typical L-L% boundary constellation) depicted as the middle contour in Figure 4 as well as a more regular continuation rise
with the characteristic L-H% boundary configuration (bottom contour in Figure 4). As the actual final tune (L+H* H-L%) is in no way more primitive than these alternative possibilities, we are left with the assumption of speaker-specific, individual errors. Such errors could be motivated either by a mistaken interpretation of the discourse situation or by a general inability to deal with complex cognitive demands that leads to the assignment of more or less random tonal patterns.
4. The high variability of intonation
Intonation has a high potential for variation. On the one hand, speakers unconsciously produce subtle phonetic deviations either at random or due to the influence of the segmental, tonal or phrasal context. Such phonetic variation should in theory be predictable. However, research into the numerous manifestations and interactions of these factors is still a long way from providing an all-encompassing overview of any one language (see, e.g., van Santen and Hirschberg 1994 or Jilka and Möbius 2006). On the other hand, speakers also consciously produce a multitude of differing realizations that correspond to just as many differing interpretations. This does, of course, complicate the identification of clearly “correct” and “incorrect” intonation patterns, both in the target language and in the productions of the learners, as there is not yet sufficient knowledge about all the intonational features of any language, e.g. where they occur, what their respective form is and how they relate to the semantic content of what is being said. For this reason it is not always a trivial task to determine whether a particular intonation pattern is really inappropriate. As the possible combinations and interactions between intonation, context and meaning are virtually infinite, it is quite challenging to draw conclusions about general classes of situations in which a particular type of pitch accent could be predicted to occur in a particular position. Using the perspective (ToBI) and classification types introduced in sections 2 and 3 as a basis, variability due to foreign accent can thus be observed on two levels, either as phonetic deviation within a tonal category (i.e., from an assumed prototypical realization in the segmental and prosodic context) or as the deviating use of whole categories (i.e., their choice and distribution).
Indeed, not all the variability that deviates from an assumed standard form is necessarily perceived as an error, as it may only result in different, possibly unusual, interpretations that the context does not forbid. In such cases perception or rather awareness of the deviations becomes possible only via a cumulative effect, i.e. individual deviations, that by themselves alone would not trigger the impression of foreign accent, will create a chain of less and less likely interpretations that eventually leads to a point where the utterance’s intonation is incompatible with its semantic content. The example analysis of the sentence “Und wenn auch die Mehrzahl von ihnen gerade nur so lange Zeit blieb wie der Umtausch in Anspruch nahm, so gab es doch einige, die sich hinsetzten und gleich auf der Stelle zu lesen begannen” (“And even if the majority of them only stayed as long as it took to complete the exchange, there still were some who sat down and started reading right away”) as read by a native speaker of American English can be used to demonstrate such an effect. It shows individually acceptable deviations that slowly accumulate to a combination of incompatible interpretations which are then perceived as an expression of foreign accent. The first unusually placed tonal category in the example (see also Figure 5) is the rising (L*H) pitch accent on “auch”, here used as a – normally unstressed – focus particle in the sense of “even”. Due to the pitch accent, however, the impression is created that it is used in the more common interpretation of “also” or “too”. The following falling accent on “Mehrzahl” (“majority”) reinforces this impression, as in the unmarked case of a concessive construction with “wenn auch” (“even if”) we would have expected a rise on “wenn” followed by a high-level plateau (see Müller 1998 for a discussion of focus particles and intonation in German). Similarly, the focus particle “nur” (“just”, “only”) is also assigned a pitch accent instead of the following “so” as would be expected. A rising pitch accent is also found on “Zeit” (“time”), indicating a contrastive focus accent. This L*H pitch accent is immediately followed by yet another L*H accent on “blieb” (“stayed”) in the very next syllable, creating an uncommonly narrow rise-fall-rise pattern at the phrase boundary. The distribution and high number of pitch accents encourages a forceful interpretation. The listener is led to expect the utterance to continue with the description of an extraordinary action undertaken or experienced by the children mentioned in the preceding context. Such an interpretation would for example be associated with the conjunction “dass” introducing a subsequent subordinate clause “… so lange Zeit blieb, dass … X passierte / sie X taten” (“stayed just for so long that … X happened / they did X”). As the utterance simply
continues with a description of how long the children stayed, the listener will unavoidably get the impression that the intonation is inappropriate. In the remaining intonation phrases of the example utterance the pitch accents on the first syllables of “Umtausch” and “hinsetzten” are worth mentioning. Unlike in the preceding examples, the deviation does not consist in the placement or type of a whole tonal category, but only in its phonetic realization. In both cases the pitch excursion on the rise is unusually large, assigning strong emphasis to the syllables. This might be interpreted as indicative of contrastive focus and induce listeners to look for indications of alternatives in the context.
Figure 5. Accumulation of the impression of foreign accent by means of the combined effect of the inappropriate placement, choice and realization of pitch accents in the utterance ‘Und wenn auch die Mehrzahl von ihnen nur so lange Zeit blieb wie der Umtausch in Anspruch nahm, so gab es doch einige, die sich hinsetzten und gleich auf der Stelle zu lesen begannen’ (“And even if the majority of them only stayed as long as it took to complete the exchange, there still were some who sat down and started reading right away”).
In summary, it can be reiterated that the multitude of different facets of tonal variation strongly impedes any attempt to provide a structured overall representation or classification of its association with clearly defined semantic interpretations. This inherent characteristic of intonation has the disadvantage of making it difficult for the researcher to determine which tonal choices are really inappropriate and why. However, it also has the advantage of offering to the language teacher the possibility of identifying, selecting and teaching a wide variety of individual correspondences between particular intonation patterns and interpretations that he or she considers to be especially useful or important. As a matter of fact, this choice need not be restricted to a number of closely defined discourse situations, which are connected to specific tunes.
5. Overall impression of intonational foreign accent
Unlike the individual and immediately conspicuous cases of tonal deviations discussed in section 3, impressions of foreign accent caused by cumulative effects are not only associated with the moment when no reasonable interpretation for the overall intonation pattern is possible anymore (see section 4). Very often it is rather a process of becoming aware of preexisting subconscious impressions that something indefinable in the speaker’s productions is unusual (provided of course that there are no other more obvious errors, for example on the segmental level). The listener may therefore perceive one complex, overall impression as opposed to discrete individual deviations following each other. It is thus not unreasonable to postulate that when the many individual events potentially expressing foreign accent are combined, such a common overall impression is created. In other words, several intonational features together would conspire towards a specific overall intonation characteristic. Summarizing observations made with respect to American speakers’ intonation in German, several features can indeed be shown to exhibit similar tendencies. A comparison of the same sentence read by native speakers of American English and German as depicted in Figure 6 shows that the Americans use twice as many pitch accents as the Germans in the same stretch of speech and that they tend to have wider pitch ranges (in section 3.1 an identical observation was confirmed by measurements within the framework of PaIntE parameters). The American speaker’s production in our example can thus be described as comprising more tonal movements, rises and falls, with more extreme endpoints. The perception created is that of a much more lively intonation. This impression is representative of American speakers as opposed to Germans in general. Further support for this tendency can be found in the fact that if a transfer of a tonal category takes place, it is likely to lead to additional tonal movements as well, as for example in the transfer of the continuation rise described in section 3.1, in which the comparison showed an extra fall and rise (L*H % vs. L+H* L-H%) in the American speaker’s production.
Figure 6. Differences in overall intonation characteristics in the sentence ‘Alle kochten bereits vor Wut und der Mann konnte jetzt von allen Seiten Schimpfwörter hören’ (“Everybody was boiling with rage and the man could hear swearwords (directed at him) from all sides”) between productions by German (top contour) and American speakers (bottom contour). American speakers typically use about twice as many pitch accents and make more generous use of their pitch ranges
These accumulated patterns create a form of “global” intonational foreign accent that is language-distinctive, if not language-specific, due to the influence of the native language. This form of foreign accent exhibits a certain independence from the segmental level. Knowledge of the relationship between phonetic and phonological parameters (e.g., temporal alignment, choice and placement of pitch accents) and their interpretation is not necessary for listeners to be able to recognize and possibly identify the foreign accent. This language-distinctive independence of prosodic features has been demonstrated in a number of studies including, just to name an example, a thorough language identification task by Ramus and Mehler (1999) that used different stages of delexicalization by means of resynthesis. This aspect was also examined specifically with respect to the perception of foreign-accented speech in Jilka (2000). Listeners were presented with low pass-filtered stimuli and asked to decide whether the language they heard was English or German. The stimuli had been produced by native speakers of American English and German and were selected in such a way that the majority of them were foreign-accented. Therefore listeners were expected to identify the speakers’ native languages. While for stimuli
of varying duration, identification rates (i.e. correct recognition of the speaker’s native language) were generally not significantly above chance, there was a significant (p = 0.030) correlation of 0.786 (Spearman rho test) between identification rate and stimulus duration. For this reason a small-scale additional test with eight stimuli longer than 35 seconds was performed. As expected, the speakers’ native languages were recognized in all cases, in six cases significantly so (p < 0.00005). Such results can certainly be interpreted as confirmation of the idea postulated earlier that the overall impression of foreign accent, independent of semantic content, slowly accumulates during a stretch of speech: the longer it is, the more hints at unusual tonal features reach the listener’s ear, until they eventually cross the threshold of awareness.²
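Purely as an illustration of the statistic reported above, a rank correlation of this kind can be computed as follows; the values here are invented and are not the study’s data.

```python
# Hedged sketch: Spearman rank correlation between stimulus duration and
# identification rate, with made-up numbers for demonstration only.
from scipy import stats

stimulus_duration_s = [8, 12, 15, 20, 26, 31, 38, 45]
identification_rate = [0.52, 0.55, 0.61, 0.58, 0.70, 0.74, 0.81, 0.88]

rho, p = stats.spearmanr(stimulus_duration_s, identification_rate)
print(f"Spearman rho = {rho:.3f}, p = {p:.3f}")
```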
6. Possible conclusions for language teaching
The characteristic aspects of intonation presented here pose potentially difficult challenges for intonation research and teaching alike. It can be shown, however, that these challenges can be met to a considerable degree and that the discussion of these aspects can lead to insights that underline the relevance of strictly research-related problems to the development of teaching methods. The question of the significance of perspective, i.e., the dependence on the model of intonation description, expresses a general uncertainty as to what the true representation of intonation is. This certainly is problematic for intonation research. However, from a more practical, pedagogical perspective it can also be argued that the multitude of different representations provides the chance to deal with an intonation error from different starting points. In section 3 the causes of some intonational deviations were shown to be either unknown or of a non-transparent nature. As a result it would be extremely difficult, if not impossible, to understand these particular sources of foreign accent and develop systematic approaches to predicting and avoiding them. On the other hand there are some well-defined environments, especially the basic discourse situations such as declaratives, wh-questions, yes/no-questions, continuation rises etc., for which it should be possible to make sure that the final tunes associated with them are produced correctly and do not contain any obvious cases of tune transfer. A similar approach could be applied to successfully identified transfer phenomena concerning
the phonetic realization of equivalent categories. See section 3.1 for example for both types of transfer. The importance of the high variability of intonation was discussed as a factor complicating the relationship between prosody and meaning. The virtually infinite number of tonal variations and corresponding interpretations makes it impossible for intonation researchers to provide a formal description that relates all possible variations in all possible contexts to the intended corresponding interpretations. Even if such a description existed it would obviously still be unreasonable to expect a second language learner to be able to acquire and apply it. It was pointed out, however, that from the pedagogical point of view this variability also has the benefit of allowing the identification and teaching of specific tonal constellations that are guaranteed to express the intended discourse meaning. The selection of such exemplary tonal patterns and interpretations, as well as the development of suitable teaching methods, may be challenging but is nevertheless well within the grasp of the language teaching community. Finally, the overall characteristics of foreign accent described in section 5 contribute an essential share of the impression of foreign accent that a non-native speaker conveys. One interesting inherent property that these general characteristics have is that it is not necessary to relate them to particular meanings or positions in the phrase. If specific differences exist between two languages, like they do between German and American English, and it is possible to teach learners conscious control of these global features (e.g., “don’t extend your pitch range too much”, “use fewer pitch accents” etc.), then this measure alone should greatly reduce the impression of intonational foreign accent, even though it would not affect the more persistent tonal deviations that are due to misrepresentations of equivalent contexts or categories in the learners’ native languages. The observations and suggestions contained in this study are made from the point of view of intonation research and do not incorporate insights from the fields of teaching methodology or even pedagogy in general. They express a relatively broad objective and would of course not lead to a completely accentless pronunciation (which is not a realistic goal anyway). It can be argued, however, that their application together with a heightened awareness of the nature of intonational errors will help foreign language teachers such as teachers of German as a foreign language to develop more systematic approaches to dealing with foreign-accented intonation. The application of speech technology in the form of F0 generation and resynthesis as demonstrated in Figure 4 should also be of use eventually,
helping for example to make clear the difference between intonation contours that a learner has produced and more appropriate realizations generated with the learner’s own voice (some commercially available products that attempt to go in this direction already exist).
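One way such F0 manipulation and overlap-add resynthesis with a learner’s own voice could be scripted is sketched below. It uses the third-party parselmouth interface to the Praat engine, which is not a tool discussed in this chapter; the file name and the scaling factor are purely illustrative, and a real feedback application would replace the crude global scaling with a pedagogically motivated target contour.

```python
# Sketch: resynthesize a learner's utterance with a modified F0 contour,
# using Praat's manipulation/resynthesis machinery via parselmouth.
import parselmouth
from parselmouth.praat import call

sound = parselmouth.Sound("learner_utterance.wav")          # hypothetical file
manipulation = call(sound, "To Manipulation", 0.01, 75, 600)

# Extract the pitch tier, scale all F0 values down (a crude way of reducing
# the overall pitch level/excursion), re-insert it and resynthesize.
pitch_tier = call(manipulation, "Extract pitch tier")
call(pitch_tier, "Multiply frequencies", sound.xmin, sound.xmax, 0.9)
call([pitch_tier, manipulation], "Replace pitch tier")
resynthesized = call(manipulation, "Get resynthesis (overlap-add)")
resynthesized.save("learner_utterance_modified.wav", "WAV")
```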
Notes

1. British School transcriptions follow the models given in Cruttenden (1997: 61).
2. A number of studies have also shown that rhythmic features alone may be sufficient to distinguish languages. In Jilka (2000) it is however shown, using low pass-filtered stimuli with a constant F0 of 220 Hz, that language identification rates are significantly better when intonation information is present.
References

Beckman, Mary and Gail Ayers 1994. Guidelines to ToBI Labelling. Version 2.0. Ohio State University.
Best, Catherine T. 1995. A direct-realist view of cross-language speech perception. In: Winifred Strange (ed.), Speech Perception and Linguistic Experience: Theoretical and Methodological Issues, 171–204. Timonium, MD: York Press.
Cruttenden, Alan 1997. Intonation. Cambridge: Cambridge University Press.
Flege, James E. 1995. Second language speech learning: Theory, findings and problems. In: Winifred Strange (ed.), Speech Perception and Linguistic Experience: Theoretical and Methodological Issues, 233–277. Timonium, MD: York Press.
Halliday, Michael A. K. 1967. Intonation and Grammar in British English. The Hague: Mouton.
Jilka, Matthias 2000. The Contribution of Intonation to the Perception of Foreign Accent. PhD Dissertation. AIMS 6(3). Stuttgart: University of Stuttgart.
Jilka, Matthias and Bernd Möbius 2006. Towards a comprehensive investigation of factors relevant to peak alignment using a unit selection corpus. Proceedings of Interspeech, Pittsburgh, 2054–2057.
Jilka, Matthias, Gregor Möhler and Grzegorz Dogil 1999. Rules for the generation of ToBI-based American English intonation. Speech Communication 28, 83–108.
Kingdon, Roger 1958. The Groundwork of English Intonation. London: Longman.
Klein, Wolfgang and Clive Perdue 1997. The basic variety (or: Couldn’t natural languages be much simpler?). Second Language Research 13, 301–347.
Möhler, Gregor 1998. Theoriebasierte Modellierung der deutschen Intonation für die Sprachsynthese. PhD Dissertation. AIMS 4(2). Stuttgart: University of Stuttgart.
Möhler, Gregor and Alistair Conkie 1998. Parametric modelling of intonation using vector quantization. Proceedings of the 3rd ESCA Workshop on Speech Synthesis, Jenolan Caves (Australia), 311–316.
Müller, Karin 1998. German Focus Particles and their Influence on Intonation. Master’s Thesis, University of Stuttgart.
O’Connor, Joseph D. and Gordon F. Arnold 1973. Intonation of Colloquial English. London: Longman.
Palmer, Harold E. 1922. English Intonation. Cambridge: Heffer.
Pierrehumbert, Janet 1980. The Phonology and Phonetics of English Intonation. PhD Dissertation. Cambridge, MA: MIT.
Pierrehumbert, Janet 1981. Synthesizing intonation. Journal of the Acoustical Society of America 70, 985–995.
Pike, Kenneth 1945. The Intonation of American English. Ann Arbor: University of Michigan Press.
Ramus, Franck and Jacques Mehler 1999. Language identification with suprasegmental cues: A study based on speech resynthesis. Journal of the Acoustical Society of America 105, 512–521.
van Santen, Jan and Julia Hirschberg 1994. Segmental effects on timing and height of pitch contours. Proceedings of the 3rd International Conference on Spoken Language Processing, Yokohama (Japan), 719–722.
Silverman, Kim, Mary Beckman, John Pitrelli, Mari Ostendorf, Colin Wightman, Patti Price, Janet Pierrehumbert and Julia Hirschberg 1992. ToBI: A standard for labelling English prosody. Proceedings of the 2nd International Conference on Spoken Language Processing, Banff (Canada), 867–870.
Trager, George L. and Henry L. Smith 1951. An Outline of English Structure. Norman, OK: Battenburg Press.
Rhythm as an L2 problem: How prosodic is it?

William J. Barry

1. Introduction
Making L2 learners aware of pronunciation problems in general and, more specifically, of the difference between their own pronunciation and the pronunciation they are supposed to acquire is extremely difficult, as any language teacher (interested in pronunciation) will attest.¹ It should therefore be paramount that the terms we use to direct learners’ attention to problem areas should be clearly defined and easy to associate with the phenomenon that they need to learn. And there’s the rub! It is well-known that the pronunciation problems we face are difficult to illustrate, explain and demonstrate because:

(i) Acoustic phenomena remain as pre-categorical percepts in our consciousness for no more than a fraction of a second (Massaro 1972; Kallman and Massaro 1983) and as perceived categories (which already resist change in our manner of dealing with them) for no more than a few seconds (Crowder and Morton 1969, and compare Crowder 1993 and de Gelder & Vroomen 1997).

(ii) We do not process the time-varying signal uniformly over time: The mechanisms we have developed in our L1 for decoding the phonetic information contained in the acoustic signal are attention-directed, and the properties to which attention is directed can differ in importance from language to language (cf. for example Hazan 2002, and see Quené and Port 2005 for effects of “rhythmically” induced attention).

(iii) Our decoding mechanisms are geared primarily to the extraction of communicatively relevant information (the semantics of an utterance, its significance for the ongoing communication act). For this we do in fact make use of phonetic nuances of the utterance, but in terms of speaker identity interpretation (cf. Palmeri, Goldinger and Pisoni 1993), which may also serve speaker-attitude interpretation. But we are not concerned with pronunciation analysis.
In summary, becoming aware of and learning a foreign pronunciation is problematical. But it is not impossible, as some people’s natural acquisition of an acceptable L2 accent testifies. That we all do react to the differences between external models and our internal pronunciation habits is illustrated by many adults living abroad who, after many years in the foreign-language environment, lose their perfect native pronunciation but do not acquire perfect L2 pronunciation (cf. Markham 1997). The potential for using the acoustic differences in teaching depends, however, on directing a learner’s attention to the differences, or to quote the opening thought in this introduction: “making L2 learners aware of pronunciation problems”. Finding a “hook” on which to hang the problem is a vital first step. Different problems present different degrees of difficulty in finding the right hook, and prosodic problems are particularly difficult. The thesis behind this paper is that Rhythm² presents the greatest difficulties and we therefore need to rethink the status of Rhythm in pronunciation teaching.
2. The “hooks” to swing on
Segmental problems are the easiest problems to explain because we have an orthography-to-sound relationship (itself a “spelling” problem of course) which our Western, reading- and writing-orientated education fixes in our mind. Of course, as pronunciation teachers, we have to fight continuously against the confusion between letters and sounds, but the letters (and letter combinations) provide a permanently recordable focus (on paper) for developing exercises. The PERmanent GRAPHic RECord can also be exploited in making learners aware of the word-stress concept. In terms of accessibility and learner awareness, word-stress is not so problematical because word identity (meaning) is central to everyone’s idea of learning a language. If, by chance, there are minimal pairs relying on word-stress, then the way you reCORD them helps to strengthen the concept.
Figure 1. Microphone signal, F0 and spectrogram of a) REcord and b) reCORD
Nowadays, with the ubiquitous notebook (PC) and readily available signal-processing freeware (perhaps the most powerful package available is Praat: www.fon.hum.uva.nl/praat/), a signal-based graphic record can be presented together with the auditory example (see Fig. 1, and listen to the sound-file REcord-reCORD.wav) to create the necessary link between intellectual understanding of the concept and experience of the phenomenon itself. At this level, the relationship between the simple experience of syllabic prominence and the complex of prominence-bearing signal properties (duration, F0, intensity and vowel spectrum) can be demonstrated and may also become comprehensible beyond being merely a verbal formula.
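A minimal sketch of how such a combined graphic record (waveform, F0 contour, spectrogram, as in Figure 1) might be generated outside Praat’s own editor is given below; it uses the third-party parselmouth package and matplotlib, neither of which is prescribed by the chapter, and the plotting details are illustrative only. A Praat script or the Praat GUI would serve equally well.

```python
# Sketch: plot waveform, F0 and spectrogram of a recording with parselmouth.
import numpy as np
import matplotlib.pyplot as plt
import parselmouth

snd = parselmouth.Sound("REcord-reCORD.wav")   # sound file named in the text
pitch = snd.to_pitch()
spectrogram = snd.to_spectrogram()

fig, (ax1, ax2, ax3) = plt.subplots(3, 1, sharex=True)

ax1.plot(snd.xs(), snd.values.T)               # microphone signal
ax1.set_ylabel("amplitude")

f0 = pitch.selected_array["frequency"]
f0[f0 == 0] = np.nan                           # hide unvoiced frames
ax2.plot(pitch.xs(), f0, ".")                  # F0 contour
ax2.set_ylabel("F0 (Hz)")

ax3.pcolormesh(spectrogram.x_grid(), spectrogram.y_grid(),
               10 * np.log10(spectrogram.values))   # spectrogram in dB
ax3.set_ylabel("frequency (Hz)")
ax3.set_xlabel("time (s)")
plt.show()
```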
Figure 2. Microphone signal and F0 trace of a) “I THOUGHT he aGREED” and b) “I THOUGHT he aGREED”
With the signal representation, the relationship is again illustrated between the complex signal structure (duration, F0, intensity and – in this case less so – vowel spectrum) and the less complex perceived difference in the prominence patterns between the sentences (with the accompanying difference in their meanings). If we now look for a “hook” on which to hang the concept of intonation, we begin to run into a number of difficulties. Firstly, the melodic pattern, which is fundamental to intonational structure, cannot be so simply, or at least not so naturally demonstrated using the orthographic manipulation tools that were so helpful for word- and sentence-stress. But careful progression through the methods used in intonation description, from iconic to more abstract (see Fig 3a-d), should help to develop the learner’s awareness.3 Secondly, even though we recognize the primary role of tonal properties in intonation, a too narrow understanding of intonation as only the melodic pattern carried by the fundamental frequency contour is patently wrong. The contour carrying version b) of the sentence in figure 2 (I THOUGHT he AGREED) can be seen to rise from “I” to “thought”, to remain level for “he a-” and then to scoop down low and rise again during “greed”.
Figure 3. Different graphical means of conveying the tonal contour of an utterance, with increasing abstractness from a) to d).
Figure 4. “I THOUGHT he aGREED” and “I THOUGHT he AGREED” with same tonal contour. (top contour: original production, bottom: manipulated contour)
Figure 4, however, shows basically the same contour in a perfectly acceptable realization of version a), i.e., with the secondary sentence accent on “thought” and the primary accent on “agreed” (I THOUGHT he aGREED. Listen to sound-files Fig4-orig.wav and Fig4-manip.wav). Although the melodic contour is the same, no-one would wish to say that the intonation is the same. The two versions of the utterance (Fig. 2b and Fig. 4) clearly have a different meaning4, and that is due to the difference in intonation, which is the product of the tonal movements in relation to the duration and intensity of the accented syllables.
3. A hook for Rhythm?
Having looked for and found (albeit with increasing difficulty) “hooks” to hang our awareness teaching on, we can now ask what Rhythm is. Is it something above and beyond the three prosodic structuring levels – word-stress, sentence-stress and intonation – that we have considered so far, or is it perhaps below and part of them? Before addressing that question, we need to recognize that there is a progressive overlap in the acoustic and linguistic nature of each of the phenomena as we consider them in turn: Sentence-stress makes use of the lexical stress patterns to structure its prominences (and appears to use the self-same acoustic parameters); intonation needs the sentence-stress structure to fit its melodic pattern over. Looking at it in another way, we see that the separation of word-stress from sentence-stress, and sentence-stress from intonation is an artificial product of the particular level of observation and analysis. In reality they are not separable: In a one-word “sentence”, word-stress is sentence-stress and it also carries the intonation contour. Similarly, in the more usual multi-word utterances, sentence-stress relies in part on the tonal movements of the intonation contour to make the important words prominent, and the tonal movement relies on the durational and (apparently to a lesser extent) intensity properties of the accented words. What, then, is Rhythm in spoken language? One approach to the question is to try to relate the prosodic structuring of spoken language to a more general understanding of Rhythm.
3.1. Rhythm in music and spoken language

Outside language, particularly in music of the Western tradition, rhythm is commonly understood to be the repeated pattern of prominent beats and the less prominent beats between them. We talk about a whole piece of music being “rhythmic” if there is a regular strong beat. But the nature of the rhythm depends on the number of weaker beats between the strong ones. These, it seems, have to be of a predominantly constant number, though an occasional reduction or increase in the number doesn’t change the perceived nature of the rhythm as a whole, as long as the temporal relationship between the strong and the weak parts of the bar is kept constant. Another important feature is that rhythm is not continuous throughout a piece, but is manifested within phrases, which often have boundary properties (e.g., a weak beat before the first strong beat, a final strong beat with no accompanying weak beats, etc.) which are different from the regular beats within the phrase. Projecting this common understanding of rhythm onto spoken language, we can immediately appreciate that spoken verse can be produced and perceived as “rhythmic” in a similar sense. This is because the words and phrases are selected to conform to one of the classical poetic metrical patterns of strong (–) and weak (˘) syllables – iambic (˘ –), trochaic (– ˘), dactylic (– ˘ ˘), anapaest (˘ ˘ –) – often with a strict number of beats (feet) in the phrase (line). The close relationship between musical and poetic rhythm is apparent in words put to music and tunes to which words are written. However, the natural production of a poetically well-formed phrase in normal speech communication, though possible, is rather rare and regarded as special (as the post-hoc observation “I was a poet and didn’t know it” bears out). A further consideration which separates classical poetic metre from natural speech is its application across (Western) languages, independent of a language’s status in terms of linguistic rhythm typology. The perceptual effect in different languages of, technically, the same metrical structure can be very different. We can thus close the case on normal spoken language rhythm being the same as musical rhythm and come to a second approach, the language-typology approach to spoken-language rhythm. Since Lloyd (1940: 25), who famously described French as having a “machine-gun rhythm” and English as having “morse-code rhythm”,⁵ an almost mystical belief has arisen in a rhythm-based division of the languages of the world into what Pike (1946) termed “syllable-timed” and “stress-timed” languages. The
identification of a third type – “mora-timed” – was separate from this dichotomy and has been attributed to Bloch (1950) and to Ladefoged (1975). This characterisation is as attractive as it is problematic, both in general scientific terms and in respect of its possible application to L2 pronunciation.

3.2. Rhythm in language typology

Scientifically, binary (or even ternary) features which contribute to the categorisation of language phenomena are attractive concepts which demand serious examination. To be phonologically relevant, however, there needs to be some structural correlate of Rhythm which is best explained by that concept rather than another (already established) phonological category. Alternatively, the term can be based on the conceptual grouping of a number of structural correlates, possibly already established at other levels of description. Ideally, these structural properties should have identifiable phonetic exponents, either in measurable aspects of speech production or in reliable perceptual reflexes. Although there is extensive linguistically orientated and often experimentally supported discussion of the supposed universal rhythmic distinctions (cf. Bertinetto 1989 for a thorough and humorously (self-)critical discussion of the literature up to that point), the majority of it devoted to the syllable- vs. stress-timed distinction, no single structural correlate has been found which justifies the labels as phonological categories in the normal sense of the term. On the other hand, it has been suggested (Bertinetto 1981; Dauer 1983, 1987) that differences in rhythm type are the product of a number of phonologically relevant dimensions, among which are structural properties such as syllable complexity, vowel-length distinctions and word-stress, and interactional prosodic effects such as vowel-duration- and vowel-quality-dependency on stress, the coincidence (or not) of intonational F0 peaks and troughs and of lexical tones with accented syllables. In this respect, Rhythm becomes a phonologically relevant cover term, but no longer in the sense of a rhythm dichotomy or trichotomy. There is no reason to expect the properties listed by Dauer (1987) to group into two neat packages supporting the syllable- vs. stress-timed division, and the position of the mora-timed languages relative to the implied continuum (if the properties group freely) is undefined.
Psycholinguistic research, on the other hand, offers some support for the perceptual reality of the rhythmic typology divisions in terms of lexical processing, at least for the languages which are cited as being prototypical for the three rhythmic types (French, English and Japanese). Cutler et al. (1986), Cutler & Otake (1994), Cutler (1997), Cutler, Murty and Otake (2003), Otake et al. (1993) have demonstrated lexical access differences in terms of the effect of the syllable or the mora on the speed of access. It must be acceded, however, that however real these processing differences are, they do not relate to any concept of Rhythm we have discussed. A perceptual acceptability study by Bertinetto and Fowler (1989), however, demonstrated that English listeners are relatively insensitive to durational manipulation which shortens unstressed syllables compared to Italian listeners (though neither is particularly sensitive to lengthening of unstressed syllables). This corresponds to the results of production analyses for Italian (Farnetani & Kori, 1990) and Greek (Arvaniti, 1994) which, for these two languages, support the “syllable-timing” claim that sequences of more than two unstressed syllables are articulated without any “eurhythmic” differentiation. In languages such as English, sequences of more than two unstressed syllables are produced in such a way as to provide longer, less reduced syllables between shorter, more reduced ones, resulting in a perceptible alternating “rhythm”. It should be borne in mind, however, that this “rhythmic” difference occurs, and becomes apparent only in the perceptually less prominent parts of utterances between the more prominent syllables of sentence-accentuated (i.e. informationally important) words. The prominence patterning of the complete “information package” (possibly a sentence, or an intonation phrase within a longer sentence) will be necessarily more complex than the sequence of unstressed syllables alone. This may, in part, explain why no instrumental analyses looking for isochrony (either of syllables or feet) have been successful (Roach 1982 and compare Bertinetto 1989). Instrumentally based attempts to define the rhythmic types in quantitative terms at the level of production can be divided mainly into two approaches: those seeking syllable- vs. foot-based durational regularity or isochrony (cf. Bertinetto 1989) and those looking for differences between languages in the degree of variability in consecutive (part-of-) syllable durations (Grabe and Low 2002; Ramus, Nespor and Mehler1999; Gibbon and Gut 2001; Wagner and Dellwo 2004). The earlier studies sought regularity, seeking some confirmation in substance for the original auditory impressions. They were singularly unsuc-
cessful, and it appears currently to be generally accepted that there is no direct acoustic or articulatory measure of the syllable- vs. stress-timed distinction. Studies that included mora-timing in their remit (Hoequist 1983a, 1983b) have been no more successful. The studies quantifying the structural variability of syllables are based broadly on the theoretical framework suggested by Bertinetto (1981) and Dauer (1983, 1987), though their measures are restricted to durational derivatives directly or indirectly linkable to many of the structural properties. They capture either the overall variability of the syllable, vowel or consonantal interval durations (e.g. with the standard deviation) or the average degree of durational change from one interval to the next throughout an utterance or a corpus. There has been considerable success in differentiating between languages traditionally regarded as belonging to one of the three rhythm types (cf. Ramus 1999; Grabe and Low 2002). However, there is no reason to consider the measures to be a reflex of specifically rhythmic rather than general structural properties (Barry et al. 2003; Wagner and Dellwo 2004), and it has been demonstrated (Steiner 2003) that, in the Bonn database at least, it was any subdivision of the sound inventory into “vocalic” and “consonantal” intervals, and ultimately the distribution of /l/ and /n/ in the different languages, which served as language differentiators. But as measures of language classification (rather than language differentiation), they may be unreliable because they can be strongly influenced by speech rate (cf. fig. 5 from Barry et al. 2003), showing a shift from more to less variable structure with increasing articulation rate (see also Engstrand & Krull 2001 for similar observations on Swedish read vs. spontaneous speech). The extent of this shift observed in Barry et al. (2003) is almost certainly, in part, an artefact of the structural basis for the calculation of articulation-rate, i.e., syllables per second. This is unreliable for spontaneous speech, since the word sequences being compared are not identical and structurally less complex syllables are, ceteris paribus, produced more quickly than complex ones. In other words, the division of the corpus into three sub-corpora of differing articulation rates is also a division into utterances with different average syllable complexities. However, even the Bonn speech rate corpus (Dellwo et al. 2004) shows considerable inter-syllabic variability over slow-to-fast speech rates for lexically controlled (albeit read) speech (Dellwo and Wagner 2003). It is again the consonant variability measure (DeltaC) which shows the strongest variation (for German, English and French, though with a deviation from the general pattern for the fastest rate in English). In
their discussion, Dellwo & Wagner (2003) touch on the problem of different tempo norms in French compared to English or German, and suggest the normalizing variation coefficient (varco = DeltaC * 100 / meanC) as a means of teasing out language differences. They report an interesting separation of French (varco remains constant) on the one side from German and English on the other (varco changes with articulation rate). In articulatory terms, we suggest (without insight at present into the details of their findings) that German and English (and by analogy also Swedish, cf. Engstrand and Krull 2001) tend to simplify the potentially more complex syllable structure with increasing articulation rate, whereas French has less scope for such simplification.
Figure 5. Measures of consonantal and vocalic variation (calculated after Ramus 1999 and Grabe & Low 2002) as a function of articulation rate (syll/sec). (Gsp = spontaneous German; Gr = read German; Pi = Pisa; Na = Naples; Ba1 and Ba2 = Bari; Bu1 and Bu 2 = Bulgarian)
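For readers who want to try such measures on their own material, the following sketch implements the standard-deviation measures, the varco normalization given above and a normalized pairwise variability index in the spirit of Grabe and Low (2002). It is not the scripts used in the cited studies, and the interval durations are invented.

```python
# Hedged sketch of durational variability measures over segmented intervals.
import numpy as np

def delta(intervals):
    """DeltaC / DeltaV: standard deviation of interval durations."""
    return float(np.std(intervals))

def varco(intervals):
    """Rate-normalized variability: Delta * 100 / mean duration."""
    return delta(intervals) * 100.0 / float(np.mean(intervals))

def npvi(intervals):
    """Normalized Pairwise Variability Index over successive intervals."""
    d = np.asarray(intervals, dtype=float)
    pairwise = np.abs(d[1:] - d[:-1]) / ((d[1:] + d[:-1]) / 2.0)
    return 100.0 * float(np.mean(pairwise))

# Hypothetical consonantal interval durations (ms) for one read sentence.
c_intervals = [85, 40, 120, 55, 95, 60, 140, 70]
print(f"DeltaC = {delta(c_intervals):.1f} ms")
print(f"varcoC = {varco(c_intervals):.1f}")
print(f"nPVI   = {npvi(c_intervals):.1f}")
```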
A different approach by Cummins and Port (1998), using short, two-beat phrase repetition, does show clear production-pattern differences between French and English which are interpretable in terms of stress- vs. syllable-timing. Whereas English speakers appear to introduce an underlying, silent beat in order to regularize the timing of the repeated phrases at foot level, French speakers do not. This is interpretable as a sensitivity to foot-based structuring in English speakers and an inability in French
speakers to structure the utterance rhythmically above the syllable level. The question arises, however, whether the “isochronic tendency” that is observable across phrases when they are repeated – akin to Abercrombie’s silent stressed syllable in “__ ’kyou” (observed in repeated “Thank you” utterances, e.g. by a bus conductor, cf. Abercrombie 1967: 36) – corresponds to a need to regularize “feet” or “stress units” within a phrase.

To summarize the attempts to pigeonhole Rhythm in linguistic terms over the past half-century, it is true to say that the many “isochrony” studies have looked for something measurable that is immediately relatable to “regularly repeated beats”. They have attempted (in vain) to verify instrumentally the original auditory observations by skilled phoneticians about contrasting “rhythmic” impressions of a small number of languages (originally only two). Since the 1980s, structural differences between languages have been moved into focus, and the thrust of work has been to identify differences between languages which conspire in one but not in the other group of languages to prevent the syllables in an utterance from occurring at equal intervals. Some are based in the segmental structure, like vocalic quantity oppositions or variable syllable complexity; others are observations of prosodic behaviour, like the tendency, or lack of it, (a) to reduce the duration and spectral distinctiveness of unstressed syllables between accented ones and (b) to compensatorily shorten accented syllables as a function of the number of unstressed syllables following. The instrumental measures associated with this theoretical view (summarized above) have indeed shown that languages can be differentiated, and that they appear to divide up into groups containing languages that have traditionally been described as syllable-timed or stress-timed. However, the wide range of values across languages belonging to the “same” rhythmic group, the assumption of “mixed” rhythm types, and the conflicting positioning of the same language from one study to another within the language selection examined cast doubt both on the validity and on the “rhythmic” basis of the distinctions.

3.3. Rhythm in L2

In connection with the teaching and acquisition of a correct Rhythm in a foreign language, the first question could well be whether the discussion so far has any relevance at all.
Most teachers probably associate the idea of Rhythm with the regular beats discussed and rejected as a normal phenomenon in non-poetic speech (section 3.1). A well-established German programme (for young French learners) which makes explicit use of Rhythm as an integral part of teaching active speech production (Andreas Fischer: www.phonetik-atelier.de) in fact uses rhythmic movement, simple rhythm instruments and silent beats to help young French learners of German to produce regular accent intervals and avoid the perceptually much more equal weight attached to consecutive syllables in French. On the other hand, a more complex and analytic view of rhythm in speech production is presented by Stock & Veličkova (2002, cf. also Veličkova 1990, 1993). Equally concerned with the practicalities of teaching and learning (with the focus on adult learners), they acknowledge the persistence of isochrony as the established view while discussing the possible bi-directional interpretation of Rhythm a) as the determinant of the segmental and prosodic properties associated with the rhythm-typology divisions and b) as the product of those properties (cf. also Krull and Engstrand 2003). The use in teaching of gestural support for segmental properties which affect the overall prosodic pattern of a phrase (e.g. vowel length in stressed syllables, cf. Veličkova 1990, 1993) underlines the recognition that properties from all levels of language structure contribute to a gestalt-like experience of Rhythm in an utterance. They consequently deal with rhythmic patterns of phrases, both as stand-alone expressions and sequenced within texts, which reflect the lexical stress patterns of the words used and the information weighting of those words within the context. The question of isochrony in stress-timed languages hardly arises because many phrases, even within a longer text, do not exceed two accents – even if they contain more than two accentable words. The example given by Stock and Veličkova (2002: 29) may serve as illustration of how the number of accents can vary for a given word sequence:

// manche kol / legen / wissen das aber / nicht //
//     0      /   0   /        X        /   0   //
//     X      /   0   /        X        /   0   //
//     X      /   X   /        X        /   0   //
//     X      /   X   /        X        /   X   //
(The accents, marked with X, can vary from one to four. Only the fourth variant, with all four feet accented, deviates from the default nuclear posi-
tion (wissen). Isochrony is testable in the fourth, and in the third variant in a more limited way by excluding the last foot from the metrical frame) Putting it at its simplest and most extreme, we can say: Every utterance has its own particular “Rhythm-pattern”, determined by the relative communicative weight attributed (by the particular speaker) to the particular words within the particular syntactic structure within the particular communicative context. We thus define Rhythm as the situation- and utterancedependent pattern of prominences and shall use the term “prominence pattern” instead of Rhythm from now on. This information-based view allows for a considerable amount of variation in the “rhythmic” realisation of any given sequence of words, and underlines the importance of the teaching maxim that nothing should be taught without contextualisation. However, given the situation and the linguistic pre-context, and a not-too-eccentric speaker, the actual degree of freedom is much smaller. The choice of which and how many words to make communicatively prominent, as well as the relative prominence of the accented words are fairly strictly delimited. Figure 6 shows the tight syllable-duration clustering for the main accents in the sentence “Heute morgen bin ich zu spät aufgestanden” (I got up too late this morning) spoken in an unmarked manner, as if introducing a story. For the unaccented words there is more individual variation. In unaccented multi-syllabic words, the lexical-stress pattern for citation-form production can disappear, be dynamically (but not tonally) retained, or even shifted, depending on the language (e.g. its status within traditional rhythm-typology classes). Above all, the treatment of unstressed syllables will depend on the language, though in spontaneous speech, the occurrence of elision and assimilation phenomena, particularly at word boundaries appears to cut across traditional rhythm-typology differences (Barry and Andreeva 2001). However, the general observation which is supposed to separate “stress-timed” from “syllable-timed” languages, namely the tendency to reduce unstressed syllables in the latter and to retain the full phonetic identity in the former, certainly has some languagedifferentiating validity. But they do not all behave in the same way. As a comparison of English, Bulgarian, Russian on the one hand with German, Dutch, Swedish on the other – all of them “reducing” languages – will show: there are very different forms and degrees of reduction. Dutch, English, German and Swedish reduce the quantity of long vowels in unstressed position, but only English has systematic quality reduction of the vowels (towards schwa). Bulgarian and Russian do not have a long-short vowel
opposition, which precludes quantity reduction, but they do (like English) have spectral change in unstressed vowels, albeit in a more complex manner than the general centralization tendency found in English.
Figure 6. “1-Heu 2-te 3-mor 4-gen 5-bin 6-ich 7-zu 8-spät 9-auf 10-ge 11-stan 12-den”. Articulation-rate-normalized syllable durations for 6 native speakers of German (syllable sequence of the sentence “Heute morgen bin ich zu spät aufgestanden”).
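The exact normalization behind Figures 6 and 7 is not spelled out here; one plausible reading, expressing each syllable duration relative to the utterance’s mean syllable duration, can be sketched as follows. The durations are invented and serve only to show the arithmetic.

```python
# Hedged sketch of an articulation-rate normalization (assumed, not the
# chapter's own procedure): durations relative to the utterance mean.
import numpy as np

# Hypothetical raw syllable durations (ms) for the 12 syllables of
# "Heu-te mor-gen bin ich zu spät auf-ge-stan-den".
durations_ms = np.array([210, 90, 180, 85, 110, 70, 95, 230, 200, 80, 220, 150], float)

articulation_rate = 1000.0 * len(durations_ms) / durations_ms.sum()  # syll/s
normalized = durations_ms / durations_ms.mean()                      # rate-free profile

print(f"articulation rate: {articulation_rate:.2f} syll/s")
print("normalized syllable durations:", np.round(normalized, 2).tolist())
```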
However, it is not only the fact that languages differ in their reduction patterns which leads to prominence-pattern deviations in the speech production of non-native speakers. Prosodic differences alone may lead to learners with “stress-timed” L1 reducing the articulatory effort invested in the (unstressed) syllables between accents when speaking a “syllable-timed” L2, and possibly introducing spurious (eurhythmic) prominences into sequences of more than two unstressed syllables (cf. Arvaniti 1994; Farnetani and Kori 1990). However, all learners, whatever their L1, tend to overarticulate in comparison to native speakers. This makes the step for a “syllable-timed” learner producing “stress-timed” utterances particularly diffi-
cult, although, as Figure 7 shows, even experienced “stress-timed” learners of another “stress-timed” language (English, Russian) together with speakers of an assumed “syllable-timed” language (Korean) all tend to deviate from the average native-speaker pattern in a way which reflects too little differentiation of accented syllables and unstressed syllables⁶ (cf. also Gut 2002, 2003; Benkwitz 2003).
Figure 7. “Heute morgen bin ich zu spät aufgestanden”. Articulation-rate-normalized syllable durations for 5 L2-speakers of German in comparison to the average native-speaker durational pattern (segmentally normalized syllable duration, averaged per accent-strength category, plotted against accent-strength category: 1 = main, 2 = secondary, 3 = unstressed; speakers: English, Korea I, Korea II, Russian, and the German native-speaker average).
We see from the preceding discussion that some factors affecting the prominence pattern stem from segmental changes which (depending on the language) do or do not co-occur with stress- and accent-status. Since such segmental changes are unlikely to become established as an unconsciously
absorbed corollary of explicit, holistic rhythmic speaking practice (even with young French children), they need to be dealt with specifically. Indeed (taking English as an example), (i) weak forms, (ii) final voiced consonants (with their longer preceding vowels), (iii) vowel-length and -quality contrasts, (iv) consonant-cluster reductions at word-boundaries etc. are all accepted points of pronunciation practice, whether the learner comes from an assumed “syllable-timed” or from a “stress-timed” language. The thesis postulated here is that the sum of these (essentially segmental) properties constitutes the determining features of an acceptable (prosodic) prominence pattern. Introducing the concept of foot-based isochrony (i.e. rhythmic regularity) on top of all these syllable-realization exercises is not only unnecessary, but also induces an element of stylization and artificiality which, if it actually becomes established in the learners’ production patterns, will have to be unlearned again.
4. Conclusions
The assumption behind our discussion has been that the goal of pronunciation teaching is to make the learners aware of the nature of the task they have to practise. With regard to the Rhythm concept, awareness is most easily linked to the idea of isochrony, i.e. to a regular beat, traditionally considered to characterize so-called “stress-timed” languages (a regular beat of accented syllables) and “syllable-timed” languages (a regular syllabic beat). We maintain, however, that this is both unhelpful and misleading in the L2 teaching environment. Richard Cauldwell’s (2002: 1) recent summing up of the situation corresponds very much to the view we have tried to present in this paper: Although the formal events of speech – phones, strong and weak syllables, words, phrases – occur ‘in time’ (they can be plotted on a time line) they do not occur ‘on time’, (they do not occur at equal time intervals). English is not stress-timed, French is not syllable-timed. The rare patches of rhythmicality are either ‘elected’ – as in scanning readings of poetry and the uttering of proverbs – or ‘coincidental’ – the side-effects of higher order choices made by speakers. Coincidental rhythmicality is most likely to occur where there are equal numbers of syllables between stresses. In spontaneous speech, the speaker’s attention is on planning and uttering selections of
meaning in pursuit of their social-worldly purposes, and this results in an irrhythmic norm which aids comprehension.
Ulrike Gut (2003) retains the term “rhythmical” in her study of prosodic behaviour in a number of different learner-groups' production of L2 German. However, operationally, she breaks Rhythm down into durational and metrical characteristics, the latter being defined as the relative prominence of units such as syllables. It is debatable whether prominence can, ultimately, be separated from duration (cf. Kochanski et al. 2005, however, who consider “loudness” separately from “duration” as determinants of prominence) but this is irrelevant for teaching purposes. It is important that individually learnable properties of language be brought into focus – informationally important (prominent) words, informationally less important words (and syllables within multi-syllabic words), long and short vowels, spectrally reduced vowels, consonant elision, etc. The contextualized introduction and practice of these properties in an optimal sequence is, of course, a non-trivial task. But their command will lead to a globally correct prosody and, in time, to a sense of prosodic “rightness” for the particular communicative intention in the same way that learning verb or noun morphology and syntactic regularities will lead to a command of the correct form and sequence of words. In neither of these areas would one think of introducing teaching points by appealing to a sense of “Morphology” or “Syntax”. We suggest that the appeal to a general idea of “Rhythm” which is abstracted from the prominence pattern of the particular utterance is equally unproductive. The implication of the message conveyed by this discussion will no doubt annoy those teachers who would like a lot of different pronunciation problems to be covered by one “rhythmic blanket”. But the facts remain: Prosodic differences between languages – and our discussion leads to the conclusion that correct Rhythm is the sum of the communicatively correct (i.e. contextually and situationally correct) prosodic properties – are distributed over all levels of phonetic-phonological structure. Correct pronunciation cannot emerge from an appeal to an undefined blanket term. Knowledge and treatment of individual problems remains essential. The articulation of individual sounds, which can be “new sounds” (like /y/ for English speakers or /θ, ð/ for German speakers), combinations of familiar sounds (like /kn/ for English speakers) or combinations of new with familiar sounds (alveolar non-sibilants followed by dental fricatives for most learners of English), and new distributional patterns (like final
voiced consonants for Germans) lead to a slowing down of articulatory processes in their vicinity, which inevitably affects the overall prosodic pattern. Direct prosodic repercussions arise from differences in length oppositions between languages (whether L1 and L2 both have long and short vowels or long and short consonants) and intra-syllabic phonetic length relations (e.g. long consonants following short vowels and vice versa, as is the case in Swedish). Thus we see that a considerable amount of segmentally orientated pronunciation work, assuming that it is satisfactorily contextualized, contributes to rhythmically correct speech. At the level of prosodic or pronunciation practice, correct word-stress location is an obvious and fundamental contributor to the correct rhythmic identity of an utterance. But, as we identified in the discussion, even with correct stress location, the phonetic means of realizing the stress can be different from one language to another and thus distort the rhythmic impression. In Italian, for example, the vowels in open syllables are lengthened when stressed, and lengthened even more when given a topical or focal accent. This leads to the well-known rhythmic distortions of Italian speakers of other languages, but is, of course, also the source of rhythmic distortion for learners of Italian. French is also a language that exploits a large degree of syllabic lengthening for the informational or affective weighting of words at utterance level (despite the fact that, phonologically, French has neither phonemic vowel length distinctions nor word-stress). It must be clear from this short selection of rhythm-distorting problems that a global appeal to language rhythm as “stress-timed” or “syllable-timed” is of no advantage. This does not mean, however, that learners of French should not be made aware of the fact that (outside the topical, focal or emphatic accents) syllables are all given as equal a weight as possible, and vowels are not reduced (a statement often associated with “syllable-timing”). Nor does it mean that learners of English should not be made to practise the reduction and temporal compression of unstressed and unaccented syllables in words and phrases (a statement often associated with “stress-timing”). It does mean that teachers need to be aware of a lot more differences between the respective L1s and L2s, of the problems that contribute to incorrect pronunciation in general, and to the incorrect rhythmic impression of utterances in particular.
Notes

1. In fact it is so difficult that many teachers neglect pronunciation because their own awareness has lagged way behind their expertise in other areas such as grammar and vocabulary. These have the advantage of being capturable in a permanent form – in writing – for post hoc consideration.
2. We use the word Rhythm throughout the paper (with a capital R) for the term we are discussing and calling in question as an independently identifiable phenomenon.
3. A universal ability to register tonal differences and types of tonal movement in speech should not be taken for granted, even if the universal ability (for the non-handicapped) to communicate with speech might make us assume it. How absolutely necessary the decoding of tonal structure is for successful (contextualized) speech communication has not been investigated, and tonal structure is accompanied by several other signal properties, as we have already shown in figs 1 and 2. This suggests the possibility of compensatory decoding, i.e., making use of other than tonal properties for speaker-hearers insensitive to tone.
4. The fig. 2b version implies that the speaker is confirming that the fact of the other person's agreement corresponded to his/her (i.e. the speaker's) assumption. The fig. 4 version implies that the speaker's previous assumption of agreement by the other person might not be true; it expresses some degree of protest.
5. Quoted from Abercrombie (1967), p. 171, endnote 7.
6. The three accent-strength categories over which syllable durations were calculated are: (i) tonally prominent accented syllables, (ii) unaccented syllables without vocalic reduction and (iii) unstressed syllables.
References

Abercrombie, David 1967 Elements of General Phonetics. Edinburgh: Edinburgh University Press.
Arvaniti, Amalia 1994 Acoustic features of Greek rhythmic structure. Journal of Phonetics 22, 239–268.
Barry, William and Bistra Andreeva 2001 Cross-language similarities and differences in spontaneous speech patterns. Journal of the International Phonetic Association 31, 51–66.
Barry, William, Bistra Andreeva, Michela Russo, Snezhina Dimitrova and Tania Kostadinova 2003 Do rhythm measures tell us anything about language type? Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, Vol. 3, 2693–2696.
Benkwitz, Anneliese 2004 Kontrastive phonetische Untersuchungen zum Rhythmus. (Hallesche Schriften zur Sprechwissenschaft und Phonetik 14). Frankfurt am Main etc.: Peter Lang.
Bertinetto, Pier Marco 1981 Strutture prosodiche dell'italiano. Firenze: Accademia della Crusca.
Bertinetto, Pier Marco 1989 Reflections on the dichotomy 'stress' vs. 'syllable-timing'. Revue de Phonétique Appliquée 91-92-93, 99–130.
Bertinetto, Pier Marco and Carol Fowler 1989 On the sensitivity to durational modifications in Italian and English. Rivista di Linguistica 1, 69–94.
Bloch, Bernard 1950 Studies in colloquial Japanese IV: Phonemics. Language 26, 86–125.
Cauldwell, Richard 2002 The functional irrhythmicality of spontaneous speech: A discourse view of speech rhythms. Applied Language Studies: Apples 2,1, 1–24.
Crowder, Robert G. and John Morton 1969 Precategorical acoustic storage (PAS). Perception and Psychophysics 5, 363–73.
Crowder, Robert G. 1993 Short-term memory: Where do we stand? Memory and Cognition 21, 14–145.
Cummins, Fred and Robert F. Port 1998 Rhythmic constraints on stress timing in English. Journal of Phonetics 26, 145–171.
Cutler, Anne, Jacques Mehler, Dennis G. Norris and Juan Seguí 1986 The syllable's differing role in the segmentation of French and English. Journal of Memory and Language 25, 385–400.
Cutler, Anne and Takashi Otake 1994 Mora or phoneme? Further evidence for language-specific listening. Journal of Memory and Language 33, 824–844.
Cutler, Anne 1997 The syllable's role in the segmentation of stress languages. Language and Cognitive Processes 12, 839–845.
Cutler, Anne, Lalita Murty and Takashi Otake 2003 Rhythmic similarity effect in non-native listening? Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, Vol. 1, 29–332.
Dauer, Rebecca M. 1983 Stress-timing and syllable-timing reanalyzed. Journal of Phonetics 11, 51–62.
Dauer, Rebecca M. 1987 Phonetic and phonological components of language rhythm. Proceedings of the 11th International Congress of Phonetic Sciences, Tallinn (Estonia), Vol. 5, 447–450.
Dellwo, Volker and Petra Wagner 2003 Relations between language rhythm and speech rate. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, Vol. 1, 471–474.
Dellwo, Volker, Ingmar Steiner, Bianca Aschenberner, Jana Dankovičová and Petra Wagner 2004 The BonnTempo-Corpus & BonnTempo-Tools: A database for the study of speech rhythm and rate. Proceedings of the 8th International Congress of Speech and Language Processing, ICSLP, Jeju Island (Korea), 777–780.
Engstrand, Olle and Diana Krull 2001 Simplification of phonotactic structures in unscripted Swedish. Journal of the International Phonetic Association 31, 41–50.
Farnetani, Edda and Shiro Kori 1990 Rhythmic structure in Italian noun phrases: A study on vowel duration. Phonetica 47, 50–65.
Gelder, Beatrice de and Jean Vroomen 1997 Modality effects in immediate recall of verbal and non-verbal information. European Journal of Cognitive Psychology 9(1), 97–110.
Gibbon, Dafydd and Ulrike Gut 2001 Measuring speech rhythm. Proceedings of Eurospeech 2001, Aalborg (Denmark), 91–94.
Gibbon, Dafydd 2003 Computational modelling of rhythm as alternation, iteration and hierarchy. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, Vol. 3, 2489–2492.
Grabe, Esther and Ee Ling Low 2002 Durational variability in speech and the rhythm class hypothesis. In: Carlos Gussenhoven and Natasha Warner (eds.), Papers in Laboratory Phonology VII, 515–546, Berlin, New York: Mouton de Gruyter.
Gut, Ulrike 2003 Prosody in second language speech production: the role of the native language. Zeitschrift für Fremdsprachen Lehren und Lernen 32, 133–152.
Hazan, Valerie 2002 L'apprentissage des langues. Proceedings of XXIVèmes Journées d'étude de la parole, Nancy, 1–5.
Hoequist, Charles J. 1983a Durational correlates of linguistic rhythm categories. Phonetica 40, 19–31.
Hoequist, Charles J. 1983b Syllable duration in stress-, syllable- and mora-timed languages. Phonetica 40, 203–237.
Kallman, Howard J. and Dominic W. Massaro 1983 Backward masking, the suffix effect, and preperceptual storage. Journal of Experimental Psychology: Learning, Memory, and Cognition 9, 312–327.
Kochanski, Greg, Esther Grabe, John Coleman and Bert Rosner 2005 Loudness predicts prominence: fundamental frequency lends little. Journal of the Acoustical Society of America 118, 1038–1054.
Krull, Diana and Olle Engstrand 2003 Speech rhythm – intention or consequence? Cross-language observations on the hyper-hypo dimension. PHONUM 9, 133–136.
Ladefoged, Peter 1975 A Course in Phonetics. New York: Harcourt Brace Jovanovich.
Lloyd James, Arthur 1940 Speech Signals in Telephony. London: Sir I. Pitman & Sons.
Markham, Duncan 1997 Phonetic Imitation, Accent, and the Learner. Lund: Lund University Press.
Massaro, Dominic W. 1972 Preperceptual images, processing time, and perceptual units in auditory perception. Psychological Review 79, 124–145.
Mehler, J., J.-Y. Dommergues, U. Frauenfelder and J. Seguí 1981 The syllable's role in speech segmentation. Journal of Verbal Learning and Verbal Behaviour 20, 298–305.
Otake, Takashi, Giyoo Hatano, Anne Cutler and Jacques Mehler 1993 Mora or syllable? Speech segmentation in Japanese. Journal of Memory and Language 32, 358–378.
Palmeri, Thomas J., Stephen D. Goldinger and David B. Pisoni 1993 Episodic encoding of voice attributes and recognition memory for spoken words. Journal of Experimental Psychology: Learning, Memory, and Cognition 19, 309–328.
Pike, Kenneth L. 1946 The Intonation of American English. Ann Arbor: University of Michigan Press.
Quené, Hugo and Robert F. Port 2005 Effects of timing regularity and metrical expectancy on spoken-word perception. Phonetica 62, 1–13.
Ramus, Franck, Marina Nespor and Jacques Mehler 1999 Correlates of linguistic rhythm in the speech signal. Cognition 73, 265–292.
Roach, Peter 1982 On the distinction between “stress-timed” and “syllable-timed” languages. In: David Crystal (ed.), Linguistic Controversies, 73–79, London: Edward Arnold.
Steiner, Ingmar 2005 On the analysis of speech rhythm through acoustic parameters. In: Bernhard Fisseni, Hans-Christian Schmitz, Bernhard Schröder and Petra Wagner (eds.), Sprachtechnologie, mobile Kommunikation und linguistische Ressourcen: Beiträge zur GLDV-Tagung 2005 in Bonn. (Computer Studies in Language and Speech 8). 647–658, Frankfurt/Main: Peter Lang.
Stock, Eberhard and Ludmila Veličkova 2002 Sprechrhythmus im Russischen und Deutschen. (Hallesche Schriften zur Sprechwissenschaft und Phonetik 8), Frankfurt/M.: Peter Lang.
Veličkova, Ludmila 1990 Untersuchungen zur Theorie und Praxis des Phonetikunterrichts. Habilitationsschrift, Halle.
Veličkova, Ludmila 1993 Die Vermittlung phonologischer Distinktionen mit einem Gestensystem. Deutsch als Fremdsprache 30, 253–258.
Wagner, Petra and Volker Dellwo 2004 Introducing YARD (Yet Another Rhythm Determination) and reintroducing isochrony to rhythm research. Proceedings of Speech Prosody, Nara (Japan), 227–230.
Temporal patterns in Norwegian as L2

Wim A. van Dommelen

1. Introduction
One of the fundamental properties of spoken language is that it, like all physical events, extends over time. Consequently, much of the last decades' research in phonetics has been devoted to the investigation of the temporal organization of speech. An issue that has evoked much debate and stimulated empirical research is concerned with rhythmical differences between languages. For a discussion of the traditional classification of languages into stress-timed, syllable-timed and mora-timed, see Barry (this volume; cf. also Dauer, 1983). In spite of numerous research efforts devoted to issues of speech timing in general and of language-specific timing in particular, our knowledge and understanding are far from complete. This is true for temporal aspects of language spoken by native speakers (L1 speech) and, even more so, for L2 speech. The approach chosen for the present study is to analyze temporal aspects of Norwegian as a second language spoken by speakers from various language backgrounds in comparison with native Norwegian. The rationale behind this is that we hope to obtain information not only about what deviations from the L1 standard occur but also whether such deviations pattern in language-specific ways. Our study consists of two main parts. Following the description of the collection of subjects, speech material and segmentation in Section 2, the first part (Section 3) deals with the temporal structure of dyads consisting of a vowel followed by a consonant. A special property of the Norwegian phonological system is the quantity system, which involves not only the vowels but also the consonants. In a stressed syllable, the vowel can be either short or long, while unstressed syllables can only contain short vowels. Consonants in stressed syllables have a complementary distribution of duration, being long after a short vowel (e.g. in matte [ˈmatːə] ‘mat’) and short following a long vowel (e.g. in mate [ˈmaːtə] ‘[to] feed’). The phonological specification of the VC: vs. V:C opposition has been the subject of some debate, the question being whether we are dealing with a vowel or a consonant quantity opposition (cf. Kristoffersen, 2000:
116–120). From a phonetic viewpoint it seems reasonable to argue that the vowel is the carrier of the quantity opposition. Previous investigations have shown that the ratio V:/V is considerably larger than the C:/C ratio. Fintoft (1961) measured vowel and consonant durations in isolated Norwegian logatomes. He reported a V:/V ratio of approximately 1.9 (varying between 1.7 and 2.1 depending on the nature of the following consonant as a fricative, nasal or liquid). In contrast, the duration ratio of medial long vs. short consonants amounted only to approximately 1.3 (varying between 1.2 and 1.4). Quite similar relations were found by Behne, Moxness and Nyland (1996) through their measurements of the durations of long and short vowels preceding voiced and voiceless plosives in Norwegian sentence-embedded words. From the data presented in their Figure 1, average duration ratios (pooled across voiced and voiceless plosives) of 1.8 and 1.3 can be calculated for V:/V and C:/C, respectively. Also, results on the perception of a long vs. short vowel followed by a voiceless stop by van Dommelen (1999a) suggest that vowel duration is a far more important cue for the perception of vowel quantity than the consonant (cf. also Krull, Traunmüller and van Dommelen, 2003). In our study we thus address the question of how users of Norwegian as a second language realize the VC: and V:C dyads. A point of particular interest will be the kind and amount of variation in the L2 productions. If the variation in vowel and consonant durations is relatively limited, we might be able to detect deviation patterns that are characteristic for L2 user groups from specific language backgrounds. Larger variation, on the other hand, could obscure such possible patterns and render it difficult to draw firm conclusions about typical deviations from Norwegian reference values and differences between the realizations from the L2 speaker groups. The second part of our investigation (Section 4) is concerned with speech rhythm in L2 compared with L1 speech. In recent studies attempts have been made to classify languages according to rhythmical categories using various metrics. To investigate rhythm characteristics of eight languages, Ramus, Nespor and Mehler (1999) calculated the average proportion of vocalic intervals and the standard deviation of vocalic and consonantal intervals over sentences. Though their metrics appeared to reflect aspects of rhythmic structure, considerable overlap was also found. Grabe's Pairwise Variability Index (PVI; see Section 4.1) is a measure of differences in vowel duration between successive syllables and has been used by, e.g., Grabe and Low (2002), Ramus (2002) and Stockmal, Markus and Bond (2005). In order to achieve more reliable results, Barry et al. (2003)
proposed to extend existing PVI measures by taking consonant and vowel intervals together. In her 2003 study Gut compared the speech of learners of German with English, Chinese and Romance languages as L1 with the speech of two native speakers of German. For utterances produced by these speakers she used a Rhythm Ratio (RR) to explore the temporal organization of subsequent syllables. Though the Romance language speakers produced syllables that tended to be of more similar duration than those from the German speakers, the difference did not achieve statistical significance. Nor did the RR values for the English and Chinese subjects differ significantly. The present study approaches the issue of language-specific speech rhythm indirectly by comparing the temporal structure of L2 utterances with similar utterances produced by native speakers. More specifically, we will use different measures derived from the sequences of syllables in utterances and use a discriminant analysis to explore whether those measures can be related to the different L2 groups investigated. For the present purposes, the main function of a discriminant analysis is the following. The first step is to define a number of variables (here mean syllable duration, durational differences between consecutive syllables, etc.; a complete description is given in Section 4.1). Secondly, these variables are entered into the analysis together with an a priori classification which in our case represents the six different groups of L2 users. The discriminant analysis then uses the variables to classify the input data into groups, importantly without any prior information about the predefined groups. The output of the analysis tells the user which of the variables entered into the analysis contributed significantly to the statistical grouping. The most interesting question for us is to see to which degree the purely statistical grouping of the data is in congruence with the user-defined classification according to L2 user groups. A reasonably large degree of agreement between the two classifications will indicate that the chosen measures capture relevant aspects of L1-influenced speech rhythm.
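As a concrete illustration of this workflow, the following minimal sketch runs a linear discriminant analysis with scikit-learn; the feature values and group labels are invented placeholders, and the two features used here merely stand in for the seven measures defined in Section 4.1.

```python
# Minimal sketch of the discriminant-analysis workflow described above.
# All numbers are invented placeholders; only the procedure is illustrated.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# One row per utterance: [mean syllable duration (ms), speech rate (phonemes/s)]
X = np.array([
    [286, 10.2], [278, 10.5], [250, 11.0],   # utterances by Chinese L1 speakers
    [196, 13.1], [201, 12.8], [190, 13.4],   # utterances by German L1 speakers
    [176, 14.0], [180, 13.8], [172, 14.2],   # utterances by Norwegian speakers
])
y = np.array(["Chinese"] * 3 + ["German"] * 3 + ["Norwegian"] * 3)  # a priori groups

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

# Compare the purely statistical grouping with the predefined classification
predicted = lda.predict(X)
print(f"correctly classified: {np.mean(predicted == y):.1%}")
```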
2. Subjects, speech material and segmentation
A total of 37 subjects served as speakers in this study, divided into the following seven groups. There were six second language speaker groups with the following L1s (number of speakers in parentheses): Chinese (7), English (4), French (6), German (4), Persian (6) and Russian (4). In an attempt
to collect speakers having approximately the same level of proficiency in Norwegian, most of the speakers were recruited from a Norwegian course offered at the Department of Language and Communication Studies (NTNU, Trondheim). Six native speakers of Norwegian served as a control group. The speech material used was chosen from existing recordings made for the Language Encounters project (see Acknowledgement). The recordings were made in the department's sound-insulated studio and were subsequently stored with a sampling frequency of 44.1 kHz. The material comprises readings of a short text, 120 different sentences and some spontaneous speech. Since the sentences have been designed to contain all the Norwegian phonemes and relevant VC: and V:C dyads, this part of the material was considered most suited for a systematic investigation and, therefore, a number of sentences were selected for the present study. For the first part of our investigation, eight different sentences were chosen containing words with the short vowels /a/ and /ø/ (two sentences each) and their long counterparts /a:/ and /ø:/ (two sentences each), all followed by the voiceless plosive /t/. For the second part, ten different sentences were selected containing between 9 and 15 syllables (mean of 12.2 syllables). The total number of utterances investigated was thus 37 (subjects) x 8 (utterances) = 296 for the first part, and 37 x 10 = 370 for the second part. The segmentation of the speech material was done by visual and auditory inspection of the waveform and the spectrogram of the speech signal using Praat (Boersma and Weenink, 2006). Figure 1 shows an example of how vowel and consonant durations were measured for part 1. The test word under scrutiny is møtte ([ˈmøtːə] ‘met’). Determining the starting point of the VC: dyad (the transition from the nasal to the vowel) is a relatively straightforward task. The end of the dyad was set at the beginning of the schwa, i.e. at the end of the postaspiration. In contrast to these two points in time, defining the exact end of the vowel (= the start of the intervocalic plosive) is not a trivial task. The Norwegian speaker shown in the figure has produced preaspiration, the realization of which can vary but which is here characterized by a short vowel portion with breathy voice quality followed by a voiceless friction phase. Segmentation of the present speech material always followed the convention illustrated here, i.e. defining preaspiration (if any) as the sum of the breathy part and the friction (both of which can be absent).
Figure 1. Waveform (top) and spectrogram (bottom, 0–5000 Hz) of the word møtte ([ˈmøtːə]) spoken by a female Norwegian subject. Indicated are (1) vowel, (2) preaspiration (breathy vowel + voiceless friction), (3) occlusion, and (4) postaspiration.
The segmentation for part 2 consisted of dividing the utterances into syllables and determining their durations. Syllabification was guided primarily by the consideration of achieving consistent results across speakers and utterances. In words containing a sequence of a long vowel and a short consonant in a context like V:CV (e.g., fine [ˈfiːnə] ‘nice’) the boundary was placed before the consonant (achieving fi-ne); after a short vowel plus long consonant, as in minne ([ˈminːə] ‘memory’), it was placed after the consonant (minn-e). Only when the intervocalic consonant was a voiceless plosive was the boundary always placed after the consonant (e.g. in mat-et ‘fed’).
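To make the duration measurements concrete, the following sketch shows how segment durations and the V:/V and C:/C ratios discussed in Section 1 could be computed from labelled intervals of the kind obtained with Praat; the interval boundaries are invented for illustration.

```python
# Sketch: segment durations and quantity ratios computed from labelled
# intervals (label, start, end) such as those exported from a Praat TextGrid.
# All times (in seconds) are invented for illustration.

intervals = [
    ("V:", 0.10, 0.26), ("C", 0.26, 0.36),    # e.g. from mate [ˈmaːtə] (V:C)
    ("V", 0.60, 0.69), ("C:", 0.69, 0.82),    # e.g. from matte [ˈmatːə] (VC:)
]

durations = {}
for label, start, end in intervals:
    durations.setdefault(label, []).append(end - start)

mean = lambda values: sum(values) / len(values)
print(f"V:/V = {mean(durations['V:']) / mean(durations['V']):.2f}")   # long/short vowel
print(f"C:/C = {mean(durations['C:']) / mean(durations['C']):.2f}")   # long/short consonant
```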
3. Duration patterns of vowels and consonants
This section describes the results from the first part of our investigation, dealing with the temporal structure of VC: and V:C dyads as produced by both L2 and L1 speakers of Norwegian. After the inspection of mean durations in 3.1, Section 3.2 looks into the phenomenon of preaspiration, which was not only produced by the Norwegian natives but also by a group of L2
users. In Section 3.3 the problem of variation is dealt with and it is argued that the interpretation of the empirical data to a large degree depends on the perspective chosen for the evaluation.

3.1. Mean vowel and consonant durations

Figure 2 depicts mean segment durations for VC: (Figure 2a) and V:C (Figure 2b) dyads as produced by the six groups of L2 users and the Norwegian control group. The first thing to note for the latter group is a relatively long preaspiration (breathy vowel + friction) in both VC: (35 ms) and V:C (29 ms). Traditionally, the occurrence of preaspiration is considered to be restricted to a few dialectal variants, and our present speakers (with dialect backgrounds from South-East Norwegian and the Trøndelag region) do not belong to them. Our data therefore suggest that preaspiration occurs more frequently in Norwegian than is usually assumed, and they confirm similar findings from previous studies (van Dommelen, 1999b). Further, calculation of duration ratios for V:/V and C:/C (where preaspiration is included in the consonant) achieved values of 2.24 and 1.19, respectively. Fintoft's (1961) material did not include postvocalic stops (see Section 1), so that a direct comparison with his results is precluded. (In this connection it should be noted that preaspiration only occurs with voiceless stops.) By and large, Fintoft's mean values of 1.9 for V:/V and 1.3 for C:/C can be said to be not too different. The same conclusion can be drawn concerning Behne, Moxness and Nyland's (1996) ratios of 1.8 and 1.3 (averaged across voiced and voiceless plosives; see Section 1). In their description no mention is made of the occurrence of preaspiration, such that their data are inconclusive in this respect. As to the productions by the L2 speakers, Figure 2 shows that the deviations of the vowel and consonant durations from the L1 reference values are not as large as possibly expected. This is especially true for the V:C dyad. A more systematic pattern was found for the short vowel, which was produced with longer durations than the Norwegian mean value by all L2 speaker groups. To see how the second language users master the V:/V quantity opposition, let us compare their V:/V ratios with the value measured for the reference group, which amounts to 2.24 (i.e., for V excluding preaspiration). While the German, English and Russian speakers had relatively high ratios (1.53, 1.38, and 1.38, respectively), the remaining groups (Chinese, French, and Persian) had less clear durational contrasts (values of 1.07, 1.08, and 1.04, respectively). It may seem somewhat surprising that the Russian speakers pattern with the German and English groups though Russian lacks a vowel quantity/tenseness contrast as present in the L1s of the latter. An explanation could be sought in the Russian word stress system, where vowels are lengthened when stressed (cf. Svetozarova 1998). This means that the speakers are at least familiar with conditioned vowel duration. In contrast to the apparently rather regular behaviour of the present Russian subjects, Markus and Bond (1999) report difficulties for Russian talkers in employing duration as a correlate of vowel quantity in Latvian. Similarly, the Russian L2 speakers of Latvian in Bond, Markus and Stockmal (2003) inappropriately produced short vowels with lengthening and failed to reach appropriate durations for long vowels. Prompted by these seemingly diverging results, we inspected our present data more closely. This inspection showed that the behaviour of the Russian speakers as a group is less appropriate after all. One of the subjects produced very long V: durations (mean value of 245 ms) while the other three Russian speakers produced much shorter durations (mean value of 113 ms). Given respective mean durations of 131 ms and 97 ms for the short vowels, the V:/V ratio was remarkably high for the former (1.87) and much lower for the latter (1.16). This result thus demonstrates the issue of variation within a group, and so it may be worthwhile to have a closer look at the data and to investigate to what extent variation of individual segment durations can tell us more about L2 performance. Before doing this, however, we will focus on the transition of the vowel into the stop as produced by the English subjects, i.e. preaspiration.

Figure 2. Mean segment durations (in ms) in words containing /a(:)/ and /ø(:)/ followed by a voiceless plosive spoken by different speaker groups (Chinese, English, French, German, Persian, Russian, and Norwegian). (a): V, preaspiration (PA) and C:; (b): V:, preaspiration, C.
3.2. Preaspiration

A detail deserving our attention is the fact that the speakers with an English background produced relatively strong preaspiration in both VC: and V:C context (with respective mean durations of 62 ms and 50 ms, even longer than the productions of the Norwegian control group: 35 ms and 29 ms, respectively). For some other groups (French, German, Persian, and Russian) short preaspiration portions were measured, but their short durations indicate that we are dealing with presumably physiologically conditioned transitions from the vowel into the stop. For the English speakers, however, the question is how to explain their substantial production of preaspiration. According to impressionistic observation, the proficiency in Norwegian pronunciation of this speaker group was not notably higher than for most of the other L2 groups. It seems, therefore, improbable that the English group had acquired the production of preaspiration through intensive learning and contact with Norwegian. According to occasional inspection of English speech material, preaspiration appears to occur in this language as well. Interestingly, the production of preaspiration seems to have gone unnoticed in the literature. At least, investigation of some textbooks on English phonetics reveals that the feature of preaspiration is not mentioned and, therefore, does not belong to the catalogue of relevant characteristics. For example, in her workbook on the pronunciation of English Kenworthy (2000) deals with aspiration, but not with preaspiration. The same is true for the introduction to phonetic science by Ashby and Maidment (2005). Ladefoged and Maddieson (1996:70–73) discuss the occurrence of the phenomenon of preaspiration in well-known examples such as Icelandic, Scottish Gaelic and Faroese but are silent on the (possible) production of preaspiration in English. Also the very detailed account of spoken English by Shockey (2003) only includes aspiration. Further, the absence of preaspiration (in contrast to postaspiration) in the textbook on English phonetics for Norwegian students by Davidsen-Nielsen (1996) suggests that this phenomenon does not have a very prominent position in practical phonetics in Norway. Based on the present results it might seem worthwhile for future research to have a closer look at the occurrence of preaspiration in English. It is not impossible that this feature plays a certain role in English speech sound production but until now has escaped our notice.
3.3. The problem of variation

Inspection of mean segment durations can tell us a good deal about how second language users master the durational contrast long/short vowel in V:C vs. VC: dyads. But, as indicated above in Section 1, we will also have to take into account that individual tokens will to a lesser or larger extent vary around the mean. Rather differently distributed duration values can result in the same average. Table 1 gives one of the most usual measures describing dispersion, namely the standard deviation (this measure was not included in Figure 2 in order to avoid overloading the picture). It can be seen from the table that the native speakers produced durations with relatively small variations (standard deviation for the vowels on average 15 ms, for the consonants 32 ms). For the group of L2 speakers as a whole, higher values were found (pooled across the six groups 36 ms and 46 ms, respectively). Taking averages pooled across all four conditions of long/short vowel and long/short consonant as a measure, the German group was most consistent in their productions (mean standard deviation of 24 ms), while the mean values for the other groups were rather similar to each other (lying between 42 ms for the French and 48 ms for the Chinese subjects). One might wonder whether this rather strong degree of similarity can be interpreted as similar L2 behaviour or whether other perspectives could supply us with additional useful information.

Table 1. Standard deviations (in ms) for vowels and consonants in words containing /a(:)/ and /ø(:)/ followed by a voiceless plosive spoken by different speaker groups. n = number of tokens.

      Chinese   English   French   German   Persian   Russian   Norwegian
n        28        16        24       16       24        16         24
V:       48        26        37       20       36        67         20
C        45        62        44       20       45        23         28
V        38        27        25       30       41        38         10
C:       61        55        61       27       51        53         34
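Summary statistics of the kind shown in Table 1 can be tabulated directly from the per-token duration measurements, for example with pandas; the few token values below are invented placeholders.

```python
# Sketch: per-group standard deviations of segment durations, analogous to
# Table 1. The token values are invented placeholders.
import pandas as pd

tokens = pd.DataFrame({
    "group":       ["Norwegian", "Norwegian", "German", "German", "Chinese", "Chinese"],
    "segment":     ["V:", "V:", "V:", "V:", "V:", "V:"],
    "duration_ms": [205, 225, 190, 215, 160, 250],
})

summary = tokens.groupby(["group", "segment"])["duration_ms"].agg(["mean", "std", "count"])
print(summary)
```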
Figure 3. Vowel and consonant durations (in ms) in words containing /a(:)/ and /ø(:)/ followed by a voiceless plosive spoken by native speakers of (a) Norwegian (vowel duration does not include preaspiration), (b) Chinese, (c) French, and (d) German. Each data point represents one token (consonant duration plotted against vowel duration, with separate symbols for V:C and VC: dyads).
To answer this question we will demonstrate how one can obtain a more informative impression of variation in production through a graphic representation depicting the durational relationships of vowels and consonants in VC: and V:C. Figure 3 illustrates this for a selection of four speaker groups (native speakers of Norwegian, Chinese, French, and German). As can be seen from Figure 3a, the durations of the segments in the V:C and VC: dyads produced by the Norwegian speakers fall into distinct categories. This is more than might have been expected, because the test words occurred in different positions in the utterances (utterance-medial and final) and no attempt was made in the evaluation to normalize for speech rate. Further, it can easily be seen that the main durational correlate of the VC: - V:C opposition is the vowel. Consonant durations for the two members of the opposition pair overlap to a large degree. In stark contrast with the two distinct categories found for the natives, the Chinese speakers' performance is characterized by almost complete overlap (Figure 3b). There is almost no distinction between the durations of V and V: as well as C: and C. Though the values for the French group (Figure 3c) show less over-
lap, these speakers did not realize clearly distinct categories either. Presumably due to the lack of a vowel quantity opposition in French, both V and V: have relatively short durations. At the same time, consonant duration is not being used to distinguish between the two dyads. Finally, the German speakers (Figure 3d) handle the VC: - V:C distinction more like the Norwegian natives. In spite of a certain overlap in vowel durations, a tendency to distinguish two categories can be noticed.
4. Quantification of rhythm
This section deals with the second part of our study of timing in L2 speech production, namely the question of whether speakers from different language backgrounds produce different speech rhythms and whether typical rhythmical properties can be quantified. To that end, Section 4.1 presents seven measures related to speech rhythm that have been used in a discriminant analysis. Section 4.2 presents the results for a central measure, mean syllable duration. In the last section (4.3) the results of a discriminant analysis are presented, showing that aspects of speech rhythm can in fact be captured by some of the measures presented here.

4.1. Definition of measures

To compare the temporal structure of the L2 utterances with the L1 reference utterances, seven different types of measures were defined. In all cases calculations were related to each of the seven groups of speakers as a whole. The first measure was syllable duration averaged over all syllables of each utterance, yielding one mean syllable duration for each sentence and each speaker group, i.e. 7 (groups) x 10 (utterances) = 70 mean syllable durations in total (for all measures used in the discriminant analysis the total number of observations is n = 70). Second, the standard deviation for the syllable durations pooled over the speakers of each group was calculated for each of the single utterances' syllables. The mean standard deviation was then taken as the second measure, thus expressing the mean variation of syllable durations across each utterance. Figure 4 may illustrate this for the 10-syllable sentence To barn matet de tamme dyrene (‘Two children fed the tame animals’) produced by the Chinese and the Norwegian speaker group. In this figure, vertical bars indicate ± 1 standard deviation. The
mean of the ten standard deviation values represents the second measure as defined above (for Norwegian 27 ms; for Chinese 63 ms). Figure 4 may also serve as an example illustrating the definition of the third and fourth measure. For the Norwegian reference group, mean syllable durations are indicated by closed symbols and ranked in ascending order. Similarly, open symbols depict the durations for the same syllables produced by the group of seven Chinese speakers. Note that the order of the syllables is the same as for the Norwegian natives. Also indicated are regression lines fitted to the two groups of data points. The correlation coefficient for the relation between syllable duration and the rank number of the syllables as defined by the Norwegian reference is the third measure in this study. The higher this correlation coefficient, the better agreement between the overall temporal organization of the syllables and the Norwegian reference. For the Chinese speaker group presented in the figure the value is relatively low: r = 0.541. Further, the slope of the regression line was taken as the fourth measure (here: 18.7). As illustrated in Figure 4, the measures three and four will contain information about the joint duration pattern of the syllables in an utterance. In the example it is obvious that the pattern produced by the Chinese subjects is rather different from the Norwegian reference.
Figure 4. Mean duration of syllables (in ms) in a Norwegian utterance ranked according to increasing duration for six native speakers (closed symbols with regression line). Open symbols indicate mean durations for a group of seven Chinese subjects with syllable rank as for the L1 speakers. Vertical bars indicate ± 1 standard deviation.
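Measures three and four – the correlation coefficient and the regression slope relating a group's syllable durations to the rank order defined by the native reference – can be computed as in the following sketch; all duration values are invented for illustration.

```python
# Sketch of measures 3 and 4: correlate a group's mean syllable durations
# with the syllable rank order defined by the native reference durations,
# and take the slope of the fitted regression line. Values are invented.
import numpy as np
from scipy.stats import linregress

# Mean syllable durations (ms) for one 10-syllable utterance.
native = np.array([ 90, 110, 120, 130, 150, 160, 180, 200, 230, 260])
l2     = np.array([180, 200, 170, 220, 210, 250, 240, 300, 280, 330])

# Rank of each syllable according to the native durations (1 = shortest).
rank = np.argsort(np.argsort(native)) + 1

fit = linregress(rank, l2)
print(f"measure 3 (r)     = {fit.rvalue:.3f}")
print(f"measure 4 (slope) = {fit.slope:.1f} ms per rank step")
```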
As measure number five, speech rate was chosen, defined as the number of (actually produced) phonemes per second. This yielded one single value per utterance and speaker group, i.e. again a total of n = 70 values. Subsequently, as the sixth measure, the standard deviation belonging to the speech rate value was computed for each utterance. This standard deviation was calculated across the speakers of each group and thus indicates the degree to which mean speech rate varied within a group. Finally, the seventh measure was the normalized Pairwise Variability Index (nPVI) as used by Grabe and Low (2002):
(1)   nPVI = 100 × [ Σ_{k=1}^{m−1} |d_k − d_{k+1}| / ((d_k + d_{k+1}) / 2) ] / (m − 1)
In this calculation the difference of the durations (d) of two successive syllables is divided by the mean duration of the two syllables. This is done for all (m − 1) successive syllable pairs in an utterance (m = the number of syllables). Finally, by dividing the sum of the (m − 1) amounts by (m − 1), a mean normalized difference is calculated and expressed as a percentage; a short computational sketch of this index is given after the following list. For the convenience of the reader the present measures are repeated below:

1. mean syllable duration
2. standard deviation for syllable durations
3. correlation coefficient
4. slope of regression line
5. mean speech rate
6. standard deviation for speech rate
7. nPVI
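The sketch below is a direct implementation of the nPVI in (1), applied to a sequence of syllable durations; the duration values are invented.

```python
# Sketch: normalized Pairwise Variability Index (nPVI) of equation (1),
# computed over a sequence of syllable durations. Values are invented.

def npvi(durations):
    """100 * mean of |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2) over all pairs."""
    pairs = zip(durations[:-1], durations[1:])
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(terms)

print(npvi([120, 180, 90, 210, 110, 160]))  # approx. 57 for this invented sequence
```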
4.2. Results: Mean syllable duration

Since the main temporal unit under scrutiny is the syllable, let us first see whether and to what extent the various speaker groups produced different syllable durations. As can be seen from Table 2, mean syllable durations vary substantially. Shortest durations were found for the natives (176 ms), while the subjects with a Chinese L1 produced the longest syllables
(286 ms). The other groups have values that are more native-like, in particular the German speakers with a mean of 196 ms. For all speaker groups the standard deviations are quite large, which is due to both inter-speaker variation and the inclusion of all the different types of syllables. (Note that the standard deviation described here was computed across all single tokens – e.g., for the Chinese n = 837 – and thus differs from the second measure defined above in Section 4.1.) According to a one-way analysis of variance, the overall effect of speaker group on syllable duration is statistically significant (F(6, 4490) = 97.841; p < .0001). In order to obtain information about differences between syllable durations for all possible pairs of language groups, a Games-Howell post-hoc analysis was performed (a computational sketch of this kind of comparison is given after Table 2 below). The result showed that only the difference between the two mean durations for the English group (222 ms) and the Russian group (216 ms) was non-significant. All the remaining differences turned out to be statistically significant at a significance level of p = 0.05. Therefore, it can be concluded that the measure mean syllable duration captured characteristic differences between the speaker groups. Here one might raise the question of how to explain the differences in mean syllable duration. They need not necessarily be due to L1-dependent behaviour but could reflect differences in speech rate correlating with the subjects' general performance level in Norwegian. A possible approach to investigating this issue could be to collect and analyze speech material from the present speaker groups for their respective L1s. But firstly, due to the considerable research efforts needed, we have until now had to refrain from such an enterprise. Secondly, though L2 performance certainly is affected by L1-specific factors, we cannot assume a linear transfer of temporal patterns from L1 to L2. Nevertheless, previous investigations of temporal similarities and dissimilarities between different languages can provide us with a frame of reference. Delattre (1966) compared syllable durations in English, German, French and Spanish. His material consisted of five minutes of spontaneous speech produced by one native speaker of each of these languages. Conditioning factors were syllable weight (stressed/unstressed), place (final/non-final) and type (open/closed). Mean durations of final, stressed closed/open syllables turned out to be longer for English (408 ms/335 ms) than for German (362 ms/298 ms) and French (341 ms/246 ms). For non-final syllables rather small differences between English (259 ms/192 ms) and German (246 ms/197 ms) were found (note that in French stressed syllables occur only in final position). Unstressed non-final closed/open syllable durations showed a reversed order for the three languages: French
(192 ms/137 ms) > German (175 ms/132 ms) > English (155 ms/120 ms). These results indicate that the impact of syllable weight, place and type differs considerably between languages and that it could be worthwhile to look into the more complex matter of speech rhythm rather than average syllable durations. In particular, it should be kept in mind that the values presented in Table 2 represent averages across all three conditions of stress, position and type, which reduces the possibility of comparing results. Roach (1982) measured syllable durations in samples of spontaneous speech produced by one native speaker each of three so-called syllable-timed languages (French, Telugu and Yoruba) and three stress-timed languages (English, Russian and Arabic). He does not present absolute syllable durations but gives their standard deviation as a measure of variability. The hypothesis of more variable durations in stress-timed languages is not borne out by the data: rather similar values were found for ‘stress-timed’ English (86 ms) and Russian (77 ms) on the one hand and ‘syllable-timed’ French (75.7 ms) on the other. The data presented in Table 2 are in line with this outcome, the standard deviation for French (101 ms) being comparable to that for English (106 ms) and Russian (107 ms) and even larger than for German (87 ms). Section 4.3 will take up the issue of speech rhythm and investigate whether the measure of syllable duration and the other six measures mentioned above contain sufficient speech rhythm information to classify the utterances according to their membership of the different groups.

Table 2. Mean syllable durations and standard deviations in ms for six groups of L2 speakers and a Norwegian control group. Means are across ten utterances and all speakers in the respective speaker groups.

        Chinese   English   French   German   Persian   Russian   Norwegian
mean      286       222       241      196      258       216        176
sd        113       106       101       87      107       112         86
n         837       489       731      488      732       488        732
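The group comparison reported in Section 4.2 (a one-way ANOVA over syllable durations, followed by pairwise post-hoc tests) can be reproduced along the following lines; the duration samples are invented placeholders, and the Games-Howell step is only pointed to in a comment because it is not part of scipy.

```python
# Sketch: one-way ANOVA of syllable duration by speaker group, analogous to
# the analysis reported in Section 4.2. The samples are invented placeholders.
from scipy.stats import f_oneway

durations_by_group = {
    "Chinese":   [310, 270, 295, 280, 265],
    "German":    [190, 205, 185, 200, 210],
    "Norwegian": [170, 180, 165, 185, 175],
}

f_stat, p_value = f_oneway(*durations_by_group.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# For pairwise comparisons that do not assume equal variances, a Games-Howell
# post-hoc test (as used in the study) is available in third-party packages
# such as pingouin; scipy itself does not provide one.
```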
4.3. Discriminant analysis

In order to investigate whether rhythmical differences between utterances from the different speaker groups can be captured by the seven measures
defined above, a discriminant analysis was performed. Before going into the question of the possible contribution of the different measures, let us see how the statistical analysis classified the 70 utterances. The results are presented in Table 3. Here it can be seen that in the majority of cases the L2-produced utterances were correctly classified. The overall correct classification rate amounts to 92.9%. All utterances produced by the Chinese, German, Persian and Russian speakers were classified in accordance with their actual L1 group membership. Of the ten utterances from the English group, one utterance was classified as French and one as German-produced. One utterance from the French subjects was confused with the category English. The classification of two utterances from the Norwegian reference group as German confirms the native-like temporal structure of the speech produced by the Germans (Section 3.3).

Table 3. Predicted L1 group membership (percent correct) of ten utterances according to a discriminant analysis using seven measures (see Section 4.1).

                        Predicted L1 group membership
L1 group      Chinese  English  French  German  Persian  Russian  Norwegian
Chinese         100        0       0       0        0        0         0
English           0       80      10      10        0        0         0
French            0       10      90       0        0        0         0
German            0        0       0     100        0        0         0
Persian           0        0       0       0      100        0         0
Russian           0        0       0       0        0      100         0
Norwegian         0        0       0      20        0        0        80
We will now turn to the contribution of the present measures to this classification. The discriminant analysis was performed stepwise, which means that variables are entered one after another as long as they contribute significantly to the model. It turned out that four of the seven measures achieved statistical significance (in order of entrance):

• Measure 1: mean syllable duration
• Measure 6: standard deviation for speech rate
• Measure 3: correlation coefficient
• Measure 5: mean speech rate
This outcome suggests that three types of temporal information can be distinguished. First, the correlation measure contains information about the overall patterning of syllable durations. Second, measures 1 and 5 both reflect speech rate. Finally, measure 6 captures aspects of variation in speech rate. It seems obvious that the information contained in measures 1 and 5 could overlap to a large degree, or even that including one of them could make the other redundant. In order to get an impression of these two measures' role, the discriminant analysis was run again without measure 1, mean syllable duration. This lowered the classification rate from originally 92.9% to 81.4%. Doing the same thing for measure 5, mean speech rate, resulted in an overall rate of 91.4%. These percentages suggest that the two measures indeed contain redundant information, mean syllable duration having the most predictive power. Though the present analysis has succeeded in classifying the seven different speaker groups according to their respective language backgrounds, the issue of L1-specific speech rhythm is far from solved. Specifically, in interpreting the results one should take into consideration that speech rate and rhythm measures have been shown to co-vary. For example, Dellwo and Wagner (2003) demonstrated that the standard deviation of consonantal intervals as used by Ramus et al. (1999) is heavily speech rate dependent. A similar conclusion was drawn by Barry et al. (2003), among other things with regard to Grabe and Low's (2002) PVI measures for vowels and consonants. It is conceivable that the differences in speech rate for the present speaker groups are only partly language-dependent and vary mainly with the speakers' general skills in Norwegian.
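The redundancy check just described – re-running the classification without measure 1 or measure 5 and comparing classification rates – amounts to a leave-one-feature-out comparison, sketched below with invented data.

```python
# Sketch: compare classification rates with and without one measure,
# analogous to the redundancy check described above. Data are invented.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
groups = np.repeat(["Chinese", "German", "Norwegian"], 10)
# Columns: measure 1 (mean syllable duration), measure 5 (speech rate),
# measure 3 (correlation with native rank order) -- invented group centres.
centres = {"Chinese": [286, 10.0, 0.6], "German": [196, 13.0, 0.9], "Norwegian": [176, 14.0, 1.0]}
X = np.array([centres[g] for g in groups]) + rng.normal(0, [15, 0.5, 0.05], size=(30, 3))

def classification_rate(features, labels):
    lda = LinearDiscriminantAnalysis().fit(features, labels)
    return np.mean(lda.predict(features) == labels)

print(f"all measures:      {classification_rate(X, groups):.1%}")
print(f"without measure 1: {classification_rate(X[:, 1:], groups):.1%}")  # drop column 0
```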
5. Conclusions
The goal of the present study has been to shed some light on temporal aspects of Norwegian spoken as a second language. In general, it could be shown that speakers from six different native-language backgrounds produced patterns, at the level of single vowels and consonants as well as of syllables, that differed from the Norwegian reference. In its generality this is, of course, a result that could be expected. Going into more detail, a central question is to what extent the data revealed deviation patterns that could be characteristic for the different speaker groups involved, i.e. depending on their respective native languages. To answer this question, data from measurements on the temporal structure of dyads VC: and V:C were evaluated in different
ways. From the average durations for each of the elements under scrutiny (V, V:, C, C:) it was not easy to detect any systematic differences between L2 and L1 productions. More informative in this respect was the duration ratio V:/V. Here, there was a tendency for speakers from languages closer to the target language to have somewhat more native-like values. This tendency was, however, not very clear. On the one hand, the Russian speakers performed similarly to the German and the English speakers, which does not seem to be in congruence with the degree of language family membership. On the other hand, the French subjects' ratios deviated more from the Norwegian reference and were, possibly somewhat surprisingly, similar to those for the Chinese and Persian speakers. The most revealing perspective to evaluate and interpret the present data was to inspect how the durations of the long and short vowels and consonants relate to each other and what the duration patterns for the classes of VC: vs. V:C look like. Most native-like performance was found for the German speakers, thus confirming the previously observed tendencies for this group. While the data for the French subjects seemed to reflect the lack of vowel quantity in their native language, the Chinese speakers showed considerable scatter and so failed to systematically distinguish between the VC: vs. V:C categories. A fundamental problem in interpreting data like those from the present study is the complexity of the factors that contribute to the measurable output. First of all, there is at present no model to predict what kind of interference phenomena can be expected. Among current models, Flege's (1995) model can be used to make global predictions, but it seems difficult to make predictions about specific deviations. Apart from these L1 influences, which at least in principle could be predicted, there are many further contributing factors at the individual level: duration and intensity of contact with the second language, degree of familiarity with other languages, formal training in L2, education level, family situation as to the use of one, two or even more languages, motivation to learn a new language – just to mention some. All these factors contribute to obscure possible systematic effects to different degrees. The results for the VC: vs. V:C dyads confronted us with the kind of interpretation difficulties mentioned above. Here, it has become clear that purely phonological reasoning cannot explain the data satisfactorily. The performance of the Russians was more native-like compared to the productions of the Chinese though in both native languages vowel quantity is absent. Further, it is difficult to give more than a rather superficial explanation of the substantial variation in the performance of the Chinese, saying
that it reflects the pronunciation difficulties they encounter. It is conceivable that the observed variation is to a certain extent caused by uncertainties in grapheme-to-phoneme conversion during reading. All this does not mean, however, that the present results are without practical implications. For example, in the teaching of Norwegian pronunciation to German-speaking target groups there will presumably not be much need to focus on issues related to vowel quantity. Consequently, more time would be available to emphasize other aspects. For learners with French as their L1, it seems useful to make speakers aware of the long durations necessary to produce phonologically long vowels appropriately. At the same time, the complementary consonant duration differences in the VC: vs. V:C opposition should be brought to the learners' attention. Learners with a more distant language background, like the present Chinese speakers, can be expected to need, and to profit from, very thorough instruction in the temporal aspects of Norwegian. An unexpected outcome of the measurements was the presence of preaspiration in Norwegian produced by native speakers of English. This finding demonstrates the potential usefulness of phonetic analyses for pronunciation teaching. Though in many cases the human ear is unsurpassed as an instrument for judging speech productions, some relevant details might escape our attention until they are revealed by an instrumental analysis. Instrumental methods may thus make us more aware of pronunciation phenomena and potentially contribute to improving teaching practice. In the present case of preaspiration, drawing the attention of learners of Norwegian to this detail of consonant production might help to make their pronunciation more authentic. Nowadays, with the help of the omnipresent computer and a freeware program like Praat, it does not require much specialist knowledge to integrate sound demonstrations into pronunciation teaching. In this way, learners could acquire a better understanding of all kinds of pronunciation aspects such as vowel reduction, assimilation, intonation or, in a language like Norwegian, such a notoriously difficult feature as the realization of tonal accents. As expected from the outset, the investigation of speech rhythm revealed different temporal patterns for the six speaker groups. It seems reasonable to ascribe the deviations at least partly to the influence of the respective native languages. With rhythm-related measures as input, a discriminant analysis classified L2 utterances according to their L1 membership with a relatively high degree of accuracy (92.9% correct). As to the relevance of the present measures of speech rhythm, only four out of the seven measures turned out to contribute significantly.
Probably most closely related to speech rhythm, the correlation coefficient measure seems to convey relevant information about the overall patterning of syllable durations. Two further relevant measures (mean syllable duration and mean speech rate expressed in phonemes per second) are both related to speech rate and appear to contain overlapping information. The fourth significant measure involved the variation in speech rate. It thus appears that a large portion of the information about the utterances' L1 membership originates from the rate of speech delivery. Since it is conceivable that speech rate does not represent an L1-specific factor but varies with the level of proficiency in the L2 in general, further research on this issue will be needed. At present, we can only speculate about the reasons why three of the measures do not seem to convey rhythm information. Finally, we would like to point out that the present measures were of an exploratory character and that some of them were possibly too crude to capture details of speech rhythm. Also, and presumably more importantly, operationalizing speech rhythm as the temporal organization of syllables constitutes a strong simplification which fails to do justice to the complex of interacting factors involved. It is hoped, however, that future efforts to study more aspects of speech rhythm, in both production and perception, will eventually lead to a better understanding of this phenomenon.
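To make the kind of procedure sketched above more concrete, the following fragment illustrates, in very schematic form, how per-utterance rhythm-related measures could be fed into a cross-validated linear discriminant analysis. It is not the analysis carried out in this chapter: the feature definitions, the synthetic data and the L1 labels are invented purely for illustration, and scikit-learn is assumed as the statistics library.

```python
# Illustrative sketch only: classifying utterances by L1 from duration-based
# rhythm measures. All data and feature definitions here are invented.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def rhythm_features(syllable_durations, phones_per_second):
    """Simple utterance-level measures derived from syllable durations (in s)."""
    d = np.asarray(syllable_durations, dtype=float)
    return [
        d.mean(),                          # mean syllable duration
        phones_per_second,                 # speech rate in phonemes per second
        d.std() / d.mean(),                # variability of syllable durations
        np.corrcoef(d[:-1], d[1:])[0, 1],  # correlation of adjacent syllable durations
    ]

# Hypothetical corpus: one feature vector per utterance plus the speaker's L1.
rng = np.random.default_rng(0)
X = np.array([rhythm_features(rng.gamma(4.0, 0.05, size=30), rng.normal(12, 2))
              for _ in range(60)])
y = np.repeat(["German", "French", "Chinese"], 20)  # hypothetical L1 labels

# Cross-validated classification of utterances by L1 membership.
accuracy = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
print(f"mean classification accuracy: {accuracy.mean():.1%}")
```

Since the toy data carry no genuine L1-specific signal, the printed accuracy will hover around chance level; the sketch only shows the shape of the computation, not the chapter's result.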
Acknowledgement
This research is supported by the Research Council of Norway (NFR) through grant 158458/530 to the project Språkmøter (Language Encounters). The speech material was developed and recorded by Snefrid Holm (Department of Language and Communication Studies, NTNU) as part of her PhD project. I would like to thank Rein Ove Sikveland (Department of Language and Communication Studies, NTNU) for the segmentation of the speech material.
References
Ashby, Michael and John Maidment 2005 Introducing Phonetic Science. Cambridge: Cambridge University Press.
Barry, William J., Bistra Andreeva, Michela Russo, Snezhina Dimitrova and Tanya Kostadinova 2003 Do rhythm measures tell us anything about language type? Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, 2693–2696.
Behne, Dawn, Bente Moxness and Anne Nyland 1996 Acoustic-phonetic evidence of vowel quantity and quality in Norwegian. Fonetik 96, Papers presented at the Swedish Phonetics Conference, Nässlingen, 29–31 May 1996. KTH (Royal institute of Technology), Speech, Music and Hearing. Quarterly Progress and Status Report, TMH-QPSR 2/1996, 13–16.
Boersma, Paul and David Weenink 2006 Praat: doing phonetics by computer (Version 4.4.11) [Computer program]. Retrieved February 23, 2006, from http://www.praat.org/.
Bond, Dzintra, Dace Markus and Verna Stockmal 2003 Prosodic and rhythmic patterns produced by native and nonnative speakers of a quantity-sensitive language. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, 527–530.
Dauer, Rebecca M. 1983 Stress-timing and syllable-timing reanalyzed. Journal of Phonetics 11, 51–62.
Davidsen-Nielsen, Niels 1996 English Phonetics. Translated and adapted for use in Norway by Barbara Bird and Per Moen. Oslo: Gyldendal Norsk Forlag A/S (Seventh impression).
Delattre, Pierre 1966 A comparison of syllable length conditioning among languages. International Review of Applied Linguistics 4, 183–198.
Dellwo, Volker and Petra Wagner 2003 Relations between language rhythm and speech rate. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, 471–474.
van Dommelen, Wim A. 1999a Auditory accounts of temporal factors in the perception of Norwegian disyllables and speech analogs. Journal of Phonetics 27, 107–123.
van Dommelen, Wim A. 1999b Preaspiration in intervocalic /k/ vs. /g/ in Norwegian. Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco, 2037–2040.
Fintoft, Knut 1961 The duration of some Norwegian speech sounds. Phonetica 7, 19–39.
Flege, James 1995 Second language speech learning: Theory, findings, and problems. In: Winifred Strange (ed.), Speech perception and linguistic experience: Issues in cross-language research, 233–277. Timonium: York Press.
Grabe, Esther and Ee Ling Low 2002 Durational variability in speech and the rhythm class hypothesis. In: Carlos Gussenhoven and Natasha Warner (eds.), Laboratory Phonology 7, 515–546. Berlin/New York: Mouton de Gruyter.
Gut, Ulrike 2003 Prosody in second language speech production: the role of the native language. Zeitschrift für Fremdsprachen Lehren und Lernen 32, 133–152.
Kenworthy, Joanne 2000 The Pronunciation of English: A Workbook. London: Arnold. (Co-published in the USA by Oxford University Press Inc., New York.)
Kristoffersen, Gjert 2000 The Phonology of Norwegian. Oxford: Oxford University Press.
Krull, Diana, Hartmut Traunmüller and Wim A. van Dommelen 2003 The effect of local speaking rate on perceived quantity: a comparison between three languages. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, 1739–1742.
Ladefoged, Peter and Ian Maddieson 1996 The Sounds of the World's Languages. Oxford: Blackwell Publishers Ltd.
Markus, Dace and Dzintra Bond 1999 Stress and length in learning Latvian. Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco, 563–566.
Ramus, Franck 2002 Acoustic correlates of linguistic rhythm: Perspectives. Proceedings Speech Prosody 2002, Aix-en-Provence (France), 115–120.
Ramus, Franck, Marina Nespor and Jacques Mehler 1999 Correlates of linguistic rhythm in the speech signal. Cognition 73, 265–292.
Roach, Peter 1982 On the distinction between ‘stress-timed’ and ‘syllable-timed’ languages. In: David Crystal (ed.), Linguistic controversies. Essays in linguistic theory and practice in honour of F.R. Palmer, 72–79. London: Edward Arnold.
Shockey, Linda 2003 Sound Patterns of Spoken English. Malden, USA: Blackwell Publishing Ltd.
Stockmal, Verna, Dace Markus and Dzintra Bond 2005 Measures of native and non-native rhythm in a quantity language. Language and Speech 48, 55–63.
Svetozarova, Natalia 1998 Intonation in Russian. In: Daniel Hirst and Albert di Cristo (eds.), Intonation Systems. A Survey of Twenty Languages, 261–274. Cambridge: Cambridge University Press.
Learner corpora in second language prosody research and teaching
Ulrike Gut
1. Introduction
This article addresses methodological issues in L2 prosody research and teaching and argues for a corpus-based approach in both areas. Current research methods in L2 prosody have a number of limitations. A survey of all empirical studies on L2 prosody published in the major international journals in second language acquisition (SLA) research in the past 25 years demonstrates that research in L2 prosody tends to be based on a relatively small database with a limited number of participants. Research on intonation, for example, is carried out with an average of 22.6 participants (range 2 to 75); research on word stress is based on even fewer participants, on average 7.7, ranging from 4 to 10. The analysis of the productions of only a few participants, however, precludes the study of variation between learners, for which representatively sized groups are necessary. Furthermore, empirical research on non-native prosody typically elicits data in a relatively controlled setting and is restricted to one speech style. Most studies base their investigations on readings of words and sentences. Arguments against experimentally elicited data are brought forward, for example, by Leather (1999), who argues that some phonological structures may be more susceptible to errors in an experimental setting and suggests that “observations from artificial speech tasks cannot always be extrapolated to natural conditions” (p. 32). Moreover, data analysis in L2 prosody research usually focuses on just one aspect of non-native prosody, such as a particular intonational structure. The relationship between different prosodic domains, however, is not investigated. Finally, studies rarely relate their findings to non-linguistic factors assumed to influence the acquisition of prosody in an L2. The only explanatory aspect of language learning under investigation has been the influence of the learners' native language on their L2 prosody. When factors such as age, motivation and speech style are analysed, only one of them is studied at a time. No longitudinal studies,
where speech is collected from the same individuals at multiple intervals over a period of time, have yet been carried out. Recently, it has been suggested that a corpus-linguistic approach should be introduced into research on language acquisition. It is widely argued that a corpus-based methodology can complement the current research methods in second language learning and possibly compensate for some of their weaknesses (Biber, Conrad, and Reppen 1994, Botley et al. 1996, Kettemann and Marko 2002, Granger, Hung and Petch-Tyson 2002, Sinclair 2004, Granger 2004). However, so far, corpus linguistics and second language research have mainly co-existed side by side and have not yet joined forces (cf. Hasselgard 1999). Due to the scarcity of learner speech corpora, the analysis of learner phonology or prosody has so far been impossible (cf. Nesselhauf 2004). The recently completed LeaP (Learning Prosody) corpus fills this gap by providing a fully annotated speech corpus of learner English and learner German. Apart from serving as a resource for empirical research, language corpora are increasingly used in the classroom, and the recognition of their pedagogical value is growing (e.g. Ghadessy, Henry and Roseberry 2001, Kettemann and Marko 2002, Granger, Hung and Petch-Tyson 2002, Sinclair 2004). It has been claimed that the application of corpora in the classroom supports inductive learning processes and the creation of language awareness in language students. By investigating corpora, students are stimulated to enquire and speculate about language structures and develop the ability to recognize language patterns. In corpus-based “data-driven learning”, for example, students have the opportunity to work as researchers by developing a research question and analysing it with real-language data. It has been suggested that activities based on a comparison between native and non-native corpus data enable language learners
– to focus on negative evidence and typical errors
– to train their ability to notice differences between native and non-native language use
– to increase their language awareness
By observing the errors learners typically and most frequently make, students might find it easier to become aware of the features of their own interlanguage and possibly stimulate a restructuring of their own language use and knowledge (e.g. Granger and Tribble 1998). Due to the scarcity of
learner speech corpora, the analysis of learner phonology or pronunciation in a classroom setting has so far been impossible (cf. Nesselhauf 2004). The aim of this article is to report on the advantages and new opportunities offered by the corpus-based approach in L2 prosody. Section 2 gives a brief overview of corpus linguistics, the various types of corpora that have been collected and the advantages of a corpus-based approach. In Section 3, the learner corpus LeaP is described. It serves as the basis of the analysis of non-native vowel reduction in both L2 English and L2 German (Section 4). Section 5 summarizes the findings of a preliminary study on the application of the LeaP corpus in language teaching. The implications of the results of the analysis for research in L2 prosody and for the teaching of prosody are discussed in Section 6.
2. Corpus linguistics
Corpus linguistics as a method for studying the structure and use of language can be traced back to the 18th century (Kennedy 1998: 13). Modern corpora began to be collected in the 1960s. In modern definitions, the term corpus is usually used to refer to a substantial collection of language texts or transcriptions of spoken language in electronic form (Biber, Conrad, and Reppen 1996: 4). McEnery and Wilson (2001) list representativeness, sufficient size, a machine-readable form and its function as a standard reference as typical requirements for a corpus. Representativeness refers to the requirement that the collection of speech data should be maximally representative of the aspect under investigation, that is, it should provide researchers with as accurate a picture as possible of the occurrence and variation of the phenomena under investigation. Modern corpora have to be machine-readable so that their purpose, the rapid (semi-)automatic analysis of large amounts of data, can be realized. The computer-based storage form furthermore allows the corpus to be enriched with annotations. In general, it is assumed that a corpus functions as a standard reference for the language or language variety it represents.
2.1. Types of corpora
Several types of corpora can be distinguished: Text corpora consist of collections of written samples of a language variety; speech corpora constitute
a collection of spoken samples of a language variety. The latter is often also referred to as a spoken language corpus. Text corpora are naturally not suited for the description and development of linguistic theory in the area of phonetics and phonology. Corpora can be unannotated or annotated. The term annotation refers to the enhancement of the primary data (audio or video recordings in the case of speech corpora) with various types of linguistic and non-linguistic information. Several types of linguistic annotation are in use; they include orthographic transcriptions, phonemic and prosodic transcriptions, part-of-speech tagging, semantic annotation, anaphoric annotation and lemmatization. For example, the content of a recording may be transcribed orthographically, and an additional phonetic transcription may be carried out. Non-linguistic corpus annotations usually consist of meta-data, i.e. additional information about the corpus or its content. This includes information about the recording (e.g. time and place), about the speakers (e.g. age, sex, native language), about the recording situation (e.g. speech style elicited, instructions) and about the corpus (e.g. who collected it, where, when and for which purpose). A text-to-tone alignment, which links the transcriptions (annotations) with the audio or video recording, provides direct access from each annotated element to the primary data, i.e. the original recordings. By clicking on any annotated element, the corresponding part of the recording is played back by the annotation software. This is especially useful for the analysis of the corpus because items in question can be listened to again, or additional phonetic analyses offered by the software, such as a spectrographic analysis or pitch tracking, can be carried out. In addition, this function enables language teachers and language learners to make use of the corpus in the classroom. In order to create, analyse, query and distribute an annotated speech corpus, an appropriate data format is required. The currently most widely used data format is based on the Extensible Markup Language (XML) technology, which allows efficient document engineering of speech data by providing tools for data collection (XML editors), data analysis (e.g. XSL-T) and data presentation; a schematic, invented example of such a time-aligned annotation is sketched below. Corpora can be further divided into native corpora and learner corpora, the former containing language produced by native speakers, the latter containing language produced by learners of a language. Finally, corpora may contain only one language variety (monolingual corpora) or more than one language (multilingual corpora).
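As an illustration of the kind of time-aligned, XML-based annotation just described, the following sketch shows a deliberately simplified annotation file and how it could be queried programmatically. The element and attribute names, the file name and the time values are invented for this example and do not reproduce the actual LeaP annotation format.

```python
# A deliberately simplified, hypothetical XML annotation with two time-aligned
# tiers; the element and attribute names are invented for illustration and do
# not reproduce any particular corpus format.
import xml.etree.ElementTree as ET

SAMPLE = """
<recording audio="learner_007.wav">
  <tier name="words">
    <event start="0.00" end="0.42" label="hello"/>
    <event start="0.42" end="0.95" label="everyone"/>
  </tier>
  <tier name="syllables">
    <event start="0.00" end="0.18" label="hE"/>
    <event start="0.18" end="0.42" label="l@U"/>
  </tier>
</recording>
"""

root = ET.fromstring(SAMPLE)

# Text-to-tone alignment in miniature: every annotated element points back to
# a stretch of the primary audio via its start and end times.
for tier in root.findall("tier"):
    for event in tier.findall("event"):
        start, end = float(event.get("start")), float(event.get("end"))
        print(f'{tier.get("name"):10s} {event.get("label"):10s} '
              f'{start:.2f}-{end:.2f} s ({(end - start) * 1000:.0f} ms)')
```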
2.2. Corpus analysis
Two major types of corpus analysis can be distinguished: qualitative and quantitative approaches. In qualitative research, small numbers of phenomena are described in detail and the focus lies on the variation in the data. Its main drawback is that the findings cannot be generalized to larger populations with a sufficient degree of certainty. In contrast, a quantitative analysis of a corpus gives a precise account of the frequency and rarity of particular language structures. The specific findings can be tested to discover whether they are statistically significant and can be generalized to a larger population. In early corpus-based studies, quantitative analysis was restricted to simple counts of the occurrence of linguistic items. However, the analysis of an annotated corpus allows the computation of various statistical measurements, such as correlations between variables, i.e. the analysis of the systematic ways in which some linguistic features vary with other linguistic features or in which certain non-linguistic features vary with certain linguistic ones, as well as other multivariate measurements such as factor analyses and cluster analyses. The weakness of quantitative approaches lies in the risk that rare phenomena are not recognized and that fine distinctions are blurred. Corpus-based studies thus benefit most from combining both approaches (McEnery and Wilson 2001: 77). A number of advantages of using speech corpora in research on non-native speech have been suggested (Biber, Conrad, and Reppen 1994, McEnery and Wilson 2001, Granger 2002, 2004):
– Corpora contain objective language data which reflects authentic natural
language use. A representative corpus of non-native speech constitutes a large empirical database of naturally occurring language structures and patterns of use and thus stands in contrast to the laboratory speech elicited in experimental studies on non-native speech, which has often been criticized as artificial and not generalizable (Leather 1999: 32). Corpora of non-native speech offer empirical investigations of the patterns of actual language use and allow quantitative and qualitative analyses whose results are generalizable to larger populations.
– Corpus-based research allows an examination of more varied and larger amounts of data than any other methodology in second language research. This opens up the possibility that previously unsuspected linguistic phenomena may be discovered in an explorative manner, and access to previously inaccessible structures and patterns of use is
provided. In this manner, researchers can for the first time test strongly held convictions and intuitions about the frequency and type of learner errors. Granger (2004: 123) suggests that corpus-based research in L2 provides a basis for a new way of thinking which may challenge some of the deeply rooted ideas about learner language. Similarly, Biber, Conrad, and Reppen (1994) report from morphosyntactic and lexical studies that researchers' intuitions can prove incorrect when tested against actual frequencies and usage in a corpus. As has been pointed out variously, corpora constitute the only reliable source of evidence for questions of frequency.
– A richly annotated corpus of non-native speech gives access not only to specific learner errors but also provides a comprehensive description of all aspects of the learners' interlanguage, combining information on different linguistic levels with non-linguistic information. This satisfies Leather's (1999) call for an “ecological” approach to theoretical modelling in second language speech. He argued for paying more attention to experiential and environmental factors of the acquisition process and for research on non-native speech to take a broader view. A corpus that extends over a wide selection of variables such as speaker learning history, learning situation, age and sex, and across a variety of speech styles, allows investigations of new issues such as the co-occurrence of structures or the co-occurrence of certain linguistic with non-linguistic features.
– Corpora provide information about variation in non-native speech. By dividing the corpus into smaller subcorpora, for example by grouping learners with the same native language or the same age at first exposure to the target language, or by comparing certain structures in non-native speech across different speech styles, the extent and type of variation in non-native speech can be analysed.
Until recently, corpus-based research in L2 prosody was impossible due to the lack of an appropriate corpus. A small number of learner speech corpora have been set up in the area of speech technology in the past few years, mainly collected to train speech recognition systems which can then be used in man-machine conversations such as the telephone booking of train tickets (e.g. the FAE (Foreign Accented English) corpus and the VILTS (Voice Interactive Language Training System) corpus). However, none of these corpora in their present form are immediately reusable for researchers in non-native prosody since they do not contain phonetic or phonological
annotations. Recently, a prosodically annotated learner corpus has become available, which will be described in the next section.
3. The LeaP corpus
The LeaP corpus was collected between May 2001 and July 2003 as part of the LeaP (Learning Prosody in a Foreign Language) project, which investigated the acquisition of prosody by second language learners of German and of English. The corpus consists of a total of 359 fully annotated recordings adding up to 73,941 words. The total amount of recording time is more than 12 hours. It comprises four different types of speech:
– free speech in an interview situation (length between 10 and 30 minutes)
– a reading passage (length about 2 minutes)
– retellings of the story (length between 2 and 10 minutes)
– readings of nonsense word lists (30 to 32 words)
In the LeaP corpus, different learner groups are represented: native speakers of English and of German (serving as controls), especially advanced learners (near-natives), learners before and after a training course in prosody, and learners before and after a stay abroad. The English subcorpus contains recordings with 46 non-native and 4 native speakers. The mean age of the non-native speakers is 32.3 years, ranging from 21 to 60 years. 32 of them are female and 14 are male, and altogether they have 17 different native languages. The average age at first contact with English is 12.1 years, ranging from one year to 20 years of age. In the German subcorpus, the mean age of the 55 non-native speakers at the time of the recording is 28.9 years and ranges from 18 to 54 years. 35 of them are female and 20 are male. Altogether, they have 24 different native languages. The average age at first contact with German is 16.7 years, ranging from three years to 33 years of age. A large amount of additional data was collected for each recording, including data
– about the recording (date, place, interviewer and language of the interview)
– about the non-native speaker (age, sex, native language/s, second language/s, age at first contact with the target language, type of contact [formal vs. natural], duration and type of stays abroad, duration and type of formal lessons in prosody, prosodic knowledge)
– about motivation and attitudes (reasons for acquiring the language, motivation to integrate in the host country, the importance attributed to competence in pronunciation compared to other aspects of language, and interest, experience and ability in music and in acting)
Annotation and text-to-tone alignment of the LeaP corpus was carried out for all reading passages, retellings and two-minute extracts of each interview. The manual annotation comprised six tiers; two further tiers were added automatically:
– On the phrase tier, speech and non-speech events were annotated. The interviewee's speech is divided into intonational phrases.
– On the words tier, words were transcribed orthographically.
– On the syllable tier, syllables were transcribed in SAMPA.
– On the segments tier, all vocalic and consonantal intervals plus the intervening pauses were annotated.
– On the tones tier, pitch accents and boundary tones were annotated.
– On the pitch tier, the initial high pitch, the final low pitch and intervening pitch peaks and valleys were annotated.
– On the POS tier, part-of-speech coding was annotated automatically.
– On the lemma tier, lemmata were annotated automatically.
For a recording of about one minute in length, on average, 1000 events were annotated. Figure 1 illustrates the manually annotated tiers and the annotation process, with the waveform (top), the spectrogram (middle) and the six manually annotated tiers (bottom).
Figure 1. Manual annotation in the LeaP corpus. From bottom to top the tiers are the phrase tier, the words tier, the syllable tier, the segments tier, the pitch tier and the tones tier.
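To give a concrete impression of how such tiered, time-aligned annotations can be queried, the sketch below represents two tiers as plain Python data and retrieves the syllables belonging to one intonational phrase. The labels and time values are invented; a real query would read them from the corpus annotation files rather than from hard-coded lists.

```python
# Hypothetical in-memory stand-in for two of the tiers described above: each
# tier is a list of (start, end, label) intervals in seconds. The values are
# invented; a real corpus query would read them from the annotation files.
PHRASES   = [(0.00, 1.90, "IP1"), (1.90, 3.40, "IP2")]
SYLLABLES = [(0.00, 0.21, "dI"), (0.21, 0.55, "za:"), (0.55, 0.80, "mIt"),
             (1.95, 2.30, "na:"), (2.30, 2.62, "tY6"), (2.62, 3.10, "lIC")]

def syllables_in_phrase(phrase_label, phrases, syllables):
    """Return the syllables whose midpoint falls inside the named phrase."""
    start, end = next((s, e) for s, e, lab in phrases if lab == phrase_label)
    return [lab for s, e, lab in syllables if start <= (s + e) / 2 < end]

print(syllables_in_phrase("IP2", PHRASES, SYLLABLES))
# -> ['na:', 'tY6', 'lIC']
```

Linking tiers by time overlap in this way is what makes combined analyses of, for example, tonal and segmental annotations possible.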
4. A corpus-based analysis of vowel reduction
All previous studies investigating vowel reduction by learners of either English or German found that, in non-native speech, vowels are not reduced to an appropriate extent. Often, full vowels are produced instead of reduced vowels in unstressed syllables, and the durational difference between full vowels and reduced vowels is not sufficiently large (Wenk 1985, Bond and Fokes 1985, Mairs 1989, Flege and Bohn 1989, Zborowska 2000 for English and Kaltenbacher 1998, Gut 2003 for German). Some experiments involved a comparison of L2 vowel reduction with vowel reduction processes in native speech; some compared learner groups with different native languages. In some approaches, learners were presented with reading material consisting of word lists or short phrases. Less frequently, semi-spontaneous speech, as in story retellings, was elicited. Many aspects of vowel reduction are still unexplored: as yet, there are no longitudinal studies on the acquisition of vowel reduction. Vowel deletion, which is very common in native speech (e.g. Helgason and Kohler
1996 for German), has not been studied yet. Although native language influence has been investigated as a possible constraint on non-native vowel reduction, cross-linguistic comparisons across target languages have not yet been carried out. Furthermore, there has been no systematic analysis of the co-variation of speech style and vowel reduction, and the correlation with other prosodic features of non-native speech has not been investigated yet. In order to address these research questions, vowel reduction in the LeaP corpus was analysed quantitatively and qualitatively. For the quantitative analysis, the following measurements were taken (a schematic sketch of how such measures can be computed is given after the definitions):
– mean length sfv: mean length of all syllables containing a full vowel
– mean length srv: mean length of all syllables containing the reduced vowels //, /,/ and /n/ (/n/ in German only)
– mean length sdv: mean length of all syllables with a deleted vowel
– percentage red/del: 100 × the number of all syllables with either a reduced or a deleted vowel, divided by the total number of syllables
– syllable ratio: mean durational ratio of all syllable pairs in which a syllable with a full vowel is followed by a syllable with either a reduced or a deleted vowel
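The following fragment sketches how these five measures could be computed from a syllable-level annotation. The toy data, the vowel-type labels and the function names are invented for illustration; they do not come from the LeaP corpus, and the exact operationalization in the study may differ in detail.

```python
# A minimal sketch of the measures defined above, computed over a hypothetical
# syllable annotation; each syllable carries a vowel type ('full', 'reduced'
# or 'deleted') and a duration in seconds. Invented toy data, not LeaP data.
syllables = [("full", 0.28), ("reduced", 0.11), ("full", 0.25), ("deleted", 0.08),
             ("full", 0.30), ("reduced", 0.14)]

def mean_length(kind):
    durs = [d for k, d in syllables if k == kind]
    return sum(durs) / len(durs) if durs else float("nan")

mean_sfv = mean_length("full")       # mean length sfv
mean_srv = mean_length("reduced")    # mean length srv
mean_sdv = mean_length("deleted")    # mean length sdv

# percentage red/del: share of syllables with a reduced or deleted vowel
perc_red_del = 100 * sum(k != "full" for k, _ in syllables) / len(syllables)

# syllable ratio: mean ratio over pairs in which a full-vowel syllable is
# directly followed by a reduced- or deleted-vowel syllable
pairs = [d1 / d2 for (k1, d1), (k2, d2) in zip(syllables, syllables[1:])
         if k1 == "full" and k2 != "full"]
syllable_ratio = sum(pairs) / len(pairs) if pairs else float("nan")

print(mean_sfv, mean_srv, mean_sdv, perc_red_del, round(syllable_ratio, 2))
```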
4.1. Results: Vowel reduction in native and non-native speech
Vowel reduction in non-native German differs from that in native German in nearly all measured features (see Table 1). The mean length of all types of syllables is longer in non-native German, and the syllable ratio, i.e. the durational difference between adjacent syllables with a full vowel and those with a reduced or deleted vowel, is lower. Only the percentage of syllables with reduced and deleted vowels is not significantly different between non-native German and native German. In all variables, the standard deviation is much higher in the speech of the learners of German than in that of the native speakers. The native speaker norm was defined as the native speakers' mean value ± one standard deviation. For the syllable ratio, it lies
between 1.58:1 and 1.94:1. Of the recordings of the learners of German, 47 (27.2%) fall within this range. The vast majority of the recordings outside the native normal range show a durational difference between the two types of syllables that is too small; only in two cases is the durational difference larger than that found in native speech.
Table 1. Mean length and standard deviation of syllables with full vowels (sfv), syllables with reduced vowels (srv), syllables with deleted vowels (sdv), the percentage of syllables with reduced and deleted vowels among all syllables, and the mean durational ratio of adjacent syllable pairs with the first syllable containing a full and the second a reduced or deleted vowel (syllable ratio) for all syllables in non-native German and native German. (Significant differences are indicated by **=p