Perspectives on Formulaic Language: Acquisition and Communication
Edited by David Wood
Continuum International Publishing Group
The Tower Building, 11 York Road, London SE1 7NX
80 Maiden Lane, Suite 704, New York, NY 10038
www.continuumbooks.com
© David Wood and contributors 2010
All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage or retrieval system, without prior permission in writing from the publishers.
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
ISBN: 978-1-4411-5047-9 (Hardback)
Library of Congress Cataloging-in-Publication Data A catalog record for this book is available from the Library of Congress.
Typeset by Newgen Imaging Systems Pvt Ltd, Chennai, India Printed and bound in Great Britain by the MPG Books Group
Contents
Notes on Contributors
Acknowledgements
1. Formulaicity and Usage-based Language: Linguistic, Psycholinguistic and Acquisitional Manifestations
   Regina Weinert
Part 1: Formulaic Language in Acquisition and Pedagogy
2. The Development of Collocation Use in Academic Texts by Advanced L2 Learners: A Multiple Case Study Approach
   Jie Li and Norbert Schmitt
3. Idiomatically Speaking: Effects of Task Variation on Formulaic Language in Highly Proficient Users of L2 French and Spanish
   Fanny Forsberg and Lars Fant
4. Effectiveness of Text Memorization in EFL Learning of Chinese Students
   Zhenqiong Dai and Yanren Ding
5. Lexical Clusters in an EAP Textbook Corpus
   David Wood
6. An Investigation of Lexical Bundles in ESP Textbooks and Electrical Engineering Introductory Textbooks
   Lin Chen
Part 2: Identification and Psycholinguistic Processing of Formulaic Language
7. Formulaicity in Code-switching: Criteria for Identifying Formulaic Sequences
   Kazuhiko Namba
8. Holistic Processing of Regular Four-word Sequences: A Behavioural and ERP Study of the Effects of Structure, Frequency, and Probability on Immediate Free Recall
   Antoine Tremblay and Harald Baayen
9. The Phonology of Formulaic Sequences: A Review
   Phoebe Ming Sum Lin
10. Processing MWUs: Are MWU Subtypes Psycholinguistically Real?
   Georgie Columbus
Part 3: Communicative Functions of Formulaic Language
11. A Text in Speech’s Clothing: Discovering Specific Functions of Formulaic Expressions in Beowulf and Blogs
   Matt Garley, Benjamin Slade, and Marina Terkourafi
12. The Semantic Structure of Arabic Idioms
   Ashraf Abdou
13. Formulaicity and Translation: A Cross-corpora Analysis of English Formulaic Binomials and Their Italian Translations
   Salvatore Giammarresi
Index
Notes on Contributors
Ashraf Abdou has obtained degrees in Arabic language and Islamic studies and in Linguistics from Cairo University. He also has an MA in Teaching Arabic as a Foreign Language from The American University in Cairo, and a PhD in Linguistics from the University of Manchester, where his dissertation was a corpus-based study of Arabic idioms. His teaching experience includes Arabic linguistics and Arabic as a foreign language at these three universities. Harald Baayen is a Professor at the Department of Linguistics at the University of Alberta. His research interests include quantitative linguistics, lexical statistics, exploratory data analysis, stylometry, mixed-effects modeling, morphology and morphological processing. Lin Chen worked as an electrical engineer for eight years in China before beginning her graduate studies in applied linguistics at Carleton University, where she is now a lecturer in English for Academic Purposes. Her research interests include formulaic language, corpus linguistics, and discourse analysis. Georgie Columbus is a phraseologist working in corpus linguistics and psycholinguistics. Her secondary interests lie in the variation in discourse markers between English varieties. Georgie is currently researching at the University of Alberta, Canada, on the processing of multiword units in native and non-native speakers. Yanren Ding is Professor of English, School of Foreign Studies, Nanjing University. His research interests include second language acquisition, discourse analysis and language teaching methodology. Lars Fant is a Professor in the department of Spanish and Portuguese at Stockholm University. He has taught Romance languages in a wide variety of contexts, and has researched and published on many aspects of second
language use including cross-cultural communication, discourse analysis and pragmatics, and formulaic language. Zhenqiong Dai is an English instructor, School of Foreign Studies, Nanjing University. Her research and teaching interests centre around formulaic language and the role of memorization in language acquisition. Her article in this volume, co-authored with Yanren Ding, is based on her graduate work. Fanny Forsberg is a researcher and lecturer in French at Stockholm University’s department of French, Italian and Classical Languages. She has researched and published extensively on formulaic language and high-level proficiency in second language use. Matt Garley is a PhD candidate in Linguistics at the University of Illinois at Urbana-Champaign, where he serves as editor of Studies in the Linguistic Sciences. His research interests include sociolinguistics, language contact, the language of hip-hop, and computer-mediated communication. He is currently planning a dissertation project on linguistic borrowings and the construction of identity in the German hip-hop fan community. Salvatore Giammarresi holds a PhD in Synchronic and Diachronic Linguistics from the University of Palermo, Italy. He is the site creator and moderator of the Formulaic Language Research Network (www.eflarn.ning.com). His academic interests include formulaicity, translation technology, translation theory and localization. He is a lecturer at the University of Palermo, Italy, teaching localization, computer assisted translation tools and global marketing. Professionally Salvatore is currently serving as the Vice President of Products at a web-based company in San Francisco. Jie Li is a PhD student at the University of Nottingham. Her academic interests include vocabulary learning and teaching, formulaic language, corpus linguistics, and foreign/second language acquisition. Her thesis focuses on second language (L2) learner acquisition and use of formulaic language in academic writing.
She has recently published an article on Chinese advanced L2 learners’ acquisition process of lexical phrases in the Journal of Second Language Writing. Phoebe Ming Sum Lin is a researcher at the School of English Studies of the University of Nottingham, UK. At the time of writing, she is completing
a large study which provides a comprehensive description of the prosody of formulaic sequences. Her recent publications explore formulaic language from the perspectives of phonology, corpus linguistics and second language teaching and learning. Her wider research interests include formulaic language, intonation, corpus linguistics and spoken English. Kazuhiko Namba is Associate Professor of Applied Linguistics at Kyoto Sangyo University. His main research interest is the role of formulaic language in bilingual children’s language acquisition and the structural aspects of code-switching. He acquired his PhD and MA at Cardiff University. He taught English in Japanese secondary schools for over 20 years and is raising his two children as English-Japanese bilinguals. Norbert Schmitt is Professor of Applied Linguistics at the University of Nottingham. He is interested in all aspects of second language vocabulary, including vocabulary acquisition, formulaic language, vocabulary testing, the relationship between reading and listening and vocabulary learning, and implicit and explicit vocabulary knowledge. He has most recently completed a vocabulary research manual, to be published by Palgrave Press. Benjamin Slade is a PhD student at the University of Illinois, currently preparing a dissertation on the history of Sinhala interrogative constructions under the direction of Hans Henrich Hock. His earlier work was on the development of do-support in English and the evolution of Indo-Aryan compound verbs. Forthcoming work includes a study of dragon-slaying formulae in early Indo-European, to appear in Historische Sprachforschung. Antoine Tremblay obtained his BA in Hispanic Studies and MA in Spanish Morphology from Laval University, Quebec, Canada, and his PhD in Psycholinguistics from the University of Alberta, Edmonton, Canada. He is currently a postdoctoral fellow at Georgetown University Medical Center, Washington DC, in the Department of Neuroscience. 
His research focuses on the processing of compositional multi-word sequences in the auditory and visual modalities using behavioral and brain imaging methods. Marina Terkourafi is Assistant Professor in Linguistics, University of Illinois at Urbana-Champaign. She has research interests in post-Gricean pragmatics, theories of (im)politeness, language contact and change, language and ideology, and construction grammar(s). Her work in these areas has appeared in journals such as Cognition & Emotion, Diachronica, Journal of
Historical Pragmatics, Journal of Pragmatics, The Journal of Politeness Research, and Journal of Greek Linguistics, as well as in edited collections. Regina Weinert is Reader in Germanic Linguistics at the University of Sheffield. Her main research interests are syntax and discourse-pragmatics, especially of spoken language, and usage-based approaches to language and language acquisition. Her work includes the analysis of clauses, clause complexes, focusing constructions, particles and pronouns, with an emphasis on German and English.
Acknowledgements
The editor wishes to acknowledge the support and inspiration of colleagues, mentors and friends in the development and realization of this project. Sincere thanks and appreciation go to Ito Harumi and the faculty in the Department of English at Naruto University of Education, in Tokushima, Japan. As well, special thanks to colleagues at Carleton University – Ellen Cray, Randall Gess. Mentors from the past, Mari Wesche and T. S. Paribakht at the University of Ottawa also deserve lasting gratitude. Those who have contributed to the building of the study of formulaic language are numerous, but Alison Wray, John Sinclair, A. P. Cowie, Andrew Pawley, Norbert Schmitt are among those who have made history. Additional thanks to the positive constants in my life, Jeremy Chee, Beryl Wood, and Deborah Wood-Salter and Donald Wood and their families. A round of applause goes to the contributors to this volume, who have done so much to further our knowledge of formulaic language and its vital role in language acquisition and in communication.
Chapter 1
Formulaicity and Usage-based Language: Linguistic, Psycholinguistic and Acquisitional Manifestations
Regina Weinert
University of Sheffield
Introduction
A single chapter cannot do justice to the myriad approaches and the detail of studies into formulaic language and usage-based accounts of language and language acquisition, which is a sure sign of a flourishing field of enquiry. The various strands also speak for themselves in this volume. What a synthetic chapter can demonstrate is that formulaicity and the close relationship between language use and language representation are now central concerns in the study of language, evident in a cluster of book-length publications and dozens of articles in the last ten years. Questions which I raised in Weinert (1995) and associated methodological challenges are being tackled vigorously with a wide range of tools, while some answers remain elusive. My own work has since focused on the syntax and discourse-pragmatics of spoken language and the implications for linguistic theory, arguing for a usage-based approach. This chapter sets out some of the key arguments and findings in work on formulaic and usage-based language as well as highlighting issues for further research.
Formulaic Language and Usage-based Language
As a non-technical term, the English word formulaic connotes a lack of originality and stasis in cultural or language expression, witness three recent examples from online reviews of literature, music and film: Renowned author Dan Brown staggered through his formulaic opening sentence; Kronos becoming formulaic; Mummy 3 was formulaic, corny, and predictable. Similarly, in linguistic
traditions which place generative, syntactic rules at the centre of theories and inquiries, formulaic and also irregular language becomes a marginal or at best separate phenomenon, and rule-governed, created language is accorded central, and often by implication high, status, considered to be indicative of the sophistication reached by human language (e.g. Pinker & Prince, 1991; Pinker, 1994).
Wray (2002) was the first book-length treatment which gathered phenomena considered formulaic in adult language, first and second language acquisition and in language disorders, adopting an open-door policy and dispelling the notion that they are marginal in native language. Anyone looking for orientation in the field will find this and her subsequent book, Wray (2008), the most comprehensive guides. Wray (2002, p. 9) sets out to shed the baggage and associations of the label formulaic language accumulated through previous linguistic literature and to clear a path through the fifty or more alternative terms. She defines a formulaic sequence as follows: ‘a sequence, continuous or discontinuous, of words or other elements, which is, or appears to be, prefabricated: that is, stored and retrieved whole from memory at the time of use, rather than being subject to generation or analysis by the language grammar.’ Subsequently Wray reserves the term formulaic sequence for her particular, theory-sensitive definition. The term formulaic language reappears as a ‘neutral mass (uncountable) noun’, including in the titles of her books, and formula ‘is used as the neutral count noun’ (Wray, 2008, p. 8). Most recent studies converge on the label formulaic as an umbrella term and refer to specific manifestations of the phenomenon with additional labels.
These manifestations include (oral) narratives, prayers, proverbs, social routines, noncompositional idioms, (more or less) transparent idioms, collocations, lexical bundles, sentence stems, complex word forms, frequently used sequences of words and clauses, fixed sequences, sequences with open slots which can be filled subject to varying levels of constraints, community-wide sequences and idiosyncratic sequences.
Studying formulaic language essentially involves two related tasks, data or corpus-based analysis and (psycholinguistic) experimentation, with both approaches also benefiting from being tied to theoretical argumentation. The first task is to uncover the extent and nature of formulaic language in healthy and impaired adult language use and in L1 and L2 acquisition. Computer searches of corpus data readily reveal the pervasiveness of re-current and co-occurrent word or other unit combinations. Yet selecting relevant sequences for analysis and interpreting and applying such findings requires a model of language and its psycholinguistic verification. The second task then is fraught with difficulties. Accounts of formulaic language do not only point to the
phenomenon as a linguistic one, they assume or suggest that formulaic sequences are processed and produced as wholes, that is as single units, rather than being analysed or generated. This is said not only to apply to non-compositional language such as opaque idioms, but potentially also to sequences which can in principle be analysed. Formulaic language has tended to be examined as an aspect of the lexicon, and the obvious link to the extensive research under the label phraseology has helped to give it a coherent and substantial body of research. Wray (2002, p. 263) proposes the heteromorphic distributed lexicon, which contains different types of formulaic sequences, including sequences which at the level of the magna-language could be represented in terms of more general patterns or rules. Wray (2008, pp. 33–34) states that in her model the notion of formulaic language disappears as such. The heteromorphic lexicon contains simple as well as complex units which may be quite large and novelty lies in their combination, whether this involves many rules or simple selection from a small set of items. As Wray says herself, the specific characteristics of the units are nevertheless of major interest. However, there is a further point. While formulaic sequences are per definition ‘stored and retrieved whole’, they also per definition contain more than one unit and hence the issue of their internal structure arises, of patterns and of boundaries. Psycholinguistic verification is therefore also still necessary. One drawback of placing formulaic language within the lexicon, even if only metaphorically (Wray, 2008, p. 77), is the separation from syntax. While morphology naturally enters the arena through the inclusion of MEUs, that is, morphologically complex words may be formulaic and/or rule-generated, and while much of formulaic language is of limited generalisability, all kinds of combinatory phenomena are being studied within usage-based language models. 
It is in this context that the notion of formulaic language may also well disappear, that is, once a model of language uncovers and validates multiple levels of abstraction and representation. Formulaic language can in principle take its place in usage-based theories of language and is naturally aligned with such models since there is no claim or expectation of maximal analyticity and minimal representation (e.g. Barlow & Kemmer, 2000; Bybee, 2006). Here we get, potentially, simple and complex forms, general rules, rules of limited generalisability, patterns with open slots, fixed expressions, multiple levels of categorisation and representation, particular exemplars, community-wide sequences, and individual language users’ sequences. We get effects of locality, frequency, salience and recency, a relationship between sequentiality and hierarchical constituency, bi-directionality (from wholes to components, from components
to wholes), interaction (rather than a hierarchy) between phonology, lexis, syntax, semantics and pragmatics and we get dynamism. We get what Langacker calls ‘a sea of particularity’ (Langacker, 1987, p. 46). In addition, cognitive, usage-based approaches such as Langacker (2000) not only reflect the immense complexities of linguistic structure, they posit a unified account of cognition built on basic psychological principles, in preference to a dual-mechanism account which acknowledges the pervasiveness of irregularity but still goes exclusively for compositionality and rules wherever patterns can be discerned.
Both written and spoken language can be characterized along usage-based lines and formulaicity has been demonstrated for a variety of written texts, yet it is in spontaneous spoken language that performance factors may be seen to relate more directly to linguistic form. Spontaneous spoken language is subject to processing and production constraints which are very different from those on much of written language. Furthermore, spoken language is primary in humans. The concept of formulaicity and usage-based models therefore become especially pertinent, given that they include both a linguistic and a psycholinguistic component. What then is the evidence of formulaic and usage-based language and how can they be characterized?
Memory and Analysis
The cognitive unity of formulaic sequences and the usage-based nature of grammar are explained in terms of a greater memory than processing capacity of the human brain, in terms of size versus speed (e.g. Weismer, 2004). Dąbrowska (2004a) compares the human brain with a desktop computer, the latter processing at a rate a million times faster than the former. The question is, what would be achievable with the human capacity for neurons to fire about 200 times per second, with their vast number and the connections between them? de Bot (1992) calculates that speakers typically have to make a word choice decision two to five times a second from a store of 30 000 words, which may well be a conservative estimate. One way of meeting this cognitive challenge could therefore be pre-assembly and storage in long-term memory. Sensitivity to frequency is considered an indication of memory storage (the more frequent an item is, the easier it is to access). Yet idioms, especially the classic opaque ones such as kick the bucket, are not necessarily frequent (Moon, 1998). Hence particularity, and/or irregularity, are also criterial – although irregularity is rarely absolute.
This includes broad idiomaticity where a particular function is expressed conventionally by one form rather than the many potential alternatives (the Pawley & Syder ‘puzzle’ (1983)), for example, telling the time as it’s half ten (Scottish English, meaning 10.30). This brings us then to specific methodologies and findings.
Data and Corpus-based Research
Corpus analysis has been used extensively to explore the limitations in actual language use, manifest in co-occurrence and re-currence, in the substantial work on phraseology, collocation, lexical bundles and so on (e.g. Sinclair, 1991; Stubbs, 1995; Cowie, 1998). Erman and Warren (2000), much cited, estimate that well over 50 per cent of their spoken and written English data is formulaic as opposed to novel. Studies vary greatly in estimates, mostly as an effect of differences in methodology (see Wray, 2002, pp. 28–31 for a discussion), that is, some search for fixed expressions, others have a very low threshold of frequency and sequence length, others again work with a pre-selected list of sequences. ‘Novel’ does not necessarily mean ‘ungenerated’ since a frequently used sequence could in principle still be assembled with rules. Measurements therefore always have to be accompanied by a detailed analysis of the sequences themselves. Wray points out that we also need functionally tagged corpora in order to compute the ratio of formulaic expressions to the total number of times a particular function has been expressed. This approach is common in cross-cultural pragmatic research, although mostly with elicited rather than naturalistic data. For instance, Vollmer and Olshtain (1989) note that German speakers make use of a range of linguistic options for apologizing, with some set phrases. English speakers appear to operate with a more restricted set. Finally, there is the question of how functions could have been expressed, beyond those which actually occur, but this is tricky to investigate since the options are virtually infinite. This question should remain within our vision, however, so that we do not overestimate the power of generation.
It would require psycholinguistic experiments, for example, production tasks which deprive speakers of commonly used expressions or comprehension tasks with grammatically possible alternatives which do not commonly occur. Computer searches are revealing interesting patterns within formulaic language, for example, Butler (1998) notes that many fixed formulas are adverbials, and those which allow preceding or following material are often part of nominal or prepositional groups. Garley, Slade and Terkourafi
(this volume) show that formulaic sequences can serve to characterize different text genres. Attention to local effects is central to usage-based approaches and language users’ representation of language may well also be genre-bound. In addition, examining dense corpora of individual speakers as opposed to cross-speaker corpora is becoming an urgent task in order to test usage-based models, especially of adult native usage.
Sequences identified by computer searches may ‘intuitively’ not seem formulaic (e.g. and the) and these are often excluded, for valid reasons, from studies which attempt to compare formulaic and non-formulaic sequences and/or which compare native versus non-native use of formulaic sequences (e.g. Underwood, Schmitt, & Galpin, 2004). ‘Intuitively not formulaic’ typically means that there is no corresponding meaning/function attached. Yet if chunking is at least partially a processing phenomenon, and if formulaicity, frequency and particularity influence language, then even sequences which do not appear to be tied to a semantic or pragmatic function may still have unitary status. We should therefore not discount them irrevocably. Raupach (1984) is an early study which indicates that some syntactic stems such as what I mean is, I think that are formulae. Wray (2002) notes the potential formulaicity of similar sequences which do not conform to standard constituents in aphasic language (all around the). What counts as formulaic is therefore an empirical issue which links into the issue of formulaicity of syntax and mental representation more generally, to be discussed further below.
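The dependence of formulaicity estimates on search settings can be illustrated with a small sketch. It is not the procedure of any study cited here; the function names, the thresholds and the toy sample are all invented for illustration. It counts recurrent contiguous word sequences and reports what share of the tokens falls inside at least one retained sequence, first with a low threshold of length and frequency and then with a stricter one.

```python
from collections import Counter


def recurrent_sequences(tokens, min_len=2, max_len=4, min_freq=2):
    """Count contiguous word sequences of min_len..max_len tokens and
    keep those that occur at least min_freq times (hypothetical helper)."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {seq: c for seq, c in counts.items() if c >= min_freq}


def coverage(tokens, sequences):
    """Proportion of tokens lying inside at least one retained sequence."""
    covered = [False] * len(tokens)
    for n in sorted({len(seq) for seq in sequences}):
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) in sequences:
                for j in range(i, i + n):
                    covered[j] = True
    return sum(covered) / len(tokens) if tokens else 0.0


# Invented toy sample, not drawn from any corpus discussed in the chapter.
sample = ("that's what i mean and that's what i think "
          "and the point is that's what i mean").split()

# A low threshold (any sequence of 2 to 4 words occurring twice) versus a
# stricter one (only 4-word sequences occurring twice).
loose = recurrent_sequences(sample, min_len=2, max_len=4, min_freq=2)
strict = recurrent_sequences(sample, min_len=4, max_len=4, min_freq=2)

print(f"loose coverage:  {coverage(sample, loose):.2f}")
print(f"strict coverage: {coverage(sample, strict):.2f}")
```

On the same sample the looser setting counts more of the text as recurrent than the stricter one, which is one reason why estimates are hard to compare across studies. A purely frequency-based count of this kind also says nothing about whether a sequence carries a meaning or function, so it retains sequences of the and the type whenever they recur.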
Morphology and Syntax
Invoking formulaicity in morphology and syntax does not imply abandoning the notion of abstraction. But it allows us to re-adjust and re-size some generalizations. Morphology is often seen as a testing ground for dual versus single mechanism models of language and typically involves getting participants to create forms of nonce words, for example, to form plural nouns, past tense verbs and so on. Dąbrowska (2004b) provides a tour de force discussion of the two types of models and shows that some of the common examples used in this context do not actually allow one to choose between them. For instance, using the English regular and irregular past tense confounds regularity with other factors such as the high token frequency, the lack of transparency of form and the narrow domains of application in irregular
verbs as opposed to the regular verbs. Testing native speakers’ productivity with Polish genitive and dative inflections to study the effect of regularity per se, using nonce words, she found that type frequency and phonological heterogeneity were much better predictors of productivity than regularity. In other words, speakers did not operate with maximally productive rules which apply equally across all contexts. Dąbrowska argues that these results are consistent with a view of a single mental mechanism dealing with schemas of varying degrees of generality.
Formulaicity in syntactic constructions has been claimed for lexicalized sentence stems which have specific discourse functions (Nattinger & DeCarrico, 1992), for example, it seems to me, what I’d like to show, it has to be said. Spoken English clefts in particular exhibit some clear patterns of use. Weinert and Miller (1996) showed that the majority of Reverse WH-clefts of the type [NP BE WH-Clause] (that’s what I mean) had in fact deictic cleft heads, the main patterns being that’s what, that is what, that’s where and that is where. Calude (2008) observes a similar trend for Australian English. O’Keeffe, McCarthy, & Carter (2007) confirm this for their data and add that most of these clefts can be accounted for by the formula that’s what I + forms of say, tell, mean, think, wonder and want. In German, where Reverse WH-clefts are rare but carry a strong emphatic function, they are equally limited, occurring as das ist x (das) was (‘that is x that what’), for example, das ist genau das was ich immer sage (‘that is exactly that that I always say’). Weinert and Miller (1996) and O’Keeffe et al. (2007) note preferred WH-clefts, for example, what I being a frequent opening sequence. Bybee (2002) argues that frequency in sequentiality affects constituency, evidenced by phonological fusions such as wanna, hafta.
Some such fusions affect semantically linked items, or traditional constituents, for example, can’t in English or l’ami (‘the friend’) in French. Many others do not and seem to ‘violate’ constituency, for example, I’ll, I’m, I’ve, or German hörma(l), sagma(l) [Imperative+Modal Particle] (‘listen’, etc). Halliday and Hasan (1976) suggest that the resulting constituents may then assume a functional unity as the interpersonal ‘Modal Element’ versus the ‘Propositional Element’ of a clause. This does not readily appear to work for other combinations such as [Preposition + Determiner] in German (e.g. zum, im, ‘to the’, ‘in the’), yet their functional status remains to be examined. The unitary status of sequences has further been shown for some expressions in experiments. Sosa and MacFarlane (2002) report that language users find it difficult to identify ‘of’ in sort of and kind of. While semantic unity often seems to match formal unity, frequency effects apparently also hold without a form being tied to a meaning (Saffran, Aslin, & Newport, 1996).
Even the seemingly most rule-based and flexible syntactic phenomena illustrate boundedness, at least in spoken language. Let us take the examples of noun phrases and of word order in declarative main clauses, the most basic and frequent of syntactic units. Miller and Weinert (1998) found that in informal spoken English and German noun phrases exhibit a small range of types and simple structure. Almost 50 per cent of the English noun phrases and over 60 per cent of the German ones are single pronouns. If we add to this the single nouns, ca 64 per cent of English and German NPs contain only one constituent. 16.3 per cent of English and 16.9 per cent of German NPs have the structure [Determiner + Noun]. This means that 80 per cent of NPs are accounted for in terms of Pronoun, Noun and [Determiner + Noun]. The remaining NPs include postmodified NPs (with prepositional phrases and relative clauses), some compound NPs and premodified NPs, but the latter amount to only 3.3 per cent in English and 3.6 per cent in German. The data came mostly from young adults; most of the German speakers had a university education. The question then arises as to the nature of the speaker’s mental representation of what a linguist could analyse as a noun phrase with its various pre- and postmodifying structures. At the very least this invites us to examine the level of abstraction language users operate with. Bybee (2002) sees the noun phrase as showing the strongest signs of being a constituent, based on the frequency with which nouns are preceded or followed by certain items. She suggests that various levels of abstraction are found, from very specific (word level), to partially general (e.g. my + Noun), to more general (Possessive + Noun), to fully general (Determiner + Noun).
It could also be that grammatical function preferences affect such categorisations, for example, 80–90 per cent of German clause-initial pronouns are subjects (Weinert, 2007), which brings me to my second example.
Word order in main clauses is also a good testing ground for usage-based models, given the frequency and potential for variation inside main clauses (Weinert, to appear). German is well known for allowing virtually any constituent in clause-initial position, in contrast to English. The difference is found especially with clause-initial object NPs, which in English are associated with strong focus, for example, to highlight a contrast or emphasize an NP, whereas in German they are mainly associated with thematic information. The distribution of elements in this slot is nevertheless not even. Durrell (2002) refers to an estimate that two thirds of initial elements in German in all registers are subjects. It appears that for spoken German especially we may have to revise the view of an entirely open clause-initial position. An examination of 2000 clause-initial NPs in my data revealed very
confined usage. 1000 NPs each were taken from private, informal everyday conversations and from public semi-formal discussions and private academic student-lecturer consultations. The picture was strikingly similar for both data sets. Around 80 per cent of clause-initial elements are pro-forms: ca 64 per cent are single pronouns and 16 per cent are deictic adverbs such as da and dann (‘there’ and ‘then’). Only ca 8 per cent are lexical adverbs and ca 12 per cent are full noun phrases. If we look only at noun phrases we find that ca 85 per cent are pronouns (compared with the figure of 60 per cent for all noun phrases referred to above in the section on noun phrases). In the informal data, 13.5 per cent are object NPs and in the formal data 9.7 per cent are object NPs (with one dative object, the others accusative in each data set; also included are 10 cases of dative mir (8 in the formal data) in constructions such as mir scheint (‘it seems to me’)). To sum up, main clause initial NPs are 85 per cent pronouns and of these 90 per cent are subjects. Taking all clause-initial elements into account, 80 per cent are pro-forms: 64 per cent pronouns, the rest mostly da and dann. The similarity between the two data sets is rather surprising, especially given that in the formal data the noun phrases in other clause positions are much more complex. Again the question is, what type of word order generalizations do language users operate with? While the details remain to be investigated, in spontaneous spoken German an open rule for clause-initial position would seem rather too powerful, both for linguists and for language users.
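The clause-initial figures above can be cross-checked with a short calculation. This is a sketch under one stated assumption: the four percentages (64, 16, 8 and 12) are read as shares of all clause-initial elements, which is the reading under which they sum to 100. The names CLAUSE_INITIAL, pro_form_share and pronoun_share_of_nps are hypothetical and introduced only for illustration.

```python
# Clause-initial elements in spoken German main clauses, in per cent.
# Assumption: each figure is a share of ALL clause-initial elements.
CLAUSE_INITIAL = {
    "pronoun": 64,
    "deictic_adverb": 16,  # e.g. da, dann
    "lexical_adverb": 8,
    "full_np": 12,
}


def pro_form_share(shares):
    """Pro-forms are taken here as pronouns plus deictic adverbs."""
    return shares["pronoun"] + shares["deictic_adverb"]


def pronoun_share_of_nps(shares):
    """Share of pronouns once adverbs are set aside and only NPs are counted."""
    noun_phrases = shares["pronoun"] + shares["full_np"]
    return round(100 * shares["pronoun"] / noun_phrases, 1)


print(sum(CLAUSE_INITIAL.values()))
print(pro_form_share(CLAUSE_INITIAL))
print(pronoun_share_of_nps(CLAUSE_INITIAL))
```

Under this reading, pronouns make up a little over 84 per cent of the clause-initial noun phrases, which is consistent with the figure of ca 85 per cent given in the text.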
Psycholinguistic Reality
There is then plenty of surface linguistic evidence of formulaicity and constrained usage, that is, of limited distribution across constructions, even in the potentially most flexible and general ones such as noun phrases and main clauses. Some evidence of an accompanying psycholinguistic effect comes from the ‘fusion’ effects mentioned above, where phonological alternations, mostly reductions, and accompanying orthographic alternations (would of for would have) indicate that sequences are no longer subject to analysis. Usage-based effects have been shown by work on Polish nonce words, which suggest that morphological rules may well be local rather than maximally abstract and regular. The evidence regarding the processing of formulaic sequences and of their advantage over non-formulaic sequences is more sparse, not surprisingly, given how difficult this is to examine. Van Lancker, Canter, & Teerbeek (1981) report that in a production task
speakers inserted longer pauses between the words of literal versus metaphorical interpretations of idioms such as he was skating on thin ice. But work on idioms is inconclusive and regularly suggests that there is no difference in comprehension speed between literal and metaphorical interpretations (Gibbs, 1985), or between idioms and literal paraphrases (Gibbs, Bogadanovich, Sykes, & Barr, 1997). Schmitt, Grandage, & Adolphs (2004) remind us that corpus-derived recurrent clusters are not necessarily psycholinguistically real. Conversely, as pointed out above, corpus-based analysis may discard formulae which are psycholinguistically real. A lack of processing difference (whatever the method) does not necessarily mean that formulaic and non-formulaic sequences are not processed differently; we may simply be considering the wrong sequences. The pre-selection of formulaic sequences is far from straightforward. Namba (this volume) tackles the issue of identification with a complex set of diagnostic criteria and Lin (this volume) refines the examination of phonological coherence as an indicator of formulaicity. Most processing studies rely on measuring participants’ production speed or reaction times on pre-selected sequences. Bod (to appear), for instance, compared frequently used clauses such as I like it with less frequent ones such as I keep it. He found that native participants were able to decide more quickly that a sequence was a possible English sequence when it was a more frequent one. Columbus (this volume) provides some evidence for non-semantic, non-constituent sequences such as at the end of the being processed faster than ‘compositional’ [their label] control sequences such as to eat a sandwich. How any such frequent, apparently compositional and non-constituent sequences are processed is still an open question.
Conklin and Schmitt (2008) used formulaic sequences whose initial elements made later ones highly predictable, for example, [beauty is in the] [eye of the beholder], verified by means of cloze tests on native speakers. A self-paced line-by-line reading task found that such formulaic sequences were read more quickly than non-formulaic ones. The value of self-paced reading is that it can potentially tap the representation of individual words or items in a sequence, which is precisely what is needed. Schmitt and Underwood (2004) examined the terminal words of sequences in such a task. They found no difference in reading times between words which occurred in formulaic versus non-formulaic sequences in native speakers, which they attribute partially to the word-by-word presentation or to having to press a space bar, as this may have disrupted the holistic processing. Eye-tracking has been used to show that the final word in a formulaic sequence is fixated less often and for a shorter time than the same final
word in a non-formulaic sequence in the case of native speakers (Underwood et al., 2004). Speed is certainly a plausible indicator of holistic processing, given the general consensus on the limits to our processing capacity. Tremblay and Baayen (this volume) make a convincing case that absolute speed (rather than comparative speed between sequences or between native and non-native speakers) indicates holistic processing and a lack of assembly of four-word sequences, based on commonly accepted lexical access speed for single words. In addition, they found evidence of a frequency effect of three-word sequences and of single words within the longer sequences, and suggest that multi-word strings are stored as parts as well as wholes. The notion ‘holistic’ becomes further relativized in Columbus (this volume), who teases out differences according to type of multi-word unit. Wray (1992, 2008) alerts us to the possibility that psycholinguistic tests may not tap formulaicity, given clashes in findings between test and non-test situations in clinical investigations. Some of the authentic and spontaneous tasks referred to above address this problem to some extent. Studies which attempt, scrupulously, to develop reliable criteria for identifying formulaic sequences may in fact be forced into a corner, back to a relatively small group of non-compositional, opaque idioms and possibly phonologically fused items. On the other hand, holistically processed formulaic sequences may be found to exhibit a much greater range (including those based on form, not only on function) once we examine sequences in context and in naturally occurring language, and it is these contexts which may usefully be narrowed and controlled for in studies, rather than just pre-selected sequences. Yet the difficulties in demonstrating that formulaic sequences are processed differently from non-formulaic sequences may not only be a question of methodology.
This is where considering formulaicity as an aspect of usage-based language knowledge may throw light on what the issue is. If language knowledge is usage-based then it should not be surprising that it is difficult to find consistent and absolute differences between formulaic and non-formulaic sequences. This is because all sequences will be subject to factors such as frequency, familiarity, recency, and context (e.g. genre, scripts); rules can be local as well as general; there are different levels of abstraction; and sequences and units have a potential effect on each other. Just as not every generalization has to occur at the most abstract level, so not every formulaic sequence has to be entirely cast in concrete form. What this means in terms of the internal composition of sequences is not clear. For instance, observed general word frequency effects on the
processing of formulaic sequences may not mean that such words are actually parts within the sequences; they may simply be bumps in the landscape, just as certain parts of words, such as the beginning and end, are more prominent, hence the ‘bathtub effect’ of lexical recall (Aitchison, 1987). Columbus (this volume) also observes non-linear frequency effects. One further intractable issue, which is implied but rarely stated explicitly in studies, is that a mental representation of a whole will, in production at least, have to appear in linear order. Prior to being uttered, this whole may not necessarily be of linear form and the nature of the transition also deserves to be studied.
First Language Acquisition
First language acquisition is clearly of immense, central relevance to a theory of language. Usage-based approaches allow for item-based learning and try to chart the relationship between exemplars and rules (e.g. Tomasello, 1992; 2002; 2003; Diessel, 2004; Brandt, Diessel, & Tomasello, 2008). This work regularly comments on children’s use of formulaic language and local schemas, rather than abstract, adult rules (whatever the adult rules may in fact be). Tomasello (1992) examines in great detail the verbs and the structures they occur in for one child from age 15–24 months. The analysis suggests that the child operated with structures associated with particular verbs rather than classes of verbs, that the structures were independent of each other and that recency in the child’s production also played a role. Diessel (2004) proposes how complex syntactic constructions may develop out of exemplars of simple ones, showing again that local restrictions go hand in hand with development. For instance, descriptive relative clauses with main clause predicate complement heads or heads of presentational main clauses are the first to appear, which is consistent with the early appearance of such main clauses. Again, at the very least, such findings invite us to rethink where generalizations may be found, what they may be based on or look like. Furthermore, in terms of input, children have been shown to be exposed to highly repetitive lexical frames, for example, utterance-initial item-based frames such as what, that, it, there, you and so on (Tomasello, 2002). Dabrowska and Lieven (2005) investigate the development of English syntactic questions, looking at the internal structure of the NPs and VPs, not only the position of the auxiliary and a WH-word. Questions were chosen since they are often seen as evidence of children’s abstract rules,
especially in generative accounts. The children were older, that is, 2 and 3 years of age. Dabrowska and Lieven were able to account for 90 per cent of the children’s utterances with a lexically specific grammar based on the child’s linguistic experience. While their corpus was dense, it still only comprised about 7 per cent of the total of this experience. Experimental study is then also much needed in this area, but this is still relatively sparse (Tomasello, 2000). Extensive work using nonce words has been done on verbs, with some work on morphology (Dabrowska, 2004b; 2006) and is gathering momentum. These studies are highly supportive of the exemplar-based learning view, at least in young children. Akhtar and Tomasello (1997), Tomasello and Brooks (1999) and Akhtar (1999) found that 2–3.5-year-old children do not readily produce or comprehend transitive constructions with novel verbs in terms of patterns abstracted from known verbs and learn about new verbs just in terms of these verbs. Childers and Tomasello (2001) found that 2.5-year-olds were best at generalizing transitive constructions to new verbs in stabilising frames with pronouns such as he’s VERB-ing it, regardless of their level of familiarity with the verbs. Overgeneralization, on the other hand, often cited as evidence of abstract rules, hardly occurs before age 2.5–3 (Pinker, 1989). Overgeneralization also appears to be favoured by familiarity and constrained by lack of familiarity and the availability of alternatives, that is, it does not apply across all possible structures (Brooks, Tomasello, Lewis, & Dodson, 1999). In addition, previous studies have shown how specific communicative needs or crisis points can lead learners towards analysis, revealing what was previously a unit (Iwamura, 1980, p. 85). Such detailed analysis of communicative situations therefore continues to hold vital clues.
The full extent to which first language acquisition involves formulaic sequences is not yet known, although evidence for a substantial role is being put forward. Yet we cannot ignore children’s demonstrated use of simple forms and words and their (own) subsequent construction of two, three, four word utterances. In addition, specific languages also yield different learning tasks. More comparative research with children acquiring languages which differ in terms of formulaicity, regularity and irregularity is needed to show when and how abstraction emerges. At the same time, a theory which links formulaic language to the evolution of language offers an intriguing set of questions (Wray, 2008). But just as we should not assume that once analyticity is possible it replaces formulaicity, so it may be premature to suppose that just because formulaicity may have been evolutionarily prior, it remains a default ontogenetically.
Second Language Acquisition
Since the revival of interest in formulaic language in the mid-nineties, studies have examined its role in non-native language as a communicative, production and processing, and learning strategy, but not to equal degrees. Myles, Hooper & Mitchell (1998) and Myles, Mitchell, & Hooper (1999) report some evidence for a relationship between formulas and the development of rules among classroom learners and Weinert (1994) notes some classroom-induced negative effects. But most prominent has been the issue of native-like idiomaticity in the broad sense, that is, including non-compositional idioms, collocations and other conventional form-meaning pairings, their development and how learners may be helped towards their use (e.g. Granger, 1998; Howarth, 1998; de Cock, 2000; Schmitt, 2004; Siyanova & Schmitt, 2008; Li & Schmitt and Dai & Ding, this volume). Such work includes comparisons of actual language use with the language of textbooks; for example, complex academic texts have been shown to consist of conventionalized combinations, ranging from discourse organisers to stance bundles and a variety of referential expressions (Chen & Wood, this volume). Research which aims to establish the extent of idiomatic language use among non-natives as well as compare the processing of formulaic sequences in native and non-native language reports both quantitative and qualitative differences between native and non-native speakers, with non-natives generally not operating with the appropriate range and function of formulas or not experiencing the same processing advantage when they do. As I have suggested elsewhere (e.g. Weinert, 1995), the (understandable) aim to develop non-native speakers’ knowledge and use of formulaic language through appropriate teaching may run counter to their own processes. Wray (2002) proposes that post-school and especially adult learners are driven more towards analysis than formulaicity.
She sees the difficulty of achieving native-like idiomaticity as arising out of the need to develop strong associations between components, something which is not easy to achieve when faced with their particular social, intellectual and learning experience. One might say that it is difficult to become fully socialized, in a broad sense, a second time around, without the stable linguistic, developmental and motivational environment normally afforded first language learners. Kecskes (2002) adds to this the notion of conceptual socialisation, requiring the development of a common underlying conceptual base for the two languages. A further important aspect which is raised only sporadically is the use of non-native formulaic language, both as a communicative and as a processing strategy (e.g. Rehbein, 1987;
Bolander, 1989). Rehbein talks about migrant speakers’ formulae constituting a ‘self-imposed reduction of their own system of needs’. The learner’s lack of native-like formulaic use may also be due to affective variables and an investigation into which areas are particularly associated with a learner’s identity and goals seems timely. Studies into intercultural communication and competence (e.g. Woodin, 2007) have long acknowledged the complexities of development in both conceptual structure and pragmatic conventions, questioning the aim for learners of achieving native command of a second language (indeed an oxymoron). Socialisation research, including in L2 contexts, has become a field of its own (e.g. Kitzinger, 2000; Kasper, 2001).
Society and Culture
Wray (2002, 2008) suggests that formulaic language is a mechanism for the promotion of self. This cannot readily be maintained, either as an explanation or as a single motivating factor. Created language can serve the self equally, and this is implied by Wray’s NOA (needs only analysis) hypothesis (i.e. there will be times when analysis is needed to promote the self). In addition, Wray herself (2008) points out that the language of esoteric societies may be characterized by a high level of formulaicity, whereas that of exoteric societies may have a lower level. So-called Western societies value originality, creativity and effort of thought, at least in some areas. What type of formulaic language attracts critical responses or prompts novelty is itself an interesting social phenomenon. Taking exception to you know what I mean can express prejudice against Americans, you know may be considered an unnecessary filler indicative of an empty head and refusing to use the phrase push the envelope a vestige of rebellion in an ever more corporate climate of educational establishments. Creative adaptations of the officialese used in the former GDR were considered a sign of subversion and liberation (v. Polenz, 1993). Even in related languages and cultures differences can be observed. This has long been recognized in relation to classic idioms (see also Abdou, this volume) as well as in work on cross-cultural pragmatics, but there are likely to be further, less obvious effects. Forsberg and Fant (this volume) show differences between Chilean Spanish and French in relation to grammatical and discursive versus lexical formulaic sequences, which they see as being related to structural differences between the languages. To what extent cultural differences are implicated in other differences will require
detailed and sensitive study. In the case of translation (Giammarresi, this volume), this opens up the question of the effect of a change from formulaic to non-formulaic, from the conventional to the novel. Finally, individuals may vary in their attachment to conventions or their thirst for the novel, not only as experience but also as meaningful activity.
Conclusion
The relationship between memory and analysis and the conventional and the novel will continue to keep researchers busy in the areas of language acquisition, language use, language representation and processing and as a socio-cultural phenomenon. Three areas in particular seem to me pivotal: the nature of spoken language (child, adult, child-directed, native and non-native), the psycholinguistic status of linguistic units (interpretation of processing speed, unit internal structure, linearity/non-linearity, functionalisation of formal unity, level of abstraction, redundancy in representation, transition from representation to production, production vs. comprehension) and the socio-cultural parameters which influence the level of formulaicity and the balance between convention and novelty (in communities and for individuals, effect on acquisition, implications for translation). Recent research shows a vigorous engagement with the challenges inherent in studying formulaicity and in verifying usage-based accounts of language. In fact, considering formulaic language as an aspect of usage-based language alerts us to the possibility, likelihood even, that we may never find the perfect methodology for demonstrating the difference between formulaic and non-formulaic sequences per se. Instead, as shown by the studies in this volume, fine-grained analyses of formal and functional aspects of recurrent and co-occurrent sequences, carried out for a wide range of purposes, are revealing just how immensely complex language use is and how variously language knowledge may be represented. To adapt a friend’s phrase, there is no time to sleep on our bay leaves.
References
Aitchison, J. (1987). Words in the mind. Oxford: Basil Blackwell. Akhtar, N. (1999). Acquiring basic word order: Evidence for data-driven learning of syntactic structure. Journal of Child Language, 26, 339–56. Akhtar, N., & Tomasello, M. (1997). Young children’s productivity with word order and verb morphology. Developmental Psychology, 33, 952–65.
Barlow, M., & Kemmer, S. (Eds.). (2000). Usage-based models of language. Stanford CA: Centre for the Study of Language and Information. Bod, R. (to appear). Exemplar-based syntax: How to get productivity from examples. The Linguistic Review, Vol. 23, Special Issue on Exemplar-Based Models of Language. Bolander, M. (1989). Prefabs, patterns and rules in interaction? Formulaic speech in adult learners’ L2 Swedish. In K. Hyltenstam & L. Obler (Eds.), Bilingualism across the lifespan: Aspects of acquisition, maturity and loss (pp. 73–86). Cambridge: Cambridge University Press. Brandt, S., Diessel, H., & Tomasello, M. (2008). The acquisition of German relative clauses. Journal of Child Language, 35 (2), 325–48. Brooks, P., Tomasello, M., Lewis, L., & Dodson, K. (1999). Young children’s overgeneralisations with fixed transitivity verbs. Child Development, 70, 1325–37. Butler, C. (1998). Multi-word lexical phenomena in Functional Grammar. Revista Canaria de Estudios Ingleses, 36, 13–36. Bybee, J. (2002). Sequentiality as the basis of constituent structure. In T. Givón & B. F. Malle (Eds.), The evolution of language out of pre-language (pp. 107–34). Amsterdam/Philadelphia: John Benjamins. Bybee, J. (2006). Frequency of use and the organization of language. New York/Oxford: Oxford University Press. Calude, A. (2008). Demonstrative clefts in spoken English. Doctoral Thesis, University of Auckland. Childers, J. B., & Tomasello, M. (2001). The role of pronouns in young children’s acquisition of the English transitive construction. Developmental Psychology, 37, 739–48. Conklin, K., & Schmitt, N. (2008). Formulaic sequences: Are they processed more quickly than non-formulaic language by native and nonnative speakers? Applied Linguistics, 29 (1), 72–89. Cowie, A. P. (Ed.). (1998). Phraseology: Theory, analysis and application. Oxford: Oxford University Press. Dabrowska, E. (2004a). Language, mind and brain. Edinburgh: Edinburgh University Press. Dabrowska, E. (2004b). 
Rules or schemas? Evidence from Polish. Language and Cognitive Processes, 19, 225–71. Dabrowska, E. (2006). Low-level schemas or general rules? The role of diminutives in the acquisition of Polish case inflections. Language Sciences, 28, 120–35. Dabrowska, E., & Lieven, E. (2005). Towards a lexically specific grammar of children’s question constructions. Cognitive Linguistics, 16, 437–74. de Bot, K. (1992). A bilingual production model: Levelt’s ‘speaking’ model adapted. Applied Linguistics, 13, 1–25. De Cock, S. (2000). Repetitive phrasal chunkiness and advanced EFL speech and writing. In C. Mair & M. Hundt (Eds.), Corpus linguistics and linguistic theory (pp. 51–68). Amsterdam: Rodopi. Diessel, H. (2004). The acquisition of complex sentences. Cambridge: Cambridge University Press. Durrell, M. (2002). Hammer’s German grammar and usage. Fourth edition. London: Arnold.
Erman, B., & Warren, B. (2000). The idiom principle and the open choice principle. Text, 20, 29–62. Gibbs, R. (1985). On the process of understanding idioms. Journal of Psycholinguistic Research, 14 (5), 465–72. Gibbs, R., Bogadanovich, J., Sykes, J., & Barr, D. (1997). Metaphor in idiom comprehension. Journal of Memory and Language, 37, 141–54. Granger, S. (1998). Prefabricated patterns in advanced EFL writing: Collocations and formulae. In A. P. Cowie (Ed.), Phraseology: Theory, analysis and application (pp. 145–60). Oxford: Oxford University Press. Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London: Longman. Howarth, P. (1998). The phraseology of learners’ academic writing. In A. P. Cowie (Ed.), Phraseology: Theory, analysis and application (pp. 161–86). Oxford: Oxford University Press. Iwamura, S. G. (1980). The verbal games of pre-school children. London: Croom Helm. Kasper, G. (2001). Four perspectives on L2 pragmatic development. Applied Linguistics, 22 (4), 502–30. Kecskes, I. (2002). Situation-bound utterances in L1 and L2. Berlin: Mouton de Gruyter. Kitzinger, C. (2000). How to resist an idiom. Research on Language and Social Interaction, 33 (2), 121–54. Langacker, R. W. (1987). Foundations of cognitive grammar, Vol 1. Theoretical prerequisites. Stanford, CA: Stanford University Press. Langacker, R. W. (2000). A dynamic usage-based model. In M. Barlow & S. Kemmer (Eds.), Usage-based models of language (pp. 1–63). Stanford CA: Centre for the Study of Language and Information. Miller, J., & Weinert, R. (1998). Spontaneous spoken language: Syntax and discourse. Oxford: Oxford University Press. Moon, R. (1998). Fixed expressions and idioms in English. Oxford: Clarendon Press. Myles, F., Hooper, J., & Mitchell, R. (1998). Rote or rule? Exploring the role of formulaic language in classroom foreign language learning. Language Learning, 48 (3), 223–363. Myles, F., Mitchell, R., & Hooper, J. (1999). Interrogative chunks in French L2. 
A basis for creative construction? Studies in Second Language Acquisition, 21, 49–80. Nattinger, J. R., & DeCarrico, J. S. (1992). Lexical phrases and language teaching. Oxford: Oxford University Press. O’Keeffe, A., McCarthy, M., & Carter, R. (2007). From corpus to classroom: Language use and language teaching. Cambridge: Cambridge University Press. Pawley, A., & Syder, F. H. (1983). Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In J. C. Richards & R. W. Schmidt (Eds.), Language and communication (pp. 191–226). New York: Longman. Pinker, S. (1989). Learnability and cognition: The acquisition of verb-argument structure. Cambridge, MA: Harvard University Press. Pinker, S. (1994). The language instinct. Harmondsworth: Allen Lane, Penguin Press. Pinker, S., & Prince, A. (1991). Regular and irregular morphology and the psychological status of rules in grammar. Proceedings of the 17th Annual Meeting of the Berkeley Linguistics Society (pp. 230–51). Berkeley, CA: BLS.
Raupach, M. (1984). Formulae in second language speech production. In H. Dechert, D. Möhle, & M. Raupach (Eds.), Second language productions (pp. 114–37). Tübingen: Gunter Narr. Rehbein, J. (1987). Multiple formulae. Aspects of Turkish migrant workers’ German in intercultural communication. In K. Knapp et al. (Eds.), Analysing intercultural communication (pp. 215–48). Berlin: Mouton de Gruyter. Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928. Schmitt, N. (Ed.). (2004). Formulaic sequences: Acquisition, processing and use. Amsterdam/Philadelphia: John Benjamins. Schmitt, N., Grandage, S., & Adolphs, S. (2004). Are corpus-based recurrent clusters psycholinguistically valid? In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing and use (pp. 127–51). Amsterdam/Philadelphia: John Benjamins. Schmitt, N., & Underwood, G. (2004). Exploring the processing of formulaic sequences through a self-paced reading task. In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing and use (pp. 173–72). Amsterdam/Philadelphia: John Benjamins. Sinclair, J. McH. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press. Siyanova, A., & Schmitt, N. (2008). L2 learner production and processing of collocation: A multi-study perspective. Canadian Modern Language Review, 64 (3), 429–58. Sosa, A. V., & MacFarlane, J. (2002). Evidence for frequency-based constituents in the mental lexicon: Collocations involving the word of. Brain and Language, 83 (2), 227–36. Stubbs, M. (1995). Collocations and cultural connotations of common words. Linguistics and Education, 7 (4), 379–90. Tomasello, M. (1992). First verbs: A case study of early grammatical development. Cambridge: Cambridge University Press. Tomasello, M. (2000). Do young children have adult syntactic competence? Cognition, 74, 209–53. Tomasello, M. (2002). The emergence of grammar in early child language. In T. Givón & B. F. 
Malle (Eds.), The evolution of language out of pre-language (pp. 309–28). Amsterdam/Philadelphia: John Benjamins. Tomasello, M. (2003). Constructing a language. Cambridge, MA and London: Harvard University Press. Tomasello, M., & Brooks, P. (1999). Early syntactic development: A construction grammar approach. In M. Barrett (Ed.), The development of language (pp. 161–90). Hove: Psychology Press. Underwood, G., Schmitt, N., & Galpin, A. (2004). The eyes have it. An eye-movement study into the processing of formulaic sequences. In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing and use (pp. 153–72). Amsterdam/Philadelphia: John Benjamins. v. Polenz, P. (1993). Die Sprachrevolte in der DDR im Herbst 1989. Zeitschrift für Germanistische Linguistik, 21, 127–49. Van Lancker, D., Canter, G. J., & Teerbeek, D. (1981). Disambiguation of ditropic sentences: Acoustic and phonetic cues. Journal of Speech and Hearing Research, 24, 330–35.
Vollmer, H. J., & Olshtain, E. (1989). The language of apologies in German. In S. Blum-Kulka, J. House, & G. Kasper (Eds.), Cross-cultural pragmatics: Requests and apologies. Norwood: Ablex. Weinert, R. (1994). Some effects of a foreign language classroom on the development of German negation. Applied Linguistics, 15 (1), 76–101. Weinert, R. (1995). The role of formulaic language in second language acquisition: A review. Applied Linguistics, 16 (2), 80–205. Weinert, R. (2007). Demonstrative and personal pronouns in formal and informal conversations. In R. Weinert (Ed.), Spoken Language Pragmatics: An analysis of form-function relations (pp. 1–28). London/New York: Continuum. Weinert, R. (to appear). German free word order – reality or myth? The front-field in spoken main clauses. Weinert, R., & Miller, J. (1996). Cleft constructions in spoken language. Journal of Pragmatics, 25, 173–206. Weismer, S. E. (2004). Memory and processing capacity. In R. M. Kent (Ed.), MIT encyclopedia of communication disorders (pp. 349–52). Cambridge, MA: MIT Press. Woodin, J. (2007). Intercultural positioning: Tandem conversations about word meaning. In R. Weinert (Ed.), Spoken Language Pragmatics: An analysis of form-function relations (pp. 208–28). London/New York: Continuum. Wray, A. (1992). The focusing hypothesis. Amsterdam: John Benjamins. Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press. Wray, A. (2008). Formulaic language: Pushing the boundaries. Oxford: Oxford University Press.
Part 1
Formulaic Language in Acquisition and Pedagogy
Chapter 2
The Development of Collocation Use in Academic Texts by Advanced L2 Learners: A Multiple Case Study Approach
Jie Li and Norbert Schmitt
University of Nottingham
Introduction
It is now generally agreed that the native-like use of collocations (word combinations such as heavy smoker, make a speech, bitterly cold) is an important element of proficient language use (e.g. Sinclair, 1991; Wray, 2002). However, researchers have found that L2 learners rely heavily on creativity and so produce expressions which are simply not used by native speakers (Pawley & Syder, 1983; Wray, 2002). Skehan (1998) and Foster (2001) claim that non-native speakers, unlike native speakers, generate a great proportion of their language from rules instead of lexicalized routines. Native speakers use conventional expressions to convey meaning, while learners often express meaning with unidiomatic combinations of words.
A number of studies (Granger, 1998; Howarth, 1998; Nesselhauf, 2003) have shown that even advanced L2 learners often experience problems with collocations in written English. For example, Granger (1998) used a corpus-based approach to look at -ly intensifier + adjective collocations automatically extracted from advanced French learners’ academic essays and similar essays written by native English students. She found that one type of intensifier, namely ‘boosters’ (e.g. deeply, strongly, highly), was underused by French learners compared with the frequency (i.e. the number of types and tokens) with which native students used them. She concluded that advanced French learners of English did use collocations, but that they tended to underuse native-like expressions and overuse unidiomatic word pairs which have direct L1 translation equivalents.
Using a frequency-based statistical approach, Lorenz (1999) also investigated intensifier-adjective collocations in “expository-argumentative” texts
produced by advanced German learners and native British students. By calculating association measures of collocations and type-token ratios, he found that advanced German learners of English had smaller repertoires of collocations (as measured by type-token ratio) and overused a limited number of high frequency collocations (as measured by t-score and MI).
Building on Lorenz’s statistical approach, one of Siyanova and Schmitt’s (2008) three studies used corpus-based frequency data and mutual information statistics (MI) to investigate adjective-noun collocations in advanced Russian learners’ and native university students’ written English. By consulting the BNC to count frequency and calculate the MI value of each collocation, they found that only 45 per cent of the collocations used by advanced Russian learners in their written texts were appropriate (i.e. frequent and strongly associated English word combinations).
Following a phraseological approach, Howarth (1998) focused on restricted verb-noun collocations (e.g. make a claim, reach a conclusion) identified from native and advanced non-native academic written corpora. Based on the norms established in native speaker writing, he reported that advanced non-native MA students employed about 50 per cent fewer restricted collocations than natives. He also found that approximately 6 per cent of the collocations produced by advanced learners were non-conventional. Based on Howarth’s analysis, it seems that among the three collocational groups (i.e. restricted collocations, free collocations, and idioms), restricted collocations are the most problematic for advanced non-native learners.
A more comprehensive study of advanced German-speaking learners’ verb-noun collocations (e.g. take a break, shake one’s head) is that of Nesselhauf (2003). Like Howarth, she adopted a phraseological approach and classified collocations into three groups, namely, free combinations (e.g. want a car), collocations (e.g. 
take a picture) and idioms (e.g. sweeten the pill). She found that learners made the greatest proportion of errors with collocations (79 per cent), followed by free combinations (23 per cent) and idioms (23 per cent). As can be seen, the existing studies all used a corpus-based native versus non-native comparison to investigate learners’ collocation use and identify gaps between these two populations. In general, three main approaches have been employed to define and identify collocations in written texts. One has studied all word combinations of a particular grammatical form (e.g. –ly amplifier + adjective), regardless of whether they are ‘idiomatic’ or not (Granger, 1998). A second is the so-called phraseological approach, represented by Howarth (1998) and Nesselhauf (2003, 2005). Borrowing the Russian School’s definition and classification of phraseology (Cowie,
1998), collocations are typically identified according to two defining criteria: semantic opacity – the degree to which words are used with their ‘dictionary’ meanings, and fixedness – the degree to which elements of a phrase can be substituted. A final approach uses the occurrence frequency of word combinations within the investigated corpus as the identification criterion. Thus, Lorenz (1999) and Siyanova and Schmitt (2008) compared word pairs in non-native and native equivalent corpora and used statistical ‘association measures’ to identify which pairs were characteristically idiomatic.
Although different approaches have been employed to identify and define collocations, they point to the same conclusion: L2 learners have difficulties with collocation use in their language production. However, existing studies have largely been descriptive in nature, and tend to focus on one-off compositions produced by learners. Little research has focused on an empirical analysis of L2 learners’ collocations over time, which could show how collocational knowledge develops. A small number of longitudinal studies have investigated the role of formulaic language in young L2 learners’ language acquisition (e.g. Wong-Fillmore, 1976; Huang & Hatch, 1978). Apart from Adolphs and Durow’s (2004) longitudinal study of two L2 postgraduates’ development of three-word formulaic language in spoken English, few studies have done the same for advanced L2 learners.
The only truly reliable way to identify patterns of development in the use of collocations by L2 learners is to conduct longitudinal studies of the same learners over time. This study attempts to do this using a multiple case-study approach. The purpose is to provide a rich and detailed description of several individual learners’ use of collocations over a period of one academic year. We are also interested in how the individual results combine into group results.
As our goal is descriptive, we begin with no formal research questions. However, the following general questions helped to focus the investigation:
1. Will advanced Chinese L2 learners improve their collocation use in academic writing assignments over a one-year study abroad postgraduate programme? Are the collocations used by Chinese students similar to those used by published authors?
2. Are the statistical measures of collocations we use valid for the investigation of collocation improvement over an academic year? To what extent can we put our faith in the statistical results of group patterns of collocation development?
Methodology
Participants
The four participants were female Chinese postgraduates on a one-year MA programme in English Language Teaching (hereafter ELT) in the School of Education at the University of Nottingham. All of them were English majors from China, with ages ranging from 26 to 29. Their English language learning experiences in China were similar in that their exposure to the target language came mainly from non-native teacher-dominated classroom instruction, which was generally grammar-based and input-poor. The participants had similar career plans and expectations, namely returning to China to start teaching in colleges or universities. Overall, the four participants were advanced English language learners, who received similar English language training in China, and were exposed to the same L2 environment at a British university. Details of the individual participants are shown in Table 2.1.

Table 2.1 Participants’ Personal Details

Participant  Age  Education background  Teaching experience  IELTS/TOEFL score
LH           29   Technical College     5 years              6.5, Writing: 6.0
TT           26   Bachelor’s Degree     4 years              640 (TOEFL), Writing: 5.5
WL           27   Technical College     5 years              6.5, Writing: 6.0
YJ           27   Bachelor’s Degree     5 years              6.5, Writing: 6.0

Since IELTS and TOEFL scores are not directly comparable, it is impossible to compare TT’s English language competence with that of the other three participants. Nevertheless, like the other members of the participant group, she is a proficient English language user on the basis of her TOEFL marks.
The learner corpus
The learner corpus consisted of 36 academic writing assignments (including eight essays and one dissertation for each participant) written over a period of one academic year (i.e. three terms). Since the four participants are all from the one-year MA programme in ELT, their writing requirements were the same except for the coursework for elective modules. This MA course is
comprised of two core modules: Applied Linguistics and Syllabus Design & Methodology; four elective modules and a final dissertation requirement. Each module requires a 3,000 word essay for coursework apart from core modules, which require 6,000 words (i.e. two essays of 3,000 words each). The word count requirement for the dissertation in Term 3 is 12,000 words. Overall, each participant was required to produce 12,000 words in each of the three terms over the course of 12 months.
Developing the corpus involved collecting each text in electronic form, cleaning it (i.e. removing unnecessary parts: titles, headers, footers, captions, and reference list), and categorizing it according to the term in which it was written. The resulting corpus contained 149,587 running words (tokens) and 7,259 types, and was divided into three subcorpora: Term 1 – 50,376 running tokens, Term 2 – 48,530, and Term 3 – 50,681. The three subcorpora are, therefore, directly comparable in terms of text length and text style.
The BNC academic written corpus
The academic written sub-set of the BNC World Edition (2000) was used as the ‘proficient writer’ comparison corpus. It consists of 501 texts totaling over 16 million words, selected from books and journal articles in the six disciplines proposed by Lee (2001): humanities/arts, medicine, natural science, politics/law/education, technical/engineering, and social science.
Procedure
All adjective-noun combinations were extracted from the learner corpus in the following manner. The corpus was searched for the 187 nouns from Sublist 1 of Coxhead’s (2000) Academic Word List (AWL), and those nouns were selected which were used by at least one participant in at least two of the three terms. Then WordSmith 5.0 was used to locate adjacent adjective collocates for each of these recurring academic nouns. Collocations were excluded from analysis if they included one of the following constituents: hyphenated adjectives (e.g. corpus-based approach), pronouns, possessives, determiners, numbers/ordinals, adjectives signifying nationalities (e.g. Chinese, English), and terminology (e.g. Lexical Approach, Universal Grammar). This selection procedure produced 41 nouns, leading to 147 different adjective-noun collocation types in Term 1, 95 in Term 2, and 107 in Term 3, a total of 494 collocation tokens and 299 collocation types.
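The extraction and exclusion steps described above can be pictured as a simple filter over part-of-speech-tagged text. The following is only an illustrative sketch: the study itself used WordSmith 5.0, and the tag names (ADJ, NOUN), the function name, and the two small exclusion lists are assumptions made for this example rather than the authors’ actual settings.

```python
# Hypothetical sketch of the extraction filter; not the authors' WordSmith procedure.
# Tag names and exclusion lists are assumptions chosen for illustration.

NATIONALITY_ADJECTIVES = {"chinese", "english"}  # examples given in the text
TERMINOLOGY = {("lexical", "approach"), ("universal", "grammar")}  # examples given in the text


def extract_collocations(tagged_tokens, node_nouns):
    """Return (adjective, noun) pairs where an adjective directly precedes a node noun."""
    pairs = []
    for (word, tag), (next_word, next_tag) in zip(tagged_tokens, tagged_tokens[1:]):
        adjective, noun = word.lower(), next_word.lower()
        # Requiring an ADJ tag also rules out pronouns, possessives,
        # determiners and numbers, which carry other tags.
        if tag != "ADJ" or next_tag != "NOUN" or noun not in node_nouns:
            continue
        if "-" in adjective:  # hyphenated adjectives, e.g. corpus-based
            continue
        if adjective in NATIONALITY_ADJECTIVES:
            continue
        if (adjective, noun) in TERMINOLOGY:
            continue
        pairs.append((adjective, noun))
    return pairs


sample = [
    ("a", "DET"), ("corpus-based", "ADJ"), ("approach", "NOUN"),
    ("plays", "VERB"), ("an", "DET"), ("important", "ADJ"), ("role", "NOUN"),
    ("in", "ADP"), ("the", "DET"), ("Lexical", "ADJ"), ("Approach", "NOUN"),
]
print(extract_collocations(sample, {"approach", "role"}))
# prints [('important', 'role')]
```

In this toy sentence only important role survives: corpus-based approach is dropped as a hyphenated adjective and Lexical Approach as terminology.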
The number of collocate types (i.e. different collocations) and tokens (i.e. occurrences of each type) produced by each participant across terms was counted and recorded. For example, the node noun role was used with different adjectives by participant WL in her academic texts across three terms, that is, important in Term 1, central in Term 2, and key, potential, critical, significant in Term 3. Based on these frequency counts, the type-token ratio (TTR) of each collocation type was calculated for all of the four participants. The t-score and MI value for each collocation type was also calculated. Since it is claimed that low-frequency collocations jeopardize the reliability of all association measures (Manning & Schütze, 1999; Evert & Krenn, 2001), all the extracted collocations with less than four occurrences in the BNC academic corpus were excluded from MI and t-score calculations. Various researchers define their cut-off points differently: Manning and Schütze (1999) suggest a minimum of three occurrences; Stubbs (2001) five occurrences; Church and Hanks (1990) five occurrences. We used a cut-off point of four to include as large a set of learner collocations as possible. To explore the four participants’ collocational development pattern over the period of 12 months, the TTR, t-score, and MI values for each participant were then averaged within each term and these averages compared across terms. Finally, to explore the development of the strongly-associated collocations preferred by expert writers, the adjective-noun collocations were ranked into different bands according to their MI values.
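The chapter does not spell out the formulas behind its association measures, so the sketch below assumes the definitions commonly used in corpus linguistics: MI as the base-2 logarithm of observed over expected co-occurrence frequency, and the t-score as observed minus expected, divided by the square root of observed. The function names and the frequency counts are invented for illustration; only the cut-off of four occurrences and the thresholds MI>3 and t-score>2 come from the chapter.

```python
import math

MIN_FREQUENCY = 4  # cut-off adopted in the study


def association_scores(pair_freq, adj_freq, noun_freq, corpus_size):
    """Return (MI, t-score) for a word pair, or None below the frequency cut-off.

    Assumes the commonly used definitions:
    expected = f(adjective) * f(noun) / N
    MI = log2(observed / expected)
    t = (observed - expected) / sqrt(observed)
    """
    if pair_freq < MIN_FREQUENCY:
        return None
    expected = adj_freq * noun_freq / corpus_size
    mi = math.log2(pair_freq / expected)
    t_score = (pair_freq - expected) / math.sqrt(pair_freq)
    return mi, t_score


def is_robust(scores):
    """Apply the thresholds the chapter uses in its Results: MI > 3 and t-score > 2."""
    return scores is not None and scores[0] > 3 and scores[1] > 2


# Invented counts, chosen only so the arithmetic is easy to follow.
scores = association_scores(pair_freq=64, adj_freq=4000, noun_freq=2000,
                            corpus_size=16_000_000)
print(scores)             # prints (7.0, 7.9375)
print(is_robust(scores))  # prints True
```

With these invented counts the expected frequency is 0.5, so a pair observed 64 times is far more frequent than chance would predict on both measures.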
Results
The value of case-studies is the elicitation and analysis of rich data, and so we will report both the group results and the results of each individual participant.
Participants’ overall collocation use
Group
A total of 494 adjective-noun collocation tokens were identified from the learner corpus, made up of 299 types. Of these 299 types, over 40 per cent can be considered frequent and strongly-associated (which we will term robust in this section), at least according to the criteria of appearing four or more times in the 16 million-word BNC academic subcorpus and having association figures of MI>3 and t-score>2 (Table 2.2). On the other hand, there was a similar percentage of rarely-occurring combinations, that is, those appearing fewer than four times in the BNC subcorpus. However, in terms of instances of use (tokens), the participants used the robust collocations considerably more often than the infrequent ones: TTR of 0.43 for robust collocations in comparison to 0.90 for infrequent ones. On average, each robust collocation type occurred more than twice in the 36 academic writing texts.

Table 2.2 Participant Group’s Overall Collocation Use

Adjective-noun collocations  Tokens   Types    Tokens    Types     Tokens    Types     Tokens    Types
                             (total)  (total)  (Term 1)  (Term 1)  (Term 2)  (Term 2)  (Term 3)  (Term 3)
F≥4 & MI>3 & t-score>2       283      123      112       67        81        42        90        54
Total                        494      299      198       147       142       95        154       107
% of robust collocations     57.3     41.1     56.6      45.6      57.0      44.2      58.4      50.5

Table 2.2 also shows how the participants’ use of robust collocations developed over the academic year. Although the number of types used declined after Term 1, the percentage of robust types used remained essentially the same from Term 1 to Term 2, and then increased slightly by the time the dissertation was written up in Term 3. In terms of tokens, the number of robust collocations used over the three terms showed a similar pattern to that of types, while the percentage of robust tokens only ranged from 56.6 per cent to 58.4 per cent during the academic year, and so was relatively stable. Overall, after the one-year exposure to an English academic environment, it appears that the participant group as a whole displayed little if any improvement in the number of robust adjective-noun collocation types or tokens produced in their academic writing.
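The percentages and the robust-collocation TTR reported for the group follow directly from the counts in Table 2.2. As a small worked check (the variable and function names are ours, not the authors’), the arithmetic can be reproduced as follows:

```python
# Counts taken from Table 2.2: (robust, total) for tokens and for types.
table_2_2 = {
    "Overall": {"tokens": (283, 494), "types": (123, 299)},
    "Term 1": {"tokens": (112, 198), "types": (67, 147)},
    "Term 2": {"tokens": (81, 142), "types": (42, 95)},
    "Term 3": {"tokens": (90, 154), "types": (54, 107)},
}


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


robust_share = {
    period: {unit: percent(*counts) for unit, counts in row.items()}
    for period, row in table_2_2.items()
}

# Type-token ratio of the robust collocations: robust types / robust tokens.
robust_ttr = round(123 / 283, 2)
# Mean number of occurrences per robust collocation type.
tokens_per_robust_type = round(283 / 123, 2)

for period, shares in robust_share.items():
    print(period, shares)
print(robust_ttr, tokens_per_robust_type)  # prints 0.43 2.3
```

The computed shares match the bottom row of Table 2.2, and 283 tokens spread over 123 types gives the reported TTR of 0.43, or a little over two occurrences per robust type.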
Individual participants
Although the group results showed little improvement, an analysis of the individual participants’ results shows quite varied behaviors.

Table 2.3 Participants’ Use of Robust Collocations over Three Terms

Participant  Adjective-noun collocation  Term 1    Term 1   Term 2    Term 2   Term 3    Term 3
                                         (tokens)  (types)  (tokens)  (types)  (tokens)  (types)
LH           F≥4 & MI>3 & t-score>2      20        17       16        13       9         8
             Total                       44        40       36        32       32        27
             % of robust collocations    45.5      42.5     44.4      40.6     28.1      29.6
TT           F≥4 & MI>3 & t-score>2      18        17       18        12       22        19
             Total                       43        41       32        22       38        33
             % of robust collocations    41.9      41.5     56.3      54.5     57.9      57.6
WL           F≥4 & MI>3 & t-score>2      36        19       20        13       17        11
             Total                       57        39       29        21       25        17
             % of robust collocations    63.2      48.7     69.0      61.9     68.0      64.7
YJ           F≥4 & MI>3 & t-score>2      38        28       27        19       42        27
             Total                       54        43       45        37       59        42
             % of robust collocations    70.4      65.1     60.0      51.4     71.2      64.3

Table 2.3 shows that although all participants used about 40 different collocation types in Term 1, LH and WL used steadily fewer over the year, while TT dropped in Term 2, but recovered somewhat in Term 3. YJ remained relatively stable in the number of types she used through the year. The number of collocation tokens used by the four participants across three terms displays a development trend rather similar to that of collocation types, except in the case of YJ. She tended to use more collocation tokens (59 in Term 3 compared with 54 in Term 1) by the end of the academic year, largely due to frequent repetition (with TTR 0.71 in Term 3, and 0.80 in Term 1). It is also interesting to note the percentage of robust collocations used. In conjunction with her reduced diversity of collocation types/tokens, LH also dropped in the percentage of robust collocations, both types and tokens. Overall, her mastery of these collocations seems to have deteriorated over the year. Conversely, although WL declined in the number of collocation types/tokens used, her percentage of robust collocation types increased over the year, while her percentage of robust collocation tokens reached its peak in Term 2, then dropped slightly in Term 3. She therefore used
fewer types over time, but a greater percentage of those types were similar to those used by proficient English writers. YJ had a dip in Term 2, but ended up in Term 3 essentially where she began in Term 1, both in number of types and percentage of robust types/tokens. Thus, her usage of collocations was relatively stable over the year. TT also had a dip in numbers of types/tokens produced in Term 2, but her percentage of robust collocations (both types and tokens) steadily increased over the three terms. Overall, her figures indicate gradual improvement in collocation mastery.
Development in the diversity of adjective-noun collocations produced
The average number of adjective types used to describe academic nouns provides a general measure of the diversity of adjective-noun collocations produced.

Table 2.4 Average Number of Adjective Types per Noun across Three Terms

Participant  Term 1  Term 2  Term 3
LH           2.00    1.68    1.59
TT           1.67    1.57    1.83
WL           2.60    1.62    1.89
YJ           2.15    2.18    2.33
Mean         2.11    1.76    1.91

The group mean result in Table 2.4 exhibits U-shaped behavior, with the Term 3 figure not recovering to the Term 1 figure. However, this group average does not show the substantial differences between the individual participants. In fact, the group profile only serves to disguise the very
real differences in the participants’ individual development of adjective variation. LH started with an average of two adjective types per noun in the first term of her academic year, which then fell to 1.68 and 1.59 in the following two terms. This continuous decline in the mean number of adjective types over the course of three terms indicates that LH used less diverse adjective-noun collocations (about 20 per cent less) by the end of her study abroad programme. In contrast, participant YJ showed the opposite developmental trend in the use of adjectives to describe academic node nouns in her academic writing tasks over the course of three terms. At the beginning of the academic year, an average of 2.15 adjective types were used. This figure rose to 2.18 and 2.33 respectively in the following two terms. The steady increase suggests that YJ used slightly more diverse (approximately 8.4 per cent more) adjective-noun collocations by the end of her MA course.
Both TT and WL experienced a decrease in the number of adjective types from Term 1 to Term 2, and a substantial increase from Term 2 to Term 3, although their developmental profiles are very different. WL’s use of adjective types dropped sharply from 2.60 to 1.62 (about 37.7 per cent less), followed by a substantial rise of nearly 17 per cent from 1.62 to 1.89. This left her using less diverse adjective-noun collocations at the end of the year than at the start. On the other hand, TT initially experienced a slight decline of approximately 6 per cent from 1.67 to 1.57, followed by a rise of 16.6 per cent from 1.57 to 1.83. Unlike WL, by the end of the academic year, TT used more varied adjective-noun collocations (9.6 per cent more) compared with those used in Term 1.
Changes in the repetition of adjective-noun collocation
TTR value can provide an indication of the repetition frequency of collocation use.
Table 2.5 shows the TTR value of target adjective-noun collocations for each participant and for the participant group as a whole. The TTR pattern for the group shows a rather stable and subtle decline from 0.91 to 0.87 over the academic year. This steady drop in TTR value over time suggests slightly more repetition of collocation by the four participants as a whole. However, as the decrease is less than 4.5 per cent, it is probably not particularly meaningful.

Table 2.5 Type-Token Ratios of Adjective-Noun Collocations across Three Terms

Participant  Term 1  Term 2  Term 3
LH           0.98    0.94    0.86
TT           0.98    0.81    0.95
WL           0.78    0.90    0.81
YJ           0.89    0.94    0.85
Mean         0.91    0.90    0.87

Of more interest is the individual behavior, which again varies substantially among the participants. The only participant with a profile which in
any way resembles the group profile is LH, and even here the rate of decrease is much more extreme than the group profile. She started with a TTR of 0.98 in Term 1, dropping to 0.94 and 0.86 in Terms 2 and 3 respectively. This steady decline (nearly 12 per cent decrease) in the TTR suggests that LH tended to repeat collocations more often at the end of the academic year. Unlike LH, participant TT’s TTR value underwent a noticeable fluctuation over the course of three terms. Her TTR value began at 0.98 in Term 1, followed by a considerable drop to 0.81 in Term 2, and ending up nearly where she started at 0.95 by the end of her MA course. It is difficult to say what caused the drop in Term 2, other than to note that it was not based on a single aberrant paper, as TT submitted four papers in this term, as did all the participants. Participants WL and YJ share a similar trend, both experiencing a rise of TTR in Term 2, and a drop afterward in Term 3. Overall, WL showed slightly more repetition of collocations than YJ throughout the year.
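The TTR and percentage-change figures quoted in this section are simple ratios. The sketch below (with function names of our own choosing) reproduces a few of them from Tables 2.3 and 2.5: a lower TTR means that the same collocation types are being repeated more often.

```python
def type_token_ratio(types, tokens):
    """Number of distinct collocations divided by total occurrences."""
    return types / tokens


def percent_change(start, end):
    """Relative change from `start` to `end`, in per cent."""
    return 100 * (end - start) / start


# YJ's total types and tokens from Table 2.3.
yj_term1 = round(type_token_ratio(43, 54), 2)
yj_term3 = round(type_token_ratio(42, 59), 2)
print(yj_term1, yj_term3)  # prints 0.8 0.71

# Changes in TTR from Term 1 to Term 3, using the values in Table 2.5.
lh_change = round(percent_change(0.98, 0.86), 1)
group_change = round(percent_change(0.91, 0.87), 1)
print(lh_change, group_change)  # prints -12.2 -4.4
```

LH’s drop of about 12 per cent is roughly three times the size of the group’s drop of under 4.5 per cent, which is why the group mean understates her change.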
Development of high-frequency and typical collocation use
We have seen changes in the participants’ diversity and repetition of adjective-noun collocation use, and now focus on their production of the type of collocations frequently used by native professional writers in their academic publications, as measured by the t-score statistic and the BNC Academic reference corpus (Table 2.6). The group result indicates essentially no change in t-score from Term 1 to Term 2 (5.31 and 5.30), and then a slight improvement to 5.44 in Term 3. This suggests that the four participants as a group used more frequent/typical adjective-noun collocations in their dissertations than in their earlier assignments. However, in this case, this profile accurately represents none of the individual participants’ profiles.

Table 2.6 T-scores of Adjective-Noun Collocations across Three Terms

Participant  Term 1  Term 2  Term 3
LH           5.20    5.47    5.10
TT           5.09    5.40    5.68
WL           5.60    5.29    5.94
YJ           5.36    5.05    5.04
Mean         5.31    5.30    5.44

LH’s average t-score in Term 1 was 5.20, which rose to 5.47 in Term 2 and then dropped to 5.10 in Term 3. Thus, over the year, there was no improvement in LH’s higher-frequency collocation use. It suggests that LH did not come to use more of the native-like adjective-noun collocations which are commonly used by professional expert writers in academic texts. WL’s
developmental trend is almost a mirror image of LH’s, with t-score averages of 5.60, 5.29 and 5.94 for Terms 1, 2, and 3 respectively. It seems that by the end of the MA programme, when WL wrote up her high-stakes dissertation, she tended to use collocations with higher frequency levels, compared with those used in her earlier assignments. YJ produced a declining profile, dropping from an initial t-score of 5.36 to 5.05 in Term 2 and 5.04 in Term 3, which indicated a tendency to use adjective-noun collocations which were less frequent and typical by the end of the academic year. Finally, TT produced the type of profile which one might expect given the rich linguistic environment, consistently rising throughout the three terms. She started with an average t-score of 5.09, which thereafter rose to 5.40 and 5.68 in the following two terms. This steady increase suggests that the collocations which occurred in TT’s academic writing assignments over the course of the 12-month postgraduate programme were, generally speaking, increasingly typical of proficient writers.
Development of strongly-associated collocation use
Since MI value is known to emphasize a rather different set of collocations from t-score (Schmitt, in press), a similar analysis was carried out using the MI statistic. It highlights collocations which are typically not very frequent, but which are strongly associated when they do occur (e.g. tectonic plates). The group averages (Table 2.7) show a very shallow U-shaped profile, which can probably be best interpreted as no meaningful change across the different terms. But again, the group averages do not accurately represent any of the individual profiles. LH’s collocation use showed a continuous decline in MI values from 4.33 to 3.95 over the course of three terms. This consistent decrease suggests that participant LH tended to use adjective-noun collocations with less association strength in her dissertation, compared with those used in her writing tasks completed in Term 1.
By contrast, TT’s collocation use displayed an opposite developmental direction. Her MI averages increased over time (4.51, 4.51, 5.48), which indicates that TT’s use of adjective-noun collocations by the end of her study abroad programme was more nativelike, since such strongly-connected collocations characterized the professional writers’ academic texts in the BNC sub-corpus.

Table 2.7 MI Scores of the Adjective-Noun Collocations across Three Terms

Participant  Term 1  Term 2  Term 3
LH           4.33    4.09    3.95
TT           4.51    4.51    5.48
WL           4.50    4.80    4.46
YJ           5.06    4.42    4.75
Mean         4.60    4.46    4.66

Although participants WL and YJ underwent completely different developmental trends over the year, they both ended up with lower MI scores in comparison with their initial levels in Term 1. Despite the fluctuations which took place over the course of the 12-month postgraduate programme,
both WL and YJ showed little growth in the employment of adjective-noun collocations with stronger association strength. This suggests that the collocations used by participants WL and YJ did not become more expert-writer-like after the one-year exposure to the academic target language environment.
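The observation that the group averages fail to represent any individual profile can be checked directly against the MI values in Table 2.7. The short sketch below is our own illustration (the variable names are not from the chapter); it computes the group mean per term alongside each participant’s net change from Term 1 to Term 3.

```python
from statistics import mean

# MI averages per participant for Terms 1, 2 and 3, taken from Table 2.7.
mi_scores = {
    "LH": [4.33, 4.09, 3.95],
    "TT": [4.51, 4.51, 5.48],
    "WL": [4.50, 4.80, 4.46],
    "YJ": [5.06, 4.42, 4.75],
}

# Group mean for each term (column-wise average across participants).
group_means = [mean(term) for term in zip(*mi_scores.values())]

# Net change from Term 1 to Term 3 for each participant and for the group.
net_change = {name: round(scores[-1] - scores[0], 2)
              for name, scores in mi_scores.items()}
group_net_change = round(group_means[-1] - group_means[0], 2)
decliners = [name for name, change in net_change.items() if change < 0]

print(net_change)        # prints {'LH': -0.38, 'TT': 0.97, 'WL': -0.04, 'YJ': -0.31}
print(group_net_change)  # prints 0.06
print(decliners)         # prints ['LH', 'WL', 'YJ']
```

The computed group means agree with the reported 4.60, 4.46 and 4.66 to within rounding. The group’s small net gain of 0.06 is driven entirely by TT: the other three participants all finished the year with lower MI averages than they started with.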
Differences in the distribution of collocations according to MI banding
In order to investigate the participant group’s collocation patterns in terms of the distribution of strength of association over time, we classified all the adjective-noun collocations used by the four Chinese MA students into five bands on the basis of their MI score values and raw frequency counts obtained from the reference BNC sub-corpus. The MI statistic tends to highlight collocations which are not frequent, but which are highly associated, and are thus likely to be very salient to native speakers (and perhaps proficient non-natives as well). It is thus useful to explore whether the participants began using more of the higher MI collocations, as these may be particularly important in providing a sense of native-likeness to written
compositions (Durrant & Schmitt, 2009). We use the term ‘non-associated’ to represent those two-word combinations which are either unattested or have a raw frequency below four in the BNC academic texts. (We recognize that these very infrequent combinations may well be associated, but use this terminology in order to clearly differentiate these combinations from our other categories.) ‘Weak-strength collocations’ are those which occur four or more times in the BNC reference corpus with an MI score of less than 3. ‘Moderate-strength collocations’ have MI scores of 3<MI