Exploring Second-Language Varieties of English and Learner Englishes: Bridging a paradigm gap (Studies in Corpus Linguistics)


Author: Prof. Dr. Joybrato Mukherjee | Prof. Dr. Marianne Hundt



Studies in Corpus Linguistics (SCL)

SCL focuses on the use of corpora throughout language study, the development of a quantitative approach to linguistics, the design and use of new tools for processing language texts, and the theoretical implications of a data-rich discipline. For an overview of all books published in this series, please see http://benjamins.com/catalog/scl

General Editor
Elena Tognini-Bonelli, The Tuscan Word Centre / The University of Siena

Consulting Editor
Wolfgang Teubert, University of Birmingham

Advisory Board
Michael Barlow, University of Auckland
Douglas Biber, Northern Arizona University
Marina Bondi, University of Modena and Reggio Emilia
Christopher S. Butler, University of Wales, Swansea
Sylviane Granger, University of Louvain
M.A.K. Halliday, University of Sydney
Yang Huizhong, Jiao Tong University, Shanghai
Susan Hunston, University of Birmingham
Graeme Kennedy, Victoria University of Wellington
Geoffrey N. Leech, University of Lancaster
Michaela Mahlberg, University of Nottingham
Anna Mauranen, University of Helsinki
Ute Römer, University of Michigan
Jan Svartvik, University of Lund
John M. Swales, University of Michigan
Martin Warren, The Hong Kong Polytechnic University

Volume 44 Exploring Second-Language Varieties of English and Learner Englishes. Bridging a paradigm gap Edited by Joybrato Mukherjee and Marianne Hundt

Exploring Second-Language Varieties of English and Learner Englishes Bridging a paradigm gap Edited by

Joybrato Mukherjee Justus Liebig University Giessen

Marianne Hundt University of Zurich

John Benjamins Publishing Company Amsterdam / Philadelphia


The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

Cover design: Françoise Berserik Cover illustration from original painting Random Order by Lorenzo Pezzatini, Florence, 1996.

Library of Congress Cataloging-in-Publication Data
Exploring second-language varieties of English and learner Englishes : bridging a paradigm gap / edited by Joybrato Mukherjee, Marianne Hundt.
p. cm. (Studies in Corpus Linguistics, ISSN 1388-0373 ; v. 44)
Includes bibliographical references and index.
1. Second language acquisition--Study and teaching. 2. Language and languages--Study and teaching. 3. English language--Variation. I. Mukherjee, Joybrato. II. Hundt, Marianne.
P118.2.E97 2011
427--dc22 2011000209
ISBN 978 90 272 2320 3 (Hb ; alk. paper)
ISBN 978 90 272 8714 4 (Eb)

© 2011 – John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA

Table of contents

Introduction: Bridging a paradigm gap
Marianne Hundt and Joybrato Mukherjee

Modal auxiliaries in second language varieties of English: A learner’s perspective
Carolin Biewer

English in Cyprus: Second language variety or learner English?
Christiane M. Bongartz and Sarah Buschfeld

From EFL to ESL: Evidence from the International Corpus of Learner English
Gaëtanelle Gilquin and Sylviane Granger

Formulaic sequences in spoken ENL, ESL and EFL: Focus on British English, Indian English and learner English of advanced German learners
Sandra Götz and Marco Schilk

Studying structural innovations in New English varieties
Ulrike Gut

Interrogative inversion as a learner phenomenon in English contact varieties: A case of Angloversals?
Michaela Hilbert

Overuse of the progressive in ESL and learner Englishes – fact or fiction?
Marianne Hundt and Katrin Vogel

Typological profiling: Learner Englishes versus indigenized L2 varieties of English
Benedikt Szmrecsanyi and Bernd Kortmann

A principled distinction between error and conventionalized innovation in African Englishes
Bertus van Rooy

Discussion forum: New Englishes and Learner Englishes – quo vadis?
Marianne Hundt and Joybrato Mukherjee

Bionotes

Index

Introduction
Bridging a paradigm gap
Marianne Hundt and Joybrato Mukherjee

University of Zurich and Justus Liebig University, Giessen

The present book goes back to a workshop on “Second-language varieties of English and learner Englishes” at the First Conference of the International Society for the Study of English (ISLE-1) in Freiburg in October 2008, which brought together scholars from all branches of English linguistics. The general topic of this inaugural conference of ISLE was “The Linguistics of English: Setting the Agenda”; the great interest among the conference participants in the workshop on “Second-language varieties of English and learner Englishes” was triggered by the wide-spread feeling that it is necessary to develop an integrated view of English as a Second Language (ESL) and English as a Foreign Language (EFL) and to put this on the agenda of English linguistics. All the papers in the present collection, most of which have emerged from selected presentations at the ISLE-1 workshop, thus aim at bridging what Sridhar & Sridhar (1986) have called the ‘paradigm gap’ between research into learner Englishes (e.g. English produced by learners of English in Germany and Japan) in the tradition of second-language acquisition (SLA) research on the one hand and research into institutionalized second-language varieties (e.g. Indian English and Nigerian English) in former colonial territories on the other. In spite of Sridhar & Sridhar’s (1986) plea for an integrated approach almost 25 years ago, these two objects of inquiry have continued to be treated as fundamentally different and unrelated areas of research ever since – notwithstanding some early attempts at comparing the features, functions and the underlying acquisitional processes of second-language varieties and learner Englishes (e.g. Williams 1987) as well as a few notable recent publications (e.g. Nesselhauf 2009). 
The rigour with which researchers from both lines of research have abstained from taking the other group of non-native Englishes as a product of a different, yet not entirely dissimilar language-acquisition process into account also has to do with linguistic taboos, especially on the part of researchers interested in second-language varieties of English to establish these New Englishes as full-fledged varieties with the potential to develop endonormative and local standards and norms. These emerging local standards and norms should not be conflated, in their view, with the error-focused description and analysis of foreign language learners’ output as a deviation from an exonormative norm; consider, in this context, for example Kachru’s (1982) clear distinction between deviations (in ESL) and mistakes (in EFL). However, since both learner Englishes and second-language varieties are typically non-native forms of English that emerge in language contact situations and that are acquired (more or less) in institutionalized contexts, it is high time that they were described and compared on an empirical basis in order to draw conceptual and theoretical conclusions with regard to their form, function and acquisition. Such descriptive studies and comparisons were not possible on a large scale in the 1970s, 1980s and 1990s as the relevant computer corpora of second-language varieties of English (e.g. many components of the International Corpus of English, ICE) as well as learner Englishes (e.g. the International Corpus of Learner English, ICLE) have only become available recently. This book thus aims at bridging the afore-mentioned paradigm gap by:

1. presenting empirical, in particular corpus-based, case studies of features of learner Englishes and second-language varieties of English, e.g. with regard to the use of formulaic routines;
2. discussing similarities and differences against the background of theoretical models and conceptions, e.g. stages in the second-language acquisition process and stages in the evolution of New Englishes;
3. analyzing forms of English that sit somewhat uneasily on the boundary between ESL (i.e. the Kachruvian ‘outer circle’) and EFL (i.e. the Kachruvian ‘expanding circle’), e.g. English in Cyprus and South Africa;
4. assessing the suitability of categorial labels such as ESL and EFL as well as traditional distinctions such as the one between native and non-native speakers;
5. sketching out the future agenda of an integrated approach to non-native Englishes including both institutionalized second-language varieties and learner Englishes.

An integrated and comprehensive approach to non-native Englishes is particularly relevant to the future agenda of English linguistics because today the English language is used to a much larger extent as a non-native language (ESL/EFL) than a native language (ENL) – be it as the global language of science and technology, as a link language in multilingual postcolonial societies or as a lingua franca between speakers with different mother tongues, to name but a few examples of contexts in which English is used routinely by many L1 speakers of languages other than English.


The papers in the present volume address a range of hotly debated issues involved in – and arising out of – the empirical description, analysis, comparison and modelling of second-language varieties of English and learner Englishes. The authors were asked to address some or all of a number of lead questions in their papers, be it in setting the research context, in the discussion of the descriptive findings or in the concluding remarks. These lead questions were the following ones:

1. How can we distinguish the description of systematic features of a variant/variety from the analysis of errors? Is that distinction relevant in the first place?
2. a. To what extent can similar “routes of development” (cf. Mesthrie & Bhatt 2008) and/or stages of acquisition be posited for English as a second language and English as a foreign language?
   b. Is it useful and/or possible to provide for an integrated model for second-language varieties of English and learner Englishes, e.g. in a framework based on the notion of contact phenomena/varieties?
3. a. Are traditional distinctions such as the well-established distinction between ENL, ESL and EFL and/or the related Kachruvian distinction between the inner circle, the outer circle and the expanding circle still useful and viable?
   b. What about the hotly disputed distinction between nativeness/native speakers and non-nativeness/non-native speakers?
4. To what extent are corpus data and corpus-linguistic methods relevant to the joint description and modelling of English as a second language and English as a foreign language?

All the authors have combined their own objects of inquiry with answers to some or all of these lead questions.

In the first paper, Carolin Biewer analyzes the use of modal auxiliaries across a wide range of Englishes in Africa, Asia and the South Pacific on the basis of comparable corpora. Her results trigger some interesting questions with regard to the suitability of the ESL-EFL distinction in general and the gradient nature of the second-language status of New Englishes in particular. Christiane M. Bongartz and Sarah Buschfeld look at English in Cyprus and assess to what extent it can be viewed as a second-language variety and/or as a learner English variant. Their sociolinguistic description and corpus findings make it clear that English in Cyprus is best viewed as a hybrid case, for which a variety spectrum offers a suitable descriptive tool. Gaëtanelle Gilquin and Sylviane Granger look at the use of the preposition into in the Spanish, French, Dutch and Tswana components of ICLE and compare it with native British English. They show that individual learner Englishes are more or less similar to native English with regard to different aspects of prepositional use, the emerging complex picture corroborating the adequacy of the label learner Englishes (rather than learner English); Tswana English even defies classification as either an ESL or EFL variety. Sandra Götz and Marco Schilk provide a quantitative and qualitative analysis of lexical bundles in native, second-language and learner language corpora. From their findings a very detailed picture of differences in the use of 3-grams between the three types of English speakers emerges, which indicates, inter alia, different degrees of formulaicity in ENL, ESL and EFL. Ulrike Gut’s paper focuses on a core issue in research into non-native Englishes, namely the question of how to categorize structural changes in New Englishes: are they innovations or (learner) errors? She argues that the answer to this question depends essentially on speakers’ attitudes and the status of the new variant or variety of English at hand. The nonstandard use of inversion is at the heart of Michaela Hilbert’s paper. Specifically, she analyzes interrogative inversion in Indian English, Singaporean English and Irish English; she argues convincingly that structurally identical forms and patterns may be based on vastly different processes, depending on the individual characteristics of the contact variety of English. Marianne Hundt and Katrin Vogel start off from a very detailed quantitative analysis of the use of the progressive in ENL, ESL and EFL forms of English on the basis of comparable corpora. Their findings lead them to call into question the seemingly neat divides between the three types of English, especially in the light of the complex interaction between globalization and localization of English on the one hand and cross-varietal influences between Englishes on the other. In contrast, Benedikt Szmrecsanyi and Bernd Kortmann argue that institutionalized second-language varieties and learner Englishes can be distinguished very clearly from a typological perspective.
In particular, they analyze and compare the degrees of grammatical analyticity and grammatical syntheticity across a wide range of components of ICE and ICLE. Finally, Bertus van Rooy zooms in on New Englishes in Africa and discusses to what extent errors and innovations interact in the formation of new norms in this specific context. On the basis of three case studies, he introduces the notions of grammatical stability and grammatical acceptability as two essential criteria which allow linguists to identify emergent norms.

In the present book, we have tried to not only collect a selection of papers that have emerged from presentations at the ISLE-1 workshop on “Second-language varieties of English and learner Englishes” but also to capture the essence of the highly inspiring and at times controversial discussions after the presentations and in between the sessions. To this end, we have included a discussion forum in the final section of the book. All contributors were confronted with a selection of theoretical or methodological core statements from the articles (i.e. the starting point for the discussion forum) and were asked to comment on them. The discussion brings together and reviews the key strands of argumentation and the major points of convergence and controversy throughout the papers, and it sheds light on a wide-ranging debate of the state of the art in research into second-language varieties of English and learner Englishes, some of the major concepts (and also some of the wide-spread myths) as well as potential avenues for future research. We hope that the present selection of papers and the discussion forum will trigger off a renewed interest in an integrated approach to second-language varieties of English and learner Englishes – another step, hopefully, on the way to bridging the still existing paradigm gap.

Finally, we would like to thank Rosemary Bock and Sandra Götz for their invaluable help at all stages of the editing process.

Marianne Hundt and Joybrato Mukherjee

References

Kachru, B.B. 1982. Models for non-native Englishes. In The Other Tongue: English across Cultures, B.B. Kachru (ed.), 31–57. Oxford: Pergamon.
Mesthrie, R. & Bhatt, R.M. 2008. World Englishes: The Study of New Linguistic Varieties. Cambridge: CUP.
Nesselhauf, N. 2009. Co-selection phenomena across new Englishes: Parallels (and differences) to foreign learner varieties. English World-Wide 30(1): 1–26.
Sridhar, K.K. & Sridhar, S.N. 1986. Bridging the paradigm gap: Second language acquisition research and indigenized varieties of English. World Englishes 5(1): 3–14.
Williams, J. 1987. Non-native varieties of English: A special case of language acquisition. English World-Wide 8(2): 161–199.



Modal auxiliaries in second language varieties of English
A learner’s perspective
Carolin Biewer

University of Zurich

Although Sridhar and Sridhar pointed out as early as 1982 that the two linguistic fields of second language acquisition (SLA) and New English studies could benefit from each other, the gap between the two disciplines has never been closed. This article draws attention to some of the reasons why these two disciplines have not come together and discusses how SLA theory could be applied to New English studies to explain grammatical patterns found in many, if not all, L2 varieties of English. As a case study the usage of modals and semimodals of obligation and necessity in various varieties of English as a second language (ESL) in Africa, Asia and the South Pacific will be considered. In this context it will also be discussed to what extent the (ENL)-ESL-EFL distinction of Kachru’s model is still suitable if we now include a learner’s perspective and focus on similarities between ESL and EFL (English as a foreign language). As differences in the usage of modal auxiliaries in different ESL varieties are mostly quantitative rather than categorical, a corpus linguistic approach was chosen.

1. Introduction

In 1982 Sridhar & Sridhar drew attention to the lack of cooperation between the linguistic fields of second language acquisition (SLA) and New English studies. Although it had been claimed that SLA theory was generally applicable and that outer circle varieties of English had not explicitly been excluded from this approach, the difference between New Englishes and English as a foreign language had not been considered, so that the application of SLA theory to New Englishes remained questionable (cf. Sridhar & Sridhar 1992: 91ff).1 While their article discusses in detail how these two areas of research could benefit from each other, SLA theory and New English studies have until today remained two largely separate approaches to the use of English by non-native speakers. This article will explore some of the reasons for this on-going separation and make suggestions as to how SLA theory could be integrated into the research of New Englishes. It discusses to what extent a distinction between English as a second language (ESL) and English as a foreign language (EFL) is still a sensible approach, while their similarities, in particular their mutual origin as learner varieties, should not be neglected. As a case study for these theoretical reflections I have chosen to study modal auxiliaries in second language varieties of English in Africa, Asia and the South Pacific. The system of modal auxiliaries in English is highly complex in terms of function and meaning (e.g. Stephany 1995: 105), which makes it very interesting to follow the development of the modal system in the acquisition process. Similarities in the usage of modal auxiliaries between ESL and EFL may point to similarities in the acquisition process for these two groups of non-native English. On the other hand, in the semantic distinction of epistemic and deontic meaning, which is inherent in all languages of the world (e.g. de Haan 2006: 45 for ‘ability’), the appropriateness of the strength of a statement of probability or advice/permission is determined by the value system of the individual society.2 These “culturally construed notions of linguistic presupposition and propriety” (Kwachka & Basham 1990: 413) are likely to trigger lasting features in ESL to reflect cultural heritage and cultural identity. This could be one of the major aspects to distinguish ESL from EFL, in which the final goal of the learner will be to achieve native speaker competence. In EFL deviations that arise from learner mistakes will be eradicated; one’s own cultural background will not necessarily be expressed with the learned language (also see Kachru’s distinction of ESL as ‘norm-developing’ versus EFL as ‘norm-dependent’, e.g. in Bolton (2006: 249)). By examining modal auxiliaries in different ESL varieties – which have no historical or genetic connection to each other – one may therefore be able to find common patterns that could be explained with general SLA processes and distinct patterns that could be explained by differences in the local communities. One would not only be able to predict a path of nativization of certain patterns but also to discuss, with the modal auxiliaries as an example, the value of SLA theory for the research of New Englishes and Kachru’s model of the ESL-EFL distinction. As distinctions in the usage of modal auxiliaries are mainly quantitative distinctions, it could also be demonstrated that a corpus linguistic approach for the study of modal auxiliaries is beneficial.

1. New Englishes in this paper are identified with the terms ‘L2 varieties of English’, ‘English as a second language’ (ESL) and ‘outer circle varieties of English’. They are to be distinguished from Learner English, which is also called ‘English as a foreign language’ (EFL) or ‘expanding circle varieties’, and from English used by native speakers, which is called ‘English as a native language’ (ENL) or ‘inner circle varieties’. The distinction ENL-ESL-EFL or inner circle – outer circle – expanding circle is adopted from Kachru (1985).
2. See for instance Basham and Kwachka (1989: 131) on differences of modal usage in Eskimo communities in Alaska or Biewer (2009: 51) on differences of modal usage in Fijian communities in Fiji.

In Section 2 I will examine more closely how and why the paradigm gap between New English studies and SLA theory emerged and how it can be bridged. The important question to discuss in this context is to what extent we can and should distinguish between ESL and EFL. In that respect Kachru’s model of inner circle – outer circle – expanding circle varieties will be re-examined. Section 3 introduces several principles and constraints of SLA which could be applied to New English studies and discusses to what extent they can be used to explain the modal auxiliary system in ESL. Previous results on the acquisition of modality in SLA will be considered; previous findings on modal usage in the research of New Englishes will be reinterpreted. In Section 4 I will study the form and function of seven modals and semi-modals of obligation and necessity in various Asian, African and South Pacific L2 varieties and we will see how SLA theory can be applied to explain the results of the case study. The conclusion will draw attention to the wider implications of the case study. In this way it will be possible to shed some light on how SLA theory can be applied to ESL varieties – how the gap could be bridged before it becomes a gulf, so to speak – for the mutual benefit of both disciplines.

2. Second language acquisition and New Englishes – a bridge to be built

2.1 The (ENL-) ESL-EFL taxonomy revisited

One reason why the so-called paradigm gap between New English studies and SLA theory has not been closed yet can be found in the different usage of terminology. In variational linguistics a general distinction is made between three different speech communities – those using English as either a native, as a second or as a foreign language. This tripartite model – based on an initial distinction of three types of speakers of English by Strang (1970) – was established by Quirk et al. (1972), further developed by Moag (1982) and eventually adapted by Kachru (1983/1985), who called these different varieties ‘inner circle’, ‘outer circle’ and ‘expanding circle varieties’ (also see Mollin 2006: 25, McArthur 1998: 42f, Görlach 2002: 99). In this model the New Englishes, if we apply Platt et al.’s narrow definition (Platt et al. 1984: 2f), are identified as ESL, and defined as institutionalized non-native varieties of English which are used in former post-colonial settings (Kachru 1985, Granger 1996: 13). In contrast, EFL is learned in a setting in which English is restricted to the classroom and has no historical or official purpose in the country (Granger 1996: 14). It is not an institutionalized variety but a performance variety (Kachru 1983: 54). This terminology is common in British linguistics, whereas in America any non-native variety of English may be called ESL (Platt et al. 1984: 22). It is further complicated by the fact that in SLA studies ESL refers to any learners of English who acquire the language in an English-speaking country (Granger 1996: 14), therefore including the English of immigrants (Mesthrie & Bhatt 2008: 5f). Only in a wider definition of ESL and New Englishes in the variationist approach would immigrant English be included in ESL (Mesthrie & Bhatt 2008: 12). Moreover, in SLA studies ESL (in the variationist definition) may be labelled EOL, ‘English as an official language’, whereas ESL is then exclusively identified with immigrant English (Granger 1996: 14).

The different categorizations in the two disciplines highlight different aspects. In the New English perspective the tripartite distinction of ENL, ESL and EFL has a socio-historical base focussing on the historical role of the English language in different societies; in SLA the main point is that people whose language is studied are learners of the language with different circumstances of acquisition, i.e. the exposure to the target language in everyday life differs. From the SLA point of view it does not matter whether someone who acquires English as a non-native speaker in an English-speaking environment has lived there for a couple of years or is living there permanently, whereas from the point of view of New English studies it is a vital difference, as people belong or do not belong to a certain speech community within the country. In the variationist approach ESL is clearly distinguished from EFL in that deviation from ENL can be an indication of nativization, the process of developing a new variety with its own norms. In SLA the major point is that this is not first language but second language acquisition. This may be one reason why some studies on SLA do not openly distinguish between ESL and EFL (e.g. Stephany, also see definition in Ortega 2009: 1f) or do not give much information on the people whose English is being studied (e.g. Salsbury & Bardovi-Harlig 2000). For Sridhar & Sridhar (1992: 93) the lack of recognition in SLA that New Englishes are different from learner English in terms of the goal of learning, the input, the function in society and the motivation of learning is the main cause of the paradigm gap. We can rephrase this as a neglect on both sides: while in New English studies the differences between ESL and ENL are at the centre of interest, distinctions tend to be overrated. In SLA, in which the search for a common core of general acquisition processes is at the heart of the investigation, similarities tend to be overrated.

In this paper, the intention is to apply SLA to the study of New Englishes and show the extra-ordinary position of institutionalized non-native varieties in the paradigm. Thus, I will use the variationist terminology with a clear ESL-EFL distinction. At the same time we must acknowledge that such a clear distinction as given by Kachru’s model is an idealization. The model disguises the fact that these distinctions are fuzzy (also see comment in Mesthrie & Bhatt 2008: 7f). Platt et al. (1984: 22) point out that we can talk about a number of varieties as being “more or less foreign, second, native” (Platt et al. 1984: 22). Various linguistic studies give examples not only of an overlap between ENL and ESL or ESL and EFL but also of a transition from one to the other. Görlach (2002), for instance, argues that Singapore English is moving from ESL to ENL as it is “increasingly introduced as a first language in upper middle-class families” and is used in “an exceptionally wide range of domains” (Görlach 2002: 108); he also predicts the transition of Hong Kong English from ESL to EFL owing to the political changes (Görlach 2002: 109).3 It is also not unusual for countries with a colonial past to have speakers of ESL, ENL and EFL living alongside each other. A complex case is the Cook Islands. While the younger generation of Cook Islanders on Rarotonga, the main island, starts to learn English as a mother tongue, several attempts are being pursued to stop this shift by banning English from certain domains. At the same time, on the more remote outer islands of the Cook Islands, exposure to English is mostly restricted to the classroom, giving it more of an EFL setting. An ENL-ESL-EFL cline within the Cook Islands speech community can therefore be observed. However, that does not mean that the tripartite model as formalized by Moag (1982) should be discarded altogether as Görlach suggests (Görlach 2002: 113). The problem lies partly in our own interpretation of the model. When using this model we have to bear in mind that we are looking at a socio-historical setting that is classified according to the dominant usage of English in the society.
It seems to suggest a static situation with three clear-cut categories. This we should perceive as an abstraction of the real life scenario, as the depiction of prototypes within an actually dynamic setting, in which ESL varieties may move closer to or further away from the ESL prototype. Differences in the modal usage in Cook Islands English (CookE) versus Samoan English (SamE) and Fiji English (FijE) (see Biewer 2009) start making sense if we consider that CookE (of Rarotonga) may be further away from the ESL prototype and closer to the ENL prototype than SamE and FijE. Before we discard the model altogether we should think about how to adapt it (or change our perspective), acknowledging overlaps and transition periods between different types, choosing a dynamic approach rather than a static one. I suggest that we argue for a more subtle categorization rather than dissolving the categories of ESL and EFL.

3. See, however, the assessment of the future of Hong Kong English in Schneider (2007: 139) and Joseph (2004: 159ff). The different positions illustrate that post-colonial Englishes are highly dynamic.






There are distinctions between ESL and EFL. English in ESL settings is institutionalized and used in a range of domains, including informal ones. Therefore, in contrast to EFL, a range of different styles is developed. The ESL speaker has a much larger exposure to English than the EFL speaker, who usually may be exposed to English through the media (pop songs, internet) but possibly only uses English actively within the classroom. As a consequence English learners in ESL settings are users of English in everyday life in many different situations with a range of styles at command, whereas EFL learners, if they do not need English at the workplace, may lose their competence as soon as they leave school (Moag 1982: 18ff). There are differences in terms of input, motivation and outcome. Unlike the EFL speaker, ESL speakers do not necessarily want to acquire native-like competence in English in the end or identify completely with the culture of the native speaker (Sridhar & Sridhar 1992: 93f) but use transfer from the mother-tongue to draw attention to their indigenous culture and identity; the speakers then move away from external norms developing their own indigenized endonormative variety (see models by Leitner (1992), Schneider (2003, 2007)). Further, the language the ESL speaker is exposed to is largely a mixture of ESL and the native language rather than ENL (Sridhar & Sridhar 1992: 102), whereas the exposure to English in the EFL classroom usually remains closer to standard ENL4 and the norm orientation remains external. This is another vital difference between ESL and EFL. For both varieties, errors can be fossilized, but for the former at a certain stage some of them will become systematic, i.e. “transcend individual speakers and individual cases” (Mollin 2006: 155), and turn into acceptable features of a newly developing variety in the socio-cultural setting in which the variety is used (also see definition of ‘feature’ in Bautista, 2004: 113, ref. 
to D’Souza 1998: 92).5 In SLA, fossilized structures continue to be seen as interlanguage, an “intermediate [variety] between the speaker’s native language and the target language” (Trudgill 2003: 65), not a target in itself (Sridhar & Sridhar 1992: 98); a possible status of such a variety in the community is denied, and transfer from the mother tongue is seen as something negative.6 To bring SLA and the study of New Englishes together this perception has to change. The value system of a society and its indigenous culture are important factors in the development of ESL as a variety not meant to be entirely identical with the target language. This is an aspect which has so far been largely neglected in SLA theory.7 At the same time similarities between ESL and EFL should not be underestimated. They are both non-native varieties of English, and ESL too is usually first introduced to the learner in the classroom (see Mesthrie & Bhatt 2008: 156). Both groups of learners of English at the beginning of the learning process orientate themselves towards the norms of inner circle varieties of English and show a low estimation of their own competence (Moag 1982: 34ff). Particularly in the early stages of acquisition both EFL and ESL seem to follow similar routes of development and pass through similar stages of acquisition (Mesthrie & Bhatt 2008: 160ff). There are general constraints in the acquisition process that hold for both ESL and EFL. It is of course precisely these commonalities that make it not only possible but highly desirable to bring the learner perspective in New English studies into focus. In the following I will apply SLA theories that may help to explain the usage of modal auxiliaries in L2 varieties of English.

4. That of course also depends on the competence of the teacher.

5. For a definition of ‘fossilization’ see Trudgill (2003: 51). For a terminological difference between ‘mistake’ and ‘feature’ see Kachru (1992: 62). In this paper we will use the term ‘deviation’ as a neutral term that could refer either to an error or a feature.

6. On the misjudgement of transfer in SLA see Sridhar & Sridhar (1992: 98ff). For a discussion on the role of transfer in ESL and EFL see Mesthrie & Bhatt (2008: 159ff). In ESL varieties, too, deviation from ENL ‘parents’ will be evaluated in a negative way for quite a while. The difference from the evaluation for EFL is that in ESL at some stage a number of these deviations will be accepted as features.

2.2

Modal auxiliaries in L2 varieties: An SLA perspective

There are similarities in the way learners of ESL and EFL deduce rules and gradually replace earlier rules with new ones as they learn more about the language (Mesthrie & Bhatt 2008: 161). At the beginning of the learning process both learners of ESL and of EFL tend to over-generalize and regularize the grammatical rule system of ENL; in both cases the learners will rely on their mother tongue as a comparison (see e.g. Winford 2003: 218, Sand 2005: 124). But transfer from the mother tongue is more than just an imitation of surface structure and does not always result in simplification (Mesthrie & Bhatt 2008: 160, Sridhar & Sridhar 1992: 99). Structural features of L1 are not immediately transferred to the L2 but added first to “a pool of variants”, from which they may be selected later for linguistic or social reasons (Mesthrie & Bhatt 2008: 164).8 One constraint for transfer is whether it helps to “accomplish social and interactional ends” (Firth & Wagner 1997: 292). For ESL it may be used to resist the “full power of the target language” (Mesthrie & Bhatt 2008: 164, ref. to Rampton 2005). In the process of transfer one

7. The beginning of the 21st century witnessed a reorientation of SLA theory in the recognition of a social dimension of language learning. However, what is meant by that is that the acquisition of a native-like competence goes beyond the “mastery of a language” and also depends on the social setting of the learning situation (Ortega 2009: 217ff). It does not include deviation from the target language as a necessary strategy to be able to function with an L2 in one’s native community.

8. Also compare Mufwene’s ‘feature pool’ (e.g. in Schneider 2007: 22f).






has also to consider that learners modify their native system of grammar and end up using a modified system for L2 rather than completely switch over to the new system (Georgieva 1993: 158). Learners compare what they know about the L2 grammar with the L1 system and establish contrasts and similarities as they perceive them, not necessarily as they really are (Georgieva 1993: 158). According to the Transfer to somewhere principle (see Andersen 1983) only structural features in L1 that are unmarked in L1 are transferred into L2. If in some other part of the grammar of the target language there exists a similar structural feature, transfer will take place, as the feature is seen as viable. If not, transfer will be blocked (Mesthrie & Bhatt 2008: 165). An example would be the over-generalization of unmarkedness of person to the third person singular of verbs. Further, grammatical features can only be transferred if there is a grammatical feature in the target language that can be related to the L1 feature in function or meaning. Although the principle may explain some cases, it by no means explains them all.9 But this principle works as long as the acquisition process is (still) limited to the classroom (Mesthrie & Bhatt 2008: 166) and it draws attention to the fact that over-generalization owing to transfer is a pattern that will trigger similar outcomes in ESL and EFL. Another principle of SLA which illuminates the connection between transfer and over-generalization is the Shortest path principle. This principle postulates that if the rules of the target language allow for variation, one variant will be selected, and the selected variant will be the one that “corresponds most closely” to the L1 feature (Wald 1993: 516f). Over-generalization does not necessarily only occur in connection with transfer. 
In the early stage of language learning developments may be independent of the L1 language of the speaker, with everyone giving preference to unmarked categories, even though there is “no structural basis for [this rule] in either L1 or L2” (Mesthrie & Bhatt 2008: 164, referring to Hyltenstam 1984: 41–3). Although this hypothesis lacks proof, it finds support in a cognitive approach which interprets interlanguage in early SLA as evidence of “general cognitive strategies of linearisation” (Mesthrie & Bhatt 2008: 171) and postulates that a ‘basic variety’ underlies SLA at that early stage. This basic variety has no morphology, verbs are uninflected, there is no copula, adverbials are used to mark aspect, and verbs like finish are used as boundary markers (Mesthrie & Bhatt 2008: 171f). This view would explain many similarities in different ESL without seeing them as results of (constraints on) transfer. 9. What about the need to express complex ideas which play a role in the local community but not in, say, British society? On this point, also see the comments of Mesthrie & Bhatt (2008: 165f).


Markedness theory points in a similar direction, saying that less marked features in the world languages are less complex and more frequent. They are therefore easier to learn and usually acquired first (Tschichold 2002: 129). This again explains very well why both in ESL and in EFL marking of the third person singular on verbs or marking of plural on nouns tend to be neglected at first. In addition to this linguists found a general tendency of “[l]earners ... to stick to speech act constructions, whether or not directly related to the L1, which are well-trained, automatized, ‘rehearsed’, so to say” (Georgieva 1993: 161). What Hasselgren (1994) calls the Teddy bear principle is described by Tschichold (2002: 133) as follows: “learners clutch to what they feel is safe and familiar”. That would also mean that features with a high frequency in the target language may be used even more often in ESL or EFL.10 All these theories explain why grammatical patterns of the target language tend to be regularized, giving way to overusage of some patterns and underusage of others. Transfer can, but does not have to, be involved. Less marked features will be acquired first and may show a higher frequency in L2; comparison with the mother-tongue influences the decision of one possible variant in L2 over another. From a psycholinguistic point of view, similarities in New Englishes can also be explained by the assumption that learners make an effort to be explicit in communication (Williams 1987: 168f, adapting Slobin (1977)). In order to reduce ambiguity, grammatical patterns of the target language tend to be regularized and redundancies tend to be avoided by choosing one marker, usually the most salient one (Williams 1987: 169f, Mesthrie & Bhatt 2008: 174). 
Cases of avoidance of redundancies can be seen in examples such as one of plus unmarked noun, in which the quantifier already denotes ‘plural’, or unmarked verbs in connection with lexical markers of pastness (Williams 1987: 176f). On the other hand, double marking in ESL is also attested, e.g. the double marking of supposition (suppose if we ...) – a creation of redundancy not used in the target language – which can be explained by the urge to maximize clarity of meaning (see Williams 1987: 188f). One of the key results of SLA studies on modality is that in first language acquisition (FLA), if modality is described by free morphemes in the native language, deontic modality will be acquired before epistemic modality (Choi 2006: 157). It is likely that in that case in SLA deontic modality will also be acquired before epistemic modality since the grammatical structure of the native language will be used as a comparison (also see Biewer 2009: 46). In both SLA and FLA, epistemic meaning is still expressed with lexical items when modal auxiliaries are already used to express deontic meaning (Stephany 1995: 112, 116). Before a child starts using modal auxiliaries to express epistemic meaning, she starts “varying 10. It also shows the huge impact of teaching on the ‘outcome’.






her syntax” for all other cases (Ehrich 2005: 184). In essence what this means is that in both ESL and EFL a higher number of modals should be used in their deontic meaning than their epistemic meaning, as well as a higher amount of complex phrases and sentences with deontic than epistemic modals – if that result can be applied to SLA. Although constraints on the acquisition of modals in terms of cognitive abilities and input must be different for ESL, the steps and underlying strategies of SLA may be the same – also because the learner has been successful with these strategies in the acquisition of his first language. Most modals have several functions but young learners tend to avoid this ‘plurifunctionality’ (Shatz & Wilcox 1991: 321f). In FLA and SLA the learners first acquire the basic categories ability, permission, prohibition, obligation, possibility and necessity before they move on to more subtle distinctions (Georgieva 1993: 153). These aspects show tendencies of over-generalization and regularization of the English modal system in ESL and EFL. If we see deontic meaning as the default case, the unmarked variant, markedness theory applies. Modals expressing epistemic meaning are “quite infrequent even in adult language” (Ehrich 2005: 172), ENL already shows a tendency to higher deontic usage for all modals and semimodals of obligation and necessity but must in conversation (Biber et al. 1999: 494), and deontic meaning is easier to grasp than epistemic meaning, as the latter presupposes the speaker’s ability to personally judge the truth of a proposition (Quirk 1985: 219). The learner will try to regularize the modal system by avoiding the plurifunctionality of modals. A predominance of deontic modality over epistemic modality can therefore be expected to be more pronounced in ESL and EFL than ENL. 
Regularization will mean that prototypical meanings are expressed but also that the number of modals will be restricted; already ‘dominant’ modals will be ‘safe’ to be used, marginal members will be further marginalized, particularly, if their meaning is expressed by another more frequently used one. Transfer may reinforce these patterns. We can see how redundancies will be avoided and how the Teddy bear principle, the Shortest path principle (if there is a choice between two variants) and the Transfer to somewhere principle (deontic meaning is already dominant in ENL, must, should etc. can be used with a deontic meaning) can be applied. Studies in EFL show that these theories can indeed be applied that way to EFL.11 Georgieva (1993: 155ff.) found that in the acquisition of the English modal system Bulgarian learners first show an overusage of a restricted number of modals, one per semantic field, with only prototypical meaning being expressed and an avoidance of epistemic modality. The restricted number of modals in L2 may be 11. On the other hand, the possible occurrence of double modals, such as might should, could be explained as a double marking to maximize clarity of meaning.

Modal auxiliaries in second language varieties of English 

reinforced by L1, as Bulgarian only has two modal verbs. On the other hand, Bulgarian learners of English show an overusage of impersonal constructions, even at a very late stage of SLA. Here the Shortest path principle applies. Bulgarian modal verbs have a personal and impersonal form, a distinction that is not made in the English modal system, but as English can distinguish between personal and impersonal forms in general the distinction in the Bulgarian system is seen as viable for the target language. Aijmer (2002) looked at Swedish, German and French learners of English, advanced students, and also found an overusage of modals for all three groups (see Aijmer 2002: 55); in particular she detects a higher percentage of deontic must in the student essays of EFL speakers than in student essays of native speakers (see Aijmer 2002: 64). Interestingly, while all learners overused the category of possibility, learners with different language backgrounds selected different modals within the category to use more than others (see Aijmer 2002: 62). The degree of certainty with which arguments are expressed depends on cultural norms (see Aijmer 2002: 63); it is likely that the modal auxiliary will be selected which seems the variant closest to what would be used in the native language. This will result in a greater variability in EFL which could be explained by the Shortest path principle, and possibly a misperception of the more subtle semantic distinctions between modal auxiliaries.12 If we can apply SLA theory to the usage of modal auxiliaries in ESL as it could be demonstrated for EFL, expectations can thus be formulated as follows: The modal system in ESL will be more restricted and more regularized, in general redundancies will be avoided, the more frequent and less marked variant will be given precedence, tendencies inherent in ENL will not be overlooked but rather reinforced. This means that: a. 
the usage of modal auxiliaries will be restricted in terms of both frequency and semantic diversity
b. the deontic meaning will predominate over the epistemic meaning and often this predominance will be more pronounced than in ENL
c. members of the modal system that are already marginalized in ENL will be further marginalized or even completely eradicated, particularly if their function widely overlaps with another modal, e.g. ought to, to be supposed to with should or need with need to13
d. there will be no problems with the form of modal auxiliaries, as the form is not inflected in English

12. Also see Aijmer (2002: 65, 73) for a discussion of reasons other than general learner strategies for these results.

13. As to the situation in ENL see Leech et al. (2009: 73).




Transfer from the mother tongue will cause greater variability: e. in different varieties different modals will be excluded, and different modals will experience overusage, as this will be reinforced by comparison with the native language In addition, the Principle of maximum salience may give rise to the creation of redundancies (see Williams 1987: 188f). There may be the case of the usage of double modals to maximize clarity of meaning. In the following we will discuss whether previous findings in New English studies can be reinterpreted in the light of the above-mentioned SLA strategies, thereby showing whether the above-mentioned hypotheses can be supported or not. In comparison to ENL a less frequent usage of modals and semi-modals has been detected in English in Cameroon (Nkemleke 2007: 87), Ghana (Sey 1973), Nigeria (Kujore 1985) and other African countries (Schmied 1991). Nkemleke (2007) finds a predominance of the modals will and can in Cameroon English, while need and to be supposed to are next to non-existent, and also a predominance of deontic must and should (Nkemleke 2007: 87, 91, 104f). For Malaysian English a simplified modal system is also attested, in which only must and should are used to express obligation and necessity. In the English of Mexican immigrants ought to is non-existent (Wald 1993: 76). Biewer (2009: 49) found that ought to and need with bare infinitive, which are marginalized in inner circle varieties (Mair 2006: 111), are almost non-existent in South Pacific L2 varieties. Regularization of the modal system in terms of restricting the number of modals in New English studies is often explained by transfer. In Malay, for instance, only perlu denotes obligation and necessity (Baskaran 2008: 614). Wald (1993) cites the non-existence of ought to as an example of the shortest path principle; as ought to has the same function as should, the ESL learner selects one of the two variants (1993: 76). 
Wald also argues that in the English of Mexican immigrants have to is particularly strong because the Shortest path principle applies again: the Mexican immigrants have tener que in their Spanish L1, which is syntactically and semantically close to have to; in the deontic meaning, where they can choose between must and have to, they therefore select have to (Wald 1993: 78). This would also explain a certain variability in ESL. Greater variability in L2 varieties than in inner circle varieties has been found by Biewer (2009: 48, 51) and Nelson (2003: 27ff). In various ESL studies a more frequent use of deontic modality than epistemic modality was detected, at a ratio that gave the deontic meaning a predominance not found in British English (BE). This was shown for should in Cameroon English (Nkemleke 2005: 51, 56), should and must in SamE and FijE but not CookE


(Biewer 2009: 49f),14 as well as for should and must in East African varieties of English and Indian English, but not for Hong Kong English in the case of should (Nelson 2003: 30, 31). In the case of Cameroon, the indigenous languages do not have a modal system to express epistemic meaning and directives are predominant (Nkemleke 2007: 94); it is likely that transfer in various cases reinforces the predominance of deontic modality inherent in ENL. Problems in using the form of modal auxiliaries are not reported. Another distinction that Alimi (2007) finds in the English of first year and third year students in Botswana is the use of can be able, which is also known to be a feature of Nigerian English and Black South African English and apparently also of other varieties of English in Africa (Alo & Mesthrie 2008: 326, Mesthrie 2008b: 490). Alimi explains it by a direct structural transfer from Setswana ka kgona, whereas Mesthrie believes it is coined by analogy to might be able, should be able, will be able etc. (Mesthrie 2008b: 490), which are perfectly acceptable in Standard ENL. The Transfer to somewhere principle could apply here with a misconception of what are possible variants in the target language. One could also bring in the Principle of maximising salience as applied by Williams (1987). The speaker, from that point of view, uses can be able as a double marking that is to be understood to mean the same as to be able to. The particular usage of this feature in African L2 varieties points to a close or additional link to the substrate as mentioned for instance in the Botswana case. This shows that a number of facts detected in New English studies can be explained with the help of SLA theory. But there are also features that cannot be explained. Katikar (1984) detects a predominance of past tense forms for present tense forms in the usage of modal auxiliaries in Indian English (e.g. 
would for will, Nkemleke 2005: 47), a characteristic that is also mentioned by Bhatt (2008: 559); but there is no general predominance of past tense forms over present tense forms in the modal system of L2 varieties; it depends on the politeness strategies in the local community. In Nigerian English, for instance, the present form is preferred to express politeness (Alo and Mesthrie 2008: 326). Nkemleke shows that must as a performative, which in BE is largely restricted to domains such as court procedures (Coates 1983: 38), is widely used in Cameroon English in religious texts, private letters and student essays. This suggests a difference in the perception and expression of strong obligation in the two societies. Filipino students may use would too often as “they associate it with low or mid certainty [...] or with politeness.” (Bautista 2004: 123). These examples show that in the acquisition process the learner also has to consider the cultural needs of his community. “[L]inguistic presupposition and propriety [are] culturally construed notions” (Basham & 14. But also see her comparison to New Zealand English (NZE) and American English (AmE).




Kwacka 1990: 413) and the usage of modals reflects the social relationship between speaker and hearer (Basham & Kwacka 1990: 418). Differences in the usage of modal auxiliaries in different ESL communities may result from different cultural perceptions of modality and different cultural rules in dealing with each other. Different politeness strategies, therefore, have to be considered as well. Moreover, inherent trends in ENL, such as “democratisation of discourse” (Leech 2003: 237, Smith 2003: 253, ref. to Fairclough 1992: 203f, see Myhill 1995: 157) and colloquialization of the written norm (Mair & Hundt 1997, Hundt 1998: 79), which may also influence changes in the modal system in ESL, are not really taken into account by SLA theory. Different norm orientations will also cause different trends in different communities, a factor that has also been neglected so far. Against this background, I will now turn to the case study on modals to see whether the results found for ESL in previous studies hold for ESL in general. This is the first study that directly compares L2 varieties from Asia, Africa and the South Pacific in terms of modal auxiliaries. The same analytical approach will be used for six different ESL and three different ENL varieties. 3. Modal expressions of obligation and necessity in Asian, African and South Pacific varieties of English 3.1

The corpus-linguistic approach

Differences in the usage of modal auxiliaries in different ESL varieties are more likely to be quantitative than qualitative. A corpus-linguistic approach was therefore chosen for this case study. A corpus was compiled of editorials, leading articles and letters to the editor from a range of newspapers and a range of authors, with at least 270,000 words per variety. The newspaper articles were downloaded from the internet. If available, material from the respective ICE component was added, therefore guaranteeing a certain variety of newspapers and authors.15 The newspaper articles were written in the period from 2004 to 2009. Articles that were signed with a European name or with the abbreviation of a news agency

15. I would like to thank Hans Martin Lehmann and Gerold Schneider for the automatic retrieval and conversion of data from the Ghanaian newspaper The Statesman. Special thanks go to Magnus Huber who very kindly shared with me data of the sections ‘press news editorials’ and ‘persuasive writing’ from ICE-Ghana, which is currently being compiled by him and his team in Giessen. Parts of the corpus used for this study belong to the SaFiRa-W corpus and the SPEAC-corpus, that have been used in previous studies (Hundt & Biewer 2007, Biewer 2008a, Biewer 2008b).


Table 1. Number of words and sources for the different varieties

variety   # words     Source
BE        463,961     Guardian, Times, ICE-GB
AmE       346,901     NY Times
NZE       462,311     NZ Herald, Dominion Post, ICE-NZ
FijE      276,945     Fiji Times, ICE-Fiji
SamE      303,261     Samoa Observer
CookE     320,482     CI News, CI Herald
SingE     274,607     Singapore Straits Times, ICE-Sing
PhilE     290,289     Manila Times, ICE-Phil
GhanE     275,135     Statesman, Ghanaian Times, ICE-Ghana
Total     3,013,892

were excluded. Data for Fiji, Samoa, the Cook Islands, the Philippines, Singapore and Ghana were complemented by data from the three inner circle varieties AmE, BE and NZE. Table 1 gives the number of words and the sources for each variety.16 3.2

Defining the variable

WordSmith Tools was used to retrieve the seven modals and semi-modals of obligation and necessity: should, must, need, ought to, need to, have (got) to and to be supposed to. This includes contractions such as mustn’t and I’ve got to and all possible word forms of semi-modals like have (got) to or to be supposed to, which can be marked for tense, aspect and person. Elliptical constructions of the type I don’t know but maybe I should were not counted. In analogy to Biewer (2009), the Mossé index was first computed for each variety and each modal expression, i.e. the frequency of the modals per variety per 10,000 words. In a second step all hits for should and must were checked to determine whether they had a deontic or an epistemic meaning or, in the case of should, whether they conveyed neither a deontic nor an epistemic meaning. The results are given in percentages. In a third step, grammatical and collocational features in the usage of should and must were investigated more closely, as these could point towards differences between ESL varieties. In the following I will look at the outcome of these three steps in turn and discuss whether my findings meet the expectations raised by the results of the previous research reviewed in Section 2.
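The first step, normalizing raw hits to a frequency per 10,000 words, can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the regular expressions stand in for the WordSmith Tools searches, the function names `count_hits` and `mosse_index` are hypothetical, and the sample sentence and figures are invented. Hits found this way would still need manual checking, for instance to discard elliptical uses.

```python
import re

# Hypothetical patterns for two of the seven expressions. The study itself
# used WordSmith Tools; these regular expressions are only an illustration.
PATTERNS = {
    "should": re.compile(r"\bshould(?:n['’]t)?\b", re.IGNORECASE),
    "must": re.compile(r"\bmust(?:n['’]t)?\b", re.IGNORECASE),
}


def count_hits(text: str) -> dict[str, int]:
    """Count raw hits per modal, including negative contractions."""
    return {modal: len(pattern.findall(text)) for modal, pattern in PATTERNS.items()}


def mosse_index(hits: int, corpus_words: int) -> float:
    """Relative frequency of an expression per 10,000 words."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return hits * 10_000 / corpus_words


# Toy sentence, invented for illustration only.
sample = "You should go. We mustn't wait, and they shouldn’t either. Must we?"
print(count_hits(sample))          # {'should': 2, 'must': 2}

# Invented figures: 150 hits in a 300,000-word subcorpus.
print(mosse_index(150, 300_000))   # 5.0
```

Normalizing in this way makes subcorpora of different sizes, such as those listed in Table 1, directly comparable.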

16. The abbreviations used in this article for ‘Philippine English’, ‘Ghanaian English’ and ‘Singapore English’ are PhilE, GhanE and SingE.




3.3

Overall tendencies

Table 2 shows the results of Biewer (2009: 48), the Mossé index for the seven different modal expressions in the three inner circle varieties and the three South Pacific varieties. Table 3 shows the outcome for SingE, PhilE and GhanE in comparison with AmE, BE and NZE. While the inner circle varieties and the South Pacific varieties have the same rank of individual modals in terms of frequency, in SingE and GhanE must is more frequent than have (got) to. For GhanE, with a Mossé index of 9.6 for must and 3.1 for have (got) to, the contrast is particularly striking.

Table 2. Frequency of modals in the subcorpora (relative frequency per 10,000 words): inner circle varieties versus South Pacific L2 varieties*

                 BE     AmE    NZE    FijE   SamE   CookE  Mean
should           12.8   13.4   14.6   26.9   17     11.9   16.1
have (got) to    7.7    6.7    8.7    12.3   7.8    9.2    8.7
must             5.4    5.5    6.1    6.3    6.8    6.1    6
need to          2.8    4.5    4.3    5.6    3.6    5.2    4.3
ought to         0.6    0.4    0.4    0      0.07   0.2    0.3
supposed to      0.4    0.8    0.4    0.9    1.1    0.5    0.7
need             0.2    0.1    0.3    0.04   0.03   0.06   0.1

*reproduced from Biewer (2009: 48)
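The Mean column of Table 2 appears to be the unweighted average of the six variety columns, although the text does not say so explicitly. The following Python sketch, with hypothetical names such as `unweighted_mean` and `mean_matches`, recomputes it from the table values under that assumption and compares the result with the published figures within rounding tolerance.

```python
# Values copied from Table 2 (per 10,000 words), in the column order
# BE, AmE, NZE, FijE, SamE, CookE.
TABLE_2 = {
    "should":        [12.8, 13.4, 14.6, 26.9, 17.0, 11.9],
    "have (got) to": [7.7, 6.7, 8.7, 12.3, 7.8, 9.2],
    "must":          [5.4, 5.5, 6.1, 6.3, 6.8, 6.1],
    "need to":       [2.8, 4.5, 4.3, 5.6, 3.6, 5.2],
    "ought to":      [0.6, 0.4, 0.4, 0.0, 0.07, 0.2],
    "supposed to":   [0.4, 0.8, 0.4, 0.9, 1.1, 0.5],
    "need":          [0.2, 0.1, 0.3, 0.04, 0.03, 0.06],
}

# Mean column as printed in Table 2.
PUBLISHED_MEAN = {
    "should": 16.1, "have (got) to": 8.7, "must": 6.0, "need to": 4.3,
    "ought to": 0.3, "supposed to": 0.7, "need": 0.1,
}


def unweighted_mean(values: list[float]) -> float:
    """Arithmetic mean, giving every variety the same weight."""
    return sum(values) / len(values)


def mean_matches(modal: str, tolerance: float = 0.05) -> bool:
    """True if the recomputed mean agrees with the printed one after rounding."""
    return abs(unweighted_mean(TABLE_2[modal]) - PUBLISHED_MEAN[modal]) <= tolerance


for modal, values in TABLE_2.items():
    print(f"{modal:14s} {unweighted_mean(values):6.2f}  matches: {mean_matches(modal)}")
```

An unweighted mean treats each variety as one data point regardless of subcorpus size, which fits a comparison between varieties rather than between pooled corpora.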

Table 3. Frequency of modals in the subcorpora (relative frequency per 10,000 words): inner circle varieties versus Asian and African L2 varieties

                 BE     AmE    NZE    SingE  PhilE  GhanE  Mean
should           12.8   13.4   14.6   12.7   15.6   14.8   13.98
have (got) to    7.7    6.7    8.7    8.6    6.99   3.1    6.97
must             5.4    5.5    6.1    9.1    6.4    9.6    7
need to          2.8    4.5    4.3    3.6    1.4    2.4    3.2
ought to         0.6    0.4    0.4    0.4    0.1    0.7    0.4
supposed to      0.4    0.8    0.4    0.4    1.2    0.4    0.6
need             0.2    0.1    0.3    0.5    0.6    0.3    0.3


Looking at the mean, we find considerable fluctuation in the frequencies in the Asian and African L2 varieties, which again indicates a higher variability in L2. For all six ESL varieties we can see the predominance of should, but only for SamE and FijE is the difference from the inner circle varieties particularly striking. The frequencies for need and ought to are also very low in the Asian and African L2 varieties, but in the South Pacific varieties they are considerably lower. PhilE is the only L2 variety outside the South Pacific (in this study) for which one can safely say that the semi-modal ought to is moribund, while to be supposed to in PhilE is in a healthier condition than in GhanE, SingE, CookE or FijE. The usage of need with bare infinitive is actually higher in PhilE than in the inner circle varieties and has to be read against a lower usage of need to in PhilE. One can see that different varieties choose to exclude different variants from the system. In comparison with results from previous studies, one can now conclude:
a. there is usually a greater variability among L2 varieties than among L1 varieties;
b. in L2 the modal system is usually but not always restricted in the range and frequency of individual modals, giving a predominance to one modal over the others while at the same time further marginalizing already marginalized modal expressions; in most of the attested L2 varieties should is the overall winner; in Wald (1993) it was have to; ought to is usually nearly non-existent, and so is either need or need to; one can see that ought to and should, and need and need to, convey the same meaning and therefore one may be replaced by the other;
c. in connection with (b) one can see an ESL overusage of should, must, have (got) to while ought to, need and to be supposed to are used the same or less often than in ENL; for the African and Asian varieties need to is included in the second group, for South Pacific varieties in the first group;
d. different marginalization or predominance of different modals in different ESL varieties may be explained by the Shortest path principle.

It is interesting to see that in terms of the Mossé index CookE is relatively close to NZE, whereas SingE and GhanE show results closer to BE and NZE than PhilE does. PhilE has results quite different from the inner circle varieties apart from the frequency of have (got) to, which nearly equals that of AmE. PhilE is historically based on AmE (Tayao 2008: 292) while Ghana, Singapore and New Zealand were British colonies (Huber 2008: 68, Wee 2008: 259, Gordon et al. 2004: 36ff) and the Cook Islands are politically associated with New Zealand (Biewer 2008). This shows that, to assess the development of these varieties, it is vital to know the external norms by which they were originally influenced.



Carolin Biewer

3.4 The central modal should

Table 4 shows the frequency of deontic should in relation to all tokens of should. All varieties have a much higher usage of deontic should than of epistemic or putative/hypothetical should; the percentages range from 75.4% in SingE to nearly 92% in AmE. For should there is a diachronic trend in inner circle varieties towards monosemy, with the deontic meaning gaining ground at the expense of all other meanings, and AmE is more advanced than BE (Leech et al. 2009: 86 and Collins 2005: 258), which can also be seen here. Putative should is also known to be more frequent in BE than in AmE (Coates 1983: 67). In this study one can also see NZE in an intermediate position. Biewer (2009) pointed out the closeness of SamE and FijE to the AmE results, whereas CookE was closer to NZE. It is interesting to see that SingE and GhanE are closer to BE, and that PhilE lies between NZE and AmE, which once again points towards the original external norms; SingE and GhanE also, like BE, show a high usage of putative should. On the other hand, the lower usage of deontic should in GhanE and SingE coincides with a particularly high usage of deontic must (see below). Although the South Pacific varieties are closer to monosemy than the inner circle varieties (their external norms), that cannot be said of GhanE and SingE. SLA strategies are reflected in a higher usage of deontic should than of epistemic should. But in general the frequency is usually close to or higher than in the ‘parental’ ENL. Again, norm orientation has to be considered as well. We have to bear in mind, however, that SamE and FijE are not connected to AmE but, as more traditional societies, simply show less of an ENL influence than other varieties. We will return to this argument below.

Table 4. Mossé index and the semantics of should

| should | total | Mossé | epistemic | deontic | putative/hypoth. |
|---|---|---|---|---|---|
| BE | 580 | 12.8 | 45 (7.8%) | 463 (79.8%) | 72 (12.4%) |
| AmE | 459 | 13.4 | 18 (3.9%) | 422 (91.9%) | 19 (4.1%) |
| NZE | 662 | 14.6 | 61 (9.2%) | 542 (81.9%) | 59 (8.9%) |
| FijE | 733 | 26.9 | 28 (3.8%) | 667 (91%) | 38 (5.2%) |
| SamE | 509 | 17 | 33 (6.5%) | 461 (90.6%) | 15 (2.9%) |
| CookE | 373 | 11.9 | 24 (6.4%) | 322 (86.3%) | 27 (7.2%) |
| SingE | 350 | 12.7 | 37 (10.6%) | 264 (75.4%) | 38 (10.8%) |
| PhilE | 452 | 15.6 | 37 (8.2%) | 375 (83%) | 28 (6.2%) |
| GhanE | 406 | 14.8 | 15 (3.7%) | 314 (77.3%) | 60 (14.8%) |

A closer look at the category ‘putative/hypothetical’ also revealed interesting differences. This category includes all cases of should in which neither a deontic nor an epistemic meaning is intended but merely the hypothetical nature of a statement. In that function should can replace a mandative subjunctive or a conditional if-clause (Should the court permit it ...). It can also be used as a verb in an if-clause (If he should believe that ...), or as a verb in the clause that follows the if-clause. In addition, there is the case of should being used in a that-clause with a verb that conveys a certain emotion of the speaker towards the outcome of a certain event, expressing his or her surprise or the opposite, as in (1) (Leech et al. 2009: 86, Quirk et al. 1985: 234, Collins 2005: 258).

(1) ... indeed they are surprised that anybody should be so concerned [GhanE component]

Whereas BE, NZE, FijE, SamE, SingE and GhanE mostly use should in this category to replace the mandative subjunctive, in AmE, PhilE and CookE it more often replaces an if-conditional. Apart from one example in NZE, the combination if x should ..., then y ... is only found in the ESL varieties. FijE is the only variety with a relatively frequent usage of hypothetical should as the verb of the clause that follows the if-clause. GhanE is the only L2 variety that has a relatively frequent usage of should in that-clauses to convey an emotion of the speaker towards the outcome of an event. Here individual differences become apparent; substrate influence and local conventions within the society for expressing knowledge and beliefs and for uttering requests will play a role.

3.5 The central modal must

Table 5 shows the usage of must in the nine varieties. The frequency of must is much lower in BE and AmE than in all other varieties; in GhanE and SingE the number is particularly high. For Ghana, different politeness strategies offer an explanation, as directives are uttered more openly than in western societies (see Huber & Dako 2008: 370). In all varieties the usage of deontic must is much higher than the usage of epistemic must, but for BE, NZE and CookE the percentage lies around 76–78%, whereas for all other varieties it is between 88% and 91%. PhilE again seems to orientate towards AmE. The results in Table 5 thus confirm the different status of CookE. As the percentage of deontic usage is high for all ESL varieties but CookE, this again must be an SLA constraint rather than an imitation of AmE. This would also explain why CookE differs: it has more ENL speakers, a lot of influence from NZE and a lot of exposure to NZE. Of course, we also see reinforcement of tendencies already present in ENL.

Table 5. Mossé index and the semantics of must

| must | total | Mossé | epistemic | deontic | ambiguous |
|---|---|---|---|---|---|
| BE | 251 | 5.4 | 56 | 191 (76.1%) | 4 |
| AmE | 191 | 5.5 | 15 | 174 (91.1%) | 2 |
| NZE | 282 | 6.1 | 55 | 221 (78.4%) | 6 |
| FijE | 175 | 6.3 | 14 | 158 (90.3%) | 3 |
| SamE | 207 | 6.8 | 23 | 183 (88.4%) | 1 |
| CookE | 197 | 6.1 | 41 | 155 (78.7%) | 1 |
| SingE | 250 | 9.1 | 18 | 227 (90.8%) | 5 |
| PhilE | 185 | 6.4 | 18 | 163 (88.1%) | 4 |
| GhanE | 264 | 9.6 | 18 | 239 (90.5%) | 7 |

The usage of the passive with deontic must was a predominant strategy to soften the strong tone of deontic must in FijE and SamE (Biewer 2009: 50). Table 6 shows that this also applies to PhilE and GhanE. SingE and CookE, however, again pattern more closely with the ENL varieties. We can see that certain clusters of varieties repeat themselves. There is a difference between communities with a strong western influence and communities with a more traditional life style.17 To gain a better insight into mechanisms of usage that may be different, five of the varieties were chosen for a closer investigation of collocations with must and pseudo-exhortations. The five chosen varieties were BE, FijE, SingE, PhilE and GhanE. It was interesting to see that SingE had a certain preference for combining must with never or can, which can be two strategies to put emphasis on the strong obligation expressed.

(2) He can – and must – mitigate this roguish image ... [SingE component]

(3) Singapore must never go this way. [SingE component]

On the other hand, among these varieties, SingE favoured the impersonal collocation there must be as a way of downgrading the strength of obligation.

(4) If change there must be, it is more in style. [SingE component]
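The kind of collocation check described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study, which is not specified in this section: the window size of two tokens, the simple regex tokenizer and the function names are assumptions made for illustration, and the input is limited to the three quoted examples (2) to (4) rather than a corpus.

```python
import re
from collections import Counter

# Examples (2)-(4) as quoted in the text; a real study would read corpus files.
SENTENCES = [
    "He can – and must – mitigate this roguish image",
    "Singapore must never go this way.",
    "If change there must be, it is more in style.",
]

def tokenize(sentence):
    """Lowercase a sentence and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())

def collocates_of_must(sentences, window=2):
    """Count words within `window` tokens to either side of 'must'.

    The window size is a hypothetical choice, not taken from the source.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, token in enumerate(tokens):
            if token != "must":
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

counts = collocates_of_must(SENTENCES)
for word in ("never", "can", "there"):
    print(word, counts[word])
```

On a full corpus the same counts would be normalized and compared across varieties before any preference is claimed; three sentences only show the mechanics.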

Pseudo-exhortations are exhortations directed at a group of people in which the speaker includes himself or herself (e.g. we must act now), for instance in the form of the impersonal we of a lecture style (e.g. we must remember that ...) (Coates 1983: 35).

17. For SamE and FijE as less westernized and CookE as more westernized see Biewer (2009: 51f). Equally one can argue that SingE is more westernized and PhilE and GhanE more traditional. Singaporeans have a positive attitude towards English as a window to the west for higher education (Wee 2008: 262ff). In the Philippines, since Aquino’s presidency, “there has been a cultural revolution to promote a new mass culture based on local rather than Western traditions” (Thompson 2003: 211). In Ghana official policies now put more emphasis on the use and cultivation of the indigenous languages and the government sees it as its duty to “foster ... pride in Ghanaian culture” (Huber 2008: 72).


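Table 6. Deontic must in passive versus active constructions

| deontic must | all | active | passive | % passive |
|---|---|---|---|---|
| BE | 191 | 146 | 45 | 23.6 |
| AmE | 174 | 144 | 30 | 17.2 |
| NZE | 221 | 171 | 50 | 22.6 |
| FijE | 158 | 109 | 49 | 31.0 |
| SamE | 183 | 130 | 53 | 29.0 |
| CookE | 155 | 117 | 38 | 24.5 |
| SingE | 227 | 187 | 40 | 17.6 |
| PhilE | 163 | 109 | 54 | 33.1 |
| GhanE | 239 | 167 | 72 | 30.1 |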
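The ‘% passive’ column of Table 6 is simply the passive count divided by all deontic must tokens. The short Python sketch below recomputes it from the active and passive counts and ranks the varieties. One assumption is made: the variety labels are assigned in the same order as in Table 4 (BE, AmE, NZE, FijE, SamE, CookE, SingE, PhilE, GhanE), which fits the statements in the running text; the function and variable names are illustrative only.

```python
# Deontic must counts from Table 6: variety -> (active, passive).
# Variety order is assumed to follow Table 4.
TABLE_6 = {
    "BE": (146, 45), "AmE": (144, 30), "NZE": (171, 50),
    "FijE": (109, 49), "SamE": (130, 53), "CookE": (117, 38),
    "SingE": (187, 40), "PhilE": (109, 54), "GhanE": (167, 72),
}

def passive_share(active, passive):
    """Return passive tokens as a percentage of all deontic must tokens."""
    return 100.0 * passive / (active + passive)

shares = {variety: passive_share(active, passive)
          for variety, (active, passive) in TABLE_6.items()}

# List the varieties from the highest to the lowest share of passives.
for variety, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{variety:6s} {share:5.1f}%")
```

Sorting by share makes the clustering mentioned in the text visible: PhilE, FijE, GhanE and SamE come out at the top, while SingE and AmE have the lowest proportions of passives.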

Whereas pseudo-exhortations in BE are typical of lectures and sermons (Coates 1983: 35), they seem to be frequent in Cameroon English with no restriction to a certain text type (Nkemleke 2005: 52f). Interestingly, out of the five chosen varieties it was also the African variety that had the highest number of pseudo-exhortations in the newspaper corpus. This might be a feature special to African varieties that could be related to local discourse strategies (rhetorical style). Once again, pseudo-exhortations are actually an example of using must to denote weak obligation (Coates 1983: 35). This shows that each variety will have individual preferences in how to upgrade or downplay strong obligation.18

18. In terms of must (both in its epistemic and deontic meaning) it was also checked whether instances of double modals occur. One case of must have to was found for PhilE, which could be interpreted as a case of double marking to maximize clarity. However, I was also able to find one case in BE. In general, double modals are more likely to be expected in non-standard English; the English of newspaper articles is closer to Standard or acrolectal English.

3.6 Discussion

The case study on modal auxiliaries across L2 varieties from different areas of the world, in comparison with previous results on other ESL varieties, has shown that:

a. ESL varieties, as expected, show similarities that can be traced back to SLA constraints: greater variability among ESL than ENL, preference for one modal over others (but not necessarily the same one in all L2 varieties), restriction of the system in types and meaning, overusage and underusage. The Shortest path principle explains the predominance of should in one variety and have to in another; avoidance of redundancies accounts for the near non-existence of ought to in various L2 varieties. Markedness theory and the Transfer to somewhere principle help to understand the apparent overusage of deontic meaning in ESL and possibly the reinforcement of trends already inherent in ENL. As these features can be explained by the learner situation, results will be similar (but not necessarily identical) for both ESL and EFL;

b. there are individual differences that can be traced back to differences in the native language, which will trigger a different outcome according to the Transfer to somewhere principle and the Shortest path principle, but different politeness strategies in the local community also have to be considered, e.g. different strategies to soften must. This is a transfer that is not likely to last in EFL, as it has to do with using English to express needs within the local community and to express social identity. There will be a difference between more autocratic/conservative and more democratic societies in the usage of modals of obligation and necessity. The deviation of ESL from the target language to express differences in indigenous culture and social identity is an issue still largely neglected in SLA theory;

c. ESL varieties will be more heterogeneous than EFL as they are norm-developing, and different ESL varieties will be at different stages of nativization. We can understand this if we picture Kachru’s model as a continuum and position the different ESL varieties in different relations to a prototypical ESL. It is likely that some L2 varieties are oriented more closely to ENL, whereas for others the local native languages are more relevant reference points, which also influences the type of English one is exposed to (proficiency of the teachers, number of native speakers in the country, access to media like the Internet), the attitudes towards English and the goals in learning English (achievement, proficiency). The learner perspective, too, acknowledges fossilization at different stages of SLA, but evaluates it negatively and does not consider local needs of communication.
These are aspects SLA theory has to focus on in order to explain the varying impact of constraints of SLA and to make SLA theory fully applicable to ESL.

4. Conclusion

EFL and ESL have a common starting point in the acquisition process; thus, cognitive processes and learning strategies will be similar at the beginning of the learning process. This fact helps to explain many of the features common to L2 varieties in general. Tendencies will be reinforced if other factors push in the same direction. But in ESL this is only one part of the story, and the local needs and differences in norm orientation have to be considered as well. All ESL varieties may develop in the same direction due to a common learning process, but they will have different starting points and endpoints depending on norm orientation and different social value systems. It is important to adopt the bird’s-eye view by considering general SLA patterns, but not at the expense of the worm’s-eye view, which takes into account societal distinctions in different communities. As in New English studies the bird’s-eye view tends to be neglected, we can also say: SLA alone is not the answer, but without SLA we will never have an answer. Mesthrie (2008a) described the study of grammatical features in New English studies as the Cinderella within other linguistic disciplines, one of those being SLA studies. He concluded that “[i]t is time Cinderella found her slipper” (Mesthrie 2008a: 634). It is to be hoped that this article has helped her with her search.

References

Aijmer, K. 2002. Modality in advanced Swedish learners’ written interlanguage. In Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching [Language Learning & Language Teaching 6], S. Granger, J. Hung & S. Petch-Tyson (eds), 55–76. Amsterdam: John Benjamins.
Alimi, M.M. 2007. English articles and modals in the writing of some Batswana students. Language, Culture and Curriculum 20(3): 209–14.
Alo, M.A. & Mesthrie, R. 2008. Nigerian English: morphology and syntax. In Varieties of English, Vol. 4: Africa, South and Southeast Asia, R. Mesthrie (ed.), 323–339. Berlin: Mouton de Gruyter.
Andersen, R. 1983. Transfer to somewhere. In Language Transfer in Language Learning: Issues in Second Language Research, S. Gass & L. Selinker (eds), 177–201. Rowley MA: Newbury House.
Basham, C. & Kwachka, P. 1989. Variation in modal use by Alaskan Eskimo student writers. In Variation in Second Language Acquisition, Vol I: Discourse and Pragmatics, S. Gass, C. Madden, D. Preston & L. Selinker (eds), 129–143. Clevedon: Multilingual Matters.
Baskaran, L. 2008. Malaysian English: morphology and syntax. In Varieties of English, Vol. 4: Africa, South and Southeast Asia, R. Mesthrie (ed.), 610–623. Berlin: Mouton de Gruyter.
Bautista, M.L. 2004. The verb in Philippine English: A preliminary analysis of modal would. World Englishes 23(1): 113–128.
Bhatt, R.M. 2008. Indian English: Syntax. In Varieties of English, Vol. 4: Africa, South and Southeast Asia, R. Mesthrie (ed.), 546–562. Berlin: Mouton de Gruyter.
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. The Longman Grammar of Spoken and Written English. Harlow: Pearson Education.
Biewer, C. 2008a. South Pacific Englishes: Unity and diversity in the usage of the present perfect. In Dynamics of Linguistic Variation: Corpus Evidence on English Past and Present [Studies in Language Variation 2], T. Nevalainen, I. Taavitsainen, P. Pahta & M. Korhonen (eds), 203–219. Amsterdam: John Benjamins.



Biewer, C. 2008b. Concord patterns in South Pacific Englishes – the influence of New Zealand English and the local substrate. In Anglistentag 2007 Münster, Proceedings, K. Stierstorfer (ed.), 331–343. Trier: Wissenschaftlicher Verlag.
Biewer, C. 2009. Modals and semi-modals of obligation and necessity in South Pacific Englishes. Anglistik 20(2): 41–55.
Bolton, K. 2006. World Englishes today. In The Handbook of World Englishes, B.B. Kachru, Y. Kachru & C. Nelson (eds), 240–269. Oxford: Blackwell.
Choi, S. 2006. Acquisition of modality. In The Expression of Modality, W. Frawley (ed.), 141–171. Berlin: Mouton de Gruyter.
Coates, J. 1983. The Semantics of Modal Auxiliaries. London: Croom Helm.
Collins, P. 2005. The modals and quasi-modals of obligation and necessity in Australian English and other Englishes. English World-Wide 26(3): 249–273.
De Haan, F. 2006. Typological approaches to modality. In The Expression of Modality, W. Frawley (ed.), 27–69. Berlin: Mouton de Gruyter.
D’Souza, J. 1998. Review of Arjuna Parakrama’s De-Hegemonizing Language Standards: Learning from (Post) Colonial Englishes about ‘English’. Asian Englishes 1(2): 86–94.
Ehrich, V. 2005. Linguistic constraints on the acquisition of epistemic modal verbs. In Linguistic Evidence – Empirical, Theoretical & Computational Perspectives, S. Kasper & M. Reis (eds), 165–186. Berlin: Mouton de Gruyter.
Fairclough, N. 1992. Discourse and Social Change. Cambridge: Polity Press.
Firth, A. & Wagner, J. 1997. On discourse, communication, and (some) fundamental concepts in SLA research. Modern Language Journal 81: 285–300.
Georgieva, M. 1993. A cognitive approach to the acquisition of English modals by Bulgarian learners. In Current Issues in European Second Language Acquisition Research, B. Kettemann & W. Wieden (eds), 151–163. Tübingen: Gunter Narr.
Görlach, M. 2002. English in Singapore, Malaysia, Hong Kong, Indonesia, the Philippines...a second or a foreign language? In Still More Englishes [Varieties of English around the World G28], M. Görlach (ed.), 99–117. Amsterdam: John Benjamins.
Gordon, E., Campbell, L., Hay, J., Maclagan, M. & Sudbury, A. 2004. New Zealand English: Its Origin and Evolution. Cambridge: CUP.
Granger, S. 1996. Learner English around the world. In Comparing English Worldwide: The International Corpus of English, S. Greenbaum (ed.), 13–24. Oxford: Clarendon Press.
Hasselgren, A. 1994. Lexical teddy bears and advanced learners: A study into the ways Norwegian students cope with English vocabulary. International Journal of Applied Linguistics 4(2): 237–258.
Huber, M. 2008. Ghanaian English: phonology. In Varieties of English, Vol. 4: Africa, South and Southeast Asia, R. Mesthrie (ed.), 67–92. Berlin: Mouton de Gruyter.
Huber, M. & Dako, K. 2008. Ghanaian English: morphology and syntax. In Varieties of English, Vol. 4: Africa, South and Southeast Asia, R. Mesthrie (ed.), 368–380. Berlin: Mouton de Gruyter.
Hundt, M. 1998. New Zealand English Grammar – Fact or Fiction? [Varieties of English around the World G23]. Amsterdam: John Benjamins.
Hundt, M. & Biewer, C. 2007. The dynamics of inner and outer circle varieties in the South Pacific and East Asia. In Corpus Linguistics and the Web, M. Hundt, N. Nesselhauf & C. Biewer (eds), 249–269. Amsterdam: Rodopi.

Hyltenstam, K. 1984. The use of typological markedness conditions as predictors in second language acquisition: The case of pronominal copies in relative clauses. In Second Languages: A Cross-linguistic Perspective, R. Andersen (ed.), 39–58. Rowley MA: Newbury House.
Joseph, J.E. 2004. Language and Identity: National, Ethnic, Religious. London: Palgrave.
Kachru, B.B. 1992 [1983]. Models for non-native Englishes. In The Other Tongue: English across Cultures, B.B. Kachru (ed.), 48–74. Urbana IL: University of Illinois Press.
Kachru, B.B. 2006 [1985]. Standards, codification and sociolinguistic realism: The English language in the outer circle. In World Englishes: Critical Concepts in Linguistics, Vol. 3, K. Bolton & B.B. Kachru (eds), 241–269. London: Routledge.
Katikar, P. 1984. The Meaning of the Modals in Indian English. PhD dissertation, Shivaji University, Kolhapur.
Kwachka, P. & Basham, C. 1990. Literary acts and cultural artefacts. Journal of Pragmatics 14: 413–429.
Kujore, O. 1985. English Usage: Some notable Nigerian variations. Ibadan: Evans Brothers.
Leech, G. 2003. Modality on the move: The English modal auxiliaries 1961–1992. In Modality in Contemporary English, R. Facchinetti, M. Krug & F. Palmer (eds), 223–240. Berlin: Mouton de Gruyter.
Leech, G., Hundt, M., Mair, C. & Smith, N. 2009. Change in Contemporary English: A grammatical study. Cambridge: CUP.
Leitner, G. 1992. English as a pluricentric language. In Pluricentric Languages: Differing norms in different nations, M. Clyne (ed.), 179–237. Berlin: Mouton de Gruyter.
Mair, C. & Hundt, M. 1997. The corpus-based approach to language change in progress. In Anglistentag 1996 Dresden, U. Böker & H. Sauer (eds), 71–82. Trier: Wissenschaftlicher Verlag.
Mair, C. 2006. Twentieth-Century English. History, Variation and Standardization. Cambridge: CUP.
McArthur, T. 1998. The English Languages. Cambridge: CUP.
Mesthrie, R. & Bhatt, R.M. 2008. World Englishes: The Study of New Linguistic Varieties. Cambridge: CUP.
Mesthrie, R. (ed.). 2008a. Varieties of English, Vol. 4: Africa, South and Southeast Asia. Berlin: Mouton de Gruyter.
Mesthrie, R. 2008b. Black South African English: Morphology and syntax. In Varieties of English, Vol. 4: Africa, South and Southeast Asia, R. Mesthrie (ed.), 489–500. Berlin: Mouton de Gruyter.
Moag, R. 1982. English as a foreign, second, native, and basal language: A new taxonomy of English-using societies. In New Englishes, J.B. Pride (ed.), 11–50. Rowley MA: Newbury House.
Mollin, S. 2006. Euro-English: Assessing Variety Status. Tübingen: Gunter Narr.
Myhill, J. 1995. Change and continuity in the functions of the American English modals. Linguistics 33: 157–211.
Nelson, G. 2003. Modals of obligation and necessity in varieties of English. In From Local to Global English: Proceedings of Style Council 2001/2, P.H. Peters (ed.), 25–32. Sydney: Dictionary Research Centre, Macquarie University.
Nkemleke, D. 2005. Must and should in Cameroon English. Nordic Journal of African Studies 14(1): 43–67.
Nkemleke, D. 2007. Frequency and use of modals in Cameroon English and application to language education. Indian Journal of Applied Linguistics 33(1): 87–105.
Ortega, L. 2009. Understanding Second Language Acquisition. London: Hodder Education.





Platt, J., Weber, H. & Ho Mian Lian 1984. The New Englishes. London: Routledge & Kegan Paul.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. 1972. A Grammar of Contemporary English. London: Longman.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. 1985. A Comprehensive Grammar of the English Language. London: Longman.
Rampton, B. 2005. Crossing: Language and Ethnicity among Adolescents, 2nd edn. Manchester: St Jerome.
Salsbury, T. & Bardovi-Harlig, K. 2000. Oppositional talk and the acquisition of modality in L2 English. In Social and Cognitive Factors in Second Language Acquisition, B. Swierzbin, F. Morris, M.E. Anderson, C.A. Klee & E. Tarone (eds), 57–76. Somerville MA: Cascadilla Press.
Sand, A. 2005. Angloversals? Shared Morphosyntactic Features in Contact Varieties of English. Habilitation, University of Freiburg.
Schmied, J. 1991. English in Africa: An Introduction. London: Longman.
Schneider, E. 2003. The dynamics of New Englishes: From identity construction to dialect birth. Language 79(2): 233–281.
Schneider, E. 2007. Postcolonial Englishes: Varieties around the World. Cambridge: CUP.
Sey, K. 1973. Ghanaian English: An Exploratory Survey. London: Macmillan.
Shatz, M. & Wilcox, S.A. 1991. Constraints on the acquisition of English modals. In Perspectives on Language and Thought, Interrelations in Development, S.A. Gelman & J.P. Byrnes (eds), 319–353. Cambridge: CUP.
Slobin, D. 1977. Language change in childhood and history. In Language Thought and Language Learning, J. Macnamara (ed.), 185–214. New York NY: Academic Press.
Smith, N. 2003. Changes in the modals and semi-modals of strong obligation and epistemic necessity in recent British English. In Modality in Contemporary English, R. Facchinetti, M. Krug & F. Palmer (eds), 241–266. Berlin: Mouton de Gruyter.
Sridhar, K.K. & Sridhar, S.N. 1992 [1982]. Bridging the paradigm gap: Second-language acquisition theory and indigenized varieties of English. In The Other Tongue: English across Cultures, B.B. Kachru (ed.), 91–107. Urbana IL: University of Illinois Press.
Stephany, U. 1995. Function and form of modality in first and second language acquisition. In From Pragmatics to Syntax: Modality in second language acquisition, A. Giacolone-Ramat & G. Crocco-Galèas (eds), 105–120. Tübingen: Gunter Narr.
Strang, B.M.H. 1970. A History of English. London: Methuen.
Tayao, M.L.G. 2008. Philippine English: Phonology. In Varieties of English, Vol. 4: Africa, South and Southeast Asia, R. Mesthrie (ed.), 292–306. Berlin: Mouton de Gruyter.
Thompson, R.M. 2003. Filipino English and Taglish: Language Switching from Multiple Perspectives [Varieties of English around the World G31]. Amsterdam: John Benjamins.
Tschichold, C. 2002. Learner English. In Perspectives on English as a World Language, D.J. Allerton, P. Skandera & C. Tschichold (eds), 125–133. Basel: Schwabe & Co.
Trudgill, P. 2003. A Glossary of Sociolinguistics. Oxford: OUP.
Wald, B. 1993. On the evolution of would and other modals in the English spoken in East Los Angeles. In Modality in Language Acquisition, N. Dittmar & A. Reich (eds), 59–96. Berlin: Mouton de Gruyter.
Wald, B. 1996. Substratal effects on the evolution of modals in East LA English. In Sociolinguistic Variation: Data, Theory and Analysis. Selected Papers from NWAV 23 at Stanford, J. Arnold, R. Blake, B. Davidson, N. Mendoza-Denton, S. Schwenter & J. Solomon (eds), 515–530. Stanford CA: CSLI.

Wee, L. 2008. Singapore English: Phonology. In Varieties of English, Vol. 4: Africa, South and Southeast Asia, R. Mesthrie (ed.), 259–277. Berlin: Mouton de Gruyter.
Williams, J. 1987. Non-native varieties of English: A special case of language acquisition. English World-Wide 8(2): 161–199.
Winford, D. 2003. Introduction to Contact Linguistics. Oxford: Blackwell.

English in Cyprus
Second language variety or learner English?

Christiane M. Bongartz and Sarah Buschfeld
University of Cologne

The postcolonial linguistic situation of the Mediterranean island of Cyprus has been widely neglected in research investigating the spread of English around the globe. This article seeks to remedy the lack of systematic investigation and places Cyprus English within the framework of World Englishes research. However, the linguistic reality is complex and heterogeneous, and Cyprus English can neither be assigned clear English as a Second Language (ESL) nor English as a Foreign Language (EFL) status. We illustrate what we came to see as a hybrid and complex situation drawing on data from a preliminary analysis of linguistic features attested in the CEDAR (Cyprus English Data Analysis and Research) corpus, and we link this analysis with findings from a sociolinguistic background analysis, a survey of language attitudes and of the use speakers make of English. To approach the question whether or not Cyprus English should be considered a second language variety, or whether it best be viewed as a kind of learner English, we suggest a way to map feature occurrence, possible structural nativization, and the influence of sociolinguistic variables to a matrix, the Variety Spectrum. Assuming hybrid ESL-EFL status for Cyprus English, we finally show that Kachru’s Three Circles model (Kachru 1985, 1992) does not account for such complex linguistic situations. We thus suggest the use of more flexible models for placing Cyprus English on the map of World Englishes (see Bruthiaux 2003).

1. Introduction

Like numerous other countries, the Mediterranean island of Cyprus was under British rule for a considerable amount of time (1878–1960). However, the English spoken there has not yet been discussed systematically within the framework of World Englishes. This article places Cyprus English1 (henceforth CyE) on the map of World Englishes research and thus seeks to remedy the lack of systematic and comprehensive investigation. In particular, we discuss whether CyE should be considered a second language variety or whether it should simply be regarded as learner English.

The handful of available studies approach the linguistic situation in Cyprus from a prescriptive and somewhat speculative perspective (Papapavlou 2001). Those studies that have dealt with the topic systematically and empirically look into language use and attitudes and the general linguistic situation on the island, but not with a specific focus on the possible variety status of CyE (Goutsos 2001, McEntee-Atalianis 2004, Papapavlou 2001). Two articles, however, do place CyE within the broader framework of World Englishes (Tsiplakou 2009, Yazgin 2007). Yazgin (2007) claims that English in Cyprus has moved from the Outer to the Expanding Circle (Kachru 1985, 1992), such that it has to be considered a foreign language today, with the role of a lingua franca. Tsiplakou (2009), however, objects to the idea that Cyprus could have belonged to the Outer Circle of Kachru’s model (Kachru 1985, 1992) in the first place. In making this claim she also raises the question of the adequacy of the English as a second language (ESL)/English as a foreign language (EFL) distinction made in the Kachruvian model (Kachru 1985, 1992).

This article suggests an approach which combines relevant micro- as well as macro-sociolinguistic aspects. In doing so, it expands on the flexibility of current models to include hybrid cases such as the situation brought on in the course of the historical development of Cyprus. The argument put forward here begins with a critical evaluation of the dichotomy between second language varieties and learner Englishes (Section 2.1) and offers a methodology for assessing variety status (see Mollin 2006). We then look into the socio-political context of CyE (Sections 2.2 and 2.3) and introduce the CEDAR (Cyprus English Data Analysis and Research) corpus with its methodology used for feature identification (Section 2.4). We propose an alternative model to identify and place second language varieties and learner Englishes within the broader framework of World Englishes, using CyE as a case in point (Sections 2.5 and 2.6).2 To substantiate the data obtained from the corpus, we report on findings on language attitudes and use (Section 2.7). Finally (Section 3.2), we suggest that CyE can best be placed in the Dynamic Model proposed by Schneider (2003, 2007). With respect to the ESL-EFL status we find a hybrid situation tied closely to the personal history of individual learners.

1. For the time being, we have chosen the term “Cyprus English” as a neutral denomination. It will be used as long as the variety status question has not been answered conclusively.

2. A detailed description and implementation of this model is to be defined in the forthcoming dissertation of the second author that systematically and comprehensively investigates the variety status of CyE and attempts to place it within the broader framework of World Englishes.


2. Cyprus English – second language variety or learner English?

2.1 Some theoretical considerations

When investigating whether a type of English should be considered a second language variety or some form of learner English, one has to look at specific and well-defined differences between the two concepts. Second language varieties have generally been considered varieties of English spoken in what Kachru (1985, 1992) refers to as Outer Circle countries. In Outer Circle countries such as India, Kenya, and Singapore, English is the second language for most of the inhabitants and is spoken in addition to the respective native language(s). This situation arose for historical reasons and holds across the speech community (see Schneider 2003, 2007, Schreier 2009, and others). The term learner English is closely connected to the notion of interlanguage, as introduced by Selinker (1972). It describes the linguistic mental system which develops in second language learners during their process of language acquisition (see Corder 1981, Selinker 1972). Thus, while variety status is tied to language use in a speech community, learner language and the development of interlanguage grammar are notions that apply to the idiolect of individual speakers. Owing to language contact and transfer phenomena, interlanguage development in a speech community can take similar routes for individual speakers, as has been attested in so-called Expanding Circle countries such as Russia, Indonesia, and Japan (Mesthrie & Bhatt 2008). English there is used and taught as a foreign language through formal instruction and serves as a lingua franca, especially for international communication (Schneider 2003, 2007, Schreier 2009). In comparison to second language varieties, learner Englishes with foreign language status in Expanding Circle countries “are more diffuse” (Mesthrie & Bhatt 2008: 208) and have not undergone the process of structural “nativization”, i.e. the cultural and referential adaptation of the English language in ESL countries (Kachru 1992, Mesthrie & Bhatt 2008: 10; see also Schneider 2003, 2007). If we take into account the respective characteristics of second language varieties and learner language, how can we then systematically distinguish between the two? A systematic and comprehensive analysis of the socio-political situation, language attitudes and use, and a structural analysis of linguistic features are vital to effectively characterize a linguistic (postcolonial) situation (Schneider 2003). The analysis of linguistic features and the investigation of language attitudes and use must be combined to establish possible nativized linguistic forms and concepts of identity (Künstler, Mendis & Mukherjee 2009, Schneider 2003, 2007, Schriedesh 2007). We begin this integrative process with an account of the historical contexts for English language use in Cyprus.




2.2 English in Cyprus: Historical background

Cyprus is the third largest of the Mediterranean islands, situated in the Middle East, south of Turkey, and west of Syria and Lebanon. It covers 9,251 sq km, of which 3,355 sq km belong to the Turkish part and 5,896 sq km to the Greek part of the island (CIA World Fact Book). In the course of rising British imperialism, Cyprus fell under British protectorate in 1878. The island was especially of structural interest for the British, who were committed in the Middle East and India. On June 4th 1878, after 300 years of Turkish rule in Cyprus, the United Kingdom agreed on a contract with Turkey to protect the Turkish possessions in Asia. An additional treaty was signed on August 14th in the same year, which granted Great Britain the absolute legislative authority during the entire period of colonial rule. However, this authority and the right of occupation did not bring about territorial sovereignty initially. Only in 1914, after 36 years of occupation, did the British start to annex the island because the Turkish sultan and the Sheikh ul-Islam declared a holy war. Turkey took a stand against the Allies by joining the Central Powers in World War I. On this account, Britain declared the 1878 treaties invalid and announced the island’s annexation on November 5th, which was officially acknowledged by the Turks in the treaty of Lausanne in 1923. As a result, the sultan’s territorial jurisdiction expired and the British crown finally gained full territorial sovereignty of Cyprus. The island was officially declared a “Colony of Cyprus” in 1925, although the population had already been “British subjects” since the British order in council in November 1917 (Schwenger 1964). Although the British government offered financial and social gratuity for schools where, increasingly, the English language was taught and where students began learning the language from 1933 on, the use of English was never part of an official government policy (Tsiplakou 2009). 
Nevertheless, the colonial power was not always welcome and anti-British voices were raised repeatedly, which led to bloody upheavals in 1931. This, in turn, led to an increase in British authority and power politics. The authoritarian government machinery stayed largely in power until the Cypriot declaration of independence in 1960, when the Cypriots refused to collaborate with the colonial power on the basis of a so-called “self-government”. From about 1949 onwards, it was the people’s aim to reach full sovereignty (Schwenger 1964). In 1960, Cyprus became an independent and autonomous country (Mühleisen 1986). In Cyprus, the quest for national independence was a development of the second half of the 20th century. The trend to strive for political independence and national sovereignty has deep historical roots (Schwenger 1964), since the development of the island had long been characterized by changing foreign domination

(Tzermias 1991). Hence, the development of identity constructions appears to be unique when compared with other former colonies. There, differences between the group of colonizers and the indigenous population often diminished in the course of time (see Schneider 2003, 2007). For most Cypriots, however, a strong “us” group identity has been retained, which excludes the British presence as “other” (see Schneider 2003, 2007).

2.3 The status of English in postcolonial Cyprus

Today’s linguistic situation on the island is heterogeneous and sometimes elusive. When arriving in Cyprus, one immediately notices the bilingual status of the island with all shop and street signs displaying both languages (see also Papapavlou 2001). In addition, several other vestiges of the colonial period can still be found. Like many other ex-colonies, the island has several British chain stores, Cypriots drive on the left, and they use the British three-pin electric plug. As to population numbers, about 7.5% of the overall population in the Greek part (approx. 790,000³) are British expatriates (approx. 59,000; Leonidou 2007). Furthermore, many Greek Cypriots today send their children to British universities and have relatives in the United Kingdom, especially in London (Yazgin 2007); they moved there in the 1950s and mid 1970s mainly for political and economic reasons.⁴ Thus, stable language contact on the island and with relatives in London and elsewhere is ensured. However, the extent to which either Greek or English is used largely depends on the degree of bilingualism in the family. Legal and official documents, e.g. government department memoranda, are still written in English (Tsiplakou 2009, Yazgin 2007). Until not long ago, English was also the official language in court. It has also been used in private domains, hospitals and banking sectors (Papapavlou 2001).

In the late 1980s and the early 1990s, however, public discussions about language politics dominated the press.⁵ During the symposium on The Greek Language Today in Cyprus, which was held in 1992, speakers expressed negative views and the fear that the English language might take over, resulting in an identity crisis. Different lecturers argued for restrictions on the use of English calques and loans and spoke in favor of generally restricting the use of English in both private and public domains (McEntee-Atalianis 2004, Papapavlou 1997). The highly emotional debate about which language(s) should be declared the official language(s) of the newly opened University of Cyprus (McEntee-Atalianis 2004) displays similar views and fears. The discussion resulted in a decision against English as the general medium of instruction and in favor of the two native languages of the island, Greek and Turkish (Papapavlou 2001). However, such harsh rejection of the English language has not been supported by recent findings from comprehensive research on language attitudes and use (McEntee-Atalianis 2004, Papapavlou 1997). These studies reveal a strong desire to preserve the two national languages of the community (Standard Modern Greek and Cypriot Greek), but do not support the apprehension that English could take over and endanger national languages and concepts of identity. Most participants confirmed that English is seen as linguistic capital and as an important tool especially in the economic and professional domains, but that its use in the private domain is indeed restricted (McEntee-Atalianis 2004). Despite the decision against English as the official language of tertiary education, some public and private educational institutions (especially in higher education) have been using English as the medium of instruction for many years (Papapavlou 2001), and there are also bilingual pre- and primary schools on the island. English is also the language for some parts of the media (McEntee-Atalianis 2004) and of tourism, with about 50% of all tourists being native speakers of English (Yazgin 2007). As a means of intranational communication, English is also used with migrant groups that are not proficient in Greek (see Yazgin 2007).

3. At the end of 2007 the number of inhabitants amounted to approximately 789,300 (Statistical Service of the Republic of Cyprus 2008).
4. This information is derived from the interviews conducted in Cyprus and in the Cypriot Greek community in London.
5. See, for example, an overview of attitudes and public discourse from 1985 to 1992 (Karyolemou 1994; see McEntee-Atalianis 2004).
Since English is still used in a variety of contexts and is not restricted to “international communication amongst a few bilingual people competent in English [...]” (see Mesthrie 2004: 805), we argue that it does not have traditional EFL status. However, it cannot be assigned clear ESL status because it is not an official language on the island, Cypriots do not usually use English among themselves, and the extent of bilingualism varies. There is also no institutionalized standard variety of CyE used by subgroups of the population. We thus assume hybrid EFL-ESL status for CyE.

2.4 CEDAR – Cyprus English Data Analysis and Research

In order to provide a comprehensive analysis of CyE that integrates a structural linguistic analysis, we began assembling CEDAR, our corpus of spoken English in Cyprus, at the University of Cologne in 2007. The corpus consists of interviews conducted in the Greek part of the island. Once the data collection process is completed, CEDAR will contain approximately 350,000 words of informal spoken

English from private direct conversations.⁶ Following the Labovian design of sociolinguistic interviews (see Labov 1968, 1972, 1984, Tagliamonte 2006), the interview questions were arranged in different modules, which inquire about various topics (e.g. family, hobbies, future plans etc.). The modules have been devised to retrieve a wide range of grammatical structures. By asking participants about past experiences, future plans and current hobbies and interests, various tenses are elicited. Other modules aim at triggering conditionals. In the end the corpus will contain about 130 interviews (and the respective transcriptions) of 20 to 60 minutes in length with L1 speakers of Cypriot Greek, a variety of Standard Greek. It will also include about 50 interviews conducted in Harringay, the Cypriot Greek community in London, to investigate potential differences between the two speech communities. The crucial issue is whether the features found in CyE are simply transfer-induced learner features or whether they are shared across the whole speech community or subsegments thereof. For a first analysis of linguistic features of CyE, we have given priority to collecting spoken data, assuming “that oral performance is less constrained and less conservative than written styles, so this is where innovations [i.e. characteristic features] are most likely to surface” (Schneider 2004: 247). The data for our corpus are sampled along the lines of five sociolinguistic variables: age, sex, time spent abroad, education (private/bilingual vs. government schools), and occupation (collegiate vs. non-collegiate). The participants’ age has special importance because of the linguistically significant differences in the acquisition context. Cypriots of 60 years and older acquired English through formal classroom instruction on the one hand, but, more importantly, also in a natural immersive setting through direct contact with British military personnel, civil servants, merchants etc. Younger Cypriots, however, predominantly learn English through formal instruction. In our sample we identify three generational groups according to shared acquisition contexts:

1. Group 1: 10 to 25 (including high school and university students)
2. Group 2: 26 to 60
3. Group 3: 60 +

6. This number only includes the interviewees’ share in the conversations.

2.5 Potential candidates for structural nativization

For a preliminary analysis of potentially nativized features we have coded ten interviews for three lexicogrammatical and three morphosyntactic features using TAMS Analyzer, a research tool designed to carry out qualitative and quantitative analyses (http://tamsys.sourceforge.net/).




2.5.1 Lexicogrammatical features

The lexicogrammatical features we found in the corpus most likely occur as a result of L1 transfer. Greek Cypriots tend to prefer the time reference pattern before + X days/months/years over X days/months/years + ago. Consider the following examples from the corpus for illustration:⁷

(1) a. I: I went to Thailand before few years [...].
    b. I: That means before 60 and so years, Cyprus was not as it is now.

In the data at hand, the informants do not use the British English (BrE) X days/months/years + ago at all, but only the CyE structure before + X days/months/years (see Table 1). The data also revealed a preference for like with Ø object pronoun as in:

(2) a. I: [...] for this way I believe I like Ø. “This is why I believe I like (it).”
    b. IE: Do you [ / ]do you like to speak foreign languages?
       I: Yes, I like Ø.

Table 2 summarizes the results and shows that speakers of CyE use like with Ø object pronoun more frequently (56.25%) than the BrE transitive like, which requires the use of an overt object. We also found in the data that Greek Cypriots prefer the quantifier too over very in structures such as:

(3) a. I: [...] she likes too much to cook, [...].
    b. I: [...] she love me too much, more than I.

The results in Table 3 show that quantifying too is strongly preferred in CyE, i.e. in more than 70% of all cases where BrE requires very + much, Cypriots use too + much.

Table 1. Results – before + X days/months/years

| Total | X days/months/years + ago | before + X days/months/years | % before |
|---|---|---|---|
| 7 | 0 | 7 | 100% |

Table 2. Results – like + Ø pronoun

| Total | like + overt pronoun | like + Ø pronoun | % like + Ø pronoun |
|---|---|---|---|
| 32 | 14 | 18 | 56.25% |

7. I = Interviewee; IE = Interviewer

Table 3. Results – too + much

| Total | very | too | % too |
|---|---|---|---|
| 32 | 9 | 23 | 71.88% |
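The percentages in Tables 1 to 3 are simple shares of the CyE variant among all coded tokens of a feature. As a minimal sketch, they can be recomputed from the raw counts as shown below. The function name `variant_share` and the dictionary layout are illustrative assumptions and are not part of the TAMS Analyzer workflow described above.

```python
# Recompute the "% CyE variant" column of Tables 1-3 from the raw token counts.
# Names and data layout are illustrative assumptions, not the authors' tooling.

def variant_share(cye_tokens: int, bre_tokens: int) -> float:
    """Return the CyE variant's share of all coded tokens, in percent."""
    total = cye_tokens + bre_tokens
    if total == 0:
        raise ValueError("no tokens coded for this feature")
    return 100.0 * cye_tokens / total

# (CyE variant count, BrE variant count) as reported in Tables 1-3
counts = {
    "before + X": (7, 0),
    "like + Ø pronoun": (18, 14),
    "too (+ much)": (23, 9),
}

for feature, (cye, bre) in counts.items():
    print(f"{feature}: {variant_share(cye, bre):.2f}% of {cye + bre} tokens")
```

Printing the token total next to each share keeps the small sample sizes visible, which matters when a feature such as before + X rests on only seven tokens.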

2.5.2 Morphosyntactic features

Also as a potential result of L1 transfer from Cypriot Greek, we found an underrepresentation of perfective aspect in our data, i.e. Greek Cypriots use past tense or present tense morphology instead. This might also occur due to influences from American English or as a general result of simplification strategies. Consider the following examples:

(4) a. IE: Have you read all four volumes?
       I: Uh, I didn’t read the first one, [...].
    b. I: It was really a pity and we are suffering from that time and on.

Our tentative results reveal that in 75% of all instances where BrE normally requires the use of present perfect, the tested Cypriots used either past or present tense morphology with a slight preference for past tense structures, as illustrated in Table 4.

Table 4. Results – the use of perfective aspect

| Total | present perfect | simple past | simple present | past perfect | % simple past/simple present/past perfect |
|---|---|---|---|---|---|
| 24 | 6 | 11 | 7 | 0 | 75% |

The use of conditionals in hypothetical contexts is also underrepresented in CyE, as the following examples illustrate:

(5) a. IE: [...]. Uhm, what would you do if you won the lottery? You know, six million pounds, a lot of money.
       I: Uh, uhm, uh, the first thing that I will make with the first million, I will take it to the h & [//] uh to our house [...].
    b. IE: [...]. If you won the lottery and you won like six million euros, what would you do?
       I: So # many things. I buy a new car. I go around to the many trips.

The results for language choice to express hypothetical contexts given in Table 5 also show a strong preference for the CyE structures (72.55%) and a clear bias for will-future constructions over conditional forms.

Table 5. Results – hypothetical contexts

| Total | cond. perfect | cond. simple | will-future | simple present | % will-future/simple present |
|---|---|---|---|---|---|
| 51 | 0 | 14 | 23 | 14 | 72.55% |

However, this cannot be attributed to L1 transfer effects, but can best be explained in terms of simplification strategies. Greek Cypriots omit referential and expletive subjects in many different contexts, again influenced by their L1:

Referential Ø subjects:

(6) a. IE: Okay. Okay, uhm, what are your plans for your future? I mean # just [///] do you wanna have a family, [...]?
       I: Some time yes. Ø Don’t have a problem but # not right now.
       IE: Not now. Okay.
       I: First, # Ø gonna finish my studies, get a job uh and after.
    b. IE: Mhm. # Do [/] Perhaps you can tell me something about her [...].
       I: Ø is straight person+/.
    c. IE: How was your life before the invasion?
       I: Before Ø is better, much better. I like more.

Expletive Ø subjects:⁸

(7) a. I: Uh. ## I don’t know. Ø Is not easy if you win.
    b. IE: Mmh. Would you # recommend # buying # real estate in Larnaca at the moment?
       I: In Cyprus, yes.
       IE: In Cyprus in general?
       I: In Cyp yes, in Cyp & [//] in Larnaca Ø is very difficult to find something [...].
    c. I: And is important to [/] to keep this feeling.

The results for the use of referential and expletive Ø subjects in Table 6 show that, while referential pronouns are only dropped at a rate of 6%, expletive there and especially expletive it are omitted more frequently. This suggests that we are not confronted with a general pro drop feature in CyE, but that the omission of expletives is the true potential candidate for structural nativization.

Table 6. Results – Ø subject pronouns

| Type of pronoun | Total | Pronoun | Ø pronoun | % Ø pronoun |
|---|---|---|---|---|
| Referential | 2085 | 1960 | 125 | 6% |
| Expletive (total) | 100 | 48 | 52 | 52% |
| Expletive it | 89 | 39 | 50 | 56.18% |
| Expletive there | 11 | 9 | 2 | 18.18% |

The results presented here give a cross-section and mirror the average linguistic behavior of the participants involved. However, it should be noted that most of the figures are relatively small. Further research has to be carried out to validate our preliminary results. Assuming that especially the age factor is an important one, we now have to think about how to include and illustrate the impact of the interviewees’ age.

8. The categorization here includes all uses of it which do not refer to a contextually given entity, are quasi non-anaphoric, and function as a semantically empty dummy construction (e.g. extrapositional and impersonal it, it-cleft constructions, it denoting weather, time, place and condition) (see Stirling & Huddleston 2002: 1481–1483).

2.6 The Variety Spectrum

In search of a way to integrate feature occurrence, the impact of sociolinguistic variables, and the relationship between second language varieties and learner Englishes, we developed the concept of a Variety Spectrum in the form of a scatter plot. Given that there is no clear and static cut-off point between second language varieties and learner Englishes, the Variety Spectrum helps to characterize and account for hybrid cases like CyE. It allows us to depict variety-internal variation and to integrate second language varieties and learner Englishes. We now illustrate how this model works by stratifying the results of the Ø subject analysis. For illustrative purposes, this exemplification considers only two of the five chosen sociolinguistic variables, age and gender. In a first step, we coded and quantified the token frequency of both Ø expletive and Ø referential subjects for each informant. Table 7 summarizes the individual results for the use of Ø expletive subjects and the stratification criteria under investigation. We then plotted the respective mean value for each informant in a coordinate system with percentage of occurrence located on the y-axis and the age of the individual speaker marked on the x-axis. The male/female distinction is illustrated by triangles for the male participants and dots for the females (see Figure 1).


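Table 7. Results – Ø expletives

| Nr. | Age | Male (m) / female (f) | Ø expletive | Total Ø & overt | Mean % Ø expletives |
|---|---|---|---|---|---|
| 1 | 15 | m | 1 | 11 | 9.1 |
| 2 | 17 | f | 0 | 2 | 0 |
| 3 | 31 | f | 0 | 2 | 0 |
| 4 | 40 | m | 3 | 5 | 60 |
| 5 | 42 | f | 0 | 5 | 0 |
| 6 | 45 | m | 14 | 16 | 87.5 |
| 7 | 49 | f | 4 | 4 | 100 |
| 8 | 68 | f | 1 | 2 | 50 |
| 9 | 71 | m | 7 | 21 | 33.3 |
| 10 | 73 | m | 22 | 32 | 68.8 |

Figure 1. Distribution – Ø expletives (scatter plot; x-axis: age, 0 to 80; y-axis: zero subjects (%), 0 to 120; separate markers for male and female informants)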
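The coordinates plotted in Figure 1 follow directly from the counts in Table 7. The sketch below derives each informant's (age, percentage) point and, as an additional step that is an assumption here rather than part of the procedure described above, averages the individual percentages within the three age groups from Section 2.4. All function and variable names are hypothetical.

```python
# Per-informant Ø expletive rates from Table 7, plus age-group averages.
# Group boundaries follow Section 2.4; averaging by group and all names
# are illustrative assumptions.
from statistics import mean

# (informant no., age, sex, Ø expletives, total Ø + overt expletives)
TABLE_7 = [
    (1, 15, "m", 1, 11), (2, 17, "f", 0, 2), (3, 31, "f", 0, 2),
    (4, 40, "m", 3, 5), (5, 42, "f", 0, 5), (6, 45, "m", 14, 16),
    (7, 49, "f", 4, 4), (8, 68, "f", 1, 2), (9, 71, "m", 7, 21),
    (10, 73, "m", 22, 32),
]

def age_group(age: int) -> int:
    """Map an age to group 1 (10-25), 2 (26-60) or 3 (over 60)."""
    if age <= 25:
        return 1
    return 2 if age <= 60 else 3

def rate(zero: int, total: int) -> float:
    """Percentage of Ø tokens among all (Ø + overt) tokens."""
    return 100.0 * zero / total

def group_means(rows):
    """Mean of the individual percentages per age group (each speaker counts equally)."""
    by_group = {}
    for _, age, _, zero, total in rows:
        by_group.setdefault(age_group(age), []).append(rate(zero, total))
    return {group: mean(values) for group, values in sorted(by_group.items())}

# Scatter-plot coordinates: x = age, y = % Ø expletives, marker chosen by sex
points = [(age, rate(zero, total), sex) for _, age, sex, zero, total in TABLE_7]

print(group_means(TABLE_7))
```

Because several informants contribute only two to five tokens, a single token can shift an individual percentage by 20 points or more, so the group averages should be read as tentative.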

We then used the same procedure for a stratified analysis of Ø referential subjects (Table 8 and Figure 2): The results support the initial hypothesis that, owing to differences in the acquisition context, an age effect will become apparent. Age group 1 (10–25 yrs.: under formal grammar instruction; left circle) uses noticeably fewer Ø subjects than age group 3 (generation 60 + yrs.; right circle), which most likely is an effect of formal language instruction. The tentative results for age group 2 (26–60 yrs.; middle circle) turn out to be heterogeneous. However, further analytical steps could shed light on these differences. We expect that including more variables – e.g. time spent abroad, education (private/bilingual vs. governmental schools) and occupation (collegiate vs. non-collegiate) – could account for heterogeneity, especially for “outliers” like participant number 2 (see Figures 1 and 2).

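Table 8. Results – Ø referentials

| Nr. | Age | Male (m) / female (f) | Ø referential | Total Ø & overt | Mean % Ø referential |
|---|---|---|---|---|---|
| 1 | 15 | m | 3 | 324 | 0.9 |
| 2 | 17 | f | 5 | 109 | 4.6 |
| 3 | 31 | f | 2 | 66 | 3.0 |
| 4 | 40 | m | 24 | 183 | 13.1 |
| 5 | 42 | f | 2 | 253 | 0.8 |
| 6 | 45 | m | 33 | 185 | 17.8 |
| 7 | 49 | f | 4 | 256 | 1.6 |
| 8 | 68 | f | 13 | 123 | 10.6 |
| 9 | 71 | m | 19 | 280 | 6.8 |
| 10 | 73 | m | 20 | 306 | 6.5 |

Figure 2. Distribution – Ø referentials (scatter plot; x-axis: age, 0 to 80; y-axis: zero subjects (%), 0 to 20; separate markers for male and female informants)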
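To compare the two distributions in Tables 7 and 8 beyond visual inspection, the per-informant percentages can be summarized by their level and their spread. The following sketch does this with the mean and the population standard deviation. Treating the standard deviation as a rough indicator of homogeneity is an assumption made for illustration; no specific measure is prescribed in the text, and the name `summarize` is hypothetical.

```python
# Summarize level (mean) and spread (standard deviation) of per-informant
# Ø subject rates for Tables 7 and 8. Using the standard deviation as a
# homogeneity indicator is an illustrative assumption.
from statistics import mean, pstdev

# (Ø tokens, total Ø + overt tokens) per informant, in table order
EXPLETIVE = [(1, 11), (0, 2), (0, 2), (3, 5), (0, 5),
             (14, 16), (4, 4), (1, 2), (7, 21), (22, 32)]
REFERENTIAL = [(3, 324), (5, 109), (2, 66), (24, 183), (2, 253),
               (33, 185), (4, 256), (13, 123), (19, 280), (20, 306)]

def summarize(rows):
    """Return mean and spread of individual rates plus the pooled rate, all in percent."""
    rates = [100.0 * zero / total for zero, total in rows]
    pooled = 100.0 * sum(zero for zero, _ in rows) / sum(total for _, total in rows)
    return {"mean": mean(rates), "spread": pstdev(rates), "pooled": pooled}

for name, rows in (("Ø expletive", EXPLETIVE), ("Ø referential", REFERENTIAL)):
    s = summarize(rows)
    print(f"{name}: mean {s['mean']:.1f}%, spread {s['spread']:.1f}, pooled {s['pooled']:.1f}%")
```

The pooled rate reproduces the corpus-level figures in Table 6 (52% for expletives, about 6% for referential subjects), whereas the mean of the individual percentages for expletives is lower, at about 41%, because informants with few tokens count as much as those with many.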

The main idea of this model is that the higher the percentage of characteristic features used and the more homogeneous the picture is within one speech community, the more likely it is that the English under investigation is a variety with already nativized linguistic features. Summing up, we can point out the following options the model offers for variety-internal analyses:

1. It illustrates the token frequency of any linguistic feature and the interaction between frequency of occurrence and any predefined sociolinguistic variable.
2. It is flexible in that it allows for the inclusion of different numbers and types of sociolinguistic variables in different steps of analysis and with respect to different individual features.
3. It depicts and approximates the linguistic behavior of the speech community under investigation also in heterogeneous, complex situations.
4. It provides an overview of what features might be indicative of structural nativization, i.e. it visualizes whether the features under investigation have already undergone the process of successive spread and are used by the majority of speakers or whether they are still used by only subgroups of the whole speaker community or just a few individual speakers.

However, the model can also be used for inter-varietal analyses:

1. It allows for a comparison between different varieties with respect not only to linguistic features, but also to the impact of sociolinguistic variables.
2. Assuming that there is no clear cut-off between second language varieties and learner Englishes, this model is a first attempt to integrate both types as well as hybrid cases like CyE in a unified approach. Illustrating two or more Englishes in the same scatter plot, the Variety Spectrum depicts the relationship, e.g. characteristic overlaps, similarities and differences, between these Englishes without rigid categorization and overgeneralization.

2.7 Language attitudes and use

As pointed out above, the socio-political investigations and the linguistic findings have to be complemented by research on language attitudes and use. To this end, we used a two-part questionnaire which has been adapted from Künstler, Mendis & Mukherjee (2009). The first part of this questionnaire investigates the use of English, Cypriot Greek, and Standard Greek within the following domains: participants were asked to rate on a five-point Likert scale which of the languages they use in the family domain, when conversing with friends, with people they have never met before/foreigners, in email communication, education, everyday life (shops, airports, the marketplace etc.) and employment/business. In the second part of the questionnaire, participants were asked to read through 20 sentences and decide on a five-point scale whether they agree or disagree with different statements on the use of English. These sentences include statements on questions of language and identity, the role English plays in Cyprus and the status of both (Cypriot) Greek and English. The data will finally also be stratified with respect to the sociolinguistic variables introduced above to investigate whether the impact of such variables in this domain correlates with the results from the stratified feature quantification. A preliminary analysis of a first set of 100 questionnaires (out of about 400) confirms earlier empirical research findings on language use and attitudes in the Greek community of Cyprus. The “[r]esults suggest that all codes under investigation (i.e. SMG [Standard Modern Greek], GCD [Greek Cypriot Dialect]

and English) retain a high share value within the linguistic marketplace” (McEntee 1999: 414). However, the Greek language is generally preferred over English, especially in the private domain, and participants expressed a strong desire to preserve the two national languages of the community (Standard Modern Greek and Cypriot Greek). English is mostly considered linguistic capital and an important tool especially in the economic and professional domain (McEntee-Atalianis 2004).

3. Placing Cyprus English on the map of World Englishes research

3.1 The ESL-EFL distinction in early models

The ESL-EFL distinction goes back to the first formal attempts to classify and characterize different varieties of English in the 1980s (e.g. Kachru 1985, 1986, 1988, Görlach 1990, McArthur 1987). These models classify the different varieties either by taking into account the functional and political role they play in the respective country or by focusing on geographical aspects.⁹ The first and most widely adopted model is Kachru’s Three Circles model (1985, 1992). It suggests a classification into Inner Circle, Outer Circle, and Expanding Circle countries, which basically follows the ENL (English as a Native Language), ESL (English as a Second Language), and EFL (English as a Foreign Language) distinction also employed in other approaches (Schneider 2003, 2007). The main distinction is that in Inner Circle countries like the United Kingdom, the USA, Australia, Canada etc., English is (one of) the de facto if not de jure official language(s) of the country and the native language for the majority of the inhabitants. In Outer Circle countries like India, Kenya, Singapore etc., the English language was transplanted to the respective speech community owing to historical reasons, e.g. the spread of English in the process of (British) colonization. In such countries, it is often the de jure second language for most of the inhabitants and is spoken in addition to the respective native language(s). In Expanding Circle countries, English may serve as a lingua franca, especially in international companies, and is mainly used and taught as a foreign language through formal education (Schneider 2003, 2007, Schreier 2009). However, current research has pointed to a number of problems and inaccuracies of the Kachruvian approach, since it leaves aside identity constructions and language attitudes and use. In addition, it does not take into account possible transitions between the three circles. It is thus a rather static approach (Mesthrie & Bhatt 2008, Schreier 2009) that fails to characterize some varieties, e.g. South African English, where English has ENL status for some parts of the population, but most indigenous people use English as a second language (Bruthiaux 2003, Mesthrie & Bhatt 2008, Schneider 2007). A similar, even more complex problem arises when we look into the heterogeneous language situation in Cyprus (see 2.2 and 2.3). The investigation of the socio-political background of Cyprus suggests that it cannot clearly be assigned to either the Outer or the Expanding Circle because of the generational layers and the different uses for English in terms of ESL/EFL criteria. The Three Circles model does not illustrate variety-internal differences (Schneider 2003) either, i.e. it does not visualize linguistic variation like that found within the Cypriot speech community that occurs due to synchronic social and demographic factors.¹⁰

9. For particular differences between these models see, for example, Bauer (2002), Mesthrie & Bhatt (2008), or Schneider (2003).

3.2 Continuous models

An alternative approach to the static models of the 1980s has been developed by Schneider (2003, 2007). With his Dynamic Model he introduced a more flexible and comprehensive approach to World Englishes. The basic assumption of his model is that, despite variation in the development of different varieties, “there is an underlying uniform process which has driven the individual historical instantiations of PCEs [Postcolonial Englishes] growing in different localities” (Schneider 2007: 21). This process proceeds along five major phases: (1) foundation, (2) exonormative stabilization, (3) nativization, (4) endonormative stabilization, and (5) differentiation (Schneider 2003, 2007). This approach seems best suited for placing CyE onto the map of World Englishes. The Dynamic Model moves beyond an analysis of the functional and political role of English by taking into account the entirety of the historical and linguistic development, language attitudes and identity constructions. It also integrates feature analyses in the light of sociolinguistic findings (Schneider 2003, 2007). Postulating this integrative procedure, it accommodates heterogeneous language situations, allows for intergenerational differences, and considers the impact of other sociolinguistic variables. Allowing for stagnation and reversal in the development of postcolonial Englishes, the Dynamic Model leaves room for placing CyE within the framework of World Englishes, although it has not fully gone through the process of structural nativization and institutionalization.

10. It should be noted here that Kachru does not generally neglect the idea of variety internal differences and variation (see for example Kachru 1986, Kachru 1992).

English in Cyprus

4. Discussion

Our discussion of different approaches to World Englishes shows once more that early models of World Englishes (e.g. Kachru 1985) do not make complex linguistic situations visually accessible. Drawing on Schneider’s (2003, 2007) more flexible and integrative Dynamic Model, we interpret the socio-political investigations made above as first evidence that CyE has not reached the phase of feature nativization (phase 3), since this stage normally requires gradual assimilation of identity constructions. Cypriots, on the other hand, have generally tried to preserve their unique identity against foreign influence. However, it lies in the nature of such a model that phases are never static, but “boundaries and succession of stages may be realized fuzzily” (Schneider 2007: 57). Our analysis of linguistic features supports this observation in that it has revealed potential candidates for nativization, especially for the older generation. This leads us to the conclusion that CyE was on its way to structural nativization and reached an early phase 3 during the final years of the British occupation. The results for the younger generation learning English through formal instruction only, however, show a considerably less frequent use of characteristic features and a stronger exonormative orientation towards BrE. CyE thus appears to have had the status of a second language variety once, but turns out to be one of the few Englishes that are undergoing a process of reverting development (see also Yazgin 2007). Assuming that this never happens abruptly but is a developmental process, we can also account for the heterogeneity found for the middle-aged group (26–59 yrs.). Older speakers of this group may also have acquired the English language in a natural setting, e.g. through professional contacts with British military personnel, civil servants etc. and older Cypriots.
The younger the speaker is, however, the more likely it is that s/he mainly acquired English through formal instruction only. English language teachers in Cyprus are normally young or middle-aged Cypriots who have spent a considerable amount of time in an L1-English country (most often Great Britain) and pass their strong exonormative orientation on to their students. Hence, their speech performance approximates a typical learner English with a strong focus on (BrE) norms. However, this should not lead to the conclusion that CyE has clear EFL status today. We observe an increasing tendency for parents to send their children to English speaking countries, predominantly the UK, to study there and become fluent in English. To date analyses of language attitudes and use have revealed that English was and still is an important second language on the island. So, despite the decline in status and role of the English language, the situation is still not on a par with typical EFL countries, and CyE has to be considered a hybrid case.






5. Conclusion

This paper is a first attempt at placing CyE on the map of World Englishes and determining its variety status. We have focused on the question whether or not CyE should be considered a second language variety of English, or whether it is best characterized as learner English. First tentative results from an investigation of the socio-political background, language attitudes and use, and a stratified analysis of linguistic features show that we are faced with a hybrid case that has neither clear ESL nor EFL status. Our results suggest that CyE is one of the rare cases where development is undergoing reversal from an early phase of structural nativization to the phase of exonormative stabilization. However, to conclusively account for the status of CyE, more detailed research has to be carried out. We have to further investigate whether the hypothesized (and tentatively observed) fundamental differences in the linguistic behavior of the older and the younger generations of speakers can be statistically confirmed. We will also include 30 interviews with speakers of ‘Standard’ Greek that will serve as a control group. The comparison between the data collected in Cyprus and in mainland Greece (Thessaloniki) will help answer the question whether the features occurring in CyE are unique to a potential variety spoken on the island, or whether they are learner features shared by all learners with an L1 Greek background regardless of the sociolinguistic context. In the long run, we will also enrich the corpus with written data from student essays and articles from two local English newspapers, Cyprus Today and The Cyprus Weekly. Moreover, it will be a future task to illustrate how the Variety Spectrum works for cross-varietal analyses by integrating a clear second language variety and a typical learner English and comparing them to CyE.

References

Bauer, L. 2002. An Introduction to International Varieties of English. Edinburgh: EUP.
Bruthiaux, P. 2003. Squaring the circles: Issues in modeling English worldwide. International Journal of Applied Linguistics 13(2): 159–178.
CIA. The World Factbook. Europe: Cyprus. (02-09-2009).
Corder, S.P. 1981. Error Analysis and Interlanguage. Oxford: OUP.
Goutsos, D. 2001. A discourse-analytic approach to the use of English in Cypriot Greek conversations. International Journal of Applied Linguistics 11: 194–223.
Görlach, M. 1990. Studies in the History of the English Language. Heidelberg: Carl Winter.

Kachru, B.B. 1985. Standards, codification and sociolinguistic realism: The English language in the outer circle. In English in the World: Teaching and Learning the Language and Literatures, R. Quirk & H.G. Widdowson (eds), 11–30. Cambridge: CUP.
Kachru, B.B. 1986. The Alchemy of English: The Spread, Functions and Models of Non-native English. Oxford: Pergamon Press.
Kachru, B.B. 1988. The sacred cows of English. English Today 16: 3–8.
Kachru, B.B. 1992. Models for non-native Englishes. In The Other Tongue: English across Cultures, B.B. Kachru (ed.), 48–74. Urbana IL: University of Illinois Press.
Karyolemou, M. 1994. Linguistic attitudes and metalinguistic discourse: An investigation in the Cypriot press. In Themes in Greek Linguistics: Papers from the First International Conference on Greek Linguistics, I. Philippaki-Warburton, K. Nicolaidis & M. Sifianou (eds), 253–259. Amsterdam: John Benjamins.
Künstler, V., Mendis, D. & Mukherjee, J. 2009. English in Sri Lanka: Language functions and speaker attitudes. Anglistik 20(2): 57–74.
Labov, W. 1968. The reflection of social processes in linguistic structures. In Readings in the Sociology of Language, J. Fishman (ed.), 240–251. Berlin: Mouton de Gruyter.
Labov, W. 1972. Some principles of linguistic methodology. Language in Society 1(1): 97–120.
Labov, W. 1984. Field methods of the project on linguistic change and variation. In Language in Use, J. Baugh & J. Sherzer (eds), 84–112. Englewood Cliffs NJ: Prentice Hall.
Leonidou, L. 2007. For 59,000+ British expats in Cyprus – don’t lose your UK vote. Cyprus Mail. (25-08-2009).
McArthur, T. 1987. The English languages? English Today 11: 9–11.
McEntee, L.J. 1999. Language use and attitudes towards Greek, English and the Greek-Cypriot dialect in the Greek-Cypriot community in Nicosia, Cyprus. Proceedings of the Fourth International Conference on Greek Linguistics. Nicosia. September 1999, 408–415. Thessaloniki: University Studio Press.
McEntee-Atalianis, L.J. 2004. The impact of English in post-colonial, post-modern Cyprus. Journal of the Sociology of Language 168: 77–90.
Mesthrie, R. 2004. Introduction: Varieties of English in Africa and South and Southeast Asia. In A Handbook of Varieties of English, Vol. II: Morphology and Syntax, B. Kortmann, K. Burridge, R. Mesthrie & E.W. Schneider (eds), 805–812. Berlin: Mouton de Gruyter.
Mesthrie, R. & Bhatt, R.M. 2008. World Englishes: The Study of New Linguistic Varieties. Cambridge: CUP.
Mollin, S. 2006. Euro-English: Assessing Variety Status. Tübingen: Gunter Narr.
Mühleisen, H.-O. 1986. Der Zypernkonflikt 1950–1984. In Inseln als Brennpunkte internationaler Politik. Konfliktbewältigung im Wandel des internationalen Systems 1890–1984: Kreta, Korfu, Zypern, J. Dülffer, H.-O. Mühleisen & V. Torunsky (eds), 97–144. Köln: Verlag Wissenschaft und Politik.
Papapavlou, A.N. 1997. The influence of English and its dominance in Cyprus: Reality or unfound fears? Journal of Mediterranean Studies 7(2): 218–249.
Papapavlou, A.N. 2001. The spread of English worldwide and the situation in Cyprus: Growing concerns. In Proceedings of the 4th International Conference on Greek Linguistics, Y. Agouraki, A. Arvaniti, J. Davy, D. Goutsos, M. Karyolaimou, A.N. Panayotou, A.N. Papapavlou, P. Pavlou & A. Roussou (eds), 431–438. Thessaloniki: University Studio Press.
Schneider, E.W. 2003. The dynamics of New Englishes: From identity construction to dialect birth. Language 79(2): 233–281.



Christiane M. Bongartz and Sarah Buschfeld

Schneider, E.W. 2004. How to trace structural nativization: Particle verbs in world Englishes. World Englishes 23(2): 227–249.
Schneider, E.W. 2007. Postcolonial English: Varieties Around the World. Cambridge: CUP.
Schreier, D. 2009. Assessing the status of lesser-known varieties of English. English Today 25(1): 19–24.
Schwenger, C.P. 1964. Selbstbestimmung für Zypern. Die Prinzipien von Selbstbestimmung und Schutz der Nation in ihrem Einfluß auf die Entstehung der Republik Zypern. PhD dissertation, University of Würzburg.
Selinker, L. 1972. Interlanguage. IRAL 10: 209–231.
Statistical Service of the Republic of Cyprus. 2008. Latest figures: Population of Cyprus, 2007. (25-08-2009).
Stirling, L. & Huddleston, R. 2002. Deixis and anaphora. In The Cambridge Grammar of the English Language, R. Huddleston & G.K. Pullum (eds), 1449–1564. Cambridge: CUP.
Tagliamonte, S.A. 2006. Analysing Sociolinguistic Variation. Cambridge: CUP.
TAMS Analyzer. (05-10-2009).
Tsiplakou, S. 2009. English in Cyprus: Outer or expanding circle. Anglistik 20(2): 75–87. (Special issue).
Tzermias, P. 1991. Geschichte der Republik Zypern: Mit Berücksichtigung der historischen Entwicklung der Insel während der Jahrtausende. Tübingen: Francke.
Yazgin, N. 2007. The role of the English language in Cyprus and its effects on the ELT classroom. ERIC document ED496971. (25-08-2009).

From EFL to ESL
Evidence from the International Corpus of Learner English

Gaëtanelle Gilquin and Sylviane Granger
Université catholique de Louvain

This chapter revisits the dichotomy that is traditionally made in Second Language Acquisition (SLA) research between English as a Foreign Language (EFL) and English as a Second Language (ESL) and argues, on the basis of data from the International Corpus of Learner English, that it should be viewed as a continuum instead, with many in-between categories corresponding to a variety of learning contexts. Using the case of the preposition into as an illustration, we show that the different environments in which Spanish-, French-, Dutch- and Tswana-speaking students learn English are reflected in their syntactic, semantic and lexical use of the preposition. More precisely, it appears that the Spanish-, French- and Dutch-speaking learners, who represent a cline in terms of exposure to the target language, from little exposure for the Spanish learners to considerable exposure for the Dutch learners, also form a cline in their use of into, from most distant to most similar to native (British) English. As for the Tswana variety, which clearly displays characteristics of both EFL and ESL, it occupies different positions along the cline, being sometimes closest to native English and sometimes most dissimilar, depending on the features of the use of into that are considered.

1. Introduction The distinction between English as a Foreign Language (EFL) and English as a Second Language (ESL) is a long-established one in Second Language Acquisition research. Like many other distinctions (e.g. nativeness vs non-nativeness, grammar vs lexicon), the distinction was initially presented as a dichotomy, but gradual recognition of the complexity of the language learning process and its many contextual determinants has led to a more qualified view. In this paper, we argue that, far from being clear-cut, the distinction between EFL and ESL should be




viewed as a continuum with many in-between categories. We demonstrate this on the basis of data from the International Corpus of Learner English (ICLE). While this corpus was collected in such a way as to represent EFL, some components of it contain data produced by learners who studied English in a context closer to ESL. We may expect such differences in the learning environment to be reflected in language itself. This hypothesis is tested by means of a study of a notoriously difficult preposition, the preposition into, in four components of ICLE: the Spanish, French, Dutch and Tswana components. These data are compared with each other and with a reference corpus of British newspaper editorials (MULT-ED). The chapter is organized as follows: in Section 2, we define EFL and ESL and argue for considering them as two extremes on a cline rather than as a dichotomy. In Section 3, we justify the choice of prepositions, and in particular the preposition into, as a subject for studying variation among several learner varieties. Section 4 consists of the corpus analysis proper, with results for the frequency of into, its syntactic, lexical and semantic behaviour, as well as its phraseological and non-standard uses. In Section 5, we introduce another distinction, that between novice and expert writing and, using data from LOCNESS (Louvain Corpus of Native English Essays), we briefly consider the role of the degree of expertise, and its relation with the nativeness/non-nativeness distinction. Section 6 concludes the paper.

2. From EFL to ESL

The general framework within which this chapter is situated is that of Second Language Acquisition (SLA), i.e. the learning of a language after the first language has been learned.
Within that context, we follow Gass & Selinker (2001: 5) in using the term Foreign Language to refer to “the learning of a nonnative language in the environment of one’s native language” and Second Language to “the learning of a nonnative language in the environment in which that language is spoken”. Like them, however, we also recognize that the picture is more complex as the degree and type of exposure may vary considerably in the two learning contexts. The International Corpus of Learner English (ICLE), which we have used for this study, is essentially a corpus of writing by learners of English as a Foreign Language rather than Second Language (cf. Granger et al. 2009). The corpus contains argumentative essays produced by higher intermediate to advanced learners from 16 different mother tongue backgrounds (Bulgarian, Chinese, Czech, Dutch, Finnish, French, German, Italian, Japanese, Norwegian, Polish, Russian, Spanish, Swedish, Tswana, Turkish). It has been used to investigate a wide range of lexical, grammatical and discourse features of L2 writing. Researchers have either focused exclusively on one learner population (e.g. Neff van Aertselaer’s [2008] study of interpersonal


discourse phrases in Spanish learner writing) or compared two or more of them (e.g. Lozano & Mendikoetxea’s [2008] study of postverbal subjects in Italian and Spanish learner writing). Another option consists in using several ICLE subcorpora (or all of them) and treating them as an aggregate, irrespective of the learners’ mother tongues. Nesselhauf (2009), for example, uses a corpus, called ICLE-4L1, which contains data from German, French, Finnish and Polish learners. This corpus, however, is analyzed with no distinction between the four L1 components and serves as a basis to draw conclusions about “learner English” in general. Although ICLE is essentially an EFL corpus, it is important to bear in mind that there are a number of factors that blur the line between the two situations, amongst them the presence or absence of language instruction (in the case of ESL), the number of years of instruction, the focus of language lessons (focus on form and/or communication), the use of the target language for some or all of the non-language subjects (for EFL), the quality of teacher talk, the type and amount of exposure to the target language outside the classroom, in particular access to English-speaking media, and, in the case of EFL learners, the amount of time spent in a country where English is spoken. For our study, we selected four populations – Spanish, French, Dutch and Tswana – that occupy different points along the EFL-ESL cline with regard to two main factors: amount of exposure to the target language and focus of language instruction. In order to ensure comparability of the data, we controlled for one other factor, namely the number of months spent in a country where the target language is spoken. Using the ICLE interface, we only selected texts produced by learners who had spent a maximum of three months in an English-speaking country. As shown in Table 1, the Spanish- and French-speaking learners are characterized by a relatively low amount of exposure to English.
As regards the media, films and television serials are dubbed and the English in TV programmes, notably the news, is voiced over. Although nowadays the Internet is a source of potential contact with English, the ICLE data were collected before it became a major issue. The amount of exposure received by the French-speaking students represented in ICLE is arguably somewhat higher than that of the Spanish learners as all the linguistics and literature courses of their English philology degree are taught via the medium of English, while practices on this point vary in Spain. By contrast, the Dutch and Tswana learner populations benefit from a much higher degree of exposure. The Dutch learners get to hear a lot of English as all the films, TV shows and soaps are subtitled rather than dubbed and English speakers on TV programmes are not voiced over. According to Koolstra & Beentjes (1999), Dutch children spend about half their TV time watching programmes with English language sound. Ginsburgh & Weber (2006) attribute the much higher proficiency level of Dutch-speaking compared to French-speaking Belgians largely to that factor. The positive




Table 1. ICLE subcorpora: EFL-ESL cline

ICLE subcorpus    Exposure    Focus on form
SP                –           +
FR                +/–         +
DU                +           +
TSW               +           –

effect of undubbed TV programmes has led Van Parijs (2004) to launch a ‘ban dubbing’ campaign, which has unfortunately not had much impact so far in French-speaking Belgium or France. Besides the potential benefit gained from access to English-speaking media, the Tswana learners have an additional advantage as classes are taught through the medium of English from the fifth grade in primary school. While in primary school, code-switching between Setswana and English is the norm, in high school English instruction is dominant (cf. Van Rooy 2009: 199). As regards focus on form in the language classroom, the Spanish, the French and the Dutch cluster together in having their attention directed to morphological, grammatical and lexical accuracy. By contrast, the immersion education received by the Tswana learners leads to a high level of functional proficiency in English but, as demonstrated in numerous studies of immersion programmes, this advantage is counterbalanced by a much lower degree of (especially grammatical) accuracy and “endemic fossilization” (Sheen 2006: 828). A factor that further complicates the issue in the case of the Tswana learners is that they may be influenced by the emerging variety of Black South African English (Van Rooy 2009), notably via their own teachers who are predominantly Setswana speakers, not English speakers. Applying the Contrastive Interlanguage Analysis (CIA) methodology (Granger 1996, Gilquin 2000/2001), we carried out two types of comparison: a comparison between learner corpora and a comparison between learner corpora and reference corpora. Table 2 gives the breakdown of the corpora used. The four learner corpora, which are extracted from the second version of ICLE (Granger et al. 2009), contain essays written by higher intermediate to advanced learners with Spanish, French, Dutch or Tswana as a mother tongue. They were compared to three reference corpora. 
Two of these contain editorials from British and South African English newspapers respectively and are part of a larger corpus collected at Louvain, the Multilingual Editorials Corpus (MULT-ED).1 The third

1. More information can be found at http://www.uclouvain.be/cecl-multed.html.


Table 2. Breakdown of learner and reference corpora

Corpus                                  No. of words
Learner corpora      ICLE-SP                 156,840
                     ICLE-FR                 182,328
                     ICLE-DU                 192,771
                     ICLE-TSW                199,380
                     TOTAL                   731,319
Reference corpora    BrE editorials          152,123
                     SAE editorials          150,401
                     LOCNESS                 150,590
                     TOTAL                   453,114
TOTAL                                      1,184,433

reference corpus is the Louvain Corpus of Native English Essays (LOCNESS), a corpus of essays written by American English students. As the analysis of the data hardly showed any difference between the British and South African English corpora,2 we will only report the results for the British component (cf. Section 4). The comparison with LOCNESS will be the subject of Section 5. Based on the configuration represented in Table 1, we hypothesized that the Dutch learner population would be closest to the reference corpora, followed by the French and the Spanish. In view of the mixed configuration displayed by the Tswana group, no hypothesis was formulated for that learner population.

2. This, admittedly, may be partly due to the fact that newspaper editorials tend to be heavily edited by native speakers of English. That differences may nonetheless exist between the British and African varieties of English is suggested by Mwangi’s (2003) study, which shows that into is significantly more frequent in ICE-GB, the British component of the International Corpus of English, than in ICE-K, the Kenyan component of the corpus. It should be emphasized, however, that using the written part of ICE-K as a reference corpus still results in a significant underuse of into among the four learner populations investigated here (p < 0.01 with the log-likelihood test), as is the case with our corpus of British editorials (see Section 4.1).

3. The preposition into

In this study, we focus on the use of prepositions, and more precisely the preposition into. Prepositions have been shown to be problematic for non-native speakers of English. Kao (2001), for example, has demonstrated that communicatively redundant prepositions are likely to be omitted by learners. Many SLA specialists have also underlined learners’ tendency to avoid prepositions (e.g. Hulstijn & Marchena 1989, Sjöholm 1995, Liao & Fukuya 2004, Siyanova & Schmitt 2007). In fact, prepositions are often considered as the bête noire of both teachers and learners, being impossible to teach and impossible to learn. Prepositions also have a special status in indigenized varieties of English (World Englishes), where they can be described both as a ‘mutating species’ and an ‘endangered species’: ‘mutating species’ because they are likely to lead to innovations in World Englishes (see, e.g., Mukherjee 2010 on new prepositional verbs in Indian English), and ‘endangered species’ because some prepositions tend to vanish (see Mwangi 2003, 2004 on Kenyan English). Into is a particularly interesting preposition from an SLA perspective, because of its obvious link with (and hence possible confusion with) the preposition in. Thus, it is common for learners to use in instead of into, especially owing to the lack of a similar contrast in the learner’s mother tongue, cf. Swan’s (2005: 244) example: The ball rolled slowly *in the goal. The distinction between in and into also seems to be gradually disappearing from some indigenized varieties of English, e.g. There are so many people just coming in the country (Mwangi 2004: 28). In what follows, we will investigate several aspects of the use of into in ICLE (and the reference corpus of British English), namely its frequency, the syntactic structures in which it appears, the lexical variation it displays, the senses in which it is used, its phraseological uses and its non-standard uses.

4. Quantitative and qualitative analysis of into in ICLE

4.1 Frequency

The relative frequency of into per 100,000 words ranges from 146 (in the corpus of British editorials, BrE) to 71 (in ICLE-TSW). If we examine the full range of results, shown in Figure 1, we notice that three groups seem to emerge: one with the native corpus, one with ICLE-DU and ICLE-FR, and one with ICLE-SP and ICLE-TSW.3 These results confirm the general underuse of the preposition into in the non-native varieties of English. However, they also reveal a great disparity among the ICLE varieties, with a mild underuse in the Dutch and French subcorpora, and a marked underuse in the Spanish and Tswana subcorpora. With respect to our initial hypothesis, we see that the cline between ICLE-DU, ICLE-FR and ICLE-SP is confirmed, and that ICLE-TSW, which shows a mixed configuration in terms of learning context, comes last, after the Spanish learners.

3. The log-likelihood test reveals significant differences between BrE and ICLE-DU (p < 0.001) and between ICLE-FR and ICLE-SP (p < 0.05), but no significant differences between ICLE-DU and ICLE-FR, nor between ICLE-SP and ICLE-TSW.
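The normalization behind such relative frequencies can be sketched in a few lines of Python. This is only an illustration, not the authors' own procedure: the function name per_100k is hypothetical, the corpus sizes are those given in Table 2, and the raw counts are the N + into counts reported in Table 3.

```python
def per_100k(raw_count: int, corpus_size: int) -> float:
    """Normalize a raw count to occurrences per 100,000 words."""
    return raw_count * 100_000 / corpus_size


# Corpus sizes from Table 2; raw counts of N + into from Table 3.
bre = per_100k(19, 152_123)      # British editorials
icle_sp = per_100k(4, 156_840)   # Spanish component of ICLE

print(round(bre, 2))      # 12.49
print(round(icle_sp, 2))  # 2.55
```

Normalizing in this way is what makes counts comparable across corpora of different sizes, such as the 152,123-word British corpus and the 199,380-word Tswana component.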

Figure 1. Relative frequency of into per 100,000 words (bar chart; vertical axis from 0 to 160; bars from left to right: BrE, ICLE-DU, ICLE-FR, ICLE-SP, ICLE-TSW)

4.2 Syntactic structures

From a syntactic point of view, a distinction may be drawn between three main structures: N + into, Vintrans + into and Vtrans + into (+ NP or Ving), as illustrated by the following sentences:

(1) It does not need much research into Labour Party history to see that pounds and pence have been the downfall of many previous Prime Ministers of the Left.
(2) But on many issues his generally admirable resoluteness has descended into pig-headed obstinacy.
(3) a. Nothing looked more certain than that he would lead his party into the next general election, due in 2005 or, at the latest, in 2006.
    b. Others have ended because both sides were exhausted, or because outsiders cajoled them into putting down their weapons and starting to talk.

Not too surprisingly, the three structures display an underuse among learners (although to different degrees), as appears from Table 3. This underuse is particularly pronounced in the Spanish and Tswana subcorpora and less strong in the French and (especially) Dutch subcorpora, which corroborates the results for the overall frequency of into. The figures for the transitive use of into (Vtrans + into), however, hide an interesting variation. If we consider the causative use of into, i.e. Vtrans + into + Ving, as exemplified by (3b), we notice that, while this structure is generally rare among learners, it is used more often by the Dutch and Tswana learners, less often by the




Table 3. Relative frequency (per 100,000 words) and raw frequency of syntactic structures with into*

                   BrE            ICLE-DU          ICLE-FR          ICLE-SP          ICLE-TSW
N + into           12.49 (19)     6.23 (12)        5.48(*) (10)     2.55*** (4)      1.50*** (3)
Vintrans + into    49.96 (76)     43.06 (83)       24.13*** (44)    19.77*** (31)    29.09** (58)
Vtrans + into      82.17 (125)    54.46** (105)    63.08(*) (115)   49.73*** (78)    39.62*** (79)

* The asterisks indicate the degree of statistical significance of the log-likelihood test (BrE vs ICLE): (*) for p < 0.05, * for p < 0.01, ** for p < 0.005 and *** for p < 0.001.
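The significance markers in Table 3 can be approximated with a short Python sketch of the log-likelihood (G2) statistic. The chapter does not spell out the formula, so the two-term form commonly used for comparing a word's frequency in two corpora is an assumption here, as are the function names; the thresholds are the standard chi-square critical values for one degree of freedom, mapped onto the labels defined in the table note.

```python
import math


def log_likelihood(o1: int, n1: int, o2: int, n2: int) -> float:
    """Two-corpus log-likelihood (G2) for observed counts o1, o2
    in corpora of n1 and n2 words (assumed two-term form)."""
    total = o1 + o2
    expected = (n1 * total / (n1 + n2), n2 * total / (n1 + n2))
    g2 = 0.0
    for observed, exp in zip((o1, o2), expected):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / exp)
    return 2 * g2


# Chi-square critical values (df = 1) paired with the labels of the table note.
THRESHOLDS = [(10.83, "***"), (7.88, "**"), (6.63, "*"), (3.84, "(*)")]


def significance(g2: float) -> str:
    """Return the asterisk label for a G2 value ('' if p >= 0.05)."""
    for critical, label in THRESHOLDS:
        if g2 >= critical:
            return label
    return ""


# N + into: raw counts from Table 3, corpus sizes from Table 2.
BRE = (19, 152_123)
ICLE = {"DU": (12, 192_771), "FR": (10, 182_328),
        "SP": (4, 156_840), "TSW": (3, 199_380)}

for name, (count, size) in ICLE.items():
    g2 = log_likelihood(*BRE, count, size)
    print(name, round(g2, 2), significance(g2))
```

Under these assumptions, the labels obtained for the N + into row agree with those printed in Table 3: none for ICLE-DU, (*) for ICLE-FR and *** for ICLE-SP and ICLE-TSW.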

Figure 2. Relative frequency of causative structures with into per 100,000 words (bar chart; vertical axis from 0 to 7; bars from left to right: BrE, ICLE-DU, ICLE-TSW, ICLE-FR, ICLE-SP)

French-speaking learners, and it never occurs in the Spanish component of ICLE (cf. Figure 2). The Tswana learners, who present the lowest frequency of into, thus appear to do comparatively well when it comes to the causative use of into. These results also show that, while the Dutch, French and Spanish learners seem to have a relatively fixed position along the cline predicted on the basis of the learning context, the Tswana learners appear to occupy varying positions depending on the feature that is investigated: further down the cline if we consider the overall frequency of into (see Figure 1), but closer to the native speakers if we examine the causative use of into.

4.3 Lexical variation

Limiting ourselves to the verbal structures with into (Vintrans + into and Vtrans + into), we then examined the lexical variation displayed by the verb among the four learner populations. In order to do so, we calculated the number of different lemmas per 100,000 words in the ICLE subcorpora, and compared these results with the

Figure 3. Relative frequency of verb lemmas with into per 100,000 words (bar chart; vertical axis from 0 to 80; bars from left to right: BrE, ICLE-DU, ICLE-FR, ICLE-TSW, ICLE-SP)

results for the British English reference corpus. As appears from Figure 3, the learners use significantly fewer lemmas than the native speakers (log-likelihood value = 16.40, p < 0.001), which suggests repetition of a limited repertoire of verbs with into – a phenomenon which, incidentally, has been brought to light for other aspects of the learner’s lexicon (see, e.g., Hasselgren 1994). While the Dutch learners exhibit slightly better results, there is, generally speaking, little variation among the four groups of learners (no significant differences according to the log-likelihood test, except for a marginally significant one between ICLE-DU and ICLE-SP, p < 0.05). Table 4 lists the most frequent verb lemmas in the five corpora.4 It will be noticed that, despite the higher frequency of verbal structures with into in the British corpus, the list of lemmas occurring five times or more in this corpus is shorter than the list for ICLE-DU and ICLE-FR, which points to a lower degree of repetition in native English. In ICLE-SP and ICLE-TSW, the list of frequently recurring lemmas is not so long, but the top lemmas appear to be extremely common (take occurs 23 times in ICLE-TSW and as many as 27 times in ICLE-SP). Another phenomenon worth underlining is the higher degree of recurrence of certain verbs in the ICLE varieties as compared to the reference corpus, in particular take and put (which have been highlighted in the table). Some examples are given in (4) to (7).5

4. Although these figures are likely to be influenced by corpus size, it should be noted that the corpora used in this study are relatively similar in size, varying between 150,000 and 200,000 words. The same remark applies to some other results in the following sections as well.

5. The examples are reproduced exactly as they appear in ICLE.




Table 4. Top verb lemmas (5 occurrences or more)

BrE           ICLE-DU        ICLE-FR          ICLE-SP        ICLE-TSW
turn (21)     turn (19)      take (22)        take (27)      take (23)
bring (9)     go (15)        turn (16)        put (12)       come (9)
pour (9)      take (15)      put (12)         divide (10)    look (8)
go (7)        put (14)       divide (10)      turn (8)       turn (7)
break (5)     come (12)      bring (9)        fall (6)       put (6)
fall (5)      get (11)       come (9)
get (5)       bring (6)      get (8)
              change (5)     transform (8)
              force (5)

(4) Another important fact to take into account is the bad treatment received while doing the military service.
(5) My view is that the economy of the country should be taken into consideration.
(6) To conclude money is useful, as long as it helps people to respect true human values and to put them into practice.
(7) All this must be put into perspective, of course. Religion as well as television can be very positive and valuable in life.
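The two measures used in this section, the number of different verb lemmas per 100,000 words and the list of lemmas occurring five times or more, can be sketched as follows. This is a minimal illustration only: the function names, the toy lemma list and the corpus size of 50,000 words are hypothetical and do not come from ICLE.

```python
from collections import Counter


def lemma_types_per_100k(verb_lemmas: list[str], corpus_size: int) -> float:
    """Number of different verb lemmas, normalized per 100,000 words."""
    return len(set(verb_lemmas)) * 100_000 / corpus_size


def recurrent_lemmas(verb_lemmas: list[str], min_count: int = 5) -> list[tuple[str, int]]:
    """Lemmas occurring at least min_count times, most frequent first."""
    counts = Counter(verb_lemmas)
    return [(lemma, n) for lemma, n in counts.most_common() if n >= min_count]


# Hypothetical lemmas of verbs found with into in a 50,000-word toy corpus.
sample = ["take"] * 6 + ["put"] * 5 + ["turn", "fall", "divide"]

print(lemma_types_per_100k(sample, 50_000))  # 10.0
print(recurrent_lemmas(sample))              # [('take', 6), ('put', 5)]
```

A low number of lemma types combined with high counts for a few items such as take and put is the pattern of repetition described above for the learner corpora.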

4.4 Semantic analysis

In order to study the semantic behaviour of into, we distinguished between eight senses and categories of use of the preposition, using a dictionary- and corpus-derived methodology similar to that described in De Cock & Granger (2004).6 These are listed in Table 5 and illustrated with examples from the corpus of British newspaper editorials. The semantic analysis of the corpora (see Table 6) reveals that the prototypical sense of concrete movement is never predominant. Instead, the most frequent sense in the reference corpus is abstract movement, as in (8), closely followed by the sense of transformation, whereas in the four ICLE components it is the (semi-)fixed expressions that are most common. Several senses are significantly underused by all four groups of learners, namely abstract movement, transformation and causation (which may be related to the underuse of the causative structure, since most of the uses of into expressing causation are of this type). Most of the

6. The dictionaries we used are the Oxford Advanced Learner’s Dictionary (Wehmeier 2000) and the Macmillan English Dictionary for Advanced Learners, Second Edition (Rundell 2007).

From EFL to ESL 

Table 5. Semantic classification of into

Sense/use                   Example
Movement                    Max Hastings marched into Port Stanley at the head of a column of British troops.
Abstract movement           And they are not only moving into manufacturing – they are increasingly competing in services, too.
Transformation              Turning the entire country into a focus group won’t solve any real problems of government.
Causation                   The insurers hope that taking a hard line will prod a capricious Government into taking flood protection more seriously.
Division                    The Eskimos compartmentalize their flakes into fine, fresh, drifting, clinging or crusted.
Other meanings              The new editors have declined to extend the story into the 20th century.
Phrasal verbs               Law-abiding citizens should have a greater entitlement to take action against burglars who break into their homes.
(Semi-)fixed expressions    A graduate tax takes into account the level of an individual’s earnings.

Table 6. Relative frequency of senses/uses of into per 100,000 words (and percentages)

Sense/use                   BrE              ICLE-DU             ICLE-FR             ICLE-SP             ICLE-TSW
Movement                    15.12 (11.4%)    11.93 (12.2%)       9.32 (10.7%)        7.01(*) (10.1%)     6.52(*) (9.5%)
Abstract movement           36.15 (27.4%)    22.31(*) (22.9%)    9.87*** (11.3%)     7.01*** (10.1%)     18.06*** (26.3%)
Transformation              34.84 (26.4%)    22.31(*) (22.9%)    23.04(*) (26.4%)    15.3*** (22.0%)     8.53*** (12.4%)
Causation                   14.46 (10.9%)    4.67** (4.8%)       3.29*** (3.8%)      0*** (0.0%)         3.51*** (5.1%)
Division                    1.97 (1.5%)      2.59 (2.7%)         8.23(*) (9.4%)      8.29(*) (11.9%)     0.5 (0.7%)
Other meanings              3.29 (2.5%)      1.04 (1.1%)         2.74 (3.1%)         4.46 (6.4%)         6.02 (8.7%)
Phrasal verbs               7.89 (6.0%)      7.78 (8.0%)         3.29 (3.8%)         1.28* (1.8%)        6.02 (8.7%)
(Semi-)fixed expressions    18.41 (13.9%)    24.9 (25.5%)        27.42 (31.5%)       26.14 (37.6%)       19.56 (28.5%)

other results point to a lower frequency among the learners too, even though they do not always reach the threshold of statistical significance. This widespread underuse stands in stark contrast to the overuse of (semi-)fixed expressions, which are used with a relative frequency ranging from 19.56 to 27.42 per 100,000 words in ICLE, as against 18.41 in the reference corpus. The only other sense that is found more often among some of the learner populations is division, significantly overused by the French- and Spanish-speaking learners, e.g. (9).

(8) Instead of falling into the easy temptation to also posture grandly, Mr Blair should seek a constructive relationship with the unions while standing firm on his policy agenda.
(9) In this case, the believers are divided into two groups: catholics and protestans.
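The two figures reported in each cell of Table 6 (a relative frequency per 100,000 words and a percentage) can be derived as in the following sketch. The function names are illustrative assumptions; the relative frequencies are the BrE values as read from Table 6, whereas the raw count and corpus size in the first call are hypothetical.

```python
def per_100k(raw_count, corpus_size):
    """Normalise a raw count to a frequency per 100,000 words."""
    return raw_count / corpus_size * 100_000


def sense_shares(rel_freqs):
    """Percentage of each sense among all uses of 'into' in one corpus."""
    total = sum(rel_freqs.values())
    return {sense: round(100 * freq / total, 1) for sense, freq in rel_freqs.items()}


# Hypothetical raw figures: 30 occurrences in a corpus of 150,000 words.
print(per_100k(30, 150_000))

# BrE relative frequencies per 100,000 words, as read from Table 6.
bre = {
    "Movement": 15.12,
    "Abstract movement": 36.15,
    "Transformation": 34.84,
    "Causation": 14.46,
    "Division": 1.97,
    "Other meanings": 3.29,
    "Phrasal verbs": 7.89,
    "(Semi-)fixed expressions": 18.41,
}
shares = sense_shares(bre)
for sense, share in shares.items():
    print(f"{sense}: {share}%")
```

Normalising to a common base is what allows corpora of between 150,000 and 200,000 words to be compared directly, while the percentages show the weight of each sense within a single corpus.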

4.5 Phraseological uses

In the preceding section, we saw that phraseological usage plays a prominent role in the learners’ use of into. In this section, we focus on two types of phraseological uses of the preposition, namely its use in (semi-)fixed expressions and its use in phrasal verbs. We use the term ‘(semi-)fixed expression’ to refer to those expressions that are described as such in the Oxford Advanced Learner’s Dictionary (e.g. take into account, burst into the open, play into the hands of), whereas phrasal verbs are defined here as non-compositional prepositional verbs (e.g. look into, feed into). Interestingly, these two phraseological uses show divergent patterns of use in ICLE, with a high frequency of (semi-)fixed expressions and a relatively low frequency of phrasal verbs.

Figure 4 shows the relative frequency of (semi-)fixed expressions in native English and learner English (see also Table 6). Although the results are not statistically significant and we would clearly need more data on which to base the analysis, it is interesting to note that the four learner populations make a greater use of (semi-)fixed expressions than native speakers, especially the French- and Spanish-speaking learners and, to a lesser extent, the Dutch-speaking learners. The Tswana learners come closer to the standard set by the native speakers (as was the case for the causative use of into). This seems to contradict the common claim that a great deal of exposure is necessary in order to acquire formulaic expressions, since the learners with the least exposure (French- and Spanish-speaking learners) show the highest frequency of (semi-)fixed expressions. There are at least two possible explanations for this apparent contradiction.7 One is that such expressions often have a direct equivalent in the learners’ mother tongues. A detailed contrastive analysis would be needed in order to support this hypothesis, but the equivalence between, e.g., take into account and French prendre en compte or Spanish tener en cuenta seems to point in this direction.

7. As pointed out by one of the reviewers, teaching might also be an additional factor to consider when seeking to account for the overuse of (semi-)fixed expressions among learners.

From EFL to ESL  30 25 20 15 10 5 0 BrE

ICLE-TSW

ICLE-DU

ICLE-SP

ICLE-FR

Figure 4. Relative frequency of (semi-)fixed expressions with into per 100,000 words

Another possible explanation for our results is that the learners are likely to repeat expressions that are familiar to them and appear to be safe. This explanation is confirmed by Table 7, which lists the most frequent (semi-)fixed expressions with into in the four ICLE subcorpora, together with their absolute frequencies and the percentage they represent in the total of (semi-)fixed expressions. It turns out, among other things, that the use of take into account accounts for 50% of all (semi-)fixed expressions with into in ICLE-FR, and that in ICLE-SP this proportion reaches 54%. In native British English, by contrast, take into account occurs only three times, which represents a percentage of some 10%, and none of the (semi-)fixed expressions with into is repeated more than three times. The strong preference for certain expressions in learner English is reflected in the results for the type/token ratio of (semi-)fixed expressions with into (Table 8) – although the results also reveal a continuum among the learner populations, with the type/token ratio being lower, and hence repetition being more likely, in ICLE-FR and ICLE-SP than in ICLE-TSW and ICLE-DU. This continuum, incidentally, corresponds to the continuum predicted on the basis of the learning context, with the Dutch learners coming closer to the native speakers, and the French and Spanish learners lagging behind. As was the case with some of the other features investigated (but not all of them), the Tswana learners turn out to be relatively high on the continuum, coming just after the Dutch learners. It is noteworthy that the learners’ preference for certain (semi-)fixed expressions may vary from one group of learners to the other. Table 9 compares the frequency of two synonymous expressions, take into account and take into consideration, in the four ICLE subcorpora under study. While the French- and Spanish-speaking learners show a marked preference for take into account, the




Table 7. Most frequent (semi-)fixed expressions with into

Corpus      Expression                 Frequency
ICLE-DU     take into account          9 (19%)
            come into being            4 (8%)
            take into consideration    4 (8%)
ICLE-FR     take into account          25 (50%)
            put into practice          7 (14%)
ICLE-SP     take into account          22 (54%)
            put into practice          6 (15%)
ICLE-TSW    take into consideration    16 (41%)

Table 8. Type/token ratio of (semi-)fixed expressions with into

Corpus      TTR
BrE         0.79
ICLE-DU     0.50
ICLE-TSW    0.41
ICLE-FR     0.28
ICLE-SP     0.22

Table 9. Frequency of take into account and take into consideration

Expression                 BrE    ICLE-DU    ICLE-FR    ICLE-SP    ICLE-TSW
take into account          3      9          25         22         1
take into consideration    0      4          2          5          16

Tswana learners clearly prefer the alternative expression take into consideration (in the Dutch subcorpus the frequency of these two expressions is closer to the reference corpus). This finding partially confirms the tendency, already noted by Sand (2005) for World Englishes, to reduce functionally equivalent variants to “a small number of choices or a single preferred variant” – with the additional caveat that the preferred variant may vary depending on the learner’s mother tongue. At the same time, our results contradict Nesselhauf’s (2009) conclusion that take into consideration is the preferred option for learners (in general), and hence underline the danger of treating several learner populations as an aggregate.8

8. Cf. Mollin (2006) for a similar warning within the framework of English as a Lingua Franca.
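The type/token ratios in Table 8 and the percentages in Table 7 rest on simple counts, which can be sketched as follows. The list of tokens is a hypothetical toy sample, not data from ICLE, and the helper names are illustrative assumptions.

```python
from collections import Counter


def type_token_ratio(tokens):
    """Number of distinct expressions divided by the number of occurrences."""
    return len(set(tokens)) / len(tokens)


def top_share(tokens):
    """Most frequent expression and the percentage of all tokens it covers."""
    expression, count = Counter(tokens).most_common(1)[0]
    return expression, 100 * count / len(tokens)


# Hypothetical sample of ten (semi-)fixed expressions with 'into'.
sample = (
    ["take into account"] * 5
    + ["put into practice"] * 2
    + ["come into being", "take into consideration", "bring into line"]
)

print(type_token_ratio(sample))
print(top_share(sample))
```

A lower ratio means that fewer distinct expressions account for the same number of occurrences, i.e. more repetition; the share of the top expression shows how far a single item dominates the set.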


In contrast to (semi-)fixed expressions with into, which are more frequent in learner English than in native English, phrasal verbs with into tend to be underused by the learners. Figure 5 shows that this is the case in ICLE-SP and ICLE-FR and, to a lesser extent, ICLE-TSW (although, again, the differences are not statistically significant, except for the difference between BrE and ICLE-SP, significant at the 0.005 level); the Dutch learners use approximately the same number of phrasal verbs as native speakers (see Table 6 for the exact figures). The amount of exposure may explain the difference observed between the Dutch and Tswana learners on the one hand and the French and Spanish learners on the other, as a high degree of exposure to the target language is said to be necessary in order to acquire phrasal verbs (Sjöholm 1995). The influence of the mother tongue may also be at work and account for the particularly good results of the Dutch learners, who have phrasal verbs in their mother tongue, unlike the other three groups of learners (see Waibel [2007] on the influence of the mother tongue background on the use of phrasal verbs). Whatever the reason(s) for these results, however, it is remarkable that the Dutch, French and Spanish ICLE subcorpora, once again, are ordered as predicted in Section 2. As for the Tswana subcorpus, it occupies an intermediate position, being situated in-between ICLE-DU and ICLE-FR. As is the case with (semi-)fixed expressions, we notice a tendency among the learners to repeat a small number of different phrasal verbs. Table 10 displays the type/token ratio of phrasal verbs with into in the four ICLE subcorpora and in the reference corpus. The type/token ratios for ICLE-FR and ICLE-SP have been put between brackets, as they correspond to only six and two tokens respectively (by contrast, there are over ten tokens in the other subcorpora). The type/token ratio in ICLE-TSW turns out to be particularly low (0.33). 
In the data, this translates

[Figure 5: bar chart; bars in the order BrE, ICLE-DU, ICLE-TSW, ICLE-FR, ICLE-SP; vertical axis from 0 to 9]

Figure 5. Relative frequency of phrasal verbs with into per 100,000 words




Table 10. Type/token ratio of phrasal verbs with into

Corpus      TTR
ICLE-FR     (1)
ICLE-SP     (1)
BrE         0.58
ICLE-DU     0.47
ICLE-TSW    0.33

into a very high degree of repetition of the phrasal verb look into (67% of all the phrasal verbs found in ICLE-TSW).

4.6 Non-standard uses

We also examined the non-standard uses of into. Although the results should be seen as merely indicative, since they rely on the judgement of one native speaker only,9 they still reveal interesting findings. As appears from Table 11, two ICLE varieties stand out: ICLE-SP with almost 16% of non-standard uses and ICLE-TSW with over 30%. In ICLE-FR and ICLE-DU, the non-standard rate stays well under 10%. The low proportion of non-standard uses among the French- and Dutch-speaking learners could be due to a “play-it-safe” strategy (Hulstijn & Marchena 1989): into is only used when the learners feel confident that they can use it. It could also be a reflection of a higher proficiency level and/or greater attention to form/accuracy during instruction (see Table 1). As for the many non-standard uses found in ICLE-SP and ICLE-TSW, they seem to have several origins. One of them is the confusion between static in and directional into, as illustrated by (10) and (11). In the Tswana subcorpus, this also happens with more fixed uses, e.g. (12).10

Table 11. Proportion of non-standard uses of into

Corpus      Non-standard uses
ICLE-FR     4.7%
ICLE-DU     6.5%
ICLE-SP     15.8%
ICLE-TSW    30.5%

9. Experienced Cambridge ESOL rater, native speaker of British English.

10. The same problem of confusion between in and into is mentioned by Mwangi (2003: 105–106) for Kenyan English, but with a focus on cases where in is used instead of into.


(10) The great problem of the prisioning is located into the jails or cells.
(11) People on the continent of Africa find themselves fallen or trapped into the net of HIV/AIDS due to the fact that, most africans are very poor, hence they cannot affort a living.
(12) This resulted into one scarverging for employment in other the cope with advanced life in the city. At the end it encourages prostitution.

Another source for non-standard usage is interference from the mother tongue. (13) and (14) are two instances of transfer of phraseological expressions from Spanish. The literal translation of fall into account in Spanish, caer en la cuenta, means ‘to realize’, and put into relevance also has a word-for-word equivalent in Spanish, poner en relevancia, meaning ‘to highlight’.

(13) Both men feel very bad, because they fall into account that they have treated very badly Mr. Hardcastle.
(14) She represents just another human being who has died because she wanted a change, and she dies because her world was not prepared for that change, this is put into relevance in the epilogue and in the final sentence of the play put in her mouth.

In addition, there are a number of instances, especially common in the Tswana subcorpus, where the non-standard use seems to be the result of creativity on the part of the learner. This is the case in the following sentences, all taken from ICLE-TSW:

(15) I plea to South African football association to take soccer into a serious consideration.
(16) In Uganda the government has tried to fight HIV/AIDS and this has come into fruition.
(17) Africa is by and by moving towards its last grave, this is due to the following unoticed facts, yet not taken into seriusness: Poverty is the ambrella “word” and it has other contributary factors which include the following, unemployment, wars and language.
(18) Safa should arrange with companies to request them to assist the clubs or sponsor them, therefore the attracting force at European teams must come into fiasco.

Although they do not belong to the repertoire of expressions with into in standard English, these expressions are perfectly understandable and thus enable the speaker to get his/her message across. Often, they seem to result from the extension of existing patterns (e.g. come into fruition [16] and come into fiasco [18] seem to be




built by analogy with expressions like come into being or come into contact) and/or blends (e.g. take into seriousness in example [17] could be interpreted as a blend of take into consideration and take seriously). In fact, we may wonder whether such creative uses should be considered as real errors, or rather as new prepositional verbs. To further illustrate this, consider the two examples below:

(19) The most important novels written by women were written by people without any experience of life that could enter into the house of a respectable clergyman.
(20) Soccer players don’t have to rely on soccer only they can open up their businesses, enter into the corporate world.

In both cases, the verb enter is followed by a noun phrase representing a place (the house of a respectable clergyman, the corporate world), a use which normally does not require the preposition into, but which seems to be licensed by the existence of expressions like enter into partnership or enter into discussions, through a process of “semantico-structural analogy” (Mukherjee & Hoffmann 2006: 166). The fact that this expression occurs several times in the Spanish and Tswana subcorpora forces us to reconsider its exact status, as does the presence of the expression in other ICLE subcorpora (the German component in particular), as well as in corpora representing indigenized varieties of English (including Singapore and Kenyan English, cf. Nesselhauf 2009).

The line is thin between errors and creative uses (see also Rimmer 2008). Yet, one must recognize that non-native speakers are often denied the right to creativity. As Bamgbose (1998: 1) aptly puts it, “[i]nnovations in non-native Englishes are often judged not for what they are or their function within the varieties in which they occur, but rather according to how they stand in relation to the norms of native Englishes. To this extent, it is no exaggeration to say that these innovations are torn between two sets of norms”. Mukherjee (2010) recommends upholding “the distinction between ‘norm-developing’ L2 speakers and ‘norm-dependent’ foreign-language learners of English”, which amounts to interpreting departures from native standards as errors in the case of learner English and as creative innovations in the case of institutionalized L2 varieties. While descriptive studies such as this one or Nesselhauf’s do not solve the problem of how to treat this type of usage, they at least have the merit of drawing attention to this crucial issue by highlighting the commonalities across several varieties of English.

5. Novice vs expert writing

The control corpus used in our study is a corpus of expert native writing. Some linguists, among others Hyland & Milton (1997), Lorenz (1999) and McCrostie


(2008), have criticized this type of reference variety on the basis that it sets too high a standard for EFL learners and suggested using a corpus of native student writing instead. To assess the impact of the native variety on the results, we revisited the analysis of into using the Louvain Corpus of Native English Essays (LOCNESS) as comparable data.11 If, as demonstrated in several studies (cf. Hyland & Milton 1997 and Neff van Aertselaer 2008), native and non-native students share a large number of novice writer characteristics, many of the differences highlighted in Section 4 might disappear.

The results paint a varied picture. For a number of features there is no difference between the two native varieties. For example, novice native writers display the same frequency of use of into as expert native writers (cf. Figure 6) and a high degree of similarity in the use of syntactic structures. As regards lexical variation, however, the lemma frequency displayed by novice native writers stands midway between expert writers and EFL learners (see Figure 7). This in-between status is confirmed by the results of the semantic analysis. On the one hand, LOCNESS is similar to BrE (and differs from the ICLE subcorpora) in having abstract movement as the most frequent sense. On the other, it is closer to the ICLE subcorpora in having (semi-)fixed expressions as the second most frequent sense, which suggests that “chunkiness” might be a transient feature in the acquisition of literacy.

[Figure 6: bar chart; bars in the order BrE, LOCNESS, ICLE-DU, ICLE-FR, ICLE-SP, ICLE-TSW; vertical axis from 0 to 160]

Figure 6. Relative frequency of into per 100,000 words (with LOCNESS)

11. While LOCNESS contains data produced by American students, we believe that this does not fundamentally affect the validity of the comparison, for preliminary analyses reveal that the frequency of into in the corpus of British editorials is not significantly different from its frequency in a comparable corpus of American English.



Gaëtanelle Gilquin and Sylviane Granger

[Figure 7: bar chart; bars in the order BrE, LOCNESS, ICLE-DU, ICLE-FR, ICLE-TSW, ICLE-SP; vertical axis from 0 to 80]

Figure 7. Relative frequency of verb lemmas with into per 100,000 words (with LOCNESS)

While confirming the fuzzy nature of the native/non-native distinction, our results show that the distinction cannot simply be abandoned in favour of one undifferentiated category of ‘novice writers’. EFL learners prove to display a number of unique characteristics that are not found in novice native writing and require dedicated pedagogical attention (see Gilquin et al. 2007 for further discussion of this issue).

6. Conclusion

Our study shows that the concept of ‘learner English’ needs to be broken down. Depending on a series of factors, notably the amount of exposure to the target language and the focus of language teaching, learner varieties display different degrees of similarity with the reference corpora. The term ‘learner Englishes’ reflects this diversity and is therefore more appropriate than the cover term ‘learner English’.

Table 12 summarizes the main results of the corpus analysis by showing how the different learner varieties are related to each other and to the reference variety with respect to a number of syntactic, semantic and lexical features. A mere glimpse at the table is enough to make obvious a number of striking similarities and differences. First, the expert native writing reference corpus (BrE) clearly stands out from all the learner varieties. Second, the Dutch, French and Spanish learner corpora display a high degree of consistency while the Tswana variety occupies a


Table 12. Summary table

Frequency
Causative structure
Lexical variation
Abstract movement
Freq. expressions
TTR expressions
Freq. phrasal verbs
TTR phrasal verbs
Non-standard/creative uses

TSW

TSW FR

TSW

SP SP SP SP SP SP SP – SP

TSW

DU

FR FR FR FR FR FR – FR

TSW

TSW TSW TSW

DU DU DU DU DU DU DU DU

TSW

BrE BrE BrE BrE BrE BrE BrE BrE BrE

range of different positions. Third, our hypothesis for the Dutch, French and Spanish varieties is largely confirmed: the Dutch learners are the closest to the reference corpus, followed by the French and the Spanish. Fourth, Tswana learner English, for which we found it hard to make any predictions, presents both similarities and differences with the other ICLE varieties. One particularly striking finding is the closeness between ICLE-TSW and ICLE-DU, the two learner populations that have benefited from a high degree of exposure to the target language, albeit of a different nature. This closeness was also established in a study of the passive (Granger 2009), which brought out a much more frequent use of the passive by the Tswana and Dutch learners than all the other learner populations in ICLE.

The Tswana variety clearly has a status of its own. Exactly what this status is is difficult to establish at this stage. In relation to ICLE-TSW, Van Rooy (2006: 62) claims that “[a] new outer circle variety of English is clearly emerging in South Africa”. For Kasanga (2006: 76), “it is reasonable to theorize that BSAE [Black South African English] is not a ‘learner language’”. However, Kasanga (2006: 77) further qualifies this statement: “It is important to point out that the form of BSAE which qualifies as a distinct variety in its own right is the ‘acrolang’ form which has reached a certain degree of stability, spread and prominence and excludes the ‘mesolang’ and ‘basilang’ forms”. The impression one gets from analyzing ICLE-TSW is that it rather qualifies as a mesolang form of BSAE. As such, it shares features with both inner/outer circle varieties of English and ‘mesolang’ varieties of the expanding circle, viz. learner English. As pointed out by Gilmour (2007), who describes a similar situation in Sri Lanka, extensive fieldwork is needed in order to identify the typical (i.e. stable, spread and prominent) linguistic features of the different varieties.
Another major finding of our study concerns the degree of expertise of the native speakers represented in the reference corpus. The results show that the




novice native writers share features with both the expert native writers and the non-native writers. This suggests that, while the degree of expertise is an important factor to take into account when comparing learner English with native English, it does not make the nativeness/non-nativeness distinction redundant. Rather, it adds a layer to our understanding of the learner variety, which appears to be characterized by non-native as well as non-expert features.

SLA specialists have been aware for quite some time that the EFL/ESL distinction is not a clear-cut dichotomy but a continuum, with many factors pulling language varieties in one or the other direction. In spite of its limited scope, our investigation of the use of into by students learning English in different environments has brought out the power of corpus linguistic methods in substantiating this continuum. In particular, the striking contrast between the Dutch, French and Spanish learners and the Tswana learners has shed some light on the hazy border between the expanding and the outer circle. While the results are promising, however, the field is vast and complex and we can only claim to have lifted a very small corner of a much larger veil.

References

Bamgbose, A. 1998. Torn between the norms: Innovations in world Englishes. World Englishes 17(1): 1–14.
De Cock, S. & Granger, S. 2004. High frequency words: The bête noire of lexicographers and learners alike. A close look at the verb make in five monolingual learners’ dictionaries of English. In Proceedings of the 11th EURALEX International Congress, G. Williams & S. Vessier (eds), 233–243. Lorient: Université de Bretagne-Sud.
Gass, S. & Selinker, L. 2001. Second Language Acquisition: An Introductory Course. Mahwah NJ: Lawrence Erlbaum Associates.
Gilmour, K. 2007. World Englishes and Sri Lanka.
Gilquin, G. 2000/2001. The Integrated Contrastive Model: Spicing up your data. Languages in Contrast 3(1): 95–123.
Gilquin, G., Granger, S. & Paquot, M. 2007. Learner corpora: The missing link in EAP pedagogy. Journal of English for Academic Purposes 6(4): 319–335.
Ginsburgh, V. & Weber, S. 2006. La dynamique des langues en Belgique. Regards économiques 42: 1–10.
Granger, S. 1996. From CA to CIA and back: An integrated approach to computerized bilingual and learner corpora. In Languages in Contrast: Text-based Cross-linguistic Studies, K. Aijmer, B. Altenberg & M. Johansson (eds), 37–51. Lund: Lund University Press.
Granger, S. 2009. More lexis, less grammar? What does the learner corpus say? Keynote presentation at the Third International Conference Grammar and Corpora, Mannheim, 22–24 September 2009.

From EFL to ESL  Granger, S., Dagneaux, E., Meunier, F. & Paquot, M. 2009. International Corpus of Learner English. Handbook and CD-ROM. Version 2. Louvain-la-Neuve: Presses universitaires de Louvain. . Hasselgren, A. 1994. Lexical teddy bears and advanced learners: A study into the ways Norwegian students cope with English vocabulary. International Journal of Applied Linguistics 4(2): 237–258. Hulstijn, J. & Marchena, E. 1989. Avoidance: Grammatical or semantic causes? Studies in Second Language Acquisition 11(3): 241–255. Hyland, K. & Milton, J. 1997. Qualifications and certainty in L1 and L2 students’ writing. Journal of Second Language Writing 6(2): 183–205. Kao, R.-R. 2001. Where have the prepositions gone? A study of English prepositional verbs and input enhancement in instructed SLA. IRAL 39: 195–215. Kasanga, L.A. 2006. Requests in a South African variety of English. World Englishes 25(1): 65–89. Koolstra, C. & Beentjes, J. 1999. Children’s vocabulary acquisition in a foreign language through watching subtitled TV programmes at home. In Educational Technology Research and Development 47(1): 51–60. Liao, Y. & Fukuya, Y.J. 2004. Avoidance of phrasal verbs: The case of Chinese learners of English. Language Learning 54(2): 193–226. Lorenz, G. 1999. Adjective Intensification – Learners versus Native Speakers: A Corpus Study of Argumentative Writing. Amsterdam: Rodopi. Lozano, C. & Mendikoetxea, A. 2008. Postverbal subjects at the interfaces in Spanish and Italian learners of L2 English: A corpus analysis. In Linking up Contrastive and Learner Corpus Research, G. Gilquin, S. Papp & M.B. Díez-Bedmar (eds), 85–125. Amsterdam: Rodopi. McCrostie, J. 2008. Writer visibility in EFL learner academic writing: A corpus-based study. ICAME Journal 32: 97–114. Mollin, S. 2006. Euro-English: Assessing Variety Status. Tübingen: Gunter Narr. Mukherjee, J. 2010. 
Corpus-based insights into verb-complementational Innovations in Indian English: Cases of nativised semantico-structural analogy. In Grammar between Norm and Variaton, A.N. Lenz & A. Plewina (eds), 219–241. Frankfurt am Main: Peter Lang. Mukherjee, J. & Hoffmann, S. 2006. Describing verb-complementational profiles of New Englishes: A pilot study of Indian English. English World-Wide 27(2): 147–173. Mwangi, S. 2003. Prepositions in Kenyan English: A Corpus-based Study in Lexico-grammatical Variation. Aachen: Shaker. Mwangi, S. 2004. Prepositions vanishing in Kenya. English Today 20(1): 27–32. Neff van Aertselaer, J. 2008. Contrasting English-Spanish interpersonal phrases: A corpus study. In Phraseology in Foreign Language Learning and Teaching, F. Meunier & S. Granger (eds), 85–99. Amsterdam: John Benjamins. Nesselhauf, N. 2009. Co-selection phenomena across New Englishes: Parallels (and differences) to foreign learner varieties. English World-Wide 30(1): 1–25. Rimmer, W. 2008. Grammatical creativity in learner corpora. Humanising Language Teaching 10(1). Rundell, M. (ed.). 2007. Macmillan English Dictionary for Advanced Learners, 2nd edn. Oxford: Macmillan Education. Sand, A. 2005. Angloversals? Shared Morpho-syntactic Features in Contact Varieties. Habiliationsthesis, University of Freiburg. Sheen, R. 2006. Comments on R. Ellis’s ‘Current Issues in the Teaching of Grammar: An SLA perspective’. TESOL Quarterly 40(4): 828–832.



Siyanova, A. & Schmitt, N. 2007. Native and nonnative use of multi-word vs. one-word verbs. IRAL 45: 119–139.
Sjöholm, K. 1995. The Influence of Crosslinguistic, Semantic and Input Factors on the Acquisition of English Phrasal Verbs. PhD dissertation, Åbo Akademi University Press.
Swan, M. 2005. Practical English Usage. 3rd edn. Oxford: OUP.
Van Parijs, P. 2004. Europe’s linguistic challenge. Archives Européennes de Sociologie 45(1): 113–154.
Van Rooy, B. 2006. The extension of the progressive aspect in Black South African English. World Englishes 25(1): 37–64.
Van Rooy, B. 2009. The status of English in South Africa. In International Corpus of Learner English. Version 2. Handbook and CD-ROM, S. Granger, E. Dagneaux, F. Meunier & M. Paquot (eds), 198–204. Louvain-la-Neuve: Presses universitaires de Louvain.
Waibel, B. 2007. Phrasal Verbs in Learner English: A Corpus-based Study of German and Italian Learners. PhD dissertation, Albert-Ludwigs-Universität Freiburg.
Wehmeier, S. (ed.). 2000. Oxford Advanced Learner’s Dictionary. Compass CD-ROM. Oxford: OUP.

Formulaic sequences in spoken ENL, ESL and EFL
Focus on British English, Indian English and learner English of advanced German learners*

Sandra Götz and Marco Schilk

Justus Liebig University, Giessen & Macquarie University, Sydney and Justus Liebig University, Giessen

In this pilot study we set out to compare formulaic sequences of the type of 3-grams in ENL (spoken British English), EFL (English spoken by advanced German learners of English) and ESL (spoken Indian English). The study shows that, for the overall number of types and tokens, there are no significant differences between ENL and ESL, but there are significantly fewer 3-grams in EFL vs. ENL. A comparison of the common core (i.e. the 3-grams all three variants have in common) reveals that these common-core 3-grams are significantly more frequently used in ESL and EFL-variants compared to ENL. A functional analysis shows differences in the distribution of the 3-grams across the variants. A study of the variant-specific 3-grams reveals less variability in EFL vs. ENL but a higher number and variability of both types and tokens in ESL.

1. Introduction: Comparing ESL and EFL communities and speakers

While both the description of new varieties of English and the description of features of learner English have been the focus of many recent corpus-linguistic studies, a comparison between these two variants¹ has so far only rarely been undertaken, laudable exceptions being, inter alia, the studies by Williams (1987) or Nesselhauf (2009). Although there are some obvious parallels between speakers of new varieties of English and learners of English as a foreign language, such comparisons have long been almost a taboo, since they are often considered counterproductive to the acceptance of emergent norms in second-language varieties of English and might thus be in stark contrast to the emancipatory stance of scholars such as Kachru. In the present pilot study we set out to account for parallels and differences between those variants of English on the lexicogrammatical level of frequent multiword expressions.

* We would like to thank the audience at the ISLE Conference for a stimulating discussion, Christopher Koch for technical support, Rosemary Bock for her native-speaker expertise and Joybrato Mukherjee and Marianne Hundt for various comments and detailed feedback on earlier versions of this paper. Of course, all remaining errors or infelicities are our responsibility alone.

¹ Note that we will use the term ‘variant’ to refer to ENL, ESL, and EFL in the following, because we do not consider EFL a variety, whereas the neutral term variant seems applicable to all the investigated Englishes, i.e. British English, Indian English and English spoken by German learners.

In order to compare spoken language of ESL-speakers (i.e. speakers of a new variety of English) and EFL-speakers (i.e. language learners) it is useful to give a short introduction to the terminology applied. The functional classification system of ENL, ESL and EFL, as used, for example, by Görlach (1991), is, strictly speaking, not speaker-based but community-based, so that it would be more appropriate to talk about ENL, ESL and EFL-communities rather than speakers.

One of the distinctions between these different communities is the functional and formal range in which English is used: While in the ESL-context English is used for a variety of international as well as intranational functions, in EFL-communities English is mainly used for international purposes or in restricted institutionalized settings such as schools, universities and internationally acting companies. However, even in these specific settings English is the topic of instruction rather than its medium. When generalizing from EFL and ESL-communities to the respective speakers of different variants within the communities, it should be borne in mind that the functional range in which the two types of speakers use English also differs individually.
Apart from these functional differences, however, there are a number of parallels between speakers of ESL-varieties and learners in the EFL context. The most obvious of these parallels is the process of language acquisition that can be classified as a second language acquisition process in both cases. However, as Sridhar & Sridhar (1986) point out, the second language acquisition process of speakers of indigenized varieties of English (to use their term) differs in various points from the second language acquisition process of EFL-learners. In the case of ESL-speakers, neither are target norms automatically native-speaker norms, nor is the main function of the variant NS-NNS-speaker communication, but to a much larger extent NNS-communication within the respective community across a large scope of inner-community functions (cf. Sridhar & Sridhar 1986: 5–7). With regard to target norms Williams (1987) points out that:

In NIVE [non-native institutionalized varieties of English] situations [...] [t]he original target is often no longer easily accessible (or even desirable) for most speakers. Instead the regional variety has become the standard and the target. Although an exonormative standard may be maintained officially, by and large the input is from the new variety, both in and out of the classroom. (Williams 1987: 164)

This difference in the target norms between EFL-learners and ESL-speakers is in turn also the yardstick for the classification of deviant language production. While the language of EFL-speakers would be classified as deviant if it differs from native-speaker target norms, in “norm-developing” (Kachru 1996: 138) communities the target norm itself may deviate from native-speaker norms. Therefore a deviation from native-speaker norms does not automatically deviate from the target norm, the institutionalized ESL-norm. Thus, although “certain forms [...] found in NIVE strongly resemble forms found in learner language, and at one time, in fact, may have been the process of individual language acquisition” (Williams 1987: 163) these forms have often become endonormatively stabilized features of a variety and can be seen as the accepted target norms for many of its speakers.

This difference with respect to target norm also plays a role in the classroom situations in ESL and EFL-communities. At first glance there are clear parallels of the second language acquisition processes between ESL-speakers and their EFL-counterparts. In both cases English is not acquired in early childhood but rather in school. In ESL as well as in EFL-communities the majority of English teachers are not native speakers of English, so that here stabilization of transfer phenomena may be likely. Owing to the differences in target norms, however, different attitudes about the speakers’ language output exist: Deviations from the native-speaker target norm may be deemed unsuccessful language acquisition in learner contexts, but may be accepted in ESL-contexts as here the target norm is the nativized variety. The most transparent example of this is pronunciation.
While EFL-learners often wish to acquire a native-like accent, for speakers of ESL-varieties the accent spoken in their speech community may fulfil their linguistic needs better than a ‘foreign’ native-like accent; a case in point are the recommendations of Nihalani et al. (2004: 204) for the pronunciation of educated Indian English, where they point out that “[p]ronunciation teaching should have not a goal which must of necessity be normally an unrealized ideal but a limited purpose which will be completely fulfilled: the attainment of intelligibility”. This goal of intelligibility leads them to propose a model for the pronunciation of educated Indian English that significantly differs from RP and other native speaker accents. Such recommendations are at least highly unlikely to be educational goals of EFL-communities.

Another area that needs some attention is the question of linguistic input and functional range of usage. While in EFL-communities linguistic input is mainly restricted to school settings, more precisely to language classes at school, in ESL-communities this input is arguably much more varied. Firstly, within classroom contexts English is often the medium of education for subjects other than English and secondly there is a larger variety of settings within the speech community in which English is used. This is especially the case in communities where English functions as a link language between speakers with different L1-systems. Furthermore, in ESL-settings there is usually a large variety of English-language media, such as newspapers, books or radio and television programs. However, in contrast to ENL-contexts there are certain domains in ESL-communities where English is rarely used, for example, in informal or more intimate settings, as well as in ritual and religious contexts.

These parallels and differences in the English language acquisition of ESL and EFL-speakers are the ground for our present analysis of the use of recurrent multiword units in ENL, ESL and EFL-spoken language. The investigation of frequently recurrent multiword units is motivated by the assumption that owing to the second-language acquisition process that differs in many respects from first-language acquisition, speakers of ESL and EFL-variants will differ in their use and their repertoire of these units from ENL-speakers. The differences between language acquisition in early childhood and school-based second-language acquisition will be the focus of the following section.

The above-mentioned differences in the functional range of usage in different domains as well as the discussion of applicable target norms motivate our choice of spoken rather than written data. The data we use for the analysis consists of spoken language, since it is more likely that variant-specific features will be traceable in spoken data. This stems from the fact that many of the text-types included in standard written corpora are highly conventional so that in these text-types, text-type conventions may overcompensate variant-specific variation. In a study on variation between British English and American English, Mair (2007) points out that:

[...]
a context-sensitive definition of variation makes it instantly obvious why there are only minimal lexico-grammatical contrasts between the two major national standards in writing. Formal written language, after all, is highly decontextualised by definition. Lexico-grammatical contrasts increase in informal and spoken language, roughly in proportion to increasing degrees of contextualisation. (Mair 2007: 98)

Apart from these observations on the general unifying nature of the written medium, written texts are often post-edited and therefore features that differ from the norm the editor upholds may get altered and thus the resultant text may not reflect the original speaker’s production. Mesthrie & Bhatt (2008) formulate this caveat as follows: However, writing has its own conventions, some of which have little connection with features of speech. Moreover, we rarely have information on the editing process accompanying the written efforts cited. (Mesthrie & Bhatt 2008: 41)

This editing process may be of specific importance in the case of published writing, since both EFL as well as ESL-speakers are often advised to seek native-speaker counsel before publishing an English text, whereas the spoken data we use in the present analysis is largely unedited.

2. Frequent multiword expressions in ENL, ESL and EFL

Since Sinclair’s (1991) definition of the idiom principle, research into prefabricated linguistic units has been steadily growing. Broadly, it is possible to differentiate two main approaches in this line of research: A quantitative (corpus-linguistic) approach and a psycholinguistic approach. While the quantitative approach aims at showing which strings of language are frequently recurrent, the psycholinguistic approach is, inter alia, designed to capture the cognitive salience of these strings as prefabricated units that are stored and retrieved holistically. The present pilot study is in line with the quantitative approach, but a closer look at the psycholinguistic background is worthwhile, because, for example, the different speaker types under scrutiny have undergone slightly different language-acquisition processes. Especially the difference between first-language acquisition in early childhood and the acquisition of a second or foreign language bears some influence on the use of formulaic language. Child language acquisition is by its very nature more holistic than analytic, as Peters (1983) points out:

It is not a dictionary of morphemes that the child is exposed to, but rather an intermittent stream of speech sounds containing chunks, often longer than a single word, that recur with varying frequency. It is out of this stream of unknown meaning and structure that the child must attempt to capture some pieces in order to determine their meaning and preserve them for future use. (Peters 1983: 5)

Because children who acquire a language do not learn words or morphemes but rather meaningful units that may be analyzable into smaller component parts, formulae play a larger role in first-language acquisition than in the learning of a second or foreign language (excluding highly conventionalized formulae that are learned at very early stages of the language acquisition process, e.g. routine greeting formulas). In EFL, this process usually sets in considerably later and can be considered as a learner’s correct application of the rules of how formulae are used in specific contexts (cf. Weinert 1995), whereas ENL-speakers solely use their intuition to automatically use the most native-like and idiomatic formula (cf. Pawley and Syder 1983). This is due to the fact that, in contrast to children who acquire a first language, learners of a second or foreign language already know a language system and are familiar with the concept of small component units that can be used to build larger units.

Furthermore, language learners are usually familiar with a writing system and language learning is often based on textbooks. Written text, however, is unlike natural speech, as there is an (artificial) demarcation of word or syllable units in the form of spaces. These spaces do not necessarily correspond to auditory breaks within utterances. At the risk of some oversimplification, one could say that children assign meaning to the units of spoken language in L1 acquisition whereas learners of a second/foreign language assign meaning to the units of written language since they are often taught specific syntactic rules, vocabulary and morphology, so that their acquisition process is far more analytic.² Furthermore, adult learners have already experienced the properties of the different media in their native language and know that there is only indirect transferability between speech and writing.

Esser (2006) describes the distinction of medium-dependent and medium-independent structures on three different levels of abstraction (LoA) as summarized in Figure 1. Although Esser’s (2006) model is not intended to account for differences between childhood first-language acquisition and adult second-language acquisition, his model is helpful in illustrating the differences in these processes. The first level of abstraction captures the medium-dependent structures, phonic substance in the spoken medium and graphic substance in the written medium. “The second level of abstraction lies in the recognition of medium-independent word-forms which undergo medium transferability without loss of information [...;] [t]he third level of abstraction lies in the recognition of higher levels of articulation, [...]

[Figure 1: diagram not reproduced; surviving labels: Phonic substance, Graphic substance, word-form]
[...] (p > 0.05 for both types and tokens). This might give a first hint that there are no major differences between the occurrence and use of formulaic language in ENL and ESL, at least from a quantitative perspective. For the comparison of LOCNEC1–41 and LINDSEI-GE, however, the picture is slightly different: The frequency of occurrence of both types and tokens of 3-grams is significantly lower in LINDSEI-GE (p < 0.05 for types and p < 0.0001 for tokens). These findings may indicate that German learners of English have less access to formulaic language than ENL and ESL-speakers; this is shown by the learners’ overall infrequent use of 3-grams (represented by the total number of tokens) along with the restricted variability in their use (displayed in the significantly lower number of types) (this has also been found in earlier studies, e.g. De Cock 2000).

Table 3. Overview of all 3-grams in the corpora

                                ICE-India86k   ICE-GB86k    LOCNEC1–41   LINDSEI-GE
Corpus size                     86,273         86,733       88,110       86,206
Number of 3-word clusters       612            588          1,198        1,068
Frequency of 3-word clusters    5,195          5,160        11,432       9,846
%                               18.07          17.84        38.92        34.26
p types                         p > 0.05 (G2 = 0.62)        p < 0.05 (G2 = 4.89)
p tokens                        p > 0.05 (G2 = 0.38)        p < 0.0001 (G2 = 88.22)

What might at first come as a surprise, however, is the enormous difference in frequencies of 3-grams between the corpus families: As Table 3 shows, there are roughly twice as many 3-grams in LINDSEI-GE/LOCNEC1–41 as there are in the two ICE86k-corpora and these differences also pertain to the two ENL representatives. This, however, might be caused by the different corpus designs: As in both LOCNEC1–41 and LINDSEI-GE the speakers talk about very similar or even exactly the same topics (e.g. a stay abroad and a particular picture story), it is very
likely for them to use the same 3-grams to do so, and thus the required threshold of a minimum frequency of five is reached more easily. The ICE data from broadcast discussions, interviews and unscripted speech, on the other hand, are not quite as similar topic-wise, and the likelihood that the speakers use the same clusters and thus reach the required minimum of five occurrences is not as high as for the other corpus family.

4.2 Common core of 3-grams in ENL, ESL and EFL

Despite the quantitative differences in 3-gram frequencies between the three variants, we also expected a certain degree of functional similarity. Since all three variants are exponents of the English language (or, in McArthur’s (2003) and Mesthrie and Bhatt’s (2008) terminology, part of the ‘English Language Complex’), a certain degree of overlap is to be expected. This expectation is in line with Quirk et al.’s (1985: 16) concept of a common core or nucleus:

A common core or nucleus is present in all varieties so that, however esoteric a variety may be, it has running through it a set of grammatical and other characteristics that are present in all the others. It is that fact that justifies the application of the name ‘English’ to all the varieties. (Quirk et al. 1985: 16)

In the second step of our data analysis, we compared the 3-grams of all four corpora, categorized them functionally according to the taxonomy outlined in Table 2 and found 54 3-grams shared by all four corpora (cf. Table 4). They form what we wish to call the 3-gram nucleus (or 3-gram common core) of our data.

Table 4. 3-gram nucleus of ICE-India86k, ICE-GB86k, LOCNEC1–41 and LINDSEI-GE

Functional category: 3-grams
speaker-oriented (18): and I think, I think that, I don’t think, I think the, I think it, but I think, I think we, well I think, I think it’s, so I think, because I think, I would like, would like to, I want to, I wanted to, I have to, when I was, I don’t know
topic-oriented (23): the first time, all over the, this is a, there was a, it was a, it was the, and it was, one of the, a lot of, part of the, most of the, a little bit, some of the, at the moment, a kind of, out of the, which is a, one or two, in front of, that was the, in the last, it is a, two or three
audience-oriented (3): and you can, you have to, you want to
text-organizational (2): and of course, as far as
others (8): have to be, be able to, to be a, to have a, to go to, and in the, would have been, want to do
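The procedure behind the 3-gram nucleus (counting 3-grams, applying the minimum frequency of five, and keeping only what all corpora share) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script used in the study: the function names `trigram_counts` and `common_core` and the toy token lists are hypothetical, and the sketch ignores speaker turns, punctuation and transcription mark-up.

```python
from collections import Counter


def trigram_counts(tokens, min_freq=5):
    """Count 3-grams in a list of tokens; keep those occurring at least min_freq times."""
    counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return {" ".join(gram): n for gram, n in counts.items() if n >= min_freq}


def common_core(*corpora):
    """Return the 3-grams found in every corpus (set intersection of the types)."""
    shared = set(corpora[0])
    for counts in corpora[1:]:
        shared &= set(counts)
    return shared


# Toy token lists standing in for two corpora (hypothetical, not study data).
corpus_a = ("i think that " * 5 + "a lot of " * 5).split()
corpus_b = ("i think that " * 5 + "one of the " * 5).split()

counts_a = trigram_counts(corpus_a)
counts_b = trigram_counts(corpus_b)
nucleus = common_core(counts_a, counts_b)
print(sorted(nucleus))
```

In this toy case only "i think that" reaches the threshold in both lists, so it is the only member of the shared set. Applying the threshold per corpus before intersecting mirrors the observation above that topically homogeneous corpora reach the minimum of five more easily.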

It is noteworthy that the 3-gram nucleus includes items from all of our functional categories (except for performance phenomena). Especially the high number of speaker-oriented (N = 18) and topic-oriented 3-grams (N = 23) is relevant to the 3-gram nucleus shared by all the variants of English under scrutiny.

4.2.1 Quantitative findings for the 3-gram nucleus

Table 5 provides the overall frequencies of the common-core 3-grams listed in Table 4. These 3-grams are considerably more frequent in LOCNEC1–41 and LINDSEI-GE compared to the ICE-corpora, which might again be due to the corpus design as outlined in Section 4.1. A further interesting point, however, is the significantly higher frequency of the common-core 3-grams in both ICE-India86k (p < 0.0001) and LINDSEI-GE (p < 0.01) in comparison to their ENL-counterparts. It thus seems that speakers of both ESL and EFL-variants of English rely more strongly on the 3-gram nucleus and would consequently be expected to show less variation as well as less variability in the use of formulaic language at the level of 3-grams.⁴

Table 5. Number of 3-grams occurring in ICE-India86k, ICE-GB86k, LOCNEC1–41 and LINDSEI-GE

                                        ICE-India86k   ICE-GB86k    LOCNEC1–41   LINDSEI-GE
Frequency of 3-word nucleus clusters    1018           738          1274         1414
%                                       3.54           2.55         3.74         4.92
p types                                 p < 0.0001 (G2 = 44.34)     p < 0.01 (G2 = 10.67)

⁴ For the EFL-speakers, this observation might be interpreted as an overuse which is similar to Hasselgren’s (1994) observation of learners’ use of ‘lexical teddy bears’ on the lexical level. In our study, the EFL-speakers rely more strongly on the nuclear 3-grams and show less variation in the variant-specific 3-grams (see Section 4.3). Note, however, that this is not observed for ESL-speakers, who show both significantly more nuclear and more variant-specific 3-grams than the ENL-speakers.

4.2.2 Distribution of 3-grams in the nucleus

In order to investigate whether there are possible structural or functional preferences in one of the variants, we analyzed the common-core 3-grams from a functional perspective. As illustrated in Figure 2, there is a frequent use of speaker-oriented nuclear 3-grams in LOCNEC1–41 (46%) and especially in LINDSEI-GE (57%), while there are more topic-oriented 3-grams in the two ICE-corpora.

[Figure: bar chart, percentage of nuclear 3-grams per functional category (speaker-oriented, topic-oriented, audience-oriented, text-organizational, others) in ICE-India86k (N = 1018), ICE-GB86k (N = 738), LOCNEC1–41 (N = 1274) and LINDSEI-GE (N = 1414)]

Figure 2. Functional analysis of the 3-gram nucleus in ICE-India86k, ICE-GB86k, LOCNEC1–41 and LINDSEI-GE

Whereas topic-oriented 3-grams are equally distributed across the ICE-corpora (a proportion of c. 40% in each corpus), there is a discrepancy between LOCNEC1–41 (38%) and LINDSEI-GE (27%). Text-organizational nuclear 3-grams are rare in all corpora (there are only two of them anyway, cf. Table 4). However, text organization is a crucial function in speech and the low percentages lead to the hypothesis that text-organizational functions tend to be taken over by non-nuclear 3-grams: Either text-organizational 3-grams are extremely variant-specific and therefore there is not a great overlap between the variants, or they might not be internalized in one of the variants, which has, for example, been shown for the use of discourse markers in EFL (e.g. Müller 2005). Another remarkable finding is the lack of performance phenomena in the nucleus, but this observation may be caused by differences in transcription conventions between the ICE corpora, LINDSEI-GE and LOCNEC1–41.⁵

⁵ While, for example, filled pauses in LOCNEC and LINDSEI-GE are transcribed as er/erm/eh/ehm/em/uhm/um/uh/urm/uh/mm (depending on their communicative function), the ICE-convention for transcribing filled pauses is erm, which may not have been used consistently in all available ICE-corpora. This might have led to orthographically different realizations of 3-grams, which thus might not have been recognized by our Perl-script.




4.2.3 Speaker- and topic-oriented 3-grams in the nucleus

After having looked at similarities and differences in the distribution of 3-grams in the 3-gram nucleus across the functional categories, we would now like to focus on the group of speaker-oriented 3-grams in order to find possible explanations for the high degree of variability between the variants. As illustrated in Table 6, while the distribution of some of the 18 speaker-oriented nuclear 3-grams is generally stable across the four corpora, there are also a few major differences to be mentioned. Some of these differences can be explained by the differences in corpus design, such as the high percentages of the epistemological signals ‘and I think’, ‘I think that’ and ‘I think the’ in the ICE-corpora: They are the genre-specific preferred choices in discussions especially for ENL, but also for ESL.

Table 6. Speaker-oriented 3-gram nucleus in ICE-India86k, ICE-GB86k, LOCNEC1–41 and LINDSEI-GE

3-gram             ICE-India86k     ICE-GB86k        LOCNEC1–41       LINDSEI-GE
                   #      %         #      %         #      %         #      %
and I think        28     9.8       74     18.4      25     4.9       52     6.8
I think that       28     9.8       52     12.9      12     2.3       27     3.5
I don’t think      20     7.0       39     9.7       38     7.4       19     2.5
I think the        18     6.3       33     8.2       12     2.3       29     3.8
I think it         14     4.9       22     5.5       27     5.3       34     4.4
but I think        13     4.5       17     4.2       28     5.5       31     4.0
I think we         11     3.8       9      2.2       7      1.4       9      1.2
well I think       10     3.5       12     3.0       7      1.4       14     1.8
I think it’s       9      3.1       32     8.0       39     7.6       52     6.8
so I think         9      3.1       6      1.5       16     3.1       21     2.7
because I think    5      1.7       6      1.5       5      1.0       10     1.3
I would like       28     9.8       11     2.7       7      1.4       28     3.6
would like to      37     12.9      10     2.5       7      1.4       34     4.4
I want to          23     8.0       16     4.0       25     4.9       33     4.3
I wanted to        9      3.1       14     3.5       30     5.9       25     3.3
I have to          7      2.4       9      2.2       17     3.3       39     5.1
when I was         10     3.5       13     3.2       40     7.8       28     3.6
I don’t know       8      2.8       27     6.7       170    33.2      284    36.9

Another interesting finding is the high percentage of the 3-grams ‘I would like’ and ‘would like to’ in ICE-India86k. A tentative explanation for the frequency of these strings instead of their contracted variants could be the more formal nature of ESL-varieties like Indian English, leading to an archaic and writing-oriented overall flavour, as for example pointed out by Kachru (1983). However, due to the lack of source recordings from ICE-India, alteration of contracted forms in the transcription process cannot be completely ruled out, a point that adds to the tentative nature of this hypothesis.

The most prominent case in point in this group, lending support to this hypothesis, is the 3-gram ‘I don’t know’, which is considered the most frequent 3-gram in speech (cf. Biber et al. 1999). While there is a high proportion of the 3-gram in LOCNEC1–41 (33.2%) and LINDSEI-GE (36.9%), the proportion is extremely low in ICE-India86k (2.8%). However, this might also be due to corpus design, because the proportion is also very low in ICE-GB86k (6.7%), but it is still twice as high as in ICE-India86k. Although the percentages are equally high in both LOCNEC1–41 and LINDSEI-GE we still find functional differences in the use of ‘I don’t know’ between ENL and EFL-speakers. Consider examples (1) and (2):

(1) yeah so .. it should be okay definitely yeah I don’t know I mean like they said in one of the lectures that you can go and no (LOCNEC1–41)
(2) to do a . Sabbath year you know that . erm I I’d know I don’t know if this is expression is correct in English . sabba = oh sabbatical (LINDSEI-GE)

While in ENL ‘I don’t know’ is most frequently used as an expression to refer to the content of the utterance (i.e. the speaker is uncertain about something that, as in example (1), “should be okay”), in EFL the same 3-gram is most frequently used to express uncertainty about the linguistic features of speakers’ utterances (i.e. the speaker is uncertain about the correctness of an expression or a word they utter, as in example (2)). In EFL this happens in the majority of cases, which shows that learners have not yet internalized the hedging function of the 3-gram ‘I don’t know’.

The second group we investigated more closely are topic-oriented nuclear 3-grams. As illustrated in Table 7, there seems to be an overall preference for topic-oriented 3-grams in ENL and ESL, but the majority of topic-oriented nuclear 3-grams do not show great distributional differences. However, there is a more frequent use of specific patterns in the non-native variants, like ‘this is a’, ‘it is a’ (in ICE-India86k) and ‘a little bit’ (in LINDSEI-GE) compared to the native-speaker corpora. Especially the highly frequent use of ‘a little bit’ in EFL might again show the smaller variability the learners have in their 3-gram stock, while native speakers might have access to other semantically similar formulae to choose from, e.g. ‘quite a few’.

Table 7. Topic-oriented 3-gram nucleus in ICE-India86k, ICE-GB86k, LOCNEC1–41 and LINDSEI-GE

3-gram           ICE-India86k     ICE-GB86k        LOCNEC1–41       LINDSEI-GE
                 #      %         #      %         #      %         #      %
the first time   8      2.7       7      1.7       9      1.9       9      2.3
all over the     8      2.7       5      1.2       9      1.9       8      2.1
this is a        27     9.2       20     5.0       7      1.5       10     2.6
there was a      11     3.7       19     4.7       18     3.8       14     3.6
it was a         6      2.0       39     9.7       53     11.1      33     8.5
it was the       6      2.0       10     2.5       10     2.1       13     3.4
and it was       6      2.0       16     4.0       70     14.7      47     12.1
one of the       39     13.3      63     15.6      34     7.1       14     3.6
a lot of         24     8.2       48     11.9      100    20.9      71     18.3
part of the      23     7.8       19     4.7       5      1.1       7      1.8
most of the      6      2.0       11     2.7       15     3.1       17     4.4
a little bit     5      1.7       19     4.7       17     3.6       54     14.0
some of the      16     5.4       23     5.7       19     4.0       7      1.8
at the moment    6      2.0       22     5.4       43     9.0       14     3.6
a kind of        11     3.7       21     5.2       8      1.7       8      2.1
out of the       13     4.4       14     3.5       6      1.3       6      1.6
which is a       14     4.8       9      2.2       11     2.3       5      1.3
one or two       9      3.1       8      2.0       5      1.1       7      1.8
in front of      5      1.7       7      1.7       6      1.3       14     3.6
that was the     6      2.0       7      1.7       12     2.5       11     2.8
in the last      11     3.7       6      1.5       5      1.1       6      1.6
it is a          28     9.5       6      1.5       9      1.9       5      1.3
two or three     6      2.0       5      1.2       7      1.5       7      1.8

4.3 Variant-specific 3-grams in ENL, ESL and EFL

Apart from analyzing the 3-gram common core shared by all the variants, it is equally interesting to have a closer look at what features distinguish ENL, ESL and EFL-variants, i.e. which 3-grams can be labelled variant-specific and non-nuclear. Thus, we filtered and analyzed the clusters that are specific to only one of the

variants. As we have pointed out above, the corpora vary in design and annotation, so that we decided to analyze the variant-specific 3-grams only within a corpus family (i.e. ICE-India86k vs. ICE-GB86k, and LOCNEC1–41 vs. LINDSEI-GE).

4.3.1 Quantitative analysis of variant-specific 3-grams in ENL, ESL and EFL

Table 8 provides an overview of the variant-specific 3-grams that exclusively occurred in only one of the four corpora. A prominent finding shown in Table 8 is the higher frequency of variant-specific 3-grams in LOCNEC1–41 and LINDSEI-GE compared to the ICE-corpora. This is in line with the overall frequency of 3-grams (see Table 3) and can again be explained by the homogeneous content and the similar interview topics of the two corpora leading to a much higher probability of reaching the threshold of five occurrences.
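The filtering step described here can be illustrated with a short Python sketch. It assumes that each corpus has already been reduced to a frequency list of 3-grams above the threshold; the names `variant_specific` and `summarize` and the example frequency lists are hypothetical. Which corpora are passed as comparison corpora (only the partner in the same corpus family, or all three others) is left to the caller, since the text allows both readings.

```python
def variant_specific(own, *others):
    """Keep the 3-grams of `own` that do not occur in any of the comparison corpora."""
    seen_elsewhere = set().union(*others)
    return {gram: n for gram, n in own.items() if gram not in seen_elsewhere}


def summarize(counts, corpus_size):
    """Return (types, tokens, tokens as a percentage of corpus size)."""
    types = len(counts)
    tokens = sum(counts.values())
    return types, tokens, round(100 * tokens / corpus_size, 2)


# Hypothetical frequency lists (3-gram -> frequency), already thresholded.
efl = {"i don't know": 284, "a little bit": 54, "er i think": 12}
enl = {"i don't know": 170, "a little bit": 17, "sort of thing": 9}

efl_only = variant_specific(efl, enl)
print(efl_only, summarize(efl_only, 86206))
```

With the token count and corpus size reported for ICE-India86k (3773 tokens in 86,273 words), `summarize` gives a percentage of 4.37, which suggests that the percentage row for the variant-specific 3-grams is a simple tokens-per-corpus-size ratio.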

Table 8. Overview of variant-specific 3-grams in ICE-India86k, ICE-GB86k, LOCNEC1–41 and LINDSEI-GE

                                ICE-India86k   ICE-GB86k    LOCNEC1–41   LINDSEI-GE
Corpus size                     86,273         86,733       88,110       86,206
Number of 3-word clusters       493            396          759          628
Frequency of 3-word clusters    3773           2820         5694         4469
%                               4.37           3.25         6.46         5.18
p types                         p < 0.001 (G2 = 11.13)      p < 0.01 (G2 = 9.69)
p tokens                        p < 0.0001 (G2 = 143.35)    p < 0.0001 (G2 = 122.47)
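The G2 values in the tables are log-likelihood statistics. The formula is not spelled out in this part of the chapter, so the following Python sketch assumes the two-term log-likelihood commonly used for comparing a frequency across two corpora of different sizes, together with the usual chi-square critical values for one degree of freedom. The function names are hypothetical.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood (G2) for one count observed in two corpora."""
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    g2 = 0.0
    for observed, size in ((freq_a, size_a), (freq_b, size_b)):
        expected = size * total_freq / total_size
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def significance(g2):
    """Map G2 to a p-level using chi-square critical values for 1 degree of freedom."""
    for critical, label in ((15.13, "p < 0.0001"), (10.83, "p < 0.001"),
                            (6.63, "p < 0.01"), (3.84, "p < 0.05")):
        if g2 >= critical:
            return label
    return "p > 0.05"


# Table 8, ICE-India86k vs. ICE-GB86k: variant-specific types and tokens.
g2_types = log_likelihood(493, 86273, 396, 86733)
g2_tokens = log_likelihood(3773, 86273, 2820, 86733)
print(round(g2_types, 2), significance(g2_types))
print(round(g2_tokens, 2), significance(g2_tokens))
```

Under this assumption the ICE figures from Table 8 yield values close to the reported G2 = 11.13 for types and G2 = 143.35 for tokens, with the same significance levels as in the table.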

Looking at the frequencies of occurrence for types and tokens in the two corpus families, Table 8 reveals that, interestingly, ICE-India shows significantly more occurrences for both types (p < 0.001) and tokens (p < 0.0001), although less variability in the use of 3-grams in Indian English as a representative of a non-native ESL variant might have been expected. Within the other corpus family (LOCNEC1–41 vs. LINDSEI-GE) there are significantly more types (p < 0.01) and tokens (p < 0.0001) of 3-grams in ENL than in EFL. This corroborates our hypothesis of EFL-speakers having less access to formulaic sequences in general, combined with a lower degree of variability in their use. 4.3.2 Categorization of variant-specific 3-grams in ENL, ESL and EFL In a similar vein to nuclear 3-grams, we also categorized and compared the distribution of variant-specific 3-grams according to the taxonomy outlined in Table 2. Figure 3 gives an overview of the distribution of the functional categories across the four corpora. As Figure 3 shows, there is little variation in the proportion of speaker-oriented and audience-oriented 3-grams. This means that apart from the common-core 3-grams there is quite a considerable proportion of variant-specific 3-grams that speakers of each variant use to fulfil these communicative functions. There is, however, a significantly stronger tendency towards using topic-oriented 3-grams for both ESL and EFL whereas there is a higher proportion of text-organizational 3-grams in both ENL-corpora. This observation might hint at different preferences for cohesive devices across the three variants of English. Accordingly, almost one third of the EFL 3-grams are performance phenomena, ENL ranges in the middle with 10% and 16% while there are almost no performance phenomena to be found in ESL with only 4%. Since this is the group which includes the highest degree of variation, we had a closer look at the distribution of performance phenomena across the corpora. 
Specifically, we distinguished between repetitions, self-corrections and hesitations. Figure 4 summarizes our findings.
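The type and token counts reported for 3-grams can be made concrete with a small sketch. The Python snippet below is only an illustration under stated assumptions: it treats a transcript as a whitespace-tokenized word list, counts every contiguous 3-word sequence, and applies an optional frequency threshold. The function names, the sample utterance and the `min_freq` parameter are hypothetical and are not taken from the study's own extraction procedure.

```python
from collections import Counter


def three_grams(tokens):
    """Return all contiguous 3-word sequences in a list of tokens."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def type_token_counts(tokens, min_freq=1):
    """Count 3-gram types (distinct sequences) and tokens (occurrences).

    min_freq is a hypothetical cut-off: only 3-grams that occur at
    least min_freq times are counted.
    """
    counts = Counter(three_grams(tokens))
    kept = {gram: n for gram, n in counts.items() if n >= min_freq}
    return len(kept), sum(kept.values())


# Invented sample utterance, not corpus data.
sample = "i don't know i don't know what i mean".split()

n_types, n_tokens = type_token_counts(sample)
print(n_types, n_tokens)

# With a threshold of 2, only the recurrent 3-gram "i don't know" survives.
print(type_token_counts(sample, min_freq=2))
```

A type here is a distinct 3-gram and a token is one occurrence of it, so a corpus with many tokens but few types reuses a small set of sequences.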



[Figure 3: chart showing the percentage (%) of variant-specific 3-grams per functional category (speaker-oriented, topic-oriented, text-organizational, audience-oriented, performance phenomena, other) in four corpora: ICE-India 86k (N = 3773), ICE-GB 86k (N = 2820), LOCNEC 1–41 (N = 5700) and LINDSEI-GE (N = 4483).]

Figure 3. Variant-specific 3-grams within corpus family (ICE-India86k vs. ICE-GB86k, and LOCNEC1–41 vs. LINDSEI-GE)

[Figure 4: chart showing the shares (0% to 100%) of self-corrections, repetitions and hesitations among the performance phenomena in ICE-India 86k (N = 160), ICE-GB 86k (N = 299), LOCNEC 1–41 (N = 886) and LINDSEI-GE (N = 1267).]

Figure 4. Variant-specific performance phenomena in ICE-India86k vs. ICE-GB86k, and LOCNEC1–41 vs. LINDSEI-GE
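The distinction between hesitations and repetitions drawn for performance phenomena can be approximated mechanically, as the following Python sketch suggests. It assumes that filled pauses are transcribed as er or erm and that a repetition shows up as two identical adjacent words (as in I I want). Self-corrections are left to manual annotation, because no surface criterion for them is given here. All names in the snippet are hypothetical, and checking for filled pauses first is an arbitrary choice.

```python
from collections import Counter

# Assumed transcription of filled pauses; other corpora may use other forms.
FILLED_PAUSES = {"er", "erm"}


def classify_performance_3gram(gram):
    """Label a 3-gram as 'hesitation', 'repetition' or 'other'.

    'other' covers everything this heuristic cannot decide, including
    self-corrections, which would need manual annotation.
    """
    words = [w.lower() for w in gram]
    if any(w in FILLED_PAUSES for w in words):
        return "hesitation"
    if words[0] == words[1] or words[1] == words[2]:
        return "repetition"
    return "other"


def category_shares(grams):
    """Return the percentage share of each label in a list of 3-grams."""
    counts = Counter(classify_performance_3gram(g) for g in grams)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100 * n / total for label, n in counts.items()}


# Invented examples, not corpus data.
examples = [
    ("I", "I", "want"),
    ("er", "I", "think"),
    ("erm", "well", "I"),
    ("the", "the", "thing"),
]
print(category_shares(examples))
```

Percentage shares of this kind are what a chart such as Figure 4 displays per corpus, although the figures reported in the study rest on the authors' own categorization.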

Concerning the different categories of performance phenomena, Figure 4 illustrates that non-native speakers use far more 3-grams including hesitations (i.e. filled pauses like er, erm) than native speakers do. In ICE-India86k 100% of the performance phenomena are hesitations and there are no repetitions or self-corrections at all. This might be a result either of the transcription conventions of ICE-India (i.e. repetitions and self-corrections might have been left out) or of the low frequency of performance phenomena with only 160 instances (as compared to 1267 in LINDSEI-GE). Performance phenomena are equally distributed across the ENL-corpora, and both show a higher percentage of repetitions (e.g. I I want to, ICE-GB86k); this is in line with Biber et al.’s (1999) finding that native speakers use repetitions as their preferred planning strategy in speech. In the case of EFL-speakers the high percentage of hesitations mirrors one deficiency of foreign-language learners that renders their speech less fluent (cf. also Götz 2007). The results for ESL-speakers, by contrast, are not as easily interpretable. On the one hand, it seems very likely that these speakers are more fluent than their EFL-counterparts, as they use English more frequently and not only in academic settings, as is the case for our German learners. On the other hand, it is surprising that ESL-speakers use fewer performance phenomena than their ENL-counterparts do. The complete lack of repetitions in the data, in particular, leads us to interpret these results with some reservation, as the audio source data are not available to us and corpus annotation conventions may have distorted the picture.

5. Conclusion and outlook

We would like to conclude this paper with a summary of central findings and two caveats.
If we look at the total number of 3-grams used in our corpora, the first prominent finding is the significant difference in the number of 3-grams between native speakers (ENL) and German language learners (EFL), while this is not the case if we compare British native speakers (ENL) with Indian English speakers (ESL). This corroborates our assumption that learners of English have a different and restricted repertoire of formulae, while this is not the case for users of English as an institutionalized second-language variety. A reason for this may lie in the different language-acquisition process and the different range of English language use. Since the ESL-speakers (and especially those represented in the International Corpus of English) use English much more frequently and in a much wider variety of contexts, their daily language use and linguistic input is much closer to that of native speakers and the use of prefabricated language is therefore more common. Foreign-language learners, on the other hand, use English only in a very limited number of settings and their linguistic input is also restricted to school settings and some media, which leads to a lower degree of prefabricated language use.

Parallels in the use of formulae are shown in the 3-gram nucleus, where both EFL-speakers and ESL-speakers use more nuclear 3-grams than the native speaker groups. The important difference here, however, is that while EFL-speakers use more nuclear 3-grams and fewer variant-specific 3-grams, this is not the case for Indian ESL-speakers. An explanation for this may be that EFL-speakers use a limited set of 3-grams for a large variety of different settings where ENL-speakers would display more variation. For ESL-speakers matters are not so clear. What we find is that although they use more nuclear 3-grams, they also have a larger repertoire of variant-specific 3-grams.

At a more qualitative level, we were able to show that there is a stronger tendency for ESL and EFL-speakers to use topic-oriented 3-grams, while in both corpus families the British native speakers made more use of text-organizational 3-grams. This observation might hint at different preferences for cohesive devices across the variants. While speakers of ESL and EFL may be more interested in conveying the content of the message, the cohesion of the message may be of greater importance to ENL-speakers. This is, of course, a tentative claim, given that it is based on a restricted set of N-grams – future research will have to shed further light on this aspect.

There are two caveats that we wish to put forward. Note that we showed that German EFL-speakers’ usage displays more and different performance phenomena than the British native speakers included in the study. While the German advanced learners mainly used hesitations as text-planning devices, the British native speakers used many more repetitions as their preferred planning strategy.
However, our results from the ENL-ESL setting, based on ICE-GB and ICE-India, are very surprising and not easily interpretable. Firstly, ESL-speakers display significantly fewer performance phenomena than their native speaker counterparts and, secondly, these phenomena are only represented by hesitations. This at least raises the question of corpus transcription and annotation practice. Since no audio data for ICE-India are available at present, it is not clear to us if and to what extent performance phenomena have been transcribed or if they may have been neglected, contrary to the transcription guidelines for the International Corpus of English.

Our second caveat concerns the corpora we used for the present study in general. We see corpus-linguistic methodology as an ideal way to reconcile research into varieties of English with the analysis of learner English, as corpus data are free from ideological stances and provide researchers with empirically suitable language data. However, so far there are (to our knowledge) no comparable standard corpora that include ENL, ESL and EFL-speakers.[6] For this reason we used a somewhat artificial dataset that consisted of two sets of comparable corpora (‘corpus families’), one for the comparison of ENL versus EFL and one for the comparison of ENL versus ESL. While these comparisons lead to sound results within the sets, we interpret our results across the sets conservatively. Although we were able to show certain trends that are in line with our initial hypotheses, it would be highly desirable to carry out future analyses with directly comparable spoken corpus data for all three speaker types and across more variants of English representing ENL, ESL and EFL.

6. Although the GLBCC (Giessen-Long Beach Chaplin Corpus, cf. Jucker et al. 2003) does include all three variants, the number of ESL speakers is too small for frequency-based analyses.

References

Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Pearson.
Brand, C. & Kämmerer, S. 2006. The Louvain International Database of Spoken English Interlanguage (LINDSEI): Compiling the German component. In Corpus Technology and Language Pedagogy: New Resources, New Tools, New Methods, S. Braun, K. Kohn & J. Mukherjee (eds), 127–140. Frankfurt: Peter Lang.
De Cock, S. 2000. Repetitive phrasal chunkiness and advanced EFL speech and writing. In Corpus Linguistics and Linguistic Theory. Papers from the Twentieth International Conference on English Language Research on Computerized Corpora (ICAME 20), C. Mair & M. Hundt (eds), 51–68. Amsterdam: Rodopi.
De Cock, S., Granger, S. & Petch-Tyson, S. 2006. The Louvain International Database of Spoken English Interlanguage – LINDSEI. (14 August 2009).
Esser, J. 2006. Presentation in Language: Rethinking Speech and Writing. Tübingen: Gunter Narr.
Gilquin, G., De Cock, S. & Granger, S. 2010. The Louvain Database of Spoken English Interlanguage. Handbook and CD-ROM. Louvain-la-Neuve: Presses universitaires de Louvain.
Görlach, M. 1991. Englishes: Studies in Varieties of English 1984–1988. Amsterdam: John Benjamins.
Götz, S. 2007. Performanzphänomene in gesprochenem Lernerenglisch: Eine korpusbasierte Pilotstudie. Zeitschrift für Fremdsprachenforschung 18(1): 67–84.
Gries, S.Th. & Mukherjee, J. 2010. Lexical gravity across varieties of English: An ICE-based study of n-grams in Asian Englishes. International Journal of Corpus Linguistics 15(4): 520–548.
Hasselgren, A. 1994. Lexical teddy bears and advanced learners: A study into the ways Norwegian students cope with English vocabulary. International Journal of Applied Linguistics 4: 237–260.
Jucker, A.H., Smith, S.W. & Lüdge, T. 2003. Interactive aspects of vagueness in conversation. Journal of Pragmatics 35: 1737–1769.
Kachru, B.B. 1983. The Indianization of English: The English Language in India. New Delhi: OUP.
Kachru, B.B. 1996. World Englishes: Agony and ecstasy. Journal of Aesthetic Education 30(2): 135–155.
Mair, C. 2007. British English/American English grammar: Convergence in writing – divergence in speech. Anglia 125(1): 84–100.
McArthur, T. 2003. World English, Euro English, Nordic English? English Today 19(1): 54–58.
Mesthrie, R. & Bhatt, R.M. 2008. World Englishes: The Study of New Linguistic Varieties. Cambridge: CUP.
Müller, S. 2005. Discourse Markers in Native and Non-native English Discourse [Pragmatics and Beyond New Series 138]. Amsterdam: John Benjamins.
Nelson, G., Wallis, S. & Aarts, B. 2002. Exploring Natural Language: Working with the British component of the International Corpus of English [Varieties of English around the World G29]. Amsterdam: John Benjamins.
Nesselhauf, N. 2009. Co-selection phenomena across New Englishes: Parallels (and differences) to foreign learner varieties. English World-Wide 30(1): 1–26.
Nihalani, P., Tongue, R.K., Hosali, P. & Crowther, J. 2004. Indian and British English: A Handbook of Usage and Pronunciation, 2nd edn. New Delhi: OUP.
Pawley, A. & Syder, F. 1983. Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In Language and Communication, J. Richards & R. Schmidt (eds), 191–226. London: Longman.
Peters, A.M. 1983. The Units of Language Acquisition. Cambridge: CUP.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. 1985. A Comprehensive Grammar of the English Language. London: Longman.
Schmitt, N., Grandage, S. & Adolphs, S. 2004. Are corpus-derived recurrent clusters psycholinguistically valid? In Formulaic Sequences [Language Learning & Language Teaching 9], N. Schmitt (ed.), 127–152. Amsterdam: John Benjamins.
Scott, M. 1998. WordSmith Tools. Version 4.00. Oxford: OUP.
Sinclair, J.M. 1991. Corpus, Concordance, Collocation. Oxford: OUP.
Sridhar, K.K. & Sridhar, S.N. 1986. Bridging the paradigm gap: Second language acquisition theory and indigenized varieties of English. World Englishes 5(1): 3–14.
Weinert, R. 1995. The role of formulaic language in second language acquisition: A review. Applied Linguistics 16(2): 180–205.
Wiktorsson, M. 2001. Register differences between prefabs in native and EFL English. In The Department of English in Lund: Working Papers in Linguistics, Vol. 1, S. Manninen & C. Paradis (eds), 85–94. Lund: Lund University Press.
Williams, J. 1987. Non-native varieties of English: A special case of language acquisition. English World-Wide 8(2): 161–199.
Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: CUP.

Studying structural innovations in New English varieties

Ulrike Gut
University of Augsburg

This chapter, which is of a theoretical-conceptual rather than an empirical nature, is concerned with the characterization of structural innovations in New English varieties and the question of whether they can be described as transfer phenomena and learner errors. It first gives a review of relevant empirical studies and presents the state of the art in research in second language acquisition on the nature of cross-linguistic influence and the factors that constrain it. Based on this, it subsequently proposes a model of how structural innovations in New English varieties might have emerged. In addition, previous methods of studying these innovations are discussed, and a best-practice methodology for future research is proposed. It is argued that the classification of a structure as either an error or as an innovation depends crucially on the speakers’ and speaker communities’ norm-orientation and attitudes.

1. Introduction

“New English varieties” is a collective term for the many postcolonial varieties of English that are spoken in – usually multilingual – countries in which English now has an important status as an official or national language and where it functions as the language of business and commerce, education, media and mass communication and as a means of interethnic communication. It has been shown in many recent overviews (e.g. Hughes & Trudgill 1996, Kortmann et al. 2004, Schneider et al. 2004, Schneider 2007 and Mesthrie & Bhatt 2008) as well as in a large number of smaller empirical studies that all of the New English varieties show specific structural features that vary systematically from the structural properties of the so-called Standard varieties of English, such as those spoken in the U.K. and the U.S. These structures are often described as the result of language contact of English with the other languages spoken in the respective country. Hickey (2004: 529), for instance, writing about the New Englishes in Asia and Africa, states that “the background languages of countries where English is spoken have had a decisive influence on its manifestation there”. To give just one example out of a wealth of similar statements in empirical studies, Sridhar (1996), in her analysis of English essays written by Indian students, claims about some syntactic patterns that “almost all these structures are strikingly similar to corresponding structures in Kannada, the students’ mother tongue” (p. 57).

Several theories have been proposed on when and how indigenous languages have led to structural innovations in New English varieties – a process that is often referred to as “nativization” or “indigenization”. This process is thought to have begun very early in the formation of a New English variety, when the first generation of the indigenous population learned English as a second language through education in missionary schools or through intensive trading contacts (Schneider 2003: 246). In his five-stage model of the evolution of the New Englishes, Schneider (2003: 248) further claims that major nativization processes will happen when a large proportion of the indigenous population acquires English. This, he writes, often coincides with political independence gained by the former colony, increased contact between English-speaking settlers and the indigenous population and a resulting shift in norm-orientation. This view is shared by Kandiah (1998a: 36), who states that nativization of English in postcolonial countries “has accelerated since the departure of the rulers”.

It is thus the cognitive processes underlying both individual language acquisition and large-scale language shift that have been proposed to have led to the structural innovations in New English varieties. Labelled “transfer effects” or “learner errors”, they are adduced as causes for the resulting structural innovations.
Mukherjee & Gries (2009: 28), for example, write that some of the lexical forms and grammatical structures observed in New English varieties are “due to loanwords and transfer from local languages”. Referring to the phonologies of New Englishes, Schneider (2003: 248) claims that they will show features “which in many cases linguists will be able to identify as transfer phenomena from the phonology of indigenous languages”. This view is in line with Thomason’s (2001: 75) prediction that if a speaker group shifts to a language without continued contact with native speaker groups, their version of the language, including transfer features, becomes fixed. Jowitt (1991: 47) states that Nigerian English is composed of “a mixture of Standard forms and Popular Nigerian forms, which in turn are composed of errors and variants”. Likewise, writing about Ghanaian English, Gyasi (1990: 24) claims that it includes “errors arising from mother-tongue interference”. All of these claims imply the idea that structural features that were originally produced by language learners came to be adopted by later generation speakers and have developed into stable features of the newly emerged variety of English.

Yet, what does it mean when the structural innovations in the New Englishes are explained by language contact effects that arose when a (part of a) population in a colonized country adopted English? This claim raises a number of questions, beginning with what such a process would have to look like in detail. Following Muysken (2000: 271), it would require at least three steps:

– some features of the indigenous language form part of the language productions of individual speakers who are learning English;
– these features remain in the speakers’ language productions even when they have attained a high level of competence in English;
– these features are adopted by speakers of subsequent generations, some of whom might acquire English as a first language.

It is the aim of this paper to explore, by reviewing relevant empirical research, whether this scenario is a plausible one and which refinements such a model might require. In particular, it will be investigated whether the claim that the structural properties of New Englishes are based on learner errors that have become fossilized serves as a sufficient explanation of currently observed linguistic structures. Since not all structures in New Englishes differ from those of other varieties of English, more detailed explanations of the direction and mechanism of influence by indigenous languages need to be explored: Which linguistic structures are affected by this influence and which are not? Which factors determine whether a structure is affected or not? This will eventually lead to the fundamental question of the nature of the influence of indigenous languages on New Englishes: Can common processes and outcomes of such an influence be found? This paper will suggest answers to some of these questions. In particular, it will deal with the questions of

– the nature of transfer or cross-linguistic influence,
– its manifestation in newly emerging varieties of English,
– the factors that in turn constrain the influence of indigenous languages and
– the best methodology to study the influence of indigenous languages in New Englishes.

The study of the concepts of “transfer” or “cross-linguistic influence” has a long tradition in research on second language acquisition and multilingualism. Section 2 discusses how the analysis of structural innovations in New Englishes could profit from the conceptualizations and findings in this line of research. It offers definitions of the term “transfer” and research findings on its nature. These are compared with research on the different types of influence of indigenous language structures on New English varieties. Section 3 presents what is currently known about the way in which cross-linguistic influence interacts with and is constrained by various linguistic and non-linguistic factors and discusses the conclusions that can be drawn concerning the incorporation of transfer errors into a New English variety. Section 4 reviews the methods that have been employed so far for the analysis of the influence of indigenous languages on New English varieties and proposes a best-practice methodology for these kinds of studies. The principal difficulty for such a methodology lies in the delimitation of this type of cross-linguistic influence from other, related processes. What, for example, is the borderline between code-switching and transfer, or is it not useful to draw such a line? What constitutes the threshold between a “learner error” and a structural innovation? When can a particular structure in the English produced by a particular speaker be classified as the one or the other? Which structural innovations might only appear to be caused by the influence of the indigenous languages but are actually results of other linguistic processes such as general language change? As a conclusion, Section 5 addresses the key issues of this volume. It suggests answers to the questions of what the difference between a structural innovation and a learner error is and discusses whether such a distinction is useful. Likewise, the differences between learning English as a second language and acquiring a New English variety are discussed. Furthermore, it offers arguments for and against the distinctions between English as a Second Language (ESL) and English as a Foreign Language (EFL).

2. What is cross-linguistic influence?

The question of the influence of a language learner’s first language/s on both the course and result of second language (L2) acquisition has long been considered the most central issue in the study of second language acquisition[1] (SLA) and is still highly relevant today (see e.g. Eckman 2004).
Although first language (L1) influence has been a key concept in SLA since the 1950s and has sparked off a wealth of empirical studies, there is still no agreed definition or terminology for it (see e.g. Odlin 2003: 436). The terminological diversity is reflected in the interchangeable use of “L1 influence”, “transfer”, “interference” and “cross-linguistic influence” by various authors. In this paper the term “cross-linguistic influence” (CLI) will be used. This is motivated by the fact that the definition of the terms “first” or “native” language is a hotly debated subject in the multilingual settings in which New English varieties have emerged. Furthermore, since in most postcolonial countries speakers of English are multilingual speakers of a number of indigenous – or, in the case of Cameroon, European – languages, the term “L1 influence” would be too restrictive. The term “CLI”, however, does justice to the manifold ways in which a multilingual speaker’s languages can influence each other. Following Odlin (1989: 27), the term CLI will be used in this paper to refer to “the influence resulting from the similarities and differences between the target language and any other language that has been previously [...] acquired”.

1. In line with most authors, I will use the term second language acquisition (SLA) to refer to the acquisition of any language, be it the second, third, fourth etc., after at least one language has been acquired to the age-appropriate level.

There is considerable disagreement among researchers about the extent and nature of the role of CLI in SLA. The proportion of CLI-induced errors of all errors in L2 language use that has been reported by different authors ranges from 3% to 51% (see Ellis 1985: 29 for an overview). These conflicting claims reflect the fact that as yet there exists no reliable method of quantifying the relative contribution of cross-linguistic influence to any structure produced by language learners. This problem is primarily due to the myriad ways in which CLI can manifest itself: it does not only occur as directly accessible and countable linguistic structures but also as indirect effects underlying many organizational principles of language. Empirical studies (see e.g. Gass 1996 for an overview) have shown that CLI can manifest itself in

– direct borrowings or loans,
– the production of mixed structures,
– conceptual transfer,
– avoidance patterns,
– preference patterns,
– hypercorrection and
– a general facilitating effect in language learning.

Direct borrowings of structures or words from a source language into the target language constitute the classic examples of “transfer”. In the LeaP corpus (see e.g. Gut 2007a), for example, an Arabic learner of English, who had lived in Germany for a number of years and had a good command of German, said in an interview: “...it is important for me to discuss my doctor work here in Germany in English language not in [...] in deutsch [German for German] language” (ax_ara_eng_f_free_c1). Ringbom (2005: 75) lists loan translations and borrowings that he found in English essays written by native speakers of Swedish with Finnish as another L2. Some of them would use the Swedish word och for and or the Swedish word fast for although in an English sentence. It appears that direct borrowings and loans are especially frequent in the early stages of SLA. In his Ontogeny Phylogeny Model, Major (2001) claims that L1 transfer is greatest in the initial stages of language learning and subsequently decreases gradually. Direct borrowings and loans seem to be principally employed as a communicative strategy in which L1 structures,

syntactic rules and lexical items are used to fill the gaps in the knowledge of the L2 (e.g. Chan 2004). Borrowings and direct transfer do not necessarily result in erroneous productions. In fact, it has been found that positive transfer affects language acquisition more than negative transfer does (e.g. Ringbom 1992). CLI can furthermore result in speakers’ production of mixed structures that exhibit structural properties of both native language/s and target language. Hammarberg and Williams (1993), for instance, report on morphological mixing such as the application of Italian articles and infinitive affixes to Swedish words in a native English and L2 German speaker with some knowledge of Italian. The learner produced the form i grannarna with the Italian plural article i for the Swedish target form grannarna (‘neighbours-the’). Similarly, Baker & Trofimovich (2005) showed that Korean learners of English produce vowels in their L2 that have acoustic properties of both Korean and English. Flege (1987) likewise reports on mixed structures in the production of /t/ by native French speakers of English: the voice onset time values of /t/ were intermediate between those of native French and native English. Many types of CLI, however, do not surface directly as specific structures in L2 language use but are reflected in it in an indirect way. An instance of this is conceptual transfer, which is reported in Carroll et al. (2000). They show that native speakers of English use the cognitive concepts relating to space of their respective L1 when describing pictures in their L2 German. This becomes evident mostly in the fact that, in comparison to German native speakers, they produce far fewer coadverbials such as daneben (‘next to’). These coadverbials do not exist in English, where the location of one object is described in relation to the other by explicitly mentioning this object (e.g. “There is a house; next to it is a tree.” vs. 
German “Da ist ein Haus; daneben ist ein Baum.” [There is a house; next-to is a tree]). Carroll et al. (2000) concluded that the expression of spatial arrangements in an L2 is influenced by the linguistic possibilities of the L1.

Moreover, CLI has been shown to trigger avoidance patterns that cause particular structures not to be used at all by language learners. Schachter (1974), for example, observed that Chinese and Japanese learners of English produced far fewer relative clauses than Persian and Arabic learners did. Her explanation rests on the major syntactic difference between Chinese and Japanese on the one hand (prenominal relative clauses) and English on the other (postnominal relative clauses) – a difference that does not exist between either Arabic or Persian and English. The grammatical difference between their native and the target language, she suggests, might have led the Chinese and Japanese learners to actively avoid using this structure in English. A similar explanation was put forward by von Stutterheim (2003) to explain her findings. She analyzed narratives produced in German by native speakers of English and focussed especially on the ‘end-frames’, the utterances ending the narrative. While native speakers of German often produce end-frames with which actions are located or bound to a specific point (e.g. “The dogs are walking down a lane towards the house.”), the English-speaking L2 speakers of German use fewer or none of these end-frames. Von Stutterheim explains this by pointing out that end-frames with reference to specific points are rarely used in native English, since the focus of the narrative typically lies on the action and is expressed by a verb with progressive aspect. The lack of morphological marking of progressive aspect in German leads these learners to underuse end-frames in their L2 German.

CLI can not only underlie certain avoidance patterns but might also trigger preferences in the L2, such as the overuse of certain types of apologies (Olshtain 1983) or intensifiers (Lorenz 1998). Structures present in the L1 can thus indirectly influence structures in the L2 by increasing their frequency. Similarly, CLI can cause hypercorrections, when speakers are aware of a structural difference between their L1 and L2 and overgeneralize ‘corrections’ to inappropriate structures. This has, for instance, been shown by Odlin (1989: 38) for spelling hypercorrections. Finally, the influence of previously learned languages can manifest itself indirectly in a speeding up and facilitation of the learning of additional languages in general. This has been demonstrated for the areas of lexis and syntax (e.g. Ringbom 2005, Fouser 2001 and Cenoz 2005) as well as for the transfer of general metalinguistic knowledge or language awareness acquired in previous language learning (e.g. Fouser 2001, Ó Laoire 2005).

In summary, it is well-documented that CLI affects all linguistic subsystems and that the types of influence range from direct to indirect and comprise avoidance and preference patterns, mixing, borrowings as well as general facilitating effects in language learning.

The complex ways in which language transfer can manifest itself have also been described in some studies on New Englishes. Mair (2002), in his investigation of the influence of Jamaican Creole on the emerging Standard Jamaican English, claims that the influence can be both ‘direct’ and ‘indirect’. The latter refers to a kind of influence that “can be invoked to account for the ways in which a superficially English word or construction are used [...] without the creole form itself surfacing in the English utterance” (2002: 41). By the same token, Kandiah (1998b: 87) concludes from his investigation of Singapore English structures that they are based on structures of both the speaker’s native language (L1) and English, but that they combine to produce a rule-governed structure that incorporates new grammatical and semantic features and stands independent of these languages. Alsagoff and Ho (1998) analyzed relative clauses in colloquial Singapore English, where constructions such as (1) can be found:

(1) The man sell ice-kachang one gone home already

 Ulrike Gut

The authors (1998: 132) explain both the choice and the position of the relative pronoun in this utterance with L1 influence from Chinese. First, the relative pronoun in this utterance is one, which is analyzed as a calque from the Chinese nominalizer (de in Mandarin, e or ge in Hokkien and ge in Cantonese) that has acquired nominal and pronominal function in colloquial Singapore Chinese. Furthermore, one appears at the end of the relative clause, which mirrors the position of the relative pronoun in Chinese (Alsagoff & Ho 1998: 130). The order of head and relative clause, by contrast, follows English rules (in Chinese, the head follows the relative clause). Constructions such as (2)

(2) The man who sell ice-kachang one gone home already

in which the relative clause has two relative pronouns, further demonstrate the extent of structural mixing of English and Chinese. Here, the English relative pronoun who, appearing at the beginning of the relative clause and thus following English word order rules, is combined with the Chinese relative pronoun one in end-position of the relative clause (Alsagoff & Ho 1998: 133). The next section discusses the questions of why and when the different types of CLI occur in second language productions and which linguistic and non-linguistic factors have been found to constrain their occurrence. This might help to explain which features of the other language/s of speakers of a New English variety can form part of a speaker’s language productions and whether these features are likely to remain in the language productions of highly advanced learners of English and thus qualify as candidates for an adoption into an emerging New English variety.

3. Factors constraining cross-linguistic influence

It has been repeatedly shown in research on CLI that it is impossible to predict all learning difficulties and outcomes of cross-linguistic influence (see Gass 1996: 324, Odlin 1989: 441f.). One reason for this is the high inter-individual variation, even among learners of the same native and second language. Many issues such as the social context, the learner’s age and gender, motivation and type of instruction combine in myriad ways that make the learning situations of individuals virtually unique. Yet, some factors have been identified that have a systematic influence on the frequency and type of CLI. The linguistic subsystem seems not to be one of them. CLI occurs on all linguistic levels including morphology, syntax, phonology, semantics and pragmatics, and no evidence exists yet to show systematic differences in its relative frequency across the different linguistic subsystems (see Odlin 2003: 439). Factors that have been shown to constrain CLI are

Studying structural innovations in New English varieties 

– (perceived) language similarity,
– markedness and universal rules and
– proficiency in the L2 (learning stage).

Transfer appears to be most likely when there is a structural similarity between the first and the second language. Jackson (1981), for example, found that Punjabi learners of English did not transfer structures that were very different in the two languages (e.g. the position of the verb in a sentence), but that transfer occurred between similar structures or equivalent items such as possessive constructions involving of. Likewise, Flege (1987, 1995) showed that phonological transfer is more likely to occur when the articulation of two sounds in the first and second language is similar than when a sound in the L2 is completely new. Similarly, Cenoz (2005) found that Spanish/Basque bilinguals, when speaking English, borrowed more words from Spanish, an Indo-European language like English, than from Basque, a non-Indo-European language. Kellerman (1979, 1983) was one of the first authors to propose psychotypology as a factor constraining the amount of transfer. The term psychotypology refers to the distance between two languages perceived by language learners. When learners perceive a similarity between their native language/s and their L2, transfer is more likely to occur, regardless of whether there exists an actual close linguistic relationship between the languages or not. This effect was for example observed by Ringbom (2005) and by Fouser (2001). Ringbom analyzed the production of single words in an English sentence-completion task by L1 Swedish speakers with a good command of Finnish and L1 Finnish speakers with a good command of Swedish. He found that while both groups produced English words that showed influence from Swedish, an Indo-European language like English, transfer effects from Finnish, a non-Indo-European language, were very rare.
Fouser analyzed introspective data on the morphosyntactic and lexical acquisition of Korean by two speakers with some knowledge of Japanese. Both learners claimed that their knowledge of Japanese helped their acquisition of Korean and that they actively drew upon this knowledge. Gass (1996) even claims that when learners do not perceive any cross-linguistic similarity between the native language and the target language, they will invoke universal learning strategies rather than CLI in their productions. No systematic study has yet been carried out to investigate the interrelation between particular types of CLI and (perceived) language similarity. There is evidence that a number of different types of CLI can occur when language learners perceive different degrees of language distance. Cenoz (2005), for instance, reports on finding borrowings of Spanish lexical items in English “without any phonological or morphological adaptation” (p. 13) in the productions of Basque/Spanish learners of English. Borrowings from Basque, a language totally unrelated to English,


were however not found. By the same token, Ringbom (2005) observed in Swedish and Finnish learners of English that the production of hybrids, blends and relexifications occurred in the case of language similarity (Swedish and English), but that loan translations and semantic extensions tended to occur with greater language distance (Finnish – English). Furthermore, it has been reported that a high degree of perceived language distance triggers avoidance patterns, as observed in Schachter (1974) and discussed in Laufer and Eliasson (1993) and that it is perceived to hinder language acquisition in general (Ó Laoire 2005). Some studies on CLI furthermore show that native language influence is intertwined with linguistic universals. Universals are linguistic generalizations that have been postulated on the basis of an examination of a large sample of genetically unrelated and geographically non-adjacent languages in terms of their shared properties. They describe the occurrence, absence or co-occurrence of linguistic structures in any given language and can be divided into absolute and implicational universals. Absolute universals are inherent in all languages of the world, whereas implicational universals involve two language properties in a conditional relationship such as “if X then Y”, where the presence of one structure implies the presence of another structure but not vice versa. In such cases, the implicated structure is regarded as less marked. Typological markedness is thus an asymmetric relation that holds between language structures, based on the distribution of these structures in the languages of the world. It has been invoked to explain and predict learning difficulty and transferability in second language acquisition. Eckman (1984, 1991) claims that only those structures of the L2 that differ from the L1 and that are more marked than in the L1 will be difficult to learn. 
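The logic of implicational universals and typological markedness described above can be made concrete in a few lines of Python. This is only a minimal sketch under stated assumptions: the three language labels and their coda cluster inventories are invented for illustration, and the function names (`holds_implication`, `more_marked`, `difficulty_ranking`) are hypothetical helpers, not part of any cited study or existing library. The ranking step is one simple reading of Eckman's claim that L2 structures which are absent from the L1 and more marked should be harder to learn.

```python
from typing import Dict, List, Set

# Hypothetical toy sample: coda cluster types allowed in each language.
# Labels and inventories are invented for illustration only.
SAMPLE: Dict[str, Set[str]] = {
    "lang_a": {"fricative+stop", "stop+stop"},
    "lang_b": {"fricative+stop"},
    "lang_c": set(),
}


def holds_implication(sample: Dict[str, Set[str]], x: str, y: str) -> bool:
    """Return True if every language that has structure x also has structure y."""
    return all(y in structures for structures in sample.values() if x in structures)


def more_marked(sample: Dict[str, Set[str]], x: str, y: str) -> bool:
    """x is more marked than y if 'x implies y' holds but 'y implies x' does not."""
    return holds_implication(sample, x, y) and not holds_implication(sample, y, x)


def difficulty_ranking(l1: Set[str], l2: Set[str],
                       sample: Dict[str, Set[str]]) -> List[str]:
    """Order the L2 structures missing from the L1 from least to most marked."""
    new_structures = sorted(l2 - l1)
    # Score each new structure by how many other new structures it is more marked than.
    score = {
        s: sum(more_marked(sample, s, t) for t in new_structures if t != s)
        for s in new_structures
    }
    return sorted(new_structures, key=lambda s: score[s])


# A learner whose L1 allows no coda clusters, acquiring an L2 that allows both types.
ranking = difficulty_ranking(SAMPLE["lang_c"], SAMPLE["lang_a"], SAMPLE)
```

In this toy sample the implication runs from stop + stop to fricative + stop but not the other way round, so stop + stop comes out as the more marked structure and is placed last (most difficult) in the ranking.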
For example, a learner of English – a language that allows both marked stop + stop and unmarked fricative + stop consonant clusters in the coda position of a syllable – whose L1 does not allow any consonant clusters is predicted to experience both difficulties with the consonant clusters in general and more difficulties with stop + stop clusters than with fricative + stop clusters. Moreover, marked structures are considered to be less likely to be transferred to an L2 than unmarked ones. Kellerman (1979, 1983), by contrast, defines markedness as a cognitive concept, a hypothesis speakers have about linguistic structures: a marked structure of a language is thus one that is perceived as irregular, infrequent or semantically opaque, whereas unmarked structures are perceived as regular, frequent and easily interpretable. Different types of CLI can be triggered by language universal processes. The analysis of the production of final consonant clusters, such as the sequence [ndz] in friends, by learners of English showed several processes in second language speech: deletion of one or more consonants in the cluster (e.g. Broselow, Chen & Wang 1998, Major 1996, Bayley 1996, Hancin-Bhatt 2000, Hansen 2001, 2004), vowel epenthesis to break up the cluster (e.g. Major 1996, Hancin-Bhatt 2000,


Hansen 2001), paragoge (the adding of a schwa at the end of the cluster; e.g. Hodne 1985, Hansen 2001, 2004) and final devoicing (Broselow et al. 1998, Hansen 2001), all of which have been claimed to reflect universal principles, such as the preference for a CV syllable structure (Tarone 1980, Hodne 1985, Anderson 1987, Eckman 1991, Major 1996) and markedness constraints (e.g. Eckman 1991, Major 1996). Furthermore, the avoidance strategies in the production of relative clauses observed by Schachter (1974) could also be caused by markedness constraints. A stable finding across many studies is that CLI seems to occur more frequently in the language production and perception of beginners than in that of advanced language learners. Ellis (1985: 37) presents several claims that suggest that CLI mainly occurs as a strategy for successful communication when there are still insufficient L2 resources and is thus limited to the early stages of language acquisition. A similar proposal was made by Major (2001) in his Ontogeny Phylogeny Model, where he claims that L1 influence decreases steadily over the course of language acquisition. Some empirical studies have found evidence for this. Hammarberg & Hammarberg (1993), for instance, report that cross-linguistic influence from the L2 to the L3 is greater when the speaker still has a low proficiency level in the L3 than when the proficiency has increased. Similarly, Abrahamsson (2003) found a U-shaped learning curve in the acquisition of English and Swedish by Mandarin Chinese speakers. After an initial phase with few errors in the production of coda clusters, a phase with many transfer errors occurred, which in turn was followed by target-like production. Chan (2004) analyzed CLI by Cantonese learners of English in terms of the copula, the placement of adverbs, the expression of the existential or presentative function, relative clauses and verb transitivity. 
She found that syntactic transfer was highest for the learners with the lowest proficiency level. There exists as yet no empirical evidence that it is the learners’ age rather than level of competence that influences the degree of CLI. An increased occurrence of CLI at the early stages of second language acquisition has been observed for both child and adult learners. No systematic studies have been carried out so far on the question of whether learners in the early learning stages show different patterns of cross-linguistic influence than more advanced learners. In fact, Odlin (2003: 467) claims that such a description “still seems very far in the future”. Some studies, however, suggest that the type of CLI observed by less advanced language learners tends to be direct borrowing and loans, while in later stages other types of CLI dominate (Chan 2004). Major (2001), furthermore, claims that language universals have the highest degree of influence in intermediate stages of language learning but decrease in importance with increasing proficiency. Studies on emerging New Englishes point to the fact that it is mainly indirect forms of CLI that are prevalent in language productions by advanced learners. Mair’s (2002) analysis of lexical and grammatical


transfer from the Creole to written English produced by Jamaican undergraduates demonstrated a strikingly low number of direct loans, but a large number of indirect cross-linguistic influences such as tendencies towards hypercorrection and avoidance. Likewise, distinctive preference patterns in the use of English intensifiers by advanced Xhosa learners but no direct borrowings or other types of CLI have been reported by de Klerk (2005). In conclusion, the role of the factors influencing CLI is highly complex. (Perceived) language similarity triggers CLI more than (perceived) language distance. This means that the overall frequency of CLI in New Englishes can be expected to be rather low since none of the indigenous languages of Africa, and few of the languages in Asia (exceptions are e.g. Hindi, Bengali, Punjabi) are typologically close to English. Indeed, it has been shown that speakers in postcolonial countries where both English and French are spoken show more influence from French in their English than from indigenous languages (see Ringbom 2005: 71 for references). Furthermore, it appears that different types of CLI can be expected at different levels of proficiency: Direct transfer appears to be most dominant in the early stages, universal strategies seem most influential in intermediate stages, whereas indirect forms of transfer seem to be dominant in later stages. It is evident, however, that more systematic studies are required to substantiate these tendencies. Most researchers now favour a multi-factor approach to analyzing CLI that assumes a relationship between universal factors, specific factors about the learner’s L1, specific factors about the learner’s L2 and extra-linguistic factors such as learning strategies and social context. 
The fact that CLI can manifest itself in manifold and subtle ways, both quantitatively and qualitatively, and that it affects structures in seemingly unpredictable ways calls for a rigorous methodology to study CLI in the development of the New Englishes. Such a methodology will be suggested in the next section.

4. How to study cross-linguistic influence in New Englishes

Given the complex nature of L1 influence, its complex ways of direct and indirect manifestation, its constraints and seeming unpredictability – how can the role of CLI in the development of innovative structures in New Englishes be investigated? In particular, how can it be determined which structures in New Englishes reflect CLI and which have other causes? In the following, the methods that have been employed in some studies on structural innovations in New Englishes will be discussed and a best practice methodology for future research enterprises will be proposed.


Previous research efforts on structural innovations in New English varieties have employed methods that can be grouped into four categories:
– comparison of a structure in a New English variety with the same structure in a standard variety of English;
– comparison of the same structure in various New English varieties;
– comparison of a structure in a New English variety with the same structure in one or more of the indigenous languages spoken in the country; and
– comparison of a structure in a New English variety and in learner language.

A large number of studies compare a structure in a New English variety with the same structure in a standard variety of English. Setter (2003), for example, compared vowel reduction in Hong Kong English and British English. She found that many syllables that are usually pronounced with a reduced vowel in British English are pronounced with a full vowel in Hong Kong English. In addition, she observed that weak and unstressed syllables are significantly longer in Hong Kong English than in British English. Likewise, Olavarría de Ersson and Shaw (2003) observed systematic differences in verb complementation patterns between written Indian English and British English. Some constructions that are rare in British English (such as NP-V-NP-NP constructions) occur considerably more often in Indian English, whereas others such as NP-V-NP-with-NP constructions occur less frequently. This methodology succeeds in describing structural differences between New English varieties and standard varieties of English. It has, however, the obvious drawback that conclusions about the ‘diverging’ structures in a New English variety being caused by cross-linguistic influence have to remain speculative. Further evidence for this hypothesis would be required in the form of comparisons with either those structures in the other language/s of the speakers of the New English variety or with other New English varieties.
The latter type of comparison is realized in recent large-scale studies that compare structures across different New English varieties. Such comparative studies have become increasingly easy with the availability of comparable corpora such as the ICE (International Corpus of English) corpora (Greenbaum 1996). Schneider (2004), for instance, investigated particle verbs in the ICE India, ICE East Africa, ICE GB, ICE Philippines and ICE Singapore. He reports on “traces of possible structural nativization” (p. 247), innovations in terms of the frequency of use of particle verbs in general and individual verbs in particular, and preferences for certain lexical items, structural uses and meanings in specific varieties. In a similar study, Mukherjee & Gries (2009) analyzed the occurrence of intransitive, monotransitive and ditransitive constructions with different verbs in the ICE Hong Kong, ICE India, ICE Singapore and ICE GB. They found that the similarity of such “collostructional” patterns with British English patterns decreased in these


New English varieties in the order Hong Kong English → Indian English → Singapore English, with the latter showing the greatest divergence from British English patterns. These observed differences are interpreted as a reflection of the progress of nativization in the respective varieties. These studies demonstrate the relevance of corpus-based explorations of structural properties of New English varieties. Differences between the varieties in both distribution and preference patterns of certain structures can be shown. Furthermore, structural patterns that are common to several New English varieties can be found with this method. The results, however, can only be broadly described as being caused by nativization. The exact mechanisms of this nativization and especially the role of the speakers’ other languages, as Mukherjee & Gries (2009: 48) point out correctly, remain to be explored in detail. Some studies of the structural properties of New English varieties rely on direct comparisons with the corresponding structural properties in the indigenous languages of the country. Gut (2005), for example, compared the speech rhythm of Nigerian English with the speech rhythm of Hausa, Igbo and Yoruba, the three major indigenous languages spoken in Nigeria. She showed that speech rhythm in Nigerian English is distinctly different from both the speech rhythm of Hausa, Igbo and Yoruba, which group together, on the one hand and British English speech rhythm on the other. Equally, in terms of the number of different syllable types produced in a reading passage, Nigerian English is positioned between the Nigerian languages and British English. As described in Section 2 above, Alsagoff & Ho (1998) compared the properties of relative clauses in colloquial Singapore English – in terms of order of head and attribute clauses as well as position and type of relative pronoun – with those properties of Chinese and Malay relative clauses.
They were able to show that relative clauses in Singapore English are hybrid structures that combine grammatical elements and rules from the indigenous languages and English. The comparison of structures in the New English varieties with those in the indigenous languages of the country provides evidence for possible cross-linguistic influence. These studies show in which ways cross-linguistic influence can manifest itself on the different linguistic levels. Conversely, they demonstrate clearly the difficulty of delimiting cross-linguistic influence from code-switching: Did the bilingual speakers of colloquial Singapore English produce these forms ‘involuntarily’ as cross-linguistic influence or as code-switches in the knowledge that the addressees, being bilingual speakers themselves, would have no trouble interpreting them? Would they use different forms when speaking English to Singaporeans of Indian ethnicity or to non-Singaporeans? If the answer is ‘yes’, the observed structures cannot be called stable features of colloquial Singapore English but should rather be classified as linguistic phenomena that speakers can


handle flexibly and consciously. Given that the precondition for any type of cross-linguistic influence to occur at all is what researchers in bilingualism and multilingualism call an underlying ‘bilingual mode’, this distinction may be academic. Grosjean (1998, 2001) claims that the degree of activation of a bilingual’s languages can range from highly active to deactivated with the corresponding language modes ranging from a monolingual mode to a bilingual mode. Occurrences of cross-linguistic influence, code-switching and language mixing are assumed to be especially frequent in the bilingual mode, while they are drastically reduced in the monolingual mode, in which the second language is deactivated. Thus, code-switching and innovations can be envisaged as being positioned on a continuum rather than as separate phenomena. One shortcoming of studies that compare structures in New English varieties with those in the indigenous languages of the country is that they cannot answer the question whether the observed structures constitute “learner errors”, as has been proposed. This can only be done in studies that directly compare learners of English with speakers of New English varieties. To date, only very few studies have compared speakers of a New English variety with learners of English. Nesselhauf (2009) compared phraseological expressions in the ICE Kenya, ICE Jamaica, ICE India and ICE Singapore with British English on the one hand and the learner English contained in the ICLE (International Corpus of Learner English) corpus on the other. She showed that, in terms of frequency of certain collocations and phraseological expressions, the New Englishes generally pattern neither like British English nor like learner English, but that they lie in between.
Nesselhauf (2009: 23) explains the observed differences between the New Englishes and learner English by stating that some of these structures “have already become, to some degree, an accepted feature of certain L2 varieties”. A similar conclusion was drawn by Gut (2007b), who compared coda cluster reduction in Nigerian English and Chinese Singapore English with English produced by Chinese learners of English. All speakers had typologically similar native languages that differ distinctly from English phonology: The speakers of Nigerian English speak either Ibibio or Anaang as their native language, neither of which allows any coda consonant clusters. Similarly, the speakers of Singapore English had Mandarin Chinese as their L1, a language that does not have any coda consonant clusters either. English, by contrast, is a language that allows up to four consonants in the coda position, as for example in the word texts [teksts]. The two speaker groups were therefore expected to show the same patterns of cross-linguistic influence in the area of coda consonant cluster realization. The speech of these speakers was compared to the speech of four native British English speakers and the Chinese L2 learners of English that Hansen (2001) investigated. The surprising result was that Singapore English and Nigerian English speakers show distinctly different patterns of coda cluster realization despite having the same


typological L1-L2 differences. Singapore English speakers produce a deletion rate similar to that of the British English native speakers, whereas Nigerian English cluster reduction is more similar to that in the L2 English speech produced by the Chinese learners. This study showed that cross-linguistic influence cannot be the only reason for structural innovations in New English varieties and that other factors must play a decisive role. Gut (2007b) explained her findings by pointing out the different norm-orientation that speakers of Nigerian English and speakers of Singapore English have. While in Singapore a nearly universally accepted local norm of English has developed, in Nigeria the orientation is extra-normative with British English serving as the model and yardstick of comparison in school examinations and value judgements. The influence of speaker attitudes on the manifestation of innovations was also observed by Sharma (2005) in her investigation of Indian English speakers in the U.S. She analyzed past tense marking, copula use and agreement as well as a set of phonological features in spoken English and found that the speakers’ attitudinal orientation corresponded with the use of “Indian” phonological features. Those speakers who expressed more positive attitudes towards American English and a greater motivation to adapt to the local variety showed fewer phonological features that could be characterized as “Indian” (aspiration, velarization and nonrhoticity) in their English. The syntactic features of their speech, however, were not influenced by attitude but rather by proficiency level. From her findings Sharma concluded that “non-native varieties of English can be distinguished from ‘approximative’ second language systems in both structural and attitudinal aspects” (2005: 219). Other factors besides attitudes and norm-orientation that affect the nativization of African New English varieties have been adduced by Simo Bobda (2003). 
He lists population movements, postcolonial opening to other continents, psychological factors and colonial input as such factors. The latter is also stressed by Mufwene (2001) and Mesthrie & Bhatt (2008: 47), who argue that some structures in New English varieties can be explained by dialect features of the different dialects spoken by the ‘founding fathers’ of colonized countries. This is confirmed in a study by Awonusi (1986), who claims that it was the different paths of Western and Eastern Nigeria in terms of colonization, administration and education that have resulted in the diverse accents spoken in these regions. In the West, the missionaries first employed Englishmen speaking RP as teachers in their schools. When they left as a result of the World War, Nigerian teachers took over, who had to rely on textbooks as a guide for English pronunciation. In the East, by contrast, schools recruited missionaries from Scotland and Ireland, and features of these accents can still be traced in today’s Igbo English.
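The frequency-based comparisons reviewed in this section (the rate of a structure in a New English variety set against a reference variety or against learner English) can be sketched in code. The following is a minimal illustration only: the corpus labels, counts and corpus sizes are invented, and the log-likelihood statistic is an assumption chosen here because it is widely used in corpus linguistics for comparing frequencies, not because any of the studies cited above is known to have used it. The function names are hypothetical.

```python
import math


def per_million(count: int, corpus_size: int) -> float:
    """Normalize a raw count to a rate per million words."""
    return count / corpus_size * 1_000_000


def log_likelihood(count_a: int, size_a: int, count_b: int, size_b: int) -> float:
    """Two-corpus log-likelihood (G2) for one item's frequency in corpora A and B."""
    total = count_a + count_b
    # Expected counts if the item were equally frequent in both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Invented counts of one construction in three hypothetical one-million-word corpora.
counts = {"new_english": 150, "reference": 50, "learner": 220}
sizes = {"new_english": 1_000_000, "reference": 1_000_000, "learner": 1_000_000}

rates = {name: per_million(counts[name], sizes[name]) for name in counts}
g2_vs_reference = log_likelihood(counts["new_english"], sizes["new_english"],
                                 counts["reference"], sizes["reference"])
g2_vs_learner = log_likelihood(counts["new_english"], sizes["new_english"],
                               counts["learner"], sizes["learner"])
```

A large G2 value only indicates that two corpora differ in the frequency of the structure. As the discussion above stresses, such a difference says nothing by itself about whether cross-linguistic influence, norm-orientation or colonial input is the cause.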


This review of the methods that have been applied so far to study structural innovations in New Englishes has shown some shortcomings of previous research efforts. It has become clear that an investigative methodology for the study of cross-linguistic influence in New English varieties needs to be able to answer satisfactorily the following four basic questions:

1. Is there any evidence in other New English varieties with speakers of typologically similar and typologically different indigenous languages for or against explanations based on cross-linguistic influence?
2. Is there any evidence in their language productions for or against explanations based on cross-linguistic influence?
3. Can the structure be caused by or influenced by external factors, such as those described by Simo Bobda (2003)?
4. Can the structure in question be an outcome of internal language change, such as regularization processes?

In order to answer the first question, comparative studies of New Englishes are necessary. Before categorizing a structure in a variety of English as the result of cross-linguistic influence, it should be tested whether this structure also occurs in other varieties with speakers that have an L1 with the same or with different structural properties. Moreover, such comparisons can reveal common features in New English varieties. As Mesthrie (2008: 634) says, “[t]he large number of similarities across L2 Englishes [...] needs to be explained more carefully than in the past, where the default assumption has often been interference from the substrates. Since there are over a thousand of these substrate languages in Africa-Asia, the explanation of interference has to be considerably fine-tuned”. The methodology can be further enriched by including learner language in the comparison, as suggested by question 2.
For example, if the study of relative clauses in colloquial Chinese Singapore English by Alsagoff & Ho (1998) had included a comparison of the results with Chinese L2 English speakers’ productions of relative clauses, systematic differences between the two would have become apparent: L2 learners produce very few errors in relative clauses (11.8%; Schachter 1974); the most frequent type of error is the inappropriate use of whom, while lexical borrowing of relative pronouns does not occur. Altogether, learners seem to actively avoid producing relative clauses at all. This comparison shows that in colloquial Singapore English other factors must come into play that determine the type of L1 influence than those factors that constrain cross-linguistic influence in second language acquisition. These factors are likely to be language attitudes, linguistic identity construction and a propensity for code-switching in this speaker community. By the same token, question 3 encourages researchers to take into consideration some external factors that might have caused the innovative structure in a


New English variety. These factors include affective and attitudinal aspects (Sharma 2005, Simo Bobda 2003, Gut 2007b) as well as possible founding fathers effects (Mufwene 2001). The methodology of studying the role of CLI in New Englishes should furthermore comprise investigations of whether the innovation has been caused by general processes of language acquisition and change rather than cross-linguistic influence. For this, a comparison with findings from studies on language change in general and with Pidgin and Creole studies is necessary. Mair (2002: 55), for example, states that “many important salient features of written Jamaican English are not due to the creole substrate but should rather be seen as local Caribbean effects of trends commonly encountered in world English [...] or even as independent innovations”.

5. Summary and conclusion

This paper was concerned with structural innovations in New Englishes, which are often described as evidence for “L1 influence” and “learner errors”. Its objective was to explore how research on these structural innovations can benefit from related research in second language acquisition. Section 2 showed that cross-linguistic influence is a highly complex phenomenon that often does not surface as direct structural transfer, but rather as the use of prior linguistic knowledge that can manifest itself in such diverse ways as the speed and path of acquisition, the avoidance or the overproduction of certain structures. It was also argued that while some studies on New Englishes have recognized the complex nature of cross-linguistic influence, others tend to focus mainly on direct transfer, a phenomenon that appears to be restricted to the early stages of language acquisition and is thus less likely to underlie language productions by the advanced “learners” of English that make up the population of postcolonial countries.
The study of transfer phenomena in New Englishes can thus profit from findings in second language research by taking them as a guideline and model in the search for all the possible manifestations of cross-linguistic influence. Broad claims that structural innovations can be explained with L1 influence and learner errors during individual language acquisition and language shift should be replaced by detailed investigations of the specific ways in which a particular structure might have emerged. Not only the different types of cross-linguistic influence but also the factors that constrain their occurrence need to be considered when explaining structural innovations in New Englishes. If indeed these structures enter an emerging New English variety as “errors” produced by individuals learning English, they need to be features that are likely to occur even in the language of fairly proficient speakers. Section 3 showed that our knowledge of which types of cross-linguistic

Studying structural innovations in New English varieties 

influence are produced in later stages of language acquisition still has gaps: it seems that direct loans and borrowings as well as universal strategies decrease as proficiency increases, whereas mixing and some forms of indirect cross-linguistic influence such as preference and avoidance patterns remain. In this area, research in second language acquisition can probably profit from research on innovations in New Englishes, which has shown that it is mainly the occurrence of specific preference patterns that characterizes different postcolonial varieties of English (e.g. Schneider 2004, Mukherjee & Gries 2009, Olavarría de Ersson & Shaw 2003). As Odlin (2003: 437) states, “anyone seeking to understand transfer itself in all its manifestations needs to try to become familiar with a wide range of linguistic research”. To this end, a best practice methodology for the investigation of structural innovations was suggested in Section 4. It was argued that for each innovative structure in New English varieties detailed investigations of the different ways in which it might have emerged are necessary. In particular, the different possible manifestations of cross-linguistic influence and their interplay with other linguistic and non-linguistic factors need to be considered. Previous research has discovered some of the factors that influence structural innovations in New Englishes. Foremost, the speakers’ attitudes towards particular structures and their norm-orientation have emerged as variables that constrain the formation of innovations (Nesselhauf 2009, Gut 2007b, Sharma 2005, Simo Bobda 2003). A model of the development of structural innovations in New Englishes could thus be proposed as follows:

1. Some structures based on cross-linguistic influence form part of the language productions of individual speakers who are learning English. Depending on the stage of acquisition these will be direct forms of transfer rather than indirect ones.
2. Only some of these structures – mainly structural mixing and preference patterns – remain in the speakers’ language productions even when they have attained proficiency in English. These are mostly structures that are widely accepted and associated with positive attitudes as markers of a particular variety of English.²
3. Some of these structures are adopted by speakers of subsequent generations, some of whom might acquire English as a first language.

2. Without doubt, some structures might persist although they are associated with negative attitudes. In contrast to the positively valued ones, it is these that will mainly be targeted and penalized in school education.

What then can be concluded for the question of whether the structural innovations in New Englishes constitute “learner errors”? Is there a difference between structural innovations and learner errors? The classification of a linguistic structure as an innovation or an error proceeds on extra-linguistic rather than linguistic grounds. First of all, the identification of an “error” can only be made with reference to a model that serves as a yardstick of comparison. It is thus the norm-orientation and language policy of a speaker community or nation that labels a structure as an error. In countries such as Malaysia and Nigeria that have not developed a local norm of English and that take British English as the target, all structural innovations must be classified as errors. One consequence of this language policy is the high failure rate (more than 60%) of students in English that has often been lamented by Nigerian teachers and linguists (e.g. Aliyu 1995, Bamgbose 1998). In countries such as Singapore that have developed a local norm, innovations do not need to be referred to as errors but can be described as features of this particular variety of English (of course, speakers and politicians might differ in their acceptance of the local norm, as can be seen in the Speak Good English Movement of the Singaporean government, which aims to teach Singaporeans “grammatically correct” English; see ). The influence of political attitudes towards linguistic structures on their classification as an innovation or an error can be seen in recent developments in Jamaica. In 1989, a Senior Education Officer prescribed as the target of education in Jamaica “that our students develop proficiency in reading and writing Standard English” (cited after Shields 1989: 44). 
Ten years later, the emergence of a local Jamaican standard with normative function was acknowledged in official documents such as the Revised Primary Curriculum published by the Ministry of Education in 1999, which states that the major objective of the language programme at school is to assist pupils “to acquire the target language Standard Jamaican English” (p. 14) (cited after Irvine 2004: 45f.). Some structures of Jamaican English that would have been penalized as errors in 1989 would probably not be classified as such any more in 1999. The labelling of a structure as an error thus has an attitudinal and political rather than a linguistic basis. Errors and innovations should therefore not be categorized by linguists as distinct from each other but rather as structures representing two end-points of a continuum. The same structural feature of a language can move along this cline depending on a complex of extra-linguistic factors. These include geographical and demographic spread, acceptance, usage by “authoritative” speakers such as writers, teachers and politicians and codification (cf. Bamgbose 1998: 3f.). Any structure that is not widely spread and used, that is held in low esteem and that is not codified is likely to be classified as an error within and outside the speaker community. Conversely, structures that are widely found throughout a country, that are used by the majority of speakers, that are highly esteemed

and are even codified in dictionaries and grammars cannot be called learner errors – however far removed from a Standard variety of English they might be. They constitute innovations of this particular variety of English. The same arguments that have been put forward with regard to the question of how to distinguish innovations from learner errors can be adduced in the discussion of the differences between learning English as a second language and learning a New English variety. The major difference between the two scenarios seems to lie in the norm-orientation of the learner. Second language learners of English typically have as their target a Standard variety of English that is not spoken in their home country, whereas speakers/learners of a New English variety have as a target the local norm. The fundamental difference between English as a Second Language and English as a Foreign Language therefore mainly lies in differences in norm-orientation and attitudes, which in turn cause different kinds of cross-linguistic influence.

References

Abrahamsson, N. 2003. Development and recoverability of L2 codas. Studies in Second Language Acquisition 25: 313–349.
Aliyu, J. 1995. Improving your Performance in English. Zaria: Ahmadu Bello University Press.
Alsagoff, L. & Ho, C. L. 1998. The relative clause in colloquial Singapore English. World Englishes 17: 127–138.
Anderson, J. 1987. The markedness differential hypothesis and syllable structure difficulty. In Interlanguage Phonology, G. Ioup & S. Weinberger (eds), 279–304. Rowley MA: Newbury House.
Awonusi, V. 1986. Regional accents and internal variability in Nigerian English: A historical analysis. English Studies 6: 555–560.
Baker, W. & Trofimovich, P. 2005. Interaction of native- and second-language vowel system(s) in early and late bilinguals. Language and Speech 48: 1–27.
Bamgbose, A. 1998. Torn between the norms: Innovations in world Englishes. World Englishes 17: 1–14.
Bayley, R. 1996. Competing constraints on variation in the speech of adult Chinese learners of English. In Second Language Acquisition and Linguistic Variation [Studies in Bilingualism 10], R. Bayley & D. Preston (eds), 97–120. Amsterdam: John Benjamins.
Broselow, E., Chen, S.-I. & Wang, C. 1998. The emergence of the unmarked in second language phonology. Studies in Second Language Acquisition 20: 261–280.
Carroll, M., Murcia-Serra, J., Watorek, M. & Bendiscioli, A. 2000. The relevance of information organization to second language acquisition studies: The descriptive discourse of advanced adult learners of German. Studies in Second Language Acquisition 22(3): 441–466.
Cenoz, J. 2005. Learning a third language: cross-linguistic influence and its relationship to typology and age. In Introductory L3 readings, B. Hufeisen & R. Fouser (eds), 1–9. Tübingen: Stauffenberg.
Chan, A. 2004. Syntactic transfer: Evidence from the Interlanguage of Hong Kong Chinese ESL learners. Modern Language Journal 88: 56–74.

de Klerk, V. 2005. Expressing levels of intensity in Xhosa English. English World-Wide 26: 77–95.
Eckman, F. 1984. Universals, typology and interlanguage. In Language Universals and Second Language Acquisition [Typological Studies in Language 5], W. Rutherford (ed.), 79–105. Amsterdam: John Benjamins.
Eckman, F. 1991. The structural conformity hypothesis and the acquisition of consonant clusters in the interlanguage of ESL learners. Studies in Second Language Acquisition 13: 23–41.
Eckman, F. 2004. From phonemic differences to constraint rankings. Studies in Second Language Acquisition 26: 513–549.
Ellis, R. 1985. Understanding Second Language Acquisition. Oxford: OUP.
Flege, J. 1987. The production of ‘new’ and ‘similar’ phones in a foreign language: evidence for the effect of equivalence classification. Journal of Phonetics 15: 47–65.
Flege, J. 1995. Second language speech learning theory, findings and problems. In Speech Perception and Linguistic Experience: Issues in cross-linguistic research, W. Strange (ed.), 233–277. Timonium: York Press.
Fouser, R. 2001. Too Close for Comfort? Sociolinguistic transfer from Japanese into Korean as an L≥3. In Cross-linguistic Influence in Third Language Acquisition: Psycholinguistic Perspectives, J. Cenoz, B. Hufeisen & U. Jessner (eds), 149–169. Clevedon: Multilingual Matters.
Gass, S. 1996. Second language acquisition and linguistic theory: The role of language transfer. In Handbook of Second Language Acquisition, W. Ritchie & T. Bhatia (eds), 317–345. San Diego CA: Academic Press.
Greenbaum, S. 1996. Comparing English Worldwide: The International Corpus of English. Oxford: Clarendon Press.
Grosjean, F. 1998. Studying bilinguals: Methodological and conceptual issues. Bilingualism: Language and Cognition 1: 131–149.
Grosjean, F. 2001. The bilingual’s language modes. In One Mind, Two Languages: Bilingual Language Processing, J. Nicol (ed.), 1–22. Malden MA: Blackwell.
Gut, U. 2005. Nigerian English prosody. English World-Wide 26: 153–177.
Gut, U. 2007a. Learner corpora in second language research and teaching. In Non-native Prosody: Phonetic Description and Teaching Practice, J. Trouvain & U. Gut (eds), 145–167. Berlin: Mouton de Gruyter.
Gut, U. 2007b. First language influence and final consonant clusters in the new Englishes of Singapore and Nigeria. World Englishes 26(3): 346–359.
Gyasi, I. 1990. The state of English in Ghana. English Today 6: 24–26.
Hancin-Bhatt, B. 2000. Optimality in second language phonology: Codas in Thai ESL. Second Language Research 16: 201–232.
Hansen, J. 2001. Linguistic constraints on the acquisition of English syllable codas by native speakers of Mandarin Chinese. Applied Linguistics 22: 338–365.
Hansen, J. 2004. Developmental sequences in the acquisition of English syllable codas by native speakers of Mandarin Chinese. Studies in Second Language Acquisition 26: 85–124.
Hammarberg, B. & Hammarberg, B. 1993. Articulatory re-setting in the acquisition of new languages. Phonum 2: 61–67.
Hammarberg, B. & Williams, S. 1993. A study of third language acquisition. In Problem, process, product in language learning, B. Hammarberg (ed.), 60–70. Stockholm: Stockholm University.
Hickey, R. 2004. Englishes in Asia and Africa: origin and structure. In Legacies of Colonial English, R. Hickey (ed.), 503–535. Cambridge: CUP.
Hodne, B. 1985. Yet another look at interlanguage phonology: The modification of English syllable structure by native speakers of Polish. Language Learning 35: 404–422.

Hughes, A. & Trudgill, P. 1996. English Accents and Dialects. London: Arnold.
Irvine, A. 2004. A good command of the English language: Phonological variation in the Jamaican acrolect. Journal of Pidgin and Creole Languages 91(1): 41–76.
Jackson, H. 1981. Contrastive analysis as a predictor of errors, with reference to Punjabi learners of English. In Contrastive Linguistics and the Language Teacher, J. Fisiak (ed.), 195–205. Oxford: Pergamon.
Jowitt, D. 1991. Nigerian English usage. Ikeja: Longman Nigeria.
Kandiah, T. 1998a. Why New Englishes. In English in New Cultural Contexts: Reflections from Singapore, J. Foley, T. Kandiah, B. Zhiming, A. Gupta, L. Alsagoff, C. L. Ho, L. Wee, I. S. Talib & W. Bokhorst-Heng (eds), 1–40. Singapore: OUP.
Kandiah, T. 1998b. The emergence of new Englishes. In English in New Cultural Contexts: Reflections from Singapore, J. Foley, T. Kandiah, B. Zhiming, A. Gupta, L. Alsagoff, C. L. Ho, L. Wee, I. S. Talib & W. Bokhorst-Heng (eds), 73–105. Singapore: OUP.
Kellerman, E. 1979. Transfer and non-transfer: Where we are now. Studies in Second Language Acquisition 2: 37–57.
Kellerman, E. 1983. Now you see it, now you don’t. In Language Transfer and Language Learning, S. Gass & L. Selinker (eds), 112–134. Rowley MA: Newbury House.
Kortmann, B., Schneider, E., Upton, C., Mesthrie, R. & Burridge, K. (eds). 2004. A Handbook of Varieties of English, Vol. 2: Morphology, Syntax. Berlin: Mouton de Gruyter.
Laufer, B. & Eliasson, S. 1993. What causes avoidance in L2 learning, L1-L2 difference, L1-L2 similarity or L2 complexity? Studies in Second Language Acquisition 15: 35–48.
Lorenz, G. 1998. Overstatement in advanced learners’ writing: Stylistic aspects of adjective intensification. In Learner English on Computer, S. Granger (ed.), 53–65. London: Longman.
Mair, C. 2002. Creolisms in an emerging standard: Written English in Jamaica. English World-Wide 23: 31–58.
Major, R. 1996. Markedness in second language acquisition of consonant clusters. In Second Language Acquisition and Linguistic Variation [Studies in Bilingualism 10], R. Bayley & D. Preston (eds), 75–96. Amsterdam: John Benjamins.
Major, R. 2001. Foreign Accent: The Ontogeny and Phylogeny of Second Language Phonology. Mahwah NJ: Lawrence Erlbaum Associates.
Mesthrie, R. 2008. Synopsis: Morphological and syntactic variation in Africa and South and Southeast Asia. In Varieties of English. Africa, South and Southeast Asia, R. Mesthrie (ed.), 624–635. Berlin: Mouton de Gruyter.
Mesthrie, R. & Bhatt, R.M. 2008. World Englishes: The Study of New Language Varieties. Cambridge: CUP.
Mufwene, S. 2001. The Ecology of Language Evolution. Cambridge: CUP.
Mukherjee, J. & Gries, S. 2009. Verb-construction associations in the International Corpus of English. English World-Wide 30: 27–51.
Muysken, P. 2000. Bilingual Speech: A Typology of Code-mixing. Cambridge: CUP.
Nesselhauf, N. 2009. Co-selection phenomena across New Englishes. English World-Wide 30: 1–26.
Odlin, T. 1989. Language Transfer. Cambridge: CUP.
Odlin, T. 2003. Cross-linguistic influence. In The Handbook of Second Language Acquisition, C. Doughty & M. Long (eds), 436–486. Oxford: Blackwell.
Olavarría de Ersson, E. & Shaw, P. 2003. Verb complementation patterns in Indian Standard English. English World-Wide 24: 137–161.

Olshtain, E. 1983. Sociocultural competence and language transfer: the case of apologies. In Language Transfer in Language Learning, S. Gass & L. Selinker (eds), 232–249. Rowley MA: Newbury House.
Ó Laoire, M. 2005. L3 in Ireland: A preliminary study of learners’ metalinguistic awareness. In Introductory Readings in L3, B. Hufeisen & R. Fouser (eds), 47–55. Tübingen: Stauffenberg.
Ringbom, H. 1992. On L1 transfer in L2 comprehension and production. Language Learning 42: 85–112.
Ringbom, H. 2005. L2-transfer in third language acquisition. In Introductory L3 Readings, B. Hufeisen & R. Fouser (eds), 71–82. Tübingen: Stauffenberg.
Schachter, J. 1974. An error in error analysis. Language Learning 24: 205–214.
Schneider, E. 2003. The dynamics of new Englishes: From identity construction to dialect birth. Language 79: 233–81.
Schneider, E. 2004. How to trace structural nativization: Particle verbs in world Englishes. World Englishes 23: 227–24.
Schneider, E. 2007. Postcolonial English. Cambridge: CUP.
Schneider, E., Burridge, K., Kortmann, B., Mesthrie, R. & Upton, C. (eds). 2004. A Handbook of Varieties of English, Vol. 1: Phonology. Berlin: Mouton de Gruyter.
Setter, J. 2003. A comparison of speech rhythm in British and Hong Kong English. In Proceedings of the 15th International Congress of the Phonetic Sciences, Barcelona, D. Recasens, M.J. Solé & J. Romero (eds), 467–470. (CD-ROM).
Sharma, D. 2005. Dialect stabilization and speaker awareness in non-native varieties of English. Journal of Sociolinguistics 9: 194–224.
Shields, K. 1989. Standard English in Jamaica: A case of competing models. English World-Wide 10: 41–53.
Simo Bobda, A. 2003. The formation of regional and national features in African English pronunciation. English World-Wide 24: 17–42.
Sridhar, S. 1996. Toward a syntax of South Asian English: Defining the lectal range. In South Asian English, R. Baumgardner (ed.), 55–69. Urbana IL: University of Illinois Press.
Tarone, E. 1980. Some influences on the syllable structure of interlanguage phonology. International Review of Applied Linguistics 18: 139–152.
Thomason, S. 2001. Language Contact: An introduction. Washington DC: Georgetown University Press.
von Stutterheim, C. 2003. Linguistic structure and information organisation: The case of very advanced learners. In EUROSLA Yearbook 2003, S. Foster-Cohen & S. Pekarek-Doehler (eds), 183–206. Amsterdam: John Benjamins.

Interrogative inversion as a learner phenomenon in English contact varieties
A case of Angloversals?

Michaela Hilbert
University of Bamberg

The phenomenon of the non-standard use of inversion in both main clause and embedded interrogatives has been mentioned for several varieties of English, as well as for individual second language acquisition (SLA). There have been varying explanations, however: embedded inversion has been attributed either to L1 influence from Gaelic or to conservative British English for Irish English (Filppula 2004); to a general learner phenomenon for Singapore English (Gupta 1994); to overgeneralization for individual SLA (McDavid & Card 1973); or its patterning has left authors short of any explanation at all (Bhatt 2004 for Indian English). In addition, it is one of the examples often quoted in the discussion of so-called “Angloversals” (term attributed to Mair 2003), i.e. phenomena occurring across a range of historically and geographically unrelated vernacular varieties of English (Kortmann & Szmrecsanyi 2004). This paper aims at questioning these divergent explanations of what looks like one and the same phenomenon, by analyzing the syntactic factors that govern the occurrence or non-occurrence of inversion in interrogative clauses in three varieties of English: Indian English (IndE), Singapore English (SingE) and Irish English (IrE). The paper will show that what looks similar is not necessarily the same phenomenon at all and will single out a factor that unites contact varieties of English and individual learner Englishes, a factor which has often been neglected in quantitative studies: the frequency of specific lexical chunks, or “formulaic language”. The data that provides the basis for the quantitative analysis is taken from the International Corpus of English (ICE), more precisely the spoken private conversations sections of its Indian and Singaporean subcorpora. The data for Irish English is taken from the Hamburg Corpus of Irish English (HCIE).¹

1. ICE-Ireland has now been published, but since non-standard uses of inversion occur only very infrequently in this corpus, the HCIE is used for the analysis of Irish English embedded inversion here.

1. Introduction, previous research and the “paradigm gap”

In Standard English, main clause interrogatives are usually marked by inversion (Where is she?), whereas embedded interrogative clauses remain uninverted (I don’t know where she is). Vernacular varieties display several types of variation on this pattern. Firstly, uninverted polar interrogative main clauses can occur in a majority of, if not all, vernacular varieties of English, with the significant distinction, however, that in some (like standard British and American English) the uninverted pattern is a pragmatically marked alternative to the inverted one, whereas in other varieties, as in example (1), it is an unmarked and, for some, the only strategy to form an interrogative main clause. These varieties include the L2-Englishes of Africa, Asia and the Caribbean, as well as English-based pidgins and creoles (see Kortmann & Szmrecsanyi 2004).

(1) You see see movie? (IndE, ICE-Ind, s1a-024)
(2) You’ve been there? (Bahamian English, Reaser & Torbert 2008)
(3) The mailman come today? (Fiji English, Thomas 1989)
(4) I can go? (Kriol, Fraser 1977)

Secondly, in addition to these uninverted polar interrogatives, uninverted constituent interrogatives also occur, but are only attested as a pervasive structure in L2-Englishes, pidgins and creoles, not in L1 varieties (see Kortmann & Szmrecsanyi 2004).

(5) And why there are so many people in Tanzania mainland? (Tanzanian English, ICE-EA-s1a-018)
(6) Where you are staying? (SingE, ICE-Sing-s1a-035)
(7) Why you say it’s not worth it? (JamE, ICE-Jam-s1a-072)
(8) Uhm how things are moving in Punjab now-a-days? (IndE, ICE-India-s1a-013)

Thirdly, inversion in embedded interrogatives occurs in some L1 varieties (notably Irish English, some English dialects, vernacular American English and African American English), in some L2-Englishes of Africa, Asia and the Caribbean, but not in pidgins and creoles (see Kortmann & Szmrecsanyi 2004):

(9) A wunthered did he gae hame. ‘I wondered if he went home’ (Ulster Scots, Montgomery 2006)
(10) Sort of you know seeing what are people going to eat and so on (Kenyan English, ICE-EA-s1a-021)

(11) My my sis question me recently why aren’t I coming back you know (SingE, ICE-Sing-s1a-062)
(12) we will be able to diagnose what is the problem (IndE, ICE-Ind, s1a-093)
(13) let me know is John working or how are yous living (IrE, HCIE)

This overview shows that these types of non-standard uses of interrogative inversion cross the boundaries of what has been classified as “inner-circle”, L1 or ENL Englishes (e.g. Irish English) and “outer-circle”, L2 or ESL Englishes (e.g. Indian English, Singapore English). In addition, all of them have also been reported in studies of English as a foreign language, that is, the “expanding circle” or EFL, most of which confirm a developmental sequence first established by Cazden et al. (1975), as displayed in Figure 1. Given the superficial similarity of the features described for individual language acquisition in Figure 1, and for varieties of English in the overview above, a comparison of the two is intriguing, but raises several theoretical problems:

1. Are we dealing with the same phenomenon here in the first place? If yes, how can we then explain the occurrence of embedded inversion in L1 dialects of English, which are usually not associated with features resulting from SLA?

Stage I – Undifferentiation: Learner did not distinguish between simple and embedded wh-questions.
a. uninverted: Both simple and embedded wh-questions were uninverted. simple: What you study?; embedded: That’s what I do with my pillow.
b. variable inversion: Simple wh-questions were sometimes inverted, sometimes not. inverted: How can you say it?; uninverted: Where you get that?
c. generalization: Increasing inversion in wh-questions with inversion being extended to embedded questions. simple: How can I kiss her if I don’t even know her name?; embedded: I know where are you going.
Stage II – Differentiation: Learner distinguished between simple and embedded wh-questions. simple: Where do you live?; embedded: I don’t know what he had.

Figure 1. Developmental stages in the use of inversion in L2 acquisition of English; adapted from Cazden et al. (1975: 38)

2. If we are indeed dealing with the same phenomenon, what is the exact relation between L2 varieties of English and SLA? Is the patterning displayed in non-ENL varieties of English some kind of “fossilization” of the process described by Cazden et al.?
3. If it is, how can we explain the use of inversion in Indian English, which is reported to have inversion in embedded, but not in main clause interrogatives (Bhatt 2004), a pattern which is not predicted at all in Cazden et al.’s model?

These questions are very much related to what Sridhar and Sridhar (1986) refer to as the “paradigm gap” between research into individual second language acquisition and that into “indigenized varieties” of English (“IVEs” in their terminology). One of their main claims, that this gap was due to the fact that research into varieties of English was largely impressionistic, non-empirical and atheoretical, is no longer true. Still unanswered, however, is the question of the exact relation between the two acquisition settings and their outcomes. Sridhar & Sridhar (1986: 5f.) importantly stress the differences between the two settings of acquisition, with respect to the target of acquisition, input, the role of other languages, motivation, as well as lexical and pragmatic aspects in the acquisition process. But to what extent do these actually make a difference in the results of the acquisition processes? And if they do make a difference, is it categorical/qualitative, or “merely” quantitative? A second gap that Sridhar & Sridhar (1986: 9) refer to, but this time within SLA theory and research itself, is the “competition” between approaches that stress transfer as the origin of non-target-like phenomena, as opposed to those that put more emphasis on overgeneralization, and the preference, at least by some authors, of the latter, whereas “there seems to be little motivation for being apologetic about claiming IVEs to be, in good part, products of transfer” (Sridhar & Sridhar 1986: 9f.).
Still, the same debate is evident in research on varieties of English: whereas studies based on one variety or a group of related varieties often claim L1 transfer to be the source of a given structural innovation, typologically motivated studies may assign some general, sometimes called “universal”, status to the identical phenomenon, once it occurs in two or more historically and geographically unrelated varieties of English. Cases in point are the studies on embedded inversion in IrE, SingE and IndE. Filppula (2004), for instance, discusses the potential sources of embedded inversion in Irish English, as in example (14):

(14) I wonder what is he like at all.

Potential origins of the construction are, in his view, conservatism and dialect diffusion (as used within a framework of regional dialectology), overgeneralization of main clause inversion to embedded clauses, or substrate influence (as used within a framework of contact linguistics). Filppula comes to the conclusion that

[i]n the light of the evidence discussed above, the case for Irish substratum influence on [Hiberno English; MH] indirect questions looks very strong. [...] An important factor speaking against conservatism and dialect diffusion as the primary source is the peripheral status of this pattern in EModE and in present-day conservative BrE dialects. (Filppula 2004: 179)

Of course, the substrate explanation for Irish English falls short of explaining the occurrence of the same phenomenon in other varieties such as Singapore English, whose contact languages do not have inversion, either in embedded, main or any other clauses. Gupta (1994) does not explicitly comment on potential origins of the non-standard uses of inversion in Singapore English; by locating her analysis in a general framework of language acquisition, however, and by referring to the pervasiveness of the phenomenon in other L2 Englishes, she seems to implicitly suggest an explanation along the lines of overgeneralization (from uninverted main clause declarative word order to main clause interrogative word order, and then later from inverted main clause interrogative to inverted embedded interrogative, as predicted in the model of Cazden et al. 1975; see Figure 1). The conflict between substrate influence and overgeneralization is not the only one, however, that comes up in the comparison of L2 English varieties. IndE has puzzled not only Bhatt (2004) with its pattern of inversion, which is reported to consist of inversion in embedded interrogatives, as in (15), but lack of inversion in main clause interrogatives, as in (16).

(15) Do you know where is he going? (Bhatt 2004, 1020)
(16) Where he has gone now? (Bhatt 2004, 1020)

Bhatt concludes that

[t]he simple empirical generalization that emerges [...] is that in vernacular IndE inversion is restricted to embedded questions; it does not apply in matrix questions. The question formation strategy in vernacular IndE is the mirror image of that of StIndE [Standard Indian English, MH]. (Bhatt 2004, 1020)

Neither substrate influence nor overgeneralization can explain this reported inversion pattern in IndE. Substrate influence, which was plausible enough for IrE, is impossible for IndE, since the substrate languages in India do not provide a possible origin of transfer. No specific relation of IndE to Gaelic or Irish English has been attested either. If conservatism and dialect diffusion are ruled out for Irish English, they can equally be ruled out for Indian English, and for the same reasons. Also, embedded inversion in IndE cannot be an overgeneralization of the main clause structure, since main clause interrogatives in IndE are uninverted, or supposedly so. What remains as the intriguing “third way” is to assign the status of an “Angloversal” (see Mair 2003), or some other kind of universal, to

embedded inversion, with the remaining problem that this would only give it a label, but would not exactly provide an explanation, or suggest any kind of origin, at least not if one rejects the “bioprogram” that Chambers (2004) mentions as a possible source of his vernacular universals. The following case study will suggest a solution to this problem by means of a data-based comparison of three varieties of English, IndE, SingE and IrE, starting with an analysis of the exact patterning of inversion in these varieties in the following section. The primary assumption is that if the feature is governed by identical factors (that cannot be related to substrate influence) in all varieties, we are actually dealing with the same phenomenon in the first place and can then look for potential sources based on the specific factors that govern the occurrence of one variant or another.

2. Non-standard inversion patterns in varieties of English

2.1 IndE and SingE

The data used here comes from the private conversations sections of the International Corpus of English (ICE), available in parallel design for IndE and SingE (among others; for an analysis of additional varieties cf. Hilbert forthcoming). When it comes to the syntactic factors governing the occurrence and nonoccurrence of inversion, IndE and SingE show parallel tendencies as to the verb type occurring in inverted main clause interrogatives. In both varieties, be is predominantly inverted when it occurs in an interrogative clause, and so are modal verbs and have in its auxiliary use, but to a lesser extent. A more balanced variation exists with full verbs (requiring do-periphrasis in inverted constituent questions), as shown in Figures 2 and 3.

[Figure 2: stacked bar chart, ICE-India constituent interrogatives; share of + inv and – inv clauses (0–100%) for be, modal/aux and full verb/do-periphrasis]

Figure 2. Ratio of inversion for basic verb types in IndE main clause constituent interrogatives

[Figure 3: stacked bar chart, ICE-Sing constituent interrogatives; share of + inv and – inv clauses (0–100%) for be, modal/aux and full verb/do-periphrasis]

Figure 3. Ratio of inversion for basic verb types in SingE main clause constituent interrogatives
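The ratios plotted in Figures 2 and 3 are proportions of inverted clauses per verb type. A minimal sketch of such a calculation follows, assuming that every interrogative clause has been hand-coded for verb type and for the presence of inversion. The function name and the records are invented for illustration; they are not counts from ICE.

```python
from collections import Counter

def inversion_ratios(clauses):
    """Return the share of inverted clauses for each verb type.

    `clauses` is an iterable of (verb_type, inverted) pairs, where
    `inverted` is True if subject and operator are inverted.
    """
    totals = Counter()
    inverted = Counter()
    for verb_type, is_inverted in clauses:
        totals[verb_type] += 1
        if is_inverted:
            inverted[verb_type] += 1
    return {vt: inverted[vt] / totals[vt] for vt in totals}

# Invented toy annotations (NOT corpus data), for illustration only.
toy_clauses = [
    ("be", True), ("be", True), ("be", True), ("be", False),
    ("modal/aux", True), ("modal/aux", False),
    ("full verb", True), ("full verb", False),
    ("full verb", False), ("full verb", False),
]

ratios = inversion_ratios(toy_clauses)
for verb_type, ratio in ratios.items():
    print(f"{verb_type}: {ratio:.0%} inverted")
```

With real annotations, the same per-category proportions could be computed separately for main clause and embedded interrogatives and then compared across varieties.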

Table 1. Occurrence of verbs in inverted embedded constituent interrogatives in IndE and SingE

Verb           IndE %   SingE %
is               49       43
’s               44       33
are               2       11
was               3        2
why don’t         2        3
would             0        3
other modal       0        4

Thus, the large majority of inverted embedded interrogatives in both varieties have be as the inverted auxiliary, as in (17). Only some modals appear in these clauses, such as in (18), and full verbs do so only in the construction why don’t..., as in (19), which appears to be a fixed sequence here; otherwise full verbs should be expected to occur with do-periphrasis and inversion in constructions other than why don’t you as well. In embedded constituent interrogatives, this pattern is not only confirmed, but is shown to be even stronger than in main clause constituent questions (Table 1).

(17) Have you measured how much how high is your cholesterol? (ICE-Sing-s1a-013)
(18) So I was beginning to wonder what would I be doing there? (ICE-Sing-s1a-047)
(19) A lot of them ask me why don’t I go into teaching (ICE-Sing-s1a-046)

To sum up, the majority of inverted main clause interrogatives involve the verb to be or one of the frequent modal auxiliaries. This tendency is parallel in both varieties under study here. The hypothesis to be deduced from these preliminary figures is that the basis of this pattern might not so much be the application of an inversion rule but the use of unanalyzed fixed chunks consisting of the interrogative pronoun and a form of to be. Based on this analysis of main clause inversion in the two varieties, the appearance of inversion in embedded interrogatives might receive a rather different analysis than suggested above: not overgeneralization of the inversion rule from main to embedded interrogatives, but the use of the same available and (most) frequent fixed chunks in both types of clauses. The basis of these fixed chunks, as suggested above, would be the fusion of the interrogative word with the following inverting operator (mainly a form of to be). This hypothesis is firstly confirmed by the fact that cliticization of this inverting verb to the preceding interrogative pronoun appears relatively frequently in both varieties, as in the following examples from the corpus:

(20) How’s Natalie? (ICE-Ind-s1a-098)
(21) How’s it? (ICE-Sing-s1a-055)
(22) When’s Mary coming back here? (ICE-Sing-s1a-005)

The same cliticized pronoun-verb chunks appear in embedded clauses, as in

(23) You know what’s the latest dance style or not? (ICE-Sing-s1a-002)
(24) But I don’t know what’s the prize like. (ICE-Sing-s1a-104)

Secondly, this fusion leads to occasional double marking, i.e. the occurrence of two verbs in the clause, a form of to be cliticized to the interrogative pronoun, and one following later in the sentence, after the subject, thus in its uninverted position in an embedded interrogative:

(25) Don’t understand how’s it is coming up. (ICE-Sing-s1a-053)
(26) I didn’t know what’s was indication... (ICE-Sing-s1a-65)

These cases of double marking provide strong support for the “fixed chunk” analysis of what otherwise looks like inversion. With regard to the search for a distinct origin of the non-standard use of inversion in these two English varieties, the hypothesis that inversion in IndE and SingE is based on the use of fixed chunks of the interrogative pronoun and a frequent inverted verb would support the argument that the non-standard use of inversion is based on a

Interrogative inversion as a learner phenomenon in English contact varieties 

learner strategy. The difference, however, from preceding versions of this explanation is that imitation rather than rule formation (and rule overgeneralization) seems to be the underlying strategy here. This brings us back to one of the initial questions of this study, namely whether the non-standard use of inversion in Irish English is based on the same strategy, or whether it is in fact due to language contact with the Gaelic substrate. If the latter hypothesis were true, we should find inversion in Irish English to pattern differently from inversion in IndE and SingE.

2.2 Irish English

In this section, we will look at the question whether embedded inversion in Irish English is the same phenomenon as in IndE and SingE or only a superficially similar construction. IrE does not display non-standard patterns of inversion in main clause interrogatives, that is, these are inverted as is common in L1 Englishes in general. Thus, the feature is restricted to embedded interrogatives as in the following examples of embedded constituent interrogatives from the HCIE.

(27) Let me know how is all they old Neighbours.
(28) She wants to know what is the matter with him.

Looking first at embedded constituent interrogatives, Table 2 shows that IrE patterns practically identically with IndE and SingE with regard to the verbs that occur in these clauses. Not only are the ratios of verbs parallel in the three varieties; the same kind of double marking also occurs in the IrE data, as exemplified in (29) and (30), though without cliticization:

(29) let me know how is all the boys and girls is and how they are getting along
(30) I want to {{k}}now how is David John Taylor is geting along and Paddy Heron also

Table 2. Occurrence of verbs in inverted embedded constituent interrogatives

+ inv        IrE    IndE   SingE
be           94%    98%    89%
modal         0%     0%     7%
full verb     6%     2%     3%
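As a cross-check, the IndE and SingE columns of Table 2 can be recovered by pooling the rows of Table 1 into the three verb types. The sketch below assumes a grouping that the text suggests but does not state as a procedure: is, ’s, are and was count as be, would and other modal as modal, and why don’t as the only full-verb construction. The variable and function names are illustrative.

```python
# Percentages copied from Table 1 (inverted embedded constituent interrogatives).
table1 = {
    "IndE":  {"is": 49, "'s": 44, "are": 2, "was": 3,
              "why don't": 2, "would": 0, "other modal": 0},
    "SingE": {"is": 43, "'s": 33, "are": 11, "was": 2,
              "why don't": 3, "would": 3, "other modal": 4},
}

# Assumed grouping of the Table 1 rows into the verb types of Table 2.
verb_type = {
    "is": "be", "'s": "be", "are": "be", "was": "be",
    "would": "modal", "other modal": "modal",
    "why don't": "full verb",
}

def pool_by_verb_type(percentages):
    """Sum the percentage shares of all forms belonging to each verb type."""
    pooled = {"be": 0, "modal": 0, "full verb": 0}
    for form, share in percentages.items():
        pooled[verb_type[form]] += share
    return pooled

pooled = {variety: pool_by_verb_type(rows) for variety, rows in table1.items()}
for variety, shares in pooled.items():
    print(variety, shares)
```

Under this grouping the pooled shares are 98/0/2 for IndE and 89/7/3 for SingE, matching Table 2. The SingE figures add up to 99 rather than 100, presumably because the percentages in Table 1 are rounded.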


As a first conclusion from these results, it can be said that the factor “verb type”, which plays a role in governing the occurrence and non-occurrence of inversion in IndE and SingE, certainly also plays a role in inversion in IrE constituent interrogatives. Thus, IrE does not seem to be very different from the other two varieties, and imitation as a learner strategy rather than substrate influence suggests itself as a more plausible explanation of the feature. This, however, is very different for embedded polar interrogatives, which were not considered in the analysis of the IndE and SingE data above, but will be compared directly to the IrE data here. Inversion in embedded polar interrogatives is as common as in constituent interrogatives in the IrE data, as in the following examples from the corpus:

(31) You ask me is Hamilton home from New Zealand.
(32) ...who asked me would I go with him as a servant
(33) Let me know did your sister send for Thomas.

As these examples suggest, the verbs in IrE inverted clauses are not restricted to forms of to be, but include modals and full verbs with do-periphrasis. In fact, the distribution of these verb types is significantly different from the one in embedded constituent interrogatives (Table 3). Full verbs appear in half of all these embedded polar interrogatives, roughly twice as often as either be or modal auxiliaries. In IndE and SingE on the other hand, embedded inversion hardly ever occurs in polar interrogatives. If it occurs at all, it does so with forms of to be, as in (34).

(34) Anyway I ask is that Barry. (ICE-Sing-s1a-092)

Instead of inversion, the construction with subordinating if and, particularly, whether, is generally used in these two varieties, as in (35):

(35) And of course I don’t know whether for my son-in-law anything is available or not (ICE-Ind-s1a-096)

To sum up, embedded inversion in IrE constituent interrogatives shares properties with embedded inversion in IndE and SingE, which suggests a common explanation along the lines of learner strategies and the imitation of frequent unanalyzed chunks as the origin of inversion of subject and verb in interrogative clauses.

Table 3. Occurrence of verbs in inverted embedded polar interrogatives

+ inv        IrE    IndE   SingE
be           22%   100%    90%
modal        28%     0%     0%
full verb    50%     0%    10%


Embedded inversion in IrE polar interrogatives, however, is clearly distinct from the patterns of inversion shown for IndE and SingE. Thus, different explanations for the feature’s occurrence in the different varieties are needed, and substrate influence as a potential source of the feature in IrE comes back into play.2

3. Inversion as imitation rather than rule overgeneralization

3.1 Tendencies from individual SLA

Non-inversion in main clause interrogatives and embedded inversion are well-known phenomena of L2 acquisition. Inversion phenomena found in L2 acquisition share important properties with inversion found in the two Asian varieties of English studied above. The two central elements of the tendencies found in the analysis above have also been reported for individual SLA of English. Firstly, the statistical difference between inversion of the verb be and inversion of modals and full verbs (the latter with do-periphrasis), as found for the varieties of English discussed above, reflects the order of acquisition often stated for inversion in SLA (cf. Adams 1978; Ravem 1978; Shapira 1978; Tiphine 1983), that is

be > (frequent) modals > full verbs with do-periphrasis

Secondly, the importance of formulaic language has been stressed by several authors. Ellis (2005), for instance, describes where’s and what’s + NP as particularly frequent chunks in his studies; likewise, Weinert (1995) stresses the central role of the copula and frequent modal verbs in the formation of formulas in interrogatives. Thus, inversion phenomena in IndE and SingE varieties of English apparently share important properties with L2-acquisition strategies. The term “overgeneralization” could still be applied for the features identified, but only to the extent that it is not the inversion rule of main clause interrogatives that is overgeneralized, but rather that fixed chunks consisting of an interrogative pronoun, a frequent verb and possibly also the following noun phrase are available for both main and embedded interrogatives. Likewise, the “mirror image” inversion in IndE (cf. Bhatt 2004) can be analyzed as overgeneralization to the extent that inversion in both clause types is based on the availability of fixed chunks (interrogative pronoun + is). If the imitation and use of fixed chunks are among the primary factors governing the occurrence of inversion and non-inversion in L2 varieties of English (and plausibly also in individual SLA), any further analysis aiming at an explanation that enables us to predict inversion patterns is based on the following hypotheses:

2. Different origins for embedded inversion become all the more plausible in the light of the fact that the two types of interrogatives also differ in grammaticality judgements (see Filppula 1999).


i. Formulaic language plays a role in the occurrence of inversion.
ii. Frequency is a major factor.
iii. Frequent strings are more likely to be inverted.

How can these hypotheses be tested? So far, no systematic attempt at quantifying formulaic language has been made in the domain of language acquisition (to the knowledge of the author). The following section will therefore look at the distribution of verb types and the subjects involved in inversion in more detail in order to see whether the concept of “formulaic language” or “fixed chunks” can actually be used in other cases to predict the occurrence of inversion in interrogative clauses.

3.2 An attempt at quantifying “formulaic language”

Since inversion manifests itself in the specific order of the interrogative pronoun, the subject and the verb of the clause, and since frequency seems to play a role in the occurrence or non-occurrence of inversion (as shown for the main verb in the preceding sections), particularly frequent combinations of these three elements should be more likely to be inverted than non-frequent combinations. This hypothesis will be tested by means of frequency hierarchies for some selected types of interrogative clauses. It has to be noted that this is a first attempt at finding quantitative tendencies that are likely to be subject to the implications of the “formulaic language” hypothesis. (For a more detailed analysis of these tendencies and their exceptions, cf. Hilbert forthcoming.)

The first case to be analyzed is polar interrogatives in IndE with subject you. You is the most frequent subject to appear in interrogatives in general, and thus provides the largest database for this approach. Figure 4 shows the frequency of operators with subject you in IndE polar interrogatives. Modal verbs occur most frequently in these clauses, followed by the other auxiliaries have and are. The full-verb uses of these two primary verbs are less frequent, and the past tense auxiliary were is the least frequent verb to occur.

[Figure 4: bar chart; frequency (scale 0–35) by verb type, in the order modal, have aux, are progr, are fv, have fv, were aux]

Figure 4. Frequency of verb types in polar interrogatives (IndE)

Figure 5 displays the inversion ratios for these verbs, with a hierarchy of inversion that parallels their frequency almost completely, except for have and the modals, which are, however, relatively close in both figures. As a result it can safely be concluded for the data analyzed that the more often a specific operator combines with the subject you in IndE polar interrogatives, the more likely it is to be inverted.

[Figure 5: stacked bar chart; share of + inv and – inv clauses (0–100%) by verb type, in the order have aux, modal, are progr, are fv, have fv, were aux]

Figure 5. Inversion ratios in IndE polar interrogatives

A second case involves main clause constituent interrogatives with the interrogative pronoun what and subject NP in IndE. The frequency of all occurring verb types in these clauses is shown in Figure 6. Forms of to be in its main verb use occur most frequently in this construction, as well as full verbs, followed by the auxiliary use of to be and the modals. The two uses of have are the least frequent verbs to occur in these clauses. This again is paralleled, though not as clearly as in the preceding case, by the ratios of inversion for these verb types (Figure 7). The auxiliary uses of be overtake the full verbs in their inversion ratio, which could be explained by the fact that “full verb” includes a higher number of different verbs than the different forms of be, a relevant aspect with regard to the notion of fixed chunks (involving combinations with specific verbs).

[Figure 6: bar chart; frequency (scale 0–60) by verb type, in the order be fv, fv, be progr, modal, have fv, have aux]

Figure 6. Frequency of verb types with subject NP in what interrogatives (IndE)

[Figure 7: stacked bar chart; share of + inv and – inv clauses (0–100%) by verb type, in the order be fv, be progr, fv, modal, have fv, have aux]

Figure 7. Inversion ratios in IndE what interrogatives with subject NP

The third case to be analyzed along these lines is subject types in general. Figure 8 shows the subject types to occur in IndE polar interrogatives, with you constituting the large majority, followed by the other personal pronouns. Full noun phrases (proper names and common nouns) and the pronouns that and there occur only in a minority of interrogatives of this type. The inversion ratios of these subject types are displayed in Figure 9, and in this case, they do not parallel the plain frequency hierarchy in Figure 8. Thus, mere frequency of occurrence is not the primary factor to govern the occurrence of inversion, but rather the frequency of specific combinations, as shown in the other two cases above.

[Figure 8: bar chart; frequency (scale 0–80) by subject type, in the order you, it, prn other, proper name, full NP, that, there]

Figure 8. Frequency of subject types in IndE polar interrogatives

[Figure 9: stacked bar chart; share of + inv and – inv clauses (0–100%) by subject type, in the order proper, that, full NP, he, she, you, they, I, it, there]

Figure 9. Inversion ratios of subject types in IndE polar interrogatives
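The degree to which the inversion hierarchy parallels the frequency hierarchy in the first two cases can be expressed as a rank correlation. The following sketch is not part of the original analysis: it assumes that the order in which the categories are listed in Figures 4 to 7 reflects their rank (most frequent, or most often inverted, first), and it applies the standard Spearman formula for rankings without ties. The function name is hypothetical.

```python
def spearman_rho(order_a, order_b):
    """Spearman rank correlation for two orderings of the same items (no ties)."""
    if len(order_a) != len(order_b) or set(order_a) != set(order_b):
        raise ValueError("both orderings must contain the same items")
    if len(set(order_a)) != len(order_a):
        raise ValueError("items must be distinct")
    n = len(order_a)
    rank_b = {item: rank for rank, item in enumerate(order_b)}
    d_squared = sum((rank_a - rank_b[item]) ** 2
                    for rank_a, item in enumerate(order_a))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Case 1: polar interrogatives with subject you (category order of Figures 4 and 5).
freq_polar = ["modal", "have aux", "are progr", "are fv", "have fv", "were aux"]
inv_polar = ["have aux", "modal", "are progr", "are fv", "have fv", "were aux"]

# Case 2: what interrogatives with subject NP (category order of Figures 6 and 7).
freq_what = ["be fv", "fv", "be progr", "modal", "have fv", "have aux"]
inv_what = ["be fv", "be progr", "fv", "modal", "have fv", "have aux"]

rho_polar = spearman_rho(freq_polar, inv_polar)
rho_what = spearman_rho(freq_what, inv_what)
print(round(rho_polar, 3), round(rho_what, 3))
```

In both cases only one adjacent pair of categories swaps places, so under these assumptions the coefficient is about 0.94 for each. With six categories this is a description of the parallelism, not a significance test.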

Since the inversion ratios of the subjects in these interrogatives cannot be accounted for by their frequency, a more explanatory approach would be to look at the different verbs these subjects combine with. This approach is suggested by the fact that there and it are most likely to be inverted, although they are not very frequent relative to the other types of subjects. The few clauses in which they appear, however, are all clauses with is as the main verb, and it is plausible to assume that is there and is it can be regarded as candidates for the status of unanalyzed fixed chunks. On the other hand, proper nouns and other full (i.e. non-pronoun) NPs will not appear in frequent combinations with specific verbs, firstly for semantic reasons (different kinds of actions, for instance, can be ascribed to referents denoted by full noun phrases much more plausibly than to e.g. there); and secondly because there is an infinite number of different NPs that can fill this position in the clause and thus a higher number of different combinations than for the closed word class of pronouns. Consequently, if the likelihood of inversion for a given subject type is hypothesized to be connected to the number of verbs with which it can be combined, or actually occurs with in the data, then it should be possible to calculate a “combinability quotient” for these subjects, which includes the number of different main verbs occurring with a given subject and the total frequency of this subject. The resulting quotient expresses the “combinability” of a given subject: the higher the quotient, the more variable this subject is with regard to the range of verbs it combines with. A low quotient indicates that a subject occurs with only a limited number of different verbs and thus is most likely to occur in relatively frequent fixed chunks.

Combinability quotient = (number of different main verbs occurring with a given subject) / (total frequency of the given subject)
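The quotient defined above can be computed directly from a list of subject and main verb pairs. The sketch below is one possible reading of the definition: it assumes one pair per interrogative clause and counts verbs as word forms (so that is and are count as two different verbs), which the text does not specify. The function name and the clauses are invented for illustration and are not taken from ICE-India.

```python
from collections import defaultdict

def combinability_quotients(clauses):
    """Map each subject to (number of distinct main verbs) / (subject frequency).

    `clauses` is an iterable of (subject, main_verb) pairs, one per clause.
    """
    verbs_by_subject = defaultdict(set)
    frequency = defaultdict(int)
    for subject, verb in clauses:
        verbs_by_subject[subject].add(verb)
        frequency[subject] += 1
    return {s: len(verbs_by_subject[s]) / frequency[s] for s in frequency}

# Invented toy clauses (NOT corpus data), for illustration only.
toy_clauses = [
    ("there", "is"), ("there", "is"), ("there", "is"), ("there", "are"),
    ("you", "can"), ("you", "have"), ("you", "are"), ("you", "can"),
    ("John", "like"),
]

quotients = combinability_quotients(toy_clauses)
for subject, q in sorted(quotients.items(), key=lambda kv: kv[1]):
    print(f"{subject}: {q:.2f}")
```

A subject that occurs only once necessarily receives a quotient of 1, so the measure is most informative for subjects with a reasonable number of attestations.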


Table 4. Combinability of subjects (IndE polar interrogatives)

Subject        Combinability quotient
proper name    1
full NP        1
he/she         0.88
that           0.5
you            0.49
they           0.43
I              0.43
there          0.17
it             0.16

The combinability quotients for the subjects in IndE polar interrogatives are shown in Table 4. Proper names and other full NPs never occur more than once with the same verb in the data analyzed here, and the third-person singular pronouns he and she behave similarly. There is a group of pronouns (personal pronouns and that) which combine with a smaller variety of verbs, and, as expected above, there and it occur with a very limited range of verbs, mainly is and other forms of to be. The inversion hierarchy to be expected from these figures, running from the subject types least likely to invert to those most likely to invert, is:

proper name > full NP > he/she > that > you > they/I > there/it

This order parallels much more consistently the inversion ratios of the different subject types, with only that not fitting into the picture (cf. Figure 9). To illustrate the explanatory potential of a combinability quotient again: There inverts in 100% of all cases, although it is comparatively infrequent. It occurs, however, only in the existential construction with forms of to be, predominantly with is, less frequently with are and was.

(36) Is there any possibility of finding a trustworthy painter? (ICE-Ind-s1a-093)

I and it are also relatively infrequent as subjects, but occur with a limited number of verbs in polar interrogatives: I occurs mostly with modal verbs, as in (37), and it mostly with is, as in (38):

(37) Can I ask the question? (ICE-Ind-s1a-026)
(38) Is it a big campus? (ICE-Ind-s1a-022)

The empirical conclusion to be drawn from these exemplary results is that the primary factor governing the occurrence and non-occurrence of inversion in this data indeed seems to be “formulaic language”, i.e. the frequency of specific strings: the more frequently a specific combination of [interrogative pronoun] + [subject] + [main verb] occurs, the more likely this string is to be inverted in both main clause and embedded interrogatives. With regard to methodology and theoretical accounts in the study of varieties of English, conclusions can be drawn from the empirical results of this paper in two respects: firstly, with regard to the study of non-standard phenomena occurring across a range of varieties of English; and secondly with regard to the alleged “paradigm gap” in explaining phenomena that occur both in varieties of English and individual SLA.

4. Methodological and theoretical conclusions

Regarding the study of non-standard phenomena in varieties of English, this paper has shown that a cross-variety perspective is relevant and essential since it increases the requirements as to the conditions under which potential explanations are to hold. This has become apparent in the fact that explanations and potential sources of a phenomenon might be plausible for one variety, but not for the apparently identical phenomenon in another variety (a case in point is embedded inversion, for which a substrate explanation only holds for Irish English). We must, however, be very careful with attempting generalizations across a range of varieties of English. Concentrating on the mere presence or absence of phenomena, which has mostly been done in approaches focusing on notions such as “vernacular universals”, “Angloversals” etc., may mask important details. A feature may have different causes in different varieties, and these can only be detected by a closer scrutiny of the factors that govern the occurrence of a given feature.
These factors may well be very different, as shown for embedded inversion in IrE as against embedded inversion in IndE and SingE (even though the differences are restricted to only one subtype of embedded interrogatives). Initial hypotheses about “universal” status have to be based on a specific domain (e.g. overgeneralization in SLA) in order to be able to serve as explanations and to allow predictions as to the occurrence and non-occurrence of specific phenomena. Regarding the alleged paradigm gap this paper started with, between research into varieties of English and individual SLA, similar “routes of development” (Mesthrie & Bhatt 2008) can be found in the non-standard use of inversion in both settings. Firstly, what constitutes a developmental sequence in SLA is reflected in quantitative statistical tendencies for varieties of English (here: verb types and likelihood of inversion). Secondly, a hypothesis from SLA, the role of formulaic language in acquisition, has likewise been observable (and testable) in the data


from English varieties. And maybe not accidentally: the more stable nature of varieties might make it easier to detect and quantify such tendencies than the (by definition) more fluctuating nature of individual SLA. In connection with the use of data, this paper has shown that corpus data and corpus-linguistic methods can make a significant contribution to the testing and development of hypotheses existing in both linguistic fields; in this case, for instance, the concept, role and methods of quantification of formulaic language. The parallels between statistical tendencies in varieties and developmental sequences in SLA are apparent for interrogative clauses; whether they also hold for other phenomena remains to be investigated. Irrespective of the open questions regarding the exact nature of this relation, corpus research into varieties of English can provide an important additional testing ground for hypotheses and concepts originating in individual SLA research. The further development of learner corpora, preferably with a diversity of substrate languages similar to that available for non-standard varieties of English, would be an important contribution to this line of research. The programme would then consist in setting aside the paradigm gap and instead conducting parallel analyses of phenomena in both types of acquisition settings and testing hypotheses from both fields of research. Thus, one of the aims of this paper has been to suggest, with all due respect for the differences between the two settings, that using data from one field in order to test hypotheses originating in the other seems to be a way towards bridging the paradigm gap by intentionally ignoring it. Only when we can test hypotheses and observations in one setting against the other can we come up with an integrated framework for both.
Such an integrated model would be a constructive and, in the long run, an indispensable goal, given that even within the countries in which English is officially spoken as a second language, we find diverging settings for individual groups of speakers. For some, English will be their only first language as a consequence of complete language shift, for others it will be one of two or more first languages, or a second language (both as a consequence of partial language shift); still other members might learn English as a foreign language in a setting of tutored learning, and another group might only acquire and use it as a lingua franca with restricted input. Thus, even within the field of research into L2 varieties of English, an integrated model is essential.

References

Adams, M. 1978. Methodology for examining second language acquisition. In Second Language Acquisition: A Book of Readings, E.M. Hatch (ed.), 277–296. Rowley MA: Newbury House.
Bhatt, R.M. 2004. Indian English: Syntax. In A Handbook of Varieties of English, Vol. 2: Morphology and Syntax, B. Kortmann, K. Burridge, R. Mesthrie & E. Schneider (eds), 1016–1030. Berlin: Mouton de Gruyter.
Cazden, C., Cancino, H., Rosansky, E. & Schumann, J. 1975. Second Language Acquisition Sequences in Children, Adolescents and Adults. Washington DC: United States Department of Health, Education and Welfare.
Chambers, J.K. 2004. Dynamic typology and vernacular universals. In Dialectology Meets Typology: Dialect Grammar from a Cross-linguistic Perspective, B. Kortmann (ed.), 127–145. Berlin: Mouton de Gruyter.
Ellis, R. 2005. Analysing Learner Language. Oxford: OUP.
Filppula, M. 1999. The Grammar of Irish English: Language in Hibernian style. London: Routledge.
Filppula, M. 2004. Irish English: Morphology and syntax. In A Handbook of Varieties of English, Vol. 2: Morphology and Syntax, B. Kortmann, K. Burridge, R. Mesthrie & E. Schneider (eds), 73–101. Berlin: Mouton de Gruyter.
Fraser, J. 1977. A Tentative Short Dictionary of Fitzroy Crossing Children’s Pidgin. From data collected Oct-Nov 1974; revised March 1977. Ms, Darwin.
Gupta, A.F. 1994. The Step-Tongue: Children’s English in Singapore. Clevedon: Multilingual Matters.
Hilbert, M. Forthcoming. Interrogative Constructions in Varieties of English. PhD dissertation, University of Hamburg.
Kortmann, B. & Szmrecsanyi, B. 2004. Global synopsis: Morphological and syntactic variation in English. In A Handbook of Varieties of English, Vol. 2: Morphology and Syntax, B. Kortmann, K. Burridge, R. Mesthrie & E. Schneider (eds), 1122–1182. Berlin: Mouton de Gruyter.
Mair, C. 2003. Kreolismen und verbales Identitätsmanagement im geschriebenen jamaikanischen Englisch. In Zwischen Ausgrenzung und Hybridisierung, E. Vogel, A. Napp & W. Lutterer (eds), 79–96. Würzburg: Ergon.
McDavid, V. & Card, W. 1973. Problem areas in grammar. In Culture, Class, and Language Variety: A Resource Book for Teachers, A.L. Davis (ed.), 26–54. Urbana IL: National Council of Teachers of English.
Mesthrie, R. & Bhatt, R. 2008. World Englishes. Cambridge: CUP.
Montgomery, M. 2006. The morphology and syntax of Ulster Scots. English World-Wide: A Journal of Varieties of English 27(3): 295–329.
Ravem, R. 1978. Two Norwegian children’s acquisition of English syntax. In Second Language Acquisition: A Book of Readings, E.M. Hatch (ed.), 148–154. Rowley MA: Newbury House.
Reaser, J. & Torbert, B. 2008. Bahamian English: Morphology and syntax. In Varieties of English, 2: The Americas and the Caribbean, E. Schneider & B. Kortmann (eds), 591–608. Berlin: Mouton de Gruyter.
Shapira, R. 1978. The non-learning of English: A case study of an adult. In Second Language Acquisition: A Book of Readings, E.M. Hatch (ed.), 246–255. Rowley MA: Newbury House.
Sridhar, K. & Sridhar, S. 1986. Bridging the paradigm gap: Second language acquisition theory and indigenized varieties of English. World Englishes 5(1): 3–14.
Thomas, L. 1989. Just Another Day: A Play. Suva: University of the South Pacific.
Tiphine, U. 1983. The Acquisition of English Statements and Interrogatives by French Speaking Children. PhD dissertation, University of Kiel.
Weinert, R. 1995. The role of formulaic language. Applied Linguistics 16(2): 180–205.

Overuse of the progressive in ESL and learner Englishes – fact or fiction?*

Marianne Hundt and Katrin Vogel
University of Zurich and University of Heidelberg

New Englishes and learner varieties of English are both reported to overuse the progressive. Furthermore, previous research suggests that speakers of English as a Second Language (ESL) and learners of English as a Foreign Language (EFL) use the progressive construction differently, at times, from the way it is used by native speakers. Previous research, for the most part, has looked at New Englishes and Learner English separately. This study combines the data on the use of the progressive in academic writing by ESL and EFL speakers. As research on the progressive in varieties such as British and New Zealand English has shown, there are also differences in the ongoing spread of the progressive in English as a native language (ENL), so it is not enough to compare ESL and EFL varieties with just one ENL variety. This paper therefore brings together evidence from corpora of inner, outer and expanding-circle varieties of English to test the hypothesis of ‘overuse’ and ‘deviation’ in ESL and EFL use of the progressive. The results show that we might have to reconsider some of the models of World English that suggest neat divides between ENL, ESL and EFL usage.

1. Introduction

The more extensive and deviant use of the progressive is a feature that is frequently mentioned within the context of both New Englishes and learner varieties of English. Rogers (2002: 193), for instance, remarks on the use in her Indian English data: “I noticed that the progressive form seemed to occur more frequently than one might have expected.” Westergren Axelsson & Hahn (2001: 5) observe that

[t]he progressive is a feature of English grammar that is difficult to handle for nonnative speakers, both teachers and students. One consequence is that the progressive is claimed to be used too often and in the wrong places by Swedes and Norwegians.

* We would like to thank the participants of the ISLE workshop, Joybrato Mukherjee, Danielle Hickey and Kevin McCafferty for comments on earlier versions of this paper.

 Marianne Hundt and Katrin Vogel

To date, the use of the progressive in English as a Second Language (ESL) and Learner Englishes (EFL) has mostly been studied separately, usually with a ‘parent’ or ‘native’ variety such as British English (BrE) or American English (AmE) as the yardstick for comparison (see Gachelin 1997, Rogers 2002 and Vogel 2007 for New Englishes or Virtanen 1997, Westergren Axelsson & Hahn 2001 and Wulff & Römer 2009 for Learner Englishes compared with native varieties); a notable exception is van Rooy (2006, this volume) who compares Black South African English (BSAfE) with BrE on the one hand and German Learner English on the other hand. He found that the learner data were very similar to the native speaker data, whereas BSAfE provided evidence of new aspectual uses that he attributes to substrate influence from Bantu languages. We believe, however, that it is not enough to simply compare one variety each from the inner, outer and expanding circles as this approach is likely to over-emphasize differences between inner-circle and outer-circle varieties or differences between ‘native’ English and Learner English. An additional complication is the fact that the progressive is undergoing change in Present-day English (PDE), so we have to take variation among inner-circle varieties into account as well. Research on the progressive suggests that, indeed, some varieties in the inner circle are more advanced in the spreading use of the progressive than others (see Hundt 1998 and Collins 2009). Previous studies have also shown that within national varieties of English, considerable differences exist in the use of the progressive in different genres (see Smith 2005).1 It is therefore of paramount importance that a cross-varietal study be based on a comparable set of data.
At the same time, the spread of ENL, ESL and EFL varieties should be as broad as possible, so we will consider ESL varieties at different developmental stages as well as a range of EFL varieties with typologically different first language backgrounds. The aim of our paper is to bring together evidence from corpora of inner-circle, outer-circle and expanding-circle varieties of English to test the hypothesis of ‘overuse’ and ‘deviation’ in ESL and Learner Englishes. Specifically, we would like to address the following research questions:

1. Do ESL and EFL varieties of English, in general, use more progressives than ENL varieties?
2. Will we find gradient patterns of usage ranging from ENL to ESL and EFL varieties (with internal gradience in ESL varieties reflecting the degree of institutionalization), as Nesselhauf (2009) did for some co-selection phenomena?
3. Do both ESL and EFL varieties provide examples of stative verbs in the progressive and do they use them to the same extent?
4. Will we find aspects of usage in which ESL varieties are different from both ENL and EFL varieties, as van Rooy (2006) did?

1. Some important findings of this study are also reported in Leech et al. (2009: Chapter 6).

Overuse of the progressive in ESL and learner Englishes – fact or fiction?

In the following part of our paper, we will briefly contextualize the varieties that we investigated within the theoretical framework of world Englishes. In sections three and four, we will introduce the empirical basis of our study and outline our definition of the syntactic variable. The results will be presented and discussed in parts five and six. The conclusion will relate our findings to the concerns of the present volume – the connection between second language varieties and Learner Englishes.

2. Background

In our short introduction, we used the terms ‘New Englishes’, ‘Learner English’ and ESL; we also made reference to Kachru’s (1986) concentric-circles model. The most uncontroversial point is that British English (BrE), Irish English (IrE) and New Zealand English (NZE) are so-called inner-circle varieties of English – the data that we use for these varieties are texts that were produced by people who grew up with English as their first and often only language. In expanding-circle countries such as Germany, Sweden or Finland, English is taught as a foreign language at school, and it is not institutionalized as a second or co-official language. Texts produced by people from these countries are therefore typically classified as instances of Learner English or EFL. (This does not rule out that people in these countries occasionally achieve near-native proficiency in English, or, as Fraser Gupta (2006: 99) puts it, “[s]kill in Standard English is certainly not linked to native-speakerdom”). We do not consider EFL uses of English as varieties in their own right.2 In Singapore, the Philippines, Kenya and Fiji, English is an official, institutionalized variety, i.e. it is used as the medium of instruction in secondary and tertiary, sometimes also primary education.
A substantial number of speakers in these countries use English as a second language and some speakers may even grow up using English as their first language.3 These countries are typically classified as belonging to the outer circle. Schneider (2003, 2007), in his dynamic modelling of the evolution of New Englishes, distinguishes five phases: foundation, exonormative stabilization, nativization, endonormative stabilization and differentiation. According to his model, the postcolonial Englishes in Singapore, Malaysia, the Philippines, Kenya and Fiji have to be placed at different developmental stages:

a. Singapore English (SingE) is the most advanced variety which has progressed into stage four – a nativized and stabilized New English that is beginning to be recognized as a local standard, a variety that has been said to be on the way to developing into an inner-circle variety (see Pakir 2001, Foley 2001, Schneider 2007);
b. Kenyan English is at stage three of the cycle (ongoing nativization, no codification);
c. Philippine English (PhilE) is also at stage three, but according to Schneider (2007: 143) external pressure is likely to stall any further developments beyond this stage;
d. Fiji English is still at stage two, i.e. shows borrowing from the indigenous language but only incipient structural nativization;4
e. in Malaysia, English is no longer an official language but has been replaced by Bahasa Malay, most importantly also in the education sector (see Schneider 2007: 147f.); the situation has recently changed again in that English has been reintroduced as a medium of education for two subjects, namely science and maths (see Gill 2005);5 according to Schneider (2007: 148ff.), Malaysian English shows signs of nativization; on account of its past colonial history, its present status and the structural effects of nativization, Malaysian English is thus expected to take an intermediate position between the typical outer-circle varieties in Singapore, Kenya and Fiji on the one hand and expanding-circle varieties where English is used as a foreign language on the other hand.6

2. Note that exposure to English, e.g. via the media, varies considerably from one expanding circle country to another; in Scandinavia, for instance, but not in Germany, English films are generally not dubbed. Furthermore, with the introduction of English at an early stage in education and the growing importance of English as a medium of instruction in tertiary education, the status and functions of English may be in a state of flux even in outer- and expanding-circle countries (see Kachru & Nelson, 2006: 30). As to whether English as a Lingua Franca (ELF) or Euro-English are varieties in their own right, contrast the views in Seidlhofer (2001) and Mollin (2006).

3. Singapore, for instance, has a growing number of speakers who use English as their first language (see Schneider, 2007: 157).

4. Note, however, that the Fiji English lexis has been described and codified (see Tent, Geraghty & Mugler 2006) and that work is underway to describe aspects of Fiji English grammar.

5. For a detailed discussion of the discourse accompanying this change, see Gill (2008).

6. Despite the different development of English in Singapore and Malaysia, a lot of books treat the two varieties together. Pakir (2001: 11), for instance, claims that speakers in both countries are “making the shift to an Inner Circle membership [...]” and Lim (2001: 125), in the same volume, uses the label ‘SME’ to refer to Singaporean and Malaysian English as one variety but sets out to discuss structural similarities and differences between the two varieties on the basis of newspaper corpora. For the use of the label Malaysian-Singapore English, see also Kachru & Nelson (2006: 115).


A somewhat more problematic notion is that of contact vs. non-contact varieties of English. According to Sand (2004: 281), “all varieties of English which have experienced continued and intensive language contact over extended periods of time since the Early Modern English period when the fundamentals of Modern English grammar emerged” are defined as contact varieties. Obviously, BrE would be a non-contact variety according to this definition, whereas NZE (though an inner-circle variety) might potentially be a contact variety of English. The status of IrE is complicated by the fact that it emerged in a language contact situation between English and Celtic, but obviously long before the Early Modern English period. Sand (2004: 282) therefore considers it an ‘old’ language contact variety.7 The distinction between contact and non-contact varieties (or ‘contact’ and ‘continuity’ varieties, as Fraser Gupta (forthcoming) calls them) might be important for the discussion of possible substrate influence in some varieties which we would then expect to be absent from BrE (and possibly also IrE). Whether such a distinction can be upheld in view of linguistic facts will be discussed below.

3. Data

As our source of empirical evidence, we used essays and exam scripts that were produced by students from the set of countries introduced earlier. For Kenya, Singapore, the Philippines, Ireland, New Zealand and Britain, we were able to draw on the 10 untimed essays and 10 exam scripts sampled for the publicly available components of the International Corpus of English (ICE). The Fijian sub-corpus is still being compiled, but the data we used will eventually be made available as part of ICE-Fiji. For Malaysia, a matching set of essays and exam scripts was collected from Malaysian students who had been enrolled in a TESOL M.A. course at Victoria University, Wellington, New Zealand (see Table 1).8 Note that the interpretation of the text categories sampled in ICE may vary because not just language structure but text types may be localized, so to speak (see Biewer et al. 2010 for more details). Some untimed student essays in ICE-Phil, for example, are very informal, as the following example illustrates:

7. Kevin McCafferty (p.c.) claims that IrE developed at the same time as AmE.

8. Special thanks go to Jonathan Newton, Victoria University, Wellington, who provided the contacts with the Malaysian students; we would also like to express our gratitude to those students who sent us their essays and gave permission for their exam scripts to be included in the corpus; finally, a special word of thanks goes to Ingrid Fauser who helped with the digitalization of the exam scripts.


Table 1. Composition and size of our Student Writing Corpus

              Essays    Exam Scripts      Total
ICE-Fiji      20,196          19,605     39,801
ICE-Ken       20,054          20,085     40,139
ICE-Phil      21,693          21,259     42,952
ICE-Sing*     23,558          21,888     45,446
Malaysia      20,135          20,998     41,133
ICE-Ire       20,934          21,088     42,022
ICE-NZ**      20,966          21,257     42,223
ICE-GB        21,325          21,262     42,587
Total        168,861         167,442    336,303

*ICE-Sing includes the complete essays; these were initially searched before the word counts were made; therefore, the size of this sub-corpus slightly exceeds that of the other ICE sub-corpora.
**Word counts for the ICE-NZ essays were kindly provided by Mark Chadwyck.

(1) Plato would suggest aristocracy. And Freud would .... ehehehe ... As for me ... Er ... Argh. Math is so much easier. P. S. I didn’t realize how hard it is to write something that has to do with Philosophy until now. Too many thoughts. (ICE-Phil W1a-001)

The different text-typological conventions, e.g. with regard to formality, will have to be borne in mind for the interpretation of the results. Another problem for cross-corpus comparability is the fact that in the ICLE corpora, timed and untimed essays are not strictly balanced in number and that many studies tend to treat them as one category only. Table 2 therefore only gives the total number of words for the learner sub-corpora. The comparative data for the Learner Englishes come from two previous studies based on components of the International Corpus of Learner English (ICLE): Virtanen (1997) provides evidence of the use of the progressive by Swedish, Finland-Swedish and Finnish learners; Westergren Axelsson & Hahn (2001) and Wulff & Römer (2009) are studies that make use of different versions of the German ICLE component which are therefore quoted as ICLE-German1 and ICLE-German2.


Table 2. Size of the ICLE-sub-corpora

ICLE-German*             78,856
ICLE-Swedish            100,863
ICLE-Finland-Swedish     55,332
ICLE-Finnish            119,919
Total                   354,970

*A more recent version of ICLE-German, namely ICLE-German2 (see below), contains approx. 234,000 words.

Finally, as the study by Wulff & Römer (2009) has shown, the level of proficiency of the learners plays an important role. It might be the case that the proficiency levels in our data set are not strictly comparable. This caveat applies not only to the ICLE corpora, but also to the ICE components, since academic writing is a skill that native speakers, too, have to acquire in their own language.

4. Definition of the variable

The verbal construction that this study is concerned with consists of a form of the auxiliary verb be followed by the present participle ending in -ing:

(2) Joan is singing well.
(3) I was reading a novel yesterday evening.

However, not all combinations of be and a form ending in -ing constitute progressives. Superficially similar constructions, which have to be excluded, are predicative adjectival participles (4), equative constructions containing gerunds (5), and appositively used participles (6):

(4) This book is fascinating.
(5) To do so is taking a prudent step.
(6) I was in the kitchen, washing dishes.
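The definition above implies a two-step retrieval: a pattern search for a form of be followed by an -ing form, and then a manual check that removes look-alike constructions. The following is a minimal sketch of the first step only. The regular expression, the list of be-forms and the function name `find_candidates` are illustrative assumptions, not the search procedure reported in this study. The pattern also skips going followed by to as a rough proxy for the future marker be going to, which the study excludes from its counts; that shortcut would wrongly discard genuine progressives of go followed by a to-phrase.

```python
import re

# Forms of the auxiliary "be" (contracted forms are left out for simplicity).
BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"

# A be-form, optionally one adverb-like word ("still", "not", or a word in -ly),
# then a word ending in -ing. "going to" is skipped (hypothetical shortcut for
# the future marker "be going to").
CANDIDATE = re.compile(
    rf"\b{BE_FORMS}\s+(?:(?:still|not|\w+ly)\s+)?(?!going\s+to\b)(\w+ing)\b",
    re.IGNORECASE,
)


def find_candidates(sentence: str) -> list[str]:
    """Return the -ing forms that directly follow a form of 'be'.

    These are candidates only: adjectival participles and gerunds in
    equative constructions are retrieved as well and need manual removal.
    """
    return [match.group(1).lower() for match in CANDIDATE.finditer(sentence)]


examples = [
    "Joan is singing well.",                  # (2) progressive
    "I was reading a novel yesterday evening.",  # (3) progressive
    "This book is fascinating.",              # (4) adjectival participle
    "To do so is taking a prudent step.",     # (5) equative with gerund
    "I was in the kitchen, washing dishes.",  # (6) appositive participle
]
for sentence in examples:
    print(find_candidates(sentence), sentence)
```

In this sketch, (4) and (5) are retrieved alongside the true progressives (2) and (3), which illustrates why a purely formal search cannot replace the manual classification described here.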

When it comes to its status as a grammatical category, the progressive is widely regarded as an aspect. Following Quirk et al. (1985: 197f.), its core meaning is to indicate a situation in progress at a given time. This meaning consists of the three components duration, limitation of duration (together adding up to temporariness), and possible incompleteness, which do not all need to be present in a given instance. Applied to the examples, in (2), the progressive signals above all that Joan’s singing is a temporary phenomenon, while in (3), it implies more than anything else that the speaker’s reading was not completed.


Previous research on the progressive in the inner-circle varieties has revealed that it is characterized not only by major diachronic changes but also by ongoing language change. Smith (2005) reports an increase of 8 to 10 per cent for BrE and AmE in the period between 1961 and 1991. Among the various factors that contribute to this change are (a) the colloquialization of the norms of written English, (b) an ongoing extension of the range of possible subjects and (c) new combinations or more frequent use with different verbs that were not previously regularly used with the progressive (see also Section 5.2 below). This latter aspect makes it difficult to distinguish between ‘(deviant) feature’ and ‘error’ in the analysis of ENL, ESL and EFL usage of the progressive. One aspect of this is the extension of the progressive aspect to stative verbs. We would like to come back to this problem below and argue that grammaticality in this case is not a clear-cut binary choice. With the exception of progressive infinitives, our analysis is limited to finite verbal constructions. These may include present and past progressives as well as combinations of progressives with modal verbs. Also note that all coordinated and often partly elliptic progressive constructions were only counted once. Coordinated and elliptic verb phrases with different aspect marking are an extremely rare theoretical option; furthermore, examples from our ESL-corpora such as the following are stylistically clumsy (at best) or even on the verge of being ungrammatical from a native speaker’s point of view:

(7) [...] these successful people might have been getting or [(might) have] gained some coaching and support from the well educated in the society. (ICE-Fiji w1a006)

As the major reference grammars point out, in nonfinite verb phrases, on the other hand, the progressive is common only in to-infinitives (e.g. I expect to be working all weekend) (see Quirk et al. 1985: 153; Huddleston & Pullum 2002: 1174). Present participle constructions functioning as nonfinite clauses as in (6) neutralize the aspectual distinction, although their finite clause equivalents may contain a progressive form: “the progressive/nonprogressive contrast is not normally applicable here, since -ing participle phrases are incapable of expressing this distinction formally” (Quirk et al. 1985: 238).9 Last but not least, the future marker be going to was excluded from the frequency calculations because it is widely considered a semi-modal and is thus no longer related to progressive meaning.

9. From the perspective of grammatical theory, the only problematic cases are appositively used past participles, which allow both a notional and formal contrast (the suspects (being) examined by the police) (see Quirk et al. 1985: 153f). As this progressive marking is unusual and as they only differ gradually from the gerund participles discussed above (4), neglecting them altogether is reasonable.


The absolute progressive frequencies obtained on the basis of this definition were normalized to 1,000 words to allow a direct comparison across all corpora and especially with the existing studies on learner language. We decided against taking into consideration possible differences in the verbal densities of the subcorpora. Such a procedure would not only involve a time-consuming manual tagging of all finite verb phrases, but is also unlikely to produce results that differ substantially from the number of progressive constructions in relation to the number of words. In Vogel (2007), for instance, the general order of the varieties under analysis was the same for both calculation methods.

5. Results

5.1 Quantitative findings

Overall raw frequencies and normalized frequencies across the corpora are given in Table 1-a in the Appendix. We decided to present the overall results in a slightly different way in Figure 1, namely with regard to increasing relative frequencies of progressives, starting with the corpus that yielded the lowest number of progressives. Most obviously, and perhaps somewhat surprisingly, the varieties do not fall into neat groups along the lines of the ENL-ESL-EFL taxonomy. Contrary to popular expectations, the corpora with the lowest relative frequency of progressives in student essays are neither the British nor the New Zealand components of the ICE corpora, but a corpus representing an outer-circle variety, namely SingE, together with an ‘old’ contact variety, namely IrE. The low frequency of progressives in the IrE data is especially surprising in view of the fact that the Celtic substrate is reported to have played a role in the rise of the progressive construction (see Keller 1925) and we might therefore expect IrE to use the progressive more frequently than, e.g., BrE. The student essays in ICE-GB come a close second in their relatively low frequency of progressive usage. To test for statistical significance of our results, we applied the log-likelihood test to our data.10 The results of these tests showed that there are no significant differences between adjacent varieties in Figure 1. In other words, progressives are not significantly underrepresented in SingE and IrE when compared with BrE. A somewhat surprising result is the fact that the difference between British and New Zealand student writing with regard to the use of the progressive did prove significant at the 0.1% level.

10. As shown by Rayson, Berridge & Francis (2004), for testing statistical significance with this kind of data, the log-likelihood test is more reliable than the chi-square test.
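A pairwise log-likelihood comparison of the kind mentioned above can be sketched as follows. This is a minimal sketch that assumes the two-term log-likelihood (G2) formula commonly used for comparing one item's frequency in two corpora; whether the authors used exactly this variant is an assumption. The function name `log_likelihood` and the frequencies in the usage line are hypothetical and are not the raw counts from Table 1-a in the Appendix.

```python
import math

# Chi-square critical values for one degree of freedom.
CRITICAL_VALUES = {"5%": 3.84, "1%": 6.63, "0.1%": 10.83}


def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """G2 statistic for the frequency of one item in two corpora.

    freq_a, freq_b: raw counts of the item (here: progressives).
    size_a, size_b: corpus sizes in words.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected counts if the item were equally frequent in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical counts: 60 vs. 120 progressives in two 40,000-word sub-corpora.
g2 = log_likelihood(60, 40_000, 120, 40_000)
for level, threshold in CRITICAL_VALUES.items():
    print(level, g2 > threshold)
```

A G2 value above 10.83 corresponds to significance at the 0.1% level, the threshold referred to in the comparison of British and New Zealand student writing.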


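[Figure 1: bar chart. Corpora in ascending order of frequency, with bar values per 1,000 words: ICE-Sing 1.5, ICE-Ire 1.5, ICE-GB 1.7, ICE-Phil 2.0, Malaysia 2.2, ICLE-Finnish 2.3, ICE-Ken 2.6, ICE-NZ 2.7, ICLE-Swedish 2.9, ICLE-German2 3.0, ICLE-Finland-Swedish 3.1, ICLE-German1 3.6, ICE-Fiji 4.0]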

Figure 1. Normalized frequencies (per 1,000 words) of progressives across ENL, ESL and EFL corpora (student writing)
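The normalization behind these per-1,000-word figures is a simple rate calculation. The sketch below assumes a hypothetical raw count and sub-corpus size chosen only for illustration (the sub-corpora in Table 1 are roughly 40,000 words each); the function name `per_thousand` is likewise an assumption.

```python
def per_thousand(raw_count: int, corpus_size: int) -> float:
    """Normalize a raw frequency to occurrences per 1,000 words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return 1000 * raw_count / corpus_size


# Hypothetical example: 150 progressives in a 40,000-word sub-corpus.
rate = per_thousand(150, 40_000)
print(f"{rate:.2f} progressives per 1,000 words")
```

Because the rate is relative to the number of words and not to the number of finite verb phrases, it does not correct for differences in verbal density between sub-corpora, in line with the decision described in the text.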

Our first research question can thus be answered with a clear ‘no’: ESL and EFL varieties of English do not generally use more progressives than ENL varieties. As a result, the answer to our second research question also has to be in the negative: our data did not produce neat groupings or gradient patterns of usage ranging from ENL to ESL and EFL varieties (with internal gradience in ESL varieties reflecting the degree of institutionalization); in particular, Malaysian English does not fall between typical outer-circle varieties and Learner Englishes. At the same time, however, students using the least institutionalized ESL variety in our data set, namely Fiji English, also used the progressive most frequently. The results in Figure 1 may therefore first and foremost represent different levels of proficiency, not only with respect to English as a second or foreign language but also with respect to the acquisition of writing skills in academic English. Figure 2 therefore presents available comparative data from more advanced academic writing for some inner-circle varieties and German learner writing of two levels.


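[Figure 2: bar chart. Corpora shown: ICLE-German, CHALK, ICE-NZ-stud, ICE-NZ-ac, MICUSP_HS, Hyland_HS, ICE-Aus-ac, ICE-GB-stud, ICE-GB-ac. Bar values as extracted, not paired with corpora: 3.1, 3.0, 2.7, 1.7, 1.6, 1.4, 1.5, 1.3, 0.7]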

Figure 2. Normalized frequencies (per 1,000 words) of progressives reflecting different stages of academic writing in inner-circle varieties and German Learner English11

The results in Figure 2 show that the patterning observed in our initial comparison of ENL, ESL and EFL varieties is not simply a result of different proficiency levels (even though these do play a role); instead, we are dealing with a genuine regional difference in the inner circle where NZE is the Antipodean variety with clearly the most frequent use of progressives in both published and student academic writing. But overall frequencies are only part of the story. The qualitative analysis of our data tells a somewhat different tale.

11. Results for published American academic writing and advanced student writing (Hyland_HS and MICUSP_HS, respectively) as well as for advanced and intermediate German learner writing (CHALK and ICLE-German, respectively) come from Wulff & Römer (2009). Data for the academic part of the ICE corpora come from Collins (2009), viz. his figures for the use of the progressive in the humanities (category W2A of the corpora): he retrieved 16 progressives from ICE-GB, 32 from ICE-Aus and 61 from ICE-NZ in this section of the corpora.

5.2 Qualitative analysis

Apart from its core meaning, the central semantic characteristic of the progressive is that in combination with different verb senses (Aktionsarten) it takes on slightly different meanings. Since these semantic interactions are highly complex and cannot be considered in detail here, this study focuses on two important features, namely the combination with stative verbs and its possible semantic effects. In most contexts, “the progressive is unacceptable with stative verbs” (Quirk et al. 1985: 198):

(8) *Mary is being a Canadian.
(9) *She is wanting to entertain her students.

It may occur with some of them under specific circumstances. Where it does, temporariness needs to be implied rather than the permanence which the verb itself suggests (10), sometimes combined with a re-interpretation of the core meaning of the verb (11) or a tentative implication (12) (see Quirk et al. 1985: 202f; Huddleston & Pullum 2002: 166f):

(10) We are living in the country.
(11) The neighbours are being friendly.
(12) What were you wanting?

Similar to other semantic considerations, the classification of verbs and situations as ‘stative’ or ‘dynamic’ is far from clear-cut. The boundaries are even fuzzier when it comes to the interaction of these verbs with the progressive: as combinations of the construction with stative verbs have become more frequent across time in the ENL varieties (see e.g. Smith 2005), speakers in the inner circle countries themselves may no longer agree regarding the grammatical acceptability of specific instances. Nevertheless, the major reference grammars list stative verbs that may change to such a more dynamic meaning and others that are generally unfriendly towards the progressive. In addition to this information, we relied on the support of a female native speaker from Australia.

In our two ENL corpora, no unexpected combinations of the progressive with stative verbs were found. The few instances of stative verbs fulfil the conditions described above, i.e. the verbs take on a more dynamic meaning:

(13) The effects of post traumatic amnesia are a disorientation of time, where subjects might talk coherently but believe themselves to be living in events that happened to them weeks or years previously. (ICE-GB w1a-016)
(14) With a subsequent drop in aid from the North in recent years, many Third World countries are having to rely more on Non-Government Organizations or N.G.O.s from the North, [...]. (ICE-NZ s1a-006)

In our ESL corpora, unusual combinations of the progressive with stative verbs are also relatively rare. In fact, we found none in the Philippine and Malaysian student essays and only a few in the Singaporean and Kenyan corpora:


(15) For instance, a woman graduate who is a full-time homemaker without an income would rank lower in the SES scale than her peer who is holding a professional job. (ICE-Sing s1a-008)
(16) Another thing is that the speaker and the writer are both having the same purpose they both use the language to communicate and information and they must have recipients to their message [...]. (ICE-Ken w1a-016)

The texts written by the Fijian students are exceptional in this respect because they seem to contain comparatively more progressives with stative verbs. Our native speaker confirmed as many as seven sentences in which the progressive forms were unacceptable from her point of view as well as from ours. Typical examples are:

(17) Jane is the outsider, she is the one who invades and takes over what was belonging to Bertha. (ICE-Fiji w1a-013)
(18) When someone is being poor with the present economic hardships, any activity that comes across that brings money will be taken. (ICE-Fiji w1a-007)

The most striking case is certainly (17) because, according to the major reference grammars, the verb ‘belong’ does not change its meaning under any circumstance, which makes it incompatible with progressive aspect in general. The fact that only a few progressives with unusual stative verbs are attested in our ESL data partly has to be attributed to the kind of data we used. Our evidence comes from written texts, and, as the following examples of being from different spoken genres in the Singaporean ICE corpus illustrate, what we observed on the basis of the written data is obviously only part of a much larger picture.

(19) a. Even then, it is not easy to tell whether a given organism is being helpful or harmful. (ICE-Sing w2b-024)
     b. [...] and it is primarily being available [...]. (ICE-Sing s2a-027)
     c. [...] we are being lax you see. (ICE-Sing s1a-027)
     d. The reason is because when when the model is still being is being in the factory it contains certain chemicals [...]. (ICE-Sing s2a-058)
     e. Oh no no nobody is suggesting that we’re being patronising by using Singlish [...] (ICE-Sing s1b-035)

Stative progressives do occur in Singaporean English, but obviously more in its spoken than its written form. The same is likely for all other ESL-corpora. The learner corpora, on the other hand, are unlikely to contain (m)any unusual combinations of stative verbs with progressive aspect. In fact, as Westergren Axelsson & Hahn (2001: 13–15) point out, German, Swedish, Finland-Swedish and Finnish learners choose only those lexical verbs that are typically associated with progressive aspect. This observation is supported by van Rooy (2006) (and research quoted therein) who found similar frequencies of stative verbs in the progressive in ICLE-German and the corresponding native-speaker corpus as well as a tendency of German learners to overuse the semantic prototype of the construction.

Two conclusions can be drawn from these findings. First of all, the tolerance towards combinations of the progressive with stative verbs seems to be stretched in the ESL varieties. This is not the case in Learner English. In fact, the interaction of the progressive with these verbs is one of the contexts examined in this study, where ESL indeed differs from both ENL and EFL. Or, from a slightly different perspective: if we follow Sand (forthcoming), who suggests that ESL varieties are actually the most advanced and might be leading the way that English as a whole will be taking, the result for combinations of stative verbs with the progressive is an ESL-ENL-EFL- rather than the ‘typical’ ENL-ESL-EFL-cline. Secondly, owing to the low overall numbers of stative verbs in the progressive in all corpora, this stretched tolerance is definitely not the reason (at least not the most important one) for the high progressive frequencies we found for some varieties. A more likely explanation for a high overall frequency of progressives is the extension of the construction to new contexts, namely those where native speakers of BrE and IrE (and probably to a smaller degree NZE) would prefer a simple form. Westergren Axelsson & Hahn (2001: 20–22) provide examples from their EFL data that can be matched with examples from our ESL corpora:

(20) This essay will be discussing six factors why women have to work for empowerment. (ICE-Fiji w1a-015)
(21) It spread due to movement of laborers. It is being used now in Zambia as a language of education. (ICE-Ken w1a-003)
(22) Whereas in the 2nd article, it says that the economy is fast rising ever since the Ramos Administration started. (ICE-Phil w1a-011)
(23) However, according to Hume, there is not guarantee that just because nature has been uniformly functioning in the past, it will continue to do so always. (ICE-Sing w1a-014)
(24) When conducting research on a class where speakers are learning English as a second language, Harbord (1992) found that the confidence of learners [...]. (Malaysia)

This conclusion is also supported by both Rogers’ (2006) study on Indian English and Vogel’s (2007) pilot study of Kenyan and Fijian English. Example (22) is particularly interesting because it combines the present progressive with an adverbial that, in BrE, would call for a present perfect. Similarly, Fraser Gupta (1986) found that Singaporeans were using the present progressive in sentences such as “This is the first time that I am submitting a linguistics paper”, where inner-circle speakers would use the present perfect. We will come back to this issue in the next section.

Overuse of the progressive in ESL and learner Englishes – fact or fiction? 

6. Discussion Our quantitative and qualitative results do not quite confirm our initial expectations of a gradient that ranges from ENL to ESL and EFL varieties. The frequent occurrence of progressives in the New Zealand data, in particular, seems to require an explanation. In this context, we would like to return to sentence (22) above. The example illustrates the possible co-occurrence of the present progressive with a temporal adverbial that would require a present perfect in British English. It is difficult to dismiss this example as a typical pattern of or an erroneous usage from an ESL variety because it bears a striking resemblance to a chance finding from NZE: While holidaying in New Zealand, one of the authors came across a notice in the shared bathroom facilities on a camping ground which asked parents to accompany young children to showers and toilets. The reason given was “We are experiencing too many accidents of late.” For a speaker of BrE, this sentence is likely to be odd because it combines a present progressive with an adverbial that usually requires the present perfect (most likely a simple present perfect). There was no obvious non-native influence on this sign and, furthermore, the native speakers of NZE that were asked to comment on the sentence did not find this usage unusual. A possible reason why this particular extension of the progressive to contexts of perfective marking might be relatively unobtrusive to some speakers in the inner circle is that the past progressive is also used in a similar way, namely as a marker of recent past: “Tom, you were just telling me that in all you had nine students going down there” (COCA:CNN_Morning, 1997; quoted from Bergs and Pfaff 2009). Likewise, Fraser Gupta (2006: 104f.) 
found that the progressive is occasionally used by inner-circle speakers (mainly in the US) in the “This is the first time”-contexts which we mentioned above.12

12. Similarly, Gachelin (1997: 43) makes the point that the extended use of the progressive (both in terms of frequencies and functions) in New Englishes may eventually lead to long-term change in the English language as a whole: “Its generalization [...] may herald what will be World English usage in the next century.” On the use of the construction in IrE, see e.g. Filppula (1999) and Hickey (2007).

A possible explanation for the quantitative and – to a certain degree – also the qualitative similarities between NZE and some ESL varieties might be that NZE has been influenced by ESL varieties to the extent that the ENL-ESL distinction in this country is beginning to get blurred (at least with respect to the usage of the progressive). In a relatively small speech community like New Zealand, speakers of ESL and EFL varieties might have a greater impact on the language use of ENL speakers. In other words, close and frequent contact between ENL and ESL speakers in a country like New Zealand might have simply accelerated one aspect of

 Marianne Hundt and Katrin Vogel

ongoing language change.13 The overall low frequency of progressives in our Singaporean sample, in turn, might be a result of a lingering concern with external norms, i.e. the BrE exonormative model – especially in formal written language. Students from Singapore could be seen as overachievers in that they use even fewer progressives than students in Britain.14 The fact that we found typical ESL usage of progressives in the informal spoken component of ICE-Sing would lend additional support to this interpretation.

The central qualitative result of our analysis, namely the tendency in ESL only to combine the progressive with a wider range of stative verbs, may be seen in the light of the distinct “routes of development” (see Mesthrie & Bhatt 2008: 160–63) of ESL and EFL. In countries of the outer circle, English is not only learnt in the classroom, but widely used and acquired outside.15 As a result, speakers use the language – and students use the progressive – in a creative way. Learners’ attention in classrooms of the expanding circle, on the other hand, is drawn to grammaticality issues and semantic restrictions of the progressive, making them more likely to overuse the prototype of the construction and less likely to ‘stretch’ the progressive to new contexts, such as combinations with certain stative verbs or new aspectual uses.16

Finally, a couple of caveats emerge from the present study. On a methodological level, the following should be borne in mind:

a. some of the differences that we observe may have to be attributed to the relatively small size of the subsections of the ICE corpora (and possible skewing effects, for instance with respect to the essay or exam topic) rather than to genuine regional variation;
b. in the case of learners in particular, larger corpora capturing different levels of proficiency would provide more insight;
c. conclusions on national/regional variation can only be drawn if authors have roughly the same proficiency level and interpret the text type in the same way;
d. when comparing ENL, ESL and EFL, it is important to take variation within ENL, ESL and EFL into account before generalizing to the type of variety as such.

13. On lexical innovations borrowed into general NZE from the emerging variety of Pasifika English, see Hay et al. (2008: 108f.).
14. Likewise, learners in an EFL-context seem to follow the idealized grammar more closely (see Van Rooy 2006: 60–62).
15. Note, however, that Platt, Weber and Ho (1984: 73) claim the more frequent use of the progressive in New Englishes might be due to ‘overteaching’.
16. Van Rooy points out that the overuse of the progressive in Black SAfE has been attributed to it being overemphasized in the teaching as a second language (Van Rooy, 2006: 38); while this at first seems to go against our argument, his later comment fits into our discussion in that he, too, emphasizes the difference between tutored and untutored learning.


7. Conclusion

Our study confirms the results of a related study of the progressive passive in inner- and outer-circle varieties which, likewise, did not necessarily group together as might have been expected (see Hundt 2009). One conclusion that we might have to draw could be that the somewhat simple categorization into inner-, outer- and expanding-circle varieties is in need of revision or at least should be modified somewhat. Instead, what we are looking at might be a set of varieties that show (a) the effects of language contact and (b) exo-normative influence to a greater or lesser extent. If ESL varieties are, indeed, having an influence on ENL varieties then the distinction between contact and non-contact varieties appears to be less useful than previously assumed, as well. The distinction between ENL and ESL, in particular, might be more relevant sociolinguistically, in terms of people’s perception of their own and others’ status as speakers; structurally, however, the distinction might be much less clear when we look at the frequency and usage of individual features. In other words, increasing globalization might eventually blur distinctions between ENL, ESL, and EFL varieties.17 Perhaps some ENL varieties – with NZE leading the way – are developing into what we could call ‘secondary’ contact varieties.

Our study feeds into the discussion about the relation between research on second language acquisition (SLA) and world English, which have not really connected so far. Kachru & Nelson (2006: 85f.) enumerate a number of reasons why this is the case. Among these, they list various axioms on which theories for SLA are based. Three of these are relevant to the issue that we have addressed in our paper, namely the question whether progressives are ‘overused’ in ESL and EFL varieties of English. These are (ibid.: 86):

1. The learners in the ESL settings are expected to acquire the competence to use the language effectively with native speakers. The norm is the idealized grammar that underlies the native speaker competence as envisioned in linguistic theory, i.e., the grammars of Standard American or British English; Australian, Canadian and New Zealand Englishes are still rare as norms. [...]
4. The roles that are or could be assigned to the primary language(s) of the learners are either impediments, or in some cases, facilitators to target language acquisition.
5. Any difference from the American or British models in pronunciation, grammar, vocabulary [...] is evidence of failure: the grammatical differences are clues to fossilizations [...].

17. A piece of anecdotal evidence that would support such a scenario comes from a colleague – a native speaker of BrE who is married to someone with an EFL background: he occasionally uses discuss about in his e-mails, a collocation that is typical of ESL (e.g. Indian English) as well as EFL (e.g. German Learner language) varieties.

In a globalizing world where ESL and EFL varieties are beginning to impinge on ENL varieties, we do not only need to take into account usage patterns in younger inner-circle varieties such as NZE; SLA research might also ultimately have to revise the importance of ‘the’ native speaker model when some ‘native’ speakers come very close (for some even uncomfortably close) to ESL and EFL speakers in their usage and acceptance of certain grammatical patterns. The growing resemblance of an ENL variety like NZE and some ESL varieties also highlights the problem that the process of nativization in ESL varieties is often described as ‘erroneous’ language use (see Maley 2001). Thus, nativization happens locally, in ESL varieties, but these local developments may later globalize again and make it into standard English usage via what we have provisionally called ‘secondary’ contact varieties. The use of the progressive in contexts where previously a simple perfect construction was being used might turn out to be just one such example in the long run.

All in all, our study should be seen as a strong claim for more systematic and large-scale comparisons of Englishes from different acquisitional contexts and parts of the world. Corpus-linguistic evidence, in particular, shows how varieties overlap much more than we would expect them to when we study the use of individual features. The corpus-based approach may also help us to distinguish between more exotic patterns, on the one hand, and regularizing usage on the other.
Systematic corpus linguistic investigation and comparison of outer-circle and expanding-circle usage with ENL data are likely to reshape our perception because they bring to the fore the similarities and tone down the ‘exotic’ aspects of ESL and EFL usage (see Fraser Gupta 2006: 98).

References

Bergs, A. & Pfaff, M. 2009. I was just reading this article. Is the perfect of the recent past on its way out? Paper presented at the SEU Symposium Current Change in the English Verb Phrase, 14 July 2009.
Biewer, C., Hundt, M. & Zipp, L. 2010. How a Fiji corpus? Challenges in the compilation of an L2 ICE component. ICAME Journal 34: 5–23.
Collins, P. 2009. The progressive in English. In Comparative Studies in Australian and New Zealand English: Grammar and Beyond, P. Peters, P. Collins & A. Smith (eds), 115–123. Amsterdam: John Benjamins.
Filppula, M. 1999. The grammar of Irish English. London: Routledge.
Foley, J. 2001. Is English a first or second language in Singapore? In Evolving Identities: The English Language in Singapore and Malaysia, V. Ooi (ed.), 12–32. Singapore: Times Academic Press.
Fraser Gupta, A. 1986. A standard for written Singapore English? English World-Wide 7(1): 75–99.
Fraser Gupta, A. 2006. Standard English in the world. In English in the World: Global Rules, Global Roles, R. Rubdy & M. Saraceni (eds), 95–109. London: Continuum.
Fraser Gupta, A. Forthcoming. One World, one English.
Gachelin, J.M. 1997. The progressive and habitual aspects in non-standard Englishes. In Englishes Around the World 1. General Studies, British Isles, North America. Studies in Honor of Manfred Görlach [Varieties of English around the World G18], E.W. Schneider (ed.), 33–46. Amsterdam: John Benjamins.
Gill, S.K. 2005. Language policy in Malaysia: Reversing direction. Language Policy 4(3): 241–60.
Gill, S.K. 2008. Shift in language policy in Malaysia: Unravelling reasons for change, conflict and compromise in mother-tongue education. Association Internationale de Linguistique Appliquée Review 20: 106–22.
Hay, J., Maclagan, M. & Gordon, E. 2008. New Zealand English. Edinburgh: EUP.
Hickey, R. 2007. Irish English. Cambridge: CUP.
Huddleston, R. & Pullum, G.K. 2002. The Cambridge Grammar of the English Language. Cambridge: CUP.
Hundt, M. 1998. New Zealand English Grammar: Fact or Fiction? [Varieties of English around the World 23] Amsterdam: John Benjamins.
Hundt, M. 2009. Global feature – local norms? A case study on the progressive passive. In World Englishes – Problems, Properties and Prospects [Varieties of English around the World G40], T. Hoffmann & L. Siebers (eds), 287–308. Amsterdam: John Benjamins.
Kachru, B.B. 1986. The power and politics of English. World Englishes 5: 121–140.
Kachru, Y. & Nelson, C.L. 2006. World Englishes in Asian Contexts. Hong Kong: Hong Kong University Press.
Keller, W. 1925. Keltisches im englischen Verbum. In Anglica: Untersuchungen zur Englischen Philologie, Alois Brandl zum Siebzigsten Geburtstage überreicht. In Sprache und Kulturgeschichte. (Palaestra) 147: 55–66.
Leech, G., Hundt, M., Mair, Ch. & Smith, N. 2009. Change in Contemporary English: A Grammatical Study. Cambridge: CUP.
Lim, G. 2001. Till divorce do us part: The case of Singaporean and Malaysian English. In Evolving Identities: The English Language in Singapore and Malaysia, V. Ooi (ed.), 125–139. Singapore: Times Academic Press.
Maley, A. 2001. Jumping on the Bangwagon: Issues in student writing. In Evolving Identities: The English Language in Singapore and Malaysia, V. Ooi (ed.), 112–124. Singapore: Times Academic Press.
Mesthrie, R. & Bhatt, R.M. 2008. World Englishes: The Study of New Linguistic Varieties. Cambridge: CUP.
Mollin, S. 2006. Euro-English: Assessing Variety Status. Tübingen: Gunter Narr.
Nesselhauf, N. 2009. Co-selection phenomena across New Englishes: parallels (and differences) to foreign learner varieties. English World-Wide 30(1): 1–26.
Pakir, A. 2001. The voices of English-knowing bilinguals and the emergence of new epicentres. In Evolving Identities: The English Language in Singapore and Malaysia, V. Ooi (ed.), 1–11. Singapore: Times Academic Press.
Platt, J., Weber, H. & Ho, M.L. 1984. The New Englishes. London: Routledge & Kegan Paul.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. 1985. A Comprehensive Grammar of the English Language. London: Longman.
Rayson, P., Berridge, D. & Francis, B. 2004. Extending the Cochran rule for the comparison of word frequencies between corpora. In Le poids des mots: Proceedings of the 7th International Conference on Statistical analysis of textual data, Louvain-la-Neuve, Belgium, March 10–12, 2004, Vol. 2, G. Purnelle, C. Fairon & A. Dister (eds), 926–936. Louvain: Presses universitaires de Louvain.
Rogers, C.K. 2002. Syntactic features of Indian English: An examination of written Indian English. In Using Corpora to Explore Linguistic Variation [Studies in Corpus Linguistics 9], R. Reppen, S.M. Fitzmaurice & D. Biber (eds), 187–202. Amsterdam: John Benjamins.
Sand, A. 2004. Shared morpho-syntactic features in contact varieties of English: Article use. World Englishes 23: 281–298.
Sand, A. Forthcoming. Angloversals? Shared Morpho-Syntactic Features in Contact-Varieties of English. Amsterdam: John Benjamins.
Schneider, E.W. 2003. The dynamics of New Englishes: From identity construction to dialect birth. Language 79(2): 233–281.
Schneider, E.W. 2007. Postcolonial English: Varieties around the World. Cambridge: CUP.
Seidlhofer, B. 2001. Closing a conceptual gap: The case for a description of English as a lingua franca. International Journal of Applied Linguistics 11: 133–158.
Smith, N. 2005. A Corpus-Based Investigation of Recent Change in the Use of the Progressive in British English. PhD dissertation, Lancaster University.
Tent, J., Geraghty, P. & Mugler, F. 2006. Macquarie English Dictionary for the Fiji Islands. Sydney: Macquarie Library.
Van Rooy, B. 2006. The extension of the progressive aspect in Black South African English. World Englishes 25(1): 37–64.
Virtanen, T. 1997. The progressive in non-native speaker and native speaker composition: Evidence from the International Corpus of Learner English. In Corpus-based studies in English: Papers from the Seventeenth International Conference on English Language Research on Computerized Corpora (ICAME 17), M. Ljung (ed.), 299–309. Amsterdam: Rodopi.
Vogel, K. 2007. Glocalization? A Case Study on the Progressive in Fijian and Kenyan English. Zulassungsarbeit zum Ersten Staatsexamen, Ruprecht-Karls-Universität Heidelberg.
Westergren Axelsson, M. & Hahn, A. 2001. The use of the progressive in Swedish and German advanced learner English – a corpus-based study. ICAME Journal 25: 5–30.
Wulff, S. & Römer, U. 2009. Becoming a proficient academic writer: Shifting lexical preferences in the use of the progressive. Corpora 4(2): 115–133.


Appendix

Table 1-a. Frequencies of the progressive in EFL, ESL and ENL varieties (figures for EFL varieties are from Virtanen 1997: 301, 303; Westergren Axelsson & Hahn 2001: 11; and Wulff & Römer 2009)

Corpus                  ∑ progressives   no. of words      progressives/1,000 words
ICLE-German1            287              78,856            3.6
ICLE-German2            705              approx. 234,000   3.0
ICLE-Swedish            289              100,863           2.9
ICLE-Finland-Swedish    172              55,332            3.1
ICLE-Finnish            274              119,919           2.3
ICE-Fiji                159              39,801            4.0
ICE-Ken                 105              40,139            2.6
ICE-Sing                70               45,446            1.5
Malaysia                90               41,133            2.2
ICE-Phil                87               42,952            2.0
ICE-Ire                 65               42,022            1.5
ICE-NZ                  113              42,223            2.7
ICE-GB                  73               42,587            1.7
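The normalized frequencies in the last column of Table 1-a follow from the first two columns by simple arithmetic: progressives per 1,000 words = (number of progressives / number of words) × 1,000. The following is a minimal Python sketch that recomputes the column from the counts as printed in the table. The names `TABLE_1A`, `per_thousand` and `recompute` are illustrative only, and the word count for ICLE-German2 is given only approximately in the source, so its rate is approximate as well.

```python
# Counts copied from Table 1-a: (corpus, progressives, words, reported rate).
# The ICLE-German2 word count is only approximate in the source.
TABLE_1A = [
    ("ICLE-German1", 287, 78_856, 3.6),
    ("ICLE-German2", 705, 234_000, 3.0),
    ("ICLE-Swedish", 289, 100_863, 2.9),
    ("ICLE-Finland-Swedish", 172, 55_332, 3.1),
    ("ICLE-Finnish", 274, 119_919, 2.3),
    ("ICE-Fiji", 159, 39_801, 4.0),
    ("ICE-Ken", 105, 40_139, 2.6),
    ("ICE-Sing", 70, 45_446, 1.5),
    ("Malaysia", 90, 41_133, 2.2),
    ("ICE-Phil", 87, 42_952, 2.0),
    ("ICE-Ire", 65, 42_022, 1.5),
    ("ICE-NZ", 113, 42_223, 2.7),
    ("ICE-GB", 73, 42_587, 1.7),
]


def per_thousand(count: int, words: int) -> float:
    """Normalize a raw count to a frequency per 1,000 words."""
    return 1000 * count / words


def recompute(rows):
    """Return {corpus: rate rounded to one decimal place}."""
    return {name: round(per_thousand(n, w), 1) for name, n, w, _ in rows}


if __name__ == "__main__":
    for name, rate in recompute(TABLE_1A).items():
        print(f"{name:22s}{rate:4.1f}")
```

Normalizing in this way is what makes corpora of very different sizes (roughly 40,000 to 234,000 words here) comparable in the first place.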

Typological profiling
Learner Englishes versus indigenized L2 varieties of English*

Benedikt Szmrecsanyi and Bernd Kortmann
University of Freiburg

Drawing on naturalistic corpus data, this study is an exercise in establishing typological profiles of learner varieties (as sampled in the International Corpus of Learner English) vis-à-vis indigenized L2 varieties of English (as represented in the International Corpus of English), though we also include in our dataset, for benchmarking purposes, a number of European languages as well as three stylistic varieties drawn from the British National Corpus. Our research is informed by two typological parameters widely used in the crosslinguistic classification of languages: overt grammatical analyticity, which we operationalize as the text frequency of free grammatical markers, and overt grammatical syntheticity, which we define as the text frequency of bound grammatical markers. The principal insight afforded by this study is that learner Englishes and indigenized L2 varieties of English have strikingly different typological profiles, a finding which we trace back to a number of grammatical markers whose function and frequency differs between the two groups. We also present a methodology to explore if learner Englishes are sensitive to typological properties of their substrate languages, and find that this is not generally the case.

* We thank the following colleagues and student assistants for coding a number of European languages for their analyticity and syntheticity profiles: Alice Blumenthal (French), Johanna Gerwin (German), Stefan Madeja (Italian), and Tatiana Perevozchikova (Bulgarian, Czech, and Russian). All interpretational flaws are, of course, ours.

1. Introduction

In this paper, our primary research interest lies with the typological profiles of learner Englishes (as sampled in the International Corpus of Learner English), on the one hand, and of indigenized L2 varieties of English (as represented in the International Corpus of English), on the other hand. To this purpose, we take an interest in the coding of grammatical information and draw on terminology, concepts, and ideas developed in quantitative morphological typology. Specifically, we will be concerned with two time-honored and well-known typological parameters, analyticity and syntheticity, which go back at least to August Wilhelm von Schlegel (cf., for instance, 1818). While this is not the place to review the rich history of thought about these notions that has unfolded since the 19th century, we feel compelled to point out here that the terms “are used in widely different meanings by different linguists” (Anttila 1989: 315). Thus, to fix terminology right at the outset, we define formal grammatical analyticity as comprising all those coding strategies where grammatical information is conveyed by free grammatical markers, which we in turn define as function words that have no independent lexical meaning. Conversely, we take formal grammatical syntheticity to comprise all those coding strategies where grammatical information is signaled by bound grammatical markers.

We have shown elsewhere (Szmrecsanyi & Kortmann 2009a; Kortmann & Szmrecsanyi 2009, to appear; Szmrecsanyi 2009) that variability along the analyticity-syntheticity continuum is, in fact, endemic among synchronic and short-term diachronic varieties of English.
Specifically, our research has highlighted, first, that there is a good deal of geographic variation (for instance, Southeast Asian varieties of English are comparatively economical as far as the overall extent of grammatical coding is concerned); second, that there are significant differences according to variety type (for instance, low-contact, traditional English dialects tend to be more synthetic than other variety types); third, that there is substantial register variability (as a rule, written varieties of English prefer syntheticity, spoken varieties of English go for analyticity); and, lastly, that written English appears to have become more synthetic and less analytic over the past four decades or so.

In the present contribution, then, we aim to add learner Englishes to our variety portfolio. The primary research question that will guide our empirical analysis is whether learner Englishes and indigenized L2 varieties share typological properties thanks to certain concomitants of second language acquisition (SLA). For instance, much research in the SLA vein has emphasized that learners – especially in early interlanguage stages – avoid synthetic structures and opt for analytic marking whenever possible (see, for instance, Seuren & Wekker 1986; Wekker 1996; Klein & Perdue 1997). One would thus hypothesize that both learner Englishes and indigenized L2 varieties of English will exhibit less syntheticity and more analyticity than, e.g., standard British English reference varieties, all other things being equal. A secondary research question that we shall be concerned with is whether and to what extent typological profiles of individual learner Englishes are conditioned on learners’ native language background. To investigate such substrate effects, we shall present a methodology that will also involve profiling a number of European languages in terms of their analyticity and syntheticity levels.

In this connection, a few comments on our general methodological orientation are in order. First, note that this study is an exercise in typological profiling rather than error analysis, which is why we shall also remain fairly agnostic about the distinction between nativeness and non-nativeness. Second, we maintain that the appropriateness of an integrated model for learner Englishes and indigenized L2 Englishes, and the appropriateness of labels such as English as a Second Language (ESL) and English as a Foreign Language (EFL), is not an a-priori issue but rather an actual empirical question, which the present study will attempt to shed light on. Third, we claim that the analysis of naturalistic corpus data is but one method to profile varieties of English, which can and – we believe – should be complemented by, e.g., survey-based evidence in the spirit of Kortmann & Szmrecsanyi (2004) and Szmrecsanyi & Kortmann (2009a,b,c).

This paper is structured as follows. In Section 2, we present our dataset. In Section 3, we detail our empirical method. Section 4 will present our results. Section 5 offers a discussion of our findings and some concluding remarks.

2. Data

This study investigates 25 data points: 11 learner Englishes, 5 indigenized L2 varieties of English, 3 standard British English benchmark registers, and 6 European mother-tongue languages.

2.1 Learner Englishes

To study learner Englishes, we tapped the International Corpus of Learner English (ICLE) Version 1.1 (Granger 1998; Granger et al. 2002), a resource providing a large number of essays by advanced learners of English with different mother tongue backgrounds. We selected 11 subcorpora which sample texts by learners with the following native and first languages at home: Bulgarian, Czech, Dutch, Finnish, French, German, Italian, Polish, Russian, Spanish, and Swedish. With a view to targeting medium-advanced learners of English, we typically only included essays by learners who had studied English for 5–6 years at school and for 2–3 years at university, and who had spent a maximum of 2 months in an English-speaking country.1 The 11 subcorpora thus selected span a total of approximately 266,000 words of running text.

2.2 Indigenized L2 varieties

To obtain data on indigenized L2 varieties of English, we turned to the International Corpus of English (ICE) (cf. Greenbaum 1996). Specifically, we were interested in the student essay and exam script sections (genre code w1a), thus matching as far as possible the text type sampled in ICLE. We accessed the following ICE components: ICE-East Africa, ICE-Hong Kong, ICE-India, ICE-Philippines, and ICE-Singapore, which left us with five data points on indigenized L2 varieties of English based on a total of 281,000 words of running text. The five L2 varieties thus included in this study are classic ‘New Englishes’ (Platt et al. 1984) or, in Kachru’s parlance, ‘outer circle’ varieties (e.g. Kachru 1985), even though they represent very different developmental stages.

2.3 Standard British English benchmark varieties

Building on a previously published dataset (cf. Szmrecsanyi 2009) drawn from the British National Corpus (BNC) (cf. Aston & Burnard 1998), we also included in our analysis three standard British English registers for benchmarking purposes: school essays (genre classification code W_essay_school; 147,000 words of running text), university essays (genre classification code W_essay_univ; 65,000 words of running text), and – as the lone spoken register subject to analysis in the present study – spontaneous face-to-face conversation (genre classification code S_conv; approx. 4.3m words of running text).

2.4 European mother-tongue languages

To investigate the issue of substrate effects, we also profiled six European languages: Bulgarian, Czech, French, German, Italian, and Russian. For each of these languages, we drew on comparatively small corpora (approx. 10,000 words of running text each) sampling quality newspaper prose.

1. To obtain sufficiently large subcorpora, slight adaptations of these criteria were necessary for Finnish learner essays (adaptation: 4–7 years of English at school, 1–3 years at university, max. 4 months abroad) and Swedish learner essays (adaptation: 4–7 years of English at school, 2–4 years at university, max. 4 months abroad).


3. Method

Methodologically, the present study explores part-of-speech (henceforth: POS) frequencies to gauge typological profiles, utilizing an aprioristic but theory-informed categorization of POS categories to derive two frequency-based indices: an analyticity index and a syntheticity index. Our method is inspired by Joseph Greenberg’s (1960) seminal paper entitled ‘A Quantitative Approach to the Morphological Typology of Language’. Greenberg (1960) demonstrated that seemingly abstract typological notions can be measured with sufficient precision by calculating a number of indices on the empirical basis of naturalistic texts. Succinctly put, the present study will apply Greenberg’s index method to the corpus material described in the previous section.

3.1 Coding varieties of English

In the case of ICLE and ICE (which unlike the BNC are not POS-annotated in the first place), an algorithm selected 1,000 random tokens (i.e. orthographical words) per variety studied. This yielded a dataset of 16,000 word tokens (11 ICLE varieties plus 5 ICE varieties, multiplied by 1,000). Subsequently, all of these word tokens were annotated manually for their POS class using the BNC’s CLAWS5 tag set (cf. Aston & Burnard 1998) with a minor extension.2 The technicalities are discussed in Szmrecsanyi (2009), a paper that also reports measures of interrater reliability and of the robustness of results deriving from 1,000-token samples. In the case of the BNC (a corpus that is POS-annotated in the first place, such that manual coding efforts are not a limiting factor), our results will be based on POS frequencies not in random token samples but in the respective BNC texts as a whole.

Given our definition of analyticity and syntheticity offered at the beginning, POS tags (or rather the tokens annotated with POS tags) were placed into two relevant categories:

– Analytic tokens: conjunctions, subjunctions, and prepositions (tags CJ*, PRF, PRP); determiners, articles, and wh-words (D*, AT0, AVQ, PNQ); existential there (EX0); pronouns (PNI, PNP, PNX); the tokens more and most; the infinitive marker to (TO0); modals (VM0); the negator not (XX0); auxiliary be ((A)VB* + V*, (A)VB* + * + V*, (A)VB* + XX0); auxiliary do ((A)VD* + V*, (A)VD* + * + V*, (A)VD* + XX0); and auxiliary have ((A)VH* + V*, (A)VH* + * + V*, (A)VH* + XX0).
– Synthetic tokens: the s-genitive (POS); comparative and superlative adjectives (AJC, AJS); plural nouns (NN2); plural reflexive pronouns (PNX + word token ending in *ves); inflected verbs ((A)V*D, (A)V*G, (A)V*N, (A)V*Z).

Perl retrieval scripts were subsequently run on the dataset and established the text frequencies of the relevant POS tags (or POS-tag categories), utilizing the above categorization to generate two Greenberg-inspired index values per data point as output:

– The analyticity index: the ratio of the number of free grammatical markers in a sample (F) to the total number of words in the sample (W), normalized to a sample size of 1,000 tokens. Hence: analyticity index = F/W × 1,000.
– The syntheticity index: the ratio of the number of words in a sample that bear a bound grammatical marker (B) to the total number of words in the sample (W), normalized to a sample size of 1,000 tokens. Hence: syntheticity index = B/W × 1,000.

Both indices have a lower bound of 0 and an upper bound of 1,000 index points.

2. As in the CLAWS8 tagset, the primary verbs be, do, and have were explicitly annotated for whether they occurred in auxiliary function by prefixing the character ‘A’ to the CLAWS5 tag; note that in the analysis of the BNC itself, primary verbs were automatically disambiguated contextually for auxiliary or main verb usage.

3.2 Coding European mother-tongue languages
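Before turning to the non-English data, it may help to see the index arithmetic of Section 3.1 spelled out. The following Python sketch is an illustration under stated assumptions and not a reconstruction of the Perl scripts mentioned above. It assumes that each token is available as a (word, tag) pair, that auxiliary uses of be, do and have already carry the ‘A’ prefix, that more and most count as analytic in any use, and that a token may load on both indices at once. The regular expressions and all function names are hypothetical.

```python
import random
import re

# Tag classes paraphrase the categorization in Section 3.1. The regular
# expressions are this sketch's assumption, not the authors' Perl scripts.
# Auxiliary uses of be/do/have are assumed to be pre-marked with "A".
ANALYTIC_TAG = re.compile(
    r"CJ.|PRF|PRP|D..|AT0|AVQ|PNQ|EX0|PNI|PNP|PNX|TO0|VM0|XX0|AV[BDH]."
)
SYNTHETIC_TAG = re.compile(r"POS|AJC|AJS|NN2|A?V[BDHV][DGNZ]")


def is_analytic(word: str, tag: str) -> bool:
    """Free grammatical marker; 'more'/'most' count in any use (a simplification)."""
    return bool(ANALYTIC_TAG.fullmatch(tag)) or word.lower() in {"more", "most"}


def is_synthetic(word: str, tag: str) -> bool:
    """Token bearing a bound grammatical marker."""
    if tag == "PNX":  # only plural reflexives such as 'themselves' count
        return word.lower().endswith("ves")
    return bool(SYNTHETIC_TAG.fullmatch(tag))


def indices(tokens):
    """Return (analyticity, syntheticity), each normalized to 1,000 tokens."""
    if not tokens:
        raise ValueError("empty sample")
    f = sum(is_analytic(w, t) for w, t in tokens)   # F: free markers
    b = sum(is_synthetic(w, t) for w, t in tokens)  # B: words with a bound marker
    return 1000 * f / len(tokens), 1000 * b / len(tokens)


def draw_sample(tokens, k=1000, seed=0):
    """Draw up to k random tokens without replacement (fixed seed for repeatability)."""
    return random.Random(seed).sample(tokens, min(k, len(tokens)))


if __name__ == "__main__":
    toy = [("The", "AT0"), ("students", "NN2"), ("are", "AVBB"),
           ("writing", "VVG"), ("essays", "NN2")]
    print(indices(draw_sample(toy)))
```

On the five-token toy sample, two tokens are free grammatical markers (the article and the auxiliary) and three bear a bound marker (two plural nouns and the -ing form), so the sketch yields an analyticity index of 400 and a syntheticity index of 600.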

The method used to code the non-English data points was overall very similar to the method utilized to code varieties of English. An algorithm selected 1,000 random word tokens from each of the six 10,000-word corpora sampling written Bulgarian, Czech, French, German, Italian, and Russian. These randomly selected word tokens – in all, 6,000 – were then coded, typically by native speakers, (i) for whether or not they are function words (defined as conjunctions, subjunctions, prepositions, determiners, articles, pronouns, infinitive markers, modals, negators, or auxiliary verbs), and (ii) for the number of bound grammatical markers borne by each token (note here that unlike English, Russian, for example, can affix several inflections to a single word token, in which case every one of those inflections loads on the syntheticity index). Retrieval scripts were then run on the coded random word token samples to calculate the indices.

4. Results

We move on to a discussion of our results. Section 4.1 canvasses the big picture by projecting index scores to typological space. Section 4.2 deconstructs the index scores to isolate grammatical markers that are implicated in the overall variability. Section 4.3 investigates the issue of substrate effects between learner Englishes and their respective substrate languages.

4.1 The big picture

Figure 1 is a two-dimensional plane visualizing overall analyticity-syntheticity variability in typological space. The vertical axis plots analyticity index scores while the horizontal axis indicates syntheticity index scores. Thus, icle-Spanish, in the top left corner of the diagram, turns out to be the most analytic and least synthetic variety in our sample: the data point is associated with an analyticity index score of 541 (meaning that in 1,000 words, 541 words are function words) and a syntheticity index score of 133 (hence, of 1,000 words, 133 bear a bound grammatical marker). At the other end of the spectrum, in the bottom right corner of Figure 1, we find ice-India as the most synthetic and least analytic variety in the sample (analyticity index score: 390, syntheticity index score: 208). Among the icle data points sampling learner Englishes, it is icle-Czech that stands out as the most synthetic (syntheticity index score: 191) and least analytic (analyticity index score: 440) learner variety. As for the ice components sampling indigenized L2 varieties of English, we find that ice-Hong Kong is the least synthetic data point (syntheticity index score: 163) while it is ice-Philippines that yields the most analytic essay material (analyticity index score: 470).

Figure 1. Analyticity by syntheticity (in index points, per thousand words): indigenized L2 varieties of English as sampled in ice (black dots), learner Englishes as sampled in icle (white dots), and standard British English registers as sampled in the bnc (grey diamonds)
[Scatterplot. Vertical axis: analyticity index; horizontal axis: syntheticity index. Data points shown: ICLE-Spanish, ICLE-French, ICLE-Swedish, ICLE-Bulgarian, ICLE-Russian, ICLE-Italian, ICLE-Polish, ICLE-Dutch, ICLE-Finnish, ICLE-German, ICLE-Czech, ICE-PHI-W1A, ICE-SIN-W2A, ICE-HK-W1A, ICE-EA-W1A, ICE-IND-W1A, BNC-S_conversation, BNC-W_essay_school, BNC-W_essay_univ.]

As for the three Standard British English registers (all drawn from the bnc) included in our inquiry, observe that these form a neat continuum from more analytic and less synthetic to more synthetic and less analytic: face-to-face conversation (bnc-S_conversation) tends towards the analytic pole (analyticity index score: 496, syntheticity index score: 148), university essays (bnc-W_essay_univ) are situated close to the synthetic pole (analyticity index score: 427, syntheticity index score: 197), and school essays (bnc-W_essay_school) cover the middle ground (although they are a lot closer to university essays than to face-to-face conversation).
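
The index scores quoted here follow the definitions given in Section 3.1: a marker count divided by the sample size and scaled to 1,000 tokens. As a minimal illustration only (the study itself used Perl retrieval scripts, which are not reproduced here), the arithmetic can be sketched in Python. The names `CodedToken`, `per_thousand`, `analyticity_index`, and `syntheticity_index` are hypothetical, and the four-token sample is a toy input, not corpus data.

```python
from dataclasses import dataclass


@dataclass
class CodedToken:
    """One coded word token (hypothetical structure, for illustration)."""
    is_function_word: bool  # does the token count as a free grammatical marker?
    bound_markers: int      # number of bound grammatical markers on the token


def per_thousand(count: int, total_words: int) -> float:
    """Normalize a raw count to a sample size of 1,000 tokens."""
    if total_words <= 0:
        raise ValueError("sample must contain at least one word")
    return 1000 * count / total_words


def analyticity_index(tokens: list[CodedToken]) -> float:
    """F/W x 1,000: free grammatical markers per 1,000 words."""
    free = sum(1 for t in tokens if t.is_function_word)
    return per_thousand(free, len(tokens))


def syntheticity_index(tokens: list[CodedToken],
                       count_every_marker: bool = False) -> float:
    """B/W x 1,000: bound grammatical marking per 1,000 words.

    By default each marked word counts once. With count_every_marker=True,
    every inflection on a token is counted, as described for the
    non-English samples in Section 3.2.
    """
    if count_every_marker:
        bound = sum(t.bound_markers for t in tokens)
    else:
        bound = sum(1 for t in tokens if t.bound_markers > 0)
    return per_thousand(bound, len(tokens))


# Toy sample (not corpus data): one function word, one word with one bound
# marker, one word with two bound markers, one unmarked content word.
sample = [
    CodedToken(True, 0),
    CodedToken(False, 1),
    CodedToken(False, 2),
    CodedToken(False, 0),
]
print(analyticity_index(sample))                            # 250.0
print(syntheticity_index(sample))                           # 500.0
print(syntheticity_index(sample, count_every_marker=True))  # 750.0
```

With 541 function words in a 1,000-word sample, `per_thousand(541, 1000)` gives 541.0, the analyticity score reported for icle-Spanish.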

Table 1. Mean syntheticity and analyticity indices by variety type: indigenized L2 varieties of English (ice) versus learner Englishes (icle) (significance of inter-group differences according to an independent samples t-test)

                      indigenized L2          learner       significance of
                      varieties of English    Englishes     group difference
analyticity index     438                     494           t = –3.92 (p = .002)
syntheticity index    187                     167           t = 2.01 (p = .064)

Figure 2. Hierarchical agglomerative cluster analysis: indigenized L2 varieties of English as sampled in ice versus learner Englishes as sampled in icle (cluster algorithm: Ward)
[Dendrogram; distance scale from 0 to 200. Leaves in order: ICE-EA-W1A, ICE-SIN-W2A, ICLE-Czech, ICE-HK-W1A, ICE-IND-W1A, ICE-PHI-W1A, ICLE-Finnish, ICLE-German, ICLE-Swedish, ICLE-Dutch, ICLE-Italian, ICLE-French, ICLE-Russian, ICLE-Polish, ICLE-Bulgarian, ICLE-Spanish.]
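
The cluster analysis summarized in Figure 2 starts from a Euclidean distance matrix over (analyticity, syntheticity) score pairs. The Python sketch below is illustrative only: it covers just four data points whose two scores are both quoted in this chapter, so it does not reproduce the full 16 × 2 matrix or the Ward clustering itself (for which a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would be one option). The names `scores`, `distance_matrix`, and `nearest` are hypothetical.

```python
import math

# (analyticity index, syntheticity index), as quoted in the chapter text
# and in Table 3; only a subset of the 16 data points is available here.
scores = {
    "ICLE-Spanish": (541, 133),
    "ICLE-French": (498, 166),
    "ICLE-Czech": (440, 191),
    "ICE-India": (390, 208),
}


def distance_matrix(points):
    """Pairwise Euclidean distances between all distinct varieties."""
    return {
        (a, b): math.dist(points[a], points[b])
        for a in points
        for b in points
        if a != b
    }


def nearest(name, points):
    """Variety closest to `name` in the two-dimensional index space."""
    others = (other for other in points if other != name)
    return min(others, key=lambda other: math.dist(points[name], points[other]))


distances = distance_matrix(scores)
print(round(distances[("ICLE-Czech", "ICE-India")], 1))    # 52.8
print(round(distances[("ICLE-Czech", "ICLE-French")], 1))  # 63.2
print(nearest("ICLE-Czech", scores))                       # ICE-India
```

In this small subset, icle-Czech lies closer to ice-India than to icle-French, which is in line with its placement among the indigenized L2 varieties in Figure 2; the sketch says nothing about the remaining data points.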


All this is another way of saying that there is, in our dataset, a good deal of variability along the analyticity and syntheticity dimensions. This variability notwithstanding, we must not miss an important generalization, which is that as a rule, learner Englishes are significantly more analytic than indigenized L2 varieties of English. There is also a tendency for indigenized L2 varieties to be more synthetic than learner Englishes. Table 1 details that the average icle variety has 56 more function words per 1,000 words than the average ice variety. Conversely, the average ice variety exhibits 20 more bound grammatical markers per 1,000 words than the average icle variety. Mean index values aside, Figure 1 moreover makes clear that the dispersion around the mean index scores displayed in Table 1 is sufficiently small to warrant treating icle varieties and ice varieties as two fairly discrete and internally coherent variety groups. For the sake of backing up this impressionistic assessment in a statistically more robust way, a supplementary cluster analysis – see Figure 2 for the resulting dendrogram – run on a Euclidean distance matrix derived from the 16 × 2 index matrix confirms that on the whole, icle varieties and ice varieties indeed split up quite nicely into two different clusters. There are, in fact, only two outliers. For one, icle-Czech is grouped with the indigenized L2 varieties; second, ice-Philippines ends up in the icle cluster. The bottom line of all this is that the icle varieties and ice varieties, as groups, are clearly two different animals with regard to their degrees of analyticity and syntheticity – they have different typological profiles with only minimal overlap.

How do the Standard British English registers fit into the picture?
A glance at Figure 1 tells a simple story: Standard British English university essays are located right in the center of gravity of the ice cluster, Standard British English face-to-face conversation is situated in the icle cluster, and school essays are to be found in the no man’s land between the ice and icle clusters. While we concede that conflating variety and register variation in one graph is not unproblematic, we still believe that this sort of linguistic geography allows for two interesting interpretations. It might be the case that ice essay writers are simply better at targeting Standard British English norms (if indigenized L2 variety users actually are in the business of targeting ‘standard’ norms, an issue whose discussion is beyond the scope of the present study) than icle essay writers, who underuse synthetic marking and overuse analytic marking at the expense of proximity to target norms. In this interpretation, then, icle essays conform to the well-known anti-syntheticity sla universal (Seuren & Wekker 1986; Wekker 1996; Klein & Perdue 1997) whereas ice essays do not, and it is a mere accident – one that is due to the fact that spoken texts always tend to be more analytic and less synthetic than written texts (see Szmrecsanyi 2009) – that Standard British English face-to-face conversation is close to the icle cluster.

An alternative interpretation of the facts at hand is that we are dealing here with a phenomenon known as register interference (Aijmer 2002: 55): both ice varieties and icle varieties are close, in their own ways, to Standard British English. It is just that icle writers (or the teachers that instruct them, or the text books used in the classroom) may not be fully aware of certain stylistic implications of analytic and synthetic modes of grammatical marking, which leads them to adopt inappropriately oral and conversational norms. Our data do not put us in a position to settle for good the question of which interpretation is the correct one, but both probably have merit.

By way of an interim summary, we have seen in this section that learner Englishes as sampled in icle and indigenized L2 varieties of English as sampled in ice have demonstrably dissimilar typological profiles. In short, icle varieties are more analytic and less synthetic than ice varieties.

4.2 Sources of variability

The task before us now is to identify those grammatical markers and/or marker categories which are responsible for the bulk of variability in index scores. To this end, we will deconstruct the indices considered in the previous section, exploring which of the component categories loading on the two indices discriminate between learner Englishes, on the one hand, and indigenized L2 varieties of English, on the other hand. In exactly this spirit, Table 2 lists those five grammatical markers where we see significant or marginally insignificant frequency contrasts between the icle dataset and the ice dataset.

Table 2. Mean frequencies (in frequency per thousand words) of grammatical markers by variety type: indigenized L2 varieties of English (ice) versus learner Englishes (icle); significant or marginally insignificant differentials only (significance of inter-group differences according to an independent samples t-test)

                      indigenized L2          learner       significance of
                      varieties of English    Englishes     group difference
pronouns              28                      58            t = –6.77 (p < .001)
negator not, n’t      5                       10            t = –2.82 (p = .014)
auxiliary do          1                       4             t = –2.31 (p = .037)
auxiliary have        4                       7             t = –1.85 (p = .086)
inflected verbs       120                     107           t = 1.82 (p = .090)

The most marked discrepancies concern the text frequency of pronouns, an analytic marker category. In indigenized L2 varieties of English, pronouns have a mean text frequency of 28 ptw; in learner Englishes, the frequency is more than twice this figure. It seems to us that in many cases, icle writers use finite subordinate clauses, as in (1) (is that you must not wait instead of is not to wait) while ice writers tend to opt for the nonfinite – and pronoun-less – construction, as in (2): is to fragment rather than is that he fragments.

(1) The most important thing here is that you must not wait until the child is “big enough” to learn
(2) His idea, in a nutshell, is to fragment an input text according to key words.

It is particularly interesting to note that even those icle writers whose native language is pro-drop (Italian and Spanish) overuse pronouns vis-à-vis ice writers: in icle-Italian, pronouns have a text frequency of 46 ptw, in icle-Spanish it is 53 ptw. Moving down in Table 2, the negator not (including its contracted variant n’t), likewise an analytic category, is twice as frequent in learner Englishes (mean frequency: 10 ptw) as in indigenized L2 varieties (mean frequency: 5 ptw). Browsing through the concordance lines, it appears that in many cases, learners’ overuse of negators is a strategy to deal with certain limitations of their lexicon – in other words, we are likely to be dealing here with a lexically motivated structural difference. In (3), for instance, the icle writer uses the phrase treatments we still do not have (rather than its lexical alternative, treatments we still lack); the ice writer in (4), however, does use the verb lack instead of the paraphrase they do not have school fees.

(3) There are plenty of treatments we still do not have, for example, for HIV and cancer.
(4) Some time they lack school fees and the educational supply materials for learning.

Two other analytic markers are substantially more frequent in the icle texts than in the ice texts: auxiliary do, as in (3) above, and auxiliary have. The significantly higher frequency of auxiliary do in learner Englishes is probably related to the aforementioned tendency by learners to use negative paraphrases instead of their lexical alternatives (this is the issue of lack versus do not have discussed in the foregoing paragraph). The marginally insignificant overuse of auxiliary have in icle texts, on the other hand, is due to a preference for the perfect construction, which we regularly find in contexts that would be coded with the simple past in many ice L2 Englishes. Compare example (5), where the icle writer employs the present perfect (as I have said before), to example (6), where the ice writer uses the simple past (as I said before).

(5) We can say that the man distroys not only the nature which surrounds him but also other people, as I have said before.


(6) As I said before lingua francas arise when there is a need of communication between/among groups with different languages.

The last item in Table 2 concerns overtly inflected verbs, a category that loads on the syntheticity index. Inflected verbs have a text frequency of 120 ptw in ice essays and 107 ptw in icle essays (this differential is marginally insignificant, but still warrants, we believe, the analyst’s attention). In other words, learners comparatively often use the base form of lexical verbs instead of inflected forms, as in (7): don’t and benefit (overt past tense marker absent) instead of didn’t and benefited (overt past tense marker present).

(7) There are several technological elements that one hundred years ago, don’t exist, such as television, radio, video, cars, and all kind of machines in the job, in house, in everyplaces which have done us better the life. These improvements have benefit us in part, for example to make your life more comfortable ...

At this point, it will be instructive to additionally investigate exactly which grammatical markers are most implicated in setting apart the two extreme varieties in our dataset, icle-Spanish (as the most analytic and least synthetic variety in our dataset) and ice-India (as the most synthetic and least analytic variety). Let us begin by comparing icle-Spanish to the other icle varieties in our dataset. A series of chi-square tests of independence reveals that Spanish learners of English use significantly (p = .001) fewer inflected verb forms than learners with other mother tongue backgrounds – example (7) above nicely illustrates this phenomenon. But compared to other icle writers, Spanish learners also significantly (p = .01) overuse determiners. This phenomenon is exemplified in (8), where we find two noun phrases (the society, the money) where other writers would not necessarily employ a determiner.

(8) From the beggining of the society there are the problem of the money.

Turning to Indian English, a statistical analysis of the text frequency of grammatical markers in ice-India reveals that writers in this corpus use explicit synthetic plural marking on nouns, as in (9) and (10), significantly (p = .01) more frequently than other ice writers.

(9) Impacts Of Science On Human Life
(10) Business without planning would not bring desired effects or would not achieve the objectives of a business in fairly manner. The main objectives of a business (every) is to survive in the market.


Note in this connection that the use of plural -s with non-count nouns is a well-known characteristic of Indian English (cf., for instance, Kachru 1983; Bhatt 2008). We would also like to mention here that relative to other ice essays, ice-India essays exhibit significantly (p = .002) fewer conjunctions, especially fewer subordinating conjunctions; a detailed discussion of this phenomenon is reserved for another occasion.

In sum, the results discussed in this section suggest that learner Englishes and indigenized L2 varieties of English differ primarily thanks to five grammatical categories with different usage patterns and frequencies in the two variety types: pronouns, the negator not, auxiliary do, auxiliary have, and verbal inflections. In a similar vein, we have probed those grammatical markers that make icle-Spanish and ice-India special: determiners/articles and verbal inflections in the case of the former, and explicit plural marking and (subordinating) conjunctions in the case of the latter.

4.3 Substrate effects?

We will now turn to the issue of whether we can predict a given learner variety’s typological profile by considering the typological profile of the learners’ mother tongue language – in other words, the question is if we can demonstrate substrate effects. To this end, we are going to relate the typological profiles of a convenience sample of six European languages (Bulgarian, Czech, French, German, Italian, and Russian) to the profiles of a matching subset of six icle varieties (icle-Bulgarian, icle-Czech, icle-French, icle-German, icle-Italian, and icle-Russian). The relevant information can be gleaned from Table 3. Thus, for instance, icle-Bulgarian has an analyticity index of 497 and a syntheticity index of 141; the corresponding index values for (written) Bulgarian are 372 and 395, respectively. The problem is that these values cannot be compared directly, as variability between languages is substantially more pronounced than variability between icle varieties, as a cursory glance at the standard deviations reported in Table 3 makes clear. So, what we need is a way to normalize index values, and we will draw on z-score transformation to achieve this normalization.[3] Consider, again, icle-Bulgarian: the mean analyticity index in the icle dataset is 485, and the standard deviation in this dataset is 23; icle-Bulgarian’s analyticity index score of 497 therefore translates into a z-score of .5 (which is another way of saying that icle-Bulgarian’s analyticity index score is .5 standard deviations above the mean of all icle varieties under study). In a similar fashion, the analyticity index score of 372 associated with (written) Bulgarian translates into a z-score of –.3: the mean analyticity index score in the European languages subset is 390, the corresponding standard deviation is 64 – and Bulgarian’s analyticity index score of 372 is .3 standard deviations below the mean value of 390.

[3] Notice that this sort of normalization is similar to speaker normalization of formant frequencies customary in acoustic phonetics.

Table 3. Mean index values and z-scores by icle variety and the respective substrate language. Z-scores were calculated on the basis of intra-group dispersion (icle varieties or substrate languages)

                       analyticity index      syntheticity index
                       value     z-score      value     z-score
icle-Bulgarian         497       .5           141       –1.5
icle-Czech             440       –1.9         191       1.1
icle-French            498       .6           166       –.2
icle-German            480       –.2          190       1.1
icle-Italian           494       .4           163       –.3
icle-Russian           498       .6           166       –.2
mean                   485                    170
standard deviation     23                     19

(written) Bulgarian    372       –.3          394       –.1
(written) Czech        334       –.9          683       1.2
(written) French       439       .8           153       –1.2
(written) German       436       .7           301       –.5
(written) Italian      458       1.1          250       –.7
(written) Russian      300       –1.4         670       1.2
mean                   390                    409
standard deviation     64                     222

Interpretationally, we will consider matching signs of z-score-transformed index score pairings as a necessary – though not necessarily sufficient – condition for the assumption of substrate effects. Hence, we would argue that the typological profile of a substrate language X can be taken to have an effect on the typological profile of learner variety Y if both are more or less analytic than the respective group means, and likewise if both are more or less synthetic than the respective group means. In Figure 3 we find a visualization of the data in Table 3. The diagram divides typological space into four quadrants (starting with the top left quadrant and moving clockwise): (I) above-average analytic/below-average synthetic, (II) above-average analytic/above-average synthetic, (III) below-average analytic/below-average synthetic, and (IV) below-average analytic/above-average synthetic (consider written Bulgarian: it is located in the below-average analytic/below-average synthetic quadrant since both of its index z-scores in Table 3 have a negative sign).[4]

Figure 3. Analyticity by syntheticity (in z-score transformed index scores): a subset of learner Englishes as sampled in icle (white dots) versus their respective substrate languages (black squares)
[Scatterplot. Vertical axis: analyticity index (z-score transformed); horizontal axis: syntheticity index (z-score transformed); both axes run from –2.2 to 2.2.]

[4] As an aside relating to Figure 3, it is quite interesting to point out in this context that the above-average analytic/above-average synthetic and below-average analytic/below-average synthetic quadrants (II and III) are almost empty – this may be taken to suggest that the old idea of a trade-off between synthetic and analytic grammatical marking does seem to hold in our dataset.

Following our earlier definition of substrate effects, Figure 3 indicates substrate effects to the extent that learner varieties and their respective substrate languages end up in the same quadrants, thanks to matching signs of index z-score pairings. This is the case for exactly three learner variety/substrate language pairings:
– icle-French is above-average analytic and below-average synthetic, and so is written French;
– icle-Italian is likewise above-average analytic and below-average synthetic, as is written Italian;
– icle-Czech is below-average analytic and above-average synthetic, just as written Czech is.

icle-Bulgarian, however, is above-average analytic and below-average synthetic, while – as we have seen – written Bulgarian is below-average analytic and below-average synthetic. icle-Russian is similarly both above-average analytic and below-average synthetic, but its substrate language is below-average analytic and above-average synthetic. Conversely, icle-German is below-average analytic and above-average synthetic whilst written German is above-average analytic and below-average synthetic. We conclude that one might argue for substrate effects in Czech, French, and Italian learner English, but there is no empirical basis for assuming such effects in the case of Bulgarian, German, or Russian learner English. It is fair to say that this is a fairly mixed picture, which makes it hard to talk about systematic substrate effects in terms of grammatical analyticity and syntheticity across all learner varieties.

5. Discussion and conclusion

Typological profiling was the name of the game in this study, and has yielded four very clear results for the parameter grammatical analyticity vs. syntheticity. Three of these relate to the two primary research questions which informed this study: do learner Englishes and indigenized L2 varieties share typological properties thanks to certain concomitants of sla, and to what extent are typological profiles of individual learner Englishes conditioned by the learners’ native language background? Concerning the first question, Figure 1 has made amply clear that, as hypothesized, the majority of learner varieties and even indigenized L2 varieties exhibit, on average, less syntheticity and more analyticity than Standard British English reference varieties. This tendency is far more pronounced for learner varieties, however, than for indigenized L2 varieties. The vast majority of learner varieties are both significantly less synthetic and more analytic than the two written Standard British English reference registers, school essays and university essays. This leads to our second, far more important finding.
Learner varieties and indigenized L2 varieties clearly exhibit different typological profiles: the former are significantly more analytic and also exhibit a tendency to be less synthetic than the latter. In exhibiting these characteristics, the two variety groups are both fairly discrete and internally coherent. Our study therefore provides another piece of empirical evidence, based on a large-scale comparison of synthetic vs. analytic coding strategies in grammar, which confirms the need for drawing a distinction between English as a Foreign Language (efl) and English as a Second Language (esl) varieties on purely structural grounds.

We have further suggested that the learner varieties as represented in icle can be argued to exhibit a phenomenon which has justly been labeled register interference (Aijmer 2002: 55). More exactly, our study replicated findings deriving from a range of icle-based studies on (largely) morphosyntax which have been conducted since the late 1990s, all of which point to “the speech-like nature of learner writing” (Granger & Rayson 1998: 129) such “that aspects of the language in the corpus are more speech-like than comparable native English writing” (Aijmer 2002: 73; cf. also, e.g., Biber & Reppen 1998, Meunier 2000). Recall that according to our evidence, icle varieties are, on average, substantially closer to British English conversation than to school or university essays. It is, then, interesting that “the learners’ stylistic immaturity” (Granger & Rayson 1998: 130), which is the almost expected outcome of natural developmental factors (texts written by young or inexperienced native speakers likewise exhibit a strongly oral style) that are reinforced by educational factors (namely the dominant communicative approach to teaching English as a foreign language), can be observed not only at the concrete level of individual morphosyntactic categories, such as the use of articles, personal pronouns, special tensed and non-tensed verb forms, modals, passive, adverbials, subordinators, finite vs. non-finite adverbial and complement clauses. Instead, we appear to be able to gauge this immaturity also by exploring rather abstract and much more coarse-grained typologically inspired parameters, such as analyticity and syntheticity.

We speculate that the reason why ice writers are better at approximating to Standard (here: British) English essay writing conventions than icle writers is that ice writers are subject to the long-term effect of being trained in English (essay) writing on a wide range of topics in many different school subjects in an English-medium education system.

Finally, we failed to detect systematic substrate effects such that the degree of analyticity and syntheticity exhibited by the learner’s L1 influenced the degree of analyticity and syntheticity of the relevant learner variety of English. For some icle varieties such effects can be postulated, for others not.
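
The sign-matching criterion from Section 4.3 that underlies this summary can be retraced from the Table 3 values. The following Python sketch is an illustration under stated assumptions, not the procedure actually used in the study: it assumes that z-scores are computed within each group using the sample standard deviation (which reproduces the rounded means and standard deviations reported in Table 3), and the names `learner`, `substrate`, `z_scores`, and `same_sign` are hypothetical.

```python
from statistics import mean, stdev

# (analyticity index, syntheticity index) values taken from Table 3.
learner = {
    "Bulgarian": (497, 141), "Czech": (440, 191), "French": (498, 166),
    "German": (480, 190), "Italian": (494, 163), "Russian": (498, 166),
}
substrate = {
    "Bulgarian": (372, 394), "Czech": (334, 683), "French": (439, 153),
    "German": (436, 301), "Italian": (458, 250), "Russian": (300, 670),
}


def z_scores(group):
    """z-transform each index within its own group (sample standard deviation)."""
    result = {name: [] for name in group}
    for dim in range(2):  # 0 = analyticity, 1 = syntheticity
        values = [pair[dim] for pair in group.values()]
        m, sd = mean(values), stdev(values)
        for name, pair in group.items():
            result[name].append((pair[dim] - m) / sd)
    return result


def same_sign(a, b):
    """True if both z-scores lie on the same side of the group mean."""
    return (a > 0) == (b > 0)


z_learner = z_scores(learner)
z_substrate = z_scores(substrate)

# A pairing qualifies only if the signs match on both dimensions,
# i.e. learner variety and substrate language share a quadrant.
matches = [
    lang for lang in learner
    if all(same_sign(x, y) for x, y in zip(z_learner[lang], z_substrate[lang]))
]
print(matches)  # ['Czech', 'French', 'Italian']
```

Under these assumptions the sketch returns the same three pairings for which Section 4.3 argues that substrate effects might be assumed; matching signs remain a necessary rather than a sufficient condition.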
Future research will have to show, however, to what extent substrate effects along the parameters investigated here may come to the fore when zooming in on individual (bundles of) morphosyntactic categories. Take, for instance, the NP and the effect which the notorious underuse of articles by Russian (or, more generally, East Slavic) learners of English will have on the analyticity index of the relevant learner variety.

In conclusion, let us take a step back and sketch some methodological implications and desiderata of this study for future research. This relates to sla research on non-native (learner) varieties and native (indigenized) L2 varieties of English, on the one hand, and to our own research focus, on the other hand, i.e. typology-driven morphosyntactic profiling of varieties and variety types of English in terms of recurrent bundles or ‘conspiracies’ of morphosyntactic features, grammatical surface complexity, and analytic vs. synthetic strategies for coding grammatical information. In the latter context, we have endeavored here to add learner varieties of English as a sixth variety type to our portfolio of variety types consisting of low-contact L1 varieties, high-contact L1 varieties, indigenized L2 varieties, English-based Pidgins, and English-based Creoles. In a range of previous studies we have demonstrated that each of these variety types exhibits distinct morphosyntactic, complexity and analyticity/syntheticity profiles (see, e.g., Szmrecsanyi & Kortmann 2009a, Kortmann & Szmrecsanyi 2009).

Since all of our previous research on World Englishes and English-based Pidgins and Creoles has so far largely been based on comparisons of spoken data, it will be most interesting to see how spoken learner varieties fit into the picture. Excitingly, this will be possible once the lindsei corpus (The Louvain International Database of Spoken English Interlanguage) is released, since from that point onwards comparisons with the spoken components of the ice corpora for indigenized L2 varieties can be conducted. In the context of the present study and previous icle-based research, one will want to probe if in the spoken medium, too, learner Englishes and indigenized L2 varieties exhibit different typological profiles. The expectation is that the typological profiles should approximate each other considerably: since genre interference can no longer be relevant for the spoken learner varieties, at least on a structural level, both learner and indigenized L2 varieties should exhibit features characteristic of spoken Standard English. As a result of the expected approximation of the typological profiles of lindsei varieties and spoken ice-L2 varieties, we might also expect the (structural) distance between these two variety types to be considerably smaller than between either of them and any of the other variety types we have investigated so far (e.g. low-contact L1 varieties, Pidgins, or Creoles).

But even lindsei will not be able to remedy one general drawback which many currently (or soon) available corpora sampling learner English suffer from – namely, the very fact that both lindsei and icle represent fairly advanced learner Englishes.
It is part and parcel of the icle design that the data stem from about 20-year-old non-native “university undergraduates in English Language and Literature in their third or fourth year” (Granger 1998: 10), and for the sake of comparability, this is also the profile of the lindsei informants. This is not a problem, of course, for sla research interested in advanced learner varieties. It is a big problem, however, if learner varieties are tackled from a completely different angle, as done by the present authors. Our prime interest in learner varieties was, and still is, to take them as a means to an end in order to learn more about the genesis and evolution of indigenized L2 varieties of English, on the one hand, and Englishbased Pidgins and Creoles, on the other hand. In previous research of ours (cf. Kortmann & Szmrecsanyi 2009, to appear, Szmrecsanyi & Kortmann 2009 a,b,c), we found confirmed claims by Trudgill (2001, 2009) and McWhorter (2001, 2007) to the effect that the grammars of Pidgins and Creoles are characterized by simplification processes, which in turn is taken to be a result of intensive language contact and rapid adult language acquisition (of English). From that point of view – investigating learner varieties of English to learn more about typical

Typological profiling 

strategies and grammatical patterns used by adult foreign language learners – what is desperately needed is collections of (primarily spoken) data for early adult learners of English, produced ideally in a natural, non-instructional environment. Longitudinal data of this kind were collected and investigated, for example, as part of a major sla project in the 1980s under the auspices of the European Science Foundation. In five European countries, the sla process of adult immigrants was documented for a period of 30 months for five L2s acquired by speakers of six L1s (for the methodology cf. e.g. Klein & Purdue 1997: 308–310 and Trévise & Porquier 1986). For English as L2, the project provides data from two adult immigrants with Italian and Punjabi as their native languages. This is a most valuable starting-point, but similar data need to be collected for several groups of adult learners of English with different L1 backgrounds. And there is yet another set of data for which corpora need to be compiled, since these data offer a second, equally important window on the genesis and evolution of (ultimately) indigenized L2 Englishes, namely data for their early stages. This means data from the first few generations of L2 speakers or, to use a different measure, from (different sub-stages of) the second and third stages of Schneider’s (2007) well-known evolutionary cycle of postcolonial Englishes, i.e. the stages of exonormative stabilization (stage 2) and, especially, nativization (stage 3). The sooner the sla and the World Englishes communities embark on these challenging corpus compilation projects in order to explore early sla in general, and early sla in (nativized as well as non-nativized) L2 Englishes in particular, the better. References Aijmer, K. 2002. Modality in advanced Swedish learners’ written interlanguage. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching [Language Learning & Language Teaching 6], S. Granger, J. Hung & S. 
Petch-Tyson (eds), 55–76. Amsterdam: John Benjamins.
Anttila, R. 1989. Historical and Comparative Linguistics [Current Issues in Linguistic Theory 6]. Amsterdam: John Benjamins.
Aston, G. & Burnard, L. 1998. The BNC Handbook: Exploring the British National Corpus with SARA. Edinburgh: EUP.
Bhatt, R. 2008. Indian English: Syntax. Varieties of English, 4: Africa, South and Southeast Asia, R. Mesthrie (ed.), 546–562. Berlin: Mouton de Gruyter.
Biber, D. & Reppen, R. 1998. Comparing native and learner perspectives on English grammar: A study of complement clauses. Learner English on Computer, S. Granger (ed.), 145–158. London: Longman.
Granger, S. 1998. The computer learner corpus: A versatile new source of data for SLA research. Learner English on Computer, S. Granger (ed.), 3–18. London: Longman.

Granger, S. & Rayson, P. 1998. Automatic profiling of learner texts. Learner English on Computer, S. Granger (ed.), 119–131. London: Longman.
Granger, S., Dagneaux, E. & Meunier, F. 2002. The International Corpus of Learner English. Handbook and CD-ROM. Louvain-la-Neuve: Presses Universitaires de Louvain.
Greenbaum, S. (ed.). 1996. Comparing English Worldwide: The International Corpus of English. Oxford: Clarendon.
Greenberg, J.H. 1960. A quantitative approach to the morphological typology of language. International Journal of American Linguistics 26(3): 178–194.
Kachru, B.B. 1983. The Indianization of English: The English language in India. New Delhi: OUP.
Kachru, B.B. 1985. Standards, codification and sociolinguistic realism: The English language in the outer circle. In English in the World: Teaching and Learning the Language and Literatures, R. Quirk & H.G. Widdowson (eds), 11–30. Cambridge: CUP.
Klein, W. & Perdue, C. 1997. The basic variety (or: Couldn’t natural languages be much simpler?). Second Language Research 13: 301–347.
Kortmann, B. & Szmrecsanyi, B. 2004. Global synopsis: Morphological and syntactic variation in English. In A Handbook of Varieties of English, B. Kortmann, E. Schneider, K. Burridge, R. Mesthrie & C. Upton (eds), 1142–1202. Berlin: Mouton de Gruyter.
Kortmann, B. & Szmrecsanyi, B. 2009. World Englishes between simplification and complexification. In World Englishes: Problems – Properties – Prospects [Varieties of English around the World 40], L. Siebers & T. Hoffmann (eds), 263–286. Amsterdam: John Benjamins.
Kortmann, B. & Szmrecsanyi, B. To appear. Parameters of morphosyntactic variation in World Englishes: Prospects and limitations of searching for universals. In Linguistic Universals and Language Variation, P. Siemund (ed.). Berlin: Mouton de Gruyter.
McWhorter, J. 2001. The world’s simplest grammars are creole grammars. Linguistic Typology 5: 125–166.
McWhorter, J. 2007.
Language Interrupted: Signs of Non-native Acquisition in Standard Language Grammars. Oxford: OUP.
Meunier, F. 2000. A Computer Corpus Linguistics Approach to Interlanguage Grammar: Noun Phrase Complexity in Advanced Learner Writing. PhD dissertation, Centre for English Corpus Linguistics. Université catholique de Louvain, Louvain-la-Neuve.
Platt, J.T., Weber, H. & Ho, M.L. 1984. The New Englishes. London: Routledge & Kegan Paul.
Schlegel, A.W.v. 1818. Observations sur la Langue et la Littérature provençales. Paris: Librairie grecque-latine-allemande.
Schneider, Edgar W. 2007. Postcolonial English. Varieties of English Around the World. Cambridge: CUP.
Seuren, P. & Wekker, H. 1986. Semantic transparency as a factor in creole genesis. In Substrata versus Universals in Creole Genesis [Creole Language Library 1], P. Muysken & N. Smith (eds), 57–70. Amsterdam: John Benjamins.
Szmrecsanyi, B. 2009. Typological parameters of intralingual variability: Grammatical analyticity vs. syntheticity in varieties of English. Language Variation and Change 21(3): 319–353.
Szmrecsanyi, B. & Kortmann, B. 2009a. Between simplification and complexification: Non-standard varieties of English around the world. In Language Complexity as an Evolving Variable, G. Sampson, D. Gil & P. Trudgill (eds), 64–79. Oxford: OUP.
Szmrecsanyi, B. & Kortmann, B. 2009b. The morphosyntax of varieties of English worldwide: A quantitative perspective. Lingua 119(11): 1643–1663.

Szmrecsanyi, B. & Kortmann, B. 2009c. Vernacular universals and angloversals in a typological perspective. In Vernacular Universals and Language Contacts: Evidence from Varieties of English and Beyond, M. Filppula, J. Klemola & H. Paulasto (eds), 33–53. London: Routledge.
Trévise, A. & Porquier, R. 1986. Second language acquisition by adult immigrants: exemplified methodology. Studies in Second Language Acquisition 8: 265–275.
Trudgill, P. 2001. Contact and simplification: Historical baggage and directionality in linguistic change. Linguistic Typology 5: 371–374.
Trudgill, P. 2009. Vernacular universals and the sociolinguistic typology of English dialects. In Vernacular Universals and Language Contacts: Evidence from Varieties of English and Beyond, M. Filppula, J. Klemola & H. Paulasto (eds), 304–322. London: Routledge.
Wekker, H. 1996. Creole Languages and Language Acquisition. Berlin: Mouton de Gruyter.

A principled distinction between error and conventionalized innovation in African Englishes Bertus van Rooy

Vaal Triangle Campus, North-West University, South Africa

A distinction between error and conventionalized innovation is essential to understanding if and how New Varieties of English develop new conventions. This chapter proposes two criteria, grammatical stability and acceptability, to identify conventionalized innovations. It draws on Croft’s (2000) distinction between the narrow process of creating new forms (innovation in the narrow sense) and the subsequent diffusion thereof, which are characterized as individual/psycholinguistic and social respectively. Three features from African Englishes are examined: the so-called extension of the progressive aspect to stative verbs and the use of “can be able to” in Black South African English, as well as the complementation of “enable” with bare infinitive clauses in East African English. The analyses indicate that while these features may have originated as errors due to analogy or overextension of existing patterns, which may also happen in the process of acquiring English as a Foreign Language, the context of New Varieties of English is such that stabilization and conventionalization of these innovations may occur. Genuine new linguistic conventions emerge from forms that may have started out as errors.

1. Introduction

Learner English (prototypically Foreign Language Englishes) and New Englishes are not treated as separate entities by a number of noteworthy commentators, such as Selinker (1972) or Quirk (1990). These non-native varieties are regarded as a single group, in contrast to native varieties, which are argued to have a separate nature. In the World Englishes paradigm, Kachru (1991 and elsewhere) argues for a model of three concentric circles, which places the Foreign Language users of English in the Expanding Circle, while New Englishes are in the Outer Circle, and differ along a number of parameters from both native and foreign varieties.1 Different research traditions pay attention to Foreign Language Englishes and Outer Circle/New Englishes separately, thereby obscuring some of the correspondences. On the other hand, in the typical mainstream applied linguistic approach, no systematic attention is paid to possible differences between Foreign Language and New Englishes. Ellis (1994: 220) carefully recognizes a difference between interlanguages and nativized varieties, citing with approval World Englishes scholars, including Kachru himself, but this concession has little effect on the understanding of the processes involved. The socio-political contexts of New Varieties are treated as one among many other typical social variables that may have a mediated effect on the ultimate level of proficiency attained (Ellis 1994: 197). The underlying goal remains, implicitly, to attain native-like competence, in some form of English that resembles a recognized Standard Variety.

A South African commentator who aligns himself with Quirk (1990) vis-à-vis Kachru (1991), Peter Titlestad (1996: 168), formulates the conceptual challenge for scholars who want to maintain a special status for New Varieties as opposed to just a particular case of learner English as follows: “the random errors of second-language learners at various stages of acquisition do not make a new English unless a codifiable consistency can be demonstrated”. Thus, for a form of English to count as a “Variety”, Titlestad requires stability and consistency, in effect the eradication of traces of learner errors. Somebody who is more sympathetic towards the recognition of New Varieties, Vivian de Klerk, is still faced with the following question:

When does a substratal feature assert itself sufficiently to overcome the fear that if deviations are allowed, the rules will be abandoned and chaos will ensue? Is it when speakers use it often enough to silence or exhaust the prescriptors? Perhaps benign neglect is what is called for, rather than an intervention, and time is the ultimate strategy towards legitimation (Gonzales 1997: 210), although ultimately the test is the speakers’ ability to deliver their message among themselves and to the outside world. (De Klerk 1999: 315)

1. The relatively neutral terms Foreign Language Englishes and New Englishes are mostly used in this chapter, parallel to the Expanding and Outer Circle varieties in Kachru’s concentric circle model. However, all of these forms, including the forms used by learners in either Foreign Language or New Englishes contexts, are regarded as varieties. The term variety is taken to be relatively neutral in respect of degree of codification, standardization or the status of a particular variety, as opposed to a term like dialect, which is mainly invoked in native speaker contexts. To be a variety, a collection of linguistic forms habitually used by an identifiable population of speakers has to be identifiable by at least some of its own speakers and/or some other speakers of the language. It does not require the type of stability that is typical of standard written varieties, nor does it require any official sanction. Thus, any English that has some characteristic features and a population of speakers will be deemed a variety in this chapter as a matter of definition.


Mesthrie (1992, 1999, 2008b) argues persuasively for the beneficial effect of taking into account insights from Second Language Acquisition in the study of New Varieties of English. In particular, Mesthrie (2008b: 634) characterizes New Varieties as the Cinderella of a range of linguistic disciplines, neglected and often subject to grossly simplified analyses, depending on the framework of the analyst. He notes, for instance, the improbability of transfer from more than 1,000 substrate languages being responsible for the extensive similarities observed across New Englishes from Africa and Asia. The distinction between error and conventionalized innovation is one of the crucial issues that researchers dealing with New Varieties struggle to come to terms with, and a key area in which more progressive views of “varieties” are open to criticism. Kachru (1991) proposes a distinction between error and deviation, where deviation refers to a form that differs from those habitually used in Inner Circle/native contexts, but is acceptable in a different (New English) context. Krishnaswamy and Burde (1998: 31) criticize Kachru for circularity in his argument. They note that Kachru argues on the one hand that the Indianness of Indian English is the result of acculturation, and acculturation is the result of the adaptation of a language to cultural needs. However, they claim that Kachru also argues that Indian English results from the institutionalization of an interference variety, where interference is the result of language errors. Ultimately, then, the criticism is that errors remain at the root of deviations and that Kachru does not really provide an alternative, except in the form of a terminological maze. Bamgbose (1998) also argues for a difference between error and innovation, and proposes internal criteria to draw the distinction.
Innovations are characterized by widespread use in a speech community (demographic norm), larger geographical diffusion, codification in some textbooks, acceptance by authorities (such as the media, teachers, examination bodies and publishing houses), and ultimately acceptability in a given speech community. In the same argument, Bamgbose argues against external norms for judging innovation, because these ultimately use standardized native varieties as a yardstick of comparison. The exact sources of innovation are seldom treated by World Englishes scholars, although an implicit assumption is the possibility that non-native performance phenomena may well feed into the linguistic feature pool (in the sense of Mufwene 2001) from which innovations are selected. Schneider (2007: 97–109) treats this matter more explicitly, by positing structural nativization as a central contributor to the reshaping of English. In his exposition of innovation as a possible source of new features in Post-Colonial Englishes, Schneider (2007: 102–107) avoids using the term ‘error’ when reviewing processes like simplification, restructuring and exaptation. Alongside these language-internal sources of new linguistic features, he also makes reference to contact phenomena, where loanwords from the indigenous languages are incorporated into English but also where syntactic and phonological templates are transferred – processes that are discussed under the heading of ‘transfer’ in studies of second language acquisition.

The question that is raised by Schneider’s (2007) exposition, against the backdrop of an understanding such as Selinker’s (1972) “societal fossilization”, is whether the difference is merely terminological, hence a matter of researcher attitude and ideology, or whether there is something qualitatively different between New Varieties of English and prototypical Foreign Language Learner Varieties. The crucial issue to resolve is whether the New Varieties exhibit genuine linguistic innovations that become conventionalized, or whether they simply exhibit errors. In this chapter, I argue for the view that the difference is qualitative. In the next section, the psycholinguistic similarities between New English and Learner varieties are acknowledged, to pave the way for appreciating the difference, which is of a social kind. In support of the position outlined in the next section, three linguistic features in New Varieties are reviewed. The remainder of the chapter shows how the social forces of New Varieties operate to give rise to new conventional linguistic forms.

2. Social and psycholinguistic forces

Croft (2000) develops an explicit account of language change in terms of two independent processes – linguistic innovation and linguistic conventionalization. Innovation (in this narrower sense) is an individual, psycholinguistic process, through which new forms are created, usually unintentionally, as speakers try to communicate. He shows that even within monolingual native communities, so-called performance errors may occur, and some of these may eventually become accepted.
In language contact situations, however, the potential for innovating new forms, among other means through performance errors due to transfer, simplification, overgeneralization and similar processes, is increased. Once forms are innovated, they become part of the feature pool of potential linguistic forms in a speech community, but this does not yet amount to acceptance. The process of diffusion of forms, and their eventual selection into the shared linguistic system of a speech community is a thoroughly social one according to Croft (see also Van Rooy 2010 for an extension of such a social concept of linguistic diffusion to New Englishes). Social forces, familiar to scholars of variationist sociolinguistics, operate on the feature pool, to lead to the (social) conventionalization and (individual) entrenchment in the grammar and/or lexicon.

In view of the principled distinction between innovation, an individual psycholinguistic process, and selection, a shared social process, we can reconceptualize the difference between mere ‘error’, common to New Englishes and all learner Englishes (in Outer and Expanding Circle environments), and conventionalization, which is much more likely to occur in New Englishes in the Outer Circle than in Foreign Language English contexts in the Expanding Circle. Learners of Foreign Language Englishes and learners of New Varieties of English are in the same initial position because they learn English as a second language, usually from school-going age or even later. In consequence, their mental representations of the additional language are likely to be different from the representations of native speakers. Paradis (2004) summarizes neurolinguistic work on brain activity associated with first and second languages. Findings converge on a picture where native speakers do more work with their procedural memory, giving rise to rule or schema-like generalizations, and do somewhat less work with their declarative memory, with stored representations of words and idiom-like structures. At a later age, language learners tend to shift the learning burden to the declarative memory, and make less use of regularities. When they run out of linguistic resources, however, they may rely on rules, but then the potential for producing new, non-conventional forms, or ‘errors’, in another terminological conception, increases, hence the often repeated finding of rule overgeneralization among language learners. On an individual, psycholinguistic level, the process of acquiring either a New Variety of English or a Foreign Language English is quite similar, and more likely to contribute deviant forms to the feature pool than in Inner Circle contexts.2

Socially speaking, the difference is fundamental, however. On average, the contexts in which New Englishes arise provide much more regular opportunity to use the language. English not only serves the purpose of interaction with foreigners, but serves a range of intra-national functions, such as education, courts, parliament and public media.
This may in part give more individual opportunity for entrenching certain forms. The contrast between indigenized and performance varieties proposed by Kachru (1991) makes sense in the terms of extended versus limited opportunity to use the language. Also, on a social level, the nature of New English contexts is such that communication usually takes place among non-native speakers from different language backgrounds. Native speakers are seldom involved, and therefore do not exert much pressure on other speakers to conform to their speech and writing patterns. While teachers of English in Foreign Language contexts are often non-native speakers too, the target for acquisition is more likely to be a native target, and outside the classroom, the likelihood of communicating with a native speaker is higher than in a New Englishes context (even if the overall opportunities for use are less frequent). Moreover, since different social dynamics operate in such societies, the forces that give rise to the selection of certain linguistic variants above others will inevitably lead to different linguistic systems emerging over time.

2. The target for acquisition may also be different in Foreign Language and New Englishes contexts, and may in both cases differ from the target for native language acquisition. Nominally, the target in Foreign Language contexts is often a codified native variety, while this is a more contentious (socio-politico-educational) issue in the Outer Circle. In Van Rooy (2006), I provided some corpus evidence in support of the claim that the target for native and foreign language acquisition is often more similar than in the Outer Circle, a notion that received further evidence in the work of Petzold (2002) and Coetzee-Van Rooy (2006).

One very important dimension of the social forces in the New Varieties is the identity dimension, the cornerstone of Schneider’s (2003, 2007) Dynamic Model. In post-colonial societies, the (in origin) ‘erroneous’ forms may attain conventional status as soon as native speakers, as well as acrolectal speakers approximating the external Standard English norms, accept the indigenous population as members of their own speech community. This leads to greater acceptance of the forms produced by the indigenous population, and in the long run leads to slightly different linguistic variants gaining acceptance in those communities. Hundt & Vogel (this volume) argue that this also applies to predominantly native English settings where extensive contact with second language speakers takes place, as is the case with New Zealand English for example.

Of course, this may sound hopelessly resigned, as if ‘errors’ simply have to be accepted. However, the situation is not that simple. McWhorter (2007) argues that the same process has happened to English and a number of other languages of wider communication, such as Arabic, Malay, Mandarin Chinese and Persian at various stages of their development. These languages were all simplified and regularized at some point in their development, due to extensive acquisition by second language speakers.
The prime targets for simplification are redundant features that McWhorter regards as ‘ornamental’ (see also Szmrecsanyi & Kortmann 2009), that is, features reflecting historical layers of linguistic change that are often not necessary from a communicative point of view. However, sometimes language change may even affect communicatively more useful contrasts, such as the singular/plural contrast in the second person pronoun (thou/you) that English also lost during the Early Modern English period. In this chapter, I argue that New Varieties in post-colonial communities, when they reach Stages 3 and 4 in terms of Schneider’s Dynamic Model, undergo the same process. These may, in some cases, even filter through to other varieties of English, after stabilization in particular varieties, such that native speaker varieties may show traces of influence from New Varieties. It is not anticipated, in the current state of affairs, that Foreign Language Englishes are likely to have such an extensive impact on the English language globally.

There is broad quantitative support for this proposition from the work of Szmrecsanyi & Kortmann (2009) on complexity in New Varieties. Drawing on data from the Handbook of Varieties of English (Kortmann et al. 2004) and ICE corpora, they identify fundamental differences (mainly morphological) between native and non-native (but specifically New) varieties. Among the differences they observe are greater regularity and fewer irregular forms in Outer Circle varieties, and some avoidance of redundant feature marking and ornamental rules. This is attributed to the conventionalization of more regular, simplified forms that have their origin in the learning process, but then simply become accepted. By contrast, low-contact traditional dialects are among the morphologically most complex varieties of English, because they enjoy uninterrupted intergenerational transmission, where the changing effects of new generations on the stable, conventionalized forms are much slower. Thus, from a social perspective, what has happened to Standard English according to McWhorter (2007) is not fundamentally different from what is happening with New Varieties according to Schneider (2007) or Szmrecsanyi & Kortmann (2009).

3. Evidence

Three case studies of linguistic features that originated as errors, but gained acceptance and have become conventionalized in their speech communities, are presented in this section. Two kinds of evidence are reviewed in every example: grammatical systematicity and acceptability. In respect of systematicity, data pertaining to the linguistic manifestation are considered, to show that these variants are not mere random errors, but have found a place in emergent linguistic systems. This is supported by acceptability evidence of the kind alluded to by Bamgbose (1998), to show that these forms have gained acceptance, and in the case of one of the case studies from South Africa, evidence that this acceptance has gone beyond the boundaries of the New Variety, Black South African English, into the usage of members of the native-speaker community.

3.1 Case Study 1: The progressive aspect in South African English

The progressive aspect has been characterized as a typical diagnostic of New Varieties. Mesthrie (2008b: 626) notes, for instance, that “A striking and almost universal characteristic among L2 varieties in Africa-Asia is the extension of BE + -ING to stative contexts.” In descriptions of Black South African English, it is indeed treated as an extension to a new subclass of verbs (e.g. Gough 1996, De Klerk 2003, Mesthrie 2008a: 489), and explained as if Black South African English speakers do not respect the grammatical contrast of stative and dynamic verbs in the aspectual system. Hundt & Vogel (this volume) provide further evidence of the more widespread use of the progressive in other New Varieties. The possibility that the construction has acquired a new meaning, in which the dynamic-stative contrast plays no part, has not been considered previously. Two issues merit reconsideration: are the so-called extended progressives simply a violation of a constraint on the Standard English construction, which serves little function, and are the so-called progressives regarded as errors or as innovations that are acceptable in high-stake communicative domains?

Van Rooy (2006) analyzes the progressive construction in a corpus of Black South African student writing, in comparison to native and German foreign language users. He concludes that the progressives in native and foreign language varieties follow similar patterns of use, in keeping with the textbook descriptions of Standard English (see Hundt & Vogel, this volume, for a more refined view on the distribution of this feature across various New and Foreign Language Englishes). The temporariness of a dynamic event, typically a non-punctual and non-teleological verb, is central to the meaning of the progressive, while various extensions and elaborations proceed from there. Of the total sample of 100 instances of the construction, the prototype meaning is encountered in 43 cases in the LOCNESS corpus of native speaker student writing. An example of the prototypical progressive in the native speaker corpus is:

(1) They found something on everybody, including the witnesses, the prosecution and defence lawyers, and even the dismissed jurors. How did they know what the jurors were doing and where they were sitting in the restaurants? (LOCNESS, USSCU0012-4)3

By contrast, the majority of uses in Black South African English (BSAE) radiate from a prototype of incompleteness, but extensive duration, for instance:

(2) People don’t attend matches because players are not delivering. (ICLE-TSNO1220)

The core meaning of the construction is different in BSAE, where temporariness, imminent change and the activity being ongoing or foregrounded at some temporal reference point are not central to the meaning of the usage. The actually observed uses form a coherent linguistic construction, which is intrinsically different from Standard English. Seventeen out of the hundred instances of the progressive signify a so-called permanent state (with another 4 examples using stative verbs in senses not expressing short duration), such as the following example:

(3) most of the teenagers and Adults are suffering from this desease [HIV/AIDS]. (ICLE-TSNO1116)

3. The clause containing the progressive construction is underlined in all examples, while the progressive form itself is highlighted with bold typeface.


Alongside the stative progressives, perfective or durative readings account for another 46 examples, giving a clear majority of instances radiating from the (other) prototype of the BSAE construction. By contrast, the prototype of Standard English is only attested in nine cases. Van Rooy (2006) concludes that the evidence points to a new life for the structure BE + -ING in BSAE, and rejects the description of the extension of the progressive to stative verbs as inaccurate. This study removes the first objection to recognizing the “extended progressive” as an innovation – it is not an unsystematic or random use of a construction beyond the constraints imposed by Standard English.

Gough (1996: 63) still found complete rejection of the extended use of the progressive among Black teachers. However, in the next decade, Van der Walt & Van Rooy (2002) found a very high degree of acceptance (96% among BSAE teachers, and 64% among BSAE tertiary students) of structures with the main verb “having” in the progressive aspect. The data samples used by Van der Walt & Van Rooy (2002) were also much bigger (60 teachers and 670 students) than the 20 respondents who participated in Gough’s (1996) survey. This form seems to be gaining in acceptance. On both counts, grammatical systematicity and acceptability, therefore, the use of the progressive with different meanings and with a wider range of verbs can no longer be regarded as a simple error in approximating Standard English norms. Rather, this usage should be regarded as an innovation, already conventionalized, in BSAE.

3.2 Case Study 2: “Can be able to” in South African English

In Standard English, the periphrastic expression of ability, “be able to”, collocates with all modal verbs except “can” and “could”. There isn’t a single instance of “can be able to” in either the BNC or ICE-GB. Crystal (2008) comments on this form on his DCblog. He notes that during the 16th and 17th centuries ‘can be able to’ was quite common in British English. A number of examples from the King James Bible translation and Shakespeare are cited. However, the form fell into disuse during the subsequent centuries. A Google web search, however, returned millions of instances on websites with .uk, .nz and .au extensions. Thus, while standard corpora don’t really contain instances of this form, such instances make their way onto the web, and are not altogether absent. Both the BNC and ICE-GB have a single occurrence of “could be able to”, but more widespread use is found on the internet.4 In the BNC, the example reads as follows:

(4) Pointing the way: A road sign of the future would be able to change from its normal wording (left) to give advice (right) on alternative routes in the event of traffic problems developing Cockpit-style driving: How a motorist could be able to beat the traffic jams by using an in-car display unit (K56)

4. Websites from the .uk domain have 32 million hits for “could be able to”, with 14 million on .au, 6 million on .nz and 5 million on .za domains (Google search on 20 January 2010).

The construction may be quite rare in Standard British English exactly because it becomes a tautology, if “can” already has an ability reading, as Crystal (2008) also notes. However, the occurrence of this expression is noted as a feature of BSAE by Gough (1996: 63)5 and more recently by De Klerk (2003: 476) and Mesthrie (2008a: 490). De Klerk (2003: 476) postulates that “can” is mainly used in its deontic meaning of permission, in reference to her corpus of Spoken Xhosa-English, leaving the phrase “can be able to” to express the epistemic meaning of ability. However, in the Tswana Learner English (TLE) corpus, both ability and permission meanings occur widely, separate from the “can be able to” construction. The permission meaning is the least frequent of the three meanings of “can” in the TLE corpus, so De Klerk’s analysis of the Xhosa-English data is not applicable to the TLE data. A closer look at the fifteen examples of “can be able to” in the 200,897 word TLE corpus (75 per million words), including (5) and (6) below, suggests that the construction “can be able to” combines the extrinsic possibility meaning with the ability meaning, and that it has nothing to do with permission.

(5) Coaches put more effort in their players so that they can be able to win. (ICLE-TSNO1071)

(6) People become sick for a long time and this caused Aids because this deseas will kill all your imune system and the body can’t be able to diffend itself against other deseases. (ICLE-TSNO1114)

It seems like a simple extension of the grammar of English, where “be able to” combines freely with most other modals, or according to Crystal (2008), a reactivation of a latent pattern in the grammar. Mukherjee & Hoffmann (2006) also argue that a latent pattern, which fills a noticeable gap in the system by extension or analogy from a related construction, is a very probable candidate for innovation in the linguistic system of a New English.
However, the point here is that the “can be able to” construction is not merely a tautology or emphatic construction, but draws on a subtle semantic contrast among the meanings of the modals “can” and “could”; Black South African English has conventionalized this usage, going beyond the mere potential of extending an existing pattern in English grammar to new territory. If this is accepted, the meanings expressed by “can be able to” in (5) and (6) can be paraphrased as follows:

5. In one of the comments on Crystal’s DCblog article, a reader even refers to this particular Gough chapter to point out the noticeable occurrence of the “can be able to” construction in Black South African English.

A principled distinction between error and conventionalized innovation in African Englishes 

(5′) Coaches put more effort in their players so that they possibly develop the ability to win.
(6′) People become sick for a long time and this caused Aids because this deseas will kill all your imune system and the body possibly loses the ability to diffend itself against other deseases.

A possible limitation on the TLE is that it contains learner data, i.e. data from students who are still in courses where they are taught English. Thus, the use of the “can be able to” construction, while possibly systematic according to Van Rooy (2005), may still not be met by wider acceptance. To reject this possibility, a number of facts should be borne in mind. De Klerk’s (2003) study is based on conversational spoken data of Xhosa native speakers using English, including a number of professionally qualified speakers, and does not represent learner data in this respect. She identifies 14 instances of the “can be able to” form in the Xhosa-English corpus, while noting that no instances are attested in her comparative database, a New Zealand native-speaker corpus. Re-analyzing De Klerk’s corpus, and including negative forms, the number of instances should actually be adjusted to 23, in the 550,754 word corpus (42 per million). De Klerk also collected a corpus of high school and university lectures presented in English by Xhosa speakers. This smallish but helpful corpus of 57,870 words contains a further 17 instances of “can be able to” (294 per million words). Thus, in corpus data originating from a general public and the most educated segments of the public, Black South Africans use “can be able to” with much more than negligible frequency. Further evidence can be gathered from indications of acceptability. More than a decade ago, Gough (1996: 63) reported that 13 out of 20 teachers in his sample rejected the acceptability of the “can be able to” construction.
However, thirteen years later, a Google web search for the expression “can be able to”, restricted to English web pages from the .za domain and run on 20 January 2010, finds more than 10 million different web pages.6 Opening the first few hits to determine the authorship shows that many users of the construction are indeed probably Black South Africans, and the hits include, for instance, the text version of a speech by the Honourable Minister of Foreign Affairs, Dr. Nkosazana Dlamini-Zuma, in March 2009:

6. As suggested by David Crystal as well, this form also occurs in other forms of English, and on the web, pages from the New Zealand domain show more than 10 million hits for the expression “can be able to”, with 21 million from Australia and 42 million from the United Kingdom (on 20 January 2010). Thus, Black South African English is not nearly unique anymore, although the presence of the form in relatively smaller corpora may suggest a higher frequency that cannot be refuted or supported with web evidence at present.

 Bertus van Rooy

(7) And we also have a project on Cuba, Rwanda and South Africa and Cuban doctors, and we have agreed to fund part of that project from the Renaissance programme and we agreed that we should get an MOU signed so that we can be able to continue with that project.

In texts aimed at a general audience, and originating from websites of public corporations, where one would suspect a fair degree of editing, an example like the following is observed (in a FAQ website of Telkom, the national fixed-line telephone operator):

(8) 16. How many times will the telephone ring before it routes to the mailbox, when it remains unanswered? The telephone will ring seven times (20 seconds) before routing to the mailbox. Enhancement features have been developed, whereby customers can be able to program the ringing time according to their needs (See the ringing times under Enhanced Features).

Examining the websites where authorship is explicitly shown, there are also a number of authors who are likely to be native speakers of English rather than Black South Africans, as exemplified by the following:

(9) Well, you should definitely make sure that your brain is still capable of responding into any activity that concerns utilizing it. There are various ways where you can workout your brain so it can be able to adjust readily to different conditions in the environment. You’ll actually find games or exercises that can contribute to this intellectual workout either if you would like to be comfortably sitting on a chair or working or you might as well go in front of your PC. (Author indicated as Vlad Stivenson)

(10) Offering free graphics, banners, templates, etc. If you have the skill and talent for web design, then you can be able to make graphics, templates, banners etc., upload all to your website and allow your visitors to pass on your fonts, graphics, templates, banners etc., for free, of course. Simply display your ad onto your designs or oblige recipients that they link directly to your site. Also be certain that you incorporate a link back to your website in your copyright notice and oblige your recipients to hold intact your copyright notice. (Author indicated as Jeff Phelps)
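The per-million-word rates quoted earlier in this section for the TLE corpus and for De Klerk’s two corpora can be recomputed from the raw counts and corpus sizes given in the text. The following is only a minimal sketch of that normalization; the function name and the short corpus labels are illustrative choices and do not come from the studies cited.

```python
def per_million(hits: int, corpus_words: int) -> float:
    """Normalize a raw frequency to occurrences per million words."""
    return hits / corpus_words * 1_000_000


# Raw counts and corpus sizes as reported in the text above.
# The labels are shorthand introduced here for illustration.
corpora = {
    "TLE (learner essays)": (15, 200_897),
    "Xhosa-English conversation (re-analysed)": (23, 550_754),
    "Xhosa-English lectures": (17, 57_870),
}

for name, (hits, size) in corpora.items():
    rate = per_million(hits, size)
    print(f"{name}: {hits} hits in {size:,} words = {rate:.0f} per million")
```

Normalizing in this way is what allows corpora of very different sizes to be compared; it also makes visible that the small lecture corpus shows by far the highest rate of the three.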


From the systematic and subtle meaning expressed by the form “can be able to” in corpora of Black South African English, and its increased acceptance in speaker judgements, supported by its very widespread attestation in South African websites, including ones containing carefully edited texts and native speaker texts, one can (be able to) confidently infer that this form has become a conventionalized innovation in the last decade. Its acceptance seems to go beyond the ethnolectal boundaries of Black South African English to other forms of SAE as well.7 “Can be able to” may well have started out as an error, some kind of overextension, or relaxation of a constraint in Standard English, but has clearly changed its status to that of an acceptable new form in Black South African English, and from there radiating into other sub-varieties of South African English. At the same time, one has to acknowledge that this particular extension (or resuscitation) of the permissible modal combinations with “be able to” fills a gap in the system of English grammar that is obviously there. Extension through analogy seems to be at work here.

3.3 Case Study 3: “Enable + bare infinitive verb” in East African English

Innovations in complementation patterns of verbs have been the topic of investigation of a number of recent studies (e.g. Mukherjee & Hoffmann, 2006; see also Schneider 2007 for a detailed overview of different varieties). In this section, a qualitative difference in East African English is examined: the use of bare infinitive complements with the causative verb “enable”, with data taken from Van Rooy & Terblanche (2009). While “enable + bare infinitive” is regarded as an innovation, this section also identifies data where the combination of the bare infinitive with other verbs should be regarded as performance errors, which have not yet been conventionalized and may perhaps never be. In Standard varieties of English, causative verbs generally select either to-infinitive complement clauses, as in (11), or bare infinitive complements, as in (12). There are a number of exceptions, such as “help” in (13) and (14), which selects both complement types.

(11) To-infinitive: He said this would enable the cooperatives to attract more deposits (ICE-IND, S2B-019)
(12) Bare infinitive: But I was determined not to let him slip through my fingers and go home penniless. (ICE-EA, W2F-035)

7. Anecdotal evidence supports this: I work in an environment where most managers attending university meetings are either Afrikaans or Sesotho speaking, and most of the meetings are held in English. Speakers from both language backgrounds, including very highly ranked officials at the university, avail themselves of this form during meetings.


(13) And then we can also use the same feedback to help them to produce those kind of pitch changes in their speech (ICE-GB:S2A-056 #109:1:A)
(14) You ‘re bound to make mistakes early on and the instructor is there to help you put them right (ICE-GB:S2A-054 #102:2:A)

There seems to be a systematic pattern involved in the choice between to-infinitives (13) and bare infinitives (14), as explained by Mair (1995) and Egan (2008). However, whereas variation is not attested for the verb “enable” in Standard English, it can be found in East African English. Buregeya (2006) examines the acceptability of a number of putative East African features by means of a questionnaire, and finds that structures like “enable them improve” are acceptable to more than half of his participants. In the corpus-based analysis of, among other corpora, ICE East Africa, Van Rooy & Terblanche (2009) find that the complementation pattern for “enable” in spoken East African English is very similar to that of “help”, and clearly different from other verbs that are typically used only with to-infinitives, such as “require” and “cause”. The basic differences between the verbs are shown in Figure 1. A number of potential errors are also observed, where the verbs “allow” and “force” select bare infinitive complements in a very small number of cases, in spoken and/or written texts, as exemplified by (15) and (16).

[Figure 1: bar chart. Vertical axis: complementation pattern (%), 0 to 100. Series: bare infinitive vs. to-infinitive. Horizontal axis: require, cause, allow, force, enable, help, let.]

Figure 1. Infinitive complements for selected verbs in the spoken ICE-EA


(15) He implored with the Minister for Lands to allow him remain in the area, or be re-allocated a portion of the land. (ICE-EA, W2D-005)
(16) I did not force accused make statement (ICE-EA, S1BCE-006K)

The frequency of such exceptions is not of the same magnitude as the selection of bare infinitives with “enable” and “help”. As is clear from Figure 1, these two verbs show a similar distribution. Statistical confirmation for this clustering is obtained from Correspondence Analysis, which groups the data into the three expected groups, with χ² = 436 (df = 6, p < 0.001). Thus, “help” and “enable” pattern together. Examples of “enable” with bare infinitive complements, from spoken and written data, are:

(17) All that money enabled me start this business of shop (ICE-EA, S2B-073K)
(18) support to the people in the form of advice materials loans and designing to enable them build better and permanent (ICE-EA, S2B-027T)
(19) The project is however at its early stages and is awaiting funds to enable it operate efficiently. (ICE-EA, W2B-030K)
(20) The meeting had been called to “review points” which would enable the directors issue title deeds as soon as possible. (ICE-EA, W2C-026K)

Based on the more widespread attestation of the “enable + bare infinitive” construction, and the evidence in favour of its acceptance among a majority of East African users surveyed by Buregeya (2006), one is led to conclude that this feature is a conventionalized innovation in this variety, and not an error. By contrast, the extension of the bare infinitive complement to the verbs “allow” and “force”, which still cluster with the exception-free verbs “require” and “cause”, clearly shows that the few rare occurrences of those bare infinitive complements are performance errors in terms of the conventions of East African English itself.
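The reported df = 6 is what one would expect if the test was run on a table of seven verbs by two complement types, since (7 − 1) × (2 − 1) = 6; that reading is an assumption here, as the underlying table is not reproduced in the text. The sketch below shows how percentages of the kind plotted in Figure 1 and a Pearson chi-square statistic could be derived from such a table. The counts are invented purely for illustration and are not the ICE-EA figures, so the resulting statistic will not match the reported value.

```python
# HYPOTHETICAL counts, invented for illustration only. They are NOT the
# ICE-EA data behind Figure 1 or the reported chi-square of 436.
# Format: verb -> (bare infinitive, to-infinitive)
counts = {
    "require": (0, 120),
    "cause": (0, 80),
    "allow": (3, 150),
    "force": (2, 60),
    "enable": (45, 55),
    "help": (60, 50),
    "let": (90, 0),
}


def bare_infinitive_share(table):
    """Percentage of bare infinitive complements per verb."""
    return {verb: 100 * bare / (bare + to) for verb, (bare, to) in table.items()}


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    rows = list(table.values())
    n_cols = len(rows[0])
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(row[j] for row in rows) for j in range(n_cols)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for j, observed in enumerate(row):
            # Expected count under independence of verb and complement type.
            expected = row_total * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (n_cols - 1)
    return statistic, df


shares = bare_infinitive_share(counts)
statistic, df = pearson_chi_square(counts)

for verb, share in shares.items():
    print(f"{verb:8s} {share:5.1f}% bare infinitive")
print(f"chi-square = {statistic:.1f}, df = {df}")
```

With counts of this shape, “allow” and “force” stay within a few per cent of the to-infinitive-only verbs, while “enable” and “help” sit in the middle of the scale, which is the kind of three-way grouping the text describes.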
All these possibilities are supported by the psycholinguistic process of extension by analogy, but only one of the matrix verbs, “enable”, has conventionalized the use of bare infinitive complements. The difference between mere innovation of new forms and the conventionalization of a specific innovation is clearly shown by this contrast. As is the case with the two features of Black South African English analyzed earlier, we also find evidence for a feature that may potentially have originated as an error in the acquisition of East African English, but changed its status to innovation through frequency of use and a certain degree of acceptability. As was the case with “can be able to” in the previous section, the extension of the complementation patterns of “enable” also makes use of an existing complementation schema in the grammar of causative verbs, and extends it to a new verb (Mukherjee & Hoffmann 2006: 166–167 also draw attention to the role of analogy in supporting the extension of existing patterns to new, but adjacent, grammatical territory). However, the East African example takes us one step further in showing for the same construction how its combination with other verbs should still be regarded as non-conventionalized, thus “errors”.

4. Conclusion

New Varieties share the propensity for errors with Learner Varieties, due to similar psycholinguistic processes involved in second language acquisition. In the case of New Varieties, however, these errors, with other kinds of innovations, fill a linguistic feature pool with additional variants, from which some are (mostly non-deliberately) selected by the community to attain conventional status among the speakers. In the case of foreign language varieties, due to the contexts of acquisition and use, there is much less opportunity for new forms to be conventionalized, hence Kachru’s designation of these varieties as performance varieties. However, should the Lingua Franca uses of English, perhaps especially in Continental Europe, increase, we may well head toward a scenario in future where new stable features emerge in these varieties, too. English plays an extensive role in societies where New Englishes develop. It is widely used outside the home. In these contexts, though, the norm-setting segment of the population is not upper-class native speakers, but educated second language speakers of the New Variety itself, from an indigenous rather than settler background.
In terms of Schneider’s (2007) model, the outcome of the process of endonormative stabilization (stage 4) is likely to yield more divergence from the input variety if the indigenous, non-native speaking population (IDG-strand) is the majority of the population, as in East Africa, whereas the divergence is likely to be less in the case of a country like New Zealand, where the native speaker community (STL-strand) outnumbers the indigenous population.8 Through frequent use of English in recurring situations, certain variants in the feature pool are selected above those conventionalized in Standard English(es), leading to the emergence of the New Varieties. In this chapter, it was shown how the extended use of the progressive in BSAE has given birth to a new constructional semantics, where a stable resource with different functions than in Standard English has become part of the conventionalized linguistic schemas of the speech community.

8. South Africa is very much in a transitional stage at present, where the presumed stabilization of a (largely STL-based) local variety is coming under pressure from the emergent IDG-based norms emanating from the black community (see van Rooy 2010 and Bekker & Adendorff, forthcoming, for details).

Van Rooy (2006) compares the New English innovations to native and foreign language learner data, and shows that BSAE is different from the other two, while the other two are quite similar. The German learners in his study show similar uses of the progressive as native speakers, using the same constructional prototype, but use the construction somewhat less and in fewer contexts than native speakers. German learners approximate native norms, but BSAE students present data that must be attributed to a different constructional prototype. Further support for this view is presented by Hundt & Vogel (this volume). “Can be able to” is likewise shown to function as a systematic resource in BSAE. It may also have its origin in analogy or the overextension of the Standard English collocational possibilities, but the mere fact that it used to be regarded as peculiar to BSAE and a potentially erroneous form because of that did not prevent it from acquiring conventionalized status. However, this process is also attested in the history of Standard English, which lost its grammatical gender after contact with speakers of Old Norse. Initially, speakers failed to observe the grammatical gender markings of Old English, but gradually the simplified system became conventionalized and replaced the old gender system (McWhorter 2007: 67–69). The clearest evidence for the difference between mere error and conventionalized innovation was seen in the analysis of bare infinitives with the verb “enable”, an innovation, and “allow” and “force”, where the combination is still an error. The same psycholinguistic learning process may underlie the origin of the combination with all three verbs, but one of them has changed its status to become conventionalized, while the other two remain errors for now, and perhaps forever.
New Varieties are different from Learner Varieties because the extended opportunities for use give rise to a higher probability that new features may become entrenched in the communicative repertoires of those communities. While I do not deny their psycholinguistic correspondences, the social context of use is likely to create opportunities for the conventionalization of innovations in New Varieties more often than in the context of Foreign Language Englishes, where fewer recurrent situations and contacts between speakers occur to support conventionalization. It is therefore also more likely, as is shown by the use of “can be able to” in South African English, that features of New Varieties of English may sometimes spill over into native varieties. This is in any case a consequence that Schneider (2007) anticipates for post-colonial contexts by the time English in such countries reaches stage 4, endonormative stabilization.


References

Bamgbose, A. 1998. Torn between the norms: Innovations in world Englishes. World Englishes 17(1): 1–14.
Buregeya, A. 2006. Grammatical features of Kenyan English and the extent of their acceptability. English World-Wide 27: 199–216.
Bekker, I. & Adendorff, R. forthcoming. The history of SAE. In Handbook of English as a Foreign Language, K. de Bot, K. Schröder & D. Wolff (eds). Berlin: Mouton de Gruyter.
Coetzee-Van Rooy, S. 2006. Integrativeness: Untenable for world Englishes learners? World Englishes 25: 437–450.
Croft, W. 2000. Explaining Language Change: An Evolutionary Approach. Harlow: Longman.
Crystal, D. 2008. On ‘can be able to’. (19 January 2010).
De Klerk, V. 1999. Black South African English: Where to from here? World Englishes 18: 311–324.
De Klerk, V. 2003. Towards a norm in South African Englishes: The case for Xhosa English. World Englishes 22: 463–481.
Egan, T. 2008. Non-finite Complementation: A Usage-based Study of Infinitive and -ing Clauses in English. Amsterdam: Rodopi.
Ellis, R. 1994. The Study of Second Language Acquisition. Oxford: OUP.
Gough, D. 1996. Black English in South Africa. In Focus on South Africa [Varieties of English around the World 15], V. de Klerk (ed.), 53–77. Amsterdam: John Benjamins.
Kachru, B.B. 1991. Liberation linguistics and the Quirk concern. English Today 25(7/1): 3–13.
Kortmann, B., Burridge, K., Mesthrie, R., Schneider, E. & Upton, C. (eds). 2004. Handbook of Varieties of English, Vol 2: Morphology and Syntax. Berlin: Mouton.
Krishnaswamy, M. & Burde, A.S. 1998. The politics of Indians’ English: Linguistic Colonialism and the Expanding English Empire. New Delhi: OUP.
Mair, C. 1995. Changing patterns of complementation, and concomitant grammaticalisation, of the verb help in present-day British English. In The Verb in Contemporary English: Theory and Description, B. Aarts & C.F. Meyer (eds), 258–272. Cambridge: CUP.
McWhorter, J. 2007. Language Interrupted: Signs of Non-native Acquisition in Standard Language Grammars. Oxford: OUP.
Mesthrie, R. 1992. English in Language Shift: The History, Structure and Sociolinguistics of South African Indian English. Cambridge: CUP.
Mesthrie, R. 1999. The study of new varieties of English. Inaugural lecture, University of Cape Town.
Mesthrie, R. 2008a. Black South African English: Morphology and syntax. In Varieties of English 4: Africa, South and South East Asia, R. Mesthrie (ed.), 488–500. Berlin: Mouton de Gruyter.
Mesthrie, R. 2008b. Synopsis: Morphological and syntactic variation in Africa and South and Southeast Asia. In Varieties of English 4: Africa, South and South East Asia, R. Mesthrie (ed.), 624–625. Berlin: Mouton de Gruyter.
Mufwene, S.S. 2001. The Ecology of Language Evolution. Cambridge: CUP.
Mukherjee, J. & Hoffmann, S. 2006. Describing verb-complementational profiles of New Englishes: A pilot study of Indian English. English World-Wide 27: 147–173.
Paradis, M. 2004. A Neurolinguistic Theory of Bilingualism [Studies in Bilingualism 18]. Amsterdam: John Benjamins.
Petzold, R. 2002. Toward a pedagogical model for ELT. World Englishes 23: 422–426.
Quirk, R. 1990. Language varieties and standard language. English Today 21(6/1): 3–10.
Schneider, E. 2003. The dynamics of New Englishes: From identity construction to dialect birth. Language 79: 233–281.
Schneider, E. 2007. Post-Colonial Englishes. Cambridge: CUP.
Selinker, L. 1972. Interlanguage. International Review of Applied Linguistics 10: 209–231.
Szmrecsanyi, B. & Kortmann, B. 2009. Between simplification and complexification: Non-standard varieties of English around the world. In Language Complexity as an Evolving Variable, G. Sampson, D. Gil & P. Trudgill (eds), 64–79. Oxford: OUP.
Titlestad, P. 1996. English, the Constitution and South Africa’s language future. In Focus on South Africa [Varieties of English around the World 15], V. de Klerk (ed.), 163–173. Amsterdam: John Benjamins.
Van der Walt, J.L. & van Rooy, B. 2002. Towards a norm in South African Englishes. World Englishes 21: 113–128.
Van Rooy, B. 2005. Expressions of modality in Black South African English. In Proceedings from Corpus Linguistics 2005.
Van Rooy, B. 2006. The extension of the progressive aspect in Black South African English. World Englishes 25(1): 37–64.
Van Rooy, B. 2010. Societal and linguistic perspectives on variability in World Englishes. World Englishes 31(1): 3–20.
Van Rooy, B. & Terblanche, L. 2009. Complementation patterns in causative verbs across varieties of English. Paper presented at Corpus Linguistics 2009, Liverpool.

Discussion forum: New Englishes and Learner Englishes – quo vadis?

Marianne Hundt and Joybrato Mukherjee
University of Zurich and Justus Liebig University, Giessen

During the workshop at the ISLE conference in Freiburg, the participants agreed to engage in an online discussion on how to bridge the paradigm gap in researching ESL varieties and EFL variants of English. The most productive strand of this discussion concerned the modelling of different Englishes, but some participants also took up the threads initiated by the editors who had selected quotations from the articles in this volume as a possible starting point for discussion. The following is a summary of the main points of the discussion.1 Some of the threads initiated by the editors are woven into the discussion of the Kachruvian three-circles model in Section 1. Section 2 focuses on the terminological problem of categorizing and labelling features found in New Englishes and Learner Englishes. In Section 3, different developmental trajectories are discussed as the main reason for the ESL-EFL dichotomy. Section 4 addresses issues of corpus methodology and the role of frequency in the description of ESL varieties and EFL variants. The summary of the discussion brings the different contributions together but does not aim at providing answers to questions that can be addressed from different angles. These parallel (and maybe sometimes contradictory?) strands in the discussion might inspire future directions of research.

1. Modelling Englishes in the world

One of the lead questions sent out by the editors of the volume touched on the distinction of English as a native, second and foreign language. An obvious way of approaching the distinction was to refer to Kachru’s (1986, ²1992) concentric-circles model and discuss the usefulness of the distinction. Bertus van Rooy points out that “[w]hen the three concentric circles were proposed, the basis for classification was countries and the socio-political circumstances, rather than linguistic correspondences”. Kachru’s model was primarily concerned with ‘ownership’ of English and norms, building on the different social histories of spread and the current functional profiles of English in various countries. His model has had an impact on the perception of the New Englishes that evolved in the outer circle, but it seems to have been largely disregarded in its potential relevance for the expanding circle, as Fraser Gupta (2006: 95f.) observes:

Having spent most of my adult life in Asia, and having been involved in these issues in the Outer Circle [question of ‘norms’ and ownership of English], I was surprised to find, on my return to Europe in 1996, that the insights from the Outer Circle had not been fed into the Expanding Circle, despite their having been raised by Kachru (1985) over ten years earlier. In the Expanding Circle English is predominantly a non-native language, used in very restricted domains (typically with foreigners), and learnt in scholastic settings. The teaching of English in mainland Europe is dominated by a monolithic model, usually based on Standard British English and RP, which may involve favouring ‘native speaker’ teachers, requiring teachers to adhere to an out-of-date and highly abstracted sense of what is correct, and penalizing students to use the ‘correct’ accent, typically the Daniel Jones variant of RP which is nowadays little heard.

1. We would like to thank Gerold Schneider for help with setting up the discussion platform and Carolin Biewer, Sarah Buschfeld, Sandra Götz, Sylviane Granger, Ulrike Gut, Marco Schilk, Benedikt Szmrecsanyi and Bertus Van Rooy for participating in the discussion. Whenever we quote verbatim from the forum, the quotation is followed by the contributor’s full name.

Apart from applying the labels ENL, ESL and EFL to speaker communities, i.e. to particular countries, they can also be used to distinguish different types of (a) individual speakers and (b) varieties of English and their structural properties.2

1.1 ENL, ESL and EFL countries?

In some postcolonial societies, speakers of ENL, ESL and EFL live alongside each other (Carolin Biewer, Marco Schilk, Sarah Bongartz, Bertus Van Rooy), a fact that Kachru’s model does not deny but abstracts away from as it assigns a country to a particular circle on the basis of the majority of its speakers. Obviously, linguistic realities, especially in multilingual settings, may be quite complex, as in the case of South Africa, a country that was not assigned to a particular circle by Kachru himself, probably because its status as ESL or EFL was far less than clear:

2. Individual speakers will orient towards different norms, and this norm orientation, in turn, will have an influence on the structural properties of the variety/variant that the speaker uses (see also the remark at the end of Section 1.2).


Kachru did not include South Africa in his early statements, and this is one of the more interesting examples, because there (a) is a native speaker community of about 10% of the population (although just under half of them are of Indian descent, and shifted to English in the last half century – not typical Inner Circle material!), (b) an African majority population (about 75%), who generally speak a Bantu language at home, and make extensive use of English for educational, economic and government functions. Besides them, there is an Afrikaans community (an Adstrate for Schneider 2007), that is largely in the same position as the African community, except for enclaves (in some rural parts of the country); their linguistic experience may be much more akin to the EFL context – they enjoy access to education, entertainment and economic opportunities in Afrikaans and have little use for English, except if they choose to watch certain foreign films. (Bertus van Rooy)

From the perspective of individual speakers or groups of speakers within a country, van Rooy therefore finds Schneider’s (2007) model more helpful because it considers the (changing) sociolinguistic relationship between speaker groups and the different acquisition histories within a community rather than grouping them all together into either an undifferentiated ENL, ESL or EFL community.

1.2 ENL, ESL and EFL speakers

With respect to the acquisition histories, Bertus van Rooy and Ulrike Gut (see Section 1.4) point out that there are language-acquisitional characteristics that are shared by ESL speakers and EFL speakers. Bertus van Rooy sees some overlap in particular in the individual, psycholinguistic experience, i.e. the fact that acquiring an additional language “is a more conscious and often a more laborious process than native language acquisition that comes ‘for free’ in the Inner Circle experience. Similar linguistic phenomena may have their basis in similarities across individual learner experiences”. But he also points out that in some countries (and for individual learners) the acquisition of English comes closer to that of a first language because learners are frequently exposed to the additional language before entering school and because there is extensive code-switching and possibly some form of diglossia at home. As regards ESL and ENL speakers, Bertus van Rooy also sees similarities in the social experience of English which is connected to the issue of national ownership and an awareness and acceptance of a local norm in countries that have reached stage four of Schneider’s evolutionary model of Englishes. Different regional norms for speakers will thus result in a divergence of the English spoken in the inner and outer circle, whereas orientation towards the ENL model on the part of the EFL speakers is likely to lead to structural similarities between ENL and EFL usage (note how this obviously links the

speaker-based definition of the terms at hand to the resulting variants/varieties or usage patterns). The orientation of usage norms in the expanding circle towards ENL varieties is also emphasized by Marco Schilk, and there is further empirical backing in the research by Szmrecsanyi & Kortmann (this volume) and Coetzee-van Rooy (2006).

1.3 Structural properties of ENL, ESL and EFL: Discrete variety types or continuum?

The studies in this volume investigate language use in ENL, ESL and EFL contexts. One concern, therefore, is with the description of structural properties of these varieties.3 Bongartz & Buschfeld (this volume) did not find a “clear and static cutoff point between second-language varieties and learner Englishes [...]” in their case study on English in Cyprus. Similarly, Gilquin & Granger (this volume) argue for a continuum rather than a clear-cut dichotomy of ESL vs. EFL varieties, a plea emerging from various corpus-based studies. Szmrecsanyi & Kortmann, on the other hand, found “a clear [typological] difference between ESL varieties (as sampled in ICE) and EFL varieties (as sampled in ICLE): the former are demonstrably closer to native varieties of English than the latter” (Benedikt Szmrecsanyi). There are five features that discriminate particularly well between ESL and EFL varieties, one of them typical of the former (inflected verbs) and four of the latter (see Szmrecsanyi & Kortmann, this volume). Van Rooy (this volume) points out why innovations in ESL varieties are more likely to become entrenched, namely because “recurrent situations and contacts between speakers occur to support conventionalization”. As an example of this, Marco Schilk mentions the stabilization of new patterns of usage (e.g. new verb complementation patterns) through their recurrent use in prestigious local news media which “may serve as a linguistic yardstick for speakers of a variety”. There are thus several factors that may lead to a divergence of ENL and ESL varieties, especially in the area of lexico-grammar. At the same time, there are structural features that are similar across some EFL samples in the International Corpus of Learner English and varieties of ENL (e.g. ICLE-Czech, see Section 1.2) whereas other samples of EFL usage clearly diverge from the ENL model (e.g. ICLE-Spanish).4 Corpus-linguistic methodology allows us to chart the variance in the data, but this variance needs to be accounted for:

As corpus linguists, our next challenge, perhaps a little outside our comfort zones, will be to identify some of these external forces pulling varieties in different directions, and substantiate our claims with at least some meaningful correlations to the observed variance in the corpus data. (Bertus Van Rooy)

3. Whether the use of English as a foreign language results in a ‘variety’ of English or not is discussed in the individual contributions in this volume. The label ‘variety’ can, thus, be used in different ways: either as a configuration of usage patterns or as a full-fledged and stabilized variety of English. With respect to corpora of English as a learner language, it is used to refer to patterns of usage rather than a stabilized variety.

Some studies in this volume provide corpus-linguistic evidence of a structurally clear-cut distinction between ENL and ESL which is possibly motivated by sociolinguistic and psycholinguistic factors. However, the degree to which a clear distinction between types of Englishes and individual varieties is possible may depend on the descriptive approach. If based on a bundle of features, varieties may well be grouped together as predicted by the Kachruvian model (as applied to varieties rather than countries). If an individual feature is studied, however, varieties may cluster in unexpected ways, as shown, for example, in the study by Hundt & Vogel (this volume): “Systematic corpus linguistic investigation and comparison of outer-circle and expanding-circle usage with ENL data are likely to reshape our perception because they bring to the fore the similarities and tone down the ‘exotic’ aspects of ESL and EFL usage [...]”. To this, Benedikt Szmrecsanyi replied that corpus-linguistic evidence seemed, on the whole, to provide more evidence of divergence rather than convergence, even if only the non-exotic features were compared across varieties. Carolin Biewer added an important methodological caveat, namely that for corpus-based comparisons to be valid, the corpora used as a basis for general comparative claims need to be truly comparable, and that comparability may not even be given in seemingly ‘comparable’ corpora. Different compilers of ICE corpora, for example, may interpret text-types differently, and this might affect the results we will obtain from the data. Similarly, if a text category such as social letters were sampled from one part of the population only (e.g. students at a university), this is also likely to affect data comparability. If we increase the level of granularity and home in on the use of English in a particular country, it may turn out that the degree of variation within the country defies easy labelling of a variety as either ESL or EFL. 
Sarah Buschfeld and Christiane Bongartz point to their way of modelling variation within Cyprus in terms of a Variety Spectrum:

We therefore developed the concept of a Variety Spectrum that depicts (individual) feature occurrence in form of a scatter plot and thus shows the degree of homogeneity/heterogeneity within a speech community. It can therefore account for hybrid cases like Cyprus English, since it considers the influence of sociolinguistic variables and depicts variety-internal variation.

4. Note that these differences might simply reflect different degrees of proficiency in the speakers/writers sampled in the corpora rather than fundamental levels of structural difference between EFL usage in the Czech Republic and Spain.

Marianne Hundt and Joybrato Mukherjee

1.4

Labelling and the paradigm gap

The terminology inherent in the Kachruvian model (whether applied to countries, speakers or varieties) might partly account for the paradigm gap between research into New Englishes and Learner Englishes. Researchers working within second-language acquisition (SLA) find the term ‘L2 variety’ for New Englishes misleading and would prefer their colleagues to use the label ‘indigenized L2 varieties’ (Benedikt Szmrecsanyi). A label that takes the sociolinguistic rather than the structural peculiarities of New Englishes into account would be ‘institutionalized L2 varieties’ (Marianne Hundt). But the term ‘second language’ is also far from clearly defined in SLA research itself: even though the label is ‘second’ language acquisition, it actually includes situations in which learners acquire the target language as an additional language alongside a second, or third, etc. language and covers a range of different learner characteristics. The obvious conclusion is that “[i]t is high time that the similarities between the two fields are recognized so that a methodological and theoretical exchange can begin which will be to their mutual benefit” (Ulrike Gut).

2. The error-innovation cline

As mentioned above, an important language-political motivation for the development of Kachru’s model was the distinction between errors and innovations.5 Frequently, usage patterns that diverge from the native-speaker norm are referred to as errors or mistakes if they occur in a text produced by an EFL speaker, as in the following comment on a text produced by a German learner of English:

I am inclined to say that this text is written ‘in’ Standard English, with many mistakes or errors. [...] The non-standard verb forms (and the other non-standard features in the text) arise from the structures of Standard English, and the writer’s difficulties with those structures. The features are similar to the features found in many learners of English in Outer Circle locations too. (Gupta, fc.)

Gupta also writes that “[i]t would be good if we could think of some firm linguistic criteria to distinguish ‘learner varieties’ from contact varieties [ESL in her terminology] and from continuity varieties [her label for ENL]”. On purely linguistic grounds, however, we cannot distinguish between errors and innovations, and even the political and attitudinal distinction represents a cline rather than a dichotomy (see Gut, this volume). Bertus van Rooy argues that the identification of a deviation is based on a linguistic procedure. “To go one step further and label such differences as ‘errors’ is indeed extra-linguistic (unless, of course, what the speaker him/herself intended, did not realize – which would correspond to ‘mistake’ in the older error/mistake dichotomy of some 1970s error analysis exponents)” (ibid.). Since the labelling of a pattern that diverges from the ENL norm as ‘innovation’ or ‘error’ is not based on linguistic criteria, it is hardly surprising that the corpus-based approach taken in this volume fails to provide a solution to the problem.

5. Van Rooy mentions that a critique of Kachru’s model is based on the renaming of ‘errors’ or ‘deviations’ as ‘innovations’ (see Krishnaswamy & Burde 1998).

3. ESL and EFL: Developmental differences

Carolin Biewer (this volume) argues that “[a]ll ESL varieties [i.e. institutionalized and non-institutionalized ones] may develop in the same direction due to a common learning process but they will have different starting points and endpoints depending on norm orientation and different social value systems”. Bertus van Rooy sees a similarity in the developmental process “[...] because the individuals learning languages in different contexts have similar individual attributes – they all speak at least one other language already, which they may draw on, and they all acquire the new language bit by bit, and probably much more consciously, since they are aware of differences with their existing language(s)”. The different starting points lie in the different functions the language has in the community; different degrees of exposure to the language and different opportunities to use it outside the classroom add to the divergent development, alongside differences in norm-orientation.
Van Rooy further adds that “the individual aptitudes for language learning, strengthened or weakened by attitude differences, may also lead to further differences within each society [...]”. But he also maintains that differences in the starting point as well as process will result in “a persistent difference between the native-speaking and all other contexts”.

4. Corpus methodology and the role of frequency

Götz & Schilk (this volume) argue that the different contexts of acquisition and use result, among other things, in a lower frequency of collocations in the EFL learner data. Corpora are the prerequisite for making such statements. But frequency is a slippery issue. Referring to a previously published article (Granger 1998), Sylviane Granger points out that learners are not ‘virgin territory’ when it comes to the use of prefabricated language chunks as they tend to transfer phraseological units from their L1 to the target language but tend to underuse those of the target language. In other words, learner language may turn out to be less phraseological only if those chunks are investigated that ENL speakers would use. Furthermore, learners tend to use a limited set of prefabs (so-called phraseological ‘teddy bears’) – a finding that is confirmed in the case study by Götz & Schilk (this volume). Sylviane Granger, like others, attributes differences in the frequency and type of phraseological units that are used to differences in the input. A further area of study is the impact that a different learning environment might have on the phraseological profiles of different learners: “It would be great to compare the use of prefabricated language by French or Spanish students acquiring English in Great Britain or the USA vs. learning English in Belgium or Spain”. But Sandra Götz also points out that their study found that ESL speakers used 3-grams more frequently and with a greater variability in some contexts than ENL speakers; an aspect that, in her opinion, merits further research. Her final comment highlights the fact that corpus-based studies tend to abstract away from the individual learner because corpora are intended as representative samples of a speech community. She refers to a recent study on lexical diversity by Foster & Tavakoli (2009) that shows how EFL speakers “who live in the target language community, increase their performance in this area even to the extent that there are no more significant differences compared to ENL speakers”. Corpus-linguistic studies on ESL and EFL language use can therefore benefit from case studies on individual learners/speakers as well as other sources of data (e.g. psycholinguistic evidence).

5.
Looking ahead

The focus of the papers in the present volume is on the corpus-based description of ESL and EFL varieties/variants. These descriptions are the basis for the discussion of theoretical issues. The strength of the contributions in this collection lies in the wide range of varieties/variants that are studied as well as the breadth of features which are described. All the same, we have only seen the tip of the iceberg of what can be achieved with the corpus-based approach to second-language varieties of English. At the same time, it is obvious that the corpus-based approach also has its limitations. Future studies are likely to profit from a methodologically integrated approach that combines corpus-based description with sociolinguistic data on the one hand and psycholinguistic evidence on the other hand. Among the sociolinguistic methodologies that are likely to benefit the modelling of second-language varieties are attitudinal studies, which might allow us to distinguish between ‘features’ and ‘errors’, for instance. Variable rule analysis is likely to contribute to a better understanding of the degree of variability within varieties/variants. As far as psycholinguistic methodology is concerned, integrating the study of the underlying acquisitional processes with the description of structural properties of the resulting varieties/variants leaves a wide scope for further research. The studies in this volume have therefore only started to close the gap in the study of ESL and EFL varieties/variants.

References

Coetzee-van Rooy, S. 2006. Integrativeness: Untenable for World Englishes learners? World Englishes 25(3): 437–50.
Foster, P. & Tavakoli, P. 2009. Native speakers and task performance: Comparing effects on complexity, fluency, and lexical diversity. Language Learning 59(4): 866–96.
Fraser Gupta, A. 2006. Standard English in the World. In English in the World: Global Rules, Global Roles, R. Rubdy & M. Saraceni (eds), 95–109. London: Continuum.
Fraser Gupta, A. Forthcoming. One World, One English.
Granger, S. 1998. Prefabricated patterns in advanced EFL writing: Collocations and formulae. In Phraseology: Theory, Analysis and Applications, A. P. Cowie (ed.), 145–160. Oxford: OUP.
Kachru, B.B. 1985. Institutionalized Second-Language Varieties. In The English Language Today, S. Greenbaum & B.B. Kachru (eds), 211–226. Oxford: Pergamon.
Kachru, B.B. 1986. The power and politics of English. World Englishes 5(2/3): 121–140.
Kachru, B.B. ²1992. Models for non-native Englishes. In The Other Tongue. English Across Cultures, B.B. Kachru (ed.), 48–74. Chicago IL: University of Illinois Press.
Krishnaswamy, N. & Burde, A.S. 1998. The Politics of Indians’ English: Linguistic Colonialism and the Expanding English Empire. New Delhi: OUP.
Schneider, E.W. 2007. Postcolonial English: Varieties around the World. Cambridge: CUP.

Bionotes

Carolin Biewer is a Post-doctoral Assistant and teaches English linguistics at the University of Zurich. Her research interests include corpus linguistics, variationist sociolinguistics and varieties of English. She is currently working on her postdoctoral thesis on South Pacific Englishes.

Christiane M. Bongartz is Professor of English Linguistics at the University of Cologne. She has published on topics such as second language acquisition, immersion and morphosyntactic theory.

Sarah Buschfeld is a Research Assistant and doctoral student in English Linguistics at the University of Cologne. Her research interests include varieties of English, second language acquisition, and morphosyntax.

Gaëtanelle Gilquin is a Research Associate with the Belgian National Fund for Scientific Research (FNRS) at the Université Catholique de Louvain (Belgium). She is interested in the link between Learner Englishes and World Englishes.

Sandra Götz is a Research Assistant and doctoral student in English linguistics at Justus Liebig University, Giessen and Macquarie University, Sydney. Her research interests include spoken English, non-native varieties of English and corpora in English language teaching.

Sylviane Granger is Professor of English Language and Linguistics and Director of the Centre for English Corpus Linguistics at the Université Catholique de Louvain (Belgium). Her research centres on the analysis of learner and bilingual corpora.

Ulrike Gut studied English, psychology and linguistics at the Universities of Mannheim and Cambridge (UK) and currently holds the Chair for Applied English Linguistics at the University of Augsburg.

Michaela Hilbert is a Research Assistant and Lecturer at the University of Bamberg, Germany. She is also the project coordinator of the Maltese component of the International Corpus of English (ICE).

Marianne Hundt is Professor of English Linguistics at Zurich University. Her research interests include late Modern English, metropolitan varieties of English, Asian Englishes, Fiji English, and the Indian Diaspora.

Bernd Kortmann is Full Professor of English Language and Linguistics at the University of Freiburg, Germany. His main research interest over the last decade has been the grammar of non-standard varieties of English around the world, especially from a typological perspective.

Joybrato Mukherjee holds the Chair of English Linguistics at Justus Liebig University, Giessen. His research interests include corpus linguistics, varieties of English, learner English and the lexis-grammar interface.

Marco Schilk is a Senior Lecturer for English linguistics at the University of Giessen. His research interests include World Englishes, morphosyntactic and syntactic variation and corpus linguistics.

Benedikt Szmrecsanyi is a Fellow at the Freiburg Institute for Advanced Studies (FRIAS). His research interests include dialectology and dialectometry, and the interface between variation studies and cross-linguistic typology.

Bertus van Rooy is Professor of English Linguistics at North-West University, South Africa. His interests include the linguistic features and variability of new varieties of English and how new features gain acceptability.

Katrin Vogel teaches English and History at Lessing-Gymnasium Karlsruhe, Germany. Her fields of interest are ESL and learner varieties of English, and – currently – pedagogical issues, such as using corpus results in the language classroom.

Index

A
academic writing 151, 154–155
acceptability 156, 191, 195, 197, 199, 202
analyticity 168–169, 171, 173, 175, 181–184
analyticity index 171–174, 179–181, 183
attitude 37, 40, 48–49, 81, 116

B
Black South African English (BSAE) (see also South African English) 19, 58, 75, 146, 195–198, 201–205

C
common core 87, 89
complexity 55, 183–184, 194
concentric-circles model 147, 210
contact variety 125, 149, 153, 161–162, 214
Contrastive Interlanguage Analysis (CIA) 58
conventionalization 189, 192–193, 195, 203, 205
Cook Islands English (CookE) 11, 18, 21–27
creativity 71–72
cross-linguistic influence (CLI) 103–105, 108, 111–119
Cyprus English (CyE) 35–37, 40–43, 45, 48–52, 214
Cyprus English Data Analysis and Research (CEDAR) 35–36, 40

D
do-periphrasis 134–135
dynamic model 50, 148, 194

E
East African English (EAE) 189, 201–203
English Language Complex (ELC) 89
error 12, 72, 102–105, 115, 117–118, 120–121, 189–191, 193–194, 204–205, 214–215

F
feature 8, 12, 14–15, 28, 41, 48, 102–103, 141, 152, 190, 192
Fiji English (FijE) 11, 18, 21–27, 126, 147–148, 154
formulaic language 83, 85, 135–136, 142
frequency 15, 17, 21–25, 45, 47, 56, 60–63, 65–69, 73–75, 83, 86–90, 92, 94–95, 97, 107–108, 112–113, 115, 125, 136–139, 141, 152–155, 158–161, 165, 171–172, 176–179, 199, 203, 215–216

G
Ghanaian English (GhanE) 21–27, 102

H
Hong Kong English (HKE) 11, 19, 113–114

I
imitation 133–135
Indian English (IndE) 19, 60, 79, 81, 92, 95, 97, 113–114, 116, 126–141, 145, 158, 178–179, 191
innovation 60, 72, 102–104, 113, 116–121, 128, 191–192, 205, 212, 214–215
interlanguage 12, 14, 37, 168, 190
International Corpus of English (ICE) 4, 20, 86, 130, 149, 150, 170–171, 174–175, 184, 212
International Corpus of Learner English (ICLE) 56–58, 63–70, 75, 115, 150–151, 165, 169, 173–176, 179–181, 183–184
into 56, 59–64, 66–74, 76
Irish English (IrE) 126–129, 133–134, 141, 147, 149, 153, 159

J
Jamaican English (JamE) 107, 118, 120

K
Kenyan English (KenE) 70, 72, 126, 148

L
learner corpus 58, 74
lexicogrammatical feature 42
local community 9, 19, 28
local norm 116, 120–121, 211
local variety 116
Louvain Corpus of Native English Essays (LOCNESS) 59, 73
Louvain International Database of Spoken English Interlanguage (LINDSEI) 86, 184

M
Malaysian English (MalE) 18, 148, 154
markedness theory 15–16, 27
modality 9, 15–16, 18–20
morphosyntactic feature 41, 43
morphosyntax 182
multiword unit 82, 85

N
New Zealand English (NZE) 21–27, 30, 147, 149, 155, 158–159, 162
n-gram 86
Nigerian English (NigE) 19, 102, 114–116
non-contact variety 149, 161
norm 2, 4, 10, 12–13, 20, 24, 28, 72, 80–81, 101–102, 119–120, 160–161, 175, 191, 194, 204, 210–212, 214–215
norm-orientation 102, 116, 120–121, 215

O
overuse 17, 65, 66, 107, 145–146, 158, 160, 175, 177–178
ownership 210–211

P
Philippine English (PhilE) 22–23, 25–27, 148
phrasal verb 65–66, 69–70, 75
preposition 56, 59–60, 64, 66, 72, 171–172
proficiency level 57, 70, 111, 116, 151, 155, 160
progressive 145–146, 151–162, 165, 195–197, 204–205

R
register interference 175, 182

S
Samoan English (SamE) 11, 18, 21–27
Second Language Acquisition (SLA) 2, 7–10, 12–20, 27–29, 55–56, 59–60, 76, 80–81, 84, 104–105, 110–111, 117–119, 128, 135, 141, 161–162, 168, 175, 182–184, 191–192, 204, 214
shortest path principle 14, 16–18, 23, 27–28
Singapore English (SingE) 11, 21–27, 107, 114–117, 128–135, 141, 148, 153
South African English (SAE) 19, 50, 58–59, 75, 146, 195–198, 201, 203, 205
stability 75, 189–190
stative verb 146, 152, 156–158, 160, 196–197
structural nativization 41, 45, 48, 50–52, 148, 191
substrate 19, 117–118, 128–129, 133–135, 142, 146, 153, 173, 179–183, 191
substrate effect (see also substrate) 169–170, 173, 179, 181–183
substrate influence (see also substrate) 25, 128–130, 134–135, 146, 149
substrate language (see also substrate) 117, 129, 142, 173, 180–182, 191
syntheticity 168–169, 171–175, 178–179, 181, 183–184

T
target norm 80–82
teddy bear principle 15–16
transfer 12–13, 15, 18–19, 28, 37, 41–44, 81, 102–107, 109, 112, 118–119, 128–129, 191, 216
transfer to somewhere principle 14, 16, 19, 28

U
underuse 60–61, 64–65, 69, 183, 216

V
variability 17–18, 23, 27, 88, 90, 93, 168, 173, 175–176, 216–217
variety spectrum 3, 45, 48, 52, 213
verb complementation 113
