Corpora and Discourse: The Challenges of Different Settings (Studies in Corpus Linguistics, Volume 31)


Author: Annelie Ädel (Editor) | Randi Reppen (Editor)


Corpora and Discourse

Studies in Corpus Linguistics (SCL) SCL focuses on the use of corpora throughout language study, the development of a quantitative approach to linguistics, the design and use of new tools for processing language texts, and the theoretical implications of a data-rich discipline.

General Editor
Elena Tognini-Bonelli, The Tuscan Word Center / The University of Siena

Consulting Editor
Wolfgang Teubert

Advisory Board
Michael Barlow, University of Auckland
Douglas Biber, Northern Arizona University
Marina Bondi, University of Modena and Reggio Emilia
Christopher S. Butler, University of Wales, Swansea
Sylviane Granger, University of Louvain
M.A.K. Halliday, University of Sydney
Yang Huizhong, Jiao Tong University, Shanghai
Susan Hunston, University of Birmingham
Stig Johansson, Oslo University
Graeme Kennedy, Victoria University of Wellington
Geoffrey N. Leech, University of Lancaster
Michaela Mahlberg, University of Liverpool
Anna Mauranen, University of Helsinki
Ute Römer, University of Hannover
Jan Svartvik, University of Lund
John M. Swales, University of Michigan
Martin Warren, The Hong Kong Polytechnic University

Volume 31 Corpora and Discourse. The challenges of different settings Edited by Annelie Ädel and Randi Reppen

Corpora and Discourse The challenges of different settings

Edited by

Annelie Ädel University of Michigan, USA

Randi Reppen Northern Arizona University, USA

John Benjamins Publishing Company Amsterdam / Philadelphia


The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

Library of Congress Cataloging-in-Publication Data
Corpora and discourse : the challenges of different settings / edited by Annelie Ädel and Randi Reppen.
p. cm. (Studies in Corpus Linguistics, ISSN 1388-0373 ; v. 31)
Includes bibliographical references and index.
ISBN 978 90 272 2305 0 (Hb; alk. paper)
1. Discourse analysis--Data processing. 2. Corpora (Linguistics) I. Ädel, Annelie. II. Reppen, Randi.
P302.3.C6683 2008
401'.410285--dc22
2008006978

© 2008 – John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA

Table of contents

1. The challenges of different settings: An overview
   Annelie Ädel and Randi Reppen

Section I Exploring discourse in academic settings

2. ‘...post-colonialism, multi-culturalism, structuralism, feminism, post-modernism and so on and so forth’: A comparative analysis of vague category markers in academic discourse
   Steve Walsh, Anne O’Keeffe and Michael McCarthy
3. Emphatics in academic discourse: Integrating corpus and discourse tools in the study of cross-disciplinary variation
   Marina Bondi
4. Interaction, identity and culture in academic writing: The case of German, British and American academics in the humanities
   Tamsin Sanderson

Section II Exploring discourse in workplace settings

5. “Got a date or something?”: An analysis of the role of humour and laughter in the workplace meetings of English language teachers
   Elaine Vaughan
6. Determining discourse-based moves in professional reports
   Lynne Flowerdew
7. //→ ONE country two SYStems //: The discourse intonation patterns of word associations
   Winnie Cheng and Martin Warren

Section III Exploring discourse in news and entertainment

8. Who’s speaking?: Evidentiality in US newspapers during the 2004 presidential campaign
   Gregory Garretson and Annelie Ädel
9. Television dialogue and natural conversation: Linguistic similarities and functional differences
   Paulo Quaglio
10. A corpus approach to discursive constructions of a hip-hop identity
   Kristy Beers Fägersten

Section IV Exploring discourse through specific linguistic features

11. The use of the it-cleft construction in 19th-century English
   Christine Johansson
12. Place and time adverbials in native and non-native English student writing
   William J. Crawford

Author index
Corpus and tools index
Subject index

JB[v.20020404] Prn:11/04/2008; 10:36

F: SCL3101.tex / p.1 (47-111)

The challenges of different settings: An overview

Annelie Ädel and Randi Reppen

Corpus-linguistic studies of discourse

Corpus linguistics has, over the past few decades, undergone a transformation from a “little donkey cart” to a “bandwagon” (Leech 1991: 25), and is now at a point at which it “is becoming part of mainstream linguistics” (Mukherjee 2004: 118). Mainstream linguistics, however, is very broad and multifaceted, and some subfields are more amenable to corpus-linguistic methodology than others. If we disregard some basic research issues, such as access to a suitable corpus that gives a reasonably representative sample of the population studied, there are certain generalizations we can make about the compatibility of corpus-based methods with the research questions posed in different linguistic subfields. For example, while lexicographers are often able to use corpus-assisted methods in answering their particular questions about language in relatively straightforward ways, discourse analysts – whether working with speech or writing – are likely to spend a great deal of time finding possible solutions for computerizing their methods. Discourse phenomena, with their frequent dependence on and sensitivity to context, co-text, and interpretation, require rather complex solutions and often a great deal of intervention on the part of the researcher.

Despite the potential difficulties of automatizing data retrieval and analysis, researchers interested in discourse have started to adopt corpus-linguistic methods more seriously over the past few years – a trend to which the current volume bears witness. This is, however, a very recent development. At the end of the twentieth century, Biber et al. (1998: 106) described the state of the art as follows: “although nearly all discourse studies are based on analysis of actual texts, they are not typically corpus-based investigations: most studies do not use quantitative methods to describe the extent to which different discourse structures are used, and relatively few of these studies aim to produce generalizable findings that hold across texts.” Two other textbooks on corpus linguistics published around the same time – McEnery & Wilson (1996) and Kennedy (1998) – both point to the comparatively marginal application of corpus-linguistic methods in discourse studies.

However, a couple of years into the new century a slightly different picture of the compatibility of computer-assisted methods with discourse-level phenomena was presented. Comparing the state of the art in 2002 to the early days of corpus linguistics, Conrad (2002: 86) gives a positive characterisation, stating that, “[a]s corpus linguistics first developed, it was often thought that it could not be applied to language phenomena that extended beyond clause boundaries. As the field has matured, it has instead become apparent that many studies within corpus linguistics address discourse-level concerns, many showing association patterns or the interactions of variables that would not be apparent without corpus-based techniques.”

At this point in time, we are happy to be able to say that things really are changing. For readers who wish to explore why this might be, Partington (2004) offers a summary of explanations (such as the widespread inclusion of text extracts rather than full texts in standard corpora) for the historically slight application of corpus-linguistic methods in studies of text and discourse. As a demonstration of recent shifts in this area, the present volume brings together researchers from diverse areas of text and discourse, all of whom demonstrate the viability of corpus-based research and corpus-assisted tools for discourse studies.

Finding discourse-relevant data

It is interesting to consider the search methods used by the different researchers in this volume to locate linguistic forms in a corpus – usually, in the case of discourse analysis, forms that are linked to a particular function. We believe that a description of commonly used retrieval methods can help others in reflecting on their own studies and the options available to them. Four main methods were used by the authors of these chapters, which we believe to be representative of the field.

The most typical search method can be called one-to-one searching, which involves investigating a linguistic form through a search term that only yields relevant hits. A good example of this is Crawford’s time and place adverbs here and now in Chapter 12, where there are no spurious hits, and the entire set that the researcher intends to examine is captured. To use more technical vocabulary, precision and recall are both at 100%. The ease of capturing relevant examples, however, does not necessarily mean that no more work remains for the researcher, who will often go on to examine the different discourse functions or semantic distinctions of the search term in question.

Other search methods, however, need to be used when there is not a simple one-to-one mapping between a search term and the body of relevant hits in a corpus. To mention just a couple of complicating factors familiar to all linguists, individual linguistic forms can be polysemous, while specific functions of language (such as politeness) can be realized by many different linguistic forms.

The second search method can be called sampling (Ädel 2003). It involves the use of one or more search terms that are good examples of the linguistic phenomenon in question. The drawback is that not all instances of the phenomenon, but only a subset, will be captured, although one advantage is that the search terms used tend to yield a high number of relevant hits. When using this method, the researcher cannot claim to have covered all bases or to have mapped out a linguistic function in its entirety, but many valuable insights can still be provided, especially if the search term is a good indicator of the phenomenon under study. Chapter 5 provides a good example of sampling, with Vaughan being able to draw interesting conclusions about the role of humour in the workplace based on occurrences of laughter. Vaughan uses occurrences of laughter, indicated in the transcriptions, as a “proxy” (cf. Garretson & O’Connor 2007: 89) for humour.

The third search method can be called sifting (Ädel 2003), since once the initial hits have been retrieved, they need to be sifted through – meaning that a certain proportion will be manually discarded. Using this method, the researcher often needs to put a great deal of time into checking the retrieved data (before the actual analysis can begin). The advantage of this method tends to be that, once the sifting has been done, the remaining set covers all or most of the potential forms of the linguistic phenomenon one is looking for. An example of this method is found in Chapter 9, where Quaglio uses an extensive inventory of linguistic forms that tend to be associated with face-to-face conversation. A small subset of these includes so and really used as informal intensifiers (but crucially, not anaphoric so and not really as a news recipient).
Although this is part of a multi-dimensional analysis (Biber 1988) that both finds and interprets the co-occurrence of a selection of linguistic features, some of the forms involved can still be said to be retrieved by sifting.

The fourth and final method can be called frequency-based listing. It involves the use of a frequency list (of individual words or collocations), specifically based on the corpus under investigation, as a starting point. Using such a list, the researcher goes on to select the relevant search terms that occur with high frequency. This way, the search terms will be tailor-made for the corpus and the particular discourse studied. It is an effective way of using corpus-assisted methods to spot persistent patterns in a specific dataset. A nice example of this method is found in Chapter 2, where Walsh, O’Keeffe & McCarthy are able to identify exactly which expressions of vagueness to focus on based on a frequency list of multi-word clusters. Having identified the relevant expressions, they can go on to concordance and analyze them.

Of course, we live in an increasingly hybridized world, and it would probably be foolish to expect to find only pure examples of each method. Two or more of these search methods are sometimes combined. The study by Garretson & Ädel reported in Chapter 8, for example, uses both sampling and sifting. Sampling is the overall method: by listing what they call “reporting words” (e.g. the verb lemma STATE, the noun statement, and the phrase according to), they attempt to capture instances of hearsay evidentiality in their data. Sifting is employed when individual words in the list are ambiguous or polysemous, as in the case of states – a highly frequent string in the US newspaper data. The analyst is required to retain examples like the association states that misconceptions continue to affect law and reject examples like two dozen states that allow early voting, either by manual elimination or through complex computational algorithms. Any automatic or semi-automatic corpus-based method is restricted to considering surface realizations (whether actual linguistic forms, or units identified by annotation) – and herein lies the challenge for studies of functional categories. The present volume offers many interesting examples of how this challenge can be met.
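
To make the interplay of sampling, sifting, precision and recall concrete, the following minimal Python sketch scores a surface-form search against a small hand-labelled sample. The first two lines of the sample are the examples quoted above; the third line, the set of relevant line numbers, and the helper names retrieve and precision_recall are assumptions made purely for illustration and are not taken from Chapter 8.

```python
import re

# Toy sample of corpus lines. Lines 0 and 1 are quoted in the text above;
# line 2 is invented for illustration.
corpus = [
    "the association states that misconceptions continue to affect law",
    "two dozen states that allow early voting",
    "according to the campaign, turnout was high",
]

# Hypothetical hand-made labels: lines that really express hearsay evidentiality.
relevant = {0, 2}


def retrieve(pattern, lines):
    """Return the indices of the lines matched by a surface-form search."""
    rx = re.compile(pattern, re.IGNORECASE)
    return {i for i, line in enumerate(lines) if rx.search(line)}


def precision_recall(hits, gold):
    """Precision: share of hits that are relevant. Recall: share of relevant lines hit."""
    true_positives = len(hits & gold)
    precision = true_positives / len(hits) if hits else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Sampling with a single search term: the string "states".
hits = retrieve(r"\bstates\b", corpus)
print(precision_recall(hits, relevant))

# Sifting: the analyst discards the spurious hit (the plural noun in line 1).
sifted = hits & relevant
print(precision_recall(sifted, relevant))

# Adding a second reporting expression widens coverage before sifting.
wider = retrieve(r"\b(?:states|according to)\b", corpus)
print(precision_recall(wider, relevant))
```

On this toy sample the bare search for states reaches a precision and a recall of 0.5. Sifting raises precision to 1.0 but cannot raise recall, which only improves once further search terms such as according to are added.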

Overview of the chapters

Rather than organizing the book according to the different methods researchers used for analyses, we chose as the main organizing principle the different contexts of language use. One of the main strengths of this book is its exploration of discourse in various settings, covering discourse in academia, in the workplace, and in news and entertainment. Thus, the four sections of the book primarily reflect the different settings of the discourses analyzed.

The theme of the first section is “Exploring discourse in academic settings”. The section begins with Walsh, O’Keeffe, and McCarthy taking a close look at the use of vague language in a range of speech events recorded at universities in the Republic of Ireland and Northern Ireland. The chapter brings to light some interesting uses of vague language and how the use of vagueness varies depending on the discourse context. The next two chapters focus on language in academic journals. First, Bondi examines stance and engagement as realized through keyword adverbs in a corpus of English-language journal articles in history and economics. A selection of the adverbials (significantly, undoubtedly and invariably) is studied more closely, from the perspective of collocation and patterns of semantic preference as well as pragmatic and textual functions. Next, Sanderson looks at journal articles drawn from five different disciplines in the humanities and written in German, American English and British English, focusing on the use of pronouns that mark interactivity between writer and reader. Various types of sociological information about the authors were encoded, which enabled her to check the relative influence of variables such as linguistic background, discipline, age, and gender.


The theme of the second section is “Exploring discourse in workplace settings”. This section examines language in the workplace, both the contexts of business and public reports, and the context of professional meetings. The section begins with Vaughan’s in-depth look at the roles humor plays in institutional interactions of teacher meetings. Using a corpus from two different settings of teacher meetings recorded in Mexico and in Ireland, Vaughan discovers interesting patterns in the use of laughter. The following two chapters explore a variety of aspects of the use of English in Hong Kong. Using a small, specialized corpus of professional reports, Flowerdew analyzes discourse moves, focusing particularly on problem-solution patterns. She also examines a couple of keywords (the lemmas problem and impact) and how they co-pattern with structural units in the texts. In the next chapter, Cheng and Warren end the second section with “a first attempt at examining the relationship between the phraseological characteristics of language and the communicative role of discourse intonation”. They present an innovative investigation of patterns of discourse intonation in frequent three- and four-word combinations based on a corpus of spoken English in Hong Kong.

The theme of the third section is “Exploring discourse in news and entertainment”. This section exhibits the greatest diversity of genres, including newspaper reports, a television series, and internet-based discussion boards on hip-hop. As diverse as the genres, so are the techniques used to examine discourse. In Chapter 8, Garretson and Ädel tackle the highly political issue of how hearsay evidentiality is reported in news articles related to the 2004 US presidential election. In a detailed look at how campaign language is reported and attributed, they lead the reader through unexpected insights into how different newspapers report the speech of different individual and collective entities.
The next chapter takes us from the serious world of reporting presidential campaigns to a popular American situation comedy, Friends. In Chapter 9, Quaglio provides a detailed linguistic investigation of Friends, comparing it to a large corpus of natural conversation. It is a data-driven investigation which combines multidimensional methodology with a frequency-based analysis of a large number of linguistic features associated with the typical characteristics of face-to-face conversation. Quaglio indicates how the language of this television show may prove to be a resource for ESL and EFL teachers. The section concludes by moving from the language of television to the internet postings of hip-hop fans. In Chapter 10, Beers Fägersten carefully examines how identity is constructed in the virtual environment of message board postings. She guides the reader through the linguistic construction of identity – through the use of specific openings and closings, slang and taboo terms, and “verbal art” – in this highly specialized use of language.

The theme of the fourth and final section is “Exploring discourse through specific linguistic features”. Johansson traces the uses of it-clefts diachronically. Using several corpora of diachronic and present-day English, she looks across several different registers to reveal how the use of it-clefts has changed over time. The greatest frequency and the greatest number of variations on the prototypical it-cleft pattern are found in manuscripts from trials, where the functions of identifying and clarifying are shown to be important, especially to verify the identification of a person, thing or place. In the final chapter, Crawford analyzes the time and place adverbs here, there, now and then in three corpora of learner writing in English and compares that with corpora of English speech and writing produced by native speakers. The adverbs are analysed quantitatively and qualitatively in order to test the hypothesis that the learner writers’ language use is closer to native-speaker speech than to native-speaker writing.

Although the investigations represented in this book are quite narrowly focused on English, the reader will learn a great deal about different varieties of English, for example diachronic, international, learner, and non-standard varieties. Not only does this volume offer a rich sample of the spoken and written discourse around the world that takes place in English – with the interesting exception of references to German in Chapter 4 – but it also offers a range of topics and methods. The different approaches to the use of corpora are as diverse as the topics investigated. It is our hope that this will encourage other researchers to continue to use corpora in new ways, addressing questions in ways that were previously difficult to imagine.

References

Ädel, A. 2003. The Use of Metadiscourse in Argumentative Writing by Advanced Learners and Native Speakers of English. PhD dissertation, University of Göteborg.
Biber, D. 1988. Variation across speech and writing. Cambridge: CUP.
Biber, D., Conrad, S. & Reppen, R. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: CUP.
Conrad, S. 2002. Corpus linguistic approaches for discourse analysis. Annual Review of Applied Linguistics 22: 75–95.
Garretson, G. & O’Connor, M. C. 2007. Between the humanist and the modernist: Semiautomated analysis of linguistic corpora. In Corpus Linguistics Beyond the Word: Corpus research from phrase to discourse, E. Fitzpatrick (ed.), Amsterdam: Rodopi.
Kennedy, G. 1998. An Introduction to Corpus Linguistics. London: Longman.
Leech, G. 1991. The state of the art in corpus linguistics. In English Corpus Linguistics. Studies in honour of Jan Svartvik, K. Aijmer & B. Altenberg (eds), 8–29. London: Longman.
McEnery, T. & Wilson, A. 1996. Corpus Linguistics. Edinburgh: EUP.
Mukherjee, J. 2004. The state of the art in corpus linguistics: Three book-length perspectives. English Language and Linguistics 8(1): 103–119.
Partington, A. 2004. Corpora and discourse, a most congruous beast. In Corpora and Discourse [Linguistic Insights: Studies in Language and Communication 9], A. Partington, J. Morley & L. Haarman (eds), 11–20. Frankfurt: Peter Lang.


Exploring discourse in academic settings


‘...post-colonialism, multi-culturalism, structuralism, feminism, post-modernism and so on and so forth’: A comparative analysis of vague category markers in academic discourse

Steve Walsh, Anne O’Keeffe and Michael McCarthy
Newcastle University, UK / Mary Immaculate College, University of Limerick, Ireland / University of Nottingham, UK

The use of vague language is one of the most common features of everyday spoken English. Speakers regularly use vague expressions to project shared knowledge (e.g., pens, books, and that sort of thing) as well as to make approximations (e.g. around sevenish; he’s sort of tall). Research shows that many of the most common single word items in a core vocabulary form part of vague language fixed expressions (e.g. thing in that kind of thing). This chapter will address the use of vague language in a new corpus of academic English, the Limerick-Belfast Corpus of Academic Spoken English (LIBEL CASE). The LIBEL corpus consists of one million words of spoken data collected in two universities on the island of Ireland, one in the Republic of Ireland and one in Northern Ireland. Analysis of the LIBEL corpus identified forms and functions of vague language in an academic context and these findings are compared with two corpora of everyday spoken language from the Republic of Ireland and the United Kingdom, the Limerick Corpus of Irish English (LCIE) and the Cambridge and Nottingham Corpus of Discourse in English (CANCODE). Cross-corpora comparison allowed us to look at how forms and frequencies of certain vague language expressions vary across casual and formal/institutional contexts. Within the academic data we build on Walsh’s work (see for example Walsh 2002, 2006) to show how vague language use is relative to mode of discourse at any given stage of classroom interaction. We suggest that these qualitative differences are a valuable means of understanding the complex relationship between language and learning.


.

Introduction: Vague categories

The use of vague language is one of the most common features of everyday spoken English. Speakers regularly use vague expressions to project shared knowledge (e.g., pens, books, and that sort of thing) as well as to make approximations (e.g. around sevenish; he’s sort of tall). Research shows that many of the most common single word items in a core vocabulary form part of vague language fixed expressions (e.g. thing in that kind of thing). Carter and McCarthy (2002), who looked at five million words of spoken British English data, show that vague language items are among the core vocabulary items (see also O’Keeffe et al. 2007). Multi-word units which mark vagueness, such as and things like that, that sort of thing, occurred with greater frequency than many single word items.

Degrees of variation exist in how vague language is defined. Channell (1994) restricts it to ‘purposefully and unabashedly vague’ uses of language while Franken (1997) distinguishes between ‘vagueness’ and ‘approximation’. Zhang (1998) makes a case for four separate categories: ‘fuzziness’, ‘generality’, ‘vagueness’ and ‘ambiguity’. Chafe (1982) puts vagueness and hedging in the same category of ‘fuzziness’ – all of which are seen as ‘involvement devices’ more prevalent in spoken than in written language. The notion of vagueness as an involvement device is consistent with the view that vague language is a core feature of the grammar of spoken language (Carter & McCarthy 1995, 2006; McCarthy & Carter 1995; O’Keeffe et al. 2007). As Carter and McCarthy (2006) note, vague language is a strong indicator of assumed shared knowledge which marks in-group membership insofar as the referents of vague expressions can be assumed to be known by the listener. This is consistent with Cutting (2000), who illustrates how discourse communities use vague language as a marker of in-group membership.
The interactive aspect of vague language is important to our focus in this chapter where we examine the use of vague language in the learning context of university discourse. In this domain, the use of vague language is part of meaning making within specific learning contexts or modes (see Walsh 2006: 111). We will focus on one type of vague language, namely vague category markers (hereafter VCMs). These non-lexicalised categories are created within interactions, at the moment of speaking. The categories contain exemplars followed by a vagueness tag (and so on, and that kind of thing, et cetera, and things like that) and the listener(s) is/are expected and assumed to fill in, or implicitly understand the reference. The example in Extract 1 is taken from a drama lecture in the Limerick-Belfast Corpus of Academic Spoken English (LIBEL CASE, hereafter shortened to LIBEL; see details in Sections 3 and 4):


Extract 1
. . . And I suppose my understanding of critical theory and critical aah critical studies I suppose as such emanate from or are the key social critiques of our time which have emanated from the work of the Frankfurt School. So at the moment it’s you know ahh critical theory is dominated by ideas of post-colonialism multi-culturalism structuralism feminism post-modernism and so on and so forth.

Here the exemplars are post-colonialism, multi-culturalism, structuralism, feminism, post-modernism, and the tag which creates the VCM is and so on and so forth. Extract 2 is an example from a corpus of casual conversation (the Limerick Corpus of Irish English, LCIE, see below) between friends who are chatting. Speaker (1) creates a VCM but the listener does not understand the exemplar. Hence the category is not created and needs further explanation. In the process of explanation, another VCM is created:

Extract 2 (see Appendix for transcription codes)
: He just made up words like he just made up I don’t know what.
: Is that not artistic license like? amm coinage and stuff like that?
: What?
: Coinage.
: What’s coinage?
: When you are writing poetry and stuff you can make up your own words.
: Yeah I mean yeah.
: Like say sarcasamistic like?
: Yeah you are a poet and you don’t know it my friend?
: Ah snozberry.
: Yeah.
: Fantastic.

This is a good example of how meaning is negotiated interactively within a conversation. While the first VCM which speaker (1) uses over-extends the range of assumed shared knowledge between the speakers by using the exemplar coinage, the second VCM uses a much more general exemplar, poetry, which is obviously within the range of shared knowledge of the group.
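
The structure of a VCM, one or more exemplars followed by a vagueness tag, lends itself to a simple surface search. The Python sketch below is a hedged illustration only: the tag list is limited to tags quoted in this chapter, and the function name find_vcms and its left_context parameter are hypothetical rather than part of any tool used for the LIBEL analysis. Hits would still need manual checking, since a string such as and so on can occur without creating a category.

```python
import re

# Vagueness tags quoted in this chapter. A real study would derive the
# inventory from the corpus itself, so this list is illustrative only.
VCM_TAGS = [
    "and so on and so forth",
    "and that kind of thing",
    "and that sort of thing",
    "and things like that",
    "and stuff like that",
    "and so on",
    "et cetera",
]

# Longest tags first, so that "and so on and so forth" is not cut short
# at "and so on" by the regex alternation.
_TAG_RX = re.compile(
    r"\b("
    + "|".join(re.escape(tag) for tag in sorted(VCM_TAGS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_vcms(text, left_context=5):
    """Return (left context, tag) pairs for every vagueness tag found in text.

    The left context is the run of up to `left_context` words before the tag,
    which is where the exemplars of the category are expected to appear.
    """
    results = []
    for match in _TAG_RX.finditer(text):
        words_before = text[: match.start()].split()
        context = " ".join(words_before[-left_context:])
        results.append((context, match.group(1).lower()))
    return results


extract_1 = (
    "critical theory is dominated by ideas of post-colonialism "
    "multi-culturalism structuralism feminism post-modernism "
    "and so on and so forth."
)

for context, tag in find_vcms(extract_1):
    print(f"exemplars: {context} | tag: {tag}")
```

A fixed window of preceding words is a crude stand-in for the exemplars: deciding where the list of exemplars actually begins remains an interpretive step for the analyst.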

Previous research into vague categories

Vague categories can be divided into lexicalised and non-lexicalised types. Lexicalised categories are those which provide superordinates or prototypes encoded as a single, lexical item, for example bird, furniture, machinery. Until recently, most research into the nature of categories has been concerned with these lexicalised categories within the field of semantics; see in particular the work of Rosch and her associates (Mervis & Rosch 1981; Rosch 1978; Rosch et al. 1976), who demonstrated that the categories they studied had a graded structure and that at the centre of each category was a prototype that exhibited the highest concentration of characteristic properties compared with members at the periphery which contained fewest characteristic properties.

Non-lexical categories are ad hoc rather than prototypical. The concept is attributed to the work of Barsalou (1983, 1987), though links may be seen in the work of Cruse (1986) on what he called lax hyponymy (the non-institutionalised arrangements of items into instantial categories at the time of speaking). The question as to whether categories are stable or subject to change is addressed in particular by Barsalou (1983, 1987), who talks about the dynamic nature of ad hoc category formation, for example places to look for antique desks. In such examples, categorisation is non-lexicalised and without clear boundary, challenging the notion that categories are stable, easily recognisable and arrived at ‘pre-textually’ (after Overstreet & Yule 1997a). Overstreet and Yule (1997a: 85–86) reflect that:

    If only common (i.e. lexicalised) categories are studied then little insight will be gained into the discourse processes involved in categorisation when a single lexical item is not available to the discourse participants for the referential category.

Building on the ad hoc categories of Barsalou (1983), they stress the spontaneity of categorisation and the context-dependent nature of the categories themselves when one looks at examples from actual discourse as opposed to stylised examples. Overstreet and Yule (1997a: 87) suggest a continuum from lexicalised to non-lexicalised categories based on the degree to which categories are (a) conventionally and linguistically established, and (b) constrained by contextual factors. In the literature, the tags which help create these ad hoc categories go by different terms such as ‘general extenders’ (Overstreet & Yule 1997a, 1997b); ‘generalized list completers’ (Jefferson 1990); ‘tags’ (Ward & Birner 1992); ‘terminal tags’ (Dines 1980; Macaulay 1991); ‘extension particles’ (DuBois 1993); ‘vague category identifiers’ (Channell 1994; Jucker, Smith & Lüdge 2003); ‘imprecise language’ (Biber 2006) and vague category markers (O’Keeffe 2003, 2006; Evison et al. 2007). In this chapter we adhere to the term vague category marker (VCM).

The questions of interest for this chapter are: do VCMs manifest themselves in spoken academic discourse, and if so, to what ends, and do such phenomena differ from or resemble uses of vague language in everyday casual conversation? This last question is important, since special registers in spoken language are often best characterised by the degree to which they resemble or depart from the typical linguistic features of everyday conversation. We enter this investigation via the notion of classroom modes (based on Walsh 2006), a set of ways of communicating between teachers and students which recur in the academic corpus, and which seem to have clear pedagogical foci in relation to overall goals in educational settings.

Classroom modes

In this section, a framework for analyzing spoken academic discourse at university level is presented and exemplified. The framework, SETT (Self-Evaluation of Teacher Talk, Walsh 2006), emphasizes the fact that interaction and classroom activity are inextricably linked, and acknowledges that as the focus of a learning event (e.g. lesson, seminar, or workshop) changes, so interaction patterns and pedagogic goals change. When language use and pedagogic purpose are considered together, different contexts emerge, making it possible to analyze the ensuing discourse more fairly and more objectively (see, for example, van Lier 1988; Seedhouse 2004). Under this variable view of contexts, student and teacher patterns of verbal behaviour can be seen as more or less appropriate, depending on a particular pedagogic aim. Characterizing university teaching in this way is not intended to offer an all-encompassing description nor a means to ‘code’ interaction patterns. Rather, the intention is to offer a framework and a metalanguage which may be used to interpret interaction in the context of third-level classrooms.

In common with other work that adopts a variable view of classroom context (see, for example, Seedhouse 2004), the SETT framework, presented below, takes a variable approach. Specifically, the design of the framework rests on four assumptions. Firstly, all classroom discourse is goal-oriented: the prime responsibility for establishing and shaping the interaction lies with the teacher; secondly, pedagogic purpose and language use are inextricably linked – it is impossible to consider one without taking account of the other; thirdly, any higher education classroom context is made up of a series of micro-contexts (termed modes) which are linked to the social, political, cultural and historical beliefs of the participants (cf. Kumaravadivelu 1999); fourthly, micro-contexts are co-constructed by teachers and students through their participation, through face-to-face meaning-making and through a process of ‘language socialization’ (Pavlenko & Lantolf 2000). A mode is defined as a ‘classroom microcontext which has a clearly defined pedagogic goal and distinctive interactional features determined largely by a teacher’s use of language’ (Walsh 2006: 111). A modes analysis recognizes that understanding and meaning are jointly constructed, but that the prime responsibility for their construction lies with the teacher.

The original SETT framework is based on a corpus of 14 English for Specific Purposes lessons, totalling approximately 12 hours or 100,000 words. The framework has since been applied to a much larger corpus of one million words of academic spoken English recorded in two universities on the island of Ireland. This corpus, LIBEL (Limerick and Belfast Corpus of Academic Spoken English), is composed of spoken academic data, collected at Queen’s University Belfast, Northern Ireland, and the University of Limerick, Republic of Ireland, from the following contexts: lectures, seminars, small group tutorials, oral presentations and conference papers. Half of the corpus was collected at each site, and its design matrix spans subject areas and colleges within the two institutions so as to achieve internal comparability and overall representativeness (see www.mic.ul.ie/ivacs).

Table 1. Overview of number of hours collected to date (LI = Limerick, BEL = Belfast)

Discipline                     LI   BEL
Arts and Humanities            36     6
Social Sciences                26    15
Science                         5    17
Engineering and Informatics    11     9
Business                        3     2

Based on the initial corpus findings, qualitative samples of the data were analysed by working from concordance lines. In the qualitative stage, a CA methodology was used, which centred on turn-taking mechanisms in relation to teachers’ perceived goals of the moment and their stated written lesson aims. Interaction patterns were found to vary according to instructional activity; for example, establishing procedures to complete an activity resulted in a very different pattern of interaction to that of open-class discussion. The different patterns manifested themselves in the turn-taking, sequence of turns and topic management. According to Heritage, interactants’ talk is ‘context-shaped’ by a previous contribution, and ‘context-renewing’ by subsequent ones; understanding is indicated by the production of ‘next’ actions (1997: 162–163). In other words, participants both contribute to and demonstrate understanding of the interaction through the ways in which turns are managed. In this way, it is possible to characterize both the relationship between talk and actions, and assess the extent to which the ‘talk-in-interaction’ is appropriate to the shifting agenda and pedagogic goals of the moment. Following this procedure, it was possible, by analyzing the corpus, to identify four patterns, or four micro-contexts, called modes: managerial mode, classroom context mode, skills and systems mode, and materials mode. Each mode has distinctive interactional features and identifiable patterns of turn-taking related to instructional goals. While other modes could almost certainly be identified (depending on the specific context), these four are included as being representative of the interaction which takes place in the third level classroom, because they provide clear-cut examples of different types of interactional patterning and because they are intended to be used by teachers using samples of their own data as a means of awareness raising.



Table 2. Classroom Modes (Walsh 2006)

Mode: Managerial
Pedagogic goals:
– To give an instruction
– To organize the physical learning environment
– To refer students to materials
– To introduce or conclude an activity
– To change from one mode of learning to another
Interactional features:
– A single, extended teacher turn which uses explanations and/or instructions
– The use of transitional markers
– The use of confirmation checks
– An absence of student contributions

Mode: Materials
Pedagogic goals:
– To provide input or practice around a piece of material
– To elicit responses in relation to the material
– To check and display answers
– To clarify the focus of the material when necessary
– To evaluate contributions
Interactional features:
– Predominance of IRF (Initiation, Response, Feedback) pattern
– Extensive use of display questions
– Content-focused feedback
– Corrective repair
– The use of scaffolding

Mode: Skills and systems
Pedagogic goals:
– To enable students to produce correct answers
– To enable students to manipulate new concepts
– To provide corrective feedback
– To provide students with practice in sub-skills
– To display correct answers
Interactional features:
– The use of direct repair
– The use of scaffolding
– Extended teacher turns
– Display questions
– Teacher echo
– Clarification requests
– Form-focused feedback

Mode: Classroom context
Pedagogic goals:
– To enable students to express themselves clearly
– To establish a context
– To promote dialogue and discussion
Interactional features:
– Extended student turns
– Short teacher turns
– Minimal repair
– Content feedback
– Referential questions
– Scaffolding
– Clarification requests

The four modes, together with teachers’ interactional features and typical pedagogic goals, are summarized in Table 2. Owing to the multi-layered, ‘Russian doll’ (Jarvis & Robinson 1997: 225) quality of classroom discourse, any classification is not without its problems and the present one is no exception. Tensions between and within modes do exist: rapid movements from one mode to another, termed mode switching; brief departures from one mode to another and back again, termed mode side sequences; the fact that some sequences do not ‘fit’ into any of the four modes identified. These have all posed problems for description. Moreover, the analysis is further complicated by the homogeneous and heterogeneous quality of classroom contexts (Seedhouse 2004); within a mode, every interaction is both similar to other interactions (homogeneous) and yet a unique encounter (heterogeneous).

Data and methodology

For this investigation, we draw on three spoken language corpora: LIBEL, from an academic setting, and two comparable corpora, CANCODE and LCIE, composed of casual conversation from Britain and Ireland. Table 3 summarizes these data.

Table 3. Description of data used in the study

Corpus: Limerick-Belfast Corpus of Academic Spoken English (LIBEL)
No. of words: 500,000 words²
Description:
– Consists of lectures, small group tutorials, laboratories and presentations
– Collected in two universities on the island of Ireland: Limerick and Belfast³
– Data from common disciplinary sites (see Table 1)

Corpus: Cambridge and Nottingham Corpus of Discourse in English (CANCODE)⁴
No. of words: 1 million words (a subset of the 5-million-word corpus)
Description:
– Consists of casual conversations between family and friends in Britain and Ireland
– Designed to reflect spoken genres, speaker relationships and context (see McCarthy 1998)

Corpus: Limerick Corpus of Irish English (LCIE)
No. of words: 1 million words
Description:
– Designed as a comparable corpus to CANCODE
– Consists of casual conversations between family and friends in Southern Ireland (see Farr et al. 2002)

2. At the time of writing, LIBEL comprises one million words, 500,000 of which are fully transcribed. Its breakdown across disciplines in terms of number of hours transcribed is: Arts & Humanities 32%; Social Sciences 32%; Science 17%; Engineering & Informatics 15%; and Business 4%.
3. Note that while Limerick and Belfast are geographically on the same island (of Ireland), they come under two different jurisdictions: (1) the Irish Republic and (2) the United Kingdom of Great Britain and Northern Ireland, respectively.
4. CANCODE was a joint project between the School of English Studies, University of Nottingham, UK, and Cambridge University Press (with whom sole copyright resides). No part of the corpus may be used or reproduced without the permission of the copyright holder.

In this chapter, we draw on two methodologies not always seen as complementary: corpus linguistics and conversation analysis (CA). These have much to offer each other, as they provide quantitative and qualitative insights respectively (Carter & McCarthy 2002; O’Keeffe 2006; Walsh & O’Keeffe 2007). Wordsmith Tools software (Scott 1999) was applied to the corpora in Table 3 to produce word cluster (or chunk) frequency lists, that is to say, lists of recurrent strings of pre-selected length (e.g. three-word clusters, four-word clusters). These quantitative data were sorted so as to identify VCMs in each corpus. This process involved concordancing individual high-frequency chunks operating as VCMs, and extensive manual reading of sample files. When we look at the micro-contexts, or modes, we employ CA to help understand the ways in which vague language is manifested in each mode, and the contribution VCMs make to the enactment of the modes. A brief summary of the transcription conventions used appears in the Appendix. Table 2 should be used as a reminder of the interactional features and pedagogic goals of each of the four modes.
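The idea of a cluster frequency list can be illustrated with a minimal sketch. This is not Wordsmith Tools' own procedure, whose internals are not described here; the function names, the simplified tokeniser and the invented sample sentence are all assumptions made for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters and straight apostrophes.

    A deliberately simplified tokeniser (assumption): a real transcript
    would need rules for curly apostrophes, hesitation markers and tags.
    """
    return re.findall(r"[a-z']+", text.lower())


def cluster_frequencies(tokens, n):
    """Count every contiguous n-word string (cluster) in a token list."""
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Invented sample text, not taken from any of the corpora.
sample = (
    "we looked at force and so on and at workplace height and so on "
    "and then at dexterity and things like that"
)
tokens = tokenize(sample)
three_word = cluster_frequencies(tokens, 3)

# List the most frequent three-word clusters for manual inspection.
for cluster, freq in three_word.most_common(5):
    print(cluster, freq)
```

A list of this kind only surfaces candidate strings. As in the procedure described above, each candidate still has to be checked in context (by concordancing and manual reading) to decide whether it is operating as a VCM.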

Analysis

The quantitative findings based on the three corpora are illustrated below. These show the most common VCM forms and their frequencies in the three datasets. These forms are based on cluster analyses using Wordsmith Tools. First of all, at the level of geographical variation, the results indicate that British English speakers use VCMs more than Irish English speakers do. However, closer examination shows that the variation is concentrated in fewer than half of the forms (i.e. it is these forms that diverge most): and all, and/or [something/anything/everything] (like that), and/or stuff (like that), (and) (all) this/that sort/kind of thing, and (and) (all) this/that sort of thing/stuff. Overall, at a quantitative level, greater variation is evident between Irish and British English (i.e. LCIE and CANCODE) than between LCIE and the register-specific LIBEL data. At the level of contextual variation, or register, variation is accounted for by the higher frequency of use of certain forms in the academic data. These are: et cetera and and so on (and so forth)(like that). The various combinations of the form and so on (and so forth)(like that) account for 48% of all VCMs in the LIBEL data while the next most frequent form, et cetera, makes up 12% of all uses of VCMs in the academic data. In comparison, both the British and Irish casual conversation data draw on a wider range of forms. If we remove the above forms from the overall count, we see that the total for LIBEL would be considerably lower than either of the casual conversation datasets.





Table 4. VCM forms resulting from cluster analysis (normalised to occurrences per million words)⁵

Form                                                  LIBEL   LCIE   CANCODE
and so on (and so forth)(like that)                     524    103        60
et cetera (et cetera)                                   136     57        30
and/or [something/anything/everything] (like that)      126    198      1024
and (all) (of) that                                      77    190       270
and/or stuff (like that)                                 67    193       602
or something                                             61    440       513
(and) (all) this/that sort/kind of thing                 52     66       128
and things like that                                     46     49        61
(and) (all) this/that sort of thing/stuff                21     24       123
and all⁶                                                 13     97        13
and all the rest (of it)                                  4     23        17
this that and the other                                   2      8         7
Total                                                  1129   1448      2848
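Because the three datasets differ in size, the counts in Table 4 are comparable only after normalisation to a common base. The sketch below shows the arithmetic and checks that each column of Table 4 sums to its reported total. The figures in the dictionary are transcribed from Table 4; the function name and the raw count used in the last line are illustrative assumptions, not figures from the study.

```python
# Normalised frequencies from Table 4 (occurrences per million words),
# given in the order LIBEL, LCIE, CANCODE.
TABLE_4 = {
    "and so on (and so forth)(like that)": (524, 103, 60),
    "et cetera (et cetera)": (136, 57, 30),
    "and/or [something/anything/everything] (like that)": (126, 198, 1024),
    "and (all) (of) that": (77, 190, 270),
    "and/or stuff (like that)": (67, 193, 602),
    "or something": (61, 440, 513),
    "(and) (all) this/that sort/kind of thing": (52, 66, 128),
    "and things like that": (46, 49, 61),
    "(and) (all) this/that sort of thing/stuff": (21, 24, 123),
    "and all": (13, 97, 13),
    "and all the rest (of it)": (4, 23, 17),
    "this that and the other": (2, 8, 7),
}
REPORTED_TOTALS = (1129, 1448, 2848)


def per_million(raw_count, corpus_words):
    """Scale a raw count to occurrences per million words."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return raw_count * 1_000_000 / corpus_words


# Sum each corpus column and compare with the Total row of Table 4.
column_sums = tuple(sum(column) for column in zip(*TABLE_4.values()))
print(column_sums == REPORTED_TOTALS)

# Hypothetical raw count, for illustration only (not from the chapter):
# 50 occurrences in a 250,000-word sample.
print(per_million(50, 250_000))
```

Normalising to a per-million base is what allows a 500,000-word transcribed sample to be set beside the two one-million-word conversational corpora.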

As the concordance line extracts for the high frequency items et cetera and and so on show (Figures 1 and 2), the LIBEL VCMs were not found to be specific to particular disciplines. Note, in the case of et cetera, the strong preference for reduplication of form. In the results presented in Table 4, reduplications were counted as single VCM occurrences (i.e. a cross sectional hatching et cetera et cetera was counted as one vague category, marked by the form et cetera et cetera). However, as a percentage, 40% of all et cetera VCMs were reduplicated by speakers in LIBEL. This compares with 21% reduplication of et cetera in LCIE and 28% in CANCODE. As we have discussed above, various studies show that VCMs are used in casual conversation as involvement devices and are markers of the shared worlds of the speakers in a conversation. They draw on participants’ socio-cultural commonage and have an overall effect of marking in-group membership. In order to find out more about how and why speakers use VCMs in academic discourse, we turn now to a qualitative analysis which uses the four modes as its framework.

5. Only vague category uses of each form were counted. Rounded brackets ( ) mark words which may occur in the phrase and forward slashes / refer to either or options. For example, ‘(and all) this/that sort of thing/stuff’ is a combined count of all of the following possibilities: a) and all this sort of thing; b) and this sort of thing; c) all this sort of thing (not including those already counted in a); d) this sort of thing (not including those already counted in a, b and c); e) and all that sort of thing; f) and that sort of thing; g) all that sort of thing (not including those already counted in e and f); h) that sort of thing (not including those already counted in e, f and g), and so on for all the combinations as above for ‘stuff’.
6. This count is not inclusive of any of the combinations counted for and (all) (of) that.
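The counting convention for reduplication (a run of repeated et cetera counted as one VCM occurrence) can be sketched with a single pattern that always takes the longest run, so that no shorter variant is counted twice. The pattern and function below are illustrative assumptions rather than the procedure actually used in the study; the three sample lines are taken from the LIBEL concordance samples for et cetera, and a sample this small says nothing about the corpus-wide percentages.

```python
import re

# One or more consecutive "et cetera", matched greedily as a single unit.
ET_CETERA = re.compile(r"\bet cetera(?: et cetera)*\b", re.IGNORECASE)


def count_et_cetera(text):
    """Return (occurrences, reduplicated) for 'et cetera' in text.

    A run such as 'et cetera et cetera' counts as one occurrence and is
    also tallied as reduplicated.
    """
    occurrences = 0
    reduplicated = 0
    for match in ET_CETERA.finditer(text):
        occurrences += 1
        # A single "et cetera" is two words; anything longer is a repeat.
        if len(match.group(0).split()) > 2:
            reduplicated += 1
    return occurrences, reduplicated


lines = [
    "for closer customers et cetera.",
    "Arrows should be this line thickness et cetera et cetera.",
    "between the cube potential et cetera et cetera et cetera.",
]
occurrences, reduplicated = count_et_cetera(" ".join(lines))
print(occurrences, reduplicated)
```

The same longest-match-first logic underlies the combined counts described in note 5: each stretch of text is assigned to the fullest variant it realises and is then excluded from the counts of shorter variants.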



them and booksellers will hide them under the counter for closer customers et cetera.
of am the purely educational system for developing media and social services et cetera.
student tutor interactions. Okay. How do the students teacher teacher tutor et cetera.
wrong. That’s the way to do it. Arrows should be this line thickness et cetera et cetera.
You’re not writing an essay. Your use of short forms ellipsis R U there et cetera.
an exploder view with an isometric with a cross sectional hatching et cetera et cetera.
the country. There was great deal of talk about the harvest about farm work et cetera.
not necessarily parallel. At least your printing is all up and down et cetera et cetera.
include the inter relationships between the cube potential et cetera et cetera et cetera.
product which really is open to am you know almost any additional dimension et cetera.
somebody else is registered independent is going to join this party et cetera et cetera.
On ability to respond to a child. A child’s level of communication et cetera et cetera.
we can come back in and put in the shade and the shadow attaching et cetera et cetera.

Figure 1. Concordance samples of et cetera from LIBEL

you know the Irish having a pig in the parlour and so on.
you have like a play button or a stop and a rewind next and forward and so on.
to exert relative to the the actual height of the workstation and so on.
from an ergonomic viewpoint in relation to ahh the amount of force and so on.
we have several different lists of ahh guidance for workplace design and so on.
it can also contribute to the accidents and so on.
The child itself gets its better at amm going to different people and so on.
Well the bottom line is you will have people who are both tall and short and so on.
So amm so dexterity it’s your ability to be able to manipulate objects and so on.
So actually the average tax rate could be twelve thousand to zero and so on.
You have to have fresh blood going into various muscles and so on.
This file again by the twenty five hours. This by the forty five hours and so on.
Amm and we can also then consider the actual workplace height and so on.
one cry might mean lion yeah. Another cry might mean danger yeah and so on.

Figure 2. Concordance samples of and so on from LIBEL
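Concordance lines of the kind shown in Figures 1 and 2 place each occurrence of a search item (the node) in a fixed window of surrounding text. A minimal keyword-in-context sketch follows; the function name, the window width and the bracket notation around the node are assumptions for illustration and do not reproduce the output format of any particular concordancer. The sample text joins two lines from Figure 2.

```python
import re


def concordance(text, node, width=40):
    """Return one keyword-in-context line per occurrence of `node`.

    The left context is right-aligned to `width` characters so that the
    bracketed node starts in the same column on every line.
    """
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}}[{match.group(0)}]{right}")
    return lines


sample = (
    "You have to have fresh blood going into various muscles and so on. "
    "It can also contribute to the accidents and so on."
)
kwic_lines = concordance(sample, "and so on", width=30)
for line in kwic_lines:
    print(line)
```

Aligning the node in one column makes it easier to scan what precedes the marker, which is where the exemplars of the vague category are named.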

Managerial mode

In Extract 3 below, we are at the beginning of a small group seminar on oral history, where the lecturer is setting up an activity and organizing the seating so that the session can begin. In this extract, as in most others where managerial mode is prevalent, there is little or no evidence of vague category marking. Instead, the lecturer makes extensive use of instructional language (if you do have access to one of those transcripts; Just make sure you sit beside someone you can look in with) to locate the teaching and learning in time and space (all you do is pull the chair over by somebody who has one; I know a lot of people weren’t here last week for very good reasons ah just all you can do is fill in whatever words of wisdom were spread around ah from other people’s notes). Managerial mode occurs most often at the beginning of a piece of teaching and is characterized in the first instance by an extended teacher turn of more than one clause and a complete absence of student turns. The focus is on the ‘institutional business’ of the moment, the core activity. Typically, there is a considerable amount of repetition and some kind of ‘handing over’ to the students which occurs at the end of each sequence. At this point, there is a movement to another mode: in Extract 3, for example, the pedagogic focus is re-aligned away from directing learning (managerial mode) to analysing a tape script (skills and systems mode).

Extract 3

It’s an awful setting in the way the room is at the moment but aam if I try and [move] around a few and all you do is pull the chair over by somebody who has one. Aah Yeah okay hopefully. Ah I’d like to make sure now about the tape and the volume is the volume is there? Yeah you might need to bring it up. Anyway look right folks we’ll start. Ok it’s very awkward. It’s not the kind of set up we’d like to have because the lines are too reminiscent of what’s going to happen in a week or two but it’s not very pretty but anyway sure we’ll do the best we can. Now aam I know a lot of people weren’t here last week for very good reasons ah just all you can do is fill in whatever words of wisdom were spread around ah from other people’s notes aam and if you do have access to one of those transcripts eh all the better. Just make sure you sit beside someone you can look in with.

Where vague language does occur in managerial mode, it appears to function almost as a time-saving device so that the main item on the teaching agenda can be realised with minimal disruption and minimal waste of time. Compare Extract 4 (managerial mode) below from a different lesson in another discipline.
Here, the lecturer is anxious to move on to the task and to engage students with their own data, which they were required to collect as part of their semester assignment for a media class.

Extract 4

Really what I want to know when you having done the interview and scribed it and looked at the content of the interview how does it relate to how you understand audiences and you now understand more about audiences, about audience agency and so on. Then you will obviously feed into your concluding points about the particular interview, about how it went, about what the content of the interview has taught you in terms of audience based research.

The VCM and so on, as mentioned above, in its various forms, accounts for 48% of all VCMs in LIBEL. Here we see that it appears to serve an important classroom function in managerial mode, that is, to minimise the time spent on setting up the task and to allow the teacher to ‘hand over’ to students with minimal fuss. The vague category audience agency and so on is taken as a given, something that they already know about from recent input. The VCM in Extract 4 stands to mark shared/given knowledge which is background to the task at hand. Perhaps more importantly, creating this shared space gives learners a sense that they can do the task and enhances their confidence. The absence of such language might make the instruction more direct and reduce the sense of ownership and collective ‘struggle’ which are essential features of higher education teaching and learning. Extract 5 is another similar example from a physiotherapy lecture where the VCM is found in the context of managerial mode, where again it is used in setting the scene for the next stage. We note the use of once again here too as an additional explicit reference to known information.

Extract 5

So there may be some accidents and maybe some injuries and maybe some a strong physiological stress on the body. Especially if maybe it’s ahh a hot environment or a very cold. Once again you can use some subjective assessments to actually to assess how the person ask the person if they’re fatigued in the course of this task if a body was lumbered and ahh things like this. That’s just kind of setting the scene. And we will be coming across some ahh more points like for example Corlett’s principles in the next ahh few lectures. I’m going to use some specific points which we will consider here in relation to the machine design and the operator so that we can reduce the problems for example with repetitive strain style injuries. Okay so. Next we’re going to have a look at amm evaluating the solution. . .

To sum up then, we can say that overall there is comparatively little evidence of vague language in managerial mode in LIBEL. However, large-scale quantitative studies would be needed to substantiate this fully (note that, in the T2K-SWAL corpus, evidence of vague language is found in managerial mode; see Biber 2006).
We speculate that the low occurrence of VCMs in this mode is due to lecturers’ concern to establish a meaningful context where learning can take place. Any examples which do emerge in the data serve to facilitate the process of setting up (or feeding back on) an activity, or organizing learning in the most effective way so as to move to a new phase. Throughout, the prime pedagogic goal is to transmit information in the most economical way. Being able to use a VCM to refer to assumed background knowledge at the start-up phase is expeditious for the lecturer.

Materials mode

Materials mode centres around a phase in a lesson where there is input or practice around a piece of material. Responses are elicited in relation to the material and concepts and comprehension are checked. This mode is not one in which we find VCMs. As Extract 6 illustrates, interaction within this mode comprises many short IRF (Initiation – Response – Feedback) exchanges. The language is very specific and vague language, of any type, is rare here.

Extract 6

[ = the lecturer in an English Language class. At this point in the lesson, the lecturer is checking homework that has been assigned to the class. Here we see how the task, and the student’s response to it, is the main focus. In this case the lecturer checks for the word missing from the gap fill, flat, and corrects the student’s pronunciation of the answer ‘. . .flat. Not a flight’ ]

And decorate the? The? Decorate the? Flat. Flat flat flat. Not a flight. Flat. Okay. Pronounce that word.

Skills and systems mode

In skills and systems mode, the interaction revolves around the core subject of the particular discipline. The main pedagogic goals are to allow students an opportunity to familiarize themselves with new skills or concepts and to provide corrective feedback. The discourse is typically tightly controlled and teachers make frequent use of display questions⁷ to elicit responses which are then evaluated. Meanings may be clarified in the give-and-take of the interaction through error correction, requests for clarification and confirmation checks. Where new concepts are expressed by technical language, teachers may scaffold key terminology, offering students an opportunity to gain access to a discourse community through the language of that community. Vague language does occur in this mode, as illustrated in Extract 7 (we also note, as in Extract 5, the use of again and so I’ve mentioned in Extract 7 that refer the students back to known information. Here they serve as an additional means of scaffolding and schema-building).

Extract 7

Lecturer: Okay and again equity so I’ve mentioned they’re rejuvenating the economy poverty alleviation and so on. They’re the kinds of equities that we’re the effectiveness of the redistribution of taxation amm Okay and so on. Now we can only cover this to a certain degree. We’re very limited by the amount of time.

7. Display questions are those where the questioner already knows the answer. They are typically associated with classrooms and quizzes.

In the data, the lecturer is under time pressure and uses vague language (and so on) as a means of reducing his contribution. As he recaps, he avoids the need to re-list the points which have been covered in the earlier part of the lecture, allowing students an opportunity to recall that information for themselves (rejuvenating the economy, poverty alleviation and so on). But, and perhaps more importantly, the vague language expressed in and so on does more than save time and prompt students to recall what has been covered earlier in the lecture. This example of vague language also creates a sense of shared space, common ground. The lecturer here, through his use of vague language, is actually saying ‘we all know this’. The net effect of this is to ensure that students feel included and feel ‘safe’ as opposed to feeling intimidated or excluded. This is related to the in-group membership function of VCMs that has been noted in relation to their use in casual conversation. Friends and family use them to create and sustain a sense of membership within a circle of friends or family. However, when they are used in an academic context by the lecturer in a university classroom, their use may be seen as a device employed by an expert to bring novices into a discipline, to make them feel part of a given subject area or field. The creation of a shared space and the use of inclusive language are crucial to successful teaching since they create an atmosphere in which students are prepared to take risks and offer their own perspective on the content of the lecture or seminar. In Extract 8, students are made to feel included and this is part of the process of collaborative meaning-making which is so important in higher education discourse. Here, the lecturer is giving students an opportunity to answer without making them feel trapped or intimidated by the question.

Extract 8

[ = lecturer, = student]

. . . did you put it on V H S then or or ah yeah excellent did you try and digitize it or put it on the web or anything like that? totally oh very good excellent excellent

The VCM or anything like that offers options to the student and also creates shared space in which students feel free to respond. It is an interesting choice of form, which is more associated with casual conversation (e.g. 32 occurrences per million words in LCIE and 35 in CANCODE, compared with 17 in LIBEL). As a VCM it is very open-ended with both or and anything in its form. This may account for why it occurs more in casual conversation than in academic language. We also note that its use here marks an attempt on the part of the lecturer to not only create a vague category but to hedge the directness of the question. A more direct question such as ‘did you put it on the web?’ might have been interpreted as a criticism and not received any response from students – the phrase or anything like that functions as a ‘softener’, oiling the wheels of the interaction, making the question less direct and facilitating a sense of membership. In Extract 9, we see that the use of a VCM by a student allows for the tentative positing of an answer to the lecturer’s question. This hedging effect of the VCM here provides face protection for the student as well as marking the proposition as tentative.

Extract 9

[ = lecturer, = student, there in the final line refers to services, as opposed to agriculture]

Okay. In amm nineteen eighty eight and nineteen ninety one there was a labour force survey done in each year. Now I’m just going to show you what sectors that ahh they were concerned with. Okay? Now how about someone anyone hazard a guess. Just analyse the graph analyse the bar graph now. Why do you think agriculture is so low and services is so high? Mike? I don’t know agriculture. You know fixed pay and things like that. . . . More people going to college more people coming out of college. Better jobs going there.

We also note here the use of the pragmatic marker you know in conjunction with the VCM. As noted by Carter and McCarthy (2006), you know projects the assumption that knowledge is shared or that assertions are uncontroversial, and reinforces common points of reference. The use of you know plus the VCM and things like that serves to tentatively project shared knowledge on the part of the student. Jucker, Smith and Lüdge (2003) point out that vague category construction asks the hearer to construct the relevant components of the set which they evoke and, in so doing, promotes the active cooperation of the listener. In the learning context of the LIBEL data, we could say that VCMs are also a vehicle for collective meaning-making. When they are used, in skills and systems mode, on the part of the lecturer, they promote active cooperation that results in learning.
When they are used by students (as in Extract 9), they also promote cooperative peer-to-peer engagement with the category and reach out to the teacher for confirmation. They therefore provide evidence of learning in action.

Classroom context mode

In classroom context mode, the management of turns and topics is determined by the local context; opportunities for genuine communication are frequent and the teacher plays a less prominent role, allowing students all the interactional space they need. The principal role of the teacher is to listen and support the interaction, which frequently takes on the appearance of everyday conversation. Pedagogic goals typically centre on promoting dialogue and discussion; students have genuine opportunities to express their own ideas and to make real contributions to academic debate. Student responses are usually quite long and the teacher may offer scaffolded input or seek clarification as and when it is needed.

Vague language functions here in much the same way as it functions in everyday conversation, that is, as an ‘involvement device’ ensuring listener participation and promoting equity and understanding. Consider Extract 10 below. Here the teacher is trying to make a point by using a literary reference. The language of the extract is very similar to everyday conversation. The choice of VCM, and stuff, also aligns it with casual conversation. This form is not very frequent in LIBEL compared with LCIE and CANCODE (the form and stuff alone occurs 41 times per million in LIBEL compared with 141 and 167 in LCIE and CANCODE respectively; see Table 4 for other related results for stuff patterns, all of which are greater in casual conversation).

Extract 10
Lecturer: . . .did any of you ever read Angela’s Ashes? [unintelligible comments from students] Yeah exactly and it’s just it’s just the poems and stuff that the Daddy keeps on you know every time he has a few drinks and he’s living abroad and he’s broke and he’s after like leaving Ireland like arrived there filled with the pox and you know like. It was just like not at all a romantic story. He gets there and then before you know he’s like standing up all the kids at night time going we’ll die for Ireland. And you know there’s was all of these like poems and and stuff like that and it was all about like will you die for Ireland?

The VCMs here (and stuff, and stuff like that) ensure that the listeners feel involved and that there is empathy towards the stance that the teacher adopts, i.e. agreement.
As was the case in Extract 9, we see the use of you know (like) as an additional involvement device. Through the combined use of these markers then, the teacher is able to progress the discourse, bringing everyone along together and making sure that there is a sense of purpose and direction to the dialogue. Again, the vague language being used here serves to ‘soften the blow’ of a more didactic tone. A more conversational style is also almost certain to promote good listenership (McCarthy 2002, 2003) and means that the learning will be more memorable. Classroom context mode, then, out of the four modes included in the SETT framework (Walsh 2006), offers the greatest potential for vague language since it most closely resembles everyday conversation. Note that in this mode, vague language is as likely to be used by students as it is by teachers, as exemplified in Extract 11. Here, the student asks a question, but uses vague language (and everything) as a means of creating shared space and involving the teacher-listener. The net effect of this is to promote understanding and to ensure that the questioner is fully understood.

Extract 11
Student: I have a question.
Lecturer: Yeah?
Student: I was reading in one of our books that ethnicity and race are completely different things and ethnicity you learn things and race is a is a ahh is inherent in the you know in the blood and your appearance and everything. That is that wrong?
Lecturer: It depends on what theorist you go after.

Conclusions

We stated at the outset the main questions posed in this chapter: do VCMs manifest themselves in spoken academic discourse, and if so, to what ends, and do such phenomena differ from or resemble uses of vague language in everyday casual conversation? To the first question, VCMs clearly do occur in spoken academic discourse (as others, such as Biber 2006 and Evison et al. 2007, have also shown). In our data they occur less frequently than in casual conversation and they appear to rely strongly on certain forms (two forms accounted for 60% of all VCMs in the LIBEL data). To the second part of this overarching question, we can say that two main functions arise in the LIBEL data: (1) VCMs can be used as expeditious devices. This is particularly the case within managerial mode where VCMs are used by the teacher to help expedite the start-up phase of a class or activity. Because they provide shortcuts that mark information or concepts that can be taken as given, shared or unproblematic, they very quickly establish what is common ground and facilitate a speedy handing over to the task phase of the class; and (2) VCMs, as in casual conversation, can be used as involvement devices, where again they mark shared knowledge but do so in a way which scaffolds learning. In skills and systems mode, for example, they operate as two-way portals. For the lecturer, they can open a door to what is key shared knowledge for this phase of the class and create a shared space around this ‘learning commonage’. For the student, they open a door to a space where it is safe to take risks. Tentative propositions can be marked using VCMs and loss of face is avoided (see Extracts 9 and 11). In classroom context mode, we find that because language use, in general, resembles casual conversation more closely (see Walsh 2006), VCMs occur along with other vague language items and mark shared, uncontested knowledge.

The second function that we refer to above, the use of VCMs as involvement devices, seems to parallel their function in casual conversation. However, we need to go back to the contextual differences of the interaction. The use of VCMs, by lecturers, in the LIBEL data ties in with pedagogical goals of the interactional mode within which they occur. Classroom contexts differ from casual conversations. As we noted, all classroom discourse is goal-oriented; pedagogic purpose and language use are inextricably linked; any higher education classroom context is made up of a series of micro-contexts (modes) and these micro-contexts are co-constructed by teachers and students through their participation, through face-to-face meaning-making. However, while a modes analysis recognizes that understanding and meaning are jointly constructed, it also holds that the prime responsibility for their construction lies with the teacher. Therefore, the use of VCMs as involvement devices in academic discourse and in casual conversation cannot be fully equated since the power semantic differs between the institutional setting of the university classroom and that of casual conversation. Friends and intimates in casual conversation use many types of involvement devices as they symmetrically reinforce their social relationships. University lecturers, on the other hand, use involvement devices such as VCMs to try to ‘bring their student in’ both at the local level of pedagogic goal and at the higher-order level of initiating them into their community of practice (Wenger 1998). When students use them, they are not in the power-holding role and so they function aspirationally as involvement devices.

From another pedagogical perspective, we also have to recognize the importance of VCMs as vocabulary items for non-native speakers of English, either those taking classes in English, or indeed, teaching classes through the medium of English. In this respect, VCMs need to be considered as core academic vocabulary items.
From a second language perspective, it is clear that the ability to understand and create VCMs is an important part of classroom language, but from the perspective of teaching/lecturing, the ability to draw on the shared and known world, as we hope to have illustrated, is a very important part of building up knowledge schema. Their prevalence in terms of high frequency chunks in casual conversation also adds to the case for including them as vocabulary items not just in English for Academic Purposes programmes.

References

Barsalou, L. W. 1987. The instability of graded structure: Implications for the nature of concepts. In Concepts and Conceptual Development, U. Neisser (ed.), 101–40. Cambridge: CUP.
Barsalou, L. W. 1983. Ad hoc categories. Memory and Cognition 11: 211–77.
Biber, D. 2006. University Language: A corpus-based study of spoken and written registers. Amsterdam: John Benjamins.
Carter, R. A. & McCarthy, M. J. 2006. Cambridge Grammar of English: A comprehensive guide to spoken and written English grammar and usage. Cambridge: CUP.





Carter, R. A. & McCarthy, M. J. 2002. From conversation to corpus: A dual analysis of a broadcast political interview. In Windows on the World: Media discourse in English, A. Sánchez-Macarro (ed.), 15–39. Valencia: University of Valencia Press.
Carter, R. A. & McCarthy, M. J. 1995. Grammar and the spoken language. Applied Linguistics 16(2): 141–58.
Chafe, W. 1982. Integration and involvement in speaking, writing, and oral literature. In Spoken and Written Language: Exploring orality and literacy, D. Tannen (ed.), 35–53. Norwood NJ: Ablex.
Channell, J. 1994. Vague Language. Oxford: OUP.
Cruse, D. A. 1986. Lexical Semantics. Cambridge: CUP.
Cutting, J. 2000. Analysing the Language of Discourse Communities. Oxford: Elsevier.
Dines, E. 1980. Variation in discourse – and stuff like that. Language in Society 1: 13–31.
DuBois, S. 1993. Extension particles, etc. Language Variation and Change 4: 179–203.
Evison, J., McCarthy, M. J. & O’Keeffe, A. 2007. ‘Looking out for love and all the rest of it’: Vague category markers as shared social space. In Vague Language Explored, J. Cutting (ed.), 138–157. Basingstoke: Palgrave.
Farr, F., Murphy, B. & O’Keeffe, A. 2002. The Limerick Corpus of Irish English: Design, description and application. Teanga 21: 5–29.
Franken, N. 1997. Vagueness and approximation in relevance theory. Journal of Pragmatics 28: 135–151.
Heritage, J. 1997. Conversational analysis and institutional talk: Analysing data. In Qualitative Research: Theory, method and practice, D. Silverman (ed.), 161–183. London: Sage.
Jarvis, J. & Robinson, M. 1997. Analysing educational discourse: An exploratory study of teacher response and support to pupils’ learning. Applied Linguistics 18(2): 212–228.
Jefferson, G. 1990. List construction as a task and resource. In Interaction Competence, G. Psathas (ed.), 63–92. Lanham MD: University Press of America.
Jucker, A. H., Smith, S. W. & Lüdge, T. 2003. Interactive aspects of vagueness in conversation. Journal of Pragmatics 35: 1737–69.
Kumaravadivelu, B. 1999. Critical classroom discourse analysis. TESOL Quarterly 33(3): 453–484.
Macaulay, R. K. S. 1991. Locating Dialect in Discourse: The language of honest men and bonnie lasses in Ayr. Oxford: OUP.
McCarthy, M. J. 2003. Talking back: ‘Small’ interactional response tokens in everyday conversation. In Special issue of Research on Language and Social Interaction on ‘Small Talk’, J. Coupland (ed.), 36(1): 33–63.
McCarthy, M. J. 2002. Good listenership made plain: British and American non-minimal response tokens in everyday conversation. In Using Corpora to Explore Linguistic Variation, R. Reppen, S. Fitzmaurice & D. Biber (eds), 49–71. Amsterdam: John Benjamins.
McCarthy, M. J. 1998. Spoken Language and Applied Linguistics. Cambridge: CUP.
McCarthy, M. J. & Carter, R. A. 1995. Spoken grammar: What is it and how do we teach it? ELT Journal 49(3): 207–218.
Mervis, C. B. & Rosch, E. 1981. Categorization of natural objects. Annual Review of Psychology 32: 89–115.
O’Keeffe, A. 2006. Investigating Media Discourse. London: Routledge.
O’Keeffe, A. 2003. ‘Like the wise virgins and all that jazz’ – Using a corpus to examine vague language and shared knowledge. In Applied Corpus Linguistics: A multidimensional perspective, U. Connor & T. A. Upton (eds), 1–20. Amsterdam: Rodopi.



O’Keeffe, A., McCarthy, M. J. & Carter, R. A. 2007. From Corpus to Classroom: Language use and language teaching. Cambridge: CUP.
Overstreet, M. & Yule, G. 1997a. On being explicit and stuff in contemporary American English. Journal of English Linguistics 25(3): 250–58.
Overstreet, M. & Yule, G. 1997b. Locally contingent categorization in discourse. Discourse Processes 23: 83–97.
Pavlenko, A. & Lantolf, J. P. 2000. Second language learning as participation and the (re)construction of selves. In Sociocultural Theory and Second Language Learning, J. P. Lantolf (ed.), 155–178. Oxford: OUP.
Rosch, E. 1978. Principles of categorization. In Cognition and Categorization, E. Rosch & B. Lloyd (eds), 27–48. Hillsdale NJ: Lawrence Erlbaum.
Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M. & Boyes-Braem, P. 1976. Basic objects in natural categories. Cognitive Psychology 2: 491–502.
Scott, M. 1999. Wordsmith Tools Software. Oxford: OUP.
Seedhouse, P. 2004. The Interactional Architecture of the Second Language Classroom: A conversational analysis perspective. Oxford: Blackwell.
van Lier, L. 1988. The Classroom and the Language Learner. London: Longman.
Walsh, S. 2006. Investigating Classroom Discourse. London: Routledge.
Walsh, S. 2002. Construction or obstruction: Teacher talk and learner involvement in the EFL classroom. Language Teaching Research 6: 1–23.
Walsh, S. & O’Keeffe, A. 2007. Applying CA to a modes analysis of third-level spoken academic discourse. In Conversation Analysis and Languages for Specific Purposes, P. Bowles & P. Seedhouse (eds), 101–139. Frankfurt: Peter Lang.
Ward, G. & Birner, B. 1992. The semantics and pragmatics of “and everything”. Journal of Pragmatics 19: 205–214.
Wenger, E. 1998. Communities of Practice: Learning, meaning and identity. Cambridge: CUP.
Zhang, Q. 1998. Fuzziness – vagueness – generality – ambiguity. Journal of Pragmatics 29: 13–31.

Appendix

Transcription conventions

speaker turn, e.g. = speaker 1, = speaker 2 etc. in order of ‘appearance’ on the recording.
... marks uncertain or unintelligible utterances where the number of syllables cannot be guessed. Where the number of syllables can be discerned, this number is marked, e.g. denotes two intelligible syllables.




Emphatics in academic discourse
Integrating corpus and discourse tools in the study of cross-disciplinary variation

Marina Bondi
University of Modena and Reggio Emilia, Italy

The role played by mitigation in academic discourse has been widely debated in the literature, but little attention has been paid to emphatics, expressions used to intensify the degree of certainty of an utterance and to increase its illocutionary force. Focusing on the use of adverbs in journal articles and on their evaluative orientations/parameters, the chapter looks at how their frequencies, meanings and uses vary across two “soft” disciplines: history and economics. The study combines a corpus and a discourse perspective, and shows that emphatics signal “engagement” as well as “stance”, by positioning research in the context of disciplinary debate, highlighting the significance of the data or the conclusions produced, negotiating convergent or conflicting positions with the reader.

.

Introduction

This chapter is part of a wider study which aims at investigating the role played by stance markers (e.g. adverbs like actually, definitely, apparently) in academic discourse. Great interest has been shown in redefining the interactive level of discourse in the light of a plurality of analytic models of evaluative elements of discourse (e.g. Hunston & Thompson 2000). In its broadest definition, evaluation is understood as “the expression of the speaker or writer’s attitude or stance towards, viewpoint on, or feelings about the entities or propositions that he or she is talking about” (Hunston & Thompson 2000: 5). Included in this definition are forms of modality as well as a vast range of instruments of metadiscourse aimed at organising the discourse, constructing and maintaining relations between the speaker/writer and the listener/reader, as well as reflecting the value-system of the speaker and the discourse community he or she is part of. The analysis of evaluation thus links up with the area of studies on metadiscourse (e.g. Vande Kopple 1985; Crismore 1989; Hyland 1998a; 1998b; 2005). If early classifications of the metadiscursive elements seemed to keep the textual and interpersonal functions (e.g. Halliday 1985) rigidly separate, more recent studies seem instead to focus on the overlapping of these two types of functions (e.g. Conrad & Biber 2000). A similar point has been made by Swales and Burke (2003: 4) in their study on the combination of polarised – i.e. “strongly positive” and “strongly negative” – v. centralized – i.e. “more neutral” – evaluative adjectives with intensifiers across academic registers. In particular, they note that the co-occurrence of these elements may reveal interesting rhetorical effects, e.g. increasing the interpersonal orientation of merely ideational statements.

The analysis of stance adverbials – adverbials “commenting on the content or style of a clause or a particular part of a clause” (Biber et al. 1999: 853) – could be a case in point. From the point of view of Hunston and Thompson (2000), they can be defined as adverbials expressing the writer’s opinion as to entities or propositions in the text. The view taken here is that adverbials of stance do not only enable monologic discourse to be evaluative, but they also often assume a common ground between reader and writer in terms of what is regarded as scientifically ‘good’ or ‘bad’ at any given point in the discourse. Adverbials can thus contribute to the dialogic and argumentative features of academic discourse by constructing this common ground between reader and writer. By doing this, they also contribute to the organization of discourse and to the representation of conflict and negotiation within the discourse community.

Emphatics, in Crismore’s sense (1989), or boosters, for Hyland (1998b, 2000b), are for the most part adverbs and adverbials which attribute an increased force or authority to statements: i.e. expressions used to emphasize a statement, intensifying the degree of certainty expressed and increasing its illocutionary force.
This aspect has been extensively dealt with by Wierzbicka (2006) in her study of the wider cultural implications of the use of epistemic adverbs in modern English. In particular, Wierzbicka (2006: 270) illustrates the semantic peculiarities of the “confident adverbs” evidently, clearly and obviously, which appear to express varying degrees of writers’ confidence towards their own statements. Along with hedges (or mitigators), emphatics communicate both interpersonal and ideational information, allowing writers to convey judgments with greater accuracy and situate their positions in relation to knowledge and truth claims. Since new research gains approval because it is able to negotiate accepted views and ideas with those as yet unaccepted or unknown, these stance devices play an essential role in academic discourse, as communicative strategies for increasing or reducing the force of a statement, conveying conviction or caution, etc. in order to get the researcher’s views across in a convincing manner. A considerable amount of literature has been dedicated to the question of mitigation, both in studies of general communication and in the specific domain of academic discourse (e.g. Myers 1989; Hyland 1998a; Markkanen & Schroeder 1997). More recently, Poos and Simpson (2002: 17) integrate the traditional view of hedges as signals of modesty and/or uncertainty, by showing that they may also reflect the equally important pragmatic function of displaying “more solidarity” towards “less academically indoctrinated” interlocutors.

It is readily conceded that the semantic features and pragmatic functions of emphatics deserve closer study (Hyland 2000a, 2000b; Biber et al. 1999; Conrad & Biber 2000; Precht 2003). Yet relatively few studies have addressed their role. The definition adopted for emphatics is purposely extensive due to an interest in exploring the function of these textual elements from a text-pragmatic and argumentative point of view. For the purpose of this chapter, emphatics can be defined as expressions used to increment the degree of certainty and increase or strengthen the illocutionary force of the statement. They also attribute a truth-value or importance to what is being emphasized. The category includes a variety of tools. The most obvious forms are adverb(ial)s: As everyone knows, as we can plainly see, undoubtedly, etc. But similar functions can also be realized by superordinate projecting/inference frameworks like those analyzed by Hyland and Tse (2005) in their study of “evaluative that”: It is generally agreed that, we believe that, the key issue is, this shows, etc. All these tools share some basic pragmatic functions. First of all, they variously foreground the writer’s degree of endorsement of a statement and the degree of universality of the related belief. This is why Hyland (e.g. 2004: 16) classifies boosters primarily as expressions of stance rather than engagement. It will be easily recognized, however, that adverbs like certainly may be primarily expressions of writer’s stance, but they also tend to limit the reader’s possibility to disagree, thus becoming tools for reader’s positioning or engagement resources, i.e.
tools by which writers adjust and negotiate the arguability of their utterances to their interlocutors (Hyland 2001, 2005).¹ Similarly, when looking at classifications of evaluation parameters/orientations, we will notice that there is wide convergence on some meaning areas, but also that distinctions can be blurred. Following Conrad and Biber (2000: 57), we may distinguish epistemic stance – commenting on the certainty (or doubt), reliability, or limitations of a proposition, including comments on its source – from attitudinal stance – conveying the speaker’s attitudes, feelings, or value judgements – but we may still recognize that the two are part of the same meaning area (stance, or evaluation) and that distinctions will not always be clear-cut. It may not be possible, useful or accurate to distinguish the writer’s judgment about the certainty, reliability, and limitations of the proposition from her/his attitude or value judgment about the proposition’s content (Silver 2003). As Hyland has demonstrated for hedges, “in actual use the epistemic and affective functions of hedges are often conveyed simultaneously” (1998a: ix), preventing the formation of discrete descriptive categories. A similar case can be made for emphatics.²

Approaches to evaluation that emphasize continuity across the epistemic/attitudinal divide have often offered alternative classifications of evaluative meanings. Thompson and Hunston’s basic parameters of evaluation (the main semantic areas in which evaluation can be placed) are those of certainty, expectedness and importance or relevance (Thompson & Hunston 2000: 23–24). Lemke’s evaluative orientations also include three similar relevant categories: (a) warrantability/probability, as exemplified by adverbs like certainly and undoubtedly; (b) usuality/expectability, as in invariably; (c) importance/significance, as in significantly (Lemke 1998: 37).

The study presented in this chapter focuses on the role of emphatics in academic discourse in two disciplines. This is done by looking at variation in frequencies, meanings and functions of selected adverbs, as signals of the argumentative structures of research articles in history and economics. The next section of the chapter provides a brief presentation of the material used for the study, as well as of the methodology adopted, ranging from genre analysis (with the identification of textual and generic structure) to corpus tools (with the study of lexicalizations in context). The results of the analysis will start with a preliminary overview of variation in frequency data and move on to an examination of syntactic scope, semantic preference and textual patterns of selected items spanning the range of evaluative orientations listed above (undoubtedly, certainly, invariably, significantly). The discussion of the data will focus on differences in the two disciplines and on their variety and approaches.

¹ Using writer-orientation and reader-orientation as a basic classification tool, Merlini Barbaresi (1987: 4–6) identifies a significant semantic and pragmatic difference in the use of different emphatics in argumentative discourse. Focusing on the difference in functions of obviously and certainly, she sees the former as essentially an “epistemic modifier”, and the latter as more of an “indicator of inferability”. The two differ significantly in orientation, with the epistemic modifier being locutor-oriented and the indicator of inferability being essentially receiver-oriented.

² See also Precht (2003) for an emphasis on the relations between expressions of affect, evidentiality and hedging.

Methods and material

Methodological preliminaries

The study of academic discourse across disciplines, as outlined for example by Hyland (2006), is inherently comparative. Although not always explicit, comparison seems to be the main tool for research in language varieties: even when focusing on a single register, analysis evokes comparison as the basis for any conclusion on the specificity of a language variety. Explicit comparison, on the other hand, does not just substantiate claims about distinctive features: it also helps bring out elements of variation. It is a heuristic procedure, as well as an important support for claims.

The methodological aim of this chapter is to explore how comparative analysis of disciplinary variation is enhanced by integrating corpus and discourse tools. Combinations of both perspectives are advocated and practiced by many, especially in the area of academic and professional discourse studies (e.g. Biber et al. 1998; Biber et al. 2004; Connor & Upton 2004; Hyland 2000a, 2002, 2004; Del Lungo & Tognini Bonelli 2004; Tognini Bonelli & Del Lungo 2005). In a wider framework, a major figure like John Sinclair has played a leading role in developing theory and practice in both fields (e.g. Sinclair 2004). The reasons for choosing not to separate the two approaches, however, may help clarify what is desirable in their integration. A discourse perspective draws attention to how interaction and argument are instantiated in textual practices which are recognized and continually redefined by discourse communities. A corpus perspective looks at words in combination and finds in phraseology the ideal starting point for the exploration of the systematic relation between text and form (Sinclair 2005). Defining one’s own object of analysis from both points of view helps relate textual practice to language choice. It is one way of making sure that statements about genre and discourse are substantiated with reference to data: attention to patterns of form highlights the existence of systematic relations and trends, besides possibilities. Integration of both perspectives, however, also ensures that corpus data are not just described, but interpreted in terms of verbal action and textual structures, beyond immediate lexico-semantic associations.

The choice to start from a discourse or a corpus perspective should not be taken as a methodological statement in favour of a specific direction. Quite the opposite: corpus tools can be seen as both “catalyzing” or supporting the analysis and the interpretation in terms of discourse, and vice-versa. The presentation of the analysis may have to follow a specific sequence, but this is mainly due to the linearity of text. The interrelation of the two perspectives should be seen as a dialogic sequence, where corpus and discourse – just like participants in interaction – co-construct the development of the research process.

Material

The analysis is based on two specialized corpora of journal articles, taken to be representative of research writing in two different disciplines: economics and history. The corpora are about 2.5 million words each and include all the articles published in ten journals for each disciplinary area over the course of two years (1999–2000). The journals are listed in Table 1 below, together with the acronyms that identify them in the examples that follow.




Table 1. List of journals included in the two corpora and their acronyms

Economics
European Economic Review (EER)
European Journal of Political Economy (EJoPE)
History of Political Economy (HOPE)
International Journal of Industrial Organization (IJoIO)
International Review of Economics and Finance (IRoEF)
Journal of Corporate Finance (JoCF)
Journal of Development Economics (JoDE)
Journal of Economics and Business (JOEB)
Journal of Socio-Economics (JSE)
The North American Journal of Economics and Finance (NAJEF)

History
Labour History Review (LHR)
Historical Research (HR)
Gender & History (GH)
Journal of European Ideas (JEI)
Journal of Medieval History (JMH)
Journal of Interdisciplinary History (JIH)
Journal of Social History (JSH)
Studies in History (SH)
American Quarterly (AQ)
American Historical Review (AHR)

The corpus design aims at a description of English as an international language, rather than a specific geographical variety of English. Thus mostly international journals, published both in the UK and in the US, were included. No attempt was made to separate native from non-native speakers/writers: the aim of the analysis was not to prescribe purity in writing, but to describe what is published in a variety of well-established journals in the community of historians and economists over a range of subdisciplines. The perspective adopted for the analysis paid attention both to the rhetoric and organization of text in discourse and to the language resources and the meanings realized in text. The methodology combined tools from discourse analysis and corpus linguistics. From discourse studies, the notion of genre – defined by Swales (1990: 68) as a class of communicative events sharing a common purpose – and the notion of units identified by their pragmatic function were used. When focusing on the lexical tools that allow academic writers to introduce emphasis, tools from corpus linguistics were used: in particular keywords, concordances, collocates and clusters, i.e. repeated strings of words as defined by Scott (1998). The first step of the study was an analysis of frequency data: this was meant to provide an overview of quantitative variation. The next section reviews the most common emphatic adverb(ial)s appearing in the research articles in the two fields of study and makes a few initial hypotheses as to what their use may be an indication of. The overview is based on keywords, as defined in Wordsmith Tools (Scott 1998), i.e. words that are unusually frequent or infrequent in one corpus or text when compared with a reference corpus. Key-ness indices based on comparing statistical frequencies are a measure of how much one word characterizes a corpus as against another.
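The keyword comparison described above can be illustrated with a minimal sketch. It assumes a log-likelihood statistic, which is one common choice for key-ness; the chapter does not say which statistic or settings were used, and this is not WordSmith Tools' own code. The function names `log_likelihood` and `keywords`, the `min_freq` threshold and the two toy word lists are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def log_likelihood(freq_a, total_a, freq_b, total_b):
    """Log-likelihood (G2) score for one word observed in two corpora."""
    combined = freq_a + freq_b
    grand_total = total_a + total_b
    score = 0.0
    for observed, total in ((freq_a, total_a), (freq_b, total_b)):
        expected = total * combined / grand_total
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2.0 * score


def keywords(tokens_a, tokens_b, min_freq=2):
    """Rank words that are relatively more frequent in corpus A than in B."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = len(tokens_a), len(tokens_b)
    rows = []
    for word, freq_a in counts_a.items():
        freq_b = counts_b.get(word, 0)
        # Keep only sufficiently frequent words that are over-represented in A.
        if freq_a < min_freq or freq_a / total_a <= freq_b / total_b:
            continue
        per_hundred = 100.0 * freq_a / total_a  # normalized frequency
        rows.append((word, freq_a, per_hundred,
                     log_likelihood(freq_a, total_a, freq_b, total_b)))
    # Descending order of key-ness, as in a keyword list.
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Toy data standing in for the two corpora (assumption, not real corpus text).
economics = "prices typically rise and output typically falls".split()
history = "the king certainly knew and the court certainly agreed".split()
for word, freq, per_hundred, score in keywords(economics, history):
    print(f"{word}\t{freq}\t{per_hundred:.2f}\t{score:.2f}")
```

Swapping the two arguments gives the keywords of the second corpus against the first, which mirrors the comparison of each discipline against the other.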



The bulk of the study is based on the analysis of concordances. After having identified a few significant adverbs to be used for closer study, the co-text of the nodes was analyzed with a view to their: a) syntactic roles, i.e. the scope of the adverb and its patterns of pre/post-modification; b) lexico-semantic patterns, i.e. patterns of collocation and semantic preference: “entities” and “processes” emphasised; c) textual patterns, i.e. pragmatic functions, argumentative moves and position in linear units of the text (introduction-body-conclusion). Their use was analyzed in the development of three logico-argumentative positions: an inferential position which pieces ideas or arguments together through verbal relations of analogy, cause-effect, specification, generalization, etc., a contrastive position which places ideas in opposition, and a concessionary and contrastive (or contrastive and concessionary) position which attenuates a contrast through partial acknowledgment or acceptance of the oppositional idea or argument. The final analysis regarded how these emphatics fit into textual patterns. To facilitate the task, the focus was exclusively on the most general distinction between introduction, body and conclusion of the article. By noting how and where the adverbs are called upon to intervene when placed in the text, certain generalizations about the strategic function of the emphatics in highlighting disciplinary differences can be put forth.
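The concordance view on which this kind of analysis rests can be sketched as a simple keyword-in-context (KWIC) routine. This is a minimal illustration rather than the tool used in the study: the function name `kwic` and the `width` parameter are hypothetical, and the sample sentence is a shortened version of one of the chapter's own history examples.

```python
import re


def kwic(text, node, width=40):
    """Return (left co-text, node, right co-text) for each hit of `node`."""
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(0), right))
    return lines


# Sample sentence shortened from one of the chapter's history examples.
sample = (
    "Most obviously, it is a catalogue of books published in London, "
    "and thus significantly excludes important publishers in Glasgow "
    "and Edinburgh."
)

for left, node, right in kwic(sample, "significantly"):
    # Right-align the left co-text so that the node words line up.
    print(f"{left:>40} [{node}] {right}")
```

Aligning the node word in a fixed column is what makes recurrent patterns to its left and right (pre-modifiers, collocating verbs) visible at a glance.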

Results and discussion

Comparing frequencies: Keywords of economics and history

The preliminary overview of the study was carried out by identifying keywords through comparison of the two corpora. The adverbs in Tables 2 and 3 are listed in descending order of key-ness. They help provide a first general idea of the emphatics which are used most regularly in the two disciplines, and offer a sense of adverbial variation. Since no initial attempt was made to exclude less influential adverbs, the list contains all manner adverbs in our sub-corpora which conform to the key-ness criteria and reports their frequency in the corpus together with normalized figures (occurrence per hundred words) and key-ness index. From an initial, cursory analysis of the adverbs listed, a number of these can be said to function rarely, or never, as sentence adverbs (see below for the distinction in terms of syntactic role and scope). In economics, for instance, only significantly and typically really distinguish themselves as emphatics: typically is probably the most interesting one from the point of view of its direct relationship to the abstracting needs of a social science like economics (cf. also Bondi 2002), but significantly is also clearly related to methodological issues, in particular to the definition of statistical significance,





Table 2. Keywords in economics

Adverb          Frequency in   % in        Frequency in   % in      Key-ness
                Economics      Economics   History        History   index
significantly   814            0.03        214            –         350.5
positively      331            0.01        32             –         276.1
substantially   225            –           41             –         133.6
unambiguously   80             –           9              –         62.6
typically       313            0.01        134            –         67.4
perfectly       215            –           81             –         58.2
fully           400            0.02        245            0.01      32.3

Table 3. Keywords in history

Adverb          Frequency in   % in      Frequency in   % in        Key-ness
                History        History   Economics      Economics   index
certainly       523            0.02      144            –           246.4
especially      833            0.03      372            0.01        198.0
particularly    837            0.03      412            0.02        163.3
throughout      527            0.02      206            –           157.3
increasingly    429            0.02      143            –           160.1
really          244            –         67             –           113.5
entirely        322            0.01      117            –           107.1
largely         391            0.02      171            –           96.6
undoubtedly     139            –         31             –           80.2
inevitably      112            –         29             –           55.2
thoroughly      72             –         13             –           47.3
surely          150            –         59             –           44.3
evidently       88             –         23             –           43.0
predominantly   76             –         19             –           38.7
invariably      72             –         17             –           38.6
clearly         707            0.03      515            0.02        37.6

which seems to blur the distinction between the parameters of expectedness and significance: statistical significance is indeed based on expectedness. In history, although the keyword variety of adverbs is much greater, those which have a more extensive scope are certainly, undoubtedly, evidently, invariably and clearly, once again covering the whole range of parameters (certainty, expectedness, importance). Other adverbs, such as largely, thoroughly and especially, do not normally function as sentence adverbs.

Even a rough overview like this can be related to disciplinary variation. Interpreting frequencies in the light of disciplinary values may suggest that economics tends to place emphasis on a simplification of reality based on a process of abstraction (typically) and on statistics (significantly), whereas history places emphasis on frequency and accumulation of factual data (usually, largely, inevitably, thoroughly, invariably etc.), as well as their interpretation (as shown by a variety of epistemic markers). This in turn may be related to an emphasis on the singularity of events (Holmes 1997) or to forms of divergence from the tendency to abstract and generalize shown by other disciplines.

Ambiguity of functions: Focus on syntactic role/scope

The study of the syntactic role and scope of adverbs often presupposes a basic distinction between emphasizers and intensifiers. Intensifiers are degree adverbs with a grading function; they are defined as degree adverbs that “scale upwards from an assumed norm” (Quirk et al. 1985: 445) or neutral point. Some modify gradable adjectives and indicate degrees on a scale (e.g. extremely cautious), while others indicate an endpoint on a scale (e.g. totally different) (Biber et al. 1999: 554–555). Emphasizers, on the other hand, contribute to the expression of modality or stance: they add to the force of the modified predicate and their syntactic scope extends over the whole predicate; they strengthen the illocutionary point of the utterance and signal that what is being emphasized is taken to be true and/or important. In expressing the semantic role of modality, emphasizers have a reinforcing effect. They add to the force as opposed to the degree of the modified predicate. As such, according to Quirk et al. (1985: 583), they do not require a gradable predicate. This does not necessarily rule out the notion that emphatics produce a semantic effect which may be similar to that of intensifiers. Moreover, when the emphasizer occurs with a gradable predicate, it “takes on the force of an intensifier” (Quirk et al. 1985: 583). The basic functions of expressing stance and degree can thus be seen to overlap.

Another element which generally distinguishes emphasizers from intensifiers is their syntactic scope. It is recognized that emphasizers may take scope over the predicate or the whole sentence, while intensifiers do not (Quirk et al. 1985). And yet, there seems to be no fool-proof way of discriminating between the two in this sense either, since although intensifiers demonstrate a reduced scope, there is no set scope of ‘emphatics’, which vary greatly in correspondence with their pragmatic and argumentative roles. As Merlini Barbaresi (1987: 19) points out with respect to epistemic modifiers (e.g. certainly, inevitably, no doubt, incontestably), their argumentative scope and force of assertion “are directly proportional to the 1) degree of subjectivity/objectivity of the thesis, 2) relevance assigned to the thesis in the argumentative line, whether micro- or macro-structural”.

When focusing on variation of syntactic role and scope of adverbs in our corpus, we sometimes notice a considerable difference from case to case. If initial position is mostly an indicator of a scope extending to the whole sentence, mid-position may be more ambiguous. Example (1) provides an illustration: unquestionably clearly functions as a sentence adverb (an emphasizer), whereas




undoubtedly may be interpreted both as a modifier (whose scope is limited to the following adjective, clear) and as an adverbial of stance qualifying everything that follows.

(1) Unquestionably, the success of A History of Women, edited by Michelle Perrot and Georges Duby, as well as the substantial audience for the journal Clio, are undoubtedly clear indications of the importance of this history which has now outgrown its marginal stage. (GAH)

The scope of the adverbial may actually extend beyond the sentence, as it may participate in a macro-textual pattern. In Example (2), most obviously has a strong anaphoric quality, referring to a predictive marker (Tadros 1985) appearing earlier on, at the end of the previous sentence (several different ways). Significantly, on the other hand, functions as an adverbial modifying the verb and its object.

(2) . . .the sample is skewed in several different ways. Most obviously, it is a catalogue of books published in London, and thus significantly excludes important publishers in Glasgow and Edinburgh, like William Collins and W. & R. Chambers. In addition, the Publishers’ Circular, from which the data are drawn, did not provide either a full or a representative sample of publications. (SIH)

Example (2) shows that thematic position often extends the scope of the adverb and gives it a cohesive function: thematized adverbs do not simply extend their scope forward, but they also signal the relationship between the syntactic unit they introduce and the previous text. Table 4 below offers an overview of quantitative patterns of four selected adverbs in our corpus. The table clearly shows that, in the case of these four adverbs, thematic position is on the whole much more frequent in history, but it also shows that economics and history may tend to favour different types of emphatics in thematic position: economics favours certainty adverbs, whereas history ranges rather equally over the three parameters of certainty, expectedness and importance.

When looking at the same adverbs used as intensifiers, we get a complementary picture. Table 5 below shows how often the four adverbs under investigation are used to modify adjectives or adverbs. Economics clearly prefers this pattern, with the more limited scope of the adverb, and the most obvious trend is actually that of significantly, which is extremely frequent in economics, but mostly as intensifier of adjectives and adverbs, in contexts which make clear reference to statistical significance.

Table 4. Sentence adverbs: Initial position

Parameter of evaluation   Adverb          Economics         History
Certainty                 certainly       14/144 (9.7%)     91/523 (17.4%)
Certainty                 undoubtedly     3/31 (9.7%)       16/139 (11.5%)
Expectedness              invariably      1/17 (5.9%)       16/72 (22.2%)
Importance                significantly   3/814 (0.4%)      38/214 (17.7%)

Table 5. Modifying adj./adv.

Parameter of evaluation   Adverb          Economics         History
Certainty                 certainly       27/144 (18.7%)    65/523 (12.4%)
Certainty                 undoubtedly     2/31 (6.5%)       21/139 (15.1%)
Expectedness              invariably      3/17 (17.6%)      8/72 (11.1%)
Importance                significantly   569/814 (69.9%)   44/214 (20.5%)

A look at pre-modification elements also reveals interesting patterns. On the whole, pre-modification of these adverbs is rather limited. Some adverbs (e.g. undoubtedly) are never pre-modified, while others (invariably, certainly) may occasionally be graded; in this case, the tendency is once again for history to favour a wider range of shades. Once again, however, the very frequent use of significantly in economics reveals a peculiar pattern in the high incidence of negative contexts.

Table 6. Pre-modification of emphatic adverbs

Adverb          Economics                     History
significantly   not (94/814), quite (3/814)   not (5/214), more (10/214), most (17/214)
invariably      almost (2/17)                 almost (17/72)
certainly       almost (4/144)                almost (40/523)
undoubtedly     0/31                          0/139
UNMODIFIED      1011 (90.7%)                  751 (89.4%)

As a general rule, pre-modification of adverbs may be related to cases of “polarization” and to parameters of evaluation. Undoubtedly is clearly the most “polarized” of our adverbs here: it does not accept shades of ‘undoubtedness’. Almost still combines with more polarized elements like invariably, especially in history. Significantly and markers of importance in general are less “polarized” than markers of warrantability and usuality; they have a higher “grading” function: things can be more or less relevant, more or less important. The majority of uses of significantly in economics, however, have statistical significance as their object, and statistical significance is typically used to establish when frequencies start being more or less significant: once the parameters are set, frequencies either have or do not have statistical significance. Many of the numerous negative occurrences of significantly have this particular meaning, and occur in patterns of analogy and contrast, as illustrated in Example (3):

(3) The coefficients of the variables denoting trade agreements indicate that trade between the countries in the EEA zone is significantly greater (at the 1% level) than average OECD trade. The same holds for trade between Australia and New Zealand. The NAFTA agreement does, however, indicate that trade between Canada, Mexico and the USA is not significantly greater than average OECD trade. Finally, the EU countries’ custom union with Turkey seems to have had a positive influence on trade. (EJOPE)
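The pre-modification counts discussed above could be approximated automatically by looking at the word immediately to the left of each occurrence of the adverb. The sketch below is only an illustration under that assumption: the function name `premodifier_counts` and the `GRADERS` set are hypothetical, the set simply lists the pre-modifiers reported in Table 6, and the sample is a shortened version of two clauses from Example (3).

```python
import re
from collections import Counter

# Pre-modifiers reported in Table 6 (used here as a closed list).
GRADERS = {"not", "quite", "almost", "more", "most"}


def premodifier_counts(text, adverb, graders=GRADERS):
    """Count the word immediately preceding each occurrence of `adverb`.

    Preceding words outside `graders` are pooled under "UNMODIFIED".
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != adverb:
            continue
        previous = tokens[i - 1] if i > 0 else None
        counts[previous if previous in graders else "UNMODIFIED"] += 1
    return counts


# Two clauses shortened from Example (3).
sample = (
    "trade between the countries in the EEA zone is significantly greater "
    "than average OECD trade. trade between Canada, Mexico and the USA is "
    "not significantly greater than average OECD trade."
)
print(premodifier_counts(sample, "significantly"))
```

A one-word window is a deliberate simplification: a sequence such as not statistically significantly would be counted as unmodified, so counts obtained this way would still need to be checked against the concordance lines.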

The specific use of significantly in economics also involves a highly formulaic use of the expression. A look at 5-word clusters (strings of word forms) in the concordance reveals frequent occurrences of “chunks” or extended collocations of language including significantly, whereas no 5-word cluster could be found for the other adverbs. Examples of clusters in negative contexts only are particularly numerous, as shown in Table 7.

Table 7. Significantly: 5-word clusters in negative contexts

5-word cluster                                      Frequency
not significantly different from zero               29
is not significantly different from                 17
are not significantly different from                10
but not statistically significantly different       7
not statistically significantly different from      7
statistically significantly different from zero     7
does not significantly affect the                   3
is not significantly related to                     3
not significantly different across takeover         3
not significantly different from one                3
significantly different across takeover amendment   3
significantly different from zero for               3
significantly different from zero in                3
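The notion of a cluster as a repeated string of word forms can be illustrated with a short n-gram count. This is a sketch of the general technique, not a reproduction of the WordSmith Tools cluster routine: the function name `clusters` and its parameters are hypothetical, and the input is an invented toy text rather than corpus data.

```python
from collections import Counter


def clusters(tokens, n=5, min_freq=2):
    """Return every n-word sequence that occurs at least `min_freq` times."""
    grams = Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return {gram: freq for gram, freq in grams.items() if freq >= min_freq}


# INVENTED toy input, not corpus data; punctuation has been left out and
# the sketch does not stop clusters at sentence boundaries.
toy_text = (
    "the coefficient is not significantly different from zero "
    "the estimate is not significantly different from zero"
)
for gram, freq in sorted(clusters(toy_text.split()).items()):
    print(freq, gram)
```

Overlapping strings are counted separately, which is why Table 7 lists both not significantly different from zero and the partly overlapping is not significantly different from.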

This peculiarity of significantly can be seen even more clearly by focusing on how it is used as an intensifier. An analysis of the adjectives and adverbs it qualifies, summarized in Table 8, shows that economics, although using the expression remarkably more often than history, has a very similar range from a quantitative point of view. It also shows that the limited use of the adverb in history is mostly linked to comparative adjectives and verbs, whereas economics makes greater use of a fixed set of gradable adjectives.3

Table 8. Significantly as an intensifier

History
Comparative adjective/adverb: different (9), more (8), higher (6), lower (5), less (3), greater (3), shorter, lesser, harder, busier, larger
Gradable evaluative adjective/past participle: cautious, absent, beneficial

Economics
Comparative adjective/adverb: different (141), higher (59), lower (30), greater (28), more (24), less (23), larger (14), smaller (6), closer, below
Gradable evaluative adjective/past participle: positive (45), negative (35), exposed (16), procyclical, ameliorated

3. Figures in brackets in this and the following table represent frequencies (>1) of lexical items; when no figure is provided, the frequency is 1.

Once again the general trends studied in this section can be related to the nature of the disciplines when looking at variation from the point of view of crossdisciplinary comparison. Attention to abstraction in economics can be related to the dominance of reference to statistical norms and to a consequently limited use of emphasizers proper. Attention to detail and process in history may be related not only to the much higher and much more varied use of emphatics already noticed above, but also to the greater interest in shades of polarized elements and in the wider use of pre-modification.

Collocation and “semantic preference”

A closer study of collocation and patterns of semantic preference of the adverbials also reveals variation across disciplines. The analysis was restricted to three adverbials only (significantly, undoubtedly and invariably), each representing one parameter of evaluation (importance, certainty and expectedness respectively). Concordances were studied in order to identify preference for particular types of processes. On the whole, verbs of state (be) greatly outnumber other types of verbs. This is particularly the case with economics: in the case of undoubtedly, for example, about 43.33% of the occurrences collocate with be (61.54% of which in inferential patterns) whereas history is limited to 26.43% (equally divided between contrastive and inferential patterns). A very preliminary cross-disciplinary conclusion that can be drawn from this is that economics seems to privilege emphasis on claims, whereas history is more interested in emphasizing trends.

When focusing more specifically on process types, the two basic categories identified were processes of ‘change’ or ‘effect’ (increase, reduce, influence, etc.) and processes of ‘cognition’ or ‘exposition’ (relate, associate, explain, describe, etc.). Table 9 below provides an overview of the lexical items and the semantic areas involved for the three adverbs selected. The table illustrates marked differences in the preferences shown by the three adverbs.
Significantly is equally associated with processes of change and of exposition: reference to processes of change is dominant in both disciplines, but much more so in economics than in history. This tendency is even more noticeable if we keep in mind that economics, as we will see below, generally makes greater




Table 9. Semantic preference: process types

significantly
‘Change/effect’, History: differ (11), increase (7), alter (3), change (3), affect (2), expand (2), influence (2), enlarge (2), increase (2), alter, worsen, diverge, grow, expand, fall, improve, advance, depart
‘Change/effect’, Economics: increase (30), affect (29), change (24), reduce (13), influence (8), impact (5), decrease (3), increase (3), differ (2), vary (2), compensate, contribute, deviate, fluctuate, simplify, tilt, influence, expand, improve, alter, modify
‘Cognition/exposition’, History: contribute (15), correlate (4), draw on (2), figure in
‘Cognition/exposition’, Economics: relate (25), correlate (13), associate (5), explain, appreciate

undoubtedly
‘Change/effect’, History: influence (3), cause (2), contribute to, discourage, distort, enhance, produce, is responsible for
‘Change/effect’, Economics: contribute to (2), decrease (2), strengthen (2), help (2)
‘Cognition/exposition’, History: –
‘Cognition/exposition’, Economics: –

invariably
‘Change/effect’, History: –
‘Change/effect’, Economics: lead (2), produce, follow from
‘Cognition/exposition’, History: label (2), deem, describe, designate, portray, signify, know as, signify, emphasise as, appear as
‘Cognition/exposition’, Economics: –

use of metadiscursive contexts than history. Undoubtedly, on the other hand, is exclusively used with reference to change processes in our corpus. Furthermore, the table shows an interesting semantic preference for ‘exposition’ surrounding invariably in history texts. Data show that it is conveyed by the co-occurrence of the adverbial with verbal forms of ‘description’, such as labelled as, described as and appear as. However, it would also be appropriate to point out that this semantic preference goes hand in hand with an overall negative semantic prosody. Words which are shown to have a distinct semantic preference are sometimes affected in their meaning and they take such “aura of meaning” on themselves. Louw (1993), for example, shows that words like utterly – normally occurring in context with negative meanings – are heard as ironical when found in positive contexts. This is referred to as “semantic prosody”, and identified by Sinclair (1996: 87) as distinctly “attitudinal, and on the pragmatic side of the semantics/pragmatics continuum”. More specifically, we can see that 80% of



the terms creating the semantic preference of ‘description’ occur in contexts where the person or object concerned is qualified in an unpleasant way. This is illustrated in the two examples below:

(4) While wet nurses’ employers occasionally lauded their employees’ beneficial product, they invariably deemed wet nurses themselves impossibly troublesome – linking breastfeeding with immoral, unworthy women. (JOSH)

(5) Similarly, menopause is invariably described as a diminishment of a woman’s biological potential, not as a positive change and a redirection of the body’s biological resources. (GAH)

The notion of semantic preference can be extended to pragmatic units and other elements of the relevant co-text. One major dimension to be explored could be the world of reference or the plane of discourse of the co-text. When looking at how adverbs were used, one relevant issue was, for example, whether, irrespective of the specific process they qualified, they were used in statements on discourse and the community or in statements on data and their interpretation. This analysis was meant to confirm trends observed elsewhere (Bondi 2005; Bondi 2007), which highlight a greater role of self-referential statements in economics compared to history and a tendency for history to be focused on factual narrative. A look at general figures for undoubtedly shows that in economics the adverb is mostly used in statements on discourse and the community, which account for 22/31 occurrences (70.9%), whereas in history the same adverb mostly refers to statements on data and their interpretation: 93/139 occurrences (66.9%) qualify statements about the object of disciplinary study. Similarly, invariably tends to be associated with statements on discourse and the community in economics (12/17 occurrences, i.e. 70.5%), whereas it is mostly associated with the object of study in history (54/72 occurrences, i.e. 75%). The data confirm the interest shown by economics in highlighting statements on discourse and the community and by history in highlighting statements on data and their interpretation. The trend is also highlighted by other markers: metadiscursive occurrences of there is no doubt that. . . /it is significant that and similar phraseology suggest a slight tendency of economics to privilege statements about discourse and a slight tendency of history to privilege statements about data.
A look at phraseology also confirms that there is a clear tendency in historical discourse to make use of a much wider range of tools for emphasis, as shown in Table 10 below, where the frequencies of a number of phraseological options are given.

Table 10. Selected phraseology

Selected phraseology                                 Economics   History
it is significant that. . .                          3           14
There is / was no doubt. . . . . . that. . .         2           24
There is little doubt that                           1           6
There can be no doubt that. . .                      1           11
There can be little doubt that. . .                  0           11
There is no reason to doubt. . .                     0           3
There is, however, hardly any doubt that. . .        1           0

The quantitative data on these emphatics should be seen against the backdrop of general trends in disciplinary discourse. In general, emphasis on the discourse community and accepted methodologies is much greater in economics, whereas history emphasizes the reader’s direct contact with facts and their logical interpretation (cf. Bondi 2005). Use of emphatics, however, is quite clearly meant to attract the reader’s attention to issues that play a major role in the line of argument of the writer and is thus more often related to references to one’s own discourse or to debate within the disciplinary community.

Pragmatic and textual functions: Focus on significantly, invariably and undoubtedly

Emphatics clearly act as highlighters of key points in the line of argument of the writer. The functions they take on may often be related to their basic semantic potential and to the evaluative parameters they express. Significantly has been taken as an example of an adverb potentially referring to the parameter of importance, even if we have noticed that this is often interpreted in statistical terms in economics. A closer look at the co-text of significantly will show that, in economics, it is mainly used as a highlighter of significant findings, but there is also a clear association with other emphatics and metadiscourse signalling inference and claims. The adverbial often collocates with other forms of “self-projection” highlighting a shift from data interpretation to conclusion drawing. Among the collocates that precede it, we find numerous references to findings:

With respect to Conjecture 2B we find that behaviour is indeed significantly more competitive in Extra in the case of Cournot markets
It is interesting to note that Whites significantly improve their cognitive skills as they grow older. . .
Interestingly enough, * is significantly different from 1 at the 10% level for the 3-month data. . .
We tested this by one-tailed Mann-Whitney U-tests and found that average quantities are significantly higher in BASIC BERTRAND than in EXTRA BERTRAND. . .
The results are significantly modified if the demand for fiscal services is price-elastic.
The four main findings of the paper are: (1) survival patterns differ significantly across specific industries.
The implication is that redistribution fully or (in one case, significantly), compensates for the differences. . .



The results show that high-growth firms have significantly lower debt/equity ratios and dividend yields compared to. . .
. . .the evolutionary act of creating a paradigm can lead to a result that deviates significantly from that ideal state. . .
The table shows that in response to both more generous and longer benefits, the share of good jobs increases significantly.
. . .Table 13 indicates that the pre-succession performance of the nonfamily and outsider successor firms is significantly lower than that of the family successor firms.
More importantly, abnormal returns are significantly positive for firms that are below the median market value. . .
. . ., and we test whether PY is significantly higher in it than in the previous and following periods.
Our empirical findings indicate that the premium attached to voting stock is positively and significantly associated with the control value. . .
At the very least, our results are significantly impacted whether we use lagged, contemporaneous or forward managerial ownership levels.

Other collocations following the adverbial reveal that the findings are then used for inference drawing:

The relative value of commodities and the precious metals changed significantly. Thus the author of the hugely impressive study of the Spanish inflation estimated that. . .
The coefficients for the lagged variance terns are not significantly different form zero, suggesting that the sizes of current and previous period residuals are not strongly correlated.
. . .the estimated value of 1 is not significantly different from zero. This is consistent with our expectation that the relation should be weaker for firms in the low persistence environment.
Furthermore, low load factors [. . .] are significantly associated with less differentiation in departure times. Overall, it appears that the predictions of location models with exogenous prices are supported by the results from the 1975 data.
Given that most empirical estimates [. . .] are significantly less than 30, our analysis suggests that the stock market value is likely to be higher under a money rule, and. . .
. . .markups of high CR4 industries are significantly procyclical. There are, however, interesting differences in the dynamics of the response of markups across the monetary measures. . .
. . .the non-US country-specific portfolios are often found to be significantly exposed. These findings may be attributed to differing regulatory and supervisory requirements. . .
. . .beta significantly decreases. Further analysis indicates that the size of the pre-disclosure beta, the amount of the abnormal return, the market value of the equity of the type of firm significantly affect the difference between post and pre-disclosure betas.
None of the three sets of results show that education contributes significantly to individual wages. . .. This is sharply contrasted to the finding that the average rate of return to education is 12.8% for other Asian developing countries and 14.4% for all the developing economies that have been studied.




Debit card growth could significantly change consumer payment patterns. We provide an analysis of debit and potential debit users in Table 2.
The figure shows that the estimated correlation [. . .] fluctuates significantly throughout. Several explanations have been advanced for the sign of the correlation.
The difference [. . .] is significantly different [. . .]. One possible explanation for the slope sign change is that this point represents an equilibrium ownership condition.
. . .it is significantly different from zero [. . .] as well as one [. . .]. Therefore, we reject the hypothesis of the Nash behaviour (t-stat of 6.72) as well as Cartel behaviour (t-stat of 3.68).
The Spearman correlation [. . .] is significantly positive (1% level). Together, these findings support the hypothesis that the probability that a firm has a completely independent and active audit committee is positively related to firm size.
. . .we find that more information makes markets significantly more competitive, supporting the imitation hypothesis.
This simplifies the model significantly, but the assumption also carries some strong implications for the results.

An analysis of significantly in history provides similar results, but the pattern expands on a wider co-text, often requiring more than five lines of concordance co-text. By extending the context, it is easy to see that the main function for the adverb is to highlight significant findings, but also that the pattern is complicated by lists and narrative sequences:

More significantly, Homberg used his instrumental expertise to work out in practical terms Boyle’s concern with the material and transmutable elements of chemistry.
Thirdly, and more significantly, the cotton unions’ choice of constituencies to contest showed poor judgment.
And most significantly, industrialists’ fear of diminishing profits played and preyed upon the long-standing fear of unrestrained women.
Moreover, and perhaps more significantly, Bauer rejected emancipation despite his willingness to think of Jews in religious, rather than, for example, national terms.
Significantly, the submission and humility of Jesus is emphasized in the frescoes. The first scene depicts not the moment of the institution of the Eucharist but Jesus receiving Judas’s denial of his betrayal. In the second, he kneels to wash the Apostle’s feet. In the next. . .
Most obviously, it is a catalogue of books published in London, and thus significantly excludes important publishers in Glasgow and Edinburgh, like [. . .]. In addition, the Publisher’s Circular, from which the data are drawn, did not provide either a full or a representative sample of publications.
There is no reason to presume Bigelow’s use of ether differed significantly from the norm. Indeed, later in the century, Bigelow was an ardent defender of individualist therapeutics when reformers at Harvard wanted to increase the laboratory requirements in the medical curriculum.



. . . it is not surprising that he used London as a platform [. . .]. Similarly, it is not surprising that the proportion of Gladstone’s London speeches delivered while he was in office (43 per cent) was significantly greater than the proportion of for his speeches in the rest of Great Britain [. . .]. Her claim to the full authority of Augustus is most significantly expressed in the occasional use of male titles [. . .]. However, Kantorowicz’s conclusion on [. . .] seems over-optimistic. At least to some extent, they were able to transgress normal restrictions for women; most significantly in the jurisdictional capacity granted them, and also in the fact that teaching and spiritual guidance [. . .] could be seen as female prerogatives. For just as recourse to witnesses’ depositions was significantly more common in Exchequer, so too was the sending of issues of fact in equity cases to be tried before juries at common law.

Cross-disciplinary comparison thus seems to suggest that significant findings tend to lead to inferential reasoning in economics, whereas they become part of listing and contrastive patterns in history, problematizing data and highlighting claims.

If we consider invariably and undoubtedly, we can easily relate the meaning potential of each to the parameters of evaluation we started from: certainty, expectedness, importance. It is possible to relate invariably to the parameter of expectedness, where the credibility and value of an utterance is emphasized by the predictability and regularity of the trend qualified by the adverb. In the case of undoubtedly, on the other hand, the dominant parameter will be that of certainty, clearly related to the meaning potential of the adverb, with its explicit reference to epistemic stance.

One of the most common intra-sentential functions of invariably is that of highlighting consistency or inconsistency within a sentence, as shown in Examples (6) and (7).

(6) This practice is puzzling. If MNE and HC have similar discount rates, why does the reduction in tax rates invariably take this form rather than a uniform reduction over time? In Section 2 we argued that HC’s discount factor is typical. . .. (EJOIO)

(7) . . . many such lecture courses represented an important trend towards academic democratisation throughout the eighteenth century, both within and outside British universities. Not only were women involved in this trend but by the turn of the century such lectures were almost invariably open to them. For example ... (GAH)

When we look at the inter-sentential uses of the adverb, we notice that it acts as a predictive element in forms of prospection. It can be used, for example, in highlighting a generalization predicting a list of specific examples, as in Example (8).

(8) In Canada, Combined Universities CND argued that ‘the damaging of our children, and [of] countless generations to come, is nothing short of criminal. No one has the right to do these things’. In the United States, SANE’s dramatic newspaper ads almost invariably developed similar themes. ‘What are the risks of tests?’ asked its 10 April 1962 ad in the New York Times. It replied: ‘Radioactive fallout will increase, endangering our lives and especially the lives of our children’. Another SANE ad that year, featuring a pregnant woman, proclaimed: ‘1 1/4 Million unborn children will be born dead or have some gross defect because of Nuclear Bomb (GAH)

Quantitative analysis of these functions, highlighting patterns of analogy and contrast, as well as general-specific sequences, shows that invariably is used in interestingly different patterns across disciplines. Table 11 provides the basic figures.

Table 11. Pragmatic functions of invariably

Pragmatic functions of invariably       Economics       History
Highlighting consistency                8/17 (47.1%)    51/72 (70.8%)
Highlighting inconsistency              7/17 (41.2%)    9/72 (12.5%)
Predicting list of specific examples    2/17 (11.7%)    12/72 (16.7%)
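The contrast between the two disciplines in Table 11 can be made concrete as a ratio of consistency uses to inconsistency uses. The following is a minimal sketch that relies only on the counts printed in the table; the names `TABLE_11` and `consistency_ratio` are illustrative and not part of the original study.

```python
# Counts copied from Table 11 (uses of "invariably" per pragmatic function).
TABLE_11 = {
    "economics": {"consistency": 8, "inconsistency": 7, "predicting_list": 2},
    "history": {"consistency": 51, "inconsistency": 9, "predicting_list": 12},
}


def consistency_ratio(counts: dict[str, int]) -> float:
    """Return the number of consistency uses per inconsistency use."""
    return counts["consistency"] / counts["inconsistency"]


for discipline, counts in TABLE_11.items():
    print(f"{discipline}: {consistency_ratio(counts):.2f}")
# economics: 1.14
# history: 5.67
```

A ratio close to 1 corresponds to a near-equal split between the two functions, whereas a ratio well above 1 indicates a clear preference for highlighting consistency.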

The adverb is shown to be mostly used as a highlighter of patterns of consistency and inconsistency across the disciplinary spectrum, but a clear trend emerges showing a much greater interest of history in highlighting consistency of facts and processes, as against an almost equal distribution across consistency and inconsistency in economics.

A similar analysis of the functions of undoubtedly shows that it is often used to highlight that the writer is stating the obvious, as a premise/conclusion to further argument: not so much what should be known, as what should be easily inferred. Use of the adverb is often related to sequences of (a) explanation (cause/effect; general/specific); (b) matching/contrast. More specifically, we have identified a major function in emphasizing logical inference or specification, as illustrated in Examples (9) and (10).

(9) But the very insidiousness of the process made its causes harder to discern; Malestroit was undoubtedly misled into thinking debasement the more important problem, and it is Bodin who deserves credit for pinpointing the increase of precious metals as the real issue. (HOPE)

(10) ‘Some of the wordings in programmes and decisions of the Social Democratic Party seemed to be inspired by Kvinnors liv och arbete’, wrote Edmund Dahlström, one of its authors. Undoubtedly there were now clear connections being made between academic research and the political climate. (GAH)

The specification is often accompanied by contrast, so that the adverbial highlights inconsistency with a generalization offered and functions as a qualification of the general statement, as in Example (11).



(11) Inventors tend to file for patents if the expected benefit exceeds the cost. In efficient capital markets, the creator of a useful invention can borrow to finance the patent and its development. Women inventors, however, undoubtedly faced greater obstacles in obtaining funding for their inventions, and might not have been able to afford the patent fee and application process, which could amount to as much as $100 (about one-quarter of average annual non-farm wages in the late nineteenth century). (JOIH)

This contrastive element often constructs more complex sequences where undoubtedly acts very much as a marker of concession followed by contrast, as in Examples (12) and (13).

(12) Finally, it would be interesting to extend this model to an infinite-horizon setting. Although we believe that our two-period model captures the essential intertemporal tradeoffs that the central bank faces, an infinite-horizon environment undoubtedly would yield more general, and richer, sets of conclusions about the central bank’s instrument-choice problem in a real-world setting with no “concluding” period. We leave these and other interesting issues for future research. (JOEB)

(13) When a new crisis hits, the previous generation of models is judged to have been inadequate (p. 58). Undoubtedly, each crisis has certain distinctive features and peculiarities. However, in light of Rodrik’s observation, it is important to determine what – if any – common elements exist between some or all of these crises, and to develop a general framework that captures these important commonalties. (JOIH)

Both examples show quite clearly that use of emphatics does not simply signal writer’s stance, but also positions the reader, by showing temporary agreement with a claim that is then clearly refuted by what follows. The reader is offered recognition, but is also led to accept the writer’s claim. Quantitative analysis of the functions listed above shows that the contrastive meanings (which may be classified as more “reader-oriented”, or more dialogic, in that they presuppose different interpretations) and the inferential/specifying meanings (more “writer-oriented” or more monologic and focused on the internal logic of the exposition) are fairly balanced. See Table 12 for the data.

Table 12. Pragmatic functions of undoubtedly

Undoubtedly                                    Economics        History
Emphasising logical inference/specification    15/31 (48.4%)    78/139 (56.1%)
Emphasising contrast                           8/31 (25.8%)     32/139 (23.1%)
Concession and contrast                        8/31 (25.8%)     29/139 (20.8%)
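The balance between reader-oriented and writer-oriented uses can be checked by grouping the rows of Table 12. The sketch below assumes that both contrastive functions ("Emphasising contrast" and "Concession and contrast") count as reader-oriented and that logical inference/specification counts as writer-oriented; this grouping is one reading of the classification given in the text, and the names used are illustrative.

```python
# Counts copied from Table 12 (uses of "undoubtedly" per pragmatic function).
TABLE_12 = {
    "economics": {"inference": 15, "contrast": 8, "concession_contrast": 8},
    "history": {"inference": 78, "contrast": 32, "concession_contrast": 29},
}

# Assumption: both contrastive functions are treated as reader-oriented.
READER_ORIENTED = ("contrast", "concession_contrast")


def reader_oriented_share(counts: dict[str, int]) -> float:
    """Return the percentage of uses that fall under the contrastive functions."""
    total = sum(counts.values())
    reader = sum(counts[name] for name in READER_ORIENTED)
    return 100 * reader / total


for discipline, counts in TABLE_12.items():
    print(f"{discipline}: {reader_oriented_share(counts):.1f}% reader-oriented")
# economics: 51.6% reader-oriented
# history: 43.9% reader-oriented
```

Under this grouping the two orientations are close to evenly split in both disciplines, with the reader-oriented share somewhat higher in economics.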




Table 13. Textual position of undoubtedly and invariably

                           Economics                         History
Textual macro-structure    undoubtedly      invariably       undoubtedly        invariably
Introduction               4/31 (12.9%)     3/17 (17.6%)     9/139 (6.5%)       7/72 (9.7%)
Body                       22/31 (70.9%)    11/17 (64.7%)    117/139 (84.1%)    61/72 (84.7%)
Conclusion                 5/31 (16.2%)     3/17 (17.7%)     13/139 (9.4%)      4/72 (5.6%)

Reader-oriented use of undoubtedly can be seen to be slightly higher in economics, but on the whole there is no major difference across the disciplines in the balance between writer-oriented and reader-oriented uses of the emphatic. The last phenomenon studied was the distribution of adverbs in texts. Using a rough classification of the text sections into introduction, body and conclusion, the distribution of adverbials across the sections was studied. The quantitative data are reported in Table 13. One major drawback with this kind of calculation is that the introduction as such is much more clearly marked in economics than in history; on the whole, however, they do not differ significantly from a quantitative point of view. The patterns of variation they highlight can therefore be attributed some degree of reliability. Keeping in mind that introduction and conclusion correspond roughly to 10% of the whole text on average, the data can lead to an interpretation of variation: adverbs are distributed rather regularly in historical discourse, with a slight tendency for higher figures in openings, whereas in the economics corpus they are clearly more frequent in introductions and conclusions than in the body. The data thus show that, in economics, these two adverbials are more often used in sections which are also typically related to discussion of the literature and reference to the discourse community.

Conclusion

The analysis of frequencies and patterns has shown that the use of emphatics in history is much more varied and graded than in economics [note 4]. Economics is characterized by rather limited use of emphasizers proper, as well as by more formulaic use of language, whereas history has greater interest in shades of polarized elements and greater use of pre-modification. Patterns of semantic and pragmatic preference also reveal different trends in history and economics, both in terms of the processes emphasized and in terms of the world of reference of the moves emphasized, which are more frequently self-referential in economics. The analysis of moves in which emphasis is found also suggests that the emphasis may be placed on highlighting different features: the significance of findings, the ease of inferability or the dialogicity of patterns of contrast and concession. There is, however, noticeable variation across the span of emphatics considered.

Emphatics are shown to signal “engagement” as well as “stance”: in research-based genres, they contribute to positioning one’s research in the context of disciplinary debate and to highlighting the significance of the data or the conclusions produced by the writer, thus becoming resources by which the author negotiates (engages with) the various convergent or conflicting positions.

Lexical choices and patterns are also related to the epistemology of the disciplines examined. Economics, with its emphasis on simplification, abstraction, as well as contrastive sequences focusing on discourse participants, is clearly inspired by a “rhetoric of inquiry” which identifies well-defined sections in a research article, typically organized around the patterns inspired by natural sciences (Introduction, Methods, Results and Discussion). History, with its emphasis on accumulation and interpretation of factual data, as well as causal sequences focusing on the research object, is more clearly inspired by a “rhetoric of narrative”, where readers are confronted directly with data and sequences of events and processes.

Note 4. Comparison across disciplines, of course, may always be made problematic by the definition of the discipline itself, which can be identified at different degrees of delicacy and homogeneity. This does not, however, make the quantitative and qualitative differences observed less relevant.





Interaction, identity and culture in academic writing
The case of German, British and American academics in the humanities

Tamsin Sanderson
University of Freiburg, Germany

This chapter aims to illustrate one way in which corpus-linguistic methods and specialised corpora can be combined in work on academic discourse. It reports selected findings from a study of social interaction in research articles written by German, British and US-American humanities academics, based on the 1-million-word SCEGAD corpus. While the main interest of the project was in possible cultural differences in academic discourse, statistical analysis was used to examine the influence also of linguistic background, discipline, author age, status and gender on the construction of identity and the encoding of social relations in academic writing. The findings reveal significant cultural differences, but also demonstrate the influence of variables such as discipline, gender and academic status on author-reader interaction and identity construction in scholarly texts.

.

Introduction

Academic writing has traditionally been conceived as a register lacking in personal involvement and explicit references to authorial or reader identity. Positivist theories of knowledge production in the academy encouraged a view of the individual scholar as the mere transmitter of universal, objective truths, rather than the human author of subjective, argumentative texts with a persuasive function (Aikenhead 1996: 9; see also Harding 1991; Hyland 1999). Such views were never more than a socially- and culturally-constructed illusion, an attempt to conjure an objectivity no person can ever possess because of the innately subjective nature of human perception (as noted by philosophers from Bacon in 1620 to Popper in 1979). They have obscured the quintessentially interactive nature of academic prose, with its socially-situated dialogue between author, imagined reader and reader.


Social constructivist theories of language have provided a useful foundation for re-examining traditional views of academic writing, since they emphasise the importance of language not just for conveying meaning, but also for structuring and maintaining social relationships (Berger & Luckmann 1966; Gumperz 1982; Le Page & Tabouret-Keller 1985). Thus, academic writing can be seen not as the disconnected transmission of immutable facts, but as a dynamic process of meaning construction within a defined social context. Accordingly, recent studies of academic language consider issues of identity, voice, evaluation, hedging and persuasion, and have begun to investigate specific linguistic features associated with social interaction in various scholarly genres (Duszak 1997; Hyland 1998, 2000, 2005b; Ivanič 1998; Myers 2001; Thompson 2001; Mauranen 2004; Simpson 2004; Swales 2004; Harwood 2005a). With varying levels of methodological sophistication and success, contrastive studies have examined how academic authors from different cultural and linguistic backgrounds construct and manage social interaction (Busch-Lauer 2001; Hutz 2001; Zhabotynska 2001; Fløttum, Dahl & Kinn 2006).

If academic writing is not the faceless genre it was long assumed to be, how and where does authorial identity manifest itself? One of the main instruments for indicating personal involvement and identity in academic texts is person reference (Mühlhäusler & Harré 1990). In British and US-American English and German – the languages examined here – the primary person-referential devices are personal pronouns. Scholarly authors make particular use of the first person singular and plural pronouns and the second person singular pronoun to construct individual, group and reader identity.
Traditional grammatical descriptions assign fixed, and separate, speech-act roles to each person: first person pronouns are said to refer to the speaker, second person to the addressee (see for example Halliday & Hasan 1976; Lyons 1977; Comrie 1981; van Riemsdijk & Williams 1986). However, in naturally-occurring speech, there is no constant, one-to-one correlation between the grammatical form of a pronoun and its referent at any moment of speech or section of text. The same referent can be indicated by a number of different pronouns (I can be ‘I’ at one moment, but part of ‘we’ at the next). Conversely, the same pronoun can indicate a number of different referents (we are all ‘I’, depending on who is speaking). The traditional correspondence asserted between pronoun form and person is therefore little but convenience, since personal pronouns are multifunctional, and their referents context-dependent (Wales 1996: 7).

In contrast, discourse-oriented approaches to person reference stress the importance of personal pronouns in encoding and managing social interaction: they are seen as playing a central role in the construction of ‘self’ and ‘other’ (Malone 1997; Sacks 1992; Schegloff 1996), and are “indicators of the complex relationships between selves and the societies these selves live in” (Mühlhäusler & Harré


1990: 47; see also Wales 1996). According to the view adopted here, person reference is thus more than a device for encoding grammatical relations such as number or person: it is one of the central means by which authors relate to their imagined audiences. By considering how person reference is used in context, and which discursive functions it fulfils, we can learn much about processes of identity construction and interaction in scholarly writing. In the context of academic writing, person reference is of particular interest precisely because scholarly texts have so long been construed – or rather, constructed – as impersonal. As Aikenhead notes, the recommendation that academic texts should be impersonal follows from a positivistic view of science as “authoritarian, non-humanistic, objective, purely rational and empirical, universal, impersonal, socially sterile, and unencumbered by the vulgarity of human imagination, dogma, judgements, or cultural values” (1996: 9). In both German and English-language scholarly writing, this impersonality has been connected with avoidance of the first person singular pronoun; this convention has come to be known in German as the ‘Ich Verbot’ [I taboo] (Weinreich 1989: 132; Kretzenbacher 1991: 120; see also Hutz 1997: 232; Gläser 1998: 485). The avoidance of explicit person reference is one way in which academics attempt to conjure an impression of objectivity. However, the adoption of an impersonal writing style, most clearly signified by the avoidance of ‘I’, does not render such texts any less personal, or their authors any more objective. At times, the avoidance of surface linguistic features marking personal opinions as such is somewhat disingenuous. The lulling effect that ostensibly impersonal academic style can have on readers is not always unconscious or accidental: scholars are usually expert writers, who seek to increase the persuasive power and force of their argument by all manner of stylistic devices. 
By examining person reference in a corpus of academic articles, we can shed light on the relationship between the ideal-type impersonality often demanded of the scholar, and the personal identity inseparable from each academic author. Given that scholarly writing consists of opinion, argument and evaluation, are these features marked as such, or presented as impersonal truths? This tension between a practically and theoretically unattainable disconnectedness, and an actual and inescapable personal reality, conveyed in and through language, is one of the most remarkable features of academic writing.

Methodological approach

In addition to examining pertinent features of academic writing, the study aims to make a positive contribution to the methodology of corpus-based studies of academic discourse. In the following section I explain three main methodological




issues and detail how these issues were addressed in the present study. These issues have been neglected in previous research (for a discussion of previous contributions to the field, see Sanderson forthcoming).

First, an empirical study of academic discourse needs to draw upon data samples which are representative of the object of study. It is important to note that representativity is not a function of size. In order for a sample to be representative, it must contain all of the characteristics (that is, variables) present in the wider population, in roughly the same proportions as in the wider population. To be empirically sound, therefore, a sample needs to take account of all the variables present in the population being investigated, not just the one or two variables of particular interest. For work on academic discourse, this means that a study of academics from one particular culture should include for example both men and women, of different academic levels and ages, tenured and untenured. A cross-cultural study also needs to take account of all of these variables, for each of the cultures it examines. Thus, in each of the three cultures investigated, the present study samples a broad cross-section of academics of both genders, all ages, at all stages of their academic careers, tenured and untenured, writing in a range of humanities disciplines. The focus of the present investigation is English- and German-language humanities research writing. The corpus therefore had to sample a broad cross-section of work by native-speaker academics from these two groups, and I settled on scholars from Britain, the USA and Germany, who represent the three major English- and German-speaking cultural groups. Men and women, tenured and untenured, at all stages of the academic career, were sampled in roughly equal proportions. Five disciplines were selected, representing a cross-section of humanities research production. 
I have not sampled texts from one or two disciplines only, since this would not be representative of all humanities disciplines. I do not claim that my results are generalisable to academic writing as a whole, nor even to humanities research writing as a whole, since I sampled only research articles. The results of the present study are however generalisable to humanities research articles, and this is what I claim. Research articles were chosen both because they represent a defining academic genre (Swales 2004: 207) and also because their relative brevity meant that a relatively large number of them could be analysed closely and in their entirety by a single researcher. The precise texts chosen in any one study will of course depend on the area of interest and aims of each particular investigation. What is important, however, is the principled collection of texts in a corpus, as practised here, with a view to ensuring representativity. A second, vital issue considered here was that the data sampled must be generalisable to the larger population under examination. Failing this, the results of even the most well-intentioned study will have only anecdotal value. The current investigation recognises that in order for the findings of an investigation to be generalisable, the texts chosen for the corpus must constitute a random sample, which



has to be of a reasonable size and also has to cover the major variables contained in the broader population. In this study, these prerequisites were met through careful construction of the corpus, which is a major advantage of both specialised corpora and of corpora tailor-made for a specific investigation. As demonstrated here, future researchers will need to consider the size of their sample in relation to the total size of the population they wish to examine, and weigh this against their own time and possibly financial constraints. Since most researchers face limited resources, it is better to compromise by choosing a genre that is shorter, or examining only a few central features, rather than reducing the number of texts examined. As the present study shows, such compromises are practically feasible. A third, and final, issue which the present investigation considered and applied fully, was a detailed statistical analysis of the results. Quite simply, the human eye is a poor judge of statistical significance. For this reason, statistical tests must be applied to the findings of corpus-based studies in order to separate real from perceived differences and tendencies, in order to ensure that the conclusions reached in the study are true. Contrastive studies in particular will require a sophisticated grasp of multivariate statistical analysis if they are to discern the relative influence of multiple different variables (culture, language, discipline, etc.) on linguistic production. Presenting results as absolute numbers or a cumulative percentage measure presupposes that there is an equal number of possible occurrences in each subcorpus. In most contrastive studies, however, this condition is not met, because the subcorpora differ in size. The results therefore have to be case weighted, as they were here. The data a researcher selects are crucial to the credibility, reliability and explanatory power of a study. 
It is vital, therefore, that studies are based upon principled data collections, which are representative of the group or groups being examined and generalisable to the wider population. The present study reflects the author’s awareness that cultural background is not the only variable that shapes written production. This awareness motivated the extensive statistical analysis undertaken here, which was necessary in order to distinguish culture from other influential variables, and to determine the relative influence of individual variables on the various aspects of linguistic behaviour analysed in the study. The exact tests applied to the data are explained further below, but first I turn to a more detailed presentation of the corpus.

The SCEGAD corpus

The analysis is based on the Synchronic Corpus of English and German Academic Discourse (SCEGAD), a 1-million-word corpus compiled by the author at the University of Freiburg in 2001–2003 for the purpose of systematically investigating native-speaker academic writing in English (British and US-American) and German. The corpus contains the full texts of 100 research articles: 50 were written in German by German academics, and 25 each in English by British and US-American academics. SCEGAD therefore enables not only interlingual (German/English), but also intralingual (British/US-American English) and intercultural (German/British/US-American) comparisons to be drawn. The texts were published between 1997 and 2003 in leading journals in the following five humanities disciplines: philosophy, history, folklore, English/German literary studies and English/German linguistics.

In addition to being balanced for the native language of the authors, the corpus is also controlled for gender, age and academic status. The authors were divided into six age groups (under 30, 30–40, 40–50, 50–60, 60–70 and over 70) and four academic status levels (pre-PhD, post-PhD, full professor and emeritus professor). The corpus therefore samples a broad cross-section of humanities scholars, the youngest 28, the oldest 75, of both genders, from a variety of disciplines, who span all stages of the scholarly career from pre-PhD scholar to emeritus professor. Using SCEGAD, coupled with bivariate and multivariate statistical analyses, it is possible to examine the effect of a large number of variables, not just culture, on specific features of academic writing, and to draw conclusions that are more likely to be representative of a diverse discourse community.

Phenomena examined and statistical methods

The variety of phenomena which can be examined using a corpus such as SCEGAD is endless. The present paper focuses on person reference because, for the reasons outlined above, this feature is of considerable interest in academic writing.
Person reference is unusual in that it is a discourse phenomenon that can be identified largely automatically; most discourse features in fact require extensive manual analysis, and this remains a major obstacle to large-scale discourse studies using corpora (see discussion in Hardt-Mautner 1995; Aston & Burnard 1998; Hunston 2004). The analysis considers both the form and the discourse functions of person reference, paying particular attention to the communicative purpose in context. The pronoun forms examined are shown in Table 1, grouped according to formal grammatical categories. The analysis centres on first and second person pronouns, or “interpersonal pronouns” (Wales 1996: 3), since these most clearly fulfil interactive and identity construction purposes. Third person pronouns generally do not serve an interpersonal function, and were therefore excluded from the analysis. However, third person references to the reader along the lines of ‘the reader may well wonder. . .’ were counted, as were oblique authorial self-references in the third person, such as ‘the author wishes to thank x’ or ‘der Forscher wurde aufgefordert’ [the researcher



Table 1. English and German-language pronoun forms analysed in the corpus

Person                      Case   English           German
1st person singular         Nom.   I                 ich
                            Acc.   me                mich
                            Gen.   my/mine           mein/e/r/n/s/es
                            Dat.   me                mir
                            Refl.  myself            mich
2nd person singular/plural  Nom.   you               Sie*
                            Acc.   you               Sie
                            Gen.   your              Ihr/e/r/n/s/es
                            Dat.   your              Ihnen
                            Refl.  yourself/selves   sich
1st person plural           Nom.   we                wir
                            Acc.   us                uns
                            Gen.   our/s             unser/e/r/n/s/es
                            Dat.   us                uns
                            Refl.  ourself/selves    uns

*This is the polite form of the German second personal pronoun: the familiar form, ‘du’, would not be used in a formal context such as an academic text.
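The forms listed in Table 1 lend themselves to automatic retrieval. The following is a minimal sketch of that idea, not the procedure used in the study: it counts the English first person singular forms in a text and expresses the count per 10,000 words. The tokenising pattern, the function name `per_10k` and the sample sentence are all assumptions made for illustration.

```python
import re

# English first person singular forms from Table 1, lower-cased.
FIRST_SG = {"i", "me", "my", "mine", "myself"}

def per_10k(text, forms):
    """Return occurrences of `forms` per 10,000 running words.

    Assumption: a 'word' is any run of letters or apostrophes.
    """
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in forms)
    return 10_000 * hits / len(tokens)

# Invented sample, not taken from SCEGAD.
sample = "I argue that my reading differs. The evidence supports me."
print(per_10k(sample, FIRST_SG))
```

A count of this kind only locates candidate forms. Ambiguous items would still need to be checked by hand in context.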

was asked].1 In the tables which follow, the figures labelled ‘third person reference’ therefore refer not to personal pronouns, but to third person references specifically to one of the parties in the textual interaction. It should also be noted that, for the second person, singular and plural forms are conflated, because they are formally identical in both languages. The counts for second person reference include not only direct addresses to the imagined reader(s), but also instances of generic ‘you’.

The quantitative results are presented in table form, showing relative frequencies and comparative differences in frequency, and also as bar graphs for different subgroups. The graphical representation is intended as a useful complement to the figures presented in table form. The results were case weighted, calculated as mean occurrences per 10,000 words, because the SCEGAD texts are on average 8,937 words long. Presenting the results as mean occurrences per 100,000 words would have artificially inflated the frequency of the phenomena examined here, whereas showing results per 1,000 words would have made the phenomena falsely seem far less common than they in fact are.

The raw results were subjected to bivariate and multivariate significance tests, in order to discern whether the differences between groups that could be perceived through human observation were indeed valid, and conversely to establish whether seemingly marginal differences in frequency were in fact statistically significant. A standard statistical measure was applied, by which I regarded as significant all results up to the “p is less than 0.05” degree (p < 0.05).

1. The English-language tokens searched for were ‘author’, ‘writer’, ‘reader’, ‘researcher’ and ‘scholar’. In German, the tokens were ‘Autor’ [author], ‘Verfasser’ [author], ‘Leser’ [reader], ‘Forscher’ [researcher] and ‘Wissenschaftler’ [scholar]. Only those tokens which actually referred to the author or reader of the text were counted, not those referring to authors, scholars or readers in general, or some other researcher or reader. In addition, discipline-specific third person self-references to the author as a ‘philosopher’, ‘linguist’, ‘folklorist’, ‘ethnographer’, ‘historian’, etc. were also included in the analysis. All other instances of third person singular references or pronouns which did not construct authorial identity or express the relationship between author and reader were excluded from the analysis.

If there is a problem with access to the normal monitoring position, an alternative position may be chosen, and a correction to the measurements shall be made. <sol>The entire leachate problem generated from the Tseung Kwan O landfills will be addressed comprehensively in the subsequent Restoration of Tseung Kwan O Landfills Study.

Moreover, as the coding scheme in Appendix 1 shows, the problem and solution elements have only been explicitly coded in the Body section of the report. Problem and solution elements also occur in the Introduction and Conclusion, but it was not necessary to code these separately as I am mainly interested in the textual distribution of various lexical items signaling the Problem-Solution pattern. A concordance for problems with the tag function activated would indicate whether a particular instance occurs in a particular move structure in the introduction or whether it occurs in the conclusion part of the reports, as illustrated in Extract 3 below.

Extract 3: Examples of problems under codes used in the introduction and conclusion

Introduction Section
In addition to the problems of flooding, the watercourses within the study area are among the most polluted of the Territory.

Conclusion Section
Provided that mitigation measures are properly and fully implemented, it is considered that the works for 43CD and 30CD will not result in any insurmountable environmental problems.
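The idea of a concordance read against the codes can be sketched in a few lines. This is an illustration under stated assumptions, not the software used in the study: it assumes that each code marker such as <prob> or <sol> opens a segment that runs until the next marker, and the function name `tokens_by_code` and the sample text are invented for the purpose.

```python
import re
from collections import Counter

# A code marker is assumed to look like <prob>, <sol>, <prso>, etc.
TAG = re.compile(r"<([a-z]+)>")

def tokens_by_code(text, word):
    """Count whole-word occurrences of `word` under each code marker."""
    counts = Counter()
    # Splitting on a capturing group yields [pre, code, segment, code, segment, ...].
    parts = TAG.split(text)
    node = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    for code, segment in zip(parts[1::2], parts[2::2]):
        hits = len(node.findall(segment))
        if hits:
            counts[code] += hits
    return counts

# Invented sample text, not taken from the report corpus.
sample = (
    "<sit>The site lies next to a main road. "
    "<prob>Construction traffic will create a noise problem. "
    "<sol>Barriers are proposed in order to alleviate the problem."
)
print(tokens_by_code(sample, "problem"))
```

Because the match is on whole words, problem and problems are counted separately, which is what a comparison of the singular and plural forms requires.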


Determining discourse-based moves in professional reports

The remaining sub-headings in the Body of the reports did not present any coding difficulties as they all accurately reflected the subsequent content. Although there was quite a lot of variation in terminology, there was a close match between the content and its corresponding sub-heading. For example, ‘existing environment/conditions’, ‘site description and operations’ and ‘study area’ all described the situation. Likewise, ‘environmental management system’ and ‘monitoring and auditing’ reflected auditing procedures. Of the three main sections of the reports, it was the Conclusion section that showed the least close match between the content and its sub-heading. Two main headings were mostly used, ‘Conclusion(s)’ and ‘Conclusions and Recommendations’, although the type of information contained under these two headings seemed to be very similar, as shown in the two extracts in Extract 4 below. The codes I adopted reflected this difference in labeling rather than content (see Appendix 1).

Extract 4: Conclusion sections of two reports

Conclusions
The section of Route 5 is a priority project to reduce traffic congestion and environmental impacts in the central parts of Tsuen Wan. The EIA study has considered the magnitude and acceptability of all environmental impacts from the project and has concluded that the key issues will be noise during construction and operation. Mitigation measures have been considered to reduce noise from traffic and road enclosures and barriers have been recommended along parts of the road together with the application of a low noise road surfacing.

Conclusions and Recommendations
In general, road traffic noise will have the most pronounced impact in the area. Measures capable of satisfactorily mitigating the impact of noise have been developed. Subject to the implementation of these measures the environmental impact of the roads to be constructed under Contract No. TK/39 will not be significant.

The above discussion highlights the complexity of attempting to assign discourse-based labels to text, especially in view of the fact that the headings and sub-headings do not always accurately reflect the content in their sections. In order to ensure as much reliability as possible, the student assistant in charge of coding the text manually double-checked the labels against my criteria for assigning the codes before entering them. The section below presents an analysis of the singular and plural forms of two items, discussed within the framework of the discourse-based coding scheme.




 Lynne Flowerdew

Analysis of selected items for the problem element within discourse-based moves

As mentioned previously, in the first stage of this analysis, I extracted the keywords which defined elements of the Problem-Solution pattern. Table 2 presents the keywords relating to the problem element that occur in ten or more reports. Although problem and problems did not surface as keywords in the corpus, they are worth examining further as they can be regarded as superordinate terms for all of the keyword nouns listed in Table 2. I now describe the textual distribution of problem(s) and the keyword impact(s) with reference to the discourse-based coding scheme outlined in the previous section. I also consider whether these words co-occur with items marking any type of causal relation and if these occupy any particular move structure slot, as so defined in my coding system. In the following section, I will first examine the singular form of problem, followed by problems, and secondly, the noun impact followed by its plural form impacts.

Analysis of ‘problem’/‘problems’

The frequencies of occurrence for problem and problems are 41 and 51 instances, respectively, as shown in Table 3. A good starting point for looking at these items from a more discoursal perspective is to determine where exactly in individual reports these instances occur. As Baker (2006) points out, texts have beginnings, middles and ends, and it may be relevant to know whether a particular word form is more likely to occur at the start or at the end of the text. The number of tokens per section for problem and problems given in Table 3 indicates that these different forms of the same lemma have somewhat different distributional tendencies. Problem tends to cluster in the middle sections, with 83%, i.e. 34 out of 41 tokens, found here, and only 4 tokens occurring in the beginning section.
Problems, meanwhile, displays slightly different tendencies, generally occurring in the beginning as well as in the middle sections. 39%, i.e. 20 out of the 51 tokens for problems are found in the introduction sections, and 57%, i.e. 29 out of the 51 tokens in the body sections.

Table 2. Keywords for the Problem element

impacts (50)    noise (44)      waste (20)    pollution (10)    contaminated (14)
impact (26)     traffic (23)    dust (20)     emissions (10)
sewage (12)     sediments (10)

Note: Figures in parentheses denote the number of texts in which the words were found to be key.
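The selection criterion behind Table 2 (words that are key in ten or more reports) can be sketched as follows. This is an illustration only: it assumes that a keyword list has already been produced for each report by whatever keyness procedure is in use, and the function name `widely_key` and the toy data are hypothetical.

```python
from collections import Counter

def widely_key(keywords_per_report, min_reports=10):
    """Return {word: number of reports in which it is key} for words
    that are key in at least `min_reports` reports."""
    spread = Counter()
    for keywords in keywords_per_report:
        # A word counts once per report, however often it occurs there.
        spread.update(set(keywords))
    return {word: n for word, n in spread.items() if n >= min_reports}

# Toy data: twelve invented reports, not the corpus described here.
reports = [["noise", "dust"]] * 10 + [["noise", "odour"]] * 2
print(widely_key(reports))
```

Counting reports rather than raw occurrences keeps a word that is very frequent in a single report from appearing to be typical of the corpus as a whole.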



Table 3. Breakdown of tokens for problem/problems by codes and macrostructure

Section (codes)                 Frequency of problem    Frequency of problems
Introduction (<prop> <sit>)     – – 4 – –               7 1 4 2 6
  Sub-total                     4 (10%)                 20 (39%)
Body (<prso> <prob> <sol>)      20 3 10 1               22 1 4 2
  Sub-total                     34 (83%)                29 (57%)
Conclusion                      2 1                     2 –
  Sub-total                     3 (7%)                  2 (4%)
Total                           41 (100%)               51 (100%)
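The percentages in Table 3 follow directly from the section sub-totals. As a quick check, the sketch below recomputes them from the figures in the table. The function name `shares` is hypothetical, and rounding to the nearest whole percent is an assumption about how the published figures were derived.

```python
def shares(subtotals):
    """Express each section sub-total as a rounded percentage of the total."""
    total = sum(subtotals.values())
    return {section: round(100 * n / total) for section, n in subtotals.items()}

# Sub-totals as given in Table 3.
problem = {"Introduction": 4, "Body": 34, "Conclusion": 3}
problems = {"Introduction": 20, "Body": 29, "Conclusion": 2}

print(shares(problem))
print(shares(problems))
```

Expressing each form as a share of its own total, rather than as a raw count, is what allows the distributions of the two forms to be compared even though they differ in overall frequency.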

I will now take a closer look at the behaviour of problem and problems found in the various discourse-based move structures of different sub-sections of the reports, especially in relation to the causal relations exemplified in Table 1. Over 90% of the tokens for problem and problems occurred in some type of causative construction, the most common being with a causative verb, e.g. the export scheme will create a noise problem, signalling the Reason-Result relation. However, a manual examination of the expanded concordance lines checked against the codes revealed that this type of patterning could not be identified with any particular move structures across the three main sections of the reports. It has already been noted that the majority of the tokens for problem occur in the body, with 20 tokens under <prso>, three under <prob> and 10 coded under <sol>. A comparison of the phrases containing problem coded under <prso> with those under <prob> and <sol> reveals some interesting differences. It was found that problem occurred in a Means-Purpose relation six times, incorporating a two-way signalling verb such as ‘ameliorate’, ‘improve’ or ‘alleviate’, which provides a linkage between the problem and solution elements, as in (1):

(1) In order to alleviate the problem of noise, . . .

All six of these instances were accounted for by the <prso> tag. This data therefore suggests that the Means-Purpose relation has a tendency to be used when reference




is made to a problem in the immediately preceding text (i.e. for text coded <prso>, which reflects the ‘part-by-part’ organisation), but does not tend to occur if the solution aspect is separated from the problem aspect (i.e. for text coded <sol>). Here, the 10 instances of problem coded <sol> were mostly found in phrases referring back to a previously introduced problem, e.g. Because of this potential problem. . . ; . . . as for the analysed noise problem. . . ; Nevertheless, the noise problem generated by additional traffic. . . , before the proposed solution was introduced. Another observation was that problems appears to have a different collocational patterning in the introduction and body sections of the reports. In the introductions, it is found to collocate with a more general adjective such as ‘environmental’, as in Example (2) coded , rather than specific ones (e.g. ‘noise’, ‘traffic’) which are found in the body of the reports under the <prso> tag, as in Example (3): (2) Where potential environmental problems may arise, mitigation measures were recommended . . . (3) Severe traffic noise problems already exist in TMNT. . . . The traffic noise impact on Butterfly Estate can be relieved following the introduction of the Foothills Bypass. . . <prso>

The fact that problems is a significant feature of the introductions, whereas problem has this role in the body sections of the reports, is suggestive of the fact that problems is probably used in a topic-like sentence in the introductions, and then each problem, in turn, is itemized in the body of the reports. Moreover, half of the tokens for problems found in the Means-Purpose relation were also coded <prso>, with none coded <sol>, indicating that in the body sections of the reports the Means-Purpose relation is used for binding at a local level. However, an examination of the tokens for problems in the introduction sections showed that the purpose statement referred to problems expressed in a very general way, or to problems that would be elaborated in the body of the reports, as in Example (4): (4) The EDS identified that highway improvement works would be required to overcome the anticipated traffic problems on Lung Mun Road. . .

Therefore, whether the Means-Purpose relation in which problems is found is used for local or more textual binding seems to be dependent on its positioning in the overall discourse structure.

Analysis of ‘impact’/‘impacts’

A breakdown of the tokens for impact and impacts across the various move structures in each macro-section of the reports is presented in Table 4. Interestingly,


Table 4. Breakdown of tokens for impact/impacts by codes and macrostructure

Section (codes)                         Frequency of impact        Frequency of impacts
Introduction (<prop> <meth> <scope>)    16 22 6 8 1 32 14 48       18 30 9 6 4 57 51 30
  Sub-total                             147 (22.5%)                225 (23.5%)
Body (<sit> <prso> <prob> <sol>)        7 311 69 55 20             13 422 81 42 45
  Sub-total                             462 (71%)                  603 (63%)
Conclusion                              29 14                      104 24
  Sub-total                             43 (6.5%)                  128 (13.5%)
Total                                   652 (100%)                 956 (100%)

in percentage terms, the frequencies of impact and impacts are reasonably similar across the three sub-sections. I will now examine some of the key findings of impact and impacts within the different move structures of each macro-section of the report and compare these with their superordinate terms problem and problems discussed in the previous section. As in the previous section, the discussion will focus on their involvement in the Means-Purpose and Reason-Result relations, and their collocational preference for different adjectives in different sections of the reports. Nearly half (311) of the total number of tokens for the singular form impact are found under the <prso> tag in the body. In 51 cases impact is involved in a Means-Purpose relation, and it is to be noted that 28 of these instances fall under the <prso> tag. The remaining 23 Means-Purpose phrases are mostly found under and , with only three phrases found under <prob> and <sol>. This is very similar to the distribution for Means-Purpose phrases with impacts, where 30 out of 66 tokens for impacts in a Means-Purpose relation are coded <prso>. The fact that, in common with problem and problems, Means-Purpose phrases tend to cluster under the <prso> tag lends further weight to the argument


that they act as a signal of local coherence in a part-by-part organizational structure rather than a signal of global coherence associated with a whole-by-whole organisational structure for the Problem-Solution pattern in the body section of these reports. One key aspect which sorting impact under the codes throws up is its inclusion in the fixed phrase ‘Environmental Impact Assessment/Study’, only found in the discourse-based moves in the introduction sections. For example, of the 22 phrases for impact coded , indicating background to the study, 19 are examples of this fixed phrase. This term is also very obvious in those phrases coded where 39 out of the 48 instances of impact are found in this fixed phrase, relating either intertextually to a previous study, as in Example (5), or to the purpose of the report in question, as in Example (6): (5) An Environmental Impact Assessment (EIA) Study has been undertaken. . . (6) The purpose of this Environmental Impact Assessment is to provide information on. . .

The sorting by codes also throws up differences in premodifying adjectives between the collocations of impact coded under <prob> and those under <prso>. It was found that 8 of the tokens for impact under <prob> were preceded by ‘direct’ and 10 tokens by ‘indirect’, although these adjectives were found to occur only once each with impact under <prso>. In contrast, a more specific noun modifier such as ‘noise’ was found to collocate with impact 36 times in those phrases coded <prso>, but only three times with those phrases coded <prob>. These adjectival collocations revealed by the sorting according to codes are thus suggestive of the two different organisational structures for encapsulating the Problem element in the body of the reports: in the ‘part-by-part’ structure (coded <prso>) one would expect more specific noun classifiers such as ‘noise’, ‘air’ to be used, as in Example (7), whereas in the ‘whole-by-whole’ structure (coded <prob>) it is not surprising to find more general adjectives such as ‘environmental’, ‘direct’, ‘indirect’ for describing the problems, as shown in (8). (7) Dust emission sources on site include site preparation/formation works and storage/handling of loose aggregates. On-site mitigation measures to reduce dust impacts to acceptable levels include:. . . (8) Indirect impact would result from disruption or loss of amenity to adjoining land uses including: disruption to traffic, restriction on access, noise pollution. . .
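Sorting premodifiers by code, as described above, amounts to counting the word immediately to the left of the node word separately for each code. The sketch below illustrates the idea under simplifying assumptions: the text is assumed to be available as (code, text) pairs, only the single word directly before the node is considered, and the function name `premodifiers` and the sample sentences are invented.

```python
import re
from collections import Counter

def premodifiers(segments, node):
    """Count (code, preceding word) pairs for whole-word matches of `node`.

    `segments` is an iterable of (code, text) pairs.
    """
    pattern = re.compile(rf"\b(\w+)\s+{re.escape(node)}\b", re.IGNORECASE)
    counts = Counter()
    for code, text in segments:
        for match in pattern.finditer(text):
            counts[(code, match.group(1).lower())] += 1
    return counts

# Invented sample segments, not taken from the report corpus.
segments = [
    ("prob", "Indirect impact would result from disruption. Direct impact is limited."),
    ("prso", "The noise impact can be relieved. Noise impact on the estate is low."),
]
for (code, word), n in sorted(premodifiers(segments, "impact").items()):
    print(code, word, n)
```

A one-word window is a deliberate simplification: it would miss a modifier separated from the node, so counts obtained this way would still need checking against the concordance lines.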

Turning now to impacts, it is interesting to note that 44% of the tokens for impacts (422 tokens out of the total of 956) fall under the <prso> tag in the Body section, which is a very similar distribution pattern to the tokens for impact. Moreover, it is in this move structure where certain patterns associated with particular causal


relations have a tendency to occur. For example, out of the 82 phrases with impacts signalling Means-Purpose, as in (9), 32 were coded <prso>. (9) . . .mitigation measures are recommended to reduce impacts on water quality.

Of the 50 remaining purpose phrases 22 occur in various move structures in the introduction, with 13 under . Another 10 are coded , 10 coded and only two tokens are found under <prob> and six under <sol>. What is striking about these Means-Purpose phrases are the differences in lexis across the different move structures. For example, the verbs ‘assess’, ‘evaluate’ and ‘mitigate’ are the most common in phrases coded , as in (10), whereas the verbs ‘reduce’ and ‘minimise’ are found in phrases coded <prso>, as in (11). (10) . . .to assess in detail the environmental impacts of the Construction Proposals (11) . . .in order to reduce potential noise impacts.

Causative verbs signalling Reason-Result were concentrated in the move structures in the body section of the reports. The non-causation phrases with impacts, on the other hand, were mostly found in all the move structures of the introduction section and in those move structures associated with the monitoring and assessment aspect of the impacts. This kind of information is contained in the phrases coded (which are not purpose-related) and in the phrases coded dealing with assessment, as in (12). (12) . . .the EIA assessment has identified adverse impacts in relation to the following.

Given that the reports usually discuss several types of environmental impacts, it is not surprising to find that in the conclusion sections there are more occurrences of impacts than impact, with many of the phrases embodying the company’s overall assessment of the situation, as in (13). (13) In view of the low potential for environmental impacts from the operation of the GRS, no environmental monitoring is considered necessary. . .

Conclusion

Swales (2004: 229) reviews various genre studies in which grammatical features have been found to indicate the type or nature of a move. Citing Gledhill (2000), Swales notes that “was to” was used to signal the onset of an introduction’s third move, i.e. outlining the purposes of the present study. Likewise, lexical signals can also be indicative of move structures, the most obvious of which would be lexis indicating section headings, such as method and results, e.g. The results are shown in Table 1.


The research reported on in this paper has suggested that in addition to individual grammatical or lexical items, a consideration of lexical items together with their involvement in causative phrases and collocational behaviour could also signal move structures. This point has been especially borne out by an analysis of problem/problems and impact/impacts, which were found to be involved in a Means-Purpose relation under the sections of the reports coded <prso>. Moreover, these purpose clauses were often accompanied by verbs such as ‘reduce’ and ‘minimise’ linking the Problem and Solution elements. Although this is quite a small-scale study, and the results are suggestive rather than definitive, nevertheless, the coding scheme outlined in this paper has proved useful for shedding light on corpus data from a more discourse-based perspective. This research has also shown that the field of corpus linguistics can make use of genre-based methodologies.

References

Baker, P. 2006. Using Corpora in Discourse Analysis. London: Continuum.
Bhatia, V. K., Langton, N. & Lung, J. 2004. Legal discourse: Opportunities and threats for corpus linguists. In U. Connor & T. Upton (eds), 203–231.
Biber, D., Connor, U. & Upton, T. 2007. Discourse on the move: Using corpus analysis to describe discourse structure. Amsterdam: John Benjamins.
Connor, U. & Upton, T. (eds). 2004. Discourse in the Professions: Perspectives from corpus linguistics. Amsterdam: John Benjamins.
Connor, U., Precht, K. & Upton, T. 2002. Business English: Learner data from Belgium, Finland and the U.S. In Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching, S. Granger, J. Hung & S. Petch-Tyson (eds), 175–194. Amsterdam: John Benjamins.
Flowerdew, L. 1998. Corpus linguistic techniques applied to textlinguistics. System 26(4): 541–552.
Flowerdew, L. 2003. A combined corpus and systemic-functional analysis of the Problem-Solution pattern in a student and professional corpus of technical writing. TESOL Quarterly 37(3): 489–511.
Flowerdew, L. 2004. The argument for using English specialised corpora to understand academic and professional language. In Discourse in the Professions: Perspectives from corpus linguistics, U. Connor & T. Upton (eds), 11–33. Amsterdam: John Benjamins.
Flowerdew, L. 2005. An integration of corpus-based and genre-based approaches to text analysis in EAP/ESP: Countering criticisms against corpus-based methodologies. English for Specific Purposes 24: 321–332.
Flowerdew, L. 2008. Corpus-based analyses of the problem-solution pattern: A phraseological approach. Amsterdam: John Benjamins.
Garside, R. 1996. The robust tagging of unrestricted text: The BNC experience. In Using Corpora for Language Research, J. Thomas & M. Short (eds), 167–180. London: Longman.
Gledhill, C. 2000. The discourse function of collocation in research article introductions. English for Specific Purposes 19(2): 115–135.
Henry, A. & Roseberry, R. L. 2001. Using a small corpus to obtain data for teaching a genre. In Small Corpus Studies and ELT, M. Ghadessy, A. Henry & R. L. Roseberry (eds), 93–133. Amsterdam: John Benjamins.
Hoey, M. P. 2001. Textual Interaction. London: Routledge.
Paltridge, B. 1994. Genre analysis and the identification of textual boundaries. Applied Linguistics 15(3): 288–299.
Scott, M. 1997. WordSmith Tools [Computer software]. Oxford: OUP.
Simpson-Vlach, R. & Leicher, S. 2006. The MICASE Handbook: A resource for users of the Michigan Corpus of Academic Spoken English. Ann Arbor MI: The University of Michigan Press.
Simpson, R. C., Briggs, S. L., Ovens, J. & Swales, J. M. 2002. The Michigan Corpus of Academic Spoken English. Ann Arbor MI: The Regents of the University of Michigan.
Swales, J. M. 1990. Genre Analysis: English in academic and research settings. Cambridge: CUP.
Swales, J. M. 2002. Integrated and fragmented worlds: EAP materials and corpus linguistics. In Academic Discourse, J. Flowerdew (ed.), 150–164. London: Longman.
Swales, J. M. 2004. Research Genres. Cambridge: CUP.
Upton, T. 2002. Understanding direct mail letters as a genre. International Journal of Corpus Linguistics 7(1): 65–85.
Upton, T. & Connor, U. 2001. Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre. English for Specific Purposes 20(4): 313–329.




Appendix 1: Coding scheme for professional reports

Code Specification    Code Name    Report heading/sub-heading

Headings: main headings sub-headings

Introduction: introduction

introduction introduction and objectives

foreword scope

<scope>

foreword scope

background

background study context

objectives

objectives objectives and key issues key issues to be addressed purpose of the study background and purpose objectives and scope

structure

structure of the report format of the report

proposal

<prop>

proposed project/works/ development/developments project description/characteristics description of the scheme study area and the project study assumptions conceptual design of facilities proposed restoration works landfill restoration scheme airport plan Shenzhen River regulation project

justification

need for CIF justification for project

methodology

<meth>

assessment approach study process

Body situation

<sit>

existing environment/conditions existing environmental conditions setting of project works environmental framework/setting local setting and site history site description and operations study area

problem

<prob>

extent of impacts potential impacts environmental impacts environmental and safety issues

solution

<sol>

mitigation measures measures for mitigation measures to reduce impacts evaluation of impacts

problem/solution

<prso>

study findings and recommendations potential impacts & proposed mitigation predicted impacts and mitigation potential impacts and proposed mitigation measures detailed environmental impact assessment leachate management findings of the EIA principal findings impacts environmental impacts impacts during construction

auditing

environmental monitoring and audit monitoring and auditing environmental management system ongoing environmental monitoring additional mitigation measures monitoring studies reporting

budget

cost financial implications

Conclusion conclusion/ recommendations

conclusions and recommendations recommendations (summary) summary and recommendations recommendations for further investigations

conclusions

conclusion(s) overall conclusions summary (concl) final considerations




//→ ONE country two SYStems //
The discourse intonation patterns of word associations

Winnie Cheng and Martin Warren
The Hong Kong Polytechnic University

This study examines the relationship between the phraseological characteristics of language and the communicative role of discourse intonation (Brazil 1997). The findings are based on one of the four sub-corpora of the one-million-word Hong Kong Corpus of Spoken English (HKCSE), which has been prosodically transcribed. A number of studies have looked at word associations, but this is the first corpus-based study of speakers’ discourse intonation choices for these patterns. The intonational features, viz. tone unit boundaries and prominences, of the ten most frequent 3- and 4-word lexically-rich word associations and the ten most frequent grammatically-rich word associations in the sub-corpus of public discourse, which forms 25% of the HKCSE, were examined to determine the extent to which this patterning also reveals patterns of discourse intonation. The findings suggest that discourse intonation patterns do exist in terms of tone unit boundaries and the distribution of prominence. However, while discourse intonation patterns are discernible, speakers may, and indeed do, deviate from them in order to alter their discourse-specific communicative role.

.

Introduction

As Sinclair (2004a: 148) observes, “the word is not the best starting-point for a description of meaning, because meaning arises from words in particular combinations”. Corpus-driven studies have highlighted the prevalence and importance of the co-selection of words in language and have led some (Sinclair 1991, 1996, 2004a, 2004b; Hunston & Francis 2000; Hoey 2005) to argue for a theory of language as ‘phraseology’. The phraseological character of natural language refers to the more-or-less fixed co-occurrence of linguistic elements. Sinclair (1991) terms this the ‘idiom principle’ which refers to the strong patterns of co-selection among words, and the opposing view is termed the ‘open choice principle’ (Sinclair 1991). The latter interpretation of language argues that many complex choices


are available to speakers and writers at each place in a text where a word, phrase or clause is completed, and the only restriction derives from the grammar. Sinclair (1991) argues that there are important restraints overlooked by advocates of the open choice principle. These restraints are register, semi-preconstructed phrase, collocation, colligation [1] and semantic environment.

Previous studies have examined phraseological patterns and characteristics of language (see, for example, Biber, Conrad & Reppen 1998; Biber et al. 1999; Cowie 1998; Hunston & Francis 2000; Burdine 2001; Bartsch 2004; Nesselhauf 2005; Sinclair 1987, 1996, 2004a, 2004b; Hoey 2005), and yet so far there has not been any study that investigates speakers’ discourse intonation choices for these patterns. This paper reports on the first corpus-driven study that aims to investigate the role played by discourse intonation (Brazil 1997) in adding context-specific communicative value to word associations. The data are the prosodically transcribed part of the Hong Kong Corpus of Spoken English (HKCSE) (see Cheng, Greaves and Warren (2005) for details of the Prosodic Corpus).

[1] Colligation is patterning in terms of the word class, or the structural pattern/grammatical category, of the colligates.

Word associations

Two different approaches to classifying the results of the phraseological tendency (Biber et al. 1999; Sinclair & Renouf 1991; Sinclair 1996) are briefly outlined below. Biber et al. (1999: 989–1025) distinguish among four different types of word associations. These are ‘idioms’, for example crop up, put up with, get away from (ibid.: 988), ‘collocations’, which are associations between lexical words when the collocates co-occur more frequently than expected by chance (ibid.: 988), ‘lexicogrammatical associations’ (ibid.: 989) whereby, for example, verbs such as think and know are strongly associated with to-complement clauses (ibid.: 989), and ‘lexical bundles’ (ibid.: 989–1025), which can be “regarded as extended collocations: bundles of words that show a statistical tendency to co-occur” (ibid.: 989), for example, do you want me to, the nature of, has not been, and put it in. This study looks at contiguous word associations, similar to Biber et al.’s (1999: 989) ‘lexical bundles’, in order to be able to describe patterns of discourse intonation that exist, if any.

Tognini-Bonelli (2001: 105) notes that Sinclair (1996) has identified two types of extended unit of meaning. One has a lexical core, which is an obligatory element in Sinclair’s (1996) ‘lexical item’, based “on a lexical core and extended to incorporate grammatical as well as other lexical choices” (ibid.: 105). The second is based on a grammatical core which constitutes Sinclair and Renouf’s (1991) ‘collocational framework’, “extended to incorporate lexical choices” (Tognini-Bonelli 2001: 105). Importantly, then, in Sinclair’s (1996) notion of units of extended meaning, there is what he terms a core unit (either lexical or grammatical) around which lexico-grammatical and semantic units are co-selected. When describing the procedure for identifying extended units of meaning, Sinclair (1996) states that the first of four steps is to identify the collocational profiles of a node word or phrase, and the second is to establish the colligational profiles. The third and fourth steps are to identify the semantic prosody and semantic preference of the texts. This study focuses on the first two steps when analysing the discourse intonation patterns.

The centrality of phraseology in language use propounded by Sinclair has led those working in the area of pattern grammar (e.g. Francis 1993; Hunston & Francis 2000; and Hunston 1995) to argue that corpus linguists will eventually be able to describe all lexical items in relation to their syntactic preferences, and all grammatical structures with regard to their lexis and phraseology (Francis 1993: 155). An example of the important implications of this view about language is provided by Tognini-Bonelli (2001: 92–98) who points out that the widely accepted idea that “lemma and inflected forms share the same meaning and differ only in grammatical profile” is flawed. She cites studies by Sinclair (see, for example, 1985, 1987, 1991) and Mindt (1991) which demonstrate that each inflected form is associated with a particular pattern of use and that the “different senses of a word or phrase can be correlated with a characteristic pattern of choice at the lexical, grammatical and semantic levels” (Tognini-Bonelli 2001: 92).
This study goes one step further and examines the additional communicative value of discourse intonation (Brazil 1985, 1997) in the phraseology of spoken discourse. This study adopts two categories for the word associations studied, namely lexically-rich and grammatically-rich. Members of the former category are determined by the majority of the associated words being what are traditionally termed ‘lexical words’, and those assigned to the latter contain a majority of ‘grammatical words’. The groupings are motivated by the research purpose of examining whether the patterning, if any, differs, as it is generally assumed, for example, that ‘grammatical words’, compared to ‘lexical words’, are less likely to be made prominent by speakers. In this study, ‘word associations’ are determined by collocational profile; in other words, “the recurrent co-occurrence of words” (Clear 1993: 277). The word associations examined in this study are therefore the most frequent three- or four-word associations found in the corpus. While there are statistical tests in use (e.g. t-scores and MI values) that claim to determine the strength of two-word associations (see, for example, Barnbrook 1996; Clear 1993), these cannot be used on word associations of more than two words. Also, such statistical tests have come in for criticism as to their usefulness for calculating the significance of the associations of two words (Stubbs 1995).

In this first attempt at discerning the relationship between discourse intonation patterns and word associations, the examples of the word associations are all contiguous in structure; in other words, constituency variation (e.g. AB and A*B) is not studied. Future studies need to go beyond contiguous instances and include constituency variation in order to more fully represent the extent of the phraseological tendency and discourse intonation patterns in language use. Future studies should also encompass positional variation (e.g. AB and BA), within what are termed ‘concgrams’ [2] (Cheng, Greaves & Warren 2006), once a clear framework for analysing concgrams has been established.
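The two-way grouping into lexically-rich and grammatically-rich word associations described above can be illustrated with a minimal sketch. The closed-class word list below is a small hypothetical stand-in, not the inventory used in the study, and the function name is likewise invented for illustration. Ties are resolved as lexically-rich here, an assumption made only because the rule of law, which has two words of each kind under this list, appears among the lexically-rich associations in Table 3.

```python
# Hypothetical closed-class (grammatical) word list; an illustrative assumption,
# not the inventory used in the study.
GRAMMATICAL_WORDS = {
    "a", "an", "the", "of", "in", "to", "that", "and", "for",
    "i", "we", "you", "it", "some", "have", "are", "is", "be",
}


def classify_association(words):
    """Label a contiguous word association by its majority word type."""
    grammatical = sum(1 for w in words if w.lower() in GRAMMATICAL_WORDS)
    lexical = len(words) - grammatical
    # Assumption: a tie between the two counts is treated as lexically-rich.
    return "grammatically-rich" if grammatical > lexical else "lexically-rich"


if __name__ == "__main__":
    for phrase in ("one country two systems", "a lot of", "that we are"):
        print(phrase, "->", classify_association(phrase.split()))
```

Any real application would need a principled, fuller list of grammatical words; the point of the sketch is only the majority rule.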

[2] Concgrams consist of up to five words that co-occur regardless of constituency variation (e.g. AB and A * B) or positional variation (e.g. AB and BA).

Discourse intonation and word associations

This study concentrates on two aspects of Brazil’s (1997) discourse intonation, i.e. tone unit boundaries and the distribution of prominence, in relation to the occurrence of the contiguous word associations. The decision was motivated by both Brazil’s (1995) description of the grammar of speech and the linear unit grammar developed by Sinclair and Mauranen (2005, 2006). Brazil’s (1995: 14) grammar of speech is a “linear real-time description of syntax” which seeks to be an integration of phonological patterns, in particular tone unit boundaries and prominence, with the grammar. Sinclair and Mauranen’s linear unit grammar is based on the way in which language is a strictly linear phenomenon produced in a succession of chunks, which can then be categorised as either message-oriented or organisation-oriented. Importantly, for this particular study, there is some initial evidence (Warren 2006) that Brazil’s tone unit boundaries tend to occur at the boundaries between the chunks identified in linear unit grammar, termed ‘provisional unit boundaries’ (Sinclair & Mauranen 2005, 2006).

Linear unit grammar is not a grammar that is confined to spoken language, and its originators claim that all texts can be broken down into chunks based on whether a chunk is ‘incremental’ in that it is message-oriented, or ‘non-incremental’, that is contributing to the organisation of the text either interactively or textually. This division into message-elements and organisation-elements allows the analyst to eventually combine fragments of message elements, which come in a variety of forms, into one finalised message-element which in turn is a ‘linear unit of meaning’ (Sinclair & Mauranen 2005, 2006). As discussed later, the results of this study show that a speaker’s choice of discourse intonation can at times be explained with reference to linear unit grammar.

Tone unit and prominence

The public corpus used in this study is a rare form of a spoken corpus in that it is both orthographically and prosodically transcribed. While the orthographic transcription of spoken data is well established and the conventions quite well-known, the number of spoken corpora that are also prosodically transcribed is very small. The HKCSE is the first large-scale attempt to employ the discourse intonation system (Brazil 1997) to mark intonation, but it is not the first corpus to have added a prosodic transcription to the usual orthographic one. Examples are the London-Lund Corpus (Svartvik 1990), The Survey of English Usage (Svartvik 1990: 15), the Spoken English Corpus (Knowles, Wichmann & Anderson 1996; Wichmann 2000), the Santa Barbara Corpus of Spoken American English (Du Bois & Englebretson 2005), and the C-ORAL-ROM Corpus for Spoken Romance Languages (Emanuela & Massimo 2005). As the representation of prosodic features in corpus data is less standardized, it is necessary to briefly describe the discourse intonation system (Brazil 1985 and 1997) used to transcribe the corpus. The prosodic labelling system used for the HKCSE is given in the Appendix.

The discourse intonation systems developed by Brazil (1985, 1997) are chosen for the prosodic transcription of the HKCSE because the primary concern of the research team is to analyse the corpus data with respect to discoursal, pragmatic and intercultural communication phenomena. Brazil’s systems are particularly useful for those seeking to explore the data in terms of the communicative value of discourse intonation. Speakers can select from four independent systems: prominence, tone, key and termination within a tone unit (see Table 1 below).
Table 1. Discourse Intonation systems (adapted from Hewings & Cauldwell 1997: vii; in Brazil 1997)

System        Choices
Prominence    prominent/non-prominent syllables
Tone          rise-fall, fall, level, rise, fall-rise
Key           high, mid, low
Termination   high, mid, low

In discourse intonation, a tone unit is taken to mean a stretch of speech with one tonic segment comprising at least one tonic syllable, but which may extend from an onset (first prominent syllable) to the tonic (final prominent syllable) (Hewings 1990: 136). Each of the independent systems is a source of ‘local meaning’ (Brazil 1997: xi) by which Brazil seeks to underline that these are moment by moment judgements made by speakers based on their assessment of the current state of understanding operating between the participants. In other words, Brazil’s system eschews the notions that intonation conveys fixed attitudinal meanings or is associated with particular grammatical structures. It also needs to be borne in mind that intonation alone, let alone one particular choice within the four systems, is not the sole conveyor of discourse meaning. When looking at intonation, the researcher at the same time has to be mindful of all of the other possible contributing factors in the ongoing negotiation of meaning between discourse participants.

This study focuses on the relation between word associations and the intonational features of tone unit boundaries and prominences. Prominence, according to Brazil (1997: 23–25), is used as a means of distinguishing those words which are situationally informative. Importantly, in this conceptual framework, the assigning of prominence is not fixed on the basis of grammar or word-accent/stress, but rather it is a choice made by the speaker in context. For Brazil, speakers have available to them two paradigms: existential and general. The ‘existential paradigm’ is the set of possibilities that a speaker can choose from in a given situation, while the ‘general paradigm’ is the set of possibilities that is inherent in the language system (ibid.: 23). The choice of prominence in naturally-occurring spoken discourse is made when the speaker chooses from the existential paradigm that is available at that point in the discourse, thus it represents an existential sense selection. Brazil (ibid.: 22–23) provides his well-known example of these two paradigms: the response queen of hearts to the question which card did you play. Brazil states that the speaker selects queen and hearts from the existential paradigm because these choices are limited by the contents of the pack of cards rather than the language system.
Of, on the other hand, is a product of the general paradigm because the speaker is limited in this context to this word by the language system. It needs to be added that not every syllable in a word has to be made prominent for the word to have the status of prominence in a tone unit. Also, speaker decisions within the prominence system are made on the basis of the speaker considering the status of individual words (ibid.: 39). The other three systems in discourse intonation, tone, key and termination, are not attributes of individual words, but of the tonic segment (i.e. that section of the tone unit that falls between the first and the last prominent syllable).

Data of the study

The public sub-corpus of the HKCSE comprises approximately 230,000 words made up of naturally occurring discourse between Hong Kong Chinese (87%) and native speakers of English, or speakers of languages other than Cantonese (13%), including speech events such as public speeches, public forums, press briefings, radio and television interviews, and panel discussions (see Table 2).

Table 2. Data of the study

Discourse type                             Number of texts   Number of words (%)
Speeches                                   43                105,803 (46%)
Speeches plus Q&A                          11                32,269 (14%)
Press briefings                            18                11,508 (5%)
TV/Radio interviews & discussion panels    19                64,489 (28%)
Discussion forums                          9                 16,136 (7%)
TOTAL                                      100               230,205 (100%)

To establish the most frequently occurring contiguous word associations in the public sub-corpus, the data were examined by iConc© [3] (see Cheng, Greaves & Warren 2005 for more details) to generate lists of 3-word and 4-word associations.
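The generation of frequency-ranked lists of contiguous 3-word and 4-word associations can be pictured with the following minimal sketch. It is not iConc and makes no claim about how that program works; the function names and the toy utterances are assumptions introduced purely for illustration, and tokenisation is taken as already done.

```python
from collections import Counter


def contiguous_ngrams(tokens, n):
    """Return every run of n adjacent tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_associations(utterances, n, k=10):
    """Count contiguous n-word sequences over all utterances; return the k most frequent."""
    counts = Counter()
    for tokens in utterances:
        # Counting per utterance keeps sequences from spanning utterance boundaries.
        counts.update(contiguous_ngrams(tokens, n))
    return counts.most_common(k)


if __name__ == "__main__":
    # Invented toy data, not drawn from the HKCSE.
    toy = [
        ["a", "lot", "of", "people"],
        ["we", "need", "a", "lot", "of", "time"],
    ]
    print(top_associations(toy, 3, k=1))
```

A list produced this way ranks by raw frequency only, which matches the study's stated reliance on recurrent co-occurrence rather than on association statistics.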

[3] iConc is a search engine implemented by Chris Greaves, Senior Project Fellow, English Department, The Hong Kong Polytechnic University, specifically for research on the HKCSE (prosodic).

Discussion of lexically-rich word associations

Table 3 is a list of the ten most frequent lexically-rich word associations including both 3-word and 4-word units presented in order of total frequency of occurrence.

Table 3. Top 10 lexically-rich word associations

Lexically-rich word association   Frequency in a single tone unit   % in a single tone unit   Total frequency of occurrence
article twenty-three              56                                95.0%                     59
nineteen ninety-seven             49                                89.1%                     55
one country two systems           36                                65.5%                     55
hong kong people                  54                                100.0%                    54
the rule of law                   35                                94.6%                     37
the central government            36                                100.0%                    36
the basic law                     36                                100.0%                    36
asia’s world city                 11                                44.0%                     25
free flow of information          6                                 35.3%                     17
a level playing field             14                                93.0%                     15

Many of the top-ten lexically-rich word associations examined are specific to the Hong Kong public domain, which is inevitable given the kinds of discourse in the corpus, and so require a brief explanation. The term article twenty-three refers to a sub-section of Hong Kong’s constitution, officially known as the basic law, relating to internal security matters, which has been a highly controversial issue in Hong Kong. The year nineteen ninety-seven is the year that Hong Kong returned to the People’s Republic of China (PRC) after many years of British rule, and is therefore synonymous with that watershed event. The term one country two systems refers to the policy initiated by Deng Xiaoping in the 1980s, stating that while Hong Kong, after 1997, is part of the PRC (i.e. one country), it will retain a high degree of autonomy from the PRC for 50 years, and so retain its capitalist economic system, legal system, internal government, immigration controls, currency, and so on (in other words, two systems will operate in the PRC). This policy results in public figures in Hong Kong frequently listing the perceived advantages that Hong Kong possesses and some of these are also on the list in Table 3: the rule of law, free flow of information and a level playing field (i.e. equal opportunities and the same rules for all citizens). Another term, asia’s world city, has been coined by Hong Kong’s political leaders to depict Hong Kong in a positive light. When first used around the year 2000, it was in the context of it being a long-term goal for Hong Kong to eventually become asia’s world city, and so be on an equal footing with other world cities such as New York, London, Paris, Tokyo and so on. However, within a fairly short space of time, asia’s world city was adopted as a synonym of ‘Hong Kong’ by the political leadership of the city. In the 2005–06 Policy Address of the Chief Executive of Hong Kong, for instance, asia’s world city is used four times.

Table 3 shows the frequencies with which the ten lexically-rich word associations are spoken within one tone unit. All but two of the lexically-rich word associations are typically spoken in a single tone unit, that is, in between 65.5% and 100% of instances. In fact, three of the top-ten word associations – hong kong people, the central government and the basic law – occur in a single tone unit in 100% of instances. This intonation pattern is therefore strong for most of the lexically-rich word associations, and so is rather predictable. The exceptions to this general pattern are asia’s world city (44%) and free flow of information (35.3%). All of the remaining instances of these two lexically-rich word associations are spoken in two, and sometimes three, tone units. Example 1 shows a sample of typical instances of asia’s world city being spoken in one tone unit, and also in more than one tone unit, and these are discussed below (see the Appendix for a list of prosodic transcription conventions).
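The distinction drawn above between an association spoken within one tone unit and one spread across tone units can be sketched as a simple check on a transcribed stretch. The sketch assumes, from the transcribed examples in this chapter, that braces delimit tone units and that everything other than letters, apostrophes and hyphens is prosodic markup. The function names are hypothetical, and the second test string is invented for illustration rather than taken from the corpus.

```python
import re

TONE_UNIT = re.compile(r"\{([^{}]*)\}")          # text between one pair of braces
WORD = re.compile(r"[A-Za-z]+(?:['’-][A-Za-z]+)*")  # letters, with inner ' or -


def tone_units(transcript):
    """Split a transcript into tone units, each a list of lower-cased words."""
    return [[w.lower() for w in WORD.findall(body)]
            for body in TONE_UNIT.findall(transcript)]


def placement(transcript, phrase):
    """Return 'single', 'split' or 'absent' for a contiguous word association."""
    units = tone_units(transcript)
    n = len(phrase)
    if any(u[i:i + n] == phrase for u in units for i in range(len(u) - n + 1)):
        return "single"
    flat = [w for u in units for w in u]  # ignore tone unit boundaries
    if any(flat[i:i + n] == phrase for i in range(len(flat) - n + 1)):
        return "split"
    return "absent"


if __name__ == "__main__":
    quoted = r"{\/ [IF] you can make ONE country two <SYStems> work}"
    invented = r"{= the [FREE] <FLOW>}{\ of <inforMAtion>}"  # illustrative only
    print(placement(quoted, ["one", "country", "two", "systems"]))
    print(placement(invented, ["free", "flow", "of", "information"]))
```

Dividing the number of 'single' instances by the total number of instances of an association gives the kind of percentage reported in Table 3.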

Example 1

asia’s world city spoken in a single tone unit
{\to [conSOlidate] our <poSItion>}{\as [Asia’s] {= <mainTAIN>}{\our <poSItion>}{\as [Asia’s] = for }{\}{/as [Âsia’s] d }{= our <poSItion>}{\as [Asia’s]
world }{= a } * world } * {= world }{\/we are }{/}{/}{\/and the <MOST> NG] kong is }{= be}{=}{= <WORLD>}{\}{= a [SET] our selves a}{= to become}{=}{= <WORLD>}{\}{= as [WELL] ET> our selves}{\a}{= to be }{\[WORLD]}{\[WE] are}{\it is [MOST] imPORtant to have a FREE flow of }{= our [FREE] < LD] by an indePENdent <juDIciary>}{\/the [FREE] flow of }{= a [^roBUST] and law}{= a [LEvel] playing }{=[_FREE] FLOW of }{= a }{= an by an [indePENdent] <juDIciary>}{\the [FREE] FLOW of }{\the [roBUST] and

free flow of information spoken across two tone units
>}{= }{\}{= the [FREE] }{\/ of }{= }{= we [ÂLso] }{= the [FREE] }{\of }{= <SOME> >}{= and <JEAlously>}{= [GUARD] the flow}{= of }{= and the

}, lines 3–4), rather than one and systems.


Example 3 1 2 3 4

b1: {\}{\er [^THAT] being the }{= is [THERE] a danger }{= if kong}{= <ER>}{= }{= }{= under the }{\political <SYStem>}{= that } {\ it’ll [afFECT] one country two <SYStems>}

In Example 4 below, the speaker also selects an atypical distribution of prominence because he is focussing on the national perspective (line 4), and then the point of view of the entire country (line 5). It is, therefore, country rather than one that is made prominent in the three instances of one country two systems in lines 2, 3 and 5–6. When this focus is dropped, and he summarises his overall position, beginning in line 10 with but I tell you, he selects the typical distribution of prominence and makes one and systems prominent ({ \/ [IF] you can make ONE country two <SYStems> work }, lines 11–12).

Example 4
1 b: {= and of }{= all this}{= <MAking> sure}
2 {= one [COUNtry] two <SYStems>}{= is [FULly] }{\[^
3 NOT] only }{= one [COUNtry] two SYStems is }
4 {= <ER>}{= [FROM] a }{?from the}{?the the the}{
5 [POINT] of view of the enTIRE }{= how [imPORtant] one
6 COUNtry two SYStems <sucCESSful>}{= }{= but the <
7 WHOLE>}{\}{= is [REALly] to }{?that our <
8 eCOnomy> is un}{\undergoing a [^treMENdous] }{= <
9 AND> that}{= the <WAY> forward}{\/ is [^NOT] going to be <EAsy>}{=
10 }{\i you}{= [I] am very }{= of [WHAT]
11 we CAN }{\/[IF] you can make ONE country two <SYStems>
12 work}{\[AS] we have }{\over the [LAST] four }

Discussion of grammatically-rich word associations

In addition to lexically-rich word associations occurring as 3-word and 4-word units, the ten most frequent grammatically-rich word associations (Table 5) are examined. Only one four-word grammatically-rich word association – the rest of the – made it onto the list; the others are all three-word ones. This is because as the number of associated words increases, the number of word associations in a corpus decreases. These grammatically-rich word associations fall into the category of ‘lexical bundles’ (Biber et al. 1999: 989–1025). They can be divided into four categories. Four of the ten are noun phrase fragments conveying a quantitative meaning (Biber et al. 1999: 1012), i.e. a lot of, one of the, the rest of the and some of the. Then there are three declarative clause segments made up of a subject pronoun and a verb phrase (ibid.: 1002–1003), i.e. we need to, I think the and we have to. There are two that-clause fragments (ibid.: 1010), i.e. that we have and that we are. Lastly, there is one prepositional phrase with an embedded of-phrase fragment (ibid.: 1017), i.e. in terms of.

Table 5. Top 10 grammatically-rich word associations

Grammatically-rich word association   Frequency in a single tone unit   % in a single tone unit   Total frequency of occurrence
1. a lot of                           140                               98.6%                     142
2. in terms of                        126                               98.4%                     128
3. one of the                         102                               85.7%                     119
4. we need to                         79                                82.3%                     96
5. I think the                        82                                94.3%                     87
6. we have to                         76                                90%                       80
7. the rest of the                    49                                70%                       70
8. that we have                       28                                44.4%                     63
9. some of the                        58                                93.5%                     62
10. that we are [4]                   30                                60%                       50

[4] The contracted form that we’re (10 instances) is not included, as it consists of two words in terms of speaker intonation, compared to the three-word that we are.

As shown in Table 5, it is possible to identify a pattern of intonation, namely the distribution of prominence, within these grammatically-rich word associations, similar to that found earlier with the lexically-rich word associations. With the exception of that we have (44.4%), all of the grammatically-rich word associations are typically spoken in a single tone unit, with the proportions ranging from 60% (that we are) to 98.6% (a lot of) of the instances studied. Again, this suggests that there may be a connection between the notion of chunking (Sinclair & Mauranen 2005, 2006) and tone unit boundaries. In other words, these word associations are co-selected as part of a chunk in the unfolding discourse and, as such, they are likely to be spoken within the same tone unit. Indeed, it could be argued that in terms of, for example, could well be written as a single word – ‘intermsof’ – because it never occurs other than in this invariant contiguous form. Sinclair (1991) calls what is in reality a single choice, such as this, a ‘semi-preconstructed phrase’.

The two grammatically-rich word associations with the lowest proportion of instances spoken in a single tone unit – that we have and that we are – are similar in that both are that-clause fragments. According to linear unit grammar (Sinclair & Mauranen 2005, 2006), they are made up of an organisational unit, that, and an incomplete message unit, we have and we are. This division between incremental and non-incremental units is where the tone unit boundary occurs when these word associations are spread across two tone units (see Example 5 below). This


mixture of organisational units with message units in the associated words is only found in one other grammatically-rich word association on the list, I think the, and this will be discussed below in relation to the distribution of prominence.

Example 5

that we have spoken across two tone units
1 }{= <SO> their}{\[COMmon] that} 2 ECted> some}{\[HEALthy] }{= 3 ry <STRONG> asset}{= and with that} 4 nt] }{\ the problems that}
{\[WE] have }{= th }{= [WE] have a of er}{\[CLOSE {= [WE] have an }{\}{\[LET] me T

that we are spoken across two tone units
1 \the }{= [HAS] to <enSURE> that} {= <WE> are}{\} {= }{= i think [^THEY] are <SAYing> that} {= <WE> have to have}{= [TWENty] 3 > be}{= }{\by }{=}{= we are [NOT] going to SHY away from }{?i mean}{\an <eXAMPle> that}{= we are to}{= [aLIGN] our <WOR

Table 6 lists the distribution of prominence for each of the grammatically-rich word associations when spoken in a single tone unit. When some of these grammatically-rich word associations are examined, a number of the patterns can be accounted for with reference to Sinclair and Renouf’s (1991) notion of a collocational framework. Further, Brazil’s (1997) notions of general versus existential paradigmatic speaker choices, which in the local context determine whether or not a syllable or a word is made prominent, are helpful here.

Table 6. Distribution of prominence in grammatically-rich word associations occurring in one tone unit

No.   Word 1            Word 2              Word 3          Word 4
1.    a 0               lot 115 (81%)       of 0
2.    in 0              terms 120 (93.8%)   of 1 (0.79%)
3.    one 102 (100%)    of 0                the 1 (0.98%)
4.    we 2 (2.5%)       need 68 (86.1%)     to 0
5.    we 9 (11.8%)      have 47 (61.8%)     to 0
6.    I 12 (14.6%)      think 33 (40.2%)    the 1 (1.2%)
7.    the 0             rest 42 (85.7%)     of 0            the 0
8.    that 10 (35.7%)   we 4 (14.3%)        have 0
9.    some 62 (100%)    of 0                the 0
10.   that 18 (60%)     we 17 (56.6%)       are 0

A number of collocational frameworks are observed, such as * of the, a * of, in * of and the * of the. In each of these collocational frameworks, a large number of possible words (either single words or combinations of words) occur that can occupy the empty slots in the framework, while the framework itself is invariant. Thus, the words in the framework, that is the grammatical core, are almost never spoken with prominence, and the word that is selected to complete the word association by occupying the framework is almost always spoken with prominence. There is a similar pattern of prominence distribution in we have to and we need to, where the first word (we) is usually not made prominent by speakers and, in the data studied, there are no instances of the last word (to) being made prominent at all. These are both cases of subject pronoun plus semi-modal, and it is the semi-modal chosen (i.e. whether a speaker says, for example, have, need or be going) that is most likely to be made prominent.

The remaining grammatically-rich word associations have relatively weaker patterns of prominence distribution. In the word association that we are, the first two words (that we) are fairly evenly spread between being made prominent and non-prominent, with the last word (are) consistently being non-prominent. In I think the and that we have, the first two words are mostly non-prominent and the last word is almost never prominent. That we are and that we have have been discussed as a mixture of organisation and message units in terms of linear unit grammar (Sinclair & Mauranen 2005, 2006). This is also the case for I think the, which is a combination of an organisation unit, I think, which is the most common opine marker in spoken discourse (Stenström 1994; Cheng & Warren 2007) and the, which prospects a noun or noun phrase, and is thus an incomplete message unit.
The fact that these particular grammatically-rich word associations cut across the two types of chunks found in spoken discourse may account for the different patterns of prominence distribution found in them, compared to the other grammatically-rich word associations on the list which are all parts of message elements.

Conclusions

This study represents a first attempt at examining the relationship between the phraseological characteristics of language and the communicative role of discourse intonation. Other studies have looked at lexical patterning in terms of word associations, but this is the first corpus-based study of speakers’ discourse intonation choices for these patterns. The main finding of this study has been to confirm Brazil’s (1997) claim that intonation is context-specific rather than word-specific. The occurrence of word associations in a single tone unit is found to be a strong tendency across both
lexically-rich and grammatically-rich word associations. Patterns of prominence distribution are found to be less predictable, although there are describable patterns which can be explained in terms of the notion of ‘situational informativeness’ (Brazil 1997). This confirms that a speaker’s decision to make a syllable or word prominent or not is not determined by grammar or word-accent, but rather by the dynamics of the discourse context. While patterns of discourse intonation have been identified and discussed, it is rare to find an intonation pattern that is one hundred percent consistent for a particular word association. Only three of the ten lexically-rich word associations have 100% occurrence in a single tone unit. They are hong kong people, the central government and the basic law. With respect to the patterns of prominence distribution (shown in Tables 3 and 5) among the words in the association, none of the lexically-rich word associations have a fixed pattern of prominence distribution within a single tone unit, and this is found to be the case only for one grammatically-rich word association, namely some (100% prominent) of (100% non-prominent) the (100% non-prominent). The patterns in terms of tone unit boundaries and the distribution of prominence among the word associations studied that do exist are found to be both quite strong and widespread. Thus most of the word associations are spoken in a single tone unit most of the time, and the typical distribution of prominence in these instances of the word association can be explained by Brazil’s (1997) notions of existential and general paradigms. In addition, the principles underpinning linear unit grammar (Sinclair & Mauranen 2005, 2006) have also been found to be useful to explain word associations that are atypical with regard to discourse intonation and also for those which conform to the general patterns identified. 
It has been argued that when a grammatically-rich word association contains parts of both organisation-oriented and message-oriented units, this may explain in part the differing patterns of discourse intonation. This phenomenon has been found to be confined to some of the grammatically-rich word associations because all of the lexically-rich word associations examined are message-oriented. In the case of the lexically-rich word associations, some evidence suggests that the discourse intonation patterns of word associations may change over time, as the status of the word association changes, as in the case of asia’s world city. If word associations such as asia’s world city become widely used, it would be interesting to see whether the patterning stabilizes. These last two observations, one relating to grammatically-rich and the other to lexically-rich word associations, suggest that there has been some merit in making a distinction between these two forms of word associations. Finally, while discourse intonation patterns are discernible, it needs to be emphasised that speakers may, and indeed do, deviate from them in order to alter their discourse-specific communicative role and negotiate the discourse-specific meaning in naturally occurring speech.



Acknowledgements

The work described in this paper was substantially supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region (Project No. G-YE86). The authors are grateful to the editors for their insightful comments and suggestions.

References

Barnbrook, G. 1996. Language and Computers: A practical introduction to the computer analysis of language, 88–106. Edinburgh: EUP.
Bartsch, S. 2004. Structural and Functional Properties of Collocations in English: A corpus study of lexical and pragmatic constraints on lexical co-occurrence. Tübingen: Narr.
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Longman.
Biber, D., Conrad, S. & Reppen, R. 1998. Corpus Linguistics: Investigating language structure and use. Cambridge: CUP.
Brazil, D. 1985. The Communicative Value of Intonation. Birmingham: English Language Research.
Brazil, D. 1995. A Grammar of Speech. Oxford: OUP.
Brazil, D. 1997. The Communicative Value of Intonation in English. Cambridge: CUP.
Burdine, S. 2001. The lexical phrase as pedagogical tool: Teaching disagreement strategies in ESL. In Corpus Linguistics in North America, R. Simpson & J. M. Swales (eds), 195–210. Ann Arbor MI: University of Michigan Press.
Cheng, W. & Warren, M. 2007. I would say be very careful of. . .: Opine markers in an intercultural business corpus of spoken English. In Managing Interaction in Professional Discourse: Intercultural and interdiscoursal perspectives, M. Bondi & J. Bamford (eds), 46–57. Rome: Officina Edizioni.
Cheng, W., Greaves, C. & Warren, M. 2005. The creation of prosodically transcribed intercultural corpus: The Hong Kong Corpus of Spoken English prosodic. International Computer Archive of Modern English (ICAME) Journal 29: 5–26.
Cheng, W., Greaves, C. & Warren, M. 2006. From n-gram to skipgram to concgram. International Journal of Corpus Linguistics 11(4): 411–433.
Clear, J. 1993. From Firth principles: Computational tools for the study of collocation. In Text and Technology: In honour of John Sinclair, M. Baker, G. Francis & E. Tognini-Bonelli (eds), 271–292. Amsterdam: John Benjamins.
Cowie, A. P. 1998. Phraseology: Theory, analysis, and applications. Oxford: Clarendon Press.
Cresti, E. & Moneglia, M. (eds). 2005. C-ORAL-ROM: Integrated Reference Corpora for Spoken Romance Languages. Amsterdam: John Benjamins.
Du Bois, J. W. & Englebretson, R. 2005. Santa Barbara Corpus of Spoken American English, Part 4. Philadelphia PA: Linguistic Data Consortium.
Francis, G. 1993. A corpus-driven approach to grammar: Principles, methods and examples. In Text and Technology: In honour of John Sinclair, M. Baker, G. Francis & E. Tognini-Bonelli (eds), 137–156. Amsterdam: John Benjamins.
Hewings, M. (ed.). 1990. Papers in Discourse Intonation. Birmingham: English Language Research.
Hewings, M. & Cauldwell, R. 1997. Foreword. In The Communicative Value of Intonation in English, D. Brazil, v–vii. Cambridge: CUP.
Hoey, M. 2005. Lexical Priming: A new theory of words and language. London: Routledge.
Hunston, S. 1995. A corpus study of some English verbs of attribution. Functions of Language 2(2): 133–158.
Hunston, S. & Francis, G. 2000. Pattern Grammar: A corpus-driven approach to the lexical grammar of English. Amsterdam: John Benjamins.
Knowles, G., Wichmann, A. & Alderson, P. (eds). 1996. Working with Speech. London: Longman.
Mindt, D. 1991. Syntactic evidence for semantic distinctions in English. In English Corpus Linguistics: Studies in honour of Jan Svartvik, K. Aijmer & B. Altenberg (eds), 182–196. London: Longman.
Nesselhauf, N. 2005. Collocations in a Learner Corpus [Studies in Corpus Linguistics 14]. Amsterdam: John Benjamins.
Sinclair, J. M. 1985. On the integration of linguistic description. In Handbook of Discourse Analysis, Vol. 2, T. A. van Dijk (ed.), 13–28. London: Academic Press.
Sinclair, J. M. 1987. Looking Up: An account of the COBUILD project in lexical computing. London: Collins.
Sinclair, J. M. 1991. Corpus, Concordance, Collocation. Oxford: OUP.
Sinclair, J. M. 1996. The search for units of meaning. TEXTUS IX(1): 75–106.
Sinclair, J. M. 2004a. Trust the Text. London: Routledge.
Sinclair, J. M. 2004b. English Collocation Studies. London: Continuum.
Sinclair, J. M. & Mauranen, A. 2005. Degenerate data. Workshop held at AAACL 6 and ICAME 26, University of Michigan, 12–15 May 2005.
Sinclair, J. M. & Mauranen, A. 2006. From text to tree: LUG, LUM and PUB. Third Inter-Varietal Applied Corpus Studies (IVACS) International Conference: Language at the Interface, University of Nottingham, UK, June 23–24, 2006.
Sinclair, J. M. & Renouf, A. 1991. Collocational frameworks in English. Reprinted in J. M. Sinclair on Lexis and Lexicography, J. A. Foley (ed.), 1996, 55–71. Singapore: Unipress.
Stenström, A.-B. 1994. An Introduction to Spoken Interaction. London: Longman.
Stubbs, M. 1995. Collocations and semantic profiles: On the cause of the trouble with quantitative methods. Functions of Language 2(1): 23–55.
Svartvik, J. (ed.). 1990. The London-Lund Corpus of Spoken English: Description and research [Lund Studies in English 82]. Lund: University of Lund.
Tognini-Bonelli, E. 2001. Corpus Linguistics at Work. Amsterdam: John Benjamins.
Warren, M. 2006. A corpus-driven analysis of back-channels. Third Inter-Varietal Applied Corpus Studies (IVACS) International Conference: Language at the Interface, University of Nottingham, UK, June 23–24, 2006.
Wichmann, A. 2000. Intonation in Text and Discourse. London: Longman.



Appendix

iConc© and computer-readable prosodic transcription conventions

Tone group boundaries are marked with ‘{}’ brackets. The referring and proclaiming tones are shown using combinations of forward and back slashes: rise ‘/’, fall-rise ‘\/’, fall ‘\’, and rise-fall ‘/\’. Level tones are marked ‘=’ and unclassifiable tones ‘?’. Prominence is shown by means of UPPER CASE letters. Key is marked with ‘[]’ brackets; high key and low key are indicated with ‘^’ and ‘_’ respectively, while mid key is not marked (i.e. it is the default). Termination is marked with ‘<>’ brackets, with high, mid, and low termination using the same forms of notation used for key choices.
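The conventions above are regular enough to be read by a short program. The following Python sketch is illustrative only and is not part of iConc: it assumes that the tone symbol is the first item inside each tone unit, it treats any word containing an upper-case letter as prominent, and the sample line is invented rather than taken from the corpus.

```python
import re

# Tone symbols from the appendix. Two-character tones come first so that
# "\/" (fall-rise) and "/\" (rise-fall) are not misread as a single slash.
TONES = [
    ("\\/", "fall-rise"),
    ("/\\", "rise-fall"),
    ("/", "rise"),
    ("\\", "fall"),
    ("=", "level"),
    ("?", "unclassifiable"),
]


def parse_tone_units(line):
    """Split a transcribed line into tone units and describe each one.

    Assumption (not stated in the appendix): the tone symbol is the first
    item inside the '{ }' brackets.
    """
    units = []
    for body in re.findall(r"\{([^{}]*)\}", line):
        body = body.strip()
        tone = None
        for symbol, name in TONES:
            if body.startswith(symbol):
                tone = name
                body = body[len(symbol):]
                break
        # Key and termination markers ([ ], < >, ^, _) are ignored here.
        words = re.findall(r"[A-Za-z']+", body)
        prominent = [w for w in words if any(c.isupper() for c in w)]
        units.append({"tone": tone, "words": words, "prominent": prominent})
    return units


if __name__ == "__main__":
    # Invented sample line, for illustration only.
    sample = r"{ \ i THINK [ ^ SO ] } { = the BAsic law }"
    for unit in parse_tone_units(sample):
        print(unit)
```

A fuller reader would also record the key and termination choices, which this sketch deliberately skips.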




 

Exploring discourse in news and entertainment


Who’s speaking? Evidentiality in US newspapers during the 2004 presidential campaign

Gregory Garretson and Annelie Ädel
Boston University and University of Michigan, USA

We examine a corpus of texts drawn from 11 US newspapers and related to the 2004 US presidential election, focusing on hearsay evidentiality, the reporting of what one has heard from others. Motivated by the general question of whether bias exists in news reporting, we analyze the sources to whom statements in the corpus are attributed, in order to determine who gets to speak through the press, and whether there is balance between the two sides in this election. We also examine the ways in which speech is reported, asking questions about the use of direct vs. indirect speech, the explicitness of source identification, and the effects that the choice of reporting word can have on the portrayal of a source. Although we find slight evidence of an apparent preference for one candidate or the other in certain papers, overall we find no statistically significant differences that could be construed as bias.

Introduction

The reporting of news consists, more than anything else, in conveying what others have said. Journalists are expected to investigate newsworthy events and issues, gathering information from a range of sources, synthesizing this information, and reporting it to readers. To the extent that a bit of information is controversial or comes from a specific source, the journalist is expected to report the source. Further, it is expected that a journalist will refrain as much as possible from inserting his or her own opinions or biases into a story. Many criticisms have been leveled at journalists for failing to remain unbiased; the question of whether these criticisms are valid is one of the motivations for this study.


Bias and newspaper reporting

With few exceptions, US news organizations, and among them, newspapers, claim to be dedicated to the unbiased reporting of news.¹ Nevertheless, there is a popular perception that journalists are biased and untrustworthy. Niven (2002: ix), for example, cites a 2000 Gallup poll in which newspaper reporters were considered to have lower honesty and ethics than all other professions except car salesmen, insurance salesmen, and advertising professionals. Indeed, for several decades, a steady stream of accusations has been made that the US news media are biased. More often than not, these claims are made by conservatives, who accuse the media of having a liberal bias (Watts et al. 1999). Linguists, too, have warned of journalists failing to remain impartial. While some linguists, such as Biber et al. (1999: 9), characterize newspaper texts as “carefully crafted texts with little overt evidence of personal opinions,” others, such as Bhatia (2004: 73–74), offer more critical characterizations:

    In newspapers [. . .] objective news reporting has long been regarded as a socially recognized communicative purpose of the genre of news reporting; however, we often find well-established news-reporters giving what they think are legitimate slants to the events of the day, often mixing factual reporting with elements of opinions or interpretations in their writing.

Nevertheless, various studies that have attempted to investigate empirically the question of bias in newspaper reporting have found no compelling evidence of political bias. A meta-analysis of 59 such studies concluded that “no significant biases were found for the newspaper industry” (D’Alessio & Allen 2000: 133). Subsequently, Niven (2002) – possibly the most carefully designed empirical study to date looking for bias in newspaper reporting – found no evidence of liberal or conservative bias in a sample of hundreds of articles from 150 US newspapers, using measures such as number of articles, article length, article placement, and subjective assessments of whether coverage was favorable or unfavorable. The possibility still exists, however, that there are other measures by which we might find trends in reporting that could be construed as representing bias. In particular, there may be features of news reporting that linguists are in an especially good position to investigate.

One textual phenomenon that linguists have investigated is the reporting of others’ speech. The use of reported speech in journalistic discourse has been examined from different perspectives: to show how reported speech may reinforce “social, cultural and ideological relations” (Caldas-Coulthard 1994: 307); to give a comprehensive analysis of the functional nature of journalistic text in a particular newspaper (Waugh 1995); to show how news broadcasts assign to story participants various roles and attributes (Leitner 1986); and to connect reporting verbs having negative and positive connotations to bias in the press (Floyd 2000). Reported speech is certainly a prominent feature of news reports; corpus-based analyses have shown that reporting clauses typically occur over 2,000 times per million words in news (Biber et al. 1999: 923).

In this study we have opted to examine US newspaper reporting in the context of the 2004 presidential election, a point in history at which the role of the media in shaping public opinion was especially salient. This election was widely heralded as “the most important election of our lifetime” by both Democrats and Republicans and was marked by an unusually high level of partisan division within the US population.² With the race in a dead heat up to the very end, the climate before the election was very politically charged, and a great deal of newspaper coverage was dedicated to the election. Ultimately, George Bush, the Republican candidate and incumbent, defeated Democratic challenger John Kerry by a margin of 50.7% to 48.3% of the popular vote. This study examines news reports related to this election for possible evidence of political bias, whether intended or unintended. Specifically, we look at the phenomenon of hearsay evidentiality from a variety of perspectives.

1. Many US news organizations make their claims to objectivity and impartiality public in the form of guidelines and ethics codes, which the reader is invited to inspect. For example, as of October 2006, the New York Times, the Boston Globe, and the Los Angeles Times made their guidelines public via the Web.

2. Interestingly, an internet search for the phrase “the most important election of our lifetime” coupled with the term “2004” using the Google search engine retrieved more than 12,000 hits from a range of different sources in August 2006.

Evidentiality in language

As mentioned above, some linguists have emphasized the difficulty – perhaps impossibility – of reporting what others have said without the opinions or interpretations of the one reporting becoming a part of the message (see, e.g., Lucy 1993; Caldas-Coulthard 1994; Fairclough 1995). In fact, in some languages, when a speaker makes a statement conveying information, the source from which the speaker received this information must be indicated through grammatical marking. This marking of source is referred to as evidentiality. For example, in Tariana, an Arawak language spoken in northwest Amazonia, the sentence José has played football could be realized grammatically in five different ways, depending on the source of the speaker’s knowledge, as illustrated in (1) with evidential morphemes shown in italics (example from Aikhenvald 2004: 1–3).

(1) a. Juse  irida     di-manika-ka
       José  football  3sgnf-play-rec.p.vis
       ‘José has played football (we saw it)’
    b. Juse  irida     di-manika-mahka
       José  football  3sgnf-play-rec.p.nonvis
       ‘José has played football (we heard it)’
    c. Juse  irida     di-manika-nihka
       José  football  3sgnf-play-rec.p.infr
       ‘José has played football (we infer it from visual evidence)’
    d. Juse  irida     di-manika-sika
       José  football  3sgnf-play-rec.p.assum
       ‘José has played football (we assume this on the basis of what we already know)’
    e. Juse  irida     di-manika-pidaka
       José  football  3sgnf-play-rec.p.rep
       ‘José has played football (we were told)’

In English, by contrast, there is neither grammatical nor obligatory marking of evidentiality. Rather, if the source of information is indicated at all, this may be done through a variety of linguistic means, as in (2).

(2) a. I saw him play football.
    b. I heard him out on the football field.
    c. Judging from his clothes, he’s been playing football.
    d. I guess he must have been playing football.
    e. Someone said he was playing football.

According to Roman Jakobson (1998: 316), this lack of grammatical marking of evidentiality in English led the anthropologist Franz Boas to exclaim “What a pity! It would be so useful for the New York newspapers!” Indeed, how do newspapers present their sources of information? This study takes an exploratory look at newspapers’ use of what Chafe (1986) calls hearsay evidentiality, the reporting of what we have heard from others, corresponding to (1e) and (2e) above. This is the presentation of information that has been conveyed linguistically by someone else, which, as we have noted, is the essence of journalists’ work. A reporter’s job is to gather linguistic data from various sources, synthesize it, and present it to readers. This process involves a tremendous number of choices concerning where to go for information and which information to include in the report. Might certain decisions be construed as constituting bias? It is from this perspective that we approach the current study. We choose to refer to the topic of investigation here as hearsay evidentiality (hereafter simply evidentiality) rather than as reported speech, even though reported speech is the primary material we investigate, for the reason that we are
also concerned with the question of which sources get to speak – a question that is prior to and in some measure independent of the question of how the speech gets reported.

Research questions

This study examines evidentiality in a corpus of news reports on the 2004 US presidential election in 11 prominent US newspapers during the month before the election. The analysis is guided by the following questions:

i. What sources are cited in these news articles?
ii. Is there a balance between sources from opposing sides?
iii. Are different sources treated differently by the journalist?
iv. To what extent are sources identified explicitly?
v. To what extent is direct vs. indirect reported speech used?
vi. Do the factors in (i) to (v) vary across newspapers?

As this is essentially an exploratory study, we will not attempt to make any resounding claims regarding media bias on the basis of the analyses presented here. Nevertheless, we hope that the methodologies employed here will prove useful in pointing the way toward more sophisticated studies of news texts.

Methods

We created a corpus of articles taken from leading US newspapers, consisting of news reports related to the 2004 US presidential election. We chose to draw data from the thirty days leading up to the election, when the quantity of reporting on the election was at a peak, and the stakes, in terms of possible media influence on voters, were highest.

Data collection

We selected the eleven highest-circulation daily newspapers in the US to which we had access, with the provision that no two papers from the same city should be included. This gave us papers from ten cities in nine states and one national newspaper. The newspapers are described in Table 1.³ Each of these papers has a daily circulation of over 300,000 copies. A number of these newspapers are also read online via the Web; however, we have no data on such usage. Our circulation figures are therefore based solely on printed newspapers. These figures mean that if only one person reads each copy (a low estimate), the total public reached by these papers every day is around eight million people. If we suppose that two people read each copy, these sixteen million readers represent approximately 7% of the voting-age population of the US.⁴

Table 1. Description of the corpus used in the study

Newspaper                     Abbr.  City                       Circ. (2004)  Total articles  Total words  Tokens of evidentiality  Tokens/1,000 words
Atlanta Journal-Constitution  AJC    Atlanta, Georgia           371,853       143             105,423      2,530                    24
Boston Globe                  BG     Boston, Massachusetts      450,538       151             157,184      4,023                    26
Cleveland Plain Dealer        CPD    Cleveland, Ohio            365,288       79              64,703       1,686                    26
Houston Chronicle             HC     Houston, Texas             553,018       103             74,236       2,077                    28
Los Angeles Times             LAT    Los Angeles, California    914,584       212             234,893      5,978                    25
Minneapolis Star Tribune      MST    Minneapolis, Minnesota     380,354       141             149,625      3,513                    23
New York Times                NYT    New York, New York         1,118,565     210             233,748      6,506                    28
San Francisco Chronicle       SFC    San Francisco, California  512,640       119             138,646      3,162                    23
St. Petersburg Times          SPT    St. Petersburg, Florida    334,742       86              96,762       2,075                    21
USA Today                     UST    nationally distributed     2,154,539     227             179,788      3,947                    22
Washington Post               WP     Washington, D.C.           732,872       297             301,480      8,012                    27
TOTAL                                                           7,888,993     1,768           1,736,488    43,509

3. Note that while these newspapers are fairly well distributed geographically, the method of selection naturally privileges large urban areas, most – though not all – of which traditionally vote Democratic (Sauerzopf & Swanstrom 1999). It would be interesting to compare this corpus to data from smaller newspapers in areas that tend to vote Republican.

4. The circulation figures are from the Newspaper Association of America, a non-profit organization representing the US newspaper industry (http://www.naa.org).

The articles were all drawn from the front section of the newspaper where national news is reported. Each article was required to be at least 400 words long and mention both Bush and Kerry, the two candidates. Editorials, letters from readers, and the like were excluded. We collected all such articles published within the thirty days before the election in each of the eleven newspapers, which resulted
in a corpus of 1.74 million words. The corpus files were saved in XML format, which facilitated all of the subsequent processing. The next step was to locate instances of evidentiality in the corpus. Although other possibilities exist for reporting information heard from others, we chose to focus on instances of reporting involving a reporting verb, a reporting noun, or a prepositional phrase (according to) which serves this function. We will refer to these collectively as reporting words. Adjectival forms (e.g., stated in the stated purpose of . . .) were not included. We created an extensive list of potential reporting words likely to occur in news reports based on previous research, reference grammars and thesauri. Some verbs were excluded on the basis that they are not likely to occur in this particular genre; for example, mental verbs (e.g., he thought. . .), which are common in fiction, are uncommon in news (Floyd 2000). This resulted in a set of approximately 120 lemmas (e.g., state), most of which may be realized in several surface forms (e.g., state, states, stated, stating). With a view to improving precision rates for our search terms, we performed extensive concordance searches on our corpus to find patterns of usage that were not instances of evidentiality, weeding out spurious examples in cycles (e.g., for the lemma state, we excluded the United States). This left us with over 43,000 tokens of reporting words. The number of tokens found in each newspaper is given in Table 1 above. To test the accuracy of our methods, we checked recall and precision by examining randomly selected articles both for missed instances of evidentiality and for inappropriate tokens. We found that 93% of all potential tokens of evidentiality had been included (recall), and that 94% of tokens selected were indeed appropriate (precision). These numbers were deemed satisfactory for the purposes of this study.
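The search-and-filter procedure described above can be sketched in a few lines. The authors report using custom Perl programs; the Python below is only an illustration of the general idea, not their code. The three-lemma lexicon, the single exclusion pattern, and all function names are hypothetical stand-ins for the roughly 120 lemmas and the exclusion cycles used in the study, and the sketch makes no attempt to filter adjectival uses.

```python
import re

# Hypothetical mini-lexicon: lemma -> surface forms.
REPORTING_FORMS = {
    "state": ["state", "states", "stated", "stating"],
    "say": ["say", "says", "said", "saying"],
    "according to": ["according to"],
}

# Hypothetical exclusion patterns for spurious hits such as "the United States".
EXCLUSIONS = [re.compile(r"\bUnited States\b", re.IGNORECASE)]


def find_reporting_tokens(text):
    """Return (offset, lemma, surface form) for each reporting word found."""
    # Blank out excluded spans with spaces so that character offsets are kept.
    masked = text
    for pattern in EXCLUSIONS:
        masked = pattern.sub(lambda m: " " * len(m.group()), masked)

    hits = []
    for lemma, forms in REPORTING_FORMS.items():
        # Longest forms first, so "stated" is tried before "state".
        ordered = sorted(forms, key=len, reverse=True)
        alternation = "|".join(re.escape(form) for form in ordered)
        for match in re.finditer(rf"\b(?:{alternation})\b", masked, re.IGNORECASE):
            hits.append((match.start(), lemma, text[match.start():match.end()]))
    return sorted(hits)


def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: share of retrieved tokens that are appropriate.
    Recall: share of all genuine tokens that were retrieved."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


if __name__ == "__main__":
    sentence = ("Kerry stated that the United States needs a new plan, "
                "according to aides, who said more.")
    for hit in find_reporting_tokens(sentence):
        print(hit)
```

In a check of the kind the authors describe, the counts passed to `precision_recall` would come from manually inspecting randomly selected articles for missed and inappropriate tokens.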

Samples of the data

The data were analyzed at three different levels of detail. The first level is the corpus as a whole (43,509 tokens). Because it was not feasible to manually code the entire corpus, manual coding was performed on a subsample, which we will call Sample A, containing 200 randomly selected tokens from each newspaper (2,200 tokens total). We will also pay special attention to a further subsample, which we will call Sample B, consisting of the Sample A tokens from four newspapers of special interest (800 tokens total). In the Results and Discussion section below, data will be presented for each level as appropriate.
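The sampling scheme amounts to drawing a fixed number of tokens at random from each newspaper and then restricting the result to a subset of papers. The Python sketch below illustrates this under stated assumptions: tokens are represented simply by integer identifiers, the per-paper totals are those given in Table 1, the seed is arbitrary, and the four papers chosen for Sample B are a hypothetical selection, since the passage above does not name them.

```python
import random

# Tokens of evidentiality per newspaper, as given in Table 1.
TOKENS_PER_PAPER = {
    "AJC": 2530, "BG": 4023, "CPD": 1686, "HC": 2077, "LAT": 5978, "MST": 3513,
    "NYT": 6506, "SFC": 3162, "SPT": 2075, "UST": 3947, "WP": 8012,
}


def draw_sample_a(token_ids_by_paper, per_paper=200, seed=2004):
    """Draw the same number of tokens, without replacement, from every paper."""
    rng = random.Random(seed)  # fixed seed so that the draw is reproducible
    return {
        paper: sorted(rng.sample(ids, per_paper))
        for paper, ids in token_ids_by_paper.items()
    }


def draw_sample_b(sample_a, papers_of_interest):
    """Keep only the Sample A tokens that belong to the selected papers."""
    return {paper: sample_a[paper] for paper in papers_of_interest}


if __name__ == "__main__":
    token_ids = {paper: range(n) for paper, n in TOKENS_PER_PAPER.items()}
    sample_a = draw_sample_a(token_ids)
    # Hypothetical choice of four papers, for illustration only.
    sample_b = draw_sample_b(sample_a, ["BG", "HC", "NYT", "WP"])
    print(sum(len(ids) for ids in sample_a.values()))
    print(sum(len(ids) for ids in sample_b.values()))
```

Sampling an equal number of tokens per paper, rather than in proportion to each paper's total, gives every newspaper the same weight in the manual analysis regardless of how much it published.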


Automated and manual data analysis

We performed both automated analyses using computer programs and manual analyses involving hand-coding of the data. The primary advantage of the automated analyses is the amount of data that may be processed – it was not possible to code 43,000 tokens by hand. The primary advantage of the manual analyses is the ability to leverage the subtleties of human judgment about matters such as political affiliation.⁵

The automated analyses were all carried out using custom computer programs written in the programming language Perl. Primarily, these were used to examine the reporting words used in the corpus and the proportion of direct and indirect reported speech in the corpus. Programs were also written to analyze the results of the manual coding.

The manual data coding was performed using Dexter, a suite of tools designed by Gregory Garretson and freely available via the Web.⁶ Using Dexter enabled us to add codes to each of the tokens in context, which was very important given the need to determine the referent of each referring expression in the target sentences. Often, the referent was introduced quite far back in the story, and so having the context available was crucial.

The two main variables manually coded for were direct vs. indirect reported speech and identity of the source. The former variable has three possible values: Direct, Partial, and Indirect. These will be described in the Results and Discussion section. The variable identity of the source has two dimensions: the type of entity to which the information is attributed (i.e., an individual or an organization) and the political affiliation, if any, of that entity. It is possible to group these categories into general classes by political affiliation, as shown in Table 2. As might be imagined, determining the political orientation of an individual or an organization can be tricky.
We used a relatively conservative method, preferring to assume no affiliation in cases of doubt. The primary opposition in play in the corpus is between the Democratic and Republican parties and their candidates, spokespersons, and members. We also included the categories Liberal individual/organization, and Conservative individual/organization, which for all practical purposes align with Democrats and Republicans, respectively.

The categories listed in Table 2 require some explanation. Tokens attributed to the Democratic and Republican presidential and vice-presidential candidates – Kerry, Bush, Edwards, and Cheney – were given the only codes naming specific individuals. Individuals identified as affiliated with the Democratic or Republican parties (e.g., Republican National Committee spokesman Jim Dyke) were coded as Democratic official or Republican official, while statements attributed simply to the Democratic party or Republican party were coded as such. Tokens in which individuals or organizations were presented as having a clear preference for one party but not as being part of the party machinery (e.g., Bush supporters) were coded as Liberal or Conservative. This includes representatives from voter mobilization groups with a clear goal of having either Kerry or Bush elected. When individuals were presented as government officials, including members of the military, and their party affiliation was not mentioned, they were coded as Government individual (e.g., a senior government official). Representatives of special interest organizations (e.g., Carl Pope, executive director of the environmental group Sierra Club), including lobbyists and pastors, were given the code Special interest. Although the connections between these entities and politics are sometimes quite strong, it is often impossible to map them simply onto the liberal-conservative scale. Often, such individuals will vote based on an issue, rather than a party, and therefore could in theory vote for either party.

Table 2. Coding taxonomy for the variable identity of the source, arranged by affiliation class and individual/collective dimension

AFFILIATION CLASS     INDIVIDUAL                   COLLECTIVE
LIBERAL               Democratic official          Democratic party
                      Liberal individual           Liberal organization
                      Kerry
                      Edwards
CONSERVATIVE          Republican official          Republican party
                      Conservative individual      Conservative organization
                      Bush
                      Cheney
SPECIAL INTEREST      Government individual        Government organization
                      Special interest individual  Special interest organization
IMPARTIAL             Expert individual            Expert organization
                      News individual              News organization
                                                   Survey
UNKNOWN AFFILIATION   Unaffiliated individual      Unaffiliated organization
                      Undecided voter
OTHER                 Anonymous, Impersonal, Other source, Coordinated

5. For more discussion of the relative merits of automated and manual coding, and an argument for using a combination of these methods, see Garretson & O’Connor (2007).

6. See http://www.dextercoder.org. The Dexter Coder allows the user to add codes (i.e., annotations) to documents via an easy-to-use graphical interface and then perform complex searches on those codes, the text, and document metadata.

JB[v.20020404] Prn:28/04/2008; 11:15

F: SCL3108.tex / p.10 (577-627)

 Gregory Garretson and Annelie Ädel

The impartial class consists primarily of Experts, who are overwhelmingly college or university professors (e.g., John J. Pitney Jr., a political scientist at California’s Claremont McKenna College) but may also be researchers from “think tanks” and other research organizations. Typically, any affiliation such individuals have to one party or the other is not mentioned. When it was, they were coded instead as Liberal or Conservative. The other impartial categories are News agencies (e.g., the Seattle Times) and Surveys (e.g., the University of Pennsylvania’s National Annenberg Election Survey). Again, if any affiliation was mentioned (e.g., a Democratic pollster), the source was instead coded according to that affiliation. The remaining categories require careful differentiation. An Unaffiliated individual is someone who is presented as a “man/woman on the street” with no obvious link to either party (e.g., volunteer firefighter Margaret Horgan). An Undecided voter is someone presented specifically as unsure about how to vote; this category was much talked about – albeit not much talked to – by the media during this race. The category Anonymous was reserved for cases in which it was explicitly stated or strongly implied that a source had declined to be identified (e.g., according to the friend, who asked not to be identified because of their relationship). These were very rare in our corpus. By contrast, if the journalist simply failed to give a name for a source, with no indication that this was at the behest of the source (e.g., Officials say that. . .), the token was coded according to the categories above and, in addition, was given the further code Unnamed. Unlike anonymous sources, such tokens were relatively common; these tokens will be discussed below. Impersonal is a related category in which no source is mentioned at all, by dint of the use of impersonal constructions such as passive voice (e.g., It has been said that. . .). 
Finally, we coded Coordinated sources (e.g., Both Bush and Kerry agreed that. . .), and Other, a category used for terrorist spokesmen, third party candidates, and other unusual entities. Beyond the two main variables, Sample A was coded for a further set of variables, including the categories Nested and Irrealis. Nested tokens are those that occur within another quoted passage, as in (3), with one source citing a further source. These turned out to be surprisingly common in the sample, comprising over 5% of the tokens. (3) [Bush:] “Last week in our debate [. . .] Senator Kerry said our soldiers and marines are not fighting for a mistake, but also called the liberation of Iraq a ‘colossal error’.”

The code Irrealis was used in all cases in which a source was presented as not saying something, as in (4), or in which it was hypothesized that a source might say something, as in (5). Three percent of the tokens in the corpus were of this type.

(4) The always-enigmatic Neil Young made a guest appearance, saying nothing but sporting a T-shirt that bore a picture of 19th-century Indian chiefs and the legend “Homeland Security.” (5) Those conclusions could allow Kerry to argue that Bush’s rationale for invading Iraq has been further discredited.

We will return to the topic of nested and irrealis tokens in the discussion below.
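The grouping of source codes into affiliation classes set out in Table 2 can be expressed as a simple lookup. The following Python sketch is illustrative only: the original analyses were written in Perl and are not reproduced here, and the names `AFFILIATION_CLASS` and `class_counts`, as well as the sample list of coded tokens, are hypothetical.

```python
from collections import Counter

# Source codes from Table 2 mapped to their affiliation class.
# The dictionary name and layout are assumptions made for illustration.
AFFILIATION_CLASS = {
    "Democratic official": "LIBERAL",
    "Liberal individual": "LIBERAL",
    "Kerry": "LIBERAL",
    "Edwards": "LIBERAL",
    "Democratic party": "LIBERAL",
    "Liberal organization": "LIBERAL",
    "Republican official": "CONSERVATIVE",
    "Conservative individual": "CONSERVATIVE",
    "Bush": "CONSERVATIVE",
    "Cheney": "CONSERVATIVE",
    "Republican party": "CONSERVATIVE",
    "Conservative organization": "CONSERVATIVE",
    "Government individual": "SPECIAL INTEREST",
    "Special interest individual": "SPECIAL INTEREST",
    "Government organization": "SPECIAL INTEREST",
    "Special interest organization": "SPECIAL INTEREST",
    "Expert individual": "IMPARTIAL",
    "News individual": "IMPARTIAL",
    "Expert organization": "IMPARTIAL",
    "News organization": "IMPARTIAL",
    "Survey": "IMPARTIAL",
    "Unaffiliated individual": "UNKNOWN AFFILIATION",
    "Undecided voter": "UNKNOWN AFFILIATION",
    "Unaffiliated organization": "UNKNOWN AFFILIATION",
    "Anonymous": "OTHER",
    "Impersonal": "OTHER",
    "Other source": "OTHER",
    "Coordinated": "OTHER",
}


def class_counts(codes):
    """Tally affiliation classes for a list of source codes."""
    return Counter(AFFILIATION_CLASS[code] for code in codes)


# Hypothetical coded tokens, not data from the study.
sample = ["Kerry", "Bush", "Expert individual", "Republican official", "Survey"]
counts = class_counts(sample)
print(counts)
```

Keeping the mapping in one table means that a change to the taxonomy (for example, moving Government sources to a class of their own) would require no change to the counting logic.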

Results and discussion

We will now present the results of various analyses of the corpus. Specifically, we will examine the following aspects of the data: the use of direct vs. indirect speech, the proportions in which different sources are cited, the use of nested tokens, the use of unnamed sources, and the choices made in using reporting words, including how these are linked to one candidate or the other. Where appropriate, we will also look for differences between the various newspapers.

Direct vs. indirect reported speech

Even though English does not grammatically encode hearsay evidentiality, English speakers do make a distinction between direct and indirect reported speech – that is, whether we are representing verbatim what a source actually said, or whether we are rephrasing the source’s speech. This distinction is especially significant in newspaper reporting. While reporting another’s words verbatim in quotation marks is likely to increase readers’ level of confidence in the person reporting, using indirect speech allows the reporter to synthesize, smooth out, and present more succinctly the often lengthy and disfluent statements made by sources. This is, therefore, a tension within which journalists must continually work. One might suppose that the simplest question to answer about the corpus is how much direct and indirect reported speech are used. For our purposes, we define direct reported speech (hereafter direct speech) as reported speech occurring between quotation marks. However, even this simple definition raises several questions. The first is whether it ever happens that a direct quote is given without quotation marks. The literature gives different views on this. Waugh (1995: 138) states that the orthographic difference between direct and indirect speech is categorical in journalism, such that “direct speech is put in quotation marks and in italics, and indirect speech is not”, while Biber et al. (1999: 921) state that “quotation marks identifying the reported text are often missing (especially in news).” Meanwhile, Scollon & Scollon (1997: 107), contrasting newspapers in English and Chinese, state that whereas there is tremendous ambiguity and variation
in Chinese-language newspapers’ treatment of reported speech, “[i]n contrast, the English newspapers present a face of clear and unambiguous quotation.” Our studies of the present corpus material have convinced us that it does occasionally happen that direct speech is presented without quotation marks, but this is infrequent enough to be negligible in our analyses.7 A further question is what to do when only part of a statement is enclosed in quotation marks, as in (6). This issue becomes even more complex when the use of scare quotes is considered, as in (7). (6) EPA’s advisory committee for children’s health issues reacted quickly and declared that the proposed rule “does not sufficiently protect our nation’s children.” (7) Goldman, who worked at the EPA during the Clinton administration, said that the Bush administration has adopted a “nip and tuck approach” that allows utilities to spend less on pollution control at the expense of children’s health.

To handle such cases, we introduced the category of Partial, which is used in the manual coding for partial direct reported speech, defined as anything in quotation marks that is less than a full clause. We chose not to distinguish between scare quotes and other brief spans of quoted material, as the functions served by these are likely to be manifold and difficult to differentiate. An automated analysis of the use of quotation marks in relation to each token in the whole corpus yielded the results shown in Table 3. This analysis was not able to distinguish partial from direct reported speech, so in Table 3 these are treated as a single category. However, the automated analysis did offer another way of looking at quotation use: counting the number of words within quotation marks in the different newspapers. The last two columns of the table present, for each paper, the overall amount of quoted material (per 1,000 words) and the average length of direct quotations.

7. Another potential problem is raised by Caldas-Coulthard (1994), who claims that direct quotations are often fabricated, wholly or in part, by reporters. As we are not in a position to check the veracity of the direct quotations in our corpus, we have no choice but to take these newspapers’ word for their accuracy. Moreover, it seems quite unlikely that reporters would fabricate information concerning widely-documented events. One possible way of investigating this would be to compare several newspapers’ versions of the same speech event, much as in Scollon & Scollon (1997).

Table 3. Direct vs. indirect reported speech in the corpus

| Paper | Direct or partial reported speech | Indirect reported speech | Words in quotes per 1,000 words | Average words in quotes per token of direct speech |
| --- | --- | --- | --- | --- |
| AJC | 43% | 57% | 222 | 23.1 |
| SFC | 42% | 58% | 217 | 24.0 |
| MST | 40% | 60% | 205 | 23.2 |
| WP | 39% | 61% | 190 | 19.6 |
| CPD | 38% | 62% | 224 | 24.2 |
| LAT | 38% | 62% | 195 | 21.3 |
| NYT | 37% | 63% | 202 | 20.9 |
| SPT | 37% | 63% | 199 | 26.9 |
| UST | 37% | 63% | 139 | 18.0 |
| BG | 36% | 64% | 220 | 25.7 |
| HC | 32% | 68% | 178 | 21.4 |
| Average | 38% | 62% | 199 | 22.6 |

If we consider the data in Table 3 together with the last column of Table 1, we see the following: The frequency of evidentiality found in each newspaper varies from 21 to 28 tokens per 1,000 words. The Houston Chronicle and the New York Times have the most, while the St. Petersburg Times and USA Today have the least. Meanwhile, the number of words within direct quotes is highest in the Cleveland Plain Dealer and the Atlanta Journal-Constitution and lowest in USA Today and the Houston Chronicle. The number for USA Today – 139 per 1,000 – is startlingly low, even compared to the Chronicle. This allows us to characterize selected papers: Not only does USA Today have very few tokens of evidentiality in general, but when direct quotes do occur, they are very short – at 18 words, the paper has the shortest average direct quotation length. By contrast, the Houston Chronicle has many tokens of evidentiality, but the lowest ratio of direct to indirect speech. Therefore, it also ends up with relatively few words in quotes overall. At the other end of the spectrum is the Atlanta Journal-Constitution, which has the highest ratio of direct to indirect speech, and therefore boasts a high number of words in quotes, despite the fact that it is only average in terms of overall token count. Unfortunately, one question our study design does not enable us to answer is to what extent each newspaper prints statements that should be attributed to a source but are not. We merely examine existing tokens of evidentiality and must leave that question to future studies.

As is quite clear from Table 3, the overall balance between direct and indirect speech across the papers is approximately 40% vs. 60%, with only a moderate degree of variation. We may ask what proportion of the direct speech is actually partial direct speech; the manual coding of Sample A enables us to shed some light on this. We found the ratio of partial direct to direct to indirect speech in that sample to be approximately 1:3:6 for all papers. That is, overall, 10% of the tokens were
realized as partial direct speech, 30% as direct speech, and 60% as indirect speech, which tallies with the results from the whole corpus. The predominance of indirect speech here is in line with Waugh’s statement (1995: 149) that indirect speech is “the unmarked type of reported speech in journalism.” Journalists appear to resolve the tension between fidelity to the source and clarity of expression in favor of the latter, at least by a small margin. The use of partial direct speech appears to represent a relatively attractive compromise, one which allows the journalist to frame a source’s speech as he or she sees fit, while including verbatim what might be seen as the most significant material of the source’s statements, as in (8) and (9). (8) Crichton said that, as a colleague, Daly is “hard-working, collaborative and intelligent.” (9) Catholic Answers, an independent organization based in El Cajon, Calif., denounces abortion as “intrinsically evil” [. . .]

These results suggest that in these papers, on balance, wording is subordinate to meaning, and that the specific words of a source are repeated only when they best convey the central meaning of the message. Otherwise, it is the job of the journalist to repackage the meaning in a form that, without distorting it, renders it more genre-appropriate or easier for the reader to take in. It seems clear, then, that newspapers have come to a (probably unspoken) consensus that sources need be quoted directly no more than half of the time. Presumably, it is considered more important to identify one’s sources than to cite them verbatim. In the next section, we look into the question of what sources are given a voice in this corpus; in the following sections, we will focus on the issue of nested citations and the question of how often sources go unidentified.
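The two quotation measures reported in Table 3 (words in quotes per 1,000 words and average words per direct quotation) could be approximated along the following lines. This is a sketch in Python rather than the authors' Perl programs, which are not reproduced in the chapter; the function name `quote_stats`, the regular expression, and the sample sentence are assumptions. The sketch treats every span between double quotation marks as quoted material and, like the automated analysis described above, does not distinguish partial from direct speech.

```python
import re

# Matches text between straight or curly double quotation marks.
QUOTE_RE = re.compile('["\u201c]([^"\u201c\u201d]+)["\u201d]')


def quote_stats(text):
    """Return (words in quotes per 1,000 words, mean words per quoted span)."""
    total_words = len(text.split())
    spans = QUOTE_RE.findall(text)
    quoted_words = sum(len(span.split()) for span in spans)
    per_thousand = 1000 * quoted_words / total_words if total_words else 0.0
    mean_length = quoted_words / len(spans) if spans else 0.0
    return per_thousand, mean_length


# Invented example sentence, not taken from the corpus.
sample = 'The senator said the plan "will not work" and called it "a mistake" today.'
per_thousand, mean_length = quote_stats(sample)
print(round(per_thousand, 1), mean_length)
```

A production version would need to handle quotations that run across paragraph boundaries and single quotation marks used for nested quotes, both of which this sketch ignores.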

Sources cited in the corpus

If the majority of information conveyed in news reports comes from outside sources, then which sources are selected, and in what proportion, is a significant question. We can expect to find a range of different sources in news reports, which tend to rely on a large knowledge base; as Waugh (1995: 132) puts it, “while writing their reports, reporters try to corroborate the claims, especially the controversial claims, made in a story, they try to get different perspectives on a given story, they turn to experts for verification of specific points, and so forth.” Do the newspapers in this corpus present different perspectives on the US presidential election? Which sources do they give voice to, and in what proportion? In this section, we answer these questions, looking first at all eleven papers in the aggregate, and then at four individual papers. Because the determination
of sources’ identities and affiliations requires careful examination, results will be presented only for Samples A and B. Figures 1 and 2 present the distribution of sources by type at different levels of granularity. Figure 1 presents only the totals for the broad affiliation classes, while Figure 2 presents the same data broken down into individual categories of source type (with a few pairs merged in the interest of clarity). The affiliation classes are shown in the legend.

Figure 1. Relative frequency of sources with different affiliations in Sample A
[Bar chart titled “Sources by affiliation class in Sample A”. Axes: affiliation class (Liberal, Conservative, Special interest, Impartial, Unaffiliated, Other) against percentage of total (scale 0 to 0.3).]

Figure 2. Relative frequency of source types cited in Sample A

Beginning with Figure 2, the first thing we may note is that at 12.8%, John Kerry, the Democratic challenger, is the single most-cited source, ahead of incumbent George Bush at 11.0%.8 However, we hasten to point out that in a data set of this relatively small size (2,200 tokens), this difference is not statistically significant.9 The next four most-cited sources are expert individuals/organizations, Republican officials, unaffiliated individuals, and Democratic officials. Note that while Kerry is cited more than Bush, Republican officials are cited more than Democratic ones. If we want to know what the overall balance is between Democratic- and Republican-affiliated sources, we may examine Figure 1. This shows that overall, liberal sources outweigh conservative ones by a slim margin of 27% to 25%, which is not statistically significant. As a side analysis, we also calculated the proportions of direct, partial, and indirect speech corresponding to the liberal and conservative affiliation classes. The results show no difference whatsoever – liberal and conservative sources are treated exactly the same in terms of how often they are cited verbatim. The liberal and conservative classes are a full 10 percentage points more common than the next one, special interest sources, which, as discussed above, includes government, lobbyist, religious, and other sources. Slightly less common are the purportedly impartial sources, including experts, news organizations, and surveys. It may be of interest that unaffiliated sources – those individuals and organizations presented as not having a connection to either side in the election – represent the smallest of the six groups. Within this group, the Undecided voter category, a frequent topic of discussion during the final weeks of the race, represents less than 1% of the sources cited. It is also worth noting that, in a country where barely 50% of the adult population votes (an unusually high 58% voted in 2004), newspapers – and the media in general – dedicate very little attention to those citizens who do not vote.
Clearly, the main reason for this is that such people tend not to be “newsmakers.” However, if one of the purposes of the press is to safeguard the democratic process in the republic, it would seem advisable to investigate the reasons why so many citizens appear to be disenfranchised from that system. The fact that special interest groups are given a relatively large amount of print complicates the picture, as many of these groups are speaking in support of one candidate or the other. Similarly, even among the unaffiliated sources, there are those who speak out in preference for one of the candidates.10 It would no doubt be interesting to count up such statements; while we have not analyzed the data at this level, it does not seem to us that such counts would affect the overall picture of balance between the two sides. The results presented here suggest that, on the whole, journalists are very careful to give both parties equal space. Although at this level there is no evidence of bias toward one side or the other, it is interesting to note what a large proportion of the sources to which journalists turned did have a strong stake on one side or the other: arguably two-thirds of sources cited. This highlights both the need for and the difficulty of being judicious in the selection of sources, if balance is to be maintained between the sides of an issue.

8. This might seem especially surprising given that Bush was already serving as president. However, recall that articles in which only Bush was mentioned, but not Kerry, were excluded from the corpus. This is likely to have eliminated a fair number of non-election-related citations of the president that would be irrelevant to the current study.
9. All data on statistical significance presented here are based on the chi-square goodness-of-fit test or the chi-square test of independence, as appropriate, and the .05 level of significance.
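The significance statements in this section can be checked informally with a chi-square goodness-of-fit test, one of the two tests on which the chapter's significance data are based. The sketch below is an illustration under stated assumptions: the counts are back-calculated from the rounded percentages and the figure of 2,200 tokens reported above, so they are not the authors' exact figures, and the helper name `chi_square_equal_split` is hypothetical. The value 3.841 is the standard chi-square critical value for one degree of freedom at the .05 level.

```python
# Standard chi-square critical value for df = 1 at the .05 level.
CRITICAL_VALUE_DF1 = 3.841


def chi_square_equal_split(count_a, count_b):
    """Goodness-of-fit statistic against the hypothesis of an even split."""
    expected = (count_a + count_b) / 2
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


TOTAL_TOKENS = 2200  # sample size given in the text

# Counts reconstructed from rounded percentages (an assumption, not study data).
kerry = round(0.128 * TOTAL_TOKENS)
bush = round(0.110 * TOTAL_TOKENS)
liberal = round(0.27 * TOTAL_TOKENS)
conservative = round(0.25 * TOTAL_TOKENS)

for label, a, b in [("Kerry vs. Bush", kerry, bush),
                    ("Liberal vs. conservative", liberal, conservative)]:
    stat = chi_square_equal_split(a, b)
    verdict = "significant" if stat > CRITICAL_VALUE_DF1 else "not significant"
    print(label, round(stat, 2), verdict)
```

Under these reconstructed counts, both statistics fall below the critical value, which is consistent with the non-significance reported in the text.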

Sources cited in Sample B

Now let us focus our attention even further, to four of the eleven newspapers represented in the corpus. Sample B consists of all of the manually coded tokens from four papers of special interest: the Boston Globe, from Massachusetts, the home state of John Kerry; the Houston Chronicle, from Texas, the home state of George Bush; the Cleveland Plain Dealer, from Ohio, the state that proved pivotal in the election; and USA Today, the only officially national general-news paper in the United States.11 These four newspapers therefore make up an especially interesting set for the investigation of possible bias. During a political campaign in the US, the editorial boards of some newspapers choose to endorse a candidate, while others do not. Not surprisingly, the Boston Globe endorsed Kerry, while the Houston Chronicle endorsed Bush. Meanwhile, the Cleveland Plain Dealer and USA Today refrained from endorsing any candidate. Interestingly, the Cleveland Plain Dealer had endorsed Bush in the 2000 election but declined to endorse a candidate in 2004, while USA Today opposes newspaper endorsement of candidates as a matter of policy.

10. Recall that the criterion for a source’s being coded as unaffiliated is that he or she is not presented as belonging to any group connected with one side or the other. An individual presented as a “Bush backer” would be coded as Conservative individual, while one presented as a bartender would be coded as Unaffiliated individual. This does not mean that these individuals do not express opinions in their statements. However, as with all other categories, they were coded on the basis of the journalist’s description rather than the content of their statements.
11. There is also the Wall Street Journal, which is a widely read national paper. However, it is a business newspaper, and therefore, although it does contain a fair amount of non-business news, we did not include it in our corpus.

Despite the decision by a newspaper’s editors to endorse a candidate (which tends to occur only one to three weeks before the election), the journalists writing news stories for the paper are expected not to reflect such preferences in their reporting. We might suppose that if we are, nevertheless, to find evidence of preference for one candidate or the other, it might well surface in the Globe or the Chronicle. By the same token, we might expect such preference to be less likely to show up in the Plain Dealer or USA Today, given the explicitly neutral stand taken by these papers. Figure 3 shows the proportions in which different types of sources are cited in the four newspapers. Overall, the data pattern relatively similarly across the papers. The most salient departure from this is the number of times Bush is cited in the Houston Chronicle: at 22.5%, twice as often as in the other papers, a statistically significant result. However, the Chronicle also cites Kerry slightly more often than the other papers, at 16%, and there is no statistically significant Bush-Kerry difference within the Chronicle data. The Boston Globe, in contrast, cites Kerry in 14% of cases, and Bush in only 8.5% of cases. The preference here, although also not statistically significant, works in the opposite direction from that in the Houston Chronicle. Interestingly, the Cleveland Plain Dealer also prefers to cite Kerry, by a small margin of 3.5%. Meanwhile, USA Today is the only paper to approach perfect parity on this measure, with a difference of only .5%. The tokens in which Bush is cited in the Houston Chronicle do not show any glaring patterns that set them apart from those in other newspapers. However, Bush is referred to as “the president” more often there than in other papers. Further, certain tokens appear to have an especially positive ring to them, as in (10). Nevertheless, most of the Bush tokens are fairly similar to those in other papers. 
What sets them apart is primarily their number. (10) “We stand for a country in which marriage is the cornerstone of our society,” Bush said to cascades of applause from a crowd of supporters in Reno on Thursday. [emphasis added]

Another statistically significant difference between the papers is the number of sources within the category of Special interest cited by the Cleveland Plain Dealer. At 11%, this is nearly three times the average of the other papers. One possible explanation for this is the fact that by October 2004, it was clear that Ohio would be a major battleground state in the election, and the state therefore attracted the attention of many groups intent on influencing voters. What is striking in the Special interest tokens in the Plain Dealer is that a clear majority of those cited belong to groups with a liberal stand on the issue in question, as in (11).

Figure 3. Frequency of source types cited in each of the four newspapers in Sample B
[Bar chart titled “Source types cited in each newspaper in Sample B”. Axes: source (Kerry; Edwards; Dem official/party; Liberal indiv/org; Bush; Cheney; Rep official/party; Conserv indiv/org; Gov indiv/org; Special int indiv/org; Expert indiv/org; News indiv/org; Survey; Unaffiliated indiv/org; Undecided voter; Other source; Anonymous; Impersonal; Coordinated) against percentage of total (scale 0% to 25%). Series: BG, CPD, HC, UST.]

Figure 4. Relative frequency of sources with different affiliations in the four papers in Sample B
[Bar chart titled “Sources grouped by affiliation class in Sample B”. Axes: affiliation class (Liberal, Conservative, Special interest, Impartial, Unaffiliated, Other) against percentage of total (scale 0% to 40%). Series: BG, CPD, HC, UST.]

(11) The Union of Concerned Scientists, a nonprofit watchdog group, contends those examples are part of a broad and systemic pattern by the Bush administration of ignoring, manipulating or censoring scientific information that clashes with the president’s policies.

Now that we have seen the individual categories, we may ask what pattern will emerge if we group the data into affiliation classes. This is done in Figure 4. The first observation we can make about this view of the data is that three of the four papers appear to privilege liberal sources over conservative sources. Meanwhile, the Houston Chronicle presents the opposite picture: Conservative sources there are privileged over liberal ones. However, these differences do not reach statistical significance in this small sample, and so we must take care in drawing conclusions. In this representation, the Cleveland Plain Dealer’s preference for special interest sources is masked somewhat by the Houston Chronicle’s frequent citation of Government sources, which are also grouped into the special interest class. At the same time, the merging of several categories into the impartial class clearly shows USA Today’s preference for these source types: experts, news sources, and surveys. If we are to generalize from these results, with the caveat that they are not significant by statistical measures, we may paint a picture of the Houston Chronicle as appearing to favor conservative sources, above all President Bush, who was
governor of Texas before attaining the presidency. The Chronicle does cite a relatively high number of liberal sources, but refers to fewer impartial and unaffiliated sources than any of the other three papers. Meanwhile, the Boston Globe appears to favor liberal sources over conservative ones, but it also favors impartial sources over special interest ones. The Cleveland Plain Dealer, for its part, similarly privileges liberal sources, but here liberal and conservative sources appear overall less than in the Globe or the Chronicle; instead, special interest sources are cited considerably more. Finally, USA Today appears to exhibit the least preference for one side or the other, with a strong preference for impartial sources over special interest ones. This falls neatly in line with its position as a national and editorially neutral newspaper. In sum, we do see some indications of preference for Bush and the Republicans in the Houston Chronicle, and for Kerry and the Democrats in the Boston Globe, especially as compared with USA Today. It is interesting to note that the Cleveland Plain Dealer patterns most like the Boston Globe. It may be worth highlighting that for a newspaper that endorsed Bush in 2000 to refuse to do so in 2004 amounts to a repudiation, if not an outright endorsement of the opposing candidate. If this reflects a “covert” preference for Kerry, it is in line with the patterning of the sources of evidentiality found in the paper. Before moving on, we must issue a caution against overinterpreting these results. Even if these differences had reached statistical significance, we are not advancing the claim that preference for one type of source over another equates to bias; we are suggesting that it may be indicative of bias. 
Before making bolder claims, we believe that it will be necessary to study far more data, and to look more closely not only at what sources are cited, but at what those sources are represented as saying, as well as how they are portrayed by the journalist. In that spirit, we now move on to look at somewhat subtler aspects of the data: the use of “nested evidentiality” and the use of unnamed sources.

The use of nested evidentiality

As mentioned above, in Sample A approximately one token in twenty represents what we term “nested” speech. These are cases in which one person is cited as speaking for another, as in (12). These cases pose a particular challenge to the journalist.
(12) “We really have to do something about it,” Kerry said, according to a Democratic official.

On one hand, such instances are likely to be unavoidable for the journalist, especially in the context of a contentious election in which assertions are continually
repeated, supported, and refuted. Such actions are often in and of themselves news, as in (13) and (14). (13) Kerry said Bush, in a campaign debate four years ago, also in St. Louis, said he would allow imports from Canada. (14) Mr. Edwards pointed out that Mr. Kerry had explicitly said he would never give any other nation a veto over American national security decisions.

On the other hand, nested speech creates a delicate situation. Whenever a journalist chooses to print any statement made by another – whether using direct or indirect speech, nested or not – he or she is giving that source access to a public forum. Yet there is no guarantee that the statements made by the source are truthful, or that the speaker does not have a hidden agenda in making them. If a statement is presented by the journalist without qualification or contextualization, this is likely to be interpreted by readers as an implicit ratification of the statement. As Caldas-Coulthard (1994: 302) argues, “[d]irect and indirect reporting of words in the news have the function of legitimizing what is reported”. When a journalist prints a statement in which one source speaks for another – what we are calling nested speech – concerns about the statement are even greater than for ordinary reported speech, because now there is not only the possibility of untruthfulness or manipulation, but also the possibility of misrepresentation on the part of the speaker. Examples (15)–(17) present some instances in which one speaker clearly has a motive for revoicing another’s speech (with italics added to show the nesting). (15) “Just like with Iraq, just like with the economy, a top administration official is now saying that even with the benefit of hindsight, the administration wouldn’t have done anything differently,” Kerry said to the exuberant partisan audience. (16) When Russert asked Coburn about recent reports that he sterilized a 20-yearold woman without her written permission, Coburn said she had verbally asked him for the procedure and charged her with a “smear” campaign. (17) Kerry said he, too, considers marriage as between a man and woman, but he opposes the amendment. He suggested that Vice President Dick Cheney’s daughter, a lesbian, would say homosexuality is not a choice.

The very fact of one source speaking for another may itself be newsworthy, but the danger exists that readers will see the nested statement as accurately represented, even when this is not warranted. Therefore, the onus is on the journalist to present such statements carefully or risk perpetuating a misrepresentation. Example (17) above illustrates another common pattern in nested tokens – the co-occurrence with nesting of what we coded as Irrealis. It frequently happens that

one speaker hypothesizes about what another will or could say, as in (18)–(19), with italics added as above.

(18) “Both campaigns will boast of a groundswell of support among independents and undecided voters at the very end. That’s typical,” he said. “Both candidates will say they’re enjoying a big surge.”
(19) “It may happen that Brokaw tells voters, ‘We’re not calling this election. We expect court action in the morning’,” said Bill Wheatley, vice president of CBS news.

This Nested-Irrealis combination, found in around 1% of the sample, serves to emphasize the potential complications of presenting speakers as putting words in the mouths of others.

The use of unnamed sources

A practice that has received much attention in discussions of the media’s role in society is the use of anonymous sources, in particular with recent cases of journalists (e.g., Judith Miller) being sent to jail for refusing to reveal the identities of their sources. In our manually coded sample, however, we did not find truly anonymous sources to be very frequent, totaling under half a percent of the manually coded tokens. However, we also coded for a related practice which is both more subtle and less discussed: the practice of citing sources obliquely, without referring to them by name. Such tokens were given the code Unnamed in addition to being coded for source type. Unnamed sources appear in various ways, as illustrated in (20)–(22); in these and the following examples, italics have been added to indicate the source.

(20) But Bush is expected today to argue that Kerry’s alleged vacillating has ultimately landed him in the wrong place on many policies, said a senior Republican strategist familiar with White House thinking.
(21) By the same token, analysts say Republicans may have squandered an opportunity to expand their majority.
(22) Some even speculated that it might affect the election’s outcome by spurring more people – those with strong views on abortion, for example – to vote.

Example (20) comes very close to proffering an anonymous source; however, our methodology required some statement of the source’s desire to remain unnamed for a token to be coded as Anonymous. Example (21) is a clearer case of an Unnamed source; analysts typically have no call to remain anonymous. Therefore, it is incumbent on the journalist to name these sources, if not in the present sentence, at least in the article. Example (22) is a more extreme case of an Unnamed

source, bordering on the category Impersonal. In all three cases, the source is not identified anywhere in the text. Journalists with whom we have spoken (and whom we will take the risk of leaving unnamed here) stress that reporters are expected to amass a large body of information on a topic before writing a story, and that readers are expected to trust that all statements in an article are justified by this research, whether or not the source of an individual piece of information is given. It is reasonable for a writer to make general statements as in (21) or (22), they assert, provided that at least some of the sources are identified elsewhere in the article. While we do not dispute these assertions, we believe that the use of unnamed sources is a potentially problematic area in news reporting. In individual cases, it may certainly be irrelevant what the source of an insight is; however, frequent reporting of speech without naming names is almost certain to erode the impression that a journalist is engaging in reporting rather than analysis. We see this therefore as a fine line to walk. But how common is this practice, and what types of sources go unnamed in the corpus? Overall, 10.3% of the tokens in Sample A were coded as Unnamed. Individual papers range from 6% to 12.5%; however, the differences among the papers are not statistically significant. We may say simply that in each paper, approximately 1 in 10 tokens has an unnamed source. There are essentially two types of tokens in this category: those in which the journalist cites a specific individual without naming him or her, and those in which the journalist makes a generalization about what is said by a class of people, such as “experts.” What types of sources most commonly go unnamed? Figure 5 shows how the Unnamed tokens in Sample A are distributed across source types. One of the two categories tied for first place is Expert individual/organization. 
Example (23) is representative of the Unnamed tokens of this type. Often, such general statements are followed up with a quote from a named expert; however, this does not always happen.

(23) Only those people who registered themselves because of their intense interest in the race are highly likely to vote, experts say.

Also quite common is the category Republican official, followed not long after by the category Democratic official. Examples (24) and (25) illustrate this sort of token, in which information is reported that has been divulged, typically, by some source close to a candidate. In the majority of cases (80% for Republican officials, 85% for Democratic officials), such sources are named, so one wonders what the conditions are that would prevent such naming, as well as what the significance is of identifying or not identifying such sources.


[Bar chart: “Source type of Unnamed tokens in Sample A”. Y-axis: % of Unnamed tokens (0% to 18%). X-axis: Source, with the categories Expert indiv/org, Rep official, Other source, Gov indiv/org, Dem official, Unaffiliated indiv/org, Special int indiv/org, Survey, Conserv indiv/org, Coordinated, News indiv/org, Undecided voter, and Liberal indiv/org.]

Figure 5. Distribution across source types of the 227 Unnamed tokens in Sample A

(24) An adviser to Mr. Cheney said the vice president would not even be particularly focused on Mr. Edwards.
(25) The city is third on the Bush campaign’s advertising intensity list, however, and Republican officials concede that the president is not doing as well as expected in that conservative stronghold.

Lastly, another common category is the unnamed government official, as in (26). This category frequently borders on that of the anonymous source, since it occurs fairly frequently that government officials agree to make statements to the press only on condition of anonymity. However, in many cases, it is not made clear whether such a condition was in force.

(26) In disputing claims by Mr. Kerry that the Americans had lost the explosives, a senior administration official said Thursday, “We don’t know all the facts and no one should be jumping to conclusions.”

The practice among journalists of leaving unspecified the identity of sources in the absence of a pressing need for anonymity may be seen as a potential cause for concern. However, we acknowledge that this is a complex issue worthy of more discussion than we can give it here.
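The comparison of Unnamed rates across papers reported above can be made concrete with a short sketch. The chapter does not say which significance test was used; a chi-square test of homogeneity is assumed here as one common choice. The per-paper counts are invented for illustration and are not the study's data, and the function name `chi_square` is hypothetical.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if every paper had the same Unnamed rate.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Invented counts (NOT from the study): one row per paper,
# columns are [Unnamed tokens, named tokens].
papers = [
    [30, 270],
    [25, 275],
    [35, 265],
]
stat, dof = chi_square(papers)
print(round(stat, 3), dof)
```

With these invented counts the statistic is about 1.85 on 2 degrees of freedom, below the conventional 5% critical value of 5.991. Unnamed rates of roughly 8% to 12% would therefore not count as significantly different at this sample size, which is the kind of outcome the authors describe.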



Reporting words used to report speech events

A final approach we may take to the data is to look at the choice of words used by journalists in reporting speech. The reporting words that form the core of the tokens in this study are themselves interesting because of the different ways in which they allow a journalist to present speech events. In this section, we take a closer look at the reporting words used in the corpus, focusing specifically on which words are associated with the two candidates in the election. Previous research suggests that a journalist’s choice of one reporting verb over another is potentially significant. Floyd (2000) demonstrates that in cases in which one side is favored by the press, that side tends to state, explain, inform, and confirm, while the disfavored side tends to claim and allege. Leitner (1986: 196) states that “in the vast majority of cases say is used with people in authority and claim with those with little or no authority.” And so, while protesters make claims, government authorities make statements, or simply say things. Such lexical choices on the part of the journalist reinforce the notion that the burden of proof is on the former and not the latter. As Leitner (1986: 196) continues:

    Linguistic practices like these assign to participants in a story particular roles (authority, claimant etc.) and role attributes (trustworthy, doubtful etc.) that they may not necessarily have had in reality. They may, thus, reinforce the common bipolar interpretation of events (i.e. pro/con) and lead to ideologically biased readings of reality.

Let us begin by surveying the landscape where these words are concerned. Table 4 shows the top 30 lemmas (out of 120) used for reporting speech acts in the whole corpus.[12] Forms of the verb say accounted for fully 47% of the 43,000 tokens found. Interestingly, the second most common lemma, tell, accounted for only 3% of the tokens. After that, the numbers taper off slowly, meaning that with the one enormous exception of say, the reporting words used are fairly similar to each other in frequency.

Table 4. The 30 most common reporting-word lemmas in the corpus

Lemma                  N        %         Lemma                  N     %
say                    20,563   47.26%    claim                  464   1.07%
tell                   1355     3.11%     promise                421   0.97%
ask                    1069     2.46%     report                 421   0.97%
call                   876      2.01%     add                    408   0.94%
talk                   800      1.84%     cite/citation          388   0.89%
argue/argument         778      1.79%     describe               386   0.89%
according to           752      1.73%     answer                 373   0.86%
accuse/accusation      666      1.53%     respond                352   0.81%
suggest(ion)           651      1.50%     announce(ment)         334   0.77%
state(ment)            582      1.34%     predict(ion)           331   0.76%
criticize/criticism    568      1.31%     mention                310   0.71%
charge                 528      1.21%     propose                303   0.70%
speak                  485      1.11%     conclude/conclusion    297   0.68%
note                   471      1.08%     remark                 291   0.67%
agree                  467      1.07%     urge                   291   0.67%

How do these reporting words pattern with regard to different sources? We performed an analysis of the reporting words used when George Bush and John Kerry are cited in the corpus, counting the number of times each word was used in connection with the name Bush and the number of times each word was used in connection with the name Kerry.[13] We retained all lemmas occurring at least ten times with one candidate; this amounted to 36 lemmas used 3411 times in total. For the most part, each candidate’s list was similar to that shown in Table 4; however, there were some differences. In both lists, the top four lemmas are the same and account for 63% of Bush citations and 60% of Kerry citations: say (by far the most common), tell, accuse/accusation, and charge. After that, differences emerge. In order to identify words that characterize citations of one speaker more than the other, we calculated which of the 36 lemmas were used at least twice as often with one candidate as with the other (after normalizing the numbers). The list of all lemmas passing this test is given in Table 5. In only about half of these cases does the difference actually reach statistical significance – these are marked with a dagger in the table – due to the extremely infrequent occurrence of these words in comparison with say. This highlights the need for a very large quantity of data in order to perform such a study.[14]

Table 5. Reporting-word lemmas used at least twice as often with one candidate as with the other, in descending order of frequency (statistical significance marked with †)

Bush-associated reporting words    Kerry-associated reporting words
portray†                           argue/argument†
insist                             criticize/criticism†
defend                             state(ment)†
acknowledge/acknowledgment†        remark†
suggest                            talk
repeat                             agree
describe                           mention
                                   pledge†
                                   quote
                                   vow

[12] Note that we counted both verbal and nominal forms when we searched our corpus (which was not part-of-speech tagged). In many cases, the verbal and nominal form of a lemma are homographs and were considered together. Several non-homographic nominal forms were searched for as well; examples are included in Table 4.

[13] These are only cases in which Bush or Kerry is being cited, not when they are being spoken about. We also excluded references to other people with the same last name. However, note that this method did not capture instances of pronominal references to these two figures, and therefore did not capture all instances of evidentiality with these sources.

[14] We are fairly confident that with a larger corpus, all the words passing our test would reach statistical significance, and so we include them all here. Especially when engaged in qualitative studies, we believe it is best not to be overly focused on statistical measures, especially given the complexities of discourse. For example, consider the fact that a given form (e.g., agree) may have different meanings and serve different discourse functions.

A brief examination of these tokens begins to give a sense of the discourse that characterized the campaign. To begin with, as exemplified in (27)–(31), the frequent association of Kerry with verbs such as criticize and argue, and of Bush with insist, acknowledge, and defend, paints a picture of Kerry as being on the offensive and Bush on the defensive, especially when it comes to Bush’s past performance as president. Italics have been added to indicate the reporting word in focus.

(27) Kerry criticized Bush for his tax cuts and policies such as a massive federal deficit, a Medicare prescription drug benefit that some called confusing and a spiraling morass in Iraq.
(28) Kerry argued that Bush’s “miscalculations have created a terrorist haven that wasn’t there before.”
(29) Bush insisted that the invasion of Iraq was the work of a coalition; Kerry replied that if Missouri were a country, it would qualify as the coalition’s third largest member.
(30) Bush acknowledges that the lack of WMD [weapons of mass destruction] has been a surprise, but that Saddam still had the means and intention to develop them.
(31) Bush defended his environmental credentials by saying the air had gotten cleaner since he became president.
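The selection criterion behind Table 5 can be sketched in a few lines. The chapter says only that the counts were normalized before the two-to-one test was applied; dividing each count by the candidate's total number of citations is an assumption made here. The counts, totals, and function name are likewise invented for illustration and are not the study's data.

```python
def associated_lemmas(counts_a, counts_b, total_a, total_b, ratio=2.0):
    """Return two lists: lemmas used at least `ratio` times as often
    (per citation) with speaker A as with speaker B, and vice versa."""
    favors_a, favors_b = [], []
    for lemma in sorted(set(counts_a) | set(counts_b)):
        # Normalize raw counts by each speaker's total citations (assumed method).
        rate_a = counts_a.get(lemma, 0) / total_a
        rate_b = counts_b.get(lemma, 0) / total_b
        if rate_a > 0 and rate_a >= ratio * rate_b:
            favors_a.append(lemma)
        elif rate_b > 0 and rate_b >= ratio * rate_a:
            favors_b.append(lemma)
    return favors_a, favors_b


# Invented counts and totals (NOT from the study).
bush = {"say": 900, "portray": 24, "criticize": 10}
kerry = {"say": 850, "portray": 5, "criticize": 40}
bush_words, kerry_words = associated_lemmas(bush, kerry, total_a=1000, total_b=1200)
print(bush_words, kerry_words)  # ['portray'] ['criticize']
```

A lemma that never occurs with one speaker passes the test automatically, which is one reason a frequency floor such as the chapter's ten occurrences with one candidate is useful. Whether a difference is also statistically significant (the daggers in Table 5) is a separate question that this sketch does not address.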

Meanwhile, Bush’s attacks on Kerry were often presented as portrayals, as in (32). This suggests two things. First, Bush appears to be working hard on creating a negative image of Kerry during the campaign. Example (33), although it uses a different reporting word, presents an especially clear example of this attempted manipulation. Second, the reporting word portray seems to us to be relatively lacking in force, compared to many alternatives such as accuse; we might therefore hypothesize that the press was not fully convinced by Bush’s attempts to influence Kerry’s image.

(32) Bush portrayed Kerry as a politician who has flip-flopped his way through an undistinguished Senate career.

(33) “I can see why people think he changes position quite often. Because he does,” Bush responded.

Similarly, there is a large number of tokens in the data that refer to Kerry’s statements or remarks – more so than Bush’s – as exemplified in (34) and (35). Typically, Kerry’s statements are taken up and attacked by the Republicans, and then defended by the Democrats.

(34) Bush said Kerry’s statements had denigrated soldiers and commanders in Iraq and were part of a pattern of saying almost anything to get elected.
(35) Kerry campaign spokesman Phil Singer said it was the second time in three days that the Bush campaign had distorted Kerry’s remarks.

This seems to suggest that in this race, the primary areas of focus were Bush’s past actions and Kerry’s current statements. This appears reasonable, given that Bush had already been president for nearly four years, and that his approval ratings were exceptionally low for an incumbent president in wartime. Thus, his record was rather vulnerable to attack. Kerry, by contrast, was relatively unknown before the campaign. As he went on the offensive against Bush’s record, the statements he made were of critical importance; if they were accepted by the public, it would position him well to win the presidency. On the other hand, if the Republicans could successfully discredit them, Bush’s image would be protected. Without wanting to go too far, we might suggest that the reported speech in this corpus leads to the conclusion that the 2004 election revolved around the Republicans’ efforts to undercut Kerry’s and other Democrats’ critiques of President Bush. It could be argued that they were successful.

Conclusion

This study has looked at hearsay evidentiality in a corpus of US newspaper reporting, examining who gets to speak, whether opposing sides get to speak equally, how directly these sources’ speech is reported, and how its presentation differs between the main actors in the drama of the 2004 presidential election. The main impetus for the study was the question of whether there is bias in news reporting and, if so, whether it might be realized in part in the choice of who gets to speak and how much. Like many previous studies that have looked at other factors, we found, overall, no evidence of bias reflected in the evidentiality examined in the corpus. We did detect evidence of subtle preferences for one candidate or the other: specifically, preference for George Bush as a source in the Houston Chronicle, and preference for John Kerry in the Boston Globe, but we must emphasize that since these measures did not reach statistical significance, the jury is still



out on the importance of these findings. Furthermore, even if we were to find an overwhelming tendency to cite one source over another, whether that should be interpreted as bias is an open question. We found that newspapers are remarkably uniform in the proportion of direct and indirect reported speech they use, with a near-constant ratio of 40% to 60%, respectively. We also found that approximately one in ten tokens of evidentiality is nested, with one source being allowed to speak for another. We pointed out that this, especially as it often co-occurs with negation or hypotheticals, is potentially dangerous territory for journalists, who run the risk of supporting possible misrepresentations. We noted as well that the corpus contains many instances of unnamed sources – some truly anonymous, some individuals who go unnamed, and some generalizations over classes of people. This is another practice that we see as potentially hazardous for journalists, as it may create the impression of analyzing rather than reporting. Worse, it may create an impression of untrustworthiness – a reputation that, fairly or not, journalists already “enjoy” in the United States (Niven 2002; see above). Finally, we looked at the reporting words associated with George Bush and John Kerry, suggesting that reporters’ lexical choices say a great deal about the situation reported on, providing a layer of meaning over and above the content of the utterances reported. From this perspective, we described the contest as one in which Kerry is on the offensive, trying to indict Bush for past failures, and Bush is on the defensive, attempting to discredit Kerry and undercut his accusations. Of course, this study is a first exploration of these aspects of news texts. Much more work will be necessary before definite conclusions may be drawn. We see both quantitative and qualitative methods as being of use here. 
On one hand, a larger body of data would make it easier to make statistically valid inferences about some aspects (such as the use of different reporting words). On the other hand, detailed qualitative studies not only of who speaks, but of what they are allowed to say, and of how they are presented as saying it will be necessary before we have a full picture of what journalists do when they bring sources into their texts. We feel it is incumbent upon us to conclude by pointing out that today, fewer Americans turn to newspapers for news than to television, and that about as many get news from the radio as read newspapers.[15] Also, the Internet is steadily increasing in popularity as a source of news. Therefore, any large-scale search for bias in the media must look at these media as well as at newspapers. In fact, with the weight of many studies, including this one, indicating that print journalists are very careful to avoid biased reporting, it may be that radio and television – especially with the recent increase in cable news networks – are more likely to yield evidence of political partisanship.

[15] According to a report released in mid-2004 by the Pew Research Center for the People and the Press (http://people-press.org/reports/pdf/215.pdf, successfully retrieved in October 2006), while 4 in 10 people reported reading a newspaper regularly, and the same number listened to radio news, far more watched television news. The study also notes the rapidly increasing partisan nature of cable news audiences, with channels such as Fox News steadily attracting more Republican viewers and fewer Democratic viewers.

References

Aikhenvald, A. 2004. Evidentiality. Oxford: OUP.
Bhatia, V. K. 2004. Worlds of Written Discourse: A genre-based view. London: Continuum.
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. Longman Grammar of Spoken and Written English. London: Longman.
Caldas-Coulthard, C. 1994. On reporting reporting: The representation of speech in factual and factional narratives. In Advances in Written Text Analysis, M. Coulthard (ed.), 295–310. London: Routledge.
Chafe, W. 1986. Evidentiality in English conversation and academic writing. In Evidentiality: The linguistic coding of epistemology, W. Chafe & J. Nichols (eds). Norwood NJ: Ablex.
D’Alessio, D. & Allen, M. 2000. Media bias in presidential elections: A meta-analysis. Journal of Communication 50: 133–156.
Fairclough, N. 1995. Media Discourse. London: Arnold.
Floyd, A. 2000. The reporting verb and bias in the press. Revista Alicantina de Estudios Ingleses 13: 43–52.
Garretson, G. & O’Connor, M. C. 2007. Between the humanist and the modernist: Semiautomated analysis of linguistic corpora. In Corpus Linguistics Beyond the Word: Corpus Research from Phrase to Discourse, E. Fitzpatrick (ed.), 87–106. Amsterdam: Rodopi.
Jakobson, R. 1998. On Language: Roman Jakobson, L. R. Waugh & M. Monville-Burston (eds). Cambridge MA: Harvard University Press.
Leitner, G. 1986. Reporting the events of the day: Uses and function of reported speech. Studia Anglica Posnaniensia 18: 189–204.
Lucy, J. 1993. Reflexive language and the human disciplines. In Reflexive Language: Reported speech and metalinguistics, J. Lucy (ed.). Cambridge: CUP.
Niven, D. 2002. Tilt?: The search for media bias. Westport CT: Praeger.
Sauerzopf, R. & Swanstrom, T. 1999. The urban electorate in presidential elections, 1920–1996. Urban Affairs Review 35(1): 72–91.
Scollon, R. & Scollon, S. 1997. Point of view and citation: Fourteen Chinese and English versions of the ‘same’ news story. Text 17(1): 83–125.
Watts, M., Domke, D., Shah, D. & Fan, D. 1999. Elite cues and media bias in presidential campaigns. Communication Research 26(2): 144–175.
Waugh, L. 1995. Reported speech in journalistic discourse: The relation of function and text. Text 15(1): 129–173.

Television dialogue and natural conversation
Linguistic similarities and functional differences

Paulo Quaglio
State University of New York at Cortland, USA

Motivated by ESL (English as a Second Language) concerns, this study compares the language of a U.S. situation comedy, Friends, with natural conversation. A corpus of transcripts of the television show and the American conversation subcorpus of the Longman Grammar Corpus are used for analysis. This data-driven investigation combines multidimensional (MD) methodology (Biber 1988) with a frequency-based analysis of a large number of linguistic features associated with the typical characteristics of face-to-face conversation. The results of the MD analysis indicate that Friends shares the core linguistic characteristics of face-to-face conversation, thus constituting a fairly accurate representation of natural conversation for ESL purposes. However, a closer look at the linguistic features revealed interesting functional differences between the two corpora. These differences pointed to distinct functional patterns (e.g., vagueness, emotional language) suggested by the association of linguistic features sharing similar discourse functions.

.

Introduction

There has been an increasing call for the use of authentic materials in the ESL (English as a Second Language) classroom (e.g., Burns, Gollin, & Joyce 1997; Carter & McCarthy 1994; Carter & McCarthy 1995; McCarthy 1998). In particular, the depiction of spoken language in ESL textbooks has been shown to be problematic. Research has revealed a discrepancy between the characteristics of naturally-occurring conversation and the dialogues found in ESL textbooks (e.g., Carter & McCarthy 1995; Koester 2002). The discourse functions of conversation are linguistically realized by specific sets of grammatical features, which are rarely taken into account in ESL textbooks. As McCarthy and Carter (1995: 211) put it, “speakers regularly make [grammatical] choices which reflect the interactive and interpersonal nature of the communication.” To successfully interact in different social contexts, ESL students need

to be aware of these grammatical choices and be exposed to the characteristics of spoken discourse (Burns, Gollin, & Joyce 1997). Spoken corpora can be instrumental in helping teachers and students accomplish this goal as “they offer direct access to characteristics of speech, so often inadequately described in textbooks” (Mauranen 2004: 89).

Television as a source of spoken data

Despite the overall agreement on the important role of natural conversation in ESL instruction, spoken corpora are not as readily available as written corpora (e.g., McCarthy 1998; O’Keeffe & Farr 2003). Teachers are then faced with the difficult task of collecting and transcribing such data themselves. A possible alternative to this time-consuming and costly solution is the use of television dialogue, which can be easily collected by teachers and brought into the classroom. Washburn (2001), for example, recommends the use of American situation comedies, especially for pragmatic language teaching and learning. Even though the use of television dialogue as a surrogate for natural conversation for ESL purposes is appealing as well as practical, the language of television dialogue has not been fully analyzed from a grammatical point of view. Addressing the need to bring natural conversation into the ESL classroom (and the difficulty of obtaining spoken corpora), and recommendations for the use of situation comedies as a representation of face-to-face conversation, I take a corpus-based approach to compare the language of the U.S. sitcom Friends with naturally-occurring conversation. Friends was chosen not only because of its popularity but also because of its nature: a show about people who just sit around and talk, which makes it an interesting object of study for linguistic analysis, both as a comparison to natural conversation and as an object of study in itself.

Methodology

In this section, the two corpora used for analysis, Friends and the American English conversation subcorpus of the Longman Grammar Corpus, are described. A summary and examples of the major situational characteristics (types of settings and interactions) of each corpus are also included. Finally, an explanation of how the data were coded for analysis is provided.

The Friends corpus: Settings and interactions

The Friends corpus comprises transcripts (not scripts) of nine seasons of the show (from 1994 to 2003) and has approximately 600,000 words. The episodes were

transcribed and made available for entertainment purposes by several online fan clubs. The data used for analysis were taken from one of these fan clubs, Crazy for Friends (http://www.livesinabox.com/friends/). Transcripts of three episodes from each season (a total of 27 episodes) were randomly selected and compared with the actual videos of the shows and were considered fairly accurate and very detailed, including several features that scripts are not likely to present: hesitators, pauses, repeats, and contractions. Table 1 shows the composition of the Friends corpus.

Table 1. Composition of the Friends corpus

Seasons          # of episodes   # of words   Average # of words/episode
1 (1994–1995)    24              60,180       2,507
2 (1995–1996)    23              65,364       2,842
3 (1996–1997)    25              67,994       2,720
4 (1997–1998)    23              71,732       3,119
5 (1998–1999)    24              57,460       2,394
6 (1999–2000)    23              69,652       3,028
7 (2000–2001)    23              60,882       2,647
8 (2001–2002)    23              76,205       3,313
9 (2002–2003)    24              75,298       3,137
Total            206             604,767      2,935

A sampling of settings and types of interaction was carried out with the analysis of every fifth episode of each of the nine seasons (approximately 41 episodes). Table 2 shows a summary of the most frequent scenes and types of interactions identified in the Friends corpus.

Table 2. Summary of settings and types of interactions in Friends

Settings, each followed by its types of interaction:

Central Perk; Monica and Rachel’s apartment
    Discussing things only guys can do (e.g., pee standing up) and only women can do (e.g., take out bra without taking off blouse)
    Discussing date plans for Saturday night
    Phoebe is coaching Chandler on how to break up with Janice
    Discussing plans for New Year’s and how bad it is not to have a ‘partner’
    Talking about how to quit smoking
    Playing the keyboard/friends listen and make comments
    Joey is trying to convince Monica to pose as his girlfriend
    Monica is trying to convince Rachel to waitress for her
    Monica & Phoebe are preparing for a barbecue for Rachel’s birthday and talking about Joey’s steady date
A fancy restaurant
    A blind date situation
At a Laundromat
    Ross is showing Rachel how to do laundry & ‘hitting on’ her
The ladies’ room at a restaurant
    Monica and Angela are talking about ‘guys’
Chandler’s office
    Chandler interacts with supervisor
Monica’s apartment
    Making food
Chandler and Joey’s
    Discussing how Ross’s date the previous night did not end up with sex
At the beach
    Playing games; talking about sex partners; dating

In spite of the presence of a few alternative settings (e.g., restaurant, Laundromat) throughout the show, most of the action takes place in two of the characters’ apartments (Monica and Rachel’s and Chandler and Joey’s) and Central Perk, a coffeehouse where the group of friends meets regularly. This brief analysis reveals not only a limited number of settings (as compared to the conversation corpus) but also an extremely restricted range of conversation topics, which typically involve relationships, love, dating, and sex. In Extract 1, the characters talk about their plans for New Year’s. The topic of the conversation then shifts to relationships and how hard it is not to have a partner on such an occasion.

Extract 1: Friends, season 1, episode 10, The one with the monkey

Rachel: Hey, do you guys know what you’re doing for New Year’s? (They all protest and hit her with cushions) Gee, what?! What is wrong with New Year’s?
Chandler: Nothing for you, you have Paolo. You don’t have to face the horrible pressures of this holiday: desperate scramble to find anything with lips just so you can have someone to kiss when the ball drops!! Man, I’m talking loud!
Rachel: Well, for your information, Paolo is gonna be in Rome this New Year, so I’ll be just as pathetic as the rest of you.
Phoebe: Yeah, you wish!
Chandler: It’s just that I’m sick of being a victim of this Dick Clark holiday. I say this year, no dates, we make a pact. Just the six of us-dinner.
All: Yeah, okay. Alright.
Chandler: Y’know, I was hoping for a little more enthusiasm.
All: Woooo! Yeah!

The conversation corpus: Settings and interactions

The American English conversation subcorpus of the Longman Grammar Corpus has approximately 4 million words. Carefully designed to be representative of American conversation, it includes a wide range of settings (e.g., park, family home, classroom) and types of interaction (e.g., casual/task-related/telephone conversations). For the purposes of this study, a subcorpus of the American conversation corpus (of approximately 590,000 words) was utilized for analysis. This was done to make the searches for some of the linguistic features more manageable, as some of them had to be checked manually for disambiguation purposes.


Table 3. Composition of the American English conversation subcorpus

Speech types                              # of texts   # of words   Average # of words/text
Casual conversation                               38      312,807                     8,232
Task-related/service encounters/casual            19      152,819                     8,043
Phone/casual conversations                        16      108,347                     6,771
Work-related only                                  2       15,749                     7,874
Complete subcorpus                                75      589,722                     7,863
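The totals and per-text averages in Table 3 can be recomputed from the text and word counts. The following is a minimal sketch of that arithmetic; the names `TABLE_3` and `summarize` are illustrative and do not come from the study.

```python
# Rows copied from Table 3: (speech type, number of texts, number of words).
TABLE_3 = [
    ("Casual conversation", 38, 312_807),
    ("Task-related/service encounters/casual", 19, 152_819),
    ("Phone/casual conversations", 16, 108_347),
    ("Work-related only", 2, 15_749),
]


def summarize(rows):
    """Return total texts, total words, and mean words per text for each row."""
    total_texts = sum(texts for _, texts, _ in rows)
    total_words = sum(words for _, _, words in rows)
    means = {name: words / texts for name, texts, words in rows}
    return total_texts, total_words, means


total_texts, total_words, means = summarize(TABLE_3)
print(total_texts, total_words)
for name, mean in means.items():
    print(f"{name}: {mean:.1f} words/text")
print(f"Complete subcorpus: {total_words / total_texts:.1f} words/text")
```

Printing one decimal place makes it easy to compare the recomputed means with the whole-number averages given in the table.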

This sampling was proportional to the number of words in the four most frequent speech types (casual conversations, task-related and service encounters, phone and casual conversations, and work-related conversations) and settings and interactions described in the headings of each text.1 The four speech types were included in the analysis (not only casual conversation) because the purpose of the study is to compare natural conversation in general to the language of Friends. Table 3 shows the final composition of the American English conversation subcorpus utilized for linguistic comparisons. In addition to the number of texts that make up each of the speech type groups, the table shows the total number of words per group of speech types and the average number of words for each text in each of the four categories. These 75 texts containing a total of 589,722 words were utilized for analysis and will be referred to as the conversation corpus.

The overall analysis of settings and interactions within the four groups of speech types in the conversation corpus was based on the information contained in the file headers and qualitative analysis of several segments of dialogues. Table 4 gives a brief summary of the types of settings and interactions within each of the groups of speech types. This analysis reveals that each of the groups of speech types comprises a wide range of settings and interaction types/conversation topics. Casual conversations refer to exchanges between family members and close friends; examples of task-related interactions include exchanges in a school in which students follow the teacher’s instructions in a pottery class; service encounters include interactions at a community college’s registrar’s office (where students registering for classes interact with an attendant); the phone/casual conversation set of texts is characterized by casual conversations interrupted by occasional phone conversations, most of which show just one side of the conversation; finally, most of the work-related interactions occur in business offices. Extract 2 is an example of casual conversation, the most frequent speech type found in the conversation corpus.

1. For a detailed description of the sampling procedures, see Quaglio (2004).


Table 4. Summary of settings and types of interaction within speech types

Speech type / Settings             Types of interaction/conversation topic

Casual Conversations
  At home                          Packing; playing games
  Home/bedroom                     Chit-chat/gossip
  Café, parking lot                Women meeting at Starbuck’s for coffee
  Condo/kitchen                    Eating dinner and talking afterward
  Small restaurant                 4 women meeting for dinner
  In the car                       Chatting; talking about cultural issues

Task-related & Service Encounters
  University building/board room   Requesting money from Program Board for events
  Study room                       Students studying, checking out books
  Faculty office                   Discussion of collaborative article
  Small office space               Interacting with co-workers and supervisors
  Community college                In a pottery class
  Registrar’s office               Registering for classes
  Dining room                      Making Christmas cookies

Phone & Casual Conversation
  Living room                      Relaxing in the living room; chatting over the phone
  Restaurant/home                  Having lunch with father; at home with mother; on the phone
  Kitchen/living room              Preparing dinner, eating, talking on phone
  Alterations shop                 Casual conversation; phone call
  Private home                     Family chit-chat; phone conversation

Work-related
  Business office                  Talk related to work activities
  House                            Business meeting: discussing contents of a business letter

Extract 2. Setting: Living room – Casual conversation

A: So Larry did you manage to get any sleep beside Michelle’s crying.
B: I didn’t hear a thing.
A: Really.
B: Yeah.
C: God, I can’t believe it.
B: I didn’t hear a thing.

It is interesting to notice that most work-related interactions, as in Extract 3, tend to be rather casual in spite of the ‘more serious’ topics of the conversations. Notice, for example, the presence of incomplete sentences, a hesitator (Uh), a discourse marker (well), and a moderated expletive (Biber et al. 1999: 1095) (gosh), which are typical features of face-to-face conversation.


Extract 3. Setting: Business office – Work-related interaction

A: Do you wanna attend the two-thirty meetings on Tuesday with the food vendors and the fire and healthy.
B: Uh yeah, I don’t think I can. Well we’re, let’s say.
A: The College of Santa Fe.
B: College of Santa Fe gosh, I don’t know that I can do that... well let me put it on the calendar, Dwight.

Data coding

The two corpora were annotated for parts of speech and various grammatical features using an automatic grammatical tagger developed by Douglas Biber. The Biber Tagger ‘tags’ texts for over 100 linguistic features. This grammatical annotation makes it possible to search for grammatical features or a combination of lexical items and grammatical features (e.g., date used as a noun – as opposed to the verb form). Below is an example of a tagged text; each tag is followed by its description.

I          ^pp1a+pp1+++    [1st person personal pronoun]
have       ^vb+hv+vrb++    [Have as main verb]
a          ^at++++         [Indefinite article]
date       ^nn++++         [Singular noun]
tonight    ^nr+tm+++       [Time adverbial noun]

All of the searches for linguistic features (see the Appendix for a complete list) were done automatically using a concordance software program, MonoConc Pro 2.2 (Barlow 2002), and were manually checked for accuracy and disambiguation purposes.
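A combined lexical and grammatical search of the kind described above (date as a noun rather than a verb) might be sketched as follows. This is an illustration only: it is not how MonoConc Pro or the Biber Tagger work internally. The (word, tag) pair representation, the function names, the reading of the article tag as `^at`, and the assumption that a verb use of date would carry `vb` as its first tag field are all hypothetical.

```python
def tag_fields(tag):
    """Split a tag such as '^vb+hv+vrb++' into its '+'-separated fields."""
    return tag.lstrip("^").split("+")


def find_word_with_tag(tokens, word, first_field):
    """Return indices of tokens equal to `word` whose first tag field is `first_field`."""
    return [
        i
        for i, (w, tag) in enumerate(tokens)
        if w.lower() == word and tag_fields(tag)[0] == first_field
    ]


# The tagged example from the text, stored as (word, tag) pairs.
# The article tag is assumed to read '^at'.
tokens = [
    ("I", "^pp1a+pp1+++"),
    ("have", "^vb+hv+vrb++"),
    ("a", "^at++++"),
    ("date", "^nn++++"),
    ("tonight", "^nr+tm+++"),
]

noun_hits = find_word_with_tag(tokens, "date", "nn")  # date tagged as a noun
verb_hits = find_word_with_tag(tokens, "date", "vb")  # assumed verb tag, see above
print(noun_hits, verb_hits)
```

Hits returned by such a search would still need the manual check mentioned above, since automatic tags can be wrong.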

Results

The present study combined multidimensional (MD) methodology (Biber 1988) with a frequency-based analysis of 166 linguistic features associated with the typical characteristics of naturally-occurring conversation. Friends was compared to conversation on Biber’s Dimension 1 (involved vs. informational production), showing striking similarities. Despite these similarities, a closer look at the linguistic features revealed interesting functional differences between the two corpora. These functional differences were suggested by the association of linguistic features sharing similar discourse functions, which ultimately characterize each of the corpora.


In the following sections, I report on these results. First, I present the results of the MD analysis focusing on the similarities between the two corpora. I then describe the frequency-based comparisons of linguistic features and discuss the functional differences suggested by the results.

Multidimensional analysis: Similarities

Multidimensional (MD) analysis is a quantitative corpus-based technique designed to find and interpret the co-occurrence of certain linguistic features in a corpus. “On the assumption that co-occurrence reflects shared functions, analysts interpret the co-occurrence patterns to assess the situational, social, and cognitive functions most widely shared by the linguistic features” (Biber et al. 2002: 14). The present study applied Biber’s 1988 model of register variation focusing on Dimension 1. Briefly, high frequencies of features such as private verbs (e.g., think, realize), that-deletions, contractions, present tense, first- and second-person pronouns, the pronoun it, and demonstrative pronouns tend to co-occur in involved registers (e.g., face-to-face conversation), reflecting shared context, interactiveness, and real-time production, as in Extract 4.2 Conversely, registers like news reportage and academic prose (as in Extract 5) are characterized by the co-occurrence of features such as nouns, nominalizations, attributive adjectives, and prepositions, reflecting the predominantly informational focus of these registers. The relevant features are underlined in both extracts.

Extract 4. The conversation corpus

1. Ira: I realize why I don’t watch these sitcoms.
2. Brian: Cause they’re stupid.
3. Ira: Yeah.
4. Amy: Friends is great.
5. Ira: No. That show is stupid.
6. Brian: It was great last year. It was really good last year.
7. Ira: I hate sitcoms.
8. Brian: You watch Seinfeld?
9. Ira: That’s, no I haven’t watched it in forever and a day, but that’s one of the only ones I actually . Intelligent, humorous.

Extract 5. Academic prose, Longman Grammar Corpus

It is well recognized that a successful germination and establishment is one of the main contributing factors governing high yields.

2. See Biber, Conrad, and Reppen (1998), page 148, for a complete list of Dimension 1 features.


Table 5. Friends on Biber’s (1988) Dimension 1

Selected registers on a scale of Dimension 1 scores running from 35 down to –15, listed from highest to lowest:

Face-to-face conversations (35)
Friends (34)
Personal letters
Prepared speeches
General fiction
Press editorials
Academic prose

Once the Friends corpus was grammatically tagged, it was run through TagCount, a program developed by Douglas Biber. In simplified terms, this program counts the grammatical tags for each of the texts and outputs scores on each of Biber’s dimensions of register variation by comparing these texts to those originally used in Biber’s (1988) study.3 For example, if a text receives a high positive score on Dimension 1, we conclude that it has a high frequency of the linguistic features characterizing involved registers, such as face-to-face conversation.

The MD analysis revealed a striking similarity between Friends and face-to-face conversation. Table 5 shows that the score obtained by Friends (34.4) on Dimension 1 was very similar to that of conversation (35.3), indicating a high degree of involvement. Despite the similarities, it should be noted that the Standard Deviation for Friends (4.3) was much smaller than that of conversation (9.1), suggesting that the sitcom presents much less variation when compared to conversation. This difference seems to result from the much narrower range of speech types found in Friends, as revealed by the analysis of settings and interaction types discussed in Section 2.
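The general idea behind a dimension score can be sketched as follows. As I understand the procedure in Biber (1988), each feature's normalized frequency is standardized against the mean and standard deviation of a reference corpus. The standardized scores of features with positive loadings are then summed, and those of features with negative loadings are subtracted. The sketch below is not TagCount's implementation, and all feature values in it are invented for illustration.

```python
def dimension_score(rates, reference, positive, negative):
    """Sum z-scores of positive-loading features and subtract those of negative-loading ones."""

    def z(feature):
        mean, sd = reference[feature]
        return (rates[feature] - mean) / sd

    return sum(z(f) for f in positive) - sum(z(f) for f in negative)


# Hypothetical values for illustration only; they are NOT Biber's (1988) figures.
# feature -> (mean, standard deviation) of its rate in a reference corpus
reference = {
    "contractions": (10.0, 5.0),
    "private_verbs": (20.0, 10.0),
    "nouns": (180.0, 30.0),
}
# Hypothetical normalized rates for one text.
text_rates = {"contractions": 25.0, "private_verbs": 35.0, "nouns": 150.0}

score = dimension_score(
    text_rates,
    reference,
    positive=["contractions", "private_verbs"],
    negative=["nouns"],
)
print(score)
```

In this toy case the text is above the reference mean on the two involved features and below it on nouns, so all three contribute to a positive (involved) score.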

Extract 6: Friends, season 2, episode 11, The one with the lesbian wedding

Joey: Pheebs, who’s Evelyn Dermer? [contraction, pres tense vb, wh-question]
Phoebe: I don’t know. [1st pers pron, contraction, private vb, pres tense] Who’s Soupy Sales? [contraction, pres tense vb, wh-question]
Mrs. Green: Oh my god, there’s an unattractive nude man playing the cello. [contraction, pres tense vb]
Rachel: Yeah, well just be glad he’s not playing a smaller instrument. [discourse particle, emphatic, be as main vb, pres tense vb, contraction]
Mrs. Green: You have some life here, sweetie. [2nd pers pron, pres tense vb]
Rachel: I know. [1st pers pron, private/pres tense vb] And Mom, I realize you and Daddy were upset when I didn’t marry Barry and get the big house in the suburbs with all the security and everything, [non-phrasal coord, 1st/2nd pers pron, private/pres tense vb, that-deletion, 1st pers pron, indef pron] but this is just so much better for me, you know? [dem pron, be as main vb/pres tense, emphatics, 1st/2nd per prons, private/pres tense vb/discourse marker]
Mrs. Green: I do. [1st pers pron, do as pro vb, pres tense vb] You didn’t love Barry. [2nd pers pron, contraction] And I’ve never seen you this happy. [Nonphrasal coord, 1st pers pron, contraction, 2nd pers pron] I look at you and I think, oh, this is what I want. [1st pers pron, pres tense vb, 2nd /1st pers prons, private/pres tense vb, dem pron, be as main vb, pres tense vb, 1st pers pron, pres tense vb]
Rachel: For...me. [1st pers pron]
Mrs. Green: Well, not just for you. [discourse particle, emphatic, 2nd pers pron]
Rachel: Well, what do you mean? [discourse particle, wh-question, 2nd per pron private/pres tense vb]

3. Refer to Biber (1988), Chapters 5 and 6, for a thorough description of multidimensional analysis and how the scores were computed.

Among several other linguistic features, Extract 6 has 11 first-person pronouns, 8 second-person pronouns, 8 contractions, 16 present-tense verbs, and 6 private verbs. Most of this dialogue involves one of the main characters of the show (Rachel) and her mother (Mrs. Green), talking about personal issues – primarily the acknowledgement by Mrs. Green that her daughter’s decision not to marry her fiancé was right. In addition, Mrs. Green’s realization is woven with a tinge of envy of her daughter’s happiness. Later in the segment, Mrs. Green reveals her plan to leave her husband, accentuating the personal nature of the exchange. This short segment illustrates how the linguistic features of Dimension 1 (in bold, descriptions of features in square brackets) co-occur in Friends, reflecting a high degree of involvement. In conclusion, the results of the MD analysis indicate that Friends shares the core linguistic characteristics of face-to-face conversation, thus constituting a fairly accurate representation of natural conversation for ESL purposes. Extract 6 shows that the language of Friends closely resembles that of naturally-occurring conversation as revealed in Biber’s (1988) study of register variation. Next, I compare Friends to conversation from a functional standpoint.


Functional analysis of Friends

Language is used for different purposes, such as to inform, to analyze, to describe, to persuade, and to express stance and personal feelings. These discourse functions are realized by sets of co-occurring linguistic features. The discourse circumstances of conversation (e.g., shared context, real-time production, interactiveness) (Biber et al. 1999: Ch. 14) are reflected by specific sets of linguistic features. For example, the shared context of conversation (location, knowledge) “is often reflected linguistically in the simplification of grammatical structures” (Quaglio & Biber 2006: 705). The apparent vagueness resulting from this simplification is reflected in the high frequency of features, such as first- and second-person pronouns, hedges (e.g., kind of), and nouns of vague reference (e.g., stuff). In this section, I compare Friends to conversation relative to the association of linguistic features sharing similar discourse functions. I focus on two of these functions: vagueness and emotional content.4

Vagueness

Consonant with the MD analysis, most of the linguistic features associated with conversation revealed an overall high frequency in both corpora. A closer analysis, however, showed that certain sets of features tended to have much higher frequencies in one of the two corpora. These features were then grouped according to the discourse functions they shared. Table 6 shows that the vast majority of the features associated with vague language are more frequent in the conversation corpus. In this section, I focus on the description of a selection of these features: hedges, coordination tags, nouns of vague reference, and the discourse marker you know.

Hedges, coordination tags, and nouns of vague reference

Conversation tends to be vague due to the shared context and pressures of real-time production (Biber et al. 1999: Ch. 14). It makes extensive use of devices of vague reference, including hedges (e.g., sort of, kind of), coordination tags (e.g., and stuff like that), and nouns of vague reference (e.g., stuff, thing), which are probably the most obvious features associated with vagueness. Figure 1 groups together each of these devices of vague reference showing that each of these groups of features is more frequent in the conversation corpus and that the overall count of these vague devices (5,228 times/million words in conversation and 3,745 times/million words in Friends) is almost 1.5 times more frequent in the conversation corpus.5 Examples (1) through (6) illustrate the use of these features.

Table 6. Features associated with vague language

Categories            Feature
Hedges                kind of; sort of
Coordination tags     or something (like that)
                      or anything (like that)
                      (and) stuff (like that)
Vague reference       thing(s); shit
Discourse marker      you know; I mean
Stance markers        probably
                      perhaps; maybe
Modals                could
                      might
Copular verbs         seem; appear
Utterance final so    so

[Columns Conv, Friends, and Both: the bullet marks showing where each feature is more frequent are not recoverable here.]

(1) A: and, uh, he showed them to some university professors at UNM or someone did and now he’s got some sort of honorary degree.
    B: I think they, they gave him permission to go fossil hunting in places where only university folks can go. (Conversation)

(2) Monica: I hate men! I hate men!
    Phoebe: Oh no, don’t hate, you don’t want to put that out into the universe.
    Monica: Is it me? Is it like I have some sort of beacon that only dogs and men with severe emotional problems can hear? (Friends)

(3) A: Presto Pasta it’s a fast food pasta place.
    B: Pasta’s kinda heavy though. (Conversation)

(4) Rachel: Okay, Monica, y’know what, honey, you’re kinda losing it here! I mean, this is really becoming like a weird obsession thing. (Friends)

(5) A: If you want to check out that shrine on Sunday or something
    B: Yeah, that would be cool. (Conversation)

(6) Phoebe: You guys wanna try and catch a late movie or something?
    Rachel: Maybe, but shouldn’t we wait for Chandler? (Friends)

4. See Quaglio (2004) for a detailed analysis of the functional differences and multiple examples.

5. Even though the two corpora had a similar size, I report the results in occurrences per million words for ease of comparison with other studies, which tend to use large corpora and report the results in this manner.
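The per-million-word reporting convention and the comparison of the two vague-device rates amount to simple arithmetic, sketched below. The function names are illustrative, and the raw count of 1,000 is a made-up figure used only to show the normalization step. The two rates and the corpus size are taken from the text and Table 3.

```python
def per_million(raw_count, corpus_size):
    """Convert a raw count into occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size


def ratio(rate_a, rate_b):
    """Return how many times as frequent rate_a is compared with rate_b."""
    return rate_a / rate_b


# Normalized rates reported in the text for all vague devices combined.
conversation_rate = 5_228
friends_rate = 3_745
print(round(ratio(conversation_rate, friends_rate), 2))

# Hypothetical raw count, used only to illustrate normalization;
# 589,722 is the size of the conversation corpus given in Table 3.
print(round(per_million(1_000, 589_722)))
```

Normalizing in this way keeps the figures comparable with those of studies based on corpora of different sizes.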

[Bar chart: frequency per million words (0 to 5,500) in Conversation and Friends for hedges, coordination tags, nouns of vague reference, and vague devices (total).]

Figure 1. Frequency of hedges, coordination tags, and nouns of vague reference

The apparent imprecision caused by hedges has, in fact, important discourse functions. As Leech (2000: 695) puts it, hedging expressions “allow[ ] a speaker to take refuge in strategic imprecision.” In (1) and (2), the speakers are aware that honorary degree and beacon are not the most precise terms to use. The hedge sort of is an acknowledgment of this lack of precision and can be an ‘invitation’ for the interlocutor to collaboratively construct the intended meaning. In other words, hedges “make it easier for the listener to pick out the specific referent the speaker has in mind if the linguistic expression is not exact” (Aijmer 1984: 122). Further, hedges create a desirable sense of vagueness, which may lead interlocutors to actively participate in the interaction by asking clarification questions and volunteering possible interpretations. This imprecision is ultimately a communicative device which facilitates the interaction between speakers, making it a dynamic process of verbal exchanges.

The hedge kinda in (3) and (4) reflects a non-confrontational tone in the exchanges. McCarthy and Carter (1997) suggest that the undesirable effect that overly direct utterances can create is functionally mitigated by the imprecision brought about by the use of hedges. In (3), this softening effect is enhanced by the use of the linking adverbial though at the end of the utterance. When used in final position in conversation, “though makes the disagreement much softer than a marker of direct contrast, such as but or however” (Biber et al. 2002: 394). In (4), the same soothing effect is created when kinda precedes the potentially rude or overly straightforward losing it here. Notice that Rachel’s use of the term of endearment honey further cushions the potentially face-threatening utterance. The coordination tag or something in (5) and (6) produces a similar mitigating effect in that it suggests flexibility and a desire not to impose.

Discourse marker you know

The discourse marker you know is about 3 times as frequent in conversation: it occurs 4,990 times/million words in the conversation corpus and 1,563 times/million words in Friends. Aijmer (1984) suggests that because you know often collocates with kind of and sort of, it shares their hedging function, thus contributing to the vague nature of conversation. In (7), you know collocates with kind of and is preceded by sort of, intensifying the overall vagueness of the utterance; in (8), y’know is preceded by kinda (and the downtoner just), which accentuate the mitigating effect intended by the speaker.

(7) A: you could just sort of open them up yeah and just you know kind of spread them out. (Conversation)

(8) Rachel: Hi! Sorry- sorry we’re late, we, uh, kinda just, y’know, lost track of time. (Friends)
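The collocation of you know with hedges, as in (7) and (8), could be checked automatically along the following lines. This is a rough sketch, not the procedure used in the study. The four-token window, the small hedge list, and all function names are assumptions made for illustration.

```python
import re

# Hedges as token sequences; this short list is an illustrative assumption.
HEDGES = [("kind", "of"), ("sort", "of"), ("kinda",)]


def tokenize(utterance):
    """Lowercase an utterance and split it into word tokens, keeping apostrophes."""
    return re.findall(r"[a-z']+", utterance.lower().replace("’", "'"))


def contains(tokens, phrase):
    """True if the token sequence `phrase` occurs contiguously in `tokens`."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase for i in range(len(tokens) - n + 1))


def marker_spans(tokens):
    """Yield (start, end) token spans of 'you know' and "y'know"."""
    for i, tok in enumerate(tokens):
        if tok == "y'know":
            yield i, i + 1
        elif tok == "you" and tokens[i + 1:i + 2] == ["know"]:
            yield i, i + 2


def you_know_near_hedge(utterance, window=4):
    """True if a hedge occurs within `window` tokens before or after 'you know'."""
    tokens = tokenize(utterance)
    for start, end in marker_spans(tokens):
        left = tokens[max(0, start - window):start]
        right = tokens[end:end + window]
        if any(contains(left, h) or contains(right, h) for h in HEDGES):
            return True
    return False


print(you_know_near_hedge("you could just sort of open them up yeah and just you know kind of spread them out."))
print(you_know_near_hedge("Hi! Sorry- sorry we're late, we, uh, kinda just, y'know, lost track of time."))
```

A purely string-based check like this cannot tell the discourse marker you know apart from the literal verb phrase (as in "you know the answer"), so any hits would still need manual disambiguation.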

The higher frequency of the linguistic features discussed in this section suggests that conversation in general tends to be more vague than the conversation presented in Friends. Vagueness is less desirable in Friends, as the audience (the interlocutors of the show) cannot interact with the characters. It seems like the use of vague devices in Friends is constrained by a “clarity cut-off boundary,” beyond which comprehension can be adversely affected. However, it is important to point out that all of the vague devices with their different intended functions do occur in Friends but to a much lesser extent.

Emotional language

In involved spoken registers, such as casual conversation, participants express feelings, attitudes, and concerns. This involvement is reflected in the speakers’ tone of voice, intonation patterns, nonverbal signals, and linguistic features. In this section, I focus on the linguistic choices speakers make to convey their feelings and express stance. I use the cover term emotional language here to refer to any emphatic form of expression that is captured by the use of certain linguistic features and can thus be identified in a transcribed corpus and studied from a grammatical standpoint.

Several features have been associated with the expression of stance and emphatic content. Among these features are adverbial intensifiers (e.g., really), some inserts (e.g., wow), stance markers (e.g., of course), and expletives/taboo terms (e.g., damn). Based for the most part on a survey of the Longman Grammar of Spoken and Written English (Biber et al. 1999), 33 features associated with the expression of emotion and/or emphatic content were chosen for analysis. Table 7 shows that 27 of these features were more frequent in Friends, two had similar counts, and only four of them were more common in conversation. Next, I describe and illustrate some of these features: adverbial intensifiers, expletives/taboo terms, and linguistic innovations.

Table 7. Features associated with emotional language and/or emphatic content

Category                 Feature
Intensifiers             so; really; totally
                         too
                         damn
Inserts                  oh; wow
Stance marker            of course
Non-minimal responses    wow
                         sure; fine
Expletives               damn; bastard; bitch(y)
                         son of a bitch
                         shit(ty)
                         fuck [and variations]
                         suck; screw(ed)(up)
                         piss(ed)(off)
                         ass; crap(py)
Innovations              all + adjective/gerund
                         so + verb
                         so (not) + NP
                         so not + Adj
                         totally [emphatic agreement]
Lexical bundles          I can’t believe (+complements)
                         thank you so much
Emphatic do              do
Copular verbs            look; feel; sound

[Columns Conv, Friends, and Both: the bullet marks showing where each feature is more frequent are not recoverable here.]

Adverbial intensifiers so, really and totally

Among the more obvious features indicating emphatic and emotional content are adverbial intensifiers, such as so, really, and totally. Biber et al. (1999: 564–6) report that amplifiers (adverbial intensifiers) are most common in conversation and that speakers use a wide range of informal intensifiers to express stance and emotion and for emphatic purposes. Really occurs 3,456 times/million words in the conversation corpus and 3,968 times/million words in Friends. In (9), the repetition of really intensifies its amplifying effect; in Friends, this repetition occurs over 3 times more often than in conversation (108 versus 34 times/million words, respectively). It is also interesting to notice that the distribution of the repetition really really is more ‘balanced’ in Friends: it modifies either adjectives or adverbs 61.5% of the time and precedes verbs in 38.5% of the instances. The modification of either adjectives or adverbs is preferred 75% of the time in the conversation corpus, as in Example (10), whereas verbs are modified in only 25% of the occurrences. Even though the difference in the overall frequency of really in Friends and conversation is not striking, this repetition phenomenon (along with the important differences described below) contributes much to the more emphatic/emotional nature of the language of Friends.

(9) Rachel: You really, really need to get some sleep, honey.
    Monica: I know I do. (Friends)

(10) A: And I said, I can’t do that and she got really, really angry with me. (Conversation)
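Counting immediate repetitions of really, as in (9) and (10), can be approximated with a regular expression. The sketch below is illustrative only: the pattern and function name are assumptions, and it counts a run of three or more really as a single repetition.

```python
import re

# An immediately repeated "really", with an optional comma between the two words.
REPEATED_REALLY = re.compile(r"\breally,?\s+really\b", re.IGNORECASE)


def count_repeated_really(utterances):
    """Count non-overlapping 'really(,) really' sequences across utterances."""
    return sum(len(REPEATED_REALLY.findall(u)) for u in utterances)


examples = [
    "You really, really need to get some sleep, honey.",  # example (9)
    "And I said, I can't do that and she got really, really angry with me.",  # example (10)
    "It was really good last year.",  # single really, not counted
]
print(count_repeated_really(examples))
```

Deciding whether each repetition modifies an adjective, an adverb, or a verb would additionally require part-of-speech information for the following word.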

So, as in (11), is 1.7 times more common in Friends, occurring 1,449 times/million words versus 842 times/million words in the conversation corpus. In addition to this higher frequency, the most common adjectives modified by so in Friends are sorry and glad; in the conversation corpus, the most frequent right collocates (adjectives) are good and cute, suggesting a more personal (or affective) use of this adverbial intensifier in Friends. Totally is more than twice as frequent in Friends, occurring 402 times/million words versus 180 times/million words in the conversation corpus. Notice that, in (12), totally modifies the adjective tired and is interchangeable with completely or really. A more recent use of totally is illustrated in (13): it is more commonly used by younger speakers of American English and has the meaning of for sure. Whether used in its more canonical sense or with its innovative connotation, totally reflects emphatic/emotional content and is much more frequent in Friends.

(11) Chandler: Oh man, I am so sorry. Are, are you okay?
     Joey: Well, I’ve been better. But, I’m all right. So you like her huh? (Friends)

(12) A: I was like totally tired, you woke me up and I was home alone when you called to play basketball?
     B: It’s alright You’re going today? (Conversation)

(13) Rachel: I was giving you an apology and you were totally checking her out! (Friends)
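Finding the most frequent adjectives to the right of so, as in the comparison above, might be sketched as follows for part-of-speech-tagged text. The simplified tags (`ADJ`, `ADV`, and so on) and the function name are hypothetical; they are not the Biber Tagger's tag set, and the sample is adapted from example (11).

```python
from collections import Counter


def right_adjective_collocates(tagged_tokens, node="so"):
    """Count adjectives that immediately follow `node` in a list of (word, pos) pairs."""
    counts = Counter()
    for (word, _), (next_word, next_pos) in zip(tagged_tokens, tagged_tokens[1:]):
        if word.lower() == node and next_pos == "ADJ":
            counts[next_word.lower()] += 1
    return counts


# Tokens adapted from example (11), with hypothetical simplified tags.
sample = [
    ("Oh", "INTJ"), ("man", "INTJ"), ("I", "PRON"), ("am", "VERB"),
    ("so", "ADV"), ("sorry", "ADJ"), ("So", "ADV"), ("you", "PRON"),
    ("like", "VERB"), ("her", "PRON"),
]
print(right_adjective_collocates(sample).most_common())
```

Restricting the count to adjectives keeps the intensifier so apart from other uses, such as the utterance-initial So in "So you like her huh?".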

Expletives/taboo terms Following Stenström (1991), I lump together expressions or words commonly referred to as expletives, taboo words, and swearwords in this analysis and refer to them as expletives. Expletives in general are strongly associated with the expression of emotion; they “are realized by taboo words related to religion, sex and the human body, which are used figuratively and express the speaker’s (genuine or

JB[v.20020404] Prn:11/04/2008; 11:43

F: SCL3109.tex / p.17 (1162-1204)

Television dialogue and natural conversation  500 450

Frequency per million words

400 350 300 250 200 150 100 50

Conversation

as

s(

ex

pr ) pi ss (s ed )(o ff)

y) (p ap cr

bi tc h of a

so n

bi tc h( y)

ba st ar d

sh

it(

ty

) da m n( +v ar )

fu

ck

(+

va

r)

0

Friends

Figure 2. Frequency of expletives associated with emotional language

pretended) emotions and attitudes” (Stenström 1991: 240). Figure 2 displays the frequency of the expletives selected for analysis. Surprisingly, except for fuck, shit, and piss(ed)(off ), all of these expletives are more frequent in Friends. I comment on and exemplify the use of most of these expletives below. Fuck (plus variations) and shit(ty) are by far the most frequent expletives in conversation, occurring 435 and 244 times/million words, respectively. In (14), fucking is an adverbial intensifier, and shit an emotionally-loaded noun of vague reference;6 both shit and fuck in (15) are used as exclamatory inserts and are thus instrumental in conveying speaker A’s strong dissatisfaction with the fact that his computer file has been deleted. Because of restrictions and regulations imposed by the televised medium, these terms are not part of the Friends lexicon. Such limitations may be responsible for the overuse of the adverbial intensifiers discussed above. Further, the restrictions of the use of fuck as an exclamatory insert and shit as an exclamatory insert, a (emotionally-loaded) noun of vague reference, and as an evaluative adjective (shitty) seem to explain the overuse of crap(py) in Friends: it is twice more frequent than in the conversation corpus, occurring 90 times/million . I call this use of shit ‘emotionally-loaded noun of vague reference’ to differentiate it from the apparently neutral use of shit as a noun of vague reference, as in “. . . yeah there’s some shit in here I would buy but I hate buying shit out of a catalog, personally” (Conversation).

JB[v.20020404] Prn:11/04/2008; 11:43

F: SCL3109.tex / p.18 (1204-1305)

 Paulo Quaglio

words, as an apparent compensation strategy. In (16), it is used as an exclamation and in (17) as an emotionally-loaded noun of vague reference. (14) A:

I would kick someone’s fucking ass, if anyone put shit like that on me . . . here, drink these. (Conversation)

(15) A: B:

It’s not a memo any more there or something. Oh shit, no wonder I, fuck, they wiped my Word out. (Conversation)

(16) Chandler: Ho, ho, ho, holy crap is it hot in here!
     Joey: Really, hey, you mind if I turn the heat down? (Friends)

(17) Chandler: Y’know, of all my friends, no one knows the crap I go through with my mom more than you. (Friends)

The higher frequencies of damn (plus variations/combinations, such as damnit and goddamn), bastard, bitch, and son of a bitch in Friends are also surprising and may be the result of the same compensatory phenomenon discussed above. Even though the nature and grammatical uses of these terms are different from those of fuck and shit, in conjunction with other emotionally-loaded language they reflect the speakers’ emotions and feelings more emphatically than other less harsh terms perhaps would. Damn is over twice as frequent in Friends (223 times/million words versus 97 times/million words in the conversation corpus); bastard occurs 49 times/million words in Friends and only 8 times/million words in the conversation corpus; altogether, bitch(y) and son of a bitch occur almost 3 times more often in Friends (127 times/million words). All taken from Friends, examples (18) through (21) exemplify the use of these expletives.

(18) Monica: Damnit Ross, get your butt out of the bathroom.
     Ross: Calm down, I’m blow-drying. (Friends)

(19) Mrs. Geller: Well what is it? Come on sweetie, you’re like, freaking me out here.
     Ross: I hate Chandler, the bastard ruined my life. (Friends)

(20) Ross: Well ah, Aunt Silvia was, well not a nice person.
     Monica: Oh, she was a cruel, cranky, old bitch! (Friends)

(21) Chandler: All right! Go left! Go left! Go right!! Go right!!
     Phoebe: I can’t!! I can’t!! Noooooooo!!!!!!! You son of a bitch!!!!! (Friends)
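The comparisons in this section rest on rates normalized per million words and on ratios between those rates. As a minimal sketch (not the procedure used in the study), the Python below recomputes two of the ratios from the rates quoted above. The function names, and the raw count and corpus size passed to `per_million`, are hypothetical and chosen only for illustration.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Normalize a raw count to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size


def rate_ratio(rate_a: float, rate_b: float) -> float:
    """How many times as frequent rate_a is compared with rate_b."""
    return rate_a / rate_b


# Normalized rates (per million words) quoted in the text above.
rates = {
    "damn": {"friends": 223, "conversation": 97},
    "bastard": {"friends": 49, "conversation": 8},
}

for word, rate in rates.items():
    ratio = rate_ratio(rate["friends"], rate["conversation"])
    print(f"{word}: {ratio:.1f} times as frequent in Friends")

# Hypothetical example of normalization: 50 hits in a 500,000-word corpus.
print(per_million(50, 500_000))
```

Working from normalized rates rather than raw counts is what makes corpora of different sizes comparable; the ratio for damn (about 2.3) matches the "over twice" reported above.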

Innovative uses of all, so, and totally

Linguistic innovations are often associated with the expression of stance (Quaglio & Biber 2006). I included the innovative uses of all, so, and totally here because of their emphatic characteristics and apparently increasing frequency in American English conversation. Figure 3 shows that the frequencies of these three features are higher in Friends.

Figure 3. Frequency of linguistic innovations associated with emotional language (bar chart; y-axis: frequency per million units; series: Conversation and Friends; categories: all + adj/gerund, so (not) + verb/noun, totally (agreement))

The relatively new use of all followed by an adjective or a gerund is described by Waksler (2001: 128) as “a marker of the speaker’s upcoming unique characterization of some entity in the discourse”; this characterization is typically emphatic in nature. In (22), all intensifies the adjective it precedes (pissed); in (23), its emphatic content seems to spread to the whole chunk of discourse following it.

The adverbial intensifier so was discussed at the beginning of this section. As a linguistic innovation, so, which typically precedes an adjective, modifies a noun, as in (24), or a verb, as in (25). As an intensifier of nouns and verbs, so is 23 times more frequent in Friends, occurring 70 times/million words; in the conversation corpus, it occurs only 3 times/million words. This discrepancy is probably due to the fact that the conversation corpus was collected between 1995 and 1996, so this innovation might not have been fully captured by it.

(22) A: Yeah, let him deal with Chris.
     B: Okay. He’ll be all pissed though. (Conversation)

(23) Chandler: What are you talking about?
     Joey: She was all crying. She-she said you guys want different things, and that and that she needed time to think. (Friends)

(24) Ross: Please. This is so your fault.
     Susan: How, how is this my fault? (Friends)

(25) Rachel: And even though I am so looking forward to the next part, I am really gonna miss being pregnant. (Friends)


Another fairly recent innovation in American English conversation is the use of totally, not as an adverbial intensifier, but as a self-contained expression of emphatic agreement, as in (26). With this function, totally is 3 times more frequent in Friends, occurring 32 times/million words; in the conversation corpus, it occurs 11 times/million words. It never occurs in the first turn of an interaction; rather, it is a response and typically occurs by itself. Semantically, it expresses more than simply “I agree with you”; it also shows stance in that it suggests unconditional agreement.

(26) Chandler: That’s a great idea! We can easily think of a way for us both to enjoy the room.
     Monica: Totally! (Friends)

This section has focused on the linguistic markers of emotionally-loaded language. The frequency-based analysis of these markers revealed that the language of Friends tends to be more emotional and emphatic than conversation. This dramatic effect is linguistically realized by a high frequency of several features associated with (but not limited to) emotionally-loaded language, such as adverbial intensifiers, expletives, and linguistic innovations. It is important to emphasize that all of the features found in the conversation corpus (except for the expletives fuck and shit) are also found in Friends and vice versa. What differentiates one corpus from the other is the frequency with which these features occur.7

7. Evidently, there are several differences between television dialogue and natural conversation. For example, Friends has virtually no instances of overlap, interruptions, and unclear words. Unlike in conversation, the turns tend to be evenly distributed in Friends. These important differences, however, are beyond the scope of the present study.

Conclusion

The MD analysis of Friends showed that this sitcom shares the core linguistic features of conversation. A closer frequency-based analysis of a large number of linguistic features associated with conversation revealed interesting differences between Friends and conversation at the functional level; two of these differences were addressed in this paper: vagueness and emotional language. On the one hand, Friends has a lower frequency of the linguistic features associated with vague language, such as hedges and nouns of vague reference; on the other hand, the language of Friends is much more emotionally-loaded, as evidenced by the much higher frequency of features such as adverbial intensifiers and expletives.

These results may be a reflection of situational differences between Friends and the conversation corpus. The typical vagueness of conversation is undesirable in Friends, as it can ultimately lead to incomprehensibility. The higher emotional content of Friends can arguably be a reflection of the dramatic nature of television dialogue and/or a result of the close relationship shared by the characters. Despite such differences, and returning to the original ESL-motivated research question that guided the present study, the language of Friends is, overall, a fairly accurate representation of face-to-face conversation.

References

Aijmer, K. 1984. Sort of and kind of in English conversation. Studia Linguistica 38: 118–128.
Barlow, M. 2002. MonoConc Pro (Version 2.2) [Computer software]. Houston TX: Athelstan.
Biber, D. 1988. Variation across Speech and Writing. Cambridge: CUP.
Biber, D., Conrad, S. & Reppen, R. 1998. Corpus Linguistics: Investigating language structure and use. Cambridge: CUP.
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. Longman Grammar of Spoken and Written English. London: Longman.
Biber, D., Conrad, S., Reppen, R., Byrd, P. & Helt, M. 2002. Speaking and writing in the university: A multidimensional comparison. TESOL Quarterly 36: 9–48.
Burns, A., Gollin, S. & Joyce, H. 1997. Authentic spoken texts in the language classroom. Prospect 12(2): 72–86.
Carter, R. & McCarthy, M. 1994. Language as Discourse: Perspectives for language teaching. London: Longman.
Carter, R. & McCarthy, M. 1995. Grammar and spoken language. Applied Linguistics 16(2): 141–158.
Koester, A. J. 2002. The performance of speech acts in workplace conversations and the teaching of communicative functions. System 30: 167–184.
Leech, G. 2000. Grammars of spoken English: New outcomes of corpus-oriented research. Language Learning 50(4): 675–724.
Mauranen, A. 2004. Spoken corpus for an ordinary learner. In How to Use Corpora in Language Teaching, J. M. Sinclair (ed.), 89–105. Amsterdam: John Benjamins.
McCarthy, M. 1998. Spoken Language and Applied Linguistics. Cambridge: CUP.
McCarthy, M. & Carter, R. 1995. Spoken grammar: What is it and how can we teach it? ELT Journal 49(3): 207–218.
McCarthy, M. & Carter, R. 1997. Written and spoken vocabulary. In Vocabulary: Description, acquisition and pedagogy, N. Schmitt & M. McCarthy (eds), 20–39. Cambridge: CUP.
O’Keeffe, A. & Farr, F. 2003. Using language corpora in initial teacher education: Pedagogic issues and practical applications. TESOL Quarterly 37: 389–418.
Quaglio, P. 2004. The language of NBC’s Friends: A comparison with face-to-face conversation. PhD dissertation, Northern Arizona University.
Quaglio, P. & Biber, D. 2006. The grammar of conversation. In The Handbook of English Linguistics, A. McMahon & B. Aarts (eds), 692–723. Oxford: Blackwell.
Stenström, A.-B. 1991. Expletives in the London-Lund corpus. In English Corpus Linguistics: Studies in honour of Jan Svartvik, K. Aijmer & B. Altenberg (eds), 239–253. London: Longman.
Waksler, R. 2001. A new all in conversation. American Speech 76: 128–138.
Washburn, G. 2001. Using situation comedies for pragmatic language teaching and learning. TESOL Journal 10: 21–26.


Appendix

Grammatical features selected for analysis*

Grammatical features: Instances selected for analysis

Lexical verbs: say [incl. past tense], get, go, know, think, see, make, come, take, want, give, mean, tell
Copular verbs: become, get, look, feel, seem, go, remain, keep, grow, sound, prove, appear
Expletives: damn, bastard, bitch(y), son of a bitch, shit(ty), fuck (+var), ass, butt, crap(py)
Slang terms: cool, suck, piss(ed)(off), screw(ed)(up), check out, hang out, totally (agreement), what’s up?, freak out, I’m out of here, later
Vague language devices: kind of, sort of, or something (like that), or anything (like that), (and) stuff (like that), stuff, shit (expletive used as vague reference)
Inserts: I mean, you know, you see, oh, well, wow, yeah (non-minimal response)
Stance markers: probably, of course, perhaps, maybe, actually
Vocatives (familiarizers): guys, man, dude, buddy, folks, bro, bud
Non-clausal units: non-clausal questions, wow, exactly, sure, right, fine, good, lovely, okay, uh-huh (+ var)
Intensifiers: so, really, too, totally, damn
Innovations: all + adj/gerund, so + verb, so (not) + NP, so not + adj, totally (agreement), ish (time, color, etc), so (at the end of a turn), in vs. for (negative + present perfect + time expression)
Lexical bundles in conversation: I don’t know what, I don’t know if, do you want to, you know what I, I don’t want to
Lexical bundles in Friends: I can’t believe you, are you doing here, I want you to, are you talking about, to talk to you, what do you think, thank you so much, what do you mean
Vernacular features: ain’t, me and . . ., there’s + plural notional subject
Pro verb do + it: do + it, does + it, did + it, have(has) done + it
Aspect, present / past perfect: simple, progressive, perfect, present perfect, past perfect, past perfect (canonical function), past perfect (counterfactual function)
Verbs controlling that & to-clauses: think that, say that, guess that, want to, try to, like to, seem to
Tense: present tense, past tense
Emphatic do: do, does, did
Modals & semi-modals: necessity/obligation, possibility/permission, prediction, semi-modals
Possibility & permission modals (individually): can, could, may, might
Personal pronouns: 1st person pronouns, 3rd person pronouns
Repeats: 2-repeats, 3-repeats, 4-repeats
Greetings & leave-takings: hi, hello, hey, bye + bye-bye, goodbye, (I’ll) see you later, (I’ll) see you, later, I’m out of here

* Only a few of these features were included in the present article. For the complete analysis of these features, see Quaglio (2004).


A corpus approach to discursive constructions of a hip-hop identity

Kristy Beers Fägersten
Universität des Saarlandes, Germany

This chapter is an analysis of a 100,000-word corpus consisting of message-board postings on hip-hop websites. A discourse analysis of this corpus reveals three strategies employed by the posters to identify themselves as members of the hip-hop community in the otherwise anonymous setting of the internet: (1) defined openings and closings, (2) repeated use of slang and taboo terms, and (3) performance of verbal art. Each strategy is characterized by the codification of non-standard grammar and pronunciations characteristic of speech, as well as by the use of non-standard orthography. The purpose of the discourse is shown to be a performance of identity, whereby language is used and recognized as the discursive construction of one’s hip-hop identity.

.

Introduction

When compared with face-to-face conversation, interaction via computer does not afford participants the same opportunities to actively gather information or even passively notice characteristics about each other through aural or visual cues. Despite the existence of web cameras and streaming video, typical computer-mediated communication (CMC) mainly consists of type-written text messages. Information about another’s gender, race or age, for example, which one’s appearance or style of speaking may reveal, can be undetectable – or even falsified – in CMC. As a result, there is both an element of anonymity and a danger of deception associated with CMC, as well as with much other interaction or discourse on the internet. Some interlocutors take advantage of the anonymity to pass as someone or something they are not (Bechar-Israeli 1995; Cutler 1999; Nakamura 1995; Stone 1991; Turkle 1995; Van Gelder 1990; Wallace 1999); others take considerable measures to try to portray their true selves (Bechar-Israeli 1995; Cutler 1999; Warschauer 2000). Whatever their intentions may be, participants in CMC present themselves almost entirely through linguistic means, constructing their identities through their discourse. In cyberspace, you are what you type.


The language particular to computer-mediated communication has been called the “third medium” (Crystal 2001: 48; cf. Ong 1982), denoting a variety of language that is similar to spoken language, but in written form. Ferrara et al. (1991) refer to electronic discourse as ‘Interactive Written Discourse’ (IWD), Collot and Belmore (1996) use the term ‘Electronic Language’ (EL), while Takahashi (2003) uses the term ‘Net-En’ to refer specifically to computer-mediated communication in English, denoting a variety distinct from written or spoken forms of English. Considering the extent of comparisons between CMC and written and spoken varieties of language, both of which can vary greatly according to context, it is surprising that analyses of CMC have not focussed more on context-based variation. While CMC provides opportunity for enhancement of communication, different contexts of usage impose restrictions on it (Allwood 2000; Herring 2001). The adaptation of language to fit these different contexts thus raises the question of “how people tailor their use of written language to the conditions of medium and situation, as well as to their communicative wishes” (Hård af Segerstad 2002: 10). This acknowledgment of the effect of situation on communicative discourse reflects the “move from the ‘language of CMC’ to computer-mediated discourse” (Androutsopoulos 2006: 421).

This chapter will focus on identifying features of discourse styles in the context of hip-hop message boards. The language of hip-hop CMC will be viewed as an adaptation of written language to discursively construct identities within the context of internet message boards. Message board postings can be considered an example of the third medium (or IWD, EL, Net-En, etc.), as they reflect a tendency among contributors to write as they speak. They can therefore be likened to spoken language, yet they are written by the ‘speakers’ themselves as they choose to represent their ‘speech.’ However, more central to the present analysis of hip-hop message board postings is the context of the discourse, in which language use contributes to a discursive construction of hip-hop identity.

In this chapter, I present an analysis of discursive constructions of hip-hop identities, based on a corpus of message board entries collected from hip-hop websites. Such a corpus study allows for both a quantitative and qualitative examination of the strategies employed by message board posters to establish their hip-hop identities in the otherwise anonymous setting of the internet. Traditionally, hip-hop communities consist of predominantly male, urban, African-American youths (George 1999). The practice of posting on a hip-hop message board serves to include posters in an online hip-hop community, where age, race, gender, and social background are not immediately known or obvious. The assertion of an online hip-hop identity is achieved discursively. The large amount of data in the hip-hop message board corpus (hereafter referred to as the HMBC) helps in determining the extent of linguistic systematicity in the discourse of the message board entries, which in turn encourages a micro-analysis of the most common strategies for discursively constructing a hip-hop identity, namely, (1) the use of


distinct openings and closings, (2) repetition of slang and taboo terms, and (3) performance of verbal art.

Hip-hop identity in message board discourse

Central to the culture of hip-hop is music, in particular, “a form of rhymed storytelling accompanied by highly rhythmic, electronically based music” (Rose 1994: 2). Outside the internet realm, a hip-hop identity is furthermore defined and/or recognized in part by break-dancing, clothing, and graffiti (Berns & Schlobinski 2003; Cutler 1999; George 1999; McLeod 1999; Newman 2001; Rose 1994). In other words, in real as opposed to virtual settings, both aural and visual cues help to establish and identify a member of the hip-hop community. In a CMC environment and, central to this paper, in the message board environment, most visual and aural cues are minimized if not nullified. The anonymity associated with the medium of CMC and internet message boards does not allow for reliable identification or recognition of contributors, or even paralinguistic corroboration of their claimed identities. It is primarily through discursive means that contributors can identify themselves and each other, and thus it is a challenge to members of the hip-hop community to assert and maintain their hip-hop identities during computer-mediated communication.

The discursive construction of a hip-hop identity is as much a function of what is posted as how it is composed. The content and form of hip-hop message board postings represent communal and cultural practices. Insofar as the members of the hip-hop community represent a subculture, their language can be considered a sociolect. Since hip-hop refers to a style of music as well as a cultural history of “dance, painting, fashion, video, crime and commerce [...]” (George 1999: viii), message board postings that do not concern a topic relevant to hip-hop culture risk being ignored or mocked, requiring posters to assert a hip-hop identity by way of showing a familiarity with cultural practices, events, and issues. In the example below, the poster dramatically laments the present state of hip-hop. Both the familiarity with and disapproval of the evolution of hip-hop help the poster to assert a hip-hop identity:1

(1) The question is what is rap going to turn into wildness and no realness? Where is the realness? Its all about clubs and partyin and fuckin and bling bling etc. everydamn thing is the same. Nothing creative, just the same ass thing.... It pisses me off. Fuck everyone for their opinions on rap when they don’t even know what rap is! What hip hop is! The meaning of hip hop! And what it has revoultionized into! This Is SHIT!

1. Numbering of examples is for organization and reference only within this paper and does not indicate any order or chronology to the corpus postings.


In Example (2), the poster is more explicit about asserting a hip-hop identity, calling him/herself a ‘hip hop person’:

(2) Peace, first I would like to say this has to be one of the most educating sites on Hip Hop, as a culture and as a lifestyle. As a hip hop person, I like to always educate myself on facts and to learn the history of something in my interest, and this website is doing that. Thanks.

Examples (1) and (2) reveal that the content alone of message board postings can establish or at least imply a hip-hop identity. But the examples also reveal two discursive practices of the hip-hop message board community, as well as one important characteristic of the language of message board postings. First, in Example (1), there are several lexical items which characterize the discourse of many hip-hop message board postings: ‘fuckin’, ‘damn’, ‘ass’, ‘pisses’, ‘fuck’, ‘shit’, and ‘bling bling’. The repeated use of such taboo terms (e.g., ‘fuckin’) and hip-hop slang (e.g., ‘bling bling’, flashy jewelry) is a common discursive practice, discussed further below. Another common practice is the use of defined openings and closings, as can be seen in Example (2) in the form of the opening ‘peace’ (also an example of hip-hop slang), and the closing ‘thanks’.

There is, however, neither a defined opening nor a closing in (1), nor any use of taboo terms in (2), indicating a more fundamental stylistic difference between the two postings. The sentence structure, grammar and non-standard orthography used in Example (1) encourage a ‘spoken’ reading of the content. Example (2), on the other hand, with its opening, long sentences, standard grammar, spelling and punctuation, and closing is reminiscent of a written text such as a letter. Thus, among the message board entries can be found features of both written and spoken language: the language of CMC. It is the variation between (1) and (2) which underlines the necessity to move from the language of CMC to the context of CMC, calling attention to different discursive goals.

A large amount of data in the form of a corpus is conducive to the identification of patterns and tendencies in the discursive construction of a hip-hop identity. The emergent discourse strategies may not be evident in every posting, but a quantitative and qualitative discourse analysis reveals the message board forum as an intersection of written and spoken language in which posters discursively construct their hip-hop identity. Example (3) illustrates how aspects of both spoken and written language can be deftly combined with the content and form of message board entries to assert a hip-hop identity:

(3) ey foo......show meeh sum pic of yall nikka’s breakin iight homie.........do dat fo ur boi ..............i b joe frum Under Rated Breakaz.

To the uninitiated or out-group members, this posting (and many others in the corpus) may be a challenge to comprehend. A not-too-divergent gloss would be:


Hey fool. Show me some pictures of you niggers break-dancing, alright, homeboy? Do that for your boy. I am Joe from Under Rated Breakers.

In addition to two straightforward assertions of a hip-hop identity – the poster refers to himself as ‘boi’, a hip-hop slang term, and ‘joe frum Under Rated Breakaz’, implying he is a member of a break-dance group – there is also the use of slang and taboo terms (‘foo’, fool; ‘nikka’, nigger; ‘iight’, alright; ‘homie’, homeboy, friend; ‘boi’, boy, breakdancer). Therefore, in terms of content, a hip-hop identity is already well-established. Nevertheless, it is the form of the posting, including non-standard orthography, grammar, and punctuation, which dominates as an expression of the poster’s hip-hop identity. Throughout the corpus of hip-hop message board postings, there are similar examples of creative form coupled with appropriate hip-hop content. A corpus analysis of message board postings allows for certain features of discourse to be made salient, revealing how conventions of posting structure, form and content in turn allow posters explicitly or implicitly to assert their hip-hop identities.

In order both to acknowledge the distinct variety of language found on internet message boards, and to avoid unintentional alignment with written or spoken varieties of English, throughout this paper, the terms ‘to post’ (to write and submit a message on a message board), ‘posting’ (an individual message on a message board), and ‘poster’ (the submitter of a message) are used.

Methodology

A corpus-linguistic investigation of the discursive construction of a hip-hop identity is enabled by the quantity and quality of data available. Not only is there an abundance of websites dedicated to hip-hop culture, many with free access to message boards, but, most importantly, the message board postings themselves represent unique, raw data produced by the members of a socio-cultural community to which linguist-observers and/or out-group members might not have access. The postings furthermore capture the language of a specific medium as its users would have it represented. There are no questions of editing, nor is there, in contrast to speech, any need for transcription, which eliminates mediation and guesswork (Yates 1996).

The reliable identification of linguistic conventions in hip-hop discourse requires both a quantitative and qualitative approach to data and data collection. Determining the extent of linguistic systematicity throughout the discourse demands a large amount of data, while an examination of variations within the identified system also requires a micro-analysis. In an effort to collect enough data to discover potential discursive patterns warranting careful investigation, but at the same time reduce the chances of these patterns being specific to one online community, the HMBC was compiled from five different hip-hop websites; a description of the corpus and the web addresses are available in Appendix A.2 There are many websites devoted to hip-hop culture, but not all sites feature message boards and, among those that do, only some are heavily posted. The five sites chosen for this study featured message boards with many postings, facilitating data collection. After dates, e-mail addresses and URLs were edited out, the corpus totaled 102,343 words (tokens) with 9,822 distinct types.

WordSmith Tools (Scott 2004a) was used to analyze the corpus in terms of word frequency, sorted lists, keywords and clusters. Frequency, keyword and cluster lists were compiled in order to identify any frequently occurring lexical items or clusters significantly associated with hip-hop discourse. During an initial survey of the corpus, a number of words ending in ‘a’ and ‘z’ were noticed, and thus a reverse-sort list was compiled to determine the extent of these and other non-standard spelling practices. The aim of the investigation as a whole was to identify significant forms and/or lexical items specific to the discourse of hip-hop in order to determine their discursive function within individual postings in particular, and to investigate how they correspond to or signal general strategies for the discursive construction of a hip-hop identity.
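The study used WordSmith Tools for these lists. Purely as an illustration of what token and type counts, a frequency list, a reverse-sorted list, and a cluster list contain, the following Python sketch builds them for a toy sample. The tokenizer, the function names, and the sample posting are assumptions; they do not reproduce WordSmith's procedures or any HMBC data.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract word tokens, keeping internal apostrophes."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())


def frequency_list(tokens: list[str]) -> list[tuple[str, int]]:
    """Word types with their counts, most frequent first."""
    return Counter(tokens).most_common()


def reverse_sorted(types: set[str]) -> list[str]:
    """Sort types by their reversed spelling, so shared endings group together."""
    return sorted(types, key=lambda word: word[::-1])


def clusters(tokens: list[str], n: int = 2) -> Counter:
    """Count contiguous n-word sequences (clusters)."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Hypothetical sample posting, invented for this sketch.
sample = "Yo yo, peace to all the breakaz and rhymez. Peace out yall!"
tokens = tokenize(sample)
types = set(tokens)

print(len(tokens), "tokens,", len(types), "types")
print(frequency_list(tokens)[:2])
print(reverse_sorted(types))
print(clusters(tokens, 2).most_common(3))
```

In the reverse-sorted list, forms ending in ‘z’ (here the invented `breakaz` and `rhymez`) fall next to each other at the end, which is what makes such a list convenient for surveying word-final spelling practices.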

Openings and closings

Openings and closings can be found both in spoken language, for example in conversations (Schegloff 1972), and in written language, such as in correspondence (Danet 2002; Eiler & Victor 1988). Crystal (2001), Taboada (2004), Yates (2000), and Yates and Orlikowski (1992) provide evidence and examples of openings and closings in computer-mediated communication as well, such as in e-mails, chat group conversations and message board postings, arguing that the opening-body-closing structure of such CMC is similar to a written letter. They also present evidence of instances of CMC where the opening, closing or both are absent, a common feature of CMC which often results in rendering the text informal (Taboada 2004).

Examples (2) and (3) differ considerably in terms of stylistics and form, with (2) representing more standard, written English and (3) non-standard, spoken English. Their structures, on the other hand, are similar in as far as each includes a distinct opening and closing. Example (2) opens with the word ‘peace’, used within the hip-hop community as an expression of greeting or leave-taking. Example (2) then closes with the term ‘thanks’. Example (3) opens with ‘ey foo’, a term of address which can be compared to more traditional openings of letters or other written correspondence, such as ‘Dear/hello/hi’. The same post then closes with ‘i b joe...’, which can be similarly likened to such traditional closings as ‘Sincerely yours’.

2. http://odur.let.rug.nl/~vdbeek/perl/lecture2.html

Example (1) above illustrates that openings and closings are similarly not an absolute feature of hip-hop message board postings. In fact, a close analysis of the corpus reveals that a lack of opening and closing is common to most postings belonging to the same thread, or group of related postings. The HMBC contains 1,512 unique postings, representing 464 comment-response postings and 1,048 in-thread postings. A total of 901 in-thread postings (86%) are without opening or closing, as illustrated below in Example (4) containing the first three postings of a new thread. The example features at least two different and anonymous posters, and each turn is prefaced with a number:

(4) 1. CBS doin a hip-hop special tonight at 8 about hip-hop and it’s powers.... i think i might peep it!
    2. Yeah I’m a be peepin’ it see whut it’s about... supposed to be how deep the industry really reach...
    3. And wasn’t it on CBC not CBS?
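The posting counts reported above can be cross-checked with simple arithmetic. The short Python sketch below is only a worked check of the published figures; the variable and function names are hypothetical.

```python
def share(part: int, whole: int) -> float:
    """Proportion of `whole` accounted for by `part`."""
    return part / whole


# Figures reported for the HMBC in the text above.
total_postings = 1512
comment_response = 464
in_thread = 1048
in_thread_without_opening_or_closing = 901

# The two posting types account for every posting in the corpus.
print(comment_response + in_thread == total_postings)  # True

# Share of in-thread postings lacking an opening or closing.
print(f"{share(in_thread_without_opening_or_closing, in_thread):.0%}")  # 86%
```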

Erickson (1999) refers to communication such as these message board postings as persistent conversation because, “although it is conversational in many aspects, it is also preserved for future review” (Taboada 2004: 61). Thus, as opposed to face-to-face conversation where contributions are ephemeral, message board postings linger. Website users have the ability to read through previous postings and contribute to the topic at any time, so that the conversation as well as the topic ‘persist’. Much like a chat room, the communication of a message board as interactive written discourse means that “a conversation can not be interrupted. [. . .] The nature of the chat room is, therefore, non-dyadic, unlike traditional speech. Chat room communications are not merely conversations between two participants, but conversations within a larger community” (Balfour 2004: 9). The following postings belong to the same thread as in Example (4) but occur later in the thread. There are at least two different posters, and each turn is prefaced by a number:

(5) 1. its funny that the hiphop culture.. which everyone sees as a ”ghetto” and poor culture.. has one of the most expensive ’uniforms’... ive seen mens phat farm pants for over 200$’s... in that price range you could be wearing a pretty nice suit and tie!
    2. I second this in every way.
    3. kinda scary that cats spending that much on clothes... I would never spend that much on clothing Fuck that I rather be able to buy a house for my kids and shit..
    4. I find it amazing that parents today spend that much money on kids. I was never bought name brand shit. And when I was old enough to buy


5. 6. 7. 8.

my own.. the brand name shit most of the times fell apart faster then the no name shit. My first and only nikes only lasted 3 months.. wait till you all get kids .....mine come at me with this shit al the time ....like im gonna buy them $100 pants they must be fucking crazy .... Yeah... kat sound like he might be on to something... I love hip hop. that was kinda random... yeah it was lol.

Examples (4) and (5) illustrate that this particular thread indeed flows much like a conversation, reducing the need for an opening and closing at every turn. However, due to the lack of opening or closing in the posting “I love hip hop” (posting 6) in Example (5), it is attended to and commented on (postings 7 and 8) for its lack of relevance to the topic of the thread, much like an interruption might be. Herring (1996) explains the lack of openings in many e-mails as a result of headers including “to” and “subject” fields, making traditional openings redundant or superfluous. Message boards are categorized by topic and tend to include fairly distinct threads. An opening for the purpose of stating a topic, therefore, is unnecessary. Similarly, as threads represent persistent conversation, there is little need for closings.

The corpus data suggest, however, that openings and closings for the purpose of general greetings and leave-taking, similar to conversational openings and closings, are common to general, non-threaded posts. Of the 464 comment-response postings, 283 (61%) included both an opening and a closing, 116 (25%) included an opening or a closing, and 65 (14%) had neither an opening nor a closing; each of these 65 was a response-posting from the webmaster addressing the previous comment. Such response-postings were common to this particular website, reducing the need and lowering the expectations for an opening or closing to each one.

The data suggest that the openings and closings of hip-hop postings, while structurally similar, are lexically different from those of traditional letters or CMC texts. Hip-hop specific openings and closings were identified as features of hip-hop message board postings via frequency and keyword lists. In the following sections, two frequent lexical signals of both openings and closings are presented, ‘yo’ and ‘peace’.

Yo

‘Yo’ is one of a number of words and expressions closely associated with hip-hop. Hip-hop songs, for example, are peppered with ‘yo’ in their lyrics, reflecting its frequent use in the hip-hop vernacular. Picking up on the frequent usage and attention-calling function of ‘yo’, from 1988 to 1995 MTV aired a program devoted


to rap and hip-hop music called Yo! MTV Raps. In terms of recognition and use, ‘yo’ is now one of the most popular lexical items in hip-hop culture. The frequency list compiled from the HMBC revealed ‘yo’ ranking at 91, occurring 166 times, or 0.16% of the 100,000-word corpus.

In order to determine if this frequency was noteworthy, a keyword list was compiled for the corpus. Keywords in corpora are those words “whose frequency is unusually high in comparison with some norm” and therefore “characterize the text” (Scott 2004b: 19) under investigation. The ‘norm’ used for the comparison is known as a reference corpus; for this study, the reference corpus used was Text G from the FROWN files.³ This particular corpus was chosen due to both its size and content. At a total of 216,104 running words, it is approximately twice the size of the HMBC, providing a good basis for comparison in terms of quantity. Consisting of written American English, the FROWN corpus represents a potentially different kind of language than that of the HMBC, and thus significant keywords are more likely to be identified.

A keyword list compiled for the HMBC identified a total of 477 keywords, corresponding to approximately 5% of the total types in the corpus.⁴ Thus, a relatively high number of words in the corpus are key in that they occur unusually frequently when compared to the reference corpus. Since keywords may not be among the most frequent words in the corpus, a keyword list makes them salient in ways that a frequency list might not. ‘Yo’ ranked 28th in the keyword list, as shown in Table 1. The majority of the words in Table 1 represent function words and discourse markers characteristic of speech, and thus figure as ‘key’ when compared to written texts. The keyword list suggests that ‘yo’ is a significant member of the hip-hop lexicon, which contains slang and taboo terms, as well. Unlike slang and taboo terms, however, ‘yo’ functions primarily as an opening to a posting.
Of the 166 instances of ‘yo’, a total of 21 (13%) are used as alternative forms of ‘you’ or ‘your’, as in Example (6), while a further nine function as interjections, as in Example (7): (6) Go down to Fat Beats on Melrose and buy some b-boy videos and practice yo moves, don’t forget to be original too. (7) Yo wuzzup this is Juan a former O towner but now i live down south in the Mia yo your site is phat! The breakers and the writers and even the design of the site is hot but yo the freestyles are kind weak and a lil wack but everything is phat! If u lookin for some flows email me and i’ll hook it up but everything else on the site is phat! peace out.
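The comparison underlying the keyword list can be made concrete with a small computation. What follows is a minimal sketch in Python, assuming a log-likelihood keyness statistic; the chapter does not state which statistic was used to compile the keyword list, so that choice is an assumption made purely for illustration. The corpus sizes and the frequency of ‘yo’ in the HMBC are taken from the text, whereas the frequency of ‘yo’ in the reference corpus is a hypothetical placeholder, and the function name `log_likelihood` is likewise illustrative.

```python
import math


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Log-likelihood keyness score for one word (higher = more 'key')."""
    total = freq_study + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    score = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_study > 0:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


HMBC_TOKENS = 100_000      # approximate corpus size given in the text
FROWN_G_TOKENS = 216_104   # reference corpus size given in the text
YO_IN_HMBC = 166           # from the HMBC frequency list
YO_IN_REFERENCE = 0        # hypothetical: not reported in the chapter

yo_keyness = log_likelihood(YO_IN_HMBC, YO_IN_REFERENCE,
                            HMBC_TOKENS, FROWN_G_TOKENS)
print(f"illustrative keyness of 'yo': {yo_keyness:.1f}")
```

A word with the same relative frequency in both corpora scores zero under this measure, which is why a frequent function word can fail to be key while a much rarer word such as ‘yo’ can rank highly.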

3. http://khnt.hit.uib.no/icame/manuals/frown/
4. Keywords are types as well, so this figure is inflated in the same way that the number of total types is.


Table 1. Keyword list

N     Key word   Freq.    %
1     i          2,590    2.5
2     you        1,411    1.36
3     u            936    0.9
4     shit         474    0.46
5     ur           432    0.42
6     hip          374    0.36
7     hop          357    0.34
8     your         475    0.46
9     like         705    0.68
10    i’m          363    0.35
11    n            303    0.29
12    get          404    0.39
13    ya           266    0.26
14    dont         255    0.25
15    just         476    0.46
16    me           571    0.55
17    got          326    0.31
18    think        336    0.32
19    is         1,395    1.35
20    rap          221    0.21
21    i’ll         215    0.21
22    fuck         203    0.2
23    if           552    0.53
24    im           195    0.19
25    can          419    0.4
26    b            206    0.2
27    know         327    0.32
28    yo           166    0.16

The remaining 136 instances of ‘yo’ function as openings, as in the opening of Example (7), above, and as the following examples illustrate: (8) yo dawg, always love the site, been a dope resource for me and other heads. always tell cats about it when they are looking for some good culture. (9) Yo wutup? Im just a beginner learnin how to pop. Im a gymnast so i can already do a lot of the power moves. I was just wonderin what crew Mr. Wiggles is from and what city its based out of. Thanks Yo.

In (9), ‘yo’ is even used in the closing. The poster self-identifies as a ‘beginner’, and thus may be over-using ‘yo’ as a known hip-hop lexeme. Nevertheless, the closing use of ‘yo’ reveals its possible dual function as both an opening and closing of a post, a feature which also characterizes ‘peace’.


Peace

As in the case of ‘yo’, the word ‘peace’ was also identified as warranting a close analysis due to its association with hip-hop culture, as well as its inclusion in the keyword list. Within the hip-hop culture, ‘peace’ has specific semantics, meaning salutation or farewell. According to the website http://www.rapdict.org, the term ‘peace’ has been appropriated by the hip-hop community, which “updated the term to ‘peace out’.” The evolution of ‘peace’ into ‘peace out’ represents the preferred use of ‘peace’ as a closing. In the HMBC, ‘peace’ ranked at position 158 of the keyword list, occurring 63 times in the corpus (0.06%). While certainly not very frequent, the occurrences and usage of the term ‘peace’ in the HMBC nevertheless clearly reflect its dual function as an opening or closing. Of the 63 occurrences, only eight are used as an opening, as in Example (10):

(10) Peace, checked out your site after hearing it on miami hip hop, the 2 of us were spot lited on the show so i wanted to see what you guys are all about, [. . .]

Example (11) shows how ‘peace’ is used as both an opening and closing in one and the same post, while Examples (12) and (13) illustrate the use of ‘peace’ and ‘peace out’ strictly as closings: (11) Peace, This is Mecca, a chicago based MC. [. . .] Peace, Mecca. (12) [. . .] If you have any questions, check out the forum message boards on the event. Peace and see you all in LA in Feb. (13) [. . .]If u lookin for some flows email me and i’ll hook it up but everything else on the site is phat! peace out.

Slang and taboo terms

The data from the corpus suggest that the content of the message board postings is quite limited in scope, in terms of lexical density. While the type-token ratio of 9.60 suggests a rather diverse vocabulary, the total number of types (9,822) is somewhat misleading. Many types are actually variants of one word, for example, ‘please’, ‘pleez’ and ‘plizz’, and thus when conflated, the total decreases. The total number of types could therefore be understood as much lower, suggesting a low lexical density for this corpus, which in turn indicates that “very few types occur very often”. In fact, a content analysis of the postings of the HMBC reveals that one main focus is listening to or performing hip-hop music or texts, as illustrated in the following posting:


(14) is it me, or are niggas on RB writing verses when they battle? =/... cause when im thinking of spittin, im thinking you actualy sayign the shit out loud, mathcing sylables, having a flow to it.. not sitting there writing a diss essay lol.. maybe its just me and i need to change my style, but when i spit i spit so have people head movin wit mine na mean? i should prolly stick to cyphers haha
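The effect of conflating spelling variants on the type-token ratio, raised at the start of this section, can be illustrated with a short Python sketch. It assumes that the ratio is expressed as types per 100 tokens, which is consistent with the reported figures (9,822 types in a corpus of roughly 100,000 words gives a value close to 9.60). The variant map contains only the ‘please’ spellings mentioned in the text; the function name, the sample sentence and everything else in the block are hypothetical.

```python
from collections import Counter

# Only these 'please' variants are attested in the chapter; a real map would be larger.
VARIANTS = {"pleez": "please", "plizz": "please"}


def type_token_ratio(tokens, variants=None):
    """Return (types, tokens, ratio as a percentage), optionally conflating variants."""
    variants = variants or {}
    normalized = [variants.get(t.lower(), t.lower()) for t in tokens]
    if not normalized:
        return 0, 0, 0.0
    counts = Counter(normalized)
    n_types, n_tokens = len(counts), len(normalized)
    return n_types, n_tokens, 100 * n_types / n_tokens


# Invented sample sentence, used only to show the mechanics.
sample = "please pleez plizz send me the link please".split()

raw = type_token_ratio(sample)
conflated = type_token_ratio(sample, VARIANTS)
print("unconflated:", raw)
print("conflated:  ", conflated)
```

Conflation leaves the token count untouched and lowers only the number of types, so the ratio can only fall, which is the sense in which the unconflated figure overstates the diversity of the vocabulary.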

The content of this particular posting furthermore features a number of lexical items worth close examination. First, the terms ‘battle’ (a freestyle lyrical challenge with another contributor), ‘spit’ (to rap) and ‘flow’ (lyrical rhythm) are examples of hip-hop slang, the use of which functions as an in-group marker. Second, ‘nigga’ (nigger) and ‘shit’ number among a group of taboo words used particularly frequently in hip-hop discourse. Each of these lexical items is a keyword in the HMBC.

Slang

The words ‘battle’, ‘spit’ and ‘flow’ together with their inflected variants each constitute under 1% of the corpus, but as slang, and thus words included in the hip-hop register (Berns & Schlobinski 2003), their frequency is high enough (or low enough in the reference corpus) to render them key. The use of ‘battle’, ‘spit’, and ‘flow’ in hip-hop discourse is illustrated in the following postings:

(15) 1. yeah i might start soon cuz ive been busy accually MAKING this site and now that its finally settling down and its pretty active i can now relax and maybe battle.
2. yeh g u shud ive never seen any of ur battles i look 4ward to seein ur battles n maybe if im feelin lucky i myt battle u.
3. Yo fo real ? dat wud be kool G tu see u battlin u know een one or two of ure drops wen I first started on this site and it was real good so du ure thang G and show ery1 who runs thangs lol.

(16) after reading through some of the nonsense posted on this site i had to join to try to bring common sense to the discussion. what rhyme has anyone ever heard 50 spit on that was hotter than most of the nigg@a he beefin with worst track. nas, jada, j, game and everyone else he beefin with are by far way superior lyricist. all 50 can spit is shoot this, 9 rounds that, and bonin groupies. ne one who says he is anywhere near the best probably gay and jsut want to be in his next candyshop video.

(17) and i think game is the best rapper in the usa he is as hot as a motherfucker. i went and seen him live in glasgow and he was outstanding. i have seen 50 3 times live in glasgow and he was no were as good as game his flow is the best.


The examples of (15) illustrate the function of ‘battle’ as both a noun and verb, used to refer to the practice of challenging others to competitions in rapping. The use of ‘spit’ in Example (16) refers to the act of rapping, while ‘flow’ refers to the overall delivery of a rap. The use of these specific terms serves to establish the contributors of the postings as legitimate members of the hip-hop community. Other keywords with specific hip-hop semantics include ‘ill’, ‘tight’, and ‘sick’ (positive, valued); ‘peace’ and ‘safe’ (salutation and/or farewell; see Example (21)); and ‘holla’ (recognize, acknowledge, communicate with): (18) i need a ill name or a name that fits me. i love graffiti.seeing my nigga who writes nerds influenced me. also many other graff heads i no like “coma.iw”, “win.cas” and my nigga “nerds.NB”. I try and try to get ill, but i don’t know what the deal is. i think its because i cant find a name that fits me. (19) I’m not a huge snoop fan, he’s kind of whack, but nate and warren g are tight. (20) Mic Club has a few weak beats on it (C Section, Drama A/T), but for the most part, Bis comes through with some absolutely sick rhymes over some equally sick beats (Master Thesis, Curriculum 101, Behind Enemy Rhymes, Allied Meta Forces....G Rap rips shit!!!!!). (21) SAFE U Peeps need to listen to 1xtra and then make ur judgments and trus me on this 1! (22) somebody tell me how to find that song and album holla at yo boy tru miami soulja fan.

The use of slang in the message board postings reflects a familiarity with both linguistic and non-linguistic or cultural hip-hop practices, helping to identify each contributor as an in-group or community member. The use of these slang terms and other hip-hop jargon is a linguistic practice which, in the online context, enables the discursive construction of identity (Bucholtz 1999). As alternatives to mainstream language, hip-hop slang terms may even function as anti-language (Halliday 1976); that is, they contribute to the development of a hip-hop language which is incomprehensible to out-group members, thereby preventing access by out-group members to the hip-hop community. Nevertheless, the usage of both slang and taboo terms carries covert prestige (Trudgill 1972) and fulfils the same in-group member marking function.

Taboo terms

The high frequency of occurrence of ‘nigga’ establishes it as a keyword that characterizes at least this particular corpus if not hip-hop discourse in general. Along with ‘shit’ (0.5%), ‘fuck’ (0.45%), ‘ass’ (0.16%) and ‘bitch’ (0.13%), ‘nigga’


(0.14%) ranks among the most frequent lexical (as opposed to functional) keywords in the HMBC. Such words are easily recognizable as belonging to a group of words collectively and commonly referred to as swear words, profanity or taboo terms (Beers Fägersten 2000, 2007). The simple identification of such taboo terms as key words in the HMBC is not enough to identify them as characteristic of hip-hop discourse. In fact, “[w]hile corpus data allows us to describe swearing in English, for example, it does not begin to provide an explanation for anything that we see within the corpus” (McEnery 2006: 4). Based on examples of taboo terms in context, it is argued that, unlike the slang terms above, the use of swear words functions to marginalize hip-hop culture and the hip-hop community by virtue of their recognizability as taboo terms: (23) all u niggaz lost yall mind sayin tha black album is wack. if ya think tha black is wack den ya aint really listen to it. dat shit is da hottest album of tha year. yall need to sit back and listen to dat shit cuz dat shit is hot. (24) now its just a bunch of ignant cats playin into the stereo types of hiphop adn the black race but i wont comment anymore cause its not my place too comment on black issues. But if i was black fuck id have enough shit too say about what these stupid ass crunk rappers are sayin. (25) i just dont like it when i see a white kid or any race actin all hard and shit cuz their wearing fubu and rocawear and BX and shit that when i think its time to give an ass beatin to any race....actin all hard..BITCH PLEASE!!!

The most noteworthy feature of Examples (23)–(25) is the recurrent use of swear words within single postings. The practice of using taboo terms is thereby made much more salient, suggesting that such linguistic behavior is in fact characteristic of the members of the hip-hop community.

Verbal art

Verbal art, according to Joel Sherzer (1987: 296), is:

discourse which creates, recreates, modifies, and fine tunes both culture and language and their intersection, and it is especially in verbally artistic discourse such as poetry, magic, verbal duelling, and political rhetoric that the potentials and resources provided by grammar, as well as cultural meanings and symbols, are exploited to the fullest and the essence of language-culture relations becomes salient.

In her discussion of verbal art and performance, Johnstone (2002: 220) addresses the aesthetic aspects of discourse, claiming that “humans attend to how discourse sounds and looks as well as to what it refers to and what it is meant to accomplish”. She (2002: 42) also says that “[a]s people construct discourse, they draw on


the resources provided by language and on the resources provided by culture”. The postings of the HMBC can be considered verbal art due to the creativity resulting from an exploitation of available resources of CMC, namely the keyboard. In this section, verbal art in the form of alternative orthography including the use of numbers and keyboard symbols is presented as a common practice in computer-mediated hip-hop discourse.

Wilkins (1991) points out that asynchronous computer-mediated communication and spoken discourse have in common both the occurrence of second person pronouns as well as linguistic creativity. The HMBC supports this finding, as it includes many postings in which contributors seek to make contact with other members of the community, discuss opinions or ask to be acknowledged (Beers Fägersten 2006). There is also a high frequency of performance of verbal art.

Although the HMBC is composed of written English, it can be argued that the content reveals features of spoken, conversational English. A noticeable characteristic of the content of the hip-hop message board postings is the explicit acknowledgement of interaction with others via the use of second-person pronouns. The pronoun ‘you’ ranks as the second most frequent keyword of the corpus and sixth overall with a frequency of 1,411 or 1.36%. To demonstrate that this frequency is particularly high within a written-language corpus, a frequency analysis of the HMBC has been compared to frequency analyses of other corpora, including the Lancaster-Oslo/Bergen Corpus (LOB; 1 million words; British English), the Brown Corpus (1 million words; American English) and the British National Corpus (BNC; 100 million words; British English). Table 2 shows the ten most frequent words across each corpus. The types in Table 2 have not been conflated, thus alternative forms of ‘you’ have not been included.

Table 2. Ten most frequent words (types) by corpus

       HMBC    LOB⁵    Brown⁶   BNC⁷
1.     the     the     the      the
2.     i/I     of      of       of
3.     and     and     and      and
4.     to      to      to       a
5.     a       a       a        in
6.     you     in      in       to
7.     is      that    that     it
8.     of      is      is       is
9.     that    was     was      to
10.    in      it      he       was

5. http://alt-usage-english.org/excerpts/fxcommon.html
6. http://www.edict.com.hk/lexiconindex/frequencylists/words2000.htm
7. ftp://ftp.itri.bton.ac.uk/bnc/all.num.o5

It is clear that even in the genre- and register-specific HMBC, function words occur most frequently. Where the HMBC diverges is in the frequency of the first- and second-person pronouns ‘I’ and ‘you’. The difference should be considered in terms of a low frequency of these pronouns in the comparison corpora versus a high frequency in the HMBC; 90% of the BNC and the entire LOB and Brown corpora are composed of written texts, where first- and second-person pronouns are less frequent (Biber 1988; Yates 1996).

A keyword link analysis of the corpus reveals ‘you’ to be the most frequently linked word, with 444 links to other keywords. A keyword link analysis reveals “which keywords are most closely related to a given keyword” (Scott 2004b). In other words, ‘you’ is the keyword which most often occurs in clusters or collocations with other keywords. When different forms of ‘you’ are accounted for, the number of keyword clusters increases. Although ‘you’ occurs 1,411 times in the corpus, corresponding to a frequency of 1.36%, when other forms such as ‘ya’, ‘y’all’, ‘your’, ‘you’re’, ‘u’, and ‘ur’ are included in a frequency count, the total jumps to 3,715, or 3.67%, making the superordinate second-person pronoun the most frequent corpus-wide type. In other words, there is a clear tendency among contributors to explicitly acknowledge and appeal to interlocutors through the use of the second-person pronoun.

In the HMBC, the most frequent cluster is, not surprisingly, ‘hip hop’, occurring 274 times. The second most frequent cluster, however, is ‘if you’, with 190 occurrences.
This cluster usually occurs in contexts where the contributor is seeking contact in order to appeal for assistance, as in the following examples: (26) also my friend told me that it’s better to just buy a couple of break dancing tapes instead of going off to a school to learn. l’d like to know if you agree or if you have another opinion. (27) Whats killing me is some of these wack as south rappers. Ryhming like niggas were ryhmin in the eighties back in the bronx, And slingin that shit like they just created a new style. Holla if you here me. (28) Word up! Nigga from New York claim there hoods are the hardest same thing with cali niggaz. Dog, theses are all huge cities where you can move somewhere else in the same city and be straight. What the hell do y’all know bout them little muder towns like Gary, Indiana Flint, Michian Little Rock, amonst others. Don’t praise the dirt that goes on in the hood. If u have another opinion we can discuss it in a civil manner. (29) YOU’RE FUCKED UP ON ECSTACY. U MUST BE DRUGGIN’ IF U FUCKIN’ THINK U CAN MESS WITH ME.
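The cluster counts reported above, and the way they shift once variant spellings of ‘you’ are conflated, can be sketched in a few lines of Python. This is an illustrative approximation rather than the WordSmith procedure used for the study: clusters are reduced to adjacent word pairs, the variant map is an assumed and deliberately small one, and the sample postings are fragments adapted from Examples (26)–(28) with punctuation removed.

```python
from collections import Counter

# Assumed mapping from non-standard spellings to the standard form (illustrative only).
YOU_VARIANTS = {"u": "you", "ya": "you"}


def cluster_counts(postings, variants=None):
    """Count two-word clusters within each posting, optionally conflating variants."""
    variants = variants or {}
    counts = Counter()
    for posting in postings:
        words = [variants.get(w, w) for w in posting.lower().split()]
        # Pair each word with its successor; pairs never span two postings.
        counts.update(zip(words, words[1:]))
    return counts


# Fragments adapted from Examples (26)-(28); not the full postings.
postings = [
    "i'd like to know if you agree or if you have another opinion",
    "holla if you here me",
    "if u have another opinion we can discuss it in a civil manner",
]

raw = cluster_counts(postings)
conflated = cluster_counts(postings, YOU_VARIANTS)
print("if you:", raw[("if", "you")])
print("if u:", raw[("if", "u")])
print("if you (conflated):", conflated[("if", "you")])
```

Keeping the unconflated counts alongside the conflated ones matters for the analysis, since the distinction between ‘if you’ and ‘if u’ is itself treated as meaningful.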


Postings (28) and (29) are considerably more aggressive than postings (26) and (27), partly due to the ‘if u’ clusters functioning as challenges to the addressee. The increased aggressiveness corresponds to the shift from the use of standard ‘you’ to non-standard ‘u’. Much like threats, such challenges imply that the poster has the ability, social power or social status to question the beliefs or practices of another. Part of hip-hop culture is the practice of asserting your identity in terms of knowledge of hip-hop or talent in battling or rapping (George 1999; McLeod 1999; Newman 2001). Just as it is a lack of knowledge which motivated the postings in Examples (26) and (27), it is an assertion of knowledge which characterizes Examples (28) and (29). The polarization of status and power as well as their corresponding assertions of hip-hop identity are encoded in the alternate forms of ‘you/u’. The use of both standard and non-standard orthography allows the posters to discursively construct their identities through the form-based strategy of performance of verbal art.

Non-standard orthography

The example postings thus far included give an indication of the extent of linguistic manipulation involved in performing a hip-hop identity. In Example (3), for instance, almost every word of the posting is either specific to the hip-hop genre (e.g., ‘nikkas breakin iight’) or written in an alternative manner (e.g., ‘do dat fo ur boi’). The corpus data further suggest that the verbal art of hip-hop discourse is reflected mainly through a community-wide and systematic use of alternative, non-standard varieties of orthography which permeate nearly all word types. Of the ten most frequent words listed in Table 2, alternative spellings for seven of them were found in the corpus. Both sets of words, along with the corresponding frequency percentages, are presented in Table 3. Only the most frequent alternative for each standard form is presented in Table 3.

Table 3. The ten most frequent words and their alternative forms (if any)

       Standard   % of corpus   Alternative   % of corpus
1.     the        3.36          da            0.13
2.     i/I        2.53          –             –
3.     and        2.28          n             0.29
4.     to         2.04          2             0.17
5.     a          2.02          –             –
6.     you        1.38          u             1.33
7.     is         1.36          iz            0.02
8.     of         1.34          a             0.05
9.     that       1.14          dat           0.07
10.    in         1.11          –             –


The HMBC consists of written message board entries, and thus neither directly represents oral language nor does it feature prosodic mark-up. Nevertheless, the use of non-standard orthography often encourages an interpretation of the postings as oral language, giving the impression that the contributions are written in such a way as to indicate a particular pronunciation. The alternative forms ‘da’, ‘n’, ‘a’, and ‘dat’ are orthographic representations of the standard forms as they would be phonetically realized in speech. In word-initial position, the dental [ð], represented orthographically as ‘th’, is phonetically realized as the voiced, alveolar, plosive [d]; this is even a feature of African American Vernacular English (AAVE), which is associated with hip-hop culture (Feldman 2002; Rickford 1999; Rickford 2004). ‘And’ and ‘of’ are often reduced in informal speech such that ‘and’ becomes a syllabified [n] and ‘of’ becomes the lax vowel [ə].

In contrast to these four forms, the other three alternative forms ‘2’, ‘u’ and ‘iz’ do not represent non-standard pronunciations. They do, however, further illustrate verbal art in computer-mediated discourse since each alternative spelling calls attention to the form of the message. The use of small ‘i’ for the first-person pronoun could also be considered an alternative to the standard; however, a general absence of capitalization has in fact become the standard in computer-mediated communication (Crystal 2001).

The percentages in Table 3 clearly indicate that the standard forms are more common, but considering the corpus size, the frequency of the alternative forms is remarkable, particularly when one also considers that the use of each alternative may entail a deliberate effort to avoid writing the standard form. Herring (1996) points out the economical use of special characters and acronyms in computer-mediated communication.
It is true that each alternative form is shorter than the standard, but one could also argue that their use is potentially more time-consuming due to the potential effort involved in not typing the standard form. Admittedly, this evaluation reflects an out-group member bias. For a community member or a seasoned contributor to hip-hop message boards, saying, thinking, and even typing, for example, ‘dat’ instead of ‘that’ may be effortless; after all, this form is a feature of one’s hip-hop identity. The earlier discussion of the ‘if you’ cluster supports precisely this argument, as the alternative ‘if u’ clusters are shown to occur primarily in postings by contributors asserting their hip-hop identities based on their knowledge and expertise in – and thus familiarity with – hip-hop culture.

Use of numbers

Although the HMBC had been edited to remove dates and websites, the first frequency list compiled revealed a conspicuously high frequency of numbers, particularly single digit numbers, i.e., 0–9. A manual, qualitative investigation of the corpus revealed the frequent use of numbers as alternatives to letters, phonological strings, and morphemes in the hip-hop message board postings. Because of the type-written form of message board postings, the physical similarity of some keyboard numbers and letters can be exploited, as in the following examples of substitution:

‘9’ for ‘g’, especially in ‘ni99a’ (30) this cat wasn’t even a street ni99a’s he just new street ni99a’s and told they stories, wich is cool but dont front like that’s yo live. (31) Ni99a’s i had to be on the block I much rather be in the board room Bitches, word!

‘0’ for ‘o’ (32) i love the page so pr0ps to u.

‘5’ for ‘s’, ‘4’ for ‘A’ (33) 54F£

Unlike the alternative orthography of several of the most frequent words, these number substitutions do not represent non-standard pronunciations. Furthermore, as the numbers only replace single letters, the alternative forms are legible and quite easily comprehensible. In the following examples, however, numbers are used to replace phonological strings and entire morphemes, which encourages and sometimes requires pronunciation, as the standard form is not always immediately recognizable from the altered, type-written form. ‘1’ for ‘one’ (34) i was readin a boys source magazine n da black ppl wer sayin stuf like ’how can we call eminem a racist wen we disrespect ourselves by calin each other niggas’. iaint even 2 sure bout dis so if ne1 can xplain. (35) i got some1 makin our sig its gonna have female gangstaz in front den in da background its gonna have 50 cent chingy and other people iight tell me wat yall think.

‘2’ for ‘to-’, ‘to’ or ‘too’ (36) i run da streets 2DAY- get on ur knees 2 pray- u tryin’ 2 be hard, but it doesn’t increase da rage (37) u aint no rapper plz all your lyrics is kept in a guitar-case fakest thinker huh? 2bad the world is mine sorry scar-face


‘4’ for free morpheme ‘for’ or phonological string [før] (38) yeh g u shud ive never seen any of ur battles i look 4ward to seein ur battles n maybe if im feelin lucky i myt battle u. (39) just u wait kuz the U4iK is gonna tackle his bitch ass.

‘8’ for phonological string [eɪt] (40) get yah daym facts str8 noob before you correct me again! i know my sheit. (41) soul tld u y blks h8 whites, racism dnt change and neva will, its still out dere no matter how da government trys 2 hide it.
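A first pass at locating such number substitutions in a corpus can be automated. The sketch below is a hypothetical Python helper, not part of the original study, which relied on a frequency list and manual inspection. It flags tokens that mix letters and digits, so forms like ‘str8’ are caught while ordinary numerals are ignored; a free-standing ‘2’ used for ‘to’ is not caught and would still need the manual, qualitative check described above. The sample string is stitched together from fragments of postings quoted in this chapter.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9]+")


def digit_substitution_candidates(text):
    """Return tokens containing both letters and digits, e.g. 'str8' or 'ne1'."""
    tokens = TOKEN.findall(text.lower())
    return [
        t for t in tokens
        if any(c.isdigit() for c in t) and any(c.isalpha() for c in t)
    ]


def digits_used(tokens):
    """Count how often each digit occurs inside the candidate tokens."""
    return Counter(c for t in tokens for c in t if c.isdigit())


# Fragments of postings quoted in the chapter, joined for illustration only.
sample = (
    "get yah daym facts str8 noob "
    "i look 4ward to seein ur battles "
    "so if ne1 can xplain "
    "i have seen 50 3 times live"
)

found = digit_substitution_candidates(sample)
print(found)
print(digits_used(found))
```

Whether a flagged digit stands for a single letter, a phonological string or a whole morpheme is not something the pattern can decide, so the output is best read as a list of candidates for closer reading.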

In many of the above postings, further examples of non-standard orthography can be identified. The ability to determine the extent of usage and systematicity of alternative forms is a distinct advantage of a corpus study. A frequency list, for example, has revealed the most common words of the HMBC, as well as the systematic usage of numbers to replace letters, phonological strings and morphemes. Using WordSmith, it is also possible to view a corpus as alphabetical or reverse-sort lists. Each list facilitates further identification of recurrent, systematic uses of non-standard orthography in that similar forms are grouped together, for example, ‘u’, ‘u’ll’, and ‘ur’. The reverse-sort revealed a curiously large number of words ending in the letters ‘a’ and ‘z’, encouraging further investigation and revealing a systematic usage of non-standard orthography for specific (morpho-)phonology.

Word-final ‘a’

Excluding proper names and other words that end in –a in standard orthography, the total number of ‘a’-final tokens (not including plurals) in the HMBC is 1,307, distributed over 139 types, corresponding to 1.28% and 1.37% of the total tokens and types, respectively, in the corpus. The non-standard orthography featuring final ‘a’ can be categorized according to the word it substitutes for (Examples (42)–(45)), or the sound string it is meant to represent in speech (Examples (46)–(48)). In general, final ‘a’ reflects (morpho-)phonemic reduction:

–a for ‘have’ (42) he shoulda neva gotten control of TS cuz now all thats left is armageddon and tony sunshine

–a for ‘of’ (43) i kno a buncha y’all faggots, ur so hungry, “u act BIGGA”


–a for ‘to’ (44) if u wanna speak yo raise ya hand cuz u dont wanna see the buckin if u disrupt the man

–a for [o] (45) LOL..NOW FELLAS, children tend to have wild imaginations. maybe he doesnt get enough positive attention at home..!

‘ma’ for [maɪ] in ‘my’ (46) i write to improve ma skillz

In the discussion of the most frequent words of the corpus, ‘u’ was identified as an alternative form of ‘you’. Furthermore, in postings (3), (15), (21), (36) and (38), examples of an alternation between ‘your/you’re/ur’ can also be seen. The reverse-sort list revealed another non-standard variant used by the message board contributors, namely ‘ya’:
(47) Ill never stop spitin, till ya run outa the shit that was prewriten
(48) stab u in ya bladder, and drown u in Piss Puddles

The use of both the ‘u’ and ‘ya’ forms in Example (48) is particularly illustrative of the different pronunciations they are each intended to represent. The overall phonology of both postings (47) and (48) is particularly important to the contributor, since these postings are actually part of rap lyrics posted on the message board as part of a battle. Many of the ‘a’-final tokens are the forms ‘da’ and ‘tha’, alternative spellings for ‘the’. Unlike ‘da’, the non-standard spelling ‘tha’ has no obvious correspondence in pronunciation. However, Example (49) suggests that this variant may, in fact, be phonologically motivated:
(49) i aint from tha D but i live here, in grosse point, where da rich mufuckas at, lol

This posting shows that the contributor indeed has both variants in his/her repertoire. The use of ‘tha D’ may be to avoid the alliteration which would result from ‘da D’ (Detroit), but still achieve poetic discourse with non-standard orthography. There is a switch to ‘da’ later in the posting, which further suggests that the earlier use of ‘tha’ is due to its phonological environment. Additional final ‘a’ tokens include three different examples of elision – ‘hella’, ‘ima’ and ‘ma’ – where syllables or, in the case of multi-word expressions, entire words are omitted. In posting (50) it can be seen that, through conversion, the form ‘hella’ functions as an intensifying adverb, much like ‘really’ or ‘very’:


‘hella’ for ‘hell of a’ (50) Yeah he seems hella hungry on that joint.

Many examples of AAVE can be found in the language of hip-hop (Cutler 1999; Feldman 2002; Rickford 1999; Rickford 2004). Rickford (2004) identifies ‘ama’ as a feature of AAVE; the corpus includes the variants ‘im’a’, ‘imma’, and ‘i’mma’.

‘Ima’ for ‘I am going to’ (51) Thinkin u a thug, why dont u bust slugs and Humor Me Cuz ima think u a poser till bullets rip thru my Computer screen

It has been claimed that the word ‘motherfucker’ (or ‘mother fucker’) is most frequently used by African Americans, especially males (Berger 1970; Hughes 1998). Although there is evidence of a wider social distribution of its use (Beers Fägersten 2000), the association with African Americans entails an association with AAVE, which is, in turn, associated with hip-hop. Nevertheless, the variants of ‘motherfucker’ are relatively infrequent and the anonymous nature of message boards makes it difficult to ascertain if its use is associated with a particular race or gender. ‘ma’ for ‘mother’ or ‘motherfucker’ (52) stick to the music ma fuckas (53) yeh ure journal u got sum deep stuff up in that ma ya know

The balance of the ‘a’-final tokens reveals a systematic use of non-standard orthography to represent the sound [ər]. In the majority of cases, the final ‘a’ is a direct substitution for the letters ‘er’, corresponding to a nominal marker as in posting (54) or the comparative adjectival marker, as in posting (55). Final ‘a’ can also be found as a replacement for non-morphemic word-final ‘er’ in analogous environments, such as in Examples (56) and (57):
(54) i thought it wuz hilarious dat a rappa would write about sum gawd dayum no tooth bitchez...
(55) and shit still does happen 2 us. maybe not in america but in england it does. u got da national front and shit and my dad used 2 get jumped by white people wen he was yunga.
(56) n another thing,black dudes dont wanna holla at white girls when black girls are around,but as soon as the black girls leave,they try hollerin..
(57) I waz neva really feelin Obie lyrically but if itz anythin like got some teeth itz cool to thro on when you goin out n shit

Other examples of final ‘a’ do not constitute a direct substitution for final ‘er’, but rather for the word-final phones [ər] or [ɔr]:


‘fire’ (58) king of the spit pit, i spit the fiya shit

‘for’ (59) oh i cant wait fa someone to hear this or see it. Ya get a hype feelin and from there ya feel at ya peak and just hold on to it

‘sure’ (60) 4sha!!!! peace to B.I.G
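The reverse-sort procedure used in this section can be approximated outside WordSmith. The following is a minimal Python sketch, not the tool's actual implementation: the tokeniser, the function names and the three sample strings (fragments of postings (46), (55) and (59)) are assumptions chosen for illustration, and the `exclude` argument merely stands in for the manual removal of proper names and of words that end in ‘a’ in standard orthography.

```python
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace and trim surrounding punctuation."""
    tokens = []
    for raw in text.lower().split():
        tok = raw.strip(".,!?;:\"'()[]")
        if tok:
            tokens.append(tok)
    return tokens

def reverse_sorted_types(freq):
    """Order word types by their reversed spelling so that shared endings cluster."""
    return sorted(freq, key=lambda word: word[::-1])

def final_letter_counts(freq, letter, exclude=frozenset()):
    """Return (token count, type count) for word types ending in `letter`."""
    hits = {w: n for w, n in freq.items() if w.endswith(letter) and w not in exclude}
    return sum(hits.values()), len(hits)

# Illustrative fragments only; a real run would read the whole corpus.
postings = [
    "i write to improve ma skillz",
    "i cant wait fa someone to hear this",
    "wen he was yunga",
]
freq = Counter(tok for posting in postings for tok in tokenize(posting))

print(reverse_sorted_types(freq)[:3])   # types whose reversed spelling sorts first
print(final_letter_counts(freq, "a"))   # (tokens, types) ending in 'a'
```

Passing `"z"` instead of `"a"` gives the corresponding counts for word-final ‘z’. In either case the output is only a list of candidates, which still has to be inspected manually before any form is counted as non-standard.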

Word-final ‘z’

A reverse sort revealed a total number of 533 words ending in –z, distributed over a total of 150 types. The final ‘a’ tokens effect a non-standard orthography intended to reflect pronunciation, encouraging readers to receive and process the text as if it were spoken. In contrast, final ‘z’ does not seem to elicit an alternative pronunciation. Like the morpheme /s/, ‘z’ is used to mark plurals, third person singular inflections, and possessives. The data suggest that final ‘z’ is used as a non-standard orthographic feature to reflect standard phonology, that is, when the phonological environment of the morpheme /s/ results in voicing, yielding word-final [z]. Posting (61) includes examples of plural, third person singular and possessive morpheme /s/ realized as ‘z’:
(61) i ain’t feelin obie’z shyt.....his songz waz whack n i wasn’t feelin hiz cd...dat nicca iz jus an ”AD”....jus gettin ppl’z attention 4 a sec.

The following examples illustrate that final ‘z’ is also used to substitute for the inflectional morpheme /s/, even when the phonological environment would not cause voicing: Plural (62) I love Pac and Dear Mama is on his greatest hitz cuz it waz one of his greatest hitz ..Classic

Third person singular verb agreement (63) if you think pac’z overrated thatz your opinion

Final ‘z’ also appears in words ending in [z] or ‘s’, regardless of the phonology: ‘cuz’, ‘coz’, ‘kuz’, ‘becuz’, ‘becoz’ for ‘because’ (64) Cuz he has a few slower beat songs which manz can kick back to.


‘plz’, ‘plizz’, ‘pliz’ for ‘please’ (65) u aint no rapper plz all your lyrics is kept in a guitar-case

‘asz’, ‘azz’ for ‘ass’ (66) I can still ”Pop” my azz off at this age... I was Known as ”Mr.Tic” becuz I could strobe my whole body like 3-D.

Furthermore, the use of final ‘z’ has been extended to words with no motivating phonology, indicating a trend towards word-final usage: (67) Crooked I is heavy but know 1 wants to know, Ras Kass is on another level. If they all move in unison! the west can be what it can be “GFunkedcrazymuthafuckers” ANYWAYZ LATERZ

The alternative spellings for both frequent function words and less frequent lexical items suggest that a general feature of the postings of the HMBC is the inclusion of verbal art, where verbal art is the creative exploitation of computer-mediated resources to achieve poetic and aesthetic content and form. The anonymity of the postings due to aliases or even lack thereof does not allow for a reliable survey of the extent of inter-poster variation, but the number of examples encourages the claim that the variation is not due to repeated entries from a single poster, but rather to individual entries of many posters, which in turn offers support for the claim that performance of verbal art is common to the discursive construction of identity as a member of the hip-hop community.

Use of special characters

In Example (33), the use of numbers to substitute for letters is illustrated: 54F£. The use of the British monetary symbol ‘£’ for capital ‘E’ suggests that the postings can be attributed to predominantly British contributors, and indeed the fuller contexts of postings such as (33) corroborate this conclusion, although the corpus data cannot confirm poster identity. Regardless of the origin, such postings also illustrate how contributors exploit the interface of computer-mediated communication, the keyboard, to accomplish yet another kind of non-standard orthography by substituting symbols or special characters for letters. This substitution occurs in words that are potentially offensive, suggesting self-censorship, possibly to avoid filtering (Crystal 2001; Dingwell 2004) which could result in non-publication of the posting:


(68) i don’t know about that. u put too many dumb n!gg@s in a room and some dumb n1gg@ shit is going to happen. somebody is about to get shot over this shit soon. (69) gotta give her sum credit, i mean she writez all her sh!t, not alot of rnb b!tchz do that (70) IF YOU PULL OUT YO WALLET YOU’LL GET SHOT UP AND F#CK IN THE A$$ WITH A PLUNGER BY THE PIGS

Other examples of non-standard orthography specific to the keyboard interface include the usage of capital letters. The conspicuous use of capital letters for some or all parts of chats, e-mails or other forms of computer-mediated communication has been conventionalized to be considered as shouting (Danet et al. 1997), and is thus cautioned against as an inflammatory practice (Crystal 2001). However, a total of 51 postings (3.4%) in the HMBC are written entirely in capital letters, as in posting (70). Such postings may not be intended as or, more importantly, even considered shouting, since the persistent use of capitalization throughout an entire posting neutralizes the shouting effect: (71) I AGREE, 50 IS HOT BUT NAS AND JADKISS ARE BETTER. NAS GAVE JAY-Z A HELL. SOME PEOPLE THINK HE WON. WHATS MAKES 50 THINK THAT HE WANTS SOME OF NAS.AND JADKISS IS WAITING FOR SOMEONE TO SAY HIS NAME ON A TRACK. JADKISS CAN DEFINTY GET WITH HIM. FAT JOE IS A SESON VETERN. 50 NEEDS TO MAKE FRIENDS AND NOT EMEMYS BEFORE SOMEONE ENDS HIS LIFE OR HIS CAREER.

Only when the use of lower-case letters is established as the norm can intermittent capitalization be attributed paralinguistic meaning: (72) Read what i said you DUMB OLD FUCK, i was pointing out how stupid you are.
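The distinction drawn here between postings written entirely in capitals and postings with intermittent capitalization can be operationalised roughly as follows. This is a hedged sketch rather than the procedure used for the HMBC: the function name, the category labels and the two-letter threshold are assumptions, the second sample string is invented for illustration, and acronyms such as ‘LOL’ would be misclassified as intermittent capitals without further filtering.

```python
def capitalisation_profile(posting):
    """Classify a posting by its use of upper-case letters (illustrative labels)."""
    letters = [ch for ch in posting if ch.isalpha()]
    if not letters:
        return "no-letters"
    if all(ch.isupper() for ch in letters):
        return "all-caps"
    # Look for fully capitalised words of two or more letters inside an
    # otherwise lower- or mixed-case posting (a lone 'I' is ignored).
    words = [w.strip(".,!?;:\"'()") for w in posting.split()]
    if any(len(w) >= 2 and w.isalpha() and w.isupper() for w in words):
        return "intermittent-caps"
    return "default"

samples = [
    "I AGREE, 50 IS HOT BUT NAS AND JADKISS ARE BETTER.",  # from posting (71)
    "read what i said, i was NOT talking about you",        # invented example
    "i write to improve ma skillz",                         # posting (46)
]
labels = [capitalisation_profile(s) for s in samples]

# Share of all-caps postings in the (tiny, illustrative) sample.
share_all_caps = labels.count("all-caps") / len(labels)
print(labels, round(share_all_caps, 3))
```

Applied to a full corpus, the share of "all-caps" labels would correspond to the kind of figure reported above for postings written entirely in capital letters, while the "intermittent-caps" group would isolate candidates for paralinguistic emphasis.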

Another variety of non-standard orthography in the hip-hop message board postings involves the use of alternating lower-case and upper-case letters. The data suggest that this practice is not indicative of any vernacular pronunciation or paralinguistic effect, but rather fulfils a purely poetic function, such as the nickname of the poster of the following example: (73) AbSoLuTeBbOy


Conclusion

Many studies of hip-hop culture and hip-hop identity have focussed on the content of hip-hop music and discourses (Berns & Schlobinski 2003; George 1999; McLeod 1999; Newman 2001; Rose 1994), attending to the content but failing to address the form. The examples presented in this chapter provide ample argument for considering the how as well as the what of hip-hop discourse. Keyword and word frequency analyses of the HMBC have revealed the discursive construction of identity to be a function of the lexical content of postings; alphabetical and reverse-sort lists have also helped to identify the systematic use of alternative forms or verbal art. Slang terms were identified as keywords used by posters to mark their in-group membership, and to mark the hip-hop community as a counterculture. Each status is further established by the frequent use of taboo words, which, because of the general, society-wide recognizability of such terms as informal, non-standard and potentially offensive (Beers Fägersten 2000), marginalizes the community through less encrypted means than hip-hop jargon. Throughout the corpus and in many of the above examples, however, content is dominated by form, drawing attention from what the posting is about to how the posting looks. The juxtaposition of the following extracts from postings illustrates an increasing saliency of form over content:

a. As a hip hop person, I like to always educate myself on facts and to learn the history of something in my interest, and this website is doing that.

b. i am white and porto rican but yu cant tell i look striat up with but i act hood not black not fuckn wiggaish i at hood cuz that is where i am from

c. ey foo......show meeh sum pic of yall nikka’s breakin iight homie.........do dat fo ur boi ..............i b joe frum Under Rated Breakaz

Implicit or explicit membership in the hip-hop community characterizes the content of each posting, but the forms of (b) and, to a greater extent, (c) corroborate the content by showing a familiarity with the verbal art practices of hip-hop. I have argued here that, in the context of online hip-hop discourse, ‘as a hip hop person’ does not serve to identify the poster as a member of the hip-hop community (despite the claimed affiliation with a breakdancing group) nearly as much as ‘i b joe frum Under Rated Breakaz’ does. The postings reveal that both the content and the form of discourse contribute to the discursive construction of identity. It is these two elements together which create a hip-hop persona. The corpus data further suggest that such verbal art is frequent and, to a great extent, conventionalized, in the sense of being a widespread practice, judging by the frequency of examples. Non-standard orthography, including the usage and integration of numbers and symbols, would seem to be the common theme underlying the variation of form among the postings. The anonymity associated with the message board forum allows for the possibility that only a few contributors, or in the extreme even one contributor, are responsible for the variation between standard and non-standard orthography or between formal and informal styles. It is argued, however, that the number of postings in this corpus created from five different websites suggests a greater dispersal. It is furthermore argued that only through familiarity with and use of such verbal art practices can members of the hip-hop community discursively construct and recognize a valid hip-hop identity. The message board medium affords interlocutors the time to produce and process language, and thus gives them the opportunity to exploit fully their linguistic resources, the resources of computer-mediated communication. Posters are not only able, but are expected to showcase their linguistic talents, rhythmic abilities, and familiarity with the practices of hip-hop and the conventions of computer-mediated communication to be identified and accepted as hip-hop. Consequently, the hip-hop message board postings reveal a deliberate exploitation of the language qualities of this medium. The corpus shows that members of the hip-hop community have adapted to the internet medium, indeed are embracing it and taking full advantage of the interface systematically to construct and assert their individual identities, and to establish community practices. In light of the oral and artistic traditions of hip-hop culture, internet message boards allow for creative codification of speech practices and therefore represent an ideal forum for members of the hip-hop community to discursively construct their identities.

References

Allwood, J. 2000. An activity based approach to pragmatics. In Abduction, Belief and Context in Dialogue: Studies in computational pragmatics, H. Bunt & W. Black (eds), 47–80. Amsterdam: John Benjamins.
Androutsopoulos, J. 2006. Introduction: Sociolinguistics and computer-mediated communication. Journal of Sociolinguistics 10(4): 419–438.
Balfour, J. 2004. The third medium: Sociolinguistics and chat rooms. http://a.parsons.edu/~julia/thesis/oct13.pdf
Bechar-Israeli, H. 1995. From ‘Bonehead’ to ‘cLoNehEAd’: Nicknames, play and identity on internet relay chat. Journal of Computer-Mediated Communication 1(2).
Beers Fägersten, K. 2000. A Descriptive Analysis of the Social Functions of Swearing in American English. Ms.
Beers Fägersten, K. 2006. The discursive construction of identity in an Internet hip-hop community. Revista Alicantina de Estudios Ingleses: Special Issue on Linguistics and Media Discourse 19.
Beers Fägersten, K. 2007. A sociolinguistic analysis of swear word offensiveness. Saarland Working Papers in Linguistics. http://scidok.sulb.uni-saarland.de/sulb/portal/swpl/, retrieved 15 March 2006.


Berger, A. 1970. Swearing and society. ETC: A Review of General Semantics 30: 283–286.
Berns, J. & Schlobinski, P. 2003. Constructions of identity in German hip-hop culture. In Discourse Constructions of Youth Identities, J. Androutsopoulos & A. Georgakopoulou (eds), 197–219. Amsterdam: John Benjamins.
Biber, D. 1988. Variation Across Speech and Writing. Cambridge: CUP.
Bucholtz, M. 1999. ‘Why be normal?’ Language and identity practices in a community of nerd girls. Language in Society 28: 203–223.
Collot, M. & Belmore, N. 1996. Electronic language: A new variety of English. In Computer-Mediated Communication: Linguistic, social and cross-cultural perspectives, S. C. Herring (ed.), 13–28. Amsterdam: John Benjamins.
Crystal, D. 2001. Language and the Internet. Cambridge: CUP.
Cutler, C. 1999. Yorkville crossing: White teens, hip hop and African American English. Journal of Sociolinguistics 3(4): 428–442.
Danet, B., Ruedenberg-Wright, L. & Rosenbaum-Tamari, Y. 1997. Hmmm... where’s that smoke coming from? Writing, play and performance on Internet relay chat. Journal of Computer-Mediated Communication 2(4). http://jcmc.huji.ac.il/jcmc/vol2/issue4/danet.html, retrieved 15 March 2006.
Danet, B. 2002. The language of E-mail. European Union Summer School, University of Rome. http://www.europhd.psi.uniroma1.it, retrieved 15 March 2006.
Dingwell, H. 2004. Exploring web chat: Are Internet chat sites a form of community? www.infinitecreations.net, retrieved 15 March 2006.
Eiler, M. A. & Victor, D. 1988. Genre and function in the Italian and U.S. business letter. Proceedings of the Sixth Annual Conference on Languages and Communications for World Business and the Professions. Ann Arbor MI.
Erickson, T. 1999. Persistent conversation: An introduction. Journal of Computer-Mediated Communication 4(4). http://www.ascusc.org/jcmc/vol4/issue4/erickson.html
Feldman, M. 2002. African American Vernacular English in the lyrics of African American popular music. http://www.swarthmore.edu/SocSci/Linguistics/papers/2002/mattfeldman.pdf
Ferrera, K., Brunner, H. & Whittemore, G. 1991. Interactive written discourse as an emergent register. Written Communication 8(1): 8–34.
George, N. 1999. Hip Hop America. New York NY: Penguin.
Halliday, M. 1976. Anti-languages. American Anthropologist 78(3): 570–584.
Hård af Segerstad, Y. 2002. Use and Adaptation of Written Language to the Conditions of Computer-Mediated Communication. Göteborg: Göteborg University.
Herring, S. 1996. Computer-mediated Communication: Linguistics, social and cross-cultural perspectives. Amsterdam: John Benjamins.
Herring, S. 2001. Computer-mediated Discourse. In The Handbook of Discourse Analysis, D. Schiffrin, D. Tannen & H. Hamilton (eds), 612–34. Malden MA: Blackwell.
Hughes, G. 1998. Swearing: A social history of foul language, oaths and profanity in English. London: Penguin.
Johnstone, B. 2002. Discourse analysis. Malden MA: Blackwell.
McEnery, T. 2006. Swearing in English. London: Routledge.
McLeod, K. 1999. Authenticity within hip-hop and other cultures threatened with assimilation. Journal of Communication 39(4): 134–150.
Nakamura, L. 1995. Race in/for cyberspace: Identity tourism and racial passing on the internet. In Cyberreader, V. J. Vitanza (ed.), 442–453. Needham Heights MA.


Newman, M. 2001. ‘I represent me’: Identity construction in a teenage rap crew. Texas Linguistic Forum, Proceedings from the Ninth Annual Symposium about Language and Society 44(2): 388–400.
Ong, W. J. 1982. Orality and Literacy: Technologizing the word. London: Routledge.
Rickford, J. R. 1999. African American Vernacular English: Features, evolution, educational implications. Malden MA: Blackwell.
Rickford, J. R. 2004. What is Ebonics? (African American Vernacular English). Linguistic Society of America on-line, https://lsadc.org/info/pdf_files/Ebonics.pdf
Rose, T. 1994. Black Noise: Rap music and black culture in contemporary America. Hanover NH: Wesleyan University Press.
Schegloff, E. 1972. Sequencing in conversational openings. In Directions in Sociolinguistics: the Ethnomethodology of Communication, J. Gumperz & D. Hymes (eds), 346–380. New York NY: Holt Rinehart and Winston.
Scott, M. 2004a. WordSmith Tools (Version 4). Oxford: OUP.
Scott, M. 2004b. WordSmith Tools Help. Oxford: OUP.
Sherzer, J. 1987. A discourse-centered approach to language and culture. American Anthropologist 89: 295–305.
Stone, A. R. 1991. Will the real body please stand up? Boundary stories about virtual cultures. In Cyberspace: First steps, M. Benedikt (ed.), 81–118. Cambridge MA: The MIT Press.
Taboada, M. 2004. The genre structure of bulletin board messages. TEXT Technology 13(2): 55–82.
Takahashi, J. 2003. Do we talk (or write?) differently over the Net? A lexical enquiry into ‘a’ Net-EN. http://korpus.dsl.dk/cl2003/cdrom/papers/takahashi.pdf.
Trudgill, P. 1972. Sex, covert prestige and linguistic change in the urban British English of Norwich. Language in Society 1: 179–95.
Turkle, S. 1995. Life on the Screen: Identity in the age of the internet. New York NY: Simon & Schuster.
Van Gelder, L. 1990. The strange case of the electronic lover. In Talking to Strangers: Mediated therapeutic communication, G. Gumpert & S. Fish (eds), 128–142. Norwood NJ: Ablex.
Wallace, P. 1999. The Psychology of the Internet. Cambridge: CUP.
Warschauer, M. 2000. Language, identity, and the internet. In Race in Cyberspace, B. Kolko, L. Nakamura & G. Rodman (eds), 151–170. London: Routledge.
Wilkins, H. 1991. Computer talk. Long-distance conversation by computer. Written Communication 8(1): 56–78.
Yates, S. 2000. Computer mediated communication: The future of the letter? In Letter Writing as a Social Practice, D. Barton & N. Hall (eds), 233–51. Amsterdam: John Benjamins.
Yates, S. J. 1996. Oral and written linguistic aspects of computer conferencing. In Computer-Mediated Communication: Linguistic, social and cross-cultural perspectives, S. C. Herring (ed.), 29–46. Amsterdam: John Benjamins.
Yates, J. & Orlikowski, W. 1992. Genres of organisational communication: An approach to studying communication and media. The Academy of Management Review 17(2): 299–326.


Appendix A

The corpus consists of message board entries collected from the following websites:

http://p081.ezboard.com
http://www.hiphopsite.com
http://www.jam2dis.com/hiphopboard
http://www.thugz-network-board.tk
http://www.underworldhiphop.com

Each website was last accessed for data collection on 22 November 2004.


Exploring discourse through specific linguistic features


The use of the it-cleft construction in 19th-century English¹

Christine Johansson
Uppsala University, Sweden

This chapter offers a new description of the use of the it-cleft construction in nineteenth-century English. The data for the present study are primarily from historical corpora (a corpus of nineteenth-century English, CONCE, and the Helsinki Corpus of English Texts), but findings from modern corpora and studies of cleft constructions in present-day English (e.g. Collins 1991) are also presented. The results show that it-clefts become more frequent in the 19th century and particularly in speech-related texts, such as trials. This is contrary to both earlier and later periods of English, where it-clefts are more common in written English. The chapter discusses how the structure of the it-cleft and its thematic organisation may have contributed to its increased frequency in 19th-century English. An in-depth analysis of the forms and functions of it-clefts in trials, the genre that most closely represents spoken English of the period, is provided.

.

Introduction

In present-day English, the it-cleft construction is frequent in both spoken and written language, and conveys divided focus, i.e. the focus is both on callousness and ignore in: It is his callousness that I shall ignore. The context decides which item is the more important, i.e. new information (Quirk et al. 1985: 1372, 1383–1384). This chapter analyses the use and development of the it-cleft construction in English, focusing on the 19th century. This is apparently when it-clefts started being used with some frequency, as I will illustrate. The data are drawn from CONCE (Corpus of Nineteenth-Century English) consisting of 1 million words, covering genres representative of 19th-century English (see Kytö, Rudanko & Smitterberg . I want to thank Christer Geisler, Merja Kytö and Terry Walker, Department of English, Uppsala University for valuable comments on my paper. I would also like to thank Christer Geisler for his help in providing the material from CONCE.

JB[v.20020404] Prn:26/03/2008; 16:49

F: SCL3111.tex / p.2 (118-188)

 Christine Johansson

2000). Periods 1 (1800–1830) and 3 (1870–1900) were studied to see if the use and frequency of it-clefts changed over the course of the century. I will first discuss the theory behind it-clefts in present-day English (PresE) and their frequency, then give a brief historical background on it-clefts. This background includes a study of it-clefts in Early Modern English (EModE) (Period III, 1640–1710) in the Helsinki Corpus and a comparison with 19th-century examples of it-clefts in CONCE. Third, examples of it-clefts in the 19th-century data will be discussed across speech and writing and across genres. Finally, Trials will be studied in detail since this genre contains both the greatest number of examples of it-clefts and also variations and extensions of the original it-cleft pattern.

It-clefts in Present-day English: Theory and frequency

The syntactic organisation of an it-cleft is as follows:

It – is/was – focus – finite that/wh-clause / non-finite clause² (see Pérez-Guerra 1991: 184)

The prototypical patterns for it-clefts are the following:

It is his callousness that I shall ignore. (Quirk et al. 1985: 1383) (It is/was ... that/who/which)
I think it was me that was being a bit, got a bit het up. (Biber et al. 1999: 962) (It is/was + personal pronoun ... that)
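The prototypical pattern lends itself to a simple automatic search for candidate it-clefts in corpus text. The following Python sketch is an illustration only and is not the retrieval method used in this study: the regular expression, the function name and the group names are assumptions. It over-generates (extraposed clauses such as ‘It is clear that ...’ match as well) and misses Ø-clauses and non-finite clauses, so every hit would still have to be checked manually.

```python
import re

# It + is/was + (optional 'not') + focus + that/who/which
IT_CLEFT = re.compile(
    r"\bit\s+(?:is|was)\s+(?:not\s+)?(?P<focus>.+?)\s+(?P<link>that|who|which)\b",
    re.IGNORECASE,
)

def candidate_it_clefts(sentence):
    """Return (focus, linking word) pairs for possible it-clefts in a sentence."""
    return [
        (match.group("focus"), match.group("link").lower())
        for match in IT_CLEFT.finditer(sentence)
    ]

examples = [
    "It is his callousness that I shall ignore.",
    "I think it was me that was being a bit, got a bit het up.",
    "It is clear that he left.",  # invented false positive: extraposition, not a cleft
]
for sentence in examples:
    print(candidate_it_clefts(sentence))
```

The lazy focus group stops at the first linking word, which keeps short foci such as ‘me’ intact; a longer focus that itself contains that, who or which would be cut short, which is one more reason to treat the output as a list of candidates rather than a count.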

There are three different extensions of the basic pattern. One of them is found in the th-cleft (see Collins 1991: 3), when a demonstrative replaces it:

Those/these are my biscuits (that) you’re eating. (Huddleston & Pullum 2002: 1420)
This is a serious problem (that) we have. (Huddleston & Pullum 2002: 1420; Quirk et al. 1985: 1387)

Another extension of the original it-cleft pattern is the one where a personal pronoun replaces it:

He was a real genius that invented this. (Quirk et al. 1985: 1384)

2. A Ø-clause can also occur (see e.g. Quirk et al. 1985: 1387) and together with same the second part of the cleft is introduced by that or as. The following formula is found in Prince (1978: 883): It is/was Ci which/who(m)/that/Ø S – Ci. Collins (1991: 36) extends the formula to include preposition + which/whom and where/when. It-clefts with where and when are not discussed in this paper.


A third possible extension of the it-cleft pattern is when a non-finite clause occurs instead of a relative clause: Is it Kim making all that noise?/Is it Kim who is making all that noise? (Huddleston & Pullum 2002: 1420)

More theoretical and detailed analyses of it-clefts than the ones described above from standard grammars of English are found for example in Prince (1978), Declerck (1984) and Collins (1991). In Prince’s discussion of it-clefts, two types are distinguished: the stressed-focus (SF) cleft, where the focused item is new information: It was John that killed her (Prince 1978: 895) and the informative-presupposition (IP) cleft, where the information in the that-clause is known, but not necessarily to the hearer/reader: It was just about 50 years ago that Henry Ford gave us the weekend (Prince 1978: 898). In other words, the information in the that-clause could be new, at least to the hearer/reader. Declerck (1984) sets up a third category of it-clefts: discontinuous clefts, in which both the focused item and that-clause are new and receive normal stress. Collins’ study of clefts (it-clefts and pseudo-clefts) in the Lancaster-Oslo/Bergen Corpus of written British English (LOB) and the London-Lund Corpus of spoken British English (LL) is one of the most important corpus-based studies of it-cleft constructions. Collins (1991: 212–215) found that unmarked it-clefts (with the new information, stress and contrast in the highlighted item) are more common in writing than in speech, but that they are also the most common type of it-cleft in spoken texts. Two PresE examples from Collins (1991: 55, 62) are (1) and (2), where tone unit boundaries are reproduced in the spoken example.
(1) It was Mrs. Kennedy who drew the crowds, said police. (LOB A28, 26)
(2) #I think it was through her inspiration# that possibly# the Women’s Institute and things like that# really developed# (LL:S.12.6, 995–9)

Scholars have commented on the status of the second part of the it-cleft, that is, whether it is a ‘true’ relative clause or not, since variation with which hardly occurs in Modern English. PresE (spoken corpus) examples show that a wh-form, particularly who, often introduces the subordinate clause, as in Example (1) above. Here a name is referred to, and this is the most obvious instance of the referring quality of the focused element (see Pérez-Guerra 1999: 190). Also, the focused element is the subject in this example. Of PresE grammars, Huddleston and Pullum (2002: 1416) call the second part of the cleft a “relative clause”; Biber et al. (1999: 959) are more careful and write about a “relative-like clause”, while Quirk et al. (1985: 1387) refer to the subordinate clause as an “annex clause”, stating that it differs from a relative clause since that as subject can be omitted (It was the President himself spoke to me), and since a proper noun can be the focused element. The construction with a proper noun and a that-clause, which is found in the 19th-century Trials (see Example (3) below), is discussed in some detail by Jacobsson (1994). The antecedent is not only the name but the whole it is/was-clause. A plausible explanation for Jonathan Martin + the that-clause in (3) is that it is a ‘condensed’ version of Jonathan Martin, the one that has done it (see Jacobsson 1994: 190).³
(3) [$Mr. Brougham.$] Was not the first thing you said to your wife when you heard the Minster was burnt, “surely it is not Jonathan Martin that has done it?” [...]. [$Baron Hullock.$] When you first heard that the Minster was set fire to, did you not say to your wife, “surely it is not Jonathan Martin that has done it.” – [$WILLIAM LAWN, SWORN.$] Yes; but I had some reason for it, as my son said there was a rope ladder. [$Q.$] And you then said, “surely it is not Jonathan Martin that has done it.” (Trials, Jonathan Martin 1800–1830)

Ball (1994a) claims that the second part of the it-cleft is a relative clause (her term is cleft complement) since the restrictive relative clause and the cleft complement have a similar development over time as regards the distribution of Ø, that and wh-forms. Ball’s 20th-century data also show that wh-forms are more frequent in it-clefts than has generally been assumed. The proportions of wh-forms and that from Ball’s 20th-century data are who ≈ 91%, that ≈ 9%; and which ≈ 33%, that ≈ 67%, respectively (see Ball 1994a: 197). Collins (1991: 2) refers to the subordinate clause as a “relative clause”. Prince (1978: 883) refers to it as a “that/wh-clause”, which seems to be the most satisfactory solution since it is not necessary then to make a claim regarding the status of the clause. The term that/wh-clause will be used here. In 19th-century English, a few instances of which and who are found, and in the Drama texts, what occurs (see Section 3 below, and Ball 1994a: 185). Generally, a high frequency of wh-forms is found in the 19th century (see Johansson 2006), and for that reason they could be expected to be more frequent also in it-clefts. Judging from the 19th-century material used in this study, however, that-clauses are predominant, and variation with wh-forms rarely occurs. The it-cleft seems to be a construction that more readily took a that-clause for reference to its antecedent than the ‘ordinary’ relative construction with a restrictive clause. In other words, in the 19th century, the it-cleft seems to be more or less formalised as it was/is + that-clause.

3. According to Collins (1991: 3), Jonathan Martin, the one that has done it is a reversed pseudocleft.


The use of the it-cleft construction in 19th-century English 

3. Brief historical background on it-clefts

Previous studies, for example, Ball (1991) and Pérez-Guerra (1999), and preliminary results from the present study, show that it-clefts are rare before the Late Modern English period (1700–1900). However, according to Ball (1991: 505), the Late Middle English (LME) period (1300–1500) is crucial in the development of it-clefts. Wh-pronouns and Ø-relatives are attested from the 14th century in the second part of the cleft (i.e. the that/wh-clause), even though that is still the most common. The it-cleft is extended to Adv/PP foci and this brings with it a new type of cleft: the informative-presupposition (IP) cleft where the that/wh-clause contains known information but often not known to the hearer (for a more detailed explanation and examples, see Section 5 below; also compare Ball 1994b). In the EModE period and Late Modern English (LModE) period, the Adv/PP type and the IP cleft continue to increase in relative frequency. Pérez-Guerra (1999: 181) found 3 examples in the LME period, 1420–1500, and 20 examples in the EModE period (1500–1710) of the Helsinki Corpus of English Texts (in total, 23/769,850 words; see also Table 1 below). He notes (1999: 191) that the period EModE II (1570–1640) “seems to be the landmark as far as the consolidation of the it-cleft construction is concerned”. In EModE III (1640–1710), Pérez-Guerra found 11 examples of it-clefts but only 4 examples are from speech-related texts, and Pérez-Guerra states that it-clefts are not more common in speech-related genres than in written ones in the Helsinki Corpus. When a very small-scale study of the (speech-related) Trials in EModE III was carried out for the present study, however, a slightly different result emerged. In the two Trials texts alone, 19 examples of it-clefts were found (19/13,760 words). Only it-clefts with a that-clause as the second part occurred in the two texts. A focused NP is most frequent, but PPs and pronouns as focused items are also represented. See Examples (4)–(8) below.

(4) (Dunne) And it was not a little girl that lighted thee to Bed or conducted thee in? (Trials, Titus Oates 1640–1710) (exclusiveness, focused NP, second part of the cleft = that-clause)
(5) (L.C.J.) what Pains is a Man at to get the Truth out of these Fellows, and it is with a great deal of Labour, that we can squeeze one Drop out of them (Trials, Lady Alice Lisle 1640–1710) (focused PP [adverbial: manner])
(6) (L.C.J.) At Mrs. Harwell’s was it that you saw him? (Trials, Titus Oates 1640–1710) (focused PP [adverbial: place])


(7) (Graves) It was the same Ireland that they said afterwards was executed? (Trials, Titus Oates 1640–1710) (focused NP, same + N)
(8) (Lisle) I instructed him in Loyalty and sent him thither; it was I that bred him up to fight for the King. (Trials, Lady Alice Lisle 1640–1710) (focused pronoun)
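The frequencies quoted above (23/769,850 words and 19/13,760 words) become comparable once they are normalised to a common base of 100,000 words, the measure used in the tables of this chapter. The following is only a minimal sketch of that arithmetic; the function name per_100k is an illustrative choice and does not come from the study or from any corpus tool.

```python
def per_100k(hits: int, words: int) -> float:
    """Return the frequency of a construction normalised to 100,000 words."""
    if words <= 0:
        raise ValueError("corpus size must be positive")
    return hits / words * 100_000

# Figures quoted in the text above.
helsinki = per_100k(23, 769_850)      # LME/EModE, Pérez-Guerra (1999)
emode_trials = per_100k(19, 13_760)   # the two EModE III Trials texts

print(f"Helsinki Corpus (LME/EModE): {helsinki:.1f} per 100,000 words")
print(f"EModE III Trials sample:     {emode_trials:.1f} per 100,000 words")
```

Since the Trials sample is very small, its normalised figure should probably be read as indicative only.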

Trials are included in Pérez-Guerra’s category G4, which also comprises Travelogue, Autobiography and Biography. The two EModE Trials texts from which the examples above are taken were studied for comparison with the 19th-century material, since it is in the Trials genre that most ‘spoken’ (i.e. speech-related) examples of it-clefts are found. Generally, it-clefts are nearly three times as frequent in speech-related genres as in written ones in 19th-century English; see Section 4. In 19th-century Trials, it-clefts are well suited to expressing the kind of information given in the courtroom, which includes, for example, stress and exclusiveness (see Section 5.3). In (9)–(13), the same patterns of it-cleft constructions are illustrated as in the EModE examples above.

(9) [$Mr JUSTICE WILLS.$] It is not a witness of this sort that would understand the plan. (Trials, Adelaide Bartlett 1870–1900) (exclusiveness, focused NP, that-clause)
(10) I understand that it was in your character as a friend that you accompanied her to Dr Hopper? (Trials, James Maybrick 1870–1900) (focused PP [adverbial: manner])
(11) Was it in your presence that he read it? (Trials, Jonathan Martin 1800–1830) (focused PP [adverbial: place])
(12) [$MARY CADWALLADER’S EVIDENCE.$] Yes, Sir; it was the same week that he died. (Trials, Edwin Maybrick 1870–1900) (focused NP, same + N)
(13) [$Sir Charles Russell$] It was you that took the telegram to Mr. Edwin Maybrick asking him to send the doctor? (Trials, Edwin Maybrick 1870–1900) (focused pronoun)

4. It-clefts in 19th-century English

In the 19th-century material, it-clefts are used frequently and are found in most text types, speech-related and written, fiction and non-fiction. It is indeed possible to discern a consolidation of the construction in this period.


Table 1. It-cleft constructions in LME/EModE (Pérez-Guerra 1999) and 19th-century English (Periods 1 and 3)

Period                  N                    per 100,000 words
LME/EModE               23/769,850 words     3
19th-century (LModE)    207/644,972 words    32

Table 2. It-cleft constructions in 19th-century speech-related and written genres (Periods 1 and 3)

Genre             N      per 100,000 words
Speech-related    141    47
Written           66     16

Table 3. It-cleft constructions with that/wh-clauses in 19th-century English (Periods 1 and 3)

That/Wh-clause    N      %
That-clause       159    77
Wh-clause         28     13
Ø-clause          20     10

The figures in Table 1 show that it-clefts are frequent in the 19th-century data. The tentative comparison with Pérez-Guerra’s figures shows that 32 it-clefts/100,000 words occur compared with 3/100,000 words in LME and EModE.4 In Table 2, the numbers of it-cleft constructions in speech-related (Debates, Drama, Fiction and Trials) and written genres (History, Letters, and Science) in 19th-century English are compared. Pérez-Guerra found that it-clefts were no more frequent in speech-related genres than in written ones in EModE and for that reason speech was not considered as a possible origin for it-clefts. For PresE, Collins (1991: 181) reports that it-clefts are more common in written than in spoken texts.5 However, in the 19th-century material, it-clefts are nearly three times as frequent per 100,000 words in the speech-related genres: Debates, Drama, Fiction and Trials. As is evident from Table 3, 77% of the subordinate clauses, i.e. the second part of the cleft, include that in the 19th-century (spoken and written) data. Clauses with that are slightly more frequent in the 19th-century English corpus than in the PresE ones (the LOB and LL corpora) used by Collins (1991: 35), who reports that 64% of the clauses are that-clauses, 19% have who/which and 14% include Ø-clauses. In the 19th-century data, 13% of the clauses are with who/which/what and 10% of the clauses are Ø-clauses.

4. In PresE, the figure would be about 53/100,000 words (see Collins 1991: 181).
5. In his Survey of English Usage corpus, Breivik (1986) found that it-clefts are evenly distributed between spoken (49%) and written data (51%).

Table 4. It-cleft constructions with that, who, which, what and Ø in 19th-century English (Periods 1 and 3)

That/wh-form/Ø    N      %
That              159    77
Who               5      2
Which             14     7
What              5      2
Prep. + whom      1      –
Prep. + which     3      2
Ø                 20     10

Table 4 shows the frequency of that, who, which, what and Ø for the 19th-century data. In the PresE material (Collins 1991: 35), who occurs in the second part in 12% of the cleft constructions. In the 19th-century data, on the other hand, only 3% of the clauses are with who (including preposition + whom) but 9% are with which (including preposition + which). Collins found that 7% of the clauses are with which. It may be worth noting that the most common relativizer in the 19th-century corpus is which, and it is particularly frequent in the Science texts (see Johansson 2006 and the discussion below). When the wh-forms started being used in it-clefts in late Middle English, they were used for focused subjects, as in examples (14) and (15) (see also Ball 1994a: 198).

(14) [$Mrs. P.$] Oh, then, madam, ’tis you who were here just now. (Drama, Arthur Pinero 1800–1830)
(15) It is, indeed, a circumstance which enhances the geological interest of the commotions [...] (Science, Charles Lyell 1800–1830)
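The proportions discussed above can be recomputed from the raw counts in Table 4. The sketch below is only an illustration of that calculation; the dictionary keys and the helper share are hypothetical names chosen here, and the counts are those given in Table 4.

```python
# Raw counts from Table 4 (19th-century data, Periods 1 and 3).
counts = {
    "that": 159,
    "who": 5,
    "which": 14,
    "what": 5,
    "prep. + whom": 1,
    "prep. + which": 3,
    "zero": 20,
}

total = sum(counts.values())  # all it-clefts in the material

def share(*forms: str) -> float:
    """Percentage of all it-clefts introduced by the given form(s)."""
    return sum(counts[form] for form in forms) / total * 100

print(f"total it-clefts: {total}")
print(f"that: {share('that'):.1f}%")
print(f"who (incl. prep. + whom): {share('who', 'prep. + whom'):.1f}%")
print(f"which (incl. prep. + which): {share('which', 'prep. + which'):.1f}%")
print(f"zero: {share('zero'):.1f}%")
```

Small differences from the rounded percentages in the running text are to be expected, since adding rounded table cells is not the same as rounding the combined count.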

In the 19th century, it-clefts seem to have been typical of the spoken language. The speech-related genres with proportionally the most examples of it-clefts are Trials and Debates (65/100,000 words and 51/100,000 words; see Table 5). The genre Debates contains speech taken down as either direct or indirect speech (indirect speech occurs in Period 1). Compare example (16) (from Debates) and example (17) (from Trials).

(16) it was his noble friend [$(lord Castlereagh,)$] to whom the hon. gent.’s observations could not in the least apply. (Parliamentary Debates 1800–1830)
(17) [$MR. HAWKINS.$] Yes. What was it that made you think it was him? [...] [$Mrs. ELEANOR SMITH, sworn$] Yes. It was the first thing that struck me it was him, by the movement [...] (Trials, Roger Tichborne 1870–1900)


As can be expected, the language in the Debates is fairly formal. Preposition + wh-form, that is whom, in example (16) is a formal construction, but also the most common pattern for placement of preposition + relativizer in 19th-century English. Quirk et al. (1985: 1387) state that in PresE it-clefts, “it is virtually impossible to use whom or which preceded by a preposition.” Parliamentary debates are naturally different from Trials. Even if the dialogue of the courtroom generally must be regarded as formal, witnesses, for example servants, can be expected to speak less formally. In Example (18), in which a member of the legal profession speaks, a Ø-clause as the second part of the cleft occurs even though Ø is considered less formal than that or a wh-form. In the 19th-century data, Ø-clauses occur when the focused item is a prepositional object, as in (18), but most frequently when it is an adverbial (see Section 5.1).

(18) [$Mr Wright.$] – The dates have got a little confused, my Lord. It was the previous Monday she was speaking of. But I don’t think I need pursue it. (Trials, Adelaide Bartlett 1870–1900)

Examples (19)–(21) are from Debates (Period 3), where It is only within [period of time]... that, as in (19), is a lexical pattern that occurs frequently.

(19) It is only within the last ten years that the argument of population has served your purpose. (Parliamentary Debates 1870–1900)
(20) But the hon. Gentleman went to the root of the matter when he told us that this was a question of political equality, and it is for political equality that we are fighting. We are absolutely in favour of political equality. (Parliamentary Debates 1870–1900)
(21) I think it was Sir Robert Fowler, a late Member of this House, who used to boast that he had no fewer than thirteen votes in different constituencies, and that he was able at one General Election to record them all. (Parliamentary Debates 1870–1900)
(22) [$The LORD CHIEF JUSTICE$] Was it before or after that that you heard Lady Tichborne had recognised the claimant? (Trials, Roger Tichborne 1870–1900)

Table 5. It-cleft constructions in 19th-century genres (Periods 1 and 3)

Genre      N     per 100,000 words
Trials     85    65
Debates    21    51
Drama      20    33
Fiction    15    21
History    29    49
Science    28    41
Letters    9     4

In Table 5, it-cleft constructions across 19th-century English written and speech-related genres are shown. Trials and Drama are similar in that they exemplify direct speech, either written down by a scribe at a trial or written by a playwright to be spoken. The scribe could naturally have influenced the text to a greater or lesser extent. In the Drama texts, certain syntactic features, such as it-clefts, could have been regarded as a feature of (everyday) speech and could thus be exploited by the playwright. Trials not only have nearly twice as many examples per 100,000 words (65) as Drama (33) but also contain more variations on it-clefts. In both genres, it is is not as frequent as it was. In Trials, it was is predominant since crimes and other events are referred to in the past, as in Example (22). Most of the examples include a that-clause and the focused item is most commonly an adverbial, since time, place and manner are important points to clarify in a trial (it was a quarter before six that...; it was first in Mr Paul’s house that...; it was in your character that...). It-clefts in Trials are discussed in more detail in Section 5. Of the speech-related genres, Fiction has the lowest number of examples of it-clefts (21/100,000 words). Fiction could be regarded as the least speech-related genre since it only partly consists of constructed dialogue (see Kytö, Rudanko & Smitterberg 2000). Examples (23)–(27) are from Drama. A subject is the focused item in 18 out of 20 instances of it-clefts, and that-clauses (13 examples) are nearly twice as frequent as wh-clauses (7 examples). Out of the 7 wh-clauses that occur as the second part of the cleft, 5 are what-clauses (see (25) and (26)). The two other examples of wh-clauses are with who; see (27). What is used in just one play from the third period, The Dandy Dick by Arthur Pinero (1893). The major function of what in this play is its use in it-clefts.

(23) [$Maitland.$] It is these qualities that forbid me now to tell you my secret. (Drama, Thomas Holcraft 1800–1830)
(24) [$Elaine.$] It was your pushing that broke the looking-glass. (Drama, Arthur Henry Jones 1870–1900)
(25) [$BLORE.$] ’Annah, ’Annah, my dear, it’s this very prisoner what I ’ave called on you respectin’. (Drama, Arthur Pinero 1870–1900)
(26) [$HANNAH.$] It’s ’unger what makes you feel conscientious! [...] Oh, lady, lady it’s appearances what is against us. [...] It’s I and my whistle and Nick the fire brigade-horse what’ll bring him back to the Deanery safe and unharmed. (Drama, Arthur Pinero 1870–1900)
(27) [$Phybus.$] My dear, it is you who will insist. (Drama, Arthur Henry Jones 1870–1900)
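The comparisons between genres made above rest on the normalised frequencies in Table 5. As a small sketch of how such comparisons can be checked, the rates can be ranked and compared pairwise; the names rates, ratio and ranking are illustrative assumptions, not terms used in the study.

```python
# Normalised frequencies (it-clefts per 100,000 words) from Table 5.
rates = {
    "Trials": 65, "Debates": 51, "Drama": 33, "Fiction": 21,
    "History": 49, "Science": 41, "Letters": 4,
}

def ratio(genre_a: str, genre_b: str) -> float:
    """How many times more frequent it-clefts are in genre_a than in genre_b."""
    return rates[genre_a] / rates[genre_b]

# Genres ordered from most to least frequent use of it-clefts.
ranking = sorted(rates, key=rates.get, reverse=True)

print(ranking)
print(f"Trials vs Drama: {ratio('Trials', 'Drama'):.2f}")
```

The Trials to Drama ratio comes out just under 2, which corresponds to the "nearly twice as many" of the running text.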


The written genre which contains most examples of it-clefts is History (49/100,000 words); most instances are found in Period 3 (1870–1900). The construction it was naturally predominates, since the focus is on historical events. IP-clefts (“the that-clause contains the ‘message’,” see Prince 1978: 904) are particularly frequent in the History texts. As is evident from Examples (28) and (29), the focused element is an adverbial. Matters such as place and time are important elements to stress in historical texts.

(28) It was not, however, in the North-West that the greatest danger was visible; it was not to the North-West that men turned with their chief anxiety. (History, Spencer Walpole 1870–1900)
(29) There was some haggling over the price, and it was not until November that two friars appeared by night at the house of the English minister, staggering under the load of coined metal which they carried. (History, Samuel R. Gardiner 1870–1900)

Of the written genres, Letters has the fewest examples of it-clefts (9 examples, equalling 4 examples/100,000 words), despite the fact that the style in the Letters is colloquial and close to the speech-related genres. In the few examples of it-clefts that occur in Letters, only that-clauses are exemplified. This may possibly indicate that when clefts were used they were used in their more formalised version (it is/was ... that), i.e. with no variation with a wh-form. The focused element in the letters is also an adverbial in 5 out of 9 examples, something which favours a that-clause:

(30) [...] it is only at the seaside that I never wish for rain. (Letters, Matthew Arnold 1870–1900)

Science and History represent 19th-century academic writing (see Kytö, Rudanko & Smitterberg 2000). Science is perhaps the most typical written genre; it is technical and formal, something which is evident in the fact that wh-forms occur in 8 of the 28 examples, as in Example (33) below. The use of it-clefts decreases in the third period; they are almost twice as common in Period 1. Possibly they were no longer considered suitable for the style of scientific texts. In the other genre representing academic writing, History, it-clefts occur more frequently in the third period than in the first. In Science, it is occurs almost exclusively (there is only one instance of it was) since present conditions or relations are the topics and explanations occur frequently. The focused element is either an adverbial or a subject. Sixteen out of the 28 examples of it-clefts in Science, and 7 out of 8 wh-clauses as the second part of the cleft, occur in David Ricardo’s text On the Principles of Political Economy and Taxation (1817). The use of it-clefts generally and it-clefts with wh-clauses could thus be a typical feature of the writing style of an individual scientist. Examples (33) and (34), a wh-clause and a that-clause respectively, are from Ricardo’s text. In (34), the first part of the it-cleft is structurally complex and it is is at some distance from the that-clause. This is rarely seen in PresE, but occurs quite frequently in 19th-century texts.

(31) It is evidently in sea-ports alone that we can look for very accurate indications of slight changes. (Science, Charles Lyell 1800–1830)
(32) It is the triteness of these experiences that makes the most varied life monotonous after a time, [...] (Science, Francis Galton 1870–1900)
(33) It is this principle which determines that wine should be made in France and Portugal, that corn should be grown in America and Poland and that hardware and other goods shall be manufactured in England. (Science, David Ricardo 1800–1830)
(34) It is not by raising in any manner different from the present, the fund from which the poor are supported, that the evil can be mitigated. (Science, David Ricardo 1800–1830)

5. It-clefts in 19th-century Trials

In the following, Trials will be explored in more detail than the other genres in CONCE. The justification for this is, first, that the genre can be regarded as one which comes closest to spoken 19th-century English and, second, that the expression of stress, contrast, exclusiveness and the like is part of courtroom discourse, something which favours the use of it-clefts. Information known to all participants but the hearer is sometimes given (see Prince 1978: 898), which favours the use of IP-clefts, i.e. a cleft construction where the that/wh-clause conveys the message. Collins (1991: 185 and 2005: 92–93) states that when it-clefts occur in spoken PresE, they are more frequent in prepared speech (which includes court cases) than in other spoken genres. It-clefts are used in the context of building arguments and expressing opinions. In the 19th-century Trials, however, the identifying and clarifying functions (particularly in questions) seem to be the most important. The dialogue recorded in the 19th-century courtroom also gives rise to interesting variations on the it-cleft pattern in the Trials genre; see Section 5.2. I will first discuss structures of it-clefts in Trials.

5.1 Structures of it-cleft constructions in 19th-century Trials

The typical structure of an it-cleft in Trials is as follows:

It was + that-clause with focused subject/adverbial


Table 6. The structure of it-cleft constructions in 19th-century Trials (Periods 1 and 3)

Structure                           N (%)
It was                              78 (92%)
That-clause                         63 (74%)
NP focused                          49 (58%)
It-clefts in (direct) questions     37 (44%)
PP (AdvP/Clause) focused            36 (42%)
Adverbial focused                   34 (40%)
Subject focused                     26 (31%)
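The typical structure (it is/was + focused element + that/wh-clause) can be approximated by a simple search pattern that retrieves candidate sentences for manual analysis. The sketch below is not the retrieval procedure of the study; the pattern CLEFT_CANDIDATE and the helper find_candidate are hypothetical and deliberately crude. They would also match non-cleft strings such as extraposition (it is clear that ...), and they miss Ø-clauses and questions with a fronted focus (What sort of matter was it that ...?), so every hit would still have to be classified by hand.

```python
import re

# Hypothetical pattern (not taken from the study): "it is/was" or the inverted
# "is/was it", an optional "not", a focused element, then that/who/whom/which/what.
CLEFT_CANDIDATE = re.compile(
    r"\b(?:it\s+(?:is|was)|(?:is|was)\s+it)(?:\s+not)?\s+"
    r"(?P<focus>[^.?!]+?)\s+"
    r"(?P<clause>that|who|whom|which|what)\b",
    re.IGNORECASE,
)

def find_candidate(sentence: str):
    """Return (focused element, clause introducer), or None if nothing matches."""
    match = CLEFT_CANDIDATE.search(sentence)
    if match is None:
        return None
    return match.group("focus"), match.group("clause").lower()

examples = [
    "It was the same week that he died.",        # Example (12)
    "Was it in your presence that he read it?",  # Example (11)
    "My dear, it is you who will insist.",       # Example (27)
    "It was into our dining-room they came.",    # Example (41), Ø-clause: no match
]
for sentence in examples:
    print(find_candidate(sentence), "<-", sentence)
```

The focus group is matched lazily, so the first that or wh-form after it is/was is taken as the clause introducer; this is a simplification that fails when the focused element itself contains such a word.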

The it-clefts are often embedded in questions, as in (35)–(37) below, and can have a fairly complex structure. This is particularly the case when a clause, and not an NP or an AdvP/PP, is the focused element, as in Examples (35) and (36).

(35) [$Mr. JUSTICE WILLS.$] Then it was when you were told about the coals and the basin that he was walking about, was it? (Trials, Adelaide Bartlett 1870–1900)
(36) [$Mr. Wright.$] And it was while you were living at Malvern, in the year 1873, that you issued the English edition of this book? (Trials, Adelaide Bartlett 1870–1900)
(37) [$Mr. Clarke.$] Was it on that evening on which the conversation took place that he told you about the thirteen and the seven? (Trials, Adelaide Bartlett 1870–1900)

More detailed information on the structure of it-clefts is shown in Table 6 above. Table 6 shows that 74% of the subordinate clauses are with that in the Trials and it was is predominant in the first part of the cleft (92%). In the Trials, there are five examples of wh-clauses as the second part of the cleft. These are ‘special’ in different ways: two occur in pied piping constructions, i.e. preposition + relativizer, where a wh-form must naturally be used. Quirk et al. (1985: 1387) and Ball (1994a: 190) comment on the infrequency of pied piping constructions in it-clefts and Collins (1991: 32) reports only four preposition + which in his PresE corpus. The two pied piping constructions in the 19th-century Trials are Examples (39) and (40). In (38), a th-cleft is exemplified, which could explain why the wh-clause is used, namely to avoid repetition of that.

(38) [$Sir Charles Russell$] And that is the landing which all the servants – all the persons in the house, in fact – who go up and down stairs must pass? (Trials, Edwin Maybrick 1870–1900)
(39) [$The Judge$] And during the whole of the period it was that deranged digestion and his nervous system for which you were treating him off and on from 1882 to the end of 1888, and that was so in December 1888? (Trials, Edwin Maybrick 1870–1900)


(40) [$MR. JUSTICE PARK$] It was in some character with which you were not acquainted? (Trials, James Bowditch 1800–1830)

Examples of Ø-clauses as the second part of the cleft are fairly frequent in Trials; 20% of the it-clefts include a Ø-clause and the focused item is an adverbial in more than 50% of the cases, as in (41):

(41) [$MRS. ANN KILBY, SWORN.$] It was into our dining-room they came. (Trials, Jonathan Martin 1800–1830)

In Trials, the focused item is most commonly an adverbial (40%) or a subject (31%); see Table 6. In a trial, matters such as time and place (adverbials, most frequently expressed by PPs) are as important to stress as people (as subjects and objects). Therefore, PPs (together with AdvPs and clauses, 42%) are nearly as frequent as NPs (including pronouns and personal names, which account for 58%) as structures of the focused items. Examples (42)–(46) are from the two trials with the most occurrences of it-clefts.

(42) [$MR. SERJEANT PELL.$] Now how long do you think it was previous to the 2d [sic] of September, that any of these persons, whose names I have mentioned, said any thing to you of a particular nature about James Bowditch? (Trials, James Bowditch 1800–1830)
(43) [$MR. JUSTICE PARK.$] It was there you met them? (Trials, James Bowditch 1800–1830)
(44) [$CHARLES PUDDEY sworn; examined by MR. JUSTICE PARK.$] They came to see a print we had got; it was not Mr Bowditch that was with her then; it was one of his sisters. (Trials, James Bowditch 1800–1830)
(45) [$Sir Charles Russell$] I believe it was your own solicitor you recommended to her? [...] Do you know whether it was he who suggested the soup? (Trials, Edwin Maybrick 1870–1900)
(46) [$ELIZABETH HUMPHREYS’S EVIDENCE.$] It was in October I found some fly-papers on the window sill in the kitchen. [...] I went back to the house in October 1888, and it was directly that I went back that I saw them. (Trials, Edwin Maybrick 1870–1900)

Questions containing an it-cleft are very common in Trials. It is worth asking whether questions as structures favour a frequent use of it-clefts, since nearly half of all the cleft sentences that occur in Trials, 44% (see Table 6), are embedded in a question. The clefts seem to be used for the sake of ‘clarification’, to verify once more the identification of a person, thing or place. Compare the non-clefted variants of Examples (47) and (48): Now are you sure that your Master went to read the news a quarter before six? [...] What sort of matter was produced? The questions are asked exclusively by members of the legal profession, as would be expected. The different structures of the questions are exemplified in (47)–(51) below. The questions are from the Trial of Charles Angus, a text which contains many it-clefts. Many of the questions seem less direct and more ‘polite’ since they are introduced by, for example, “Now are you sure that” or “Let me ask you”; see examples (47) and (49).

(47) [$Q.$] Now are you sure that it was a quarter before six that your Master went to read the news? (Trials, Charles Angus 1800–1830)
(48) [$Court.$] [$Q.$] What sort of matter was it that was produced? (Trials, Charles Angus 1800–1830)
(49) [$Court.$] [$Q.$] Let me ask you who it was that sent you for the Madeira? (Trials, Charles Angus 1800–1830)
(50) [$Q.$] You have been speaking in answer to a question dut [sic] about some stays, do you know who it was that put those stays in that place? (Trials, Charles Angus 1800–1830)
(51) [$MR. HOLROYDE.$] What time was it that you first saw her on Wednesday the twenty-third of March? (Trials, Charles Angus 1800–1830)

‘Members of the legal profession’, referred to above, and ‘Others’ are the two speaker roles in the trials. A speaker role is a description based on the speaker’s social rank and professional background. ‘Members of the legal profession’ includes mainly judges and lawyers. ‘Others’ is made up of witnesses, such as servants, neighbours, friends and relatives of the defendants. ‘Others’ also comprises doctors as expert witnesses. The defendants themselves do not speak in the texts studied. ‘Members of the legal profession’ use 62 it-cleft constructions in their speech and ‘Others’ use 23. Lawyers and judges speak more than the witnesses do: in a representative sample of 5,000 words the ratio is 7 (‘Members of the legal profession’) to 3 (‘Others’). Thus, ‘Members of the legal profession’ also use more than twice as many it-clefts, particularly in their questions (see (47)–(51) above). Witnesses, such as servants, can be expected to speak more informally than judges and lawyers, as in Example (52).

(52) [$CHARLOTTE HOLDER, sworn.$][$Cross-examined by the ATTORNEY-GENERAL.$] Do you say you are quite sure he is the same man? – As him that went away; as I saw before he went away. Just the same? – I am convinced it is the same that went abroad. (Trials, Sir Roger Tichborne 1870–1900)
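The raw counts for the two speaker roles (62 and 23) and the 7 : 3 ratio of speech can be combined to see how much of the difference reflects the amount of speech alone. The sketch below is only a rough illustration: it assumes that the 7 : 3 ratio from the 5,000-word sample holds for the Trials material as a whole, which the text does not state, and the variable names are chosen here for illustration.

```python
# Figures from the text: raw it-cleft counts per speaker role and the
# share of speech in a 5,000-word sample (ratio 7 : 3).
clefts = {"legal profession": 62, "others": 23}
speech_share = {"legal profession": 7, "others": 3}  # assumed to hold overall

raw_ratio = clefts["legal profession"] / clefts["others"]

# It-clefts per unit of speech, so that the two roles become comparable.
per_unit = {role: clefts[role] / speech_share[role] for role in clefts}
adjusted_ratio = per_unit["legal profession"] / per_unit["others"]

print(f"raw ratio:      {raw_ratio:.2f}")
print(f"adjusted ratio: {adjusted_ratio:.2f}")
```

On these assumptions the raw ratio of about 2.7 shrinks to roughly 1.2 once the amount of speech is taken into account, which would suggest that much of the difference in raw counts reflects how much each group speaks.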

The use of it-clefts is not primarily linked to a formal or an informal style, however. Instead, it is the function of the it-cleft that is most important, i.e. to express identification, contrast, focus, and exclusiveness. Halliday (1994: 299–302) and Quirk et al. (1985: 1372, 1383) stress the thematic functions of the it-cleft, whereas Prince (1978: 883) points out the discourse functions of clefts and stresses the informational aspects of its use (see Section 5.3 below, Collins 1991: 3–4 and Johansson 2002: 41–48, 57–61).


Table 7. Types of ‘cleft constructions’ in 19th-century Trials

‘Cleft construction’                                 Trials
It-cleft                                             85 (71%)
This/that/these/those is/are ... that (th-cleft)     18 (15%)
you are a person that                                16 (14%)
TOTAL                                                119

5.2 Extensions of the it-cleft pattern

The original structure of the it-cleft is naturally more frequent (71%, see Table 7) than its extensions. This is true both of the 19th-century data and the PresE material. The extensions of the pattern are mainly of two types: a demonstrative is used instead of it (the th-cleft, 15%, see Table 7) or a personal pronoun takes the place of it (14%, see Example (61) below).6 That/those are more frequent than this/these in the 19th-century material, since matters or events discussed earlier are often referred to, see Examples (53)–(55). In (56), where evidence is shown in court, these is used.

(53) [$ELISABETH NIXON, sworn.$] Yes, the same dress, only she had a clean bed gown on, that is the only thing that I know of. (Trials, Charles Angus 1800–1830)
(54) [$Mr. Brougham.$] Could those bodily symptoms that you describe have been put on for the purpose of deceiving you? (Trials, Jonathan Martin 1800–1830)
(55) [$MR. SERJEANT PELL.$] That is the young woman that was with Mr Bowditch when Miss Glenn came out. (Trials, James Bowditch 1870–1900)
(56) [$MR. CASBERD.$] My Lord, these are the letters which I now hold in my hand; does your Lordship think I may ask a question upon them. (Trials, James Bowditch 1870–1900)

In the 19th-century Trials, the pattern with a personal pronoun other than it is common and this construction often occurs as part of a lexical pattern with the word person. The function is to identify certain people and it is exclusively used by ‘Members of the legal profession’.

6. Proverbial it-clefts, such as It’s a wise child that knows his own father (Prince 1978: 905), are not found in the Trials. Proverbial it-clefts occur in Drama, in the fairly high-flown language of upper-class characters (see Culpeper 2001: 213): [CHEVIOT] It’s a coarse and brutal nature that recognises no harm... (Drama, W. S. Gilbert 1870–1900).

(57) [$Mr Holroyde.$] Have you not said, that you were doubtful, whether in the confusion that took place during this melancholy scene, you yourself was not the person that did it? (Trials, Charles Angus 1800–1830)
(58) [$The ATTORNEY-GENERAL.$] Yes. In the Alresford circle you are a person that everybody knows? (Trials, Sir Roger Tichborne 1870–1900)
(59) [$Sir Charles Russell$] Yes. Was he a man that was rather given to exaggerate symptoms? (Trials, Edwin Maybrick 1870–1900)
(60) [$Q. (By MR. JUSTICE PARK.) $] Was she the lady who sat down on Mr. James Bowditch’s knee in your presence? (Trials, James Bowditch 1870–1900)
(61) [$Q.$] At any rate, you are the Mr Nichols who published the book that has been mentioned in court? (Trials, Adelaide Bartlett 1870–1900)

Another construction, which is not an extension of the it-cleft pattern but marginally an alternative to an it-cleft, is existential there + relative clause, as exemplified in (62)–(63) below.7 Quirk et al. (1985: 1406) compare this construction to a cleft sentence of the type It is his callousness that I shall ignore: “A more important additional type of existential sentence is that which consists of there + be + noun phrase + relative clause, and which resembles the cleft sentence [...] in its rhetorical motivation.” As with clefts, Quirk et al. (1985: 1407) refer to an annex clause where that as subject can be omitted and different tenses can occur in the two parts of the sentence (There are some planets that were discovered by the ancients). The constructions that knows (spoken by a member of the legal profession) and there is (spoken by a witness) are worth noting in Example (63). Compare the discussion of speaker roles in Section 5.1 above.

(62) [$Mr. Poland.$] There is a matter which I passed over. (Trials, Adelaide Bartlett 1870–1900)
(63) [$ELISABETH STUBBS, sworn.$] [$Cross-examined by the ATTORNEY-GENERAL.$] And I suppose there are not many people about but that knows your chimney-corner? – There is a great many that do. And very few that do not? – There is a great many that do. (Trials, Sir Roger Tichborne 1870–1900)

Quirk et al. (1985: 1407) state that the “existential-with-relative construction” commonly emphasises a negative, and, indeed, in Trials most examples of existential there + relative clause contain a negative. This is in situations where something is denied or where further clarification is needed, particularly in a question, as in (64) and (65). There are 18 examples of the existential-with-relative construction in Trials.

(64) [$FRANCIS BARKER, SWORN. $] There was nothing that I could take any notice of particularly. (Trials, Jonathan Martin 1870–1900)
(65) [$Q.$] Was there anything else that you observed about him? (Trials, Sir Roger Tichborne 1870–1900)

7. Examples such as There is a matter which I passed over seem to convey information in a different way than the it-cleft. Compare It is a matter which I passed over and That/this is a matter which I passed over, which express the matter as being discussed before, i.e. known information.

Informational aspects of it-clefts in Trials

Trials as a genre consists of direct dialogue written down with questions and answers; in other words, it is quite interactive. Compare Examples (66)–(73) below from the three texts which contain most it-clefts: the Trial of Charles Angus, the Trial of James Bowditch (both from Period 1) and the Trial of Edwin Maybrick (Period 3). The examples are all stressed focus (SF)-clefts, spoken by judges, witnesses and doctors. It-clefts and extensions of the pattern are represented; a th-cleft is exemplified in (67) and a personal pronoun other than it included in the lexical pattern with person is exemplified in (68). As is evident from the examples, an adverbial is often the focused element (and the stressed or foregrounded one; see Huddleston & Pullum 2002: 1414) since time and place are important information in Trials.

(66) [$JANE NICKSON, sworn.$] Sir, it was the Cook that was trying the salt, and [that] said there was not a pewter plate to put the salt upon, [...] (Trials, Charles Angus 1800–1830)
(67) [$Mr. Scarlet.$] And that was the tartar emetic that was given the week preceding her death to the maid servant? (Trials, Charles Angus 1800–1830)
(68) [$Court.$] Were you the person that brought the quilt down stairs? (Trials, Charles Angus 1800–1830)
(69) [$MARIA GLENN sworn.$] It was first in Mr Pauls house that I saw it... (Trials, James Bowditch 1800–1830)
(70) [$MR. CASBERD.$] Who was it that produced those papers? (Trials, James Bowditch 1800–1830)
(71) [$MR. JUSTICE PARK.$] Was it prior to your going to Mr Kinglake’s office that Miss Glenn had given you the description of the person whom she had seen in the court, [...] (Trials, James Bowditch 1800–1830)
(72) [$MRS BRIGG’S EVIDENCE.$] Was it upon the conversation in this room on Valentine’s meat juice that the policeman said you must have no conversation? (Trials, Edwin Maybrick 1870–1900)
(73) [$Mr Addison.$] You are an expert in the homoopathetic [sic] system, and it was from your reputation in this that he came to you? (Trials, Edwin Maybrick 1870–1900)

The stressed focus (SF)-clefts, as in (66)–(73), are decidedly the most frequent types of it-clefts in both the 19th-century Trials and the 19th-century data generally (compare also Collins 1991: 189–191 for PresE). Items like the salt, the quilt, it, those papers, the policeman in the that/wh-clause suggest that the information is already known (or backgrounded, see Huddleston & Pullum 2002: 1414). Example (66) illustrates both the structural and informational complexity of some of the clefts in the 19th-century data (both in the speech-related texts and the written texts, for example, History and Science). In PresE, the it-cleft is a fairly ‘short’ construction structurally and that part which presents given or known information is naturally shorter than the one conveying the message (see Collins 1991: 203–207). In Example (66), the sentence goes on to mention new information in a coordinated clause (and said there was not a pewter plate to put it on). It is possible that the information and said... was known to some people in the courtroom but new to, for example, the judge and the prosecutor (see Prince 1978: 904). In Example (74), the information in the that-clause seems to be new; in other words, it is an informative-presupposition (IP) cleft (Prince 1978: 898). The that-clause contains the message. An expert witness describes the mental health of the accused. ‘Insanity’ has been discussed before but not the conditions of this particular case of insanity. Example (74) is one of the few examples in which it is is used in the first part of the cleft. It is occurs when, for example, expert witnesses and others give a description of a disease or its symptoms or of causes of death.

(74) [$Witness$] It is a degree or species of insanity, that can be confined to one idea only, or to one train of ideas, upon one particular subject. (Trials, Jonathan Martin 1800–1830)

Other examples of IP-clefts are (75) and (76): the focused part contains an anaphoric item (the policeman, that kind of vomiting) and the wh-clause and that-clause, respectively, convey the message, i.e. new information. According to Ball (1991: 506), IP-clefts became more frequent in the Late Modern English period, but they do not make up a large number of the clefts in the 19th-century material used in this paper.

(75) [$MR TIDY’S EVIDENCE.$] Certainly not; it’s not that kind of vomiting that is described as taking place in a typical case of arsenical poisoning. (Trials, Edwin Maybrick 1870–1900)
(76) [$Sir Charles Russell$] Did you gather from what your sister said that it was the policeman who desired that there should be no conversation with Mrs. Maybrick about this? (Trials, Edwin Maybrick 1870–1900)

Discontinuous clefts (Declerck 1984), i.e. clefts in which both the focused item and the that/wh-clause convey new information, seem to be fairly rare in the 19th-century Trials and in the 19th-century data generally. This type of cleft could be expected to occur in the Trials genre since both what is in the first part of the cleft and what is in the second could be new to some people in the courtroom. The examples of discontinuous clefts in Trials often include some information on time (a point in time, a month, or a date) in the first part of the cleft, as in Examples (77)–(78).

(77) [$Sir Charles Russell$] Let me remind you again, wasn’t it on Monday the 6th that the boy came up from the office with letters and the telegrams? (Trials, Edwin Maybrick 1870–1900)
(78) [$Court.$] [$Q.$] But how long was it after you came down stairs that you went in? (Trials, Charles Angus 1800–1830)
(79) She said that Dr Humphreys said it was only his liver that was out of order, [...] (Trials, Edwin Maybrick 1870–1900)

Conclusion

Previous studies (Ball 1991; Pérez-Guerra 1999) have shown that in Old and Middle English, it-clefts are very rare, and that they are not very frequent in Early Modern English either. The present study shows that the period in which it-clefts start being used frequently is the 19th century (part of the Late Modern English period). If Late Middle English and Early Modern English are compared with 19th-century English, it is clear that the use of it-clefts has increased: there are 32 it-clefts per 100,000 words in the 19th-century corpus but only 3 per 100,000 words in the Late Middle English and Early Modern English data. According to Ball (1991: 511–512), the development of it-clefts is generally linked to broader changes in the English language: the movement from verb-final to verb-medial constructions, also evident in copular sentences such as I it am → it is I, and the emergence of an overt expletive it in Early Middle English (on word order changes, see also Denison 1993: 28–30, and Görlach 1999: 74–75). Ball (1991: 509 and 1996) states that there are changes in the relative clause with the wh-forms being introduced in the 14th century, allowing AdvPs and PPs to be focused. That introducing the second part of the cleft has always been predominant (Ball 1991: 514) and this is also true of the 19th century (77%), even if this is the
period in which wh-forms are used more frequently than in both earlier and later periods of English (see Johansson 2006). Somewhat surprisingly, who introducing the second part of the cleft is rare in the 19th-century data; it is used in only 3% of the examples. The proportion of who in it-clefts is higher in PresE data. Ball (1991) includes only written genres in her historical survey of the itcleft (OE to PresE). Pérez-Guerra includes written and speech-related data (LME to PresE), and finds that it-clefts are more frequent in written genres. Collins (1991: 181–182) and Biber et al. (1999: 961–962) state the same result for PresE. The present study, however, has shown that in the 19th century, it-clefts are nearly three times as frequent in speech-related genres (47/100,000 words) as in written genres (16/100,000 words). They are particularly frequent in 19th-century Trials. The use of it-clefts remains constant and is essentially the same in the first period of the 19th century (1800–1830) as towards the end (1870–1900). In scientific texts, however, the number of it-clefts decreases at the end of the century, possibly because it-clefts were no longer considered suitable components of the style of scientific writing (for PresE, compare Biber et al. 1999: 960). Ball (1991: 506, 509) states that one of the most important pragmatic developments in Late Modern English (including the 19th century) is that IP-clefts (informative-presupposition clefts) increase in frequency. The that/wh-clause may thus contain new information (Prince 1978: 898). In the 19th-century data studied here, IP-clefts are not very common. Instead, the ‘original’ cleft construction, the stressed focus cleft, is the most frequent type. It-clefts are particularly frequent in what could be regarded as the genre representing most closely spoken 19th-century English: Trials. This genre includes different speaker roles, and is interactive in that questions and answers make up most of the dialogue. 
Members of the legal profession use it-clefts in their questions for further clarification or identification of people and objects. Different types of it-clefts occur: most frequent are the stressed focus clefts (Sir, it was the Cook that was trying the salt,...) but IP-clefts are also found (it was the policeman who desired that there should be no conversation with Mrs. Maybrick about this?), and there are a few examples of discontinuous clefts where both the first and the second part contain new information (it was only his liver that was out of order). Extensions of the it-cleft pattern are frequent in Trials, as in And that was the tartar emetic that was given (th-cleft) and you are a person that everybody knows? (personal pronoun other than it + a general noun, such as person or man). Having demonstrated that it-clefts became more frequent in the 19th century, we may now ask what the reasons are for this change. According to Ball (1991: 509), there were changes and processes in information structure in Late Modern English, which may still be going on. The main function of it-clefts is that of expressing divided focus. It is simply a useful device to convey information. The focused element can be more important than the information in the that/wh-clause but the

that/wh-clause can also contain the message. Generally, 19th-century it-clefts seem to be more complex structurally and informationally than PresE examples. This is not only the case in writing (see Collins 1991: 203–207 and Prince 1978: 886) but also in speech-related texts; compare the following example from Trials: Was it on that evening on which the conversation took place that he told you about the thirteen and the seven? The fact that some it-clefts are more complex in 19th-century English could suggest that they have a different function in conveying information than in PresE and that the it-cleft construction, which is fairly short in PresE, is not ‘formalised’ (it is/was ... that) to the same extent in 19th-century English.
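As an illustrative aside, the normalised frequencies quoted in this conclusion can be compared in a few lines of Python. This is a minimal sketch and not part of the original study: the function name per_100k and the raw counts used to demonstrate it are hypothetical, and only the four rates per 100,000 words are taken from the text above.

```python
def per_100k(hits: int, corpus_words: int) -> float:
    """Normalise a raw hit count to a rate per 100,000 words."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return hits * 100_000 / corpus_words


# Rates per 100,000 words as reported in the conclusion above.
RATE_19C = 32             # 19th-century corpus
RATE_LME_EMODE = 3        # Late Middle English and Early Modern English data
RATE_SPEECH_RELATED = 47  # 19th-century speech-related genres
RATE_WRITTEN = 16         # 19th-century written genres

c19_vs_earlier = RATE_19C / RATE_LME_EMODE
speech_vs_written = RATE_SPEECH_RELATED / RATE_WRITTEN

print(round(c19_vs_earlier, 2))     # 10.67
print(round(speech_vs_written, 2))  # 2.94, i.e. "nearly three times"

# Hypothetical raw counts, only to show how such a rate is derived:
# 160 it-clefts in a 500,000-word sample give 32 per 100,000 words.
print(per_100k(160, 500_000))       # 32.0
```

Working with rates rather than raw counts is what makes subcorpora of different sizes comparable in the first place.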

References

Ball, C. N. 1994a. Relative pronouns in It-clefts: The last seven centuries. Language Variation and Change 6: 179–200.
Ball, C. N. 1994b. The origins of the informative-presupposition It-cleft. Journal of Pragmatics 22: 1–38.
Ball, C. N. 1991. The Historical Development of the It-Cleft. PhD dissertation, University of Pennsylvania.
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. Longman Grammar of Spoken and Written English. London: Longman.
Breivik, L. E. 1986. Some remarks on cleft sentences in present-day English. In Linguistics across Historical and Geographical Boundaries. In honour of Jacek Fisiak on the occasion of his fiftieth birthday, Vol. 2, D. Kastovsky & A. Szwedek (eds), 815–826. Berlin: Mouton de Gruyter.
Collins, P. C. 2005. Cleft and pseudo-cleft constructions in English spoken and written discourse. In Corpus Linguistics: Readings in a widening discipline, G. Sampson & D. McCarthy (eds), 85–94. London: Continuum.
Collins, P. C. 1991. Cleft and Pseudo-Cleft Constructions in English. London: Routledge.
Culpeper, J. 2001. Language and Characterisation. People in plays and other texts. London: Pearson Education.
Declerck, R. 1984. The pragmatics of it-clefts and wh-clefts. Lingua 64: 251–289.
Denison, D. 1993. English Historical Syntax. London: Longman.
Görlach, M. 1999. English in Nineteenth-Century England. Cambridge: CUP.
Halliday, M. A. K. 1994 [1985]. Introduction to Functional Grammar. 2nd ed. London: Edward Arnold.
Huddleston, R. & Pullum, G. K. 2002. The Cambridge Grammar of the English Language. Cambridge: CUP.
Jacobsson, B. 1994. Nonrestrictive relative that-clauses revisited. Studia Neophilologica LXVI(2): 181–195.
Johansson, C. 2006. Relativizers in 19th-century English. In Nineteenth-Century English: Stability and change, M. Kytö, E. Smitterberg & M. Rydén (eds). Cambridge: CUP.
Johansson, M. 2002. Clefts in English and Swedish. A contrastive study of it-clefts and wh-clefts in original texts and translations. PhD dissertation, Lund University.
Kytö, M., Rudanko, J. & Smitterberg, E. 2000. Building a bridge between the present and the past: A corpus of 19th-century English. ICAME Journal 24: 85–97.
Pérez-Guerra, J. 1999. Historical English Syntax. A statistical corpus-based study of the organisation of early modern English sentences [Lincom Studies in Germanic Linguistics 11]. Munich: Lincom.
Prince, E. 1978. A comparison of WH-clefts and it-clefts in discourse. Language 54: 883–906.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. 1985. A Comprehensive Grammar of the English Language. London: Longman.

Place and time adverbials in native and non-native English student writing
William J. Crawford
Northern Arizona University, USA

This chapter builds on previous research that has established the spoken nature of learner writing by providing quantitative and qualitative accounts of time and place adverbs in student writing in comparison to published academic English writing and native English conversation. The chapter shows that the frequency differences among learner groups are not nearly as great as the frequency differences between student writing and conversation. The qualitative analyses point to some L1-L2 differences, particularly with respect to here. The other most pronounced differences were not found as L1-L2 differences but instead showed evidence of divergence due to language background.

.

Introduction

The popularity of Computer Learner Corpus (CLC) research has had an increasing influence in the areas of second language pedagogy and second language acquisition. Learner corpora have been used to guide reference material designed for second language learners (for example, the Longman Essential Activator 1997 and the Cambridge Advanced Learner’s Dictionary 2003) and to inform research in second language classrooms (Flowerdew 2001; Horvath 2001). CLC data has also guided research in Second Language Acquisition (SLA) to address, for example, issues such as native language transfer (Altenberg 2002) and developmental theories on the acquisition of lexical aspect (Housen 2002). In its relatively short lifespan, CLC research has informed both the areas of skill and form acquisition in second language contexts and has contributed to a more profound understanding of learner language. Another area where such insights are apparent is found in studies illustrating the “spoken nature” of learner writing. The vast majority of these studies are led to this conclusion by noting differences in the frequency of linguistic features reflective of spoken language relative to the use of native writers of English (Biber & Reppen 1998; Granger & Rayson 1998; Petch-Tyson 1998; Aijmer 2002; Hinkel
2002). The implications of this type of research support the position that linguistic behavior can be tied to explanations such as general experience in writing (Granger & Rayson 1998), a restricted access to more elaborated forms of English (Hinkel 2002), or even to a lack of register awareness (Altenberg & Tapper 1998). Regardless of the explanations for why learners use more spoken features in their writing, frequency counts of lexical and syntactic features are a defining characteristic of a good deal of CLC research and many studies have used frequency counts to support the view that learner writing uses lexical and clausal features at levels that are more comparable to spoken forms of language than to written ones. Sylviane Granger (1998, 2005) has discussed four distinguishing characteristics of CLC studies: (1) they contain language collected in a naturalistic setting; (2) they contain sizeable amounts of data; (3) they control for a number of variables; and, (4) they use computers to automate a number of tasks that are possible with numerous software applications. The increased availability of such large samples of learner language has allowed second language researchers to investigate a wide range of issues in both second language teaching and second language acquisition; some researchers, however, have cautioned against the potential problems associated with an overreliance on automated analyses. For example, Nesselhauf (2004) maintains that the use of automatic analyses has resulted in an over-abundance of studies that concentrate on frequency information and such a criticism suggests that CLC research would benefit from a wider range of analyses that are not (yet) possible using computer software. 
Motivated by studies which have characterized spoken features of learner writing as well as the suggestion to go beyond frequency discussions of learner language using automated analyses, the present study builds on previous research by providing a quantitative and qualitative account of time and place adverbs in student writing in comparison to published English writing and native English conversation. It is not unreasonable to assume that, if frequency counts in learner writing are truly reflective of conversation, functional comparisons should also be reflective of conversation. The automated frequency counts prevalent in many previous CLC studies have not allowed for the possibility that a given learner feature may share a conversational frequency but not a conversational function. For example, the frequency of the time adverb now in learner writing may be similar to what is found in conversation; however, it may turn out that learners are using now to serve functions that are more characteristic of written language than functions which are typical in conversation. A study that uses both quantitative and qualitative analyses is quite useful to shed more light on the conversational nature of learner writing. Using the Longman Grammar of Spoken and Written English (LGSWE, Biber et al. 1999) as a guide, this paper first identifies two place adverbs, here and there, and two time adverbs, now and then, all of which are markedly more frequent in conversation than in academic writing. A functional typology of these adverbs is then developed which permits qualitative comparisons of these adverbs in a corpus of “inexperienced” writers taken from the International Corpus of Learner English (ICLE) and the Louvain Corpus of Native English Essays (LOCNESS) with those of published academic writing and face-to-face conversation. The study addresses two questions: (1) To what extent do the frequency and function of the selected adverbs in the writing of L2 English student writers differ from what is found in L1 English student writers? and (2) To what extent do the frequency and function of the selected adverbs in the student writers (in both L1 and L2 English) differ from what is found in native English conversation and academic writing? The first question directly compares both L1 and L2 student writing as is done in the majority of studies mentioned above (Granger & Rayson 1998; Petch-Tyson 1998; Aijmer 2002; Hinkel 2002). The second question compares the “inexperienced” (i.e., student) writers with existing native corpora in different registers (as is done in Biber & Reppen 1998). In the following section of the paper, I will review a number of studies that have identified lexical items reflective of spoken language and then describe some examples of previous CLC studies that have included qualitative analyses of learner writing. I will then describe the corpora used in the study and explain the method used to provide a functional taxonomy of the lexical items here, there, now, and then. The results of the study and discussion of the findings follow. The paper concludes with a review of the general findings of the study and discusses some possible implications of this study for future research.

Previous work

CLC studies comparing L1 and L2 writing can be categorized into a number of different types. The majority of CLC studies have compared the frequency of a feature (or set of features) in corpora of L1 and L2 student writers. This type of research provides a clear comparison of what inexperienced (i.e., student) writers may do relative to each other but does not compare L2 writing to what is found in published writing. Another approach used in CLC research is to compare L2 student writers with published academic writing or native-speaker conversation in order to illustrate the potential disparity between established characteristics of published writing and L2 writing. One common conclusion reached in these two approaches is the similarity between non-native writing and spoken language and researchers have proposed a number of reasons for this “defining characteristic” of L2 writing. A third approach uses previously established functions associated with a given feature in both L1 and L2 student writing and compares the functions in L1 and L2 learner writing. In the following paragraphs, I briefly review studies
of these different approaches and then motivate the approach used in the present study, an approach which combines both frequency and functional analyses of L1 and L2 student writing as well as larger corpora of native English conversation and academic writing. Studies comparing L1 and L2 student writers have covered a wide range of lexical features. Granger and Rayson (1998) and Hinkel (2002) both have documented the high frequency of general or vague nouns such as people, things, and stuff as well as the indefinite and assertive pronouns such as everyone, anybody, and something in learner language. These same studies, along with Petch-Tyson (1998), have also mentioned that first and second person pronouns, which are more frequent in conversation, occur at a higher frequency in learner writing. Lexical features related to the verb and the verb phrase have also been used to support the view that written learner language is similar to spoken language. Aijmer (2002) looked at different ways of expressing modality in L1 and L2 student writing and reasoned “. . .it may be that non-native speakers use modal expressions that are closer to the informal registers and to speech and that native speakers are more formal” (58). Hinkel (2002) found that modals of possibility were used more often by Chinese, Japanese, and Korean speakers and that the modals may and might were not as common as can and could. Hinkel also reported that certain verb types which were overused by learners, particularly private verbs (believe, think) and expecting/wanting/tentative verbs (attempt, desire, like, plan, try) are more common in speech. Adverbs and adverbials have also been popular candidates in CLC research. Hinkel (2002) reported that amplifiers (absolutely, severely), emphatics (a lot, for sure), and adverbs of cause (because, since) are used more often by learners.
Granger and Rayson (1998) found that the subordinators if and because were reflective of spoken language and that short adverbs such as also, so, very, even, and more were “speech-like adverbs” (pp. 127–128). While these studies have been most concerned with L1 and L2 student writing comparisons, they have also used existing studies to support the speech-like or informal register explanations of L2 writing. Interestingly, and most likely due to the available research at the time, most of these studies establish spoken norms by reference to research that is either not corpus-based, or based on very small corpora to support their conclusions (Brown & Levinson 1987; Chafe 1970; Chafe 1982; Chafe & Danielewicz 1987). The only large corpus-based study in this group is Biber’s Variation Across Speech and Writing (1988). A representative corpus-based study that compares L2 writing with native-speaker norms can be found in Biber and Reppen (1998). This study focused on clausal features of learner language and showed that French, Spanish, Chinese, and Japanese learners’ use of that deletion in complement clauses patterns more closely with conversation than with news and academic writing as found in the LGSWE corpus.
They concluded that, “. . .the patterns of use in the learner essays are very similar to those found in native conversation and fiction, but strikingly different from those found in academic prose” (157). In addition to the frequency-based studies, other research has focused on functional comparisons of learner writing. These studies are included here not because they make claims about the similarity between learner writing and conversation but because they compare the functions of a given feature in L1 and L2 writing and note the need for further functional comparisons. In an extensive comparative study of adverbial connectors, Altenberg and Tapper (1998) used four corpora: (1) the Swedish component of the ICLE by Swedish L1 speakers; (2) the French component of the ICLE; (3) an L1 Swedish corpus of essays; and, (4) the LOCNESS. They reported that both Swedish and French learners overused adverbial connectors compared to native English speakers (with the French learners using more than the Swedish learners). One aspect of this study that is especially relevant to the present investigation is the functional analysis of adverbial connectors provided by Altenberg and Tapper. Using the functions of adverbial connectors proposed by Quirk et al. (1985) (listing, summative, appositive, resultative, inferential, contrastive, transitional), they showed that when writing in their native language, the Swedish learners used the appositive types (for example, namely) more than native speakers, but in their L2 English writing, the Swedish learners used fewer resultative (as a result, consequently) and contrastive (by contrast, however) connectors than native English speakers. In their conclusion, Altenberg and Tapper state that “. . .further research is needed which takes into account not only the quantitative aspects of connector usage but also its qualitative aspects – how connectors are actually used by learners” (92). 
In a subsequent study of adverbial connectors used by Hungarian learners of English, Tankó (2004) found that the learners used adverbial connector types at a proportion similar to that of native speakers, but that they also used more tokens of each type. Using the same set of functions adopted by Altenberg and Tapper, Tankó showed that the learners preferred connectors that fulfilled a listing and contrastive function to a much greater degree than native speakers. Tankó ends the paper with a call for “[a] comparative study of texts written by expert academic writers” (178). As evidenced by this short literature review, CLC studies on L2 writing have identified a range of linguistic features in L2 writing that share frequency counts found in native conversation. Functional accounts of learner writing have also contributed to these comparisons but few make explicit mention of the functions of a selected feature in spoken or written language. An important issue that these studies rarely address is the possibility that learners are using a given lexical item at a frequency similar to that of conversation but are employing the functions associated with academic writing and not with conversation. The present study addresses this concern by comparing both the frequency of four adverbs and the frequency of the functions associated with them to L1 student writers, published academic writing, and native conversation. In the following section, I motivate the selection of adverbs, describe the corpora used for the present study, and describe the methodology used for arriving at a set of functions for each adverb.

Adverb selection, corpora and methodology

This study considers four adverbs which were selected on the basis of their high frequency in conversation and their relatively low frequency in academic writing as reported in the LGSWE. Table 1 provides the frequency distribution of the selected adverbs in academic writing as well as in conversation (both American and British English).1 All of these are much more frequent (at least four times more common) in conversation than in academic writing.

Table 1. Frequency of time and place adverbs in the LGSWE (pp. 796–797) per 100,000 words

Adverb type   Adverb   Academic writing   Conversation (American/British)
Place         here     40                 260/180
Place         there    10                 400/360
Time          now      40                 180/240
Time          then     60                 260/320
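The "at least four times more common" claim can be checked directly from the Table 1 figures. The following is a minimal sketch, not part of the original study; the dictionary layout and function name are illustrative assumptions.

```python
# Table 1 frequencies per 100,000 words, as reported from the LGSWE.
# Conversation values are given as (American, British).
TABLE_1 = {
    "here":  {"academic": 40, "conversation": (260, 180)},
    "there": {"academic": 10, "conversation": (400, 360)},
    "now":   {"academic": 40, "conversation": (180, 240)},
    "then":  {"academic": 60, "conversation": (260, 320)},
}


def conversation_ratios(table):
    """Return {adverb: (AmE ratio, BrE ratio)} of conversation to academic writing."""
    return {
        adverb: tuple(conv / row["academic"] for conv in row["conversation"])
        for adverb, row in table.items()
    }


ratios = conversation_ratios(TABLE_1)
for adverb, (ame, bre) in ratios.items():
    print(f"{adverb:<6} AmE x{ame:.1f}  BrE x{bre:.1f}")
```

The smallest ratio in either variety is the American figure for then (260 against 60), which is still above four.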

Table 2 describes the corpora used for the present study. The L2 corpora were selected to reflect languages with different typological backgrounds (Germanic (German), Romance (Spanish), and Slavic (Bulgarian)). The L1 corpus used for this study was the American argumentative writing component of the LOCNESS. Despite the differences in the four L1 and L2 student writing corpora with regard to number of texts, number of words and average number of words per text, the assumption adopted for this chapter is that the corpora are comparable; all student essays are of a similar genre (i.e., argumentative essays) and thus merit comparison in line with a large body of existing research which uses these same texts (see the studies in Granger 1998, 2002, for example). Table 2 also provides information on the LGSWE corpus that was used for comparative purposes.

1. Both American and British counts are included so that the reader can compare the frequency of a given adverb with either variety of English.

Before describing the method used to determine the frequency and identify the range of functions associated with here, there, now, and then, it is important to note that these lexical items do not always function as time and place adverbs. From a broad descriptive perspective, all four words involve deictic reference: here


Place and time adverbials 

Table 2. Corpora used in the study

Corpora                      # of texts   # of words   Average words per text
German (ICLE)                434          224,974      518
Spanish (ICLE)               198          133,315      673
Bulgarian (ICLE)             302          200,927      665
English (LOCNESS)            176          149,574      850
Conversation (LGSWE)*        717          4,964,808    6,924
Academic Writing (LGSWE)*    152          5,388,439    35,450

*These counts are per file not per text

and there entail deictic reference to a location relative to a participant in the speech event, typically the speaker (Levinson 1983: 62); now and then have reference to time relative to a temporal reference point. Typically, this point is the moment of utterance (Levinson 1983: 54, 62). From a more restricted perspective, these words may exhibit differences in the reference point involved (e.g., the place adverb here may serve as a discourse referent or as a place referent) or fulfill a non-deictic function altogether (e.g., the time adverb then may function as a linking adverbial or occur in fixed phrases such as “now and then” or “then again.”).2 Thus, although there is a generally understood or “core” meaning for these time and place adverbs, the core use is only one of many functions associated with each adverb. The four lexical items under investigation here are referred to as adverbs in reference to this core sense of the word for the sake of convenience. A detailed functional analysis shows that they have a much wider functional distribution than their use as time and place adverbs – they also have adverbial functions as well as discourse and grammatical functions that are not associated with the lexical class of adverbs. For the quantitative analysis, the software program Monoconc Pro 2.0 (Barlow 2000) was used to collect tokens and provide frequency counts of the four adverbs in the L1 and L2 student corpora. For the qualitative analysis, tokens were extracted from the corpora using Monoconc Pro 2.0 and saved as text files. Each token was then analyzed in context to identify its function. In some cases, functions were identified by reference to previous work; in other cases, substitution of synonymous forms was used to identify the functions of each lexical item. All of the L1 and L2 tokens were analyzed as well as 500 tokens of each adverb that were randomly sampled from both the conversation and academic writing LGSWE corpora. 
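A minimal sketch of how such a random sample might be drawn, assuming the concordance lines for one adverb have been exported to one list per sub-corpus and that the sample is split evenly between them. The pool names, the placeholder lines, and the fixed seed are hypothetical; the study itself used Monoconc Pro output, not this code.

```python
import random


def sample_tokens(pools, total, seed=0):
    """Draw `total` tokens, split evenly across the concordance pools."""
    rng = random.Random(seed)          # fixed seed so the sample is repeatable
    share = total // len(pools)        # equal share per pool
    sample = []
    for name, lines in pools.items():
        k = min(share, len(lines))     # a pool smaller than its share is taken whole
        sample.extend((name, line) for line in rng.sample(lines, k))
    return sample


# Hypothetical concordance lines standing in for real corpus output.
pools = {
    "AmE": [f"AmE concordance line {i}" for i in range(2000)],
    "BrE": [f"BrE concordance line {i}" for i in range(1500)],
}
sample = sample_tokens(pools, total=500)
print(len(sample))
```

Sampling without replacement within each pool guarantees that no token is analyzed twice.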
The counts are therefore not exhaustive and are merely meant to reflect a proportion of their functions in a 500-token sample with half the examples taken from British English and half from American English.

2. For the purposes of this study, phrases such as now and then, now again, then again, and here and there, were omitted.

The amount of contextual


information used to identify the function of each token varied as is illustrated in the text samples below. In some cases, the function was fairly easy to identify (as in (1) below) and not much context was needed; in other cases, a good deal more context was used (as in (10) and (11) below). In a very few cases, it was necessary to go back to the corpus and consider more context in order to reliably identify a function. These functions were primarily identified by the researcher. In cases where there was some confusion or potential overlap between possible functions, another applied linguist was consulted before the final classification was made. In the following sections, I outline a proposed set of functions for here, there, now, and then.

Functions of here

The adverb here can be used as: 1) a reference to a physical, metaphysical, or unspecified location; or 2) a metadiscursive reference to a place in the essay or to a verbal statement. Examples (1)–(2) illustrate its first use and Example (3) demonstrates the second use. In (1), the adverb here is used to refer to a physical location that has been established in the previous sentence (their table). In (2), here makes reference to a nonphysical place that can be described as ‘our consciousness’ or ‘our everyday life.’

(1) We went over to their table and asked them how long they have been waiting for their drinks. They have been sitting here for almost an hour. L1 German

(2) Although we have been very lucky to be born in the Age of technology, this does not mean we must forget what we are, what we want and that television must be here to be useful for the masses and not to be their opium. L1 Spanish

This contrasts with here when clearly used in a metadiscursive sense. This is seen in Example (3) where the writer uses here to refer to the previous complex point that further exploration and desire to learn results from a deeper understanding of a particular topic.

(3) Then comes etiquette with its “frigid air” of formality and, of course, I shouldn’t miss the conventional concepts that hinder the development of science and art. If you allow me a little deviation here, I would like to elaborate on the “conventional concepts” because I find them important and quite relevant with our topic of discussion. L1 Bulgarian <ICLE-BG-SU-1065>

A methodological note worthy of mention is that the metadiscursive function of here was only counted if the writer made explicit reference to the essay/ conversation itself as in (3). All other cases were counted as a place reference.


There was one example in American conversation of here used in the metadiscursive sense (What I’m saying here is that Congress has mandated that the benefits are to be given. Congress has mandates, and mandated that they deserve the benefits of what it says.).

Functions of there

The adverb there also had two functions: 1) locative there and 2) existential there. Locative reference is illustrated in (4), where the place adverb there refers to the noun jail found in the previous sentence. This contrasts with the existential use in (5) and (6), where there has no previous noun phrase referent; instead, these examples show a use which expresses the existence (or non-existence) of something. Example (5) illustrates the typical type of existential construction with there occurring with a form of the verb be. While the vast majority of examples of existential constructions did contain some form of the verb be (the LGSWE reports that existential use without be is rare), examples such as (6) were also included as existential.

(4) People who go to Jail do not rehabilitate, quite the opposite. Sometimes non guilty persons go to prison and they become corrupted there. L1 Spanish

(5) But she intends to return home in 1993, after having married her fiance Goran; a marriage I am looking forward to. Maybe there is a chance to visit my courageous and remarkable, rosy-cheeked and snub-nosed and frecklefaced friend in her native country in 1994. L1 German

(6) When man and woman decide to live together there comes up the question of faithfulness and love and it is not possible to live in total freedom because there is a certain obligation to the partner, one has to care for his/her partner. L1 German

Functions of now

The functional types of now were divided into four categories: 1) reference to present time or to metadiscursive time; 2) a linking adverbial describing a series of steps or processes that could be replaced by next/the next point is; 3) a result adverbial that could be replaced by so/therefore; and 4) a discourse marker that was found to occur only in the conversation registers. In Example (7), the writer uses now and the time adverbial in the recent past to illustrate a temporal change in the types of school discussions from the past to the present. Also included in this category was the metadiscursive now referring to a


present point in the discourse. This is illustrated in Example (8) where the writer uses turning now to shift the topic to another group of ‘household carers.’

(7) With the current emphasis on moral education, values clarification, and death education, religious values are bound to be brought up and discussed more often in school now than they have been in the recent past. L1 English

(8) Small wonder if tempers are sometimes frayed and words or actions rough. These are the women who cry alone at night. And life for their male counterparts may be no less bleak. Turning now to consider the other main group of younger household carers, there are an increasing number, currently about 11 per cent, of elderly people living with younger people, usually daughters and sons-in law. (Academic Writing)

A second function of now is found in its use as a linking adverbial. In Example (9) the writer uses now to introduce a further question concerning censorship where now is doing some procedural work in the essay; instead of referring to a present belief or state of being, it is used as a cohesion device to introduce a further point in the essay. In this function, now could be paraphrased as next or the next point is. (9) Two questions have been dealed: why and where. Now comes the third one: what contents should be censored? The answer seems to be the most difficult. L1 Spanish

A third function shows now as a result adverbial with the meaning of so or therefore. In (10), the point discussed after now (i.e., the participation of the lawyers was due to the fact that they were paid) is related to the previous point (i.e., some criminals avoid conviction because they are able to pay for expensive lawyers). In this sense, what follows now is a point about how rich people can afford effective lawyers, but what precedes now is the idea that rich people are more likely to appeal a case successfully.3

(10) Another reason in which society does not need a death penalty is because a death penalty really is not “full proof” and consequently is not the correct choice for a punishment. First of all, many criminals sentenced to death can get out because of how much money they have. It has been proven that many criminals have been able to successfully appeal the death sentence because of the expensive, good lawyer that they had. Now, the only reason that they had these lawyers was because of the money they were able to pay them. L1 English

3. The LGSWE mentions a similar function of now as “an utterance launcher” which “often marks a return to a related subject and at the same time a new departure” (1088).


The final function of now is its use as a discourse marker. The definition and status of discourse markers in general is open to a good deal of interpretation (see Fraser 1999 and Muller 2005 for good overviews). While Fraser (1999), for instance, argues against the status of now as a discourse marker, Schiffrin (1987) discusses the use of now as a discourse marker which shows a “speaker’s progression through the discourse time of comparison” (p. 236). She discusses a number of different types of functions related to level of explicitness, syntactic location, and completion of comparisons, but in this paper they are placed into one general group and are categorized as such by their exclusion from the three previously mentioned functions.

An example of the discourse marker use is found in (11), where speaker A, in discussing scheduling at work, shifts from a general assessment ‘. . .roughly thirty one or thirty five to forty hours a week’ to the more specific assessment of the previous proposition; namely that speaker A will ‘. . .probably end up with more than that. . .’ In Schiffrin’s terms, the speaker uses now to signal an “ideational” shift from the “general truth” expressing the number of hours that all workers are subject to, to the more specific result of the previous proposition. In this example, now does not relate to present time, nor does it function as a linking or result adverbial that can be replaced by the synonyms next/the next point is or so/therefore.

(11) A: I mean, you’re on-, they’re only scheduling everybody for about roughly thirty one or thirty five to forty hours a week. Nobody is being scheduled for anything more than that.
B: I know.
A: Including myself so ... and I think I’m only scheduled for thirty one hours next week. Now I’ll probably end up with more than that, but I’m probably never going to end up with like sixty in a week. I mean I have been consistently getting at least one day off, where there’s just no need and we have extras anyway because they go ahead and they schedule two tops of the triangles and they only use one, so we have extra to begin with. (American Conversation)

Functions of then

Finally, then is used to realize three different functions: 1) a time adverb; 2) a linking adverbial with the meaning of next or the next point is; and, 3) a linking adverbial to mark result or conclusion with the meaning of so or therefore. Example (12), where the writer is using both a past and a present time adverb (then and now) to relate a state of affairs that has not changed over time, illustrates the use of then as a time adverb.


(12) “Ralph’s Story” was written in the last decade and is more contemporary than Betty Friedan’s work, but the similarity is unmistakable. Men and women, then as now, are trapped by the expectations of others. L1 English

Example (13) shows the use of then as a linking adverbial used to illustrate a sequence of steps or processes. In this example the writer is describing a new type of composite religion (Hindooslimianity) as one would explain a ‘recipe.’ (13) Today, there is a recipe for everything: you take an ounce of Christianity, a pinch of Islam, a little bit of Buddhism, then you add some blood of a black rooster and - abracadabra - you have Hindooslimianity, the religion in which “anything goes”. L1 German

The final function of then is its use as a linking adverbial to mark result or consequence as illustrated in (14)–(15). In (14), then co-occurs with the subordinator if to show a relationship between one idea (having ‘a healthy current account’) and another (regarding ‘yourself as being lucky’). The LGSWE refers to the if. . . then chain as a “correlative subordinator” (86). (14) If you are one of those who have a healthy current account, then, you could regard yourself as being extremely lucky. Immediately, all doors which were shuted up for other people, will open just for you. L1 Spanish

In (15), the adverb is used to indicate a connection between the stated question (how do artists manifest what is in their imagination) and the comparison of the ways different types of artists express themselves in different time periods.

(15) And what are the artistic manifestations but the attempt to evoke those dreams, to express those fictitious ideas that wander in the mind of the artist? Let us compare, then, the amount of artists and movements within the different dimensions of art (literary, musical, pictorial, ...) that existed in the previous centuries with those that occur now, in the current century. L1 Spanish

As with now, the linking adverbial function of then is determined by reference to the synonyms so or therefore. The functions proposed for these four adverbs are summarized in Table 3.


Table 3. A functional typology of here, there, now, and then

Adverb type   Adverb   Function
Place         here     physical/metaphysical/unspecified place
                       metadiscursive referent
              there    physical place
                       existential
Time          now      reference to present time
                       linking adverbial in a series
                       linking adverbial of result or consequence
                       discourse marker
              then     reference to past time
                       linking adverbial in a series
                       linking adverbial of result or consequence

Results and discussion

With the aid of Monoconc Pro 2.0 (Barlow 2000), the frequency of the four adverbs was calculated across the four corpora. Table 4 provides the raw and normalized frequency counts for the time and place adverbs for all student writing corpora, as well as the normalized frequencies in academic writing and conversation. The frequency comparison shows no overall pattern of L1-L2 difference. In fact, while there are some cases where the L2 writers use a lexical item with a greater frequency than the L1 group (e.g., Bulgarian here and German there), there are other cases where the L1 writers actually exceed the frequency counts of the L2 writers (as with now or, more marginally so, then). Furthermore, instances of higher frequency do not extend to all L2 groups. In the instance of here, for example, while the Bulgarian use is over twice as high as that of the L1 writers, the other two L2 groups are around, or below, the frequency of the L1 group. The same trend is found in there as well, with a higher frequency in the German writers but not the Spanish or Bulgarian writers. With now and then there is also no clear L1-L2 difference. The only potential pattern found with these lexical items is the consistently lower frequency for both found in the Spanish writers while the German and Bulgarian writers are closer to (and just below) the L1 frequencies.

Finally, the frequency counts also illustrate that both the L1 and L2 student writers use these adverbs at a frequency that is generally closer to academic writing than to conversation. Although there are cases where the counts are above academic writing (as is particularly apparent with now and then), there are also cases where the frequencies are below academic writing (as with here in the L1, German, and Spanish writers). In fact, no student writer group reached frequency levels similar to those found in conversation for any of the four adverbs.


Table 4. Frequency of selected lexical items normalized to 100,000 words (raw counts in parentheses)

Adverb   German     Spanish    Bulgarian   English     Academic Writing   Conversation
         (ICLE)     (ICLE)     (ICLE)      (LOCNESS)   (LGSWE)            (LGSWE – American/British)
here     24 (55)    32 (42)    67 (134)    29 (44)     40                 260/180
there*   68 (154)   27 (35)    46 (92)     31 (47)     10                 400/360
now      92 (207)   75 (99)    92 (186)    112 (167)   40                 180/240
then     109 (246)  88 (117)   113 (228)   118 (176)   60                 260/320

*Note: the raw counts and normalized frequencies of there include only the locative counts in order to allow frequency comparisons with the LGSWE.
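The normalization behind Table 4 can be sketched as the raw count divided by the corpus size from Table 2, multiplied by 100,000. This is a reconstruction under that assumption, not the author's own calculation; the variable names are illustrative, and the recomputed values agree with the printed ones only to within about one occurrence per 100,000 words.

```python
# Corpus sizes in words (Table 2) and raw counts (Table 4, in parentheses).
CORPUS_SIZES = {"German": 224_974, "Spanish": 133_315,
                "Bulgarian": 200_927, "English": 149_574}
RAW_COUNTS = {
    "here":  {"German": 55,  "Spanish": 42,  "Bulgarian": 134, "English": 44},
    "there": {"German": 154, "Spanish": 35,  "Bulgarian": 92,  "English": 47},
    "now":   {"German": 207, "Spanish": 99,  "Bulgarian": 186, "English": 167},
    "then":  {"German": 246, "Spanish": 117, "Bulgarian": 228, "English": 176},
}
# Normalized values as printed in Table 4, kept for comparison.
REPORTED = {
    "here":  {"German": 24,  "Spanish": 32, "Bulgarian": 67,  "English": 29},
    "there": {"German": 68,  "Spanish": 27, "Bulgarian": 46,  "English": 31},
    "now":   {"German": 92,  "Spanish": 75, "Bulgarian": 92,  "English": 112},
    "then":  {"German": 109, "Spanish": 88, "Bulgarian": 113, "English": 118},
}


def per_100k(raw_count, corpus_size):
    """Normalize a raw count to occurrences per 100,000 words."""
    return raw_count / corpus_size * 100_000


normalized = {
    adverb: {corpus: per_100k(n, CORPUS_SIZES[corpus]) for corpus, n in counts.items()}
    for adverb, counts in RAW_COUNTS.items()
}

for adverb, row in normalized.items():
    cells = "  ".join(f"{corpus}: {value:6.1f}" for corpus, value in row.items())
    print(f"{adverb:<6}{cells}")
```

Normalizing in this way is what allows corpora of very different sizes, such as the 133,315-word Spanish corpus and the multi-million-word LGSWE registers, to be compared on one scale.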

Table 5 presents the distribution of the proposed functions for each language group by percentage. The distribution in Table 5 illustrates a complex picture of both difference and similarity. For example, the L1 writers use the place adverb here to fulfill both functions of place adverbs equally; this is not found in the learner corpora where the pattern is for here to be used as a place reference more than in reference to the essay. Other differences in the student corpora do not seem to be related to overall L1-L2 differences but instead to language background. This is seen, for example, in the use of then as a linking adverbial where the distribution of series and result adverbials is not uniform across the groups of student writers. More specifically, while the Bulgarian and L1 English writers have a similar distribution of frequency of then, this does not extend to the other L2 groups. In addition to these L1-L2 differences, there are also cases where there is little variation between the L1 and L2 writers. This is seen in there and now where the distribution of functions between the L1 and L2 writers is more uniform.

A comparison of the student writers with academic writing and conversation shows a variable picture that includes cases which differ from academic writing (as in the L2 use of here); cases where all student groups have similar functional distributions to academic writing (as in there and now); and cases where the student writers seem to be patterning closer to conversation (as with then) than to academic writing. The quantitative and qualitative distribution for each adverb is discussed in more detail below. In the following discussion, numbers should be understood as normalized counts per 100,000 words (i.e., 67 means 67/100,000 words) unless otherwise stated.


Table 5. Functional distribution of adverb types by language background (by percentage)

Type    Adverb   Function                                      German   Spanish   Bulgarian   English   Acad. Writ.   Conversation
Place   here     Physical/metaphysical/unspecified place       71       74        74          50        40            99
                 Metadiscursive                                29       26        26          50        60            1
                 TOTAL                                         100      100       100         100       100           100
        there    Physical place                                17       6         10          10        6             66
                 Existential                                   83       94        90          90        94            34
                 TOTAL                                         100      100       100         100       100           100
Time    now      Reference to present time                     92       96        89          92        91            69
                 Linking adverbial in a series                 3        3         1           3         6             2
                 Linking adverbial of result or consequence    5        1         9           5         3             0
                 Discourse marker                              0        0         0           0         0             29
                 TOTAL                                         100      100       100         100       100           100
        then     Reference to past time                        25       23        16          9         13            6
                 Linking adverbial in a series                 46       18        23          25        41            58
                 Linking adverbial of result or consequence    29       59        61          66        46            36
                 TOTAL                                         100      100       100         100       100           100
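One rough way to summarize how close two columns of Table 5 are is to take half the summed absolute difference of their percentages. The sketch below applies this to here only. The distance measure is an assumption introduced for illustration, not the method used in the chapter, and such a crude summary cannot replace the qualitative reading of each function.

```python
GROUPS = ["German", "Spanish", "Bulgarian", "English", "Academic", "Conversation"]

# Percentages from Table 5, one inner list per function (TOTAL rows omitted).
TABLE_5 = {
    "here":  [[71, 74, 74, 50, 40, 99], [29, 26, 26, 50, 60, 1]],
    "there": [[17, 6, 10, 10, 6, 66], [83, 94, 90, 90, 94, 34]],
    "now":   [[92, 96, 89, 92, 91, 69], [3, 3, 1, 3, 6, 2],
              [5, 1, 9, 5, 3, 0], [0, 0, 0, 0, 0, 29]],
    "then":  [[25, 23, 16, 9, 13, 6], [46, 18, 23, 25, 41, 58],
              [29, 59, 61, 66, 46, 36]],
}


def column(adverb, group):
    """Return the functional distribution of one adverb for one group."""
    i = GROUPS.index(group)
    return [row[i] for row in TABLE_5[adverb]]


def distance(adverb, a, b):
    """Half the summed absolute difference, in percentage points."""
    return sum(abs(x - y) for x, y in zip(column(adverb, a), column(adverb, b))) / 2


# Each column should sum to 100, allowing one point for rounding.
column_sums = {(adv, g): sum(column(adv, g)) for adv in TABLE_5 for g in GROUPS}

for group in ["German", "Spanish", "Bulgarian", "English"]:
    to_acad = distance("here", group, "Academic")
    to_conv = distance("here", group, "Conversation")
    print(f"here, {group:<10} academic: {to_acad:4.1f}  conversation: {to_conv:4.1f}")
```

On this measure the L1 English distribution of here lies much nearer to academic writing than to conversation, while the three L2 groups lie somewhat nearer to conversation.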

The place adverb here

The student writers range from 24 to 67, although if the Bulgarian writers are removed, the range is from 24 to 32, which shows little variation in the frequency of here in the student writers. Furthermore, with the exception of the Bulgarian writers, all student writers use here slightly less than what is found in academic writing. While the Bulgarian writers are outside of this range, this number is still more than three times less frequent than in conversation. Consequently, the frequency of here in all four student corpora is much closer to published writing than to conversation.

The functional distribution of here across the four student corpora illustrates an L1-L2 difference. The three L2 groups all use here in reference to a place around 70% of the time and in reference to the essay around 30% of the time. The native speakers, however, are divided equally in their use of here in reference to the essay and as a place referent. In fact, L1 student writing patterns closer to academic writing where a 61% to 39% ratio was found. The functional analysis of the four student essay corpora thus suggests that the L2 groups are somewhere between academic writing and conversation. Furthermore, the L2 groups do not show much variation in the different functions.


The place adverb there

The frequency of there has a wider overall range in the student writers than was found with here. While the German writers, at 68, display a higher frequency of there than the other groups (similar to the Bulgarian frequency of here), the range of the remaining student writers (27–46) is also larger than what was found for here. As with here, the L1 writers are in the middle of the range which suggests no L1-L2 difference with respect to frequency. Finally, the frequency counts of there are much lower than what is found in conversation and instead are much closer to academic writing.

The functional distribution of there does not show the same L1-L2 difference as was found with here. In fact, with the possible exception of the German writers, the student writers show very little variation in the distribution of there, with existential there constituting around 90% of all instances. Furthermore, both the L1 and L2 student writers use there in a manner that is similar to what is found in academic writing. In fact, three of the four student writing groups pattern very closely to what is found in academic writing with one L2 group (Spanish) patterning exactly the same and two more (Bulgarian and native-speakers) using existential there 90% of the time and place referent there 10% of the time. The German writers, at 83% existential and 17% place referent, were somewhat different from the other groups of student writers but still closer to both academic writing and the other student groups than to conversation.

In summary, while the functions of here were different between the L1 and L2 writers, there demonstrated a much more similar functional distribution between the groups of student writers. Moreover, both groups of student writers used these two adverbs at a frequency and function that is closer to what is found in academic writing than to what is found in conversation.

Time adverb now

The frequency of now in the student groups has a different pattern than for here and there. First, the highest frequency (112) is found in the L1 writers, not the L2 writers. Second, the one L2 group that does not pattern like the rest (the Spanish writers at 75) uses now less frequently, not more frequently, as was found with here and there. The other groups of student writers fall into a moderately stable range (92–112). Moreover, compared to the two place adverbs where some groups had greater and some groups had lower frequencies compared to academic writing, all student groups were using now more than what is found in academic writing. The student frequencies exceed those found in academic writing but they do not reach the levels found in conversation; instead, they are between the two.


The functional distribution of now shows no L1-L2 difference: reference to present time is the dominant function, ranging from 89% to 96%. Furthermore, all student groups use now in a manner that is closer to academic writing than to conversation, with the most apparent difference being the fairly high percentage of now as a discourse marker in conversation and its absence in writing. There is some variation in the remaining adverbial functions, however, with academic writing preferring the series adverbial use and most student groups (with the exception of the Spanish writers) showing a preference for the result adverbial use.

Time adverb then

The frequency distribution of then shows a pattern similar to now. Again, the L1 writers have the highest frequency (closely followed by the Bulgarian and German writers respectively) with the Spanish writers below the other groups and all student groups above academic writing. Although all four student groups had higher frequency counts than academic writing, these counts are still much closer to academic writing than what is found in conversation. In fact, the conversational counts of then are over twice as high as what is found in any student writing.

The functional distribution of then illustrates that this adverb is the most complex in its distribution of functions. One apparent L1-L2 difference is in the reference to past time function where the L2 writers (16–25%) use this function more often than the L1 writers (9%). Another apparent difference is a preference for the series adverbial with the German writers (46%) compared to 18–25% for the other student writers who instead prefer the resultative adverbial (59–66% compared to the German writers at 29%). Thus, the German writers do not follow the general similarity of functional distribution found in the other three student groups. Furthermore, the student writers have a functional distribution of then which is not found in academic writing. This is most readily apparent in the two uses of then as a linking adverbial where academic writing has a fairly equal distribution of functions (41% series, 46% resultative) but the student writers do not. In fact, the functional distribution of then is the only one of the selected adverbs that does not have a single student group that shares similarities with academic writing.

The quantitative analyses presented in this study demonstrate quite clearly that, with respect to the frequency of here, there, now, and then, the writing of both L1 and L2 students is much closer to published academic writing than to conversation. Furthermore, the frequency differences among learner groups are not nearly as great as the frequency differences between student writing and published writing. This suggests that the situational variables associated with academic writing such as the lack of shared physical context and the ability to edit (as was permitted for all four student groups) may be better indicators of the frequency


of these lexical features than experience in writing (in a first or second language). It remains to be seen if other features pattern in a similar way. Further studies of lexical and grammatical items associated with spoken language conducted in a manner similar to the present study can help to shed light on this issue.

The qualitative analyses offered here point to a few cases of L1-L2 difference as seen in here, where the native speakers employed the stated functions equally while the learners preferred the use of here in reference to a place other than the essay and, more marginally, the use of then as a past time marker. The other most pronounced differences were not found as L1-L2 differences but instead showed evidence of divergence due to language background as was seen in time adverb then and, perhaps less convincingly, for the result adverbial now in the Bulgarian writers. These language background differences may be due to factors such as transfer or instruction. Further study that controls for these variables can provide a deeper explanation for the exact influence of language background. In addition to these differences, there was also a clear L1-L2 similarity in the functions of there as well as now as a present time referent.

In fact, of the four adverbs studied, only two had functional differences between the L2 writers and academic writing: the place adverb here and the time adverb then. With here, the difference seems to be the furthest from academic writing and it could be argued that the functions of this adverb pattern closer to conversation than to academic writing in L2 writing. Thus, of the four adverbs selected, only here could lend support to the ‘oral nature’ of student writing. Interestingly, with the exception of the Bulgarian writers, the frequency counts of here in the student writers are actually lower than what is found in academic writing while the functions are closer to conversation.

The time adverb then has a more differentiated pattern in functional use across L1 and L2 groups. Two possible explanations for these differences come to mind. First, the time adverbs had more functions associated with them than the place adverbs. A wider range of functional options available to the language users likely means that command of these functions is more difficult to acquire. Second, the overall frequency of the time adverbs was much greater than the frequency of place adverbs (see raw counts in Table 4). Thus, not only are there more functions associated with the selected time adverbs, but the sheer number is greater. This could mean that the writer has a greater likelihood to use them in ways that may not be target-like.

Conclusion

This study has provided both quantitative and qualitative analyses of a wide range of student and native corpora. The overall results show that inexperienced writers use these adverbs at a frequency that is much closer to academic writing than to conversation. L1-L2 differences may not be apparent in frequency counts. The qualitative analyses offered here point to some clear disparities, particularly with respect to the place adverb here, where the native speakers employed the stated functions equally while the learners preferred the place references of both adverbs. The L1-L2 difference was not seen as clearly with either the place adverb there or the time adverbs. Instead, the functional distribution of then was quite different from what was found in academic writing as well as in conversation – the writers were approximating neither. Thus, with respect to both frequency and function, there is no overall difference between L1 and L2 writing. This suggests that the situational variables associated with writing, such as the lack of shared physical context and the ability to edit, may be better indicators of the frequency and function of the time and place adverbs than variables such as experience in writing or ability to write in a second language. It remains to be seen if other features pattern in a similar way. Further studies of other lexical and grammatical items associated with spoken language, conducted in a manner similar to the present study, can help to address this issue.
There are a number of implications of this investigation that are worthy of mention. First, in order to determine the extent to which L1 student writing is similar to L2 student writing, a further comparison of both groups to other native speaker corpora would be helpful. A comparison of student writing alone may not indicate how close student writers are to, or how far they are from, the registers they aspire to. The present study used the LGSWE as a guide in this respect and was able to indicate general areas of over-/underuse as well as functional differences and similarities.
Second, for those interested in examining areas where L2 writing is different from L1 writing, functional explanations are clearly worthy of mention and are perhaps not as well represented in the literature as are straight frequency counts.
Finally, when we consider how to translate these findings to the teaching of writing, there are two issues to consider: 1) experience in writing will lead to decreased use of the features associated with spoken language; and 2) functional differences should be expressly taught. With regard to the first issue, there is obviously a good deal of writing development that occurs between student writing and published academic writing, and explicit instruction (possibly including analysis of these adverbs – or other features associated with spoken language – in student essays) may be a useful way to teach the form of written genres. With regard to the second issue, it may be the case that the use of these adverbs in published writing is associated with specific functions (e.g., then with the meaning of “as a consequence” or “therefore”; or here referring to a place in the essay) and that knowledge of these functions can help inexperienced writers to use these forms in an appropriate manner. In order to achieve this, a more extensive functional analysis of these adverbs in published writing would be necessary.


References

Aijmer, K. 2002. Modality in advanced Swedish learners’ written interlanguage. In Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching, S. Granger, J. Hung & S. Petch-Tyson (eds), 55–76. Amsterdam: John Benjamins.
Altenberg, B. 2002. Using bilingual evidence in corpus research. In Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching, S. Granger, J. Hung & S. Petch-Tyson (eds), 37–54. Amsterdam: John Benjamins.
Altenberg, B. & Tapper, M. 1998. The use of adverbial connectors in advanced Swedish learners’ written English. In Learner English on Computer, S. Granger (ed.), 80–93. New York NY: Longman.
Barlow, M. 2000. MonoConc Pro 2.0. Houston TX: Athelstan.
Biber, D. 1988. Variation across Speech and Writing. Cambridge: CUP.
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. Longman Grammar of Spoken and Written English. London: Longman.
Biber, D. & Reppen, R. 1998. Comparing native and learner perspectives on English grammar: A study of complement clauses. In Learner English on Computer, S. Granger (ed.), 145–158. London: Longman.
Brown, P. & Levinson, S. 1987. Politeness. Cambridge: CUP.
Cambridge Advanced Learner’s Dictionary. 2003. Cambridge: CUP.
Chafe, W. 1970. Meaning and Structure in Language. Chicago IL: University of Chicago Press.
Chafe, W. 1982. Integration and involvement in speaking, writing, and oral literature. In Spoken and Written Language: Exploring orality and literacy, D. Tannen (ed.), 35–53. Norwood NJ: Ablex.
Chafe, W. & Danielewicz, J. 1987. Properties of spoken and written language. In Comprehending Oral and Written Language, R. Horowitz & S. J. Samuels (eds), 83–113. San Diego CA: Academic Press.
Flowerdew, L. 2001. The exploitation of small learner corpora in EAP materials design. In Small Corpus Studies and ELT [Studies in Corpus Linguistics 5], M. Ghadessy, A. Henry & R. L. Roseberry (eds), 363–380. Amsterdam: John Benjamins.
Fraser, B. 1999. What are discourse markers? Journal of Pragmatics 31: 931–952.
Granger, S. 2005. Computer learner corpus research: Current status and future prospects. In Applied Corpus Linguistics: A multidimensional perspective, U. Connor & T. Upton (eds), 123–145. Atlanta GA: Rodopi.
Granger, S. (ed.). 1998. Learner English on Computer. London: Longman.
Granger, S., Hung, J. & Petch-Tyson, S. (eds). 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. Amsterdam: John Benjamins.
Granger, S. & Rayson, P. 1998. Automatic profiling of learner texts. In Learner English on Computer, S. Granger (ed.), 119–131. London: Longman.
Hinkel, E. 2002. Second Language Writer’s Text: Linguistic and rhetorical features. Mahwah NJ: Lawrence Erlbaum.
Horvath, J. 2001. Advanced Writing in English as a Foreign Language: A corpus-based study of processes and products. Pecs: Lingua Franca Csoport.
Housen, A. 2002. A corpus-based study of the L2-acquisition of the English verb system. In Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching, S. Granger, J. Hung & S. Petch-Tyson (eds), 77–116. Amsterdam: John Benjamins.
Levinson, S. 1983. Pragmatics. Cambridge: CUP.
Longman Essential Activator. 1997. London: Longman.
Muller, S. 2005. Discourse Markers in Native and Non-native English Discourse. Amsterdam: John Benjamins.
Nesselhauf, N. 2004. Learner corpora and their potential for language teaching. In How to Use Corpora in Language Teaching, J. M. Sinclair (ed.), 125–152. Amsterdam: John Benjamins.
Petch-Tyson, S. 1998. Writer/reader visibility in EFL written discourse. In Learner English on Computer, S. Granger (ed.), 107–118. London: Longman.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. 1985. A Comprehensive Grammar of the English Language. London: Longman.
Schiffrin, D. 1987. Discourse Markers. Cambridge: CUP.
Tankó, G. 2004. The use of adverbial connectors in Hungarian students’ argumentative writing. In How to Use Corpora in Language Teaching, J. M. Sinclair (ed.), 157–181. Amsterdam: John Benjamins.







In the series Studies in Corpus Linguistics (SCL) the following titles have been published thus far or are scheduled for publication:

31 Ädel, Annelie and Randi Reppen (eds.): Corpora and Discourse. The challenges of different settings. 2008. vi, 295 pp.
30 Adolphs, Svenja: Corpus and Context. Investigating pragmatic functions in spoken discourse. 2008. xi, 151 pp.
29 Flowerdew, Lynne: Corpus-based Analyses of the Problem–Solution Pattern. A phraseological approach. 2008. xi, 179 pp.
28 Biber, Douglas, Ulla Connor and Thomas A. Upton: Discourse on the Move. Using corpus analysis to describe discourse structure. 2007. xii, 290 pp.
27 Schneider, Stefan: Reduced Parenthetical Clauses as Mitigators. A corpus study of spoken French, Italian and Spanish. 2007. xiv, 237 pp.
26 Johansson, Stig: Seeing through Multilingual Corpora. On the use of corpora in contrastive studies. 2007. xxii, 355 pp.
25 Sinclair, John McH. and Anna Mauranen: Linear Unit Grammar. Integrating speech and writing. 2006. xxii, 185 pp.
24 Ädel, Annelie: Metadiscourse in L1 and L2 English. 2006. x, 243 pp.
23 Biber, Douglas: University Language. A corpus-based study of spoken and written registers. 2006. viii, 261 pp.
22 Scott, Mike and Christopher Tribble: Textual Patterns. Key words and corpus analysis in language education. 2006. x, 203 pp.
21 Gavioli, Laura: Exploring Corpora for ESP Learning. 2005. xi, 176 pp.
20 Mahlberg, Michaela: English General Nouns. A corpus theoretical approach. 2005. x, 206 pp.
19 Tognini-Bonelli, Elena and Gabriella Del Lungo Camiciotti (eds.): Strategies in Academic Discourse. 2005. xii, 212 pp.
18 Römer, Ute: Progressives, Patterns, Pedagogy. A corpus-driven approach to English progressive forms, functions, contexts and didactics. 2005. xiv + 328 pp.
17 Aston, Guy, Silvia Bernardini and Dominic Stewart (eds.): Corpora and Language Learners. 2004. vi, 312 pp.
16 Connor, Ulla and Thomas A. Upton (eds.): Discourse in the Professions. Perspectives from corpus linguistics. 2004. vi, 334 pp.
15 Cresti, Emanuela and Massimo Moneglia (eds.): C-ORAL-ROM. Integrated Reference Corpora for Spoken Romance Languages. 2005. xviii, 304 pp. (incl. DVD).
14 Nesselhauf, Nadja: Collocations in a Learner Corpus. 2005. xii, 332 pp.
13 Lindquist, Hans and Christian Mair (eds.): Corpus Approaches to Grammaticalization in English. 2004. xiv, 265 pp.
12 Sinclair, John McH. (ed.): How to Use Corpora in Language Teaching. 2004. viii, 308 pp.
11 Barnbrook, Geoff: Defining Language. A local grammar of definition sentences. 2002. xvi, 281 pp.
10 Aijmer, Karin: English Discourse Particles. Evidence from a corpus. 2002. xvi, 299 pp.
9 Reppen, Randi, Susan M. Fitzmaurice and Douglas Biber (eds.): Using Corpora to Explore Linguistic Variation. 2002. xii, 275 pp.
8 Stenström, Anna-Brita, Gisle Andersen and Ingrid Kristine Hasund: Trends in Teenage Talk. Corpus compilation, analysis and findings. 2002. xii, 229 pp.
7 Altenberg, Bengt and Sylviane Granger (eds.): Lexis in Contrast. Corpus-based approaches. 2002. x, 339 pp.
6 Tognini-Bonelli, Elena: Corpus Linguistics at Work. 2001. xii, 224 pp.
5 Ghadessy, Mohsen, Alex Henry and Robert L. Roseberry (eds.): Small Corpus Studies and ELT. Theory and practice. 2001. xxiv, 420 pp.
4 Hunston, Susan and Gill Francis: Pattern Grammar. A corpus-driven approach to the lexical grammar of English. 2000. xiv, 288 pp.
3 Botley, Simon Philip and Tony McEnery (eds.): Corpus-based and Computational Approaches to Discourse Anaphora. 2000. vi, 258 pp.
2 Partington, Alan: Patterns and Meanings. Using corpora for English language research and teaching. 1998. x, 158 pp.
1 Pearson, Jennifer: Terms in Context. 1998. xii, 246 pp.
