Connected Words
Language Learning & Language Teaching (LL&LT)
The LL&LT monograph series publishes monographs, edited volumes and text books on applied and methodological issues in the field of language pedagogy. The focus of the series is on subjects such as classroom discourse and interaction; language diversity in educational settings; bilingual education; language testing and language assessment; teaching methods and teaching performance; learning trajectories in second language acquisition; and written language learning in educational settings.
Editors
Nina Spada
Ontario Institute for Studies in Education, University of Toronto
Nelleke Van Deusen-Scholl
Center for Language Study, Yale University
Volume 24 Connected Words. Word associations and second language vocabulary acquisition by Paul Meara
Connected Words Word associations and second language vocabulary acquisition
Paul Meara Swansea University
John Benjamins Publishing Company Amsterdam / Philadelphia
The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.
Library of Congress Cataloging-in-Publication Data
Meara, P. M. (Paul M.)
Connected words : word associations and second language vocabulary acquisition / Paul Meara.
p. cm. (Language Learning & Language Teaching, ISSN 1569-9471 ; v. 24)
Includes bibliographical references and index.
1. Second language acquisition--Study and teaching. 2. Vocabulary--Study and teaching. 3. Language and languages--Study and teaching I. Title.
P118.2.M423 2009
418.0071--dc22 2009019814
ISBN 978 90 272 1986 2 (hb; alk. paper) / ISBN 978 90 272 1987 9 (pb; alk. paper)
ISBN 978 90 272 8907 0 (eb)
© 2009 – John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA
Table of contents
Acknowledgements
Introduction: Connecting words
section 1. Early work
chapter 1 Learners’ word associations in French
chapter 2 Word associations in a foreign language
section 2. Associations as productive vocabulary
chapter 3 Lex30: An improved method of assessing productive vocabulary in an L2
chapter 4 Exploring the validity of a test of productive vocabulary
section 3. Word association networks
chapter 5 Network structures and vocabulary acquisition in a foreign language
chapter 6 V_Links: Beyond vocabulary depth
chapter 7 A further note on simulating word association behaviour in an L2
section 4. Bibliographical resources for word associations in an L2
chapter 8 Word associations in a second language: An annotated bibliography
section 5. Software applications
chapter 9 Lex 30 v3.00: The manual
chapter 10 V_Six v1.00: The manual
chapter 11 WA_Sorter: The manual
References
Index
Acknowledgements
Chapter 1 first appeared in Interlanguage Studies Bulletin – Utrecht 3(2): 192–211, 1978. Chapter 2 first appeared in Nottingham Linguistics Circular 11: 28–38, 1983. Chapter 3 and Chapter 4 were co-authored with Tess Fitzpatrick, and appeared respectively in System 28(1): 19–30, 2000 and in Vigo International Journal of Applied Linguistics 1: 55–74, 2004. Chapter 5 first appeared in: P.J.L. Arnaud & H. Béjoint (Eds). 1992. Vocabulary and Applied Linguistics. London: Macmillan. Chapter 6 was co-authored by Brent Wolter, and appeared in Angles on the English Speaking World 4: 85–97, 2005. Chapter 8 was co-authored with Brent Wolter and Clarissa Wilks and first appeared in Second Language Research 21(4): 359–372, 2005.
Introduction
Connecting words
Applied linguistics is a curious discipline, one that seems particularly badly afflicted with band-wagon research. Every now and then someone produces an especially insightful paper, and within a couple of years almost everyone else has abandoned the research they were previously working on to follow up these new ideas. The problem then is that once the original band-wagon stalls, and another band-wagon appears, people are only too ready to move on to the new topic. One consequence of this is that there is a huge quantity of “research” which does very little to move the field forward in any real sense. Hardly anyone looks at the fundamental assumptions underlying the current band-wagon, hardly anyone asks critical questions about the methodologies that the current band-wagon depends on, and few people are willing to invest much time and effort into developing better methodologies that support these critical questions. This volume contains a set of papers that deal with word associations in a foreign language. I first started working in this area back in the 1970s when the psycholinguistics of foreign language speakers was a severely under-researched field. At the time, most applied linguists still looked towards theoretical linguistics as their source discipline, and most of the interesting ideas that were being discussed still relied very heavily on exciting theories about the nature of language that were being developed by linguists. This meant that there was a strong emphasis on formal aspects of second language production, but only a handful of people were interested in the processes which underpinned these productions. This emphasis can be seen very clearly in the 1984 volume edited by Davies, Criper & Howatt, which marked Pit Corder’s retirement as head of the Department of Applied Linguistics in Edinburgh.
Selinker’s summary paper in that volume (Selinker 1984) identified nine issues which had emerged in the conference, and which he felt were central to the enterprise of interlanguage studies: methodology (by which he meant the use of intuitions and judgements about grammaticality), language transfer (the influence of L1 structural features on L2 output), fossilization (the way some L2 structures persist even when they are grammatically incorrect by Native Speaker standards), the Universal hypothesis, Universal Grammar (the assumption that learners have a grammar in their heads), Interlanguage Strategies (specifically the distinction between communication strategies and learning strategies), Interlanguage Discourse (specifically the impact of classroom discourse on interlanguage
development), and Context in Interlanguage Studies (by which he seems to mean the impact of special purposes environments on language acquisition). My own contribution to that volume was unusual in that it was the only piece that identified second language vocabulary as an area of importance – and it got some criticism for doing so. At the time, this didn’t surprise me – after all, it was not long since the whole question of L2 lexical competence had been dismissed as uninteresting by the major figures in the field. Hockett’s assertion that “there is no point in learning large numbers of (words) until one knows what to do with them ... The acquisition of new vocabulary hardly requires formal instruction” was widely accepted as a non-negotiable premise (Hockett 1958:266). More recently, Canale and Swain’s seminal paper on communicative competence, which was to define the dominant paradigm in SLA for many years, had reduced vocabulary knowledge to a very minor role in grammar competence (Canale & Swain 1980). With hindsight, however, it is perhaps more of a surprise that so few people were taking vocabulary acquisition seriously. This, after all, was the heyday of Verbal Learning – a vast area of psychological research, which dealt entirely with words, how we learn them and how we use them, and what we can learn about memory and cognition by studying the way people handle words. I suppose that verbal learning had a bad press with linguists because it was linked in many people’s minds to behaviourism. Everyone knew that behaviourism had been definitively rubbished by Chomsky in his review of B.F. Skinner’s Verbal Behavior (Chomsky 1959) – a review which was often quoted but rarely read. A few die-hard psychologists still clung stubbornly to behaviourist views, but most linguists believed that these ideas had little to offer to linguistics. This view of the verbal behaviour movement was, of course, a travesty of what the work really involved.
True, a lot of it did involve people learning lists of words, and a lot of it even involved people learning long lists of nonsense words. But this work was actually a lot more interesting than linguists made it out to be. Some of it was extremely sophisticated, making use of complex mathematical models that were rarely used by linguists. Some of it was not so sophisticated, but the sheer volume of work that was carried out, the large number of variables that were studied, and the huge range of problems that the results were applied to meant that verbal learning and verbal behaviour formed a compulsory core element in the training of academic psychologists in a way that it never did for linguists. A good example of this divergence can be found in Crothers & Suppes’ (1967) book, which presented a coherent and mathematically sophisticated model of the way L2 learners might acquire vocabulary from word lists. As far as I know, this work was not reviewed in any of the major journals in Applied Linguistics at the time of its publication, and even today, it is only rarely mentioned by the main researchers in vocabulary acquisition. Nation (2001), for example, summarises this text in three short sentences, while Singleton (1999) and Schmitt (2000) do not mention it at all.
The study of word associations was one small part of this vast enterprise, but even so, it managed to attract significant figures. James Deese’s work was particularly important (e.g., Deese 1965), and David Palermo and James Jenkins published standard lists of word association norms which had enormous influence on the kinds of research people did in psycholinguistics (e.g., Palermo & Jenkins 1964). Word associations turned out to have implications for the study of memory (Bousfield 1953), child language acquisition (Brown & Berko 1960), cognitive and behavioural disorders (Rapaport, Gill & Schafer 1968), language loss (Lesser 1974), cross-cultural psychology (Szalay & Deese 1978) and bilingualism (Lambert 1955, 1956), to name but a few of the many applications which appeared at this time. Wallace Lambert’s work was particularly interesting, because it hinted that word associations might be used as a way of assessing the overall language ability of L2 speakers. Lambert was mainly working with bilinguals in Canada – that is, with speakers of French and English, whose competence in both languages appeared to be relatively high. It was obvious, though, that the methodologies Lambert was using could be used with lower level subjects too, and might throw some light on the way vocabularies developed in more traditional L2 learners. This was not a new idea, of course. Klaus Riegel had already carried out some large scale and pioneering work with learners of German and Spanish as an L2, and by 1972, he had already established the main features of L2 word associations – that they were much less stable than L1 associations, that groups of L2 learners exhibited relatively low levels of associational stereotypy, and that associations broadly became more native-speaker-like as learners became more proficient in the L2. However, this line of research seemed to stop following Riegel’s early death.
The only other influential piece of work which dates from this time was a paper by Politzer (1978) which suggested that vocabulary development in L2 learners parallels what happens in children learning an L1. Ervin (1961) had shown that young L1 speakers have a tendency to produce syntagmatic associations, but when they get to seven or eight years of age, this tendency drops off, and older children are more likely to produce paradigmatic associations – at least to high frequency stimulus words. Politzer argued (incorrectly in my view) that a similar shift occurs in L2 speakers. Surprisingly, this claim came to dominate L2 word association work (e.g., Söderman 1989). My own early work on word associations was very much in the style of Palermo and Jenkins. Later on, I managed to persuade some of my Master’s students to work on word associations too, and this enabled me to look critically at some of the methodological issues in word association research with L2 learners. The question of stability began to emerge as a crucial issue in these projects. When we ran repeated tests with L2 speakers, we found that the data they produced varied enormously from one administration to another, in a way which was not at all characteristic of L1 speakers. This feature of L2 word associations made it difficult
to interpret the raw data we were getting from our experimental work, and made it difficult to take some of the standard reports at face value. It also became clear that the “standard” word list that everybody used for word association research, the Kent-Rosanoff list (reproduced at the end of this chapter), was perhaps not the best stimulus list to use with non-native speakers. This list, which dated from 1910, had been widely used by psychologists, translated into many languages (e.g., Rosenzweig 1961), and underpinned the many lists of word association norms which had become available in the 1970s (e.g., Postman & Keppel 1970). I toyed briefly with the idea of developing an alternative list, but the difficulty of getting large enough numbers of subjects to take part in the necessary evaluations soon put paid to this plan. Instead, we developed two different approaches which looked as though they might make good use of small subject numbers. The first of these was to collect multiple associations from learners, rather than single associations. We thought that this might get round the problem of instability in L2 word associations, and we also thought that we might be able to use clever weighting systems based on the published norms to show that learners’ associations developed in systematic ways over time. Neither of these plans came to very much, mainly because the learners we had access to at the time had relatively small vocabularies, and this meant that they had difficulty producing more than two or three plausible associations for a given stimulus word. We also found that it was difficult to provide a plausible rationale for the weighting systems that we considered – a position which I later discovered to have been strenuously argued by Palermo & Jenkins (1964). Different weighting systems could give wildly different results, and the choice of one system over another seemed to be essentially unmotivated and arbitrary.
Eventually, we were heavily influenced by another critical article from Sharwood Smith’s team in Utrecht (Kruse, Pankhurst & Sharwood Smith 1987), which seemed to show that there was no relationship at all between scores on a word association test and proficiency level in English as a second language, and questioned the very idea of using word associations as a proficiency measure. This paper appeared at a time when my own research was going particularly badly, and it convinced me that I didn’t have much of a future as a researcher. With hindsight, perhaps, I should have stuck to my guns and been a bit less overawed by colleagues who managed to get their work published in major journals, while my own rejected papers languished in drawers. Kruse et al. used only a tiny number of subjects, a handful of target words, a weighting system which was difficult to justify, and a multiple association task which required their subjects to produce up to twelve associations for each target word. These conditions were very different from the ones which my own group had been using, and we should not have been surprised that our own work was producing data which looked very different from what Kruse et al. were
reporting. In practice, however, we pretty much abandoned main-stream association work, in favour of some rather different methodologies. One of these methodologies was a backwards association task (Vives Boix 1995), in which we gave subjects a set of association responses and asked them to identify the original stimulus word that they were all associated to. This turned out to be a very difficult task, which often stumped even advanced learners of the L2, and since most of the subjects available to us at the time were relatively low level learners, we did not take this idea very far. A second method turned out to be more interesting. It involved asking people to generate association responses to sets of semantically related words, and recording how often a word within the set occurred as an association to one of the other words in the set. For native speakers, small sets of words of this sort would often turn out to be densely inter-related. For example, the set SLEEP DREAM PILLOW SNOOZE WAKE BED NIGHT tended to produce very dense networks of associations. The same set of words presented to non-native speakers generated much sparser response networks, with hardly any evidence in extreme cases of the kind of semantic clustering that we were finding with native speakers. The effect was very easy to reproduce. This made me wonder whether it might be possible to move away from the individual responses generated by L2 speakers as raw data, and look instead at the structural properties of their L2 networks. The beauty of this idea was that it would allow us to ignore the ephemeral and unstable features of L2 speakers’ word associations and focus instead on deeper underlying structural properties of these lexical networks. At the time this seemed like an obvious and straightforward way to go, but it turned out to be much more difficult to implement than I had expected. 
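The contrast between dense and sparse response networks can be made concrete with a small calculation. The Python sketch below is an illustration only, not the procedure used in the studies described here: the function name `within_set_density`, the density measure (observed within-set links divided by all possible directed links) and both sets of responses are assumptions invented for the example.

```python
WORD_SET = ["sleep", "dream", "pillow", "snooze", "wake", "bed", "night"]


def within_set_density(responses, word_set):
    """Share of possible directed links (stimulus -> response) inside
    word_set that actually occur in the responses.

    responses maps each stimulus word to the list of words produced for it.
    """
    members = set(word_set)
    # Keep only links whose stimulus and response both belong to the set.
    links = {
        (stimulus, response)
        for stimulus, produced in responses.items()
        if stimulus in members
        for response in produced
        if response in members and response != stimulus
    }
    possible = len(members) * (len(members) - 1)
    return len(links) / possible if possible else 0.0


# Invented responses, for illustration only (hypothetical data).
native_like = {
    "sleep": ["bed", "dream", "night"],
    "dream": ["sleep", "night"],
    "pillow": ["bed", "sleep"],
    "snooze": ["sleep", "wake"],
    "wake": ["sleep", "bed"],
    "bed": ["sleep", "pillow", "night"],
    "night": ["sleep", "dream", "bed"],
}
learner_like = {
    "sleep": ["bed", "tired"],
    "dream": ["film"],
    "pillow": ["soft"],
    "snooze": ["nose"],
    "wake": ["morning"],
    "bed": ["sleep", "room"],
    "night": ["dark"],
}

print(within_set_density(native_like, WORD_SET))
print(within_set_density(learner_like, WORD_SET))
```

With these made-up responses, the first group produces 17 of the 42 possible within-set links and the second only 2. The point of the measure is that it depends on how many responses fall inside the set, not on which particular responses a speaker happens to give on one occasion.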
The basic idea was that non-native speakers’ lexical networks would be less well-developed than the networks found in native speakers, a claim which few researchers would disagree with. In practice, however, it has been very difficult to pin this idea down, and finding a robust experimental methodology which could exploit this way of thinking about vocabularies has turned out to be a surprisingly elusive goal. The papers in this collection are very far from the last word in word association research. They do, however, illustrate two important themes which I hope might be of value to young scholars and beginning researchers. The first theme is that doing research is not straightforward. People who study bibliometrics – the relationships between published works and their authors – have established a number of laws which describe the structure of a research field. One of the most important of these laws is Lotka’s Law, which describes the number of publications generated by people working in a particular research field. The Law states that “the number of authors making N contributions is about 1/N² of those making one contribution” (Lotka 1926). What this means in practice is that more
than half of the papers in a particular field are generated by people who publish only one paper in that field, and the number of people who publish more than a handful of papers is relatively small. The L2 word association field is probably too limited for Lotka’s Law to apply accurately, but the Law gives you an idea of how much “research” is actually produced by people who have not grappled with the ideas for very long. One of the problems with research that gets done in this way is that it only rarely asks the right questions. Most one-off research tends to ask questions which are obvious and unsubtle. The subtle questions only become apparent when you work with a problem over a long period of time, and eventually realise that the questions you started off asking were not the ones you should have been asking at all. It is at this point that research gets really interesting, but much more difficult. Almost by definition, you find yourself working at the edge of your methodological and conceptual competence, and inevitably some of the things you find yourself saying turn out to be misconceived, or just plain wrong. Once you step away from the obvious research questions, it becomes much harder to explain to your colleagues what you are doing, and it becomes much more difficult to persuade them that what you are doing is useful and relevant. Every applied linguist knows about syntagmatic and paradigmatic word associations, for example, and people seem to find it reassuring and comforting to sit through conference papers which go over this old ground, perhaps providing a couple of useful illustrations that can be used in first year lectures on psycholinguistics. Complex analyses of these familiar problems are unsettling, and generate surprisingly stiff resistance. Nonetheless, the right questions to ask almost always turn out to be questions that other people simply can’t see the point of. 
Finding these questions takes time and effort, and you may not recognise them when they appear. However, the longer you work in an area, and the more you worry its basic assumptions, the more likely you are to find the critical questions that are really worth answering. The second theme that arises in these papers is the importance of importing research methods from outside your own discipline. This shows up in three ways. Firstly, all the papers in this volume are heavily influenced by the work of psycholinguists. They use experimental and statistical methods which even now are infrequently used by applied linguists, who often seem to be uncomfortable with quantitative approaches to research. I think that this may be one reason why this work has not been as influential as it might have been, at least in the UK. (The way that “hard” psycholinguistic research is disappearing from undergraduate linguistics worries me a lot – I suspect that it will be seriously detrimental to research in Applied Linguistics in the future.) Secondly, some of the papers in this volume rely heavily on an abstract mathematical approach to the analysis of network structures called graph theory
(cf. Harary 1969). This methodology is widely used in the Social Sciences, and has had enormous influence on our understanding of the way networks function. Surprisingly, however, few linguists have tried to apply these methods to vocabulary networks. Most of the running here is again made by psycholinguists e.g., Miller’s WordNet project (Fellbaum 1998) or by computer scientists (Landauer & Dumais 1977; Ferrer-i-Cancho & Solé 2001), and again, these works are rarely cited by the main figures working on L2 vocabulary acquisition. The fact that Applied Linguists typically make little of developments of this sort is again a worrying feature of the discipline. The third interdisciplinary feature of the work reported in this volume is that a lot of it relies on computing. Over the many years that I have been working on word associations, a lot of my research time has been taken up by writing computer programs, and I am now a fairly proficient programmer in half a dozen languages. This investment has influenced my work in a number of ways. At the simplest level, most of the empirical data reported here was collected using computer programs which I wrote to standardise the data collection, and to facilitate the analysis of the data. A lot of research is basically drudgery and slog, but word association research is particularly bad in this respect. I do not think I could have done this work had I not been able to write simple computer programs that reduced what would have been weeks of hand-coding to a few minutes of processing time. More importantly, perhaps, some of the later papers in this volume make use of modelling and simulation techniques, which also rely on programming skills. Modelling of this kind is still in its infancy as far as Applied Linguistics is concerned, and it has not always been welcomed by the research community. 
In my view, it is pretty obvious, even from the simplistic models developed here, that modelling is an enormously powerful tool which has huge potential for Applied Linguistics research, and I feel strongly that elementary computer programming should form an essential part of the training that young researchers in Applied Linguistics are provided with. The papers in this book illustrate the value of working on a problem for a long time, and attacking it from a number of different perspectives. Most research is fundamentally metaphorical, in the sense that it develops an analysis which describes some aspects of a problem in terms of other more familiar objects. In our case, the core metaphor seems to be the idea that vocabulary is a network, an idea that seems to be so blindingly obvious that it is easy to forget its metaphorical nature. What these papers show is that we are far from understanding precisely what assumptions the network metaphor brings with it, and that our understanding of the way second language vocabulary networks function is very far from complete. The rest of this book explores these ideas in more depth. The book is divided into five sections. Section 1 consists of two papers that first appeared in 1978
and 1983 respectively. They represent what you might call “classic” word association studies: in one paper I report a small collection of data from L2 learners of French, while in the second paper, I comment on some of the problems this type of approach presents for L2 researchers. Section 2 is a slightly different take on word associations. The papers in this section illustrate how word association data can be used to ask interesting questions about aspects of vocabulary knowledge which are not usually addressed in this way. The focus here is on L2 learners’ productive vocabulary. This is a notoriously difficult area to work in, and the word association techniques described offer a methodologically innovative solution to this problem. The papers in this section were first published in 2000 and 2004 respectively. Section 3 contains three papers which explicitly explore the idea of a vocabulary as a network, and they involve a much more sophisticated approach to word association data than anything in Section 1 or Section 2. Chapter 5 is an early paper (1992) which introduces the basic idea of applying graph theory concepts to word association data. Chapter 6, which first appeared in 2004, describes some attempts to implement these ideas in a computerised test format. Chapter 7, a paper from 2005, illustrates a more ambitious approach in which I tried to model word association behaviour using simulation studies. Section 4 consists of a single, previously unpublished paper, which lists and summarises all the available work on word associations in a second language. Section 5 is in many ways the most innovative part of this book. It contains instruction manuals for a set of computer programs which I have developed as part of my ongoing work on word associations. These programs will allow readers to explore for themselves some of the many ideas discussed in this book. It has become something of a cliché to describe research as a journey.
When you set out it is by no means obvious how you are going to get to your journey’s end. When you get to the end of your journey, the place you arrive at is usually not the place that you expected to be in when you set out. Most often, the interesting happenings are the serendipitous and unexpected discoveries that you make on the way, the byways you explore en route, rather than what you expected to see. This book has explored lots of byways. I hope that the reader will find them as interesting as I have done.
Acknowledgements
A number of people have influenced the way I have thought about word associations, and their ideas will be found throughout the papers in this volume. I owe a particular debt of gratitude to Clarissa Wilks, Brent Wolter and Tess Fitzpatrick, all of whom taught me far more than they realise. Good colleagues all.
Downloads
Up-to-date versions of the computer programs described in Section 5 can all be downloaded from my website: http://www.lognostics.co.uk/
Appendix
The Kent-Rosanoff Word Association List
The Kent-Rosanoff list is a set of 100 words commonly used in studies of word associations. They first appeared in: G.H. Kent & A.J. Rosanoff. 1910. A study of association in insanity. American Journal of Insanity 67: 37–96 & 317–390.
 1 table        2 dark        3 music       4 sickness
 5 man          6 deep        7 soft        8 eating
 9 mountain    10 house      11 black      12 mutton
13 comfort     14 hand       15 short      16 fruit
17 butterfly   18 smooth     19 command    20 chair
21 sweet       22 whistle    23 woman      24 cold
25 slow        26 wish       27 river      28 white
29 beautiful   30 window     31 rough      32 citizen
33 foot        34 spider     35 needle     36 red
37 sleep       38 anger      39 carpet     40 girl
41 high        42 working    43 sour       44 earth
45 trouble     46 soldier    47 cabbage    48 hard
49 eagle       50 stomach    51 stem       52 lamp
53 dream       54 yellow     55 bread      56 justice
57 boy         58 light      59 health     60 bible
61 memory      62 sheep      63 bath       64 cottage
65 swift       66 blue       67 hungry     68 priest
69 ocean       70 head       71 stove      72 long
73 religion    74 whiskey    75 child      76 bitter
77 hammer      78 thirsty    79 city       80 square
81 butter      82 doctor     83 loud       84 thief
85 lion        86 joy        87 bed        88 heavy
89 tobacco     90 baby       91 moon       92 scissors
93 quiet       94 green      95 salt       96 street
97 king        98 cheese     99 blossom   100 afraid
section 1
Early work
The two papers in this section represent what we might broadly call “classic research” in word associations. Chapter 1 is a report of a set of word association data collected from a group of L1 English speakers learning French in school. Each subject contributed a single response to each of 100 stimulus words. The paper lists the primary, secondary and tertiary responses to each of these stimulus words, together with a complete set of responses to a small number of the stimulus words, and comments on some of the characteristics of this data which make it different from L1 data. Surprisingly, perhaps, this paper remains one of only a handful of studies which have looked at the development of lexical skills in low-level learners of French. With hindsight, the approach it adopted wasn’t a particularly innovative or insightful way of going about things. It merely followed the standard methodology, transporting it into a new setting. The paper was very much influenced by Riegel and Zivian’s pioneering work on learners of German (Riegel & Zivian 1972). It soon became clear, however, that there were a number of problems with the type of approach illustrated in this chapter, and some of these problems are discussed in Chapter 2. Again, with hindsight, it is easy to see that the main problem was a general lack of a theoretical framework which would make sense of the masses of data that word association studies generate. Listing the responses was easy, and followed the standard approach used by psycholinguists at the time (e.g., Postman & Keppel 1970). What proved more difficult was to tease out what made L2 word association responses different from L1 associations, and how you could use these findings to develop interesting claims about the way L2 vocabularies grow.
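Listing primary, secondary and tertiary responses amounts to counting how often each response was given to a stimulus word and keeping the three most frequent. The short Python sketch below illustrates that tallying step under stated assumptions: the function name `top_responses`, the normalisation (trimming spaces and lower-casing) and the sample responses are all hypothetical, and this is not the program used to prepare the data in Chapter 1.

```python
from collections import Counter


def top_responses(responses, n=3):
    """Return the n most frequent responses to one stimulus word
    as (response, count) pairs."""
    # Normalise spelling variants caused by stray spaces or capitals.
    counts = Counter(r.strip().lower() for r in responses if r.strip())
    # Sort by descending count, then alphabetically so ties are reproducible.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


# Invented responses to a single stimulus word, for illustration only.
sample = ["chaise", "chaise", "manger", "chaise", "bois", "manger", "Chaise "]
print(top_responses(sample))
```

Ties are broken alphabetically here purely so that repeated runs give the same listing; a published set of norms might report tied responses differently.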
The main difference I identified was a surprisingly large number of responses that made absolutely no sense to native speakers, and a similarly large number of responses which could best be described as arising from misreadings of the stimulus words. This should have made me ask questions about the way word forms are coded in L2 lexicons, but I did not appreciate its significance at the time, and only picked up on this idea much later (Meara & Ingle 1986). The prevailing framework of analysis at the time was to classify responses as syntagmatic, paradigmatic or clang responses, following work done with young L1 speakers, and though it soon became obvious that this set of classifications was not the most
illuminating way of analysing L2 word association responses, it was not obvious what other classifications would have been better. There were also some serious issues with the standard stimulus word list, which seemed not to be responsive to the most interesting features of L2 lexicons. These niggling problems made me wonder about alternative ways of analysing word association data, which would be less dependent on the specific stimuli and specific responses, and more responsive to the overall structural properties of the learners’ vocabularies. These ideas will re-emerge in Section 3. Meanwhile, the data generated by this study, and other studies that we were running alongside it, was extensive. 76 subjects each producing responses to 100 stimulus words amounted to 7600 data points. At this time, desk top computers were still a far-off dream, so the data had to be stored in hard copy, and sorted and analysed by hand. It was a massive and mind-numbing task. Even carrying the data around was no mean feat. Everyone is familiar with the old adage about research being 1% inspiration and 99% perspiration, but the amount of perspiration involved in processing this data seemed to be excessive, and I started to look around for alternative ways of handling this amount of data. About this time, London University was beginning to offer training in elementary computer programming for staff, and I learned to write simple programs to process word association data. The programs were not difficult to write. There was a language called SNOBOL4, (Griswold, Poage & Polonsky 1971) which had very powerful string processing commands, and allowed you to write short programs that sorted out piles of data in no time at all. There were, however, two logistical problems. The first problem was that data needed to be stored on punch cards. These were thin pieces of card, which stored data in the form of holes punched in columns. 
The cards had to be punched using an enormous machine – about the size of a small dinner table – and these machines were operated by a team of data inputters. The data I had to deal with was fairly interesting compared to the numerical data that these operators punched all day, but the level of errors generated by the inputters was still high, and the data needed to be checked rigorously. Sometimes it took several days to get a data set properly recorded. Each card could store 80 columns of data – about one line of text. In practical terms, that meant that you could just about store 10 words on a card, so if your subjects were generating responses to the standard 100 word list, then you needed 10 cards per subject. If you had 76 subjects, then your data required nearly a thousand punch cards to encode the entire data set. You could just about fit this much data into a small suit-case. The second problem was that there were only two data input points in London University and the nearest one to my office was a twenty-minute walk away. The drill was that you took your program and data cards to the input point, and left them to be processed by an operator. The operator ran them through a
machine that read the punch cards. If you were lucky, and your program worked correctly, you could come back the next day and pick up your data. The chances of a 1000-card job running without a mis-feed were fairly remote, so you generally had to break big jobs down into smaller ones, and then recombine the results into a single report later. It was a lot of work. The arrival of desk-top computing has made it much easier to process large amounts of data. The suit-cases of data that I used to carry across London would now fit comfortably on a single floppy disk, and a job that would have taken me several days to complete can now be accomplished in a matter of minutes. Unfortunately, processing word association data is still a fiddly and frustrating job unless you can write your own programs. I have therefore developed a small suite of utility programs that researchers interested in working with word association data can use to collect and analyse their data. These programs, which take some of the effort out of sorting and analysing large word association data sets, are described in more detail in Section 5 of this book. In fact, the main spin-off from this early experience of writing programs was a long love affair with computers, which turned out to be a significant part of my development as a researcher. The skills I learned while I was processing word association data turned out to be immensely useful as desk-top computers became standard research equipment. They made me relatively independent of professional programmers (who were expensive to employ), and also independent of the fairly limited research packages which were available at the time (which severely constrained the types of research questions you could follow up). Eventually, I got to the stage where I was able to write applications for delivering tests and processing the data they generated. Some examples of this type of application are reported in Section 2. 
More importantly, perhaps, my programming skills allowed me to develop simulation programs with which it was possible to explore some of the ideas about global lexical competence which were beginning to emerge from these early studies on word associations. These ideas are discussed in more detail in Section 3.
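The punch-card arithmetic described earlier in this section can be checked with a short sketch. It assumes, as the text does, roughly 10 words per 80-column card; the function name `cards_needed` and its parameters are illustrative only and are not taken from the original programs. Under these assumptions the data alone comes to 760 cards, somewhat below the round figure quoted in the text.

```python
import math

def cards_needed(subjects, stimuli, words_per_card=10):
    """Estimate the number of data cards for a word association study.

    Assumes each subject's responses start on a fresh card and that
    about `words_per_card` responses fit on one 80-column card.
    """
    cards_per_subject = math.ceil(stimuli / words_per_card)
    return subjects * cards_per_subject

# Figures from the study: 76 subjects, 100 stimulus words.
data_points = 76 * 100
total_cards = cards_needed(76, 100)
print(data_points, total_cards)
```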
chapter 1
Learners’ word associations in French

Introduction

A Word Association Test consists of a list of words which are presented one at a time. For each word in the list you have to write down or say aloud the first word that comes to your mind. For many people, tests of this sort are closely associated with psychoanalysis, and a popular image of them is that they are a key to our subconscious and innermost selves. Word associations are indeed used in psychoanalysis, and in a number of other clinical situations, but there is also a long and respectable history attached to the study of the word associations produced by people who are not disturbed in any way. In contrast with the popular image, the word associations of normal adults are very unrevealing about their subconscious selves, and they show a surprisingly high degree of unoriginality. Table 1 below contains a list of ten common words taken from one of the standard word association tests, the Kent-Rosanoff list (Kent & Rosanoff 1910). Read through the list quickly, and write down the first word that comes to mind for each word in the list. When you have done this, check your answers against Table 2.

Table 1. Ten stimulus words from the Kent-Rosanoff list

 1: TABLE     ______________
 2: MAN       ______________
 3: SOFT      ______________
 4: BLACK     ______________
 5: HAND      ______________
 6: SHORT     ______________
 7: SLOW      ______________
 8: NEEDLE    ______________
 9: BREAD     ______________
10: BITTER    ______________

Table 2 lists the most common responses to the words in Table 1, and you should find that most of your responses are to be found there. For common stimulus words, such as these, the associations that normal people make are in fact very predictable. Given TABLE, for example, 78% of respondents reply with chair; given MAN, 78% respond with woman; BLACK produces white 70% of the time; BREAD gives butter 56% of the time, and so on.
Table 2. Commonest responses to the stimulus words in Table 1

Stimulus      Responses
 1: TABLE     chair     cloth     talk      desk
 2: MAN       woman     dog       boy       child
 3: SOFT      hard      cushion   light     bed
 4: BLACK     white     night     cat       dark
 5: HAND      foot      finger    glove     arm
 6: SHORT     long      tall      fat       small
 7: SLOW      fast      quick     train     snail
 8: NEEDLE    thread    cotton    pin       eye
 9: BREAD     butter    jam       cheese    food
10: BITTER    sweet     lemon     beer      sour

Normal adults produce two main types of association, called syntagmatic and paradigmatic associations. Syntagmatic associations are associations that complete a phrase (syntagm) and some typical responses of this sort are shown below:
BRUSH     teeth
HOLD      hands
BLACK     mark
BANK      robber
Paradigmatic associations are ones in which the stimulus word and the response that it evokes both belong to the same part of speech, nouns evoking nouns, verbs evoking verbs, and so on. In these cases, the two words usually share a large part of their meaning, and both stimulus and response can usually occur in the majority of contexts where the other appears. Typical paradigmatic responses are:
MAN       woman     (meaning identical except for sex)
BOY       girl      (meaning identical except for sex)
FATHER    son       (different views of the same relationship)
HOT       cold      (polar opposite adjectives)
TREE      bush      (both plants of a woody kind)
An association such as MAN ~ snail would technically be classed as a paradigmatic association, but responses of this sort, where the two words are not closely related semantically, are rather uncommon. Normal adults tend to produce more paradigmatic responses than syntagmatic ones, provided the stimulus words are reasonably common. Less frequent words, which tend to occur in more constrained contexts, are more likely to produce syntagmatic responses. Children under seven years of age have a strong tendency to produce syntagmatic responses as a first preference to any word. They also tend to produce a large number of so-called “clang associates” – associations where the response is heavily influenced by the form of the stimulus word rather than its meaning. Some examples of clang responses are given below:

LIGHT     bite      (rhyming response)
HUM       him       (consonants unchanged)
LATE      light     (assonance)
GO        goat      (same initial)
Responses of this last type are rare in normal adults, though they frequently occur in some types of mental illness, and under the influence of drugs.
The study

The associations reported in this paper are those of 76 girls learning French in two London comprehensive schools. All the girls were preparing for the O-Level examination in French, and were tested at the beginning of their final year of study. The girls were each given a list of 100 French words and were asked to write down beside each one the first French word that it made them think of. The words were a translation of the standard Kent-Rosanoff list (Rosenzweig’s 1957 translation). This list is made up of high frequency words which students at this level would be expected to know. All but seven of the words are contained in either the premier or the deuxième degré of the français fondamental list (Gougenheim et al. 1956). The complete list will be found in the tables that follow.

There are a number of reasons why it is interesting to look at the word association patterns of a group of students who are moderately proficient in a foreign language, but who have not yet achieved any real degree of fluency. Firstly, most of the work on the psychology of foreign language learning has concentrated on syntactic aspects of acquiring a new language. Hardly anyone has looked at what happens to foreign language words in the early stages of their acquisition, although learners themselves often identify vocabulary as a major problem area. It seems important that this neglect should not be allowed to continue. Secondly, the work on syntactic aspects of foreign language acquisition has suggested that there are a number of interesting parallels between learners and children acquiring their first language.
It would be interesting to know whether these parallels also extend to vocabulary, and in particular, it would be interesting to know whether there is any tendency for learners to produce the syntagmatic responses and clang associates that are characteristic of young children, or whether they produce typically adult responses from very early on in the learning process. Thirdly, there is the problem of how foreign words are stored in the learner’s mental lexicon. Are they organised into semantic networks that are quite separate from the native language lexicon, or
do learners merely tag their French words onto their native language equivalents? If the latter were the case, one would expect to find that a large proportion of the associations produced by learners were merely translations of the normal English responses to the equivalent English stimulus word. If the learners were building independent lexicons for the two languages, then one would expect to find systematic differences between learners’ responses in English and French.

The word associations produced by native French speakers are broadly comparable with those of native English speakers. Both groups produce a high proportion of paradigmatic responses, and in many cases the most common responses are very similar in both languages. In other cases, either for cultural reasons, or because there is a mismatch between the French and English lexicons, the principal responses in the two languages are quite different. Some examples are given in Table 3 below.

Table 3. The most common responses in English and French to twelve words from the Kent-Rosanoff list

Stimulus      Primary       Secondary     Tertiary
DEEP          shallow       sea           water
PROFOND       creux         mer           puits

MOUNTAIN      hill          valley        snow
MONTAGNE      neige         plaine        mer

HOUSE         home          garden        door
MAISON        toit          foyer         porte

BUTTERFLY     moth          wing          net
PAPILLON      fleur         aile          couleur

SWEET         sour          sugar         bitter
DOUX          dur           mou           agréable

EARTH         soil          sky           ground
TERRE         mer           ciel          ronde

SOLDIER       sailor        army          uniform
SOLDAT        guerre        plomb         armée

STOMACH       food          ache          pain
ESTOMAC       digestion     ventre        faim

YELLOW        blue          red           green
JAUNE         vert          citron        serin

BREAD         butter        jam           cheese
PAIN          vin           blanc         manger

HEALTH        sickness      wealth        happiness
SANTE         maladie       fragile       bonne

MEMORY        mind          thought       forgetfulness
MEMOIRE       souvenir      intelligence  leçon
The results of this study will be found in Table 4. This table contains the three most frequent responses produced by the learner group. (These are known respectively as the primary, secondary and tertiary responses.) Table 4 also reports the number of students contributing to each response and the French primary response for each stimulus word. This table lists each of the 100 stimulus words (col2), the most common native speaker response (col3), and the three most frequent responses produced by the learner group (cols4–9). The numbers indicate the number of subjects contributing to each of the responses. The final column gives the number of different responses produced by the learner group. The symbols preceding the learner responses are explained in the text.

Table 4. Data elicited from 76 female learners of French

No.  Stimulus    L1 primary  L2 primary       N  L2 secondary      N  L2 tertiary   N  Diff.
1    table       chaise      =chaise         53  /tableau          7  :manger       2  12
2    sombre      clair       :soleil         11  :noir             4  /heureuse     2  40
3    musique     note        /disque         11  :violon           6  /chanson      6  23
4    maladie     lit         /malade          9  /musique          9  /tête         4  29
5    homme       femme       =femme          37  /garçon           8  /dame         7   8
6    profond     puits       /plafond         6  prendre           3  /professeur   3  50
7    mou         dur         /vache          13  /mouton           5  :chat         4  34
8    manger      boire       =boire          28  :pain             5  /pomme        4  31
9    montagne    neige       =neige           8  /campagne         5  /lac          3  47
10   maison      toit        :jardin         13  /appartement     12  :famille      5  25
11   noir        blanc       =blanc          53  /soir             4  :rouge        2  13
12   agneau      doux        :mouton          7  /mal              3  /oiseau       3  48
13   confort     fauteuil    /confortable     6  :table            4  /maison       2  41
14   main        pied        =pied           19  :doigt            7  :bras         6  23
15   petit       grand       =grand          68  /large            4  /petite       1   4
16   fruit       pomme       =pomme          31  :orange          14  :légume       6  14
17   papillon    fleur       =fleur           7  :oiseau           6  /papiers      4  40
18   lisse       rugueux     /livre          10  /lire             8  /lit          4  35
19   ordre       désordre    /demander        7  /menu             3  garçon        3  44
20   chaise      table       =table          55  :asseoir          2  chat          1  19
21   doux        dur         /deux           14  /trois           13  /un           7  23
22   sifflet     train       /soufflé         9  :agent            3  /gateau       3  46
23   femme       homme       =homme          42  /mari             5  :fille        4  10
25   lent        rapide      :vite           10  /lentement        5  noël          5  34
26   désirer     vouloir     =vouloir        15  :aimer            5  :avoir        3  44
27   rivière     fleuve      :mer            10  :eau             10  bâteau        9  30
28   blanc       noir        =noir           55  :neige            4  :bleu         2  13
29   beau        joli        /belle          15  /froid            9  /mal          5  34
30   fenêtre     rideau      :porte          31  :maison           7  /ouvrir       5  25
31   rugueux     lisse       /rouge          16  /football         3  /rideaux      2  39
32   citoyen     vote        /auto           12  /voiture         11  :ville        5  29
33   pied        chaussure   :main           20  :jambe           12  tête          4  21
34   araignée    toile       /arranger        5  /argent           2  désordre      2  50
35   aiguille    fil         /train           2  /malade           2  /mouton       2  52
36   rouge       noir        :bleu           14  :blanc            9  :noir         7  20
37   sommeil     lit         /soleil         15  :lit              5  :dormir       3  33
38   colère      rouge       /bleu            7  /couleur          4  /blouse       4  41
39   tapis       moëlleux    /eau             5  porte             4  /pied         3  51
40   fille       garçon      =garçon         28  :fils            22  /fil          6  13
41   haut        bas         :montagne        4  /couture          3  /voix         3  44
42   travail     repos       /école          11  /rester           6  /autobus      5  24
43   aigre       doux        /tigre           6  /âgé              4  /aiguille     3  46
44   terre       mer         :ciel           15  /pomme de terre  11  /pomme        9  26
45   difficulté  facilité    :facile         18  /simple          13  /français     5  24
46   soldat      guerre      =guerre          6  :homme            4  :armée        4  44
47   chou        fleur       /chat            5  /chien            4  :fleur        3  36
48   dur         mou         /sur             4  :facile           4  /pendant      3  36
49   aigle       oiseau      =oiseau         12  /église           6  /aigre        2  41
50   estomac     digestion   :manger          4  /malade           3  /tabac        3  47
51   tige        fleur       :tigre          16  :lion            15  /animal       4  30
52   lampe       lumière     /lit             9  /table            8  /soleil       5  27
53   rêve        sommeil     :dormir          9  /lêve             8  :lit          5  40
54   jaune       vert        /vieux          14  :vert             7  :rouge        5  26
55   pain        vin         :beurre         26  :manger           4  :couteau      3  25
56   justice     balance     /police          7  /agent            6  /court        5  40
57   garçon      fille       =fille          37  :homme            8  :café         3  24
58   clair       obscur      :lune           19  /gateaux          5  :noir         5  33
59   santé       maladie     /noël           11  :malade           2  /église       2  43
60   évangile    bible       :église         11  :religion         4  /ange         2  42
61   mémoire     souvenir    /tête           13  :livre            4  /enfant       3  42
62   mouton      doux        /vache          10  /agneau           6  /viande       4  37
63   bain        mer         :salle de bain  13  /pain             7  :eau          6  31
64   villa       mer         :maison         48  /ville            4  /village      3  15
65   rapide      train       :vite           29  /eau              7  /rivière      6  18
66   bleu        mer         :rouge          19  :ciel            17  :blanc       10  16
67   faim        soif        :manger         20  :soif             7  /femme        4  16
68   prêtre      noir        /prendre        18  fenêtre           3  /petit        3  35
69   océan       mer         =mer            23  :eau              9  :atlantique   7  14
70   tête        cheveux     :yeux           10  :cheveux          9  :pied         8  20
71   fourneau    cuisine     /fourneau        3  /couteau          3  :chaud        3  44
72   long        court       :petit          18  /grand            8  :court        6  22
73   religion    église      =église         24  :catholique      14  :dieu         4  21
74   cognac      alcool      :boire          17  :vin             13  /boit         4  24
75   enfant      petit       :bébé           21  :petit           12  :école        7  25
76   amer        doux        /aimer          13  /mer              8  amie          5  34
77   marteau     pilon       /manteau         3  /boire            3  /mouton       3  45
78   soif        faim        =faim           11  /soir             6  :eau          6  33
79   ville       Paris       :maison         21  :village         20  :Paris        3  25
80   carré       rond        /voiture        10  /porter           6  /roue         2  39
81   beurre      jaune       :pain           44  /vin              4  /jeune        2  20
82   docteur     maladie     :hôpital        19  :maladie         13  /patient      4  19
83   bruyant     enfant      /brille          6  /noir             2  /acheter      2  52
84   voleur      bicyclette  :cambrioleur     5  /voiture          4  /maison       4  34
85   lion        crinière    :tigre          32  :animal           8  /tigre        5  19
86   jolie       tristesse   /joli           17  jouer             8  :heureux      4  24
87   lit         repos       :dormir         17  /lampe           10  /lire         5  28
88   lourd       léger       /silence         4  /France           4  /sac          3  44
89   tabac       fumée       :pipe            9  :cigarette        9  :fumer        7  25
90   bébé        rose        :enfant         33  :petit            8  mère          4  25
91   lune        nuit        :clair          13  :soleil           8  :ciel         5  24
92   ciseaux     couper      /cheveux         7  :couper           6  :couteaux     5  42
93   tranquille  calme       :silence         6  :calme            4  /bruit        4  48
94   vert        pré         :bleu           15  :herbe           10  :jaune        9  16
95   sel         mer         /acheter         6  :poivre           5  /vendre       5  42
96   rue         maison      =maison         13  :voiture          7  /automobile   5  30
97   roi         reine       =reine          13  /rue              6  /moi          3  43
98   fromage     blanc       :pain           16  :beurre           9  :lait         8  30
99   fleur       rose        :jardin          9  :rose             8  /rouge        6  26
100  effrayer    peur        :enfant          4  /robe             3  /travaille    2  47

Notes: The French norms are taken from Rosenzweig (1970) and are the primary responses produced by 184 female students. Rosenzweig also reports two other sets of data, responses from 104 male students and responses from 136 workmen, but the female norms seemed most appropriate for comparison with the learner group which was also composed of females. Rosenzweig’s male and female students differ only rarely in their primary responses, though there are a number of differences between the student responses and those produced by the workmen.
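The tallies reported in Table 4 can be derived mechanically from each stimulus word's raw responses. The sketch below is only an illustration of that counting step, not the procedure actually used in the study: the function name `summarise_responses` is hypothetical, and the sample list is a partial reconstruction of the learner responses to LONG, using only the four most frequent responses.

```python
from collections import Counter

def summarise_responses(responses):
    """Tally the responses given to one stimulus word.

    Blank entries are dropped and case is normalised, which is an
    assumption of this sketch (it would also lowercase proper nouns).
    """
    cleaned = [r.strip().lower() for r in responses if r and r.strip()]
    counts = Counter(cleaned)
    return {
        # Primary, secondary and tertiary responses with their counts.
        "top_three": counts.most_common(3),
        # Number of different responses produced by the group.
        "different_responses": len(counts),
        "subjects_responding": len(cleaned),
    }

# Partial, illustrative sample for the stimulus LONG.
sample = ["petit"] * 18 + ["grand"] * 8 + ["court"] * 6 + ["liste"] * 3
summary = summarise_responses(sample)
print(summary["top_three"])
```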
Consider first the learner’s primary responses. These fall into three main categories: Category A (marked = in Table 4) comprises primary responses which are the same as the primary responses reported for the native French speakers; Category B (marked : in Table 4) is made up of words which are not the normal primary response of French speakers, but which do nonetheless occur in the list
of normal responses for native francophones; Category C (marked / in Table 4) are responses that are not normally made by French speakers. The number of responses in each category will be found in Table 5.

Table 5. Distribution of the learners’ primary, secondary and tertiary responses

Category   Primary   Secondary   Tertiary
=             23
:             40         46         40
/             37         54         60

= learners’ primary response is the same as the French primary
: learners’ response appears in the native speaker norms
/ learners’ response is never made by native speakers
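The three-way marking used in Tables 4 and 5 amounts to a simple lookup against the native speaker norms. The following sketch assumes the norms for a stimulus are available as a list ordered from most to least frequent; the function `categorise`, the `is_primary` flag and the shortened norm list for MEMOIRE are illustrative assumptions rather than part of the original analysis. The `=` mark is reserved for learner primaries, which is how Table 5 uses it.

```python
def categorise(response, native_norms, is_primary=False):
    """Return '=', ':' or '/' for one learner response.

    native_norms: native speaker responses, most frequent first.
    is_primary:   True when `response` is the learners' primary response.
    """
    if not native_norms:
        raise ValueError("native_norms must not be empty")
    if is_primary and response == native_norms[0]:
        return "="   # same as the native speakers' primary response
    if response in native_norms:
        return ":"   # appears somewhere in the native norms
    return "/"       # never produced by the native speakers

# Shortened, illustrative norm list for the stimulus MEMOIRE.
memoire_norms = ["souvenir", "intelligence", "leçon", "livre", "oubli"]

print(categorise("tête", memoire_norms, is_primary=True))
print(categorise("livre", memoire_norms))
```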
Category A, 23 cases, is basically uninteresting, in that though the learners produce the same primary response as the native speakers, these primaries are translation equivalents of the corresponding English primary. For the four cases where this is not so, the primary response is a translation equivalent of a corresponding high frequency response in English. There is no way of deciding whether the learners are producing genuine French-like responses here, or whether they are merely translating their normal English responses. Category B, 40 cases, also appears to be largely made up of translations of English responses. Twenty-five of the learner primaries are translations of the corresponding English primaries or other very frequent responses. Of the remaining cases, six are marginal in that they are made very infrequently by native French speakers (not more than once in a sample of 150 speakers). This leaves us with only six primary responses which are genuinely French and un-English: ESTOMAC ~ manger, CLAIR ~ lune, EVANGILE ~ église, TETE ~ yeux, DOCTEUR ~ hôpital and EFFRAYER ~ enfant. The third category, totally unFrench associations, is surprisingly large. Eighteen of the thirty-seven cases can be classified as clang associates, relying heavily on the form of the stimulus word and ignoring its meaning completely. The second largest sub-category consists of associations which are quite reasonable, but just do not figure in the French norms. There is also a third set which arises as a result of the stimulus word being misunderstood. JAUNE ~ vieux and CITOYEN ~ auto are fairly simple cases of this but SANTE ~ noël and SEL ~ acheter and MOU ~ vache are rather more serious. What seems to be happening here is that the learners are interpreting the stimulus words in terms of a suitable English sounding word rather than reacting to the French
stimulus. The final type of unFrench association is where the stimulus word is used as a base to generate a morphologically related response word. There were three examples of this type: CONFORT ~ confortable, BEAU ~ belle and MALADIE ~ malade. For the secondary and tertiary responses, the number of unFrench responses is considerably higher, 54% and 60% respectively. Here again there are a number of clang responses, and several examples of misunderstandings of the SEL ~ vendre type. The fact that this discussion has been limited to the three most frequent responses may make these typically unFrench responses seem less important than they really are. These three responses account for only 33% of the total number of responses made by the learners, and unFrench responses are much more common among the less frequent responses. To illustrate this point, Table 6 contains the whole range of responses produced to three of the stimulus words. In this table, “French response” includes any word that appears in Rosenzweig’s norms, even words occurring only once in the list of responses generated by a group of 378 subjects. Even with a criterion as lenient as this, it is clear that only a fraction of the responses produced by the learners can be classified as French-like associations. Table 6 also contains the complete set of native-speaker responses for comparison. These three sets of responses are fairly typical of the complete responses to the 100 stimulus words. With less frequent words such as LISSE and RUGUEUX, the number of non-responders and those who claim not to know the stimulus word rises. In the case of very frequent words such as HOMME or BLANC, the number of individual responses is lower, and the number of respondents contributing to the most frequent responses is rather higher than in these examples. The data in Table 6 is untypical in that there are few examples of clang associates.
This is probably due to the fact that two of the stimulus words are close cognates of English words. Clang associates are particularly common with less frequent French stimuli. Other points worth noting are the complete absence of some very frequent responses made by native speakers from the learners’ data, and the very small number of syntagmatic responses. MEMOIRE gives rise to no syntagmatic responses, although there are a number of examples of this type in the native speaker data. LONG produces mainly paradigmatic responses. PAIN produces a number of syntagmatic responses – beurre, eau, fromage – which are phrases, but only two examples of genuine syntagmas – manger and grillé. There is no evidence in the data as a whole that the learners produce syntagmatic responses in any systematic way.
Table 6. Three Complete Response Sets. The table shows the complete response set generated for three stimulus words by the learner group (N=76) and by Rosenzweig’s student group (N=378). For the learner group, “French responses” are marked with *. Responses marked $ were generated by subjects who claimed not to know the meaning of the stimulus word. N is the number of subjects giving each of the responses listed on that row.

Responses to PAIN

Learner Responses
  N  Responses
 28  *beurre
  4  *manger
  3  *couteau, gâteau
  2  *eau, *lait, doigt, malade
  1  français, provisions, mal de mer, *grillé, confort, margarine, docteur,
     *flutes, être, anxious, guerre, toucher, baguette, yeux, pense, berre,
     porter, pape, bière, cou
  2  illegible
  2  no response
  1  $merci, $animaux, $tabac

Native Spkr Responses
  N  Responses
 30  vin (et vin)
 19  blanc, manger
 16  faim
 14  mie (de mie)
 13  dur
 12  nourriture
 11  sec
  9  bis
  8  quotidien
  7  boulanger
  6  beurre
  5  blé, farine
  4  amour, bon, frais, noir
  3  aliment, miche, sel, viande
  2  boulangerie, brûlé, couteau, croissant, croûte, cuisine, épice, lait,
     main, pauvre, repas, sucre, table, travail, vie
  1  Amour et Fantaisie, Amour et Jalousie, appétit, beau, besoin, brioche,
     céleste, chaud, chocolat, Christ, cidre, corbeille è couper, craque,
     déjeuner, Dieu, doré, drôle, eau, fantaisie, film, flûte, four, fromage,
     gâteau, gourmand, goûter, grillé, grossir, guerre, habitude, homme,
     justice, labeur, long, mâcher, miettes, moralité, nécesaire, odeur,
     pain bis, pamplemousse, planche, prison, rassis, régime, repos, sandwich,
     saveur, seigle, sueur, tartine, tendre, trou, vivant

Responses to LONG

Learner Responses
  N  Responses
 18  *petit(e)
  8  *grand(e)
  6  *court
  3  liste
  2  temp, temps, longtemps, cours
  1  *cheveux, cour, cheveau, *vite, *loin, corte, pont, *kilomètre, vide,
     règle, lasse, lion, tard, jambes, *giraffe, chanson, longleterre,
     lettre, Angelterre
  1  illegible
  6  no response
  1  $pantalons, $vautours, $short

Native Spkr Responses
  N  Responses
 65  court
 34  large
 10  route
  9  chemin, mince
  8  jour(s)
  6  bâton, grand, pain
  5  maigre, petit, serpent
  4  étroit, fil, infini, jambe, nez
  3  courrier, ligne, règle, tige
  2  arbre, bras, cheveux, cou, fatigue, lent, mètre, rifle, ruban, train,
     trajet, ver
  1  adjectif, allongé, asperge, attente, baguette, barbe, bête, bois, bond,
     bref, Chine, corde, couloir, cour, couteau, crayon, discours,
     Don Quichotte, ennui, enneuyeux, espace, étang, étendu, fatigant, girafe,
     gouttière, haut, héron, hiver, horizon, immense, immensité, indéfini,
     island, jour_sans_pain, jumeau, kilomètre, loin, longévité, longitudinal,
     long_way, main, mariage, Mississippi, moi, mur, ovale, patiente, patte,
     pin, plaisir, pluie, pointe, rail, rigide, rude, rue, scieur, temps,
     trait, triste, turban, tuyau, vite

Responses to MEMOIRE

Learner Responses
  N  Responses
 13  tête
  4  *livre
  3  enfant
  2  diarie, oublier, *mort, maison
  1  history, *histoire, Italie, vert, lire, école, remembre, jaune, morter,
     aimer, pense, bord, fleurs, *vacances, pape, cahier, nouveau, lettre,
     naître, matin, monton, moment, belle, grandparents, mal, libre, prendre,
     manger, devenir, remoir, demain, pleut, mourir, non, conservatoire,
     se_lève
  0  illegible
  6  no response

Native Spkr Responses
  N  Responses
 50  souvenir
 19  intelligence
  7  leçon
  6  livre, oubli
  5  fidèle, mot, travail
  4  cerveau, facilité, habitude, homme
  3  Bergson, bonne, chance, courte, faculté, faible, passé, savoir, trou,
     visuelle
  2  affective, amnésie, d’un_âne, apprendre, association, Chateaubriand,
     déficiente, effort, géographie, histoire, idée, intellectuel, maladie,
     mauvaise, mort, outre-tombe, pensée, psychologie, rappeler, santé, test
  1  abstrait, absurde, analyse, ancien, d’ange, alphabet, appétit,
     aprentissage, atomisme, attention, aucune, bêtise, blanc, cerveille,
     cheval, par coeur, compliquée, conscience, couloir, cours, défaillance,
     défaut, défectueuse, différente, difficile, difficulté, distraction,
     document, durée, écrit, ennui, énorme, épreuve, esprit, étude, examen,
     fable, fatigue, foi, folie, force, fuite, de_Gaulle, grenier, grimoire,
     gros_livre, imagination, instinct, jeunesse, journal, lecture, lente,
     localisation, lyre, machine, de_médecin, mémoire, mémorisation, mère,
     mnémonie, moyen, moyen, noms_propres, pas, passable, pathologique,
     penser, peu, physique, philo, Piéron, poésie, poisson, précise, qualité,
     Rabelais, rappel, rapidité, récitation, réponse, réserve, rétention,
     Ribot, Ségur, sensibilité, Mme_de_Sévigné, simple, songe, sonnet,
     table de multiplication, tombe, trace, trouver, utile, vacances,
     vacillantes, volonté

Discussion

There are two possible approaches that we can take towards the data presented above. The first is to take the very obvious discrepancies between the association patterns of learners and native speakers as indicative of serious inadequacies in the learners’ grasp of French. Ideally, it might be argued, learners ought to aim at performing like native speakers on every language task, and this ideal could be applied not only to primary language activities such as listening and speaking, but also to secondary activities such as the word association task. These secondary activities are not just academic curiosities: they are a useful way of investigating the way speakers’ knowledge of their language is structured and stored. Word associations
clearly tell us something about the way our mental dictionaries are organised. The data suggests that the native speaker’s mental dictionary is organised mainly on semantic lines, rather more like a thesaurus than a conventional dictionary. Words of similar meaning, or words that have the same range of convenience are stored in such a way that they readily evoke each other. In the learners’ case, however, this semantic organisation seems to be much less well established. The learners studied here do show some evidence of semantic organisation, but this is mainly dependent on translation between French and English. There also appears to be a conflicting principle of organisation, which makes use of the forms of words rather than their meaning. Even among respondents who claimed to have understood the meaning of the stimulus word, there is a strong tendency for totally extraneous words to emerge as associates. These responses are not related to the form or the meaning of the stimulus word. This lack of a proper semantic organisation for foreign language words may explain a large part of the difficulty that learners experience in processing both written and spoken foreign language material. Receptive skills rely heavily on a predictive process whereby the reader/listener anticipates what is about to appear, and checks these predictions against what does actually appear in the speech stream or text. A semantically based lexicon would obviously be effective here. It is usually possible to predict at least part of the meaning that your interlocutors are trying to convey, even though it is not always possible to predict the exact words that they will use. 
If we imagine that predicting the occurrence of a particular word brings to mind not only that word, but a whole cluster of other words that are closely related to it, then in a semantically organised lexicon, all the words brought to mind would be relevant to the matter in hand, and it is highly likely that one of the words in this cluster will match what appears in the utterance or text. A dictionary that was organised along non-semantic lines would be less efficient, since the cluster of words would contain a large number of items that were totally irrelevant to the message in hand. A dictionary based on formal criteria, for example, would bring to mind a whole cluster of similar sounding words, and this would be confusing even when the predicted word was correctly anticipated. In effect, learners with such a mental dictionary would be bombarded with irrelevant messages, which would make it very difficult for them to extract the true meaning of what they are trying to understand. If this characterisation of the learner’s mental dictionary as lacking a proper semantic organisation is a true description, then one implication would be that we ought to put a considerable research effort into developing learning methods which could lead learners to develop mental lexicons that are properly structured, and as closely as possible like those of native speakers. On the other hand, there is a second equally plausible, but quite contradictory approach which could be taken: to claim that though there are large and obvious
discrepancies between the learners and the native speakers, they are not really of any importance. It might be the case that all learners go through a phase when their foreign language lexicon is organised on non-semantic criteria, or indeed even randomly. If the lexicon was relatively small, this might not really matter, and it might be the case that given enough exposure the lexicon reorganises itself on semantic grounds when the number of words it contains becomes large enough to make efficient organisation important. Our knowledge of how learners acquire foreign language vocabulary, and how this part of their competence is elaborated, is so slight that there is not really any evidence available which could indicate which of these two approaches is more likely to be the correct one. This rather unhappy state of affairs has three main causes. Firstly, most of the major developments in applied linguistics in the last decade have been chiefly concerned with aspects of syntactic development. This is due to the existence of well-developed and useful models which have been worked out in the course of studies of first language acquisition in children. This work is obviously important, but it is also important to remember that syntactic problems are only part of a whole set of problems faced by learners of foreign languages. Syntax is not a serious source of difficulty for more advanced learners, and vocabulary problems are probably much more serious once the early phases of learning are past. Secondly, where vocabulary problems have been studied, this has almost always been from the point of view of the teacher, the tester or the course writer, rather than that of the learner. West’s work on the frequency of English words as a criterion for inclusion in text books (West 1953) and the work on français fondamental (Gougenheim et al. 1965) are good examples of this. 
Such work is clearly of great value, but it leaves unasked a number of questions of a fundamental kind about the psychological aspects of acquiring foreign language vocabulary. Thirdly, the small amount of work that has looked at learners acquiring vocabulary has usually assumed that learning a foreign word is merely a matter of being able to recognise that HOMME means MAN. The model that underlies this kind of thinking is an adaptation of the paired-associate idea found in psychological work on verbal learning, and implies that native language words and foreign words are linked together in simple stimulus response relationships. This is an impoverished view of the complexities involved, though. It assumes that vocabulary items are discrete, and ignores the networks of semantic relations that exist between words, and the fact that sets of related words in one language rarely map in any simple way onto the equivalent set in another language. More importantly, by defining the problem in terms of inter-language pairs, any comparison between what a learner
does with a foreign word and what a native speaker does are explicitly ruled out of consideration. This last point is an important one. ‘Knowing a word’ for a native speaker is a complex and multi-faceted skill, perhaps best described in behavioural terms as the ability to react to a word in ways which are considered appropriate by the speech community. Many learners are incapable of reacting appropriately to a word, even though technically they know its meaning and might be able to use it in a sentence. Two examples will suffice. Native speakers have little difficulty recognising words spoken against a background of noise, but even fluent learners are very much less tolerant of noise, and can fail to recognise words at noise levels which have no effect at all on the performance of native speakers. Native speakers can read single words exposed on a screen for as little as 30 milliseconds, but learners require much longer exposure times, even when the words tested are very common ones. Being able to perceive words in noise, or read words quickly are both examples of the type of skill all native speakers are expected to have by their speech community. Both are important subcomponents of the ability to communicate. It is clearly important that learners should be trained to share these appropriate reactions, so that they can perform these tasks, and others like them, with something like the facility found in native speakers. The case of word associations is not so clearly important as the activities mentioned above, as there is a very wide range of tolerance found among native speakers, and since the production of word associations is not so clearly related to ordinary language activities. 
My own feeling, however, is that all the various types of language activity are reflections of the same underlying basic skills, and that if we could develop learning methods that, as a side effect, produced learners with native-like association patterns, we would also be producing learners who were better able to communicate in their foreign language.
Chapter 2
Word associations in a foreign language

Lexicography for L2 learners is a well-developed and influential part of research in Applied Linguistics. Most of this work deals with the linguistic features of words, and very little of it is concerned with a related, equally interesting, but much more elusive question: what does a learner’s mental lexicon look like, and how is it different from the mental lexicon of a monolingual native speaker? As a part of a preliminary skirmish into this area, my students and I have been using word association tests. So far we have produced a small number of interesting, but unsurprising findings, and a large number of methodological puzzles and problems. The main findings have already been published elsewhere, and so in this paper I shall discuss them very briefly before dealing at greater length with the problems and their implications for further research.

As pointed out in the previous chapter, the basic word association game is extremely simple. It requires two players: one whose task is to call out or show single words, and a second whose task is to respond to these words with the first word that comes into his or her head. Despite its popular image as a sure-fire way of probing people’s innermost secrets, the most striking thing about associations is that they are actually extremely boring and predictable. Even relatively unpredictable stimulus words like MEMORY or MUSIC still produce a very limited range of responses. With a hundred people, you would be likely to get about 25 to 30 different responses, but most of these will occur more than twice, and only a relatively small number will be unique responses. Using bigger groups of subjects does not make very much difference to this pattern; responses tend to stabilize with groups of fifty or more, and using a group very much larger than this makes little difference to the range or pattern of responses.
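The kind of tally described above (how many different responses a group gives to one stimulus, which response dominates, and how many responses occur only once) can be sketched in a few lines of Python. This is only an illustration: the function name summarise_responses and the ten responses to MUSIC are invented for the example and are not data from any of the studies discussed here.

```python
from collections import Counter

def summarise_responses(responses):
    """Summarise the responses a group of subjects gave to one stimulus word."""
    # Normalise case and spacing so that "Song" and "song " count as one response.
    counts = Counter(r.strip().lower() for r in responses)
    primary, primary_count = counts.most_common(1)[0]
    # Responses given by exactly one subject are the "unique" responses.
    unique = sorted(word for word, n in counts.items() if n == 1)
    return {
        "subjects": len(responses),
        "different_responses": len(counts),
        "primary_response": primary,
        "primary_share": primary_count / len(responses),
        "unique_responses": unique,
    }

# Invented responses from ten imaginary subjects to the stimulus MUSIC.
music = ["song", "song", "song", "song", "dance",
         "dance", "note", "note", "piano", "loud"]
summary = summarise_responses(music)
print(summary)
```

Running the same summary on groups of different sizes would be one simple way of checking the claim that the pattern of responses changes little once the group reaches fifty or so subjects.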
It is customary to claim that word association responses generally fall into two main classes called syntagmatic associations and paradigmatic associations. These terms have much the same meaning as they do in Saussure. Syntagmatic associations are responses which form an obvious sequential link with the stimulus word. Given DOG, for example, bark, spotted, naughty, or bite would generally be classified as syntagmatic responses. Responses which are from the same grammatical form class as the stimulus word are classed as paradigmatic. Thus, given DOG,
cat, wolf or animal would all be classified as paradigmatic responses. Personally, I have found that this distinction is very difficult to work in practice, especially when you cannot refer back to the testee for elucidation, but this difficulty is not generally commented on in the literature. The distinction is important because it is generally held that most normal native speaking adults have a tendency to produce paradigmatic responses in preference to syntagmatic ones. Children, on the other hand, tend to prefer syntagmatic responses, at least until they reach the age of seven or so. Children also tend to produce large numbers of clang associates – i.e., responses which are clearly related to certain phonological features of the stimulus word, but bear no obvious semantic relationship to it. Rhyming responses, assonance, responses with the same initial sounds as the stimulus, or a similar prominent consonant cluster are common types of clang associate. The word associations produced by non-native speakers differ fairly systematically from those produced by native speakers. Surprisingly, learners’ responses tend to be more varied and less homogeneous than the responses of a comparable group of native speakers. This is an odd finding because learners must have a smaller, more limited vocabulary than native speakers, and this might lead one to expect a more limited range of possible responses. Learner responses are not generally restricted to a subset of the more common responses made by native speakers, however. On the contrary, learners consistently produce responses which never appear among those made by native speakers, and in extreme cases, it is possible to find instances of stimulus words for which the list of native speaker and learner responses share practically no words in common. The reasons for this are not wholly clear, but one contributory factor is the fact that learners have a tendency to produce clang associations like young children. 
A second contributory factor is that learners very frequently misunderstand a stimulus word, mistaking it for a word that has a vague phonological resemblance to the stimulus. This clearly leads to maverick responses, but these cannot be dismissed out of hand. The frequency of the phenomenon suggests that actually identifying foreign language words reliably is a major problem for many learners, and this seems to be the case even when the words are simple, and when the learners themselves claim to know them. Some examples of learner responses of this type are shown in Table 1, along with a set of plausible interpretations. This sort of data, taken together with the fact that learner responses tend to be relatively heterogeneous anyway, suggests that the semantic links between words in the learner's mental lexicon are fairly tenuous ones, easily overridden by phonological similarities, in a way that is very uncharacteristic of native speakers. So much, then, for the basic findings. What about further research based on these foundations? The word association test is so simple to use, and produces such a wealth of data with a minimum of effort, that one would expect to find a
Table 1. Associations to French stimulus words which seem to be based on misinterpretations of some sort

Stimulus    Response      Source of confusion?
béton       animal        bête
béton       stupide       bête
béton       conducteur    bâton
béton       orchestre     bâton
béton       téléphoner    jeton
béton       Normandie     breton
fendre      permettre     défendre?
naguère     eau           nager
semelle     dessert       semolina (?)
semelle     odeur         smell
traire      essayer       try
cruche      important     crucial
émail       lettre        mail
émail       chevalier     mail
dru         dessiner      drew
toupie      argent        2p (?)
toupie      cheveux       toupé
risible     lavable       rinsable (?)
risible     incre         rinsable (?)
jeter       hurler        hurl
mou         vache         moo
etc...
large amount of research using this paradigm. Surprisingly, this is not the case. A number of studies do exist (see Meara 1981 for a survey of this work), but they all seem to cover much the same ground, producing little in the way of new findings, and rarely even trying to break new ground. There are no theoretical models which account satisfactorily for word association behaviour in a second language, and consequently almost all the work published so far (including my own study reported in Chapter 1, alas) has been content merely to describe the sorts of responses that learners produce, together with a minimal statistical analysis. It seems to me that one of the prime reasons for this lack of development is that far too little consideration has been given to what words should be used as stimuli. Some of the published work makes use of idiosyncratic lists from which it is difficult to make generalizations. (An extreme case of this is Ruke-Dravina (1971) who used only four stimulus words in her study of Latvian-Swedish bilinguals.) Generally, where idiosyncratic lists of stimuli are used there is no discussion of why these words were chosen, or why they might be considered especially worthy of note. This is
unfortunate because it means that discrepant results can always be explained away in terms of the stimuli used, and there is no incentive to incorporate these discrepancies into a coherent overall framework. The alternative to idiosyncratic lists is to use one of the many standard lists of stimuli – generally the Kent-Rosanoff list. This list of words was first used by Kent and Rosanoff in 1910 as the basis for a study of the word associations made by mentally ill subjects. Since then, it has been widely used in word association research, both in English and – in translation – in a range of other major languages. The list consists of 100 relatively frequent words, all of which produce fairly stable response patterns in normal native-speaker adults. The extensive use of this list means that a very large number of sets of association norms are available: i.e., collections of responses based on large groups of similar subjects, (cf. for example, Postman & Keppel 1970). In theory, this ought to make it possible to do useful and illuminating comparisons between the responses of learners and native speakers, and, indeed, a number of studies have attempted to do this. Unfortunately, the Kent-Rosanoff list is not a particularly useful one for research on second language learners. The most important reason for this is that the high frequency words used tend to produce very similar responses in both the TL and the NL. Adjectives, for instance, tend to produce their polar opposites, so one finds BLACK ~ white; NOIR ~ blanc; MOU ~ dur; SOFT ~ hard. This makes it difficult to decide whether a Subject’s response to a stimulus word is really a direct L2-L2 response, or whether it is produced via translation into the mother tongue and back again. The same argument applies in the case of nouns which are marked for sex: these tend to produce the opposite sex form as a response; so, KING ~ queen; ROI ~ reine; and BOY ~ girl; GARÇON ~ fille. 
As far as English and French are concerned, about 60% of the items in the Kent-Rosanoff list are of this sort. I do not know the figures for any other pair of languages, but it seems probable that most European languages at least are likely to fall in the same general range. This means that the list as a whole is not a very sensitive tool when it is used with non-native speakers: fewer than half the words are really effective items. A second problem with the Kent-Rosanoff list is again one that derives from its one apparent advantage: the use of frequent words. Almost all the words in the list lie in the highest frequency band – in the French version, for instance, only four words do not appear in either the first or second steps of the Français Fondamental. This means that all the words tested are among the first words that learners acquire in their second language – often at a stage where learning new words is an unfamiliar and strange experience. This has two drawbacks. Firstly, we know very little about how second language vocabulary is acquired, but it seems a reasonable supposition that the early stages of learning a language might produce acquisition patterns that differ quite radically from what goes on when more advanced, fairly fluent speakers learn words. It is possible that the resulting
word association behaviour with basic L2 words might be quite different from what happens with more “advanced” vocabulary, and it might be quite wrong to generalize on the basis of what happens with a hundred highly frequent words learned in peculiar circumstances. Secondly, the use of the Kent-Rosanoff lists has had the effect of concentrating attention on a small number of words which form the hard core of the learners’ L2 vocabulary, and this has distracted attention away from what is potentially a much more interesting problem: what is happening at the periphery of a learner’s vocabulary – how new words are acquired and integrated into the existing word stock. The third problem with the Kent-Rosanoff list is that the apparent bonus of being able to compare learners’ responses with the published norms for native speakers turns out on closer inspection to be of doubtful value. In Chapter 1, I suggested that it was reasonable to expect learners to aim towards producing nativelike responses on a word association test, for the simple reason that one wants learners to behave like native speakers in all types of language behaviour. Several people have pointed out to me, however, that this argument is not a good one. Teaching a language aims to produce people who are bilingual, not mere replicas of monolingual speakers. It would, therefore, be more appropriate to compare the associations of learners with those of successful bilingual speakers, and not with native speakers. Unfortunately, of course, the necessary background work needed to make such comparisons has not yet been carried out. These three reasons, and particularly the first two, seem to me to be strong arguments for abandoning the use of the Kent-Rosanoff list with non-native speakers. It would be nice to be able to suggest a concrete alternative at this stage, but this is obviously very difficult to do. 
What would count as an appropriate set of stimuli depends very much on what questions you are trying to answer. Perhaps the general point to be made is that experimenters do need to think about their choice of words more carefully. Tried and trusted tools which work for L1 situations are rarely wholly appropriate for L2 situations, and word association research is clearly one of these cases. The problem of what words to use as stimuli in word association research with non-native speakers is one that requires thought, but not a topic that raises any really important questions. Now that we have got it out of the way, we can pass on to three topics which seem to me to be of rather more interest, both theoretical and practical. These are the stability of learners’ associations, what happens to new words as they are acquired, and on a slightly different tack, what we can deduce from obvious errors in word association tests about the way words are stored and handled by learners. The stability of learners’ responses in word association tasks is an important methodological question that has not been generally considered in the literature.
We know that native speakers’ associations are relatively stable: subjects tend to give the same responses to stimulus words, and this tendency is even more marked if we consider the responses of whole groups of subjects. This means that one can be reasonably confident that a single test is a reliable tool to use with native speakers, and that it is unlikely that a second test would produce wildly different response patterns. It is much less clear that this assumption can safely be made about learners, however. Learners’ vocabularies are by definition in a state of flux, and not fixed; learners often tend to give idiosyncratic responses; the indications are that semantic links between words in the learner’s mental lexicon are somewhat tenuous – all these considerations would lead one to suspect that learners’ responses could be considerably less stable than the response patterns of native speakers. If this turned out to be so, it would severely reduce the value of one-off studies of learners, and it would be impossible to ascribe to studies of learners the same sort of status we usually ascribe to one-off studies of native speakers. It would also mean that considerable caution would be needed in the interpretation of studies such as that of Randall (1981). Randall attempted to relate changes in association responses to measurable changes in the proficiency of a group of EFL learners. However, if learners’ responses are generally unstable, then there is no way of deciding whether observed changes are really permanent ones, and thus represent real progress, or whether they are just part of the random flux of the whole system.

We have carried out two studies on stability so far, with a third study planned. These studies show rather mixed results. Morrison (1981) looked at Finnish-English bilingual children and found that they were equally stable, or rather equally unstable, in both languages. 
This is not very surprising, however, since children tend to be fairly unstable anyway. Hughes (1981), in a bigger and better controlled study of several groups of ESL learners found that responses on the whole were very unstable, but the general level of stability differed considerably from group to group and from word to word. There were, however, no obvious reasons for these discrepancies, and all we can say at the moment is that it seems safest to assume that learners’ word associations are not very stable. This is obviously an unsatisfactory state of affairs, as it effectively inhibits any other research in this area. It is equally obvious, however, that learners’ responses are not totally unstable, and our immediate aim is to work out what conditions lead to reasonably stable patterns and what are the causes of the instability. The second question that has interested us is what happens to new words which are acquired by learners, and how do they become integrated into the learners’ mental lexicon? It is often implicitly assumed that learning vocabulary is an immediate all-or-nothing affair – when words are studied, they are either acquired or not. This is a position which seems inherently implausible to me. Most
learners have the experience of knowing that they know a word, but being quite unable to say what it means, even though looking the word up in the dictionary produces an instant ‘of course!’ reaction. This experience and others like it, suggest that learning vocabulary is not just a question of pairing L2 stimuli and L1 meanings often enough for them to be ‘learned’. Some sort of complex absorption processes are likely to be involved, which allow words which have just been met to gradually find their proper place in the learner’s L2 lexicon. Perhaps it would be possible to tap this process by recording the associations made to new words and observing how these associations change over a period of time? So far we have carried out one experiment on these lines (see Beck 1981 for details). A group of English speaking students learning French at ‘A-level’ were given a list of forty French words that they were unlikely to know, and asked to produce chains of responses to each one. Not surprisingly this produced few responses overall, a large number of clang-type responses and only a handful of native-speaker-like responses. Subsequently twenty of the words were introduced into the students’ class-work in a non-obtrusive fashion, and two further tests were given over a twelve week period. The results of the first re-test showed that there was no real change in the responses to the words that had not been used in class teaching. They still produced a low level of total responses, lots of clang associations and few native-speaker-like responses. In contrast, the taught words changed markedly, producing a greater number of total responses, fewer clang associates, and a greater proportion of native-like responses. The second re-test again showed no change in the untaught words. The taught words showed a slight decline in the total number of responses they evoked, but an increase in the proportion of native-like responses. 
This data clearly confirms the view that learning vocabulary is not an instantaneous process. Changes are still taking place twelve weeks after the initial presentation of the taught words. Indeed, given that the total number of responses was far short of what one would expect of a fluent speaker, and given that the number of nativelike responses was less than 20% of the total, it seems plausible to suggest that the integration of these words was far from complete, and that these changes are likely to continue for quite long periods of time. The questions to be asked at this stage, then, are: how long does this stabilizing period last? is it the same for all words and for all learners? what environmental factors reduce or extend it? It should be possible to get answers, at least, of a preliminary sort, to all these questions by means of word association tests, and further work along these lines is projected. The third question which is currently interesting us concerns the large proportion of responses made by learners which are clearly ascribable to errors – either errors in the identification of the stimulus word or error in the choice of a response. These errors bear some resemblance to the sorts of errors native speakers of English
make when they produce malapropisms and slips of the tongue. The errors listed in Table 1, for example, show that certain features of the target tend to be preserved – initial consonants and salient consonant clusters seem to be fairly robust, while vowels and medial syllables seem to be particularly vulnerable, and these are the same features that crop up consistently in work on errors in English as an L1. This suggests that the mechanisms which underlie vocabulary errors in an L2 might be closely related to the sources of errors of vocabulary in an L1. Given that such errors typically occur with infrequent words, and that L2 words are by definition relatively infrequent items in the learner’s total word stock, this is perhaps not very surprising. Nevertheless, it does suggest that the traditional emphasis on L2 as a self-contained, independent system may be an unhelpful one, at least as far as vocabulary is concerned, and that a lot might be gained if we began to consider the learners’ total vocabulary, in all the languages they know, as an integrated whole, and not just as a set of small discrete components.
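The observation that consonants survive these confusions while vowels do not can be checked informally against some of the pairs in Table 1. The sketch below is a rough illustration only: it works on spelling rather than sound, which is an assumption made purely for convenience, and the function name consonant_skeleton is invented for the example. It strips accents and vowels from each word and compares what is left.

```python
import unicodedata

VOWELS = set("aeiouy")

def consonant_skeleton(word):
    """Return the consonant letters of a word, ignoring accents and vowels."""
    # Decompose accented letters (é becomes e plus a combining accent),
    # then drop the combining accents so that é and â are treated as vowels.
    decomposed = unicodedata.normalize("NFD", word.lower())
    letters = [c for c in decomposed if not unicodedata.combining(c)]
    return "".join(c for c in letters if c.isalpha() and c not in VOWELS)

# Stimulus words and suggested sources of confusion, taken from Table 1.
pairs = [("béton", "bâton"), ("béton", "jeton"), ("semelle", "smell"),
         ("émail", "mail"), ("toupie", "toupé")]

for stimulus, confused_with in pairs:
    a = consonant_skeleton(stimulus)
    b = consonant_skeleton(confused_with)
    print(stimulus, a, confused_with, b, a == b)
```

A spelling-based comparison of this kind is crude, and a serious analysis would need to work with phonological transcriptions, but it is enough to show how often the consonant frame of the stimulus is carried over intact into the word it is mistaken for.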
Conclusion

This paper has discussed some of the findings and some of the interesting problems that have arisen out of our work on word associations – itself part of a wider project on Vocabulary Acquisition in a Second Language. Vocabulary Acquisition is generally considered to be a topic of little inherent interest and of slight theoretical importance, and even on the practical level it is very often ignored or treated in a cavalier fashion. I hope that this paper will help to convince sceptics that these attitudes are unjustified, and that vocabulary acquisition is not just an interesting area to work on, but potentially quite an exciting one too.
Section 2
Associations as productive vocabulary

Introduction

In the previous section, I introduced the idea of word association tests, and discussed some of the problems that we encountered when we tried to use these tests with L2 speakers. This section takes a rather different tack, and illustrates how word association data can be used to examine issues which are usually thought of in different terms. Zareva (2005:560) has noted that “the analysis of the quantitative features of ... word associations was found to be of little practical usefulness... [but] we do believe that associative measures hold a potential as valid measures of L2 learners’ lexical knowledge that need to be re-examined in an assessment context”. And this was pretty much the position I had reached after my early work in this area. I came away from this early work with a feeling that the data word association studies generated was enormously rich, but it was difficult to know how to exploit this richness.

By the mid 1980s, I had got involved in a series of studies aimed at measuring vocabulary size in L2 speakers. This project started out with some very exploratory work on YES/NO tests (e.g., Meara & Buxton 1987) – tests in which we simply asked learners to indicate whether they knew the meaning of the target words or not. It soon turned out that the YES/NO format was much more powerful than it looked at first sight. Specifically, it opened up the possibility that we might be able to make reasonably accurate statements about the size of a learner’s vocabulary – as long as we were interested in making claims about passive, receptive vocabulary, at least. The project soon mushroomed into a sizeable programme, which involved the development of tests in a whole range of languages, and the development of computer platforms to deliver them. 
Eventually, I felt pretty confident that we had a reasonable testing procedure, and started to think about how it could be used to develop the theory of vocabulary acquisition in an L2. I was thinking in terms of developing a two or three dimensional model of vocabulary acquisition which would allow us to track the relationship between vocabulary size, vocabulary organisation, and vocabulary accessibility – ideas we will return to in Section 3.
In the meantime, however, as interest in the YES/NO tests developed, we were getting a lot of criticism that our tests “merely” addressed receptive vocabulary skills, and had nothing to say about the more interesting facets of productive vocabulary. My first reaction to this criticism was that it didn’t matter very much. True, the YES/NO tests were a very minimal vocabulary test, testing little more than students’ ability to recognise that a word form existed, but I felt that there were strong philosophical reasons for adopting this approach. Simple as they were, the YES/NO tests threw up an enormous range of problems, particularly when we started to develop them in languages other than English (Meara 2005), but at least it was obvious that these problems existed. With more complex test formats, the problems might have been much less obvious. I also felt that the relationship between productive and receptive vocabulary was probably a fairly straightforward one: receptive vocabulary would generally be larger than productive vocabulary, and it was simply an empirical question to determine how big this difference was in a typical case.

Discussions with colleagues made me realise that this position wasn’t as straightforward as I thought. There are two obvious claims that we might want to make about the relationship between productive and receptive vocabulary. One is that productive vocabulary is typically a constant percentage of receptive vocabulary, so, for example, we might want to argue that Subjects typically use about 50% or 75% of the vocabulary they know, while the remainder of their vocabulary remains in a passive state. This position implies that there is a substantial gap between receptive and productive vocabulary, and that this gap gets bigger in real terms as the Subjects’ vocabularies get bigger. 
The second position is that productive vocabulary lags behind receptive vocabulary by a constant amount, typically, say, a few hundred recently acquired words. The idea here would be that over time words gradually move along some sort of passive/active continuum, and passive words would therefore generally be words which had been acquired only recently by the learner. (This idea of an active/passive continuum is one which is widely accepted by the vocabulary research community, e.g., Melka Teichroew 1982, and has strongly influenced the way people think about vocabulary teaching. As we shall see in Section 3, however, I think there are reasons for rejecting this way of looking at things.) Both these ideas imply that there is some sort of linear relationship between receptive vocabulary size and productive vocabulary size, and if this were the case then we would be justified in using a receptive vocabulary size test like the YES/NO test as a substitute for a more comprehensive productive vocabulary test. A more interesting, but less obvious, claim is that the relationship between productive and receptive vocabulary size is not linear, but varies. For example, it is possible that receptive vocabulary grows in spurts, and that productive vocabulary grows in the consolidation periods between these spurts. Once you start thinking
in these terms, it becomes obvious that there are some really interesting theoretical questions to be asked about the way vocabularies grow, and how the relationship between receptive and productive vocabularies changes as a result of this growth and development. The obvious way to approach these issues was to take a reliable test of receptive vocabulary, and a reliable test of productive vocabulary, and compare the results. The YES/NO tests looked as though they could provide one half of the necessary tools, but it was much harder to identify a good test of productive vocabulary. Some work in this area had been undertaken by Laufer (1995) and by Laufer & Nation (1995, 1999), but for reasons which are explained in more detail in Chapter 3, neither of these approaches was without problems, and we felt there was perhaps some merit in looking at productive vocabulary from a different perspective. The basic problem was that most approaches to productive vocabulary relied on subjects producing texts for analysis, but these texts were highly topic dependent, which biased the types of words they elicited, and more importantly, they failed to elicit uncommon words in large numbers. Eighty percent or so of most texts come from the first 1000 words of the target language, and this makes it difficult to estimate the productive vocabulary of an author without resorting to some sophisticated mathematics, which probably didn’t apply to short texts anyway. Word association data was not restricted in this way, however. We therefore began to wonder whether it might be possible to develop a standardised word association task that would tap into the productive vocabulary of L2 learners, a task which would have good measurement characteristics, and allow us to make a preliminary foray into the areas outlined earlier in this section. Lex30, the instrument described in Chapters 3 and 4, was the outcome of this work.
Unlike some of our other experimental tools, Lex30 was widely and rapidly taken up by other researchers. It was favourably reviewed by Baba (2003), and found particular favour with researchers in Spain (e.g., Naves & Miralpeix 2002; Jiménez Catalán & Moreno Espinosa 2005). For a time we were worried that what we viewed as an exploratory tool was being adopted prematurely as a standard by over-eager researchers, though these fears receded somewhat as our confidence in the Lex30 approach grew. Our current view is that Lex30 is still in need of further validation, but it provides an interesting alternative to more traditional approaches (Morgan & Oberdeck 1930) and also more radical approaches (e.g., Meara & Bell 2001) to the question of how we assess productive vocabulary. The whole notion of productive vocabulary has turned out to be much less tractable than we expected it to be, and we suspect that it may in the long run turn out to be an idea which is best approached through intensive single subject studies rather than through studies of groups of learners in experimental situations. In the meantime,
the Lex30 approach remains an imaginative application of word associations in a difficult research context. Section 5 contains a detailed users’ manual for the Lex30 programs, and we do encourage readers to try them out for themselves. The current version can be downloaded from: http://www.lognostics.co.uk/
Chapter 3
Lex30
An improved method of assessing productive vocabulary in an L2
Introduction
This paper describes a tool which we believe can be used to make straightforward assessments of the productive vocabulary of non-native speakers of English. The data reported here are preliminary in the sense that we are not putting forward a well-developed and properly validated testing instrument. Rather, we are trying to address a complex ‘chicken and egg’ situation which is causing something of a blockage in the field of vocabulary research. This blockage arises from the fact that there are no well-established and easy-to-use tests of productive lexical skills. The nearest things we have to a useful tool in this area are Laufer and Nation’s tests (Laufer & Nation 1995, 1999). We think these are problematical for reasons which will be explained in more detail below. However, until we have some kind of test which might be interpreted, however loosely, as an index of productive vocabulary, it is unlikely that we will be able to make very much headway in this area. The tools described here, then, are intended as a first step in this direction. Our aim has been to develop a methodology which we think might be honed into something more formal. This paper first describes the methodology that we have developed, and then shows how the methodology could be used to make interesting comparisons between productive and receptive vocabulary in L2 learners.
Successful L2 learners are avid collectors of words, and tend to measure their own success by the number of words that they know. Current teaching materials and methodologies exploit and encourage this. The New Cambridge English Course, for example, proudly claims that “students will learn 900 or more common words and expressions during level 1 of the course” (Swan & Walter 1990:5).
The communicative language teaching techniques and comprehension-based teaching methodologies of the last two decades also attached more importance to vocabulary acquisition than did, for example, the grammar translation and audio-lingual approaches which dominated pre-1970 language teaching (Nunan 1995; Lightbown et al. 1998).
In most practical contexts, it is clear that communicative effectiveness is achieved more successfully by learners with a larger vocabulary than by learners with a more detailed command of a smaller one. It is not surprising then, that measurements of vocabulary size have been shown to correlate positively with proficiency levels in reading (Anderson & Freebody, 1981) and writing (Engber, 1995), and in general language proficiency (Meara & Jones, 1988). In practice, however, most claims of this sort have relied on measures of passive, receptive vocabulary knowledge, since it has been difficult to measure control of productive vocabulary effectively. The implicit assumption here is that active vocabulary knowledge can be reasonably extrapolated from measures of receptive knowledge. This assumption is not an implausible one. Few researchers would dispute that receptive vocabulary is probably larger than productive vocabulary, and that some level of receptive knowledge of a word must exist in order for the word to be produced. Nonetheless, one could imagine situations where the relationship between active and passive knowledge might not be straightforward, and for this reason, it would be useful to have an independent measure of active vocabulary. Unfortunately, it is much more difficult to assess productive vocabulary knowledge than it is to assess receptive vocabulary knowledge. The main reason for this is that the vocabulary produced by a learner, whether in written or spoken form, tends to be so context-specific that it is difficult to calculate from any small sample the true size or range of the learner’s productive vocabulary. It is also difficult to devise simple tasks which produce the large quantities of vocabulary that are necessary to make reasonable estimates. There are two principal methods of estimating productive vocabulary currently in use, but neither of these has fully resolved these problems. 
Controlled productive vocabulary tests prompt subjects to produce predetermined target words. Testees are given a sentence context, a definition, and/or the beginning of the target word, e.g.:
The book covers a series of isolated epi_________ from history.
and are required to complete the missing word – in this case episodes (Nation, 1983; Laufer & Nation, 1999). Free productive vocabulary tests such as Laufer & Nation’s (1995) lexical frequency profiling tests, analyse a written or spoken text generated by the subject, and categorise the vocabulary used in terms of frequent, less frequent and infrequent words. The higher the percentage count of infrequent words, the larger the subject’s productive vocabulary is estimated to be. There are problems inherent in these two types of test. The controlled productive vocabulary tests are effective mainly at low levels; when, for example, testees are expected to have a limited vocabulary size, the method allows a high proportion of these words to be tested. The controlled productive test used in
Laufer & Nation (1999) attempts to elicit 18 target words for each of five frequency bands: two thousand, three thousand, five thousand, the University Word List, and ten thousand. Although this approach seems to be effective at lower levels, it must be difficult to extrapolate about the size of a testee’s productive lexicon beyond a relatively small vocabulary. At the 10,000 word level, we are in effect testing 18 words from a pool of several thousand words, and using this to draw conclusions about the testee’s knowledge of all the other words in this pool. Suppose, for example, that we test the word fragrance with an item like:
The fra_________ of the flowers filled the room.
This item treads a fine line between receptive and productive skills: the production of the target word is dependent on receptive understanding of the surrounding context words. Additionally, it is possible that our subjects do not know fragrance, but do know scent, aroma, perfume, ..., similarly infrequent words that could easily have fitted the slot, but for the (helpful?) hint. The point is that this kind of test item can easily identify what testees do not know, but it is rather less successful at identifying the full extent of what they do know. In any case, if we are testing a vocabulary of any size, say, three or four thousand words, it would be impossibly difficult in practice to devise a comprehensive set of items large enough to provide the sort of coverage that we would need to get reliable estimates of productive vocabulary. The free productive vocabulary tests are problematical too. They are context-limited, although in many cases the effects of this are minimised by using a broad subject base (e.g., essays discussing a moral dilemma; cf. Laufer & Nation, 1995). In most cases, however, it is unclear that the material these tasks elicit genuinely encourages testees to ‘display’ their vocabulary in the way that a test of productive vocabulary would require. In addition to this, free productive vocabulary tests are not a cost-effective way of eliciting vocabulary: most text – even text generated by fluent native speakers – is predominantly made up from a small set of highly frequent words. A huge amount of text is needed to generate more than a handful of infrequent words, and it is often difficult to elicit texts of this length from non-native speakers. Laufer & Nation (1995), for example, reported that they needed to elicit two 300-word essays from their testees in order to obtain stable vocabulary size estimates. This required two hours of class time, a figure which would be prohibitive, except in special circumstances.
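The profiling idea behind the free productive tests discussed above can be illustrated with a minimal Python sketch. It is not Laufer & Nation's procedure or software: the function name `frequency_profile`, the band labels, and the toy word lists are all assumptions made for illustration, and a real profile would use full frequency lists and lemmatised text.

```python
from collections import Counter


def frequency_profile(tokens, bands):
    """Return the percentage of tokens that fall in each frequency band.

    tokens: list of word tokens from a learner's text.
    bands:  mapping of band name -> set of words (hypothetical word lists).
            Bands are checked in order; the first match wins.
    Tokens found in no band are reported under 'off-list'.
    """
    counts = Counter()
    for token in tokens:
        word = token.lower()
        for name, words in bands.items():
            if word in words:
                counts[name] += 1
                break
        else:
            counts["off-list"] += 1
    total = len(tokens)
    names = list(bands) + ["off-list"]
    return {name: (100.0 * counts[name] / total if total else 0.0)
            for name in names}


# Toy word lists, invented for illustration only.
TOY_BANDS = {
    "first 1000": {"the", "of", "a", "room", "flowers", "filled"},
    "second 1000": {"scent"},
}
text = "The scent of the flowers filled the room".split()
profile = frequency_profile(text, TOY_BANDS)
```

Even in this eight-word toy sentence, seven tokens land in the most frequent band, which is the pattern that makes long texts necessary before more than a handful of infrequent words appear.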
One superficially attractive alternative to continuous text as a source of productive vocabulary is the spew test (Palmberg, 1987; Waring, 1999). In spew tests, subjects are simply asked to produce words which share a common feature, e.g., words beginning with the letter B. In our view, research using spew tests has not lived up to its promise, however. There are major problems over standardisation of scoring
which have not been addressed, and although we think there is some potential in the method, we think that spew tests need a lot more development work before they can be used reliably. Clearly, then, there is a need for a cost-effective and efficient way of eliciting data from testees which can give us enough material to make a rough estimate about their productive vocabulary skills. The rest of this paper describes a new productive vocabulary test, which has been designed with these criteria in mind, and addresses the practical problems we have discussed. The test generates rich vocabulary output from testees, but it is easily administered, and can be scored automatically using a computer program. We believe that it therefore has the potential to be developed into a practical and effective research tool.
Lex30
This section describes Lex30, and discusses the sorts of data it generates and the analysis that we apply to these data to make estimates about the productive vocabulary of the testees.
The test format
The Lex30 task is basically a word association task, in which testees are presented with a list of stimulus words, and required to produce responses to these stimuli. There is no predetermined set of response target words for the subject to produce, and in this way, Lex30 resembles a free productive task. However, the stimulus words tend to impose some constraints on the responses, and Lex30 thus shares some of the advantages of context-limited productive tests. Word association tasks typically elicit vocabulary which is more varied and less constrained by context than free production tasks. The test consists of 30 stimulus words, which meet the following criteria:
1. All the stimulus words are highly frequent – in our experiment, the words were taken from Nation’s first 1000 word list (Nation, 1984), i.e., they are words which even a fairly low-level learner would be expected to recognise. This is a deliberate choice: it makes it possible to use the test with learners across a wide range of proficiency levels.
2. None of the stimulus words typically elicits a single, dominant primary response – the formal criterion that we adopted here was that the most frequent response to the stimulus words, as reported in the Edinburgh Associative Thesaurus (Kiss et al. 1973) should not exceed 25% of the reported responses. In this way, we avoided stimulus words like BLACK or DOG, which typically elicit a very narrow range of responses, and selected stimulus words which typically generate a wide variety of different responses.
3. Each of the stimulus words typically generates responses which are not common words – the formal criterion here was that at least half of the most common responses given by native speakers were not included in Nation’s 1000 word list (Nation 1984). In this way, the stimulus words give the testee a reasonable opportunity to generate a wide range of response words.
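The three selection criteria above could be checked mechanically along the following lines. This is only a sketch under stated assumptions: the function `is_suitable_stimulus`, the format of the association norms, and the `top_n` cut-off for "the most common responses" are hypothetical (the text does not say how many responses were inspected), and the counts are invented rather than taken from the Edinburgh Associative Thesaurus.

```python
def is_suitable_stimulus(word, response_counts, frequent_words,
                         max_primary_share=0.25, top_n=5):
    """Check one candidate stimulus word against the three criteria.

    response_counts: mapping of response word -> number of native speakers
                     giving it (hypothetical format for association norms).
    frequent_words:  set standing in for a first-1000 word list.
    top_n:           how many 'most common responses' to inspect; the text
                     does not give this number, so it is an assumption.
    """
    # Criterion 1: the stimulus itself must be a highly frequent word.
    if word not in frequent_words:
        return False
    total = sum(response_counts.values())
    if total == 0:
        return False
    # Criterion 2: the primary response must not exceed 25% of responses.
    if max(response_counts.values()) / total > max_primary_share:
        return False
    # Criterion 3: at least half of the most common responses are off the list.
    top = sorted(response_counts, key=response_counts.get, reverse=True)[:top_n]
    off_list = sum(1 for response in top if response not in frequent_words)
    return off_list * 2 >= len(top)


# Invented counts, for illustration only (not taken from any published norms).
FREQUENT = {"black", "white", "dark", "attack", "war"}
black_norms = {"white": 60, "dark": 20, "night": 20}
attack_norms = {"war": 20, "defend": 20, "assault": 20, "enemy": 20, "bomb": 20}

black_ok = is_suitable_stimulus("black", black_norms, FREQUENT)
attack_ok = is_suitable_stimulus("attack", attack_norms, FREQUENT)
```

With these invented counts, the first candidate is rejected because a single response accounts for 60% of the total, while the second passes all three checks.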
Subjects
A group of 46 adult learners of English as a foreign language were used as test subjects. These people were from a variety of L1 backgrounds ranging from Arabic to Icelandic. Their class teachers rated them from elementary to intermediate proficiency level.
Method
The testees were asked to write a series of response words (at least three if possible) for each stimulus word, using free word association (an example was worked through with each class before the test). Stimulus words were presented one at a time, and testees were allowed 30 seconds to respond to each stimulus word, after which the administrator called the number of the next stimulus word. The entire test therefore took 15 minutes to complete. For an example of a completed test see Appendix A. The testees also completed a standard Yes/No Vocabulary Size test (Meara & Jones, 1990). Both tests were completed within the same week.
Scoring
In order to score the test, each testee’s responses (approximately 90 per subject) were typed into a machine readable file. The stimulus words are discarded for the purpose of the analysis. Each of the responses was lemmatised so that inflectional suffixes (plural forms, past tenses, comparatives, etc.) and frequent regular derivational affixes (-able, -ly, etc.) were counted as examples of the base-forms of these words. Words with more unusual affixes were not lemmatised and were treated as separate words. For a full account of the criteria used, see Appendix B. The list of lemmatised suffixes corresponds to levels 2 and 3 of Bauer and Nation’s Word Families (Bauer & Nation, 1993). Once the stimulus words have been discarded, we are left with a short text generated by each testee, which typically contains about 90 different words. Each testee’s text is then processed using a program similar to Nation’s VocabProfile (Heatley & Nation, 1998). The program reports the frequency level of each word in the text, and produces a report profile for that testee. Table 1 illustrates a typical results profile. Level 0 words (high frequency structure words, proper names and numbers) and Level 1 words (the most frequent 1000 content words in English) score zero points. Any response which falls outside these two categories scores one point up to a maximum of 90. In the example given in Table 1, the score was (10+40)=50 points.

Table 1. A typical profile generated by Lex30

Subject   Level 0   Level 1   Level 2   Level 3+
A1        4         49        10        40
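The scoring rule described above can be sketched in a few lines of Python. This is not the Lex30 program itself: the function `lex30_score`, the lookup-table lemmatiser, and the tiny word lists are hypothetical stand-ins. The sketch also counts every response as written, since the text does not say how repeated responses are treated.

```python
def lex30_score(responses, level0, level1, lemmas=None, max_score=90):
    """Score one testee's responses, with the stimulus words already removed.

    level0, level1: sets standing in for the Level 0 and Level 1 word lists.
    lemmas:         mapping of inflected form -> base form; a toy stand-in
                    for the lemmatisation criteria, not the actual rules.
    One point is given for each response outside both lists, capped at
    max_score. Repeated responses are each counted (an assumption).
    """
    lemmas = lemmas or {}
    points = 0
    for response in responses:
        word = response.strip().lower()
        base = lemmas.get(word, word)
        if base and base not in level0 and base not in level1:
            points += 1
    return min(points, max_score)


# Toy lists, invented for illustration only.
LEVEL0 = {"the", "of", "three"}
LEVEL1 = {"dog", "house", "flower"}
LEMMAS = {"dogs": "dog", "flowers": "flower"}

responses = ["Dogs", "the", "fragrance", "aroma", "house", "flowers"]
score = lex30_score(responses, LEVEL0, LEVEL1, LEMMAS)
```

Here only "fragrance" and "aroma" fall outside the two lists, so the toy score is 2. Applied to the counts in Table 1, the same rule gives 10 + 40 = 50 points.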
Results
The results of the productive vocabulary test, Lex30, can be seen in Table 2. Not surprisingly, the number of structure words produced is low. Native speaker word association tests (Postman & Keppel, 1970; Kiss et al. 1973) also produce mostly content words. Most of the words produced by subjects fall into Nation’s first thousand category (Nation, 1984). Analysis of the completed tests shows that the first response to a stimulus word was usually a frequent word; the second, third and fourth responses were more likely to be less frequent words. About a third of the responses, on average, fell outside this highly frequent set of words, and some testees produced very large numbers of words outside this category (Figure 1).

Table 2. Mean profile for Lex30

        Level 0   Level 1   Level 2   Level 3+   Total Wds   Lex30 score
Mean    3.7       59.3      7.8       20.8       91.6        28.9
sd      3.6       13.9      3.6       11.4       24.2        13.9
The Lex30 scores were also compared with the results of the receptive Yes/No vocabulary size test. The maximum score on this test was 10,000: two subjects scored this maximum. The mean score on the standard Yes/No test was 5089, with a standard deviation of 2803. Figure 2 shows the relationship between testees’ scores on the two tests. The correlation between these two scores was 0.841 (p