THE EVOLUTION OF LANGUAGE
Proceedings of the 7th International Conference (EVOLANG7)
Barcelona, Spain, 12-15 March 2008

Editors
Andrew D. M. Smith, University of Edinburgh, UK
Kenny Smith, Northumbria University, UK
Ramon Ferrer i Cancho, Universitat de Barcelona, Spain
World Scientific
New Jersey · London · Singapore · Beijing · Shanghai · Hong Kong · Taipei · Chennai
Published by World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
THE EVOLUTION OF LANGUAGE
Proceedings of the 7th International Conference (EVOLANG7)
Copyright © 2008 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN-13 978-981-277-611-2
ISBN-10 981-277-611-7
Printed in Singapore by World Scientific Printers
Preface

This volume collects the refereed papers and abstracts of the 7th International Conference on the Evolution of Language (EVOLANG 7), held in Barcelona on 12-15 March 2008. Submissions to the conference were solicited in two forms, papers and abstracts, and this is reflected in the structure of this volume.

The biennial EVOLANG conference is characterised by an invigorating, multi-disciplinary approach to the origins and evolution of human language, and brings together researchers from many fields including anthropology, archaeology, artificial life, biology, cognitive science, computer science, ethology, genetics, linguistics, neuroscience, palaeontology, primatology, psychology and statistical physics. The multi-disciplinary nature of the field makes the refereeing process for EVOLANG very challenging, and we are indebted to our panel of reviewers for their conscientious and valuable efforts. A full list of the panel can be found on the following page.

Further thanks are also due to:

The EVOLANG committee: Angelo Cangelosi, Jean-Louis Dessalles, Tecumseh Fitch, Jim Hurford, Chris Knight and Maggie Tallerman. A particular debt of gratitude is owed to Jim Hurford, who has once again given generously of his time and expertise in the preparation of the proceedings.

The local organising committee: Sergi Balari, Yolanda Cabré Sans, Joan Castellví, Pere Cornellas, Ramon Ferrer i Cancho, Ricard Gavaldà, Antoni Hernández, Víctor Longa, Guillermo Lorenzo, Maria Antònia Martí, Txuss Martín, Josep Quer, Carles Riba, Joana Rosselló, Jordi Serrallonga and Mariona Taulé.

CosmoCaixa and the Museum of Science for their financial support and for offering us their unique facilities.

The Department of Innovation, Universities and Business of the Catalan government (Generalitat de Catalunya), the Spanish Ministry of Education and Science, Universitat de Barcelona and Universitat Politècnica de Catalunya for their financial support.

The Service of Linguistic Technology (STEL) for computing facilities.

The plenary speakers: Derek Bickerton, Rudolf Botha, Camilo José Cela Conde, Francesco d'Errico, Susan Goldin-Meadow, Simon Kirby, Gary Marcus, Friedemann Pulvermüller and Juan Uriagereka.

Finally, and most importantly, the authors of all the contributions collected here.

Andrew Smith, Kenny Smith and Ramon Ferrer i Cancho
November 2007
Panel of Reviewers

Michael Arbib, Andrea Baronchelli, Mark Bartlett, Tony Belpaeme, Derek Bickerton, Joris Bleys, Richard Blythe, Rudie Botha, Ted Briscoe, Joanna Bryson, Christine Caldwell, Josep Call, Angelo Cangelosi, Ronnie Cann, Andrew Carstairs-McCarthy, Morten Christiansen, Andy Clark, Bernard Comrie, Louise Connell, Fred Coolidge, Christophe Coupé, Tim Crow, Joachim de Beule, Bart de Boer, Dan Dediu, Didier Demolin, Jean-Louis Dessalles, Guy Deutscher, Mike Dowman, Robin Dunbar, Shimon Edelman, Mark Ellison, Wolfgang Enard, Nicolas Fay, Emma Flynn, Bruno Galantucci, Simon Garrod, Les Gasser, Laleh Ghadakpour, Kathleen Gibson, David Gil, Jonathan Ginzburg, Tao Gong, Nathalie Gontier, Tom Griffiths, Takashi Hashimoto, Bernd Heine, Wolfram Hinzen, Jean-Marie Hombert, Carmel Houston-Price, Jim Hurford, Yuki Ike-Uchi, Gerhard Jäger, Sverker Johansson, Harish Karnick, Simon Kirby, Chris Knight, Kiran Lakkaraju, Simon Levy, Phil Lieberman, Elena Lieven, David Lightfoot, John Locke, Gary Lupyan, Heidi Lyn, Dermot Lynott, Peter MacNeilage, Gary Marcus, Davide Marocco, Brendan McGonigle, April McMahon, James Minett, Padraic Monaghan, Salikoko Mufwene, Vano Nasidze, Chrystopher Nehaniv, Daniel Nettle, Fritz Newmeyer, Jason Noble, Kazuo Okanoya, Gloria Origgi, Pierre-Yves Oudeyer, Aslı Özyürek, Domenico Parisi, Anna Parker, Irene Pepperberg, Simone Pika, Joseph Poulshock, Sonia Ragir, Florencia Reali, Anne Reboul, Luke Rendell, Debi Roberson, Thom Scott-Phillips, Robert Seyfarth, Katie Slocombe, Andrew Smith, Kenny Smith, James Steele, Samarth Swarup, Eörs Szathmáry, Maggie Tallerman, Ian Tattersall, Mónica Tamariz, Carel ten Cate, Peter Todd, Mike Tomasello, Huck Turner, Natalie Uomini, Juan Uriagereka, Robert van Rooij, Arie Verhagen, Marilyn Vihman, Paul Vogt, Bill Wang, Andrew Wedel, Mike Wheeler, Bencie Woll, Liz Wonnacott, Hajime Yamauchi, Henk Zeevat, Jordan Zlatev, Klaus Zuberbühler, Jelle Zuidema
Contents

Preface
v
Panel of Reviewers
vi
Part I: Papers Is Pointing the Root of the Foot? Grounding the "Prosodic Word" as a Pointing Word Christian Abry and Virginie Ducey The Subcortical Foundations of Grammaticalization Giorgos P. Argyropoulos Pragmatics and Theory of Mind: A Problem Exportable to the Origins of Language Teresa Bejarano
3 10
18
Two Neglected Factors in Language Evolution Derek Bickerton
26
Expressing Second Order Semantics and the Emergence of Recursion Joris Bleys
34
Unravelling the Evolution of Language with Help from the Giant Water Bug, Natterjack Toad and Horned Lizard Rudolf Botha Linguistic Adaptations for Resolving Ambiguity Ted Briscoe and Paula Buttery Modelling Language Competition: Bilingualism and Complex Social Networks Xavier Castelló, Víctor M. Eguíluz, Maxi San Miguel, Lucía Loureiro-Porto, Riitta Toivonen, Jari Saramäki and Kimmo Kaski Language, the Torque and the Speciation Event Tim J. Crow The Emergence of Compositionality, Hierarchy and Recursion in Peer-to-Peer Interactions Joachim De Beule
42
51
59
67
75
Causal Correlations between Genes and Linguistic Features: The Mechanism of Gradual Language Evolution Dan Dediu
83
Spontaneous Narrative Behaviour in Homo Sapiens: How Does It Benefit Speakers? Jean-Louis Dessalles
91
What do Modern Behaviours in Homo Sapiens Imply for the Evolution of Language? Benoit Dubreuil
99
The Origins of Preferred Argument Structure Caleb Everett
107
Long-Distance Dependencies are not Uniquely Human Ramon Ferrer i Cancho, Víctor M. Longa and Guillermo Lorenzo
115
How Much Grammar Does It Take to Sail a Boat? (Or, What can Material Artifacts Tell Us about the Evolution of Language?) David Gil
123
The Role of Cultural Transmission in Intention Sharing Tao Gong, James W. Minett and William S-Y. Wang
131
The Role of Naming Game in Social Structure Tao Gong and William S-Y. Wang
139
Do Individuals' Preferences Determine Case Marking Systems? David J. C. Hawkey
147
What Impact Do Learning Biases Have on Linguistic Structures? David J. C. Hawkey
155
Reanalysis vs Metaphor: What Grammaticalisation CAN Tell Us about Language Evolution Stefan Hoefler and Andrew D. M. Smith Seeking Compositionality in Holistic Proto-Language without Substructure: Do Counter-Examples Overwhelm the Fractionation Process? Sverker Johansson Unravelling Digital Infinity Chris Knight and Camilla Power Language Scaffolding as a Condition for Growth in Linguistic Complexity Kiran Lakkaraju, Les Gasser and Samarth Swarup
163
171 179
187
The Emergence of a Lexicon by Prototype-Categorising Agents in a Structured Infinite World Cyprian Laskowski Evolutionary Framework for the Language Faculty Erkki Luuk and Hendrik Luuk
195 203
Artificial Symbol Systems in Dolphins and Apes: Analogous Communicative Evolution? Heidi Lyn
211
The Adaptiveness of Metacommunicative Interaction in a Foraging Environment Zoran Macura and Jonathan Ginzburg
219
On the Impact of Community Structure on Self-organizing Lexical Networks Alexander Mehler
227
A Crucial Step in the Evolution of Syntactic Complexity Juan C. Moreno Cabrera
235
Evolution of the Global Organization of the Lexicon Mieko Ogura and William S-Y. Wang
243
From Mouth to Eye Dennis Philps
251
What Use is Half a Clause? Ljiljana Progovac
259
The Formation, Generative Power, and Evolution of Toponyms: Grounding Vocabulary in a Cognitive Map Ruth Schulz, David Prasser, Paul Stockwell, Gordon Wyeth and Janet Wiles
267
On the Correct Application of Animal Signalling Theory to Human Communication Thomas C. Scott-Phillips
275
Natural Selection for Communication Favours the Cultural Evolution of Linguistic Structure Kenny Smith and Simon Kirby
283
Syntax, a System of Efficient Growth Alona Soschen Simple, but not too Simple: Learnability vs. Functionality in Language Evolution Samarth Swarup and Les Gasser
291
299
Kin Selection and Linguistic Complexity Maggie Tallerman
307
Regularity in Mappings Between Signals and Meanings Mónica Tamariz and Andrew D. M. Smith
315
Emergence of Sentence Types in Simulated Adaptive Agents Ryoko Uno, Takashi Ikegami, Davide Marocco and Stefano Nolfi
323
Desperately Evolving Syntax Juan Uriagereka
331
Constraint-Based Compositional Semantics Wouter Van Den Broeck
338
The Emergence of Semantic Roles in Fluid Construction Grammar Remi Van Trijp
346
Broadcast Transmission, Signal Secrecy and Gestural Primacy Hypothesis Slawomir Wacewicz and Przemysław Żywiczyński
354
Self-Interested Agents can Bootstrap Symbolic Communication if They Punish Cheaters Emily Wang and Luc Steels
362
Coping with Combinatorial Uncertainty in Word Learning: A Flexible Usage-Based Model Pieter Wellens
370
Removing 'Mind-Reading' from the Iterated Learning Model Simon F. Worgan and Robert I. Damper How does Niche Construction in Learning Environment Trigger the Reverse Baldwin Effect? Hajime Yamauchi
378
386
Part II: Abstracts
Coexisting Linguistic Conventions in Generalized Language Games Andrea Baronchelli, Luca Dall'Asta, Alain Barrat and Vittorio Loreto
397
Complex Systems Approach to Natural Categorization Andrea Baronchelli, Vittorio Loreto and Andrea Puglisi
399
Regular Morphology as a Cultural Adaptation: Non-Uniform Frequency in an Experimental Iterated Learning Model Arianita Beqa, Simon Kirby and Jim Hurford
401
xiii
Neural Dissociation between Vocal Production and Auditory Recognition Memory in Both Songbirds and Humans Johan J. Bolhuis
403
Discourse Without Symbols: Orangutans Communicate Strategically in Response to Recipient Understanding Erica Cartmill and Richard W. Byrne
405
Taking Wittgenstein Seriously: Indicators of the Evolution of Language Camilo J. Cela-Conde, Marcos Nadal, Enric Munar, Antoni Gomila and Víctor M. Eguíluz An Experiment Exploring Language Emergence: How to See the Invisible Hand and Why We Should Hannah Cornish
407
409
The Syntax of Coordination and the Evolution of Syntax Wayne Cowart and Dana McDaniel
411
The Archaeology of Language Origin Francesco D'Errico
413
The Joy of Sacs Bart De Boer
415
How Complex Syntax Could Be Mike Dowman
417
The Multiple Stages of Protolanguage Mike Dowman
419
A Human Model of Color Term Evolution Mike Dowman, Ying Xu and Thomas L. Griffiths
421
Evolution of Song Culture in the Zebra Finch Olga Feher, Partha P. Mitra, Kazutoshi Sasahara and Ofer Tchernichovski
423
Iterated Language Learning in Children Molly Flaherty and Simon Kirby
425
Gesture, Speech and Language Susan Goldin-Meadow
427
Introducing the Units and Levels of Evolution Debate into Evolutionary Linguistics Nathalie Gontier
429
What can the Study of Handedness in Nonhuman Apes Tell Us about the Evolution of Language? Rebecca Harrison
431
Unidirectional Meaning Change with Metaphoric and Metonymic Inferencing Takashi Hashimoto and Masaya Nakatsuka
433
Recent Adaptive Evolution of Human Genes Related to Hearing John Hawks Inhibition and Language: A Pre-Condition for Symbolic Communicative Behaviour Carlos Hernández-Sacristán
435
437
Pragmatic Plasticity: A Pivotal Design Feature? Stefan Hoefler
439
Continuity between Non-Human Primates and Modern Humans? Jean-Marie Hombert
441
After all, a "Leap" is Necessary for the Emergence of Recursion in Human Language Masayuki Ike-Uchi
443
Labels and Recursion: From Adjunction-Syntax to Predicate-Argument Relations Aritz Irurtzun
445
Iterated Learning with Selection: Convergence to Saturation Mike Kalish
447
A Reaction-Diffusion Approach to Modelling Language Competition Anne Kandler and James Steele
449
Accent Over Race: The Role of Language in Guiding Children's Early Social Preferences Katherine D. Kinzler, Kristin Shutts, Emmanuel Dupoux and Elizabeth S. Spelke
451
Language, Culture and Biology: Does Language Evolve to be Passed on by Us, and Did Humans Evolve to Let that Happen? Simon Kirby
453
Three Issues in Modeling the Language Convergence Problem as a Multiagent Agreement Problem Kiran Lakkaraju and Les Gasser
456
The Development of a Social Signal in Free-Ranging Chimpanzees Marion Laporte and Klaus Zuberbühler
458
Gestural Modes of Representation - A Multi-Disciplinary Approach Katja Liebal, Hedda Lausberg, Ellen Fricke and Cornelia Müller
460
Extracommunicative Functions of Language: Verbal Interference Causes Categorization Impairments Gary Lupyan
462
Form-Meaning Compositionality Derives from Social and Conceptual Diversity Gary Lupyan and Rick Dale
464
Language as Kluge Gary Marcus
466
Origins of Communication in Autonomous Robots Davide Marocco and Stefano Nolfi
468
Handedness for Gestural Communication and Non-Communicative Actions in Chimpanzees and Baboons: Implications for Language Origins Adrien Meguerditchian, Jacques Vauclair, Molly J. Gardner, Steven J. Schapiro and William D. Hopkins
470
The Evolution of Hypothetical Reasoning: Intelligibility or Reliability? Hugo Mercier
472
Simulation of Creolization by Evolutionary Dynamics Makoto Nakamura, Takashi Hashimoto and Satoshi Tojo
474
Evolution of Phonological Complexity: Loss of Species-Specific Bias Leads to more Generalized Learnability in a Species of Songbirds Kazuo Okanoya and Miki Takahashi
476
Referential Gestures in Chimpanzees in the Wild: Precursors to Symbolic Communication? Simone Pika and John C. Mitani
478
Modeling Language Emergence by Way of Working Memory Alessio Plebe, Vivian De la Cruz and Marco Mazzone
480
Mechanistic Language Circuits: What Can be Learned? What is Pre-Wired? Friedemann Pulvermüller
482
Reflections on the Invention and Reinvention of the Primate Playback Experiment Greg Radick
485
An Experimental Approach to the Rôle of Freerider Avoidance in the Development of Linguistic Diversity Gareth Roberts Prosody and Linguistic Complexity in an Emerging Language Wendy Sandler, Irit Meir, Svetlana Dachkovsky, Mark Aronoff and Carol Padden Communication, Cooperation and Coherence: Putting Mathematical Models into Perspective Federico Sangati and Jelle Zuidema
487 489
491
A Numerosity-Based Alarm Call System in King Colobus Monkeys Anne Schel, Klaus Zuberbühler and Sandra Tranquilli
493
On There and Then: From Object Permanence to Displaced Reference Marieke Schouwstra
495
Signalling Signalhood and the Emergence of Communication Thomas C. Scott-Phillips, Simon Kirby and Graham R. S. Ritchie
497
Wild Chimpanzees Modify the Structure of Victim Screams According to Audience Composition Katie E. Slocombe and Klaus Zuberbühler
499
An Experimental Study on the Role of Language in the Emergence and Maintenance of Human Cooperation John W. F. Small and Simon Kirby
501
Replicator Dynamics of Language Processing Luc Steels and Eörs Szathmáry Syntactical and Prosodic Cues in Song Segmentation Learning by Bengalese Finches Miki Takahashi and Kazuo Okanoya Why the Transition to Cumulative Symbolic Culture is Rare Mónica Tamariz
503
505
507
A Gradual Path to Hierarchical Phrase-Structure: Insights from Modeling and Corpus-Data Willem Zuidema
509
Author Index
511
Papers
IS POINTING THE ROOT OF THE FOOT? GROUNDING THE "PROSODIC WORD" AS A POINTING WORD

CHRISTIAN ABRY
Language Sciences Department, Stendhal University, BP 25, F-38040 Grenoble Cédex, France

VIRGINIE DUCEY
GIPSA-Lab, Stendhal University, BP 25, F-38040 Grenoble Cédex, France

Recently, in the Vocalize-to-Localize framework (a functional stance just started in the Interaction Studies 2004-2005 issues we edited; Abry et al., 2004), we addressed the unification of two grounding attempts concerning the syllable and the foot in language ontogeny. Can the movement time of the pointing strokes of a child be predicted from her babbling rhythm? The answer for 6 babies (6-18 months) was a 2.1 pointing-to-syllable ratio. Implications for the grounding of the first words within this Pointing Frame will be examined. More tentatively, we will suggest that babbling for protophonology, together with pointing for protosyntax, paves the way to language.
1. Introduction
While the main scientific endeavour is fission, say first breaking already known units, as in physics typically, the afterthought of formal constructions is to restart from primitives, e.g. building blocks. This is the foundational Chomsky and Schützenberger free monoid for computational linguistics, then Move and/or Merge in the Minimalist Programme (MP). In physiological behavior, the degrees-of-freedom problem is rather seen developmentally as a problem of breaking early given coordinations (e.g. thumb-sucking in utero, Babkin's reflex, etc.) in order to elaborate new couplings for new skills (hand-to-mouth feeding ... piano playing).
2. Emergence as mergence: Sign+Sign=>Sign and Foot+Foot=>Foot
Regarding the emergence of phonology, some students like Lindblom and ourselves have considered that features, particles, primes, etc., are just by-products of other mechanisms (for a recent tentative reconciliation with the use of features within our Perception-for-Action-Control Theory, see Schwartz, Boë & Abry, 2007). But what are the units of the system you start from? The number of segments? The possible onsets and offsets of syllables...? In computational evolutionary phonology, the issue is still between a holistic-formulaic starting point, or a yet undefined layman word unit. This in spite of our linguistic state-of-the-art, since «we still do not have strict definitions of even the most basic units, such as segment, syllable, morpheme, and word», as complained by Joan Bybee (2003, p. 2).

Now, instead of fission, can fusion help? In other words, can the compositional making of larger units from smaller bricks be replaced by the blending of already more or less large units, typically two into one unit of the same level (an idea taken up earlier in the categorial grammar formalism, still compatible with MP)? Which of course leaves open the evolutionary issue about where they could come from.

Let us take an example from a still-in-the-making phonology. In Sign Language, where no stable consensus exists about phonological units, can one use semantic blending and morphological fusion to evidence these components? In ASL, MIND+DROP=>FAINT (we are indebted to Wendy Sandler for this videoclip example). If Sign+Sign=>Sign is semantic blending (snowman), what are the corresponding phonological units? Is there a sign-language-specific «syllable conspiracy», as Sandler claims: Syll+Syll=>Syll? Or a more common foot isochrony, Foot+Foot=>Foot? Like one-foot music, musical, musically? Snowman is obviously shorter than snow+man in duration. In fact, once measured, the downstroke phase of FAINT (which starts from the head for MIND, with the finger point erased) is just a videoframe longer than the one for DROP (starting lower, from the waist). Which is a strong cue of isochrony control for compression into one unit (chunk, template, etc.).
Is that just emergence-supervenience of units due to informational constraints, just language use, the war of attrition on constructions as form-meaning pairings, in cognitive construction grammars? Said otherwise: data compression for sparse coding? Are there no macroscopic units corresponding to universal control units, macroscopic primitives for making morphogenetic «language bubbles», not acquired simply by perceptuo-motor statistical pattern-finding? Are there phonologically universal babble-syllable constraints in speech acquisition, and more, in signs and words in both speech and sign language (even if syllables may not be ubiquitous in both media)? In other words, when in evo-development do you get a tuner for tuning? Who could attune what, along language attunement-imitation, without a specific what-tuner to capture the preferred radio station among the buzzy broadcasting landscape of speakers?
3. The syllable, then the point: whence the word?
Recently, in the Vocalize-to-Localize framework (a functional stance just started in the Interaction Studies 2004-2005 issues we edited; see Abry, Vilain & Schwartz, 2004), we addressed the unification of two grounding attempts concerning the syllable and the foot in language ontogeny. Both units are highly disputed among phonologists and psycholinguists. But the proposal of a root for proto-syllables in canonical babbling can now be neurally evaluated on the basis of a motor control platform: MacNeilage's Frame/Content theory, starting from the control of the mandible as the carrier articulator. We proposed the same ground of evaluation for the foot as the basic control unit for the phonology of the proto-word. We predicted that, if we measured the babbling rhythm of a baby from the burst of canonical babbling around 6-7 months, we could calculate the range of duration of her pointing arm-strokes, from 9 months upwards. Tested on 6 French children in a longitudinal study, each fortnight between 6 and 18 months, this «astonishing» hypothesis was globally successful (Ducey, 2007), with a mean 2.18 pointing/babbling ratio. Moreover, each child had at her disposal in her repertoire a sufficiently long point to cover a disyllabic utterance.

As for linguistic demonstratives, the semantics, pragmatics, and even the syntax of pointing have all deserved valuable attention and brought out results in related fields. So has Sign Language phonology, which ubiquitously meets pointing. But nothing was said about the proper phonological integrative links of the pointing gesture with speech phonological units, smaller or larger than the point, like the syllable, the foot, and the so-called «prosodic word».
We can now consider that the phonology of the point with the arm-index could give for free the template of the ubiquitous one/two-syllable word foot (instead of an arbitrary FOOTBIN in Optimality Theory, where a one-syllable/moraic foot is considered «degenerated» or «subminimal»). Grounding the phonology of the point motorically, in the neural arm-index control, thus gives for free the template of the two-syllable word as a coordination of the hand and the mouth in language semiotics and phonetics. This result offers in addition considerable insights in line with the parallel development of the syntactic use of THat-demonstratives and WHat-interrogatives through the grammatization process in the world's languages (Diessel, 1999). It is in favor of an early demonstrative site, later attuned to language-specific morphonology: see English (the) house vs. Swedish huset, French la maison vs. Romanian domul; and even more elaborated compounding, with what could be tagged «double-filled sites»: French cette maison-ci vs. Swedish det här huset, or Afrikaans hierdie huis, etc.

This is just one of the issues that the developmental framework presented below (Fig. 1) has allowed us to address up to now, in between the Vocalize-to-Localize (2003) seminar and the 2007 VOCOID (VOcalization, COmmunication, Imitation, and Deixis, in infant and adult human and non-human primates) meeting, both international meetings we organized in Grenoble.
4. Beyond the presented Framework (Fig. 1)

Beyond reinforcing the very general claim that «pointing is the royal road to language for babies» (as recalled by the late George Butterworth in Kita, Pointing, 2003), we can add to our prediction of pointing stroke duration distributions from individual babbling rhythm distributions another replicated prediction: namely, that the emergence of two-word utterances can be calculated from the beginning of the coproduction of a word together with a non-redundant pointing (a result found in Susan Goldin-Meadow's group, and replicated with Jana Iverson in Iverson & Goldin-Meadow, 2005). Since this is not a pure slot-grammar story (POINT+Word gives Word+Word, but the POINT is still there in the predicate-argument structure), the rationale behind this development beyond the first-year word remains largely mysterious (personal conversation with Susan Goldin-Meadow and Elena Lieven).

Finally, we will add work in progress on two possible neural circuits found in adults, which could be relevant for language acquisition of the word-foot metric unit: namely the one we dubbed the THAT-PATH, for pointing with the eye, the arm and the voice (Loevenbruck et al., 2005, 2007), and ultimately the verbal working memory network we dubbed the STABIL-LOOP (Abry, Vilain & Schwartz, 2004), for stabilizing the linguistic word forms (Abry et al., 2003; Sato et al., 2004, 2006). Working memory was already proposed by Francisco Aboitiz and Ricardo García (1997) as a masterpiece in the primate evolution toward language, but with little concern about the (universal) preferred forms of language before matching for recall. We will insist here on the fact that, in our view, this STABIL-LOOP system can stabilize both word order (basic syntax and compounds) and word form structure (morphonology).
5. Summary

Beyond the fission/fusion metaphors, several of these empirical findings from ontogeny could help in building an evo-devo story of language, with caveats:
(i) Syllables are definitely not built from segments; rather, segments are a late by-product of new degrees of freedom, making the carried lip and tongue articulators more and more independent from the carrier jaw (rhythm control).
(ii) Nor are words built from syllables; rather, they are chunked from the babbling flow, in the pointing frame (discrete stroke control).
(iii) Syntax does not emerge with 2-word utterances; rather, syntactic demonstrative (argumentative-referencing) pointing is there from the first word, and is still there when 2 words appear, depending on the preceding date of emergence of the skill of pointing to the argument while predicating about a different referent from the pointed one (e.g. saying ”

(meaning “a wolf is chasing a sheep”) are exchanged among agents during communicative acts. Through the pattern extraction ability, individuals may acquire some recurrent patterns in the exchanged utterances as lexical items (see the LEXICON rectangle in Fig. 1). By sequential learning, individuals may acquire local orders recording order relations between two lexical items in the exchanged utterances. In addition, when individuals observe that some lexical items with the same semantic role are similarly used (display
the same local order with respect to other lexical items) in some exchanged utterances, they can assign these lexical items to the same category; for simplicity, we labeled them with the syntactic roles met in simple declarative sentences in English ('S', Subject; 'V', Verb; and 'O', Object). Through reiterating local orders among categories, individuals gradually acquire emergent global order(s) to regulate strings of lexical items from categories and form utterances to encode integrated meanings. For instance, if an individual's linguistic knowledge includes some S, V, and O categories locally ordered "S before V" and "O after V", leading to an emergent global order SVO, then he/she can express the integrated meaning "chase<wolf, sheep>" as /WOLF CHASE SHEEP/; letters within "/ /" are utterance syllables chosen from a signaling space and not necessarily identical to English words. The initial stage of the model could be either no language at all or a small holistic signaling system in which all individuals share a small number of holistic rules to encode some integrated meanings. Through iterated communications, a compositional language having a set of lexical items and global order(s) gradually emerges. This model gives us an appropriate level of complexity to observe the effect of intention sharing on language evolution and the optimization role of cultural transmission in adjusting the level of this ability.
Figure 1. The conceptual framework of the language emergence model: the SEMANTICS rectangle stands for the predefined semantic space; the ovals represent the three aspects of linguistic knowledge acquired by agents based on different domain-general abilities: pattern extraction, sequential learning, and categorization; the EMERGENT GLOBAL ORDERS rectangle encompasses the emergent syntactic patterns triggered by this linguistic knowledge.
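The 'bottom-up' step described above, in which category-level local orders (e.g. "S before V", "O after V") combine into an emergent global order that linearises an integrated meaning, can be sketched as follows. This is a hypothetical simplification, not the authors' implementation: the function names and the topological-sort strategy are assumptions, and in the real model local orders are induced from exchanged utterances rather than given.

```python
def global_order(local_orders):
    """Derive a total order over categories from pairwise local orders
    (e.g. ("S", "V") means "S before V") via a simple topological sort."""
    remaining = {c for pair in local_orders for c in pair}
    order = []
    while remaining:
        # pick a category with no still-unplaced predecessor
        head = next(c for c in sorted(remaining)
                    if not any(b == c and a in remaining for a, b in local_orders))
        order.append(head)
        remaining.remove(head)
    return order

def express(meaning, lexicon, order):
    """Linearise an integrated meaning {category: concept} as an utterance."""
    return "/" + " ".join(lexicon[meaning[c]] for c in order if c in meaning) + "/"

local = [("S", "V"), ("V", "O")]   # "S before V" and "O after V"
lexicon = {"wolf": "WOLF", "chase": "CHASE", "sheep": "SHEEP"}
meaning = {"S": "wolf", "V": "chase", "O": "sheep"}

print(global_order(local))                             # ['S', 'V', 'O']
print(express(meaning, lexicon, global_order(local)))  # /WOLF CHASE SHEEP/
```

With only the two local orders above, the sort yields the global order SVO and the utterance /WOLF CHASE SHEEP/ for "chase<wolf, sheep>".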
Intention sharing in this model boils down to the availability of the topic from the environment during communicative acts, and it is simulated as an individual's parameter, Reliability of Cue (RC), which indicates the probability (from 0.0 to 1.0) for the listener in a communication to accurately acquire the speaker's intended integrated meaning in the heard utterance from an environmental cue (an ongoing event represented by an integrated meaning in
their environment). In a communication without intention sharing, a wrong cue containing an event different from the speaker's intended meaning is given to the listener; in a communication with intention sharing, the speaker's intended meaning is directly given to the listener through a cue. From the speaker's perspective, RC indicates the probability of choosing an ongoing event in the immediate environment as the topic of the communicative act. From the listener's perspective, it indicates the probability of referring to the ongoing event to assist comprehension. If RC is 1.0, intention sharing is established in all communications; if it equals 0.0, the listener only gets wrong cues, and no intention sharing is established in any communication.

In this paper, the relations among RC, language emergence, and cultural transmission are discussed by evaluating two indices: Understanding Rate (UR, the average percentage of accurately understood integrated meanings in communications of all pairs of agents in the population, based on their linguistic knowledge only and without referring to cues) and Convergence Time (CT, the number of generations of communications needed to reach a high UR, say 0.8).

3. The Cultural Transmission Framework

Cultural transmission is defined as the communications among individuals from the same (intra-generational) or different (inter-generational) generations. As the medium of language exchange, it plays important roles in language evolution. In this paper, we assume that there is an ongoing optimization process based on linguistic understandability during cultural transmission; individuals who can better understand others in communications may obtain more resources and produce more offspring, and these offspring may maintain some of their parents' language-related abilities. A cultural transmission framework is simulated under this assumption to test whether this optimization process plays a role in adjusting RC.
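How RC gates intention sharing in a single communicative act can be sketched as follows. This is an illustrative toy, not the authors' code: the function name and the event list are invented for the example, and comprehension itself is left out.

```python
import random

def communicative_act(intended, events, rc, rng):
    """Return the cue received by the listener: with probability `rc` it is
    the speaker's intended meaning (intention sharing is established);
    otherwise it is a wrong cue, i.e. a different ongoing event."""
    if rng.random() < rc:
        return intended
    return rng.choice([e for e in events if e != intended])

rng = random.Random(0)
events = ["chase<wolf, sheep>", "bite<dog, bone>", "flee<sheep, wolf>"]
trials, shared = 10_000, 0
for _ in range(trials):
    intended = rng.choice(events)                 # topic of this act
    cue = communicative_act(intended, events, rc=0.7, rng=rng)
    shared += (cue == intended)                   # intention sharing achieved?
print(shared / trials)                            # close to RC = 0.7
```

Over many acts, the fraction of communications with intention sharing converges to RC, matching the limiting cases in the text (RC = 1.0: always shared; RC = 0.0: only wrong cues).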
In the framework, after a number of intra-generational transmissions, some individuals who have higher linguistic understandabilities become "parents" and produce "offspring". The offspring replicate their parents' RC values with occasional, small changes: a GA-like mechanism such as mutation (a small increase or decrease in an RC value) is applied during reproduction. After "birth", the offspring first learn from their parents through inter-generational transmissions, and then replace them and the other individuals of the previous generation. After that, a new cycle begins. For the sake of comparison, another type of simulation without optimization is implemented, in which agents are randomly chosen as parents to
produce offspring regardless of their communicative success in each generation. During the reproduction process, mutation is also applied. In all simulations of this paper, the population has 10 agents. In the first generation, all individuals' RC values are randomly drawn from a Gaussian distribution with a standard deviation of 0.01 and a mean ranging from 0.0 to 1.0 across different conditions. In each generation, there are 200 rounds of random pairwise intra-generational transmissions and 200 rounds of inter-generational transmissions from parents to offspring. A round of transmissions includes 10 communications among different pairs of agents. After the intra-generational transmissions, 5 agents are chosen as parents, each producing 2 offspring. During the reproduction process, a small (0.1) increase or decrease of an RC value occurs with a probability of 0.02 (the mutation rate). The total number of generations is 200. In the simulations with optimization, parents are chosen according to their linguistic understandabilities, i.e., the average percentage of integrated meanings that they, without referring to cues, can accurately understand when others speak to them. In the simulations without optimization, parents are randomly chosen. In each condition of the simulations, the results of 20 runs are collected for statistical analysis.

4. The Simulation Results
Fig. 2 (a) records the average and standard deviation values of the highest UR throughout the simulations in the 20 runs with different initial RC values. UR reflects the average linguistic understandability of the whole population. Fig. 2 (b) illustrates the average CT under different initial RC values.
Figure 2. The simulation results with and without optimization: (a) average highest UR vs. RC; (b) average CT vs. RC. The dashed lines trace the results with optimization during cultural transmission, and the solid lines trace the results without optimization.
In the simulations without optimization, when RC is low (below 0.3), UR is rather low (around 0.125, the UR of the initially shared holistic rules), and a
communal language with a high UR does not emerge in the population; when RC lies in the interval [0.4, 0.7], a communal language emerges, and the increase in RC accelerates language origin, as indicated by the decrease in CT; when RC is rather high (over 0.8), a further increase in RC does not further accelerate language origin. These results suggest that without optimization, a relatively low RC (around 0.5) is sufficient to trigger a language with a high UR (around 0.8), and a small increase in RC from 0.4 to 0.5 causes a qualitative change from no language to a communal one. In other words, a small phenotypic change can result in a communication means of a totally different nature (Elman, 2005). In the simulations with optimization, the level of RC adequate to trigger a communal language is further reduced; a much smaller initial RC (0.2) can trigger a communal language with a high UR (over 0.6). In addition, language origin in these simulations is more efficient than in the simulations without optimization. However, if the initial RC is high (over 0.7), language origin does not differ much between the two types of simulations. The evolution of RC values in the simulations with optimization is shown in Fig. 3: Fig. 3 (a) traces the average and standard deviation values of the initial, maximum and last RC throughout 200 generations, and Fig. 3 (b) traces the RC values in some particular runs. If the number of generations is extended somewhat, say to 300, a similar trend is maintained, though the direction of further change (increase or decrease) in RC is unclear.
Figure 3. The evolution of RC in the simulations with optimization: (a) statistical results of RC; each line summarizes the initial, maximum and last RC values in the simulations with a particular initial RC (from 0.1 to 1.0 with a step of 0.1); (b) specific RC values in different runs; each line records the RC values at different generations in one simulation.
Two roles of cultural transmission with respect to RC are shown in Fig. 3. The optimization during cultural transmission is based on individual linguistic understandability. Since a high RC contributes to the acquisition of correct
linguistic rules that help an individual accurately understand others' idiolects, it can be indirectly selected by cultural transmission and gradually spread in the population. The average level of RC in the population then increases gradually in response. This increasing effect is well illustrated in Fig. 3, especially when the initial RC is low (below 0.8). However, if the initial RC is already high (around 0.7), cultural transmission does not greatly change it, but maintains it throughout the simulations. For a rather high RC in the [0.9, 1.0] interval, cultural transmission may even lead to a slight reduction: the last value becomes slightly smaller than the initial one, as shown in Fig. 3 (a). Slightly reducing a rather high RC is a side effect of optimization. Since these initial RC values are high enough to trigger a communal language, an individual with a slightly lower value can still have a high understandability, be chosen as a parent to produce offspring, and spread this RC to the population. The average level of RC in the population may then slightly drop, without greatly affecting the UR of the emergent language. In this situation, there are a number of communications with no intention sharing during cultural transmission, which gives agents the opportunity to develop robust linguistic knowledge that needs no assistance from cues, or even resists the distraction of wrong ones. Such a reliable language can efficiently describe events not occurring in the immediate environment, gradually liberate itself from the restriction of nonlinguistic information, and become efficiently used in communications with no cues or other nonlinguistic assistance. Compared with the increasing effect on RC, this reducing effect is less apparent in the short run, but it is crucial for language evolution in the long run.
5. Conclusions

The simulations in this paper demonstrate the roles of cultural transmission in intention sharing. Cultural transmission can adjust the level of this ability so as to trigger a communal language. Meanwhile, it can also prevent this ability from rising too high, so that displacement can be established in the emergent language. Apart from shaping linguistic features such as compositionality and regularity, our study shows that cultural transmission can help to optimize some language-related abilities, leading them to optima that are not necessarily the highest possible values. In addition, the framework in this paper can be adopted to study the role of cultural transmission in other language-related abilities, such as the ability to detect recurrent patterns or manipulate local orders. This approach will provide a clearer picture of the "mosaic" fashion of language evolution, and may help to
verify the claim of connectionism (Elman, 2005) that small phenotypic changes in our species may yield language as the outcome. Finally, the level of RC is modified via some GA-like mechanisms during inter-generational transmissions based on individual linguistic understandability. The adopted GA-like mechanism does not imply that the ability of intention sharing must be updated through genetic transmission; other optimization mechanisms may play a similar role.

Acknowledgements

We thank Dr. Christophe Coupé from the Laboratoire Dynamique du Langage, CNRS - Université Lumière Lyon 2, for the valuable discussions.

References

Cangelosi, A., Smith, A. D. M., & Smith, K. (Eds.) (2006). The evolution of language: Proceedings of the 6th international conference. London: World Scientific Publishing Co.
Elman, J. L. (2005). Connectionist models of cognitive development: Where next? Trends in Cognitive Sciences, 9, 111-117.
Gong, T., Ke, J-Y., Minett, J. W., Holland, J. H., & Wang, W. S-Y. (2005). A computational model of the coevolution of lexicon and syntax. Complexity, 10, 50-62.
Gong, T. (2008). Computational simulation in evolutionary linguistics: A study on language emergence. Taipei: Institute of Linguistics, Academia Sinica.
Hockett, C. F. (1960). The origin of speech. Scientific American, 203, 88-96.
Ke, J-Y., Coupé, C., & Gong, T. (2006). A little bit more, a lot better: Language emergence from quantitative to qualitative change. In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The evolution of language: Proceedings of the 6th international conference (pp. 419-420). London: World Scientific Publishing Co.
Kirby, S. (1999). Function, selection and innateness: The emergence of language universals. New York: Oxford University Press.
Oller, K., & Griebel, U. (Eds.) (2000). Evolution of communication systems: A comparative approach. Cambridge, MA: MIT Press.
Pinker, S., & Bloom, P. (1990). Natural language and natural selection.
Behavioral and Brain Sciences, 13, 707-784.
Tomasello, M., Carpenter, M., Call, J., Behne, T., & Moll, H. (2005). Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences, 28, 675-691.
Wang, W. S-Y. (1982). Explorations in language evolution. Osmania Papers in Linguistics, 8. Reprinted in W. S-Y. Wang (Ed.), Explorations in language (pp. 105-131). Taipei, Taiwan; Seattle, WA: Pyramid Press.
THE ROLE OF THE NAMING GAME IN SOCIAL STRUCTURE

TAO GONG, WILLIAM S-Y. WANG

Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong

This paper presents a simulation study exploring the role of the naming game in social structure, a question largely neglected by contemporary studies in statistical physics, which mainly discuss the dynamics of language games in predefined mean-field or complex networks. Our foci include the dynamics of the naming game under a simple distance restriction, and the origin and evolution of primitive social clusters, as well as their languages, under this restriction. This study extends current work on the role of social structure in language games, and provides a better understanding of the self-organizing process of lexical conventionalization during cultural transmission.
1. Introduction

The origin and evolution of language, or of communication systems in general, is a fascinating topic in the interdisciplinary scientific community. A number of approaches from biology, statistical physics, and computer science have been proposed to address specific aspects of this topic (Oller & Griebel, 2000), among which the self-organizing emergence of a shared lexicon during cultural transmission has been extensively studied in the past few years using various forms of language game (Steels, 2001). The naming game is one form of language game; it simulates the emergence of a collective agreement on a shared mapping between words and meanings in a population of agents with pairwise local interactions (Steels, 2001). A minimal version of it (Baronchelli & Loreto, 2006) studies the main features of semiotic dynamics. In this version, N homogeneous agents describe a single object by inventing words during pairwise interactions. Each agent has an inventory (memory) that is initially empty and can store an unlimited number of words. In a pairwise interaction, two agents are randomly chosen, one as "speaker" and the other as "hearer". The speaker utters a word to the hearer. If its inventory is empty, the speaker randomly invents a word; otherwise, it randomly utters one of the available words in its inventory. If the hearer has the uttered word in its inventory, the game is successful, and
both agents delete all words in their inventories except the uttered one. If the hearer does not have the uttered word, the game is a failure, and the hearer adds (learns) the uttered word to its inventory. In a mean-field system, the dynamics of the naming game can be traced by Nw(t), the total number of words in the population; Nd(t), the number of different words; and S(t), the average success rate of interactions among all pairs of agents. Statistical physicists (e.g., Baronchelli & Loreto, 2006; Dall'Asta et al., 2006) have further explored the dynamics of the naming game in structures such as 1D/2D lattices, small-world and scale-free networks. Although these studies extensively discussed the role of social structures in the convergence of a shared lexicon, most of them neglected the reverse role of the naming game in social structure; in these studies, a successful or failed naming game only affects individuals' linguistic knowledge, and has nothing to do with the predefined social structures. In a cultural environment, successful or failed interactions among individuals can not only adjust their knowledge, attitudes or opinions, but also affect their social connections or political status in the community. Factors that operate on a local scale, such as interaction procedures and geographical or social distance restrictions, can adjust the possibilities of interactions among agents, thus affecting individual or group similarities on a global scale (Axelrod, 1997; Nettle, 1999). These simple factors may have taken effect much earlier than the emergence of complex social structures, and cast their influence on the formation of primitive social clusters and their communal languages. For instance, during language origin, a successful naming game about a common object in the environment may form a social binding among the participants of this game, and establish a common lexicon among them. These factors may have a similar effect in modern societies during language change.
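The minimal naming game described at the start of this section can be sketched as follows. This is an illustrative reimplementation based on the verbal description, not the original code; the function name, the integer word-numbering scheme, and the default parameters are our own choices.

```python
import random

def minimal_naming_game(n_agents=50, max_games=100000):
    """Minimal naming game: n_agents agents name a single object through
    pairwise games.  Returns the number of games played until the whole
    population shares exactly one word (or max_games if it never does)."""
    inventories = [set() for _ in range(n_agents)]
    next_word = 0
    for game in range(1, max_games + 1):
        speaker, hearer = random.sample(range(n_agents), 2)
        if not inventories[speaker]:             # empty inventory: invent a word
            next_word += 1
            inventories[speaker].add(next_word)
        word = random.choice(sorted(inventories[speaker]))
        if word in inventories[hearer]:          # success: both keep only this word
            inventories[speaker] = {word}
            inventories[hearer] = {word}
        else:                                    # failure: hearer learns the word
            inventories[hearer].add(word)
        if all(inv == {word} for inv in inventories):
            return game
    return max_games
```

Running this with a small population shows the characteristic build-up and collapse of synonymy: the total number of words first grows through invention and learning, then collapses once successful games begin pruning inventories.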
For instance, a successful or failed naming game towards a salient concept may form a new binding or destroy an old one among the participants, and adjust their communal languages. Moreover, in order to establish a complex social network in a huge population in which not every pair of individuals can directly interact, a certain degree of mutual understanding is necessary, and simple language games may play a role in achieving such mutual understanding through local interactions. Therefore, besides its dynamics in predefined complex networks, the dynamics of the naming game under simpler constraints, and its role in social structure, are worth exploring as well. In this paper, we present a preliminary study in this respect. Instead of detailed constraints determined by complex networks, we simulate a simple distance constraint, and discuss its influence on the formation of social clusters and their communal languages. The simulation traces the coevolution of language
and social structure based on the naming game, and the formation of mutual understanding in a population via local interactions among its members, both of which will help us better understand the self-organizing process of lexical conventionalization based on the naming game. The rest of the paper is organized as follows: Section 2 introduces the simple distance restriction; Section 3 discusses the simulation results of two experiments; and finally, Section 4 provides the conclusions and future work.
2. The Naming Game with a Distance Constraint

The interaction scenario of our naming game is identical to the minimal version described in Section 1. To introduce distance restrictions, we situate all agents in a 2D square torus (X², where X is the side length of the torus), and each agent can randomly move to any of its 8 unoccupied nearby locations, as shown in Fig. 1. This torus represents either a physical world or an abstract one, such as a distribution of opinions or social status.
[Figure 1 legend: agent; movement; possible locations; successful naming game; failed communication; distance restriction (2Dx+1)×(2Dy+1).]
Figure 1. A 2D torus with moving agents.
The distance restriction, inspired by our previous study (Gong et al., 2005) and applied to agent selection during pairwise interactions, is defined as follows: interactions only take place between agents whose coordinates are within a limited block distance (Dx and Dy), as shown in Eq. (1), where xi, yi are agent i's coordinates in X². The second part of each condition handles the situation where agents are located at the boundaries but their block distance is still within Dx and Dy, since they are on a torus:
min(|xi - xj|, X - |xi - xj|) ≤ Dx  and  min(|yi - yj|, X - |yi - yj|) ≤ Dy    (1)
This concept of distance can represent either geographical distance, such as city-county distance, or social distance, such as dissident opinions. Under this distance restriction, each agent in the torus can interact with at most (2Dx+1)×(2Dy+1)-1 (excluding itself) nearby agents. When Dx and Dy equal 1, each agent only interacts with those lying in its 8 nearby locations. This restriction provides a binding (bias) for the participants of the naming game: a successful naming game can bind the speaker with the listener, and they tend to move together to maintain their block distance within Dx and Dy (in other words, either of them can only move in such a way that, after movement, their block distance is still within Dx and Dy); however, a failed naming game may break this binding (in other words, either of them can randomly move in any direction). This restriction is much simpler than those defined by complex networks. Based on it, large social clusters, containing agents who share a common lexicon but may not necessarily interact directly with each other, may emerge and be maintained. These clusters and their shared words could be the prototypes of complex social structures and their communal languages. We design two experiments to evaluate the influence of this simple restriction on the formation of social clusters and the conventionalization of a shared lexicon. In Exp. 1, 100 agents are situated in a 10² torus (each location in the torus is occupied by an agent), and Dx and Dy range from 1 to 10. In Exp. 2, 100 agents are put into tori whose side length X ranges from 10 to 55, but Dx and Dy are fixed. In each time step, a random sequence of agents is set; following this sequence, each agent is chosen to interact with one of the others lying within its distance restriction (if any), and then it moves, based on the interaction result (successful or failed), to one of its unoccupied neighboring positions (if any).
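The toroidal block-distance test of Eq. (1) can be sketched as below. The function and parameter names are ours; the wraparound term matches the boundary condition described in the text.

```python
def within_restriction(a, b, dx, dy, x_side):
    """Block-distance test on an x_side-by-x_side torus: agents a and b
    (each an (x, y) coordinate tuple) may interact only if their wrapped
    block distance is within (dx, dy).  The second argument of each min
    handles agents sitting on opposite boundaries of the torus."""
    ddx = abs(a[0] - b[0])
    ddy = abs(a[1] - b[1])
    return min(ddx, x_side - ddx) <= dx and min(ddy, x_side - ddy) <= dy
```

For example, on a 10x10 torus with dx = dy = 1, an agent at (0, 0) is within restriction of one at (9, 9), since the torus wraps both coordinates, but not of one at (2, 0).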
The total number of time steps is 100, and the maximum number of possible interactions is 100×100 = 10,000. In each condition, the results of 20 simulations are collected for statistical analysis. After each time step, S (the average success rate of interactions among all pairs of agents) and Nd (the number of different words) are evaluated. If all agents gradually share a common lexicon, S will gradually increase to 1.0 and Nd will reduce to 1. In this situation, NT (the number of time steps required to reach the highest S) indicates the efficiency of the distance restriction for lexical conventionalization in the population. On the contrary, if the agents cannot all share a common lexicon, but form different clusters, S and Nd will not reach 1. In this situation, Nd indicates the number of isolated clusters, and NT the effect of the distance restriction on lexical conventionalization within clusters. The following sections discuss the simulation results of the two experiments.
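The two measures S and Nd can be computed as sketched below. This is our own illustration: we evaluate S as the expected success probability over ordered speaker-hearer pairs (the chance that a word drawn at random from the speaker's inventory is already in the hearer's), which matches the description of S as an average success rate; the names are ours.

```python
from itertools import permutations

def lexicon_metrics(inventories):
    """Given a list of per-agent word inventories (sets), return
    S, the success rate averaged over ordered agent pairs, and
    Nd, the number of different words in the whole population."""
    pairs = [(i, j) for i, j in permutations(range(len(inventories)), 2)
             if inventories[i]]        # skip pairs with an empty speaker inventory
    s = sum(len(inventories[i] & inventories[j]) / len(inventories[i])
            for i, j in pairs) / len(pairs)
    nd = len(set().union(*inventories))
    return s, nd
```

For instance, for three agents with inventories {1}, {1}, {1, 2}, S is 5/6 (only games in which the third agent speaks can fail) and Nd is 2.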
2.1. Exp. 1: fixed torus size but various distance restrictions
In this experiment, all 100 agents lie in a 10² torus; Dx and Dy change from 1 to 10. In all simulations, after 100 time steps, a common lexicon is shared in the population; both S and Nd become 1 at the end of the simulations. Fig. 2 illustrates the average and standard deviation values of NT under different Dx and Dy.
Figure 2. The statistical results of NT in Exp. 1; each point is calculated from 20 simulations after 100 time steps and a maximum of 10,000 possible naming games.
As shown in Fig. 2, with the increase in Dx and Dy, the process of lexical conventionalization follows two regimes: as Dx and Dy increase from 1 to 4, agents can interact with more nearby agents and adjust their words; the lexical convergence is then accelerated and NT drops. When Dx and Dy are greater than 5, each agent can already interact with all the others in the population, so the lexical convergence is not further accelerated and NT becomes stable. In addition, in a 10² torus, when Dx and Dy are small and each agent cannot directly interact with all others, lexical conventionalization is still accomplished after not many interactions, via intermediate agents, and a cluster of agents who cannot directly interact with each other but share a lexicon is established.
2.2. Exp. 2: various torus sizes but fixed distance restriction

In this experiment, 100 agents are randomly situated in tori whose side lengths increase from 10 to 55 with a step of 5. Dx and Dy are fixed to 5. Fig. 3 illustrates the averages and standard deviations of S, NT, and Nd in Exp. 2. The process of lexical conventionalization in Exp. 2 also follows two regimes: when X is smaller than 30, all agents in the population form one huge cluster and share a common lexicon; however, after X reaches a certain level (say, 30), S begins to drop, and both NT and Nd begin to increase. In a relatively small torus (X smaller than 30), although agents may not find many others within their distance restrictions, by moving around they can encounter some agents and get their words to converge to a shared lexicon. However, in a
big torus (X bigger than 30), this 1-step movement is insufficient for agents to meet many others, and the large torus size greatly restricts the local interactions among agents; isolated, smaller clusters then gradually emerge, each of which shares a common lexicon. The drop in S and increase in Nd both indicate the emergence of small clusters. Within a cluster, S among the cluster members is high, but between clusters, S among members of different clusters is low, since they may share different words. In addition, once such clusters are formed, it is difficult for agents within clusters to interact with outsiders, since they tend to maintain their mutual distances and not move freely. In a sense, the bindings within clusters are relatively strong, and these clusters and their shared words are relatively stable, as indicated by the stable values of S(t) and Nd(t) over a long time in specific simulations.
Figure 3. The statistical results of S (a), NT (b), and Nd (c) in Exp. 2; each point is calculated from 20 simulations after 100 time steps and a maximum of 10,000 possible naming games.
A "local convergence, global polarization" phenomenon (Axelrod, 1997) appears in Exp. 2 under a big torus: agents within clusters clearly understand each other via a shared lexicon, but those in different clusters do not, since they may share different words. This phenomenon partially reflects the coexistence of many languages in the world, and it is mainly caused by the distance restriction and mutual understanding during local interactions. Besides, if we assume that agents are developing a basic vocabulary using the naming game, these simulations may actually trace the concurrent emergence of different vocabularies, and later on, different languages in the early stage of language development in the world. Second, combining Exp. 1 and Exp. 2, the boundary values of the distance restriction and torus size suggest a quantitative relation between the local view and the world size. Roughly speaking, the current results seem to show that, given a certain number of time steps (100), once the local view (2Dx+1)×(2Dy+1) is smaller than 1/10 of the torus size, the whole population will neither
efficiently form a single cluster nor share a common lexicon. Further statistical analysis in simulations with bigger populations could confirm this prediction. Finally, one may intuitively think that under random or biased movements, sooner or later, all agents will encounter all others, and since the naming game can easily make the participants' vocabularies converge in one interaction, all agents will eventually form one big cluster. However, two arguments count against such a prediction. First, in the case of random movements, this process may take an extremely long time. In our model, once agents form close clusters, those in the center may not easily move, since all their neighboring locations are occupied by others. Therefore, even given an extremely long time, the formation of one big cluster may not occur. Second, the converging role of the naming game may also cause divergence of a cluster, since convergence is achieved by deleting all the other words in the participants' vocabularies. For instance, if Agent 1 with Word A interacts twice with Agent 2 in a cluster where all agents only use Word B, Agent 2 may diverge from this cluster and form a new one with Agent 1 using Word A; the agents in Agent 2's original cluster then have to interact at least twice with Agent 2 to drag it back to their cluster. This process introduces fluctuations that may delay lexical conventionalization. Therefore, even if agents, through random or biased movements, have chances to encounter all others, or all of them are within a certain distance restriction, their vocabularies might not quickly converge. This partially explains why, in the mean-field system, all agents still need many rounds of naming games to conventionalize their vocabularies.
Such fluctuations also show in our results and help to maintain the polarization state; the clusters are dynamically stable; their boundary agents may occasionally change, but their shared lexicons, sizes, and majority candidates remain roughly unchanged in the long run.

3. Conclusions
The simulations in this paper demonstrate the role of the naming game in social structure: the naming game under a simple distance restriction can adjust the social bindings among agents and form primitive clusters based on mutual understanding. This line of research is largely neglected in contemporary studies, which mostly focus on the impact of social structures on language games (e.g., Delgado, 2002; Dall'Asta et al., 2006). We present two experiments to illustrate the dynamics of the naming game under different distance restrictions and world sizes. First, a big cluster sharing a common lexicon can be formed among individuals whose local views (distance restrictions) might not allow them to see all members in the population. In addition, there is a
close relation between the local view and the world size: in a fixed world, an increase in the local view accelerates the conventionalization of individual knowledge; under a fixed local view, an increase in the world size triggers the emergence of different clusters and linguistic divergence, i.e., common knowledge (a shared lexicon) is developed within clusters, but heterogeneity (different shared words) occurs between clusters. Furthermore, the enlarging local view may be reminiscent of the growing mass media and the "global village" phenomenon of recent centuries, while the fixed local view with increasing world sizes may represent the reality that people do have a relatively limited view. Considering these, our model may address a scenario with these two competing conditions, and other activities such as opinion formation (Rosvall & Sneppen, 2007) may follow a similar scenario.

Acknowledgements
We would like to thank Dr. Jinyun Ke from the University of Michigan and our colleague Dr. James Minett for the valuable suggestions and discussions.

References
Axelrod, R. (1997). The dissemination of culture: A model with local convergence and global polarization. The Journal of Conflict Resolution, 41, 203-226.
Baronchelli, A., & Loreto, V. (2006). Ring structures and mean first passage time in networks. Physical Review E, 73, 026103.
Dall'Asta, L., Baronchelli, A., Barrat, A., & Loreto, V. (2006). Nonequilibrium dynamics of language games on complex networks. Physical Review E, 74, 036105.
Delgado, J. (2002). Emergence of social conventions in complex networks. Artificial Intelligence, 141, 171-185.
Gong, T., Minett, J. W., & Wang, W. S-Y. (2005). Computational exploration on language emergence and cultural dissemination. Proceedings of IEEE Congress on Evolutionary Computation, 2, 1629-1636.
Nettle, D. (1999). Using social impact theory to simulate language change. Lingua, 108, 95-117.
Oller, K., & Griebel, U. (Eds.) (2000). Evolution of communication systems: A comparative approach. Cambridge, MA: MIT Press.
Rosvall, M., & Sneppen, K. (2007). Dynamics of opinions and social structures. arXiv:0708.0368v1 [physics.soc-ph].
Steels, L. (2001). Grounding symbols through evolutionary language games. In A. Cangelosi & D. Parisi (Eds.), Simulating the evolution of language (pp. 211-226). London: Springer-Verlag.
DO INDIVIDUALS’ PREFERENCES DETERMINE CASE MARKING SYSTEMS?
DAVID J. C. HAWKEY

Language Evolution and Computation Research Unit, Edinburgh University, 40 George Square, Edinburgh, EH8 9LL, UK
[email protected]

The typological distribution of case marking systems presents a puzzle which some linguists have tried to solve in terms of the preferences of individuals. In this paper I highlight some flaws in these approaches, and argue that the typological facts are best dealt with from a diachronic perspective. Processes which could plausibly have given rise to case systems need not have any relation to hypothesised individual preferences concerning case marking. This divergence between putative individual preferences and reasons for the development of linguistic structures undermines the notion that typological facts in general can be informatively illuminated as optimal solutions to aggregated individual preferences.
1. Explaining case systems

Accusative case systems mark S NPs the same as A NPs and differently from O NPs.^a In contrast, ergative case systems mark S and O NPs in the same way but A NPs differently. Some languages use different case systems for different kinds of NP. An interesting typological generalisation can be described using a hierarchy of NP types ordered by their likelihood of performing the A role (the "nominal hierarchy", see Fig. 1): ergativity (when present) characterises all NP types from the right-hand end of Fig. 1 up to a (language-dependent) point. Similarly, accusativity (if present) affects all NP types from the left up to a certain point.
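The two alignment types can be summarised as a toy mapping from grammatical roles to case labels. This encoding is our own illustration; the labels NOM, ACC, ERG and ABS are standard glossing abbreviations, not introduced in this paper.

```python
# Accusative alignment: S patterns with A; O is marked differently.
accusative = {"S": "NOM", "A": "NOM", "O": "ACC"}

# Ergative alignment: S patterns with O; A is marked differently.
ergative = {"S": "ABS", "A": "ERG", "O": "ABS"}

# The defining groupings of the two systems:
assert accusative["S"] == accusative["A"] != accusative["O"]
assert ergative["S"] == ergative["O"] != ergative["A"]
```

A split system, in the sense of the nominal hierarchy, would apply the first mapping to NP types high on the hierarchy (e.g. pronouns) and the second to NP types lower down (e.g. common nouns).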
1.1. Discourse “motivations”
Du Bois (1987) sought to demonstrate the existence of a motivation in discourse for ergativity. Analysing a set of Sacapultec speech samples, he found that novel

^a In this paper I use S to refer to the NP argument of an intransitive clause, and A and O to refer to the subject and object respectively of transitive clauses. Note that S can be defined purely syntactically, but A and O refer to syntactic categories which generally have certain semantic properties (A is on the whole the argument which could initiate or control the activity, and O is then simply the other argument). This paper is only concerned with morphemes attached to NPs to indicate their S/A/O roles. Other systems, such as ergative/accusative cross-referencing of NPs with verb affixes, are not dealt with here. (I use bold font for S, A and O to avoid confusion with capitalised indefinite articles.)
Figure 1. The nominal hierarchy (from Dixon, 1994). From left to right:

    1st person pronouns → 2nd person pronouns → demonstratives, 3rd person pronouns → proper nouns → common nouns (human → animate → inanimate)

Nominative-accusative marking extends from the left-hand end of the hierarchy; ergative-absolutive marking extends from the right-hand end.
discourse entities were almost always introduced in S and O roles. Du Bois argued that the pattern was likely to be common across languages, and interpreted this as being the “motivation” for ergativity: the absolutive case exists to “accommodate”b new information. He also suggested a competing motivation which would favour accusativity, namely that mentions in S and A roles are typically human, agentive and topical. Du Bois appealed to differences between NP types in the degree to which these competing motivations apply in order to explain typological patterns of split ergativity. NPs higher up the nominal hierarchy are rarely used to introduce new information, and so the motivation for an ergative/absolutive distinction is weaker than for NPs lower down the hierarchy. Thus, higher up the hierarchy, the motivation to equate S and A is more likely to dominate, giving rise to accusative marking for (e.g.) pronouns, while further down the hierarchy the pressure to accommodate new mentions (in S and O) dominates, giving rise to ergative marking for (e.g.) common nouns. However, Du Bois didn't propose a mechanism by which these motivations could give rise to case systems. In Kirby's (1999) terms, he left the “problem of linkage” unsolved. If the motivations are interpreted as preferences of the speaker and/or hearer, they are rather unconvincing. Why should individuals prefer that new information be accommodated by an absolutive case? It isn't clear that highlighting the fact that an NP may not have been mentioned before (though the majority of absolutive mentions are not new, according to Du Bois) has any significant effect on individuals' linguistic interactions. Nor is it clear that, if it did, this fact would lead to the development of ergative case marking.
Similarly, it is unclear that using the same case (on the whole) for the topic of discourse in both transitive and intransitive utterances produces any significant benefit, nor that, were such a benefit to exist, it would lead to the development of accusative case marking.
1.2. Evolutionary Game Theory Approach

In a recent paper, Jäger (2007) attempts to account for the typological facts of case marking using Evolutionary Game Theory (EGT). His approach relies on stipulating a set of possible speaker strategies and hearer strategies, and comparing

b Du Bois couldn't say that the absolutive case “signalled” new information, as only about one quarter of absolutive mentions in his corpus were mentions of new discourse entities.
their “utility”. Jäger divides NPs into two categories, prominent (p) and non-prominent (n), according to their position on the nominal hierarchy.c This gives four NP categories: A/p, A/n, O/p, and O/n. Speaker strategies determine which of these categories will be case marked. All S NPs are assumed to be unmarked (as is the case in most languages; Dixon, 1994, p. 63). Speaker strategies can be notated as a string identifying whether A/p, A/n, O/p and O/n NPs have zero (z), A-identifying (e) or O-identifying (a) marking. Thus the string ezaz represents the strategy that leaves non-prominent NPs unmarked and marks prominent NPs according to their role.d Hearer strategies all correctly interpret any utterance with case marking on at least one NP, and are differentiated by their response to utterances with two unmarked NPs: either word order is assumed to be consistent (though no speaker strategies employ consistent word order) or the arguments are interpreted as A and O on the basis of their prominence (when the NPs have the same prominence, the hearer guesses their roles). Strategies are assigned a “utility” value according to how often they lead to successful communication given a population of other strategies, and how many case marking morphemes are required per utterance on average (speakers are assumed to have a preference for strategies which lead to the production of fewer morphemes). For various different prominent/non-prominent split points, Jäger counted the numbers of A/p, A/n, O/p, and O/n NPs in speech corpora, and used these frequencies when calculating the utility functions of speaker and hearer strategies in various populations.e The utility function is interpretedf in a manner analogous to the fitness of a biological system: strategies with greater utility are more likely to be employed at a later stage. New strategies may enter the population by “random mutation”.
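As a rough illustration of the bookkeeping involved (this is not Jäger's actual implementation, and the category frequencies below are hypothetical placeholders, not his corpus counts), speaker strategies can be encoded as four-letter strings and the morpheme-cost component of the utility computed from them:

```python
# One letter per NP category, in the order A/p, A/n, O/p, O/n:
# 'z' = zero marking, 'e' = A-identifying, 'a' = O-identifying.

def morpheme_cost(strategy, freq):
    """Expected number of case morphemes per transitive-clause NP slot.

    freq gives the relative frequencies of the four NP categories
    (placeholder values here, not Jäger's corpus counts)."""
    return sum(f for f, marker in zip(freq, strategy) if marker != 'z')

freq = [0.45, 0.05, 0.05, 0.45]      # A/p, A/n, O/p, O/n (hypothetical)
print(morpheme_cost('zzzz', freq))   # no case marking at all: cost 0
print(morpheme_cost('ezaz', freq))   # marks only the two prominent slots
```

In Jäger's model this production cost enters the utility with a negative sign, alongside a term for communicative success against the current population of hearer strategies.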
Jäger identifies strategy sets which are evolutionarily stable (that is, which persist given a low enough rate of mutation), some of which represent attested languages. However, the model excludes the pure accusative speaker strategy (zzaa, which characterises existing languages such as Hungarian), and includes typologically uncommon strategies (zzza and ezzz). There are a number of objections to Jäger's model.g However, the most significant arises from his attempt to explain why the evolutionarily stable ezzz and zzza speaker strategies are so rare. Jäger appeals to the notion of stochastic stability: some evolutionarily stable states are less resistant to invasion by mutant strategies (in the sense that it requires a smaller proportion of the population to mutate for the whole system to change state). To illustrate this concept, he compares ezzz and zzaz speaker strategies (in the context of hearer strategies based on NP prominence rather than word order). A system dominated by ezzz will change to being dominated by zzaz if only 2.1% of the strategies happen to mutate from ezzz to zzaz (in contrast, the reverse state change would only occur if 97.9% of strategies mutated in the opposite direction). However, this raises the question: how could this kind of mutation occur? Certainly speakers could not simply adopt an alternative case system out of the blue, nor could they re-use the case markers they already have but switch their function. Jäger's model assumes that hearers always interpret case-marked NPs correctly, but in order to do this they presumably must have learned the functions of the case markers they are presented with. Invention of a new case marker, or the re-use of an old one for a new function, would simply produce confusion (which has a low utility value). Jäger notes that comparative methodology has suggested a number of pathways by which case systems can develop, but his model ignores these. His stochastic stability analysis is sensitive to the probabilities of each type of “mutation” occurring, but he adopts the “null hypothesis” that all probabilities are equal, citing the paucity of recorded evidence for the emergence of ergative systems. However, while we may lack evidence specifically for ergativity, we have abundant evidence concerning the development of other syntactic phenomena (McMahon, 1994).

c Actually, Jäger employs a slightly different hierarchy. For the A role, the hierarchy is: pronoun → name → definite full NP → indefinite specific NP → nonspecific indefinite NP. The O role hierarchy is the reverse of this.
d An attempt to apply the terms ergative and accusative to the case markers in this strategy would be confusing. Rather, prominent NPs have a tripartite case system and non-prominent NPs have neutral (no) case marking.
e It is these frequencies which introduce the asymmetry between A and O roles.
f Jäger mentions several different interpretations of his model. The differences between these interpretations do not change the mathematics of his model.
g The split point on the nominal hierarchy is arbitrarily determined to be at the same place for both A and O roles, in spite of the fact that for some languages (e.g. Yidiny) A and O marking overlaps in the middle of the nominal hierarchy (producing tripartite marking for these NPs; see Dixon, 1994,
§4.2). Another problem is that consistent word order is not available as a speaker strategy, though it is available as a hearer strategy (redundantly, in both AO and OA orders). A combination of consistent-argument-order speaker and hearer strategies would be evolutionarily stable, making perfect understanding of the utterances in Jäger's model possible without any costly case morphology.

That evidence presents a picture of diachronic development as a gradual change whereby patterns of usage change without discontinuity (or individuals' metalinguistic insight). If we apply that picture to ergativity, we may profitably ask what sorts of changes are likely to have given rise to ergative case marking. And, I suggest, in answering that question we also gain insight into the typological facts of differential NP ergative marking.

2. On the Origins of Split Ergative Systems

Garrett (1990) presents an account of the development of ergative marking in Hittite from an earlier ablative case marker which could also have an instrumental sense. He suggests that this development was possible because of the functional overlap between instruments and agents in clauses with transitive predicates. Compare, for example, English John extinguished the fire with water and water extinguished
the fire. A Hittite translation of the first of these sentences would have ‘John’ in nominative case, ‘the fire’ in accusative, and instrumental/ablative marking on ‘water’. If the A core argument (‘John’) were omitted (giving a Hittite equivalent of the second English example), ‘water’ could be (re)interpreted as fulfilling the A role, and the instrumental/ablative marking reanalysed as marking this role. A similar process for intransitives is unlikely, given that thematic instruments are rare (or absent) in the subject role of an intransitive clause.h This asymmetry means that it would only be for A (and not for S) that the ablative/instrumental would be reanalysed as a core role case marker. Thus the reanalysis of instrumental marking is a possible source of ergativity. Garrett explained the hierarchy for NP split ergativity by relating it to the likelihood that an NP would be an instrumental: instrumental pronouns are rare and, for pragmatic reasons, animate NPs in instrumental function are unusual. Those NPs which are less likely to fill an instrumental role are also less likely to develop an ergative case via Garrett's route. Garrett (1990) also argues that a similar development from an instrumental produced ergative case marking in the prehistory of the Gorokan language family. Thus, he presents evidence for this process happening twice. While Garrett's explanation is convincing for the languages he analyses, it isn't clear that it accounts for all NP-conditioned ergative splits: Dixon (1994, p. 104) notes that a number of languages employ ergative marking for only some NPs (on the lower end of the nominal hierarchy) and only in the past tense. It seems unlikely that in these languages ergative marking arose from an instrumental used only in the past tense. An alternative route for the development of an ergative case is via the reanalysis of a passive construction.
Passive constructions are intransitive clauses derived from active transitive clauses, in which the NP which would fill the O role in the active transitive syntactically takes the S role. The NP which would fill the A role in the active transitive may be omitted or expressed as an oblique. Languages may employ a passive construction for a number of reasons, including satisfying syntactic constraints in the coordination of multiple clauses, avoiding mentioning the entity performing the A role, and placing focus on the O NP. If the passive becomes the dominant construction (replacing the previous active transitive), the oblique case marking on the A NP would become an ergative case marker (see Fig. 2). This explanation is favoured by most linguists as an account of the origin of ergativity in modern Indo-Iranian languages (Trask, 1996). In Sanskrit, the passive construction was used mainly in the past tense, and this fact is invoked to explain the use of ergative case only in the past tense (in, for example, Pashto). Would the development of an ergative case system from a passive construction apply equally to all types of NP? If some kinds of NP are rarely expressed in the oblique case of a passive construction, they would be less likely to be reanalysed

h E.g. English John walks with a cane but not *a cane walks (Garrett, 1990).
Figure 2. Development of ergative case from a passive construction.

    Accusative:
        Intransitive:       S-x V
        Active transitive:  A-x O-y V
        Passive:            O-x (A-z) V
    ⇒
    Ergative:
        Intransitive:       S-x V
        Transitive:         O-x A-z V

-x, -y and -z represent case markers. S, A and O represent NPs. In the passive construction on the left, S, A and O represent the roles these entities would play in the corresponding active construction (also on the left).
with an ergative case, as the form A-OBLIQUE (Fig. 2) would be systematically missing prior to the reanalysis. Thus, if NPs from the left of the nominal hierarchy are rarely expressed as oblique arguments of a passive construction, reanalysis of a passive could be another route by which NP split ergative systems could develop (and may account for some of the combined splits mentioned above). To test whether certain kinds of NP are systematically excluded as obliques, I extracted the passive construction arguments in the Switchboard corpus (recorded telephone conversations) of the Penn Treebank.i In order to determine whether certain NP types are rarer in the oblique case of a passive construction than in the A role of an active construction, I compared arguments in active and passive clauses. Comparing all passive clauses with all active clauses could potentially bias the result if the passive construction is commonly used with an unrepresentative distribution of verbs, and different verbs vary with respect to the kinds of NPs that commonly act as their arguments. Therefore, passive and active constructions were matched for the main verb. Table 1 shows the results, comparing first, second and third person pronouns (singular and plural) with all other types of NP. The passive construction most frequently omitted the NP that would semantically take the A role, but this omission didn't apply evenly to all NP types. A chi-squared test performed on the numbers of expressed underlying-A-role arguments revealed a significant effect (χ²(3, N = 1075) = 171.1, p < 0.01), allowing rejection of the null hypothesis that the absence of pronouns in the passive was simply due to the rate at which underlying-A-function arguments were omitted. Thus, in English at least, the way the passive construction is used systematically excludes pronouns as oblique arguments to a greater degree than non-pronoun arguments.
Were a language with this pattern to develop an ergative case from a passive construction, the ergative pattern would likely apply to non-pronoun arguments and not to pronouns.j As Garrett (1990) notes, analogical extension of ergative case marking from

i http://www.cis.upenn.edu/~treebank/
j There are two minor wrinkles with this scenario, concerning clauses with one prominent and one non-prominent argument. We may assume that A/p–O/n clauses would simply lose the now-redundant old accusative case marking on analogy with A/n–O/n clauses, and that the new absolutive case marking on infrequent (according to Jäger's counts) A/n–O/p clauses (derived from the passive) would be replaced with accusative case on analogy with the more frequent A/p–O/p clause type.
Table 1. A-role arguments of active and passive clauses.

              First   Second   Third   Other
    Active      313      156     328     217
    Passive       1        0       1      59
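The reported χ² statistic can be recomputed from the counts in Table 1, as a standard test of independence on the 2×4 contingency table of expressed underlying-A-role arguments:

```python
# Expressed underlying-A-role arguments, from Table 1.
active  = [313, 156, 328, 217]   # First, Second, Third person pronouns, Other
passive = [1, 0, 1, 59]

N = sum(active) + sum(passive)                 # 1075 arguments in total
cols = [a + p for a, p in zip(active, passive)]
chi2 = 0.0
for row in (active, passive):
    row_total = sum(row)
    for observed, col_total in zip(row, cols):
        expected = row_total * col_total / N   # expected count under independence
        chi2 += (observed - expected) ** 2 / expected

dof = (len(cols) - 1) * (2 - 1)                # 3 degrees of freedom
print(N, dof, round(chi2, 1))                  # 1075 3 171.1
```

The dominant contribution comes from the “Other” column of the passive row, reflecting exactly the asymmetry the text describes: pronouns are almost never expressed as passive obliques.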
nominals to pronominals, while possible, is unlikely, given that nominal and pronominal morphology are typically divergent. Thus a split ergative system with ergativity confined to nominals is likely to persist.

2.1. The relationship between A and S roles
The pathways outlined above began with an initially accusative language. Is it plausible to take accusativity as a starting point? As Dixon (1994, p. 14) notes, “there are some languages that appear to be fully accusative, in both morphological marking and syntactic constraints [... however] no language has thus far been reported that is fully ergative”. One explanation of this fact appeals to diachronic processes that lead to common forms for A and S NPs, irrespective of how accusative/ergative the language already is. If semantic change from a transitive to an intransitive verb most often involves the omission of the O NP, the result will be an intransitive verb with S marked in the same way as the original A NP. (Similarly, addition of an O to an intransitive would produce an accusative transitive.) The mirrors of these processes (such as A addition/omission) would produce ergativity. Is there any evidence that accusative-producing changes are more common? Du Bois (1987) found a clear association between human mentions and A and S, and continuity between subsequent clauses in the identity of entities in S and A, but not S and O. Thus development of intransitive clauses from transitives by loss of O would be more likely, as the resultant S would typically be the kind of entity speakers use intransitives with. Dixon (1994) appeals to such changes as a possible source of split-S marking. In an ergative language, a transitive verb which lost its O argument would produce an intransitive with A-like marking of S (in contrast to other intransitives with O-like marking on S). Recurrence of this change could produce a situation in which the majority of S NPs were marked with A marking; from here, all S NPs could become marked like A, either by generalisation of the dominant intransitive pattern, or by loss of the remaining intransitives with unmarked S. This series of changes would produce a “marked nominative” language (S and A treated in the same way but O unmarked).
Marked nominative languages are rare, but “marked absolutive” languages (i.e. S and O marked, A unmarked) are even rarer (Comrie, 2005), and this may be taken as indirect evidence that intransitives are more likely to develop from transitives by the loss of O than by the loss of A (a process which could produce a marked absolutive from an accusative language).
Diachronic processes which are sensitive to the functional overlap of S and A may thus explain why accusativity is more common than ergativity. Again, these processes are independent of language users' cognitive case-system preferences.
3. Conclusions

The diachronic processes mentioned in this paper probably do not exhaust the possibilities for how ergative case systems can develop, nor is every typological detail explained by these processes. A common feature of the pathways outlined here is that they have little to do with a functional account of case. The omission of subjects in clauses containing instrumentals, or the change from the active to the passive as the default clause type (which Trask (1996) suggests may have been related to politeness), are two processes that happen to produce split case systems that fit Du Bois' “motivations” without satisfaction of those motivations being the rationale for the development of case systems. Similarly, it seems unlikely that a passive construction would come to be the dominant form for A/n transitive clauses because of individuals' preferences for fewer morphemes per clause. The applicability of Jäger's model seems to be restricted to changes from one strategy to another which involve only the loss of a case marker. By drawing attention to the frequencies of prominent and non-prominent argument combinations, the model suggests that the loss of some case marking morphemes may be more likely than others (those likely to be lost being those whose omission would rarely lead to communicative breakdown and the need for repair). In contrast, when it comes to changes which introduce new morphemes, Jäger's null hypothesis of equal mutation probabilities should be improved by adopting a diachronic perspective. However, once plausible diachronic pathways are identified, it seems there is little the EGT analysis can add to our understanding, and much that it obscures by presenting changes from one case system to another as if they always result from a trade-off between disambiguation and production effort.

References
Comrie, B. (2005). Alignment of case marking. In M. Haspelmath, M. Dryer, D. Gil, & B. Comrie (Eds.), The world atlas of language structures. Oxford: Oxford University Press.
Dixon, R. M. W. (1994). Ergativity (Vol. 69). Cambridge: Cambridge University Press.
Du Bois, J. W. (1987). The discourse basis of ergativity. Language, 63(4), 805–855.
Garrett, A. (1990). The origin of NP split ergativity. Language, 66(2), 261–296.
Jäger, G. (2007). Evolutionary game theory and typology: a case study. Language, 83(1), 74–109.
Kirby, S. (1999). Function, selection and innateness. Oxford: Oxford University Press.
McMahon, A. M. S. (1994). Understanding language change. Cambridge: Cambridge University Press.
Trask, R. L. (1996). Historical linguistics. London: Arnold.
WHAT IMPACT DO LEARNING BIASES HAVE ON LINGUISTIC STRUCTURES?
DAVID J. C. HAWKEY Language Evolution and Computation Research Unit, Edinburgh University, 40 George Square, Edinburgh, EH8 9LL. UK
[email protected] Recent work modelling the development of communication systems has suggested that linguistic structure may reflect cognitive structures through the repeated effect of biased learning, the language adapting to conform to the learning preferences of its users. However, the notion that an individual's learning is biased can be cashed out in numerous ways. This paper argues that different ideas of what a “learning bias” is produce different population-level effects. Biased learning may result in the population's maintenance of structures disfavoured by the bias (cultural inertia), which we can think of as arising through a variety of diachronic processes. Without clear articulations of the relevant diachronic processes and an empirically sound notion of biased learning, the assumption that linguistic structure reflects individuals' language learning psychology is premature.
The use of computer simulations in the field of language evolution offers the potential to illuminate cultural processes impacting on the structures of communication systems developed (as opposed to designed) by groups of individuals. Such simulations commonly employ agents endowed with a learning mechanism (and, usually, a pre-defined communication channel). Simulations are generally designed such that an agent's learning mechanism responds to episodes of communication (in which the agent might be playing the role of speaker or hearer) in such a way as to increase the success of future communicative interactions. As agents repeatedly (iteratively) learn from each other, they settle on relatively stable shared functioning systems (at least, in simulations considered successful). The properties of such emergent systems commonly depend, among other things, on the properties of the learning mechanisms chosen for the agents: for example, Kirby, Dowman, and Griffiths (2007) present a model in which a Bayesian learning algorithm with a nominallya weak learning bias produces strong linguistic universals through the process of iterated learning.

a In the model of Kirby et al. (2007), the Bayesian posterior probability of a hypothesis given some input data (see their Eq. 1) does not play the role of the probability that the agent will choose that hypothesis. Rather, the agent chooses the hypothesis which has the maximum posterior probability. If the “bias” is thought of as a relationship between the learner's experience and their behaviour,
Interpreting these simulations as models of the evolution of human communication systems, it seems that properties of human languages may be explicable by appeal to the learning biases of individuals. In a recent paper, Dediu and Ladd (2007) present a correlation between the presence of linguistic tone and two genes (known to be related to brain size) which they argue is not a reflection of shared linguistic history or geographical proximity. They suggest that this correlation might be due to the genes producing a weak learning bias for or against tone. The bias is assumed to be weak because it seems that anyone can (eventually) learn tonal and non-tonal languages, whatever their genetic makeup. However, the notion of a learning bias is somewhat vague and can be operationalised in a number of different ways. This paper investigates some different ways of modelling a learning bias, and asks how such biases can influence the communication systems that emerge in simulations of language evolution. Given contemporary ignorance, confusion and dispute over the processes of language learning, it would be foolhardy to try to implement realistic learning models, as what would count as “realistic” is utterly opaque. Rather, these simulations are intended to illustrate the difficulties attached to interpreting linguistic features (or distributions over these) as reflections of individual psychology.
1. Three models of learning bias

Kirby (1999) presented a model in which learning was biased in favour of syntactic structures judged to be less complex by a certain theory of processing (viz. Hawkins' (1990) processing theory). The learning algorithm used by Kirby (1999) implemented a bias for lower complexity by always setting the probability that an agent would produce the less complex form to be greater than the relative frequency with which that form had been encountered by that agent. This kind of learning bias I will refer to as a transformational bias, to capture the notion that the average effect of biased learning is always to transform the relative frequencies of variants (from input to output) in favour of the biased form. A transformational bias, by definition, consistently biases the outcome of learning towards a particular state. However, this is not the only way the notion of a learning bias could be interpreted: a bias may be thought of as an effect on the process of learning, and as such produce more complex, less consistent relationships between input and the outcome of learning. For example, a learning bias could be taken as what an inexperienced individual is most likely to do (which we could refer to as a default strategy bias), or it could be taken as meaning that certain linguistic forms are more readily learned than others (an ease of learning bias).

a (continued) arguably it should not be interpreted as the “prior” distribution over hypotheses used in the calculation of the posterior probability. Consider the trivial case of two hypotheses (h1 and h2) and only one possible data set (d): both hypotheses produce only d, so P(d|h1) = P(d|h2) = 1. If the prior is characterised as P(h1) = 0.50001 and P(h2) = 0.49999, a learner presented with data d will always adopt h1. Thus, in this case, while the prior is only marginally in favour of h1, the learner is maximally biased toward h1. The hypothesis/data combinations explored by Kirby et al. have numerous data sets for which different hypotheses have the same production probability (i.e. the same P(d|h)), and when presented with such data learners will always choose the hypothesis with greater prior probability, regardless of how small the margin is. Different learners with different priors may be statistically behaviourally indistinguishable, raising the question of how to interpret these different priors, i.e. what the difference in the model is supposed to correspond to in reality. Interpretation in terms of a private language of expectation will sink in a philosophical quagmire whose articulation is far beyond the scope of this paper.
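The point made in footnote a — that a maximum-a-posteriori learner turns an arbitrarily small prior margin into categorical behaviour — can be made concrete in a few lines:

```python
# Two hypotheses that both generate the single data set d with probability 1.
prior = {'h1': 0.50001, 'h2': 0.49999}
likelihood = {'h1': 1.0, 'h2': 1.0}            # P(d | h)

# Unnormalised posterior; normalising would not change the argmax.
posterior = {h: prior[h] * likelihood[h] for h in prior}
chosen = max(posterior, key=posterior.get)
print(chosen)  # h1 — the MAP learner always picks h1, however tiny the margin
```

A learner that instead sampled a hypothesis in proportion to its posterior would pick h1 only about 50.001% of the time, which is why the same prior can correspond to a “weak” or a maximal bias depending on the choice rule.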
2. Learning Biases in an Iterated Learning Model

The effects of these three kinds of learning bias were investigated using a simple iterated learning model. In the model, agents can select one of two possible variants in each interaction. Agents select variants stochastically, and learning affects the probability with which a particular form is chosen.b Agents interact with each other in randomly chosen pairs. During an interaction, one agent (the speaker) stochastically produces a form which the other agent (the hearer) learns from. Simulations consist of a fixed number of agents (the population), and after a number of interaction events a number of the oldest agents are removed from the population (they die) and are replaced by new (born) agents. The selection of agents as speaker and hearer is independent of agents' degree of experience: agents learn from other agents throughout their lives, and agents with no experience have as much chance of being selected as a model from which to learn as more experienced agents. This unrealistic assumption (that agents can affect the language from birth) is made to increase the chance that agents' biases will come to be reflected in the language. The two forms agents can produce are abstractly referred to in the model as a and b. The implementations of the various learning rules are biased towards the production of form a. Populations are initialised with agents who on the whole do not produce more a forms than b (and, in many cases, produce more b forms than a), so these simulations can be seen as testing whether the various learning biases can change the population's system.
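The interaction and replacement scheme just described can be sketched as follows (this is a reconstruction, not the author's code: the learner is left abstract so that any of the learning rules of Section 2.1 can be plugged in, and parameter defaults are merely illustrative):

```python
import random

def simulate(n_agents, newborn, produce, learn,
             interactions=10_000, replace_every=400):
    """Iterated learning in a population with gradual turnover.

    newborn() returns a fresh agent state; produce(state) returns 'a' or 'b';
    learn(state, form) returns the hearer's updated state."""
    agents = [newborn() for _ in range(n_agents)]
    for t in range(1, interactions + 1):
        # Speaker and hearer are chosen regardless of experience (see text).
        s, h = random.sample(range(n_agents), 2)
        agents[h] = learn(agents[h], produce(agents[s]))
        if t % replace_every == 0:
            agents.pop(0)                # the oldest agent dies...
            agents.append(newborn())     # ...and is replaced by a newborn
    return agents
```

Here an agent's state can simply be its probability p of producing form a, so produce might be `lambda p: 'a' if random.random() < p else 'b'`.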
2.1. Learning rules

It is worth stressing again that the models here make no pretence of being realistic. The learning rules employed are designed to capture different intuitive notions of biased learning. The transformational learning rule is modelled on the rule used by Kirby (1999, p. 48). However, rather than being used to calculate the probability that an agent will end up with a given competence given a fixed set of observations,

b This contrasts with Kirby's (1999) model, which probabilistically assigned each agent a deterministic competence.
the current model uses the rule to continuously update the probability that the agent will produce form a the next time it is a speaker. The probability p that an agent will produce form a is given by Eq. (1), where na and nb are the numbers of a and b forms the agent has experienced other agents producing during its lifetime, and α is the biasing parameter, which ranges between zero (strongest bias) and one (weakest bias).c

    p = na / (na + α·nb)    (1)

In contrast to the transformational learning rule, the default strategy and ease of learning biases are implemented without an explicit representation of the frequencies with which forms have been encountered. Instead, when an agent learns as the hearer in an interaction, its probability value is modified by an amount which depends on the form it is presented with. Eq. (2) expresses the relationship between the change in probability (Δp), a “learning rate” parameter (λ), and the identity of the presented form (σ, which is equal to one when the heard form is a and minus one when b is heard) for the default strategy bias. The terms in Eq. (2) involving p make Δp smallest when the agent has a particularly strong preference for producing one form rather than another. Agents' default strategy bias (that is, their preference for producing form a prior to learning) is modelled by setting p to a high value for new agents entering the population.
Ap= A.a.p(l - p )
(2)
An ease of learning bias is modelled using a similar learning rule, but with a modification that favours changes in the value of p in one direction over the other. In Eq. (3), the parameter β can vary from zero (no bias) to one (strongest bias). Inexperienced ease-of-learning agents begin life with p = 0.5.
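A minimal sketch of the three update rules follows. Eqs. (1) and (2) are taken directly from the text; the exact form of Eq. (3) is not reproduced above, so the β-weighted variant here is an assumed reconstruction consistent with the prose (β = 0 gives no bias, and larger β makes movement towards a easier than movement towards b):

```python
def transformational_p(n_a, n_b, alpha=0.4):
    # Eq. (1): alpha < 1 inflates p above the observed relative frequency of a.
    if n_a == 0 and n_b == 0:
        return 0.5                      # inexperienced agents: 50% (footnote c)
    return n_a / (n_a + alpha * n_b)

def default_strategy_update(p, heard_a, lam=0.1):
    # Eq. (2): sigma = +1 for form a, -1 for form b; the bias itself lives in
    # the high initial value of p given to newborn agents.
    sigma = 1 if heard_a else -1
    return p + lam * sigma * p * (1 - p)

def ease_of_learning_update(p, heard_a, lam=0.1, beta=1/3):
    # Assumed form of Eq. (3): movement towards a is scaled by (1 + beta),
    # movement towards b by (1 - beta); beta = 0 recovers Eq. (2).
    sigma = 1 if heard_a else -1
    scale = (1 + beta) if heard_a else (1 - beta)
    return p + lam * sigma * scale * p * (1 - p)
```

Note how the p(1 − p) factor makes updates smallest near p = 0 and p = 1, so agents with a strong existing preference change slowly, as the text observes for Eq. (2).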
2.2. Results and Explanation

The results of some selected simulations are shown in Fig. 1, which plots the population mean probability that form a will be produced. During simulations with the transformational bias, this probability generally increases with time, until a dominates. In contrast, during simulations run with the default strategy and ease of learning biases, the direction of change of the population mean probability at a given moment in time depends on the values of the various parameters and the proportion of a forms being produced at that time. The system can evolve to be dominated by either the bias-favoured form or the bias-disfavoured form.

c Agents with no experience, i.e. with na = 0 and nb = 0, produce form a with a 50% probability.
Figure 1. Qualitative results of using different learning biases. Simulations run for different numbers of iterations are scaled and plotted on the same axes to facilitate qualitative comparison. Numerical details: all simulations had 100 agents. Transformational bias: one agent replaced every 100 interactions, simulation run for 30,000 interactions, α = 0.4, simulation began with agents having n_a = 1 and n_b = 100. Default strategy bias: one agent replaced every 400 interactions, simulation run for 10,000 interactions, λ = 0.1, simulation began with agents having p = 0.4 (lower) or p = 0.5 (upper). Ease of learning bias: same as default strategy except simulations began with p = 0.3 (lower) and p = 0.4 (upper), and β = 1/3.
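The qualitative behaviour of the default strategy and ease of learning models can be reproduced in a few lines of code. The sketch below is our own minimal reimplementation under stated assumptions (random pairing of speaker and hearer, the update Δp = λ(a + β)p(1 − p), and optional periodic replacement of a random agent by a naive biased one with production probability p_new); it is not the authors' code, and all names are ours.

```python
import random

def simulate(n_agents=100, n_interactions=10000, lam=0.1, beta=0.0,
             p_init=0.5, p_new=0.9, replace_every=None, seed=1):
    """Population of agents, each holding a probability p of producing
    form a. Each interaction: a random speaker produces a form and a
    random hearer updates its p; optionally, a random agent is
    periodically replaced by a naive biased agent (p = p_new)."""
    rng = random.Random(seed)
    pop = [p_init] * n_agents
    for t in range(1, n_interactions + 1):
        speaker, hearer = rng.sample(range(n_agents), 2)
        a = 1.0 if rng.random() < pop[speaker] else -1.0
        p = pop[hearer]
        pop[hearer] = min(1.0, max(0.0, p + lam * (a + beta) * p * (1 - p)))
        if replace_every and t % replace_every == 0:
            pop[rng.randrange(n_agents)] = p_new
    return sum(pop) / n_agents

# Without population turnover, the default strategy (beta = 0) drifts
# towards whichever form was initially in the majority.
high = simulate(p_init=0.7)   # drifts towards a-domination
low = simulate(p_init=0.3)    # drifts towards b-domination
```

This reproduces the paper's point that, unlike the transformational bias, these rules can leave the population dominated by either form depending on the starting proportion of a forms.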
One virtue of using such simplistic learning rules is that it is easy to develop a perspicuous account of why populations behave in the ways they do.

Transformational bias. Consider a new agent introduced to the population when the average probability of producing a among the rest of the population is π. After playing the role of hearer in N interactions, the expected values for the numbers of observed forms, ⟨n_a⟩ = πN and ⟨n_b⟩ = (1 − π)N, can be substituted into Eq. (1) to calculate the expected value of the agent's production probability, ⟨p⟩. As can be seen in Eq. (4), this expected value is always greater than π unless π is equal to one or zero (in which case only one form exists in the population) or α ≥ 1 (in which case the agent is not actually biased in favour of a). Thus a new agent introduced to a population will, on average, produce form a with greater probability than the rest of the population: therefore, the introduction of a new agent will, on average, increase the mean probability that the bias-favoured form will be produced (irrespective of what that probability actually is).d

d New agents do not have to be added for the transformational bias to drive the system to domination by the bias-favoured form. Agents always produce form a with greater probability than the relative frequency with which they have observed it, and thus their contributions always serve to increase the relative frequency of a throughout the simulation. However, without the introduction of new agents, the speed with which the form a comes to dominate is much slower.
⟨p⟩ = π + π(1 − π)(1 − α) / (π + α(1 − π))   (4)
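This expected value can be checked numerically. Assuming Eq. (1) (not reproduced here) has the form p = n_a / (n_a + α·n_b), substituting the expected counts ⟨n_a⟩ = πN and ⟨n_b⟩ = (1 − π)N gives ⟨p⟩ = π / (π + α(1 − π)), which is algebraically identical to the expression in Eq. (4). The snippet below is a sketch under that assumption.

```python
def expected_p(pi, alpha):
    """Expected production probability of a new agent, per Eq. (4)."""
    return pi + pi * (1 - pi) * (1 - alpha) / (pi + alpha * (1 - pi))

# Agreement with the substituted form of Eq. (1), and the key property:
# for 0 < pi < 1 and alpha < 1 the expectation always exceeds pi.
for pi in (0.1, 0.3, 0.5, 0.9):
    assert abs(expected_p(pi, 0.4) - pi / (pi + 0.4 * (1 - pi))) < 1e-12
    assert expected_p(pi, 0.4) > pi          # biased agent overshoots
    assert abs(expected_p(pi, 1.0) - pi) < 1e-12  # alpha = 1: no bias
```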
Default strategy and ease of learning biases. Consider an agent playing the role of hearer using the ease of learning rule in Eq. (3). If the probability that a randomly selected speaker from the rest of the population (i.e. not the hearer) will produce form a is again π, then the expected change in the hearer's production probability value will be given by Eq. (5).
⟨Δp⟩ = λ · (2π + β − 1) · p(1 − p)   (5)
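Reading Eq. (5) as ⟨Δp⟩ = λ(2π + β − 1)p(1 − p), the bracketed factor changes sign at π = (1 − β)/2, so the expected change is negative below that point and positive above it. A quick numerical check (our sketch, with illustrative parameter values):

```python
def expected_dp(pi, p, lam=0.1, beta=1/3):
    """Expected change in a hearer's production probability, on our
    reading of Eq. (5); pi is the rest of the population's mean."""
    return lam * (2 * pi + beta - 1) * p * (1 - p)

beta = 1/3
pi_star = (1 - beta) / 2            # unstable equilibrium, here 1/3
at_eq = expected_dp(pi_star, pi_star, beta=beta)     # ~0: no drift
above = expected_dp(0.5, 0.5, beta=beta)             # positive: drift to a
below = expected_dp(0.2, 0.2, beta=beta)             # negative: drift to b
```

With β = 0 (the default strategy rule), the same calculation places the unstable equilibrium at 0.5, matching the analysis in the text.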
The direction of ⟨Δp⟩ is dependent on the relationship between π and p. While the value of π varies with the identity of the hearer, we can estimate the direction of change of the population mean probability by substituting this probability, p̄, for π in Eq. (5). Thus, roughly speaking, the expected change in the population's production probability will be zero when p̄ = (1 − β)/2 (the unstable equilibrium), positive when p̄ is greater than this value and negative when p̄ is less than this value. The ease of learning bias always has β > 0, so the unstable equilibrium value of p̄ is below 0.5. Thus the introduction of new agents can change the dominant form produced by the population if the effect of their naïve language production manages to increase p̄ beyond a certain value which is generally lower than 0.5 (thus, for example, increasing the rate of population turnover would make the effect of the bias more likely to have an impact). In simulations run with the default strategy bias, the learning rule is effectively the same as the ease of learning strategy with β = 0, so the unstable equilibrium value is p̄ = 0.5. The introduction of an agent with a default strategy bias serves to move the population production probability towards the biased agent's initial production probability. If, following the introduction of the biased agent, p̄ is still less than the equilibrium value, the expected effect of each subsequent interaction will be to reduce p̄ (in part, by reducing the new agent's production probability). Whether the introduction of new agents with a default strategy bias manages to change the population from producing the bias-disfavoured form to the bias-favoured form depends on the rate at which the introduction of new agents increases p̄ compared to the rate at which interactions between agents decrease p̄. With these two models of learning bias the population can become effectively
trapped in a state dominated by the production of the bias-disfavoured form.e We might say that in such situations, the population exhibits cultural inertia resulting from the effect of agents learning the ambient language outweighing the effect of naïve agents' biased behaviour.

e Because of the stochastic nature of this model, the state of a population is never indefinitely stable: there is always a non-zero probability that a population which produces a as the dominant form will switch to one dominated by b (for example, all agents could, by improbable chance, produce b forms in every interaction until p̄ is below the equilibrium value). However, the learning biases all make the probability of a transition from an a-dominated situation to a b-dominated situation less likely than the converse. Thus, in the limit of infinitely long runs, we would expect to see the population spend more time dominated by the bias-favoured form than the bias-disfavoured form. The relevance of this property of the model is hard to see given the lack of a connection between these models and reality: the time-scale over which this effect could become noticeably manifest within a population could be far longer than the time-scales over which human languages have existed. The point of this paper is that cultural inertia may outweigh biased learning. The fact that another possibility (which we may call spontaneous population-level change) could theoretically outweigh both in an infinite limit is only relevant if we consider its occurrence likely to have a significant effect on linguistic structures.

3. Discussion

The models presented in this paper are not at all realistic, and no attempt was made to determine reasonable values for the parameters used: the ungroundedness of the overly simple equations used cannot be compensated for by, for example, realistic rates of population turnover. The purpose of presenting these models is not to find a counter-intuitive result of a complex process, but to show that not all intuitive notions of learning bias (afforded to us by our ignorance of the processes of language learning) will inexorably produce distributions of languages which reflect those biases. It would thus be premature to think that a link between learning biases and language typology has been demonstrated. The simulations presented here all (arbitrarily) begin with an initial state in which the bias-favoured form is not the majority form produced by the population. How might such a situation arise? Bybee and Newman (1995) performed experiments designed to detect learning biases (in students learning an artificial language) for plural marking with suffixes versus plural marking with stem changes. They found no bias in favour of either plural marking scheme (in terms of ease of acquisition and generalisation), in spite of the fact that stem changes are far less common in the languages of the world. Bybee and Newman argue that it is the differences in the diachronic processes which produce affixes and stem changes that are responsible for the dominance of affixes over stem changes. Briefly, affixes generally develop by the grammaticalization of free morphemes, whereas stem changes develop by the phonological conditioning of the stem by an affix followed by the deletion of the affix. Thus the process that produces stem changes depends on the presence of affixes, but not vice versa. The processes by which stem changes arise also generally take longer than the processes by which a free morpheme becomes an affix. Attention to these processes, rather than individuals' preferences for one system over another, accounts for the typological distribution presented by the world's languages.

The clearest way linguistic structure can develop independently of learning biases is by some new structure being what is left when some other aspect of linguistic structure is omitted. For example, Garrett (1990) argues that ergative
case marking in Hittite developed from instrumental case marking, through the reanalysis of the instrumental in a null-subject transitive clause as the (ergative) subject. This process was driven by the omission of transitive clause subjects, a process orthogonal to any putative language learning mechanisms' biases for or against ergative or accusative case systems. We may draw an analogy between learning biases and Coriolis forces: when dealing with weather systems, the Coriolis force is a dominant effect in determining the direction of winds relative to the surface of the earth; however, when dealing with water draining from a bathtub, the Coriolis force (contrary to popular belief) is incredibly weak in comparison to other effects and does not determine the rotation of the draining water. Similarly, learning biases should be thought of as one among many kinds of effects which may shape the emergence and stability of language structures. Whether a particular learning bias substantially affects a particular language structure will depend on the balance of these effects. The view of language adopted by generativist linguistics (Chomsky, 1965) sees universals of language structure as reflections of the fact that an individual's "language acquisition device" can only select from a limited range of "possible human languages". Viewing language structures as the aggregate of biased learning is, in a sense, a generalisation of this view which weakens the effect of the acquisition device from identifying possible languages to identifying preferred structures. Both perspectives have the shortcoming that they reduce the explanation of language structures to (speculative) features of the individual psychology of language learning.
A broader perspective on the sources of structural similarities across languages would balance the effect of biased language learning against other effects that could impact on language structures and may be more or less similar across different human communities without necessarily being reducible to human psychology. For example, similarities found in the colour term systems of the world's languages can be understood as a reflection of the similarities in the useful colour contrasts presented to humans by their environments (Hawkey, 2006).

References
Bybee, J. L., & Newman, J. E. (1995). Are stem changes as natural as affixes? Linguistics, 33, 633-654.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: M.I.T. Press.
Dediu, D., & Ladd, D. R. (2007). Linguistic tone is related to the population frequency of the adaptive haplogroups of two brain size genes, ASPM and Microcephalin. Proceedings of the National Academy of Sciences, 104(26), 10944-10949.
Garrett, A. (1990). The origin of NP split ergativity. Language, 66(2), 261-296.
Hawkey, D. J. C. (2006). The interrelated evolutions of colour vision, colour and colour terms. In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The evolution of language: proceedings of the 6th international conference (EVOLANG6) (pp. 417-418). Singapore: World Scientific.
Hawkins, J. (1990). A parsing theory of word order universals. Linguistic Inquiry, 21, 223-261.
Kirby, S. (1999). Function, selection and innateness. Oxford: Oxford University Press.
Kirby, S., Dowman, M., & Griffiths, T. L. (2007). Innateness and culture in the evolution of language. Proceedings of the National Academy of Sciences, 104(12), 5241-5245.
REANALYSIS VS. METAPHOR? WHAT GRAMMATICALISATION CAN TELL US ABOUT LANGUAGE EVOLUTION
STEFAN HOEFLER & ANDREW D. M. SMITH
Language Evolution and Computation Research Unit, Linguistics and English Language, University of Edinburgh, 40 George Square, Edinburgh, EH8 9LL
[email protected]
[email protected]

We argue that studying grammaticalisation is useful to evolutionary linguists, if we abstract away from linguistic description to the underlying cognitive mechanisms. We set out a unified approach to grammaticalisation that allows us to identify these mechanisms, and argue that they could indeed be sufficient for the initial emergence of linguistic signal-meaning associations.
1. Introduction

Language evolution has a notorious data problem: its object of study is simply too remote in the pre-historic past for any direct observation to be possible. In such situations, Ockham's razor recommends the assumption of the uniformity of process: that the mechanisms operating in the past are the ones still operating in the present. This would lead to the assumption that we should be able to learn something about the evolution of language from the study of language change, and in particular of semantic change leading to grammaticalisation (Heine & Kuteva, 2002a; Hurford, 2003). Grammaticalisation denotes the (unidirectional) process by which a discourse strategy, syntactic construction, or word loses some of its independence of use and becomes more functional. It is usually accompanied by phonetic reduction and semantic bleaching and generalisation. There is disagreement over whether the study of grammaticalisation can give useful insights into language evolution. Newmeyer (2006), for instance, criticises the assumption that the unidirectionality of grammaticalisation provides sufficient evidence that early human language contained only nouns and verbs. We argue that grammaticalisation is indeed worthy of evolutionary linguists' study, if one abstracts away from linguistic descriptions of individual phenomena to underlying psychological mechanisms. We thus support calls for a more cognition-oriented study of grammaticalisation (Heine, 1997; Kuteva, 2001; Tomasello, 2003):

Exactly how grammaticalization and syntacticization happen in the concrete interactions of individual human beings and groups of human beings, and how these processes might relate to the other processes of sociogenesis by means of which human social interaction ratchets up the complexity of cultural artefacts, requires more psychologically based linguistic research into processes of linguistic communication and language change. (Tomasello, 2003, p. 103)

The remainder of this paper falls into three sections. We first provide a unified approach to grammaticalisation, allowing us to identify the underlying cognitive mechanisms. We project these to study the emergence of a non-linguistic code, before exploring the implications of our approach for evolutionary linguistics.
2. Metaphor vs. reanalysis

Two competing kinds of accounts of grammaticalisation phenomena can be identified in the literature: those which emphasise metaphorical use (Heine, 1997), and those which emphasise reanalysis (Hopper & Traugott, 2003). We propose a unified approach based on an ostensive-inferential model of communication (Sperber & Wilson, 1995). Such a model emphasises the fact that in a given situation, a speaker and hearer assume common ground (Clark, 1996). Common ground includes, among other shared knowledge, the awareness of shared linguistic conventions and the recognition of what is relevant in a given situation, which allows the hearer to infer what the speaker intends to communicate on the basis of an ostensive stimulus provided by the speaker. The grammaticalisation of the English construction be going to, which originally stood for SPATIAL MOTION and then came to express INTENTION, as shown in Example 1, is one of the most cited examples in the grammaticalisation literature (Heine, Claudi, & Hünnemeyer, 1991; Kuteva, 2001; Hopper & Traugott, 2003), and is also a particular instance of grammaticalisation which is very common, both historically and cross-linguistically (Heine & Kuteva, 2002b).

(1)
a. We are going to Windsor to see the King. (MOTION)
b. We are going to get married in June. (INTENTION, not MOTION)
(examples from Bybee (2003, p. 147)). We illustrate our approach by presenting the underlying psychological mechanisms, for both speaker and hearer, of metaphor- and reanalysis-based accounts of this change. In the metaphor-based scenario, detailed in Example 2, a speaker intends to express INTENTION (2a). She uses the form for SPATIAL MOTION metaphorically,a assuming that the hearer will realise that (i) spatial motion is irrelevant in the current context, and (ii) spatial motion often implies intention, which in turn is relevant (2b-f). The hearer realises that the literal meaning of the

a There are many reasons for ad-hoc metaphorical use; these could be sociolinguistic (e.g. for prestige), or the speaker could simply not have a convention for the intended meaning in her code.
signal is irrelevant in the current context, and falls back on INTENTION, which he associates (and knows the speaker associates) with SPATIAL MOTION (2g-m).

(2)
Detail of the metaphor-based scenario.
Speaker:
(a) I want to express INTENTION.
(b) I have a construction which expresses SPATIAL MOTION, and the hearer shares this convention.
(c) SPATIAL MOTION is associated with INTENTION.
(d) SPATIAL MOTION is not relevant in the given context.
(e) Because we share common ground, the hearer will be aware of (b)-(d), and realise that I am aware of it too.
(f) Because of (e), I can use the construction for SPATIAL MOTION metaphorically to convey INTENTION.
Hearer:
(g) The speaker has expressed SPATIAL MOTION.
(h) SPATIAL MOTION is not relevant in the given context.
(i) SPATIAL MOTION often implies INTENTION.
(j) INTENTION would be relevant in the given context.
(k) I must assume that the speaker is co-operative.
(l) I must also assume that the speaker is aware that I know (g)-(k), and that I know of his being aware of it.
(m) From (g)-(l), I conclude that the speaker intends to convey INTENTION.
Both speaker and hearer remember that be going to has been used successfully to express INTENTION; the more frequently be going to is used in this sense, the more deeply this new association will become entrenched (Langacker, 1987) in their knowledge. Such entrenchment eventually leads to the phenomenon of context-absorption, where a pragmatically inferred meaning becomes part of the lexical item's conventional, semantic meaning (Croft, 2000; Levinson, 2000; Kuteva, 2001; Traugott & Dasher, 2005). The entrenched meaning no longer needs to be inferred from its relevance in the given context, but can be retrieved instead from the shared conventions which make up part of language users' encyclopaedic knowledge. In the reanalysis-based scenario, detailed in Example 3, the speaker uses be going to in its conventional sense to express SPATIAL MOTION, the expression of which she deems relevant in the given context (3a-e). The hearer, however, perceives things differently; he does not think that SPATIAL MOTION is relevant in the present situation but does believe that information about INTENTION would be
(3f). From the hearer's perspective, this appears to be exactly the same scenario as the metaphor-based scenario in Example 2. This time, the interlocutors make different adjustments to their codes: the speaker will further entrench the convention that maps be going to onto SPATIAL MOTION, whereas the hearer establishes a new, additional association between be going to and INTENTION.

(3)
Detail of the reanalysis-based scenario.
Speaker:
(a) I want to express SPATIAL MOTION.
(b) I have a construction for the expression of SPATIAL MOTION in my linguistic code, and the hearer shares this convention.
(c) SPATIAL MOTION is relevant in the given context.
(d) Because we share common ground, the hearer will be aware of (b)-(c) and realise that I am aware of it too.
(e) Because of (d), I can use the construction to communicate SPATIAL MOTION.
Hearer:
(f) performs the same reasoning as in (2g)-(2m) above.

A special case of the reanalysis-based scenario is one where the hearer, in the role of a language learner, does not have any existing mapping for be going to in his linguistic code. However, because he can work out from the context that the speaker intends to express INTENTION, he will create an association between that meaning and be going to. In contrast to the previous two scenarios, layering (the co-existence of an old and a new mapping, which yields polysemy) does not arise in the hearer's linguistic code in this case.

Two important conclusions can be drawn from our analysis of the metaphor- and reanalysis-based explanations of the grammaticalisation of be going to. First, both scenarios are based on the same cognitive processes: (i) those involved in ostensive-inferential communication, in particular the assumption of common ground, including knowledge of shared linguistic conventions and the recognition of what is relevant in the given context; and (ii) the automatisation-based process of entrenchment. Second, the difference between the scenarios is not that only one of them uses metaphor, but rather that the (infelicitously named) metaphor-based scenario relies on common ground having been successfully established between speaker and hearer, whereas the reanalysis-based scenario describes a situation where, although common ground is assumed by the interlocutors, there is actually a mismatch between their respective discourse contexts (Kuteva, 2001). The metaphor-based scenario is thus speaker-oriented, focusing on the speaker as the source of linguistic innovation, while the reanalysis-based account is hearer-oriented. Depending on which of the two perspectives one takes, however, either scenario can be regarded as a special case of the other.
3. Reconstructible meanings

How can we project these scenarios to language evolution? First, we step back to see how ostensive-inferential communication works, independent of language. We note that communication is inherently task-oriented; humans do not communicate "just so," but to do something, to achieve a goal or solve a task (Austin, 1962). The task-orientedness of communication entails that once a speaker has made manifest her intention to communicate, the hearer will have certain expectations as to what are plausible things to communicate in the given situation. In this way, a hearer discerns what is relevant from what is irrelevant in a given situation (as in the scenarios for the grammaticalisation of going to above), and the speaker can likewise anticipate what the hearer is likely to infer. In the simplest case, in Fig. 1(a), making manifest one's communicative intention may suffice for the hearer to be able to infer the information one wants to communicate. The hearer's reasoning may go as follows: my conspecific exhibits behaviour that does not make sense unless she intends to communicate; therefore she intends to communicate something; in the current situation, the only thing that would make sense for her to communicate is that there is some danger around; therefore, she is communicating that there is some danger around. Note that the speaker's and hearer's assumptions can be different (i.e. there can be a contextual mismatch): if the perlocutionary effect does not differ, this may go unnoticed, and speaker and hearer will map the produced stimulus onto different utterance meanings. In Fig. 1(b), for example, as long as the hearer runs and hides, it does not matter that the speaker thought she was communicating the presence of a lion, while the hearer assumed that hyenas were around.
Of course it is not always possible to reduce the set of plausible utterance meanings to a single one; in such cases, the hearer needs some assistance in selecting the right one, namely a clue. The hearer's reasoning might run along the following lines (see Fig. 1(c)). Because it does not make sense otherwise, I must interpret the speaker's behaviour as an attempt to communicate. In this situation, the only things that would make sense for her to communicate are to tell me that there is danger and to specify whether this danger is a lion or an eagle. She is communicating, so there is danger, but how can I decide if it is a lion or an eagle? The speaker must realise my dilemma, and so her ostensive stimulus will contain a clue. She is growling: lions growl, eagles don't (hyenas growl too, but this is irrelevant as there are no hyenas around at this time of year); therefore, she is communicating that there is a lion. The cognitive mechanisms underlying these instances of communication are identical to those described in section 2 for grammaticalisation. This equivalence also extends to the entrenchment of the signal-meaning association and thus to the emergence of a convention. In all cases, the meanings which come to be associated with signals are those which can be reconstructed from the stimuli in context.
Figure 1. The reconstruction of meaning in ostensive-inferential communication, where P is the set of plausible intended perlocutionary effects in a given situation. (a) If only one thing makes sense to be communicated, e.g. that there is some danger around (A), then the recognition of a conspecific's intention to communicate suffices to infer what she attempts to convey. (b) Contextual mismatch: the speaker means A (e.g. that there is a lion), the hearer infers C (e.g. that there is a hyena). Because both have the same perlocutionary effect p1 (e.g. climbing a tree), the hearer's misinterpretation goes unnoticed and communication does not fail. (c) In situations where more than one thing is plausible, the speaker must additionally provide a clue. For instance, it might make sense to communicate that there is a lion (A) or an eagle (B): if there is a lion, one must climb (p1); if there is an eagle, one must hide (p2). Growling (S) serves as a clue: it is the sound made by lions (S → A), and by hyenas (S → C), but this is irrelevant in the given context.
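The hearer's reasoning in these scenarios can be caricatured as two filtering steps: restrict the plausible meanings to those relevant in context, and, if several remain, use the clue to disambiguate. The toy function below is our own illustration of that reasoning, not a model from the paper; all names are invented.

```python
def infer_meaning(plausible, relevant, clue=None, clue_sources=None):
    """Return the single meaning the hearer can reconstruct, or None
    if context (plus any clue) still leaves the choice ambiguous."""
    # Step 1: keep only meanings that are relevant in this situation.
    live = [m for m in plausible if m in relevant]
    # Step 2: if several remain, keep those compatible with the clue.
    if clue is not None and len(live) > 1:
        live = [m for m in live if m in clue_sources.get(clue, set())]
    return live[0] if len(live) == 1 else None

# The lion/eagle case: growling is produced by lions and hyenas,
# but hyenas are not contextually relevant, so the clue picks LION.
meaning = infer_meaning(
    plausible=["LION", "EAGLE", "HYENA"],
    relevant={"LION", "EAGLE"},
    clue="growl",
    clue_sources={"growl": {"LION", "HYENA"}},
)
```

Without the clue, the same call would return None, which corresponds to the case where the hearer cannot reduce the set of plausible meanings to a single one.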
Every speaker innovation can only be propagated through hearer reconstruction; semantic reconstructibility therefore constrains the types of form-meaning mappings which can persist over time (Smith, 2008).
3.1. Burling's scenario revisited

Burling (2000) makes a case for a scenario of the emergence of linguistic symbols that is reminiscent of the reanalysis-based explanation of the grammaticalisation of be going to we have given above. He suggests that symbols arise from situations in which one individual erroneously interprets a conspecific's behaviour as an ostensive stimulus. In our model, this would be represented as an extreme, but nevertheless ordinary, case of contextual mismatch: the hearer interprets the interaction as communicative but the speaker does not. Because the supposed ostensive stimulus will not have the properties of a proper clue, the hearer will only be able to infer a plausible meaning if there is only one relevant thing that would make sense to be communicated in the given context, and if their reaction does not expose the misunderstanding. Burling concludes that comprehension runs ahead of production: "[C]ommunication does not begin when someone makes a sign, but when someone interprets another's behaviour as a sign" (Burling, 2000, p. 30). This interpretation must be rejected on the basis of our analysis of the psychological underpinnings of the equivalent reanalysis-based scenario of grammaticalisation in section 2. Although in Burling's scenario the hearer does indeed infer something not implied by the speaker, he does so not on a whim, but under the assumption that the speaker is inviting him to make those very inferences. Rather than one being prior to the other, therefore, production and comprehension mirror each other: whatever a hearer can infer, a speaker can imply. Communication is inherently co-operative (Grice, 1975; Clark, 1996; Tomasello, 2003), and while Burling's "reanalysis-based" account cannot be ruled out, its "metaphor-based" counterpart is equally possible. Both should be seen as instances of the same set of underlying cognitive mechanisms: ostensive-inferential communication and entrenchment.

4. Conclusion
We have shown that grammaticalisation can indeed answer questions relevant to evolutionary linguists, if one moves away from linguistic classification to investigating its underlying psychological mechanisms. We have argued that the same cognitive processes that lead to grammaticalisation phenomena could also have been sufficient for the initial emergence of linguistic signal-meaning associations. We thus neither endorse nor attempt to disprove Newmeyer's (2006) specific criticism of the use of grammaticalisation as a source of information about language evolution. Our approach is different from both his approach and the approaches of those he criticises. We claim that the merit of studying grammaticalisation, and in fact any semantic change (Traugott & Dasher, 2005), for insights
into language evolution, lies in the underlying cognitive processes it makes visible, which can be applied to investigate the origins of language.

References

Austin, J. L. (1962). How to do things with words. Oxford: Oxford University Press.
Burling, R. (2000). Comprehension, production and conventionalisation in the origins of language. In C. Knight, M. Studdert-Kennedy, & J. R. Hurford (Eds.), The evolutionary emergence of language: Social function and the origins of linguistic form (pp. 27-39). Cambridge: Cambridge University Press.
Bybee, J. L. (2003). Cognitive processes in grammaticalization. In M. Tomasello (Ed.), The new psychology of language: Cognitive and functional approaches to language structure (Vol. 2). Erlbaum.
Clark, H. H. (1996). Using language. Cambridge: Cambridge University Press.
Croft, W. (2000). Explaining language change: An evolutionary approach. Longman.
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and semantics 3: Speech acts. New York: Academic Press.
Heine, B. (1997). Cognitive foundations of grammar. New York: Oxford University Press.
Heine, B., Claudi, U., & Hünnemeyer, F. (1991). Grammaticalization: A conceptual framework. Chicago: University of Chicago Press.
Heine, B., & Kuteva, T. (2002a). On the evolution of grammatical forms. In A. Wray (Ed.), The transition to language (pp. 376-397). Oxford: Oxford University Press.
Heine, B., & Kuteva, T. (2002b). World lexicon of grammaticalization. Cambridge: Cambridge University Press.
Hopper, P. J., & Traugott, E. C. (2003). Grammaticalization (Second ed.). Cambridge: Cambridge University Press.
Hurford, J. R. (2003). The language mosaic and its evolution. In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 38-57). Oxford: Oxford University Press.
Kuteva, T. (2001). Auxiliation: An enquiry into the nature of grammaticalization. Oxford: Oxford University Press.
Langacker, R. W. (1987). Foundations of cognitive grammar: Theoretical prerequisites (Vol. 1). Stanford, CA: Stanford University Press.
Levinson, S. C. (2000). Presumptive meanings: The theory of generalized conversational implicatures. Cambridge, MA: MIT Press.
Newmeyer, F. J. (2006). What can grammaticalization tell us about the origins of language? In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The Evolution of Language (pp. 434-435). World Scientific.
Smith, A. D. M. (2008). Protolanguage reconstructed. Interaction Studies, 9.
Sperber, D., & Wilson, D. (1995). Relevance: Communication and cognition (Second ed.). Oxford: Blackwell.
Tomasello, M. (2003). On the different origins of symbols and grammar. In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 94-110). Oxford: Oxford University Press.
Traugott, E. C., & Dasher, R. B. (2005). Regularity in semantic change. Cambridge: Cambridge University Press.
SEEKING COMPOSITIONALITY IN HOLISTIC PROTO-LANGUAGE WITHOUT SUBSTRUCTURE - DO COUNTEREXAMPLES OVERWHELM THE FRACTIONATION PROCESS?
SVERKER JOHANSSON School of Education and Communication, University of Jönköping, Box 1026, SE-551 11 Jönköping, Sweden
[email protected] In holistic theories of protolanguage, a vital step is the fractionation process where holistic utterances are broken down into segments, and segments associated with semantic components. One problem for this process may be the occurrence of counterexamples to any segment-meaning connection. The actual abundance of such counterexamples is a contentious issue (Smith, 2006; Tallerman, 2007). Here I present calculations of the prevalence of counterexamples in model languages. It is found that counterexamples are indeed abundant, much more numerous than positive examples for any plausible holistic language.
1. Introduction

Human beings today have language. Our ancestors long ago did not. The notion that modern language with all its complexity arose ex nihilo is preposterously unlikely, which implies that one or more intermediate stages, less complex than modern language, must have existed. A popular possibility for an early intermediate stage is a language where each utterance is a unit without substructure. In analogy with the ontogeny of language, we might call this a one-word stage. There are at least two ways to get from a one-word stage to a composite language, either analytic/holistic or synthetic (Hurford, 2000; Bickerton, 2003; Johansson, 2005). In the holistic version (Wray, 2000; Arbib, 2003), the units of the one-word stage are holistic utterances, which are then fractionated into parts that become independent recombinable morphemes in the next stage, whereas in the synthetic version (Bickerton, 2000; Jackendoff, 2002, among others), two or more units from the one-word stage are combined into structured utterances in the next stage. The segmentation and analysis step, finding substructure in utterances that are postulated to lack substructure, is a critical step for holistic theories. It is not obvious to me, nor to Bickerton (2003) or Tallerman (2007), why the fractionation process envisaged by Wray (2000) would be expected to work. A similar process is certainly present in modern-day language acquisition - children first acquire
some stock phrases as unanalyzed wholes, and later figure out their internal structure - but that works only because these stock phrases have an internal structure, given by the grammar of the adults from whom the child acquires them. As an analogy for the origin of grammar, this is unsatisfactory. Wray (2000) describes a scenario in which people already talking at the one-word stage at some point acquire a grammar from somewhere - apparently not from any linguistic or communicative pressures, but as an exaptation - and start applying it to their language, attempting to identify structure and constituents in their structureless holistic one-word utterances. Tallerman (2004, 2007) provides a detailed critique of this process, to which Smith (2006) provides a partial response. In this paper, I will concentrate on one specific point of contention between Tallerman and Smith, which concerns how connections are established between semantic components and sound segments. By pure chance, it may sometimes happen that different utterances have both a “phonetic segment” in common^a, and a semantic component in common. It is argued by e.g. Wray (2000) that this will lead to the identification of the phonetic segment with the semantic component, so that the former comes to “mean” the latter. Tallerman (2004, 2007) argues that it is self-evident that counterexamples will by far outnumber confirming examples for such a generalization. Smith (2006) disagrees, arguing that there is no logical necessity that counterexamples outnumber positive examples. Smith (2006) further argues that it is not established that counterexamples, whatever their frequency, are actually fatal to generalization. This issue hinges on whether the analysis process in proto-humans has a logical and statistical component, or is purely based on positive examples.
The mental processes of proto-humans are unfortunately unavailable to direct observation, but since it has been established that both modern human infants (Saffran, Aslin, & Newport, 1996) and monkeys (Hauser, Newport, & Aslin, 2001) are sensitive to statistical patterns in language-like input, it is not parsimonious to assume that proto-humans totally disregarded statistics. The weight of Tallerman’s argument thus depends on the actual ratio of counterexamples to positive examples in plausible proto-languages, a ratio that can be estimated through simple calculation in simulated model languages. I present here the results of such a calculation.

^a “Phonetic segment” has been used in this debate as a term for whatever chunks of sound proto-humans will identify as a unit, and hopefully connect with a meaning. It is far from obvious that proto-humans at the relevant stage possessed the segmentation ability and phonological awareness needed to segment an utterance into anything useful (Studdert-Kennedy, 2005; Tallerman, 2007), but for the sake of the argument this additional hurdle for the holistic model is assumed to be solvable. I will call these chunks “sound segments”.
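The statistical sensitivity invoked here can be illustrated with the kind of computation the Saffran et al. infants are standardly argued to perform: transitional probabilities between adjacent syllables, with word boundaries posited where the probability dips. The following sketch is purely illustrative (the toy syllable stream and function name are ours, not drawn from the cited studies):

```python
from collections import Counter

def transition_probs(syllables):
    """P(next | current) for each adjacent syllable pair in the stream."""
    pairs = Counter(zip(syllables, syllables[1:]))
    firsts = Counter(syllables[:-1])
    return {(a, b): n / firsts[a] for (a, b), n in pairs.items()}

# A toy stream built from two "words", pa-bi-ku and ti-bu-do:
stream = "pa bi ku ti bu do pa bi ku pa bi ku ti bu do".split()
tp = transition_probs(stream)

# Within-word transitions are high; transitions across word boundaries are lower,
# which is the statistical cue a learner could exploit to segment the stream.
assert tp[("pa", "bi")] == 1.0   # always within a word
assert tp[("ku", "ti")] < 1.0    # sometimes a word boundary follows "ku"
```

A learner sensitive to such statistics can segment a continuous stream without any prior knowledge of where the words are, which is why the question of whether proto-humans used statistics at all carries weight here.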
Figure 1. The fraction of predicates for which positive examples outweigh counterexamples, as a function of the size of the language. The values of the other parameters are fixed at #segments = #predicates = 50, utterance length = 4 segments.
2. Model
A toy language is constructed by creating a set of utterances. Each utterance consists of a number of sound segments, and carries a meaning consisting of a basic predicate-argument structure, with a single predicate and one or more arguments. Both sound segments and meaning are randomly assigned to each utterance, uncorrelated with each other. The following features of the language could be varied as free parameters in the model:

- Total size of language, number of distinct holistic utterances
- Total inventory of sound segments (“#segments”)
- Total semantic inventory of predicates (“#predicates”)
- Total semantic inventory of arguments^b
- Number of sound segments in one utterance (“utterance length”)

Many different parameter combinations were investigated, to identify which regions, if any, in parameter space are conducive to creating a composite language as argued by Wray (2000) and Smith (2006). For each parameter combination, a large number of toy languages (100,000 or more) were generated and analysed.

^b In the present analysis, the arguments are neglected. A full analysis is left for future work.

Figure 2. The fraction of positive examples and the two types of counterexamples separately, as a function of the size of the language. The values of the other parameters are fixed at #segments = #predicates = 50, utterance length = 4 segments.

Once a language has been randomly generated with a given set of parameters, it is analysed for possible semantic-phonological connections according to the following procedure: For all predicates and all sound segments in the language, the number of co-occurrences of predicate p_i with segment s_n in the same utterance is counted. For each predicate in the language, the phonological segment s_best that most often co-occurs with it is identified. For the segment s_best, if it co-occurs at least twice with p_i, the following items are counted:
- The number of positive examples, where s_best co-occurs with p_i in the same utterance.
- Counterexamples type 1, the occurrence of s_best in an utterance that does not mean p_i.
- Counterexamples type 2, an utterance that means p_i but does not contain s_best.

Figure 3. The ratio of positive examples to counterexamples, as a function of utterance length, for two different language sizes. The values of the other parameters are fixed at #segments = #predicates = 50.
The two types of counterexamples are shown separately in Figure 2. As can be seen there, both contribute substantially. In the rest of the figures, data is shown only for both types conflated. Various higher-order complications, like the possibility that the same segment s is the best choice for two different predicates, have been neglected. Taking such complications into account would only decrease the possibility of finding and reinforcing connections. It is also assumed for the sake of the calculation here, contra Tallerman (2007), and for that matter contra my own judgement, that segmentation of an utterance is unproblematic (see footnote a), and that proto-humans already have compositional semantics.
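The counting procedure described above is simple enough to reproduce. The following Python sketch is a reconstruction from the paper's description, not the author's original code; parameter names mirror the paper's, and the default values are one of its stated settings:

```python
import random
from collections import Counter

def analyse_language(n_utterances=100, n_segments=50, n_predicates=50,
                     utt_len=4, rng=None):
    """Generate one random holistic toy language and count, per predicate,
    positive examples vs. counterexamples for its best-matching segment."""
    rng = rng or random.Random()
    # Each utterance: a tuple of random sound segments plus an unrelated
    # random predicate (sounds and meanings are uncorrelated by construction).
    utterances = [(tuple(rng.randrange(n_segments) for _ in range(utt_len)),
                   rng.randrange(n_predicates))
                  for _ in range(n_utterances)]

    wins = 0        # predicates whose positives outnumber counterexamples
    considered = 0  # predicates whose best segment co-occurs at least twice
    for p in range(n_predicates):
        cooc = Counter()
        for segs, pred in utterances:
            if pred == p:
                cooc.update(set(segs))   # co-occurrence counted once per utterance
        if not cooc:
            continue
        s_best, positives = cooc.most_common(1)[0]
        if positives < 2:
            continue                     # needs at least two co-occurrences
        considered += 1
        # Type 1: s_best occurs in an utterance that does not mean p.
        type1 = sum(1 for segs, pred in utterances if s_best in segs and pred != p)
        # Type 2: an utterance means p but does not contain s_best.
        type2 = sum(1 for segs, pred in utterances if pred == p and s_best not in segs)
        if positives > type1 + type2:
            wins += 1
    return wins, considered

wins, considered = analyse_language(rng=random.Random(1))
print(wins, considered)  # counterexamples typically dominate
```

Repeating this over many random languages and sweeping the four parameters reproduces the kind of measurement plotted in Figures 1-5.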
3. Results

For all parameter combinations, the number of counterexamples was found to outweigh the number of positive examples by a considerable margin. For no parameter combination did the fraction of all predicates with more positive examples than counterexamples exceed 2% (Fig. 1). The most important parameter is language size. The smaller the language, the larger the fraction of predicate-segment connections with predominantly positive examples, as shown in Fig. 1, and the larger (but still much less than unity) is the ratio of positive examples to counterexamples.

Figure 4. The ratio of positive examples to counterexamples, as a function of segment inventory, for two different language sizes. The values of the other parameters are fixed at #predicates = 50, utterance length = 4 segments.

This can be explained as a sampling effect, with random fluctuations being more important at small sample size, and also as a selection effect - only those predicates with at least two positive examples are counted at all, and this gives the positives a “head start” that is non-negligible in a small language. Similarly, the number of segments per utterance has a substantial effect, with very short utterances being “better”, as shown in Fig. 3. For small languages the connection success gradually grows with increasing segment inventory and predicate inventory (Figs. 4 and 5, upper curves). For large languages, the situation is different. Success rate is very low, largely independent of both segment inventory and predicate inventory (Figs. 4 and 5, lower curves).

4. Discussion
It is clear that there is only a small range of parameters for which the positive examples are not totally overwhelmed by counterexamples. The fractionation process has a non-negligible chance of success only for very small simple languages - but where would the pressure towards compositionality come from with a tiny language? And even for these tiny languages, success rate is small unless the inventory of segments and predicates is of the same order of magnitude as the total number of utterances in the language, which is hardly plausible. Unless it can be shown that humans totally disregard counterexamples when extracting patterns from data, the argument from counterexamples has considerable force.
Figure 5. The ratio of positive examples to counterexamples, as a function of predicate inventory, for two different language sizes (n = 100 and n = 1,000). The values of the other parameters are fixed at #segments = 50, utterance length = 4 segments.
References

Arbib, M. A. (2003). The evolving mirror system: a neural basis for language readiness. In M. H. Christiansen & S. Kirby (Eds.), Language evolution. Oxford: Oxford University Press.
Bickerton, D. (2000). How protolanguage became language. In Knight, Studdert-Kennedy, & Hurford (Eds.), The evolutionary emergence of language. Cambridge: Cambridge University Press.
Bickerton, D. (2003). Symbol and structure: a comprehensive framework. In M. H. Christiansen & S. Kirby (Eds.), Language evolution. Oxford: Oxford University Press.
Hauser, Newport, & Aslin. (2001). Segmentation of the speech stream in a non-human primate: statistical learning in cotton-top tamarins. Cognition 78: B53-B64.
Hurford, J. R. (2000). Introduction: the emergence of syntax. In Knight, Studdert-Kennedy, & Hurford (Eds.), The evolutionary emergence of language. Cambridge: Cambridge University Press.
Jackendoff, R. (2002). Foundations of language: brain, meaning, grammar, evolution. Oxford: Oxford University Press.
Johansson, S. (2005). Origins of language - constraints on hypotheses. Amsterdam: Benjamins.
Saffran, Aslin, & Newport. (1996). Statistical learning by 8-month-old infants. Science 274: 1926-1928.
Smith, K. (2006). The protolanguage debate: bridging the gap. In A. Cangelosi, A. Smith, & K. Smith (Eds.), The evolution of language: Proceedings of the 6th international conference (EVOLANG6), Rome, Italy, 12-15 April 2006. Singapore: World Scientific Publishing.
Studdert-Kennedy, M. (2005). How did language go discrete? In M. Tallerman (Ed.), Language origins: Perspectives on evolution. Oxford University Press.
Tallerman, M. (2004). Analysing the analytic: problems with holistic theories of the evolution of protolanguage. In Proceedings of the 5th conference on the evolution of language, Leipzig.
Tallerman, M. (2007). Did our ancestors speak a holistic protolanguage? Lingua 117: 579-604.
Wray, A. (2000). Holistic utterances in protolanguage: the link from primates to humans. In Knight, Studdert-Kennedy, & Hurford (Eds.), The evolutionary emergence of language. Cambridge: Cambridge University Press.
UNRAVELLING DIGITAL INFINITY

CHRIS KNIGHT & CAMILLA POWER School of Social Sciences, Media and Cultural Studies, University of East London, Docklands Campus, London E16 2RD

‘The passage from the state of nature to the civil state produces a very remarkable change in man, by substituting justice for instinct in his conduct, and giving his actions the morality they had formerly lacked. Then only, when the voice of duty takes the place of physical impulses and right of appetite, does man, who so far had considered only himself, find that he is forced to act on different principles, and to consult his reason before listening to his inclinations.’ Jean-Jacques Rousseau, The Social Contract (1973 [1762]: 195).
1.1. Digital minds in an analog world
Language has sometimes been described as a ‘mirror of mind’. Chomsky attributes this idea to ‘the first cognitive revolution’ inspired by Descartes among others in the seventeenth century. ‘The second cognitive revolution’ - triggered in large measure by Chomsky’s own work - is taken to have been a twentieth-century rediscovery of these earlier insights into the nature of language and mind. In 1660, the renowned Port Royal grammarians (Arnauld and Lancelot 1972 [1660]: 27) celebrated ‘this marvelous invention of composing out of twenty-five or thirty sounds that infinite variety of expressions which, whilst having in themselves no likeness to what is in our mind, allow us to disclose to others its whole secret, and to make known to those who cannot penetrate it all that we imagine, and all the various stirrings of our soul’.
If this ‘marvelous invention’ reflects some part of human nature, then on Cartesian first principles it must correspond to some innate mechanism in the biological mind-brain. Chomsky (2005) calls it ‘discrete infinity’. Or as Pinker (1999: 287) puts it: ‘We have digital minds in an analog world. More accurately, a part of our minds is digital.’ But if ‘a part of the mind is digital’, how did it ever get to be that way? Under what Darwinian selection pressures and by what conceivable mechanisms
might a digital module become installed in an otherwise analog primate brain? Can natural selection acting on an analog precursor mechanism transform it incrementally into a digital one? Is such an idea even logically coherent? If these were easy questions, the ‘hardest problem in science’ (Christiansen and Kirby 2003) might long ago have been solved. Chomsky concludes that the transition to ‘Merge’ - the irreducible first principle of ‘discrete infinity’ - was instantaneous, commenting that ‘it is hard to see what account of human evolution would not assume at least this much, in one or another form.’ Note that whatever the account of human evolution, the assumption of instantaneous language evolution must stand. Chomsky (2005: 11-12) writes: ‘An elementary fact about the language faculty is that it is a system of discrete infinity. Any such system is based on a primitive operation that takes n objects already constructed, and constructs from them a new object: in the simplest case, the set of these n objects. Call that operation Merge. Either Merge or some equivalent is a minimal requirement. With Merge available, we instantly have an unbounded system of hierarchically structured expressions. The simplest account of the “Great Leap Forward” in the evolution of humans would be that the brain was rewired, perhaps by some slight mutation, to provide the operation Merge, at once laying a core part of the basis for what is found at that dramatic “moment” of human evolution...’ Merge, then, is more than an empirical necessity: it is a logical one. It is the procedure central to any conceivable system of ‘discrete infinity’. Merge is recursive: it means combining things, combining the combinations and combining these in turn - in principle to infinity. Chomsky suggests that a ‘slight mutation’ might have allowed the evolving brain of Homo sapiens to do this for the first time.
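Chomsky’s gloss of Merge quoted above - an operation that takes n already-constructed objects and forms the set of them - is nearly executable as stated. As a purely illustrative sketch (the function and variable names are ours, not Chomsky’s), its recursive, hierarchy-building character can be rendered in a few lines:

```python
# Illustrative sketch only: Merge as set formation over already-built objects.
# frozenset is used so that merged objects can themselves be merged again.

def merge(*objects):
    """Combine n already-constructed objects into a new object: their set."""
    return frozenset(objects)

# Atoms stand in for lexical items; everything else is built by merge alone.
the, apple, ate = "the", "apple", "ate"

dp = merge(the, apple)               # {the, apple}
vp = merge(ate, dp)                  # {ate, {the, apple}}: hierarchy, not a string
bigger = merge(vp, merge(the, apple))  # and so on, unboundedly

assert dp in vp  # the merged object survives intact as a constituent
```

The point the sketch makes concrete is that the same single operation, applied to its own outputs, yields unbounded hierarchical structure; there is no intermediate, partially recursive version of it.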
No matter how we imagine the physical brain, the transition to Merge is instantaneous, not gradual. This is because discrete infinity - ‘the infinite use of finite means’ - either is or is not. What sense is there in trying to envisage ‘nearly discrete’ objects being combined in ‘nearly infinite’ ways? A moment’s thought should remind us that when objects are subject to even limited blending, the range of combinatorial possibilities crashes to a limited set. In short, for Merge to work, the elements combined must be abstract digits, not concrete sounds or gestures. Combining a sob with a cry would not be an example of Merge. Neither would we call it Merge if a chimpanzee happened to combine, say, a bark with a scream (Crockford and Boesch 2005).
1.2. Analog minds in a digital world

One way to escape the conundrums inseparable from this position - conundrums foundational to all our debates and very well documented by Botha (2003) - might be to keep the essential idea but reverse the underlying philosophy. Humans have analog minds in a digital world. More accurately, just a certain part of our world is digital. We are at one with our primate cousins in being immersed in ordinary material and biological reality - Pinker’s ‘analog world’. But unlike them, we have woven for ourselves an additional environment that is digital through and through. This second environment that we all inhabit is sometimes referred to as the ‘cognitive niche in nature’, but the evolutionary psychologists who invented this expression (Tooby and DeVore 1987) did so for their own special reasons. Adherents of the ‘cognitive revolution’ but attempting to weld Chomsky with their own mentalist version of Darwin, they were committed to minimizing the intrinsically social, cultural and institutional nature of the digital representations made available to our brains. The expression ‘cognitive niche’ may have explanatory value, but not if the purpose is to deny the existence of what social anthropologists and archaeologists term ‘symbolic culture’. Contrary to those who coined the expression, the ‘cognitive niche’ actually doesn’t exist ‘in nature’. No one has ever found such a niche in nature. As Tomasello (1999) points out, the niche in question exists only as an internal feature of human symbolic culture. So what exactly is this thing called ‘symbolic culture’? Following the philosopher John Searle (1996), let’s begin by drawing a distinction between ‘brute facts’ and ‘institutional facts’. Birth, sex and death are facts anyway, irrespective of what people think or believe. These, then, are brute facts. Legitimacy, marriage and property are facts only if people believe in them. Suspend the belief and the facts correspondingly dissolve.
But although institutional facts rest on human belief, that doesn’t make them mere distortions or hallucinations. Take two five-pound banknotes and place them on the table. Now exchange them for a single ten-pound note. The identity of the two amounts is not merely a subjective belief; it’s an objective, indisputable fact. But now imagine a collapse of confidence in the currency. Suddenly, the facts dissolve. It is crucial to Searle’s philosophy that institutional facts are not necessarily dependent on verbal language: one can play chess, use an abacus or change money without using language. The relevant digits are then the chess pieces, beads or coins that function as markers in place of any linguistic markers. Digital facts of this kind - the intricacies of the global currency system, for
example - are patently non-physical and non-biological. They are best conceptualized as internal features of an all-encompassing game of ‘let’s pretend’. Needless to say, the existence of such facts presupposes a brain with certain innate capacities, syntactical language being one possible manifestation of these capacities. But explaining distinctively human cognition by invoking ‘language’ is circular and unhelpful: it is precisely language that we need to explain. Institutional facts develop ontogenetically out of the distinctively human capacity for mindreading, joint attention and pretend-play (Leslie 1987; Tomasello 2006). Extended across society as a whole, ‘let’s pretend’ may generate a whole system of ritual and religion (Durkheim 1947 [1915]; Huizinga 1970 [1949]; Knight 1999; Power 2000). The morally authoritative intangibles internal to a symbolic community - that is, to a domain of ‘institutional facts’ - are always on some level digital. This has nothing to do with the supposedly digital genetic architecture of the human brain. The explanation is less mystical. It is simply that institutional facts depend entirely on social agreement - and you cannot reach agreement on a slippery slope. By definition, anything perceptible can be evaluated and identified through direct sensory input. But institutional intangibles are by definition inaccessible to the senses. They can be narrowed down and agreed upon only through a process in which abstract possibilities are successively eliminated. ‘Discrete infinity’ captures the recursive principle involved. The sound system of a language - its phonology - is prototypically digital. It is no more possible to compromise between the t and the d of tin versus din than to compromise between 11.59 and 12.00 on the face of a digital clock. Of course, categorical perception is common enough in nature (Harnad 1987).
But the meaningless contrastive phonemes of human language comprise only one digital level out of the two that are essential if meanings are to be conveyed at all. Combining and recombining phonemes - ‘phonological syntax’, as it is called by ornithologists who study the digital phenomenon in songbirds - would be informationally irrelevant if it did not interface with a second digital level, which is the one necessary if semantic meanings are to be specified. No animal species has access to this second level of digital structure. It would therefore be inconceivable - and in principle useless anyway - for an animal to make use of syntactical operations - whether Merge or anything else - in order to interface between the two digital levels. The explanation is that animals inhabit just their own biological world and therefore don’t have access to the extra digital level. It is the nature and evolution of the entire second level - the level of symbolic culture - that has proved so difficult to explain. Explaining ‘the Great Leap
Forward’ as an outcome of ‘Merge’ is a parsimonious solution (Chomsky 2005), but only in the sense that explaining it as an outcome of divine intervention might seem persuasive in terms of parsimony although less so in terms of testability.

1.3. A Darwinian solution

The alternative (Knight 2000) is to conceptualize the language capacity as one special manifestation of a ‘play capacity’ continuous with its primate counterparts but let loose among humans in a manner not open to other primates. The development of ‘let’s pretend’ and the development of language in children are widely recognized as isomorphic. They have the same critical period, the same features of intersubjectivity and joint attention, the same triadic (‘Do you see what I see?’) referential structure and the same cognitive expressivity and independence of external stimuli. It is unlikely that these parallels are a pure coincidence (Bruner et al. 1976; Leslie 1987; McCune-Nicolich and Bruskin 1982). ‘Digital infinity’ corresponds to what developmental psychologists might recognize as a children’s game - in this case, ‘let’s play infinite trust’. Take any patent fiction and let’s run with it and see where it leads. Metaphorical usage is an example of this. A metaphor ‘is, literally, a false statement’ (Davidson 1979). By accepting and sharing it, we construct it as truth on a higher level - truth for ‘our own’ joint purposes of conceptualization and communication. As fictional public representations become conventionalized and reduced to shorthands, one possible trajectory is that they crystallize out as linguistic signs. Grammatical markers and associated constructions are historical outcomes of processes of grammaticalization that are now well understood - processes that are essentially metaphorical (Meillet 1903; Heine et al. 1991; Gentner et al. 2001). To evolve a grammar, in other words, humans must be trusting enough to accept falsehoods from one another. Animals cannot afford to do this.
Their hard-to-fake signs - reliable signals on the model of human laughs, sobs, cries and so forth - are deception-resistant and evaluated for quality on an analog scale. Regardless of details of cognitive architecture, ‘honest fakes’ are in principle impossible to interpret in that way. Meaningless and valueless in themselves, they would read ‘zero’ on any costly signaling scale. Linguistic signs are ‘honest fakes’ - literal irrelevancies and falsehoods, significant only as cues to the intentions underlying them. Since communicative intentions are intangibles, processing them has to be digital by reason of conceptual necessity, not because the brain or any part of it is innately digital.
‘Animals,’ Durkheim (1947 [1915]: 421) long ago observed, ‘know only one world, the one which they perceive by experience, internal as well as external. Men alone have the faculty of conceiving the ideal, of adding something to the real. Now where does this singular privilege come from?’ Maynard Smith and Szathmary (1995) offered a bold answer to Durkheim’s question, citing Rousseau and viewing the puzzle of language origins as inseparable from the wider problem of explaining the emergence of community life based on social contracts. Their ‘major transitions’ paradigm is ambitious and conceptually unifying, assuming no unbridgeable chasm between natural and social science. The same applies to the paradigm being developed by Steels and his colleagues (Steels 2006; Steels et al. 2002), who use robots to show how lexicons and grammars - patterns far too complex to be installed in advance in each brain - spontaneously self-organize through processes of learning, recruitment, social co-ordination and cumulative grammaticalization. By maintaining continuity with primate cognitive evolution while introducing novel social factors, we can continue to apply basic principles of Darwinian behavioural ecology to account for the emergence of distinctively human cognition and communication. Pinker (1999: 287) concludes his book on ‘the ingredients of language’: ‘It is surely no coincidence that the species that invented numbers, ranks, kinship terms, life stages, legal and illegal acts, and scientific theories also invented grammatical sentences and regular past tense forms’. Confusing correlation with causation, Pinker here treats the supposedly digital concepts intrinsic to human nature as responsible for the legalistic distinctions of language and culture. Note, however, that the digital concepts he actually mentions here - whether linguistic or non-linguistic - belong without exception to the realm of agreements and institutions.
Is there any evidence that a language faculty could operate at all outside such institutional settings? Reversing Chomsky - and correspondingly reversing the whole idea of ‘digital minds in an analog world’ - we may conclude that ‘doing things with words’ (cf. Austin 1978 [1955]) is invariably more than just activating a biological organ. To produce speech acts is to make moves in a non-biological realm - a realm of facts whose existence depends entirely on collective belief. ‘Analog minds in a digital world’ is fully compatible with Darwinian evolutionary theory. ‘Digital minds in an analog world’ is not compatible at all. Installation of an innate digital mind - whether instantaneous or gradual - is a deus ex machina with nothing Darwinian about it. A model of language evolution, to qualify as scientific, cannot invent fundamental axioms as it goes along. It cannot invoke currently unknown physical or other natural laws. It
should be framed within a coherent, well-tried body of theory; it should generate predictions that are testable in the light of appropriate empirical data; and it should enable us to relate hitherto unrelated disciplinary fields. Whereas the deus ex machina approach rigidly rejects reference to any part of social science, the play/mindreading/joint attention paradigm (Tomasello 1996, 1999, 2003, 2006) has the potential to link the natural and social sciences in a theory of everything.
References

Arnauld and Lancelot (1972 [1660]). Grammaire générale et raisonnée, ou la grammaire de Port-Royal. Réimpression des éditions de Paris, 1660 et 1662. Genève: Slatkine.
Austin, J. L. (1978 [1955]). How to Do Things with Words. Oxford: Oxford University Press.
Botha, R. (2003). Unravelling the Evolution of Language. Oxford: Elsevier.
Bruner, J. S., A. Jolly and K. Sylva (eds) (1976). Play: Its role in development and evolution. New York: Basic Books.
Chomsky, N. (2005). Three factors in language design. Linguistic Inquiry 36(1): 1-22.
Christiansen, M. H. and S. Kirby (2003). Language evolution: the hardest problem in science? In M. H. Christiansen and S. Kirby (eds), Language Evolution. Oxford: Oxford University Press, pp. 1-15.
Crockford, C. and Boesch, C. (2005). Call combinations in wild chimpanzees. Behaviour 142(4): 397-421.
Davidson, R. D. (1979). What metaphors mean. In S. Sacks (ed.), On Metaphor. Chicago: University of Chicago Press, pp. 29-45.
Durkheim, É. (1947 [1915]). The Elementary Forms of the Religious Life: A study in religious sociology. Trans. J. W. Swain. Glencoe, Illinois: The Free Press.
Harnad, S. (1987). Categorical Perception: The groundwork of cognition. Cambridge: Cambridge University Press.
Gentner, D., Holyoak, K. J., and Kokinov, B. N. (eds) (2001). The Analogical Mind: Perspectives from cognitive science. Cambridge, MA: MIT Press.
Heine, B., U. Claudi and F. Hünnemeyer (1991). Grammaticalization: A conceptual framework. Chicago and London: University of Chicago Press.
Huizinga, J. (1970 [1949]). Homo Ludens: A study of the play element in culture. London: Granada.
Knight, C. (2000). Play as precursor of phonology and syntax. In Knight, C., M. Studdert-Kennedy and J. R. Hurford (eds), The Evolutionary Emergence of Language: Social function and the origins of linguistic form. Cambridge: Cambridge University Press, pp. 99-119.
Leslie, A. (1987). Pretence and representation: The origins of 'theory of mind'. Psychological Review 94: 412-426.
Maynard Smith, J. and E. Szathmáry (1995). The Major Transitions in Evolution. Oxford: W. H. Freeman.
Meillet, A. (1903). Introduction à l'étude comparative des langues indo-européennes. Paris: Hachette.
McCune-Nicolich, L. and C. Bruskin (1982). Combinatorial competency in play and language. In K. Rubin and D. Pepler (eds), The Play of Children: Current Theory and Research. New York: Karger, pp. 30-40.
Pinker, S. (1999). Words and Rules: The ingredients of language. London: Weidenfeld and Nicolson.
Power, C. (2000). Secret language use at female initiation: Bounding gossiping communities. In C. Knight, M. Studdert-Kennedy and J. R. Hurford (eds), The Evolutionary Emergence of Language: Social function and the origins of linguistic form. Cambridge: Cambridge University Press, pp. 81-98.
Rousseau, J.-J. (1973 [1762]). The social contract. In Jean-Jacques Rousseau, The Social Contract and Discourses. Trans. G. D. H. Cole. New edition. London & Melbourne: Dent, pp. 179-309.
Searle, J. R. (1996). The Construction of Social Reality. London: Penguin.
Steels, L. (2006). Experiments on the emergence of human communication. Trends in Cognitive Sciences 10(8): 347-349.
Steels, L., F. Kaplan, A. McIntyre and J. van Looveren (2002). Crucial factors in the origins of word meaning. In A. Wray (ed.), The Transition to Language. Oxford: Oxford University Press, pp. 252-271.
Tomasello, M. (1996). The cultural roots of language. In B. J. Velichkovsky and D. M. Rumbaugh (eds), Communicating Meaning: The evolution and development of language. Mahwah, NJ: Erlbaum, pp. 275-307.
Tomasello, M. (1999). The Cultural Origins of Human Cognition. Cambridge, MA: Harvard University Press.
Tomasello, M. (2003). Constructing a Language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.
Tomasello, M. (2006). Why don't apes point? In N. J. Enfield and S. C. Levinson (eds), Roots of Human Sociality: Culture, cognition and interaction. Oxford & New York: Berg, pp. 506-524.
Tooby, J. and I. DeVore (1987). The reconstruction of hominid behavioral evolution through strategic modeling. In W. G. Kinzey (ed.), The Evolution of Human Behavior: Primate models. Albany: State University of New York Press, pp. 183-237.
LANGUAGE SCAFFOLDING AS A CONDITION FOR GROWTH IN LINGUISTIC COMPLEXITY
KIRAN LAKKARAJU, LES GASSER AND SAMARTH SWARUP

Computer Science Department and Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
{klakkara | gasser | swarup}@uiuc.edu

Over their evolutionary history, languages most likely increased in complexity from simple signals to protolanguages to complex syntactic structures. This paper investigates processes for increasing linguistic complexity while maintaining communicability across a population. We assume that higher linguistic communicability (more accurate information exchange) increases participants' effectiveness in coordination-based tasks. Interaction, needed for learning others' languages and for converging to communicability, bears a cost. There is a threshold of interaction (learning) effort beyond which (the coordination payoff of) linguistic convergence either doesn't pay or is pragmatically impossible. Our central findings, established mainly through simulation, are: 1) There is an effort-dependent "frontier of tractability" for agreement on a language that balances linguistic complexity against linguistic diversity in a population. To remain below some specific bound on collective convergence effort, either a) languages must be simpler, or b) their initial average communicability must be higher. To stay below such a pragmatic effort limit, even agents who have the ultimate capability for complex languages must not invent them from the start or they won't be able to communicate; they must start simple and grow complexity in a staged process. 2) Such a staged approach to increasing complexity, in which agents initially converge on simple languages and then use these to "scaffold" greater complexity, can outperform initially-complex languages in terms of overall effort to convergence. This performance gain improves with more complex final languages.
1. Introduction
Language evolution studies generally assume that the developmental trajectory for human languages followed stages from simple signaling systems to holistic protolanguages to simple compositional languages, and finally to the lexically and syntactically complex languages known today. If languages indeed grew from the simple to the complex, several questions need answering; two of these are:

- Could complex languages ever emerge early? Why or why not?
- Local, individual innovations that increase linguistic complexity also create linguistic diversity and, at least temporarily, reduce communicability. How can a population maintain the communicability of its language while accommodating the diversity of innovation?
While inspired by the enduring issues of human language evolution, we are primarily interested in a design stance: evolving artificial languages for artificial agents. We need to discover general principles of language emergence that also cover automated agents with different sensorimotor, cognitive, and/or interactional possibilities from humans, their evolutionary predecessors, or animals. We believe, in fact, that language evolution is a model problem for issues that arise in many kinds of distributed semantic systems, including Web semantics, resource description-discovery (metadata) systems, cartographic systems, and biological systems. One case in point is the intentional creation and ongoing revision of XML-based semantic web languages. These can vary in complexity (number of terms, syntactic categories, etc.), and they exhibit frequency-dependent "network effects": any single language in the space has little value until a large population of agents can apply and interpret it. In this situation also, the two questions above are important: communities must converge on shared languages quickly, and ongoing linguistic innovations should only minimally disrupt the use of the language.
1.1. Assumptions

We are interested in artificial agents that operate continuously over long periods of time in complex worlds, performing tasks that require coordination. The value of (reward from) successful coordination drives information exchange, which in turn drives agents to create and share languages. While rewards actually come from doing things with shared information, we can usefully attribute at least part of the reward to the language itself. Thus a language that allows agents to exchange more critical information or to coordinate better has a higher value. We assume that agents need to talk to each other about conditions and events in their worlds, and this talk is valuable in the sense above. The abilities to describe and to distinguish objects and actions supply the fundamental kinds of information needed for coordination and increased fitness. We consider task complexity to be information-theoretic. That is, tasks differ in complexity on the basis of how many different objects, situations, and actions they involve, and how much information is needed to reliably distinguish these objects, situations, and actions. This becomes important later when we discuss how to measure the complexity of language. The ability to handle greater task diversity and task complexity increases agents' fitness; greater linguistic complexity helps enable this (as greater cognitive and motor complexity, etc. also would). Since tasks of interest here require successful communication, and since what needs to be communicated for unsophisticated tasks is different ("simpler") than what needs to be communicated for complex tasks, agent communication languages have to vary with task complexity. For agents to become competent at more complex tasks, they need more complex languages. This means that languages have to change in complexity over time.
2. The complexity-diversity-effort frontier
Since collective activity is ongoing and must remain so while complexity grows, we have a difficult problem: how do agents change their languages from simple to complex while maintaining communicability? Language variation must originate at the individual level (Croft, 2001). If this is so, then as an agent originates a change from a fully communicative language, the agent will become less communicative with others, and thus less effective in coordinated tasks. For language to grow in complexity, this means there is a trajectory through which agents must somehow innovate (increasing complexity and decreasing communicability), then build up communicability again by learning and propagating the innovations. This disruptive shift characterizes each increase in complexity.

Computational tractability is an issue for this complexity growth. We hypothesize that given any set of agents with a fixed cognitive structure and a set of tasks (need for language), there exists a frontier of tractability for convergence to a common language. Informally, for a set of languages L of a given complexity C, greater initial diversity in the subset l of L spoken in the population will imply greater learning effort (e.g. time) to converge the population to full communicability. Similarly, for a given degree of initial linguistic diversity D, higher linguistic complexity implies greater effort to converge the population to full communicability. Let us limit the available convergence time (i.e., effort to converge) to some amount E and plot c = f(d, E), where f means "given a set of agents whose set of languages exhibits diversity d, let f(d, E) equal the maximum linguistic complexity for which the population will converge within E units".

Figure 1. Conjectured tractability frontiers (hypothesized relationships; initial linguistic diversity increases along the horizontal axis).
Then we will see a curve with the following property: any complexity-diversity point "under" the curve limited by E (i.e. where for any point (c, d), c < f(d, E)) will converge in time bounded by E, while any point "above" the curve (i.e. c > f(d, E)) will take longer than time E (see Figure 1). E establishes a tractability frontier of complexity and diversity. Higher linguistic complexity lowers the degree of diversity the population can sustain and still converge within E. As a result, for languages that are higher in complexity, agents must make fewer, smaller innovations (introduce less diversity) if they are
to converge within E.
Similarly, for a population to exhibit greater linguistic diversity and still have the possibility of converging tractably, its linguistic complexity must be lower. If a population is going to be highly innovative linguistically, introducing great diversity, then its language must be simple enough that the effort to converge from more widely varying linguistic “starting points” remains below E. Throughout this discussion we focus on languages as lexical matrices. A study of convergence frontiers for structured, compositional languages (languages with a grammar) is left for future work.
3. Implementation and experiments

We demonstrate the existence of tractability frontiers through an experiment. Each agent represents its language as a Form-Meaning Association Matrix, which is a likelihood matrix that explicitly stores the joint likelihood of the forms and meanings. Forms are symbols in the language and meanings are concepts that can be talked about. For the present, we assume the simplest possible setup: the number of forms and meanings is equal, and the set of forms and meanings is shared among all the agents, so they are only tasked with achieving consensus on the associations between forms and meanings. The language game proceeds through random interactions between agents. We assume a "full information" scenario, where speakers provide form-meaning pairs to hearers. A speaker generates a form for a given meaning, j, by finding the element in column j of its form-meaning matrix that has maximum value. This is a maximum likelihood rule for language production. If a_ij is the current value of the hearer's form-meaning matrix for the given form-meaning pair, it gets updated as follows: a_ij = η·a_ij + (1 − η). Additionally, all the values in row i are updated as a_ic = η·a_ic, ∀c ≠ j, and all values in column j are updated in the same way, a_rj = η·a_rj, ∀r ≠ i. This "lateral inhibition" is meant to discourage synonymy and polysemy (Vogt & Coumans, 2003).
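This production rule and hearer update can be sketched in a few lines of Python (a minimal illustration; the 3×3 matrix size, the η value of 0.8, and the variable names below are our own assumptions, not parameters from the paper):

```python
def produce(matrix, meaning):
    """Maximum-likelihood production: choose the form (row index) whose
    association with the given meaning (column index) is largest."""
    return max(range(len(matrix)), key=lambda i: matrix[i][meaning])

def update_hearer(matrix, form, meaning, eta=0.8):
    """Reinforce the observed form-meaning pair, then laterally inhibit
    competitors in the same row and column to discourage synonymy and
    polysemy: a_ij = eta*a_ij + (1 - eta); a_ic = eta*a_ic for c != j;
    a_rj = eta*a_rj for r != i."""
    i, j = form, meaning
    matrix[i][j] = eta * matrix[i][j] + (1 - eta)  # reinforce the pair
    for c in range(len(matrix[i])):
        if c != j:
            matrix[i][c] *= eta                    # inhibit rest of row i
    for r in range(len(matrix)):
        if r != i:
            matrix[r][j] *= eta                    # inhibit rest of column j
    return matrix

# One interaction: the hearer observes form 0 paired with meaning 0.
hearer = [[0.5] * 3 for _ in range(3)]
update_hearer(hearer, form=0, meaning=0)
```

After a single observation the reinforced entry already dominates its column, so maximum-likelihood production immediately reflects the heard pair.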
3.1. Measuring linguistic diversity

In order to understand the limits of this process, it is necessary to understand how much diversity can be introduced in a population such that the population can still return to (or maintain an adequate degree of) communicability to be successful in the ongoing tasks they face. There are several principled ways to measure linguistic diversity. Greenberg's index (Greenberg, 1956) measures diversity as the probability that a pair of randomly selected individuals from the population do not speak the same language:

A = 1 − Σ_i p_i²,  (1)

where p_i is the probability of encountering a speaker of language i. Greenberg also suggests modifying this formula to take into account the similarity between languages, thus,

B = 1 − Σ_{i,j} r_ij p_i p_j,  (2)

where r_ij is a measure of the overlap between languages i and j. A and B both measure communicability (or rather, the lack of it) in the population. We say a population is converged if the communicability is 1, i.e. diversity is 0. Another measure, more popular in genetics, is known as the Jensen-Shannon diversity (see, e.g., Grosse et al., 2002), given by
J = H(λ_1 P_1 + λ_2 P_2 + … + λ_n P_n) − Σ_i λ_i H(P_i),  (3)

where Σ_i λ_i = 1, and the P_i are the probability distributions describing the languages (form-meaning associations). H is the Shannon entropy function. Since languages for our agents are defined as the joint likelihood matrices for forms and meanings, J measures the diversity in the corresponding probability distributions, which are obtained by normalizing the form-meaning matrix. When all distributions are identical, J = 0. The difference between Greenberg's index and Jensen-Shannon diversity is analogous to the difference between phenotype and genotype in biology. J is a measure based on the underlying probability distributions, while A and B are more "behavioral" measures, as they directly evaluate communicability. When J = 0, A and B are also 0, and when J attains its maximal value, A and B equal 1. However, it is possible to have perfect communicability even if the underlying distributions are not identical, since communicability depends on the maximum likelihood interpretation.
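Both measures are straightforward to compute. The sketch below is our own minimal implementation (with equal weights λ_i = 1/n and illustrative example distributions, not the paper's data):

```python
from math import log2

def greenberg_a(proportions):
    """Greenberg's index A = 1 - sum_i p_i^2: the probability that two
    randomly chosen speakers do not share a language."""
    return 1 - sum(p * p for p in proportions)

def entropy(dist):
    """Shannon entropy in bits, with 0*log(0) taken as 0."""
    return -sum(p * log2(p) for p in dist if p > 0)

def jensen_shannon(dists, weights=None):
    """J = H(sum_i w_i P_i) - sum_i w_i H(P_i); zero iff all P_i agree."""
    n = len(dists)
    w = weights or [1.0 / n] * n
    mixture = [sum(wi * d[k] for wi, d in zip(w, dists))
               for k in range(len(dists[0]))]
    return entropy(mixture) - sum(wi * entropy(d) for wi, d in zip(w, dists))
```

For two identical distributions J is 0, while two disjoint distributions over two outcomes give the maximal J of 1 bit, mirroring the phenotype/genotype contrast drawn above.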
3.2. Generating diversity

To evaluate the tractability frontier we need to create a population with a specified diversity, not measure the diversity of a given linguistic population. To do this we initialize the agents with identity matrices for their form-meaning mappings. Then we devolve this perfectly converged state by adding a uniform random variable, drawn from a range [0, ε], to each value in the matrix. It turns out that the noise level ε is very strongly correlated with Greenberg's index and the Jensen-Shannon diversity. In other words, by increasing ε, we can smoothly and (nearly) linearly increase the diversity of the population according to these two measures. We have confirmed this fact through careful simulation (not presented here for lack of space).
3.3. Linguistic Complexity
Complexity is determined by both form and meaning complexity. McWhorter has defined four criteria for the evaluation of the complexity of a language (McWhorter, 2001), based on phonology, syntax, grammaticalization, and morphology. However, only his grammaticalization criterion makes reference to meanings. It says that a language is more complex if it makes finer semantic and pragmatic distinctions. The language of an agent also reflects its cognitive capabilities, and an agent capable of making greater cognitive distinctions will have a more complex language simply by virtue of being able to express more meanings. This is an information-theoretic notion of complexity, as discussed earlier, and should be included in a measure of linguistic complexity. This is understandably hard to do for natural languages, but is the criterion we use in our simulations because artificial agents, in particular, can differ widely in their cognitive capabilities and characterizing this distinction is essential in a discussion of language evolution.
3.4. Experimental results

We measure effort as the number of iterations required to converge. We initialize a population of ten agents with varying levels of diversity, as described above. We also vary the complexity of the language by varying the number of meanings. Then we run the language game for each initial condition and evaluate the number of iterations necessary to converge to a communicability level greater than 0.9. This gives us a three-dimensional graph, shown in two dimensions in Figure 2, with time to convergence color-coded. We see a clear emergence of frontiers, demarcated by regions of different colors, confirming our hypothesis from Fig. 1.

Figure 2. Time to convergence vs. complexity and diversity.
4. Scaffolding and staged learning

"Scaffolding" is one means of overcoming the diversity/complexity frontier established by E. Scaffolding is a general human learning strategy, and its existence and efficacy have been reported for language learning both in the psychological literature (Iverson & Goldin-Meadow, 2005) and in simulation work (Elman, 1993). Lee, Meng, and Chao (2007) provide a model of "staged learning" that captures the idea of scaffolding. Agents a) constrain choices, b) act within those constraints until c) no novelty appears, then d) lift some constraints, and repeat. Constraints temporarily reduce the agents' decision space. When quiescence occurs at one stage, strategically-chosen constraints are lifted. (Thus staged learning is order-dependent, and there are likely more and less effective developmental trajectories.) Learning commences again in an extended decision space, now biased by the structures and generalities learned in prior stages.

We created such a staged version of our experiments as follows. We choose a maximum number of meanings, n, that the population has to converge upon. However, the agents do not consider all of these meanings initially. They start at Stage 1. The number of active meanings (= "used in language games") is a function of the stage number. The complexity step size δ represents how many new meanings to make active per stage. Thus the number of meanings active at Stage i is δi. If the system is in Stage 4 and δ = 4, there are 16 active meanings. Each agent is initialized with an m × n lexical matrix. However, at each stage i, an agent only sees part of its full lexical matrix, of size iδ × iδ. As the stages progress, more of the agents' lexical matrix is revealed, as illustrated in Figure 3. The system changes stages based on the communicability of the population. Let θ be the stage transition communicability threshold. When the population has communicability ≥ θ in stage i, it has converged to within θ on iδ × iδ forms and meanings. It then moves to the next stage and uncovers new meanings for each agent^a.

Figure 3. Moving from Stage 3 to Stage 4 uncovers a row and a column of the matrix (here δ = 1). The grey areas are hidden to the agent until it reaches that stage.
At this transition point, (i − 1)δ meanings have already been converged upon (to within θ), and δ meanings are new. These earlier convergence decisions bias agents' learning choices for the new, larger matrix. This is scaffolding. To confirm the value of staging, we repeated the earlier experiment with staging added, evaluating the new tractability frontier for varying complexity and diversity levels. Note that the axes of this plot go much farther than the axes in Figure 2. In fact, we began each simulation with 10 meanings and 10 forms because smaller matrices converge very quickly. Even with higher initial noise levels and the number of meanings going up to 30, we see that the population converges in a fairly short amount of time. Staging has pushed out the tractability frontier greatly.

^a Collective ordering of meanings is an issue, with several possible efficient approaches, e.g. common environment structure. We leave a more detailed model exploring this topic to future work.
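The staged revealing of the lexical matrix can be sketched as follows (our own minimal sketch; the function names are hypothetical and the communicability computation itself is omitted):

```python
def active_submatrix(matrix, stage, delta):
    """At stage i only the top-left (i*delta) x (i*delta) block of the
    full lexical matrix is visible; the rest stays hidden until later
    stages (the grey areas of Figure 3)."""
    k = stage * delta
    return [row[:k] for row in matrix[:k]]

def advance_stage(stage, communicability, theta):
    """Move to the next stage once the population's communicability on
    the currently active meanings reaches the threshold theta."""
    return stage + 1 if communicability >= theta else stage
```

Convergence on the small visible block thus biases, and pays for, learning over the larger block revealed at the next stage.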
5. Conclusions

We have shown the need for scaffolding in language learning to be a fundamental requirement arising from the tradeoff between complexity [...]

[...] filter-set-prototype block. Different arguments are available in a learning situation. Consider for instance a situation in which the speaker used the utterance "the frouple" to discriminate object o1 in Figure 1. The hearer indicated that he/she could not understand this utterance. The speaker then drew the hearer's attention to the topic by pointing to it. This presents a learning opportunity for the hearer. The filter-set-prototype block now has the source-set, which includes all objects in the scene, and the target-set, which contains the topic. It can try to infer the concept that could account for the filtering from the source-set to the target-set. The hearer could assume that this concept is the one meant by the word "frouple" and add this mapping to his/her lexicon.
Figure 1. A scene with a number of labelled objects of varying size and shape.
3. Constraint programs

A semantic building block can have multiple operational modes depending on the availability of its arguments. Put differently, each block represents an omnidirectional relationship among a number of variables. Such relationships can be computationally modelled as constraints. The encapsulated functionality that implements the grounding and learning method and the semantic function enforces the relationship. The resulting procedural constraints can however be declaratively combined by linking^a relevant slots. The result is a constraint program that represents compositional meaning.
The constraint paradigm is a model of computation in which values are deduced whenever possible [...]. One may visualize a constraint 'program' as a network of devices connected by wires. Data values may flow along the wires, and computation is performed by the devices. A device computes using only locally available information (with a few exceptions), and places newly derived values on other, locally attached wires. (Steele, 1980)

The interpretation of a constraint program can be seen as a constraint satisfaction problem, for which efficient algorithms exist. Our implementation uses an extension of the AC-4 algorithm (Mohr & Henderson, 1986) which implements a strong form of generalized relational arc-consistency. It involves constraint-ordering heuristics, and uses a look-ahead search to find the actual solutions.
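The omnidirectionality of such a constraint can be illustrated with the textbook adder device from Steele's constraint networks (our own illustration of the paradigm, not the paper's AC-4-based implementation):

```python
def sum_constraint(a=None, b=None, c=None):
    """A tiny omnidirectional constraint enforcing a + b = c: whichever
    argument is missing (passed as None) is deduced from the others, so
    the same 'device' has three operational modes."""
    if a is None:
        return (c - b, b, c)
    if b is None:
        return (a, c - a, c)
    if c is None:
        return (a, b, a + b)
    if a + b != c:
        raise ValueError("inconsistent values on the wires")
    return (a, b, c)
```

The semantic constraints of this paper behave analogously: the same filter-set block can compute a target-set from a source-set and a concept, or infer the concept from the other two slots.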
3.1. Examples

Figure 2 depicts the constraint program that represents the meaning of the utterance "the bigger ball". The particular values and data flow correspond with the interpretation of this program in the context of the scene shown in Figure 1. The filter-set-prototype constraint takes the context and the BALL prototype, and yields the set that contains all balls. The filter-set-comparison constraint takes this set and the comparator BIG and selects the bigger one, i.e. the topic o4.

^a Such links represent equality relationships.
Figure 2. The constraint program and interpretation data flow for “the bigger ball”.
Figure 3 shows the data flow involved in a learning situation. The hearer did not understand the modifier but was shown the topic o4. The hearer did properly understand "ball" and could thus produce the source-set taken by the filter-set-comparison constraint. This constraint can then, given the topic, infer the modifier BIG, and a new entry can be added to the lexicon.
Figure 3. The data flow involved in the inference of the modifier concept.
Figure 4 depicts the program and interpretation data flow for "the box close to the pyramid". The filter-set-relation constraint takes the set of boxes as source-set, the pyramid as landmark, and CLOSE-TO as relation concept. Given these parameters, it can properly discriminate the topic o2.
Figure 4. The program and interpretation data flow for "the box close to the pyramid".
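The interpretation data flows of Figures 2 and 3 can be sketched as follows (a minimal illustration; the scene attributes below are our own symbolic stand-ins, whereas the paper grounds prototypes and comparators in sensory data):

```python
# Hypothetical scene: object name -> symbolic attributes (our assumption).
scene = {
    "o1": {"shape": "box",     "size": 3.0},
    "o2": {"shape": "box",     "size": 1.5},
    "o3": {"shape": "ball",    "size": 1.0},
    "o4": {"shape": "ball",    "size": 2.5},
    "o5": {"shape": "ball",    "size": 1.8},
    "o6": {"shape": "pyramid", "size": 1.2},
}

def filter_set_prototype(source_set, prototype):
    """Target-set = objects in the source-set matching the prototype."""
    return {o for o in source_set if scene[o]["shape"] == prototype}

def filter_set_comparison(source_set, comparator):
    """Select the single object singled out by the comparator."""
    size = lambda o: scene[o]["size"]
    return max(source_set, key=size) if comparator == "big" else min(source_set, key=size)

def infer_comparator(source_set, topic):
    """Omnidirectional use (the Figure 3 learning situation): given the
    source-set and the topic, recover the comparator concept."""
    size = lambda o: scene[o]["size"]
    if topic == max(source_set, key=size):
        return "big"
    if topic == min(source_set, key=size):
        return "small"
    return None

# "the bigger ball": context -> all balls -> the biggest ball.
balls = filter_set_prototype(set(scene), "ball")
topic = filter_set_comparison(balls, "big")
```

Running the same constraint "backwards" with `infer_comparator` recovers BIG from the source-set and the pointed-at topic, which is exactly the learning opportunity described in Section 3.1.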
4. Conceptualisation
We can now turn our attention to the conceptualisation of compositional meaning. Since this meaning is represented as constraint programs, its conceptualisation must involve a process that constructs such programs. The input for this process is a communicative goal, such as "discriminate the topic in the sensory context". It must construct a constraint program that, when interpreted by the hearer, is expected to satisfy that goal. There are typically many potential programs that could fulfil a given goal. Various criteria are defined for measuring their relative strengths, such as the level of ambiguity involved, the expressibility in an utterance, the complexity, etc. Finding a suitable constraint program is a combinatorial problem. The constraint program composer algorithm used in our system involves a number of techniques and strategies for keeping the combinatorial explosion in check.
Eager, incremental search. The algorithm searches for suitable constraint programs by incrementally expanding incomplete programs, one constraint at a time. There can be many candidate constraints at each step. These candidates are handled in separate branches. The expanded programs are evaluated according to some heuristics to decide which branch to expand next. Solutions are found more efficiently with this strategy.

Goal-directed search. If the goal is to discriminate a topic in a context, then the target program must be such that the topic can be inferred from the given concepts and context. In other words, one of the potential data flows in that program must be a coherent, non-cyclic one from the context and concepts to the topic. The algorithm tries to satisfy this requirement by only adding constraints that incrementally extend the data flow backwards. Each constraint is added to support a goal. The initial goal is the topic. Each constraint supports a goal by adding a piece of data flow. The added data flow connects the goal with the new sub-goals introduced by the constraint. When, for example, a filter-set-prototype constraint is added and its target-set slot is linked with the goal, then the new sub-goals are the source-set, unless it is linked with the context, and the prototype, unless it is expressed in the utterance. A more detailed description of this search process can be found in (Van den Broeck, 2007). All potential expansions that do not properly contribute to the data flow are ignored. This significantly reduces the size of the search space. The number of potential combinations of k constraints from an inventory of n constraints is the multiset coefficient C(n + k − 1, k). The average number of potential links between the slots of k constraints with an average arity of a is s(k, a) = (k − 1)a((k − 1)a + 1)/2. The total number of potential constraint programs of size k is thus approximately C(n + k − 1, k) · 2^s(k,a), while the size of the incrementally explored search space of constraint programs of maximum size K is approximately Σ_{k=1..K} C(n + k − 1, k) · 2^s(k,a). For a small test case with 5 kinds of constraints with an average arity of 2.6 and a maximum program size of 6, the total number of partial constraint programs is approximately 5.199348e29. The goal-directed search does however find a suitable program (if there is one) after on average 262 expansions when conceptualising a program for a randomly chosen topic in our benchmark scene collection.
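These counts are easy to reproduce (our own sketch of the formulas above; it recovers the quoted ≈5.2 × 10^29 figure for the test case with n = 5 constraint kinds, average arity a = 2.6 and maximum size 6):

```python
from math import comb

def multiset_coeff(n, k):
    """Number of multisets of k constraints drawn from n kinds:
    C(n + k - 1, k)."""
    return comb(n + k - 1, k)

def s(k, a):
    """Average number of potential slot links among k constraints of
    average arity a: (k - 1)a((k - 1)a + 1)/2."""
    return (k - 1) * a * ((k - 1) * a + 1) / 2

def unrestricted_space(n, a, max_k):
    """Approximate number of partial constraint programs of size up to
    max_k: sum over k of multiset_coeff(n, k) * 2^s(k, a)."""
    return sum(multiset_coeff(n, k) * 2 ** s(k, a)
               for k in range(1, max_k + 1))
```

The k = 6 term dominates (210 · 2^91), so the sum lands on the 5.199348e29 value quoted in the text; the contrast with ~262 goal-directed expansions makes the pruning effect concrete.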
Interleaved constraint satisfaction. Determining if a constraint program fulfils the goal is done by interpreting it using the aforementioned constraint satisfaction algorithm. This algorithm also identifies branches with inconsistent partial programs, which can be pruned. Interleaving the constraint satisfaction in the incremental search furthermore minimizes the amount of consistency enforcing (when using AC-4), because all enforcing applied on some partial program carries over to the expanded programs.

Chunking. An additional technique we are currently exploring is chunking. This technique consists of taking a (part of a) successfully used semantic program and wrapping it such that it can be re-used as a constraint in future programs. We call these composite constraints, since they are composed of a number of component constraints. The initially given constraints are in contrast called primitive constraints. Figure 5 depicts a constraint program that involves a composite constraint which wraps two primitive constraints^b. This composite constraint has four slots, which are internally linked with the appropriate slots of the component constraints.
Figure 5. The constraint program and data flow for the interpretation of “the box left of the big ball”. This program involves a composite constraint that wraps two primitive constraints.
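The chunking mechanism can be sketched as a higher-order wrapper (our own minimal illustration with toy primitive constraints and an assumed scene; the paper's composites also expose internal slot links, which we omit here):

```python
# Toy scene and primitive constraints (attribute values are our own
# stand-ins for grounded sensory data).
scene = {"o1": ("ball", 1.0), "o2": ("ball", 2.0), "o3": ("box", 1.5)}

def filter_set_prototype(source_set, prototype):
    return {o for o in source_set if scene[o][0] == prototype}

def filter_set_comparison(source_set, comparator):
    size = lambda o: scene[o][1]
    return max(source_set, key=size) if comparator == "big" else min(source_set, key=size)

def make_composite(components):
    """Chunking: wrap a successfully used sequence of constraints so it
    can be re-used as a single composite constraint, exposing only the
    outer slots (a source-set plus one concept per component)."""
    def composite(source_set, *concepts):
        result = source_set
        for constraint, concept in zip(components, concepts):
            result = constraint(result, concept)
        return result
    return composite

# Re-use "the <comparator> <prototype>" as one chunk.
the_x_y = make_composite([filter_set_prototype, filter_set_comparison])
```

Adding such a composite to the inventory corresponds to jumping straight to a previously useful point in the composer's search space.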
The composite constraint inventory of an agent is initially empty. New composites are created according to some chunking strategy. We currently use a basic strategy that chunks complete constraint programs. The resulting composite constraints are candidates, just like primitives, with which to expand incomplete programs. Adding a composite corresponds to jumping to a point in the search space that previously proved to be useful. First experiments show that chunking, and re-using the resulting composites, significantly improves the performance of the composer algorithm, as shown in Figure 6. These telling results were obtained in spite of the basic chunking strategy we currently use. The chunking strategy is also interesting because it can be relevant at the language level. In particular, the potential relationship between composite constraints and grammatical constructions is intriguing, but unfortunately beyond the scope of this paper.

^b Composite constraints can also be hierarchically composed.

Figure 6. Comparison of run-time needed to conceptualise a series of topics, with and without chunking.

Finally we would like to note that the composer is also useful when a hearer could not fully reconstruct the constraint program due to misunderstanding or under-specification. The composer can in these cases propose potential completions of the incomplete program.

5. Conclusions
In this paper we showed how representing rich, compositional meaning in terms of constraints and constraint programs offers a uniform framework for dealing with their interpretation and conceptualisation. We demonstrated how the flexible data flow handles interpretation and appropriately adapts to learning situations. Bundling the semantic functions together with the grounding and learning methods affords a tight interaction between interpretation and concept acquisition. Encapsulating the procedural details of the bundled functionality allows experimenters to combine different techniques transparently. The constraint-based representation of meaning enabled us to draw upon the well-developed body of knowledge on constraint processing in the fields of artificial intelligence and operations research. The interpretation of the constraint-based representation constitutes a constraint satisfaction problem, for which optimal algorithms exist. The conceptualisation, on the other hand, is implemented as an incremental composer of constraint programs. A number of techniques and strategies were discussed that effectively keep the involved combinatorial explosion in check. In traditional first-order logic representations of meaning, the concepts are typically represented as predicates. In a constraint-based approach, the concepts are rather arguments for the semantic constraints, which can be thought of as relational predicates. A constraint-based semantics can thus be regarded as a second-order semantics.
Finally, the proposed system does not favour any particular model or formalism concerning the emergence and evolution of language in general, or grammar in particular. It should thus be adoptable in a wide array of experimental and theoretical settings. One particular setting is presented elsewhere in this collection (Bleys, 2008).

Acknowledgements
This research is supported by the Sony Computer Science Laboratory in Paris and the ECAGENTS project funded by the Future and Emerging Technologies programme (IST-FET) of the European Community under EU R&D contract IST-2003-1940. It builds on the work first introduced in Steels (2000) and elaborated on in Steels and Bleys (2005).

References
Blackburn, P., & Bos, J. (2005). Representation and inference for natural language: A first course in computational semantics. CSLI Publications.
Bleys, J. (2008). Expressing second order semantics and the emergence of recursion. In A. D. M. Smith, K. Smith, & R. Ferrer i Cancho (Eds.), The evolution of language: EVOLANG 7. World Scientific.
Dechter, R. (2003). Constraint processing. Morgan Kaufmann.
Mohr, R., & Henderson, T. C. (1986). Arc and path consistency revisited. Artificial Intelligence, 28(2), 225-233.
Plunkett, K., Sinha, C., Moller, M. F., & Strandsby, O. (1992). Symbol grounding or the emergence of symbols? Vocabulary growth in children and a connectionist net. Connection Science, 4, 293-312.
Roy, D. K., & Pentland, A. (2002). Learning words from sights and sounds: A computational model. Cognitive Science, 26, 113-146.
Smith, A. D. M. (2005). The inferential transmission of language. Adaptive Behavior, 13(4), 311-324.
Steele, G. L. (1980). The definition and implementation of a computer programming language based on constraints. Unpublished doctoral dissertation, MIT.
Steels, L. (1996). Perceptually grounded meaning creation. In M. Tokoro (Ed.), ICMAS96. AAAI Press.
Steels, L. (2000). The emergence of grammar in communicating autonomous robotic agents. In W. Horn (Ed.), ECAI 2000 (pp. 764-769). Amsterdam: IOS Press.
Steels, L., & Bleys, J. (2005). Planning what to say: Second order semantics for fluid construction grammars. In A. Bugarin Diz & J. S. Reyes (Eds.), Proceedings of CAEPIA '05, Lecture Notes in AI. Berlin: Springer Verlag.
Van den Broeck, W. (2007). A constraint-based model of grounded compositional semantics. In Proceedings of LangRo '2007.
THE EMERGENCE OF SEMANTIC ROLES IN FLUID CONSTRUCTION GRAMMAR
REMI VAN TRIJP
Sony Computer Science Laboratory Paris, Rue Amyot 6, Paris, 75005, France
[email protected]

This paper shows how experiments on artificial language evolution can provide highly relevant results for important debates in linguistic theories. It reports on a series of experiments that investigate how semantic roles can emerge in a population of artificial embodied agents and how these agents can build a network of constructions. The experiment also includes a fully operational implementation of how event-specific participant-roles can be fused with the semantic roles of argument-structure constructions, and thus contributes to the linguistic debate on how the syntax-semantics interface is organized.
1. Introduction

Most linguists agree that there is a strong connection between the semantic representation of a verb and the sentence types in which the verb can occur. Unfortunately, the exact nature of the syntax-semantics interface is still a largely unresolved issue. One approach is the lexicalist account (e.g. Pinker (1989)), in which it is assumed that there exists a list of universal and innate 'semantic roles' (also called 'thematic' or 'theta' roles). In the lexicon it is then specified how many arguments a particular verb takes and which semantic roles they play. For example, the verb push (as in Jack pushes a block) is listed as a two-place predicate which assigns the roles 'agent' and 'patient' to its arguments. These roles are then 'projected' onto the syntactic structure of the sentence through a limited (and usually universal) set of linking rules. Differences in syntactic structures are taken as indicators of differences in the semantic role list of a verb. Recently, however, the lexicalist approach has come under serious criticism. Goldberg (1995, p. 9-14) points to the fact that lexicalists are obliged to posit implausible verb senses in the lexicon. For example, a sentence like she sneezed the napkin off the table would count as evidence that the verb sneeze is not only an intransitive verb as in she sneezed, but that it also has a three-argument sense 'X causes Y to move to Z' and that it assigns the roles 'agent', 'patient' and 'goal' to its arguments. The lexicalist approach also fails to explain coherent semantic interpretations in creative language use and coercion effects, for example in A gruff 'police monk' barks them back to work (Michaelis, 2003, p. 261).
As an alternative, Goldberg (1995) proposes a constructionist account, which we will adopt in this paper. Here, a verb's lexical entry contains its verb-specific 'participant-roles' rather than a set of abstract semantic roles. To take push as an example again, two participant-roles are listed: the 'pusher' and the 'pushed'. These participant-roles have to be "semantically fused" with semantic roles, which Goldberg calls 'argument roles' (p. 50) and which are slots in argument-structure constructions. Constructions are like the linking rules of the lexicalist approach in the sense that they are a mapping between meaning and form, but the difference is that they carry meaning themselves and that they add this meaning to the sentence. So instead of positing different senses for the verb to accommodate sentences such as he pushed a block and he pushed him a block, parts of the meaning are added by the verb and other parts are contributed by the constructions. For example, in he pushed him a block the 'recipient'-role is added by the ditransitive construction, which maps the meaning 'X causes Y to receive Z' to a syntactic pattern. In the constructionist account, semantic roles are no longer treated as universal nor as atomic categories. This is supported by empirical evidence both from cross-linguistic studies and from research on individual languages (Croft, 2001). Even for a specific category such as the English dative, the "relation between form and meaning is rather indirect and multi-layered" (Davidse, 1996). Moreover, it has been shown that there is a gradient evolution from lexical items to more grammaticalized ones (Hopper, 1987), which leads more and more linguists to the conclusion that pre-existing categories don't exist (Haspelmath, 2007). The constructionist account is more plausible from an empirical point of view, but so far it leaves two questions unanswered: where do semantic roles come from and how exactly does 'fusion' work?
This paper addresses both issues through experiments on artificial language evolution. It first proposes a fully operational implementation of the constructionist approach using the computational formalism Fluid Construction Grammar (FCG; Steels & De Beule, 2006). Next, the experiment itself is described. Since the experiment deals with artificial languages, the examples in this paper should not be confused with actual grammar descriptions, but rather be taken as indicators of the minimal requirements for explaining semantic roles.
2. Semantic Roles and Fusion in Fluid Construction Grammar

In FCG, a language user's linguistic inventory is organized as a network of rules which is dynamically updated through language use. Figure 1 illustrates the relevant part of a speaker's network for the utterance Jack pushes a block. There are three lexical rules on the left for jack, push, and block, which introduce the individual meanings of these words. In a logic-based representation, the complete meaning can be represented as {∃ v, w, x, y, z: jack(v), block(w), push(x), push-1(x, y), push-2(x, z)}. Note that the lexical rule for push contains two participant-roles and that these are represented as predicates themselves. Instead of the names 'pusher' and 'pushed', the more neutral labels 'push-1' and 'push-2' are used.
The careful reader will have noticed that there is a problem with the meaning: the variables v and y are bound to the same object (jack), so they are coreferential. Similarly, the variables w and z are coreferential because they are bound to the same object (block). Expressing coreferentiality between variables introduced by different predicates is one of the most important functions of grammar, and languages have developed various strategies for doing so (e.g. word order in English and case marking in German). Coreferential linking is achieved by making the variables equal (Steels, 2005), which results in the following meaning for the sentence: {∃ v, w, x: jack(v), block(w), push(x), push-1(x, v), push-2(x, w)}.
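The variable-equalisation step can be sketched as a substitution over a list of predicates. This is a minimal illustration, not the FCG machinery itself; the list-of-tuples encoding is an assumption, though the predicate and variable names follow the example in the text:

```python
def equalise(meaning, bindings):
    """Make coreferential variables equal by replacing each variable
    with its representative (here y -> v and z -> w)."""
    return [(pred, tuple(bindings.get(a, a) for a in args))
            for pred, args in meaning]

# Meaning introduced by the lexical rules for "Jack pushes a block":
meaning = [("jack", ("v",)), ("block", ("w",)), ("push", ("x",)),
           ("push-1", ("x", "y")), ("push-2", ("x", "z"))]

# The grammar establishes that y and v (and z and w) denote the same objects:
linked = equalise(meaning, {"y": "v", "z": "w"})
# push-1 now holds between x and v, push-2 between x and w
```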
Figure 1. The fusion of an event's participant-roles and a construction's semantic roles is achieved through fusion links which are dynamically updated through language use.
In the FCG implementation, the composition of meanings, including the establishment of coreference, is taken care of by con-rules, which thus implement argument-structure constructions in construction grammar (Goldberg, 1995). The con-rules map a semantic frame (the left pole) to a syntactic pattern (the right pole). The semantic frame contains a set of semantic roles, and the syntactic pattern includes simple 'case markers' that immediately follow the arguments whose semantic role they indicate.a An example utterance could be push jack-BO block-KA, where BO indicates that jack plays sem-role-8 (which fuses with 'push-1') and where KA indicates that block plays sem-role-3 (which fuses with

a The experiment only focuses on the emergence of semantic roles. It therefore assumes a one-to-one mapping of semantic roles to grammatical markers.
Figure 2. The top graph shows that the agents rapidly reach communicative success and that they converge on a coherent set of semantic roles after 5,500 language games. The semantic role variance reaches almost zero. The bottom graph gives more details on the roles themselves (plotted over language games: total number of participant-roles covered; number of participant-roles covered by generalized roles; number of generalized markers; number of verb-specific markers).
'push-2'). There are also links between con-rule 23 and con-rule 5 and con-rule 10, which means that the latter two are sub-rules of con-rule 23. For convenience's sake, these sub-rules are only illustrated as nodes in the network. The fusion of the event-specific participant-roles and the semantic roles of a construction is specified in 'fusion links', which are the grey boxes in Figure 1. The fusion links represent all possible fusions known by an agent and can be extended if needed. Each of the links fuses a participant-role with a semantic role within a specific con-rule. This link has a 'confidence score' between 0 and 1 which indicates how successful this fusion has been in past communicative acts. For example, 'push-1' can be fused with 'sem-role-8' in con-rule 10 with a confidence score of 0.7. There is a competing fusion link in which 'push-1' is fused with 'sem-role-1' in con-rule 2, but this link only has a confidence score of 0.3, so the other one is preferred. Finally, 'push-1' can also be fused with 'sem-role-8' in
con-rule 23, which also contains the semantic role ‘sem-role-3’. In this case, the fusion has a confidence score of 0.5. This fine-grained scoring mechanism allows speakers of a language to cope with the fuzzy edges of grammatical categories, which is necessary because grammar rules have to be applicable in a flexible manner. A network of rules, as opposed to a limited set of linking rules, is also an elegant way of capturing the complex and multilayered mapping between form and function in language.
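The preference for the highest-scoring fusion link can be sketched as follows. The scores are those of the running example, but the data structure and selection function are illustrative assumptions, not the actual FCG implementation:

```python
# Each fusion link ties a participant-role to a semantic role within
# one con-rule, with a confidence score between 0 and 1.
fusion_links = [
    {"participant": "push-1", "sem_role": "sem-role-8",
     "rule": "con-rule-10", "score": 0.7},
    {"participant": "push-1", "sem_role": "sem-role-1",
     "rule": "con-rule-2", "score": 0.3},
    {"participant": "push-1", "sem_role": "sem-role-8",
     "rule": "con-rule-23", "score": 0.5},
]

def best_fusion(links, participant):
    """Prefer the fusion link with the highest confidence score."""
    candidates = [l for l in links if l["participant"] == participant]
    return max(candidates, key=lambda l: l["score"])

best = best_fusion(fusion_links, "push-1")
# best: 'push-1' fused with 'sem-role-8' in con-rule 10 (score 0.7)
```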
3. Experiments on the Emergence of Semantic Roles

This paper hypothesizes (a) that the emergence of semantic roles is triggered by the need to reduce the cognitive effort of interpretation and to avoid misinterpretation, and (b) that generalizations and grammatical layers are developed as a side-effect of reusing existing linguistic structures in new situations. To test these hypotheses, the same experimental set-up was used as in Steels and Baillie (2003). The experiment involves a population of 5 artificial agents which play description games about dynamic real-world scenes. Equipped with a vision system and embodied through a pan-tilt camera, the agents are capable of extracting event descriptions from the scenes. During a game one agent describes an event in the scene to another agent. The game is a success if the hearer agrees with that description. In order to focus exclusively on the emergence of semantic roles, the agents are given a lexicon at the beginning of an experiment but no grammar. The agents are autonomously capable of detecting when there might be communicative problems through self-monitoring (Steels, 2003). This enables the agent to detect whether variables are coreferential and thus whether there are missing links in the meaning of an utterance (Steels, 2005). If the speaker detects one missing link (but no more), he will try to repair this problem. The hearer's learning strategy works in the same way, except that he has more uncertainty because he has no access to the speaker's intended meaning. By comparing the parsed utterance to his world model, however, the hearer may exploit the situatedness of the communicative act to solve the missing link problem as well. Repairing a missing link can be done by classification or by combination. Repair by classification occurs when the missing link involves a participant-role which the speaker encounters for the first time (e.g. push-1), which we will call the target-role.
The agent will first check whether he already knows a semantic role for an analogous participant-role (source-role) that might be reused. Analogy works by (1) taking the event of the target-role and the event that was used to construct the source-role, (2) decomposing them into their event structures, and then (3) constructing a mapping between the two. For example, a ‘walk-to’-event can be decomposed into an event structure that starts with two non-moving participants and then one participant approaching the other. Event structures themselves are represented as a series of micro-events. The algorithm takes all the participant-roles of the micro-events in which the target-role occurs
and maps them onto the corresponding participant-roles in the source event structure. A mapping counts as analogous when the fillers of corresponding roles are always the same. In case of multiple analogies, the source role which covers the most specific participant-roles is chosen. The source role will then be generalized so that it also covers the target-role. If no analogy could be found, the agent will create a new con-rule which maps the target-role to a newly invented marker. In both cases, fusion links are created and updated for later usage. Repair by combining existing rules occurs when the speaker wants to express a two- or three-place predicate and already has separate rules that link some of the coreferential variables, but not all of them. The agent will then try to combine these existing rules into a new con-rule. New fusion links are created and family links (sub- and super-rules) are kept between the new con-rule and the rules that were used for creating it. In this way, a network of rules as seen in Figure 1 gradually emerges, which improves linguistic processing. Given the population dynamics of the experiment, several semantic roles may be created and generalized in local language games and then start to propagate among the agents. This automatically creates conflicting solutions, however, so the roles start competing with each other for survival and for covering as many participant-roles as possible. Language thus becomes a complex adaptive system in its own right, very much like a complex ecosystem. There are two types of selectionist forces at work: functional (i.e. some roles are more analogous and therefore better suited for covering a participant-role) and frequency-based. To be able to align their grammars with each other, agents consolidate their linguistic inventory after each game by updating the scores of the fusion links.
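The analogy test described above — corresponding roles are analogous when their fillers always coincide — can be sketched roughly as follows. The event structures below are invented toy data (the real experiment extracts them from vision), and the dictionary encoding of micro-events is an assumption:

```python
def analogous(target_struct, source_struct, target_role, source_role):
    """Return True if, across corresponding micro-events, the filler of
    target_role always maps to the same filler of source_role."""
    mapping = {}
    for t_micro, s_micro in zip(target_struct, source_struct):
        t_filler = t_micro.get(target_role)
        s_filler = s_micro.get(source_role)
        if t_filler is None or s_filler is None:
            continue
        # A previously seen target filler must map to the same source filler.
        if mapping.setdefault(t_filler, s_filler) != s_filler:
            return False
    return True

# Toy micro-event sequences: in both events, participant 'a' acts on 'b'.
walk_to = [{"walk-1": "a", "walk-2": "b"}, {"walk-1": "a", "walk-2": "b"}]
push    = [{"push-1": "a", "push-2": "b"}, {"push-1": "a", "push-2": "b"}]
analogous(push, walk_to, "push-1", "walk-1")  # True: fillers align consistently
```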
Since each construction has its own place in the grammar, fusion links are needed for each specific construction (see Figure 1). However, there is a danger of lingering incoherence if the scores of the fusion links are updated independently of each other. For example, the fusion link between 'push-1' and 'sem-role-1' may win the competition for single-argument utterances whereas the fusion with 'sem-role-8' may win for two-argument utterances. This is incompatible with observations in natural languages, which develop a coherent system for argument-structure constructions. In order to solve this problem, the agents apply a consolidation strategy of multi-level selection. Instead of updating only the fusion links that were actually used during processing, all the compatible fusion links are updated as well. Compatible fusion links are links that are related to sub- or super-rules of the applied con-rule. These scores are increased if the game was a success, while all the competing links are decreased by lateral inhibition. The scores are lowered if the game was a failure. The exact algorithm and experiments on multi-level selection are reported in more detail in Steels, van Trijp, and Wellens (2007).
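A minimal sketch of this consolidation step is shown below. The update constant and data layout are arbitrary illustrative choices; the paper does not give the exact values, and the full algorithm is in Steels, van Trijp, and Wellens (2007):

```python
def consolidate(links, rewarded, competing, success, delta=0.1):
    """Update fusion-link scores after a game: reward the applied link
    and all compatible links (multi-level selection), laterally inhibit
    their competitors on success, and punish the used links on failure.
    Scores are kept within [0, 1]."""
    for name in links:
        if name in rewarded:
            links[name] = (min(1.0, links[name] + delta) if success
                           else max(0.0, links[name] - delta))
        elif success and name in competing:
            links[name] = max(0.0, links[name] - delta)  # lateral inhibition

scores = {"push-1/sem-role-8": 0.5, "push-1/sem-role-1": 0.5}
consolidate(scores, rewarded={"push-1/sem-role-8"},
            competing={"push-1/sem-role-1"}, success=True)
# 'push-1/sem-role-8' rises to 0.6; its competitor drops to 0.4
```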
4. Results and Discussion

The results show that the agents succeed in developing a coherent system of semantic roles. The top graph in Figure 2 shows that the agents rapidly reach communicative success and that they learn all the case markers after 2,000 language games. It takes them another 3,500 games before they reach total meaning-form coherence. Meaning-form coherence is measured by taking the frequency of the most frequent form to cover a participant-role and dividing this by the total number of forms circulating in the population. Inversely, the semantic role variance - which measures the distance between the semantic role sets of the agents - reaches almost zero, which means that the agents have aligned their semantic roles. The bottom graph of Figure 2 gives more details about the roles themselves. The semantic role overlap indicates that there is still competition going on for 5 participant-roles. The graph also shows that there are 9 verb-specific markers whereas 7 have already become more generalized. These 7 markers cover 24 of the 30 participant-roles in the experiment. Figure 3 gives a snapshot of the evolution of case markers in one agent. It shows that there is a gradual continuum between more lexical, verb-specific markers and more grammaticalized markers which cover up to 8 participant-roles. Similar observations have been made in natural languages by grammaticalization studies (Hopper, 1987).
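One plausible reading of the meaning-form coherence measure can be sketched as follows. The counts are invented for illustration, and dividing by total observed uses (rather than by the number of distinct forms) is an interpretive assumption:

```python
from collections import Counter

def coherence(form_uses):
    """Frequency of the most frequent form for a participant-role,
    divided by the total number of form uses in circulation for it."""
    counts = Counter(form_uses)
    return counts.most_common(1)[0][1] / sum(counts.values())

# Invented markers observed in the population for one participant-role:
coherence(["-bo", "-bo", "-bo", "-ka"])  # 0.75: one marker dominates
```

A coherence of 1.0 would mean the whole population uses a single form for the role, the state the agents approach after about 5,500 games.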
Figure 3. The evolution of case markers in one agent, plotted over language games (markers include vuivos, puxaec, zoazeuch, naetaz, toawash and nudeua). For example, "fuitap" covers 8 specific roles after 600 games, but is in conflict with other markers and in the end covers 6 roles. The graph shows the continuum between more specific and more generalized semantic roles.
5. Conclusion

This paper showed that experiments on artificial language evolution can be highly relevant for linguistic theories. It proposed a fully operational implementation of
the constructionist account of predicate-argument structure in Fluid Construction Grammar. By embedding this approach in experiments with embodied artificial agents, a coherent explanation of the emergence of semantic roles was presented. The results of the experiments showed that semantic roles can emerge as a way to avoid misinterpretation and to reduce the cognitive effort needed during parsing, and that they are further grammaticalized by reuse through analogy.
Acknowledgement

This research was funded by the EU FET-ECAgents Project 1940. The FCG formalism is freely available at www.emergent-languages.org. I am greatly indebted to Luc Steels (who implemented the first case experiment in 2001), director of the Sony Computer Science Laboratory Paris and the Artificial Intelligence Laboratory at the Vrije Universiteit Brussel, the members of both labs, and Walter Daelemans, director of the CNTS at the University of Antwerp.

References

Croft, W. (2001). Radical construction grammar: Syntactic theory in typological perspective. Oxford: Oxford UP.
Davidse, K. (1996). Functional dimensions of the dative in English. In W. Van Belle & W. Van Langendonck (Eds.), The dative. Volume 1: Descriptive studies (pp. 289-338). Amsterdam: John Benjamins.
Goldberg, A. E. (1995). A construction grammar approach to argument structure. Chicago: Chicago UP.
Haspelmath, M. (2007). Pre-established categories don't exist. Linguistic Typology, 11(1), 119-132.
Hopper, P. (1987). Emergent grammar. BLS, 13, 139-157.
Michaelis, L. A. (2003). Headless constructions and coercion by construction. In E. Francis & L. Michaelis (Eds.), Mismatch: Form-function incongruity and the architecture of grammar (pp. 259-310). Stanford: CSLI Publications.
Pinker, S. (1989). Learnability and cognition: The acquisition of argument structure. Cambridge: Cambridge UP.
Steels, L. (2003). Language re-entrance and the 'inner voice'. Journal of Consciousness Studies, 10(4-5), 173-185.
Steels, L. (2005). What triggers the emergence of grammar? In AISB'05: Proceedings of EELC'05 (pp. 143-150). Hatfield: AISB.
Steels, L., & Baillie, J.-C. (2003). Shared grounding of event descriptions by autonomous robots. Robotics and Autonomous Systems, 43(2-3), 163-173.
Steels, L., & De Beule, J. (2006). Unify and merge in fluid construction grammar. In P. Vogt, Y. Sugita, E. Tuci, & C. Nehaniv (Eds.), Symbol grounding and beyond (pp. 197-223). Berlin: Springer.
Steels, L., van Trijp, R., & Wellens, P. (2007). Multi-level selection in the emergence of language systematicity. In F. Almeida e Costa, L. M. Rocha, E. Costa, & I. Harvey (Eds.), Proceedings of the 9th ECAL. Berlin: Springer.
BROADCAST TRANSMISSION, SIGNAL SECRECY AND GESTURAL PRIMACY HYPOTHESIS

SLAWOMIR WACEWICZ & PRZEMYSLAW ZYWICZYNSKI

Department of English, Nicolaus Copernicus University, Fosa Staromiejska 3, Toruń, 87-100, Poland

In current literature, a number of standard lines of evidence reemerge in support of the hypothesis that the initial, "bootstrapping" stage of the evolution of language was gestural. However, one specific feature of gestural communication consistent with this hypothesis has been given surprisingly little attention: the visual modality makes gestural signals more secret than vocal signals (lack of broadcast transmission). The high relevance of secrecy is derived from the fundamental constraint on language evolution: the transfer of honest messages is itself a form of cooperation, and therefore not a naturally evolutionarily stable strategy. Consequently, the greater secrecy of gestural communication constitutes a potentially important factor that should not fail to be represented in more comprehensive models of the emergence of protolanguage.
The idea of gestural primacy (in the evolution of language), in its various forms, has attracted numerous modern supporters (Hewes 1973, Armstrong et al. 1994, Corballis 2002, among many others), as well as several sceptics (e.g. MacNeilage & Davis 2005), with a small but notable minority denouncing it as a non-issue (Bickerton 2005). Its proponents adduce a wide range of evidence, focussing on the rigidity of pre-existing primate vocal communication, the iconicity of gestures, sign language acquisition, cortical control of the hand, and many others. However, one very interesting feature of gestural signals, the greater potential secrecy resulting from the lack of broadcast transmission, has so far remained unexplored, despite its strict relevance to the evolutionary context. At the same time, we have found it to be neglected in standard psychological, linguistic, and ethological approaches to nonverbal communication in humans (Feldman and Rimé 1991; McNeill 2000; Atkinson and Heritage 1989; Eibl-Eibesfeldt 1989).

1. Definitions and caveats

It is important to voice a number of caveats at the outset. Firstly, we follow Hewes (1996) in giving the pivotal term gesture a relatively broad interpretation.
In the present context, "gestures" are primarily defined as the voluntary communicative movements of the arm, hand and fingers. Somewhat less centrally, they also include elements of proxemics, posture and orientation, facial expressions, and gaze direction. On the other hand, gestures as understood here do not refer to the articulatory gestures involved in speech production, nor to non-intentional bodily signals (affective gestures), although they may form a continuum with the latter. Secondly, it must be emphasised that the present paper deals specifically with the very earliest stage of the phylogenetic emergence of language-like communication. We subscribe to the widely held position that language as known today was preceded by a "simpler" protolanguage. We remain noncommittal as to the exact nature of protolanguage (e.g. holistic versus atomic), but assume it to be distinguished by the lack of generative syntax, but the presence of the conventional sign (sensu Zlatev et al. 2005). Thirdly, it should be noted that this text concerns broadcast transmission only with respect to its consequences for secrecy ("privacy", "addressee discrimination"). The general implications of broadcast transmission in a communication system are much wider, including such aspects as independence from visibility conditions and line of sight, but they lie outside the scope of the present paper.¹

2. The fundamental constraint on the evolution of communication
A standard, intuitive approach to explaining the absence of language in nonhuman primates is to look to their cognitive, conceptual or physical limitations (relative to humans). Such a position implicitly assumes a natural motivation to exchange honest messages, only held back by the lack of suitable means of expression. This, in turn, is rooted in an intuitive view of the naturalness of cooperation, additionally backed up by the group-selectionist mindset popular in the first half of the past century. From that perspective, the presence of extensive cooperation between non-kin in humans is expected; it is the lack of such cooperation in other primates that becomes the theoretical problem in want of an explanation. The above explanatory pattern has been reversed by the introduction
¹ It is worth noting that once the argument becomes framed in terms of the advantages of one transmission channel over the other (as is often the case), it instantly loses its relevance to the issue of gestural primacy. The question of which communication system is more efficient is logically independent from the question of which communication system is more natural to evolve in an ancestral primate: "which is better" is fully dissociable from "which came first".
into evolutionary theory of the gene's eye view (Dawkins 1976) and game-theoretic logic (Maynard Smith 1982). However, the relation between cooperation and communication remains complicated, with communication often seen essentially as a mere means for establishing the cooperative behaviour proper (e.g. Gärdenfors 2002). It takes another vital step to realise that the exchange of honest messages is a special case of communication that is itself a form of cooperation. As such, it requires special conditions for emergence (such as kinship, byproduct mutualism, group selection, reciprocity; see e.g. Dugatkin 2002), and generates specific predictions as to its nature (Krebs and Dawkins 1984).

Communication in general is constrained by the honesty of signals. Since receivers are selected not to respond to dishonest messages - ones that fail to be reliably correlated with their "contents" - in the absence of signal honesty communication breaks down. Honesty can be guaranteed in two different ways, reflecting two models of social interaction. These result in two distinct kinds of signalling that characteristically differ in their expensiveness (Krebs and Dawkins 1984; see also Noble 2000, who nevertheless generally endorses this conclusion). Typically, the interests of the individuals and their genes are conflicting, and communication spirals into an arms race between "costly advertising" and "sales resistance". Here, the honesty of a signal is certified by its being expensive and thus difficult to fake. The costs incurred on the signallers are diverse and involve minimally the expenditure of valuable resources such as time, energy and attention - but they can also include attracting predators, warning potential prey, or otherwise handicapping the animal in performing a simultaneous action (see also point 4). However, in cooperative interactions, honesty is intrinsically present, and need not be backed up by signal expensiveness.
In such a model, selection pressures act against signal expensiveness, favouring the emergence of "cheap" signalling. In particular, this is relevant to signalling in language, which follows the latter pattern of communicative interactions. To sum up, the emergence of language-like communication necessarily presupposes the cooperative spectrum of the payoff matrix. Furthermore, it strongly predicts that the signals used in such a type of communication minimise their conspicuousness as well as all other kinds of costs.

3. Broadcast transmission
The concept of broadcast transmission was defined by Hockett (1977) as one of the design features of language. The idea of broadcast transmission captures a
basic trait of verbal communication, which results from its dependence on the vocal-auditory transmission channel. Under canonical conditions, a vocal signal travels in all directions from its source, its detectability being restricted only by the distance from the sender (and the sensory equipment of potential decoders). This fact has a number of consequences, but in the present context, it is important that a vocally coded message is available indiscriminately to all individuals within hearing range. The signaller is normally unable to confine the scope of addressees of its message. It is of interest to note that this problem was recognised as early as Hockett himself (1977: 131): "The situation is like that in bidding at bridge, where any information sent to one's partner is also (barring resort to unannounced conventions, which is cheating) transmitted to opponents. There must be many ecological conditions in which this public nature of sound is potentially contrasurvival." In this respect, gestural communication stands in clear contrast with vocal communication. Its dependence on the visual mode, despite being limiting in other ways, does not lead to broadcast transmission, allowing the sender to select the addressees of the message.

4. The costs of signalling in (proto)language
Language is a communicative system distinguished by its very high flexibility in the range, kind and complexity of transferred messages. This is founded on detached representation (Gärdenfors 1996), which affords linguistic communication essential independence from contextual, thematic and similar constraints. This is a qualitative difference from nonlinguistic communication systems, and we assume it to be characteristic of protolanguage, at least to a considerable extent. The use of conventional signs endows protolanguage, despite its limited compositionality/productivity, with the ability to represent states, events, relations, etc. in the world in a rich form that can be assigned, or at least effectively interpreted in terms of, truth values.²

As stated in point 2, all signalling is costly, principally in ways that are directly related to the production of the message, rather than to its "content". Nevertheless, signalling may bear yet another type of consequence that rises to prominence in increasingly language-like forms of communication. These pertain to the content of the message. In so far as other parties are capable of acting on the disclosed information in ways harmful to the signaller, this reduces the signaller's fitness and can therefore be conceptualised as a cost. Such costs may be negligible for most kinds of animal communication. This changes radically in protolanguage, which enables its users to convey a qualitatively different kind of information: rich information about the location of and ways of access to food and other resources, or about the history of social interactions (the "who did what to whom"). Such information constitutes valuable knowledge, and the evolutionary costs to the individual unintentionally divulging it to "eavesdropping" competitors and opponents are proportional to its high value. It must be especially emphasised that the above constraint is particularly relevant to the early stages of the development of language-like communication, where the cooperative context of communication is fragile. This is so because, as is well known, language introduces or facilitates a range of normative mechanisms, such as reciprocity and punishment, that bolster cooperation; cooperation and language co-evolve. Therefore, the ability to discriminate between the receivers of the message would have been particularly important in the "bootstrapping" phase of the emergence of protolanguage.

² This need not imply an explicitly propositional representation format. For a possible format see e.g. Hurford (2006).

5. The secrecy of gestural signals
Gestural communication has so far been little studied with respect to signal secrecy. However, secrecy resulting from the lack of broadcast transmission appears to be a prominent trait of the use of gestures in present-day humans. When gestural communication occurs between speakers capable of vocal communication, it likely reflects an effort to constrain the number of addressees, and is a strong indicator of a conflict of interests with a third party present in the vicinity. A strong link between the use of gestural communication under default audibility conditions and the need for secrecy, motivated by a conflict of interests, is supported by diverse lines of circumstantial evidence, some of which are enumerated below:
- parenthetical signals that qualify, or even contradict, the vocally transmitted information are often designed to be inaccessible to part of the receivers of the vocal message (e.g. a conspiratorial wink accompanying a vocal statement); see Scheflen (1972);
- in contexts involving team competitions, the secrecy of tactical decisions is secured by reverting to the gestural mode, e.g. by taking advantage of the opponents' blocked line of sight; see Fig. 1;
- thieves operating in public places are known to depend on gestures to coordinate their actions in a manner designed to minimise conspicuousness;
- indigenous people of the Kalahari Desert resort to sign language during hunting; this case represents a markedly different type of secrecy from the ones described above: here, the use of gestures is motivated not by the intention to hide the content of the message, but by the intention to hide (from prey) the very act of communication.
As already noted, the secretive use of gestures has not been given attention in communication studies. Our work should be seen as a preliminary attempt to bridge this gap. Given the speculative nature of our claims, we have designed a set of role-play experiments and hope that their results will give these claims a firmer empirical footing.

6. Conclusion
The argument outlined above is conceptually simple. The specific thesis advocated here is that the use of gestures counters the disadvantage incurred by the "broadcast transmission" feature characterising vocal communication. We suggest that this apparently slight disadvantage becomes magnified in more human-like interactions relying on more language-like communication, where the cost of divulging valuable information becomes an important factor. The gestural mode of communication, making use of the visual channel of transmission and thus being more secretive, allows the sender to choose the receivers of its messages more discriminately. The above argument, which can be referred to as the "gestural secrecy argument", is limited in its scope. It does not constitute a separate scenario of the
evolution of protolanguage; rather, it identifies a potentially powerful factor that should be incorporated into existing scenarios. Also, the argument does not address the central issue of why communication in hominids took a cooperative course in the first place. Still, it lends some support to gestural rather than vocal theories of language origins, showing them to be more economical in the above respect. Further necessary research includes the incorporation of the factor of signal secrecy into more formal modelling of (proto)language origins, as well as empirical studies of signal secrecy in present-day gestural communication.

References
Armstrong, D. F., Stokoe, W. C. & Wilcox, S. E. (1994). Signs of the origin of syntax. Current Anthropology, 35(4), 349-368.
Atkinson, J. M. & Heritage, J. (Eds.) (1989). Structures in social action: Studies in conversation analysis. Cambridge: Cambridge University Press.
Bickerton, D. (2005). Language evolution: A brief guide for linguists. URL: http://www.derekbickerton.com/blog/~archives/2005/7/1/989799.html
Corballis, M. C. (2002). From hand to mouth: The origins of language. Princeton, NJ: Princeton University Press.
Dawkins, R. (1976). The Selfish Gene. Oxford: Oxford University Press.
Dugatkin, L. A. (2002). Cooperation in animals: An evolutionary overview. Biology and Philosophy, 17, 459-476.
Eibl-Eibesfeldt, I. (1989). Human ethology. New York: Aldine de Gruyter.
Feldman, R. S. & Rimé, B. (Eds.) (1991). Fundamentals of nonverbal behaviour. Cambridge: Cambridge University Press.
Gardenfors, P. (1996). Cued and detached representations in animal cognition. Behavioural Processes, 36, 263-273.
Gardenfors, P. (2002). Cooperation and the evolution of symbolic communication. Lund University Cognitive Studies, 91.
Hewes, G. W. (1996). A history of the study of language origins and the gestural primacy hypothesis. In A. Lock & C. Peters (Eds.), Handbook of human symbolic evolution (pp. 571-595). Oxford: Oxford University Press.
Hockett, C. F. (1977). Logical considerations in the study of animal communication. In C. F. Hockett (Ed.), The View from Language: Selected Essays 1948-1974 (pp. 124-162). Athens, GA: The University of Georgia Press.
Hurford, J. R. (2006). Proto-propositions. In A. Cangelosi, A. D. M. Smith & K. Smith (Eds.), The Evolution of Language: Proceedings of the 6th International Conference EVOLANG6 (pp. 131-138). Singapore: World Scientific Publishing.
Krebs, J. R. & Dawkins, R. (1984). Animal signals: Mind-reading and manipulation. In J. R. Krebs & R. Dawkins (Eds.), Behavioural Ecology: An Evolutionary Approach (pp. 380-402). Oxford: Blackwell.
MacNeilage, P. F. & Davis, B. L. (2005). The Frame/Content theory of evolution of speech: A comparison with a gestural-origins alternative. Interaction Studies, 6(2), 173-199.
McNeill, D. (Ed.) (2000). Language and gesture. Cambridge: Cambridge University Press.
Maynard Smith, J. (1982). Evolution and the Theory of Games. Cambridge: Cambridge University Press.
Noble, J. (2000). Co-operation, competition and the evolution of pre-linguistic communication. In C. Knight, J. R. Hurford & M. Studdert-Kennedy (Eds.), The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form (pp. 40-61). Cambridge: Cambridge University Press.
Scheflen, A. E. (1972). The significance of posture in communication systems. In J. Laver & S. Hutcheson (Eds.), Communication in face to face interaction (pp. 225-246). Harmondsworth: Penguin Books.
Zlatev, J., Persson, T. & Gardenfors, P. (2005). Bodily mimesis as "the missing link" in human cognitive evolution. Lund University Cognitive Studies, 121.
SELF-INTERESTED AGENTS CAN BOOTSTRAP SYMBOLIC COMMUNICATION IF THEY PUNISH CHEATERS
EMILY WANG
Artificial Intelligence Laboratory, Vrije Universiteit Brussel
Pleinlaan 2, Brussels, 1050, Belgium
[email protected]
LUC STEELS
Sony Computer Science Laboratory
6 Rue Amyot, Paris, 75005, France
Artificial Intelligence Laboratory, Vrije Universiteit Brussel
Pleinlaan 2, Brussels, 1050, Belgium
[email protected]

We examine the social prerequisites for symbolic communication by studying a language game embedded within a signaling game, in which cooperation is possible but unenforced, and agents have incentive to deceive. Despite this incentive, and even with persistent cheating, naming conventions can still arise from strictly local interactions, as long as agents employ sufficient mechanisms to detect deceit. However, unfairly antagonistic strategies can undermine lexical convergence. Simulated agents are shown to evolve trust relations simultaneously with symbolic communication, suggesting that human language need not be predicated upon existing social relationships, although the cognitive capacity for social interaction seems essential. Thus, language can develop given a balance between restrained deception and revocable trust. Unconditional cooperation and outright altruism are not necessary.
1. The Reciprocal Naming Game

Sociality is generally regarded as a prerequisite for symbolic communication (Steels, 2008), but given the pressure of natural selection, there remains the question of how honest communication can be evolutionarily stable when individuals might gain an advantage by deceiving others (Dessalles, 2000). In hunter-gatherer societies, imparting personal knowledge to others about the location of food can be of negligible cost and may bring extra benefits if collaboration is required to harvest the food, or if the other individuals are likely to return the favor at a later time (Knight, 1991). Reciprocity has been put forward as a mechanism that sufficiently elicits altruism directed at unrelated individuals given Darwinian constraints, as long as individuals encounter each other repeatedly over the course of many interactions, and are exposed symmetrically to opportunities for altruism, as
in the prisoner's dilemma strategy game (Trivers, 1971). With a tit-for-tat policy, a player remembers each opponent's previous action so that cooperation is only directed towards those who did not defect in the previous interaction, and this has been shown to foster reciprocity because it is punishing yet forgiving (Axelrod & Hamilton, 1981). Thus, we present a computational model where individuals can recognize each other, keep a record of cooperative behavior, and direct their own altruistic behavior towards those who previously offered cooperation. We combine two well-studied models, the Naming Game and the Signaling Game, to make the Reciprocal Naming Game, which we use to study the interaction between optional altruism and the emergence of symbolic communication. The Naming Game (Steels, 1995) was introduced as a minimal model for studying the conventionalization of names in a population of agents, using only peer-to-peer interactions. The goal is to develop globally accepted naming conventions from only the sum experience of many local interactions. The Crawford-Sobel model of strategic information transmission (1982) defines a Signaling Game, which is a two-player strategy game in which the players communicate using signals. For convenience, we denote the signaler as S, and the receiver as R. S is better informed than R, with private information t about the environment. S transmits a message m to convey either t, or something misleading. Based on m, R takes an action a that determines the payoff for both players. If S adopts a strategy of lying about t, then R adapts by ignoring the information in m. In the Naming Game, the speaker utters a word to best convey the intended referent to the hearer. But in a Signaling Game, the signaler need not transmit an m that truthfully conveys t.
We create a single game out of these two by presenting two players, randomly chosen out of the population in each iteration, with a context of two items, one of which is the target, and the other a distracter. S has access to this information, but may choose either item as the referent. This situation can be conceived of as a shell game, where a set of shells forms the context, and a dealer has hidden a pea under one of the shells. R is like a player who places a bet, and wins by correctly guessing which shell contains the pea. S is a third party that may act as an informant and truthfully indicate the target to R, in which case S takes a share of R's winnings. Or, S may act as a shill by indicating the distracter, and receive a payment from the dealer if R guesses incorrectly. So S may use m to deceive, and R must decide whether to believe m. This interaction scheme is similar to that of the regular Naming Game, but without feedback from explicit pointing. In the Reciprocal Naming Game, the signaler's intended meaning is never revealed to the receiver. Adding this layer of uncertainty preserves the privacy of the players' choices whether to cooperate or defect. The remainder of this paper studies the Reciprocal Naming Game. We first introduce a minimal agent architecture needed to play the game, and then some different strategies. Next we report on the results of computational simulations that examine key questions about the social prerequisites of symbolic communication.
2. Agent Architecture

To remember object names, each agent is equipped with a lexical memory associating words with meanings and scores. Multiple lexicon entries may share the same word or meaning, and these competing conventions can be ordered by preference according to their score. Scores are governed by lateral inhibition, that is, incremented following successful usage and decremented following failed interactions, or the successful use of a competing association. Group coherence represents agreement in the population, and this is summarized by a group lexicon of the most widely accepted words, but this measure is only known to an external observer. The agents themselves receive only local information. To identify other agents in the population and to record previous experiences, each agent also has a social memory, associating each other individual with a rating. One agent can regard another with the intent to cooperate, regard(a_j, a_k) = 1, or with the intent to defect, regard(a_j, a_k) = 0. Two agents that regard each other in the same way share mutual regard, regard(a_j, a_k) = regard(a_k, a_j); otherwise their relationship is one-sided.

The outcome of one iteration of the Reciprocal Naming Game depends upon three binary parameters, a_S, c, and a_R. The actions of the signaler and receiver are a_S and a_R, where cooperation and trust are coded as 1, and defection and disbelief as 0. The predicate c indicates whether R comprehended the message correctly. A fourth value p depends on the other three, and indicates whether R successfully located the pea, which can occur on purpose or by accident, depending on c. So p is set like an even parity bit, with p = 1 only when an odd number of the bits in {a_S, c, a_R} are 1, and this collapses the eight possible combinations into four distinct outcomes. These outcomes are summarized by the payoff matrix:

[payoff matrix: entries u_S, u_R under the columns p = 1 and p = 0; not legible in this copy]

where u denotes utility, and each entry gives u_S, u_R. Note that p is used to decide the payments instead of a_R, since the dealer or R only pay S based on the final outcome of the shell game. Three levels of information govern the players' knowledge. Actions a_S and a_R are kept private by each player. The result p is public information, displayed to both players, but the result c is not revealed to any player; it is known only by virtue of experimenter inspection. Players cannot inspect each others' internal processes, so they cannot know for certain whether their opponents cooperate or defect. Nevertheless, S and R can each estimate the action of the other, given knowledge of their own actions and their observation of p. For an agent-knowledge formulation of the Reciprocal Naming Game, as well as further results not presented here, see http://arti.vub.ac.be/~emily/msc/.
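The parity rule for p can be written directly as an exclusive-or. The following is a minimal illustrative sketch, not the authors' code; it computes only the outcome bit, since the payoff values themselves are not legible in this copy.

```python
def outcome_p(a_s: int, c: int, a_r: int) -> int:
    """p is set like an even parity bit: p = 1 exactly when an odd
    number of the bits in {a_s, c, a_r} are 1. This collapses the
    eight combinations of (a_s, c, a_r) into four distinct outcomes."""
    return a_s ^ c ^ a_r

# Example: S cooperates (a_s=1), R comprehends (c=1) and trusts (a_r=1):
# three bits (an odd number) are set, so R finds the pea (p=1).
```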
3. Player Strategies

Under the general condition of complete reciprocity, the signaler chooses a_S = regard(S, R) and the receiver chooses a_R = regard(R, S), in accordance with tit-for-tat. An empty strategy was implemented to refute the null hypothesis, which would be that cheater detection has no effect on the ability of the population to agree upon lexical conventions. In this condition, S behaves as above. R assumes that m indicates the target, but if R cannot interpret m, then it looks for the pea under a random context item. In another condition, with only partial reciprocity, we relax the requirement that a_S = regard(S, R). Instead we allow a_S = 0 even when regard(S, R) = 1, by introducing a constant fairness parameter f for each agent. A fair agent has f = 1.0, and behaves with complete reciprocity. When f = 0, the agent acts as a free rider, and always defects when playing as S, although it can still choose to believe the signaler when playing as R. The agents also employ specified strategies for updating their memories. For the lexicon, both players promote the association that was applied in the interaction when they have received a nonzero reward, and they demote associations resulting in zero payoff. With a short-term memory strategy, associations reaching the minimum score threshold are deleted from the lexicon, but such entries are kept when using long-term memory. Updates for social regard are less symmetric. The signaler's sole criterion for updating its regard for R is whether or not the receiver chose the object that was intended; thus S assumes c = 1. When a_S = 1, the intended object is the target, and when a_S = 0, it is the distracter. So the receiver's choice matches the signaler's intention when p = a_S. The receiver considers the size of u_R to estimate whether the signaler cooperated in the interaction. As illustrated by the payoff matrix, R can sometimes deduce c and a_S, given u_R and p.
When u_R = 0.6, it is certain that a_S = 1, even if R did not cooperate. R responds by cooperating with S next. When u_R = 1.0, both players defected, and R continues to defect against S. When u_R = 0, R cannot be certain about a_S, and responds by modifying its regard for S by a bit-flip, since the payoff was not favorable.
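The lexicon and regard updates described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the payoff values 0.6 and 1.0 used to recognise the signaler's action are taken from the text, while the initial score and increment size are assumptions.

```python
class Agent:
    def __init__(self, name, short_term=True, delta=0.1, min_score=0.0):
        self.name = name
        self.lexicon = {}      # (word, meaning) -> score
        self.regard = {}       # other agent's name -> 0 or 1
        self.short_term = short_term
        self.delta = delta     # increment size: an assumed value
        self.min_score = min_score

    def update_lexicon(self, entry, reward):
        """Lateral inhibition: promote the used association on a nonzero
        reward, demote it on zero payoff; with short-term memory, entries
        reaching the minimum score threshold are deleted."""
        score = self.lexicon.get(entry, 0.5)  # initial score: an assumption
        score += self.delta if reward > 0 else -self.delta
        if self.short_term and score <= self.min_score:
            self.lexicon.pop(entry, None)
        else:
            self.lexicon[entry] = score

    def update_regard_as_receiver(self, signaler, u_r):
        """u_r = 0.6 proves a_S = 1, so cooperate next; u_r = 1.0 means
        both defected, so keep defecting; u_r = 0 is ambiguous, so
        bit-flip the stored regard."""
        if u_r == 0.6:
            self.regard[signaler] = 1
        elif u_r == 1.0:
            self.regard[signaler] = 0
        else:
            self.regard[signaler] = 1 - self.regard.get(signaler, 1)
```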
4. Experimental Results

Figure 1 shows a Reciprocal Naming Game with ten objects and ten agents using short-term memory. Measures are shown as running averages. Figures 2-5 are meant to be read in direct comparison to Fig. 1 (and so they have been simplified, and afforded less space; complete color versions can be viewed at http://arti.vub.ac.be/~emily/evolang7/). In successful systems, an initial lexical explosion due to the rapid invention of new words is followed by an approach towards high group coherence and communicative success as the lexicon becomes more efficient. Even under the more challenging conditions of the Reciprocal Naming Game, the agent population is capable of reaching complete agreement on a set of lexical associations, despite the persistence of mutually defecting pairs.

Figure 1. Lexical agreement is not hindered by cheating in a simulation where the agents employ tit-for-tat and have short-term memory. The lexicon becomes optimal and stable after 5,000 games, with complete group coherence fixed at 1.0, and lexicon size at 10. Communicative success is near perfect, but fluctuates just below 1.0. Reciprocating relationships are split about equally, and fluctuating.

However, communicative success remains less than perfect, even when coherence is full, due to homonyms that are propagated following games where m was misunderstood. Because of the lack of pointing, agents cannot distinguish between a zero payoff due to failed communication and the same result due to a defecting partner. Thus communicative success and social relationships fluctuate continuously as a result of lexical inefficiency. We now examine the importance of sociality by discussing four major issues.

4.1. Retaliation allows deception to be tolerated
In Fig. 2, R employs the empty strategy and simply assumes that S is truthful, while S follows tit-for-tat. Coherence is not realized because misinterpreted messages pollute the lexicon with many homonyms. Even though the initial population is fully cooperative, R guesses randomly when it does not know m, and this introduces uncooperative regard into the system. So agreement can form when the agents are equipped to retaliate, as they are in Fig. 1, but not in Fig. 2. This clearly rejects the null hypothesis, since the population only develops group coherence when the receivers, as well as the speakers, follow a policy of reciprocation. Therefore lexical convergence depends not upon a complete lack of deception, but rather upon a balance between deception and the ability to detect it. Given this, individuals can direct their altruism accordingly. But since R cannot always deduce the true value of a_S, it seems even an approximation of the speaker's honesty suffices. Thus, cheater detection is essential, even if it is fallible.
4.2. More memory prevents the death spiral

One weakness of tit-for-tat, cited for the iterated prisoner's dilemma, is the problem of the death spiral in noisy environments, where a single mistake can destroy a mutually cooperative relationship (Axelrod & Hamilton, 1981). The Reciprocal Naming Game tends to resist this pitfall since the true actions, a_S and a_R, remain private, and players must deal with doubt when estimating these values. Cooperative relations become even more robust with long-term lexical memory, when obsolete associations remain accessible to R for interpreting m. This increases the chance of comprehension, and suppresses defecting pairs to much lower numbers, as shown in Fig. 3. The time to reach convergence doubles, but mutually cooperative relations are more constructive and stable, since a shared reward results in synchronous score promotions, while defection virtually guarantees that the players will make mismatched lexical updates.

4.3. Limited numbers of free riders are bearable
Figure 4 shows that a population mostly composed of fair agents can accurately retaliate against a single free rider. But retaliation becomes less effective as the number of free riders grows, as shown in Fig. 5, where coherence is significantly more difficult to achieve, and unstable. Free riders detract from the common good in total utility, since mutually cooperative interactions benefit from a 0.2 bonus. The advantage of the free-rider strategy depends on how many other agents in the population are following the same strategy. Individual utility is best served by taking part in the majority, that is, by ceasing to reciprocate when there are more free riders than fair agents in the population.

4.4. Reciprocation produces coherence in spite of deception
While the agents never form explicit agreements, each agent's personal utility depends on its ability to establish reciprocal relationships. Acting without reciprocity is costly. Cooperating with a partner who defects results in the sucker's payoff. Defecting against a partner who cooperates precludes future cooperation. But we must distinguish between failing to reciprocate and choosing not to cooperate. If two agents have established a pattern of repeated, mutual defection, then they receive roughly equal cumulative payoff. In a sense, one player sacrifices itself in each interaction, to provide the other with a large reward, and they take turns doing this since roles are randomly assigned. This way, cooperation takes place not within each interaction, but over the course of multiple interactions, emerging from tit-for-tat.

Figure 2. Agents perform at random when R has no strategies for detecting deceit. Lexical agreement under these conditions is not possible.

Figure 3. Defection is suppressed when agents have the added capacity of long-term memory. The learning curve compares with that of Fig. 1.

The level of information sharing found in human language use suggests that speakers must be motivated to share personal knowledge by some direct payoff (Scott-Phillips, 2006). In the context of the Reciprocal Naming Game, a speaker can be seen to derive utility from the propagation of its own words, because later, in the receiver role, this agent will deal better with the social situation when it is able to interpret the linguistic situation. Ostensibly, it would be every agent's goal to avoid coherence with unfair partners if coherence renders an agent vulnerable to deception perpetrated by shared words. But coherence contributes to personal utility when cheaters can be detected, and this supports convergence in the face of deception. Although an opponent might use a word to deceive once, the word cannot be used against the same agent to cheat repeatedly if the meaning of the word is shared, since an agent who has been deceived will choose to disbelieve the message in the next round, if playing by tit-for-tat. Thus in the long run, comprehension of messages elevates receiver performance above chance, and it is in an agent's interest to share the words it knows, and to learn the words spoken by other players.
This way, the group lexicon serves as a neutral tool and as a sort of social contract, especially because it would be difficult for a single agent to deviate unilaterally from the agreed naming conventions. In this system, the language remains a constant fixture because the opportunity to brandish it for deceit is no greater than the opportunity to engage it for cooperation.
Figure 4. With only one free rider, lexical agreement and stability nearly match Fig. 1.

Figure 5. With three free riders, the ability to build agreement becomes greatly diminished.
5. Conclusion
In simulations guided by a model of selfish communication, we experimented by endowing agents with a tit-for-tat policy, as well as some other policies for guiding altruistic behavior. With tit-for-tat, the agents' selfishness did not impede lexical agreement. But without sufficient reciprocation, deception prevented consensus. These simulations show that peer-to-peer negotiation of conventions in language games remains viable in a social environment where deception is prevalent, as long as a socially-informed mechanism governs the agents' choices between cooperation and deception. Bootstrapping a symbolic system of communication can even occur in parallel with the formation of trust relations. This demonstrates that trust need not be permanent or unconditional for communication to develop and remain stable. Rather, reciprocity may serve as a proxy for honesty.

Acknowledgments
This research has been conducted at the AI Laboratory of the Vrije Universiteit Brussel, with funding from FWO project AL328. Emily Wang visited the AI Lab during the 2006-07 academic year on a Fulbright fellowship sponsored by the U.S. Department of State. We would like to thank both Pieter Wellens and Joris Bleys for their insights on Naming Game dynamics.

References
Axelrod, R. & Hamilton, W. (1981). The evolution of cooperation. Science, 211(4489), 1390-1396.
Crawford, V. P. & Sobel, J. (1982). Strategic information transmission. Econometrica, 50(6), 1431-1451.
Dessalles, J.-L. (2000). Language and hominid politics. In C. Knight, M. Studdert-Kennedy & J. Hurford (Eds.), The evolutionary emergence of language: Social function and the origins of linguistic form (pp. 62-79). Cambridge, UK: Cambridge University Press.
Knight, C. (1991). Blood relations: Menstruation and the origins of culture. New Haven, CT: Yale University Press.
Scott-Phillips, T. C. (2006). Why talk? Speaking as selfish behaviour. In Proceedings of the 6th international conference on the evolution of language (pp. 299-306).
Steels, L. (1995). A self-organizing spatial vocabulary. Artificial Life, 2(3), 319-332.
Steels, L. (2008). Sociality is a crucial prerequisite for the emergence of language. In R. Botha & C. Knight (Eds.), The cradle of language. Oxford, UK: Oxford University Press.
Trivers, R. (1971). The evolution of reciprocal altruism. Quarterly Review of Biology, 46, 35-57.
COPING WITH COMBINATORIAL UNCERTAINTY IN WORD LEARNING: A FLEXIBLE USAGE-BASED MODEL
PIETER WELLENS
VUB AI-Lab, Pleinlaan 2, 1050 Brussels, Belgium
[email protected]

Agents in the process of bootstrapping a shared lexicon face immense uncertainty. The problem that an agent cannot point to meaning but only to objects represents one of the core aspects of the problem. Even with a straightforward representation of meaning, such as a set of boolean features, the hypothesis space scales exponentially in the number of primitive features. Furthermore, data suggest that human learners grasp aspects of many novel words after only a few exposures. We propose a model that can handle the exponential increase in uncertainty and allows scaling towards very large meaning spaces. The key novelty is that word learning or bootstrapping should not be viewed as a mapping task, in which a set of forms is to be mapped onto a set of (predefined) concepts. Instead we view word learning as a process in which the representation of meaning gradually shapes itself, while being usable in interpretation and production almost instantly.
1. Introduction

Word learning is commonly viewed as a mapping task, in which the learner has to map a set of forms onto a set of concepts (Bloom, 2000; Siskind, 1996). While mapping might seem more straightforward than having to shape word meanings, it is in fact more difficult and lies at the root of many problems. The view that word learning corresponds to mapping forms onto concepts is commonly accompanied by claims that a learner is endowed with several biases (constraints) that guide him toward the right mapping (Markman, 1989). Whether these constraints are language-specific is yet another debate (Bloom, 2001). While this approach recognises the uncertainty, it largely circumvents it by invoking these constraints. Another possibility is to propose some form of cross-situational learning, where the learner enumerates all possible interpretations and prunes this set when new data arrives. This second approach would seem to have a problem explaining fast mapping, since it takes a large amount of time before the initial set of hypotheses can be pruned to such an extent that it becomes usable. To be clear, we are not unsympathetic to the idea of word learning constraints, but we believe that it is only when viewing word learning as mapping that the constraints become as inescapable as they seem. In this publication we try to
show that by trading the mapping view for a more organic, flexible approach to word learning (in line with Bowerman and Choi (2001)), the constraints become less cardinal. Moreover, the enormous diversity found in human natural languages (Haspelmath, Dryer, Gil, & Comrie, 2005; Levinson, 2001) and the subtleties in word use (Fillmore, 1977) suggest that language learners can make few a priori assumptions, and even if they could, they would still face a towering uncertainty when homing in on more subtle aspects of word meaning and use. Some developmental psychologists emphasize human proficiency in interpreting the intentions of others (Tomasello, 2003) or our endowment with a theory of mind (Bloom, 2000). While being supportive of these ideas, and even taking some for granted in our experimental set-up, it is important to understand that intention reading is not telepathy. It might scale down the problem, but not entirely solve it. Any of these skills has to be accompanied by a model capable of coping with immense uncertainty in large hypothesis spaces. Siskind (1996) and others propose models based on cross-situational learning to bootstrap a shared lexicon. Unlike the current experimental setup, their experiments do not address an exponential scale-up in the number of hypotheses. Other models, such as De Beule and Bergen (2006), Steels and Loetzsch (2007), and Steels and Kaplan (2000), in different ways allow exponential scaling but tend to keep the hypothesis space small. For example, the experiments in De Beule and Bergen (2006) are limited to 60 objects represented by 10 distinct features (there called predicates). These papers, however, do not address scale-up and therefore do not claim to handle it.
2. Overview of the model
Agents engage in series of guessing games (Steels, 2001). A guessing game is played by two agents, a randomly assigned speaker and hearer, sharing a joint attentional frame (the context). The speaker has to draw the hearer's attention to a randomly chosen object (the topic) using one or more words from its lexicon. After interpretation, the hearer points to the object he believes the speaker intended. In case of failure, the speaker corrects the hearer by pointing to the topic. To investigate referential uncertainty, which is the problem that an agent cannot point to meaning but only to objects, we must ensure that multiple equally valid interpretations exist upon hearing a novel word. It follows that explicit meaning transfer (i.e. telepathy) and a non-structured representation of meaning are to be avoided. Even with an elementary representation of meaning, such as sets of primitive features, the number of possible interpretations scales exponentially in the number of features, given that word meaning can be any subset of these features. (We do not claim such a representation to be realistic, but we believe it is the minimal requirement that suits our current needs for investigating the problem of referential uncertainty.) For example, upon hearing a novel word, sharing joint attention to an
Figure 1. Left: an association between form and meaning as is common in many models of lexicon formation, scoring the complete subset. Right: the refinement suggested in the proposed model, which is related to fuzzy sets and prototype theory.
object represented by 60 boolean features, and having no constraints to favour particular interpretations, the intended meaning could be any of 2^60 ≈ 1.153 × 10^18 possibilities. Confronted with numbers of such magnitude, one wonders how a learner, given a stable input language, ever succeeds in finding the intended meaning, let alone how a population of agents could bootstrap, from scratch, a shared lexical language. Word learning constraints seem to be the only viable way out. With the number of hypotheses per novel word well over the billions, a learner cannot enumerate these possibilities and score them separately; neither can he make a series of one-shot guesses and hope for the best, since finding the correct meaning would be like winning the lottery. The first step towards a solution is to include uncertainty in the representation of word meaning itself. This is done by keeping an (un)certainty score for every feature in a form-meaning association, instead of keeping only one scored link per word as in, for example, De Beule and Bergen (2006) (see figure 1). This representation is strongly related to both fuzzy set theory (Zadeh, 1965) and prototype theory (Rosch, 1973). A crucial difference with traditional cross-situational learning approaches is that this representation avoids the need to explicitly enumerate competing hypotheses. The key idea during language use is that a weighted similarity can be calculated between such representations. In the model we use a weighted overlap metric with the certainty scores as weights. In short, shared features increase similarity and the disjunct parts decrease it. Employing this similarity measure, production amounts to finding the combination of words whose meaning is most similar to the topic and least similar to the other objects in the context. This results in context-sensitive multi-word utterances and involves an implicit, on-the-fly discrimination using the lexicon.
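The weighted-overlap similarity and the discriminative production it supports can be sketched in a few lines of Python. This is our own minimal reconstruction, not the authors' implementation; the function names and the simple additive weighting (shared feature scores minus disjunct feature scores) are assumptions.

```python
from itertools import combinations

def similarity(meaning, obj):
    """Weighted overlap between a word meaning (dict: feature -> certainty
    score) and an object (set of boolean features that are present)."""
    shared = sum(score for feat, score in meaning.items() if feat in obj)
    disjunct = sum(score for feat, score in meaning.items() if feat not in obj)
    return shared - disjunct  # shared features raise similarity, disjunct lower it

def utterance_meaning(words, lexicon):
    """Fuzzy union of the meanings of the uttered words (max score per feature)."""
    union = {}
    for w in words:
        for feat, score in lexicon[w].items():
            union[feat] = max(union.get(feat, 0.0), score)
    return union

def produce(lexicon, topic, context, max_words=2):
    """Pick the word combination most similar to the topic and least similar
    to the other context objects: implicit on-the-fly discrimination."""
    best, best_score = (), float('-inf')
    for n in range(1, max_words + 1):
        for combo in combinations(lexicon, n):
            m = utterance_meaning(combo, lexicon)
            score = similarity(m, topic) - max(
                similarity(m, o) for o in context if o is not topic)
            if score > best_score:
                best, best_score = combo, score
    return best
```

With a toy lexicon of two words, `produce` picks the word whose scored features overlap the topic but not the distractor, illustrating the context sensitivity described above.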
The most important corollary of using a similarity measure is the great flexibility in word combination, especially in the beginning when the features have low certainty scores. Thanks to this flexibility, the agents can use (combinations of) words that do not fully conform to the meaning to be expressed, resembling what
Langacker (2002) calls extension. The ability to use linguistic items beyond their specification is a necessity in high-dimensional spaces to maintain a balance between lexicon size and coverage (expressiveness). Interpretation amounts to looking up the meanings of all uttered words, taking the fuzzy union of their features and measuring the similarity between this set and every object in the context. The hearer then points to the object with the highest similarity, again making interpretation flexible. Flexible use of words entails that in a usage event some parts of the meanings are beneficial and others are not. If all features of the used meanings were beneficial in expressing the topic, it would not be extension but instantiation, which is the exception rather than the rule. As Langacker (2002) puts it, extension entails "strain" in the use of the linguistic items, which in turn affects the meanings of the used linguistic items. This is operationalised by slightly shifting the certainty scores every time a word is used in production or interpretation. The certainty scores of the features that raised the similarity are incremented and the others are decremented, resembling the psychological phenomenon of entrenchment and its counterpart, erosion. Features with a certainty score equal to or less than 0 are removed, resulting in a more general word meaning. In failed games the hearer adds all unexpressed features of the topic to all uttered words, thus making the meanings of those words more specific. Combining similarity-based flexibility with entrenchment and erosion, word meanings gradually shape themselves to better conform to future use. Repeated over thousands of language games, the word meanings progressively refine and shift, capturing frequently co-occurring features (clusters) in the world, thus effectively implementing a search through the enormous hypothesis space and capturing what is functionally relevant.
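The entrenchment/erosion update and the failed-game repair might look like the following hedged sketch; the step size `DELTA` and the low initial score used in repair are our own assumptions, as the paper does not give concrete values.

```python
DELTA = 0.1  # assumed update step; not specified in the paper

def align(meaning, topic):
    """Shift certainty scores after a usage event: features shared with the
    topic raised the similarity and are entrenched; the rest erode.
    Features whose score drops to 0 or below are removed, so the word
    meaning becomes more general."""
    for feat in list(meaning):
        if feat in topic:
            meaning[feat] = min(1.0, meaning[feat] + DELTA)  # entrenchment
        else:
            meaning[feat] -= DELTA                           # erosion
            if meaning[feat] <= 0:
                del meaning[feat]  # removal generalises the meaning
    return meaning

def repair(lexicon, uttered, topic, expressed):
    """After a failed game, add all unexpressed topic features to every
    uttered word, making those word meanings more specific."""
    for word in uttered:
        for feat in topic - expressed:
            lexicon[word].setdefault(feat, 0.05)  # assumed low initial score
```

Repeated calls to `align` implement the gradual search through the hypothesis space: frequently co-occurring features survive, the rest erode away.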
Word invention is triggered when the speaker's best utterance cannot discriminate the chosen topic. To diagnose possible misinterpretation, the speaker interprets his own utterance before actually uttering it, a re-entrance step that is crucial in many models (Batali, 1998; Steels, 2003). Given that his lexicon is not expressive enough, the speaker invents a new form (a random string) and associates with it, at very low initial certainty scores, all so far unexpressed features of the topic. Because word meanings can shift, it might not be necessary to introduce a new word; chances are that the lexicon just needs a bit more time to be shaped further. Therefore, the more similar the meaning of the utterance is to the topic, the less likely it is that a new word will be introduced. The hearer, when adopting novel words, first interprets all known words and then associates, again with very low certainty scores, all unexpressed features with all novel forms.
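Invention and adoption can then be sketched as follows. This is our own illustration: the form length, the initial certainty value and the helper names are hypothetical, and the probabilistic suppression of invention (the more similar the utterance, the less likely a new word) is omitted for brevity.

```python
import random
import string

INITIAL = 0.05  # assumed very low initial certainty score

def invent(lexicon, topic, expressed):
    """Speaker side: create a random form and associate with it all
    so-far unexpressed features of the topic."""
    form = ''.join(random.choices(string.ascii_lowercase, k=6))
    lexicon[form] = {feat: INITIAL for feat in topic - expressed}
    return form

def adopt(lexicon, novel_forms, topic, interpreted):
    """Hearer side: associate every topic feature not covered by the
    known words with every novel form, at very low certainty."""
    for form in novel_forms:
        lexicon.setdefault(form, {}).update(
            {feat: INITIAL for feat in topic - interpreted})
```

Both operations start hypotheses at low certainty, leaving entrenchment and erosion to sort out which features actually belong to the word.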
3. Experimental results In the multi-agent experimental setup we use a population of 25 agents endowed with the capacities described in the previous section. Machine learning data-sets
Figure 2. Left: performance of the proposed model on the small world (averaged over 5 runs); right: on the much larger world (averaged over 3 runs). Although the number of hypotheses scales exponentially, the agents attain high levels of communicative success and lexicon coherence while keeping a reasonable lexicon size.
are used to obtain the large meaning spaces required to verify the claim that the model can scale to large hypothesis spaces. We use both a small data-set containing only 32 objects represented by 10 boolean features, with context sizes between 4 and 10 objects, and a much larger data-set comprising 8124 objects represented by a total of 100 distinct boolean features, with context sizes between 5 and 20 objects (Asuncion & Newman, 2007). This larger data-set confronts the agents with an incredible amount of uncertainty, but the results (figure 2) show that the model can manage this. The following measures are depicted:
Communicative Success (left axis): A running average (window of 500) of communicative success as measured by the agents. A game is considered successful if the hearer points to the correct topic. It is therefore different from communicative accuracy as in Vogt and Divina (2007) and Siskind (1996).
Lexicon Size (right axis): The average number of words in the lexicons of the agents.
Lexicon Coherence (left axis): The similarity (using the same similarity measure the agents use) between the lexicons of the agents. A coherence of 1 indicates that all agents have associated exactly the same features with every word. It makes sense for coherence to be lower than 1, since agents do not need to have exactly the same meanings in order to communicate successfully. The agents will not be aware of their (slightly) different meanings until a particular usage event confronts them with the difference.
As a comparison, we ran a model that does not score the individual features, but instead keeps a score for the meaning as a whole, as in figure 1 (left). It does not employ a similarity measure, and updates scores based on communicative success instead of the more subtle entrenchment and erosion effects. Results show (figure
Figure 3. Both graphs show the performance of a model that does not score the individual features and does not use a similarity measure. Left: the small meaning space; right: the larger space. The model achieves success on the small one, but fails to scale to the larger meaning space.
3) that the population can bootstrap a shared lexicon for small meaning spaces but cannot handle the scale-up to the larger world. Also note that even in the small world, the agents using this second model reach only 20% communicative success by game 20000, whereas with the proposed model they have already attained close to 99% communicative success by then. Data from developmental psychology suggest that human learners can infer aspects of the meaning of a novel word after only a few exposures. The graphs in figure 2 do not give us any insight into these issues, as they show the average of a population in the process of bootstrapping a lexicon. By adding a new agent to a population that has already conventionalised a shared lexicon, we are able to shed light on the behaviour of the proposed model in this regard. We use the large world (8124 objects, 100 features) and a stabilised population with an average lexicon size of some 100 words, and measure for the newly added agent the average success in interpretation in relation to the number of exposures to a word (see figure 4). Due to the way success is measured, the first exposure is always a failure, so average success there is zero. Remarkably, on the second exposure 64% of the novel words are already used in a successful interpretation. Further exposures gradually improve this result, and by the tenth exposure 70% of the words result in a successful interpretation. This is all the more striking given that the other members of the population are unaware that they are talking to a new agent, and thus use multi-word utterances, including difficult-to-grasp words.

Figure 4. Average success in interpretation (i.e. the new agent pointed correctly) of all words, in relation to the number of exposures, for one new agent added to a stabilised population. The average success in interpretation at the second exposure to a novel word is already 64%.

4. Conclusion
The proposed model tries to capture and bring together some insights from cognitive linguistics (Langacker, 2002) and other computational models (Batali, 1998; Steels & Belpaeme, 2005; De Beule & Bergen, 2006), while taking for granted insights from developmental psychology (Tomasello, 2003) and criticising assumptions made by others (Bloom, 2000; Markman, 1989). The main strength of modelling is that it can operationalise ideas, and so our main goal has been to show that a more organic view of word learning, combined with flexible language representation, use and alignment, results in a powerful idea, both for scaling to very large hypothesis spaces and for arriving at operational interpretations after very few exposures. Although our model can be interpreted as Whorfian, this is only so if one assumes that word meanings and concepts are one and the same. We did not make this assumption and take no position regarding the relation between concepts and word meanings.
Acknowledgements The research reported here has been conducted at the Artificial Intelligence Laboratory of the Vrije Universiteit Brussel (VUB). Pieter Wellens is funded by FWOAL328. I would like to thank my supervisor Luc Steels and the referees for their useful comments.
References
Asuncion, A., & Newman, D. (2007). UCI machine learning repository.
Batali, J. (1998). Computational simulations of the emergence of grammar. In J. R. Hurford, M. Studdert-Kennedy, & C. Knight (Eds.), Approaches to the evolution of language: Social and cognitive bases. Cambridge: Cambridge University Press.
Bloom, P. (2000). How children learn the meanings of words. MIT Press.
Bloom, P. (2001). Roots of word learning. In M. Bowerman & S. C. Levinson (Eds.), Language acquisition and conceptual development (pp. 159-181). Cambridge: Cambridge University Press.
Bowerman, M., & Choi, S. (2001). Shaping meanings for language: Universal and language-specific in the acquisition of spatial semantic categories. In M. Bowerman & S. C. Levinson (Eds.), Language acquisition and conceptual development (pp. 132-158). Cambridge: Cambridge University Press.
De Beule, J., & Bergen, B. K. (2006). On the emergence of compositionality. In Proceedings of the 6th evolution of language conference (pp. 35-42).
Fillmore, C. J. (1977). Scenes-and-frames semantics. In A. Zampolli (Ed.), Linguistic structures processing (pp. 55-81). Amsterdam: North-Holland.
Haspelmath, M., Dryer, M., Gil, D., & Comrie, B. (Eds.). (2005). The world atlas of language structures. Oxford: Oxford University Press.
Langacker, R. W. (2002). A dynamic usage-based model. In Usage based models of language. Stanford, CA: CSLI Publications.
Levinson, S. C. (2001). Language and mind: Let's get the issues straight! In M. Bowerman & S. C. Levinson (Eds.), Language acquisition and conceptual development (pp. 25-46). Cambridge: Cambridge University Press.
Markman, E. (1989). Categorization and naming in children: Problems of induction. Cambridge, MA: Bradford/MIT Press.
Rosch, E. (1973). Natural categories. Cognitive Psychology, 7, 573-605.
Siskind, J. (1996). A computational study of cross-situational techniques for learning word-to-meaning mappings. Cognition, 61, 39-91.
Steels, L. (2001). Grounding symbols through evolutionary language games. In A. Cangelosi & D. Parisi (Eds.), Simulating the evolution of language (pp. 211-226). London: Springer Verlag.
Steels, L. (2003). Language re-entrance and the inner voice. Journal of Consciousness Studies, 10, 173-185.
Steels, L., & Belpaeme, T. (2005). Coordinating perceptually grounded categories through language: A case study for colour.
REMOVING ‘MIND-READING’ FROM THE ITERATED LEARNING MODEL
S. F. WORGAN AND R. I. DAMPER
Information: Signals, Images, Systems (ISIS) Research Group
School of Electronics and Computer Science
University of Southampton
Southampton SO17 1BJ, UK
{swZOSr/rid}@ecs.soton.ac.uk
The iterated learning model (ILM), in which a language comes about via communication pressures exerted over successive generations of agents, has attracted much attention in recent years. Its importance lies in its focus on cultural emergence as opposed to biological evolution. The ILM simplifies a compositional language as the compression of an object space, motivated by a poverty of stimulus, as not all objects in the space will be encountered by an individual in its lifetime. However, in the original ILM, every agent 'magically' has a complete understanding of the surrounding object space, which weakens the relevance to natural language evolution. In this paper, we define each agent's meaning space as an internal self-organising map, allowing it to remain personal and potentially unique. This strengthens the parallels to real language, as the agent's omniscience and 'mind-reading' abilities that feature in the original ILM are removed. Additionally, this improvement motivates the compression of the language through a poverty of memory as well as a poverty of stimulus. Analysis of our new implementation shows maintenance of a compositional (structured) language. The effect of a (previously implicit) generalisation parameter is also analysed; when each agent is able to generalise over a larger number of objects, a more stable compositional language emerges.
1. Introduction
Hypothesising that language is a system of compression, driven to adjust itself so that it can be learned by the next generation, is a relatively new approach in the field of linguistics. Several important simulations (Kirby & Hurford, 1997; Kirby, 2001, 2002; Brighton, 2002; Smith, Kirby, & Brighton, 2003) have illustrated its potential and provide an alternative to established innate accounts of language (Chomsky, 1975; Bever & Montalbetti, 2002; Hauser, Chomsky, & Fitch, 2002). Currently, existing versions of this iterated learning model (ILM) suffer from a number of shortcomings, highlighted by Smith (2005), Vogt (2005) and Steels and Wellens (2006). This paper will address some of these while maintaining the positive features of the model. In the classical ILM, an agent selects an object from its environment and produces a meaning-signal pair that is directly perceived by a listener. The pairing
is formed through a weighted connection between a meaning node and a signal node, and is used to adjust the weighted connections between the meaning space and the signal space of the listening agent. In this way, a language evolves across a number of generations. If each agent is only given the associated signal for a small subset of possible objects, it is forced to generalise across the remaining object space, so promoting the formation of a stable compositional language.
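As a minimal illustration of this classical set-up (our own sketch, not Smith et al.'s implementation; the sizes and the simple counting update are assumptions), the association network can be written as a weight matrix between meaning and signal nodes:

```python
N_MEANINGS, N_SIGNALS = 16, 16

# Weighted connections between meaning nodes (rows) and signal nodes (columns).
weights = [[0.0] * N_SIGNALS for _ in range(N_MEANINGS)]

def observe(meaning, signal):
    """The learner directly observes the meaning-signal pair (the
    'mind-reading' step) and strengthens that connection."""
    weights[meaning][signal] += 1.0

def produce_signal(meaning):
    """Production: emit the signal with the strongest connection
    to the selected meaning."""
    row = weights[meaning]
    return row.index(max(row))
```

In the iterated setting, the learner calls `observe` for only a subset of meanings (the transmission bottleneck) and must rely on the structure of the weight matrix to generalise to the rest.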
2. Shortcomings of the Iterated Learning Approach
In the ILM, the agents' meaning space loosely represents the 'mind' of a language user. In many respects, however, this analogy breaks down, as each agent is created with a perfect knowledge of the surrounding object space, which is never found in reality. We need to consider the nature of the object space and the agents' ability to generalise across it. Also, a learning agent directly observes each meaning-signal pair, and this introduces an element of 'mind-reading', as the learner knows exactly what the adult teacher was thinking when it produced a signal. Obviously, this weakens the ILM's credentials as a simulation of cultural language evolution. Kirby (2002, p. 197) himself acknowledges this criticism, writing "the ready availability of signals with meanings neatly attached to them reduces the credibility of any results derived from these models", whereas Smith et al. (2003, p. 374) write: "This is obviously an oversimplification of the task facing language learners." We aim to develop a new ILM to address these criticisms. The iterated learning approach yields a language, able to describe every object found in the object space, N, through a process of compression governed by a form of generalisation. This compression is possible by forming a compositional language, which describes common features of objects in the space. Figure 1(a) illustrates how a compositional meaning node is able to partially define a number of objects. In the original ILM, this is automatically determined by the number of values, V, in the object space; e.g., in Fig. 1(a) each compositional meaning node is able to partially define V = 4 objects. An implicit generalisation parameter γ then determines the proportion of these V values that each meaning node can generalise over: in Fig. 1(a), γ = 1. This parameter, ignored in previous work, impacts significantly on the structure of the final compositional language.
To understand the role of the environment in the emergence of language, we need to consider what happens when the generalisation parameter γ is not equal to 1. Figure 1(b) shows the compression which results from halving the now-explicit generalisation parameter. We see that 4 meaning nodes, rather than 2 as previously, are now required to specify the same number of object nodes (i.e., poorer generalisation). In this example, γ = 0.25 would correspond to a holistic, non-compositional language (i.e., no generalisation). Having acknowledged the role of this previously implicit generalisation parameter, we are now able to remove the 'mind-reading' abstraction from our
Figure 1. In an ILM, the object space is defined by the number of object values V in each of F dimensions. In this example, F = 2 and V = 4. In the original ILM, shown in (a), the generalisation parameter γ, representing a proportion of object values, is implicitly set to 1. By varying γ as in (b), where γ = 0.5, we can vary the level of compression that each compositional meaning node can achieve.
simulations. To do this, we will define the agent's meaning space as a self-organising map (SOM) and γ as a radius around a selected object, removing the two criticisms of IL stated above. An agent no longer has complete and perfect knowledge of the object space, and this knowledge remains private, so that each agent develops a different 'understanding' of its linguistic environment.
3. Self-organising Maps and Iterated Learning
Self-organising maps (Kohonen, 1982) have previously been used to good effect to model emergent phonology (e.g., Guenther & Gjaja, 1996; Oudeyer, 2005; Worgan & Damper, 2007). In the present work, SOMs offer a way to model each agent's unique and private understanding of its environment. Our model is based on the neural network model of Smith et al. (2003, Sect. 4.2.1), but with important differences motivated by the discussion of Section 2 and described explicitly in this section. In this environment, an object can be defined as, e.g., x_k = {1, 2}, and in the meaning space as m_j = {1, 2}. Equivalently, it can be defined as the pair:
m'_j = {1, *}    m'_{j+1} = {*, 2}
where * represents a wildcard. In this example, m_j forms a holistic signal, as this individual meaning node is only capable of defining one object, whereas m'_j and m'_{j+1} together form a compositional signal, as features from the object space are defined by the two meaning nodes and can be combined to define an individual object. These feature definitions can then be used in other combinations to describe other objects. We will maintain this aspect of traditional IL by redefining generalisation as a variable radius around a perceived object. The weightings on the connections between nodes of the meaning and signal spaces determine the mapping from meaning-to-signal and from signal-
to-meaning. The object space, N, that each agent talks about is represented by a simple coordinate system, and a subset of these coordinates is drawn from the object space according to a uniform probability distribution. Each object in turn is mapped directly to the appropriate meaning node in the agent's meaning space. The signals, l_i, are generated by mapping from this meaning space to the signal space, and are represented as characters from an alphabet, Σ, as:

l_i = {(s_1, s_2, ..., s_i, ..., s_l) : s_i ∈ Σ, 1 ≤ l ≤ l_max}    (1)
from which it is clear that we need a sufficient number of signal nodes to express any of the nodes in the meaning space. Formally, the object space is:

N = {x_1, x_2, ..., x_k, ..., x_N}

with

x_k = {(f_1, f_2, ..., f_i, ..., f_F) : 1 ≤ f_i ≤ V}
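The wildcard notation can be made concrete with a small sketch (our own illustration; the function names are hypothetical):

```python
WILD = '*'

def matches(meaning, obj):
    """A meaning node covers an object if every non-wildcard feature agrees."""
    return all(m == WILD or m == f for m, f in zip(meaning, obj))

def compose(parts):
    """Combine compositional meaning nodes feature-wise: a position is fixed
    if any part fixes it, otherwise it stays a wildcard."""
    combined = []
    for values in zip(*parts):
        fixed = [v for v in values if v != WILD]
        combined.append(fixed[0] if fixed else WILD)
    return tuple(combined)
```

For the earlier example, `compose([(1, WILD), (WILD, 2)])` recovers the object `(1, 2)`, while the holistic node `(1, 2)` matches only that single object.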
When required to produce an utterance, an agent will select an object x_k, and each node in the meaning space m_j competes to have the shortest euclidean distance from this point. Formally, if we define the closest node as m(x_k), then:

m(x_k) = argmin_j ||x_k − m_j||,    j = 1, 2, ..., l    (2)
The winning node is then moved closer to the selected point, better defining the object space as a whole. In addition, neighbouring nodes are moved somewhat closer to the object, allowing the network as a whole to represent the experienced object space. The extent to which these nodes move is determined by a gaussian function, h_{j,k}, centred around the selected object (Haykin, 1999, p. 449):

h_{j,k} = exp(−d²_{j,k} / 2σ²)    (3)

with

σ = γ    (4)
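Equations (2) to (4) amount to a standard SOM update, which might be sketched as follows (our own reconstruction; the learning rate `ETA` is an assumption, as the paper does not give one):

```python
import math

ETA = 0.1  # assumed learning rate

def update_som(meaning_nodes, x, gamma):
    """Find the winner by euclidean distance (eq. 2), then move the winner
    and, to a gaussian-weighted lesser extent (eqs. 3-4, with sigma = gamma),
    its neighbours towards the selected object x."""
    winner = min(meaning_nodes, key=lambda m: math.dist(m, x))
    for k, m in enumerate(meaning_nodes):
        d_jk = math.dist(winner, m)                  # distance to the winner
        h = math.exp(-d_jk ** 2 / (2 * gamma ** 2))  # neighbourhood strength
        meaning_nodes[k] = tuple(mi + ETA * h * (xi - mi)
                                 for mi, xi in zip(m, x))
    return meaning_nodes
```

Because only encountered objects trigger updates, each agent's map comes to reflect its own, possibly unique, experience of the object space.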
where d_{j,k} is the distance between the winning neuron j and the excited neuron k. To form a compositional signal, we build valid decomposition sets from the meaning space, governed by the generalisation parameter, γ. We can then define a set, K_k, containing all of those meaning nodes which fall inside the radius around x_k. Formally:

K_k = {m_j : ||x_k − m_j|| ≤ γ}
Considering all possible decompositions in turn, the agent will pick the signal with the highest combination of corresponding weight values, which is similar to Smith et al.'s equation on p. 380, in that w(K(x)_j) ". . . is a weighting function which gives the non-wildcard proportion of . . ." K(x)_j, so favouring compositional meaning nodes. All meaning and signal nodes that correspond to a possible decomposition of the object are activated, with activations a_{s_i} and a_{m_j}, respectively. If two active nodes are connected, the weight on that connection is increased. If there is a connection between an active node and an inactive node, the weight is decreased. Weights between two inactive nodes remain unchanged. The learning displayed by this Hebbian network can be formalised as follows:
ΔW_{ij} =  +1  if a_{s_i} = a_{m_j} = 1
           −1  if a_{s_i} ≠ a_{m_j}
            0  otherwise        (6)
where ΔW_{ij} is the weight change at the intersection between s_i and m_j, with s_i ∈ N_S and m_j ∈ N_M. While listening to each utterance, the weight values of the agent are adjusted, extending its knowledge of the current language. This allows it to generalise to objects it has not encountered before, resulting in a meaningful expression. Therefore, a poverty of stimulus causes the language to generalise across an object space. Additionally, by having a limited number of nodes form the meaning space, the agent does not have an infinite memory resource to draw upon, forcing compression through limited memory as well as limited stimuli. Using this model, we will vary γ in order to assess how this affects the stability, S, of the final compositional language (equation 7), where S_c represents the proportion of compositional languages and S_h the proportion of holistic languages which emerge over cultural time. The higher the value of S, the more likely a compositional language is to emerge; see Smith et al. (2003, p. 377). In the new model, each agent's meaning space is undefined at birth (randomly initialised), and the agent needs to learn the structure of the object space as each object is encountered. Consequently, the meaning space gradually comes to comprehend the object space, but also remains potentially unique to each agent, as a different subset of objects is encountered.

4. Results
We first ran the new SOM iterated learning model under the same conditions as the previous implementation (see Figure 2). As we can see from the results, compositional languages emerge (S > 0.5) under a similar set of circumstances
to those of Smith et al.'s (2003) previous implementation. Therefore, the requirements for a tight bottleneck and a structured meaning space remain in this implementation.

Figure 2. Stability of the resulting languages, calculated according to equation 7, when each agent is exposed to (a) 10% or (b) 90% of the object space (Smith et al.'s "bottleneck" parameter).
Next, we considered the effect of varying the generalisation parameter, γ, as shown in Figure 3. The higher the generalisation, the greater the stability, S, of the compositional language; conversely, the lower the generalisation, the lower the stability. This highlights the importance of the previously implicit generalisation parameter for the final stability of the compositional language. Accordingly, a reasonable level of generalisation is required to enable cultural emergence.
Figure 3. Stability of the resulting languages when each agent is exposed to 10% of the object space, with different degrees of generalisation: (a) γ = 2, (b) γ = 0.5. Here γ has been reformulated as a gaussian width, as shown in equations 3 and 4.
Figure 4 shows how structuring the object space allows each meaning node to generalise over a greater number of objects, increasing the stability S. As we can see, the potential generalisation of each meaning node is less effective when fewer objects are located in each generalisation area: the compositional meaning node can only generalise across two objects in the unstructured object space of Fig. 4(b). This gives us greater insight into Smith et al.'s (2003) comparison of structured and unstructured meaning spaces. By considering these results in terms of γ, we can see how these meaning spaces indirectly affect the level of potential generalisation.
Figure 4. (a) Structured space; (b) unstructured space. In a structured object space, each meaning node generalises over a greater number of objects.
5. Conclusions
In this paper, we have addressed some criticisms of the well-known iterated learning model of cultural language emergence, most notably the 'mind-reading' aspect of earlier ILM implementations. This was achieved by using self-organising maps to model each agent's meaning space. The result is a closer analogy to real cognitive spaces. Specifically, the meaning spaces are limited in the amount of memory resource they have available, and are not omniscient; rather, they are private and unique to each agent. The SOM does not have a high enough capacity to completely define the agents' environment, forming a further motivation to generalise. We have made explicit the generalisation parameter that was previously implicit in earlier ILMs and demonstrated its role in promoting the emergence of compositionality. As well as being unique to each individual, the learning displayed by the SOM demonstrates another property of real language learners: namely, change over time with each newly encountered object. These enhancements to the classical iterated learning framework are gained without compromising the essential tenets of the paradigm. As with the classical framework, stable, compositional languages emerge through use (i.e., inter-agent communication related to structured object spaces) over cultural time. Further, the poverty of stimulus encountered both in reality and in our simulations remains essential to the evolution of a structured language, rather than being a 'problem' as in the Chomskyan tradition. Although in this work we have relaxed or removed some of the weakening assumptions of the classical ILM, much remains to be done. There are still many strong simplifications and abstractions concerning the nature of language and communication in our computer simulations. One important direction
for future work is to move towards acoustic (‘speech’) communication: having agents produce and perceive sounds coupled to meaning, as suggested by Worgan and Damper (2007).
References

Bever, T., & Montalbetti, M. (2002). Noam’s ark. Science, 298(22), 1565-1566.

Brighton, H. (2002). Compositional syntax from cultural transmission. Artificial Life, 8(1), 25-54.

Chomsky, N. (1975). Reflections on language. New York, NY: Pantheon.

Guenther, F. H., & Gjaja, M. N. (1996). The perceptual magnet effect as an emergent property of neural map formation. Journal of the Acoustical Society of America, 100(2), 1111-1121.

Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298(22), 1569-1579.

Haykin, S. (1999). Neural networks: A comprehensive foundation (2nd ed.). Upper Saddle River, NJ: Prentice Hall.

Kirby, S. (2001). Spontaneous evolution of linguistic structure: An iterated learning model of the emergence of regularity and irregularity. IEEE Transactions on Evolutionary Computation, 5(2), 102-110.

Kirby, S. (2002). Natural language from artificial life. Artificial Life, 8(2), 185-215.

Kirby, S., & Hurford, J. (1997). Learning, culture and evolution in the origin of linguistic constraints. In P. Husbands & I. Harvey (Eds.), Fourth European Conference on Artificial Life (pp. 493-503). Cambridge, MA: MIT Press.

Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43(1), 59-69.

Oudeyer, P.-Y. (2005). The self-organization of speech sounds. Journal of Theoretical Biology, 233(3), 435-449.

Smith, A. D. M. (2005). The inferential transmission of language. Adaptive Behaviour, 13(4), 311-324.

Smith, K., Kirby, S., & Brighton, H. (2003). Iterated learning: A framework for the emergence of language. Artificial Life, 9(4), 371-386.

Steels, L., & Wellens, P. (2006). How grammar emerges to dampen combinatorial search in parsing. In Third International Symposium on the Emergence and Evolution of Linguistic Communication (EELC 2006). Published in Symbol Grounding and Beyond, Springer Verlag LNAI Vol. 4211, pp. 76-88.
Vogt, P. (2005). The emergence of compositional structures in perceptually grounded language games. Artificial Intelligence, 167(1-2), 206-242.

Worgan, S. F., & Damper, R. I. (2007). Grounding symbols in the physics of speech communication. Interaction Studies, 8(1), 7-30.
HOW DOES NICHE CONSTRUCTION IN LEARNING ENVIRONMENT TRIGGER THE REVERSE BALDWIN EFFECT?
HAJIME YAMAUCHI
School of Information, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa
[email protected]

Deacon (2003) has suggested that one of the key factors of language evolution is characterized not by an increase of genetic contribution, often known as the Baldwin effect, but rather the opposite: a decrease of the contribution. This process is named the reverse Baldwin effect. In this paper, we examine how a subprocess of the reverse Baldwin effect can be triggered by the niche-constructing aspect of language.
1. Introduction

While the Baldwin effect describes how previously learnt knowledge becomes a part of innate knowledge, according to Deacon, under some circumstances innate knowledge can be replaced by more plastic, learnt knowledge. As this process seemingly follows the opposite flow of what the Baldwin effect describes, he called it the “reverse Baldwin effect” (Deacon, 2003). The effect is thought to have strong explanatory power, and has already been applied to explain such phenomena as the mysterious loss of the ability to synthesize vitamin C in the primate lineage (Deacon, 2003). This paper will present how the niche-constructing aspect of language evolution serves as one of the key mechanisms necessary for the reverse Baldwin effect, without assuming, as Deacon has, that externally motivated changes in environmental conditions (like climate changes) would take place.
2. Masking and Unmasking processes

Unlike the Baldwin effect, where a simple interaction between learning and evolution produces a complex evolutionary process, the reverse Baldwin effect consists of two distinct processes which take place serially. These subprocesses are called the “Masking” and the “Unmasking” effects, respectively. The masking effect is triggered by an environmental change shielding an extant selective pressure, which neutralizes genetic differences. This neutrality permits the genes to drift. The unmasking effect states that after a long period of this neutralization, another environmental change takes place, this time bringing back the original selective
pressure. Because of the drift, the population has to develop other ways to deal with the change. Wiles, Watson, Tonkes, and Deacon (2005) demonstrate that this increases the overall phenotypic plasticity of individuals; hence it is called the reverse Baldwin effect. Given the potential explanatory power of the reverse Baldwin effect, Deacon (2003) envisages that it could play a significant role in language evolution. However, it is apparent that, for the reverse Baldwin effect to take place, there needs to be some causal agent to induce at least the masking effect. In the case of vitamin C, it was the warm climate (and abundant fruits). Deacon considers that the potential masking agent in language evolution is its niche-constructing process. However, it is unclear quite how the niche-constructing process comes into play as regards the masking effect.

3. Computer simulation
In order to examine how the niche-constructing property of language induces the masking effect, we set up an agent-based computer simulation based on Yamauchi (2004). In the simulation, agents in the same generation attempt to establish communications with their learnt grammar (i.e., I-language), which constructs a normative social niche (i.e., E-language); this works as a selective environment, determining the agents’ fitness. The E-language becomes the next generation’s learning environment, from which learning agents receive linguistic inputs. As such, information in a given I-language is transmitted vertically through the channels of learning and genes. During learning, if a linguistic input cannot be parsed with the agent’s current grammar, she changes her grammar so as to be able to parse it. The cost of such modifications is calculated based on what type of genetic information she has: if her genetic information is consistent with the input, the cost will be less than when it is inconsistent with the input.
3.1. Model Structure
1. The Agent An agent has a chromosome containing 12 genes coding the innate linguistic knowledge. There are two possible allelic values: 0 and 1. The initial gene pool consists of randomly assigned 0s and 1s. A grammar is coded as a ternary string, and the length of the string is 12, equal to the size of the chromosome. The three possible allelic values are 0, 1 and NULL. Wherever there is a NULL allele in the grammar, that part of the grammar is considered not to code any linguistic knowledge. Therefore, the more NULL alleles there are in a grammar, the smaller the envelope of one’s language. The agent is equipped with a cognitive capacity which enables her to update her grammar when it cannot parse an incoming linguistic input. Also, with this cognitive capacity she can partially invent her own
knowledge of grammar. The energy resource of this capacity is limited, and its size is set to 24 in this particular simulation.
2. Learning Every agent in every generation is born with a completely empty grammar; all 12 alleles are NULL. Learning is the process of updating such NULL alleles to substantial alleles (i.e., 0s and 1s). A learning agent sequentially receives linguistic inputs from 5 adult neighbors. Adults are the agents from the previous generation. A linguistic input is thought of as an utterance by an adult, represented by one allele of her mature grammar. Utterances derived from NULL alleles are considered NULL utterances, and no learning (thus no grammar update) takes place. The algorithm to develop the grammar is as follows. Whenever the learner receives a linguistic input:

1. If the input value and the allelic value of the corresponding locus of the learner’s grammar are different (i.e., the input is not parsable), carry out the following procedures:
(a) If the corresponding allele of the chromosome “matches” the input (i.e., the two values are the same), update the given allele of the current grammar, and subtract 1 point from the energy resource.
(b) If the corresponding allele of the innate linguistic knowledge is different from the input, update the given allele of the current grammar, and subtract 4 points from the energy resource.
2. Otherwise, keep the current grammar.

The subtractions from the energy resource are thought of as the internal cost of learning. It is internal in that it does not directly affect an individual’s fitness value. The learning procedure stops when either the energy resource reaches 0, or the number of inputs reaches 120 (the critical period). NULL utterances count towards this total. Any locus of the grammar not receiving any input (or receiving only NULL utterances) remains NULL. Which adult makes an utterance, and which part of her grammar is provided as an input, is totally random. This means that if the adults have very different grammars, the learner may update a given allele of her grammar frequently.
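The learning procedure above can be rendered as a short sketch. This is an illustrative Python implementation under the parameter values stated in the text (12 loci, 5 adult neighbors, an energy resource of 24, a critical period of 120 inputs); all function and variable names are our own, not the author's.

```python
import random

NULL = None            # a locus that codes no linguistic knowledge
GRAMMAR_SIZE = 12      # loci per grammar, equal to the chromosome size
ENERGY = 24            # initial energy resource of the learning capacity
CRITICAL_PERIOD = 120  # maximum number of inputs (NULL utterances included)

def learn(chromosome, adults, rng=random):
    """Develop a grammar from adult utterances.

    `chromosome` is a list of 12 innate alleles (0/1); `adults` holds the
    mature grammars (lists of 0/1/NULL) of the 5 neighboring adults.
    """
    grammar = [NULL] * GRAMMAR_SIZE       # every learner starts empty
    energy = ENERGY
    for _ in range(CRITICAL_PERIOD):
        if energy <= 0:                   # resource exhausted: stop learning
            break
        adult = rng.choice(adults)            # random speaker ...
        locus = rng.randrange(GRAMMAR_SIZE)   # ... and random locus
        utterance = adult[locus]
        if utterance is NULL:
            continue                      # NULL utterances trigger no update
        if grammar[locus] != utterance:   # input not parsable: update
            # cheaper when the innate allele already matches the input
            energy -= 1 if chromosome[locus] == utterance else 4
            grammar[locus] = utterance
    return grammar, energy
```

With coherent adults and a matching chromosome, the learner converges on the community grammar at minimal cost; a mismatched chromosome burns through the energy resource four times as fast.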
3. Invention Agents are capable of inventing their own grammar. If an agent still holds NULL alleles in her grammar after learning has taken place, and if her energy resource has not yet reached 0, then with a probability of 0.01 she picks one NULL allele randomly, sets it to either 0 or 1 at random, and subtracts 1 point from the resource. This process is carried out until either no more NULL alleles remain in the grammar, or the resource reaches 0. Once
the invention process is over, her grammar is considered to have reached a mature state, and no more grammar updates take place.
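The invention step can be sketched in the same style; the 0.01 probability and the 1-point cost are taken from the text, while the names are again ours.

```python
import random

NULL = None

def invent(grammar, energy, p=0.01, rng=random):
    """Fill remaining NULL alleles after learning.

    With probability `p` per attempt, one randomly chosen NULL allele is
    set to 0 or 1 at a cost of 1 energy point. The process runs until no
    NULL alleles remain or the energy resource reaches 0; the grammar is
    then mature and receives no further updates.
    """
    grammar = list(grammar)
    while NULL in grammar and energy > 0:
        if rng.random() < p:
            locus = rng.choice([i for i, a in enumerate(grammar) if a is NULL])
            grammar[locus] = rng.choice((0, 1))
            energy -= 1
    return grammar, energy
```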
4. Communication Each agent is involved in 6 communicative acts with her immediate neighbor peers. Fitness increases by 1 for each parsable utterance made with a mature grammar and spoken to a hearer (it benefits both the speaker and the hearer). The representation of an utterance is the same as for learning input. As each neighbor also speaks to each agent the same number of times, a total of 12 communicative acts are involved in gauging her fitness. The maximum fitness value is 13, as those who cannot establish any communication still receive a fitness score of 1 in order to maintain the possibility of being a parent in Reproduction.
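The fitness calculation might look as follows. This sketch assumes the one-dimensional loop topology described under Reproduction below, and assumes the 6 speaking acts are split evenly, 3 to each of the two immediate neighbors; both assumptions are our reading of the text, and all names are illustrative.

```python
import random

NULL = None

def communicative_fitness(grammars, acts_per_neighbour=3, rng=random):
    """Fitness from local communication on a one-dimensional loop.

    Each agent speaks 3 utterances to each of her two immediate
    neighbours (6 speaking acts); a parsable utterance adds 1 to both
    speaker and hearer, so 12 acts gauge each agent's fitness. Everyone
    keeps a baseline of 1, for a maximum of 13.
    """
    n = len(grammars)
    fitness = [1] * n    # baseline keeps every agent eligible as a parent
    for s in range(n):
        for h in ((s - 1) % n, (s + 1) % n):    # immediate neighbours
            for _ in range(acts_per_neighbour):
                locus = rng.randrange(len(grammars[s]))
                utterance = grammars[s][locus]
                # parsable iff the hearer holds the same non-NULL allele
                if utterance is not NULL and grammars[h][locus] == utterance:
                    fitness[s] += 1
                    fitness[h] += 1
    return fitness
```

In a fully converged community every act is parsable, so every agent scores the maximum of 13.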
5. Reproduction Rank selection is used for selecting parents according to their fitness: the top 50% of agents can be selected with equal probability. Single-point crossover is used, and the mutation rate is set to 0.001 per allele. In the simulation, 200 agents are spatially organized: individuals are placed on a one-dimensional loop (thus one of the immediate neighbors of the 200th agent is the 1st agent). Incidences of communication only take place within a generation, and are local, since an individual attempts to communicate with her two immediate neighbor peers. While communication is an adult-to-adult process that results in natural selection, learning is thought of as a vertical, adult-to-child transmission which results in cultural inheritance. One adult provides linguistic inputs for 5 neighbor learners (from the learner’s point of view, she receives inputs from 5 immediate neighbor adults). Together with the model design, the spatial structure described above enables the agents to construct their own grammars, and hence their linguistic communities, locally, and to pass them on to the next generation. In this model, two closely-related but different types of niche construction take place. First, the selective environment is dynamically constructed, as agents in earlier generations gradually build their own grammars through inventions, and collectively form a linguistic community. Because the utility of a given grammar in a given linguistic community depends on the specific linguistic demography of the community, the mode of selection through communicative acts is frequency-dependent: a type of network effect takes place, and such an effect is created by the agents’ own activities. Second, because linguistic activities in a given generation become the next generation’s learning inputs, what types of language agents can potentially learn is largely determined by their ancestors’ activities.
This may not be a niche construction in a traditional sense, as learning does not directly receive selective pressure. However, we believe that this mode of construction should be called “niche construction” in its own right: It defines what class of language can be learnt, and becomes the primal cause determining the direction of the assimilatory process of the Baldwin effect. It is this type of niche construction that would mainly serve as the masking agent.
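The reproduction step described in the model structure (rank selection over the top 50%, single-point crossover, mutation at 0.001 per allele) might be sketched as follows. The pairing of parents is our simplification, since the text does not specify how mates are chosen.

```python
import random

def reproduce(population, fitnesses, mutation_rate=0.001, rng=random):
    """Produce the next generation's chromosomes.

    Rank selection: the top 50% of agents by fitness are eligible as
    parents with equal probability. Offspring are built by single-point
    crossover and per-allele bit-flip mutation.
    """
    n = len(population)
    ranked = sorted(range(n), key=lambda i: fitnesses[i], reverse=True)
    parents = [population[i] for i in ranked[: n // 2]]
    offspring = []
    for _ in range(n):
        mum, dad = rng.choice(parents), rng.choice(parents)
        point = rng.randrange(1, len(mum))       # single crossover point
        child = mum[:point] + dad[point:]
        child = [allele ^ 1 if rng.random() < mutation_rate else allele
                 for allele in child]
        offspring.append(child)
    return offspring
```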
4. Results
All figures shown here are taken from one typical run of the simulation under the conditions described, and as such they characterize the general tendency of the model well. Figure 1 shows the average fitness of the population over time with a solid line, and the average number of NULL alleles in matured grammars with a dashed line. The rapid increase of fitness shows that the whole population quickly evolves to an almost optimal state as the agents develop their linguistic knowledge (i.e., reduce NULL alleles). In order to increase their fitness, agents not only have to increase the size of their linguistic envelope, but also have to develop grammars coherent with those of their neighbor peers, so as to successfully establish communication with them. As a result, the agents construct a highly coherent linguistic community. However, around the 2500th generation, the stable state breaks drastically, returning to normal afterward. Figure 2 summarizes the evolutionary transition of learning and genetic assimilation. In the figure, the solid line shows the remaining energy resource after the learning procedure has been completed (but before the invention process). This indicates the intensity of learning (the lower the line, the higher the intensity). The dashed line shows the similarity between an agent’s genotype and her learnt grammar (also measured before the invention process takes place). This indicates how much of the learning environment is assimilated by the genepool. From the data, it can be said that the whole genepool assimilates the learning environment rather quickly (i.e., genetic assimilation, the Baldwin effect), while the intensity of learning evolves slowly. In contrast to Figure 1, the two curves do not exhibit a radical degradation. Instead, the transition of both from highest to lowest is rather gradual (i.e., from the 600th generation to the 2500th generation).
However, the recovery is similar across the different data: within a matter of a hundred generations, all measures return to their highest scores. This indicates that another assimilatory process takes place which is much quicker than the first one.
5. Analysis

The overall result provides a somewhat perplexing picture of the evolution of linguistic knowledge and its genetic endowment. Although both Figure 1 and Figure 2 indicate that something significant happens around the 2500th generation, the data in the two figures exhibit quite different profiles, especially between about the 600th generation and the 2500th generation. From Figure 1, one may well assume that something happens within a quite short period. On the contrary, the graphs in Figure 2 indicate that a substantial process silently goes on. In other words, although the selective pressure has not radically changed over the generations, the learning process undergoes something significant. To get a clearer picture, in Figure 3 the graph from Figure 2 is superimposed
on the spatio-temporal diagrams of agents’ grammars. Each dot corresponds to one agent, and its color indicates the agent’s grammar type. The 200 spatially organized agents are plotted on the y-axis. Note that the color pattern of the graph rapidly becomes monotonic, indicating that the whole population converges into a monolithic linguistic community. This is because of the first assimilatory process, based on the niche-constructing properties of language (Baldwinian Niche Construction; Yamauchi, 2004). Once the community has converged, almost every learner receives the same inputs from her neighbor adults: the learning environment is niche-constructed so that it becomes a “species-typical environment” (Morton, 1994). This reduces the importance of genetic endowment once it has contributed to constructing the monolithic community; even if her genotype is not fully assimilated to the dominant grammar, learning can easily compensate for the discrepancy. In other words, under this niche-constructed monolithic community, genes are “masked” from selective pressure by the learning capacity: the masking effect. In the same vein, a learner can compensate for some “input noise” from adults whose grammars have misconverged(a) from the dominant one. We can tell this from the figure: between about the 600th generation and the 2000th generation, although both the remaining energy and the degree of assimilation decrease, almost no apparent change is observable from the diagram. Observable noise starts to appear roughly from the 2000th generation. This is closely related to how much a learner can adjust her grammar against either mal-assimilated genes or input noise. Subsequently, genetic drift is gradually introduced (this appears in the gene-grammar similarity data, which slowly yet steadily decreases). This means that some agents are potentially incapable of learning the dominant grammar. Such misconverged agents steadily increase (this can be observed from the diagram: as the generations proceed from the first assimilation, “random noise” visually increases). These go hand in hand with the increase in learning intensity. Finally, the learning intensity hits its highest point, and no more learning can take place. This prevents some learners from reducing all NULL alleles. At this stage, the effect of genetic drift first surfaces in the average fitness. This produces a new selective pressure for another assimilation. This later process may be comparable to the unmasking effect, but we will not deal with it in detail here.

(a) Note that the words “misconverge” and “noise” are used here in a relative sense: the utility of a given grammar hinges on the local demography of the community, and as such these words simply refer to a situation in which an agent possesses a grammar different from those of her neighbors.

Figure 1. Evolution of communicative success measured by agents’ fitness values, and the number of NULL alleles in their grammar. Both are averaged over the population.

Figure 2. Evolution of learning measured by the remaining energy resource, and the degree of assimilation.
Figure 3. The data from Figure 2 are superimposed on the spatio-temporal diagrams of the grammars present in the population across the generations.

6. Conclusion

This experiment confirms that the niche-constructing aspect of language, especially in the language learning environment, indeed provides the masking effect which creates neutrality among different genotypes, and subsequently induces genetic drift. Baldwinian niche construction is responsible both for the strong uniformity of the linguistic community and for the high fidelity of genetic information to the dominant language.

References

Deacon, T. W. (2003). Multilevel selection in a complex adaptive system: The problem of language origins. In B. H. Weber & D. J. Depew (Eds.), Evolution and learning (pp. 81-106). Cambridge, MA: The MIT Press.

Morton, J. (1994). Language and its biological context. Philosophical Transactions: Biological Sciences, 346(1315), 5-11.

Wiles, J., Watson, J., Tonkes, B., & Deacon, T. (2005). Transient phenomena in learning and evolution: Genetic assimilation and genetic redistribution. Artificial Life, 11(1-2), 177-188.

Yamauchi, H. (2004). Baldwinian accounts of language evolution. Unpublished doctoral dissertation, The University of Edinburgh, Edinburgh, Scotland.
Abstracts
COEXISTING LINGUISTIC CONVENTIONS IN GENERALIZED LANGUAGE GAMES
ANDREA BARONCHELLI
Departament de Física i Enginyeria Nuclear, Universitat Politècnica de Catalunya, Barcelona, 08034, Spain
[email protected]

LUCA DALL’ASTA
Abdus Salam International Center for Theoretical Physics, Trieste, 34014, Italy
[email protected]

ALAIN BARRAT
LPT, CNRS (UMR 8627) and Univ Paris-Sud, Orsay, F-91405, France, and Complex Networks Lagrange Laboratory, ISI Foundation, Turin, 10133, Italy
[email protected]

VITTORIO LORETO
Dipartimento di Fisica, Università di Roma “La Sapienza”, Roma, 00185, Italy, and Complex Networks Lagrange Laboratory, ISI Foundation, Turin, 10133, Italy
[email protected] The Naming Game is a well known model in which a population of individuals agrees on the use of a simple convention (e.g. the name to give to an object) without resorting to any central coordination, but on the contrary exploiting only local interactions (Steels, 1996; Baronchelli, Felici, Caglioti, Loreto, & Steels, 2006). It is the simplest model in which the idea that language can be seen as a complex adaptive system (Steels, 2000) has been applied and challenged and it has therefore become prototypical. Indeed, its simplicity has allowed for an extensive application of complex systems concepts and techniques to various aspects of its dynamics, ranging from the self-organizing global behaviors to the role of topology, and has made it one of the most studied models of language emergence and evolution (Baronchelli, Felici, et al., 2006; Baronchelli, Dall’Asta, Barrat, & Loreto, 2006). However, while the Naming Game provides fundamental insights into the mechanisms leading to consensus formation, it is not able to describe more complex scenarios in which two or more conventions coexist permanently
in a population. Here we propose a generalized Naming Game model in which a simple parameter describes the attitude of the agents towards local agreement (Baronchelli, Dall’Asta, Barrat, & Loreto, 2007). The main result is a non-equilibrium phase transition taking place as the parameter is diminished below a certain critical value. The asymptotic state can thus be consensus (all agents agree on a unique convention), polarization (a finite number of conventions survive), or fragmentation (the final number of conventions scales with the system size). More precisely, it turns out that, by tuning the control parameter, the system can reach final states with any desired number of surviving conventions. Remarkably, the same dynamics is observed both when the population is unstructured (homogeneous mixing) and when it is embedded on homogeneous or heterogeneous complex networks, the latter being the most natural topologies for studying the emerging properties of social systems (Baronchelli, Dall’Asta, et al., 2006). We investigate the general phenomenology of the model and the phase transition in detail, both analytically and with numerical simulations. We elucidate the mean-field dynamics, on the fully connected graph as well as on complex networks, using a simple continuous approach. This allows us to recover the exact critical value of the control parameter at which the transition takes place in the different cases. In summary, our generalized scheme for the Naming Game allows us to investigate, in a very simple framework, previously disregarded phenomena, such as the possible coexistence of different linguistic conventions in the same population of individuals. The complex systems approach, moreover, provides us with a deep understanding of the mechanisms determining the realization of the different asymptotic states, namely consensus, polarization or fragmentation.
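As an illustration, a single interaction of such a generalized game can be written in a few lines. The update rule sketched here (on success, speaker and hearer collapse their inventories to the winning word with probability beta; on failure, the hearer records the word) is one common formulation of the model of Baronchelli, Dall'Asta, Barrat, and Loreto (2007); the code names are ours.

```python
import random

def naming_game_step(inventories, beta, rng=random):
    """One interaction of a generalized Naming Game.

    `inventories` is a list of word lists, one per agent. `beta` tunes
    the attitude towards local agreement: beta = 1 recovers the plain
    Naming Game, while lowering it lets several conventions coexist.
    """
    speaker, hearer = rng.sample(range(len(inventories)), 2)
    if not inventories[speaker]:                  # empty inventory: invent
        inventories[speaker].append(f"w{rng.getrandbits(64):016x}")
    word = rng.choice(inventories[speaker])
    if word in inventories[hearer]:               # success
        if rng.random() < beta:                   # agree with probability beta
            inventories[speaker][:] = [word]
            inventories[hearer][:] = [word]
    else:                                         # failure: hearer learns it
        inventories[hearer].append(word)
```

With beta = 1, a small population rapidly reaches consensus on a single word; below the critical value of beta, several conventions survive indefinitely.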
References

Baronchelli, A., Dall’Asta, L., Barrat, A., & Loreto, V. (2006). Bootstrapping communication in language games: Strategy, topology and all that. In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The evolution of language: Proceedings of Evolang 6. World Scientific Publishing Company.

Baronchelli, A., Dall’Asta, L., Barrat, A., & Loreto, V. (2007). Nonequilibrium phase transition in negotiation dynamics. Physical Review E, 76, 051102.

Baronchelli, A., Felici, M., Caglioti, E., Loreto, V., & Steels, L. (2006). Sharp transition towards shared vocabularies in multi-agent systems. Journal of Statistical Mechanics, P06014.

Steels, L. (1996). Self-organizing vocabularies. In C. G. Langton & K. Shimohara (Eds.), Artificial Life V (pp. 179-184). Nara, Japan.

Steels, L. (2000). Language as a complex adaptive system. In M. Schoenauer (Ed.), Proceedings of PPSN VI. Lecture Notes in Computer Science. Berlin: Springer-Verlag.
COMPLEX SYSTEMS APPROACH TO NATURAL CATEGORIZATION
ANDREA BARONCHELLI
Departament de Física i Enginyeria Nuclear, Universitat Politècnica de Catalunya, Barcelona, 08034, Spain
[email protected]

VITTORIO LORETO, ANDREA PUGLISI
Dipartimento di Fisica, Università di Roma “La Sapienza”, Roma, 00185, Italy
[email protected],
[email protected] Computational and mathematical approaches are nowadays well recognized tools to investigate the emergence of globally accepted linguistic conventions, and complex systems science provides a solid theoretical framework to tackle this fundamental issue (Steels, 2000). Following this path, here we address the problem of how a population of individuals can develop a common repertoire of linguistic categories. The prototypical example of the kind of phenomenon we aim to study is given by color categorization. Individuals may in principle perceive colors in different ways, but they need to align their linguistic ontologies in order to understand each others. Previous models have adopted very realistic and therefore complicated microscopic rules (Steels & Belpaeme, 2005), or evolutionary perspectives (Komarova, Jameson, & Narens, 2007). We assume the point of view of cultural transmission (Hutchins & Hazlehurst, 1995), and we introduce a new multi-agent model in which both individuals and their interactions are kept as simple as it is possible. This allows us to perform unparalleled systematic numerical studies, and to understand in details the mechanisms leading to the emergence of global coordination out of local interactions patterns (see (Baronchelli, Dall’ Asta, Barrat, & Loreto, 2006) for a discussion on this point). In our model (Puglisi, Baronchelli, & Loreto, 2007), a population of N individuals is committed to the categorization of a single analogical perceptual channel, each stimulus being a real number in the interval [0,1]. We identify categorization with a partition of the interval [0,1]in discrete sub-intervals, to which we refer as perceptual categories. Individuals have dynamical inventories of formmeaning associations linking perceptual categories to words representing their linguistic counterparts, and they evolve through elementary language games. At the 399
beginning, all individuals have only the trivial perceptual category [0,1]. At each time step, two individuals are selected and a scene of M ≥ 2 stimuli (denoted as o_i, with i ∈ [1, M]) is presented to them. The speaker must discriminate the scene and name one object. The hearer tries to guess the named object and, based on her success or failure, both individuals rearrange their form-meaning inventories. The only parameter of this model is the just noticeable difference of the stimuli, d_min, which is inversely proportional to the perceptive resolution power of the individuals. Thus, objects in the same scene must satisfy the constraint that |o_i - o_j| > d_min for every pair (i, j). The way stimuli are randomly chosen, finally, characterizes the kind of simulated environment. The main result is the emergence of a shared linguistic layer in which perceptual categories are grouped together to guarantee communicative success. Indeed, while perceptual categories are poorly aligned between individuals, the boundaries of the linguistic categories emerge as a self-organized property of the whole population and are therefore almost perfectly harmonized at a global level. Moreover, our model reproduces a typical feature of natural languages: despite a very high resolution power and large population sizes (technically, also in the limit N → ∞ and d_min → 0), the number of linguistic categories is finite and small. Finally, we find that a population of individuals reacts to a given environment by refining the linguistic partitioning of the most stimulated regions.
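The perceptual side of the model can be sketched as follows; the inventory-rearrangement rules of the full language game are omitted here, and all names are illustrative.

```python
import bisect
import random

def make_scene(m, d_min, rng=random):
    """Draw a scene of m stimuli in [0, 1] by rejection sampling, so that
    any pair is more than d_min (the just noticeable difference) apart."""
    while True:
        scene = [rng.random() for _ in range(m)]
        if all(abs(a - b) > d_min
               for i, a in enumerate(scene) for b in scene[i + 1:]):
            return scene

def discriminate(boundaries, stimulus):
    """Return the index of the perceptual category (sub-interval of [0, 1])
    containing the stimulus; `boundaries` is a sorted list of the internal
    category boundaries, so an empty list is the trivial category [0, 1]."""
    return bisect.bisect_right(boundaries, stimulus)
```

A speaker discriminates a scene by refining her boundaries until the named object sits in a category of its own; that refinement step, together with the word inventories attached to each category, is where the full model's dynamics live.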
References

Baronchelli, A., Dall’Asta, L., Barrat, A., & Loreto, V. (2006). Bootstrapping communication in language games: Strategy, topology and all that. In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The evolution of language: Proceedings of Evolang 6. World Scientific Publishing Company.

Hutchins, E., & Hazlehurst, B. (1995). How to invent a lexicon: the development of shared symbols in interaction. In G. N. Gilbert & R. Conte (Eds.), Artificial societies: The computer simulation of social life. UCL Press.

Komarova, N. L., Jameson, K. A., & Narens, L. (2007). Evolutionary models of color categorization based on discrimination. Journal of Mathematical Psychology, to appear.

Puglisi, A., Baronchelli, A., & Loreto, V. (2007). Cultural route to the emergence of linguistic categories. Arxiv preprint physics/0703164, submitted for publication.

Steels, L. (2000). Language as a complex adaptive system. In M. Schoenauer (Ed.), Proceedings of PPSN VI. Lecture Notes in Computer Science. Berlin: Springer-Verlag.

Steels, L., & Belpaeme, T. (2005). Coordinating perceptually grounded categories through language: A case study for colour. Behavioral and Brain Sciences, 28(4), 469-489.
REGULAR MORPHOLOGY AS A CULTURAL ADAPTATION: NON-UNIFORM FREQUENCY IN AN EXPERIMENTAL ITERATED LEARNING MODEL

ARIANITA BEQA, SIMON KIRBY, JIM HURFORD
School of Philosophy, Psychology & Language Sciences, University of Edinburgh, 40 George Square, Edinburgh, EH8 9LL, UK
One approach to explaining the origins of structure in human language sees cultural transmission as a key mechanism driving the emergence of that structure (e.g., Deacon 1997). In this view, universal features of language such as compositionality are an adaptation by language to the pressure of being successfully passed on from generation to generation of language users. Crucially, this adaptation is cultural rather than biological in that it arises from languages changing rather than language users. The support for this has mainly come from computational and mathematical modelling as well as observations of the distribution of compositionality in real languages. In particular, in morphology there appears to be a connection between high frequency forms and non-compositionality (a particular kind of irregularity). Kirby (2001), in a computational simulation, demonstrates that this is just what one would expect of a cultural adaptation. If compositionality arises from the need for reliable transmission of forms for particular meanings then we would expect that need to be greater if those meanings were low frequency. An irregular form for a particular verb, for example, can only be acquired if that particular form is seen enough times by a learner. A regular form, on the other hand, is more reliably acquired because it is supported in part by evidence from all the other meanings that participate in the regular paradigm. Kirby, Dowman & Griffiths (2007) give further support for this result using a generalised mathematical model of cultural transmission. Despite this, there is still understandable skepticism about the realism and therefore applicability of such models. Can we be sure, for example, that the differential take-up of particular errors in linguistic transmission that drives adaptation in the models mirrors what happens in reality?
In this paper we respond to these concerns by replicating the models of cultural transmission of regular and irregular morphology using real human subjects. Using the methodology pioneered by Cornish (2006), we examine the evolution of a verbal morphology in an artificial language. Experimental subjects were asked to learn 24 verbs in a simple language. Each verb was presented with a picture signifying its meaning. These denoted either a man or a woman performing some action allowing us to present a language whose verbs
marked gender. In the initial language we constructed, half of the verbs marked gender using a regular suffix attached to an invariant stem form (e.g. sagilir vs. sagilar), and the other half indicated gender through completely different forms for the masculine and feminine verbs (e.g. fuderi vs. vebadu). We further divided both sets of verbs into high frequency and low frequency types. In training, each low-frequency verb (whether regular or irregular) appeared 3 times, whereas the high-frequency verbs each appeared 10 times. After training, subjects were asked to try to recall the verb forms for all 24 actions. To implement cultural evolution, the output of each subject at test formed the language which the subsequent subject was trained on. We observed the evolution of the languages for 5 “generations” and repeated the experiment with 8 different initial randomly constructed languages (with different experimental subjects, of course). The initial languages were constructed to show no relationship between frequency and regularity - both frequent and infrequent verbs are equally likely to be irregular. However, the experiment confirms the modelling work: languages rapidly adapt so that infrequent forms become regular. We confirm this with statistical analysis of the emergent languages, and descriptive analysis of the process of language change and regularisation in the experiment. Our experiment confirms a) infrequent forms are harder to learn than frequent forms and b) regular forms ameliorate this difficulty. An adaptively structured language will ensure that infrequent meanings will participate in regular paradigms. The primary contribution of the experiment is c) a demonstration that just such an adaptive language can emerge in a very short time even when the initial state does not have these features. 
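The frequency-driven regularisation dynamic described here can be illustrated with a toy transmission-chain simulation. This is a sketch of the general mechanism only: the verb names, the suffix, the recall threshold and the deterministic fallback rule are our illustrative assumptions, not the authors' actual procedure.

```python
REGULAR_SUFFIX = "-ir"  # illustrative regular gender marker

def learn_and_reproduce(lexicon, freqs, threshold=4):
    """One 'generation': the learner retains a form it has seen often
    enough; otherwise it falls back on the regular paradigm, which is
    supported by evidence from every regular verb at once."""
    output = {}
    for stem, form in lexicon.items():
        is_regular = form == stem + REGULAR_SUFFIX
        remembered = freqs[stem] >= threshold
        output[stem] = form if (is_regular or remembered) else stem + REGULAR_SUFFIX
    return output

def run_chain(generations=5):
    stems = [f"v{i}" for i in range(24)]
    # Half the verbs start regular, half irregular (suppletive);
    # half are high-frequency (10 exposures), half low-frequency (3).
    lexicon = {s: (s + REGULAR_SUFFIX if i % 2 == 0 else "supp-" + s)
               for i, s in enumerate(stems)}
    freqs = {s: (10 if i < 12 else 3) for i, s in enumerate(stems)}
    for _ in range(generations):
        lexicon = learn_and_reproduce(lexicon, freqs)
    return lexicon, freqs

final_lexicon, freqs = run_chain()
```

Even this crude recall rule reproduces the qualitative outcome: after a few generations every low-frequency verb has joined the regular paradigm, while the high-frequency irregulars survive intact.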
This occurs without any apparent conscious design on the part of the participants (whose native language, incidentally, does not inflect verbs for gender) and is instead a natural consequence of the cultural evolution of the artificial languages. References Cornish, H. (2006). Iterated learning with human subjects: an empirical framework for the emergence and cultural transmission of language. Master's thesis, University of Edinburgh. Deacon, T. W. (1997). The Symbolic Species: The Co-evolution of Language and the Brain. W. W. Norton. Kirby, S. (2001). Spontaneous evolution of linguistic structure: an iterated learning model of the emergence of regularity and irregularity. IEEE Transactions on Evolutionary Computation, 5(2):102-110. Kirby, S., Dowman, M., and Griffiths, T. (2007). Innateness and culture in the evolution of language. Proceedings of the National Academy of Sciences, 104(12):5241-5245.
NEURAL DISSOCIATION BETWEEN VOCAL PRODUCTION AND AUDITORY RECOGNITION MEMORY IN BOTH SONGBIRDS AND HUMANS JOHAN J. BOLHUIS Behavioural Biology and Helmholtz Institute, Utrecht University, Padualaan 8, Utrecht, 3584 CH, The Netherlands
1. Emancipation of the bird brain
In the search for the neural mechanisms of vocal learning and memory, mammals are usually preferred to birds as model systems, because of their closer evolutionary relatedness to humans. However, a recent overhaul of the nomenclature of the avian brain (Jarvis et al., 2005) has highlighted the homologies and analogies between the avian and mammalian brain. In the revised interpretation of the avian brain it is suggested that the pallium (including the hyperpallium, mesopallium, nidopallium and arcopallium) is homologous with the mammalian pallium, including the neocortex, but that it is premature to suggest one-to-one homologies between avian and mammalian pallial regions. Within the avian forebrain, Field L2 receives auditory connections from the thalamus, and in turn projects onto Fields L1 and L3. These two regions project to the caudal mesopallium and caudal nidopallium, respectively. Thus, the Field L complex appears to be analogous with primary auditory cortex, in the mammalian superior temporal gyrus. In addition, the projection regions of the Field L complex (the caudomedial nidopallium, NCM, and the caudomedial mesopallium, CMM) may then be analogous with the mammalian auditory association cortex.

2. The neural substrate of tutor song memory in songbirds
The process through which young songbirds learn the characteristics of the songs of an adult male of their own species has strong similarities with speech acquisition in human infants (Doupe & Kuhl, 1999). Both involve two phases: a
period of auditory memorisation followed by a period during which the individual develops its own vocalisations. The avian ‘song system’, a network of brain nuclei, is the likely neural substrate for the second phase of sensorimotor learning. In contrast, the neural representation of song memory acquired in the first phase is most probably localised outside the song system, notably in the NCM and CMM, regions within the likely avian equivalent of auditory association cortex (Bolhuis & Gahr, 2006). In zebra finches, neuronal activation (measured as expression of immediate early genes, IEGs) in the NCM correlated with the number of song elements that a male had learned from its tutor, suggesting that NCM may be (part of) the neural substrate for stored tutor song.
3. Neural dissociation between vocal production and auditory memory

Bilateral neurotoxic lesions to the NCM of adult male zebra finches impaired tutor song recognition but did not affect the males' song production or their ability to discriminate calls (Gobes & Bolhuis, 2007). These findings support the suggestion that the NCM contains the neural substrate for the representation of tutor song memory. In addition, we found a significant positive correlation between neuronal activation in the song system nucleus HVC and the number of song elements copied from the tutor, in zebra finch males that were exposed to their own song, but not in males that were exposed to the tutor song or to a novel song. Taken together these results show that tutor song memory and a motor program for the bird's own song have separate neural representations in the songbird brain. Thus, in both humans and songbirds the cognitive systems of vocal production and auditory recognition memory are subserved by distinct brain regions.
References Bolhuis, J.J., & Gahr, M. (2006). Neural mechanisms of birdsong memory. Nature Reviews Neuroscience, 7, 347-357. Doupe, A.J., & Kuhl, P.K. (1999). Birdsong and human speech: Common themes and mechanisms. Annual Review of Neuroscience, 22, 567-631. Gobes, S. M. H., & Bolhuis, J. J. (2007). Bird song memory: A neural dissociation between song recognition and production. Current Biology, 17, 789-793. Jarvis, E., et al. (2005). Avian brains and a new understanding of vertebrate brain evolution. Nature Reviews Neuroscience, 6, 151-159.
DISCOURSE WITHOUT SYMBOLS: ORANGUTANS COMMUNICATE STRATEGICALLY IN RESPONSE TO RECIPIENT UNDERSTANDING ERICA A. CARTMILL AND RICHARD W. BYRNE School of Psychology, University of St Andrews, St Andrews, KY16 9JP, Scotland
When people are not fully understood, they persist with attempts to communicate, elaborating their speech in order to better convey their meaning. This combination of persistence and elaboration of signals is considered to be an important criterion for determining intentionality in human infants (Bates et al., 1979; Golinkoff, 1986; Lock, 2001; Shwe & Markman 1997), and plays an essential role in human language, allowing us to clarify misunderstandings and disambiguate meaning. Chimpanzees have been shown to use persistence and elaboration in requesting food items from an experimenter (Leavens et al., 2005), and so these abilities likely predate human symbolic communication. Persisting in one’s attempts to reach a goal and discarding signals if they have failed to achieve the desired response could be mediated by relatively simple mechanisms and do not require an understanding of the recipient as an autonomous player in a communicative event. However, responding to how well one’s message has been understood is a more complex ability, requiring at least a functional use of the recipient’s mental state. We investigated whether captive orangutans (Pongo pygmaeus and Pongo abelii) would use persistence and elaboration when signaling to a human experimenter, and whether they could adjust their communicative strategies in response to how well the experimenter appeared to understand their signals. Captive orangutans were presented with situations in which out-of-reach food items required human help to access but the experimenter sometimes “misunderstood” the orangutan’s requests. Using a partially modified design from Leavens et al. (2005), we offered subjects both a highly desirable and a relatively undesirable food, allowing them the opportunity to request one or the other food by gesturing. 
The experimenter was initially unresponsive, and then gave the orangutan the entire desirable food (full understanding), half the desirable food (partial understanding), or the entire undesirable food
(misunderstanding). We then compared the orangutans’ gestures before and after the receipt of food. The orangutans altered their communicative strategies according to how well they had apparently been understood (Cartmill & Byrne, 2007). When the recipient simulated partial understanding, orangutans narrowed down their range of signals, focusing on gestures already used and repeating them frequently. In contrast, when the recipient simulated misunderstanding, orangutans elaborated their range of gestures, avoiding repetition of failed signals. It is therefore possible, from communicative signals alone, to determine how well an orangutan’s intended goal has been met. They transmit not only information about their desires but also about the success of the communicative exchange. A human observer can tell how well the orangutan’s communicative goal has been met by considering the types and patterns of gestures the orangutan uses following delivery of a food item. If orangutan recipients are able to use this information as well, then it might function within their species as a method of achieving understanding more quickly. In the absence of a shared lexicon, one way of arriving at a shared meaning is to transmit not only the content of the intended message but also an indication of how well you have been understood. If the recipient can use this information, then the signaler and recipient will be able to arrive at a common understanding much faster. It is possible that this strategy played a central role in the earliest stages of “language.” If early humans had few referential gestures or vocalizations that were shared by the entire group, the strategy employed by the orangutans could have functioned as a way to communicate about novel or unlabelled events. 
It is possible that such strategies could also have resulted in the creation or adoption of new labels, thus helping to expand an initially bounded communication system into a more flexible one, bringing it one step closer to full-blown language. References Bates, E., Benigni, L., Bretherton, I., Camaioni, L., & Volterra, V. (1979). The Emergence of Symbols. New York: Academic Press. Cartmill, E. A., & Byrne, R. W. (2007). Orangutans modify their gestural signaling according to their audience's comprehension. Current Biology, 17, 1345-1348. Golinkoff, R. M. (1986). "I beg your pardon?": The pre-verbal negotiation of failed messages. J. Child Lang., 20, 199-208. Leavens, D. A., Russell, J. L., & Hopkins, W. D. (2005). Intentionality as measured in the persistence and elaboration of communication by chimpanzees (Pan troglodytes). Child Dev., 76, 291-376. Lock, A. (2001). Preverbal communication. In J.G. Bremner & A. Fogel (Eds.), Blackwell Handbook of Infant Development (pp. 379-403). Oxford: Oxford University Press. Shwe, H., & Markman, E. (1997). Young children's appreciation of the mental impact of their communicative signals. Dev. Psychol., 33, 630-636.
TAKING WITTGENSTEIN SERIOUSLY. INDICATORS OF THE EVOLUTION OF LANGUAGE CAMILO JOSE CELA-CONDE Department of Philosophy and Social Work, University of the Balearic Islands, Crta. Valldemossa, km 7.5, Palma de Mallorca, 07122, Spain MARCOS NADAL, ENRIC MUNAR, ANTONI GOMILA Department of Psychology, University of the Balearic Islands, Crta. Valldemossa, km 7.5, Palma de Mallorca, 07122, Spain
VICTOR M. EGUILUZ IFISC (Institute for Cross-Disciplinary Physics and Complex Systems), University of the Balearic Islands and Consejo Superior de Investigaciones Científicas, Crta. Valldemossa, km 7.5, Palma de Mallorca, 07122, Spain
“Wovon man nicht sprechen kann, darüber muß man schweigen” Proposition #7. Ludwig Wittgenstein, Logisch-Philosophische Abhandlung, Wilhelm Ostwald (ed.), Annalen der Naturphilosophie, 14 (1921). Should we follow Wittgenstein and be quiet regarding the evolution of language? Or would it be too pretentious, even pedantic, to conclude that the long discussions about the relation between animal and human communication, and the conclusions offered by those comparative studies of our speech, do not actually throw light on the evolution of language? Our contribution to this symposium will be limited to taking Wittgenstein seriously. In this respect, we will try to clarify what researchers are trying to find out when studying the evolution of language, what is actually known about this process, and what conclusions are justified by such evidence. The index of our examination will be as follows:
1. What are we talking about? Definition of the concepts of “evolution” and “language”; language as a functional apomorphy fixed by natural selection after the divergence of the Homo and Pan lineages
2. The study of functional apomorphies: available tools in the case of language phylogenesis (LP)
3. Fossil evidence of LP
4. Archaeological evidence of LP
5. Genetic findings that are informative of LP
6. Mathematical models of human language
AN EXPERIMENT EXPLORING LANGUAGE EMERGENCE: HOW TO SEE THE INVISIBLE HAND AND WHY WE SHOULD
HANNAH CORNISH Language Evolution and Computation Research Unit, University of Edinburgh, UK
[email protected] The complex adaptive systems view of language sees linguistic structure arising via the interaction of three dynamical systems operating over different timescales: biological evolution over the life-time of the species, cultural evolution over the life-time of the language, and individual learning over the life-time of the speaker (Kirby & Hurford, 2002). The outcome is the cultural adaptation of language to the different constraints imposed upon it by transmission (Kirby, Smith, & Cornish, 2007). These constraints can take a variety of forms, but the effect is largely similar; language adapts to become more easily learnable and transmittable by our brains rather than the other way around. Previous work exploring this idea has made extensive use of computational simulation (e.g. Kirby and Hurford (2002)). Models have shown it is possible for language to evolve culturally in populations of artificial agents as predicted, and furthermore, that the resultant systems exhibit some key universal features of human language. This lends strong support to the idea that the mechanism of cultural transmission plays a very powerful role in the evolution of language. In spite of this, however, little is known about how such processes work in actual human populations. A simple question is therefore this: can the kinds of cultural adaptations seen in these models be observed in human populations in the laboratory? The development of experimental studies to explore aspects of language evolution is a fairly recent phenomenon, with work such as Fay, Garrod, MacLeod, Lee, and Oberlander (2004), Galantucci (2007), and Selten and Warglien (2007) being examples. In spite of their many differences, one thing that all three of these approaches have in common is the fact that they rely on their subjects consciously negotiating a system of communication. Although the resultant systems show signs of cultural adaptation, they are clearly constructed devices. 
To illustrate, Selten and Warglien (2007) explicitly instruct participants to create a communication system with a partner, and that different symbols at their disposal in creating such a system have explicit costs which they should minimize. The languages that emerge are therefore the product of careful design on the part of the participants involved. Is this a good model for language?
Keller (1994) would argue not. As he sees it, much of what constitutes human language results from an ‘invisible hand’ process - whilst language change does have its origins in the actions of speakers, no single individual ‘decides’ to modify the language in order to effect an improvement. At the same time, this need not imply that all change is simply random drift. It is a defining characteristic of an invisible hand process that the end result is adaptive: we see the appearance of design without a designer. Bearing this in mind, this paper asks a second question. Previous experimental work already mentioned shows cultural adaptation of language can come about through intentional acts, but can it also come about through the unintentional actions of individuals? In order to address this, an alternative experimental framework is presented (Cornish, 2006) which confirms the central findings to have emerged from the computational literature. Participants are trained on a subset of an (initially unstructured) ‘alien’ language and then tested. A sample of the output of generation n is then given as training input to generation n + 1, and the process iterates. Even when subjects are only exposed to half the language during training we still see gradual cumulative cultural adaptation leading to the emergence of an intergenerationally stable system. By simply changing the constraints on transmission slightly, we see different types of structure emerge, such as compositionality. Significantly, this is achieved without intentional design on the part of the participants.
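The transmission loop with a bottleneck can be made concrete in a minimal sketch. Everything here is our own illustration, not the actual experimental design: meanings are shape-colour pairs, and the toy learner naively assumes the first two letters of a form encode shape and the last two encode colour.

```python
import itertools
import random
from collections import Counter

SHAPES = ["sq", "ci", "tr", "st"]
COLOURS = ["re", "bl", "gr"]
MEANINGS = list(itertools.product(SHAPES, COLOURS))
LETTERS = "abcdefghij"

def random_language(rng):
    # Holistic starting point: an arbitrary 4-letter string per meaning.
    return {m: "".join(rng.choice(LETTERS) for _ in range(4)) for m in MEANINGS}

def learn(training, rng):
    """A naive learner: majority-vote a syllable for each feature value,
    then produce every meaning compositionally from those syllables."""
    shape_votes, colour_votes = {}, {}
    for (shape, colour), form in training.items():
        shape_votes.setdefault(shape, Counter())[form[:2]] += 1
        colour_votes.setdefault(colour, Counter())[form[2:]] += 1
    shape_map = {s: v.most_common(1)[0][0] for s, v in shape_votes.items()}
    colour_map = {c: v.most_common(1)[0][0] for c, v in colour_votes.items()}
    for s in SHAPES:  # invent a syllable for any unseen feature value
        shape_map.setdefault(s, "".join(rng.choice(LETTERS) for _ in range(2)))
    for c in COLOURS:
        colour_map.setdefault(c, "".join(rng.choice(LETTERS) for _ in range(2)))
    return {(s, c): shape_map[s] + colour_map[c] for s, c in MEANINGS}

def chain(generations=5, seed=1):
    rng = random.Random(seed)
    language = random_language(rng)
    history = [language]
    for _ in range(generations):
        # Bottleneck: each learner is trained on only half the language.
        seen = dict(rng.sample(sorted(language.items()), len(MEANINGS) // 2))
        language = learn(seen, rng)
        history.append(language)
    return history

history = chain()
```

After a single pass through this learner the language is fully compositional (shared prefixes for shapes, shared suffixes for colours), and it then tends to transmit faithfully through the bottleneck, since the learner can reconstruct forms it never saw from the feature syllables it did see.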
References Cornish, H. (2006). Iterated learning with human subjects: an empirical framework for the emergence and cultural transmission of language. Unpublished master's thesis, University of Edinburgh. Fay, N., Garrod, S., MacLeod, T., Lee, J., & Oberlander, J. (2004). Design, adaptation and convention: the emergence of higher order graphical representations. In Proceedings of the 26th Annual Conference of the Cognitive Science Society (pp. 411-416). Galantucci, B. (2007). An experimental study of the emergence of human communication systems. Cognitive Science, 29, 737-767. Keller, R. (1994). On Language Change: The Invisible Hand in Language. London: Routledge. Kirby, S., & Hurford, J. (2002). The emergence of linguistic structure: An overview of the iterated learning model. In A. Cangelosi & D. Parisi (Eds.), Simulating the Evolution of Language (pp. 121-148). London: Springer Verlag. Kirby, S., Smith, K., & Cornish, H. (2007). Language, learning and cultural evolution: how linguistic transmission leads to cumulative adaptation. (Forthcoming) Selten, R., & Warglien, M. (2007). The emergence of simple languages in an experimental co-ordination game. PNAS, 104(18), 7361-7366.
THE SYNTAX OF COORDINATION AND THE EVOLUTION OF SYNTAX
WAYNE COWART DANA MCDANIEL Linguistics Department, University of Southern Maine Portland, Maine, USA, 04104
Our purpose is to articulate and explore a possible connection between the syntactic theory of coordination and the theory of language evolution. The asymmetric functor-argument relation central to Merge (Chomsky, 1995) has come to be widely regarded as the foundational relationship in syntactic theory. Moreover, the recursive system based on Merge has been proposed as the sole uniquely human component of the human linguistic system, what Hauser, Chomsky, and Fitch (2002) term FLN - Faculty of Language - Narrow Sense. With these developments in view, the apparent symmetry of coordinate structures seems increasingly anomalous. Here we suggest that progress may be possible by reexamining what we term the Homogeneity Thesis - the widely accepted presumption that coordinate structures arise within the same general framework of syntactic structure as organizes prototypical subordinating structures. We review evidence suggesting that the Homogeneity Thesis is in fact false and propose that, by rejecting it, it may be possible to formulate a more plausible model of the evolution of the modern human linguistic system. Among several relevant lines of evidence, we report recent experimental evidence from English contrasting attraction-like effects (Bock, Eberhard, Cutting, Meyer, & Schriefers, 2001); (Eberhard, Cutting, & Bock, 2005) with complex coordinate and subordinate NP subjects. The materials were structured as in (1).
(1) { a newspaper ... / some newspapers ... } { is / are } on the desk.
We compared grammatically illicit effects on judged acceptability that could be traced to the second NP, which was always at the right edge of either a coordinate or subordinate complex NP. As expected, the results showed strong, reliable differences in pattern between coordinate and subordinate forms, F1(1,47) = 8.37, p … articulated human language
References
Burling, R. (2005). The Talking Ape: How Language Evolved. Oxford University Press. Call, J., & Tomasello, M. (2007). The Gestural Communication of Apes and Monkeys. Lawrence Erlbaum. Fitch, W.T., Hauser, M.D., & Chomsky, N. (2005). The evolution of the language faculty: clarifications and implications. Cognition, 97(2), 179-210. Hauser, M.D., Chomsky, N., & Fitch, W.T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298, 1569-1579. Jackendoff, R. (1999). Possible stages in the evolution of the language capacity. Trends in Cognitive Sciences, 3(7), 272-279. Jackendoff, R., & Pinker, S. (2005). The nature of the language faculty and its implications for evolution of language. Cognition, 97, 211-225. Pinker, S., & Jackendoff, R. (2005). The faculty of language: What's special about it? Cognition, 95, 201-236. Vauclair, J. (2003). Would humans without language be apes? In J. Valsiner & A. Toomela (Eds.), Cultural Guidance in the Development of the Human Mind: Vol. 7. Advances in Child Development within Culturally Structured Environments (pp. 9-26). Greenwich, CT: Ablex Publishing Corporation.
AFTER ALL, A “LEAP” IS NECESSARY FOR THE EMERGENCE OF RECURSION IN HUMAN LANGUAGE MASAYUKI IKE-UCHI Language Evolution and Computation Research Unit, University of Edinburgh, UK and
Department of English, Tsuda College, Tokyo, JAPAN
[email protected] The goal of this paper is to reconfirm the necessity of some kind of “leap” (i.e., punctuation, a qualitative change, or appropriation) for the emergence of recursive properties in human language, both by showing the “sneak-in” problem in computational multi-agent modelling approaches and by revealing the implicit postulation of a “leap” in biological adaptationism approaches. Thus, this paper will reaffirm that continuous evolution from linear syntax to recursive syntax is not plausible. The usual definition of the notion of recursion will be assumed, including both nested and tail recursion. Researchers who have taken multi-agent modelling constructive approaches have claimed that recursion (hierarchical structure) spontaneously emerges from things non-recursive like linearity. But closer scrutiny reveals this is not correct, because the very recursive properties themselves sneak into, or are (implicitly) included in, the initial conditions imposed on the agents. For example, Kirby (2002)'s agents have initial rules like believes(john, praises(heather, mary)) → ei, which in effect include syntactic embedding, when the simulation starts. In Batali (2002), as he himself notes, “the agents begin a simulation with the ability to use embedded phrase structure.” In other words, they have Merge from the outset. A similar argument holds for embodiment modelling (for instance, Steels & Bleys (2007)), too. In sum, it has not yet been proved in terms of computer simulation approaches that recursion (or hierarchy) spontaneously emerges from non-recursive linear properties through interactions among the agents. In biological adaptationism approaches (Jackendoff (2002) and Parker (2006), for example), several steps have been postulated for the evolution of
current human language. Part of the syntax/LF side of Jackendoff's incremental scenario is: ... (1) concatenation of symbols → (2) use of symbol position to convey basic semantic relations → (3) protolanguage → (4) hierarchical phrase structure → (5) grammatical categories ... It should be pointed out here that the transition from stage (3) to stage (4), in particular, is a clear qualitative “leap” from linearity to hierarchical recursion (although it is not explicitly recognized). In short, in these approaches, a certain “leap” has been implicitly postulated for the introduction of recursion into human language, though the approaches themselves are otherwise based on the assumption that every evolutionary step is just gradual, continuous, and incremental in accordance with the theory of natural selection. Notice that this is not a simple terminological issue, but concerns a crucial qualitative difference between certain evolutionary steps in human language. If we did not properly recognize it, then that would be equivalent to saying that the evolution of language is no different from that of, say, the beak of the Darwin finch, which no one would accept. Noting that there may be principled reasons why two-dimensional, vertical, hierarchical recursion does not gradually derive from one-dimensional, horizontal linearity, and also touching on the evidence from language acquisition (Roeper, 2007), I will conclude that (at least) at the present stage of inquiry into the origins and evolution of human language, some qualitative “leap” must be assumed for the emergence of recursion.
References Batali, J. (2002). The negotiation and acquisition of recursive grammars as a result of competition among exemplars. In T. Briscoe (Ed.), Linguistic Evolution through Language Acquisition (pp. 111-172). Cambridge: Cambridge University Press. Jackendoff, R. (2002). Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford: Oxford University Press. Kirby, S. (2002). Learning, bottlenecks and the evolution of recursive syntax. In T. Briscoe (Ed.), (pp. 173-203). Parker, A. (2006). Evolution as a Constraint on Theories of Syntax: The Case against Minimalism. Ph.D. dissertation, University of Edinburgh. Roeper, T. (2007). The Prism of Grammar: How Child Language Illuminates Humanism. Cambridge, MA: MIT Press. Steels, L., & Bleys, J. (2007). Emergence of hierarchy in fluid construction grammar. In Proceedings of the Social Learning in Embodied Agents Workshop at the 9th European Conference on Artificial Life.
LABELS AND RECURSION: FROM ADJUNCTION SYNTAX TO PREDICATE-ARGUMENT RELATIONS ARITZ IRURTZUN Linguistics and Basque Studies, University of the Basque Country, Vitoria-Gasteiz, 01006, Basque Country (Spain)
I explore the emergence and ontology of syntactic labels. I propose that labels are created derivationally as a ‘reparation’ that circumvents the violation of a legibility condition. As a consequence, I argue that predicate-argument relations are derived from a more primitive adjunctive syntax without labels (cf. Hornstein, Nunes & Pietroski (2006), Hinzen (2006)). First, I show that the proposal of label-free syntax (cf. Collins (2002)) has serious empirical drawbacks: I briefly discuss the phenomena of XP movement, islands, incorporation, quantificational dependencies and argument structure. All these phenomena make reference to labeled XPs. But assuming labels, some questions arise: (i) Why do syntactic phrases have labels? (ii) How do labels appear derivationally? (iii) How do labels identify the set they label? Having Merge as just symmetrical set-formation (cf. Chomsky (2005), Hinzen (2006)) entails that in itself, the merger of (α, β) cannot give a labeled structure, but a simpler {α, β} set. So, the only way to get a labeled structure using just Merge and the lexicon is to take Merge as a compound operation where the first step creates a bare set and the second one provides it a label (1).
(1) a. {V, DP}
    b. {V, {V, DP}}
That would answer question (i). However, since the notion of ‘labelhood’ is vague (after all, V is just one of the members of the {V, {V, DP}} set of (1b)), the ontology and consequences of labelhood will have to be explained (questions (ii) and (iii)). My proposal relies on the hypothesis that interfaces require sets with coherent categorial intensions.
Given such a restriction, labeling operations can be explained as repairing strategies (answering questions (ii-iii)): the label provides a set with a coherent intension (i.e., all of the members of the set contain a given categorial feature). For instance, in step 1 of (1a), the simple {V, DP} set is created, but at this step the set {V, DP} is heterogeneous: there is no grammatical category that can provide it a coherent type, and hence it is illegible (assuming a Neodavidsonian conjunctivist semantics, in (1a) we have two unrelated monadic predicates (something like {kiss(e) & Mary(y)})). I will argue that the labeling mechanism provides the step from this adjunct-like syntax of conjunction of independent predicates to the hierarchical predicate-argument syntax based on labels (cf. Hornstein, Nunes & Pietroski (2006), Hinzen (2006)): having {V, DP} in (1a), the verbal head (the syntactically active locus) is remerged with the structure to give it a coherent type (1b). Now an asymmetry emerges in the new set; crucially, both members of {V, {V, DP}} will have a verbal character (both contain a [+V] categorial feature). Thus, the set {V, {V, DP}} labeled with a verbal intension is readable at the interfaces. We are left with a last problem though: the primitive {V, DP} of (1a) (now a member of {V, {V, DP}} in (1b)) is still an illegible object. And obviously, recursion on the labeling strategy won't solve the problem. Here my proposal is a purely repairing strategy: the DP that as such is interpretable (i.e., Val(y, Mary) iff Mary(y)) is now in a verbal environment at the highest phrase (a VP). 
Thus, the solution to the VP-contained DP is to lift its type (à la Pietroski (2005)) to accommodate its type to that of the intension of the highest set that contains it: this turns the DP complement of V from an individual-denoting type into an event-participant one (an argument) (2): (2) Val(y, Mary) iff Mary(y) → Val(e, int-Mary) iff Theme(e, Mary). Finally, I will argue that taking adjunction syntax to be more basic than predicate-argument syntax also provides a way to characterize the operation of labeling as a crucial step in the evolution of the human language capacity: labeling provides a crucial trait of natural language: recursion.
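The compound Merge-plus-label operation in (1) can be rendered as a tiny executable sketch. The frozenset encoding is our own illustration of symmetrical set-formation, not the author's formalism:

```python
def merge(a, b):
    # Bare Merge: symmetrical set formation, no label.
    return frozenset({a, b})

def label(head, bare_set):
    # Labeling as remerge of the (syntactically active) head with the
    # bare set, yielding the asymmetric {head, {head, complement}} object.
    assert head in bare_set, "the label must come from inside the set"
    return frozenset({head, bare_set})

V, DP = "V", "DP"
bare = merge(V, DP)       # step 1, as in (1a): {V, DP}, no coherent intension
labeled = label(V, bare)  # step 2, as in (1b): {V, {V, DP}}, verbally labeled
```

Note how the asymmetry emerges exactly as described: both members of the resulting set (the head V and the inner {V, DP}) contain V, which is the sense in which the labeled set has a coherent verbal intension.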
References Collins, C. (2002). Eliminating labels. In S. D. Epstein and T. D. Seely (Eds.), Derivation and Explanation in the Minimalist Program (pp. 42-64). Oxford: Blackwell. Hinzen, W. (2006). The successor function + Lexicon = human language? Ms., U. Amsterdam & U. Durham. Hornstein, N., Nunes, J., & Pietroski, P. (2006). Adjunction. Ms., UMCP. Pietroski, P. (2005). Events and Semantic Architecture. Oxford: OUP.
ITERATED LEARNING WITH SELECTION: CONVERGENCE TO SATURATION MICHAEL KALISH Institute for Cognitive Science, University of Louisiana at Lafayette, Lafayette, LA 70504-3772, USA A formal approach to language evolution requires specification of the properties
of variation and selection. Variation is plausibly the result of replication; errors in intergenerational learning produce variability in each generation (Griffiths & Kalish, 2007). A mechanism for selection is less transparent, and this may explain a bias toward selection-free evolutionary accounts of iterated learning as intergenerational transmission. Learning has interesting properties as a source of variation, since its variability is not purely random but rather depends on the data available for learning and the inductive biases of the learners. Exploring the role of inductive biases in iterated learning has produced clear results concerning the dynamic and asymptotic properties of the process. However, if we assume that a single set of linguistic universals dominates human languages, these results leave a puzzle, since they suggest that the distribution of universals should mirror the prior biases (that is, the learnability) of the corresponding languages (Dowman, Kirby & Griffiths, 2006). One might ask: are universals homogeneous, or is there some stability in their spatial heterogeneity? Under the assumption that learners are Bayesian (that is, that they update their knowledge according to their experience), the iterated transmission of information results in the convergence of a population of independent learners to their common inductive priors (Griffiths & Kalish, 2007). To date, however, iterated learning has only been examined in the limit case of a large population of well-mixed individuals, reproducing without constraint by fitness. The research presented here is a first empirical step in broadening this focus to spatially distributed populations of fixed size in which fitness plays a role in replication. I examined two different processes that both included selection based on communicative fitness and mutation based on Bayesian learning.
(1) A birth-first (Moran-like) process where only one agent in the space, chosen with probability proportional to its relative fitness, reproduces on each cycle. The spawn then replaces a randomly chosen agent within the parent's neighborhood, possibly including the parent. (2) A deterministic (cellular-automaton-like)
process where every agent is replaced by the spawn of the fittest agent in the neighborhood. Agents were defined as Bayesian learners, equipped with just two hypotheses (A and B) which they induced through exposure to samples drawn from four possible signals (see Griffiths & Kalish, 2007 for details of the 'two language' example). Agents were placed on a torus and associated in Moore neighborhoods. I varied the number of samples (controlling stability of transmission) and the prior bias of hypothesis B (which controls the stationary distribution in the absence of selection). Fitness was frequency dependent, but symmetric between pairs of agents, reflecting their probability of mutual understanding, as in Nowak, Plotkin & Krakauer (1999). Similar to Nowak's (2006) analytic results for arbitrary mutation, the stability of intergenerational transmission largely determined the outcome of the simulations for the deterministic process. At high stability, initial conditions dominated; whatever hypothesis was most prevalent initially increased the fitness of agents operating with that hypothesis, and thus its transmission probability. At low stability, as predicted by iterated learning, bias dominated, as each agent was unlikely to shift from its prior on the basis of the noisy data. At middle levels of stability the space was likely to saturate at one of the two hypotheses, with probability determined by both stability and prior bias. The proportion of spaces in which both hypotheses were maintained indefinitely decreased with increasing stability, but only stochastically. The spatial distributions of hypotheses in these spaces were not entirely random, but self-maintaining structures did not occur. The Moran process, in contrast, converged to the prior bias regardless of initial conditions, with convergence rate decreasing nonlinearly with the number of samples seen during learning.
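The selection-plus-learning dynamics described above can be illustrated with a stripped-down sketch of the Moran-like variant. This is an illustration, not the author's simulation code: it uses a one-dimensional ring rather than a torus with Moore neighborhoods, two signals rather than four, and all parameter values (`EPS`, `PRIOR_B`, `N_SAMPLES`) are assumptions chosen for clarity.

```python
import random
from collections import Counter

# Moran-like iterated learning with selection (sketch): agents hold
# hypothesis 0 (A) or 1 (B); a parent chosen in proportion to fitness
# teaches a child, who learns via a Bayesian MAP update from samples.

EPS = 0.2          # production noise: P(wrong signal | hypothesis)
PRIOR_B = 0.6      # learner's prior probability of hypothesis B
N_SAMPLES = 5      # samples per learning episode (transmission stability)

def produce(h, rng):
    """Emit a signal that usually matches the speaker's hypothesis."""
    return h if rng.random() > EPS else 1 - h

def learn(samples):
    """Bayesian MAP learner over hypotheses {0: A, 1: B}."""
    posterior = {0: 1 - PRIOR_B, 1: PRIOR_B}
    for s in samples:
        for h in (0, 1):
            posterior[h] *= (1 - EPS) if s == h else EPS
    return max(posterior, key=posterior.get)

def fitness(pop, i):
    """Mutual-understanding fitness: agreement with ring neighbours."""
    n = len(pop)
    neighbours = [pop[(i - 1) % n], pop[(i + 1) % n]]
    return 1e-6 + sum(pop[i] == x for x in neighbours)

def moran_step(pop, rng):
    """One birth: fit parent reproduces; spawn replaces a neighbour."""
    n = len(pop)
    weights = [fitness(pop, i) for i in range(n)]
    parent = rng.choices(range(n), weights=weights)[0]
    child = rng.choice([(parent - 1) % n, parent, (parent + 1) % n])
    samples = [produce(pop[parent], rng) for _ in range(N_SAMPLES)]
    pop[child] = learn(samples)

rng = random.Random(1)
pop = [0] * 20                     # start with everyone holding hypothesis A
for _ in range(5000):
    moran_step(pop, rng)
print(Counter(pop))                # distribution drifts toward the prior bias
```

With high `N_SAMPLES` transmission is stable and the initial hypothesis persists longer; with few samples the prior bias dominates, mirroring the qualitative pattern reported above.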
Either linguistic universals are homogeneous, or they are not, because either (1) our space is in transition or (2) more complex processes govern the space of learners. Distinguishing these three possibilities remains a target for this research.
References Dowman, M., Kirby, S., & Griffiths, T. L. (2006). Innateness and culture in the evolution of language. In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The Evolution of Language: Proceedings of the 6th International Conference on the Evolution of Language. World Scientific Press. Griffiths, T., & Kalish, M. (in press). Iterated learning with Bayesian agents. Cognitive Science. Nowak, M. A. (2006). Evolutionary Dynamics. Cambridge, MA: Harvard University Press. Nowak, M. A., Plotkin, J. B., & Krakauer, D. C. (1999). The evolutionary language game. Journal of Theoretical Biology, 200, 147-162.
A REACTION-DIFFUSION APPROACH TO MODELLING LANGUAGE COMPETITION
ANNE KANDLER
JAMES STEELE
AHRC Centre for the Evolution of Cultural Diversity, Institute of Archaeology, University College London, 31-34 Gordon Square, London WC1H 0PY, UK,
[email protected] [email protected] In this paper we consider competition between two languages (where there is also bilingualism) and try to formalise and explain its dynamics. By language competition we mean simply competition for speakers. Simple evolutionary models of language origins have emphasised the importance of co-operation within social groups as a pre-condition for the emergence of stable shared linguistic conventions. Here we explore the dynamics of changing group size and the stability of group membership when groups are defined by the possession of a shared language, and when groups with different languages come into contact and compete for members. We take an ecological approach, as promoted in linguistics by Mufwene (Mufwene, 2002) and Nettle (Nettle, 1999), among others. Following the paper of Abrams and Strogatz (Abrams & Strogatz, 2003), which presented a two-language competition model to explain historical data on the decline of endangered languages, a number of modelling approaches to this topic have been published. Patriarca and Leppänen (Patriarca & Leppänen, 2004) set up a reaction-diffusion model and showed that if both languages are initially separated in space and interact only in a narrow transition region, then preservation of the subordinate language is possible. Further, Pinasco and Romanelli (Pinasco & Romanelli, 2006) developed an ecological model of Lotka-Volterra type which allows coexistence of both languages in only one zone of competition. Very recently, Minett and Wang developed an interesting extension of the original Abrams and Strogatz model by including bilingualism and a social structure.ᵃ The present paper should be seen as a further generalisation of the above approaches. We describe the interaction and growth dynamics of two competing languages in a reaction-diffusion competition model.
However, we also include a bilingual component, following (Baggs & Freedman, 1993), and contrast the results with the findings of the Minett and Wang model. In our model, language switching cannot occur directly from one monolingual state to the other. There must be an intermediate step - the bilingual state. We develop a model which includes growth, spread and interaction of all three sub-populations of speakers. The reproduction of speakers is described by a logistic growth function with a 'common carrying capacity', which restricts the sum of frequencies of the monolingual and bilingual components. The spatial spread is modeled by a diffusion term, and the different conversion mechanisms are included as competition terms. We are interested in long-term equilibria of the three components, and derive existence and stability conditions for these states. We show that, depending on environmental conditions, either coexistence of all three components or extinction of one monolingual component and the bilingual component is possible. Figure 1 shows an example of the course of language competition when each language is dominant in its 'home range'. The blue and red dots show the presence of speakers of the different languages. Growth and spread lead to an interaction zone. There both languages put pressure on each other, and as a result a bilingual group (green dots) emerges. The competitive strengths of both languages then determine whether individuals of the bilingual group stay bilingual, or switch to one of the monolingual groups. Figure 1 (right) shows a stable long-term equilibrium in which all three components coexist.

ᵃA number of other mathematical approaches to language competition exist, including agent-based models (Castelló et al., 2007) and Monte Carlo simulations based on game theory (Kosmidis et al., 2005), some of which consider bilingualism (Baggs & Freedman, 1993; Castelló et al., 2007). Schulze and Stauffer have published a review of such work by physicists (Schulze & Stauffer, 2006).
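The growth, diffusion and conversion dynamics just described can be written schematically as a three-compartment reaction-diffusion system. The notation below is ours, not the authors' exact parameterisation: u₁ and u₂ are the monolingual densities, b the bilingual density, dᵢ diffusion constants, rᵢ growth rates, K the common carrying capacity, and cᵢ, eᵢ the conversion rates into and out of the bilingual state.

```latex
% Schematic sketch only -- our notation, assumed forms of the terms.
\begin{align*}
\frac{\partial u_1}{\partial t} &= d_1 \nabla^2 u_1
  + r_1 u_1 \Bigl(1 - \frac{u_1 + u_2 + b}{K}\Bigr)
  - c_1 u_1 u_2 + e_1 b,\\
\frac{\partial u_2}{\partial t} &= d_2 \nabla^2 u_2
  + r_2 u_2 \Bigl(1 - \frac{u_1 + u_2 + b}{K}\Bigr)
  - c_2 u_1 u_2 + e_2 b,\\
\frac{\partial b}{\partial t} &= d_3 \nabla^2 b
  + r_3 b \Bigl(1 - \frac{u_1 + u_2 + b}{K}\Bigr)
  + (c_1 + c_2)\, u_1 u_2 - (e_1 + e_2)\, b.
\end{align*}
```

Note the bookkeeping: monolinguals lost through contact with the other language (the c-terms) enter the bilingual pool, and bilinguals resolving to one language (the e-terms) leave it, so switching always passes through the bilingual state.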
Figure 1. Example of language competition in which the parameter values lead to the stable coexistence of the two monolingual (red and blue) and of the bilingual (green) components.
References Abrams, D., & Strogatz, S. (2003). Modelling the dynamics of language death. Nature, 424, 900. Baggs, I., & Freedman, H. (1993). Can the speakers of a dominated language survive as unilinguals? Mathl. Comput. Modelling, 18, 9-18. Mufwene, S. (2002). Colonisation, globalisation, and the future of languages in the twenty-first century. Int. J. on Multicultural Societies, 4(2), 162-193. Nettle, D. (1999). Linguistic Diversity. Oxford: Oxford University Press. Patriarca, M., & Leppänen, T. (2004). Modeling language competition. Physica A, 338, 296-299. Pinasco, J., & Romanelli, L. (2006). Coexistence of languages is possible. Physica A, 361, 355-360. Schulze, C., & Stauffer, D. (2006). Recent developments in computer simulations of language competition. Computing in Science and Engineering, 8, 60-67.
ACCENT OVER RACE: THE ROLE OF LANGUAGE IN GUIDING CHILDREN'S EARLY SOCIAL PREFERENCES KATHERINE D. KINZLER Department of Psychology, Harvard University KRISTIN SHUTTS Department of Psychology, Harvard University EMMANUEL DUPOUX LSCP, EHESS, CNRS, 29 Rue d'Ulm, Paris, 75005, France ELIZABETH S. SPELKE Department of Psychology, Harvard University
Gender, age, and race have long been considered the primary categories by which adults and children divide the social world. However, there is reason to doubt the role of any of these categories in the evolution of intergroup conflict. In neither ancient nor modern times were human groups composed solely of individuals of one gender or one age. While race may act as a marker for group membership today, in evolutionary times groups separated by small geographic distances did not differ in physical properties such as race. Rather, our current attention to race may reflect a system that evolved for other purposes (Kurzban, Tooby, & Cosmides, 2001). In contrast to race, neighboring groups in ancient times likely differed in the language or accent with which they spoke. Cognitive evolution therefore may have encouraged attention to language and accent as a mechanism for determining who is a member of us, and who is a member of them. The present research investigates the origins of attention to language as a social grouping factor. If language is indeed a psychologically salient factor we use to make judgments about novel individuals, it might be observed early in development. Moreover, differences in accent and language may trump differences in race in importance. Experiment 1 investigated young infants' looking preferences towards native speakers, finding that infants prefer to look longer at someone who
previously spoke in a native language compared to a foreign language, as well as a native accent compared to a foreign accent (Kinzler, Dupoux, & Spelke, 2007). Experiment 2 tested infants' social preferences for native speakers more directly (Kinzler et al., 2007). In this study, 10-month-old infants in the U.S. and France viewed movies of an English-speaking actress and a French-speaking actress. Following this, silently and in synchrony, each speaker held up identical toys and offered them to the baby. Just at the moment when the toys disappeared off screen, two real toys appeared for the baby to grasp, giving the illusion that they came from the screen. Infants in Paris reached for toys offered by the French speaker, and infants in Boston reached for toys offered by the English speaker, even though the toys were identical and the interactions non-linguistic in nature. In-progress research with 10-month-old infants shows that, in contrast to the effects observed with language, infants do not preferentially accept a toy from a member of their own race compared to a member of a different race. Therefore, language, rather than race, influences children's early interactions with others. In Experiment 3, two-and-a-half-year-old children demonstrated pro-social giving to a native-language speaker over a foreign-language speaker. Again, this effect did not obtain with race: children gave equally to own-race and other-race individuals. Experiment 4 tested older children's explicit friendship choices based on language. Five-year-old children demonstrated social preferences for native speakers over foreign speakers or speakers with a foreign accent, and these preferences were not due to the intelligibility of the speech. Finally, although White English-speaking children stated explicit preferences for White children in isolation, when accent was pitted against race, children chose to be friends with someone who was Black and spoke in a native accent.
Together, this research provides evidence of the robust effect of language on early social cognition, and its relative importance compared to race in children's social reasoning. Children, therefore, may attend to social factors that were important indicators of group membership throughout cognitive evolution. References Kinzler, K. D., Dupoux, E., & Spelke, E. S. (2007). The native language of social cognition. Proceedings of the National Academy of Sciences of the United States of America, 104, 12577-12580.
Kurzban, R., Tooby, J., & Cosmides, L. (2001). Can race be erased? Coalitional computation and social categorization. Proceedings of the National Academy of Sciences of the United States of America, 98, 15387-15392.
LANGUAGE, CULTURE AND BIOLOGY: DOES LANGUAGE EVOLVE TO BE PASSED ON BY US, AND DID HUMANS EVOLVE TO LET THAT HAPPEN? SIMON KIRBY School of Philosophy, Psychology & Language Sciences, University of Edinburgh, 40 George Square, Edinburgh, EH8 9LL, UK
Over the course of the EvoLang series of conferences it has become increasingly clear that two senses of the term "language evolution" have emerged. When we think of the evolution of language, do we mean the evolution of the human faculty for language, or the evolution of language itself? Is the principal evolutionary mechanism natural selection in the biological sense, or some kind of cultural analog? It might be thought that the quest to understand the origins of human language should focus on the former, purely biological, question. After all, the cultural evolution of language could be considered synonymous with diachronic linguistics - a field with very different explanatory aims. I believe this thinking is flawed. Instead, I will argue that in order to have a satisfactory understanding of the origins of our faculty of language we must understand far better the mechanisms of cultural evolution, and the implications they have for the biological evolution of our species. In this talk, I will survey the initial suggestive evidence - mathematical, computational, and experimental - for two broad hypotheses relating to the evolution of language, and give an overview of the implications of these hypotheses should they eventually be supported. The first hypothesis, aspects of which can be found in many authors' work, is: The biological hypothesis: Humans have the capacity for language primarily because of two quite separate preadaptations. Firstly, we are one of a diverse set of species capable of vocal learning (a feat no other primate is capable of). That is, we are able to acquire, through observation, sequentially structured gestural signalling. Secondly, we are able to infer intentions in others that are complex enough to have internal structure. I call these preadaptations because I am claiming that neither is necessarily the result of an adaptation to the functions presumed to be fulfilled by modern human language (e.g.
“the transmission of propositional structures over a serial interface”, Pinker & Bloom, 1990). Arguably, either can be found in other species, and humans are unique solely in having the combination. This leaves a separate question of what pressures led to their evolution, which I will not
address here. However, it is possible that the former arose as a fitness signaler (e.g. Ritchie et al., submitted). It is reasonable to assume that the latter may be adaptive in any social species with the cognitive wherewithal to achieve it. The combination of these two traits sets the scene for a protolanguage that pairs complex sequences with (potentially) complex meanings. It also provides something that is potentially far more significant, namely the substrate for a new kind of evolutionary system: a complex communication system that is culturally transmitted. This leads to the main topic of my talk: The cultural hypothesis: Language structure is the inevitable product of cultural adaptation to two competing pressures: learnability and expressivity. Note that these are pressures acting on the new evolving entity (language), not on the old evolving entity (humans). They are the automatic consequences of the fact that language is culturally transmitted, and they have profound explanatory force, which we are only beginning to discover. For example, we are now fairly sure that this means we can explain significant language universals without having to assume strong innate constraints on language acquisition (Kirby et al., 2007). Indeed, it may be the case that the evolutionary mechanisms involved in language lead naturally to a situation where there is little specifically linguistic content to innateness and not much of language structure is the result of natural selection. The picture emerging from computational and mathematical models, as well as a growing number of experimental studies, is one where language adapts to maximise its own chances of survival, providing support for the organismic metaphors of Christiansen (1994, and later work), Deacon (1997) and others. This kind of adaptive system is only possible because of our unique biology, but it is far from clear that this enabling biology arose because of language. References Christiansen, M. H. (1994).
Infinite Languages, Finite Minds: Connectionism, Learning and Linguistic Structure. PhD thesis, University of Edinburgh, Scotland. Deacon, T. W. (1997). The Symbolic Species: The Co-evolution of Language and the Brain. W. W. Norton. Kirby, S., Dowman, M., and Griffiths, T. (2007). Innateness and culture in the evolution of language. Proceedings of the National Academy of Sciences, 104(12):5241-5245. Pinker, S. and Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences, 13(4):707-784. Ritchie, G., Kirby, S., and Hawkey, D. (submitted). Song learning as an indicator mechanism: Modelling the developmental stress hypothesis. Journal of Theoretical Biology.
Selected Publications
Kirby, S., Dowman, M., and Griffiths, T. (2007). Innateness and culture in the evolution of language. Proceedings of the National Academy of Sciences, 104(12):5241-5245. [Demonstrates how strong universals can arise without strong innateness.]
Ritchie, G. and Kirby, S. (2007). A possible role for selective masking in the evolution of complex, learned communication systems. In Lyon, C., et al., eds, Emergence of Communication and Language, 387-402. Springer Verlag. [Explores surprising interactions between biological and cultural evolution of birdsong.]
Brighton, H., Smith, K., and Kirby, S. (2005). Language as an evolutionary system. Physics of Life Reviews, 2:177-226. [Synthesises a number of models treating language itself as a complex adaptive system.]
Kirby, S., Smith, K., and Brighton, H. (2004). From UG to universals: linguistic adaptation through iterated learning. Studies in Language, 28(3):587-607. [Presents the iterated learning model of cultural evolution for linguists.]
Christiansen, M. and Kirby, S., editors (2003). Language Evolution. Oxford University Press. [An edited collection surveying the state of the art.] Kirby, S. (2002). Learning, bottlenecks and the evolution of recursive syntax. In Briscoe, T., editor, Linguistic Evolution through Language Acquisition: Formal and Computational Models, chapter 6, pages 173-204. Cambridge University Press. [Demonstrates the emergence of recursive compositionality in an iterated learning model.]
Kirby, S. (2002). Natural language from artificial life. Artificial Life, 8(2):185-215. [Surveys the computational models of language evolution.] Kirby, S. (2001). Spontaneous evolution of linguistic structure: an iterated learning model of the emergence of regularity and irregularity. IEEE Transactions on Evolutionary Computation, 5(2):102-110. [Shows how cultural adaptation leads to a regularity/frequency interaction in morphology.] Kirby, S. (2000). Syntax without natural selection: How compositionality emerges from vocabulary in a population of learners. In Knight, C., editor, The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form, pages 303-323. Cambridge University Press. [Presents the first iterated learning model of the cultural evolution of language.]
Kirby, S. (1999). Function, Selection and Innateness: the Emergence of Language Universals. Oxford University Press. [Sets out mechanisms of linguistic adaptation - how language universals are shaped by language users.]
Kirby, S. (1997). Competing motivations and emergence: explaining implicational hierarchies. Language Typology, 1(1):5-32. [Shows how linguistic adaptation can provide explanations for a particular type of universal structure.]
THREE ISSUES IN MODELING THE LANGUAGE CONVERGENCE PROBLEM AS A MULTIAGENT AGREEMENT PROBLEM
KIRAN LAKKARAJU¹ AND LES GASSER¹,² ¹Computer Science Department, ²Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign {klakkara | gasser}@uiuc.edu
Introduction A language is useless unless it is shared. Individuals and subgroups modify languages by adding new words, creating new grammatical constructions, etc., and propagating these changes through contact. To maintain communicability over time, the population as a whole must converge (possibly within some small diversity limit) to agreement on a "common" language. Abstractly, we can view this process as a Multiagent Agreement Problem (MAP): individual agents, each in its own state (e.g., speaking some language), change state through interaction to better match the states of others, with the desired end configuration being that all agents converge to the same state. The language convergence problem (converging a population of initially linguistically diverse agents to a single language) is clearly a MAP: the agents' states are their languages, and states change via learning from communicative interactions. MAPs with differing conditions have been studied in a wide range of fields, including distributed computing, multi-agent systems, sensor networks, and opinion dynamics, to name a few (Lakkaraju & Gasser, 2007). Many powerful models for studying MAPs have emerged. Can we leverage the work in MAPs to develop a general understanding of the language convergence problem? We suggest that most current MAP models are not applicable to language convergence problems because they do not account for three language convergence issues: the complexity of language, the limited discernibility of language via interaction, and the large potential agreement space for language convergence. Before existing, powerful work in MAPs can be applied to language convergence, MAP models must be extended to account for these properties. Below we describe what is needed for this. Languages are Complex Most current MAP models assume that agents are trying to agree upon one state from a set of unstructured possibilities. Clearly language is a structured, complex entity in which links between components are crucial. We view a language as made up of at least three constituents: meanings, grammar, and lexicon. Meanings comprise all the issues that can be expressed. The lexicon contains relationships between lexical items and meanings. Grammar specifies how to compose lexemes, and how sentential structure expresses semantic information. These three components are interlinked, and changing one of them can have a great effect on the other components and on communicability with other agents.
Limited Discernibility Most MAP models assume that agents can unambiguously determine the state of other agents through interaction. However, for the case of language, where "state" means "language spoken," this assumption does not hold. In the language convergence problem agents often interact by playing language games. There are a variety of games, and they allow two agents to exchange information about their respective languages. The information content of these exchanges is always language samples, which hearers use to infer properties of speakers' languages. The number of samples is limited, and in general insufficient to completely determine the speaker's language. Thus agents have limited discernibility of others' states - their languages. This is insufficient to satisfy the typical MAP criterion of complete state discernibility. Large Agreement Space Each state in an agreement space (AS) is a possible point of agreement. In the MAP problem "meeting room scheduling," for example, this is the set of times at which meetings can be held; agreement is convergence on a single commonly-available time. For language convergence, the AS is the set of possible languages that agents could speak; agreement means speaking the same language from this space. In most current MAP models the agreement space is assumed to be discrete and very small (e.g. {0, 1} in Shoham & Tennenholtz, 1997). Clearly, for language convergence problems, MAP models must handle very large agreement spaces. Conclusion Our work in this area concerns defining current shortcomings in MAP techniques and creating new approaches specifically tailored to solving language convergence problems in a general way, especially for the evolutionary design of large communicative groups of artificial agents. References Lakkaraju, K., & Gasser, L. (2007). A unified framework for multi-agent agreement. In Proceedings of AAMAS '07. Honolulu, Hawaii. Shoham, Y., & Tennenholtz, M. (1997).
On the emergence of social conventions: modeling, analysis, and simulations. Artificial Intelligence, 94(1-2), 139-166.
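The limited-discernibility point above can be made concrete with a minimal naming game (a hedged illustration in the style of language-game models generally, not these authors' framework): each interaction reveals only a single sample of the speaker's inventory, yet the population still converges on a shared word. All parameter values here are illustrative assumptions.

```python
import random

def naming_game(n_agents=20, steps=4000, seed=0):
    """Minimal naming game: agents converge on one word for one meaning."""
    rng = random.Random(seed)
    vocab = [set() for _ in range(n_agents)]  # each agent's word inventory
    next_word = 0                             # counter for inventing new words
    for _ in range(steps):
        s, h = rng.sample(range(n_agents), 2)  # pick speaker and hearer
        if not vocab[s]:                       # speaker invents if inventory empty
            vocab[s].add(next_word)
            next_word += 1
        # Limited discernibility: the hearer sees ONE sample, not the
        # speaker's whole inventory.
        word = rng.choice(sorted(vocab[s]))
        if word in vocab[h]:                   # success: both collapse to this word
            vocab[s] = {word}
            vocab[h] = {word}
        else:                                  # failure: hearer adopts the word
            vocab[h].add(word)
    return vocab

final = naming_game()
words = set().union(*final)
print(len(words))  # typically 1: the population has converged
```

The agreement space here is unbounded (words are invented on the fly), which is the paper's third point: convergence is reached without enumerating the space of possible "languages" in advance.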
THE DEVELOPMENT OF A SOCIAL SIGNAL IN FREE-RANGING CHIMPANZEES MARION LAPORTE, KLAUS ZUBERBÜHLER School of Psychology, University of St Andrews, St Andrews, KY16 9JP, UK
Little research has been conducted on the question of how our closest living relatives, the chimpanzees, learn to produce and comprehend their own natural vocal repertoire from early infancy. Current theories and models of vocal development and vocal learning rely almost exclusively on research conducted with non-primates, mainly songbirds. However, there are a number of reasons to remain cautious when trying to apply these models to non-human primate vocal development or speech acquisition. For example, as with non-human primates, human infants go through a lengthy phase of non-linguistic vocal behaviour prior to speech production, which is largely responsive to ongoing social events. Birdsong, in contrast, is a sexually selected behaviour that functions in maximising reproductive success; and as such is probably based on fundamentally different psychological mechanisms. In this study, we present data on vocal development in a community of free-ranging chimpanzees at Budongo Forest, Uganda. We were particularly interested in the patterns that underlie the emergence of one specific signal, the pant-grunt vocalisation. When free-ranging chimpanzees encounter a higher-ranking community member they typically produce pant-grunts, which essentially function as a greeting signal. Pant-grunts are emitted at close range and, due to their social unidirectionality, are important manifestations of how callers assess their own social relations. We investigated the development of pant-grunts in infant chimpanzees to document (a) its emergence within an individual’s vocal repertoire, (b) its appropriate usage as a social signal and (c) the social learning processes that take place between infant callers and their mothers or other community members. 
We found that, unlike other call types, appropriate usage of pant-grunts required a relatively sophisticated understanding of the various social relations amongst community members, the rules of which most likely had to be inferred by
observational learning. Pant-grunts emerged at the age of about 5 months, which usually coincided with infants passing through a stage of intense social behaviour usually involving the mother. During this initial period (between 5 and 18 months), pant-grunts were used in a way that differed significantly from adult usage, possibly serving a different function. At this early stage, we found no evidence that infants understood the social dominance hierarchy within the community, and infants used pant-grunts as a means to interact with other community members and participate in social activities. With increasing age and social experience, call use became more focused, with pant-grunts increasingly used as a greeting signal towards higher-ranking community members. We discuss the role of social learning processes and individual experience during this transition.
GESTURAL MODES OF REPRESENTATION - A MULTIDISCIPLINARY APPROACH
KATJA LIEBAL Department of Psychology, University of Portsmouth, King Henry 1st Street, Portsmouth, PO1 2DY, UK HEDDA LAUSBERG Department of Psychosomatic Medicine, Friedrich Schiller University Jena, Bachstrasse 18, 07743 Jena, Germany ELLEN FRICKE, CORNELIA MÜLLER Department of Cultural Studies, European University Viadrina Frankfurt (Oder), Grosse Scharrnstrasse 59, 15239 Frankfurt (Oder), Germany
This talk will present first results of an interdisciplinary project which investigates the structural properties of gestures from a linguistic, a neurocognitive, and an evolutionary perspective. The focus is on one fundamental aspect of these structures, namely the techniques underlying gesture creation, termed gestural modes of representation (Müller, 1998a, b). Four basic modes of representation are distinguished: the hands model a three-dimensional shape of an object; the hands outline the two-dimensional form of an object; the hands embody the object (a flat hand embodies a piece of paper or a window); or the hands reenact an everyday activity such as opening a window or turning a car key. In studies on patients with brain lesions, similar categories (pantomime, body-part-as-object) have been found to be generated in different brain regions (Lausberg, Cruz, Kita, Zaidel, & Ptito, 2003). On this basis, neuroscientific studies contribute to identifying formal and semantic structures of gestures. Comparative studies of gestural structures in human and nonhuman primates will investigate more closely which of the linguistically identified structures in human gestures are present in our closest relatives, the nonhuman great apes, including orangutans, gorillas, chimpanzees and bonobos (Liebal, Müller & Pika, 2007). This will sharpen our
understanding of the different kinds of structures present in human gestures and reveal which aspects of the human techniques of gesture creation are also present in nonhuman primates. Determining exactly which structures overlap across primate species and which ones evolved uniquely with human language will contribute to the current debate in evolutionary anthropology that pits gesture-first theories of language evolution (Hewes, 1973; Corballis, 2002) against those in which gesture and speech emerged in concert (Arbib, 2003, 2005; McNeill, 2005). Acknowledgements
We would like to thank Volkswagen-Stiftung for funding this project. References
Arbib, M. A. (2003). Protosign and protospeech: An expanding spiral. Behavioral and Brain Sciences, 26(2), 199-266.
Arbib, M. A. (2005). Interweaving protosign and protospeech: Further developments beyond the mirror. Interaction Studies, 6(2), 145-171.
Corballis, M. C. (2002). From hand to mouth: The origins of language. Princeton, NJ: Princeton University Press.
Hewes, G. W. (1973). Primate communication and the gestural origin of language. Current Anthropology, 14(1-2), 5-24.
Lausberg, H., Cruz, R. F., Kita, S., Zaidel, E., & Ptito, A. (2003). Pantomime to visual presentation of objects: left hand dyspraxia in patients with complete callosotomy. Brain, 126, 343-360.
Liebal, K., Müller, C., & Pika, S. (Eds.). (2007). Gestural communication in nonhuman and human primates. Amsterdam: John Benjamins.
McNeill, D. (2005). Gesture and thought. Chicago: Chicago University Press.
Müller, C. (1998a). Redebegleitende Gesten: Kulturgeschichte - Theorie - Sprachvergleich. Berlin: Berlin Verlag.
Müller, C. (1998b). Iconicity and gesture. In S. Santi, I. Guaitella & C. Cavé (Eds.), Oralité et gestualité: communication multimodale, interaction: actes du colloque Orage'98 (pp. 321-328). Montréal, Paris: L'Harmattan.
EXTRACOMMUNICATIVE FUNCTIONS OF LANGUAGE: VERBAL INTERFERENCE CAUSES CATEGORIZATION IMPAIRMENTS GARY LUPYAN Department of Psychology, Cornell University, Ithaca, NY, 14850 USA
A question that is centrally linked to the study of language evolution is whether language facilitates or makes possible certain cognitive acts. Such extra-communicative conceptions of language (e.g., Clark, 1998) argue that in addition to its adaptive value as a communicative tool, language may have evolved, in part, as a cognitive aid. One source of evidence for this claim comes from the study of aphasic patients, who have been observed to suffer not only from the communication deficits that define aphasia, but also from impairments on a wide range of tasks that do not require the overt use of language. Indeed, observations that aphasic patients suffer deficits on a range of nonverbal tasks have led some to conclude that one of the main functions of language is the ability to "fixate thoughts," and thus that a "defect in language may damage thinking" (Goldstein, 1948, p. 115). The most consistent and profound non-linguistic deficits in aphasia are seen in a class of categorization tasks that require the patient to selectively attend to particular stimulus features. For instance, many patients are impaired at sorting objects by size while ignoring shape. After conducting and reviewing a number of such studies, Cohen, Kelter, and colleagues concluded that aphasics have a "defect in the analytical isolation of single features of concepts" (Kelter et al., 1976; Cohen, Kelter, & Woll, 1980; Cohen et al., 1981). All tested subtypes of aphasic patients are "deficient if the task requires isolation, identification, and conceptual comparison of specific individual aspects of an event," but are equal to controls "when judgment can be based on global comparison" (Cohen et al., 1980). To illustrate, consider patient LEW, who is profoundly anomic but has excellent comprehension. This patient is severely impaired on taxonomic grouping tasks with not only complex items like faces, but even the simplest
perceptual stimuli, being unable to sort colors or shapes into meaningful categories (Davidoff & Roberson, 2004). One intriguing possibility is that such impairments are due to the failure of language to maintain appropriate conceptual representations. If so, then normal subjects placed under conditions of verbal interference may exhibit some of the same symptoms exhibited by aphasic populations, in particular a difficulty in isolating and focusing on specific perceptual dimensions. To test this hypothesis, participants performed an odd-one-out categorization task: given three objects, participants had to choose the object that did not belong based on color, size, or thematic relationship (e.g., for a triad consisting of a potato, a balloon, and a cake, potato was the correct choice). Verbal interference was implemented as a within-subject manipulation by having participants rehearse number strings during some of the categorization trials. Two experiments used pictures and words as stimuli, respectively. Based on the findings that aphasic patients have particular difficulties with tasks requiring isolation of perceptual features, it was predicted that verbal interference would have a stronger effect on categorization by color and size than on categorization requiring a focus on broader associations (thematic relations). The design for this experiment was borrowed from Davidoff and Roberson's study (2004, Exp. 7), which was used with the anomic patient LEW and in which he showed the predicted effect. Verbal interference resulted in an overall slowing of responses. Critically, there was a significant interference-condition x trial-type interaction, with a significant slowing of responses for perceptual-based categorization (color, size) and no significant effect for trials requiring categorization based on thematic relations. This effect remained when words rather than pictures were used as stimuli.
A control experiment using a visuospatial interference task that replaced the to-be-remembered number strings with dot patterns failed to find this interaction. These results provide support for the hypothesis that certain categorization tasks may depend in some way on language even when they do not require any type of verbal response. The pattern of results in normal participants placed under verbal interference is strikingly similar to that found in aphasic patients, suggesting that language may play an on-line role in maintaining categorical distinctions and in helping to focus attention on specific perceptual dimensions. These results speak to possible adaptive benefits of language that go beyond interpersonal communication.
FORM-MEANING COMPOSITIONALITY DERIVES FROM SOCIAL AND CONCEPTUAL DIVERSITY GARY LUPYAN Department of Psychology Cornell University Ithaca, NY, 14850 USA
RICK DALE Department of Psychology University of Memphis Memphis, TN, 38152 USA
Language structure is often considered separate from its socio-cultural bearings (e.g., Chomsky, 1995). Such an assumption may obscure rich interaction between the structures present in a language and the social and conceptual circumstances in which they function. Recently, Wray and Grace (2007), drawing on earlier work by Thurston (1994), have argued for distinguishing two broad language types that reflect this interaction. Esoteric (inward-facing) languages are spoken within small groups and learned by relatively few outsiders. Exoteric (outward-facing) languages (of which English is an extreme example) are spoken by large groups and learned by many adults as second languages. Exoteric languages tend to have more open-class words than esoteric languages, possess far simpler morphological systems, and can often be well characterized by rule-based grammars. Semantics in exoteric languages are generally compositional: one can derive the meaning of the whole from the meanings of the parts. In contrast, esoteric languages have fewer open-class words but complex morphological systems. They are highly context dependent, given to numerous exceptions that withstand regularization, and are often characterized by polysynthesis and morphologically-conditioned allomorphy. Wray and Grace (2007) explain the correspondence between language usage (esoteric vs. exoteric) and language structure through evolutionary reasoning. They argue that the characteristics of esoteric languages, though undaunting to infants, make them substantially difficult for an adult outsider to learn. Esoteric usage thus marks in-group members by the speakers' ability to use this linguistic custom, having acquired it during childhood. However, an increasing need to interact with outsiders and about novel topics, insofar as it requires recombining existing elements into novel sentences that are understood by strangers, places a pressure on the language to become more transparent and compositional.
This
makes the language easier to learn by new adult users. Compositionality, common to exoteric languages, is thus supported by a need to communicate with strangers. Compositionality also allows speakers to easily generate new meanings through recombination of familiar elements, allowing for comprehension without the need for extended in-group experience. Thus, the property of compositionality, rather than being an innate language universal, could be a product of out-group interaction, of "talking with strangers" (Wray & Grace, 2007). The current work tests this fascinating hypothesis in a computational framework. We tested two predictions derived from Wray and Grace's analysis. First, we expected that learning basic grammatical structure common to esoteric languages would be easy for naïve learners, but progressively harder for learners with experience in another language. In contrast, grammars common to exoteric-type languages should continue to be learnable by late learners. Second, because grammars common to exoteric languages have more transparent form-to-meaning mappings, we expected that networks exposed to these grammars should be better able to generalize their linguistic knowledge to novel contexts. A fully-recurrent neural network was trained to map phonological forms to semantics. The networks were trained on sentences corresponding to schematic structures of esoteric and exoteric languages. The exoteric-type grammar consisted of a large vocabulary of lexical morphemes with fixed semantics and a few closed-class morphemes which, rather than having fixed semantics, modified the semantics of neighboring open-class words. In such grammars context plays a limited role and there exists a transparent form-to-meaning mapping. The esoteric-type grammars consisted of a much greater proportion of closed-class words and a smaller lexicon. This greater number and prevalence of non-lexical morphemes meant that the lexical semantics were much more context-dependent.
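The intuition behind the second prediction - that transparent form-to-meaning mappings generalize to novel recombinations while holistic mappings do not - can be sketched without a neural network. The following toy simulation is not the authors' recurrent-network experiment: the two-feature semantic space, the lexicons (caricatured at the extremes), the cross-situational counting learner, and the held-out meanings are all hypothetical stand-ins.

```python
from collections import defaultdict

# Hypothetical two-feature semantic space (agent, action).
AGENTS = ["dog", "cat", "bird", "fox"]
ACTIONS = ["run", "eat", "hide"]
MEANINGS = [(a, v) for a in AGENTS for v in ACTIONS]

# Exoteric-style mapping: one dedicated morpheme per meaning component,
# so the form-to-meaning mapping is transparent and compositional.
def exoteric(m):
    agent, act = m
    return (agent.upper(), act.upper())

# Esoteric-style mapping, caricatured at the extreme: each whole meaning
# gets its own unanalyzable form, so nothing can be recombined.
FUSED = {m: "w%d" % i for i, m in enumerate(MEANINGS)}
def esoteric(m):
    return (FUSED[m],)

def generalization(lang):
    """Train a simple cross-situational learner (morpheme/feature
    co-occurrence counts) on most meanings, then decode held-out
    recombinations of components that were all seen during training."""
    held_out = {("fox", "run"), ("cat", "hide"), ("bird", "eat")}
    counts = defaultdict(lambda: defaultdict(int))
    for m in MEANINGS:
        if m in held_out:
            continue
        for morph in lang(m):
            for feat in m:
                counts[morph][feat] += 1
    correct = 0
    for m in held_out:
        # Decode each morpheme as its most strongly associated feature.
        guess = {max(counts[w], key=counts[w].get) for w in lang(m) if counts[w]}
        correct += guess == set(m)
    return correct / len(held_out)

print("exoteric generalization:", generalization(exoteric))  # transparent mapping
print("esoteric generalization:", generalization(esoteric))  # holistic mapping
```

The compositional lexicon decodes all held-out meanings (every morpheme was seen in other combinations), while the holistic one decodes none, mirroring the predicted generalization gap.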
Results provided support for both predictions. First, naïve networks could learn esoteric and exoteric grammars to roughly equal proficiency. Critically, age of exposure mattered more for esoteric than for exoteric grammars, with the former being disproportionately more difficult to learn by more "mature" networks. Second, as predicted, generalization to novel contexts was more difficult for esoteric than for exoteric languages. We aim to integrate two approaches to language and its evolution: anthropological theories of sociocultural influences on language, and psychological theories of computational mechanisms for language. In this integrated view, the structural characteristics of language have their origin in the interaction between sociocultural and computational constraints. Generative recursion, long considered foundational to the emergence of our linguistic abilities, may simply be derivative of this interaction.
LANGUAGE AS KLUGE GARY F. MARCUS Department of Psychology, New York University, New York, NY 10012, USA
In fields ranging from reasoning to linguistics, the idea of humans as perfect, rational, optimal creatures is making a comeback - but should it be? Hamlet's musings that the mind was "noble in reason ... infinite in faculty" have their counterparts in recent scholarly claims that the mind consists of an "accumulation of superlatively well-engineered designs" shaped by the process of natural selection (Tooby and Cosmides, 1995), in the 2006 suggestion of Bayesian cognitive scientists Chater, Tenenbaum and Yuille that "it seems increasingly plausible that human cognition may be explicable in rational probabilistic terms and that, in core domains, human cognition approaches an optimal level of performance", and in Chomsky's recent suggestion that language is close "to what some superengineer would construct, given the conditions that the language faculty must satisfy". In this talk, I will argue that this resurgent enthusiasm for rationality (in cognition) and optimality (in language) is misplaced, for three reasons. First, I will suggest that recent empirical arguments in favor of human rationality rest on a fallacy of composition, implicitly but mistakenly assuming that evidence of rationality in some (carefully analyzed) aspects of cognition entails that the broader whole (i.e. the human mind in toto) is rational. In fact, establishing that some particular aspect of cognition is optimal (or perfect, or near optimal) is not tantamount to showing that the system as a whole is; current enthusiasm for optimality overlooks the possibility that the mind might be suboptimal even if some (or even many) of the components of cognition have been optimized. Second, I will argue that there is considerable empirical evidence (much of it well known, but rarely given due attention in the neo-Rationalist literature) that militates against any strong claim of human cognitive or linguistic perfection. Finally, I will argue that the
assumption that evolution drives creatures towards rationality or "superlative adaptation" is itself theoretically suspect, and ought to be considerably tempered by recognition of what Stephen Jay Gould called "remnants of history", or what might be termed evolutionary inertia. I will close by suggesting that the mind might be better seen as what engineers call a kluge: clumsy and inelegant, yet remarkably effective. References
Fisher, S. E., & Marcus, G. F. (2006). The eloquent ape: genes, brains and the evolution of language. Nature Reviews Genetics, 7, 9-20.
Marcus, G. F. (2004). Before the word. Nature, 431, 745.
Marcus, G. F. (2004). The birth of the mind: How a tiny number of genes creates the complexities of human thought. New York: Basic Books.
Marcus, G. F. (2006). Cognitive architecture and descent with modification. Cognition, 101, 443-465.
Marcus, G. F. (2008). Kluge: The haphazard construction of the human mind. Boston: Houghton Mifflin. [UK edition: Faber & Faber].
Marcus, G. F., & Rabagliati, H. (2006). The nature and origins of language: How studies of developmental disorders could help. Nature Neuroscience, 10, 1226-1229.
ORIGINS OF COMMUNICATION IN AUTONOMOUS ROBOTS
DAVIDE MAROCCO
Institute of Cognitive Sciences and Technologies, National Research Council, Via S.M. della Battaglia, 00185, Rome
STEFANO NOLFI
Institute of Cognitive Sciences and Technologies, National Research Council, Via S.M. della Battaglia, 00185, Rome
The development of embodied autonomous agents able to self-organize a grounded communication system and use their communication abilities to solve a given problem is an exciting new field of research (Quinn, 2001; Cangelosi & Parisi, 2002). These self-organizing communication systems may have characteristics similar to those observed in animal communication (Marocco & Nolfi, 2007) or human language. In this paper we describe how a population of simulated robots evolved for the ability to solve a collective navigation problem develops individual and social/communication skills. In particular, we analyze the evolutionary origins of motor and signaling behaviors. The experimental set-up consists of a team of four simulated robots placed in an arena of 270x270 cm that contains two target areas; the robots are evolved for the ability to find and remain in the target areas, dividing themselves equally between the two. Robots communicate by producing and detecting signals up to a distance of 100 cm. A signal is a real number with a value ranging within [0.0, 1.0]. Robots' controllers consist of neural networks whose free parameters have been evolved through a genetic algorithm. After the evolutionary process, by analyzing the fitness throughout generations we observed that evolving robots are able to accomplish their task to a good extent in all replications. Moreover, the comparison between the results obtained in the normal condition and in a control condition in which robots are not allowed to detect other robots' signals indicates that the possibility to produce and detect other robots' signals is necessary to achieve optimal or close to optimal performance. To understand the evolutionary origins of the robots' communication system we analyzed the motor and signaling behavior of
evolving robots throughout generations. To reconstruct the chain of variations that led to the final evolved behavior we analyzed the lineage of the best individual of the last generation. By analyzing the motor and signaling behavior throughout generations we observed several evolutionary phases that progressively shape the final behavior by adding new communication behaviors and sensory-motor skills to the behavioral repertoire of the robots. In particular, in a first phase the robots move in the environment by producing curvilinear trajectories and avoiding obstacles, and produce two stable signals when they are located inside or outside a target area, respectively, and far from other robots. Moreover, robots produce highly variable signals when they interact with other robots located nearby. In a second phase robots progressively evolve an individual ability to remain in target areas. In particular, robots located on target areas rotate on the spot so as to remain there for the rest of the trial. In a third phase, the individual ability to remain on target areas developed in previous generations posed the adaptive basis for the development of a cooperative behavior that allows robots located alone on a target area to attract other robots toward the same target area. At this stage robots are still not able to remain in a target area in pairs. Finally, in the last evolutionary phase, we observe a number of variations that allow robots not to exit target areas when they detect the signal produced by another robot located in the same target area. During this long evolutionary phase we observed that the performance of the robots, the number of signals, and the functionalities of signals remain stable. The results obtained indicate that the signals and the meanings of signals produced by evolved robots are grounded not only in the robots' sensory-motor system but also in behavioral capabilities the robots previously acquired.
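As a rough illustration of the evolutionary machinery involved (not the actual simulation, which evolves neural-network controllers on the collective navigation task), the following sketch evolves a real-valued genotype with truncation selection, elitism, and Gaussian mutation. The genome size, population parameters, and the toy fitness function are all assumptions standing in for the paper's setup.

```python
import random

random.seed(42)

GENES = 10                 # stand-in for a controller's free parameters
POP_SIZE, GENERATIONS = 20, 100
TARGET = [0.5] * GENES     # hypothetical optimum replacing the real task score

def fitness(genotype):
    # Toy stand-in for the navigation fitness; higher is better (max is 0).
    return -sum((g - t) ** 2 for g, t in zip(genotype, TARGET))

def mutate(genotype, sigma=0.1):
    # Gaussian perturbation of every gene.
    return [g + random.gauss(0.0, sigma) for g in genotype]

# Random initial population.
pop = [[random.uniform(-1.0, 1.0) for _ in range(GENES)]
       for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    pop.sort(key=fitness, reverse=True)
    elite = pop[:POP_SIZE // 4]                  # truncation selection
    pop = elite + [mutate(random.choice(elite))  # elitism + mutated offspring
                   for _ in range(POP_SIZE - len(elite))]

best = max(pop, key=fitness)
print("best fitness after evolution:", round(fitness(best), 3))
```

Lineage analysis of the kind described above would amount to recording, at each generation, which elite parent produced each offspring, then tracing back from the final best individual.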
Moreover, the analysis of the co-adaptation of the robots' individual and communicative abilities indicates how innovations in the former might create the adaptive basis for further innovations in the latter, and vice versa.
References
Cangelosi, A., & Parisi, D. (2002). Simulating the evolution of language. London: Springer.
Marocco, D., & Nolfi, S. (2007). Communication in natural and artificial organisms: Experiments in evolutionary robotics. In C. Lyon, C. Nehaniv & A. Cangelosi (Eds.), Emergence of communication and language. London: Springer.
Quinn, M. (2001). Evolving communication without dedicated communication channels. Lecture Notes in Computer Science, 2159.
HANDEDNESS FOR GESTURAL COMMUNICATION AND NON-COMMUNICATIVE ACTIONS IN CHIMPANZEES AND BABOONS: IMPLICATIONS FOR LANGUAGE ORIGINS
ADRIEN MEGUERDITCHIAN 1,2, JACQUES VAUCLAIR 1, MOLLY J. GARDNER 3, STEVEN J. SCHAPIRO 3 & WILLIAM D. HOPKINS 2,4
1 Department of Psychology, Research Center in Psychology of Cognition, Language & Emotion, University of Provence, 29, Av. R. Schuman, Aix-en-Provence, 13621, France.
2 Division of Psychobiology, Yerkes National Primate Research Center, Atlanta, GA, 30322, USA.
3 Department of Veterinary Sciences, M.D. Anderson Cancer Center, University of Texas, Bastrop, TX, 78602, USA.
4 Department of Psychology, Agnes Scott College, Decatur, GA, 30030, USA.
Most humans show a left-hemispheric dominance for language functions (Knecht et al., 2000). Whereas such left-lateralization has historically been linked to right-handedness for manipulative actions, dominant use of the right hand is also observed for "language-related" gestures such as signing, pointing and manual movements when speaking (reviewed in Hopkins et al., 2005), suggesting that left-lateralized language areas may underlie gesture production (Kimura, 1993). Behavioral asymmetries in apes and monkeys have been studied to investigate precursors of hemispheric specialization in humans, and some of these studies have revealed continuities with humans (Hopkins, in press). For example, captive chimpanzees and olive baboons show a dominance of the right hand in bimanual manipulative actions (Hopkins et al., 2005; Vauclair et al., 2005) and, to a higher degree, for communicative gestures (Hopkins et al., 2005; Meguerditchian & Vauclair, 2006). Interestingly, in both species, the hand preferences for gestures showed no correlation with those for bimanual actions. Such findings raise the hypothesis that a specific left-lateralized communicatory cerebral system, different from the one involved in manipulative actions, may control communicative gestures, and led the authors to consider gestural behaviors as an ideal prerequisite for the emergence of language and its left-lateralization (see Corballis, 2002). To further investigate this hypothesis, the current study was undertaken to determine whether it is the communicative nature of the gestures (and not only
the motor properties) which induces a different pattern of laterality compared to non-communicative bimanual manipulative actions. Using an observational method, we measured manual preferences in samples of captive baboons and chimpanzees for two new categories of manual actions: (1) a non-communicative self-touching action (referred to as "muzzle wipe", serving as a "control" behavior) and (2) other communicative gestures previously unstudied in each species, including: a) human-directed "food begs" in baboons and b) in chimpanzees, human-directed "clapping" and all conspecific-directed gestures such as "hand slap", "extended arm", "wrist present" and "threat". The results indicated that, for both species: (1) communicative gestures show a dominance of the right hand, whereas the self-touching action does not induce population-level handedness; (2) within the same subjects, individual hand preferences for the newly investigated gestures are correlated with hand preferences for the previously investigated gestures ("hand slap" in baboons and "food begs" in chimpanzees) but are not correlated with hand preferences for muzzle wipe or bimanual actions. These results in baboons and chimpanzees may not only reveal a left-hemispheric dominance for the various communicative gestures studied (in contrast to a non-communicative action) but also support the hypothesis that a specific communicatory cerebral circuit involved in gesturing emerged in the common ancestor of baboons, chimpanzees and humans, a circuit which may constitute an ideal precursor of the language-specific cortical network in humans.
References
Corballis, M. C. (2002). From hand to mouth: The origins of language. Princeton, NJ: Princeton University Press.
Hopkins, W. D. (Ed.) (in press). Evolution of hemispheric specialization in primates. Special Topics in Primatology. American Society of Primatologists.
Hopkins, W. D., Russell, J., Freeman, H., Buehler, N., Reynolds, E., & Schapiro, S. J. (2005).
The distribution and development of handedness for manual gestures in captive chimpanzees (Pan troglodytes). Psychological Science, 6, 487-493.
Kimura, D. (1993). Neuromotor mechanisms in human communication. Oxford: Oxford University Press.
Knecht, S., Deppe, M., Draeger, B., Bobe, L., Lohman, H., Ringelstein, E. B., & Henningsen, H. (2000). Language lateralization in healthy right-handers. Brain, 123, 74-81.
Meguerditchian, A., & Vauclair, J. (2006). Baboons communicate with their right hand. Behavioural Brain Research, 171, 170-174.
Vauclair, J., Meguerditchian, A., & Hopkins, W. D. (2005). Hand preferences for unimanual and coordinated bimanual tasks in baboons (Papio anubis). Cognitive Brain Research, 25, 210-216.
THE EVOLUTION OF HYPOTHETICAL REASONING: INTELLIGIBILITY OR RELIABILITY? HUGO MERCIER Institut Jean Nicod, 29 rue d'Ulm, Paris, 75005, France
We can divide the problems encountered during language evolution into two broad categories: cognitive problems and strategic problems. Cognitive problems are constraints on the production or understanding of language. Strategic problems are linked to the maintenance of honest communication. It has been argued that hypothetical reasoning (HR) evolved as a means to overcome a specific cognitive problem, that of producing and understanding displaced reference (Harris, 2000). Here I will argue that HR instead evolved as a means to overcome strategic problems, more precisely to check communicated information in order to ensure that we are not being deceived. A first argument is theoretical. Firstly, assuming that capacities such as episodic memory were present before language evolution, there is no reason to expect that translating into language thoughts related to episodic memory (and thus having the properties of displaced reference) would be any harder than translating thoughts about the here and now. Secondly, some animal communication systems have displaced reference - the bee dance, for instance - without requiring HR. So it would seem that HR is actually not necessary to produce or understand displaced reference. HR can be useful as a means to check communicated information, though. It is well known that for communication to be evolutionarily stable, its honesty has to be maintained. Several means to enforce that honesty have been studied in humans: source monitoring, use of behavioral cues, or consistency checking, for instance (see DePaulo et al., 2003; Sperber, 2001). It has been argued that reasoning, generally, evolved as a means to persuade and evaluate information (Sperber & Mercier, in press; see Dessalles, 2007, for a related argument). HR, as a special type of reasoning, would be used for the same purposes.
In order to argue for such a view it is possible to gather different kinds of evidence. The first is related to the contexts in which HR is used. If HR evolved to understand displaced reference, it should be used in proportion to the difficulty of understanding such sentences; but if HR evolved to check communicated information, it should mainly be used when we are confronted with information we have reasons to doubt. This is generally the case for reasoning, and HR doesn't seem to be any different (see Sperber & Mercier, in press). The second is the efficiency of hypothetical reasoning used in argumentative contexts, because in these contexts people typically have to evaluate communicated information. Numerous experiments by David Green and colleagues have shown that people are proficient at using HR in such contexts (see for instance Green, Applebaum, & Tong, 2006). The third involves delineating features of HR that fit only one hypothesis. For instance, if HR is used to understand what people say, then it shouldn't systematically depart from what is meant. If, instead, HR is used to evaluate what is said, then it should depart from what is meant in at least one way: it should seek ways in which what is being communicated, if accepted, would advantage the sender. If such ways are found, then the message should be rejected. And this is what we observe, starting with young children, who are able to use a match between people's intentions and the consequences of what they state to decide whether they should believe them or not (Mills & Keil, 2005). References
DePaulo, B. M., Lindsay, J. J., Malone, B. E., Muhlenbruck, L., Charlton, K., & Cooper, H. (2003). Cues to deception. Psychological Bulletin, 129(1), 74-118.
Dessalles, J.-L. (2007). Why we talk: The evolutionary origins of language. Oxford: Oxford University Press.
Green, D. W., Applebaum, R., & Tong, S. (2006). Mental simulation and argument. Thinking and Reasoning, 12(1), 31-61.
Harris, P. (2000). The work of the imagination. London: Blackwell.
Mills, C. M., & Keil, F. C. (2005). The development of cynicism. Psychological Science, 16(5), 385-390.
Sperber, D. (2001). An evolutionary perspective on testimony and argumentation. Philosophical Topics, 29, 401-413.
Sperber, D., & Mercier, H. (in press). Intuitive and reflective inferential mechanisms. In J. S. B. T. Evans & K. Frankish (Eds.), In two minds. Oxford: Oxford University Press.
SIMULATION OF CREOLIZATION BY EVOLUTIONARY DYNAMICS
MAKOTO NAKAMURA 1, TAKASHI HASHIMOTO 2, SATOSHI TOJO 1
School of {1 Information, 2 Knowledge} Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, 923-1292, Japan
{mnakamu, hash, tojo}@jaist.ac.jp
The purpose of this abstract is to investigate the characteristics of creoles (DeGraff, 1999) using a mathematical formalization of population dynamics. Linguistic studies show that the emergence of a creole is affected by contact with other languages, the distribution of the population of each language, and similarities among the languages. Constructing a simulation model including these elements, we derive conditions for creolization from theoretical and numerical analyses. Creoles are full-fledged new languages which children of pidgin speakers acquire as their native languages. An interesting fact is that children growing up hearing syntactically simplified languages such as pidgins develop a mature language form, a creole (DeGraff, 1999). Pidgins and creoles may thus bear on the mechanism of language acquisition in infants. In particular, some properties of creoles imply the existence of an innate universal grammar. Simulation studies of language evolution can be represented by population dynamics, examples of which include an agent-based model of language acquisition proposed by Briscoe (2002) and a mathematical framework by Nowak, Komarova, and Niyogi (2001), who developed a mathematical theory of the evolutionary dynamics of language called the language dynamics equation, in which the change of language is represented as the transition of population among a finite number of languages. We modified the language dynamics based on social interaction, and then dealt with the emergence of creole (Nakamura, Hashimoto, & Tojo, 2007). Following the language dynamics equation, we assume that any language can be classified into one of a certain number of grammars. Thus, the population of language speakers is distributed over a finite number (n) of grammars {G_1, ..., G_n}. Let x_i be the proportion of speakers of G_i within the total population. Then, the language dynamics is modeled by an equation governing the transition of language speakers among languages.
Our model is different from the language dynamics equation by Nowak et al. (2001) in that we neglect the fitness
term associated with biological evolution, and focus on cultural transmission by introducing the degree of language contact, that is:
x_j(t+1) = Σ_i q_ij(t) x_i(t),
where Q(t) = {q_ij(t)} is the transition matrix among languages. Each element q_ij(t) is defined as the probability that a child of a G_i speaker obtains G_j through exposure to his/her parental language and to other languages. Q(t) depends on the distribution of the language population at time t, on the similarity among languages, and on a learning algorithm. Creoles are considered as new languages. From the viewpoint of population dynamics, we define a creole as a transition of the population of language speakers: a creole is a language which no one spoke in the initial state, but which most people have come to speak by a stable generation. Therefore, a creole is represented by G_c such that x_c(0) = 0 and x_c(t) > θ_c, where x_c(t) denotes the population share of G_c at a convergent time t, and θ_c is a certain threshold above which a language is regarded as dominant. We set θ_c = 0.9 throughout the experiments. From our experiments, we observed creolization and found a correlation between the number of input sentences and the similarity among languages. Creoles emerged within a certain range of similarity. In our model, languages are characterized by pairwise similarities, where the similarity of G_i to G_j denotes the probability that a G_i speaker utters a sentence consistent with G_j. In typical situations of language contact, the target language is either very similar to the speakers' own language or not similar at all. Replacing the similarity values with 1 - ε for very similar languages and with ε for dissimilar languages, the model becomes very simple and may be solved analytically. However, if we consider a creole, which is somewhat similar to the other contact languages, we cannot replace the similarity values with these simple ones. As a result, our creole model is very difficult to solve analytically. We discuss how to cope with this problem.
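To make the creolization criterion concrete, the transition dynamics can be iterated numerically. The 3x3 matrix below is purely hypothetical and, unlike Q(t) in the model, held constant over time (it does not depend on the current distribution, inter-language similarity, or a learning algorithm); it merely illustrates a grammar with x_c(0) = 0 coming to exceed the threshold θ_c = 0.9.

```python
# Two contact languages G0, G1 and a candidate creole G2 with x_2(0) = 0.
# Row i gives the probabilities that a child of a G_i speaker acquires
# G0, G1 or G2.  These numbers are hypothetical, chosen so that mixed
# exposure pushes children toward the intermediate grammar G2.
Q = [
    [0.25, 0.05, 0.70],   # children of G0 speakers
    [0.05, 0.25, 0.70],   # children of G1 speakers
    [0.03, 0.03, 0.94],   # children of G2 (creole) speakers
]

x = [0.5, 0.5, 0.0]   # initial population shares
THETA_C = 0.9         # dominance threshold theta_c from the abstract

# Iterate x_j(t+1) = sum_i q_ij * x_i(t) until the shares stabilize.
for t in range(200):
    x = [sum(x[i] * Q[i][j] for i in range(3)) for j in range(3)]

creolized = x[2] > THETA_C   # x_c(0) = 0 and x_c(t) > theta_c at convergence
print("final shares:", [round(v, 3) for v in x], "creolized:", creolized)
```

Because each row of Q sums to one, total population share is conserved; with these values the shares converge to roughly (0.04, 0.04, 0.92), so the creolization criterion is met.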
References
Briscoe, E. J. (2002). Grammatical acquisition and linguistic selection. In T. Briscoe (Ed.), Linguistic evolution through language acquisition: Formal and computational models. Cambridge: Cambridge University Press.
DeGraff, M. (Ed.). (1999). Language creation and language change. Cambridge, MA: The MIT Press.
Nakamura, M., Hashimoto, T., & Tojo, S. (2007). Simulation of common language acquisition by evolutionary dynamics. In Proc. of IJCAI 2007 Workshop on Evolutionary Models of Collaboration (pp. 21-26). Hyderabad.
Nowak, M. A., Komarova, N. L., & Niyogi, P. (2001). Evolution of universal grammar. Science, 291, 114-118.
EVOLUTION OF PHONOLOGICAL COMPLEXITY: LOSS OF SPECIES-SPECIFIC BIAS LEADS TO MORE GENERALIZED LEARNABILITY IN A SPECIES OF SONGBIRDS
KAZUO OKANOYA & MIKI TAKAHASI
Lab for Biolinguistics, BSI, RIKEN, 2-1 Hirosawa, Wako 351-0198, Japan
A species of songbird, the Bengalese finch (Lonchura striata var. domestica), is a domesticated strain of the wild white-rumped munia. White-rumped munias were imported to Japan some 250 years ago and subsequently domesticated as pet birds. During domestication, Munias were bred for their intense parental behavior and for a white color morph, but they were never bred for their songs. Nevertheless, domesticated Bengalese finches sing very different songs from those of Munias: Bengalese songs are sequentially and phonologically complex, while Munia songs are simpler (Okanoya, 2004).
Fig. 1. A white-rumped munia cross-fostered to a Bengalese father (top) had difficulty learning a particular song note (bottom), while the Bengalese son learned the father's song without difficulty (middle).
To elucidate the degree to which environmental and genetic factors contribute to these differences in song structure, we cross-fostered chicks of Munias and Bengalese. Detailed phonological analysis revealed that the accuracy of song-note learning is highest in Munia chicks reared by Munias, and lowest in Munia chicks cross-fostered to Bengalese. Bengalese chicks, on the other hand, showed an intermediate degree of learning accuracy regardless of whether they were reared by Munias or Bengalese. The results suggest that Munias are highly specialized for learning Munia song phonology but less adept at learning the song phonology of the other strain, whereas Bengalese are less specialized for learning their own strain's phonology but more generalized in learning the other strain's phonology (Fig. 1). These results can be interpreted as showing that there is an innate bias to learn species-specific phonology in Munias, and that this bias was lost during domestication. White-rumped munias share their wild habitat with several sympatric species, such as spotted munias. To avoid infertile hybridization, a strong innate bias to attend to own-species phonology should be adaptive for Munias. Bengalese, on the other hand, are a domesticated strain whose breeding is under the control of breeders. In such an environment, the species-specific bias is a neutral trait and might soon degenerate. Through the degeneration of this species-specific bias, Bengalese perhaps obtained a more general ability to learn from a wide range of phonologies. The results can also be explained in the light of masking-unmasking and genetic redistribution, the idea proposed by Deacon (2003): domestication functions as a masking factor, and perceptual specialization for species-specific sound is masked. Under that environment, genetic specialization to attend to species-specific sound is redistributed into a more general ability to learn from a wider range of sounds in Bengalese finches, and perhaps in humans as well, in the latter case through a process of self-domestication.
Acknowledgements
This work was supported by a Grant-in-Aid for Young Scientists from JSPS to MT and a PRESTO grant from JST to KO.

References
Deacon, T. W. (2003). Universal grammar and semiotic constraints. In M. H. Christiansen & S. Kirby (Eds.), Language Evolution (pp. 111-140). Oxford: Oxford University Press.
Okanoya, K. (2004). Song syntax in Bengalese finches: Proximate and ultimate analyses. Advances in the Study of Behavior, 34, 297-346.
REFERENTIAL GESTURES IN CHIMPANZEES IN THE WILD: PRECURSORS TO SYMBOLIC COMMUNICATION?

SIMONE PIKA
School of Psychological Sciences, University of Manchester, Coupland 1 Building, Oxford Road, Manchester, Lancashire, M13 9PL, England (UK)

JOHN C. MITANI
Department of Anthropology, University of Michigan, 101 West Hall, 1085 South University Avenue, Ann Arbor, MI 48109-1107, United States
One of the driving forces of research into human origins is the question of how spoken language, which is thought to be unique to humans, originated and evolved. Researchers regularly address this question by comparing human communicative signals to the communication systems that have evolved in other animals, especially in one of our closest living relatives, the chimpanzee (Pan troglodytes). The majority of this research has focused on vocal communication. Recent studies, however, provide evidence that gestures play an important role in the communication of chimpanzees and resemble those of pre-linguistic and just-linguistic human infants in some important ways: they are used as intentional acts, represent a relatively stable part of an individual's communicative repertoire, and are clearly learned. Chimpanzees, however, mainly use these communicative means as effective procedures in dyadic interactions to request actions from others (imperatives). Human children, by contrast, commonly use referential gestures, e.g. pointing, which direct the attention of recipients to particular aspects of the environment. The use of these gestures has been linked with cognitive capacities such as mental state attribution, because the recipient must infer the signaller's meaning. Until now, referential gestures have been reported only in captive chimpanzees interacting with their human experimenters and in human-raised or language-trained individuals. It is therefore not yet clear whether these abilities represent natural communicative abilities or are by-products of living in a human-encultured environment.
Here we report the widespread use of a gesture by chimpanzees in the wild which might be used referentially. The gesture involved one chimpanzee making a relatively loud and exaggerated scratching movement on a part of his body that could be seen by his grooming partner. It was observed between pairs of adult males and was recorded 186 times in 101 (41%) of 249 grooming bouts. One hundred and nineteen times (64%), the groomer stopped grooming and groomed the scratched spot. Eight times (4%), individuals simultaneously scratched and presented a body part and were groomed there immediately. In 59 cases (32%), the groomer continued to groom without touching the area scratched by the signaller. The gesture received significantly more positive than negative responses (p < 0.001; exact binomial test) and occurred in 61% (N = 51) of all observed grooming dyads (N = 84). It was performed on average 3.65 times/dyad and was used significantly more often in dyads consisting of high-ranking males than in other possible pairings (p < 0.001; df = 6, linear-by-linear association). We address the question of whether the behaviour reflects a) behavioural conformity due to stimulus enhancement, b) a physical response by an individual to parasites or dirt, thereby drawing the attention of the groomer to a potential area to groom, or c) a truly communicative signal. The discussion focuses on similarities and differences to i) other referential gestures in apes, ii) gestures of pre-linguistic and just-linguistic human children, and iii) homesigns, to elaborate on the question of whether the gestural modality of our nearest primate relatives might have been the modality within which symbolic communication first evolved.
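The exact binomial test reported above is easy to reproduce with standard-library arithmetic. The counts below (119 + 8 = 127 positive responses out of 186 recorded gestures) are an illustrative reading of the abstract's figures; the authors' exact tally may differ:

```python
from math import comb

def binom_sf(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): a one-sided exact binomial test
    against the null hypothesis of no response preference (p = 0.5)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Illustrative counts: 127 positive responses out of 186 gestures
# (an assumption about how the abstract's tallies combine).
p_value = binom_sf(127, 186)
print(p_value < 0.001)  # → True
```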
MODELING LANGUAGE EMERGENCE BY WAY OF WORKING MEMORY
ALESSIO PLEBE and VIVIAN DE LA CRUZ
Department of Cognitive Science, University of Messina, v. Concezione 8, 98121 Messina, Italy
{aplebe,vdelacruz}@unime.it
MARCO MAZZONE
Laboratory of Cognitive Science, University of Catania, viale Andrea Doria 6, 95125 Catania, Italy
[email protected]

1. The working memory hypothesis
One idea on the origin of language is that a key element, if not the most crucial one, was the availability of neural circuits in the brain for working memory (Aboitiz, 1995; Aboitiz, Garcia, Bosman, & Brunetti, 2006), the kind of short-term memory theorized by Baddeley (1992). The neural connections that working memory relies upon are those that the language network relies upon as well, namely the extensive connections between temporoparietal and prefrontal areas. Within this system, Francisco Aboitiz and his collaborators consider phonological working memory to be of paramount importance in language evolution, suggesting that it originated as a working memory device involved in the imitation of different vocalizations. This, however, is only a small part of the role working memory plays in human language. A brain ready for language may have evolved by virtue of an expanding working memory capacity, which allowed not only the processing of complex sequences of sounds, but also the ability to keep under attention the semantic meanings of these sounds as they were being formulated, as well as the posing of constraints for the emergence of syntactic processes. One of the first forms of embryonic syntax is the association of a word denoting an object with another word denoting a predicate of that object. The gap between a purely lexical association between sound and meaning and this syntactic ability is well demonstrated by the documented difficulties children have in acquiring adjectives (Sandhofer & Smith, 2007). The proposed model attempts to contrast the early learning of names and adjectives in a sufficiently realistic model of the human cortex, and to compare the resulting conceptual representation spaces with and without the availability of a prefrontal working memory loop.
2. The proposed model
A possible way of exploring hypotheses on the origins of language, without being daunted by the gap of hundreds of thousands of years' worth of events that we cannot arrive at knowing, is to analyze the ontogenetic transition from a non-linguistic phase to a linguistic one. In the context of this work, we inquire about what kind of basic connection patterns in the brain might have rendered it better suited to eventually support language. We propose a model of the early acquisition of language elements, grounded in perception and composed of cortical maps, in two versions: one implementing a working memory loop in the higher-level map, and one without it. The model is a system of artificial cortical maps, each built using the LISSOM (Laterally Interconnected Synergetically Self-Organizing Map) architecture (Miikkulainen, Bednar, Choe, & Sirosh, 2005), a concept close enough to the biological reality of the cortex, but with the simplicity necessary for building complex models. Details can be found in a similar but simpler system introduced in (Plebe & Domenella, 2007) to model the emergence of object recognition. The present model consists of two main paths, one for the visual process and another for the auditory channel, which converge on a higher map, to which working memory connectivity can be added. Both models, with and without working memory, are exposed to 7200 pictures of 100 real objects and to waveforms corresponding to the names of 38 object categories, 7 adjectives in the class of colors, and 4 in the class of shapes, and learn by a combination of Hebbian and homeostatic plasticity. The resulting representations are analyzed by measuring the population coding of concepts elicited by pictures or sounds in the higher map.
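As a minimal sketch of the combination of Hebbian and homeostatic plasticity mentioned above (not the LISSOM equations actually used in the model), one can pair a Hebbian weight update with a normalization step that holds each unit's total input strength constant:

```python
import numpy as np

rng = np.random.default_rng(0)

def hebbian_homeostatic_step(W, x, eta=0.1):
    """One learning step for a single map unit with input weights W.

    The Hebbian term strengthens weights of inputs that are co-active with
    the unit; the homeostatic step rescales W to constant total strength,
    so the unit's input drive cannot grow without bound.
    """
    y = float(W @ x)          # unit activation (linear, for simplicity)
    W = W + eta * y * x       # Hebbian: change proportional to pre * post
    return W / W.sum()        # homeostatic: keep total synaptic strength fixed

W = np.full(8, 1 / 8)         # eight input channels, initially equal weights
for _ in range(100):
    x = rng.random(8)
    x[:4] *= 2.0              # the first four inputs are consistently stronger
    W = hebbian_homeostatic_step(W, x)

print(W[:4].sum() > W[4:].sum())  # → True: the unit comes to prefer them
```

The normalization is a stand-in for homeostasis; it keeps the competition between inputs selective, so the unit ends up tuned to the statistically dominant input pattern.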
Both systems demonstrate the ability to develop semantic associations, but in the simpler version there is no clear representation of the predicative role of adjectives, while the version with the working memory loop exhibits the emergence of an embryonic syntax, by establishing a relationship of adjectives with names.

References
Aboitiz, F. (1995). Working memory networks and the origin of language areas in the human brain. Medical Hypotheses, 44, 504-506.
Aboitiz, F., Garcia, R. R., Bosman, C., & Brunetti, E. (2006). Cortical memory mechanisms and language origins. Brain and Language, 98, 40-56.
Baddeley, A. (1992). Working memory. Science, 255, 556-559.
Miikkulainen, R., Bednar, J., Choe, Y., & Sirosh, J. (2005). Computational maps in the visual cortex. New York: Springer Science.
Plebe, A., & Domenella, R. G. (2007). Object recognition by artificial cortical maps. Neural Networks, 20, 763-780.
Sandhofer, C. M., & Smith, L. B. (2007). Learning adjectives in the real world: How learning nouns impedes learning adjectives. Language Learning and Development, 3, 233-261.
MECHANISTIC LANGUAGE CIRCUITS: WHAT CAN BE LEARNED? WHAT IS PRE-WIRED?

FRIEDEMANN PULVERMÜLLER
Medical Research Council Cognition and Brain Sciences Unit, Cambridge
[email protected]

A brain theory of language and symbolic systems can be grounded in neuroscientific knowledge well established in animal research. Learning is manifest at the neuronal level by synaptic modification reflecting the frequency of use of given connections. Long-distance and short-distance links bridge between, and provide coherence within, brain areas critically involved in linguistic, conceptual, perceptual and action processing. Therefore, discrete distributed neuronal assemblies (DDNAs) can develop - that is, they can be learned - that link together (i) acoustic and articulatory phonological information about speech sounds (Pulvermüller et al., 2006) and spoken word forms (Garagnani, Wennekers, & Pulvermüller, 2007; Pulvermüller et al., 2001), and (ii) form-related information about a sign and information about aspects of its referential meaning (Hauk, Johnsrude, & Pulvermüller, 2004; Pulvermüller, 1999, 2005; Shtyrov, Hauk, & Pulvermüller, 2004). Referential semantics links signs to specific information about perceptions and actions and is laid down in DDNAs spread out over specific sensorimotor brain areas, even reaching, for example, into motor cortex.

This approach does not explain a range of features specific and common to human languages, especially (a) large vocabularies (10,000s of words), (b) abstract meaning, and (c) the combinatorial principles that govern syntax and syntactic categorisation.

These critical issues will be addressed, asking about possible brain prerequisites and, therefore, genetic preconditions. (a) We tentatively relate the capability to build large sets of DDNAs to a genetically determined behavioural feature, the early occurrence of repetitive movements and articulations, which leads to the formation of perception-action circuits in the brain that pave the ground for DDNAs later used in language processing (Braitenberg & Pulvermüller, 1992). (b) Abstract meaning processing is based on one more inborn feature of the nervous system, the capability to implement logical operations. Some aspects of abstract meaning can be analysed in terms of either-or functions operating on perceptual and action-related information. Neuronal function-units of this kind, located close to the relevant action-perception systems, may provide a brain basis for abstract meaning (Pulvermüller, 2003; Pulvermüller & Hauk, 2006). (c) Combinatorial principles are usually thought to be laid down in the mind by linguistic principles and rules. A brain-inspired neuronal model of word sequence processing, however, leads to the formation of discrete combinatorial rule representations on the basis of learning (Knoblauch & Pulvermüller, 2005). Neurophysiological results further support the notion of discrete combinatorial brain mechanisms (Pulvermüller & Assadollahi, 2007). The need for and nature of inborn syntactic mechanisms at the neuronal level is discussed in closing.
References
Braitenberg, V., & Pulvermüller, F. (1992). Entwurf einer neurologischen Theorie der Sprache. Naturwissenschaften, 79, 103-117.
Garagnani, M., Wennekers, T., & Pulvermüller, F. (2007). A neuronal model of the language cortex. Neurocomputing, 70, 1914-1919.
Hauk, O., Johnsrude, I., & Pulvermüller, F. (2004). Somatotopic representation of action words in the motor and premotor cortex. Neuron, 41, 301-307.
Knoblauch, A., & Pulvermüller, F. (2005). Sequence detector networks and associative learning of grammatical categories. In S. Wermter, G. Palm & M. Elshaw (Eds.), Biomimetic neural learning for intelligent robots (pp. 31-53). Berlin: Springer.
Pulvermüller, F. (1999). Words in the brain's language. Behavioral and Brain Sciences, 22, 253-336.
Pulvermüller, F. (2003). The neuroscience of language. Cambridge: Cambridge University Press.
Pulvermüller, F. (2005). Brain mechanisms linking language and action. Nature Reviews Neuroscience, 6(7), 576-582.
Pulvermüller, F., & Assadollahi, R. (2007). Grammar or serial order? Discrete combinatorial brain mechanisms reflected by the syntactic Mismatch Negativity. Journal of Cognitive Neuroscience, 19(6), 971-980.
Pulvermüller, F., & Hauk, O. (2006). Category-specific processing of color and form words in left fronto-temporal cortex. Cerebral Cortex, 16(8), 1193-1201.
Pulvermüller, F., Huss, M., Kherif, F., Moscoso del Prado Martin, F., Hauk, O., & Shtyrov, Y. (2006). Motor cortex maps articulatory features of speech sounds. Proceedings of the National Academy of Sciences, USA, 103(20), 7865-7870.
Pulvermüller, F., Kujala, T., Shtyrov, Y., Simola, J., Tiitinen, H., Alku, P., Alho, K., Martinkauppi, S., Ilmoniemi, R. J., & Näätänen, R. (2001). Memory traces for words as revealed by the mismatch negativity. NeuroImage, 14(3), 607-616.
Shtyrov, Y., Hauk, O., & Pulvermüller, F. (2004). Distributed neuronal networks for encoding category-specific semantic information: the mismatch negativity to action words. European Journal of Neuroscience, 19(4), 1083-1092.
REFLECTIONS ON THE INVENTION AND REINVENTION OF THE PRIMATE PLAYBACK EXPERIMENT

GREGORY RADICK
Department of Philosophy (Division of History and Philosophy of Science), University of Leeds, Leeds LS2 9JT, UK
In the early 1890s the theory of evolution gained an unexpected ally: the Edison phonograph. An amateur scientist, Richard Garner, used the new machine - one of the technological wonders of the age - to record monkey calls, play them back to the monkeys, and watch their reactions. From these soon-famous experiments he judged that he had discovered "the simian tongue," made up of words he was beginning to translate, and containing the rudiments out of which human language evolved. Yet for most of the next century, the simian tongue and the means for its study existed at the scientific periphery. Both returned to great acclaim only in the early 1980s, after a team of ethologists, Robert Seyfarth, Dorothy Cheney, and Peter Marler, announced that experimental playback showed vervet monkeys in Kenya to have rudimentarily meaningful calls. What does the primate playback experiment's invention and later reinvention tell us about the origin-of-language debate since Darwin? This paper will draw on material from a new book (Radick, 2007) in order to explore the conditions - intellectual, institutional, material, cultural - under which the experimentally tested meanings of the natural vocalizations of apes and monkeys came to seem worth having and, for a wider constituency, worth knowing about. The paper will also consider the long period of the experiment's "eclipse" and what lay behind it. Among other points to be stressed is an important difference in the cultural politics of the ca. 1890 versus the ca. 1980 experiment. In its first incarnation, the primate playback experiment was valued for its promise to vindicate a commonplace evolutionary prediction: that the "highest" nonhuman animals would be found to speak languages a little less complex than those of the "lowest" human races. In its second incarnation, the experiment had an opposite politics of hierarchy leveling, with the aim being to
show that when animals are studied "on their own terms," via playback of the animals' own utterances in the animals' natural settings (rather than through instruction in human-created languages in psychological laboratories), animal communication is revealed as language-like in ways that more anthropocentric methods fail to detect.

References
Radick, G. (2007). The Simian Tongue: The Long Debate about Animal Language. Chicago: University of Chicago Press.
AN EXPERIMENTAL APPROACH TO THE ROLE OF FREERIDER AVOIDANCE IN THE DEVELOPMENT OF LINGUISTIC DIVERSITY
GARETH ROBERTS Language Evolution and Computation Research Unit, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Adam Ferguson Building, 40 George Square, Edinburgh EH8 9LL, UK
[email protected]

The existence of linguistic change and variation is inevitable: human language is genetically underspecified and culturally transmitted. However, variation and change are not dysfunctional. While there has not been enough time for human language of the kind we possess to become fully genetically specified (Worden, 1995), we should not assume that, given enough time, it would do so. On the contrary, it is reasonable to suppose that there has been pressure for it to remain underspecified (cf. Dunbar, 2003, 230). If language did not change and vary, it would be considerably less flexible and would lack the means to convey indexical as well as propositional information. The other side of this coin is the highly developed human ability to exploit linguistic variation as a means of identifying individuals as belonging (or not belonging) to this or that group: "people not from around here talk funny". Such an ability to tell outsider from insider by the way they speak is of great benefit to the establishment and maintenance of complex networks based on cooperative exchange. Such networks are threatened by individuals who exploit the altruistic behaviour of others. From within the same community, these "freeriders" can be punished or shunned. For mobile organisms, outsiders to the community pose a more significant threat, as the likelihood of meeting past victims is considerably reduced (Enquist & Leimar, 1993; Dunbar, 1996; Nettle & Dunbar, 1997; Nettle, 1999). There are innumerable real-world examples of groups and individuals distinguishing themselves from others by means of speech patterns, and such behaviour is documented in numerous sociolinguistic studies (see e.g. Labov, 1963; Trudgill, 1974; Evans, 2004).
Furthermore, computer simulations have provided evidence that the existence of linguistic diversity can help maintain tit-for-tat cooperation in the face of such freeriders (Nettle & Dunbar, 1997) and, conversely, that social selection of variants is an important factor in the establishment and maintenance of inter-group linguistic diversity (Nettle, 1999). Very little experimental work has
aimed at exploring this issue directly, however, although work on related questions is encouraging. Garrod and Doherty (1994), for example, show how conventions can become established in a community through repeated one-on-one interactions. In this paper, an experiment is presented in which two equal teams of participants were taught a simple artificial language composed of 18 randomly generated strings with a CVCV or CVCVCV structure (e.g. gumalo, luwo) and English glosses like 'meat', 'have', 'want', 'not'. Having had time to learn this language, participants were asked to play an online game involving repeated one-on-one interactions in which they negotiated, in the artificial language, to exchange resources. Any exchanged resource was worth twice as much to the receiver as to the giver, so points could be accumulated by exchanging resources with fellow team-members, and lost by giving them to members of the opposing team. During the interaction phase of the game, players were not told which team their partner belonged to, and had to infer this (the only obvious source of such information being the individual's use of the artificial language). The players' level of success was then measured, as well as the effect this behaviour had on the artificial language itself. It is hoped that this experiment will contribute to our understanding of the role played by cooperation and exploitation in the development of linguistic diversity.

References
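The scoring rule of the exchange game can be sketched directly. The point values here are hypothetical; the abstract specifies only the two-to-one ratio between receiver and giver value:

```python
def exchange(scores, giver, receiver, value):
    """Apply one resource transfer: the resource is worth `value` to the
    giver but twice as much to the receiver, so within-team exchanges
    yield a net gain while gifts to the opposing team are a net loss to
    the giver's side."""
    scores[giver] -= value
    scores[receiver] += 2 * value
    return scores

# Hypothetical starting scores and transfer value.
scores = {"A1": 10, "B1": 10}
exchange(scores, "A1", "B1", value=2)
print(scores)  # → {'A1': 8, 'B1': 14}
```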
Dunbar, R. I. M. (1996). Grooming, gossip and the evolution of language. London: Faber and Faber.
Dunbar, R. I. M. (2003). The origin and subsequent evolution of language. In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 219-34). Oxford: Oxford University Press.
Enquist, M., & Leimar, O. (1993). The evolution of cooperation in mobile organisms. Animal Behaviour, 45, 747-57.
Evans, B. (2004). The role of social network in the acquisition of local dialect norms by Appalachian migrants in Ypsilanti, Michigan. Language Variation and Change, 16(4), 153-67.
Garrod, S., & Doherty, G. (1994). Conversation, co-ordination and convention: an empirical investigation of how groups establish linguistic conventions. Cognition, 53, 181-215.
Labov, W. (1963). The social motivation of a sound change. Word, 19, 273-309.
Nettle, D. (1999). Linguistic diversity. Oxford: Oxford University Press.
Nettle, D., & Dunbar, R. (1997). Social markers and the evolution of cooperative exchange. Current Anthropology, 38(1), 93-9.
Trudgill, P. (1974). The social differentiation of English in Norwich. Cambridge: Cambridge University Press.
Worden, R. P. (1995). A speed limit for evolution. Journal of Theoretical Biology, 176, 127-52.
PROSODY AND LINGUISTIC COMPLEXITY IN AN EMERGING LANGUAGE

WENDY SANDLER
Department of English Language and Literature and Sign Language Research Lab, University of Haifa, Haifa 31905, Israel

IRIT MEIR
Department of Communication Disorders, Department of Hebrew Language, and Sign Language Research Lab, University of Haifa, Haifa 31905, Israel

SVETLANA DACHKOVSKY
Sign Language Research Lab, University of Haifa, Haifa 31905, Israel

MARK ARONOFF
Department of Linguistics, Stony Brook University, Stony Brook, NY 11794-4376, U.S.A.

CAROL PADDEN
Department of Communication and Center for Research in Language, University of California San Diego, 92093, U.S.A.
Any model of language evolution must address the question of how stretches of symbols were segmented once humans started combining units, and how the relations among these larger units were conveyed. We suggest that, early in the evolution of language, complex grammatical functions may have been marked by prosody - rhythm and intonation - and we bring evidence for this view from a new language that arose de novo in a small, insular community. The language we are studying, Al-Sayyid Bedouin Sign Language (ABSL), was born about 75 years ago in an endogamous community with a high incidence of genetically transmitted deafness (over 100 out of 3,500 villagers are deaf). In the sign language that emerged spontaneously in this community, we find a robust but simple syntax, and prosodic marking which our data suggest is becoming more complex and more systematic across the generations. The investigation combines a model of sign language prosody developed in Nespor & Sandler (1999) with a method of analyzing grammatical structure through semantic, syntactic and prosodic cues developed in our work on ABSL (Sandler et al. 2005; Padden et al. in press). Narratives from four deaf Al-Sayyid villagers, two older signers and two younger signers, are analyzed.
We see clear signs of the development of the system by comparing the older and younger signers. First, the prosodic marking of the younger signers is more salient, due to more redundancy in cueing constituent boundaries (e.g., rhythm + change in head position + change in facial expression) and to greater intensity or size. Second, the younger signers have a larger repertoire of prosodic patterns used consistently to mark particular kinds of structures. Third, the younger signers express dependency relations (e.g., for conditional sentences) twice as often as older signers, and in a more consistent way. The clauses are both separated from one another and connected to one another by particular prosodic mechanisms. Such complex structures were rare in the older signers, whose narratives were more often characterized by a kind of iterating or stringing prosody. Complex expressions containing three or more dependent clauses were found in the younger signers only. In neither the younger nor the older signers were morpho-syntactic markers of sentence complexity found, such as conditional operators or subordinators. These results are in accord with our findings in the syntax, morphology, and phonology of this language, all of which indicate that language - even in the modern human brain - does not explode into existence full-blown, but develops over time. Our findings are compatible with suggestions by Hopper & Traugott (1993) and others that prosody provides the sole marking of syntactic dependencies in the earlier stages of a language. The present study further demonstrates how a prosodic system itself develops, and provides clues to the interaction between prosodic structure and syntactic relations in a new language. It shows that prosody plays a crucial role in the development of a language, and teaches us that models of language evolution would benefit from the incorporation of a prosodic component.

References
Hopper, P., & Traugott, E. (1993). Grammaticalization. Cambridge: Cambridge University Press.
Nespor, M., & Sandler, W. (1999). Prosody in Israeli Sign Language. Language and Speech, 42(2&3), 143-176.
Padden, C., Meir, I., Sandler, W., & Aronoff, M. (in press). Against all expectations: The encoding of subject and object in a new language. In D. Gerdts, J. Moore & M. Polinsky (Eds.), Hypothesis A/Hypothesis B: Linguistic Explorations in Honor of David M. Perlmutter. Cambridge, MA: MIT Press.
Sandler, W., Meir, I., Padden, C., & Aronoff, M. (2005). The emergence of grammar: Systematic structure in a new language. Proceedings of the National Academy of Sciences, 102(7), 2661-2665.
COMMUNICATION, COOPERATION AND COHERENCE: PUTTING MATHEMATICAL MODELS INTO PERSPECTIVE
FEDERICO SANGATI & WILLEM ZUIDEMA
Institute for Logic, Language and Computation, University of Amsterdam, Plantage Muidergracht 24, 1018 HG Amsterdam, the Netherlands
[email protected], [email protected]
Evolutionary game theory and related mathematical models from evolutionary biology are increasingly seen as providing the mathematical framework for
modeling the evolution of language (Van Rooij et al., 2005). Two crucial, general results from this field are (i) that altruistic communication is, in general, evolutionarily unstable (Maynard Smith, 1982), and (ii) that there is a minimum value on the accuracy of genetic or cultural transmission required to allow linguistic coherence in a population (Nowak et al., 2001). Both results appear to pose formidable obstacles for convincing scenarios of the evolution of language. Because language and communication obviously did evolve, finding solutions to both problems is a key challenge for theorists. In this paper we argue that both problems are due to some of the mathematical idealizations used in the theoretical analysis, and disappear when those idealizations are relaxed. To illustrate our argument, we present a surprisingly simple computational model in which two idealizations are avoided: (i) we allow individuals to interact and reproduce in a local neighborhood, avoiding the more common mean-field approximations; (ii) we allow languages to have different similarity relations to one another, avoiding the uniform compatibility function used to derive the coherence threshold. We show that in this model, predictions from the game-theoretic models do not hold, and communication can evolve under circumstances thought to exclude it. Part of our results and methodologies are not entirely novel: the model is inspired by the one defined by Oliphant (1994), and the results relate to work in mathematical population genetics.

In our simulation(a), a population of 400 agents shares a finite set of signals used to convey a corresponding number of shared meanings. Each individual has a transmitting and a receiving system specifying which signal is associated with a specific meaning and vice versa. We therefore consider the very general case where reception doesn't necessarily mirror production. We show that the assignment of a local positioning to agents allows the emergence of linguistic cooperation: even when speakers are not rewarded, an optimal communication system is able to emerge and be maintained, although suboptimal systems are able to survive above chance frequency in small subareas. To compare our model to the results of Nowak et al. (2001), we study a number of numerical approximations. We find that the coherence threshold phenomenon depends on the assumption of uniform distances between the possible languages, an assumption which is not valid in models such as ours (or in the real world), where languages can be more or less similar to each other (figure 1).

(a) Available at staff.science.uva.nl/~fsangati/language_evolution.html
Figure 1. Linguistic coherence in a population with 16 different languages, having a uniform distance of 0.5 as in Nowak et al. (2001) and according to the distances in our model (left). Similarity matrix of the 16 languages derived from the possible mappings between 2 meanings (0/1) and 2 symbols (0/1), where each mapping is fully defined by a 2 x 2 transmitting and receiving system (right).
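The 16-language similarity matrix described in the caption can be reconstructed directly: with 2 meanings and 2 signals there are 4 possible transmitting maps and 4 possible receiving maps, hence 16 languages, and the similarity of two languages is their symmetrised probability of successful communication. The following is a minimal sketch of that construction; the authors' exact normalisation may differ.

```python
from itertools import product
import numpy as np

M, S = 2, 2  # number of meanings and number of signals

# A language pairs a transmitting map (meaning -> signal) with a
# receiving map (signal -> meaning); 4 x 4 = 16 languages in total.
transmit_maps = list(product(range(S), repeat=M))
receive_maps = list(product(range(M), repeat=S))
languages = list(product(transmit_maps, receive_maps))

def comprehension(speaker, hearer):
    """Fraction of meanings the hearer decodes correctly."""
    t, _ = speaker
    _, r = hearer
    return sum(r[t[m]] == m for m in range(M)) / M

n = len(languages)
sim = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        # symmetrised communicative payoff, as in Nowak et al. (2001)
        sim[i, j] = (comprehension(languages[i], languages[j])
                     + comprehension(languages[j], languages[i])) / 2

print(n)  # 16 languages
print(sorted(set(sim.flatten().tolist())))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

The printed set of distinct values confirms that pairwise similarities are not uniform: they take five different values rather than the single value of 0.5 assumed in the derivation of the coherence threshold.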
Although the model remains extremely simple, it allows us to put two famous mathematical results into perspective: in populations such as our ancestors', where language users are spatially distributed and languages are of varying similarity to each other, altruistic communication is not necessarily unstable and the coherence threshold does not define "a necessary condition for evolution of complex language" (Nowak et al., 2001, p. 115).
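The abstract gives no implementation details beyond a population of 400 locally interacting agents. The sketch below is our own illustration of that kind of local-neighborhood imitation dynamic, with our own update rule and parameter choices rather than the authors' actual code: 400 agents on a 20 x 20 toroidal grid, each holding one of the 16 languages, each update imitating whichever neighboring language communicates best within the local neighborhood.

```python
import random
from itertools import product
from collections import Counter

random.seed(1)
M = S = 2
GRID = 20  # 20 x 20 = 400 agents, as in the abstract
STEPS = 5000

transmit = list(product(range(S), repeat=M))
receive = list(product(range(M), repeat=S))
langs = list(product(transmit, receive))  # 16 languages

def payoff(a, b):
    """Symmetrised communicative success between two languages."""
    def comp(x, y):
        return sum(y[1][x[0][m]] == m for m in range(M)) / M
    return (comp(a, b) + comp(b, a)) / 2

# random initial assignment of languages to grid cells
grid = [[random.randrange(len(langs)) for _ in range(GRID)] for _ in range(GRID)]

def neighbours(i, j):
    """Moore neighbourhood with periodic boundaries."""
    return [((i + di) % GRID, (j + dj) % GRID)
            for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]

def step():
    # one local update: a random agent imitates whichever neighbour's
    # language communicates best with the rest of its neighbourhood
    i, j = random.randrange(GRID), random.randrange(GRID)
    nbrs = neighbours(i, j)
    def local_score(lang_id):
        return sum(payoff(langs[lang_id], langs[grid[x][y]]) for x, y in nbrs)
    grid[i][j] = max((grid[x][y] for x, y in nbrs), key=local_score)

for _ in range(STEPS):
    step()

counts = Counter(grid[i][j] for i in range(GRID) for j in range(GRID))
print(counts.most_common(3))  # dominant languages and their frequencies
```

Because updates are purely local, suboptimal languages can persist in small spatial clusters even while a dominant, mutually compatible language spreads, which is the qualitative behaviour the abstract reports.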
References
Maynard Smith, J. (1982). Evolution and the theory of games. Cambridge University Press, Cambridge, England.
Nowak, M. A., Komarova, N., & Niyogi, P. (2001). Evolution of universal grammar. Science, 291, 114-118.
Oliphant, M. (1994). The dilemma of Saussurean communication. BioSystems, 37(1-2), 31-38.
Van Rooij, R., Jäger, G., & Benz, A. (Eds.). (2005). Game theory and pragmatics. Palgrave Macmillan.
A NUMEROSITY-BASED ALARM CALL SYSTEM IN KING COLOBUS MONKEYS

ANNE SCHEL, KLAUS ZUBERBÜHLER
School of Psychology, University of St. Andrews, St. Mary's Quad, St. Andrews, KY16 9JP, Scotland, UK

SANDRA TRANQUILLI
School of Anthropology, University College London, London, WC1H 0BW, UK
One important aspect of understanding 'what it means to be human' concerns our extraordinary capacity to share knowledge by using referential acoustic signals. By assembling a small set of basic sounds, the phonemes, according to a number of language-specific rules, humans are able to produce an infinite number of messages. Human communication, according to most theorists, is based on syntax/grammar and semantics/symbolism, whereas animal communication is not. Although rule-governed meaningful communication is a uniquely human ability, there is also a wide consensus that the elements responsible for human communication have not emerged de novo in modern humans, but instead have long and possibly independent evolutionary histories that can be traced by studying animal communication. Understanding the evolutionary origins of these abilities is of primary interest for a wide range of disciplines ranging from linguistics to anthropology. There is good converging empirical evidence from a variety of disciplines that the anatomy and neural capacity to produce modern speech emerged in our ancestral line relatively late. Genetic work supports this idea by showing that two mutations in a gene involved in the orofacial movements required for normal speech production, the FoxP2 gene, became stabilised in the hominid populations ancestral to ours only some 200,000 years ago. This gene seems crucial in the developmental process leading to normal speech and language, and one provocative conclusion from these studies is that humans were unable to produce normal speech prior to this time.
The proper use of normal language does, however, require much more than just a peripheral vocal apparatus capable of producing phonemes. Language is the result of a myriad of cognitive skills, and it is simply not likely that the entire cognitive apparatus required for language evolved over such a short time period. A more plausible scenario is that the capacity to produce and understand language finds its base in neural structures and cognitive capacities that were already present (but not necessarily used for language) in the primate lineage, and thus were inherited from our primate ancestors. The comparative method, therefore, is an important tool for understanding which capacities needed for human language were inherited unchanged or slightly modified from our common ancestor with chimpanzees, and which ones are qualitatively new. Several studies on animal communication have shown that some animals produce vocalisations that function as referential signals, and even simple forms of zoosyntax have been reported; both are considered key elements of human language. Work on primate alarm calls has, for example, shown that some primates produce acoustically distinct vocalisations in response to different predator types, to which recipients react with accurate and adaptive responses. The vervet monkeys' referential alarm calling system has long been the paradigmatic example of how primates use vocal signals in response to predators. More recent fieldwork has, however, revealed several additional ways in which primates use vocalizations to cope with predators, suggesting that the vervets' alarm calling system may be the exception rather than the rule. Here, we present the results of a playback study on the alarm call system of a little-studied primate, the King colobus monkey of the Taï Forest in Ivory Coast, a member of the Colobine family.
In order to study alarm vocalizations systematically, we played back predator vocalizations to naive monkey groups from a concealed speaker in their vicinity, and then recorded their vocal responses and analyzed their response patterns. We found that upon hearing predator vocalizations, the monkeys often reacted with two basic alarm call types: snorts and acoustically variable roars. Neither call type was given exclusively to one predator, but there were striking regularities in the sequence order of calls. Growls of leopards typically elicited long calling bouts consisting of short sequences made of a snort and pairs of roars, while eagles typically elicited short calling bouts consisting of long sequences made of no snorts but many roars. These monkeys thus seem to use an alarm call system that is based on numerosity and call combinations, a further example of a non-human primate that has evolved a simple form of zoosyntax.
ON THERE AND THEN: FROM OBJECT PERMANENCE TO DISPLACED REFERENCE
MARIEKE SCHOUWSTRA
UiL OTS, Utrecht University, Janskerkhof 13, Utrecht, 3512 BL, The Netherlands
[email protected]
In the current debate about the emergence of language, researchers have looked for various sources of indirect evidence, either by comparing animals and humans, by analyzing the linguistic structure of certain present-day human languages, or by constructing computer models. These approaches have been successful, at least to the extent that many hypotheses about language emergence have been put forward on the basis of them. However, it has lately been recognized that it would be useful to combine the results from the different approaches, because that leads to a more complete picture of language emergence (Kirby, 2007). I will focus on one phenomenon, 'displacement' (or 'displaced reference'), through two approaches to language evolution: one cognitive, the other linguistic. Displacement was described already by Hockett (1960) as interesting from the point of view of language evolution, as it is a feature that is supposedly unique to human language. Humans seem to be the only ones able to talk about things that are not here and not now. Hurford (2007) shows that animals do display the beginnings of displaced reference, though not in their language, but in their cognitive capacities. When an animal has achieved object permanence, it is aware that an object continues to exist even when no sensory information about the object is available. This capacity is present in many animals, but there is a general trend: the more an animal genetically resembles humans, the better it performs at different 'displacement tasks'. This indicates that object permanence has been important in the evolution of a species that has linguistic capacities:

The capacity to know something about an object, even when 'it isn't there' is a first step along the road to the impressive characteristics of human languages, their capacity for displaced reference. (Hurford, 2007, p. 72)

Thus, Hurford sketches an evolutionary trajectory, on the basis of cognitive research, that starts from object permanence in animals' cognitive capacities and ends in displaced reference in human language.
Support for this trajectory can be found in recent work in the field of linguistics: the windows approach. This is a perspective on language emergence that has been adopted in the work of Jackendoff (2002) and Botha (2005), and goes back in part to earlier work by Bickerton. It studies (among other phenomena) restricted linguistic systems, such as pidgin languages, home sign systems, and early stages of untutored second language acquisition by adults. These language forms all arise in situations where the resources for first language learning under normal circumstances are unavailable. The different restricted systems show striking similarities. Therefore, they may tell us something about the cognitive strategies on which language builds, or even about principles of evolutionarily early language, and thereby contribute to the language evolution debate. From various studies of temporal expressions in early second language acquisition and home sign (Benazzo, 2006; Morford & Goldin-Meadow, 1997) it becomes clear that even in the most 'primitive' stages of these systems (when few grammatical means are available to speakers or signers, utterances consist of only a few words, and almost no verbs are used), displaced reference appears: subjects make reference to past and future. They do this in relatively rudimentary ways, and much work is left to the interpreter, but such an early appearance of displaced reference tells us that it is apparently a fundamental feature of language and must already have been present in evolutionarily early language. The conclusions drawn on the basis of the 'window work' described here can support and extend the evolutionary picture sketched by Hurford, but also force us to make precise claims about the relation between cognition and language: should the fact that we can talk about remote things really count as a property of language?

References
Benazzo, S. (2006, March). The expression of temporality in early second language varieties and adult home signs. (Paper presented at the NIAS Workshop 'Restricted Linguistic Systems as Windows on Language Genesis')
Botha, R. (2005). On the Windows Approach to language evolution. Language and Communication, 25.
Hockett, C. F. (1960). The origin of speech. Scientific American, 203, 88-96.
Hurford, J. R. (2007). The origins of meaning. Oxford University Press.
Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. Oxford University Press.
Kirby, S. (2007). The evolution of language. In R. Dunbar & L. Barrett (Eds.), Oxford handbook of evolutionary psychology (pp. 669-681). Oxford University Press.
Morford, J. P., & Goldin-Meadow, S. (1997). From here and now to there and then: The development of displaced reference in homesign and English. Child Development, 68(3), 420-435.
SIGNALLING SIGNALHOOD AND THE EMERGENCE OF COMMUNICATION

THOMAS C. SCOTT-PHILLIPS, SIMON KIRBY, GRAHAM R. S. RITCHIE
Language Evolution and Computation Research Unit, University of Edinburgh
[email protected]. uk
A vast number of stable communication systems exist in the natural world. Of these, only a few are learnt. A similarly small number of systems make use of arbitrary symbols, in which meaning is dissociated from form. Human language is the only system for which both of these facts are true. How such a system might emerge should therefore be of great interest to language evolution researchers, yet at present barely anything is known about this process. A growing body of theoretical, computational and experimental studies has explored how symbolic systems might spread through a dyad or population of interacting individuals. However, all of this work has, with one exception, circumvented a key problem that remains unaddressed: how do individuals even know that a given communicative behaviour is indeed communicative? That is, how does a signal signal its own signalhood? We report on the first empirical work that explicitly addresses these questions. In order to do this we introduce the Embodied Communication Game, in which human subjects play a simple communication game with each other over a computer network. The game has three key properties. First, the communication channel is undefined (unlike e.g. Galantucci, 2005; Marocco & Nolfi, 2007). Second, the roles of speaker and hearer are undefined (unlike e.g. de Ruiter et al., forthcoming; Steels, 1999). And third, the possible forms that signals may take are also undefined (unlike game-theoretic models, and also some experimental approaches, e.g. Selten & Warglien, 2007). These properties mean that players must use their behaviour in the game's world to communicate not just their intended meaning but also the fact that their behaviour is communicative in the first place. This allows us to address the question of how to signal signalhood.

Only one previous piece of work (Quinn, 2001) has adhered to all three of these constraints. Here pairs of simulated agents had to find a way to communicate so that they could solve a simple coordination task, but no explicit communication channel was made available. Although some pairs of robots were successful in this task, the solution found was iconic and, moreover, innate rather than learnt. We are interested, however, in the case of learnt, symbolic communication. We find that the likelihood that a viable symbolic system will emerge is significantly increased if it is possible to first create some non-communicative convention onto which communication can bootstrap. The communication of communicative intent in the absence of pre-existing conventions is thus shown to be a non-trivial task (even for already fluent users of a learnt, symbolic communication system) that is unlikely to be solved de novo, i.e. created fully formed by one individual and inferred wholesale by another. Instead, a more organic process like ontogenetic ritualisation (Tomasello & Call, 1997) is more likely. Moreover, these results are the first lab-based instance of the emergence of symbolic communication in which the problem of recognising communicative intent is not avoided by the very nature of the investigative set-up.
Acknowledgements
TSP and GR are funded by grants from the AHRC and the EPSRC respectively. We also acknowledge financial support from AHRC grant number 112105.

References
de Ruiter, J. P., Noordzij, M. L., Newman-Norlund, S., Newman-Norlund, R., Hagoort, P., Levinson, S. C., et al. (forthcoming). Exploring human interactive intelligence.
Galantucci, B. (2005). An experimental study of the emergence of human communication systems. Cognitive Science, 29, 737-767.
Marocco, D., & Nolfi, S. (2007). Communication in natural and artificial organisms: Experiments in evolutionary robotics. In C. Lyon, C. L. Nehaniv & A. Cangelosi (Eds.), Emergence of communication and language (pp. 189-206). London: Springer-Verlag.
Quinn, M. (2001). Evolving communication without dedicated communication channels. In J. Kelemen & P. Sosik (Eds.), Advances in artificial life: ECAL 6. Berlin: Springer.
Selten, R., & Warglien, M. (2007). The emergence of simple languages in an experimental coordination game. Proceedings of the National Academy of Sciences, 104(18), 7361-7366.
Steels, L. (1999). The Talking Heads experiment. Antwerp: Laboratorium.
Tomasello, M., & Call, J. (1997). Primate cognition. Oxford: Oxford University Press.
WILD CHIMPANZEES MODIFY THE STRUCTURE OF VICTIM SCREAMS ACCORDING TO AUDIENCE COMPOSITION

KATIE E. SLOCOMBE
Department of Psychology, University of York, York, YO10 5DD, England

KLAUS ZUBERBÜHLER
School of Psychology, University of St Andrews, St Andrews, KY16 9JP, Scotland
One way of studying the evolutionary origins of language is to investigate the different cognitive capacities involved in language processing and to trace their phylogenetic history within the primate lineage. One conclusion from this research so far has been that some language-related capacities, such as recursion, are unique to humans and associated with the emergence of modern speech capacities, while others have evolutionary roots deep in the primate lineage. The ability to communicate about external objects or events, for example, appears to be such a phylogenetically old capacity, and there is good evidence that various monkey species are able to convey information about external events with their calls. However, in these cases it is often unclear whether callers are actively trying to inform each other about the event they have perceived, or whether their calling behaviour is a mere byproduct of a biological predisposition to respond to certain types of evolutionarily important events, such as the appearance of a predator. In either case, listeners will have to engage in a fair bit of inferential reasoning, suggesting that these types of systems may have acted as an evolutionary precursor to the semantic capacities evident in modern humans. However, despite good evidence for such functionally referential communication and inferential capacities in monkeys, there is little comparable evidence available for any of the great ape species in the wild. This is problematic because great apes are the most important elements in any comparative approach. We studied the vocal behaviour of wild chimpanzees of the Budongo Forest, Uganda during agonistic interactions. Previous work has shown that victim and aggressor screams are acoustically distinct signals (Slocombe and Zuberbühler, 2005) that have the potential to provide listeners
with information on the role of the caller during an interaction. In this study we examined victim screams in considerable detail to determine (a) to what extent these calls contain information about the nature of the ongoing agonistic encounter and (b) to what degree they are the product of signalers intentionally addressing particular target individuals that are likely to intervene and help the caller. We analyzed victim screams given by 21 different individuals in response to aggression from others. We found that these screams varied reliably in their acoustic structure as a function of the severity of the aggression experienced by the caller. Victims receiving severe aggression (chasing or beating) gave longer bouts of screams in which each call was longer in duration and higher in frequency than screams produced by victims of mild aggression (charges or postural threats). Chimpanzee victim screams are therefore promising candidates for functioning as referential signals. Playback experiments are now ongoing to assess whether listening individuals are able to extract information about the severity of a fight from these calls. With regard to addressing particular individuals, we found that victims receiving severe aggression were sensitive to the composition of the listening audience and modified the acoustic structure of their screams accordingly. If an individual was present in the party who could effectively challenge the aggressor (because it was equal or higher in rank than the aggressor), then victims produced screams that were acoustically consistent with extremely severe aggression. This vocal exaggeration of the true level of aggression only occurred when the chimpanzees most needed aid, that is, when they were subjected to severe but not mild aggression.
In other observations we found that high-ranking individuals most often provided aid if victims were exposed to severe rather than mild aggression, suggesting that victim screams function to recruit aid and that callers modify them in a goal-directed manner. The low visibility of the chimpanzees' natural rainforest environment seems to make this tactical calling a viable strategy. It is rare that bystanders during agonistic interactions have perfect visual access to the ongoing event; callers therefore run a relatively small risk of being identified as unreliable signalers or experiencing other types of negative feedback. This is the first study to show that non-human primates can flexibly alter the acoustic structure of their vocalizations in response to the composition of the audience.
References
Slocombe, K. E., & Zuberbühler, K. (2005). Agonistic screams in wild chimpanzees vary as a function of social role. Journal of Comparative Psychology, 119(1), 67-77.
AN EXPERIMENTAL STUDY ON THE ROLE OF LANGUAGE IN THE EMERGENCE AND MAINTENANCE OF HUMAN COOPERATION

J.W.F. SMALL
Language Evolution and Computation Research Unit, Department of Linguistics and English Language, University of Edinburgh, 40 George Square, Edinburgh, EH8 9L, United Kingdom

SIMON KIRBY
Language Evolution and Computation Research Unit, Department of Linguistics and English Language, University of Edinburgh, 40 George Square, Edinburgh, EH8 9L, United Kingdom
While the emergence of language may have been promoted by a myriad of different factors, it seems intuitively obvious that some level of cooperation among humans was necessary. Dessalles (2000) argues that cooperation itself was the decisive factor for language emergence, while Knight (2006) suggests that any human cooperation requires contracts, the very contracts upon which society is based. Jeffreys (2006) presented experimental findings showing that cooperation on a social dilemma task required language, and that once language was used, players often made altruistic sacrifices. The present experiment seeks to further explore some of these contentions. Forty participants (N = 40) were split into two groups, one group encouraged to use language, the other not allowed to use language. Participants were each given a set of ping pong balls, put into pairs, and then instructed to use the balls to traverse a sequence of holes on a board which stood separating them from the other player. Participants had five minutes to play, and they were awarded one point for each of their own balls that went through the course. It was made known that the person with the highest score overall would be awarded a monetary reward. The relative location of the holes in the sequence made it nearly impossible to complete the course without the aid of the other participant. Thus, although they were not told that this was the case, by assisting one another participants were able to greatly reduce the time it took to finish the course with a ball, and so players who assisted one another were consistently able to achieve higher scores. Defining cooperation as any manual act which assisted the other player, it was found that the use of language between two individuals on the task significantly shortened the time to the commencement of cooperation: M = 0.472, SE = 0.09 in the speaking group versus M = 2.444, SE = 0.4833 in the non-speaking group (t(20) = -4.167, p < 0.01). Furthermore, once cooperation had begun, the use of language enhanced efficiency on the task, the number of balls through the game board being higher in the speaking group (M = 40.95, SE = 2.14) than in the non-speaking group (M = 14.33, SE = 1.43), t(38) = 10.267, p