CONTINUITY IN LINGUISTIC SEMANTICS
LINGVISTICÆ INVESTIGATIONES: SUPPLEMENTA Studies in French & General Linguistics / Etudes en Linguistique Française et Générale This series has been established as a companion series to the periodical "LINGVISTICÆ INVESTIGATIONES", which started publication in 1977. It is published by the Laboratoire d'Automatique Documentaire et Linguistique du C.N.R.S. (Paris 7).
Series-Editors: Jean-Claude Chevalier (Université Paris VIII) Maurice Gross (Université Paris 7) Christian Leclère (L.A.D.L.)
Volume 19 Catherine Fuchs and Bernard Victorri (eds) Continuity in Linguistic Semantics
CONTINUITY IN LINGUISTIC SEMANTICS
Edited by CATHERINE FUCHS and BERNARD VICTORRI
Université de Caen, France
JOHN BENJAMINS PUBLISHING COMPANY AMSTERDAM/PHILADELPHIA
The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences — Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.
Library of Congress Cataloging-in-Publication Data
Continuity in linguistic semantics / edited by Catherine Fuchs, Bernard Victorri.
p. cm. -- (Linguisticae investigationes. Supplementa ISSN 0165-7569; v. 19)
Includes bibliographical references and index.
Contents: The limits of continuity: discreteness in cognitive semantics / Ronald Langacker -- Continuity and modality / Antoine Culioli -- Continuum in cognition and continuum in language / Hans-Jakob Seiler -- Is there continuity in syntax? / Pierre Le Goffic -- The use of computer corpora in the textual demonstrability of gradience in linguistic categories / Geoffrey Leech, Brian Francis & Xunfeng Xu -- A "continuous definition" of polysemous items / Jacqueline Picoche -- The challenges of continuity for a linguistic approach to semantics / Catherine Fuchs -- What kind of models do we need for the simulation of understanding? / Daniel Kayser -- Continuity, cognition, and linguistics / Jean-Michel Salanskis -- Reflections on Hansjakob Seiler's continuum / René Thom -- Attractor syntax / Jean Petitot -- A discrete approach based on logic simulating continuity in lexical semantics / Violaine Prince -- Coarse coding and the lexicon / Catherine L. Harris -- Continuity, polysemy, and representation: understanding the verb "cut" / David Touretzky -- The use of continuity in modelling semantic phenomena / Bernard Victorri.
1. Semantics. 2. Continuity. 3. Linguistic models. I. Fuchs, Catherine. II. Victorri, Bernard. III. Series.
P325.C57 1994 401'.43--dc20 94-38916
ISBN 90 272 3128 1 (Eur.) / 1-55619-259-2 (US) (alk. paper) CIP
© Copyright 1994 - John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
John Benjamins Publishing Co. • P.O.Box 75577 • 1070 AN Amsterdam • The Netherlands
John Benjamins North America • P.O.Box 27519 • Philadelphia, PA 19118 • USA
CONTENTS

Preface

PART I : LINGUISTIC ISSUES
Ronald Langacker: The limits of continuity: discreteness in cognitive semantics
Antoine Culioli: Continuity and modality
Hans-Jakob Seiler: Continuum in cognition and continuum in language
Pierre Le Goffic: Is there continuity in syntax?
Geoffrey Leech, Brian Francis & Xunfeng Xu: The use of computer corpora in the textual demonstrability of gradience in linguistic categories
Jacqueline Picoche: A "continuous definition" of polysemous items: its basis, resources and limits
Catherine Fuchs: The challenges of continuity for a linguistic approach to semantics

PART II : MODELLING ISSUES
Daniel Kayser: What kind of models do we need for the simulation of understanding?
Jean-Michel Salanskis: Continuity, cognition and linguistics
René Thom: Reflections on Hansjakob Seiler's continuum
Jean Petitot: Attractor syntax: morphodynamics and cognitive grammar
Violaine Prince: A discrete approach based on logic simulating continuity in lexical semantics
Catherine L. Harris: Coarse coding and the lexicon
David Touretzky: Continuity, polysemy and representation: understanding the verb 'cut'
Bernard Victorri: The use of continuity in modelling semantic phenomena
PREFACE

Until recently, most linguistic theories avoided recourse to the notion of continuity. Structuralism, which developed within a problematic of purely discrete relations of commutation (paradigmatic relations) and distribution (syntagmatic relations) among forms, contributed to the marginalisation of all phenomena outside such a discrete framework. This tendency was accentuated by transformational-generative grammars, two essential characteristics of which tend towards the elimination of continuity: on the one hand, the priority accorded to syntax, with its emphasis on categorisation (which leads to compositionality as far as semantics is concerned), and on the other, the use of an algebraic formalism which sanctions the discrete character of these models.

It should be noted that contemporary theories of cognition also rely heavily on discrete formalisms, partly under the combined influence of linguistic theories and of symbolic artificial intelligence. Thus Gestalt theory seems to be almost forgotten today, despite the fact that, particularly as regards the psychology of perception, it had completely transformed the paradigms of research, by opposing to the then prevailing associationist ideas a point of view closer to the theory of dynamic systems (inspired by the field theory of physics), in which the relation between the whole and its parts is the result of more global interactions, irreducible to discrete combinations of elementary sensations.

Nevertheless, the predominance of discrete approaches does not mean that continuity has been completely absent from the scene. In linguistics, in the heyday of structuralism, the psychomechanical theory of Gustave Guillaume already ran counter to the dominant point of view by defining a kinetics which could account for a continuum of significations obtained in discourse by earlier or later "interceptions" in the movement of "tension" inherent in each linguistic unit in tongue.

More recently, various trends in linguistics have been trying to go beyond the constraints imposed by discrete approaches: Culioli, Picoche, Lakoff, Langacker, Leech, Seiler and others. Despite their diversity, these approaches evidently share a number of semantico-cognitive preoccupations (whether it is a question of cognitive grammars, invariants in the typology of languages, or enunciative categories). However, these attempts have generally remained isolated and have seldom succeeded in building up problematics sufficiently operative to stand against the all-powerful discrete models.
At the same time, mathematical and computer science tools have been put forward which seem promising for the modelling of continuous phenomena in linguistic semantics. Thus, in mathematics, Thom has shown that the framework of differential geometry and of dynamic systems could constitute an alternative to discrete formalisations for linguistics. Several researchers (Petitot, Bruter, Wildgen) have explored the possibilities offered by Thom's "catastrophe theory" for the treatment of linguistic problems. More recently, the revival of interest in connectionist techniques in artificial intelligence has been accompanied by some attempts to apply these techniques to the description of language: the modelling of language learning processes (McClelland and Rumelhart, Touretzky, etc.), the treatment of semantic ambiguities (Cottrell, Waltz and Pollack, etc.) and lately even syntactic representations (Smolensky).

The time has now come to take stock of these advances. This book sets up a dialogue among linguists, philosophers, mathematicians and computer scientists, dealing with two major questions: which language phenomena call for continuous models, and what can the tools of formalisation contribute in this respect? In order to focus the reflection even further, the authors deliberately restricted themselves to the problems of linguistic semantics, linked to the lexicon, to grammatical categories (aspect, modality, determination, etc.) or to syntactic structures.

The book is thus divided into two main parts.

Part one is devoted to linguistic issues: one of our main concerns has been to give priority to linguistic problematics, because the use of mathematical or computer science formalisms too often leads to an emphasis on the tool at the expense of the phenomena to be accounted for. The following questions are at stake:
- In linguistics, which semantic phenomena appear difficult or impossible to describe in discrete terms?
- How can recourse to the notion of continuity allow resolution of the difficulties encountered?
- How can a description of these phenomena in terms of a continuum be articulated with the discrete character of linguistic units and their composition?
- Should continuity be conceived as a convenient representation of very gradual, but nevertheless basically discrete, phenomena, or must one postulate that continuity is an intrinsic characteristic of semantic phenomena?

Part two is devoted to modelling issues, considered from a threefold point of view: philosophical, mathematical, and computational. The various contributions try to answer the following questions:
- From an epistemological point of view, must the introduction of the notion of continuity be seen as a radical break with the tradition of formalisation in linguistics? In particular, how can the introduction of this notion be reconciled with a methodology based on the falsifiability of theories? Is there necessarily a link between this type of modelling and the cognitive approaches which are also based on the notion of continuity? What new interactions might such an approach open up, particularly with a general cognitive theory like Gestalt theory?
- On the mathematical plane, what is the relation between the notions of "continuity" versus "discreteness" in linguistics and the various mathematical properties to which they can be compared (oppositions between continuous and discontinuous for a function, between continuous and countable in set theory, between continuous and discrete for a variety)? Can linguistic "continuity" really be accounted for by a mathematical model? Can one expect really operative predictions (quantitatively and qualitatively) from such a model? What links are there between a continuous mathematical model and the quantitative mathematics already widely employed in linguistics, namely statistics?
- Finally, on the plane of computer science, how can continuity be implemented on a digital, and therefore discrete, machine? Must a continuous mathematical model necessarily correspond to a continuous implementation in computer science? Does connectionism provide a novel and completely satisfactory solution to this problem?

Catherine FUCHS
Bernard VICTORRI
PART I : LINGUISTIC ISSUES
THE LIMITS OF CONTINUITY: DISCRETENESS IN COGNITIVE SEMANTICS
RONALD W. LANGACKER
University of California, San Diego, USA
I take it as being evident that many aspects of language structure are matters of degree. This is a common theme in both functional and cognitive linguistics, including my own work in cognitive grammar (Langacker 1987a, 1990, 1991). It would however be simplistic to assume that a commitment to cognitive (as opposed to formal) semantics necessarily correlates with the view that semantic structure is predominantly continuous (rather than discrete). I suspect, in fact, that the role of true continuity in linguistic semantics is rather limited. My goals here are to clarify some of the issues involved, to briefly discuss a certain amount of data, and to propose a basic generalization concerning the distribution of discrete vs. continuous phenomena.

In approaching these matters, I will ignore the fact that linguistic structure reduces, ultimately, to the activity of discrete neurons that fire in discrete pulses. Our concern is rather with phenomena that emerge from such activity at higher levels of organization, phenomena that could in principle be either discrete or continuous. I will also leave aside two aspects of discreteness that are too obvious and general to merit extended discussion: the fact that we code our experience primarily by means of discrete lexical items, each of which evokes individually only a limited portion of the overall notion we wish to express; and the discrete nature of the choice (i.e. at a given position we have to choose either one lexical item or another, not some blend or in-between option).

In using terms like discrete and continuous, what exactly do I mean? A continuous parameter has the property that, between any two values (however close), an intermediate value can always be found. There are no "gaps" along the parameter, nor any specific values linked in relationships of immediate succession. By contrast, discreteness implies a direct "jump" between two distinct values, one of which is nonetheless the immediate successor of the other. To take an obvious example, the real numbers form a continuous series, whereas the integers are discrete (there is no integer between 4 and 5). Many continuous parameters are of course discernible in conceptualization and linguistic semantics: length, pitch,
brightness, the angle at which two lines intersect, etc. Yet the role of true continuity appears to be circumscribed in various ways.

We must first distinguish actual continuity from other phenomena that tend to be confused with it. One thing that does not qualify as continuity is hesitancy or indeterminacy in the choice between two discrete options (which is not to deny that one's inclination to choose a particular option may be a matter of degree). For instance, although I may not know whether to call a certain object a cup or a mug, I nevertheless employ distinct and discretely different prototypical conceptions in making the judgment (cf. Wierzbicka 1984). Continuity is also not the same as vagueness or "fuzziness". It would be arbitrary, for example, to draw a specific line as the definitive boundary of a shoulder. This body part is only fuzzily bounded: there is no definite point at which it is necessarily thought of as ending. Yet we do conceive of it as a bounded region (this makes shoulder a count noun), and a boundary implies discontinuity (a "jump" between shoulder and non-shoulder). We impose the boundary despite being unsure or flexible in regard to its placement.

I believe, moreover, that certain linguistic phenomena often thought of as forming a continuum are better analyzed in terms of multiple discrete factors that intersect to yield a finely articulated range of possibilities. For instance, basic grammatical categories are sometimes seen as varying continuously ("squishily") between the two extremities anchored by nouns and verbs (Ross 1972). I have argued, however, that grammatical classes are definable on the basis of discrete semantic properties (Langacker 1987a, part II; 1987b). A noun designates a thing (defined abstractly as a "region"), while an adjective, preposition, participle, infinitive, or verb designates a relation.
Verbs are temporal (in the sense of designating relationships that evolve through time and are scanned sequentially), whereas the other categories are atemporal (being viewed holistically). Some relations are simple (i.e. they comprise just a single configuration), but others, namely verbs, infinitives, and certain participles and prepositions, are complex (comprising multiple configurations). When additional semantic traits are taken into account, together with polysemy and the prototype organization of individual categories, behavior can be anticipated that is "squishy" for all intents and purposes. Still, merely indicating the position of elements along a continuous scale can at best only summarize their behavior. Specific semantic characterizations offer the prospect of explaining it.

Let us turn now to conceptual parameters that can indeed be regarded as continuous. Even here there are serious qualifications. The most obvious point is that a continuous scale is usually not coded linguistically in a continuous manner. For temperature we have terms like hot, cold, warm, cool, scalding, and freezing, not to mention ways of expressing specific values (e.g. 13°C). We devise musical scales to structure the domain of pitch, and for time we have many discrete units of segmentation and measurement. The most celebrated example, of course, is the idiosyncratic "tiling" of color space imposed by the basic color terms of each language.

Color is also celebrated as a domain which harbors a certain kind of discreteness despite the apparent continuity of its basic dimensions. I refer to the phenomenon of "focal colors", which provide the prototypical values of basic color terms and have special cognitive salience even when such a term is lacking (Berlin and Kay 1969; Kay and McDaniel 1978). Focal colors mitigate the continuity of color space by making it "lumpy" rather than strictly homogeneous. As natural cognitive reference points, the lumps are easily adopted as the basis for categorizing judgments, so that color categories tend to coalesce around them.

Focal colors are just one manifestation of a reference-point ability that I have claimed to be both ubiquitous and fundamental in cognition and linguistic semantics (Langacker 1993a). It is, I think, self-evident that we are able to evoke the conception of one entity for purposes of establishing "mental contact" with another. This reference-point ability is manifested in the physical/perceptual domain whenever we search for one object in order to find another, as reflected in sentences like the following:

(1) a. The drugstore is next to the post office
    b. There's a deer on that hill just above the large boulder
More abstractly, a reference-point relationship is central to the meaning of "possessives": the man's wallet; my cousin; the girl's shoulder; our train; her attitude; your situation; Kennedy's assassination; etc. I suggest, in fact, that it constitutes the one constant aspect of their meaning. This schematic semantic value accounts for both the extraordinary variety of the relationships coded by possessive elements and also their asymmetry. If the "possessor" is properly analyzed as a natural cognitive reference point vis-à-vis the "possessed", it stands to reason that these roles would not in general be freely reversible (*the wallet's man; *the shoulder's girl; *the assassination's Kennedy).

With respect to basically continuous parameters, the reference-point ability has what might be termed a "quantizing" effect. We are not in general able to directly ascertain a precise value falling at an arbitrary location along a continuous scale, nor do languages provide separate terms for each possible value. Instead, the usual strategy is either to assimilate the value to a salient reference point (ignoring its deviation therefrom), or else to estimate and discretely characterize its position in relation to one or more such reference points. The result is a kind of quantization, wherein linguistically coded values either "jump" directly from one reference point to another, or alternatively, are calculated from reference points in some discrete fashion.
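The two strategies just described can be given a rough computational reading. The Python sketch below is only an illustration under stated assumptions: the function names `assimilate` and `locate`, the use of integers as reference points, and the sample values are all invented here and are not part of the analysis itself.

```python
import math

def assimilate(value, reference_points):
    """Strategy 1: snap a continuous value to the nearest salient
    reference point, ignoring its deviation from that point."""
    return min(reference_points, key=lambda p: abs(p - value))

def locate(value, steps_per_unit):
    """Strategy 2: characterize a value discretely as an integer
    reference point plus a whole number of equal steps beyond it.
    Integers as reference points are an assumption of this sketch."""
    base = math.floor(value)
    steps = round((value - base) * steps_per_unit)
    if steps == steps_per_unit:   # a full interval reaches the next reference point
        base, steps = base + 1, 0
    return base, steps

print(assimilate(17, [0, 10, 20, 30]))  # nearest reference point to 17
print(locate(2.75, 4))                  # reference point 2, then 3 steps out of 4
```

In both functions the output is drawn from a finite inventory (a reference point, or a reference point plus a step count), even though the input may fall anywhere on the scale; that is the sense of "quantization" at issue.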
Quantization is most apparent when a continuous parameter is structured by means of a discrete grid or numerical scale. The musical scale is a case in point. Not only do the basic terms (C, D, E, etc.) jump from one precise value to the next, but they also provide the basis, both conceptual and linguistic, for determining the only permissible intermediate values. Conceptually, F-sharp or B-flat lies midway between two primary values (or else one quantum above or below such a value, given some conception of the magnitude of allowable increments/decrements). Linguistically, the expression that codes an intermediate value comprises the basic term and either of two discrete qualifiers (sharp/flat). A temperature scale is comparable except that there is more flexibility in specifying intermediate values. In describing the temperature as being 13.2°C, we take 13°C as a reference point and indicate that the actual value lies beyond it at a distance representing a certain fraction of the interval between 13° and 14°. And while it is true in principle that any real number can be used to specify a temperature, in everyday practice we confine ourselves to the integers or at most to fractional intermediate values that we can estimate in terms of quanta. Intuitively, for instance, I understand 13.2° as a step beyond 13° such that five steps of the same magnitude would take me to 14°.

Fractions themselves neatly illustrate the type of phenomenon I have in mind. Expressions like 2 1/2 and 5 3/4 clearly take the integers as both linguistic and conceptual reference points. Moreover, they use other integers as the basis for computing a specific intermediate position: the denominator indicates how many steps there are between two successive integers, while the numerator specifies how many steps should be taken. Or consider the angle at which two lines intersect.
Although there is obviously a continuous range of possible values, it seems evident that certain discretely computed values have special cognitive status. Particularly salient is an angle of 90°, as reflected by terms like right angle and perpendicular. The reason, I suggest, is that perpendicularity represents the privileged situation in which the two angles formed by one line joining another have precisely the same magnitude: if one angle is mentally superimposed on the other (which is thus invoked as a kind of reference point), they are found to coincide. We are also more likely to characterize an angle as being 45°, 30°, or 60° than, say, as 7°, 52°, or 119°. These values are privileged psychologically because they bear easily computed relationships to a right angle. An angle of 45° is one whose complement within a right angle is identical to it in magnitude. As for 30° and 60°, we can easily imagine sweeping through a right angle in three discrete and equal steps, defining three component angles whose superimposition likewise results in judgments of identity.

Quantization of this sort is by no means confined to quasi-mathematical domains. For instance, compound color expressions like brick red, celery green, and
sky blue can be thought of as evoking dual cognitive reference points. By itself, a term like red, green, or blue evokes a focal color, which in turn evokes the more inclusive region in color space that it anchors. A noun such as brick, celery, or sky names an entity that not only has a characteristic color but is sufficiently familiar to serve as a reference point. From these two reference points, we compute the proper notion: red tells us that brick is to be construed with respect to its color, and brick directs our attention to a particular location within the red region.

Also varying continuously are parameters such as size, length, and distance. We can of course measure these numerically, as for temperature. The more usual strategy, however, is to assess magnitudes only with respect to broad categories standing in binary opposition: big vs. small; long vs. short; near vs. far. The categorization is therefore basically discrete despite the vagueness of the boundaries. (Continuous analogical coding, as in The train was looooong, is clearly a rather marginal phenomenon.) Furthermore, placing an object in such a category involves a single step, in either a positive or a negative direction, from a privileged value that serves as reference point for this purpose. For a given object type, that reference point comprises the range of values that everyday experience has led us to regard as being "normal" for that type. Thus a big flea is smaller in absolute terms than a small moth, and a short train is longer than a long centipede.

The phenomenon is quite general. It seems to me that in every domain we operate primarily in terms of salient reference points, from which we arrive at other notions in ways that are largely discrete. When the effect of reference points and quantization is worked out systematically and fully appreciated, the role of true continuity in linguistic meaning will, I believe, appear rather limited.
This is not to say that it has no role whatever. There are in fact important aspects of linguistic semantics for which continuity should probably be considered the default assumption. Let me offer the following broad generalization as a working hypothesis that may have some heuristic value: with respect to the "internal structure" of a linguistically coded conception, discreteness predominates; on the other hand, semantic effects due to "external" factors (i.e. relationships with other conceptual structures) are basically continuous.

Since conceptions are containers only by dint of metaphor (and are not really very container-like), the terms "internal" and "external" should neither be taken as implying a strict dichotomy nor pushed beyond the limits of their utility. Factors reasonably considered internal are an expression's conceptual "content" and many facets of its "construal". I have thus far focused on content, arguing that true continuity is circumscribed and circumvented in various ways. Construal is the phenomenon whereby essentially the same content is susceptible to alternate "viewings", which represent distinct linguistic meanings. One aspect of construal,
namely background, is by nature an external factor. Other, more internal aspects include specificity, scope, perspective, and prominence. Here, as with content, the role of true continuity is more limited than one might think.

Specificity (or conversely, schematicity) refers to our manifest ability to conceive and portray a situation at any level of precision and detail, as exemplified in (2):

(2) Something happened > A person saw an animal > A woman examined a snake > A tall young woman carefully scrutinized a small cobra
Observe, however, that we can generally only adjust the level of specificity in quantized fashion, either by adding a discrete element (woman > young woman) or else by shifting from one discrete level to another in a taxonomic hierarchy (examine > scrutinize). Furthermore, beyond their obvious discreteness such hierarchies are usually "lumpy", in that certain levels have greater cognitive salience than others. In particular, the "basic level" (e.g. snake in the hierarchy thing > animal > reptile > snake > cobra) is known to have special psychological status (Rosch 1978).

I define an expression's scope as the array of conceptual content it invokes and relies upon for its characterization. By nature, it tends to be flexible and variable: it is generally no easier to precisely delimit an expression's scope than it is to determine exactly how far a shoulder extends. Nevertheless, there is good linguistic evidence for believing not only that this construct has some kind of cognitive reality, but also that scopes are conceived as being bounded, however fuzzily (see Langacker 1993b). Grammatical constructions can refer to them specifically and equate them with other bounded regions. Consider the "nested locative" construction, as in (3):

(3) The camera is upstairs in the bedroom in the closet on the top shelf

Intuitively, this construction involves a "zooming in" effect, wherein each successive locative in the sequence focuses on a smaller area contained within the previous one. More technically, we can say that the scope for interpreting each locative in the sequence is limited to the search domain of the preceding locative. (The search domain of a locative is defined as the area to which it confines the entity being located, i.e. the set of locations that will satisfy its specifications.)
A partonomy like arm > hand > finger > knuckle further illustrates both nesting and quantization with respect to scope: the conception of an arm overall provides the spatial scope for the characterization of hand; the conception of a hand in turn constitutes the immediate spatial scope for finger; and that of a finger, for knuckle.
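The "zooming in" effect of nested locatives lends itself to a small sketch. The Python fragment below is a toy illustration, assuming that regions can be modelled as nested dictionaries; the layout `HOUSE` and the function names `find_within` and `zoom_in` are invented for this purpose and make no claim about how scope is actually represented. Each locative is interpreted only inside the search domain left by the preceding one.

```python
# Hypothetical containment layout, invented for illustration only.
HOUSE = {
    "upstairs": {"bedroom": {"closet": {"top shelf": {}, "floor": {}}, "bed": {}}},
    "downstairs": {"kitchen": {"closet": {"top shelf": {}}}},
}

def find_within(scope, name):
    """Return every region called `name` anywhere inside `scope`."""
    found = []
    for child, sub in scope.items():
        if child == name:
            found.append(sub)
        found.extend(find_within(sub, name))
    return found

def zoom_in(scope, locatives):
    """Interpret each locative within the search domain(s) of the
    preceding one; return the regions that remain as candidates."""
    candidates = [scope]
    for name in locatives:
        candidates = [hit for region in candidates
                      for hit in find_within(region, name)]
    return candidates

print(len(zoom_in(HOUSE, ["closet"])))              # both closets qualify
print(len(zoom_in(HOUSE, ["upstairs", "closet"])))  # only the bedroom closet remains
```

Taken alone, "in the closet" leaves two candidate regions in this toy layout; once "upstairs" has narrowed the scope, only one remains. Each step moves from one bounded region to another contained within it, which is the discrete, nested character of scope noted above.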
The term perspective subsumes more specific aspects of construal such as vantage point, orientation, direction of mental scanning, and subjectivity/objectivity. Although some of these factors can in principle vary continuously, in practice they tend toward discreteness. Presumably, for instance, our conception of a cat includes numerous visual images, representing cats with different markings, in various postures, engaged in certain activities, etc. There is doubtless considerable variation and flexibility. Still, it seems apparent that these images tend to reflect certain canonical vantage points, and certain orientations of the cat within the visual field. These vantage points and orientations are of course those which predominate in our everyday visual experience. For example, images in which the cat is viewed from underneath, or is upside down within the visual field, are possible but hardly typical.

The notion of mental scanning can be illustrated by the contrast between pairs of expressions like the following:

(4) a. The roof slopes steeply {upward/downward}
    b. The road {widens/narrows} just outside of town
    c. The hill gently {rises from/falls to} the bank of the river
In each case a difference in meaning is quite evident, even though the two expressions describe precisely the same objective situation. Intuitively, moreover, the semantic contrast involves directionality, even though the situations described are static: objectively, nothing moves, hence there is no apparent basis for directionality. I do ascribe motion to these sentences, but not on the part of the subject: rather, it is the conceptualizer who "moves" in these expressions, scanning mentally through the scene in one direction or the other (Langacker 1990, ch. 5). For our purposes, the pertinent observation is that the contrast in directionality (e.g. between the conceptualizer scanning upward along the roof or downward) is clearly discrete.

Subjectivity/objectivity is defined as the extent to which an entity is construed, asymmetrically, as the "subject" vs. the "object" of conception (Langacker 1985; 1990, ch. 12). In (4), for example, both the conceptualizer and his motion are construed subjectively, since the conceptualizer does not actually conceive of himself as scanning mentally through the scene, but merely does so implicitly as he focuses on the objective configuration thus assessed. Although subjectivity/objectivity is a matter of degree, here too there are grounds for believing that the scale is lumpy and partially quantized owing to the privileged status of certain canonical arrangements. Consider the role of the speaker, for instance. At one extreme, represented by the pronoun I, the speaker goes "onstage" to be the expression's referent; as the explicit focus of attention, the speaker is construed quite objectively. Another standard arrangement finds the speaker "offstage" but still within an expression's scope, hence intermediate in terms of subjectivity/objectivity. Examples include deictics (e.g. this; here; now) and sentences like those in (5), where the speaker functions as the default-case reference point.

(5) a. The mailbox is right across the street
    b. Please come as soon as you can
The last basic option is for the speaker to remain outside an expression's scope altogether, having no role in the conception conveyed apart from the conceptualizer role itself (which he has in every expression). In this event the speaker's construal is maximally subjective.

There are many sorts of prominence, and while some kind of quantization may in each case be discernible, I suspect that in general it may be less obvious and less important in this area. Nonetheless, the two kinds of prominence that are most essential for grammatical purposes show definite quantum effects. One type is profiling, which might be characterized as "reference within a conceptualization". Within its scope (i.e. the conceptual content it invokes), every expression profiles (designates) some substructure. Thus knuckle profiles a certain substructure within the conception of a finger, and finger within the conception of a hand. Many expressions (verbs, adjectives, prepositions, adverbs, etc.) profile relationships. For example, conquer designates a two-participant relationship that evolves through time, whereas the stative-adjectival conquered profiles the final resultant state of that process. (Conqueror profiles a thing, namely the agentive participant of conquer.) Even though an expression's profile is not invariably susceptible to precise delimitation, the contrast between profile and non-profile is basically discrete and grammatically significant. In particular, the nature of its profile determines an expression's grammatical class.

For expressions that profile relationships we need to recognize a second type of prominence, pertaining to the relational participants, whose grammatical import is hardly less substantial. It is usual for one participant (which I call the trajector) to stand out as the primary figure within the profiled relation. Additionally, there is often a second "focal" participant (termed the landmark) with the status of secondary figure.
Observe that two expressions, e.g. before and after, may invoke the same conceptual content and even profile the same relationship within it (in this case one of temporal precedence), yet differ semantically because they impose opposite trajector/landmark alignments. While participant prominence may in general be a matter of degree, I believe that trajector and landmark status represent distinct quantum levels, and that they furnish the ultimate basis for the notions subject and object.
Now that we have examined the "internal structure" of conceptions, including both content and construal, it is time to recall the working hypothesis advanced earlier: internally, discreteness predominates (a more cautious phrasing is that continuity is circumscribed and mitigated in various ways); by contrast, semantic effects due to "external" factors (i.e. relationships with other conceptual structures) are basically continuous. These external factors will now be briefly discussed. While they tend to be neglected, I do not regard them as incidental or even subsidiary, but as integral components of linguistic meaning. Moreover, they would seem to be essentially continuous (although the discovery of significant quantization would not at all surprise me).

An important aspect of linguistic semantics is our ability to construe one structure against the background provided by another. There are many kinds of background, including previous discourse, pertinent assumptions and expectations, and, in metaphor, the role of the source domain in conceiving and structuring the target domain (cf. Lakoff and Johnson 1980; Lakoff and Turner 1989; Turner 1987). While it is not hard to think of possible quantization in this realm, I wish to emphasize an important parameter that may well be continuous: the salience of the background structure, i.e. its level of activation in the construal of the target. For example, once a discourse referent is introduced its salience tends to diminish through the subsequent discourse unless and until it is mentioned again. There are of course discretely different ways of doing so (e.g. with a pronoun, or with a definite article plus noun), reflecting quantized estimates of the referent's current status (cf. Givón 1983; van Hoek 1992). But its salience per se (and hence the effect of its background presence when it remains implicit) presumably varies continuously.

The relation between the source and target domains of a metaphor poses a number of thorny questions. To what extent do we understand the target domain prior to (or independently from) its structuring by the source domain? To what extent does target-domain reasoning depend on metaphorical structuring? To what extent do the source and target domains merge to form a "hybrid" conception (Fong 1988)? Possibly we are dealing here with basically continuous parameters. Be that as it may, there are clearly many expressions that originate through metaphorical extension even though the target domain can easily be grasped independently. At least in such cases, we can speak of the gradual, presumably continuous "fading" of a metaphor, reflecting the declining likelihood and/or level of the source domain's activation on a given occasion of the expression's use. Intuitively, for example, the literal sense of fade ('decrease in color intensity') is still reasonably salient in expressions like fading metaphor, fade from memory, etc. By comparison, reflect reflects more weakly the source domain of light and mirrors.
A related phenomenon is analyzability, the extent to which the component elements of a complex expression are recognized within it and perceived as contributing to its meaning. Thus complainer is more analyzable than computer, which in turn is more analyzable than ruler (i.e. 'instrument for measuring and drawing ("ruling") straight lines'). We invariably interpret complainer as 'someone who complains', whereas we do not necessarily think of a computer specifically as 'something that computes', and a ruler is hardly ever thought of as 'something that rules [lines]'. I consider analyzability to be an important dimension of linguistic semantics. Indeed, I characterize an expression's meaning as comprising not just its composite semantic value, but also the entire compositional path which leads to it. In processing terms, analyzability is interpretable as the likelihood or degree to which component semantic values are activated along with the composite conception. Presently I have no linguistic evidence to suggest that this parameter is other than continuous.

Finally, I assume an "encyclopedic" view of linguistic semantics which denies the existence of any precise or rigid line of demarcation between knowledge that is "linguistic" and knowledge that is "extra-linguistic" (see Haiman 1980 and Langacker 1987, ch. 4). Our conception of a given type of entity (e.g. a cat, an apple, or a table) is almost always multifaceted, comprising a potentially open-ended set of specifications pertaining to any domain of knowledge in which it figures. Of course, these specifications vary greatly in their status. Some (like shape and primary function) are so "central" to an expression's meaning that they are virtually always activated when it is used. Other specifications (contingent knowledge, cultural associations) may be quite peripheral, being activated only in very special circumstances. A priori, it is reasonable to suppose that their likelihood and strength of activation vary continuously, being determined by such basically continuous factors as entrenchment, cognitive salience, and contextual priming.

It is time now for a brief summary and conclusion. By way of summary, I will merely reiterate a basic working hypothesis: that discrete constructs are essential if not predominant for the characterization of linguistically coded conceptualizations, so far as their internal structure is concerned; whereas those aspects of meaning which involve the relationship of such conceptions to one another would appear to vary continuously. By way of conclusion, let me emphasize that discreteness vs. continuity may itself be a matter of degree, the various shades and types of discreteness being distributed along a (quantized) continuum. Depending on what we examine and what we wish to emphasize, both discreteness and continuity can be discerned in virtually any aspect of language and cognition. Our task is not to choose between them, but rather to explicate the specific ways in which this fundamental opposition plays itself out across the full range of linguistically relevant phenomena.
REFERENCES

Berlin, Brent, and Paul Kay. 1969. Basic Color Terms: Their Universality and Evolution. Berkeley: University of California Press.
Fong, Heatherbell. 1988. The Stony Idiom of the Brain: A Study in the Syntax and Semantics of Metaphors. San Diego: University of California doctoral dissertation.
Givón, Talmy (ed.). 1983. Topic Continuity in Discourse: A Quantitative Cross-Language Study. Amsterdam: John Benjamins.
Haiman, John. 1980. Dictionaries and Encyclopedias. Lingua 50.329-357.
Kay, Paul, and Chad K. McDaniel. 1978. The Linguistic Significance of the Meanings of Basic Color Terms. Language 54.610-646.
Lakoff, George, and Mark Johnson. 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Lakoff, George, and Mark Turner. 1989. More than Cool Reason: A Field Guide to Poetic Metaphor. Chicago: University of Chicago Press.
Langacker, Ronald W. 1985. Observations and Speculations on Subjectivity. In John Haiman (ed.), Iconicity in Syntax, 109-150. Amsterdam: John Benjamins.
Langacker, Ronald W. 1987a. Foundations of Cognitive Grammar, vol. 1, Theoretical Prerequisites. Stanford: Stanford University Press.
Langacker, Ronald W. 1987b. Nouns and Verbs. Language 63.53-94.
Langacker, Ronald W. 1990. Concept, Image, and Symbol: The Cognitive Basis of Grammar. Berlin: Mouton de Gruyter.
Langacker, Ronald W. 1991. Foundations of Cognitive Grammar, vol. 2, Descriptive Application. Stanford: Stanford University Press.
Langacker, Ronald W. 1993a. Reference-Point Constructions. Cognitive Linguistics 4:1.1-38.
Langacker, Ronald W. 1993b. Grammatical Traces of some "Invisible" Semantic Constructs. Language Sciences 15:4.323-335.
Rosch, Eleanor. 1978. Principles of Categorization. In Eleanor Rosch and Barbara B. Lloyd (eds.), Cognition and Categorization, 27-47. Hillsdale, N.J.: Erlbaum.
Ross, John R. 1972. The Category Squish: Endstation Hauptwort. Papers from the Regional Meeting of the Chicago Linguistic Society 8.312-328.
Turner, Mark. 1987. Death is the Mother of Beauty. Chicago: University of Chicago Press.
van Hoek, Karen. 1992. Paths Through Conceptual Structure: Constraints on Pronominal Anaphora. San Diego: University of California doctoral dissertation.
Wierzbicka, Anna. 1984. Cups and Mugs: Lexicography and Conceptual Analysis. Australian Journal of Linguistics 4.205-255.
CONTINUITY AND MODALITY

ANTOINE CULIOLI
University of Paris VII (URA 1028, CNRS), France
Introduction

In this paper I shall attempt to identify and describe two possible fields in which the implications of continuity are particularly manifest. Firstly I shall deal with the approach which is concerned with problems such as continuity and linguistics, and more specifically semantics. Secondly I shall consider that no basic discrimination between syntax, semantics and pragmatics is called for, and I propose, here, to put forward an attempt to model the operations which allow us to establish a verifiable relation between representations on the one hand, and on the other, the traces of these operations which implement the transition from representations to textual phenomena.

1. The choice of a theoretical framework and an attempt at a definition of continuity
The question of continuity can obviously be approached from two different perspectives. It can either be considered from a methodological point of view, which involves the problem of continuity in mathematics, and concerns in a more general way the construction of a system of metalinguistic representations, or it can be considered from a point of view based on observation, i.e., a "theory of observables", in such a way that we may see which points are affected by this problem of continuity. For reasons of competence (or perhaps incompetence in the areas of mathematics and computer science) my position will be that of the linguist. I shall thus present a certain number of examples, which are well-known, and do not allow any discussion concerning facts, to bring into view the areas in which continuity occurs.

Before discussing my examples, however, I should first like to add one or two considerations concerning the notion of continuity such as it appears to the linguist. To simplify the question, I would say that continuity occurs, in the strict sense of the word, at a basic level, in the relation between several different types of continuity on the one hand, and on the other, the phenomena which one encounters in the linguistic domain, i.e. in the assimilation which can be made between empirical observations and the various types of continuity known in mathematics.

Let us now consider the areas in which continuity occurs. This phenomenon is particularly manifest in domains such as that of exclamatives, which I have had the opportunity of studying in some detail in various languages around the world (cf. Culioli 1974). Examples like Quel beau livre ! ("What a beautiful book!") show that there is constancy in a certain number of representations, which is of great interest, because here it is possible, in certain respects, to carry out a kind of assimilation based on the fact that the process is carried out as though one had an ordered series of occurrences of representations of, say, livre or beau livre in such a way that, in exactly the same way as for real numbers one would have an order by means of which one would eventually reach the ideal, inaccessible, detached representation, which would then provide the "really" something.

The problem of continuity also occurs in another domain, which is that of time and space, and more particularly in the domain of time. Once again it is possible to demonstrate, just as I have done above, that we have, basically, in a pre-mathematical fashion, discovered the notion of the limit. In the domain of time, when one works on representations of events by means of intervals provided with topological properties (even the elementary, rudimentary topology which one is brought to introduce as a linguist) one becomes aware of the problem of the limit. See Figure 1:
[Figure 1]

If one is located in A, in relation to B, one is obliged to take two points; in A, one can never say that one has reached the boundary unless, being at a point xj in B, one says "that is it; it's over" and in this case one reconstructs a last, imaginary point xi. But one cannot have a final point in A. This phenomenon occurs with striking regularity, whatever language one considers. I have never come across a counter-example to this type of thing, and the empirical facts related to modeling are extremely clear. This leads to a whole series of consequences. Furthermore, if one looks at the other side of the figure,
the same problem arises concerning B in relation to A, etc. Thus, in a certain number of domains, which I shall not study here, one encounters this problem which is the problem of continuity in what might be called the "serious" sense of the term or at least the "real" sense compared to constructions which have been put forward for instance in mathematics.

Another point in connection with which one encounters the problem of continuity is that which naturally occurs when one is brought to introduce the notion of cuts (could there be such a thing as continuity without cuts?). Here, space permitting, it would be interesting to look more closely at the problem and demonstrate that one cannot provide adequate models in the domain of linguistic observation if one fails to associate the notion of cuts with that of continuity. By a cut I mean something akin, in a way which mathematicians may find horribly metaphorical, to Dedekind's cuts, such as expounded in the foundation texts.

Another area in which continuity becomes an active factor can be observed when one introduces the property of deformability. This occurs in connection with abstract schematic forms: when they are made to undergo certain pressures, either by bringing something to bear upon them, or by immersing them in a space element provided with certain properties, these abstract forms produce, through what must be called deformations ("warps" or "shear" effects), local forms, which will become associated with the basic form, which is an abstract schematic form. This property of deformability leads us to another acceptation of the concept of continuity which is the extent to which there is a relation between continuity and contiguity, a question which has become highly controversial, as we all know.
This question of contiguity arises each time a deformation is brought about, since it can be supposed that between the resulting local forms there is a relation in which there is indeed discontinuity on the one hand but also an element which will have a connectional property, in the form of "jumps" from one form to another. This notion of contiguity is fundamental to a whole series of considerations, particularly in connection with relations of causality. The whole notion of causality revolves around the relation between contiguity and continuity.

On the basis of these preliminary remarks I shall now proceed to analyse my examples. These are of widely different types, because the notion of continuity is difficult to discuss from a linguistic point of view if one's examples belong to a single type, owing to the wide variety of representations of continuity one obtains, depending on the type of example chosen.
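The remarks in this section on the limit (Figure 1) and on cuts can be given a minimal formal reading. The sketch below is only one possible interpretation: it assumes that A and B are adjacent intervals of the real line with a < b < c, and that the boundary b is assigned to B. The symbols a, b, c and the sequence x_n are introduced here for illustration and do not appear in the original text.

```latex
% Assumption: A and B partition [a, c], with the boundary b belonging to B.
\[
A = [a, b), \qquad B = [b, c], \qquad A \cup B = [a, c], \qquad A \cap B = \varnothing .
\]
% A has no last point: from any x in A one can still move closer to b.
\[
\forall x \in A \;\; \exists x' \in A : \; x < x' < b
\qquad \bigl(\text{e.g. } x' = \tfrac{x + b}{2}\bigr),
\qquad \sup A = b \notin A .
\]
% The "last, imaginary point" is only reconstructed from B, as a limit.
\[
x_n = b - \frac{b - a}{n + 1} \in A \quad (n \ge 0),
\qquad \lim_{n \to \infty} x_n = b \in B .
\]
```

On this reading, the pair (A, B) behaves like a cut in Dedekind's sense: every point of A precedes every point of B, and the boundary can only be named from one side.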
2. The example of pouvoir
My first example is:

(1) Il peut atteindre le sommet (He can / may reach the top)

I shall not enter into problems raised by translation into English or any other language. This example does not present any particular problem, and one may wonder why it is open to a certain number of interpretations. I shall not discuss them all, but there is one which is absolutely fundamental: that of being "in a position" to reach the top. This does not imply that he actually reaches the top, simply that he is "in a position" to. The problem is now to find a way to represent pouvoir in such a case, allowing for this interpretation.

If we use the imperfect tense (in French this alternative past tense exists):

(2) Il pouvait atteindre le sommet

(2) is open to two interpretations. Firstly in Il a dit qu'il pouvait atteindre le sommet, we have, in indirect speech, the simple 'translation' of (1) Il peut atteindre le sommet. Secondly, il pouvait atteindre le sommet means "Il aurait pu..." ("He could have..."). This "unreal" interpretation means that "he did not reach the top". What, from a semantic point of view, makes these two interpretations possible? In relation to the different states of affairs, we shall attempt to discover that property of the forms as marks of operations which provides openings for such interpretations.

If we now turn to the preterit tense (simple past; I shall not enter into the question of the passé composé at this stage):

(3) Il put atteindre le sommet

(3) means "He was able to..." and not "He could...", i.e., "il a effectivement atteint le sommet" ("He actually reached the top"). Several facts come to light when one attempts to provide a representation of this phenomenon (see Figure 2):
We shall begin by constructing a point in T0 which will give rise to two branches, in such a way that, in relation to this point, we have, in fact, an anticipatory construction of another point in Tx, which will represent: (""), and which is the representation of a validatable state of affairs, such that it will perhaps be possible to say in Tx "such is the case". So here we have a "gap" between T0 and Tx, in such a way that we know that there is a path to be covered, which may, if necessary, be covered by a subject S if the latter wishes at any time to be at Tx, which is subsequent to T0, depending on variable circumstances. But we always have at this point of junction (which branches out as the representation is elaborated) the possibility of taking a second path, whereby something else may take place, and in this case the "something else" is empty. This means that here we will have, taking < r > for relation, "< r > is the case", which is "envisaged", and here we do not envisage anything else but "< r > is the case".

This representation of pouvoir in the case of (1) Il peut atteindre le sommet, shows how the item undergoes a filtering process, due both to the aspectual properties of atteindre le sommet and to modal or in a general way semantic properties. The aspectual property of atteindre is well-known: we have a transition, such that we may initially be at a point where we have not yet started, where we are "seeking" to reach, then, should the case arise, we succeed, i.e. we "have reached" the top. The modal property is what I personally call téléonomique or "goal-directed", which means that one has fixed for oneself a purpose which one has evaluated as "good" and one therefore considers the space T0 → Tx as "to be completed".
If on the other hand our position is strictly epistemic, in the sense that we would not have a subject engaged in the covering of a gap evaluated as "good", in this case we would have at Tx something which would not be constructed as empty. So in the case of (1) Il peut atteindre le sommet, we observe that with pouvoir we always have the representation of one term plus another term. Here the other term is constructed as Ø. This incidentally recalls, in a different style, something I read recently in a book as old as The English Verb by M. Joos (1964), in which there is a whole series of considerations on modals which are extremely interesting from an intuitive point of view.

When we have (1) Il peut atteindre, we see that the path T0 → Tx represents a path to be completed, whereas the other one (T0 → Ø) represents a simple relation. This is why Il peut atteindre... means "Il n'a pas encore atteint" ("He has not yet reached...").

Let us now come back to the imperfect, which, as we shall see, functions as a "translated" value of the point of origin. We shall now go on to state that in the construction of any system of reference we deal with the construction of a certain number of points of origin, one of which is an absolute point of origin,
in relation to which we shall have (apart from the origin of speech, which H. Reichenbach (1947) calls the "point of speech") two origins: one of them is a translated origin which will retain the properties of the absolute point of origin, and the other, a "disconnected" origin, will have specific properties. This allows us to account for empirical facts. These findings are founded on observations not on a single language but on a very wide variety of languages which are not related in any way, either geographically or genetically. When we deal with a translation (let us call the absolute origin T0 and the translated origin T'0) the translation may take place in the past or, under certain conditions, in the future. But in almost every case, it takes place in the past, and we maintain the properties. To illustrate this process let us take the following examples:

(4) s'il pleut (if it rains)
(5) s'il pleuvait (if it rained)
(6) s'il vient à pleuvoir (if it happens to rain)
(7) s'il venait à pleuvoir (should it happen to rain)

(4), (5), (6) and (7) are quite acceptable. But let us now consider (8) to (11):

(8) *s'il doit pleuvoir (if it must rain)
(9) s'il devait pleuvoir (should it rain)
(10) *s'il va pleuvoir (if it's going to rain)
(11) s'il allait pleuvoir (if it were to rain)

(8) is unacceptable, because il doit pleuvoir would be deontic; (9) is acceptable; (10) is unacceptable (I mean with the strictly hypothetical value: otherwise cases such as s'il va faire ça ("if he's going to do that") are perfectly acceptable, especially when si ("if") has the value of puisque ("since")); and (11) is acceptable. These examples show that when we bring about a translation of this type, we obtain, almost mechanically, a filtering effect implemented by the aspectual and modal properties of aller, devoir and venir à.
All these findings should be taken into account but in the present case I shall simply take the fact that when we have (2) Il pouvait atteindre le sommet, it is either a strict translation of il peut (and in this case we have maintained exactly the same property), or a separation (see Figure 3):
[Figure 3]

This raises the problem of the relation between contiguity and continuity in terms of space (I use the term "space" in a metaphorical sense). We now have a value of separability. Let us return to T'0 in relation to T0. I have used T for "Time-space", and in fact there is also a parameter S for "Subject" (which I shall not introduce here, to facilitate things). When we construct this we have two possibilities:

- either there is continuity from one to the other: one takes as a starting point a former, previous state of things, and one proceeds up to the present (this in fact corresponds to the definition of the imperfect which can be found in Greek grammars);
- or there is a separation: in this instance, there will be reference to the construction of an "analogous reality", such that what prevailed at a certain time no longer prevails; this is typically the case in the classic French example Il y avait, à tel endroit, quelque chose (il y avait means that now, this is no longer the case).

For amusement's sake, I could take similar examples from many other languages. Now coming back to (2) il pouvait, let us consider Figure 4:
[Figure 4]

We may observe that we have constructed a disconnected point (which I shall represent as T0¹), which stands for a fictitious, disconnected reference point. Then, in relation to this fictitious point, which in fact represents our absolute origin translated and disconnected, we may recommence the trajectory of il pouvait. In the same way, when we say (1) il peut in relation to the present, we are in fact saying that, should the case arise, we will say "il a pu" at Tx. If we say (2) il pouvait this means that we may subsequently be able to say "il a pu". Interestingly, il pouvait means "il aurait pu" (an unreal value) and therefore means "il n'a pas pu". At this point, one must explain why one has this reversal of the situation. I shall not be able to do this here, space not permitting, but I feel these reversals are a part of the problem of continuity. Although I have a suggestion for a solution, my aim here is to raise the problem, not to put forward a demonstration.

Let us now take (3) il put. In French, il put has one property which is extremely clear. Here again I shall not go into all the preambles which have allowed me to reach this conclusion. We construct this tense as a closed bounded interval in such a way that the complements are empty. We have, in fact, a pure transition (see Figure 5):
[Figure 5]

Exactly the same conclusions (although differently formulated) have been reached by H. Seiler (1952), in his study of the aorist in a remarkable book on aspect in modern Greek, which I consulted recently in connection with this type of phenomena. This is what has also been called the taking into account of the process as a whole. One can observe phenomena of this kind in studies on aspect. There are also other properties which I shall not expound.

What, now, are the implications of this in terms of our representation? It means that we have constructed a disconnected reference point, which raises the question as to why, when one constructs, within a reference space, a transition related to a disconnected space, it is inevitably constructed as a whole. If we take a transition, we can, point by point, obtain this type of thing i.e., we may say that point is dominant over this point. If we observe a distortion, this means that we have a cut. If we have no distortion, this means that we have a stable state. If we are astride, so to speak, we have introduced a cut. If we take position on the left, it means that there is a prospect of moving to the right. If we take position on the
right, this means that we reconstruct a last point, etc. And here in this particular case, in fact, we take into account both the left and the right boundaries at the same time. And it can be demonstrated that one of the properties of this disconnected reference point is that it allows the existence of this property. We then observe that the branch T0 → Tx which, in Figure 2 above, represented a gap, a vacuum, is going to be occupied by a closed bounded interval (see Figure 6):
[Figure 6]

Somewhere we will have a transition from "here nothing happened" (the gap remains) and here a change occurs in the course of the transition, so that we obtain "you have reached the top", and (3) Il put atteindre le sommet means in French "Il atteignit effectivement le sommet" ("He actually reached the top"). We could now go on to study il a pu in the passé composé, and show why it has such and such a value. In fact we may, in this manner, construct all of the semantic interpretations. I feel that I use the word "interpretation" too freely and it could be thought that this is solely a question of interpretative semantics. This is not the case at all, and I use the term because it is convenient and applicable both for production and for recognition in cases such as this, and because the linguist attempts to analyse problems from the outside.

3. Manifestations of continuity in other types of examples
In my last part I wish to draw the reader's attention to other types of problems. I shall briefly discuss three other points. The first concerns the fact that the English

(12) it may well (as in it may well be due to...)

cannot, by any means, be translated by il peut bien (cela peut bien être dû... is impossible, and does not have the meaning of it may well be due to...). If we wish to translate it into French we must say cela
peut très bien, cela peut fort bien, cela peut parfaitement être dû or cela pourrait bien être dû. If we now analyse by means of a metalinguistic system of representations the English may as compared to can, if we analyse well, if we analyse the conditional, we can demonstrate in algorithmic fashion why the English may well cannot be translated by peut bien but must be translated by peut fort bien or pourrait bien. This is where things start to become interesting, since we have here empirical data which supports and confirms the analysis which is otherwise purely formal. The second point concerns specifically semantic problems. I should have liked to discuss why the German mögen, and the Dutch equivalent mogen as well, in a majority of cases, no longer mean what they originally did (i.e. "to have the power or the strength to"), and have eventually come to mean "to want" or "to wish" : (13) ich möchte means "I would like" and (14) ich mag in certain cases, "I like" (either "apples" or "someone" if I understand correctly). The same applies to Dutch except that it is used essentially for human animates, another verb being used for "apples" or "chocolate". The third point concerns semantic drifting and contiguity. I have made an extensive study of history in the Germanic field ; I do not intend to go into this, but I wish to draw attention to a very interesting case of semantic drifting. We have the choice, here, between two positions : either we say that languages have an element of "divine madness", as Whitehead said about mathematics, and that anything can happen in the linguistic field, or we consider that it is possible to provide an explanation, which is my position. If therefore an explanation can be provided, it stems from what I have referred to as contiguity, i.e. 
the construction of a semantic space such that when we pass from one value to the next we will have a series of successive transitions, which implies that there is contiguity from one value to another, which is a form of continuity.

Another very interesting case is that of the verb lassen in German or låta in Swedish, which mean both "to let" and the causative "to have someone do something". Here again, if we study in detail how we can pass from "I let him do something" to "I have him do something", we come across a very interesting problem, since we have the impression that a total or almost total reversal of the meaning has been brought about. Here, once again, we could demonstrate how the transition from one stage to the next is brought about, I mean from an abstract point of view, as we do not always have all the historical elements at our disposal. This presupposes once again the construction of a space within which these distortions or warps can be brought about. All these examples (maagan, mögen, lassen) revolve around inter-subject relations.

Conclusion

Whether one considers the problem of modality from the point of view of the notion of the limit, from the point of view of semantic and aspectual properties, or from the point of view of inter-subject relations, one becomes aware that the linguist cannot deal adequately with semantic problems without introducing the concept of continuity in a serious sense of the term.
REFERENCES
Culioli, A. 1974. A propos des énoncés exclamatifs. Langue Française, Paris: Larousse, pp. 6-15.
Culioli, A. 1990. Pour une linguistique de l'énonciation, t. 1: Opérations et représentations, Paris: Ophrys.
Joos, M. 1964. The English verb; form and meaning, University of Wisconsin Press.
Reichenbach, H. 1947. Elements of symbolic logic, New York: Macmillan.
Seiler, H.J. 1952. L'aspect et le temps dans le verbe néo-grec, Paris: Belles Lettres.
CONTINUUM IN COGNITION AND CONTINUUM IN LANGUAGE
HANSJAKOB SEILER
University of Köln, Germany
1. Introduction
Continuum is one of the central notions in the work of the UNITYP research group at the University of Cologne¹. In the following presentation we shall first set out a number of theses about the continuum that reflect our current views on this notion. Our demonstration will then proceed by way of commenting on a recent major publication in which this notion has been put to extensive use. The publication is entitled Partizipation: Das sprachliche Erfassen von Sachverhalten [Participation: The representation of states of affairs by the means of language] (Seiler/Premper 1991). While some of the contributions to this round table deal with continuity in lexical semantics, the purpose of my paper is to show the usefulness of this notion in the domain of semantax, i.e. the semantics of syntactic relations, specifically the relation between the verb and its complements and adjuncts.

2. Theses about the continuum
1) The continuum is a construct serving the purpose of putting some order into a variety of facts. As such it may be compared with such other constructs as the paradigm and the Porphyrian stemma as it appears, e.g., in the phrase structure trees of the generativists².
2) The continuum and the discrete stand to each other not in a contradictory, but in a contrary or complementary relation: the notion of continuum presupposes discreteness; it depicts an increase vs. decrease of properties between discrete steps in a linear ordering. The notion of discreteness in turn presupposes that of continuity.

¹ The label UNITYP is an abbreviation of the descriptive title of our project: "Language universals research and typology with special reference to functional aspects". The project is funded by the Deutsche Forschungsgemeinschaft, which is herewith gratefully acknowledged.
² For a detailed discussion of this comparison see Seiler 1985: 21 ff.
3) The UNITYP framework insists on the distinction between a scale and a continuum (Seiler, l.c. 16 f.). Scale means 'measuring staff'; it is a static, unidirectional means for measuring regular intervals of a 'minus' or a 'plus' of certain properties. The continuum, on the other hand, while corresponding to the 'measuring staff', has properties that measure up to the phenomena themselves: directionality, dynamics, binarity, complementarity, parallelism, and reversibility.
4) The continua which we can detect within one particular language or in cross-linguistic comparison are prefigured by corresponding continua on the cognitive-conceptual level. If we imagine a cognitive-conceptual content, such as, e.g., POSSESSION (Seiler 1983), we will find that it can either be progressively specified and elaborated upon, or, on the contrary, simply posited as such, without further specification. A cognitive-conceptual continuum with intermediate steps spans the range between these two extreme options.
5) The continua that we posit on the cognitive-conceptual level are idealizations. They cannot be arrived at by empirical observations alone. Instead, they derive from a constant shifting back and forth between observations and rational reasoning. They are regular. Linguistic continua, on the other hand, can be irregular, showing overlaps or gaps, as e.g. when we say that language X has no case marking. But it is precisely such statements that are only possible against the background of a framework where overlaps and gaps do not occur.
6) There must be a common functional denominator to a continuum. The items on the scale of a linguistic continuum are semantically distinct. E.g. transitivity differs semantically from case marking. Yet they have a denominator in common, which we situate on the cognitive-conceptual level.
7) Adjacency is a further defining trait of a continuum: adjacent positions are more similar to one another, i.e. share more properties, than non-adjacent ones.

3. The cognitive-conceptual dimension of Participation
The notion of Participation pertains to the cognitive-conceptual level. It is a relation between a participatum and its participants. The participatum, "that which is participated in", can be, e.g., a situation or a process. The participants can be, e.g., actants or circumstants, or more complex entities. The mental representation of this relation is brought about by a number of different options which we call techniques. The techniques can be arranged in a continuous order progressing from minimal to maximal elaboration on the relation of participation or, in the reverse sense, from minimal to maximal condensation of the relation.

In the overview below you find an ordered array of 10 techniques, i.e. options for the mental representation of the relation of participation. In parentheses you find such familiar morpho-syntactic notions as nominal clauses, noun/verb distinction, verb classes, valence, etc. These are preceded by such terms as posited participation, distinction participants/participatum, generally
implied participants, etc. They pertain to the cognitive-conceptual level and are coined to keep the two levels distinct. They are not meant to replace morpho-syntactic terminology, but to encompass it.

At first glance the elaboration proceeds as follows (see the overview): in the first two techniques the mental representation is holistic, instantaneous, or global. Techniques 2 to 5 lead to an increasing differentiation on the participatum. Technique 6 marks a turning point where differentiation gradually shifts from the participatum to the participants. Techniques 7 and 8 produce further differentiations among the participants. Now that both participatum and participants are fully specified, techniques 9 and 10 proceed to further specifying the relation between them. This is brought about by a special relator that acts as operator. The path just described simulates the stepwise build-up of the relation of participation in our minds. It can also be followed in reverse, leading to an increasingly condensed representation. We call such an order of techniques a dimension, in our case the dimension of participation.

Overview
1. Posited Participation (holophrastic expressions, nominal clauses)
2. Distinction participants/participatum (noun/verb distinction)
3. Generally implied participants (verb classes)
4. Specifically implied participants (valence)
5. Orientation (voice)
6. Transition (transitivity and intransitivity)
7. Role assignment (case marking)
8. Introduction of new participants (serial verbs)
9. Cause and effect (causatives)
10. Complex propositions (complex sentences)

Now you may say that this is all right. But how does the gradual exfoliation or condensation of the relation of participation work? What are the operational steps? How are the techniques delimited from one another? How do we define the dimension? Here we must have recourse to one further notion, the notion of parameters in a specifically defined sense.
They are principia comparationis with a plus pole and a minus pole and possible intermediate steps, thus in principle again continua. They represent possible universals of cognition and, at the same time, possible universals of language. As such, they would have to be defined, a task that must be left to further work. But like the notions of technique and dimension, the parameter is an operational notion in the first place. This means that the definitions would have to be operational rather than categorial. The parameters are listed separately below and figure again on the vertical axis of the following chart.
[Chart: the 20 parameters (rows) plotted against techniques 1 to 10 (columns). In each cell a vertical stroke marks the parameter as active in that technique and a circle marks it as inactive.]
List of parameters
1. metalanguage / object language
2. context-sensitive / context-independent
3. categorial / thetic
4. relational / absolute
5. referential / general
6. time-stable / temporary
7. dynamic / stative
8. active / inactive
9. plurivalent / monovalent
10. centralized / decentralized
11. profile outgoing situation / profile ingoing situation
12. basic / derived
13. individualized / non-individualized
14. affected / non-affected
15. total involvement / non-total involvement
16. overt relator / covert relator
17. volitional / non-volitional
18. control / non-control
19. emotive / non-emotive
20. epistemic / non-epistemic

The chart exhibits a cognitive-conceptual space featuring, on the horizontal axis, the techniques in the order from 1 to 10 and, on the vertical axis, 20 parameters in an order that follows the gradual build-up of the relation of participation as described above. How did I arrive at these parameters, and how did I arrive at their ordering? The procedure is not entirely governed by predetermined criteria. Rather, there is an initial understanding of such a Gestalt as the mental representation of the relation of participation. Parameters are chosen so as to guarantee maximum recurrence in other dimensions. Their ordering follows from rational considerations in the following vein.

When perusing the sequence of parameters one can see that the first three are chosen with reference to a global characterization of Participation; that 4 to 9 give rise to a progressive differentiation on the participatum side; that 10 to 12 mark a shift from the participatum to the participants; that 13 to 15 produce further differentiation on the participants' side; and that 16 to 20 have reference to the relation itself by introducing a RELATOR indicating whether the relation is one of cause, finality, consequence, etc.

For lack of time and space it is not possible here to comment on every single parameter. Most of the names may be more or less self-explanatory.
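The grouping of the 20 parameters just described can be written down as a small lookup table. The following is only an illustrative sketch: the names `PARAMETER_GROUPS` and `group_of` are hypothetical, and the group labels paraphrase the text rather than reproduce UNITYP terminology.

```python
# Illustrative sketch: PARAMETER_GROUPS and group_of are hypothetical names.
# Each entry pairs a block of parameter numbers with the stage of the
# build-up of Participation that the text associates with that block.
PARAMETER_GROUPS = [
    (range(1, 4), "global characterization of Participation"),
    (range(4, 10), "differentiation on the participatum side"),
    (range(10, 13), "shift from participatum to participants"),
    (range(13, 16), "differentiation on the participants' side"),
    (range(16, 21), "the relation itself (RELATOR)"),
]


def group_of(parameter_number: int) -> str:
    """Return the stage to which a parameter (1 to 20) belongs."""
    for numbers, stage in PARAMETER_GROUPS:
        if parameter_number in numbers:
            return stage
    raise ValueError(f"no such parameter: {parameter_number}")


if __name__ == "__main__":
    # Parameter 11 is "profile outgoing situation / profile ingoing situation".
    print(group_of(11))
```

The table only records the ordering of the groups; it says nothing about which parameters are active in which technique.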
Nr. 11, "profile outgoing vs. ingoing situation", refers to a directed and bounded process (participatum) extending from its origin to its final stage and comprising an initiator that sets the process going and an undergoer affected by the process. In the technique "Orientation", which introduces this parameter, the operation either profiles the outgoing situation and highlights the initiator, or profiles the ingoing situation and highlights the undergoer.

The total operation within this dimension proceeds in the following steps:
1) One to three new parameters are introduced in each technique as compared with the technique immediately preceding. The newly introduced parameters are those which distinguish it from the preceding technique.
2) The parameters inherited from a preceding technique are "active" for the following technique inasmuch as they produce new oppositions, i.e. oppositions not encountered for that particular parameter in preceding techniques. On the chart this is symbolized by a vertical stroke. Otherwise they are "inactive", but they still form part of the total pool of parameters constituting the dimension. This is marked by a circle. To give an example: parameter 4 (relational/absolute) applies to the opposition between noun and verb in technique nr. 2: nouns are typically absolute, non-relational; verbs are typically relational, non-absolute. In technique nr. 3 the same parameter refers to the opposition between different verb classes, thus a new opposition. It is therefore "active" in this technique. In technique nr. 4 it refers again to the opposition between verb classes, and is therefore "inactive".
3) The active plus the newly introduced parameters define the technique, which thus appears as a bundle of parameters.
4) The dimension is defined by the thus ordered sequence of techniques.

The schema visualizes the Gestalt of a continuum. The idea of decomposing such phenomena as transitivity or roles into a number of parameters is not new.
Compare the work of Hopper and Thompson (1980) on transitivity, of Givón (1981) on passive, or Ch. Lehmann (1991a) on verb classes.

Note that the bundles of parameters define the technique, but not necessarily the linguistic means representing that technique. Thus, verb serialization is a means of representing technique nr. 8 ("introducing new participants"). But verb serialization serves other functions as well, such as TAM (tense, aspect, mood) and directionality (Bisang 1991: 510). The four parameters define the technique as represented by serial verbs only insofar as they represent participation, that is, in their manifestation as coverbs.

The chart says that techniques at the left and right margins need fewer constitutive parameters than the more central techniques. Transition shows the greatest number of active definitory parameters with regard
to the total pool. This amounts to saying that Transition is the prototypical technique of the entire dimension.

4. Linguistic coding and typology
The question here is as follows: how are the various positions in the cognitive-conceptual space as outlined above coded in one individual language and across languages? At first sight it seems that at least in some languages the linguistic coding reflects the cognitive-conceptual orderings in techniques and parameters fairly closely. As we follow the order from left to right we normally find a steady increase in morpho-syntactic "machinery" put to use in the representation of Participation. In our UNITYP work we have described this by two universal operational principles, called indicativity vs. predicativity.

Indicativity means that the cognitive concept is represented as inherently given, as being taken for granted. A simple renvoi suffices. Compare holophrastic utterances such as fire!, night!, or nominal clauses such as nomen omen. If the morpho-syntactic means are sparse, this is compensated by a high degree of pragmatic relevance: context-sensitivity, discourse pragmatics, etc.

Predicativity means that the cognitive concept is represented by progressive explication and differentiation, first on the participatum, then on the participants, and finally on the signs for the relation itself. More and more "machinery" is being introduced, and the number of oppositions increases. Complex sentences at the right-hand pole of the continuum represent the end-point.

A third semiotic principle, iconicity, appears to have its preferential peak somewhere in the middle, where the two other principles are about equal in force. In fact, transitivity and intransitivity, representing Transition, exhibit a fairly balanced distribution of markings on both the participatum and the participants, thereby iconically depicting the shift from the participatum to the participants. This technique also marks a turning point between constructions of government (e.g. valence) and constructions of modification (e.g. adverbials).
Looking at one particular technique, say, "Cause and effect", we find that it aptly represents an invariant with corresponding linguistic variants, again continuously ordered according to indicativity and predicativity (German):

(1) i. Die Grossisten machen, dass die Ölpreise sinken
"The wholesalers bring it about that the oil prices go down"
(complex sentence, maximally predicative)
ii. Die Grossisten lassen die Ölpreise sinken
"... let ... go down"
(lassen as auxiliary verb, less predicative)
iii. Die Grossisten senken die Ölpreise
(one verb, umlaut, morphologically marked as causative)
iv. Die Grossisten dumpen die Ölpreise
(foreign word, unanalyzable, so-called "lexical causative", minimally predicative)

Analogous continua between indicativity and predicativity can be observed for the other techniques as well, both intra- and cross-linguistically. If this is so, we predict that there will be overlaps in the linguistic coding as compared with our cognitive-conceptual space, where such overlaps do not occur. We have seen that the cognitive invariant "cause and effect" includes complex sentences as one of its coded variants. But complex sentences also appear under the technique "complex propositions", and prototypically so. There is thus overlap. In a way, one might say that the linguistic codings produce a multi-dimensional space as compared with the two-dimensionality of our schema.

At the other extreme of our causative continuum we found the so-called "lexical causatives". How can we draw a line between these alleged causatives and transitive action verbs? In a sequence of codings as arranged in (1), i.e. a continuum, it seems plausible that dumpen marks the end point. The continuum from i. to iv. exhibits a steady increase in grammaticalization. It operates on the parameter RELATOR that appears as bring it about that or to cause to and eventually ends up as an immaterialized, zero operator (Ch. Lehmann 1991b: 25 f.). Parameters can be over-extended or stretched. This is certainly the case when linguists are misled into equating to kill with cause to die, or when they even postulate causativity for such verbs as to sit under the odd paraphrase "to cause one's body to 'rest upon the haunches'" (Wierzbicka 1975: 520).

Further overlaps are due to the fact that different techniques are not coded in disjunction, but quite often in combination with one another. On the other hand, there are also gaps.
When structuralism was dominant, it was considered unscientific to state that a language like Tagalog does not exhibit a clearly recognizable noun-verb distinction (Himmelmann 1991). Nowadays such assertions can be quite to the point because, among other things, they may lead to inquiring into how these languages make up for the lack, e.g. by extension of one parameter in the technique "Orientation"; in this case it would be the parameter "basic vs. derived". This amounts to saying that on the cognitive-conceptual level participatum and participants are distinct, but on the level of linguistic coding it is not necessary that every language represent the distinction by a distinction between the word classes noun and verb. A persistent endeavour to keep the levels distinct may help to settle the endless discussions about this problem.
It may also be helpful in statements of grammaticalization and of typology. To cite but one example: it has become customary to speak of "indirect case marking" when the morphological marking of case roles is on the verb instead of on the noun (Mallinson & Blake 1981: 42 f.). On the cognitive-conceptual level we have the invariant "role assignment" (technique nr. 7). Morpho-syntactically there is an overlap between markings on the noun (= case marking) and markings on the verb (= cross-reference or agreement). Some languages exhibit one procedure to the exclusion of the other. Some other languages show a split along the parameter "centralized-decentralized": decentralized participants are marked on the noun, centralized participants are marked on the verb. Typically, the maximally grammaticalized subject nominative is marked by verb agreement even in languages that do not show any other verb agreement (English). And typically, a subject nominative has no case suffix on the noun even in languages that otherwise do show case suffixes (German). This illustrates the pathway that grammaticalization has taken and is likely to take in analogous situations.

5. Concluding remarks
1) We started from the assumption that the human mind has the ability to construct continua for the purpose of ordering the phenomena of this world. We set out to show the usefulness of such a construct. It is useful in leading us to distinguish between a level of invariance, where the continuum appears in its idealized form, and a level of variation, where continuity still obtains but overlaps and gaps are not excluded. On this level it can be shown how semanto-syntactic phenomena that may otherwise seem to be disconnected are related to one another: how verb classes are related to voice, voice to transitivity, transitivity to case marking, etc., but also distant relationships as between noun-verb distinction and voice, or causative and passive.
2) Taking up a question raised by G. Leech: "How do we demonstrate that gradients exist?" The continuum appears to be the locus of linguistic energeia: gradual exfoliation vs. condensation; substitutability between adjacent positions; grammaticalization paths and paths of language change; all amenable to empirical verification.
3) A note on the question of how we get to our cognitive-conceptual level: is not linguistic coding the only way of access? Are we not moving in a circle? Yes, we are, but in a non-vicious one. Initial observation of language data unveils the leading dynamic principles: indicativity, iconicity, predicativity, and the tendency toward gradient transitions. Rational considerations construct a logically consistent framework. Data from concrete languages can then be measured against such a framework. This, and only this, is the sense in which we want cognition-conceptuality to be understood. We do not mean to rely on the results of adjacent sciences (psychology, epistemology, etc.) for confirmation.
4) Both conceptual and linguistic continua admit of indefinitely recoverable intermediate values. On the linguistic side this can be shown by pointing out that techniques, steps on the overall continuum, admit of subcontinua which, in turn, are bundles of parameters, which are again continua. The antinomy between a continuum model and a model of "multiple discrete factors intersecting to yield a finely articulated range of possibilities", as evoked by R. Langacker, can be resolved by introducing the notion of parameters into the dimensional model.
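The bookkeeping behind this dimensional model, as set out in the operational steps of section 3, can be illustrated with a short sketch. It is only one possible reading of those steps: the function name `parameter_status` and the string labels are hypothetical, and treating technique nr. 2 as the point where parameter 4 is introduced is an assumption made for the example.

```python
from typing import List


def parameter_status(oppositions: List[str]) -> List[str]:
    """Label one parameter across an ordered sequence of techniques.

    oppositions[i] names the opposition to which the parameter applies in
    the i-th technique of the sequence. The first technique in the list is
    taken to be the one that introduces the parameter (an assumption).
    """
    statuses = []
    seen = set()  # oppositions already encountered for this parameter
    for opposition in oppositions:
        if not seen:
            statuses.append("new")       # newly introduced parameter
        elif opposition not in seen:
            statuses.append("active")    # inherited, yields a new opposition
        else:
            statuses.append("inactive")  # inherited, no new opposition
        seen.add(opposition)
    return statuses


if __name__ == "__main__":
    # Parameter 4 (relational/absolute) in techniques nr. 2, 3 and 4,
    # following the example given in section 3.
    parameter_4 = [
        "noun vs. verb",
        "between verb classes",
        "between verb classes",
    ]
    print(parameter_status(parameter_4))
```

On this reading a technique would be the bundle of parameters labelled "new" or "active" for it, while the "inactive" ones remain in the pool that constitutes the dimension.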
REFERENCES
Bisang, Walter. 1991. Verb serialization, grammaticalization and attractor positions in Chinese, Hmong, Vietnamese, Thai and Khmer. In Seiler & Premper (eds.), pp. 509-562.
Givón, Talmy. 1981. Typology and functional domains. Studies in Language 5.2, pp. 163-193.
Himmelmann, Nikolaus. 1991. The Philippine challenge to universal grammar. Arbeitspapier Nr. 15 (Neue Folge), Universität zu Köln: Institut für Sprachwissenschaft.
Hopper, Paul & Sandra A. Thompson. 1980. Transitivity in grammar and discourse. Language 56, pp. 251-299.
Lehmann, Christian. 1991a. Predicate classes and Participation. In Seiler & Premper (eds.), pp. 183-239.
Lehmann, Christian. 1991b. Relationality and the grammatical operation. In Seiler & Premper (eds.), pp. 13-28.
Mallinson, Graham & Barry Blake. 1981. Language Typology. Cross-linguistic Studies in Syntax, Amsterdam et al.: North Holland Publ. Comp.
Seiler, Hansjakob. 1983. Possession as an Operational Dimension of Language. Language Universals Series, vol. 2, Tübingen: Gunter Narr Verlag.
Seiler, Hansjakob. 1985. Linguistic continua, their properties, and their interpretation. In Seiler & Brettschneider (eds.), pp. 14-24.
Seiler & Brettschneider (eds.). 1985. Language Invariants and Mental Operations. International Interdisciplinary Conference held at Gummersbach/Cologne, Germany, September 18-23, 1983. Language Universals Series, vol. 5, Tübingen: Gunter Narr Verlag.
Seiler, Hansjakob & Waldfried Premper (eds.). 1991. Partizipation. Das sprachliche Erfassen von Sachverhalten. Language Universals Series, vol. 6, Tübingen: Gunter Narr Verlag.
Wierzbicka, Anna. 1975. Why 'kill' does not mean 'cause to die': The semantics of action sentences. Foundations of Language 13:4, pp. 491-528.
IS THERE CONTINUITY IN SYNTAX?
PIERRE LE GOFFIC
University of Paris III, France and URA 1234, CNRS, Caen
0. Introduction
My aim in this paper is not to make proposals about continuity. Neither is it to give a mathematical or philosophical definition of it. Continuity, to my mind, can serve as "l'un de ces vocables, fort nombreux, auxquels il n'est demandé que de recouvrir d'un terme expressif une signification fuyante", as Guillaume said of metaphor (1968, p. 171): one of those intuitive and expressive terms which cover somewhat diffuse meanings; it is possibly in this respect that it may be of some use to us. I wish therefore to put forward a criticism of discreteness in those areas in which, wrongly or rightly, it is taken for granted. Thus, the question which I shall be examining is, in some ways, that single hour of metaphysics and critique in which one may indulge once a year, as Descartes is believed to have said. Is it really an established fact that there is, in language, a totally and indisputably discrete framework or architecture?

1. Discreteness: a point of transition between two continuities
To begin, my representation of the facts is extremely simple, possibly at the risk of over-simplification. I imagine language activity (in one respect, at least) as represented in the simple diagram below:
In this sort of double funnel, the left part shows the phases that precede any interlocutory language act, i.e., the person who is about to speak forms, somehow or other, a linguistic project, and puts it into shape. In the right-hand part, the receiver, on the basis of indications provided by the first speaker, tries in turn to build up for himself a representation which he can interpret and make pragmatic use of. Well, I would say, loosely speaking, that there is continuity at the start and at the finish. Where discreteness comes in, necessarily to my mind, is at the point of transition between the two. This point must necessarily be discrete since it is the only objectifiable part of the linguistic process; it is the visible part of the iceberg. Here, in order to proceed, to become objectified, the message must be translated into formal, usable symbols. This, I think, is a question of common sense, which does not mean that authorities could not be cited for it. There is a filtering process (somehow or other) between all the diffuse elements of experience and intended meaning, and the coding into discrete units. Then the movement is continued and shifts to the interpretation side. Here one could talk at length about the infinite or at least inexhaustible character of meaning in this part of the language act.

Thus the point of transition necessarily occurs. I feel that discreteness, if anywhere, is here, i.e., at this necessary, compulsory transition point, which is the condition of communication between two ill-defined worlds which I have no means of describing but to which one can apply, roughly speaking, the term continuity. One could simply add that continuity, in all cases, is multiform, if only with respect to the time and space dimensions, and all the metaphorical uses made of these.
Language has systems of automatic adaptation to continuity, and this seems to me to be one of the features which make the formal apparatus of enunciation such a remarkable device: it is perfectly adapted to continuity. The objects in the world undergo changes in space and time, and a number of linguistic markers (among the most basic in language) have, among other functions, that of maintaining a kind of permanence through the changes these objects undergo, thus giving us the comfortable illusion of fixity and stability that suits our pragmatic needs, in a world which is nevertheless in a demonstrable state of permanent change. There is thus a sort of contradiction, which could be (and has been) developed from a philosophical point of view. There is a kind of automatic adaptation of language, in some of its essential aspects, to this shifting character of reality. I shall not go into details about this, the important point for the moment being that there is this necessarily, inescapably, discrete point of transition.
2. The degree of discreteness in linguistic coding
But let us now consider this point to see if it really is all that discrete. Of course I do not intend to completely refute the observations made above and I shall attempt to remain within reason. Discreteness, here, has two facets: firstly the use of discrete units (phonemes and morphemes), and secondly, the categorization of these units in terms of discrete grammatical categories.

As to the first point, the units must be discrete in order to be interpreted. And in this respect I shall say from the outset that there is no discussion possible: should there be any ambiguity, it will be a genuine alternative ambiguity (cf. Le Goffic 1981 ch. 7; Fuchs 1991 pp. 112-113). There is no continuity from one phoneme to the next, from a linguistic point of view. There can be continuity in the physical production of sound, but there is of course no continuity in the phonematic interpretation of these productions.

As to the second point, everybody agrees that a term can be characterized, for example, as a noun or a verb, and further as having such and such a function in a sentence. So far, one reasons in discrete terms, with necessary choices: a term is a noun, so it is not a verb; it is subject and therefore not something else. Although language can provide intermediate stages (like the infinitive or certain forms of nominalisation, sharing nominal and verbal features), one ultimately reaches a stage of discrete categorization. I would now like to examine these two points more carefully.

3. Are units totally discrete?
I have just expressed the intention of not going against the necessarily discrete character of phonemes and morphemes. However, there are a few small problems concerning morphemes. Roughly speaking, if the cutting-up into morphemes were possible without any restrictions, we would have a stock of perfectly determinable words. So it would be possible, with a few arrangements, to draw up a typical, fixed list of the lexicon of a language. Now, when one looks at history, including the history one fashions as a speaker, it is clear that a number of problems arise. If things are to be absolutely discrete, this means that each unit is to be confined to its own place, like a cell or an atom, adjacent to the next one, and so on. The units will not touch each other and will remain by definition separate. The contradiction arises when one unit gives rise to two, in the process of dissociation, or conversely when two units become a single one, in the process of fusion. Now this does occur, and the history of language bears witness to this, if not abundantly, at least clearly.
The relevant examples are habitually cast aside (although many of them are well established and well known), belonging to that minority of zero point something per cent which do not conform to the statistically normal situation used as a reference. The problem is that there is always a small fringe on the edge of the system which prevents one from encompassing it. This is a delicate problem, to which I am not sure what answer one could suggest from the outset, as it concerns the status which should be given to that zero point something per cent. In ordinary practice the rejection of these examples is a reasonably workable procedure. But this may be precisely the reason why it is ultimately impossible to encompass the system. So let us not cast them aside on principle. The history of various sciences may bring some support to the method of focusing on minority phenomena.
I shall now proceed to a brief comment on a few examples, starting with cases of differentiation of units (leading to homonyms) :
- the French verb voler : "to fly" and "to steal". We know that the flight of the bird is the primary meaning and that the language of venery (le faucon vole la perdrix "the falcon 'flies' the partridge", i.e. "the falcon, in its flight, gets hold of the partridge by force") gave rise to something which is accepted in synchrony as another separate unit ;
- the French noun grève : "strand, beach" and "strike". The Place de Grève in Paris (strand along the Seine) was a gathering place in the XIXth century for workers wishing to stop work ;
- in other cases differentiation led to graphic differences, as between dessin ("drawing") and dessein ("aim"), or between penser ("to think") and panser ("to dress (a wound)", related to pansement : "a bandage").
I still vividly remember my surprise (not to say my incredulity) when I discovered that penser and panser were cognate, whereas panser, as in panser un cheval ("to groom, to rub down a horse"), had nothing to do with the panse ("belly") of the animal. Examples of this type abound, and detailed lists can be found in reference books (see e.g. Buyssens 1965, with numerous examples from English).
Of the process of fusion, on the other hand, there are fewer examples : language systems, supported by some Académie Française or other such social norm, offer more resistance to this type of process. Cases of fusion normally appear as reprehensible in the eyes of the norm, although they are fairly widespread :
- a jour ouvrable ("working day") is a day when one oeuvre (old verb : "works"), but as it follows that it is also a day when the shops ouvrent ("are open"), it tends to be understood and used as meaning "openable day" ;
- the adjective chaotique is found not only with its "genuine" spelling, which goes back to chaos, but also with the spelling cahotique, from the noun cahot ("a jolt", "a bump"), as in un parcours cahotique ("a bumpy ride"), which is a freak in the eyes of the norm ;
- the verb dériver, as used in modern French, has a twofold origin, as Muller (1962) has explained : it is the continuation of the verb dériver ("to turn a river away from its course"), but draws at the same time on an old verb driver, of English origin ("to push" or "to be pushed" ; cf. Eng. to drive). The resulting meaning, a semantic cross between the two, is that of a slanting movement.
Furthermore, it would of course be easy to elaborate upon the subject of fusion between elements, taking into account the effect of associations governed by the subconscious, puns, etc. which channel semantic processes. Strikingly, there is no limit to the plasticity of semantic formations, and there is no insurmountable boundary of contradiction : the fact that ils peignent can be said both of painters (verb peindre : "they paint") and of hair-dressers (verb peigner : "they comb") may well suffice to build up a unique semantic entity : a gesture common to those who use a comb or a paint-brush. In this matter, actual language data show more imagination at work than one can fancy (see Buyssens 1965 ; Le Goffic 1982).
The essential point of all this is that it is not possible to "lock up" the stock of lexical items. In fine, a lexicon is a set of items in unstable equilibrium, in which the units exert upon each other conflicting or converging pressures (Le Goffic 1988) placing the system in jeopardy. This is how the system holds. One can, of course, take a stabilized viewpoint and ignore all this, which obviously gives a certain authority over the lexicon, but at the cost of neglecting phenomena which do exist, and whose neglect may prove harmful.
4. Problems of grammatical categorization
A noun is a noun, a verb is a verb. Here again, this works in 99 % of the cases. This is important, sufficient perhaps, but one wonders what is to be done with the 1 % which does not work. Cases of terms difficult to classify are not exceptional, and one could draw up a catalogue, which would probably be most instructive. What is to be done with debout (originally a prepositional phrase de+bout, but generally considered as an adverb, with uses very close to those of an adjective : être assis ou debout, "to be sitting or standing") or with plein "full", commonly regarded as a preposition in plein les poches (lit. "the pockets [being] full" ; note the word order and the invariability of plein), etc.? This list could easily be lengthened, but I wish to concentrate on a few examples, beginning with that of the word pas.
The story of how the word pas, which originally meant "a step", came to designate the negative, is fascinating. Je ne marche used to mean "I do not walk", and ... pas, "...(not even the minimum distance of) a step". Then pas became an obligatory part of negation, along with ne. There was ensuing competition between the two, and finally in modern French, as we know, pas has more or less ousted and replaced ne. It is the pas which often represents negation in everyday spoken French.
This is, to be sure, an extraordinary trajectory, resulting in a change of category, since everybody agrees that pas is an adverb in this second use. Remarkably, this occurred in an area where one would expect logic to be most readily found. The logical operator of negation is somehow linguistically cut up in French, and shredded into two pieces. This historical and synchronic treatment of negation in French is a kind of slap in the face to this logic which we all need.
Now is it possible to process a system in which changes of such magnitude can take place without resorting to continuity ? Once again, I use a term which would require more clarification, which I am not in a position to provide. It is clear, nevertheless, that this type of question cannot be avoided. Can a system in which such phenomena occur be totally locked into a discrete perspective ? There must be, somewhere, an opening for transition, a possibility of continuity.
To escape this, one might say that this is History, even ancient History, since the facts involved here extended over seven or eight centuries. But is there anything which allows one to say that changes of the same type are not taking place currently, though invisibly from a synchronic point of view ? This assumption would imply that language itself (French) has changed on one essential point since the period when it allowed this type of evolution. What evidence do we have to claim that language has changed, qualitatively, since the Middle Ages, so that it no longer provides openings for this sort of evolution ? Thus, if one wishes to reject the example of pas, one must prove that language itself has undergone a change, a demand which may not be easy to satisfy. The word pas thus still raises problems today.
Should one say about this example that one swallow does not make a summer, I would reply that, in this domain, it does : a single example suffices in fact to raise a problem concerning the quality of language. Were there but this single case, in the whole of French or indeed in all the languages of the world, I think that the problem would be raised all the same. But there are many more.
Quantity and measure, for example, are not easy to categorize in surface terms. They are part noun, part adverb. I shall simply say that beaucoup and longtemps are obviously noun terms, but their functions show a mixture of nominal and adverbial features. Thus, in the field of quantification, I might, if I wished to use a somewhat provocative type of speech, say that discrete filtering, here, is not particularly successful.
The number of difficulties regarding categorization, as can be seen, cannot be restricted to a narrow list, and I tend to think that this number is directly related to the number of difficulties in syntax. In other words, I wonder whether every difficulty in syntax does not have something to do with problems of continuity.
5. The example of Fr. que
Last, I would like to say a few words about Fr. que, the well known "complementizer", as the generativists call it : the wide range of its uses and the difficulty of breaking them down into separate units make it a particularly appropriate example for a discussion about continuity.
If one attempts to outline the general functioning of the term, a certain number of uses can be retained as basic (cf. Le Goffic 1992) :
a) the interrogative pronoun as in Que fais-tu ? ("What are you doing ?"), Que faire ? ("What is to be done ?") or, used indirectly, in Je ne sais que faire ("I don't know what to do" ; indirect interrogative examples are more difficult to find).
b) "relative without an antecedent", as in Advienne que pourra ("Come what may", "Come whatever may [happen]"), parallel to the use of qui in Rira bien qui rira le dernier ("He who laughs last laughs longest"). In fact this type of use, confined to proverbs or limited phraseology in the case of qui, is nearly totally obsolete as regards que : the example quoted above is practically unique ; it is nevertheless structurally important. As has often been noticed, this label "relative without an antecedent" is inappropriate : the demonstration of this can be given easily, though indirectly, by considering the use of qui : in Rira bien qui rira le dernier, and more obviously in Embrassez qui voulez ("Kiss whomever you want to"), qui does not belong to the paradigm of the relative (it could not be object), but to that of the interrogative. For that reason, I call the qu- words occurring in these structures integrative pronouns (following Damourette and Pichon).
c) the relative pronoun : le livre que j'ai lu ("the book which I read").
d) the "completive", as in je dis que ça va ("I say that it is all right").
A second series of uses must be added, not with the pronoun que but with a homonym, the (indefinite) adverb of quantity que, which goes back to the Latin quam (and not quid or quod, which are the etyma of the pronoun que) :
a) que exclamative adverb of degree as in Que c'est bon ! ("How good it is !"), Que d'efforts il a dû faire ! ("How many efforts did he have to make !"), which can also be found in indirect uses under certain conditions, as in Vous savez que d'efforts il a dû faire ("You know how many efforts he had to make").
b) que integrative adverb, as in Il ment que c'est une honte ("He lies to which degree it is a shame"). This example (colloquial) may seem marginal, but it is at the base of a whole series of extremely important and indisputable uses, viz. comparatives and consecutives : Paul est plus grand que Jean, "Paul is taller than John", i.e. : "Paul is superiorly tall, [in comparison with] to which degree John is tall".
We thus obtain six types of examples which I readily consider basic :

Que pronoun :
(1) Interrogative : que faire ? ; je ne sais que faire
(2) Integrative : advienne que pourra
(3) Relative : le livre que j'ai lu
(4) Completive : je dis que ça va

Que adverb :
(1) Exclamative : que c'est bon ! ; vous savez que de mal il a eu !
(2) Integrative : il ment que c'est une honte ! ; Paul est plus grand que Jean

Here, it seems to me, are most propitious grounds for a debate about discreteness and continuity.
What unity is there to all this ? It is certainly to be found in the basic feature of indefiniteness, in other words in the scanning ("parcours", "Verlauf") of all possible values. The whole series of the qu- terms (qui, que, quoi, quel, quand, as well as, though less apparently, où, comme, comment, combien, without forgetting que as an adverb) have in common this indefiniteness, this scanning process, applied to various domains (animate vs inanimate beings, quality, time, space, manner, quantity, degree).
All the uses of que (pronoun or adverb) can be derived from that basic fundamental value. Let us start with the pronoun.
In the interrogative, the quest is open and presented as requiring an answer. When I utter Que fais-tu ?, I am in search of "whatever you are doing", i.e. of the right instance X to validate "you are doing X", and it is urgent that you, the interlocutor, should "clock in".
In the integrative use advienne que pourra, on the other hand, there is no urgency since any value settled for will do, as any value that validates "pouvoir [advenir]" ipso facto validates advienne ("let it come, let it happen"). In this type of use (as in rira bien qui rira le dernier), the indefinite value of que (or qui) is particularly perceptible.
What now is the relation between a relative and an indefinite pronoun ? How can a relative stem from an indefinite ? This is an important and delicate position. I will adopt the following line of explanation : the qu- relative is still (logically, at the start) an indefinite pronoun, whereas the antecedent is the term which fulfills (saturates) it, and somehow provides the appropriate value, the answer to the question raised by the indefinite. So it seems that the relative is the result of a kind of capturing process exerted by the antecedent upon an integrative, the result being the development of an apparently anaphoric relation. Latin examples would bear witness to a state of language in which relative pronouns and their so-called "antecedents" are clearly autonomous, the latter being often not before (ante) but after the former, thus clearly appearing as the saturating term, the response (and not as the antecedent of an anaphoric relation). One can reasonably assume that this represents an early stage of the formation of our relative system, but of course many aspects of the process (e.g. the above-mentioned change of the paradigm) are yet to be elucidated.
As now regards the completive, it appears at first to be completely separate. There is no indefinite value perceptible, no anaphora, the que has no function, it is said to be a different part of speech, etc. But here again, if one looks carefully (and history bears this out) the completive appears to be related to the other uses of que : it is a kind of degenerated relative or integrative. To take an example of this "degeneration" (one of those troublesome atypical examples which are generally discarded by grammarians, who hardly ever venture to categorize it) : C'est une chose étrange que cet aveuglement (lit. "It is a strange thing that this blindness", i.e. "this blindness is a strange thing"), or Qu'est-ce que la métempsychose ? (lit. "What is it that metempsychosis ?", i.e. "What is metempsychosis ?"). What, here, is the que ? One is led to assume an ellipsis of the verb "to be" (Qu'est-ce que la métempsychose [est] ?), a hypothesis supported by the fact that the verb être comes out clearly if the sentence is modalized : Qu'est-ce que la métempsychose pourrait être d'autre ? ("What else could metempsychosis be ?").
Ultimately the underlying structure is something like ce que cet aveuglement est, est une chose étrange ("what this blindness is, is a strange thing"), ce que la métempsychose est, est quoi ?, in which modern French requires ce que in place of the underlying integrative (obsolete) *que N est (as in advienne que pourra, quoted above ; parallel to Eng. what N is). One step further, one finds je dis que S, with the typical completive que, nevertheless analyzable as a degraded integrative (je dis (ce) que S [est], lit. "I say what S is").
Besides, a relative que with ellipsis of être will yield a completive in Il a cette particularité qu'il est gaucher ("he has that particularity that he is left-handed") = il a cette particularité que IL-EST-GAUCHER [est], the clause following que being the subject of the elliptic verb "to be", que being the relative, whose antecedent is cette particularité, and whose function is "attribute of S" : "S (= il est gaucher) est cette particularité". In fact the number of que items labelled completives which do have antecedents is very high : I shall only mention the numerous cases in which the completive has to take the form ce que (e.g. Je tiens à ce que vous veniez "I insist that you should come").
Turning now to the adverbial que, exclamative as in Que c'est bon !, it is a homonym of the pronoun que, although related to it. But here again we come across the fact that quantity, as I mentioned above, is something which involves both noun and adverb categorization. In fact que has nominal features in Que de gens se trompent ! ("How many people make mistakes !"). Besides, it is often (optionally or necessarily) replaced by ce que (Ce que c'est bon !, with a relative que), in which case it may be difficult to categorize them as pronominal or adverbial : Vous ne pouvez pas savoir ce qu'il a pu faire comme bêtises ("You cannot tell what mischief he has been up to"), Vous ne pouvez pas savoir ce qu'il est ennuyeux ("You cannot tell how boring he is").
As regards the integrative use of the adverb que, as in Il ment que c'est une honte !, or, in a more important field, in comparative (correlative) clauses like Paul est plus grand que Jean, it still retains its value of "adverb of indefinite degree" : Paul is taller than John, whatever John's tallness may be.
To summarize briefly : the major distinctions usually recognized about que appear as mere salient stages. My position is that it may well be possible to find a gradation, possibly continuous, between them (as well as ambivalence phenomena : in Qui as-tu dit que tu voulais voir ?, "Who did you say you wanted to see ?", the highly controversial nature of que seems to me to be at the intersection between relative and completive functions).
I wish to conclude, once again, not with a proposition about what continuity is and how it should be processed, but rather, a tone below, with a criticism of discreteness. Ultima verba : whenever a problem arises involving possible continuity, it is always possible to try and defend discreteness by taking smaller units, by refining down to smaller terms. Thus one succeeds in breaking up what seemed to be a continuous, dynamic entity, into a more manageable series of small discrete units. This reminds me of the paradox of Achilles and the tortoise. Perhaps there is a kind of false problem in this opposition between the terms of continuity and discreteness. What I wish to emphasise is that the discrete categories which we work with are too approximate, if not basically inadequate, and I do not think linguistics has anything to gain by ignoring the existence of the zero point something per cent.
REFERENCES
Buyssens, Emile. 1965. Linguistique historique, Paris : P.U.F, et Bruxelles : P.U.
Culioli, Antoine. 1990. Pour une linguistique de l'énonciation, Paris : Ophrys.
Damourette, Jacques and Edouard Pichon. 1911-1940. Essai de Grammaire de la Langue Française (7 vol.), Paris : d'Artrey.
Fuchs, Catherine. 1991. L'hétérogénéité interprétative. In H. Parret (ed.), Le sens et ses hétérogénéités, Paris : éditions du CNRS (coll. Sciences du langage), pp. 107-120.
Guillaume, Gustave. 1969. Langage et Science du Langage, Paris : Nizet, Québec : Presses de l'Université Laval.
Le Goffic, Pierre. 1981. Ambiguïté linguistique et activité de langage, Thèse de Doctorat d'Etat, Université de Paris-VII.
Le Goffic, Pierre. 1982. Ambiguïté et ambivalence en linguistique, DRLAV 27, pp. 83-105.
Le Goffic, Pierre. 1988. Tensions antagonistes sur les systèmes : les rapports entre diachronie et synchronie. In A. Joly (ed.) La linguistique génétique : Histoire et Théories, Lille : Presses Universitaires de Lille, pp. 333-342.
Le Goffic, Pierre. 1992. Que en français : essai de vue d'ensemble. In Subordination (Travaux Linguistiques du Cerlico, 5), Rennes : Presses Universitaires de Rennes 2, pp. 43-71.
Muller, Charles. 1962. Polysémie et homonymie dans l'élaboration du lexique contemporain, Etudes de Linguistique Appliquée 1, pp. 49-54.
THE USE OF COMPUTER CORPORA IN THE TEXTUAL DEMONSTRABILITY OF GRADIENCE IN LINGUISTIC CATEGORIES
GEOFFREY LEECH, BRIAN FRANCIS AND XUNFENG XU
Lancaster University, UK
1. Introduction
This paper explores the empirical analysis of non-discrete categories in semantics, and in linguistics generally. The classical, Aristotelian tradition requires linguistic categories, like other categories, to be identified by a set of properties giving sufficient and necessary criteria for membership. Contrasting categories, so defined, have disjoint membership, and clear-cut boundaries. This view of linguistic categories is convenient for logical processing, but unrealistic.
Various theories or tools of analysis have been proposed to provide an alternative view of linguistic categories in terms of non-discrete modelling. Examples are : Rosch's prototype theory of psychological categories (Rosch 1978) ; Zadeh's (1965) fuzzy set theory ; Bolinger's (1961) concept of gradience ; Quirk's (1965) notion of serial relationship. More recently there is the work of Lakoff (1987), Langacker (1990) and others within cognitive linguistics. These approaches to non-discrete categorization seem to be dealing with the same phenomena in somewhat varying terms. For the present purpose, the focus will be on gradience, the phenomenon of a scale (or gradient) relating two contrasting categories in terms of relative similarity and difference. Gradience means that members of two related categories differ in degree, along a scale running from "the typical x" to "the typical y".
From an informal point of view, the more closely one examines linguistic categories, the more one tends to discover gradience, and the less one tends to believe in the "classical" Aristotelian view. However, one problem is : how do we demonstrate that such gradients exist ? Semantic gradience, assuming it exists, is a mental phenomenon, unamenable to observation. Therefore, to substantiate the claim that a gradient exists between two categories, we have to provide indirect evidence of its existence.
For example, we may carry out elicitation procedures : experiments, tests, or surveys which elicit responses from native speakers of the language, given an appropriate stimulus, such as the visual objects used in Labov's (1973) investigation of the non-discrete semantics of the word cup. An alternative method, and in some ways a better one, is to examine the linguistic productions of native speakers in natural circumstances, when they are unconstrained by experimental conditions. In both cases, we study the observable reality of "parole" (or "performance") as an indirect means of reaching the underlying reality of "langue" (or "competence").
The second of these alternatives is the one pursued in this paper. It means studying the texts people produce, whether these texts consist of spoken or written material. Techniques of such text analysis have greater power than formerly, through the recent development of computer corpora¹ : large collections of machine-readable texts or text samples which can be searched and manipulated by computer. Such corpora are often designed to contain a "balanced" sampling, so as to be broadly representative of the language, or at least of some text types of the language. The present study is based on the analysis of a one-million word corpus of written British English which has been widely used for linguistic research : the Lancaster-Oslo/Bergen (or LOB) Corpus².
A second problem is : how can we give a precise description of such non-discrete phenomena as gradience ? How can one be exact about phenomena which, it seems, are by their very nature vague and indeterminate ? We will argue that precision is possible : that by studying the distribution of such phenomena in a sample of texts, one can arrive at a reasonably precise statistical model of their occurrence. This model can then, if desired, be tested on further data, or can be adjusted experimentally to provide a better fit to the corpus data. It may be objected that such a model is a model of language in use, and does not give access to the underlying cognitive realities.
This is true : the factors showing up in the corpus data may have varied explanations. Nevertheless, it is reasonable to expect that non-discrete cognitive phenomena will be reflected substantially in the way native speakers make use of their language in actual performance. The contrary position (that this is not a reasonable expectation) is one that needs special pleading, in accordance with "Occam's razor". However, those who find this type of argumentation troublesome may simply be willing to accept that the results of corpus analysis are a contribution to the theory of language performance, and let the matter rest there.

1) Two books which explain recent developments in computer text corpora, with particular reference to English, are Johansson and Stenström (1991) and Aijmer and Altenberg (1991). A view of the current state of the art in corpus analysis is given in Leech and Fligelstone (1992). Of more particular relevance to the present paper is Leech (1992).
2) Details of the LOB Corpus are provided in Johansson et al. (1978).

2. An Example of Gradience : the English Genitive and of-Constructions
To illustrate the proposed method of investigating and measuring gradience, we will consider the case of the English genitive construction (as in the president's speech), and compare it with the frequently synonymous English of-construction (as in the speech of the president). For convenience, we distinguish these two constructions by the formulae : [X's Y] and [the Y of X]. The question for which we seek an answer, using corpus analysis, is : what determines the native speaker's choice of one form rather than the other ? In practice, grammarians have identified a number of criterial factors, of which three of the most important are :
(a) The semantic category of X
For example, it is often claimed that if X has human reference, [X's Y] is normally preferred to [the Y of X] : George's car rather than the car of George. However, this is not the whole story, as [X's Y] is sometimes used with non-human reference, e.g. the earth's orbit, and [the Y of X] is sometimes used with human reference, e.g. the assassination of Abraham Lincoln.
(b) The semantic relation between X and Y
For example, the typical semantic relation between X and Y in [X's Y] is supposed to be "possession", as in George's car. On the other hand, even with a human X and a non-human Y, there are some semantic relations which favour the of-construction : the photograph of George will typically be preferred to George's photograph if the meaning is "the photograph representing George".
(c) The kind of text type (style) in which the construction occurs
It may be claimed, for example, that the genitive is more likely to occur in certain types of writing (e.g. popular journalism and broadcasting) than others.
Many other factors might be added to these three ; one example is a syntactic factor :
(d) The ratio of the length of X to the length of Y
For reasons of functional sentence perspective or end-weight, we expect [X's Y] to favour a shorter X and a longer Y, whereas [the Y of X] will tend to favour the opposite. Thus, other things being equal, this would give a higher likelihood to John's second wife than to the second wife of John, but a lower likelihood to Napoleon Bonaparte's wife than to the wife of Napoleon Bonaparte.
Some corpus-based quantitative studies of the genitive have already been carried out (Jahr Sorheim 1980, Altenberg 1982), taking into account the above factors and many others¹. We will not attempt more than the analysis of the three factors (a)-(c) above. However, the innovation of the present study is that we employ a statistical technique, logistic modelling, which enables a number of factors and sub-factors to be simultaneously built into the model, and to be included or excluded from consideration at will, so that we can derive from the classified corpus data the model (given those factors and sub-factors) that best fits the data.
One complication is that each factor itself can be thought of as representing an ordered categorical scale. For example, factor (a) above is not just a yes-no matter ("X is either human or non-human"), since the likelihood of choosing [X's Y] is apparently conditioned by the degree to which X tends towards human reference. If X refers to an animal ("a lion's strength") or to a human organization ("the government's policy"), this may be intermediate, in likelihood, between a human noun and an inanimate or abstract noun. However, the model itself can decide on the relative levels of likelihood associated with different classes of noun, showing the linear relationships between these sub-factors (which we will call levels) on an interval scale.
3. Technique of Analysis
The basis for the model is the calculation of the ODDS in favour of the genitive, i.e. :

$$\mathrm{ODDS} = \frac{\mathrm{Prob}\,[X\text{'s } Y]}{\mathrm{Prob}\,[\text{the } Y \text{ of } X]}$$

for any combination of factors. From the corpus can be derived the optimal values for the factors and their levels, as well as interaction effects between them. So, for example, it is possible to determine which of the factors, and which of the levels within the factors, are the most important in determining the choice either of the genitive or of the of-construction. Further, it is possible to place the factors and levels in an order of importance ; to discover whether any factors or levels are redundant to the analysis ; to use a significance test (in the present case chi-squared) to determine the goodness of fit of a model to the data. The statistical model will, so to speak, "map" the gradience between the two categories.
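As a rough illustration of how such a model combines factors, the following Python sketch computes the odds for one classified example in a main-effects logistic model, where the log of the odds is a sum of one contribution per factor level. Every number and every level label here (INTERCEPT, EFFECTS, "possessive", "objective" and so on) is a hypothetical placeholder chosen for readability; none is an estimate derived from the LOB Corpus, and interaction effects are left out.

```python
import math

# Hypothetical log-odds contributions; invented for illustration only,
# not values fitted to the LOB Corpus.
INTERCEPT = -1.0
EFFECTS = {
    "x_class": {"human": 2.0, "organization": 1.0, "inanimate": 0.0},
    "relation": {"possessive": 1.0, "objective": -1.5},
    "text_type": {"journalism": 0.5, "learned": -0.5, "fiction": 0.0},
}


def log_odds(example):
    """Intercept plus one effect per factor level (main effects only)."""
    return INTERCEPT + sum(EFFECTS[factor][level] for factor, level in example.items())


def odds(example):
    """Odds in favour of [X's Y] over [the Y of X] for this combination of levels."""
    return math.exp(log_odds(example))


def probability(example):
    """Probability of the genitive implied by the odds."""
    o = odds(example)
    return o / (1.0 + o)


example = {"x_class": "human", "relation": "possessive", "text_type": "journalism"}
print(log_odds(example), odds(example), probability(example))
```

Because the contributions add on the log scale, dropping a factor from the dictionary or adding a new level is enough to try a different model, which is the kind of inclusion and exclusion "at will" described above.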
1) Existing corpus-based studies of the English genitive are Jahr Sorheim (1980) and Altenberg (1982), Altenberg's study being of 17th century English.
GRADIENCE
Note, however, that the model is not entirely governed by predetermined objective criteria. In two respects, the human analyst has to intervene to make decisions. First, he/she decides (preferably after an informal analysis of some part of the corpus) which are the factors and levels which play a principal role in determining the likelihood of choice. According to Altenberg (1982 : 296), there are over 40 factors that might affect the choice of the genitive. But for practical reasons, at present, it is necessary to restrict the analysis to the three factors which we believe to be the most important. In the longer term, we will need to experiment with different factors, adding some and possibly subtracting others, to see how a better fit to the data can be obtained.

Secondly, the analyst has to decide how to classify each example in the corpus by reference to the factors and levels included in the model. In some cases this decision is virtually automatic (e.g. in deciding whether a noun has human reference or not), whereas in others it requires judgement. There is nevertheless a hope that the analysis of examples will be objective enough to be largely capable of exact replication by different analysts¹.

4. Step-by-step Methodology
In the following, we detail the techniques of analysis in the order in which they were undertaken in our experimental study.

i) Selection of appropriate textual data

We decided to use the LOB Corpus (see 1 above) of written British English as our source of textual data. It was clearly impractical to use the whole of the corpus, so we chose several sections of the corpus of equal size : viz. parts of sections A, B and C for journalistic writing, J for scientific and learned writing, and K for general fiction². In choosing these three stylistically contrasting text types, we had in mind the need to test the influence of style, or text type, on the odds in favour of the genitive.
1) Some aspects of the classification of examples are more problematic than others. We can generally expect consistency between different analysts for Factors (a), (c) and (d). For Factor (b) (semantic relation between X and Y), the criteria for judging membership of classes have to be clarified (by reference, for example, to paraphrase or entailment tests) if inconsistency is to be avoided. Even then, some residual cases may be unclear. Evidently there is fuzziness or gradience even in the classes which are necessary to the definition of gradience itself.
2) The sections of the LOB Corpus chosen for this experimental analysis correspond to those chosen from the Brown Corpus by Meyer (1987) in his research on punctuation.
GEOFFREY LEECH, BRIAN FRANCIS AND XUNFENG XU
ii) Classification of examples

For the purposes of this analysis, it was necessary to focus on those occurrences of [X's Y] which could, in principle, be replaced by [the Y of X], and those occurrences of [the Y of X] which could, in principle, be replaced by [X's Y]. The use of "in principle" here does not mean that each example had to be transformable, in its context, into an example of the opposite category. Rather, it means that we limited the analysis to transformable classes : i.e. classes of the genitive or the of-construction for which members of the opposite category also occur. For example, the possessive genitive is a "transformable class" because alongside possessive examples like my best friend's house there are also examples such as the house of my best friend. The fact that some examples of the possessive (e.g. Richard's new car), especially in their context, would be difficult or impossible to transform acceptably did not enter into the analysis.

On this basis we excluded classes (such as the use of of with quantifiers some of, many of, etc.) which were non-transformable, and arrived at the following classification of the examples according to the three factors and their constituent levels. (In the tables, we add the frequency of each category in the sample for [X's Y] and [the Y of X].)

Factor (a) Semantic Category of X

Level  Description                          [X's Y]   [Y of X]
H      Human nouns - including names            223        186
A      Animal nouns (excluding human)             3         26
O      Organization nouns (collective)           24         76
P      Place nouns - including names             28         61
T      Time nouns                                13         41
C      Concrete inanimate nouns                   0        467
B      Abstract nouns (excluding time)            1        399
       Total                                    292       1256
Factor (b) Semantic Relation between X and Y

Level   Description    [X's Y]   [Y of X]
POSS.   Possessive         115        391
SUBJ.   Subjective          82        160
OBJE.   Objective            0        262
ORIG.   Origin              55         31
MEAS.   Measure             13         41
ATTR.   Attributive          7         89
PART.   Partitive           20        282
        Total              292       1256
Factor (c) Text Type (or Style)

Level                 [X's Y]   [Y of X]
Journalistic style        177        456
Learned style              41        638
Fictional style            74        162
Total                     292       1256
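The Factor (c) counts above can be read in two ways, and a small sketch may help keep them apart: the share of all genitives that falls in each style, and the proportion of genitives within each style. The variable names are illustrative only; the counts are those of the Factor (c) table.

```python
# (genitive count, of-construction count) per text type, from Factor (c)
counts = {
    "Journalistic": (177, 456),
    "Learned": (41, 638),
    "Fictional": (74, 162),
}

total_genitives = sum(gen for gen, _ in counts.values())

# Share of all genitives found in each style (percent, two decimals)
share_of_genitives = {
    style: round(100 * gen / total_genitives, 2)
    for style, (gen, _) in counts.items()
}

# Proportion of genitives among all examples within each style
proportion_within_style = {
    style: gen / (gen + of)
    for style, (gen, of) in counts.items()
}

for style in counts:
    print(style, share_of_genitives[style],
          round(proportion_within_style[style], 3))
```

The second quantity is the one the logistic model works with, since it is not affected by how many examples each text type happens to contribute.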
It is useful to add a few words here on Factors (b) and (c).

Factor (b) : The classification of levels under Factor (b) needs some explanation.

Possessive is understood broadly to include all relations which entail the paraphrase "X has (a) Y", apart from those in the attributive and partitive categories. E.g. :
    the banker's son
    the orchestra's music stand
    the mind of the reader

Subjective applies where there is a sentence analogue in which X is the subject of a verb corresponding to the head of Y. E.g. :
    nazi Germany's request
    Lawrence's imagination
    the formation of the heavy elements

Objective applies where there is a sentence analogue in which X is the object of a verb corresponding to the head of Y. E.g. :
    the inverse distribution of plants and animals
    [Coleridge's] definition of the secondary imagination
    the amusement of passers-by

Origin applies where the construction entails "X is the originator (author, causer, maker, etc.) of Y". E.g. :
    Chaucer's poem
    the music of these three composers
    the opinion of the council

Measure applies where X provides a measurement (in terms of time, distance, etc.) of Y. E.g. :
    last year's average wage
    yesterday's encounter
    the deplorable Homicide Act of 1957

Attributive applies where Y describes an attribute of X : Y has an abstract noun as its head, and the sentence analogues are both "X has (a) Y" and "X is Y'", where Y' is the corresponding adjective. E.g. :
    Mark's blindness
    the immensity of her work
    the extraordinary instability of the Anglo-Saxon public

Partitive applies where there is a sentence analogue "X has (a) Y", and where, in addition, the semantic relation of Y to X is that of part to whole. E.g. :
    Michael's face
    the Football League's Division Two
    the foyer of the Leopold Hotel

Factor (c) : It is already worth noting from the figures under Factor (c) that there is marked inequality between the frequencies of the genitive in the different styles. The Journalistic text type accounts for 60.62 per cent of the total number of genitives, as compared with 25.34 per cent in the Fictional texts, and 14.04 per cent in the Learned texts.

At this point it is also worth noting the important role of the computer in making the analysis practicable. By means of a concordance programme, it was possible to locate all examples of the genitive and the of-construction, and to inspect their context, with little trouble. Also, the concordance programme allowed us to encode each factor and level in an abbreviated way, and to sort the examples on the basis of these codes. In this way, it was relatively easy to verify the classifications, to check them for consistency, and to count the number of occurrences in each class.
In principle, we could have undertaken the analysis
without the availability of the machine-readable corpus ; in practice, this task would have been Herculean, and would have been prone to error and inconsistency.

iii) Tabulating the results of the analysis

As a result of this analysis, we arrived, in effect, at a 3-dimensional matrix, with each cell containing a numerical value (viz. a frequency count). For the purpose of input to GLIM, the result could be prepared as a set of three 2-dimensional tables, one for each pair of factors, as shown in Tables 1-3, where each cell gives the number of genitives over the total number of examples.

Table 1 : Observed Proportion of the Genitive in Journalistic Style

         H         A      O       P       T      C      B       TOTAL
POSS.    46/72     0/0    8/33    16/43   0/0    0/28   1/57    71/233
SUBJ.    36/50     0/0    4/13    8/8     0/0    0/7    0/28    48/106
OBJE.    0/13      0/2    0/6     0/2     0/0    0/25   0/54    0/102
ORIG.    36/48     0/0    4/6     0/0     0/0    0/0    0/0     40/54
MEAS.    0/0       0/0    0/0     0/0     7/19   0/0    0/0     7/19
ATTR.    3/7       0/0    3/7     0/3     0/0    0/12   0/21    6/50
PART.    2/9       0/0    3/13    0/3     0/0    0/24   0/20    5/69
TOTAL    123/199   0/2    22/78   24/59   7/19   0/96   1/180   177/633
Table 2 : Observed Proportion of the Genitive in Learned Style

         H        A      O      P      T      C       B       TOTAL
POSS.    8/43     0/7    0/9    1/8    0/0    0/65    0/59    9/191
SUBJ.    13/28    0/13   1/3    2/3    0/0    0/29    0/23    16/99
OBJE.    0/11     0/3    0/1    0/1    0/0    0/75    0/53    0/144
ORIG.    13/23    0/0    0/4    0/0    0/0    0/0     0/0     13/27
MEAS.    0/0      0/0    0/0    0/0    3/29   0/0     0/0     3/29
ATTR.    0/1      0/0    0/0    0/0    0/0    0/12    0/20    0/33
PART.    0/6      0/0    0/2    0/9    0/0    0/112   0/27    0/156
TOTAL    34/112   0/23   1/19   3/21   3/29   0/293   0/182   41/679
Table 3 : Observed Proportion of the Genitive in Fictional Style

         H       A      O      P      T      C      B      TOTAL
POSS.    31/50   3/3    1/2    0/3    0/0    0/9    0/15   35/82
SUBJ.    17/20   0/1    0/1    1/2    0/0    0/8    0/5    18/37
OBJE.    0/5     0/0    0/0    0/1    0/0    0/4    0/6    0/16
ORIG.    2/2     0/0    0/0    0/0    0/0    0/0    0/3    2/5
MEAS.    0/0     0/0    0/0    0/0    3/6    0/0    0/0    3/6
ATTR.    1/2     0/0    0/0    0/1    0/0    0/4    0/6    1/13
PART.    15/19   0/0    0/0    0/2    0/0    0/53   0/3    15/77
TOTAL    66/98   3/4    1/3    1/9    3/6    0/78   0/38   74/236
iv) Using GLIM for statistical modelling

The statistical package GLIM (Francis, Green and Payne 1992) was used to fit a selection of statistical models to the observed data. The techniques used are described below in section 5.

v) Drawing conclusions from the analysis

This, like the previous step, is crucial in the methodology, and is explained in section 6 below.

5. Statistical Modelling
The idea of statistical modelling is to present a simplified representation of the underlying process by separating out systematic features of the data from random variation. To do this, a probability model is chosen to represent the random variation ; the systematic features of the data are represented by a (hopefully small) number of parameters. So, the statistical task is to find a statistical model which is parsimonious (has a small number of parameters) but which fits the data well according to the underlying probability distribution (GOODNESS-OF-FIT). Thus Occam's razor plays an important part in statistical modelling.

As an example, one of the classical statistical models is simple linear regression, where the probability distribution is the Normal distribution, and the mean of the Normal distribution is modelled by a regression equation, i.e. a linear combination of variables with unknown regression parameters.

More recently, techniques have been developed for modelling proportions, or ratios of counts. These techniques use the Binomial distribution as the probability distribution, and model the probability of a particular outcome as a function of a set of explanatory factors. A common model relates the probability of
an outcome to a linear combination of explanatory factors through the logit function log(p/(1-p)), or log-odds ratio (Collett 1991). We therefore try to construct a linear model predicting the log-odds ratio for any combination of factors.

We then try to build a model for the probability of the English genitive construction. The simplest model is that the probability is constant and is not dependent on the semantic category, the semantic relation or the style. The variation in the proportion of English genitive construction for each combination of factors would then be just Binomial variation. This model can be tested, a goodness of fit measure calculated and the model rejected or accepted, and a new model tested, and so on.

More generally, we model the probability of giving the English genitive as a function of a set of individual characteristics. Formally, we assume that the probability of giving a particular response for a particular cell i is $p_i$. For each cell of the table, we observe the number of genitive constructs $R_i$ and the total number of constructs $N_i$. The logistic model assumes that :

$R_i$ has a Binomial distribution with probability $p_i$ and $N_i$ trials :

\[ R_i \sim \mathrm{Binomial}(p_i, N_i) \]

$p_i$ is related through a function f(.) to a linear function of the explanatory factors :

\[ f(p_i) = \sum_j b_j x_{ij} \]

where the $b_j$ are unknown parameters to be estimated and the $x_{ij}$ denote the values of the explanatory variables for cell i. $f(p_i)$ is taken to be the logit function :

\[ f(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) \]
Following the usual GLIM convention, each of the potential explanatory factors was converted by GLIM into a set of dummy variables, one dummy variable for each level of the factor. A backward elimination procedure was adopted, with the all two-way interaction model being fitted at the initial stage. At each subsequent stage, the least important variable or interaction was selected and removed from the model using the GLIM scaled deviance (likelihood ratio statistic) as a criterion for comparing nested models : a difference in scaled deviance between two models (the smaller of which has s fewer parameters than the other) will have a chi-squared distribution on s degrees of freedom if the term removed is unimportant (i.e. its parameters can be set to zero). The procedure was terminated when all remaining variables were significant. For further details of logistic regression, GLIM and statistical modelling see Aitkin et al. (1989), Agresti (1990) or the new GLIM4 manual (Francis, Green and Payne 1992).
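To make the fitting step more concrete, here is a minimal sketch of how a binomial logit model can be fitted by Newton-Raphson iteration. It is not GLIM and not the authors' procedure: it assumes a deliberately tiny model with an intercept and a single dummy variable (Learned versus Journalistic style), uses the marginal counts 177/633 and 41/679 from the tables above, and the function name `fit_binomial_logit` is hypothetical. For such a two-group model the fitted parameters must reproduce the observed log-odds, which gives a simple check.

```python
import math


def fit_binomial_logit(groups, iterations=25):
    """Fit logit(p) = b0 + b1 * x to grouped binomial data.

    groups: list of (x, genitives, total) with x a 0/1 dummy variable.
    Returns the maximum-likelihood estimates (b0, b1).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        u0 = u1 = 0.0            # score vector (gradient of log-likelihood)
        i00 = i01 = i11 = 0.0    # information matrix (2 x 2, symmetric)
        for x, r, n in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = r - n * p            # observed minus expected count
            w = n * p * (1.0 - p)        # binomial variance weight
            u0 += resid
            u1 += resid * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: solve (information matrix) * delta = score
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1


# x = 0 for Journalistic (177 genitives out of 633), x = 1 for Learned (41 out of 679)
b0, b1 = fit_binomial_logit([(0, 177, 633), (1, 41, 679)])
print(f"b0 = {b0:.4f}  (log-odds of the genitive in Journalistic style)")
print(f"b1 = {b1:.4f}  (change in log-odds for Learned style)")
```

A negative `b1` indicates that the odds of the genitive are lower in Learned than in Journalistic texts; the full analysis adds dummy variables for every level of every factor in the same way.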
The results of the analysis are shown below. Table 4a shows the analysis of deviance table. We can see that starting from the all-two-way interaction model, first the CATEGORY by STYLE interaction is removed, then the RELATION by STYLE interaction, and finally the RELATION by CATEGORY interaction. We are left with a final model involving the main effects of CATEGORY, RELATION and STYLE. Table 4b shows the effect of removing each of these terms from the model. All terms are seen to be important, and the model cannot be simplified further.

Does the model fit well ? The scaled deviance from this model is 63.01 on 71 degrees of freedom, which provides the goodness of fit test for the final model. This value is compared with a chi-squared distribution on 71 df. Its 95 % percentage point is 91.67, and as 63.01 is substantially below this figure, the model fits well.

We can assess the importance of each of the terms in the final model by further examination of Table 4b. All factors are highly significant, and all are very important in predicting the proportion of genitive constructs. However, when CATEGORY is excluded from the model, the scaled deviance changes by a massive 361.0 on 5 degrees of freedom (72 for each degree of freedom). This makes CATEGORY the most significant term. The next most significant term is STYLE, and finally RELATION.

Table 4a : Analysis of deviance table

MODEL                                      Deviance   df   Difference in scaled   Difference   p-value
                                                           deviance from          in df
                                                           previous model
CATEGORY+RELATION+STYLE
  +CATEGORY.STYLE+RELATION.STYLE
  +RELATION.CATEGORY                           7.94   29
CATEGORY+RELATION+STYLE
  +RELATION.STYLE+RELATION.CATEGORY           20.26   39   12.32                  10           0.2642
CATEGORY+RELATION+STYLE
  +RELATION.CATEGORY                          35.14   51   14.88                  12           0.2481
CATEGORY+RELATION+STYLE (final model)         63.01   71   27.87                  20           0.1125
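The arithmetic behind the analysis of deviance can be retraced with a short sketch. It recomputes the differences in deviance and degrees of freedom between successive models and compares the final deviance with the 95 % point of the chi-squared distribution. To stay within the standard library, the chi-squared quantile is obtained with the Wilson-Hilferty cube-root approximation, which is an assumption of this sketch and not the method used by GLIM; the model labels and the helper name `chi2_quantile_wh` are also illustrative.

```python
from statistics import NormalDist


def chi2_quantile_wh(prob, df):
    """Approximate chi-squared quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(prob)
    term = 2.0 / (9.0 * df)
    return df * (1.0 - term + z * term ** 0.5) ** 3


# (label, scaled deviance, df) for the nested models, in order of fitting
models = [
    ("all two-way interactions", 7.94, 29),
    ("minus CATEGORY.STYLE", 20.26, 39),
    ("minus RELATION.STYLE", 35.14, 51),
    ("main effects only (final model)", 63.01, 71),
]

# Difference in deviance and in df between each model and the previous one
steps = []
for (_, dev_prev, df_prev), (label, dev, df) in zip(models, models[1:]):
    diff_dev = round(dev - dev_prev, 2)
    diff_df = df - df_prev
    # The removed term is dispensable if the change is below the 95 % point
    dispensable = diff_dev < chi2_quantile_wh(0.95, diff_df)
    steps.append((label, diff_dev, diff_df, dispensable))

critical = chi2_quantile_wh(0.95, 71)
final_model_fits = 63.01 < critical

for step in steps:
    print(step)
print(f"95 % point on 71 df is about {critical:.2f}; fits: {final_model_fits}")
```

Each removed interaction changes the deviance by less than the corresponding 95 % point, which matches the non-significant p-values reported in the table.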
Table 4b : Effect of deleting terms from the final model

Term deleted   change in deviance   change in df
STYLE          70.28                2
... $\to \{\mathrm{True}, \mathrm{False}\}$, $(x, y) \mapsto B_e(x, y) = \mathrm{True}$ if y is a relevant property of x for e.

Definition of inference rules : $V = \{\text{salient}, \text{inhibited}, \text{accompanying}, \text{ignored}\}$ is the set of possible "activity values" for a non interpretable element.
VIOLAINE PRINCE

$P_e = P_{e1} \cup P_{e2} \cup P_1$, supplied with predicate $p$ on $V$ ;
$p : A_e \times V \to \{\mathrm{True}, \mathrm{False}\}$, $(x, v) \mapsto p(x, v) = \mathrm{True}$ if x takes the value v by means of a rule of $P_e$.

Specific or pragmatic rules : Let $\Pi_e$ be the premises of the pragmatic rules known about e. Let $P_{e1}$ be the set of these pragmatic rules.
$P_{e1} = \{ [\varepsilon_j \to p(a_i, v_i)],\ \varepsilon_j \in \Pi_e,\ a_i \in A_e,\ v_i \in V \}$

Constraints on properties (integrity rules in EDGAR terminology) :
$P_{e2} = \{ [p(a_i, v_i) \to p(b_j, w_j)],\ a_i, b_j \in A_e,\ v_i, w_j \in V \}$

Default Inheritance rules :
2.3.2. Formalization of the interpretation function

Let K be the set defining contextual information. Let $\Pi_e$ be the set of pragmatic rule premises associated with the entry e. Let N be the set of entry representations constituting the lexical knowledge base. The elements of N are n(e) representations. The interpretation function f is defined as :
$(k, n(e)) \mapsto f(k, n(e)) = C(e, k)$
where C(e,k) is defined as :
$C(e, k) = \{ (x, v),\ x \in A_e,\ v \in V \}$
corresponding to pairs of (concept-properties, value) resulting from the instantiation by k, and the application on $A_e$, of the rules in $P_e$. It is written as a theory $A = (W, P)$ where :
$X = \{ p(x, \text{salient}) \wedge p(x, \text{inhibited}) \to p(x, \text{valid}),\ \forall v \in V\ [p(x, \text{ignored}) \wedge p(x, v) \to p(x, \text{ignored})] \}$ is a set of consistency rules ;
SIMULATING CONTINUITY
$W = \left( \bigcup_e (P_{e1} \cup P_{e2}) \right) \cup X$ is the set of first order formulas of the system (all integrity and specific rules of all items plus consistency rules) ; P is the set of normal defaults written above.

2.4. Possibilities and Limits of the EDGAR model

Among the advantages of the EDGAR model, let us note that it tries, as much as possible, to reduce computational effort in terms of both memory space and processing time. Representations are restricted to the smallest level, below which representation would not provide proper semantics. Inference rules are distributed among three types : the specific ones are not reducible, and therefore have to be registered, but we have attempted to shrink them to their thinnest shape. Integrity constraints are few and look like an interesting means of reducing the number of specific rules. Inheritance rules are general and therefore always applicable but written once.

On the other hand, calculus is not performed in terms of a complex algorithm. Specific rules trigger the process, which is continued by constraint propagation of values and then by default propagation until all potential elements have been evaluated. In terms of polysemy resolution, this model is able to offer an interpretation even if context is "hostile", which means that context information is scarce. The quality of interpretation depends on how high this hostility is. But then, human beings have to cope with the same alterations in their interpretation of polysemy, when its ambiguity is enhanced by a vague or obscure discourse.

The C(e,k) configuration obtained as a result of the function f indicates the scope of interpretation : salient elements would hint at the most favoured concepts, without dealing with the question of their integrity, and accompanying elements would hint at implicit knowledge. Both are useful for polysemy interpretation.
We feel that the limits of conceptual semantics reside in the fact that it cannot interpret words except through these salient concepts, and only when their integrity is not threatened. What we offer is a more qualified panel, although the latter is able to cooperate with a conceptual frame : it appears to us as a subtle "constraint relaxation" process on a conceptual representation.

Nevertheless, our system has limits of its own, which we are trying to push back. First among them are the limitations of the interpretation function, which, as defined here, is unable to produce an image when context and premises of pragmatic rules do not match, at least partly. We are mending this situation by providing a "mimicry behaviour interpretation" which supplies substitute premises from resembling words (Prince & Bally-Ispas 1991). Second, this model has not taken into account interferences between many polysemous words within a sentence.
We plan to provide it with a memory which will modify the possible interpretation of the next encountered words by using what has been found earlier (we plan to use Pustejovsky's projective inheritance). Third, this system has difficulty interpreting metaphors which are no longer lexical but which involve large chunks of discourse. This has to be dealt with within a general framework for metaphor resolution by means of Artificial Intelligence : some of James Martin's achievements in that domain will lead us to also record some conventional 'moves' in terms of metaphoric behaviour.

As a conclusion, we feel that the natural semantic wealth embedded in words must not drive researchers in Artificial Intelligence into looking at it as a bunch of "problems" but as an economical means of communicating meaning (discussion provided in [Picoche 1989]). Until now, ambiguity has been considered as an impediment to comprehension, and hence, polysemy has not only been put on the same shelf, but evaluated as a true nuisance whenever it has mingled with interpretation processes. Yet polysemy could be seen either as a shortcut to communication between individuals (even between Man and machine) or as an enhancement to a particular communicative act by forcing associations within the reader's (or listener's) mind. Thus, polysemy is a useful characteristic of natural language, as are ellipses, anaphora, or other economical phenomena.

But this attitude will lead researchers to review their position toward polysemy in the natural language understanding (and generation) field. They would have to look for sophisticated models that take polysemy into account as an information source on the utterer's intentions. We intuitively feel that models considering paraphrase and polysemy as dual phenomena (Fuchs 1982) (Fuchs 1988), whether associative (neuromimetic) or discrete (logic), are probably the best tracks for a unified AI model for "really natural" language understanding.
REFERENCES

Brachman, R. 1979. On the Epistemological status of semantic Networks. In Associative networks : Representation and use of Knowledge by Computers, New York : Findler Academic Press, pp. 3-50.
Cottrell, G.W. 1985. A connectionist approach to word sense disambiguation. Doctoral dissertation, University of Rochester.
Culioli, A. 1986. Stabilité et déformabilité en linguistique. In Etudes de lettres, Langage et Connaissances, Université de Lausanne, pp. 3-10.
Fuchs, C. 1982. La Paraphrase, Paris : Presses Universitaires de France.
Fuchs, C. 1988. Représentation linguistique de la polysémie grammaticale. In TA. Informations, vol. 29, n° 1/2, pp. 7-20.
Greimas, A.J. 1966. Sémantique Structurale, Paris : Larousse.
Hirst, G. 1987. Semantic interpretation and the resolution of ambiguity, Massachusetts, Cambridge : University Press.
Hobbs, J.R. 1983. Metaphor Interpretation as Selective Inferencing : Cognitive Processes. In Understanding Metaphor Empirical Studies in the Arts, vol. 1, n° 1, pp. 17-34, & n° 2, pp. 125-142.
Hobbs, J.R. ; P. Martin. 1987. Local Pragmatics. Proceedings of the 10th International Joint Conference on Artificial Intelligence (IJCAI-87), Milan, Italy, pp. 520-523.
Kayser, D. 1990. Truth and the interpretation of Natural Language : A non monotonic approach to variable depth. In ECAI-90 proceedings, Stockholm, Sweden, pp. 392-394.
Lakoff, G., M. Johnson. 1980. Metaphors we live by, Chicago University Press.
Le Ny, J.F. 1979. Sémantique Psychologique, Paris : Presses Universitaires de France.
Martin, J.H. 1990. A Computational Model of Metaphor Interpretation, New York : Academic Press.
Martin, J.H. 1991. MetaBank : A Knowledge Base of Metaphoric Language Conventions. Proceedings of the IJCAI Workshop on Computational Approaches to Non-Literal Language, Sydney, Australia, pp. 74-82.
Martin, R. 1983. Pour une logique du sens, Paris : Presses Universitaires de France.
Mel'chuk, I. 1984. Dictionnaire Explicatif et Combinatoire du Français Contemporain, Montréal : Presses de l'Université.
Ortony, A. 1979. Beyond literal similarity. In The Psychological Review, vol. 86, n° 3, pp. 161-180.
Pages, R. 1987. Ambivalence et Ambiversion. Actes du colloque Ambiguïté et Paraphrase. C. Fuchs (ed.), Presses de l'Université de Caen. France.
Picoche, J. 1989. Polysémie n'est pas ambiguïté. Cahiers de Praxématique, n° 12. Université Paul Valéry, Montpellier, France, pp. 75-89.
Pottier, B. 1963. Recherches sur l'analyse sémantique en linguistique et traduction automatique. Publications de la Faculté des Lettres et Sciences Humaines de Nancy, série A. Nancy.
Prince, V. 1991. GLACE : un système d'aide à la compréhension des éléments lexicaux inducteurs d'ambiguïtés. Proceedings of RFIA91, vol. 2. AFCET. Lyon, France, pp. 591-601.
Prince, V. ; R. Bally-Ispas. 1991. Un algorithme pour le transfert de règles pragmatiques dans le processus complexe GLACE. Document Interne du LIMSI n° 91-17.
Pustejovsky, J. 1989. Current Issues in Computational Lexical Semantics. Proceedings of the 4th Conference of the European Chapter of the ACL, Manchester, England, pp. xvii-xxv.
Pustejovsky, J. 1991. The Generative Lexicon. Computational Linguistics, vol. 17, n° 4, pp. 409-442.
Rastier, F. 1985. L'isotopie sémantique, du mot au texte. Thèse de doctorat ès Lettres, Université de Paris IV.
Rastier, F. 1987. Sémantique Interprétative, Paris : Presses Universitaires de France.
Selman, B. ; G. Hirst. 1985. A rule-based connectionist parsing system. Proceedings of the Seventh Annual Cognitive Science Society Conference, Irvine, California.
Sowa, J. 1984. Conceptual Structures : processing in mind and machine, Addison-Wesley, Reading, Massachusetts.
Stallard, D. 1987. The logical Analysis of Lexical Ambiguity. Proceedings of the 25th Annual Meeting of the ACL, Stanford University, California, pp. 179-185.
Thom, R. 1972. Stabilité structurelle et morphogenèse, Paris : Benjamins-Ediscience.
Victorri, B. 1988. Modéliser la polysémie. In TA Informations, vol. 29, n° 12, Paris, France, pp. 21-42.
Victorri, B. ; J.P. Raysz ; A. Konfe. 1989. Un modèle connexionniste de la polysémie. Actes de la conférence Neuro-Nîmes, EC2 ed.
COARSE CODING AND THE LEXICON
CATHERINE L. HARRIS
Boston University, USA
Polysemy (words' multiple senses), while a source of delight for humorists and essayists, poses descriptive and theoretical problems for students of language. Does each distinct sense of a word receive a separate entry in the mental lexicon (Miller & Johnson-Laird 1976 ; Ruhl 1987) ? What factors make a particular sense of a word distinct enough from others that its meaning merits separate listing ? What principles constrain the types of relationships among the senses of a word (Jackendoff 1983 ; Lakoff 1987 ; Deane 1988) ? An abundance of recent work has provided some provocative answers. Observing that sense differences often have syntactic ramifications, theorists have identified these senses as the ones deserving distinct lexical entries (Pinker 1989 ; Grimshaw 1990). Related in spirit is Ruhl's (1989) "monosemic bias." Ruhl urges theorists to propose distinct lexical entries only after attempts to find a core sense common to all uses has failed. In contrast, the "radial category" perspective suggests that a much larger range of senses of a word are represented in the mental lexicon. Principles of human categorization and conceptual metaphor are thought to structure the relationship between word meanings (Lakoff 1987). Using spatial prepositions as his example, Deane (1992) attributes regularities among the polysemes of a word to basic cognitive processes such as the human ability to have different perspectives on the same spatial relation. This variety of opinion highlights the trade-offs between positing maximally abstract representations and enumerating the diverse senses of polysemous words. The former approach captures our intuitions about what is common across a word's uses, but seldom specifies details of the possible range of uses. The latter approach accounts for the range of uses, but obscures their commonalities. 
Unfortunately, both these approaches put off to be solved another day the problem of explaining how rules for words' contextual integration act to produce the specific interpretation we obtain on hearing words in context.

In this chapter I argue that adopting a memory-based approach to lexical representation will illuminate the tension between abstractness and specificity of representation, as well as helping with the question of contextual integration. There
are two key ideas in the memory-based approach. The first is that units larger than words (such as phrases, clauses, conventional collocations and idioms) are the primary storage unit. On this view, words are laid down in memory with their frequent left and right neighbors, and the meaning that is stored with them is the meaning of the unit as a whole, rather than the separate meanings of the individual words in the unit. This view has antecedents in Bolinger (1976), and draws heavily on concepts and examples in Fillmore (1988) and Langacker (1987). The second key idea is that this large, heterogeneous set of phrases and word combinations is not a static list, but is stored in "superpositional" or "distributed" fashion (Hinton, McClelland & Rumelhart 1986). The term from my title, "coarse coding"¹, refers to the encoding scheme in which information is represented as a pattern of activation over a pool of simple processing units which participate in the encoding of many different pieces of information.

The virtues of conceiving of the lexicon as a superpositionally-stored list of phrases include the advantages noted by connectionist and neural-network researchers : prototypes emerge when similar patterns reinforce each other, irregular patterns are maintained if favored by frequency, and novel patterns can be generated or interpreted on analogy to familiar ones (McClelland, Hinton & Rumelhart 1986). I will try to show how, in addition to providing a natural representation for idioms and conventional expressions, the coarse-coding view incorporates mechanisms for both context-sensitivity and the abstraction of argument structure and subcategorization relations.

I first describe coarse-coding schemes and why they are a useful way to conceive of lexical representation. Drawing on linguistic and psycholinguistic phenomena, I motivate the view that the primary unit of linguistic storage is not the word, but is some larger piece (phrase, clause or sentence).
Some aspects of the coarse-coding proposal can be illustrated with an existing simple connectionist model of prepositional polysemy (Harris 1994), although other aspects await a more thorough implementation. At that point in my story, a reader may well ask, if the organization of word and sentence meaning exquisitely reflects the statistics of the language, as I argue it does, what psychological variables constrain the statistics of the language ? My view is that factors related to language processing and communicative function are the ultimate shapers. Following researchers in the grammaticalization framework (Meillet 1958 ; Lehmann 1985 ; Givon 1989), I characterize speakers' communicative needs as a trade-off between the need to
1) In this chapter, I will use the term coarse coding rather than distributed representation to emphasize that coding schemes may vary in their coarseness. A coding scheme in which some material is relatively localized while other material is distributed can still be called a coarse coding scheme.
minimize processing costs while maximizing communicative impact. Polysemy figures in this equation because polysemy boosts the usage frequency of a word, which drives down the cost of lexical access. But extending a word into varied semantic contexts semantically bleaches it, which decreases its communicative impact. To achieve maximal impact, speakers reach for fresh words (Lehmann 1985). The historically observed cycle of recruitment of a new item, increasing semantic extension, and subsequent phonological reduction and ultimate use as a grammatical morpheme (Sweetser 1990) suggests that an encoding scheme which is inherently continuous will serve us well in understanding both synchronic and diachronic variation in words' form-meaning mappings.

Coarse Coding

In a coarse-coding scheme, the representational units do not match the information to be represented (e.g., "concepts") in a one-to-one fashion. Instead, each unit is active in representing a number of concepts. A concept is represented by a number of simultaneously active units. Distributed representations promote generalization (McClelland & Kawamoto 1986 ; St. John & McClelland 1988 ; Harris 1990) and exhibit graceful degradation (if one unit is destroyed, no single pattern is destroyed, although several patterns might be slightly degraded ; Hinton & Shallice 1991). An additional computational advantage is representational efficiency (Hinton, Rumelhart & McClelland 1986). In a localist encoding, N units can represent at most N concepts. With coarse coding, a concept is represented by the joint activity in a number of units. The number of concepts that can be represented increases as the number of units that are simultaneously active increases (as long as each unit is active for several different concepts ; Touretzky & Hinton 1988).
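The efficiency argument above can be made concrete with a toy sketch. It assumes a one-dimensional "space" of 50 positions and five units with evenly spaced receptive-field centers; these numbers, and the function name `activation_patterns`, are hypothetical choices for illustration and are not taken from Hinton, Rumelhart and McClelland's example. The sketch counts how many distinct activation patterns the same five units produce when their fields do not overlap (a localist code) and when they are widened so that neighbors overlap (a coarse code).

```python
def activation_patterns(centers, radius, positions):
    """Return the set of distinct activation patterns over the positions.

    A unit is active for position x when x lies in [center - radius,
    center + radius). Each pattern is the set of active unit indices.
    """
    patterns = set()
    for x in positions:
        active = frozenset(
            i for i, c in enumerate(centers) if c - radius <= x < c + radius
        )
        patterns.add(active)
    return patterns


centers = [5, 15, 25, 35, 45]   # five units, evenly spaced
positions = range(50)           # the space to be encoded

localist = activation_patterns(centers, radius=5, positions=positions)
coarse = activation_patterns(centers, radius=7, positions=positions)

print("non-overlapping fields:", len(localist), "distinguishable regions")
print("overlapping fields:    ", len(coarse), "distinguishable regions")
```

With non-overlapping fields each unit marks exactly one region, so five units distinguish five regions. With overlap, regions where two neighboring units fire together become distinguishable as well, so the same five units separate more regions.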
Coarse Coding and Locating Visual Features
One way to get a feel for how coarse coding leads to greater representational efficiency is to work through the visual processing example presented by Hinton, Rumelhart and McClelland (1986). The following four ideas are important to their example.
1. Receptive Field. The receptive field of a neuron in visual cortex is the area of the visual field to which the neuron is sensitive. The neuron becomes active if there is movement or change within this field.
2. Diameter and Overlap of Receptive Fields. In visual cortex, individual neurons often have large receptive fields which have considerable overlap with other neurons. The location of a feature in the visual field is accurately pinpointed when it falls within the receptive fields of a number of neurons. The joint activity of several neurons indicates that the feature is located at the intersection of the active units' receptive fields.
CATHERINE L. HARRIS
3. Accuracy Increases With Receptive Field Diameter. If there is no overlap in receptive fields, then we have a localist encoding rather than a distributed one. We would say that the grain size of our coding scheme is fine, rather than coarse. No overlap means that single neurons are solely responsible for identifying the location of discriminable stimuli. If N processing units do not overlap, then N distinct locations in the visual field can be identified. But if we double the radius of a receptive field, then the fields of our N neurons will overlap, and we double the number of different locations that can be discriminated (assuming that each addition of an active neuron leads to a discernibly different network state).
4. Coarse Coding Only Efficient if Features are Sparse. Hinton et al. point out that, if two or more stimuli in close proximity are to be distinguished, then coarse coding will hinder more than help: several processing units will become active in response to more than one stimulus. In this case, a finer-grained coding scheme is needed, perhaps even a localist encoding.
Coarse Coding and the Lexicon
In the visual-field example above, the receptive field of a neuron in visual cortex is the set of simpler neurons in the retinotopic map. For the word-meaning example I will develop here, I will refer to "processing units" instead of "neurons". The receptive field of these processing units is a field of simpler units. Concepts or meanings are patterns of activation across a pool of units. An individual simple unit does not have a distinct or determinable meaning. My main proposal is that a word is akin to a processing unit with a receptive field that may vary in size and in the degree to which it overlaps with the receptive fields of other words. On this metaphor, polysemous words have wide receptive fields, and thus cover large (and perhaps ill-defined) areas of semantic space.
A distinct meaning (i.e., a small region of multi-dimensional semantic space) is identified when several words, or words plus aspects of the non-linguistic context, combine to narrow down the space of possible meanings. On this interpretation, words do not encode one abstract meaning, nor are they pointers to a list of several specific meanings. Instead, the mapping from sound to meaning is mediated by a coding scheme which varies in its coarseness. On different occasions of use, words communicate different pieces of information. Unambiguous pieces of information are usually communicated by the joint presence of several words. Coarse coding is an efficient representational scheme because, holding the number of lexical items constant, a greater number of specific ideas can be communicated. For example, one could have a separate word for all the ways that an agent can act on an object using a sharp instrument, or one can have the single word cut. A specific intended meaning is pinpointed by conventional verb + particle combinations.
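The narrowing idea in this paragraph can be sketched with sets standing in for receptive fields. This is only a toy illustration: a real coarse code would use graded activation over many units, the sense labels for cut are borrowed from Table 1, and the fields assigned to the particles (including "depart", "descend" and "ascend") are invented for the example.

```python
# Hypothetical "receptive fields": each form covers a region of semantic space,
# modelled here as a plain set of sense labels (an assumption of this sketch).
RECEPTIVE_FIELDS = {
    "cut": {"penetrate", "eliminate", "section",
            "sever connection", "reduce", "shape"},
    "off": {"eliminate", "sever connection", "depart"},
    "down": {"reduce", "descend"},
    "up": {"section", "ascend"},
}

def narrow(*forms):
    """Return the senses compatible with every form (intersection of fields)."""
    fields = [RECEPTIVE_FIELDS[form] for form in forms]
    return set.intersection(*fields)

if __name__ == "__main__":
    print(len(narrow("cut")))      # the wide field of the single word
    print(narrow("cut", "down"))   # one sense survives
    print(narrow("cut", "off"))    # two senses survive; more context is needed
```

In this toy, cut down is fully pinned down by the particle, while cut off still needs a further neighbor, which mirrors the point that unambiguous information usually comes from the joint presence of several words.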
The traditional view of the advantage of stringing words together into larger units is that the individual items are the primitive building blocks of more complex ideas. The coarse-coding view suggests that multi-word compositions are used not only to construct a meaning that is more complex than any of its parts, but to pinpoint the concepts which are the intended building blocks.
Limits to Linguistic Compositionality
Three motivations for the coarse-coding view are the difficulty of specifying the building blocks of meaning construction, words' contextual stickiness, and our intuitions that highly polysemous words do not impose a burden on comprehension.
Is the word the building block?
By "word" I refer to our folk-concept of a coherent phonological entity. This folk-concept has been concretized in our orthographic systems and legitimized through dictionaries and cultural scripts on how to talk about meaning and intention (Reddy 1979). What remains unclear is whether words have distinct, individuated meanings that are discretely represented in some kind of mental structure such as the hypothetical mental lexicon. It is now widely recognized that the meanings¹ of most natural language utterances are not obtained by concatenating the meanings of component words (Miller & Johnson-Laird 1976; Lakoff & Johnson 1979; Brugman 1988; Pinker 1989; Pustejovsky 1992). Despite this widespread agreement, many theorists continue to regard words as the building blocks of meaningful communication. It is generally assumed by lexical theorists (e.g., Pinker 1989; Miller & Fellbaum 1991) that words are privileged in at least two ways:
1. The form (either sound or orthography) of a word is associated with a data structure that is the primary storage site for linguistic meaning.
2. The form of a word is the entry-point into the representational system.
These two factors do not logically have to co-occur, and indeed we can imagine a representational and access system in which neither is true, or true only to a degree which may vary from word to word. Researchers who acknowledge the ubiquity of polysemy may find congenial the perspective illustrated in
1) Some theorists consider the meaning of an utterance to be all evoked mental conceptualization (Langacker 1987 ; Lakoff 1987 ; Deane 1992), while others identify linguistic meaning with a subset of this (Pinker 1989). Although my own bias is towards the former view, taking a stance on this point is not necessary for the current discussion of limitations on compositionality.
Figure 1A: the word is the entry-point into the system, but words activate representational structures that correspond to phonological units larger than a word.
Figure 1. Left-hand side lists language inputs, right-hand side "meanings". A. Illustrates the popular proposal that the phonological word is the unit around which meanings are mentally represented. Lexical items (cut, up, down, out) are thought to activate meaning structures. For polysemous words, multiple meanings are activated. In this illustration, only four of these are listed for cut. For up, down and out, only the sense that is typically meaningful in conjunction with cut is listed, but according to this proposal, all other meanings of these words would be listed here. B. It is proposed that frequent word combinations are recognized as units by the language comprehension system and that combinations, such as cut + particle, directly activate their conventional meanings. In addition, direct objects of cut that have certain properties, such as being an object with salient parts, or being a scale amount (e.g., cut costs), can combine with cut to directly activate a conventionalized sense such as "reduce in amount".
The need for distinct meaning-representations that correspond to word combinations rather than words may be clearest to some readers for idioms such as
shut up and out of sight, yet is necessary for handling many types of valence-match combinations, such as verb + particle (as in write off) or verb + high-valence-matched noun phrase (such as open the door)¹. Once we accept that linguistic concepts are represented in meaning-chunks that correspond to language units larger than the word, it is only a short conceptual leap to the view expressed by Figure 1B, wherein the word is no longer the privileged entry-point, but 2-, 3-, or 4-word combinations may be the size of unit that either initiates or achieves lexical access. On this view, there is a continuum of context-independence, with some words tightly associated with their typical neighbors, and others relatively independent.
But what accounts for the enduring appeal of the notion that words are a privileged unit of mental representation? I suggest that words are privileged, not because of special ontological status, but because the word is the size of unit which maximizes a trade-off between frequency of usage and constancy of interpretation.
To explain what I mean by a trade-off between usage frequency and constancy of meaning, I will recruit some data from my ongoing study of the polysemies of the word cut. I first investigated how many left-and-right context words were required for native speakers to identify the intended sense of cut. All instances of cut (nouns, verbs and adjectives) from the Brown University corpus (Francis & Kucera 1989) and the Lancaster University corpus were extracted in a manner that preserved five words of left context and five words of right context, to yield 11-word discourse fragments, with cut being the central word. Two 22-year-old native English speakers were given 15 minutes of training on how to categorize cut utterances into a classification system of cut senses similar to that described in Harris & Touretzky (1991). Some examples of the classifications made by raters are listed in Table 1.
The two raters each judged the sense of 231 utterances in four separate sessions that took 40 minutes each to complete. Raters sat in front of a Macintosh which controlled stimulus display and stored reaction times. Raters first saw a three-word utterance in which cut was the central word. After making a judgement of what sense of cut it was, they indicated their degree of confidence in this judgement by hitting a key for either "guess", "some confidence" or "know for sure".
1) The term "valence-match" refers to word combinations having a close semantic fit between a predicate and its arguments (Brugman 1988; MacWhinney 1989). Examples include subcategorization and selectional restrictions, as well as matches between the semantics of prepositions and their direct objects, such as in the cupboard and over the hill.
At this point the computer presented an additional right and left neighbor, and raters again selected a sense and gave their confidence rating. For each utterance there were a total of 5 increments of context to make up the 11-word discourse fragment, and thus 5 sense judgements for each utterance. Figure 2 shows the percent of utterances that were rated as either "know for sure" or "some confidence" with each increment of context. The absolute number of each type of judgement at the various increments obviously depends on task demands, such as the pressure raters may have felt to say "guess" at the 3-word and 5-word fragments to avoid the embarrassment of later reversals of judgement. Nevertheless, what is important is that there is no fixed amount of context necessary for determining the sense of cut. Instead, we have a continuum of contextual dependency.
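The stimulus presentation described above, with one additional left and right neighbor per step, can be sketched as follows. This is a reconstruction for illustration only, not the software used in the study; the function name and whitespace tokenization are assumptions, and the example fragment is taken from Table 1.

```python
def context_increments(tokens, target_index, max_neighbors=5):
    """Return nested fragments around tokens[target_index].

    The k-th fragment keeps k left and k right neighbors, so the fragments
    grow from 3 words up to 2 * max_neighbors + 1 words (fewer near an edge).
    """
    fragments = []
    for k in range(1, max_neighbors + 1):
        start = max(0, target_index - k)
        end = min(len(tokens), target_index + k + 1)
        fragments.append(tokens[start:end])
    return fragments

if __name__ == "__main__":
    # Naive whitespace tokenization, so "rotors." counts as one word.
    tokens = "by the rotors. This was cut down to a minimum by".split()
    for fragment in context_increments(tokens, tokens.index("cut")):
        print(len(fragment), " ".join(fragment))
```

For an 11-word fragment this yields the five presentation steps of 3, 5, 7, 9 and 11 words.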
Figure 2 (x-axis: Number of Words in Sentence Fragment). Raters successively judged the sense of 231 cut utterances with just one left-and-right neighbor (meaning that 3 words were in the sentence fragment), or with 2, 3, 4 or 5 left-and-right context words. At each increment they rated the confidence of their sense selection as either "guess", "know with some confidence" or "know it for sure". In 97% of fragments with the maximal amount of context (5 left-and-right context words), raters estimated they either knew the sense for sure, or knew it with "some confidence".
Although the linguistic unit we call the "word" cannot be accorded building-block status on the basis of constancy of meaning, factors such as frequency may make it the "right-sized" unit for mental manipulation and mental representation. Intuitively, the larger the unit, the less frequently the entire unit will appear in spoken and written texts. While whole sentences do repeat themselves in the ambient language (especially colloquialisms or other fixed expressions, such as Easy does it!), they repeat themselves far less frequently than word combinations or single words.
Table 1
Discourse Fragment | Sense Selected
jar lids, omitting design disk. Cut a notch in lid for | penetrate
He waved at Fox to cut off the finale introduction. The | eliminate
were older two-story mansions, now cut up into furnished rooms and | section
Wars an Austrian threat to cut off supplies of coal to | sever connection
by the rotors. This was cut down to a minimum by | reduce
but you don't look exactly cut out for this life. Still | shape/formed
A comparison of the frequency of cut to the frequency of cut in combinations is illustrated in Figure 3. Cut appears 208 times in the Brown corpus. The most frequent cut combinations of size 2 include cut the (17 occurrences), cut off (16 occurrences), cut down (12 occurrences), cut in (9), cut his (9), cut across (8), cut to (7), cut through (6), cut it (6), cut up (5), cut into (4), cut from (4) and cut over (2). Mean frequencies were calculated for all occurrences of cut combinations of sizes 2, 3, and 4. The "Log Frequency" curve represents the frequency, on a logarithmic scale, of the single word cut (log of 208 = 5.33) and the mean frequencies of cut combinations of size 2, 3, and 4. Superimposing the Log Frequency curve over the curve from Figure 2 illustrates the idea that a unit that is about the size of either the word or a valence-match combination may gain special representation or access status due to an optimal interaction between frequency and constancy of meaning.
Words' contextual stickiness
Language acquisition researchers have noted that children usually first acquire words in one context of use, such as only saying bye bye! when guests drive away in a car, or when the word deep is first restricted to describing puddles, and only later understood to be applicable to swimming pools (Clark 1983). Children also often learn a whole phrase as one unit, only later having the ability to use the parts out of their original linguistic context, as in the
demand many children can make at 15 months of age, Iwandat! (Bates, Bretherton & Snyder 1989). With time and linguistic practice, words do of course unstick from their original linguistic and extralinguistic environments, but it is likely that many words never entirely "unstick". This is most clearly seen with low-frequency words such as paragon, a word whose meaning may be retrievable to some speakers only in its typical linguistic context, paragon of virtue.
Figure 3 (x-axis: Size of Fragment, right context only). The single word cut appears 208 times in the Brown corpus. The most frequent cut collocations of size 2 include cut the (17 occurrences), cut off (16 occurrences), cut down (12 occurrences), cut in (9), cut his (9), cut across (8), cut to (7), cut through (6), cut it (6), cut up (5), cut into (4), cut from (4) and cut over (2). Mean frequencies were calculated for all occurrences of cut collocations of sizes 2, 3 and 4. The "Log Frequency" curve represents the frequency, on a logarithmic scale, of the single word cut (208 times in the Brown corpus) and the mean frequencies of cut collocations of size 2, 3 and 4. This curve was superimposed over the curve representing the amount of context necessary to be sure of the sense of cut, in order to illustrate that while the word is not the optimal size of chunk for determining sense, it is more nearly optimal in terms of frequency of access.
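The frequency side of this trade-off can be sketched as a count over right-context combinations. The code below is a minimal illustration, not the procedure actually used on the Brown corpus: the function names and the toy token list are invented, and the natural logarithm is assumed only because the text gives the log of 208 as 5.33.

```python
from collections import Counter
from math import log

def combination_counts(tokens, head="cut", size=2):
    """Count every run of `size` tokens that begins with `head`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == head and i + size <= len(tokens):
            counts[tuple(tokens[i:i + size])] += 1
    return counts

def mean_log_frequency(tokens, head="cut", size=2):
    """Natural log of the mean frequency of combinations of a given size."""
    counts = combination_counts(tokens, head, size)
    if not counts:
        return None  # no combination of this size occurs in the sample
    return log(sum(counts.values()) / len(counts))

if __name__ == "__main__":
    # Invented toy text standing in for a corpus.
    toy = "cut off the branch then cut off the power and cut the rope".split()
    for size in (1, 2, 3, 4):
        print(size, dict(combination_counts(toy, size=size)),
              mean_log_frequency(toy, size=size))
```

Even in this toy sample the mean frequency falls as the unit grows, which is the downward-sloping half of the comparison.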
A second sign of words' contextual stickiness comes from psycholinguistic evidence that words are more easily accessed and more quickly understood in conventional contexts (see Van Petten & Kutas 1991, for a review as well as relevant electrophysiological data). Models of the mental lexicon typically incorporate information about words' typical contexts of co-occurrence by positing spreading-activation links between semantically and thematically related words. It is assumed that these links are built up out of speakers' years of experience with words in diverse contexts. But how these links are obtained from experiential corpora has never been described. In the next section I suggest how the memory-based (or coarse-coding) view of the lexicon may be able to explain this.
Why don't words with many meanings, or one abstract meaning, pose a comprehension burden?
If multiple senses of a polysemous word are represented in the lexicon, then the language listener is burdened with the task of selecting the current sense from all of those listed in the lexicon. On the other hand, if the lexicon contains only a maximally abstract encoding, along with an abstract representation of allowable arguments, our challenge is to articulate the rules of contextual inference allowing the concept "steal" to be inferred from sentences such as The thief took the jewels (Jackendoff 1982; Miller & Johnson-Laird 1976). On both accounts, words which can potentially cover a large semantic territory should impose a comprehension burden, yet studies have failed to find that sentences containing these words are more difficult to understand than sentences containing words with more specific senses (Millis & Button 1989). One explanation for this might be that the extra processing burden of matching the multiple senses of a polysemous word to that word's context is obscured by the processing advantage of being high in frequency, as the majority of polysemous words are (Gernsbacher 1984).
I agree that the high frequency of polysemous words is part of the reason for their continued use, but would like to add that in many cases (although not all), polysemy does not pose a comprehension burden because the unit that initiates lexical access includes disambiguating lexical neighbors. Cut doesn't activate all its possible senses, because the system begins lexical access with cut in, cut up, cut down or the like. (In addition, we don't have to propose additional machinery to explain how the appropriate sense of cut up is obtained from the listing of meanings for cut and up.) Many theorists recognize that the semantics of verb + particle combinations is such that these combinations may require lexical entry status. But as long observed by Fillmore (1988) and more recently pointed out by Jackendoff (1992), granting lexical entry status to verb + particle combinations will take care of these obvious cases, but does nothing for the myriad other noncompositional conventional collocations.
But my claim is more than the idea that verb + particle has the status of lexical entry. My view (following Langacker 1987) is that there is no predefined limit on what amount of phonological signal can be used to activate a stable interpretation. Instead, there are mappings from larger combinations (phrases, even sentences) to stable interpretations, with varying degrees of componentiality within the larger combinations. Frequency of occurrence, and reliability of the form-meaning mappings, are candidates for the factors that determine what parts of the speech stream come to be represented in a relatively context-free manner.
Computational realization
In what type of representational system could these ideas be computationally realized? We desire a system with the following properties.
1. Stable (i.e. conventionalized) form-meaning mappings can exist over linguistic units that vary in size, from sub-word units (morphemes and phoneme clusters with meaning connotations, such as English umble) to multi-word combinations (including valence-match pairs, and idioms and other collocations).
2. The associations between forms and their meanings are sensitive to horizontal co-occurrence statistics as well as the variety of meanings that a word can evoke (MacDonald 1992; Juliano, Trueswell & Tanenhaus 1992). Horizontal co-occurrence statistics are the frequent left and right neighbors of a word (as well as categorial abstractions over these neighbors).
I have recommended conceiving of the lexicon as a memory-based system in which an extremely large number of form-meaning associations is stored. Storing many examples of a word in its different contexts, each with its context-appropriate meanings, ensures that the system contains information about what types of contexts are paired with what types of meanings. Distributed representations give us the ability to abstract over these pairings.
If we idealize a language to be a distributed set of form-meaning pairs, we have a method for predicting what invariances will be extracted, and how the degree of specificity of an extracted invariance is related to the pool of utterances it summarizes. The extracted invariances will be those that were instrumental in learning the training corpus, and will thus be dependent on the type and token frequencies of pattern-set exemplars, and on the representational resources of the network (i.e., number of weights). These invariances will naturally include the semantic and thematic associations between words that psycholinguists have long observed in priming and reading-time experiments. In the next section, I illustrate some aspects of this proposal by describing a simple connectionist network of prepositional polysemy.
An Illustrative Model
In Harris (1994) distributed representations were used to model the mapping from a sentence containing polysemous prepositions to a representation of the sentence's meaning. Prepositional polysemy was selected as the example problem because the mapping from spatial expressions to their interpretation contains regularities which vary in their scope of application (Brugman 1988; Hawkins 1984). These regularities could be described by rules, although they would have to be rules that either have exceptions or have very specific conditions of application. Alternatively, the regularities in mapping could be described by a constraint-satisfaction system. I took the approach that the constraints emerge from the matrix of stored utterance-meaning pairs (Langacker 1987).
Construction of corpus and training
A total of 2617 sentences of the form subject verb {over, across, through, around, above, under} object were constructed using 81 vocabulary items. The training corpus consisted of these sentences paired with hand-coded feature vectors identifying salient semantic properties of the sentence's gestalt meaning. These features included domain features (that is, in which cognitive domain does the profiled relation exist: the domain of space, of time, of money, of interpersonal power, of mental concepts), dimensionality and other salient properties of the primary figures (that is, the sentence subject and the object of the preposition, also called the figure and ground, or trajector and landmark), and type of path (curved, end-point-focus). Figure 4 depicts the network architecture, while Table 2 lists some of the sentence templates (word combinations associated with semantic features) that were used to generate the corpus.
Table 2
Sentence Templates | Types of Features Assigned
{road, fence, wires, river} stretched around {river, tunnel, road, corner, building, hill} | space curved-path 1-D static obstructing-object
snow lay across {blanket, bridge, river, tunnel, road} | space static 2-D extension
{tree, building} stood across {field, road} | space path static end-point-focus
{hiker, children, conspirators} lived around corner | space curved path 1-D static end-point focus
{hiker, children, conspirators} spent over {$100, $1000} | money below
{hiker, children, soldier, conspirators} was under captain | power below
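One way to picture how templates such as those in Table 2 yield utterance-meaning pairs is sketched below. The chapter does not say how the templates were expanded, so the exhaustive cross-product of subjects and objects is an assumption of this sketch, as are the names TEMPLATES and expand. Feature strings are copied from Table 2 without being split into individual features.

```python
from itertools import product

# Two templates from Table 2: (subjects, verb, preposition, objects, features).
TEMPLATES = [
    (["snow"], "lay", "across",
     ["blanket", "bridge", "river", "tunnel", "road"],
     "space static 2-D extension"),
    (["hiker", "children", "conspirators"], "spent", "over",
     ["$100", "$1000"],
     "money below"),
]

def expand(template):
    """Pair every subject/object filling of a template with its features.

    Assumption: all fillings are generated; the original corpus may have
    used a different or filtered expansion.
    """
    subjects, verb, prep, objects, features = template
    return [((subj, verb, prep, obj), features)
            for subj, obj in product(subjects, objects)]

corpus = [pair for template in TEMPLATES for pair in expand(template)]

if __name__ == "__main__":
    for sentence, features in corpus:
        print(" ".join(sentence), "->", features)
```

Each resulting pair corresponds to one input-output item of the kind described above: a four-word "sentence" and the feature description assigned to its template.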
A goal in constructing the corpus of utterance-meaning pairs was to include words which fall on a continuum of polysemy and which vary in the predictability of their left and right neighbors. I chose 4 prepositions which are relatively highly polysemous {over, across, through, around} and 2 that intuitively have fewer different senses {above, under}. The corpus was also constructed to contain verbs that had either little polysemy (1 to 4 senses), medium polysemy (6 to 10 senses), or high polysemy (13 to 20 senses). In this corpus, for example, the verbs cost and spent had only three senses, corresponding to whether they co-occurred with the prepositions over, under or around. At the other extreme, verbs such as ran and lay occurred with diverse polysemes of all the prepositions.
Figure 4. The network was trained to associate vectors representing word combinations of the form Subject Verb Preposition Object with vectors representing salient semantic features of the meaning of these word combinations.
A relatively large number of vocabulary items was included to incorporate into the model both spatial and non-spatial senses of the prepositions. The network was trained on the corpus using back-propagation (Rumelhart, Hinton & Williams 1986) until error asymptoted (at 25,000 cycles, roughly 10 training cycles per input-output item).
Network behavior
Category abstraction. In previous work (Hinton 1986; Harris 1990 and others) it has been observed that the hidden units of networks trained by back-propagation self-organize to categorize aspects of the input vector which participate in similar relations with other parts of the input, or which are paired with similar outputs. For example, some of the hidden units may evolve to have
identical activations for the items tunnel, woods and field, to capture the regularities in sentences differing only by this word, such as hiker walked through tunnel, hiker walked through woods, and hiker walked through field. Because of the large number of patterns in the corpus described above, the network can best decrease error by evolving hidden units that are selectively activated by items that participate in distributional regularities. For convenience, I'll refer to these hidden-unit organizations as categories. The categories formed by the network during training will vary in their specificity according to the demands of the regularities in the input-output patterns. For example, one of the main distinctions in verbs was whether they participated in spatial or mental relations. The mental verbs read, thought, argued and talked participated in very similar vectors and were thus categorized by the network without further subdivisions. In contrast, the spatial verbs (stood, is, lived, arrived, came, got, flew, moved, walked, ran, lay, and stretched) were similar and dissimilar to each other depending on the other words in the sentence. The verb ran behaved similarly to flew, moved, and walked (in denoting motion) when these words occurred with agentive subjects (soldier, conspirators, children, hiker, birds). But ran behaved similarly to another set of verbs (stretched, lay and was) when it occurred with non-agentive subjects such as road, river and fence. Context dependency. As just described, the network's hidden units did function as abstractions over items that fall in specific sentence positions. However, a more striking feature of the hidden-layer organization was that hidden units were always jointly activated by words in different sentence positions. 
For example, the hidden units that became strongly activated by the input node for walk were also activated both by items such as run and move as well as input items which commonly occurred with these motion verbs, such as agentive subjects, the path prepositions over and across, and prepositional objects such as hill and yard. The network appeared to take two solutions to the problem of polysemy. It created internal categories corresponding to sentence-size templates, and it evolved hidden units which conflated semantic attributes of various senses of the polysemous words with their typical contexts of occurrence.
To examine the extent to which the hidden units illustrate the "continuum of context dependency" I analyzed all weights extending from the inputs to the hidden layer in the following manner. An input unit was classified as activating a given hidden unit if the weight from the input to the hidden unit was greater than the hidden unit's bias weight. In this network, a word's degree of context-dependency is encoded by the extent to which the word activates hidden units which are also activated by its frequent left and right neighbors. To quantify this, the number of hidden units which were activated both by a preposition and each of the preposition's possible direct objects was calculated. (Keep in mind that the
activation of hidden units is akin to the network's encoding of the meaning of each word: the regularities in its co-occurrences with the semantic feature vector.) The graphs in Figure 5 plot the number of hidden units jointly activated by prepositions and 17 selected direct objects. We can see that items which frequently occur together (such as frequent direct objects of over and across, the items building, hill and bridge) jointly activate more hidden units than items which don't occur together, such as the non-occurring combinations over the book and across the contract. Note that above doesn't share strong encoding with any of the direct objects. This is consonant with above's relative context-independence.
Figure 5. Illustration of the continuum of valence-match encoding for four of the prepositions and 17 of the prepositional direct objects. Graphs plot the number of hidden units which were activated by both a particular preposition and the listed prepositional object. Note that frequent direct objects of over and across, such as hill and river, tend to activate the same hidden units that are activated by these prepositions. The input node for above can be viewed as less context-dependent than the other graphed prepositions in that there are fewer hidden units which are simultaneously activated by above and a prepositional object. The relatively high number of hidden units simultaneously activated by the inputs around and problem is due to the presence in the training corpus of the quasi-idiom got around the {problem, contract}.
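The weight analysis behind Figure 5 can be sketched in a few lines. The criterion is the one stated in the text (an input activates a hidden unit if its weight to that unit exceeds the unit's bias weight); everything else here, including the function names, the four hidden units and all numerical values, is invented for illustration and is not taken from the trained network.

```python
def activated_units(input_weights, biases):
    """Indices of hidden units whose incoming weight exceeds their bias weight."""
    return {j for j, (w, b) in enumerate(zip(input_weights, biases)) if w > b}

def shared_units(weights, biases, item_a, item_b):
    """Hidden units activated by both input items."""
    return (activated_units(weights[item_a], biases)
            & activated_units(weights[item_b], biases))

# Invented bias weights for four hidden units, and invented input-to-hidden
# weights for four input nodes (one list entry per hidden unit).
BIASES = [0.5, 0.2, 0.9, 0.1]
WEIGHTS = {
    "over":  [0.8, 0.6, 0.1, 0.4],
    "hill":  [0.7, 0.3, 0.2, 0.0],
    "above": [0.1, 0.1, 1.2, 0.0],
    "book":  [0.0, 0.1, 0.3, 0.05],
}

if __name__ == "__main__":
    for prep, obj in [("over", "hill"), ("over", "book"), ("above", "hill")]:
        print(prep, obj, len(shared_units(WEIGHTS, BIASES, prep, obj)))
```

In this toy, over and hill share two hidden units while over and book share none, which is the kind of contrast the graphs are meant to display.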
Varied size of receptive field. The number of hidden units activated by an input unit can be called that input's "coarse-coding count" and viewed as the size of that item's "receptive field". The coarse-coding counts for the 81 input nodes varied from 0 to 12. The four "high polysemy" prepositions had coarse-coding counts of 5 to 6, while the "low polysemy" prepositions (above and under) only activated 2 hidden units each. The three inputs which activated no hidden units were cost, spent and had_authority. Why would a word activate no hidden units? A word need not activate any hidden units if the output vector that occurs in all the word's contexts is totally predictable from the other items in the input vector. In natural language this is seldom, if ever, the case, but in the relatively artificial data set constructed for this simulation, the verbs cost, spent and had_authority added no information to the other words in the vector. All input vectors containing cost and spent also had either $100 or $1000 as the preposition's direct object. The verb had_authority always occurred in the context of person had_authority over person, a pattern which always activated the feature specifying the power domain.
The correlation between the number of distinct senses of a word in the training corpus and that item's coarse-coding count was only 0.29. The sheer diversity of environments, independent of the question of number of senses, appeared to be the crucial factor for an item to activate a large number of hidden units. For example, the verbs came and got were in the medium polysemy group, yet after training ended up with a high coarse-coding count. Although I did not construct the training set to incorporate distinct senses for came and got, the fact that they could occur with the four highly polysemous prepositions meant that they ended up associated with a large number of meaning vectors.
It would be helpful to control the implementation of "number of distinct senses" and "diversity of contexts" more rigorously, and to think about whether it is important for our theories of the human lexicon to be sensitive to this difference.
Assessment of the implementation of prepositional polysemy
A drawback of the implementation just described is that words are identified with specific input nodes. A superior design would pair a phonological representation with semantic-feature vectors. This would allow word-sized phonological chunks to come to activate a distinct hidden-unit activation pattern to the extent that these word-sized chunks were predictive of meaning independent of their context. On the positive side, the current implementation illustrates several aspects of the theoretical proposal described in the first section of this chapter.
- It implements the idealization of language as a set of associations between form and meaning, where "forms" are grammatical word combinations rather than words. Words in this model were not directly associated with specific meanings (except for the trajectors and landmarks, which were always paired with a few features in the output vector regardless of the other items in the input vector). Instead, the network was trained to associate an entire "sentence" of the form subject verb preposition object with a semantic feature vector which encoded the relationship holding between the subject and object.
- It exhibits a continuum of context-dependency which reflects co-occurrence statistics, as shown by the degree to which the hidden units activated by a particular word are the same as those activated by the words with which it typically co-occurs.
- It presents a metaphor for conceptualizing the differences among, and the connection between, linguistic and non-linguistic aspects of meaning. The input nodes play the role of information about the form (sound or inscription) of words. The output nodes can be analogized to nonlinguistic meaning, including synapses to neurons that activate long-term memories and motor outputs. The weight matrices interposed between these two can be conceptualized as the linguistic aspects of meaning: the categories, rules and mappings that mediate between completely arbitrary representations (the forms of words) and the conceptual structures they ultimately evoke.
Processing Factors
According to the coarse-coding proposal presented in this chapter, the problem of sense selection is minimized in natural language comprehension because processing units which encode a word's diverse senses also encode the word's typical linguistic contexts. Once we have removed the problem of why polysemy does not tax comprehension, we are left with reasons why polysemy is commonsensical. Intuitively, words that cover a large semantic territory will be used often because they fit more communicative contexts: high frequency derives from applicability. An additional factor is the correlation between frequency and ease of lexical access.
Usage frequency of a word, as measured by word counts in written texts (or by speakers' ratings of familiarity), is the most consistent predictor of reaction time in naming and lexical decision tasks (i.e., deciding if a letter string is a word; Forster 1981; Gernsbacher 1984).
One significant processing cost is the difficulty of finding the phonological representation of a word or words given the intent to communicate a specific concept or set of concepts. This link between meaning and sound in producing a sentence has been hypothesized to be the weak link in the chain of linguistic processing (Bates & Wulfeck 1989). If this is indeed a weak link, then words that are easily accessible will be maintained in the lexicon. The characteristics of words that facilitate access are likely to include high frequency, high valence match with adjoining words, and high routinization of word combinations.
Two types of evidence suggest that the link between meaning and sound is likely to be the weakest link in sentence processing. Cognitive psychologists have shown that the more arbitrary an association, the more difficult it is to learn, the more vulnerable it is to forgetting, and the more its access is facilitated by frequency of exposure (Anderson 1983). Aphasiologists have suggested that anomia (word-finding difficulty) may be the common ingredient in the diverse types of aphasia (Bates & Wulfeck 1989). One method for protecting the weak link between meaning and access to the sound pattern is to increase the frequency of access. Words that are repeatedly used will be more likely to remain in the language. Polysemy is thus a feature that decreases speakers' processing costs by decreasing the effort necessary for access.
Above we noted that generality of meaning, or a high number of distinct meanings ("wide receptive field"), leads to high usage frequency. But causality may flow in the other direction. Words that are highly frequent are those that will be easiest to access, and thus are likely to be extended into new semantic territory, either to fill a new semantic niche that has appeared due to technological or cultural innovation, or to supplant existing words that may be harder to access because of their lower frequency.
Pressures opposing polysemy. Not all words are highly polysemous. One force at work is likely to be speakers' goal of maximizing communicative impact. All animal species grow accustomed to the commonplace and dishabituate to novelty (Lehmann 1985). Historical linguists refer to the "semantic bleaching" that accompanies the extension of a word into new semantic territories (Sweetser 1988). To increase impact, speakers are motivated to recruit old words to new uses, or to coin new lexical items. A second force for new coinages and for words with a restricted semantic range is that coarse-coding schemes have disadvantages when fine semantic distinctions are required. Many "overlapping receptive fields" need to be present to pinpoint a very precise meaning.
If each of these receptive fields is a word, then the string of words necessary to convey a specific meaning would place processing burdens on speaker and hearer. Speakers will be motivated to use less phonological material, a motivation which may lead to the coining of new words.
Closing Remarks, Related Approaches
The short form of my proposal is that both language form and meaning are stored in chunks larger than a word, and that therefore the meaning of words is usually tightly linked to their typical contexts of occurrence. If we think it probable that a similar "microstructure of cognition" underlies linguistic as well as nonlinguistic abilities, then the continuum of context-dependency one observes in looking at word meaning is just a linguistic manifestation of a continuum that is omnipresent in other cognitive domains.
I've proposed that the basic representational structure underlying linguistic knowledge is an associative pairing between actual sentences and their meanings. To the extent that humans possess templates, schemas, categories and rules, these are abstractions over stored utterance-meaning pairs. Among readers sympathetic to this proposal, I anticipate there will be those who find it so commonsensical as to render exposition unnecessary, while others may find much of it a restatement of ideas already thriving in the literature. I happily grant the latter point and will identify below at least a few of the theorists with whom these ideas originated. But it is worth emphasizing that the memory-based approach is strongly at odds with conventional wisdom about the nature of language.
A succinct statement of this conventional wisdom is Aitchison's (1991) explanation (to a general audience) that generalizations, not utterance-meaning pairs, are the fundamental representational structure of linguistic knowledge. "A language such as English does not have, say, 7,123,541 possible sentences which people gradually learn, one by one. Instead, the speakers of a language have a finite number of principles or 'rules' which enable them to understand and put together a potentially infinite number of sentences" (p. 14). After presenting examples of phonological and morphological rules, Aitchison concludes, "In brief, humans do not learn lists of utterances. Instead, they learn a number of principles or rules which they follow subconsciously."
I have proposed that utterances are precisely what humans learn. Generalizations are hard-won, and are extracted only to the extent that they are licensed by statistical regularities, and existing abstractions over statistical regularities.
The importance of people's store of expressions has been expressed by Langacker (1987) as follows:
The grammar lists the full set of particular statements representing a speaker's grasp of linguistic convention, including those subsumed by general statements. Rather than thinking them an embarrassment, cognitive linguistics regard particular statements as the matrix from which general statements (rules) are extracted (p. 46).
The present proposal can be viewed as a logical extension of the approach to morphology championed by Joan Bybee (1985). Bybee notes that the traditional concern of the field of morphology, dividing words into parts and assigning meaning to the parts, fails because it is impossible to find boundaries between morphemes, and because morphemes change shape in different environments. If the field of morphology cannot aspire to finding a one-to-one relation between semantic units and their phonological expression, what should be morphologists' goal? Bybee takes on the task of explaining deviations from one-to-one correspondence in terms of general cognitive characteristics of human language users. One of these is humans' ability to use rote processing even for forms that could be morphologically decomposed, and the tendency for frequency of occurrence to be the key characteristic that supports rote processing.
I am not the first to import the coarse coding metaphor from visual processing to the domain of word meaning. Drawing on findings that patients with right hemisphere lesions have problems understanding nonliteral and pragmatic implications of linguistic expressions, cognitive neuroscientists Beeman, Friedman, Grafman, Perez, Diamond & Lindsay (1992) hypothesized that the right hemisphere, more than the left hemisphere, may coarsely code semantic information. By "greater coarse coding" in the right hemisphere the authors mean that large semantic fields are weakly activated, allowing concepts that are more distantly related to the input word to become activated. This "long distance" activation could be what mediates the RH's ability to obtain more than one interpretation of a word or phrase, or to obtain the pragmatic point implicit in the meaning of several words. To test this hypothesis, Beeman et al. conducted a hemifield priming experiment. Subjects read target words preceded either by three weakly related primes or by one strong prime (flanked by two unrelated words). When naming targets presented to the right hemisphere, subjects benefited equally from both prime types, but benefited more from the one strong prime when naming targets presented to the left hemisphere.
Why the two hemispheres differ in their degree of sensitivity to semantic overlap of multiple words remains to be addressed. Finding evidence that it reflects the conflicting demands of precision and creativity would do much to legitimize the notion that the structure of language is an exquisite interplay of speakers' diverse communicative needs.
Philosophers of science have long noted that the right theoretical metaphor can do much to invigorate even an ancient field.
It is too early to know if the coarse coding metaphor will propel the field of lexical semantics, but an increased understanding of the nature of semantic continuums will do much to change polysemy from problem to delight for students of language.
REFERENCES
Aitchison, J. 1991. Language change: Progress or Decay? 2nd Edition, Cambridge: Cambridge University Press.
Anderson, J. 1983. The architecture of cognition, Cambridge, MA: Harvard University Press.
Bates, E.A. & B. Wulfeck. 1989. Crosslinguistic studies of aphasia. In B. MacWhinney & E.A. Bates (eds.), The crosslinguistic study of sentence processing, New York: Cambridge University Press.
Bates, E.; I. Bretherton & L. Snyder. 1988. From first words to grammar: Individual differences and dissociable mechanisms, Cambridge: Cambridge University Press.
Beeman, M.; R.B. Friedman, J. Grafman, E. Perez, S. Diamond & M.B. Lindsay. 1992. Summation priming and coarse coding in the right hemisphere. Paper presented at the 33rd Annual Meeting of the Psychonomic Society, St. Louis, MO.
Bolinger, D. 1976. Meaning and memory. Forum Linguisticum, 1, 1-14.
Brugman, C. 1988. The story of 'over': Polysemy, semantics and the structure of the lexicon, New York: Garland Publishing.
Clark, E.V. 1983. Meanings and concepts. In J.H. Flavell & E.M. Markman (eds.), Cognitive Development, Volume III of the Handbook of Child Psychology, New York: John Wiley.
Deane, P.D. 1988. Polysemy and cognition. Lingua, 75, 325-361.
Deane, P.D. 1992. Multimodal semantic representation: on the semantic unity of over and other polysemous prepositions. Ninth Annual Meeting of the Eastern States Conference on Language, SUNY, Buffalo.
Fillmore, C.J. 1988. On grammatical constructions. Unpublished manuscript, Department of Linguistics, University of California, Berkeley.
Forster, K.I. 1981. Lexical access and lexical decision: Mechanisms of frequency sensitivity. Journal of Verbal Learning and Verbal Behavior, 22, 22-44.
Francis, W.N. & H. Kucera. 1982. Frequency analysis of English usage: lexicon and grammar, Boston: Houghton Mifflin.
Gernsbacher, M.A. 1984. Resolving 20 years of inconsistent interactions between lexical familiarity and orthography, concreteness and polysemy. Journal of Experimental Psychology: General, 113, 256-281.
Givon, T. 1989. Mind, code and context: Essays in pragmatics, Hillsdale, NJ: Erlbaum.
Grimshaw, J. 1990. Argument structure, Cambridge, MA: MIT Press.
Harris, C.L. 1990. Connectionism and cognitive linguistics. Connection Science, 2, 7-34.
Harris, C.L. 1991. Parallel Distributed Processing Models and Metaphors for Language and Development. Ph.D. Dissertation, University of California, San Diego.
Harris, C.L. 1994. Back-propagation representations for the rule-analogy continuum. In J. Barnden & K. Holyoak (eds.), Analogical Connections, Norwood, NJ: Ablex. Vol. II, pp. 282-326.
Harris, C.L. & D.S. Touretzky. 1991. Verbal polysemy as a knowledge representation problem. Paper presented to the Second International Cognitive Linguistics Conference, Santa Cruz, CA.
Hawkins, B. 1984. The semantics of English spatial prepositions. Ph.D. Dissertation, University of California, San Diego.
Hinton, G.E.; J.L. McClelland & D.E. Rumelhart. 1986. Distributed representations. In D.E. Rumelhart & J.L. McClelland (eds.), Parallel distributed processing: Explorations in the microstructure of cognition, vol. 1, Cambridge, MA: MIT Press.
Hinton, G.E. & T. Shallice. 1991. Lesioning an attractor network - Investigations of acquired dyslexia. Psychological Review, 98, 74-95.
Jackendoff, R.S. 1983. Semantics and cognition, Cambridge, MA: MIT Press.
Jackendoff, R.S. 1992. The boundaries of the lexicon, or, if it isn't lexical, what is it? Ninth Annual Meeting of the Eastern States Conference on Language, SUNY, Buffalo.
Juliano, C.; J.C. Trueswell & M.K. Tanenhaus. 1992. What can we learn from "That"? Paper presented at the 33rd Annual Meeting of the Psychonomic Society, St. Louis, Missouri.
Lakoff, G. 1987. Women, fire, and dangerous things: What categories reveal about the mind, Chicago: Chicago University Press.
Lakoff, G. & M. Johnson. 1980. Metaphors we live by, Chicago: Chicago University Press.
Langacker, R.W. 1987. Foundations of cognitive grammar, vol. I: Theoretical prerequisites, Stanford, CA: Stanford University Press.
Leech, G. & R. Leonard. 1974. A computer corpus of British English. Hamburger Phonetische Beiträge 13, 41-57.
Lehmann, C. 1985. Grammaticalization: synchronic variation and diachronic change. Lingua e Stile 20.
MacDonald, M.C. 1992. Multiple constraints on lexical category ambiguity resolution. Paper presented at the 33rd Annual Meeting of the Psychonomic Society, St. Louis, Missouri.
MacWhinney, B. 1989. Competition and lexical categorization. In R. Corrigan, F. Eckman & M. Noonan (eds.), Linguistic Categorization, Amsterdam: Benjamins.
McClelland, J.L.; D.E. Rumelhart & G.E. Hinton. 1986. The appeal of parallel distributed processing. In D.E. Rumelhart & J.L. McClelland (eds.), Parallel distributed processing: Explorations in the microstructure of cognition, vol. 1, Cambridge, MA: MIT Press.
McClelland, J.L. & A.H. Kawamoto. 1986. Mechanisms of sentence processing: Assigning roles to constituents. In J.L. McClelland & D.E. Rumelhart (eds.), Parallel distributed processing: Explorations in the microstructure of cognition, vol. 2, Cambridge, MA: MIT Press.
Meillet, A. 1948. L'évolution des formes grammaticales. In Linguistique historique et linguistique générale, Paris: Champion.
Miller, G. & P.N. Johnson-Laird. 1976. Language and perception, Cambridge, MA: Harvard University Press.
Miller, G. & C. Fellbaum. 1991. Semantic networks of English. Cognition, 41, 197-229.
Millis, M.L. & S.B. Button. 1989. The effect of polysemy on lexical decision time: Now you see it, now you don't. Memory and Cognition, 17, 141-147.
Pinker, S. 1989. Learnability and cognition, Cambridge, MA: MIT Press.
Pustejovsky, J. 1992. In B. Levin & S. Pinker (eds.), Lexical and conceptual semantics, Cambridge, MA: Blackwell.
Reddy, M.J. 1979. The conduit metaphor: A case study of frame-conflict in our language about language. In A. Ortony (ed.), Metaphor and thought, Cambridge: Cambridge University Press.
Ruhl, C. 1989. On monosemy, Albany, NY: SUNY.
Rumelhart, D.E.; G.E. Hinton & R.J. Williams. 1986. Learning internal representations by error propagation. In D.E. Rumelhart & J.L. McClelland (eds.), Parallel distributed processing: Explorations in the microstructure of cognition, vol. 1, Cambridge, MA: MIT Press.
Touretzky, D.S. & G.E. Hinton. 1988. A distributed connectionist production system. Cognitive Science, 12, 423-466.
St. John, M.F. & J.L. McClelland. 1988. Applying contextual constraints in sentence comprehension. Proceedings of the Tenth Annual Conference of the Cognitive Science Society, Hillsdale, NJ: Erlbaum.
Sweetser, E. 1990. From etymology to pragmatics, Cambridge: Cambridge University Press.
Van Petten, C. & M. Kutas. 1991. Electrophysiological evidence for the flexibility of lexical processing. In G. Simpson (ed.), Understanding word and sentence, Amsterdam: North-Holland, pp. 129-184.
CONTINUITY, POLYSEMY, AND REPRESENTATION: UNDERSTANDING THE VERB CUT
DAVID S. TOURETZKY
Carnegie Mellon University (School of Computer Science), USA
Introduction
In this chapter I would like to outline two senses of continuity, at entirely different levels of analysis, that one may encounter in the context of language understanding problems. Specifically I will look at the English verb cut, which is polysemous, imagery-laden, and open to numerous metaphorical extensions. The first hypothesis of this chapter is that while there may well be a small set of distinct senses of cut, when encountering the word in context we activate a blend of these senses, with some more primary than others. Even subliminally active senses may contribute to our understanding of an utterance, for example by priming future inferences.
A second hypothesis has to do with representation in connectionist networks, which are often touted as having a continuous flavor. The essential properties of localist and distributed representations are reviewed, and shown to be fundamentally at odds. The holy grail of connectionist knowledge representation, in my view, is to reconcile these two approaches. Until that occurs, connectionist representations will not be very brain-like, nor are they likely to contribute much to our understanding of polysemy.
1. The Verb cut
The principal senses of cut include sever, section, slice, excise, incise, dilute, diminish, terminate, traverse, and move quickly. Senses are frequently associated with syntactic particles, and their meanings often invoke image schemas (Langacker 1987), as shown below¹.
- sever a one-dimensional object: cut; or sever the distal end of a one-dimensional object: cut off.
- excise a two- or three-dimensional part from a whole: cut out.
- incise into a two- or three-dimensional whole: cut into.
- slice an object by dividing it into parallel sections perpendicular to its major axis or axis of symmetry, e.g., cut up a salami.
- section an object by dividing it radially through its axis of symmetry, e.g., cut a cake.
- dismember an object by dividing it into irregularly-shaped parts.
- diminish some quantity, e.g., cut the amount of sugar in a recipe.
- dilute a substance by mixing it with other substances, e.g., cut whiskey with water.
- terminate some action or process, e.g., cut off the flow of water from a pipe.
- traverse an area, e.g., the runner cut across the field.
- move quickly, e.g., the quarterback cut left to avoid a tackle.
1) This analysis was largely done by Catherine Harris, who has collaborated with me in investigating the meanings of cut.
The particle structure of cut is far richer than the above listing indicates. For example, cut up can mean slice, section, or dismember, depending on the direct object. The particle up can also invoke an image schema implying repetitive action over an area (Lakoff 1987; Brugman 1988), so that cut up can mean "multiply incised", as in The accident left him bruised and cut up. Similarly, cut down can mean either "sever" (as in cutting down a tree) or "diminish" (cutting down on salt in one's diet).
Cut also has many metaphoric uses, often involving image schemas. For example, That car cut me off evokes the "sever" sense of cut, with paths viewed metaphorically as one-dimensional objects. To be cut from the team means "excised", with the team (a social structure) viewed as a physical object one of whose components was removed. When someone's expertise cuts across several disciplines, we have a more complicated metaphor in which intellectual domains are viewed as physical regions, and spanning a domain is metaphorically described as motion through it; thus, the expertise "traverses" a broad region of intellectual space.
In many instances of the use of cut, a complete understanding of the meaning relies on multiple senses interacting synergistically. For example, in (1) below, the primary sense might be "excise", while in (2) it is "sever".
(1) John cut the apple from the tree
(2) John cut the boat from the dock
If asked to visualize the cutting in (1) and describe the literal object of the cut action, people generally report that what is being cut is not the apple, but rather the stem that binds the apple to the tree. This focuses on a secondary sense of cut, "sever", which applies to objects like stems that have one-dimensional extent.
In (2), "sever" is the most salient sense. People know that boats are typically connected to docks by mooring ropes, and ropes are one-dimensional (hence severable) objects. The use of a particle also provides an important cue: cut x from y is associated only with certain senses of cut. It is never used to express "incise" or "terminate", for example. Thus we see that the meaning of cut in context is determined by a combination of syntactic cues (particles), world knowledge (apples are connected to trees by stems, boats are connected to docks by ropes), and image schemas (things with one-dimensional extent are severable).
In the classical model of polysemy, the language understander picks one sense as the "correct" meaning of the word and discards the others. However, it's clear that many senses of cut contribute to the meaning of (3):
(3) John cut a piece from the cake
Here is a list of the relevant senses, roughly ordered by saliency:
- Sense 1: "excise" a part from the whole.
- Sense 2: "sever" the piece from the material it's attached to.
- Sense 3: "section" the cake.
- Sense 4: "incise". A knife (the default instrument for cutting cakes) enters the cake as part of the cut action.
- Sense 5: "traverse". The knife traverses the surface of the cake as part of the cut action.
- Sense 6: "diminish". The cake is diminished by having a piece removed.
The meaning of cut in this context can be regarded as a blend of the above senses, with different degrees of participation based on their respective salience. This blending operation is not some blind mechanical superposition; we don't confuse roles between the senses and think that the piece is being traversed or the knife is being diminished, nor do we confuse the construal of cake as "substance" to be severed with its construal as "radially symmetric object" to be sectioned.
Rather, the effect of the blending is to generate a collection of inferences, or a potential for inferences, based on all of the above ways of understanding the action. So we infer that there is motion of a knife because of the incise and traverse aspects; we infer that there is now less cake because of the excise and diminish aspects; we view the action as irreversible because of the nature of the severing; and so on. By asking questions about the sentence, or using it to prime the understanding of a following sentence, we can demonstrate that these inferences do in fact take place.
The notion of semantic continuity I am suggesting is that a broad range of shades of meaning may be derived from a polysemous word in context, by mixing multiple senses weighted by salience. This weighting is a function of both world knowledge, e.g., knowledge about the properties of objects like apples, boats, and cakes, and contextual priming. The latter causes particular senses to figure more prominently in the conglomeration of meanings without necessarily eliminating any of the lesser senses. Compare (4), which emphasises "incise" and "traverse", with (5), which emphasises "excise" and "diminish":
(4) With a bent, rusty knife, John cut a piece from the cake
(5) Prompted by jealousy, John cut a piece from the cake
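One way to picture this salience-weighted mixing is as a set of normalized weights over senses that contextual priming rescales without zeroing any of them. The sketch below is only an illustration under stated assumptions: the numerical saliences, the priming factors, and the short inference labels are all invented here, since the chapter gives no numerical values.

```python
# Hypothetical baseline saliences for "John cut a piece from the cake",
# following the order of the sense list for (3). The numbers are invented.
BASE_SALIENCE = {
    "excise": 6.0, "sever": 5.0, "section": 4.0,
    "incise": 3.0, "traverse": 2.0, "diminish": 1.0,
}

# Inferences that each sense supports (labels paraphrase the text).
INFERENCES = {
    "excise": ["less cake"],
    "diminish": ["less cake"],
    "incise": ["knife moves"],
    "traverse": ["knife moves"],
    "sever": ["irreversible"],
    "section": [],
}


def blend(base, priming=None):
    """Return normalized sense weights after multiplicative priming.

    Priming only rescales weights, so no sense is eliminated.
    """
    priming = priming or {}
    raw = {s: w * priming.get(s, 1.0) for s, w in base.items()}
    total = sum(raw.values())
    return {s: w / total for s, w in raw.items()}


def inference_strengths(weights):
    """Sum the weights of all senses that support each inference."""
    strengths = {}
    for sense, weight in weights.items():
        for inference in INFERENCES[sense]:
            strengths[inference] = strengths.get(inference, 0.0) + weight
    return strengths


# (4) "With a bent, rusty knife, ..." is assumed to prime incise and traverse.
knife_context = blend(BASE_SALIENCE, {"incise": 3.0, "traverse": 3.0})
# (5) "Prompted by jealousy, ..." is assumed to prime excise and diminish.
jealousy_context = blend(BASE_SALIENCE, {"excise": 3.0, "diminish": 3.0})
```

In this toy version, the "knife moves" inference is stronger in the knife context and the "less cake" inference is stronger in the jealousy context, while every sense keeps a non-zero weight in both. The sketch does not address role binding, which the text stresses is what keeps the blend from being a blind superposition.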
This view of word meaning does not require continuity in the technical mathematical sense, in which there are an infinite number of points between any two points in the semantic space. A discrete semantic space would certainly suffice, provided only that the grain was fine enough to accommodate a sufficiently large number of semantic distinctions.
Finally, as Norvig (1988) has observed, texts can admit multiple simultaneous interpretations with distinct meanings. An example is:
(6) John cut the engine
For some readers, this sentence invokes both the "terminate" and "sever" senses of cut. The former is based on John's performing some unspecified act to terminate the engine's running. (Here, the engine is used metonymically to refer to its operation.) Processes like the running of an engine have time lines that can be viewed metaphorically as one-dimensional objects, and hence cut applied to a process implies interruption or termination via "sever". But a second interpretation of (6) is that John literally severed something such as a fuel line, which either prevented the engine from running or stopped it if it was already running. These two readings of (6), one metaphorical and one literal, are not the sort of blending of senses I was discussing earlier; they are distinct interpretations. But they are not incompatible interpretations. Some readers produce both simultaneously.
Thus we see that multiple senses may contribute to an interpretation, and on another scale, multiple interpretations may exist simultaneously. The shape of "meaning space", whether continuous or not, is certainly complex.
2. Inference in Localist Networks
In earlier work, Joseph Sawyer and I built a system for understanding simple usages of cut. The system combined syntactic cues, semantic features, and world knowledge to select the dominant sense of an instance. The syntactic cues came from particles and prepositions; the system accepted verb phrases of a small number of types, such as cut the x from the y, cut off the x, cut into the x, etc. The nouns that could fill the x and y slots were tagged with a variety of semantic features, such as "physob" for things that were physical objects, or "1-dim" for things that could be viewed as essentially one-dimensional. Some nouns offer a choice of construals, e.g., team can mean either an abstract set of players or a physical collection of players. To be cut from the team means to have set membership revoked, but The team boarded the bus is a statement about physical objects. Multiple construals were represented by multiple nodes linked to the same word node.
The various senses of cut were also represented by nodes linked into this syntactic/semantic network. No one piece of evidence was sufficient to determine the meaning of an instance, but by accumulating bits of syntactic and semantic support, a sense node could increase its activity level. The most active node would eventually win the competition. Thus, rather than using a hard deductive procedure to grind out an interpretation of a cut instance, the system took a softer approach in which multiple competing interpretations would be partially active, but in the end only the strongest remained¹.
There is a sort of continuity of representation here, in that the activation levels of nodes are continuous values, giving the network an infinite number of potential states. This is a property shared by all localist, spreading activation networks. However, this particular type of continuity is not very interesting, since there is no blending going on. The system is still built from a finite number of discrete nodes, and its goal is to select the "best" sense node based on the available evidence. Actually, one of the strengths of this sort of representation is that it is able to entertain competing hypotheses without blending them together into an incoherent mush.
But pure spreading activation is not sufficient to solve even the simple version of the cut understanding problem, because it doesn't address the issue of how world knowledge is brought in. For example, the preferred sense of (1) is "excise", but if we want to focus on the physical action we will have to switch to "sever".
Apples have the semantic feature "3-dim" while "sever" requires "1-dim", so a literal match is not very satisfactory. Section or slice would be compatible with a three-dimensional direct object, but they don't fit the cut x from y syntactic form of the input, whereas one sub-sense of "sever" does. Since no sense offers an acceptable syntactic/semantic match at this point, the system must try a less literal reading. A parallel search of the knowledge base produces the fact that apples are connected to trees by stems, and stems have the semantic feature "1-dim".
1) Although this approach is frequently associated with connectionist-style computation, in contrast with the "classical AI" deductive approach, in truth the former is also part of classical AI, which includes such things as parallel relaxation, heuristic evaluation functions, and production rules with numerical certainty factors.
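The step just described, from a failed literal match to a lookup in the knowledge base, can be sketched in a few lines. This is a simplified stand-in and not the system built with Sawyer: a plain evidence count replaces spreading activation, only the three physical-action senses named in the passage are modelled, and the dictionaries, frame strings, and function names are all hypothetical.

```python
# Hypothetical lexicon: noun -> semantic features (feature names follow the text).
FEATURES = {
    "apple": {"physob", "3-dim"},
    "boat": {"physob", "3-dim"},
    "stem": {"physob", "1-dim"},
    "rope": {"physob", "1-dim"},
}

# Hypothetical knowledge base: (bound object, anchor) -> connector.
CONNECTED_BY = {("apple", "tree"): "stem", ("boat", "dock"): "rope"}

# Each sense lists the syntactic frames it accepts and the feature it
# requires of the thing literally being cut (simplified for illustration).
SENSES = {
    "sever": {"frames": {"cut x", "cut off x", "cut x from y"}, "needs": "1-dim"},
    "section": {"frames": {"cut x", "cut up x"}, "needs": "3-dim"},
    "slice": {"frames": {"cut x", "cut up x"}, "needs": "3-dim"},
}


def literal_support(sense, frame, noun):
    """Count evidence for a sense: one point for the frame, one for the feature."""
    spec = SENSES[sense]
    return int(frame in spec["frames"]) + int(spec["needs"] in FEATURES[noun])


def interpret(frame, x, y=None):
    """Return (sense, literal object of the cut action).

    Try a literal match first. If no sense has both syntactic and semantic
    support, look for a connector between x and y that does.
    """
    scores = {sense: literal_support(sense, frame, x) for sense in SENSES}
    best = max(scores, key=scores.get)
    if scores[best] == 2:
        return best, x
    connector = CONNECTED_BY.get((x, y))
    if connector is not None:
        for sense in SENSES:
            if literal_support(sense, frame, connector) == 2:
                return sense, connector
    return best, x  # no better reading found; keep the partial match
```

For example, `interpret("cut x from y", "apple", "tree")` finds no fully supported literal reading for apple, retrieves stem from the knowledge base, and returns the "sever" sense with stem as the object actually cut. A hard-wired lookup like this sidesteps the search-control problem that made the real task difficult.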
DAVID S. TOURETZKY
So, since one of the sub-senses of cut is to sever a binding between an object and an anchor, and there is the right kind of binding relationship between apples and trees in the knowledge base (i.e., apple fills the "bound" role, and the object of from fills the "anchor" role), this provides much stronger support for the "sever" interpretation, and furthermore allows us to infer that the deep object of cut in this case is "stem", which wasn't even mentioned in the input.

This type of reasoning, in which unconnected bits of knowledge are brought together to solve a problem and give rise to new bits of knowledge as a result, I have called dynamic inference (Touretzky 1991). It is notoriously difficult for connectionist systems, and well beyond the power of spreading activation models. In our toy cut system, we tackled the dynamic inference problem by allowing nodes to propagate messages to each other rather than scalar activation values; we used elementary artificial intelligence search techniques to control the generation of these messages. The resulting system did not look at all connectionist. It was successful in a limited way, but suffered from the combinatorial explosion and brittleness problems that have long plagued classical artificial intelligence.

3. Continuity and Distributed Networks
An entirely different approach to representation is that of distributed connectionist networks, where concepts are represented not by a single node, but by patterns of activity distributed over a collection of nodes, as in McClelland and Kawamoto's verb sense disambiguation model (McClelland & Kawamoto 1986). The nodes represent semantic features¹ that collectively encode meaning. For example, some of the features that could make up cut senses are:

1-d object, 2-d object, 3-d object
penetration, separation, termination
physical motion, metaphorical motion, no motion
continuous motion, repeated motion, random motion
radial symmetry, parallel symmetry, no symmetry
single incision, multiple incisions, no incision
proximal/distal, part/whole, single mass
1) It is fashionable to refer to these as "microfeatures", implying that they are autonomously-constructed features encoding subtle statistical regularities of the domain, rather than gross semantic features that would be easily interpretable by human observers. However, in practice most models do in fact use gross semantic features, constructed by the experimenter. Calling these "microfeatures" is just wishful thinking.
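To make the idea of a sense as a pattern over feature nodes concrete, here is a minimal sketch. The feature names are taken from the list above, but the assignment of features to the senses "sever", "incise" and "terminate" is a hypothetical illustration, as are the helper names `encode`, `dot` and `blend`.

```python
# A small subset of the feature inventory listed above.
FEATURES = [
    "1-d object", "2-d object", "3-d object",
    "penetration", "separation", "termination",
    "physical motion", "metaphorical motion", "no motion",
]


def encode(active):
    """Encode a sense as a vector: 1.0 for active feature nodes, 0.0 otherwise."""
    return [1.0 if feature in active else 0.0 for feature in FEATURES]


def dot(u, v):
    """Dot product, used here as a rough similarity measure between senses."""
    return sum(a * b for a, b in zip(u, v))


def blend(u, v):
    """Average two patterns on the single shared set of feature nodes."""
    return [(a + b) / 2 for a, b in zip(u, v)]


# Hypothetical feature assignments, chosen only for illustration.
sever = encode({"1-d object", "separation", "physical motion"})
incise = encode({"3-d object", "penetration", "physical motion"})
terminate = encode({"termination", "metaphorical motion"})

similar = dot(sever, incise)         # the two senses share one active feature
unrelated = dot(sever, terminate)    # no active feature in common
mixture = blend(incise, terminate)   # a pattern that is neither sense
```

Because there is only one set of feature nodes, holding "incise" and "terminate" at once yields the half-active `mixture` pattern rather than two separate candidates.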
POLYSEMY AND REPRESENTATION
A particular sense of cut, such as "sever a physical binding", would be encoded as a set of these features. In other words, some of the feature nodes in a connectionist network would be active and some inactive. The obvious advantage of this encoding is that we are not restricted to a small number of pre-defined verb senses. Instead, assuming we start with a rich feature inventory (a few dozen or perhaps a few hundred features), we can encode many thousands of subtly different cut senses, and have a built-in similarity metric (the dot product) for semantically related senses. And if we allow nodes to have real-valued rather than binary activation levels, we have a continuous $n$-dimensional semantic feature space.

Although this encoding has great representational power, it also has a fundamental weakness. It is impossible to represent multiple competing hypotheses in a distributed semantic representation¹, since there is only one set of semantic features. Thus the system cannot represent the competition "incise vs. terminate", but only some novel and perhaps nonsensical pattern derived from a mixture of the two competitors. A common solution to this problem is to use an associative memory architecture, such as a Hopfield network, to "clean up" a pattern by settling into one of a set of learned stable states. The problem here, though, is that now we're back to a small set of canonical meanings that have been set up as stable states; we have lost much of the richness of a combinatorial representation. This cleanup difficulty can in theory be addressed by introducing hidden units to enforce semantic constraints among features, but this still does not allow competing meanings to (a) coherently coexist in the network, and (b) collect evidence and trigger supporting inferences to determine an eventual winner of the competition.

4. The Holy Grail
What people appear to do, and connectionist networks at present cannot do, is represent multiple competing hypotheses each of which has a fine-grain semantic structure and can potentially give rise to significant inferences. Localist spreading-activation representations support incremental evidence accumulation and soft competition (one form of continuity), but can't handle structured inferences,
1) It is however possible to represent competing hypotheses in a sparse, non-semantic distributed representation, as in (Touretzky & Hinton, 1988).
e.g., keeping track of which objects fill which slots in which schemas¹. Distributed encodings promise subtle and fluid representations (another type of continuity), but don't easily support representation of multiple entities simultaneously, and also have problems with structure. Artificial intelligence-style semantic nets are perfectly suited to representing structured information, but fall short in the other areas.

I can't predict when we will find a representation that supports all the properties we desire in an inference architecture. This is the "holy grail" of connectionist knowledge representation, and a topic of much ongoing research. At present, representing concepts like cut, including the subtle contributions that multiple senses make in understanding instances such as (3), is simply beyond the state of the art of connectionist systems. Classical artificial intelligence approaches based on semantic nets at least provide a notation for formalizing our knowledge, and a means, albeit a slow and brittle one, for producing inferences based on it. Hybrid systems, such as Hofstadter and Mitchell's CopyCat (Mitchell 1990), combine spreading activation, stochastic search, and demons that dynamically create and modify network structure. These may prove a useful stepping stone toward more brain-like reasoning architectures.
Acknowledgements: The discussion of "cut" in this paper owes much to conversations I've had with Catherine Harris over the past five years. I thank Joseph Sawyer for his work on the computer simulation.
1) There have been some attempts to deal with schema-like representations in spreading activation or marker propagation networks, but these either get bogged down in combinatorial problems, require excessive machinery to maintain parallelism, or else cannot deal with competition among schemas.
REFERENCES

Brugman, C. 1988. The Story of 'Over': Polysemy, Semantics, and the Structure of the Lexicon. New York: Garland Press.
Lakoff, G. 1987. Women, Fire, and Dangerous Things: What Categories Reveal About the Mind. University of Chicago Press.
Langacker, R.W. 1987. Foundations of Cognitive Grammar, vol. I: Theoretical Prerequisites. Stanford, CA: Stanford University Press.
McClelland, J.L. and A.H. Kawamoto. 1986. Mechanisms of sentence processing: assigning case roles to constituents of sentences. In J.L. McClelland and D.E. Rumelhart (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2. Cambridge, MA: MIT Press.
Mitchell, M. 1990. Copycat: A Computer Model of High-Level Perception and Conceptual Slippage in Analogy Making. Doctoral dissertation, University of Michigan.
Norvig, P. 1988. Multiple simultaneous interpretations of ambiguous sentences. Proceedings of the Tenth Annual Conference of the Cognitive Science Society. Hillsdale, NJ: Erlbaum, pp. 291-297.
Touretzky, D.S. and G.E. Hinton. 1988. A distributed connectionist production system. Cognitive Science, vol. 12, n° 3, pp. 423-466.
Touretzky, D.S. 1991. Connectionism and compositional semantics. In J.A. Barnden and J.B. Pollack (eds.), Advances in Connectionist and Neurally Oriented Computation, vol. 1: High-Level Connectionist Models. Norwood, NJ: Ablex, pp. 17-31.
THE USE OF CONTINUITY IN MODELLING SEMANTIC PHENOMENA

BERNARD VICTORRI
CNRS (URA 1234, University of Caen), France

Introduction

Why should we use continuous models in semantics? At first glance, this question seems simple: we have to use continuous models if and only if semantic phenomena are continuous. But this last statement is wrong for at least two reasons. First, continuity and discreteness are not properties of phenomena; they are characterizations of theories about phenomena. Second, as stressed by D. Kayser in this volume, one can use discrete models to represent continuous concepts, and the other way round. So, our first question must be split into two questions: (1) What kinds of linguistic theories concerning semantic phenomena need concepts related to continuity? (2) What kinds of mathematical and computer tools can deal with these concepts?

1. Linguistic issues
1.1. Categorisation

At every level of linguistic description, we are confronted with the problem of classification. The scenario is always the same: in order to reach some degree of generality in the description of any phenomenon, one must define classes of linguistic expressions or relations and state rules in terms of these classes. But how does one decide whether a given expression or relation belongs to a given class? In most cases, a single criterion proves to be insufficient since linguistic data show great variability. So linguists tend to use sets of criteria to define classes. As a consequence, some graduality is obtained: expressions satisfying the whole set of criteria can be said to be typical elements of the corresponding class, whereas other expressions can be viewed as more peripheral, further from the center of the class, as they satisfy a smaller number of criteria.

As we said, this situation prevails in every domain of linguistics. Even in syntax, we can find some sort of graduality and typicality in the definition of syntactical categories, functional relations, classes of transformations, and so on
(cf. P. Le Goffic, in this volume). But the domain where these notions most obviously apply is semantics. Examples are numerous: semantic lexical features (like 'animate' versus 'inanimate'), types of process ('activity', 'accomplishment', 'achievement', ...), aspect and modal classifications. In each case, it is relatively easy to exhibit typical examples showing all the features which can characterize a class. But it is also easy to find examples on the border of two classes, where we need to refine the classification, to distinguish between different characterizations, and to introduce new factors that can take these differences into account.

As the number of factors grows, and their interrelations become more complex, an alternative to the classical combinatorial representation emerges: the construction of a space representation, whose dimensions can combine the effects of different factors, each one with a specific strength. The advantage is to preserve the relation of proximity between elements (from the same class as well as from different classes) within the frame of a tractable low-dimensional space, by means of a distance on this space. Every element can be assigned a place in this representation, and the notions of center and borders of classes are fully accounted for by their geometrical counterparts. Obviously, some details will be lost from the original complexity, but with a judicious choice of the dimensions, such a compromise can be the best solution to represent the main tendencies in an efficient way, without completely eliminating any relevant factor.

It is important to note that the choice of a space representation by no means implies that the phenomena are considered as continuous. Only graduality and typicality are assumed: in other words, here continuity is nothing but an efficient tool to deal with a multiplicity of interrelating discrete factors.
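The geometrical reading of typicality can be illustrated with a small sketch. It assumes a hypothetical two-dimensional space for types of process, with invented axes (durativity, telicity) and invented class centers; none of these values comes from an actual linguistic analysis, and the function names are illustrative only.

```python
import math

# Hypothetical class centers in a (durativity, telicity) space, both in [0, 1].
CENTERS = {
    "activity": (1.0, 0.0),
    "accomplishment": (1.0, 1.0),
    "achievement": (0.0, 1.0),
}


def ranked_classes(point):
    """Return class names ordered from nearest to farthest center."""
    return sorted(CENTERS, key=lambda name: math.dist(point, CENTERS[name]))


def typicality_margin(point):
    """Distance to the second-nearest center minus distance to the nearest.
    A large margin marks a typical element; a margin near zero marks an
    element lying on the border between two classes."""
    first, second = ranked_classes(point)[:2]
    return math.dist(point, CENTERS[second]) - math.dist(point, CENTERS[first])


typical = (0.9, 0.1)     # close to the center of "activity"
borderline = (1.0, 0.5)  # halfway between "activity" and "accomplishment"

nearest = ranked_classes(typical)[0]
wide = typicality_margin(typical)
narrow = typicality_margin(borderline)
```

The coordinates here only stand in for combined discrete criteria, in line with the remark that a space representation does not commit one to treating the phenomena themselves as continuous.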
1.2. Compositionality

One of the main issues for any semantic theory consists in explaining how the meaning of a whole (phrase, sentence, ...) can be computed from the meaning of its components. The classical approach to this problem is compositionality. The starting point is the syntactical structure: at each node of the syntactic tree, meaning is computed bottom-up by applying rules that give the meaning of the current node as a function of the meaning of its directly dependent nodes.

Two main difficulties arise in this approach. First, local rules cannot be sufficient: very often, an element, far away in the syntactical tree, exerts a definite influence over the meaning of a given part of the sentence. The second point concerns the pervasive phenomenon of polysemy. It is well known that many words, and especially the most frequent ones, are highly polysemous. Their precise meanings depend upon the rest of the sentence and have to be computed during the process, so they cannot be taken as a basis for a bottom-up computation.
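The bottom-up scheme just described can be sketched as a recursive evaluation over a syntactic tree. The lexicon, the two rules and the toy sentence are hypothetical and deliberately tiny; the sketch only shows the shape of the computation, not any particular semantic theory.

```python
# Hypothetical lexicon: each word has exactly one fixed meaning.
LEXICON = {
    "Mary": "mary",
    "John": "john",
    "sees": lambda obj: lambda subj: f"see({subj},{obj})",
}

# Hypothetical local rules: the meaning of a node is a function of the
# meanings of its directly dependent nodes, and of nothing else.
RULES = {
    "VP": lambda verb, obj: verb(obj),
    "S": lambda subj, pred: pred(subj),
}


def meaning(node):
    """Compute the meaning of a tree bottom-up.
    A leaf is a word (str); an inner node is (label, child, child, ...)."""
    if isinstance(node, str):
        return LEXICON[node]
    label, *children = node
    return RULES[label](*[meaning(child) for child in children])


tree = ("S", "Mary", ("VP", "sees", "John"))
result = meaning(tree)
```

Both difficulties are visible in the sketch: each rule sees only its immediate children, and `LEXICON` must commit to a single meaning per word before the computation starts, which is exactly what a polysemous word does not allow.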
These difficulties are tackled by the classical approach. For instance, the so-called mechanism of 'recategorization' enables a node value to be changed in accordance with some operation on an upper node in the tree. But here again, the combinatorial complexity of these mechanisms grows very quickly and one can be sceptical about their capacity to handle the whole set of semantic interrelations in non-trivial sentences (for a discussion about the limits of recategorization, see Fuchs et al. 1991, pp. 157-162). The alternative is to consider a sentence as a 'Gestalt' where relations between whole and parts are fully bi-directional. In this view, each component of the sentence interacts with each of the others, in no precise predefined order. What is important is the relative strength of each interaction, which acts as a constraint upon the potential of meanings carried by each polysemous element. Construction of meaning can be seen as the result of a dynamical global process in which a stable solution is obtained when the maximum number of constraints is satisfied (Victorri 1992).

Related to this issue is the question of 'degrees of analysability' pointed out by R. Langacker in this volume. The compositional approach imposes a dichotomic choice: a phrase must be fully analysable (computable from its components) or fully idiomatic (considered as a new genuine element). In the gestaltist approach, these configurations are only two extreme cases of a more general situation where meaning is partly due to each component and partly an irreducible quality acquired by their interaction.

The need for continuity in gestaltist approaches is obvious. Claiming that construction of meaning is a dynamical process implies defining a space where this process can take place, where stable states can be defined, and so on. We must be aware that precise quantification is not necessary: the point of interest is the qualitative behavior of the process.
A continuous space is the natural frame in which qualitative properties of dynamical systems can be handled, and nothing more.

1.3. Representation of meaning

Another major motivation for invoking continuity comes from representation of meaning. The main trend is to use the apparatus of logic, in one way or another, for this purpose. But a significant number of linguists challenge the prevailing views by advocating the use of topological representation. This is a very attractive idea: many lexical and almost all grammatical units can be associated with small graphic configurations that outline the kernel of their semantic value, whereas logical representations tend to split the different precise meanings into as many different representations. Moreover, topological concepts and their perceptive counterparts seem to be more efficient than logical tools to explain the functional properties of these units in a cognitive perspective. The works of
A. Culioli (1990), R. Langacker (1987), G. Lakoff (1987) or L. Talmy (1988) are representative of the diversity and creativity one can find in this area. In these theories, continuity and other related topological, geometrical and dynamical notions are used as a conceptual framework. The abstract properties of, say, open intervals, boundaries, attractors, ... are the basic elements of the representation. Here quantification is completely irrelevant. As a matter of fact, one can say that the deep reason to use these concepts is their capacity to group in a single class situations which differ in their quantitative and domain-reference aspects.

2. Mathematical definitions
To discuss the role continuity can play in semantic models, we first have to define this term. Most often, continuity is seen as a property of variables for which there is always an intermediate value between any two given values. Another frequent formulation uses a notion of distance: around any point in a continuous space one can always find other points as close as one wants. But from a mathematical point of view, at least three different notions related to continuity can be defined, and none of them corresponds precisely to these intuitive definitions.

2.1. Continuity versus discontinuity

The opposition continuity/discontinuity applies to functions. It was first defined for numeric functions. Technically speaking, a real function f of one real variable is continuous at a given point x if for any open interval J containing f(x) one can find an open interval I containing x such that f(I) is included in J. In other words, there is no sudden "jump" in the value of the function when the variable passes through x. In terms of distance, one can say that a continuous function preserves 'closeness': as the variable gets closer to x, the value of the function gets closer to f(x). This definition can be easily extended to real functions of several variables, and more generally to any function from one multi-dimensional real space to another. When using such a function in a model, points where the function is discontinuous are most often the most interesting ones, because they correspond to situations where the phenomenon under study changes its behaviour in an observable way.

2.2. Discreteness

The definition of continuity is by no means limited to functions of real variables. This property can be defined for functions from any set to any other, as soon as these sets have been provided with a topological structure.
To provide a set with a topology, one must define a family of subsets, called open subsets, satisfying a small number of rather simple axioms: the entire set and the empty
set must belong to the family, the intersection of a finite number of members of the family must belong to the family, as well as the union of any (possibly infinite) number of members. Any set can be provided with several more or less interesting topological structures. One of them is the discrete topology, for which the family of open subsets is constituted by all the subsets of the set. As a matter of fact, this topology is not very helpful because every function defined on a discrete space is continuous!

Whenever a distance is provided on a space, a corresponding 'natural' topology can be derived for which continuous functions preserve closeness. But this topology can be the discrete one, as is the case, for instance, for the set of integers with the standard distance. In fact, no interesting topology can be derived in this way on such a set, where the distance between any two points is greater than a constant value.

2.3. Continuum

The third definition to be introduced is the notion of a continuum, sometimes called a continuous space. A continuum is a non-discrete space, but the converse is not true. For instance, the set of rational numbers with the standard distance is neither a discrete space nor a continuum. Though one can find another rational number as close as one wants to any given one, there are nevertheless "holes" in this set. In a sense, filling these holes is equivalent to the axiomatic construction of the set of real numbers, which is a continuum. Many simple geometrical properties expected from continuous spaces depend crucially on topological properties of real numbers. To give only one example, given any closed curve in the plane, one expects that a line joining a point inside the curve to a point outside must intersect the curve: such a property would not be true if one considers only points with rational coordinates in the plane.
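The "holes" in the rational numbers can be made tangible with exact rational arithmetic. The sketch below is an illustration chosen for this purpose, not taken from the text: the function x² − 2 changes sign between 1 and 2, yet repeated bisection over rational midpoints never lands on a root, because the crossing point, the square root of 2, is not rational. The function name `bisect_rational` is hypothetical.

```python
from fractions import Fraction


def bisect_rational(steps):
    """Bisect [1, 2] using exact rationals, looking for x with x*x == 2.
    Returns the final bracket and whether an exact rational root was hit."""
    lo, hi = Fraction(1), Fraction(2)
    exact_root_found = False
    for _ in range(steps):
        mid = (lo + hi) / 2
        square = mid * mid
        if square == 2:
            exact_root_found = True
            break
        if square < 2:
            lo = mid  # the sign change lies to the right of mid
        else:
            hi = mid  # the sign change lies to the left of mid
    return lo, hi, exact_root_found


lo, hi, found = bisect_rational(40)
width = hi - lo
```

The bracket keeps shrinking, so rationals on either side of the crossing come as close to each other as one wants, which shows the space is not discrete; but the crossing point itself is missing, which is why it is not a continuum.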
Therefore, the whole complexity of real numbers is required to build an adequate framework for most mathematical models using continuity in a geometrical sense. In particular, dynamical systems are defined on so-called differential manifolds, which are generalizations of curves and surfaces, and the topological properties of these manifolds play a central role in the theory.

3. Modelling considerations

3.1. Qualitative modelling

Continuity is generally associated with quantitative modelling. We have just seen why one needs real numbers to get the advantages of continuum topology properties. But it does not mean that the model must be quantitative. Once a continuum framework has been built up, one can use it to represent a phenomenon, and very often only its qualitative features are of interest. For instance, to
capture the notion of graduality, the best solution is to adopt a continuous representation where graduality can be differentiated from sudden jumps by means of continuity and discontinuities in functions, even if one knows that not every point in the space of parameters corresponds to an observable value. To have a discontinuity at one point of the space is a qualitative property of a function, and we do not need to specify the exact position of the point nor the exact value of the jump at this point to characterize the class of functions presenting this outstanding feature. On the contrary, if one adopts a discrete representation, the only way to distinguish this feature from graduality is some kind of threshold: one must define "small" jumps and "big" ones, since there is in any case a jump when passing from one point to the next one.

As shown by this last example, in a qualitative model, the focus is not on one particular numeric function but on a class of functions exhibiting a common behavior. Nevertheless, a qualitative model can be predictive. Many qualitative relationships between data can be tested to ascertain its validity. For instance, a model may imply that data must respect a given order relative to the importance of a gradual phenomenon, that some "jump" in its behavior must be observed during the combined variations of a set of parameters, and so on. Such predictions are as useful as quantitative ones to validate or invalidate a model.

3.2. Dynamical systems

As shown by our earlier discussion about linguistic issues, dynamical systems theory looks likely to play a central part in linguistic continuous models. It can be used to represent the meaning of units, and also to model unit interactions in a sentence. In both cases, qualitative modelling is needed. Actually, dynamical systems theory lends itself remarkably well to qualitative modelling. One can define classes of equivalent systems, characterized by a similar behavior, including in the same class systems on spaces of different dimensionality. This last point is very important in linguistics, where the same unit is used for a great variety of domain-reference spaces. One important example is the notion of bifurcation, discussed in this volume by R. Thom, and used as a central concept in Culioli's works on determination.

Moreover, dynamical systems can easily deal with the full range of linguistic phenomena known as ambiguity, indetermination and vagueness. In our work on polysemy (Fuchs et Victorri 1988; Victorri et Fuchs 1992), we built a mathematical model in which the precise meaning of a polysemous unit in any given sentence is represented by a dynamical system on a semantic space. The dynamics is parametrized by the other units present in the sentence. Each stable state (i.e. point attractor) of the dynamics corresponds to a possible semantic value of the polysemous unit, so that the number of attractors and the form of the
basins of attraction characterize the meaning of the unit in the given sentence. Thus the presence of two (or more) attractors is related to the existence of an actual ambiguity. A large shallow basin of attraction represents an indetermination whereas a deep narrow one represents a specific precise meaning. These different cases can be classified and the semantic behavior of the unit can be defined in terms of the relation between these classes of dynamics and the parameters depending on the other units present in the sentence which are responsible for the form of the dynamics. For instance, one can observe how small modifications of the sentence induce qualitative changes in the dynamics, such as the appearance or disappearance of an ambiguity as an element of the sentence is replaced by another.

3.3. General framework

If we try to outline the general framework emerging from the preceding considerations, we can bring out a few principles which constitute a common basis for continuous modelling in semantics.

Two representations can be associated with each linguistic element. The first one is a representation of its kernel of meaning, sometimes called its 'iconic' representation, which specifies the constant contribution of this unit to the meaning of any sentence comprising it. The second one is what we called here its 'semantic space', whose dimensions reflect the degrees of freedom corresponding to the variable precise meanings this unit can convey in different sentences.

The interactions between units in a given sentence are then twofold. On the one hand, kernel representations interact, bringing out the full representation of the meaning of the whole sentence. On the other hand, each unit receives from the others a set of constraints which defines its behavior in its semantic space. In both cases dynamical systems theory seems to be the appropriate tool to compute these interactions. It is at least the right tool to model these interactions as a gestaltist process.

As it stands, this framework is not actually an effective semantic model. Our claim is that it defines a research program in which most linguistic studies using continuity as a main ingredient can be included.

4. Computer tools
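As a first, purely illustrative use of a computer, the attractor picture of section 3.2 can be simulated with a one-dimensional toy system. Everything in this sketch is an assumption made for illustration and is not the model of the cited papers: the semantic space is a single axis, the dynamics is the gradient flow of the potential V(x) = x⁴/4 − a·x²/2 − b·x, and the parameters `a` and `b` stand in for the influence of the other units of the sentence.

```python
def drift(x, a, b):
    """Velocity dx/dt = -dV/dx for V(x) = x**4/4 - a*x**2/2 - b*x."""
    return -x ** 3 + a * x + b


def find_attractors(a, b, n_starts=40, dt=0.01, steps=5000):
    """Follow the dynamics from many initial states (Euler integration)
    and collect the distinct stable states that are reached."""
    finals = set()
    for i in range(n_starts):
        x = -2.0 + 4.0 * i / (n_starts - 1)  # starting points spread over [-2, 2]
        for _ in range(steps):
            x += dt * drift(x, a, b)
        finals.add(round(x, 2) + 0.0)  # "+ 0.0" turns -0.0 into 0.0
    return sorted(finals)


single = find_attractors(a=-1.0, b=0.0)     # one stable state: one meaning
ambiguous = find_attractors(a=1.0, b=0.0)   # two stable states: an ambiguity
resolved = find_attractors(a=1.0, b=1.0)    # a strong bias removes one of them
```

Only the qualitative outcome matters here: changing `a` or `b` alters the number of attractors, which is the kind of appearance or disappearance of an ambiguity described above, while the particular numeric values carry no linguistic meaning.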
4.1. Continuity on a digital computer

If we turn now towards computer implementation, representing continuity on a machine is anything but a simple problem. From a rigorous mathematical point of view, continuity cannot be reached with a digital computer: even the so-called 'real' variables take their values, whatever the precision, on a discrete finite set of numbers. So, the best we can do is to approximate continuous functions by discrete gradual operations, and the subtle distinctions we made at the mathematical level no longer apply in this context. Nevertheless, in many domains, and especially in physics, computer simulations are used to study continuous mathematical models, and machine precision is sufficient to obtain reliable results. So it cannot be argued that a digital computer is not suitable for representing continuity.

But the problem is elsewhere. In domains like physics, the focus is on quantitative simulations, and computers are used for what they do best: numeric computations. In our case, we are most interested in qualitative behavior and choosing numeric values is most of the time an irrelevant burden. As we have shown, quantification is the opposite of what is needed in the mathematical apparatus related to continuity. If computer implementation of a qualitative continuous mathematical model imposes arbitrary numeric coding, it will be devoid of interest.

4.2. Connectionism

Connectionist networks seem to provide an elegant solution to this problem. They are essentially numeric, of course, since the relations between units are defined by a 'real' number, i.e. the weight of their link. But these weights are not to be arbitrarily chosen by the designer of the system. They are adjusted as the result of a learning process. In concrete terms, what is needed is the encoding of a sample of input data and of expected corresponding output results. Then the learning algorithm automatically computes weight values so that the system gives a correct response when presented with data close to the learned sample.

The simplest example is given by the so-called 'feed-forward' networks. They are constituted by an ordered set of layers of units, each unit of one layer being connected to all units of the next layer. The first and the last layers are respectively the input and output layers.
Mathematically speaking, these two layers implement two spaces, and the learning process is equivalent to computing the most regular function from the input space to the output space satisfying the constraints given by the learning sample. To use such a system, one has to design the two basic spaces by specifying their dimensions and the rules for encoding data and results onto these spaces. Strictly speaking, encoding operations are also numeric, but here the situation is different: each unit must correspond to a linguistic criterion which is part of the model, and the choice of coding values, which can be limited to two or three values, is not arbitrary from a linguistic point of view. The connectionist approach is often opposed to the symbolic approach, but in fact, there is a symbolic aspect in any connectionist network, precisely because input and output units are inevitably given a symbolic meaning, even in so-called 'distributed representations', otherwise the system could not be of any
use for modelling. The non-symbolic aspect of connectionist networks is limited to what happens strictly inside the network, in the correspondence between input and output, where the learning process takes place.

One of the most interesting classes of connectionist networks is the family of 'recurrent' networks. As opposed to feed-forward networks, they allow bi-directional links between units and so they are direct implementations of dynamical systems, which we argued to be of cardinal importance in semantic continuous models. With these systems, one can capture the notions of attractors, bifurcations, and so on. As an example, we used a recurrent architecture to implement our model of polysemy, and it enabled us to differentiate phenomena of ambiguity, indetermination, ... by the form of the basins of attraction of the dynamics created inside the recurrent network associated with the polysemous unit in different sentences (Victorri et al. 1989; Gosselin et al. 1990).

4.3. Current limits of connectionism

So connectionism is already an essential tool to implement continuous models. But it has a drawback that prevents it from playing in continuous modelling the same role as classical artificial intelligence tools play in discrete modelling. This flaw is related to an important notion developed in artificial intelligence: the notion of control. A connectionist network remains a "black box" which does not allow much reasoning about its functioning. Our experience with polysemy modelling showed us how frustrating it was to work with such a system. Even when it gave us roughly satisfactory results, we could not use it to answer the most important questions which motivated the implementation of the model: what were the decisive factors that explained the good performances? In which direction might we modify the system to improve it? How could we characterize the class of systems giving acceptable results?
To answer these questions, one must "open the black box", i.e. study the relation between the performance of the network and its internal configuration. Theoretical work is needed to classify networks in terms of their qualitative behavior. This direction of research constitutes a major challenge for connectionism. The work has already started, as shown for instance by the important theoretical results obtained by D. Amit on one family of recurrent systems, the Hopfield networks. The usefulness of continuous model implementations crucially depends upon progress in this area.

Conclusion

Continuous models look likely to play an increasingly important role in current research in semantics. In this paper, we have tried to delimit the extent to which mathematical and computer tools are suited to this task. Clearly, many efforts
are still to be made in this field for it to become a real alternative to discrete formalisation. But an original theoretical framework is already emerging: it can shed new light on many well-known linguistic issues which cannot be accounted for by discrete modelling.
REFERENCES

Culioli, A. 1990. Pour une linguistique de l'énonciation. Opérations et représentations, t. 1. Paris: Ophrys.
Fuchs, C. and B. Victorri. 1988. Vers un traitement automatique de la polysémie grammaticale. T.A. Informations 29, Paris.
Fuchs, C., L. Gosselin and B. Victorri. 1991. Polysémie, glissements de sens et calcul des types de procès. Travaux de linguistique et de philologie XXIX:1, Strasbourg, pp. 137-169.
Gosselin, L., A. Konfé and J.P. Raysz. 1990. Un analyseur sémantique automatique d'adverbes aspectuels du français. Actes du 4è colloque de l'A.R.C., Paris.
Lakoff, G. 1987. Women, fire and dangerous things: what categories reveal about the mind. University of Chicago Press.
Langacker, R.W. 1987. Foundations of cognitive grammar. Stanford University Press.
Talmy, L. 1988. Force dynamics in language and cognition. Cognitive Science 9:1.
Victorri, B. and C. Fuchs. 1992. Construction de l'espace sémantique associé à un marqueur grammatical polysémique. Lingvisticæ Investigationes 16:1.
Victorri, B. 1992. Un modèle opératoire de la construction dynamique de la signification. In La théorie d'Antoine Culioli. Ouvertures et incidences. Paris: Ophrys, pp. 185-201.
Victorri, B., J.P. Raysz and A. Konfé. 1989. Un modèle connexionniste de la polysémie. Actes de Neuro-Nîmes 89, EC2.
INDEX
ambiguity, ambiguous 4, 47, 101, 112, 195, 198, 199, 201, 202, 246, 247, 249
analyzability 18, 243
artificial intelligence 3, 111, 145, 172, 202, 235, 249
attractor 132-134, 160, 167-169, 171, 184, 244, 246-247, 249
bifurcation 159-160, 169, 184, 246, 249
boundary 10, 22, 49, 57, 96-98, 115, 138, 145, 156, 175, 224, 244
catastrophe theory 4, 156, 184
category, categorization 3, 13, 49, 59, 67, 86, 98, 103, 243
cognition, cognitive 3, 9-18, 58, 95, 104, 119, 127-151, 164, 170-175, 205, 222
compositionality, compositional 3, 18, 102, 167, 189, 209, 242
connectionism, connectionist 4, 119, 167, 189, 206, 216, 231, 237, 248
consistency 64, 122, 201
context, contextual 18, 37, 64, 74, 85, 90, 99, 101, 129, 161, 189, 197, 200, 205-216, 219, 233
contiguity 23, 27, 30, 99
corpus 57-61, 73, 101, 211, 216-221
cusp 136, 156
cut locus 175-177
deformability 23
degree 9, 16, 18, 47, 57, 60, 82, 85, 94, 97, 113, 119, 136, 209, 216, 219, 233, 241, 243
differential geometry, differential equation 4, 119, 127, 135, 176
discontinuity, discontinuous 5, 10, 23, 78, 97, 101, 120, 162, 177, 244
discrete, discreteness 3, 9-18, 33, 50, 54, 73, 93, 97, 99-101, 104, 111-124, 127, 134, 151, 156, 170, 190, 193, 198, 234, 235, 244, 247
distance 118, 142, 195, 242, 244
distortion 28, 99, 104
distributed 171, 206-208, 216, 217, 236
dynamics, dynamic, dynamical, dynamistic 3, 34, 37, 54, 83, 87, 97, 100, 131, 149, 157, 167, 172, 176, 193, 196, 236, 244, 246
filtering process 25, 46
frequency 62, 206, 211, 215, 222
fuzziness, fuzzy 10, 61, 72, 97, 120
gap 9, 25, 29, 34, 40, 155, 162
generality 94, 193, 223
geometry, geometrical 4, 98, 127, 136, 140-150, 168, 174, 244
Gestalt, gestaltist, gestaltic 3, 38, 103, 174, 217, 243
gradation, graduality, gradual, gradually 4, 17, 35, 37, 41, 54, 86, 96, 97, 98, 99, 101, 162, 241, 242, 246, 248
gradience, gradient 41, 57, 58, 59, 60, 61, 71, 72, 73, 74, 97
grain 189, 208, 234, 237
holism, holistic 35, 174, 184
homonymy, homonym, homonymic 48, 51, 54, 79, 83, 85, 99, 100, 190, 192
iconicity, iconic 39, 41, 164, 174, 184, 247
indeterminacy, indeterminate 10, 58, 95
jump 9-11, 244, 246
kinetism 83-85, 91, 93, 97, 101
lexical access 207, 215, 222
limit 22, 31
localist 207, 208, 231, 234, 237
logics, logic, logical 50, 57, 93, 94, 114, 116, 118, 121, 122, 173, 189, 198, 243
memory 132, 155, 171, 202, 206, 215, 237
morphodynamics 167-187
network 130-138, 167, 170, 171, 177, 178, 190, 192, 206, 216-219, 222, 231, 234, 236-238, 248-249
non-monotonic 121, 196
perception, perceptive 3, 94, 112-113, 128-130, 135, 149, 155, 160, 174, 184
polysemy, polyseme, polysemous 10, 77-92, 95, 96, 99-103, 150, 189-193, 196, 198, 201, 202, 205-229, 231-239, 242, 246, 249
probability 66-73, 94, 97
prototype, prototypicality, prototypical 10, 11, 39, 57, 86, 103, 136, 160, 161, 165
proximity 144, 208, 242
psychology, psychological, psychologically 3, 12, 14, 42, 57, 86, 89, 94, 111, 112, 133, 140, 171, 206, 222
qualitative 5, 50, 114, 119, 120, 121, 133, 136, 143, 161, 162, 243, 245-246, 248, 249
quantification, quantitative 5, 50, 59, 74, 95, 102, 161, 243, 245, 246, 248
salience, saliency, salient 11, 12, 13, 14, 17, 18, 54, 103, 156, 164, 197, 199, 200, 201, 210, 217, 218, 233
scale 11, 12, 15, 34, 57, 60, 67, 68, 72, 95, 97, 120, 210, 214
schema, schematicity 14, 142, 163, 174, 231
semantic potential 198-199
semantic space 30, 101, 115, 159, 208, 234, 246-247
separability, separation 27, 156
singularity 156, 158, 182
statistics, statistical 5, 57-76, 159, 206, 216, 224, 236
symbolic 3, 117-118, 122, 123, 124, 128-130, 167-174, 189, 191, 248
synonymy, synonym, synonymous 59, 74, 83, 190-191
topology, topological 22, 93, 96, 98, 136, 142-145, 148, 156, 159, 162, 172, 174-176, 198, 243-245
trajector 16, 217
transition, transitional 25, 28-30, 38-39, 41, 45-46, 50, 80, 82-84, 87, 96, 100, 102, 129, 133, 156, 158-159
typicality, typical 15, 47, 53, 57, 59, 86, 94, 103, 210, 214, 219, 222, 223, 241-242
undecidability, undecidable 97, 190
vagueness, vague 10, 13, 58, 85, 95, 96, 120-121, 201, 246
variable depth 194-196
vision, visual 15, 58, 162, 176, 184, 207-208, 225