Language, Cohesion and Form
Margaret Masterman was a pioneer in the field of computational linguistics. Working in the earliest days of language processing by computer, she believed that meaning, not grammar, was the key to understanding languages, and that machines could determine the meaning of sentences. She was able, even on simple machines, to undertake sophisticated experiments in machine translation, and carried out important work on the use of semantic codings and thesauri to determine the meaning structure of text. This volume brings together Masterman’s groundbreaking papers for the first time. Through his insightful commentaries, Yorick Wilks argues that Masterman came close to developing a computational theory of language meaning based on the ideas of Wittgenstein, and shows the importance of her work in the philosophy of science and the nature of iconic languages. Of key interest in computational linguistics and artificial intelligence, this book will remind scholars of Masterman’s significant contribution to the field.
Permission to publish Margaret Masterman’s work was granted to Yorick Wilks by the Cambridge Language Research Unit.
YORICK WILKS is Professor in the Department of Computer Science, University of Sheffield, and Director of ILASH, the Institute of Language, Speech and Hearing. A leading scholar in the field of computational linguistics, he has published numerous articles and six books in the area of artificial intelligence, the most recent being Electric Words: Dictionaries, Computers and Meanings (with Brian Slator and Louise Guthrie, 1996). He designed CONVERSE, the dialogue system that won the Loebner prize in New York in 1998.
Studies in Natural Language Processing
Series Editors:
Steven Bird, University of Melbourne
Branimir Boguraev, IBM, T. J. Watson Research Center
This series offers widely accessible accounts of the state-of-the-art in natural language processing (NLP). Established on the foundations of formal language theory and statistical learning, NLP is burgeoning with the widespread use of large annotated corpora, rich models of linguistic structure, and rigorous evaluation methods. New multilingual and multimodal language technologies have been stimulated by the growth of the web and pervasive computing devices. The series strikes a balance between statistical versus symbolic methods, deep versus shallow processing, rationalism versus empiricism, and fundamental science versus engineering. Each volume sheds light on these pervasive themes, delving into theoretical foundations and current applications. The series is aimed at a broad audience who are directly or indirectly involved in natural language processing, from fields including corpus linguistics, psycholinguistics, information retrieval, machine learning, spoken language, human–computer interaction, robotics, language learning, ontologies and databases.
Also in the series
Douglas E. Appelt, Planning English Sentences
Madeleine Bates and Ralph M. Weischedel (eds.), Challenges in Natural Language Processing
Steven Bird, Computational Phonology
Peter Bosch and Rob van der Sandt, Focus
Pierette Bouillon and Federica Busa (eds.), Inheritance, Defaults and the Lexicon
Ronald Cole, Joseph Mariani, Hans Uszkoreit, Giovanni Varile, Annie Zaenen, Antonio Zampolli and Victor Zue (eds.), Survey of the State of the Art in Human Language Technology
David R. Dowty, Lauri Karttunen and Arnold M. Zwicky (eds.), Natural Language Parsing
Ralph Grishman, Computational Linguistics
Graeme Hirst, Semantic Interpretation and the Resolution of Ambiguity
András Kornai, Extended Finite State Models of Language
Kathleen R. McKeown, Text Generation
Martha Stone Palmer, Semantic Processing for Finite Domains
Terry Patten, Systemic Text Generation as Problem Solving
Ehud Reiter and Robert Dale, Building Natural Language Generation Systems
Manny Rayner, David Carter, Pierette Bouillon, Vassilis Digalakis and Mats Wirén (eds.), The Spoken Language Translator
Michael Rosner and Roderick Johnson (eds.), Computational Linguistics and Formal Semantics
Richard Sproat, A Computational Theory of Writing Systems
George Anton Kiraz, Computational Nonlinear Morphology
Nicholas Asher and Alex Lascarides, Logics of Conversation
Walter Daelemans and Antal van den Bosch, Memory-based Language Processing
Margaret Masterman (edited by Yorick Wilks), Language, Cohesion and Form
Language, Cohesion and Form
Margaret Masterman
Edited, with an introduction and commentaries, by Yorick Wilks
Department of Computer Science, University of Sheffield
Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
Cambridge University Press, The Edinburgh Building, Cambridge CB2 2RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521454896
© Cambridge University Press 2005
This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.
First published in print format 2005
ISBN-13 978-0-511-13459-3 eBook (EBL)
ISBN-10 0-511-13459-2 eBook (EBL)
ISBN-13 978-0-521-45489-6 hardback
ISBN-10 0-521-45489-1 hardback
Cambridge University Press has no responsibility for the persistence or accuracy of urls for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents
*Starred chapters have following commentaries by the editor (and by Karen Spärck Jones for chapter 6).
Preface
Editor’s introduction
Part 1. Basic forms for language structure
1. Words
2. Fans and Heads*
3. Classification, concept-formation and language
Part 2. The thesaurus as a tool for machine translation
4. The potentialities of a mechanical thesaurus
5. What is a thesaurus?
Part 3. Experiments in machine translation
6. ‘Agricola in curvo terram dimovit aratro’*
7. Mechanical pidgin translation
8. Translation*
Part 4. Phrasings, breath groups and text processing
9. Commentary on the Guberina hypothesis
10. Semantic algorithms*
Part 5. Metaphor, analogy and the philosophy of science
11. Braithwaite and Kuhn: Analogy-Clusters within and without Hypothetico-Deductive Systems in Science
Bibliography of the scientific works of Margaret Masterman
Other References
Index
Preface
This book is a posthumous tribute to Margaret Masterman and the influence of her ideas and life on the development of the processing of language by computers, a part of what would now be called artificial intelligence. During her lifetime she did not publish a book, and this volume is intended to remedy that by reprinting some of her most influential papers, many of which never went beyond research memoranda from the Cambridge Language Research Unit (CLRU), which she founded and which became a major centre in that field. However, the style in which she wrote, and the originality of the structures she presented as the basis of language processing by machine, now require some commentary and explanation in places if they are to be accessible today, most particularly by relating them to more recent and more widely publicised work where closely related concepts occur.
In this volume, eleven of Margaret Masterman’s papers are grouped by topic, and in a general order reflecting their intellectual development. Three are accompanied by a commentary by the editor where this was thought helpful plus a fourth with a commentary by Karen Spärck Jones, which she wrote when reissuing that particular paper and which is used by permission. The themes of the papers recur, and some of the commentaries touch on the content of a number of the papers. The papers present problems of style and notation for the reader: some readers may be deterred by the notation used here and by the complexity of some of the diagrams, but they should not be, since the message of the papers, about the nature of language and computation, is to a large degree independent of these. MMB (as she was known to all her colleagues) put far more into footnotes than would be thought normal today.
Some of these I have embedded in the text, on Ryle’s principle that anything worth saying in a footnote should be said in the text; others (sometimes containing quotations a page long) I have dropped, along with vast appendices, so as to avoid too much of the text appearing propped up on the stilts of footnotes. MMB was addicted to diagrams of great complexity, some of which have been reproduced here. To ease notational complexity I have in places used ‘v’ and ‘+’ instead of her Boolean meet and join, and she wrote herself that ‘or’ and ‘and’ can cover most of what she wanted. In the case of lattice set operations there should be no confusion with logical disjunction and conjunction. I have resisted the temptation to tidy up the papers too much, although, in some places, repetitive material has been deleted and marked by [ . . . ]. The papers were in some cases only internal working papers of the CLRU and not published documents, yet they have her authentic tone and style, and her voice can be heard very clearly in the prose for those who knew it.
In her will she requested, much to my surprise, that I produce a book from her papers. It has taken rather longer than I expected, but I hope she would have liked this volume. MMB would have wanted acknowledgements to be given to the extraordinary range of bodies that supported CLRU’s work: the US National Science Foundation, the US Office of Naval Research, the US Air Force Office of Scientific Research, the Canadian National Research Council, the British Library, the UK Office of Scientific and Technical Information and the European Commission.
I must thank a large number of people for their reminiscences of and comments on MMB’s work, among whom are Dorothy Emmet, Hugh Mellor, Juan Sager, Makoto Nagao, Kyo Kageura, Ted Bastin, Dan Bobrow, Bill Williams, Tom Sharpe, Nick Dobree, Loll Rolling, Karen Spärck Jones, Roger Needham, Martin Kay and Margaret King. I also owe a great debt to Gillian Callaghan and Lucy Moffatt for help with the text and its processing, to Octavia Wilks for the index, and to Lewis Braithwaite for kind permission to use the photograph of his mother.
YORICK WILKS
Sheffield December 2003
Editor’s introduction
1. A personal memoir: Margaret Masterman (1910–1986)
Margaret Masterman was ahead of her time by some twenty years: many of her beliefs and proposals for language processing by computer have now become part of the common stock of ideas in the artificial intelligence (AI) and machine translation (MT) fields. She was never able to lay adequate claim to them because they were unacceptable when she published them, and so when they were written up later by her students or independently ‘discovered’ by others, there was no trace back to her, especially in these fields where little or nothing over ten years old is ever reread. Part of the problem, though, lay in herself: she wrote too well, which is always suspicious in technological areas. Again, she was a pupil of Wittgenstein, and a proper, if eccentric, part of the whole Cambridge analytical movement in philosophy, which meant that it was always easier and more elegant to dissect someone else’s ideas than to set out one’s own in a clear way. She therefore found her own critical articles being reprinted (e.g. chapter 11, below) but not the work she really cared about: her theories of language structure and processing. The core of her beliefs about language processing was that it must reflect the coherence of language, its redundancy as a signal. This idea was a partial inheritance from the old ‘information theoretic’ view of language: for her, it meant that processes analysing language must take into account its repetitive and redundant structures, and that a writer goes on saying the same thing again and again in different ways; only if the writer does that can the ambiguities be removed from the signal. This sometimes led her to overemphasise the real and explicit redundancy she would find in rhythmical and repetitive verse and claim, implausibly, that normal English was just like that if only we could see it right. 
This led in later years to the key role she assigned to rhythm, stress, breath groupings and the boundaries they impose on text and the processes of understanding. To put it crudely, her claim was that languages are the way they are, at least in part, because they are produced by creatures that breathe at fairly regular intervals. It will be obvious why such claims could not even be entertained while Chomsky’s views were pre-eminent in language studies. But she could never give systematic surface criteria by which the breath-groups and stress patterns were to be identified by surface cues, or could be reduced to other criteria such as syntax or morphology, nor would she become involved in the actual physics of voice patterns. Her views on the importance of semantics in language processing (which she continued to defend in the high years of Chomskyan syntax between 1951 and 1966) were much influenced by Richens’ views on classification and description by means of a language of semantic primitives with its own syntax. These, along with associated claims about semantic pattern matching onto surface text, were developed in actual programs, from which it might be assumed that she was a straightforward believer in the existence of semantic primitives in some Katzian or Schankian sense. Nothing could be further from the truth: for she was far too much a Wittgensteinian sceptic about the ability of any limited sublanguage or logic to take on the role of the whole language. She always argued that semantic primitives would only make sense if there were empirical criteria for their discovery and a theory that allowed for the fact that they, too, would develop exactly the polysemy of any higher or natural language; and she always emphasised the functional role of primitives in, for example, resolving sense ambiguity and as an interlingua for MT.
She hoped that the escape from the problem of the origin of semantic primitives would lie in either empirical classification procedures operating on actual texts (in the way some now speak of deriving primitives by massive connectionist learning), or by having an adequate formal theory of the structure of thesauri, which she believed to make explicit certain underlying structures of the semantic relations in a natural language: a theory such that ‘primitives’ would emerge naturally as the organising classification of thesauri. For some years, she and colleagues explored lattice theory as the underlying formal structure of such thesauri. Two other concerns that went through her intellectual life owe much to the period when Michael Halliday, as the University Lecturer in Chinese at Cambridge, was a colleague at CLRU. She got from him the idea that syntactic theory was fundamentally semantic or pragmatic, in either its categories and their fundamental definition, or in terms of the role of syntax as an organising principle for semantic information. She was the first AI researcher to be influenced by Halliday, long before Winograd and Mann. Again, she became preoccupied for a considerable period with the nature and function of Chinese ideograms, because she felt they clarified in an empirical way problems that Wittgenstein had wrestled with in his so-called picture-theory-of-truth. This led her to exaggerate
the generality of ideogrammatic principles and to seem to hold that English was really rather like Chinese if only seen correctly, with its meaning atoms, being highly ambiguous and virtually uninflected. It was a view that found little or no sympathy in the dominant linguistic or computational currents of the time. Her main creation, one that endured for twenty years, was the Cambridge Language Research Unit, which grew out of an informal discussion group with a very heterogeneous membership interested in language from philosophical and computational points of view. Subsequently, the attempt to build language-processing programs that had a sound philosophical basis was a distinctive feature of the unit’s work. This approach to language processing, and the specific form it took in the use of a thesaurus as the main vehicle for semantic operations, will probably come to be seen as the unit’s major contributions to the field as a whole, and it was Margaret who was primarily responsible for them. Her vision of language processing and its possibilities was remarkable at a time when computers were very rudimentary: indeed, much of the CLRU’s work had to be done on the predecessors of computers, namely Hollerith punched-card machines. Equally, Margaret’s determination in establishing and maintaining the unit, with the enormous effort in fundraising that this involved, was very striking: the fact that it could continue for decades, and through periods when public support for such work was hard to come by, is a tribute to Margaret’s persistence and charm. It is difficult for us now, in these days of artificial intelligence in the ordinary market-place, and very powerful personal computers, to realise how hard it was to get the financial resources needed for language-processing research, and the technical resources to do actual experiments. 
Perhaps the best comment on Margaret’s initiative in embarking on language-processing research, and specifically on machine-translation work, comes from a somewhat unexpected source. Machine translation, after an initial period of high hopes, and some large claims, was cast into outer darkness in 1966 by funding agencies who saw little return for their money. Reviewing twenty-five years of artificial-intelligence research in his presidential address to the American Association for Artificial Intelligence in 1985, Woody Bledsoe, one of the longstanding leaders of the field, though in areas quite outside language, said of those who attempted machine translation in the fifties and sixties: ‘They may have failed, but they were right to try; we have learned so much from their attempts to do something so difficult’. What MMB and CLRU were trying to do was far ahead of their time. Efforts were made to tackle fundamental problems with the computers of the day that had the capacity of a modern digital wristwatch. Despite every
kind of problem, the unit produced numerous publications on language and related subjects, including information retrieval and automatic classification. For over ten years the unit’s presence was strongly felt in the field, always with an emphasis on basic semantic problems of language understanding. Margaret had no time for those who felt that all that needed doing was syntactic parsing, or that complete parsing was necessary before you did anything else. Now that the semantics of language are regarded as a basic part of its understanding by machine, the ideas of CLRU seem curiously modern. Margaret’s main contribution to the life of CLRU was in the continual intellectual stimulus she gave to its research, and through this to the larger natural language processing community: she had wide-ranging concerns, and lateral ideas, which led her, for example, to propose the thesaurus as a means of carrying out many distinct language-processing tasks, like indexing and translation. Margaret’s emphasis on algorithms, and on testing them, was vital for the development of CLRU’s work on language processing; but her ideas were notable, especially for those who worked with her, not just for their intellectual qualities, but for their sheer joyousness. Her colleagues and students will remember her for her inspiration, rather than her written papers: she made questions of philosophy and language processing seem closely related and, above all, desperately important. On their joint solutions hung the solutions of a range of old and serious questions about life and the universe. In this, as so much else, she was a Wittgensteinian but, unlike him, she was optimistic and believed that, with the aid of the digital computer, they could be solved. She could not only inspire and create, but terrify and destroy: she had something of the dual aspects of Shiva, an analogy she would have appreciated. 
Even in her seventies, and still funded by European Commission grants, her hair still black because a gypsy had told her forty years before that it would not go grey if she never washed it, she would rise, slowly and massively at the end of someone’s lecture, bulky in her big, belted fisherman’s pullover, to attack the speaker, who would be quaking if he had any idea what might be coming. The attack often began softly and slowly, dovelike and gentle, gathering speed and roughness as it went. As some readers may remember, there was no knowing where it would lead.
2. Themes in the work of Margaret Masterman
In this introductory chapter I shall seek to reintroduce and then focus the work of Margaret Masterman by enumerating and commenting briefly on a number of themes in her work. Some of these have been successful, in the sense of appearing, usually rediscovered, in some established place in the
field of natural language processing, while others, it must be said, appear to have failed, even though they remain highly interesting. This last is a dangerous claim of course, one that can be reversed at any time. There is in my view a third category, of general programmes rather than particular representational methods, about which one can only say that they remain unproven. In spite of the breadth, scope and originality of her ideas, it must also be conceded that Margaret Masterman did not have theories to cover all aspects of what would be considered the core issues of computational linguistics today: for example, she had little or nothing to say on what would now be called text theory or pragmatics. Nor did she have any particular reason for ignoring them, other than that she thought the problems that she chose to work on were in some sense the most fundamental. The order of the themes corresponds broadly to that of the sections of this book: it moves from abstract concepts towards more specific applications of those concepts, from particular forms to language itself, on which those forms imposed the coherence and redundancy that she believed to be at the core of the very idea of language. I shall continue here the affectionate tradition of referring to her as MMB, the initials of her married name Margaret Masterman Braithwaite.
2.1. Ideograms
This was an early interest of MMB’s (Masterman, 1954 and Chapter 1) that persisted throughout her intellectual life: the notion that ideograms were a fundamental form of language and were of non-arbitrary interpretation. The root of this idea lay in Wittgenstein’s interest (1922) in how pictures could communicate: in how the drawing of an arrow could convey movement or pointing and, before that, in his so-called Picture Theory of Truth, where objects could be arranged to express facts. More particularly, she must almost certainly have been influenced by his Notebooks 1914–1916, where he writes, ‘Let us think of hieroglyphic writing in which each word is a representation of what it stands for’. The connection of all this to ideograms had been noted by I. A. Richards, who was much preoccupied by Chinese, and who developed English Through Pictures (Richards and Gibson, 1956), a highly successful language-teaching tool. MMB came to Chinese through Michael Halliday, then a Cambridge University lecturer in Chinese, and began to use stick-pictures as representations of situations which could also provide a plausible referential underpinning for language: something universal, and outside the world of the language signs themselves, yet which did not fall back on the naive referentialism of those who said that the meanings of words were things or inexpressible concepts.
Frege (new translation, 1960) had tackled this issue long before and created a notation in which propositions had a sense, but could only refer to the true or the false (at which point all differences between them, except truth value, were lost). This reference to situations, which MMB helped keep alive, has found formal expression again in Barwise and Perry’s Situation Semantics (1983). They, too, wanted a central notion of a situation as what an utterance points to, and they too resort to cartoon-like pictures but, unlike MMB, nowhere acknowledge the role of Wittgenstein’s Picture Theory of Truth. It is as hard to capture the future in this field as in any other, and the movement of a (partially) ideogrammatical language like Japanese to centre stage in language processing may yet show the importance of ideograms for grasping the nature of language. But whatever is the case there, MMB’s interest remained not only in the differences in the ways occidental and the main oriental languages represent the world, but also in the ways those differences reflect or condition basic thought: she liked to quote a phrase of Whitehead’s that our logic would have been better based on the Chinese than the Greeks.
2.2. Lattices and Fans
Although not a formalist herself, and considered an anti-formalist by many, MMB nevertheless believed passionately in the applicability of mathematical techniques to natural language; without them, she believed, there would be nothing worthy of the name of theory or science. What she was opposed to was the assumption that formal logic, in particular, could be applied directly to natural language, and she would not concede much distinction between that and the methods of Chomsky (1965), a position that has some historical justification. The two structures from which she hoped for most were lattices and ‘fans’, a notion she derived from some work of Brouwer (1952). MMB believed lattices (Masterman, 1959 and Chapter 3) to be the underlying structure of thesauri, and fans (Masterman, 1957b and Chapter 2), she believed, mapped the spreading out of the new senses of words, indefinitely into the future. She spent some time trying to amalgamate both representations into a single structure. These efforts have not met with much success nor have they been taken up by others, although Zellig Harris did at one time toy with lattices as language structures, and Mellish (1988) has sought to link lattice structures again to Halliday’s categories of grammar and semantics. Another problem is that fans are too simple to capture much: they have no recursive structure. And lattices are so restrictive: once it is conceded
that neither words nor things fall neatly under a taxonomic tree structure, it by no means follows that they fall under a graph as restricted as a lattice either. More promising routes have been found through more general applications of the theory of graphs, where the constraints on possible structures can be determined empirically rather than a priori.
2.3. Thesauri and the use of large-scale language resources
MMB believed thirty years ago that constructed entities like dictionaries and thesauri (especially the latter) constituted real resources for computational language processing (Masterman, 1956, 1958 and Chapters 4 and 6, respectively). That was at a time when any computational operations on such entities were often dismissed, by those working in other areas of computational linguistics, as low-grade concordance work. Betty May compacted the whole of Roget’s Thesaurus for MMB, from a thousand ‘heads’ to eight hundred, and had it put onto punched cards. That formed the basis for a range of experiments on Hollerith sorting machines, which contributed to Karen Spärck Jones’ seminal thesis work Synonymy and Semantic Classification (1964, 1986). MMB believed that thesauri like Roget were not just fallible human constructs but real resources with some mathematical structure that was also a guide to the structures with which humans process language. She would often refer to ‘Roget’s unconscious’ by which she meant that the patterns of cross-references, from word to word across the thesaurus, had generalisations and patterns underlying them. In recent years there has been a revival of interest in computational lexicography that has fulfilled some of MMB’s hopes and dreams. It has been driven to some extent by the availability from publishers of machine-readable English Dictionaries, like LDOCE and COBUILD, with their definitions written in a semi-formal way, one that makes it much easier for a computational parser to extract information from them. But the initial work in the current wave was done by Amsler (1980) at Texas using Webster’s, an old-fashioned dinosaur of a dictionary. He developed a notion of ‘tangled hierarchies’, which captures the notion MMB promoted to get away from straightforward tree-like hierarchies.
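The contrast between a tree-like hierarchy and a structure in which words are cross-referenced across several heads can be made concrete with a small sketch. The heads and word lists below are invented for illustration and are not taken from Roget or from the CLRU card deck; the function names are likewise hypothetical. The sketch assumes only that a thesaurus can be modelled as a mapping from heads to sets of words.

```python
from collections import defaultdict

# Illustrative data only: these heads and word lists are invented for the
# example and are not taken from Roget's Thesaurus or the CLRU punched cards.
HEADS = {
    "machine": {"crane", "winch", "pulley"},
    "bird": {"crane", "heron", "stork"},
    "elevation": {"crane", "hoist", "lift"},
}


def heads_of(heads):
    """Invert head -> words into word -> set of heads that list the word."""
    index = defaultdict(set)
    for head, words in heads.items():
        for word in words:
            index[word].add(head)
    return dict(index)


def cross_references(heads):
    """Map each pair of heads to the words they share (the cross-references)."""
    links = defaultdict(set)
    for word, owners in heads_of(heads).items():
        ordered = sorted(owners)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                links[(first, second)].add(word)
    return dict(links)


def is_tree_like(heads):
    """True only if every word sits under exactly one head."""
    return all(len(owners) == 1 for owners in heads_of(heads).values())
```

On this toy data `is_tree_like(HEADS)` is false, because one word is listed under three heads; the pairs returned by `cross_references` are one simple way of looking at the kind of word-to-word pattern the passage describes, though nothing here should be read as the procedure actually used at CLRU.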
Current centres for such work include Cambridge, Bellcore, IBM-New York, Waterloo, Sheffield and New Mexico, where it has been carried out by a number of techniques, including searching for taxonomic structures, by parsing the English definitions in the dictionary entries, and by collocational techniques applied to the word occurrences in the entries themselves. This last normally involves the construction in a computer of very large matrices, as foreseen in the earlier work of Spärck Jones. Those matrices can now be computed effectively with modern machines in a way that was virtually impossible twenty-five years ago. Although dictionaries and thesauri are in some sense inverses of each other, they also differ importantly in that dictionaries are written in words that are themselves sense-ambiguous, except, that is, for those entries in a dictionary that are written as lists of semi-synonyms (as when, for example, ‘gorse’ is defined as ‘furze’ and vice-versa). One of the major barriers to the use of machine-readable dictionaries has been the need to resolve those lexical ambiguities as the dictionary itself is parsed, which is to say, transformed by computer into some more formal, tractable structure. MMB was more concerned with thesauri than dictionaries as practical and intellectual tools, and they do not suffer from the problem in the same way. Words in a thesaurus are also ambiguous items, but their method of placement determines their sense in a clearer way than in a dictionary: the item ‘crane’, for example, appears in a thesaurus in a list of machines, and therefore means a machine at that point and not a bird. The name ‘machine’ at the head of the section can thus straightforwardly determine the sense of items in it. Yarowsky (1992) returned to Roget as a basis for his fundamental work on large-scale word sense discrimination. However, the last ten years have seen the Princeton WordNet (Miller et al. 1990) take over from dictionaries like LDOCE as the most-used linguistic-semantic resource. WordNet is a classic thesaurus, made up from scratch but with a powerful indexing mechanism and a skeletal set of categories and relations replacing the Roget 1,000 heads.
2.4. The use of interlinguas
MMB was much associated with the use of interlinguas (or universal languages for coding meaning) for MT and meaning representation (Masterman, 1967 and Chapter 7), and her reply to Bar-Hillel’s criticism (1953) of their use has been much quoted. The notion of a uniform and universal meaning representation for translating between languages has continued to be a strategy within the field: it had a significant role in AI systems like conceptual dependency (Schank 1975) and preference semantics (Wilks 1975a), and is now to be found in recent attempts to use Esperanto as an interlingua for MT. MMB’s own view was heavily influenced by the interlingua NUDE (for naked ideas or the bare essentials of language) first created by R. H. Richens at Cambridge for plant biology: in a revised form it became the interlingua with which CLRU experimented. NUDE had recursively constructed bracketed formulas made up from an inventory of semantic primitives, and the formulas expressed the meaning of word senses on
English. Karen Spärck Jones worked on making NUDE formulas less informal, and defining the syntactic form of those entries was one of my own earliest efforts, so that a revised form of NUDE became my representational system for some years. In that system some of Richens’ more ‘prepositional’ primitives had their function merged with what were later to become case labels, in the sense of Fillmore’s Case Grammar (1968); for example, Richens’ TO primitive functioned very much like Fillmore’s Destination Case. However, MMB’s attitude to these primitives was very unlike that of other advocates of conceptual primitives or languages of thought: at no point did she suggest, in the way that became fashionable later in Cognitive Science, that the primitive names constituted some sort of language in the mind or brain (Fodor’s view, 1975) or that, although they appeared to be English, the primitives like MOVE and DO were ‘really’ the names of underlying entities that were not in any particular language at all. This kind of naive imperialism of English has been the bane of linguistics for many years, and shows, by contrast, the far greater sophistication of the structuralism that preceded it. MMB was far too much the Wittgensteinian for any such defence of primitive entities, in this as in other matters: for her, one could make up tiny toy languages to one’s heart’s content (and NUDE was exactly a toy language of 100 words) but one must never take one’s language game totally seriously (linguists forgot this rule). So, for her, NUDE remained a language, with all the features of a natural one like English or French, such as the extensibility of sense already discussed.
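The idea of recursively bracketed formulas built from a fixed inventory of primitives, as described for NUDE above, can be pictured with a small sketch. It is only an illustration and not a reconstruction of NUDE itself: MOVE, DO and TO are the primitives named in the text, while MAN, THING, the sample formula and the function names `is_well_formed` and `show` are invented for the example.

```python
# Illustrative sketch only: a NUDE-like formula is modelled as a nested tuple
# whose leaves must all come from a small inventory of primitives.
# MOVE, DO and TO are named in the text; MAN and THING are invented here.
PRIMITIVES = {"MOVE", "DO", "TO", "MAN", "THING"}


def is_well_formed(formula):
    """True if every leaf is a known primitive and no bracket is empty."""
    if isinstance(formula, str):
        return formula in PRIMITIVES
    return len(formula) > 0 and all(is_well_formed(part) for part in formula)


def show(formula):
    """Render a formula in bracketed notation."""
    if isinstance(formula, str):
        return formula
    return "(" + " ".join(show(part) for part in formula) + ")"


# A hypothetical formula for one sense of a verb of motion; the bracket
# (TO THING) plays a role comparable to a destination case label.
go_sense = ("MAN", ("MOVE", ("TO", "THING")))

print(show(go_sense))                        # (MAN (MOVE (TO THING)))
print(is_well_formed(go_sense))              # True
print(is_well_formed(("MAN", "TELEPHONE")))  # False: not in the inventory
```

Because the inventory is just a set of names, extending the toy language means adding a word to it, which is in keeping with the view that such an interlingua is itself a small language and not a privileged code.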
That tactic avoided all the problems of how you justify the items and structure of a special interlingual language that are claimed to be universal, or brain-embedded, of course, but produced its own problems such as that of what one has achieved by reducing one natural language to another, albeit a smaller and more regular one. This, of course, is exactly the question to be asked of the group proposing Esperanto as an interlingua for MT. She would put such questions forcefully to those in CLRU who showed any sign of actually believing in NUDE as having any special properties over and above those of ordinary languages, a possibility she had herself certainly entertained: this was the technique of permanent cultural revolution within an organisation, known to Zen Buddhists, and later perfected by Mao Tse-tung. MMB believed that such interlinguas were in need of some form of empirical justification and could not be treated as unprovable and arbitrary assumptions for a system, in the way Katz (1972) had tried to do by arguing by analogy from the role of assumed ‘axiomatic’ entities in physics
like photons or neutrons. One weak form of empirical support that was available was the fact that statistics derived from dictionaries showed that the commonest defining words in English dictionaries (exempting ‘a’ and ‘the’ and other such words) corresponded very closely indeed for the first 100 or so items to the primitives of NUDE. But MMB wanted something more structural than this and spent considerable time trying to associate the NUDE elements with the classifying principles of the thesaurus itself, which would then link back to the distributional facts about texts that the thesaurus itself represented. In this, as in other ways, MMB had more intuitive sympathy with the earlier distributional or structural linguistics, like that of Zelig Harris, than with the more apparently mathematical and symbolic linguistics of Chomsky and his followers.
2.5.
The centrality of machine translation as a task
There is no doubt that MT has become in recent years a solvable task, at least for some well-specified needs, sometimes by the use of new representational theories, but more usually by means of better software engineering techniques applied to the old methods. Merely doing that has yielded better results than could have been dreamed of two decades ago. MMB must be credited with helping to keep belief in MT alive during long years of public scepticism, and above all with the belief that MT was an intellectually challenging and interesting task (Masterman, 1967, 1961; Chapters 6 and 8, respectively). I think that is now widely granted, although it was not conceded within artificial intelligence, for example, until relatively recently. There it was still believed that, although language understanding required inference, knowledge of the world and processing of almost arbitrary complexity, MT did not: for it was a task that required only superficial processing of language. I think that almost everyone now concedes that that view is false. What MMB sought was a compromise system of meaning representation for MT: one that was fundamental to the process of translation, but did not constitute a detailed representation of all the relevant knowledge of the world. She believed there was a level of representation, linguistic if you will, probably vague as well, but that was sufficient for MT and, in that sense, she totally denied the assumption behind Bar-Hillel’s (1953) critique of MT, and which was taken up by some artificial intelligence researchers afterwards (though not, of course, the same ones as referred to in the last paragraph), that MT and language understanding in general did require the explicit representation of all world knowledge. This position of hers cannot be separated from her quasi-idealist belief that world knowledge cannot be represented independently of some language, and hence any true
distinction between meaning representation and the representation of world knowledge is, ultimately, misconceived (see her discussion of Whorf in Masterman 1961 and Chapter 8). The only dispute can be about the ‘level’ or ‘grain’ of representation that particular acts of translation require. In later years she became highly critical of the large EUROTRA machine translation project funded by the European Commission, and surprisingly sympathetic to the old-fashioned American MT system SYSTRAN that she had criticised for many years as naive. This was partly, I think, because she came to see the vital role of dictionaries for practical MT, a matter that was clear in the development of SYSTRAN, but not (at that time at least) in the linguistic theories that drove EUROTRA. In a 1979 letter to Margaret King, MMB wrote:
My stance hasn’t changed that EUROTRA has got to get clear of the TAUM approach [the French logical paradigm that underlay early EUROTRA work, Ed.], and to have a major revolution over dictionaries. But there is one question nobody ever asks me, How would you feel if EUROTRA was a triumphant success? Answer; absolutely delighted.
2.6.
Parsing text by semantic methods
A major concern of MMB’s was always how to transform, or parse (Masterman, 1968b and Chapter 10), written English into a machine representation for MT. She believed that such a representation should be fundamentally semantic in nature (i.e. based on meaning rather than syntax) and that those semantic structures should be used in the parsing process itself. The latter view was highly original, since virtually no one had ever proposed such a thing – that doctrine is now known as ‘semantic parsing’, and is well-known even if not as fashionable as it was ten years ago – and espousing it certainly set MMB apart from the prevailing syntactic approaches of her time. Some contemporary clarification will be needed in later commentary on this point, since the meaning of the word ‘semantics’ has changed yet again in recent years. Let us simply add here that ‘semantic’ as used by MMB in this connection cannot be equated with either its use in ‘semantic grammar’ (e.g. Burton 1978) to mean parsing by the use of particular word-names as they occur in text (e.g. as in a program that knew what words would probably follow ‘electrical’), nor with its currently dominant use in formal, logical semantics, to which we shall return in a moment. One of MMB’s main motivations for her view was that natural languages are highly ambiguous as to word sense, and that fact had been systematically ignored in computational language processing. She went further, and this was again influence from Wittgenstein, and held that they
were infinitely or indefinitely ambiguous, and that only criteria based on meaning could hope to reduce such usage to any underlying machine-usable notation. This emphasis set her off not only from those advocating syntactic parsing methods but also from any approach to meaning representation based on a formal logic, including any claim to deal with meaning by the use of set-theoretic constructs that never took any serious account of the ambiguity of symbols. Historically, MMB was vindicated by the growth of semantic parsing techniques during her lifetime and, although syntactic methods have recently recovered the initiative again, one can be pretty sure the pendulum will continue to swing now it is in motion.
In recent years, since the work of Montague, there has been an enormous revival of formal philosophical semantics for natural language, in the sense of set- and model-theoretic methods, that ignore exactly those ambiguity aspects of language that MMB thought so important. Indeed, for many theorists ‘semantics’ has come to mean just that kind of work, a development MMB abhorred, not because she did not want a philosophical basis for theories of language, on the contrary, but because she did not want that particular one. Formal semantics approaches have not yet proved very computationally popular or tractable, and the verdict is certainly not available yet for that struggle. It is worth adding that for other languages, particularly Japanese, MT researchers have continued to use semantic parsing methods, arguing strongly that such methods are essential for an ‘implicit’ language like Japanese, where so much meaning and interpretation must be added by the reader and is not directly cued by surface items.
2.7.
Breath groups, repetition and rhetoric
These were three related notions that preoccupied MMB for much of her last twenty years, but which have not in my view yet proved successful or productive, and certainly not to MT where she long sought to apply them. This line of work began when she met Guberina, the Yugoslav therapist who managed to reclaim profoundly deaf persons. From him, MMB developed a notion she later called the Guberina Hypothesis (Masterman, 1963 and Chapter 9), to the effect that there were strong rhythms underlying language production and understanding (that could be grasped even by the very deaf), and that these gave a clue to language structure itself. From this she developed the notion of a ‘breath group’, corresponding to the chunk of language produced in a single breath, and that there was therefore a phrasing or punctuation in spoken language, one that left vital structural traces in written language too, and could be used to access its content by computer. Much time was spent in her later years
designing schemes by which the partitions corresponding to idealised spoken language could be reinserted into written text. From there MMB added the notion that language, spoken and written, was fundamentally more repetitive than was normally realised, and that the points at which the repetition could be noted or cued were at the junctions of breath groups. This notion was linked later to the figures of traditional Greek rhetoric, in which highly repetitive forms do indeed occur, and with the claim that the forms of repetition in text could be classified by traditional rhetorical names. MMB produced an extensive repertoire of language forms, partitioned by breath groups, and with their repetitions marked: a simple standard example would be ‘John milked the cows/and Mary the goats’, which was divided into two breath groups as shown by the slash, at the beginnings and ends of which were items of related semantic type (John/Mary, cows/goats). Traditional forms of language such as hymns, biblical passages and the longer narrative poets were a rich source of examples for her.
The problem with all this was that it required the belief that all text was fundamentally of a ritual, incantatory, nature, if only one could see it, and most people could not. The breath group notion rested on no empirical research on breath or breathing, but rather on the observation that language as we know it is the product of creatures that have to breathe, which fact has consequences even for written text. This last is true and widely accepted, but little that is empirical follows from it. What is agreed by almost all linguists is that spoken language is, in every way, prior to written. Again, there is agreement among some that the phrase is an under-rated unit, and language analysis programs have certainly been built that incorporate a view of language as a loose linear stringing together of phrases, as opposed to deep recursive structures.
Some support for that view can be drawn from the classic psychological work (the so-called ‘click’ effect) that shows that sounds heard during listening to text seem to migrate to phrase boundaries. But none of this adds up to any view that language processing requires, or rests on, the insertion of regular metrical partitions carrying semantic import. Again, the claims about repetition and rhetoric can be seen as an extension of a more general, and certainly true, claim that language is highly redundant, and that the redundancy of word use allows the ambiguity of word sense meaning to be reduced. Programs have certainly been written to resolve semantic ambiguity by matching structured patterns against phrase-like groups in surface text: my own early work did that (e.g. Wilks 1964, 1965a), and it owed much to MMB’s work on semantic message detection. However, the partitions within which such patterns were matched were found by much more mundane processes such as
keywords, punctuation and the ends of phrases detected syntactically (e.g. noun phrase endings). The oddest feature of MMB’s breath-group work, stretching as it did over many years, was that it referred constantly to breathing, but nothing ever rested on that: partitions were always inserted into text intuitively in a way that, to me at least, corresponded more naturally to the criteria just listed (keywords, punctuation etc.). Finally, of course, it would be overbold to assert that there will never be applications of Greek rhetorical figures to the computer understanding of natural language, but none have as yet emerged, except their explicit and obvious use as forms of expression. However, in all this one must accept that MMB was one of the few writers on language who took it for granted that the fact it was produced directionally in time was of some fundamental importance. One can see this, from time to time (as in the work of Hausser, 1999) emerge as a key observation requiring structural exploration, but in most theorising about language, such as the transformational-generative movement, this is never mentioned.
2.8.
Metaphor as normal usage
The claim that metaphor is central to the processes of language use is one now widely granted in natural language processing and artificial intelligence, even if there are few systems that know how to deal with the fact computationally, once it is granted. MMB always maintained that position (Masterman, 1961, 1980 and Chapters 8 and 11, respectively), and the recent rise of ‘metaphor’ as an acceptable study within language processing is some tribute to the tenacity with which she held it. For her it followed naturally from the ‘infinite extensibility’ of language use, the majority of which extensions would be, at first at least, metaphorical in nature. It was one of her constant complaints that Chomsky had appropriated the phrase ‘creativity’, by which he meant humans’ ability to produce new word strings unused before, while paying no attention, indeed positively deterring study, of aspects of language she considered universal and genuinely creative. Work such as Fass (1988), Carbonell (1982) and Wilks (1978) carried on her view of metaphor explicitly. MMB would also welcome anecdotal evidence, of the sort to be found in the work of Cassirer, that metaphorical uses of language were in some historical sense original, and not a later accretion. She rejected the view that language originally consisted of simple, unambiguous, Augustinian names of objects, the view parodied by Wittgenstein (1958, 1972) in the opening of the Philosophical Investigations, and preferred the idea of original primitive atoms of wide, vague, unspecific meaning, which were then both refined to specific referents in use and constantly extended by metaphor.
Here, for MMB, was the root not only of metaphor but also of metaphysics itself, which consisted for her, as for Wittgenstein, of words used outside their hitherto normal realm of application. But whereas he thought that words were ‘on holiday’ when so used, for her it was part of their everyday work. Her critical paper on Kuhn’s theory of scientific paradigms (Chapter 11) is an attempt to defend the originality of her own husband (Richard Braithwaite), but what she actually does is to deploy the techniques developed in the chapters of this book as tools to investigate scientific metaphor and analogy empirically, using methods drawn from language processing. This was a wholly original idea, not to surface again until the artificial intelligence work of Thagard (1982).
2.9.
Information retrieval, empiricism and computation
A strong strand in CLRU’s work was information retrieval (IR): Parker-Rhodes and Needham (1959) developed the Theory of Clumps, and Spärck Jones (1964) applied this theory to reclassify Roget’s thesaurus using its rows as features of the words in them. MMB touches on IR in more than one of the papers in this volume and she could see what almost no one could at that time, and which many in today’s empirical linguistics believe obvious, namely that IR and the extraction of content from texts are closely connected. She believed this because she thought that IR would need to take on board structural insights about language and not treat texts as mere ‘bags of words’, and it is not yet totally clear which view of that issue will ultimately triumph (see Strzalkowski 1992).
Much of CLRU’s theoretical IR work could not be tested in the 1960s: large matrices could not be computed on punched card machines and an ICL 1202 computer with 2040 registers on a drum! It is easy to imagine, looking back, that researchers like MMB guessed that computers would expand so rapidly in size and power, so that the supercomputer of ten years ago is now dwarfed by a desktop workstation. But I suspect that is not so and virtually no one could envisage the way that quantitative changes in machine power would transform the quality of what could be done, in that (once) plainly impossible methods in language processing now seem feasible. It is this transformation that makes it all the more striking that MMB’s ideas are still of interest and relevance, since so much has fallen by the wayside in the rush of machine power.
2.10.
The overarching goal: a Wittgensteinian computational linguistics
There is no doubt that MMB wanted her theories of language to lead to some such goal: one that sought the special nature of the coherence that
holds language use together, a coherence not captured as yet by conventional logic or linguistics. Such a goal would also be one that drew natural language and metaphysics together in a way undreamed of by linguistic philosophers, and one in which the solution to problems of language would have profound consequences for the understanding of the world and mind itself. And in that last, of course, she differed profoundly from Wittgenstein himself, who believed that the consequence could only be the insight that there were no solutions to such problems, even in principle. It is also a goal that some would consider self-contradictory, in that any formalism that was proposed to cover the infinite extensibility of natural language would, almost by definition, be inadequate by Wittgenstein’s own criteria, and in just the way MMB considered Chomsky’s theories inadequate and his notion of generativity and creativity a trivial parody. The solution for her lay in a theory that in some way allowed for extensibility of word sense, and also justified ab initio the creation of primitives. This is a paradox, of course, and no one can see how to break out of it at the moment: if initially there were humans with no language at all, not even a primitive or reduced language, then how can their language when it emerges be represented (in the mind or anywhere else) other than by itself? It was this that drove Fodor (1975) to the highly implausible, but logically impeccable, claim that there is a language of thought predating real languages, and containing not primitives but concepts as fully formed as ‘telephone’. This is, of course, the joke of a very clever man, but it is unclear what the alternatives can be, more specifically what an evolutionary computational theory of language can be. It is this very issue that the later wave of theories labelled ‘connectionist’ (e.g. Sejnowski and Rosenberg, 1986) sought to tackle: how underlying classifiers can emerge spontaneously from data by using no more than association and classification algorithms. MMB would have sympathised with its anti-logicism, but would have found its statistical basis only thin mathematics, and would have not been sympathetic to its anti-symbolic disposition.
It is easier to set down what insights MMB would have wanted to see captured within a Wittgensteinian linguistics than to show what such a theory is in terms of structures and principles. It would include that same ambiguous attitude that Wittgenstein himself had towards language and its relation to logic: that logic is magnificent, but no guide to language. If anything, the reverse is the case, and logic and reasoning itself can only be understood as a scholarly product of language users: language itself is always primary. It is not clear to me whether MMB extended that line of argument to mathematics: I think that she had an exaggerated respect for it, one not based on any close acquaintance, and which for her exempted it
from that sort of observation, so that she was able to retain her belief that a theory of language must be mathematically, though not logically, based. Her language-centredness led her to retain a firm belief in a linguistic level of meaning and representation: she shared with all linguists the belief that language understanding could not be reduced, as some artificial intelligence researchers assume, to the representation of knowledge in general, and independent of representational formalisms (a contradiction in terms, of course), and with no special status being accorded to language itself. Indeed, she would have turned the tables on them, as on the logicians, and said that their knowledge representation schemes were based in turn on natural languages, whether they knew it or not. As to the current concern with a unified Cognitive Science, I think her attitude would have been quite different from those who tend to seek the basis of it all in psychology or, ultimately, in brain research. Chomskyans have tended to put their money on the latter, perhaps because the final results (and hence the possible refutations of merely linguistic theories) look so far off. MMB had little time for psychology, considering it largely a restatement of the obvious, and would I think have argued for a metaphysically rather than psychologically orientated Cognitive Science. Language and metaphysics were, for her, closely intertwined, and only they, together, tell us about the nature of mind, reasoning and, ultimately, the world. She would have liked Longuet-Higgins’ remark, following Clausewitz, that artificial intelligence is the continuation of metaphysics by other means.
Part 1
Basic forms for language structure
1
Words
To the question ‘What is a word?’ philosophers usually give, in succession (as the discussion proceeds), three replies:
1. ‘Everybody knows what a word is.’
2. ‘Nobody knows what a word is.’
3. ‘From the point of view of logic and philosophy, it doesn’t matter anyway what a word is, since the statement is what matters, not the word.’
In this paper I shall discuss these three reactions in turn, and dispute the last. Since it is part of my argument that the ways of thinking of several different disciplines must be correlated if we are to progress in our thinking as to what a word is, I shall try to exemplify as many differing contentions as possible by the use of the word ward, since this word is a word which can be used in all senses of ‘word’, which many words cannot.
Two preliminary points about terminology need to be made clear. I am using the word ‘word’ here in the type sense as used by logicians, rather than in the token sense, as synonymous with ‘record of single occurrence of pattern of sound-waves issuing from the mouth’. Thus, when I write here ‘mouth’, ‘mouth’, ‘mouth’, I write only one word. The second point is that I use in this paper, in different senses, the terms ‘Use’, ‘usage’ and ‘use’. The question as to how the words ‘usage’ and ‘use’ should be used is, as philosophers know, a very thorny one. Here I am using ‘use’ in two senses, and ‘usage’ in one. Thus in this paper, the Use of a word is its whole field of meaning, its total ‘spread’.¹ Its usages, or main meanings in its most frequently found contexts, together make up its Use; but each of these usages must be conceived of as itself occurring in various temporary contexts, or uses, in each of which it will have a slightly different shade of meaning from what it will have when occurring in any other context or use.
¹ ‘Spread’ is here used linguistically, not mathematically.
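The type/token distinction drawn above can be made concrete with a short sketch. It assumes, purely for illustration, that a token is a whitespace-separated string and that tokens of the same type are those spelled identically after lower-casing; the helper name `tokens` is invented, and nothing in the text commits to so crude a criterion.

```python
from collections import Counter


def tokens(text):
    """Crude stand-in for word occurrences: lower-case, split on whitespace."""
    return text.lower().split()


written = "mouth mouth mouth"
occurrences = tokens(written)   # each occurrence is one token
types = Counter(occurrences)    # identical spellings collapse into one type

print(len(occurrences))  # 3 tokens
print(len(types))        # 1 type: 'only one word' in the sense used here
```

Spelling identity is exactly the criterion the rest of the chapter puts under pressure, since it would treat every occurrence of ward as the same word whatever its usage.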
1.
‘Everybody knows what a word is.’
The naive idea of ‘word’ is ‘dictionary-word’; as the Oxford Dictionary puts it, ‘(a word is) . . . an element of speech; a combination of vocal sounds, or one such sound, used in a language to express an idea (e.g., to denote a thing, attribute or relation) and constituting an ultimate of speech having a meaning as such’. The philosopher’s invariable reaction to this definition is that of Humpty Dumpty in Through the Looking Glass: that is, that it sounds all right (said rather doubtfully) but that, of course, there is not time to go into it just now. One thing we could all agree on, philosophers and plain men in the street alike: that for practical purposes this definition is all right, provided that it is used ostensively, with the Oxford Dictionary as reference. It is all right (that is) provided that, to distinguish word from word, we pin our faith blindly to the Oxford Dictionary by saying, ‘Words distinguished from one another in the Oxford Dictionary are different words, and only words so distinguished from one another are different words’. Thus (according to this Oxford Dictionary criterion) ward the noun is to be distinguished from ward the verb; each of these is to be distinguished from ward (‘award’), from ward (‘wartlike callosity’), from ward (-ward) and from ward meaning ‘might’ or ‘would’; but no other forms of ward are to be distinguished from one another, however great their apparent divergence of meaning from one of these. But for the professional purposes of the philosopher, this won’t quite do. For him, as also for the structural linguist, a word is a usage; it is what the Oxford Dictionary-maker would call ‘a sense of a word’. 
Thus, for the philosopher, in addition to the distinctions made above, ward meaning ‘ward in chancery’ would be a different word from ward meaning ‘garrison’; just as ward meaning ‘to parry’ (in fencing) would be a different word from ward meaning ‘to put into a hospital ward’; and ward meaning ‘to put into a hospital ward’ would be a different word from ward meaning ‘ward of a key’. These sorts of differences in meaning are not, actually, the differences that the philosopher most desires to examine. But there are differences of usage that can be exemplified in the case of the word ward which it is the philosopher’s primary business to examine. Thus we get ‘Ward!’ meaning ‘Hey, Mr Ward!’ and ‘Ward!’ meaning ‘Parry (you fool)!’ Of each of these usages we can significantly ask, ‘Is this usage a word merely, or is it also a statement?’, as well as asking, ‘Are these two different usages of the same word, or are these different words, both pronounced ward?’ We can further ask, ‘Are these two statement-like usages of ward different in kind from other, non-statement-like, usages of the word ward, or are there intermediate cases, which we might call ‘‘quasi-statement-like’’ usages of the word ward?’ And is ward the logical parent of the proper name Ward?
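The two ways of counting words just contrasted, by dictionary headword and by usage, can be set side by side in a small sketch. The headwords and senses are those mentioned in the text; the assignment of each sense to the noun or the verb headword, and the function names, are assumptions of the sketch and not claims about the Oxford Dictionary's actual entries.

```python
# Headwords for 'ward' that the text says the Oxford Dictionary distinguishes.
HEADWORDS = [
    "ward (noun)",
    "ward (verb)",
    "ward ('award')",
    "ward ('wartlike callosity')",
    "ward (-ward)",
    "ward ('might' or 'would')",
]

# Senses mentioned in the text; which headword each falls under is assumed.
SENSES = {
    "ward (noun)": ["ward in chancery", "garrison", "ward of a key"],
    "ward (verb)": ["to parry", "to put into a hospital ward"],
}


def words_by_dictionary(headwords):
    """Dictionary criterion: one word per headword."""
    return list(headwords)


def words_by_usage(headwords, senses):
    """Philosopher's criterion: every listed sense counts as its own word."""
    words = []
    for head in headwords:
        words.extend(senses.get(head, [head]))
    return words


print(len(words_by_dictionary(HEADWORDS)))     # 6
print(len(words_by_usage(HEADWORDS, SENSES)))  # 9
```

The second count grows with every sense one is prepared to list, which is the difficulty the following paragraphs press: there is no obvious point at which the subdividing should stop.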
Then there is another set of philosophic questions which arise from consideration of the metaphoric use of usages of ward. Consider the following examples:
1. ‘Your wits must ward your head.’
2. ‘True confession is warded on every side with many dangers.’
3. ‘Every hand Of accident doth with a Picker stand To scale the wards of life.’
4. ‘As thou hast five watergates in the outer-ward of thy body, so hast thou five watergates in the inner-ward of thy soul.’
In this series not only does the meaning of ward in the pair 1 and 2 differ greatly in use from the meaning of ward in the pair 3 and 4 but also ward used in 1 differs definitely and significantly from ward as used in 2, and ward as used in 3 differs more subtly, but definitely and demonstrably from ward as used in 4. For it makes no sense to say, in 3 of the wards of life, that they have watergates; neither would it make sense to say that they are part of a metaphysical castle, that they have, for instance, an outer and inner structure. In so far as can be seen, the wards of 3 are extremely indeterminate nonspatial hills, of which it may still make sense to say that they are defensive (whether it makes sense or not depends on the sense in which ‘picker’ is here being used), but it may not make sense even to say that. On the other hand, if ward in 3 is being used indeterminately, ward in 4 is being used abstractly, for, as well as being used as a technical sort of noun, it is also being used as a preposition of direction; and, as we shall see at a later stage, it is logically no accident that this is so. So we get, philosophically, into the position of saying that, in examining uses of usages of the word ward, firstly, we do not feel quite sure how many words of the form ward we have to deal with, even in the crude dictionary-maker’s sense of ‘word’; and secondly, the more we think of it the less clear we feel as to how finely and on what grounds we should subdivide these.
Is ‘Ward!’ for instance (Mr Ward) cognate to ward as used in ‘the Ward of London’, or ‘London-Ward’? We can say, of course, ‘Oh well, in fact it’s all quite easy as long as we go by common-sense and don’t start trying to be philosophers, because we can all, in ordinary life, agree quite easily to call certain usages of the word ward different usages of the same word and other usages, different words’. But is it so easy, even in ordinary life? Is ward in ‘to ward a patient’ the same ward as in ‘maternity ward’? The Oxford Dictionary says that it is not, on the assumption, apparently, that a noun can never be used as a verb or a verb as a noun. But should the philosopher acquiesce in this? Difficulties multiply. And even now we have not begun to be really subtle and philosophic – though we shall have to take another example than ward to show this. Is sun in ‘The sun rises in the East and sets in the
West’ really, philosophically, the same word as sun in ‘The sun is the gravitational centre of the planetary system’? And what about matter in Wisdom’s famous series in Moore’s Technique (1953):
1. ‘Matter exists.’ (I put up my hand, and my hand is solid and material enough, isn’t it?)
2. ‘Matter doesn’t exist.’ (Analogue: ‘Fairies, unicorns, dinosaurs don’t exist.’)
3. ‘Matter doesn’t really exist.’ (Analogue: ‘There is no such thing as Beauty; Beauty doesn’t really exist.’)
4. ‘Material things are nothing over and above sensations.’ (Logical equivalent: ‘Statements about material things can be analysed into statements about sensations.’)
5. ‘Matter is just sensations’, said with a desire to shock. (Analogue: ‘There’s no such thing as thought or feeling, only patterns of behaviour.’)
6. ‘I don’t believe in Matter.’ (Analogue: ‘The ship is bearing down on us, but I don’t believe there’s anyone aboard.’)
7. ‘Matter is a logical fiction.’ (Certain indirect empirical expectations and non-expectations could genuinely be held to follow from this.)
8. ‘Matter is a logical fiction.’ (Logical equivalent: ‘Matter is a construction out of sensations.’)
The customary thing to say about this series is: ‘Oh yes, in this series of statements, matter and material are the same words throughout. It’s just that, when you put them back in their contexts, all of these can be shown to be different statements.’ But suppose I tried to be awkward and to contest this. Suppose I said, ‘I don’t see what you mean. You would have it, earlier, that ward meaning ‘‘prison cell’’ was a different word from ward meaning ‘‘section of a lock, or of a key’’, and that ward meaning ‘‘prison cell’’ was a different word from ward meaning ‘‘section of a hospital’’; and that ward, meaning ‘‘prison or ward-room’’ was also a different word, and not merely a different usage of the same word, from ward meaning ‘‘guard’’ or ‘‘prison-guard’’, or as used in the compound phrase ‘‘watch and ward’’.
And you were at least doubtful as to whether ward meaning ‘‘section of a hospital’’ wasn’t a different word from ward meaning ‘‘the patients living in a section of a hospital’’. You didn’t say, then, that ward in the statement ‘‘Prisoner and in disgrace he might be, but, fretting like a caged lion, he paced the ward’’ was the same word as ‘‘When she was wheeled out on the operation-trolley, shrieking, fretting like a caged lion, he paced the ward’’ but that it was just that, when each of these was put back in its context, the meanings of the statements were different. Why then, when comparing ‘‘It’s only lunatics who say ‘Matter doesn’t exist. Why, I put up my hand, and that’s solid and material enough, isn’t it?’’’ with ‘‘Matter doesn’t
exist; fairies, unicorns and dinosaurs don’t exist’’, shouldn’t I say similarly that the question at issue is not primarily whether ‘‘Matter doesn’t exist’’ in context 1 and ‘‘Matter doesn’t exist’’ (2) are different statements, but whether matter as used in 1 and matter as used in 2 are different words?’ It seems to me that if I insisted on being awkward on these lines, I should be hard to answer. The truth about this matter seems to be that while ‘statement’ and ‘sentence’ are used by logicians and philosophers both as sophisticated logical concepts and also as much less-sophisticated grammatical concepts, ‘word’ is still regarded by philosophers as being purely a grammatical concept; and therefore as still being a concept that it is quite easy to understand. No opinion could be more ill-based. In this matter of defining ‘word’ in an intellectually adult manner, philosophers are leaning upon what they believe to be the knowledge of grammarians, while grammarians are leaning upon what they believe to be the insights of philosophers. For there is at the present time no question being more hotly and constantly disputed by grammarians than that of what a grammatical word is, or is not. It is no good whatever the philosopher saying, ‘I am simple-minded. I use ‘‘word’’ as grammarians use ‘‘word’’,’ because there is no simple-minded manner in which grammarians use ‘word’. It is at this point that we pass to the second stage of the discussion; that in which the philosopher, still suspecting that it will be all quite easy when he really looks into it, turns to the grammarian to be refreshed on what a word is; and returns shortly afterwards, rather quiet, saying, ‘I don’t suppose it matters philosophically, but actually, you know, nobody knows what a word is at all.’
2. ‘Nobody knows what a word is.’
As a preliminary to seeing how the various grammatical conceptions of a ‘word’ differ from the philosophic conceptions of a word, it is both helpful and relevant to become a little more precise as to how the analytic philosopher differs from the modern grammarian. The usual answer to this is, ‘Instead of talking about ‘‘language’’ the grammarian talks about ‘‘languages’’’. But this is not true. The chief modern grammatical periodical is called ‘Language’, not ‘Languages’; there is a grammatical subject called ‘theory of language’ (or sometimes, in American, the combined study of ‘general linguistics’ and of ‘metalinguistics’) just as there is a philosophic subject called ‘the philosophy of language’ (or sometimes, when we feel rather more American, ‘linguistic philosophy’). What, then, is the difference between the two?
The answer to this second query is that, whereas philosophical grammarians talk about ‘language’ in such a way as to take account of the facts of which they have been made aware by learning languages, philosophers of language talk about ‘language’ in such a way as to take account only of the facts of which they have been made aware by learning their own language – but they make an attempt to take account of all of these. Now, if it were the case that an actually spoken language – say, the English language – were, logically speaking, a homogeneous entity, then the philosophers of language would show up as very naive in comparison with the philosophical grammarians. For in that case they would be making the naive assumption that, logically speaking, ‘ordinary language’ is the same in all languages: that all structural grammar, whatever its origin, will ‘mirror’ that of our own version of our own spoken tongue. Actually, however, the assumption that philosophers of language make is a different one, and very much more sophisticated. They assume that, natural language not being logically homogenous, any logical language-game that can be devised to throw light on some extant logical form in language A can also, with a little ingenuity, be applied to throw light on some extant logical form in language B, though the logical form that the game is meant to illustrate may not be so prominent, may not even be noticeable, in language B. It seems to me that this assumption is well based. It seems to me, too, though, that for philosophers it is not enough; that they need, just as philosophic grammarians need, to examine basic logical forms as these occur in different languages, as well as to examine their own language’s myriad different kinds of logical form. 
‘No matter what way of carrying on in Chinese you bring forward, I guarantee to produce the logical equivalent of it in English.’ So runs the challenge to the linguist from the logical poet; and certainly I, in my capacity as linguist, would not like to take on myself, in my capacity as logical poet, in a contest consisting of this novel kind of language game, since I can quite well see that I, as logical poet, would keep up with myself as linguist very well. But that is just the point. All I would do is ‘keep up’. I would not – as I should do, being a philosopher – go on ahead. The utmost of my ambition, in my capacity as logical poet, would be ingeniously to reproduce – perhaps even to exaggerate – the constructions that I, in my capacity as linguist, had thought of first. Thus, by the nature of the case, while this relationship continues in this form, the linguist leads the philosopher of language all the way. There is no controversy, moreover, which brings this fact out more clearly than that as to the logical nature of a word. For whereas the naive reaction of the philosopher, as we have seen, is ‘What’s the trouble?
Everybody knows what a word is’, the far more sophisticated reaction of the linguist is ‘When you think it over, nobody knows what a word is’. And yet there is a sense in which they can both agree; for there is a sense in which all languages have words. To exemplify in more detail the difficulty which the linguist feels, I shall bring forward for consideration a philosophical paper by the linguist Chao, entitled, The Logical Structure of Chinese Words (1946). And, playing the role of philosopher of language, I shall ‘keep up with’ Chao’s argument, at every stage, by imagining what it would feel like for the English language to have the logical complications that are presented by Chinese. I shall not have to imagine very hard to do this. Let us therefore take once more the word ward – a monosyllable. We have already seen this word used as a collective noun (e.g., ‘to set the ward’). Let us now imagine it used distributively (i.e., as equivalent to ‘warder’), as a technical term (e.g., in ‘the wards of a key’), as a medical term (as a variant of wart, perhaps, to mean a hard round callosity on the skin) and as an armourer’s technical term (‘ward of rapier’). We have already seen it, also, used as a verb: in swordsmanship (‘to ward’, meaning ‘to parry’); as a psychological term, meaning to ‘threaten, to press’; as a medical verb (as in the phrase ‘to ward a patient’). These few usages, of course, by no means constitute the total Use; we have not even entered into the great variety of usages and of use that occur as soon as ward is used as a legal technical term. But now let us examine ward in another kind of usage, by considering the following two series of examples: – (a) ‘To Acres ward.’ ‘To heaven ward.’ ‘To us ward.’ ‘Northward.’ (b) ‘The fore-ward foremost, the battell in the midst, the rere-ward hindemost.’ ‘Forward be our watchword.’ ‘First cast the beam out of thine own eye, and afterward . . . 
’ Now imagine English to be a language in which the notion of suffix does not exist; in which, firstly, ‘foreward’ is thought of as ‘fore- ward’, and secondly, in which words such as ‘warden’, ‘warding’ and ‘wardage’ are all pronounced ‘ward’. Introduce now one or two instances of ward in usages that overlap those of other words (e.g., ‘I have done my dewte in every pronte according to your warde’, ‘well ward thou weep’). Ignore now the possibility of there being differences between archaisms, colloquialisms, different forms in different dialects, proper names and
ordinary descriptive words, different pronunciations and different spellings and you have the ordinary naive Chinese conception of a tzu4, or word[2] [Editor’s note: Tzu4 is a romanisation of a Chinese character. See Figure 1 below, and the note at the end of this chapter]. Tzu4 are ‘what is talked about’, ‘the (only) linguistic small change of everyday life’, ‘the term of prestige’ (when people want to talk about words). And it is that great ‘spread’ of the tzu4, interlingual both in space and time, that is denoted by a single ideographic ‘character’, that is, by a visual shape, or graph. There is, as well as this, in Chinese, what Chao calls the ‘‘‘free form of word’’, consisting of one or more syllables, which may enter into what we should call syntactical relations with other similar units. Linguistically, this would be much more like what we call ‘‘a word’’ in other languages than monosyllable is. But it has no everyday name in Chinese.’ (italics mine). Thus, with regard to this conception of a word, and as between ‘word’ in English and ‘word’ in Chinese, we actually have, in a language situation, that state of affairs that philosophers so often wrongly apply to scientific situations, which consists of there being considerable agreement about the nature of the facts, but no agreement about which way of describing them should be stressed. As regards the facts, we ourselves have tzu4, or general root words, in English; ward is one. With a little trouble, we could compile a glossary of the most fundamental English tzu4, and we could then invent ideographic signs for them. But we have no need for this conception of a ‘word’. Similarly, the Chinese have our Oxford Dictionary ‘words’; but they envisage these as compounds – what we think of as phrases. They have no distinctive concept in common use by means of which to describe them, and when they occur in a statement there is no ideographic way of indicating the fact.
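The glossary of English tzu4 imagined above can be pictured with a small sketch. The Python fragment below is only an illustration under stated assumptions: the names `GLOSSARY`, `tzu_of` and `same_tzu` are hypothetical, and the word list is a hand-picked sample taken from the examples in this section, not a real lexicon.

```python
# Hypothetical hand-compiled glossary: each general root word (tzu4)
# maps to some dictionary 'words' that fall within its spread.
# The sample is drawn from the examples discussed in the text.
GLOSSARY = {
    "ward": [
        "ward", "warden", "warding", "wardage",
        "forward", "afterward", "northward",
    ],
}


def tzu_of(word):
    """Return the root word whose spread contains `word`, or None."""
    for root, members in GLOSSARY.items():
        if word.lower() in members:
            return root
    return None


def same_tzu(first, second):
    """True when both dictionary words belong to the same root word."""
    root = tzu_of(first)
    return root is not None and root == tzu_of(second)
```

On this picture ‘warden’ and ‘forward’ are two dictionary words but one tzu4, which is the contrast between the two conceptions of a ‘word’ that the paragraph above draws.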
After this beginning, it is no wonder that Chao, with Chinese in mind, finds himself obliged to disentangle seven or eight logical conceptions of a word. All of these, except the European word, the Chinese phrase or ‘free word’, are variations of, or subdivisions of, tzu4. Thus, Chao first distinguishes the tzu4 as given by sound from the tzu4 as given by sight; a very necessary task when dealing with an interlingua. He then points out that, while a gramophone recording of a Chinese tzu4 produces what is in some sense a real replica sound (what Peirce calls an icon), the visual graph of a tzu4 bears no such close resemblance to it. ‘Characters’, or written tzu4,
[Note 2: It has been represented to me, in discussion, that none of the words that I have so far used to mean ‘total Use’ are sufficiently logical in tone to enable them easily to be used to denote a logical concept. I therefore propose fan, which has been suggested to me by E. W. Bastin. By adopting fan, instead of tzu4, to mean ‘total Use’, I can now restate the argument of this paper by saying that it draws attention to the possible far-reaching results of building logic on fans, instead of on usages, or words.]
are symbols; they are not icons. On the other hand, they are not wholly arbitrary symbols, as are words written in an alphabetic script. The visual ‘character’ for ward, for instance, would be a stylised picture of something like a Martello tower: a picture of a round cairn of stones, used for purposes of defence; and this ideograph would tell you something about the ‘spread’ of ward. Here, again, we have a situation in which, as between English and Chinese, the facts are not as different as they might be, and where the main difference lies in what is logically stressed. In English, we have quite a few ideographic signs, especially the punctuation signs. In Chinese, there are quite a few arbitrary graphs – graphs, that is, in which no connection can be discerned between sign and sense. Nevertheless, the logical idea of what the relationship between sound and symbol ought to be, in Chinese, is quite different from that arbitrary relation that we think is all that ought to hold, in English, between the spoken and the written word. In Chinese, the analogy between a language and an ideographic symbolism, that is, a logical symbolism, is built into the very form of the script. Chao then distinguishes initially between three things: (1) the spoken tzu4 (2) the transcribed tzu4 and (3) the portrayed tzu4, the ‘character’ or graph. But these distinctions turn out not to be enough in Chinese, as indeed they would not be enough in English. For, in the series (a) bear (to support), (b) bare (gave birth to), (c) bear (the animal, ursus) and (d) bare (unclothed), we want to distinguish the third and the fourth not only from the first two, but also from each other. And to do this we need yet another conception of the tzu4, the etymon. 
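The bear/bare series can be worked through as a small grouping exercise. The Python sketch below is illustrative only: the names `ENTRIES`, `FIELDS` and `group_by` are hypothetical, the single sound tag stands in for a proper transcription, and the etymon tags E1 to E3 are placeholders chosen so that (a) and (b) share an origin while (c) and (d) each have their own, as the argument above assumes.

```python
from collections import defaultdict

# Each entry: (label, gloss, sound, spelling, etymon).
# The sound tag and the etymon tags E1..E3 are illustrative
# placeholders, not scholarly transcriptions or etymologies.
ENTRIES = [
    ("a", "to support", "SOUND-1", "bear", "E1"),
    ("b", "gave birth to", "SOUND-1", "bare", "E1"),
    ("c", "the animal", "SOUND-1", "bear", "E2"),
    ("d", "unclothed", "SOUND-1", "bare", "E3"),
]

# Position of each criterion inside an entry tuple.
FIELDS = {"sound": 2, "spelling": 3, "etymon": 4}


def group_by(criterion):
    """Group entry labels that count as 'the same word' under one criterion."""
    groups = defaultdict(list)
    for entry in ENTRIES:
        groups[entry[FIELDS[criterion]]].append(entry[0])
    return sorted(groups.values())
```

Grouping by sound puts all four usages together, and grouping by spelling pairs (a) with (c) and (b) with (d). Only grouping by etymon separates (c) and (d) both from the first two and from each other.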
Here, immediately, we get two interrelated senses of ‘etymon’; for we can have a wider sense in which many cognate tzu4 (taking tzu4 indeterminately, in senses (1), (2), or (3)) can be said or shown to come from the same etymon; and a narrower sense in which the notion of etymon only applies to one general tzu4. In the wider sense – reverting to our original example – we can say that ward and award come from the same etymon; in the narrow sense, we can say that ward meaning ‘prison’ and ward meaning ‘person in chancery’ come from the same etymon, and, in both senses, we can say that warred, though pronounced the same way as ward, comes from a different tzu4 because it comes from a different etymon. So we have already five senses of tzu4: (1) the spoken tzu4, (2) the transcribed tzu4, (3) the portrayed tzu4, (4) the etymon and (5) the root etymon, consisting of a ‘family’ of (4). A moment’s further reflection will suffice to show, however, that we need many more kinds of tzu4 even than these. For, under (2) we have (6), the transcribed tzu4 analysed into phonemes, which must be distinguished from the unanalysed transcribed word. And under (1) we must distinguish between the ‘simple’ spoken tzu4, the synchronic linguistic unit, the tzu4 (sense (1)) of one dialect, and both
the diachronic and the interlingual tzu4 (classes of these). Moreover, under tzu4 (sense (3)) the written ‘character’, we must distinguish, in Chinese, as in English, usage from Use. Take the Chinese jen2, in English humanity. We distinguish in English, by sound, human from humane. Chinese does not; the Chinese speaker must distinguish the two by context. But the modern Chinese writer can write two different characters for the two. On the other hand, the ancient Chinese writer, by writing them the same, could put himself in a position to produce a very elegant (and algebraic) script form when writing the paradox humanity is humanity (i.e., ‘to be fully human is to be humane’). But the logical question that here arises is: In the case of words that are two usages of the same tzu4 (sense (1)) and that are different derivations from the same etymon (tzu4, sense (4)), if these are written with different characters and thus are different tzu4 in sense (3), do we call them, logically, different tzu4 or not?
Let us now tabulate Chao’s main senses of ‘word’. (He has other refinements but we can leave those to the linguists):
I. ‘Free words’ or phrases (roughly, Oxford Dictionary ‘words’).
II. tzu4:
  1. spoken tzu4:
    (a) the synchronic ‘simple’ word (i.e., the spoken tzu4 used at one time in one homogeneous speech unit);
    (b) the interlingual and/or diachronic simple word (i.e., the spoken tzu4 consisting of a class of 1(a));
  2. transcribed tzu4:
    (a) the phonemically transcribed tzu4 (which may refer to 1(a) or 1(b), as desired);
    (b) the phonetically transcribed tzu4 (e.g., the tzu4 as recorded on a gramophone record or tape-machine);
  3. portrayed tzu4:
    (a) the logically ideal ideographic symbol, in which the ‘spread’ of the character is to all usages of tzu4, and to only those usages that give extensions of the original ideographic idea;
    (b) the actual ideographic symbol, which tends to have a much narrower ‘spread’ and to stand for only one group of usages of 3(a), thus appallingly multiplying the number of Chinese characters listed in dictionaries;
    (c) logically aberrational ideographs, including mistakenly copied characters, phonetically borrowed characters, and any other logically aberrational characters.
  4. The etymon, or etymological tzu4:
    (a) as used to distinguish homonyms that would be identified with one another under classifications 1 and 2;
    (b) as used to group cognate tzu4 as defined in 4(a) into classes that have the same ultimate etymological root. (N.B. These are what should be, but often are not, portrayed by 3(a).)
Thus we have nine primary conceptions of a tzu4 and ten of a ‘word’; and, of course, in practice, new conceptions of a word are continually being framed and used that are ambiguous as between some combination of these nine. And this list is moderate, the moderation being due to the fact that Chao is an exceptionally philosophic and self-conscious grammarian. Thus, speaking grammatically, it is an understatement to say that it is not clear what a grammatical word is. What is clear is that Chao’s distinction between a ‘free form’ and a tzu4 is logically fundamental. In the next section we shall consider what this means.
3. ‘From the point of view of logic and philosophy, it doesn’t matter anyway what a word is, since the statement is what matters, not the word.’
It was contended, at the beginning of the last section, that it was not enough for the philosopher of language just to ‘keep up with’ the logical revelations made by the philosophic linguist. The reason for this contention must be now made clearer. It is that logical revelations are interconnected. We cannot, for instance, allow ourselves to be logically impressed by forming a new conception of what a word is, without also forming a new conception of what a statement is. For the logical ferment caused within us by reimagining words will not bound its action so as only to apply to our thinking about words. In order, however, strongly to initiate this logical revolution within ourselves we must be really impressed by the potentialities of a new language’s basic logical forms. We must not just say, as the philosopher of language, in contrast with the grammarian, tends to say, ‘Yes, I quite see. The Chinese grammarian means by ‘‘word’’ what we mean by ‘‘general root word’’. How interesting’, and then go on talking about statements and entailments between statements, just as before. And why is it that we must not go on doing this? For two reasons, which are themselves interconnected:
1. It can be shown that if we once adopt as our logical unit the logical conception of a tzu4, for instance, rather than that of a word, we take a step that has fundamental logical effects.
2. In so far as we have adopted as logically fundamental that analytic technique that consists in comparing, in context, slightly differing usages and uses of the same word, just so far we have already committed ourselves, though we may not know it, to a technique that works in terms of tzu4, not in terms of words.
Let us take the first point first, and go back to Chao; let us continue, that is, the train of thought of the last section, where we left it off. Chao ends his analysis of different conceptions of a Chinese word by pleading for the creation of a general Chinese language, based on a general Chinese syntax and made of ‘general words’. He proposes this as though it were the most ordinary thing in the world to propose, for reasons that he thinks will appeal to the practical linguist. Does he realise, though, what havoc he is making in the foundations of logic? Does he realise how far he is driving us back towards the later Whitehead’s conception of a word? Does he realise that he is pressing us, in all seriousness, to adopt as logical unit the Platonic idea?
The possibility of having a partially workable idea of a general word [says Chao] is, of course, to be explained by the fact that (Chinese) dialects are historically interrelated . . . While (from the point of view of linguistics) this is only a partially workable idea (of a word), it is a question of more practical concern to the user of a language than a (more ‘simple’ or) strictly synchronic idea (of a word). A person who asks about the pronunciation of a word not only assumes that there is a corresponding ‘same’ and ‘correct’ pronunciation in other dialects, he also assumes, rightly or wrongly, that the ‘correct’ pronunciation is the same as that of some standard (Chinese) dictionary. The degree to which the boundaries of a general word are definable depends upon the unity of culture of the speakers . . . and the extent to which the body of the literature is still living (though pronounced by modern speakers in modern forms). From the point of view of producing clear and neat descriptions of linguistic phenomena in China, it is neither necessary nor possible to set up the idea of the general word . . . But for the compilation of practical dictionaries and for language-education in China . . . and for discussions of such problems as Chinese syntax . . . the idea of the general word is often unconsciously assumed. We have here tried to make this idea explicit, with a warning that it is not an idea of clearly defined scope unless and until it is artificially frozen into a system of general Chinese. (The creation of) this, though theoretically illogical and impossible, is yet practically necessary and highly desirable.
Two considerations immediately obtrude themselves for discussion, from the despairing cri-de-coeur that Chao’s last sentence contains. The first is that, in language, what is ‘practically necessary and highly desirable’ cannot be ‘theoretically illogical and impossible’. In such a case, ‘theory’ and ‘logic’ must yield. The second is that formal examination of an ideographic written script – of a script that is based on the tzu4, not on the Western word – should yield first clues to the kind of concepts that will be required to ‘model’ logical forms in a language made of ‘general words’. I have endeavoured to state elsewhere what these clues are, and to hazard a first guess on the way in which they should be used. But rather than enter further now on so large a question,
I wish to pass to the second point made earlier, to the effect that, in spite of appearances, ‘ordinary language philosophers’ are usually working with tzu4, not with words. Consider the following philosophical argument:
S.: It does not seem to be invariably the case that from I know that p there follows p.
T.: What???
S.: Consider the sort of statement that might be made, on occasion, by an absentminded man. ‘I know I left it in my waistcoat pocket last night’, he says, speaking of his watch. But it is evident, from examination of the waistcoat pocket made early this morning that he couldn’t have left his watch in his waistcoat pocket last night; he knew wrong. Or consider this statement, made by a mediaeval Flat-Earther: ‘We all of us know that the sun goes round the earth’. In this case, it would follow from the statement ‘we all of us know that the sun goes round the earth’ that ‘I (the same speaker) know that the sun goes round the earth’. But it would not follow from this that the sun does go round the earth because, in fact, the earth goes round the sun.
T.: What you are saying here seems to me completely fantastic. For in each of the two cases that, it seems, you quite seriously put forward in support of your claim that from ‘I know that p’, there does not always follow ‘p’, you are producing the statement of a man who thinks he knows that p, but in fact does not know that p; and who therefore inaccurately says that he knows that p, when the fact of the matter is that he does not know that p. But in every case in which I correctly say, ‘I know that p’, there will invariably follow p. But, of course, in cases in which someone incorrectly says that he knows that p, well, I cannot see what can be said about such cases at all.
S.: And yet it is precisely in such cases that, in ordinary life, I feel most inclined to say, ‘I know that p’. In the majority of the cases in which, in your sense of ‘know’, I know that p, it would never occur to me to say, ‘I know that p’. In such cases I should never dream of saying more than just ‘p’.
T.: You might not dream of saying ‘I know that p’. Nevertheless, those are just the cases in which it would be correct for you to say, ‘I know that p’.
Etc., etc., etc.
Now all of us are familiar with examples of such arguments. What we do not sufficiently face is how serious they are. That two logicians should argue against one another from such different presuppositions without either of them being able to bring forward any of Mill’s ‘considerations capable of influencing the intellect’ in order to help the other to understand what he or she is trying to say and do, this is, logically speaking, a very serious matter indeed. Consider firstly what a disadvantage S is at, compared with T. T has clearcut logical generalisations to put forward; he knows (on his logical and linguistic presuppositions) what the ‘correct’ (i.e., logically fundamental) sense of ‘know’ is. He knows, too, what are and are not the effects of adopting as fundamental this logically fundamental sense of ‘know’; how, once you adopt it, from ‘I know that p’ it follows that p, in a well-established sense of
‘follows from’. T can reason; T can make logical connections; to T, S must look nothing but a sort of logical spoilsport. For S has no reasons to put forward for his attitude; in a perfectly good sense of ‘connection’ and of ‘think’ S can make no connections: S cannot think at all. All S can do is to keep on saying, feebly, ‘But we don’t say this sort of thing, in ordinary language’; and to this T can always reply, ‘Then you should say this sort of thing in ordinary language, for my sense of ‘‘know’’ is the logically correct sense of ‘‘know’’’. If, in such an argument, any rational considerations are advanced, the persuasive power of T’s arguments will be overwhelming. In fact, S has already given his whole case away by conceding that sometimes, from ‘I know that p’ there can follow ‘p’ in T’s sense of ‘follow’ and T’s use of ‘p’. S must on no account concede this. He must not concede this because, in the face of T, there is only one thing that S, if he is to remain rational, can do. The one rational step he can take is that he can become self-conscious about the kind of logical unit he himself is using. For, having done this, he can then firmly allege, with formal demonstration, that the adoption of this new form of logical unit fundamentally affects the form of all logical connections that are made by using it; so that (if you once use the new form of logical unit) ‘follows from’ is no longer the ordinary kind of ‘follows from’ and ‘the statement p’ is no longer what used to be thought of as the statement p. To hold his own against T, S must maintain, in fact, that new logical units require new logics.
Far from disdaining the help of formal methods of analysis on the ground that, in the analysis of anything as flexible as natural language, no one calculating procedure can throw more than a limited light, S must, on the contrary, develop new formal methods of analysis, on the ground that no one calculating procedure (i.e., T’s calculating procedure), language being what it is, can throw more than a limited light. If S did this, T would be at a disadvantage, for T would have to say things like, ‘Your new logic may be more powerful, but I find the old logic much easier’, which would be an admission that S is now setting the logical pace for T. As it is, T (let’s face it) is defending rationality, for there is nothing rational, in reply to him, that S can say. But is S adopting a new logical unit, as contrasted with that of T? Clearly. The logical unit that S is feeling for is that of the total spread, or Use, of the concept ‘know’. T is adopting, as his logical unit, one, not very common, usage of the Use (‘know’ as contrasted with ‘think’, or with ‘should be inclined to judge’, or with ‘guess’); and T is assuming that only this usage of the total Use of ‘know’ is such that it can be logically used. But T’s selection of this usage as the only possible one is based upon coherence principles, not correspondence ones; for, as S keeps saying, ‘know’ is not used mainly or exclusively in this usage in ordinary language. What S is feeling for is not the usage, or meaning of, the ‘word’ ‘know’, but the ‘spread’ of the tzu4 ‘know’. The concept, the general idea of which can be
Words
Figure 1
expressed by the phrase ‘I’ve got it!’: that concept the spread of which includes phrases such as ‘know-how’ and ‘know-all’ and ‘in the know’: that wide and indeterminate concept in terms of which knowledge is primarily something you have got, not something which ‘is’ (for the Flat-Earthers had ‘got’ something, that cannot be denied, when they said that they knew that the sun went round the earth, whereas we, having also ‘got’ something, say that the earth goes round the sun); that tzu4 the ‘character’ for which is an archer hitting a mark. What prevents S seeing that this is what he is looking for is the fact that he has been inspired to look for it by adopting an apparently a-logical analytic technique – one that is very much more successful at drawing attention to new forms of difference of usage and of use than at drawing attention to new forms of connection between them. It has not occurred to S, then, to make a symbol to represent ‘‘‘know’’: total Use’: a symbol whose direct field of reference is to the total indeterminate spread of a tzu4, or Word. But once it does occur to S to do this, it may well occur to him, too, that the potential advantages of doing something like this are immense. For the fact that formal logical games are played with indeterminate units does not prevent specific rules from being devised to govern the methods of combination of such units. Moreover, the games can be so played that, as the sequence of units extends, the spread of the sequence becomes more and more determinate; as complication increases, indeterminacy grows less. It is extremely difficult to exemplify this process in English, since we have become accustomed to think of English as built up out of words, not of tzu4. But let us see what we can do with our original example. If I say or write just 1. ward it is impossible to tell what I mean. The total spread of the tzu4 ‘ward’, as we have seen is very highly indeterminate. But suppose I say: 2. ‘I ward thee a ward, Ward. 
WARD! Ward WARDward, Ward! Ward not to be warded, Acreward, in ward, I ward thee, WARD! WARD! WARD! I ward thee, WA-ARD!’ it is a great deal more evident what I mean. Moreover, the constructions that I am using are the kind of constructions that are used every day in
Basic forms for language structure
ordinary colloquial language, though no one could say that I am using ordinary language. I am warning somebody called Ward to defend himself in a certain way, that is, by parrying the other man’s blow with the ‘ward’ of his sword. I am further warning him that if he doesn’t do this, with some agility, he (Ward) will find himself carried off to Acre and there cast into a ‘ward’, or prison cell. So I am warding him a ward, or word, that he had better look slippy, or dreadful things will befall him. I am clearly a coward and a prig, or I would myself come in and help. But possibly I cannot because I have myself already been captured, or because I am looking down from the top of a high wall. Now, in example 2, I have not merely strung together eighteen different occurrences and uses of the tzu4 ‘ward’. I have, in addition, punctuated my string. I have put in stress marks, capital letters, a suffix (‘-ed’), a few connecting phrases and another tzu4, ‘Acre’, here shown in its usage as a proper name. But what formal device have I used to do all this? Bracketing, nearly always. Sometimes I bracket words together by actually joining them, as in ‘wardward’. Sometimes I bracket together several wards, lightly with a comma or more fundamentally, with an exclamation-mark or full stop. Once I ‘bind’ three wards together by increasing the stress. And the small words, or other devices that I use, are used only as operators to define the interrelations of these brackets. Suppose, now, that I write out my example again, using the following conventions – that is, using a more tzu4-like notation: I can now write
3. [[[ / ((({W}(W))W)F) / ((W)E) / (((W(WW))W)E) /// ] ( ((W((NW)P)) ((AW)IW)) ((({W})W)E)) /// ] / ((W)E) / ((W)W) / ((({W})W)E) /// ]
It seems at first as if it would be quite impossible, on positional data such as is given here, to distinguish different usages of the tzu4 ward from one another.
Experience shows, however, that given a sufficiently long unit of discourse, this is not so, and that some rules at least for distinguishing usages from one another can be devised.3 And, even now, this is only the beginning of what I could do. For I have devised no rule as yet to ‘reduce’ my formula, though this is clearly a formula that could be reduced. Neither have I devised any rule for interrelating my bracketing conventions, although, clearly (to take only one example), the bracketing convention for ‘ . . . and . . . ’, that is, for indicating that I am saying both this ‘and’ that, is logically not independent of the bracketing convention for building up progressive stress. On all sides, fascinating logical vistas open out before me. I stand, like Alice, above a chessboard which covers the whole world.
Table 1. Conventions for tzu4
tzu4                          Abbreviation
Ward                          W
Acre                          A
Finality (assertion)          F
Exclamation                   E
Inclusion                     I
Passivity                     P
Nonning (cancellation)        N
Table 2. Conventions for the phrases and phrasing
Phrases/phrasing              Abbreviation
‘I . . . thee’                {...}
‘a . . . ’                    (...)
‘ . . . punctuation sign’     ( . . . ).)
‘...,...’                     (...)...)
‘_. ¼ ¼ ¼ ,_’                 / / / ///
‘ . . . and . . . ’           [[ ]]
Note: When two tzu4 are immediately juxtaposed, they shall be interpreted as bracketed thus: (( . ) .)
3
It has, however, been made clear to me, in discussion, both that this formalisation is not selfexplanatory and also that to elucidate the reasons for producing it in this way would require a second paper on ‘Words’ of equal length with the first. In entering this new universe of discourse, the logician has to prepare himself for two successive and contrary shocks. The first is that of finding, as one logician said, that there is such a very great deal of indeterminacy everywhere. The second is that of finding – once one has faced the first shock – what unforeseen new vistas open out, and what a lot can be done. The ‘philosophers of ordinary language’, in so far as they have interested themselves in formal logic at all, have made the mistake of putting their positive and negative results much too much at the service of the old logical approach, in order, apparently, to try and sophisticate it. Considerations drawn from the study of languages show, however, that this attempt is hopeless, that this sophistication never can be achieved by proceeding by steps, and on such lines. What is needed, if the philosopher of the new sensitiveness to ordinary language is to be put in a form that consists of considerations capable of influencing the intellect, is a fundamentally new approach to the problem of what logic is; a change of approach not less new and fundamental than – and indeed analogous to – that made by Brouwer to the foundations of mathematics when he first developed Intuitionism.
One thing, however, among many uncertainties, is already clear. Out of a sequence of indeterminate tzu4, however long, I can never build a completely determinate ‘statement’. For even if it be granted that each extension of sequence lessens the sequence’s total indeterminacy of spread, only in the case of an infinitely long sequence would zero indeterminateness be attained. So, if we base our logic on tzu4, not on words, only in the infinite case shall we get the (usually assumed to be) completely determinate ‘statement’ of propositional logic – and this even if there were no other formal differences between a sequence of tzu4 and a ‘statement’ in propositional logic, which there are. Thus the logical importance of finding out what we think about words – and the logical importance of examining and distinguishing usages in context – turns out to lie in the fact that our conception of a ‘statement’ (and, a fortiori, of logically possible forms of connection between ‘statements’) is fundamentally affected by our logical conception of a ‘word’. We do wrong, then, to underestimate the logical value of making these many and detailed examinations of usages and uses of words in ordinary language. For this discovery, that ‘word’-conception fundamentally affects ‘statement’-conception, is a very important discovery indeed. Editor’s note: in the original text, where tzu4 first appears, there is a footnote where MMB makes clear that she intends this character, whose meaning is not explained (and which seems to play the nonce role of WARD), to carry the force of ‘fan’ in her own special sense, set out in more detail in the next chapter.
2
Fans and heads
1. The construction of the fan
Let us take a word, its sign, and its set of uses. Let us, in the simplest case, relate these to reality; about which we shall say no more than, since it develops in time, the uses of the word must also develop in time. Let us denote the word by a point; its sign by an ideograph (Masterman 1954), and its set of uses – all linking on to reality at unknown but different points, but all radiating out from the original point denoting the word, because they are all the set of uses of that same word, by a set of spokes radiating from that point. Let us call the logical unit so constructed, a FAN,1 and the figure will give the essential idea of such a fan (see Figure 2). From Figure 2 several facts can be made clear. The first is that, however many uses the word may have (however many spokes the fan may have), they will always be marked with the same sign. But it does not follow from this that all the uses of the word mean the same thing; that they all have the same meaning in use. It follows from this, on the contrary, that there is no one-to-one correspondence between sign and signified of the kind that logicians have always considered as an essential prerequisite for the construction of a mathematical theory of language, and that therefore a fresh type of mathematical construction must be envisaged, one, that is, which allows for a looser type of ‘fit’ between sign and signified. As shown in the key to Figure 2, the point of origin of the fan will be the dictionary-entry of the word – that is, the word taken in isolation, and in the totality of its uses, without it being initially clear what these uses are. Each of the spokes of the fan will then give one actual use of the word; it being presumed that each actual use of the word can be, theoretically at any rate, taken in isolation. 1
This term was suggested to me by E. W. Bastin, who also introduced me to the work of L. E. J. Brouwer (1952, 1954).
Figure 2. Key: [point] = word (taken in isolation); [ideograph] = sign (to be pronounced flower); [spoke] = one use of the word; [arrow] = flow of time
So we know that the fan has a point of origin, and spokes; we also know that it has a sign. What we do not know, finally or definitively, is how many spokes any fan has. This fact is indicated, in Figure 2, by the presence of the arrow indicating the flow of time. For at any moment, as we know, any new use of any word may be being created; and there will be no formal marker of this fact, since, as we have seen, all the uses of the word will be marked by the same sign. Now we can lay down, from our general knowledge of language, that there will be a finite number of spokes in any fan, provided that the total dictionary-entry only takes account of the evolution of the language from a time T1 (the approximate date when we first heard of the language) to a point T2 (the present time). For if we assume that any new use of any word in any language takes a finite stretch of time to establish itself, and that every language had a definite beginning in time, then it will follow that no word in any language has an infinite number of uses; that the number of spokes in any fan will be countable and finite. On the other hand, if we consider the totality of use in the language – that is, the use of a fan, for any T2 – then it will be clear that the number of uses of any word, though still finite until the language has lasted for an infinite stretch of time, is nevertheless tending to get larger and larger. This gives the number of spokes in any fan the property of a Brouwerian infinity, that is, of getting progressively larger and larger; which means in its turn that any theorem about the language will have to be proved for the infinite case, as well as for the finite one. That the figure we have constructed of the set of uses of a word is not wholly fanciful, say, from the point of view of linguists, may be seen by constructing a similar figure, using the same conventions, from the entry ‘PLANT’, taken from the cross-reference dictionary of Roget’s Thesaurus (Figure 3). 
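The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Fan`, its methods, and the ordinal "times" attached to each use are hypothetical and do not come from the text; the uses themselves are the ones listed for PLANT in Figure 3. The sketch shows that every use is filed under the same sign, that the number of spokes is finite for any window T1 to T2, and that it can only grow as T2 advances.

```python
from dataclasses import dataclass, field


@dataclass
class Fan:
    """A word as a point of origin with one spoke per use (hypothetical model)."""
    sign: str                                   # the single sign shared by every use
    spokes: list = field(default_factory=list)  # (time_established, use) pairs

    def add_use(self, time, use):
        # A new use receives no new formal marker: it is filed under the same sign.
        self.spokes.append((time, use))

    def uses_between(self, t1, t2):
        # Uses established in the window [t1, t2]; this is always a finite list.
        return [use for (time, use) in self.spokes if t1 <= time <= t2]


# Uses as listed under PLANT in Figure 3. The times 1..9 are invented ordinals
# for illustration only and make no historical claim about the word.
plant = Fan(sign="PLANT")
for time, use in [(1, "place"), (2, "insert"), (3, "vegetable"),
                  (4, "agriculture"), (5, "trick"), (6, "tools"),
                  (7, "property"), (8, "a battery"), (9, "-oneself")]:
    plant.add_use(time, use)

# For each fixed T2 the count is finite, and it never decreases as T2 advances.
counts = [len(plant.uses_between(1, t2)) for t2 in range(1, 10)]
print(counts)
```

The design choice worth noting is that `add_use` never touches `sign`: the growth of the fan is invisible at the level of the sign, which is the point the arrow of time in Figure 2 is meant to convey.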
Figure 3. PLANT (alphabetically given dictionary entry), with spokes: PLANT [place] 184; PLANT [insert] 900; PLANT [vegetable] 367; PLANT [agriculture] 371; PLANT [trick] 545; PLANT [tools] 633; PLANT [property] 780; PLANT [a battery] 716; PLANT [–oneself] 184
For philosophers it may also be important to show that, with the same conventions, fans may be constructed from Wittgenstein’s concepts, by the
use of which, as will appear, the differing properties that Wittgenstein discusses with regard to his concepts can be further defined. The references used in this paper are all to Part II of Wittgenstein’s Philosophical Investigations. Wittgenstein is primarily investigating not the concept considered as the set of uses of a word, but the concept considered as a gestalt given by perception; of how it itself (considered both as an actual percept, given by experience, and also, by extension, as a picture or an image) can be affected by context. That this is so can be shown from the following passage:
I contemplate a face, and then suddenly notice its likeness to another face. I see that it has not changed; and yet I see it differently. I call this experience ‘noticing an aspect’. Its causes are of interest to psychologists. We are interested in the concept and in its place among the concepts of experience . . . (II, xi, p. 193e)
That Wittgenstein thinks, however, that there is an analogy (as well as a contrast) between the way in which context affects the meaning of a word can be shown by the following passage: I can imagine some arbitrary cipher – x, for instance – to be a strictly correct letter of some foreign alphabet. Or again, to be a faultily written one, and faulty in this way or that: for example, it might be slap-dash, or typical childish awkwardness, or like the flourishes in a legal document. It could deviate from the correctly written letter in a number of ways. – And I can see it in various aspects, according to the fiction I surround it with. And here there is a close kinship with experiencing the meaning of a word! (II, xi, p. 210e)
Let us now take the cases of the schematic cube, the duck-rabbit, and the triangle. In each case, I will quote the relevant passage, and then construct the fan.
Figure 4
[cube seen as glass cube]
[cube seen as inverted open box]
[cube seen as wire frame]
[boards forming a solid angle]
Figure 5
1.1. The schematic cube
You could imagine the illustration appearing in several places in a book, a text-book for instance. In the relevant text something different is in question every time: here a glass cube, there an inverted open box, there a wire frame of that shape, there three boards forming a solid angle. Each time the text supplies the interpretation of the illustration. But we also see the illustration now as one thing now as another. – So we interpret it, and see it as we interpret it. (II, xi, p. 193e)
The fan of this concept would be as shown in Figure 5.
1.2. The duck-rabbit
I shall call the following figure, derived from Jastrow, the duck-rabbit. It can be seen as a rabbit’s head, or as a duck’s. (II, xi, p. 194e)
The fan of this concept would be as shown in Figure 7.
1.3. The case of the triangle
There is not one genuine proper case of [a description of what is seen], – the rest being just vague, something which awaits clarification, or which must just be swept aside as rubbish . . . Take as an example the aspects of a triangle. This triangle can be seen as a triangular whole, as a solid, as a geometrical drawing; as standing on its base, as hanging from its apex, as a mountain, – as a wedge, as an arrow or pointer, as an overturned object which is meant to stand on the shorter side of the right-angle, as a half parallelogram, and as various other things . . . (II, xi, p. 200e)
Figure 6
[duck-rabbit seen as a rabbit’s head]
[duck-rabbit seen as a duck’s head]
[no indication is given as to which aspect will dawn first]
Figure 7
The fact that Wittgenstein takes seriously the analogy between the concept as a gestalt given in experience and the concept consisting of the total set of uses of a word is clearly seen by his example of the triangle. For the fan of the triangle, if it is constructed by using only the conventions that have been established up to this point, conveys only a first approximation of knowledge of the ‘uses’ of this concept, since no subdivision of uses is so far allowed for in the fan. But in the case of the triangle, it is clear that those aspects of it that consist of seeing it as standing on its base or of seeing it as
Figure 8
hanging from its apex are far more like one another – are far more like genera of the same species of aspect – than those aspects of it that consist, for example, of seeing it as a triangular whole and of seeing it as a mountain are like one another. And similarly, as we shall see, in the case of a word, conventions have to be established for this distinction; and it is the establishment of this distinction which turns the fan into a tree. Taking it, then, as established that Wittgenstein intends this analogy, what he intends to imply about the uses of words could be put like this: in the case of words, as in the extreme conceptual case of those ambiguous gestalt concepts that Wittgenstein has taken as prototypic of all concepts, new uses, in some sense, have to ‘dawn’ upon the user. None of these new uses is more or less fundamental than the others: none is more or less fundamental than the original use (it is not the case that there are ‘literal’ or primary uses of a word and secondary or ‘metaphoric’ ones); they are discrete from one another, like the ‘revisions’ produced by the gestalten: they are also exclusive to one another: if you ‘see’ the duck-rabbit as a duck, you do not at the same time see it as a rabbit. In short, as always with Wittgenstein, who saw the world not only in depth, but also with exactness even when what he was talking about was inexact, the comparison between the set of revisions of the ambiguous gestalt figure and the set of alternative uses of a word points the way once more to an exact method for describing the apparently ineffable. The main difference between the two cases consists in the difference in the amount of information given by the context. In the conceptual case the sign itself is logically luminous: one can ‘see through’ it to the ambiguity that it signifies by looking at the ambiguity that it actually exhibits.
In the linguistic case the sign, taken in isolation, is logically opaque; it is impossible, for instance, to foretell, by looking at the sign ‘flower’ given in Figure 2 that a plant concept might exist in some language which would have the set of uses given in Figure 3. On the other hand, in the case of the word the context can be made explicit to an extent to which, in the conceptual case, it usually cannot. And these two defects in the analogy cancel one another out in this way, that each case allows of some method being developed by which each separate aspect or context can be exactly defined. When that
method fails, one gets a feeling of giddiness: both when looking at an undeciphered ideographic language, when one knows the context is there but one cannot get at it (‘Those signs look like the names of common objects – but they’re not’) and when looking at a doubtfully interpretable plate or result in a textbook (‘Is that result meant to be ambiguous – or am I mad?’) Wittgenstein, then, as will appear later, seems to me to have opened up an extremely important line of inquiry by using a gestalt figure to represent a concept, and one that would repay a much longer commentary. I shall not, however, give this for the moment, since for the purpose of this paper this whole discussion of Wittgenstein constitutes a digression, but shall merely conclude by constructing a first approximation to the ‘triangle’ fan. The fan of this concept, on the conventions so far established, would be as shown in Figure 9. This is all that can be said, at this stage, to elucidate the general notion of a fan without considering the fan as part of a language.
[as a solid]
[on its base]
[as a triangular whole]
[as a geometrical drawing]
[as a wedge]
[as a pointer]
[as a mountain]
[as an arrow]
[as overturned]
[as hanging from its apex]
[as many other things]
[as half of a parallelogram]
[tendency to progress from more mathematical to more concrete uses]
Figure 9
2. The language consisting of only one fan
We must now proceed to consider the Language [i.e. as a whole, Ed.]. It is trivially obvious that, logically speaking, a language could be considered that had only one fan. This fan would then have to be used in every situation that could exist for the speakers of that language, and it would have a finite or infinite set of uses according as the speakers of that language did or did not succeed in stereotyping the situations to which the fan referred. Thus, on Wittgenstein’s analogy, we can imagine an animal language with only one fan, which referred to only a small set of highly stereotyped situations: ‘STRANGER-APPROACHING-NEST’, ‘EXTENSION-OF-PERSONALITY-IN-NEST’ [EGG], ‘STRANGER-OUTSIDE-NEST’ [YOUNG BIRD HAVING FALLEN OUT]. In the case of this language the sign for the fan might be what students of animal behaviour call a sign-stimulus: for instance, the invariable and immovable splash of red on the robin redbreast’s breast. This one sign has to stand for a warning to the hostile bird which is approaching the robin’s territory: that is, it has to have a use WARNING-TO-STRANGER-APPROACHING-NEST. It also has to act as a recall sign to the robin’s mate when he is sitting on the eggs to allow her to get food,2 that is, it has to have a use URGENT-RECALL-TO-RESUME-SITTING-ON-EGG; and as a slightly differing recall sign to the young bird who, through having strayed just a little too far from the nest, is being stereotyped as ‘STRANGER’, instead of as ‘EXTENSION OF PERSONALITY’, that is, it has to have a use ‘RECREATE-EXTENSION-OF-PERSONALITY-PATTERN-FOLLOWED-BY-RECALL’. With difficulty, then, an animal language, referring to a world consisting of a small set of highly stereotyped situations, might be conceived of as having only one fan. A moment’s reflexion, however, will show that this imagined language is an artificially isolated abstraction.
For the positioning of the sign-on-the-red-breast would by no means be the same in all the three situations; moreover, in situation three, the strayed young bird might chirrup, and the parent robin might hear it and chirrup back, in which case there might be a supplementary recall sign; and so on . . . and so on. If we were to describe this language, and the robin’s life, at all realistically, we would have to allow the set of positions in which the robin could put his breast (stuck out, to impress a stranger; just showing above the nest, to recall his mate; half-turned on, as he doubtfully observes a de-stereotyped
2
I do not know whether robins do behave like this, but the point is immaterial.
offspring) as themselves being additional signs in this language; not to mention the additional complications introduced by the chirruping . . . So it becomes clear that from the fact that a fan can have an indefinitely large number of spokes, it does not follow that we can actually imagine a language consisting of the total set of uses of a single fan. It only follows that we can formally construct such a language – which is a very different matter. And the further consideration that arises from this one is that in the construction of any language consisting of fans, including the language consisting of one fan, two kinds of considerations are relevant: (1) considerations arising out of the principles of logical construction of any fan; and (2) considerations arising out of the set of contexts of any fan. I will call the first set of considerations, following Brouwer (his terms are spread law and complementary law) the fan-law of the language; and I will call the second set of considerations the context-law of the language. I will further say that, whereas the fan-law of the language is logically determinable, the context-law has to be gathered piece by piece, since it is derived from material that is empirical. The fact that a language consisting of one fan will have no fan-law – that is, it will not be, in fact and in detail, imaginable – can also be shown by taking the other case of such a language that is always cited, namely the case where the sign of the fan is a pointing hand, meaning ‘This’. For a pointing hand, it is argued, can point to anything; therefore it must, by definition, be possible to imagine a language with a single fan, the sign of which would be a pointing finger, which covers every situation that will be encountered by the users of this one-fan language and in the language of which, therefore, the number of spokes of the single fan will be infinite. But imagine the situation in which anyone would use such a language. 
Such a man would be considered, in relation to his fellow men, dumb; and he would use his one-fan language only to increase his knowledge of their n-fan language, so that he could begin to use the n-fan language as soon as possible. Conversations of this type could be imagined: ‘dumb’ man points; man who can talk replies ‘house’; ‘dumb’ man then repeats, after man who can talk, ‘house’. Dumb man points again; man who can talk replies ‘bicycle’; dumb man then repeats, after man who can talk, ‘bicycle’, and so on. But no conversation can be imagined, with both men pointing, in which they never did anything else but point, and always pointed in the same way, that is, using the same sign. This would be a ritual; it would not be a language. For any communication to take place at all, some variation of response must be made to the sign, for it to have a set of uses, and the set of uses constituted by this variation of response can itself be taken as at least one other sign. So we see that, in the infinite case as in the finite, it is logically possible to construct, but not actually possible to imagine, a language consisting of a single fan.
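The robin example of this section can be put as a small Python sketch, offered only as an illustration. The names `SIGN`, `USES_BY_SITUATION` and `interpret` are hypothetical, and the pairing of each stereotyped situation with a use is my reading of the passage rather than anything the text formalises. The sketch makes visible why the one-fan language can be constructed formally: the sign never varies, and everything that distinguishes one use from another sits in the situation, that is, in material belonging to the context-law.

```python
# Hypothetical encoding of the imagined robin language: one sign, three uses.
SIGN = "RED-BREAST"

# Context-law material (empirical): which stereotyped situation selects which use.
# The pairing below is an interpretation of the passage, not a table from the text.
USES_BY_SITUATION = {
    "STRANGER-APPROACHING-NEST":
        "WARNING-TO-STRANGER-APPROACHING-NEST",
    "EXTENSION-OF-PERSONALITY-IN-NEST":
        "URGENT-RECALL-TO-RESUME-SITTING-ON-EGG",
    "STRANGER-OUTSIDE-NEST":
        "RECREATE-EXTENSION-OF-PERSONALITY-PATTERN-FOLLOWED-BY-RECALL",
}


def interpret(sign, situation):
    """Return the use of the single sign in a situation, or None if undefined."""
    if sign != SIGN:
        return None  # the one-fan language contains no other sign
    return USES_BY_SITUATION.get(situation)


for situation in USES_BY_SITUATION:
    print(situation, "->", interpret(SIGN, situation))
```

Any realistic extension (breast positions, chirruping) would have to enter as a further key alongside `sign`, which is exactly the sense in which such variations count as additional signs.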
3. The fan law and the context law
Let us now consider the fan law of any fan. It will consist of an amplified definition of a fan.
3.1. The Fan law of any fan
A fan is a formal construction such that
1. it has a sign, which we will call the sign of the fan;
2. it has a point of origin, which we will call the hinge of the fan;
3. it has an unknown number of connections between its point of origin and a row of other points; we will call these connections the spokes of the fan, and the row of points so connected with the point of origin the row of points of the fan;
4. in the case of any spoke, we shall call the relation that connects the point of origin with the row of points the determination relation of the fan.
This fourth clause of the fan law at once brings up the question as to whether we can say more, in the fan law, about the determination relation of the fan. And here we have to ask ourselves: ‘What would it be like to know more of this relation?’ To this it may be answered that we know already (in the sense that we have assumed already) that it is a single relation; can we similarly (and in the same sense) know any more about it? It seems to me evident that we can, once we admit of information taken from the context law of any fan as being relevant to the formulation of its fan law. For it can be intuitively seen, though not demonstrated, that if the point of origin of any fan is to be its total (ideal) dictionary entry, and if that is to be thought of as its total meaning, then we can say that the meaning of any use of the fan which forms a point in the row of points of the fan, is included in the total meaning of the fan, and we can formalise this determination relation as ‘≥’. We can further say that the total dictionary entry of the fan will have the form: ‘the use x1 and/or the use x2, and/or the use x3, up to xn’: and since we have here the well-known logical connective ‘and/or’, it can be formalised either by the propositional-logical alternation symbol (‘∨’) or by the Boolean join (‘∪’).
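The proposal in this paragraph, that the hinge is the ‘and/or’ of the uses and that each use is included in it, can be tried out on a toy model in Python. Everything concrete here is an assumption made for illustration: the labels x1, x2, x3 follow the text, but the ‘feature’ sets standing in for meanings are invented, and modelling a meaning as a set at all is a simplification the text does not commit to. Note also that the sketch distinguishes the uses by separate labels, which is context-law information.

```python
from functools import reduce

# Hypothetical uses of one fan; each meaning is modelled as a set of invented
# features. Neither the features nor the set model come from the text.
uses = {
    "x1": frozenset({"living", "grows"}),
    "x2": frozenset({"deceive"}),
    "x3": frozenset({"put", "location"}),
}

# Hinge = total dictionary entry = Boolean join ("and/or") of all the uses.
hinge = reduce(frozenset.union, uses.values(), frozenset())


def includes(a, b):
    """Determination relation a >= b, read as 'the meaning a includes b'."""
    return a >= b  # superset test on frozensets


# Every point in the row of points is included in the hinge.
print(all(includes(hinge, use) for use in uses.values()))
```

On this model the join of the uses is simply their set union, so the hinge is the least set that includes every use.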
And since we have already formalised the form of inclusion found in the determination relation not as the propositional-logical form of inclusion (‘=’) but as the partial-ordering relation (‘≥’), and since we can at present give no meaning to the other propositional-logical connectives, and particularly not to that of negation (‘’), without which the propositional calculus cannot be constructed in any form, the obvious formalisation for the and/or connective, as used or
implied in any dictionary-entry,3 will be the Boolean join symbol [here written ‘∪’, Ed.]. It now looks as though we can make two additions to the fan law; one defining the determination relation of any fan as the inclusion relation, and the other defining the hinge of any fan as the join (in the Boolean sense) of the points in the row of points in the fan. Moreover, having made these two additions to the fan law, it looks as though a mathematically reasonable state of affairs were beginning to set in; as though, for instance, we were getting into a situation in which we could ask, ‘Is a fan a directed partially ordered set?’ (see Birkhoff, 1940). A moment’s reflection, however, will suffice to show us that we cannot make these two additions to the fan law, because we cannot, as yet, ever exemplify them. For the inclusion relation, the partial-ordering relation, is defined as being reflexive, anti-symmetric and transitive; that is to say, it obeys the three axioms: ‘For any x, x ≥ x’ (the reflexive axiom); ‘For any x and y, (x ≥ y).(y ≥ x) ⊃ (x = y)’ (the axiom of anti-symmetry); and ‘For any x, y and z, (x ≥ y).(y ≥ z) ⊃ (x ≥ z)’ (the axiom of transitivity). Moreover, the Boolean join relation might be defined as ‘Granted a partially ordered set in which every two elements, x and y, have a least upper bound, P, and a greatest lower bound, Q, we can define P as ‘‘x ∪ y’’ (and Q as ‘‘x ∩ y’’)’. But we cannot define either of these relations: not because we have no ‘≥’ or ‘∪’, but because we do not have any ‘xs’ or ‘ys’. For the first clause in the fan law of any fan is, ‘A fan is a formal construction such that (1) it has a sign (i.e. one single sign) which we will call the sign of the fan’.
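For reference, the three partial-ordering axioms and the join and meet just quoted can be set out in standard notation. This is a restatement only; the use of \land for the text’s dot and of \supset for implication is a notational assumption on my part, and the abbreviations l.u.b. and g.l.b. stand for the least upper bound P and greatest lower bound Q of the passage.

```latex
\begin{align*}
&\text{Reflexivity:}   && x \ge x \\
&\text{Anti-symmetry:} && (x \ge y) \land (y \ge x) \supset (x = y) \\
&\text{Transitivity:}  && (x \ge y) \land (y \ge z) \supset (x \ge z) \\
&\text{Join:}          && x \cup y = \operatorname{l.u.b.}\{x, y\} \\
&\text{Meet:}          && x \cap y = \operatorname{g.l.b.}\{x, y\}
\end{align*}
```

Only the first line can be written with a single variable; anti-symmetry needs two distinct variables and transitivity three, which is the point on which the argument turns.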
I have further given an example of a fan, in Figure 2 above, a fan the dictionary entry of which (that is the hinge of which) and all the uses of which (that is, all the points on the row of points of which) have the sign FLOWER [Editor’s note: A flower symbol occurs here in the text]. For FLOWER let us now substitute x. By defining the determination relation of the fan as ≥, we can now assert that, in any fan, x ≥ x (that is, the meaning of the total dictionary entry FLOWER includes the meaning of any separate use of FLOWER): that is, we can assert the reflexive axiom. But we cannot assert the antisymmetric axiom, or that of transitivity, since the assertion of these 3
Note for Linguists: by the phrase ‘used or implied’ here I do not wish to go behind the actual texts of dictionary entries, replacing them as they actually are by something that I, as a logician, am asserting that they ought to be. I am merely wishing to call attention to the fact that whereas scripted dictionary uses (e.g. in the OED) tend to be joined by commas or semicolons, in colloquial dictionary entries (e.g. ‘The word ‘‘pianto’’, in Italian, can mean ‘‘plant’’ or ‘‘trick’’ or ‘‘plan’’’) the connective word, joining the list of uses, will sometimes be ‘and’ and sometimes ‘or’: i.e. it can therefore compactly be referred to as ‘and/or’.
50
Basic forms for language structure
requires in the first case two signs and in the second case three signs (not uses, but signs), and by my definitions of fan and of sign, given in the fan law, I have, by definition, ourselves made impossible for any more signs than one ever to occur in the fan. Of course, we can substitute the xs and ys for the uses, not the signs; we can say, ‘let x stand for ‘‘plant used as plant’’, y for ‘‘plant used as plan’’, z for ‘‘plant used as trick?’’’ But then the xs and ys, together with the uses they symbolise, become part of the context law and cease to be part of the fan law; for if they became part of the fan law, firstly there would be no way of telling them apart (i.e. ‘x’ could certainly well be used for ‘y’), and ‘x’ for ‘z’, in which case every formula would reduce to ‘x ‡ x’); and secondly, on the definitions, they would immediately become signs for other fans, and we are so far studying the fan in isolation; we have no other fans. It comes to this: what we are analysing, in analysing the set of uses of a word, is the situation at the foundations of all symbolism, where the normal logical sign-substitution conventions cannot be presumed to hold, because exactly what we are studying is ‘How do they come to hold?’ Now I am not the first to have asked myself this question. It has been asked before by many symbolic logicians, notably by H. B. Curry. For instance, in a lecture given in 1953 in Cambridge on Combinatory Logic and the Paradoxes, Curry began his lecture by saying that the object of combinatory logic originally was to analyse certain processes taken for granted in ordinary logic. It could therefore be described as a ‘pre-logic’. In constructing it, he continued, he had two fundamental motives: firstly, that of trying to simplify the foundations of logic, and especially the process of substitution; and secondly, that of trying to re-examine the theory of the paradoxes. 
In pursuance of his object, his thought went through two phases: that of analysing the substitution-powers per se, ‘taking all kinds of term to be on a par, as it were’; and that of analysing the process of categorisation of terms.4 Thus Curry admitted it to be part of the study of ‘pre-logic’ to analyse the substitution process itself. Moreover, this study of it has already had the revolutionary effect, for Combinatory Logic, of breaking down the distinction between logical constant and variable.5 But the pre-logicians, while breaking down the variable/constant distinction of category (under which variables could be substituted for one another throughout without
4 H. B. Curry, ‘Lecture on Combinatory Logic and the Paradoxes’ (notes taken by MMB) (Curry, 1953).
5 Thus Church describes his system as ‘a system without logical constants’, and Curry describes his as ‘a system without variables’, when they are speaking of two systems that are isomorphic with one another.
constraint, whereas constants could be substituted for one another only as permitted by the axioms of the calculus), have in fact left the actual substitution process just as it was (i.e. as it was in the Lower Functional Calculus for bound variables), merely incorporating an explicit conversion rule permitting it into the new calculus.6 Thus, in combinatory logic, as opposed to propositional logic, the nature of the permitted substitution process is indeed made explicit; but the process itself is not further analysed or changed. Now, however, by the analytic device of separating context law from fan law, we are enabled to see deeper into the substitution process itself. For the fan law will give the formal system of the language; the context law will give the contextual system of the language (using ‘system’ in a slightly different sense here); and – this is the essential insight – an initial sophistication in the substitution process will enable the two systems to interact in such a way that our final mathematical knowledge of the combined system is greater than it could have been by consideration of either the formal system taken alone or the context system taken alone. That this can be so can be shown only by giving an actual fan law and an actual context law, and by following a step-by-step process of watching the two interact. Let us now examine the type of initial sophistication of the substitution process that is required in order to get a first taste of the ‘feel’ of the kind of interaction that can be produced by it. In order to consider the kind of change I propose, let us reconsider the substitution process from the point of view of combinatory logic. In the combinatory logics, although the distinction between constant and variable has been broken down, it is not the case that all distinction between types of symbol has been broken down.
For the formulae of combinatory logic contain two kinds of symbol, not one (and this not counting brackets); firstly, the upper-case roman letters, which stand for combinators – that is, for the names of certain operations that can be performed on any string of elements – and secondly, lower-case letters, which stand for the paradigmatic strings of elements that can be combinated. In the lambda calculus there is only one special sign, the lambda, λ, which, in any formula, binds all those elements that immediately follow it and that precede the dot; but this does not alter the fact that, in this calculus also, there are two categories of signs, the λ and the elements – not to mention the fact that by establishing
6
‘First rule of conversion. If an expression M contains a bound variable, an expression which is convertible with M will always be obtained by replacing the bound variable in M, wherever it occurs, by another variable which did not originally occur in M. Thus l’’ cl’ ’, and {l’ ’} c{l’ ’} But l’ ’ is not c l’’ nor l’’ l’ ’ . . . . This rule is the generalisation of the rule which holds for bound variables in ordinary logic . . .’ (Feys, 1946. Translation: MMB).
certain λ-sequences as additional premises in the λ-calculus, these additional premises can then be used as combinators. By mathematical convention, then, if not by mathematical assertion, variables have names. The traditional algebraic variables, x and y, stand for numerals; the traditional propositional variables, p, q, r, stand for statements. And if the second characteristic of variables be removed, namely that they occur in systems in which there are other signs that are not variables, then it becomes clear that the fact that variables have names becomes their first and vital characteristic without knowledge of which their method of operation within the system cannot be understood; only secondarily does it matter (if it matters at all) that they vary – that is to say that by operating with one or more substitution rules, a further symbol, giving a concept with a single meaning, can be substituted for the variable. The question therefore arises: If we analyse the substitution process taking account of the first and third characteristics of variables, but leaving out the second, should we not be re-analysing the substitution process itself? The answer to this is that this is just what we are doing when we create, in vacuo, the notion of a fan. For the point of origin, the hinge, of the fan is the variable that is also a name (a name for a total list of dictionary entries); and the row of points of the fan are the individual uses, or meanings, that can be substituted for this variable. Moreover, the requirement that a language shall consist of more than one fan already gives the beginning of an extra restriction upon the substitution rules of this new type of variable; for the rules of such a language, say, one consisting of two fans, will give us the rules of substitution on combination of the two fans in such a way that, as will appear, what we analyse, in analysing such a language, is the very basis of the substitution process itself.
As soon, however, as we begin to work with the inclusion relation, as was suggested earlier, the variables with which we work are no longer so basic; for, by creating the contrast between variables (that is, fans), and relations, that is the relations (such as inclusion and join, ‘≥’ and ‘∪’) that operate within the fan, we are again embedding the variables within a formal system, that is, the fan law, the operation of which, once it becomes explicit, will prevent us from further analysing the nature and properties of the fan itself. It becomes clear then, even from this preliminary discussion of the nature of the analysis of the substitution process in pre-logic, that the first thing to do, if a further analysis of the required type is to be obtained, is to re-examine the nature and interrelations of the fan law and the context law; not further to develop the fan law in isolation in order to do a more reasonable sort of mathematics.
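The partial-ordering axioms and the Boolean join invoked in this section can be made concrete in a few lines of Python. What follows is only a sketch under stated assumptions: the uses of a word are modelled as sets of labels, inclusion is read as the superset relation, and the labels "plant", "plan" and "trick" are borrowed from the text's own example. None of this is part of the original formalism. The last lines illustrate the difficulty raised above: with a single sign there are no distinct elements with which to exemplify anti-symmetry or transitivity.

```python
from itertools import combinations


def powerset(items):
    """All subsets of `items`, each as a frozenset."""
    items = list(items)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def includes(x, y):
    """Inclusion 'x >= y', modelled here (an assumption) as 'x is a superset of y'."""
    return x >= y


def is_partial_order(elements, rel):
    """Check the reflexive, anti-symmetric and transitive axioms by brute force."""
    reflexive = all(rel(x, x) for x in elements)
    antisymmetric = all(x == y
                        for x in elements for y in elements
                        if rel(x, y) and rel(y, x))
    transitive = all(rel(x, z)
                     for x in elements for y in elements for z in elements
                     if rel(x, y) and rel(y, z))
    return reflexive and antisymmetric and transitive


def join(elements, rel, x, y):
    """Least upper bound of x and y under rel, or None if none exists."""
    uppers = [u for u in elements if rel(u, x) and rel(u, y)]
    least = [u for u in uppers if all(rel(v, u) for v in uppers)]
    return least[0] if least else None


# Hypothetical labels for three uses of one word.
elements = powerset(["plant", "plan", "trick"])
x = frozenset({"plant"})
y = frozenset({"plan"})

print(is_partial_order(elements, includes))
print(join(elements, includes, x, y) == (x | y))

# A fan in isolation has one sign only: no pair of distinct elements exists,
# so only 'x >= x' can ever be exemplified.
single = [frozenset({"FLOWER"})]
distinct_pairs = [(a, b) for a in single for b in single if a != b]
print(distinct_pairs)
```

The brute-force checks are deliberate: they follow the wording of the three axioms term by term rather than relying on any property of Python sets.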
Earlier I gave the fan law of any fan, to the extent that it can be given when we are considering only one fan. Let us now, assuming the same conditions, give the context law of the fan.
3.2. The Context law of any Fan
1. Let the dictionary entry of any fan, which is formally indicated by the hinge of the fan, be . . . (here insert what the actual dictionary entry of the fan in question is).
1. (a) Let the logical form of the dictionary entry of any fan be presumed to be: [first entry] and/or [second entry] and/or [third entry] . . . up to n entries. Let these be formalised as [first entry] ∪ [second entry] ∪ [third entry] ∪ . . . [nth entry].
2. Let the contextual meanings, or uses of any fan, which are formally indicated by the row of points of the fan, be . . . (here insert for any fan, or set of fans, the actual set of uses of those fans) reading from left to right along the row.
3. Let the relation between the dictionary entry of any fan and its set of uses be that of inclusion of meaning, formalised as [dictionary entry] ≥ [use in question].
Comments
1. The fact that the set of uses of any fan is ordered from left to right along the row of points, that is, in the same direction as the flow of time, gives the dictionary maker the right logically to order the set of uses of any fan as he or she wishes. In most dictionaries this logical order is given as the historical order of the development of the uses of the word in the language.
2. Any inferences that can be made, in the case of any fan, as the result of logical information obtainable by comparing or otherwise analysing the set of uses of the fan, will be thenceforth straightforwardly usable in the system. For example, suppose that of the set of two uses of a particular fan with a dictionary entry cleavage one use means ‘to stick together’ (‘she, cleaving only to him . . .’) and the other use means ‘to split apart’ (‘split the stone and thou wilt find me, cleave the wood and I am there . . .’). This double use of ‘cleave’ is, I think, mentioned in a paper by Waismann.
We can then say, from our knowledge of the context law of this fan, that one of these uses is complementary to the other; and if it had been already established that the mathematical system constituted by the fan was such that it could make sense to say of it that it was a complemented system, then the actual property of complementation could be added to it, from information derived from the context law.
3. Of course, the temptation now is to say, ‘What you are examining, in examining the relations between fan law and context law, is no new thing. It is simply the time-honoured relation between any formal system and the field of its application.’ That this is not so – that it is a new, and more logically primitive, set of relations which is being examined – can be seen from the two following considerations. 1. The ‘system’ consisting of the field of application, call it LA, is cognate in structure both to the ‘system’ that is being employed to analyse it, call it LB, and to the further ‘system’ that is being employed, call it LC, to discuss the possibility of applying LB to LA, and also to discuss the conditions under which LB can be applied to LA. At first it might be thought that the instrument is not merely cognate, but the same, in all three cases: that Language (LC) was being used to discuss whether a Language (LB) could be employed to analyse Language (LA). A moment’s reflection, however, will show that Language, here, is being used in three considerably differing senses. For LC (sometimes called, by logicians, the Meta-language) is straightforwardly a version of ordinary philosophic English; it is the language in which this paper is written, and a linguistic description of it could be obtained by taking the paper as text. LB (sometimes called by logicians the formal system) is a much queerer version of the English language. It has an extremely restricted vocabulary, an extremely tight syntax, and a special method of generating new expressions out of itself by calculation rather than by ordinary discursive concatenation, which is the method of sentence-generation in LC. LA is more restricted even than LB, but differently so.
It consists, or will consist, of a few simple, paradigmatic sets of uses of words and their classifications, which are examined, by using LB, in order to see how close an approximation, at infinity (a phrase used in a Brouwerian sense to mean, ‘When we have finished the activity of constructing language’ – a point that will never be reached) can be built to a language like, but even richer than LC; that is, that abstraction which we think of as ‘the total English language’. Thus we can say, for rough and ready purposes, that LA is contained in LC; we can also say (trivially) that LB is contained in LC (since any formal system described will form part of the language of the paper describing it). What we cannot feel sure of is whether, at infinity, LA is contained (or, if suitable examples were chosen, could be contained) in LB; though it is clear that LB is not contained in LA. Thus LA, LB and LC are not identical; they are cognate. The first must be the source of the context law of any language under examination; the second must be the source of any fan law of any language under
examination. But (this is the point that I here want to stress) because they are cognate, which means having features in common, they can interact; in particular, common features that are contributed from LA can be progressively incorporated into LB. Language is not the only field in which such incorporation can occur. Brouwer’s proof of his fan theorem, for instance, essentially depends on saying that any mathematical proof can be considered as a sub-system having the same properties as the system of his numerical primordium (that is, the initial blur from which he will construct the real-number continuum). Now the number of steps in any proof is finite; so also, therefore, is the number of steps after which determinate numbers appear in the primordium. Other instances of such cognateness could no doubt be used. Now this incorporation, in applied mathematics, does not normally occur in engineering for instance; the steel and stone of which the bridge is built do not themselves mathematically contribute to the formulae, except trivially by supplying units for the formulation. 2. In applied mathematics a formal system is applied to a field, the formal system supplying, in a sense, the formal law and the field the context law. But the fan law, as given up till now, is not a system: it is an expanded definition. It is, moreover, a developed way of thinking of an analogy; ‘Language is like a net’ (I might have said, making a very long story very short) because the set of uses of any word is like a fan. Instead of doing this, I think, as exactly as I can, about a certain sort of fan; I focus my conception of a fan: in other words, I take one only out of the set of uses of the fan ‘fan’. Nevertheless, this conception only develops an analogy by extended definition in such a way that the analogy can later become a system; it does not itself, in the fan law alone, suffice to create the system. 
And this helps to show how it is that initial analogical thinking, in a new branch of science, can similarly help to develop and generate more systematic forms of system; how it is that the first stage of developing a systematic scientific theory is always first to develop an analogy. For the analogy, like the theory, can have fan law and context law; and often, as between the tenor and vehicle of an analogy (see Richards, 1936), the generative feature that I have described as incorporation can occur, whereas when premature systematisation is attempted in the science, it often cannot occur as between the prematurely developed system and the description of its field. These last remarks need expansion, and there is no place here to expand them. They are inserted here only to present the case for saying that the relations between the fan law and the context law are more intimate, more generative, more primitive than the normal relations between a mathematical system and its field.
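The context law of Section 3.2, together with the ‘cleave’ example in Comment 2, can be sketched as a small data structure. This is an illustration under assumptions, not a rendering of the author's own formalism: the class name `Fan`, the sign "CLEAVE", and the reading of the hinge as a set union of uses are all hypothetical choices made for the sketch. Complementation is stored separately from the fan's structural part, to reflect the point that such a property is contributed by the context law rather than by the fan law.

```python
from dataclasses import dataclass, field


@dataclass
class Fan:
    sign: str    # the single sign of the fan
    uses: list   # the row of points, ordered left to right
    # Pairs of complementary uses; information supplied by the context law.
    complements: set = field(default_factory=set)

    def dictionary_entry(self):
        """Logical form of the entry: the uses connected by 'and/or'."""
        return " and/or ".join(self.uses)

    def hinge(self):
        """The hinge as the join of the uses (modelled here as a set)."""
        return frozenset(self.uses)

    def includes(self, use):
        """[dictionary entry] >= [use in question], read as membership."""
        return use in self.hinge()

    def declare_complementary(self, a, b):
        """Record that two distinct uses of this fan are complementary."""
        if a == b or not (self.includes(a) and self.includes(b)):
            raise ValueError("need two distinct uses of this fan")
        self.complements.add(frozenset({a, b}))

    def complement_of(self, use):
        """Return the recorded complement of a use, or None."""
        for pair in self.complements:
            if use in pair:
                (other,) = pair - {use}
                return other
        return None


# Hypothetical encoding of the two uses quoted in Comment 2.
cleave = Fan(sign="CLEAVE", uses=["to stick together", "to split apart"])
cleave.declare_complementary("to stick together", "to split apart")

print(cleave.dictionary_entry())
print(cleave.includes("to split apart"))
print(cleave.complement_of("to stick together"))
```

Keeping `uses` as an ordered list preserves the left-to-right order of the row of points, which Comment 1 leaves to the dictionary maker.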
Author’s note: I believe myself to be indebted to E. W. Bastin, for conversations in which I gained the ideas of fan law and context law; to T. J. Smiley for a conversation in which I gained the idea of removing progressively differences of category from the symbolism; to I. A. Richards for this conception of analogy; and to W. S. Allen for what is inserted here about linguistics.
Editor’s note: In the middle of the paper I have removed a technical section which is a transcription of some of Feys’ axioms for combinatory logic, but which does not add to the core of the argument.
Editor’s Commentary
This paper is difficult and an extreme example of MMB’s belief that certain kinds of logical formalism were crucial for understanding the function of language and yet, at the same time, she retained all the ambivalence of Wittgenstein towards such formalisms. What she always did was to choose non-standard forms of logic in which to seek such insights, and in the present paper the key figure is Brouwer, who wanted to reduce the whole basis of logic to something simpler, as many have before and since. His basic notion was that of difference in time: the world separating into distinguishables before and after a moment, and MMB sought to link this to her basic intuition that a word’s senses developed in time in a lawful, non-random way, as opposed to a mere historical listing of sense accretion of the sort found in the OED. This notion has always been attractive on the fringes of computational linguistics and artificial intelligence (cf. Givon, 1967; Wilks, 1978; Copestake and Briscoe, 1991; Pustejovsky, 1995). MMB wants to argue that senses generated by some fan-like process can later be placed into a quite different classificatory scheme, the thesaurus (as in Chapter 5), and it is the interaction of these two structures to which she always returns.
The real problem is that the fan is a metaphor and not, in her work at least, any kind of generative mechanism that could actively predict what the interpretation of the next fan spoke for a word would be.
3
Classification, concept-formation and language
The argument of the paper is as follows:
1. The study of language, like the study of mathematical systems, has always been thought to be relevant to the study of forms of argument in science. Language as the scientist uses it, however, is assumed to be potentially interlingual, conceptual and classificatory. This fact makes current philosophical methods of studying language irrelevant to the philosophy of science.
2. An alternative method of analysing language is proposed. This is that we should take as a model for language the classification system of a great library. Such a classification system is described.
3. Classification systems of this kind, however, tend to break down because of the phenomena of profusion of meaning, extension of meaning and overlap of meaning in actual languages. The librarian finds that empirically based semantic aggregates (overlapping clusters of meanings) are forming within the system. These are defined as concepts. By taking these aggregates as units, the system can still be used to classify.
4. An outline sketch is given of a mathematical model of language, language being taken as a totality of semantic aggregates. Language, thus considered, forms a finite lattice. A procedure for retrieving information within the system is described.
5. The scientific procedures of phrase-coining, classifying and analogy-finding are described in terms of the model.
1. The point of relevance of the study of language to the philosophy of science
Two very general disciplines have always been thought especially relevant to our understanding of the nature of science. The first of these is the study of mathematical system and the second the study of language. Up to now, however, the two have been very unequally prosecuted; for whereas the first, metamathematics, has advanced within living memory to being a fundamental discipline in its own right, the second, the study of language – of
language, that is, in so far as it helps us to understand science and the kinds of thinking we use in science – remains as naif as ever it was. Thus, the study of the philosophy of science stands actually in an intellectual predicament. For just as the raw material of metamathematics is system (and, in particular, any system that can be taken as, or made to give, an arithmetic), so the raw material of the philosophy of science is or ought to be language: the actual stuff of scientific arguments, together with any form of discursive talking or writing that contains, or that can be made to give, a scientific argument. And scientific argument, in science as it really is, is not systematic, in the metamathematical sense of system. It is, on the contrary, extremely haphazard. Scientists use apparatus or system, when they can, just as carpenters, when they can, use a power lathe; but the tool must not be identified with the activity itself, which is one of constructing argument, or with its product, which is the statement of a discovery made. Of course, in so far as the art of building scientific argument (what scientists, looking glum, call ‘doing the philosophy’) passes over into the skill of using and developing the technique (which, on the carpentering analogy, is using the tool), science does not only consist of language. But it is surprising what a lot of cursing and worry there is, even then, about the validity and interpretation of the technique; and all this worrying is done in language. Moreover, when the technique ceases to be a cause of worry, then, ideally, you are not doing science with it any more, because, by that time, you will have handed it over to the technologists. To do science with it again, you have got to make it give out a fresh cause of theoretic cursing and worry; and that means, once more, using language. So science, basically, consists partly of fiddling (with a system, or a tool, or a machine, or something) and partly of using language. 
But now, just as to see the relevance of the study of language to the study of science we have to understand science in a particular way (e.g. to understand that science, like other philosophy, is built of language), so we have to understand language in a particular way; we have to ask ourselves, ‘To do science, what is it that we want of language?’ It is obvious, for instance, that to do science we do not primarily want grammar-book language. Indeed, scientists compile tables and draw diagrams and take trouble over this, much more than they take trouble about getting their natal grammar right. To go on with, science is international and interlingual, so that such matters as absence of a true copula, in modern Chinese, and of all prepositions except one in Ibo, or the general absence of correspondence in function of all the auxiliary parts in language as well as of all the principal ones, this cannot matter to it. Linguistics, then, is not to be our method of studying language. We want to get at language, in so far
as it is seen in general to be rather like question-answering, rather like table-compiling, rather like classifying, rather like diagram-drawing and analogy-transforming; we do not want precisely to pinpoint or to interrelate the particular grammatical or other functions of a single language. And this thought shows us that logic is not to be our method either; or at least not propositional logic, as first conceived by Russell. For the units of language as we want it have got to be concepts; concepts to classify, concepts to use as letters or arrows or labels on the diagrams. And the idea of a language as consisting of a set of interacting concepts was precisely that which Russell, following Moore, found that he had to abandon, before he could start framing the propositional calculus.1 Only one form of contrast, negation (which is, for practical purposes, far too few) and, at most, three connectives (which is quite often, for the scientists’ purpose, two too many) and a set of disconnected gaps to be filled not with concepts, but with ‘expressions’, isolated sentences, that is, squeezed of all semantic juice, this is all that is left when you have so depleted language as to make it able to turn into propositional calculus. And language thus conceived is far too weak, too little information-bearing to be of interest to science: no scientists mind about whether they have got their propositional connectives right in an argument anyway. So it is clear that propositional logic is not what we want here. There are other, second-order forms of analysis of ‘language’; such as statistical analysis, and information theory. But these are irrelevant because they are second order; they presuppose that a unit of analysis, or of information ‘for example, a word’ has been determined before you ever start using this form of analysis. So they are not what we want here either. 
The purpose of this paper, then, is to propose a new type of analysis of language, namely, language conceived as a classificatory system. And the model of this new conception of language, which has the advantage of being concrete, so that it can be both constructed and examined, is an extremely advanced classification system, which could be mechanised and used under the supervision of a librarian to retrieve relevant documents from one of the world’s greatest libraries. I should have said, strictly speaking, one of the world’s future great libraries, one of those science-fiction libraries of the future into which, day by day, hour by hour, the world’s new written matter will be inserted without its ever being touched by human hand. It is part of the thesis of this paper that such a library – continually increasing and for ever largely unread – will come in the end to a state in which it can sufficiently serve as a model of the unread world of
1 Concepts are more exactly defined later in this chapter.
nature itself. And, to the extent to which it can so serve, the system used to classify it will serve, also, as a model for language, as the scientist uses it to classify the world.
2. A new model for scientific language: the classification system of a great library
To embark on the philosophy of library classification is to split one’s self in two. On the one hand, it is all so simple and obvious; we have all used libraries with subject-classification systems. On the other hand, once one seriously takes such a system as a model of language itself, one is in a new countryside, in that one arrives at a very non-obvious view of language indeed. Six facts immediately force themselves upon the attention, each of which can be stated in the form of a definition.
1. Definition of a library point
The basic unit of the system is not a word, but a card. It is a card with a phrase (sometimes a word, but nearly always a phrase) written on it. ‘Physical world, Eddingtonian conception of’: ‘Physical world, general’: ‘Physical study of strain on aircraft frames of proceeding at Mach speeds’: ‘Physical effects of weightlessness upon human physiology.’ We will call the set of all such cards with phrases written on them the set of library points of the library. Since we are now using the library as a model for language, we shall similarly envisage the basic units of language as being the set of language points of the language.
2. Definition of an event (or situation) in the world
It is worth seeing how the library points got into the library; all the more so as many people who are otherwise sophisticated still conceive of a library classification as being nothing but an alphabetised index of titles and authors’ names. They got in, not because of any facts about the nature of phraseology or the nature of libraries, but because of events that happened in the outside world. That is to say, the selection of the phrases written on the cards is dictated by events that sufficiently attracted the notice of document-writers (and/or of librarians) for phrases describing them to have merited cards in the library. There actually lived
a physicist named Eddington, for example, who had new notions about physics. Writers – quite a number of writers – commented on these. Hence the library point ‘Physical world, Eddingtonian conception of’ had to be created. At a certain point in history, aircraft began to fly at Mach speeds. One of the effects of this was that special studies had to be made of the deleterious effects, on aircraft frames, of flying at such speeds. Hence it became necessary, in the relevant libraries, to create the library point ‘Physical study on aircraft frames of flying at Mach speeds’. Thus the classificatory tags used as subject classifiers in any great library are by no means the words or phrases most frequently to be found in the classifier’s language. They are, on the contrary, words or phrases that have forced themselves on the classifier’s notice owing to events that have occurred in the outside world. We shall therefore say that, in the library, an event (or, more complicatedly, a situation) is any occurrence of any kind that has created a library point. And, applying the model to language, we shall similarly say that an event (or, more complicatedly, a situation) is any occurrence that has created a language point; that is, an event that has been sufficiently noticed by speakers of that language for descriptions of it to have become frequently occurring words, or phrases, in that language. In so far as the outside world contains events or situations that have not produced library points (or language points), we shall ignore these. For us, the totality of noticed situations that have produced language points constitutes, quite straightforwardly, the whole outside world.
3. Definition of a document in the library
A book, or document, in any sophisticated library system, is not primarily classified either under the author’s name or under its title. It is classified by means of a document called a term abstract, which, as well as author and title, contains a list of the words or phrases occurring most frequently in it: that is to say, of the set of library points by which it is to be defined. Thus, if there were two documents the term abstracts of which contained exactly the same set of library points, these would be indistinguishable in the library; and given a document X, uniquely defined by a set of library points a, b, c . . . n, we can say that X is the document in the library that is concerned both with a and with b and with c . . . and with n.

4. Definition of a tag in the library
The cards that form the points of the library are themselves interrelated under headings: ‘PHYSICS, GENERAL’: ‘PHYSICS, PARTICULAR’: ‘AIRCRAFT, PART OF, FRAMES’ would constitute such headings,
each of which is normally denoted, on the library card, by a decimal number: for example, 27.8 = AIRCRAFT, PART OF. We shall call all such coded units of the library system tags of the library. A great deal of research in library classification consists in devising systems of such tags, such as, for instance, the Universal Decimal System (British Standards Institution, 1958), and, indeed, I shall propose here in our model system a multiple hierarchy of such tags.

5. Definition of a concordance relation in the library
As well as the basic library point cards of the library (suppose them blue cards), there are also, in the index, subsidiary cross-referencing cards (call them white cards): ‘AIRCRAFT-FRAMES: SEE ALSO HOLLOW CONTAINERS’. ‘COOKERY: SEE ALSO CHINESE URNS’. There are a great many of these white cross-reference cards. In one reference library of 10,000 items, all on social science, it was estimated that there were five cross-reference cards for each original library card. It is evident, therefore, that if there are so many cross-reference cards in the system, they are likely to form an integral part of it, not an extra. Moreover, it is intuitively evident that they do. All librarians always assume that the cross-referencing relation is the same hierarchical relation as that which already exists between tags and points in the library: that is, they assume that a cross-reference is always from something more general to something more particular, as is the relation in the tags–points hierarchy.

We shall say that there exists an asymmetric relation, which we shall call a concordance relation, between any two connected tags or points in the library. And similarly, mutatis mutandis, for language. (Provided that every tag or point in the library is interconnected with at least one other by a concordance relation, and that no two tags or points have exactly the same set of concordance relations, a tag, or point, in the library (as also in language) can be defined by reference to the set of its concordance relations.)

6. Definition of a component of a library point
We have postulated a library system, so far, that has two kinds of cards: library cards (blue cards) and cross-reference cards (white cards). We now postulate a third type of cards (grey cards), which we will call component cards, though more usual names for such components, in the literature of library classification, are ‘descriptors’ (Mooers, 1956), or ‘terms’ (Thorne, 1955). In order to understand the need for these, the library system must be examined in operation, not at rest. The object of a library system, pace
librarians, is not merely to classify – that is, it is not merely to insert more and more new documents, in an orderly manner, into the library – it is to get relevant documents (or parts of documents) out of the library. The library user (this is the theory, as the literature shows) comes to the librarian with some sort of request: ‘Give me what you’ve got on stress on aircraft frames, will you?’ or ‘What happens to aircraft frames when travelling at Mach speeds?’ The librarian has to transform the main semantic components of this request (e.g. ‘stress’, ‘aircraft frames’, ‘Mach speeds’) into a set, or set of sets, of library points. To do this, he has to be able to identify them in his index (which must therefore contain grey cards for such words as ‘stress’, ‘aircraft’, ‘frame’, ‘Mach’ and ‘speed’), even if the index already contains a single library card for ‘physical study of deleterious effects on aircraft frames of travelling at Mach speeds’, for he cannot be expected to hold the complete set of library cards in his head. Moreover, having extracted his grey cards, they must lead him by tags written in their corners to the most relevant library cards; and the library cards, by means of the tags written in their corners, must then lead him to the correct set of actual documents. Thus the general operation of extracting wanted information from an antecedently existing totality of ‘processed’ data (i.e., a large quantity of classified facts, or documents) is always envisaged, in the literature, as having two stages: (1) from request to system (i.e., a reformulation of the request in terms of library points), (2) from system to actual documents in the library. 
It comes to this: that a process of redefinition, of ‘translation’ (in the technical literature this is usually called ‘the process of information retrieval’) has to occur if information is to be extracted from any great library by users who do not know under what titles, or authors’ names, it will be classified. The components of the library user’s request have to be transformed into the relevant set of library points. The transformation therefore formalises, in our model, the whole colloquial process of a question being answered in language. The idea that a library user can make a request of a library system, and have it answered, will come as a novel one to English library users, since an English library user, before going to a great library, will probably ring up a specialist friend to get the titles and authors he wants. In the philosophy of library classification, however, the ‘translation’ performed by the specialist friend has got to be accounted for as part of the process of using the library. We shall define the components of a library point as the subset of the total set of grey cards in the library which, by means of their tags, lead the library user to that library point of which they are components, whether the actual words used in the formulation of the request are the same as those used in the definition of the library point or not.
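The two stages just described, from request to system and from system to documents, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the card contents, the point labels (`P1` . . . `P4`) and the document labels are invented, and ranking library points by the number of request components that lead to them is one simple reading of ‘most relevant’, not a procedure given in the text.

```python
from collections import Counter

# Hypothetical index, invented for illustration only.
# Grey (component) cards: each component carries tags leading to library points.
GREY_CARDS = {
    "stress": {"P1", "P3"},
    "aircraft": {"P1", "P2"},
    "frame": {"P1"},
    "Mach": {"P1", "P4"},
    "speed": {"P1", "P2"},
}

# Blue (library point) cards: each point carries tags leading to documents.
LIBRARY_CARDS = {
    "P1": {"doc-17", "doc-42"},
    "P2": {"doc-08"},
    "P3": {"doc-23"},
    "P4": {"doc-31"},
}


def request_to_points(components):
    """Stage 1: reformulate the request as library points, best match first."""
    hits = Counter()
    for component in components:
        for point in GREY_CARDS.get(component, ()):
            hits[point] += 1
    # Most matching components first; ties broken by label for a stable order.
    return sorted(hits, key=lambda p: (-hits[p], p))


def points_to_documents(points):
    """Stage 2: follow the library cards to the actual documents."""
    docs = set()
    for point in points:
        docs |= LIBRARY_CARDS.get(point, set())
    return docs


def retrieve(components):
    """Run both stages, keeping only the best-matching library point."""
    points = request_to_points(components)
    best = points[:1]
    return best, points_to_documents(best)


best, docs = retrieve(["stress", "aircraft", "frame", "Mach", "speed"])
print(best, sorted(docs))
```

A request whose components appear on no grey card yields no library point and therefore no document, which is the situation the ‘specialist friend’ is called in to repair.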
3. The process of concept formation in language
We have now concretely defined, in our library model of language, the notions of library point, situation (or event), document, tag, concordance relation and component of a library point. From the theoretic point of view, however, it will be clear that we have created more sorts of notions than we need. Situations (i.e. events in the world) need no longer concern us, since there is by definition a one-to-one correspondence between them and library points; and actual documents in the library need no longer concern us since they are now equivalent, by definition, to sets of library points. Thus we are left with the three types of theoretic entity, points, tags and components, with a single asymmetric, the concordance relation. Let us now see what actually happens in the library. If things happened in the way that the librarian wants them to happen the system would function smoothly and there would be nothing to record; but human nature does not conform to the wishes of classifiers. What the librarian wants, clearly, for his system to function smoothly, is that every user of the library should be compelled to frame his or her request by means of a set of components that the librarian has chosen – in terms of the official authenticated set of grey cards, as it were. He further wants every author, writing a document, which has got to be term abstracted, to use the same official key-phrases as every other author writing on the same subject. What is more, what would make the librarian really happy would be for the components of these standard key-phrases, which would then occur in the term abstracts of his documents (i.e. his library points, his blue cards), to be compounded of the same official set of authenticated components (grey cards) in which he had formerly forced the user to formulate his request. 
Thus he would use his official set of authenticated components (which he must evidently not allow to get too large) as a controllable notation, or interlingua, into which both requests and library points could be ‘translated’, so as to get from the first to the second in the simplest possible manner. And indeed a library system consisting of just such interlinguas of terms, or of descriptors (see Joyce and Needham, 1958), is what advanced classification research workers in this field currently desire to create. The excellent current idea, however, of constructing library systems from a carefully chosen set of basic terms or descriptors comes up against two snags, both due to the determination of human beings to exploit the richness of language. The first snag is that users of the library will not formulate their requests in the proper form. They say what they want to say, not what they are told to say. The second is that authors writing for the library on cognate subjects will not consent to use a shared set of standard key-phrases. They make up their own phrases: they do not borrow them
from others. In both cases, the result of this is that the largest set of components that the librarian can handle is too sparse for them; they can, and do construct respectively, their own personal set of personal components of their questions and answers. In spite of this, however (as the research classifier points out), what different people say about the same situations does largely overlap. Different people using different phrases ‘mean the same thing’ (if they did not, no classification of any information or data would be possible). It follows, therefore, that if the slightly differing words and/or phrases they use, either in formulating requests or in creating library points, could be clustered, somehow, in terms of their overlap of meaning, to form semantic aggregates, a set of key-signs standing for these whole aggregates (which could be, by fiat, kept to a reasonable number) could then form the actual set of units of the library system, by the use of which both requests and library points could be defined, and thus, in spite of the vagaries with which human beings use language, the whole process of information retrieval could again be simplified. Now this can be done. In fact, in two places, and on a small research scale, it has been done already. It has the effect of creating a new theoretic entity in the system: that of key-words, or keys, or (in language) headings, or heads, which stand for whole aggregates of almost synonymous words and phrases, so that now we have points, tags, components and keys in the system, and, to relate them, still the single asymmetric concordance relation. But the change made in the system by the introduction of keys is enormous, for in fact this change makes sense only if heads really exist in language: if language tends to classify itself in this way before ever the librarian gets hold of it. One new good concrete result of introducing keys into the system becomes immediately evident. 
It is that we can now get rid of the whole pack of white cross-reference cards; for we can now redefine cross-references between points or tags in terms of contiguity between aggregates. Take the cross-reference ‘COOKERY, CONTAINERS: SEE CHINESE URNS’, the bizarre cross-reference which we imagined earlier. At first sight it looks as though there was very little indeed in common, semantically speaking, between the notion of ‘Cookery, containers’ and that of ‘Chinese Urns’. But that is because we were unthinkingly taking ‘Cookery containers’ and ‘Chinese Urns’ as library point components coming from different documents. Reimagine them now as keys. ‘Cookery’ would obviously stand for a very large aggregate of near synonyms and of otherwise related ideas. So would ‘Containers’. And under the aggregate ‘Containers’, if it were complete, would undoubtedly come ‘Urn’. Moreover, if the library, in an eighteenth-century manner, contained a
separate aggregate for things Chinese, this, if complete, would include both ‘Chinese cookery’ and ‘Chinese Urns’. So, where determinate points and tags cross-refer, aggregates overlap. But that this happens – which can be found out, in practice, by trying it – can only be accounted for theoretically, by making a further assumption. This is that the relation that connects members of aggregates to one another is the same asymmetric concordance relation as that which already connected the tags and points of the library classifications’ already-existing hierarchy. So the whole system is connected by a single relation. So the library cross-references have gone; but this is not the end of it. For what are tags but rather larger keys? Keys, that is, that stand for rather larger semantic aggregates than ordinary keys do. I have just cited ‘Cookery’, and ‘Container’ as keys; earlier I cited them as tags. Clearly unless we recreate tags, to make them something of a different logical nature altogether, which will handle aggregates, tags will disappear; they will just become super-keys. So, cross-references have disappeared from the library; and so, in the old sense, have tags. We are left now with library points, keys (of various sizes) and aggregate components. Now library points cannot disappear. If they did, we should never be able to get any documents out of the library. But (since all the variants of the phrases of which they may be composed have now become components of aggregates) the library points being now defined in terms of aggregates will swell in a most unpleasant manner. Suppose we have a key called ‘deterioration’, which stands for an aggregate of all kinds of unpleasant bad effects that can happen to material objects. Among the synonyms for ‘deterioration’ we shall have ‘stress’ and ‘strain’. We shall also, however, have other deleterious effects. Suppose we also have a key called ‘aircraft’. 
Under it among the parts of an aircraft we shall have ‘aircraft frame’. We shall also have other aircraft parts, for example ‘propeller’. Suppose we further have a key called ‘speed’. It will have all the high speeds among its synonyms. It will therefore have ‘Mach speeds’, ‘speed faster than sound’, ‘supersonic speed’. But it will also have, for example, ‘normal cruising speed’. Thus by picking up, in each case, different synonyms from the same semantic aggregate, a library point concerned with deleterious effects on aircraft frames of flying for a long period at Mach speeds, and a cognate library point concerned with the deleterious effects on aircraft propellers of travelling indefinitely at normal cruising speed will become the same library point; each will be defined in terms of the same keys, because each will have, as components, synonyms from the same aggregates. Moreover, if library points swell, components will now fan. Suppose, for instance, ‘Mach’ is a component in the library system. This component will
occur as a member of the aggregate ‘speed’. It will also occur as a member of the aggregate (if there is one) ‘famous men of science’. Thus the single component ‘Mach’ will fan out into two components, ‘Mach, speed’ and ‘Mach, famous men of science’. And more elastic components (such as ‘Physical’, which occurred in all the four library points given at the beginning of this paper) will fan out into an indefinitely large set of components: one for each aggregate of which ‘physical’ is a member. Thus if the cross-references and the tags disappear from the library system, thus simplifying it, the library points swell and the components fan out, alarmingly. Clearly something must be done to keep the flexibility and the simplicity which the library system now has, while doing something to counteract the swelling and the fanning.

But, by now, it should be clear that we are no longer researching only on library systems; we are also, straightforwardly, researching on language. We are developing a new, generalised conception of language that has evolved from the needs of library-classification research, but which must now be developed so as explicitly to represent what it has been implicitly referring to all the time, namely language (for library classifications, after all, are also in language). Moreover, a development of the model to make it more like language will also make it more efficient as a retrieval system, in that it will correct its fanning and swelling tendencies. In language itself, after all – as opposed to in the library model – we can distinguish the stress on aircraft frames from flying at Mach speeds from the wear-and-tear on aircraft propellers from flying at cruising speeds; although, as we have just seen, in the model, we cannot.

In the next section, an outline sketch is given of the model as it explicitly applies to language. The keys of the library system have turned into heads:² components remain, and two new units have been added to the system.
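Before the two new units are introduced, the swelling and fanning described above can be made concrete with a small Python sketch. It is an illustration only: the aggregates and their members are assumptions loosely based on the examples in the text, and the function names `keys_of` and `key_signature` are hypothetical.

```python
# Hypothetical aggregates, invented for illustration: key -> near-synonyms.
AGGREGATES = {
    "deterioration": {"stress", "strain", "wear-and-tear"},
    "aircraft": {"aircraft frame", "propeller"},
    "speed": {"Mach", "supersonic speed", "normal cruising speed"},
    "famous men of science": {"Mach", "Eddington"},
}


def keys_of(component):
    """All keys whose aggregate contains the component (its 'fan')."""
    return sorted(k for k, members in AGGREGATES.items() if component in members)


def key_signature(point_components):
    """Define a library point purely by the keys of its components."""
    return frozenset(k for c in point_components for k in keys_of(c))


frames_at_mach = ["stress", "aircraft frame", "supersonic speed"]
propellers_cruising = ["wear-and-tear", "propeller", "normal cruising speed"]

# Swelling: two different library points collapse onto the same set of keys.
print(key_signature(frames_at_mach) == key_signature(propellers_cruising))

# Fanning: one spelling belongs to several aggregates, so it splits into several uses.
print([("Mach", k) for k in keys_of("Mach")])
```

In this toy index the two library points are indistinguishable once they are defined by keys alone, and the single spelling ‘Mach’ yields one use per aggregate in which it occurs.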
The first of these new units is a new race of tags. These are no longer subject classifiers, like COOKERY and AIRCRAFT. They are now a small set of very general notions (more general than would be found in any natural language, but reminiscent, some of them, of the prepositions, classifiers and auxiliary verbs of natural language). These are denoted by monosyllables, followed by a shriek: IN!, TO!, FROM!, CAUSE!, CAN!, WILL!, MUCH!, PLANT!, BEAST!, MAN! They are used in the system, singly and in combination, to divide up the constantly swelling semantic aggregates into sub-aggregates; they can also be used, in a preliminary manner, to cut the now constantly fanning total set of components into subsets which are of a reasonable size, and which can be manipulated. For instance, by sorting those components that have the tag IN! in their tag entries from those that have not, one can get a reasonable pack from which to find a component for a request containing the English word ‘in’.

2 By taking out the word key and substituting for it the word head, we are only making a stylistic change, not a formal one. We are exchanging the artefact metaphor of the quasi-artificial keys in the library model for the more natural physiological metaphor for the semantically overlapping contexts of words and phrases piling up until they cluster and form heads in the language.

The second new unit is a set of syntax markers:³ entity, qualifier, concrete-use, abstract-use, metaphoric-use, phrase-marker, applies-to-one, applies-to-many, proper name. Its purpose is to enable request components (the grey cards) and library-point components (the blue cards) to be grouped by being variously bracketed together, so that requests and points with the same tag-and-head specifications, but with different bracketings, can now be distinguished from one another.

Thus the purpose of developing the model is to enable finer and finer distinctions between requests and between point specifications to be made (thus counteracting the swelling tendencies of the system) while providing mechanisms for cutting up the now huge component pack (the grey cards) into manageable sub-packs before calculation begins (thus counteracting the fanning tendencies of the system). But the development has the additional and welcome result of making the now bracketed sequences of tags and heads look very much more like sentences, though they never achieve sentencehood in the version of the model that is described in the sketch; sufficiently like sentences for it to be plausible to assert that the model is a model of language. Meanwhile, even if grammatically and linguistically the model is still crude – it could be alleged against it, for instance, by monolingual grammarians that it is still a system of interlingual semantic units, and is therefore still not very much like what they mean by language – philosophically, the idea behind it is profound.
For it enables an empirical definition, for the first time, to be given of a concept, the traditional unit of thinking; a rather startling definition, but an empirical one. For every semantic aggregate that appears in the model we shall now define, philosophically speaking, as a concept, with the result that a concept now becomes something composite, like a gestalt figure. For it too is an aggregate that appears to have different aspects, according as you pick it up by different synonyms.

3 ‘Syntax’ is here used in the wide logical sense, in which no distinction is made between syntax and grammar. Even speaking linguistically, the distinction is artificial, since what is grammar in one language may be syntax in the next.

It could be doubted, and has been doubted by Bar-Hillel, that the semantic aggregates of the system – the concepts – are, in fact, empirically founded. As detailed philosophical investigation of the uses of words in context extends itself, however, and as mechanised classification research develops, this doubt becomes more and more perverse. Because the fact is that there are clusters of overlapping contexts of words in language. J. L. Austin (1956) has found them, using a dictionary; and the US Patent Office has to presuppose them in order to retrieve previously existing patent specifications from its almost unimaginably enormous mass of material, when judging whether or not to grant a new patent request. Then classifications into semantic aggregates already exist in languages, though these have been compiled in a very erratic way. There is Roget’s Thesaurus, which has versions in English, French, German, Hungarian, Swedish, Dutch, Spanish and Modern Greek. There is Dornseiff’s monumental Der Deutsche Wortschatz; and, going back further, there is the great sixth-century Chinese dictionary, the Shuo Wen. And now it is rumoured that there are rudiments of such Sumerian classification . . . This is one of the primordial ways of classifying language.

Of course, in answer to a demand for ‘proof’ that language so clusters, it would be possible to examine large-scale samples of it with machines. With M computers searching N texts (obtained and correlated by at least 2N tape-recorders) it would be possible, in a (just) finite number of years, to find all the possibilities of combination, in a language, of a set of preselected components of it. And if this were done, I think it could be shown that synonyms in a good synonym dictionary have formally similar possibilities of combination in a language. But in default of all this apparatus, and staff to work it, we can perfectly well trust, I think, the context-distinguishing tricks of philosophical analysts; and perhaps, even more, the trained intuitions of librarians.

And, of course, there may be another difficulty which may make the heads look arbitrary. They may change; they may not remain constant. They may reorientate themselves, in a language, every time two people talk; in fact, that they should reorientate themselves, this may be the object of all serious conversation.
It may be that language feels: that it breathes.

Let us return from these speculative heights to the model. It remains a criterion of the model, however developed, that it shall still serve for extracting information out of library systems. It has also, in practice, formed a new research basis for Mechanical Translation (Masterman et al., 1956); and a potential new research basis for mechanising the process of abstracting documents (Luhn 1956).

4. Outline sketch of a model for language
I. The units and subdivisions of the system
1. Let the units of the system be the following:

1.1. Tags: e.g. ‘IN!’, ‘UP!’, ‘MUCH!’, ‘THING!’, ‘STUFF!’ . . .
These will form a small finite set (e.g. 50–100 elements). Let the set of tags be formalised as:

[T] = A! ∪ B! ∪ C! ∪ D! ∪ … ∪ N!

where [T] stands for the whole set of tags, and ∪ for the relation ‘and/or’.

1.2. Heads: e.g. ‘COOKERY’, ‘PHYSICS’, ‘CONTAINER’ . . .
These will form a large but still finite set. Let the set of heads be formalised as:

[H] = A ∪ b ∪ c ∪ d ∪ e … ∪ n

where [H] stands for the whole set of heads, and ∪ for the relation ‘and/or’.

1.3. Components: e.g. ‘stress’, ‘strain’, ‘wear and tear’, ‘supersonic’, ‘Chinese’, ‘urn’ . . .
The set of these forms an open set. Let us formalise the set of components as:

[C] = c1 ∪ c2 ∪ c3 ∪ c4 … ∪ cn

where [C] stands for the whole set of components derived from all languages, and ∪ stands for the relation ‘and/or’.

1.4. Markers: e.g. ‘Abstract-Use’, ‘Concrete-Use’, ‘Proper-name’, ‘Applies-to-one’, ‘Applies-to-many’, ‘Qualifier’, ‘Entity’, ‘Phrase-marker’, ‘Action’, ‘Kind’, ‘How’, ‘Completion-of-action’, ‘Process’, ‘State’, ‘Result-of-action’ . . .
These form, like the tags, a small finite set. Let the set of markers be formalised as:

[M] = @ ∪ £ ∪ & ∪ + ∪ * … ∪ %

where [M] stands for the whole set of markers, and ∪ for the relation ‘and/or’.

2. The dictionary

The dictionary of the system will be compiled by alphabetising any required list of components. The dictionary, as required for any purpose P (it being assumed that, when used for any P, it is finite), will consist of this given alphabetised list of components together with the dictionary entry of each. The dictionary entry of any component in the dictionary must contain one or more specifications in terms of tags, one specification in terms of heads, and one or more specifications in terms of markers.

2.1. Since [T], [H] and [M] are all finite, it follows that although [C] is infinite, being open, c, c being a member of [C], can be mapped on to some position in the system consisting of [T], [H] and [M].
A dictionary entry of any c will be formalised as follows:

c = (t_A! ∩ t_B! ∩ t_C! … ∩ t_N!) ∩ (h_m) ∩ (m_@ ∩ m_£ ∩ m_& … ∩ m_%)

where t is any member of [T], h any member of [H] and m any member of [M], and ∩ stands for the relation ‘both . . . and’.

N.B. When the system is interpreted, c stands for one use of a C. The total set of components in the dictionary [C] must be distinguished from the total fan of uses of any component C, which must in its turn be distinguished from any single use of any component, c.

3. The concordance relation

Let the concordance relation, which holds between some but not all pairs of units of the system, be formalised as . This formalisation makes of the total system a partially ordered set. Since, however, the system requires also the relation ∪ (‘and/or’) and the relation ∩ (‘both . . . and’) to hold between elements, we shall impose on it the additional restriction that, for any two units in it, x and y, there shall be a least upper bound, or join, x ∪ y, and a greatest lower bound, or meet, x ∩ y. This additional restriction, applied to any partially ordered set, converts it into a lattice, L. The imposition of this restriction is not arbitrary. For given two components of language, @ and £, it is possible to conceive of and if required to name that which is either @ or £, or that which is both @-like and £-like at once (which may be nothing at all). From this it follows that the total system must be representable as a lattice (though in actual languages many elements will be unnamed). The case of all vacuous or self-contradictory names is represented by the O-element of the lattice.

3.1. Since [T], [H] and [M] are all finite, L will be finite.

3.2. Any dictionary entry of any c will be a meet in L.

4. The factor T

Since the members of the set of tags in the language [T] (to which there is a partial approximation in the small closed sets of prepositions, postpositions, preverbs and postverbs, particles and classifiers which tend to occur in languages) are chosen to be, so far as possible, semantically independent of one another, T will form a spindle-lattice of the form given in Figure 10.

[Figure 10: the spindle-lattice T. Top element I = A! ∪ B! ∪ C! ... ∪ N! (the total set of elements of T); middle rank A!, B!, C!, D!, E!, F!; bottom element O (the property of membership of T in L).]

For semantic reasons, however, it is convenient to have two ranks in T. Some members of the set of tags in [T] tend to approximate to ‘prepoverbs’, in language, some to classifiers (e.g. PLANT!, BEAST!, MAN!) and some are required in both forms. For any t, say A!, let A!! be the member of [T] which is in the top rank of T, and A! be the member of [T] which is on the bottom rank in T.

4.1. This gives T the form shown in Figure 11.

[Figure 11: the two-rank form of T, with top element I_T, a top rank A!!, B!!, D!!, . . . N!!, a bottom rank A!, B!, C!, . . . N!, and bottom element O_T.]

4.2. Thus, suppose there are tags IN! and THING! Users of the system, assigning dictionary entries to components, may form the meet of IN! and THING! to convey the idea of a container (i.e. of a ‘thing’ that has something ‘in’ it); or else the meet of IN!! and THING! to convey ‘inning’ (i.e. the idea of containment, of ‘going in’, or ‘being in’ a ‘thing’) (see Figure 12).

[Figure 12: lattice with top element I_T and elements IN!!, IN! and THING!, in which IN!! ∩ THING! = containment and IN! ∩ THING! = container.]

5. The factor H; and the total semantic lattice T × H

In so far as the members of [H] are semantically independent of one another, H will also be of the spindle form. But the system will not function unless two conditions are fulfilled:

1. there must be inclusion relations from at least one t to any h;
2. there must be a semantic residue in any h, not definable by any t or combination of ts (otherwise the h in question becomes redundant).

Since [T] is a small set, whereas [H] is a large set, condition (2) is not difficult to satisfy; and the members of [T] must be chosen to fulfil condition (1). The lattice required to fulfil these conditions in all possible cases is formed by making the direct product of T and H, T × H. We will call T × H the total semantic lattice.

6. The factor M; or, the syntax-marker lattice

For the system to function correctly, the set of syntax markers [M] must be divisible into mutually exclusive subsets, each with two members; for example, ‘singular’, ‘plural’; ‘concrete’, ‘abstract’; and so on; mutually exclusive sets that have more than two members being split into a number of sets, each with two members. Each subset is then inserted into a Boolean lattice of four elements, as in Figure 13. Provided that no dictionary entry, of any c, contains both an m and its complement, a Boolean lattice of even degree can then be formed to hold the total set of all the pairs of ms, namely [M]. This lattice will be of the form shown in Figure 14.

7. The total language lattice: L

The total language lattice L is formed by making the direct product of the total semantic lattice T × H and the syntax lattice M. Thus

L = (T × H) × M
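[Figure 13: Boolean lattice of four elements for a pair of markers, with elements labelled ‘Neither @ nor £’, ‘@ = not-£’ and ‘£ = not-@’.]

[Figure 14: the Boolean lattice M, with top element I_M; single markers @, £, &, %; pairs @£, @&, @%, £&, £%, &%; triples @£&, @£%, @&%, £&%; and bottom element @£&% = O_M.]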
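The way dictionary entries behave as meets can be illustrated with a short Python sketch. It is a deliberate simplification and not the construction given above: instead of building L as a direct product of spindle and Boolean lattices, it assumes that every element can be represented by the set of tags, heads and markers it specifies, so that the meet (‘both . . . and’) pools specifications and the join (‘and/or’) keeps only the shared ones. The complementary pairs, the sample entries and all function names are hypothetical.

```python
# Complementary marker pairs, standing in for the two-member subsets of [M].
COMPLEMENTS = {("Concrete-Use", "Abstract-Use"), ("Applies-to-one", "Applies-to-many")}

BOTTOM = None  # stands in for the O-element (vacuous or self-contradictory)


def is_contradictory(specs):
    """True if the specifications contain a marker together with its complement."""
    return any(a in specs and b in specs for a, b in COMPLEMENTS)


def meet(x, y):
    """'Both ... and': pool every specification of x and y."""
    if x is BOTTOM or y is BOTTOM:
        return BOTTOM
    specs = x | y
    return BOTTOM if is_contradictory(specs) else specs


def join(x, y):
    """'And/or': keep only what x and y have in common."""
    if x is BOTTOM:
        return y
    if y is BOTTOM:
        return x
    return x & y


def entry(tags, head, markers):
    """A dictionary entry: the meet of its tags, its single head and its markers."""
    result = frozenset()  # the empty specification plays the part of the top element
    for unit in [*tags, head, *markers]:
        result = meet(result, frozenset([unit]))
    return result


# A tiny illustrative dictionary; the assignments are invented.
DICTIONARY = {
    "urn": entry(["IN!", "THING!"], "CONTAINER", ["Concrete-Use", "Entity"]),
    "pot": entry(["IN!", "THING!"], "COOKERY", ["Concrete-Use", "Entity"]),
    "stress": entry(["CAUSE!"], "PHYSICS", ["Abstract-Use"]),
}

# What 'urn' and 'pot' share.
print(sorted(join(DICTIONARY["urn"], DICTIONARY["pot"])))

# Concrete-Use together with Abstract-Use falls on the O-element.
print(meet(DICTIONARY["urn"], DICTIONARY["stress"]) is BOTTOM)

# Cutting the component pack down to those entries that carry the tag IN!.
print(sorted(word for word, e in DICTIONARY.items() if "IN!" in e))
```

Under this representation x lies below y exactly when x specifies everything that y specifies, so a language point, taken as the meet of its components, lies below each of them.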
II. The operations used in the system
1. Components and language points as points on L 1.1. Since any use of a component is defined by its dictionary entry, as in 2.2, any such component will fall on a point in L, which will be the meet of the constituent units of its dictionary entry. 1.2. any language point will similarly be the meet of its constituent components, and since each component will fall in a point in L, any language point will also fall on a point in L, which is formed by taking the meet of its components.
[Figure 15: the partially ordered set D1, with top element D above the elements d1, d2, d3, d4, d5, d6.]
1.3. Any document can similarly be defined as a point on L, to be found by taking the meet of the library points that constitute its term abstract.
1.4. Any library user’s request can similarly be defined as a point in L to be found by taking the meet of its components.

2. The main operation of the system: the transformation of a given set of request components (i.e. ‘the request’) into its equivalent, or nearest, set of library points (i.e. ‘the documents’)
Let us call this the retrieval transformation: R. R can also be interpreted as translation, or as a generalised process of redefinition.
2.1. R1: the determination of the request.
2.1.1. Let us call any dictionary-entry one use of a component, as defined below in 2.2. Let [Dc] be the total set of uses of a component c. Thus
Dc = d1 ∪ d2 ∪ d3 ∪ . . . ∪ dn
This [D] is a partially ordered set of the form shown in Figure 15. Thus Dcm = C, where C is the total dictionary entry of the whole set of uses of any C, as given in 2.2.
2.1.2. If, now, we can add an element that consists of the actual spelling of the component (which is all that we know to be in common between all its uses), D is converted to a spindle-lattice, as in Figure 16.
2.1.3. All the elements of D are points in L. If therefore D be taken as the lattice representing the total dictionary entry of any C and ‘c’ as its spelling, both D and ‘c’ will be constructable in L, since only two cases are possible; either (1) they will fall on existing points in L, or (2) they can be inserted as new points in L, to be completed by taking the join and the meet, respectively, of d1 . . . dn.
2.1.4. To be points in L, d1 . . . dn must be specifiable in terms of elements of T, of H and of M. Among these elements there may be some common to more than one member of d. Such common elements will form additional meets in D. This will prevent D from being a spindle-lattice.
[Figure 16: the spindle-lattice D2, with top element D = d1 ∪ d2 ∪ d3 ∪ … ∪ dn = C, middle elements d1, d2, d3, d4, d5, d6, and bottom element “C”.]
Such a transformation from a spindle-lattice to a more Boolean lattice will occur every time that D, the lattice representing the dictionary entry [D], of any C shows that there is something semantically in common between some pair c1 and c2 of the set c1, c2 . . . cn of component dictionary entries which form the total dictionary entry D, of any ‘c’.
2.2. The transformation R1: specification of any given request from the total dictionary entries of its components.
2.2.1. Let the set of sub-lattices of L which contain the set of total dictionary entries of all the components of any given request be formalised as:
S = Lc1 ∪ Lc2 ∪ Lc3 ∪ . . . ∪ Lcn
where S is the given request.
2.2.2. Expand the d-points in Lc1 . . . Lcn into sets of T-elements, of H-elements and of M-elements.
2.2.3. Operating only upon the H-elements in the resulting set of formulae, look for common elements.
2.2.3.1. Case I: If common elements are found in the H-specifications of some d in every Lcm, reject all other ds in S.
2.2.3.2. Case II: If no common H-element is found, when comparing the H-elements of the series Sm, operate now not on the H-elements but upon the T-elements of the constituent formulae, and look for common elements. Two cases are possible:
2.2.3.1. Case I: If common T-elements are found in the T-specifications of some d in every Lc, reject all other ds in S.
2.2.3.2. Case II: If no common T-elements are found, use an algorithm for determining the lattice-distance between the constituent ds of any two Lcs, and keep only those ds in any Lc which, on the algorithm, are computed as nearest to some d in some other Lc. It is essential that only one d shall be retained from each Lc; that is, that after R1 has been performed, only one use of each component of the request shall be retained.
2.3. The transformation R2: the operation of converting the specification of the request, S, obtained under R1, into a set of library points.
2.3.1. It is assumed that the paraphernalia of the total system will include a list of the library points (or language points).
2.3.1.1. Match S, as obtained under R1, with each item of this list.
Case I: a complete match is obtained. In this case, the operation R2 has been successfully performed.
Case II: no match is obtained. In this case, a partial match must be tried for, to obtain the most relevant set of points which the system contains.
2.3.2. To obtain a partial match under R2, (1) perform operation 2.2.3 on the H-entries of S, comparing the H-entries of S with those of each item in the list of points. If this fails, (2) perform operation 2.2.3.2 upon the T-entries, comparing those of S with those of each item of the list of points, in turn (this is a lengthy operation, but performable). If this fails, (3) similarly, perform operation 2.2.3.3. (This is a quite horrible operation to perform, but still finite.) Whether R2 is performed simply, that is, under 2.3.1, or complicatedly, as under 2.3.2, the result of performing it will in each case be the determination of a set of library points (or language points) from S.
2.4. The operation R3: the further determination of the set of library points (or language points).
Earlier in this paper it was stated that as soon as a language model, made of semantic aggregates, is substituted for a library classification model as an instrument of retrieval, the library points all began to swell. In the example of such swelling, only H-specifications were given of the components of the library point in question. As soon as T-specifications are also given, the swelling tendency is very considerably reduced, since library points with the same H-specification, but different T-specifications, can still be distinguished; for practical purposes they are not the same library point. There is one type of such swelling, however, that no attempt has been made to deal with up to now. This is the kind of conflation between points
that results from ignoring the order in which their components are stated: a comparable confusion is caused by ignoring the ‘word order’ of the components of the request. The operations of lattice theory require such commutativeness and associativeness; and, up to a certain extent, semantically speaking it works. ‘Top house’ does not mean the same as ‘house top’; nevertheless, they are still far more like one another than either is like ‘Schroedinger Wave Function’ or ‘Cookery’. As soon as the model begins to be used for language, however, it requires to have a device incorporated in it that converts the information given by word order into a latticeable form. And, indeed, even in the library stage, a request for the exports from Great Britain to the United States in 1957 will not be satisfied by information about the exports in 1957 to Great Britain from the United States. The operation R3 is designed to make a further set of distinctions by introducing non-commutativeness into the system.
2.4.1. The transformations R1 and R2, by operating only upon H-elements and T-elements of S, have, in effect, operated only upon the total semantic lattice, T × H, not upon the total language lattice L.
2.4.1.1. Take the set of components of the request, which remain after performing R1.[4] Using an algorithm, construct, with these, the smallest sub-lattice of the lattice T × H which contains them all. Call this sub-lattice R.
2.4.1.2. Form the direct product of R with the syntax-marker lattice M, producing the lattice R × M.
2.4.1.3. Using an algorithm,[5] together with a list of permitted bracketing forms, convert the M-elements in the component dictionary entries of the request, as remaining after R1, into pairs of brackets that will bracket the H-elements and T-elements given under R1. Thus the M-elements, in the dictionary entries, now disappear; they turn into pairs of brackets.
2.4.1.4. Repeat operation 2.4.1 upon the H-components and T-components that are given by R2.
Let the resulting sub-lattice be called P; and its direct product with M, P × M.
2.4.1.5. Repeat operation 2.4.1.1 upon P × M. The M-elements in the specification of the set of library points, as given in R2, have also disappeared; they have been converted into pairs of brackets. Thus the specifications of the request, and of the set of library points given as most relevant to it under R2, have now both been bracketed.
2.4.1.6. Match these specifications with one another, rejecting any members of the set of library points, as given under R2, of which the bracketing
[4] This is by no means the easy matter that it sounds.
[5] This algorithm, equally surprisingly, is much simpler than it looks.
specification does not tally[6] with the bracketing specification of the request, as given under R3. When the resulting reduced set of library-points has been obtained, R3 has been successfully performed.

3. The Subject-Predicate Relation
It will be observed that, on this model, no close approximation to the subject-predicate form is ever made. Every request, and every library or language point, is finally transformed into a bracketed formula, of which the elements consist of tags and heads: but no further analysis of them is made. If requests and language points are specified by small sets of components only, this is all that is required. If the model is required to analyse whole texts, however, further development of it is needed. By bracketing every set of components into two parts, and using an algorithm, a weakened and generalised analogue of the subject-predicate relation can be obtained.

4. Conclusion: scientific ways of thinking seen in terms of the model
And is this theory – is ‘language’, seen like this – of any use to the philosopher of science? In order to make up our minds about this, let us forget our original conception of scientists as individuals who are continually using language, and substitute for this a conception of scientists as individuals who are continually using the model. This substitution makes it possible for us now to ask, ‘What, stated in terms of the model, is the scientist doing?’ The first, and most general, answer to the question obtrudes itself: scientists are individuals who are creating new language points. Scientists, by virtue of their observations and fiddling activities (whether these are done by using mathematical systems or by using other apparatus) force the world to notice new events and situations for which they then coin new key-phrases. Their activities, however, do not stop there; for they are not only observers and reporters; they are also classifiers.
Now, of course, once scientists are envisaged as using the model rather than as using language, it follows, by definition, that every time they use the model, they classify; since the model has classification built into it. It is therefore necessary to re-envisage their activity; the activity of classification, as used in science, must become that of reclassification (which, when it is looked at historically, is just what it is). The classificatory
[6] Warning. The exact definition of ‘tally’, as used here, is not yet well understood.
activity of scientists therefore becomes that of restructuring the whole or part of the model. This involves re-imagining the model as a self-correcting or a self-organising system. Quite apart from any application to ways of thinking, techniques are currently being developed for doing this, in order to represent such reorientation as it may occur in language.

A third and central activity of scientists is always held to be that of producing hitherto unthought-of analogies. It is always assumed, moreover, that there is, and can be, no way of ever computing analogy. The discovery of a new analogy is an ‘intuitive leap’, a ‘lucky guess’; and there can be no philosophy, so it is said, of lucky guesses . . . Carrying this argument further, there can be no real philosophy of science either. For real science consists, as was said at the beginning of this paper, partly in fiddling with apparatus and with systems, partly in retrieving, reselecting and reclassifying known information, and partly, also, in creating new analogies, and in making lucky and unlucky guesses.

Here, however, the model really proves itself; for it is possible to develop, by using it, an algorithm for finding analogy. Thus, by using it, not only can ‘concept’ be empirically defined, but, this having been done, analogy itself can now be formally defined.

In short, the thesis of this paper is that, except in the sustained, intense stage of actual fiddling, science might well consist in manipulating language points computed from a language. But to do this we need, as philosophers, steadily to envisage not ‘ordinary language’, but ‘language’: language imagined as a theoretic entity, language caused to be manipulatable by being made into a net-like schema. By using this, algorithms can be worked, reclassifications and selections can be made, and answers (which, for scientists, are hypotheses) given to requests. In fact, perhaps scientists do use the model.
In this case, perhaps also a basis could be constructed for a philosophy of real science.
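Looking back at the operation R1 of section II (2.2.3), the selection of one use per component can be sketched as follows. This is a minimal illustration under stated assumptions: each use d is reduced to a set of H-elements and a set of T-elements, the ‘lattice-distance’ is replaced by a simple count of elements not held in common (the original algorithm is not given in the text), and all names and sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Use:
    """One use d of a component, specified by H-elements and T-elements."""
    name: str
    heads: frozenset
    tags: frozenset


def shared_elements(request, attr):
    """Elements of one kind that occur in some use of every component."""
    per_component = [frozenset().union(*(getattr(u, attr) for u in uses))
                     for uses in request.values()]
    return frozenset.intersection(*per_component)


def distance(a, b):
    """Stand-in for lattice distance: number of elements not held in common."""
    return len((a.heads | a.tags) ^ (b.heads | b.tags))


def determine_request(request):
    """Keep exactly one use per component (a sketch of the operation R1)."""
    for attr in ("heads", "tags"):      # H-elements first, then T-elements
        shared = shared_elements(request, attr)
        if shared:
            return {c: next(u for u in uses if getattr(u, attr) & shared)
                    for c, uses in request.items()}
    chosen = {}
    for c, uses in request.items():     # last resort: keep the nearest use
        others = [v for d, vs in request.items() if d != c for v in vs]
        chosen[c] = min(uses, key=lambda u: min(
            (distance(u, v) for v in others), default=0))
    return chosen


# Hypothetical request with two components, each having two uses.
request = {
    "plant": [Use("plant as place", frozenset({"LOCATION"}), frozenset({"THING"})),
              Use("plant as vegetable", frozenset({"VEGETABLE"}), frozenset({"LIFE"}))],
    "bud": [Use("bud as beginning", frozenset({"BEGINNING"}), frozenset({"TIME"})),
            Use("bud as vegetable", frozenset({"VEGETABLE"}), frozenset({"LIFE"}))],
}
selected = determine_request(request)
```

In this hypothetical request the two components share the H-element VEGETABLE, so Case I applies and the other uses are rejected. When several uses of one component qualify, the sketch simply keeps the first, which is one possible reading of the requirement that only one d be retained from each Lc.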
Part 2
The thesaurus as a tool for machine translation
4
The potentialities of a mechanical thesaurus
There are five sections to this chapter:
1. The logical effect that adopting the logical unit of the MT chunk, instead of the free word, has on the problem of compiling a dictionary.
2. Dictionary trees: an example of the tree of uses of the Italian chunk PIANT-.
3. Outline of a mechanical translation programme using a thesaurus.
4. Examples of trials made with a model procedure for testing this: translations of ESSENZ-E, GERMOGL-I and SI PRESENT-A from the Cambridge Language Research Unit’s current pilot project. The simplifications that the use of a thesaurus makes in the research needed to achieve idiomatic machine translation.
5. Some preliminary remarks on the problem of coding a thesaurus.

1. In MT literature it is usually assumed that compiling an MT dictionary is, for the linguist, a matter of routine; that the main problem lies in providing sufficient computer storage to accommodate it. Such judgements fail to take account either of the unpredictability of language (Reifler, 1954) or of the profound change in the conception of a dictionary produced by the substitution of the MT chunk for the free word. By chunk is meant here the smallest significant language unit that (1) can exist in more than one context and (2) that, for practical purposes, it pays to insert as an entry by itself in an MT dictionary. Extensive linguistic data are often required to decide when it is, and when it is not, worthwhile to enter a language unit by itself as a separate chunk. For instance, it has been found convenient to break up the Italian free word piantatore into the chunks PIANT-AT-ORE. It has not been found convenient to break up the Italian free word agronomi into chunks AGRO-NOM-I, but only into the chunks AGRONOM-I, since the addition of -NOM- to AGRO- enables the distinction to be made between AGRO- meaning ‘agriculture’ and AGRO- meaning ‘bitter’.
Experience shows that the cutting down of the number of entries, and the compensatory extension of the range of uses of each entry, caused by
the substitution of chunks for free words are together sufficient to call in question the current conception of a dictionary article. In this paper I shall speak of current dictionary articles, MT dictionary entries and thesaurus items.

2. From the logical point of view, it can be shown that the range of uses of any chunk form a tree. Some paths of this tree are open to alternative analysis, but a considerable number of the paths, as of the points, can be determined on objective criteria determined by the immediate context. For instance, the use of the Italian chunk PIANT- in the free word piantatoio is clearly different from its use in the free word piantatore. Moreover, the design of the tree can often be tested by its predictive value: for instance, in making the tree of the chunk FIBR- a junction point that had to be inserted to account for well-established uses was later found, when a larger dictionary was consulted, to be exactly fitted by the use of FIBR- in the free word fibroso, which had not appeared as an article in the smaller dictionary. [ . . . ]

Dictionary articles that contain PIANT-
1. im-PIANT-ament-o, s.m., implantation, building, establishment.
2. im-PIANT-are, v.tr., to establish, to settle down to business, to found. impiantare una scrittura, to open an account.
3. im-PIANT-arsi v.rifl., to take one’s stand.
4. im-PIANT-it-o s.m., floor, tiled place.
5. im-PIANT-o, s.m., establishing, setting up of a business.
6. PIANT-a, s.f., plant; tree; (arch.) plan, groundwork; sole (pianta dei piedi); lineage, i.e. family tree; (fig.) race; pianta esotica, exotic plant; pianta di un edificio, plan of a building; essere in pianta, to be on the list; rifare una cosa di sana pianta, to do a thing a second time.
7. PIANT-abil-e, adj., pertaining to a plantation.
8. PIANT-aggin-e, s.f., plantain, i.e. pasture-plant.
9. PIANT-agion-e, s.f., plantation, planting; piantagione di patate, potato field.
10. PIANT-ament-o, s.m., planting, plantation.
11. PIANT-are, v.tr., to plant; to set; to stick; to drive in; to place; to forsake, to abandon, cf. French plaquer; piantare una bandiera, to set up a standard, to hoist a flag; piantare in asso, to leave a person in the lurch; piantare un pugnale nel petto, to stab with a dagger, cf. English, stick a dagger into him; piantare carote (fig.) to make someone believe, cf. English, to plant a clue; piantare le tende, to lodge, to dwell.
12. PIANT-arsi v.rifl., to fix oneself, to settle; to set up; piantarsi in un luogo, to settle down in one place.
13. PIANT-a-stecch-i, s.m., (calz.) punch, puncheon (arnese per piantar gli stecchi nelle suole).
14. PIANT-at-a, s.f., plantation; row of trees.
15. PIANT-at-o, part.pass. e adj., planted, set up; fixed; ben piantato, well-built, well-set-up man.
16. PIANT-at-oi-o, s.m., (agr.) tool for planting, dibbler.
17. PIANT-at-ore, -trice, s.m.f., planter.
18. PIANT-at-ur-a, s.f., plantation, planting.
19. PIANT-im-i, s.m., plur., many sorts of plantations.
(PIANTO, s.m., tears, weeping; lament; (fig.) pain; regret.)
(PIANTO, part. pass., wept; lamented; deplored.)
20. PIANT-on-ai-o, PIANT-on-ai-a, s.m.f., (agr.) nursery.
21. PIANT-on-are, v.tr., to watch over, to nurse, to guard; to plant cuttings.
22. PIANT-on-e, s.m., (mil.) sentry, nurse, guard; (fig.) watcher; (agr.) sucker scion, sapling. Essere di piantone, to sentinel, to be on guard, to guard.
23. s-PIANT-a-ment-o, s.m., uprooting, transplanting.
24. s-PIANT-are, v.tr., to uproot, to transplant; to ruin.
25. s-PIANT-at-o, s.m., penniless person; (fam.) someone who is dead broke, stony broke.
26. s-PIANT-o, s.m., ruin, destruction. Mandare a spianto, to ruin.
27. tra-PIANT-a-ment-o, s.m., transplantation.
28. tra-PIANT-are, v.tr., to transplant.
29. tra-PIANT-at-oi-o, transplanter (a tool).

When the contexts provided by translation into a second language are added to the above, the tree becomes very much more complicated. Inspection immediately shows, moreover, that the only criterion for differentiating many of the new points on the bilingual tree is the fact that if, say, of two otherwise similar uses of PIANT- English translations are given, different English words will be used in the two cases.
For instance, once the English language is considered as well as the Italian, the use of PIANT- in the phrase piantare le tende, ‘to pitch a tent’, must clearly be distinguished from its use in the phrase piantare una bandiera, ‘to set up a standard’. But to the man thinking wholly in Italian, this difference of use may not be perceptible: for him, one plants a tent on the ground and a standard in the air in exactly the
same figurative sense of ‘plant’; all the more so, indeed, as piantare le tende means permanently to establish a tent (compare ‘Caesar then established his winter quarters’) and is to be contrasted with rizzare le tende, which means to pitch a tent with the intention of taking it up again in a short time, and this last differentiation of context is one that we do not have in English. Such considerations raised doubts of the validity of such translation points on bilingual dictionary trees, which led to the reanalysis of bilingual dictionary trees not as trees but as lattices. For translation points on a dictionary tree are not just points on a single path, but junctions of two paths; as, indeed, the contexts of the unilingual tree might also be taken to be if such chunks as -UR- and -AGION- were taken as the points of origin of trees. Moreover, if it be granted that, even in simultaneous translation, translation is never actually made between more than two languages at once, a multilingual tree, as opposed to a bilingual tree, will also have this property that all its points will be translation points, and it will therefore be a lattice. Moreover, it will not always be true that as the number of languages that are incorporated increases this lattice will become significantly more complex, because many of these translation points will fall on one another. [ . . . ] 3. In this design the chunks of the input text are passed through four successive processes of transformation. The first stage of each of these consists of matching the chunks, in turn, with some sort of dictionary; there are thus four dictionaries used in succession in the programme. These are (1) the bilingual pidgin dictionary, (2) the lattice inventory, (3) the thesaurus cross-reference dictionary, (4) the thesaurus. In order to exemplify this whole mechanical-translation process in concrete form, the following test procedure has been devised. 
Translation trials might be undertaken which, if MT is to develop as a subject in its own right, will provide the controlled empirical material which we so much need. In the procedure described below, the lattice inventory and program, which is by A. F. Parker-Rhodes, will in the near future actually go through a computer. The thesaurus used was Roget’s Thesaurus (1953 edition), amended and amplified according to the procedures given below. The general design was by Masterman, and the pidgin passage dictionary by Masterman and Halliday. The matches were made by means of alphabetically stacked packs of written cards, each containing the entry for one chunk. The procedure was developed as follows: A paragraph from an Italian botanical paper was chosen, and divided into chunks as given below:

LA PRODUZ-ION-E DI VARIET-A DI PIANT-E PRIV-E DI GEMM-E ASCELL-ARI, O PER+LE+MENO CON GERMOGL-I A SVILUPP-O RIDOTT-O, INTERESS-A DA+TEMPO GENET-IST-I ED AGRONOM-I. TAL-E PROBLEM-A SI+PRESENT-A PARTICOLAR-MENT-E INTERESS-ANT-E PER ALCUN-E ESSENZ-E FOREST-AL-I E FRUTT-IFER-I, PER LE PIANT-E DI FIBR-A, MA SOPRATTUTTO PER IL TABACC-O. IN QUEST-A COLT-UR-A E INFATTI IMPOSSIBIL-E MECCANIZZ-ARE L’ASPORT-AZION-E DEI GERMOGL-I ASCELL-ARI, NECESS-ARI-O D’+ALTRA+PARTE PER OTTEN-ERE FOGLI-E DI MIGLIOR-E QUALIT-A.

(N.B. Entries of the form A + B + C . . . + N were entered as single chunks)
A simple Italian-English pidgin dictionary was then compiled covering the chunks of this paragraph. Specimen entries taken from this are given below. It is to be noted that, while the schema of this dictionary allows of one chunk having, if necessary, two lattice position indicators (LPIs), though the chunks entered in this passage dictionary have only one, it does not allow of any chunk having more than one pidgin translation. The whole passage dictionary was planned to give, as simply as possible, an output embodying only what the machine could immediately find out of the structure of Italian. When the chunks of this dictionary were matched with the chunks of the input, the output shown in Figure 17 was obtained.

Sample Italian-English pidgin dictionary entries

Italian      LPI   English
-A           28    ω
-AL-         39    -Y
DA+TEMPO     28    FOR+SOME+TIME+PAST
FIBR-        30    FIBRE
I            26    THOSE-WHICH-ARE
GENET-       60    GENETIC-

Subroutine entries: 1→0; 0→1, 1→1
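The matching of input chunks against the pidgin dictionary can be pictured with a minimal sketch. It assumes that each entry carries exactly one LPI and one pidgin translation, as the text states for this passage dictionary; the entries are a few of the specimen entries, and the function name is hypothetical.

```python
# A few specimen entries: Italian chunk -> (LPI, English pidgin).
PIDGIN = {
    "-AL-": (39, "-Y"),
    "DA+TEMPO": (28, "FOR+SOME+TIME+PAST"),
    "FIBR-": (30, "FIBRE"),
    "GENET-": (60, "GENETIC-"),
}


def match_chunks(chunks, dictionary=PIDGIN):
    """Look up each input chunk and return (pidgin, LPI) pairs in text order."""
    output = []
    for chunk in chunks:
        if chunk not in dictionary:
            # Every chunk of the passage is expected to have an entry.
            raise KeyError(f"no pidgin entry for chunk {chunk!r}")
        lpi, pidgin = dictionary[chunk]
        output.append((pidgin, lpi))
    return output


first_output = match_chunks(["DA+TEMPO", "GENET-"])
```

The LPIs returned alongside each pidgin chunk are what the next stage, the lattice inventory, would pick up; that stage is not modelled here.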
[Figure 17. Output I. Top line: singular/plural subroutine (initial set of subroutine, i.e. unmarked form: 1; signals 1→0; 0→1, 1→1; set-back-to-1; ignore-last-signal). Second line: output in chunks. Decimal numbers: LPIs. The output in chunks reads: THAT–ONE–WHICH–IS PRODUCE–MENT–φ OF VARIETY–ω–S OF PLANT–φ–S WITHOUT BUD–ψ–S AXIL–ARY , OR AT+LEAST WITH SPROUT–ψ–S AT DEVELOPMENT–χ–S REDUCED–χ–S , INTEREST–ω–S FOR+SOME+TIME+PAST GENETIC–IST–ψ–S AND AGRICULTURE–IST–ψ–S . SUCH PROBLEM–ω–S SELF+PRESENT–ω–S PARTICULAR–LY INTEREST–ING–φ–S FOR SOME–φ–S ESSENCE–φ–S FOREST–Y–ψ–S AND FRUIT–BEARING–ψ–S , FOR THAT–WHICH–IS PLANT–φ–S OF FIBRE–ω–S , BUT ABOVE+ALL FOR THAT–ONE–WHICH–IS TOBACCO–ω . IN THIS CULTIVATE–URE–ω BE–α IN+FACT IMPOSSIBLE–φ MECHANIZE–α THAT–WHICH–IS REMOVEMENT–φ OF+THE SPROUT–ψ AXIL–ARY , NECESS–ARY–χ–S ON+THE+OTHER+HAND FOR OBTAIN–TO LEAF–φ–S OF BETTER–φ–S QUALITY–ω–S .]

The LPIs of the chunks of this output were then picked up and inserted uniquely into lattices by means of the lattice inventory and lattice programme. These lattices give synthesis routines for English which produce output II, below:

Output II
THE PRODUCE-MENT OF VARIETY-S OF PLANT-S WITHOUT AXIL-ARY BUD-S, OR AT+LEAST WITH SPROUT-S AT REDUCED DEVELOPMENT-S, (SING) INTEREST-(PRES) FOR+SOME+TIME+PAST GENETICIST-S AND AGRICULTURE-IST-S (PLUR). SUCH PROBLEM-S (PLUR) SELF-PRESENT (PRES) PARTICULAR-LY INTEREST-ING FOR SOME FOREST-Y AND FRUIT-BEARING ESSENCE-S, FOR THE PLANT-S OF FIBRE-S, BUT ABOVE ALL FOR TOBACCO. IN THIS
CULTIVATE-URE IT BE (PRES) IN+FACT IMPOSSIBLE TO MECHANIZE REMOVE-MENT OF+THE AXIL-ARY SPROUT, ON+THE+OTHER+HAND NECESSARY FOR TO OBTAIN LEAF-S OF BETTER QUALITY-S (PLUR).

It will be noticed that, in this output, the translation procedure fails for non-grammatical reasons at a few easily identifiable points. (I am ignoring spelling mistakes produced by the pidgin, such as PRODUCE-MENT for ‘production’, as these could be picked up by cross-entries in the thesaurus cross-reference dictionary.) ESSENZ-E, in the original, is translated ESSENCE-S; GERMOGL-I is translated SPROUT-S; SI PRESENT-A is translated SELF-PRESENT; and if ASCELL- had been given its vernacular meaning of ARMPIT-, the phrases ARMPIT-ARY BUD-S and ARMPIT-ARY SPROUT-S would have occurred in the translation. In order to decide between AXIL- and ARMPIT-, as the translation for the pidgin dictionary, a trial was made by rendering into pidgin the biblical story of Jeremiah the prophet, who was rescued from the pit by ropes which rested on the rags he had put under his axils. This story remained comparatively comprehensible. This result could semantically have been foreseen, since an armpit is an instance of an axil, as is also the crutch of the legs – the only other place Jeremiah could have put his rags, whereas the idea of an axil cannot, inductively, be reached from that of an armpit. It was therefore decided further to examine these cases, by putting them through the thesaurus cross-reference dictionary and the thesaurus.

Roget’s Thesaurus cross-reference dictionary is arranged alphabetically. The entries in it form trees, but much simpler trees than those produced by normal dictionary entries. Specimen entries from it are given below:

Specimen entries from the cross-reference dictionary of Roget’s Thesaurus

bud
  367
  beginning 66* 129*
  germ 153
  *ornament 847*
  expand 194
  graft 300
  – from 154
  -dy 711, 890

plant
  place 184
  insert 300
  vegetable 367
  agriculture 371
  trick 545
  tools 633
  property 780
  – a battery 716
  – oneself 184
  -ation 184, 371, 780

problem 454, 461, 533, -atical 475
These insertions have been made to make the thesaurus multilingual. They have not, however, been made ad hoc. If the thesaurus dictionary procedure given here is to work for translation trials, additions and emendations to the thesaurus must be made only according to thesaurus principles; that is, according to one of the procedures given below:

Procedure for amplifying a translation thesaurus
Each chunk in any pidgin dictionary must successfully match with an entry in the cross-reference dictionary: for example PLANT-, plant. Each main meaning of the corresponding source-language entry in the pidgin dictionary must be compared (not matched) with the sub-headings of the cross-reference entry. If the comparison is unsatisfactory in that there is reason to suspect that the cross-reference spread is too narrow (i.e. that the cross-reference tree has not enough main branches), then one of the two emendation procedures given below must be adopted.

Procedure 1
Without making an addition to the cross-reference entry, bring down the actual thesaurus items that are referred to in the entry and search for the missing meanings. If they are found, no addition to the cross-reference entry need be made.

Example
The Italian bilingual dictionary tree of PIANT- (actually a lattice) has a branch with the main meaning design. This branch has derived meanings groundwork, plan, blue-print, installation, list, scheme, invention, pretext, lie. In the cross-reference entry ‘plant as design’ does not occur, ‘plant as trick, 545’, however, does; and the thesaurus item 545, Deception, gives, either directly or by subreference, lie, pretext, invention and blue-print.
Scheme, design and plan can also readily be reached from this item if (under emendation procedure (2) below) an addition is made to item 545, row 3, so that this row now reads:
item 545, row 3: trick, cheat, wile, ruse, blind, feint, plant, catch, chicane, juggle, reach, hocus; thimble-rig, card-sharping, artful dodge, machination, swindle, hoax, hanky-panky; tricks upon travellers; confidence trick; stratagem, &c. 702; *scheme, &c. 626*; theft, &c. 791.
That the new asterisked element is a legitimate addition to the thesaurus can be confirmed by consulting item 702, Cunning, where schemer occurs, and where there is a reference back to 545.
‘List’ could legitimately be inserted into 626 as follows:
item 626, row 4: List, programme, &c. 86; forecast, play-bill, prospectus, scenario, . . .
This addition can be checked by looking up 86, List, which already contains programme. List should also be inserted into 626 in row 11, so that this now reads:
item 626, row 11: cast, recast, systematise, organise; arrange, list &c. 60, digest, mature.
This addition can be confirmed by consulting item 60, Arrangement, which already contains list. Finally, under list, in the cross-reference dictionary, a sub-heading must now be added ‘list as plan, 626’, so that the total entry now reads:

list
  as catalogue 86
  as plan 626
  as strip 205
  as leaning 217
  etc. etc.

Of the remaining meanings of PIANT-, routes to groundwork and installation can only be constructed, if at all, by more intervening steps, since the items 25, Support and 185, Location, where they occur, do not appear in the dictionary cross-reference entries of any of the others. Thus there is no incentive to add ‘plant as design’ to the cross-reference entry of plant, which would be done under procedure (2), since the entry ‘plant as trick’ already leads to all the items that could thus be reached.

Procedure 2
Under this procedure an addition is made to the actual cross-reference dictionary entry of the chunk in question.

Example
The bilingual dictionary tree (actually a lattice) schematising the uses of GEMM- contains a branch of which the main English meaning is ‘gem’. The thesaurus cross-reference dictionary entry BUD includes no cross-reference that leads to any item containing gem or jewel. If, however, the cross-reference ‘bud as ornament 847’ is added between ‘bud as germ’ and ‘bud as expand’ (see above), the required connection is made, since item 847, Ornament reads as follows:
92
The thesaurus as a tool for machine translation
item 847, row 7: tassel, knot, epaulet, frog; star, rosette, bow; feather, plume, aigrette.
row 8: jewel; jewellery; bijouterie, diadem, tiara; pendant, trinket, locket, necklace, armilla, bracelet, bangle, armlet, anklet, ear-ring, nosering, chain, chatelaine, brooch.
row 9: gem, precious stone; diamond, emerald, onyx, plasma; opal, sapphire, ruby; amethyst, pearl . . .
We now have the required connection from entry to item. In order to be able to get back from item to entry, however, one of the given rows of 847 must be extended so as to include bud. The suggested extension is as follows:

item 847, row 7 (contd.) . . . feather, plume, aigrette; bump, button, nipple, nodule, bud.

The justification for this extension, of course, has got to be that some, at least, of this chain of metaphoric uses exist in English. Bump can be taken as colloquial: (‘that is a very ornamental bump you have upon your forehead’). Ornamental buttons are dressmaking stock in trade; this element should be already in the item. Nipple has a definite, though rare, use as a nipple-shaped beautiful object. (‘The crests or nipples of the hill line are crowned with the domes of the mosques’, wrote Cory in 1873: Oxford Dictionary.) Nodule has an even rarer one, meaning ‘something like a knot’. Finally, bud, meaning ‘ornament’ does exist, but only poetically and archaically. Thus we get ‘Their breasts they embuske on high and their round Roseate buds immodestly lay forth’ (Nashe, 1613). And Emerson, in his poems, wrote much later of ‘the bud-crowned spring’. Thus we get the curious situation that the use of an extended train of meanings for ornament, all of which have become clichés in Italian, is still an act of poetic originality in English and American. Nevertheless, the train of uses exists, and the addition to the thesaurus item is therefore justified.

These methods of emending and amplifying Roget’s Thesaurus have been exemplified in detail, because, in view of the surprisingly good outputs that follow, it might be thought that the thesaurus routes used had been manipulated to suit the Italian paragraph. This is not so; every suggested new connection has been checked and justified, and all relevant asterisked emendations used to reach the outputs are given below.
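The two-way requirement described above (entry to item, and item back to entry) can be illustrated with a minimal Python sketch. The function name and the dictionary layout are assumptions made for illustration only; the rows are abridged from item 847 as quoted, and only the newly added cross-reference of BUD is represented.

```python
def connection_is_two_way(word, item_number, cross_refs, items):
    """True if the entry for `word` points at the item AND the item contains `word`."""
    points_to_item = item_number in cross_refs.get(word, {}).values()
    rows = items.get(item_number, {})
    item_has_word = any(word in row for row in rows.values())
    return points_to_item and item_has_word


# Hypothetical storage layout: word -> {sub-heading: item number}.
# Only the added cross-reference 'bud as ornament 847' is shown.
CROSS_REFS = {"bud": {"ornament": 847}}

# Item 847, Ornament (rows abridged from the text).
ITEMS = {
    847: {
        7: ["tassel", "knot", "epaulet", "frog", "star", "rosette", "bow",
            "feather", "plume", "aigrette"],
        9: ["gem", "precious stone", "diamond", "emerald"],
    }
}

# Before row 7 is extended, the route back from item to entry is missing.
BEFORE = connection_is_two_way("bud", 847, CROSS_REFS, ITEMS)

# The suggested extension of row 7.
ITEMS[847][7] += ["bump", "button", "nipple", "nodule", "bud"]
AFTER = connection_is_two_way("bud", 847, CROSS_REFS, ITEMS)
```

In this sketch the check fails until row 7 has been extended, which mirrors the reason given for making both emendations together.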
The suspicion of manipulation represents a direction opposite to that in which the research has gone, for, in actual fact, the more the experience which is gained of using this thesaurus, the less the emendations that are made. It is a sound presumption that, with few exceptions, all possible chains
of meanings are somewhere in Roget’s Thesaurus if they can be found. A minimum number of trials, moreover, begets a strong conviction that thesaurus searching and matching would best be done automatically from the earliest possible date; they are no work for a mere human being. In other words, if the thesaurus technique proves, on trials, to have definitive MT significance, it will also prove to be the frontier point where the MT worker, in this new kind of calculation, hands over to the machine; where results, uncalculated in advance by the programmer, are produced by the program. It may also (that is, if it establishes itself as having translation value) be the point of departure for a new exploration of the analogy between the human cortex and a computer; for this feels like a model of what we do when we ourselves translate.

4. Work done on the Italian paragraph has provided the following examples of translations produced by the thesaurus procedure.

Case I: ESSENCE-S

If the chunks FOREST AND FRUITBEARING ESSENCE-S, that is, all the chunks in the invertor-lattice 56, 60, 60 in which they occur are matched with the entries in the thesaurus cross-reference dictionary, the following output is obtained:

Output III

forest 57 367, 890

and 37, 38

bearing
relation 9
support 215
direction 278
meaning 516
demeanour 692
-rein 752
fruit- 168, 637, 367
child- 161

fruit
result 164
produce 161
food 298
profit 775
forbidden- 615
reap the -s, 973
-tree, 367
fruitful 168
fruition 101
fruitless 169, 645, 732

essence 5, 398
essential
intrinsic 5
meaning 516
great 31
required 630
important 642
essentially 3, 5
essential stuff 5
Upon this output the thesaurus operations are performed with the aid of restrictive and permissive rules, given as they occur, and the object of which will be evident. If the machine could be programmed to know that ESSENCE, and not FOREST-, FRUIT-BEARING is the word that needs to be retranslated, the right output, namely ‘example’, would be obtained, because the machine could then be instructed to suspend any restrictive rule that is designed to prevent a chunk already rightly translated in Output II from being replaced by a string of synonyms. Such a rule would have to run, ‘In the case of the chunk to be retranslated, reject output given by Rule X, and replace by output normally rejected by rule X’. We will call this rule Post-Editing Rule I, to show that, in this thesaurus-procedure, it cannot be automatised.

Operation 1.1. Pick out all numbers that occur more than once in Output III. Let these be called ring numbers.

Result 1.1.

Ring number    Thesaurus item    Sources of ring number
367            vegetable         forest, fruit
161            production        fruit, fruit, bearing
168            productiveness    fruit, bearing
516            meaning           bearing, essence
5              intrinsicality    essence, essence, essence

It is worth remarking, as an incidental fact, that ‘The Intrinsic Meaning of the Productiveness of Vegetable Production’ could stand as a sub-title, of a sort, for the whole paper.

Operation 1.2. Reorder ring numbers in order of descending frequency of occurrence. In the case of two ring numbers that occur with equal frequency, put first those that ring together most chunks. If order is then still undecided in any case, take input order.

Result 1.2. 5, 367, 161, 168, 516

Operation 2.1. Compare for common elements, in twos, the thesaurus items bearing the ring numbers in the comparisons which are permitted by the lattice relations of the chunks that are being put through the procedure (in this case those of the invertor lattice 60, 56, 60). In the case of any two chunks, A and B, call this comparison A B.
Order of comparisons:
(1) A A (e.g. fruit ˙ fruit). N.B. When this lattice relation yields a ˙ a, a being not a chunk but a ring number, take the output that is identical with the original chunk (e.g. 161 ˙ 161)
(2) A covers B
(3) A B

The output produced by the comparison, subject to the restrictive and permissive rules given below, is to be taken as synonymous with the chunk A in the form A A, and with the chunk B in the case where A covers B or A B. Since the invertor-lattice elements 60, 56, 60 are formed from two-element chains 30, 39, the following comparisons are permitted in this case.

Result 2.1.

Lattice relation         Chunk comparison            Ring-number comparison
A A, A + A = A           FRUIT ˙ FRUIT               367 ˙ 161
                         FRUIT ˙ FRUIT               161 ˙ 168
                         FRUIT ˙ FRUIT               367 ˙ 168
                         BEARING ˙ BEARING           161 ˙ 168
                         BEARING ˙ BEARING           161 ˙ 516
                         BEARING ˙ BEARING           168 ˙ 516
                         ESSENCE ˙ ESSENCE           5 ˙ 516
A covers B               FOREST ˙ -Y                 No comparison, as -Y has no entry
                         FRUIT ˙ BEARING             161 ˙ 168
                         FRUIT ˙ BEARING             161 ˙ 367
                         FRUIT ˙ BEARING             161 ˙ 516
                         FRUIT ˙ BEARING             168 ˙ 367
                         FRUIT ˙ BEARING             168 ˙ 516
                         FRUIT ˙ BEARING             367 ˙ 516
A B                      FOREST ˙ ESSENCE            367 ˙ 5
A ˙ B = B                FOREST ˙ ESSENCE            367 ˙ 516
(A ˙ B) ˙ C              FRUIT-BEARING ˙ ESSENCE     161 ˙ 5
(A ˙ C) ˙ (B ˙ C)        FRUIT-BEARING ˙ ESSENCE     168 ˙ 5
                         FRUIT-BEARING ˙ ESSENCE     161 ˙ 516
                         FRUIT-BEARING ˙ ESSENCE     168 ˙ 516
                         FRUIT-BEARING ˙ ESSENCE     367 ˙ 5
                         FRUIT-BEARING ˙ ESSENCE     367 ˙ 516

N.B. The comparison FOREST ˙ FRUIT is prohibited, since these chunks are incomparable in the lattice. But no new comparison would result from allowing this, since all possible combinations of the five numbers already occur.
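Operation 1.1 and the first group of comparisons in Result 2.1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure as it was actually run: the function names are invented here, ABRIDGED_ENTRIES is a shortened, illustrative extract rather than the full Output III, and the second function assumes that the comparisons of a chunk with itself are simply all pairs of the ring numbers found under that chunk.

```python
from collections import Counter
from itertools import combinations


def ring_numbers(entries):
    """Operation 1.1: keep the item numbers that occur more than once."""
    counts = Counter(n for numbers in entries.values() for n in numbers)
    return {n: c for n, c in counts.items() if c > 1}


def same_chunk_comparisons(ring_sources):
    """Pair up the ring numbers that share a source chunk."""
    by_chunk = {}
    for number, chunks in ring_sources.items():
        for chunk in set(chunks):
            by_chunk.setdefault(chunk, set()).add(number)
    return {chunk: list(combinations(sorted(numbers), 2))
            for chunk, numbers in by_chunk.items()}


# Abridged, illustrative entries (NOT the full Output III).
ABRIDGED_ENTRIES = {
    "forest": [57, 367, 890],
    "fruit": [164, 161, 298, 367],
    "bearing": [9, 516, 161],
}

# Result 1.1 as printed: ring number -> chunks under which it was found.
RESULT_1_1 = {
    367: ["forest", "fruit"],
    161: ["fruit", "fruit", "bearing"],
    168: ["fruit", "bearing"],
    516: ["bearing", "essence"],
    5: ["essence", "essence", "essence"],
}

RINGS = ring_numbers(ABRIDGED_ENTRIES)
PAIRS = same_chunk_comparisons(RESULT_1_1)
```

Applied to the table of Result 1.1, the pairing yields three comparisons each for FRUIT and BEARING and one for ESSENCE, the same unordered pairs as those listed for FRUIT ˙ FRUIT, BEARING ˙ BEARING and ESSENCE ˙ ESSENCE in Result 2.1.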
Operation 2.2. List common elements given by thesaurus-item comparisons.

Ring numbers    Thesaurus items                     Outputs
5 ˙ 161         Intrinsicality ˙ Production         flower; &c 22
  New comparisons generated:
  5 ˙ 22        Intrinsicality ˙ Prototype          example, specimen
  161 ˙ 22      Production ˙ Prototype              pattern, prototype
5 ˙ 168         Intrinsicality ˙ Productiveness     NO OUTPUT
5 ˙ 367         Intrinsicality ˙ Vegetable          flower
5 ˙ 516         Intrinsicality ˙ Meaning            essence, example, meaning, &c 22
  New comparisons generated:
  5 ˙ 22                                            SEE ABOVE
  516 ˙ 22      Prototype ˙ Meaning                 prototype, example
161 ˙ 168       Production ˙ Productiveness         propagation, fertilisation, fructify, produce – 168, 168 – 161
161 ˙ 367       Production ˙ Vegetable              growth, flower
161 ˙ 516       Production ˙ Meaning                prototype &c 22
  New comparisons generated:
  161 ˙ 22                                          SEE ABOVE
  516 ˙ 22                                          SEE ABOVE
168 ˙ 367       Productiveness ˙ Vegetable          NO OUTPUT
168 ˙ 516       Productiveness ˙ Meaning            NO OUTPUT
367 ˙ 516       Vegetable ˙ Meaning                 NO OUTPUT
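A single comparison under Operation 2.2 amounts to taking what two thesaurus items have in common, both words and ‘&c’ cross-references. The Python sketch below assumes a hypothetical representation of an item as a set of words plus a set of cross-reference numbers, and it uses only the rows of items 5 and 161 that are quoted among the asterisked entries later in this case; with the full items the comparison 5 ˙ 161 is reported to yield ‘flower’ as well.

```python
def compare_items(a, b):
    """Return the words and the cross-references that two items share."""
    common_words = sorted(a["words"] & b["words"])
    common_refs = sorted(a["refs"] & b["refs"])
    return common_words, common_refs


# Partial rows only, as quoted in the text; the layout is an assumption.
ITEM_5 = {
    "words": {"essence", "essential", "gist", "pith", "core", "kernel",
              "marrow", "meaning", "principle", "nature", "constitution",
              "character", "type", "quality", "token", "example",
              "instance", "specimen"},
    "refs": {642, 516, 22},
}
ITEM_161 = {
    "words": {"authorship", "publication", "works", "opus", "result",
              "answer", "calculation", "arrangement", "pattern",
              "prototype", "product", "treatment"},
    "refs": {22},
}

WORDS, REFS = compare_items(ITEM_5, ITEM_161)
```

On these partial rows the only common element is the cross-reference to item 22, which corresponds to the ‘&c 22’ part of the output and to the new comparisons 5 ˙ 22 and 161 ˙ 22 that it generates.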
Operation 3.1. Produce synonyms for the passage required by applying outputs given under 2.2 to comparisons permitted under 2.1.

Synonym outputs

for FRUIT: (1) growth, flower (2) propagation, fertilisation, fructify, produce. N.B. since cross-references both from 161 to 168 and from 168 to 161 lead to permitted comparisons 161 ˙ 161 and 168 ˙ 168, apply 2.1(1) and substitute FRUIT
for BEARING: AS ABOVE, i.e., under 2.1(1), substitute BEARING
for ESSENCE: essence, example, meaning, &c 22; example, specimen; prototype, example
for FRUIT-BEARING: AS ABOVE, i.e. under 2.1(1) substitute FRUIT-BEARING
for FOREST ESSENCE: flower
for FRUIT-BEARING ESSENCE: flower, &c 22, example, specimen, pattern, prototype, prototype, &c 22, example, specimen, prototype, example, flower.

So far, we have used no restrictive or permissive rule except 2.1(1). If we make use of the following additional rules, to distinguish between output, we get the following final result:

Restrictive rules
1. 2.1(1) (as above).
2. If a chunk of Output II generates no ring number in the thesaurus, and thus generates also no comparison, replace it by itself in Output IV. By this rule, FOREST is reinserted as FOREST.
3. If rule 2.1(1) operates, reject all other output. By this rule, FRUIT remains FRUIT, -BEARING remains -BEARING, and FRUIT-BEARING remains FRUIT-BEARING.
4. When selecting final output, take longest output first; i.e., if there is a synonym output for FRUIT-BEARING ESSENCE, select it in preference to a synonym for FRUIT-BEARING. (This is analogous to the pidgin-dictionary matching rule, given earlier.)

By using these, we remove all but the final synonyms:

Output IV
for FOREST ESSENCE: forest flower
for FRUIT-BEARING ESSENCE: fruit-bearing example (3 occurrences), flower (2 occurrences), prototype (2 occurrences), specimen (2 occurrences), pattern (1 occurrence).

N.B. In this output, alternatives have been reordered in order of occurrence, and the output &c 22 deleted.

Asterisked entries in thesaurus:

In item 5:
item 5, row 1: . . . essence, essential, . . . essential part, . . . gist, pith, core, kernel, marrow, . . . important part, &c, 642, *meaning, &c, 516*
row 2: principle, nature, constitution, character, type, quality; *token, example, instance, specimen &c 22;*
item 161, row 4:
authorship, publication, works, opus; *result, answer, calculation; arrangement, pattern, prototype, &c 22; product, treatment*
In the case of ESSENCE, the full thesaurus test procedure has been given. In the other cases taken from the Italian paragraph that follow, only the results of the successive operations are shown.

Case II: SELF-PRESENT

Ring numbers
such: 17
problem: 454
self: 13, 79, 451, 486, 565, 604, 717, 836, 861, 864, 879, 880, 942, 943, 950, 953, 990
present: 118, 151, 186, 457, 505, 763, 861, 894
interest: 454, 455, 457, 780
(re-ordering of these in descending order of frequency of occurrence)

permitted comparisons                               output

SELF ˙ SELF                                         (153 comparisons wait for future computer)

PRESENT ˙ PRESENT
118 ˙ 151    Present Time ˙ Eventuality             NO OUTPUT
118 ˙ 186    Present Time ˙ Presence                present
118 ˙ 457    Present Time ˙ Attention               NO OUTPUT
118 ˙ 763    Present Time ˙ Offer                   present
118 ˙ 894    Present Time ˙ Courtesy                NO OUTPUT
151 ˙ 186    Eventuality ˙ Presence                 NO OUTPUT
151 ˙ 457    Eventuality ˙ Attention                concern
151 ˙ 763    Eventuality ˙ Offer                    NO OUTPUT
151 ˙ 894    Eventuality ˙ Courtesy                 NO OUTPUT
186 ˙ 457    Presence ˙ Attention                   NO OUTPUT
186 ˙ 763    Presence ˙ Offer                       NO OUTPUT
186 ˙ 894    Presence ˙ Courtesy                    NO OUTPUT
457 ˙ 763    Attention ˙ Offer                      NO OUTPUT
457 ˙ 894    Attention ˙ Courtesy                   attentive
763 ˙ 894    Offer ˙ Courtesy                       NO OUTPUT

PARTICULAR ˙ PARTICULAR
79 ˙ 151     Speciality ˙ Eventuality               NO OUTPUT
79 ˙ 594     Speciality ˙ Description               particularise, specify
79 ˙ 780     Speciality ˙ Property                  personal
151 ˙ 594    Eventuality ˙ Description              NO OUTPUT
151 ˙ 780    Eventuality ˙ Property                 business
594 ˙ 780    Description ˙ Property                 NO OUTPUT

INTEREST ˙ INTEREST
454 ˙ 455    Topic ˙ Curiosity                      interest, &c, 461
454 ˙ 457    Topic ˙ Attention                      &c, 451
454 ˙ 780    Topic ˙ Property                       interest, business
455 ˙ 457    Curiosity ˙ Attention                  interest, attentive
455 ˙ 780    Curiosity ˙ Property                   NO OUTPUT
By these comparisons, two new ring numbers are generated, 451, 461. These cause the ring numbers for problem now to be 454, 451, 461, and the ring numbers for interest now to be 451, 454, 455, 457, 461, 780. These additions permit the following additional comparisons of the form A A.
PROBLEM ˙ PROBLEM
454 ˙ 451    THOUGHT ˙ TOPIC
451 ˙ 461    THOUGHT ˙ INQUIRY
454 ˙ 461    TOPIC ˙ INQUIRY

INTEREST ˙ INTEREST
451 ˙ 454    THOUGHT ˙ TOPIC
451 ˙ 455    THOUGHT ˙ CURIOSITY
451 ˙ 457    THOUGHT ˙ ATTENTION
451 ˙ 461    THOUGHT ˙ INQUIRY
451 ˙ 780    THOUGHT ˙ PROPERTY
454 ˙ 455    TOPIC ˙ CURIOSITY
454 ˙ 457    TOPIC ˙ ATTENTION
454 ˙ 461    TOPIC ˙ INQUIRY
454 ˙ 780    TOPIC ˙ PROPERTY
455 ˙ 457    CURIOSITY ˙ ATTENTION
455 ˙ 461    CURIOSITY ˙ INQUIRY
455 ˙ 780    CURIOSITY ˙ PROPERTY
457 ˙ 461    ATTENTION ˙ INQUIRY
457 ˙ 780    ATTENTION ˙ PROPERTY
461 ˙ 780    INQUIRY ˙ PROPERTY

Output
&c 461
study, discuss, consider &c 451; question, problem &c 461
&c 457
thought, reflection, consideration, interest, close study, occupy the mind, strike one as, &c 458
study, discuss, consider
NO OUTPUT
NO OUTPUT
interest, &c 451
&c 451, question, problem
business interest, attentive
prying, what’s the matter?
NO OUTPUT
&c 451
NO OUTPUT
NO OUTPUT
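The way new ring numbers such as 451 and 461 arise from the ‘&c’ cross-references in the outputs can be sketched as follows. This is a minimal Python illustration: the regular expression, the function names and the assumption that each cross-reference simply joins the ring numbers of the chunk concerned are not taken from the procedure itself. The output strings are those printed above for INTEREST.

```python
import re

# Matches cross-references of the form '&c 461' or '&c, 461'.
CROSS_REF = re.compile(r"&c,?\s*(\d+)")


def cross_references(output):
    """Item numbers that an output string refers on to."""
    return [int(n) for n in CROSS_REF.findall(output)]


def extend_ring_numbers(ring_numbers, outputs):
    """Add every cross-referenced item number to the ring numbers."""
    extended = set(ring_numbers)
    for output in outputs:
        extended.update(cross_references(output))
    return sorted(extended)


INTEREST_OUTPUTS = ["interest, &c, 461", "&c, 451",
                    "interest, business", "interest, attentive"]

INTEREST = extend_ring_numbers([454, 455, 457, 780], INTEREST_OUTPUTS)
PROBLEM = extend_ring_numbers([454], INTEREST_OUTPUTS)
```

Under these assumptions the ring numbers for interest become 451, 454, 455, 457, 461, 780 and those for problem 451, 454, 461, the same sets as stated in the text; repeating the step on fresh outputs would pick up further numbers such as 458.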
At this point the detailed procedure was broken off, since it was already clear that the output of greatest frequency, among the synonyms given for INTEREST would be ‘thought, reflection, consideration, interest, close study, occupy the mind, strike one as, &c 458’, namely the output of 451 ˙ 457. For the additional newly generated ring number, 458, inattention, yields only &c 457 as output when compared with any of the others; and this output is already also given by 451 ˙ 455. Three other outputs already also include 451. Thus if the work of comparison is continued, the combination 451 ˙ 457 will increasingly recur.

It is clear that the synonyms required for idiomatic translation of SELF-PRESENT, namely ‘strike one as, occupy the mind’, will occur in the wrong position, namely as synonyms for INTEREST. Nor can this error be corrected from the lattice program. For this, as given, allows only the comparisons SUCH ˙ PROBLEM and PARTICULAR ˙ INTEREST, neither of which will improve the synonym output for SELF-PRESENT. The only lattice relations that will produce the required connection are those given by the extended lattice consisting of the whole sentence, and only this after the dualising operation has already been performed. This operation, by reversing the meets and joins of the lattice, allows SELF-PRESENT ˙ PROBLEM to occur as B-element of a two-element chain of which PARTICULAR ˙ INTEREST occurs as A-element and thus allows the last algorithm using permitted pairs to operate. But this intersentential lattice program does not exist as yet.
The final output, therefore, of this application of the procedure, is as follows:

for PROBLEM: study, discuss, consider; question, problem
for PRESENT: present, concern, attentive
for PARTICULAR: particularise, specify, personal, business
for INTEREST: thought, reflection, consideration, interest, close study, occupy the mind, strike one as; study, discuss, consider; question, problem; business, attentive; prying, what’s the matter?
for PARTICULAR ˙ INTEREST: application, hobby, particularity, application, indicate, prove, occur, find, affair, run over, specification.

Of these last, the output of 151 ˙ 451, Eventuality ˙ Thought, prove, occur, find, is of interest, as it would be given under the output following ‘Ring numbers’ above, since 151 is a ring number also in PRESENT.
Case III: SPROUT

1.1. Ring numbers
with: 52
sprout: 35, 154, 194
reduce: 144, 160
development: 35, 144, 154, 194

1.2. Ring numbers in order of frequency of occurrence: 35, 144, 154, 194, 52, 160

2.1. Permitted comparisons

SPROUT ˙ SPROUT
35 ˙ 154     Increase ˙ Effect          production, development, grow, sprout, shoot
35 ˙ 194     Increase ˙ Expansion       increase, enlargement, augmentation, extension, growth, development, spread, swell, shoot, sprout
154 ˙ 194    Effect ˙ Expansion         growth, development, sprout, shoot.

REDUCE ˙ REDUCE
144 ˙ 160    Conversion ˙ Weakness      reduce

DEVELOPMENT ˙ DEVELOPMENT
35 ˙ 144     Increase ˙ Conversion      growth, development, grow
35 ˙ 154     Increase ˙ Effect          SEE ABOVE
35 ˙ 194     Increase ˙ Expansion       SEE ABOVE
144 ˙ 154    Conversion ˙ Effect        grow
144 ˙ 194    Conversion ˙ Expansion     development, growth, grow
154 ˙ 194    Effect ˙ Expansion         SEE ABOVE

REDUCE ˙ DEVELOPMENT                    SEE SPROUT ˙ REDUCE ˙ DEVELOPMENT

SPROUT ˙ REDUCE ˙ DEVELOPMENT
35 ˙ 144     Increase ˙ Conversion      growth, development, grow
35 ˙ 154     Increase ˙ Effect          production, development, grow, sprout, shoot
35 ˙ 160     Increase ˙ Weakness        shoot
35 ˙ 194     Increase ˙ Expansion       increase, enlargement, augmentation, extension, growth, development, spread, swell, shoot, sprout
144 ˙ 154    Conversion ˙ Effect        grow
144 ˙ 160    Conversion ˙ Weakness      reduce
144 ˙ 194    Conversion ˙ Expansion     development, growth, grow
154 ˙ 160    Effect ˙ Weakness          bud, shoot
154 ˙ 194    Effect ˙ Expansion         growth, development, sprout, shoot
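The frequency counts reported under 2.2 for this case can be checked with a short tally over the nine outputs of the SPROUT ˙ REDUCE ˙ DEVELOPMENT comparisons. The Python below is only a sketch: the pairing of outputs with comparisons is read off the printed layout, and the function name is an assumption.

```python
from collections import Counter

# Outputs of the nine SPROUT . REDUCE . DEVELOPMENT comparisons,
# keyed by the pair of ring numbers compared.
OUTPUTS = {
    (35, 144): "growth, development, grow",
    (35, 154): "production, development, grow, sprout, shoot",
    (35, 160): "shoot",
    (35, 194): ("increase, enlargement, augmentation, extension, growth, "
                "development, spread, swell, shoot, sprout"),
    (144, 154): "grow",
    (144, 160): "reduce",
    (144, 194): "development, growth, grow",
    (154, 160): "bud, shoot",
    (154, 194): "growth, development, sprout, shoot",
}


def tally(outputs):
    """Count how many comparisons yield each synonym."""
    counts = Counter()
    for text in outputs.values():
        counts.update(w.strip() for w in text.split(",") if w.strip())
    return counts


TALLY = tally(OUTPUTS)
```

The tally gives development 5, shoot 5, growth 4 and sprout 3, in agreement with the figures listed under 2.2. It also counts elements, such as ‘grow’, that the list under 2.2 does not mention; no rule for excluding them is stated, so the sketch leaves them in.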
2.2. Synonyms for SPROUTS, in SPROUTS AT REDUCED DEVELOPMENT: development (5 occurrences), shoot (5 occurrences), growth (4 occurrences), sprout (3 occurrences), production, bud, reduce, spread (1 occurrence).

Asterisked entries in thesaurus, for cases II and III:

cross-references:

interest
concern 9
*occupation*
curiosity 455
etc.

sprout
grow 35
germinate 161
off-spring 167
*vegetable 365, 367*
expand 194
-from 154

item 35, increase, row 2: V increase, augment, add to, enlarge; dilate &c 194; grow, wax, mount, swell, get ahead, gain strength; advance; run, shoot, shoot up; rise; ascend &c, 305; sprout &c 194.

item 129, infant, row 5: scion; sapling, seedling; bud, tendril, shoot, olive-branch, nestling, chicken, duckling; larva, caterpillar, chrysalis, etc.

item 160, weakness, row 4: weakling; infant; mite, tot, little one, slip, seedling, tendril, shoot, whelp, pup, lamb; infantile, puerile, babyish, new-fledged, callow.

item 451, thought, row 5: V think, reflect, reason, cogitate, excogitate, consider, deliberate; bestow thought upon, bestow consideration upon; speculate, contemplate, meditate, ponder, muse, dream, ruminate, run over; brood over; animadvert, study; bend the mind, apply the mind &c, 457; digest, discuss, hammer at, weigh, prove, perpend; realise, appreciate, find; fancy, &c 515; trow.
row 9: occur; suggest itself; come into one’s head, get into one’s head; strike one, strike one as; be; run in one’s head, etc.

item 454, topic, row 1: N food for thought; mental pabulum, hobby, interest, &c 451
row 2: subject, subject-matter; theme, question topic, thesis, etc.

item 455, curiosity, row 4: Adj: curious, interested, inquisitive, burning with curiosity, etc.

item 457, attention, row 1: attention; mindfulness &c, adj.; intentness; thought &c, 451; advertence; observation; consideration, reflection; heed; particularity; notice, regard &c, interest, concern; circumspection, &c, 459; study, scrutiny, etc.
row 2: catch the eye, strike the eye; attract notice; catch, awaken, wake, invite, solicit, attract, claim, excite, engage, occupy, strike, arrest, fix, engross, absorb, rivet the attention, mind, thoughts; strike one, strike one as, be present to, uppermost in the mind.

item 461, inquiry, row 3: sifting, calculation, analysis, specification, dissection, resolution, induction.

item 780, property, row 8: money, &c 800; what one is worth; estate and effects; share-holdings, business assets, business.

item 894, courtesy: courteous, polite, attentive, civil, mannerly, urbane, etc.
5. What is claimed for the thesaurus procedure is the following:

1. It is a planned procedure for producing idiomatic translation. When the translation fails, it is possible to see why.
2. Translation trials made by using it throw unexpected light on the principles of construction of a thesaurus. They should, therefore, yield information that will facilitate the construction of a thesaurus strictly compiled on statistical data for scientific MT.
3. On this procedure, the only bilingual dictionaries used are word-for-word pidgin dictionaries. Nearly all the dictionary-making is done in the target language, in which the work of compiling the thesaurus, however laborious, need only be done once, since the thesaurus will transform the mechanical pidgin produced from all languages.
4. The thesaurus procedure uses previous MT results, which have established the high degree of intelligibility that can be reached by a mechanical pidgin, while at the same time keeping open the possibility of further analysing the input text.

As against this, it will be urged that for MT the whole procedure is quite impracticable, since no computer could hold a coded thesaurus. This is true if the thesaurus were to be actually constructed and kept in being. The possibility exists, however, if all the items form lattices, of coding merely
the chunks of the English language, together with a specification of the thesaurus positions in which each occurs. This presents a formidable coding problem, but, with modern techniques of compressed and multiple coding, not an impossible one. Once idiomatic MT is what is aimed at, a problem of comparable order would be presented by the necessity of coding, say, the two-volume concise Oxford Dictionary. Current comments on the literature, moreover, already make it clear that the commercial world is not going to be satisfied with anything short of an attempt to provide multilingual, fully idiomatic MT, since the better the mechanical pidgin that is provided for the commercial readers’ inspection, the more impatient the reader becomes with the fact that it is not wholly intelligible and correct.
5
What is a thesaurus?
Introduction Faced with the necessity of saying, in a finite space and in an extremely finite time, what I believe the thesaurus theory of language to be, I have decided on the following procedure. First, I give, in logical and mathematical terms, what I believe to be the abstract outlines of the theory. This account may sound abstract, but it is being currently put to practical use. That is to say, with its help an actual thesaurus to be used for medium-scale mechanical translation (MT) tests, and consisting of specifications in terms of archeheads, heads and syntax markers, made upon words, is being constructed straight on to punched cards. The cards are multiply punched; a nuisance, but they have to be, since the thesaurus in question has 800 heads. There is also an engineering bottleneck about interpreting them; at present, if we wish to reproduce the pack, every reproduced card has to be written on by hand, which makes the reproduction an arduous business; a business also that will become more and more arduous as the pack grows larger. If this interpreting difficulty can be overcome, however, we hope to be able to offer to reproduce this punched-card thesaurus mechanically, as we finish it, for any other MT group that is interested, so that, at last, repeatable, thesauric translations (or mistranslations) can be obtained. I think the construction of an MT thesaurus, Mark I, direct from the theory, instead of by effecting piecemeal changes in Roget’s Thesaurus, probably constitutes a considerable step forward in our research. In the second section of the paper I do what I can to elucidate the difficult notions of context, word, head, archehead, row, list as these are used in the theory. I do not think this section is either complete or satisfactory; partly because it rests heavily upon some CLRU workpapers that I have written, which are also neither complete nor satisfactory. 
In order to avoid being mysterious, as well as incompetent, however, I have put it in as it stands. Any logician (e.g. Bar-Hillel) who will consent to read the material contributing
to it is extremely welcome to see this work in its present state; nothing but good can come to it from criticism and suggestion. In the third section of the paper I try to distinguish a natural thesaurus (such as Roget’s) from a term thesaurus (such as the CLRU’s Library Retrieval scheme), and each of these from a thesauric interlingua (such as R. H. Richens’ NUDE). Each of these is characterised as being an incomplete version of the finite mathematical model of a thesaurus, given in section 1 – except that Richens’ interlingua has also a sentential sign system which enables NUDE sentences to be reordered and reconstructed as grammatical sentences in an output language. This interlingual sign system, when encoded in the program, can be reinterpreted as a combinatory logic. It is evident, moreover, that some such sign system must be superimposed on any thesaurus, and the information that it gives carried unchanged through all the thesaurus transformations of the translation programme, if a thesaurus programme is to produce translation into an output language. Thus, Bar-Hillel’s allegation that I took up combinatory logic as a linguistic analytic tool and then abandoned it again is incorrect. This section is also meant to deal with Bar-Hillel’s criticism that ‘thesaurus’ is currently being used in different senses. This criticism is dealt with by being acknowledged as correct. The next section asks in what ways, and to what extent, a language thesaurus can be regarded as interlingual. We feel that we know a good deal more about this question than we did six months ago, through having now constructed a full-scale thesauric interlingua (Richens’ NUDE). This consists currently of Nuda Italiana and Anglo-NUDE. Nuda-Italiana covers 7,000 Italian chunks (estimated translating power 35,000 words), and can be quasi-mechanically expanded ad lib by adding lists and completing rows. 
We are, however, not yet developing it, since our urgent need is to construct a NUDE of a non-romance language (e.g. Chinese); this will, we think, cause a new fashion to set in NUDES, but will not, we hope, undermine the whole NUDE schema. In the final section of the paper I open up the problem of the extent to which a sentence, in a text, can be considered as a sub-thesaurus. This section, like section 2, is incomplete and unsatisfactory; I hope to take it up much more fully at a later date. It is so important, however, initially, to distinguish (as well as, I hope, finally to interrelate) the context lattice structure of a sentence, which is a sub-thesaurus, from the sentential structure, which is not. We hope to issue a fuller report than this present one on the punched-card tests which we are doing and have done. We hope also to issue, though at a later date, a separate report on interlingual translation done with NUDE. I should like to conclude this introduction by saying that we hope
lastly and finally to issue a complete and authoritative volume, a sort of Principia Linguistica, or Basis Fundamentaque Linguae Metaphysicae, devoted entirely to an exposition of the theory, which will render obsolete all other expositions of the theory. I see no hope at all, however, of this being forthcoming, until an MT thesaurus (Mark n) survives large-scale testing on a really suitable machine.

1. Logical and mathematical account of a thesaurus

1.1. General logical specification of a thesaurus
1.1.1. Basic definition of a thesaurus

A thesaurus is a language system classified as a set of contexts. (A context is further described below; it is a single use of a word.) As new uses of words are continually being created in the language, the total set of contexts consisting of the thesaurus is therefore infinite.

1.1.2. Heads, lists and rows

In order to introduce finiteness into the system, we therefore classify it non-exclusively in the following manner:

1. The infinite set of contexts is mapped on to a finite set of heads. (Heads are further described below; they are the units of calculation of the thesaurus.) It is a prerequisite of the system that, whereas the number of contexts continually increases in the language, the number of heads does not.
2. The contexts in each of these heads will fall into either (a) lists or (b) rows. (A list and a row are further described below. A list is a set of mutually exclusive contexts, such as ‘spade, hoe, rake’, which if used in combination have to be joined by ‘and’; a row is a set of quasi-synonymous contexts, such as ‘coward, faint-heart, poltroon’, which can be used one after the other, if desired, in an indefinite string.)

1.1.3. Paragraphs and aspects

The heads are subdivided into paragraphs by means of syntax markers. (A syntax marker is further described below; it is a very general concept, like the action of doing something, or the concept of causing somebody to do something.) Ideally, a syntax marker specifies a paragraph in every head in a thesaurus. In fact, not every paragraph so specified will contain any contexts. A paragraph can consist either of a set of rows in a head, or a set of lists; or of a set consisting of a combination of rows and lists. Such a set can have no members (in which case it is a vacuous set), one member or more than one member.

The heads are cross-divided into aspects, by means of archeheads. (An archehead is further described below; it is a very general idea, such
as that of ‘truth’, ‘pleasure’, ‘physical world’.) A thesaurus-aspect consists, ideally, of a dimediate division of the thesaurus (e.g. into ‘pleasing’ and ‘non-pleasing’ contexts), where a dimediate division is a binary chop. In actual fact, an archehead usually slices off an unequal but still substantial part of a thesaurus.

1.1.4. The resolving power of a thesaurus

It cannot be too much stressed that once the division into heads, paragraphs, rows, lists and aspects has been effected, the contexts of the thesaurus are not further subdivided. This limit of the power of the thesaurus to distinguish contexts is called the limit of the resolving power of the thesaurus, and it is the great limitation on the practical value of the theory. Thus, the thesaurus theory of language does not, as some think, solve all possible linguistic problems; it does, however, successfully distinguish a great many contexts in language in spite of the fact that none of these contexts can be defined. To find the practical limits of the resolving power of any thesaurus should thus be the first object of any thesaurus research.

1.2. A finite mathematical model of a thesaurus
1.2.1. Procedure for conflating two oriented partially ordered sets
When a finite mathematical model is made of a thesaurus, the non-exclusive classification generates a partially ordered set. By adding a single point of origin at the top of the classification, this set can be made into an oriented partially ordered set, though it is not a tree. It must be remembered, however, that if it is to have an empirical foundation, a thesaurus of contexts must also be a language of words. An actual thesaurus, therefore, is a double system. It consists of:
1. context specifications made in terms of archeheads, heads, syntax markers and list numbers;
2. sets of context specifications, which are uses of words.
Now, a case will be made, in the next section, for defining also as an inclusion relation the relation between a dictionary entry for a word (that is, its mention, in heavy type, or in inverted commas, in the list of words that are mentioned in the dictionary) and each of the individual contexts of that word (that is, each of the definitions given, with or without examples, of its uses, and that occur under the word entry in the dictionary). In the next section, it will be argued in detail that such a relation would generate a partially ordered set but for the fact that, owing to the same sign, or a different sign, being used indiscriminately both for the dictionary mention of the word and for one, or any number, of its uses, the axioms of a partially ordered set can never be proved of it. This is my way of approaching the fundamental problem of
What is a thesaurus?
the ‘wobble of semantic concepts’ which Bar-Hillel has correctly brought up, and which, unless some special relations between semantic units are clarified, prevents anything ever being provable. Now, a thesaurus is precisely a device for steadying this wobble of semantic signs; that is one way of saying what it is; and the device which it uses is to define not the semantic signs themselves, nor their uses, but the thesaurus positions in which these uses occur. The same word sign, therefore, that is, the same conceptual sign, the same semantic sign, occurs in the thesaurus as many times as it has distinguishable contexts; a word like ‘in’, which has, say, 200 contexts in English, will therefore occur in the thesaurus 200 times. Thus, the theoretical objection to arguing on the basis that the relation between a dictionary mention of a word and its set of contexts is an inclusion relation disappears as soon as these contexts are mapped onto a thesaurus. In this section I assume, therefore, what in the next section I argue that we can never prove: namely, that the relation between a dictionary mention of a word and the items of its entry itself generates an oriented partially ordered set (Figure 18).

[Figure 18. Oriented partially ordered set consisting of the dictionary entry of a word. Top element: ‘X’; elements below it: X1, X2, … Xn.]

But now, we have to notice an important logical fact. This is, that a use of a word as it occurs in an actual text (that is, when it is actually used, not mentioned) is logically different from the bold type mention of the word when it is inserted as an item of a dictionary. For the word as it occurs ‘in context’, as we say – that is, in an actual text in the language – by no means includes all the set of its own contexts. On the contrary, the sign of the word there stands for one and only one of its contexts; it therefore stands also for a context specification of this use made in terms of archeheads, heads, syntax markers and list numbers (see above).
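As a rough illustration of the double system just described, the following sketch files each use of a word under a thesaurus position made of archeheads, a head, a syntax marker and a list number. The class names, field names and the two sample entries for ‘rich’ (including the head and archehead names) are assumptions made for the example; they are not taken from any CLRU thesaurus or program.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextSpec:
    """One thesaurus position: where a single use of a word is filed."""
    archeheads: frozenset   # hypothetical archehead names
    head: str               # hypothetical head name
    syntax_marker: str      # hypothetical paragraph marker
    list_number: int        # number of the row or list within the paragraph


class Thesaurus:
    def __init__(self):
        self._uses = defaultdict(set)   # word sign -> set of ContextSpec

    def add_use(self, word, spec):
        self._uses[word].add(spec)

    def dictionary_entry(self, word):
        """The word as mentioned: the whole set of its contexts."""
        return frozenset(self._uses[word])

    def occurrences(self, word):
        """The word sign occurs once per distinguishable context."""
        return len(self._uses[word])

    def sub_thesaurus(self, text_uses):
        """Heads needed to specify the uses that occur in one text."""
        return {spec.head for _, spec in text_uses}


# Hypothetical sample data: two uses of the sign 'rich'.
wealth = ContextSpec(frozenset({"POSSESSION!"}), "Wealth", "quality", 1)
humour = ContextSpec(frozenset({"PLEASURE!"}), "Amusement", "quality", 2)

thesaurus = Thesaurus()
thesaurus.add_use("rich", wealth)
thesaurus.add_use("rich", humour)

# In a text, the sign stands for exactly one of its contexts.
text = [("rich", humour)]
print(thesaurus.occurrences("rich"))
print(thesaurus.sub_thesaurus(text))
```

The design point is only that the word sign is never defined directly: it is indexed by the positions at which its uses are filed, and a text picks out one position per occurrence.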
This assertion requires a single proviso: in a text (as opposed to in a language) the set of archeheads, heads, syntax markers and list numbers needed to make the context specifications of the constituent words will be a subset of the set consisting of the total thesaurus, namely, that subset that is needed to specify the contexts of the actual text. Thus, the contexts used in any text (or any sentence) in a language will be a sublanguage system, consisting of a sub-thesaurus. This fact alters the nature of the mathematical model that it was proposed to make of a thesaurus. For the word, as it is used in all the texts of the language (as opposed to the word as it is mentioned in the dictionary), now consists of that which is in common between all the context specifications that occur in all the texts; these context specifications being in terms of archeheads, heads, syntax markers and list numbers (see above). For all that is in common between all these text specifications, so made, is the empirical fact that all of them can be satisfactorily denoted in the language by the sign for that one word. When it is inserted into a thesaurus, therefore, as opposed to when it is inserted as part of a dictionary, the oriented partially ordered set consisting of the set of uses of a word becomes inverted (i.e. it has to be replaced by its dual), because the inclusion relation becomes reversed (Figure 19).

[Figure 19. Oriented partially ordered set, dual of the set given above (consisting of the dictionary entry of a word), consisting of the relation between the word-sign and the total set of its possible contexts, as appearing in texts. Elements above: X1, X2, … Xn; bottom element: X.]

It follows, if partially ordered set II is the dual of partially ordered set I, that they can be combined into one partially ordered set. It is easy to see intuitively that the partially ordered set so formed is the ‘spindle lattice’ of n + 2 elements (Figure 20). It may be a help to see that the interpretation of the meet and join relations that is here made has an analogy with the interpretation of a Boolean lattice which is given when the meet and join relations are imagined to hold between numbers. Thus, in a four-element Boolean lattice of which the side elements are numbers N1 and N2, the join of these two
numbers will be their least common multiple, and the meet of the same two numbers will be their highest common factor. Analogously, in the interpretation that I am making, the join of the two contexts of a word, C1 and C2, will be the dictionary entry listing both of them, and the meet will be any property that is in common between them: in this case, the property of being denotable by the sign of the same word. This analogy is illustrated diagrammatically in Figures 21 and 22.

[Figure 20. Spindle-lattice formed by conflating the two partially ordered sets given above. Top element: ‘X’ = X1 ∪ X2 ∪ X3 … Xn; middle elements: X1, X2, … Xn; bottom element: X = X1 ∩ X2 ∩ X3 … Xn (property of being designated, in a given corpus of texts, by the sign ‘X’).]

[Figure 21. Numerical case. Top element: Lowest Common Multiple of N1 and N2 = N1 ∪ N2; middle elements: N1, N2; bottom element: Highest Common Factor of N1 and N2 = N1 ∩ N2.]
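The analogy can be made concrete in a few lines. The sketch below, which is an illustration under assumptions rather than anything from the original paper, computes the numerical join and meet with the standard library, and then builds the spindle of n + 2 elements for one word, where the join of two distinct contexts is the dictionary entry and their meet is the property of sharing the word sign. The class `Spindle`, its method names and the sample contexts of ‘rich’ are hypothetical.

```python
from math import gcd


def join_numbers(n1, n2):
    """Numerical join: least common multiple (positive integers only)."""
    return n1 * n2 // gcd(n1, n2)


def meet_numbers(n1, n2):
    """Numerical meet: highest common factor."""
    return gcd(n1, n2)


class Spindle:
    """Spindle lattice of n + 2 elements for a single word sign."""

    def __init__(self, sign, contexts):
        self.top = ("entry", sign)       # dictionary entry: union of contexts
        self.bottom = ("sign", sign)     # property of being denoted by the sign
        self.middle = [("context", c) for c in contexts]

    def elements(self):
        return [self.top, *self.middle, self.bottom]

    def leq(self, a, b):
        # Middle elements are mutually incomparable.
        return a == b or a == self.bottom or b == self.top

    def join(self, a, b):
        if self.leq(a, b):
            return b
        if self.leq(b, a):
            return a
        return self.top

    def meet(self, a, b):
        if self.leq(a, b):
            return a
        if self.leq(b, a):
            return b
        return self.bottom


# Hypothetical contexts of one word, for illustration only.
rich = Spindle("rich", ["wealthy", "humorous", "fertile"])
c1, c2 = rich.middle[0], rich.middle[1]
print(join_numbers(4, 6), meet_numbers(4, 6))
print(rich.join(c1, c2), rich.meet(c1, c2))
```

Because the middle elements are unordered, every pair of distinct contexts has the same join and the same meet, which is what makes the spindle the least informative lattice one can have.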
[Figure 22. Word case. Top element: Dictionary-entry of C1 and C2 = C1 ∪ C2; middle elements: C1, C2; bottom element: Property of there being the same word sign for both C1 and C2 = C1 ∩ C2.]

[Figure 23. Rank I: I (total thesaurus). Rank II (archeheads): A, B, C, D. Rank III (heads): a1b1, b2, a2b3, b4c1d1, b5c2, c3d2, a3d3, c4d4, a4d5.]
To return now to the thesaurus model, if it be granted that partially ordered set I and partially-ordered set II can be conflated, without empirical or mathematical harm, to form the second lattice, it will be no empirical or mathematical surprise to find that, on the larger scale also, two oriented partially ordered sets can be conflated with one another to form a figure that has a tendency to become a lattice. For, whereas the total archeheads and heads of the thesaurus form an oriented partially ordered set of the form in Figure 23, the words and their contexts in the thesaurus (not in the dictionary) form an oriented partially ordered set of the form in Figure 24.
[Figure 24. Rank IV (contexts): w1, v1, v2, v3, y1, w2, w3, x1, x2, y2, x3, z1, x4, z2, z3, w4, y3, z4, z5, z6. Rank V (words): W, V, Y, X, Z. Rank VI: O (property of being a word in a language).]
By conflating the two partially ordered sets, which is done by mapping the sets of contexts of the words onto the heads – the sets being finite, as this is a finite model – we now get a single partially ordered set with one top point and one bottom point; that is, a partially ordered set that has a tendency to be a lattice. The lattice-like figure constructed by conflating the two oriented partially ordered sets is given in Figure 25.

1.2.2. Procedure for converting the conflation given under 1.2.1 into a finite lattice
Mathematically, it will be easily seen that there is no great difficulty in converting the figure, given above, into a finite lattice. If it is not a lattice already, vacuous extra context points can be added wherever sufficient meets and joins do not occur. If, upon test, an extra rank begins to show up below the word sign rank, and corresponding to the archeheads, it will probably be possible, with a minimum of adjustment, to embed this thesaurus in the lattice A3/5, which is the cube (A3) of the spindle of 5 elements (A5). Of course, if any of the vacuous context points turn out to ‘make sense’ in the language, then word uses or phrase uses can be appointed to them in the thesaurus, and, in consequence, they will no longer be vacuous.

Empirically, however desirable it may be mathematically, there seems to be a grave objection to this procedure. For even if we ignore the difficulty (which is discussed below) of determining what we have been meaning throughout by ‘language’, it yet seems at first sight as though there is another objection in that we have been conflating systems made with two inclusion relations; namely,
1. the theoretic classifying relation between heads, archeheads and contexts;
2. the linguistic relation between a word and its contexts.
[Figure 25. The two oriented partially ordered sets of Figures 23 and 24 conflated into a single figure: top point I, archeheads A, B, C, D, heads, contexts and words.]
If we look at this matter logically, however (that is, neither merely mathematically nor merely empirically), it seems to me that the situation is all right. For even if we get at the points, in the first place, by employing two different procedures (i.e. by classifying the contexts, in the librarian manner, by means of archeheads and heads, whereas we deploy the contexts of a word, in the dictionary maker’s manner, by writing the sign for it under every appropriate head), yet logically speaking we have only one inclusion relation that holds throughout all the ranks of our thesaurus. For the heads, as well as having special names of their own, can also be specified, as indeed they are in the lattice-like figure, as being intersections of archeheads. Similarly, the contexts on the rank lower down could be specified not merely in terms of the units of the rank immediately higher up (i.e. of the heads), but also as intersections of heads and archeheads. And as we have already seen, at the rank lower down still, word signs can be seen as intersections of their contexts, and therefore specifiable also in terms of intersections of archeheads and heads.

It may be asked whether there is any difference, on this procedure, between a good and a bad thesaurus lattice. To this, it may be replied that the second object of any thesaurus research should be to discover how many vacuous context points remain vacuous (i.e. cannot have any word uses or phrase uses attached to them) when any given thesaurus is converted into a lattice. On the ordinary canons of scientific simplicity, the more vacuous context points have to be created, the less the thesaurus, in its natural state, is like a lattice. Conversely, if (as has been found) very few such points have to be created, then we can say in the ordinary scientific manner, ‘Language has a tendency to be a lattice’.
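The test suggested here, how far a conflated thesaurus falls short of being a lattice, can be sketched for a small finite model. The code below is an assumption-laden illustration, not the CLRU procedure: it takes a finite partially ordered set given by its covering relation (each element mapped to the elements immediately above it) and lists the pairs that lack a join or a meet, which are the places where a vacuous context point would have to be added. The function names and the toy data (`h1`, `h2`, `c1`, `c2`, `p`) are hypothetical.

```python
from itertools import combinations


def up_sets(covers):
    """For every element, the set of all elements above or equal to it."""
    elements = set(covers) | {p for ps in covers.values() for p in ps}
    up = {}

    def visit(x):
        if x not in up:
            up[x] = {x}
            for parent in covers.get(x, ()):
                up[x] |= visit(parent)
        return up[x]

    for e in elements:
        visit(e)
    return up


def down_sets(up):
    """Invert the up-sets: all elements below or equal to each element."""
    down = {x: set() for x in up}
    for x, above in up.items():
        for y in above:
            down[y].add(x)
    return down


def extreme(candidates, reach):
    """Element of candidates whose reach contains all candidates, if any."""
    for c in candidates:
        if candidates <= reach[c]:
            return c
    return None


def missing_bounds(covers):
    """Pairs with no join and pairs with no meet."""
    up = up_sets(covers)
    down = down_sets(up)
    no_join, no_meet = [], []
    for a, b in combinations(sorted(up), 2):
        if extreme(up[a] & up[b], up) is None:
            no_join.append((a, b))
        if extreme(down[a] & down[b], down) is None:
            no_meet.append((a, b))
    return no_join, no_meet


# Two contexts filed under the same two heads: not yet a lattice.
raw = {"O": ["c1", "c2"], "c1": ["h1", "h2"], "c2": ["h1", "h2"],
       "h1": ["I"], "h2": ["I"]}
# The same figure with one vacuous point p inserted between them.
fixed = {"O": ["c1", "c2"], "c1": ["p"], "c2": ["p"],
         "p": ["h1", "h2"], "h1": ["I"], "h2": ["I"]}

print(missing_bounds(raw))
print(missing_bounds(fixed))
```

Counting how many such inserted points can later be given a word use or phrase use, and how many stay vacuous, is one way of making the ‘tendency to be a lattice’ measurable on a given thesaurus.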
Some time ago, the Cambridge Language Research Unit was visited by the director of a well-known British computer laboratory, who was himself very interested in the philosophic ‘processing’ of language. On the telephone, before he arrived, he announced that his point of view was, ‘If language isn’t a lattice, it had better be’. Some time later, after examining the CLRU evidence for the lattice-like-ness of a language, and what could be done with a lattice model of a thesaurus, he said mournfully, and in a quite different tone, ‘Yes, it’s a lattice; but it’s bloody large’.

1.2.3. Syntax markers: the procedure of forming the direct product of the syntax lattice and the thesaurus lattice
The argument up to this point, if it be granted, has established that a finite lattice model can be made of a thesaurus. It has only established this fact, however, rather trivially, since the classificatory principle of A3/5 is still crude. It is crude empirically since it embodies, at the start, only the amount of classification that the thesaurus compiler can initially make when
constructing a thesaurus. Thus the initial classification of ‘what one finds in language’ is into archeheads, heads, syntax markers, list numbers and words. Of these, using Roget’s Thesaurus as an example of ‘language’, the archeheads (in so far as they exist) are to be found in the Chapter of Contents, though they usually represent somewhat artificial concepts; some of the heads themselves, though not all, are arbitrary; the syntax markers, noun, verb, adjective and adverb, are not interlingual; finally, instances of every length of language segment, from morpheme to sentence, are to be found among the words. It is also crude mathematically, since the lattice A3/5, splendid as it looks when drawn out diagrammatically, is founded only upon the spindle of five elements; and, in this field, a spindle is of all lattices the one not to have if possible, since it represents merely an unordered set of concepts with a common join and meet. Two things are needed to give more ‘depth’ to the model: firstly, the structure of the syntax markers, which have been left out of the model entirely so far; secondly, an unambiguous procedure for transforming A3/5, which, on the one hand, will be empirically meaningful and, on the other hand, will give a lattice of a richer kind. Let us consider the syntax markers first. Two cases only are empirically possible for these: 1. that they are similar in function to the archeheads, being, in fact, merely extra archeheads that it has been convenient, to somebody, for some reason, to call ‘syntax markers’; 2. they are different in function from archeheads, as asserted earlier in this chapter; in which case this difference in function must be reflected in the model. 
Now, the only empirical difference allowable, in terms of the model, will have to be that whereas each archehead acts independently of all the others, picking out its own substantial subset of the total set of the thesaurus, the syntax markers act in combination, to give a common paragraph pattern to every head. And this means that the total set of syntax markers will form their own syntax lattice; this lattice, taken by itself and in isolation, giving the pattern that will recur in every head. It is thus vital, for the wellbeing of the theory, that the lattice consisting of the total set of syntax markers should not itself (as indeed it tends to do) form a spindle. For this fact implies that the set of syntax markers, like the set of archeheads, is unordered; in which case, the markers are merely archeheads. If, however, without damage to the empirical facts, the syntax markers can be classified into mutually exclusive subsets, then the situation is improved to that extent; for the syntax lattice will then be a spindle
of spindles. And any further ordering principle that can be discovered among the syntax markers will improve the mathematical situation still further; since it will further ‘de-spindle’ the paragraph pattern of the heads. But such an ordering principle must be discovered, not invented, for the allowable head pattern for any language is empirically ‘tight’ in that, much more than the set of heads, it is an agreed and known thing. Moreover, if it is to pay its rent in the model, it must be constant throughout all the heads, though sometimes with vacuous elements. For if no regularity of paragraph pattern is observable in the heads, then it is clear that, as when the syntax lattice was a spindle, the syntax markers are again only acting as archeheads. This failure betrays itself in the model: there will be a huge initial paragraph pattern, large parts of which will be missing in each head. Thus the construction of the syntax lattice is fraught with hazards, though the experimental reward for constructing it correctly is very great.

The procedure for incorporating it in the model, however, is unambiguous: a direct product is formed of the thesaurus lattice and the syntax lattice, this product forming the total lattice of the language. This total lattice can be computed but not displayed, since it is quite out of the question to present in diagram form the direct product of a spindle of spindles with A3/5. The principle of forming such a direct product, however, can be easily shown; it is always exemplified by the very elegant operation of multiplying the Boolean lattice of four elements by the chain of three (Figure 26).

[Figure 26. Direct product A4C3 of the Boolean lattice of four elements (A4) and the chain of three (C3); the twelve product elements are labelled 00 to 23.]

And a sample syntax lattice, like a simple direct product, can be constructed. But in even suggesting that it should be
constructed, I am putting the logical cart before the logical horse. For it is precisely the set of lattice operations that I am about to specify that are designed to enable thesaurus makers objectively to restructure (which means also, by the nature of the case, to ‘de-spindle’) both the syntax lattice and the thesaurus. Until we have the data that these operations are designed to give, it is not much use imagining a thesaurus lattice except as embedded in A3/5, or a syntax lattice except as a spindle of sub-spindles, the points on each sub-spindle carrying a mutually exclusive subset of syntax markers.

The total sets of syntax markers that we have been able to construct are not nearly sufficient, by themselves, to give grammatical or syntactical systems for any language. They are, however, interlingually indispensable as output-assisting signals, which can be picked up by the monolingual programme for constructing the grammar of the output text, or even the semantic part of the output-finding procedure. As assistance to grammar, they are very useful indeed; for since they are semanticised, rather than formalised, they can straightforwardly operate on, and be operated on by, the other semantic units of the thesaurus. Thus they render amenable to processing the typical situation that arises when it comes to the interlingual treatment of grammar and syntax; the situation, that is, where information that is grammatically conveyed in one language is conveyed by non-grammatical (i.e. semantic) means in the next.

1.2.4. Lattice operations on a thesaurus

1.2.4.1. The translation or retrieval algorithm
This is the process of discovering from a specification, given as a set of heads, an element of a given set with as nearly as possible the specified heads. This is exemplified by the procedure used in the rendering of ‘Agricola incurvo . . .’ (see Chapter 6). There, however, it is only applied to the semantic thesaurus, not to the language lattice as a whole.

1.2.4.2. Compacting and expanding the thesaurus
This is the process of making some of the heads more inclusive or more detailed, in order to affect the distinctions made by the heads or to change the number of heads used. An example of this process is described by M. Shaw (1958) when it was found necessary for coding purposes to have only 800 heads rather than 1,000.

1.2.4.3. Embedding the total lattice in other lattices
This again is an operation performed primarily for coding purposes; it depends essentially on the theorem that any lattice can be embedded in a Boolean lattice. From this it is possible to derive a number of theorems and methods for handling thesauric data economically (Parker Rhodes and Needham, 1959).
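One standard way of realising such an embedding, offered here only as a sketch and not as the coding actually used in the work cited, is to represent each lattice element by the set of elements lying below or equal to it, written as a bit vector. Inclusion of bit vectors then reproduces the lattice order, and bitwise AND reproduces the meet; joins are not in general preserved by this particular coding. As the example lattice, the code uses the twelve-element direct product of the Boolean lattice of four elements with the chain of three. All function and variable names are assumptions.

```python
from itertools import product

# Boolean lattice of four elements: subsets of a two-element set, as bitmasks 0..3.
# Chain of three: 0 < 1 < 2.
ELEMENTS = list(product(range(4), range(3)))   # the 12 elements of the product


def leq(x, y):
    """Componentwise order on the direct product."""
    (a, c), (b, d) = x, y
    return (a & b) == a and c <= d


def meet(x, y):
    """Componentwise meet: intersection of bitmasks, minimum on the chain."""
    return (x[0] & y[0], min(x[1], y[1]))


def code(x):
    """Bit-vector code of x: one bit for every element below or equal to x."""
    return sum(1 << i for i, e in enumerate(ELEMENTS) if leq(e, x))


CODES = {x: code(x) for x in ELEMENTS}


def embedding_is_faithful():
    """Check that the codes reproduce both the order and the meet."""
    for x in ELEMENTS:
        for y in ELEMENTS:
            included = (CODES[x] & CODES[y]) == CODES[x]
            if included != leq(x, y):
                return False
            if CODES[meet(x, y)] != CODES[x] & CODES[y]:
                return False
    return True


print(len(ELEMENTS), embedding_is_faithful())
```

The attraction for coding purposes is that comparisons between thesaurus positions reduce to machine-level operations on bit patterns, at the cost of codes that grow with the size of the lattice.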
However, the process also throws some light on the logical structure of the whole thesaurus.

1.2.4.4. Extracting and performing lattice operations on sentential sublattices
(See Section 5.)

1.2.4.5. Criteria for nearness of fit
It is possible to regard a lattice as a metric space in several ways, and as having a non-triangular pseudometric in many others. To do this, in practice, is extremely difficult, though the task is not, I still think, an impossible one. The obvious criterion of thesaurus-lattice distance is ‘number of heads in common’. For instance, if there are 10 words in common between the head Truth and the head Evidence, 7 words in common between the head Existence and the head Truth, and 3 words in common between the head Existence and the head Evidence, it might be thought that, by counting the words in common, we could establish a measure of their relative nearness. Consider, however, the possible complication: Existence might have 50 words in it, Evidence 70, Truth 110; this already complicates the issue considerably. Then there are the further questions of aspect and paragraph distinction; are similarities in those respects to contribute to ‘nearness’? One such criterion is embodied in the translation algorithm above, and research is in progress on the selection of the most appropriate one for translation purposes. For example, it is necessary to be able to say whether a word with heads A, B, C, D, C is nearer to a specification B, C, D, F than a word with heads C, D, F, G. The remaining two kinds of operation are concerned with testing a thesaurus rather than using it.

1.2.4.6. Finding the resolving power
This consists of discovering what sets of words have exactly (or, once a metric has been agreed, nearly) the same head descriptions. The closeness of the intuitive relation between these words is a test of the effectiveness of the thesaurus.

1.2.5. The impossibility of fully axiomatising any finite lattice model of a thesaurus
A thesaurus is an abstract language system; and it deals with logically primitive language. That this is so can be seen at once as soon as one envisages the head signs as logically homogeneous ideographs. The words (to distinguish them from the heads) could then be written in an alphabetic script. But what kind of sign are we then to have for the syntax markers? What kind of sign, also, for the archeheads? Different coloured ideographs, perhaps, or ideographs enclosed in squares for the syntax markers and ideographs enclosed in triangles for the archeheads.
A thesaurus is an abstract language system, and it deals with logically primitive language. It therefore looks, at first sight, as though it were formalisable: as though the next thing to do is to get an axiomatic presentation of it. That it is logically impossible to get such a formalisation, however, becomes apparent as soon as one begins to think what it would really be like. Imagine a thesaurus, for instance, typographically set out so that
1. all the head signs were pictorial ideographs;
2. the archeheads were similarly ideographs, each, however, enclosed in a triangle;
3. the syntax markers were similarly ideographs, again, each being enclosed, however, in a square.
Would it not be vital to the operation of the thesaurus to be able both to distinguish and to recognise the ideographs? To know, for instance, that the ideographic sign for ‘Truth’ (say, a moon exactly mirrored in a pond) occurred also in the archehead ‘Actuality’, which will be a moon mirrored in a pond, and enclosed in a triangle? Moreover, imagine such a system ‘mathematicised’, that is, re-represented in a different script, with its ideographs replaced by various alphabets (you would need several) and the triangular and square enclosures respectively by braces and square brackets. What have you done, when you have effected this substitution, except replace ideographs by other ideographs? Are not A, B, C, D ideographs? Are not brackets ideographs? And is it not as important in the alphabetic as in the pictorial case to know that A is not B, and B is not C; to distinguish (A) or [A] not only from A, but also from B or (B) or [B]? There could be no better case than this for bringing home the truth – which all logicians in their heart of hearts really know – that there are required a host of conventions about the meaningfulness and distinguishableness of ideographic symbols before any ideographic system can be formalised at all.
In a CLRU Workpaper issued in 1957 I wrote, What we are analysing, in analysing the set of uses of a word, is the situation at the foundations of all symbolism, where the normal logical sign-substitution conventions cannot be presumed to hold. Because exactly what we are studying is, ‘How do they come to hold? . . . ’ By mathematical convention, then, if not by mathematical assertion, variables have names . . .
(In fact, a mathematical language that consisted of nothing but variables, like a thesaurus, would be logically equivalent to St Augustine’s language, which consisted of nothing but names.) A mathematical variable has meaningfulness and distinguishableness in a system because it has the following three characteristics:
1. It is a name for the whole range of its values; we learn a lot about these values by naming the name. The traditional algebraic variables x and y stand for numerals; the traditional variables p, q, r stand for statements; and so on.
2. It has a type: it occurs in systems that have other signs that are not variables (e.g. the arithmetical signs, or the propositional constants) from which it can be distinguished by its form.
3. It has context: that is to say, by operating with one or more substitution rules, a further symbol giving a concept with a single meaning can be substituted for the variable.

In the paper, I took the combinator rules of a combinatory logic and, by progressively removing naming power and distinguishability from the symbols, produced a situation where no one could tell what was happening at all. Now as soon as we operate with the heads of a thesaurus, we operate with variables from which the second characteristic has been removed [1]. The result of this is that the first and third characteristics, namely that a mathematical symbol is a name and that it has context, acquire an exceptional prominence in the system, and that is the case whatever system of mathematical symbols you use. Why, then, give yourself a great effort of memory learning new names, when names already approximately existing in your language, and the meaningfulness and distinguishableness of which you know a good deal about already, will perfectly well do?

Another, more general, way of putting this argument is by saying that any procedure for replacing the head signs by other signs will be logically circular. For, in the model, as soon as we replace the archehead or head specifications by formal symbols, we can only distinguish them one from another by lattice position. But we can only assign to them lattice positions if we can already distinguish them from one another.
In making this model, Language (philosophic English, L1) is being used to construct a Language (the heads, archeheads, markers, list numbers of the thesaurus and the rules for operating them, L2) to analyse Language (the words and contexts of a natural language, L3). Every attempt is made, when doing this analysis, to keep L1, L2 and L3 distinct from one another. But there comes a point, especially when attempting normalisation, beyond which the distinction between the three goes bad on you; and then the frontier point in determining the foundations of symbolism has been reached. Beyond that point, variable and value, variable and constant, mathematical variable and linguistic variable, sign and meta-sign – it’s all one: all you can do is come up again, to the same semantic barrier, by going another way.

[1] In the model, the heads etc. can of course be distinguished from the lattice connectives. To that extent, but only to that extent, the system is formalisable.
In our thesaurus, in order to avoid the use of ideographs, archeheads are in large upper-case letters and followed by one shriek (e.g. TRUE!), heads are in small upper-case letters with a capital (e.g. EVIDENCE, TRUTH); words are in ordinary lower-case letters (e.g. actual, true) and syntax markers are hyphenated and in italics (e.g. fact, concrete-object).

2. Contexts, words, heads, archeheads, rows, lists

2.1. Contexts
It is evident that if we wish to come to a decision as to the extent to which thesaurus theory has an empirical foundation, the vital notion to examine is that of context. Having said this, I propose now to examine it not concretely, but abstractly, because in the course of examining it abstractly, it will become clear how very many obstacles there are to examining it concretely. Roughly, if a language were merely a large set of texts, there would be no such difficulty; research with computers would show to what extent these could be objectively divided up by using linguistic methods, and into how small slices; a list of the slices of appropriate size (i.e. morphemes, rather than phonemes) would be the contexts. Actually, however, language is not like that. Firstly, nobody knows how large a number of texts, and what texts, would be required for these to constitute a true sample. Secondly, we have to know quite a lot about any language, both as to how it functions and as to what it means, in order to give the computer workable instructions as to how to slice up the text. So even if we wish to be 100 per cent empirical – ‘to go by the facts and nothing but the facts’ – we find that a leap of the creative intellect is at present in fact needed to arrive at a purely empirical notion of collocation, or context. And that being so, there is everything to be said for using to the full, in an essentially general situation, the human capacity to think abstractly [2].

[2] That is, if we have to take a creative leap, in any case, let it not be a naive one; let us do our best to turn it into an informal theoretic step.

[Editor’s note: a long section has been removed here that duplicates an earlier chapter.]

If I am right in thinking that the basic human language-making action consists in dreaming up fans (that is, in first evolving logically primitive, i.e. general and indeterminate language symbols, and then, in explanatory talk, specifying for them more and more contexts), it will follow that the various devices for specifying word use in any language will be the logically primary devices of the language. And so, they are: the pointing gesture, the
logical proper name, ‘Here!’, ‘Now!’, ‘This!’, the defining phrase, all these are logically far more basic than case systems or sentence connectives. In short, in asking for the kind of context specifications that I am looking for, what I am after is the most logically primitive form of definition. This can be obtained instantly the moment it is seen that the basic characteristic of definitions is that they do not define. They distinguish, just as a pointing gesture does, but they do not distil. Except possibly in mathematics, which we are not now talking about, you can never go away hugging your definition to your breast, and saying, ‘Ah, now I’ve got THE meaning of that word!’ As soon as one has thought this thought, one achieves liberation, in that one ceases to look for merely one kind of definition. One lifts one’s eyes and says, ‘Well, how do people distinguish word uses from one another?’
1. They do it by gestures, especially when they do not know the language. (We will not go further into this, now.)
2. They do it by explanatory phrases: ‘“Father” usually means “male parent”. But it doesn’t always. “Father” can mean any venerable person. The Catholics use it as a name for priests’, and so on.
3. They do it by actually showing the word in the use that they want to distinguish. ‘“Rich” means “humorous”; have you never heard the phrase “That’s rich”?’ It is upon this fact, namely, that exhibiting a word in collocation is one well-used type of context specification, that scientific linguists base their hope of getting meaning distinction from texts. Well, they may; and this would give us at once an empirical definition of context; but they have not yet. The kind of difficulty I believe they are up against can be exemplified by the way in which I learnt the meaning of ‘That’s rich!’ I learnt it when a spasm of laughter at a joke suddenly convulsed me; and someone else, who was also laughing, said ‘That’s rich’. In other words, I connected the phrase ‘That’s rich’ with a kinaesthetic sensation, that is, with an extralinguistic context, not an intra-linguistic one. The fact that ‘rich’ occurs in this sense, often in the collocation ‘That’s . . .’, was, and is, irrelevant to my distinguishing this meaning of ‘rich’.
4. They do it by compiling lists of synonyms: ‘Father, male parent, male ancestor’. This is a special form of procedure 2, and in my view it is a perfectly valid convention of definition. Why should you not just group overlapping word uses, and then say no more, instead of giving each a lengthy explanation?
5. They do it by juxtaposing analogous sentences. This is the method used by what is currently called, derisively, ‘Oxford philosophy’; that is, by the current school of philosophers of ordinary language.
The thesaurus as a tool for machine translation
If we now recall the whole argument of section 1, it will be clear that the kind of specification that will give our fan, or any set of fans, a context law, is the synonym-compiling device given above under (4). If the synonyms in such groupings were complete synonyms, the device would be no use to us; they are not. They are distinguished one from another, by being, for example, more colloquial, by being, for example, pejorative or approbative, or more intensified versions of one another; and the groupings are distinguished by sentential function. In short, the synonyms in synonym groupings are compared to one another and distinguished from one another in terms of specifications by heads, syntax markers, archeheads . . .
To sum up: whether you decide that context, in this sense, is an empirical notion, will depend firstly on whether you think that the five forms of definition that are given above are logically equivalent, and secondly, whether you think that any one (say, (3), or even (5)) could be explored by detailed research methods to throw light upon (4). If you think that either could, you will be empirically satisfied; and even if you do not think this, you need not be ultimately dissatisfied, if a context system successfully built of language fans achieves mechanical abstracting or MT. For basically, a word use in context is something that you ‘see’ . . .
2.2. Heads
1. It should be possible, by taking the notion of fans, to construct a generalised and weaker version of Brouwer’s calculus of fans. If this could be done, then Brouwer’s fan theorem, which in classical form is the stop-rule theorem in Koenig’s Theory of Graphs, will provide a theoretic definition of head. 2. The question has to be discussed as to whether the totality of contexts in a language form a continuum, in view of the fact that the set of contexts of any word appear to form a discrete set. That is to say, if a word is being used in one way, it is not being used in another. The uses of a word do not ‘fade into’ one another; new uses continually appear, but the set of them is discontinuous. As against this, I can see no way of imagining the total set of concepts of a language (i.e. the set of the total possible continually increasing dictionary entries of all the words) except as a Brouwerian continuum. Because of this, my present view is: make a continuum (Brouwer’s is the only true continuum) and then use the context law to wrinkle it afterwards. 3. The question has to be discussed with context; contexts, or word uses, look very empirical until they are subjected to analysis, when it turns out that you have to ‘see’ them. Heads, on the other hand, gain empirical solidity the more the notion of extra-linguistic context is analysed,
and the more thought is given to the practical necessity of accounting for human communication. (Roughly: something must be simple and finite, somewhere.) Probably, perversely, I have hopes of confirmation for this part of the theory coming from research in cerebro-physiology. Philosophically, it comes to this: the fundamental hypothesis about human communication which lies behind any kind of thesaurus making is that, although the set of possible uses of words in a language is infinite, the number of primary extra-linguistic situations that we can distinguish sufficiently to talk to one another in terms of combinations of them is finite. Given the developing complexity of the known universe, it might be the case that we refer to a fresh extra-linguistic situation every time we create a new use of a word. In fact we do not; we pile up synonyms, to re-refer, from various and differing new aspects, to the stock of basic extra-linguistic situations that we already have. It takes a noticeable new development of human activity (e.g. air travel) to establish so many new strings of synonyms in the language that in the thesaurus, Aerial Motion may conveniently be promoted from being a sub-head of Travel to being a new head in its own right; and even then, if inconvenient, the promotion need not be made. The primary noticed universe remains more stable than do continually developing sets of uses of words; in fact, all that ever seems to take place in it, in the last analysis, is a reorientation of emphasis, since the number of heads in any known thesaurus never increases beyond a very limited extent. The importance of this fact for MT is obvious. If the hypothesis is right, communication and translation alike depend on the fact that two people and two cultures, however much they differ, can share a common stock of extra-linguistic contexts. When they cannot come to share such a stock, communication and translation alike break down.
Imagine two cultures: one, say, human, one termite. The members of the first of these sleep, and also dream, every night; the members of the second do not know what sleep is. As between these two cultures, communication on the subject of sleeping and dreaming would be impossible until acquired knowledge of sleeping and dreaming by members of the second culture sufficed to establish it.
2.3. Archeheads
The problem of theoretically describing an archehead involves bringing up the difficult notion of the meaning line.
2.3.1. The problem of the meaning line
It is found in practice that, when points in the thesaurus lattice are very near the top, they become so general
that, by meaning practically everything, they cease to mean anything. Such points will be defined as being ‘above the meaning line’. In practice, we count them, or call them by letters, or by girls’ names (‘Elsie’, ‘Gertie’, ‘Daisy’). Each of these devices (see section 1, above) is, strictly speaking, logically illegitimate, in that it ascribes to such points a type of particularity that they do not have. It is not that they mean nothing: it is that they mean too much. They are, in the logical empiricist sense of the words, metaphysical.
2.3.2. Archeheads must be just below the meaning line
Archeheads are not words that could exist in any language. But they must be sufficiently like words that can be handled in any language to enable them themselves to be handled. TRUE! must be like ‘true’; or at least, TRUE! must be more like ‘true’ than it is like ‘please’. Until lately I was so impressed by this difficulty that I assumed that it was impossible, in practice, to name or handle archeheads. Constructing Richens’ NUDE has convinced me that this can be done. R. H. Richens is thus the discoverer of archeheads, not as theoretic entities (they are in Roget’s Chapter of Contents) but as usable things.
2.3.3. Archeheads, as has been shown by tests on NUDE, have an extremely practical property: they intersect when the thesaurus algorithm is applied to them at just those points where the thesaurus itself lets you down:
e.g. change/where | in (pray:where:part) – CHURCH
This is ‘to go to church’ in NUDE. Notice that the archehead WHERE! is here in common between both entries: although you would never persuade a thesaurus maker to include ‘church’ in a list of places to which people go.
e.g. (cf. Bar-Hillel) in | (man/use)/(in:thing) – INKSTAND
‘in the inkstand’
Notice that the archehead IN! is in common between the two entries, although no thesaurus maker would intuitively think of ‘inkstand’ as an in-thing unless something had brought the fact that it was to his notice.
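The intersection property described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a NUDE formula can be reduced to the bare set of element names it contains, ignoring brackets and bonds; the function names `archeheads` and `common_archeheads` are hypothetical and are not part of NUDE itself. The two formula pairs are the ones quoted above.

```python
import re


def archeheads(formula):
    """Return the set of element names in a NUDE-style formula.

    Assumption: brackets, slashes, colons and other bonding signs are
    simply discarded, so only the bare elements remain.
    """
    return set(re.findall(r"[a-z]+", formula.lower()))


def common_archeheads(formula_a, formula_b):
    """Elements shared by two formulas, i.e. where the entries intersect."""
    return archeheads(formula_a) & archeheads(formula_b)


# 'to go to church': the two entries share WHERE!
church = common_archeheads("change/where", "in (pray:where:part)")  # {'where'}

# 'in the inkstand': the two entries share IN!
inkstand = common_archeheads("in", "(man/use)/(in:thing)")  # {'in'}
```

Reducing a formula to an unordered set of elements throws away its bonding pattern, so the sketch shows only why a small inventory of elements makes overlaps likely; it is not a description of how NUDE formulas are actually processed.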
These intersections, of course, are caused to occur by the fact that, if you have only forty-eight archehead elements to choose from in defining something, the chances go up that descriptions will overlap. In other words, the fewer the heads, the smaller the resolving power of any thesaurus; and the smaller the resolving power of any thesaurus, the greater the intersecting power of the thesaurus. In order to combine a high
resolving power and a high intersecting power, the thesaurus should contain a large number of heads, to secure the first, and, including them, a large number of archeheads, to secure the second. Thus, a thesaurus of forty-eight heads, which is what NUDE can be taken as being if you ignore the sentential connectives, has a very high intersecting power indeed.
2.4. Rows
The problem of making a theoretic description of a row is that this involves making a theoretic description also both of a word, and also of a language. For
1. the rows of a thesaurus consist of words (but these words can be of any length).
2. the totality of rows of the thesaurus (empirically speaking) constitutes the language.
And how do we distinguish here ‘languages’ from ‘language’?
2.4.1. Words
The great difficulty of defining a ‘word’ was discussed by me some years ago [see Chapter 1]. I pointed out there that nobody has, in fact, tackled the problem of defining the notion of a ‘word’ in an intellectually satisfactory manner. Philosophers regard it as being purely a grammatical concept. Traditional grammarians are leaning on what they believe to be the insights of philosophers; modern linguistics professes not to be interested, for it claims that the ‘word’ is in no sense a fundamental notion. So the difficulty is there, in any case. If the thesaurus is to be interlingual, there is no length for ‘word’. As so often, the difficulty of operating within one language mirrors the difficulty of operating between various languages. One’s first impulse is to say, ‘Let a word be any stretch of language, short or long, which, in practice, serves to distinguish a point on the rank of the thesaurus lattice’. But this definition is circular. First, we define the points on Rank V of the thesaurus lattice as being those separable words the contexts of which can be mapped on to the points of Rank IV; then we define the words that go on a thesaurus lattice as language stretches that map on to the points of Rank IV of the thesaurus lattice. I do not see the way out of this difficulty.
2.4.2. Language
2.4.2.1. Language is abstraction. All logicians know this; but they behave as though the ‘fit’ between the abstraction ‘Language’ and any language is so close that the fact that ‘Language’ is an abstraction does not matter.
Nothing could be further from the truth. The proposition ‘Language exists’ is a theoretic one. It is rather like ‘Matter exists’ or ‘God exists’ or, still more, ‘The Universe, considered as a whole, exists’. What is needed is a theoretic definition of ‘a language’. 2.4.2.2. What we know about a language, according to the theory, is that it is a sub-lattice of the total language lattice. The archeheads, the syntax markers, the heads of any given language will be a different subset of the total set, but each will be a subset of the total set. Yes, but suppose what is really different as between language and language (considering now ‘a language’ as well as ‘Language’ as something that is given in terms of the theory) is not that it is made up from a different set of archeheads, markers, heads, but that it is made up of these in different combinations? This would mean that every language was a different lattice, not a sub-lattice of a central total language lattice,3 and that every single language lattice had different rows. The semantic, grammatical and syntactic devices used by any given language would then be imagined as being alike, distinguishable and specifiable in terms of combinations of a set of initially very weak semantic components. These components would be very alike indeed to the weak semantic components that linguists at present use to distinguish components of a system. It has frequently been claimed by linguists, particularly those of the American ‘structuralist’ school, that their subject is a science, based on purely empirical foundations; some have even gone so far as to describe it as a kind of mathematics. However, it is impossible to relate the abstract systems linguists create to any particular linguistic situation without reference to immediate and undisguised concepts. As Kay has said, the moment one asks the most fundamental question of all, ‘What is being said here?’, we must find other apparatus than linguistics provides. 
Thus it is that when Harold Whitehall (1951) writes on linguistics as applied to the particular case of the English language, semantic categories, heads, descriptors – call them what you will – immediately begin to play a leading part. One of the great merits of this book, in my view, is that no apology is made for the introduction of these semantic categories; they do not have to be introduced furtively under the guise of mnemonics for classes established in a more respectable way. The following is an actual table from Whitehall’s book (Whitehall, 1951, p. 72).
3 They will all be sub-lattices of the lattice of all possible combinations, but this lattice is both almost inconceivably large and also empirically irrelevant.
The system of prepositions

RELATION: 1. Location
  Simple (Primary): at, by, in, on
  Simple (Transferred): down, from, off, out, through, up
  Complex: aboard, above, across, after, against, amid, before, beneath, beyond, near, beside, between, next, over, past, under
  Double: inside, outside, throughout, toward(s), underneath, upon, within, without; down at, by, in, on; out at, by, in, on; up at, by, in, on
  Group: in back of, in front of, inside of, on board (of), on either side (of), on top of, outside of

RELATION: 2. Direction
  Simple (Primary): down, from, off, out, through, to, up
  Simple (Transferred): at, by, in, on
  Complex: aboard, about, across, after, against, among, around, between, beyond, over, under
  Double: inside, outside, toward(s), underneath; into, onto; down to, from; off to, from; out to, of, from; up to, from; near to, next to; over to; to within, from among
  Group: in back of, in front of, inside of, on top of, on board (of), on either side (of), outside (of)
So, looking at this fundamental feature of linguistics from a theoretic and thesaurus maker’s point of view, we see that Haugen may have been onto a more important point than he realised when he said (1951): ‘It is curious to see how those who eliminate meaning have brought it back under the covert guise of distribution.’ The discipline that we are here imposing on linguists is that we will not allow them a fresh set of concepts for each system. Their semantic concepts must form a single finite system; and with combinations of them they must make all the distinctions that may turn out to be required within the language. Now, if word and language can be theoretically defined as I have desired to define them, but failed to define them, above, then we can say that a row is a set of overlapping contexts of words in any language, this set being distinguished from all other sets in terms of heads, markers and archeheads, but the members of the set only being distinguished from one another by means of archeheads. To go back to the question of each language being a separate lattice, instead of each being a sub-lattice of a total language lattice: this does not seem to me to matter as long as the lattice transformation that would turn any language lattice into any other is finite and mathematically knowable.
2.4.3. The row is also an empirical unit in a thesaurus. You test for rows, as a way of testing NUDE and the lattice. If a thesaurus or interlingua, when used on any language, produces, when tested, natural-sounding rows and lists that occur as lists in that language, then the thesaurus or interlingua has an empirical basis for that language. If the test produces arbitrary output it has failed.4 The empirical question as to whether in practice rows can be found that are interlingual is discussed, to the extent to which I am able to discuss it, in section 4.
2.5. List numbers
2.5.1. Lists are sets of mutually exclusive contexts, e.g. spade, hammer.
If he hit her with a spade, he didn’t hit her with a hammer. In the sentence, ‘He hit her with a . . .’, either ‘spade’ or ‘hammer’ can be used to fill the gap, but not both. (Contrast the sentence, ‘He was a coward, a craven, a poltroon’.) If one sentence mentions two members of a list, then the two members must be joined by at least ‘and’: ‘He was carrying both a spade and also a hammer’. You can, of course, replace the commas by ‘ands’ in ‘He was a coward, a craven, a poltroon’. But the ‘ands’ won’t mean the same thing here. The list-joining ‘and’ is logically a true Boolean join, ‘and/or’; the synonym-joining ‘and’ is a logical hyphen, a meet, you might say, ‘He was a coward-craven-poltroon’.
2.5.2. Theoretic definition
A list number is a head in the thesaurus with only one term in it; that is, with only one context, or word use in it. Thus, the sub-thesaurus consisting of the members of a list is, and always will be, a spindle. The occurrence of a list number in a thesaurus-using translation programme is a warning that the limit of the resolving power of the thesaurus has been reached.
2.5.3. Algorithm for the translation of list numbers
Take the thesaurus dictionary entry for ‘carrot’. Take also the dictionary entry for ‘parsnips’. These two dictionary entries are saved from being identical by the fact that you can dangle a political carrot in front of someone; and that ‘Hard
4 This test works, too. You know at once when you see the set of cards, whether it is trying to be a list or a row, or whether it is arbitrary.
words butter no parsnips’. So the two words can be distinguished from one another, in the thesaurus, by the fact that they do not have identical dictionary entries. But the two contexts cannot be distinguished from one another when both of them occur in the same row of head VEGETABLE. Suppose we try to translate the following sentence: ‘He was digging up a carrot in his garden’; then the translation algorithm will produce the whole list of vegetables. The only solution is to add to the dictionary entry of carrot and parsnip a list number that is attached to a definite head of the thesaurus (say, VEGETABLE) but does not have to intersect in the intersection procedure. Thus, carrot, as well as having a political head in its dictionary entry, will also have VEGETABLE (139). And parsnip, as well as having a civility and soft-spokenness head in its dictionary entry, will also have VEGETABLE (141). As soon as the translation algorithm gives VEGETABLE as the context, the machine picks up the list numbers. It then brings down the list given under VEGETABLE and brings down the one-to-one translation of carrot into the output language given under (139). In other words, a thesaurus list is a multilingual one-to-one micro-glossary (no alternative variants for any list word being given) in which the different members of the list have different numbers. But the micro-glossary itself must be attached to a given head, because only when it is known that that head gives the context that is being referred to in the input text is it known also that the words in the micro-glossary will be unambiguous. ‘Mass’ can mean ‘religious service’ as in ‘Black Mass’; ‘charge’ can mean ‘accusation’ or ‘cavalry charge’. Only when it is known that both are being used in the context of physics can they be translated micro-glossary-wise, but using their list numbers.
2.5.4. Theoretic problems that arise in connection with list numbers
It might be thought that the theoretic problems of list numbers would be easy. Actually, they are, on the contrary, very difficult; and the philosophy of lists is still most imperfectly understood. Certain things are known:
1. No head must contain more than one list; otherwise the procedure5 will not tell you which list to use. If you want more lists, you must have more heads.
2. One word, however, can figure in several lists.
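The list-number procedure of 2.5.3, together with the two constraints just stated, can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `DICTIONARY`, `MICRO_GLOSSARIES` and `translate_list_word`, the head labels POLITICS and CIVILITY, and the Italian target words are hypothetical; only the head VEGETABLE and the list numbers 139 and 141 come from the text. The step that establishes the context head (the intersection procedure) is assumed to have been carried out already.

```python
# Dictionary entries: ordinary heads, plus list numbers keyed by head.
# Keying list numbers by head lets one word figure in several lists.
DICTIONARY = {
    "carrot": {"heads": {"POLITICS", "VEGETABLE"},
               "list_numbers": {"VEGETABLE": 139}},
    "parsnip": {"heads": {"CIVILITY", "VEGETABLE"},
                "list_numbers": {"VEGETABLE": 141}},
}

# One micro-glossary per head (no head carries more than one list).
# Each list number gives a single one-to-one translation per language.
MICRO_GLOSSARIES = {
    "VEGETABLE": {
        139: {"it": "carota"},      # illustrative target word
        141: {"it": "pastinaca"},   # illustrative target word
    },
}


def translate_list_word(word, context_head, target_language):
    """Translate a list word once the context head is already known.

    Returns None when the word has no list number under that head,
    i.e. when the micro-glossary for that head does not apply.
    """
    number = DICTIONARY[word]["list_numbers"].get(context_head)
    if number is None:
        return None
    return MICRO_GLOSSARIES[context_head][number][target_language]


# Context VEGETABLE has been established for
# 'He was digging up a carrot in his garden'.
result = translate_list_word("carrot", "VEGETABLE", "it")
```

The list number takes no part in finding the context; it is consulted only afterwards, which is why the lookup requires the head as an argument.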
5 The difficulty is a coding one; methods may perhaps be found to associate a list with a combination of heads.
3. The list procedure, unlike the translation algorithm, gives a single translation.
But none of us really knows when to compile a list and when not; and what is the principle uniting the words in a micro-glossary. If the arguments of the above sections had been fully filled out, and if all the theoretic difficulties arising from them had been adequately encountered, this would be the end of the theoretic part of this paper. In the two sections immediately following, the problems brought up for discussion are much more empirical problems.
3. Kinds of thesaurus
Bar-Hillel, and other critics, have asserted that the CLRU uses the word thesaurus in a variety of different senses, thus causing confusion. This criticism must be admitted as correct. It can also be correctly replied that these senses are cognate, and that different senses of ‘thesaurus’ are being used, because CLRU is experimenting with different kinds of thesauruses. The purpose of this section is to enumerate and describe the kinds of thesaurus, so that the difficulty caused by past inexplicitness may be overcome. All the kinds of thesaurus, which are used in the Unit, can be taken as being partial versions of the total thesaurus model defined in Section 1 above. This provides the unifying theoretic idea against which the various examples of partial thesauruses should be examined. The senses in which ‘thesaurus’ has been used, apart from the total sense of Section 1, are:
1. A natural thesaurus – e.g. Roget
2. A term thesaurus – e.g. that associated with the CLRU Library Scheme
3. An interlingua – e.g. Richens’ interlingua
3.1. The natural thesaurus
For most English-speaking people, this is exemplified by Roget’s Thesaurus of English Words and Phrases (London, 1852 and later). In this document, words are grouped into 1,000 heads or notional families; words often coming into more than one head. An index at the back contains an alphabetical list of words with the numbers of the heads in which they come. There are, however, a number of other such documents: 1. ‘Copies’ of Roget in some six other languages. 2. Synonym dictionaries. These are alphabetical lists of words with a few synonyms or antonyms attached. Heads could be compiled from these, but prove inadequate in practice.
3. Ancient thesauruses. Groupings in languages (Chinese, Sanskrit, Sumerian) where alphabetical dictionaries are ruled out by the nature of the script have been found to have thesauric properties, though they may be sometimes overlaid by the groupings round graphically similar characters. The best known of these is the Shuo Wen ancient Chinese radical dictionary. While natural thesauruses have the advantage for experimental purposes of actually existing in literary, or even in punched-card, form (for which reason all CLRU thesauric translation tests have been made on them), they suffer from serious drawbacks imposed in part by the necessities of practical publishing. These drawbacks may be listed as follows: 1. The indices are very incomplete. It seems that publishers insert only some 25% of the available references to the main texts since, if they insert more, the resulting volume is too heavy to publish. As for testing and mechanisation purposes, by far the most convenient way of using the thesaurus is to compile it from the index, this is a very considerable research defect. 2. Since the main purpose of thesauruses published in book form is to improve the reader’s knowledge of words, they tend to leave out everyday and ordinary words, and to insert bizarre and peculiar words that will give users the feeling that their wordpower is being increased. For translation purposes, the opposite is what is required. 3. In Roget, the ‘cross-references’ from one head to another are very incomplete and unsystematic. Their insertion causes an even greater inadequacy of the index; their omission, an even greater dearth of ordinary words in the heads. 4. The heads themselves are classified, in the chapter of contents, by a single hierarchy, in tree form; whereas what is required is a multiple hierarchy of archeheads. The cross-references between heads provide the rudiments of an alternative classification; but this is too incomplete to be of use. 
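Drawback 1 turns on compiling the thesaurus from its index, which amounts to inverting a word-to-heads mapping into a heads-to-words mapping. A minimal sketch follows, assuming the index is available as a mapping from each word to the numbers of the heads in which it comes; the function name `compile_heads` and the sample head numbers are hypothetical and are not taken from Roget.

```python
from collections import defaultdict


def compile_heads(index):
    """Invert an index (word -> head numbers) into heads (number -> words)."""
    heads = defaultdict(set)
    for word, head_numbers in index.items():
        for number in head_numbers:
            heads[number].add(word)
    # Sort the words so that each compiled head has a stable order.
    return {number: sorted(words) for number, words in heads.items()}


# Illustrative index fragment; the head numbers are invented.
sample_index = {
    "father": [1, 2],
    "priest": [2],
    "parent": [1],
}
compiled = compile_heads(sample_index)
```

Because each head is rebuilt only from the references the index happens to contain, any reference omitted from the index is missing from the compiled head as well, which is why an incomplete index is so serious a defect for this way of working.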
All these deficiencies may be discovered by simply opening and reading an ordinary Roget. More recondite characteristics of the existing document were brought to light by tests of various kinds. 5. The cross-references from head to head tend to be symmetrical: that is, a head that has a great many cross-references from it is likely to have a great many cross-references to it. 6. The intersection procedure, as in ‘Agricola . . .’ failed to work even when reasonably predictable common contexts were present, in an attempted translation from English to English. This was almost certainly because the common possibilities of word combination in the language are not in it. (See Section 2, on archeheads).
7. The thesaurus conceived as a mathematical system was exceedingly redundant, and when this redundancy was investigated further it was found that this was because of the presence of a large unordered profactor in the lattice containing the thesaurus (Parker-Rhodes and Needham, 1962). This was tantamount to saying that the thesaurus at present existing had a great deal less usable structure than would at first sight appear.
8. Some of the heads can be shown by tests to be arbitrary. Most of the arbitrary heads are artificial contraries of genuine heads.
As a result of all these characteristics, although the idea of a thesaurus is sometimes most conveniently defined by displaying Roget as a particular example, it becomes clear that existing thesauruses are very unsuitable for MT work. However, it is possible from the defects above to obtain a fairly precise idea of the changes that are necessary to make a usable thesaurus for mathematical treatment. It is likely that for some time to come experiments will make use of the natural thesauruses with changes made to remedy particular defects, rather than with an entirely new thesaurus, which would require a major effort of skilled lexicography, which will in turn require a considerable time to carry through.
3.2. The term thesaurus
The term thesaurus is exemplified by the thesaurus used for the CLRU Information Retrieval System (Joyce and Needham, 1958). It was invented to deal with a situation where a large number of new technical terms had to be handled that were not to be found in any existing thesaurus (or, for that matter, any dictionary). Also, it was required for reasons set forth in Joyce and Needham (1958) that all terms should be retained as individuals as well as being incorporated in heads, while, nonetheless, all reasonable heads should be used. The structure thus set up is a very detailed one, with a large number of levels. There is no formal distinction between heads and terms, and the thesaurus (which is sufficiently small and can actually be drawn on a rather large piece of paper) appears as a multiple hierarchy of points representing words. The point representing word A appears above the point representing word B, if the uses of A are a set of contexts including those of the word B. In many parts of the system, this corresponds to a straightforward subject classification, which is clearly a subcase of the whole. It will be seen that since each word is treated entirely individually, the degree of detail of the system is rather greater than that of natural thesauruses; the term thesaurus can cater for relations of considerable complexity between words that would simply fall under a head together in the natural thesaurus.
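The ordering rule just given (A appears above B if the uses of A include those of B) can be made concrete with a small sketch. The terms, the context labels and the names `CONTEXTS`, `is_above` and `terms_above` are all hypothetical; the sketch also assumes strict inclusion, so that two terms with identical contexts are not placed above one another.

```python
# Hypothetical data: each term is mapped to the set of contexts (uses) it has.
CONTEXTS = {
    "mathematics": {"c1", "c2", "c3"},
    "algebra": {"c1", "c2"},
    "topology": {"c3"},
}


def is_above(a, b, contexts=CONTEXTS):
    """True if the uses of term a include all those of term b, and more."""
    return contexts[a] > contexts[b]  # proper superset of contexts


def terms_above(term, contexts=CONTEXTS):
    """All terms lying above the given term in the hierarchy."""
    return sorted(t for t in contexts
                  if t != term and is_above(t, term, contexts))
```

For example, `terms_above("algebra")` yields `["mathematics"]`. Nothing prevents a term from having several unrelated terms above it, which is what makes the structure a multiple hierarchy rather than a tree.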
The operation of this kind of system is discussed in detail in Parker-Rhodes and Needham, 1962. There is some advantage, however, in here discussing it again, in order to consider the relation of the system to the other kinds of thesauruses. Firstly, it is clear that the higher terms are functioning as something very like heads (or even archeheads), as well as functioning as words in their own right. It has appeared that this phenomenon has in some cases seriously warped the lattice in the sense that a term high up (e.g. mathematics) carries so much weight by virtue of the many terms that it includes that it no longer functions efficiently as the term associated with its word (e.g. ‘mathematics’). This defect may be corrected by using a device; however, it indicates that the treatment of all words and heads pari passu may be incorrect. Secondly, the system is excessively cumbrous through the great number of its terms; in an anxiety not to lose information from the system, uncomfortably large amounts have been kept, much of which is unlikely to be required. Now this was an anxiety not to lose it by absorption of words into entirely intuitively based heads. The intuitively based heads are there expressed by the inclusion system of the lattice; but the original and detailed information is there too. It is at present intended to conduct experiments on the mechanical reconstruction of the retrieval thesaurus, which experiments are expected to throw considerable light on the relations between the term thesaurus and the natural and total thesauruses, and also to throw more light on the structure of the latter. The basis of these experiments is the idea that words that can properly be amalgamated in a head should have the property of tending to occur together in documents; if the heads are built up on this principle, the loss of information through replacing the word by the head will be minimised.
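The tendency of two words to occur together in documents can be given a number in many ways. One simple choice, offered here only as an illustrative assumption, is the ratio of the documents containing both words to the documents containing either, which is the set form of the Tanimoto coefficient. The function name `similarity` and the sample document numbers are hypothetical.

```python
def similarity(docs_a, docs_b):
    """Shared documents divided by all documents containing either term.

    Each argument is the set of identifiers of the documents in which
    a term occurs. Returns 0.0 when neither term occurs anywhere.
    """
    union = docs_a | docs_b
    if not union:
        return 0.0
    return len(docs_a & docs_b) / len(union)


# Illustrative occurrence data: term -> documents in which it occurs.
occurrences = {
    "lattice": {1, 2, 3},
    "thesaurus": {2, 3, 4},
    "carrot": {5},
}

close = similarity(occurrences["lattice"], occurrences["thesaurus"])  # 0.5
apart = similarity(occurrences["lattice"], occurrences["carrot"])     # 0.0
```

On this measure, terms with high mutual values are candidates for amalgamation into one head, since replacing either by the head loses little information about which documents are meant.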
This naturally gives rise to a measurement of the extent to which pairs of words tend to occur in the same documents, which will be called their similarity. In order that experiments may be made to see whether this line of thought is at all profitable, two things are necessary:
1. An algorithm for calculating on some agreed basis in the data what the similarity of a pair of terms shall be.
2. An algorithm for finding, from the total set of terms, subsets that have the property that the similarity between their members is high compared with the similarity between members and non-members.
Several algorithms of the type 1 are available. Probably the simplest is that described by Tanimoto (1937). This may be exactly described if the agreed basis for computation is the description of documents by their term abstracts. The search for an acceptable rigorous definition and consequent algorithm (2) is being carried on by several workers under the name of
research into The Theory of Clumps.6 This is not the place for an extensive discourse on the progress to date in this field; however, various attempts exist. It is shortly intended to carry out by means of a computer an exhaustive examination of a simple case to compare them. If the results of this are satisfactory, tests will be conducted on parts of the CLRU Library Scheme, the general principle being as follows. An already-existing classification of the terms will be used as a kind of ‘trial set’ of heads. On the basis of similarities of terms computed on an increasing number of documents, these heads will be examined for satisfaction of the ‘clump criterion’ (as the rigorised definition 2 is called) and altered so that they satisfy it as far as possible. These altered heads will then be used for retrieval.
3.3. Interlinguas
An interlingua means here:
1. A thesaurus consisting solely of the archeheads of Section 1.
2. A thesaurus with a procedure for finding syntactic structure.
If the syntactic structure procedure is regarded as something super-added to the thesaurus, Richens’ NUDE is an interlingua in the present sense. If the bonding[7] be disregarded, the forty-eight elements seem very like archeheads and would give rise to a lattice structure with much less resolution than a whole thesaurus, but with an additional intersecting power.[8] An Italian-NUDE dictionary of some 7,000 chunks has been made at CLRU, and various tests on it have been performed. Since, however, only a small part of the dictionary has been key-punched, the tests have had to be limited, and particular ones directed to examining the internal consistency of the NUDE entries for Italian. Typically, a set of near-synonyms was found from an Italian synonym dictionary, and their NUDE equivalents found. These would come from different parts of the dictionary, and were usually made by different people; the object of the exercise was to see whether the entries were widely divergent. While the tests sometimes brought out errors or considerable differences of interpretation, in general support was given to the objective character of NUDE as an interlingua. These tests are to be continued and the detailed results written up.
While NUDE conforms to the definition of a partial thesaurus, it suffers from the drawback that it has so far proved impossible to attach a quantitative measure to the extent to which one NUDE formula is like another. If all brackets and bonds are removed so that the measures used in the total thesaurus may be applied, the results are unsatisfactory, since much of the character of a word resides in its bonding pattern. The discovery of a procedure for ‘inexact matching’, as it is called, is a matter for present research on NUDE, and when some progress has been made in it, it will be possible to repeat in a more cogent manner the tests on near-synonyms described above.
On the other hand (though this is not a thesauric property) the fact that every NUDE formula has a unique, though simplified, sentential or phrase structure is of the greatest help when NUDE is used for translation. This is a characteristic which every attempt is being made to simulate in the full thesaurus, by establishing convertibility between the NUDE sentential signs and certain combinations of elements in lattite.[9] No tests, however, have been done on the lattite as yet, so NUDE remains the Unit’s MT interlingua. The following Italian–English translation trial, done on a randomly chosen paragraph with a dry run, probably gives a fair idea of what its translating power is. It is hoped in the not too far distant future to put NUDE on a big machine, in which case large-scale Italian–English output could be obtained.
SPECIMEN TRANSLATION
Italian → Interlingua → English
Input
Il colore della farina, caratteristica cui nel commercio si attribuisce assai grande importanza, dipende essenzialmente dalle sostanze coloranti naturali presenti nella stessa farina. Pero sul colore varie cause accessorie influiscono e soprattutto la presenza di sostanze scure estranee. La granularità stessa della farina ha un effetto sul colore, giacche i grossi proiettano un ombra che da alla farina una sfumatura bluastra. (Genetica Agraria 1 (1946): 38)
Output
The colour of the caratteristic flour of which very big importance is thought in connexion with commerce is conditioned naturally by the color natural present substance in the same flour. But different accessor causes and especially presence of dark estrane substances influence the colour. The same granul-ness of the flour has a effect in connexion with the colour because the big granul-s proiett a shade that gives the flour a bluish sfumatur.
NB. 1 – Words in italics did not occur in the dictionary used.
NB. 2 – In the above translation characteristic, which did not occur in the dictionary, was taken as an adjective. The correct interpretation is indicated by the comma, which precedes the word instead of following it. Since commas are used so diversely, they have not been exploited in the present programme.
From the above accounts, it will be clear that, though we are indeed at fault in having used ‘thesaurus’ in our reports in different senses, yet these senses are more cognate than might at first sight appear.
[Figure 27: lattice diagram with the nodes 100 Language, Natural Philosophy, 562 Discovery, 101 Languages, 266 Machines, 244 Machine Translation, 104 Linguistics, 652 Analysis, Language Analysis, Machine Translation Analysis, 545 Linguistic Analysis and Linguistic Machine Translation Analysis.]
[6] The term ‘clump’ was invented by Dr I. J. Good.
[7] NUDE is described below in Section IV.
[8] cf. Section II, 3, above.
[9] MMB intended by the term ‘lattite’ a very restricted lattice consisting only of the terms at the very top of the thesaurus lattice, terms she sometimes called ‘archeheads’. She explains the term a little more at the end of this chapter [Ed. note].
4. To what extent is a thesaurus interlingual?
The extent to which any thesaurus is interlingual is, in practice, one of the most difficult possible questions to discuss. For two questions, which should be separate, always become inseparable. Firstly, ‘What would it be like for a thesaurus to be, or not to be, interlingual?’ And secondly, ‘So long as one and only one coded mathematical structure is used as the intermediate vehicle for translation, does it matter if it is, to a certain extent, arbitrary?’
4.1. The search for head-overlap between thesauruses in different languages
The obvious first way to go about considering this double-headed question is to ask whether thesauruses exist for many different languages and, if they do, whether there is an overlap in their heads. The immediately obtainable answer to this question is apparently most encouraging. Thesauruses with heads directly taken from Roget do exist in French, German, Hungarian, Swedish, Dutch, Spanish and Modern Greek.[10] This transference, and especially the transference into Hungarian, constitutes a high testimonial to the heads of Roget, unless the heads in the first place could safely be arbitrary.
Now a procedure has been devised to test arbitrariness in heads. It was devised by Gilbert W. King, and was tried out on three subjects at IBM Research Laboratory, Yorktown Heights, New York in November 1953. The heads selected were cause, choice and judgement. The words from these heads were separately written on different slips of paper. Fifty per cent of them were left in piles to ‘define’ the heads; the titles of the heads were not made known to the subjects. The other 50% of the words were shuffled and given to the subjects, who had to separate them back into their correct heads. All three subjects proved able to do this with over 95% accuracy. Moreover, they all titled the three heads correctly, and a misprint, ‘usual’ for ‘casual’, was without difficulty detected. Finally, a later attempt by one subject (the present author) to repeat the test with the three heads existence, substantiality and intrinsicality failed; words like ‘real’, ‘hypostatic’, ‘evident’, ‘essential’, ‘concrete’, ‘matter of fact’, ‘truth’ and so on cannot be identified as belonging to any one, rather than any other, of the three. So it seems at first sight as though we have succeeded in contriving a simple and effective test for head-arbitrariness.
It is all the more disconcerting, therefore, to find that it is the arbitrary heads, as well as the empirically founded ones as judged by this test, that are blithely transferred from Roget thesaurus to Roget thesaurus. Let us next consider, in the search for head-overlap, the extant thesauruses that have not derived their heads from Roget. There are, for instance, Der Deutscher Wortschatz nach Sachgruppen geordnet by Franz Dornseiff, the Dictionnaire analogique by M. C. Maquet, and various alphabetically ordered synonym dictionaries covering most of the European languages. These are encouraging to look at not only because there is a very considerable head-overlap between them and Roget, but also because the Roget heads that they have dropped are not the heads that it is likely that the test for genuineness, described above, would give as arbitrary.
There is less overlap, as one would expect, between heads of the ancient thesauruses and the modern ones. By the time one has documented oneself on the Amari Kosha and the Shuo Wen, however, and ignored the rumour that there is a Sumerian thesaurus, and has asked why a hieroglyphic thesaurus has not been found, when they obviously had to have one, one is beginning to revive from one’s first discouragement. One thing is clear: thesaurus making is no evanescent or fugitive human impulse. It is, on the contrary, the logically basic principle of word classification; the same principle as that which inspired the age-old idea of scripting a language by using pictographic or ideographic symbols. So, surely, something can be done to relate thesauri? Something that does not presuppose a complete cynicism as to the empirical foundation of the nature of the heads?
[10] This information, together with other information used in this section, comes from Der Deutscher Wortschatz.
4.2. The procedure of comparing rows and lists
In the special section on heads, above, it was asserted that heads, by their nature, must represent frequently noticed extra-linguistic contexts. It follows from this that it is contexts, not facts, that are being classified and that the heads of a language are only the language users’ frequently noticed set of extra-linguistic contexts, not the total possible set of extra-linguistic contexts. It follows that encyclopaedic knowledge of all facts is not required by a thesaurus maker, before he or she can assign word uses to heads, but only a thorough knowledge of the contexts of the language.
This is all right in a theoretical exposition. As soon as one changes, however, even in one’s mind, from the very general word ‘context’ to the more easily understandable word ‘situation’ (thus replacing ‘extra-linguistic context’ by ‘extra-linguistic situation’) then it becomes apparent that a sharper, smaller interlingual unit than that of a head is what is for practical purposes required. Consider, for instance, the comparable head paragraphs, taken from an English, a French and a German thesaurus respectively, and given below:
1. English: from Roget’s Thesaurus: Head 739: Severity. N. Severity; strictness, formalism, harshness, etc. adj.; rigour, stringency, austerity, inclemency, etc. 914a; arrogance etc. 885; arbitrary power; absolutism, despotism; dictatorship, autocracy, tyranny, domineering, oppression; assumption, usurpation; inquisition, reign of terror, martial law; iron heel, iron rule, iron hand, iron sway; tight grasp; brute force; coercion, etc. 744; strong hand, tight hand.
2. French: Dictionnaire analogique, edited by Maquet: catchword Dur. (The catchwords are not numbered, being listed alphabetically.) Dur d’autorité. Se faire craindre. Sévir, sévices. Maltraiter. Malmener. Rudoyer. Traiter de Turc à More. Parler en maître. Parler d’autorité. Ton impératif. Ne pas badiner. Montrer les dents. Cassant. Rembarrer. – Discipline. Main de fer. Inflexible. Rigide. Sévère. Strict. Tenace. Rigoureux. Exigeant. – Terrible. Tyrannique. Brutal. Despotique. – Rébarbatif. Pas commode. Grandeur. Menaçant. Cerbère. Intimider.
3. German: Deutscher Wortschatz, Head 739: Strenge. Härte. Unerbittlichkeit. Unerschütterlichkeit. Hartherzigkeit. Herzenshärtigkeit. Grausamkeit. Rücksichtslosigkeit. Gemeinheit. Unduldsamkeit. (Intoleranz). Rechthaberei. Unnachsichtigkeit.
From a comparative inspection of these paragraphs two things become clear. Firstly, it is clear that the paragraphs are not interlingual, though the heads pretty exactly correspond; secondly, that the words could be rearranged so as to make the three paragraph-structures correspond a great deal more closely than they at present do. Moreover, there are two classificatory devices that could be employed here; firstly, that of getting the words of the same part of speech next to one another (and as has been already hinted in Section 2, the relevant parts of speech in this particular case are by no means as purely monolingual as they look); secondly, the further device of classifying words of the same part of speech by their ‘feel’ (or aspect). ‘Traiter de Turc à More’, for instance, and ‘rule with an iron hand’ are both concrete images, both continuous processes, both pejoratives, both phrases indicating violence, both phrases describing a social habit of human beings.
All these aspect-indicators are interlingual; there will not be a large class of word uses in either language that have all of them; together with the head-reference, which in this case is very highly interlingual, they may well jointly specify a single interlingual point. Nor is the comparative example that I have just given in any way exceptional; on the contrary, many paragraphs correspond more closely than these three.
Comparative perusal of thesauruses, then, shouts out for an interlingual way of defining paragraphs and aspects; and that without any concessions to preconceived theory. And if one is now determined not to be theoretical, the obvious method to start streamlining paragraphs is in one’s own language; and the way to do this, in each case, is to coin a descriptive phrase.
Roget’s rows | Discursive description of row
whiteness | people think of the abstract notion of WHITENESS; a colour
snow, paper, chalk, milk, lily, ivory; white lead, chinese white, white-wash, whitening | white concrete objects, both solid and liquid
render white, blanch, white-wash, silver, frost | the action of causing something to become white
white; milky, milk-white, snow-white, snowy, candid | people see objects having a white appearance
white as a sheet; white as the driven snow | concrete whiteness of colour being used to symbolise mental states of FEAR, INNOCENCE
vision, sight, optics, eye-sight | the faculty of seeing
visual organ, organ of vision, eye | the part of the body with which a man sees
eye-ball, retina, pupil, iris, cornea, white | list of parts of the eye
abject fear, funk | people exhibiting this
white feather, faint-heart, milk-sop, white liver, cur, craven | picturesque statements of the appearance and physiology of people exhibiting COWARDICE
faint-hearted, chicken-hearted; yellow, white-livered | people abusing their fellows in concrete terms for exhibiting COWARDICE
etc. | etc.
Above is an extract from an attempt by me to use this method to define a set of sub-paragraphs in Roget’s Thesaurus that contain the word white. If it is desired to test my descriptions against other possible descriptions, all that is required is to cover up the right-hand column in the table above, make your own set, uncover the column again, and compare.[11] The question which this leads us to ask is twofold: (1) could the descriptions in the right-hand column be expressed in an arbitrarily chosen language? (I think they could.) (2) could a limited vocabulary be found for expressing them, which itself could be translated into any language?
[11] It will be noticed that many of the row descriptions are verbal phrases, not noun phrases. The frequent use of these may be my personal idiosyncrasy, though the frequent appearance of such phrases in NUDE entries also suggests otherwise; if the tendency to use verbal phrases for row definition is a natural one, then the criticism that a ‘thesaurus’ is a system consisting only of nouns (G. W. King) is unfounded.
This limited vocabulary is what we hope Lattite is. Lattite is the set of translatable mutually exclusive subsets of syntax-markers and archeheads that is being used on the thesaurus at present being multiply punched onto cards. The reason why I am at present very coy about issuing definite lists of Lattite markers and archeheads is that until this thesaurus has been constructed and tested, it will be impossible to discover which of the Lattite terms turn out to define aspects, and which paragraphs and rows. Instead of Lattite, therefore, I propose to discuss NUDE, the simpler interlingua with two sentential connectives, forty-eight elements and two list-numbers, and nothing else at all. (And again, the spectral question lurks in our minds: Suppose, whether using Lattite or using NUDE, different compilers give wholly different descriptions of the content of a row, either because they mistranslate some term of Lattite when operating Lattite in their own language, or because they ‘see’ the content of a row in a way different from that in which other compilers ‘see’ it. Suppose this happens. Does it matter? Surely it does.) [ . . . ]
Part 3
Experiments in machine translation
6
‘Agricola in curvo terram dimovit aratro’
This chapter examines a first-stage translation from Latin into English, with the aid of Roget’s Thesaurus, of a passage from Virgil’s Georgics. The essential feature of this program is the use of a thesaurus as an interlingua: the translation operations are carried out on a head language into which the input text is transformed and from which an output is obtained. The notion of ‘heads’ is taken from the concepts or topics under which Roget classified words in his thesaurus. These operations are of three kinds: semantic, syntactic and grammatical. The general arrangement of the program is as follows:
1. Dictionary matching: the chunks of the input language are matched with the entries in a Latin interlingual dictionary giving the raw material of the head language; this consists of heads representing the semantic, syntactic and grammatical elements of the input.
2. Operations on the semantic heads: these give a first-stage translation.
3. Operations on the syntactic heads: giving a syntactically complete, though unparsed, translation.
4. Operations on the grammatical heads: giving a parsed and correctly ordered output.
5. Cleaning up operations: the output is ‘trimmed’ by, e.g., insertion of capital letters, removal of repetitions like ‘farmer-er’.
Only Stage 2 of the procedure is given in detail here.
1. Information obtained from Stage 1
The Latin sentence to be translated was chunked as follows:
AGRI-COL-A IN-CURV-O TERR-AM DI-MOV-IT AR-ATRO
A number of these generated syntactic heads only. Those with semantic head entries are AGRI-, -COL-, IN-, -CURV-, TERR-, DI-, -MOV- and AR-. The interlingual dictionary entries for each chunk were constructed by a transformation into thesaurus heads of the information given in Lewis’ Latin Dictionary for Schools for all words containing the chunk in question. This can be followed by comparing the semantic head sets and the dictionary entries taken from Lewis’ Dictionary given below.
Sample head set, with the Lewis’ Dictionary entries from which it was made
AGRI-: 181 REGION, 189 ABODE, 371 AGRICULTURE, 780 PROPERTY
AGER, GRI, . . . I. In a restricted sense, improved or productive land, a field, farm, estate, arable land, pasture etc: II. In an extended sense. A. Territory, district, domain, the soil belonging to a community. B. the fields, the open country, the country. C. Poet, plain, valley, champaign: [quotes].
AGRICOLA, AE. . . . I. Prop. a husbandman, agriculturer, ploughman, farmer, peasant: [quotes]. II. Praegn. a rustic, boor, clown: [quotes].
Semantic head sets of the input text given by the interlingual dictionary
AGRI-: 181 REGION, 189 ABODE, 371 AGRICULTURE, 780 PROPERTY
-COL-: 188 INHABITANT, 186 PRESENCE, 758 CONSIGNEE, 371 AGRICULTURE, 342 LAND, 876 COMMONALTY
IN-: 54 COMPOSITION, 176 TENDENCY, 176 INTERIORITY, 232 ENCLOSURE, 247 CONVOLUTION, 264 MOTION, 259 FURROW, 278 DIRECTION, 286 APPROACH, 294 INGRESS, 300 INSERTION
-CURV-: 244 ANGULARITY, 245 CURVATURE, 279 DEVIATION
TERR- (1): 181 REGION, 211 BASE, 318 WORLD, 342 LAND, 673 PREPARATION
TERR- (2): 668 WARNING, 669 ALARM, 378 PAIN, 860 FEAR, 887 BLUSTERER
DI-: 44 DISJUNCTION, 49 DECOMPOSITION, 91 BISECTION
-MOV-: 371 AGRICULTURE, 61 DERANGED, 140 CHANGE, 264 MOTION, 673 PREPARATION, 615 MOTIVE, 824 EXCITATION, 49 DECOMPOSITION, 44 DISJUNCTION, 259 FURROW
AR- (1): 371 AGRICULTURE, 259 FURROW, 247 CONVOLUTION, 876 COMMONALTY
AR- (2): 340 DRYNESS, 384 CALEFACTION
AR- (3): 1000 TEMPLE, 903 MARRIAGE
2. Discursive description of the set of operations used on semantic heads (i.e. Stage 2 of the translation procedure)
2.1. Elimination of unwanted heads by intersection
2.1.1. Standard procedure
It is assumed that those semantic concepts relevant to the sentence to be translated will occur repeatedly (i.e. at least more than once). Selection of the heads representing those concepts could therefore be obtained by an intersection procedure as follows: Each member of the head set representing a chunk is matched in turn with all other heads occurring for other chunks in the sentence. Only those heads occurring twice or more are retained.
2.1.2. Removal of puns
This procedure should eliminate puns: a chunk such as TERR- has two completely different sets of heads, only one of which is relevant in a particular context. The unwanted heads will probably fail to occur elsewhere in the sentence so that only the relevant heads representing the appropriate chunk in question are retained.
2.1.3. Scale of relevance procedure
It may happen that all members of the head set for a particular chunk fail to intersect. In this case, we try to find heads in the rest of the sentence that are closely related to the heads in this set. For purposes of the present test, heads that are within the same bracket in the Table of Contents in Roget’s Thesaurus are regarded as closely related. The procedure is as follows: all the heads occurring in the same bracket(s) of the Table of Contents as those already given for a non-intersecting chunk are introduced; from a practical point of view they are regarded as representing a new chunk in the sentence. The intersection procedure can be carried out as before.
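The standard procedure of 2.1.1, together with the removal of puns in 2.1.2, can be sketched as follows. This is only an illustration under stated assumptions: the function name `retain_intersecting` is invented, the alternative readings of a chunk (such as the two readings of TERR-) are pooled into one set per chunk, head numbers are omitted, and FURROW is assumed to be listed for AR-, since the head AGRICULTURE paired with FURROW is said to relate to both -MOV- and AR-.

```python
from collections import Counter

# Head sets from the interlingual dictionary, one pooled set per chunk.
# Assumption: FURROW belongs to AR- (implied by the later intersections).
HEAD_SETS = {
    "AGRI-": {"REGION", "ABODE", "AGRICULTURE", "PROPERTY"},
    "-COL-": {"INHABITANT", "PRESENCE", "CONSIGNEE", "AGRICULTURE",
              "LAND", "COMMONALTY"},
    "IN-": {"COMPOSITION", "TENDENCY", "INTERIORITY", "ENCLOSURE",
            "CONVOLUTION", "MOTION", "FURROW", "DIRECTION", "APPROACH",
            "INGRESS", "INSERTION"},
    "-CURV-": {"ANGULARITY", "CURVATURE", "DEVIATION"},
    "TERR-": {"REGION", "BASE", "WORLD", "LAND", "PREPARATION",
              "WARNING", "ALARM", "PAIN", "FEAR", "BLUSTERER"},
    "DI-": {"DISJUNCTION", "DECOMPOSITION", "BISECTION"},
    "-MOV-": {"AGRICULTURE", "DERANGED", "CHANGE", "MOTION", "PREPARATION",
              "MOTIVE", "EXCITATION", "DECOMPOSITION", "DISJUNCTION",
              "FURROW"},
    "AR-": {"AGRICULTURE", "FURROW", "CONVOLUTION", "COMMONALTY",
            "DRYNESS", "CALEFACTION", "TEMPLE", "MARRIAGE"},
}


def retain_intersecting(head_sets):
    """Keep, for each chunk, only the heads that also occur for another chunk."""
    # Each chunk contributes a given head at most once, so a count of two
    # or more means the head occurs for at least two different chunks.
    counts = Counter(head for heads in head_sets.values() for head in heads)
    return {chunk: {head for head in heads if counts[head] >= 2}
            for chunk, heads in head_sets.items()}


retained = retain_intersecting(HEAD_SETS)
for chunk, heads in retained.items():
    print(chunk, sorted(heads))
```

In this sketch the fear-related heads of TERR- drop out without special treatment, and -CURV- is left with no heads at all, which is the case the scale of relevance procedure is meant to handle.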
If unsuccessful, the manoeuvre can be repeated using bigger brackets in the Table of Contents. It should be noted that the introduction of these new head sets may increase the number of intersections for other chunks in the sentence. After the intersection has been carried out, the heads retained for the new chunk are amalgamated with those of the chunk that generated it. We now have for each unit of head language a group of heads that have shown themselves to be relevant to the subject under discussion. Thus for -MOV- we have: AGRICULTURE, PREPARATION, DECOMPOSITION, DISJUNCTION, FURROW, MOTION.
List of heads from ‘special form’ brackets and ‘motion with reference to direction’ required for the extended translation procedure
SPECIAL FORM: 244 ANGULARITY, 245 CURVATURE, 246 STRAIGHTNESS, 247 CIRCULARITY, 248 CONVOLUTION, 249 ROTUNDITY
MOTION WITH REFERENCE TO DIRECTION: 278 DIRECTION, 279 DEVIATION, 280 PRECESSION, 281 SEQUENCE, 282 PROGRESSION, 283 REGRESSION, 284 PROPULSION, 285 TRACTION, 286 APPROACH, 287 RECESSION, 288 ATTRACTION, 289 REPULSION, 290 CONVERGENCE, 291 DIVERGENCE, 292 ARRIVAL, 293 DEPARTURE, 294 INGRESS, 295 EGRESS, 296 RECEPTION, 297 EJECTION, 298 FOOD, 299 EXCRETION, 300 INSERTION, 301 EXTRACTION, 302 PASSAGE, 303 OVERSTEP, 304 SHORTCOMING, 305 ASCENT, 306 DESCENT, 307 ELEVATION, 308 DEPRESSION, 309 LEAP, 310 PLUNGE, 311 CIRCUITION, 312 ROTATION, 313 EVOLUTION, 314 OSCILLATION, 315 AGITATION
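The first step of the scale of relevance procedure might be sketched as below for the chunk -CURV-. Several things here are assumptions made for the illustration: the function name `scale_of_relevance` is invented, the second bracket is abbreviated, the other chunks are represented only by the heads they retained after the standard intersection, and a bracket counts as successful only if it intersects with some chunk other than the one that generated it.

```python
# Brackets from Roget's Table of Contents (the second is abbreviated).
BRACKETS = {
    "SPECIAL FORM": {"ANGULARITY", "CURVATURE", "STRAIGHTNESS",
                     "CIRCULARITY", "CONVOLUTION", "ROTUNDITY"},
    "MOTION WITH REFERENCE TO DIRECTION": {"DIRECTION", "DEVIATION",
                                           "APPROACH", "INGRESS",
                                           "INSERTION"},
}

# Assumption: every chunk except -CURV- is shown with its retained heads.
HEAD_SETS = {
    "-CURV-": {"ANGULARITY", "CURVATURE", "DEVIATION"},
    "IN-": {"CONVOLUTION", "FURROW", "MOTION"},
    "-MOV-": {"AGRICULTURE", "PREPARATION", "DECOMPOSITION",
              "DISJUNCTION", "FURROW", "MOTION"},
    "AR-": {"AGRICULTURE", "FURROW", "CONVOLUTION", "COMMONALTY"},
}


def scale_of_relevance(chunk, head_sets, brackets):
    """Return the amalgamated head set for a non-intersecting chunk."""
    own = head_sets[chunk]
    others = set()
    for name, heads in head_sets.items():
        if name != chunk:
            others |= heads
    amalgamated = set()
    for bracket in brackets.values():
        if not bracket & own:
            continue  # this bracket is unrelated to the chunk
        hits = bracket & others  # intersections of the new chunk
        if hits:
            # Amalgamate the new chunk's retained heads with the heads of
            # the generating chunk that lie in the same bracket.
            amalgamated |= hits | (bracket & own)
        # A bracket with no hits is simply eliminated.
    return amalgamated


curv_heads = scale_of_relevance("-CURV-", HEAD_SETS, BRACKETS)
print(sorted(curv_heads))
```

If no bracket yields an intersection, the returned set is empty, and the manoeuvre would be repeated with bigger brackets; that repetition is not shown here.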
The sets of heads derived after the non-intersecting heads have been eliminated
AGRI-: REGION, AGRICULTURE
-COL-: AGRICULTURE, LAND, COMMONALTY
IN-: CONVOLUTION, FURROW, MOTION
-CURV-: ANGULARITY, CURVATURE
SPECIAL FORM: ANGULARITY, CURVATURE, CONVOLUTION
TERR-: REGION, LAND, PREPARATION
DI-: DISJUNCTION, DECOMPOSITION
-MOV-: AGRICULTURE, PREPARATION, DECOMPOSITION, DISJUNCTION, FURROW, MOTION
AR-: AGRICULTURE, FURROW, CONVOLUTION, COMMONALTY
Note that the thesaurus has been expanded so as to allow of the insertion of a set of curve-producing tools (of which Roget takes cognisance of only one member, corkscrew) under CONVOLUTION. Roget classifies a ploughshare as a cutting edge, but not as a device for turning over the sod. In fact, ploughs, anchors and so on are less convoluted than horns, serpents and corkscrews, but more convoluted than horseshoes, crooks or sickles, and therefore constitute an intermediate head. Lacking courage to construct this, I have classed them under CONVOLUTION.
The introduction of SPECIAL FORM is due to the failure of -CURV- to intersect with any of the other words. I therefore introduce, as a new chunk, all the other heads in the bracket titled ‘SPECIAL FORM’, which includes ANGULARITY and CURVATURE given by -CURV-. We can then obtain our intersections. The bracket titled ‘MOTION, with reference to DIRECTION’ was also introduced as it includes DEVIATION, which is also given by -CURV-. This did not, however, result in any intersections and was therefore eliminated.
2.2. Selection of correct output word from the selected head sets representing each chunk
Here the actual translation from head language to output language is made. (As the output language is English, the interlingual thesaurus, Roget, can still be used. This need not necessarily be the case.) The procedure is as follows: the contents of each head retained for a chunk are compared in turn with those of all other heads retained for that chunk. Any word that occurs more than twice is retained as output. This output constitutes a first-stage semantic translation of the text. It is obvious that difficulties may occur either if no intersection is obtained, or if there is only one head retained for a word.
2.2.1. Output of a set of translation intersections to obtain the words of the output text
The notation used below is to be interpreted as follows: A ∩ B = C1 ... Cn is to be interpreted, ‘When the list of synonyms given by Roget under the head A is compared with the list of synonyms given by Roget under head B, the series of words C1 ... Cn, which we will call the output, will be found to occur in both lists of synonyms’. The output of these intersections should be referred to any words having the two heads concerned. E.g. AGRICULTURE ∩ FURROW relates both to -MOV- and AR-.
[Editor’s note: each of the head pairs on a line below is an intersection operation on the thesaurus.]
1. AGRICULTURE ∩ REGION = etc. 189
2. AGRICULTURE ∩ LAND = farmer
3. AGRICULTURE ∩ COMMONALTY = ploughman, tiller of the soil, rustic
4. AGRICULTURE ∩ FURROW = plough
5. AGRICULTURE ∩ CONVOLUTION = no output
6. AGRICULTURE ∩ PREPARATION = till, cultivate the soil
7. AGRICULTURE ∩ DECOMPOSITION = no output
8. AGRICULTURE ∩ DISJUNCTION = no output
9. AGRICULTURE ∩ MOTION = no output
10. LAND ∩ PREPARATION = no output
11. LAND ∩ COMMONALTY = no output
12. LAND ∩ REGION = ground, soil
13. REGION ∩ PREPARATION = no output
14. FURROW ∩ CONVOLUTION = no output
15. FURROW ∩ COMMONALTY = no output
16. ANGULARITY ∩ CURVATURE = bend, etc. 217
17. ANGULARITY ∩ CONVOLUTION = no output
18. CURVATURE ∩ CONVOLUTION = curl
19. CONVOLUTION ∩ COMMONALTY = no output
20. DISJUNCTION ∩ DECOMPOSITION = disperse, etc. 73, break up
21. CONVOLUTION ∩ MOTION = no output
22. DISJUNCTION ∩ PREPARATION = no output
23. DISJUNCTION ∩ FURROW = no output
24. DISJUNCTION ∩ MOTION = no output
25. DECOMPOSITION ∩ PREPARATION = no output
26. DECOMPOSITION ∩ FURROW = no output
27. DECOMPOSITION ∩ MOTION = no output
28. PREPARATION ∩ FURROW = no output
29. PREPARATION ∩ MOTION = cultivation, cultivate
30. FURROW ∩ MOTION = no output
If two heads have a common cross-reference, this head should be included in the intersection procedure. We now bring down:
73 DISPERSION
189 ABODE
217 OBLIQUITY
We then reinsert ABODE in the head set of AGRI- (where it once belonged). OBLIQUITY we insert in the head sets of -CURV- and SPECIAL FORM, which both contain ANGULARITY and CURVATURE as members; and we insert DISPERSION as an extra head in the head sets of DI- and -MOV-, both of which have both DISJUNCTION and DECOMPOSITION as members. We then perform a further set of intersections as follows:
31. AGRICULTURE ∩ ABODE = farm
32. REGION ∩ ABODE = etc. 232
33. ANGULARITY ∩ OBLIQUITY = incline, bend, crook, crooked
34. CURVATURE ∩ OBLIQUITY = bend, crook, etc. 245
35. CONVOLUTION ∩ OBLIQUITY = twist
36. DISJUNCTION ∩ DISPERSION = disperse, etc. 44
37. DECOMPOSITION ∩ DISPERSION = no output
38. AGRICULTURE ∩ DISPERSION = sow
39. PREPARATION ∩ DISPERSION = no output
40. FURROW ∩ DISPERSION = no output
41. MOTION ∩ DISPERSION = no output
We now bring down 44 DISJUNCTION, 232 ENCLOSURE and 245 CURVATURE, of which we retain only ENCLOSURE (under AGRI-), since both the others already exist under the relevant heads. We thus get the further set of intersections:
42. AGRICULTURE ∩ ENCLOSURE = no output
43. ABODE ∩ ENCLOSURE = no output
44. REGION ∩ ENCLOSURE = no output
2.2.2. An example of the method of translation-intersection of heads
259 Furrow – N. furrow, groove, rut, scratch, streak, stria, crack, score, incision, slit; chamfer, fluting, channel, gutter, trench, ditch, dike, dyke, moat, fosse, trough, kennel; ravine, etc. 198. V. furrow etc. n; flute, groove, carve, corrugate, plough, incise, chase, enchase, grave, etch, bite in, cross-hatch. Adj. furrowed etc. v; ribbed, striated, fluted; corduroy.
371 Agriculture – N. agriculture, cultivation, husbandry, farming, agronomy; georgics; tillage, tilth, gardening, vintage; hort-, arbor-, silv-, vit-, flor-iculture; intensive culture; landscape gardening; forestry, afforestation. husbandman, horticulturist, gardener, florist; agriculturalist; yeoman, farmer, cultivator, tiller of the soil, ploughman, sower, reaper; woodcutter, backwoodsman, forester; vine grower, vintager. field, meadow, garden; botanic-, winter-, ornamental-, flower-, kitchen-, market-, hop-garden; nursery; green-, hot-, glass-, house; conservatory, cucumber-, cold frame, cloche; bed, border; lawn; park etc. 840; parterre, shrubbery, plantation, avenue, arboretum, pinery, orchard; vineyard, vinery, orangery; farm etc. 189. V. cultivate; till; farm, garden; sow, plant; reap, mow, cut, crop etc. 789; manure, dig, delve, dibble, hoe, plough, harrow, rake, weed, lop and top, force, transplant, thin out; bed out, prune, graft. Adj. agricultural, -arian. arable; rural, rustic, country, bucolic; horticultural.
The procedure consists in comparing the above sections word by word, from which it will be seen that the common output is plough.
WARNING: The use of hyphens in Roget’s Thesaurus is ambiguous, since the constituent words of a hyphenated sequence of words, e.g. set-, shoot- up, are not repeated within the same head, even though set and set up can be synonyms of one another. In this matter the person operating the thesaurus must use his or her own judgement.
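The word-by-word comparison can be sketched as a simple list intersection. The sketch uses only the verb lists of the two heads, with a few entries left out, and the name `translation_intersection` is invented for the illustration; it makes no attempt to resolve the hyphenated entries mentioned in the warning above.

```python
def translation_intersection(words_a, words_b):
    """Return the words of head A that also occur under head B, in order."""
    lookup = set(words_b)
    return [word for word in words_a if word in lookup]


# Excerpts from the verb lists of heads 259 and 371.
FURROW_VERBS = ["furrow", "flute", "groove", "carve", "corrugate",
                "plough", "incise", "chase", "enchase", "grave", "etch"]
AGRICULTURE_VERBS = ["cultivate", "till", "farm", "garden", "sow", "plant",
                     "reap", "mow", "cut", "crop", "manure", "dig", "delve",
                     "dibble", "hoe", "plough", "harrow", "rake"]

output = translation_intersection(FURROW_VERBS, AGRICULTURE_VERBS)
print(output or "no output")
```

An empty list corresponds to the ‘no output’ entries in the tables of intersections.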
2.2.3. Semantic translation of the text
(That is, translation with the syntax unresolved, with DI- and -MOV- combined, and with IN- and -CURV- combined.)
AGRI-: farm
-COL-: farmer, ploughman, tiller of the soil, rustic
TERR-: ground, soil
DI-MOV-: plough, till, cultivate the soil, cultivation, cultivate, disperse, break up, sow
IN-CURV-: bend, incline, crook, crooked, twist
AR-: ploughman, tiller of the soil, rustic, plough
Note that there is no output for IN-. This fact reflects the somewhat redundant character it has. The syntactical and grammatical operations must now be carried out to choose between these alternatives, to reorder the whole sentence and to introduce the additional elements that are necessary to make the output a correct sentence. [Editor’s note: presumably so as to create some form like ‘the farmer breaks up the ground with (his) crooked plough’, which is the only form the syntax/morphology of Latin allows and which uses up all the derived content, and – as we might now say – preserves the best target collocations.]
2.3. The head set of the new ‘chunk’ shown as a lattice so that the procedure for applying the scale of relevance may be made precise
Author’s note: It can be seen that the use of the bracket group of heads, as described in the Scale of Relevance procedure, can be looked at from another point of view as utilising the lattice property of language. Made more precise, the procedure is: compare each head in the head set of the non-intersecting chunk (in this case -CURV-) with the Table of Contents (this last being arranged as a lattice). If, to find a common idea between any two heads in the head set, not more than two steps need be taken up the lattice, bring this common idea down as a new chunk in the input text, this new chunk being inserted after the original non-intersecting chunk. (Thus SPECIAL FORM, the new chunk, will be inserted after -CURV-.) See whether any of the heads in the head set of the new chunk intersects with any head of any of the head sets in the chunks of the input text. If an intersection is obtained, amalgamate the head sets of SPECIAL FORM and -CURV- to form a single head set. If no intersection is obtained, extend the procedure to bring down the second Scale of Relevance (i.e., in this case, bring down all the heads given in Roget’s Table of Contents under GENERAL, SPECIAL and SUPERFICIAL FORM) and try again for an intersection, as above. If it is still the case that no intersection is obtained, the chunk -CURV- (or more probably the whole word INCURVO) becomes an untranslatable word of the head language – as it might be a foreign word – and is carried through complete into the English output, all heads being given in the English output text.
Figure 28. [Lattice diagram with the nodes Special Form, Curvature, Rotundity, Convolution, (Non-curvature), Circularity, Angularity and Straightness.]
Authors’ note: This program is based partly on an interlingual translation programme by R. H. Richens published in July 1957 as a workpaper of the CLRU entitled ‘The Thirteen Steps’, partly on a thesaurus-using translation procedure by Margaret Masterman, from a paper entitled ‘The Potentialities of a Mechanical Thesaurus’ (this volume, chapter 4), read at the 2nd International Conference on Machine Translation (MIT 17 October 1956), and partly on a library-retrieval procedure making use of a thesaurus devised by T. Joyce and R. M. Needham, described in a CLRU workpaper entitled ‘The Thesaurus Approach to Information Retrieval’.
Commentary by Karen Spärck Jones
Much of the machine translation (MT) research of the 1950s and 1960s focused on syntax; however, some groups, notably the Cambridge Language Research Unit (CLRU), argued that semantics was much more important. The CLRU addressed the problem of lexical disambiguation, and advocated the use of a thesaurus as a means of characterising word meanings, in part because the structure of a thesaurus naturally supports procedures for determining the senses of words or, complementarily, for finding words for meanings. The assumption is that text has to be repetitive to be comprehensible so, in the simplest case in disambiguation, if a word’s senses are characterised by several thesaurus classes, or heads, the relevant one will be selected because it is repeated in the list for some other text word. In text production, the fact that two heads share a word suggests that this is the right one; for translation this mechanism could provide a means of selecting appropriate target-language equivalents for source words. The thesaurus could thus be seen, for translation, as constituting an interlingua; and as it appeared it could be formally modelled as a lattice, procedures using it could be formally specified as lattice operations. It was further argued that syntax, and grammar, could be approached through the thesaurus, though this was never worked out in detail. In particular, the relation between syntax and semantics in text processing was never properly specified, though one strength of the way a thesaurus was used for disambiguation was that its application was not narrowly constrained, as it was later by Katz and Fodor (1963), by syntactic structure. But equally, the experiments done were very simple, so the need to relate syntactic and grammatical information to semantic information in processing was underestimated.
Actual tests on sense selection in the translation context tended to retain input word order in the initial output, for hypothesised rearrangement for the final output. The experiment described in this chapter is part of a series carried out by the CLRU in the late 1950s: Latin was chosen as the input language as the only one apart from English common to all members of the CLRU. The experiments could not be carried out automatically, as the CLRU had no computer, but were done ‘mechanically’, that is by working with paper lists in the style required for the procedures using punched card apparatus then being devised at the Unit. The essence of the experiment described here was to select the appropriate senses of words, or rather of their semantically significant morphological components, referred to as chunks, by selecting those heads in each chunk’s list that were shared with some other chunk, and then to obtain the corresponding English chunk (in fact word) as any item common to the heads in each chunk’s list.
The procedure included strategies for dealing with failures to obtain any common heads or words. This test, like the other CLRU ones, was a very limited one. But the experiments the CLRU did were tests of well-defined procedures. The idea of using a thesaurus was a very attractive one, and the CLRU’s ideas on the semantic aspects of natural language processing were known to other research workers at the time.
Editor’s note: As can be seen from the CLRU bibliography at the end of the volume, this chapter was originally a work paper by MMB, the late Roger Needham, Karen Spärck Jones, and Brian Mayoh. It is reprinted here with their permission, where available. At the time of this experiment to resolve word senses with Roget’s thesaurus, most considered it had failed, for all its boldness and originality. More recently Yarowsky (1992) has re-used Roget’s Thesaurus with a machine learning algorithm and achieved very high levels (90%+) of word-sense disambiguation with it.
7
Mechanical pidgin translation
This chapter gives an estimate of the research value of word-for-word translation into a pidgin language, rather than into the full normal form of an output language.
1. Introduction
The basic problem in machine translation is that of multiple meaning, or polysemy. There are two lines of research that highlight this problem in that both set a low value on the information-carrying value of grammar and syntax, and a high one on the resolution of semantic ambiguity. These are:
1. matching the main content-bearing words and phrases with a semantic thesaurus that determines their meanings in context;
2. word-for-word matching translation into a pidgin language using a very large bilingual word-and-phrase dictionary.
This chapter examines the second. The phrase ‘Mechanical Pidgin’ was first used by R. H. Richens to describe the output given at the beginning of Section 2 of this chapter (below), which, he said, was not English at all but a special language, with the vocabulary of English and a structure reminiscent of Chinese. Machine translation output always is a pidgin, whose characteristics per se are never investigated. Either the samples of this pidgin are post-edited into fuller English, or the nature of the output is explained away as low-level machine translation, or rough machine translation, or some vague remark is made to the effect that pidgin machine translation is all right for most purposes. Thus, if a pidgin dictionary is defined as one made by using the devices 1–4, given below, it might be said that the use of a pidgin dictionary characterises all machine translation programs. For in all programs a special dictionary is used to translate a limited subject matter, pidgin variables (see below) form part of the output text, and some difficult grammatico-syntactic features of English (e.g. the use of certain auxiliary verbs or of articles) are deliberately not accounted for by the program. But there is a difference, indicated below by additional requirements. For the Cambridge Language Research Unit were deliberately setting out to accentuate and explore the pidginness of pidgin as a language in its own right, on the assumption that it was a basic language.
The general requirements of a pidgin dictionary are the following:
1. Predominance of dictionary entries for phrases rather than words.
2. Special sub-dictionaries, and the presupposition that a choice of subdictionary appropriate to the text has been made.
3. Specially constructed symbols, here called pidgin variables, i.e. widely ambiguous words that the reader intuitively interprets according to the context (Reifler’s HE/SHE/IT (1954) is a pidgin variable).
4. The omission of grammatical and syntactic features of the input language that a word-for-word machine translation program cannot transform.
The special requirements for a mechanical pidgin dictionary as defined here are the following:
5. It must not allow of any alternatives being included in the output, between which the reader of the output must find a way to choose. The theory behind this rule is that a reader is less confused by a text containing occasional vague equivalents than by one containing all the possible equivalents of every word (International Business Machines, 1959).
6. The program must contain no provision for changing the word order of the text. This pinpoints the importance of studying what the older grammarians called the actual sequence of ideas (Allen and Greenough, 1888).
7. The pidgin must be treated and studied as a homogeneous language with properties of its own, without consideration of the fact that different specimens of it may be derived from different source languages.
The research went as follows.
In 1959 a Latin–English mechanical pidgin dictionary of 700 entries was used as a control for other, more analytic, machine translation programs. The extreme difficulty of doing better than the control stimulated interest in doing mechanical translation into pidgin for its own sake; and in November 1959, an actual pidgin-producing machine program (for a punched-card laboratory) was constructed, debugged and operated. This program performed the same operations as the USAF–IBM photoscopic translation system then performed, except that there was no Rho-stuffing program (International Business Machines, 1959). It chunked words into subwords, not by a peeling-off method (Reifler, 1952), but by a method called
by R. M. Needham, who invented it, exhaustive extraction (Needham, 1959; Kay and McKinnon Wood, 1960). It had also a phrase-finding procedure, and performed a one-to-one dictionary match. It had no device for changing word order, nor for printing the output. Output from it is given in Section 3 below. In order to establish the notion of a mechanical pidgin, we start this chapter with output obtained by Booth and Richens, and sophisticate this in stages, beginning with any two sentences, and using the four devices mentioned above. Section 2 is devoted to the construction of a pidgin dictionary for use in the program, and Section 3 operates the program. Finally, the potentialities of the work are estimated.
2. The construction of a pidgin dictionary: Investigation of Booth and Richens’ mechanical pidgin
2.1. The text and the pidgin markers
The experimental material used was the mechanical pidgin output originally produced by Booth and Richens, and reported in Locke and Booth, 1955. Twenty sentences in different source languages were taken at random from the literature of plant genetics, sentences with proper names or numerical data being avoided. The samples were taken only from languages with Latin script, except for two sentences from oriental languages, Arabic and Japanese, which were transliterated to illustrate further points. In our investigations we treat these twenty sentences as if they came from a continuous text written in a single language. This continuous pidgin text is printed below, and is preceded by the list of pidgin markers used by Booth and Richens to indicate the function of words in the source languages. These markers, though capable of variable interpretation, are unambiguous in the sense that each marker is associated uniquely with a given class of inflections or constructions in any source language for which it is used. It is an additional assumption in all that follows that it is possible to define in use a single set of markers applicable to each of twenty different languages. This output was not a mechanical pidgin, according to our definition, since many of the main words offered a choice of translations to the reader (these choices being separated in the output given below by a slash) and because the output contained no consciously contrived pidgin variables, that is, translations not occurring in English, designed to cover the whole range of meanings of a single word.
Pidgin markers of Richens and Booth’s mechanical pidgin
a  accusative
d  dative
f  future
g  genitive
i  indicative
l  locative
m  multiple, plural or dual
n  nominative
o  oblique
p  past
q  passive
r  partitive
s  subjunctive
u  untranslatable
v  vacuous
z  unspecific
Pidgin output of Richens and Booth in continuous form S1. vine z enter in rest z in autumn/harvest z z from/whence reason zv temperature opv low z. S2. together work z between m country economic z union m and Danish z rural-dweller union mg seed/frog supply is continue p after same line m which/as in/you previous year. S3. v disease come z thus very rapid up and has in many case za/ one total amiss crop then p follow z. S4. other m four foreign country (out of) standpoint r/standard r/bear are shown oneself pm cultivation value g/a very insecure (become). S5. v not is not/step astonish v of establish v that/which v hormone m of growth act m on certain species m, then that/ which v not operate m on of other m if v one dream/consider z to v great v specificity of those substance m. S6. in a/one d large (more) area two form m beside one another live z without self to/too mix z, so belong/hear pzz different m form m circle m at. S7. v small berry v variety m so crop/fruit quantity in as dry matter yield in surpass mv great berry v variety ma. S8. (causative) sow vm sometimes thus enormous v damage vv, till/so that ought v sow once more/also. S9. is been/status prove p that/which v cereal m of winter z grow pm in mountain crowd greenhouse show m little v resistance to cold while v same m/is ps grown pm in field open v are much more resistance m. S10. possible is however not prove/lacking z all m species/appearance same g genus/son-in-law z from same species/appearance o arise z draw p.
S11. however is able we already fixed z speak v concerning our river z oak forest z type z extensive (more) z spread z earlier lm time lm as also concerning this z that this z forest z dying out d at least through part d basis l been climatic reason m. S12. growth m of autumn wheat was/wary more variable m from year to year than growth m or spring wheat. S13. direction m bend v shot/trunk answer m direction dm dominant om wind gm and it behoves judge v that swordshaped (abstract noun) is cause pq through wind m. S14. the/to existence of a/one number variable of seed m within of fruit show z that/which v various m ovule m of this plant has identical v possible (abstract noun) of self develop v. S15. chromosome m barley gm cultivated z are of a/one diameter more great z than those v barley gp wild z. S16. v study of v distribution of v temperature m minimum m annual m as is obvious v in all v work, reduce vv justification to v density of v station m and to record of observation m of each a/one of v. S17. round/if earth v been freeze p long and deep, has no injury of clover rot v get pq. S18. entire acidity view p (from) always rich is/become (not) wine m our because malic acid decomposition condition a/v desire d suitable is not. S19. and occur time mz division of chromosome mv limited z and that period v division v sperm v last result zd occurrence m mitotic z. S20. this endure cold sex/disposition g difference (as for) tetraploid n diploid (at) sort/compare d/also osmotic pressure n high (adverb) becoming is fact v large v reason with/when consider q.
2.2. Sophistication of two sentences of the text by stages, to form an intelligible pidgin translation
The sentences chosen were those obtained from Italian and from Latin, that is, S9 and S10. The stages of sophistication were as follows:
2.2.1. Stage 1: Remove z and v
To do this requires reference back to the two source languages, since it is often the case that Richens has thrown away information as vacuous and/or unspecific for a full English translation, which could be carried over into the pidgin by creating a pidgin variable.
The two original sentences were as follows (the asterisk * indicates a chunking-point):
S9. Italian: E stato prov*ato che i cereal*i d’invern*o cresc*iuti in serra mostr*ano poc*a resistenza al freddo, mentre gli stessi cresc*iuti in campo apert*o sono molt*o piu resistent*i.
Output: is been/status prove p that/which v cereal m of winter z grow pm in mountain/crowd/greenhouse show m little resistance to cold while v same m/is ps grown pm in field open v are much v more resistant m.
English: It has been proved that winter cereals grown under glass show little resistance to cold, while those grown in the open are much more resistant.
S10. Latin: Possibil*e est, at non expert*um, omn*es speci*es eiusdem generis ab eadem speci*e ort*um trax*isse.
Output: Possible z is however not prove/lacking z all m species/appearance same g genus/son-in-law z from same species/appearance o arise z draw p.
English: It is possible, though not proved, that all species of the same genus have been derived from the same species.
Inspection of the above shows that Richens has translated by z (unspecific) all the Latin and Italian endings that are grammatically ambiguous, and he has translated by v (vacuous) all the Italian endings that are so very ambiguous that, in Richens’ view, they mean nothing at all. As might have been expected, therefore, from the nature of the two languages, the pidgin output from Latin is sprinkled with z’s, but has no v’s, whereas the pidgin output from Italian is sprinkled with v’s but has only one z. The z and v dictionary entries that produced the two outputs were as follows:
Latin:    -e z;  -um z;  -is z;  -adem z
Italian:  -i v;  -o z;  -a v;  -e v
Nothing can be done with these, pidginwise, so they are deleted from the output. The result is as follows:
From Latin: possible is however not prove/lacking all m species/appearance same g genus/son-in-law from same species/appearance o arise draw p.
From Italian: is been/status prove p that/which cereal m of winter grow pm in mountain/crowd/greenhouse show m little resistance to cold while same m/is ps grown pm in field open are much more resistant m.
2.2.2. Stage 2: convert m, g, o, p
Refer back to the two source languages. We will take the Latin sentence first. In the Latin, we come upon two mistakes:
1. -es, the ending which comes at the end of both omn-es, and speci-es, is translated m in the first case, and not translated at all in the second case. Moreover, if the pidgin dictionary is to have any generality -es cannot be translated m, since -es is also a third declension singular ending: indeed it is the nominative singular ending of species itself. Even if Richens is picking up the m semantically, from the stem-meaning of omn-is, ‘all, every, each’, it becomes redundant if omn-is is translated ‘all’ and inappropriate if omn-is is translated ‘each’. Thus the marker should have been z in both cases, and hence should have been deleted at Stage 1.
2. -e cannot be translated by o when it occurs at the end of speci-e since it was translated by z when it occurred at the end of possibil-e. It should therefore be z in both cases, and hence should have been deleted at Stage 1.
g we pidginise as ‘-ish’ for any language in which it comes out in the output post-positionally, and as ‘of + the’ for any language in which it comes out in the output prepositionally.
p we pidginise as ‘-ed’ for any language in which it comes post-positionally in the output, and as ‘did’ for any language in which it comes prepositionally in the output.
In the Italian case -at, translated by Richens p, we pidginise as ‘-ed’ (see above), -i, which he translated by m, as ‘-s’ (on the assumption that the Plant Genetics pidgin dictionary is never going to have any imperatives. N.B. This means making a special Italian pidgin dictionary for cookery books) and -ano we pidginise as ‘-they’. The result is as follows:
From Latin: possible is however not prove/lacking all species/appearance same-ish genus/son-in-law from same species/appearance arise draw-ed.
From Italian: is been/status prove-ed that/which cereals of winter grow-ed-s in mountain/crowd/greenhouse show-they little resistance to cold while same-s/is grow-ed in field open are much more resistant-s.
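The Stage 1 deletion described above amounts to filtering marker tokens out of the pidgin stream. The sketch below illustrates only that step, under the assumption that markers appear as separate space-delimited tokens; the function name `remove_markers` is hypothetical, and fused marker groups such as `pm` or a marker followed by a full stop are deliberately left untouched. Stage 2 is not attempted, since it needs per-language rules about pre- and postposition.

```python
def remove_markers(pidgin, markers=("z", "v")):
    """Delete stand-alone marker tokens (by default z and v) from a sentence."""
    kept = [token for token in pidgin.split() if token not in markers]
    return " ".join(kept)


# Richens' output for the Latin sentence S10, as quoted above.
latin_s10 = ("possible z is however not prove/lacking z all m "
             "species/appearance same g genus/son-in-law z from same "
             "species/appearance o arise z draw p.")

print(remove_markers(latin_s10))
```

Applied to the Latin sentence, this reproduces the Stage 1 result given in the text, with the markers m, g, o and p still in place.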
2.2.3. Stage 3: The creation of pidgin variables
Create other pidgin variables as follows:
Latin: ‘form’ for species; ‘family’ for genus
Italian: ‘(w)that’ for che
The result is as follows:
From Latin: possible is however not prove/lacking all form same-ish family from same form arise draw-ed.
From Italian: is been/status prove-ed (w)that cereal-s of winter grow-ed in mountain/crowd/greenhouse show-they little resistance to cold while same-s/is grow-ed in field open are much more resistant-s.
2.2.4. Stage 4: The creation of phrases
In the two sentences under discussion, the following phrases occur:
Latin:
1. possibile est ‘it + is + possible’
2. non expertum ‘non + proven’
3. ortum traxisse ‘(to + have) derived + an + origin’
4. eiusdem ‘of + the + same’
Italian:
5. e stato ‘has + been’
6. cresciuti in serra ‘grown + under + glass’
7. gli stessi ‘the + same’.
The final result of sophisticating the translation of the two sentences is as follows. With the relevant phrases added to the dictionary, the Latin now becomes:
IT+IS+POSSIBLE HOWEVER NON+PROVED ALL FORM OF+THE+SAME FAMILY FROM SAME FORM (TO+HAVE)+DERIVED+AN+ORIGIN.
The Italian now becomes:
HAS+BEEN PROVE-ED (W)THAT CEREAL-S OF WINTER GROWN+UNDER+GLASS SHOW-ED-THEY LITTLE RESISTANCE TO COLD WHILE THE+SAME GROW-ED IN FIELD OPEN ARE MUCH MORE RESISTANT-S.
Comment: Those to whom these sentences were shown had no difficulty in understanding them.
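The effect of adding phrase entries to a one-to-one dictionary can be sketched as a lookup that tries phrases before single words. The greedy longest-match strategy below is an assumption made for illustration, not a description of the CLRU's phrase-finding procedure; the function name `translate` and the tiny dictionary are hypothetical, with entries taken from the Latin example above and `+` used as the phrase-joining sign.

```python
def translate(words, dictionary, max_phrase_len=3):
    """Greedy longest-match lookup: try phrases before single words.

    Word order is never changed, and each match yields exactly one output.
    """
    output, i = [], 0
    while i < len(words):
        for n in range(min(max_phrase_len, len(words) - i), 0, -1):
            key = " ".join(words[i:i + n])
            if key in dictionary:
                output.append(dictionary[key])
                i += n
                break
        else:
            # Unknown word: carry it through unchanged.
            output.append(words[i])
            i += 1
    return output


# Hypothetical dictionary fragment built from the example above.
dictionary = {
    "possibile est": "IT+IS+POSSIBLE",
    "non expertum": "NON+PROVED",
    "possibile": "POSSIBLE",
    "est": "IS",
    "at": "HOWEVER",
}

print(translate("possibile est at non expertum".split(), dictionary))
# ['IT+IS+POSSIBLE', 'HOWEVER', 'NON+PROVED']
```

Without the two phrase entries the same input would come out word for word as POSSIBLE IS HOWEVER followed by two untranslated words, which shows how much of the intelligibility rests on the phrase entries.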
2.2.5. Stage 5: Analysis of the whole text, frequency count and transition to the interlingua NUDE
The following initial facts were obtained:
1. Total number of pidgin words in the text: 574
2. Number of sentences: 20
3. Number of words in each sentence (these are given together with a list of the source languages):
Albanian    19
Danish      30
Dutch       22
Finnish     20
French      46
German      32
Hungarian   24
Indonesian  18
Italian     42
Latin       22
Latvian     54
Norwegian   19
Polish      31
Portuguese  32
Romanian    22
Spanish     43
Swedish     19
Turkish     23
Arabic      28
Japanese    28
A marker frequency count was then made. The result of this is given below.
Marker frequency count (the markers are arranged alphabetically):
a  3
d  7
g  7
l  3
m  53
n  2
o  3
p  14
q  3
r  2
s  1
v  51
z  40
Total number of marker-occurrences: 189
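The two totals quoted above can be checked directly from the tabulated figures. The snippet below is a simple arithmetic check using only the numbers given in the text; the variable names are chosen for the example.

```python
# Words per sentence, keyed by source language, as tabulated above.
words_per_sentence = {
    "Albanian": 19, "Danish": 30, "Dutch": 22, "Finnish": 20, "French": 46,
    "German": 32, "Hungarian": 24, "Indonesian": 18, "Italian": 42,
    "Latin": 22, "Latvian": 54, "Norwegian": 19, "Polish": 31,
    "Portuguese": 32, "Romanian": 22, "Spanish": 43, "Swedish": 19,
    "Turkish": 23, "Arabic": 28, "Japanese": 28,
}

# Marker frequency count, as tabulated above.
marker_counts = {
    "a": 3, "d": 7, "g": 7, "l": 3, "m": 53, "n": 2, "o": 3,
    "p": 14, "q": 3, "r": 2, "s": 1, "v": 51, "z": 40,
}

total_words = sum(words_per_sentence.values())
total_markers = sum(marker_counts.values())

print(len(words_per_sentence), total_words, total_markers)  # 20 574 189
```

Both sums agree with the totals stated in the text: 574 pidgin words over 20 sentences, and 189 marker-occurrences.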
Total frequency-count of words in Richens’ pidgin: m v z OF p IN; IN/YOU IS d;g AND;TO;TO/TOO;THE/TO;MORE;(MORE) MORE/ALSO;NOT;NOT/STEP;NOT; AS/ONE;AS;(AS FOR); WHICH/AS; FROM; FROM/WHENCE; (FROM); SAME; THAT: THAT/ WHICH; THIS l;o;q; ARE; AT; (AT); BECOME,BECOMING,(BECOME); BEEN/STATUS;GREAT;GROWTH;HAS;REASON SELF;ONESELF;SHOW;SPECIES/ APPEARANCE;YEAR; a;n; (ABSTRACT NOUN); ALL; AUTUMN; AUTUMN/HARVEST;BARLEY;BERRY (CAUSATIVE); CAUSE; CHROMOSOME; COLD;CONCERNING;COUNTRY;CROP; CROP/FRUIT;DIRECTION;DIVISION; CONSIDER;FOREST;FORM;FRUIT; HOWEVER;IF;LARGE;ON;ONE;OTHER; OUR;OUT;(OUT OF); POSSIBLE;PROVE; PROVE/LACKING;SEE;SEED/FROG;SO; SOW;TEMPERATURE;THAN;THEN;THOSE; THROUGH;THUS;TIME;UNION;VARIABLE; VARIETY;VERY;WHEAT;WIND;WORD;
53 51 40 20 14 10 10 7 6 5 4 3
3 2
2
Alternatives (i.e. groups of words separated by an oblique stroke) are counted as single occurrences of each of the words. Thus CROP/FRUIT is counted as an occurrence of both CROP and FRUIT. The total number of occurrences is thus slightly higher than the number of words in the original text. Inflected forms of the same root were counted as distinct words; for example, DIFFERENT and DIFFERENCE were not counted as the same word. Words in brackets were counted together with words not in brackets (e.g. (MORE) was counted together with MORE).
Figure 29. Richens and Booth’s pidgin. Relative frequency of units plotted against rank by frequency (linear scale).
It is doubtful whether any inferences can legitimately be made from this frequency count, given that all constituent sentences of the text came from different source languages, and that the sample is such a small one. If, however, we approach this text not linguistically, but logically, in the older W. E. Johnson sense of logic as Universal Grammar (Johnson, 1921), it can be shown how this procedure, and others like it, suggested to R. H. Richens (the originator, with A. D. Booth, of machine translation from the British side) the first design of NUDE, his interlingua (Richens, 1956, 1959; Masterman, 1962; Spärck Jones, 1963). And the same experiment can also be made to suggest a line-of-design, simpler than NUDE and therefore easier to handle, for a mechanical pidgin to be used in word-for-word translation operating directly between two languages. The graph (Figure 29) shows the decrease of frequency of occurrence of the words in the sample when this is plotted against their rank order of occurrence. The graph I give was plotted for a very small sample of machine translation output and yet it does suggest that the mechanical pidgin displays this characteristic of natural language to which Zipf drew attention. On the basis of very large samples, Zipf also came to the conclusion that this law is not obeyed by the few enormously frequent words of a language (Zipf, 1953). This deviation for small values of rank order is usually indicated by a small upward bend of the line in Zipf’s graphs.
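Zipf's law states that frequency is roughly inversely proportional to rank, so that rank multiplied by frequency should stay roughly constant. A rough way to inspect the sample is therefore to tabulate that product. The sketch below does so for the seven most frequent units only, with the frequencies as read from the frequency count above; the function name is hypothetical, and, as the text itself cautions, nothing firm can be inferred from so small a sample.

```python
def rank_frequency_products(frequencies):
    """Return rank * frequency for frequencies sorted in descending order."""
    ordered = sorted(frequencies, reverse=True)
    return [rank * freq for rank, freq in enumerate(ordered, start=1)]


# The seven most frequent units, as read from the frequency count above.
top_units = {"m": 53, "v": 51, "z": 40, "OF": 20, "p": 14, "IN": 10, "IS": 10}

print(rank_frequency_products(top_units.values()))
# [53, 102, 120, 80, 70, 60, 70]
```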
Discussing this bending phenomenon, Zipf makes the bold guess that the break in frequency of occurrence, indicated by this bend, represents a division of the words of a language into two groups. He suggests that these could function as the two fundamental parts of speech of the language. If Zipf is right in this guess, we may draw an inference as to the construction of a mechanical pidgin. Our sample is too small to show any deviation for the most frequent words. However, we may not only hope to find it in larger samples, we may also bias the construction of the pidgin to accentuate any such deviation. We can do this by extending the use of pidgin variables to produce a class of words with a very wide range of meaning, and hence of very frequent occurrence. (In its untreated state the output’s most frequent words were Richens markers, including among them the vacuous and unspecific markers.) Thus we could reasonably expect to obtain a pidgin that reflected Zipf’s distinction between Group I (the content words) and Group II (the deviants, the bits and pieces of language) words. We would expect the Group II words to be predominantly pidgin variables. Each Group II pidgin variable would need to have both a pre- and postpositional form so that we could translate it optimally for English readers regardless of the nature of its representation in the source language.
2.3. Logical analysis of mechanical pidgin
Acting on the above, we shall say that pidgin, generated by an identical method from any input language, is a logically basic language, not an eccentricity. Using this assumption, it is easy to see how Richens came to choose the fifty-odd elements of his interlingua, NUDE (Richens, 1956, 1959, Masterman, 1962, Spärck Jones, 1963), from the first output, thus creating by his action a new, as yet undeveloped, field of semantic logic.
2.3.1. Comparison of the Group II words, or redefined markers, given by the frequency test, with the basic elements of Richens’ NUDE
If we now redefine Richens’ marker list, regarding as arbitrary Richens’ distinction between words indicated by lowercase letters and dictionary words, omitting z and v, and conflating o and g, we get the following reorganised marker list in descending order of occurrence: MANY; OF; PAST (i.e. p); IN; IS; THAT; AND; FOR (i.e. d); TO; MORE; ONE; NOT; THIS; AS; FROM; SAME. If these then be compared by table with Richens’ list of NUDE elements, the semantic overlap becomes evident:
Table showing overlap between elements of Richens’ NUDE with the redefined set of markers
NUDE elements (left-hand column): NOT DONE MUCH CAUSE CAN LAUGH IN PRAY DO ASK UP FEEL MORE COUNT TRUE SELF FOLK PLANT THING WORLD LIFE HEAT GRAIN HOW
Markers (left-hand column): NOT; PAST; MANY; IN; MORE; AS; TO
NUDE elements (right-hand column): WHERE BANG WILL FOR CHANGE WANT SENSE HAVE USE POINT SAME THINK BE WHOLE ONE PLEASE PART MAN BEAST LINE PAIR STUFF KIND WHEN
Markers (right-hand column): FOR; OF; THIS, THAT; SAME; IS; ONE; FROM
2.4. Sophistication of the whole text
Using the techniques described above we proceed to sophisticate the whole text, given in Section 2.1 with the aid of the information in Tables 7.1 and 7.2.
2.4.1. Recognition and translation of phrases
The designation of certain strings of words as ‘phrases’ for the purpose of constructing a phrase dictionary is closely bound up with the decision as to what are the key bits of information that must be received if a text or its pidgin translation are to be understood. Thus the phrase translation IT+IS+POSSIBLE of the Latin possibile est, when compared to the translation ‘IS POSSIBLE’ obtained from the separate Latin words possibile and est, yields the bit of information that somebody is postulating something, and that what they are postulating is going to follow. Thus, the phrase tells one that
Table 7.1. Distinguishing prepositional and postpositional variants of Richens’ pidgin markers (sample page)
Old form: a2 a2/v d1 d2 d3 d4 d5 d6 d7 g1 g2/a g3 g4 g5 g6 g7 l1 l2 l3 m1 m2 m3 m4 m5 m6 m7 m8 m9 m10 m11 m12 m13 m14 m15 m16 m18 m19
New form: -a -a -d -d -d -d -d d- d- -g -g -g -g -g -g -g -l -l -l m- -m -m -m -m -mA -m -mA -m -m -m -m -m -m -m -m -m -mA
Position-in-text: 7,24 18,18 6,4 11,40 11,45 13,9 18,20 19,7 19,24c 2,17 4,7 10,11 13,15 15,4 15,19 20,5 11,23 11,26 11,47 2,5 2,10 2,16 2,25 4,2 4,14 5,15 5,17 5,20 5,22 5,28 5,32 5,46 6,10 6,27 6,29 7,6 7,17
Source language: Hungarian Turkish German Latvian Latvian Polish Turkish Arabic Arabic Danish Finnish Latin Polish Rumanian Rumanian Japanese Latvian Latvian Latvian Danish Danish Danish Danish Finnish Finnish French French French French French French French German German German Hungarian Hungarian
Source marker: -at -i -em -ai -a -om -ya l- l- -ers -n eiusdem -ow -ilor -ilor -no -os -os -a de -er -ers -r Muut -neet -s -issent -es -s -antes -s -en -en -en -en -ak -ban
Pidgin translation: (vacuous) (vacuous) -WARD -WISE -WISE -WISE -WISE OF+THE OF+THE -WARD -WARD -ISH -ISH -ISH -ISH -ISH -POINT -POINT -POINT SOME -S -s -S -S -THEY -S -THEY -S -S (vacuous) -S -S -S -S -S -A -THEY
Mechanical pidgin translation
Table 7.2. Showing creation of pidgin variables additional to those given in table 7.1 (sample page) Richens’ output
Position-in-text
Pidgin variable
A/ONE
3;15;6;3;14; 4;15;10;16;41; 13;24;14;28;
AN(E)
(ABSTRACT NOUN) (ADVERB) (AS FOR) (AT) AUTUMN/ HARVEST (BECOME) BELONG/HEAR
20;18 20;7; 20;11; 7;8;
Comment
-MENT
4;20 6;22
-LY -IN+REGARDING -THERE(AT)+TO AUTUMN+HARVEST (TIME) BECOMING (AD)HE(A)R(E)
BEEN/STATUS
9;12
STATUS
(CAUSATIVE) CROP/FRUIT DO/ALSO
1, 7,8 20,13
DREAM/ CONSIDER FACT FROM/WHENCE (FROM GENUS/ SON-IN-LAW
5,36
CAUSE+ CROP (MAKE) TOGETHER+WITH BROOD+ON
20,21 7,10 18,5 10,12
THE+FACT+IS FROM/WHEREOF -FROM FAMILY
IN7,IN2,IN4,IN5, IN8,IN9,IN10 IN3/YOU
1;4;1,7;3,11;6,2; 9,15;9,33;16,18; 2,27
IN IN
IN6,IN7,
7,10;7,15;
IN REGARDING
This stretches to the limit the device of making the context the reader’s eye. Picks up all forms of Italian verb to be as phrases regardless of cost in phrase increase
‘dream’ should not be in a plant genetics dictionary i.e. FACT
son-in-law should not be in a plant genetics dictionary
Picks up Danish you to verb
IS POSSIBLE ought to be preceded not by HE or SHE but by IT, so that it ought to be followed either by THAT, making IT+IS+POSSIBLE+THAT, or by TO, making IT+IS+POSSIBLE+TO. Syntactically, therefore, it tells one that a subsidiary sentence is contained in the main sentence. But this piece of information is not a ‘bit of information’ in the general sense,
whereas the ‘bit of information’ that someone (a human being) is postulating as possibly true what follows is. (Note that ‘bit of information’ is not being used here in the sense in which it is used in information theory.) Provided this general ‘bit of information’, which is part of the writer’s argument, gets over somehow into the translation, it does not matter what specific English words, and what specific syntactic devices, are used to convey it.

This notion of ‘bit of information’ helps to make a transition from the notion of a matching dictionary to that of a thesaurus; for phrases can be classified according to the ‘bit of information’ that they convey. And such a ‘clustering’ is the characteristic property of a thesaurus classification of the kind required for machine translation, as opposed to the vaguer forms of classification used in Roget.

We can also draw the following conclusions on pidgin translation:

1. An English pidgin designed as a language must have at least two parts of speech, i.e. content words, and the small set of frequently used variables.
2. Very large special word and phrase dictionaries will be needed for each special subject, e.g. 500,000 entries.
3. A thesaurus establishing synonym classes of words and phrases can be compiled if these can be classified according to the ‘bit of information’ they convey.

The following phrases, which also occur in the pidgin output, are produced by translating a single ambiguous word or chunk of the input language into a ‘pidgin-variable’, which consists of a whole pidgin phrase:

THAT+WE+ARE+TALKING+ABOUT
THAT+ONE+WHICH+IS
THOSE+WHICH+ARE
IN+ALL+ROUND
OF+THE
WHICH+BE

Pidgin translation after the inclusion of phrases

S1. VINE THAT+WE+ARE+TALKING+ABOUT BECOME+DORMANT IN+AUTUMN BECAUSE+OF TEMPERATURE-WISE WHICH+BE LOW.
S2. CO-OPERATION BETWEEN SOME COUNTRY ECONOMIC UNION-S AND DANISH RURALDWELLER UNION-S-WARD SEEN SUPPLY IS CONTINUE-ED BEEN AFTER SAME LINES(AS)+THAT IN PREVIOUS YEAR.
Table 7.3. Technical phrases
Technical phrase BECOME+DORMANT IN+AUTUMN CO-OPERATION (STANDARD+VARIETY) GROWTH+HORMONES
[SPECIFICITY] FORM+CIRCLE GROWN+UNDER+GLASS IN+FORMER+TIMES [AUTUMN+WHEAT] [SPRING WHEAT] [OVULE] [CHROMOSOME] MINIMAL+ANNUAL+TEMPERATURE [MALIC+ACID+DECOMPOSITION] CHROMOSOME+DIVISION OF+SPERMATOGONIAL+ THE+OSMOTIC+PRESSURE [TETRAPLOID] [DIPLOID]
Sequence of pidgin words that it replaces
Sentence number
ENTER IN REST IN AUTUMN+HARVEST(TIME) TOGETHER-WORK STANDARD+VARIETY THOSE(WHICH+ARE) HORMONES FOR+OF(M) GROWTH SPECIFICITY FORM-S CIRCLE GROW-ED-S IN LUMP-HUMP EARLIER-POINT-S TIME-POINT-S AUTUMN WHEAT SPRING WHEAT OVULE CHROMOSOME TEMPERATURE-S MINIMUM-S ANNUAL MALIC ACID DECOMPOSITION
S1 S1 S2 S4 S5
DIVISION OF+THE CHROMOSOME-S
S19
WHICH+BE-DIVISION WHICH+DIVISION+BE-SPERM OSMOTIC PRESSURE-THAT+WE+ARE+TALKING+ABOUT TETRAPLOID DIPLOID
S20
S5 S6 S9 S11 S12 S12 S14 S15 S16 S18
S20
S20 S20
Notes: 1. It will be recalled that words that are chunked all in one piece count as a one-word phrase. 2. Phrases given in square brackets in the above table are carried through unchanged to the raw output.
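The substitution recorded in Table 7.3 can be pictured as a longest-match replacement over the sequence of pidgin words, with + standing for the phrase connector. The Python sketch below is only an illustration and not a record of how the substitution was carried out in the experiment: the function name `apply_phrases`, the three dictionary entries (read off the first rows of the table) and the sample word sequence are assumptions made for the example.

```python
def apply_phrases(words, phrase_dict):
    """Replace the longest matching word sequence at each position.

    `phrase_dict` maps tuples of pidgin words to a technical phrase.
    Words that start no known sequence are copied through unchanged.
    """
    max_len = max((len(key) for key in phrase_dict), default=1)
    out, i = [], 0
    while i < len(words):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in phrase_dict:
                out.append(phrase_dict[key])
                i += n
                break
        else:
            out.append(words[i])
            i += 1
    return out


# Hypothetical entries, paired as in the first rows of Table 7.3.
TECHNICAL_PHRASES = {
    ("ENTER", "IN", "REST"): "BECOME+DORMANT",
    ("IN", "AUTUMN+HARVEST(TIME)"): "IN+AUTUMN",
    ("TOGETHER-WORK",): "CO-OPERATION",
}

# Hypothetical raw word sequence, made up for the illustration.
raw = "VINE ENTER IN REST IN AUTUMN+HARVEST(TIME)".split()
print(" ".join(apply_phrases(raw, TECHNICAL_PHRASES)))
```

Trying the longest sequence first matters here: IN on its own begins two different entries, and only the longer match decides which phrase is meant.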
S3. THAT+ONE+WHICH+IS DISEASE COME-S THUS VERY RAPID UP AND HAS IN MANY CASE-S AN(E) TOTAL AMISS CROP THEN DID-FOLLOW.
S4. OTHER-S FOUR FOREIGN COUNTRY-OUT+OF STAND-POINT STANDARD+VARIETY. HAVE+SHOWN+THEMSELVES CULTIVATION VALUE-WARD VERY INSECURE-BECOMING.
Table 7.4. Unilingual phrases (i.e. phrases that are presumed to justify themselves in use because of some characteristic of the source language)

Language-of-origin   Phrase                   Pidgin                                  Sentence number
Albanian             BECAUSE+OF               FROM WHEREOF REASON                     S1
Finnish              HAVE+SHOWN+THEMSELVES    ARE SHOW ONESELF-ED+BEEN-THEY           S4
French               IT+IS+NOT+ASTONISHING    THAT+(ONE)NOT-IS REALLY ASTONISH-ING    S5
French               TO+SUPPOSE               FOR OF (M) SUPPOSE-ING                  S5
French               WHILE                    WHEN(W)THAT                             S5
French               OTHERS                   FOR OF (M) OTHERS                       S5
French               THINK+OF                 BROOD(S) ON                             S5
German               BELONG                   TO (AD)HE(A)R(E) . . . AT               S6
Italian              IT+HAS+BEEN+PROVED       IS STATUS PROVE-ED                      S9
Latin                IT+IS+POSSIBLE           POSSIBLE IS                             S10
Latvian              WE+CAN                   IS-ABLE WE                              S11
Latvian              CONCLUDING               FIXED SPEAK-ING                         S11
Portuguese           INSIDE                   WITHIN OF                               S14
Spanish              IN+FACT                  ON JUSTIFICATION                        S16
Japanese             BY+COMPARISON            THERE+(AT)+TO                           S20
S5. IT+IS+NOT+ASTONISHING TO+SUPPOSE(W)THAT GROWTH+HORMONES ACT-THEY ON CERTAIN-S SPECIE-S WHILE THEY ARE NON-OPERATE-ING ON OTHERS-S, IF ONE THINKS+OF+THAT+ONE+WHICH+IS GREAT SPECIFICITY FOR+OF(M) THOSE SUBSTANCE-S.
S6. IF IN AN(E)-WARD LARGE-ER AREA TWO FORM-S BESIDE ONE ANOTHER LIVE WITHOUT SELF TO(O) MIX, SO BELONG-THEY DIFFERENT-S FORM+CIRCLE-S.
S7. THE SMALL BERRY-LIKE VARIETY-S SO CROP QUANTITY-IN-REGARDING, AS DRY MATTER YIELD+IN-REGARDING, SURPASS-THEY GREAT BERRY-LIKE VARIETY-S.
S8. CAUSE SOW-ING-S SOMETIMES THUS ENORMOUS BE DAMAGE-ED TILL SO OUGHT BE-SOW ONCE AGAIN.
S9. IT+HAS+BEEN+PROVED (W)THAT THE CEREAL-S OF WINTER GROWN+UNDER+GLASS SHOW-THEY LITTLE RESISTANCE TO COLD WHILE THE SAME GROW-ED-S IN FIELD OPEN ARE MUCH MORE RESISTANT-S.
S10. IT+IS+POSSIBLE, HOWEVER NOT PROVE, ALL FORM SAME-ISH FAMILY FROM SAME FORM ARISE DRAW-TO+HAVE.
S11. HOWEVER WE+CAN ALREADY CONCLUDING CONCERNING OUR RIVER OAK FOREST-S TYPE EXTENSIVE-ER-ISH SPREAD-THEY IN+FORMER+TIMES, AS ALSO CONCERNING THIS, THAT THIS FOREST DYING+OUT-WISE, AT LEAST THROUGH PART-WISE, BASIS-POINT BEEN CLIMATIC REASON-S.
S12. GROWTH-S OF AUTUMN+WHEAT WAS MORE VARIABLE-S FROM YEAR TO YEAR THAN GROWTH-S OF SPRING+WHEAT.
S13. DIRECTION-S BENDING-TRUNK, ANSWER-THEY DIRECTION-WISE-S DOMINANT-ISH-S WIND-ISH-S AND IT BEHOVES JUDG-ING THAT SWORD-SHAPED-MENT IS CAUS-ED-BEEN THROUGH WIND-S.
S14. (TO)THE EXISTENCE OF AN(E) NUMBER VARIABLE OF SEED-S INSIDE FRUIT SHOW (W)THAT THOSE+WHICH+ARE VARIOUS-S OVULE-S OF THIS PLANT HAS IDENTICAL POSSIBLE-MENT OF SELF DEVELOPMENT.
S15. CHROMOSOME-S BARLEY-ISH-S CULTIVATED ARE OF AN(E) DIAMETER MORE GREAT THAN THOSE BARLEY-ISH-S WILD.
S16. THE STUDY OF THAT+ONE+WHICH+IS DISTRIBUTION OF MINIMAL+ANNUAL+TEMPERATURE, AS IS OBVIOUS IN ALL WORK, REDUCE IN+FACT TO THAT+ONE+WHICH+IS DENSITY OF THOSE+WHICH+ARE STATION-S, AND TO RECORD OF OBSERVATION-S OF EACH AN(E) OF THEM.
S17. IF+ALL+ROUND EARTH BEEN FREEZE-ED LONG AND DEEP, HAS NO INJURY OF CLOVER ROT GET-ED-BEEN.
S18. ENTIRE ACIDITY VIEW-FROM ALWAYS RICH IS+BECOME-NOT WINE-S THAT+WE+HAVE, BECAUSE MALIC+ACID+DECOMPOSITION CONDITION DESIRE-WISE SUITABLE-IS-NOT.
S19. AND OCCUR TIME-S CHROMOSOME+DIVISION WHICH+BE-LIMITED-IS(H), AND THAT PERIOD OF+SPERMATOGONIAL+DIVISION WHICH+BE-LAST, RESULT-IS(H) OF+THE OCCURRENCE-S MITOTIC-IS(H).
S20. THIS ENDURE COLD (SEX)DISPOSITION-ISH DIFFERENCE-IN+REGARDING, AS FOR TETRAPLOID-THAT+WE+ARE+TALKING+ABOUT DIPLOID BY+COMPARISON (CON)SORT, UN+TOGETHER+WITH THE+OSMOTIC+PRESSURE HIGH-LY BECOMING-IS, THE+FACT+IS+WISE LARGE-LY REASON WHEN WITH CONSIDER-IS+BEEN.

3. Control translation into mechanical pidgin of a portion of Caesar’s Gallic War, Book I, as generated by a fully mechanised machine translation program

Since Caesar’s Gallic War is famous as a text for translation, no English translation is appended. The pidgin generated was not further sophisticated but was left as it was.

3.1. Input Latin text

Caesar – The Gallic War
Apud Helvetios longe nobilissimus fuit et ditissimus Orgetorix. M. Messalla et M. Pisone consulibus regni cupiditate inductus coniurationem nobilitatis fecit et civitati persuasit, ut de finibus suis cum omnibus copiis exirent: perfacile esse, cum virtute omnibus praestarent, totius Galliae imperio potiri. Id hoc facilius eis persuasit, quod undique loci natura Helvetii continentur: una ex parte flumine Rheno latissimo atque altissimo, qui agrum Helvetium a Germanis dividit; altera ex parte monte Iura altissimo, qui est inter Sequanos et Helvetios; tertia lacu Lemanno et flumine Rhodano, qui provinciam nostram ab Helvetiis dividit. His rebus fiebat ut et minus late vagarentur et minus facile finitimis bellum inferre possent; qua ex parte homines bellandi cupidi magno dolore adficiebantur. Pro multitudine autem hominum et pro gloria belli atque fortitudinis angustos se fines habere arbitrabantur, qui in longitudinem milia passuum CCXL, in latitudinem CLXXX patebant. His rebus adducti et auctoritate Orgetorigis permoti constituerunt ea quae ad proficiscendum pertinerent comparare, iumentorum et carrorum quam maximum numerum
coemere, sementes quam maximas facere, ut in itinere copia frumenti suppeteret, cum proximis civitatibus pacem et amicitiam confirmare. Ad eas res conficiendas biennium sibi satis esse duxerunt; in tertium annum profectionem lege confirmant. Ad eas res conficiendas Orgetorix deligitur. Is sibi legationem ad civitates suscepit. In eo itinere persuadet Castico, Catamantaloedis filio, Sequano, cuius pater regnum in Sequanis multos annos obtinuerat et a senatu populi Romani amicus appellatus erat, ut regnum in civitate sua occuparet, quod pater ante habuerat; itemque Dumnorigi Aeduo, fratri Diviciaci, qui eo tempore principatum in civitate obtinebat ac maxime plebi acceptus erat, ut idem conaretur persuadet eique filiam suam in matrimonium dat. Perfacile factu esse illis probat conata perficere, propterea quod ipse suae civitatis imperium obtenturus esset; non esse dubium, quin totius Galliae plurimum Helvetii possent; se suis copiis suoque exercitu illis regna conciliaturum confirmat. Hac oratione adducti inter se fidem et iusiurandum dant, et regno occupato per tres potentissimos ac firmissimos populos totius Galliae sese potiri posse sperant.
3.2. Chunked (Kay and McKinnon-Wood, 1960) Latin text with corresponding English translations (sample page)

[COMPAR [ARE [, [I [IUM [IUMENT [UM [ENT [ORUM [OR [UM ET [CARR [OR [ORUM [UM [QUAM MAXIM [UM [NUMBER [UM [COEM [ERE , [SEMENT [ES

GET+TOGETHER -TO , BEAST+OF+BURDEN -THEY -S OF AND CHARIOT -S OF AS+MUCH+AS+POSSIBLE NUMBER+OF BUY UP -TO , SOWING -
QUAM MAXIM AS [FAC [ERE , UT IN ITINERE [COPI [A [FRUMENT [I [SUPPET [ERET , CUM [PROXIM [IS
AS+MUCH+AS+POSSIBLE MAKE -TO , THAT ON+THE+JOURNEY RESOURCES CORN BE+MADE+AVAILABLE -MIGHT , WHEN(ITH) THE+NEAREST -
Note on the symbolism of the dictionary and pidgin output: + connects words forming an output phrase. - connects word stems and their appropriate output inflexions (see rule below). ‘(’ and ‘)’ indicate a particular type of pidgin variable, as in the case of (w)that in Section 2.2. In this output these are either variables ambiguous as to number, i.e. part(s), or variables in meaning, e.g. (ap)prove or when(ith). The last, for the Latin cum, is to be understood as when or with depending on its context.

3.2.1. Rule used to decide cases of multiple chunking

Many words in the preceding dictionary are chunked in a number of ways. iumentorum, for example, has pidgin output for the following forms [I, Ium, Iument, um, ent, orum, or, um]. If the outputs for all these were inserted in the pidgin translation of the passage, the reader would have to make choices as he or she read. My word-for-word method precludes the examination of the context of iumentorum in order to determine a unique output, and hence the correct chunking of the Latin word. I therefore adopt the following rule to determine the correct chunking from the dictionary entry. I assume that the chunking procedure allows us to distinguish inflexion chunks (those followed by a space in the text) from the others, which I call stem chunks.

RULE: take the longest inflexion chunk (i-chunk for short), in the dictionary entry for the word. Write down the output for the
corresponding stem chunk unless this is written only in chunked form, in which case repeat the procedure. Then write down the output for the longest inflexion chunk. Example: CUPIDITATE has entries for CUPID, CUPIDIT, IT, AT, E. Longest i-chunk is E, but corresponding s-chunk CUPIDITAT does not occur. Repeat for AT: Corresponding s-chunk CUPIDIT exists chunked, but also entire. Thus write down outputs for CUPIDIT, AT and E.
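The rule can be read as a short recursive procedure. The Python sketch below is one interpretation of it, under the assumption that an i-chunk of a word is any dictionary entry that ends the word, and that a stem exists ‘entire’ when it is itself an entry. The function name `resolve_chunking` and the representation of a dictionary entry as a set of strings are illustrative choices, not part of the original program.

```python
def resolve_chunking(word, entries):
    """Return the chunks whose outputs are written down, stem first.

    `entries` is the set of chunks listed in the dictionary for `word`.
    """
    if word in entries:
        # The stem exists entire: write down its output and stop.
        return [word]
    # Candidate i-chunks: non-empty entries that end the remaining word.
    i_chunks = [e for e in entries
                if e and len(e) < len(word) and word.endswith(e)]
    if not i_chunks:
        raise ValueError(f"no chunking found for {word!r}")
    longest = max(i_chunks, key=len)
    stem = word[:-len(longest)]
    # Repeat the procedure on the stem, then add the inflexion chunk.
    return resolve_chunking(stem, entries) + [longest]


print(resolve_chunking("CUPIDITATE", {"CUPID", "CUPIDIT", "IT", "AT", "E"}))
print(resolve_chunking("IUMENTORUM",
                       {"I", "IUM", "IUMENT", "UM", "ENT", "ORUM", "OR"}))
```

On the entries quoted above, the sketch selects CUPIDIT, AT and E for CUPIDITATE, as in the worked example, and IUMENT and ORUM for IUMENTORUM.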
3.3. Output English translation
Among the+Swiss by+far noble-est was and rich-est the+chief+Orgetorix. He, during+the+consulate+of+M.Messalla and+M.Piso kingdom desire-s induced conspiracy persuaded-s, that or limit-s own when(ith) all-s resources might+go+out-they: a+mere+nothing to+be when(ith) strength all-s excel-they-they+might, the+whole+of Gaul control to+gain+the+mastery+of. That+thing this the+more+easily to+them persuaded-s, in+respect+of+which on+all+sides+the+nature+of+the locality+the+Swiss contain they+are one however river the+Rhine wide-est and high-est, who district Switzerland from the+Germans divide-s; the+other however mountain(s) the+Jura high-est, who is between the+Seine dwellers and the+Swiss; third lake Leman and river the+Rhone, who province our from the+Swiss divide-s. to+all+this result-was that and less widely wander they+might+be+able: in+which+respect man war to desirous great grief brought they+were+for+the+sake+of honour war and bravery narrow-s self limit have-to declare-were. Who in length miles CCXL, in width CLXXX lie-they+were. From+all+this were+led and by+the+authority+of the+chief+Orgetorix excited fixed ad+they those+things+which to set+out to tend they+might get+together-to, beast+of burden-s+of and chariot-s+of as+much+as+possible number+of buy+up-to, sowing as+much+as+possible make-to, that on+the+journey resources corn be+made+available-might. When(ith) the+nearest the+state-s peace and friendship confirm-to. To the matter+accomplish-to two+years self enough to+be considered-ed+they: in third year an+expedition by+law confirm-they. To the+matter accomplish-to the+chief+Orgetorix choose-is. He self deputation to the+state-s undertook-s. On+the+way persuade-s Casticus, the+chief+Catamantaloedes son, a+Seine-dweller, of+whom father kingdom in the+Seine-dwellers many-s year-s possessed-had and from the+senate of+the Roman+people friend called there+was that kingdom in the+state-s own occupy-might, in+respect+of+which father before had-had; and+besides Dumnorix a+Haeduan, brother the+chief+Diviciacus, who at+that+time the+predominant+influence in the+state-s obtain-was and+also mostly the+people acceptable there was. That the+same+thing-attempt-might+be persuade-s and+they daughter own in+marriage gives. A+merely+nothing to+do to+be that (ap)prove-s attempt-s finish-to, because he+himself own the+state-s control obtain-would might+be:
not to+be doubt, but+that the+whole+of Gaul to+do+a+great amount the+Swiss they+might+be+able; himself own resources and+own army that kingdom secured-would confirm-s by+this+speech+were+led from+one+another pledge and oath give, and kingdom occupied three powerful-est-s and+also strong-est people-s the+whole+of Gaul self to+gain+in+the+mastery+of to+be+able hope-they.
3.2.2. ‘Garbage’ production generated by Caesar pidgin dictionary

In order to correct the misleadingly good impression conveyed to uninformed outsiders by the output given above, two sample translations of a passage from Newton’s Principia and the first seven lines of Virgil’s Aeneid follow. The former was chosen partly because it represents scientific writing in Latin, and also because the word order is far more closely related to English than classical Latin. It was hoped that this test would bring out the extent to which a word-for-word translation is affected by word order.

Latin Text 1: Newton’s Principia Mathematica, Book 1, Proposition LIX, Theorem XXII

Corporum duorum S & P, circa commune gravitatis centrum C revolventium, tempus periodicum esse ad tempus periodicum corporis alterutrius P, circa alterum immotum S gyrantis, & figuris, quae corpora circum se mutuo describunt, figuram similem & aequalem describentis, in subduplicata ratione corporis alterius S, ad summam corporum S + P.

Pidgin translation

Body of+two S and P, about common centre+of+gravity C revolve-they, time periodic to+be to time periodic body one+or+the+other P, about the+other unmoved S circle-they, and form, which body about self mutually describe-they, form like and equal describe-they, in square+root the+reckoning body the+other S, to whole body S+P.

Full translation, for reference, from Motte

The periodic time of two bodies S and P revolving around their common centre of gravity C, is to the periodic time of one of the bodies P revolving round the other S remaining fixed, and describing a figure similar and equal to those which the bodies describe about each other, as √S is to √(S + P).
Note on the experiment: This experiment was carried out by hand; the Latin dictionary made for the Caesar text was used, with additions for the new words. Thus no attempt was made to construct a special dictionary; even when new words were added to the dictionary, they were given as widely applicable translations as possible, for example, ‘form’ for ‘figur-’. The only exceptions were words that do not occur in classical Latin at all, such as ‘subduplicata’ (‘square root’) and ‘gravitatis centrum’ (‘centre of gravity’).
Latin Text 2

Arma virumque cano, Troiae qui primus ab oris
Italiam fato profugus Laviniaque venit
litora, multum ille et terris iactatus et alto
vi superum saevae memorem Iunonis ob iram,
multa quoque et bello passus, dum conderet urbem
inferretque deos Latio, genus unde Latinum
Albanique patres atque altae moenia Romae.

Pidgin translation

Arms man-and sing, Troy who first from the+shore Italy destiny fugitive Lavinia-and come-s the+shore, much that on earth/terror tossing and high strength higher furious remembering Juno on+account+of rage, much also and war step/suffered/outspread, while found-might city bring+in-s and the+Gods Latin, race whence Latin Alba-and ancestors and high wall Rome.

Full translation, for reference, from the edition of Page, Capps, Rouse, Warmington, Post and Rushton Fairclough

Arms I sing and the man who first from the coasts of Troy, exiled by fate, came to Italy and Lavinian shores; much buffeted on sea and land by violence from above, through cruel Juno’s unforgiving wrath, and much enduring in war also, till he should build a city and bring his gods to Latium; whence came the Latin race, the lords of Alba, and the walls of lofty Rome.
4. After five years (written in 1965 by the Editor)
By converting the program referred to in Section 3 from punched-card form to computer form, more extended mechanical pidgin translation experiments (Masterman et al., 1960) could obviously have been done. However, from the experiments we had done, we considered that mechanical pidgin translation had been tested to destruction. The point of breakdown was this: semantic ambiguity can indeed be damped down by creating a very large number of particular phrases, but these do not help the getting out of the generalised bits of information that make up the message. To show this, imagine the length of these phrases progressively extended, to clause length, sentence length, paragraph length and finally text length. With each extension the content will become more particularised, whereas what was needed, from the start, was to have it more general. Nor will unilingual syntactic analysis supply the right type of generality, though it may supply data for it; for, notoriously, the same bit of information can be differently expressed, both with regard to vocabulary and with regard to syntax.
The way forward is:

1. To accept the conclusion derivable from the mechanical pidgin translation experiments that the phrase and not the word is the semantic unit of translation;
2. To make the machine cut the source text up into phrases (using syntactically and/or phonetically derived data), and then to do a dictionary match of these with a mechanical pidgin phrase dictionary in which classes of phrases are coded into sequences of pidgin variables (e.g. into sequences of elements of Richens’ NUDE). As Alice found, in Through the Looking-Glass and What Alice Found There, after she had finished reading the poem Jabberwocky, the essential enterprise in deciphering a foreign text in an unknown language is to get hold of the bits of information of which the message basically consists. Examples of such ‘bits of information’ are: that a past action has occurred; that a comparison is being made between tetraploids and diploids with regard to the capacity of each to endure cold; that a statement is being made by somebody about something; that, as Alice said, ‘somebody killed something’, and so on. These bits of information are more fundamental than the grammatical and syntactic features of the text.
3. To assign to these sequences of pidgin variables a mathematically determinate recursive structure that can also be interpreted semantically as a mechanical pidgin structure (Wilks 1965c). Thus the notion of a mechanical pidgin variable is abstracted from that of an English pidgin variable; and the notion of the structure of a mechanical pidgin from that of a simplified English grammar and syntax.
4. To print, as first output, the structured concatenation of sequences of pidgin variables; each such sequence conveying a bit of information. This will be the message.
5. To convert this output by some phrase-construction program into a sequence of phrases in the target language.
This generalisation of the idea of a mechanical pidgin forms the basis of our present machine translation research program. Stage 5 has not been worked on as yet. [Editor’s note: this paper was first written as a research memorandum with Martin Kay in 1960. The version above was re-edited for publication by this Editor in 1967 for the Locke and Booth volume (1955).]
8
Translation
The purpose of this chapter is to present a philosophical model of real translation. ‘Translation’ is here used in its ordinary sense: in the sense, that is, in which we say that passages of Burke can be translated into Ciceronian Latin prose, or that the sentence ‘He shot the wrong woman’ is untranslatable into good French. The term ‘philosophical’, however, needs some explaining, since, so far as I know, no one has made a philosophical model of translation as yet. I shall call a model of translation ‘philosophical’ if it has the following characteristics:

1. It must not only throw some light on the problem of transformation within a language, but must deal also with the problem of reference to something. That is to say, it must relate the strings of language units in the various languages with which it deals to public and recognisable situations in everyday life. It is characteristic of philosophers that, unlike most linguists, they do not regard a text in language as self-contained.
2. It must deal in concepts, not only in words or terms. All philosophers believe in concepts, though they sometimes pretend not to.
3. It must face, and not evade, the problem of constructing a universal grammar, while yet recognising fully how greatly languages differ, and how peripheral is the whole problem of determining the nature of language.
4. It must deal in word uses, that is, with words as they occur in their contexts: that is, it must face and not evade the problem of the indefinite extensibility of word meaning.

It is this last characteristic, philosophically speaking, that is the novelty, since it ties up my translation model not to philosophy in general, but to a particular kind of contemporary philosophy, namely, linguistic philosophy, the ‘philosophy of ordinary language’.

The philosophical relevance of this translation model, in my view, is twofold. Firstly, following lines laid down by C. H. Langford (1942), it can be used to solve Moore’s Paradox of Analysis. Secondly, following lines laid down by J. L. Austin both in his seminars and in his paper on ‘Excuses’ (1956), and following also, though less nearly, a line laid down by Wittgenstein in Part II of Philosophical Investigations (1958), it can be used to operate a Contrast Theory of
Meaning. This Contrast Theory of Meaning may well be only analogous to, and not the same thing as, the theory of meaning glimpsed at by Wittgenstein. Nevertheless, if the analogy of each theory of meaning presented here is admitted at all, the fact that it is possible to construct this translation model constitutes a far more fundamental answer than any given yet to the attack delivered on the ‘philosophy of ordinary language’ in Gellner, 1959. Thus the philosophical roots of this model of translation lie not in the older logic, but in the study of ‘ordinary language’. The system presented here is, however, a model in the sense that it can be operated, and yields results, either right or wrong; it is not just a piece of philosophical dictionary making, undertaken either for its own sake or to discredit generality.

1. Situations
Such a book as Charles Duff’s How to Learn a Language (1947) settles for me beyond doubt that not very clever differing-language speakers with minimal sign apparatus can understand one another – that is, translate to one another – if and only if they can both recognise and react to situations common to both of them in real life. What I want to say here is that, even when we know one another’s languages, we still do the same thing. It is important to side with the language teachers on this, and not with the behaviourist psychologists or the linguists; for either of these last two groups, starting from their own assumptions, can talk one into thinking that translation, in the ordinary sense, is impossible. But language teachers who teach translation know how it is that it can occur: the right hotel room is engaged, the puncture in the left back tyre is mended, the telegram is sent, the friend’s (unknown) friend is safely met at the station, all because, however little the people engaged know of each others’ languages, they know a very great deal about the relevant situation. And in so far as this knowledge of a common stock of situations breaks down, as it well might break down as between us and the termites, or between us and sulphur-breathing beings from another planet, then it is evident that, whatever the language involved, translation becomes impossible. We now have to consider the place in the model of the situations occurring in real life; indeed we have to consider how to portray them at all, given that situations in real life are so many and have such vague boundaries. Fortunately, a philosophical technique for situation portrayal has recently grown up, which is used by Anscombe for describing Wittgenstein’s Picture Theory of Meaning. It is used also, though less philosophically, by I. A. Richards and Molly Gibson in their
language-teaching series of books Language through Pictures (1956). This technique consists in portraying a situation in real life by a stylised stick-picture of the sort that is used in comic strips or in animated cartoons; moreover, it is a technique that can be logically examined and systematised, not completely, but to a greater extent than at first sight appears.1

Here is a brief description of the system. We will assume that the stick-pictures form a set, and that this set can be classified in the following way:

(1) Two or more distinct stick-pictures picturing the same basic situation will be called situationally similar. The set may then be partitioned into mutually exclusive and collectively exhaustive subsets that are situationally similar; the whole subset will then correspond to one basic situation. The principles according to which the subsets corresponding to one basic situation are distinguished from one another we shall call principles of basic situation contrast.
(2) We then partition each subset of situationally similar stick-pictures into subsubsets, the stick-pictures in each of which picture the same basic situation from the same angle; or, as we shall say, from the same aspect. The principles according to which the subsubsets of aspectually similar stick-pictures are distinguished from one another will be principles of aspect contrast.
(3) The principles of aspect contrast according to which one subset of situationally similar stick-pictures is partitioned will usually be partly the same and partly different from the principles according to which another subset of situationally similar stick-pictures is partitioned, but we can conflate all the principles of aspect contrast applicable to any of the basic situations and take them all as applicable to every subset corresponding to a basic situation, if we allow for empty subsubsets in the subset of situationally similar stick-pictures.
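The three steps above amount to a two-level partition of the set of stick-pictures, which can be sketched in code. The Python fragment below is only an illustration: the function `double_classify`, the card labels, and the situation and aspect names are hypothetical, chosen for the example rather than taken from the system itself.

```python
from collections import defaultdict


def double_classify(pictures, situation_of, aspect_of):
    """Partition pictures by basic situation, then by aspect.

    Every situation receives a cell for every aspect found anywhere in
    the set (the conflated principles of aspect contrast), so some
    cells, the subsubsets, may be empty.
    """
    aspects = sorted({aspect_of(p) for p in pictures})
    table = defaultdict(lambda: {a: [] for a in aspects})
    for p in pictures:
        table[situation_of(p)][aspect_of(p)].append(p)
    return dict(table)


# Hypothetical cards: (label, basic situation, aspect).
cards = [
    ("card-1", "grief", "front view"),
    ("card-2", "grief", "side view"),
    ("card-3", "self", "front view"),
]

table = double_classify(cards, lambda c: c[1], lambda c: c[2])
for situation, cells in table.items():
    print(situation, {a: [c[0] for c in cs] for a, cs in cells.items()})
```

Calling the function again with different `situation_of` and `aspect_of` functions gives a different double classification of the same cards, which is one way of picturing a change in the principles of classification.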
It is characteristic of the system that it may be partitioned at any time according to different principles of classification. For example, new principles of situation contrast and of aspect contrast may be used, so that two stick-pictures regarded from the old standpoint as situationally similar but aspectually dissimilar may be regarded from the new standpoint as similar or dissimilar, both situationally and aspectually, or as situationally dissimilar but aspectually similar. Moreover, two stick-pictures that were previously regarded as both situationally and aspectually similar may now be regarded as dissimilar in either or both of these respects. If such a repartitioning is made, the new arrangement of the system will, like the old, be a double classification system, but, of course, a different double classification system from the old one.

1 It is probable, indeed, that far more can be done along this line than I have at present done. The rules and examples of the crude stick-picture situation system given below are the result of the enterprise of interlingualising and generalising the first stages of the Language through Pictures books, and were made for the purpose of interlingualising a crude procedure for mechanical translation.

Figure 30

What does this come to in terms of real life? We assume that, in life, we can recognise, and distinguish from one another, basic situations. Of these basic situations, three are stick-pictured in Figure 30, namely, that of someone showing grief, that of someone pointing to himself, and that of someone thinking about himself. We further assume that, in real life, basic situations are logically independent and all of equal weight, except that, logically, they go in contrasting pairs: for example, ‘Laughter, Grief’, ‘Self, Other Man’, ‘Food, Drink’, ‘Birth, Death’, ‘War, Peace’. Pairs of such basic situations can be resorted, but only into other contrasting pairs. Situation series can also be built up (e.g. all those stick-pictures that have human beings in, or all those stick-pictures in which the sun is shining), but these series also will build up into contrasting pairs.

Thus, if we make ourselves a pack of cards on the specification of the system given above, each card having one stick-picture, and each stick-picture portraying one and only one aspect of a basic situation, we shall find at the end that we have a double contrast pack of cards. Such a pack, as it stands, will be objective, in the sense that it will readily be sorted by differing players into the same sets, and these sets can be subsorted. If, however, it is desired to resort the pack, or any part of it, it will be found
Translation
that all the resorted cards will have to be subtly redrawn in order to bring up, or play down, new resemblances and contrasts between them. In my view, this resorting and renoticing process is just what we do in real life, when we perceive a situation, as we say, ‘from a new angle’. According to many people – following lines associated in linguistics with Whorf (1950) and, in philosophy, with Waismann (1953) – this is also what we do when we start to think in a new language; the new language, which will use different sorting principles, will actually make its user notice different features of the world; he or she will see the world differently. My novelty in all this lies in introducing into the renoticing and resorting process a general principle of making contrasts in pairs. We now turn to the mechanics of the ideography. It is evident that, if situation contrasts and aspect contrasts of a stick-picture system are ever to be describably redrawn, their portrayal in the first place cannot be given by any feature alone; it must be given multiply. If, in an ideographic system for sorting and resorting cards by contrast, there are no units or elements of the system that can be inserted, transformed or removed, no change in the contrasts derivable from the system can ever be made. The rules of the system, then, as they appear to the artist, will differ from the rules of the system as they appear to the sorter or resorter; the two will deal in differing units. The unit for the sorter or resorter is the card; for it is from the reshuffling of the cards that he or she will have to build up new basic situations. The unit for the artist is any visual feature of a stick-picture that they find by experience and that they use recurrently when making cards. This recurrent visually representational unit of the artist’s I shall now call, extending Peirce’s use of the term (1958), an icon. 
This definition enables me to say that the rules of the system, as they appear to the artist, consist in an icon glossary together with a basic set of ideography-making principles; whereas the rules of the system, as they appear to the pack user, consist of discoveries made from their knowledge of the world and of languages, as to which combination of stick-picture cards are likely to occur together and which are not, and which extensions to the contrast system can or cannot be made. Since I want to pass quickly from the situation system to the rest of the model, I will now leave the pack user on one side and concentrate on the artist. There is neither need, nor space, to give a complete icon glossary. Inspection of a small section of it, however, will make the nature of the whole system much clearer: A free cloud,
must not be confused with a tied cloud.
A free cloud stands for any abstraction from the objects that are within it, whereas a tied cloud, attached to a man’s head, contains his thoughts given as images (see the right-hand stick-picture in Figure 30). A stick-picture with an arrow
represents some sort of change or motion or action. A stick-picture without an arrow, that is, with a blank background, represents a quiescent state (see the left-hand stick-picture in Figure 30). A stick-picture man with eyes only represents a participant in a situation or action, as opposed to an onlooker at it. A stick-picture man with eyes and mouth represents the doer of some action. A stick-picture man with features is a participant in some action; a stick-picture man without features merely exemplifies some situation. A stick-picture man whose hands and feet turn up is in a state of liveliness, whether of action or of movement. A stick-picture man with hands and feet turned down is limp; he is in an inert and quiescent state . . . And so on.

Consider now what happens when you are teaching anything by pictures to someone with whom you have no common language. You go on building up your picture or pictures, adding more and more realistic last-minute touches, and with your informant still looking blank, until suddenly communication is established. You will not yourself know, though you can sometimes guess, just what extra icon, what particularised spontaneous last-minute flick, adapted to his culture, caused your informant suddenly to fling up his hands in delight, burst into a flood of (to you) incomprehensible verbal expression, seize the chalk, and himself continue drawing the rest of the picture or pictures, correctly and without prompting. The point is, once he understands anything, he understands everything. Once he understands that the point of the picture that you are drawing for him is that it depicts, in every conceivable way, a sudden catastrophe, he will understand also that the exclamation mark – that icon, even in Peirce’s
sense, of sudden explosion – that has been figuring prominently in a corner of the picture throughout – is the icon – in the new extended sense – that is to be used for all situations of sudden catastrophe. He will understand this, and be able to act on it, even though there is no one word in his language for ‘sudden catastrophe’ and therefore no counterpart of the exclamation mark icon; for sudden catastrophes occur in his civilisation also. But it is the picture, or set of pictures, that makes him understand the icon, not the icon the picture. And so there has to be something in the picture that is instantly noticeable and recognisable to both draughtsman and viewer as a concomitant of catastrophe. It might be that the stick-picture man’s hair is standing right on end (although ‘hair standing on end’ has never yet been an icon), or that his face, beyond any doubt, expresses horror, or that the house is clearly on fire beyond putting out, or that the atom bomb has actually gone up – any or all of these – the point is that once something in the picture has been recognised as catastrophic, all the rest of the symbolism of the picture, by contagion, becomes catastrophic also. This is the principle upon which all comic strips, and all animated cartoons, in fact work; only in these, communication, both of mood and content, is so subtly achieved that the viewer can never consciously think back to what it was that first made them understand what was meant but never verbalised. Now, it is a come down from the brilliance of Walt Disney to the crude touched-up Language through Pictures stick-picture system described here. Also, a language-teaching stick-picture system, unlike a cartoon, has not only got to tell a story; it must be charged with the message that aspect indicators are the pegs upon varying combinations of which to hang various information-carrying devices used by languages. 
But it should now be clear that, however crude the system, quite complex multiple iconic contrast between two stick-pictures becomes securely recognisable by readers from different cultures. Here, for instance, is the iconic layout of the aspect contrast between a movement and a static posture; more generally, between an action and a state:

Basic picture: A stick-picture man is lying (down) on a bed.

Active aspect:
1. There is an arrow somewhere in the picture, indicating the movement the stick-picture man is making.
2. The stick-picture man’s head, hands and feet all turn up; they show perkiness.
3. The stick-picture man has eyes.
4. The stick-picture man has a mouth.
5. If possible, the bed is being bounced on; but this is very difficult to draw.

Quiescent aspect:
1. There is no arrow in the picture.
2. The stick-picture man’s head, hands and feet hang down; he looks limp; (but he is not sprawling, i.e., he is not dead).
3. The stick-picture man has no eyes and no mouth.

Nor is this an exceptionally complicated aspect contrast to build up; many of the others are far worse. Here, to conclude, are the basic principles of stick-picture making in this system:

1. Any aspect contrast, in order to be understood by speakers from different cultures, must be overdetermined.
2. Any icon in the system can occur also as a complete picture, and any complete picture can occur also as an icon. (See, in Figure 30, the two positions of the picture of the man pointing at himself).
3. There must always be something in common between any pair of icons, if this pair of icons is to convey an iconic contrast; and there must always be something in common between any pair of stick-pictures, if this pair of stick-pictures is to convey an aspect contrast.
4. The icon contrasts in the iconography cannot coincide with the aspect contrasts in the aspect system, since the latter are built of the former.
5. The icon glossary, together with its rules of use, cannot completely specify the aspect contrast system (this comes to saying that the whole ideography cannot be used to specify itself fully), since the way must always be left open for the stick-picture artist to add or alter any particularising last-minute touches, designed for some special culture, in order to make some basic situation or aspect recognisable to speakers of that culture.

2. Concepts
Let it be assumed that there can be constructed one set of situationally similar stick-pictures for each logically independent head (or paragraph, or topic) in Roget’s Thesaurus.2 Let the overlap of meaning of the total set of word uses in such a head be called a concept.
2 As Roget’s Thesaurus stands, the heads in it are by no means logically independent. Many of them, on the contrary, are logically connected; for instance, in the series 360 DEATH, 361 KILLING, 362 CORPSE and 363 INTERMENT, the last three are all aspects (in the sense of aspect that I wish to define in this chapter) of the first. It is possible, however, by using a system of tags, to streamline Roget’s Thesaurus in such a way as to leave only heads that can be taken as logically independent. This question will not be further discussed here.
Let it be assumed also that the contrasts between the different subparagraphs, rows of word uses and even smaller strings of word uses that are separated by semi-colons as subdivisions of rows, in any head in Roget’s Thesaurus, can be defined in terms of the aspect contrasts of the stick-picture system; either as single aspect contrasts, or as alternations of aspect contrasts, or as conjunctions of aspect contrasts. Let it be assumed further that any synonym dictionary, in any language, could be similarly defined in terms of the stick-picture system, due allowance being made for the facts that both the stick-picture sets might have to be resorted, and the word use distinctions within the heads might have to be specified by using different combinations of aspect contrasts. In so far as these three assumptions are true, it follows from them and from what I have said earlier about situations that we have now a general and interlingual way of constructing a meaning-contrast system that is interpretable as a synonym dictionary in any language. How this interpretation operates will become clearer in the course of describing how the whole model operates. There remains the need, however, to justify making the interpretation at all. That is to say, if this model of translation is to be philosophical, there is a need to show the sense in which Roget’s Thesaurus is a philosophical document, as well as a synonym dictionary written in English. And if this model is to be a model of real translation, there is a need to show the connection between aspect contrasts occurring between stick-pictures and basic devices for carrying information in various languages. Let us take the philosophical matter first. Langford, in his article on ‘The Notion of Analysis in Moore’s Philosophy’ (1942), explains philosophical analysis, both of language and of thought, in terms of a characteristic, both of language and of thought, which he calls ‘being idiomatic’. 
Both verbal expressions and ideas, he says, can be idiomatic; an idea is idiomatic if it is ostensively defined – that is, if you cannot give its meaning by applying the language’s rules. The purpose of analysis is either to mitigate, or to remove an idiom, the analysandum being presumed to be always more idiomatic than the analysans. Thus, though in one sense of meaning, the analysandum and the analysans are synonymous, in another sense of meaning they are not, since the analysandum is always more idiomatic than the analysans. And so the Paradox of Analysis is solved, because a philosophical analysis, if correct, is not trivial; it does not only assert a bare identity; it asserts also a decrease in idiomaticness. Now, ignoring various troubles that Langford gets into owing to his having two conceptions of analysis (the first applying to concepts, or ideas, the second to verbal expressions) in both of which analysis consists in decrease in idiomaticness, I want to examine, but examine critically, his
central notion of ‘being idiomatic’, which is common to both. My first contention is that this is by its nature an empirical notion, deriving not from any philosophical or logical root, but from the detailed, day-to-day study of languages. In what way, then, can an idea, or concept, be idiomatic? It is idiomatic, says Langford, if it has to be ostensively defined. This contention of his is wrong, though, in two ways. Firstly, idioms are just that part of language that is never ostensively defined. If someone asks me, ‘What does ‘‘It’s raining cats and dogs’’ mean?’, I say, ‘It means the same as ‘‘It’s raining very hard’’ ’; and that whether I am defining the idiomatic verbal expression or the idiomatic idea; I do not turn dumb, and drag him to the window. On the other hand, if he goes on to ask, ‘And what does ‘‘It’s raining very hard’’ mean?’ and if he persists in doubt, I do, in the end, have to drag him to the window. So the second way in which Langford’s definition of an idiomatic idea or verbal expression is wrong is that it defines all the ideas and all the verbal expressions in language except the idioms. For both word uses (Langford’s verbal expressions) and concepts (Langford’s ideas) have to be ostensively defined; idioms do not; on the contrary, they have to be mitigated or removed, as he says, by analytic definition. Suppose now, in an attempt to save Langford’s general position, we extend his notion of being idiomatic to all words, instead of only to idioms in the narrow sense, it being now only required of the words of any language that some words shall be ascertainably more idiomatic than others. Following Langford, we shall now have to extend it similarly to all concepts. Suppose further that we accept that the notion of ‘idiomatic’ is indeed an empirical one, to be explained in terms of ostensive reference to situations outside language, and that whether it applies to verbal expressions or to concepts.
We shall then be compelled to have empirical concepts; that is, we shall be compelled to have a new conception of the nature of a concept. If we can do this – and only if we can do it – can we solve, along Langford’s lines, Moore’s Paradox of Analysis. Let me put this point another way. Langford’s ‘idiomatic ideas’ cannot, by the nature of the case, be the ordinary concepts. Consider: we see a male; we then talk about ‘the concept of maleness’. But whoever talked about ‘the concept of raining-cats-and-dogs-ness?’ Clearly, to talk sense about this last, we want a second, more fundamental type of concept; in my words, a concept corresponding to a basic situation (the situation of wet weather, portrayed in all versions and seen from all angles, including picturesque ones) rather than, like maleness, a concept corresponding sometimes to a basic situation but mainly to an aspect. Let us now, with the need for generalised idiomatic concept finding in our minds, re-examine the basic situation already stick-pictured, that is, the basic situation of grief: of someone being in grief, of someone shedding
tears. It comes to this, that having dealt already with the question of situation, we now have to deal with that of reference; granted that we have now stylised what we mean by a basic situation in real life, we now have to ask, ‘How are we to refer to it?’ Are we really going to assume that there is only one really correct way of referring to this grief situation, that is, by the proposition ‘x is shedding tears’? Are we further going to assume – as, according to Anscombe, the Wittgenstein of the Tractatus seems to have assumed – that there is only one kind of contrast between sentences that is relevant to this basic situation, namely, the contrast between this proposition in its T-form, ‘x is shedding tears’, and the same proposition in its F-form, ‘It is not the case that x is shedding tears’? We can make this artificially restrictive assumption if we like; if we are concerned with facts, though, our presuppositions will be quite different. For whereas agreement between different people can fairly easily be reached as to what the basic situation referred to by any set of situationally similar stick-pictures is (or, if we remove the stylisation, as to what any frequently occurring situation in real life is), there is only the very vaguest tendency towards agreement as to how any such situation may legitimately be referred to. Thus, in a recent test, and taking now the basic stick-picture of a man pointing to himself (i.e. the middle one in Figure 30), the Language through Pictures series gave the following variety of utterances as references:

English through Pictures     I
French through Pictures      C’est moi
German through Pictures      Ich bin ein Mann
Spanish through Pictures     yo soy un hombre
Hebrew through Pictures      (picture not in book)
Italian through Pictures     (picture not in book)
Nor could this variegation be blamed only upon differences between languages, for a set of young British philosophers when shown the same picture wrote the following even more variegated set of remarks under it in ‘ordinary language’: ‘It is I.’ ‘Cogito, ergo sum.’ ‘My head is bloody but unbowed.’ ‘My name is John.’ Suppose we now ask, ‘What is in common between all these remarks?’ Certainly, no sentence; not even any word use, in any exact sense. The most we can say is that there is a certain conceptual overlap, that is, that there is a certain overlap of meaning between all these remarks seen in the context of
this particular stick-picture, which could be expressed by saying that all of them contain or presuppose some sort of notion of ‘self’, or of ‘I’. I propose to call this overlap of meaning, whatever it may be, ‘the concept of ‘‘I-ness’’ ’. Let us now have another look at Roget’s Thesaurus. This time, to get a good case, let us go back to the basic situation of grief, and therefore turn to head 839, LAMENTATION. Here we have just such an overlap of meaning. We cannot define it, but by reading through the list of synonyms, we can get a good idea of it; if we could not, there would be no synonym dictionaries. Nor does it matter if the set of word uses in English that might go into that paragraph is always liable to be subtracted from or added to; so long as there remains some overlap, this defines the concept. Nor is this new concept of a concept unknown in philosophic literature. Consider, from this new angle, the following well-known passage: Suppose . . . that we set out to investigate excuses, what are the methods and resources initially available? Our object is to imagine the varieties of situation in which we make excuses, and to imagine the expressions used in making them. If we have a lively imagination, together perhaps with an ample experience of dereliction, we shall go far, only we shall need system . . . It is advisable to use systematic aids . . . First we may use a dictionary – quite a concise one will do, but the use must be thorough. Two methods suggest themselves, both a little tedious, but repaying. One is to read the book through, listing all the words that seem relevant: this does not take as long as many suppose. The other is to start with a wide selection of obviously relevant terms, and to consult the dictionary under each: it will be found that, in the explanations of the various meanings of each, a surprising number of other items occur, which are germane, though of course not often synonymous. 
We then look up each of these, bringing in more for our bag from the definitions given in each case; and when we have continued for a little, it will generally be found that the family circle begins to close, until ultimately it is complete and we come only upon repetitions. This method has the advantage of grouping the terms into convenient clusters – but of course, a good deal will depend upon the comprehensiveness of our initial selection. (Austin 1975, p. 12, italics mine)
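The second method in the passage just quoted reads naturally as a closure procedure: begin with a selection of terms, consult the dictionary under each, add whatever germane items turn up, and stop when only repetitions remain. The following is a minimal Python sketch of that reading, not anything given by Austin or in this chapter. The miniature dictionary `TOY_DEFINITIONS`, the set `GERMANE` that stands in for the reader's judgement of relevance, and the function name `close_family_circle` are all hypothetical and serve only as illustration.

```python
from collections import deque

def close_family_circle(seeds, definitions, is_germane):
    """Expand the seed terms through dictionary entries until the
    'family circle' closes, i.e. until lookups yield only repetitions."""
    circle = set(seeds)
    queue = deque(seeds)
    while queue:
        term = queue.popleft()
        # Consult the dictionary under this term; unknown terms have no entry.
        for word in definitions.get(term, ()):
            # Keep only new items that the reader judges germane.
            if word not in circle and is_germane(word):
                circle.add(word)
                queue.append(word)
    return circle

# Hypothetical miniature dictionary: each term maps to words found in its entry.
TOY_DEFINITIONS = {
    "excuse": ["plea", "justification", "reason"],
    "plea": ["excuse", "pretext"],
    "justification": ["defence", "reason"],
    "pretext": ["excuse"],
    "defence": ["justification", "plea"],
    "reason": ["cause"],
}

# Hypothetical stand-in for the human judgement of what is germane.
GERMANE = {"excuse", "plea", "justification", "pretext", "defence"}

circle = close_family_circle(["excuse"], TOY_DEFINITIONS, GERMANE.__contains__)
print(sorted(circle))
```

As the passage itself warns, the resulting cluster depends on the comprehensiveness of the initial selection; in the sketch this shows up as dependence on `seeds` and on the relevance test.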
It cannot be doubted, I think, that in his second method, given above, Austin is describing not only a new method of thinking, but also the best possible method of compiling a synonym dictionary; and it follows from that fact that, if it be granted that Austin’s method of investigating word use did in fact bring up deep philosophic issues (and it should be clear by now that I think this must be granted), then the same, or cognate, philosophic issues will be raised by the whole enterprise of compiling a synonym dictionary – which must then be considered not only as a lexicographical, but also as a philosophical document.3

3 ‘Metaphysicians engaged in the more profound investigation of the Philosophy of Language will be materially assisted by having the ground thus prepared for them, in a
3. Grammar, syntax and phrases
Now consider the partition of the stick-picture system with respect to aspects. Suppose that each aspect is represented, though not necessarily recognised, by exactly one icon (call these key icons). Then every aspectually similar stick-picture will contain the same key icon in any one partition; for the distribution of the set of key icons will change, in part at least, with each repartition; since the assignment of key icons fixes the aspect system.4 Let us assume further that every key icon in the total aspect set can be represented, somehow or other, well or badly, in every language; either by a word, or by a grammatical or syntactic device, or by a phrase. (At a pinch, numbers or nonsense syllables can be used for such names; but in practice, mnemonics are much better.) Let any such name for any key icon, in any language, be called a tag. We now have a set of names, in any language, for the members of our total set of contrasting aspects. It follows, moreover, from the whole argument that we have built up (it being always granted that the tags get their primary meanings not from the language in which they occur as words, or subwords or phrases, but from the key icons, and that the key icons in turn get their meanings not from the tags in any language that name them, but from the sets of stick-pictures in the stick-picture system within which they are found) that I have now given a way of defining, generally and conceptually, a set of very general and widely recurrent perceptual distinctions that frequently recur in real life as aspects of basic situations. It is my case that such very general and widely found aspect distinctions cannot fail to have been noticed by the users of any language, and that there will therefore be a tendency, in any language, to refer to these pairs of aspects – that is, to the distinctions – either by very frequently occurring pairs of contrasting words or contrasting phrases, or by contrasting
previous analysis and classification of our ideas; for such classification of ideas is the true basis on which words, which are their symbols, should be classified. It is by such an analysis alone that we can arrive at a clear perception of the relation which these symbols bear to their corresponding ideas, or can obtain a correct knowledge of the elements which enter into the formation of compound ideas, and of the exclusions by which we arrive at the abstractions so perpetually resorted to in the process of reasoning, and in the communication of our thoughts.’ From the Author’s Introduction to Roget’s Thesaurus (1852) (italics mine).

4 For an actual example of a key icon occurring in a stick-picture, see the miniature stick-picture of the man pointing to himself that is inserted in the corner of the full-size picture of the man pointing to himself in Figure 30. This key icon is in a sense redundant, for it is already clear that the stick-picture man lounging in his chair is thinking (or dreaming) of himself. The presence of the key icon of the man pointing to himself, however, clinches the matter. It says, ‘Note that the dominant note of this picture is ‘‘self’’, seen against the basic situation of ‘‘man brooding on’’; not, for instance, the fact that the stick-picture contains a thought or a dream.’
grammatical devices (by devices, that is, which operate within a word) or by contrasting syntactic combinations of words (by devices, that is, logically analogous to the grammatical ones, which operate within a sentence or within a paragraph.) I see grammar and syntax also as a contrast system, although I grant that, to see in this way, complex sets of grammatical or syntactic alternatives, for example the Latin case system, have to be broken down into ordered pairs of contrasting alternatives.5 It is one thing, however, to see grammar-cum-syntax as in general a contrast system and quite another, in constructing an actual model, to determine how much of a complex grammatical contrast system of some language to put into the model. Nor is the question of deciding what to put in and what to leave out made any easier by the very great confusion that currently prevails in philosophic circles as to what grammar and syntax in natural languages really are. And it is not surprising that this should be so, since philosophers, by their nature, have to think generally about language, whereas grammars and syntax systems vary in all possible ways, as between languages. The result is that, in practice, being unable straightforwardly to generalise discussion of this phenomenon without doing violence to a multitude of known facts, philosophers usually brazenly identify the habits of their own language with those of the thinking world and, by doing so, provide ground for well-grounded and sour comment by philosophically informed linguists. On the other hand, philosophers can and do reply that there reigns an almost equal confusion, of another kind, in linguistic circles, in spite of an initial appearance of sophistication and precision of attack. For, in order to discuss what they themselves are doing, linguists also, in the end, have to think generally, having explicitly deprived themselves in the beginning of any conceptual apparatus for doing so. 
They are therefore apt to get trapped into making such remarks as ‘I know you will not misunderstand me if I say that this is what we used to call in old-fashioned language a verbal phrase’. By such obiter dicta, however, they betray themselves not as philologists, but as philosophers, and themselves become subject to all comment that they dispense. I propose summarily to break through this confusion by saying that I want to pick up the relevant basic-situation-referring habits of a language in preference to its grammar. I do not mind, that is, for purposes of the model, if I do not pick up any of the grammar or syntax of a language at all

5 For instance, in the Latin case system, Nominative/Accusative is the primary contrasting pair. Nominative/Vocative can be taken as forming a secondary contrast; Accusative/Oblique cases another. Among the Oblique cases, Genitive/Dative-cum-Ablative can be taken as primary, Dative/Ablative secondary, and, within each of these pairs, and also between them, further pairs of contrasting uses can fairly easily be built up by consulting such a work as Robey’s two-volume Latin Grammar.
except as grammatical or syntactic forms occur in particular phrases. To do this is not as stupid as it seems. To start with, nearly everything that can be said in any language by using a grammatical or a syntactic device can also be said without such devices, by using a common word or phrase; it is, after all, by using such words or phrases that we explain to learners of the language how to use the grammar. I can say as I choose, for instance, in English, ‘She killed him with a hammer’, thus conveying the instrumental notion syntactically by using the word ‘with’; or I can say, ‘She hammered him’, conveying the same notion here grammatically by using the past tense of the verb; or I can say, ‘She killed him: her instrument, a hammer’, using the actual word ‘instrument’ to convey the instrumental idea; or I can say, ‘She took a hammer, and bonk! – he was dead’, thus referring back direct with ‘bonk’ to a presumed known situation, without conveying the instrumental idea in language at all. For the purposes of any kind of formal analysis of language it matters very much, of course, which of these forms I use; for no formal equivalence between them can be established. For purposes of translation, however, it matters much less, for, speaking roughly, they all convey the same information, and their conceptual nearness to one another is more important, for the translator, than their divergence of form. Suppose now, pursuing the provenance of this same example, we look up ‘instrument’ in Roget’s Thesaurus. We are directed immediately to a series of heads that, logically, are all aspects of the same idea: 631 INSTRUMENTALITY, 632 MEANS, 633 INSTRUMENT. Within these, together with the other heads cross-referred to by them, we can find all the ways given above, and more, of dealing with our hammer. ‘By means of, with; by any means, all means, some means’; these come under the adverbial section of 632 MEANS. 633 INSTRUMENT refers us to 276 IMPULSE.
Here we find not only ‘hammer, sledge-hammer, mall, maul, mallet, flail; battering-ram. . . . cudgel, etc., (weapon) 727;’ but also ‘strike, knock, hit, bash . . . beat, bang, slam, dash; punch, thwack, whack, strike hard; swap, batter, dowse, baste . . . buffet, belabour (insert here ‘hammer’); . . . fetch one a blow, swat (insert here ‘bonk’); strike at, etc. (attack) 716 . . .’ all the accumulated richness of the English language for describing the classic blunt-instrument-using situation. It comes to this, then: the procedure for classification, in a synonym dictionary, goes in the contrary direction from the procedure of classification of a grammar, though, ideally, grammatical classification should be reached in the end. Thus ‘with a’ (a grammatical device in Latin, a syntactic device in English) is in Roget’s Thesaurus all right, but classified merely as an adverb; ‘hammered’ could be in under IMPULSE, among the verbs, if the Thesaurus was extended to allow of crude differences
between past and future reference; but we should never be able to classify such a system sufficiently finely to get the whole English verbal tense system out of it. Grammar and syntax are potentially there, but they are there in a particularised form, and without being identified as such. From the classification point of view, they get in, as it were, by the back door. It follows, then, that in a general meaning-contrast system interpretable in any language, as a synonym dictionary, the framer of the system will have to deal with grammar and syntax in the same sort of way as Roget does, only, if possible, more fully. This means that the primary system of classification that is required, in order to get at whatever grammar or syntax the system can pick up, will have to be one aimed at subdividing Roget’s heads. Below, two tables follow immediately after one another. The first attached a set of tags to key-icons. The second, using these, subdivides a Roget head, with comments. Here is the raw material of W. E. Johnson’s (1921) ‘Universal Grammar’; and, what a come-down! Sample set of tags, defined by key icons
Tag       Description of icon in English
BE        a dot
BANG      an exclamation mark
DO        ARROW
DONE      (i.e. the same occurrence shown twice, once in continuous and once in discontinuous line)
ONE       one stick-like object in a free cloud
PAIR      two stick-like objects in a free cloud
CHANGE    quartered circle (actually the phases of a moon) with an arrow inside it
CAUSE     two round objects (actually billiard balls) connected by an arrow
KIND      free cloud with objects from which abstraction must be made inside it
HOW       free cloud, as above, with label attached
SAME      two crosses marking two similar objects, also in the picture, the whole in a free cloud
NEXT      as for SAME, with dot-series and a third cross added, the third cross marking a third object, also in the picture
3.1.
A head in Roget’s Thesaurus classified by using the model
Below is shown a pared-down and reorganised Roget head with the word uses classified, in so far as they can be classified by the use of the set of tags given above, and of numerical cross-references. The set of tags
given above is too sparse to give a natural-sounding classification; it is sufficient, however, to separate out the sub-paragraphs and rows of the head. The numerical cross-references are to be interpreted, in terms of the model, as the overlap of meaning between the cross-referring and cross-referred to head, this overlap of meaning being indicated in the thesaurus – whenever the cross-referring is adequately done – by the presence of the cross-referred string of synonyms in the two heads. Synonyms within a string can also often be distinguished from each other by cross-references, but I have not attempted here so to distinguish them.
839 LAMENTATION
Tag            Word uses
kind           lamentation, mourning;
               lament, wail, 363 INTERMENT;
               languishment, grief, moan, condolence, 915 CONDOLENCE;
               sobbing, crying, tears, mourning, 837 DEJECTION;
one be         sob, sigh, groan, moan;
               complaint, plaint, grumble, murmur, grief, 923 WRONG;
               mutter, whine, whimper, 886 CIVILITY;
bang kind      flood of tears, burst of tears, fit of tears;
               crying, howling, screaming, yelling, 411 CRY;
one bang be    spasm of sobbing, outburst of grief;
               cry, scream, howl, 411 CRY;
               wailing and gnashing of teeth, 900 RESENTMENT;
thing          weeds, crepe, crape, deep mourning, sackcloth and ashes, 225 INVESTMENT;
               passing-bell, knell, keen, death-song, dirge, 402 SOUND;
               requiem, wake, funeral, 998 RITE;
she thing      widow’s weeds, widow’s veil, 225 INVESTMENT;
man do         mourner, weeper, keener;
               pall-bearer, chief mourner, professional mourner, 363 INTERMENT;
do             lament, mourn, grieve for, weep over;
               condole with, moan with, mourn for, 915 CONDOLENCE;
               fret, groan, 828 PAIN;
               keen, attend the funeral, follow the bier, 363 INTERMENT;
               mew, bleat, bellow and roar, whine, 412 ULULATION;
more do        burst into tears, cry one’s eyes out, cry one’s self blind;
               scream, wail, yell, rend the air, 411 CRY;
               beat one’s breast, wring one’s hands, gnash one’s teeth, 3 SUBSTANTIALITY;
less do        sigh, shed a tear, fetch a sigh for;
how            lamenting, mourning;
               in mourning, in sackcloth-and-ashes, 225 INVESTMENT;
               mournful, tearful, sorrowful, in tears, 837 DEJECTION;
now be         with tears in one’s eyes, bathed in tears, 824 EXCITATION;
now bang be    with tears standing in the eyes; with tears starting from the eyes;
more now be    with eyes suffused, – swimming, – brimming, – overflowing with tears;
bang           Alas! Alack! Woe is me! miserabile dictu! too true! Alas the day!
4.
Words
We have now defined a meaning-contrast system containing situations, concepts, and some grammatical and syntactic forms, particularised as phrases and defined by tags. We have yet, however, to insert into the system any words; that is, the statement that a set of word uses in English are all uses of the English word W does not yet make sense within the system. Let the set of uses of a word in any dictionary be called a fan. If the set of uses is unstructured we shall call it a simple fan; if any method of subclassification of the uses is employed, we shall call the resulting system a jointed fan. Let us call the point of origin of the fan its hinge, and the set of word uses represented in it its spokes. Let the word token W for any fan be called the sign of the fan.
[Figure 31: the entry for moan shown as a simple fan, with three spokes: moan in 839, moan in 405, moan in 411.]
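The vocabulary of fans defined above (sign, spokes, simple and jointed fans) can be pictured as a small data structure. What follows is only a minimal sketch in Python, not anything given in the paper: the class name `Fan`, the field `joints` and the group labels are assumptions made for illustration, and the grouping in the jointed example is invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Fan:
    """The set of uses of one word (its sign) in a dictionary."""

    sign: str
    # Maps a sub-classifier label to the spokes (word uses) filed under it.
    # A simple fan files every spoke under the single label None.
    joints: Dict[Optional[str], List[str]] = field(default_factory=dict)

    @property
    def spokes(self) -> List[str]:
        """All word uses represented in the fan."""
        return [use for uses in self.joints.values() for use in uses]

    @property
    def is_jointed(self) -> bool:
        """True when some method of subclassification is employed."""
        return any(label is not None for label in self.joints)


# The simple fan for 'moan', with the three spokes of Figure 31.
simple_fan = Fan("moan", {None: ["moan in 839", "moan in 405", "moan in 411"]})

# Hypothetical subclassification, invented only to show the shape of a jointed fan.
jointed_fan = Fan(
    "moan",
    {"group A": ["moan in 839"], "group B": ["moan in 405", "moan in 411"]},
)

print(simple_fan.is_jointed, jointed_fan.is_jointed)  # False True
```

Both fans carry the same sign and the same spokes; the only difference is whether the spokes are grouped, which is the distinction the text draws between simple and jointed fans.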
Consider now the interpretation of the fan. We shall say that the word token printed in bold at the head of any dictionary entry represents the alternation of the actual uses of the word which are given underneath it. Thus, the bold word sign for any word W represents an alternation of the form U1 v U2 v U3 . . . Un, each U being a particular use of the word given in the entry. If we now ask, ‘What is there in common between all the uses of W?’, the only safe answer is ‘The fact that they are referred to, in that dictionary, by the word token W’. Thus we arrive at a conception that in a dictionary, in the case of any word W, the word token printed in bold at the top of any entry, the word token of W as occurring in any particular entry, and the word token of W, ‘W’, as listed in the list of words of that language, all vitally differ in logical status. Only when we have fully seen this are we in a position to make a formal model for dictionary entries of words. Such a model, however, will still be unilingual. To make it interlingual, the dictionary maker’s set of defining classifiers for separating U1 . . . Un for any W must be exchanged, in the case of any U, for a definition given in terms of tags. Below are given an entry from Roget’s Index, shown as a simple fan (Figure 31); the same entry, with Roget’s sub-classifications inserted into it, shown as a jointed fan (Figure 32); and the same entry classified by tags, and shown as a jointed fan (Figure 33). From this it can be seen that the system of tags used in this model does not completely separate the members of U2 . . . Un as given in the OED, but that it does something to separate them.
The Oxford English Dictionary definition of moan
A: AS NOUN
Complaint lamentation
1 (a) Complaint, lamentation (in general) (no examples)
1 (b) A complaint, lament (an instance of 1 (a)) (e.g. ‘In Henry’s days the people made their moan that they were ground down’)
[Figure 32: the entry for moan shown as a jointed fan, with spokes moan in 839 LAMENTATION, moan in 411 CRY and moan in 405 FAINTNESS, subdivided into Noun ‘moan’ and Verb ‘to moan’.]
[Figure 33: the entry for moan classified by tags and shown as a jointed fan over heads 839, 405 and 411, with tags kind, one be and do, and cross-references to 915 and 412 ULULATION.]
1 (c) obsolete: a state of grief and lamentation (e.g. ‘T’would kill my soule to leave thee drowned in mone’)
2 A prolonged, low, inarticulate murmur
2 (a) Differing from ‘groan’ in that it suggests a sound less harsh and deep, and produced rather by continuous pain than by a particular access or paroxysm; (e.g. ‘moan of an enemy massacred’)
2 (b) transference of the low, plaintive sound produced by the wind, water, etc. (e.g. ‘The moan of the adjacent pines chimed in noble harmony’)
B: AS VERB (given as separate entry in OED)
1 (a) to complain of, lament (something) (e.g. ‘She . . . bitterly moaned the fickleness of her Matilda’)
1 (b) reflexive; ‘to make one’s moan’: (e.g. ‘You should rouse up yourselves and moan yourselves to the Lord’)
2 To pity (obsolete) (e.g. ‘Does he take no pity on me? Prithee moan him Isabel’)
3 (a) intransitive: with ‘for’: (e.g. ‘Achilles moaning for his lost mistress’.)
3 (b) (causatively) to cause to lament: obsolete (e.g. ‘And yet my wife (which infinitely moanes me) Intends . . .’)
4 (a) intransitively To make a low mournful sound indicative of physical or mental suffering. (e.g. ‘The King . . . passionately moaned . . .’)
4 (b) transferred, of inanimate things (e.g. ‘You hear . . . the forests moan’)
5 transferred To utter moaningly: (e.g. ‘Madeline began to weep And moan forth witless words’)
Let the dictionary entries of the words in a good dictionary in any language be redefined by using a set of heads and a set of tags. Call such a dictionary entry a T-fan. Whatever the entry, the set of heads used in it will now be contained in the set of heads defined earlier as corresponding to basic situations; it is now required in addition, however, that this set of heads should be contained in the set that defines the heads of that particular language’s synonym dictionary. Similarly, the set of tags used for any dictionary definition must be contained within the subset of the total set of tags that has been used in that language’s synonym dictionary. If the dictionary entries of that language can be so redefined – and inspection of the dictionary entries in the examples attached shows that they can – then it follows that their constituent word uses can be inserted into the system, which means that a dictionary entry also can be seen as a sub-system of a system of contrasts.
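The two containment requirements on a T-fan can be stated as a simple check. The sketch below is only an illustration in Python, under assumptions of my own: a word use is modelled as a pair of a head number and a tuple of tags, the function name `fits_synonym_dictionary` is hypothetical, and the uses listed for moan are a partial reading of the 839 LAMENTATION table and of Figure 33, not a complete entry.

```python
from typing import List, Set, Tuple

# A T-fan modelled as a list of word uses, each a (head number, tags) pair.
# This representation is an assumption made for the sketch.
TFan = List[Tuple[int, Tuple[str, ...]]]


def fits_synonym_dictionary(t_fan: TFan, heads: Set[int], tags: Set[str]) -> bool:
    """True if every head and every tag used in the T-fan also occurs in the
    synonym dictionary of the language."""
    return all(
        head in heads and set(use_tags) <= tags
        for head, use_tags in t_fan
    )


# Toy synonym dictionary: only the heads and tags needed for this example.
HEADS = {839, 405, 411}
TAGS = {"kind", "one", "be", "do"}

# Partial T-fan for 'moan' (illustrative reading of the tables and Figure 33).
MOAN: TFan = [
    (839, ("kind",)),
    (839, ("one", "be")),
    (839, ("do",)),
    (405, ("kind",)),
    (405, ("one", "be")),
    (411, ("do",)),
]

print(fits_synonym_dictionary(MOAN, HEADS, TAGS))  # True
```

A use that cites a head or a tag unknown to the synonym dictionary makes the check fail, which is the sense in which such an entry could not be inserted into the system.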
5.
Specification of the mathematical model
We shall first define a system of heads, taking no account of tags; then we shall insert the system of tags; then we shall map on to the combined system of heads and tags the word system of fans, for any language.
5.1.
Heads
Let a meaning-contrast system, or language (or thesaurus), consist of a finite set of heads, and let the mathematical specification of a head be as follows: Let the total set of word uses in the head be represented by a single alternation formula of the form a v b v c . . . v n. Call this alternation 1. Let what is common to the set a, b, c . . . n be represented by a single conjunction formula of the form a.b.c . . . n. Call this conjunction 0 [0 is conventionally the lower bound of a lattice, and 1 the upper bound. Ed note]. 1 and 0, together with the set a, b, c, . . . n, form a partially ordered set under a relation to be interpreted as meaning-inclusion. Then (if all semantic and grammatical distinctions between the word uses in the set be provisionally ignored), the connective v can be identified with the Boolean join ∪, interpreted as ‘and/or’, and the connective . be interpreted as the Boolean meet ∩, interpreted as ‘and’, in which case the head will be an interpreted lattice of the spindle form given in Figure 34.
[Figure 34: spindle lattice of a head. Top element I: total set of word-uses forming a head. Middle elements: word-uses of constituents of the head. Bottom element O: overlap of meaning: concept.]
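The spindle form can be made concrete in a few lines. The following is a minimal sketch in Python, assuming only what the text states: a top element I, a bottom element O, and the word uses between them, no two of which include one another. The class name `Spindle` and its method names are my own, and the four word uses are taken from one row of the LAMENTATION head purely as sample data.

```python
class Spindle:
    """A spindle lattice: top I, bottom O, and pairwise-incomparable
    middle elements (the word uses of a head)."""

    TOP = "I"
    BOTTOM = "O"

    def __init__(self, word_uses):
        self.middle = frozenset(word_uses)
        if {self.TOP, self.BOTTOM} & self.middle:
            raise ValueError("'I' and 'O' are reserved for the bounds")

    def leq(self, x, y):
        """Meaning-inclusion: x is included in y."""
        return x == y or x == self.BOTTOM or y == self.TOP

    def join(self, x, y):
        """Least element including both x and y ('and/or')."""
        if self.leq(x, y):
            return y
        if self.leq(y, x):
            return x
        return self.TOP

    def meet(self, x, y):
        """Greatest element included in both x and y ('and')."""
        if self.leq(x, y):
            return x
        if self.leq(y, x):
            return y
        return self.BOTTOM


head = Spindle(["sob", "sigh", "groan", "moan"])
print(head.join("sob", "moan"))  # I
print(head.meet("sob", "moan"))  # O
```

Any two distinct word uses join at I, the whole head, and meet at O, the overlap of meaning; this is all the structure a single head has before heads are combined.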
5.2.
Theorem of language theory
Suppose each head is treated as a point, and a method is given6 for constructing a new set of finite lattices by adding 1 and 0 elements to sets of head points on the same principle as for sets of word points (namely, by finding and defining overlaps of meaning between heads, in the way in which overlaps of meaning have been found and defined between word uses); then (1) these superheads also can be treated as points, and so combined, up to N orders of superheads, N being finite, and (2) by adding an I element and an O element to the total resultant structure, language itself can be defined as a finite lattice, H.
6 Roget himself provides two methods for combining heads: (i) the Chapter of Contents given at the beginning of the book, and (ii) the numerical cross-reference system between heads. If (i) is used, the very general classifiers occurring in the left-hand column of the Chapter of Contents, which are numbered with Roman numerals and printed in large upper-case type, can be taken as the joins of the less general classifiers numbered with Arabic numerals and printed in small upper-case type (on the ordinary principles of classification); and similarly, the less general classifiers can be taken as the joins of the bracketed and numbered sets of head-names printed in lower-case type and occurring in the left-hand column of the Chapter of Contents. Thus, on this method, join is interpreted but meet is not; a set of heads classified together under, for example, LINEAR SPACE may be presumed to have some overlap of meaning; but this overlap of meaning is nowhere explicitly specified in the Thesaurus. If method (ii) is used, both join and meet are interpretable and specifiable. The superhead set consisting of a head and/or all other heads cross-referred to in it will be interpretable as the join of its constituent heads, while the meet of any two of those heads will consist of (a) the actual set of word uses that are common to both the heads, and (b) the common cross-reference number (if the Editors of the Thesaurus have remembered to put this in). Thus, in method (ii) both join and meet are specified.
5.3.
Tags
Let the set of tags given in the table form a spindle lattice, T. Form the direct product of T with H, in order to produce the language lattice L. Thus L = T × H. Philosophically speaking, the constituent subspindles of T can now be regarded as structures that give ‘new ways’ for seeing the head lattice H, and also structures that can be abstracted from L at will. We could now regard any head as a simplified analogue of Wittgenstein’s concept which he compared in Philosophical Investigations to a gestalt figure (the cube, the triangle, the steps, the duck rabbit) if we imagine the lattice as an elastic and simplified space.
5.4.
Fans
As we have described it above, any fan is a partially ordered set, and any jointed fan is, in addition, a tree. T fans are therefore trees. We have now to map the set of T fans on to L. We know already that the hinge of any T fan, that is, the point of origin of any T fan considered as a tree, is to be interpreted as an alternation of the form U1 v U2 . . . Un; it can therefore be interpreted as the join of U1, U2 . . . Un. Let the set U1, U2 . . . Un for any TF be called the U set of TF. Now, if the U set of any T fan TF can be assigned points on L, and the inclusion relation of any TF be interpreted as meaning inclusion, that is, if it be given the same interpretation as the inclusion relation in L, the TF can be meaningfully mapped on to L; for the join of the U set already has an interpretation in L, and any meet of any pair of points in the U set can be interpreted as ‘that which is in common between the two points of the U set, that is, the fact that they are both referred to by the sign W’. Consider now any point Tp of the U set of any TF. Any Tp will be defined by being assigned one or more tags and one and only one head; it represents, as in any classification that can be shown as a tree, what is in common between the meanings of the assigned tags and the assigned head. It is thus interpretable as a meet in L; since any combination of tags will have a unique meet in the tag lattice T, and every head H will occur in the complete T lattice assigned to it, since L is the direct product of T and H. Thus any Tp will occur in L. If it be now assumed that the inclusion relation of any TF can be interpreted as
meaning inclusion, which can intuitively be seen to be the case, then it follows that any TF can be mapped on to L. [ . . . ]
6.
Translation of: ‘My father is a strict parent.’
The result is the analysis of [the above quoted sentence] Sn.
Result
My        I ∩ have
Father    166 PATERNITY ∩ he ∩ man
          11 FAMILY ∩ man
          164 PRODUCER ∩ man
          737 AUTHORITY
is        494 TRUTH ∩ be
a         one
strict    494 TRUTH ∩ be
          737 AUTHORITY ∩ how
parent    166 PATERNITY ∩ he ∩ man
          11 FAMILY ∩ man
          164 PRODUCER ∩ man
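Each row of the result pairs a word with one or more specifications, each made up of at most one head and a combination of tags. A minimal sketch in Python of how such an analysis might be held as data is given here; the names `Point`, `ANALYSIS` and `heads_of` are assumptions introduced for illustration, as is the decision to record a missing head as `None`.

```python
from typing import List, NamedTuple, Optional, Tuple


class Point(NamedTuple):
    """One specification: a head (or None) together with its tags."""

    head: Optional[str]
    tags: Tuple[str, ...]


# The analysis of 'My father is a strict parent', transcribed row by row.
ANALYSIS = {
    "My": [Point(None, ("I", "have"))],
    "Father": [
        Point("166 PATERNITY", ("he", "man")),
        Point("11 FAMILY", ("man",)),
        Point("164 PRODUCER", ("man",)),
        Point("737 AUTHORITY", ()),
    ],
    "is": [Point("494 TRUTH", ("be",))],
    "a": [Point(None, ("one",))],
    "strict": [
        Point("494 TRUTH", ("be",)),
        Point("737 AUTHORITY", ("how",)),
    ],
    "parent": [
        Point("166 PATERNITY", ("he", "man")),
        Point("11 FAMILY", ("man",)),
        Point("164 PRODUCER", ("man",)),
    ],
}


def heads_of(word: str) -> List[str]:
    """The head specifications retained for a word in the analysis."""
    return [p.head for p in ANALYSIS[word] if p.head is not None]


# Words that still carry more than one head specification.
several_heads = [w for w in ANALYSIS if len(heads_of(w)) > 1]
print(several_heads)  # ['Father', 'strict', 'parent']
```

Read this way, ‘My’ and ‘a’ carry no head at all, ‘is’ carries exactly one, and the remaining three words carry several, which is the three-way division visible in the result table.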
6.1.
The reformulation
1. Take the first T fan that retains more than one head specification in the analysis. Scan the lists of Roget synonyms, in the dictionary, that are attached to the head specifications of this T fan. Intersect these lists of synonyms, in pairs, retaining only words that are common to both lists.7 Repeat this operation for all other T fans in Sn that have retained more than one head specification. The intersecting sets of words will be the translations of the original word that matched in the dictionary with the T fan. Result
father    166 PATERNITY ∩ 11 FAMILY        parent, father
          166 PATERNITY ∩ 164 PRODUCER     parent
          166 PATERNITY ∩ 737 AUTHORITY    –
          11 FAMILY ∩ 164 PRODUCER         parent
          11 FAMILY ∩ 737 AUTHORITY        –
          164 PRODUCER ∩ 737 AUTHORITY     –
strict    494 TRUTH ∩ 737 AUTHORITY        strict
parent    166 PATERNITY ∩ 11 FAMILY        parent, father
          166 PATERNITY ∩ 164 PRODUCER     parent
          11 FAMILY ∩ 164 PRODUCER         parent
7 The synonym match must be exact: phrases do not match with words.
2. Where more than one translation is given take the translation with the most specific set of tags. Result:
‘father’ translates as ‘father’; ‘parent’ also as ‘father’, since he ∩ man is more specific than man.
3. In the case of any T fan of which only one head specification has been retained scan the corresponding list of synonyms in the dictionary, and take the first. Result:
‘is’ now translates ‘be’.
4. In the case of any T fan that retains no head specification, transcribe [ . . .]: Result:
‘my’ translates as ‘I have’ ‘a’ translates as ‘one’
and the final do is retained. Correlating these results, we now get:
I have father be one strict father do
as the translation that the model gives of the English sentence My father is a strict parent.
Editor’s Commentary
MMB’s view of translation was that it was a natural and not a bizarre human activity, and a philosophy of meaning must account for it. That basic assumption set her apart from most English-speaking philosophers, for whom translation, like the speaking of other tongues on which the ability relies, is both unusual and philosophically irrelevant. So much for real translation (what she claimed in this paper to be offering a ‘philosophical model’ for), but it must be remembered that part of the standard philosophical apparatus in England during her formative period was the notion that a proposition, or meaning, was what two synonymous sentences shared, necessarily shared one might say, since that was what ‘synonymous’ meant. Hence the idea of synonymy, as what translation plausibly required, was a central concern at the time she wrote, and Quine’s radical criticism of the notion (1953) had not then been fully absorbed. Here, in fairness, it should be mentioned that Casimir Lewy was a major figure in analysis at Cambridge during MMB’s life, and he, uniquely, did
make use of real translation techniques in his work, however baroque they might seem to practising translators. No one who ever heard him lecture could ever forget the consequences of the fact that ‘ ‘‘Vixen’’ means female fox’ does not express the same proposition as ‘ ‘‘Vixen’’ means ‘‘female fox’’ ’ since the second whole sentence, translated into German, say, would not tell you what vixen meant, whereas the first would. However, other parts of what she believed, or had inherited, made her position more difficult to sustain: she had taken on much of Wittgenstein’s view of language, and that was in large part an attack on that very view of propositions. For him, to know a language was not simply a matter of propositions, but was to know a form of life, and knowledge of a language could not be separated out from culture, knowledge, gestures, arguments and so on. That led naturally to a view of language and the world often called Whorfian, in which a language, to some degree at least, determines its surrounding world, that is, such that different languages cannot just chat about a common, agreed, ‘real world’. And on that view translation is impossible: if we do not share a language, we do not share a world and hence there may be nothing determinate to be translated. It was along these lines that John Lyons once argued that you could not even translate ‘The cat sat on the mat’ into French, because they have a quite different view of mats and no generic word for them as well as requiring that the sex of a cat be specified. Interestingly enough, of course, both Whorf and Wittgenstein normally appear in facing-page translations (in Whorf ’s case with translations of the Indian languages he discusses), which fact rather undermines both authors’ cases. In this paper MMB was aware, implicitly at least, of these conflicts, and sought a way out. 
First, she sought an escape from a closed world of language and culture to an objective interlingual world of reference, although she did not want the conventional and simple-minded world of formal individual objects that still seems to satisfy most of today’s formal semanticists and ‘realists’. She proposes to call a model of translation philosophical ‘if it has the following characteristics’, which she listed above, and there, in those four assumptions, is much of what drove the empirical work of the CLRU in later years; but one must also ask whether the four constitute a consistent set, especially when taken together with other principles she held. The reference to ‘transformation within a language’ is certainly an implicit criticism of Chomsky, whose system of transformational grammar had been first set out in Syntactic Structures (1957) four years before this paper. But the mention by MMB of the need for a ‘universal grammar’ is particularly interesting, because it had not then, to my knowledge at least, become the
key Chomsky phrase: it did later. MMB’s usage points further back to Wittgenstein’s, and sometimes to Carnap’s (1937), goal of a universal grammar of forms that was not just that of a language, nor was it simple-minded structures out there in the world. Whatever it was to be, it was what saved Wittgenstein from a closed Whorfian world of language and culture. In Chomsky’s theories that universal element first took the form of ‘deep structures’ for language, and then later became constraints on possible languages, hardwired in the brain at birth (the root and, later, meaning for him of ‘universal grammar’). MMB’s tack was different: she wanted to draw together the insights of Wittgenstein’s earlier Picture Theory of Truth (in the Tractatus, 1922) and his later fascination with iconic representation (in the Philosophical Investigations it was presented first as the question of how we know an arrow picture points one particular way). For MMB that interest had led her to view Chinese characters as icons and the belief that they were, like the arrow, non-arbitrary. That interest had given impetus to her association with Michael Halliday (then the Cambridge University lecturer in Chinese), which led later to a mutual interest in syntax and meaning. In the Translation paper, all this also led her to put forward her view of situations as interlingual and universal, and that these could be captured by stick-pictures of the kind found in Richards’ and Gibson’s original Teach yourself English through Pictures (1952). That representation system was not chosen at random, since both those authors had themselves been profoundly influenced by Wittgenstein, and their work was a sort of practical, highly successful, commercial Wittgensteinianism. We would now call it technology transfer from philosophy, I suppose. It really did work as language teaching, at least between culturally close languages, as many of us can testify from experience. 
To my knowledge, it was never really tested whether a native speaker of some language remote from Europe can in fact understand the stick-picture for SHE in the way MMB hoped (‘a skirted figure with a womb sign’ = the Venus symbol).
What is the paper about?
The paper is only about translation in the sense that any representation scheme for natural language will be about translation. MMB also believed in an interlingual hypothesis about translation and representation: that for n languages, 2n translations into and out of an interlingua will be less effort than n(n-1) transductions between the languages taken pairwise. This has always been an appealing argument (see Bar-Hillel, 1953, and Masterman, 1962a), but MMB’s paper is much more immediately about how to obtain a representation for natural-language sentences, and there are also claims in
the paper that the use of stick-pictures and the equivalents of those pictures in a ‘language of semantic primitives’, give an interlingual representation of a sentence, free of the features of any one language, such as English. The paper was originally part of an Aristotelian Society symposium with Willy Haas, whose commentary originally followed MMB’s paper immediately. It is quite clear that he had no idea whatever what the function and content of the paper were, and my only hope now is to do better, not because I have any analytical skills he lacked, but because I had long exposure to what MMB wanted to say. But it must be admitted that it is very difficult indeed to draw out a single clear view of the paper from what is actually in it: it has all the faults as well as the advantages of MMB’s style, among the former being a set of suggestive and attractive notations, none of them wholly perspicuous. Added to that are sets of definitions of intimidating formality, but which in fact serve little purpose beyond that of trying to make the paper look formal to formalists. I think we all understand this temptation, but will have to strip all this away in discussing the paper, and try to locate what is essential and intuitive in it, which is also the best part.
Situations
MMB uses a notion of situation:
. . . not very clever differing-language speakers with minimal sign apparatus can understand one another – that is, translate to one another – if and only if they can both recognise and react to situations common to both of them in real life. What I want to say is that, even when we know one another’s languages, we still do the same thing. It is important to side with the language teachers on this, and not with the behaviourist psychologists or the linguists; for either of these last two groups . . . can talk one into thinking that translation, in the ordinary sense, is impossible.
But language teachers who teach translation know how it is that it can occur: the right hotel room is engaged, the puncture in the left back tyre is mended, the telegram is sent, the friend’s (unknown) friend is safely met at the station, all because . . . they know a very great deal about the relevant situation. And in so far as this knowledge of a common stock of situations breaks down, as it well might break down as between us and the termites, or between us and sulphur-breathing beings from another planet, then it is evident that, whatever the language involved, translation becomes impossible.
She then moves to the equivalent of situations on the page: they are to be the stick-pictures of Richards’ and Gibson’s English through Pictures, and a philosophical version of that pictorial technique is also attributed to Anscombe as a tool for explaining what Wittgenstein meant by the Picture Theory of Truth. Interestingly, it is very similar to the recent technique of illustration used by Barwise and Perry to explain their
influential Situation Semantics (1983, a book in which, curiously, no reference to Wittgenstein appears). At this point, a series of complex definitions occur, and I suspect that their essence is as follows: ‘Situationally similar’ stick-pictures are such that they contain the same basic situation, which means the same basic icon (e.g. a human – though this is my use of icon). The set of pictures with that basic icon can then be subdivided in many ways, but the ways will involve contrasts of aspect, as she puts it, for example there will be a subset in which the figure has eyes, to be contrasted with another in which it does not, and that difference is significant in terms of the use of the pictures (in that case, MMB claims it distinguishes a particular human from a generic one). Later on, it seems that basic icons can themselves be contrasted with each other, or with similar ones, in ways that are not captured by aspects: for example a single stick human is contrasted later with a group of them, and a stick human is contrasted with an egg-like object. She then suggests that the stick-pictures be pasted on cards and that people be invited to subdivide the pack so created in different ways: this resorting and renoticing process is just what we do in real life, when we perceive a situation, as we say, ‘from a new angle’. According to many people – following the lines associated in linguistics with Whorf . . . this is also what we do when we start to think in a new language; the new language, which will use different sorting principles, will actually make its user notice different features of the world.
The passage should recall MMB vividly to anyone who knew her: her patience (solitaire) playing as relaxation when she was tense, and her belief that having things stuck on packs of cards and then spreading them out was the way to perspicuity at least, and insight at best. It also had the capacity for striking novelty: for bringing together things previously unassociated. It also has the characteristic faults of MMB’s style: the would-be formal definitions that actually mask what is going on: for you, the reader, have to work out what she really intended and how it could be brought to bear on concrete problems. Again, much of the paper is taken up in scoring obscure points against largely forgotten works like Langford’s ‘The Notion of Analysis in Moore’s Philosophy’ (1942). MMB clearly thought that these ways of connecting what she was doing to authors they knew were necessary for a philosophical audience; she did not quite have the necessary confidence in the sheer novelty of what the bulk of the paper was about. As Haas’ puzzled reply showed, the philosophy sections did not disarm or interest the philosophers, they just remained puzzled, even though some younger philosophers became sure there was something important and novel in what was being said, and found their way to CLRU to try to discover what it was.
It then becomes clear that the purpose of the whole paper is to use techniques applied to a combination of stick-pictures, semantic primitives (see Chapter 7) and thesauri (see Chapter 5) in order to give an analysis of the sentence ‘Father’ means ‘male parent’. It was exactly such a sentence as this that Moore had used to illustrate what he called the Paradox of Analysis (1922): if the quoted items mean the same, the argument went, then the sentence is trivial, but if they do not, then it is false. Yet, it remains unsatisfactory (and hence the paradox) to say a sentence is either trivial or false when it is clear that sentences just like it are used to give real information about the meanings of words to those ignorant of them, for example ‘turdiform’ means ‘shaped like a thrush’. What MMB was seeking in this paper, among other things, was to show that the essentially computational techniques of analysis she was proposing were utterly unlike Moore’s and involved a complex method for actually trying the combinations of the possible real senses (in Austin’s sense of what is actually to be found in dictionaries) of the words of the ‘Father’ sentence against each other, then throwing out the implausible or inappropriate combinations of the senses of the words, and arriving at a structure for the whole sentence that is non-trivial, in the sense that the parts on either side of the copula are not identical. But in a rather deeper sense, I would maintain, my criticism above stands: the use of these techniques, designed or intended for the analysis of serious realistic natural language texts, was applied to a philosophical sentence in this paper, simply so as to play the philosophers’ game, and even though they themselves were largely unable to follow what was being done with one of their long-treasured examples at the time. It should be said too that this is precisely what Katz did rather later, with his and Fodor’s semantic marker projection technique (1963). 
That became famous as the basis for a semantics of primitive atoms to be attached to a version of Chomsky’s syntactic theory (1965). And the applications Katz himself made of it in his more philosophical papers were only a re-analysis of classic examples of Quine, such as: Bachelors are unmarried men. Quine, as we noted earlier, had already questioned (1953) the virtue of any attempt to divide putatively ‘analytic’ sentences like this from ones like: Bachelors are all twenty feet tall.
Translation
Katz’s and MMB’s treatments of philosophical examples were very similar, even if the quotation marks did not fall in exactly the same places in the Moore, Lewy and Quine originals. Katz, like MMB, used techniques based on primitive meaning atoms but which also took account of the genuine lexical ambiguity of the words concerned (as did MMB but not the philosophers in question) so as to show that the first bachelor sentence really did come out analytic, in the sense of having equivalent structures on either side of the copula, and that analyticity therefore had an empirical basis. The similarity between MMB’s and Katz’s ways with philosophers’ sentences is striking (i.e. adding real lexical ambiguity, plus real symbolic manipulations), and both were also aiming to undermine a philosophical claim not founded on symbolic manipulation: one of Quine’s in Katz’s case, and of Moore’s in MMB’s, though her paper was earlier by some years. It is also interesting that they came to opposite empirical conclusions: putatively trivial sentences of the same type are found acceptable and trivial by Katz and non-trivial by MMB. I have now cited and described the goal to be reached by the paper, so let us hurry on to the methods. Before things become too complicated, let me extract for examination a technique MMB produces almost in passing, but which later became celebrated in linguistics. MMB marshals her weapons and, before attacking the target ahead, argues that grammar and syntax are also contrast systems, rather like the one she had wanted between stick-pictures. In developing a notion of syntactic contrast without meaning contrast, she rehearses exactly the paradigm that later underlay Fillmore’s case grammar (1968) in sentences like:
She killed him with a hammer.
She hammered him.
She killed him, her instrument a hammer.
It has often been observed that many natural-language processing systems adopted Fillmore’s case grammar as a representational device, though without adopting its original linguistic goals, and without giving it up when the underlying theory was abandoned by virtually all linguists. The aside here in MMB’s paper can serve to remind us that, in some sense, alternative sources, ones closer to language processing, were already available for that idea, especially since the basic case elements were present in the pidgin language of primitives MMB proposed for machine translation (see Chapter 7). MMB now makes two rapid notational moves, which are perhaps easier to illustrate than explain:
1. The contrasts between stick-figures, as regards aspects at least, are associated with the contrasts between two semantic primitives drawn from the CLRU interlingua NUDE (cf. Chapter 7).
2. Roget’s Thesaurus is taken and a single sample head (e.g. 839 LAMENTATION, and see Chapter 2) is broken up into rows of semi-synonyms, some of them cross-referenced to other related heads in the thesaurus, and then the rows, so separated, are associated with a list of primitives, now called tags and written in lower case.
We now have a three-way association: thesaurus row, a set of semantic primitives, and a stick-picture. Let us delay analysis of that equivalence (none is really given in the paper) and quickly note two more moves:
3. The senses of an English word are taken as individuated by their occurrences in the thesaurus: for example, the occurrence of ‘moan’ in Head 839 LAMENTATION defines one sense of ‘moan’. Then these differing senses of a word are arranged on a hierarchical tree, so as to represent the way they divide occurrences of a word into subsets, that is, a sense-node higher up the tree can be considered to cover all the occurrences in text of a node it dominates (as set out in Chapter 2). Then trees are produced where the nodes are also associated with the primitive tags of (2) above.
4. MMB then produces a fan (as in Chapters 2 and 3) to represent the spread of the senses of a word, that is, so that each ‘spoke’ of a fan represents a sense, but that fan is then replaced by a lattice, with the fan being the upper half, as it were, and such that the inverted tree in the lower half of the lattice was meant to capture the ‘overlap of meaning’ of corresponding items in the upper half.
The operations that form a substantial appendix then follow, and involve pages of very complex set-theoretic manipulation, using UNION and INTERSECT. The former moves you up from n nodes in a lattice to a higher node (guaranteed unique by the lattice form) taken to represent the widest class of contexts of all the words in the n, while the INTERSECT of n nodes takes you downwards in the lattice to a (similarly unique) node expressing the meaning overlap of the n in terms of semantic primitive tags. The basic idea is not hard to grasp: CLRU did a range of experiments that took the intersections and unions of the contents of Roget’s Thesaurus heads (described above in Chapter 6). These computations were perfectly clear, even if not wholly satisfactory as a technique. Again, the notion of meaning intersection, in terms of primitives, is not hard to grasp, and follows from any elementary consideration of Euler diagrams in a logic class: if a class of entities are deemed all to have the properties ANIMATE and FELINE and another different class of entities are deemed
ANIMATE and BOVINE, then we would expect any UNION of their nodes (i.e. higher up the lattice), to somehow express the idea BOVINE and FELINE, or cows-and-cats, while we would expect the ‘meaning intersection’ in the corresponding lower half to contain only the property ANIMATE, that is, all the meaning the sets have in common, which is to say, the one property shared by all their members. So far, so good. However, the hand computations shown by MMB in this chapter are of enormous complexity, and I suspect no one has fully understood them. Indeed, it would not be worth trying to follow them in full detail in that, once one saw what she was trying to express algorithmically, then, given a more sophisticated view of processes and programs, it would be easy to devise clearer methods and formalisms for achieving that end. The virtue in what MMB was doing, and its originality, does not rest on its ability to convey to us now, almost forty years later, the detailed content of those processes: but rather on the suggestiveness of what was being done in meaning computation, and the boldness and freeness of the notational devices drawn in. These descriptions in the paper are, in a sense, only icons for possible computations. It is worth reminding ourselves at this point of what MMB’s conclusion was to be, and then analysing it a little and drawing some more modern parallels. Correlating these results, we now get:
I have father be one strict father do
as the translation that the model gives of the English sentence
My father is a strict parent.
This is in fact the very end of the paper, and lest it create a bathetic reaction, let us reconsider it, and how and why MMB reached it and, above all, what was right and wrong about that, by which I mean fundamentally right or wrong, and not just a tracing of complex and difficult notation that we have not the time here to analyse in sufficient detail. It should be noted also that I have deleted much of the last section of the paper (‘Examples of the Operation of the Model’) and replaced it with the explanation above, since a modern reader will find the dense notation almost impossible to follow. MMB herself referred to this outcome as a pidgin, and it relates directly to her view of pidgins as possible usable (speakable and writable) interlinguas for MT (see Chapter 7 above). In that sense only, Haas was right to see this paper as concerned with an interlingua for MT, but what he seems to have missed was the analytic goal I described briefly earlier: the desire to
cut through philosophical analysis techniques and show that those same issues could be seen as empirical issues of language analysis with empirical outcomes. That was also, as we saw, Katz’s goal, and it has been equally unsuccessful within the philosophical community. Indeed, it is worth recalling at this point that David Lewis (1972), like most of the American philosophers of his generation, used Katz and Fodor’s system of semantic primitive codings as the principal example in our time of how philosophy cannot be done and meaning cannot be captured, even in principle. Lewis stigmatised such codings as ‘markerese’, the translation of one language into another, with nothing being explained by the process. He would certainly have said the same of MMB’s pidgin coding above, had he known of it, and Haas, in his commentary, made this very point. Haas noted that since translation into and out of the pidgin interlingua required two distinct processes of translation, that process could not therefore explain translation itself in any way. And it is quite fair to ask what MMB can have thought she was explaining by producing this curious pidgin sentence as the output of complex processing of another, quite different, English sentence. The answer to this was touched on earlier: it seems clear that MMB considered that the pidgin interlingua, of tags or primitives, was justified in two ways. First, it was justified by the stick-pictures, which provided some reference to human situations of a stereotyped form. Those who find this odd should remember the motives behind Wittgenstein’s so-called Picture Theory of Truth (1922) and, more recently, Barwise and Perry’s Situation Semantics (1983), which sought to do the same thing: to escape to a world of things that were organised and related and distinguishable, rather than being just the possible worlds or sets of abstract logical objects.
Secondly, MMB thought that the thesaurus gave access to real distributional facts of a language – English in this case – and that a primitive expression was to be justified (and thus be more than mere English) if it was tied formally to a distributional classification of real usage. The problems here are many: first, that MMB never really considered interlinguality in any serious or empirical way. The stick-pictures had indeed been used to teach a range of languages, but most of them were culturally close to purely western norms (e.g. zodiacal signs for sex, and skirts on women stick-figures). Again, the primitives still remained very much like English words and, even if their sense was given by the thesaurus rows themselves, that thesaurus remained an obviously human construct. It is revealing that, although MMB, like most philosophers of her generation, used the term ‘word sense’ without any qualm as a term of clear significance, it is in fact far from clear that saying that ‘post’ has six rather
than one or fifteen senses is a claim based on any clear empirical or distributional criteria. Although more recent empirical work (Wilks et al. 1996) may help here, MMB certainly had no access to it in her time. In spite of all these problems, it is clear that empirical work has been done in recent years that MMB would have seen as very much in the spirit of what she described: Fass (1988) has designed and programmed a system called Collative Semantics that computes graph-like structures of meaning that MMB might have recognised as the next best thing to lattices. Guo (1992) has reconsidered the issue of primitives and their role in dictionaries and shown how a primitive set can be shrunk or grown with well-defined cycles of definition. Plate and Slator (see Wilks et al. 1996) have shown how a notion of word-sense cluster, corresponding roughly to our intuitive notion of sense, can be obtained by large matrix calculations over the distributions of words in texts. This last owes a great deal to work done by Spärck Jones (1964/1986) on such clusterings of semantic terms and done when the computing resources were not available. That earlier work was done while she was working closely with MMB at CLRU. A minor point to note here is that the pidgin tags attached to thesaurus rows had no internal syntax: they were just strung together but, as we shall see (Chapter 10), MMB sought later to remedy that in work on pidgin languages with the use of what she called semantic message detection.
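The two lattice moves discussed in this commentary, UNION upwards to the widest class of contexts and INTERSECT downwards to the shared primitive tags, can be illustrated with the ANIMATE, FELINE and BOVINE example. What follows is only a minimal Python sketch under simplifying assumptions: nodes are modelled as plain sets rather than as positions in a real lattice, the names Node, union and intersect and the member words are invented for illustration, and nothing here reproduces MMB’s own hand computations.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Node:
    """A hypothetical lattice node: the words it covers plus its primitive tags."""
    members: frozenset  # words (or contexts) covered by this node
    tags: frozenset     # semantic primitive tags attached to this node


def union(nodes):
    """Move 'up': the widest class of words covered by any of the given nodes."""
    return frozenset().union(*(n.members for n in nodes))


def intersect(nodes):
    """Move 'down': the primitive tags shared by all of the given nodes.

    Expects a non-empty list of nodes.
    """
    return reduce(frozenset.intersection, (n.tags for n in nodes))


# Invented example data, following the cats-and-cows illustration in the text.
cats = Node(frozenset({"cat", "lion"}), frozenset({"ANIMATE", "FELINE"}))
cows = Node(frozenset({"cow", "ox"}), frozenset({"ANIMATE", "BOVINE"}))

print(sorted(union([cats, cows])))      # ['cat', 'cow', 'lion', 'ox']
print(sorted(intersect([cats, cows])))  # ['ANIMATE']
```

The sketch keeps extension (members) and intension (tags) separate so that the upward move widens the former while the downward move narrows the latter, which is the asymmetry the Euler-diagram example relies on.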
Part 4
Phrasings, breath groups and text processing
9
Commentary on the Guberina hypothesis
There are two reasons why I am writing this preface to a presentation of Peter Guberina’s hypothesis that there exists a single formula for semantic progression at the basis of all human communication. I think, firstly, that this hypothesis, whether universally true in its present form or not, represents a new generative idea of the first magnitude in the basic research of the mechanical translation field. It is insufficiently appreciated by workers in other fields how many fundamental new basic hypotheses of the nature and characteristics of human communication MT research has already thrown up. There is the CLRU (and others’) idea that a semantic system of thesaurus type can be mathematically represented by a lattice, algorithms done on it, and a mathematics of semantic classification built up from it; there is Yngve’s hypothesis of the ‘limit in depth’, which must occur in the grouping of linguistic units within sentences; there is A. F. Parker-Rhodes’ (and my) idea of the applicability of the mathematical notion of lattice centrality to the notion of exocentric syntactic form; there is Ida Rhodes’ idea that quite simple conditional probability chains can be used in doing syntactic analysis (because that is what her idea really is); there is Chomsky’s idea that a full language can be mechanically constructed by deriving it mathematically from a small set of kernels; and now there is this Guberina hypothesis that there must exist one and only one basic form of semantic progression (which is both quite simple and also formalisable) at the basis of all human communication. There is current widespread detraction of the MT field because of the false claims that have been made on behalf of the present state of the art in political quarters and in the popular press.
In my view, however, detractors of this field should also ask themselves whether their detraction of it, and/or their prejudice against it, is not also partly due to the number of fundamental and very general hypotheses about human communication – hypotheses that are as dissolvent to contemporary overspecialisation as they are irritant to the would-be scientific complacency of contemporary second-rateness – which are being thrown up by research workers within it.
Secondly, I think, as Guberina also does, that it is applicable to making a semantic model of interlingual MT. The following factors have, I think, contributed to giving Guberina a new eye on the problem.
1. A large part of Guberina’s daily life is spent in developing electronic techniques for helping the deaf to speak. This means that, for him, what is being talked about – that is, the actual subject of any piece of discourse, and the linguistic elements that carry it – is vastly more important than what is said about it. If the deaf man can once pick up the subject of conversation, three-quarters of this problem is solved, even if he cannot clearly hear all that is said about it. If, on the other hand, he clearly hears some one thing that is said about some basic subject of discourse, while the actual subject of discourse remains unknown to him, very little of the deaf man’s problem is solved; he has only heard one thing. This practical fact, and his daily contact with it, has enabled Guberina to escape without strain from the current habit of up-grading predication (so noticeable in Russell and in the whole intellectual tradition that derives from Principia Mathematica). On the contrary, he has streamlined and made practicable the Whiteheadian-cum-Hegelian idea that human communication consists of patterns of semantic interactions between ascertainably cognate subjects of discourse. There is a persistent tendency for the therapy of the deaf to give rise to new fundamental theories of semantics. Thus, in the seventeenth century (in Britain) this therapy in part provoked Wilkins’ Universal Character.
2. Guberina is a professor of phonetics, not of linguistics. His empirical data, therefore, are graphs, not texts, and his daily subjects of meditation are not ‘-emes’ derived from written texts, but intonational forms. All linguists pay lip-service to the primacy of spoken language; few, in practice, allow their imagination to be stimulated by it.
3. Guberina is also preoccupied, in his practical life, with developing audio-visual, or situational, methods of language teaching. As a teaching technique these methods can be and are being criticised. As a philosophical stimulant to generality of exact thought about the problem of systematising an interlingual scheme of extra-linguistic reference, they seem to me to have a heuristic value that is unrivalled.
The basic criticism that can be made against Guberina – apart from the fact that he has developed no such semantic systems as the CLRU system T' to use for the development of his hypothesis – is that nothing actually found in the whole gamut of ‘communication’ (except possibly in the pattern of the linear threading of the genes upon the chromosome) actually exhibits the longueurs of his eightfold semantic progressions. This is not quite true, actually; the ancient Chinese philosophy of Mencius intermittently exhibits
it; so does Longfellow’s poem Hiawatha; so do the Hebrew psalms. Nevertheless, the fact that all the ordinary devices of, for example, modern prose must be regarded as variegating and/or abbreviative tricks for shortening and brightening the inescapable basic advance of the single, inexorable slowly rolling form of semantic progression – this fact undoubtedly acts as a deterrent to believing the theory. (This basic difficulty is later discussed at length.) On the other hand, it is much easier to analyse – and to interrelate – abbreviative and variegating devices once one has some idea what they might be abbreviations or variegations of. Of course, I am prejudiced in favour of Guberina’s theory. When we met I realised that his hypothesis provided the basic general rules for formula making, which in my purblindness I had exemplified without discerning the nature of when I had set out my specification of the system T'. He, on his part recognised the information-retrieving system T' to be the missing tool for basic semantic model making, which he ought earlier to have himself constructed – if his daily life had not been ‘torn between compassion and research’. Of course, therefore, we are prejudiced in favour of one another; but this is, in itself, no reason for believing either of us. It is because I think that any basic hypothesis of the semantic nature of human communication should be exposed to the fullest possible examination, criticism and commentary, that I am now taking such trouble to present Guberina’s hypothesis, in English, to this colloquium, and, incidentally, to write this preface to it.
1.
The logic of logic and the logic of language
1.1.
The separation of the logic of logic from the logic of language
1.1.1. An introductory historical account is given of the separation, in the fifteenth century, of the ‘logic of logic’ from the ‘logic of language’. Comment: As this first paragraph is later itself used, in the commentary, for testing the applicability of Guberina’s hypothesis, it is there quoted and translated in full.
1.1.2. From the separation, however, there resulted a lacuna for there was now no general ‘logic’ (or ‘skeleton’) of language.
Mais un autre probleme fut ouvert: décrire et définir la logique du langage. Tant que le système grammaticale fut subordonné à la logique, une possibilité existait de former une ossature des faits grammaticaux où, apparrement, furent visibles les contres et les éléments relies. Une fois ce joli édifice écroulé, tout était à refaire . . .
1.1.3. The result was that an entirely negative hypothesis of ‘illogicality’ of language was always invoked to explain difficult features of a particular
language; for example, the presence or absence of the definite or indefinite article, the ambiguity of verbs in the present tense, the existence of single-word sentences, etc. etc.
Il en résulta qu’une hypothèse, que la logique de langage était purement illogique, fut invoqué pour expliquer toutes les réalités complexes, multiformes des expressions linguistiques . . .
1.1.4. There tended to remain, however, a ghost-like survival of the ‘logic of logic’ to apply to complex (i.e. to molecular) propositions.
1.1.5. Meanwhile, the (original Aristotelian) ‘logic of logic’ developed into modern meta-mathematics, an instrument of great power throwing light upon innumerable forms in language (e.g. modalities of verbs, definite articles etc.). There would be no basis for semantics, not even a ghostlike one, if modern meta-mathematics ceased to exist.
Qu’on aille de la pensée à la langue ou des mots à la pensée, la logique de la logique intervient toujours. Que dire de la semantique, qui disparaîtrait à l’instant où l’on éliminerait la logique de la logique . . . Il n’y a aucun doute qu’un même auteur puisse renier l’enseignement de la grammaire basée sur une logique de la logique et créer un autre enseignement sur la base d’une autre logique de la logique . . . .
1.2.
Re-estimate of the situation
1.2.1. The question is, whether the moment is not now ripe, to realise the intuitions of F. Brunot (La pensée et la langue), of C. Bailly (fondateur de la stylistique), of the modern school of phonologists, and of all those many linguists who pay lip-service to the importance of meta-mathematics, by laying the foundation for a new and empirically based, but also general logic of language.
1.2.2. A base much wider than that of current linguistics is needed for doing this:
1. Language must be studied as a sociological institution geared to exterior reality (‘qui se rattache à la réalité extérieure’) and geared also to
2. meaningful thought, which must be taken as a psychological reality; it must be assumed to exist (‘et à la réalité de la pensée, point du départ du langage’).
1.2.3. The heart of the whole matter is to postulate a fundamental Gestalt (‘un ensemble’) and to specify its nature.
Il paraît peu vraisemblable que les expressions linguistiques puisent être comprises par elles-mêmes sans recours à leurs significations et sans recours à un ensemble qui puisse explique aussi bien la signification de l’expression linguistique que l’expression linguistique comme telle . . .
This hypothesis would have been put forward long ago but for the hypostasis, by the older logic of logic, of the subject-predicate construction:
1. The interconnection (‘solidarité’) [semantic overlap: MMB] between natural phenomena, being invisible, remained unnoticed, whereas the isolated units (‘éléments’) which were so interconnected, and which were the sole observables, were taken as self-contained isolates, and so no hint was given of the existence of the underlying gestalt (‘ensemble’) as points on which the linguistic isolates could alone be operationally defined.
2. Linguistic expressions though apparently linear in form (‘fait linéaire’) are actually subgestalten (‘solidaire’), as are also the units (‘unités’) of thought itself. Preoccupation with the linear form of the expressions – and the exclusive use, as a grid, of the subject and predicate categories of the older logic – so distorted everyone’s view of the underlying gestalt (‘ensemble’) that the question of determining its specification was never even allowed to arise.
1.2.4. The consequence of all this was that, in sixteenth century [Idealist] philosophy (‘L’idée philosophique’) – all philosophies, in fact, that saw reality as a gestalt – burst through and blew up not only the categories of the older logic, but also those of all other scientific systems based upon the non-interdependence of phenomena (‘les systèmes basés sur la noninterdépendance des phénomènes’). As far as linguistics was concerned, when the dust began to settle, linguistics more and more tended to claim for itself an absolute autonomy [the baby was thrown away with the bathwater: MMB], the very notion of the possibility of there being semantic values (‘des valeurs sémantiques’) was dismissed with derision, and all general logical analysis, no matter of what kind, was regarded as a cage from which linguistics must escape at all costs.
1.2.5. This already complex situation was even further complicated by the almost accidental creation of a psychosociology of language (‘une psychologie-sociologie linguistique’), which then proceeded, over the next fifty years to make enormous strides.
1.2.6. In all this complexity, it is vital to make some simplifying assumption (‘une mise au point’) which shall enable us to reconsider the whole question of what the total gestalt of language really is, what the linguistic values really are, and what the sciences known by the names of logic and of linguistics really are, in such a way that the assumptions of each of these two sciences of language shall no longer flatly contradict those of the other.
1.2.7. We have got to try to find and specify some basic existent (‘une autre valeur, une existence . . . ’) that will serve as a common base from which to analyse:
1. Logical interconnection
2. The interconnections of thought
3. (The semantic interconnections of) Linguistic expression.
In the next section of the paper this existent is to be postulated; in the final section it is to be applied:
1. to the linguistic problem of phrases;
2. to that of stylistics.
1.3.
Specification of the fundamental gestalt
1.3.1. It follows from the general orientation towards language (which has been just given in § 1.2.7) that the simplest case of the most basic form of semantic interconnection must be a connection which is both common to, and existent between, the linguistic expression, the exterior object, and the unit of thought (whatever this may be).
1.3.2. These interconnections (‘rapports’) must be defined in terms of (‘pourraient se résumer à l’aide de’) similarities (‘reciprocités (unités)’) [semantic intersections, or overlaps: MMB] and antithetical contrasts (‘contradictions dialectiques’) [semantic joins, or alternations: MMB].
1.3.3. These same relations also hold, though not in a simple manner, between the two ultimates themselves: that is, between the units of exterior reality, the units of thought, and the units of linguistic expression (‘(1) Réalité extérieure, (2) réalité de la pensée, (3) réalité de l’expression linguistique’). [Comment by MMB: This and the following are quasi-Hegelian paragraphs.]
1.3.4. We are ‘at the centre’ of our problem: we must, however, so define this centre as to enable ourselves to understand the actual mechanism of language (‘le mécanisme du langage’).
1.3.5. First step: we must define the [symmetric] similarity-relation of solidarity: [i.e. of semantic overlap: MMB].
1.3.6. Second step: we must define the distinction between a total phenomenon and a determinate aspect of a phenomenon. These two have indeed solidarity with one another; but this time the relation giving this solidarity is an asymmetric relation.
1.3.7. Guberina’s view of exterior reality. This is given at length below because it is the fact that Guberina takes such a childlike, simple clearcut view of exterior reality that enables him to imagine a semantic system directly conceived in terms of it.
Nous soulignons que toute la nature se déroule de manière phénoménale ou bien un fait (phénomène) succède à un autre; ou bien plusieurs faits (phénomènes)
se produisent simultanément. Mais ils ont toujours un aspect phénoménal. La vie humaine, pour qu’elle puisse exister, doit se manifester non pas sous forme de tout, mais se dérouler à l’intérieure des phénomènes. Nous ne pouvons pas en même temps marcher et être assis, dormir et travailler etc., mais toutes ces activités se succèdent selon l’ordre imposé par la nature. Le langage se déroule également sous un aspect phénoménal. Même là où il y a simultanéité de phénomènes dans la nature, cette simultanéité adopte dans le langage un aspect phénoménal très particularisé. Ainsi, si nous voulons exprimer cinq phénomènes différents et simultanés concernant, par example, un arbre, un oiseau, ou un homme, nous sommes obligés de composer cinq phrases (ou moins, si le tout peut être compris, à l’aide du contexte réel et des phrases précédentes). On se rende compte que les éléments du phénomène du langage sont plus distincts, plus séparés que les parties des phénomènes dans le monde des objets et dans le monde socialo-individuels. Comparons par exemple la réalité totale de l’oiseau en vol, telle qu’elle apparait comme une vision unique fixant plusieurs caractéristiques, avec l’expression de ce fait dans une langue quelconque: l’oiseau vole, haut, bas, vite; voilà l’oiseau qui vole, etc. etc. L’expression des phénomènes au moyen du langage comprend une succession des mots formant un amas de ‘particules’. Pourtant, c’est dans les parties du phénomène de la nature et dans la solidarité (unité) de ses parties que se retrouvera la phénomène du langage, et qu’il sera possible de l’expliquer.
It must be stressed that the whole of nature unrolls itself to our vision in the form of a succession of phenomena. Either a fact – that is, a phenomenon – succeeds a previously occurring fact, or else several facts – that is, several phenomena – are presented to the observer simultaneously.
[Whichever of these two things occurs, however,] the facts are always actually presented aspectually. It is a condition even of human life itself that it does not in fact manifest itself as a whole, but as an unrolling succession of phenomena. We cannot at one and the same time both walk and sit, sleep and work, etc.; all these activities succeed one another in the order in which they naturally occur. Similarly, (the units of) language form a sequence unrolling itself in time. Even if in nature phenomena are simultaneously presented, in language this simultaneity has to be transformed into a succession of presented aspects – in fact into a succession of highly particularised aspects. For example, if we wish to present, in language, five simultaneously occurring phenomena that have to do with a tree, a bird, and a man, we have to make up five successive sentences (or less, only if we can get over what we want to say by reference to the actual situation in the world, or help ourselves on with other sentences which we have pronounced earlier). Of course, the phenomenal elements of language are more clearly cut off from each other, more distinct than elements or parts of phenomena, either in the world of material objects or in the world of societies or individuals. Compare, for example, the total reality signified by a bird in flight, as we actually see it – i.e. as a single (flash or) vision, of which one or more characteristics become fixed in our minds – with our attempts to express this fact in no matter what language: ‘the bird flies high, low, fast’, ‘look! a flying bird’, ‘it was a bird, flying
past’, and so on, and so on. As contrasted with the real phenomenon, the expression of it in language consists of a succession of words forming a particular set (‘amas’). And yet it is to the aspectual nature of the phenomena of nature, and to their solidarity [overlap] with one another that we must look, if we want to discern and understand the [further] phenomenon of language; [not the other way round].
1.3.8. The two kinds of unit:
1. a manifestans, ms, i.e. a phenomenon which has got to manifest itself, at any one moment, in one of a totality of possible ways.
2. the actual manifestation, ma, in which it manifests itself.
In language, the distinction between these two is best envisaged as a generalised form of that between subject and predicate. Thus, just as the primitive, or shortest possible, phenomenal sequence in nature consists of not less than two parts, the multiply manifestable phenomenon and its actual manifestation, so also, in language, the primitive semantic formula consists of ms ma.
1.3.9. If we continue this semantic sequence, i.e. by writing ms ma ms ma, etc., we have to postulate solidarity, i.e. semantic overlap between all the ms, and all the mas, deriving from the very fact that they unroll in a single phenomenal sequence.
1.3.10. When we ask ourselves what the connective is between any ms and its set of mas, we have to answer ourselves that this must be a very general [asymmetric] causative. The ms ‘causes’ one or all of its mas. On the other hand, an ms can be also an ‘effect’ of the other ms with which it has solidarity and of an ma, especially in conjunction with its ms, or ‘cause.’
1.3.11. If we now reconsider our primitive semantic sequence, we find that we now have three types of solidarity: 1. between an ms and its set of mas; 2. between the different ms in the sequence; and 3. between the ma-sets of the sequence.
1.3.12. The basic semantic fact: ‘ . . . ms ma donnent de concert ms nouveau, ms, qui de son côté, produit un nouveau ma.’ ‘ms ma asserted in succession together give (by their solidarity) a second ms, which in its turn, produces a second ma.’
Comment: In other words, in any primitive semantic sequence, there must be the following, and only the following, semantic overlaps: 1. between the first ms and its ma. 2. between the first ma and the second ms. 3. between the second ms and its ma.
4. between the second ma and the first ms. 5. between the first ma and the second ma. If these semantic overlaps do not occur, we do not have a sequence.
1.3.13. Formalisation of this form of sequence by Guberina.
Comment: On the above interpretation of the basic semantic fact, the correctness of which interpretation has been checked with Guberina, Guberina’s own formalisation is defective; the required additions have been put in in square brackets.
Let ms = the first manifestans
ms = the second manifestans
MS = the third manifestans
etc.
It will be assumed that ms, ms, MS, etc. have semantic overlap (solidarity). Similarly,
Let ma = the first manifestation
ma = the second manifestation
MA = the third manifestation
etc.
Let the set of other ms or ma with which any ms has solidarity be called its x-set.
Let the solidarity of ms or ma with ms, etc. be formalised by _ ^.
Let the solidarity of any ms with its total possible set of mas be formalised by ˚.
Let the actual act of manifestation [or predication] be formalised by >.
We can then say: The basic semantic pattern in language is:
_ ma ms > ^ _ 1 ma ms [> 1ma] ms ˚ ^
This can be broken down into:
(1) ms > 1 ma
(2) ms ˚ ma
(3) ms _ ^ ms
[(4) 1 ma _ ^ ms]
The actual sequence in language looks like the following:
ms > ma > > ms > ma > > > MS > MA, etc.
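The five overlaps listed in the comment on 1.3.12 can be sketched as a small check. This is an illustrative sketch only, not Guberina’s or the CLRU’s procedure: it assumes that ‘semantic overlap’ may be approximated by two units sharing at least one classifier, and the names Unit, overlaps and is_primitive_sequence, like the classifier labels HUMAN, ACHIEVEMENT and SHAPE, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A manifestans (ms) or manifestation (ma) with hypothetical classifiers."""
    label: str
    classifiers: frozenset


def overlaps(a: Unit, b: Unit) -> bool:
    # Assumption: solidarity is modelled as sharing at least one classifier.
    return bool(a.classifiers & b.classifiers)


def is_primitive_sequence(ms1: Unit, ma1: Unit, ms2: Unit, ma2: Unit) -> bool:
    # The five overlaps required by the comment on 1.3.12, in the same order.
    required = [
        (ms1, ma1),  # 1. first ms and its ma
        (ma1, ms2),  # 2. first ma and second ms
        (ms2, ma2),  # 3. second ms and its ma
        (ma2, ms1),  # 4. second ma and first ms
        (ma1, ma2),  # 5. first ma and second ma
    ]
    return all(overlaps(a, b) for a, b in required)


# Invented classifier assignments, for illustration only.
john = Unit("John", frozenset({"HUMAN"}))
works = Unit("works-and-succeeds", frozenset({"HUMAN", "ACHIEVEMENT"}))
parents = Unit("his parents", frozenset({"HUMAN"}))
happy = Unit("are happy", frozenset({"HUMAN", "ACHIEVEMENT"}))
triangular = Unit("is triangular", frozenset({"SHAPE"}))

sequence_ok = is_primitive_sequence(john, works, parents, happy)
sequence_broken = is_primitive_sequence(john, works, parents, triangular)
```

With these invented classifiers sequence_ok is true and sequence_broken is false, since ‘is triangular’ shares no classifier with the units around it; on the hypothesis, such a string would not count as a sequence at all.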
1.3.14 Example: ‘Jean travaille-et-réussit, de sorte que ses parents sont heureux’ (John works-and-succeeds, so that his parents are happy).
John    works-and-succeeds    his parents    are happy
ms      ma                    ms             ma
The semantic overlaps are as follows:
1. John is a human being, capable therefore of working-and-succeeding.
2. John’s parents are also human beings – capable therefore of working-and-succeeding.
3. Human beings are capable of being happy.
4. John, being a human being, could also be happy. Therefore
5. There is a semantic overlap between succeeding and being happy.
From this example the predictive strength of the hypothesis becomes evident. It says that in any piece of text you have to look for ms > ma > > > > ms > ma, and find or supply the single basic pattern of semantic overlap as above. Variegations, inversions, abbreviations, ellipses may or may not be found; but, underlying these, the single basic semantic pattern of the progress of human communication must also be there.
1.3.15. The Aristotelian syllogism can be thought of as a special case of this basic semantic progression (see later extract).
1.3.16. The hypothesis also throws light on the traditional distinction between concept and judgement.
1.3.17. The hypothesis that there is one single form of basic semantic progression explains how we can continually add to our stocks of applicable predicates – and yet continue to understand one another.
1.3.18. The hypothesis explains why the Aristotelian laws of thought, –(p. –p) and p. p = p, do not apply to language. It explains therefore the linguists’ objections to the older logic of logic.
1.3.19. Conclusion: the assertion twice over of the basic semantic formula forms the semantic square. Sometimes one of these assertions is the occurrence of the corresponding patterning in real life, indeed, inside man himself. Sometimes both occurrences actually occur in the text.
1.4. Application of the hypothesis
1.4.1. Phrases 1.4.1.1. The hypothesis forces a new conception of a phrase, according to which synonymous phrases, having the same reference to exterior reality, can
be classed as ‘the same phrase’, without regard either for the number of words or the sorts of words contained in the phrase. Thus Feu! and La maison brûle (i.e. ‘Fire!’ and ‘the house is burning’) count as the same phrase; and ‘je ne sors pas à cause de la pluie’, ‘je ne sors pas parce qu’il pleut’, and ‘je ne sors pas’, with a gesture pointing to the rain (i.e. ‘I’m not going out because of the rain’, and ‘I’m not going out because it’s raining’, and ‘I’m not going out’ with a gesture pointing to the rain) are all the same phrase.
1.4.1.2. This leads to an immense simplification in formalising phrases.
(1) Simple phrase (see last section): ms > ma, e.g. Jean travaille (John works).
(2) Subordinate phrase: ms > ma, e.g. Jean travaille de sorte qu’il réussit toujours (John works so that he always succeeds).
(3) Coordinate proposition [i.e. molecular proposition]: will be either ms > ma + ms > ma or ms > ma > > ms > ma + ma > > ms > ma
But often coordinate and subordinate propositions can be interchanged. (‘ . . . étant donné que les propositions co-ordonnées et subordonnées peuvent être entremêlées’).
Comment: This last comment seems to me to show that Guberina’s notation as used here is inappropriate, since, in fact, in his formalisation of a complex proposition, the insertion of the connective, the +, neutralises comment to the effect that, by virtue of the hypothesis, co-ordinate propositions cease being semantically distinct from subordinate propositions; and it is also inconsistent with a great deal of what was said in the last section. That Guberina does basically mean to assert the uniqueness and pervasiveness of his fundamental semantic formula is clearly shown by his next paragraph, which is therefore given, and translated, in full. 
Ces Formules, nourries par le fait que l’homme intervient toujours dans son expression (le carré), libère l’expression linguistique de tous les cadres, pourvu que les sens et la réalité qui est à la base de ce sens soient respectés. Ainsi les phrases: Il a travaillé de sorte qu’il a réussi, et il a travaillé, il a réussi seront toutes deux des propositions subordonnées consécutives car l’identification, le sens global, le rapport logique, est le même. Si tu ne m’avais pas retenu, je serais tombé – Tu ne m’aurais pas retenu, je serais tombé – sont toujours des propositions conditionnelles irréelles. L’ancienne logique et la grammaire fondée sur l’ancienne logique verraient dans le deuxième cas des propositions co-ordonnées, car la conjonction ‘de subordination’ y manque. La grammaire plus moderne et qui se croit indépendante de toute logique hésite
également, car ‘le signe linguistique de subordination n’est pas présent.’ Or il est évident que l’expression linguistique exprime toujours un ensemble où l’homme participe sans cesse, et l’analyse linguistique doit en tenir compte.
These formalisations, enriched by the fact that man is himself always a logical factor (intervient toujours) in the ‘semantic square’ (le carré): [this form of] analysis liberates linguistic expression from all other classifying frameworks (de tous les cadres): [this liberation] can, of course, only occur, provided that the sense, and also the reality that lies at its base, are both respected. Thus, the phrases ‘He has so worked that he has succeeded’ and ‘He has worked, he has succeeded’, these will both be [i.e. these can now both be] subordinate clauses of result; for in both the [extra-linguistic?] identification, the overall sense, the pattern of logical [semantic?] connection is the same. ‘If you had not hung on to me, I should have fallen’ – ‘You hung on to me; but for you, I was a goner’, both of these can now express an unrealised condition. The traditional logic and the kind of grammatical analysis that was founded on it would have to see, in the second forms of each of these remarks, conjunctive [i.e. molecular] propositions, since ‘the subordinating connective’ has been left out. Modern grammatical analysis, which thinks that it has cast off all dependence upon logic, equally hesitates, however, on the ground that the ‘linguistic subordinating sign is not present [in the text]’. In fact, however, it is completely evident that the linguistic expression is the vehicle of a [semantic gestalt (exprime un ensemble) in which man himself] is a continual participant; and linguistic analysis has got to take account of this evident fact.
1.4.2. Stylistics
1.4.2.1. Whereas the attempt just made to analyse phrases immediately reveals the presence of [patterns of] semantic connection, in the field of stylistics, the affective content of language (Bally), the chain of inference required to establish them is both longer and also less general.
1.4.2.2. The whole current underlying the basis of stylistics (as opposed to style) in language is that man disposes of a choice in each remark that he makes, i.e., he can express himself either with or without some sign of affectivity. Thus ‘he is a liberal spender’ (‘prodigue’) is the logical, intellectual expression, ‘money just runs through his fingers’ (‘panier percé’) expresses the same thing, with affect. Compare also the pairs of adjectives ‘not strong’ and ‘weakling’, ‘imprudent’ (imprudent) (i.e. perhaps not altogether wise) and ‘crazy’ (fou). The results of affectivity getting into language appeared to be that all logical grammatical pattern went out of it (l’affectivité coupait tous les ponts de la logique); ellipses appeared everywhere, verbs disappeared, subjects could no longer be found, a progressive destruction of all grammatical form apparently set in.
1.4.2.3. This, however, is a ridiculous way to analyse, since it simply means that conventional analysis collapses before the phenomenon of human spontaneity – when in fact it is precisely at his most spontaneous that man says most clearly what he really means.
1.4.2.4. Once you refound analysis on semantic logic (une logique sémantique), once you look for the basic pattern of semantic connection between propositions, then so-called affective remarks stand out as having a very clear semantic pattern. In particular [having once established the semantic square] a so-called affective remark can always be taken as a response to something occurring in life or expressed in language, earlier. This earlier occurrence deprives the speaker of all semantic choice to say anything other than what he did say – given that the antecedent situation [i.e. the ‘first half’ of the square] was what it was – thus ‘affective’ remarks have their own variant of the basic semantic pattern in their own right; they are not secondary to (i.e. ‘values of’) other, non-affective remarks. The same goes for ‘figurative’ remarks; they are no longer secondary to ‘literal’ ones; semantically speaking, they stand on their own feet.
Conclusion: ‘The objection will be made against me that I am doing philosophy, not linguistics. I accept this; one always does philosophy, when one talks about language. The vital thing is to do the right philosophy; not to get stuck, either with a philosophy of language, or a system of logic, which linguistically shocks.’
1.5. Extract from Guberina’s text
1.5.1. Guberina’s proposed solution to Bar-Hillel’s ‘ostrich’s egg’ difficulty.
Comment 1: In this paragraph, G. shows how, by postulating one and only one form of primitive semantic progression he can account for the success of human communication when new things are being said: i.e. when new predications are being made of any given subject of discourse. For reasons that have now become obscure, Bar-Hillel’s ‘ostrich’s egg’ difficulty is customarily formulated in CLRU as follows: Consider the sentence: ‘Bar-Hillel walked down King’s Parade, carrying an ostrich’s egg’. In this sentence, for ‘carrying’ (which is an ambiguous word, in nearly all languages) to be translated correctly, the machine needs to pick up the information that an ostrich’s egg is something that can be carried in the hand. On the other hand, what conceptual dictionary maker in his senses, making a conceptual dictionary entry for ‘ostrich’s egg’ would remember, at the moment of making the entry, that an egg was something that could be carried in the hand? In other words, this is a new complement, forming a new predicate, never before asserted on this subject-of-discourse: in G’s words, a new manifestation of an existing manifestans. How is a machine – or, come to that, a human hearer – to pick up this fact?
Text D’un autre côté, les contradictions existant entre le manifestans et la manifestation libèrent l’homme de l’esclavage des choses, l’homme qui forme les jugements et, en général, les unités composées de manifestans et de manifestations. L’homme grâce à son cerveau, a la possibilité d’approfondir sans cesse les choses et de pénétrer leur essence. Que fait au fond l’homme dans ces opérations mentales? Il étudie les choses par les phénomènes et en les observant il les lie les unes aux autres; c’est ainsi que la compréhension des choses devient plus complète. En d’autres termes, l’homme, en observant une par une les manifestations d’un manifestans lie, à la base de ces manifestations, un manifestans aux autres manifestans. On peut dire que toutes les classifications sont basées sur cette opération. Il est (‘ne est’ is a misprint) de même dans le monde des concepts où un concept explique l’autre. Justement parce que les concepts ne sont pas des abstractions ni quelque chose d’absolument généralisé sans liaison avec des cas concrets, toutes les pensées de l’homme, toutes ses réflexions (par exemple sur la vitesse de la propagation de la lumière – sur la rotation de la terre autour du soleil; sur la justice, sur la vérité, sur le concept et le jugement comme tels) ne sont rien d’autres que les liaisons, l’acte de lier les ‘phénomènes’ du domaine des concepts aux manifestations des autres concepts. C’est par de tels procédés que l’homme est un créateur perpétuel; c’est grâce à ces possibilités qu’il peut exprimer différentes opinions et approfondir le monde matériel et spirituel.
Translation From another point of view it is the distinction existing between manifestans and manifestation that frees man from slavery to things – man, the judgement-former, man who – to put it more generally – forms unities consisting of compositions of manifestans and manifestations. Thanks to his brain, man (and man alone) has the possibility of ceaselessly attaining a more and more profound insight into the nature of things, of penetrating their very essence. Now what is man actually doing in these mental operations? He is studying things by studying their phenomena; by observing the phenomena, he is binding the things to one another; it is by this process that man’s comprehension of things completes itself more and more. In other words, man, in observing one by one the manifestations of a manifestans, uses these observed manifestations to bind one manifestans to another manifestans. All classifications, actually, are founded on this principle. The same thing happens in the world of concepts – in that universe of discourse in which (we can say that) one concept explains another. Just because concepts (as I here envisage them) are neither pure abstractions nor something entirely generalised without any connections with concrete instances, all the thoughts of man, all his reflexions (for instance, on the speed of the propagation of light – on the rotation of the earth round the sun; on justice, on truth, and on the interrelation of the judgement and the concept taken as such) are nothing else but connections (of the kind I have just postulated), namely, results of the act of interconnecting ‘phenomena’ in the conceptual domain to the manifestations of other (antecedently existing) concepts.
It is thanks to such procedures that man can be a perpetual creator: it is by using such possibilities (of manifestans-combination) that he can express different opinions, and gain an ever more profound insight into the material and spiritual world.
Comment 2: Put logically, what Guberina is saying is the following: Once grant that there is one and only one form of primitive semantic progression – namely the semantic ‘square’, which, in the section immediately before this one he has just formalised at length – and it at last becomes possible to explain how it is that man can form new predicates without being misunderstood. For if we know that the primitive form of semantic progression – in terms of the CLRU system, (the primitive pattern of semantic overlap) – must be: $\{M_1^A\, m_1^B = M_2^A\, m_2^B\}$ then we can program the machine to put in the classifying As and the Bs in this formula even if they have never existed in the language before, and so accumulate mechanically a continually increasing stock of semantically overlapping predicates, taken from texts. Of course, to do this the text must be presented to the machine in what we will have to call ‘semantic standard form’. For instance, the Bar-Hillel example would have to be reformulated: ‘Whereas other men go down King’s Parade carrying books and gowns, Bar-Hillel went down carrying an ostrich’s egg.’ Guberina’s proposed solution, therefore, is only at present of theoretic importance – that is, it can be used to form a semantic model of human communication, but not yet to process text. Nevertheless, it gives immensely more insight into the whole semantic process; and this, at this stage, is what we want; not premature development.
1.5.2. Guberina’s argument to the effect that his hypothesis falsifies the Aristotelian Laws of Thought (–(p. –p), ‘not both p and not-p’, and p. p = p, ‘p and p is the same as p’) in just the way that a greater realism requires, and his resulting construction of the ‘semantic square.’
Text C’est par là qu’on élimine efficacement encore une fois la tautologie dans le domaine de la logique, et tout ms (S) se révèle comme un progrès dans son ma (P) . . . 
Le principe S est et n’est pas P représente une qualité fort complexe, car cette contradiction signifie avant tout: changement, étapes successives. Ms (c’est-à-dire S) est dans la nature et dans le cerveau humain en un développement perpétuel. Il progresse: 1) Comme le résultat ms-ma (SP), et 2) Comme anneau dans la
chaîne des autres ms (ma), S (P), y compris l’homme créateur. Ainsi les formules ms est dans ma (S est dans P) et ma est dans ms (P est dans S) sont en général vraies, à la seule condition qu’on ajoute aux ms (ma) et S (P), – en tant qu’éléments d’un objet, – l’homme dans sa qualité de ms (ma), qui exprime les ms-ma (SP) et est en lui-même un ms-ma (SP). . . . Il s’en suit que en tant qu’homme, nous assistons toujours à des ms-ma au carré, où le carré est formé par ms-ma, et le ms-ma de base est donné par l’objet, par le phénomène, par un stade déterminé du progrès, par le concept connu jusqu’à un certain degré et par la nature elle-même. On pourrait finalement formuler de la façon suivante les conclusions tirées jusqu’à:
_ _ ms2) > 1 ma2 ( ms2) [1 ma2] ^ ms2 ( ma2 ^
Ou
1) ms² > 1 ma²
2) ms² ˚ ma²
3) ms² _ ^ ms²
[4) 1 ma² _ ^ ms²] . . . ’
Comment: If we say, ‘John works, and works – and works’, we are not saying the same thing as when we say simply, ‘John works’; and the very characteristic intonational form that we use shows that we are not. ‘John works – and works’, means something far more like, ‘John is an absolute glutton for work. He works [fully as much as another man would] and after that he goes on working, and even after that again he still works.’ Whereas ‘John works’ without further context is highly indeterminate; in the sentence, ‘John works, but Philip works hard’, it is consonant with John working very little indeed. Thus, in ordinary language, the Aristotelian axiom p. p. p = p, which is designed to indicate the general truth that a remark gets no truer by being continually reasserted, in this application, as we have seen, is falsified. On Guberina’s hypothesis, however, ‘John works’ would be ms > ma, whereas ‘John works, and works – and works’ would be ms > ma >> ms > ma MS > > > > ma, and so the semantic difference between the two is allowed for.
Similarly, if we say of someone, ‘Well, he was – and he wasn’t.’ (‘Was he in fact in love with her?’ ‘Well, he was – and he wasn’t’) we are not uttering a tautology, or talking nonsense; we are referring, as Guberina says, to a progressively more complicated state of affairs, since we are being given to understand that, with regard to some of the total set of manifestations of being in love, well, he had them; but with regard to some others, well, he didn’t. The Guberina hypothesis can deal with such a sequence in language whereas Aristotelian logic cannot; for, in Aristotelian logic –(p. –p) (‘Not both “p” and “not-p”’) is taken as being a law of thought itself. 
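The contrast drawn in this comment can be illustrated with a short sketch. It is an illustration only, under the assumption that a semantic progression may be modelled as an ordered list of (ms, ma) steps; the names conjunction_is_idempotent and progression are hypothetical and do not reproduce Guberina’s notation.

```python
def conjunction_is_idempotent() -> bool:
    # Classical propositional logic: (p and p) has the same truth value as p.
    return all((p and p) == p for p in (True, False))


def progression(assertions):
    # Each asserted (ms, ma) pair is kept as its own numbered step, so
    # repeating an assertion lengthens the progression instead of collapsing.
    return [(step, ms, ma) for step, (ms, ma) in enumerate(assertions, start=1)]


once = progression([("John", "works")])
thrice = progression([("John", "works")] * 3)
```

Here conjunction_is_idempotent() holds, as the Aristotelian axiom requires, while once and thrice are different objects of different lengths: on a sequence model, ‘John works, and works – and works’ is not reduced to ‘John works’.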
It will be noticed that a great many of Guberina’s examples require for their elucidation a double sentence, not a single one; in fact, they require, as context, the whole of his ‘semantic square’. With regard to this square, it
is possible, for MT purposes, to ignore all the cases of the semantic square except those that (a) are actually present in the text and (b) could be caused to be present in the text by expanding it. See the example in the piece of text in Guberina standard form, given later on.
Guberina, like all Idealist philosophers, holds that there is if not a straight correspondence, at any rate an ascertainable correspondence, between the way exterior reality works, the way language works and the way the mind of man works. Well, there may be; but if the correspondence is ascertainable, I think it is up to the Idealists to produce a way of ascertaining it. At present the alternative philosophic view, that the fundamental structure of language, which we necessarily operate whenever we think (since it is impossible to think without using some form of language) superimposes itself, like a grid, on the structure both of the exterior world and of our minds, tricking us into thinking that both of these realms have the same structure as language has – this more sceptical view is at least as plausible as the Idealist view. The overlap of outlook between Guberina and myself therefore is that we agree in our general conception of the semantic structure of language; and thus we can, in concert with one another, construct all those instances of the semantic square in which both halves of the square could occur within a language. When one half of the square has to be supplied, extra-linguistically, from outer or inner reality, then Guberina is prepared, without philosophic qualms, to supply it, whereas I am not quite sure that I am.
Translation By using this kind of argument (‘par là’) one can also prevent the Aristotelian tautologies (la tautologie dans le domaine de la logique) [from being meaningless]; since [in my hypothesis] every ms (S, i.e. Aristotelian subject) shows itself in an ma (i.e. in P, the Aristotelian predicate). 
The assertion ‘S is – and is not P’ [which is an application of the Aristotelian contradiction p. – p, ‘p and not p’] represents, in fact, a highly complex reality, since this [apparent] contradiction primarily signifies that a process of change took place in some situation, which manifested itself in successive, contrary, stages. An ms (that is, an Aristotelian subject [a substance]) is, in fact, both in nature, and as portrayed in the human brain, in a state of perpetual development. It progresses (1) by the actual assertion in language (or occurrence in life) of the primary sequence ms > ma (SP), and (2) by the fact that it is connected, like a link in a chain, to the gestalt consisting of the whole set of possible predicates, ms (ma) S (P) – counting in this set that set of predications that is the whole series of creative acts perpetrated by man. Thus the formulae ‘ms is in ma’ [i.e. ms has to manifest itself in some ma] (‘S is in P’) and ma is in ms [i.e. every ma has to be an ma of some ms] (P is in S) work out as generally true only if one puts alongside ms (ma and S P) a [parallel] sequence consisting of man himself, who, while he expresses the sequence ms > ma (S P) is himself [in the act of expression] an ms manifesting itself in an ma, i.e. ms > ma.
It follows that what we in fact always get are not single semantic sequences, but semantic squares [that is two parallel semantic sequences which, by semantically interacting, together form a ‘square’]. The ‘squaring’ is performed by the human ms > ma operating alongside the basic ms > ma [which we have already postulated]. [This basic ms > ma] can be given either in terms of actual objects, (phenomena), or by anything taken at any determinate stage of its progression, or in terms of concepts that have, up to now, been made precise only up to a certain point, or from nature itself. [The construction of the square leads us to our final formulation of the semantic basis of language] which summarises all the conclusions reached up to now. [Instead of the formula given earlier, we now have]
ms2 ( ma2 ms2) > 1 ma2 ( ms2([ > 1 ma2]
This breaks down into:
(1) ms² > 1 ma²
(2) ms² ˚ ma²
(3) ms² _ ^ ms²
[(4) 1 ma² _ ^ ms²]
1.5.3. Guberina constructs an argument which, in fact, shows that the Aristotelian syllogism can be envisaged as a special case of his general form of primitive semantic progression.
Text . . . le manifestans se manifestant dans une manifestation ne se manifeste pas comme un phénomène unique, mais il se réalise dans une manifestation par laquelle les autres manifestans se manifestent également. Cela veut dire que le manifestans ‘s’élargit’ par sa manifestation, et bien qu’elle ne soit chaque fois qu’une de ses manifestations possibles, pourtant c’est par cette manifestation particulière que le manifestans touche, atteint, se lie aux autres manifestans. C’est ainsi que sa ‘vie’ se développe ultérieurement. Et cela non pas sous une forme générale, non parce que la manifestation dépasse le manifestans, mais parce que, le manifestans manifesté en tant que manifestation pénètre nécessairement dans les domaines des autres manifestans, et se trouve uni à eux par divers rapports. Ainsi:
Jean travaille assidûment = ms > ma;
Jean travaille assidûment et réussit = ms > ma > > ms > ma;
Jean travaille assidûment et réussit, de sorte que ses parents sont heureux = ms > ma > > ms ma > > > > MS > ma
(Ainsi nous aurions l’explication suivante des signes employés ci-dessus: > = se manifeste; > > = ms, > ma qui ensemble donnent ms. Toute nouvelle création, formation (progression) du ms provenant des ms > ma est marquée par l’addition d’un nouveau, > entre ms et ma, alors que tout ms nouvellement créé, formé, est indiqué par une nouvelle ligne au-dessous de ms.) C’est de cette manière qu’en même temps nous concevons la solidarité et l’unité existant entre le singulier, le particulier
et l’universel, et inversement. Il en ressort que ms n’est jamais isolé du fait qu’il ne peut pas exister sans ma, qui, de son côté, est inexistant s’il n’est à la fois la manifestation des autres ms.
Translation The manifestans, manifesting itself in some one manifestation, does not manifest itself as a unique phenomenon, but instantiates itself in a manifestation by means of which other manifestans can also manifest themselves. This means that the manifestans ‘enlarges itself’ with its manifestation, and although this manifestation is only, each time, one out of the total set of possible manifestations, nevertheless it remains true that it is by this particular manifestation that the manifestans reaches out to, touches, connects itself on to the other manifestans. It is thus that its ‘life’ ultimately develops. And this not under any general form, not because the manifestation extrapolates from (de´passe) its manifestans, but because the manifested manifestans, by definition, necessarily overlaps with (pe´ne`tre dans) the fields of other manifestans, and thus finds itself united to them by various connections. Thus:
Jean works like a trooper: ms > ma½M1 > m1 Jean works like a trooper and gets through: A ms > ma >> ms > ma½MA 1 m1B 6 M2 m2B
Jean both works like a trooper and gets through, so his parents are very happy: ms > ma >> ms ma >> ms A A > ma½MA 1 m1B 6 M2 > m2B 6 M3 m3B
Thus we get the following interpretation of the above signs: > = se manifeste; > > = ms > ma, and the two together give ms. Each new creation, operation (progression) of ms coming from the more primary form ms > ma is marked by the addition of an extra > between ms and ma, whereas any ms newly created, formed by the last ms > ma, is formalised by a new line below the ms. It is by applying this progression principle (C’est de cette manière en même temps) that we can conceive the pattern of semantic overlap (la solidarité et l’unité) existing between the (logical) singular (l x), the (logical) particular (∃x) and the (logical) universal (x); and inversely. That an ms cannot exist in isolation follows from the fact that it can’t occur (in the formula) without its ma, which, on its side is ‘inexistent’ (not well formed) unless it is also the manifestation of the other ms in the formula.
[Ed. note: It will be clear to the bilingual reader that MMB has added her own decorations to the formulae for ‘John works like a trooper’ et seq.]
2. Re-statement of, and formalisation of, the Guberina hypothesis, by reference to the system ‘T’
Now, what is Guberina really saying? When we rethink his hypothesis mathematico-logically, what he is saying, I think, comes to this. He sees language as a multiple contrast system – made up of two units and three relations, with single systems of semantic classifiers, and one formula (the ‘semantic square’) of which all other semantic or syntactic forms actually found in language must be construed either as variants or as abbreviations.
2.1. The units
Guberina’s two units are a generalised subject of discourse, or subject, and the most generalised possible version of what is said about it, or predicate. He calls these, respectively, the manifestans and the manifestation. Let us call them respectively M and m.
2.1.1. M: the manifestans. Let the series of Ms in any text be formalised by the series $M_1, M_2, \ldots, M_n$. [N.B. If any two or more Ms in an M series are represented in the text by the same word or by pronouns representing the same word, they still, on Guberina’s hypothesis of the nature of semantic progression, count as different Ms.] Let the sub-series of semantically cognate Ms in any M-series be represented as having the same superscript, the superscripts being taken from the alphabetic series, A, B, C, D . . . N. Thus sub-series of Ms that are semantically cognate with regard to the semantic classifier A will be the series $M^A_1, M^A_2, M^A_3, \ldots, M^A_n$; and so also with regard to M-series marked by combinations of semantic classifiers, e.g. $M^{AB}_1, M^{AB}_2, \ldots, M^{AB}_n$.
2.1.2. m: the manifestation. Let the series of ms, in any text, be formalised by the m-series, $m_1, m_2, m_3, \ldots, m_n$. Let the sub-series of semantically cognate ms, in any m-series, be formalised by a superscript, as for Ms, it being noted that a different semantic classifier must be formalised by a different upper-case letter, and the same classifier by the same letter, no matter whether the classifier applies to an M or an m. Thus a semantically cognate m-sub-series will be: $m^A_1, m^A_2, \ldots, m^A_n$. [It is worth remarking that in the system T', the distinction between M and m is given by the colon and the slash: that is, $M^A \equiv$ a: and $m^A \equiv$ a/.]
2.2. The relations
2.2.1. Inclusion. The primary logical relation, for Guberina, is an asymmetric relation that connects an M with all the ms that are predicable of it.
Thus, we shall say: (1) $M^A_1$ - ($m^A_{11}, m^A_{12}, m^A_{13}, \ldots, m^N_{1n}$) Using as it is used in the system T', and: (2) $M^A_1$ any $m^A_1$. This notation, however, is redundant, since, if $M_1$ includes the whole set of $m_1$, the semantic cognateness of $M_1$ with the set $m_1$ is sufficiently given by their common subscript alone. We shall therefore say: $M_1$ ($m_{11}, m_{12}, m_{13}, \ldots, m_{1n}$). (Note: our inclusion is therefore used both for Guberina’s sign and for his sign >; it covers his solidarity of any manifestans with the open set of manifestations which are predicable of it.)
2.2.2. Solidarity, or, semantic overlap; or, semantic cognateness. This is a symmetric relation: that is to say, if A has solidarity, or semantic cognateness, with B, B has solidarity, or semantic cognateness, with A. Since we are presuming that we are using a set of classifiers (as in the system T'), we are now saying that $M_1$ shall be held to have solidarity with, or to be semantically cognate to, any $M_2$ if $M_1$ and $M_2$ have at least one semantic classifier in common. For instance $M^A_1$ is semantically cognate to, or has solidarity with, $M^A_2$ with respect to the semantic classifier A. Similarly, $m^B_1$ has solidarity with, i.e. is semantically cognate to, $m^B_2$ with respect to the semantic classifier B. [N.B. We have now accounted for all three of Guberina’s kinds of solidarity; for the solidarity of any manifestans with its set of manifestations is given by the common subscript; the solidarity of any manifestans with any other manifestans with which it has semantic cognateness is given by the common superscript, and the solidarity of any manifestation with any other manifestation is given by the common superscript. It is worth remarking that it follows from this, in terms of the system T', that any M will be a K-point, since it will be describable as the set of alternations of its predicates. ms can also be K-points in the system T', but only if it is found convenient to describe them as alternations of the sets of ms with which they could combine.]
2.3. Descriptive implication; ‘and so’; 6
This relation is not among those explicitly given by Guberina in the paper here being analysed. It is, however, both explicitly given and discussed at length in his book Valeur logique et valeur stylistique des propositions complexes.
The probable reason that Guberina does not explicitly give this third relation in his later work is that the necessary and sufficient conditions for its insertion can be inferred from the relations he has given, together with the further assertion that there is one and only one basic semantic formula (see Roger Needham’s analysis, below). However, it is almost certainly more convenient to assert it in the basic semantic formula than to require, for identification of the formula, explicit semantic cognateness between $m_1$ and $M_2$, which is what the formula requires. For it very often happens in MT tests that the text gives the descriptive implication in an easily recognisable form, whereas the semantic cognateness required by the basic formula to exist between $m_1$ and $M_2$ is not obvious; it may, in fact, be being created by the assertion, in which case it will never have been present in the language till now. Moreover, there are varieties and degrees in descriptive implication (see Guberina’s book), which means that every sign in a text conveying such an implication will also be describable as a K-point in T'; and it is convenient to have the K-points in a text explicitly catered for in any formalisation.
2.3.1. The basic semantic formula. Guberina’s basic semantic formula will now be:
$M^A_1\, m^B_1$ 6 $M^A_2\, m^B_2$
the semantic cognateness between $m_1$ and $M_2$ being given by the fact that they are connected by the connective 6. In other words, taking C as the common semantic cognateness between $m_1$ and $M_2$, the basic formula could now be re-written:
$M^A_1\, m^{BC}_1$ 6 $M^{AC}_2\, m^B_2$
Moreover, it follows from Guberina’s rules that, if semantic cognateness is to be presumed between $m_1$ and $M_2$, it will also be detectable (or creatable) between $m_2$ and $M_1$. By using D to indicate this last cognateness, we can now also rewrite the formula:
$M^{AD}_1\, m^{BC}_1$ 6 $M^{AC}_2\, m^{BD}_2$
2.3.2. Guberina’s ‘axioms’ will now be: (1) M i m 11 m 12 m 13 . . . m1n A A A (2) M A1 _ ^ M j M n ... M R (3) Mi 1 m i (4) 1 m Ai M Aij M Ak . . . MAn
2.4. The semantic square
Begging a considerable number of questions in a way that is by no means quite fair to Guberina, but which is strictly necessary for purposes of MT, we shall equate the double proposition, or semantic square, with the basic semantic formula as we have set it out above. We do this on the presumption that, in any text, when half of any formula is missing in that it can be supplied from exterior reality, sufficient indications are nevertheless present in the text for the missing half of the square to be mechanically creatable, if required. This is all the more likely to be the case in that, on Guberina’s view, a paragraph clearly consists (at the least) of a ‘square of squares’; and if three ‘sides’ of a ‘square’ of ‘squares’ are actually present in the text, it will be easier mechanically to create the fourth than it will be unambiguously to create a second from a first alone. (See the piece of text in Guberina-standard-form, which is given and commented on, in the next section.)
3. Reformalisation of Guberina’s hypothesis, using a subject-predicate notation (By Roger Needham)
We adopt the following notation: $S_i$ are subjects of discourse, and $P_j$ are predicates. The application of a particular predicate to a particular subject is $S_i)P_j$. We denote the set of predicates that can be applied to an $S_i$ by $X(S_i)$, and so $S_i)P_j$ is only a valid remark if $P_j \in X(S_i)$. $P_j$ and $P_k$ will be said to be semantically cognate if there is some $S_i$ such that both $P_j$ and $P_k$ belong to $X(S_i)$. Similarly, $S_i$ and $S_l$ will be said to be semantically cognate if there is a $P_j$ which belongs to both $X(S_i)$ and $X(S_l)$. (This would be a better definition if it took account of the number of $S_i$ that could be found in the first part, and $P_j$ that could be found in the second.) We now consider an utterance of the form (A): $S_i)P_j$ and so $S_k)P_l$. Guberina asserts that the use of and so is only legitimate if $S_i$ and $S_k$ are semantically cognate, and so are $P_j$ and $P_l$. For this to happen it is sufficient that $P_j \in X(S_k)$ and $P_l \in X(S_i)$. It thus follows that given an incomplete sentence $S_i)P_j$ and so $S_k)$ . . . we are restricted in our choice of last term to those predicates in the intersection of $X(S_k)$ and $X(S_i)$.
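The definitions just given are set-theoretic, and it may help to see them run on a toy case. What follows is only a minimal sketch in Python, under stated assumptions: the table X, its subjects and its predicates are invented for illustration, the function names are hypothetical, and requiring both remarks to be valid before testing ‘and so’ is an added assumption rather than part of the analysis above.

```python
from typing import Dict, Set

# X maps each subject S to X(S), the set of predicates applicable to it.
# Subjects and predicates here are invented for illustration only.
X: Dict[str, Set[str]] = {
    "Jean": {"works", "gets_through", "is_happy"},
    "parents": {"is_happy", "gets_through", "worry"},
    "stone": {"falls"},
}


def valid_remark(x: Dict[str, Set[str]], s: str, p: str) -> bool:
    """S)P is a valid remark only if P belongs to X(S)."""
    return p in x.get(s, set())


def subjects_cognate(x: Dict[str, Set[str]], s1: str, s2: str) -> bool:
    """Two subjects are cognate if some predicate belongs to both X sets."""
    return bool(x.get(s1, set()) & x.get(s2, set()))


def predicates_cognate(x: Dict[str, Set[str]], p1: str, p2: str) -> bool:
    """Two predicates are cognate if some subject admits both of them."""
    return any(p1 in preds and p2 in preds for preds in x.values())


def and_so_legitimate(x, si: str, pj: str, sk: str, pl: str) -> bool:
    """'Si)Pj and so Sk)Pl' needs cognate subjects and cognate predicates.

    Checking that both remarks are valid is an assumption added here.
    """
    return (valid_remark(x, si, pj) and valid_remark(x, sk, pl)
            and subjects_cognate(x, si, sk)
            and predicates_cognate(x, pj, pl))


def sufficient_condition(x, si: str, pj: str, sk: str, pl: str) -> bool:
    """The sufficient condition: Pj in X(Sk) and Pl in X(Si)."""
    return pj in x.get(sk, set()) and pl in x.get(si, set())


def candidate_last_terms(x, si: str, sk: str) -> Set[str]:
    """Predicates allowed to complete 'Si)Pj and so Sk) ...'."""
    return x.get(si, set()) & x.get(sk, set())
```

On this toy table, ‘Jean works, and so parents is_happy’ passes the cognateness test although it does not meet the sufficient condition, which illustrates that the condition is sufficient and not necessary.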
Guberina asserts that all discourse is basically of this pattern, and thus he is able to make a further inference. If an utterance of pattern A appears in text, it is to be presumed that it is a valid use of and so, and so the requirements of semantic cognateness are satisfied. It is thus possible to infer that $P_j \in X(S_k)$ and that $P_l \in X(S_i)$, even if we did not know it before. By this means our knowledge of the Xs is increased. We thus see that if discourse is written in a Guberina-like manner, it is to some extent possible to infer the Xs from a text, and conversely to predict parts of the text if we know the Xs. However, it is only necessary to look at some text to see that it is not normally constructed this way and to look at MMB’s reconstruction of this kind of discourse (given in the next section), to see why it is not. In the ordinary way we simply do not need the fantastic redundancy it provides. Guberina wants this redundancy when talking to the deaf, and someone using a bad telephone might want it, or someone reading bad writing. For purposes of MT it might indeed sometimes be needed to deal with the different kind of noise (if it may be so-called) arising from the wide range of uses of some words; and the differing variants of it, or abbreviations of it, current in differing languages. Thus, suppose we have a sentence of form A, and that the word used for $P_l$ has a very wide range of uses. To translate correctly we not only select a use that belongs to $X(S_k)$, as we already knew we must, but also select one that belongs to $X(S_i)$ – a new and helpful restriction. Now what goes for sorting out for MT is likely to go also for sorting out for unilingual understanding, and we now see why discourse as found is not Guberina-like. For if a word has not got a wide range of uses, we do not need so much help to sort it out. In my view, it is needed only if the words are very indeterminate (e.g. Chinese) or very liable to be corrupted; and of course we might need it in order to make target-language-like re-abbreviations of it, and reconversions from it, in the middle stage of interlingual MT.
3.1. A piece of text set out in Guberina-standard-form
The piece of text chosen was the first paragraph of Guberina’s own paper, La logique de la logique du langage. First this paragraph is given in the original French. It is then given in an English translation. It is lastly given, set up in Guberina-standard form. The third version has not yet been formalised in T', and is therefore not guaranteed error-free. It is fairly clear, however, that the way to test it would be to convert it into a comparable bracket-segmented formula in T'(choosing T' elements to taste, to secure maximum simplicity) and then test each bracketed segment in turn to see whether or not it conformed to any of the three forms given in the last section for the basic semantic formula.
Text Le problème est vieux. Les Grecs croyaient l’avoir résolu en assimilant le jugement logique à la proposition, et les espèces de mots aux catégories logiques. Le côté de l’expression par le langage fut par cette équivalence subordonné à la logique. Les siècles plus récents y ont apporté de petites retouches. Les linguistes, pourtant, ont exprimé quelques doutes au XIXe siècle sur l’équivalence et la subordination du langage à la logique. Le XXe siècle proclama très ouvertement et avec une insistance de plus en plus forte que la langue et la logique présentent deux domaines distincts. La théorie grammaticale crut s’être finalement débarrassée de l’emprise de la logique. On distingua nettement une logique du langage et une logique de la logique. Et la paix fut faite.
Translation The problem is an old chestnut. The Greeks thought they had solved it by the device of assimilating logical judgement to a proposition, and different sorts of words to different logical categories. Linguistic expression was, however, by the very fact of making this equivalence, subordinated to the basic forms of logic. More recent centuries brought only slight retouchings-up of the Greek solution, until the nineteenth century, when linguists expressed various qualms about the equation, and doubts about the subordinating of language to logic; and now, the twentieth century has proclaimed quite openly, and more and more insistently, that language and logic constitute two totally distinct domains. Grammatical theory has thus come to believe itself to have totally thrown off the cramping yoke of logical formulation; ‘logic of language’ has been explicitly distinguished from another quite different ‘logic of logic’; and peace has been finally made.
The same text roughly set up, though not formalised, in Guberina-standard form
This area-of-problem-solvability is old; One long-thought-of-as-being-a-solution-to-it emanated from the Greeks – an old people, and likewise a people famed for problem-solving. How did the Greeks think-to-have-solved the problem? This is how the Greeks thought-to-have-solved the problem. They assimilated language-judgements to propositions: They assimilated sorts-of-words to logical categories. [Thus to your question: ‘How did the Greeks think they had solved the problem?’] I reply; the Greeks thought they had solved the problem by subordinating language to logic. [How long did people continue to accept this solution?] For centuries – with little refinements – people continued to accept this solution. [When did people first begin to think differently?] In nineteenth century, people began to think differently. [Who were the people who began to think differently?]
Linguists were the people who began to think differently. They began to have qualms about the equivalences: They began to express doubts about the subordination. In twentieth century, linguists began to protest openly; They proclaimed, more and more insistently, that language and logic were separate domains. Grammatical theory was thus thought-to-have-been separated for ever from logic; Grammarians thought they had thrown off the stranglehold of logical forms. Thus in the one field, there was ‘logic of logic,’; In the other field, there was ‘logic of language.’ And since, through the complete separation of the parties, there could be no more disagreement; There was, in the end, an established peace made between them.
10
Semantic algorithms
The purpose of the paper that I want here to present is to make a suggestion for computing semantic paragraph patterns. I had thought that just putting forward this suggestion would involve putting forward a way of looking at language so different from that of everyone else present, either from the logical side or the linguistic side, that I would get bogged down in peripheral controversy to the extent of never getting to the point. I was going to start by saying, ‘Put on my tomb: “This is what she was trying for”.’ But it is not so. I don’t know what has happened, but I don’t disagree with Yehoshua Bar-Hillel as much as I did. And on the linguistic side I owe this whole colloquium an apology and put forward the excuse that I was ill. I ought to have mastered the work of Weinreich (1971). I am trying to. But it is not just that simple a matter to master a complex work in a discipline quite different from that which one ordinarily follows. I may misinterpret, but it seems to me that the kind of suggestion I put forward in this paper could be construed as a crude way of doing the kind of thing Weinreich has asked for. But Yehoshua Bar-Hillel is actually very right when he wants to question all the time what real use the computer can be in this field. So don’t be misled by the size of the output in this paper. In all the devices used except one, which is the one I want to talk about, the computer is used above all as a clerical aid. One should be clear, I think, in doing semantics work, whether one could have done it without a computer and, if not, in just what way the computer was a scientific or clerical device.
1. Phrasings
The hypothesis from which we start, and which there is almost no time to defend, is that the semantic unit of language is given by intonational and phonetic data and is not perspicuous from written speech. This semantic unit we call a phrasing. I will start, therefore, by defining a phrasing:
A phrasing is a piece of utterance consisting of two stress-points, and whatever intonationally lies between them or depends on them. In other words, phonetically speaking, a phrasing is a tone group (Shillan 1954). To illustrate the nature of phrasings I give, as example, the beginning of the last paragraph, phrased up by hand in a rough and ready manner.
/The hypothesis ( )/ /from which we start/ /and which there + is almost/ /no + time to defend/ /is that + the semantic/ /unit of language/ /is given by intonational/ /and phonetic data/ /and + is not perspicuous/ /from written speech./ /I + will start, therefore, by defining a phrasing./
Key:
/ /  boundaries of phrasing
( )  silent beat
+  intonational connection
italics  stressed word
Note: Segments smaller than the word are not here stressed.
You will appreciate that the phonetics of intonational form is a definite discipline and that it is not the subject of discussion here. I can sustain discussion on what we at CLRU are doing to make precise the study of what these phrasings look like in actual text; but I give warning that this study will involve further massive and tight experimentation, which we at CLRU are not equipped to do. Three lines are being pursued. Firstly, the Gsell Tune Detector at Grenoble will give the data (Gsell et al. 1963). The technological difficulty in recording phrasings is that of making static recordings of pitch data; and the Tune Detector will do this. But even if literally miles of output were to be obtained from such a tune-detecting machine – and we do need literally miles of output from it to allow for variations between speakers – this output would be very little good without the possibility of subsequently processing it. We are therefore struggling in CLRU to find a way of making a computer simplification of it, so that the program itself (a clerical aid again, but nevertheless a good one) can process this output mechanically and analyse it.
Then, secondly, a statistical survey is being made of the characteristics of phrasings in English and Canadian French; these phrasings have been antecedently marked in the text by hand (Dolby 1966). Thirdly, there is one ‘hard’ criterion of the existence of phrasings that I can here and now show. We have been examining comparatively large masses of official text issued by the Canadian Government. This has the original English and the Canadian French translation published together in the same volume. By examination of actual material, we have been trying to see what it would be like for a machine to perform the transformation from the English to the French. Such an examination exposes whoever makes it to the full shock of discovering the absence of linkage between any initial text and some other text that purports to be a translation of it in some other language. The sentential breaks do not always correspond, since a Frenchman translating from English takes pleasure in not letting them correspond; the vocabulary of course does not correspond. What then does correspond? What corresponds is that the translation goes phrasing by phrasing. Since the phrasing proves to be so important, therefore, as the semantic unit of translation, my second exhibit, SEMCO, is the first output of a semantic concordance of phrasings which, in design anyway, is a considerable improvement on the IBM Key Word in Context. The merging and sorting program for this concordance is not finished yet; but it can already be seen from the output that the phrasings of which it is composed can each be sorted in three semantically significant ways: 1. by the main stressed word; 2. by the secondarily stressed word; and 3. by the total unstressed remainder of the phrasing, or pendant. 
We hope to make this concordance a translation aid by setting it up bi-lingually; that is, by setting up a set of correspondences between phrasings in English and phrasings in Canadian French, and then programming a reactive typewriter, on which the human translator will type out whole phrasings in English, to do what it can to retrieve some phrasings in the French. If the English phrasing consists of a technical term or a stereotyped piece of officialese or an idiom, there will be a one-to-one match with the corresponding phrasing in Canadian French. If not, we hope progressively to enrich the system so as to enable it to retrieve French translations of semantically cognate English phrasings, that is, either of other phrasings that have both the same words stressed, but with different pendants; or with phrasings with one stressed word in common with the original phrasing; or with phrasings with the same pendant; or with phrasings synonymous with the original phrasing in some defined sense.
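The three sorting keys and the graded retrieval just described can be pictured as a small data structure. The following is a minimal sketch in Python, not the SEMCO program itself: the class and function names are hypothetical, the stress assignments and French renderings in the usage note are invented placeholders, and treating the order in which the kinds of match are listed above as an order of closeness is an assumption. Synonymy ‘in some defined sense’ is left out, since no definition is given.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Phrasing:
    main: str              # main stressed word
    secondary: str         # secondarily stressed word ("" for a silent beat)
    pendant: str           # total unstressed remainder of the phrasing
    translation: str = ""  # stored rendering in the other language, if any


def stressed(p: Phrasing) -> FrozenSet[str]:
    """The stressed words of a phrasing, ignoring case and silent beats."""
    return frozenset(w.lower() for w in (p.main, p.secondary) if w)


def sort_concordance(entries: List[Phrasing], key: str) -> List[Phrasing]:
    """Sort the concordance by 'main', 'secondary' or 'pendant'."""
    if key not in ("main", "secondary", "pendant"):
        raise ValueError(f"unknown sort key: {key}")
    return sorted(entries, key=lambda p: getattr(p, key).lower())


def closeness(query: Phrasing, entry: Phrasing) -> Optional[int]:
    """Smaller is closer; None means no relation this sketch can detect."""
    shared = len(stressed(query) & stressed(entry))
    same_pendant = query.pendant.lower() == entry.pendant.lower()
    if stressed(query) == stressed(entry) and same_pendant:
        return 0  # one-to-one match
    if shared == 2:
        return 1  # both stressed words in common, different pendant
    if shared == 1:
        return 2  # one stressed word in common
    if same_pendant:
        return 3  # same pendant only
    return None


def retrieve(query: Phrasing, entries: List[Phrasing]) -> List[Phrasing]:
    """Return related entries, closest first (ties keep concordance order)."""
    ranked = [(closeness(query, e), e) for e in entries]
    hits = [(rank, e) for rank, e in ranked if rank is not None]
    return [e for rank, e in sorted(hits, key=lambda pair: pair[0])]


# Invented entries for illustration only.
CONCORDANCE = [
    Phrasing("Government", "Canadian", "the", "le gouvernement canadien"),
    Phrasing("Government", "Queen's", "the", "le gouvernement de la Reine"),
    Phrasing("Canada", "", "in", "au Canada"),
]
```

With these invented entries, a query such as Phrasing("Government", "Majesty's", "Her") retrieves the two ‘Government’ phrasings, each by its one shared stressed word, and nothing else.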
Thus, supposing that
/the Queen’s Government/
/the Canadian Government/
/in Canada ( )/
were all with their translations in the concordance, but /Her Majesty’s Government/ for some reason was not, the concordance would retrieve the first two of these, in order of closeness, with their French translations, on the ground that they had one common stressed word with the original (namely, ‘Government’) and that Queen’s is here synonymous with Her Majesty’s. Similarly, suppose the second phrasing, in the same text, was /is + of + the considered opinion/; the concordance might retrieve (e.g.), and in the following order:
/is + of + the opinion ( )/
/has + given serious consideration/
/has formed the opinion/
/we think ( )/
In this case, the first of each of the two sets of retrieved phrasings, that is, /the Queen’s Government / is + of + the opinion ( )/ would indeed be a pretty good paraphrase of the original /Her Majesty’s Government / is + of + the considered opinion/. But notice also that even in the worst case, obtained by taking the bottom phrasing of each of the two sets of phrasing retrieved by the concordance, some inkling would be retained in context of the brute sense of the original by saying /In Canada ( ) / we think ( )/.
All this is in the future, and we want to test it out in a pilot scheme; in particular, we want to watch the concordance for size. What is already true is that we have made comparative analyses of quite a quantity of English and Canadian French text, including a text of 375 continuous phrasings, and there are only very few counter-examples to the hypothesis that you can go through, from phrasing to phrasing.
There is another point. A program is being written by John Dobson for marking phrasing boundaries from written text, using syntactic information. But, in fact, the phrasings do not always go with the syntax, though they usually do. See, for example, such English phrasings as
/A man who + is said/
/Although there + has been/
We have here two separable sub-systems operating within the total system of language: an intonational phrasing system determining the semantic units of the message, and a grammatico-syntactic system, determining the grammatico-syntactic groupings of the utterance. They usually draw boundaries at the same places, but not always. We can, of course, stress any segment of speech up to quite a long string of syllables. In that case the pace of speaking accelerates, though the rhythm not much. Here, as I have already said, when any syllable has been stressed, I have italicised the whole word; and I have used + signs to connect contiguous stressed or unstressed words. I have also used empty brackets, ( ), to denote silent beats or pauses. I will not here discuss the notorious difficulty created by the fact that different speakers stress the same passage differently, except to say that in our so far limited experience, the longer the text, the more unequivocally determined the stress pattern.
2. Quatrains
The second semantic assumption that we make at CLRU is that phrasings tend to couple up in pairs, and the pairs in turn to couple up in fours. Thus, taking again the last paragraph that I have written and phrasing it up by hand in a rough and ready manner, we get
/The second semantic + assumption/
/that + we make at CLRU/
1. /is + that phrasings tend/
2. /to couple + up in + pairs,/
3. /and + the pairs in + turn/
4. /to couple + up in + fours./
or, said more quickly,
1. /The second semantic + assumption/
2. /that + we make at CLRU/
3. /is + that phrasings tend + to couple + up + in pairs/
4. /and + the pairs, in + turn, to couple + up + in + fours./
These pairs of pairs of phrasings, however obtained, we call quatrains. It is clear from the above example that this second assumption is normative. In the case of a short piece of utterance, in particular, one can always so arrange it that the phrasings fall in fours, and one can, alternatively, so arrange it that the phrasings fall irregularly. Moreover, this second hypothesis is elastic, in that, to make it work, you have to allow for silent beats. And though there is a consensus of opinion that these genuinely exist (Abercrombie 1965), there must obviously be independent criteria of their existence and location for them to be usable in defence of the quatrain-hypothesis, for otherwise, by just inserting up to four silent beats wherever needed to complete a quatrain, any piece of prose whatever could be analysed into quatrains. I should prefer, therefore, to call the assumption that there are quatrains a device, for by using it we can (and do) provisionally define a standard paragraph as a sequence of four quatrains, that is, as a Quatrain. We can then suggest that intonationally speaking the constituent quatrains of a Quatrain (call them quats) may themselves be intonationally interrelated by higher order phrasings, with higher order stresses, these higher order stresses being spread over longer lengths of text, thus producing a hierarchical intonational picture of a standard paragraph, as illustrated in Figure 35.
[Figure 35. Hierarchical intonational picture of a standard paragraph. Levels, from top to bottom: MEGA-QUATS (second-level quatrains), main stresses (1)-(2); MEGA-QUATS (first-level quatrains), main stresses (1)-(4); QUATS (zero-level quatrains), main stresses (1)-(8).]
Of course this standard schema is a drastic and normative simplification of everything that intonationally happens in a real paragraph; it ignores all kinds of transpositions, aberrations and variants. Similarly, though more crudely, the hypothesis that a standard paragraph is a sequence of four quatrains itself tailors-to-shape any paragraph that is, in fact, not a sequence of four quatrains. But it is much easier in all study of language to analyse transpositions, aberrations and variants of anything if you have some initial schema or idea, simple enough to be easily grasped and retained by the mind, of what it is that they are transpositions, aberrations and variants of.
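The quatrain device is mechanical enough to be written down directly. The sketch below, in Python, is only an illustration under stated assumptions: phrasings are taken as ready-made strings, the function names are hypothetical, and padding is done blindly at the end of the sequence, which is exactly the freedom that the text warns would let any prose be analysed into quatrains. For that reason the number of silent beats inserted is returned, so that it can be checked against independent evidence.

```python
from typing import List, Tuple

SILENT_BEAT = "( )"


def chunk(items: list, size: int) -> List[list]:
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def to_quats(phrasings: List[str]) -> Tuple[List[list], int]:
    """Couple phrasings into pairs, and the pairs into quats (pairs of pairs).

    Returns the quats and the number of silent beats that had to be
    appended to complete the last quat.
    """
    padded = list(phrasings)
    inserted = (-len(padded)) % 4
    padded.extend([SILENT_BEAT] * inserted)
    pairs = chunk(padded, 2)
    return chunk(pairs, 2), inserted


def to_standard_paragraphs(quats: List[list]) -> List[list]:
    """Group quats into Quatrains: standard paragraphs of four quats each."""
    return chunk(quats, 4)


EXAMPLE = [
    "/The second semantic + assumption/",
    "/that + we make at CLRU/",
    "/is + that phrasings tend/",
    "/to couple + up in + pairs,/",
    "/and + the pairs in + turn/",
    "/to couple + up in + fours./",
]
```

Applied to the six phrasings of EXAMPLE, to_quats gives two quats and reports two inserted silent beats; sixteen phrasings would give four quats, that is, one standard paragraph, with none inserted.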
This schema-notion also, you appreciate, like the phrasing hypothesis, constitutes the kind of provisional assumption that needs massive and precise experimentation. It ought to be possible, for instance, quasi-musically to estimate the accentuation or diminution of stressing that occurs in any segment of intonationally fully contoured text according to whether the segment in question is or is not included within the boundaries of a higher-order stress. For instance, in the last paragraph that I have written immediately above (i.e., the paragraph which began ‘Of course this standard schema . . . ’), my rough guess is that in the last sentence the secondary mega-stress of the final mega-phrasing is initial + schema + or + idea, while the main overall mega-stress of the same mega-phrasing, and therefore the intonational climax of the whole paragraph, is what + it + is + that + they + are + transpositions + aberrations + and + variants + of; for note the tremendous emphasis, which I had to indicate by italics even when writing down the original paragraph, of the final, usually totally unstressed syllable, ‘of.’
However, meso-stressing and mega-stressing are far away in the future. What I promised the organisers of this conference to bring along and try to explain were some exhibits of some CLRU semantic algorithms that had been used in the past. And I have some exhibits that show the analytic use we have made of the basic empirical fact on which the quatrain-finding device rests, namely, that there is a sort of two-beat rhythm (||) that goes through discursive prose, especially through the sort of discursive prose that occurs (e.g.) in the London Times and in official documents:
 1. /A man who + is said/
 2. /to + have walked through + the ranks/
 3. /of + the Queen’s Guards/
 4. /marching through + the Mall/
 5. /taking pictures/
 6. /with + a ciné camera,/
 7. /was fined £10/
 8. /at Bow + Street Magistrates’ Court/
 9. /yesterday ( )/
10. /for insulting behaviour./ (12)
And in the seventeenth and eighteenth centuries, when prose was prose, as it were, and a great deal of written text was composed to be read aloud, the existence of this two-beat rhythm was deliberately exploited. Here is the beginning of the philosopher Hume’s preface to his Inquiry Concerning Human Understanding:
 1. /I have put in thy hands/
 2. /what + hath been the diversion/
 3. /of some of + my idle/
 4. /and heavy hours./
 5. /If it + has been/
 6. /the good + luck to prove + so/
 7. /of any of thine,/
 8. /( ) ( )/
 9. /and thou hast + but half/
10. /so + much pleasure in reading/
11. /as I + had in writing + it,/
12. /( ) ( )/
13. /thou wilt as little/
14. /think thy money/
15. /as I + do my + pains/
16. /ill bestowed./ (13)
3. Templates
If the intonation of a paragraph is the study of its tune, the semantics of it is the study of its pattern, because the study of the kind of semantic pattern that occurs in a standard paragraph has some analogy with the kind of pattern that is mechanically searched for in pattern-recognition searches. I have twice said that in studying semantics one feels as though one is identifying a visual component in language rather than an auditory component in language. This I should not have said unless I was prepared to make it good, since such an analogy, being as it is between two finite algorithms, must be by its nature precisely determinable. I therefore do not wish to go any further into this matter here, since it needs a special publication on its own, which I hope in due course to provide. The reason that I was tempted to bring up this analogy at all is that its existence (if it does exist) emphasises the main point I here want to make, namely, that formal logic as we at present have it is not and cannot be directly relevant to the contextually based study of semantic pattern. Logic is the study of relation, not of pattern; and, in particular, it is the study of derivability. By assimilating the kind of semantic pattern that we in CLRU want to make a machine find with the kind of visual pattern that research workers in the field of pattern-recognition also want to make a machine find, I hoped that by establishing a new analogy, based on visual pattern, I could obliterate the thought of the false analogy between an applied logical formula and a piece of natural language. But I see now that I have been premature. In order to get semantic patterns on to a machine, we have created in CLRU a unit of semantic pattern called a template. The word template, applied to natural language, has already quite a history, having been used twenty years ago by Bromwich and more lately by Miller. 
In the sense in which I am here going to use it, it was a development of my earlier notion of a Semantic Shell, simplified, streamlined and further developed in CLRU by Yorick Wilks (1964). It will be recalled that a phrasing was earlier defined as a piece of utterance consisting of two stress-points and whatever intonationally lies between them or depends on them. Thus a phrasing consisted, by definition, of three units: a main stress, a subsidiary stress and an unstressed part, or pendant. I will try to make clear what I mean by the notion of a double abstraction. The notion of a pendant is itself already an abstraction from the linguistic facts because it creates one unit out of one or more unstressed segments of text, which may occur in the phrasing between the two stress points but may also occur before or after them (also, of course, the phrasing may contain no
unstressed segment of text). Carrying this notion of the form of a phrasing consisting of three units further, we create three positions: an imaginary piece of metal with three holes or template, the two end holes standing each for a stress point, and the hole in the middle for the pendant, thus:

General Template-Form (Stage I)

0                     0          0
first stress point    pendant    second stress point
These units we fill with interlingual elements which, philosophically speaking, can be regarded as Aristotelian terms – indeed, though formally speaking they are not terms, they are in use the only genuine Aristotelian terms there have probably ever been. For an Aristotelian term has (a) to be a ‘universal’ (e.g. a general term like ‘pleasure’ or ‘man’), (b) to be such that two terms can be linked with a copula in between them, (c) to be such that they can occur, without change of meaning, either as subjects or as predicates. As is notorious, this last is the difficult condition for an actual word in use in language to fulfil, for if we say, for example ‘Greek generals are handsome.’ using ‘handsome’ here as a predicate, we have to continue ‘Handsomeness is a characteristic of all the best men.’ (or some such thing) if we are to use the term ‘handsome’ also as a subject; that is, we not only have to change its form, but also give it a far more abstract meaning than it had as a predicate. To use a semantic sign as a genuine Aristotelian term requires a quite new way of thinking. We achieve this by creating a finite set (c. 50) of English monosyllables of high generality (e.g. MAN, HAVE, WORLD, IN, WHEN, DO etc.), and, divesting them by fiat of their original parts of speech, ordain that they may be combined with two and only two connectives: 1. a colon (:) indicative of ‘subjectness’ 2. a slash (/) indicative of ‘predicateness’. By using these two connectives we then recreate English ‘parts of speech’ as follows:
Noun           a:
Adjective      a:
Verb           a/
Preposition    a/
Adverb         a/    (17)

Finally, we rule that at least two terms shall be required to make a well-formed formula (the two terms having one connective between them), and say that any two-term formula ab in which the a and the b are separated by a colon (i.e., a:b) shall be commutative, whereas any formula ab in which the two terms are separated by a slash shall be non-commutative (i.e., a/b). Finally, a bracketing rule has to be made (not, I think, thought of by Aristotle), allowing any two-term formula itself to be a term. This set of rules for CLRU interlinguas has been given in various work papers and publications (Richens 1956). Using this term system, we fill in the holes in our template-form as follows:

(a:    (b/    c)):

Since these brackets are invariant we may omit them giving

a:    b/    c:

e.g.

MAN:    CAN/    DO:
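The term rules just stated can be illustrated with a minimal sketch. It assumes a Python representation that is no part of the CLRU work: the class `Formula`, the functions `is_term`, `canonical` and `equivalent`, and the small `PRIMITIVES` subset are all hypothetical names chosen for illustration. It shows only the three rules given above: two connectives, commutativity of the colon but not of the slash, and the bracketing rule that lets a two-term formula stand as a term.

```python
from dataclasses import dataclass
from typing import Union

# Illustrative subset of the roughly fifty monosyllabic terms (assumption:
# only terms actually quoted in the text are listed here).
PRIMITIVES = {"MAN", "HAVE", "WORLD", "IN", "WHEN", "DO", "CAN"}


@dataclass(frozen=True)
class Formula:
    """A two-term formula: left term, one connective, right term."""
    left: "Term"
    connective: str  # ":" (commutative) or "/" (non-commutative)
    right: "Term"


Term = Union[str, Formula]


def is_term(term: Term) -> bool:
    """A primitive is a term; by the bracketing rule, so is any two-term formula."""
    if isinstance(term, str):
        return term in PRIMITIVES
    return (
        term.connective in (":", "/")
        and is_term(term.left)
        and is_term(term.right)
    )


def canonical(term: Term) -> Term:
    """Reorder the operands of every colon so that a:b and b:a compare equal."""
    if isinstance(term, str):
        return term
    left, right = canonical(term.left), canonical(term.right)
    if term.connective == ":" and repr(right) < repr(left):
        left, right = right, left
    return Formula(left, term.connective, right)


def equivalent(x: Term, y: Term) -> bool:
    """Equality up to commutativity of the colon only."""
    return canonical(x) == canonical(y)


# The bracketed template-form (a: (b/ c)) filled with MAN, CAN, DO.
example = Formula("MAN", ":", Formula("CAN", "/", "DO"))
```

On this reading `Formula("MAN", ":", "DO")` and `Formula("DO", ":", "MAN")` are equivalent, while the corresponding slash formulas are not. Ordering the operands by `repr` is merely a convenient way of obtaining a canonical form, not a claim about how the interlingua was actually stored.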
However, if it be remembered that a template is meant to be a coding for a phrasing, it is clear that we have now made a second type of abstraction from the linguistic facts. For we have not merely made a positional abstraction from them, representing the primary and secondary stress points, and the pendant of any phrasing. We have also, by inserting general terms into the three positions, made a semantico-syntactic abstraction from them; for a whole class of phrasing will, clearly, be representable by a single triad of terms.
To separate the members of this class, we complicate our template by inserting into it three variables, α, β, χ, as below:

General Template-Form (Stage II)

α a:    β b/    χ c:
These variables can be filled as values by further specifications, made by using the rules above, composed of terms, the object of the specification being to specify the semantic content of an actual phrasing sufficiently to distinguish it from all other phrasings coded under the system that have the same general template form:

General Template-Form    α MAN:        β DO/        TO
Actual Coded Phrasing    (SELF:MAN)    (WILL/DO)    (CHANGE/WHERE)/TO)
Phrasing                 /I            will         come/
Sometimes, however well chosen the original set of terms, thesaurus heads or other descriptors are, in addition, necessary to distinguish two phrasings from one another:

(ONE:Male MAN) /(WILL/DO)/(CHANGE/WHERE)/TO)       /He will come/
(ONE:Female MAN):/(WILL/DO)/(CHANGE/WHERE)/TO)     /She will come/
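The relation between a general template form and the fuller codings that fall under it can be sketched as follows. This is an illustration under stated assumptions, not the CLRU coding program: the names `Slot`, `CodedPhrasing`, `general_form`, `same_class` and `distinguishable` are hypothetical, the head term of each slot is supplied by hand rather than derived from the specification, and the specifications are held as opaque strings copied from the examples above.

```python
from typing import NamedTuple, Tuple


class Slot(NamedTuple):
    head: str  # general term filling the position, e.g. "MAN"
    spec: str  # fuller specification, kept as an uninterpreted string


class CodedPhrasing(NamedTuple):
    text: str
    slots: Tuple[Slot, Slot, Slot]  # first stress, pendant, second stress


def general_form(phrasing: CodedPhrasing) -> Tuple[str, str, str]:
    """The triad of general terms under which a whole class of phrasings falls."""
    return tuple(slot.head for slot in phrasing.slots)


def same_class(p: CodedPhrasing, q: CodedPhrasing) -> bool:
    """True when two phrasings share one general template form."""
    return general_form(p) == general_form(q)


def distinguishable(p: CodedPhrasing, q: CodedPhrasing) -> bool:
    """True when the specifications separate two phrasings."""
    return p.slots != q.slots


WILL_COME = (Slot("DO", "(WILL/DO)"), Slot("TO", "(CHANGE/WHERE)/TO)"))

i_will_come = CodedPhrasing("/I will come/", (Slot("MAN", "(SELF:MAN)"),) + WILL_COME)
he_will_come = CodedPhrasing("/He will come/", (Slot("MAN", "(ONE:Male MAN)"),) + WILL_COME)
she_will_come = CodedPhrasing("/She will come/", (Slot("MAN", "(ONE:Female MAN)"),) + WILL_COME)
```

All three phrasings fall under the form MAN, DO, TO; only the specification in the first slot, here helped out by the descriptors Male and Female, tells them apart.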
It will be evident that, with so sparse a coding system, only a limited number of the shorter phrasings of natural language can be coded. For instance, I remember a long discussion in CLRU about how to code the phrasing

/that+it+was+the Annual Fair/

from the text ‘ . . . then I found that it was the Annual Fair, which was always held at Midsummer . . . ’ It is obvious that into this phrasing the information content of two or more smaller phrasings taken from some such set as the following have been compressed, e.g.

/the Annual Fair/
/that it+was ( )/
/it+was the Fair ( )/
/( ) it+was+the Fair/

It is clear that it would not be out of the question to mechanise the process of cutting up one long phrasing into two small ones; but I do not want to go further into this here. For what this query does is to bring up the far more fundamental question ‘What is this whole semantic coding technique for?’ ‘What is it worth?’ ‘And what is it going to be used for?’ And it is this deeper and more philosophic question that I now want to discuss.
4. The semantic middle term: pairing the templates
As I see it, in contemporary linguistics there are two trends. The first is connected in my mind, rightly or wrongly, with such names as W. S. Allen, M. A. K. Halliday, John Lyons, R. M. W. Dixon and, of course, above all J. R. Firth; and I therefore think of it as ‘the British School of Linguistics’, though it is almost certainly, in fact, a worldwide trend. The members of this school take a raw untampered-with utterance and then try to segment it, analyse it and account for it, using machines as clerical aids but taking the text as given; they do not try to add anything to it, excise anything from it, or otherwise explain it away. They try, moreover, to name the categories they find from the operation of finding them, instead of appropriating to new linguistic situations the well-known hackneyed categories of Graeco-Latin grammar. The rationale of doing this kind of work is brilliantly expounded in W. S. Allen’s work (1957); and a major theoretic work has recently been published from within this general trend (Dixon 1965).
I will confess that it is with this school and not with the MIT school that my linguistic sympathies primarily lie; for it seems to me that the whole point of doing scientific linguistics – the whole battle that it has taken the scientific linguists thirty years to win – is that the practitioners of this technique engage themselves to open their eyes to look at the utterances of the languages of the world as they really are, instead of forcing them all (as in the older philology) into a Latin-derived straightjacket, or seeing them (à la Chomsky) through the distorting glass of an Americanised norm. It is no accident, of course, that Allen and Halliday should have formed my conception of linguistics, for W. S. Allen is a professor at my own university, while M. A. K. Halliday, besides being one of the group who originally founded CLRU, also put us on the original thesaurus idea, on which all our more recent semantics work has directly or indirectly been founded. Also the view of language taken by the phonetic analysts, and in particular by P. Guberina, much more nearly coincides with that of ‘the British School of Linguists’ than with that of the present MIT school. But now we come to a difficulty, to another form of the same difficulty that probably led Chomsky and his school, and probably Fodor and Katz also, to make their drastic abstractions from the facts of language. If the distributional method of linguistics, unaided, is the only tool that is to be used to analyse and understand natural language as it really is, such language will remain forever unanalysed and non-understood; that is, it will remain ineffable. For even with a whole row of the largest imaginable computers to help, all the potential distributional potentialities of a whole national language cannot possibly be found in any finite time; and it is part of the scientific linguists’ contention that nothing less than the finding of the whole is any theoretic good.
Unless, therefore, some new technique can be developed, unless some fairly drastic abstraction can be made from the genuine linguistic facts so that a system can be created that a machine can handle and that has some precisely definable analytic scientific power, all the analytic linguists of the world will turn from truly linguistic linguistics back to Chomsky, Fodor and Katz (and now Weinreich), and they will be right. Here I think I should do something to make clearer what the nature of my criticism of the Chomsky school is and what it is not. My quarrel with them is not at all that they abstract from the facts. How could it be? For I myself am proposing in this paper a far more drastic abstraction from the facts. It is that they are abstracting from the wrong facts because they are abstracting from the syntactic facts, that is, from that very superficial and highly redundant part of language that children, aphasics, people in a hurry, and colloquial speakers always, quite rightly, drop. On the same level Chomsky wants to generate exactly the ‘sentences’ of English; and yet, to do so, he creates a grossly artificial unit of a ‘sentence’, that is,
founded on nothing less than that old logical body, the p and q of the predicate calculus. Similarly, Fodor, Katz (and Weinreich), when doing semantics, talk about ‘contexts’ and ‘features’ and ‘entries in dictionaries’; but their dictionaries are always imaginary idealised dictionaries, and their examples are always artificially contrived examples, and their problems about determining context always unreal problems. So, for me, in spite of its clean precision and its analytic elegance, I think this approach combines the wrong marriage of the concrete and the abstract. That this is so is now beginning to be operationally shown, in my view, in the appalling potential complexity that is about to be generated by keeping all the transformations in the calculus meaning-preserving, when the whole point of having grammatico-syntactic substitutions in a language at all is that precisely they aren’t meaning-preserving. And now that the elephant of an encyclopedic semantics is about to be hoisted on top of the tortoise of the already existent syntactic Chomsky universe, it seems to me that the whole hybrid structure is shortly about to topple with a considerable crash of its own weight. And this is a pity indeed; for the complications that have gathered obscure the whole very great potential usefulness of the original, simple, and above all elegant, analytic idea. In contrast with this elegance, see the crudeness but also the depth of what I now propose. I don’t have sentences at all; I have phrasings. 
And, granted also that in my first model I can only have small phrasings (see above) and that I can’t yet distinguish differences of stress-and-tune within them (see above) and that all my phrasings have to combine in pairs; that is, I can’t yet accommodate triplets (see above); and that the pairs of parsings have to be handled by a quatrain-finding device (see above) which is itself highly artificial and stylised (see also above), I can deal with stretches of language like Trim’s classic example:

/H-m ( )/ /Hm ( )/ /Hm ( )/

let alone

/Colourless green+ideas/ /sleep furiously/

which Chomsky can’t. Secondly, I analyse these phrasings, even in my first model (see above and below) by a coding device, which is philosophically derived not from the logic of predicates, but from the logic of terms. This means that with
fifty categorically changeable operators, two connectives and a bracketing-rule I can create a pidgin-language, the full structure of which can really be mechanically determined by the strict use of the scientific-linguistic methods of complementary distribution; that’s the cardinal point. Maybe the first such structure that I propose is a wrong one: nevertheless, I alone propose some such structure. Thirdly, even in my first model I make provision for the cardinal semantic-linguistic feature of anaphora, or synonym recapitulation. Granted that syntactic interconnection in this model withers to a vestigial shred of itself, the far more cardinal rhythmically based phenomena of reiteration, recapitulation and parallelism are centrally provided for. Likewise, with this coding the machine can write poetry and therefore handle metaphor; though actual output from this has not been shown yet. When it be considered, therefore, what, semantically, the CLRU semantic paragraph-model can do – as opposed to what, grammatico-syntactically, it can’t do – a very different and much more sophisticated view of its potentialities becomes possible. This model is crude, yet: but its ‘deep-structure’ unlike Chomsky’s deep structures to date, is really deep. And now it is necessary to show what that deep structure is. A preliminary remark: In judging it, it is necessary to remember its technological provenance. It is just here, that is, in the guidance given by technologies towards determining this structure, as it seems to me, that the severe discipline imposed on CLRU by sustained research in the technological fields of Machine Translation, Documentation Retrieval, Information Retrieval, and Mechanical Abstracting has stood us in good stead. For the prosecution of these goals lends a hard edge to thinking and an early cut-off to the generation of complexity in programming, which purely academic studies of language do not have.
It was this technological pressure that led us to Shillan’s practical Spoken English and the discovery of the phrasing, to the semantic utilisation of the two-beat prose rhythm, and to the quatrain-finding device and to the notion that there might be comparatively simple overall intonational contours to the paragraph. And it is this same technological pressure that has predecided for us what use we will make of all these stylised and streamlined phonetico-semantic units. We code them up into a crude but determinate ‘language’, and then, by giving this language vertebrae, as it were, that is, templates, we construct (or misconstruct) a paragraph’s semantic backbone, or alternatively, other parts of a text’s semantic skeleton. This is done by using the device of the ‘middle term’. The ‘middle term’ derives in idea – though not in use – from the syllogism as originally conceived by Aristotle, any syllogism being here considered not as an inference structure but as a text. Thus a syllogism, linguistically
interpreted, consists of three phrasings, which, between them, contain only three terms; and the differing forms of the syllogism are distinguished from one another by reference to the action of the middle term. Here, analogously, we make the machine make a unit consisting of two coded templates, the connection consisting of the recapitulation of one of their constituent terms. Thus if I code

/The girl         was+in+a             house/
/and the house    was+in+a             wood/
/and the wood     was+full+of          trees/
/and the trees    were+covered+with    leaves/ etc.

I get templates of the form

α MAN:      β IN/      PART:
α PART:     β IN/      WHERE:
α WHERE:    β HAVE/    PLANT:
α PLANT:    β HAVE/    POINT:
and the recapitulation-pattern is as follows:

A    B    C
C    B    D
D    E    F
F    E    G
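The lettering of such a recapitulation-pattern is mechanical, and a short sketch may make this plain. It is an illustration only: the function name `recapitulation_pattern` is hypothetical, templates are reduced to bare triples of terms, and the use of single capital letters limits the sketch to twenty-six distinct terms.

```python
from string import ascii_uppercase
from typing import Dict, List, Sequence


def recapitulation_pattern(templates: Sequence[Sequence[str]]) -> List[List[str]]:
    """Give each new term the next letter; a repeated term repeats its letter."""
    labels: Dict[str, str] = {}
    rows: List[List[str]] = []
    for template in templates:
        row = []
        for term in template:
            if term not in labels:
                # Assumption: at most 26 distinct terms in one stretch of text.
                labels[term] = ascii_uppercase[len(labels)]
            row.append(labels[term])
        rows.append(row)
    return rows


TEMPLATES = [
    ("MAN", "IN", "PART"),
    ("PART", "IN", "WHERE"),
    ("WHERE", "HAVE", "PLANT"),
    ("PLANT", "HAVE", "POINT"),
]

full_pattern = recapitulation_pattern(TEMPLATES)

# Dropping the pendant (middle position) leaves only the stressed terms.
stressed_pattern = recapitulation_pattern([(t[0], t[2]) for t in TEMPLATES])
```

Applied to the four templates above, the full pattern comes out as A B C, C B D, D E F, F E G, and the stressed-only pattern as A B, B C, C D, D E.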
If, then, we further simplify by matching only ‘stressed’ terms, that is, if we ignore as skeletally adventitious the recapitulation of the two pendants in the middle positions, we are left with what I believe to be one of the basic anaphora patterns of all language:

A    B
B    C
which, in the case of the syllogism, introduces the transitivity rule-carrying syllogism If A is B and B is C . . . It must be evident that, in terms of our model (and allowing coded pendants as well as coded stressed segments to match) we can have nine basic pairing patterns. Likewise, it will be evident that combinations of these can be permitted (e.g., see above), and that, the set of 50 elements of the system being strictly finite, the strict matching algorithm A matches with A can be relaxed to allow A to match with some subset of other elements or with any other element. If only elements in the first and third positions are allowed to match, we get four basic patterns, corresponding indeed to the four categorical forms. For this model – and allowing for the fact that what has to match are not, as above, the actual words of the phrasing but the terms in the coded templates – these are the four basic semantic patterns of language.
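The counting of pairing patterns can be checked with a small sketch. It rests on one reading of the passage, namely that a basic pairing pattern is a choice of one position in the first template and one in the second; the names `basic_patterns` and `matches`, and the optional `same` argument used to relax the strict rule that A matches only with A, are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

POSITIONS = ("first", "pendant", "second")
INDEX = {name: i for i, name in enumerate(POSITIONS)}


def basic_patterns(positions: Sequence[str] = POSITIONS) -> List[Tuple[str, str]]:
    """Every way of pairing a position in one template with one in the other."""
    return list(product(positions, repeat=2))


def matches(
    t1: Sequence[str],
    t2: Sequence[str],
    positions: Sequence[str] = POSITIONS,
    same: Callable[[str, str], bool] = lambda a, b: a == b,
) -> List[Tuple[str, str]]:
    """Position pairs at which a term of t1 is recapitulated in t2."""
    return [
        (p, q)
        for p, q in product(positions, repeat=2)
        if same(t1[INDEX[p]], t2[INDEX[q]])
    ]


STRESSED = ("first", "second")

girl = ("MAN", "IN", "PART")
house = ("PART", "IN", "WHERE")

all_matches = matches(girl, house)
stressed_matches = matches(girl, house, positions=STRESSED)
```

With all three positions allowed there are nine patterns, and with only the first and third positions there are four, which agrees with the figures given in the text. Passing a looser `same` function is one way to picture the relaxation of the matching rule; which terms should be allowed to match which is not specified here and is left open.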
5. The philosophical notion of the semantic square
It must be evident, from even cursory examination of the above, that a great deal of meta-fun can be had, by inserting a list of permitted pattern transformations into this model to produce approximations to various brute syntactic forms; or to account for ellipsis (which is only the same thing, after all, as complete unstressedness); or, better still, to make the machine infer ‘logical’ interconnections between various specifiable particular pairs of templates. This meta-fun we in CLRU do not as yet propose to allow ourselves to have. This is partly because having once broken right through in our thinking, to a conception of phonetic-semantic pattern that is independent of, because prior to, that of syntactic pattern, we do not want prematurely to reimprison ourselves within the patterns of syntax. It is also because we conceive our first duty to be to try to put the machine in a position to proceed from paired-phrasing-patterns to the overall semantic pattern of a paragraph: that is, not to find out what logically follows from what, but, far more primitively, what can follow what.

A    B    /I went to+the lake/
B    C    /and+the lake was frozen/

A    B    /Sylvia saw+him/
B    C    /Sylvia kissed+him/

A    B    /On+the one hand/
C    B    /On+the other hand/

A    B    /Is he coming?/
C    A    /Yes, he is./

Figure 36

To do this, we postulate a basic semantic pattern in language, namely, Guberina’s pattern of the ‘semantic square’ or ‘carré sémantique’. This also derives from an ‘Aristotelian’ device; but I have caused a great deal of obfuscation and confusion by stating, without further explanation and as though the fact were obvious, that it derives from Aristotle’s Square-of-Opposition. Psychologically, it does, and I have no doubt in my own mind that in Guberina’s case it did. But to see how it did it is necessary to keep a basic hold on three truths: Firstly, that the ‘Square of Opposition’ forms no part of syllogistic logic. Secondly, that it must be reinterpreted for this purpose as being a logico-linguistic schema, giving a pattern of semantic contrast between four pairs of four terms. Thirdly, that it must then be generalised so that it can be restated as a semantic hypothesis, as giving the basic overall pattern of semantic contrast within a primary standard paragraph. Thus the original Square of Opposition is a schema giving the valid forms of immediate inference between the four categorical forms: where

A is: All As are Bs
E is: No As are Bs
I is: Some As are Bs
O is: Some As are not Bs.

As is well known, when interpreted in terms of the logic of classes, or in terms of a logic of predicables, this schema runs into difficulties.
Figure 37 [Square with corners A, E, I and O; lines labelled ‘contrast’, ‘reiteration’ and ‘recapitulation’]
Interpret it now linguistically, that is, in terms of the four following actual phrasings:

A is: /All+As are Bs/
E is: /No+As are Bs/
I is: /Some+As are Bs/
O is: /Some+As are not Bs/
Now imagine other words in the stressed positions but keeping the semantic stress-pattern, so that realistic actual colloquial conversation results:
1st Speaker (a Scot): /All+Irish are crooks./
2nd Speaker (an Irishman): /No+Irish are crooks./
1st Speaker: [I don’t care what you say]. /Some Irish are+crooks./
2nd Speaker: [And I repeat.] /And some Irish are utterly+non-crooks. (i.e., the most honest characters alive.)/

Continue now the conversation in a realistic manner:

1st Speaker: [It comes to this.] /They’re either+angels or devils/ /Irishmen go+to extremes./
2nd Speaker: [Exactly.] /Some may+be utter+black+hearted+fiends/ /but others are absolutely angels+of light./
What on earth have we here? And in particular, what have we here if we reimagine this as a general standard of paragraph schema, that is, if we abstract from it by dropping the particular linguistic segments ‘All’, ‘No’, ‘Not’, ‘Some’? (For I am talking about the uses of these English words, not about logical quantifiers.) What we have is a pattern of diminishing semantic contrast, which is accentuated by the necessity of constantly repeating all the terms (or rather if, by using the model, the phrasings were replaced by coded templates, the terms would repeat). This pattern can be schematized as in Figure 37. If we restate this schema less semantically and more philosophically, we immediately get a semantic contrast-pattern reminiscent of dialectic (Figure 38).

Figure 38 [Square with corners (thesis 1) A, (antithesis 1) E, (thesis 2) I and (antithesis 2) O; lines labelled ‘contrast’, ‘modification’, ‘synthesis 1’ and ‘synthesis 2’]

However, if we impose an ordering on this (in order to construct a standard paragraph) we find (as can be seen from the example already given) that we cannot straightforwardly combine A and O to get synthesis 1 or E and I to get synthesis 2, for if we could, the paragraph would not progress:
1    Thesis 1        /All+Irish are crooks/ (A)
2    Antithesis 1    /No+Irish are crooks/ (E)
3    Thesis 2        /Some Irish are crooks/ (I)
4    Antithesis 2    /And some Irish+are utterly+non-crooks/ (O)
5    Synthesis 1     /The Irish go+to extremes:/
6                    /they’re either+angels or devils/
7    Synthesis 2     /Some may+be utter+black+hearted fiends/
8                    /but others are absolute+angels of light/
I should be hard put to it, using the CLRU model, to make a machine construct these two syntheses, depending as they both do on the vital notion of ‘extreme’, which recapitulates the earlier notion conveyed by ‘utterly’ in Antithesis 2, that is, it recapitulates just the part of Antithesis 2 that is not traditionally part of the proposition O.
I therefore headed this section ‘The Philosophical Notion of the Semantic Square’, thereby indicating that the Square of Opposition, thus linguistically reinterpreted, can only be used suggestively as a rough guide to fill in the semantic pattern of a standard paragraph. With this suggestion in mind, however, let us go back to the model and its four basic semantic patterns.

6. The semantic square: drawing the second diagonal
It will be seen from the account of the four primary semantic patterns as given by the model that not only intonation and stress, but also position are taken as being cardinal information-bearers in semantics (semantics in this being sharply contrastable with syntax). That is to say, if a semantic match is obtained between two elements, each in the first position of a template (and therefore each standing for the first stressed segment of a phrasing), a different semantic pattern is obtained from that which would result from a match between, say, the last two elements in the two templates. Temporal sequence in the one-dimensional flow of utterance is here projected onto spatial position in the two-dimensional model; and it is, more than any one thing, the semantic significance of stressed position in speech that is being studied. Therefore the linguistic reinterpretation of the Square of Opposition, as set out in the last section, ‘plays down’ the logical interrelations indicated by the names of the lines on the Square; it tunes them down to the very lower edge of the human being’s intuitionally perceptible threshold. But it ‘tunes up’, to a corresponding extent, the actual geometrical properties of a square, for example, the fact that a square has four corners, four equal sides, two equal diagonals. This raises the question: How on earth can the Square, consisting of the semantic deep contrast pattern of a standard paragraph, be interpreted as having the geometrical properties of an actual square? How, in particular, can it have four equal sides and two equal diagonals, given that in the model, as just stated above, one-dimensional speech flow is mapped onto a two-dimensional spatial frame? Part of the answer to this question is easy. The ‘points’ of the square are the stressed ‘humps’ of speech. Spoken language, even taken at its very crudest, is a string with nodes in it. 
Likewise, the equidistances between the points are temporal equidistances between these main stresses of speech – at any rate, in the stressed as opposed to the syllabic languages. So far, so good. The crunch comes in the question: What are these diagonals? To proceed with this, consider again what I asserted earlier possibly to be the primary overlap pattern of all language:
1    A     B1
2    B2    C1
3    C2    D1
4    D2    E

Figure 39

/The girl         lived+in+a           house/     GIRL     HOUSE
/and the house    was+in+a             wood/      HOUSE    WOOD
/and the wood     was+full+of          trees/     WOOD     TREES
/and the trees    were+covered+with    leaves/    TREES    LEAVES
Suppose now that we try to draw in more diagonals. We find at once that we can draw the diagonals B – C and C – D: for all we get by doing this is the two pairs of stressed elements that already occur in the second and third phrasings, and therefore we know in each case what the third connecting element is. If we abstract these two phrasings, moreover, we get quite a sensible pair of actual phrasings:

1    A     B1
2    B2    C1    /The House was+in+the wood/
3    C2    D1    /The wood was+full+of trees/
4    D2    E
The point is that we can’t, similarly, draw the other diagonals, that is, from A–C and from B–D because we would not know how to fill in the phrasings. (Remember, we are not now doing metamathematically based referential semantics; we cannot say that it ‘follows,’ by the Transitivity Principle, that if the girl was in the house and the house was in the wood, then the girl was in the wood.) For we precisely do not know whether, in the semantic universe of discourse that the utterance is creating, it does follow that when the girl was in the house she was also in the wood. On the contrary, we don’t know yet, but if you ask me for a guess, I should say it will not follow; if there were bears in the wood, then when the girl was safe in the house, with the door locked, she would jolly well not be any longer in the wood; though if there were also wizards in the wood, as there well might be, who could come through keyholes and vaporize themselves down chimneys, then even though she might be in the house and with the door locked, she would still be (in two more senses of the phrase) ‘not out of the wood’.
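The difference between a diagonal that can be drawn and one that cannot may be put as a sketch. It is an illustration under assumptions, not the CLRU program: each phrasing is reduced by hand to its pair of stressed terms, and the names `is_overlap_chain` and `draw_diagonal` are hypothetical. The essential point it shows is that a diagonal is drawable only when some actual phrasing fills it in; nothing is inferred by transitivity.

```python
from typing import List, Optional, Tuple

Phrasing = Tuple[str, Tuple[str, str]]  # (text, (first stressed, second stressed))

SEQUENCE: List[Phrasing] = [
    ("/The girl lived+in+a house/", ("GIRL", "HOUSE")),
    ("/and the house was+in+a wood/", ("HOUSE", "WOOD")),
    ("/and the wood was+full+of trees/", ("WOOD", "TREES")),
    ("/and the trees were+covered+with leaves/", ("TREES", "LEAVES")),
]


def is_overlap_chain(sequence: List[Phrasing]) -> bool:
    """True when each phrasing opens with the term that closed the one before."""
    pairs = [stressed for _, stressed in sequence]
    return all(prev[1] == nxt[0] for prev, nxt in zip(pairs, pairs[1:]))


def draw_diagonal(start: str, end: str, sequence: List[Phrasing]) -> Optional[str]:
    """Return the phrasing that fills in the line from start to end, if any.

    No transitivity is applied: GIRL-HOUSE and HOUSE-WOOD do not yield GIRL-WOOD.
    """
    for text, stressed in sequence:
        if stressed == (start, end):
            return text
    return None


house_wood = draw_diagonal("HOUSE", "WOOD", SEQUENCE)
girl_wood = draw_diagonal("GIRL", "WOOD", SEQUENCE)
```

Here the diagonals HOUSE to WOOD and WOOD to TREES return the second and third phrasings, while GIRL to WOOD and HOUSE to TREES return nothing, since no phrasing in the sequence says how they are to be filled in.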
On the other hand, no one is contending that this primary semantic pattern gives us a piece of paragraph; on the contrary, it does not even give us adult discourse. We get therefore to this thought: perhaps the semantic criterion of the existence of a paragraph – as opposed to any other indefinitely long sequence of phrasings – precisely is that in a paragraph we become able to draw the second diagonal. Consider this girl in this wood again. If we compress the sequence not in a syntactic way, by using pronouns, but by using the semantic algorithm that I have just given, which selects the second and third from the sequence of four phrasings,[1] if we do this, we get information about the wood, but we have forgotten the girl. Continue the sequence, however: would it not be very likely to continue (e.g.):

/The girl was a beauty/
/Her beauty was dazzling/
/Dazzling even the very+birds+and+animals/
/For the very+birds+and+animals knew the girl/
/That+the girl was a disguised+princess/

If now we try to draw the second diagonal, namely, from the first A to the final element which stands for /disguised+princess/, note that we can; for, applying the algorithm, we shall get out, as a result of this, the final, vital, phrasing (which, note, is also the only phrasing that breaks the monotonous ding-dong pattern of the sequence) which says that the girl was really a disguised princess. And now the sequence of phrasings looks much more like a paragraph. So we postulate: finding the paragraph is drawing the second diagonal.

[1] Note that to make an intuitively acceptable ‘abstract’ of the sequence, we really want the second and fourth phrasings: to get /the house was in the wood/:/the trees were covered with leaves/, i.e., we have to make use of more intonational features.

7. The two-phrasing paragraph and the notion of permitted couples

Thoughts of this kind led to the further thought: would it be possible, using the model, to define a minimal paragraph, that is, a paragraph consisting of only two phrasings, within which the machine could discern whether or not there was a semantic square? Only two types of candidate for such a paragraph intuitively presented themselves:

a. the 2-phrasing double predicate: (Guberina’s example)

/Mary milked the cows/ /John did the goats/

α MAN:    β CAUSE/    χ BEAST:
α MAN:    β DONE/     χ BEAST:
b. the one-phrase question followed by a one-phrase answer (as in an imaginary linguistically condensed Automobile Association phrase-book). Using the model to do this experiment, we coded into templates eight short questions and eight short answers. The machine, by doing a semantic match, was required to pair these up so as to produce intelligible discourse, and succeeded in doing so, with the exception that the question and answer /What is the time?/ /Early next week./ could not be eliminated. In addition to the primary term-anaphora indicated by the match, however, the machine was permitted to discover a secondary semantic connection. To make this, it first formed permitted couples of all the individual templates, and then looked for other occurrences of these couples as between templates. For this experiment all permitted couples were taken to be commutative (though the interlingua used for it permitted a term with a slash (a/) to occur in any one position in a template). Using permitted-coupling on the four primary patterns, it is easy to see that this device greatly increases their semantic interconnectivity, as under:
terms A B and B C: permitted couples AB BC
terms A B C: permitted couples AB AC
terms A B and C B: permitted couples AB CB cv BC
terms A B and C A: permitted couples AB cv BA CA cv AC
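The permitted-couple device described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the CLRU program: templates are taken to be plain tuples of term names, a permitted couple is assumed to be any unordered pair of distinct terms occurring together in one template (unordered because the couples were taken to be commutative), and a secondary connection is assumed to hold when a couple has one term in each of two templates. The function names and the example terms D and E are hypothetical.

```python
from itertools import combinations


def permitted_couples(templates):
    """Collect every unordered pair of distinct terms that co-occur in one template."""
    couples = set()
    for template in templates:
        for first, second in combinations(template, 2):
            if first != second:
                # frozenset makes the couple commutative: AB is the same as BA
                couples.add(frozenset((first, second)))
    return couples


def secondary_connections(left, right, couples):
    """Return the permitted couples that bridge two templates (one term in each)."""
    found = set()
    for couple in couples:
        x, y = tuple(couple)
        if (x in left and y in right) or (y in left and x in right):
            found.add(couple)
    return found


def show(couples):
    """Render couples as sorted two-letter strings for display."""
    return sorted("".join(sorted(couple)) for couple in couples)


# Couples formed from the templates with terms A B and B C
couples = permitted_couples([("A", "B"), ("B", "C")])
print(show(couples))  # ['AB', 'BC']

# Hypothetical question and answer templates sharing no term directly
question = ("A", "D")
answer = ("E", "B")
links = secondary_connections(question, answer, couples)
print(show(links))  # ['AB']
```

In this sketch the question and answer have no term in common, so a primary match fails, yet the couple AB still links them, which is the sense in which coupling increases interconnectivity.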
If we turn back to our philosophic notion of the Semantic Square for a moment, we see that the notion of permitted couple is standing in both for the notion of minimal semantic contrast and also for those of reiteration and recapitulation. For in this program to construct a micro-paragraph, the A, the 1, and the 0 are to be interpreted only as single terms, each term standing for one single stressed segment. Synonym, or anaphora, is indicated by point-name equality: in the primary semantic pattern E = 1, in the second A = 1, in the third E = 0, and in the fourth A = 0. So it is no wonder that the dialectic pattern vanishes. On one linguistic phenomenon it threw considerable light, namely, on the use of the set of English verbs known as ‘anomalous finites’. For these are now seen, in at least one of their properties, as micro-paragraph formers: they enable the machine to construct the left-hand diagonal.
/Are you Coming?/ /Yes, I am./
Be/ MAN: To: THIS R: MAN: BE/
The squaring of an element in a template, as in THIS R, indicates a rule, R, of matching-relaxation operating with regard to it. In this case the rule is: match with any right hand element of any template (i.e., draw the right diagonal) with regard to which a left-diagonal match has been already achieved.
8. Schema of the CLRU semantic model
I conclude by giving a schema of the CLRU semantic model to show that this is a model that, in principle, is mechanisable. One variant of it is in process of being mechanised by Wilks.
Editor’s note: this paper was submitted originally as a project report for a US Government grant and was accompanied by very long appendices that more than doubled the length of the paper, but are not now worth reprinting since the general idea of their contents is clear from the body of the paper itself.
Editor’s commentary
This paper is full of ideas that were developed later by others. It is clear that, somewhere in the notion of matching similar phrasings, is the germ of what later was to be called EBMT or example-based translation (Nagao 1989), which is now perhaps the most productive current approach to MT world-wide, and I have heard Professor Nagao refer to MMB in this connection in a lecture. There is also the notion of skeletal gists of phrases or messages, made up from mapping interlingual items into templates and which formed the core of my own later work in the 1960s and 70s (under the name Preference Semantics (Wilks 1975a, b)), and I cannot now be certain exactly what MMB was contributing at this stage and what I was adding, as I had a strong hand in the original writing and editing of this paper. This notion re-emerged in the work of a number of people in the ‘Yale school’ of Schank (1975). There is also a strong emphasis here on the stereotypy of phrasings. Becker (1975) later did an influential analysis of the stereotypy of dialogue, and one can see how much of this work was waiting to be quantitative in terms of a full corpus analysis: ‘waiting for a future computer’ as MMB plaintively notes in one paper in this collection. Here, as throughout the papers in this book, is an emphasis, almost an obsession, with real language, attested in speech and writing, and as opposed to artificial examples. This battle has now been won, but MMB saw the point exactly, but had no appropriate tools.
Less successful were MMB’s attempts to link dialogue forms with classical Greek rhetoric, and above all the notion that the phrasings (which defined message units) were to be defined intonationally, an idea MMB believed was implicit in the work of Guberina (this volume, chapter 9) and David Shillan’s idea of using the phrasing as a basic notion in language teaching. It also fitted, too conveniently one might say, with MMB’s strong interest in rhythmical psalms and plainsong. As often in her work, she took characteristics of religious language and projected them onto ordinary language, in her desire to show the continuity of the two. However, the notion that the phrase has a crucial role, as a segmental unit for analysis in syntax or for lexical entries, is now widely accepted, both theoretically and in practical MT systems. The primacy of the spoken language is almost universally conceded, and all speakers have to breathe, yet no one has, to my knowledge, located an
empirical connection between the breathing points in discourse, which often correspond to written punctuation, and the points in the sentence carrying the most meaningful items, or focus points, which is what she wanted to claim. On the other hand, there are remarkable results such as that up to 90% of syntactic disambiguation can be predicted on phonemic discontinuities in the spoken sentence stream. Whether or not that confirms some of what MMB was trying to say may be material for further debate. I am also sure that her notion of ‘permitted couples’ in the ‘semantic square’ (which she derived from Guberina) is reaching outside all conventional notions of logic towards contemporary themes like ‘relevance logic’, which she was almost certainly unaware of, in the work of Belnap (1960).
Part 5
Metaphor, analogy and the philosophy of science
11
Braithwaite and Kuhn: Analogy-Clusters within and without Hypothetico-Deductive Systems in Science
1. Current relativist conceptions of science depend widely, though vaguely, upon the insights of T. S. Kuhn (1962), and, in particular, upon his notion of a paradigm. This notion is being used by relativists to support the contention that, since scientific theory is paradigm-founded, and therefore context-based, there can be no one discernible process of scientific verification. However, as I have shown in an earlier paper (1970a), there is another, more exact conception of a Kuhnian paradigm to be considered: namely, that conception of it which says that it is either an analogically used artefact, or even sometimes an actual ‘crude analogy’, that is, an analogical figure of speech expressed in a string of words. This alternative conception of paradigm, far from supporting a verification-deprived conception of science (which, for those of us philosophers who are also trying to do technological science, just seems a conception of science totally divorced from scientific reality) can, on the contrary, be used to enrich and amplify the most strictly verification-based philosophy of science that is known, namely the Braithwaitean conception of it as a verifiable hypothetico-deductive (H-D) system. For such a paradigm, even though, in unselfconscious scientific thinking, it is usually a crude and concrete conceptual structure, can yet be shown to yield a set of abstract attributes. These can provide ‘points’, or ‘nodes’, or other more complex units, on to which some even more abstract H-D system can then, like a mathematical envelope, be ‘hung’, after which the power of the mathematics can ‘take off on its own’. And, although Braithwaite’s whole account of science is, I think, over-simplified, yet he is right in showing how, from that point on, the mathematics can be used. 
For it can indeed be used progressively to fit on to, and to test features of, a second concrete, but also verifiable and operational, B-component (the concrete analogy that started everything off being the A-component); and, even in comparatively undeveloped science, let alone in advanced science, this kind of verification really does occur. But for the Braithwaitean H-D model to be realistic, the crude analogy, the A-component, has got to be there also;
because its function is either to guide, and orient, the subsequent mathematical (or mechanical) development when this development has occurred, or, predictively speaking, to do instead of it when it has not yet occurred – for theory-making comes at quite a late stage, in real science. This point needs amplifying. In real science, as Norman Campbell, who knew about it, long ago said (1920), there is nearly always far too much mathematical ‘play’ in any mathematics that is powerful enough to be used for scientific development. The mathematics produces infinities; it fails to produce what you want; the complexities, which were required to make it fit on to the original analogical insight in the first place, later on, when inappropriate theorems have to be brought to bear, make it altogether out of hand, so that it generates nothing you can any longer recognise. Its excesses and rigidities have to be tailored; moreover, operationally valid fudges (which make it fit the B-component) have to be organised. Moreover again, it may become necessary to shift from one mathematical system to another, in order, predictively and in the end, to get anywhere. And by what other system of predictive ideas can the mathematically predictive system of ideas itself be tailored? And, if it is necessary to do mathematical shifting, what guides the shift? Only the original (Kuhnian) analogical insight that started the whole enterprise off in the first place; for in abstract deductive science as it is actually done, the second kind of fitting, which permits the verification, normally only comes into action right at the end. So you cannot work backwards, in orienting mathematical development. If the whole enterprise is to be predictive – and prediction is the object of it – you have got to work forwards; and there is nothing else to work forwards from, except the original crude paradigm.
That this evident fact has not been seen by philosophers of science is due, I think, to many diverse causes. One cause is that epistemologists and logicians (from the ranks of which philosophers of science are normally recruited) have relegated the study of analogy to the English Department. Another cause is that (speaking broadly) analogy is used to form scientific paradigms in a way converse to that in which it is used in poetry – a point to which I shall return – so that nothing that is said about it in English Departments is likely to be serviceable to the philosophy of science. And the third cause was that, whereas the special skills of analytic philosophy were precise conceptual analysis and/or conceptual model-making (which in the older philosophy used to be called ‘rational reconstruction’) no technique was available for applying these skills to analogy; so that it would have seemed plain unintelligible to say that an analogy could form a predictive structure onto which a mathematical or mechanical system could then subsequently be ‘hung’.
In my view, the missing technique has now become available. And therefore, in this second paper on the nature of a Kuhnian paradigm, I propose to use the technique to make a model of analogy, thus modelling that primitive modelling activity of science that still persists, even when the vehicle for it is not a three-dimensional artefact, or even a two-dimensional schema, or ‘picture’ or diagram, but only a one-dimensional stretch of natural language. To make this model I shall indeed have to cross the academic disciplines, and this will cause problems, in that I will be accused of no longer doing philosophy. But I shall not be retreating into literary criticism. On the contrary, I shall be offending literary critics by manipulating natural language with an imaginary computer, and also, probably, offending philosophers by saying that to do this philosophically requires making three adjustments to the current customary conception of philosophy. (And yet, is it not high time that we widened philosophy?) The three adjustments that I need to make are the following: 1. I need to extend the current sense of ‘deduction’ so as to be able to say ‘The sentence S2 is computable from the sentence S1 in the coded language L’, as a philosophic replacement for ‘The formula F2 is deducible from the axioms A1, A2 . . . An in the axiomatised calculus C’. 2. Secondly, I need to persuade philosophers of science, and notably Hesse (1974), to reverse the whole direction of approach they now take when they are wishing to incorporate a conception of analogy within the contemporary universe of discourse of the philosophy of science. For they, in order to remain academically ‘within the literature’, always water down the real characteristics of language when talking of analogy, as the earlier logicians did when talking of Moore’s technique (cf. Masterman 1961), in order to keep just so much of the phenomenon of analogy, though by free association, to be included in it also. 
But in fact, as I think my model conclusively shows, if you take either natural language or analogy seriously, you cannot do this. 3. The third adjustment is to the currently fashionable conception of the nature of language as it is referred to from within philosophy, and it consists in saying that Hacking’s (1975) emasculatory trail, from the ‘Heyday of Ideas’ through the ‘Heyday of Meanings’ to the far more superficial, though more systematic, ‘Heyday of Sentences’, has got to be retrod. Analogy, like metaphor, is the superimposition of one framework of ideas upon another; so, to analyse it, you have to have a model that, in an unashamedly seventeenth-century manner, though with a new gloss, deals in ideas. For the seventeenth-century philosophers were operationally right in their inherited belief that (in some sense)
ideas ‘lay behind’ words. But they were operationally wrong in their conviction that these same ideas, which, as they themselves admitted, formed the root and basis of public language, were ascertainable only by private introspection, with no public provenance. What has happened, in the passage from the Heyday of Ideas to the Heyday of Sentences, is that philosophers with an exaggerated reverence for mechanism have tried at all costs to find something in language to mechanise. Grammatical transformation (Chomsky), propositional connection (Russell et al.), verificational systematisation as between factsentences and first-order or other predicative sentences (Tarski, Quine, Montague, Davidson), systematisation of speech-acts (Austin to the future through Searle): they have all done it. What all these philosophers have forgotten, when calling the resulting systematisation ‘a language’, was that all the rest of what was really there in language – and all that really matters about it, once you are no longer doing logic – was still being fully and efficiently processed by them themselves, intuitively, subliminally, nonconsciously. But now, in the computer world of word processing, we put real language into a real machine; and this machine really is an inert mechanism: it has no sublimen. And the result of this, of course, is that all the semantically shifting layered and interlacing depths of language – all the most Coleridge-like features of this frightening and volatile phenomenon of human talk, the very foundation of thinking – are now progressively coming out into the light. How many already identified philosophical problems will have to be solved, using the new methods, before those who refuse to admit to their existence become castigated (probably in future works of Hacking) as ‘know-nothing’ philosophers (1975, p. 163) is anybody’s guess. Here, in a first whiff from the new world, I try to apply them to analyse analogy. 2. 
The analogical use of language, on this model, becomes only a special case of the normal use of all language; that is, when ‘the normal use of all language’ is interpreted operationally, not philosophically. For language, when coded and dictionary-matched on a machine, does not turn out to consist of isolated, single-meaning sentences. Real language consists of a reiterative semantic flow, the total sequence of the units of which combine to form a text. These units are, predominantly, seven (plus-or-minus) word phrases, which are often coded within nested brackets as being lists, for on this model, the primitive unit of knowledge is an item on a list (the unit ‘booked’ in a ‘book’ – a book was originally a list: Shorter Oxford English Dictionary 3rd edn, 1977, p. 217), not a sentence. Each such item, or phrase, is indeed built from a sequence of stressed or unstressed words. But, in the general model, though not in all particular ones, every single word in every such sequence has both multiple syntactic use (called
homographs) and also, when considered semantically, a whole string, if not an actual whole structure, of multiple meanings, consisting of the ways in which it is used. Thus it is the whole coded phrase, or list item, that is sometimes – though by no means always – reminiscent of a logician’s term. A coded word, at its simplest, is a whole partially ordered set on its own, a fan: a totally different animal. 2.1. The model-maker’s philosophic first question is: How are the fans of word-uses to be coded, within an imagined coded language, L? And the answer is that they have to be coded by mapping each of them onto a system the units of which consist of the most-talked-of semantic-areas of the language. Since these areas, which are wide, are also themselves structured, to include all possible aspects, syntactic, referential and analogical, under which any subject of talk can be talked about, there is only one practical way of codifying them, and that is to assign to each area the name of some very general idea. Moreover, if the area system is to be converted (computationally) into a knowledge structure, (1) the aspect markers, which recur from topic to topic, will be even more abstract than the area codings; (2) to separate the analogical references from one another, the semantic areas will have to be cross-referenced; (3) the main overall inclusion relations within and between the semantic areas will themselves have to be labelled with classifying labels so that the machine can generate inference schemata. And from all this it becomes evident that such a knowledge structure, though it remains a mappabile of often concrete individual word uses, will have no lack, within itself, of abstract attributions; the names of these also being word uses, though sometimes specially constructed word uses (for, seen fan-wise, the model bends back upon itself). 
To discuss the complexities and handling problems of such a model, and of what happens when some attribute in it goes ‘above the meaning line’, all this would take us way out of philosophy. The point here is that (to the extent that any selection of semantic areas that is used for processing is really taken from among the most-talked-of subjects within any language) these semantic areas, though unobservable, have objective validity. Thus, in the twentieth century, ideas reappear as the objects of content; but this time round they are publicly inferable unobservables, rather than privately introspected entities. If we are to be simple, and thus philosophical, we must go right back in our minds, via the nineteenth-century Royal Society Library, to the seventeenth century itself and imagine ourselves restructuring Roget’s Thesaurus (1962) in a manner reminiscent of Bishop Wilkins’ Character Universalis (1668). Not that simple, you will say; but now consider, using Roget, the actual fan of uses there given for the notoriously ambiguous English word ‘bank’ (see Figure 40).
Figure 40. Fan of uses of ‘BANK’ (Roget numbers and entries, each followed by the head under which it falls):
209 highland: HEIGHT
218 seat: SUPPORT
220 acclivity; 220 be oblique [verb]: OBLIQUITY
234 edge: EDGE
239 laterality: LATERALITY
344 shore: LAND
632 storage; 632 storage [verb]: STORE
798 treasurer: TREASURER
799 treasury: TREASURY
In Figure 40, the word ‘bank’ is at the apex, and the names and numbers of the subjects-which-it-is-most-used-to-talk-about (call them heads, though they are actually neo-Wittgensteinian ‘families’) separate out the uses of the word and so form the spokes of the fan. If you turn the figure upside down, then a whole sequence of text is produced consisting only of a string of overt ideas, with the word ‘bank’ then becoming the unobservable connective between them. To enable this inversion to be performed, when required, it is advisable to extend the fan into a lattice, of which the I-element will be the total dictionary entry of the word ‘bank’, and the O-element anything that may be in common between its uses. Minimally, this common element will be the bare fact that all these subjects-of-discourse, in English, are referable to by making use of the same word ‘bank’; but sometimes also there is some discernible common element of meaning between all of them or some of them, as there is, for instance, between the three uses of ‘bank’ which refer to store, treasurer and treasury. 2.2. The question now arises as to how to make the machine detect, for operational purposes, the meaning of ‘bank’ that is actually used in any given text. The human being will at once answer, ‘By embedding it in a phrase’: ‘in the savings bank’, ‘up a steep bank’, what have you. This answer is correct, but for the difficulty produced by the fact that ‘up’, ‘savings’ and ‘steep’ will all themselves have to be coded into fans of uses; and thus will all of them also be ambiguous. The simplest algorithmic answer to this problem is to make the constituent fans of a phrase pare down each other by retaining only the spokes with ideas that occur in each. The application of this algorithm retains the area OBLIQUITY 220 in ‘steep’ and ‘bank’; whereas it retains as common between ‘savings’ and ‘bank’ both of the two areas STORE 632 and TREASURY 799.
Note also that the general application of this algorithm and no other produces a reiterative model of the foundations of language, the main current application of which is to coordinate indexing in libraries (see Figure 41). 2.3. However, when applied to the full variegation of natural language, reiterative meaning-specification is not as simple or one-staged as this. For instance, if we had said ‘grassy bank’ instead of ‘steep bank’, we should have had to use Roget’s (erratic) cross-reference system in order to get back from ‘grassy’, via PLAIN 348, to LAND 344, which then intersects with ‘bank’ at SHORE. (And, in a better structured knowledge system, proceeding reincarnationally from Wilkins to Wilks (1975), we should operate with a layer of aspect markers, in a manner reminiscent of the seventeenth-century Logic of Predicables, to detect that, whereas banks are regions of land that are often COVER-ABLE, grass is a plant that often acts as a COVER-ING.)
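The paring-down algorithm of 2.2 can be sketched in a few lines of Python. This is an illustrative sketch only: the fan for ‘bank’ follows Figure 40, but the fans for ‘steep’ and ‘savings’ are assumptions, trimmed to the heads that the text reports as surviving, and the names `pare_down`, `BANK`, `STEEP` and `SAVINGS` are hypothetical.

```python
# Fan for 'bank' as listed in Figure 40 (Roget head number -> head name).
BANK = {
    209: "HEIGHT", 218: "SUPPORT", 220: "OBLIQUITY", 234: "EDGE",
    239: "LATERALITY", 344: "LAND", 632: "STORE",
    798: "TREASURER", 799: "TREASURY",
}

# Assumed fans, reduced to the heads the text says are retained.
STEEP = {220: "OBLIQUITY"}
SAVINGS = {632: "STORE", 799: "TREASURY"}


def pare_down(*fans):
    """Keep only the spokes (heads) that occur in every fan of the phrase."""
    common = set(fans[0])
    for fan in fans[1:]:
        common &= set(fan)
    return {number: fans[0][number] for number in sorted(common)}


print(pare_down(STEEP, BANK))    # {220: 'OBLIQUITY'}
print(pare_down(SAVINGS, BANK))  # {632: 'STORE', 799: 'TREASURY'}
```

When two fans share no head at all, as the text reports for ‘grassy’ and ‘bank’, the function returns an empty fan; that is the case in which the cross-reference system has to be consulted, which this sketch does not attempt.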
Figure 41. Idea-structure of ((UP) (THE (STEEP (BANK)))):
((HEIGHT 209) (THE (HEIGHT 209 OBLIQUITY 220 (HEIGHT 209 OBLIQUITY 220))))
‘up’: HEIGHT 209
‘the’
‘steep’, ‘rising’: HEIGHT 209, OBLIQUITY 220
‘bank’, ‘slope’ &c.: HEIGHT 209, OBLIQUITY 220
209 ∪ 220 (209 HEIGHT, 220 OBLIQUITY): ‘steep’ ‘bank’ ‘slope’ ‘hill’ ‘rise’ ‘rising’
There are other cases. In bad metaphysical writing, for instance, there are far too many spokes of all the fans remaining (Figure 42). 2.4. In imaginative writing, however, in which not every sentence is a cliché, the writer can often deliberately avoid any overlap at all: ‘It was an unbelievable restaurant; there were no tables, no dining-room, no waiters, no kitchen, no food; only pills and an open space in which to do physical exercises.’ Here ‘restaurant’ really means ‘anti-restaurant’, though with a difference: and so there will never be a semantic overlap, whatever you do, between the fan for ‘unbelievable’ and the fan for ‘restaurant’. Here the solution is, as the reader will already have guessed, to intersect ‘restaurant’
Figure 41 (cont.). Idea-structure of ((IN) (THE (SAVINGS (BANK)))):
((STORE 632) (THE (STORE 632 TREASURY 799 (STORE 632 TREASURY 799))))
‘in’: STORE 632
‘the’
‘savings’, ‘safe’: STORE 632, TREASURY 799
‘bank’, ‘depository’ &c.: STORE 632, TREASURY 799
632 ∪ 799 (632 STORE, 799 TREASURY); 632 ∩ 799: ‘safe deposit’ ‘depository’ ‘treasure-house’ ‘bank’ ‘safe’ ‘coffer’ ‘money-box’ ‘money-tag’ ‘thesaurus’
with ‘dining-room’, ‘waiters’, ‘kitchen’ and ‘food’, and ‘unbelievable’ with the reiterations of ‘no’. But to do that the machine has to be provided with a semantic pattern; and this pattern has to have both intersection specifications and also four, six, eight or more numbered positions (see Figure 43). This last device, of employing a reiterative semantic pattern and then filling in the holes in it, is what we use when we model analogy. For
Figure 42. Idea-structure of ((REAL (TRUTH)) ((IS (SUBSTANTIAL (INTRINSICALITY))))), in which the heads 1 EXISTENCE, 3 SUBSTANTIALITY ([substance], [intrinsically], [reality]), 5 INTRINSICALITY and 494 TRUTH recur repeatedly.
((IT (WAS)) (AN (UNBELIEVABLE (RESTAURANT)));
(THERE (WERE)) (NO (TABLES)),
(NO (DINING-ROOM)),
(NO (WAITERS)),
(NO (KITCHEN)),
(NO (FOOD));
(ONLY (PILLS))
(AND (AN (OPEN (SPACE))
(IN (WHICH) (TO + DO) (PHYSICAL (EXERCISES)).)
Figure 43
when we draw an analogy, we are proposing to open up a new much-to-be-talked-about semantic area, rather than just drawing upon old ones; and this we can only do by filling in already known positions, in known patterns, in new ways (see Figure 44). Here, using Roget, ‘man’ by no means directly intersects with ‘wolf ’. A ‘man’, for Roget, was by no means a wolf: far from it. Nevertheless, when we begin to fill in the pattern, connections begin to display themselves which then allow the patterning to ‘force the analogy’ (see Figure 45). 2.5. Since analogy drawing, on this model, and as I earlier said, is only one very normal instance of the way in which all language works, the two ways to develop analogy are, semantically speaking, the same two basic ways in which all semantic flow develops down a page. And note that, in each case, we finish up with a structure. The first way to extend analogy is by analogical pile-up, the result of which is to form an analogy cluster. This is done by simply drawing on
Figure 44. [Diagram not recoverable from the extracted text. Surviving labels: the heads 88 ONE, 486 DOUBT, 864 WONDER, 103 ZERO, 192 ABODE, 194 RECEPTACLE, 301 FOOD, 658 MEDICINE, 201 INTERVAL and 534 EDUCATION, with subscripted markers (N; H,S,L; H,R; Sp,L), two of them annotated ‘too many entries: see Roget’; and a pattern of numbered positions A1 to A5 and B1 to B5, with (AB)1, (AB)2, ([A] [B])3 and ([A] [B])4, labelled ‘primary semantic flow’, ‘reversed recapitulatory flow’ and ‘tertiary flow’.]
Figure 45. [Diagram not recoverable from the extracted text. Surviving labels: the fans, given as columns of Roget head numbers, for MAN, HUMAN ∪ HUMAN + NATURE ∪ HUMANITY, IS + LIKE, IS, WOLF ∪ WOLFISH and CRUEL; the pattern names BASIC REITERATIVE PATTERN [7-FOLD PARISON], AUXILIARY REITERATIVE PATTERN [AUXESIS] and SECONDARY REITERATIVE PATTERN [DOUBLE COGNATE]; and the coded positions (MAN), (IS + LIKE), (HUMAN + NATURE), (IS), (CRUEL) and (A (WOLF)), filled with the heads MANKIND 371, ANIMAL 365, VIOLENCE 176 and MALEVOLENCE 898 and the markers A,M,H; A,M; Comp.; Gen, Att, H; Gen, Att, A.]
more synonyms, from Roget, to extend your text; an operation that, in principle, is easy to mechanise, since Roget’s semantic areas are themselves, predominantly, instances of such pile-up. The second way to extend analogy is to develop it. By complicatedly adding to Roget a set of structured knowledge-glossaries, and then using an inference-schema to pass to and fro between them, we can, again in principle, ‘develop the analogy in all its aspects’. For instance, you can match a glossary of the parts of the body of a wolf with those of a man, provided that somebody has remembered to store the two glossaries in the machine in the first place. The actual matching algorithm, however, is not difficult to automatise; and it is the possibility of algorithmic aspect development that makes of analogy a conceptually predictive (and therefore a scientifically predictive) vehicle. Such development also makes of it a verificational vehicle, whereas analogical pile-up, on the contrary, tends to make the aspect attributions progressively more abstract and metaphysical. For, given that the whole operation of the model has already given the analogy an underlying context by embedding it in a text, it is not unreasonable to suppose that any sentence that can be constructed, as an aspect development of some original analogy, can be judged, in a two-valued and neo-Keynesian (Keynes 1921, ch. 18) manner, as either contributing to the total positive analogy or to the total negative analogy, thus providing a primitive mechanism, even at the early stage when there is no other mechanism, for confirming or disconfirming a Kuhnian paradigm. 3. Of course, developing an analogy that will be usable in science is by no means as simple as comparing a man with a wolf; and of course the ways in which analogy is used in science are by no means as easy to model as an analogy that can be developed by forming a single text down a page. 
In order to gain more knowledge by example of how analogy was actually used on scientific occasions, I devised a schema representing analogy development which, by courtesy, could have been called a two-person game. The game was between the proponents of the analogy, who kept trying to add to the positive analogy, and the opponents of the analogy, who kept trying to add to the negative analogy, in order to ‘shoot the whole analogy down in flames’. There were three sorts of move that could be made by either side: (1) posing an analogy, or composing an analogy cluster, the cluster being produced by constructing pile-up; (2) clicking an analogy; (3) cannoning, an end-game strategy performed by superimposing whole analogy clusters, and of which further explanation is produced below. The moving system was as in billiards, with the proponents of the analogy having the first move, and going on composing and clicking until no more positive analogy could be produced. The opponents, for their move, produced and developed negative
analogy, and also produced tests ‘to test the strength of the analogy’, which, if the tests failed, would bring the whole analogy down. There was a scoring system: every positive click scored +2; every negative click scored −2; and if the score became negative, the game was provisionally lost. However, it could be retrieved again, by successful cannoning. For cannoning was the bringing to bear, on the original concrete analogy, of a whole new analogy cluster, with a whole new mathematical or other technique attached; and a successful cannon, which could be achieved by either player, doubled either the positive or the negative score. I played the game on a simplified form of the kinetic theory of gases, taken from the historical chapter of Sir James Jeans’ book (1921), and, in more detail, on the theory of continental drift. The composed analogies, in the first case, were those of the Democritean atom, of a set of rubber billiard balls bouncing in a box, of billiard balls ricocheting on a two-dimensional billiard table, at first finite, then infinite, and finally, of a set of interacting points. From that stage on, the mathematics of the theory took over; but it was interesting to see, in Van der Waals’ correction, that the original concrete analogy had to be reasserted, and the points reconverted, in imagination, back into balls. But the final cannon, which established the theory and won the game, only occurred quite late on, with the discovery of Brownian movement. Until then, there were far more opponents of the theory, producing negative analogy, than we now realise. In the second case, that of continental drift, and where there were no mathematics, far more weight was thrown onto the actual game.
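The scoring system of the game can be made concrete with a short Python sketch. It rests on assumptions that the text does not settle: positive and negative clicks are kept as two separate totals of 2 points each, the running score is their difference (so that a negative click counts 2 against the analogy), and a successful cannon doubles the total of the side that made it. The class name `AnalogyGame` and its methods are hypothetical.

```python
from dataclasses import dataclass

CLICK_VALUE = 2  # every click is worth 2 points in the scoring system


@dataclass
class AnalogyGame:
    positive: int = 0  # total built up by the proponents
    negative: int = 0  # total built up by the opponents

    def click(self, side: str) -> None:
        """Record one click of positive or negative analogy."""
        if side == "proponent":
            self.positive += CLICK_VALUE
        elif side == "opponent":
            self.negative += CLICK_VALUE
        else:
            raise ValueError(f"unknown side: {side}")

    def cannon(self, side: str) -> None:
        """A successful cannon doubles that side's total (assumed reading)."""
        if side == "proponent":
            self.positive *= 2
        elif side == "opponent":
            self.negative *= 2
        else:
            raise ValueError(f"unknown side: {side}")

    @property
    def score(self) -> int:
        return self.positive - self.negative

    @property
    def provisionally_lost(self) -> bool:
        return self.score < 0


game = AnalogyGame()
for _ in range(3):
    game.click("proponent")
for _ in range(4):
    game.click("opponent")
print(game.score, game.provisionally_lost)  # -2 True

game.cannon("proponent")  # a new analogy cluster retrieves the game
print(game.score, game.provisionally_lost)  # 4 False
```

The example run is invented for illustration: three positive and four negative clicks leave the analogy provisionally lost, and one successful cannon by the proponents retrieves it.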
Here the analysis was taken from a New Scientist paper by Barrie Jones (1974), called ‘Plate Tectonics: A Kuhnian Case?’ The analogies used by the proponents of the theory were the jigsaw puzzle analogy, with aspect development showing continuity of animal-plant pattern; the blown-up balloon analogy, which made the evolutionary movement of the earth’s crust able to blow up vertically, with wrinkles for mountain ranges; and a toy-boat analogy (disastrous, but implicit in the very word ‘drift’), which, on a stabilist earth, allowed the continents to float about. The opponents of the theory demolished the toy-boat analogy, by causing it to produce any number of negative clicks, and they produced yet more by adducing instances of ‘misfit’ of the jigsaw puzzle. In addition, they produced, to replace the toy-boat analogy, an intercontinental bridge-formation analogy which they knew would prove untrue, in that no traces would ever be found of any of the bridges. They further produced negative clicks from the blown-up balloon analogy. All this caused the score to become negative, and so the theory was judged to have been shot down in flames. However, much later, plate tectonics re-established the theory by producing a famous and successful cannon. The new analogy clusters then
brought to bear were: the acoustic analogy of shape determination by echo (with technique attached); the analogy of handlike mechanisms moving to juxtapose symmetric/asymmetric remanent magnetic patterns; the analogy of the unevenly rising pie-crust (which replaced that of the expanding toy balloon); the analogy of the twisting plasticine, the twist being caused by the wandering of the earth’s magnetic poles; and the analogy of overlapping iron plates, which piled up with an analogy of similarly overlapping large flakes on the pie-crust to produce a picture of large-scale horizontal movements on the earth’s crust. All this cannoned on to, and saved, the jigsaw puzzle analogy; because it created, by aspect development, such a new wealth of positive analogy that it was judged that the negative analogy (all of which still remains) would never catch up, and so the game was judged won. This kind of analogy handling, even though only carried on in two cases and in a simplified way, convinced me once and for all of the absolutely cardinal role that analogy handling (once you have the model with which to handle the analogies) can be shown to have in all paradigm-based science; and since then examination of other, more sophisticated cases, such as that of black holes in the universe (see Clarke 1978), has strengthened this belief yet more. The handling also prompted the reflection that analogy is (overall) used in converse ways, when it is used in poetry and when it is used in science. For in poetry we tend to apply novel, hitherto undreamed-of analogies to illuminate ordinary, easily perceived or intuited circumstances, so that we add to our ordinary vision a new vision, which makes us see ordinary circumstances in a new way. 
In science, on the contrary, we pile up quite ordinary, not to say obvious analogies, and then apply them counterintuitively, in extraordinary circumstances, so that the analogy itself supplies our only guide to further exploration in the conceptual darkness into which we have then projected ourselves. Thus, as an example of the first, consider Cecil Day Lewis’ poets’ analogy (though actually to be found in a prose work): ‘The barrage balloons, those whales, floated over London.’ Here the analogy itself is not ordinary; for whales are esoteric, not normally observed creatures. And yet how good it is. For having read the sentence, together with its metaphor, almost any ordinary person who has ever observed barrage balloons, will now be able to ‘see’ also the great aerial fishes, floating and swaying over the besieged town. Contrast this with the use of analogy in the kinetic theory of gases. This employs quite an ordinary pile-up of analogies, of rubber balls bouncing, or billiard balls ricocheting. Yet, until Brownian movement became observable through microscopes (which is actually to posit yet another analogy, since, with an ordinary microscope, the actual molecules are much smaller than the
entities that we appear to observe), who, using their unaided common sense, would ever have imagined that gases were like that? And since no ordinary person could have produced the act of imagination, no ordinary person, either, could have invented kinetic theory. Nevertheless, in the theory, the pile-up was there; and thus the fact has got to be faced that Kuhn, by digging for paradigms, saw, in effect, the absolutely cardinal fact that, in real science, the thinking that goes on is enormously cruder, with the crudity going on much longer, than anyone who had not actually observed it going on would have believed possible. And why? Because the whole purpose of inventing scientific theory is to get some strong-minded way – any way – of predicting what is going to happen in totally unknown areas of reality that you cannot in any other way explore; and of then verifying to see whether it then does happen or whether you have got to start again. And that is what Braithwaite saw (1953, pp. 263f.), and Kuhn, in effect, did not.
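The scoring rules of the analogy game described earlier in this chapter can be illustrated by a short sketch in Python. It is only an illustration and rests on assumptions that the text does not state: that the two tallies are kept separately, that the net score is the positive tally minus the negative one, and that a cannon doubles the tally of whichever side makes it. The class name `AnalogyGame`, its method names and the click counts in the usage lines are all hypothetical.

```python
from dataclasses import dataclass

CLICK_POINTS = 2  # the text gives 2 points per click, positive or negative


@dataclass
class AnalogyGame:
    """Hypothetical tally for the two-sided analogy game."""

    positive: int = 0  # points from positive clicks (proponents)
    negative: int = 0  # points from negative clicks (opponents), kept as a magnitude

    def click(self, side: str) -> None:
        """Record one successful click for 'proponents' or 'opponents'."""
        if side == "proponents":
            self.positive += CLICK_POINTS
        elif side == "opponents":
            self.negative += CLICK_POINTS
        else:
            raise ValueError(f"unknown side: {side!r}")

    def cannon(self, side: str) -> None:
        """Assumed reading: a successful cannon doubles that side's tally."""
        if side == "proponents":
            self.positive *= 2
        elif side == "opponents":
            self.negative *= 2
        else:
            raise ValueError(f"unknown side: {side!r}")

    @property
    def score(self) -> int:
        """Net score, assumed to be positive tally minus negative tally."""
        return self.positive - self.negative

    @property
    def provisionally_lost(self) -> bool:
        """The game is provisionally lost while the net score is negative."""
        return self.score < 0


# Made-up click counts, loosely following the continental drift story.
game = AnalogyGame()
for _ in range(3):
    game.click("proponents")
for _ in range(5):
    game.click("opponents")
print(game.score, game.provisionally_lost)  # -4 True

game.cannon("proponents")  # a late, successful cannon by the proponents
print(game.score, game.provisionally_lost)  # 2 False
```

Keeping the two tallies apart, rather than storing a single running total, is what lets a cannon double one side only, and it also reflects the remark that the negative analogy still remains after the game is judged won.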
Bibliography of the scientific works of Margaret Masterman1
Masterman, M. (1949, 1951, 1953) ‘Three Attempts to Exemplify the Pictorial Principle in Language. (1) The Word ‘Philosophy’ as used by John Wisdom (1949); (2) Towards a Symbolism for Dogmatic Theology (1951); (3) Analysis of a Religious Paradox (1953)’. Unpublished manuscripts. (1953) ‘The Pictorial Principle in Language’. In: Proceedings of XIth International Congress of Philosophy, vol. 14, pp. 1011–27. (1954) ‘Words’. In: Proceedings of the Aristotelian Society, vol. 54, pp. 209–32. (1956) ‘The Potentialities of a Mechanical Thesaurus’. Paper read at the Second International Conference on Machine Translation, MIT, October. (1957a) ‘Metaphysical and Ideographic Language’. In: British Philosophy in the Mid-Century (C. A. Mace, Ed.), London, Allen and Unwin, pp. 314–37. (1957b) ‘Metaphysics’. Paper read to the Cambridge University Moral Science Club, 30 May (chapter 2, this volume). (1958) ‘The Thesaurus in Syntax and Semantics’. Machine Translation, vol. 4, pp. 35–43. (1959) ‘Classification, Concept Formation and Language’. Paper presented at the Fourth Annual Conference of the British Society for the Philosophy of Science, September (chapter 3, this volume). Masterman, M., Needham, R. M. and Spärck Jones, K. (1959) ‘The Analogy between Mechanical Translation and Library Retrieval’. In: Proceedings of the International Conference on Scientific Information, Washington, DC National Academy of Sciences, pp. 917–33. Masterman, M. (1961) ‘Translation’. Aristotelian Society Supplementary Volume 35, pp. 169–216. Masterman, M. and Parker-Rhodes, A. F. (1961) ‘A New Model of Syntactic Description’. Paper read at the International Conference on Machine Translation of Languages and Applied Language Analysis, National Physical Laboratory, Teddington, September. Masterman, M. (1962a) ‘Semantic Message Detection for Machine Translation, Using an Interlingua’. 
In: Proceedings of the First International Conference on Machine Translation of Languages and Applied Language Analysis, HMSO, London, pp. 437–56.
1 Research Memoranda wholly or partly authored by MMB are listed separately at the end of this bibliography.
Bibliography of Margaret Masterman
(1962b) ‘The Intellect’s New Eye’. In: Freeing the Mind. Articles and Letters from the Times Literary Supplement 3/6, March–June, pp. 62–4. (1963) ‘Commentary on the Guberina Hypothesis’. Methodos, vol. 15, pp. 107–17. (1964) ‘The Semantic Basis of Human Communication’. Arena, No. 19, April, pp. 18–40. (1965) ‘Semantic Algorithms’. In: Proceedings of the Conference on Computer-Related Semantics in Las Vegas, NV. December 3–5, pp. 174–87. (1967) ‘Mechanical Pidgin Translation’. In: Machine Translation (A. D. Booth, Ed.) North-Holland, Amsterdam, pp. 263–79. (date unknown) ‘Man-aided Computer Translation from English into French Using an On-line System to Manipulate a Bi-lingual Conceptual Dictionary, or Thesaurus’. Paper read at the Deuxième Conférence Internationale sur le Traitement Automatique des Languages, Grenoble. (1970a) ‘Bible Translating by “Kernel”’. Times Literary Supplement, 19 March, p. 299. (1970b) ‘The Nature of a Paradigm’. In: Criticism and the Growth of Knowledge (I. Lakatos and A. Musgrave, Eds.), Cambridge University Press, Cambridge, pp. 59–89. (1973) ‘Interlinguas’. Paper presented at Informatics I Conference, 11–13 April, Cambridge. (1975) ‘Life, Death and Resurrection of the Academic Woman: Expounded by Playing the Paranormix Game’. Fawcett lecture, University of London, Bedford College. (1977) ‘The Primary Unit of Aboutness’. Seminar on ‘Aboutness’, held at ASLIB, London, 17 April. (1977) ‘Reiterative Analysis of a Simile. Part I: The Nature and Stages of the Process’. Theoria to Theory, vol. 11, pp. 57–68. (1978) ‘Reiterative Analysis of a Simile. Part II: The Second Instalment of a Philosophic Serial’. Theoria to Theory, vol. 12, pp. 17–29. (1978) ‘Reiterative Analysis of a Simile. Part III: The Third Instalment of a Philosophic Serial’. Theoria to Theory, vol. 12, pp. 123–34. (1978) ‘Reiterative Analysis of a Simile. Part IV: The Fourth Instalment of a Philosophic Serial’. Theoria to Theory, vol. 12, pp. 191–204. 
(1978) ‘Can we Progress in Determining Criteria for Evaluating Machine Translation?’ Paper for Workshop on Evaluation in Translation Systems, Luxembourg, 28 February. (1978) ‘Reiterative Semantics’. Paper for Lucy Cavendish College, Cambridge Science Group, 6 June. (1978) ‘The Basic Reiterative Semantic’. In: Informatics 3. Proceedings of a Conference held by ASLIB Co-ordinate Indexing Group on 2–4 April 1975 at Emmanuel College, Cambridge (K. P. Jones and V. Horswell, Eds.), ASLIB, Cambridge, pp. 103–13. (1979) ‘The Essential Mechanism of Machine Translation’, Paper read at the British Computer Society, January. (1979) ‘Rhetorical punctuation by machine’. In: Advances in Computer-Aided Literary and Linguistic Research (D. E. Ager, F. E. Knowles and J. Smith, Eds.), University of Aston, Birmingham, pp. 271–91.
(1979) ‘The Essential Skills to Be Acquired for Machine Translation’. In: Translating and the Computer (B. M. Snell, Ed.), North-Holland, Amsterdam, pp. 86–97. (1979) ‘The Implications of Machine Translation for Language Learning’. Obtainable from Linguaphone House, Bearbore Lane, Hammersmith, London. (1980) ‘Braithwaite and Kuhn: Analogy-clusters within and without Hypothetico-deductive Systems in Science’. In: Science, Belief and Behaviour: Essays in Honour of R. B. Braithwaite (D. H. Mellor, Ed.) Cambridge University Press, pp. 63–95. (1981) ‘First Impressions of a Whiteheadian Model of Language’. In: Whitehead and the Idea of Process (H. Holz and E. Wolf-Gazo, Eds.). Proceedings of The First International Whitehead-Symposium, Karl Alber, Freiburg/Munich, pp. 21–42. (1982) ‘The Limits of Innovation in Machine Translation’. In: Practical Experience of Machine Translation (V. Lawson, Ed.) North-Holland, Amsterdam, pp. 163–86.
CLRU publications authored wholly or partly by Margaret Masterman (dated where this is known)
Masterman, M. (1956) ‘The Potentialities of a Mechanical Thesaurus’. CLRU memo ML-1 (in this volume, chapter 4). Masterman, M. and Conway, R. S. (date unknown) ‘How Latin Grammar was Built Up’. From ‘The Making of Latin’, CLRU memo ML-2A. Masterman, M. (date unknown) ‘The Mechanical Study of Context’. CLRU 957 Project II-memo ML-11A. (date unknown) ‘Linguistic Problems of Machine Translation’. CLRU memo ML-14. (1957) ‘Fans and Heads’. CLRU memo ML-21 (in this volume, chapter 2). Also read as a paper (under the title ‘Metaphysics’) to the Cambridge Moral Sciences Club, 30 May. (date unknown) ‘The Use, in Brouwer, of the Terms “Fan”, “Spread”, “Spreadlaw” and “Complementary Law”’. CLRU memo ML-22. (date unknown) ‘Note on the Properties of the Successor Relation’. CLRU memo ML-22A. (date unknown) ‘Outline of a Theory of Language’. CLRU memo ML-24. Masterman, M., Needham, R. M. and Spärck Jones, K. (date unknown) ‘Description of Current Work on Syntax at CLRU’. CLRU memo ML-27. Masterman, M. (date unknown) ‘The Effect of using Electronic Techniques for Examining Language’. CLRU memo ML-36. Masterman, M. and Spärck Jones, K. (date unknown) ‘First Thoughts on how to Translate Mechanically with a Thesaurus’. CLRU memo ML-43A. (date unknown) ‘The Analogy between Mechanical Translation and Library Retrieval’. CLRU memo ML-44. Masterman, M. (date unknown) ‘Syntax (Monolingual and Interlingual)’. CLRU memo ML-72.
Masterman, M., Needham, R. M., Spärck Jones, K. and Mayoh, B. (1957) ‘Agricola in curvo terram dimovit aratro’. CLRU memo ML-84, November (in this volume, chapter 6). Masterman, M. (1959) ‘What is a Thesaurus?’. CLRU memo ML-90, June (in this volume, chapter 5). (date unknown) ‘Fictitious Sentences in Language’. CLRU memo ML-91. (date unknown) ‘Classification Concept-Formation and Language’. CLRU memo ML-95. Masterman, M. and Shillan, D. (1968) Progress Report, Oct.: ‘Text-Handling and Mechanical Translation’. CLRU memo ML-112. Masterman, M. (date unknown) ‘Manipulating a Thesaurus with a Pidgin Language’. CLRU memo ML-113. (date unknown) Half-Yearly Report to the US National Science Foundation: ‘General Mechanical Translation Program for Use on a Digital Computer’. CLRU memo ML-114. (date unknown) Final Report: ‘The CLRU Lattice-Theory of Language’. CLRU memo ML-116. ‘The Halliday “Twenty Questions” Analytic Method Re-envisaged as a Procedure’. CLRU memo ML-117. Masterman, M. and Shillan, D. (date unknown) Progress Report: ‘Work on Text-Handling and Mechanical Translation’. CLRU memo ML-118. Masterman, M. and Needham, R. M. (1960) ‘Specification and Sample Operations of a Model Thesaurus’. CLRU memo ML-128. Masterman, M. and Kay, M. (1960) ‘Mechanical Pidgin English (Parts 1 & 2)’. CLRU memo ML-133. Masterman, M. (1961) ‘Semantic Message Detection for Machine Translation using an Interlingua’. CLRU memo ML-141. Masterman, M. and Parker-Rhodes, A. F. (date unknown) ‘A New Model of Syntactic Description’. CLRU memo ML-142. Masterman, M. (date unknown) ‘Commentary on the Guberina Hypothesis’. CLRU memo ML-148 (in this volume, chapter 9). (date unknown) ‘The Theory of the Semantic Basis of Human Communication Applied to the Phonetics of Intonational Form’. CLRU memo ML-155. (date unknown) ‘Semantic Message Detection Research for Machine Translation (Short Note)’. CLRU memo ML-158A. Masterman, M. and Wilks, Y. 
(1964) Annual Summary Report: ‘Semantic Basis of Communication’. CLRU memo ML-167. Masterman, M., Wilks, Y., Shillan, D., Dobson, J. and Spärck Jones, K. (date unknown) Final Report: ‘Semantic Basis of Communication’. CLRU memo ML-171. Masterman, M. (1964) ‘A Picture of Language’. CLRU memo ML-180. (date unknown) ‘Finding Semantic Patterns with a Thesaurus’. CLRU Memo ML-186A. (date unknown) ‘Parker-Rhodes’ Syntax Program – Its Pros and Cons’. CLRU memo ML-188. (date unknown) ‘Man-Aided Computer Translation from English into French using an On-line System to Manipulate a Bi-lingual Conceptual Dictionary or Thesaurus’. CLRU memo ML-198.
(1968a) ‘Semantic Language Games or Philosophy by Computer’. CLRU memo ML-210. (1968b) ‘Semantic Algorithms’. CLRU memo (in this volume, chapter 10). (date unknown) ‘Computerised Haiku’. CLRU memo ML-221. (date unknown) ‘Conspectus of CLRU Research’. CLRU memo ML-223. (1971) ‘Studies in Mechanical Abstracting (2)’. CLRU memo ML-227, September. Masterman, M., Bastin, E. W., Parker-Rhodes, A. F. and Braithwaite, R. B. (date unknown) ‘The CLRU New Research Cube: The Search for a Way of Strengthening a General Combinatorial Conception of Information as it occurs in Natural Language’. CLRU memo ML-231. Masterman, M. (1970) ‘Compute me a poem’. CLRU memo ML-241. (date unknown) ‘The E.E.C. Test Sentences. An Exercise in Pre-editing’. CLRU, Final Report of TH-17, Annex 1. Masterman, M. and Smith, R. J. (date unknown) ‘The Subsenu Experiment’. CLRU, Final Report of TH-17, Annex 2. (date unknown) ‘The “Skeletal” Nature of Systran’. CLRU, Final Report of TH-17, Annex 3. (date unknown) ‘The Role of Humour’. CLRU, Final Report of TH-17, Annex 4. Masterman, M. (date unknown) ‘Note for Translators: How to Read Hex’. CLRU, Final Report of TH-17, Annex 5. (date unknown) ‘Routine for Finding the Stress Pattern of English’. CLRU, Final Report of TH-17, Annex 7. Masterman, M. and Williams, B. (1983) ‘Patterns of Emphasis and Heterodynes of Meaning’. CLRU, CLT 007, March. Masterman, M., Parker-Rhodes, A. F., Blackmore, R. M. and Spärck Jones, K. (1958) ‘Description of the Tests Carried out on Methods of Constructing Sentence Lattices’. CLRU Unnumbered Workpaper. Masterman, M. and Needham, R. (1966) ‘Natural Language Analysis as Non-Numerical Data Processing’. CLRU Unnumbered Workpaper, January.
Other References
Abercrombie, D. (1965) Studies in Phonetics and Linguistics. Oxford University Press. Allen, W. S. (1957) On the Linguistic Study of Languages. Cambridge University Press. Allen, R. and Greenough, L. (Eds.) (1888) Caesar’s Gallic War. London. Amsler, R. A. (1980) ‘The Structure of the Merriam-Webster Pocket Dictionary’. PhD dissertation, Technical Report TR-164, University of Texas at Austin. Anscombe, G. E. M. (1959) An Introduction to Wittgenstein’s Tractatus. Hutchinson, London. Austin, J. L. (1956–7) ‘A Plea for Excuses, Presidential Address’. In: Proceedings of the Aristotelian Society, vol. 56, pp. 12–20. Baird, A. (1966) ‘Transformation and Sequence in Pronunciation Teaching’. English Language Teaching, vol. 20, pp. 71–9. Bar-Hillel, Y. (1953) ‘The Present State of Research on Mechanical Translation’. American Documentation, 2, pp. 229–36. (1960) ‘The Present Status of Automatic Translation of Languages’. In: Advances in Computers (F. L. Alt, Ed.) Academic Press, New York, vol. 1, pp. 91–163. Barwise, J. and Perry, J. (1983) Situations and Attitudes. MIT Press, Cambridge, MA. Bastin, E. W. (1953–4) ‘General Mathematical Problems involved in Mechanical Translation’. Mechanical Translation, vol. 3(1), p. 6. Becker, J. (1975) ‘The Phrasal Lexicon’. In: Proceedings of the First Workshop on Theoretical Issues in Natural Language Processing, MIT, Cambridge, MA, pp. 54–8. Belnap, N. (1960) ‘Entailment and Relevance’. Journal of Symbolic Logic, pp. 176–179. Birkhoff, G. (1940) Lattice Theory. American Mathematical Society Publication, 25, Providence, RI. Black, M. (1962) Models and Metaphors. Cornell University Press, Ithaca, NY. Braithwaite, R. B. (1953) Scientific Explanation. Cambridge University Press. British Standards Institution (1958) Universal Decimal Classification. Trilingual Abridged Edition. British Standards Institution, London. Brondal, V. (1950) Théorie des propositions. Munksgaard, Copenhagen. Brouwer, L. E. J. 
(1952) ‘Historical Background, Principles and Methods of Intuitionism’. South African Journal of Science, vol. 49, pp. 139–46. (1954) ‘Points and Spaces’. Canadian Journal of Mathematics, vol. 6, pp. 26–34.
Burton, R. (1978) ‘Semantic Grammar: An Engineering Technique for Constructing Natural Language Understanding Systems’. Bolt, Beranek and Newman Technical Report 3453, Cambridge, MA. Campbell, N. R. (1920) Physics, the Elements. Cambridge University Press. Later reprinted as Foundations of Science. Dover, New York, 1957. Carbonell, J. G. (1982) ‘Metaphor: An Inescapable Phenomenon in Natural Language Comprehension’. Research Report CMU-CS-81-115, Department of Computer Science, Carnegie-Mellon University; also in: Strategies for Natural Language Processing (W. G. Lehnert and M. H. Ringle, Eds.) Lawrence Erlbaum Associates, Hillsdale, NY, pp. 415–33. Carnap, R. (1937) The Logical Syntax of Language. Routledge, London. Chai, J. Y. and Biermann, A. W. (1997) ‘The Use of WordNet in Automatic Information Extraction’. In: Proceedings of the ACL workshop on Word Sense Disambiguation, Washington, DC, pp. 62–68. Chao, Y. -R. (1946) ‘The Logical Structure of Chinese Words’. Presidential Address read at the regional meeting of the Linguistic Society of America in New York, December 31, 1945. In: Language, vol. 22, p. 4. Chomsky, N. (1957) Syntactic Structures. Mouton, Dordrecht, Netherlands. (1965) Aspects of the Theory of Syntax. MIT Press, Cambridge, MA. Clarke, C. (1978) ‘Black Holes’. Theoria to Theory, vol. 12, pp. 275–9. Copestake, A. and Briscoe, E. J. (1991) ‘Lexical Operations in a Unification-Based Framework’. In: Proceedings of ACL SIGLEX Workshop on Lexical Semantics and Knowledge Representation, Berkeley, CA, pp. 88–101. Curry, H. B. (1953) ‘Mathematics, Syntax and Logic’. Mind, vol. 62, pp. 172–83. Dixon, R. M. W. (1965) What is a Language? Longmans, London. Dobson, J. (1965) ‘Report of an Experiment to Find Semantic Squares in an Interlingually-Coded Text Taken from a Traveller’s Handbook’. CLRU Memo. Dolby, J. L. (1966a) ‘On the Classification of Written English Phrases’. Memo to CLRU, January (unpublished). (1966b) ‘On the Complexity of Phrase Translations’. 
Memo to CLRU, January. Dorr, B. and Jones, D. (1996) ‘Role of Word–Sense Disambiguation in Lexical Acquisition’. In: Proceedings of COLING ’96, Copenhagen, pp. 129–35. Duff, C. (1947) How to Learn a Language. Oxford University Press. Fass, D. C. (1988) ‘An Account of Coherence, Semantic Relations, Metonymy, and Lexical Ambiguity Resolution’. In: Lexical Ambiguity Resolution in the Comprehension of Human Language (S. L. Small, G. W. Cottrell and M. K. Tannenhaus, Eds.) Morgan Kaufmann, Los Altos, CA, pp. 151–78. Feys, R. (1946) ‘La technique de la logique combinatoire’. Revue Philosophique de Louvain, vol. 18, p. 81. Fillmore, C. (1968) ‘The Case for Case’. In: Universals in Linguistic Theory (E. Bach and R. Harms, Eds.) Holt, Rinehart and Winston, New York, pp. 61–82. Fodor, J. (1975) The Language of Thought. Thomas Crowell, New York. ‘Freeing the Mind’. Articles and Letters from The Times Literary Supplement, March–June 1962. Frege, G. (1960) ‘On Sense and Reference’. In: Translations from the Philosophical Writings of Gottlob Frege (P. Geach, trans.) Basil Blackwell, Oxford, pp. 56–78.
Gellner, E. A. (1959) Words and Things. Gollancz, London. Givon, T. (1967) Transformations of Ellipsis, Sense Development and Rules of Lexical Derivation. SP-2896. Systems Development Corp., Santa Monica, CA. Gsell, R. et al. (1963) ‘Etude et réalisation d’un détecteur de mélodie pour analyse de la parole’. L’Onde Electrique, vol. 43, pp. 90–101. Guberina, P. (1954) Valeur logique et valeur stilistique des propositions complexes. Editions Epoha, Zagreb. (1957) ‘La logique de la logique et la logique du langage’. Studia Romanica, pp. 87–89. (1959) ‘Le son et le mouvement dans langage’. Studia Romanica, pp. 201–13. Guo, C. (Ed.) (1992) Machine Tractable Dictionaries: Design and Construction. Ablex, Norwood, NJ. Haas, W. (1961) ‘Commentary on Masterman’s Paper “Translation”’. In: Aristotelian Society Supplementary Volume, 35, pp. 21–7. Hacking, I. (1975) Why Does Language Matter to Philosophy? Cambridge University Press. Halliday, M. A. K. (1956) ‘The Linguistic Basis of a Mechanical Thesaurus’. CLRU Memo. An abstract appeared in Machine Translation, vol. 3, p. 37. Haugen, E. (1951) ‘Directions in Modern Linguistics’. Language, vol. 27 (Presidential Address to the Linguistic Society, Chicago, 1950), pp. 1–5. Hausser, R. (1999) Foundations of Computational Linguistics. Springer, Berlin. Hesse, M. B. (date unknown) Analogy and Structure in a Thesaurus. CLRU Workpaper, ML 101. (1959–60) ‘On Defining Analogy’. In: Proceedings of the Aristotelian Society, December, pp. 79–89. (1974) The Structure of Scientific Inference. Macmillan, London, and University of California Press, Berkeley. Isham, C. (1979) ‘Quantum Gravity’. Theoria to Theory, vol. 13, pp. 19–27. International Business Machines (1958) Literature on Information Retrieval and Machine Translation, Bibliography and Index. IBM Research Centre, Yorktown Heights, New York. (1959) ‘Final Report on Computer Set AN/GSQ-16 (XW-I)’, IBM Research Center, Yorktown Heights, NY, 6 volumes. Jeans, J. H. 
(1921) The Dynamical Theory of Gases. Cambridge University Press. Johnson, W. E. (1921) Logic, Part 1. Cambridge University Press. Jones, B. (1974) ‘Plate Tectonics: A Kuhnian Case?’ New Scientist, 63, pp. 536–8. Joyce, T. and Needham, R. M. (1958) ‘The Thesaurus Approach to Information Retrieval’. American Documentation, 9(3), July, pp. 76–89. Katz, J. and Fodor, J. (1963) ‘The Structure of Semantic Theory’. Language, 39, pp. 170–210. Katz, J. J. (1972) Semantic Theory. Harper and Row, New York. Kay, M. (no date) ‘The Relevance of Linguistics to Machine Translation’. Unpublished Manuscript. Kay, M. and McKinnon-Wood, R. (1960) ‘A Flexible Punched-Card Procedure for Word Decomposition’. CLRU memo ML-119. Keynes, J. M. (1921) Treatise on Probability. Cambridge University Press. Kuhn, T. S. (1962) The Structure of Scientific Revolutions. Chicago University Press.
Langford, C. H. (1942) ‘The Notion of Analysis in Moore’s Philosophy’. In: The Philosophy of G. E. Moore (P. A. Schilpp, Ed.) 3rd edn, Open Court, La Salle, IL, pp. 210–25. Lewis, D. (1972) ‘General Semantics’. In: Semantics of Natural Language (D. Davidson and G. Harman, Eds.) Reidel, Dordrecht, pp. 61–82. Locke, J. (1690) An Essay Concerning Human Understanding. London. Locke, W. N. and Booth, A. D. (Eds.) (1955) Machine Translation of Languages. John Wiley & Sons, New York. Luhn, H. P. (1956) ‘A Statistical Approach to Mechanised Literature Searching’. IBM Journal of Research. Reprinted in Advances in Automatic Text Summarization (I. Mani and M. Maybury, Eds.), MIT Press, Cambridge, MA, pp. 23–37. Lukasiewicz, J. (1951) Aristotle’s Syllogistic. Oxford University Press. Mellish, C. (1988) ‘Implementing Systemic Classification by Unification’. Computational Linguistics, vol. 14(1), pp. 40–51. Miller, J. (date unknown) ‘Extension and Testing of CLRU Library System’. CLRU memorandum. Miller, G. et al. (1990) ‘Five Papers on WordNet’. Research Memorandum 43, Cognitive Science Laboratory, Princeton, NJ. Mooers, C. N. (1956) The Next Twenty Years in Information Retrieval. Zator Company, Cambridge, MA. Moore, G. F. (1922) Philosophical Studies. Cambridge University Press, London. Nagao, M. (1989) Machine Translation: How Far Can it Go? Oxford University Press. Needham, R. M. (1958) ‘Research Note on a Property of Finite Lattices’. CLRU memorandum. (1959) ‘The Problem of Chunking’. CLRU memo ML-75. (1961) ‘Classes and Concepts’. Paper read at the fourth Conference of the British Society for the Philosophy of Science. Newman, S. M. (1959) Analysis of Prepositionals for Interrelational Concepts. Preliminary study. US Department of Commerce, Washington, DC. O’Connor, J. D. and Arnold, G. F. (1961) Intonation of Spoken English. Longmans, London. Olney, J. L. (1964) ‘An Investigation of English Discourse Structure with Particular Attention to Anaphoric Relationships’. 
Memo, Systems Development Corporation, Santa Monica, CA. Parker-Rhodes, A. F. (1956) ‘An Algebraic Thesaurus’. CLRU memo. An abstract in Machine Translation, vol. 3, p. 36. (date unknown) ‘The Lattice Method for the Mechanical Processing of Syntax’. CLRU Workpaper, ML-116. Parker-Rhodes, A. F., Needham, R. M. (1959) Reduction Method for Non-Arithmetic Data. IFIP, Paris, pp. 1125–30. (1962) ‘Computational Methods in Lattice Theory’. Cambridge Philosophical Society Transactions, pp. 83–92. Parker-Rhodes, A. F. and Wordley, C. (1959) ‘Mechanical Translation by the Thesaurus Method Using Existing Machinery’. Journal of the SMPTE, vol. 68, pp. 98–106.
Parker-Rhodes, A. F., McKinnon Wood, R., Kay, M. and Bratley, P. (date unknown) The Cambridge Language Research Unit Computer Program for Syntactic Analysis. CLRU Monograph, ML-136. Peirce, C. S. (1931–58) Collected Papers. (C. Hartshorne and P. Weiss, Eds.), Harvard University Press, Cambridge, MA, vol. 2. Price, P. J., Ostendorf, M. and Wightman, C. W. (1989) ‘Prosody and Parsing’. In: Proceedings of Speech and Natural Language Workshop, DARPA, October, pp. 5–11. Pustejovsky, J. (1995) The Generative Lexicon. MIT Press, Cambridge MA. Quine, W. V. O. (1953) From a Logical Point of View. Harvard University Press, Cambridge, MA. Reifler, E. (1954) ‘The First Conference on Mechanical Translation’. Mechanical Translation, vol. 1(2), pp. 23–32. Richards, I. A. (1936) Philosophy of Rhetoric. Oxford University Press. Richards, I. A. and Gibson, C. M. (1956) English through Pictures. Repr. Pocket Books, New York (1952). Originally published as The Pocket Book of Basic English, New York, 1945. Richens, R. H. (1956) ‘A General Program for Machine Translation between Any Two Languages via an Algebraic Interlingua’. CLRU memo ML-5. Abstract appeared in Machine Translation, vol. 3, p. 37. (1957) ‘The Thirteen Steps’. CLRU workpaper, July. (1958) ‘Interlingual Mechanical Translation’. The Computer Journal, vol. 1, pp. 144–56. (1959) ‘Tigris and Euphrates’. In: Proceedings of the Symposium on the Mechanisation of Thought Process, National Physical Laboratory, HMSO, London, pp. 312–23. Roget, P. M. (1931, 1962) Thesaurus of English Words and Phrases. Longmans, London. Rosten, L. (1934) The Education of Hyman Kaplan. Gollancz, London. Russell, B. (1930) The Principles of Mathematics. Norton, New York. (1953) ‘The Cult of ‘‘Common Usage’’ ’. The British Journal for the Philosophy of Science, vol. 3, pp. 303–7. Schank, R. C. (1975) Conceptual Information Processing, North-Holland, Amsterdam. Sejnowski, T. and Rosenberg, C. 
(1986) ‘NETtalk: A Parallel Network that Learns to Read Aloud’. Johns Hopkins University Electrical Engineering and Computer Science Technical Report, JHU/EEC-86/01. Shaw, M. (1958) ‘Compacting Roget’s Thesaurus’. CLRU memo. Shillan, D. (1954) Spoken English. Longmans, London, 2nd edn, 1965. (1965a) ‘A Linguistic Unit Adaptable to Economical Concordance-Making’. CLRU memo. (1965b) ‘A Method and a Reason for Tune-Analysis of Language’. CLRU memo. (1965c) ‘Anomalous Finites’. CLRU memo. Spärck Jones, K. (1963) ‘A Note on NUDE’. CLRU memo ML-164. (1964) ‘Synonymy and Semantic Classification’. PhD thesis, University of Cambridge (and Edinburgh University Press, Edinburgh 1986).
Stetson, R. (1951) Motor Phonetics. North-Holland, Amsterdam.
Strzalkowski, T. (1992) ‘Information Retrieval Using Robust Natural Language Processing’. In: Proceedings of Speech and Natural Language Workshop, DARPA, February, pp. 206–11.
Tanimoto, H. (1937) An Elementary Mathematical Theory of Classification and Prediction. IBM Corp., New York.
Thagard, P. (1982) ‘Programs, Theories and Models in Cognitive Science’. In: Proceedings of the 4th Annual Conference of the Cognitive Science Society, Ann Arbor, MI, pp. 155–7.
Thorne, R. G. (1955) ‘The Efficiency of Subject Catalogues and the Cost of Information Searches’. Journal of Documentation, 11, pp. 130–48.
Waismann, F. (1945) ‘Verifiability’. In: Proceedings of the Aristotelian Society, Supplementary vol. 19.
Weinreich, U. (1966) ‘Explorations in Semantic Theory’. In: Current Trends in Linguistics, vol. 3 (T. Sebeok, Ed.), Mouton, The Hague, pp. 69–86.
Whitehall, H. (1951) The Structural Essentials of English. Harcourt, Brace & Co., New York.
Whorf, B. L. (1950) Four Articles on Metalinguistics. Foreign Service Institute, Department of State, Washington DC (repr. from The Technology Review and Language, Culture and Personality).
Wilkins, J. (1668) An Essay Towards a Real Character and a Philosophical Language. London.
Wilks, Y. (1964) ‘Text Searching with Templates’. In: Masterman, M., ‘A Picture of Language’ CLRU memo.
    (1965a) ‘Final Report AF 61-(052) 647’. CLRU memo ML-171.
    (1965b) ‘Application of the CLRU Method of Semantic Analysis to Information Retrieval’. CLRU memo ML-173.
    (1965c) ‘Computable Semantic Derivation’. CLRU memo.
    (1975a) ‘Preference Semantics’. In: The Formal Semantics of Natural Language (E. Keenan, Ed.), Cambridge University Press, pp. 103–24.
    (1975b) ‘A Preferential Pattern-Seeking Semantics for Natural Language’. Artificial Intelligence, vol. 6, pp. 53–74.
    (1978) ‘Making Preferences More Active’. Artificial Intelligence, vol. 11, pp. 75–97.
Wilks, Y., Slator, B. and Guthrie, L. (1996) Electric Words. MIT Press, Cambridge, MA.
Wisdom, J. (1942) ‘Moore’s Technique’. In: The Philosophy of G. E. Moore (P. A. Schilpp, Ed.), 3rd edn, Open Court, La Salle, IL, pp. 421–50.
Wittgenstein, L. (1922) Tractatus logico-philosophicus. Routledge and Kegan Paul, London.
    (1958, 1972) Philosophical Investigations (trans. G. E. M. Anscombe), Basil Blackwell, Oxford.
Yarowsky, D. (1992) ‘Word-Sense Disambiguation Using Statistical Models of Roget’s Categories Trained on Large Corpora’. In: Proceedings of COLING ’92, Nantes, France, pp. 317–24.
Zipf, G. K. (1953) The Psychobiology of Language. Houghton Mifflin, Boston.
Index
Abercrombie, D. 259
Allen, W. S. 265
Amsler, R. 7
archeheads 127
Austin, J. L. 69, 187, 198
Bar-Hillel, Y. 8, 10, 107, 108, 111, 128, 134, 215, 253
Barwise, J. 6, 216, 222
Bastin, E. W. 28, 39, 56
Becker, J. 279
Belnap, N. 280
Birkhoff, G. 49
Booth, A. D. 171, 186
Braithwaite, R. 15, 283, 298
Briscoe, E. 56
Brouwer, J. 6, 37, 47, 55, 56, 126
Burton, J. 11
Cambridge Language Research Unit (CLRU) 117, 159
Campbell, N. 284
Carbonell, J. 14
Carnap, R. 215
case grammar 9, 219
Chao, Y. 27, 28, 30, 32
Chomsky, N. 6, 10, 14, 16, 214, 218, 227, 266, 286
chunk 83
Church, A. 50
Clausewitz, J. 17
cognitive science 17
conceptual dependency 8
context 124
Copestake, A. 56
Curry, H. B. 50
Davidson, D. 286
Dixon, R. M. W. 265
EUROTRA 11
example-based translation 279
fan theorem 55, 126
fans 126, 211
Fass, D. 14, 223
Feys, J. 51
Fillmore, C. 9, 219
Firth, J. R. 265
Fodor, J. 9, 16, 159, 218, 266
Frege, G. 6
Gellner, E. 188
Givon, T. 56
Guberina hypothesis 12, 227
Guberina, H. 12, 270, 279
Guo, C. 223
Haas, W. 216
Hacking, I. 285
Halliday, M. 5, 6, 86, 215, 265, 266
Harris, Z. 6, 10
Haugen, D. 131
head 203, 209
Hesse, M. 285
hypothetico-deductive system 283
information retrieval 268
interlingua 134, 138, 221
Johnson, W. E. 171
jointed fan 205
Joyce, T. and Needham, R. M. 136, 158
Katz, J. 9, 159, 218, 266
Kay, M. 130, 163, 181, 186
King, G. W. 141
King, M. 11
Kuhn, T. 15, 283, 298
Langford, C. H. 187, 195
lattice 119
Lewis, D. 222
Lewy, C. 213
library-point 60
Locke, J. 186
Longuet-Higgins, C. 17
Lyons, J. 265
machine translation 161, 162, 171, 219
Masterman, M. 69, 158, 171, 172, 215, 285
McKinnon-Wood, R. 163, 181
mechanical abstracting 268
mechanical pidgin 161, 162, 171, 186
mechanical translation 69, 83, 107, 227
Mellish, C. 6
Montague, R. 12, 286
Mooers, C. 62
Moore, G. E. 59, 218, 219, 285
Moore’s Paradox of Analysis 187, 195, 218
Nagao, M. 279
Needham, R. 15, 64, 120, 136, 137, 158, 163, 249
Parker Rhodes, A. F. 15, 86, 120, 136, 137, 227
Peirce, C. S. 191
Perry, J. 6, 216, 222
picture theory of truth 222
pidgin 161
preference semantics 8, 279
propositional calculus 59
pun 151
Pustejovsky, J. 56
Quine, W. V. O. 213, 218, 286
Reifler, E. 83, 162
relevance procedure 151
Rhodes, I. 227
Richards, I. A. 5, 55, 188, 215
Richens, R. H. 8, 9, 108, 128, 158, 161, 171, 172, 263
Roget’s Thesaurus 7, 15, 40, 69, 86, 89, 92, 93, 107, 118, 134, 149, 151, 160, 194, 198, 203, 287
Rosenberg, I. 16
rows 129
Russell, B. 59, 228, 286
Schank, R. 8, 279
Sejnowski, T. 16
semantic parsing 11
semantic primitive 220
semantic shell 261
semantic square 270
Shaw, M. 120
Shillan, D. 254, 279
situation semantics 6, 217, 222
situation 216
Spärck Jones, K. 7, 9, 15, 159, 171, 172, 223
spindle lattice 211
stick picture 189, 220
Strzalkowski, T. 15
syllogism 268
syntax lattice 119
SYSTRAN 11
Tanimoto, H. 137
Tarski, A. 286
template 261
Thagard, P. 15
thesaurus 109, 134, 220
Thorne, J. 62
translation 187
Waismann, F. 191
Walt Disney 193
Weinreich, U. 253, 266
Whitehall, H. 130
Whorf, B. 11, 191, 214
Wilkins, Bishop 228, 287, 289
Wilks, Y. 8, 13, 14, 56, 186, 223, 261, 278, 279, 289
Wisdom, A. J. T. D. 24
Wittgenstein, L. 5, 6, 11, 14, 16, 41, 43, 44, 45, 56, 187, 214, 215, 217, 222
WordNet 8
Yarowsky, D. 8, 160
Yngve, V. 227
Zipf, P. 171