COGNITIVE TECHNOLOGY: In Search of a Humane Interface
ADVANCES IN PSYCHOLOGY 113

Editors:
G. E. STELMACH
P. A. VROON
ELSEVIER
Amsterdam - Lausanne - New York - Oxford - Shannon - Tokyo
COGNITIVE TECHNOLOGY In Search of a Humane Interface
Edited by

Barbara GORAYSKA
Department of Computer Science
City University of Hong Kong
Kowloon, Hong Kong

Jacob L. MEY
Department of Linguistics
Odense University
Odense, Denmark
and
Northwestern University
Evanston, IL, USA
1996
ELSEVIER
Amsterdam - Lausanne - New York - Oxford - Shannon - Tokyo
NORTH-HOLLAND
ELSEVIER SCIENCE B.V.
Sara Burgerhartstraat 25
P.O. Box 211, 1000 AE Amsterdam, The Netherlands
ISBN: 0 444 82275 5

© 1996 Elsevier Science B.V. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher, Elsevier Science B.V., Copyright & Permissions Department, P.O. Box 521, 1000 AM Amsterdam, The Netherlands.

Special regulations for readers in the U.S.A. - This publication has been registered with the Copyright Clearance Center Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923. Information can be obtained from the CCC about conditions under which photocopies of parts of this publication may be made in the U.S.A. All other copyright questions, including photocopying outside of the U.S.A., should be referred to the copyright owner, Elsevier Science B.V., unless otherwise specified.

No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein.

This book is printed on acid-free paper.

Printed in The Netherlands
TO ALL WEARY TRAVELLERS ON THE INFORMATION SUPERHIGHWAY
TO ALL USERS TRAPPED IN THE WORLD WIDE WEB
AN OLD IRISH PILGRIMS' WISH:
MAY CT & TC RISE UP TO MEET YOU
MAY YOU REACH YOUR GOAL
MAY THE WEB NOT EAT YOU!
FOREWORD
Formal interest in Cognitive Technology at the City University of Hong Kong began its official life as the result of a growing interest amongst a group of colleagues, mostly in the Department of Computer Science, in exploring the ways in which developments in information technology carry implications for human cognition and, inversely, how human cognitive abilities influence the way we act technologically. This interest led to a proposal for the establishment of a Cognitive Technology Research Group which would draw in colleagues from a variety of departments within the University as well as from other institutions in Hong Kong.

One of the early events organised to launch the Research Group was a series of Cognitive Technology lectures in 1993 (some of these are now available in print). For this purpose, invitations were extended to individuals in universities within and outside Hong Kong who were known to have an interest in this area of research. At the same time, plans were laid to stage an international conference in August 1995, for which an international programme committee was created, with participants from Australia, Canada, Denmark, Germany, Hong Kong, Israel, Japan, the UK and the USA.

Many of the chapters in this volume have been written by people affiliated with the Conference in various ways, either as participants or plenary speakers, or as members of the conference programme committee. In addition, there are chapters authored by other experts, who were specially invited to contribute by the editors of the volume. The number of individuals who are ready to promote the aims of Cognitive Technology research and development by contributing to this volume and attending the Conference reflects a growing concern among the scientific community about what it means to be human in an increasingly technological world. The volume contains many innovative ideas, all of them exciting and a number of them controversial.
The reader will find its perusal a stimulating and rewarding experience.
N. V. Balasubramanian Head, Department of Computer Science, City University of Hong Kong.
ACKNOWLEDGEMENTS
The Editors of the Volume feel the need to make a pleasurable acknowledgement of all the help and assistance they were allowed to receive during the preparation of this book. First of all, thanks go to the management and staff of the two institutions that were involved in hosting and caring for the editors during their various periods of collaboration: City University of Hong Kong, and Northwestern University, Evanston, Ill., USA.

Special thanks are due to Dr. N.V. Balasubramanian, Head of the Department of Computer Science at City University, who not only showed his vivid interest in the CT project from the very beginning, but did everything in his power to get our effort off the ground, and continued to follow up with good advice and support, making possible things that otherwise would not have happened (such as the one editor's three-month stay at City University). At the other end, Professor Roger Schank, Director of the Institute for the Learning Sciences, Northwestern University, provided the proper atmosphere for an effort of this kind, and saw to it that the cross-ocean contacts between the editors could be tended to without disruptions of a practical sort. The ILS working group 'Video-Media' graciously put up with the Evanston editor's frequent and prolonged absences from the project, while secretarial and other staff (in particular Ms. Teri Lehmann) were extremely helpful in facilitating the necessary contacts.

At the Hong Kong end, the General Office of City University's Department of Computer Science (in particular Miss Giovanna W.C. Yau, Miss Anita O.L. Tam, Miss Tiong C.W. Chan, Miss Winnie M.Y. Cheung, Miss Ada S.M. Wong, Miss Amy Lo, and Miss Candy L.K. Tsui) was incredibly helpful in handling our mail, fax, xerox and computer problems, in dealing with the accounts, and in countless other 'user-friendly' ways.
We also want to thank the many Research Assistants and Demonstrators who sweated over photographs and diagrams, getting them into the proper computer format prior to print-out as camera-ready copy; in addition, they provided invaluable help in scanning documents that had got stranded in the vagaries of the various word processing systems and their avatars (Word 4, 5, and 6, Word for Windows 2 and 6, WordPerfect, MacWrite and what not). Some of the people we want to thank specially are, at the Hong Kong end, Mr. Jims C.F. Yeung and Mr. Ted Lee; at the Evanston end, Ms. Inna Mostovoy.

Among our colleagues, Kevin Cox deserves the highest praise for having taken over the formatting of the book according to the style sheet provided by the publishers - a daunting task for which neither editor was properly prepared or mentally equipped, and which neither of us is ever going to undertake again unless we receive princely remuneration! Jonathon Marsh, Laurence Goldstein, Roger Lindsay, Kevin Cox, and Ho Mun Chan were always ready to help with advice and good ideas in their areas of expertise, while Brian Anderson, Stevan Harnad, and Tosiyasu Kunii added new dimensions to many of our thoughts, often by simply telling us how to express them better.

Finally, we wish to express our gratitude to all the authors in this volume for their generous and diversified contributions to the major theme of Cognitive Technology
which bring forth its many subtle facets and hidden avenues. And, on penalty of innuendo, the Editors themselves want to grab this opportunity to thank each other for a splendid cooperation: in sweat and blood, and almost no tears.

People have sometimes felt that our title 'Of Minds and Men' is less than appropriate, as it carries with it (as one contributor expressed it) the connotation of male sexism, and besides (as some others pointed out) it disregards one half of humanity. We would like to ask our well-meaning critics to leave their Steinbeck behind and look back to Robert Burns, who is the original source of the quotation. Burns' words are not only not sexist, they are certainly anything but macho. In fact he pokes fun at men (and mice as well), by commenting on their various hare- (or mice-) brained notions. Here are his words (more or less in the Scottish original):

"Of mice and men
The cunning schemes
So often gang agley."
Here you are. No sexism, just plain old Burns. Apologies for any inconvenience caused to mice and men.

Hong Kong & Evanston, July 1995
Barbara Gorayska
Jacob L. Mey
CONTENTS
INTRODUCTION Barbara Gorayska and Jacob L. Mey
Of Minds and Men

THEORETICAL ISSUES
Cognition
1 Barbara Gorayska and Jonathon Marsh
Epistemic Technology and Relevance Analysis: Rethinking Cognitive Technology
27
2 Ole Fogh Kirkeby and Lone Malmborg
Imaginization as an Approach to Interactive Multimedia
41
3 Frank Biocca
Intelligence Augmentation: The Vision Inside Virtual Reality
59
Modeling and Mental Tools
4 David A. Good
Patience and Control: The Importance of Maintaining the Link Between Producers and Users
79
5 Hartmut Haberland
"And Ye Shall Be As Machines" - Or Should Machines Be As Us? On the Modeling of Matter and Mind
89
6 Ho Mun Chan
Levels of Explanation: Complexity and Ecology
99
Agents
7 Margaret A. Boden
Agents and Creativity
119
8 Myron W. Krueger
Virtual (Reality + Intelligence)
129
CASES AND PROBLEMS
Communication
9 Roger O. Lindsay
Heuristic Ergonomics and the Socio-Cognitive Interface
147
10 Alex Kass, Robin Burke, and Will Fitzgerald
How to Support Learning from Interaction with Simulated Characters
159
11 Richard W. Janney
E-mail and Intimacy
201
12 Robert G. Eisenhart and David C. Littman
Communication Impedance: Touchstone for Cognitive Technology
213
Education
13 Kevin Cox
Technology and the Structure of Tertiary Education Institutions
225
14 Orville L. Clubb and C. H. Lee
A Chinese Character Based Telecommunication Device for the Deaf
235
15 Laurence Goldstein
Teaching Syllogistic to the Blind
243
16 Che Kan Leong
Using Microcomputer Technology to Promote Students' "Higher-Order" Reading
257
Planning
17 Mark H. Burstein and Drew V. McDermott
Issues in the Development of Human-Computer Mixed-Initiative Planning
285
18 David Heath, Simon Kasif, and Steven Salzberg
Committees of Decision Trees
305
19 Roger C. Schank and Sandor Szego
A Learning Environment to Teach Planning Skills
319
Applied Cognitive Science
20 Tosiyasu L. Kunii
Cognitive Technology and Differential Topology: The Importance of Shape Features
337
21 Alec McHoul and Phil Roe
Hypertext and Reading Cognition
347
22 Hiroshi Tamura and Sooja Choi
Verbal and Non-Verbal Behaviours in Face to Face and TV Conferences
361
23 John A. A. Sillince
Would Electronic Argumentation Improve Your Ability to Express Yourself?
375
24 Tony Roberts
Shared Understanding of Facial Appearance - Who are the Experts?
389
25 Stevan Harnad
Interactive Cognition: Exploring the Potential of Electronic Quote/Commenting
397
INDEX
415
Cognitive Technology: In Search of a Humane Interface
B. Gorayska and J.L. Mey (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Introduction

OF MINDS AND MEN

Barbara Gorayska
City University of Hong Kong
[email protected]
Jacob L. Mey
Odense University, Denmark
Northwestern University, USA
[email protected]
[email protected]

et mihi res, non me rebus, subiungere conor
'and I try to adapt the world to me, not me to the world'
Horace, Epistulae I.i:19
This Introduction will be in two parts. The first part is a general statement about Cognitive Technology, its aims, and how it goes about realizing them; in this (the present) part, only some specific links with the individual authors' contributions to our volume will be highlighted. The second part consists of a 'guided tour' through the volume, briefly characterizing each of its chapters and familiarising the reader with its contents. A certain amount of thematic structure will tentatively be uncovered, and connections between the individual chapters will be suggested.

COGNITIVE TECHNOLOGY AS A DISTINCT AREA OF INVESTIGATION

What happens when humans produce new technologies? This question can be considered under two perspectives, each having to do with the how and the why of such a production. It may be concretised as a desire to explore:

a) how and why constructs that have their origins in human mental life are embodied in physical environments when people fabricate their habitat, even to the point of those constructs becoming that very habitat; and

b) how and why these fabricated habitats affect, and feed back into, human mental life.
The present volume initiates such an exploration of the human mind via the technologies the mind produces. As instances, consider problem solving devices such as algorithms, or mind-organizing devices such as metaphors. These mental constructs, when externalised, find their expression in the form and functionality of physical tools, defined as structured parts of our physical world which, becoming space-organizing devices, help us shape and manipulate our physical environment. Obvious examples here are a hammer, or a computer. Using the tool, in turn, binds human epistemology and, within the constraints inherent in the functional and structural characteristics of the tools used, determines our cognitive processes of adaptation. For this reason, all explorations of the mind via its self-produced technologies will have to consider the 'situatedness' (as Fogh Kirkeby & Malmborg call it in their contribution) of such constraints. This situatedness closely links our explorations to concerns about the environment. The process of externalising the human mind we will name Cognitive Technology, CT.

Cognitive technological processes always take place in a particular environment. No technologies come out of the blue; neither are they created ex nihilo, as Alec McHoul has aptly observed in a recent contribution (1995). As an instance, he refers to the well-known example of the printing press: its origins are not just located in a general trend, some e-volutionary development of the human mind, but must be found in a particular de-volution of the human mind into some existing, contemporary technologies, such as metal-crafting and wine-pressing. A technology is always grafted onto another technology, says McHoul (1995: 14) - but the development of a particular technology is never a necessary, deterministic one.
The printing press happened when pressing techniques and iron-mongering had reached a certain stage of perfection, so that in retrospect we can see that it happened, and could happen, at the time, and how it happened; but not why it had to happen, and why right there and then, in 15th century Mainz. Similarly, to understand CT, we need to understand the environment in which a particular cognitive technological development came about and was or is being developed; and again, there is no causality involved here. Still, by itself, a mere understanding is not enough: the mind has to be consulted not just as an abstract faculty, but as a human characteristic that develops technology, and is developed by it. With respect to the environment, this includes both the physical and the mental world: we must investigate the environment both as a necessary precondition for CT and as conditioned by CT, taking the human mind into consideration under the perspective of this mutual relatedness. For this reason CT is allied with environmentalism, which brings us to another point.

Environmentalism expresses our need to understand how we manipulate our physical environment by means of the tools we have created. However, it leaves out both the generative processes of the human mind by means of which the tools come into being, and the feedback effects such tools produce in the human mind, after these tools have become a part of the physical environment. Following a distinction proposed by Gorayska and Marsh (in their contribution to this volume), we can say that the processes of cognitive adaptation by which the human mind must deal with already externalised mental constructs, i.e., the physical tools at our disposal, constitute a domain of investigation which is different from, although complementary to and closely aligned with, CT. This domain, which Gorayska and Marsh call Technological Cognition, TC, focusses attention on how human cognitive
environments are generated; these environments comprise sets of cognitive processes that are essential to human conscious thought, which they inform as well as constrain. At the same time, the authors say, the TC activities within this cognitive environment provide input to the externalisation processes of CT, and thus are complementary to the latter. Thus, apart from the need, expressed in current environmental concerns, to protect our physical environment from the unwanted or uncontrolled impact of technology, there also exists a need to understand how our cognitive environments can be, and are, manipulated by that very same technology (as Jonathon Marsh has observed; pers. comm.). Investigations within CT and TC are intended to satisfy that need.

As Gorayska and Marsh further state in their contribution to this volume, the movement between the products of CT and the processes of TC is recursive. Giving due consideration to this recursive movement, they point out, is a necessary, and occasionally a sufficient, condition for the design and construction of what they call an Epistemic Technology (ET). ET tools are tools whose interfaces serve to amplify the processing capabilities of both humans and machines to a point which in the normal course of events is out of reach for either of them functioning alone. From this, it becomes obvious that ET tools can only come into being once we realise that CT products and TC processes are neither exclusively physical nor exclusively mental, but integrated in a spiraling, 'Heraclitean' relationship. 1 The CT tool constitutes the embodiment of a task (as do all tools), but this particular embodiment is seen as cognitively appercepted and organized in a piece of technology.
Thus, while a problem solving algorithm is a mental tool, it only becomes a CT tool when it is realized in a material shape, such as when it is embodied in a computer program and runs on a real machine, or when it takes shape in a mechanical device, like one of those old, now mostly defunct National Cash Registers or a mechanical calculator. In an epistemic technology, understood in terms of the CT/TC relationship, the physical and the mental are two sides of one and the same process: the
1 The reference is to the well-known tenet formulated by the Greek philosopher Heraclitus, according to which 'one cannot immerse oneself in the same river twice' (Diels 1954: Fragm. 11). Usually, this saying is interpreted in one direction only: the river changes for every person immersing him- or herself in it. This interpretation has its roots in the formulation given above, which, however, is not Heraclitus' own, but the one found in Plato and Aristotle, where they refer to (and misquote) the Heraclitean saying (Plato, Cratylus 402A; Aristotle, Metaphysica III:5, Bekker 1010a30). The converse is just as important, but is often overlooked: nobody is ever the same after having taken a dip in the river. Hence, the human and the aquatic bodies are in a constant dialectic relation, a relationship that is emblematic of the general relationship between humans and their creations (both the already given, and the ones emerging). Such an interpretation is, moreover, in harmony with Heraclitus' original text (admittedly obscure, but what else is new?), which does not say (with Plato, Aristotle and the rest) that 'one would have a hard time trying to get into the same river twice', but that 'different waters flowingly touch those who enter identical rivers' (potamoisi toisin autoisin embainousin hetera kai hetera hudata epirrei). Owing to the special construction of this dictum, one can also read it as meaning: 'different waters flowingly touch the same persons entering [identical] rivers', and it is this interpretation which jibes best with Heraclitus' notion of the 'soul as a humid exhalation', to be likened unto a flow of water, as well as with Arius Didymus' (to whom we owe this quote from Heraclitus) accompanying commentary. Dipping into the same river, one and the same person thus will perceive a (psychic) difference, perhaps even each time receive a different soul: he is touched by a 'humid exhalation' of an ever-changing nature; or: the river changes us more than we change the river. (For the Greek originals, see Diels & Kranz 1954; Kirk 1954: 367)
mental externalises itself in the CT tool, but then the tool reflects back to set up a niche of its own within the cognitive environment: a TC space with its associated techne. The salient point in these reflections is the truly innovative character of ET tools inasmuch as they embody the CT/TC relationship. These innovative properties reside in the fact that the structures we see emerge are not merely ascribed to, and confined by, the worlds in which they arise, but are developed in response to a movement which is not only recursive or, as we have said earlier, 'spiraling', but properly dialectic. This dialectic movement does not only go 'from the inside out', as in the classical definition of the tool, or in modern approaches to Human Computer Interaction (HCI); more importantly, it also goes 'from the outside in'. That is to say, the structure of our techne, of our mental constructs, originates in the impact that tool use has on our cognitive world, in a manner which parallels the way the physical tool is said to originate in the clash of the mind with a physical obstruction. Cognitive Technology, by turning the leaf, so to speak, and interlinking with Technological Cognition, at the same time turns itself from a branch of technology into a techne of cognition.

In their unique ways, all contributors to the present volume seek to find an answer to the crucial question: 'What technologies can best tune human minds to other human minds and to the environment in which these minds must operate?' Such technologies will be characterised by what Tosiyasu Kunii (pers. comm.) has termed 'humane interfaces'. But if we are ever to discover what it really means to be humane in a technological world (a question which is at the heart of the proposed investigation), then there are other pertinent questions which must be asked.
These questions emphasise the human aspect of how minds are externalised, and they include:

- Why are human minds externalised, i.e., what purpose does the process of externalisation serve?
- What can we learn about the human mind by studying how it externalises itself?
- How does the use of externalised mental constructs (the objects we call 'tools') change people fundamentally?
- To what extent does human interaction with technology serve as an amplification of human cognition, and to what extent does it lead to an atrophy of the human mind?

Why are human minds externalised?
Looking around us, we see the externalising of minds in full progress everywhere. People jot things down in notebooks, they write memoranda, articles, letters, books, they note down music, they cry for help or sympathy, they vent their anger, they paint their fantasies and imaginations on canvas, walls, and their own bodies, they erect statues, monuments, buildings, and so on and so forth. Externalising the mind seems to be one of the human race's most favorite pastimes; and in our externalisations, the seeds of language are sown. Human language, in whatever textual forms it happens to come (including, perhaps, art and music), is a spontaneous and ingenious product of this process. Externalised language is one of the first and best examples of ET: a human-made epistemic tool for mediating the dialectics between the CT product and the TC process (on this, see Good's contribution to this volume). ET is also a perfect externalised expression of, and a reflection upon, the characteristics of the human mind
itself (Gorayska and Lindsay, 1989, 1993). Tremendous efforts have been expended in the cognitive sciences to date to understand how the human mind, mediated by language, maps onto, and reflects, the properties of the physical environment; in other words, how true propositions about the world come into being. By contrast, what has rarely been in focus (although, following Whorf (1969), it ought to have been, and constantly so (cf. Gorayska, 1993)), is the pivotal role of language as an instrumental tool, which not only reflects, but also serves to shape and control, from the outside in, an organisation of the human mental world, grounded in motivation and sensorimotor action. This role goes well beyond a mere recovery of communicative goals or speech acts and enters the realm of pragmatic acts, as proposed by Mey (1993).2

What purpose does this process of externalisation serve?

If we consider some of the items listed in the preceding paragraphs more closely, we may obtain a first clue as to the 'why' of these externalisings. The list contains, e.g., such items as 'monuments' and 'memoranda'. The latter term goes directly back to the Latin word for 'remember' (cf. 'memory', 'memento', 'memorable', and other derivatives of the same root). The former term is even more instructive. It has to do with a root meaning 'remind' (as in 'monitor', 'admonish' and so on). Note how the word 'mind' itself is related both to 'memory' and to 'monument'; the latter being a 'reminder' in some externalised form, such as stone, bronze, concrete. Hence, the immediately plausible answer to the question 'Why?' is that we externalise our minds to make them more durable, to prevent them from going under in the general chaos that ensues when we leave our bodies (and our minds!) at death.
Some people have been good at externalising in this fashion, and moreover they must have known that they were successful: how else would Horace have been able to say that he had 'erected a monument more durable than bronze, one that neither biting rains nor violent hurricanes' would be able to destroy? (Odes III.xxx:1-5) This monument was nothing other than his externalised mind, his poetry.

Apart from our desire for immortality, we externalise minds to share them with others. There is not a day in our lives when we don't benefit, one way or another, from the externalisations of our forebears' minds. Conversely, we ourselves do everything we can to ensure that our own minds will not only live on forever, but that others, too, will benefit from them. This latter desire to share and to influence may even extend to the point of the ridiculous, as when we send out our own images, our own externalised selves, into a universe whose possible inhabitants in all probability never will find, or, even if they do, understand our externalisations (as in the case of the U.S. space probe 'Pioneer', carrying those notorious copper tablets depicting our 'civilisation' and its progress, on board).

In the externalisation process, two sets of motivating tendencies operate in tandem: individuation and detachment on the one hand, belonging and uniformity on the other. The first have to do with expressing oneself in contradistinction to the mass of humanity, to erect a singular monument for the autonomous self; the second concern the desire for recognition by others, the wish to make sure that my externalisations are accepted as valuable and valid by my fellow-humans. As such, the latter desire borders

2 Unlike speech acts, pragmatic acts are not limited to utterances. They include a variety of action types, across different modalities of expression and processing, that are jointly performed by an individual in order to communicate within the constraints of his or her ecological environment.
on the urge to control my environment (including my fellow-humans), such that I may be sure that my externalisations will be acceptable to, and accepted ('internalised', if you wish) by the others. In the framework of our present discussion, one could say that the former tendency belongs in the domain of CT (an 'externalising' process), while the latter tendency pertains to a process of 'internalising', included in TC. Harmonious mediation between these two tendencies is a hallmark of holism and a source for cooperative creation in all living organisms (Koestler, 1964). As part of a living organism, the human mind exhibits similar characteristics. A conspicuous failure to consider and satisfy either of these tendencies incurs the risk of fatal consequences for the organisms involved: limited externalisation results in frustration, while forced internalisation will lead to mind-control and all the horrors that Koestler saw developing in the totalitarian regimes he criticised. Through the complementary processes of CT and TC, the externalising/internalising human mind, being a vulnerable organism in an only partially controlled world, is equally confronted with the same potential and exposed to the same abuse.

Externalising - internalising - externalising ... an eternal loop

When we externalise our minds, we create an object. This object, in its turn, is not just an object in space: it is something that we consider, relate to, love or hate, in short, work with in our minds, hence internalise. In very simple cases, the object is 'just' an object for vision (as Descartes seemed to think); more sophisticated 'considerations' include the mirroring that takes place when the child discovers its own image as separate from itself (as Janney points out in his chapter, where he treats of a particular mind-object, viz.
email messages; see also Krueger, this volume), or when we evaluate a mind product as to its 'adequacy', and compare it to the original representation that we had 'in mind'. Conversely, removing this check can have some strange and unexpected effects, as in the cases where an artist loses the use of one of his senses: the near-blind Monet, the deaf Beethoven, who continued to externalise their minds, but with unmistakably different (though not necessarily artistically inferior) outcomes.

The re-internalised object is different from the one that started the externalising process: it retains a tie to its origin, but has also become strangely independent. It now has a life of its own, and at a certain point in time, it is its turn to become externalised. This process continues until we think the result is adequate, and in the meantime, every new version interacts dialectically with the previous one. It supersedes it, but cannot quite replace it. 3 True, humans and the artifacts they produce are not cut from the same cloth, and in a sense, 'the twain shall never meet'; yet, in their disparities and dissimilarities reside the seeds of growth. Cognitive dissonance is the basis for creativity. It leads to progress. It arouses motivation. It is also the source for goal formation. It serves to put
3 This process we all know in its crudest form as the cycle of producing an article or report, which is why one has to be very careful in taking the process of writing on the computer to be the 'same' as writing on a piece of paper: the externalisations in the latter case are not easily or accidentally wiped out, whereas in the computer case we often destroy entire files at the touch of a button, whether we want to or not, and certainly cannot afford to have our machines clogged up with innumerable earlier versions of our articles and other mental products. (But imagine how difficult and frustrating the life of a literary critic must be in future times, when all the world's poets have gone on line and consequently no longer keep their scratch versions around...)
Of Minds and Men
7
in motion mental processes of adaptation. Once the novel problems have been solved, the techniques used in their solution may be externalised into the physical environment, so as to open up a cognitive space in our mind for further enhancement of our creative acts, very much like what happened when we de-linked the computer tool from the limited-purpose physical artifact it had been defined as earlier. On the basis of these newly formed physical environments, new dissonances arise that lead to the perception of new problems, and so on. Here, the perspectives visualised in the works of M. C. Escher become of relevance; one may also think of the paradoxes outlined by R. D. Laing in his famous 'Knots', or of the paradoxes of Zen which, if resolved, are believed by the proponents of this spiritual order to lead to deeper insights, and to take those who have succeeded onto higher planes of cognition, resulting in a more balanced and harmonious ecological integration. (In our volume, some of these aspects are reflected in the chapters by Kunii and Boden; also Biocca's notion of the 'evolution of cognitive abilities' and 'intelligence amplification' belongs here).

What can we learn about the human mind by studying how it externalises itself?

It has long been the feeling of many people that the products of one's mind, one's mental 'externalisations', tell us something about their origins. Graphology is by many considered a science that, on the basis of handwritten text, can say something about the writer's personality. We consider Wagner's oeuvre to be the true expression of the Germanic mind, for better or worse (if we believe in such a thing, that is). Similarly, we think of Liszt's music as characteristic of the playboy type that he represents for us: brilliant, but superficial and emotionally shallow.
The question of course is how many of these externalisations are in fact internalisations of earlier produced judgements; judgements that may wholly or in part have been provoked by considerations that were external to the externalised product. We may or may not like Poles or Germans, and consequently we think of Chopin or Wagner as 'typical' for our likes or dislikes. In this way, the externalised mind becomes superordinate to the internalised one: we become the slaves of our own mental products. With this proviso, viz., that we quite possibly learn nothing new about the mind, but rather replicate what is already there, albeit in an implicit form (see Boden, this volume), perhaps the most important property we can identify by looking at the mind's ways of externalising itself is its enormous versatility and resourcefulness in dealing with obstacles. It is as if the human mind were some kind of amoeba: when it encounters an obstacle, it internalises it and represents it as something mental, no longer 'out there', and consequently tractable by a mental operation (often called 'wishful thinking'), just like the amoeba digests its adversaries by engulfing them and absorbing them into its own system. Conversely, what an externalised technique-cum-tool also tends to reveal is the existence of stages (cognitive or physical) inherent in human evolution. Thus, each tool reveals the particular human thresholds which it is designed to help us transcend, often with a hidden vengeance. To this issue we now turn.

How does the use of externalised mental constructs (called 'tools') change people fundamentally?

What we said above is of the utmost relevance for our discussions on how to define the relationship between the humans and the tools they make (including the most versatile tool of them all, the computer). The tool is both an affordance (in the
8
B. Gorayska and J.L. Mey
Gibsonian sense; Gibson, 1979) and a limitation. It is an extension of the mind inasmuch as it is mind externalised. But insofar as it is externalised (that is, a material thing), it is also marked by the inherent limitations of matter. In other words, it is an object among other objects, and is treated as such. The tool is, then, not only a means of liberating the mind; as an object, it is liable to the same 'fetishising' (to use a Marxian expression) that other objects are. We believe objects to have power because we either have created them in our image, or (as objets trouvés) have 'found them in our image', in the double sense of the word: 'found' them, like the primitive native who finds a stick and believes it's a god, and 'found' them, in the sense of finding them to be like us: 'And ye shall be as machines', as Hartmut Haberland puts it in his chapter (cf. also Mey, 1984). The fundamental change in the human occurs when he or she no longer considers the materiality of the tool as a subordinate property, but is intent on making it shine in all its material splendor (like Aaron polishing the Golden Calf).4 Or, worse still, when He or She becomes subordinated to It, with often quite unforeseen consequences (cf. Piercy, 1990).

To what extent does human interaction with technology serve as an amplification of the human condition, and to what extent does it lead to an atrophy of the human mind?
Every device that has been invented to transcend human weaknesses has occasionally (sometimes as the rule) been perverted to promote, rather than cure, those weaknesses, or create other, related (and worse) weaknesses. Take a simple invention such as clothes. They were destined to keep people warm, hence more resistant to sickness. At the same time, clothes remove some of the natural resistance that the body has to temperature changes, and make it more prone to illnesses such as colds and infections of various kinds (see also Goldstein, Biocca, this volume). Or take the automobile, originally invented to let people travel in comfort and with greater speed and efficacy to their destination. Today, the car is an instrument of purposeless torture for many people trying to get to their work in the morning and having to sit on the freeway in noxious fumes for hours on end, or take the car to the workplace an hour ahead of time and eat their breakfast in splendid isolation in the carpark, rather than in the bosom of the family. And think of what the car does to its regular occupant's physical fitness! As far as the computer is concerned, the most egregious case of perversion of its purpose has been the so-called simplification of office routines. It was said that the computer inaugurated the 'paperless office': no more mindless copying by hand or by spirit duplicator, no more generation of reams and reams of useless memoranda and standard letters; everything would be kept in the computer, and only brought forth when the necessity arose. Now look what we've got: more paper than ever... Another instance of the computer's ambiguous delivery on its promises is the ease with which one now can produce relatively nice copies of one's work; this ease perverts into a need to produce perfect instances of whatever piece of insignificant office procedure one has to put out.
Similarly, spelling checkers (which originally were intended to help one spell correctly) now tyrannize us into spelling everything the same way, and do not allow us to distinguish between a draft (where spelling errors are

4 This is, in a nutshell, computer fetishism, the inherent and endemic illness of all computer programmers and computer fans.
irrelevant) and a final document (e.g. a project description that has to go to some Research Council or other authority). What was supposed to make life easier and more meaningful has made life much harder and much more meaningless. And the reason? We have not been able to distinguish between the different 'rationalities' that are built into the machine (to borrow, and expand on, Max Weber's (1923) classical distinctions): the machine's own limited 'object' rationality (Sachrationalität: what can this machine do?), and our own, also limited, 'subject' rationality (what can we do, what do we want to do, and why: Zweckrationalität)? Furthermore, we must ask ourselves: do we really want it, or do we just want it because it's there, or because it's possible? Which leads us to the ultimate rationality: the unlimited 'common' rationality of society, also known as the common good, but most often perverted to stand for the good of one particular class of people, say computer manufacturers or network freaks or hackers or criminals of various kinds. Fearfully we ask ourselves: Will the same adverse fate await our expectations of an amplified intelligence, of increased creativity, and of any other similar promised cognitive improvements of the Information Age?

The computer as a tool: Catastrophe, turning point, or both?

Karl Marx, in one of his caustic asides on the benefits of industrialisation, observes how with the advent of machines and increased productivity, the laborer not only is pressed to the utmost, but actually risks being killed by that super-tool, the machine: 'The tool kills the worker' (Das Werkzeug erschlägt den Arbeiter). How is this possible? Isn't it the case that the tool helps us achieve things more easily, fulfil our duties with more precision and speed, and allows us to have more free time on our hands (after all, the work is done faster, and with less expenditure of energy)?
It behooves us to recall what has been said about that housewives' blessing, the vacuum cleaner. In the beginning, when people first acquired this new gadget, there was undoubtedly a whole bevy of benefits that followed in its wake: houses became cleaner than they had ever been before, cleaning times were but a fraction of what they had been earlier, no more bent backs and varicose veined legs. But with the advent of the clean house, the ante was upped, so to speak. And what earlier had been an exception (witness the expression 'Easter clean', meaning an exceptionally clean state of affairs, to be achieved only at Eastertime or Passover, a tradition which still exists in a number of cultures, such as orthodox Jewry), now becomes the rule and the standard. And that is not the worst part of it. Not only has the rule of the game been changed, the game itself has got a new definition. What earlier had been a merit, now becomes a duty. What had been a task, now is a chore, to be performed at least once a day, and by increasingly more laborious and complicated methods, as not only the mental ante is incremented, but also the tool itself increases its level of perfection and technical complexity. The toolness of the tool, measured either in abstract, calculable terms (size of RAM or ROM, 16/32 bit processor, various operating systems and 'development environments', and so on), or in terms of outer appearances ('sleek form', 'aesthetic 3D-look', 'photo-realistic graphics', 'advanced' whatever) becomes more important than the uses for which it was originally created. Furthermore, this 'toolness' passes itself off to the mind as the only natural state of affairs for humans as well: we are to be measured in relation to how well we function as appendices to our tool. For example,
it is no longer important just to have a computer that works, and serves as a tool for our purposes (however limited and modest): we need to have the tool's latest version, because that's what computers are at today (and besides, we can't get spare parts or service for our old dinosaur any longer, so we simply have to buy an expensive new, shiny monster). Even if we are rank and file amateurs, when it comes to buying a computer, we insist on purchasing, along with it, professional quality software - or 'industrial strength C/C++ code', as one ad has it (Dr. Dobb's Journal, May 1995) - much of which the majority of us will never have the faintest chance of putting to any decent use. Contrary to what someone might think on reading this, the above is not a Luddite plea for more primitivity. Rather, it is a plea for reflection on what a tool is, and how the computer tool, if we want to use that metaphor (or for that matter, any other metaphor) should be conceived of. The word 'conceived' is used with a vengeance here: a conception, viewed as an act, a process, rather than as a product ('a concept'), is a human work of space and time and use. That which was conceived, needs to be borne until fruition; but the story does not end there. The 'right to life' of the concept, once born, does not terminate at birth: the concept, the metaphor must grow in the environment in which it was conceived and born, and in which it was destined to be used. The way a concept develops is in its use; and it is through its use that it gets 'worded' (see Mey, 1985:166f). After all, a thought is not a thought until it has been expressed in proper language, to quote Marx (and Engels) again (The German Ideology). Vice versa, once the thought has been worded, and the conceived notion has been 'given' something verbal to wear (this 'giving' should not be taken in too passive a sense), the words themselves become important metaphorical agents (not to say tools).
It is often said that 'words don't break bones': we beg to disagree. Words, in general, are the 'lineaments of our verbal gratifications', to vary Blake (from his Note Books); we kill for words, like we kill for partners and food, and conversely words may kill us, like gratifications do, when they are not kept in their proper time and space (as David Good remarks in his contribution to this volume). The historical vicissitudes of the concept of geocentrism, with its associated metaphors, furnish a good illustration: in the abstract, Galileo's beliefs and his wordings were scientifically gratifying; in their concrete form, however, they were a matter of life and death, and he had officially to recant and swear to being convinced that the sun rotated around the earth rather than vice versa, in accordance with the geocentric metaphor. Conversely, the competing metaphor, heliocentrism, while it may have won out on the scientific battlefield, does not play any significant part in our daily lives: we still talk about sunset and sunup, and the sun rotates around the earth, as it has always 'done'. The reason? It's the way we have conceived of things way back, in a more 'primitive' stage of our existence; the primary sensation of seeing the sun rise has captivated our language, and subsequently language has captivated our mental perception. 'Once you start, you're right inside the thing: the rhetoric has you, language implicates you in the lie right off' (McInerney, 1993:108). Does language, then, shape our minds? Not directly; neither is this what Whorf was thinking, when he formulated his famous thesis about language's influence upon the shaping of the human thought and mind (cf. Whorf, 1969). What we do have, and always have had, is a 'working relationship' with language: if it has us by the tail, the reason is because our tails are language-shaped, like the rest of ourselves. We came
into being, we were conceived, in a linguistic environment, and our being carries the imprint of that original Language (which is not necessarily any particular idiom, but that which Marx would have called Gesamtsprache, the 'universal language', had he had his wits about him when he wrote on the subject). So Whorf was right about language, but only on condition that we make his 'concepts' work for their 'living': humans and concepts shape one another, because language and thought, being 'conceived' together, must needs live and work together.

From tools to words
One may ask how we came to shift our previous emphasis on tools to one on words and language. Consider the three distinct stages of tool evolution that have been progressively separating the human mind from its natural habitat, as illustrated in figures 1, 2, and 3.
[Figure 1 here: a feedback loop in which the general (natural) cognitive environment generates tools (manipulation? amplification? evolution?); the tools manipulate the natural physical environment, which feeds back into the cognitive environment.]
Fig. 1. The original feedback loop

As the human mind evolved (figure 1), natural cognitive environments generated tools for a direct manipulation of natural physical habitats. These modified environments then began to feed back into the natural cognitive environments. Since tool use was relatively minimal, the feedback effect from tools to minds was too. It was the human encounter with nature, characterised by its own, inherent dynamism, that was originally responsible for our mental growth. With tools getting more and more sophisticated, and increasing in number, they became themselves the immediate environment for the mind's dialectic encounters, as shown in figure 2. This state of affairs led to an ever more pronounced detachment and a more forceful alienation of humans from the living matter which earlier had been their predominant partner in interaction, thus entailing a growing gap in their emotive, cognitive and biological adaptation. Alienation is the predominant condition of urban people in their fabricated worlds of everyday utilities enhancing their human physical or mental characteristics. Our manual skills and many bodily functions, once directly responsive to the rhythmic dynamics of nature, now thrive on the sounds, the looks, and the behaviour of purely technological devices. Here, we're all in the same boat, on
our way to the controlled environments of a 'virtual reality': latter-day 'feelies' of a brave, new world and a perhaps not-so-future 1984-ish universe.
[Figure 2 here: the natural cognitive environment generates tools (manipulation? amplification? evolution?); the tools now manipulate a fabricated physical environment interposed before the natural physical environment.]
Fig. 2. The intermediate feedback loop

One of the most notable creations in this phase was the introduction of (precious) metals as market exchange tools, followed some fifteen centuries later by the invention of bank notes. This paved the way for the competitive invention of other tools, useful to society, with the inventor becoming an investor, whose reward was stored in the form of added exchange value. The monetary tool (technically called the 'general equivalent') thus also unleashed greed. Greed created the need for increased profitability and started the mad rush for effectiveness; both were successfully taken care of by all-pervasive and all-embracing business enterprises. As far as the human mind was concerned, the monetary tool affected and distorted our sense of values: sellable things (now called 'commodities') stopped being appreciated for any other values than their market value. ('The value of a thing/Is the price it will bring', as the Classical Economists used to say). Money, having become a precious object for possession, established itself as the greatest asset in its own right, ungracefully subordinating everything else, including our morality, to itself. (One of the first to draw attention to this 'consciousness-perverting' influence of the invention of money by the 8th century B.C. Greeks was the German-British philosopher Alfred Sohn-Rethel (1972, 1978). See also Lindsay, this volume, on the importance of these 'other' values for a satisfactory socio-cognitive interaction). It is beyond any dispute that neither emotions nor intimacy go hand-in-hand with greater profits and increased productivity. The fabricated worlds for mass consumption ensure little of the former, and instead make the mind concentrate its attention on the tasks at hand formulated by the latter. (The cognitive effects of fabricated worlds are discussed by Gorayska and Lindsay (1989 & 1994) within the framework of their
'Fabricated World Hypothesis', and by Gorayska and Marsh, this volume.) Such a fabrication has increasingly done away with warmth-generating, hand-crafted aspects in design. Unexpected asymmetries and imperfections, unpredicted gentle curves or crooked lines - the hallmarks of life and character - have given way to machine-generated, straight and square, uniform, predominantly sky-scraping, lifeless - you name it - jungles of plastic and concrete for human dwellings and work: everlasting monuments of optimal rationality.5 Where standardisation and transportability of skills across tool-use rule the day, cultural diversity disappears from view, and many travellers no longer derive their creative inspiration from visiting 'foreign' lands. Next, following the expansion of mechanical tools, the computer arrived on the evolutionary scene. Unlike the other mechanical devices up to that time, the electronically mediated information tool externalises some of our known cognitive abilities. This tool, therefore, fabricates human cognitive environments, as illustrated in figure 3. The human mind, finally having found a way of turning upon itself, in so doing turned against itself, as it were.
"
~ ~ evolutio~
generation
LI q manipulation? I amplification? atrophy9~
tools
/ manipulation
fabricated cognitive environment fabricated physical environment
natural
physical environment
Fig. 3. The modern feedback loop

Humans express themselves through words and bodies alike (Arndt and Janney, 1987; see also Krueger, this volume). Verbal language directs attention to the relevance of largely unconscious, sensory exchanges which it cannot substitute for, only complement (Lindsay and Gorayska, 1994). Inputs and outputs, transmitted by

5 Remember what happened to the cheerful hues of colour in the former totalitarian regimes of the East? They all became strange shades of grey: blue grey, green grey, pink and red grey, yellow grey; dull and subdued.
the senses of touch, smell, hearing, and vision, need to be integrated in meaningful ways so that appropriate responses in contexts can be generated (Sperber and Wilson, 1989; Mey, 1993). The task of the conscious mind has been an active, cognitive search for congruity in this sensory intake, a concern with what the Scholastic philosophers, following Aristotle, called the sensus communis, or 'common sense'. Our sensitivity to the varying degrees of such a congruity, which previously allowed us to use our common sense to distinguish reality from fiction, now takes on quite the opposite value; in modern, computer-driven environments, the implied denotation of 'common sense' no longer is to do with congruity in variety, but has come to stand for
uniformity in singularity. Note that here, too, optimising rationality has taken its toll. It used to be the case, as we said earlier, that our handwritten symbols, with their varying shapes, served as the paramount tool for expressing human emotions and personalities. The same can be said of the vast richness of tones in the spoken medium; today, these riches, too, are a matter of the past. What we are left with is a unified type6 (in all senses of the word), good for nothing more than the mere exchange of information. Adopting the role of exchangers of information, we have adapted ourselves to the very name coined for the Age. And there is more: By exchanging and manipulating electronic information over long distances (sometimes called 'telematics'), we are able to connect people, and connect with people, in all sorts of distant places. A true slogan for our times could be: Telecommunicators of the world, unite! But how many of us stop to consider that this modern facility also makes possible, on a global scale, a separation where previously none existed, nor should, or would have been? Unless we exercise proper care, our global village, McLuhan's dream, will be turned before our eyes into a Searlian 'Super-Chinese Room', the very 'Hermeneutic Hall of Mirrors'7 that Harnad (1990) warns us against. There, nothing is found except ungrounded symbols which, even if we were able to interpret them, we could not really understand - for the precise reason that such symbols would not have been acquired through a shared, real world experience (see also Good, Biocca, and Janney, this volume). Nobody would wish to deny that the tools we use have originated in acts of human creation, or that many of them embody great scientific achievements. We also grant that those inventions have been mostly well-intended.
The typist first got a typewriter, then a flexowriter, and finally a word processor; the bookkeeper got a Hollerith, then an electric book-keeping typewriter, and finally a spreadsheet and other sophisticated software; the manager kissed his secretary goodbye and got a decision support system and a laptop; the accountant got a spreadsheet; the learned got their files and archives; the readers got their Hypertext, and any old artist (self-styled or officially recognized) can now create, at the touch of the keyboard, shapes and colours previously undreamt
6 We must not let ourselves be fooled here by the recent invention of the 'notepad' computer, which supposedly learns to recognise our handwriting on a touch screen. As many have observed, our handwriting tends to adapt quickly to the expectations built into the machine. This process reminds one of the opposition that exists between what has been called 'adaptivity' (adapting humans to tools) and 'adaptability' (adapting tools to humans; Mey, 1994).
7 Compare the glass walls of many modern skyscrapers, in which all you will see is at best your own reflection, or the reflection of other skyscrapers (which, in fact, may be a lot more interesting, as anyone knows who has strolled the streets of downtown Toronto on a sunny day).
of. (On the cognitive benefits of electronically-mediated communication, in particular among scientific communities, see Harnad's chapter; for a critique of the advantages of Hypertext, see the chapter by McHoul and Roe). In all this, there is a 'but': by using these tools, we have tacitly said 'farewell' to both our control of the mental means of production (consult Roberts, this volume, for some experimental evidence), as well as to our sole ownership of the externalised objects that are the result of that production; none of these creations can any longer be said, or seen, to be of our own making, or reside within the domain of our personal decision-making. The quality of our creative and analytical thought becomes increasingly dependent on the availability and skills of technicians, support people, software engineers, and providers of electrical power, to mention but a few. Take these away, and where would we be? We gained the world, but lost our souls (to paraphrase the Bible), if we didn't outright sell the latter down the river, just like olim Doctor Faustus. And rather than adapting the media to us, we have adopted every one of its quirks and idiosyncrasies. Few are today the people who are able to think, and form their thoughts sequentially, in sentence form; it all has become a matter of jotting down and sorting out on the screen, with the help of a thought organiser or even - God forbid - a thought generator.8 (Examples of how computerised tools can be used with a minimal cognitive dependency trade-off for the benefiting individual can be found in Kunii's contribution to this volume.) Tool-generated deficiencies in human make-up have always, and often quickly, been tool-corrected. The evident lack of natural nutrients in machine-produced, artificially fertilised food has led to the invention of synthetic substitutes. Rather than stopping the process of refining our flour, we are putting its original roughage back in as a precious extra.
Lead-free petrol was sold as an innovation, hence used to be more expensive; but why did we put the lead in in the first place? Our waning physical condition is corrected by the invention of 'fitness centres'; but why did we stop walking? And so on and so forth. But our thinking depends, as it always has done, on the senses; hence, in order to obtain the proper food for our thoughts, we have to rely on our natural, different sensual demands, rather than settling for the impoverished fare that we are standardly offered ever since modern society has forced us to rely on its artificially diversified input sources. With computers arriving on the scene, we are witness to (not surprisingly) the prompt advent of multimedia delivery systems, or so-called 'virtual realities', which promise to repair, by artificial and not-always-advantageous means (Biocca, this volume), the fading senses, and restitute our last, vital, 'missing link' to the outer world by our total, symbol-free immersion in a faked sensory experience (as described by Fogh Kirkeby and Malmborg, this volume). And the final result of it all? Not only does Harnad's Hermeneutic Hall replace the familiar Tower of Babel, but this development, being uniquely solitary and only falsely gregarious in character, turns all of us into solipsists in reverse. A corollary of the above is the emergence of a new perception of the Universal Mind: No longer is it the Big Unknown: it has taken shape before our very eyes as an externalisation of our own minds. No longer are we talking about the 'mind in the machine'; the vital question on the agenda is now that of the effects of the machine on
8 In the early days of AI, one of us had a friend who, in the Preface to his dissertation, remarked that, since this was a dissertation in AI, it properly should have been written by an intelligent machine...
the mind, and the resulting symbiosis of the mental and the physical: 'Of Minds and Men' ... in the Machines!

A GUIDED TOUR THROUGH THE INDIVIDUAL CHAPTERS

Based on the above, we want preliminarily to single out the following themes among the topics selected by the contributors to our book:

- using technology to empower the cognitively impaired (Goldstein, Leong, Clubb & Lee)
- the ethics versus aesthetics of technology (Krueger, Lindsay, Fogh Kirkeby & Malmborg, Gorayska & Marsh)
- the externalisation of emotive and affective life and its special dialectic ('mirror') effects (Janney)
- creativity enhancement: cognitive space, problem tractability (Boden, Good, Harnad, Krueger, Kunii, Chan, Tamura)
- externalisation of sensory life and mental imagery (Biocca, Fogh Kirkeby & Malmborg, Krueger)
- the engineering and modelling aspects of externalised life (Burstein & McDermott, Haberland, McHoul & Roe)
- externalised communication channels and inner dialogue (Good, Harnad, Heath, Kasif & Salzberg, Krueger, Lindsay, Littman & Eisenhardt, Roberts, Sillince, as well as Clubb & Lee)
- externalised learning protocols (Cox, Gorayska & Marsh, Kass, Burke & Fitzgerald, Schank & SzegO, Sillince)
- relevance analysis as a theoretical framework for cognitive technology (Gorayska & Marsh, Lindsay)

The above list is just a first approximation; more details will be provided below, where we take the readers on a guided 'walk' through the book's chapters, as these are grouped together in their appropriate sections. The chapters fall more or less naturally into two groups: one of a more general, theoretical type, the other dealing with specific, concrete cases and problems. Of the altogether 25 chapters (not counting the Introduction), almost one third (8) fall into the first group, while the remaining 17 make up the second. Each group of chapters has been divided into a number of thematically coherent sub-sections.
Theoretical issues of cognition, modeling, mental tools, and agents

Cognition

Barbara Gorayska & Jonathon Marsh (City University of Hong Kong and Hong Kong University), in their chapter 'Epistemic Technology and Relevance Analysis: Rethinking Cognitive Technology', raise the issue of changing goals in a quasi-familiar environment. What is 'new' in the new technology, they ask, and how does the mind react to the new 'superimposed structures'? They raise this issue from the point of view of the 'technologised mind', rather than (as has been done so far) from the angle of the human-friendly tool with its affordances on action (as in HCI, 'Human-
Of Minds and Men
17
Computer Interaction'). Both Gibson's (1979) direct realism in accounting for ecological perception, from where the idea of action affordance has been imported to HCI, and the current trends in HCI to treat action affordance in purely functional terms, leave some fundamental questions unanswered, viz.: (1) 'What causes a perceiving agent to attend to a particular set of stimuli to begin with?', and (2) 'How are affordance characteristics mapped directly onto the process of cognitive formation itself?.' Unless we answer these questions, the authors maintain, we will not gain real understanding of the process that enables meaningful interactions of agents with environments, nor will we be in a position to understand how environments shape our thinking. The theme of innovation is also one that haunts Ole Fogh Kirkeby and Lone Malmborg (Department of Computer and Systems Science, Copenhagen Business School, Denmark). In their contribution 'Imaginization as an Approach to Interactive Multimedia', they insist on the necessity of reflection in order to be able to produce innovation. This reflection takes the shape of 'mental images' that can be stored interactively, and anchored in what they call 'situated cognition', using multi-media technology. As there can be varying degrees to which multi-media technology supports reflection and image creation, the question arises whether it is at all possible to combine these different modes of interaction without one destroying the cognitive effects of the other. Frank Biocca (University of North Carolina, Chapel Hill, N.C., USA) raises the question: 'Can Virtual Reality Amplify Human Intelligence?', and considers, as part of the answer, the problems of 'Cognitive Extension, Adaptation, and the Engineering of "Presence"'. 
The crucial issue to be raised in this connection is whether this kind of 'presence' is a matter of technology only, as many proponents of Virtual Reality seem to believe; the problem is that nobody has yet defined what 'amplifying intelligence' really means.

Modelling & Mental Tools

In his contribution 'Patience and Control: The Importance of Maintaining the Link Between Those who Produce and Those who Use', David Good (Department of Social and Political Sciences, Cambridge University, England) observes that we must be careful to distinguish between 'indulging' the user and truly benefiting him or her. The problem is that the wrong technology (as also observed by Barbara Gorayska & Jonathon Marsh) may turn out to be detrimental to the user, not only individually, but also on a broader social scale. The new technologies lead to an ever diminishing authority and control of the speaker/writer over how technologies structure the environment, which context they are interpreted in, and which needs of the hearer/reader they therefore are able to satisfy. What can be learned (if anything at all) by those who use, Good asks, when the normative effect of direct and immediate social interaction with those who produce is gone?

Hartmut Haberland (Department of Language and Culture, Roskilde University, Denmark), in a take-off on an old adage, asks himself whether it is more fruitful to model the human on the machine, or the machine on the human ('"And ye shall be like machines" - or should machines be like us?'). He points out the importance of distinguishing between simulation and emulation, and shows how all analogy, if not checked, will in the end turn out to be a circular process. Models are meaningless unless they are grounded in direct experience. In our metaphorical effort to further understanding of both humans and machines, it is possible to model theories about the former by analogy to our perception of the latter, and vice versa. But the price we may have to pay for such visibility, Haberland warns, is that we will no longer know where to look for the meaning of either.

In his contribution 'Levels of Explanation: Complexity and Ecology', Ho Mun Chan (Department of Public and Social Administration, City University of Hong Kong, Hong Kong) observes that the daunting complexity of many tasks, and the seemingly paradoxical ability of the human mind to cope with them, contain a lesson for us when we are planning our cognitive environment on the computer: viz., by generalizing our assumptions about that environment, we are able to make it less complex, and easier to deal with. Machine-implemented general problem solvers are not possible for the same reason that no single human has ever been a general problem solver. What we can reasonably achieve, and should therefore strive for within the Cognitive Technology agenda, is the type of human-machine interaction that can solve a range of tractable problems in specific environments.
Agents

Margaret Boden (University of Sussex, Brighton, England), in her chapter 'Agents and Creativity', discusses aspects of creativity in a computerized environment. Her thesis is that true creativity consists of making new use of already existing components, rather than creating things ex nihilo. Since human agents are best at the former activity, our construction of a cognitive environment should aim at stimulating human creativity by facilitating access to new, unpredictable, conceptual formations generated by the computer, rather than force the user to adapt his/her creative élan to the machine's limitations.

Myron Krueger's (Artificial Reality Corporation, Cambridge, Mass.) chapter is called 'Virtual (Reality + Intelligence)'. Exploring the relationships that exist, or may come into being, between humans and machines, the author focusses on the relation of intelligence to physical reality, including the role that intelligence technologies can play in virtual realities. For Krueger, aesthetics is a higher measure of performance than efficiency, and he therefore chooses to consider success in establishing such relationships as a form of art. (Compare with the stance taken by Gorayska and Marsh.) In contrast with what most computer scientists, and indeed intellectuals of all persuasions, believe, it is Krueger's thesis that much of our cognitive intelligence is rooted in our perceptual intelligence, and that one therefore from the very beginning should seek to reintegrate the mind and the body: one should experience a computer program with one's body, rather than through the medium of keyboard input or interaction with a data tablet or mouse. Thinking along these lines, Krueger arrives at many of the ideas developed in what is now called 'virtual reality'; he is also able to predict a variety of ways in which virtual reality and cognitive technologies (including traditional AI) are going to interrelate in the next few years.
Applying insights from CT to individual problem areas

Communication
Roger Lindsay (Psychology Unit, Oxford Brookes University, Oxford, England) has named his contribution 'Heuristic Ergonomics and the Socio-Cognitive Interface'. He takes his point of departure in early approaches to the 'human factors' problem in HCI, and shows that such approaches fail because they only focus on the machine end of the problem - the impediment also discussed at length by Gorayska and Marsh. What is needed is an interactive approach in which the machines are allowed to interact with the human user on the latter's premises. Such a notion is close to the idea expressed by Haberland in his contribution: 'Whoever said that humans should be like machines?'; why not rather take the machines seriously as potential cognitive agents that humans can react to, and interact with, on human premises? For Lindsay, communication on human premises necessarily involves an ability to engage in a cooperative dialogue governed by a normative, ethical heuristic. Providing examples of ethical language and norms, the author defines the challenge to Cognitive Technology as the need to develop a 'social ergonomics'. The necessary parameters must be found not primarily in the physical, but in the socio-cognitive interface.

Research into the potential of human interaction with computers through simulation has targeted how to produce the cognitive changes that are necessary for proper learning. Alex Kass, Robin Burke & Will Fitzgerald (Northwestern University, Evanston, Ill., USA, and University of Chicago, Chicago, Ill., USA) suggest in their contribution 'How to Support Learning from Interaction with Simulated Characters' that interfacing students with practices and experiences that are embodied in a computer-based learning environment can open the way for the natural acquisition of communicative skills in everyday situations; they also report on results obtained with 'educational interactive story systems'.
For these authors, the first and foremost undertaking for Cognitive Technology, if it is to maximise the benefits arising from the effects of tools on human cognition, is to build computer systems that match, in a fundamental manner, the ways people learn.

Richard W. Janney (Department of English, Johann Wolfgang Goethe University, Frankfurt am Main, Germany), in his chapter 'E-mail and Intimacy', suggests that the apparent lack of restrictions on communications that are observed in a medium that otherwise imposes severe restrictions may be explained by a special type of interaction in communication: the 'virtual partnership' that is exercised in electronic mail, and which allows us to cross an 'email-intimate' threshold that normally would not allow us to interface with other users this closely. If this partnership is to realise the strong hopes formulated by McLuhan (of which Janney reminds us), viz., that one day electronic technology will follow directions which are not only socially unifying but above all humanly satisfying, the need, and promise, of today's Cognitive Technology is to find the right balance between technology and experience.

The subject of thresholds of communication is also the subject of the next contribution: 'Communication Impedance: Touchstone for Cognitive Technology', by Robert Eisenhardt and David Littman (SENSCI Corporation, Alexandria, Va., USA, and Advanced Intelligent Technologies, Ltd., Burke, Va., USA). The authors ask themselves: What can go wrong in computer communication? For an answer, they hypothesize that computers lack the human capacity of detecting potential communication failures before they arise, thus preventing the occurrence of 'impedance' in the communicative chain. The problem, being computer generated, needs to be solved by means of the computer, which is what the authors set out to do: a practical Cognitive Technology, they claim, has to result in development tools that would take it far beyond a mere theoretical curiosity or a handbook of design heuristics.

Education
Among the applications of CT to problems of daily life, endeavours in the educational sector have a high standing, both historically and content-wise. Kevin Cox (Department of Computer Science, City University of Hong Kong, Hong Kong), in his chapter 'Technology and the Structure of Tertiary Education Institutions', takes up the challenge thrown out by Kass, Burke & Fitzgerald in their chapter: how can the computer assist us in making education better, and more accessible to users? Computers, he answers, have the ability to help structure cognitive environments which are both closer to the users and allow them to be physically absent (both in space and in time) from the location of the educational practice, thus revolutionising our concept of 'schooling' as bound to a particular phase or location of a person. This favorable view is in contrast with David Good's more cautious outlook on computer assisted learning.

Orville L. Clubb and C. H. Lee (Department of Computer Science, City University of Hong Kong, Hong Kong) are involved in a project aimed at developing a telecommunication device that will allow Chinese hearing impaired users access to the information networks available to users of Roman characters. In their contribution, 'A Chinese Character Based Telecommunication Device for the Deaf (TDD)', they investigate how the appropriate infrastructures can be provided in order to develop an interactive telecommunications service for Hong Kong, and perhaps in the future, for Mainland China as well. A prototype for such services has been developed and is described.

The next contribution deals with aspects of another impairment, blindness, when viewed from a cognitive technological viewpoint. Laurence Goldstein (Department of Philosophy, Hong Kong University, Hong Kong) investigates the theoretical implications of 'Teaching Syllogistic to the Blind' - a teaching which normally (in the case of sighted people) is done with the help of visual aids, such as Venn diagrams.
The author introduces Sylloid, a tactile device invented by himself, and discusses practical problems arising from its application. The important question to which Goldstein draws our attention is what such an effort can teach us with regard to the normal functioning of the human cognitive/sensory system, and what pedagogical inferences can be drawn.

C.K. Leong (Department for the Education of Exceptional Children, University of Saskatchewan, Saskatoon, Canada) discusses the implications of computer-mediated reading and text-to-speech conversion systems, designed to enhance reading. His chapter 'Using Microcomputer Technology to Promote Students' "Higher-Order" Reading' consists of a theoretical part, in which certain fundamental notions are discussed (such as the principles of 'automaticity' and 'compensation'), and a practical study of the results obtained in using an advanced computerized text-to-speech system (DECtalk) in working with below-average readers in grade school. The author believes, along with others quoted, that, due to the 'unnaturalness' of reading on-line and the complexity of reading and listening comprehension (among other factors that may also intervene), the pros and cons of computer-mediated reading will have to be appraised carefully before we can be certain of the conditions under which this particular mediation is helpful.
Planning

Mark Burstein and Drew McDermott (Bolt, Beranek & Newman, Cambridge, Mass., USA; Department of Computer Science, Yale University, New Haven, Conn., USA) discuss 'Issues in the Development of Human-Computer Mixed-Initiative Planning'. Mixed-initiative systems allow humans and machines to collaborate in planning; mainly, they allow the machine to suggest possibilities that the human user may not have thought of. In a productive synthesis, humans and machines can obtain 'synergistic improvements' in the planning process. The authors discuss what kind of multi-agent technology is most suitable from a cognitive-technological viewpoint. They believe that, in contradistinction to the world view of traditional AI, designers of cognitive technology tools must recognise and accept the fact that real life mixed-initiative planners operate in unstable environments; the participants will fight back if they need to, but most of all they can be made to actively collaborate.

In their contribution 'Committees of Decision Trees', David Heath, Simon Kasif, and Steven Salzberg (Department of Computer Science, Johns Hopkins University, Baltimore, Md., USA) attack the problem that besets the decision maker when he/she is dealing with pieces of evidence that have to be assigned different weights. In such a case, expert opinion is invaluable; but what to do if the experts disagree? A 'committee approach' is suggested that allows us to proceed with greater accuracy than when we have to rely on a single expert opinion.

Learning how to deal with your problems, and how to plan, not so as to prevent them from coming up, but to learn from them while you look around for a solution, is the theme of Roger Schank and Sandor Szego's chapter, entitled 'A Learning Environment to Teach Planning Skills'.
It is the authors' conviction that the usual school teaching only serves to suppress and kill any desire for true learning that the students may have had; the computer can help us restore the old learning environment, favoured also by Good, where teacher and student interacted on a one-to-one basis. The particular instrument for teaching planning is called a 'goal-based scenario' (GBS); a concrete application is worked out in some detail.
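The 'committee' idea mentioned above in connection with Heath, Kasif and Salzberg's chapter can be conveyed in outline with a small sketch of our own: several simple decision trees (here reduced to one-split 'stumps'), each trained on a different random sample of the evidence, vote on a classification, and the majority verdict is taken. The dataset and the stump construction below are invented for illustration; the authors' actual algorithms and experiments differ.

```python
import random

# Invented toy evidence: points (x, y) labelled 1 when x + y exceeds 1.0.
DATA = [([x, y], int(x + y > 1.0))
        for x in (0.0, 0.4, 0.8, 1.2) for y in (0.0, 0.5, 1.0)]

def train_stump(sample):
    """A one-split 'decision tree': choose the feature/threshold pair that
    classifies the training sample best (search over observed values)."""
    best = None
    for f in range(2):
        for v, _ in sample:
            t = v[f]
            acc = sum((x[f] > t) == bool(lbl) for x, lbl in sample) / len(sample)
            if best is None or acc > best[0]:
                best = (acc, f, t)
    _, f, t = best
    return lambda x: int(x[f] > t)

def committee(data, n_trees=7, rng=random.Random(0)):
    """Train n_trees stumps on bootstrap samples of the evidence; classify a
    new case by majority vote of the committee."""
    trees = [train_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]
    return lambda x: int(sum(tree(x) for tree in trees) > n_trees // 2)

vote = committee(DATA)
print(vote([1.2, 1.0]), vote([0.0, 0.0]))
```

The point of the sketch is only that disagreement among imperfect 'experts' is resolved by aggregation: each stump errs on different regions of the data, and the vote tends to be more accurate than any single tree.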
Applied Cognitive Science

Tosiyasu Kunii (The University of Aizu, Aizu-Wakamatsu, Japan) remarks that human cognition has suffered from computer dominance for as long as we have had computers. It is time, he says, in his contribution on 'Visual Recognition Based on a Differential Topology of Feature Shapes', to reverse the roles, and examine how cognitive technology can help and enhance human cognitive processes. It is shown that the most effective technology is also the most abstract one; several examples are discussed.

'Is There a Natural Readability?' is the question authors Alec McHoul and Phil Roe (School of Humanities, Murdoch University, Murdoch (Perth), Western Australia) ask themselves in their chapter on 'Hypertext and Reading Cognition'. It turns out that this notion is open to serious questioning, and that readability as such does not exist prior to the technologies that facilitate reading and make it possible. However, since reading itself is a (cognitive) technology in its own right, it is over-optimistic and at any rate premature to expect saving graces to be inherent in pure technology-inspired efforts at enhancing readability (such as Hypertext).

Hiroshi Tamura (Department of Information Technology, Kyoto Institute of Technology, Kyoto, Japan) has done a comparative study of 'Verbal and Non-Verbal Behaviors in Face-to-Face and TV Conferences'. His finding is that, contrary to expectation, the use of TV in remote conferencing has not enhanced communication; more factors need to be explored, such as the difference between private and business communication, the role of the non-vocal channel, and so on. A model has been developed for the analysis of conference participants in various modes.

The question which John A. A. Sillince (Department of Computer Science, University of Sheffield, England) invites us to consider is: 'Would Electronic Argumentation Improve Your Ability to Express Yourself?' He points out that the advent of electronic environments raises the challenge for us to discover in what ways, and to what extent, humans can gainfully use computer support in order to enhance their quality of argumentation. There is always a trade-off when new technologies enter the human working-space: more knowledge may result in overload, multifarious connections in confusion, and so on. Several hypotheses are drafted, intended to capture the pros and cons of technological assistance in arguing. Special attention is given to problems of 'asynchronicity', especially in remote discussions.
In 'Shared Understanding of Facial Appearance - Who Are The Experts?', Tony Roberts (Department of Psychology, University of Southampton, England) explores the effect of introducing an 'expert' computer into a situation where people are trying to communicate about facial appearance, e.g., where a witness to a crime may be trying to help the police by looking at mugshots. In the experiment reported, the assumed level of involvement of the computer system used was varied systematically between two groups of participants. Those in the 'expert system' group were significantly less effective in identifying the correct face. Roberts argues that we rely on shared understanding of categories of facial appearance in such situations, and that assumptions about the role of the computer in the loop serve to disrupt this subtle aspect of communication.

The book closes on an optimistic note from Stevan Harnad (Cognitive Sciences Centre, Department of Psychology, Southampton University, England), thus directly counterbalancing the pessimism expressed by McHoul & Roe, as well as Sillince's scepticism. In his contribution 'Interactive Cognition: Exploring the Potential of Electronic Quote/Commenting', he draws attention to certain unnoticed, subtle but potentially revolutionary changes that have evolved with the advent of electronic communication. In the traditional forms of communication, the speed of exchange is often either too fast (oral medium) or too slow (written medium). Email, and what Harnad has dubbed 'scholarly skywriting' (i.e., email discussion lists), together with hypermail archives and links to the virtual library, have opened up new doors for learned inquiry as well as for education, and blazed new paths in the exploitation of the human brain's potential.
Among these new features, several are found that no prior medium has made possible; this holds in particular for the 'text-grabbing' option, called 'Q/C', that allows one to quote, and comment on, pertinent excerpts from previously read texts. Harnad describes a possible series of studies that would need to be done in order to convincingly demonstrate the potential of the Q/C feature. In many respects, knowledge building, though cumulative and often collaborative, has been largely the work of 'cognitive monads'. 'Skywriting' facilitates a form of interactive cognition in which the monadic boundaries rapidly dissolve in Q/C iterations that have the flavour of a fast-forwarded recapitulation of the ontogenesis of knowledge; in this process, the identities of the individual thinkers get too blurred to be sorted back into monadic compartments.

REFERENCES
Arndt, Horst and Richard W. Janney, 1987. InterGrammar: Towards an integrative model of verbal, prosodic and kinesic choices in speech. Berlin: Mouton de Gruyter.
Diels, Hermann, 1899. Fragmente der Vorsokratiker. Berlin: Teubner. (7th ed. in 3 vols. by W. Kranz, 1954)
Gibson, James J., 1979. The ecological approach to visual perception. Boston, Mass.: Houghton Mifflin.
Gorayska, Barbara, 1993. Reflections: A commentary on 'Philosophical implications of Cognitive Semantics'. Cognitive Linguistics 4(1): 47-53.
Gorayska, Barbara and Roger Lindsay, 1989. On relevance: Goal dependent expressions and the control of action planning processes. Research Report 16. School of Computing and Mathematical Sciences, Oxford Brookes University, UK.
Gorayska, Barbara and Roger Lindsay, 1993. The roots of relevance. Journal of Pragmatics 19: 301-323.
Gorayska, Barbara and Roger Lindsay, 1994. Towards a general theory of cognition. Unpublished MS.
Harnad, Stevan, 1990. The symbol grounding problem. Physica D 42: 335-346.
Kirk, G. S., 1954. Heraclitus: The cosmic fragments. Cambridge: Cambridge University Press.
Koestler, Arthur, 1964. The act of creation. London: Hutchinson & Co. (Reprinted by Penguin Books: Arcana, 1989)
McHoul, Alec, 1995. The philosophical grounds of pragmatics (and vice versa?). (Submitted for publication, Journal of Pragmatics)
McInerney, Jay, 1993. Brightness falls. New York: Vintage.
Mey, Jacob L., 1984. 'And ye shall be as machines...' Reflections on a certain kind of generation gap. Journal of Pragmatics 8: 757-797.
Mey, Jacob L., 1985. Whose language? A study in linguistic pragmatics. Amsterdam & Philadelphia: John Benjamins.
Mey, Jacob L., 1987. CAIN, and the transparent tool, or: Cognitive Science and Human-Computer Interface. In: Proceedings of the Third Symposium on Human Interface, Osaka, 1987, pp. 247-252. (Japanese translation in Journal of the Society of Instrument and Control Engineers (SICE-Japan) 27(1), 1988)
Mey, Jacob L., 1993. Pragmatics: An introduction. Oxford: Blackwell.
Mey, Jacob L., 1994. Adaptability. In: R. Asher & J.M.Y. Simpson, eds., The Encyclopedia of Language and Linguistics, Vol. 1, 265-27. Oxford & Amsterdam: Pergamon/Elsevier Science.
Mey, Jacob L. and Hiroshi Tamura, 1994. Barriers to communication in a computer age. AI & Society 6: 62-77.
Piercy, Marge, 1990. He, She and It. London: Fontana.
Sohn-Rethel, Alfred, 1972. Geistige und körperliche Arbeit: Zur Theorie der gesellschaftlichen Synthesis. Frankfurt am Main: Suhrkamp. [1970]
Sohn-Rethel, Alfred, 1978. Intellectual and manual labour: A critique of epistemology. Atlantic Highlands, N.J.: Humanities Press.
Sperber, Dan and Deirdre Wilson, 1986. Relevance: Communication and cognition. Oxford: Blackwell.
Weber, Max, 1950. The Protestant ethic and the spirit of capitalism. New York: Charles Scribner's Sons. (Engl. tr. by Talcott Parsons of: Die protestantische Ethik und der Geist des Kapitalismus. Archiv für Sozialwissenschaft und Sozialpolitik 20/21, 1904-1905)
Whorf, Benjamin L., 1969. Language, thought and reality. (Selected writings, ed. John B. Carroll). Cambridge, Mass.: MIT Press. [1956]
THEORETICAL ISSUES
COGNITION
Cognitive Technology: In Search of a Humane Interface
B. Gorayska and J.L. Mey (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 1

EPISTEMIC TECHNOLOGY AND RELEVANCE ANALYSIS: RETHINKING COGNITIVE TECHNOLOGY

Barbara Gorayska
City University of Hong Kong
[email protected]
Jonathon Marsh
Hong Kong University
[email protected]

INTRODUCTION
It is a disturbing thing to find oneself attempting to describe a set of novel ideas. It is impossible to avoid a strong sense of self-doubt and a nagging feeling that one is really just reworking old ground. At times the notion that there is nothing really new about the ideas one is struggling with seems inescapable. Then, after much re-examination of the possibilities, the sense of novelty not only persists but continues to grow. Such has been the case with our exploration of the idea of cognitive technology as a distinct field of enquiry. The difficulty is that, while similarities to the widely studied areas of ergonomics and human computer interaction (HCI) are inescapable, the differences seem equally obvious. Gorayska and Mey (1995) have made an attempt to detail these differences.
The paradigm we are going to propose takes as its key focus a specification of how, and to what extent, human construction of environmental artifacts bears on the operations and structure of the human mind. Notable is the change of direction of the influence from a) the mind shaping the external world by virtue of its mental interpretation processes, to b) the external world's acquired structure shaping the mind. (Ibid.)

However, much more is needed if the idea of cognitive technology as a new field of enquiry is to come to fruition. To that end Gorayska and Mey have outlined the discipline by indicating four primary areas of investigation:
1) The nature of, and changes in, the processes of access to information now made available through technological advances;
2) How the interaction between humans and technological devices in the realm of information processing influences, from a pragmatic point of view, cognitive developments in humans;
3) Social and moral issues underlying cognitive developments as affected by modern delivery systems;
4) The feedback effect of such influences and interactions on future advances in Information Technology.

While not denying the importance of these issues, we want to expand their frame of reference and further define the novel aspects of the approach, but only in so far as we place greater emphasis on the direct and generative relationship between mind and technology.

We begin by taking a closer look at the adopted terminology. The term cognitive technology may be too narrow for our intentions. It serves well to describe those issues which deal with determining approaches to tool design meant to ensure integration and harmony between machine functions and human cognitive processes. Unfortunately, it does not adequately describe a number of other issues, in particular, those concerns which relate to the identification and mapping of the relationship between technological products and the processes by which human cognitive structures adapt. We see these two types of issues as constituting related but distinct areas of investigation which are best kept separate but must be given closely aligned treatment. We therefore reserve the term Cognitive Technology to refer to methodological matters of tool design, and propose the term Technological Cognition to refer to theoretical explorations of the ways in which tool use affects the formation and adaptation of the internal cognitive environments of humans. Human cognitive environments are constituted by the set of cognitive processes which are generated by the relationships between various mental characteristics. These environments serve to inform and constrain conscious thought.
Under the new schema, theoretical developments in Technological Cognition would find concrete expression in the constructed artifacts produced by Cognitive Technology. It is this dichotomy which forms the basis for our argument and the grounds from which we develop a framework for analysis. Taken together, Technological Cognition and Cognitive Technology (henceforth referred to as TC/CT) involve the study and construction of human-tool interfaces which exploit and/or amplify the processing capabilities of one or the other, such that the cognitive capabilities of the pairing involve a radical departure from those inherent to each separately. They invoke an Epistemic Technology concerned with outputs from the relationship between the structure of thought and the form of the environment in which thought occurs.

THE COGNITIVE ENVIRONMENT, AFFORDANCE, AND TC/CT

The assertion of Gorayska and Mey (1995) that

a) the human mind and the world are interrelated in intricate and inseparable ways,
therefore

b) the structure given to the human-fabricated environment must have a profound influence on the structure of the mind
remains central to the purposes of this paper. However, their argument further implicates the need for greater consideration of the processes which govern current approaches to designing the fabricated environment, such that the construction of our internal cognitive environments is optimally benefited. It is arguable that enquiry into the manipulation of cognitive environments by technological means ought to begin with the premise that every tool is an embodiment of all the tasks which can be successfully accomplished using that tool.

The critical underlying idea that there is a recursive effect from the fabricated environment on the structure of mind is not new. Joseph Weizenbaum (1983) argued for the importance of considering this effect in the study of Artificial Intelligence. In the study of perception, notably visual perception (discussed extensively in Bruce and Green, 1990), connectionist models have been built which, based on the work of Marr and Poggio (1976), attempt to map the cognitive ability to represent and recognize external objects directly to patterns of neuronal stimulation. Such work has produced some interesting and useful ways of analysing the mechanisms by which inputs to human perceptual faculties are acted upon cognitively to form recognizable constructs and ultimately abstract conceptual frameworks. Similarly, work in psychology, notably that of the early Gestalt psychologists (Wertheimer, 1923; Koffka, 1935; Köhler, 1947), has provided us with useful models of how information is processed and sorted once it has been attended to. However, what is of greater interest to Epistemic Technology, understood in terms of the TC/CT relationship, is the fundamental question of what causes a perceiving agent to attend to a particular set of stimuli to begin with. Without an answer there can be no understanding of the process by which meaningful interactions with the environment are enabled.
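The connectionist claim referred to above, that patterns of stimulation can be mapped directly onto recognizable constructs, can be illustrated with a toy sketch of our own devising (it is not Marr and Poggio's model): a handful of stored binary 'constructs' are recalled by the dot-product match at the heart of simple associative memories, so that a noisy stimulation pattern is recognized as the construct it most overlaps.

```python
# Toy illustration of pattern-to-construct mapping. The 3x3 binary patterns
# ('bar', 'cross') are invented for the example; each tuple is a flattened
# grid of 9 stimulation units.
CONSTRUCTS = {
    "bar":   (1, 1, 1, 0, 0, 0, 0, 0, 0),
    "cross": (0, 1, 0, 1, 1, 1, 0, 1, 0),
}

def overlap(a, b):
    """Match score: count of units that are co-active in both patterns."""
    return sum(x * y for x, y in zip(a, b))

def recognize(stimulus):
    """Return the stored construct with the greatest overlap with the input."""
    return max(CONSTRUCTS, key=lambda name: overlap(CONSTRUCTS[name], stimulus))

noisy_cross = (0, 1, 0, 1, 1, 1, 0, 0, 0)  # the cross with one unit missing
print(recognize(noisy_cross))  # -> cross
```

The sketch makes the direction of explanation concrete: recognition here is nothing over and above the pattern of stimulation and the stored traces it resonates with, which is precisely the kind of account that leaves the attention question in the text unanswered.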
It is easy to attribute greater or lesser degrees of attention to obvious imperatives such as hunger, survival, comfort, or sexuality. The problem lies in trying to establish the mechanisms by which these imperatives themselves become consciously recognized and responded to in increasingly more purposeful ways. The assumptions underlying our approach to this problem conform with those of connectionist thinking only in so far as we accept that the environment of a perceiving agent dictates to one degree or another the perceptual constructs which can be elicited from it. Koffka (1935) expressed the idea clearly when he wrote of the 'demand character' within perceivable objects which depicts and communicates inherent functionality to a perceiving agent. Gibson's (1979) ecological approach elaborated the concept further by arguing that within any perceived environment only a finite set of perceptual experiences are possible. He proposed that inherent within any environment there exists a set of'affordances' whereby the characteristics of that environment allow certain possibilities for action to exist (for a detailed account see Warren, 1984; for applications see Norman, 1988). These in turn, when instantiated, serve to condition the characteristics of the environment. Such affordances can be said to be operational whether or not they are actively perceived to be so by a perceiving agent. Consequently, the process of analysing affordances becomes essential to gaining an understanding of the functional value of an environment or sets of objects within that environment (i.e., tools). The notion of affordances, and the ecological model of perception it embodies, remain interesting and useful to TC/CT. They support the analysis of tool use and artifact construction in terms of perceived functionality and disposition of mind. The
B. Gorayska and J. Marsh
TC/CT approach likewise assumes that environments are commonly perceived in terms of their potential for action. However, it is further concerned with how perceptual capabilities are themselves modified by environmental constraints on action. It should be made clear that we do not accept all aspects of Gibson's thinking. Notably we disagree with his claim that all perception can be understood without reference to linguistic or cultural mediation. Such direct realism does not allow for the relationship between perception and mental representation as a constructive effort mediated by cognitive processes. Instead it leads to the idea of perception as a direct phenomenon of mind which occurs, strictly reactively, as a result of exposure to the environment. This limits any attempt (Norman, 1988; Gaver, 1991; Eberts, 1994) to use analysis of affordances to forward understanding of how environments shape our thinking. Exploration of optimal approaches to tool design is further limited by the inability to map affordance characteristics directly to the process of cognitive formation itself. Ultimately, these limitations must restrict analysis of human/technology interactions to an examination of how affordances relate to the brain's ability to perceive the inherent functionality of a set of tools, leaving unaddressed the issue of how tool use serves to fundamentally alter the shape of the mind.

TC/CT: IMPLICATIONS FOR INFORMATION SYSTEMS
TC/CT is naturally concerned with the effect of conducting the examination of affordances solely in terms of functionality. Of particular interest is how such a restriction has influenced our approach to the design of electronically mediated information tools. This restriction must condition the way system designers perceive their aims with respect to providing usable systems. Perhaps more importantly it must also condition the way they envision themselves in their role as providers of such systems. Heuristics for designing human-computer interactions have become dominated by an apparent concern for 'human factors' (Van Cott and Huey, 1992). This is evidenced by the fact that the rhetoric of design practice has become focused on facilitating and improving human ergonomics, human cognition, and human to human dialogue in cooperative task performance. The design imperative is to "devise new and better ways to expand human capacity, multiply human reasoning, and compensate human limitations" (Strong, 1995: 70). This assumes the idea of the user as central to system design (Norman and Draper, 1986). We believe this assumption is in conflict with the above mentioned functionality driven approach and hence is rendered unrealizable. System designers are not system end users. By virtue of their adopted role, and their functionality oriented perceptions of that role, system designers can only deal in matters of construction. This situation must constrain their thinking about what they do and what system users' expectations of them are. Even if system users are actively involved in matters of design (as in Gould and Lewis, 1985; Kyng, 1994) their contributions are only considered in terms of system usability; hence the users themselves become designers who contribute to the interests of the system.
This concern for the usability of system products must cause the concern for human issues to become quickly reduced to engineering issues (Norman, 1986; Rasmussen, 1988; Bailey, 1989). These in turn must reduce to machine issues. The immediate consequence of these unintentional reductions is a growing tendency to perceive the end user strictly as a customer of the computer industry. Hence the benefits of
Epistemic Technology
improved information tool interfaces are increasingly marketed solely in terms of functional benefits such as 1) faster response time, 2) reduced learning time, 3) fewer errors, and 4) improved customer service, all of which are globally justified by an appeal to improved cost/benefit ratios (Curtis and Hefley, 1994). Such product oriented thinking ultimately reflects a tacit determination of value as improved efficiency in the workplace. Unfortunately it makes no reference to specific ways in which the individuals who must actually use the products may benefit. The situation is reminiscent of the critique made by Ellul (1965) of the cosmetic industry as providing a real solution to an artificially constructed need. Cognitive models may be considered with reference to the design process; however, these models tend to be considered only in terms of machine ends. That is to say, users are seen to be transformed by machine use only in so far as they become more adept at that use. Ironically, despite rhetoric to the contrary, consideration of the ways in which human capabilities themselves may be amplified rarely finds concrete expression in machine functions. Another unavoidable consequence of a functionality driven approach to human-computer interaction design is that the computer product, if it is to be usable, must look and feel good. This demands the construction of computer-mediated environments which closely reflect the perceptual interactions we are normally at ease with. A lot of effort is currently being expended on generating feelings of comfort. The route generally taken is to incorporate a variety of modalities, such as sound, graphics, video, text, or animation, and to explore the use of common metaphors (desktop, blackboard, workbench, etc.) in order to ensure that system functions are not only easily understood but also entertaining.
In this context the design focus shifts onto the nature of interactivity itself and how it is controlled/conditioned by successful communication. On the one hand, resulting designs often involve the user as an "actor" (Laurel, 1991) within the machine-mediated environment, while on the other hand, they cause the machines themselves to be perceived as social agents by their users, for reasons well explained by Nass, Steuer and Tauber (1994). It is obvious how this relationship is further reinforced when the interface begins to simulate human linguistic behaviour supported by human facial expressions (as in, e.g., Walker, Sproull and Subramani, 1994; Takeuchi and Nagao, 1993). The computer industry is thus involved in making business more competitive, often by either exploiting an illusion of human-human interaction or appealing to the mechanisms of social play. This situation represents an explicit reversal of the stated aims of TC/CT, which are ultimately concerned with amplifying the effectiveness of interactions between humans and not simply between humans and machines. Without wishing to appear overly dramatic, we wonder if there is an indication here that ethics are in danger of being traded in for aesthetics (cf. Krueger, this volume). With respect to the value of analysing the affordances projected by tools, it is important to consider the fact that the computer industry is also involved in, and to some degree depends on, the production of new knowledge. This calls for increasingly more powerful technologies "to significantly augment the skills that are necessary to convert data into information and transform information into knowledge" (Strong, 1995: 70). Once again the argument for a design process driven by usable outcomes is invoked.
"[T]his knowledge and these skills must be translated into effective design, design not merely of graphical displays, but initial design that takes into account users and constraints in such a way that the later changes are not necessary and users can
immediately employ the products" (Ibidem). Seen this way, the user is at risk of being forced into the role of a mere consumer of knowledge and not one of an active participant in the process of constructing knowledge. Consequently, in spite of the rhetoric extolling the virtues of interactivity, a contradictory assumption remains operational in the design process: that is, that information which is machine-generated and passed to a user will miraculously, by virtue of contact alone, become that user's knowledge, when in fact it may remain nothing more than another piece of information to be dealt with. Such an approach to system design cannot be effective. The logical outcome must be a proliferation of information pollution. For example, despite ever improving interfaces, it is becoming increasingly more difficult to find one's way around in a coherent and meaningful way on the internet. Distraction from purpose is commonly experienced by users who find that the wealth of information and readiness of access render selective searching problematic. Paradoxically, the task of becoming well informed for the purpose at hand is often hindered more than it is helped. The problem lies with the explosion of usability factors precipitated by a product oriented approach to system design. Functionality is interpreted in terms of a one-to-many relationship between a developing system and its users. Hence any consideration of the mental models constructed by users to accommodate the system's functional value is conducted strictly with reference to how well they relate to the system and how well they meet its intended purpose. Typically during test phases numerous users are observed in order to determine how effective they are at taking advantage of the system's functions.
On the basis of the information gained the system is then modified to narrow the margin between the system's functions and the conceptual models which the users have of that system, further reifying the need for those functions. The approach thus remains cyclical and self-fulfilling. Concern for how the design of the system works to transform the user is trivialized, if not lost entirely, and is understood only in terms of the system itself and not in terms of the users. We contend that in reality an inverse relationship is at work. There is a many-to-one relationship between systems (or tools) and users. TC/CT is about orchestrating the influences of those systems on the cognitive modeling capabilities of users so as to optimize human benefits.

TC/CT AND THE PHENOMENON OF ATTENTION

Gibson's theoretical framework, which appears to underpin current approaches to usability, cannot serve to explain the phenomenon of attention as it relates to perception because 1) even the simplest environment can be perceived in a variety of ways albeit according to its affordances, and 2) it lacks reference to the internal processes of cognitive formation. Attention is governed by what matters most to the perceiving system at the precise moment of perception. It may fluctuate rapidly within an apparently stable communication event and, consequently, may appear to be unfocused and disjointed, perhaps giving the appearance over time of 'inattention'. However, on a moment to moment basis there is always something which is capturing the 'processing' attention of the perceiving system. Attention then can be described first in terms of longevity (i.e., the length of time a set of perceptions remains in focus) and second in terms of intensity (i.e., the degree to which cognitive processing capabilities are brought to bear on the object of perception). Within this scheme, by
adopting a connectionist model of cognitive processing, intensity can be determined through a binary representation of the presence or absence of response across a varying number of processing nodes. It need not be thought of in terms of varying degrees of response or activity within a given perceptual or cognitive faculty. We believe that a consideration of the manner and degree to which any given perceptual input gains and sustains attention is fundamental to the development of heuristics for information tool design which successfully account for human factors. This process we hold to be determined by the degree to which the system is able to assign relevance to perceptions. Inversely, since attention can be said to signal relevance, the question as to what determines relevance becomes pivotal.

RELEVANCE AS THE ANVIL OF ATTENTION

By virtue of the way it is structured, any tool carries a potential for releasing and mediating the mental processes of association which construct varying motivational states within humans. It follows that it also contains the potential for triggering a search for effective action sequences which are perceived or known to be able to satisfy those states. Lindsay and Gorayska (1989, 1993, 1994) have proposed a framework well suited to the analysis of these processes of association. It gives primacy to the notion of relevance as an essential theoretical construct which underpins all human symbolic-action planning and goal management processes.1 Relevance can be defined simply as the relationship between goals, action plans, and action plan elements:
E is relevant to G if G is a goal and E is a necessary element of some plan P which is sufficient to achieve G.

Within a relevance driven analytical framework, the emergence of rational, purposeful behaviour is thus accounted for as an output of fine tuning of goals (i.e., cognised motivational states) to effective action sequences (i.e., connecting cognised plans for achieving goals to appropriate motor movements). Such tuning is further conditioned by the extent to which a perceiving agent is able to recognize the utility of all objects and events necessary to the occurrence of an effective action sequence. It is this process of fine tuning which we hold to determine attention. From a generative perspective, the relationship which governs the instantiation of relevance can best be understood in terms of a governing global relevance metafunction (RMF) (Gorayska et al., 1992). The purpose of the RMF is to act as an interface control mechanism (or possibly as a narrow bandwidth communication channel) between various cognized associative groupings and/or related search processes within functionally distinct cognitive subsystems. Simply formulated as:
[subjectively] relevance(Goal, Element-of-plan, Plan, Agent, Mental-model)

the RMF can return values for all of its parameters, depending on the initial inputs. When supported by goal management, external feedback, and hypothesis
formation/confirmation, the function can account for the positive adaptation of minds to minds or minds to environments. Interestingly, it is also possible to envision the more fundamental process of cognitive representation itself being represented in terms of a recursive application of the RMF. Unfortunately, despite its importance to TC/CT investigations, elaboration of this point is beyond the scope of this chapter. The utility of the RMF is immediately obvious when we consider mere recognition of the goals, plans for action, and environments captured by a perceiving agent in cognitively represented world models. What is less obvious, but much more important, is that, due to its iterative and recursive nature, the RMF also allows for the initial cognition of the motivational states, motor movements, and environmental percepts from which goals can be derived. Necessary to investigations of TC/CT is the realisation that cognitive goals, so derived, are not stable over time but are constantly generated, modified, clarified, specialised, prioritised, and forgotten. It is our contention that, fortified by the RMF, relevance analysis provides sufficient adaptability as a theory to allow for such instability without losing any of its explanatory value. As such, it provides an ideal framework within which to situate the study of TC/CT as it applies to activity within a variety of disciplines. The assumptions underlying the above have been reflected in, and supported by, work in cognitive science in general, and Artificial Intelligence (AI) in particular. Both these disciplines find little difficulty in successfully accounting for goal seeking behaviour, once the goals of an organism or device are known and the relevance of individual objects and events which contribute to effective action sequences is established; that is, once problem spaces (Newell and Simon, 1972) have been generated.

1 How this framework differs from the widely accepted Relevance Theory of Sperber and Wilson (1986) has been explained in Gorayska and Lindsay (1993, 1995), Mey (1995), Lindsay and Gorayska (1995), Zhang (1993), and Nicolle (1995). Furthermore, Zhang (1993) has produced a formalised account of optimal relevance vis-à-vis goal satisfaction and activation, using this framework.
It is not a coincidence, we believe, that nearly all the endeavours in AI to date have focussed on human and/or machine action plan generation. The questions which still remain unresolved are more fundamental. They are 1) 'Where do our goals come from?' (Wilensky, 1988), and 2) 'How is the relevance of elementary objects and events established prior to the formation of effective action sequences that satisfy these goals?' (Lindsay and Gorayska, 1994). Through an application of relevance analysis, TC/CT seeks to answer these questions by providing a method for examining human sensitivity to the structures superimposed on our cognitive apparatus by the fabricated environment. This can only be done in conjunction with feedback mechanisms which register changes in degrees of satisfaction with respect to currently detected needs. Such feedback is necessary because structure in the environment guides the formation of mental schemata by dictating what can be accomplished successfully within the limits of that structure. Without the presence of such feedback mechanisms thought would be entirely conditioned by the affordances supplied by the environment. All capacity to modify environmental constraints towards meaningful ends would be negated. The fabricated environment must output feedback which primarily affects, positively or negatively, the generation and modification of the perceiving agent's goals and not only plans for action. Such goals are instrumental for the wants and needs which serve to construct human conscious awareness. In this context, the RMF constitutes a base construct from which cognitive formation mechanisms can be derived. These in turn generate the mental schemata needed to account for the ability to cognise problem spaces, activate goal seeking behaviour, and transform the problem spaces into the corresponding solution spaces.
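The relevance relation defined earlier ("E is relevant to G if G is a goal and E is a necessary element of some plan P which is sufficient to achieve G") lends itself to a toy encoding. The sketch below is our own illustration, not anything proposed by Lindsay and Gorayska or part of the RMF formalism: plans are modelled crudely as sets of named elements, every listed element is treated as necessary to its plan, and a registry maps each goal to its sufficient plans. All names ("boil-water", "kettle", etc.) are invented for the example.

```python
# Toy encoding of the relevance relation (illustrative only):
# E is relevant to G iff E belongs to some plan P sufficient for G.
from typing import Dict, FrozenSet, Set

Plan = FrozenSet[str]  # a plan is modelled as a set of necessary elements


def is_relevant(element: str, goal: str,
                sufficient_plans: Dict[str, Set[Plan]]) -> bool:
    """True iff `element` is a necessary element of some plan sufficient
    to achieve `goal`. In this toy model every listed element of a plan
    is taken to be necessary to it."""
    return any(element in plan for plan in sufficient_plans.get(goal, set()))


# Hypothetical example: two alternative plans sufficient for 'boil-water'.
plans: Dict[str, Set[Plan]] = {
    "boil-water": {
        frozenset({"kettle", "water", "power"}),
        frozenset({"pot", "water", "stove"}),
    }
}

print(is_relevant("kettle", "boil-water", plans))  # True
print(is_relevant("teacup", "boil-water", plans))  # False
```

On this reading, attention tracks relevance: elements that appear in no sufficient plan for any currently active goal simply fail the test and, on the authors' account, would not capture processing attention.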
Such schemata can subsequently be understood as a direct result of the RMF interfacing and filtering the outputs/inputs of two systems running in parallel: 1) an unconscious relevance seeking connectionist system driven by genetically mediated motivational processes (accounting for order being imposed on perception) and 2) a conscious goal directed action planning system which uses relevance relationships as a basis for establishing symbolically represented goals and the plans sufficient to achieve them (Lindsay and Gorayska, 1994; cf. a hybrid system, proposed by Harnad (1990), in which the role of motivation is not considered). At this point we are able to consider how goals are actually generated. Several important factors must be noted. First, goals are not simply symbolic descriptions of motivational states. Rather, they are procedural objects interconnecting goal related mental constructs (Gorayska et al., 1992) such as:

• projected future states of the agent,
• different objectives of either attaining, sustaining, stopping, or preventing those states,
• activation conditions,
• satisfaction conditions,
• additional constraints, which themselves may be embedded negative goals.
Second, activation and satisfaction conditions can be states in either the internal-cognitive or external-physical environments. The former must exist and be perceived for the agent to activate goal seeking behaviour. The latter must exist and be perceived by that agent for her or him to attain a projected future state. Activation and satisfaction conditions for a given goal, when attended to, initiate problem solving in search of, or construction of, the set of operations which can effect a transition between them. Finally, humans integrate into the environment by cognizing its invariant features as activation and satisfaction conditions for goals. According to Gibson (1979) any environment contains features, referred to as invariants, which remain consistently recognizable from a variety of viewpoints. These invariants can be understood as satisfaction conditions for perceptual object recognition. Inversely, higher levels of cognized satisfaction conditions can be seen as invariants within the internal cognitive environment (cf. the symbol grounding problem in Harnad, 1990). These invariants provide navigation points for spatial or temporal orientation within solution spaces (Gorayska and Tse, in prep.). To be effective, it is essential that invariants be salient and readily perceived. Across cultures, this has led to the construction of fabricated habitats that facilitate the reduction of sensory noise, thus highlighting the relevant invariants within them. The Fabricated World Hypothesis put forward by Gorayska and Lindsay (1989, 1994) extends Gibson's affordance theory by proposing not only that a) most of human memory is in the environment, but also that b) the human fabrication of habitats is such as to ensure activation and satisfaction of very few goals within them at any one time. This eliminates unnecessary mental processing, serves to make complex problems tractable, and makes simple algorithms sufficient for effective ecological interaction.
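The description of goals as procedural objects, with an objective mode and with activation and satisfaction conditions checked against cognised environmental invariants, can be sketched in code. The sketch is our own hedged construction, not the authors' formalism; the class name, field names, and example states ("door-open", "agent-at-door", etc.) are all invented for illustration.

```python
# Illustrative sketch: a goal as a procedural object whose activation and
# satisfaction conditions are sets of invariants that must hold in the
# (internal or external) environment.
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Goal:
    projected_state: str                 # the projected future state of the agent
    objective: str                       # 'attain' | 'sustain' | 'stop' | 'prevent'
    activation: Set[str] = field(default_factory=set)    # must hold to trigger goal seeking
    satisfaction: Set[str] = field(default_factory=set)  # must hold for the goal to count as achieved

    def activated_by(self, invariants: Set[str]) -> bool:
        """Goal seeking is triggered when all activation conditions are
        among the currently perceived invariants."""
        return self.activation <= invariants

    def satisfied_by(self, invariants: Set[str]) -> bool:
        """The projected state is attained when all satisfaction
        conditions are among the perceived invariants."""
        return self.satisfaction <= invariants


goal = Goal("door-open", "attain",
            activation={"door-closed", "agent-at-door"},
            satisfaction={"door-open"})

env = {"door-closed", "agent-at-door"}
print(goal.activated_by(env))  # True: problem solving would be initiated
print(goal.satisfied_by(env))  # False: the transition has yet to be effected
```

The gap between an activated and a satisfied goal is exactly where, on the account above, problem solving is initiated: a search for the operations effecting a transition from the activation state to the satisfaction state.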
In this context it is plausible to believe that external control of invariants activating and satisfying goals leads to an iterative and recursive application of the RMF. This in turn
may lead to a formulation of cognitive goal chains that ultimately interface with the motivational states of participating agents, inducing their symbolic realisation. More importantly, this process is as valid for the formulation of domain specific goal/plan/situation correlates as it is for the formulation of meta-goals which embed and control cognitive processes themselves. Foundations can be laid here for significant manipulation not only of what people think about but also of how they think about what it is that they are thinking about. TC/CT research takes the Fabricated World Hypothesis to its extreme. It attempts to address the issue of how the human fabrication of externalized, environmentally situated memory outlets dictates or prescribes which goals people will pursue most of the time, hence changing behavioral norms. It acts to investigate the way in which changes in perceived satisfaction conditions, effected by goal changes, serve to modify any previously generated related goals, thus modifying the internal cognitive environment. It considers how such modifications must induce changes in the perception of affordances. In turn these must precipitate changes in the structure of mind. Consequently, within the TC/CT approach, the perception of affordances, and ultimately the processes underlying cognitive formation, are seen to be dependent on that which determines goal generation and attentiveness through the perception of satisfaction conditions, namely relevance.2

CONCLUDING REMARKS

We have tried to illustrate what we think is novel about the TC/CT approach to the analysis and design of tools, particularly information tools, and the fabricated environment. Unlike other approaches (HCI, ergonomics, cognitive engineering, etc.)
in which tool development and environmental fabrication are driven primarily with reference to functionality within the artifacts they produce, the TC/CT approach is foremost concerned with understanding how human cognitive processes interact with, and are partially formed by, such artifacts. It is particularly concerned with how tools can be constructed which will best serve to amplify the cognitive capabilities of humans and enhance human to human communications. It is interesting to note that throughout our analysis two related streams of interest have emerged. One has to do with examining generative process outputs and is product oriented; the other looks at the nature of these generative processes themselves. The former emphasizes the need to understand functionality within the fabricated environment. The latter emphasizes an understanding of the processes by which that environment comes into being. Favouring development in one stream to the detriment of the other leads to an imbalance in our understanding of the relationship between humans and the environment. We have proposed the development of epistemic technologies, framed by relevance analysis, as a way to integrate the two. We have made a distinction between the terms Technological Cognition (TC) and Cognitive Technology (CT) which reflects the dichotomy between product and process. However, to generate an effective epistemic technology, each must be studied with reference to the other. The ensuing need to understand the generative processes associated with Epistemic Technology led to a discussion of environmental affordances. It yielded the same dichotomy. On the one hand we noted a functionality driven approach to the analysis of affordances commonly leading to an emphasis on system usability and product orientation. On the other hand we identified the need to address the generative processes by which affordances are determined. Epistemic Technology reconciles the two by focussing scrutiny on the underlying factors which cause a perceiving agent to pay attention to one set of environmental invariants over another. In considering the nature of attention we began to discuss relevance analysis as a possible framework for enquiry. As we tried to illustrate how the relevance metafunction could be used as a basis for building a method of analysis, the influence of the same dichotomy became evident once again. Relevance analysis provides a credible way in which to approach the mapping of effective action plan sequences for the purpose of satisfying existing goals. However, it also points to an explanation of the ways in which new goals and action plan elements, at various levels of cognitive functioning, can be generated from any combination of raw percepts and previously acquired concepts. Epistemic technology derives value from both these aspects of relevance analysis through a cyclical process whereby the outputs of one continuously condition the inputs of the other in a recursive and self-regulating fashion. We believe that the degree to which the generative aspects of this cyclical process influence our understanding of human interactions with the fabricated environment has in the past been largely unaddressed. We further believe that a deeper examination of these aspects is needed before Epistemic Technology can provide us with the means by which to effectively control the ways in which we are affected by the products of our own ingenuity.

2 The connection between Gibson's affordances and relevance has also been noticed by Mey and Gorayska (1994), but they do not discuss the generative relation between the two, nor do they consider the mediating role of attention in this process.

REFERENCES
Alben, Laurelee, Jim Faris, and Harry Sadler, 1994. Making It Macintosh: Designing the message when the message is design. Interactions 1(1): 11-20.
Bailey, Robert W., 1989. Human Performance Engineering. Englewood Cliffs, N.J.: Prentice Hall.
Bruce, Vicky and Patrick Green, 1990. Visual Perception: Physiology, Psychology, and Ecology. Hillsdale, N.J.: Erlbaum.
Curtis, Bill and Bill Hefley, 1994. A WIMP No More: The Maturing of User Interface Engineering. Interactions 1(1): 22-34.
Eberts, Ray E., 1994. User Interface Design. Englewood Cliffs, N.J.: Prentice Hall.
Ellul, Jacques, 1965. Propaganda: The Formation of Men's Attitudes. New York: Knopf.
Gaver, William W., 1991. Technology Affordances. Human Factors in Computing Systems, Conference Proceedings CHI'91, 79-84. New York: ACM.
Gibson, James J., 1979. An Ecological Approach to Visual Perception. Boston: Houghton Mifflin.
Gorayska, Barbara and Roger O. Lindsay, 1989. On Relevance: Goal Dependent Expression & the Control of Action Planning Processes. Research Report 16. School of Computing and Mathematical Sciences, Oxford Brookes University, UK.
Gorayska, Barbara and Roger O. Lindsay, 1993. The Roots of Relevance. Journal of Pragmatics 19(4): 301-323.
Gorayska, Barbara and Roger O. Lindsay, 1995. Not a reply - more like an echo. Journal of Pragmatics 23(6). Forthcoming.
Gorayska, Barbara, Roger O. Lindsay, Kevin Cox, Jonathon Marsh, and Ning Tse, 1992. Relevance-Derived Metafunction: How to interface the intelligent system's subcomponents. Proceedings of the Third Annual Conference of AI, Simulation and Planning in High Autonomy Systems, Perth, Australia, July 8-10, 64-71. IEEE Computer Society Press.
Gorayska, Barbara and Ning Tse, in preparation. A Goal Satisfaction Heuristic in the Relevance-Based Architecture for General Problem Solving.
Gorayska, Barbara and Jacob L. Mey, 1995. Cognitive Technology. In: Karamjit S. Gill, ed., New Visions of the Post-Industrial Society: The paradox of technological and human paradigms. Proceedings of the International Conference on New Visions of Post-Industrial Society, 9-10 July 1994. Brighton: SEAKE Centre.
Gould, John D. and Clayton Lewis, 1985. Designing for Usability: Key Principles and What Designers Think. Communications of the ACM 28: 300-311.
Harnad, Stevan, 1990. The Symbol Grounding Problem. Physica D 42: 335-346.
Koffka, Kurt, 1935. Principles of Gestalt Psychology. New York: Harcourt Brace.
Köhler, Wolfgang, 1947. Gestalt Psychology: An introduction to new concepts in modern psychology. New York: Liveright Publishing Corporation.
Kyng, Morten, 1994. Scandinavian Design: Users in Product Development. Celebrating Interdependence, Conference Proceedings CHI'94, 3-10. Boston: ACM.
Laurel, Brenda, 1991. Computers as Theater. Reading, Mass.: Addison-Wesley.
Lindsay, Roger O. and Barbara Gorayska, 1994. Towards a General Theory of Cognition. Unpublished MS.
Lindsay, Roger O. and Barbara Gorayska, 1995. On putting necessity in its place. Journal of Pragmatics 23: 343-346.
Lohse, Gerald L., Kevin Biolsi, Neff Walker, and Henry H. Reuter, 1994.
A Classification of Visual Representations. Communications of the ACM 37(12): 36-49.
Marr, David and Tomaso Poggio, 1976. Cooperative Computation of Stereo Disparity. Science 194: 283-287.
Mey, Jacob L., 1995. On Gorayska and Lindsay's Definition of Relevance. Journal of Pragmatics 23: 341-342.
Mey, Jacob L. and Barbara Gorayska, 1994. Integration in computing: An ecological approach. Systems Integration '94, 594-599. (Proceedings Third International Conference on Systems Integration, Sao Paulo, August 15-19, 1994). Los Alamitos, Calif.: IEEE Computer Society Press.
Nass, Clifford, Jonathan Steuer, and Ellen R. Tauber, 1994. Computers are Social Actors. Celebrating Interdependence, Conference Proceedings CHI'94, 72-78. Boston: ACM.
Nicolle, Steve, 1995. In defence of relevance theory: A belated reply to Gorayska & Lindsay, and Jucker. Journal of Pragmatics 23(6). Forthcoming.
Norman, Donald A., 1986. Cognitive Engineering. In: D. A. Norman and S. W. Draper, eds., User Centered System Design, 31-61. Hillsdale, N.J.: Erlbaum.
Norman, Donald A., 1988. The Psychology of Everyday Things. New York: Basic Books.
Norman, Donald A. and Stephen W. Draper, eds., 1986. User Centered System Design. Hillsdale, N.J.: Erlbaum.
Newell, Allen and Herbert Simon, 1972. Human Problem Solving. Englewood Cliffs, N.J.: Prentice Hall.
Rasmussen, Jens, 1988. Information Processing and Human-Machine Interaction: An Approach to Cognitive Engineering. New York: North Holland.
Sperber, Dan and Deirdre Wilson, 1986. Relevance: Communication and Cognition. Oxford: Blackwell.
Strong, Gary W., 1995. New Directions in HCI Education, Research and Practice. Interactions 2(1): 69-81.
Takeuchi, Akikazu and Katashi Nagao, 1993. Communicative facial displays as a new conversational modality. Proceedings of INTERCHI'93, 187-193. Conference on Human Factors in Computing Systems, Amsterdam 24-29 April 1993.
Van Cott, Harold P. and Beverly M. Huey, eds., 1992. Human Factors Specialists' Education and Utilization: Results of a Survey. Washington, DC: National Academy Press.
Walker, Janet H., Lee Sproull and R. Subramani, 1994. Using a Human Face in an Interface. Celebrating Interdependence, Conference Proceedings of CHI'94, 85-91. Boston: ACM.
Warren, William H., 1984. Perceiving Affordances: Visual Guidance of Stair Climbing. Journal of Experimental Psychology: Human Perception and Performance 12: 259-266.
Weizenbaum, Joseph, 1976. Computer Power and Human Reason: From Judgment to Calculation. San Francisco: Freeman.
Zhang, Xio Heng, 1993. A goal-based relevance model and its application to intelligent systems. Ph.D. Thesis. School of Computing and Mathematical Sciences, Oxford Brookes University, UK.
This Page Intentionally Left Blank
Cognitive Technology: In Search of a Humane Interface
B. Gorayska and J.L. Mey (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 2

IMAGINIZATION AS AN APPROACH TO INTERACTIVE MULTIMEDIA

Ole Fogh Kirkeby & Lone Malmborg
Institute of Computer and Systems Sciences
Copenhagen Business School, Denmark
ofk/dasy@cbs.dk; dslone@cbs.dk
ABSTRACT

Recently, it has become an important issue in human-computer interaction how to conceptualize humans' spontaneous interaction with the multimedia interface, and how we can design this interface so as to satisfy the demands of communicative competence in the dialogue. Using phenomenological philosophy, Kirkeby and Malmborg give an interpretation of how metaphors are created and used in common language, and extend this interpretation to comprise also our cooperation with the computer. In its ideal realization, such a spontaneous creation of metaphors is called imaginization. The authors also show how to categorize the new media in relation to the dimensions of closeness and situatedness.
Currently, interactive multimedia (IMM) are generally conceived of as a kind of computer-based technology, characterized by the cooperation between discursive text, commands (including command-driven vehicles), icons, and images (static and moving). In contrast, virtual reality (VR) techniques are regarded both as a supplement and as a substitute for multimedia. It seems appropriate to try and give a more profound diagnosis and definition of these concepts.

IMM, as a technology, mediates between the acting consciousness and the world. By referring to an outer world this medium presupposes a cognitive distance, a distance which implies a reflexive consciousness of the representative functions of the media. IMM conveys a central cognitive significance to the image as intermediary to, and structuring, information. Virtual reality (VR), on the other hand, is a technology whose purpose it is to substitute for the experience of reality characterized by its interactive, meaningful, senses-based relation to this very reality. We can state the following conditions for VR:

• Images as such do not exist in successful VR;
• In successful VR, one cannot presuppose theoretical, reflexive consciousness, but only practical reflexivity: the actor is immersed into the reality in which s/he exists; 1
• Considered as an epistemological ideal, the VR-world should not presuppose a reference to 'another reality'. As 'reality', VR should at most be a parallel (in the sense of a 'possible world') to 'real reality';
• Only such consciousness of the body, and such practical consciousness as realized through the media, is able to exist in VR. We are the reality that we experience in VR. Similarly, any theoretical consciousness that we might develop in VR is also bound to this reality, and ideally ought not to interfere, or be 'confronted' with requirements of adjusting itself to knowledge grounded in our familiar 'real world' reality.
One may conceptualize the relationship between VR and IMM and their possible combinations in the following ways:

• IMM as embedded in VR: VR has priority over IMM inasmuch as it is a substituting reality, whereas IMM are still media, forced to duplicate reality. Here, IMM is itself a VR-function and must refer to the VR-world.
• VR is coordinated with IMM. Here, VR does not function as virtual reality, even though, from a cognitive theoretical as well as from a pedagogical point of view, the two are difficult to compare.
• VR is embedded in IMM. Here, we will find VR functioning, among other things, as a research space for identifying and handling files.

As we have mentioned above, IMM is located halfway between symbolic media and discursive media, because even moving images, video-sequences, and the like have an inherently symbolic character: they must exemplify reality to a far higher degree than merely referring to it. For our purpose, however, the crucial point is not whether there exists a reference to, or an exemplification of, a reality. On the contrary: we are concerned with the possibility of creating a readiness in the user that enables him/her to act in this reality, which means the handling of such knowledge as is capable of creating this readiness. This process of handling knowledge we shall call imaginization.
Imaginization is not restricted to a single form of representation; it is thus not bound to a single sense, neither to seeing (images and text) nor to hearing. Imaginization implies the total bodily-mental reality and is thus an embodiment: one could say it is 'incorporated'. The history of the concept of imaginization throughout phenomenology has been a rather checkered one; it has only acquired a certain unity starting from its development in the oeuvres of Martin Heidegger and Maurice Merleau-Ponty, where it expresses the 'human condition' of language being a sensus communis, a 'sixth sense', allowing the five 'regular senses' to combine in relation to some theme (or noema), or to some semantically unambiguous notion. This would mean, in a computer context, that incorporation first and foremost should be negotiated in accordance with the criterion of closeness.
1 The concept of practical reflexivity is developed in Kirkeby (1994a).
With respect to closeness, we have to consider speech as the primary, active ability to which the sense of hearing, as well as the senses activated in the gestural space, belong. Furthermore, it involves seeing, the ability which, in its dialectical relation with speech, creates our image of the other person. Just as does speech, seeing, too, has its own 'style', its own different cognitive process (Merleau-Ponty, 1964), a fact that manifests itself in our construction of the image of the other person, as well as in the image we create of the situation structuring the communication, not only between the I and the other person, but also between person and machine. Moreover, if we inject the concept of incorporation into that of 'situatedness', this latter concept will come to denote the historically given horizon of meaning. We will come to realize that situatedness is the condition of the realization of meaning, while being itself conditioned by communication as the 'suprema differentia specifica' of man (Kirkeby, 1994a). In other words, speech and seeing have a characteristic, common 'style' inasmuch as they arise from the expectations in our cognitive abilities and in our senses which, as a prefigured readiness toward meaning, hide in our bodies in the form of habits and socially governed concepts. These expectations are precisely the 'tacit conditions' of which Wittgenstein speaks. 2 The concept of incorporation, when brought to bear upon situatedness, thus implies the existence of a readiness towards meaning at a level that is not reducible to a mental and a bodily system; a level one might term 'the inter-world' between the systems. This readiness is not least realized through metaphor, a fact that is naturally of general importance for any reflection upon the relation between humans and machines (computers); it is of special importance for any interpretation we may attach to the concept of interface in IMM.
It should be mentioned that the combination of situatedness and incorporation alters the classical concept of 'intentionality' as developed by Husserl, heavily reducing the possible autonomy and cognitive range of the reflexive cogito. In the following, we will interpret situatedness through the concept of interactivity. This concept is a regulative idea expressing the possibility of imaginization, if the maximum of incorporatedness and situatedness is realized in relation to the system. In this perspective, imaginization appears as something which is very different from mere 'knowledge representation': it becomes the activity of creating symbols - or, as Merleau-Ponty has put it: it is language letting you catch a meaning which did not exist earlier (Merleau-Ponty, 1964). By the same token, imaginization is different from learning by example, because the prototypical relation between user and the IMM is one of interactivity: imaginization implies the combination of learning by example and learning by doing. Thus the process of imaginization refers back to the dynamic process in which new metaphors are created; it refers to the user's practical-reflexive activity, and hence to a sense of possibility, which dissolves any conventional meaning inherent in examples. One might even say that such a process has the character of a positive self-reference, i.e. a process where both knowledge and the self referred to change during the interaction.
2 "Dann ruht der Vorgang unseres Sprachspiels immer auf einer stillschweigenden Voraussetzung." (Wittgenstein, 1989)
Imaginization, as a special kind of user practice, first of all appears in and through one particular activity: the ability to create symbols. Symbols express what I know, what I am able to do, and what I want to be or have; the symbol points both to the case, to my cognitive relation with it, to my intentionality, and to the very context in which case and intentionality both acquire their meaning. We see how already at this stage, a purely epistemological attitude towards metaphors becomes problematic: we cannot strategically reduce metaphors to an ontological level that stretches beyond the historically given media. Here, we meet a further difficulty: other (e.g., discursive) media, too, are able to refer symbolically; the language of poetry embodies the essence of this practice. Thus the symbolic element in the media need not be bound to the image-form itself, to the image-sign: images may exemplify the symbolic element, but the discursive statements refer to it. It is a fact that images only in a very broad sense are able to refer discursively - as in the case of a comic book without a text. The crucial opposition, then, is not between discursive vs. image-bound reference to the symbolic dimension; rather it is between two ways of using language: one in which one refers symbolically, and one in which one does not. Carried to its extreme, the latter opposition implies a difference in linguistic practice: we either refer by using images (or symbols), or we do not. The opposition is thus between 'image-language' and a language without images.

A general problem regarding IMM as a cognitive environment and as a technology is the question whether it is at all possible to combine discursive language with image-language. Will the image-language not destroy the potential symbol-creating power in the discursive language?
Here, we may have overlooked another distinction. Actually, there are two different ways of using and creating symbols: one that could be called 'overdetermined', which will be discussed below, in section 2; and one that could be called spontaneous, which is the main ingredient in the ability we have called imaginization, and which, while subject to the limitations imposed by each individual case, as well as by the general constraints inherent in intentionality and context, still is able to transcend its own limitations. The spontaneous creation of symbols is at the core of situated and incorporated cognition, whenever the maximum demands on interactivity are fulfilled. As an activity characterizing a particular relation between user and system, interactivity can only be conceived of as originating in human interaction and characterizing a particular relation between two people. This relation distinguishes itself by the fact that the one person cannot be a means to the other; neither can any kind of authority be legitimized (Hegel, 1807). In modern philosophy of language, this view is codified through the concepts 'illocutionary' versus 'perlocutionary' in relation to speech acts (Austin, 1962; Searle, 1974; Habermas, 1981). Ideally, illocutionary speech acts - pace Habermas - should be the dominating ones. Illocutionary speech acts distinguish themselves by expressing a particular intention in its 'raw', unconcealed form (such as a promise, a statement, an emotional manifestation). But illocutionary speech acts are only a necessary, not a sufficient condition of interactivity. Interactivity only happens when, and in the way that, the other person reacts to the illocutionary act.
On the one hand, this reaction (the 'answer') should express the fact that the other person has understood the speech act's formal character. On the other hand, and in addition to this, the answering person should be able to relate to a number of speaker's properties: his personality, his basis of experience, his level of knowledge; to the probability and truth of the subject addressed by the speaker; to his sincerity; and to the way in which the possible content of his speech is created, altered, or annihilated by the situation. 3 Optimal interactivity thus consists in a maximally reflexive openness of mind towards all these facts, towards the style, the truth, and the person (Kirkeby, 1994b). Imaginization expresses an ideal, prototypical horizon, on the background of which the possibility of a creative use of multimedia must be seen.
What characterizes imaginization, or spontaneous symbol-creation, is its relation both to our habit of exemplifying through images (as discussed in the previous section) and to our practices of representation in discursive languages. Crucial for the latter is what we have called its 'overdetermined' relation to symbols, characterizing a particular type of both the image-media and the discursive language. As such, however, this relation cannot be considered to be inherent in any individual media. In the image-media, such an 'overdetermined' application and creation of symbols are characterized by icons, i.e., static and dynamic images exemplifying by means of a conventional cognitive frame, a frame that is most often not consciously acknowledged. Examples may be found in mental models that classify our faculties into cognitive, conative, and emotive, and which naturalize this classification through illustrative images taken from the world of science (such as the white coat and, in earlier times, the slide rule); from the world of politics (such as the stony face, symbolizing ruthless power); or from the private world of intimacy (symbolized by the caring mother). Alternatively, one could think of our communicative structures, such as they are illustrated by the common metaphor of sending and receiving through some channel. Or of the use of metaphors in information processing, which rely on the imagery and symbolism of manufacturing industries. This overdetermination, however, does not characterize the image-media alone; it is also an inherent quality of discursive language. Frequently, unconscious reference is made to types of metaphorical scenarios similar to the ones discussed above. In discursive language, such semantic qualities cannot, most of the time, be related to a unique existential domain, whether senso-motoric, sexual, or that of family life and work.
In this connection, it behooves us to recall what has been stated by the German phenomenologist Hans Lipps as the primordial condition of symbol understanding, viz., that there exists no "original meaning, but only the origin of a meaning". According to
3 Habermas' formal pragmatics cannot cope with the fact that the situation is the final mechanism creating meaning by canceling it. 'We do not know what we mean until we have heard ourselves saying it'.
Lipps, all names ought to be considered as being "the result of a baptism in circumstances of urgency ('Nottaufe')" (Lipps, 1958: 73). Thus, spontaneous creation of meaning is hard to spot and resists critical analysis: the reason is that overdetermination blocks our view - due, among other things, to the fact that symbols are integrated into historic-social settings and thus themselves (have) become active ways of reproducing a given social reality. As examples, consider the fact that only certain types of work or family life can provide a framework for cognitively viable metaphors. For this reason, bodily metaphors, as epistemological means of attacking the very problem of metaphors, their semantics and pragmatics, should be handled with the utmost care - a reservation which we will come back to in our critique of Lakoff's and Johnson's theories in the next section.

But what, if in spite of all this, one wants to stick to the idea of a spontaneous use and creation of symbols? In that case, the question must be asked: where and how could such a creative use be practiced? The answer is that this happens first and foremost in dialogue; but not just any old dialogue. This dialogue has to be of a very special kind, characterized by a maximum of interactivity. This means that for the interactive agents, the dialogue is characterized by the actual possibility of referring to the interlocutor's style, his conception of truth, and the quality of his person. This rules out any (explicit or implicit) acceptance of 'the compulsion of the best argument' - as held by formal pragmatics (Habermas, 1981). On the contrary: arguments must be understood as embedded in both rhetorical and poetical reality, the pragmatic dimension of which implies that no agent is ever fully informed about the content of his or her own arguments unless, and until, they are uttered.
No argument, therefore, can be taken to be the carrier of a unique, abstract rationality; on the contrary, all rationality is 'bounded' by its context, and dependent on the constraints of the utterance (in time and space) as to what can be 'expressed'; hence, an argument might 'win out' precisely because of qualities transcending the rational. These are the qualities which traditionally are treated in rhetoric or poetics, especially with respect to arguments that can be validated (or: whose relation to truth can be established) only at a later time; still, these qualities may actually carry the day in virtue of their ability to influence the opponent. Another way of saying this is that spontaneous symbol-creation is nothing but the insistence on the 'non-identity' of concept and reality - as T.W. Adorno used to put it (Adorno, 1966). Similarly, in the words of Ernst Bloch, it could be called the promise of an as yet unrealized, but possible Utopian and primordial experience (Bloch, 1969). 4 The pragmatic aspect of the linguistic proposition dominates dialogue, and thus dominates the reference to the restrictions enforced onto its usage through incorporation and situatedness. Propositions become provisional. They become projections of worlds ('ways of world-making', as Nelson Goodman put it); and
4 "Und auch die Symbolik, die zum Unterschied von der mehrdeutigen Allegorie völlig einsinnig eine Realverhülltheit des Gegenstandes bezeichnet, ist eben in der dialektischen Offenheit der Dinge angesiedelt; denn an diesen Bedeutungsrändern lebt das Fundierende jeder Realsymbolik: die Latenz. Und die Einheit für Allegorien wie Symbole ist dieses, dass in ihnen objektiv-utopische Archetypen arbeiten, als die wahren Realziffern in beiden." (Bloch, 1969: 1592.)
attempts at meaning-catching, in an endeavor to make them carry meanings that at most exist sporadically. In a context like this, the validity of a metaphorical proposition is to be determined in accordance with what Aristotle has to say about the rhetorical and the poetical. Aristotelian thought is unique in that it considers rhetoric as an interdisciplinary matter, touching on dialectics at the one end, and on ethics and politics at the other, with poetics taking the 'lead', though, at least from an aesthetic point of view. Exactly this insistence on the tension between rhetoric and poetics in the classical Aristotelian sense will show itself to be of importance for the development and analysis of IMM, because here the tension is built into the very 'media', i.e. the common ground between dialectics (including logic) and aesthetics.
First, let us nail shut a popular escape-hatch. Those who do not accept the idea of a formal meta-language will turn to dialogue in the hope of finding a non-symbolic meta-language. This is because symbols here function provisionally and can always be transcended by the very intensity of the words, through overt reference to context, to style, truth, and person. Similarly, the concepts expressing the body-phenomenological dependency of language (which is Lakoff's and Johnson's point of departure in cognitive science; Lakoff, 1987; Johnson, 1987) have an obvious metaphorical reference. The same goes for Lakoff's concept of 'idealized cognitive model'. This concept has its origins in several sources; here, only those with the most typical metaphorical reference will be mentioned: Minsky's 'frames', Fauconnier's theory of 'mental spaces', and AI-related concepts of 'scripts' and 'schemata' (Lakoff, 1987: 68), as developed, e.g., by Schank and Rumelhart. These concepts refer to conventionalized complexes of images which themselves have no further discursive reference; their legitimization stems from the mini-worlds of theater, cinema, and architecture. In other words, they are themselves symbols.

Against the background of modern American linguistics, psycholinguistics, and experimental psychology, Lakoff (1987: 127) treats the phenomenological concept of intentionality using the slogan: "Seeing typically involves categorization". It does not seem to bother Lakoff that by 'objectivizing' the very concept of intentionality, he lets in objectivism through the back door.
Implicitly accepting Husserl's evidence-criterion of intentionality, he more or less explicitly spurns any reference to 'situatedness' (on which see below) and its concomitant historicity, as they have been stressed by Heidegger, especially through the latter's distinctions within the category of 'being-in' (In-sein): 'Befindlichkeit', 'Geworfenheit', and 'Verfallen' (lit.: 'disposition', 'thrownness', and 'deterioration'), distinctions that capture the quintessence of nonauthentic reasoning and acting. Precisely for this reason, we cannot expect to find any primordiality or authenticity in metaphorical speech (Heidegger, 1927). On the contrary: it is history which makes and breaks metaphors: it makes them into a vehicle of power, as Nietzsche has shown us in the last century, or reduces them to trite commonplaces without other than 'historical' interest. Lakoff develops the cognitive theoretical model that forms the basis of his categorizations in two tempi: One is kinesthetic reference, based on the
Schopenhauerian concept of 'body-image', in our century so brilliantly developed by Merleau-Ponty (Merleau-Ponty, 1945), and grafted by Lakoff onto Eleanor Rosch's so-called 'basic level categories'. The other is a rather erratic insistence on the notion of the mental image, a concept which in the end destroys the basis of his own fundamental paradigm. This critique of Lakoff also applies to his partner, the philosopher Mark Johnson (see M. Johnson, 1987). Johnson says:

Our reality is shaped by the patterns of our bodily movement, the contours of our spatial and temporal orientation, and the forms of our interaction with objects. It is never merely a matter of abstract conceptualization and propositional judgments. (Johnson, 1987: xix.)

Even though there can be no doubt about the last part of the Johnson quote, it still is the case that our reality is shaped by the historically transferred, linguistically given possibility of concrete bodily experience, and that the 'object' having the greatest significance for our experience in its various forms is the other person. Hence the paramount importance of the historic and social dimensions. In particular, Lakoff's naturalization of the famous Wittgensteinian concept of 'family likeness' restricts the so-called 'metonymic effect' (which links the more or less representative exemplars of the category to the prototypical carrier of the family-likeness) to a crude concept of categorization (whether we call this concept 'anthropologizing' or 'universalistic', makes no difference). By doing this, Lakoff skirts the entire issue of the historical character of meaning.
From another point of view, one might say that Lakoff lacks a feeling for the 'unhappy consciousness'; that is, he lacks the fundamental critical distance which would enable him to unveil the body as the area of alienation, of unreality, of lack of originality, and thus unmask our body-image as the product of historical and social forces - as Michel Foucault has made us aware of (Foucault, 1975). Lakoff does not transcend the naturalistic concept of use, as instantiated in the Heideggerian concept of 'das Zeug' (from his book Sein und Zeit (1927): literally, 'the thing in its pure materiality', but carrying all sorts of other connotations such as 'trash, nonsense', and also 'tool, outfit'). Nowhere does Lakoff show himself to be conscious of the influence which the technological-scientific complex exerts on the creation of the modern body - a consciousness which has been emphasized in Heidegger's works ever since the forties. 5 Furthermore, Lakoff's idea of incorporating (in the literal sense of the word: placing in a human body, 'em-bodying') language's power of creating reality (cf. Wittgenstein's earlier mentioned 'tacit conditions' of language use) remains naively naturalistic in that his concept of 'embodiment' parallelizes (not to say: simply equates) sensual perception and linguistic (social) experience. Abandoning this simplistic parallelism would require Lakoff to reflect on the fact that in modern society, all experience is a social and historical construction; only by doing that would he be able
5 The theme is first played in Über den Humanismus from 1946, and then fully orchestrated in Die Technik und die Kehre from 1962. If Lakoff had made himself familiar with (in particular) Über den Humanismus, he might have discovered that there was such a thing as an anthropological frame of reference (and even used it).
to cross the boundary from his sterile objectivism into a more fruitful, phenomenological realm of thought.
Here are Lakoff's own words:
Cognitive models are embodied, either directly or indirectly by way of systematic links to embodied concepts. A concept is embodied when its content or other properties are motivated by bodily or social experience. This does not necessarily mean that the concept is predictable from the experience, but rather that it makes sense that it has the content (or other properties) that it has, given the nature of the corresponding experience. Embodiment thus provides a non-arbitrary link between cognition and experience. (Lakoff, 1987: 154)

Lakoff is correct in maintaining that incorporation excludes any arbitrary relationship between cognition and experience; however, this does not imply - as Lakoff seems to think - that this relation is not also (and indeed necessarily) one that has developed historically. For him, the social dimension is glued onto a body-naturalistic idea of how concepts are created, whereas the historical dimension is conspicuous by its absence. As if to remove any possible doubts, Lakoff's presentation of his perspective on incorporation explicitly omits any reference to a theory of communication. One is tempted to ask why he does not mention Merleau-Ponty, whose theoretical approach to 'incorporated cognition' in essence was developed long before Lakoff's, and who formulated the necessary constraints that such an approach would have to obey in order to be consistent with a phenomenological perspective on cognition. Perhaps the reason is that on Merleau-Ponty's view, the concept of communication implies that we can neither allow a kinesthetic level of conceptualization to be subject to ontologizing, nor accept the Husserlian, pre-phenomenological idea of mental images in the form of a pre-linguistic language, even if this language - as in Lakoff's case - is founded on our bodily praxis and not - as in Fodor's work - somewhere in thin air. In a way, Lakoff's dilemma reproduces the very crux of the cognitive paradigm that he wants to reform.
There is, of course, the possibility that the problem is one of different traditions: one has to remember that phenomenology only came to America in the disguise of its wake, constructionism and deconstructionism, now themselves on the wane, as Barbara Johnson has pointed out (Johnson, 1995). In this connection, it may be of significance that Derrida took his central ideas principally from Heidegger's later writings; as to Merleau-Ponty's radical thinking, the case can be made that it probably was overshadowed by the existentialist movement.

However this may be, it seems rather obvious that in any critique of Lakoff and Johnson, by far the most difficult problem is how to speak about metaphors in a non-metaphorical language. Leaving aside the strictly 'meaningless' logico-mathematical languages, we must admit that a non-metaphorical meta-language covering all dimensions (semantic, syntactic, pragmatic) necessarily has the character of a 'regulative idea', as Kant called it. Here the idea of imaginization, of spontaneous symbol creation in dialogue, may be useful, since it insists both on our being conscious of the necessarily non-reductionist character of any theory of symbols, and on our realizing that non-identity is a normative constraint on any theoretical explanation of the relation between language and reality.
So far, we have not made any explicit distinction between the different kinds of interactive multimedia systems. Usually, IMM systems are simply defined as collections of different media within a single integrated system. In our conception, IMM are primarily characterized by their focus on the interaction between the user and the computer. The notion of 'imaginization', as defined earlier, captures our readiness to create images: it allows our language to catch a meaning that did not exist previously. Imaginization is a complete mode of expression, an ideal, prototypical horizon for creative application of multimedia systems, as we have noted earlier. The question is now which qualities a multimedia system must have in order to support the users' access to creating their own images. This question can be addressed by describing the manner of interaction between the user and the multimedia system (called here agent I and agent II respectively). We suggest using imaginization as a means of characterizing multimedia systems by a typology based on their degrees of ability to support incorporation and situatedness. Doing this makes it possible to identify a number of systems, differing significantly as regards user/system interaction; they can then be related to what are loosely called 'multimedia systems'.

INCORPORATION

A multimedia system is called more or less 'incorporated' in terms of its closeness of interaction, i.e. according to the user's perception of the distance to the reality represented by the system. Perceived closeness is particularly connected to visual experience (as well as to speech, as mentioned earlier). Thus, the spatial dimension is an important determining factor in the visual perception of closeness, whereas the other senses, in this respect, are inferior to seeing (even though they, too, may influence the spatial perception of the user). Piaget's notion of 'intuition' assumes that any original thinking requires an intuitive basis. 6
Imaginization is a means for original thinking. The most important characteristic of an intuitive process is that it is based on sense impressions; that means it always refers back to an ontogenetically prior constitution of reality through the senses. This 'sensitized' or 'sensualized' reality is contained in our perception as an ever-present readiness towards alternative meanings. Intuitive thinking is ruled by context and by the discourse of the perceived meanings. It is hard to imagine how this could be supported by a computer, for the simple reason that the computer does not possess any devices that make perception possible. For several reasons, a computer will never have sense impressions in the proper sense of the word; computer simulation of sense-based perception is impossible. First of all, there is the meta-theoretical knowledge about the situation and its typology, which is a necessary condition for constructing prototypical scenarios of experience; such knowledge is not within the capacities of the computer. Second, the learning algorithms underlying the individualization processes themselves are not well known, and hence their simulation on the computer presents insurmountable difficulties. Thus, multimedia systems owe their cognitive strengths to their close connection with the user in interaction; hence, it is the experience and the perception processes of the user that form the object for the interface.

By contrast, virtual reality systems ideally simulate - as mentioned earlier - an exchange of sense impressions with the user. Laurel (1993: 204) writes that "by positing that one may treat a computer-generated world as if it were real, virtual reality contradicts the notion that one needs a special-purpose language to interact with computers." It is only a simulation of true exchange, since any analogue sense expression sent from the human user is converted into digital signals which can be processed by the VR software; the other way around, any digital 'sense expression' sent from the VR system is converted into (today still rather primitive) analog signals (i.e., poor graphical resolution).

To the degree that multimedia systems are dependent on the symbolic dimension, they have a built-in cognitive and media-based limitation. In virtual reality as a technological-cognitive Utopia, this limitation seems to have been overcome. Complete incorporation seems to have been reached when there is no longer any need for a specialized way of communication, and expressions and means of perception, as they are intuitively used in human contexts of communication, are sufficient.

6. A crucial distinction in Piaget's thought is that between pre-conceptual and conceptual thinking. For ages 4 to 12, it has been established that the child can move back and forth between these levels: preconceptuality alternates with conceptuality. We see this illustrated not only by the formation of the linguistic concept through speech; it is also possible to move in the opposite direction, connecting the linguistic concept with the mental image. Here, the image acts as a cognitive tool compared to the word. It is this function, the possible interaction between concept and image, that multimedia focus on, basing themselves on a more 'primitive' way of perception, and allowing for a 'ready', 'incorporated' way of coping with new or not well-known situations. The notion of incorporation is crucial for an understanding of Piaget's concept of 'readiness' (Piaget, 1923).
However, when it comes to 'situatedness' (as expressing the connection of perception to the historical media; see below), it is doubtful whether this can ever be simulated by virtual reality: the medium already constitutes, so to speak, in and by itself a violation of the continuous reality of the individual. On the other hand (and mainly on account of the technological development in the multimedia area, as also pointed out by Frank Biocca elsewhere in this volume), one may consider the possibility of treating IMM as a device for the support of original thinking.

Figure 1 shows the changing applications of incorporation to certain aspects of (multimedia) systems. The degree of incorporation is, as mentioned above, determined by the degree of closeness that we perceive in it - a perception which is primarily dependent on vision. For this reason, the degrees of incorporation should be characterized in terms of interface: does the system have a character-based, non-graphical interface, a graphical, multimedia interface, or a synkinesthetic interface to the user? The interfaces themselves can then be described as one-, two-, or three-dimensional, respectively.

Such a description of the degrees of incorporation takes its point of departure in technology, as applied to the senses through which the interaction between system and user takes place. Crucial to this interaction are sight and speech; the latter taken as the central sense (the sensus communis of the Scholastics) grouping and combining in its functions all the other senses. In the first interface dimension, interaction typically takes place through the activation of sight (the user reading characters), the user's response being given in the form of keyboard commands. In the second dimension of the interface, all of the senses
may be affected. Since, typically, the IMM systems involved here contain sequences of text, pictures, video sequences and sound, the user's experience of reality still remains two-dimensional: the user does not feel that he or she is really interacting with the system, the way we will see it to be the case in the third dimension. In the second dimension, too, the users may respond to the system not only by means of keyboard strokes, but also by using pointing devices such as a mouse, digitizer, joystick, touch-sensitive screen, or the like. In this dimension, the users still have a clear feeling of the limits of the system, and of the boundaries between themselves and the system, in the guise of the screen itself, even in cases where the system is capable of realistic simulation (e.g. of depth, as in computer games like DOOM).

Finally, in the third dimension, the interaction takes place through the affection of, and perception through, all the users' senses; vice versa, the users are able to apply all their senses in responding. In principle, this holds true for the system as well, though in this case the interaction using all the 'senses' is a simulated one, based on the transformation of analog signals to digital representations. The boundaries between user and system are dissolved, thus creating the impression in the users that they are moving into the system's reality, while similarly, the system moves into theirs (e.g. by affecting their sense of balance in what is often called 'simulation sickness'; see Biocca, this volume).

SITUATEDNESS

As was the case for incorporation, so situatedness, too, is closely tied to the form the interaction between the user and the multimedia system takes. And just as when we talked about degrees of incorporation, so situatedness, too, comes in degrees. These degrees are related to a system's flexibility, and to its ability to perceive and 'understand', as well as react to, the user's patterns of activity and intentions.
Earlier, we claimed that interactivity can only be understood if one starts out from its origin in human activity, which is based on 'equal rights' for the participants: no participant is superior to any other. However, in normal interaction between a human and a computer, the human user is superior. Computers are not, and have never been, expected to have or develop, on the basis of their knowledge of the individual user, any assumptions about that user's intentions. The computer is simply expected to react in an appropriate way to the user's unambiguous commands.

But where does situatedness come in? Simply like this: the more a computer 'knows' about the user, or the better it 'understands' the user's intentions, the greater the system's flexibility, and the more adaptable the interface. Thus, we have a high degree of situatedness in the case of the so-called 'autonomous agents' which, applying AI-related techniques, base their behavior on a superior knowledge of the user; here, the user "is engaged in a cooperative process in which human and computer agents both initiate communication, monitor events and perform tasks" (Maes, 1994: 31; see also Lindsay's chapter, this volume, for a rather divergent view). Such agent-based systems, of course, invoke some important issues of authority and jurisdiction; they presuppose a relation of trust between the user and the system (as an example of how badly things can go when that mutual trust is absent, we only have to think of the supercomputer HAL's rather arrogant dispositions in Stanley Kubrick's classic
movie '2001: A Space Odyssey'; truly a prophetic vision, some will say). The system's 'sensing' of the user's readiness towards meaning is crucial to its success or failure in supporting situatedness as a means towards imaginization. 'Readiness towards meaning' is used here in the sense which 'intentionality' acquires in modern phenomenology, as opposed to its use in Husserl's early writings (Husserl, 1980). As to situatedness (as has already been mentioned), it should be understood as what Heidegger had in mind when he, in his 1927 masterpiece, Sein und Zeit ('Being and Time'), defined the concepts of 'Befindlichkeit' (lit. 'disposition'), 'Geworfenheit' (lit. 'thrownness'), and 'Verfallen' (lit. 'deterioration').

Incorporation and situatedness are thus the very qualities of intentionality, and they are augmented through the thematic and reflexive relationship that the intending person has to his or her own existence. A question of crucial importance is whether intentionality can, or should, be defined on the basis of this reflexivity. Since incorporation and situatedness are united in a conceptual reciprocity, where 'intentio' constitutes its 'intentum' by realizing the individual's as well as the collective's tacit conditions, it follows that this cannot be the case. The problem is that the system ought to reinforce the user's consciousness of his or her own intentional basis; this is the core of any logic of autonomy. Here, it is of importance whether the user wants to emphasize situatedness at all, if it entails certain parts of the system being restricted in their autonomous enacting of knowledge. Obviously, there are different types of rationality, and their relations to the individual user deserve to be brought out into the open. Is it, then, possible to speak about different rationales having different ontological status?
Can such non-mainstream rationales continue to function in secret, coming out into the open only at a later date, as is often argued in psychoanalysis and Marx-inspired sociology? From another point of view, the criterion of introspection that we called upon above, along with the very consciousness of this reflexivity, of this turning inward to oneself, is likely to inhibit spontaneity. We might perhaps again refer here to 'practical reflexivity' (Kirkeby, 1994a), which is able to express a continuous sense of what steps to take next to go where, and which does this, to a higher degree than is the case in mere abstract introspection, in a conceptual emphasis on the steps of the process. Finally, we should be aware that the phenomenology of the gestural space does not always seem to be well suited to support the system's diagnosing of the user's situatedness. Take the case of a system whose functioning is based on eye movement tracking: if I, while working professionally at the computer, keep gazing towards the picture of my lover, the machine might get the idea I'm in love with it!

Figure 1 shows the three degrees of situatedness, determined in accordance with the different types of interaction that are possible from both the user's (Agent I) and the IMM's (Agent II) point of view. As to the first degree and its type of interaction, Agent II has no possibility of acting independently: it only responds to the commands of Agent I. However, Agent I's possibilities are restricted as well: this means that the flow of information is unidirectional only, and furthermore that Agent I must learn to use a formal code, and/or is restricted to choosing commands from a limited menu only, in order to be allowed to retrieve information.

In the second degree of situatedness, the same restrictions as to its form of interaction are placed on Agent II (the IMM) as those constraining Agent I in the first
degree. By contrast, Agent I's possibilities of acting are supposedly unrestricted, in the sense that there is free access to information, and that this access is provided in such a way as to suit the needs of the user, as defined by him/herself. One could say that in this case, situatedness is brought about through the interaction of the agents, I and II.

In the third degree of situatedness, the type of interaction differs radically from the two previous ones in that Agent II has now acquired autonomy. Ideally, however, this autonomy should operate entirely on Agent I's conditions: Agent II's behavior ought to reflect its task (supporting Agent I by simulating the latter's behavioral patterns) by reading the behavioral pattern of this Agent, even without Agent I's active cooperation. We must accept the fact that there are certain autonomous possibilities of action that unambiguously deserve to be called communicative competence, and that these serve as a pre-condition to creative competence.

SOME EXAMPLES

How do degrees of incorporation and situatedness manifest themselves in these kinds of systems? Below, we shall give examples of specific applications within all of
Incorporation / closeness ->
  1-D: character-based, non-graphical interface
  2-D: graphical multimedia (audio/video) interface
  3-D: synkinesthetic interface

Situatedness (downward):

Menu- or command-based one-way interaction (Agent I's actions restricted):
  Presentation of text in 1-D: 'Flag text'
  Presentation of information in 2-D: Multimedia
  Information 'played' in 3-D: Virtual reality 'film'

Hyper-based one-way interaction (Agent I's actions open):
  Free choice of text in 1-D: Hypertext
  Free choice of information in 2-D: Hypermedia
  Free choice of information in 3-D: Virtual reality

Hyper-based mutual interaction (Agent I's actions open and coordinated with Agent II's open actions):
  Mutual interaction in 1-D: Text agents
  Mutual interaction in 2-D: Hypermedia-agents
  Mutual interaction in 3-D: Virtual reality-agents
Figure 1. IMM and related technologies categorized by incorporation (horizontal dimension) and situatedness (vertical dimension). Agent I is the user and Agent II is the IMM.

the nine categories, in order to illustrate this from an interface-technological point of view. The categories are examined from left to right, beginning from the upper row. We claim that the closer a certain system is to the lower right corner of the figure, the better it will support imaginization. The reason is that imaginization presupposes the highest possible degrees of situatedness and incorporation, combined with a maximum of interactivity.

Ten years ago, most systems could be allocated to the upper left corner of our model. All of us have probably tried working with a word processing system where we
had to remember the meaning of the function keys, and where we had to go through quite a lot of menu layers before we got to where we could do what we wanted. As an example of a multimedia application with restricted possibilities of interaction for the user, consider a menu in which the user is led to a certain piece of information by being given a choice among a variety of options (e.g. as manifested by icons). An example of a three-dimensional application in multimedia would be that of 'virtual movies', i.e. movie 'watching' in three dimensions, where we (primarily by the use of audio-visual effects) obtain a synkinesthetic experience, such as a sensation of falling forward that is so real that we actually fall. There are very few or no possibilities of interaction with this type of system.

In the category of systems that do offer the user a possibility of acting, the best known today are hypertext systems (actually a one-dimensional version of the systems mentioned in the previous paragraph). In hypertext, we are not restricted to a fixed way of 'reading' the text, but are allowed to use the text freely in accordance with our needs and our level of experience - just as we normally go about reading an encyclopedia (see McHoul and Roe, this volume). Rather than reading an encyclopedia from beginning to end, we consult it selectively: our reading is determined by the need to know, and by the wish to have additional information presented to us as we go. In this way, we let the text adapt itself to us, whereas in the first case we had to adapt ourselves to the text as it was presented to us. (On the question of adaptation, especially 'who adapts to who?', see also the Introduction to this volume by Gorayska & Mey, as well as Mey, 1994.) Other hypermedia systems allow us to adapt the information we need (not just in the form of text, but of sound and images as well) as we navigate through the system.
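As a purely illustrative aside (not part of the authors' text), the nine-way typology of Figure 1 can be restated as a simple lookup from degrees of incorporation and situatedness to system categories; all strings below merely transcribe the figure:

```python
# Illustrative sketch only: Figure 1's two dimensions as a lookup table.
# Keys pair a degree of incorporation (interface dimension) with a degree
# of situatedness (form of interaction between Agent I and Agent II).
TYPOLOGY = {
    ("1-D", "one-way, restricted"): "'Flag text' (presentation of text)",
    ("2-D", "one-way, restricted"): "Multimedia (presentation of information)",
    ("3-D", "one-way, restricted"): "Virtual reality 'film' (information 'played')",
    ("1-D", "one-way, open"): "Hypertext (free choice of text)",
    ("2-D", "one-way, open"): "Hypermedia (free choice of information)",
    ("3-D", "one-way, open"): "Virtual reality (free choice of information)",
    ("1-D", "mutual"): "Text agents",
    ("2-D", "mutual"): "Hypermedia-agents",
    ("3-D", "mutual"): "Virtual reality-agents",
}

def classify(incorporation: str, situatedness: str) -> str:
    """Name the system category at the given pair of degrees."""
    return TYPOLOGY[(incorporation, situatedness)]

print(classify("1-D", "one-way, open"))  # Hypertext (free choice of text)
```

The authors' claim that systems nearer the lower right corner better support imaginization then corresponds to moving towards the ("3-D", "mutual") cell of this table.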
When we speak non-technically of interactive multimedia, we are often referring to this type of hypermedia. Well-known examples are computer games like DOOM.7

7. DOOM simulates - rather convincingly - 3D effects. However, the decisive feature in determining whether or not we are in the presence of a synkinesthetic interface is not just the experience of bodily motion in a three-dimensional space, nor is it the fact that we are able to interact with the system through our body movements.

VR systems, as they are known today, border on the category possessing a true synkinesthetic interface, one in which all the user's possibilities of action are wide open. That is, the user does not have to adopt a particular way of communication in order to interact with the system, but is able to apply all of the senses 'naturally' during interaction; similarly, the system is able to influence all of the user's senses. The relationship between VR and synkinestheticality is a 'borderline' case, because most VR systems activate only a few of the senses for their operation.

From a technological point of view, systems that are based on unrestricted possibilities of action, both for the user and for the system, are still in a provisional state, although a limited number of 'primitive' text-based systems of this type are well known and well tested. As an example, consider electronic agents such as MAXIM (Rosenschein & Genesereth, 1985) that assist users in sorting their mail on the basis of the latter's registered filing habits. Some hypermedia systems are based on a similar form of interaction: e.g., there are agents that present different choices of entertainment on the basis of their knowledge
of the 'taste' of the user in music, theater, literature, etc. An example is RINGO, a system that supports the user in her choice of music (Maes, 1994). The most advanced systems with regard to incorporation and situatedness would be those in which mutual interaction in a three-dimensional space takes place between the user and the system; however, we are not familiar with any examples of such systems. We can imagine, though, a system such as a virtual, intelligent office which would be able to adapt to, and act on behalf of, the user in an ongoing interaction.

CONCLUDING REMARKS

In our discussion of multimedia and virtual reality, and of our radically different ways of interacting with them, the crucial problem of the cognitive possibilities and consequences of this combination presents itself. On the one hand, we have treated some of the problems inherent in combining discursive text, which refers to its objects, and images, which mostly exemplify. The conclusion here was that images in themselves do not offer any guarantee of a cognitive gain unless they are used as a means to imaginization. Discursive language, owing to its high degree of freedom in relation to the world mediated by our senses, might here show much more flexibility and possible depth than images do.

But what about the differences in the way we interact with IMM and VR, respectively? At the beginning of this article we presented some scenarios describing these differences. In relation to these we may conclude: if the VR system operates only at a technological level, where IMM is 'embedded' in VR, then we cannot be sure of any cognitive gain. One might here try - very tentatively - to state the following hypothesis.
A powerful VR that comes very close to simulating sense experience, or perhaps even a VR still marked by 'artificiality', but rendering vivid, dynamical, expressive and colorful experiences of interaction, might cause traditional images, static or dynamic, with or without sound, to come close to discursive language, or at least to change into some kind of discursivity. That means that visual images as such, by losing their power of fascination, will lose both their imaginative and suggestive power. They will degrade into an unsuccessful version of a referring - as opposed to an exemplifying - medium. Unsuccessful, because they do not have the power of the spoken language: they are still 'dense', as Goodman used to call it, and they oppose codification. On the other hand, they lack the flexibility of discursive language because they are still bound to sense experience. In a way one might say that such visual images would degrade into some kind of all too complicated, cognitively unwieldy, iconographic language. Or, put another way: who would be willing to watch pictures of wine when he could drink it? And who would draw pictures of her thoughts when she could speak them out aloud? Perhaps the ultimate, ideal VR would play the film of civilization back to the point where the culture of literacy had not even begun: before symbolic representation, and hence before the possibility of generalizing over your own practices and over the reappearing patterns of nature which are the conditions of reflexivity.

To summarize, we have - by using a phenomenological approach (meaning here: 'continental' phenomenology) - endeavored to cast some light on the phenomena of IMM and VR. The phenomenological approach provides us with some useful ways of conceptualizing interactivity in relation to the IMM interface, and it may give us some ideas as to how metaphors function cognitively. In this way, it may help us in
determining the true character of imaginization as a regulative notion - a notion that hopefully will clarify the issue of our spontaneous interaction with the IMM interface, and thus may inspire further development of these new media. We also hope that the two criteria we have suggested for categorizing these media, viz. the dimensions of closeness and situatedness, will contribute to establishing some criteria for evaluating the overall relation between humans and machines.

REFERENCES
Adorno, Th., 1966. Negative Dialektik. Frankfurt a.M.: Suhrkamp Verlag.
Austin, J. L., 1962. How To Do Things with Words. Oxford: Oxford University Press.
Bloch, Ernst, 1969. Das Prinzip Hoffnung. Vol. III. Frankfurt a.M.: Suhrkamp Verlag.
Foucault, M., 1975. Surveiller et punir: Naissance de la prison. Paris: Gallimard.
Habermas, Jürgen, 1981. Theorie des kommunikativen Handelns. Frankfurt a.M.: Suhrkamp Verlag.
Hegel, G. W. F., 1807 (1952). Phänomenologie des Geistes. Hamburg: Felix Meiner Verlag.
Heidegger, Martin, 1927 (1967). Sein und Zeit. Tübingen: Max Niemeyer Verlag.
Heidegger, Martin, 1955. Die Technik und die Kehre. Pfullingen: Neske.
Husserl, Edmund, 1900 (1980). Logische Untersuchungen. Vol. I-III. Tübingen: Max Niemeyer Verlag.
Johnson, Barbara, 1995. The Wake of Deconstructionism. Cambridge: Harvard University Press.
Johnson, Mark, 1987. The Body in the Mind: The Bodily Basis of Meaning, Imagination and Reasoning. Chicago: The University of Chicago Press.
Kirkeby, Ole Fogh, 1994a. Event and Body-Mind: A Phenomenological-Hermeneutic Analysis. Aarhus: Modtryk.
Kirkeby, Ole Fogh, 1994b. World, Word and Thought: Philosophy of Language and Phenomenology. Copenhagen: CBS Publishers.
Lakoff, George, 1987. Women, Fire and Dangerous Things: What Categories Reveal About the Mind. Chicago and London: The University of Chicago Press.
Laurel, Brenda, 1993. Computers as Theatre. Reading, Mass.: Addison-Wesley.
Lipps, Hans, 1958. Die Verbindlichkeit der Sprache. Frankfurt a.M.: Vittorio Klostermann.
Maes, Pattie, 1994. Agents that Reduce Work and Information Overload. Communications of the ACM 37(7): 31-40.
Merleau-Ponty, Maurice, 1945. Phénoménologie de la perception. Paris: Gallimard.
Merleau-Ponty, Maurice, 1964. Le visible et l'invisible. Paris: Gallimard.
Mey, Jacob L., 1994. Adaptability. In: R. Asher & J.M.Y. Simpson, eds., The Encyclopedia of Language and Linguistics, Vol. 1, 265-67. Oxford & Amsterdam: Pergamon/Elsevier Science.
Piaget, Jean, 1923. Das Erwachen der Intelligenz beim Kinde. Stuttgart: Kohlhammer.
Rosenschein, Jeffrey S. and Michael R. Genesereth, 1985. Deals among Rational Agents. In: Proceedings of the Ninth International Joint Conference on Artificial Intelligence, 91-99. Menlo Park, Calif.: AAAI Press.
Searle, John R., 1974. Speech Acts. Cambridge, England: Cambridge University Press.
Wittgenstein, Ludwig, 1989. Philosophische Untersuchungen. Werkausgabe Bd. 1. Frankfurt a.M.: Suhrkamp Verlag.
Cognitive Technology: In Search of a Humane Interface
B. Gorayska and J.L. Mey (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 3
INTELLIGENCE AUGMENTATION: THE VISION INSIDE VIRTUAL REALITY Frank Biocca
Communication Technology Group University of North Carolina at Chapel Hill, USA
[email protected]
VIRTUAL REALITY AS A COGNITIVE TECHNOLOGY

Can any computer truly enhance the functioning of the human mind? Can steel and silicon be so harmonized with the chemistry of the brain that one amplifies the other? If human intelligence is partially shaped by the environment, can a highly enriched virtual environment augment human intelligence? At its essence, this is almost the same as asking, "Is there such a thing as a cognitive technology?" The very title of this book - the very history of print itself - suggests that we want to answer, "yes".

In this chapter I will take a glance inside the 3D world of virtual reality (VR) designers and observe them impelled by a vision of intelligence augmentation through immersive VR technology. From the very beginning, VR engineers and programmers have conceived of the medium as a cognitive technology, a technology created to facilitate cognitive operations (Brooks, 1977, 1988; Furness, 1988, 1989; Heilig, 1955/1992; Krueger, 1991: xvii; Lanier and Biocca, 1992; Rheingold, 1991; Sutherland, 1968). For a large segment of computer graphic engineers and programmers, virtual reality technology marks a significant milestone in the development of computer interfaces (Foley, Van Dam, Feiner, and Hughes, 1994). Fulfilling a long-term goal in the history of media (Biocca, Kim, and Levy, 1995), VR promises to finally create compelling illusions for the senses of vision, hearing, touch, and smell. In the words of a respected VR designer who has helped pioneer systems at NASA and the University of North Carolina, "The electronic expansion of human perception has, as its manifest destiny, to cover the entire human sensorium" (Robinett, 1991: 19).

Like a bright light just out of reach of their data gloves, VR designers stretch their arms to grasp an enticing vision, the image of virtual reality technology as Sutherland's "ultimate display" (Sutherland, 1965), a metamedium that can augment human intelligence.
Engineers and programmers attempt a masterful orchestration of electricity, LCDs, hydraulic cylinders, and artificial fibers. With these they hope to so dilate the human senses that waves of information can pour through this high-bandwidth channel into the brain. In full union with the user, virtual reality might
emerge to be a universal "tool for thought". In this vision, virtual reality would extend the perceptual and cognitive abilities of the user. The claim that virtual reality may augment human intelligence is based on the increasingly compelling sensory fidelity of virtual worlds. Computer graphics and kinematics capture more and more of the physical and sensory characteristics of natural environments. Immersive VR simulations increasingly perfect the way virtual environments respond to user actions: the link of physical movement to sensory feedback increasingly simulates human action in a natural environment (Biocca and Delaney, 1995).

The designers' confidence in the cognitive potency of these environments results in part from the very experience of the medium, the deep gut-level reaction that designers and users feel when immersed in high-end VR systems. This experience suggests to some that VR has crossed a threshold never reached by older media. More than any other medium, virtual reality gives the user a strong sense of "being there" inside the virtual world. The senses are immersed in an illusion. The mind is swathed in a cocoon of its own creation. The word "presence" (Sheridan, 1992) has come to mean the perceptual and cognitive sensation of being physically present in a compelling virtual world.

In this chapter I would like to consider the design agenda that motivates VR designers' claims that virtual reality is a cognitive technology. More specifically, I want to look at the goal of intelligence augmentation that beats in the heart of VR. I will consider the following question:
What are the claims implicit in the idea of intelligence augmentation through the use of VR technology? How are they conceptualized? Are they valid? In what way?

INTELLIGENCE AUGMENTATION (IA) VERSUS ARTIFICIAL INTELLIGENCE (AI)
Looking at the whole human enterprise of computer design, we can pick out three competing visions of the computer. Each goads the efforts of engineers and programmers:

1) the creation of an artificial mind: Artificial Intelligence (AI);
2) the creation of a mind tool: Intelligence Augmentation (IA);
3) the control of nature, machines, and telecommunication: Control and Communication (C&C).
Figure 1. Computer design goals.
Researchers, ideas, and money have flowed through the three streams of research, rushing out through our desert of ignorance towards three points on the horizon. Researchers and ideas have often drifted from stream to stream. Over time, shifts in human energy and interest have made each stream rush ahead. The streams have sometimes flowed into each other; for example, they have sometimes made use of similar developments in computational, display, and storage devices. But there remains a fundamental gap between these streams. They flow through different terrains and overcome different obstacles as they meander forward. The separation between these streams is sometimes slight, but it is always there. Within each stream, the currents of thought that power the flow of research are propelled by a different understanding of the relationship between human, artifact, and environment.

The opposition between artificial intelligence and intelligence augmentation is particularly revealing of the motivation behind the design of VR. VR pioneers like Fred Brooks of the University of North Carolina are fond of saying that when computer science was fixating on AI, the eyes at his lab were all focused on the mirror image, IA.1 The clever reversal of the letters suggests something more profound. In each, AI and IA, there is an inversion of the relationship of humans to machines. Each is building a mind: one is human, the other silicon and electricity. But AI and IA emphasize different cognitive operations. Building an artificial mind is a very different goal from artificially amplifying the human mind. The success of one may come at the expense of the other. In Table 1, I have tried to list some of the key points where the goals and understandings of AI and IA designers diverge.
Table 1. Points where Artificial Intelligence (AI) and Intelligence Augmentation (IA) diverge

Artificial Intelligence (AI)                      Intelligence Augmentation (IA)
Seeks to create an intelligent other.             Wants to create an intelligence tool.
Wants to internalize artificial                   Wants to externalize human
consciousness in a machine.                       consciousness in a machine.
Focuses on the detached mind.                     Focuses on the mind/body in a context.
Emphasizes abstract decision making.              Emphasizes thinking through the senses.
Engineers mind through products of the mind.      Engineers mind through the body.
Simulates cognitive operations.                   Simulates cognitive environments.
Wants to produce an independent machine.          Wants to produce a dependent machine.
1 Fred Brooks calls it "intelligence amplification". I have called it intelligence augmentation to connect the program of VR research to the longer tradition of interface design traced back to the work of Vannevar Bush and Douglas Engelbart.
62
F. Biocca
THE PROMISE OF INTELLIGENCE AUGMENTATION
Sir Francis Bacon saw in technology a "relief from man's burden". AI tries to produce a silicon slave to perform mental labor; IA tries to produce a mind tool to enhance the same labor. This notion of relief from labor has often been accompanied by a related thought, the idea that relief from drudgery elevates the human mind for higher things. In the early days of computer design, when VR, hypertext, and the World Wide Web were but phantasms floating above a hot noisy box of vacuum tubes, Vannevar Bush wrote an early form of the proposal for computer-based augmentation of human intelligence in his classic article, "As we may think" (Bush, 1945). He looked at the emerging mind tool and articulated four key goals: a) relief from the "repetitive processes of thought" (p. 4); b) improved methods for finding, organizing, and transmitting information; c) "more direct" means for "absorbing materials...through...the senses" (p. 8); d) improved means for "manipulating ideas" (p. 4). Bush's dream of a computer tool he called "Memex" was to be more than a hypertext engine. It was also designed to be a VR-like device for augmenting intelligence by channeling electrical information through the senses: "In the outside world, all forms of intelligence, whether sound or sight, have been reduced to the form of varying currents in an electric circuit in order that they may be transmitted. Inside the human frame exactly the same sort of process occurs. Must we always transform to mechanical movements in order to proceed from one electrical phenomenon to another?" (Bush, 1945: 8). In the work of later designers Bush's ideas evolved. The machine would not only liberate the mind for higher things, it would augment it. Like a vacuum tube, it might amplify the neuronal currents coursing through the brain. With the invention of the mouse, a simple 2D input device, the body entered cyberspace (Bardini, in press).
In the work of its inventor, Douglas Engelbart, we see the most explicit expression of the goal that VR has inherited, his project for the "augmentation of the human intellect": "By 'augmenting the human intellect' we mean increasing the capability of a man to approach a complex problem situation, to gain comprehension to suit his particular needs, and to derive solutions to problems. Increased capability in this respect is taken to mean a mixture of the following: more-rapid comprehension, better comprehension, the possibility of gaining a useful degree of comprehension in a situation that previously was too complex, speedier solutions to problems that before seemed insoluble. ... Augmenting man's intellect ... can include ... extensions of means developed ... to help man apply his native sensory, mental, and motor capabilities - we consider the whole system of the human being and his augmentation means as proper fields of search for practical capabilities" (Engelbart, 1962: 1-2). VR is now a major site where the "search for practical capabilities" attempts to apply our "native sensory, mental, and motor capabilities". Engelbart's project takes
place at the cusp of the 1960s, a decade known for the pursuit of human and social transformation, including the use of chemical technologies for "mind amplification". These cultural themes of human transformation and perfectibility achieved further expression in the human potential movement of the 1970s and 1980s. By the 1990s human potential enthusiasts like Michael Murphy, co-founder of the Esalen Institute, were cataloging massive lists that purported to show "Evidence of Human Transformative Capacity" (Murphy, 1992). But this movement dwelled on the older technologies of eastern ascetic, religious, and medical practice. This cultural thread, very much alive in places like Silicon Valley, would come to rejoin virtual reality technology in the early days of its popularization. The mixture of these themes was welcomed and echoed in such cultural outposts as the magazines Mondo 2000 and Wired, The Well, and Cyberpunk culture. It is on the borders of this frontier that VR research rides out towards the forward edges in pursuit of intelligence augmentation. But the earlier notions that the machine would free the mind for "higher" things were sometimes born of a disdain for physical labor. This sentiment was tinged by a Cartesian distrust of the body and the evidence of the senses. But VR's research program embraces the body and the senses with Gibsonian notions (Gibson, 1979) of the integration of the moving body, the senses, and the mind. Its most ardent enthusiasts promise to augment the mind by fully immersing the body into cyberspace. VR promises to take the evolutionary time scale both backwards and forwards by immersing mind and body into a vivid 3D world, from the open savanna to fields of data space. VR promises to take the external storage system, which was born when the first human symbol was stored in sand or clay, and immerse each sensory channel into the semiotic fields of human communication activity.
Reflecting the interaction of technology and the body, Jude Milhon, an editor of Mondo 2000, proclaimed, "Our bodies are the last frontier" (Wolf, 1991). Standing on the edge of that frontier, we ask: Will the sensory immersion afforded by VR, this multisensory feedback loop between the social mind and its creations, amplify, augment, and adapt the human intellect? Can such a vision guide a research program? How do VR designers conceptualize this outcome they pursue?

HOW IS INTELLIGENCE AUGMENTATION CONCEPTUALIZED? TWO PHASES: AMPLIFICATION AND ADAPTATION

Ideas about a VR-like machine that can augment intelligence have been advanced primarily by computer scientists and rarely by psychologists (e.g., Brooks, 1977; Bush, 1945; Licklider and Taylor, 1968; Heilig, 1955/1992; Krueger, 1991; Sutherland, 1968). The conceptualization of intelligence augmentation has sometimes been wanting: the technology was claimed to somehow assist thinking or augment human performance. How it will assist thinking is not always specified. The conceptualization has been, for the most part, sketchy, more a design goal than a psychological theory. But the incomplete conceptualization is partially compensated by its concrete operationalization in the actual designs. These designs embody theoretical postulates. These postulates and hypotheses are sometimes made more explicit in studies of the value of simulation and virtual reality technology for cognitive operations. Let's briefly
explore what intelligence augmentation might mean for media technology in general and for VR specifically.
Figure 2. The interaction of mind, medium, and environment can be seen in two phases: 1) amplification of the mind and body, and 2) adaptation of mind and body.

Most technologies, but especially communication media, interact with cognition in one of two ways. Figure 2 illustrates these two phases in the interaction of mind, medium, and environment: (a) amplification, tools that amplify the mind; (b) adaptation, mediated environments that alter the mind. This distinction not only captures two phases in the interaction of humans with technology, it also suggests two types of theoretical claims. When theorists say that a medium like virtual reality amplifies cognitive operations, it is implied that those operations are not fundamentally altered. The mind remains as it was before contact with the technology. When theorists argue that a medium alters mental processes, then a stronger claim is made: the mind has adapted in some way to the medium. Many theorists would argue that cognitive amplification tends to lead to cognitive adaptation. For example, this is what McLuhan meant by the "Narcissus effect" of media: we embrace some aspect of ourselves (our objectified mind) and become fixated on and defined by this one facet of ourselves. A set of cognitive operations, a part of us, is selected, favored, and augmented. We are changed through the selective enhancement of cognitive skills.
Amplification

Claims that media amplify cognition group into three general types: sensorimotor amplification, simulation of cognitive operations, and objectification of semantic structures.
Sensorimotor Extension

McLuhan (1964, 1966) popularized the notion that media "extend the senses". McLuhan was unknowingly continuing a long tradition in engineering philosophy that saw technology as organ extension (Mitcham, 1994). This position is now widely accepted. Media are seen as prosthetics: once attached, they extend the body or mind. In what way might this augment intelligence? Human intelligence is provided with more sensory data and experience when the senses are extended over space (e.g., telephone, remote sensing), over time (e.g., photography), and beyond the bounds of normal sensation (e.g., infrared goggles). Before the arrival of advanced VR
telepresence systems, media extended only the visual and aural senses, for example, the way a remote-control television extends our vision and hearing into another room. VR expands the possibility of sensorimotor extension. More senses are addressed with illusions of greater fidelity. But VR also integrates the actions of the body and the senses in a more "natural" way when it extends them. Many older technologies extend motor capabilities but provide poor feedback. For example, a backhoe extends the scooping action of the arm and hand, but provides little more than visual feedback. VR telepresence systems may both improve human performance and amplify human intelligence by closing gaps in the feedback loop between action and sensation. The user can explore distant real environments or purely virtual environments with more of the body.
Simulation of Cognitive Operations

To the degree that many technologies are extensions of the body, they simulate physical and mental processes. Mental processes require mental labor. If the labor is transferred to some electromechanical entity, then more brain capacity may be available for pattern perception, decision making, and creativity. This proposition has been the driving force behind the design of the computer since at least the days of Babbage: if mathematical processes can be simulated by gears, tubes, or silicon, these mental operations can be amplified in speed and complexity. In this way human intelligence might be freed and amplified. At the moment, designers clearly do not yet know how best to represent and simulate mental operations. It is one thing to conceptualize mental models (e.g., Johnson-Laird, 1984); it is another to build a tool that amplifies them. It is not yet clear how best to use the unique capabilities of VR technology to teach, assist, or augment cognitive skills. It is not clear how much of the existing research about media and the development of cognitive skills applies (e.g., Salomon, 1979; Wetzel, Radtke, and Stern, 1994). At the moment designers are merely importing techniques that have been used to instruct individuals using pictures, film, and animation. The unique representational capabilities, the "language" of the medium, are only beginning to be explored (e.g., Meyer, 1995).
Objectification of Semantic Structures

Intelligence can be augmented by the objectification of a mental structure in some material form. The use of external memory storage systems is an evolutionary development that helped in the evolution of the mind (Donald, 1993). The objectification of semantic structures is the very essence of all semiotic systems (Eco, 1976): media and the codes they use allow users to record, store, exchange, and manipulate ideas. Various forms of computer technology are replacing older interfaces and storage media like the notepad, the drafting board, and the physical model. The objectification of semantic structures in a code or message reduces attentional and memory load while augmenting the performance of creative and decision-making processes. Most computer systems allow users to easily manipulate thought objects by manipulating symbolic objects. The most common is the objectification of a semantic network in some medium: outlines, diagrams, lists, etc. During decision making, concepts can be scanned. They can be made contiguous or linked in some way:
hierarchical modeling, causal modeling, etc. There is evidence that the spatialization of thought, the objectification of symbolic tokens in a spatial structure, augments human intellectual performance. The work on data visualization is based on the notion that human performance can be enhanced if abstract information is spatialized. It is proposed that human intelligence can detect patterns in abstract relations by using the ability of the senses to detect patterns (invariances) in the visual field. VR designs promise to extend this to all of the senses.
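To make the idea of spatialization a little more concrete, here is a minimal sketch of the kind of mapping a data-visualization system performs. The data, variable names, and the unit-cube mapping below are my own illustrative assumptions, not any particular system's design: each record's abstract variables are rescaled into [0, 1] and treated as x/y/z coordinates, so that clusters in the numbers become clusters a viewer could literally see in a virtual space.

```python
# A minimal sketch of "spatializing" abstract data: each record's three
# variables are normalized into [0, 1] and treated as x/y/z coordinates.
# The records and the choice of variables are invented for illustration.

def spatialize(records):
    """Map each record's values onto coordinates in a unit cube."""
    axes = list(zip(*records))                 # one tuple of values per variable
    lo = [min(ax) for ax in axes]
    span = [max(ax) - mn or 1.0 for ax, mn in zip(axes, lo)]  # guard zero span
    return [tuple((v - mn) / sp for v, mn, sp in zip(rec, lo, span))
            for rec in records]

# Three fictitious "stocks" described by volatility, trading volume, and return:
data = [(0.2, 900.0, 0.05), (0.8, 100.0, -0.02), (0.5, 500.0, 0.01)]
points = spatialize(data)   # each point could now place a marker in a 3D scene
```

The pattern-detection work is then done by the viewer's eyes: records that are similar in the abstract variables end up as neighboring markers in the rendered space.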
Adaptation

Intelligence amplification involves the augmentation of the human intellect without any significant change in intelligence, i.e., without changes in cognitive processes or structures. A crane or backhoe may amplify the power of the human arm, but it does not alter the arm in any way. The concept of adaptation suggests that the amplification of human intelligence through a medium may alter cognitive processes and structures. The mind adapts in function or structure to the medium. When humans and technology come in contact, we can observe both short- and long-term human adaptation. Broadly speaking, adaptations following the use of a technology can be psychological, behavioral, or physiological. Look down towards the floor and take a look at a simple technology like the shoe. Many of us don't think of the shoe as a technology, but it is an old technology we take for granted. Mentally compare your foot to that of a shoeless Kalahari desert Bushman. Think about the shape of that foot. Any urban dweller can observe that long-term use of the shoe may create a structural adaptation in the shape of the human foot (e.g., the toes curl inward and push against each other) and the texture of the sole (e.g., a less callused and softer sole). This is a simple, easily observable physiological adaptation of the morphology of the body brought on by the extended use of a technology. Now let's consider the idea of cognitive adaptation to VR systems. Adaptation of cognitive processes might emerge from either long-term or short-term use of a medium. Because VR is a new technology, most of our experience is with short-term adaptations. But the issue of adaptation is already a central problem in VR design. For example, some users experience simulation sickness (Biocca, 1992) when using VR systems. Simulation sickness appears to be related to motion sickness.
To some degree, simulation sickness is caused by the inability of the brain to reconcile and adapt to discordant spatial cues impinging on the senses immersed in the VR system (i.e., vision) and cues from the physical environment (e.g., proprioception). The body's response to this intersensory conflict is simulation sickness. VR systems are imperfect. Designers assume that the user's perceptual and proprioceptive systems will adapt to the medium. A study of adaptation to an augmented reality system showed that the perceptual-motor system does rapidly adapt to the sensory alterations of a VR system (Biocca and Rolland, 1995). Subjects' hand-eye coordination was significantly adapted as a result of a virtual displacement in felt eye position. Once users removed the VR equipment, their hand-eye coordination remained adapted to the VR environment. They made significant pointing and reaching errors. They had to learn to readapt to the natural environment. Note that none of this evidence of adaptation shows any augmentation in human cognitive performance. These adaptations or failures to adapt are all decrements in human performance. This is not to say that VR will not lead to adaptations that augment cognitive processes and
structures. For example, long-term use of VR may augment spatial cognition. But there is little evidence of this yet, though we can observe improvements in human performance. The interesting question as to whether long-term use of the medium can augment human performance through adaptation remains unanswered.

KEY DESIGN HYPOTHESES LINKED TO THE GOAL OF INTELLIGENCE AUGMENTATION

The design of VR is motivated by a set of design postulates and hypotheses that are psychological in nature. A VR designer at Autodesk and the University of Washington's Human-Interface Technology Lab (HITL), William Bricken, captured the essence of VR design when he pithily pronounced: "Psychology is the physics of virtual reality" (quoted in Woolley, 1992: 21). Virtual worlds are constructs of the senses. The psychological reality of VR is what matters in the final analysis. Therefore, many design principles are based on implicit or explicit psychological postulates and hypotheses. Many of these pertain to the design goal of intelligence augmentation. I would like to briefly discuss the key ones that appear to drive the design of VR. They are often advanced as postulates, but I will treat them as hypotheses. Each suggests references to a number of psychological theories. I will not refer to these here, but rather present each hypothesis as it is used by VR designers.

The Bandwidth Hypothesis: VR can increase the volume of information absorbed by a human being.

If media are information highways, then designers see VR as a potential superhighway to the mind. The goal is the feeling of presence (Sheridan, 1992). The senses are the delivery vehicle. VR designers try to deliver enough veridical information to the senses so that a coherent, stable, and compelling reality emerges inside the mind of the user. As Warren Robinett, master VR designer at NASA and the University of North Carolina, said of his goal, "I want to use computers to expand human perception" (Rheingold, 1991: 25).
On the engineering side this manifests itself as four design goals:
1) increase the number of sensory channels addressed by VR;
2) increase the sensory fidelity and vividness within each sensory channel;
3) increase the number of motor and physiological input channels;
4) link and coordinate the motor outflows (e.g., walking, head turning) to sensory inflows (e.g., visual flow) so that they match or even exceed those found in the natural environment.
In simulator systems (e.g., driving and flight simulators) the bandwidth hypothesis is straightforward. The goal is "fidelity". The design attempts to precisely match all the relevant sensory characteristics of the real-world task environment: "(1) the physical characteristics, for example, visual, spatial, kinesthetic, etc.; and (2) the functional characteristics, for example, the informational, and stimulus and response options of the training situation" (Hays and Singer, 1989: 3). The user learns a set of perceptual discrimination and motor tasks by doing them. In an imperfect system, when absolute fidelity is not possible, the problem becomes determining what are the most "relevant", task-related cues.
But the argument for increasing sensory bandwidth goes beyond the goal of replicating natural environments. One also finds an implicit or explicit argument that suggests the greater the number of sensory channels and the greater the sensory information, the better the learning. Various versions of this proposition have proponents in the VR design community. For example, master VR designer Fred Brooks asserts, "we can build yet more powerful tools by using more senses" (Brooks, 1977). Even as early as 1965, Sutherland argued that the computer "should serve as many senses as possible" (1965: 507). The bandwidth hypothesis is a seductive idea. It has accompanied many proposals for augmenting human intelligence through computer interfaces. For example, the influential work of master designer Alan Kay contained a version of the bandwidth argument when he outlined a design for an all-purpose learning machine he called the "dynabook", "a dynamic medium for creative thought" (Kay and Goldberg, 1977). Researchers have tended to emphasize the portability of the dynabook, but more important was the notion that the dynabook was to be a "'metamedium' (that) is active". In its interactivity the metamedium was to "outface your senses...(and) could both take in and give out information in quantities approaching that of the human sensory systems" (Kay and Goldberg, 1977: 32). Intelligence augmentation was one of the goals of this device. Kay hoped to help the user "materialize thoughts and, through feedback, to augment the actual paths the thinking follows" (Kay and Goldberg, 1977: 31). Kay and Goldberg summarized a design prejudice that is now widely shared by the VR community: "If the 'medium is the message' then the message of low-bandwidth is 'blah'." (1977: 33).

The Sensory Transportation Hypothesis: VR can better transport the senses across space, time, or scale.
Media historian Harold Innis (1951) was among the first to focus on the role of communication media in the manipulation of space and time. VR technology advances this function of communication media. But with VR, the manipulation, construction, and reconstruction of space is central to the use of the medium. It is clearly central in the construction of virtual space, that 3D illusion that beguiles the sensorimotor channels of the user. But manipulation of space has another important role in VR technology. Some dimensions of the technology emerged from the research program in telerobotics. The central goal of the program of telerobotics and telepresence is not the construction of cyberspace, but the collapse of physical space. The collapse of space is built on the electronic transportation of the senses across space. In his greetings at the first IEEE Virtual Reality Annual International Symposium (VRAIS), Tom Furness, Air Force VR pioneer and a leading VR engineering researcher, proclaimed that "advanced interfaces will provide an incredible new mobility for the human race. We are building transportation systems for the senses ... the remarkable promise that we can be in another place or space without moving our bodies into that space" (1993: i). At the distant frontiers of VR's transportation mission lies an agency whose sole mission is the collapse of space. NASA is developing virtual reality as a means of transmitting the experience of being telepresent on distant planets (McGreevy, 1993). At the other end of the spatial scale are VR systems squeezing the human senses down into the space that surrounds atoms. Work at the University of North Carolina
(Robinett, 1993) ties the virtual reality interface to the end of a scanning-tunneling microscope. Atoms become mounds on what looks like a beach of pink sand. Atoms can be "touched" and even moved; the pink sand reshapes itself and new mounds appear. Both of these examples are different forms of one way to augment human intelligence: the extension of sensorimotor systems.

The Expanded "Cone of Experience" Hypothesis: Users will simulate and absorb a wider range of experience.

There is a materialist streak in the VR community: learning is seen as the direct outcome of experience. It is reasoned that more experience leads to more learning. But the argument is slightly more complex. Harking back to Dewey and Gibson (1979), there is an implicit proposition that 3D, sensory, and interactive experience is at the core of learning invariants and patterns in the environment. The promise of VR brings out another function of media: the simulation and modeling of the world of experience. This function of media is as old as the theater and role playing. Media, such as VR, can be characterized as expanding the "cone of experience". The human mind can vicariously experience a wide range of situations. The range of experiences and the diversity of models of problem solving and action have been augmented by communication using existing media. VR promises to expand the capability of media by making the expanded cone of experience a little less vicarious. Unlike books, the user need not use as much imagination to fill in the mental simulation. VR designers try to directly engage the automatic, bottom-up perceptual processes to deliver an intense simulation of an experience. This is the essence of the goal of delivering experience that gives users "a sense of presence". VR proselytizer and artist Jaron Lanier was fond of suggesting that the goal of VR is the construction of a personal "reality engine", an all-purpose simulation device (Lanier and Biocca, 1992).
This is far beyond what the technology can do, but developments far short of this goal may have effects on the amplification of human intelligence. The property of VR alluded to by Lanier and embodied in this hypothesis involves two aspects of intelligence augmentation: the attempt to simulate cognitive operations and the expanded experience of objectified semantic structures, exposure to predigested cultural understandings. As Jaron Lanier has observed, "Information is alienated experience" (Rheingold, 1991).

The Sensification of Information Hypothesis: Relationships in abstract information are better perceived and learned when mapped to sensory/spatial/experiential forms.

Sensification is a generalization of the concept behind the terms "visualization" and "sonification". It means the creation of representations that use the information-processing properties of the sensory channels to represent scientific data and other abstract relationships. Work arguing for the value of sensification for intelligence augmentation often has a neo-Gibsonian (1979) cast. It is argued that over thousands of years of evolution, the mind and the body have evolved to move, think, and act in a 3D environment. Because of the limitations in our symbolic systems and representational technologies, our means of communication have not been able - until now - to fully harness the rich multisensory, spatial, and kinematic components of human thought and problem solving. VR, more than any other medium, comes close to providing an environment that has all the sensory characteristics of the physical world
in which our brain has evolved, while retaining the responsiveness and flexibility of abstract semiotic systems like language and mathematics. In some VR systems scientists sail through 3D scatter plots, chemists pick up 3D models of molecules with their hands to think up new pharmaceuticals, and stock market patterns are perceived through a cavelike corridor of undulating curves and changing sounds. The goal is to take the pattern detection capabilities of the senses, the spatial modeling capabilities of the eyes, ears, and muscles, to perceive, model, and manipulate ideas. The work on scientific visualization suggests the possibility of increased ability to detect patterns in data, faster problem solving, and more creative ideas. These are some of the cognitive outcomes Engelbart (1962) sought from his project to augment human intelligence. In essence, it is argued that advanced sensory displays can augment human intelligence by involving the senses more directly in the perception and manipulation of iconic entities.

Amplification of Interpersonal Communication Hypothesis: Humans will be able to express and receive a broader range of human emotion, intention, and ideation.

All the propositions so far have emphasized the augmentation of what Howard Gardner (1977) would call logico-mathematical and spatial intelligence. Until recently, most VR systems have involved a single operator moving in a socially barren environment. Those social VR environments that existed have, for the most part, been designed for the military. The primary interpersonal interaction is search and destroy: more the augmentation of interpersonal annihilation than the augmentation of interpersonal communication. As VR matures and multiple users can be represented in VR environments, more researchers are considering the use of VR to amplify interpersonal communication (e.g., Biocca and Levy, 1995; Palmer, 1995).
Part of the early mission of intelligence augmentation through computer design was the creation of a "more effective" means of interpersonal communication (Licklider and Taylor, 1968). Most existing media like the telephone and email transmit only reduced personal presence. The primary goal in this area has been telepresence, the attempt to reproduce most of the cues found in interpersonal communication (e.g., Morishima and Harashima, 1993). This goal, if achieved, would do nothing more than reproduce any common face-to-face interaction. This is no small achievement. It involves the transportation of the sensorimotor channels. But it is hard to see how simply recreating an everyday interpersonal interaction could augment human intelligence.
Some writers have speculated about the design of hyperpersonal or hypersocial VR environments. In these environments VR tools would amplify interpersonal interaction cues such as facial expression, body language, and mood cues. For example, Jaron Lanier (Lanier and Biocca, 1992) has speculated about how VR environments could be designed to alter body morphology to signal mood. Biocca and Levy (1995) have discussed expanding the sensory spectra of users by mapping physiological responses such as brain waves, heart rate, and blood pressure to properties of the environment such as room color to signal mood and cognitive states. There have been few experiments in this area. It is not at all clear in what direction such tools would influence interpersonal communication or the augmentation of human intelligence.
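As a thought experiment, the kind of mapping Biocca and Levy describe can be sketched in a few lines. This is a speculative illustration, not their design: the resting and maximum heart-rate figures and the blue-to-red color scale below are my own assumptions.

```python
# Hypothetical sketch: map a user's heart rate onto the color of a virtual
# room, so that physiological arousal becomes visible to other users.
# The 60-180 bpm range and the blue-to-red scale are illustrative assumptions.

def room_color(heart_rate, resting=60.0, maximum=180.0):
    """Return an (r, g, b) tuple: calm maps to blue, arousal to red."""
    t = (heart_rate - resting) / (maximum - resting)
    t = max(0.0, min(1.0, t))          # clamp to the defined range
    return (int(255 * t), 0, int(255 * (1 - t)))

print(room_color(60))    # resting heart rate: pure blue (0, 0, 255)
print(room_color(180))   # maximum arousal: pure red (255, 0, 0)
```

The open empirical question raised in the text remains untouched by such a sketch: whether making these cues visible would amplify interpersonal communication at all, and in what direction.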
INTELLIGENCE AUGMENTATION: CAN A VISION BECOME A "SENSIBLE" RESEARCH PROGRAM?

The overall goal of augmenting the human intellect is a highly motivating vision of the possible utility of the cognitive technology. It has also become a research program. The ideas listed above motivate design and research work in the area of VR. Researchers in VR labs around the world explicitly or implicitly subscribe to one or more of them. Each hypothesis (design postulate) mentioned above is as much vision as it is scientific hypothesis. In some ways the very nature of these "hypotheses" indicates a difference between the design sciences and the natural sciences. The "hypotheses" are not just about the "discovery" of scientific laws. They are teleological in spirit (Biocca, Kim, and Levy, 1995). They reflect human goals, the desire to exercise human will in the construction of an artifact, the very creation of virtual and cognitive reality. Are these goals attainable? I leave the response to another paper or to another 50 years of research. We might ask a more modest question: are these hypotheses sensible? Can they be founded on any valid evaluation of the technology or of the plasticity and abilities of the human mind? After all, if we hardly know what "intelligence" is, how can we hope to "augment" it? Each "hypothesis" will certainly require more profound theoretical elaboration as both research and design move forward. As an example, let's consider one set of ideas that would require more theoretical elaboration as they are transformed from visionary proclamation to a concrete theory of human-computer interaction. A number of the hypotheses share a common assumption that simply increasing the sensory fidelity or vividness of information will improve human performance. This is partially due to the logic of simulator design (e.g., Hays and Singer, 1989; Rolfe and Staples, 1986). It is assumed that the closer the simulator is to the "real" thing, the better the training.
When one thinks of plane, tank, or car simulators, this seems to have face validity. If someone is trying to learn motor sequences, it makes sense that practicing the actual sequences would be better than reading about them and imagining them. But it does not follow that the sensory fidelity or vividness of VR systems would generalize to an overall improvement in human performance. Research on the value of sensory fidelity using previous media like pictures, film, and video has produced inconsistent results. For example, there is little support for the notion that more vivid messages are more memorable or persuasive (Taylor and Fiske, 1988; Taylor and Thompson, 1982). It also appears that sensory vividness interacts with individual differences. For example, the sensory vividness of training materials interacts with the ability of students: in one experiment using pictures and videos, increased sensory fidelity assisted students of low ability but provided no assistance to those of higher ability (Parkhurst and Dwyer, 1983). Existing research on instructional training and simulator design is not uniformly supportive of the idea that increased sensory fidelity improves learning or performance (Alessi, 1988; Hays and Singer, 1986; Wetzel, Radtke, and Stern, 1994). One also has to ask a more basic question: Is any increase in sensory fidelity necessarily valuable? Increasing sensory fidelity provides more information, but not all the information is relevant to the user's communication goals or tasks. In some cases, the best way to use media to train someone involves reducing the amount of
F. Biocca
information. For example, we often use maps or schematics of objects - like engines or human internal organs - rather than pictures. The reduced information of the schematic helps the user to detect the relevant information, such as the location of various components. Learning a skill (e.g., a doctor's reading of chest X-rays) sometimes involves acquiring the ability to pick out relevant information from a field of noise and irrelevant data. Interfaces may reduce or alter the sensory fidelity of the image to selectively highlight the relevant cues. But the design value of some dimension of sensory fidelity is not always clear or obvious. We don't always know how the mind uses various sensory cues. Consider the following design decision: Should designers of a driving simulator simulate ambient "street and engine noise"? Will street and road noise increase or decrease the performance of a novice driver? Increasing the sensory fidelity of steering wheel dynamics is clearly more important than increasing the fidelity of street and engine noise. But a number of cognitive issues might bear on a decision about street noise. For example, there is the question of the user's attentional capacity: a novice driver is already bombarded with more information than he or she can handle. There is a question of information relevance: street noise might be just that, noise. It might carry little informational value. On the other hand, the changing acoustics of the tires on the road or wind noise as the car turns might provide some unconscious information about the automobile's velocity or attitude. There is, for example, ample evidence that car drivers use the sound of their car to detect changes in its performance. So even for a detail like the auditory simulation of street and road noise, the value for human performance is not clear.
While there is some valuable research (e.g., Gibson, 1966; 1979), we still know too little about how humans use sensory cues to assemble cognitive models of environments. But my brief discussion of the issue of sensory fidelity still has not addressed the larger question of intelligence augmentation: Can a medium's level of sensory fidelity ever increase human intelligence? Take my example of the car simulator above. What if we had the perfect car simulator, one that would reproduce every sensory detail of car driving: the feel of the steering wheel, the 3D visual world rolling past the windshield, the rattle of the doors and the shoosh of the wind rolling over the car body, the smell of the plastic car interior, etc. At its best, such a simulator would do nothing more than simulate what you probably experience every day - driving a car. Would this augment human intelligence? The fellowship of car drivers stuck in traffic jams all over the world would certainly shout, "No!" Before we rush to the judgement that something like sensory fidelity has little to do with augmenting human intelligence, we should remember one thing. Virtual reality is not really about reproducing reality. So my car simulator example leaves out a large segment of virtual environments. Simulation does not always mean reproduction. In fact, few media try to reproduce reality; rather, they select and amplify certain parts of human experience. Consider the last movie you saw. Was it "realistic"? Sure, the stroboscopic illusion of visual motion flowing on the screen had a certain level of sensory fidelity. But that visual sensory realism was attached to a camera. Through camera movements and zooms, your "augmented" vision travelled through space. It sometimes occupied positions in space you rarely occupy. Some moments you saw the scene through the eyes of one character, then, suddenly, through the eyes of another. Is this movement from one human identity to another realistic? Through editing, your
"augmented" vision jumped around unrealistically through space from one scene to another, from one place in time to another. Is this realistic? In fact, the whole format of the movie medium selected, abbreviated, and amplified all manner of human experience. The experiences of travel, love, death, and anger were all condensed and funneled through the medium. The medium may have simulated how we think rather than simulated reality. Do such codes and media augment intelligence? At some point in our history, they probably did (Donald, 1993). Can the further augmentation of human experience and training possible - or at least thinkable - in some advanced VR system augment human intelligence? Maybe. But we will have to better understand the psychology of communication and the way to encode and deliver information. Through this we might achieve the goal of intelligence augmentation. We might be able to support more of the mind's cognitive models so that human information processing can be increased in ability, complexity, and capacity. The work on human creativity and problem solving suggests that a medium for augmenting human intelligence will be based more on our understanding of how we use sensory information and imagery to encode, think, and solve problems (e.g., John-Steiner, 1985) than on simply increasing the power of a graphics supercomputer. But the illusions of the graphics supercomputer may give us a means to explore how we encode, think, and solve problems. A CONCLUDING NOTE The world-wide effort to rapidly develop virtual reality is motivated by a desire to augment human intelligence. Ideas related to intelligence augmentation have also permeated the culture. In the United States this desire is wrapped up in long-standing cultural beliefs about technology and human perfectibility (e.g., Marx, 1964). In this article I have also tried to show how the design hypotheses propelling VR technology are part of a fifty-year effort to augment intelligence.
In the vision of Vannevar Bush and his intellectual progeny, the computer would lead to unique cognitive technologies, cognitive environments that might free the human mind by enhancing its operation. What is clear at this point is that research in the design of virtual reality systems will attempt to push the envelope of human intelligence by creating new tools to amplify, augment, and adapt cognitive processes. It is not yet clear if this faith in the ultimate cognitive value of VR is justified or misplaced. REFERENCES
Alessi, S. M., 1988. Fidelity in the design of instructional simulations. Journal of Computer-Based Instruction 9: 335-348. Bardini, T., and A. T. Horvath, in press. The social construction of the computer user: The rise and fall of the reflexive user. Journal of Communication 45(2). Biocca, F., 1993. Will simulation sickness slow down the diffusion of virtual environment technology? Presence 1(3): 334-343. Biocca, F., and J. Rolland, 1995. Virtual eyes can rearrange your body: Perceptual adaptation to visual displacement in augmented reality systems. Submitted to Presence.
Biocca, F., T. Kim, and M. Levy, 1995. The vision of virtual reality. In: F. Biocca and M. Levy, eds., Communication in the age of virtual reality, 3-14. Hillsdale, NJ: Lawrence Erlbaum. Biocca, F., and B. Delaney, 1995. Immersive virtual reality. In: F. Biocca and M. Levy, eds., Communication in the age of virtual reality, 57-126. Hillsdale, NJ: Lawrence Erlbaum. Brooks, F., 1977. The computer scientist as toolsmith: Studies in interactive computer graphics. In: B. Gilchrist, ed., Information processing 77, 625-634. Amsterdam: North Holland. Brooks, F., 1988. Grasping reality through illusion: Interactive graphics serving science (Report TR88-007). Chapel Hill: Dept. of Computer Science, University of North Carolina at Chapel Hill. Bush, V., 1945, July. As we may think. The Atlantic Monthly, 101-108. Donald, M., 1993. The origins of the modern mind. New York: Cambridge University Press. Eco, U., 1976. A theory of semiotics. Bloomington: Indiana University Press. Engelbart, D., 1962, October. Augmenting human intellect: A conceptual framework. [Summary report, contract AF 49(638)-1024], 187-232. Stanford: Stanford Research Institute. Foley, J. D., A. Van Dam, A. Feiner, and J. F. Hughes, 1994. Computer graphics: Principles and practice. Reading, MA: Addison-Wesley. Furness, T. A., 1988. Harnessing virtual space. Society for Information Display Digest 16: 4-7. Furness, T., 1989. Creating better virtual worlds (Rpt. M-89-3). Seattle: HITL, University of Washington. Furness, T., 1993. Greetings from the general chairman. Proceedings of the IEEE Virtual reality annual international symposium, i-ii. Piscataway, NJ: IEEE. Gardner, H., 1977. Frames of mind. Boston: Harvard University Press. Gibson, J. J., 1966. The senses considered as perceptual systems. Boston: Houghton Mifflin. Gibson, J. J., 1979. The ecological approach to visual perception. Boston: Houghton Mifflin. Hays, T., and M. Singer, 1989. Simulator fidelity. Boston: Houghton Mifflin. Heilig, M., 1992.
El cine del futuro: The cinema of the future. Presence 1(3): 279-294. (Originally published in 1955.) John-Steiner, V., 1985. Notebooks of the mind: Explorations of thinking. Albuquerque: University of New Mexico Press. Kramer, G., 1995. Sound and communication in virtual reality. In: F. Biocca and M. Levy, eds., Communication in the age of virtual reality, 259-276. Hillsdale, NJ: Lawrence Erlbaum. Krueger, M., 1991. Artificial reality. New York: Addison-Wesley. Lanier, J., and F. Biocca, 1992. An inside view of the future of virtual reality. Journal of Communication 42(2): 150-172. Licklider, J. C. R., and R. W. Taylor, 1968, April. The computer as a communication device. Science and Technology 17: 21-31. Marx, L., 1964. The machine in the garden: Technology and the pastoral ideal in America. New York: Oxford University Press. McLuhan, M., 1966. Understanding media. New York: Signet.
McLuhan, M., and E. McLuhan, 1988. Laws of media: The new science. Toronto: University of Toronto Press. Meyer, K., 1995. Design of synthetic narratives and actors. In: F. Biocca and M. Levy, eds., Communication in the age of virtual reality, 219-258. Hillsdale, NJ: Lawrence Erlbaum. Morishima, S., and H. Harashima, 1993. Facial expression synthesis based on natural voice for virtual face-to-face communication with machine. In: Proceedings of the 1993 IEEE Virtual reality international symposium, 486-491. Seattle: IEEE. Murphy, M., 1992. The future of the body: Explorations into the further evolution of human nature. Los Angeles: Jeremy Tarcher. Parkhurst, P. E., and F. M. Dwyer, 1983. An experimental assessment of students' IQ level and their ability to profit from visualized instruction. Journal of Instructional Psychology 10: 9-10. Rheingold, H., 1991. Virtual reality. New York: Summit Books. Robinett, W., 1991, Fall. Electronic expansion of human perception. Whole Earth Review 17: 16-21. Rolfe, J., and K. Staples, 1986. Flight simulation. Cambridge: Cambridge University Press. Rolland, J., F. Biocca, R. Kancherla, and T. Barlow, 1995. Quantification of perceptual adaptation to visual displacement in head-mounted displays. Proceedings of the IEEE Virtual reality annual international symposium, 56-66. Piscataway, NJ: IEEE. Salomon, G., 1979. Interaction of media, cognition, and learning. San Francisco: Jossey-Bass. Shapiro, M., and D. MacDonald, 1995. I'm not a real doctor, but I play one in virtual reality: Implications of virtual reality for judgments about reality. In: F. Biocca and M. Levy, eds., Communication in the age of virtual reality, 323-346. Hillsdale, NJ: Lawrence Erlbaum. Sheridan, T., 1992. Musings on telepresence and virtual presence. Presence 1(1): 120-126. Sutherland, I., 1965. The ultimate display. Proceedings of the IFIPS Congress, 2: 757-764. Taylor, S. E., and S. C. Thompson, 1982. Stalking the elusive "vividness" effect.
Psychological Review 96: 569-575. Wetzel, C. D., P. H. Radtke, and H. W. Stern, 1994. Instructional effectiveness of video media. Hillsdale, NJ: Lawrence Erlbaum Associates. Winograd, T., and F. Flores, 1987. Understanding computers and cognition. Reading, MA: Addison-Wesley. Wooley, B., 1991. Virtual worlds. Oxford: Blackwell.
MODELING AND MENTAL TOOLS
Cognitive Technology: In Search of a Humane Interface B. Gorayska and J.L. Mey (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Chapter 4
PATIENCE AND CONTROL: THE IMPORTANCE OF MAINTAINING THE LINK BETWEEN PRODUCERS AND USERS David A. Good Department of Social and Political Sciences University of Cambridge, UK
[email protected] INTRODUCTION An important feature of various new information and communication technologies is the power they place in the hands of the user to choose between various activities and modes of operation as that user sees fit. This degree of control can range from the simple to the complex. The television remote control allows the supine viewer to easily browse a large number of channels as passing whims dictate. A similar remote control can guide an imaginary walk down a virtual mall, in which real interactive shopping can be done. A student working with a complex hypertext system can move between all sorts of material - graphics, text, sound - in a knowledge base seemingly without constraint as circumstances and desires dictate. At face value, this flexibility and user control can seem to be a highly desirable property. It certainly fits the current ideological climate, where the market rules and the consumer is supposedly sovereign over his or her choices. More importantly, it places power in the hands of the user, and who else would know best about that user's needs, and how to achieve user-centredness? Thus, user control and user-centredness would seem to accomplish a central part of the Cognitive Technology Agenda (CTA henceforth). As Mey (1992) notes, we should be seeking systems which avoid 'forced adaptivity' and display 'adaptability', and what could display this more than a system which adapts to the user, moment by moment, as that user expresses his or her needs and follows his or her desires? Anyone who has seen a supine channel-hopper, witlessly cruising through endless TV channels as each moment's boredom creates a demand for something new, might instinctively feel that the answer to this question is not a foregone conclusion. It would not require a particularly puritanical frame of mind to think that there is something vaguely distasteful or even immoral about systems which allow uncontrolled self-indulgence.
Instinctive responses to technological innovations must always be treated with caution, especially with the creation of devices which are intelligent. It is very easy to summon an image of Frankenstein's monster, or a Luddite fear, but instincts are not always completely wrong. Indeed, as will be argued in this paper, there are grounds for
believing that in many areas such extremes of responsiveness to user demand are not an unqualified benefit. There may, in fact, be more than a grain of sense in this instinctive moral reaction if we fail to distinguish between systems which are user-centred and systems which are user-indulgent. This is a distinction which in times gone by would have seemed quite pointless, as the only way in which an intelligent system could lead to self-indulgence would be if the user liked working hard. Now, however, we need to distinguish between systems in this way because user-indulgence can effectively destroy communication, and thus be quite harmful for individuals and the societies in which they live. This concern is implicit within CTA, but it needs elaboration and development. An important component of CTA is a concern with how new communication technologies, and intelligent devices which can be communicative agents in their own right, can change individuals and the societies in which they live (Gorayska and Mey, 1995), [1]. This change might be effected in a variety of ways. There might be direct cognitive consequences resulting from everyday experience with such devices, either for work or leisure. Alternatively, the model of intelligence and interaction which they proffer might be taken as a metaphor through which self could understand self and other, [2], [3]. It is in being concerned with these threats that CTA has an interesting and distinctive moral tone, not to be found elsewhere in the literature on developing these new technologies, but fairly common in other literatures on their social and political impact (see, for example, Cherry, 1985; Dizard, 1989; Murdock and Golding, 1985; Salvaggio, 1989). By comparison to that social and political literature, this moral focus is relatively ill-formed, and some might dismiss it for proposing a well-meaning, but naive and poorly considered, liberal sentiment of user-centredness.
It could also be construed, and similarly dismissed, as a dramatic and overly fearful reaction to these technologies. To dismiss the moral agenda it proposes in this way would, however, be to ignore the fact that it is rooted in a specific view of human psychology. That view of human psychology lends the agenda a validity which makes it much harder to dismiss, and also gives it a sharper focus. It is a view which gives a central role to the experience of social interaction in the construction and maintenance of human mentality. THE PRIMACY OF CONVERSATION In many, if not all, respects, face-to-face conversation is the basic model from which all other forms of human communication ultimately derive. For the young child, it is the medium through which he or she develops his or her understanding of language, its use, society, and the intelligence which pervades that society. Until not so long ago, it was the form which dominated human communication, and, although its central role has recently been challenged by the advent of various communicative technologies from the invention of writing onwards, it is still the environment in which humans typically operate. It is also the one they prefer, because it enables them to pursue a full range of personal goals (Rutter, 1984). Recent work on human evolution has even argued that it is in the individual's social life, and the demands which it produces, that we may find the real evolutionary pressures which led to primate intelligence in general, and human intelligence in particular (Byrne and Whiten, 1988; Good, 1995a; J. Goody, 1995). Indeed, many argue that social interaction and conversation are a necessary condition for human life, human intelligence and human society.
Unsurprisingly, however, there is much debate about what other factors are significant, and how their significance varies with respect to different cognitive domains. CTA reflects this primacy and argues for its continuing importance, but understanding how it should affect system and interface design, for example, is not a simple matter. The experience of building various intelligent interactive machines has encouraged and, perhaps, even necessitated that we view these devices as isolated entities whose connection to the rest of the world is very much a secondary consideration without real consequence for their essential character. Initially, the scope and manner of their activity was so limited that the nature of their connection to other intelligent devices, be they natural or artificial, did not seem to carry any implications for the structure of the systems themselves, nor any consequence for those who used them. To all intents and purposes they could be considered as stand-alone devices with their own cognitive properties, in so far as they had any, and user skill, flexibility and adaptivity were sufficient to ensure the link to the user. Having acknowledged this, though, it is important to recognise that the use of any object [4] can be seen as part of a dialogue. A dialogue, that is, in the sense that an object is created with a purpose in mind, and that an understanding of the designer's purpose informs our understanding and use of it. As a consequence we can understand successful design as successful dialogue, no matter how limited it is, and we can also expect that, in the same way that dialogue carries consequences for the participants, so too will the design process and the products which result from it.
In the simplest of cases, seeing use in this way adds little to our understanding of any object or its use, but as the complexity of manufactured objects grows, so does the importance of understanding the intention of the designer or creator, and so the sense of a dialogue grows. There are many simple examples which illustrate the idea, and it is a point which can be seen as underlying the early work of Duncker, Maier and others on phenomena such as 'functional fixedness' (Duncker, 1945; Maier, 1931). This expression describes the way in which experimental subjects faced with a problem find it extremely difficult to see an object as being used for anything other than the purpose for which it was designed. For example, they often failed to recognise that they could use a hand tool, such as a spanner, as a pendulum bob to solve Maier's two string problem. Seeing an object and its use as part of a dialogue becomes more important, but much more difficult, as the complexity of the object grows, particularly in the case of communications, information and computer technologies. The increased difficulty lies in the fact that it becomes progressively less clear who is in dialogue with the user, particularly when the ambition behind the design is to provide more effective communication between human actors. With the simplest communications technologies such as, for example, the telephone, this is hardly a problem, since the medium is seemingly transparent and the moment by moment intentions of the human users subordinate any concern with an understanding of the intended use of the system. With more complex systems the situation is complicated by the apparent intelligence of the device itself. While an on-line encyclopedia is just another way of communicating information from those who know to those who do not, the way in which it responds to requests for information from a user can lead to a sense of the machine being the partner in the exchange.
Thus, in an important way the user is in the position of communicating both through and with a system. If we are to understand the
consequences for individual users of different designs, and the extent to which user choice as an expression of user-centredness is desirable, then we need to consider in more detail how communication and information technologies transform the ideal speaker-hearer relationship. This can be interestingly done if we first consider the impact of the oldest and best studied communication and information technology, writing. INTERACTION TRANSFORMED BY TEXT Writing is, of course, not exactly what CTA is aimed at, but the way in which it transforms interaction has much in common with the ways in which other technological developments transform interaction. Central to these changes are the following. The speaker and hearer who become writer and reader are displaced in space and time from one another. The channel through which they communicate carries less information. The signal sent loses its ephemeral quality, but, because it endures, it provides an important form of information storage that is independent of the vagaries of human memory. The consequences of these changes, and this dialogue of a different kind, are said to be many. Some might be seen to be relatively beneficial. Other forms of discursive structure are developed, for literate and non-literate members of literate societies alike; more time is available for reflection in the production process for the speaker/writer; and the written page becomes a prosthetic device for the mind, both in the moment and in the longer term. Other consequences might not be thought to be so benign. Spontaneity is lost; the communication is impoverished in terms of its social and emotional content; and the precision of the written page can exert its own form of pedantic tyranny as the prospects for negotiating meaning are reduced (E. Goody, 1986; J. Goody, 1990; Illich and Sanders, 1988).
All these consequences are important, but at their heart is the fact that the very nature of the relationship between the two sides to the dialogue is changed, and thus so are the participants. Any spoken dialogue offers two principal roles for the participants [5]. On the one hand, those who speak need to compose something intelligible which can be interpreted by those to whom it is addressed, and failings of the composition are addressed there and then. On the other, those who are addressed can play an important role in revealing the success of the utterance which the speaker has produced. They are also required to pay attention, and be a competent, patient listener who is engaged in the speaker's project. From this simple fact of co-presence, and the system constraints of the participants considered both individually and together, a number of properties flow which provide all parties with resources, but also impose constraints on their actions. In written communication, however, neither party need pay the same kind of real-time, moment by moment attention to the other, and there is no compulsion to orient to a collaborative enterprise in the same way. In other words, while each can be more self-centred, this is especially the case for the reader. Unless the reader has some independent motivation for persevering with the reading of the text, he or she can play with it as they wish, or even completely disregard it. In face-to-face conversation, behaving in this way would be impossible if one wished to maintain any kind of relationship with the speaker. In brief, the other-centred listener can become the self-centred reader.
The reader's independent motivations for attending to a text might, however, be many, and could include all manner of extra-textual factors; yet it should not be forgotten that the text itself contributes to that motivation. Apart from the widespread conventions on how one reads, the structure of a text, its permanence and its scale give the writer resources for engaging and controlling the reader. The structure of the text is one way in which the author's presence is maintained in the dialogue. The reader also maintains a conception of authorship, and this too can be a constraint, as it evokes a notion, no matter how limited, of a relationship. No reader believes that a text created itself. Thus, by convention and by virtue of the text itself, a link is maintained between the writer and the reader. This itself can counter-balance potential egocentricity, and when it does, the intellectual demands of the task of reading contribute something more to an individual's mentality. Thus, in the case of writing, a gap may be opened between the speaker/writer and reader/hearer, allowing a degree of egocentricity to emerge, and this is especially so for the reader/hearer. Nevertheless, other demands and resources enter the picture to close this gap by providing the speaker/writer with a degree of control and authority. The demands of literacy, in turn, provide the reader with additional cognitive benefits. INTERACTION TRANSFORMED BY HYPERTEXT If we now turn to the other end of the technological spectrum and examine, for example, a powerful multi-media hypertext system in the light of the considerations which have just been applied to writing, it is easy to see that the potential for destroying the link between the archetypal speaker and hearer is much greater. The same factors to do with displacement in space and time apply, and two more potent elements come into play.
Both reduce the possibility of authorial control; one which confuses any understanding of the nature of authorship, and one which reduces or eliminates text structure. First, the very nature of the material, its quality, its variety, its dynamic, and its seeming intelligence, elevate the system to the position of interlocutor, but not as interlocutor. This is a new conversational role which completes the separation of speaker from hearer, and, since the occupant of this role, the system, has no rights or standing, the requirement for respect for and attention to the other disappears, and an egocentric mentality, on the part of the user, is permitted, and, perhaps, encouraged.
Second, as systems of this type become more powerful and flexible, the choices available to the user at any point rapidly multiply, so that the number of different routes through the material seems to be almost without limit. This entails that the author cannot assume that any user has necessarily arrived at any point by any specific route. Thus, although the elements of the hypertext are linked to one another by a web of connections, they must also be relatively discrete and self-contained. The result is that the elements become increasingly self-sufficient and reduced in size, while the whole becomes comparatively unstructured, and unconstraining on the activities of the user who is using it. This encourages a degree of self-centredness, because the user is encouraged to follow his or her own needs as seems appropriate to him or her, and it promotes a view of knowledge as a collection of discrete and fragmented parts.
SELF-CENTRED EDUCATION IS NOT USER-CENTRED EDUCATION If any part of this brief and gloomy picture is right, then what is threatened most is a particular way of learning and developing, and it is not clear that there is any effective substitute for it. To understand why this might be thought to be so, it is necessary to focus on a rarely examined but important paradox which I have explored elsewhere (Good, 1995b). This paradox originates in certain views of education which are quite influential, and which have a good deal of attraction for those constructing intelligent knowledge-based multimedia devices for use in education. What we traditionally identify as education is one area where these devices will be heavily used in the future, and their wide availability is quite likely to transform the institutional nature of education, and make it a ubiquitous, life-long activity. A fundamental premise of most education and instruction is that there is an asymmetry of knowledge between teacher and pupil. The teacher knows more than the pupil does. This does not mean that there are not cases where the less well-informed say or do something from which those who are more knowledgeable can learn, but these cases are in the minority, and rarely, if ever, are they cases where the educational event is intended. The aim of instruction or tuition is to reduce the difference between the student and the teacher by, amongst other activities, the transfer of knowledge, skills or ideas from one to the other. When we contemplate this problem, it is very tempting, and many have yielded to this temptation in the past, to view communication in education as a process in which the teacher interprets the student's current state of ignorance, and decides on what can be safely added to that knowledge base without either bemusing or boring the student.
Too much will do the former, too little the latter; and if what is offered is the right amount but is not configured in an intelligible form, confusion will again be the result. However, for a teacher to know what he or she might usefully say, under this scenario, that teacher needs to know what it is like to be in a state of ignorance. To put it another way (and this applies to most of our communicative activities): to know how to formulate an idea which is unknown to somebody else so that they can understand it, it is necessary to know what it is like not to understand it - which is an impossible demand. Now teachers manage to circumvent this difficulty in all sorts of ways, and education still happens. Common to all the tactics which are used is that, in some way or another, they rely on the experts in ignorance, i.e. the students, for advice. This may come directly, in relation to each student, from a contemporaneous dialogue in the classroom, or it might come via other teachers' experiences, or from other students at other times. Equally, of course, the students, the experts in ignorance, have a problem: they cannot say what should be presented to them, because they are ignorant. So education can only proceed by both sides working together to find out what it is that each needs from the other. This may well not be a dialogue of equals, because control depends on power and knowledge; but as a dialogue, it can only be exercised with the assent of those who are in the position of ignorance. In other words, effective teaching depends on collaboration and dialogue, and on the ability to take part in dialogue - an ability which depends on one's experience of dialogue.
All of this is a somewhat simplistic paraphrase of one part of Vygotsky's idea of the 'zone of proximal development' (Vygotsky, 1962/1934), which proposes that a child's development is dependent on the social life in which it can engage, given its interactional skills and the social world in which it lives. Central to this is the claim that
Patience and Control
the psychological growth a child can achieve at any point is constrained by this particular developmental space, which depends for its character on a number of factors. The most important of these is the dialogic skills of both the child and those with whom it is interacting. The contributions which others make in a conversation with the child effectively erect a scaffolding which enables him or her to develop, by helping to support a shaky capacity in the first instance. Furthermore, as children come to understand the roles that others may take with respect to them, they can internalise an understanding of the resulting dialogues, and so extend their own cognitive skills in a more self-reliant fashion. Doing this depends upon the experience of working with others, empathising with their aims and ambitions, and a degree of patience and willingness to comply with the demands they make. The totally self-indulgent child who wishes to do only what he or she fancies would never have this kind of experience, and would suffer as a consequence. It is the wise child who learns that suspending one's disbelief and boredom, and paying close attention to the speaker, is an important step. Vygotsky's description of child development is not one which loses its importance when the child reaches adulthood. There are many reasons to believe that the dialogic imagination is very important for all manner of intellectual activities, and it is a form of mental life which is only preserved in its use. There is no alternative [6]. This line of argument has been explicitly offered by Laurillard in her work on the impact of various kinds of educational technology on university-level education (Laurillard, 1994). As an educational technology specialist, she is fully cognisant of the potential of these new technologies for enhancing and extending educational opportunities.
However, she is equally aware of the need to understand the different kinds of learning experience which a student of any age needs. A central element in her account is an emphasis on the way in which interacting with someone places interpretative and expressive demands on us which simply do not arise in any other context. These demands are important, not only for developing the individual's communicative skills, but also for developing the intellectual capacities which make communication worthwhile. Interestingly, this conclusion is also being recognised by a number of those involved in recent UK programmes for the introduction of IT to higher education (Mayes, 1995). If the development of different systems does take the separation of speaker and hearer to a point of complete isolation of one from the other, then certain consequences will follow. Authorial control is severely limited, and the reader's patience is a virtue which is no longer rewarded. The implication of this view is that understanding how a system might best be structured to serve the needs of a user is not simply correlated with the users' experiences at any particular stage of their use of the system. It is an old lesson that the user of any object or instrument, or the reader of any text, will persevere in the face of great difficulty if there is some reason to have faith in the author or creator of that text or object. It is not unusual for great benefits to flow from such perseverance when there is a temptation to succumb to an easier alternative. This lesson should be borne in mind for CTA.

CONCLUSION

In this paper, I have sketched an argument that it is important not to confuse user-centred with self-centred, and speculated that the former is often satisfied by an arrangement where it is not assumed that the user always knows best. The moral
agenda which is part of CTA raises important issues about the relationships between people as transformed by technology, because it links the nature of human mentality to the social life which is led. Consideration of this will hopefully lead to appropriate, effective, and useful technology which extends and enhances human capacities and activities. Those developments can only be successful if the humans in question are left with the capacity to be integrated and connected members of the societies in which they live. This depends on their dialogic abilities, which in turn are a major force in the establishment of their intellectual abilities. These will come from many different experiences in many different domains, and it is quite clear that the experience of new forms of relationship between the parties to any kind of dialogue does not in itself pose a threat, as our collective experience of literacy shows. However, if the link between those parties in this new communicative domain is broken by the replacement of one of them with an intelligent device which need not be respected, and at the same time narrative structures are destroyed, the model of knowledge and communication offered could ultimately be far more damaging.

NOTES

[1] It is important to bear in mind, both here and later, that the user's conception of any supposedly intelligent system is an important consideration. This point is forcefully illustrated by the study of a radical psychotherapy service in which clients were asked to put ten yes-no questions into a microphone and, after each question, offer an interpretation of the 'yes' or 'no' answer which had been given by a light coming on. Believing the answers to be coming from a trained psychotherapist, some subjects were able to interpret the most bizarre sequences of answers as meaningful. They had, however, been misled: there was no psychotherapist, and the answers were randomly generated (McHugh, 1975).
[2] Both academics and non-academics alike are prone to take all kinds of metaphors from the wider world for understanding themselves and others, and there is no reason to believe that an instance which can be so intimately known will be free of this tendency.

[3] There is a faintly amusing irony in this concern, because the technologies which potentially pose this threat almost certainly owe their existence to the flexibility and creativity of mind which, CTA assumes, developed from the social life which is threatened. This ironic sense verges on the paradoxical when we realise that the flexibility of mind which enables the user to display a high level of 'adaptivity', and so adapt to all sorts of technology in the first instance, is also the capacity which makes that user, or even the society of users, vulnerable to the iniquities of 'forced adaptivity'.

[4] One might simply consider created objects at this point; however, found objects are created as something new in the light of their use, and so it becomes quite difficult to specify the boundary between natural and manufactured.

[5] Restricting the number of participant roles to just two, speaker and addressee, ignores the fact that there are many different kinds of role in conversation apart from these two; but for the sake of the current discussion, the original Adam and Eve of conversation will do. See Levinson (1988) for a discussion of the variety of roles, and reasons for taking them seriously.
[6] It is interesting to note how children with exceptional mental skills in very narrow domains, children who are often known as idiots savants, are often autistic and have very poor or non-existent social skills. The mental feats they can perform - for example, calculating what day of the week any date will fall on - often seem amazing in the computational power they require compared to what most of us can do, but they seem to be feats almost totally lacking in intellect as we normally construe it.

REFERENCES
Byrne, Richard, and Andrew Whiten, eds., 1988. Machiavellian Intelligence. Oxford: Clarendon Press.
Cherry, Colin, 1985. The Age of Access: Information Technology and Social Revolution. London: Croom Helm.
Dizard, Wilson, 1989. The Coming Information Age. 3rd edn. London: Longman.
Duncker, Kurt, 1945. On problem solving. Psychological Monographs 58: 270.
Good, David, 1995a. Where does foresight end and hindsight begin? In: E.N. Goody, ed., Social Intelligence and Interaction, 139-149. Cambridge: Cambridge University Press.
Good, David, 1995b. Asymmetry and accommodation in tutorial dialogues. In: R.J. Beun, M. Baker, and M. Reiner, eds, Dialogue and Instruction, 31-38. Berlin: Springer-Verlag.
Goody, Esther, ed., 1995. Social Intelligence and Interaction. Cambridge: Cambridge University Press.
Goody, Jack, 1990. Technologies of the Intellect: Writing and the Written Word. Memorandum Nr. 5, Projektgruppe Kognitive Anthropologie, Max-Planck-Gesellschaft.
Gorayska, Barbara, and Jacob L. Mey, 1995. Cognitive Technology. In: Karamjit S. Gill, ed., New Visions of the Post-Industrial Society: The Paradox of Technological and Human Paradigms. Proceedings of the International Conference on New Visions of Post-Industrial Society, 9-10 July 1994. Brighton: SEAKE Centre.
Illich, Ivan, and Barry Sanders, 1988. ABC: The Alphabetization of the Popular Mind. San Francisco: North Point Press.
Laurillard, Diana, 1994. Rethinking University Teaching. London: Routledge.
Levinson, Stephen, 1988. Putting linguistics on a proper footing. In: P. Drew and A. Wootton, eds, Erving Goffman, 161-227. Cambridge: Polity.
Maier, N., 1931. Reasoning in humans II: The solution of a problem and its appearance in consciousness. Journal of Comparative Psychology 12: 181-194.
Mayes, T.A., 1995. Paper to CAL 95, Queens College, Cambridge, April 1995.
McHugh, Paul, 1968. Defining the Situation. Indianapolis: Bobbs Merrill Inc.
Mey, Jacob L., 1992. Adaptability: reflections. M and Society 6: 180-185.
Murdock, Graham, and Peter Golding, 1989. Information poverty and political inequality. Journal of Communication 39: 180-195.
Rutter, D.R., 1984. Seeing and Looking. London: Academic Press.
Salvaggio, J., ed., 1989. The Information Society. Hillsdale, NJ: Brooks Cole.
Vygotsky, Lev S., 1962. Thought and Language. [Originally in Russian, 1934.] Cambridge, Mass.: MIT Press.
Cognitive Technology: In Search of a Humane Interface
B. Gorayska and J.L. Mey (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 5

"AND YE SHALL BE AS MACHINES" - OR SHOULD MACHINES BE AS US? ON THE MODELING OF MATTER AND MIND*

Hartmut Haberland
Department of Languages and Culture, University of Roskilde, Denmark
[email protected] If adaptation (cf. Mey 1994 on 'adaptability') is one of the big words in Cognitive Technology, the question immediately to be asked is: who adapts to what (or what adapts to whom)? In communication between people and machines, do people adapt to machines or do machines adapt to people? This sounds like a variation on HumptyDumpty's famous remark that it all depends on the question which is to be master (as he told Alice). Even though we, as users of intelligent machines, sometimes may feel that we are victims of their stupidity, this should not be the case since after all, there is a fundamental built-in asymmetry: machines are programmed by people. The question which is to be master is thus settled from the start, one should assume. However, such is not the case. On the one hand, there is nothing uncommon in a situation where human beings create a structure and lose control of it. When Marx talked about alienation, he had this in mind: humans are confronted with societal structures which are the works of their likes, but they experience them as something 'objective' they cannot change. This also means that they can learn how to deal with these structures, to adapt to them, without actually understanding them. "Sie tun es, aber sie wissen es nicht," as Karl Marx characterized this state. 1 The case is comparable to that of the kula as analyzed by Malinowski (1922), viz. the trading cycle of highly-valued but intrinsically worthless objects among the islanders off the NE coast of New Guinea. This trading cycle involves an astonishing number of people who never have had the full experience of all the parts of this cycle, and to our knowledge
*The first half of the title is taken from the title of Mey (1984). - I should probably thank Wolfgang Fritz Haug at this point. He really introduced me to philosophy, although I do not know what he will think of this piece when he reads it. Jens Balslev gave me a hard time years ago when I tried to convince him of the very position which I am attacking here. Special thanks go to Jeremy Franks for a discussion of how to translate Gustafsson into English. Søren Schou has shared his knowledge about Lars Gustafsson, and his copy of Utopier, with me. While writing this paper, I got a very encouraging electronic note from Lars Gustafsson, which is gratefully acknowledged here. And a very special thanks to the Editors, Barbara and Jacob, for maieutic help.

1 "They do it, but they don't know it."
this cycle has not been devised by any one mastermind, but has developed through practice. Still, everybody knows exactly what his role is in the cycle. The question of which is to be master has turned meaningless here: though a thoroughly human product, the machinery of an abstract structure (such as a patterned habit, or an institution) has taken command over the individuals functioning in it. On the other hand, the asymmetry of the relationship between human users and programmers on the one hand, and intelligent, programmed machines on the other, is just another reflection of the asymmetry that crops up whenever we talk about consciousness. Already in Cartesian dualism, res cogitans and res extensa are not endowed with equal opportunities: mind can be conscious of matter, but not the other way around. In Turing's (1950) famous Gedankenexperiment (the one which should enable us to decide whether a machine is intelligent or not), it is an observer that has to be convinced by the machine that it is intelligent; this role cannot be taken by a machine, if only for the reason that the machine could not see the point of getting an answer to the question whether machines can think. In Marvin Minsky's classical treatise 'Matter, Mind and Models' (1968), the role of the observer is duly emphasized in connection with his discussion of models. Minsky uses the term 'model' in the following sense: "to an observer B, an object A* is a model of an object A to the extent that B can use A* to answer questions that interest him about A." (1968: 426)
Now if a human being M is interested in answering questions about the world W, he or she would use a model W* of W. This model could be inside M, but at the same time contain a model M* of M (since M is part of W). M* can then contain a model W** of W*, and so forth. All these models would be motivated by the specific type of questions they can answer (e.g., my built-in model M* of myself can answer questions like how old or tall I am, but not what kind of thing I am - this question would have to be referred to a model M** of M*). But (and this is the point that interests us here) although all these models can in principle be emulated by programmed machines - M* does not have to be in M, but can be programmed and exist somewhere outside M - there is no point in relegating the task of the observer to a machine. Machines can be used to answer questions, but they cannot genuinely be interested in asking questions. So "humans are in a privileged position" (Edelman, 1989: 22). By this, Edelman means that humans can report about their consciousness, whereas they are dependent on inference when discussing consciousness in animals (assuming that animals have one). Traditionally, the assumption of human privilege amounts to self-consciousness (together with self-conscience) being the ultimate, irrefutable, irreducible property specific to humans. Still, the question is lurking: and what if it is not? How can we prove that we are privileged in this way? The fact that we want to be privileged does not prove that we are. Neither does the introduction of dualistic assumptions (man-animal, mind-matter, and so on) excuse us from deconstructing the presumed privilege. Originally conceived as a means of establishing the superiority of res cogitans (that means us) over res extensa, dualism can turn against itself. If dualism wants to say anything sensible about
the privileged member of the mind-matter dichotomy (and here I am tracing Minsky's argument against free will (1968: 431)), it has to apply models of the mind based on the structure of its opposite, viz. matter. From this to the use of an ontological metaphor like THE MIND IS A MACHINE (as acknowledged by Lakoff and Johnson (1980: 27)2) is not a big step. An historically adequate, literary expression of the shock created by the realization of possibly losing this privileged position is found in a poem by the Swedish author Lars Gustafsson, originally published in 1966, with the title Maskinerna, 'The Machines'3
The Machines 4
Lars Gustafsson

Some came early, others late,
and outside the time where it exists
each one of them is homeless.

Hero's steam ball. The Voltaic pile.
The ballista. Polhem's ore hoist at Falun.
Curiosities: The "pneumatic fan."
Una macchina per riscaldare i piedi.

We only perceive machines as homeless
When they belong in another century.
Then they become obvious, they acquire a meaning.
What do they mean? Noone knows.

The crankshaft device: a way of transmitting power
over long distances with the aid of two levers
moving backward and forward.
What does the crankshaft mean?
2 and exemplified by English expressions like 'to grind out a solution', 'my mind isn't operating today', 'I'm running out of steam', etc.
3 It is only fair to acknowledge that Lars Gustafsson in 1995 does not take the same philosophical stance in these matters as he did in 1966, as he informed me in the electronic message referred to above.
4 Translated by Yvonne Sandstroem. Quoted by kind permission of the University of Minnesota Press from Modern Swedish Poetry in Translation, ed. by Gunnar Harding and Anselm Hollo, Minneapolis 1979, 75-76. The Swedish original is reprinted in Gustafsson (1969).
DIE BERGWERKE IM HARZ ANNO 1723

The picture teems with people.
People, small like flies
go up and down in the buckets,
the object marked "J" in the picture,
"La Grande Machine," by the fresh waterfall,
drives all the cables.

Noone has ever combined -
as would be perfectly possible -
crankshaft and steam engine
Voltaic pile and Hero's ball.
The possibility remains.
A foreign language that noone has spoken.

And actually: Grammar itself is a machine
Which, from innumerable sequences
selects the strings of words for intercourse:
"The healthy instruments", "the productive parts",
"the cries", "the muffled whispers".

When the words have vanished, grammar's left,
And it's a machine. Meaning what?

A totally foreign language.
A totally foreign language.
A totally foreign language.

The picture teems with people.
Words, small as flies
go up and down in the buckets
and the object marked "J",
"La Grande Machine" by the fresh waterfall,
drives all the cables.

A few years later, Gustafsson published an analysis of his own poem in a collection of essays (Gustafsson, 1969). This analysis gives us a number of technical explanations of matters not obvious to the modern reader. Whereas Hero's steam ball, the Voltaic pile and even the ballista may still be generally known, we must consider the great Swedish engineer Polhem's ore hoist at Falun, blankstötsspelet, as less well known, perhaps even in Sweden. (Figure 1 shows a detail of blankstötsspelet.) Not many people today are familiar with a crankshaft device, unless they realize that it is the very same principle, viz. the lever system, that propels the wheels of a steam locomotive. Yet, this
contraption was an extremely common sight around the ore and coal mines of the 17th century, having a function comparable to today's power lines; by this device, mechanical power could be transferred through a system of levers and shafts from its source (e.g. a waterfall driving wheels) to its place of application.
Figure 1. Polhem's ore hoist at Falun. The machinery of a machine appears most clearly when it is seen outside its historical context. (Detail from a copperplate by van den Aveelen in Eric Dahlberg's 'Suecia antiqua et hodierna', 1701)
Gustafsson's poem obviously deals with alienation; but not only alienation in the sense of Marx's Entfremdung (as mentioned above), but also in the sense of Brecht's Verfremdung. Machines take on a foreign character, they become "homeless", when seen in a different historical context. Seen from the vantage point of the 20th century, erstwhile immensely useful machines like the Falun ore hoist or the crankshaft device share their place in history with utter curiosities like a feet-warmer machine. Gustafsson links the eerie mood that overcomes us when we look at old prints of mechanical devices to the shock that we experience on seeing one of the functions of our mind, language, described as the output of a machine. As Gustafsson himself points out in his self-analysis, the historical locus of this shock is what much later was called the Chomskyan revolution. Gustafsson's poem reflects the poet's amazement at the fact that language could just be a machine rattling off sentences in our mind; sentences that, when spoken, are taken for utterances by our listeners. Noam Chomsky's characterization of grammar as a machine was the point of departure for Gustafsson's poem. It is conceivable that we are machines ourselves, and that we would not be able to tell.
"The symbolic value of the machines consists in the fact that they remind us of the possibility that our own life in some way is simulated in the same way in which the machine simulates life." (Gustafsson 1969: 40, my translation5)

Now this is not a necessary consequence of reading Chomsky, neither now nor in the 60s. First, Chomsky's theory was never meant as a theory of linguistic communication; his view of human language is basically and inherently extra- or metacommunicative, and thus his theory is only a partial theory of human language. Chomsky would be the first to admit this, simply because he does not assume that communication is the primary raison d'être of human language.6 He is not primarily interested in human communication but in human language and the human mind. This is obviously at variance with our common experience of human language. Leaving consciousness aside, which either has to be inferred (in the case of animals, if they have it) or can be reported on (by humans, as in grammaticality judgments), we have a via regia to language which depends neither on inference nor on reports: language use. Gustafsson's poem shows this indirectly, when he talks about "the strings of words for intercourse"7 that the grammar machine selects from an infinite set of sentences. In Chomsky's original view, grammar is a device that recursively enumerates the infinite set of sentences of some language. In Chomsky there is no talk about anyone (and certainly not about a grammar) selecting any of those sentences for interaction with another mind (a mind which embodies another grammar). But if Gustafsson had

5 In the Swedish original: "Maskinernas symboliska värde ligger i att de erinrar oss om möjligheten att vårt eget liv är på något sätt simulerat i samma mening som maskinen simulerar liv."
6 Ironically, the 'formalist' Chomsky is joined here by the 'functionalist' Malinowski, who also thought that communication was only a secondary function of human language, based on a communion which joins speakers and hearers without necessarily communicating something about some third person or object; cf. Malinowski (1923: 316) and Haberland (1984: 18).
7 In Swedish, samfärdselns ramsor. I would have preferred a translation of samfärdsel as 'interaction' rather than 'intercourse', to avoid a preemption of the picture which only emerges in the following lines of the poem as a consequence of the ambiguity of 'intercourse'.
followed the orthodox view, the word would not have become flesh (as it certainly does, when he talks about "productive parts" and "muffled whispers"). Instead of the instance of language use suggested by Gustafsson, we would have had two minds comparing notes about the identity of two recursively enumerable sets, or at most two linguists comparing grammaticality judgments. Thus the role of language use is reclaimed by the workings of poetic truth. Chomsky's theory of language is, possibly, a theory of the human mind, but not of human beings interacting with the help of the "healthy instruments" of language. The second objection is that although Chomsky may be able to describe the human mind as a Turing machine, this does not prove that the human mind is a Turing machine, nor even that the human mind could be a machine. But only if it could be a machine is the shock real - the shock induced by our falsificationist powerlessness, viz. that if it were not the case, we could still not prove it. The mere fact that the grammar device can enumerate all, and only, the sentences of a human language does not make it simulate the human mind: it just emulates an important part of its functioning. This difference between simulation and emulation is crucial; only a simulation can claim to be structurally analogous to its Urbild as a model.8 At this point, we must make sure that we distinguish properly between models and metaphors. The notion of model is not in itself without its problems, as Yorick Wilks (1974) has reminded us. In mathematics, 'model' is used in a specific sense - and this practice goes back ultimately to Tarski - viz. in the sense of a 'second interpretation of a calculus'. Since this interpretation (e.g.
when we in formal semantics talk about 'truth in a model') is often more concrete than the first interpretation, one easily gets the impression that mathematicians use the term in exactly the opposite sense from the sense established in the behavioral sciences, where models tend to be more abstract than what they model. I'll leave this question aside here - even though the difference may only be apparent, it can still cause confusion; rather, I want to emphasize that both models and metaphors are ternary relations between a user or observer and two objects or concepts, of which the one is to be understood with the help of the other. For that reason, both models and metaphors are crucially dependent on the people that employ them. If we compare Minsky's explication of a model, quoted above, with Lakoff and Johnson's account of metaphor, where one concept "is partially structured, understood, performed, and talked about in terms of" another (1980: 5), and metaphor "allows us to comprehend one aspect of a concept in terms of another [concept]" (1980: 10), then one difference should be clear: in metaphors, the two objects we are talking about can in principle be exchanged. If we understand argument in terms of war (since we make use of the cognitive metaphor ARGUMENT IS WAR), this is because the concept of argument is "partially structured, understood, performed and talked about in terms of" war, and then we can also talk about war in terms of argument; if we can understand one aspect of argument in terms of war, then we can also understand one aspect of war in terms of argument. Likewise, if the mind is partially understood on the basis of the metaphor THE MIND IS A MACHINE (like 'my mind is on the blink'), we also have metaphors that describe machines by exploiting their similarities to the human
8 In the sense of Mey (1972), Chomsky's theory of competence is a descriptive, not a simulative model.
mind (like 'the machine has gone crazy '9 ), or body (like 'to feed data into the machine'), or even the whole human being (like 'the machine is on strike'). Contrary to the case of metaphor, the relationship between a model A* and what it is a model of, viz. A, is not symmetrical. 1~ If it is to make any sense for us to answer questions about A by asking them about A*, then A and A* cannot exist independently of each other. (Cf. what the astrologer does, viz. answering questions about something observationally non-accessible, the future (A), by asking questions about something more or less abstract, but observationally accessible: relationships (A*) between heavenly bodies. This presupposes that one believes in some pre-established homology between A and A*, although not necessarily in some causal relationship of A* to A, as vintage astrology would assume.) A* must specifically be constructed as a model of A, and A cannot at the same time be a model of A* in the sense that it helps answer questions about A* (although nothing is wrong with A* being part of A, which means that A* can contain a model A** of itself, which is something completely different). Using Turing machines as models of the human mind is attractive, because within an automata-theory based hierarchy, Turing machines are the simplest automata that are powerful enough to answer relevant questions about the set of sentences in any human language. (One of Chomsky' s achievements is the proof that nothing less than a Turing machine will do as model for the recursive enumeration of the sentences of some human language L.) The Turing machine, for all its power, is at the same time a welldefined and reasonably simple device which makes it possible, in a relative straightforward way, to study the formal properties of the languages it generates. 
The value of the answers to the questions directed at model A* is, on the other hand, dependent on how well A* and A match: in this case, on how much the sets of sentences generated by the grammar A* have to do with the actual language used by A. A relevant question is whether the concept of an infinite set of sentences generated by A* can be interpreted in a meaningful way with respect to A. As we see, even though Turing machines have been used as models of the human mind, it is doubtful whether they could be used as models of human language, if one insists on interaction (or at least communication) as an essential aspect of human language. This is a very different matter from the use of metaphors like THE MIND IS A MACHINE in everyday language. Even if we restrict ourselves to information processing machines (which is not required by this cognitive metaphor), such machines, although theoretically equivalent to a Turing machine, are much more complex than the latter and actually much less well understood. There is often no easy way of predicting their behavior at a given task other than letting them execute it, simply because a computer C* that is supposed to model another computer C cannot be any faster or simpler than C.

9 On machines going crazy, cf. Engström (1977).
10 To me it seems that it is here that the mathematician's use of 'model' is at variance with its use in the behavioral sciences. If the model serves the purpose of establishing that the 'first interpretation' is consistent, then the roles of the two interpretations can be reversed; it all depends on which questions one wants to ask.
11 One of the results of these investigations led to a paradox: although a grammar for natural languages has to be at least as powerful as a Turing machine, Turing machines are not restricted enough, since they also generate sets of strings which never could be the set of sentences of some human language. I am referring here to the work done by Peters and Ritchie (1969, 1971), and others. The successive attempts to solve this paradox have led to the different paradigms of transformational-generative grammar, to Government and Binding theory and beyond.
Modeling Matter and Mind
The actual effect in understanding that results from comparing the human mind with an actual computer is, therefore, rather limited. Similarly, many of the more specific instances of the general metaphor THE MIND IS A MACHINE do not really explain the mind through the workings of a computer but take their point of departure in an experience with computers, as in 'My mind went totally blank (sc. like a screen)'. But if the human mind cannot be explained by reference to a computer, then maybe computers can be explained by reference to the human mind? In the terms used earlier, this would mean that the observer M has inside her- or himself a model C* of the programmed computer12 C whose behavior she or he wants to understand. Like a model C* of C on a computer, such a model inside M will be neither faster nor simpler than C, but this is not so much of a problem: many of the questions one would want to ask about C are best answered by inspecting C itself anyway, in a way that would not be possible for questions about M itself. Computers cannot report about themselves, but we are not totally dependent on inferencing about them as we are with animals; computers allow for a certain amount of inspection (we can read their programs, for example). What we cannot ask the computer (at least not as ordinary users) are those questions which (in Minsky's terms) really are questions about C*, i.e., questions of a more general character, such as which kinds of questions C can answer. If a computer reports an error ("I cannot understand this input"), we cannot sensibly ask it, "What kind of input would you be able to understand then, if I may ask?". We simply may not ask; at least, we cannot direct the question at the computer C itself. In order to answer these questions, a model C** of C* is needed, and this model can exploit a metaphorical understanding of C in terms of M.
If we look at how people deal with this problem in practice, we find that they usually direct their question either at the manual or at the superuser next door. Both are expected to be able to function as this model C**. Manuals are useless most of the time, and only rarely give us the answers we are looking for, simply because they are not conceived of as such models. They are often little more than sophisticated descriptions of the inner workings of C and seem to have been written in happy ignorance of what kind of questions they should provide the answers to. Superusers can sometimes help, but they are rarely capable of formulating how they arrived at their superior knowledge: they have a poor model of themselves. But if better models could be developed, both of C, exploiting human-machine metaphors of the right kind, and of the users of C, this could help empower computer users. This does not, of course, mean that computers really are people (just as we have seen that it is not the case that people are machines). But it does mean that it may sometimes help to look at them as if they were (albeit very quaint) people. This is what we do in our everyday metaphorical talk about computers, and this talk is legitimate, as is every metaphorical effort at understanding something. Meaning, after all, is in the use. The fact that vintage machines lose their meaning for us follows from their uselessness. If meaning only emerges from use, then being without use must mean being without meaning. If a sentence is not used, but only enumerated by a machine, it stands out from its background in Gustafsson's sense,
12 By programmed computer, I am not referring to the hardware but to what one often calls a system or the program, i.e., whatever users experience as the instance they interact with.
exactly in the way sentences stand out as numbered examples in a standard grammatical treatise. Thus exposed, they become visible, but they also become homeless. We still know that they must have a meaning (grammars only generate objects with potential meaning), but we do not know where to apply for their meaning. If indeed we knew where to apply for such a meaning of the computers we are dealing with, we would finally be able to settle Humpty-Dumpty's question.

REFERENCES
Edelman, Gerald M., 1989. The remembered present. New York: Basic Books.
Engström, Göran, 1977. Some analogies between adaptive search strategies and psychological behavior. Journal of Pragmatics 1(2): 165-170.
Gustafsson, Lars, 1969.
Figure 2 - A topic menu from Casper
A. Kass, R. Burke and W. Fitzgerald
Tutor:
That sounds like a good idea. Here is the kind of information an Operations Officer suggests you collect.
Video:
(The tutor plays a video of an engineer explaining how to gather information about discoloured water problems. Among other things, the engineer suggests that one should ask how the water looks, tastes, and smells, and whether the neighbors have the same problem.)
The student can return to the conversation with the simulated customer at any point to make use of the guidance that the tutor has provided: Student:
Please describe the colour of your water.
Customer:
My water is a sort of funny brown colour, with a kind of sandy sediment in it.
Student:
Are any of your neighbours affected?
Customer:
Yes, I rang Mr. Wellman in the next house over. His water is just the same.
The student may proceed with the conversation for quite a while, gathering facts about the case. One thing the tutor looks for as it monitors the conversation is a premature conclusion that may upset the customer, as in the following example:

Student:
Your water is contaminated.
Tutor:
Contamination is a very serious problem. Telling the customer that his water is contaminated may cause him to panic. Watch this video about a CSR who caused a customer to panic.
Video:
(A video of an experienced CSR talking about a CSR who once caused a customer to panic simply by asking the customer whether there was anything swimming in her water. The customer immediately assumed that the question implied that there were things swimming in the water. After getting off the phone with the CSR, the customer called the health department and told them that the water company told her that there were eggs in the water which would develop into swimming insects.)
Tutor:
So you should be very sure not to panic the customer, especially if you are not sure what the problem is. Why do you think Mr. Lamb's water is contaminated?
The student can answer the tutor's question by clicking on some items in the transcript: for example, the customer's indication that his neighbors are affected and that the problem has only existed since this afternoon. After giving specific explanations of how each of those items gives a small amount of evidence for the contamination theory, the tutor then summarizes as follows:
How to Support Learning

Tutor:
Those examples give weak evidence toward the possibility of contamination being the cause of Mr. Lamb's problem. In addition, contamination seldom occurs. You should first consider more likely theories. Would you like to retract your statement to the customer about the water being contaminated or leave the statement be?
The student might then choose to retract the statement, "Your water is contaminated."

YELLO

Yello teaches the fine points of selling Yellow Pages advertising. Yello presents the student with an assignment to sell Yellow Pages advertising to a client. The task for the student is to get to know the client's business, come to some understanding of its market and its advertising needs, construct a proposal geared to these needs and the client's concerns, present that proposal in a convincing way, and get the customer to buy. Different kinds of customers demand quite different ways of performing these tasks: salespeople would not ask a lawyer the same questions as they would a roofing contractor, nor would they address them in the same way. Students have a wide variety of options for action, some of which are appropriate for a given client, some of which are not.

The Yello environment
Unlike Casper, where all of the action takes place in the office, on the phone, students in Yello go to their customers' places of business and meet them in person. The interface must show appearances, both of places and of people. The Yello interface therefore places a significant emphasis on showing pictures. As shown in Figure 3, the largest portion of the screen is devoted to a visual scene that shows the environment where the student is located. Students can gather important clues from the appearance of a customer's place of business. For the same reason, students are shown visual depictions of the characters with whom they are interacting. In addition to the conversational actions found in Casper, students in Yello must perform physical actions in the course of the sales process. Students compose proposed ads to present to customers, and prepare presentation materials, which are displayed during the sales call. The action constructor in Yello therefore contains a wider range of actions, including physical actions such as moving around in the simulated world and assembling material to bring on the sales call. Yello's characters communicate using text, which appears in text bubbles as shown in Figure 3, rather than the audio found in Casper. In this instance, we have sacrificed some of the realism of the simulation for the ability to adapt the simulation easily. Tone-of-voice and other expressive components of face-to-face interaction are absent from characters' responses using this method. In their place, we have the emotional display meters that appear below each character's picture. These are explained more fully in the example below.
Swain Roofing: a sample scenario

The Yello scenario we show is one in which the student is selling to a roofing contractor, Swain Roofing. The cast of characters includes the following:

Ed Swain:
The owner of Swain Roofing and the person listed as the primary contact for the account.
Lucy Swain:
Ed's wife, the office manager for the company. She keeps the books and answers the phone.
Dave Swain:
The Swains' son, who is gradually taking over more of the business.
As is typical for small contractors, the business is run out of the Swains' home. Swain Roofing currently has a quarter-page ad in their local Yellow Pages directory. The underlying business situation, which the student may or may not uncover, is that the Swains' primary business, residential roofing, has been undercut by lower-cost, lower-quality competitors. The Swains are trying to get more business in the area of commercial roofing, where they feel their high-quality approach will be more valued. They are also interested in expanding their residential business into areas of the county they have not traditionally serviced. Ed and Lucy also intend that their son, Dave, take over the business, and are concerned that it might not be strong enough for him to make a good living. Ed has given Dave responsibility for the advertising, and Dave has started to look into direct mail as a means of getting to potential customers. A salesperson who asks the right questions and discovers these issues may be able to sell Swain Roofing a large ad campaign, including advertising in a "business-to-business" directory, a larger version of the current ad, small ads in headings other than "roofing," and display advertising in one or two directories in adjacent areas. A student who is less capable may have to settle for a renewal of the existing ad. The scenario begins with the student (who we will assume is named Mike Johnson) receiving the account information for Swain Roofing. Using the action constructor component of the interface (see below), the student calls the Swains to make an appointment with Ed, who is listed as the contact person on the account. Lucy Swain answers the phone and agrees to an appointment. The student goes to the Swains' house for the appointment, and greets Lucy at the front door, as shown in Figure 3.
Student:
Hello. My name is Mike Johnson. I'm with Ameritech Pages Plus, the Bell Yellow Pages. I spoke with you on the phone about handling your account this year.
Lucy:
Please come in, Mike. Ed knows you're coming, and he should be here shortly.
A picture of the scene, the Swains' kitchen, occupies most of the screen. Inset in this area is the picture of Lucy Swain with a text bubble containing what she has most recently said. Under her picture are emotion display meters. These are intended to stand in for the multitude of cues to emotion given off by people in social situations. Here the meters show that Lucy is being polite: someone has just come to her house, so she is looking
somewhat happy (the first scale indicates happy/angry), somewhat interested (the second scale is interested/bored), and somewhat calm (the third scale is calm/threatened). The meters also indicate recent change in these indicators, with the lighter area next to the indicator showing where the meter was last: here it shows that the happy and interested meters moved slightly after the student identified himself. The action constructor, from which the student has chosen the actions, appears on the left. It is similar to the utterance constructor interface used in Casper. The student makes choices from the menus and sub-menus to create the desired utterance. Mike's statement was actually the result of four action constructor choices:

1. Under the "courtesy" menu, there is an entry for "greet." Choosing this gets the English phrase, "Hello."
2. Under the "tell about" menu, selecting the "self" submenu and the option "name" gets the next part of the utterance, "My name is Mike Johnson."
3. The description of Mike's job, "I'm with Ameritech Pages Plus, the Bell Yellow Pages," comes from the option "affiliation" under the "tell about self" submenu.
4. Finally, to refer to the appointment set up in the previous conversation, the student can look under the "tell about" menu, choose the "previous conversation" submenu, and the option "appointment." The sentence is "I spoke with you on the phone about handling your account this year."
See Figure 4 for a snapshot of the action constructor in use. Here the student is preparing to select "affiliation" from the "tell about self" submenu. The English-language phrase corresponding to the student's menu choices so far appears in the preview box at the bottom of the action constructor area. When the construction of the utterance is finished, the "Say It" button communicates it to the simulation. Although Ed is the owner of the business, Lucy has an important role. An experienced salesperson would try to take advantage of Ed's absence to gather information about the business from her. However, in this instance, the student does not realize that Lucy is an important source of business information and instead engages in small talk until Ed arrives.
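The menu-to-phrase composition described above can be sketched as a simple lookup from menu paths to canned phrases. The menu names follow the four choices in the example; the data structure and function are illustrative guesses, not a description of Yello's actual implementation:

```python
# Sketch of an action constructor: each menu path maps to a canned
# English phrase, and the previewed utterance is the concatenation of
# the chosen phrases. The menu labels mirror the example in the text;
# everything else is an illustrative assumption.

PHRASES = {
    ('courtesy', 'greet'): "Hello.",
    ('tell about', 'self', 'name'): "My name is Mike Johnson.",
    ('tell about', 'self', 'affiliation'):
        "I'm with Ameritech Pages Plus, the Bell Yellow Pages.",
    ('tell about', 'previous conversation', 'appointment'):
        "I spoke with you on the phone about handling your account this year.",
}

def construct_utterance(choices):
    """Build the preview text from a sequence of menu paths."""
    return " ".join(PHRASES[path] for path in choices)

# The four choices from the example produce Mike's opening statement.
utterance = construct_utterance([
    ('courtesy', 'greet'),
    ('tell about', 'self', 'name'),
    ('tell about', 'self', 'affiliation'),
    ('tell about', 'previous conversation', 'appointment'),
])
```

Pressing "Say It" would then hand this composed string to the simulation.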
Student:
What a wonderful view of the lake you have!

Lucy:
Thank you, we like it.

Student:
Did you just move in?
When Ed arrives, his picture appears in the scene next to Lucy's.
Lucy:
Well, there's Ed now. Hi honey, this is Mike Johnson from the Yellow Pages.
Figure 3 - The student greets Lucy Swain
An opportunity to tell a story
Since Lucy is deferential to Ed, the student has missed an opportunity to find out her thoughts about the business. From now on, he will be dealing with Ed. The Storyteller sees this missed opportunity as evidence that the student does not expect Mrs. Swain to be useful in giving him information about the business. The Storyteller can intervene to show that others who have made similar assumptions have seen them not borne out. In fact, if it did not intervene, the student might never realize that this opportunity was ever present. Interjecting a story here points out the student's assumption, perhaps unconscious, right at the time when that assumption has affected the course of the sales call. The Storyteller signals that it has a story that is relevant to this situation by highlighting its button with the headline: "A warning about something you just did."
(See Figure 5 for the screen at this point.) If the student presses this button, the Storyteller screen is shown. (See Figure 6.) The first item on the screen is the bridge that explains why the story has come up. It reads: "If you assume that Mrs. Swain will not have a role in the business of Swain Roofing, you may be surprised. Here is a story in which a salesperson had a similar assumption that did not hold."
The student can use the buttons below the video frame to view the video of an Ameritech account executive telling about a sales experience. Here is a transcription of the story: "I went to this auto glass place one time where I had the biggest surprise. I walked in; it was a big, burly man; he talked about auto glass. So we were working on a display ad for him. It was kind of a rinky-dink shop and there was a TV playing and a lady there watching the TV. It was a soap opera in the afternoon. I talked to the man a lot, but yet the woman seemed to be listening; she was asking a couple of questions. She talked about the soap opera a little bit and about the weather. It turns out that after he and I worked on the ad, he gave it to her to approve. It turns out that after I brought it back to approve, she approved the actual dollar amount. He was there to tell me about the business, but his wife was there to hand over the check. So if I had ignored her or had not given her the time of day or the respect that she deserved, I wouldn't have made that sale. It's important when you walk in, to really listen to everyone and to really pay attention to whatever is going on that you see."
The Storyteller sums up the story for the student with the following coda: "An assumption that a spouse will not have a role in a spouse's business may be unrealistic."
Figure 4 - Using the action constructor
This example illustrates the synergistic interaction between simulation and explicit instruction. Without the story to provide the impetus to examine the situation, the student might never realize what opportunities were missed. However, without active engagement in the simulation, the student might lack the motivation and context to understand and remember the story. The conversation can continue for some time as Mike tries to gather information about the business of Swain Roofing. He may uncover all of the important issues or he may not. Eventually, he will go back to his office, design an ad program for the Swains, construct his sales presentation, and come back to try to sell the ad or ads. The entire interaction proceeds in the manner seen here: the student acts in the domain, working toward the goal of making a sale, and receives guidance as teaching modules identify opportunities to present relevant material.

DESIGNING INTERACTION WITH SIMULATED CHARACTERS

Interacting with a simulated character requires communication in two directions: the user of the system must be able to communicate to the character, and the character must be able to communicate to the user. Each of these represents a difficult technological problem that has not given way to a single, perfect, general-purpose solution. The goal of creating a faithful, engaging simulation is tremendously demanding, and must often be balanced against practical engineering concerns. Furthermore, the optimal trade-offs vary according to the pedagogical goals of each system. In this section we will discuss the various criteria by which one can evaluate a communication interface for a user talking with a simulated character, the different means of communication available, and the trade-offs among the options.
Accuracy and fidelity

The first question that jumps to mind when one evaluates a simulation is often, "How faithful is this simulation to reality?" But faithfulness is a many-faceted issue. When one evaluates the faithfulness of a simulation, it is often useful to decompose the concept of faithfulness into two more specific, inter-related but distinct concepts: accuracy and fidelity. An accurate simulation is one that achieves the same functional properties as the reality being simulated. For example, a real-world situation in which someone has to choose between three explicitly presented choices can be accurately simulated by any system that presents the same three choices and provides a mechanism for the student to choose. A high-fidelity simulation is one that closely matches the specific signals and cues found in the real world. For example, consider a PC-based flight simulator that provides graphics and sound, but does not actually move the user around as do the multi-million-dollar units on which commercial pilots are trained. The PC-based flight simulator may be just as accurate as the professional version; in theory it could be even more so if, for instance, it had been updated more recently to reflect new changes to the simulated aircraft. However, it will never be as high fidelity. For educational technology, the appropriate balance between accuracy and fidelity depends on what is being taught, and on characteristics of the target student. When the goal is to teach someone a cognitive skill, such as how to perform a diagnostic interview, accuracy is generally the most important criterion. Choices available to the student must be equivalent to those available in real life, and responses made by the simulated characters must be functionally equivalent even if they are, for example,
Figure 5 - Storyteller indicates it has a relevant story
delivered in a different modality. For physical skills, fidelity takes on a heightened importance, and it is not inconsequential for teaching cognitive skills either. A simulation that looks and feels a great deal like the real world can help motivate students, and can help them to correctly interpret the mapping between the simulation and reality.
Criteria for input interfaces
Accuracy

The user's side of a conversation with a simulated character can be modeled as a two-step process: first, the user generates an idea; then, the user expresses it. Both the expressiveness and the generativity of language must be accommodated by an interface to as great an extent as possible. A crucial issue is whether the interface adequately covers the space of intentions users will wish to express. Of course, it is impossible to hook an intention analyzer up to the mind of the user to measure the size of this space. At the very least, we would prefer interfaces that allow a user to express a larger number of intentions easily rather than fewer. Of course, some additional assurance will be necessary that the right intentional space is being covered. An accurate interface must allow users to choose how to express intentions in a way natural to them. In contrast, a communication interface that only allowed the user to select (recognize) from a list of choices would not create as accurate a simulated conversation as one that allowed the user to generate statements. Choosing from a list is really a very different task from deciding what to say. Choosing is both more restrictive and simpler; the choices are explicitly given. So, a communication interface for speaking to simulated characters can be measured for accuracy by asking:

- Is the interface expressive? That is, does it allow the user to express what he or she wants to express?
- Is the interface generative? That is, does it allow the user to form his or her own way of expressing an intention?
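One crude way to see the gap between recognition and generation is to count reachable utterances: a multiple-choice screen offers its options one for one, while a constructor with several independent menus composes them multiplicatively. The numbers below are invented purely for illustration:

```python
# Invented numbers: one multiple-choice screen versus a hypothetical
# three-menu constructor (e.g. speech act x topic x detail). The
# constructor's utterance space grows multiplicatively per menu.
from math import prod

multiple_choice_options = 8            # a single screen of fixed choices
constructor_menu_sizes = [6, 9, 4]     # hypothetical menu sizes

mc_space = multiple_choice_options                 # 8 expressible utterances
constructor_space = prod(constructor_menu_sizes)   # 6 * 9 * 4 = 216
```

Even this toy comparison suggests why a compositional interface can cover a far larger intentional space than a flat list of the same overall screen size.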
Fidelity

Creating a high-fidelity communication interface helps to prevent communication breakdown (Winograd and Flores, 1987). Breakdown occurs when the attention of the interface user is drawn to the interface, rather than to the task the user is attempting to perform with the interface. The point of the interface is to express an intention; when one's attention is drawn away from this task, this breakdown prevents the user from achieving his or her communicative goals. Additionally, the interface typically serves instrumental goals; that is, the ultimate goal of the user is typically not to hold a conversation, but to hold that conversation in order to achieve some other goal. For example, in Casper, the student converses with the simulated customer in order to learn how to solve real customers' problems. When breakdown occurs in the conversational tool, it occurs at two levels: first, the user's attention is drawn away from achieving the instrumental goal of conversing; second, the user's attention is also drawn away from achieving whatever it is the conversation is for. Three important criteria for evaluating fidelity are speed, negotiation rate, and modality.
Figure 6 - The Storyteller presents a story
Speed: The interface must be "fast enough" to be a faithful simulation of interaction. What "fast enough" means will vary from interface choice to interface choice, but, in general, it will mean fast enough that the user's attention is not drawn to the interface tool itself and can remain on expressing his or her intentions. An interface tool will tend to break down as the time it takes to communicate intentions to the simulated character increases.

Negotiation rate: It is often the case that an interactive program will allow the user to commit to or to reject the utterance that has been selected or built up. If the user rejects the utterance, then the user has to start over in generating an utterance. This is analogous to a human conversation in which one person says something, call it A, and the other person asks whether by A the first person meant B (a paraphrase of A). The first person can agree, or try again. The more tries it takes for a user to express an intention, the more likely a breakdown in communication: the user will begin to wonder how to express an intention rather than simply expressing it.

Modality: The interaction between a person and a simulated character simulates some interaction between two people in the real world. For example, Casper simulates phone conversations between a customer service representative and a customer. In Yello, the simulation is of a face-to-face conversation between the user and the character. In these examples, the modality of conversation is speech, although in face-to-face conversation, the speech is often augmented with visual cues. The more closely the modality of interaction matches the real world, the less likely the conversation will break down, because it will be less likely that the user's attention is drawn to a lack of congruence in modality.
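The negotiation loop described above (propose, paraphrase, commit or start over) can be sketched in a few lines. The callback names and the toy data are our own placeholders, not part of any system described in the text:

```python
# Sketch of a negotiation loop: the system paraphrases the user's
# candidate utterance ("did you mean B?"), and the user either commits
# or starts over. The number of rounds before commitment is the
# negotiation rate; the higher it climbs, the more likely breakdown.

def negotiate(build_utterance, paraphrase, user_accepts, max_tries=10):
    for tries in range(1, max_tries + 1):
        utterance = build_utterance()            # user composes a candidate
        if user_accepts(paraphrase(utterance)):  # system offers a paraphrase
            return utterance, tries              # committed after N rounds
    return None, max_tries                       # user gave up: breakdown

# Toy run: the user rejects the first candidate and accepts the second.
candidates = iter(["Is your water safe?", "How does your water look?"])
result, rounds = negotiate(
    build_utterance=lambda: next(candidates),
    paraphrase=lambda u: f"You want to ask: {u}",
    user_accepts=lambda p: "look" in p,
)
```

A low-fidelity interface is one where `rounds` routinely climbs, pulling the user's attention from the conversation to the tool.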
Supporting the learning goals of the learning environment

Two requirements of a learning environment communication interface relate to specific learning goals that the environment is intended to serve. It is a given that no interface will be perfect with regard to accuracy or fidelity. With that constraint, it is crucial that the compromises made not result in an interface that does either of the following:

- trick the user into making mistakes, or
- give away information, such as a possible course of action, that is best for the student to discover on his or her own.

In Casper, for example, one of the pedagogical goals of the program is to warn students against asking leading questions of customers, because customers will tend to give the answer they think the customer service representative wants. Because the representative is attempting to discover the real cause of the problem, a representative should not ask leading questions. But if the communication interface in Casper made it easier to ask a leading question than a non-leading question, the student might be tricked into asking a leading question. The student may ask a leading question not because he or she intended to, but because it was simply easy to do so. Another criterion is that the interface should not give information that should be hidden. Certain interfaces will require that the utterance choices be articulated or displayed to the student. In tutorial programs, it is very common that a student's choices not be revealed. Following up on the leading-question example, we want the interface in a program such as Casper not to articulate the distinction between a
leading question and a non-leading question just because the tutor is built to respond if a leading question is asked. A student may choose to ask a non-leading question simply because, in seeing the distinction between a leading and a non-leading version of a question articulated by the interface tool, the student can guess the "right" answer and give it, instead of the answer he or she would give in a real conversation. Thus, we see that there are criteria for interface tools for talking to simulated characters that are general to all software, specific to communication tools, and dependent on the application programs for which they are built. As with all software, we need to be concerned with scale-up: whether we can build the software we want to build. Communication tools should also support an accurate simulation of a conversation, especially in supporting the generation and expression of intention. High-fidelity communication tools will also help prevent breakdowns in communication: tools that are fast enough, low in negotiation rate, and similar in modality. Finally, the pedagogical goals of the embedding learning environment should also influence the design of the communication interface, goals such as not giving away answers in tutorial programs.
Options for input interfaces

We will discuss four different interface options for talking to a simulated agent: multiple-choice interfaces, action constructors, natural language text, and speech recognition. We will focus our attention on action constructors and text interfaces, because multiple-choice interfaces fulfill too few of the requirements we have set, and speech recognition technology is not yet practicable. On the other hand, we can build action constructors and natural language text interfaces that do meet many of our criteria.
Multiple choice

The first option to consider is multiple-choice selection: the user is offered a fixed number of choices and can select one of them, or perhaps opt out of the selection process. Clearly, multiple choice fails on all but the most basic of the criteria for communication interfaces. Multiple-choice interfaces are easy to build and quick in response, but they fail especially in their lack of expressiveness and generativity on the one hand, and in their modality on the other. In offering the user a fixed number of choices from which to select, they do not meet the generativity criterion. Multiple-choice selections are typically limited in number, due either to cognitive limitations (people can only scan so many choices at a time) or to screen real-estate issues (only so many choices can fit on the screen at once). Because of this limit, they are limited in their expressiveness as well. Finally, the modality of interaction is very far removed from speech.

Speech recognition

On the other side of the continuum of options is speech recognition: in the course of a conversation with a simulated character, the user speaks his or her intentions to the character. This is a highly congruent method of interaction in terms of modality, and it is highly generative. On the other hand, speech recognition technology, even at the state of the art, is highly limited in its ability to scale up to large problem domains. One is limited to what is essentially putting a speech recognition layer on
How to Support Learning
181
top of a multiple-choice selection mechanism. Recognizing speech is beyond the state of the art for highly expressive systems; thus, practical speech recognition systems are limited in expressiveness. Multiple-choice selection and speech recognition lie at the two ends of a continuum. A multiple-choice system is very easy to design and implement, but highly lacking in accuracy and fidelity. A speech recognition system would be highly accurate and faithful, but is beyond the state of the art, unless one is willing to limit the number of choices that can be recognized. Two interface methodologies that lie between multiple-choice selection and speech recognition are action constructors and natural language text input.

Action constructors
Casper, Yello, $2 and BoSS all provide action constructors for communicating with their respective simulated characters. An action constructor is a hierarchical set of menus that allows a user to express an intention to do something in the simulated world. This can be any action supported by the simulation, but we will focus on communicative actions. Actions are hierarchically arranged so that more general intentions, such as "Ask about...", are selected before more specific ones, such as "...the fire brigade." The underlying knowledge representation requires an adequate understanding of the task domain and the interests students bring to the domain in a particular situation. For example, in Yello, the knowledge representation needed to reflect the statements and intentions novice Yellow Pages salespeople bring to a selling interaction. Further, action constructors are typically dynamic. New choices are offered as users gain information about the simulated world in which they are acting, and old choices are removed as they become irrelevant to changes in the simulated world. Action constructors tend to meet the expressiveness criterion for an accurate communication device, but fail the generativity criterion. Action constructors, with their dynamically created, hierarchical series of menus, can allow a user to express the entire space of intentions that the user has. This assumes that the content analysis has been done correctly in identifying the space of intentions a user might have. Action constructors allow the developer to specify a large space of linguistic (and non-linguistic) actions to take. On the other hand, action constructors essentially require the user to recognize a match between what the user intends to say and how the system developer expressed that intention. The user does not generate his or her own utterance; rather, the user puts together an utterance from the options that the hierarchical menus make available.
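To make the structure concrete, here is a minimal sketch of an action-constructor menu tree. The class, field names, and sample choices are hypothetical illustrations of the idea, not code from Casper or Yello:

```python
class ActionNode:
    """One choice in the hierarchical menu; leaves are complete utterances."""
    def __init__(self, label, children=None, available=True):
        self.label = label
        self.children = children or []   # more specific refinements
        self.available = available       # dynamic: toggled as the world changes

    def is_leaf(self):
        return not self.children

def visible_choices(node):
    """Return the choices currently offered to the student at this menu level."""
    return [c for c in node.children if c.available]

# "Ask about..." is selected before the more specific "...the fire brigade."
root = ActionNode("say", [
    ActionNode("Ask about...", [
        ActionNode("...the fire brigade."),
        ActionNode("...the water supply."),
    ]),
    ActionNode("Tell the customer...", [
        ActionNode("...to run the cold tap."),
    ]),
])

# Dynamic behaviour: a choice is removed once it becomes irrelevant.
root.children[0].children[0].available = False
labels = [c.label for c in visible_choices(root.children[0])]
```

The sketch captures the two key properties discussed above: the hierarchy, in which general intentions refine into specific utterances, and the availability flag that lets choices appear and disappear as the simulated world changes.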
Action constructors meet some of the requirements for high-fidelity communication systems, and fail others. Certainly, once a statement is chosen, the response is quick. When an action constructor is well designed, the hierarchical menus can also provide a quick way for a user to say something. The major limiting step is in the user finding, among the hierarchical menus, what he or she wants to say. Although this is difficult to measure, we would claim that this is not so much a matter of real time (that is, elapsed time from when the user begins a search to when a selection is made and uttered), but of virtual time. As long as users find the hierarchical menus logically arranged (logical, of course, in the mind of the user), they seem to be in a relatively timeless mode as they search for the right utterance. Although there is often some frustration with the physical requirements of using hierarchical menus, what is more
frustrating for users is when the arrangement does not allow them to find quickly what they want to say. This relates closely to negotiation rate; the number of paths a user takes through the hierarchy may indicate difficulty in negotiating with the interface about what the user wants to say. Finally, it is hard to say that an action constructor is very similar in modality to speech. Action constructors can be quite practical to develop. They provide a direct mapping between what the system is prepared to respond to and what the user can say. This allows the pedagogical requirements of the system to drive the knowledge engineering requirements, rather than requiring large amounts of additional knowledge engineering for the sake of the interface. The dynamic and hierarchical nature of action constructors also tends to allow them to scale up to the size required by the application. In terms of pedagogical requirements, action constructors tend not to trick the user into saying things he or she does not want to say. Because what can be expressed is fully articulated in the action constructor, a user can know exactly what he or she can say. Therefore, it is usually the case that the user is not tricked into saying things he or she does not want to. On the other hand, articulating all possible utterances means that information may be revealed to the user that should not be. Action constructors, then, provide a relatively inexpensive technology for building expressive communication interfaces. Although they are a recognition-only technology, they provide a much richer way to communicate with simulated characters than simple multiple-choice systems, and their hierarchical structure can map well onto the user's logical model of the task.

Natural language text

Another possibility for building a communication interface to simulated agents is to create a type-in box.
The user would type in what he or she wants to say, then natural language processing techniques would map what the user entered into the most appropriate conceptual representation in the program. Such an interface has several good characteristics:
- It is generative: it allows users to express, in their own words, what they want to say.
- It is expressive: it maps well to the space of intentions carried by users and understood by the simulated characters.
- It is similar to speech, both by virtue of its generativity and the linear nature of the input.
- It is non-revealing: it does not work against the pedagogical goal of not giving away information.

It is generally assumed that natural language processing is an "AI-complete" task; that is, to build a system capable of understanding text would require building a general-purpose, intelligent machine. However, research in case-based reasoning (Riesbeck & Schank, 1990; Kolodner, 1993) indicates that various knowledge-based indexing techniques can form a basis for natural language processing systems that are both
cognitively plausible and practical to build. One such architecture for natural language processing is indexed concept parsing.
Indexed concept parsing

Indexed concept parsing (Fitzgerald, 1995) is a case-based reasoning approach to parsing, in which underlying target concepts (that is, those conceptual representations of the application program identified as forming the intentional space of potential users) are associated with sets of index concepts. Each index concept is associated with sets of phrasal patterns. At run time, the parser looks for phrasal patterns in the input text, and the index concepts recognized thereby are used to appraise the best-matching target concepts. The architecture defines a range of parsers, in which the complexity of the index concept representations can vary according to the needs of the application program: index concepts can be key words, synonym sets, representations in an abstraction hierarchy, or representations in a partonomic hierarchy. Indexed concept parsing was originally developed to build a parser for Casper. The index concepts in Casper were arranged in a simple concept hierarchy, with phrasal patterns attached to concepts at different levels in the hierarchy. Indexed concept parsing proved accurate, yet required minimal knowledge representation. Measuring expressiveness by how frequently the intention of a user was matched to an internal representation in Casper, we found that the Casper parser, in early tests with real users, had an 83% accuracy rate. Speed of response and negotiation rates were acceptable as well. More details can be found in Fitzgerald (1995). Table 1 shows the steps a student takes to use the indexed concept parser in Casper. The student enters text and requests a parse (the actual graphical interface differs from the idealized interface in the table). The parser returns the best result, which the student can accept. Alternatively, the student can enter a different text, or look at other best matches. The indexed concept parser met many of the criteria we set out for a communication interface to a simulated character.
It is accurate, in that it is both generative and acceptably expressive. It is sufficiently high fidelity, in that it was fast enough, had an acceptable negotiation rate, and was somewhat similar (especially in contrast to menu-based systems) to the modality of speech. It was practical enough to develop, in that the underlying knowledge representations (index concepts) were simple to build. It met the pedagogical goals, although near matches might alert the student to possible significant utterances the student did not intend. Other parsers built on indexed concept parsing techniques are described in Fitzgerald (1995).
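The core cycle of indexed concept parsing, looking for phrasal patterns, recognizing index concepts, and ranking target concepts, can be sketched in a few lines. The patterns, concept names, and scoring rule below are simplified stand-ins of our own devising, not Fitzgerald's actual implementation:

```python
# index concept -> phrasal patterns that signal it in the input text
PHRASAL_PATTERNS = {
    "water": ["water", "tap water"],
    "bits":  ["bits", "particles"],
    "color": ["color", "colour", "brown"],
}

# target concept -> the set of index concepts associated with it
TARGET_CONCEPTS = {
    "ask-about-particles-in-water": {"water", "bits"},
    "ask-about-water-color":        {"water", "color"},
}

def recognize_index_concepts(text):
    """Find every index concept whose phrasal patterns occur in the text."""
    text = text.lower()
    return {concept for concept, patterns in PHRASAL_PATTERNS.items()
            if any(p in text for p in patterns)}

def best_target_concepts(text):
    """Rank target concepts by overlap with the recognized index concepts."""
    found = recognize_index_concepts(text)
    scored = [(len(found & idx) / len(idx), target)
              for target, idx in TARGET_CONCEPTS.items()]
    scored.sort(reverse=True)
    return [target for score, target in scored if score > 0]

matches = best_target_concepts("Could you describe the bits in my water?")
```

A real parser would use richer index-concept representations (synonym sets, abstraction hierarchies, partonomies) and a more careful appraisal function, but the index-then-rank structure is the same.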
Summary of interface options

Discussions of interfaces for communication with simulated characters tend to lie at the extremes: either relatively simple multiple-choice entry systems, or speech recognition systems that are impossible to build. Action constructors and natural language text interfaces based on indexed concept parsing are two interface technologies that lie between these extremes. They allow accurate interfaces to be built in exchange for a practical amount of development effort. Action constructors are easier to build than indexed concept parsers, but are non-generative and lower in fidelity. Indexed concept parsers require more knowledge representation, but allow for a generative interface, somewhat higher in fidelity.
Understanding simulated characters
To create conversations, we must allow the user to express his or her half of the conversation, but we must also provide the means for the simulated characters to communicate. This area deserves careful consideration in educational settings, because part of what a social simulation can teach is how to interpret the reactions of others. If the simulated characters can only react in a few simple ways, interpretation is unrealistically simple. On the other hand, a certain amount of abstraction can be useful in structuring students' social perceptions.

Table 1 - Parsing from a type-in box, accepting the best result or choosing from the best matches

1. The student enters text and presses the "Parse" button.
   (Screen: the typed text "Could you describe the bits to me, please?" next to a "Parse" button.)
2. The parser returns the best result, but the student requests more matches.
   (Screen: "What kind of bits are in your water?" with a "(More...)" button.)
3. The student selects a choice from the best matches.
   (Screen: a list of matches, including "What kind of bits are in your water?", "Can you run the cold tap for a bit and tell me what you see?", and "Can you describe the problem?")
4. The student confirms the choice by pressing the "Say this" button.
Communicating what they say

Casper uses digital recordings of human voices to give its characters telephone voices. When students ask questions, the customers respond by playing back one of these pre-recorded audio clips. This approach has the advantage of endowing the
simulation with a high degree of fidelity. Obviously, hearing the customer's voice is much more realistic than reading the text of what is being said. Characters' tones of voice convey a great deal of information. Making the characters sound as similar as possible to real callers is very important for trainees who are learning to perform a difficult information-gathering task in a telephone-based environment. The drawback of the pre-recorded voice technology lies in its inflexibility. There must be a one-to-one correspondence between literal outputs and the directions in which the system can go. It is not possible to build a general reaction statement of the form "I didn't check X," where X can be replaced by any feature of the water system that the student might ask a simulated character about. Instead, each such possibility must be handled separately and linked to a particular pre-recorded statement: "I didn't check the hot water tap," "I didn't check the color of the water," etc. Because there are many meaningfully different statements that customers make in this environment, maintaining the accuracy of the simulation requires a great deal of engineering work under the pre-recorded voice approach. The pre-recorded option is also inflexible with respect to the nuances of speech. If the possibility exists that the customer might say the same thing several different ways, with different intonations for example, all of these options must be represented separately and recorded as separate speech events. In the end, we determined that the fidelity needs satisfied by pre-recorded voices were great enough, in the context of training to perform phone-based diagnostic interviews, that we were willing to bear the costs required to maintain accuracy in terms of the number of utterances that the simulated customer could make. While burdensome, the number was manageable in the Casper domain because the water-diagnosis task is not completely open-ended.
There is a reasonably well-defined set of reasonable student utterances to which the simulated customer must be equipped to respond. Thus the meaningful mistakes, which, as we discussed in the introduction, must be open to the student in order to effect the cognitive change that is the objective of any GBS, could be accommodated within the pre-recorded voice approach. When the open-endedness of the simulation is even more important, as we judged it to be for the tasks taught by both Yello and $2, a more flexible, though less realistic and less nuanced, approach is called for. Selling, for example, is a skill that each expert goes about in a slightly different way; while there are general tips and habits to be taught, success often means finding the approach that fits one's own particular personality. Therefore, a crucial factor in achieving accuracy is allowing a very broad range of approaches, which in turn calls for an open-ended simulation. In the $2 Trainer, the problem was even more severe. Because the simulated characters must respond to configurations of icons that the trainee places on map overlays, those characters must be able to generate an almost infinite set of responses. For example, a battalion intelligence officer is required to place icons on a map overlay representing predictions about where enemy units will be placed, and what sort of units they will be. In order to accurately simulate a commander's response to all possible trainee actions, the pre-recorded voice approach would require pre-defining an appropriate utterance for every possible coordinate where the student might place each type of icon. For these reasons, in Yello and $2 we sacrificed fidelity to maintain accuracy. Characters' speech is output as text in cartoon-like text balloons above their heads. This approach allowed great flexibility in the development of these systems, allowing
us to generate characters' utterances through the use of templates that could be filled in at run-time, rather than specifying every utterance completely in advance. One intermediate approach that we have not fully explored is the use of speech generation. This would add back some of the fidelity lost by using text, without loss of flexibility. Existing speech generation systems do not produce the nuances of intonation, the most important benefits of pre-recorded speech, but this technology holds great promise for the future.

Communicating what they do

A face-to-face conversation is much more than two voices going back and forth. Gesture, body language and facial expressions all carry information that enriches the interaction. These aspects of social simulation present problems similar to those discussed with respect to speech output: flexibility vs. richness. The richest and most flexible system would be one in which each aspect of a character could be modeled in great detail and their reactions rendered on real bodies. For extremely simple characters, such detailed real-time simulation has been attempted (Bates, Loyall & Reilly, 1992). For human characters, however, tradeoffs are required. The equivalent of "recorded speech" would be digital video of a character responding in the conversation. The amount of nuance that can be conveyed is quite high, and it is possible to build engaging, although limited, simulations in this way (Stevens, 1989). It is very expensive, however, to use digital video for an open-ended simulation. One reason for the high cost is that the storage requirements are very great. A conversation of 90 minutes or so (the time that students typically spend on the Swain Roofing scenario), with an open-ended set of options available to the student, would require thousands of minutes of video.
Even with the best compression schemes available today, this would require gigabytes of storage per scenario, making multiple-scenario systems much more storage-intensive than most users can currently afford. A second cost consideration is the time and expense of producing high-quality video for interactive simulations. Changing a few details in the simulation could require a complete re-shoot of the associated video, creating a barrier to flexible development even higher than that associated with recorded voice. As discussed above, we have treated open-endedness as an important design consideration in Yello and therefore chose not to use video to depict characters' visible responses. We were left, then, with the challenge of depicting this same information in other ways. We have explored two alternatives to video for depicting the non-literal information that video conveys. In Yello, we chose to use still images to show what characters look like: their dress and general demeanor. For dynamic feedback about a character's emotional state, we used an abstract technique, associating with each character a set of "visible emotion meters," which are intended to represent visible changes in a character's demeanor that the student should be able to recognize. We used three such meters: happy/angry, interested/bored, and calm/threatened. One problem with this approach is that the meters are abstract; they do not teach students how to identify customers' body language. In Yello, we have compensated for this lack, to a certain extent, by using the simulation as a jumping-off point for tutoring that addresses these issues in all their subtlety.
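The meter mechanism itself is simple; a sketch along the following lines would suffice, though the field names, value range, and adjustment amounts here are illustrative choices of our own rather than Yello's actual implementation:

```python
class EmotionMeters:
    """Three bipolar meters standing in for a character's visible demeanor."""
    def __init__(self):
        # each meter runs from -1.0 (angry / bored / threatened)
        # to +1.0 (happy / interested / calm); 0.0 is neutral
        self.happy_angry = 0.0
        self.interested_bored = 0.0
        self.calm_threatened = 0.0

    def adjust(self, meter, delta):
        """Move one meter by delta, clamped to the [-1, 1] range."""
        value = getattr(self, meter) + delta
        setattr(self, meter, max(-1.0, min(1.0, value)))

# e.g. an aggressive question makes the customer less calm and less happy
customer = EmotionMeters()
customer.adjust("calm_threatened", -0.4)
customer.adjust("happy_angry", -0.2)
```

The student's interface would then render each meter graphically; the pedagogical limitation noted above, that a meter does not teach the reading of real body language, is independent of how the meters are implemented.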
Other systems we have worked on use less abstract representations of observable expressions. The BoSS system, for instance, has a library of still images associated with each character instead of the single image used in Yello. Each image contains a different expression, and appropriate images are recalled and presented as the character's expression changes. Although somewhat limited in the amount of nuance that can be expressed, this approach is more natural than the abstract meters.

Depicting their surroundings

Another important detail in understanding a conversation is understanding the setting in which it takes place. A sales call that takes place across a kitchen table is different from one that takes place in a plush office with leather furniture. The appearance of a client's place of business tells the salesperson a great deal about the business and the client's personality, and clients' dress says something about the way they expect business to be conducted. In all of our simulations, we put significant effort into creating detailed, realistic visual scenes that contain useful information.

INTERACTING WITH THE TUTOR

Why interacting with a tutor is different
The interface considerations that enter into the design of the student's interaction with a tutor are in many ways similar to those discussed above. At one level, a tutor can be thought of as just another agent with which the student communicates. The central considerations in both cases are how to make it easy for the student to express things to the agent, and how to make the communication from the agent to the student clear. However, the purpose of the communication is different, and that has important interface implications. The most common communication from the student to a tutor is a request for help. Our technique for addressing this interface concern in Casper is simply to make certain utterances that a student might want to address to a tutor available at all times, in the form of "Why?" and "Now What?" buttons. Not all tutor/student interactions are extended conversations. Often, the most effective form of intervention that a tutor can provide is simply to present a case of success or failure from the past. Since this form of communication is less interactive than a conversation, the focus is on communication from the tutor to the student. Making that communication clear and interesting, and communicating the relevance of the case, are the crucial concerns. The need to make sure that this sort of case-presentation tutoring holds the interest of the student is one of the considerations behind our use of video for the cases in Yello. When tutoring does take the form of an extended conversation, as in Casper's Socratic-style dialogues, those conversations still have a different function, and therefore a different style, from the interaction with simulated characters. While simulated characters may be acting on the basis of a wide range of goals, the goal of the tutor is always pedagogical, and the style of interaction is dependent on which pedagogical goals are active and what pedagogical strategies are being pursued.
For example, a tutor will interact one way when its interaction is designed to convey a set of functional relationships, and differently when it is trying to push the student to choose and defend a hypothesis. The general interface challenges are therefore different for
tutorial dialogues; the concerns of accuracy and fidelity that are important for simulated characters are replaced with an emphasis on supporting a pedagogical strategy. In the rest of this section we will look a bit more closely at three styles of tutoring that were illustrated in the interactions with Casper and Yello presented earlier in this paper.

Case presentation

In social arenas, there often are no hard-and-fast rules, and it is not always possible for the tutor to be certain that the student is in error. There will be some aspects of the social world in which the student's general social knowledge will exceed the tutor's. This makes it difficult to apply traditional tutoring methods that call for the tutor to know the right thing to do in every circumstance (Anderson, 1988). One effective strategy the tutor can use in such situations is to make reference to the experiences of others. Instead of saying that the student is in error, it can say "Here's a situation in which what you're doing turned out to be a bad idea." Instead of saying that the student should perform a certain action, it can point out a situation in which that action led to a good result. The tutor can leave it to the student to form a judgment about whether the advice is relevant, and if so, how to apply it. Since it is not required to present one "right" answer, the tutor can show multiple perspectives on difficult issues. It might, for example, bring up two stories, one about someone who was successful doing what the student is doing and another about a failure in the same situation. It is important for students to recognize that even experts can disagree about the best course of action (Lesgold & Lajoie, 1991). Case-based reasoning theory (Riesbeck & Schank, 1989; Kolodner, 1993) suggests that this tutoring strategy meshes well with the student's need to acquire relevant cases.
Such a storytelling tutor broadens the student's experience by bringing in relevant experiences that expert salespeople have had. In the Yello example above, the story shows an instance demonstrating that the student's approach could be ineffective, even if it appears to be succeeding in the scenario; conversely, a story could show the student that he or she is taking the right approach even when the simulation does not respond to it. There is an emerging body of literature within the study of education and psychology that emphasizes the importance of stories (Witherell & Noddings, 1991; Hunter, 1991; Carter, 1993). In particular, we use first-person narratives from experts about particular episodes in the exercise of their skills. In apprenticeship situations, stories of this type are often used in a similar way to show useful examples relevant to the learner's current experience (Lave & Wenger, 1991). It is useful to distinguish these stories from other kinds of cases that a tutor might use, such as design examples, re-enactments or invented cases. First-person stories have properties that make them particularly useful for instruction (Schank, 1990; Witherell & Noddings, 1991):

- Authenticity: the fact that such stories come directly from a person's real experience and are therefore relatively trustworthy as accounts of the real world,
- Detail: the tendency of such anecdotes to be vivid and detailed, and
- Cultural content: the way in which personal stories reflect a person's beliefs and values.

The demands of authenticity and detail encourage the use of the most vivid means of story presentation. Research in video-based learning environments has discovered that students find stories quite compelling when the act of storytelling is recorded on video and replayed (Ferguson et al., 1992; Slator et al., 1991). Even stories that are fairly lengthy can maintain interest when presented on video at the right time. As video sequences, stories are told the same way every time. This is a problem because students need to be given some kind of explanation of the tutor's interruption, and the relevance of a story may not be immediately obvious. For human storytellers, the purpose behind a story's telling permeates its production: the teller puts particular emphasis on those aspects of the story that contribute to the point. Since we did not have the ability to tailor the stories themselves, we instead took the approach of creating tailored explanations. In Yello, we developed a set of templates for introducing stories, called headlines, bridges and codas. Headlines provide a short functional description of the story. Bridges and codas introduce and explain each story, allowing the tutor to capitalize on the shared context of the simulation environment that the student and the tutor are both observing. Yello's storytelling tutor, called SPIEL (Burke, 1993; Burke & Kass, in press), uses a library of storytelling strategies to retrieve stories that make a variety of educational points. Each storytelling strategy has natural language templates for the headline, bridge and coda. Before a story is presented, natural language phrases are generated to fill the spaces in the template and produce texts tailored to the student's situation.
The headline, bridge, and coda play important roles in helping the student to understand the relevance of the story, to make the analogical connection between the simulation and the case described in the story, and then to transfer lessons from the story back to the simulated world in which the student must act. This four-part structure of SPIEL's case-presentation sequence (headline, bridge, story, and coda) can be thought of as a simplification of the six-part structure (abstract, orientation, complicating action, evaluation, result, and coda) used by Labov (1972) to describe conversational narratives. The term "coda," which we use to refer to the final recapitulation, is borrowed directly from him. Here is an example of the templates associated with one of SPIEL's strategies:

Headline: A warning about something you just did.
Bridge: If you assume that [assumption], you may be surprised. Here is a story in which [person] had a similar assumption that did not hold:
Coda: An assumption that may be unrealistic.

In the Yello example above, the tutor uses this bridge and some simple natural language generation to fill the template slots, creating the following bridge:
"If you assume that Mrs. Swain will not have a role in the business of Swain Roofing, you may be surprised. Here is a story in which a salesperson had a similar assumption that did not hold:"
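Mechanically, this kind of template filling is straightforward; the sketch below illustrates the idea with a hypothetical slot syntax and function name (SPIEL's actual generator is more sophisticated than simple string substitution):

```python
# Illustrative bridge template with named slots; the slot names and this
# fill-in function are our own stand-ins, not SPIEL's internal machinery.
BRIDGE_TEMPLATE = ("If you assume that {assumption}, you may be surprised. "
                   "Here is a story in which {person} had a similar "
                   "assumption that did not hold:")

def make_bridge(assumption, person):
    """Fill the bridge template's slots with generated phrases."""
    return BRIDGE_TEMPLATE.format(assumption=assumption, person=person)

bridge = make_bridge(
    "Mrs. Swain will not have a role in the business of Swain Roofing",
    "a salesperson")
```

The substantive work, of course, is not the substitution but choosing the story and generating slot fillers that fit the student's current situation in the simulation.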
A storytelling tutor offers a fairly non-directive form of tutoring: it does not tell the student what to do. In this, the tutor relies on the fact that students do have knowledge about social interactions and selling. Students can use the information provided by the bridge and coda to judge whether the advice of a story is relevant, and if so, how to apply it. The tutor does not have to guarantee that the student gets it right. If a student in the Swain Roofing scenario does not manage to make a sale to Ed Swain, the learning experience in Yello is not greatly diminished. Storytelling tutoring will be successful even if students do poorly in the simulation because they will come away having been exposed to some important cases and seen how they apply in particular contexts. Such students have begun to build the case base of experience on which their expertise will be founded. The main interface concern therefore is to ensure that students understand the relevance of stories that are presented. This is achieved in Yello by tailoring the presentation with introductory texts generated on the fly.
Socratic-style tutoring

Case presentation and direct instruction can be very effective and, when done well, very engaging ways to convey relatively pithy general principles. The methods are particularly effective for soft skills, where large mechanistic models are not the central issue. However, these techniques are not as appropriate when the goal of tutoring is to help the student refine a complex model of a large causal system. Presentation-based methods are not as effective for such teaching goals because they do not force the student to be active enough, and they do not respond to the student's need for knowledge in a fine-grained way. For a tutor to be effective at helping the student acquire complex reasoning skills, such as those involved in conducting a diagnostic interview, it must encourage the student to examine his or her own misconceptions. The tutor in the Casper system is an example of the more interactive, Socratic style of tutoring that can address the need for this type of intervention. The student engages in an extended dialog with the tutor in which the student's assumptions can be questioned and his or her problem-solving techniques critiqued. This depends crucially on an interface that allows a student to communicate his or her conceptions to the tutor.

What the Casper tutor does
Students engaged in a causal reasoning task need to be able to invoke a tutor directly when they realize they are stuck. For example, if a student using Casper is confused about why something happened or what to do next, he or she can explicitly invoke the tutor. The student asks the tutor "WHY?" or "NOW WHAT?" by hitting the button with that label (as seen above the transcript in Figure 1). In addition to responding to those explicit invocations by the student, an effective tutor sometimes also needs to intervene automatically in response to certain actions. For example, the Casper tutor will be activated, and will challenge the student, when the student announces a hypothesis to the customer which is not supported by the evidence thus far collected. It may also initiate a dialog with the student when he or she
How to Support Learning
indicates an incorrect or unsupported understanding of the situation, for instance through interactions with the CCS or an on-line hypermedia map of the water system. The precise algorithm used to decide when the tutor should be invoked, and what to do once it is invoked, is beyond the scope of this paper (Jona, in prep.); we will just discuss illustrative examples here. The algorithm for determining when to tutor and what strategy to pursue is specified by the system designer through the use of a set of general-purpose tutor-authoring tools partially described in (Jona & Kass, 1993). The Casper tutor does not merely tell the student the correct answer to a question, but instead tries to lead the student through an appropriate chain of reasoning. The goal of the tutoring is not to reveal the solution to the simulated customer's problem, but to teach the student how to solve problems like it. For example, when the student clicks on "NOW WHAT?" the tutor responds as depicted in Figure 7:

Tutor:
To help you decide what to do next, we need to understand your current goal. Choose the item at the right that best describes your current goal.
The buttons at the right cover the range of activities that are appropriate when attempting to form a diagnosis and fix a problem. They are as follows:

• Gather Information
• Examine Possible Causes
• Narrow Down the Likely Causes
• Act On a Diagnosis

In addition to these buttons there is a final choice, intended for students who really do not understand the theory-building process:

• I Don't Have a Clue

If the student clicks this last button, the system presents a detailed explanation of what the other buttons mean, and describes the general sequence one should go through in determining how to solve a customer's problem. If the student chooses one of the other buttons, the tutor asks the student to attempt the next step that will help fulfill the chosen goal. If the student has chosen an inappropriate goal, the student will either discover this while attempting to meet the tutor's request, or will be informed by the tutor upon asking for assistance. For example, a student who indicates that he wants to act on a diagnosis is first asked to indicate his diagnosis using the water map, and is then led to a more appropriate goal if it is too early in the diagnosis process to settle on a hypothesis. When the student's goal is appropriate, the tutor reacts supportively; it will often present a video of a water company expert explaining how to do the sort of thing that the student has indicated he or she is trying to do. When the system invokes the tutor to respond to a mistake the student has made, it does not simply announce what the student has done wrong and what should be done instead. Rather, it asks the student to explain his or her own reasoning, and it critiques that reasoning. For example, if the student announces an unsupported hypothesis about the cause of the customer's problem, the tutor will ask the student to defend the hypothesis. The student communicates the reasoning behind the hypothesis by
selecting, from the transcript, specific utterances made by the customer which the student believes to be evidence for the hypothesis. The tutor then uses its expert domain model to critique the student's reasoning. For instance, the tutor might indicate that the items selected by the student give some evidence for the student's diagnosis, but that more likely causes exist. At this point the student may choose to receive a more detailed analysis of the evidence. The tutor might then ask the student to explore the water map to find other possible causes, and would ask the student to defend the new, alternative hypothesis in the same way as the old. If the student makes a mistake, such as asking the customer to do something that is expensive or inconvenient without good cause, or asking leading questions, the tutor will break in, often with a video of an experienced CSR telling a real-world story about a time when he or she made a mistake similar to the one the student is currently making. By recounting the negative consequences of a mistake, just at the time when the student is making that mistake, the expert helps drive the lesson home in a very effective way. After the tutor offers negative feedback on a statement the student has made, the tutor allows the student to retract that statement before returning to the customer.
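The branching behavior of the "NOW WHAT?" goal menu described earlier can be sketched as a simple dispatch over the goal buttons. The button labels come from the text; the function name and response strings are hypothetical placeholders, not Casper's actual implementation.

```python
# Sketch of Casper's "NOW WHAT?" goal menu: each button maps to a follow-up
# action. "I Don't Have a Clue" triggers a detailed explanation of the goals;
# the other buttons ask the student to attempt a next step toward that goal.

GOAL_BUTTONS = [
    "Gather Information",
    "Examine Possible Causes",
    "Narrow Down the Likely Causes",
    "Act On a Diagnosis",
]

def respond_to_goal_choice(choice: str) -> str:
    """Return the tutor's (hypothetical) follow-up for a goal-button click."""
    if choice == "I Don't Have a Clue":
        return ("Explanation of what each goal means, and of the general "
                "sequence for determining how to solve a customer's problem.")
    if choice in GOAL_BUTTONS:
        return f"Please attempt the next step that will help you: {choice}."
    raise ValueError(f"Unknown button: {choice}")

print(respond_to_goal_choice("Act On a Diagnosis"))
```

A student choosing "Act On a Diagnosis" would, per the text, then be asked to indicate the diagnosis on the water map; that follow-up is omitted here for brevity.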
Knowledge needed to make Socratic-style tutoring work

Socratic-style tutors, like the one in Casper, require two important kinds of knowledge. First, the tutor must have access to domain-independent strategies for deciding when to tutor, and how to manage the teaching interactions. In order to implement those strategies, the tutor must access the second kind of knowledge, which is knowledge of the specific domain. In Casper, this domain knowledge includes the causal chains that relate symptoms at the tap to root causes in the water system. Each symptom can be linked to various causes at one of several levels of certainty, and each potential cause can predict the existence of several symptoms, also at one of several levels of certainty. For instance, the domain model encodes the fact that orange-colored water is usually a symptom of rust in the water, which is in turn caused by something stirring up rust in the mains, and the possible causes of that include a burst in the main, work on the main, or a fire truck drawing water from a hydrant. Casper includes a set of authoring tools that can be used to develop the domain-independent strategies and the specific domain models. Applying the tutor to a new domain requires the use of the tools to author a new domain model, and perhaps to adapt the general strategies somewhat. No programming is required to do this. Casper's strategies for intervention are contained in a list of rules that stipulate when a teaching interaction should take place, and which specific dialog with the student should result. An example of a tutoring strategy in Casper is:

IF the student makes a diagnosis of the problem
AND there is not enough evidence for the student's hypothesis
THEN execute the following tutoring sequence:
1. Ask the student to justify his or her diagnosis.
2. Explain the insufficiency of the student's justification and why the diagnosis is premature.
3. Ask the student to retract the diagnosis statement.
4. Help the student with the next problem-solving step.
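A rule of this kind can be sketched as data: a precondition tested against the student's current state, plus the ordered tutoring sequence to run when it fires. The field names and the state dictionary are assumptions for illustration; the chapter does not specify Casper's internal rule format.

```python
# Sketch of a Casper-style tutoring strategy: a precondition over the
# student's state, and the dialog sequence executed when the rule fires.
# Field names ("precondition", "sequence") and state keys are hypothetical.

def premature_diagnosis(state: dict) -> bool:
    """Fires when a diagnosis is announced without enough supporting evidence."""
    return state.get("made_diagnosis", False) and not state.get("enough_evidence", False)

PREMATURE_DIAGNOSIS_STRATEGY = {
    "precondition": premature_diagnosis,
    "sequence": [
        "Ask the student to justify his or her diagnosis.",
        "Explain the insufficiency of the justification and why the diagnosis is premature.",
        "Ask the student to retract the diagnosis statement.",
        "Help the student with the next problem-solving step.",
    ],
}

def applicable_steps(strategy: dict, state: dict) -> list:
    """Return the tutoring sequence if the strategy's precondition holds."""
    return strategy["sequence"] if strategy["precondition"](state) else []

steps = applicable_steps(PREMATURE_DIAGNOSIS_STRATEGY,
                         {"made_diagnosis": True, "enough_evidence": False})
```

Representing strategies as data rather than code is consistent with the text's claim that no programming is needed to author new strategies with the tools.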
Figure 7 - Casper's Socratic Tutor Responding to "Now What?"
After using its domain model to determine that the preconditions for this strategy have been met, the tutor executes the strategy through the use of a series of rules and templates that allow the task-specific details to be spliced into a general interaction. For example, step 1 in the sequence above might be presented in a particular context as follows:
Tutor:
What evidence leads you to believe that Ms. Hughes' milky coloured water has been caused by work on the service pipe?
This query is generated from a general-purpose template: What evidence leads you to believe that <current-symptom> has been caused by <current-student-hypothesis>? Some of the fillers needed to instantiate some of the templates are a function of specifics of a call (for example, the name of the customer, or the specific hypothesis that a student has announced). Other fillers are drawn from the domain model. Still others are drawn from another source of tutoring knowledge, which is the system's case-base of video clips.
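The causal domain model from which such fillers and critiques are drawn links symptoms to causes at several levels of certainty, as described earlier. The following sketch represents those links as a dictionary and traces a symptom back to its root causes; the data structure and the certainty labels are assumptions, while the orange-water chain itself follows the text.

```python
# Sketch of Casper's causal domain model: each symptom or condition links to
# possible causes, each at some level of certainty. Chains of links run from
# a symptom at the tap back to root causes in the water system.

CAUSAL_LINKS = {
    # symptom/condition: [(possible cause, certainty), ...]
    "orange-colored water": [("rust in the water", "usually")],
    "rust in the water": [("something stirring up rust in the mains", "usually")],
    "something stirring up rust in the mains": [
        ("a burst in the main", "possible"),
        ("work on the main", "possible"),
        ("a fire truck drawing water from a hydrant", "possible"),
    ],
}

def root_causes(symptom: str) -> list:
    """Follow causal links back to causes with no further explanation."""
    roots = []
    for cause, _certainty in CAUSAL_LINKS.get(symptom, []):
        deeper = root_causes(cause)
        roots.extend(deeper if deeper else [cause])
    return roots

print(root_causes("orange-colored water"))
```

A tutor holding such a model can both supply template fillers (the symptom, the hypothesized cause) and critique a student's hypothesis by checking whether it lies on a causal chain for the observed symptom.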
Delivering tutoring through a simulated character

Up to this point we have been maintaining a strict separation between two different kinds of interaction that a student has with the system: interaction with a character that is part of the simulation, and interaction with a tutor that watches over the student's interaction with the simulation but is not itself part of that simulation. This separation between the levels of interaction has a kind of elegance to it, and it often serves the student well, since the existence of an external tutor that looks and feels different from the characters in the simulated world keeps the student from becoming confused about when the simulation is running and when there is a "time out" for tutoring. However, there are times when it is both helpful and realistic to blur the distinction between the tutoring function and the practice environment, and we therefore offer a few words on that topic here. One problem with the external tutor is that it is rather heavy-handed. In some situations we are likely to appreciate having the equivalent of someone looking over our shoulder who will interrupt us from time to time to offer help, but in others we will not. One approach to delivering tutoring without interrupting the simulation is to have feedback and assistance delivered by one of the characters within the simulation. This is only appropriate when it can be done realistically: when the environment being simulated contains characters who might realistically provide such feedback in real life. We use this approach extensively in the S2 Trainer. The commander and other senior officers often critique an intelligence officer's work during formal briefings as well as informal interactions. Therefore, it seemed appropriate to use that form of tutoring in our simulation. The effect can be more subtle than explicit tutoring (which we also use in the S2 Trainer).
Useful, accurate feedback can be mixed in with small talk or even with incorrect information: even one's commanding officer is not always right. An important advantage of tutoring delivered by a simulated character is that it is delivered as part of the flow of the simulation rather than as an interruption. If a
simulation can captivate the student's interest (as simulations often can), then the student will be very focused on the feedback that the simulated characters give. On the other hand, feedback from a simulated character does not always have the impact of human tutors, captured on video, recounting experiences they have had in the real world. Therefore, we do not believe that this form of tutoring should typically be used to the exclusion of the others we have discussed; but when it is both practical and realistic for a particular form of tutoring to be delivered by a simulated character rather than an external tutor, this is often a good element to throw into the tutoring mix.

FUTURE DIRECTIONS

The preceding sections have contained no discussion of how to communicate the students' non-verbal behavior because nobody has built a social simulation that does much with such input. This is unfortunate, since it means that social skills that involve such non-verbal communication are currently beyond the reach of this form of computer-based education. The technology for sophisticated output (video, rendered graphics) has vastly outstripped that for input. Some researchers in virtual reality, computer vision, and advanced interfaces are beginning to experiment with technologies to broaden the range of possible inputs to include gesture, facial expressions, and body language. When such modalities become possible channels of input, their use will become an important consideration for the creation of social simulations. Still, the same considerations of fidelity vs. engineering will remain. If we incorporate students' gestures into our systems' inputs, the simulated characters will have to have accurate responses to those gestures. These are long-term research problems. Our near-term research agenda revolves around the creation of specialized authoring tools by which simulations of the types discussed above can be more easily and quickly generated.
We are working toward a set of tools that will be sufficiently flexible that designers will be able to choose among a range of fidelity tradeoffs without doing any custom programming.

CONCLUSION

In developing simulations that allow students to learn social skills by practicing on a computer, we have explored a space of different interface choices, as shown in Table 2. Each case has involved tradeoffs between fidelity, the "feel" of the interface, and engineering concerns, including scale-up, flexibility, and maintainability. In some cases, complete fidelity is beyond the state of the art, such as the understanding of free natural language speech input. In other cases, such as the presentation of non-verbal behavior of simulated characters, a very high level of fidelity is technically feasible through the use of digital video, although the need for the simulation to maintain runtime flexibility often requires some sacrifice in fidelity. The challenge is to ensure that the fundamental requirements of pedagogically effective communication are met given the inevitable compromises that must be made. If cognitive technology is to be the study of how tools affect cognition, then educational tools are likely to be among its most important beneficiaries, because the whole purpose of educational tools is to create cognitive change. Learning environments involving interaction between a student and a set of simulated characters
represent an important challenge for cognitive technology because they are among the most complex educational tools that we are likely to build.

Table 2 - Different interface options explored in Casper, Yello and other educational social simulations.

Description                          Options explored
1. Students' verbal behavior         Action constructor; text input
2. Characters' verbal behavior       Audio; text
3. Characters' non-verbal behavior   Meters; still photos; tone of voice in audio
4. Tutorial interactions             "Socratic" dialog; storytelling; tutoring from simulated characters
Likewise, the theory of cognitive technology will contribute greatly to education if it can help developers build effective learning environments. Computer systems will be effective at teaching only if they match, in fundamental ways, how people learn. If people learn best by doing, learning social skills via educational interactive story systems must involve allowing students to interact with realistic simulated characters. We have found that creating social simulations that are effective at creating cognitive change requires providing accurate, reasonably high-fidelity means of both listening to and responding to those characters. The fundamental requirements for how a student talks to the simulated characters are that the interface be expressive (that is, allowing students to say what they intend to say) and generative (that is, allowing students to generate their own means of saying what they intend to say). These fundamental requirements we have described as accuracy criteria; other requirements (such as an interface that is fast enough and similar in modality) we have described as fidelity criteria. Meeting the accuracy criteria provides just the crucial realism to make learning effective; meeting the fidelity criteria can additionally keep students from noticing they are in a simulated world, and hence prevent communicative breakdown. The fundamental requirements for simulated characters' communication to the student are that their responses be both wide-ranging and compelling. There are many things to learn in social domains. The characters must be able to react in many different situations in order for social knowledge and skills to be acquired by students. Further, their responses need to be compelling: that is, the characters must present realistic challenges for the exercise of students' social skills. It is not enough for characters to be visually engaging, for example; there must be a point to their particular responses.
Fidelity issues, such as faithfully mimicking how real people express emotion, also come into play in preventing breakdown and retaining the interest of the student. However, the most important feature is depth: simulated characters need a wide range of realistic responses in order to come off as characters, not caricatures.
Interaction with well-designed sets of simulated characters can facilitate a great deal of learning, but simulated characters alone have some limitations, which the tutoring modules described above are designed to overcome. One example of those limitations that we have discussed is that engineering considerations often impose restrictions on the fidelity of a simulated character's output. For instance, body language and tone of voice are lost when the simulation uses only static pictures and text. When this loss of fidelity obscures an important lesson, explicit comments from a tutor can make the point that would otherwise be lost. This approach can be especially effective when the comments take the form of a video clip of an expert retelling a story about a similar situation in which the cues that the simulation does not convey were present. The expert can provide high-fidelity demonstrations of what the cues look or sound like, and can explain how to deal with them in context. Other limitations that a tutor can help mitigate are not a function of specific technological limitations, but rather are inherent in any learning-by-doing approach. When students do not know what to do, simply allowing them to try and fail is not an efficient learning method unless they can get help understanding their mistakes. Assigning blame for a failure is difficult, so students can become frustrated when things go wrong if they are not helped to figure out what went wrong and why. Similarly, when the student has a success at a complex task, it is often difficult to repeat that success without help determining which decisions contributed to it. The tutoring modules we have discussed can all help with this valuable sort of credit and blame assignment. There is great synergy in combining an accurate, but relatively low-fidelity, simulation with high-fidelity tutoring.
The simulation gets students involved and focuses their attention, forces them to make important choices, and allows them to make meaningful mistakes. The tutor fills in the gaps in the simulation, adding fidelity just where it is needed, making implicit principles explicit, and helping students to understand the causes of their failures so that they can really learn from their mistakes.

REFERENCES

Anderson, John R., 1988. The Expert Module. In: Martha C. Polson and J. Jeffrey Richardson, eds., Foundations of Intelligent Tutoring Systems. Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Bates, Joseph, A. Bryan Loyall and W. Scott Reilly, 1992. Integrating Reactivity, Goals, and Emotion in a Broad Agent. Technical Report CMU-CS-92-142, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, May 1992.
Burke, Robin, 1993. Representation, Storage and Retrieval of Tutorial Stories in a Social Simulation. PhD thesis, Northwestern University. Issued as Technical Report #50, Institute for the Learning Sciences, Evanston, IL.
Burke, Robin, and Alex Kass, in press. Supporting Learning through Active Retrieval of Video Stories. Journal of Expert Systems with Applications 9(5).
Carter, Kathy, 1993. The Place of Story in the Study of Teaching and Teacher Education. Educational Researcher 22(1): 5-12, 18.
Collins, Allan, J. S. Brown and S. E. Newman, 1989. Cognitive Apprenticeship: Teaching the Crafts of Reading, Writing, and Mathematics. In: Lauren B. Resnick, ed., Knowing, Learning, and Instruction: Essays in Honor of Robert Glaser. Lawrence Erlbaum Associates.
Ferguson, William, Ray Bareiss, Lawrence Birnbaum and Richard Osgood, 1992. ASK Systems: An Approach to the Realization of Story-based Teachers. Technical Report #22, Institute for the Learning Sciences, Evanston, IL.
Fitzgerald, Will, 1995. Building Embedded Conceptual Parsers. Technical Report #63, Institute for the Learning Sciences, Evanston, IL.
Hunter, Kathryn M., 1991. Doctors' Stories: The Narrative Structure of Medical Knowledge. Princeton, NJ: Princeton University Press.
Jona, Menachem Y., and Alex Kass, 1993. The Teaching Executive: Facilitating Development of Educational Software through the Reuse of Teaching Knowledge. 10th Annual International Conference on Technology and Education. Massachusetts Institute of Technology.
Jona, Menachem Y., and Alex Kass, forthcoming. Using Simulated Colleagues to Teach Analysis, Planning, and Communication Skills. Institute for the Learning Sciences, Northwestern University.
Jona, Menachem Y., 1995. Representing and Applying Teaching Strategies in Computer-Based Learning-by-Doing Environments. Unpublished PhD thesis, Northwestern University.
Kass, Alex, 1994. The Casper Project: Integrating Simulation, Case Presentation, and Socratic Tutoring. Technical Report #51, Institute for the Learning Sciences, Evanston, IL.
Kass, Alex, Robin Burke, Eli Blevis, and Mary Williamson, 1994. Constructing Learning Environments for Complex Social Skills. Journal of the Learning Sciences 3(4): 387-427.
Kolodner, Janet L., 1993. Case-Based Reasoning. San Mateo, CA: Morgan Kaufmann.
Labov, William, 1972. Language in the Inner City: Studies in the Black English Vernacular. Philadelphia: University of Pennsylvania Press.
Lave, Jean, and Etienne Wenger, 1991. Situated Learning: Legitimate Peripheral Participation. Cambridge University Press.
Lesgold, Alan, and Suzanne Lajoie, 1991. Complex Problem Solving in Electronics. In: Robert J. Sternberg and Peter A. Frensch, eds., Complex Problem Solving: Principles and Mechanisms. Hillsdale, NJ: Lawrence Erlbaum Associates.
Newman, Denis, Peg Griffin and Michael Cole, 1989. The Construction Zone: Working for Cognitive Change in School. Cambridge University Press.
Riesbeck, Christopher K., and Roger C. Schank, 1989. Inside Case-Based Reasoning. Hillsdale, NJ: Lawrence Erlbaum.
Schank, Roger C., 1982. Dynamic Memory: A Theory of Reminding and Learning in Computers and People. Cambridge University Press.
Schank, Roger C., 1990. Tell Me a Story: A New Look at Real and Artificial Memory. New York: Charles Scribner's Sons.
Schank, Roger C., 1994. What We Learn When We Learn by Doing. Technical Report #60, Institute for the Learning Sciences, Evanston, IL.
Schank, Roger C., and Robert Abelson, 1977. Scripts, Plans, Goals and Understanding. Hillsdale, NJ: Lawrence Erlbaum Associates.
Schank, Roger C., Andrew Fano, Menachem Y. Jona, and Benjamin Bell, 1993. The Design of Goal-Based Scenarios. Technical Report #60, Institute for the Learning Sciences, Evanston, IL.
Slator, Brian M., Christopher K. Riesbeck, Kerim C. Fidel, M. Zabloudil, Andrew Gordon, Michael S. Engber, Tamar Offer-Yehoshua, and Ian Underwood, 1991. TAXOPS: Giving Expert Advice to Experts. Technical Report No. 19, Institute for the Learning Sciences.
Stevens, S. M., 1989. Intelligent Interactive Video Simulation of a Code Inspection. Communications of the ACM 32(7): 832-843.
Winograd, Terry, and Fernando Flores, 1987. Understanding Computers and Cognition: A New Foundation for Design. Reading, MA: Addison-Wesley.
Witherell, Carol, and Nel Noddings, eds., 1991. Stories Lives Tell: Narrative and Dialogue in Education. New York: Teachers College Press.
Cognitive Technology: In Search of a Humane Interface
B. Gorayska and J.L. Mey (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 11

E-MAIL AND INTIMACY

Richard W. Janney
University of Cologne and University of Frankfurt, Germany
[email protected] "Our electronic extensions of ourselves ... create problems of human involvement ... for which there is no precedent." Marshall McLuhan (1964:103-104)
INTRODUCTION

One wonders whether Marshall McLuhan could have imagined the speed with which his 'global village' would shrink, in less than three decades, through the development of computer networks like INTERNET, to the size of a desktop - and then, only shortly thereafter, with the advent of the laptop computer, to the size of a briefcase. McLuhan's analyses of the effects of electronic communication on perception and social organization greatly influenced discussions of the mass media in North America during the sixties and seventies. Today, they have been largely superseded by later developments. Although McLuhan did not foresee the development of electronic mail, he dreamed of technologies capable of someday achieving what he called "world-wide integration" and "world-wide consensus" (1964: 106). E-mail is perhaps the closest thing we presently have to a technology capable in principle of achieving those goals. At least it provides almost unlimited possibilities for global interaction. The question addressed in this paper is what kind of interaction this is and, above all, what the quality of this interaction is. In a very general sense, notions of integration and consensus are characterized by convergence: by a metaphorical 'joining' of interests or attitudes; by a movement of partners metaphorically 'closer' to each other; by a sense of merging common bonds of some type (cf. Malinowski, 1923). Thus viewed, convergence is a relational state, and is not necessarily strictly ideational. In order to achieve interpersonal convergence, we have to be able to relate to each other emotively (cf. Caffi and Janney, 1994). We have to be able to 'read' and respond to each other's displays of affect; we have to be attuned to subtle relational messages communicated iconically by how we interact with each other: by choices of words, tones of voice, gazes, gestures, facial expressions, and
R. W. Janney
so forth (cf. Arndt and Janney, 1991). Successful emotive communication is a prerequisite for interpersonal convergence (cf. Watzlawick, Beavin and Jackson, 1967). In McLuhan's terminology, e-mail could be described as a hybrid form of communication in which the so-called 'hot' medium of linear print is translated into the so-called 'cool' medium of the video image. The text appears as lines of ASCII characters on a PC monitor, or as a computer printout. The senders and receivers are connected interactively, and their messages move through the network at the speed of signals in the central nervous system. When information moves this fast, McLuhan argued (1964: 103), perceptual processes change, and old patterns of psychic and social adjustment begin to break down, often with unexpected results. This paper is about emotive communication in e-mail, and about some, possibly unique, emerging patterns of interaction and interpersonal adjustment in modern e-mail society.

E-MAIL INTIMACY

One of the unexpected results of e-mail has been the phenomenon of 'e-mail intimacy'. Most veteran e-mail users have experienced it: the paradoxical sense of immediacy or nearness sometimes experienced with strangers in this medium. There is a remarkable contrast between the cool, disembodied impression made by the e-mail medium itself, and the heated involvement of some of the interaction that it seems to produce. For not yet fully understood reasons, e-mail seems to encourage displays of affect that would be unusual between strangers in normal written correspondence, telephone conversation, or face-to-face interaction. In these cases, it is almost as if, in spite of the coolness of the medium and the vast distances that it bridges, the partners are 'close': easy to get in touch with, easy to exchange information with, easy to be friendly with, and, if problems arise, easy to be unfriendly with.
The sense of the partner's eminent 'interpersonal availability' in e-mail is quite different, I think, from feelings of intimacy in other types of human interaction; it has a curiously abstract, virtual quality that is perhaps unique to the e-mail communication process. To give an example, during the Gulf War, linguists around the world received e-mail reports from a colleague in Israel, describing the day-to-day life of his family, as they sat confined by circumstances to their home, fearful of rockets and gas attacks, hoping for the end of the war. This moving, personal, electronic document - a private diary shared publicly with hundreds (or thousands) of e-mail users - brought the intimate reality of the writer's family's attempts to cope with the uncertainties of the war into vivid focus for the rest of the INTERNET world. The diary was originally circulated among a small circle of friends. Soon, however, due to the large number of forwarded copies, it became an e-mail happening of global dimensions, with different functions for different groups of readers. For the writer and his friends, it no doubt continued to be a means of keeping in touch and maintaining emotional bonds during the trying times. For many strangers, however, it was news: timely information about the psychological, social, and political situation in Israel. And for still others, reading the diary from installment to installment was like eavesdropping on the family through the PC monitor, or like watching a gripping true-life drama on television. With a sort of voyeuristic curiosity, one turned on the PC wondering, "how are they doing today?"
E-mail and Intimacy
As one of the many secondary receivers of this dramatic document, I read it with increasing empathy for the writer and his family, and with a certain growing sense of contradiction. Who were the private details meant for? The writer was clearly sharing his feelings - but with whom? As only one of his own diary's many 'senders' (counting the 'forwarders'), his status in this event was not entirely clear to me from the standpoint of the communication processes involved. And what was my status as a coincidental 'receiver' of the diary? Where were the emotive bonds, in the Malinowskian sense, between myself and the writer? He seemed somehow both near and hopelessly far away at the same time. I imagined that I was receiving copies of the diary from someone who had received copies from someone who had received copies from someone who had received copies from someone who had received copies, and so on, in a sort of infinite regression. The event, as a whole, in fact, consisted of so many spontaneous 'sendings' and 'receivings' that it seemed quite impossible to imagine who might possibly have been intending to share what type(s) of intimacy with whom, why, at any particular time in the process. What remained stable was the document itself, which appeared regularly in my e-mail box with its paradoxical double identity as an intimate, personal account of the writer's experience on the one hand, and as a simple piece of public information on the other: it had become 'private network public property'. Its receivers, whether friends, concerned strangers, or peeping Toms, were free to treat it as they wished: free to read it, ignore it, file it, forward it, comment on it, take sides with it, attack it, etc. It occurred to me that, as a global network event, the diary had real senders and receivers in only a very abstract sense.
It was simply something 'there' in the network, with a life and dynamism of its own, which could be picked out and inspected if one wanted - or not, as the case might be. E-mail abounds in messages competing for the attention of receivers, like characters in search of authors, which only come alive if they are chosen for reception by the receiver. As often happens in the INTERNET, after a few installments of the diary, the online spectators began commenting on it. Soon, their comments led to disagreements, and the disagreements flared into a little network flame war in its own right. In this bloodless war, the combatants provoked and insulted each other with great abandon, precision, and aggressivity; and at times, the attacks were quite personal. The term for this in network jargon is 'flaming'; it is the darker side of e-mail intimacy, where people, merely by virtue of what I have called their 'interpersonal availability', become potential targets for whatever hostilities others in the network might feel like venting on them. It is almost as if, upon entering the INTERNET argumentative arena, e-mail users pass an invisible barrier beyond which almost anything goes. In e-mail, where there are few constraints on primal pathologies, exhibitionism, voyeurism, narcissism, hostility, and so forth can be commonplace features of the communication process.
E-MAIL AND AFFECT
Stories like the above seem to contradict the notion that communication in computer networks is somehow 'cold' or emotionally underdeveloped. On the other hand, the emotive expressive deficits of e-mail are well-known. Like other forms of writing, it lacks a voice and a body. The emotive expressive possibilities of ASCII characters are more restricted than even those of other typewritten characters, and of course infinitely more restricted than those of handwritten characters. Basically, from a
R. W. Janney
graphological standpoint, in the ASCII code, one can only talk (in small letters) or SHOUT (in capitals). There is the problem of the failing emotive middle-register. The ASCII code provides only very limited ways of suggesting variations in voice quality, intonation, stress, speech rhythm, speech rate, etc.; and there are no replacements for the various facial expressions, gazes, postures, gestures, and so forth that help us interpret the emotive significance of utterances in everyday speech (cf. Arndt and Janney, 1987). The use of 'smileys' in e-mail is sometimes regarded as an attempt to overcome the emotive deficits of the ASCII code, but it is a poor attempt, and more like a form of play. Another problem with e-mail, from an emotive point of view, is its relative lack of replacements for the ancient rituals of writing, mailing, receiving, and reading paper letters. Paper letters are complex ritual events (cf. Violi, 1984; Caffi, 1986; Flusser, 1989). A good letter is handwritten in a place that is personal for the writer. The paper, while being written on, shares space in this place with other things that are personal to the writer, partaking, in a sense, of their intimacy. The handwriting, quite independently of the words, iconically represents the writer's feelings while thinking and putting down the thoughts on paper. The handwriting can suggest a state of mind, a state of health, a state of a relationship, a momentary personal situation, or a whole personal history. Before the letter is sent, it is traditionally put in an envelope, and the envelope flap is licked by the sender, symbolically sealing and ending the act of communication. All these (and many other) small, intimate, ritual acts are sent out, as pieces of the writer's 'self', on an uncertain journey through the world's postal system, to be reconstructed and re-experienced by the receiver on receipt. 
This is one reason why reading other people's mail can quite literally be an invasion of their personal privacy. There is also something ritualistically significant about receiving a sealed envelope, containing an unknown message, which has been passed from hand to hand, and has traveled across a great distance, perhaps taking a long time. A letter that appears unexpectedly in one's mailbox is potentially many things: a surprise, a shock, a promise of potential change. It is a sort of undeciphered gesture from a remote 'there' and 'then' that has somehow found its way into one's own immediate personal 'here' and 'now'. It can bear good news or bad news. It can change one's life. An unopened letter is a silent invitation to embark on an unknown fate. Letters that one has been waiting to receive for a long time are especially significant. Waiting for them and wondering why they have not arrived yet are important parts of the ritual, as are weighing them in one's hand once they arrive, feeling the texture of the envelope, scrutinizing the stamps, postmarks, and smudges, opening them, seeing how the paper was folded, and reading them. In e-mail, sending is accomplished by giving the mail program a 'send' command, and receiving is accomplished by giving it a 'read' command. Is it possible that e-mail intimacy is, in part, an attempt to overcome the emotive deficits and the ritual constraints of the e-mail medium? This, at least, would be an explanation in keeping with Malinowski's (1923) notion that the two underlying functions of human interaction are to communicate facts (I have caught six fish) and to maintain relationships (Would you like one?). Malinowski called the first function 'communication' and the second 'communion', the first being a broadly ideational function, and the second broadly relational. He believed that both are necessary for
social organization, and he claimed that it is a mistake to imagine that the main (or even most important) function of human interaction is to communicate factual information. Following Malinowski's reasoning, might we hence imagine that e-mail intimacy has something to do with people's need to maintain relational bonds - or with their need for interpersonal warmth - in the cold, impersonal world of the cybernetic systems through which they interact? E-mail intimacy would then be explained as a form of compensation, where the partners 'try harder' than usual to communicate emotively, like people who lean closer to each other in a noisy room, shouting, gesticulating, grimacing, touching each other, etc., to compensate for 'noise on the line'. E-mail partners would be seen as intensifying their attempts to maintain bonds of union by acting in exaggeratedly 'warm' ways to overcome the restrictions of the 'cool' video communication channel. But this would suggest that the e-mail user could be regarded as a sort of unlucky projectionist in Plato's cave, who is forced to compensate for the emotive deficits of the shadows projected on the wall by moving the figures about more actively in front of the fire. Is this all there is to e-mail intimacy?
THE SENDER-RECEIVER RELATIONSHIP
The idea that e-mail intimacy is strictly a reaction to deficient code and channel properties of the e-mail system is not very satisfying. First, it seems to entail imagining that e-mail users are somehow forced (as if against their will) to be intimate with each other, which seems counterintuitive. Second, it ignores the fact that people can indeed express their feelings in e-mail, and they can do this very well if they want to. Although the communication of affect requires some effort on a computer keyboard, it is by no means impossible, and, as Detlef Borchers (1995: 74) has recently said, in e-mail, the effect of affect is dangerously spontaneous. No facial expression, tone of voice, or physical gesture reduces its impact on the partner. I think if we want to explain what e-mail intimacy is really about, we have to move beyond considerations of the ASCII code and the video mode, and start asking questions about patterns of sender-receiver interaction in e-mail itself, and about patterns of interaction between e-mail users and the system. Here, I think, are the keys to understanding e-mail intimacy. If we look at what people really do as users of e-mail, we can see that there is a basic lack of clarity in the sender-receiver relationship in the e-mail communication process. It is a product of the complexity, size, and flexibility of the network - and, above all, an effect of its incredible speed. The ease with which copies can be forwarded to third parties in the system tends to destabilize our notions of who our partners are. Partly as a result of this, our own roles as senders and receivers in the e-mail process also become fuzzy. It is as if the relational zero-point of communication, or the 'I-you' core of the interaction, is sometimes not fully clear in e-mail.
The Sender Role
The notion of being a sender in traditional communication models generally implies some correlative notion of being the originator of the message, and of directing it toward a particular receiver or receivers over a channel. Hence, e-mail messages are assumed to be encoded into the ASCII code, sent via the e-mail network to receivers, and decoded at the receiving end into something like mirror images of themselves. In
real e-mail practice, however, this type of sending (in which (1) 'I' send 'my' message to 'you') is only one possibility - and sometimes not even the most important one. Others, for example, include: (2) 'I' send 'my' message to several of 'you' simultaneously; (3) 'I' send 'my' message to 'you', and copies to 'them'; (4) 'I' send 'my' message to 'you', and copies to 'them', with comments to 'them' from 'me'; (5) 'I' send copies of 'their' message to 'me' to 'you'; (6) 'I' send copies of 'my' copies of 'their' messages to each other to 'you', with comments to 'you' from 'me'; (7) 'I' use the reply function to send 'your' message to 'me' back to 'you' with 'my' comments to 'you', simultaneously sending 'your' message to 'me' with 'my' comments to 'you' back to myself. The point here is not that such things cannot be done by regular post or by telefax, but rather that in e-mail they can be done absolutely effortlessly and instantaneously with a simple push of a button, and they are infinitely combinable. Classical notions of senders, codes, channels, and receivers have been criticized increasingly in pragmatics in recent years (cf. Bickhard and Campbell, 1992; Mey, 1993). What has not yet been discussed, however, is that even the legitimacy of the notion of the sender-receiver dyad as the primary unit of e-mail interaction seems somehow questionable at times in the e-mail communication process. The 'I-you' relationship is permanently complicated in e-mail by the potential presence of additional 'he's', 'she's', or 'they's' who can get access to messages and forward copies. As any message sent via e-mail can potentially be forwarded by the receiver to other receivers, and by these to yet others, the sender of an e-mail message can never know exactly how many receivers will receive it. Hence, in the sender role, one can never know exactly who, in this extended sense, one is writing to: the notion of the receiving partner becomes vague.
Since, according to enunciation theory (cf. Benveniste, 1971; Rosenbaum, 1994), the enunciating 'I' can hardly be conceptualized without reference to the addressed 'you', the vagueness of the addressee is a permanent underlying problem in e-mail communication. Therefore, in pursuit of explanations of e-mail intimacy, we ought to ask ourselves what this diffuse notion of the 'you'-receiver in e-mail means for the 'I'-sender, and how it influences the sender's behavior. One would think that not knowing exactly who could receive one's messages might lead to a certain insecurity, or at least to a certain caution in matters of e-mail intimacy. Paradoxically, however, it does not seem to do this. The potential vagueness of the other, on the contrary, seems to encourage some e-mail users to indulge in curious forms of exhibitionism.
The Receiver Role
If the concept of the receiver is potentially vague from the sender's standpoint in e-mail, the concept of the sender is often no less vague from the receiver's standpoint. First, strictly speaking, we do not literally receive e-mail messages ourselves; our emailboxes do this. With the help of our PC's, we then go metaphorically 'into' our mailboxes, selecting the messages that we want to receive from the list of messages in the mailbox when we enter the system. There is hence an interesting sense in which, as receivers, we actually select our senders rather than being selected by them, as traditional communication models would have it. But we are greatly disadvantaged in distinguishing between categories of senders (friends, colleagues, professional interest groups, junk mailers, anonymous third parties, and so forth) by the paucity of information in the mail display on a PC monitor. Unlike in regular mail, where senders can be categorized relatively effortlessly just by
glancing at the sizes, shapes, and colors of envelopes, and by looking at stamps, postal marks, address formats, and so forth, in e-mail, assigning senders to categories involves reading and interpreting a good deal of digitalized information on the monitor screen (e.g., the e-mail address, message header, size of the file, etc.). It is sometimes necessary first to read the opening of a message in order to assign its sender to a particular category. Moreover, from a new mail display on a PC monitor, it is not always immediately evident whether messages are originals, copies, or replies to previous messages, and it is not always clear whether they are intended for oneself in particular, or for a wider audience. Together, these characteristics of the e-mail display system tend to blur our sense of which messages might be worth selecting for reception and which might not be.
THE USER-NETWORK RELATIONSHIP
E-mail users naturally communicate not only with each other when sending and receiving e-mail messages, but also, of necessity, with the e-mail system itself, via their PC's. Interaction between the user and the system, I think, offers further clues to explaining e-mail intimacy. As users of e-mail, we tend constantly to vacillate between two rather different role-alignments with respect to the e-mail system: either (1) we tend to selflessly serve the system, unreflectingly carrying out our information processing duties as metaphorical extensions of the network - as 'willing nodes', so to speak - or (2) we manipulate and exploit the possibilities of the network, regarding the system as a sort of metaphorical extension of ourselves and our interests. In the first role - that of the node in the network - the user is analogous, in a way, to a neuron in a central nervous system, or to an ant in an unimaginably huge, active anthill, swarming with actors performing individually senseless activities that organize themselves into socially useful forms. There are colleagues, for example, who come out of their offices in the morning, after processing the night's e-mail, proud of having already worked for two hours (I am indebted to Jacob Mey for this observation). In this role, they are not only users but also servants of the e-mail network. In the second role - where the network is regarded more as an extension of the self - the user is somewhat like a post-paleolithic nomad, roaming about a self-constructed cybernetic environment, who hunts for information and gathers partners much as our stone-age ancestors once roamed about the earth hunting and gathering food. There are colleagues, for example, who are impassioned players of INTERNET games, subscribers to lists, appreciators of flame wars, forwarders of entertaining messages, senders of Christmas greetings with pictures drawn in ASCII characters, and so forth.
In this role, they are not only users but also explorers and manipulators of the network, and their work takes on some of the characteristics of play.
The contrast between these two basically different styles of interacting with the e-mail system, I would like to suggest, carries over into our relationships with our partners. In the latter role, in particular, we are sometimes tempted, when playing with our possibilities for manipulating and exploiting the system, to regard our partners indirectly as extensions of our own personal interests or desires. This, I think, might be the underlying attitude in many instances of e-mail intimacy: the assumption that the partner is a sort of extension of the self - someone called into being, in a sense, by the decision to communicate with him or her.
THE USER-MONITOR RELATIONSHIP
The final clue to unraveling the mystery of e-mail intimacy may well be in the relationship between the user and the PC monitor. Following a line of reasoning used by Eco (1984), PC monitors can be compared with mirrors. Both produce virtual images, or images that tend to be perceived as appearing somehow 'inside' or 'behind' the glass, although the projecting surface per se has no 'inside'. Mirrors and PC monitors are threshold phenomena. The monitor is in a very literal sense an interface - a type of third face between the e-mail sender and receiver - where the users' respective egos, projected on the screen as the messages that they send each other or select from each other for reception, become social egos of a special cybernetic type. Ever since Lacan's (1953) discussion of the mirror stage in child development, it has been known that experience with mirrors involves imagination and projection. The child at first mistakes the image in the mirror for reality, then recognizes that it is only an image, and finally realizes that it is his or her own image. The time when the reflected image is recognized as the 'self' is an important time in the child's social development, marking the birth of its so-called 'social ego'. It is the first step toward imagining different possible projected social selves, and the first step toward imagining others' thoughts, feelings, and desires. Ironically, adult e-mail users seem to go through somewhat similar stages in their experience with e-mail messages on PC monitors. They begin by imagining e-mail messages as 'real' documents that have somehow literally traveled through space and time by satellite from some place on earth into their PC terminals; then they recognize that e-mail messages are short-lived, virtual things in the dynamic life of the network; and finally they realize that the only reason why messages appear on their PC screens at all is because they themselves decide to make this happen.
When an e-mail message on a PC monitor is recognized by the receiver as something that he or she has 'constructed' (by deciding to receive it), this marks the emergence of what we might call the 'social cybernetic ego'. This stage corresponds roughly with the time when users begin regarding the e-mail network as an extension of themselves, stop thinking much about secondary receivers of their own messages, and stop wondering about the 'real' identities of senders of messages that they have decided to receive. When this stage is reached, it becomes easy to imagine partners as virtual partners, and to imagine e-mail interaction as a kind of virtual interaction taking place metaphorically 'inside' the computer. In imagining this, the user, rather like Alice in Lewis Carroll's Through the Looking-Glass, crosses over the threshold into the monitor, to interact with the virtual partners on the other side. Once Alice jumps into the looking-glass room in Lewis Carroll's novel, her first words are, "Oh, what fun it'll be, when they see me through the glass in here, and can't get at me!" I think this is a rather good explanation of many instances of unexpected e-mail intimacy: e-mail intimacy arises, perhaps, out of the simple illusion that once one has metaphorically stepped 'into the monitor', one is invulnerable. I think that if we regard e-mail intimacy as a type of virtual intimacy - as a type of intimacy with a projected cybernetic extension of the user's 'partner-interests', as opposed to a 'real' partner - we are coming somewhat closer to the true nature of the phenomenon. Being intimate with a virtual partner involves no risks: it is like being
intimate with oneself, or with a figure in a video game. There is little to lose by insulting a virtual partner: first, there is probably a good chance that you will never actually meet your living Doppelganger face-to-face; second, as long as the partner remains virtual, the issue of his or her 'personhood' is not relevant; and third, in any case, a virtual partner's reactions can be ignored at will, the way an uninteresting television show can be ignored. Especially the more aggressive forms of e-mail intimacy, like flaming, may be products of the feeling that the e-mail system, and all of the attackable virtual partners metaphorically 'inside' it, are extensions of the attacker's self.
CONCLUSION
But where does this leave us, relationally speaking: with a looking-glass e-mail communication system capable of providing almost unlimited opportunities for intimate virtual emotive interaction with phantoms? Or, even worse, only with ourselves? Thirty years ago, McLuhan (1964: 102-103) said that when information moves at the speed of the central nervous system, we are confronted with the obsolescence of all earlier forms of psychic and social adjustment. Our experience with e-mail sometimes seems to confirm this. McLuhan was deeply ambivalent about the social and psychological implications of electronic communication, but his work was always characterized by a strong hope - which we see re-emerging now, after three decades, in today's cognitive technology movement - that ways would eventually be found to steer electronic technology in socially unifying and humanly satisfying directions. He reasoned that "when we have achieved ... world-wide fragmentation, it is not unnatural to think about ... world-wide integration," and he dreamed of the possibility of some day achieving a balance between technology and experience that would, in his words, "raise our communal lives to the level of a world-wide consensus" (1964: 106).
I think that the emotive problems discussed above are not problems of the e-mail system, but problems of its users. A virtual world does not necessarily have to be an aggressive world, and in the long run, intimacy, however virtual, can hardly help but be at least as integrative and conducive to global consensus as our other relational alternatives. Perhaps the problem is only that e-mail users are still preoccupied with playing with the e-mail system, and with playing with each other with the system. When they have finished playing, it will be time to sit down and start talking seriously about intimacy.
POSTSCRIPT
Somewhere between the beginning of this paper and the end, I received an air-mail letter, partly typed and partly hand-written, from my friend Yuri Kite at the Canadian Academy in Kobe, Japan, thanking me for my concern during the days following the earthquake that devastated that beautiful city at 5:47 A.M. on Tuesday, January 17th, 1995. She had finally given the pendulum of the antique clock on the wall of her office a nudge to restart it. It was February 6th, the water and gas had just been turned back on, and half of the students had returned. Most of the letter was a printout of e-mail messages that had been sent from the Academy during the days following the earthquake, and the remaining last half-page was the handwritten message.
The world had indeed seemed very small during the days after the earthquake in Kobe, and the e-mail immediacy had been very high. As I opened Yuri's letter (it was naturally in a sealed envelope, with stamps, postmarks, smudges, and the rest of the ritually significant relational icons), the thought arose that beyond the pathological fascination of some e-mail users with their possibilities for shaping virtual realities in their PC monitors, quite another global e-mail connection exists, which is very direct, human, non-egocentric, and intimate. It is the connection between friends. And I suddenly regretted not having realized the full strength of this connection as I had read the earlier diary reports of my Israeli colleague during the Gulf War. His status in that particular global e-mail event had been absolutely clear; mine too. He had been talking to his friends, and the rest of us had been listening in. One must really almost apologize for such an invasion of intimacy. The real truth of e-mail intimacy, I believe, lies somewhere between the poles of the self, the system, and the other. In e-mail, we can categorize our partners at all times in terms of any of these: they can be creations of our choices, ghosts in the monitor, or simply our friends. And it is possible that a certain helpless cynicism results from having such alternatives. On the other hand, it is also possible that, if we learn to get along with such possibilities, we do not have to be victims of them. Getting ready to do a last proofreading of this paper before sending it (too late, sorry) to the Editors in Hong Kong, I looked again at Yuri's handwriting from Kobe, and what I was trying to say in this postscript became clear to me. There were still telephone party-lines in the country in Montana when I was young. Two rings meant the family in the next valley, three rings meant our neighbors, and four rings meant us. And everybody heard and knew everything. 
The rest was only empathy, sympathy, understanding, and old-fashioned Ferris County discretion. And now, with the advent of e-mail, the party-line intimacy is only a little bigger. We've got to get used to it.
REFERENCES
Arndt, Horst, and Richard W. Janney, 1987. InterGrammar: Toward a unified model of verbal, prosodic, and kinesic choices in speech. Berlin/New York/Amsterdam: Mouton de Gruyter.
Arndt, Horst, and Richard W. Janney, 1991. Verbal, prosodic, and kinesic emotive contrasts in speech. Journal of Pragmatics 15: 521-549.
Benveniste, Emile, 1971. Problems in general linguistics. Coral Gables, FL: University of Miami Press.
Bickhard, Mark H., and Robert L. Campbell, 1992. Some foundational questions concerning language studies. Journal of Pragmatics 17: 401-433.
Borchers, Detlef, 1995. Redeschlacht ohne Pardon. Die Zeit 3: 74.
Caffi, Claudia, 1986. Writing letters. In: Jorgen Dines Johansen & Harly Sonne, eds., Pragmatics and linguistics. Festschrift for Jacob L. Mey, 49-57. Odense: Odense University Press.
Caffi, Claudia, and Richard W. Janney, 1994. Toward a pragmatics of emotive communication. Journal of Pragmatics 22: 325-373.
Eco, Umberto, 1984. Semiotics and the philosophy of language. London: Macmillan.
Flusser, Vilém, 1989. Die Schrift. Göttingen: Immatrix.
Lacan, Jacques, 1953. Le séminaire de J. Lacan. Paris: Seuil.
Malinowski, Bronislaw, 1923. The problem of meaning in primitive languages. In: C.K. Ogden and I.A. Richards, The meaning of meaning, 296-336. London: Routledge & Kegan Paul.
McLuhan, Marshall, 1964. Understanding media: The extensions of man. New York: McGraw-Hill.
Mey, Jacob L., 1993. Pragmatics: An introduction. Oxford: Blackwell.
Rosenbaum, Bent, 1994. Passion and enunciation. Aarhus: Center for Semiotic Research (mimeographed).
Violi, Patrizia, 1983. Letters as written interaction. In: Valentina D'Urso and Paolo Leonardi, eds., Discourse analysis and natural rhetorics, 213-219. Padova: Clueb.
Watzlawick, Paul, Janet Helmick Beavin, and Don D. Jackson, 1967. Pragmatics of human communication: A study of interactional patterns, pathologies, and paradoxes. New York: Norton.
Cognitive Technology: In Search of a Humane Interface
B. Gorayska and J.L. Mey (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 12
COMMUNICATION IMPEDANCE: TOUCHSTONE FOR COGNITIVE TECHNOLOGY
Robert G. Eisenhardt
SENSCI Corporation, Alexandria, VA 22314, USA
David C. Littman*
Advanced Intelligent Technologies, Ltd., Burke, VA 22015-4737, USA
dlittman@cs.gmu.edu
* Corresponding author.
INTRODUCTION: MOTIVATION AND GOALS
One of the key goals of the theoretical and engineering disciplines of Cognitive Technology must be to make it possible to create, efficiently and reliably, complex systems of humans and machines who work effectively together toward common goals. Unfortunately, the mere existence of the need to create a discipline of Cognitive Technology suggests--rightly, we believe--that humans and machines just do not naturally work together well. For all our personal, social, and political difficulties, the same cannot be said for humans: humans work together in extraordinary ways and are absolutely marvelous at overcoming what would otherwise be fatal blows to joint problem solving. In essence, people are very good at detecting potential hazards to problem solving and correcting them before they make it impossible to achieve the goal toward which the problem solving was directed. For complex human-machine systems, especially in life-critical applications, it is absolutely necessary that: 1) humans be able to communicate effectively with each other, 2) machines be able to communicate effectively with other machines and, 3) humans and machines be able to communicate effectively with each other. If misunderstandings occur in any of these communication channels, the potential for disaster can be high. There are many tragic examples of such failures in communication. Consider:
• About 10 years ago, a DC-10 taking off from Chicago's O'Hare airport lost its right engine shortly after lift-off. The plane crashed with loss of all passengers and crew. Investigation of the accident revealed that the crew eventually deduced the nature of the trouble and had begun to take corrective action, but too late. The investigation also showed that had the crew (or the
system) recognized the problem in time, the plane, passengers and crew were savable. All necessary data were available but the crew knew only that their model of the world was no longer valid. None of the available data were used to generate information describing the plane's change in state. Technology is available today to prevent that from happening, but it is not in place in any aircraft.
• A similar incident occurred in 1987 in Detroit when the crew of a Boeing 727 took off with the flaps up. All on board were killed. Again, ample data were available to prevent the tragedy, but it was precipitated because not just the pilot, but ALL crew members' internal models of the plane had the flaps down, and all warnings to the contrary were ignored. The point is made that while it is the crew's responsibility to control the plane, the plane could have been equipped with intelligent systems that would have shut the plane down when it recognized the crew was unaware of the deteriorating situation.
• A third case involved the Air Force Demonstration Team during a practice session a few years ago. The group was performing a split 'S' maneuver beginning at a low altitude. Unfortunately the lead pilot miscalculated and flew his plane into the ground, followed almost immediately by his three wingmen in a tight fixed formation. It was an easily preventable accident: none of the pilots had a correct mental picture of reality even though sufficient data were available in the plane to correct it.
• A final example is that of the USS STARK, where misinterpretation of sensor data combined with inadequate protocols caused a command to be issued resulting in the destruction of a commercial airliner.
While these events constitute significant tragedies, we hasten to point out that, in some respects, it is surprising that there are not more of them, considering the complexity of tasks humans are called on to perform daily. Our explanation for the relatively small number of catastrophic events, given the huge number of opportunities for them, is quite simple: One of the most important aspects of communication between and among humans and between humans and their environment in critical circumstances is the way they are able to identify and repair potential communication failures. Indeed, we believe that the next generation of complex human-machine systems will have to be able to employ many of the strategies humans use to identify and repair communication failures. Moreover, the need for such capabilities will only increase as human-machine systems become more complex and expose different, and perhaps new, forms of communication failure. This implies that design tools used to develop the next generation of complex human-machine systems must be able to 1) foresee potentials for communication failure and, 2) recommend design alternatives to prevent such failures. Recently there has been a strong drive to develop engineering technologies for robust, complex, problem solving systems comprised of humans and machines. (For instance, Air Force, Navy, ARPA, and NIST have significant initiatives in the arena of intelligent tools to support design of complex systems.) Such technologies are based on design methods and heuristics that, while effective to a
degree, do not constitute a scalable and replicable design technology for constructing multi-agent systems able to avoid, or detect and repair, communication failures before accidents result. Current cognitive technologies are not detailed enough to support simulation of potential designs with the goal of detecting where, when, and why communication failures might occur, and how they might be prevented by altering the system design. This is so because, in large part, current system design tools do not reason about the effects of the different kinds of knowledge representations that humans use when engaging in complex problem solving. This, in turn, is because we do not yet clearly understand the types of knowledge structures humans bring to bear in complex, rapid, and cooperative problem solving situations to the same degree that we understand the types of knowledge structures employed in solving, for example, story problems involving time and motion. In short, we do not yet have a sufficient theoretical base to create robust engineering methods grounded in Cognitive Technology. In studies of individual problem solving--particularly in the area of acquisition of expertise--much research has focused on how knowledge representations used during early phases of individual skill acquisition become recoded, or "compiled", into rule-like procedural representations (cf. work by Anderson, Kieras, Lesgold, Mitchell, Rouse, etc.). Mental models for spatial mapping and structure problems have also been investigated, with the intent of discovering how humans make inferences about spatial relations and, especially, how they incorporate new constraints on spatial relations into existing mental models. However, little work has addressed the role of these different types of knowledge structures in group problem solving.
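The notion of knowledge "compilation" mentioned above can be illustrated with a toy sketch. This is purely illustrative and not any cited author's model: a novice interpretively chases declarative facts one link at a time, while practice composes the chain into a single direct rule. All names and facts below are invented.

```python
# Toy "knowledge compilation": interpretive chaining over declarative
# facts vs. a single compiled rule. Facts are invented for illustration.
FACTS = {"icing": "lift_loss", "lift_loss": "stall_risk"}

def interpretive(symptom, facts):
    """Novice strategy: follow declarative links one step at a time."""
    state, steps = symptom, 0
    while state in facts:
        state = facts[state]
        steps += 1
    return state, steps

def compile_rule(symptom, facts):
    """Expert strategy: pre-compute the whole chain into one rule."""
    outcome, _ = interpretive(symptom, facts)
    return {symptom: outcome}
```

After compilation, the same conclusion is reached in a single lookup rather than a chain of inferences, which is the behavioral signature the compilation literature describes.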
Even less work has focused on their role in group problem solving tasks that require different members to use different types of knowledge representations and to communicate with other group members. We believe the absence of research on this topic has left designers of complex systems without the design tools necessary to develop systems capable of preventing, or correcting for, communication impedance. Our ultimate goal in this enterprise is, therefore, to develop a design technology consisting of 1) a theory of the types of communication failures that affect agents in complex problem solving, 2) the knowledge that humans do--and machines could and should--bring to bear in problem solving situations prone to communication failure and, 3) design tools to support rapid design, evaluation, and construction of robust, complex, intelligent human-machine systems that can a) prevent communication failures or b) identify communication failures and repair them before they become tragic. In this paper, we sketch the outlines of what we think a Theory of Communication Impedance might look like, how we intend to build it, and why we think it is a critical component of a discipline of Cognitive Technology.

TYPES OF COMMUNICATION IMPEDANCE

We have defined several categories of sources of communication impedance, each of which we believe poses significant challenges to the discipline of Cognitive Technology, if it is ever to evolve to the point where it serves as the basis for a robust
engineering methodology. In this section, we briefly describe our first pass at a typology of sources of CI.

Impedance Associated with Mental Models. In the area of mental models, we have identified the following fundamental causes of communication impedance:

• Dysmodal Mental Model--This term, coined for this paper, refers to mental models that suffer from structural defects. One structural defect we call false closure. In false closure, adjacent steps in an inference chain "make sense" locally. However, their effects on the overall inference chain are, unfortunately, destructive. For example, a novice program manager may conclude that it is possible to deliver a large software job on time, but with some additional expense required to put "a few more people" on the job to "speed up" progress. This makes sense because if five people can do the job in a month, surely ten people can do the same job in two weeks. Unfortunately, this appeal to the infamous Mythical Man-Month, popularized by Fred Brooks, makes sense only locally. As Brooks shows, in the overall context of developing and executing a software development plan, it simply does not have the effects one might expect from the naive intuition that "if a little is good, more is better". More extreme examples come from schizophrenics in crisis whose speech makes reasonable sense across adjacent topic shifts but very little when the entire path of topic shifts is considered.

• Mismatched Mental Model Contents--(A) Two or more communicators have mental models of the same objects and/or processes, but there are differences in the contents or structure of the mental models that can lead to communication impedance and, possibly, failure. (B) One individual has two (or more) related models, any or all of which may be incomplete, erroneous, and at different levels of refinement. When some action requires using several of these models in combination, the combined model is usually also incomplete or erroneous, leading to improper decisions and subsequent actions. This class of impedance is usually the root cause of human errors associated with interpreting one's environment.

• Mismatched Inference Rules for Mental Models--Two or more communicators may have identical mental models but different rules for operating on them.

• Mismatched Referents--Two or more communicators may have different referents for the same symbol although the semantic meanings of the symbol are the same.

• Mismatched Semantics--Two or more communicators may have different meanings for the same symbol.

Impedance Associated with Communication Acts. In the area of communication acts, we have identified the following three types of impedance sources:

• Framing--A communicator may require different framing (from others) for a specific communication to permit her/him to refer appropriately to the same mental model as the originator.

• Conflicting Commands or Data--Errors can arise both within a medium, such as (near) simultaneous conflicting commands, e.g., "turn left" vs. "turn
right", and between mediums, such as the perceived relative motion, from the observer's viewpoint, of trains in a station or jets on a carrier deck.

• Recipient's Attention--The recipient may be preoccupied with other tasks of which the communicator is unaware.
Impedance Associated with Contextual Factors. In addition to mental models and communication acts, several other factors can affect communication deleteriously: noise; stress, typically caused by task demands; physical state, such as fatigue; and emotions, such as frustration or fear. Of these categories, we hypothesize that model dissimilarities between (or among) communicators, and between communicators and the environment, are the major source of communication impedance, and they will serve as our major focus of attention throughout the study.
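The false-closure example above (the Mythical Man-Month appeal) can be made concrete with a toy calculation. A common simplification of Brooks's argument, with constants invented here for illustration, treats each worker as contributing one unit of effort while every pair of workers loses a fixed amount to coordination:

```python
def effective_effort(n_workers, pair_overhead=0.05):
    """Toy Brooks's-law model (constants are illustrative, not
    empirical): each worker contributes one unit of effort per month,
    but every pair of workers loses `pair_overhead` units per month
    to coordination."""
    pairs = n_workers * (n_workers - 1) / 2
    return n_workers - pair_overhead * pairs

# Locally plausible inference: doubling the team should halve the time.
# Globally, pairwise coordination overhead erodes most of the gain.
five_person_team = effective_effort(5)
ten_person_team = effective_effort(10)
```

Under these assumptions a ten-person team delivers well under twice the effective effort of a five-person team, which is exactly the gap between the locally sensible step and its global effect.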
[Figure 1: Impact of Model Dissimilarity on Communication. The plot shows how a fixed communication time-bandwidth limit is allocated between substantive communication and meta-communication as model similarity varies, with labelled operating points A through E.]

Because of its importance as a primary source, communication impedance associated with mental models is explored in more detail. Figure 1 represents the relationship between mental model dissimilarity and communication bandwidth. The term "communication bandwidth" is used loosely, for lack of a good definition of model bandwidth, which is itself a subject of our research. Note that communication is divided into two categories competing for the available bandwidth: substantive communications, which convey conditions associated with the model, and meta-communications, which convey data related to the communication process itself. In the case of mental model dissimilarity, meta-communications are intended to bring the communicators' models into sufficient alignment to permit meaningful substantive communications to proceed. Note also in Figure 1 that at some point mental model dissimilarity is so great (D, E) that it would appear impossible to communicate, since channel bandwidth is insufficient to support the necessary meta-communication. When this occurs, substantial model building through "off-line" meta-communication is required before meaningful dialogue
can occur. A famous, absolutely paradigmatic, example of this condition is the Abbott and Costello comedy routine "Who's on First?", in which one side believes it is engaged in substantive communication while the other believes it is engaged in a mixture of substantive and meta-communication. Sometimes communication tends to follow the closed cycle shown in Figure 1. Some model disparity between communicators exists at point "A" in a dialogue. As the dialogue progresses, the models held by each communicator are revealed to be farther apart in actuality, but the disparity is unknown to the parties; the dialogue tends to continue until, at some point (B), one of the parties recognizes the model discrepancy and moves the dialogue to the meta level (C). The dialogue stays there in an attempt to bring the models into sync and, if successful, resumes at point "A". This method of correcting the problem depends on many capabilities, but foremost is the ability to understand the current situation, to determine the interpretations of the situation held by all communicators, and to know whether the situation and/or interpretations are changing, and why. This is one of the essential aspects of the concepts of situational awareness and assessment. An example of this process is the game of Bridge. Here, partners have access to visual data with which to construct models of the world, but initially they have no knowledge of two critical pieces: their opponents' and their partner's hands. The bidding process is designed to provide sufficient data for each of the four players to construct models of the other hands. But these models are at best flawed, because the bidding signals and the number of available bids are insufficient to construct fully correct models. To help overcome this problem, bidders set up elaborate signals, or conventions, for the transmission of data between them. These signals serve two purposes. First, they convey more data per bid than the normal bidding conventions convey and, second, they are intended to confuse opponents. This process of setting up conventions is a form of meta-communication between the partners. Sometimes a player will get a new partner and simply presume the partner follows the same conventions as he/she does. It is only after some substantive bidding communication that one of them may realize they are using different conventions or models of bidding. Thus, the substantive communication has produced nothing but disinformation in the minds of all players until this is recognized and corrected through off-line or real-time meta-communication. This process works at the human level because the communicators are both intelligent and able to modify their mental models on the fly. In man-machine cooperation the problems are different but just as real and serious. In a machine with fixed assets, a copier for instance, the machine's models of the world and of the user are limited to those built in by the designer; any coping strategies the machine possesses for off-nominal users must also be built in by the designer. If either of those models is incorrect for a specific user, or the user's model of the copier is not correct (naive, incomplete, or wrong), the communication between man and machine will probably be difficult--an experience many of us have had. As the development of enabling technologies for situated computing progresses, it is inevitable that machines will have sensors and effectors they can use to sample the information required to detect and accommodate unanticipated events. This will have to be the case for most military systems. But do we have the knowledge and tools to create a solution that will permit us to build systems with these attributes? Not yet.
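The closed dialogue cycle described above (substantive dialogue at "A", discrepancy recognized at "B", repair at the meta level "C") can be sketched as a three-state transition function. The state names and arguments are our own illustrative shorthand, not a formalism from the paper:

```python
from enum import Enum

class Phase(Enum):
    SUBSTANTIVE = "A"   # normal, substantive dialogue
    DISCREPANCY = "B"   # a party recognizes a model mismatch
    META = "C"          # meta-communication to realign models

def step(phase, models_agree, repair_done=False):
    """One transition of the dialogue cycle (illustrative only)."""
    if phase is Phase.SUBSTANTIVE:
        # Dialogue proceeds until a discrepancy is noticed.
        return Phase.SUBSTANTIVE if models_agree else Phase.DISCREPANCY
    if phase is Phase.DISCREPANCY:
        # Recognition forces the dialogue to the meta level.
        return Phase.META
    # META: stay until repair brings the models back into sync,
    # then resume substantive dialogue at "A".
    return Phase.SUBSTANTIVE if repair_done else Phase.META
```

The sketch captures the paper's key point: substantive communication is suspended, not merely degraded, while the meta-level repair runs.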
A CUT AT A SOLUTION

We believe that successful approaches to gaining wide acceptance for Cognitive Technology will have to be based on a combination of empirical studies and intelligent design environments for systems designers. Briefly, we are following a two-phase strategy in our efforts to develop an engineering methodology based on the Theory of Communication Impedance:

Phase I: Study and build a knowledge base about how people detect (or why they fail to detect) and correct communication errors resulting from sources of impedance and, from this, develop a Theory of Communication Impedance.

Phase II: Design, construct, and test tools, including workbenches, to assist designers of intelligent machines in creating the abilities to identify and overcome errors caused by communication impedance.
We plan to gather data about mental models by developing test programs that emphasize the importance of models in communication. We also intend to include studies of the human's ability to 1) correctly perceive models of the environment and 2) correctly identify machines' implicit and explicit models of tasks, both very important issues in military and civilian computing infrastructures. Our development of a Theory of Communication Impedance will continue to codify relationships that enable outcome prediction for a given set of circumstances (e.g., a specific task domain, such as Air Traffic Control). In particular, our early priorities focus on:

1. the sensitivity of the probability and types of miscommunication to model dissimilarities, mismatched inference rules, mismatched referents, and mismatched semantics for different model structures,
2. the sensitivity of communication effectiveness to communication acts,
3. the sensitivity of communication effectiveness to contextual factors.

As part of this activity, we are identifying categories of mental models and determining how and when use of these models is appropriate, as well as when inappropriate use can lead to communication impedance. We intend to identify classes of severity of impedance incidents and relate them to how humans identify and repair them. For example, one member of an air traffic control team may have an incorrect mental model of the position of an aircraft that a colleague is about to hand off. This may lead the receiving controller to incorrectly issue flight directives to another aircraft in preparation for accepting the hand-off craft. If one of the other controllers, or the pilot, does not detect the error, an accident may occur. As we indicated before, our ultimate goal is to develop an instantiation of Cognitive Technology that will assist designers of intelligent machines in creating the abilities to identify and overcome errors caused by communication impedance.
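The air traffic control hand-off example can be rendered as a minimal model-comparison check. Everything here (class names, fields, tolerances) is hypothetical, invented to show how mismatched mental model contents and mismatched referents might be detected before a hand-off proceeds:

```python
from dataclasses import dataclass

@dataclass
class AircraftModel:
    """One controller's belief about an aircraft's state."""
    flight: str
    position_nm: tuple  # (x, y) in nautical miles, illustrative frame
    altitude_ft: int

def handoff_impedance(sender, receiver,
                      max_pos_error_nm=1.0, max_alt_error_ft=300):
    """Compare two controllers' models of the same aircraft and
    return a list of discrepancies that need meta-communication
    before the hand-off. Tolerances are invented for illustration."""
    issues = []
    if sender.flight != receiver.flight:
        issues.append("mismatched referents: flight identifiers differ")
    dx = sender.position_nm[0] - receiver.position_nm[0]
    dy = sender.position_nm[1] - receiver.position_nm[1]
    if (dx * dx + dy * dy) ** 0.5 > max_pos_error_nm:
        issues.append("mismatched contents: position")
    if abs(sender.altitude_ft - receiver.altitude_ft) > max_alt_error_ft:
        issues.append("mismatched contents: altitude")
    return issues
```

An empty result means the models agree well enough for substantive communication; a non-empty result is a cue to move the dialogue to the meta level before issuing directives.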
For example, the Navy, in its conceptualization of the Surface Combatant for the 21st century, will likely need to perform studies of the tradeoffs incurred by partitioning functionality among hardware, software, and humans. A robust Theory of Communication Impedance would materially contribute to this activity.
A rough view of the kind of intelligent design environment we propose to construct is shown in Figure 2. As illustrated, the Communication Impedance Workbench (CIWB) consists of several types of knowledge components. It contains knowledge bases for encoding data about types of communication impedance, types of knowledge structures used by humans in problem solving tasks, and task types instantiated for a specific domain, such as Air Traffic Control. The CIWB allows the human system designer to generate a system design along with a specification of the tasks to be performed and of the knowledge structures and problem solving strategies used to perform them. The CIWB is then used to perform a qualitative simulation of the proposed design with the specific tasks and knowledge structures, and it identifies potential sources of communication impedance as the simulation unfolds. The design search space heuristics are used to home in on a system design that minimizes communication impedance, either by eliminating its sources or by providing mechanisms for handling it when it arises. To ensure that the results of the proposed effort reach as many people as possible in the human-machine interface business, we propose, ultimately, to develop the CIWB to assist designers of man-machine systems in identifying potential sources of communication impedance for their intended applications and in replicating their effects on various communication tasks. Given an understanding of particular types of problems, designers may select preconfigured diagnostic strategies for the impedances, specify or select corrective strategies, and then simulate the effects of the strategies on proposed system designs. Parameterized
functional code modules for each of the diagnostic and corrective strategies would be included in the CIWB. These modules could then be selected and included in working, delivered systems. In essence, we intend to build part of the infrastructure that may enable significant aspects of Cognitive Technology Engineering.

[Figure 2: General Form of the Communication Impedance Workbench (CIWB). The workbench combines domain knowledge (tasks and knowledge structures), design instantiation rules, types of communication impedance, and design search space control rules to produce an instantiated system design.]

CONCLUDING REMARK

The discipline of Cognitive Technology will need system development tools if it is to become more than a theoretical curiosity or a book of design heuristics, such as those developed by many government agencies as guidelines for the form and content of Graphical User Interfaces. We suggest that our work on developing a Theory of Communication Impedance may provide a useful forcing function for the creation of a practical Cognitive Technology.

REFERENCES

Littman, David C., B. Gadget and D. Antic, 1994. The dot-loop architecture: A virtual reality based system for aircraft design, operation, and training. Proceedings of the International Conference on Aviation Systems, Anaheim, CA, September 1994.

Littman, David C., 1991. Seamless knowledge-based design environments. Proceedings of the AAAI-91 Workshop on Automated Software Design. USC/Information Sciences Institute Technical Report RS-91-287, Marina del Rey, CA.

Littman, David C., 1989. Constructing expert systems as building mental models, or: Toward a cognitive ontology for expert systems. In: K. Morik, ed., Knowledge Representation and Knowledge Organisation in Machine Learning, 88-106. Berlin: Springer-Verlag.
EDUCATION
Cognitive Technology: In Search of a Humane Interface B. Gorayska and J.L. Mey (Editors) © 1996 Elsevier Science B.V. All rights reserved.
Chapter 13

TECHNOLOGY AND THE STRUCTURE OF TERTIARY EDUCATION INSTITUTIONS

Kevin Cox
Department of Computer Science
City University of Hong Kong
[email protected]

ABSTRACT

The direct interaction of computers with humans is of relevance to cognitive technology. However, the indirect influence on human cognitive development, through technology changing social structures and social institutions, can be as great or even greater. Education is one such area. Humans develop by interacting with their environment and by reflecting on the nature of that interaction. In its broadest sense, education is continuous and is the acquisition of knowledge, wisdom and skills. It has become an industry because of society's need to efficiently certify, ration and control the distribution of human knowledge and skills. The acquisition of skills, knowledge and wisdom does not require the industrial structure of the current education industry, and these structures may even inhibit an individual's development. It is argued that new technology permits the reorganisation of tertiary institutions. The close interlinking of the delivery of education with the certification of individuals is no longer mandated by economics, and changes are possible in the physical location and methods involved in the delivery of education. This paper explores some possible changes in the education industry and their likely effects on individuals.

THE ROLE OF TERTIARY EDUCATION
The tertiary education industry is an expensive but vital component of all industrial societies. Like all industries, it is subject to change in the face of new technologies and, in particular, of changes in the economics of the industry. Here I explore a likely scenario derived from the economic and technological imperatives driving the evolution of education systems. It is not claimed that this scenario will be the only structure, but it is claimed that much tertiary education is likely to evolve in these directions over the next two to three decades. If new technologies have significant cost advantages in providing certain services, then it is almost inevitable that these changes will happen. It is claimed that many of the services tertiary institutions provide can be delivered in more economic and more effective ways without compromising the quality of the education and training of the students.
While we hope that education is broad and enriching, the major reason why society tolerates and supports tertiary education is to support the economic activities of society. The major tasks of universities in the past (Hernes, 1993) have been to transmit learning and knowledge. Students in tertiary education still spend most of their time preparing for future careers and work. Except for a fortunate few, education is not an end in itself but is undertaken so that students can better perform tasks within the broader society. This goal is important, and the increasing demands of society for more accountability, which institutions throughout the world are experiencing, will ensure that tertiary institutions continue in this role and continue mainly to educate students for work in industry in its broadest terms. However, the great research universities, particularly of Germany and the USA, have added to the fundamental role of transmitting knowledge the further roles of generating knowledge and applying it to society. Universities are principally ranked by their research, not by the transmission of learning and knowledge. Evidence for this is the way league tables of universities in the UK, Australia and the USA are compiled and scored. At the level of individual staff members, there are strong correlations between promotion and research output (Over, 1993) and negative correlations between perceived teaching commitment and promotion. Research now and in the future will continue to drive much of the development of tertiary institutions. Opposed to this trend are the increasing demand for tertiary education to become more accountable to students and society, and the need to improve the quality of the knowledge transmission process. This paper will not discuss the conflict and reciprocity between these roles but will concentrate on the transmission of knowledge and the marketing and selling of that service.
THE STRUCTURE OF TERTIARY INSTITUTIONS

Given that an important role of tertiary institutions is to educate students so that they can perform professional work, and given that institutions will increasingly be judged by their ability to achieve this task, we need to examine how this can best be done. As part of this role, tertiary institutions certify that students are competent at certain activities, have received an education, and have benefited from it. How to achieve this is a major preoccupation of most institutions. Here I examine trends and indicators of how these activities are likely to change due to economics and new technologies. First we must examine the way education is currently structured and delivered. In almost all institutions there are courses of study. These courses have specific occupational targets: there are courses for nurses, courses for programmers, courses for engineers and courses for teachers. Courses are broken into modules for instruction. Modules cover some part of the body of knowledge required for the overall course of instruction. Modules themselves can vary, but they typically have a standard length and are almost all identical in structure. A module consists of a body of knowledge packaged together with the higher level skills required to apply that knowledge. Thus students are meant not only to know things but to be able to apply the knowledge and to see its relevance in a broader context. Module descriptions are standardised and consist of statements of aims, objectives, syllabuses and evaluation methods. A course of study puts together
modules of knowledge and techniques to form a coherent whole which prepares students for some future job, whether as a researcher, dentist, teacher, manager or technical analyst. Students study these modules, and a certification is given that they have achieved the objectives of the modules. Typically, modules are delivered in a way that is a close analogy to a factory with batch processing. A pertinent analogy is the production of beer in an old-style brewery. Here a batch of beer is brewed in large vats and allowed to ferment for a given period of time. When the brew is finished, the whole batch is bottled. In our analogy, each final bottle of beer is equivalent to a student completing a module. The analogy operates at the level of the structure of the course and the progression of students through a course. In structure there are striking similarities. All students get the same module at the same time. Students are handled in batches. This is quite different from an assembly line or on-demand manufacturing. Batch processing in manufacturing is an old technique, normally used only when the economics and technology do not allow more continuous and flexible processing. It is contended that this basic model of courses consisting of modules of knowledge will remain. What will change is the way knowledge is presented to students, and the transition from batch processing of students towards a more flexible structure (Darby, 1994). The current education systems are characterised by the need for efficient assessment of students through examinations, by the close coupling of assessment with instruction, and by the delivery of knowledge through lectures and tutorials. The current system is a cost-effective educational approach as it allows economies of scale and processing. Most institutions know that their courses have to have many students. Most institutions know that they have to give examinations at the same time.
Most institutions know that they have to divide knowledge into fixed-length modules so that all knowledge is compartmentalised into fixed-size chunks. Courses with few students and courses with different modes of assessment are rare and expensive to operate. The main reasons for this restrictive and constrained system are economic and technological, not educational.

ANOTHER POSSIBLE STRUCTURE FOR THE DELIVERY OF MODULES
Using the manufacturing analogy, another model for the structuring of education is a continuous process with just-in-time delivery of instruction (Perelman, 1993). It must be stressed again that this is an analogy for structure, not process. We should not take the analogy too far and imply that students are standardised, identical components; it is the structure and organisation of their courses that are standardised. What this paper suggests is that technology may be able to make this process more flexible and attuned both to the students and to the subject matter. In a just-in-time system, a student would do a module at any convenient time, at their own rate. A course would still consist of a set of modules, but these modules would not have to be of fixed size or fixed format, and students would not be required to do modules in lockstep. From the point of view of the student there are obvious advantages. Students can fit the course more to their own needs and abilities. Students who need more time could take more time. Students who wished to work part-time could work part-time. From the point of view of the institution this could also work well. There would no longer be surges of activity taking place at short intervals. Examination processing,
student enrolments, demands for lecture space, demands for library resources, and demands for computing resources could be better controlled and managed. Students would enter the system continuously, and the number of students admitted would be matched to the resources available. Courses and modules could change continuously, with more rapid adjustment to problems of quality and relevance. Why do we continue to use the old batch processing model? Why does almost every tertiary institution in the world have groups of students moving as one through the system? The reasons are economic. It is too expensive to put on lectures on demand for an individual student. It is too expensive to set up an examination for a single student when we can service thousands with little more effort. The information systems in our institutions are not capable of handling the new organisation. The existing structures, both physical and organisational, are set up to handle batches and cannot cope with continuous education. It is the contention here that the flexible manufacturing model is now technically and economically possible for tertiary education. Moreover, it is contended that the imperatives of economics and the market for this structure are such that, given a free market, it will become the dominant structure in the future. I now examine indicators of this approach, consider objections to it, discuss possible organisational implications and finally outline the possible cognitive effects on students and staff.

CURRENT INDICATORS FOR FLEXIBLE EDUCATION

There is an increasing interest in the distance education model as a way of providing education for large numbers of people. The English Open University has 50,000 students; Australia has just started an equivalent university structure; the Open Learning Institute in Hong Kong has over 10,000 active students; and China is said to have 2,000,000 distance education students.
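The structural contrast between batch intake and continuous intake described above can be sketched as a toy admission policy. All parameters (applicant counts, cohort sizes, intake dates) are invented for illustration:

```python
def batch_intake(applicants_per_month, cohort_size, intakes_per_year=2):
    """Batch model: students wait for the next fixed intake date and
    admissions are capped by cohort size. Returns (admitted per year,
    average wait in months for an intake date)."""
    # On average an applicant waits half the interval between intakes.
    avg_wait_months = 12 / intakes_per_year / 2
    admitted = min(sum(applicants_per_month),
                   cohort_size * intakes_per_year)
    return admitted, avg_wait_months

def continuous_intake(applicants_per_month, capacity_per_month):
    """Continuous model: admit month by month, whenever resources
    allow, with no structural waiting for an intake date."""
    admitted = sum(min(a, capacity_per_month)
                   for a in applicants_per_month)
    return admitted, 0.0
```

With a steady stream of applicants, the continuous model admits as many students as monthly capacity allows and eliminates the built-in waiting period, which is the institutional smoothing effect the text describes.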
As well as these formal courses, a major growth industry is the provision of training courses for professionals and of in-house training in organisations. Such training courses are sometimes run on a just-in-time basis. Most degrees by thesis are conducted flexibly, with students entering the programs at different times. The distance education model is, however, only a step along the way to flexible education. It does not rely on the lecture/tutorial model, but most of these systems are still structurally organised with batches of students entering in year groups and with examinations and assessment happening at fixed periods. Distance education is still fundamentally a batch system; the method of delivering instruction is the main innovation. Even here, most delivery is still a translation of the old lecture format to paper or video. Technology is not yet used in the best way, but there are signs that this is changing (Laurillard, 1994). It is difficult to generalise on costs and to compare different institutions, but distance education does seem to have a significant cost advantage. Muta and Saito (1994) show that the direct costs of the University of the Air of Japan are about half those of an equivalent conventional campus-based program, and that the economics improve as the system becomes larger. The Open Learning Institute of Hong Kong is similarly less expensive than any of the Universities or Vocational Training Institutes in Hong Kong. Another trend is the franchising of education services (Yorke, 1993). Here education institutions franchise their teaching methods, their name and their practices
Technology Tertiary Education
and sell education away from their campuses. This form of trade in education services seems to be a growing phenomenon. Society is increasingly demanding accountability of education. A major way in which this is being realised is the demand for performance-based or criterion-referenced assessment (Glaser, 1963). The demands in England and the USA for national testing of the educational attainments of school children are another striking example. This form of accountability is an attempt by society to know that education teaches students to do things which they could not do before, such as reading, writing and arithmetic. This demand for criterion-based assessment is different from the norm-based assessment which is the current practice in almost all universities and which is made easy by the batching of students. Together, these trends (the cheaper delivery of education via distance education techniques, the franchising of education services, society's demands for accountability and criterion-based assessment, and the increasing debate over the roles of research and teaching) could lead to the development of universities with a different structure, made possible by computer and communications technologies.

THE NEW STRUCTURE FOR A COURSE OF STUDY

For expository purposes, let us imagine a possible course structure based upon these ideas. In this structure students are enrolled continuously in courses and modules. A student can start a course or a module at any time. Students will study in a variety of ways, but most information will be delivered in a "distance education" mode, where progression is based on work previously done and assessment is continuous and criterion referenced. Students take a periodic controlled assessment when they are satisfied with their progress and when they have finished the required material in a module. A course structured this way is similar to current courses, and so it is likely to be accepted by staff and students.
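As a minimal sketch of the progression mechanics just described (continuous enrolment, criterion-referenced progress, assessment on demand), the following Python fragment may help. The class names, the 80% mastery criterion and the sample module are illustrative assumptions, not part of the proposal itself.

```python
from dataclasses import dataclass, field

MASTERY_THRESHOLD = 0.8  # assumed criterion: 80% of objectives demonstrated

@dataclass
class Module:
    name: str
    objectives: list

@dataclass
class Enrolment:
    student: str
    module: Module
    met: set = field(default_factory=set)

    def record_objective(self, objective: str) -> None:
        # Continuous, criterion-referenced assessment: each objective is
        # judged against the criterion, never against other students.
        if objective in self.module.objectives:
            self.met.add(objective)

    def ready_for_assessment(self) -> bool:
        # The periodic controlled assessment is taken only once enough
        # of the module's objectives have been demonstrated.
        return len(self.met) / len(self.module.objectives) >= MASTERY_THRESHOLD

# A student may start at any time; no batch or cohort is involved.
m = Module("Statistics 1", ["describe data", "estimate", "test", "report"])
e = Enrolment("student-001", m)
for done in ["describe data", "estimate", "test"]:
    e.record_objective(done)
print(e.ready_for_assessment())  # False: 3 of 4 objectives, below the criterion
```

Note that nothing in this sketch compares one student with another; grades fall out of criteria met, which is the point of the structure.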
Students will find it attractive because of the flexibility. Employers will find it attractive because the assessment is criterion referenced. Administrators will like the regularity of the process and the evening out of peaks of demand on facilities. Staff will like the system because it gives them great flexibility in how they deliver instruction, and because of the possibility of more individual contact with students.

THE NEW UNIVERSITY

One of the reasons why this model of a course is almost inevitable is that it gives a great competitive advantage to the well-known, prestigious universities. The new university structure will consist of distinct parts. There will be the undergraduate and industry instructional components, and there will be the research and graduate research component. Already this structure is evident in many institutions with distinct graduate schools. The graduate research schools will remain much the same. Their purpose is to gain prestige and status for the university and so assist the selling of the undergraduate instruction. The major structural difference will be the franchising of the instructional courses and the removal of the need for batching of students.
K. Cox
Once the system of instruction is well organised and established, there is no reason why it needs to be physically restricted to the local environment. Just as franchising has been the most successful marketing phenomenon of the 1980s and 1990s, the same is likely in education. A second-tier university in England is more likely to thrive and prosper running and promoting MIT courses than running and promoting its own. It can attract better students from a wider community, and it can almost certainly provide the courses at lower cost. The franchising universities' role is to assure quality. This can be done through a rigorous, controlled examination regime. The benefits to the franchising university are the large sums of money that will be generated, to be used to support the prestige-enhancing research and also to develop and construct better instructional modules. The models for marketing and delivery are already with us. It only requires one of the major universities to start the process and others will follow. If this happens, it will occur quickly because of the economic and marketing imperatives. There will be no need for new buildings, no need for massive investment in infrastructure; it only requires reorganisation. The education world of the early part of the next century will then become similar to other industries. It will be dominated by multi-national universities with good brand names.

OBJECTIONS TO THE APPROACH
There is a major political problem with this vision. Education is a sensitive cultural issue, and this is one of the reasons why education is heavily subsidised by governments. There are strong cultural reasons against the Coca-Colonisation of universities, but there are ways around the brand name and perception problems. Although the brand names may be modified, the underlying structure is still likely, with appropriate relabelling of courses and local variations added to satisfy the politics. Staff in institutions will almost certainly oppose the changes. Strong unions will attempt to preserve the status quo. Some countries will resist more than others but, because the system makes more efficient use of resources, staff can share in the benefits through increased salaries and benefits. Opposition attenuates when material benefits are given to opponents. The educational objections are similar to those levelled against distance education, particularly when the education is provided across country boundaries (Woodhouse and Craft, 1993). However, local franchising, with good electronic communications between staff and students, may counter many of these objections. If student-to-student contact is desirable, then there is no reason why students doing the same course cannot be organised in small groups. Even though the model supposes a continuous flow of students through the system, there will still be several doing the same work at the same time, and these students can be organised in groups. If it is desirable to have face-to-face meetings rather than electronic video conferencing, then appropriate residential courses can be offered.

IMPLICATIONS FOR STUDENTS

One of the greatest benefits for students from this approach will be the relegation of norm-based assessment to relative insignificance (Winter, 1993). Individuals will be
judged on their performance rather than on their performance relative to others. Performance relative to others can only be measured in a batch situation, or when material remains static and courses are not based on mastery principles. There will still be the opportunity for excellence to be demonstrated, but it is no longer necessary to use student comparisons and ranking as the basis for grades. Although this aspect has not been addressed explicitly in this paper, technology does offer the chance to move away from the teacher 'telling' model and towards the student 'experiencing' model of education (Laurillard, 1993). The structural model envisaged here requires the development of modules of this nature; the education system will not work if the material simply transplants the classroom lecturing mode of delivery. The technology allows students a choice of materials and a choice in the way they receive instruction, and they will be able to choose an appropriate method for their style of learning. This form of learning is truly student based, with students learning by doing and by interacting with the learning environment presented via the computer. There are many educational advantages to this process. Students get much more immediate feedback, the simulated study environment can be made much closer to reality, and students interact and react instead of passively attempting to absorb knowledge. It is difficult and expensive to construct good modules with these desirable characteristics, but as the market for modules expands it becomes worthwhile to finance them. On the negative side there are cognitive disadvantages. As the simulations become closer to reality, the student is liable to believe that the simulation is the reality. There is little concern that a student will mistake a lecture for reality; in the real situation students will make the necessary transformations to put the lessons into practice.
If a simulation becomes too close to reality, however, making the necessary transformations from simulation to reality may become more difficult. When there are large cognitive distances between situations, we can reasonably hypothesise that there are few problems in recognition; but as the cognitive distances become smaller, the problems of recognising reality become more difficult. An important part of the student experience is the interaction with other students. The current campus-based education systems mean that students are thrown together and serendipity rules. Groups and interactions arise naturally. The same phenomena can occur, perhaps more intensely, in periodic gatherings of students. Electronic meetings and connections will also occur and are an additional avenue for communication. The electronic infrastructure that affords the delivery of instruction also gives a communication channel to both instructors and other students, and it is inevitable that this will be used. Technology can add another channel of communication for people and free them from the tyrannies of distance and time. On the negative side, people can perhaps have too much computer-based interaction. There is a finite amount of time available to each individual. If this time is increasingly occupied with some of the new forms of interaction, there will be less time for other forms, such as conversation. If we spend all our time answering our e-mail, then we have less time available to talk to our friends and to discuss issues. If people spend increasing amounts of time interacting with the world through computer screens, then it must change the way they think. As the number of hours spent in front of screens increases, the screen becomes our perception of what is real, and "reality" becomes
secondary and somehow less "real" because it is less common. Already students are spending ten or more hours per week in front of computer screens (Cox, 1994). When the changes suggested here occur, we can expect screen interaction, along with TV watching, to occupy the major part of each day, with unknown and possibly unpredictable side effects. Work and education are likely to become intermixed. Education now occurs as an activity separate from work. This is primarily a structural problem. The new model allows work and education to coexist more easily, and there will be an increase in the proportion of students who study and work at the same time. The structural impediments that make this difficult and inconvenient are removed. Today's tertiary educated have been socialised and formed through the shared experience of campus life. This life is different from both school and a normal working environment, and it has an important influence on one's world view. If this stage is removed from many people's lives, what effect will it have? Again the implications are unknown, except that we know they will be significant and will change the way people view the world.

IMPLICATIONS FOR STAFF

It is likely that university staff will become even more polarised into researchers and those who deliver education. The delivery of education will remain the main source of income for institutions, and demand will be related to prestige and brand names. Because of this, research will be encouraged to add to the prestige of the institution, as will direct links with industry. Links with industry not only give prestige and funds but also provide another source of students. Research teaching will concentrate on post-graduate teaching. This is likely to remain similar to the current system, with close interaction between research students and staff. Research schools can also gain funds by producing teaching material from their research to be included in the module offerings.
The development of education material will become a specialised area demanding many of the same skills now required of movie and software producers. Modules will be developed in the same way that movies are now produced, with teams of specialists coming together to create a product. With the virtual disappearance of most lectures and formal tutorials, the teachers who now interact with students will require considerable social and communication skills in dealing with individual students. The performance qualities that characterise many good lecturers will be less valuable than social and helping skills. Because less time is spent in lecture preparation and delivery, there is more time available for staff to interact with students. It is likely, though, that the prestige and subject knowledge of these staff will decrease, and staff directly involved in student interaction are likely to receive lower salaries; this in turn will permit more such staff to be employed, and hence more interaction. Quality assurance and the setting and testing of students will become a major activity. Large numbers of staff will be involved in arranging for continuous examinations and testing of students. This too will become a specialised area. A worry of staff is that as the delivery of education becomes more effective and as technology replaces people for some activities, the need and demand for teachers will diminish and teaching jobs will disappear. Fortunately economic history suggests the opposite (Economist, 1995). As the effectiveness of teaching improves, so the demand
for the product will increase. More people, more tasks and more training and education are the likely result. Particular jobs may change, but the net effect is that the total number of jobs increases in the sector in which efficiencies occur. While the number of telephone operators has dropped, the total number of people in the telecommunications industry has increased dramatically. Predictions in the USA (Bureau of Labor Statistics) say that the number of teachers will increase over the next few years. As the delivery of information becomes more efficient, the gains in efficiency are likely to be balanced by more researchers, more developers of instruction and more people to interact with students. The lessons of history also suggest that those who try to stop the inevitable are the ones who suffer the most. Governments that try to regulate industries to save jobs tend to lose jobs rather than create new ones.

IMPLICATIONS FOR ORGANISATIONS

Universities will become profit-making organisations and be listed on the stock markets of the world. The successful ones will be those that deliver quality education. Because brand identification is crucial to success, universities will ensure that the quality of education is maintained. Research for research's sake will flourish, because it is in the interests of universities to maintain their image so that they can sell their education services. Boutique universities will still exist, and centres of particular excellence will continue to thrive. Individuals and groups of talented people will still be able to produce and market excellent courses, but there will be a tendency for large organisations to dominate the market. The lesson from the movie industry is that most of the products come from the large organisations, but small groups tend to innovate and create new and exciting products.
In fact, the incentive for a small group to develop and create good courses will increase, because the rewards for success will become greater. The software industry is another example where individuals and small companies become stars and important players. There is a danger of drab uniformity in the delivery of instruction. The solution to this problem is to remove artificial barriers to competition and innovation, such as government regulation. Countries that fear cultural imperialism can counter it by fostering and supporting their own industries through appropriate incentives, and by making sure that other countries do not impose restrictive barriers to the introduction of their products. Giving people the freedom to choose and allowing variety to flourish is the best guarantee of innovation.

CONCLUSION

Computing and communication technologies allow instruction to be delivered to students in different ways. This can be made to give a high quality of student experience and to facilitate learning. That this is possible does not mean it is inevitable. However, the technology does integrate and fit with other social and economic trends. This combination of economic imperatives, together with the ability to organise along the lines of other successful industrial entities, means that the institutions of tertiary education are likely to evolve in these directions. The twenty-first century will see little difference between the internal structure of a large vehicle production company and that of a large university. Courses of instruction will be developed
and marketed throughout the world. Tertiary education will employ even more people as the value of its products becomes apparent. More education will be available to more people, and the demand for the product will continue to increase. There will be an increase in machine-mediated interaction in education, with unknown and unpredictable effects.

REFERENCES
Cox, Kevin, 1994. Computers in tertiary education. Proceedings Apitite 94: 939-944.
Darby, Jonathon, 1994. A vision of higher education in the Year 2000. Proceedings Apitite 94: 15-18.
Economist, 1995. Technology and unemployment. February 11th 1995: 19-21.
Glaser, Robert, 1963. Instructional technology and the measurement of learning outcomes: some questions. American Psychologist 18: 519-521.
Hernes, Gudmund, 1993. Images of institutions of higher education. Higher Education Management 5(3): 265-271.
Laurillard, Diana, 1993. Rethinking university teaching: A framework for the effective use of educational technology. London: Routledge.
Laurillard, Diana, 1994. Multimedia and the changing experience of the learner. Proceedings Apitite 94: 19-24.
Muta, Hiromitsu and Takahiro Saito, 1994. Comprehensive cost analysis of the University of the Air of Japan. Higher Education 28: 325-353.
Over, Ray, 1993. Correlates of career advancement in Australian universities. Higher Education 26: 313-329.
Perelman, Lewis J., 1993. School's out: Hyperlearning, the new technology, and the end of education. New York: Avon.
Winter, Richard, 1993. Education or grading? Arguments for a non-subdivided honours degree. Studies in Higher Education 18(3): 363-377.
Woodhouse, David and Alma Craft, 1993. Exporting distance education: an importer's view. Higher Education Management 5(3): 333-337.
Yorke, Mantz, 1993. Quality assurance for higher education franchising. Higher Education 26: 167-182.
Cognitive Technology: In Search of a Humane Interface
B. Gorayska and J.L. Mey (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 14

A CHINESE CHARACTER BASED TELECOMMUNICATION DEVICE FOR THE DEAF (TDD)

Orville Leverne Clubb & C. H. Lee
Department of Computer Science, City University of Hong Kong
[email protected]

ABSTRACT

The use of TDDs (Telecommunication Devices for the Deaf) is common in Western societies. This has not been the case in Asian countries using the Han script (Chinese characters), due mainly to the difficulty of mapping large character sets onto a reasonably sized device. Among the hearing impaired, the lack of suitable devices and methods also results in a lack of access to valuable services, such as emergency phone calling, which people with normal hearing take for granted. In this paper, we introduce our work on a Chinese language based TDD system. Among various possible input techniques, we selected one based on the stroke-sequence method because of its simple-to-learn and fast-convergence characteristics. Since mapping sequences of strokes to characters can be a complicated procedure, a microprocessor-based auxiliary device is being developed. A test is also set up so that the communication behavior of the deaf can be evaluated and fed back into the design of the system. It is assumed that the results, when more generally applied, will significantly benefit the deaf communities in Chinese speaking countries.
INTRODUCTION

In every society, about one person in a thousand can be expected to have a hearing loss severe enough for him or her to be classified as legally deaf. Due to the nature of their handicap, the hearing impaired are isolated from the community at large: they are able to move 'normally' among the hearing community, but as soon as verbal communication is required, their handicap is noticed. The hearing impaired in the United States and Europe have had, for a number of years, a TDD (Telecommunication Device for the Deaf) based on the Latin alphabet. In contrast, the hearing impaired of Hong Kong (being mostly Chinese) are presently an isolated subculture that cannot interface with the community at large. While the use of the TDD in many developed countries of the West has permitted the hearing impaired community to take advantage of services that the hearing community have
O.L. Clubb and C.H. Lee
taken for granted, the Chinese speaking community (both in Hong Kong and in the PRC, as well as in other places) does not have the same tools and infrastructure available that would allow the hearing impaired access to most telecommunication services, such as airline reservations and emergency numbers, as well as a means to communicate with friends and family. The hearing impaired of the United States and Europe have been able to make use of old teletype technology for communicating over standard telephone lines. The device has been streamlined over the years but still uses the baud code of the teletypes. Since the introduction of this technology, a large service infrastructure has been established for the hearing impaired in the West. It was quite easy for the Western hearing impaired to pick up the typing skills needed to use the teletype devices in a two-way interactive mode. In Hong Kong and some other Asian countries, the hearing impaired have not had the advantage of a TDD. Consequently, in large parts of Asia, the hearing impaired are isolated into a subculture with their sign language. This situation could be due to many reasons, among them cultural and legal ones. Equal rights legislation is now starting to be passed in places like Hong Kong; further legislation will have the objective of providing equal opportunity for the hearing impaired. As a result, community organizations will have to investigate how such infrastructure services can be provided. An example is the emergency number service "999", which was not available to the hearing impaired in Hong Kong until around 1994, and at present is only available through a fax facility. This means that the hearing impaired still do not have an interactive connection to the emergency service. A character based TDD would be required in order to enable the authorities to set up an interactive telecommunications service for the Chinese hearing impaired communities of Hong Kong and perhaps the PRC as well.
A project is under way at City University of Hong Kong to develop a Chinese character-based TDD. The idea of the City University's TDD project originated during IT Week 1991, part of which dealt with 'IT [Information Technology] for the Deaf'. Subsequently, the TDD project caught the attention of Hong Kong Telecom, who donated HK$200,000 to develop a prototype to run on an IBM PC. A PC based prototype has been developed and demonstrated. We will soon start trials by placing seven to ten prototypes in the homes of Hong Kong hearing impaired families.

SIGN LANGUAGE IN THE CHINESE COMMUNITIES

The sign language used in the hearing impaired communities of Hong Kong and other (including Western) territories and countries does not map directly onto the language as spoken or written by the hearing communities. Sign languages are like spoken languages in that they evolve into many varieties; consequently, there is no international sign language. Since sign languages do not map directly onto the spoken and written language, the writing systems of the hearing community (including those using Chinese characters) are second languages for the hearing impaired. This makes literacy more difficult to attain for the hearing impaired. The sign language users of Hong Kong have borrowed some Chinese characters from the written language into their sign language. They will 'sign' these borrowed characters by drawing them in the air. However, this technique has limited application,
Chinese TDD
and there still is no direct mapping from such 'signed' characters onto the grammatical structure (Fok and Bellugi, 1986:224).

THE CHINESE LANGUAGE

The Chinese language has many dialects. However, in the PRC and Taiwan there is one official spoken language, 'Putonghua' (called 'Mandarin' in Taiwan, and also in Singapore, where it is one of the four official languages). In Hong Kong, the majority of the people speak a Chinese dialect called Cantonese. Written Chinese was standardized in the Qin Dynasty, about 210 BC, but can be traced back some 3,000 years (Guo and Zhang, 1985:26). The Chinese writing system was based on the language of the major ethnic group, the Han. More recently, this writing system has branched into two major varieties: the 'conventional Chinese character set', used in Taiwan and Hong Kong (and promulgated as the official standard in Singapore), and the 'simplified Chinese character set', used in the PRC. Approximately 1,500 years ago, some neighboring peoples, such as the Japanese and Koreans, started borrowing the Chinese characters for writing their own languages. The Chinese writing system is ideographic, using a combination of pictograms, ideograms, and phonograms (Tang and Clubb, 1992:1-7); it uses one character per syllable. This works well with Chinese, since the morpheme is normally at the syllable level. In Japanese, however, it often takes several syllables to express the full meaning of a word, and for this reason the Chinese writing system does not map well onto the language. Since the Chinese system did not match the structure of their languages, both the Koreans and the Japanese devised supplementary syllabic alphabets, called 'hangul' in Korea and 'kana' in Japan. In both countries, however, the Chinese characters were retained, with occasional simplifications (which do not normally match the modern, simplified characters introduced since 1955 in the PRC).
While the Koreans have given up most of their Chinese characters (especially in the North), the Japanese 'kanji' are still widely used and form the basic requirement for written communication; in 1954, a subset of 2,000 so-called 'tooyoo kanji' (literally: 'kanji in normal use') was isolated as the minimum set to be mastered by everybody on leaving high school. For any normal purpose, however, such as reading newspapers and novels, knowledge of a minimum of 5,000 kanji is required (O'Neill, 1982:15-16). Around 100 AD, Xu Shen wrote a famous dictionary known as the 'Shuowen Jiezi'. In his dictionary, Xu Shen stated that there were six writing principles in forming Chinese characters:
1. Pictographic characters (pictures: sun, eye, mountain, etc.);
2. Symbolic characters (an abstract idea: one, above, below, etc.);
3. Ideographic characters (a character formed by combining several symbols: e.g. 'woman' and 'child' together give the meaning 'goodness', a 'woman' under a 'roof' means 'peace', etc.);
4. Ideophonetic characters (one portion of the character is a symbolic character or 'radical', giving it a general meaning; its phonetic counterpart indicates the pronunciation);
5. Transfigurative or extended characters (the character meaning 'music' is used to denote 'pleasure');
6. Borrowed usage (a few characters with no connected meaning, such as the numerals above the number 3).
The majority of Chinese characters (about 95%) are formed on the basis of the first four principles. One of the major drawbacks of the Chinese writing system is the number of characters. For example, the Chinese 'Great Dictionary' of 1915 AD listed 49,905 characters. However, it is claimed that if one knows 100 characters, one can recognize 40% of the content of a general article (though it is not said which 40%). To recognize 90% of the content takes at least 1,000 characters (though 'recognize' is not the same as 'understand') (Chen and Jin, 1984:13-14).

CHINESE COMPUTING INPUT SYSTEMS

The authors have developed two different types of TDD prototypes based on a personal computer. The first type uses a modem for data transmission and the E-Ten input method. The second type uses dial tones for data transmission and the Jie-jing input method. (Both of the above are keyboard input methods.) The reason the project has concentrated on the keyboard as the input device is that we are in a "keyboard culture" of computer input methods; since many of the hearing impaired cannot speak, even future technologies such as voice recognition would not be useful. These two input methodologies were chosen because a keyboard containing all the Chinese characters would be too bulky. Even so, we have a few options. There is the coding method based on a Pinyin transcription, using an alphabetic keyboard. However, this works only if the person using the system knows the Putonghua phonetic value of the character. Another approach uses the stroke order method. Chinese characters are written with strokes, following a prescribed order. Stroke input methods use between eight and ten primary strokes to generate characters.
This method is among those that the authors believe are worthy of further investigation in connection with developing a Chinese TDD system. For the City University project, we adopted a stroke order input system, the Jie-jing method, which was developed in Australia. The system uses statistical concepts to speed up input. When the first stroke of a character is entered, the five most frequently used characters that start with that stroke are displayed. The user may then select one of the displayed characters; alternatively, if the desired character is not offered, the next stroke of the original character is entered, and so on, until the desired character is displayed. Once a character has been selected, the first stroke of the next character is entered. Using linguistic techniques, the system tries to present the next likely character based on the meaning of the previously selected character. Even if the user only knows a few of the strokes making up a character, under the Jie-jing system he or she has a reasonable chance of generating the desired character. This situation is ideal for the hearing impaired, since most of them are not as skilled at writing as the population at large. It is felt that the aid provided by the Jie-jing system makes it an attractive option.
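The predictive selection loop just described amounts to a prefix lookup over a frequency-ranked stroke table. A minimal sketch follows; the five-character table, the digit codes for stroke classes, and the frequency weights are invented for illustration and are not the actual Jie-jing data (which is far larger and also uses context).

```python
# Toy table: each character is keyed by its stroke sequence (digits here
# stand for primary stroke classes) and an assumed frequency weight.
STROKES = {
    "一": ("1", 9500),
    "十": ("12", 8000),
    "人": ("34", 9000),
    "大": ("134", 7000),
    "天": ("1134", 6000),
}

def candidates(prefix: str, k: int = 5) -> list:
    """Return the k most frequent characters whose stroke sequence
    starts with the strokes entered so far."""
    matches = [(ch, w) for ch, (seq, w) in STROKES.items() if seq.startswith(prefix)]
    matches.sort(key=lambda pair: -pair[1])
    return [ch for ch, _ in matches[:k]]

# After the first stroke the five most frequent matches are shown; the
# user either selects one or keys the next stroke to narrow the list.
print(candidates("1"))   # ['一', '十', '大', '天']
print(candidates("11"))  # ['天']
```

Note how quickly the candidate list converges: even a partial stroke sequence usually leaves only a handful of plausible characters, which is why partial knowledge of a character's strokes is often enough.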
CHINESE TDD PROJECT DESIGN OBJECTIVES
One of our major objectives is to keep the system simple to use; the reasoning is that using the device should require a minimum amount of training. To improve ease of use, the primary stroke symbols are directly represented on the keypad. Another objective is to use as many standard components as possible. We believe that the necessary technology exists, and that the only problem is how to package it. The device could use the 12 keys of a touch tone telephone for inputting the message; the use of tones would eliminate the need for a modem, making the device simpler as well as less expensive. Also, since the device needs only to receive and transmit characters from a similar device, and would not have to interact with any other machine, it can be kept relatively simple. The authors want to keep the cost of the final production model under US$250, in order to keep the device affordable for the target market. As to the hardware components needed to build a Chinese TDD, the present prototype uses a processor (80286 or similar) to run the Jie-jing Chinese inputting program. In addition, about 1 megabyte of ROM would be required to store tables, bit maps for characters, and programs. The device would need about 64K of RAM for local variables used by the programs. A tone generator and tone decoder are used for dealing with the touch tone telephone system. Finally, an LCD screen would be needed that could display at least 25 characters in a message area, in addition to one line with sufficient space to display 20 characters in the character generation area.

TESTING STRATEGY

Since the Chinese character based writing system does not map directly onto the signing system of the deaf, some testing has to be done in order to find out what specific problems the hearing impaired encounter using the TDD device.
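The modem-free, tone-based transmission outlined in the design objectives above can be illustrated with a toy framing scheme: each character is sent as the decimal digits of its code point, keyed on the 12-key pad, with '#' as an end-of-character delimiter. This framing is purely our illustrative assumption, not the prototype's actual protocol.

```python
# '#' on the keypad acts as an end-of-character delimiter; each character
# is keyed as the decimal digits of its code point (an assumed scheme).
DELIM = "#"

def encode(message: str) -> str:
    """Turn a message into the digit/key sequence to be sent as tones."""
    return "".join(f"{ord(ch)}{DELIM}" for ch in message)

def decode(tones: str) -> str:
    """Rebuild the message from a received digit/key sequence."""
    codes = [c for c in tones.split(DELIM) if c]
    return "".join(chr(int(c)) for c in codes)

keyed = encode("你好")
print(keyed)          # 20320#22909#
print(decode(keyed))  # 你好
```

Because only digits and a delimiter key are needed, the whole exchange fits within the tones an ordinary touch tone telephone can already generate and decode, which is what removes the need for a modem.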
The stroke based input method was designed with the objective of minimal training requirements; indeed, tests have established (although not with the deaf) that some office workers with ordinary computing knowledge required no training at all to input simple characters. The testing is divided into two phases. In the first phase, the device is placed in four or five homes of hearing impaired adults who regularly communicate with each other. This phase will look at the software interface and general problems with the system, and will continue for about two to three months. The adults will evaluate the device and give feedback on improvements needed. The second phase is to place the device in the homes of hearing impaired secondary school students so that they can use the TDD on a casual basis. For testing purposes, there need to be a control group and a test group, picked at random from a pool of students of approximately the same level of ability in using written Chinese characters. The devices will be left in the homes of the test subjects for one school term. At the end of the term, the students' grades in Chinese will be evaluated to see if there has been any improvement, and the test group's grades as a whole will be compared to the control group's grades.
O.L. Clubb and C.H. Lee
The test subjects' conversations will be saved for analysis. (For ethical reasons, the users will have to be informed that all conversations may be analyzed, and their permission should be requested.) The analysis will first look for common mistakes in character construction. Second, grammatical errors will be identified. Since there is no direct mapping of characters to signing, patterns of grammatical mistakes may be expected to occur. If such patterns can be established, then teachers of Chinese, by looking for these patterns, can give remedial help to hearing impaired students. The authors feel that if the hearing impaired users have to communicate using the TDD, they will become more literate as a side effect. The more the test subjects are motivated to communicate with others by telecommunication, the more they will use the TDD; the more they use the TDD, the more they will practice using the written language; and the more practice they have with the written language, the better they will become at using it.

SERVICES PROVIDED BY A CHINESE CHARACTER TDD
At present, the hearing impaired people of the Chinese speaking communities, when making or answering telephone calls, must rely on other people to interpret for them using sign language. For a hearing impaired person, using the telephone means having a person with "normal hearing" present to act as an interpreter. This results in a loss of privacy for the hearing impaired person; in addition, such a service may be difficult to arrange, particularly if two hearing impaired people wish to communicate via a telephone line. Our Chinese character TDD will directly address this problem. Also, many other services that people in the hearing community take for granted, such as the use of emergency numbers for police and fire services, would become available to the hearing impaired. Other, less critical services, such as making an airline reservation, would also come within reach of the Chinese hearing impaired community.

POSSIBLE ADDITIONAL FEATURES

Some further thoughts about the final product follow. Both real-time interactive exchange and "e-mail" style messaging are to be supported. For a TDD, the real-time communication functions are the primary goal. However, given that the resources needed to support Chinese input already include fairly extensive computing resources, we can make the messaging facility available at small incremental cost. Furthermore, other telephone features, such as conference 'calls', are being investigated.

CONCLUSION

The components to build a Chinese TDD are there; they only need to be integrated. The hearing impaired of the Chinese character based writing communities deserve access to the same services that are available to the hearing communities in the West and elsewhere. By providing the Chinese hearing impaired with a means of access to communication via telephone lines, a Chinese character TDD would be of significant human service.
And not only to the deaf: In September of 1986, of Hong Kong's 5.5 million people, 8,000 were registered with the Central Registry of the Disabled as being deaf (Kwan, 1986:470). For every
deaf person, there is at least one person with normal hearing who may need a TDD device to communicate with that deaf person. Furthermore, people who have suffered a severe hearing loss may also find the TDD an easier way to communicate. Technology is not an issue with our project: the technology is available in the marketplace; the problems are ones of economy and packaging. In our project, we found that the main challenges were in the areas of input methods, interactive operation design, and gathering information by TDD. These findings will help us to further design features which are of service to the deaf communities.

REFERENCES
Chen, Zhengwu, and Jin Lianfu, eds., 1984. Chinese Information Processing Systems. Beijing: Chinese Computer User Alliance et al. (In Chinese).
Fok, Y.Y., and Ursula Bellugi, 1986. Towards Better Communications, Cooperation and Coordination. In: Proceedings of the 1st Asian-Pacific Regional Conference on Deafness. Hong Kong.
Guo, Pinxin, and Zhang Songzhi, eds., 1985. Chinese Information Processing Techniques. Beijing: National Defense Industry Publishers. (In Chinese).
Kwan, E. W., 1986. Toward Better Communications, Cooperation and Coordination. In: Proceedings of the 1st Asian-Pacific Regional Conference on Deafness. Hong Kong.
O'Neill, P.G., 1982. Essential Kanji: 2,000 Basic Japanese Characters Systematically Arranged for Learning and Reference. New York & Tokyo: Weatherhill.
Tang, M.W., and O.L. Clubb, 1992. Chinese Computing, History and Trends. Hong Kong: Tamarind Books.
Cognitive Technology: In Search of a Humane Interface
B. Gorayska and J.L. Mey (Editors)
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 15
TEACHING SYLLOGISTIC TO THE BLIND Laurence Goldstein
Department of Philosophy
Hong Kong University
laurence@hkucc.hku.hk
[Calvin and Hobbes cartoon, by Bill Watterson: Calvin remarks on how strange it is that smells are so evocative, yet we cannot describe them. (c) Universal Press Syndicate]

The cognitively impaired (let this cover sensory, emotional and other kinds of mental impairment too) ought to be one of the primary objects of research in Cognitive Technology. If one wishes to learn about the relation of the human brain to the mental characteristics of a person, or to the environment with which a person interacts, one method would be to remove parts of a subject's brain and observe which functions were obliterated or impaired as a result. There are, however, obvious ethical limits to this kind of investigation. Another method is to observe subjects who already have
some deficit and to compare these with non-impaired subjects.1 For example, in one famous case dating from 1848, an accident at work caused a large metal stake to fly through the head of Phineas Gage, a young railroad worker; yet, despite heavy loss of brain matter, this individual appeared to suffer no loss of cognitive abilities, only a personality change: he became more moody. Hydrocephalics, with only 10% of normal brain volume, and that squeezed around the periphery of the skull, are often normal or close to normal in all aspects of cognition. Stroke victims frequently suffer specific language deficits, such as anomia: all aspects of normal speech are retained, apart from the ability to remember names. The study of aphasics leads to a better understanding of the structure of language, and that, in turn, to new techniques for second-language acquisition. Any such process falls under the description 'cognitive technology' if it involves the construction of devices which play an essential role in manipulating subjects' cognitive economy (their mental make-up), either in the course of research experimentation or as the outcome of such research. Now, in a process of this sort, the ideal order of events would be as follows: gain an understanding of the nature of a cognitive impairment; then build devices to help remedy the deficit by, as it were, stretching one part of the mind/brain to compensate for what is missing in another; next, investigate the possibility of applying the mind-stretching technique to the non-impaired. By a process of mutual adjustment, a more symbiotic relationship between mind and environment would be achieved: the artefactual environment could be made more accessible to minds, and minds could be stretched to engage more profitably, whether in terms of efficiency or of ecstasy, with the surrounding world.2 Unfortunately life is not that simple.
First, it may be no easy task to specify, except in the broadest terms, the nature of the particular cognitive impairment: it is one thing to describe brain damage, quite another to describe mental deficits. Second, in most cases of such impairment only the most tentative, provisional hypotheses can be put forward. So, third, at this stage one is not generally in a position to build a device to remedy the deficit; rather, the function of building the device is primarily to test the hypothesis. In the light of test results, new hypotheses get formulated, and new devices have to be built in order to test them. This last phase may then cycle several times. The present paper describes a device (in fact, two devices) designed to help blind students learn a simple fragment of elementary logic: the theory of syllogisms, which originated over two thousand years ago with the work of Aristotle. The devices have been built and used, but have not so far been extensively tested. It will be clear why the project is still at this early stage: in Hong Kong it is difficult to find a reasonably large cohort of blind people who need to learn syllogistic logic! The next stage of the operation would require building multiple copies of the equipment
1 Thus, Oliver Sacks (Sacks, 1995) writes: 'Total colour blindness caused by brain damage, so-called cerebral achromatopsia ... has intrigued neurologists because, like all neural dissolutions and destructions, it can reveal to us the mechanisms of neural construction - specifically, here, how the brain "sees" (or makes) colour.'
2 I am not taking a stand on the question of whether the mind is an entity separate from the brain or whether the two are identical or, indeed, on any other option in the theory of mind. The use of both terms is merely in recognition of the fact that there is at present no intertranslation between psychological and neurophysiological discourse.
and performing tests in, for example, a large number of universities and community colleges in North America. But for the purposes of the present paper, the outcome of this large-scale experiment is not so important. For the project, even at its present state, throws up a host of very deep questions, many not local to itself but integral to cognitive technology research in general. It will be necessary to give a sketch of what we are trying to teach by means of these devices. The theory of the syllogism is simple, but beautiful. There is a useful comparison that can be made between it and Euclidean geometry. Recall that there are some geometrical truths that seem so obvious as hardly to require proof; for example, that vertically opposite angles are equal. The proof of this theorem is indeed short and dead simple. But other theorems of geometry, such as Menelaus' Theorem, are extraordinarily surprising. Yet, by a series of simple steps, Menelaus' Theorem can be derived from the elementary Euclidean postulates. Now, Aristotle had the idea, which we now take for granted but which, when you think about it, is utterly sensational, namely that the discussions that people have, specifically, the arguments that they put forward, can be taxonomized and studied in a scientific, mathematically precise way. There are simple arguments that we can see, straight away, to be valid, such as

All ducks are German citizens
All German citizens are prime numbers
Therefore
All ducks are prime numbers.
But there are other arguments which, though not, on the surface, much more complicated, are not at all easy to assess for validity. Here's an example:

No cows are French citizens
Some French citizens are moons of Jupiter
Therefore
Some moons of Jupiter are not cows.
Arguments like these, which have just two premises, three noun phrases each occurring twice, and in which each sentence has to be of one of only four possible types, are known as syllogisms. Syllogisms comprise only an insignificantly tiny fraction of the arguments that occur in ordinary conversations. Psychologists have run experiments to find the average time it takes normal subjects to produce a verdict ('valid' or 'invalid') on the different possible kinds of syllogism, and, as you can imagine, the time needed for the type of argument exemplified by our second example is far greater than that needed for arguments exemplified by our first.
The form (or structure) of the first argument is

All X are Y
All Y are Z
Therefore
All X are Z
And the form of the second argument is

No X are Y
Some Y are Z
Therefore
Some Z are not X
Aristotle showed that, on the basis of our knowledge of the validity of simple argument-forms, such as the first, it is possible, by a series of simple steps, to derive a correct assessment of the validity or invalidity of difficult argument-forms such as the second,3 just as, in Euclid, we build upon the simple theorems in order to prove the complicated ones. This is a powerful technique. With the complicated syllogistic arguments, people differ quite a bit in their assessments of validity; but with Aristotle's work, we have a rigorous procedure for determining with mathematical objectivity which arguments are valid and which are not. It is clear that someone who can prove Menelaus' Theorem has a richer understanding of the theorem than someone who just understands a statement of the result and accepts it as true. Good teaching imparts deep understanding. Now, it is known that, of the 256 possible types of syllogistic form, only 19 argument forms are valid. So, if we wished to teach someone to recognize valid syllogisms (so that, in practical life, he could detect and reject the invalid syllogistic arguments he encountered), all we should have to do is make him learn the 19 valid forms. However, this would not furnish that person with a rich understanding of syllogistic. In the 19th century, the English mathematician John Venn invented a technique for testing the validity of syllogisms which is both extremely simple to employ and which imparts (or, at least, seems to impart) rich understanding. His is a graphical technique. The basic idea is that the sentences occurring in syllogistic arguments be paraphrased as sentences expressing relations between classes of things; those relations can then be depicted graphically, using circles to represent classes. For example, consider the sentence
No positrons have negative charge.

This can be paraphrased as

The intersection of the class of positrons and the class of things that have negative charge has no members - it's empty.

Now represent the intersection of classes as the intersection of circles, and indicate emptiness by shading the relevant section of the diagram. The result is this:
3 See (Lear, 1980).
[Figure 1: two intersecting circles labelled 'positrons' and 'things that have negative charge', with the lens of intersection shaded.]

The left hand circle represents the class of positrons; the right hand circle the class of things that have negative charge. But this diagram can be used to represent any sentence which has the form

No X are Y

Consider next a sentence of the form

Some X are not Y
The paraphrase for this is

The part of the class of X which contains no members of Y contains some members - it's non-empty.

This is represented graphically as follows:

[Figure 2: two intersecting circles labelled X and Y, with a bar in the part of the X circle lying outside Y.]
Note that non-emptiness (i.e., the presence of some members) is denoted by a little bar. It should be obvious how to give a graphical representation of the sentence 'Some X are Y'. We are now in a position to draw a Venn diagram for the second of our sample arguments, the one which had the form

No X are Y
Some Y are Z
Therefore
Some Z are not X

Draw three intersecting circles and represent the two premises of the argument, as follows:
[Figure 3: three intersecting circles labelled X, Y and Z, with the premises 'No X are Y' and 'Some Y are Z' represented by shading and a bar.]

Now, how would the conclusion be represented on this diagram? Obviously, by drawing a bar lying within the 'Z' circle but outside the 'X' circle. But look at the diagram: in representing the premises, we've already done just that: there is already a bar lying within the 'Z' circle but outside the 'X' circle; the conclusion is already contained in the premises, and that's the criterion for a deductive argument being valid. What about invalid arguments? First, I'll show how to represent graphically a sentence of the form

All X are Y

The paraphrase for this is

The class of things which are X but not Y is empty
249
Teaching Syllogistic to the Blind
in other words

There are no things that are X but not Y

So the representation looks like this:
[Figure 4: two intersecting circles labelled X and Y, with the part of X lying outside Y shaded.]

You can now represent the first of our sample arguments and demonstrate graphically that it is indeed valid. But consider the argument

All sulphates are alkalines
No sulphates combine with ammonia
Therefore
No alkalines combine with ammonia
To depict the conclusion of this argument, one would need to shade the whole lens formed by the intersection of the Y (alkalines) circle with the Z (things which combine with ammonia) circle. No such picture is obtained when the premises are represented in the diagram (figure 5); therefore the argument is invalid. In my own Department, we use a microcomputer program called JOHN to teach syllogistic logic, and JOHN, as its name implies, employs the Venn-diagrammatic technique.4 Of course, this program is of no use to blind students. So how does one teach blind students to test the validity of syllogisms? Well, one method would be as follows. The premises of any syllogism can be written as expressions in Boolean algebra. So, for example, a sentence of the form

No X are Y
4 This is part of a software package (Goldstein and Moore, 1991). A software package called Hyperproof, developed by Jon Barwise and John Etchemendy, teaches a much broader segment of first-order logic by graphical methods. Information on this is available on the World Wide Web. The URL is http://csli-www.stanford.edu/hp/index.html
[Figure 5: three intersecting circles labelled 'sulphates', 'alkalines' and 'things that combine with ammonia', with the premises represented.]

is written as the equation

XY = 0

Since there is an effective (i.e., a purely mechanical) method for determining the validity of syllogisms, it would be very easy to write a program such that, when the blind student typed in the Boolean expressions for the premises and conclusion of a syllogism, the computer would always return, in audible form, the correct verdict on whether that syllogism was valid or not. Hence the blind student would be able, perfectly easily, to test syllogistic arguments for validity. The problem is, of course, that by using such a method the blind student gains no rich understanding of syllogisms; he just (blindly?) gets the answer right every time. The challenge, then, is to design a device for blind students that will enable them to get as deep an understanding of syllogisms as sighted students using the Venn technique can obtain. This may not be a simple task for, as Calvin reminds us (see the cartoon at the opening of this paper), the different modalities are not comparable in terms of either sensitivity or dimensionality. Both Tim Moore and I faced this challenge, but our solutions were very different. He made a 'tactile' version of Venn's diagrams which has intersecting hexagons instead of intersecting circles. The student locates his position on the device by feeling the different serrations on the raised rims of each of the three intersecting hexagons, and represents premises by means of plastic fillers (the equivalent of shading on the Venn diagram) and a metal piece which plays the role of Venn's bar. The idea behind Moore's device, Venntouch, is to create equivalents in tactile 'space' to the visible relations embodied in Venn's diagrams.
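The 'effective method' referred to above is easy to make concrete. The sketch below assumes a modern semantics without existential import (so the traditional count of 19 valid forms, which presupposes non-empty terms, is not reproduced exactly): as far as syllogistic sentences are concerned, any interpretation is fully described by which of the eight region-types of a three-circle diagram are inhabited, so validity can be decided by brute force over the 2^8 possibilities.

```python
from itertools import product

# Truth conditions of the four categorical sentence forms, evaluated
# against the list of inhabited region-types.
FORMS = {
    "All":      lambda x, y, regs: all(not (r[x] and not r[y]) for r in regs),
    "No":       lambda x, y, regs: all(not (r[x] and r[y]) for r in regs),
    "Some":     lambda x, y, regs: any(r[x] and r[y] for r in regs),
    "Some-not": lambda x, y, regs: any(r[x] and not r[y] for r in regs),
}

def holds(sentence, inhabited):
    form, x, y = sentence
    return FORMS[form](x, y, inhabited)

def valid(premises, conclusion):
    """A syllogism is valid iff no way of marking the eight Venn regions
    empty or inhabited satisfies the premises while falsifying the
    conclusion."""
    region_types = [dict(zip("XYZ", bits))
                    for bits in product([False, True], repeat=3)]
    for mask in product([False, True], repeat=8):
        inhabited = [r for r, occ in zip(region_types, mask) if occ]
        if (all(holds(p, inhabited) for p in premises)
                and not holds(conclusion, inhabited)):
            return False  # counter-model found
    return True
```

On the paper's examples, the 'ducks' form (All X are Y; All Y are Z; therefore All X are Z) and the 'cows' form (No X are Y; Some Y are Z; therefore Some Z are not X) both come out valid, while the 'sulphates' form (All X are Y; No X are Z; therefore No Y are Z) is correctly rejected.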
The device called Sylloid, invented by me, makes no such attempt to preserve an isomorphism across sense modalities. Instead it is inspired by the recognition that the Venn diagram can be divided into seven significant areas, as shown
[Figure 6: a three-circle Venn diagram with its seven significant areas marked.]

Sylloid consists of seven tetrahedra (pyramids) fitted to seven of the faces of a solid regular octahedral core, the 'spare' face being anchored to a heavy base. The user locates his position on the device by means of metal brailled buttons.
[Photograph: the Sylloid device.]
Premises are represented by pulling the relevant tetrahedra away from the core (the Venn equivalent of shading) and by slapping a magnetic hinge across faces of the tetrahedra to represent non-emptiness. There are certain poor features of the design of Sylloid. For example, when trying to pull a tetrahedron away from the core, it is hard to get a purchase on a vertex, especially if one's hands are sweaty. Second, when a tetrahedron is successfully pulled out, it can fall to the ground, creating problems for a user who cannot see where it has landed. Venntouch too has its practical problems. The plastic 'filler' pieces are not easy for a blind student to orientate when they have to be inserted into a given area: fitting pieces to places is fairly easy for a sighted person, because we can see simultaneously the shapes of the gap to be filled and of the piece for filling it. For the blind person, a feat of tactile memory is required. These are practical problems, and there is no substitute for building and testing devices, for it would be a miracle if all such difficulties could be anticipated and avoided. However, the more interesting problems are theoretical, and it is now that the questions for future research start piling up. For a start, what is the purpose of teaching someone the rules of syllogism, or the Venn-diagrammatic technique for testing validity? One answer would be that it makes the learner a better reasoner, more sensitive to the argumentative mistakes of others, more circumspect in his own reasoning. Yet it is by no means clear that a diagrammatic technique can be 'transferred' inside the head so that, in normal reasoning situations, the master of Venn can reason well unaided by diagram or device. For comparison: you can teach a man with one leg to walk with a crutch, but that does not mean he can walk well if you take away the crutch.
This question of learning transfer from artificial to real situations, well known of course to psychologists, is bound to loom large in cognitive technology. One response to the above problem might be that, in the case of Venn diagrams, one can move the 'crutch' into the mind. In other words, after a bit of practice, the learner does not need to draw diagrams on paper; he can just create such diagrams in his visual imagination. It is true; he can. But reasoning by introspecting diagrams that one has introspectively constructed is not the normal way in which an efficient reasoner operates. Phenomenologically, we know this to be so: rarely do visual images invade the mind when we are engaged in deductive reasoning. And it certainly could not be the case that such images are necessary for reasoning, for otherwise blind people would not be rational. Fast reasoning would be inhibited, not enhanced, if a cumbersome procedure of evoking images had to be invoked. It is often simply assumed by psychologists and linguists that any kind of thinking or perception involves the manipulation of inner representations, and that it is the presence of such entities that makes perception possible. On this account, what would be needed of a device for the blind that did the work of Venn diagrams is a piece of equipment that produces an inner tactual representation playing the same, or roughly the same, role as the visual representation, or sense-datum, does for the sighted person. But whether there are such inner representations and, if there are, what they are, has been the subject of heated philosophical debate for centuries. There was, for example, a famous discussion of the issue between Malebranche and Arnauld, the latter denying
the existence of these 'inner objects'.5 One problem, of course, is to understand how these objects perform the function they are alleged to perform. Are these inner objects perceived by 'inner eyes'? If so, then the project of producing an inner tactual representation (in the sense of an inner representational object) for a blind person is doomed to failure.6 A more cautious approach to the problem of how exterior objects are mapped into the interior and thence processed is taken by Keith Stenning and Jon Oberlander, who argue that graphical representations are more computationally tractable than arguments stated verbally, and hence that, in psychological reality, it would be more efficient if, where possible, reasoning were conducted graphically. They write: 'Since we observe that the external circles are conducive to reasoning, we speculate that it is because this external aid maps onto internal structures and processes perspicuously. And so the algorithm hints as to what these processes might be. We do not however believe, in the style of some imagery researchers, that what is implemented within is isomorphic to the full detail of the external aids. We rather look for minimal internal implementations...'7 Yet their minimal implementation (for which they have produced a PDP simulation) is still recognizably graphical, and their leading idea is that the spatial analogy (sets as containers) reduces the 'problem space', thereby simplifying syllogistic reasoning to the point where, literally, it may be taken in at a glance. Clearly, then, this method is not available to the blind. Yet blind students, using Sylloid and Venntouch, have become perfectly competent at solving syllogisms. Some intriguing possibilities now suggest themselves. First, that testing for validity at a touch may be quite a different process, but one (almost) as efficient as testing at a glance.
One would need to compare solving speeds for blind and sighted subjects, and, at this stage, such experiments would be wholly unreliable, for we are quite uncertain about the extent to which design defects in Sylloid and Venntouch slow the touch testing. (It would be necessary also to separate the congenitally blind from those who had lost their sight; the tactile perception of the latter group might be contaminated by visual strategies.) Second, we have pointed out that the 'proof of the pudding' of any device used for teaching the testing of syllogisms comes when the device is thrown away and the user attempts to bring his skill to bear on real-life reasoning. Now, it is obvious that the use
5 For discussion of these issues and of the Malebranche-Arnauld controversy, see (Ishiguro, 1994) and (Watson, 1994). The philosophical problems involved in the theory of perception are fearsomely difficult, and cannot be broached in this paper. For a useful introduction to recent discussion, I recommend the editor's introduction to (Crane, 1992). For a defence of the view that blind people can form perceptual representations of space, see Gareth Evans' paper 'Molyneux's Question' in (Evans, 1985).
6 It has, however, been argued with some persuasiveness that one can demonstrate a contrast between visual and tactual spatial perception (e.g., that there is no tactual counterpart to the visual field) without relying on the theory of private, inner objects. See (Martin, 1992).
7 See (Stenning and Oberlander, 1994), which favours Euler Circles over the Venn Diagrams that constitute the mental models of (Johnson-Laird, 1983). There are several standard logic texts that discuss both Euler Circles and Venn Diagrams, for example (Stebbing, 1966). William and Martha Kneale, in (Kneale and Kneale, 1962), p. 337, point out that geometrical illustrations of logical relations similar to Euler's had been used by Leibniz (1646-1716), and Clark Glymour claims that drawing circles to test syllogisms for validity was a device developed during the Renaissance (Glymour, 1992).
of calculators does nothing to enhance a user's mental arithmetic: it is pitiful to see checkout clerks at supermarkets rendered helpless when the computer goes down. Likewise, the system of cueing speech by means of hand gestures, intended to improve the speech of deaf children, had the effect of depreciating the children's speech, because they came to use the gestures as a substitute for forming the correct sounds.8 Do diagrams and devices help furnish the user with useable reasoning skills, or do they have the opposite effect of diminishing our natural reasoning abilities? The really interesting question would be whether, in real-life argumentation, blind people who had learned from Sylloid, Venntouch or some similar device performed better than the sighted who had learned through Venn or Euler (and better than the untutored). If this proved to be so, then sound pedagogy would demand that we teach sighted people with devices designed originally for the blind. Is this a realistic possibility: that the learning achieved through the methods designed for the blind is superior to what sighted people achieve? This brings me to my third point: Yes. In discourse about argumentation, the containment metaphor is pervasive. As we have seen, set membership is construed as objects within a container. We say (following Kant) 'in all judgments ... either the predicate B belongs to the subject A, as something which is (covertly) contained in this concept A; or B lies outside the concept A...'.9 We say that a deductive argument is valid if the premises contain the conclusion, and this metaphor is made vivid in Venn diagrams, where we use circles as depictions of containers and test for validity by examining whether those containers have been filled (the circles filled in) in such a way that the depiction of the premises contains the depiction of the conclusion. George Lakoff has argued that meaning is a product of metaphor; in this case, that sets are containers.10
But this may be an entirely false and misleading conception of sets. Some sets contain themselves as members - e.g., the set of sets that contain more than five members. But nothing can physically contain itself. Further, if we think of a set as a container, we are likely to find difficulty in accepting the non-existence of the Russell set (the set containing all and only non-self-membered sets), because we envisage that set as a bucket into which we can unproblematically shovel members, such as the set of pigs, the set of sets of students studying logic, the set of prime numbers, etc. Yet we know, from elementary logic, that the Russell set does not exist. So using containment as a model for set membership carries a danger: that of reading features of the model back into the thing being modelled. It is likely, then, that the containment metaphor, as incorporated in Venn diagrams, is not the best model for explaining the theory of sets, nor for understanding what makes for syllogistic validity and invalidity. The operation of Sylloid does not trade on the containment analogy, and the structure it incorporates seems even more minimal than that of Stenning and Oberlander. But to say that in this device we have the heart, the essence, of syllogistic reasoning would be vastly premature.
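The piece of elementary logic appealed to here is short enough to display. Supposing the Russell set R to exist, and instantiating its defining condition with R itself, yields a contradiction:

```latex
\[
  R = \{\, x \mid x \notin x \,\}
  \;\Longrightarrow\;
  \bigl( R \in R \iff R \notin R \bigr)
\]
```

Since no set can satisfy this biconditional, no set answers to the definition of R, whatever the bucket-like picture of sets may suggest.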
8 See Cornett (1967). I'm grateful to Gill Clezy for information about this syndrome.
9 Kant (1781), A6/B10. In 'Two Dogmas of Empiricism' (Quine, 1953), Quine criticizes Kant's account on the grounds that 'it appeals to a notion of containment which is left at a metaphorical level' (p. 21).
10 Lakoff (1987: 458).
Teaching Syllogistic to the Blind
255
REFERENCES
Cornett, R. O., 1967. Cued Speech. American Annals of the Deaf 112: 3-13.
Crane, Tim, ed., 1992. The Contents of Experience. Cambridge: Cambridge University Press.
Evans, Gareth, 1985. Collected Papers. Oxford: Oxford University Press.
Glymour, Clark, 1992. Thinking Things Through. Cambridge, MA: MIT Press.
Goldstein, Laurence, and Tim Moore, 1991. Logic Tutor: a suite of programs and manuals. Hong Kong: Logical Products (HK) Ltd.
Ishiguro, Hidé, 1994. On Representations. European Journal of Philosophy 2: 109-124.
Johnson-Laird, Philip, 1983. Mental Models. Cambridge, MA: Harvard University Press.
Kant, Immanuel, 1781. Critique of Pure Reason. English edition, 1929, trans. N. Kemp Smith. London: Macmillan.
Kneale, William, and Martha Kneale, 1962. The Development of Logic. Oxford: Clarendon Press.
Lakoff, George, 1987. Women, Fire and Dangerous Things. Chicago: The University of Chicago Press.
Lear, Jonathan, 1980. Aristotle and Logical Theory. Cambridge: Cambridge University Press.
Martin, Michael, 1992. Sight and Touch. In: Tim Crane, ed., The Contents of Experience, 196-215.
Quine, Willard, 1953. From a Logical Point of View. Cambridge, MA: Harvard University Press.
Sacks, Oliver, 1995. An Anthropologist on Mars. New York: Picador.
Stebbing, Susan, 1966. A Modern Elementary Logic. London: Methuen.
Stenning, Keith, and Jon Oberlander, 1994. Spatial inclusion and set membership: a case study of analogy at work. In: K. Holyoak and J. Barnden, eds., Advances in Connectionist and Neural Computation Theory, Vol. 2. Hillsdale: Lawrence Erlbaum.
Watson, Richard, 1994. Having Ideas. American Philosophical Quarterly 31: 185-198.
Cognitive Technology: In Search of a Humane Interface B. Gorayska and J.L. Mey (Editors) © 1996 Elsevier Science B.V. All rights reserved.
257
Chapter 16

USING MICROCOMPUTER TECHNOLOGY TO PROMOTE STUDENTS' "HIGHER-ORDER" READING

Che Kan Leong*
Department for the Education of Exceptional Children
University of Saskatchewan, Canada
leong@sask.usask.ca
ABSTRACT

This report is in two interrelated parts. Part I discusses the theoretical underpinning of computer-mediated reading and of text-to-speech computer systems for the enhancement of reading. Part II reports on a series of three studies using the sophisticated DECtalk text-to-speech computer system with students, in a move towards the goal of a "computer-based medium for thinking and communication." Study 1, with two experiments, showed a high level of intelligibility of DECtalk speech in children. There was some evidence of the efficacy of combined on-line reading and DECtalk auding. Study 2 used a pre- and post-test training design and examined the comprehension of 12 expository prose passages in a group of 67 below-average and above-average readers in grades 6, 7 and 8; the group was further divided into subgroups for on-line reading only, and for on-line reading plus DECtalk auding. There was an overall training effect, but the efficacy of DECtalk plus on-line reading with explanation of difficult words was verified for only two of the passages. Study 3 further tested the contribution of grade (age), reading level (above and below average) and response mode in on-line reading plus DECtalk auding in 192 students in grades 4, 5 and 6. The students in each grade were randomly assigned to one of four experimental conditions: (1) on-line reading plus DECtalk auding; (2) on-line reading plus DECtalk auding plus explanation of difficult words in both modes; (3) on-line reading plus DECtalk auding plus explanation of difficult words in both modes plus metacognitive activities; and (4) on-line reading plus DECtalk auding of the "simplified" passages. Reading comprehension was assessed in two ways: (1) verbal answers to inferential questions, and (2) verbal summaries of the passages. Analyses of variance and covariance showed significant differences in grade, reading level, and response mode in favor of inferencing over summaries, but not with respect to the experimental conditions. However, reports and observations showed high motivation to learn among the students. The three studies taken as a whole are discussed within the framework of knowledge acquisition and the social construction of knowledge.

* The studies reported were assisted in part by the Social Sciences and Humanities Research Council of Canada (SSHRCC research grant No. 410-89-0128). I am grateful to SSHRCC for this assistance. I thank S. Lock, M. Leung and L. Wang for the different phases of DECtalk computer programming; M. Baker, J. Lappa, M. Mackay, G. Martens, L. Proctor, L. Reineke and K. Sarich for their work in different schools over a period of time; and the students and teachers in these schools for their participation in the various phases of the DECtalk project. Aspects of this paper were presented at the Invitational International Symposium on Exploration and Advancement of Technology for Persons with Learning Disabilities in Missillac, France, in July 1993, and at a seminar for the Cognitive Technology Group, City University of Hong Kong, in November 1993. Recent post-experimentation discussions with Jerome Elkind, Ingvar Lundberg and Marshall Raskind have given me further ideas to explore this line of work for research and instruction. Any shortcomings are necessarily my own.

INTRODUCTION

The importance of conceptual and theoretical bases for computer-mediated reading is argued forcefully in a review by Reinking and Bridwell-Bowles (1991). Other researchers suggest that the computer also reorganizes and redefines cognitive functions, not simply amplifying them (see Webb & Shavelson, 1985, for details). These kinds of theoretical and empirical studies have been expanded to include a wider range of technologies (computers, videodiscs, and teleconferencing) to construct or reconstruct learning in information-rich, realistic contexts (see Cognition & Technology Group at Vanderbilt, 1992; Lehrer, 1992; Nix & Spiro, 1990, for details). There is evidence of actual benefits and further potentials in using computer technology to assist learning in students. However, computer technology alone cannot guarantee "optimal adjustment" to individual learners, because of the complexity of human behavior. To be effective, computer technology needs to involve human tutors in the process, and significant adult-child interaction and appropriate instructional procedures are necessary (Hativa & Lesgold, 1991; Margalit, 1990). Computer environments should be "socialized" so that learners respond to them as if they were empathetic tutors (Turkle, 1984).
Ideally, such sophisticated and empathetic tutoring systems know when or how to diagnose learning "bugs", when and where to intervene, and how to provide motivational as well as cognitive support generally (Lepper & Chabay, 1988; Lepper & Gurtner, 1989).
Scope of this report
The Computer as Cognitive Support in Reading Difficulties

My research into computer-mediated reading with grade school children with mild and severe reading disabilities is in the spirit of adaptive education first enunciated by Robert Glaser (1977). I aim to link psychological and technological knowledge with educational practice, to assist learning and to narrow individual differences in reading. Specifically, the concept of adaptive education derives from the psychological principle of compensation (Leong, 1993; Lundberg & Leong, 1986). Within the context of the multi-level and multi-component approach to reading and its disorders, compensation is conceptualized as emphasizing or enhancing one component of reading (for example, the phonological or morphological aspects of words, or sentential comprehension) more than another component, in order to ameliorate decrements or deficiencies in the components. Compensation can be provided by parents, teachers or clinicians to enrich the learning situation; it can take the form of adaptive or improved
task properties; and it needs cognitive support systems generally, including the use of technology (Backman, 1985). These forms of compensation are not mutually exclusive; they reinforce one another interactively, and their principles apply to the reading process in accordance with the interactive-compensatory model of Stanovich (1980).

THEORETICAL BASIS OF THE PROJECT

Automaticity Principle and Immediacy of Feedback

The series of studies using the DECtalk text-to-speech computer system attempts to provide on-line and computerized speech support to readers, as and when they need such help. This is accomplished by harnessing the interactive nature of the microcomputer and its capability for storing lexical and discourse materials for immediate or delayed retrieval. These capabilities readily lend themselves to studies of the speedy and accurate processing (the automaticity principle) of words, and of units larger than words, that goes into enhancing reading comprehension. Drawing on work by Lesgold (1983), Perfetti (1985), Stanovich (1986) and others, the postulate of our studies is that reading comprehension, as reflected in answers to inferential questions, text recall, and summarization, relates to local processing levels of efficient and high-quality access of words, and to overall auding (listening to text) abilities. Individual differences in text comprehension can be traced to the efficiency with which children remember words just read, activate their naming codes, analyze their morphological relationships, and integrate the successive units (words, phrases, clauses), as they come along, into propositional form for interpretation.
Furthermore, poor or low-ability readers require more time to access words, but their processing is facilitated by context just as much as, if not more than, that of good readers; good or high-ability readers can also be affected by context when their lexical access is slowed down (Perfetti, 1985; Stanovich, 1980; Stanovich & West, 1981; Stanovich, West, & Feeman, 1981; West, Stanovich, Feeman, & Cunningham, 1983).

"Contextual Enrichment": Two Computer Approaches
The principle of "contextual enrichment" has been used by George Miller and his colleagues (Gildea, Miller, & Wurtenberg, 1990; Miller & Gildea, 1987) to promote word knowledge through the use of interactive videodiscs; and by Leong and colleagues (Leong, 1992b, 1992c, 1992d; Leong & Mackay, 1993; Lock & Leong, 1989) to enhance reading comprehension by putting to use the text-to-speech (DECtalk) computer system. There is, however, a basic difference between the Miller and Leong approaches. Miller and colleagues used sentence contexts in narratives and offered both visual and linguistic enrichment from videodisc technology to help young children learn words and their meaning. Leong's experiments in situ provided more precise explanations of, or substitutions for, difficult words and phrases on-line and via DECtalk speech, used as "instruments" (Stahl, 1991) for the enhancement of reading comprehension. The purpose here is not to compare these almost opposite approaches: they may be used for different purposes and may attain different goals. Rather, I will attempt to
show what can be achieved with modest instrumentation, without the advantage of multi-faceted technology for multitasking. My rationale is to provide word knowledge both on-line and with immediate speech support, in order to enhance reading comprehension. While context may constrain word meaning, facilitation may not apply with great precision, and certainly not for low-frequency words (McKeown, 1985; Perfetti, 1992; Schatz & Baldwin, 1986). Word knowledge may pose problems for poor readers, especially for those with below-average decoding and segmentation abilities (Anderson & Davison, 1988). While unfamiliar words can impede reading comprehension, instructed words improve the recall of propositions (Omanson, Beck, McKeown & Perfetti, 1984), and instruction needs to be sustained and systematic. The better results of systematic instruction derive from a combination of explicit reference to meaning, learning through examples, learning through verbal contexts, and learning through the analysis of the hierarchical and relational aspects of derivation, inflection, and compounding of words (Beck, Perfetti, & McKeown, 1982; Jenkins & Dixon, 1983). In word learning, emphasis should be on multiple cues and multiple exposures to words to build up their phonological, morphological, syntactic, and semantic networks (Leong, 1992a; McKeown & Curtis, 1987; Stahl, 1991; Sternberg & Powell, 1983).

EXPERIMENTAL STUDIES
Rationale of Computer-Mediated Reading

There are good reasons for using the computer to assist reading, just as there are conceptual and methodological issues in how best to do this (Reinking & Bridwell-Bowles, 1991). One main reason is that reading is a real-time language activity using all types of available linguistic information (Bierwisch, 1983). Lexical access and semantic encoding can be facilitated with an on-line approach using the microcomputer, interfaced with the text-to-speech (DECtalk) computer system, so as to provide immediate on-line reading and high-quality synthetic speech support and feedback for segments of words and of discourse. The on-line approach also takes into account that text comprehension is incremental and cumulative. The comprehension process incorporates the buffering of information from different linguistic units; the retrieving of old information; the purging of redundant information from working memory; and the integrating of new and old information from successive segments for propositional encoding (Jarvella, 1979). The computer interfaced with synthetic speech can be used effectively to guide readers to achieve a smooth integration of different discourse segments for accurate and automatic processing. Indeed, one dilemma in computer-mediated reading instruction is to maintain a balance between "basic-but-dull" word decoding and "complex-but-engaging" text comprehending, and to make both tasks interesting and effective (Perfetti, 1983). The other reason is that for those students diagnosed as "backward", or garden-variety poor readers in accordance with the "Simple View" of reading of Gough and Tunmer (1986), reading disorders could result from difficulties in decoding, in comprehending, or in both. One of the claims of the Simple View "may well be ... that skilled decoding combined with skilled listening must produce literacy" (Gough & Tunmer, 1986: 9). These are testable hypotheses. Some evidence of decoding and
listening comprehension interacting on reading comprehension is provided by Hoover and Gough (1990) in a longitudinal study of grade school children. Their results (based on a series of regression analyses) show that the linear combination of decoding and listening comprehension accounted for a substantial proportion of the variation of reading comprehension, with enhancement from the multiplicative effect of decoding and listening comprehension; furthermore it is shown that both components are needed. In the domain of computer assisted reading proper (without speech accompaniment), a report on grade 6 subjects shows that providing them with vocabulary learning on a computer screen could increase their reading comprehension (Reinking & Rickman, 1990). Further, poor readers were facilitated in their reading comprehension when the computer incorporated comprehension monitoring (Reinking, 1988). However, results from a recent study of computer-mediated reading comprehension are less sanguine, and emphasize the need to buttress computer reading with metacognitive activities, as well as the importance of working memory (Swanson & Trahan, 1992). These differing results will need to be further validated.
Text-to-Speech Conversion and Systems

The emphasis placed on accurate and automatic access of word meaning and pronunciation as an aid to reading comprehension suggests that text-to-speech (TTS) computer systems could be put to use to provide bisensory feedback to assist readers. In general, computer speech production ranges, in a cascading manner, from stored samples of human speech in digitized form, with varying quality of speech, to sophisticated text-to-speech computer systems which incorporate "deeper" linguistic knowledge. Technical discussions of speech conversion can be found in several volumes (e.g., Allen, Hunnicutt & Klatt, 1987; Klatt, 1987; Witten, 1982).

Overview of the DECtalk System

Text-to-speech systems such as MITalk and DECtalk are generally defined as computer devices that analyze, synthesize, and convert "plain" or unrestricted text into fluent, high-quality speech. These high-fidelity synthesized text-to-speech systems can analyze plain text without recourse to phonetic and prosodic markers for all linguistic information; they can synthesize the analyzed information to produce the output acoustic and articulatory waveforms in the form of fluent and highly intelligible speech (Leong, 1992c). The DECtalk system, together with its variant, the Swedish Infovox multilingual system, is the mainstay of the various research reports in a special issue on reading and spelling using text-to-speech computer technology (Leong, 1992b). The DECtalk device has a large vocabulary and makes use of analysis-by-synthesis principles to extract the underlying phonemic, morphemic, and syntactic representations of unrestricted text to produce synthetic utterances.
The access to deeper linguistic knowledge in DECtalk, its range of speaking rates from 120 to 250 words per minute (wpm) (a recent DECtalk PC card ranges from 120 to 550 wpm), and its 7 built-in voices,1 with "Perfect Paul" being the preferred mode, offer considerable possibilities for the project on hand.

1 A recent DECtalk PC card has 9 voices.

In the course of our investigation, a
262
Che Kan Leong
DECtalk program library with computer programs in Turbo Pascal routines was compiled by Lock and Leong (1989) to facilitate the tailoring of the hardware. Before they start using the DECtalk system with children, especially those with reading and spelling disorders, researchers need to ask and answer several pertinent questions. One question concerns the degree of intelligibility of DECtalk speech; another is whether or not the bisensory output of on-line text simultaneously with DECtalk speech is optimal for reading and text comprehending. There are further issues: technical ones, such as hardware and software configurations; and conceptual and methodological ones, such as the nature of "useable" text on-line, the output of discourse materials, and other aspects. Some of these issues are discussed in subsequent sections; fuller discussions are provided in Leong (1992b, 1992c, 1992d).

Study 1
Intelligibility of DECtalk

Intelligibility and other aspects of text-to-speech systems with adults have been investigated by Pisoni and his colleagues in Indiana (Greene, Logan, & Pisoni, 1986; Greenspan, Nusbaum, & Pisoni, 1988; Ralston, Pisoni, Lively, Greene, & Mullennix, 1991); the quality of DECtalk synthetic speech is rated very highly by these researchers. For children, Olson, Wise and their colleagues in Colorado have shown that disabled readers did not differ from college students in the recognition accuracy of words spoken by DECtalk ("Perfect Paul" voice); this accuracy rate of 94.5% differed only slightly from the children's perception of the same words spoken in natural speech (98.4%) (Olson, Foltz, & Wise, 1986). Furthermore, as shown in a long-term remedial reading program (Wise, Olson, Anstett, Andrews, Terjak, Schneider, Kostuch, & Kriho, 1989), DECtalk speech, being highly intelligible, is far superior to digitized speech.

Subjects

Expanding on the pioneering work of the Colorado group and drawing on the Lock and Leong (1989) DECtalk program library, M. Mackay and I have similarly found, in two experiments with grade 6 students, a high level of intelligibility for DECtalk ("Perfect Paul" voice) as compared with human speech (Leong & Mackay, 1993). Our total sample consisted of 66 twelve-year-old subjects with no known hearing problems, who were randomly divided into two subgroups of 33, one for the DECtalk (DEC) and one for the human speech (Voice) mode of output of the same words and sentences. These 66 students were further divided (on the basis of the Canadian Tests of Basic Skills (King, 1982)) into three subgroups ("below average" (BA), "average" (AV), and "above average" (AA)) of 9, 12 and 12 students respectively for each listening condition. All subjects were given practice in listening to a 170-word sample passage, "Shaggy Bear Tale", adapted from the DECtalk manual (Digital Equipment Corporation, 1984).
Task and Procedure

Experiment 1 adapted Durrell and Catterson's (1980) Listening Vocabulary subtest to assess the intelligibility of the DECtalk system as compared with human speech. In
this refined task with 60 words varying from two to three syllables and represented by 15 semantic categories, students in each mode of presentation were asked to listen to each of the 60 words outputted at random. They were required to match accurately and rapidly, by pressing a computer key, each lexical item with the corresponding superordinate or coordinate semantic category, shown (with three alternatives) in both pictorial and verbal forms. An example is the word FLOWER, going with the category of PLANTS (picture of a potted plant); CHEERFUL, with the category of HAPPY (smiling face).
Results

The results show for DECtalk an overall matching accuracy of 88% and a mean response time of 2101 msec. For human speech, the overall matching accuracy was 96% and the mean response latency was 827 msec. There was no speed-accuracy trade-off. A 2 (presentation mode) by 3 (reading level) ANOVA shows the expected significant main effect in favor of human speech (F(1, 60) = 36.01, p ...).

maj(k, x) = Σ_{j > k/2} (k choose j) p(x)^j (1 - p(x))^(k-j)
In this equation, j represents the number of trees that correctly classify example x. We require that it be more than half of the k trees, hence the restriction on the sum. p(x)^j represents the probability of j trees getting the example correct; (1 - p(x))^(k-j) is the probability that the remaining trees get it wrong; (k choose j) simply counts the number of possible ways k trees could divide into two sets of trees, one of size j. Figure 1 shows how maj(k, x) varies with p(x) when different numbers of trees are used for the majority. Note that for example x, taking the majority vote increases the probability of getting a correct classification if p(x) > 0.5, but decreases it if p(x) < 0.5. Let X1 be the set of examples in the test set for which p(x) < 0.5, and X2 the set for which p(x) > 0.5. If x ∈ X1, it is to our advantage to use the classifiers directly. If, on the other hand, x ∈ X2, taking the majority will increase the probability that we will classify x correctly. For any given test set, there will likely be points in both cases. Obviously we cannot tell, given a particular example, whether it belongs to X1 or X2 unless we know its classification. However, it is our experience that the benefit we get by increasing the likelihood of a correct classification for those examples in X2 outweighs the loss in accuracy we get on the examples in X1.
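The majority-vote probability can be computed directly from this binomial formula; a minimal sketch (the function name is ours, not from the paper):

```python
from math import comb

def maj(k, p):
    """Probability that a strict majority of k independent trees,
    each correct with probability p, classifies an example correctly:
    sum over j > k/2 of C(k, j) * p^j * (1 - p)^(k - j)."""
    return sum(comb(k, j) * p**j * (1 - p)**(k - j)
               for j in range(k // 2 + 1, k + 1))
```

For p > 0.5 the vote of several trees beats a single tree (maj(9, 0.8) exceeds 0.8), while for p < 0.5 it does worse (maj(9, 0.45) falls below 0.45), matching the two cases X2 and X1 above.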
Figure 1. Majority classification probability versus individual classification probability (curves shown for 1, 3, 9 and 49 trees).
308
D. Heath, S. Kasif and S. Salzberg
Intuitively, it would seem that simply increasing the number of classifiers on the committee (in a majority voting scheme) should continually increase the expected accuracy of the decisions. The next example illustrates why this intuition is wrong, and how, in fact, the ideal size of the committee will vary depending on the problem. The critical factor is how many examples in the domain at hand are difficult to classify: if there are many such examples, then very small committees will be preferable. An implication of this is that choosing the appropriate value for k may be a difficult problem. We have already seen that for some examples (those with less than 50% probability of being correctly classified by the average tree), using a majority vote will lower the chances of a correct classification, and the more trees used, the lower the resulting accuracy will be. On the other hand, increasing the number of trees involved in the vote will increase the accuracy on those points likely to be classified correctly by the average tree. Normally, we would expect many domains to have a mixture of these two types of examples, some difficult to classify and some easy. When we try using a majority voting scheme on a mixture of these two types, we will get a mixed result. Consider two examples, e1 and e2. If we generate many trees, on average e1 is classified correctly 45% of the time, and e2 is classified correctly 80% of the time. (One can also think of e1 as a set of examples with the same probability of correct classification.) As shown in Figure 2, if we use a majority voting scheme, then e1 will rarely be classified correctly, but e2 will almost always be classified correctly. Figure 2 also shows the combined expected accuracy for the set {e1, e2}. If we generate a
series of trees and use each one to classify the two examples, we expect their average accuracy to be 62.5%. If we use majority voting, we expect the accuracy to increase up to about 68% for nine trees. However, if we use more than nine trees, the expected accuracy goes down, eventually converging to 50%. Thus for this simple example, the optimal committee size is nine.

Figure 2. Effects of majority voting on mixed data sets (expected accuracy versus number of trees for the 45% example, the 80% example, and the combined set).
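The mixed-pair calculation can be checked numerically; a small sketch (function names are ours), recomputing the binomial majority probability inline:

```python
from math import comb

def vote_accuracy(k, p):
    # probability that a strict majority of k trees is correct,
    # each tree being correct independently with probability p
    return sum(comb(k, j) * p**j * (1 - p)**(k - j)
               for j in range(k // 2 + 1, k + 1))

def combined(k):
    # expected accuracy over the mixed pair {e1 (45%), e2 (80%)}
    return (vote_accuracy(k, 0.45) + vote_accuracy(k, 0.80)) / 2
```

With a single tree the combined accuracy is 62.5%; it peaks near 68% around nine trees, and for very large committees it approaches 50%, since e1 is then almost never and e2 almost always classified correctly.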
309
Committees of Decision Trees
For a set of examples X, where p(x) is the probability of example x being correctly classified by an average tree, it is easy to show that the average accuracy without voting is
(1/|X|) Σ_{x ∈ X} p(x),

while the accuracy when an infinite number of trees is used in a majority computation is

|{x ∈ X : p(x) > 0.5}| / |X|,

that is, the fraction of the examples which are more likely than not to be classified correctly by the average tree. Between these two extremes, the overall accuracy may have dips and peaks. In this paper, we experiment with majority voting using different numbers of trees. We use these experiments to empirically choose a value for k which seems to work well in practice.

RELATED WORK

k-DT is one of several different strategies for combining multiple classifiers. There are two common approaches to this problem. The first approach can be thought of as multi-level learning: a set of classifiers is trained, and their outputs are fed to another learning system, which learns an appropriate weighting scheme to apply to those outputs, in the hope of creating a more accurate classifier. Depending on the implementation, the two levels can be trained separately or simultaneously. Wolpert's (1992) stacked generalization technique and the hybrid technique developed by Zhang et al. (1992) are examples of separately trained systems. An example of a simultaneously trained system is Jacobs et al. (1991), in which the second learning level learns how to assign training examples to the different components of the first level. k-DT takes another approach. Only the first level is trained; the second level is a simple, easily understood, fixed strategy. We have used majority voting in this study, but other fixed strategies could also be used. Another system that takes this approach is the cluster back-propagation network of Lincoln et al. (1990).

THE SADT ALGORITHM

Although the majority voting technique could be applied to any randomized classifier scheme, k-DT was first conceived as a natural enhancement of our SADT algorithm. Accordingly, all of our experiments have been conducted on the SADT algorithm. To aid in the understanding of k-DT, we explain the workings of our SADT algorithm here. The basic outline of the SADT algorithm is the same as that of most other decision tree algorithms.
That is, we find a hyperplane to partition the training set and recursively run the partitioning algorithm on the two subsets that result. Here we describe how SADT searches for a good hyperplane.
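The outline just described can be sketched end to end; this is our own illustrative code, not the authors' implementation. The split search uses simulated annealing with the Sum-Minority measure defined later in the text, and the iteration count, starting temperature and cooling schedule are placeholders:

```python
import math
import random

def H(h, x):
    """Evaluate the hyperplane H(x) = h_{d+1} + sum_i h_i * x_i."""
    return h[-1] + sum(a * b for a, b in zip(h, x))

def sum_minority(h, examples):
    """Sum-Minority energy: total size of the minority class on each side."""
    sides = ([], [])
    for x, label in examples:
        sides[0 if H(h, x) > 0 else 1].append(label)
    classes = {label for _, label in examples}
    return sum(min(side.count(c) for c in classes) for side in sides if side)

def anneal_split(examples, iters=2000, t=2.0, cooling=0.995, rng=None):
    """SADT-style annealing: perturb one random coefficient by
    uniform(-0.5, 0.5); accept worse moves with probability exp(-dE/T);
    keep the best (lowest-energy) split seen."""
    rng = rng or random.Random(0)
    d = len(examples[0][0])
    h = [1.0] * d + [-1.0]          # initial hyperplane through the x_i = 1 points
    e = sum_minority(h, examples)
    best, best_e = list(h), e
    for _ in range(iters):
        cand = list(h)
        cand[rng.randrange(d + 1)] += rng.uniform(-0.5, 0.5)
        ce = sum_minority(cand, examples)
        if ce - e < 0 or rng.random() < math.exp(-(ce - e) / t):
            h, e = cand, ce
            if e < best_e:
                best, best_e = list(h), e
        t *= cooling                # reduce the temperature slightly each move
    return best

def build_tree(examples):
    """Find a split, then recurse on the two subsets until leaves are pure."""
    classes = {label for _, label in examples}
    if len(classes) == 1:
        return classes.pop()        # pure leaf: return its class label
    h = anneal_split(examples)
    left = [(x, c) for x, c in examples if H(h, x) > 0]
    right = [(x, c) for x, c in examples if H(h, x) <= 0]
    if not left or not right:       # degenerate split: fall back to majority label
        labels = [c for _, c in examples]
        return max(classes, key=labels.count)
    return (h, build_tree(left), build_tree(right))

def classify(tree, x):
    """Descend left when H(x) > 0, otherwise right, until a leaf."""
    if not isinstance(tree, tuple):
        return tree
    h, left, right = tree
    return classify(left if H(h, x) > 0 else right, x)
```

On a small linearly separable set this recursion terminates with pure leaves; in general, randomized restarts or impurity thresholds would be needed, which we omit here.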
In our implementation, d-dimensional hyperplanes are stored in the form
H(x) = h_{d+1} + Σ_{i=1}^{d} h_i x_i,

where H = {h_1, h_2, ..., h_{d+1}} is the hyperplane, x = (x_1, x_2, ..., x_d) is a point, and h_{d+1} represents the constant term. For example, in the plane the hyperplane is a line and is represented in the familiar ax + by + c = 0 form. Classification is done recursively. To classify an example, compare it to the current hyperplane (initially this is the root node). If an example p is at a non-leaf node labeled H(x), then we follow the left child if H(p) > 0; otherwise we descend to the right child. The first step in our algorithm is to generate an initial hyperplane. This initial hyperplane is always the same and is not tailored to the training set. We simply wanted to choose some hyperplane that was not parallel to any of the axes, so we used the hyperplane passing through the points where x_i = 1 and all other x_j = 0, for each dimension i. In particular, the initial hyperplane may be written in the above form as h_i = 1 for 1 ≤ i ≤ d and h_{d+1} = -1, since H(x) = 0 for each of these points. Thus in 3-D, we choose the hyperplane which passes through (1,0,0), (0,1,0), and (0,0,1). Many other choices for the initial hyperplane would be equally good. Once the annealing begins, the hyperplane is immediately moved to a new position, so the location of the initial split is not important. Next, the hyperplane is repeatedly perturbed. If we denote the current hyperplane by H = {h_1, h_2, ..., h_{d+1}}, then the algorithm picks one of the h_i's randomly and adds to it a uniformly chosen random variable in the range (-0.5, 0.5). Using our goodness measure (described below), we compute the energy of the new hyperplane and the change in energy ΔE.
If ΔE is negative, then the energy has decreased and the new hyperplane becomes the current split. Otherwise, the energy has increased (or stayed the same) and the new hyperplane becomes the current split with probability e^(-ΔE/T), where T is the temperature of the system. The system starts out with a high temperature that is reduced slightly with each move. Note that when the change in energy is small relative to the temperature, the probability of accepting the new hyperplane is close to one, but as the temperature becomes small, the probability of moving to a worse state approaches zero. In order to decide when to stop perturbing the split, we keep track of the split that generated the lowest energy seen so far at the current node. If this minimum energy does not change for a large number of iterations (we used numbers between 3,000 and 100,000 iterations in our experiments), then we stop making perturbations and use the split that generated the lowest energy. The recursive splitting continues until each node is pure; i.e., each leaf node contains only points of one category.

Goodness Criteria

SADT can work with any goodness criterion, and we have experimented with several. For detailed discussions of these measures, see Heath (1992) or Murthy et al. (1994). In this paper, we experiment with three of these criteria: information gain (Quinlan, 1986) and our own Max-Minority (MM) and Sum-Minority (SM) measures. We define MM and SM as follows.
Consider a set of examples X, belonging to 2 classes, u and v. A hyperplane divides the set into two subsets X1 and X2. For each subset, we find the class that appears least often. We say that these are the minority categories. If X1 has few examples in its
minority category C1, then it is relatively pure. We prefer splits that are pure; i.e., splits that generate small minorities. Let the number of examples in class u (class v) in X1 be u1 (v1), and the number of examples in class u (class v) in X2 be u2 (v2). To force SADT to generate a relatively pure split, we define the SM error measure to be min(u1, v1) + min(u2, v2), and the MM error measure to be max(min(u1, v1), min(u2, v2)).

EXPERIMENTS
Classifying irises

For our first experiment, we ran k-DT on Fisher's iris data, a well-known dataset that has been the subject of numerous other machine learning studies (see Holte, 1993, for a recent summary). The data consists of 150 examples, 50 each of three different types of irises: setosa, versicolor, and virginica. Each example is described by numeric measurements of width and length of the petals and sepals. We performed 35 ten-fold cross validation trials using SADT. In an x-fold cross validation trial, one divides the data into x approximately equal subsets and performs x experiments. For each subset s, we train the learning system on the union of the remaining x-1 sets and test on set s. The results are averaged over these x runs. Our results on the iris data are shown in Table 1.

Goodness    Average Error   Error Rate       Reduction   Best         Number
Criterion   Rate (%)        with 11 Trees    in Error    Error Rate   of Trees
MM          5.7             4.1              28%         4.1          9
SM          5.3             3.7              30%         3.3          33
IG          5.5             4.8              13%         4.8          5
Table 1. Iris results for k-DT. Shown in the table is the accuracy obtained when, for each training- and test-set pair, we take the majority vote of 11 trees when classifying the test set. Note that the accuracy when using the majority voting scheme is consistently higher than when using single SADT trees. Also shown in Table 1, in the last two columns, are results from the single best tree of the 35 different trials. Weiss and Kapouleas (1989) obtained accuracies on this data of 96.7%, 96.0%, and 95.3% with backpropagation, nearest neighbor, and CART, respectively. Their results were generated with leave-one-out trials, i.e., 150-fold cross validation.
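The x-fold cross-validation procedure described above can be sketched as follows. This is a minimal illustration; in particular, assigning examples to folds by striding rather than by random shuffling is our simplification:

```python
def cross_validation_folds(examples, x):
    """Split `examples` into x roughly equal folds; for each fold s,
    yield (training set, test set), where the training set is the
    union of the remaining x-1 folds."""
    folds = [examples[i::x] for i in range(x)]
    for i, test in enumerate(folds):
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, test
```

Averaging an error measure over the x (train, test) pairs gives the cross-validated estimate; setting x equal to the number of examples yields the leave-one-out procedure mentioned above.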
Choosing a value for k

How did we choose k=11 for our k-DT trees? Intuitively, it may seem that the more trees used in the voting process, the higher the combined accuracy will be. However, if an example is somehow 'difficult' to classify, then voting will only make it less likely that the example is classified correctly by the committee of trees.
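The committee vote itself can be sketched as follows (the `classify_one` interface is our assumption; any per-tree classifier would do):

```python
from collections import Counter

def committee_classify(trees, classify_one, x):
    """Label example x by majority vote over a committee of trees;
    `classify_one(tree, x)` returns a single tree's prediction."""
    votes = Counter(classify_one(tree, x) for tree in trees)
    return votes.most_common(1)[0][0]
```

With an odd k, a two-class vote can never tie, which is one practical reason to prefer values such as k=11.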
Figure 3 is a plot of average classification accuracy on the iris data set, as the number of trees in the voting process is varied. Note that there is a big jump in accuracy even when only three trees are used. The max-minority and information gain measures peak fairly early and begin to drop off, whereas the sum-minority measure is still increasing in accuracy at thirty-five trees.
[Figure 3. Average classification accuracy (%) on the iris data as a function of the number of trees used in voting; the visible portion of the vertical axis runs through 97-98%.]