STUDIES IN MULTIDISCIPLINARITY
VOLUME 2
Multidisciplinary Approaches to Visual Representations and Interpretations
SERIES EDITORS
Ray Paton* University of Liverpool, Liverpool, UK
Mary A. Meyer Los Alamos National Laboratory, Los Alamos, New Mexico, USA
Laura A. McNamara Sandia National Laboratories, Albuquerque, New Mexico, USA
On the cover. Harald F. Teutsch. Cross-section of a parenchymal u 1996, 85 x 80 cm, acrylics on paper on canvas.
Multidisciplinary Approaches to Visual Representations and Interpretations
EDITED BY
Grant Malcolm
The University of Liverpool, Liverpool, UK
2004
ELSEVIER
Amsterdam - Boston - Heidelberg - London - New York - Oxford - Paris - San Diego - San Francisco - Singapore - Sydney - Tokyo
ELSEVIER B.V. Radarweg 29 P.O. Box 211, 1000 AE Amsterdam The Netherlands
ELSEVIER Inc. 525 B Street, Suite 1900 San Diego, CA 92101-4495 USA
ELSEVIER Ltd The Boulevard, Langford Lane Kidlington, Oxford OX5 1GB UK
ELSEVIER Ltd 84 Theobalds Road London WC1X 8RR UK
© 2004 Elsevier B.V. All rights reserved.

This work is protected under copyright by Elsevier B.V., and the following terms and conditions apply to its use:

Photocopying
Single photocopies of single chapters may be made for personal use as allowed by national copyright laws. Permission of the Publisher and payment of a fee is required for all other photocopying, including multiple or systematic copying, copying for advertising or promotional purposes, resale, and all forms of document delivery. Special rates are available for educational institutions that wish to make photocopies for non-profit educational classroom use. Permissions may be sought directly from Elsevier's Rights Department in Oxford, UK: phone (+44) 1865 843830, fax (+44) 1865 853333, e-mail: [email protected]. Requests may also be completed on-line via the Elsevier homepage (http://www.elsevier.com/locate/permissions). In the USA, users may clear permissions and make payments through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA; phone: (+1) (978) 7508400, fax: (+1) (978) 7504744, and in the UK through the Copyright Licensing Agency Rapid Clearance Service (CLARCS), 90 Tottenham Court Road, London W1P 0LP, UK; phone: (+44) 20 7631 5555; fax: (+44) 20 7631 5500. Other countries may have a local reprographic rights agency for payments.

Derivative Works
Tables of contents may be reproduced for internal circulation, but permission of the Publisher is required for external resale or distribution of such material. Permission of the Publisher is required for all other derivative works, including compilations and translations.

Electronic Storage or Usage
Permission of the Publisher is required to store or use electronically any material contained in this work, including any chapter or part of a chapter. Except as outlined above, no part of this work may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the Publisher. Address permissions requests to: Elsevier's Rights Department, at the fax and e-mail addresses noted above.

Notice
No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

First edition 2004

Library of Congress Cataloging in Publication Data
A catalog record is available from the Library of Congress.

British Library Cataloguing in Publication Data
A catalogue record is available from the British Library.

ISBN: 0-444-51463-5
ISSN (Series): 1571-0831

The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper). Printed in The Netherlands.
Working together to grow libraries in developing countries
www.elsevier.com | www.bookaid.org | www.sabre.org
Dedication
This book is dedicated to the memory of Ray Paton, whose large discourse of reason gave birth to the conferences on Visual Representations and Interpretations.

Sure, he that made us with such large discourse,
Looking before and after, gave us not
That capability and god-like reason
To fust in us unused.
- William Shakespeare, Hamlet
Editor's preface
It is hard to think of an area of academic study that does not engage, in some way, with visual representations and their interpretations. Many disciplines make a direct contribution to the sciences, practices, or our understanding of visual representations. They illuminate the way we perceive, construct or construe such representations. The simple act of seeing an image can be studied from the perspective of physics, biology, psychology, sociology, cognitive science, computer science, fine art, design, media studies, engineering, anatomy, and philosophy - and surely this list is not exhaustive. Even if a discipline makes no direct contribution, it nevertheless makes an indirect contribution: every discipline has its own notations, which have some visual or diagrammatic form, and which have their own conventions of use and practices of interpretation. These notations, with their attendant conventions and practices, can be a fruitful area of study in their own right. Certainly, in my own field of Computer Science, a long-standing concern with notations and their deployment has grown up into an active area of research with connections to cognitive science, sociology, and even ethnomethodology. This kind of interdisciplinary study is, by its very nature, common to many disciplines, and provides an important means for ideas to propagate among different communities - a process that becomes increasingly important as fields (and researchers) become increasingly specialised. The value of multi-disciplinary research, the exchanging of ideas and methods across traditional discipline boundaries, is well recognised. It could be argued that many of the advances in science and engineering take place because the ideas, methods and the tools of thought from one discipline become re-applied in others. Because of its very breadth, the topic of 'the visual' is an extremely fruitful one for dialogue across disciplines. It has become increasingly important as
advances in technology have led to multi-media and multi-modal representations, and extended the range and scope of visual representation and interpretation in our lives. Under this broad heading there are many different perspectives and approaches, from across the entire spectrum of human knowledge and activity. The development of advanced graphics for computer games and film animations, for example, has drawn on and led developments in computational geometry. Outside the technological sphere, recent controversies over artworks show the power of the visual to manifest wildly different interpretations, and to become a topic of everyday conversation and a focus of political activity. This volume contains revised papers from the second international conference on Visual Representations and Interpretations (VRI 2), which took place in Liverpool in September, 2002. The first VRI conference, also held in Liverpool, provided a forum for researchers to communicate their work to other researchers in a wide variety of disciplines. VRI 2 continued this young tradition, and provided a very open forum for researchers working in any area concerned with visual representations and interpretations. The conference brought together workers in Bioengineering, Biology, Cognitive Science, Computer Science, Design, Engineering, Fine Arts, Linguistics, Mathematics, Medicine, Philosophy, Physics, Psychology, Psychotherapy, and Statistics. Moreover, all the papers presented at the conference showed, in one way or another, a concern with reaching out across disciplinary boundaries. One of the successes of the conference was the emergence of various dialogues between participants. In order to continue those dialogues, contributors were given the opportunity to revise and update their presented papers. The results make up the body of this book. 
The breadth of the papers made it difficult to order the chapters, and as an editor, I was tempted simply to put them in alphabetical order and invite the reader to browse through them as he pleased, and discover for himself the connections between them. However, some themes emerged quite clearly, and I took the editor's privilege of grouping the chapters into parts to emphasise those themes. Many other reasonable groupings would be possible, and browsing, dipping, and flicking through the chapters is recommended. Perhaps, the clearest theme is brought out in the chapters forming Part I. Each of these six chapters is concerned with some aspect of the use of the visual in the sciences, although even here there is a wide variety of approaches and concerns. Gooding gives an excellent opening to the whole book with an account of the role of visualisation and visual representation in scientific discovery and communication. The chapter introduces many of the themes that arise in later chapters, including the exploration of the 'personal'
and 'public' dimensions of visual representations, and the relationships between representations and inference. Bertamini, Spooner, and Hecht show how interpretation of visual information can go awry. The 'personal' cognition of physical processes can be strikingly at odds with reality, even though individuals are adept at working with those physical processes in everyday actions. This chapter raises another theme that recurs in later chapters: the difference between 'seeing that' and 'seeing as'. Perini takes a philosophical approach to scientific visual representations, examining their textual and pictorial aspects, and bringing a technical notion of isomorphism to bear on the question of how scientific representations denote and therefore, presumably, are interpreted. The application of the notion of isomorphism directly addresses the structure of visual, particularly diagrammatic, representations. Parish addresses the issue of visually representing structure in molecular biology, and raises the question of whether DNA itself can be read as a description. The process of interpretation would then be a functional process; but how would this be related to cognitive processes of interpretation? The discussion again addresses the structure of visual representations, and its relationship to the represented structure. Teutsch again is concerned with structure, in a discussion of the 'modular design' of the rat liver. This chapter shows how making cell function visible allows structural relationships to be recognised. Binz, Pods, and Schempp give a technical account of how Heisenberg groups give a mathematical basis for modelling information and information transmission, giving applications to image transmission. They also address the issue of information in the double-helix structure of DNA.
The issues of structure in visual representation and interpretation, and the relationships between representation and inference, link the seven chapters in Part II: Signs and Systems. Goguen and Harrell's presentation at the conference had, unfortunately, to be cancelled, but their chapter here raises again the theme of structure in representation and interpretation. This chapter uses a notion of 'structure-preserving' transformation (technically, a weaker notion than Perini's isomorphism) in addressing the issue of design quality for user-interfaces in software systems. This issue is one application of the semiotic notions that are introduced in this chapter. Quite similar semiotic notions are used in Norman's chapter on 'direct' and 'indirect' interpretation. In the context of distinguishing diagrammatic and sentential representations, Norman introduces types of iconicity as a key concept in understanding direct interpretations. Peirce's existential graphs are used as an example here, and also in Pietarinen's chapter, which looks at the role of diagrammatic logical representations in concept modelling, and argues that these allow strategies to be expressed. The argument is strongly supported by
the relationships explored here between diagrammatic logic and game-theoretical semantics. Paton's chapter takes up the theme of concept modelling, and explores the uses of various graph structures to elicit, describe and model knowledge. The modelling approach described here applies to dynamic bodies of knowledge, and the dynamics are captured by allowing different types of graphical representations. Coherence between these different representations involves 'structure-preserving' relationships, treated intuitively here to allow for 'open' modelling of dynamic knowledge. Concept modelling again arises in Luchjenbroers' chapter on the uses of visual and verbal cues in discourse. Luchjenbroers makes a very elegant use of conceptual mappings and conceptual blends in analysing the ways that gestures convey and elaborate information in spoken discourse. The ensuing chapter, by Carroll, Luchjenbroers, and Parker, is again concerned with discourse analysis, and presents an example of the use of textual and video analysis of a discourse. The example illustrates how the two analyses differ on the establishment of rapport between speakers. Finally, Karatzas and Antonacopoulos bring a different approach to the theme of pictorial and textual representations. Their chapter addresses the technological challenges of mechanically extracting text from pictures, particularly in web pages. Interestingly, their proposal involves structuring colour representations to mirror the ways in which humans perceive colour. The ways in which visual representations mediate communication, either in its own right or as part of a teaching and learning process, are the theme of Part III: Communication and Learning. Lee explores components and simplifications in the use of pictures to communicate concepts, and compares semiotic and philosophical approaches and an approach that takes context, and especially the notion of 'speaker'-convergence, into account.
Leishman and McNamara's chapter gives a very multi-disciplinary approach to graphical representations in multi-disciplinary projects. The graphical representations discussed here are not only used for knowledge modelling and prediction but also as a part of a communication process. Here too, as the representations are refined, a notion of convergence comes into play. Lund and Paton describe the use of a visual metaphor as a means of communication between patient and psychotherapist. Here, convergence is balanced against a need for openness in interpretations. The remaining chapters in this part are concerned - to varying degrees - with the use of software systems in teaching and learning. In a duet of papers, Sedig, Morey, Mercer, and Wilson discuss the use of software systems in learning through exploration. The first of these, by Sedig and Morey, analyses at a general level the different forms of interaction provided by
user-interfaces, while the second discusses the design and use of a particular tool that allows the user to explore a particular kind of lattice through visual representations. The use of visual tools in education is further discussed by Jenschke, Fangera, and Arnstein, in a chapter that describes the Labscape system used by high-school students. The experiment described in this chapter again emphasises the importance of structured interaction with a visual user-interface. Software tools help students, but they can also embody principles used by practitioners of an academic discipline. The chapter by Whiteley compares these two aspects in a discussion of the role played by visual representations in Mathematics. Part IV is concerned with the generation and use of visual representations, particularly drawings. Biggs discusses Wittgenstein's picture theory of meaning, and argues that this is not a simple analogy of how drawings and language depict, but that it includes an inferential force: like an engineering drawing, performance and action can be calculated from the notation. August, Eckert, and Clarkson's chapter is included here as it is concerned explicitly with engineering drawings: in particular with matrix representations of design processes. These have both a depictive and computational force, and the authors address the issues of interpreting large-scale diagrams. The theme of drawing and design is picked up by Rose, in a chapter that discusses the skills and cognitive processes involved in successful drawing. An example in this chapter, involving drawing bicycles, recalls the naive knowledge discussed in the chapter by Bertamini, Spooner, and Hecht. The role of functional understanding and perception of structure in drawing is explored by Ferreira, Ball, Friede, and Scrivener. This chapter reports on an experiment to elucidate the cognitive processes involved in drawing objects from memory, with applications to drawing designs of complex objects.
The perceptual and affective aspects of visual representations and their interpretations are the subject of Part V: Seeing and Responding. This opens with Latto's discussion of the aesthetic affect of shapes, and in particular of the orientation of lines. Experiments show that orientation contributes to aesthetic value, and the chapter discusses the relationship between aesthetic value and cognitive process. Bradley gives a philosophical account of colour perception and proposes a definition of colour experience that unifies the different colours perceived by different species. The chapter by Nagl gives an analysis of artworks, and in particular the paintings of Frida Kahlo, in an exploration of how we view our bodies in the 'post-genomic' age. The issue of how people relate to technology is reflected in the relationship between patients and medicine. Returning to visual perception, Zschocke analyses the work of the contemporary artists Turrell and Fontcuberta, focusing on their play with the phenomenology of vision,
pushing perception and interpretation to extremes to provoke a feeling of unease or 'irritation'. The relationships between people and technology are further explored by Gschwendtner, in a study of how this relationship is portrayed in films. Freudian concepts are used to illustrate an increasingly complex relationship between people and increasingly complex machines. Finally, Holcombe, Smith, Merewood, and Swingeford bring a new slant to the relationship between people and machines by describing a computer program that produces pictures in the styles of Mondrian, Escher and Klee. An analysis of the constructive techniques of the artists is distilled into rule-based computations. The analysis of the techniques deployed by Mondrian to achieve different effects is a nice reflection of Latto's analysis of affect in the opening chapter of this part. This grouping simply reflects some of the connections between the contributions to this book. There are many more. The index at the back of this book should help the reader discover at least some of these. The present volume is more of a reflection of ongoing dialogues than the proceedings of a conference, but these would not have been possible without the VRI conference, and I am grateful to all the participants for making it a stimulating meeting. The conference itself would not have been possible without the efforts of a large number of people. I am particularly grateful to the Programme Committee: Caroline Baillie, Michael Biggs, Ernst Binz, Nicola Dioguardi, Andrée Ehresmann, Paul Fishwick, Bob Franza, Jean-Louis Giavitto, Peter Giblin, Joseph Goguen, David Goodsell, Leo Groarke, Rom Harré, Robin Hendry, Mike Holcombe, John Lee, Charles Lund, Michael Leyton, Peter McBurney, Mary Meyer, Arthur Miller, Irene Neilson, Ray Paton, Walter Schempp, and Peter Wright.
In particular, Peter McBurney and especially Ray Paton put tremendous effort and enthusiasm into the organisation of the entire conference, Irene Nielson developed a very attractive and useful website, and Brian Reay kept everything going smoothly. Thanks are also due to the Department of Computer Science at the University of Liverpool for sponsoring the conference, to the Tate Liverpool for giving us a guided tour of their collection, to the Moathouse Hotel for providing an excellent venue, to Thelma Williams, of the Department of Computer Science, for all her help with administration, and to Geoff Beard, of the University of Liverpool, for helping smooth out the difficulties we ran into on the way. During the final stages of the production of this book we received the very sad news that Ray Paton had been taken ill suddenly and passed away. Ray was a dear friend and colleague, and a tireless worker whose interests
covered many fields. The breadth of his interests is reflected in the VRI conferences: Ray organised and chaired the first VRI conference; the second, and this book, would not have been possible without him. His breadth of interests, and his energy and enthusiasm, also meant that he made connections and friendships with a large number of colleagues all over the world. I am sure that they, like me, will miss him greatly, and I am grateful for the chance to dedicate this book to the memory of Ray C. Paton.

Grant Malcolm
Department of Computer Science
The University of Liverpool
Contributors
Antonacopoulos, A. - PRIMA Group, Department of Computer Science, University of Liverpool, Peach Street, Liverpool L69 7ZF, UK
Arnstein, L. - Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA
August, Elias - Engineering Design Centre, Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK
Ball, Linden J. - Faculty of Applied Sciences, Lancaster University, Lancaster LA1 4YF, UK
Bertamini, M. - Department of Psychology, University of Liverpool, Eleanor Rathbone Building, Liverpool L69 7ZA, United Kingdom
Biggs, Michael A.R. - Faculty of Art and Design, University of Hertfordshire, College Lane, Hatfield, Herts AL10 9AB, UK
Binz, Ernst - Lehrstuhl für Mathematik I, Universität Mannheim, 68131 Mannheim, Germany
Bradley, P. - Philosophy Department and Philosophy-Neuroscience-Psychology Program, Washington University in St. Louis, Campus Box 1073, St Louis, MO 63130, USA
Carroll, P. - Department of Linguistics, University of Wales, Bangor, Gwynedd LL57 2DG, Wales, UK
Clarkson, P. John - Engineering Design Centre, Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK
Eckert, Claudia - Engineering Design Centre, Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK
Fangera, N. - Cell Systems Initiative, Department of Bioengineering, University of Washington, Seattle, WA, USA
Ferreira, Isabelle M.S. - Faculty of Applied Sciences, Lancaster University, Lancaster LA1 4YF, UK
Friede, Tim - Faculty of Applied Sciences, Lancaster University, Lancaster LA1 4YF, UK
Goguen, Joseph A. - Department of Computer Science and Engineering, University of California, San Diego, CA, USA
Gooding, David - Science Studies Centre, Department of Psychology, University of Bath, Bath BA2 7AY, UK
Gschwendtner, Andrea - Berlin University of Fine Arts, Berlin, Germany
Harrell, D. Fox - Department of Computer Science and Engineering, University of California, San Diego, CA, USA
Hecht, H. - Man-Vehicle Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Building 37-219, Cambridge, MA 02139-4307, USA
Holcombe, Mike - Department of Computer Science, University of Sheffield, Portobello Street, Sheffield S1 4DP, UK
Jenschke, L. - Cell Systems Initiative, Department of Bioengineering, University of Washington, Seattle, WA, USA
Karatzas, D. - PRIMA Group, Department of Computer Science, University of Liverpool, Peach Street, Liverpool L69 7ZF, UK
Latto, Richard - Department of Psychology, University of Liverpool, Eleanor Rathbone Building, Bedford Street South, Liverpool L69 7ZA, UK
Lee, John - Department of Architecture, Human Communication Research Centre, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, UK
Leishman, Deborah - Statistical Sciences Group, Los Alamos National Laboratory D-1, Los Alamos, NM, USA
Luchjenbroers, J. - Department of Linguistics, University of Wales, Bangor, Gwynedd LL57 2DG, Wales, UK
Lund, C.A. - Regional Department of Psychotherapy, Newcastle City Health Trust, Newcastle upon Tyne, UK
McNamara, Laura - Statistical Sciences Group, Los Alamos National Laboratory D-1, Los Alamos, NM, USA
Mercer, R. - Cognitive Engineering Laboratory, Department of Computer Science, The University of Western Ontario, London, Ont., Canada
Merewood, Rowan - Department of Computer Science, University of Sheffield, Portobello Street, Sheffield S1 4DP, UK
Morey, J. - Cognitive Engineering Laboratory, Department of Computer Science, The University of Western Ontario, London, Ont., Canada
Nagl, Sylvia B. - Department of Oncology, Royal Free and University College Medical School, Rowland Hill Street, London NW3 2PF, UK
Norman, Jesse - Philosophy Department, University College London, Gower Street, London WC1E 6BT, UK
Parish, J. H. - School of Biochemistry and Molecular Biology, The University of Leeds, Leeds LS2 9JT, UK
Parker, S. - Materials Science LTSN, Liverpool University, Liverpool L69 3GH, UK
Paton, R. C. - Department of Computer Science, The University of Liverpool, Liverpool L69 3BX, UK
Perini, Laura - Philosophy Department, Virginia Polytechnic and State University, Blacksburg, VA 24060, USA
Pietarinen, Ahti-Veikko - Department of Philosophy, University of Helsinki, P.O. Box 9, FIN-00014 Helsinki, Finland
Pods, Sonja - Lehrstuhl für Mathematik I, Universität Mannheim, 68131 Mannheim, Germany
Rose, Chris - University of Brighton, Brighton BN2 2JY, UK
Schempp, Walter - Lehrstuhl für Mathematik I, Universität Siegen, 57068 Siegen, Germany
Scrivener, Stephen A.R. - School of Art and Design, Coventry University, Priory Street, Coventry CV1 5FB, UK
Sedig, K. - Cognitive Engineering Laboratory, Department of Computer Science and Faculty of Information and Media Studies, The University of Western Ontario, London, Ont., Canada
Smith, Samantha - Department of Computer Science, University of Sheffield, Portobello Street, Sheffield S1 4DP, UK
Spooner, A. - Department of Psychology, University of Liverpool, Eleanor Rathbone Building, Liverpool L69 7ZA, United Kingdom
Swingeford, Andy - Department of Computer Science, University of Sheffield, Portobello Street, Sheffield S1 4DP, UK
Teutsch, Harald F. - Department of Anatomy and Cellular Neurobiology, University of Ulm, Albert-Einstein-Allee 11, D-89069 Ulm, Germany
Whiteley, Walter - Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, Canada M3J 1P3
Wilson, W.W. - Cognitive Engineering Laboratory, Department of Computer Science, The University of Western Ontario, London, Ont., Canada
Zschocke, Nina - Institute of Art History, University of Cologne, An St Laurentius 8, 50923 Köln, Germany
Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
V
Editor's preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
vii
List of contributors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
XV
Part I. Visual representations in science 1. Visualisation, inference and explanation in the sciences, by David Gooding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2. The representation of naive knowledge about physics, by M. Bertamini, A. Spooner and H. Hecht . . . . . . . . . . . . . . . . . . .
27
3. Convention, resemblance and isomorphism: understanding scientific visual representations, by Laura Perini . . . . . . . . . . . . . .
37
4. Emerging descriptions in molecular biology, by J.H. Parish . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
49
5. Modular design of the liver of the rat, by Harald F. Teutsch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
63
6. The Heisenberg group as a fundamental structure in nature, by Ernst B inz, Sonja Pods and Walter Schempp . . . . . . . . . . . . . . .
69
Part II. Signs and systems 7. Information visualisation and semiotic morphisms, by Joseph A. Goguen and D. Fox Harrell . . . . . . . . . . . . . . . . . . . . .
83
8. Iconicity and "direct interpretation", by Jesse Norman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
99
9. Diagrammatic logic and game-playing, by Ahti-Veikko Pietarinen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
115
xx
Contents
10. Mobilising knowledge models using societies of graphs, by R.C. Paton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
135
11. Verbal and visual cues for navigating mental space: conceptual mappings and discourse processing theory, by J. Luchjenbroers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
147
12. Sounds, signs, and rapport: on the methodological importance of a multi-modal approach to discourse analysis, by P. Carroll, J. Luchjenbroers and S. Parker . . . . . . . . . . . . . . . . . .
165
13. Visual representation of text in Web documents and its interpretation, by D. Karatzas and A. Antonacopoulos . . . . . . . . .
181
Part lII. Communication and learning 14. Component modes of graphical communication, by John Lee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
197
15. Interlopers, translators, scribes, and seers: anthropology, knowledge representation and Bayesian statistics for predictive modelling in multidisciplinary science and engineering projects, by Deborah Leishman and Laura McNamara . . . . . . . . .
211
16. Developments in the use of a visual metaphor with reference to clinical problems, by C.A. Lund and R.C. Paton . . . . . . . . . . . .
229
17. A descriptive framework for designing interaction for visual abstractions, by K. Sedig and J. Morey . . . . . . . . . . . . . . . . . . . . . . .
239
18. Visualising, interacting and experimenting with lattices using a diagrammatic representation, by K. Sedig, J. Morey, R. Mercer and W.W. Wilson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
255
19. Labscape for education: Ballard High School Pilot Project, by L. Jenschke, N. Fangera and L. Arnstein . . . . . . . . . . . . . . . . . . .
269
20. Teaching to see like a mathematician, by Walter Whiteley . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
279
Part IV. Drawing
21. Visualisation and Wittgenstein's "Tractatus", by Michael A.R. Biggs
22. Using design structure matrices in visualising design processes, by Elias August, Claudia Eckert and P. John Clarkson
23. Vision and drawing in design, by Chris Rose
24. Sketching behaviour in object recall and object copying, by Isabelle M.S. Ferreira, Linden J. Ball, Tim Friede and Stephen A.R. Scrivener
Part V. Seeing and responding
25. Do we like what we see? by Richard Latto
26. The unity of colour: a quasi-functionalist proposal, by P. Bradley
27. Art and post-genomic medicine, by Sylvia B. Nagl
28. The strategy of visual irritation: forms of ambiguous representation in contemporary art, by Nina Zschocke
29. Interaction of people and machines as a narrative and visual figure in film: a study of motifs, by Andrea Gschwendtner
30. Computational modelling of creativity in abstract art, by Mike Holcombe, Samantha Smith, Rowan Merewood and Andy Swingeford
Index
Studies in Multidisciplinarity, Volume 2
Editor: G. Malcolm
© 2004 Elsevier B.V. All rights reserved.
Visualisation, inference and explanation in the sciences
David Gooding
Science Studies Centre, Department of Psychology, University of Bath, Bath BA2 7AY, UK
This chapter draws attention to case studies of scientific discovery as an important source of information about visualisation as a real-time cognitive process. These show that visual modes of representation are essential to the generation, communication and dissemination of new knowledge. I survey a number of strategies of visualisation and develop a process model based on the ways that scientists in a wide range of fields manipulate the dimensionality of images in order to move between representations that are local, situated and often personal and images that are widely understood and have an objective status as depictions of facts or laws. This shows that the cognitive status of visual images changes as scientists integrate novel representations into their arguments.
1. INTRODUCTION
Case studies of innovation and discovery in science and technology are an important but neglected source of information about the uses of visualisation. These show that visual modes of representation are essential to the generation and dissemination of new knowledge (Rudwick, 1976; Ferguson, 1977, 1992; Gooding, 1982, 1985, 1998; Miller, 1986, 2001; Lynch and Woolgar, 1990; Rasmussen, 1997; Jones and Galison, 1998; de Chadarevian and Hopwood, 2004). These studies confirm what we have known all along but have barely begun to describe, let alone theorise in a systematic
way - that visualisation is integral to many kinds of thinking. I am investigating the ways that scientists construct and manipulate mental images (sometimes called spatial cognition). While historical case studies provide suggestive material, historians are interested in coherent narratives rather than general theories about cognitive processes, while sociologists generally attend to the use of images only as a consensual basis for the dissemination and acceptance of facts (Latour, 1990; Henderson, 1991; Beaulieu, 2001). Despite some suggestive work by Gregory (1981) on perceptual hypotheses, Hanson (1958) and Shelley (1996) on perceptual inference as abduction, Kemp (2000) on structural intuitions and Tversky (2002) on psychological studies relevant to visual depiction, our understanding of visualisation and visual reasoning remains vague and sketchy. The neglect of real-world examples by psychology is hardly surprising. The complexity and the relatively long duration of the processes place them beyond the scope of an experimental approach. According to Anderson (2002), the timescales of processes suitable for experimental studies designed according to Newell's (1990) four hierarchical bands of cognition can differ by over seven orders of magnitude, from months to microseconds. The real-time processes we would like to explain involve processes in at least three of Newell's four bands, taking minutes, hours, days or even decades to complete (Gruber, 1974; Westfall, 1980; Gooding, 1990a,b; Holmes, 1991; Tweney, 1992). In addition to this temporal incompatibility, case studies show that spatial cognition often integrates a range of experiences that vary in duration and also originate in different sensory modalities. This suggests that visualisation cannot be theorised in terms of Newell's classification of cognitive processes according to the time-intervals in which the tasks used to define different kinds of cognition are completed.
Moreover, images have a further function in the dissemination of science, where they are used to codify new knowledge (such as empirical laws, cf Cheng, 1994) and to organise and structure demonstrations, as for example in textbooks and in thought experiments (Brown, 1991; Sorensen, 1992). Studies suggest that while some visualisations remain integral to thought and argument, others decline in importance as the vocabulary and discursive practices of a field develop. Insofar as science is about winning acceptance through argument and demonstration as well as proposing new constructions, our approach must recognise what (if anything) is irreducibly visual about each stage in the development of a scientific field, if we are to capture and represent the cognitive dynamics of these activities. (Even this is not quite enough. In the longer term we need to investigate also those cases where visualisation fails, that is, where the limitations of the visual are exposed by the need to transcend naturalistic or depictive types of representation - see Miller, 1986, 1996.) In many cases, visual representations are displaced by verbally
expressed imagery or by expressions such as icons and symbols, whose meaning has been fixed. Images such as sketches that were crucial aids to interpretation at the start of an investigation may give way to verbal descriptions and mathematical formulae as the language of description is enhanced, only to become crucial again in the context of popular dissemination. The importance of plasticity to the articulation of meaning has been noted by Gooding (1982, 1986) and by Henderson (1999, pp. 198 ff.), who argues that much of the power of visual representations lies in the fact that they can carry information both explicitly and implicitly, giving scope for negotiation in the fixing of meaning. Both the importance of visual representations and their plasticity vary in relation to the novelty or familiarity of an experiential domain, and in relation to how well developed the repertoire of associated verbal and symbolic representations is. This is to be expected because novel, unfamiliar processes are more readily grasped if we can visualise them in terms of familiar elements of experience. It is necessary to understand the world in terms of which the process is made intelligible. That world is invoked and defined by visualisation embedded in an experimental narrative, which extends the experimenter's world to introduce unforeseen possibilities. In this way, visual representation helps articulate intuitions, which, once articulated, can become part of a verbally represented argument that draws upon familiar experience and extends it into the unforeseen. For example, Davy, Ampère, Biot and others required no drawings or engravings to report their experimental observations as published in the Philosophical Transactions and the Mémoires de l'Académie.
Nevertheless, visualisation in terms of geometrical forms had been essential to their initial ordering of disparate experiences into phenomena and to the subsequent description of new electromagnetic phenomena (Gooding, 1990a,b, Chapters 2-4). The variable status and meaning of visualisations suggest that the long-standing debate about whether the representations underlying mental imagery are themselves analogical or propositional (Kosslyn, 1981; Pylyshyn, 1981) cannot be resolved by empirical means.
2. OSTENSION: MAKING IMAGES THAT DEPICT
Images are typically taken as depictions of an actual or possible state of affairs. This is the naturalistic attitude, which scientists share with artists and with most of the rest of the population. Science is representational and naturalistic and like art it can extend and change what counts as being naturalistic. Miller argues that prior to the rise of quantum theory in the early
decades of the 20th century, "physicists had dealt with physical systems that with some justification were assumed to be amenable to their perceptions, and for which the space and time pictures of classical physics were applicable" (Miller, 1986, p. 128). We may ask, "What makes a representation amenable to a set of perceptions?" Many psychological studies demonstrate the dynamical, constructive character of perception, indicating that little, if anything, is given in experience (Gregory, 1981, pp. 383-415; Gibson, 1986). Yet the same presumption about the natural congruence of visual representations to aspects of experience underlies the maxim suggested by Tversky's review of graphical depictions: "Use spatial elements and relations naturally. Naturalness is found in natural correspondences, 'figures of depiction,' physical analogs, and spatial metaphors, derived from extensive human experience with the concrete world. It is revealed in language and in gesture as well as in a long history of graphic inventions" (Tversky, 2002, p. 111). There is an important distinction to be made between what appears natural and what is made to appear natural. Some aspects of perception such as the tendency to prioritise vertical and horizontal alignments over oblique ones (Latto, 2003), or the innate tendency to see a human face in certain surface features on Mars or a wormlike structure in meteoric material (fig. 1) are natural in the sense that they are innate or biologically endowed. Others can be attributed to the repertoire of experiences we bring to bear upon new experience. However, amenability and naturalness mainly arise from people's attempts to communicate their experience of phenomena that they take to be natural. Most of the perceptions we are interested in here owe their existence to human action and construction. Ostension is the act of linking a token to the object it names or denotes.
Fig. 1. Left: the face of Cydonia (source: NASA Mars surface image). Right: calcite structures seen in electron micrograph of Martian meteorite ALH84001 (source: NASA).
Although ostension is often treated as a matter of making a connection between two givens, words and objects (Austin, 1962, pp. 121-122), in science as in art the depicting of something by another sort of thing is accomplished, not given (Wittgenstein, 1953, §167). We say that an image depicts when it has a direct resemblance to what it is an image of. We may describe the depiction as transparent, natural or realistic. However, "directness", "resemblance", "transparency" and "realism" are not themselves transparent notions. They depend on culturally established conventions. The effectiveness of these conventions may of course depend on how well they invoke innate cognitive capacities. We tend to assume that these capacities are constant across domains and cultures. But this assumption is difficult to investigate for the sciences because most case studies focus on the culture-specific aspects of image-making in science (see, for example, Galison and Stump, 1996; Galison, 1997; Jones and Galison, 1998). There are several ways of exposing cognitive factors relevant to a psychological theory of visualisation in science. One is to trace the development of primary modes of representation such as numerical (digital) and visual-verbal (analog). Notwithstanding the apparent methodological priority of measurement and numerically presented data, scientific disciplines do not simply develop through a "soft" or qualitative infancy to a "hard", quantitative maturity. Many episodes in the history of science show that counting and imaging are essential and that neither has supremacy (Galison, 1997; Gooding, 2002). Another approach is to examine transitions in science and art between perception-based depictions and visualisations of what is known but cannot possibly be experienced, even indirectly (Miller, 1986, 1994). A third, to which I now turn, is to look at innovation and discovery. Successful visualisation of novelty may draw on cognitive processes in a more revealing way than more standardised modes of communication do.
What do you do when you want to describe a phenomenon that has never been seen before or features that have never been noticed or deemed as relevant to the depiction of a phenomenon or process? A new material image has to be created alongside the associations and conventions that establish it as an image of something that deserves a place in our experience. Here, the existing repertoire of descriptive resources is necessary but cannot be sufficient to solve the problem. This is because successful visualisation of new experience requires that an image does more than draw on an existing cultural repertoire of visual meanings and associated conventions.
2.1. Direct depiction
Consider an example from the history of art. During the 19th century, landscape painters developed new ways of depicting foliage so as to differentiate between different species of trees. Artists' interest in scientific modes of observation called for new depictions of the form and foliage of trees, but no model existed for the depiction of foliage (Hartley, 1996, p. 158). As artists sought to be scientific, they needed to differentiate between species and to show the effects of distance and atmospheric conditions. Imaging this called for new ways of moving pencil and brush. An example is the particular method of moving a soft pencil in order to achieve a particular "touch" to capture a particular type of foliage (see fig. 2). This involved establishing both "correct" and "incorrect" techniques (fig. 3). Previously artists had not noticed, or had not cared to depict such differences. Here depiction involves a move from something (not previously noticed) to a new set of discriminations captured by active manipulation of pencil or brush. Artists such as Edward Kennion, J.D. Harding and later, John Constable also had to persuade their audience that the new images stood for distinct species and show how each representation recognised atmospheric conditions. Seeing newly differentiated types of trees in a painting depended both on mastery of a technique by the artist and on the viewer's ability to read the new types of markings. Neither of these skills came simply from looking at nature; both were invented, learned and then taught. This example illustrates how the having of an experience depends on new techniques as well as shared associations and conventions.
Fig. 2. The "touch of oak" by J.G. Strutt (Magazine of Natural History, Vol. 1, 1828).
Fig. 3. Correct and incorrect depictions of foliage from a manual by J.D. Harding, Elementary Art (London, 1834).
2.2. Interpretation as visualisation by reconstruction
Archaeologists regularly use techniques of reconstruction to aid the interpretation of found objects. Thus they learn to make flints in order to study the markings made by this process. This enables them to discriminate between markings caused by natural processes such as erosion and the effects of intentional, human action such as flint making. Once they have learned to identify broken stones from their markings and from their context as tools, they can work out their function (Schick and Toth, 1993; Shelley, 1996). This calls for experiments that reconstruct the different uses such as skinning animals, breaking bones to extract marrow, carving bones into tools, etc., in order to study the effects both on the bones and on the stones used as tools. These experiments are a kind of simulation: they produce new evidence about ancient objects. They teach us to see each object in terms of the characteristic patterns that identify its function. This illustrates the importance of human image-making to our coming to know what these objects are, why they appear as they do, and how they were made. Reconstructive methods are not confined to the sciences: on the contrary, as cognitive agents, scientists have always drawn on representational practices drawn from culture. In particular, science owes a great deal to practices of art (as ars or makers' knowledge), as argued by historians of medieval science and technology (Crombie, 1953; de Santillana, 1962; Perez-Ramos, 1988). More recently historians of art such as Martin Kemp and the artist David Hockney have argued that reconstructive techniques show that the photographic realism of the paintings of grand masters including van Eyck, Caravaggio and Holbein, is due to the usage of imaging devices such as concave mirrors and camera obscura (see Kemp, 2000, pp. 28-29, pp. 64-65; Kemp, 2001, Hockney, 2001). 
Hockney believes that they painted not only from the world using the methods of linear perspective, but also from images reflected or projected from the world. This view suggests a nice analogy between the techniques that lie behind representation in art and in science. This is further supported by the example of other painters who transformed our vision such as Picasso, who, in the early 20th century, combined the new technology of photography with inventive uses of glass negatives to create new painted images such as his Demoiselles of 1907 (Miller, 2001). Miller argues that Picasso shone light through stacks of glass negative images of his own paintings onto a fresh canvas to produce the
composite, multi-perspectival images that came to define cubism. Half a century later, Kendrew used a similar method of stacking images held in sheets of lucite to produce the first 3D model of a myoglobin molecule (see Kendrew, 1961; Kemp, 2000, pp. 118-119; de Chadarevian and Hopwood, 2004). Recent modelling of the vascular structure of the liver involves the same basic procedures on 2D and 3D representations (Teutsch et al., 1999; Teutsch, 2003). In what follows, I will move beyond the rather vague notions of resemblance, naturalness and structural intuitions to identify some important features of the process of creating naturalistic representations that successfully depict scientific facts, and to show that these features define a process at work in many different scientific fields.
3. SOME OSTENSIVE PRACTICES IN SCIENCE
Not all representations depict in the way that these images and objects do. Nevertheless, a wide range of scientific work involves ostensive practices, which link images to aspects of experience of interest to scientists. Even those scientists working in today's technologically complex, industrial-scale research laboratories wish to be able to image entities and processes. The history of science "bears witness to the desire of scientists for visual imagery" even in situations in which normal or natural modes of visualisation cannot apply (Miller, 2001, p. 36). After all, there are limits to what can be achieved by manipulating symbols or the statistical analysis of data. For example, physicists in the "image" tradition of high-energy physics developed technologies to produce detailed images of particle collisions and decay, as tracks and patterns created by the motion of particles through gases and emulsions. Those in the quite distinct "logic" tradition developed mechanical (and later computerised) techniques for detecting particle events in large quantities of numerical data (Galison, 1997). The existence of both an imaging approach and a mechanised numerical method of obtaining information about fundamental particles indicates the continuing importance of both picturing and of counting and classifying as fundamental modes of dealing with experience (Gooding, 2002). Given two different kinds of information about the possible structure of crystalline DNA, the contrasting approaches of Rosalind Franklin and of Watson and Crick indicate the same fundamental difference in preferred mode of representation. Whereas Franklin and Wilkins chose to analyse crystal structure by Patterson projections (a mathematical simulation method used to compensate for the phase effects of the scattering of X-rays by atoms of the crystal), Watson played down the importance of scattering effects, focusing
on the suggestiveness of X-ray diffraction patterns for a structural model of the DNA molecule.
3.1. Visualisation by analogy
The existence of viral particles was theorised and then imaged in the 1960s. An early article by Wildy et al. (1960) contains two adjacent images: one is the pattern made by the scattering of electron beams by a viral particle; the other shows the patterns of light and shade made by a wooden model. In the absence of strong theoretical constraints on possible structures, the shadows made by the macroscopic object validate an interpretation of an X-ray diffraction pattern caused by the submicroscopic object. So, notwithstanding the differences between X-rays and visible light, the electron micrograph is taken to be an image of something that is very like the wooden model (see also Kendrew, 1961; Bragg, 1968; Olson and Goodsell, 1992). The method of analogy is very old. Galileo used it to interpret his sketches of the scarred, cratered surface of the Moon. His carefully crafted images were disbelieved - surely a celestial body like the moon could not suffer scarring and deformation? Galileo's images were doubted because they were images of art (ars), i.e. not derived by acceptable intellectual methods (scientia - see de Santillana, 1962; Perez-Ramos, 1988; Winkler and Van Helden, 1992). To meet the objections of critics that his telescope was not a valid, reliable method of seeing, Galileo could point it at a nearby object to show how it displays an image of something to which we have independent ocular access, unmediated by an instrument or art. Why doubt that the instrument does anything different when showing the moon? In the late 19th century, Nasmyth and Carpenter used photography to create a visual analogy that enables (or perhaps obliges) us to see the image of the moon in terms of more familiar experiences such as the shadows on a wrinkled hand or a shrivelled apple (Kemp, 2001, pp. 62-63). The "analogy" can, of course, be validated by correlating sight to hearing or another of the senses. 
For example, the diagnostic use of early X-ray images of lungs affected by tuberculosis depended on this kind of validation (fig. 4). Practitioners such as Halls Dally created "likenesses" between the sounds of percussion and shadows in the X-ray images, translating shadows into sounds (see Halls Dally, 1903). In these cases, a new and therefore suspect method of imaging is shown to display the same patterns, structural features or regularities as those obtained via perception that is either unmediated or is dependent on already established methods of extending perception. The method of visual analogy draws on a cultural repertoire of established techniques which owe their efficacy to innate, general capacities for spatial cognition.
Fig. 4. Diseased area of the lung (arrowed) as interpreted by creating a sound analogue (from Halls Dally, 1903).
3.2. Extending ostension: imaging novel phenomena
Faraday's detailed records of his laboratory work show how visualisation works in conjunction with sensorimotor awareness (proprioception or kinaesthetic awareness) to produce new representations. These are interpretative images whose cognitive (generative) and social (communicative) functions are inextricably linked. I call them construals (Gooding, 1990a,b, Chapters 1-3) while Magnani describes them as manipulative abductions (Magnani, 2001, pp. 53-59). These are proto-representations, which merge images and words in tentative interpretations of novel experience. This experience is created through the interaction of visual, tactile, sensorimotor and auditory modes of perception together with existing interpretative concepts including mental images. These word-image hybrids integrate the different types of knowledge and experience. This performative and linguistic framework is the basis for abductive inferences about processes behind the phenomena (Gooding, 1996). The many sketches in Faraday's manuscript Diary show that like his mentor Humphry Davy, Faraday construed many of his experiments as showing a temporal slice - a "snapshot" - of the effect of some more complex but hidden, physical process (Martin, 1932-1936). In response to Oersted's discovery that a current-carrying wire has magnetic properties,
Faraday and Davy had developed experimental methods of integrating discrete experimental events by September of 1821 (or rather, of integrating the images depicting them). Electrical and magnetic effects are mixed in a way that the eye simply cannot see. So, Davy and Faraday practised "accumulation", that is, they combined discrete images obtained over time into a single geometrical structure. Conversely, they also created a physical structure of sensors with which to record the effects of a single event at different points of space. A typical procedure involved carefully positioning one or more needles in the region of a wire, connecting the circuit to a battery and observing the effect on the needles. Similarly, continuous exploration of the space around the wire would produce a pattern made up of many discrete observations of needle positions. Davy and Faraday combined these results into a single model, a 3D representation of the magnetic effects of the current. A structure of needles arranged in a spiral around the wire and examined after discharging a current through it, gave a 3D magnetic snapshot of the magnetising effect of the current. Another set-up, a horizontal disc with needles arranged around its perimeter, emerged from a set of temporally distinct observations, which this set-up integrates into a single spatial array. These objects are complexes of material things, active manipulations, effects and proto-interpretations of the outcomes. These structures and patterns explained nothing in themselves, but once they had been identified as features of a process became heuristics, guiding further exploration of structures hidden from view. These should manifest themselves through new experimental set-ups as other (new) patterns. An important example is Davy's explanation of phenomena observed in an experiment carried out in May of 1821. Assisted by Faraday, he passed a current through a vacuum to produce a luminous glow discharge. 
Davy reported that when "a powerful magnet [was] presented to this [luminous] arc or column, having its pole at a very acute angle to it, the arc, or column, was attracted or repelled with a rotatory motion, or made to revolve by placing the poles in different positions, according to the same law ... as described in my last paper" (Gooding, 1990a,b, Chapter 2). Davy and Faraday construed this process in terms of hidden, real-time (4D) processes involving 3D structures. Faraday later developed this approach with other devices to "extend" his ability to analyse high-frequency processes. Where the discrimination Faraday sought exceeded the capacity of practised manipulation before unaided senses, he made devices and procedures to extend his sensory and discriminatory powers, for example, to "slow down" the high-frequency processes that might produce an appearance of structure or of motion. This is an example of inferring a structure and process both from the features of a pattern and from its behaviour under manipulation, where the process is
Fig. 5. Microscopic, aquatic "wheel animalcule", which appears to have two rotating discs (top), which Faraday analysed as progressive waves in fixed rings of cilia (from Faraday, 1831, plate III, fig. 17).
made sensible by a sense-extending device. The spinning, toothed discs used in his work on optical perception are an example of the extension of visual perception by apparatus (Tweney, 1992). A related method reproduced patterned appearances by means of mechanical simulations. Where he could simulate some aspect of a natural phenomenon by a high-speed mechanical process, Faraday took this to be a fair indication as to the nature of that process. Typical simulations were the toothed wooden wheels whose rotation could reproduce apparent (but biologically implausible) rotation of the apparent discs of cilia of aquatic "animalcules". These had earlier been observed by Leeuwenhoek in 1702, but were shown by Faraday in 1831 to be progressive undulations in their cilia - see fig. 5.
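As an illustration only, the "accumulation" procedure described earlier in this section, in which discrete needle readings obtained over time are combined into a single spatial array, can be sketched in a few lines of Python. The sketch assumes an idealised set-up that is not taken from the historical record: each reading is reduced to a position around the wire and the heading in which the needle settled, and the names (Observation, accumulate, is_circular) and all values are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One discrete reading (hypothetical format, not Faraday's own)."""
    time_s: float       # when the reading was taken
    angle_deg: float    # position of the needle around the wire
    heading_deg: float  # direction in which the needle settled


def accumulate(observations):
    """Merge readings taken at different times into one spatial array.

    Time is discarded: readings are keyed by position, so the result is a
    single "snapshot" of the pattern around the wire.
    """
    pattern = {}
    for obs in sorted(observations, key=lambda o: o.time_s):
        pattern[obs.angle_deg % 360] = obs.heading_deg % 360
    return dict(sorted(pattern.items()))


def is_circular(pattern, tolerance_deg=1.0):
    """True if every needle sits at 90 degrees to its radius.

    Assumes one sense of circulation; reversing the current would flip it.
    """
    for angle, heading in pattern.items():
        deviation = (heading - angle - 90) % 360
        if min(deviation, 360 - deviation) > tolerance_deg:
            return False
    return True


# Invented readings, taken one at a time and in no particular order.
readings = [
    Observation(time_s=30.0, angle_deg=180, heading_deg=270),
    Observation(time_s=0.0, angle_deg=0, heading_deg=90),
    Observation(time_s=45.0, angle_deg=270, heading_deg=0),
    Observation(time_s=15.0, angle_deg=90, heading_deg=180),
]
snapshot = accumulate(readings)
```

The point of the sketch is only that the temporal order of the readings drops out: what remains is a geometrical structure (here a circular arrangement of headings) that no single observation displays.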
4. PATTERN, STRUCTURE, PROCESS
These examples suggest different strategies such as visualisation by analogy and by the freezing or slowing of processes. Can we go beyond this, to develop a unifying model of how scientists use images to devise solutions to problems? In this section, I will show that we can identify a schema that is
widely used in a range of contexts and at different stages in the development of representations and discourse about them. Faraday's constructive method involved moving from 2D patterns to 3D structures, which could then be animated either as thought-experiments in time or as material, bench-top simulations of the invisible processes. Faraday took an image to express some pattern discernible in a process; he construed such patterns as indicative of some hidden process. To investigate the latter involved adding dimensionality, that is, imagining a 3D structure which, if "frozen" in time, might have such a structure and which, as a 4D process occurring in time, would generate the 2D patterns he had initially construed as suggestive of the 3D structures. The resulting 3D model would have to yield the original 2D pattern in some phenomenon. This process can be represented as the repeating schema:
pattern → structure → process → pattern → ... (Eq. 1)
where each arrow indicates an as yet unspecified type of inference. There is first a reduction of complex, real-time phenomena to an abstract image (usually a pattern or set of patterns, such as a magnetically induced distribution of iron filings). This image is then enhanced by "adding" dimensions, first to create a 3D structure, which can be imagined and sometimes also drawn, and then - where a causal explanation is sought - further enhanced by constructing a real-time, 4D process model.
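A toy geometric sketch may help to fix the idea of moving around this schema. It is not a reconstruction of Faraday's reasoning: the stacking of a planar pattern along an axis and the rotation used as the "process" are arbitrary stand-ins chosen for simplicity, and all function names and values are assumptions made for the example. The sketch enhances a 2D pattern to a 3D structure and then to a 4D process, and reduces the process back to 2D patterns.

```python
import math


def enhance_to_structure(pattern_2d, heights):
    """2D -> 3D: stack the planar pattern at several heights along an axis."""
    return [(x, y, z) for z in heights for (x, y) in pattern_2d]


def enhance_to_process(structure_3d, times, turn_rate):
    """3D -> 4D: let the structure rotate about the z axis as time passes."""
    process = []
    for t in times:
        a = turn_rate * t
        for (x, y, z) in structure_3d:
            process.append((x * math.cos(a) - y * math.sin(a),
                            x * math.sin(a) + y * math.cos(a),
                            z, t))
    return process


def reduce_to_pattern(process_4d, at_time, decimals=6):
    """4D -> 2D: freeze the process at one instant and drop the height."""
    return sorted({(round(x, decimals) + 0.0, round(y, decimals) + 0.0)
                   for (x, y, z, t) in process_4d if t == at_time})


# Invented starting pattern: two marks in a plane.
pattern = [(1.0, 0.0), (0.0, 1.0)]
structure = enhance_to_structure(pattern, heights=[0.0, 1.0])
process = enhance_to_process(structure, times=[0.0, 1.0],
                             turn_rate=math.pi / 2)

recovered = reduce_to_pattern(process, at_time=0.0)  # the originating pattern
predicted = reduce_to_pattern(process, at_time=1.0)  # a new, derived pattern
```

Reducing the process at the initial instant returns the pattern from which the construction began, while reducing it at a later instant yields a pattern that was not among the starting data; in this limited sense the loop both reproduces its origin and suggests where to look next.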
4.1. Dimensional reduction and enhancement
The progression from two to four dimensions is a dimensional enhancement. The process is more complex than this summary suggests since the 2D images with which the process begins are themselves partial abstractions, dimensionally reduced representations of a more complex experience. Dimensional reduction is always necessary when recording real world processes as, say, sketches in a notebook. Dimensional enhancement, therefore, always depends on a prior abstraction or reduction. A second feature is that in all cases, the initial enhancement is followed by a consolidating move in which the originating 2D image(s) and new ones are derived from the 4D process model. Consolidation involves reducing the complex images from four dimensions to two. Dimensional reduction is, therefore, used in both the construction and the consolidation stages. In the latter, reduction enables dissemination (say, of predictions or observed results) in the form of printed diagrams. A search for new effects predicted by the model might typically involve 2D patterns or the design of new
D. Gooding
observational techniques to analyse full-blown 4D processes. The consolidation stage is analogous to prediction and retrodiction as inferences on propositional representations. Thus it resembles a deduction, albeit one accomplished through manipulating objects that are neither propositions nor symbolic representations. These features of the process highlight three different roles for images, each corresponding to a different stage of the process of constructing a new representation and integrating it into an argument:

1. Generativity: they may be instrumental in generating new representations or in extending the use of existing ones.
2. Integration: they symbolise an integrated model of a process that involves many more variables than the eye or the mind could otherwise readily comprehend. In these two cases, visualisation is essential to the construction and use of interpretative and analytical concepts.
3. Justification: they enable empirical support for the theory embodied by the model, usually through the dissemination of images in 2D form. Here the visualisation of observations or data assists a verbal argument that may have been developed by non-visual means.

I have shown elsewhere how the movement between patterns, structures and processes is characteristic of Faraday's experimental reasoning. It is clearly at work in his record of the day's work that led to the "rotation apparatus", which was the first electric motor (fig. 6). In this case, visualisation produced a new material artefact, an electric motor. It is also exemplified in his first comprehensive representation of the mutual interaction of electricity, magnetism and motion of 1832 (fig. 7).

Fig. 6. Faraday's first sketch (September 1821) envisioning the configuration of wires, magnet, mercury and other components of the first device to produce continuous electromagnetic motion. From Faraday's Diary, Martin (1932-1936), Vol. 1, p. 50.
Visualisation, inference and explanation in the sciences
Fig. 7. Faraday's sketch of March 1832, which is accompanied by verbal instructions that describe how to animate the elements of the drawing to show the mutual dependence of electricity, magnetism and motive force. From Faraday's Diary, Martin (1932-1936), Vol. 1, p. 425.

The micro-structure of exploratory work shows that representation involves dimensional reduction (whereby selected features are represented visually, as patterns), followed by enhancements that generate new 3D configurations, further reductions that generate predictions about new phenomena, and consolidation that establishes the derived structures as plausible explanations or realisations of the observed patterns.
4.2. Visual inference
Faraday's notebooks are a rich mixture of sketches, diagrams and text. These are not simple records of observations; rather, as working notes they are very much part of the world on which he worked. Thus the sketches do not simply depict; they were not introduced with nicely fixed referents. They are tools for thinking, not images of its outcomes. What they purport to represent is both complex and dynamic. They are early manifestations of a process of establishing a basis for shared experience and for communication about that experience. Each image itself stands for an accumulation of practical and theoretical knowledge. Thus, in fig. 7 Faraday visualises a constant feature of a set of changing relationships. This can also be seen in a sequence of sketches of the interaction of a current-carrying wire and a magnetised needle of September 1821, where each sketch incorporates and summarises a complex set of discrete operations and observations that precedes it. This process also involves a dimensional enhancement, i.e. moving from two to three dimensions by shifting the observer's point of view through 90° (for a full explanation, see Gooding, 1998).
Table 1. Visual reasoning by dimensional enhancement and reduction: part of the sequence of dimensional enhancements and reductions for the discovery of the electric motor, a day's work recorded in Faraday's manuscript for 3 September 1821.

The 2D representation of 3D structure in the top row of column 2 was constructed by rotating what is represented in column 1 through 90°; these images are then "accumulated" by mental superposition to produce the image of continuous motion implied in column 3. Where column D is blank, this indicates that he had not yet envisaged a physical device. Faraday then systematically removed dimensions in order to derive the original phenomena and new ones such as the electromagnetic rotations (moving downwards in column 4).

The meaning and function of such an image are therefore variable, depending upon how it is used in relation to others that represent earlier or later manipulations and interpretations. Displaying them as an array (see table 1) allows us to view each image not as a self-contained depiction but rather as part of a continuous process involving observation, interpretation, construction, abduction and deduction.
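The viewpoint shift "through 90°" and the subsequent reduction to a 2D image can be illustrated numerically. What follows is a minimal sketch and not a reconstruction of Faraday's own procedure: it assumes, for illustration only, a circular path around a vertical wire, and the function names (`circle_3d`, `rotate_about_x`, `project_2d`) are hypothetical. It suggests how one and the same 3D structure yields quite different 2D patterns depending on the observer's position.

```python
import math

def circle_3d(radius=1.0, n=8):
    """Points on a horizontal circle around a vertical (z-axis) wire."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n),
             0.0) for k in range(n)]

def rotate_about_x(point, degrees):
    """Rotate a 3D point about the x-axis (a shift of the observer's viewpoint)."""
    x, y, z = point
    a = math.radians(degrees)
    return (x, y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a))

def project_2d(point):
    """Dimensional reduction: drop the depth (z) coordinate."""
    x, y, _ = point
    return (round(x, 6), round(y, 6))

points = circle_3d()
top_view = [project_2d(p) for p in points]                       # seen from above
side_view = [project_2d(rotate_about_x(p, 90)) for p in points]  # viewpoint shifted by 90 degrees

# Seen from above, every point lies at the same distance from the wire (a circle).
top_radii = {round(math.hypot(x, y), 6) for x, y in top_view}
# Seen from the side, the same path collapses onto a single horizontal line.
side_heights = {abs(y) for _, y in side_view}
print(top_radii, side_heights)
```

Recovering the circle from the side view alone is not possible; it is the combination of views that licenses the 3D interpretation, which is the sense in which the images are "accumulated".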
5. VISUAL ABDUCTION: SOME OTHER EXAMPLES
I turn now to other examples that suggest that the abduction schema (1) is widely used in science, wherever there is a need to resolve, order and communicate experience that is complex, chaotic, unstable or ambiguous.
5.1. Reanimation of the Burgess Shales
Gould introduces his chapter on Whittington's reconstruction of the lifeforms fossilised in the Burgess Shales with the remark that

I can't imagine an activity further from simple description than the reanimation of a Burgess organism. You start with a squashed and horribly distorted mess and finish with a composite figure of a plausible living organism. (Gould, 1989, p. 100)

This example displays important similarities to the processes used by Faraday to create 3D and 4D (process) models of the electromagnetic phenomena. The process involves making careful camera lucida drawings of both positive and negative impressions of the flattened, fossilised animals. These 2D images are then interpreted in terms of what they show about possible organisms. While some organisms might be interpreted by analogy to modern counterparts, many Burgess organisms have no counterparts. Moreover, the cleavage planes in the shale cut the flattened organisms at different angles (so creating the problem of determining which impressions image the same organism). In order to be identified, an impression would have to be mentally re-imaged as if from several points of view. One investigator (Morris) reports having drawn specimens "that had been found in various orientations, and then passing countless hours 'rotating the damned thing in my mind' from the position of one drawing to the different angle of another, until every specimen could be moving without contradiction from one stance to the next" (Gould, 1989, p. 92). Whittington and his co-workers engaged, just as Faraday did, in a dialectical process of moving back and forth between 3D structures made from 2D images and inferring the flattened layers from solid objects.
5.2. Explaining seafloor spreading
During the controversy over continental drift certain images acquired a crucial, persuasive role. This cruciality depended on the construction of 3D structural and 4D (process) models from numerical data displayed in the form of patterns. During the 1950s, measurements of magnetic field strength were made in the form of magnetometer scans along well-defined paths. Viewed magnetically, the seafloor in the region of the East Pacific Rise consists of strips of rock, each of which has a different magnetic field strength. Records of these scans were accumulated into anomaly maps. An anomaly map displays patterns of magnetisation built up by many hundreds
of scans representing many thousands of numerical readings. The visualisation of data tables as 2D maps involves a translation of numerical into graphical form, but the image threshold is also important. Rendering the data by binary (black or white) images rather than by greyscale ones highlights the striping, which indicates regular alternations in field strength (compare the two images in fig. 8). In this episode, a key image, which became crucial to the acceptance both of the reality of seafloor spreading and of a new explanation of it, is a particular run of the ocean survey ship Eltanin (LeGrand, 1990). Selected from a large survey of sea floor magnetisation, this image became known as Eltanin-19. The anomaly maps display patterns in data accessed through instruments but can also incorporate other relevant phenomena and features, such as centres of volcanic activity or earthquakes and the chemical composition, thickness, temperature and underlying geology of the sea-floor. 3D models were then constructed from these maps. These static models accumulated several different types of information into a single type of drawing, which became the new focus of thought and argument. Although considerable evidence supported the theory of continental drift, no plausible mechanism had been proposed that could cause the movement of continents. In the mid-1960s, Vine, Matthews and Wilson proposed a theory of ocean floor spreading that incorporated the striping shown most clearly in the binary versions of the anomaly maps. Molten basalt is magnetised as it cools. Its magnetisation will depend on the sense of the earth's field at the time it is extruded and cools. This magnetisation will subsequently affect the field strength in the region above it, being "added" to or "subtracted" from the earth's field. Where extrusion continues during periodic reversals of polarity,
Fig. 8. Left: greyscale image of magnetometer data. Right: binary image of magnetometer data. (From Raff and Mason, 1961.)
the magnetic striping of the sea floor becomes a record of these reversals (Vine and Matthews, 1963; Vine and Wilson, 1965; Vine, 1966). This hypothesis treats the anomaly patterns as a consequence of the extrusion of molten basalt, invoking a geological process whose details could be worked out. The 3D model now stands for a state (the current state) of the 4D process, while the 2D anomaly pattern becomes a historical record of the products of this process. The static structural representation, which had suggested the process model, now becomes a consequence, both logically and causally. In 1965, Vine and Wilson inferred that if the process explanation is correct then striping should be symmetrical. If molten basalt is extruded along a fissure identified as a ridge, patterns and field intensity plots should show mirroring on either side of this ridge. Vine and his colleagues then found scans that displayed this property. One of these, Eltanin-19, displayed it particularly well (LeGrand, 1990, pp. 255-257; Pitman and Heirtzler, 1966, p. 1166). The persuasive force of this plot depended on the scientists' ability to "illustrate the invisible", accumulating, presenting and integrating large quantities of data about different features as a single pattern or plot. We can use a matrix to represent visual inference so as to distinguish aspects that we do not yet understand from those that we may already understand. For example, a move across a row in table 2 represents an abductive inference whose cognitive character remains opaque, i.e. beyond the reach of current psychological theories. A move downwards represents an inference that may prove, on further analysis, to involve standard forms of inference (induction and/or deduction). This is because horizontal moves generate representations that are stable enough to use in other, less opaque kinds of inference.
Tables situate these processes in relation to inference generally by displaying dimensional enhancement or reduction as operations on images (across the rows) and other kinds of inference as the generation of new images, constructs and propositions (in new rows).

Table 2. Dimensional enhancement, reduction and consolidation for sea-floor spreading

| Step | 2 dimensions | 3 dimensions | 4 dimensions | Derivation |
| 1. Representation | Anomaly maps, profiles | Static model | Process model | |
| 2. New feature | | | | Symmetry in striping either side of a ridge |
| 3. Representation | Search/generate anomaly maps and profiles for symmetry | | | Selected features of existing/new maps |
| 4. Representation depicts real world feature | Selected anomaly maps and profiles that show symmetry | | | Selected maps and profiles |
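The symmetry prediction discussed in this section, together with the role of the binary threshold, can be sketched in a few lines. This is a toy illustration under loud assumptions, not the analysis used by Vine and his colleagues: the reversal ages, the constant spreading rate and the function names (`polarity_at_age`, `anomaly_profile`, `to_binary`) are all hypothetical, chosen only to show why a process of extrusion at a ridge implies mirrored striping.

```python
def polarity_at_age(age_myr, reversal_ages):
    """Return +1 (normal) or -1 (reversed) polarity for crust of a given age.

    Each reversal older crust has lived through flips the sign;
    the present-day field is taken as normal.
    """
    flips = sum(1 for r in reversal_ages if r <= age_myr)
    return 1 if flips % 2 == 0 else -1

def anomaly_profile(distances_km, rate_km_per_myr, reversal_ages):
    """Signed anomaly along a track that crosses the ridge at distance 0.

    Crust age depends only on the absolute distance from the ridge,
    which is what makes the predicted pattern mirror-symmetric.
    """
    return [polarity_at_age(abs(d) / rate_km_per_myr, reversal_ages)
            for d in distances_km]

def to_binary(profile, threshold=0):
    """Binary (black/white) rendering: 1 where the value exceeds the threshold."""
    return [1 if v > threshold else 0 for v in profile]

# Hypothetical values for illustration only (not survey data).
reversal_ages = [0.8, 2.6, 3.6, 5.2]               # Myr before present
rate = 20.0                                        # km per Myr, assumed constant
track = [float(d) for d in range(-120, 121, 10)]   # km from the ridge axis

profile = anomaly_profile(track, rate, reversal_ages)
stripes = to_binary(profile)
is_symmetric = stripes == stripes[::-1]
print("".join("#" if s else "." for s in stripes), is_symmetric)
```

In this sketch the 4D process (extrusion plus reversals) generates the 2D pattern, and the symmetry test is the derived consequence that can be checked against selected profiles.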
5.3. Vascular structures from modular sections
The modular structure suggested by the appearance of cryosections of rat liver is a typical example of an hypothesised property that cannot be observed in whole livers and which evidence based on dissection does not actually support (Teutsch, 2003). Teutsch has shown that the modularity and the vascular structures that support it can be demonstrated by the construction of 3D images from virtual "stacks" of images of very thin cryosections (Teutsch et al., 1999). Patterns of the sort that had suggested modularity since the 17th century can be observed in these sections, but it is only through meticulous reconstruction that the character of this modularity (in terms of primary and secondary modular structures) and of the complex vascular structures that service the modules has emerged. Teutsch's method is the virtual counterpart of the procedure attributed to Picasso and to early investigations of the structure of crystals (see section 2). It involves the same procedures of depicting, accumulating and structuring that I attributed to Faraday in section 4. The development of the representations in the examples in section 5 can be described in terms of the model of dimensional manipulation (enhancement and reduction) proposed in section 4. These stages are summarised in table 3, in which dimensions increase from left to right. Each column represents a different order of representational capability. Table 3 shows both representational enhancement (moving left to right) and formal consolidation (moving to a new row). Here, as in tables 1 and 2, columns 2-4 hold representations of a given dimensionality. The rightmost column contains a derived consequence (whose dimensionality may be less than that of the process model). Each new row contains a new step that establishes the original map, pattern or section as a consequence of the processes postulated in column 4 for the structure identified in column 3. This is consolidation or justification.
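The reconstruction of a 3D structure from a stack of thin sections can be sketched schematically. The example below is a toy illustration only and makes no claim about Teutsch's actual software or data: the 5 x 5 sections, the single oblique "vessel" and the function names (`make_section`, `stack_sections`, `reslice_along_y`, `trace_vessel`) are hypothetical. It shows how a feature that appears as an isolated dot in each 2D section becomes a connected path once the sections are accumulated in order.

```python
def make_section(z, size=5):
    """One thin 2D section: a single marked cell whose position drifts with depth z.

    This mimics a vessel running obliquely through the tissue, so that each
    section taken alone shows only an isolated dot.
    """
    section = [[0] * size for _ in range(size)]
    section[z][z] = 1
    return section

def stack_sections(sections):
    """Dimensional enhancement: an ordered stack of 2D sections forms a 3D volume."""
    return list(sections)   # indexed as volume[z][y][x]

def reslice_along_y(volume, y):
    """Dimensional reduction in a new direction: cut the volume at a fixed y."""
    return [plane[y] for plane in volume]   # rows indexed by z, columns by x

def trace_vessel(volume):
    """Follow the marked cells through the stack, returning (z, y, x) coordinates."""
    return [(z, y, x)
            for z, plane in enumerate(volume)
            for y, row in enumerate(plane)
            for x, value in enumerate(row) if value]

volume = stack_sections(make_section(z) for z in range(5))
path = trace_vessel(volume)
print(path)
```

Re-slicing the volume in a direction that was never physically cut, as `reslice_along_y` does, corresponds to deriving new 2D sections from the 3D structure.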
6. BEYOND OSTENSION: IMAGING WHAT CANNOT BE SEEN
I have proposed a model of visual inference based on specific manipulations of the dimensionality and observer's view of representations.
Table 3. Visual inference as the manipulation of dimensions

| Dimensions | 2D (pattern) | 3D (structure) | 4D (process) | Derivation |
| Type of representation | Maps of magnetic actions, anomaly map, cryosections | Static models of electromagnetic interaction, extrusion of lava, vascular structure | Process theory animates structural model, showing how 2D patterns or sections emerge in time | Prediction or retrodiction of patterns of other phenomena: new magnetic motions; property of symmetry in anomaly plots; sub-modularity |
The process includes procedures that are both constructive (generative) and derivational (demonstrative). These procedures have been identified in a number of cases and there are many others. The model identifies one of the strategies whereby scientists construct visualisations that can bear a demonstrable relationship either to perceptible objects (as with X-ray diffraction images) or to objects created through instrumentation (such as the visualised numerical data in fig. 8). The appeal to scientists is obvious. Thus an article on computer visualisation of biological molecules quotes da Vinci's description of the eye as "the window of the soul...the chief means whereby the understanding may most fully...appreciate the works of nature" (Olson and Goodsell, 1992, p. 76). The complexity of the linkage of image to its referent may render it opaque even to many practitioners in the field. The linkages are not chains of purely natural causes - their transparency also depends on an understanding of the technologies, procedures and skills to create and replicate them. Some images are created by methods that are so complex and recondite that the concept of ostension as a means of establishing a transparent link between image and human experience of a referent no longer applies. Transparency becomes the province of a small number of experts. Although the transparency of human "seeing" suggests an intimate connection between vision and intellectual comprehension there are many types of representation which are not depictive (Tversky, 2002) or naturalistic (Miller, 1986, 1994). Such images take us beyond ostension when the image is introduced, not to depict a potentially experiential object or process but rather to denote an abstraction. Such images do not represent objects that can be experienced, even via the use of concrete analogies like Wildy's model of the viral particle (see section 2). 
Images such as Feynman diagrams or Darwin's branching tree diagram (Darwin, 1882) refer to processes and entities such as mathematical relationships and complex, temporally extended processes that are literally unimaginable. Darwin's tree does not depict a process that can be experienced. There is no phenomenology of natural selection as there is, for example, for electromagnetism (section 4). Such processes are beyond any experience that is possible for humans. As a recent controversy over the depiction of complex statistical constructs such as electron orbitals shows, the appeal of seeing via representations that appear to depict remains strong even when the objects depicted have no basis in any possible human experience (Zuo et al., 1999; Scerri, 2001). Scientists equipped with sophisticated theories and technologies wish nevertheless to see as the rest of us do. This appears to be why the editor of Nature agreed with the researchers that they had successfully done the impossible, and imaged an electron orbital (Humphrey, 1999).
7. CONCLUSION
I have described the generative, integrative and demonstrative uses of visual representations in the work of a number of scientists in a variety of fields. I have shown how these can be schematised and that the schema is used in a variety of scientific contexts, from creative, exploratory work and the interpretation of novel information (early electromagnetism, palaeobiology) through to dissemination and argumentation (geophysics, hepatology). The visualisation processes and strategies surveyed here include dimensional enhancement and reduction, concrete analogies, and perspectival construction and projection (Hockney). No doubt there are many other ways of connecting words, images and symbols to what they denote in the scientist's world. The very character of this correspondence is determined by the cognitive capacities that underlie image manipulation strategies of the sort described here, by scientific theory (as electron orbitals illustrate) and by the technologies of observation and visualisation illustrated here.
REFERENCES

Anderson, J.R., 2002. Spanning several orders of magnitude: a challenge for cognitive modelling. Cognit. Sci. 26, 85-112. Austin, J.L., 1962. Sense and Sensibilia. Oxford University Press, Oxford. Beaulieu, A., 2001. Voxels in the brain. Soc. Stud. Sci. 31, 635-680. Bragg, L., 1968. X-Ray crystallography. Sci. Am., 58, July. Brown, J.R., 1991. The Laboratory of the Mind. Routledge, London. Cheng, P.C.-H., 1994. Scientific discovery and creative reasoning with diagrams. In: Smith, S.M., Ward, T.B., Finke, R.A. (Eds.), The Creative Cognition Approach. MIT Press, Cambridge. Crombie, A.C., 1953. Robert Grosseteste and the Origins of Experimental Science. Oxford University Press, Oxford. Darwin, C., 1882. On the Origin of Species, 6th Edition. London. de Chadarevian, S., Hopwood, N. (Eds.), 2004. Models: the Third Dimension of Science. Stanford University Press, Stanford. de Santillana, G., 1962. The role of art in the scientific renaissance. In: Clagett, M. (Ed.), Critical Problems in the History of Science. U. Wisconsin Press, Madison, pp. 33-65. Faraday, M., 1831. On a peculiar class of optical deceptions. Reprinted in: Faraday, M. (Ed.), 1859. Experimental Researches in Chemistry and Physics. Taylor & Francis, London, pp. 291-309. Ferguson, E.S., 1977. The mind's eye: nonverbal thought in technology. Science 197, 827-836. Ferguson, E.S., 1992. Engineering and the Mind's Eye. MIT Press, Cambridge. Galison, P., 1997. Image and Logic. A Material Culture of Microphysics. Chicago University Press, Chicago. Galison, P., Stump, D., 1996. The Disunity of Science: Boundaries, Contexts, and Power. Stanford University Press, Stanford. Gibson, J.J., 1986. The Ecological Approach to Visual Perception. Erlbaum, Mahwah, NJ.
Gooding, D.C., 1982. Empiricism in practice: teleology, economy and observation in Faraday's Physics. ISIS 73, 46-67. Gooding, D.C., 1985. In nature's school: Faraday as a natural philosopher. In: Gooding, D., James, F. (Eds.), Faraday Rediscovered. Macmillan/American Institute of Physics, London, pp. 105-135. Gooding, D.C., 1986. How do scientists reach agreement about novel phenomena? Stud. Hist. Philos. Sci. 17, 205-230. Gooding, D.C., 1990a. Mapping experiment as a learning process. Sci. Technol. Hum. Values 15, 165-201. Gooding, D.C., 1990b. Experiment and the Making of Meaning. Kluwer, Dordrecht. Gooding, D.C., 1996. Creative rationality: towards an abductive model of scientific change, Vol. 58, Philosophica: Special Issue on Creativity, Rationality and Scientific Change, pp. 73-101. Gooding, D.C., 1998. Picturing experimental practice. In: Heidelberger, M., Steinle, F. (Eds.), Experimental Essays - Versuch zum Experiment. Baden-Baden, Nomos, pp. 298-323. Gooding, D.C., 2002. Varying the cognitive span: experimentation, visualisation and digitalization. In: Radder, H. (Ed.), The Philosophy of Scientific Experiment. University of Pittsburgh Press, Pittsburgh, pp. 369-405. Gould, S.J., 1989. Wonderful Life: the Burgess Shale and the Nature of History. Penguin, Baltimore, MD. Gregory, R.L., 1981. Mind in Science. Penguin, Baltimore, MD. Gruber, H.E., 1974. Darwin on Man: A Psychological Study of Scientific Creativity. Wildwood House, London. Halls Dally, J.F., 1903. On the use of the Roentgen rays in the diagnosis of pulmonary tuberculosis. Lancet 1, 1800-1806. Hanson, N.R., 1958. Patterns of Discovery. Cambridge University Press, Cambridge. Hartley, B., 1996. The living academies of nature: scientific experiment in learning and communicating the new skills of early 19thC landscape painting. Stud. Hist. Philos. Sci. 27, 149-180. Henderson, K., 1991. 
Flexible sketches & inflexible databases: visual communication, conscription devices and boundary objects in design engineering. Sci. Technol. Hum. Values 16, 448-472. Henderson, K., 1999. On line and on paper: visual representations, visual culture, and computer graphics. Design Engineering. MIT Press, Cambridge. Hockney, D., 2001. Secret Knowledge: Rediscovering the Lost Techniques of the Old Masters. Thames & Hudson, New York. Holmes, L., 1991. Hans Krebs. The Formation of a Scientific Life, Vol. 1, Oxford University Press, Oxford. Humphrey, C.J., 1999. Electrons seen in Orbit. Nature 401, 21-22. Jones, C., Galison, P., 1998. Picturing Science: Producing Art. Routledge, London. Kemp, M., 2000. Visualizations: the Nature Book of Art and Science. Oxford University Press, Oxford. Kemp, M., 2001. Master class in "cheating". Times Higher Edu. Suppl., 19, October 19. Kendrew, J.C., 1961. The three-dimensional structure of a protein molecule. Sci. Am., 96, December. Kosslyn, S.M., 1981. The medium and the message in mental imagery. In: Block, N. (Ed.), Imagery. MIT Press, Cambridge, MA, pp. 207-244. Latour, B., 1990. Visualisation and cognition. In: Lynch, M., Woolgar, S. (Eds.), Representation in Scientific Practice. MIT Press, Cambridge. Latto, R., 2003. Do We Like What We See?, VRI2002, Liverpool, September 2002, this volume.
LeGrand, H.E., 1990. Is a picture worth a thousand experiments? In: LeGrand, H.E. (Ed.), Experimental Inquiries. Kluwer Academic, Dordrecht, pp. 241-270. Lynch, M., Woolgar, S. (Eds.), 1990. Representation in Scientific Practice. MIT Press, Cambridge. Magnani, L., 2001. Abduction, Reason and Science. Kluwer Academic, Dordrecht. Martin, T., 1932-1936. Faraday's Diary, 7 Vols. Bell, London. Miller, A.I., 1986. Imagery in Scientific Thought. MIT Press, Cambridge. Miller, A.I., 1994. Aesthetics and representation in art and science. Lang. Des. 2, 13-37. Miller, A.I., 1996. Insights of Genius: Imagery and Creativity in Science and Art. Copernicus, New York. Miller, A.I., 2001. Einstein, Picasso: Space, Time and the Beauty that causes Havoc. Perseus/Basic Books, New York. Newell, A., 1990. Unified Theories of Cognition. Cambridge University Press, Cambridge. Olson, A., Goodsell, D., 1992. Visualizing biological molecules. Sci. Am. 76, November. Perez-Ramos, A., 1988. Francis Bacon's Idea of Science and the Maker's Knowledge Tradition. Oxford University Press, Oxford. Pitman, W.C., Heirtzler, J.P., 1966. Magnetic anomalies over the Pacific-Antarctic Ridge. Science 154, 1164-1171. Pylyshyn, Z.W., 1981. The imagery debate: analogue media versus tacit knowledge. Psychol. Rev. 88, 16-45. Raff, A., Mason, R., 1961. Magnetic survey off the west coast of North America, 40°-52°N. Geol. Soc. Am. Bull. 72, 1267-1270. Rasmussen, N., 1997. Picture Control. Stanford University Press, Stanford. Rudwick, M.J.S., 1976. The emergence of a visual language for geology, 1760-1840. Hist. Sci. 14, 149-195. Scerri, E.R., 2001. The recently claimed observation of atomic orbitals and some related philosophical issues. Philos. Sci. 68 (3), S76-S88. Schick, K., Toth, N., 1993. Making Silent Stones Speak. Phoenix, London. Shelley, C., 1996. Visual abductive reasoning in archaeology. Philos. Sci. 63, 278-301. Sorensen, R., 1992. Thought Experiments. Oxford University Press, Oxford. Teutsch, H., 2003. 
Modular Design of the Liver of the Rat, VRI2002, Liverpool, September 2002, this volume. Teutsch, H., Shuerfeld, D., Groezinger, E., 1999. Three-dimensional reconstruction of parenchymal units in the liver of the rat. Hepatology 29, 494-505. Tversky, B., 2002. Spatial schemas in depictions. In: Gattis, M. (Ed.), Spatial Schemas and Abstract Thought. MIT Press, Cambridge, pp. 79-112. Tweney, R.D., 1992. Stopping time: Faraday and the scientific creation of perceptual order. Physis 29, 149-164. Vine, F.J., 1966. Spreading of the ocean floor: new evidence. Science 154, 1405-1415. Vine, F.J., Matthews, D., 1963. Magnetic anomalies over oceanic ridges. Nature 199, 949-950. Vine, F.J., Wilson, J.T., 1965. Magnetic anomalies over a young oceanic ridge off Vancouver Island. Science 150, 485-489. Westfall, R.S., 1980. Never At Rest: A Biography of Isaac Newton. Cambridge University Press, Cambridge. Wildy, P., Russell, W., Horne, R., 1960. On myoglobin. Virology 12. Winkler, M., van Helden, A., 1992. Representing the heavens. Isis 83 (2), 195-217. Wittgenstein, L., 1953. Philosophical Investigations. Blackwells, Oxford. Zuo, J.M., Kim, M., O'Keefe, M., Spence, J., 1999. Direct observation of d-orbital holes and Cu-Cu bonding in Cu2O. Nature 401, 49-51.
Studies in Multidisciplinarity, Volume 2
Editor: G. Malcolm
© 2004 Elsevier B.V. All rights reserved.
2
The representation of naive knowledge about physics¹

M. Bertaminiᵃ, A. Spoonerᵃ and H. Hechtᵇ

ᵃ Department of Psychology, University of Liverpool, Eleanor Rathbone Building, Liverpool L69 7ZA, United Kingdom
ᵇ Man-Vehicle Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Building 37-219, Cambridge, MA 02139-4307, USA
Human beings rely on visual information to learn about the environment around them, construct representations of the world, and control their actions. By and large, humans are remarkably accurate when it comes to complex motor actions such as catching a baseball or hitting a target. In fact, the perceptual skills underlying such actions are not easily understood, as they are far superior to any visual information processing capability of artificial systems constructed to date. In stark contrast to our excellent perception-action abilities, there are conditions under which humans make striking judgment errors that are at odds with the visual information experienced. We will describe some examples of such errors in a large proportion of the population suggesting that knowledge of the physical world is represented poorly in the cognitive domain. We will discuss some explanations for this phenomenon, and explore the implications for a scientific study of visual representations and interpretations.
1. A BRIEF INTRODUCTION TO NAIVE PHYSICS
¹ The authors would like to acknowledge the support of the ESRC, Grant R000223564 to MB.

Naive physics is the name given to the field of study of our common-sense beliefs about classical mechanics, as these pertain to our actions.
Naive beliefs are often found to be at odds with reality. For instance, when asked where to drop a ball to hit a target on the floor while moving in an airplane or on a conveyor belt (McCloskey et al., 1983; Kaiser et al., 1992; Krist et al., 1993), many adults state that they should release the ball right above the target. This belief is immediately shown to be mistaken when they actually perform the task. Even children quickly adjust once they see that they overshoot. However, the mistaken "straight-down belief" remains in place. Similarly, when a marble's motion upon exiting a C-shaped tube lying on a tabletop has to be predicted, many adults mistakenly predict a curved exit path. The same people, upon observing curved paths in manipulated video animations, immediately notice that straight paths look much more natural. In essence, naive physics can only be understood if we conceive of the representation of elementary laws of physics in a modular way. Three representational subsystems represent knowledge with little or no cross-talk. Action representations are accurate to the extent needed and as a function of how costly it is to correct an action. Perceptual representations are often but not always superior to cognitive representations. And the latter, strangely, fare worst. To better understand the evidence for this modular view, we describe intuitive physics findings in general from a historical perspective and then focus on the new field of intuitive optics. The first appearance of the term "naive physics" is believed to be in a book by Lipmann and Bogen (1923), referring to the interaction with the physical world in everyday tasks. The idea of empirically investigating beliefs and concepts about the physical world was picked up and explored by the Gestalt school of psychology (we can include in this the classic work with chimpanzees by Köhler, 1921).
An exploration of naive mechanics was carried out in the 1950s by Bozzi, but this work was not published in major journals (see Bozzi, 1990; Pittenger and Runeson, 1990, and for more historical notes, see Smith and Casati, 1994). The term naive physics has also been used in the field of artificial intelligence (for a manifesto, see Hayes, 1979). A technique known as "knowledge engineering", based on introspection, is employed to formulate descriptions of world-knowledge in the language of formal logic (see Davis, 1990, for a good example, particularly Chapter 7). Such research aims to provide a foundation of knowledge for use in robotics (Hayes, 1979). Within the cognitive sciences, the field of naive physics studies the common-sense beliefs that people hold about the way the world works (as defined by Proffitt, 1999, in the MIT Encyclopedia of the Cognitive Sciences; see also McCloskey, 1983, who calls it "intuitive physics"). Although in theory naive physics may be explored for all natural phenomena, particular attention has been given to classical mechanics (Shanon, 1976; Bozzi, 1990; Proffitt, 1999). It is probably non-controversial
The representation of naïve knowledge about physics
that classical mechanics does offer the most relevant examples, because the importance of other aspects of physics, such as quantum mechanics, can only be appreciated in the small scale of subatomic physics or large scale of astrophysics. Neither of these domains is easily accessible to people's everyday experience of a world of middle-scaled objects (i.e. from a few millimetres to a few kilometres) (on this, see also Gibson, 1979). Because of the amount of experience that human beings have with the physical phenomena in the environment described by classical mechanics, it is intriguing that in many instances people hold beliefs that are not just underdeveloped but systematically wrong. For example, people can aim projectiles accurately (e.g. throwing a ball) but have difficulty drawing the shape of the path that projectiles take (Caramazza et al., 1981; Clement, 1982; McCloskey et al., 1983; Kaiser et al., 1985a,b, 1986; Krist et al., 1993). Furthermore, physical expertise does not always improve naive understanding. For example, about 40% of adults predict the orientation of a liquid surface in a tilted but stable glass to be more than 5° away from horizontal (McAfee and Proffitt, 1991). Expert liquid handlers, such as the professional bar staff at the Oktoberfest, exhibited even larger errors (Hecht and Proffitt, 1995).
Mistaken beliefs are not only present when abstract questions are asked out of context, but also extend to cognitive, perceptual, and developmental aspects of knowledge. For example, it is physically true that a pendulum will take the same amount of time to swing through its arc, however wide the arc (deviations are small for all practical purposes). However, Bozzi found that people will only perceptually accept certain speeds that appear "natural" to them, and for long arcs pendulums appear unnaturally fast (Pittenger and Runeson, 1990).
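The remark that deviations from isochronism are small can be checked numerically. The following is a minimal sketch, not part of the studies cited: it assumes an ideal pendulum (point mass, rigid massless rod, no friction) and uses the standard closed form for the period in terms of the arithmetic-geometric mean, T = 2π√(L/g) / AGM(1, cos(θ₀/2)). The function names and the default value of g are illustrative assumptions.

```python
import math


def agm(a, b, max_iter=60):
    """Arithmetic-geometric mean of two positive numbers."""
    for _ in range(max_iter):
        if abs(a - b) <= 1e-15 * a:
            break
        a, b = (a + b) / 2.0, math.sqrt(a * b)
    return a


def period_ratio(amplitude_deg):
    """Exact period divided by the small-angle period (independent of length)."""
    theta0 = math.radians(amplitude_deg)
    return 1.0 / agm(1.0, math.cos(theta0 / 2.0))


def pendulum_period(length_m, amplitude_deg, g=9.81):
    """Period in seconds of an ideal pendulum released from rest at amplitude_deg."""
    small_angle_period = 2.0 * math.pi * math.sqrt(length_m / g)
    return small_angle_period * period_ratio(amplitude_deg)


if __name__ == "__main__":
    for amplitude in (5, 10, 20, 40, 60):
        excess = 100.0 * (period_ratio(amplitude) - 1.0)
        print(f"amplitude {amplitude:2d} deg: period longer by {excess:.2f}%")
```

Under these assumptions the period at a 20° amplitude exceeds the small-angle value by less than 1%, which is the sense in which the physical pendulum is nearly isochronous even though wide swings look "too fast" to observers.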
Galileo himself only came to believe in the isochronism principle (fixed period) after empirical observations and never failed in his writings to point out how this was true even though it was counterintuitive (Bozzi, 1990). For projectiles, evidence that perceptual knowledge of natural motion is better than abstract knowledge can be found, for instance, in Kaiser et al. (1985a,b), but see Hecht and Bertamini (2000) for a case in which perceptual judgement about projectiles is incorrect.
With respect to the mistaken beliefs about mechanics, two lines of explanation have been developed. On the one hand, our intuitions may evolve slowly, and our naive beliefs may not have progressed beyond the level of Aristotelian physics, unable to follow the advances of modern physics (Shanon, 1976; Caramazza et al., 1981; McCloskey et al., 1983; Bozzi, 1990). The alternative explanation is that naive physics reflects capacity limitations in people's reasoning process (Kaiser et al., 1985a,b; Proffitt and Gilden, 1989). It is suggested that even when people know all relevant dimensions or properties in isolation, they fail to integrate them when forming representations of complex events. These incomplete
M. Bertamini, A. Spooner and H. Hecht
representations are then applied to novel situations, in which the outcome is, therefore, inaccurately predicted. It has also been demonstrated that representations of events can incorporate too many properties: people appear to believe that the accelerating properties of a thrower's arm will remain in the ball after it has been thrown, and that the ball will therefore continue to accelerate (Hecht and Bertamini, 2000). What is common to all of these examples is the fact that experience of extremely familiar events, such as the motion of a thrown ball, does not always lead to correct knowledge (either abstract or implicit) about the underlying principles (Hecht and Bertamini, 2000). Furthermore, and surprisingly, some of these mistaken beliefs are strengthened rather than weakened by experience (e.g. Hecht and Proffitt, 1995).
In the rest of this chapter, we shall do two things. Firstly, we shall briefly outline new results from our laboratory that extend the field into what we call naïve optics. Secondly, we shall discuss the need for the field of naive physics to systematically explore the differences between the following three levels of representations: naive beliefs - accessible through introspection; perceptual knowledge - tested by inspecting people's ability to recognise deviations from the laws of physics in simple physical phenomena; and action knowledge - tested by looking at what people can and cannot do. We propose that a comparison of these three levels is essential to understanding the structure of visual representations. For example, the existence of conflicting representations within the individual may reflect a modular system of representations, with far-reaching impact in the study of any system, human or artificial. Neurophysiological evidence already suggests that parallel systems do exist in humans to control visual recognition and to guide visually controlled action (Milner and Goodale, 1995).
As was pointed out earlier, the field of naïve physics has previously focussed on mechanics. This is reasonable since so much of what is relevant for human behaviour depends on the laws of mechanics, from walking to trying to hit a prey with a projectile. However, recent work has expanded the field to cover some aspects of physical optics (Bertamini et al., 2002; Croucher et al., 2002). Although it is true that light as such is never directly the object of our experience (Gibson, 1979), a large amount of human behaviour depends indirectly on the laws of optics. For instance, what is made visible by a mirror depends on the laws of reflection, because it depends on the way light travels and bounces in the environment before reaching our eyes. Therefore, knowledge about mirrors may be derived from an understanding of the laws of optics (and vice versa) (Croucher et al., 2002). In the next section, we will summarise a set of naive optics findings. As will become clear, in common with naive mechanics, our representations of reflections are surprisingly inaccurate given our wealth of experience.
2.
NAIVE OPTICS FINDINGS
Surprisingly few common-sense beliefs about light and optics have been studied, although there are indications of blatant errors. For instance, many children and even adults believe that the eyes emit rays or objects. This extramission belief was prevalent in ancient Greek philosophy (Cottrell and Winer, 1994; Winer and Cottrell, 1996; Winer et al., 1996). In this section, we will summarise new findings in a related area, the intuitive understanding of mirror reflections (Croucher et al., 2002). In summary, many participants made significant errors when asked to indicate where an observer would be able to see a target in a mirror.
In a set of experiments, participants were presented with a diagram of a room on paper (see fig. 1), and were asked to mark where on the paper a character (Jane) would first see her reflection in a mirror. The correct answer in fig. 1 was that Jane would have to be level with the near-edge of the mirror. However, participants tended to predict that Jane would see her reflection when she was still some distance to the side of the mirror. This consistent error remained when participants were asked to position themselves so that they could just see their own reflection in a pretend (non-reflective) mirror (see fig. 2). People tend to believe that they would see themselves in mirrors before they actually would (Croucher et al., 2002). This finding is intriguing, since people have a wealth of experience walking over to mirrors to view their reflections. Furthermore, we found that this error extended to predictions regarding when another object becomes visible in a mirror. This was true whether the object was stationary while the observer moved or vice versa.
[Figure 1 labels: Mirror; Jane; mean distance from the mirror edge: -35 mm]
Fig. 1. One of the two tasks used in Croucher et al. (2002). An example of an item from a paper-and-pencil task, including a grey line showing the correct answer, and an arrow showing the mean response.
Fig. 2. The other task used in Croucher et al. (2002). Photographs of the room used in the pretend task. In the second image, the person is standing at the average distance chosen by the participants (70 cm away horizontally from the mirror edge).
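The geometry behind the correct answer in these tasks can be made explicit with a short sketch. It is an illustration only, not the procedure used by Croucher et al. (2002): it assumes an ideal flat mirror lying on the line y = 0 between two x-coordinates, a point-like observer and target in front of it (y > 0), and the law of reflection expressed through the target's virtual image. The function names and the room dimensions are hypothetical.

```python
def reflection_point(observer, target):
    """x-coordinate where the line of sight to the target's virtual image
    crosses the mirror plane y = 0. Both points must have y > 0."""
    ox, oy = observer
    tx, ty = target
    # The virtual image of the target lies at (tx, -ty). The straight line
    # from the observer to that image reaches y = 0 at this fraction of its length.
    fraction = oy / (oy + ty)
    return ox + fraction * (tx - ox)


def visible_in_mirror(observer, target, mirror_left, mirror_right):
    """True if the reflection point falls on the mirror surface itself."""
    x = reflection_point(observer, target)
    return mirror_left <= x <= mirror_right


if __name__ == "__main__":
    # Hypothetical layout: a 0.5 m wide mirror, observer walking 1 m from the wall.
    mirror = (0.0, 0.5)
    for x in (-0.7, -0.35, 0.0, 0.25, 0.6):
        jane = (x, 1.0)
        print(x, visible_in_mirror(jane, jane, *mirror))
```

When observer and target coincide, the reflection point is simply the foot of the perpendicular from the observer to the wall, so under these assumptions a person sees her own reflection only once she is level with the mirror, never from a position off to the side.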
Croucher et al. (2002) considered four possible explanations of this consistent error. (a) Egocentric mirror rotation hypothesis: Observers may have failed to take the orientation of the mirror surface into account and they may have treated the mirror as a surface (approximately) orthogonal to their line of sight. (b) Capture hypothesis: Mirrors may be conceived as pictures which capture images for further inspection, so that the location of the observer is irrelevant. (c) Boundary extension hypothesis: People may perceive (and remember) a larger amount of the virtual space than is actually visible in mirrors. There is evidence that something similar happens for photographs, and this phenomenon is known as "boundary extension" (Intraub and Bodamer, 1993; Intraub, 1997). (d) Left-right reversal hypothesis: People have some understanding that there is some left-right reversal in mirrors, and may extrapolate from this incomplete representation to expect complete reversal of the imagined visual space around a vertical axis, thus misplacing objects in the mirror reflection (Gregory, 1997). People would then predict an observer's reflection to appear from the left as the observer approaches from the right, and in turn this may lead to an overestimation of what is visible from the side (Bertamini et al., 2002). There is some evidence to support all four of these hypotheses, and experiments are under way to test them more directly. The actual outcome may be a combination of all of them (Bertamini et al., 2002).
We expect that the complex pattern of results will be explained only by a careful examination of the three levels of representations mentioned earlier: naive beliefs, perceptual knowledge and action knowledge with relation to mirrors. That important differences must exist is already suggested by the fact that the large prediction errors (naive beliefs) that we have documented do not seem to lead to a lack of usefulness of mirrors in controlling actions such as shaving or driving a car (action knowledge). Moreover, the studies underway in our laboratory also test whether these mistakes extend to perceptual knowledge, in other words, whether people would be able to select a correct mirror reflection as the most "natural" reflection (Bertamini et al., 2002).
3.
TYPES OF TASKS AND TYPES OF REPRESENTATION
We have seen that there are accepted definitions of naive physics, but it is also fair to say that the question of what knowledge about the world we display in our beliefs, perceptions, and actions is a broad one, and much overlap exists with other areas. In this chapter, we have briefly summarised some findings, and in particular, we have reported recent developments that go beyond classical mechanics (Croucher et al., 2002). In this section, we reflect on the importance of the study of visual representations and interpretations as they are revealed by all three main types of tasks used in the naive physics literature: open questions as well as some paper-and-pencil tasks test explicit (naive) knowledge and beliefs; judgments about what looks "natural" test what we have called perceptual knowledge; setting specific tasks that need to be carried out by visually controlled actions tests what we have called action knowledge. Empirical evidence has demonstrated that conflicting beliefs can co-exist in the individual across these three levels of representation.
Research in cognitive neuroscience has led to what is known as a theory of two visual systems (for a well-documented synthesis, see Milner and Goodale, 1995; another recent review is in Creem and Proffitt, 2001). Milner and Goodale suggest that one system is mainly involved in the processes of recognition and identification (they call it the "what" system). Another system is responsible for mapping the location of objects, and is involved in the visual control and guidance of motor behaviour (the "how" system) (see also the distinction between pragmatic and semantic representations in Jeannerod, 1997). Work on the issue of the two visual systems with normal participants has already shown that it is often a subtle change in the task that can change the nature of the outcome completely. For instance, when judging the inclinations
of hills people make large mistakes. A 5° hill appears to be about 20° to the average observer (Proffitt et al., 1995). People make these mistakes any time they rely on a stored representation of the scene (for instance after a delay). However, people are accurate when they use an immediate motor response, such as when they use their hand to match the inclination of the slope, while still looking at the slope (Bhalla and Proffitt, 1999).
In the present context, we are suggesting that this new way of understanding apparent incompatibilities in knowledge can help us explain what is so surprising in naïve physics. People can be systematically wrong about laws of mechanics (and optics) to which they have been exposed throughout their lives, and even presenting observers with physically possible and impossible events does not always allow them to amend their judgment and recognise the correct event (e.g. Pittenger and Runeson, 1990; Proffitt, 1999; Hecht and Bertamini, 2000; Bertamini et al., 2002). However, the existence of systematically wrong beliefs about the physical world does not get in the way of people interacting successfully with it. Climbing slopes, carrying glasses full of beer, throwing balls, and shaving using a mirror are extremely complicated tasks for a machine but are almost trivial for human beings, even though the machine may know the mechanical rules and the human does not.
The work we have reviewed in naive physics has demonstrated a clear distinction between what we know and what we can do. While people are bad at drawing or recognising correct trajectories, they appear to use another system to guide the ball to the basket. Conversely, extensive training as a structural engineer will not raise your chances of getting all your beer across a busy pub. Likewise, observers who grossly misjudge the location of a mirror image when asked to make a prediction are likely to use their car's rear view mirror successfully.
Knowledge about basic laws of physics, it is starting to appear, is only useful if represented at the correct level for the task at hand. Moreover, it appears that the action system experiences the strictest validations whilst the cognitive system, and to a smaller degree the perception system, are less curtailed by reality.
REFERENCES
Bertamini, M., Spooner, A., Hecht, H., 2003. Naive optics: predicting and perceiving reflections in mirrors. J. Exp. Psychol. Hum. Percept. Perform. 29, 982-1002.
Bhalla, M., Proffitt, D.R., 1999. Visual-motor recalibration in geographical slant perception. J. Exp. Psychol. Hum. Percept. Perform. 25, 1076-1096.
Bozzi, P., 1990. Fisica Ingenua. Garzanti, Milano.
Caramazza, A., McCloskey, M., Green, B., 1981. Naive beliefs in 'sophisticated' subjects: misconceptions about trajectories of objects. Cognition 9, 117-123.
Clement, J., 1982. Students' preconceptions in introductory mechanics. Am. J. Phys. 50, 66-71.
Cottrell, J.E., Winer, G.A., 1994. Development in the understanding of perception: the decline of extramission perception beliefs. Dev. Psychol. 30, 218-228.
Creem, S.H., Proffitt, D.R., 2001. Defining the cortical visual systems: 'What', 'Where', and 'How'. Acta Psychol. 107, 43-68.
Croucher, C.J., Bertamini, M., Hecht, H., 2002. Naive optics: understanding the geometry of mirror reflections. J. Exp. Psychol. Hum. Percept. Perform. 28, 546-562.
Davis, E., 1990. Representations of Commonsense Knowledge. Morgan Kaufmann, San Mateo.
Gibson, J.J., 1979. The Ecological Approach to Visual Perception. Lawrence Erlbaum Associates, New Jersey.
Gregory, R., 1997. Mirrors in Mind. Penguin, London.
Hayes, P.J., 1979. The naive physics manifesto. In: Michie, D. (Ed.), Expert Systems in the Micro-Electronic Age. Edinburgh University Press, Edinburgh, pp. 242-270.
Hecht, H., Bertamini, M., 2000. Understanding projectile acceleration. J. Exp. Psychol. Hum. Percept. Perform. 26, 730-746.
Hecht, H., Proffitt, D.R., 1995. The price of expertise: effects of experience on the water-level task. Psychol. Sci. 6, 90-95.
Intraub, H., 1997. The representation of visual scenes. Trends Cognit. Sci. 1, 217-222.
Intraub, H., Bodamer, J.L., 1993. Boundary extension: fundamental aspect of pictorial representation or encoding artifact? J. Exp. Psychol. Learn. Mem. Cogn. 19, 1387-1397.
Jeannerod, M., 1997. The Cognitive Neuroscience of Action. Blackwell, Oxford.
Kaiser, M., Proffitt, D.R., Anderson, K., 1985a. Judgements of natural and anomalous trajectories in the presence and absence of motion. J. Exp. Psychol. Learn. Mem. Cogn. 11, 795-803.
Kaiser, M.K., Proffitt, D.R., McCloskey, M., 1985b. The development of beliefs about falling objects. Percept. Psychophys. 38, 533-539.
Kaiser, M.K., Jonides, J., Alexander, J., 1986. Intuitive reasoning about abstract and familiar physics problems. Mem. Cogn. 14, 308-312.
Kaiser, M.K., Proffitt, D.R., Whelan, S.M., Hecht, H., 1992. The influence of animation on dynamical judgments. J. Exp. Psychol. Hum. Percept. Perform. 18, 669-690.
Köhler, W., 1921. Intelligenzprüfungen an Anthropoiden, Abhandlungen der Preussischen Akademie der Wissenschaften, English translation: The Mentality of Apes. Kegan Paul, Trench, Trübner, 1927, London.
Krist, H., Fieberg, E.L., Wilkening, F., 1993. Intuitive physics in action and judgment: the development of knowledge about projectile motion. J. Exp. Psychol. Learn. Mem. Cogn. 19, 952-966.
Lipmann, O., Bogen, H., 1923. Naive Physik. Arbeiten aus dem Institut für Angewandte Psychologie in Berlin. Johann Ambrosius Barth, Leipzig.
McAfee, E.A., Proffitt, D.R., 1991. Understanding the surface orientation of liquids. Cognit. Psychol. 23, 669-690.
McCloskey, M., 1983. Intuitive physics. Sci. Am. 248, 114-122.
McCloskey, M., Washburn, A., Felch, L., 1983. Intuitive physics: the straight-down belief and its origin. J. Exp. Psychol. Learn. Mem. Cogn. 9, 636-649.
Milner, A.D., Goodale, M.A., 1995. The Visual Brain in Action. Oxford University Press, Oxford, UK.
Pittenger, J.B., Runeson, S., 1990. Paolo Bozzi's studies of event perception: a historical note. ISEP Newslett. 4, 10-12.
Proffitt, D.R., 1999. Naive physics. In: Wilson, R., Keil, F. (Eds.), The MIT Encyclopedia of the Cognitive Sciences. MIT Press, Cambridge, MA.
Proffitt, D.R., Gilden, D.L., 1989. Understanding natural dynamics. J. Exp. Psychol. Hum. Percept. Perform. 15, 384-393.
Proffitt, D.R., Bhalla, M., Gossweiler, R., Midgett, J., 1995. Perceiving geographical slant. Psychon. Bull. Rev. 2, 409-428.
Shanon, B., 1976. Aristotelianism, Newtonianism and the physics of the layman. Perception 5, 241-243.
Smith, B., Casati, R., 1994. Naive physics. Philos. Psychol. 7, 227-247.
Winer, G.A., Cottrell, J.E., 1996. Effects of drawing on directional representations of the process of vision. J. Educ. Psychol. 88, 704-714.
Winer, G.A., Cottrell, J.E., Karefilaki, K.D., Chronister, M., 1996. Conditions affecting beliefs about visual perception among children and adults. J. Exp. Child Psychol. 61, 93-115.
Studies in Multidisciplinarity, Volume 2 Editor: G. Malcolm © 2004 Elsevier B.V. All rights reserved.
3 Convention, resemblance and isomorphism: understanding scientific visual representations Laura Perini Philosophy Department, Virginia Polytechnic and State University, Blacksburg, VA 24060, USA
1.
INTRODUCTION
A typical journal article in science includes printed sentences, mathematical formulas, and figures: charts, diagrams, graphs, and results from imaging techniques like electron microscopy and MRI. Journal articles - and research talks - are the means by which scientists present and defend hypotheses in science communities. These presentations function as arguments, and scientists evaluate them in terms of strength and soundness. This scrutiny is not limited to linguistic representations: figures are evaluated as if they are also integral components of the argument. For this reason, a philosophical understanding of scientific reasoning must include an understanding of both what visual representations contribute to scientific arguments and how they do so. However, most philosophical discussion of visual representations has been conducted in the aesthetics literature, in which the focus is on images from art or everyday contexts, rather than scientific figures. Philosophers of science have paid little attention to figures.¹ This is surprising, not only because of the frequency with which figures are used in the contexts of explanation and confirmation - areas of intense philosophical study - but also because work in these areas has been dominated by analyses of linguistic representations. This general method, of studying scientific
¹ This situation is starting to change; see Baigrie (1996) and Taylor and Blum (1991), for example.
L. Perini
reasoning by analysing the symbols used to express that reasoning, has not been applied to visual representations in science. The result of these disciplinary trends is that philosophers of science do not have an account of the contribution figures make to science. What is needed is an account of visual representations that can accommodate the different kinds of figures used in science, and that clarifies what various visual representations are capable of in terms of content expressed and inferences supported. Explaining why scientists use visual representations at all, rather than just text, will require an account of visual representations that will allow for comparisons between figures and linguistic and mathematical representations. Understanding what figures contribute to science and how they do so thus requires a foundational analysis of visual representations. Because developing an account of visual representation that covers scientific visual representations involves consideration of a broader class of images than is usually discussed in aesthetics, this approach has the potential for a payoff beyond its relevance to philosophy of science. In particular, this study sheds some new light on old issues in aesthetics, especially the debate over convention and resemblance in pictorial representation.
2.
GOODMAN'S ANALYSIS AND THE RELEVANCE OF RESEMBLANCE
Nelson Goodman, in Languages of Art (1976), is also interested in comparisons among many different kinds of symbols, so this is a natural place to look for resources to identify similarities and differences among the different kinds of representations scientists use. Goodman analyses representations in terms of features of the symbol systems they come from rather than in terms of features of individual symbols. Systems consist of characters (classes of visible marks or utterances), rules for combining characters into more complex characters, and rules for assigning referents to characters. Goodman identifies several ways in which systems can vary. Syntactic features have to do with the relation between marks and the characters they instantiate, and semantic features have to do with relations between characters and referents of a system. Pictorial symbol systems² are syntactically inarticulate: it is not always possible to determine the exact character a mark instantiates. They are syntactically dense: characters are ordered so that there is one between any
² Goodman uses the term "representational", but I use "representation" more broadly, to include linguistic, mathematical and visual symbols, and use "pictorial" to distinguish systems with syntactic and semantic density and lack of articulation (or individual symbols from such systems).
Convention, resemblance and isomorphism
Fig. 1. Electron micrograph, Fernández-Morán (1962).
other two characters. Pictorial systems are semantically dense as well: the dense character set is mapped onto a referent set in which one referent is ordered between any two. This electron micrograph (fig. 1) is an example of a symbol from a pictorial system. This is a representation of the structure of a biological sample. The system is dense, which means that any difference in the black-white array of this marking would mean that this image is an instance of a different character, and each different character is associated with a different referent. So, any change in the form of the marking corresponds to a different representation. Linguistic representations, on the other hand, are characterised by articulate syntax: each mark can be identified as a particular character, or it's simply illegible. This supports the compositional syntax so important to logical languages and text: symbols are built out of discrete atomic characters, and the meaning of the complex is determined by the identity and arrangement of the atomic characters. However, these syntactic and semantic differences do not explain the difference between diagrams and text: diagrams also have articulate syntax, and diagram systems can be compositional in the way linguistic systems are (see fig. 2). Goodman's analysis of symbol systems, thus, does not account for the difference between visual representations in general, and linguistic and mathematical formulas. Goodman is famous for arguing for the conventional nature of representation: pictures do not represent in virtue
[Figure 2: structural formula; surviving labels: CH3, O]
Fig. 2. Diagram of oestrone.
of their resemblance to their referents. Most opposition to Goodman's radical conclusion has come from those arguing that resemblance is essential to pictorial representations like photographs. But his extreme conventionalism is also responsible for his inability to identify the difference between diagrams and text. In arguing that resemblance is neither sufficient nor necessary for pictorial representation, Goodman draws the conclusion that resemblance is irrelevant to pictorial representations, and this conclusion prohibits him from recognising any relationship between the form of the symbol and its referent besides a conventional relation between the two. The relation between visual representations and their referents actually does depend on systematic relationships between the visible form of these symbols and their meanings, and these systematic relations between symbol form and referent can include resemblance. Furthermore, recognition of the relation between the form of visual representations and their content is essential to understanding the difference between visual representations and linguistic representations. An account of that difference thus depends on first showing why Goodman's denotative theory of pictorial content fails. The insightful analysis of Craig Files (1996) provides just the resources needed.
Goodman starts his argument for the irrelevance of resemblance with the claim that resemblance is neither necessary nor sufficient for representation. The Duke of Wellington resembles his portrait, but he does not represent that picture. So, resemblance is not a sufficient condition for representation because the fact that the portrait resembles the Duke does not entail that it represents him. Resemblance is not a necessary condition for representation, either. Just about any mark could refer to just about anything - just by stipulating that the mark refers to that thing.
Goodman concludes that resemblance is irrelevant to pictorial representation. Goodman thinks that pictures function like labels; they pick out referents in the same way verbal predicates do. According to Goodman, "Picture of a man" is a misleading expression; the picture is a man-picture, it's a kind of thing we recognise by sight, just like we recognise the sequence of shapes in the written symbol "man". The form of the picture is not related to the picture's referent, any more than the form of the letters in "man" is related to the word's referent. The picture denotes what it does the same way the term "man" does: through a purely conventional connection between symbol and referent.
Craig Files (1996) shows where Goodman's argument breaks down. The problem is not in Goodman's conclusion that resemblance is neither necessary nor sufficient for a mark to refer to something. Rather, Files shows that Goodman's conclusion that resemblance is completely irrelevant to pictorial representation does not follow from this fact. Files' diagnosis is that Goodman has conflated two different questions about representation. The first question is: what does it take for a symbol to represent at all, or to be
about something? The second question is: in virtue of what does a symbol represent what it does; what determines its content? In showing that resemblance is neither necessary nor sufficient for representation, Goodman has shown that resemblance relations cannot explain why an object is a representation (is about something). Any object could, by stipulation, be used to represent anything. But this means only that resemblance is irrelevant to the first question. Files then claims that resemblance may be involved in answering the second question: resemblance between symbol and referent may well be what explains the content a particular representation has. Files concludes that representation is not totally conventional after all: all forms of representation depend on conventions which determine that those objects are representations, but some symbol systems also have a non-conventional aspect because the content of their representations is specified by resemblance relations.
Files is not explicit on this point, but he seems to conclude that the answer to the question about in virtue of what a representation carries the content it does need not invoke convention at all. But this claim is not warranted by Files' argument; in fact, Files' clarification of the issues provides the resources to show that there are actually two different ways in which convention is essential to pictorial representation. Any two objects resemble each other in some ways, and most pictures fail to resemble their referents in all respects (unless they represent themselves). Different kinds of pictures exploit different sorts of resemblances between picture and object - just consider a black and white photo of a person vs. a portrait in watercolours. So convention plays a role in answering the second question Files identifies (about the content of a symbol) as well as the first (about the capacity to represent at all).
This means that convention plays two essential roles in symbol systems, including visual symbol systems. The first is to determine which objects are representations, and the second is to determine which of the properties of both symbol and referent are relevant to a particular representation. The fact that convention plays an essential role in determining the contents of visual representations does not, however, imply that the relationship between a symbol and its referent is entirely arbitrary. In some systems, like natural languages, there is no relationship between the visible properties of terms and their referents, except for the convention of using a particular marking to refer to something. As a result, the visible form of text has a merely arbitrary relation to its content.³ In pictorial systems such as traditional oil painting, however, conventions of content determine which
³ This arbitrariness can be a great advantage: it provides a means to express very abstract ideas such as negation, which would be difficult or impossible to express with visual representations.
properties of a picture resemble which properties of the referent. These resemblance relations are determined at the level of the symbol system: they hold for all the pictures in that particular system. For this reason picture systems like photography can be both conventional and objective. They are conventional, because referents are represented in virtue of a conventionally determined subset of the resemblance relations holding between symbols and referents. Pictorial representation is also objective, because there is a relationship between individual symbols and their referents that holds due to properties of the objects involved, rather than by any humanly stipulated relationship between particular pairs of individuals. This conclusion applies to the visual symbol systems used in science. They are both conventional and objective. The symbol-referent relation is determined (conventionally) at the level of the symbol system. This relation holds between an individual character and its referent in virtue of the properties of each. The way content is related to the form varies among the different symbol systems used in visual representations. For example, the diagram in fig. 2 has the syntax of a linguistic system. The lengths of the lines do not refer to spatial features such as distance between atoms; lines simply refer to chemical bonds between atoms. The atomic characters in this system are lines and letters, which refer to bonds between atoms and types of atoms, respectively. The forms of these atomic characters are arbitrary with respect to what is represented. In this symbol system, however, the position of lines with respect to symbols for atoms is meaningful; a line terminating at a letter refers to a bond with an atom of the type denoted at that end; two lines joined at an angle represent a carbon atom at the intersection, that is bound to atoms at the other ends of the lines. 
The interpretive scheme through which the diagram is understood defines a relation between certain features of the form of the diagram and structural features of the compound.
3. ISOMORPHISM AND VISUAL REPRESENTATION
At this point we can return to the question of the difference between visual representations in general (including diagrams) and linguistic and mathematical representations. Recall that the syntactic and semantic concepts developed by Goodman do not account for such a general difference; in fact, his work points out the syntactic features that figures like the diagram have in common with text. But the discussion has shown that the forms of visual representations are related to their content in a way that is different from the relation that holds between linguistic representations and their referents. Saussure is credited with pointing out that linguistic representations are
characterised by a serial structure. Not only are the forms of their atomic characters arbitrarily related to their referents, the identity of complex characters is determined just by the sequence of atomic characters. Written sentences can be meaningful without the contribution of spatial features such as distance between words or font. Visual representations, on the other hand, are characterised by a 2D format. In contrast to text, all visual representations have at least one spatial feature that is interpreted as referring to some feature of the thing represented. Visual representation is not a matter of an arbitrary denotative relation between individual symbols and their referents; for all visual representations, including diverse types such as micrographs and diagrams, some aspect of the spatial form of a figure is relevant to the content it conveys, due to conventions on interpreting spatial features of symbols from a particular symbol system. This characterisation of the difference between visual representations and text (as the difference between spatially vs. serially formatted symbols) is more abstract than visible resemblance. The more abstract characterisation is essential to an account of visual representations that includes those used in science. The relation of perceptual resemblance is too narrow a concept to account for the difference between visual representations in general and serial representations, because visible resemblance is not the relation that determines the content of most scientific figures. Many of the subjects of scientific research are not visible at all, and visible features of figures are often interpreted as referring to properties and relations to which they have no visible resemblance. For example, there is no visible resemblance between the curve in a graph of gas pressure vs. temperature and the relation holding between those properties.
On the other hand, resemblance in general is too broad a concept; it includes many relations that are irrelevant to the symbol-referent relationships in science. What is needed is a concept that rules out such irrelevant relations as the identity relation, or the "being referred to in the same sentence" relation. But there does seem to be an important similarity between these symbols and the facts they represent. Many have noted the isomorphism between visual representations and their referents; Lee (1999) develops an account of pictorial reference that is based on structural mapping of relations between symbols and referents. Recall that the characteristic feature of visual representations is the fact that they are symbols from systems in which some spatial properties are interpreted to mean something about the referent. For this reason, all visual representations are structurally related to the phenomena they represent. In the context of discussions of visual representations, "isomorphism" is used informally, to refer to the sameness of structure holding between the symbol and referent. Isomorphism is a technical concept for
mathematicians and logicians, with a precise definition. Two set-theoretic structures are isomorphic just in case there is a one-to-one mapping from the elements of each onto the elements of the other, and there is a similar mapping between the relations holding among each structure to those of the other. A is isomorphic to B iff there is a one-to-one mapping $f\colon A \to B$ such that for all elements $x$ and $y$ in A, and all relations $R$, $x\,R\,y$ iff $f(x)\,S\,f(y)$ for the corresponding relation $S$ on B. The technical sense of isomorphism applies to some visual representations. For diagrams, the interpretive conventions define an isomorphic function between atomic characters (and their spatial arrangement) and objects (and relations). For example, the chemical diagram (fig. 2) stands in an isomorphic relationship to an arrangement of atoms in space. The isomorphism does not hold between the symbol and the molecule, because not all features of the molecule are represented by the diagram. The isomorphism holds between elements of the visible form of the symbol and the content of the interpreted symbol: viz. particular features of the molecule. [4] The interpretive conventions define a relation of partial homomorphism between the atomic characters and the molecule itself: the actual object has other properties, and other relations, not related to those of the symbol through the interpretive conventions. For visual systems in which this concept of isomorphism does apply, there is no difference in the kind of content conveyed compared to that of text. Even though facts are represented by a system in which the symbol is interpreted in terms of an isomorphic function between the form of the symbol and its referent, the referent will be a state of affairs composed of objects and relations holding among them, and this is no different from the content typically represented by serial representations. But diagrams are not the only kind of visual system.
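The technical definition of isomorphism given above can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: each structure is reduced to a set of elements plus a single binary relation (the definition allows several), the search is brute force, and the names `find_isomorphism`, `marks`, `joined`, `atoms` and `bonded` are hypothetical, not taken from any cited work. The toy data stand for the letters and lines of a diagram of a three-atom molecule and for the atoms and bonds it is read as representing.

```python
from itertools import permutations


def find_isomorphism(elems_a, rel_a, elems_b, rel_b):
    """Return a one-to-one mapping f with: x R y iff f(x) S f(y), or None.

    rel_a and rel_b are sets of ordered pairs (one binary relation each).
    Brute force over all bijections, so only suitable for tiny structures.
    """
    elems_a, elems_b = list(elems_a), list(elems_b)
    if len(elems_a) != len(elems_b):
        return None
    for image in permutations(elems_b):
        f = dict(zip(elems_a, image))
        if all(((x, y) in rel_a) == ((f[x], f[y]) in rel_b)
               for x in elems_a for y in elems_a):
            return f
    return None


def symmetric(pairs):
    """Close a set of pairs under symmetry (a line joins both of its ends)."""
    return set(pairs) | {(y, x) for x, y in pairs}


# Hypothetical diagram: three letters, two lines joining them.
marks = ["O_mark", "H_mark_1", "H_mark_2"]
joined = symmetric({("O_mark", "H_mark_1"), ("O_mark", "H_mark_2")})

# Hypothetical content: three atoms, two bonds.
atoms = ["O", "H_a", "H_b"]
bonded = symmetric({("O", "H_a"), ("O", "H_b")})

mapping = find_isomorphism(marks, joined, atoms, bonded)

# A structure with an extra H-H bond has a different shape, so no mapping exists.
ring = bonded | symmetric({("H_a", "H_b")})
no_mapping = find_isomorphism(marks, joined, atoms, ring)
```

In this toy case any mapping that is found must send the letter with two lines to the atom with two bonds, which is the sense in which the structure of the symbol carries the content.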
[4] As Lee's (1999) discussion shows, the isomorphism really holds between particular visible features of the symbol - those that are interpreted - and features of the referent. Lee describes this as a relation between an abstraction of the symbol and an abstraction of the referent object.
Pictorial systems have dense and inarticulate syntax, and dense and inarticulate semantics as well. These are the features that characterise relations holding among the set of characters on the one hand, and among the set of referents on the other. However, when these factors occur in a system in which some spatial features are interpreted (a visual symbol system), the result is a connection between these syntactic and semantic system characteristics that produces a particular kind of symbol-referent relation. For some visible feature of the symbol, any difference in that feature is correlated with a difference in referent. Pictorial systems are thus capable of representing a very dense set of properties. Furthermore, because these properties are represented by visible
features of the symbol (most fundamentally, spatial features) and because humans are adept at comprehending complicated visual forms, pictorial systems can express extremely complex properties. This makes pictorial forms of representation extremely useful in science. Pictorial symbol systems provide a way to communicate about very complicated properties, even when there are no linguistic terms for those properties. Figure 1, for example, is a representation that was produced from using some new techniques with a type of biological material that was not well understood. The experiment provided surprising information about the very complicated structural features of the sample scanned. The micrograph is a member of a system with pictorial syntax, but like representations with the articulate syntax shared by text and diagrams, there is a difference between the relation holding between the visible symbol and its content on the one hand, and between the visible symbol and the subject of the visual representation. A pictorial representation like an electron micrograph does not represent all features of the sample, and so does not stand in an isomorphic relation to the sample. So if a micrograph is isomorphic to its content, then it has a relation of partial homomorphism to the subject of the figure. It does seem to be isomorphic to its content: the spatial features of the sample represented by the micrograph. But is this an isomorphism in the same sense as that defined for diagrams? Pictorial representations are not composed of discrete atomic characters. The interpretive conventions do not map one set of discrete objects (atomic characters or their composites) to another. Instead, the form of the character, as a whole, is mapped directly to some (usually very complex) property of the referent. So the technical meaning of isomorphism does not capture what is common to both pictorial and diagrammatic visual representations. However, the informal sense of the term does.
4. CONCLUSION
Where do we stand at this point? The role of convention in visual representation has been clarified: it is essential not only to determining which objects are representations, but also to determining which visible properties of a symbol are correlated with properties of the referent. This does not eliminate the relevance of relations between symbol form and content, including resemblance. In fact, the fundamental difference between visual representations and serial representations depends on such relations: what all visual representations have in common is that some spatial relations of the symbol are interpreted as referring to some feature of the referent. As a
result all visual representations stand in (either the loose or strict sense of) an isomorphic relation to the content they convey. Goodman's analysis of symbol systems serves to identify the differences between visual symbol systems, and shows what some visual symbols have in common with linguistic representations. Some visual representations in science have the syntactic character of text (and so support compositionality, in virtue of the discrete atomic characters, like diagrams do). Other figures (like electron micrographs) have the syntactic and semantic features of pictorial representations. These results conflict with a tacit assumption that seems to influence many of those trying to account for the difference between pictorial representations and text: the assumption that symbol types can be defined in terms of exclusive sets of properties. The difference between spatial and serial formatting may distinguish all visual representations from all text, but it does not explicate the differences between electron micrographs and diagrams. Similarly, the distinction between pictorial syntax and semantics vs. linguistic syntax does not account for the difference between diagrams and textual or mathematical symbols (Table 1). All this is, of course, merely a first step toward providing an account of the roles visual representations play in science. But giving an account of figures as representations has already produced some intriguing results. Isomorphic relations between symbol form and content determine the referents of individual symbols of a visual system. Such a relation holds between the form of the symbol and the features represented; visual representations are not usually designed to represent all features of the referent. A verbal description can be a completely accurate representation of some of an object's properties even though it does not describe all the object's properties. 
Table 1 Types of symbol systems

| Syntax and semantics | Serial | Spatial |
| --- | --- | --- |
| Articulate syntax | Linguistic representations: text, logical formulas, mathematical formulas | Diagrams |
| Syntactically and semantically dense and inarticulate | | Pictorial representations: MRIs, some graphs, electron micrographs |

Similarly, accuracy of visual representations in science does not require representation of all features of the object of study. Furthermore, figures represent properties in virtue of an isomorphic relation between symbol and referent: they do not have to actually share properties,
especially visible properties. This is good news for visual representations in science: figures are often used to represent phenomena that cannot be seen. So it is not possible for any visual resemblance to hold between the figure and what it represents. There are, of course, many intriguing questions still on the table. The analysis so far has not provided any reason to think that the kind of content conveyed by pictures is different from that expressed by linguistic representations. The 2D formatting of visual representations alone does not imply a difference in content type because some visual symbol systems have the syntactic and semantic structure of linguistic systems, and the content expressed by figures in such a system can be expressed linguistically as well. Is there any reason to think that pictorial representations convey a different kind of content than linguistic representations? The difference between diagrams, text, and mathematical symbols on the one hand and pictorial representations on the other has to do with syntactic and semantic properties of the different symbol systems to which they belong. Why think that density of characters, or inability to determine exactly which character a mark instantiates, makes a difference to the content a picture conveys? Goodman's analysis might lead to a radical conclusion after all: in spite of the dramatic difference in how pictures look compared to text or diagrams, pictorial representations simply refer to very complicated properties. There is not a difference in kind of content after all.
REFERENCES
Baigrie, B. (Ed.), 1996. Picturing Knowledge: Historical and Philosophical Problems Concerning the Use of Art in Science. University of Toronto Press, Toronto.
Fernández-Morán, H., 1962. Circulation 26, 1039-1065.
Files, C., 1996. Goodman's rejection of resemblance. Br. J. Aesthetics 36, 398-402.
Goodman, N., 1976. Languages of Art: an Approach to a Theory of Symbols. Hackett Publishing Company, Indiana.
Lee, J., 1999. Words and pictures - Goodman revisited. In: Paton, R., Nielson, I. (Eds.), Visual Representations and Interpretations. Springer-Verlag, London.
Taylor, P., Blum, A., 1991. Pictorial representation in biology. Biol. Philos. 6, 125-134.
Studies in Multidisciplinarity, Volume 2 Editor: G. Malcolm © 2004 Elsevier B.V. All rights reserved.
4
Emerging descriptions in molecular biology
J. H. Parish
School of Biochemistry and Molecular Biology, The University of Leeds, Leeds LS2 9JT, UK
Certain molecules allow for the existence of chirality ("handedness") and, importantly, the existence of two or more chiral centres in one molecule (or in interacting molecules) leads to a type of structural non-equivalence known as "diastereoisomerism". Several conventions have been adopted for representing such molecules as sketches or strings. Biological macromolecules are composed of smaller building blocks and contain many chiral centres. Several properties and functions emerge in biological macromolecules including the pathway for their biosynthesis. In considering whether, in some sense, the property of being a description has emerged with DNA, the answer must be "probably", given the existence of a molecular toolkit and the recognition of constraints in interpretation.
1. COMPLEXITY AND SIMPLIFICATION
1.1. Emerging properties
The essential physical principles underlying our universe are few in number (possibly there is fundamentally just one) but as the components become larger in number, novel properties emerge. Some kind of classification is useful for describing this in more detail. Chandler (1996, 1998) has considered the semiotics of complex systems and, although the use of his notation for mathematical structure is not needed for our discussion, the notation itself is useful for summarising levels of complexity (table 1).
Table 1 Summary of Chandler's notation for organised systems

| Symbol | Class of object | Notes |
| --- | --- | --- |
| O° | Subatomic particles | |
| O° | Atoms | |
| O° | Molecules | |
| O° | Biomacromolecules | |
| O° | Cells | Living objects having a boundary and sustained by a genetic system. A multicellular organism is "a cell" in this context |
| O° | Ecoment | The surrounds of a cell in the above sense. Nutrients and external signals (such as stimuli) are parts of the ecoment |
| O° | Environment | |

Note belonging to one of the first four classes (row not recoverable from the layout): This class includes ions
The Notes column is empty if the class description corresponds to everyday scientific meaning.
Biological chemistry involves O° and is relevant to studies on O°. We start by considering the emergent properties in O° and issues of representation and interpretation that arise from the properties of molecules. Although, with certain exceptions (e.g. a diamond), molecules are submicroscopic, their properties in 3D can be accurately deduced and there are methods for representing these. With biological molecules (and also macromolecules, O°) an essential feature is handedness, which is described and represented in a number of ways.
1.2. Handedness
The most familiar handed structures are our own hands. In fig. 1, we have cartoons of human hands using a simple convention. If we look at our left hand in a mirror, we see an image that could be superimposed on our right hand. We are not constrained about which way up the hand is (fig. 1b). In fig. 1c, we look at a left-right pair of hands and in the reflected images we emphasise that we are not concerned about the relative positions of the hands. However, fig. 1c,d shows another consequence of having pairs of hands. The "transition" from c to d could not simply be achieved with a mirror: the "?" in the figure shows the effect of taking the mirror image of the black hand and leaving the white hand out of this "looking glass world". We end up with a new pair (d), which can be superimposed neither on (c) nor on the mirror image of (c), however we turn over or reorient the components of the pair.
Fig. 1. (a) Top and bottom views of a left hand (on the left): the transformation (M) is the effect of reflecting the hand in a mirror to generate a right hand. (b) A left hand and, following M, its right-hand reflection in three different orientations. (c) A left-right pair of hands and (following M) two different arrangements of the right-left reflection. (d) A right-right pair and (following M) its left-left reflection. The transformation ("?") from (c) to (d) is discussed in the text.
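The argument about pairs of hands can be restated as a short sketch. This is only an illustration under simplifying assumptions: a hand is reduced to the label "L" or "R", a pair is treated as unordered (the text says the relative positions of the hands do not matter), and the function names are hypothetical. It shows that reflecting a whole left-right pair gives back a left-right pair, whereas reflecting only one hand (the "?" step) gives a pair that matches neither the original nor its mirror image.

```python
def mirror_hand(hand):
    """Reflect a single hand: left becomes right and vice versa."""
    return {"L": "R", "R": "L"}[hand]


def canonical(pair):
    """Treat a pair as unordered by sorting its labels."""
    return tuple(sorted(pair))


def mirror_pair(pair):
    """Reflect every hand in the pair (the transformation M)."""
    return canonical(mirror_hand(h) for h in pair)


def mirror_one(pair, index):
    """Reflect only one hand of the pair (the '?' step in fig. 1)."""
    hands = list(pair)
    hands[index] = mirror_hand(hands[index])
    return canonical(hands)


pair_c = canonical(("L", "R"))        # a left-right pair, as in fig. 1c
pair_d = mirror_one(pair_c, 0)        # reflect one hand only, as in fig. 1d

# M leaves the left-right pair equivalent to itself,
# but the new pair is neither pair_c nor its mirror image.
same_after_mirror = mirror_pair(pair_c) == pair_c
new_kind_of_pair = pair_d not in (pair_c, mirror_pair(pair_c))
```

The same bookkeeping, with ordered positions instead of unordered pairs, is what distinguishes mirror-image molecules from diastereoisomers when a molecule has two chiral centres.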
Figure 2 illustrates the same principles as fig. 1 but with molecules. The C atoms shown as a white "C" on a black disc are "asymmetric" because the four substituents, arranged in 3D as the vertices of a tetrahedron, are all different. In fig. 2a, we emphasise that rotation of the structure (shown for the mirror image) does not affect the structure. Figure 2b,c illustrates the consequences of having two such asymmetric C atoms in the same molecule. The two identical structures for the mirror images in fig. 2c emphasise that the bond between the asymmetric C atoms is an axis of free rotation. The chemical description of these two structures is that they are different conformations. The asymmetric C atoms are sometimes said to be "chiral" and the different structures of (b) and (c) are described as "diastereoisomers".

Fig. 2. Molecules. (a) A "handed" molecule with a mirror transformation (M); the mirror image is shown from two points of view. (b) and (c) These correspond to pairs of hands in fig. 1(c) and (d). In (c) an alternative version of the mirror image is generated by rotating the top part of the structure with respect to the lower part.

Table 2 Rules for drawing hands and molecules such as those in figs. 1 and 2

| Objects | Viewpoint/conformation | Conventions |
| --- | --- | --- |
| Hands | From the back of the hand | Arrow (→) from wrist to fingers; line (-) for thumb |
| Certain molecules | Look at each chiral C atom so that attached C atoms are away from the viewpoint and the other two are towards the viewpoint; use a conformation so this applies to all chiral C atoms | Represent C-C bonds just as a line; do not bother to draw C-H bonds |

Fig. 3. Application of the rules of table 2 to structures in figs. 1 and 2 and introduction of two amino acids. Panels: left and right hands; D- and L-glyceraldehyde; D-erythrose and D-threose; L-alanine; L-proline. The "CH3" in L-alanine is not required but is normally included in biochemistry textbooks.

The diagrams in figs. 1 and 2 are rather clumsy attempts at representing 3D structures in two dimensions. They are not simply projections and rely on viewpoints and the use of lines of different types. For many years, chemists have adopted simple rules for drawing molecules such as those in fig. 2, which are true representations of the absolute configuration. In table 2 we summarise the rules and also generate some for representing hands. Using the conventions of table 2, we can represent some of the structures in figs. 1 and 2 (see fig. 3). Clearly the rules for representing hands could be used for instructing an artist about a design with a lot of people and mirrors in it. The challenge is, to us, a 1D or "prose" description of the structures. Here we can leave our hands
because the problem is trivial (e.g. call them L and R). There are several descriptive methods of summarising the chemical structures shown in fig. 3. The names given in the figure are either out of date (glyceraldehyde) or are trivial (the other names). However, before describing a compact description, we need to clear up the identification of the chirality of the asymmetric atoms. The only physicochemical difference between D- and L-glyceraldehyde derives from an optical property: the sense in which solutions of the substance rotate the plane of plane polarised light. Experimentally and historically, such substances were said to be "optically active" and described as d (dextro) or l (laevo). Biological chemists soon realised that most naturally occurring carbohydrates (including sugars) had the same configuration of the "bottom-most" (or only) chiral atom as D-glyceraldehyde; these are referred to as the D-series, irrespective of their optical activity. The amino acids that form the building blocks in proteins are, in contrast, described as being in the L-series (see alanine and proline in fig. 3). The D- and L-nomenclature, although widely used in biochemistry, is not
universally applicable: it does not cope with chiral atoms in molecules that cannot readily be related to carbohydrates and amino acids. However, a strict set of chemical rules (Cahn-Ingold-Prelog) is of limited application in biological chemistry because it fails to provide an intuitive designation for families such as the L-amino acids. We are now ready to introduce a convention devised for representing chemical structures and referred to as SMILES strings (Weininger, 1988). The rules are based on the kind of simplification we introduced in fig. 3, with the difference that here the bonds are taken as implicit. From the point of view of representation, the interesting feature of SMILES is the way in which the configuration of chiral atoms is presented. L-alanine (fig. 3) is represented as N[C@@H](C)C(=O)O. The @@ means that if the viewer looks from the N atom towards the chiral C atom, the next named atoms are observed in that order clockwise (a single @ is anticlockwise). Before leaving this point, L-proline is a simple example of how a molecule with a ring or loop in the structure can be represented by a string: N1[C@@H](CCC1)C(=O)O. SMILES provides an interesting metaphor for the discussion of O° (table 1 and sections 1.1 and 3.3). However, it differs in one important aspect: the mapping of SMILES strings to a structure (with the exception of very simple strings such as CC for ethane) is asymmetric in the sense that there is no rule for deciding where to start.
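Because there is no rule for where a SMILES string starts, the same chiral centre can be written with its neighbours named in a different order, and the @ or @@ tag then has to be adjusted. The sketch below illustrates the bookkeeping only; it is not a SMILES parser. It relies on the standard rule that exchanging two neighbours of a chiral atom inverts the tag, so an odd permutation of the neighbour order swaps @ and @@ while an even one leaves the tag unchanged. The function names and the neighbour labels ("N", "H", "CH3", "COOH") are hypothetical shorthand for the four substituents of the chiral C atom of alanine.

```python
def permutation_parity(old_order, new_order):
    """Return 0 if new_order is an even permutation of old_order, else 1."""
    index = [old_order.index(item) for item in new_order]
    inversions = sum(
        1
        for i in range(len(index))
        for j in range(i + 1, len(index))
        if index[i] > index[j]
    )
    return inversions % 2


def retag(tag, old_order, new_order):
    """Chirality tag ('@' or '@@') after re-listing the same neighbours."""
    if sorted(old_order) != sorted(new_order):
        raise ValueError("both orders must list the same neighbours")
    if permutation_parity(old_order, new_order):
        return "@" if tag == "@@" else "@@"
    return tag


# L-alanine as N[C@@H](C)C(=O)O: neighbours are named N, (H), CH3, COOH.
from_n = ("N", "H", "CH3", "COOH")

# The same molecule written starting from the methyl group.
tag_methyl_first = retag("@@", from_n, ("CH3", "H", "N", "COOH"))
tag_acid_before_n = retag("@@", from_n, ("CH3", "H", "COOH", "N"))
```

Under these assumptions the first rewriting corresponds to a string of the form C[C@H](N)C(=O)O and the second to C[C@@H](C(=O)O)N: different strings, one structure.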
1.3. Macromolecules
Macromolecules are polymers. Of relevance to this section are proteins (poly amino acids referred to by biochemists as "polypeptides") and DNA (a polynucleotide). The sequence in fig. 4 is an example of a very small protein called "pCro". Amino acids are all L, and abbreviated to single letters, e.g. A and P are alanine and proline (fig. 3). The line-break is just to allow the sequence to fit on the page but the direction of the sequence is significant, e.g. "MEQR" (the left end of the sequence) is chemically quite different from "RQEM". If one imagines the structure of pCro being drawn in the kind of detail of fig. 2, clearly there are vast numbers of possible conformations (usually called "folds" in the protein science community). However, important properties emerge in O° (table 1).

MEQRITLKDYAMRFGQTKTAKDLGVYQSAINKAIHAGRKIFLTINADGSVYAEEV
KPFPSNKKTTA
Fig. 4. Amino acid sequence of pCro.

In some cases (but not pCro), a small
number of folds is very much more stable because the vast number of interactions between parts of the amino acid chain add up to generate great stability. Briefly, we must consider methods of representing such structures. The structures are stored in databases in which the principal atoms are represented in a Cartesian coordinate system and the viewing of such structures relies on powerful molecular graphics programs such as RasMol (Sayle and Milner-White, 1995). In fig. 5, two alternative representations of the pCro fold are shown. In (a) the principal bonds are shown in a convention similar to that in fig. 3. It is important to emphasise that the picture only shows the bonds and that the structure is not "empty" but full of atoms (spheres). However, a striking feature of the structure is not apparent in a flat static view such as in fig. 5a: in fig. 5b the structure of pCro is drawn as a cartoon that emphasises the geometry of the chain of amino acid residues. The fold includes several "secondary structure elements" and in pCro (in contrast to many proteins) these are all helical. The helices themselves are chiral and, although we do not pursue this point in this chapter, there are chiral alternatives as we "go round the corners" from one secondary element to the next. In the following section, some generalisations about macromolecules are considered but we can look now at the interactions between macromolecules involving pCro: it is a regulatory molecule that, when bound to certain specific DNA sequences, completely changes the pattern of gene expression in the bacterium in which it is synthesised. Figure 6a shows, using the cartoon representation of fig. 5b, two molecules of pCro bound to a cognate DNA sequence. The DNA double helix is at the bottom of the cartoon. Although pictures such as these emphasise the different chiralities involved, they do not illustrate the fact that there are many molecular contacts involved in the sequence.
Fig. 5. Two representations of the 3D fold of pCro. Both views are of the same face of the structure, which is very approximately globular.
Fig. 6. Two representations of two molecules of pCro bound to DNA.

In fact, pCro illustrates a general feature of many specific DNA-binding regulatory proteins. Figure 7 shows a DNA sequence to which pCro will bind. The two partial sequences underlined in fig. 7 are almost identical (G and A are similar; likewise C and T): a sequence such as that in fig. 7 is said to be "partly palindromic". There is thus a symmetry component to the very strong three-component complex of fig. 6: one pCro sub-unit binds to one of the two half-palindromes and also to the other pCro sub-unit.
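The notions of complementary strands and of a "partly palindromic" sequence can be checked mechanically. The sketch below is an illustration only: it uses the two strands as printed in fig. 7, the function names are hypothetical, and the 14-base window is an assumption chosen for the example, since the underlining that marks the half-palindromes in the figure is not reproduced in the text. A perfectly palindromic double-stranded sequence reads the same as its own reverse complement; a partly palindromic one matches at only some positions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Base-by-base partner strand, written in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq):
    """Partner strand read in its own 5' to 3' direction."""
    return complement(seq)[::-1]


def palindrome_matches(seq):
    """Number of positions at which seq equals its reverse complement."""
    return sum(a == b for a, b in zip(seq, reverse_complement(seq)))


TOP_STRAND = "ATACAAGAAAGTTTGTACT"      # upper strand of fig. 7, 5' end on the left
BOTTOM_AS_PRINTED = "TATGTTCTTTCAAACATGA"  # lower strand, 5' end on the right

strands_pair_up = complement(TOP_STRAND) == BOTTOM_AS_PRINTED

# Assumed window for illustration (not the underlining of fig. 7).
WINDOW = TOP_STRAND[2:16]
window_matches = palindrome_matches(WINDOW)
window_length = len(WINDOW)
```

With this window the outer four bases on each side match their reverse-complement partners and the central bases do not, which is the pattern one would expect of two similar half-sites separated by a spacer.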
2. MACROMOLECULAR ARCHITECTURE, FUNCTION AND GENES
2.1. Folds, domains and functions
There is a finite number of protein folds: there are three different hierarchical classifications (http://www.ebi.ac.uk/dali/fssp/fssp.html; http://scop.mrc-lmb.cam.ac.uk/scop/; http://www.biochem.ucl.uk~ms/cath/) and here we just refer to a "fold" as a recognisable 3D structure of a group of proteins. We shall refer to proteins having more-or-less similar folds as having related structures. Proteins are important because of their functions. Examples of functional classes of proteins are structural proteins, enzymes (catalysts for metabolic reactions), toxins, storage proteins, receptors (for hormones, taste, smell, etc.) and components of signal transduction pathways.

5' ATACAAGAAAGTTTGTACT
   TATGTTCTTTCAAACATGA 5'
Fig. 7. One of the DNA sequences to which pCro binds (e.g. in fig. 6). The directions of the two strands of DNA are represented as running from the end labelled 5'. The two DNA sequences are said to be complementary (A opposite T and G opposite C). Note that this is a DNA sequence and A, C, G and T are abbreviations for nucleotides and do not correspond to the amino acid abbreviations in fig. 3.

Thus protein
function is central to O° (table 1). We return to this point in section 3 but here a few generalisations about proteins are relevant. The only protein actually shown in this chapter (fig. 5) is very small. Some proteins are extremely large and many large proteins have a modular architecture. The modules are referred to as "domains" by protein scientists (although they are not all agreed about the definition of this). Are there any generalisations about relatedness of sequence, structure and function? The short answer is, "not many". One certain point is that proteins of different functions may have related structures. Thus, a successful architecture may be recruited to perform different functions. Within the functional group of enzymes, related enzymes more usually do not have similar structures. Relating sequence to structure and function is difficult. There are good examples of closely related folds whose members have little discernible sequence relatedness. Closely related sequences typically result in closely related structures, although the "protein folding diseases" (including BSE and Alzheimer's disease) represent important exceptions.
2.2. Gene expression
In cells, the genetic information is contained in the DNA and so, for example, there must be a DNA sequence corresponding to pCro (fig. 4): it is shown in fig. 8. Deducing the sequence in fig. 4 from fig. 8 is simple: it involves looking up the genetic code. The stages involved in this in a living cell are summarised in fig. 9 and table 3. Table 3 is oversimplified: for example, processes other than 1 and 4 are regulated. The other functions of macromolecules are to serve as structural components of cells (in the biochemists' sense of the word), extracellular material, enzymes and components of signal transduction. The distinction between enzymic catalysis and signal transduction is partly historical, as enzymes can be regarded as information processing systems (Paton et al., 1996), but in processes 3, 4 and 8 (table 3 and fig. 9) RNA plays roles that are played by protein in other metabolisms. This may reflect residuary activities from a pre-biotic "RNA world". RNA enzymes are called "ribozymes".

ATGGAACAACGCATAACCCTGAAAGATTATGCAATGCGCTTTGGGCAAACCAAGACAGCT
AAAGATCTCGGCGTATATCAAAGCGCGATCAACAAGGCCATTCATGCAGGCCGAAAGATT
TTTTTAACTATAAACGCTGATGGAAGCGTTTATGCGGAAGAGGTAAAGCCCTTCCCGAGT
AACAAAAAAACAACAGCATAA
Fig. 8. The sequence of the cro gene (corresponding to pCro); only one strand is shown.
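Looking up the genetic code, as described in the text, is easy to mechanise. The sketch below is an illustration added here, not the author's procedure: it builds the standard genetic code from the conventional TCAG ordering of bases and translates the strand of fig. 8 codon by codon until a stop codon is met. The names `GENETIC_CODE`, `CRO_GENE` and `translate` are assumptions of this example.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# The cro gene as shown in fig. 8 (one strand only).
CRO_GENE = (
    "ATGGAACAACGCATAACCCTGAAAGATTATGCAATGCGCTTTGGGCAAACCAAGACAGCT"
    "AAAGATCTCGGCGTATATCAAAGCGCGATCAACAAGGCCATTCATGCAGGCCGAAAGATT"
    "TTTTTAACTATAAACGCTGATGGAAGCGTTTATGCGGAAGAGGTAAAGCCCTTCCCGAGT"
    "AACAAAAAAACAACAGCATAA"
)

def translate(dna: str) -> str:
    """Translate a coding strand into one-letter amino acid codes, stopping at a stop codon."""
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = GENETIC_CODE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate(CRO_GENE))
```

In the cell the same lookup is carried out by the machinery of processes 2 to 4 in table 3, which is why the text calls the deduction simple on paper while the toolkit behind it is not.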
[Figure 9: diagram of processes 1-9 linking new, repaired and recombined DNA, RNA, functional protein, and RNA and protein turnover.]

Fig. 9. Outline of gene metabolism and gene expression. The two processes drawn in grey only occur in cells infected with certain viruses. Names for processes 1-9 are in table 3.

3. RULES AND DESCRIPTIONS IN MOLECULAR BIOLOGY

3.1. From genes to proteins
Relationships such as that between cro (fig. 8) and pCro (fig. 4) are straightforward. The expression of the cro gene is regulated because the DNA sequence of fig. 8 is surrounded by sequences that indicate "start under regulated circumstances" and "stop" (in some cases, but not cro, also "under regulated circumstances"). The toolkit for such expressions is represented by processes 2-4 in table 3 (in fact, process 2 does not occur with cro messenger RNA but it is required for ancillary RNA molecules involved in process 3 and, in general, messenger RNA may require modification itself). pCro is itself a protein involved in the regulation of transcription (process 2 of table 3) and is a part of a regulatory net.

Table 3
Macromolecules involved in gene metabolism and gene expression

No.  Process name                      Enzymes       Other components
1    DNA replication                   Protein       Protein, RNA
1    Regulation                        Protein       DNA, RNA
1    Error correction                  Protein       DNA, RNA
2    Transcription                     Protein       Protein
2    Regulation                        Protein       DNA, RNA
3    RNA modification and editing      Protein, RNA  Protein, RNA
4    Translation                       Protein, RNA  Protein, RNA
4    Regulation                        Protein, RNA  Protein, RNA
4    Error correction                  Protein       Protein, RNA
5    Protein folding and modification  Protein       Protein
6    DNA repair                        Protein       Protein, DNA
7    Genetic recombination             Protein       Protein, DNA
8    RNA turnover                      Protein, RNA  Protein, RNA
9    Protein turnover                  Protein       Protein

The numbers refer to fig. 9.
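Table 3 lends itself to a simple data structure that can be queried, for example to find the processes in which RNA acts as an enzyme. The sketch below is an added illustration; the rows are transcribed from table 3, and the assignment of the flattened cells to individual rows is an assumption of this example rather than something the source states explicitly.

```python
# Rows of table 3 as (process number, process name, enzymes, other components).
# The row-by-row assignment of cells is an assumption of this sketch.
TABLE_3 = [
    (1, "DNA replication", {"Protein"}, {"Protein", "RNA"}),
    (1, "Regulation", {"Protein"}, {"DNA", "RNA"}),
    (1, "Error correction", {"Protein"}, {"DNA", "RNA"}),
    (2, "Transcription", {"Protein"}, {"Protein"}),
    (2, "Regulation", {"Protein"}, {"DNA", "RNA"}),
    (3, "RNA modification and editing", {"Protein", "RNA"}, {"Protein", "RNA"}),
    (4, "Translation", {"Protein", "RNA"}, {"Protein", "RNA"}),
    (4, "Regulation", {"Protein", "RNA"}, {"Protein", "RNA"}),
    (4, "Error correction", {"Protein"}, {"Protein", "RNA"}),
    (5, "Protein folding and modification", {"Protein"}, {"Protein"}),
    (6, "DNA repair", {"Protein"}, {"Protein", "DNA"}),
    (7, "Genetic recombination", {"Protein"}, {"Protein", "DNA"}),
    (8, "RNA turnover", {"Protein", "RNA"}, {"Protein", "RNA"}),
    (9, "Protein turnover", {"Protein"}, {"Protein"}),
]

# Process numbers in which RNA appears in the "Enzymes" column.
rna_enzyme_processes = sorted(
    {number for number, _name, enzymes, _others in TABLE_3 if "RNA" in enzymes}
)
print(rna_enzyme_processes)
```

Under this reading of the table, the query returns the same process numbers that section 2.2 names as those in which RNA plays roles otherwise played by protein.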
3.2. From proteins to functions
The function of a protein is determined by its fold although there is not a one-to-one relationship (section 2.2). The relationship between sequences is well understood although there are several components in the necessary toolkit (fig. 9 and table 3). The meaning of a DNA or protein sequence is as a description of the protein's fold so it should be possible to describe the rules for relating the sequence (e.g. fig. 4) to its fold (e.g. fig. 5). The methods used by molecular biologists for doing this in the case of a sequence with no experimentally determined structure are of little help here because they are largely heuristic. However, the mechanisms involved can be described qualitatively. The acquisition of the "correct" fold does not necessarily require ancillary components and such rules as have been deduced are consistent with the idea that the availability of folding pathways is crucial. In other words, the "rules" are those of physical-organic chemistry determining the interactions between amino acid residues in the sequence. Interactions between proteins and other proteins and other macromolecular structures are central to regulation (e.g. fig. 6) and signal transduction, including interactions with the ecoment (0~ of table 1).

The power of protein-protein interactions is illustrated by the shapes of simple viruses. It has been recognised for many years (since the 1950s) that the genetic information in simple viruses is too limited to make the capsids (the protein envelopes that surround the viral DNA or RNA). Any text on virology will explain the solution, but we cite Fraenkel-Conrat (1969) as an authoritative work. Simple viruses are either stiff helical rods or appear to be approximately spherical. Figure 10a shows a disc composed of six subunits. In the case of tobacco mosaic virus, each subunit is a dimer (a pair of proteins).
Thus this protein has structural features that allow it to dimerise and allow six such dimers to form a disc with a hole in the middle. The virus structure itself can be envisaged by imagining such discs stacked up, with the genetic material (RNA in this case) running down the hole, but with a dislocation in the starting disc so that the layers of discs are replaced by a broad helix. The spherical viruses have icosahedral symmetry. The icosahedron (fig. 10b) is one of the Platonic solids, with 12 vertices, 30 edges and 20 faces (each an equilateral triangle). Such viruses contain several (e.g. 3) kinds of capsid subunit and it is the properties of these proteins that cause their interactions and the generation of a capsid with the symmetry properties of fig. 10b.

Fig. 10. (a) Top view of a "segment" of the tobacco mosaic virus capsid. Each of the six segments is a protein dimer. (b) An icosahedron.

3.3. Emerging descriptions
As one proceeds through the symbols from 0~ to 0~ of table 1, new properties emerge that require symbolic representations so that given understood conventions, complex structures can be represented in text or simple graphical conventions. The alternative issue is the extent to which a DNA sequence can be read to imply a protein sequence, structure and function and, beyond that do DNA sequences describe in some sense organisms and populations? We emphasise that the properties of handedness (section 1.2) are crucial to understanding the interactions of macromolecules and their functions. It is reasonable to suppose that the key emergent property of DNA as a genetic material has led to a new descriptive "language" and that the sequence may lead unambiguously to descriptions of organisms and possibly populations. The feature of the mechanisms of interpreting such a description is that they are highly constrained or subject to boundary conditions. In the case of protein folding, one such constraint is that the amino acid residues are constrained by the maintenance of the connectivity of the residues in the sequence. In the case of a cell and its signal transduction, an important constraint is the positions of membranes and other components of cell envelopes. A working view is that with DNA, description became an emergent property but that the interpretation of the description requires molecular toolkits and is highly constrained.
REFERENCES
Chandler, J.L.R., 1996. Complexity III. Emergence. Notation and Symbolization. WESScomm. 2, 34-37.
Chandler, J.L.R., 1998. Semiotics of complex systems: a hierarchical notation for the mathematical structure of a single cell. In: Holcombe, M., Paton, R. (Eds.), Information Processing in Cells and Tissues. Plenum Press, New York, pp. 185-195.
Fraenkel-Conrat, H., 1969. The Chemistry and Biology of Viruses. Academic Press, New York.
Paton, R.C., Staniford, G., Kendall, G., 1996. Specifying logical systems in cellular hierarchies. In: Cuthbertson, R., Holcombe, M., Kendall, G. (Eds.), Computation in Cellular and Molecular Biological Systems. World Scientific, Singapore, pp. 105-119.
Sayle, R.A., Milner-White, E.J., 1995. RasMol: Biomolecular Graphics for All. Trends Biochem. Sci. 20, 31-36.
Weininger, D., 1988. SMILES 1. Introduction and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31-36.
Studies in Multidisciplinarity, Volume 2
Editor: G. Malcolm
© 2004 Elsevier B.V. All rights reserved.
5

Modular design of the liver of the rat

Harald F. Teutsch

Department of Anatomy and Cellular Neurobiology, University of Ulm, Albert-Einstein-Allee 11, D-89069 Ulm, Germany
The idea of a modular design of the liver goes back to Wepfer's (1665) studies of the liver of the pig. In this species, individual parenchymal units can be made visible in tissue sections without difficulty by using routine histological staining procedures, because the modules are separated by connective tissue (Wuensche, 1981). The fact that the livers of most mammalian species (including man) lack these separating connective tissue sheets explains why the various attempts to define the modules in these livers have so far only led to the proposition of concepts that are, still today, a matter of debate (Kiernan, 1833; Mall, 1906; Rappaport et al., 1954; Rappaport, 1976). In the absence of the separating connective tissue, liver parenchyma thus appears morphologically homogeneous, and the modules in question cannot be identified with certainty. We have found that this problem can be solved by making cell function visible with enzyme histochemical techniques, in particular by the demonstration of glucose-6-phosphatase activity (Teutsch, 1988) and succinate dehydrogenase activity (Oeksuez, 1997), because this clearly marks the perimeter and centre of parenchymal units. Thus, it is possible to trace individual modules, together with the supplying vessels at the surface and the draining vessels in the centre of modules, through sequences of 15 µm thick cryosections. On the basis of these data, modules can then be reconstructed three-dimensionally and analysed morphometrically.
Fig. 1. Graphic representation of polyhedral primary units. Numerals indicate the number of faces, the volume in nl and the height in µm; the draining central vein is represented as a contour, supplying portal tracts are omitted.
In general, a module consists of the following structural elements:
1. supplying vessels (branches of the hepatic artery and portal vein) connected by vascular septa, which make up a continuous supplying surface;
2. sinusoids that take their origin from the vascular surface;
3. a central vein, located in the centre of the module, that drains the sinusoids; and
4. hepatocytes that are lined up along the sinusoids.

The modules are present as individual "primary units" that drain directly into a sublobular vein, or can be integrated in variable number into larger "secondary units" that are drained by a common central venous tree, the stem of which empties into a sublobular vein. So far, we have reconstructed 33 individual primary units and 10 secondary units (Schuerfeld, 1996; Oeksuez, 1997; Teutsch et al., 1999). The primary units are tetra- to heptahedral in shape with plane, convex or concave faces (fig. 1). Their heights vary from 310 to 1275 µm, their volumes from 0.154 to 0.653 mm³. The secondary units comprise between 3 and 14 primary units (fig. 2). Their heights vary between 550 and 2100 µm and their volumes between 0.224 and 3.300 mm³. The integrated primary units are tri- to heptahedral in shape, with faces that are plane, concave or convex. They have heights between 70 and 840 µm and volumes between 0.034 and 0.482 mm³. Individual primary units and secondary units are "attached" to a sublobular vein (into which the blood is drained), from where they either extend to the surface of the lobe or end at variable distances from the surface. Integration of a variable number of primary units into secondary units further increases the three-dimensional plasticity of the modular subdivision of the liver. From vascular casts of sublobular veins and the connected central venular trees, we know that the distribution of the units follows an organisational principle.
Accordingly, the size and complexity of those units that extend to the lobular surface increase along the length of a sublobular vein, i.e. from the narrow beginning to the wide end that is connected to the vascular stem of the hepatic vein. These units determine the shape of the different lobes in the liver of the rat. In this arrangement, individual primary units mainly serve to fill the variable gaps between neighbouring secondary units. With regard to the basic organisation, the primary units reconstructed from the liver of the rat are comparable to what is known about the lobular units of the human and pig liver (Wuensche, 1981; Matsumoto and Kawakami, 1982; Ekataksin and Wake, 1991). The concept of the acinus (Rappaport et al., 1954; Rappaport, 1976; Sasse et al., 1986, 1992), on the other hand, cannot be applied to the liver of the rat (Teutsch et al., 1999).
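The text quotes volumes in mm³ and heights in µm, whereas the figure captions give volumes in nl. The following short Python sketch, added here for convenience and not part of the original chapter, converts the quoted ranges using 1 mm³ = 1000 nl and 1 mm = 1000 µm; the function and variable names are illustrative assumptions.

```python
NL_PER_MM3 = 1000.0  # 1 mm^3 = 1 microlitre = 1000 nl
UM_PER_MM = 1000.0   # 1 mm = 1000 micrometres

def mm3_to_nl(volume_mm3: float) -> float:
    """Convert a volume from cubic millimetres to nanolitres."""
    return volume_mm3 * NL_PER_MM3

def um_to_mm(length_um: float) -> float:
    """Convert a length from micrometres to millimetres."""
    return length_um / UM_PER_MM

# Volume ranges quoted in the text, in mm^3.
volume_ranges_mm3 = {
    "individual primary units": (0.154, 0.653),
    "secondary units": (0.224, 3.300),
    "integrated primary units": (0.034, 0.482),
}

for name, (low, high) in volume_ranges_mm3.items():
    print(f"{name}: {mm3_to_nl(low):.0f} to {mm3_to_nl(high):.0f} nl")
```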
Fig. 2. Graphic representation of secondary units. Roman numerals indicate primary units integrated into a secondary unit. Other numerals indicate the number of faces, the volume in nl and the height in µm; the draining central vein is represented as a contour, supplying portal tracts are omitted.
REFERENCES
Ekataksin, W., Wake, K., 1991. Am. J. Anat. 191, 113.
Kiernan, F., 1833. Philos. Trans. R. Soc. Lond. 123, 711.
Mall, F., 1906. J. Anat. 5, 227.
Matsumoto, T., Kawakami, M., 1982. The unit-concept of hepatic parenchyma: a re-examination based on angioarchitectural studies. Acta Pathol. Jpn. 32 Suppl. 2, 285-314.
Oeksuez, M., 1997. Inaugural Dissertation, Universität Ulm.
Rappaport, A.M., 1976. Beitr. Pathol. 157, 215.
Rappaport, A.M., Borowy, Z.J., Lougheed, W.M., Lotto, N., 1954. Anat. Rec. 119, 11.
Sasse, D., Thurmann, R.G., Kauffman, F.C., Jungermann, K. (Eds.), 1986. Regulation of Hepatic Metabolism. Plenum Press, New York.
Sasse, D., Spornitz, U.M., Maly, I.P., 1992. Enzyme 46, 8.
Schuerfeld, D., 1996. Inaugural Dissertation, Universität Ulm.
Teutsch, H.F., 1988. Hepatology 8, 311.
Teutsch, H.F., Schuerfeld, D., Groezinger, E., 1999. Hepatology 29, 494.
Wepfer, J.J., 1665. De Dubiis Anatomicis. Epistola ad Jacob Henricum Paulli. In: Paulli, J.H. (Ed.), Anatomiae Bilsianae Anatome Occupata Imprimis Circa Vasa Meseriaca et Labyrinthum in Ductu Orifero. Argentorati, Simonem Paulli, pp. 93-100.
Wuensche, A., 1981. Zbl. Vet. Med. C. Anat. Histol. Embryol. 10, 342.
6

The Heisenberg group as a fundamental structure in nature

Ernst Binz^a, Sonja Pods^a and Walter Schempp^b

^a Lehrstuhl für Mathematik I, Universität Mannheim, 68131 Mannheim, Germany
^b Lehrstuhl für Mathematik I, Universität Siegen, 57068 Siegen, Germany
Heisenberg groups and their Lie algebras are fundamental objects in science and engineering because they provide a model for information storage and its transmission. Moreover, they determine the spin group which, together with the Schrödinger representation, models the double helix of a chromosome. They also yield the radar cross-ambiguity function, the classical quantisation procedure, the Newtonian gravitational field, a Minkowski space and, in consequence, Einstein's equation $E = mc^2$. All these results are based on the concept of information.
1. INTRODUCTION
The concept of information is fundamental in science and engineering. We show that Heisenberg groups and their Lie algebras, the Heisenberg algebras, serve ideally to model the storage and transmission of information. In fact, these notions are of such a basic character that the skew field of quaternions $\mathbb{H}$ and the spin group $SU(2)$ can naturally be constructed. These constructions exhibit a natural Minkowski metric on $\mathbb{H}$ as well as on the Heisenberg algebra and reproduce Einstein's famous equation $E = mc^2$. Comparing two Heisenberg algebra structures on the same underlying linear space allows us to construct the Newtonian gravitational field.
The Heisenberg group includes a periodic time scale on its centre and the plane of information input. The implementation of the Heisenberg group as a space of signals is done by the Schrödinger representation. This representation has a natural symmetry group, the symplectic group of the plane, whose elements preserve the local information. Its double covering gives rise to the metaplectic group. The Schrödinger representation naturally yields the ambiguity function in radar and optics. In conjunction with the Schrödinger representation, the symplectic and the metaplectic groups of the plane serve for the description of optical systems in geometric optics, such as the wave-fronts in Fresnel optics. The infinitesimal metaplectic representation naturally yields the well-known classical quantisation of homogeneous quadratic polynomials. The interplay between the Schrödinger representation and the spin 1/2 allows the geometric structure of chromosomes to be modelled.
2. THE NOTION OF INFORMATION, INFORMATION DENSITY AND INFORMATION PRESERVING LINEAR MAPS
Suppose we have a photograph and want to send it electronically to a friend. The photograph is a two-dimensional object and thus can be thought of as being embedded in a plane $F$ which itself is contained in a three-dimensional linear space $E$. When the picture was taken, signal intensities were chemically encoded into grey scales. In order to transmit the image, it is necessary to convert the black, grey and white spots on the image into a language of electromagnetic signals. We convert these grey scales into intensities, i.e. into half of the square of the amplitude of an electromagnetic wave. The electromagnetic wave can then be transmitted to its destination and converted back again. This evolution of information is obviously time-dependent. The grey scale of the photograph reflects an information density. Such a density is usually described by a real-valued positive density function $f$. Its integral reflects the total amount of information. To form the integral, the plane of the photograph has to be equipped with a volume form $\omega$, which is also a symplectic structure since $\dim F = 2$. Thus $F$ becomes a symplectic plane. Then the total information $I$ has the form

$$I = \int_F f\,\omega. \qquad \text{(Eq. 1)}$$
The integrand reflects the infinitesimal amount of information. The density function $f$ is used to analyse the local information content and yields in a natural way a probability density $p$, namely $p = f/I$. The negative of the logarithm $\log p$ of $p$ is called the entropy and is one of the most important ingredients in Shannon's information theory. It provides us with a kind of measure for information. In transmitting the photograph the total information certainly has to be preserved. In fact, we want more: if the information is rearranged, then the infinitesimal volume shall be retained. The preservation of both $I$ and $\omega$ amounts to preserving the integrand in Equation (1) and yields a continuity equation. Let us perform this rearrangement by a smooth map $\Phi$ from an open subset $N \subset F$ into $F$ in such a way that the total information is preserved. Mathematically formulated this is expressed as

$$I(\Phi(N)) = \int_N (f \circ \Phi)\,\Phi^*\omega = I(N). \qquad \text{(Eq. 2)}$$
Here $\Phi^*\omega$ at $x \in N$ is defined via the differential $D\Phi$ of $\Phi$ as

$$\Phi^*\omega(x)(v, w) := \omega\big(D\Phi(x)(v), D\Phi(x)(w)\big) \qquad \forall v, w \in F.$$
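For a linear map the differential is the map itself, so the pulled-back form can be evaluated directly. The sketch below is an added illustration under the assumption that coordinates are chosen in which the symplectic form is the standard area form; it checks numerically that the pullback rescales the form by the determinant, so that exactly the maps of determinant one preserve it. All names are hypothetical.

```python
def omega(v, w):
    """Standard area form on the plane, normalised so that omega(e1, e2) = 1."""
    return v[0] * w[1] - v[1] * w[0]

def apply(A, v):
    """Apply the 2x2 matrix A (given as a tuple of rows) to the vector v."""
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def pullback(A, v, w):
    """Pullback of omega under the linear map A, evaluated on v and w."""
    return omega(apply(A, v), apply(A, w))

shear = ((1.0, 0.7), (0.0, 1.0))  # determinant 1: preserves omega
scale = ((2.0, 0.0), (0.0, 1.0))  # determinant 2: doubles omega
v, w = (1.0, 2.0), (-3.0, 0.5)

for name, A in (("shear", shear), ("scale", scale)):
    print(name, pullback(A, v, w), det(A) * omega(v, w))
```

The shear rearranges points of the plane without changing areas, which is the kind of rearrangement the text requires of the transmission; the scaling does not.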
The goal of transmission is to recover the image as a whole; however, during the process of transmission, information is rearranged, for example by the lens of a camera. As a consequence, $\Phi$ is time-dependent. Differentiating both sides of Equation (2) with respect to time yields the continuity equation of entropy:

$$-\log\Big(\frac{1}{I}\, f \circ \Phi(t)\Big) = -\log\big(p \circ \Phi(t)\big) = \text{const.} \qquad \text{for each } t.$$
The spatial derivative $D\Phi$ evaluated at $h \in F$, i.e. $D\Phi(h)$, is a linear map preserving the symplectic structure $\omega$. The group of all such linear maps of $F$ is called the symplectic group and is denoted by $Sp(F)$. Thus $D\Phi(h) \in Sp(F)$. The symplectic group plays a fundamental role in geometric optics as we will see in section 3.
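The quantities introduced in this section can be illustrated on a discretised photograph. The following Python sketch is an added example, not the authors' construction: the pixel values, the pixel area and all names are hypothetical, the integral of Equation (1) is replaced by a sum over pixels, and a rotation of the pixel grid stands in for an area-preserving rearrangement.

```python
import math

# Hypothetical 3x3 grey-scale photograph: positive density values f on a pixel grid.
f = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
PIXEL_AREA = 0.25  # assumed area of one pixel, standing in for the volume form

def total_information(density, pixel_area=PIXEL_AREA):
    """Discrete version of Eq. (1): sum of density times area over all pixels."""
    return sum(value for row in density for value in row) * pixel_area

total = total_information(f)
p = [[value / total for value in row] for row in f]            # p = f / I
entropy = [[-math.log(value) for value in row] for row in p]   # -log p, pixel by pixel

# Rotating the grid by 90 degrees permutes the pixels without changing their area.
rotated = [list(row) for row in zip(*f[::-1])]

print(total, total_information(p), total_information(rotated))
```

The normalised density integrates to one, and the rearranged picture carries the same total information as the original, as the text demands of a transmission.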
3. INFORMATION TRANSMISSION
Our next step in formulating information transmission in mathematical terms is to provide an algebraic formulation of the photographic plane and the direction of the information transmission, i.e. the information channel. This is to say that we split up the three-dimensional space into $E = \mathbb{R}\cdot a \oplus F$ with the symplectic form $\omega$ implemented on $F$. Here $a$ is a non-vanishing vector transversal to $F$. The symplectic form produces a natural scalar product on $E$, where $a$ is a unit vector perpendicular to $F$. In the plane $F$ the information (of the photograph) is situated. We will call it the plane of input in the following. $\mathbb{R}\cdot a$ is the channel of information transmission. To obtain all possible directions of information transmission, we have to vary $a$ in the two-sphere $S^2$. The algebraic structure we will use is the Heisenberg algebra structure on $E$ defined by the Lie bracket

$$[\lambda_1 a + h_1, \lambda_2 a + h_2] := \omega(h_1, h_2)\, a,$$

where $\lambda_1, \lambda_2 \in \mathbb{R}$ and $h_1, h_2 \in F$. Let us call this Lie algebra $\mathfrak{g}_a$. It contains the collection of bits of information together with a volume form on it and the information channel and thus is a basis for our description of information transmission.

Let us consider geometric optics as a first example. Here the plane of input of information has a parallel counterpart at $\nu\cdot a$, namely the plane $F_\nu$, say, of information output. Here $\nu > 0$. Between these two planes an optical system consisting of lenses, prisms, etc. shall be installed. The information in $F$ is transmitted to $F_\nu$ by light rays passing through the optical system. It is a remarkable fact that the image is obtained by mapping the information in $F$ to $F_\nu$ by a suitable linear map $\mathcal{A}$ depending on the particular optical system. More precisely, $\mathcal{A}$ yields a symplectic rearrangement in $F$ performed by a symplectic map $A \in Sp(F)$; the resulting image of $A$ is mapped to $F_\nu$ by rays parallel to $\mathbb{R}\cdot a$. The intersection points in $F_\nu$ constitute the optical image.

So far we have not yet implemented time in our setting. To include all directions of information transmission, i.e. all possible Heisenberg algebras $\{\mathfrak{g}_a \mid a \in S^2\}$, in a linear space together with a universal time axis, at least four dimensions are needed (cf. Equation (3)). This is to say we have to enlarge the linear space $\mathfrak{g}_a$ by one dimension. This enables us to rotate $\mathfrak{g}_a$ and compare the respective signal transmissions. Doing this we will at the same time open the door to the realm of special relativity in a rather natural way.

Given the symplectic plane $(F, \omega)$ and a basis $(e_1, e_2)$ with $\omega(e_1, e_2) = 1$, the scalar product is given by

$$\langle h, k \rangle := \omega(Jh, k) \qquad \forall h, k \in F.$$

Here $J$ is the linear map with matrix

$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$

The scalar product extends orthogonally to all of $\mathfrak{g}_a$ by setting

$$\langle a, e_s \rangle := 0, \quad s = 1, 2, \qquad \text{and} \qquad \langle a, a \rangle := 1.$$
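The Heisenberg bracket and the scalar product induced by the symplectic form are simple enough to be checked by direct computation. The sketch below is an added illustration in coordinates with respect to a basis of the plane normalised as in the text; an element of the algebra is stored as a pair consisting of the channel coefficient and the planar part. The names and the sign convention of the matrix `J` are assumptions chosen so that the scalar product comes out positive definite.

```python
def omega(h, k):
    """Symplectic form on the plane F with omega(e1, e2) = 1."""
    return h[0] * k[1] - h[1] * k[0]

def J(h):
    """Linear map with matrix ((0, 1), (-1, 0)) acting on F."""
    return (h[1], -h[0])

def inner(h, k):
    """Scalar product on F obtained from omega and J."""
    return omega(J(h), k)

def bracket(X, Y):
    """Heisenberg bracket of X = (lam, h), meaning lam*a + h; the result lies on the channel R.a."""
    (_, h1), (_, h2) = X, Y
    return (omega(h1, h2), (0.0, 0.0))

def add(X, Y):
    return (X[0] + Y[0], (X[1][0] + Y[1][0], X[1][1] + Y[1][1]))

A = (1.0, (0.0, 0.0))    # the channel direction a
E1 = (0.0, (1.0, 0.0))
E2 = (0.0, (0.0, 1.0))

X = (0.5, (1.0, 2.0))
Y = (-1.0, (3.0, -4.0))
Z = (2.0, (-0.5, 1.5))

# Jacobi identity: the cyclic sum of nested brackets vanishes.
jacobi = add(add(bracket(X, bracket(Y, Z)), bracket(Y, bracket(Z, X))),
             bracket(Z, bracket(X, Y)))
print(bracket(E1, E2), jacobi, inner((1.0, 2.0), (3.0, 4.0)))
```

The only non-trivial bracket is that of the two planar basis vectors, which yields the channel direction; everything on the channel commutes with the whole algebra.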
Adding a further dimension carried by a unit vector $e$ orthogonal to $\mathfrak{g}_a$ for the implementation of our universal time axis finally yields the linear space $\mathbb{H} := \mathbb{R}\cdot e \oplus \mathfrak{g}_a$. There is exactly one volume form $\mu$ on $\mathfrak{g}_a$ for which $\mu(a, h_1, h_2) = \omega(h_1, h_2)$ holds true for any $h_1, h_2 \in F$. Hence

$$\omega(h_1, h_2) = \langle a, h_1 \times h_2 \rangle.$$
In our context Hamilton's multiplication on $\mathbb{H}$ is then defined by

$$(\lambda_1 e + u_1)\cdot(\lambda_2 e + u_2) := (\lambda_1\lambda_2 - \langle u_1, u_2 \rangle)\, e + \lambda_1 u_2 + \lambda_2 u_1 + u_1 \times u_2$$

for all $\lambda_1, \lambda_2 \in \mathbb{R}$ and all $u_1, u_2 \in \mathfrak{g}_a$. Together with this multiplication, $\mathbb{H}$ is a skew field called the quaternions. The commutative subfield $\mathbb{R}\cdot e$ is called the centre of $\mathbb{H}$. Obviously $\langle\ ,\ \rangle$ extends to all of $\mathbb{H}$ so that $e$ is a unit vector perpendicular to $\mathfrak{g}_a$. Given $\mathfrak{g}_a$, the brackets are determined by the multiplication in $\mathbb{H}$ for any $a \in S^2$. Obviously,

$$[\lambda_1 a + h_1, \lambda_2 a + h_2] := \omega(h_1, h_2)\, a = \langle a \times h_1, h_2 \rangle\, a = \langle a, h_1 \cdot h_2 \rangle\, a$$

holds true. It will be shown in Equation (4) that $\mathfrak{g}_a$ also determines the multiplication in $\mathbb{H}$. Since $|k_1 \cdot k_2| = |k_1| \cdot |k_2|$ for all $k_1, k_2 \in \mathbb{H}$, the unit sphere $S^3 \subset \mathbb{H}$ is a group called $SU(2)$. The two-sphere $S^2 = S^3 \cap \mathfrak{g}_a$ is the equator of $S^3 = SU(2)$ and the Lie algebra $\mathfrak{su}(2)$ of $SU(2)$ is the linear space underlying $\mathfrak{g}_a$ endowed with the cross product "$\times$". As $\mathfrak{su}(2) = \bigcup_{a \in S^2} \mathbb{R}\cdot a$, the inclusion

$$\mathfrak{su}(2) \subset \bigcup_{a \in S^2} \mathfrak{g}_a \subset \mathbb{H} \qquad \text{(Eq. 3)}$$

reflects the transmission of information in all directions in the respective planes of input. The construction of $\mathbb{H}$ suggests splitting it into $\mathbb{H} = \mathbb{C}^a \oplus F$ where $\mathbb{C}^a = \mathrm{span}(e, a)$, a commutative subfield of $\mathbb{H}$ isomorphic to $\mathbb{C}$ by the map $i^a: \mathbb{C} \to \mathbb{C}^a$, say. Clearly, $SU(2) \cap \mathbb{C}^a$ is a one-dimensional unitary group called $U^a(1)$, whose Lie algebra is the information channel $\mathbb{R}\cdot a$. The field $\mathbb{C}^a$ operates on $F$ both from the left and the right by the multiplication in $\mathbb{H}$. Hence $F$ is a complex line in a twofold manner.

The algebraic structure of $\mathbb{H}$ provides us with a Minkowski metric. Indeed, given a quaternion $k = \lambda e + u$ with $\lambda \in \mathbb{R}$ and $u \in \mathfrak{g}_a$, its square is $k^2 = (\lambda^2 - |u|^2)\, e + 2\lambda u$. The central term determines a natural Minkowski metric of signature (1,3) defined by

$$g_M(\lambda_1 e + u_1, \lambda_2 e + u_2) := c^2 \lambda_1 \lambda_2 - \langle u_1, u_2 \rangle$$

for any $\lambda_1, \lambda_2 \in \mathbb{R}$ and any $u_1, u_2 \in \mathfrak{g}_a$. This metric is positive definite on $\mathbb{R}\cdot e$ and negative definite on $\mathfrak{g}_a$. This justifies calling $\mathbb{R}\cdot e$ the universal time axis. The scaling factor $c^2$ on the time axis is the square of the speed of light $c$, which will be set equal to one for reasons of simplicity. Hence $\mathbb{R}\cdot e \oplus F$ consists of the time axis and the plane of information input, rendering information time-dependent. Consequently, Hamilton's multiplication on $\mathbb{H}$ can be rewritten as
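$$(\lambda_1 e + h_1)\cdot(\lambda_2 e + h_2) = g_M\Big(\frac{\lambda_1}{c}\, e + h_1, \frac{\lambda_2}{c}\, e + h_2\Big)\cdot e + [\lambda_1 a + h_1, \lambda_2 a + h_2] + \lambda_1 h_2 + \lambda_2 h_1. \qquad \text{(Eq. 4)}$$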
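Hamilton's multiplication and its relation to the Heisenberg bracket can be verified numerically. The sketch below is an added illustration with the speed of light set to one; a quaternion is stored as a pair of its time coefficient and a vector with coordinates along a basis of the plane of input followed by the channel direction. This coordinate convention and all function names are assumptions of the example.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def hamilton(k1, k2):
    """Hamilton product of k = (lam, u), meaning lam*e + u with u in coordinates (e1, e2, a)."""
    lam1, u1 = k1
    lam2, u2 = k2
    scalar = lam1 * lam2 - dot(u1, u2)
    c = cross(u1, u2)
    vector = tuple(lam1 * y + lam2 * x + z for x, y, z in zip(u1, u2, c))
    return (scalar, vector)

def norm(k):
    lam, u = k
    return math.sqrt(lam * lam + dot(u, u))

def omega(h1, h2):
    """Symplectic form on the plane spanned by the first two coordinates."""
    return h1[0] * h2[1] - h1[1] * h2[0]

def g_M(k1, k2, c=1.0):
    """Minkowski metric obtained from the central term of the product."""
    (lam1, u1), (lam2, u2) = k1, k2
    return c * c * lam1 * lam2 - dot(u1, u2)

# Two general quaternions: the norm is multiplicative.
k1 = (1.0, (2.0, -1.0, 0.5))
k2 = (-0.5, (0.3, 4.0, -2.0))
print(norm(hamilton(k1, k2)), norm(k1) * norm(k2))

# Two elements of R.e + F (vanishing a-component): compare with the rewritten product.
x1 = (1.5, (2.0, -1.0, 0.0))
x2 = (-0.5, (0.3, 4.0, 0.0))
lam1, h1 = x1
lam2, h2 = x2
product = hamilton(x1, x2)
rewritten = (g_M(x1, x2),
             (lam1 * h2[0] + lam2 * h1[0],
              lam1 * h2[1] + lam2 * h1[1],
              omega(h1, h2)))  # the bracket contributes omega(h1, h2) along a
print(product, rewritten)
```

The channel component of the product of two elements with planar vector parts is exactly the value of the symplectic form, which is how the bracket reappears inside the quaternion multiplication.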
In order to implement a time scale on the information channel $\mathbb{R}\cdot a$, we multiply $\mathbb{R}\cdot e \oplus F$ by $a$, yielding $\mathfrak{g}_a$, and transfer $g_M$ to $\mathfrak{g}_a$, yielding the Minkowski metric $g_M^a$ on $\mathfrak{g}_a$. Hence $\mathfrak{g}_a$ itself is a Minkowski space. The transmission of information is now parameterised by the time on the information channel. Thus,

$$\mathbb{H} = \mathbb{R}\cdot e \oplus \mathbb{R}\cdot a \oplus F$$

has implemented the original time axis $\mathbb{R}\cdot e$, the channel of information transmission $\mathbb{R}\cdot a$ with the time scale, as well as the plane of input $F$.

The Minkowski metric $g_M$ has an interesting application: Einstein's equation $E = mc^2$ can be deduced from the following observation. For each singularity-free vector field in the three-space there is a natural $U(1)$-principal bundle. The fibre is a circle with the reciprocal value of the square root of the field strength as radius. This amounts to saying that along a particular field line there is a natural circle bundle whose fibres characterise the field strength at the respective base points. For the Newtonian gravitational field this bundle restricted to a field line is a cone with the field line as symmetry axis. The field line reparameterises $\mathbb{R}\cdot a$ for some $a \in S^2$. Comparing this cone with the light cone of $g_M$ yields Einstein's equation $E = mc^2$. The Newtonian gravitational field can be constructed out of the coadjoint orbits of a Heisenberg algebra together with an additional symplectic structure on the plane of information input, producing the solar mass.
4. HEISENBERG GROUPS, SCHRÖDINGER REPRESENTATION, CROSS-AMBIGUITY FUNCTION AND QUANTISATION
Given $a \in S^2$, i.e. a direction of information transmission, the exponential map $\exp: \mathbb{R}\cdot a \to U^a(1)$ defined by $\exp(\alpha\cdot a) := e\cdot\cos\alpha + a\cdot\sin\alpha$ implements a time scale on $U^a(1)$ in a periodic fashion. Since $U(1) \subset \mathbb{C}$ is isomorphic to $U^a(1)$, the unitary group $U(1)$ is our standard watch, being independent of a particular channel of information. Encoding the periodic time scale and the plane of input in one algebraic object is naturally achieved by the Heisenberg group

$$G^a := U^a(1) \oplus F,$$

a subset of $\mathbb{H}$ equipped with the multiplication

$$(z_1 + h_1)\cdot(z_2 + h_2) := z_1 \cdot z_2 \cdot e^{(1/2)\,\omega(h_1, h_2)\cdot a} + h_1 + h_2$$

for all $z_1, z_2 \in U^a(1)$ and all $h_1, h_2 \in F$. Its Lie algebra is $\mathfrak{g}_a$, as is easy to see. Now we observe that the union of all the circles $U^a(1)$, $a \in S^2$, is nothing else but the three-sphere $S^3$. This is to say that the smallest linear space containing the collection of all Heisenberg groups $G^a$ is $\mathbb{H}$ with its universal time axis.

The geometry of information input and information transmission will next be implemented into a space of signals. This is done by the Schrödinger representation of the Heisenberg group $G^a$ on the Schwartz space $\mathcal{S}(\mathbb{R}, \mathbb{C})$ of all $\mathbb{C}$-valued rapidly decreasing smooth functions on $\mathbb{R}$. (Its completion is the Hilbert space of all $\mathbb{C}$-valued square integrable functions on $\mathbb{R}$.) We need to choose an orthonormal coordinate system in $F$ splitting it into $F = \mathbb{R}\cdot e_0 \oplus \mathbb{R}\cdot a\cdot e_0$ with $|e_0| = 1$. Then we may consider any $\varphi \in \mathcal{S}(\mathbb{R}, \mathbb{C})$ as being defined on $\mathbb{R}\cdot e_0$, called the $\xi$-axis.
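The multiplication of the Heisenberg group given earlier in this section can be tested for the group axioms numerically. The sketch below is an added illustration: following the remark that the circle group of a channel is isomorphic to the ordinary unit circle in the complex numbers, the central part of an element is stored as a complex number of modulus one, and the planar part as a pair of coordinates. All names are hypothetical.

```python
import cmath

def omega(h, k):
    """Symplectic form on the plane of input, with omega(e1, e2) = 1."""
    return h[0] * k[1] - h[1] * k[0]

def multiply(g1, g2):
    """Product of g = (z, h): central parts multiply with an extra phase, planar parts add."""
    (z1, h1), (z2, h2) = g1, g2
    z = z1 * z2 * cmath.exp(0.5j * omega(h1, h2))
    return (z, (h1[0] + h2[0], h1[1] + h2[1]))

def inverse(g):
    z, h = g
    return (z.conjugate(), (-h[0], -h[1]))

def close(g1, g2, tol=1e-12):
    (z1, h1), (z2, h2) = g1, g2
    return abs(z1 - z2) < tol and abs(h1[0] - h2[0]) < tol and abs(h1[1] - h2[1]) < tol

IDENTITY = (1 + 0j, (0.0, 0.0))
g1 = (cmath.exp(0.3j), (1.0, 2.0))
g2 = (cmath.exp(-1.1j), (0.5, -1.5))
g3 = (cmath.exp(2.0j), (-2.0, 0.25))

print(close(multiply(multiply(g1, g2), g3), multiply(g1, multiply(g2, g3))))  # associativity
print(close(multiply(g1, inverse(g1)), IDENTITY))                             # inverses
print(close(multiply(g1, g2), multiply(g2, g1)))                              # commutativity
```

The group is not commutative: exchanging the two factors changes the central phase by the symplectic form of the planar parts, which is the group-level counterpart of the Heisenberg bracket.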
The Schrödinger representation is defined as follows: given a frequency $\nu \in \mathbb{R}$ and any $z = e^{ta} \in U^a(1)$,

$$\rho_\nu(z + \xi e_0 + \eta\, a\cdot e_0)(\varphi)(x) := e^{-\nu t i}\cdot e^{-\nu\xi\eta i/2}\cdot e^{\nu\eta x i}\cdot\varphi(x - \xi)$$

for all $\xi, \eta \in \mathbb{R}$, all $\varphi \in \mathcal{S}(\mathbb{R}, \mathbb{C})$ and any $x \in \mathbb{R}$. Here the unit vector $a\cdot e_0$ carries the so-called $\eta$-axis. The Stone-von Neumann theorem states that, up to equivalence, $\rho_\nu$ is the only irreducible unitary representation of $G^a$ which coincides with $\rho_\nu$ on the centre $U^a(1)$ of $G^a$. In fact, $\rho_\nu$ can be constructed from a point $\nu \in F$ and the restriction $\rho_\nu|_{U^a(1)}$ of $\rho_\nu$ to $U^a(1)$, called the central character and denoted by $\chi_\nu$. Thus the pair $(\chi_\nu, \nu)$ determines $\rho_\nu$ up to equivalence and the plane $F_\nu = \nu\cdot a \oplus F$, a coadjoint orbit, is a geometric characteristic of $\rho_\nu$. As far as the modulation of information on signals from $\mathcal{S}(\mathbb{R}, \mathbb{C})$ is concerned, we can restrict ourselves to the Schrödinger representation $\rho = \rho_\nu$ having frequency $\nu = 1$. The coefficient function $c_{\rho,\varphi,\psi}: G^a \to \mathbb{C}$ defined by

$$c_{\rho,\varphi,\psi}(z + h) := \langle \rho(z + h)(\varphi), \psi \rangle_{L^2} \qquad \forall z \in U^a(1) \text{ and } \forall h \in F$$

and any fixed pair $\varphi, \psi \in \mathcal{S}(\mathbb{R}, \mathbb{C})$ plays an important role in signal analysis, as will be seen in formula (7). In order to compute $c_{\rho,\varphi,\psi}(h)$ for any $h \in F$ we write $h$ as

$$h = \xi e_0 + \eta\, a\cdot e_0 \qquad \forall \xi, \eta \in \mathbb{R}. \qquad \text{(Eq. 5)}$$
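Hence the coefficient function is expressed as

$$c_{\rho,\varphi,\psi}(h) = \int_{\mathbb{R}} e^{\eta t' i}\cdot e^{-\eta\xi i/2}\cdot\varphi(t' - \xi)\cdot\bar\psi(t')\,dt' = \int_{\mathbb{R}} e^{\eta(t' - \xi/2) i}\cdot\varphi(t' - \xi)\cdot\bar\psi(t')\,dt'.$$

Changing $t'$ into $t = t' - \xi/2$ yields

$$c_{\rho,\varphi,\psi}(h) = \int_{\mathbb{R}} e^{\eta t i}\cdot\varphi\Big(t - \frac{\xi}{2}\Big)\cdot\bar\psi\Big(t + \frac{\xi}{2}\Big)\,dt, \qquad \text{(Eq. 6)}$$

the famous cross-ambiguity function $H$ applied to $\varphi$, $\psi$, $\xi$ and $\eta$:

$$c_{\rho,\varphi,\psi}(h) = H(\varphi, \psi; \xi, \eta) \qquad \forall \varphi, \psi \in \mathcal{S}(\mathbb{R}, \mathbb{C}) \qquad \text{(Eq. 7)}$$

for any $h$ represented as in Equation (5). The variables $\xi$ and $\eta$ represent local phases and frequencies. Together with a reference (phase difference) they allow us to reconstruct the entire phase. This is a typical application of the radar cross-ambiguity function.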
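The statements of this section can be checked numerically for a Gaussian signal. The following Python sketch is an added illustration, not the authors' code. It assumes the frequency is one, identifies a central element with its angle, and assumes that the symplectic form takes the value one on the two coordinate vectors of the plane; under these assumptions it tests that the operators compose like the group elements and that the coefficient function agrees with the cross-ambiguity function. Integrals are approximated by simple Riemann sums, and all names are hypothetical.

```python
import cmath
import math

def group_product(g1, g2):
    """Heisenberg group product in coordinates g = (t, xi, eta), with z = exp(t*a).

    Assumes omega(h1, h2) = xi1*eta2 - eta1*xi2 for h = xi*e0 + eta*a.e0.
    """
    t1, xi1, eta1 = g1
    t2, xi2, eta2 = g2
    return (t1 + t2 + 0.5 * (xi1 * eta2 - eta1 * xi2), xi1 + xi2, eta1 + eta2)

def rho(g, phi):
    """Schroedinger representation with frequency 1: returns the function rho(g)(phi)."""
    t, xi, eta = g
    def image(x):
        phase = cmath.exp(-1j * t) * cmath.exp(-0.5j * xi * eta) * cmath.exp(1j * eta * x)
        return phase * phi(x - xi)
    return image

def gaussian(x):
    return math.exp(-0.5 * x * x)

def integrate(func, lo=-12.0, hi=12.0, step=0.01):
    """Riemann-sum approximation of the integral of func over [lo, hi]."""
    n = int(round((hi - lo) / step))
    return sum(func(lo + k * step) for k in range(n + 1)) * step

def coefficient(phi, psi, xi, eta):
    """L2 inner product of rho(xi*e0 + eta*a.e0)(phi) with psi."""
    image = rho((0.0, xi, eta), phi)
    return integrate(lambda x: image(x) * complex(psi(x)).conjugate())

def ambiguity(phi, psi, xi, eta):
    """Cross-ambiguity function in the form of Eq. (6)."""
    return integrate(lambda t: cmath.exp(1j * eta * t) * phi(t - xi / 2)
                     * complex(psi(t + xi / 2)).conjugate())

g1, g2, x0 = (0.3, 0.8, -0.5), (-1.1, 0.4, 1.2), 0.25
lhs = rho(g1, rho(g2, gaussian))(x0)            # apply rho(g2), then rho(g1)
rhs = rho(group_product(g1, g2), gaussian)(x0)  # apply rho(g1 * g2)
print(abs(lhs - rhs))

xi, eta = 0.8, -0.5
closed_form = math.sqrt(math.pi) * math.exp(-(xi * xi + eta * eta) / 4)
print(coefficient(gaussian, gaussian, xi, eta), ambiguity(gaussian, gaussian, xi, eta), closed_form)
```

For the Gaussian the ambiguity function is itself a Gaussian in the two variables, centred at the origin, which is what makes peaks of the ambiguity surface easy to locate.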
As an example of transmission and detection we consider radar. Suppose a time-dependent signal $\varphi$ is sent to a moving target. The echo signal is delayed in time by $\xi \in \mathbb{R}_+$ and undergoes a frequency shift $\eta \in \mathbb{R}$, say. More precisely, if $f(t) = \varphi(t)\cdot e^{i\omega_0 t}$ at time $t$, the echo signal is of the form

$$g(t) = \varphi(t - \xi)\cdot e^{i(\omega_0 + \eta)(t - \xi)}.$$

The cross-ambiguity function then is

$$H(g, f; \xi, \eta) = \int_{\mathbb{R}} g\Big(t - \frac{\xi}{2}\Big)\cdot\bar f\Big(t + \frac{\xi}{2}\Big)\cdot e^{i\eta t}\,dt.$$
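The peak search used in radar detection can be sketched numerically. The example below is an added illustration under stated assumptions and not the authors' procedure: the sent signal is a baseband Gaussian pulse, the echo is modelled as the delayed pulse multiplied by an oscillation at the frequency shift, and the ambiguity function is evaluated with the sent signal in the first slot and the echo in the second so that, for this model, the peak lies at the true delay and shift. The values `TAU` and `KAPPA`, the search grid and all function names are hypothetical.

```python
import cmath
import math

TAU = 1.4     # hypothetical true time delay of the echo
KAPPA = 0.6   # hypothetical true frequency shift of the echo

def sent(t):
    """Sent signal: a Gaussian pulse."""
    return math.exp(-0.5 * t * t)

def echo(t):
    """Assumed echo model: delayed pulse with a frequency shift."""
    return sent(t - TAU) * cmath.exp(1j * KAPPA * t)

def ambiguity(phi, psi, xi, eta, lo=-8.0, hi=10.0, step=0.05):
    """Riemann-sum approximation of the cross-ambiguity function of phi and psi."""
    n = int(round((hi - lo) / step))
    total = 0j
    for k in range(n + 1):
        t = lo + k * step
        total += (cmath.exp(1j * eta * t) * phi(t - xi / 2)
                  * complex(psi(t + xi / 2)).conjugate())
    return total * step

# Search grid in the (xi, eta)-plane.
grid_xi = [0.2 * i for i in range(21)]           # 0.0 ... 4.0
grid_eta = [-2.0 + 0.2 * j for j in range(21)]   # -2.0 ... 2.0

# The estimate is the grid point where the modulus of the ambiguity function is largest.
xi_hat, eta_hat = max(
    ((xi, eta) for xi in grid_xi for eta in grid_eta),
    key=lambda point: abs(ambiguity(sent, echo, point[0], point[1])),
)
print(xi_hat, eta_hat)
```

Exchanging the two slots, or choosing a different sign convention for the echo, moves the peak to the mirrored point of the plane, so the convention has to be fixed before delays and shifts are read off.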
The time delay f and the frequency shift rI allow to compute the distance and the velocity of the target. The method to determine time delay and frequency shift is to search for a peak in the graph of H(--,f;-., ..) in the (sr, r/)-plane F. As one easily verifies by Equations (6) and (7) H ( g , f ; ~, ~) -- (p(f'eo + ~q.a.eo)g(t),f(t)),
showing the fundamental role of the Heisenberg group in signal detection. To study the effect of a volume preserving rearrangement of the collection $C_F$ of information on a Schrödinger representation, we extend any $A \in Sp(F)$ to all of $G^a$ by setting it equal to the identity on the centre $U^a(1)$. This extension is called $A$ again. Hence,
\[
A : G^a \to G^a
\]
is a group automorphism. Due to the theorem of Stone-von Neumann,
\[
\rho \circ A = U(A) \circ \rho \circ U(A)^{-1}
\]
for the metaplectic representation $U : Mp(F) \to U(L^2(\mathbb{R}, \mathbb{C}))$. The metaplectic group $Mp(F)$ is a twofold covering of $Sp(F)$, i.e. there is a two-to-one surjective map $pr : Mp(F) \to Sp(F)$. Hence the volume preserving rearrangement, i.e. the isentropic rearrangement, of information in $F$ gives rise to the metaplectic representation. This representation $U$ has a variety of applications; for example, it is used in geometric optics and in a natural quantisation procedure. Both will be sketched below. Suppose we have an optical system characterised by a symplectic map $A$. A well-known construction by Fresnel converts any $A \in Sp(F)$ into two
E. Binz, S. Pods and W. Schempp
different operators on the space of signals $\mathcal{S}(\mathbb{R}, \mathbb{C})$. The group of these operators realises the metaplectic group $Mp(F)$, i.e. Fresnel's construction amounts to a unitary representation $U$. Any of these operators converts a prescribed intensity distribution on the plane of input $F$ (in our example the photography) into an intensity distribution, i.e. an amplitude distribution, on the output plane $F'$. In mathematical terms, given an optical system characterised by a symplectic map $A \in Sp(F)$ which is the projection of $\tilde{A} \in Mp(F)$, say, $U(\tilde{A})(\varphi)$ is the image of an amplitude density function $\varphi$ on $F$. The natural quantisation procedure for the collection $Q$ of quadratic homogeneous real-valued polynomial functions is performed as follows. Let $p \in Q$. The principal part $\mathrm{ham}(p)$ of the Hamiltonian vector field $X_p$ of $p$ on $F$ is a traceless linear map, i.e. $X_p \in sp(F)$, the Lie algebra of $Sp(F)$. The classical quantisation is performed by
\[
Q \xrightarrow{\ \mathrm{ham}\ } sp(F) \xrightarrow{\ i\,dU\ } \mathrm{Herm}\,\mathcal{S}.
\]
Here $\mathrm{Herm}\,\mathcal{S}$ is the $\mathbb{C}$-linear space of all Hermitian operators on $\mathcal{S}$. This well-known construction emphasises the close relation between optics and quantum mechanics. It is built up from the Heisenberg group $G^a$.
0
THE THREE-DIMENSIONAL SPIN GROUP AND THE DOUBLE HELIX STRUCTURE OF CHROMOSOMES
The spin group $SU(2)$, i.e. the three-sphere in the quaternions, which decomposes into
\[
SU(2) = \bigcup_{a \in S^2} U^a(1),
\]
can be reconstructed from the Heisenberg algebra $\mathfrak{g}^a$ for any given $a \in S^2$. This is to say that in our context the spin group reflects signal transmission in all spatial directions (cf. Equation (3)). From here we easily pass on to the spin ½-representation $r : SU(2) \to U(\mathbb{H})$ used in nuclear physics, MRI, etc. The restriction of this representation to $U^a(1)$ is defined by $r : U^a(1) \to U(\mathbb{H})$,
\[
r(z)(z' + h) := z' \cdot z + z^{-1} \cdot h
\]
for all $z \in U^a(1)$ and all $z' + h \in \mathbb{H} = \mathbb{C}^a \oplus F$. Let us point out that this restriction implements the periodic time in both $\mathbb{C}^a$ and $F$ with opposite directions. The full spin ½-representation is given by
\[
r|_{U^b(1)} = \tau_k \circ r|_{U^a(1)} \circ \tau_k^{-1} \qquad \forall b \in S^2
\]
for some quaternion $k$ depending on $b$. Here the automorphism $\tau_k$ is defined by $\tau_k(u) := k \cdot u \cdot k^{-1}$ for any two $k, u \in \mathbb{H}$ and is called an inner automorphism. The infinitesimal spin ½-representation multiplied by $\hbar a/2$ yields the spin quantisation in the direction of $a$. Let us point out that the spin representation and its quantum version stem from only one Heisenberg algebra encoding storage and transmission of information in the direction of the information channel $\mathbb{R} \cdot a$. The special feature of the spin group $SU(2)$ becomes apparent if we consider the canonical map $\tau : SU(2) \to SO(E)$ defined by $\tau(k) := \tau_k$ for every $k \in SU(2)$. This surjective map is a Lie group homomorphism with kernel $\{e, -e\}$, the centre of $SU(2)$. Thus $SU(2)$ is the twofold covering of $SO(E)$. Since $SO(F)$ is a circle in $SO(E)$ and $\tau^a(U^a(1)) = SO(F)$, the map $\tau^a : U^a(1) \to SO(F)$ covers the circle $SO(F)$ twice. Every rotation in $SO(F)$ has two pre-images. This is the essence of the spin group yielding the notions of spin-up and spin-down and the basis to describe the double helix structure of the chromosomes. The axis $\mathbb{R} \cdot a$ in the Euclidean space $E$ serves as an axis of rotation for $a \in S^2 \subset E$. On the other hand, the vector $a$ determines the symplectic form $\omega^a$ on $F$. The group $SO(F)$, a circle, is a maximal compact subgroup of $Sp(F)$. This circle has a twofold covering by $\widetilde{SO}(F)$, say, a subgroup of $Mp(F)$. A natural isomorphism links $U^a(1)$ with $SO(F)$. It will provide us with the possibility to choose either the base-pair link Adenine-Thymine or Guanine-Cytosine in the modelling of the double helix of chromosomes. The double helix structure can be modelled by an interplay between the spin ½-representation and the Schrödinger representation. Out of the spin ½-representation we will construct a one-parameter family of Schrödinger representations naturally characterised by an oriented helix. Passing on to the contragredient representation yields a second oppositely oriented helix.
Both oriented helices yield the double helix. The tensor product of one Schrödinger representation with frequency $\theta$ with its contragredient one characterises a pair of entangled points in $F_\theta$, modelling the links of the base-pair combination Adenine-Thymine and Guanine-Cytosine, respectively. The double covering of a circle which itself is covered by the double helix provides us with a choice of two entangled pairs of points representing either the Adenine-Thymine or Guanine-Cytosine link. Altogether we will get the geometric structure of a chromosome. In more detail this is expressed as follows. The spin ½-representation $r$ operates isometrically and is volume preserving on the $\mathbb{C}^a$-linear space $\mathbb{H} = \mathbb{C}^a \oplus F$. In contrast to the
isomorphism m : Uo(1) ---* SO(F) C Sp(F) defined by m(z)(h) - z - l . h for all z E ua(1) and all h ~ F, the bijection mz:Ua(1)---*ua(1)
given by mz(z' ) :=z.z' for a fixed z and all z ' E Uo(1) is not a homomorphism. It has to be lifted to R.a C G~. This lift to the information channel is implemented in
defined by Y'(O)(tl.a -Jr h i ) ' -
eO.t,.a + e-~
1
for O, tl E ~ and any h~ ~ F. The lower index O in both the Heisenberg group G~9 = u a ( 1 ) O F and its Lie algebra ~ 9 - - ~ . a O F indicates the symplectic structure w ~ on F. Obviously, ?[•.a " ]~.a ---* U a (1) is a character called Xo and thus Y'(O)(t.a + hi) - Xo(t) + r(e~
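By the Stone-von Neumann theorem $\rho_\theta$ and $\rho$ are not equivalent since $\rho_\theta(e^{t_1 a})(\varphi) = e^{-\theta t_1 i} \cdot \varphi$ while $\rho(e^{t_1 a})(\varphi) = e^{-t_1 i} \cdot \varphi$. The coadjoint orbits classifying $\rho_\theta$ and $\rho$ are
\[
F_\theta = \theta \cdot a \oplus F, \quad \text{respectively,} \quad F_1 = a \oplus F,
\]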
planes perpendicular to a with the respective symplectic structures to ~ and toa on F. On the other hand, p(y.(O))(hl) -- po(hl.e ~
-- Umeo.aOpO(hl)Ofme_O.a
,
and hence p(t(O)) on G~ is rewritten in terms of Po on G a as p ( t ( O ) ( t l . a + hi)) - po(e h'a + e-O.a.hl) -- Umeoa~
-+- hl)~
for any 0 E ItS, any t~ E R and any h l E F. Now let us turn this equation into a geometric picture as follows. The Schr6dinger representation Po on G a shall be induced up by the character X~ xo.e (~....> for a non-vanishing vector v E F \ { 0 } . (Choosing another one yields an equivalent representation.) Thus the character X ~ varies on a helix in Ga covering the circle Kv:ua(1)---,F
defined by
\[
K_v(z) := z^{-1} \cdot v \qquad \forall z \in U^a(1).
\]
Therefore, $\rho \circ \gamma$ is naturally defined by the universal covering of $\mathrm{Im}\, K_v = U^a(1) \cdot v$, a helix in $G^a$, here called the spin ½-representation. Equally well we can work with $(\rho \circ \gamma)^*$ yielding a helix with opposite orientation. The pair of points of $\rho \circ \gamma(\theta)$ and $(\rho \circ \gamma(\theta))^*$ in a coadjoint orbit $\theta \cdot a \oplus F$ are entangled since $(\rho \circ \gamma) \otimes (\rho \circ \gamma(\theta))^*$ describes entangled pairs of points in the double helix determined by $\rho \circ \gamma$ and $(\rho \circ \gamma)^*$. Notice that the two helices $\rho \circ \gamma$ and $(\rho \circ \gamma)^*$ are oppositely oriented. The double helix intersects each coadjoint orbit in two opposite points $q_1$ and $q_2$, say. Hence linking these two points by a line segment yields a ruled surface. For each of these line segments there are two choices, as can be seen as follows: let $v$ be a point in a coadjoint orbit on the double helix. The image of the map $U^a(1) \to U^a(1) \cdot v$ mapping each $z \in U^a(1)$ into $K_v(z)$ is a circle. This circle, however, is covered twice by this map. Thus to any $w \in U^a(1) \cdot v$ there are two pre-images in $U^a(1)$, in particular to $q_1$ and $q_2$. Linking the points in corresponding sheets over $q_1$ and $q_2$ yields the two choices of the line segment $q_1 q_2$ of the double helix. The plane $F$ has to be slightly inclined to produce the double helix structure of real chromosomes. Finally, let us point out that the pair $\theta \cdot a \oplus F$ and $-\theta \cdot a \oplus F$ of coadjoint orbits gives rise to the Hopf fibration $S^3 \to S^2$ and, therefore, to the Villarceau circles of the two-dimensional torus. In that sense the Hopf fibration reflects entanglement in quantum information theory.
Studies in Multidisciplinarity, Volume 2 Editor: G. Malcolm © 2004 Elsevier B.V. All rights reserved.
7
Information visualisation and semiotic morphisms
Joseph A. Goguen and D. Fox Harrell
Department of Computer Science and Engineering, University of California, San Diego, CA, USA
Information visualisation design is generally ad hoc, using trial and error, and perhaps prior visualisation experiments. This chapter suggests a different approach: general design principles based on a combination of algebraic abstract data type theory, semiotics, and social theory. Major concepts include semiotic spaces to describe systems of related signs, semiotic morphisms to describe representations of signs, and preservation measures to describe the quality of representations. Some examples are given, each with a critical discussion, illustrating how semiotic morphisms can help with design.
1. INTRODUCTION AND MOTIVATION
Appropriate visualisations of complex data sets can be an enormous aid to scientists in discovering, verifying, and predicting significant patterns. Unfortunately, it has proven difficult to find general principles for producing appropriate visualisations. One reason is the lack of a precise definition for the word "appropriate" in the previous two sentences. The present state of HCI research does not provide an adequate basis for the design of visualisations. A few precise laws are known, but they have very limited scope (e.g. Fitts' law); there are many case studies, but their generality is unknown; and there are many methods, but reliability is uncertain (e.g. protocol analysis, usability studies, interviews - see Goguen and Linde (1993) for a survey). Meanwhile, both user communities and technology bases are
expanding very rapidly, while the commercial sector continues to produce exaggerated claims and mediocre products, and faith in experimental psychology and ergonomics as foundations is eroded by developments in Computer Supported Cooperative Work (CSCW) and related areas which demonstrate that many difficulties arise from taking inadequate account of the social context in which interfaces are actually used, and of the meaning behind the interfaces. In this sad situation, we badly need to explore new directions for the construction of general theories. Many fundamental issues in information visualisation can be understood in terms of representation: a visualisation is a representation of some aspects of the underlying information, and major questions are what to represent and how to represent it. An adequate theory of information visualisation must take account not just of current display technology capabilities, but also of the structure of complex information such as scientific data, the capabilities and limitations of human perception and cognition, and the social context of work. For scientific visualisation, the social context should include current scientific theories, conventional meanings of the signs and symbols used, the unequal importance of different patterns in the data, and the collaborative nature of scientific work. While it would be difficult to deny the importance of these factors for the design of visualisations and tools to support them, it would be foolish to believe that they are easy, and in particular, it would be foolish to believe that it is easy to get the designs of visualisations or visualisation tools right the first time, or that design can be fully automated. For this reason, both theories and tools need to be broad and flexible, supporting relatively painless reconfiguration and evolution.
Although it seems natural to try to use semiotics as the basis for a theory of representation, classical semiotics has unfortunately neither developed in a sufficiently rigorous way for our needs, nor has it explicitly addressed the representation of complex signs; in addition, its approach to meaning has been naive in some crucial respects, especially in neglecting (though not entirely ignoring) the social basis and contextualisation of meaning. So it is not surprising that semiotics has mainly been used in the humanities, where scholars can compensate for these weaknesses, rather than in engineering, where descriptions need to be much more explicit. Another deficiency of classical semiotics is its inability to address dynamic signs and their representations, as is necessary for interfaces that involve change, instead of presenting a fixed static structure, e.g. for standard interactive features like buttons and fill-in forms, as well as for more complex situations like animations and virtual worlds. We will suggest approaches to overcome all these limitations.
Because we consider information visualisation in particular, and user interface design in general, as problems in constructing appropriate representations, we need to know what representations are, and what makes them appropriate. For the first question, we consider a representation to be a mapping from one structured domain of signs, called a semiotic space or a sign system, to another such space. For the second question, we can measure the quality of a representation by how well it preserves what is most important to users, subject to any constraints imposed. These ideas might seem simple, but it is not so obvious how to make them precise. Here we use some algebraic methods developed for the theory of abstract data types (Goguen et al., 1978). More specifically, the structure of a sign system is given by an algebraic theory (consisting of a syntax declaration, similar to a context-free grammar, and a set of equations) plus some specifically semiotic features, including hierarchical levels for signs, and priorities on constructors; more details are given in section 2, and full details appear in Goguen (1999a). Dynamic interfaces can be handled by generalising from classical algebra to a variant called hidden algebra (Goguen and Malcolm, 2000), as discussed further in section 2.4. The success of this approach can be judged by the analyses and suggestions for improvement it provides for concrete examples, as in section 3. While sensitive designers might reach similar conclusions, algebraic semiotics does so in a systematic way, based on general principles (in any case, the original designers of the examples in section 3 did not reach these conclusions). The mathematical formulation of the theory also raises hope for partial automation of the design process. Finally, since all communication is mediated by signs, there is hope for applications well beyond information visualisation.
2. ALGEBRAIC SEMIOTICS
We approach questions of representation and of the quality of representation through precise notions of semiotic space and semiotic morphism, the latter being a systematic translation between semiotic spaces. Though transformations are fundamental in many areas of mathematics and its applications (e.g. linear transformations, i.e. matrices), they have not been considered in classical semiotics. This section gives an intuitive introduction to some basic concepts. The main reference for algebraic semiotics is Goguen (1999a); an informal exposition of some main ideas and their motivation is given in the webnote Goguen (1996a), and an (intendedly) amusing introduction is given in the UC San Diego Semiotic Zoo (Goguen, 1996b). Further applications
have been developed for a course on user interface design, some of which can be browsed at the class website (Goguen, 2002).
2.1. Semiotic spaces
Signs need not be the simple things that we usually call "signs", such as the letters of an alphabet or traffic signs. In written natural language, sentences are composed from words, and words are composed from letters; also, user interfaces are often very complex systems that are usefully considered single complex signs. Semiotic systems¹ capture the systematic structure of signs. This subsection introduces some elements of this notion informally; see Goguen (1999a) for more formal details. An important insight due to de Saussure (1983) is that signs always come in systems. A typical example considered by Saussure is the tense system for the verbs of a language. For example, in English, adding "ed" to the end of a present tense (regular) verb makes it past tense, and adding "will" in front makes it future tense, as in "walk", "walked", and "will walk". Saussure's emphasis on the structure of systems of signs rather than isolated signs has been very influential, for example, in French structuralism and poststructuralism. A basic strategy for making complex combinations of signs easier to understand is to divide their potential parts into sorts, and then discover rules for the ways that each sort can be used. For example, newspapers are composed from articles, ads, cartoons, etc. while articles are composed from headlines, paragraphs, photos, diagrams, etc. and paragraphs are composed from sentences. The so-called parts of speech in traditional grammars are also sorts in this sense. Sorts may have a hierarchical structure under a subsort partial ordering. For example, the sort NOUN is a subsort of the sort NOUN-PHRASE. The rules for composing signs into more complex signs are of two kinds, called constructors and axioms. Constructors are functions that build new signs from other signs of given sorts, plus perhaps additional parameters.
For example, a computer graphics image of a cat may be given as a constructor with parameters that determine its size, colour, and location on the screen. There may also be functions and predicates defined on signs; for example, a LOCATION function for graphical objects, and a HIGHLIGHTED predicate for text. Axioms are logical formulae built from constructors, functions and predicates; they constrain the set of possible signs.

¹ This paper uses the terms "sign system", "semiotic system", and "semiotic space" interchangeably.
In many examples, some constructors for signs of a given sort are more important than others. For example, a warning popup window is more important than a virtual pet cat. This gives rise to a priority partial ordering on the constructors for each sort. For a different example, the pollutants in a lake may be prioritised by their toxicity, to aid in the design of an appropriate visualisation. Another fundamental strategy for managing complexity is to have a hierarchy of levels, with signs that are not atomic being constructed from other signs that are at lower (or possibly the same) levels. Thus linguistics has levels for phonology, morphology, lexicography, syntax, and discourse (i.e. multisentential units, such as stories). Similarly, standard GUI displays have windows, which may contain other windows. It is clear that context, including the physical setting of a given sign, can be at least as important for meaning as the sign itself. In an extreme example, the sentence "Yes" can mean almost anything, given the right context. This corresponds to an important insight of Peirce (1965), that meaning is relational, not just denotational (i.e. functional); this is part of the point of his famous semiotic triangle. Using the ideas of this paper, we can consider constructors that place signs in context, by making them parts of larger signs. For example, the familiar 12-h clock tells the correct 24-h time in the context of external illumination, which can be considered an argument of a higher level constructor for clocks-in-context. It is worth noting that neither semiotic theories nor semiotic morphisms describe relationships between signs and the realities (if any) that they represent; rather, it is the signs determined by the theories that can be taken to describe real situations. For example, a database schema might have fields for the age, condition, type, height, etc. of roses, but only a particular database can contain actual data about roses.
Thus, a semiotic theory determines a class of signs, which can potentially describe things in the world.

This paragraph contains some technical remarks for those who have the background and interest. A semiotic system $S$ is a tuple $(\Sigma, A, P, L)$, where $\Sigma$ is a signature (or grammar) with a set $N$ of sorts (or non-terminals) partially ordered by a subsort relation, $A$ a set of axioms, $P$ a priority ordering on constructors (which are in $\Sigma$), and $L$ a level ordering on sorts. Then the signs of $S$ are the elements of an "initial" (i.e. standard, or "intended") model of $S$ which is known to exist for many reasonable choices of a logic to use for $\Sigma$ and $A$ (for example, equational logics and Horn clause logics have initial models, as do all "liberal institutions" in the sense of Goguen and Burstall (1992)). More mathematical details can be found in Goguen and Malcolm (1996).
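The ingredients of a semiotic system (sorts with a subsort ordering, constructors, axioms, priorities and levels) can be made concrete with a small data structure. The sketch below is illustrative only and is not the formalisation of Goguen (1999a): the class and field names are hypothetical, priorities and levels are encoded as integers (which makes them total orderings rather than the partial orderings of the text), axioms are kept as plain strings, and the supersort `Figure` is an assumption added to the newspaper example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Constructor:
    name: str
    arg_sorts: tuple   # sorts of the signs being combined
    result_sort: str   # sort of the sign that is built
    priority: int = 0  # assumed encoding: larger means more important


@dataclass
class SemioticSystem:
    sorts: set
    subsorts: set       # pairs (sub, super) generating the subsort ordering
    constructors: list  # the operations of the signature
    levels: dict        # sort -> level; assumed encoding: larger is higher
    axioms: list = field(default_factory=list)  # plain strings in this sketch

    def is_subsort(self, sub, sup):
        """Reflexive-transitive closure of the declared subsort pairs."""
        seen, frontier = {sub}, [sub]
        while frontier:
            current = frontier.pop()
            if current == sup:
                return True
            for a, b in self.subsorts:
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
        return False

    def respects_levels(self):
        """Every sign is built from signs at the same or a lower level."""
        return all(self.levels[arg] <= self.levels[c.result_sort]
                   for c in self.constructors for arg in c.arg_sorts)


newspaper = SemioticSystem(
    sorts={"Newspaper", "Article", "Headline", "Paragraph", "Sentence",
           "Figure", "Photo", "Diagram"},
    subsorts={("Photo", "Figure"), ("Diagram", "Figure")},
    constructors=[
        Constructor("newspaper", ("Article",), "Newspaper"),
        Constructor("article", ("Headline", "Paragraph", "Figure"), "Article"),
        Constructor("paragraph", ("Sentence",), "Paragraph"),
    ],
    levels={"Newspaper": 3, "Article": 2, "Headline": 1, "Paragraph": 1,
            "Figure": 1, "Photo": 1, "Diagram": 1, "Sentence": 0},
)

print(newspaper.is_subsort("Photo", "Figure"))  # True
print(newspaper.is_subsort("Figure", "Photo"))  # False
print(newspaper.respects_levels())              # True
```

A fuller treatment would replace the integer encodings by explicit partial orders and the axiom strings by formulae over the signature.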
2.2. Semiotic morphisms and design
Crafting a helpful explanation or a good "icon" (in the informal sense of computer graphics rather than in Peirce's technical sense), choosing a good file name, or using a mixture of media to present given content in a satisfactory way, are all problems of translating signs in one system to signs in another system. In such cases, we know the source system, and we seek a suitable target system and an appropriate transformation that presents the information of interest in an appropriate way; often we even know the target system. This is the problem of design. Conversely, we may know the target sign system and seek to infer properties of signs in the source system from their images in the target system; this happens, for example, when we try to understand a poem, an equation, a drawing, or indeed, anything at all. Let us call this the inverse problem, as opposed to the "direct problem" of design. Information visualisation is an especially good source of illustrations for algebraic semiotics, due to the two advantages that information visualisations have over arbitrary design problems. These are that the source space is concrete and given in advance, and that the target space consists of visual signs. The designer must be sensitive to features of the data to create a useful visualisation, but certain structural features may not be obvious, and it may be even less obvious which of them are the most important. The process of considering a visualisation as a semiotic morphism can focus the designer on such basic structural issues, and thus help in creating a good graphical representation. Because semiotic systems are theories rather than models, semiotic morphisms must be translations from one theory to another, rather than translations from one concrete sign to another. This may seem indirect, but it has important advantages. First, these are theories of systems of signs, rather than of particular signs. 
In the case of information visualisation, each model of the source theory is a possible dataset to be visualised, and each model of the target theory is a possible graphic representation. Dealing with theories forces the designer to more carefully consider the space of possibilities, instead of being seduced by idiosyncratic features of some particular data sets that happen to be available. Second, taking theories as our basis allows new structure to be added later, by expanding the theory in a consistent way. In general, there are many different semiotic morphisms between two given semiotic spaces, each determining a different way to represent signs. For example, in scientific visualisation, a database may be presented as a text file, or displayed graphically in many different ways. Semiotic morphisms take structure in the source space to structure in the target space, mapping sorts to sorts, subsorts to subsorts, constructors to constructors, etc. But in many real-world applications, not everything can be preserved, so these
maps must be partial. Axioms should also be preserved - but again in practice, not all axioms are preserved. Design is the problem of massaging a source space, a target space, and a morphism, to achieve acceptable quality, subject to constraints. The extent to which different kinds of structure are in fact preserved gives a way to compare the quality of semiotic morphisms, as discussed further in section 2.3. Semiotic morphisms should of course also preserve content, but there are many examples where this too is partial; for example, relatively little content is preserved in representing a book by its table of contents.

This paragraph continues the technical remarks at the end of section 2.1 for those who have the background and interest. A semiotic morphism from $S$ to $S'$ consists of a partial theory morphism from $(\Sigma, A)$ to $(\Sigma', A')$ that partially preserves the priority and level orderings. Under certain reasonable conditions (e.g. if the logic in which theories are expressed is liberal in the sense of Goguen and Burstall (1992)), a semiotic morphism induces a (partial) homomorphism on the initial models, which maps the signs of $S$ to signs of $S'$. There is always a natural "forgetful" mapping in the reverse direction. More mathematical details can be found in Goguen and Malcolm (1996).
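The book and table-of-contents example can be written down as a partial map between two small signatures. This is a hedged sketch rather than the construction in Goguen and Malcolm (1996): the signatures `BOOK` and `TOC`, the two mapping dictionaries and both function names are hypothetical, axioms, priorities and levels are left out, and partiality is modelled simply by omitting entries from the dictionaries.

```python
# A signature is modelled as: constructor name -> (argument sorts, result sort).
BOOK = {
    "book": (("Chapter",), "Book"),
    "chapter": (("Title", "Paragraph"), "Chapter"),
    "paragraph": (("Sentence",), "Paragraph"),
}
TOC = {
    "toc": (("Entry",), "Toc"),
    "entry": (("Title",), "Entry"),
}

# Partial maps: sorts and constructors that are absent are not represented.
SORT_MAP = {"Book": "Toc", "Chapter": "Entry", "Title": "Title"}
CONSTRUCTOR_MAP = {"book": "toc", "chapter": "entry"}


def is_partial_signature_morphism(source, target, sort_map, constructor_map):
    """Check that every mapped constructor is compatible with the sort map."""
    for name, image in constructor_map.items():
        args, result = source[name]
        target_args, target_result = target[image]
        # The result sort must be mapped to the target constructor's result.
        if sort_map.get(result) != target_result:
            return False
        # Arguments whose sort is not mapped are dropped, the rest translated.
        kept_args = tuple(sort_map[a] for a in args if a in sort_map)
        if kept_args != target_args:
            return False
    return True


def dropped_constructors(source, constructor_map):
    """Constructors of the source that the morphism does not preserve."""
    return set(source) - set(constructor_map)


print(is_partial_signature_morphism(BOOK, TOC, SORT_MAP, CONSTRUCTOR_MAP))
print(dropped_constructors(BOOK, CONSTRUCTOR_MAP))
```

Here the paragraph constructor is dropped, which is one way of reading the remark that a table of contents preserves the structure of a book while giving up most of its content.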
2.3. Quality of semiotic morphisms
Each aspect of semiotic spaces that might be preserved gives rise to a different measure of quality, given as the degree to which this aspect is preserved. For example, given semiotic morphisms $M_1$ and $M_2$ from one semiotic space $S_1$ to another $S_2$, we may define $M_1 \sqsubseteq_C M_2$ if $M_2$ preserves every constructor that $M_1$ preserves, and $M_1 \sqsubseteq_A M_2$ if $M_2$ preserves every axiom that $M_1$ preserves. Other preservation relations are defined similarly (Goguen, 1999a). There are also more refined orderings, e.g. $M_1 \sqsubseteq_{C,s} M_2$ if $M_2$ preserves every constructor of sort $s$ that $M_1$ preserves; and we can define Boolean combinations of all these orderings to get something appropriate for a particular application. For example, Goguen (1999b) applies these ideas in justifying design decisions for the user interface to a theorem-proving system. Note that these quality measures are partial orderings, rather than linear numerical scales; this is appropriate because semiotic spaces are qualitative, in that they are concerned with structure. However, we can certainly define numerical scales if we wish to; for example, the percentage of constructors of sort $s$ preserved corresponds to $\sqsubseteq_{C,s}$ but conveys less information than $\sqsubseteq_{C,s}$ does, since the latter can be used to compare a given morphism with as many others as necessary to determine exactly which constructors are preserved.
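The constructor-preservation ordering and its numerical counterpart can be compared on a toy case. The sketch below makes simplifying assumptions: a morphism is represented only by the set of source constructors it preserves, the constructor names and the two sorts `Film` and `Award` are hypothetical, and "preserves" is taken as given rather than derived from a theory morphism.

```python
# Source constructors with their sorts (hypothetical example).
SOURCE = {"title": "Film", "year": "Film", "genre": "Film",
          "popularity": "Film", "prize": "Award"}

# Each morphism is summarised by the constructors it preserves.
M1 = {"year", "genre"}
M2 = {"year", "genre", "popularity"}
M3 = {"title", "prize"}


def preserved(morphism, sort=None):
    """Preserved constructors, optionally restricted to a single sort."""
    return {c for c in morphism if sort is None or SOURCE[c] == sort}


def at_most_as_good(m1, m2, sort=None):
    """True when m2 preserves every constructor that m1 preserves."""
    return preserved(m1, sort) <= preserved(m2, sort)


def fraction_preserved(morphism):
    """Numerical scale: share of source constructors that are preserved."""
    return len(morphism) / len(SOURCE)


print(at_most_as_good(M1, M2))  # True: M2 keeps everything M1 keeps
print(at_most_as_good(M1, M3), at_most_as_good(M3, M1))  # False False
print(fraction_preserved(M1), fraction_preserved(M3))    # 0.4 0.4
```

`M1` and `M3` receive the same numerical score although neither dominates the other, which illustrates why the partial ordering carries more information than a percentage.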
2.4. Some further topics
Sacks' (1972) notion of "category system" from the branch of ethnomethodology (Garfinkel, 1967) called conversation analysis (cf. Sacks, 1992) is related to semiotic systems, but is less formal. Our previous work on the nature of information (Goguen, 1997) also uses ideas from ethnomethodology, and can be seen as providing a philosophical and methodological foundation for algebraic semiotics that takes account of the social nature of signs. Lakoff, Johnson and others have developed the flourishing field of cognitive linguistics, building on previous careful studies of metaphor (Lakoff, 1987; Lakoff and Johnson, 1980; Lakoff and Núñez, 2000). Fauconnier and Turner (1998) introduced the notion of "blending", and demonstrated its importance for many aspects of cognition. See the blending website for much more information (Turner, 2003). Simple examples from natural language include "house boat", "road kill", "artificial life", and "computer virus", each of which is a blend of its two component words. It happens that "boat house" has a different meaning from "house boat" because a different blend is computed. This is not because the order of the words is different, but because the same two spaces can have many different blends (Goguen, 1999a). Semiotic spaces significantly generalise the conceptual spaces used in cognitive linguistics, because they allow far more than just objects and binary relations. An appropriate generalisation of blending is given in Goguen (1999a), covering many interesting examples in user interface design and information visualisation. In this setting, a blend is built from two (or more) semiotic morphisms having a common source, called the generic space, with targets called the input spaces, by providing two (or more) semiotic morphisms from the input spaces to a blend space, subject to certain "optimality" conditions that rule out the uninteresting cases (Goguen and Malcolm, 1996).
Hidden algebra extends the algebraic theory of abstract data types to handle states and dynamics, as well as concurrency and non-determinism (Goguen and Malcolm, 2000). These are exactly the features needed to move algebraic semiotics from static signs to dynamic signs, for handling interactive interfaces, animated visualisations, virtual worlds (Goguen, 2001), etc. Our approach requires that the cognitive and social dimensions of this extension should also be addressed. These can be explored using Gibson's notion of affordance, which he defined as "a capability for a specific kind of action, involving an animal and a part of its environment" (Gibson, 1977). For example, a [BACK] button on a browser provides an affordance for returning to the previously viewed page. Werner Kuhn has
used semiotic morphisms, Gibsonian affordances, and blending to develop semantics for geographic information system interfaces (Kuhn, 2002).
3. SOME EXAMPLES
Four examples are given in the following subsections, each with a discussion showing how semiotic morphisms can help with the design of information visualisations, including suggestions for improving displays.
3.1. A code browser
Because a major intuition of semiotic morphisms is that they should preserve what is most important, it may be surprising that, if there is a conflict between structure and content (e.g. because not all the data can be displayed at once), it is more important to preserve structure than content. This is called Principle F/C in Goguen and Malcolm (1996), and it is nicely illustrated in fig. 1, which is based on a code browser built at Bell Labs. The content of this display, which is the code of some program, has been sacrificed in favour of its structure, which is its division into files and procedures. Two spatial dimensions are used to
Fig. 1. A code browser.
represent this structure, while colour (which shows up as shading in the black and white version) is very effectively used to represent the age of the code. (The superimposed window on the bottom gives an overview of the whole program, plus a close-up showing some actual text. This illustrates the overview and zoom features of the system.) Without knowing the use of this system, it is impossible to know how appropriate its representation really is. Still, we can infer from the display that the designer thought that the age of code was the most important attribute, presumably because of its value in debugging. However, such a tool would be even more useful if it could be configured to highlight with colours a variety of features of interest for a variety of problems; such features might include references to certain variables, certain uses of pointers, certain kinds of recursion, etc. (e.g. consider what might be needed to work on the Y2K problem).
3.2.
FilmFinder
Figure 2 illustrates FilmFinder, a system from Ben Shneiderman's group at the University of Maryland (Shneiderman, 1998) for displaying films, with the vertical axis indicating popularity, the horizontal axis indicating date, and
Fig. 2. FilmFinder.
the colour indicating genre;² the area on the right-hand side is for controlling the system. We can see this display as the image under an appropriate semiotic morphism of a sign in a system of information about films, and we can infer what information the designer of this interface thought users would consider most important, namely the popularity, date, and genre of each film. Treating this figure as a display of scientific data about the movie industry, we see that the density of films is significantly greater in the most recent years, except perhaps for those genres that are least popular; one can also notice other facts, such as that there has always been a higher percentage of drama and that there are increasing percentages of action and horror. However, this representation is not as useful as it could be. The problem is that too much content and not enough structure have been preserved. For example, it would seem better to aggregate all films having approximately the same attributes of interest into one blob and then display the number of films in a blob using a distinct visual attribute, such as size or brightness. Successive blobs of the same kind could then be connected by lines having the same colour as the blobs. Users could click on a blob to see what is in it, preferably displayed in a new popup window. These revisions could facilitate search.
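The aggregation proposed here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the film records, the bin widths, and the names `blob_key` and `aggregate` are hypothetical and are not taken from FilmFinder.

```python
from collections import Counter

# Hypothetical film records: (title, year, popularity on a 0-10 scale, genre).
FILMS = [
    ("Film A", 1991, 7.2, "drama"),
    ("Film B", 1993, 7.8, "drama"),
    ("Film C", 1994, 3.1, "horror"),
    ("Film D", 1985, 7.5, "drama"),
    ("Film E", 1992, 7.0, "drama"),
]


def blob_key(year, popularity, genre, year_bin=5, pop_bin=2):
    """Map a film to the blob shared by all films with similar attributes."""
    year_slot = year // year_bin * year_bin
    pop_slot = int(popularity // pop_bin) * pop_bin
    return (year_slot, pop_slot, genre)


def aggregate(films):
    """Count the films in each blob; the count would drive size or brightness."""
    return Counter(blob_key(year, pop, genre) for _, year, pop, genre in films)


blobs = aggregate(FILMS)
for key, count in sorted(blobs.items()):
    print(key, count)
```

The display would then draw one mark per key rather than one per film, so that structure (the grid of bins) is preserved at the expense of content (the individual titles), which remain available on demand.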
3.3.
A later version of FilmFinder
Figure 3 depicts a later version of the same tool as in fig. 2, for the same domain of films (the SpotFire version of FilmFinder, from IVEE Development in Sweden); the main improvement is to give the user more control over what is displayed and how it is displayed. The particular display shown uses length and date for its two axes, and again uses colour for genre, though the genre colour coding scheme is not indicated; prize-winning films are highlighted by having a larger size. Here we can observe a clustering at around 90 min length, and we can again observe that there are too many dots to be useful, even though this particular display cuts off at 1990! If the user is looking for a particular film or class of films, he/she will have to narrow the focus by imposing additional constraints, and this single display does not give us enough information to know how effectively that can be done. We may presume that the (possibly imaginary) user who created this display thought that these particular attributes were the most interesting at a certain point during a sequence of displays constituting a search; but in fact, they do not seem particularly useful.

² As before, this is indicated by tones of grey in our rendition of the display.
Fig. 3. The SpotFire version of FilmFinder.

We can also infer what the designer of this version thought would be most important, by examining the controls on the right of the display; we may hope that these were determined by polling an adequate pool of typical users, but the key issue should be how easy it is to use these controls in scenarios that have been found to be of particular importance. Presumably typical users are more likely to be looking for a good video to rent than to be analysing trends in the movie industry. So once again, the controls should reflect the key features involved in typical searches, rather than just the most important attributes of films in general. It would take some experimental work to determine what these key search-relevant attributes might be. But we can still criticise the design of the control console, because of its exclusive focus on simple attributes instead of structure. And we can criticise the fine-grained control given to users over length and year, suggesting instead that soft constraints would be more appropriate; it also seems doubtful that length is a highly significant attribute for search. We can also criticise its design philosophy, advocating instead a more socially oriented approach that relates the profile of one user to the profiles of other users to select films that similar users have found interesting (there are numerous variations on this theme, such as listing films that a user's friends have liked). Finally, we can note that the design ideas proposed to improve the previous version of this system still apply to this version.
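The contrast between fine-grained (hard) control and soft constraints can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the film records, the tolerance values, and the scoring formula are hypothetical and do not describe the SpotFire controls.

```python
# Hypothetical film records.
FILMS = [
    {"title": "X", "year": 1980, "length": 95},
    {"title": "Y", "year": 1960, "length": 90},
    {"title": "Z", "year": 1979, "length": 150},
]


def hard_filter(films, year, max_length):
    """Hard constraints: a film either passes every test or disappears."""
    return [f for f in films if f["year"] == year and f["length"] <= max_length]


def soft_score(film, year, length, year_tol=10.0, length_tol=30.0):
    """Soft constraints: 1.0 for a perfect match, decaying smoothly otherwise."""
    year_gap = abs(film["year"] - year) / year_tol
    length_gap = abs(film["length"] - length) / length_tol
    return 1.0 / (1.0 + year_gap + length_gap)


def rank(films, year, length):
    """Order all films by how well they meet the preferences, best first."""
    return sorted(films, key=lambda f: soft_score(f, year, length), reverse=True)


kept = [f["title"] for f in hard_filter(FILMS, year=1980, max_length=100)]
ranked = [f["title"] for f in rank(FILMS, year=1980, length=90)]
print(kept, ranked)
```

With hard constraints, near misses vanish from the display; with a soft score, they are merely ranked lower, which seems closer to how someone choosing a video to rent would actually search.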
Fig. 4. Two representations of a file hierarchy.
3.4.
A file system
Figure 4 sketches a semiotic space for a file hierarchy, along with two semiotic morphisms for visualising it in two different ways in the graphical user interface of Apple's Macintosh OS 8.6. The source space is a rational reconstruction of a specification for the file system; its structure is that of an ordered labelled finite tree. When Folder C is opened in the representation on the right, the location of file "Document.txt" is represented textually in the small area at the top of its window, whereas in the left representation, its location has a visual representation, based on position, including indentation. The left visualisation is better, because it shows more of the source space structure in visual form, and also provides more browsing affordances in visual form. However, more could be done in this direction.
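As a rough illustration, the source space and the two representations can be modelled in Python as one ordered labelled tree with two renderings. The folder names other than Folder C and Document.txt, the colon used as a path separator, and the function names are assumptions made for this sketch; it does not reproduce the formal specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of an ordered labelled finite tree; child order is significant."""
    label: str
    children: List["Node"] = field(default_factory=list)


def outline(node, depth=0):
    """Left-hand style: position and indentation carry the tree structure."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


def path_to(node, target, prefix=()):
    """Right-hand style: the location of one item is reported as text."""
    here = prefix + (node.label,)
    if node.label == target:
        return ":".join(here)
    for child in node.children:
        found = path_to(child, target, here)
        if found is not None:
            return found
    return None


# Hypothetical hierarchy for illustration.
ROOT = Node("Disk", [
    Node("Folder A"),
    Node("Folder B", [Node("Folder C", [Node("Document.txt")])]),
])

print("\n".join(outline(ROOT)))
print(path_to(ROOT, "Document.txt"))
```

The outline keeps the whole tree visible at once, while the path flattens one branch into a string; this is the sense in which the first rendering preserves more of the source structure in visual form.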
4.
DISCUSSION
As the examples above illustrate, it is often more practical to apply algebraic semiotics informally, calling on precise definitions only when needed for difficult design decisions, and otherwise using the formal framework mainly as a way to guide the analysis. The examples also illustrate that even a little relevant theory can pinpoint significant deficiencies and suggest improvements. The UCSD Semiotic Zoo (Goguen, 1996b) displays a number of other graphical designs and uses algebraic semiotics to analyse their deficiencies.
Measuring quality by what is preserved and how it is preserved seems a novel idea, at least when formulated with the precision and generality suggested here. The principle that it is more important to preserve structure than content when a trade-off is forced has surprised even some design professionals, although it is in the literature for many special cases, for example in the books of Edward Tufte, e.g. Tufte (1983). Another nonobvious result is that preserving high-level sorts is more important than preserving priorities, when a trade-off is necessary. The need to take account of social issues in user interface design, e.g. in our discussion of fig. 3, is also surprising to many people; for this reason, our version of semiotics is not just algebraic but also social. This insight is not unique to algebraic semiotics; for example, the importance of social factors in HCI is the focus of its CSCW subfield.
REFERENCES
Fauconnier, G., Turner, M., 1998. Conceptual integration networks. Cognitive Science 22, 2, 133-187.
Garfinkel, H., 1967. Studies in Ethnomethodology. Prentice-Hall, New York.
Gibson, J., 1977. The theory of affordances. In: Shaw, R., Bransford, J. (Eds.), Perceiving, Acting and Knowing: Toward an Ecological Psychology. Erlbaum, Mahwah, NJ.
Goguen, J., 1996a. Semiotic morphisms. Available on the web at www.cs.ucsd.edu/users/goguen/papers/smm.html. Earlier version in Proceedings of the Conference in Intelligent Systems: A Semiotic Perspective, Albus, J., Meystel, A., Quintero, R. (Eds.), Vol. II, National Institute of Science and Technology, pp. 26-31.
Goguen, J., 1996b. The UCSD Semiotic Zoo, 1996-2001. Available on the website at URL www.cs.ucsd.edu/users/goguen/zoo/.
Goguen, J., 1997. Towards a social, ethical theory of information. In: Bowker, G., Star, L., Turner, W., Gasser, L. (Eds.), Social Science, Technical Systems and Cooperative Work: Beyond the Great Divide. Erlbaum, Mahwah, NJ, pp. 27-56.
Goguen, J., 1999a. An introduction to algebraic semiotics, with applications to user interface design. In: Nehaniv, C. (Ed.), Computation for Metaphors, Analogy and Agents, Vol. 1562, Lecture Notes in Artificial Intelligence. Springer, Berlin, pp. 242-291.
Goguen, J., 1999b. Social and semiotic analyses for theorem prover user interface design. Formal Aspects of Computing 11, 272-301.
Goguen, J., 2001. Towards a design theory for virtual worlds: algebraic semiotics, with information visualisation as a case study. Proceedings, Virtual Worlds and Simulation, Society for Modelling and Simulation, San Diego, CA, pp. 298-303.
Goguen, J., 2002. User Interface Design Class Notes. The CSE 271 website at www.cs.ucsd.edu/users/goguen/courses/271.
Goguen, J., Burstall, R., 1992. Institutions: abstract model theory for specification and programming. Journal of the Association for Computing Machinery 39, 1, 95-146.
Goguen, J., Linde, C., 1993. Techniques for requirements elicitation. In: Fickas, S., Finkelstein, A. (Eds.), Requirements Engineering '93. IEEE, pp. 152-164. Reprinted in Software Requirements Engineering. Thayer, R., Dorfman, M. (Eds.), 2nd Edition. IEEE Computer Society, 1996.
Goguen, J., Malcolm, G., 1996. Algebraic Semantics of Imperative Programs. MIT Press, Cambridge, MA.
Goguen, J., Malcolm, G., 2000. A hidden agenda. Theoretical Computer Science 245, 1, 55-101.
Goguen, J., Thatcher, J., Wagner, E., 1978. An initial algebra approach to the specification, correctness and implementation of abstract data types. In: Yeh, R. (Ed.), Vol. IV, Current Trends in Programming Methodology. Prentice-Hall, New York, pp. 80-149.
Kuhn, W., 2002. Modeling the semantics of geographic categories through conceptual integration. In: Egenhofer, M.J., Mark, D.M. (Eds.), Geographic Information Science, Second International Conference (GIScience 2002), Vol. 2478, Springer Lecture Notes in Computer Science, pp. 108-118.
Lakoff, G., 1987. Women, Fire and Other Dangerous Things: What Categories Reveal about the Mind. Chicago.
Lakoff, G., Johnson, M., 1980. Metaphors We Live By. Chicago.
Lakoff, G., Núñez, R., 2000. Where Mathematics Comes from: How the Embodied Mind Brings Mathematics into Being. Basic Books, New York.
Peirce, C.S., 1965. Collected Papers. Harvard. In 6 volumes, see especially Vol. 2, Elements of Logic.
Sacks, H., 1972. On the analysability of stories by children. In: Gumperz, J., Hymes, D. (Eds.), Directions in Sociolinguistics. Holt, Rinehart and Winston, New York, pp. 325-345.
Sacks, H., 1992. In: Jefferson, G. (Ed.), Lectures on Conversation. Blackwell, Oxford.
de Saussure, F., 1983. Course in General Linguistics. Duckworth, London. Translated by Roy Harris.
Shneiderman, B., 1998. Designing the User Interface, 3rd Edition. Addison Wesley, Reading, MA.
Tufte, E., 1983. The Visual Display of Quantitative Information. Graphics Press, Cheshire, CT.
Turner, M., 2003. The blending website. Maintained by Mark Turner, and available at the URL www.wam.umd.edu/~mturn/WWW/blending.html.
Studies in Multidisciplinarity, Volume 2
Editor: G. Malcolm
© 2004 Elsevier B.V. All rights reserved.
8
Iconicity and "direct interpretation"¹
Jesse Norman
Philosophy Department, University College London, Gower Street, London WC1E 6BT, UK
In a series of papers, Keith Stenning and colleagues have advanced a well-known distinction between "direct" and "indirect" interpretation, as a means to differentiate diagrams and sentences. I assess this distinction by exploring a case study in Stenning (2000) that compares Peirce's existential graphs (EGs) and Euler circles (ECs) with each other, and with their equivalents in various sentential logics. This analysis suggests that the distinction between direct and indirect interpretation is not sufficient as a means to differentiate between diagrams and sentences, either in the case under review or more generally. However, in the light of the discussion we can usefully distinguish between two types of iconicity, which when applied to ECs and EGs seem to explain many of the relevant intuitions.
1.
INTRODUCTION
A well-known and continuing debate concerns how in principle diagrams and (declarative) sentences represent information, and whether and how differences in how they do so can be used to distinguish reliably between them. There is now a substantial literature on this topic, with at least eight identifiable "discrimination theories", as I shall call them, in play.²

¹ This paper was prepared under an Arts and Humanities Research Board postgraduate grant, which I gratefully acknowledge. I would also like to thank the anonymous referees of this paper for their comments.
² Shimojima (2001) usefully lists and describes these theories.
Among these discrimination theories, two in particular have been recently prominent. The first, originating from Barwise and Etchemendy (1990), takes as foundational the claim that diagrams bear a homomorphic relation to what they represent (their "ranges" or "targets"), whereas sentences typically do not. This has been extended into a theory that analyses such linkages in terms of the structural constraints imposed by the need to represent information via one representational type rather than another, and has been given a detailed situation-theoretic treatment by Shimojima (1996). The second theory, advanced by Stenning (and colleagues), claims that "the fundamental distinction between diagrammatic and sentential semantics is between direct and indirect interpretation".³ Stenning criticises structural constraint theories for ignoring an important dimension of what it is to be a diagram: that is, the apparently greater degree of cognitive accessibility of diagrams as against sentences.⁴ This cognitive accessibility is understood mainly in terms of the ease with which inferences can often be made from diagrams. On this view, then, it is the central fact that diagrams are "directly interpreted" that renders them more accessible, more available for and supportive of human reasoning, than sentences.

Stenning's concern as to the sufficiency of the appeal to structural constraints to explain our intuitions about diagrams is, I think, well taken, for several reasons. Quite apart from the general question whether any single property could be enough to explain a suitable set of our intuitions as to the differences between sentences and diagrams, it is clear that homomorphism alone is insufficient to do so, since sentences in formal and natural languages can be homomorphic to their ranges without thereby becoming diagrams.
Moreover, to the extent that a central goal of structural constraint theories is to explain the apparently distinctive perspicuity of diagrams as a representational type, it seems that the general appeal to a theory of structural constraints requires to be supplemented by a theory that unpacks the specific psychology of our encounter with diagrams if it is to provide the requisite explanation. Without this, attractive though it is, a structural constraints theory will lack an account of just what it is about that encounter for us humans that imposes the constraints in question. I have myself separately argued for a theory of this general shape, in terms of two defined properties of "discretion" and "assimilability". On this line of thought, what differentiates diagrams from sentences and depictions lies in the interplay between a structural or information-theoretic component (the extent to which, in order to convey information that P, a representation must also convey other information; or "discretion" in my terminology), and a psychological component (the extent to which human reasoners can grasp and process the content of a particular representation, or "assimilability"). Though assimilability is not purely a matter of inferential tractability, this and Stenning's notion of accessibility are clearly quite closely related.⁵

However, it is not with this broader debate as such that I am concerned in this chapter. Rather, I want to focus on Stenning's important positive claim: that "the fundamental distinction between diagrammatic and sentential semantics is between direct and indirect interpretation". It seems that this distinction is intended to supersede a similar distinction of Peirce's, between the iconic and the symbolic. I want to argue that Stenning's claim is, in fact, rather questionable, and that we should prefer Peirce's distinction. Out of this discussion will emerge an interesting and, I think, valuable distinction between two varieties of iconic representation. Given that the interdisciplinary study of diagrammatic or graphical representation is still in its early stages, it may be valuable to try to clarify some of the organising concepts and terminology in this way.

³ Stenning (2000, p. 136).
⁴ See, e.g. Gurr et al. (1998, pp. 533-534).
2.
THE "DIRECT/INDIRECT INTERPRETATION" DISTINCTION
The central claim under review - that the fundamental distinction between diagrammatic and sentential semantics is between direct and indirect interpretation - is advanced in a series of papers by Stenning and his collaborators.⁶ For the sake of specificity, I will focus on Stenning (2000). To understand the central claim, we need to understand what is meant by the distinction between direct and indirect interpretation. How should we do so? Matters are not quite clear, so one should proceed cautiously here. However, Stenning identifies a variety of features that are supposed to distinguish between direct and indirect interpretation; among these, we can focus on three related features in particular:

1. Interposed syntax. The first, and apparently the most basic, feature relates to syntax: "sentential languages are interpreted indirectly because an abstract syntax is interposed between representation and the referenced world .... The interpretation is indirect because the significance of two elements being spatially (or temporally) concatenated cannot be assessed without knowing what abstract syntactic relation holds between them".⁷ Call this the "interposed syntax" feature.
2. Uniform treatment. A further distinguishing feature is suggested by Stenning's treatment of 1D "finite state" languages, which contain letter strings but have no abstract syntax. Of these, he says "here the interpretation of the spatial relation is uniform wherever it occurs, unlike concatenation in a language with abstract syntax. So on our classification these systems are semantically 'diagrammatic'".⁸ Call this the "uniform treatment" feature.
3. Agglomerative use. Finally, Stenning suggests that the difference between directly and indirectly interpreted representations has as a consequence a difference in the types of reasoning in which they can be used. He distinguishes between "discursive" and "agglomerative" presentations of arguments. On a discursive presentation, such as a standard sentential presentation of a proof, a sequence of thoughts is represented by a sequence of lines, in which each line contains information derived from a previous line, modified by a permitted rule of inference. No information is erased, but only selected information may be carried forward from one to the next. On an agglomerative presentation, a single representation is progressively modified at each stage, and representation of new information may cause old information to be erased. This gives a third distinguishing feature: according to Stenning, "only indirectly interpreted representations can be used discursively".⁹ So, since diagrams are supposed to be direct representations, diagrams cannot be used discursively. Call this the "agglomerative use" feature.

⁵ cf. Norman (1999, 2000).
⁶ These include Stenning et al. (1995), Gurr et al. (1998), Stenning (2000), and Stenning and Lemon (2001).
On this view, then, diagrams are distinguished by their direct interpretation, and this amounts to the claim that: (1) there is no interposed abstract syntax between a diagram and what it represents; (2) the interpretation of the spatial relations between elements of a diagram is uniform; and (3) diagrams can never be used discursively.¹⁰

The foundational distinction here is between direct and indirect interpretation. On closer review, however, it is not immediately clear what is meant by these terms, or by the phrase "abstract syntax". The key quote in (1) above could be read as containing two claims: one about the existence of functional relations between a representation and what it represents, and one about how these relations are to be assessed by someone who looks at the representation. The former claim cannot be used to differentiate diagrams from sentences in formal contexts, since both will normally have explicitly defined functional relations linking representations and what they represent. So it seems better to read "interposed" as referring, not simply to these relations, but to the way they are to be understood by an interpreter.¹¹ The thought is, I think, that the concatenation that holds between the elements of, for example, a noun phrase and those of a verb phrase in a sentence of natural language can only be understood by someone who has already grasped the syntax for the language in question. As Stenning says, "the presence of an abstract syntax means that there has to be 'punctuation' of formulae into syntactic units (sentences) prior to interpretation".¹² Indirect interpretation, then, requires the reasoner to know¹³ an abstract syntax in order to grasp the significance of a representation correctly. The contrast with direct interpretation is, then, that in the latter the reasoner does not need to know any abstract syntax in order to grasp the significance of a representation correctly.

⁷ Stenning (2000, p. 136).
⁸ Stenning (2000, p. 137).
⁹ Stenning (2000, p. 133).
¹⁰ A further feature discussed in Stenning (2000) is that of type- vs. token-referentiality. This raises some important additional issues. I do not, however, think they affect the main argument made here.

3.
HOW TO CLASSIFY EXISTENTIAL GRAPHS?
In Stenning (2000), it is regarded as a virtue of the direct/indirect interpretation approach that "as a criterion, it leads to a classification of actual representation systems that goes beneath the surface of what is 'obvious' about which things are diagrams and which language". Consider, in particular, how it classifies Peirce's 2D system of alpha EGs. This system has a formal syntax and semantics.¹⁴ The syntax identifies various permissible categories of mark, defines what it is for a graph composed of these marks to be well formed, and gives the rules of inference that govern permissible transformations of one graph into another. The semantics then describes how each of the marks in the system may be interpreted: propositional letters may be taken as denoting propositions, the "cut" or closed curve may be taken as denoting negation, etc. On the direct/indirect distinction described in (1) above, alpha EG is classified as sentential, for it has an abstract syntax and semantics, and one must at least grasp both in order to grasp the meaning of a claim represented by a graph. I shall return to (2) and (3) later.

¹¹ This is supported by a further remark on homomorphisms (Stenning 2000, p. 137): "Direct interpretation precludes an abstract syntax .... It is not that there is any lack of homomorphism in sentential languages, but its recovery requires access to the syntactic interpretation of concatenation."
¹² Stenning (2000, p. 138).
¹³ Or merely: be able to appeal to? Tacitly or explicitly? I leave these options open.
¹⁴ A good general introduction to the syntax and semantics is Roberts (1973).
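To make the alpha semantics just described concrete, here is a minimal Python sketch that translates a graph written in linear notation (parentheses for cuts) into a two-operator sentential formula. It assumes the standard reading in which juxtaposition is conjunction and a cut is negation; the nested-list representation and the names `parse` and `to_2sl` are choices made for this illustration, not part of Peirce's or Roberts's presentation.

```python
def parse(text):
    """Parse linear alpha EG into nested lists: a str is a letter, a list is a cut."""
    stack = [[]]
    for ch in text:
        if ch == "(":
            stack.append([])
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            cut = stack.pop()
            stack[-1].append(cut)
        elif ch.isalpha():
            stack[-1].append(ch)
        elif not ch.isspace():
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def to_2sl(items):
    """Render juxtaposed items as a conjunction; each cut becomes a negation."""
    parts = []
    for item in items:
        if isinstance(item, str):
            parts.append(item)
        elif len(item) == 1 and isinstance(item[0], str):
            parts.append("¬" + item[0])              # cut around a single letter
        else:
            parts.append("¬(" + to_2sl(item) + ")")  # cut around a subgraph
    return " ∧ ".join(parts) if parts else "⊤"       # empty area reads as truth


print(to_2sl(parse("(P(Q))")))  # the familiar "scroll" for "if P then Q"
```

The translation needs a bracket-matching pass before any mark can be interpreted, which is one way of seeing why, on criterion (1), alpha EG counts as having an abstract syntax.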
This would be an interesting result, and this makes EG something of a test case for the theory, for many people have regarded EG as diagrammatic.¹⁵ How should one react to it? On one view, the result shows the power of the theory in revealing how our apparently obvious intuitions about this representational form are mistaken; if we accept the distinction, we must reject our intuitive classification of EG as diagrammatic. On another view, this logic can be reversed; if we hold to the view that EG is diagrammatic, then the "direct/indirect interpretation" theory cannot be correct. A third view might be this: that EG is diagrammatic in one way and sentential in another. This view faces a double challenge: to articulate both why we should accept the basic distinction, and why EG has this hybrid status. Which view should we adopt? We cannot decide this just by considering EG alone, for whether or not EG is diagrammatic is part of what is at stake here. We also need to consider a case study that Stenning offers of the differences between EG and ECs, which is (I think) supposed to justify the rejection of EG as diagrammatic by comparing it with another (diagrammatic) presentation of logical relations, and each with their equivalents in various sentential logics. This will allow us to get a sharper and more specific understanding of how the "direct/indirect interpretation" distinction works. In particular, we need to see if the criteria adduced above genuinely distinguish in the desired way between these and other putatively sentential and diagrammatic representations in a way that respects our intuitions. The latter is a substantial topic in its own right, so my remarks will of necessity be brief.
4.
COMPARING EC AND EG
I start with the case study in Stenning (2000). This consists of a specific comparison between an EG, sentences in a standard two- (¬, ∧) or five-operator (¬, ∧, ∨, →, ↔) sentential logic ("2SL" and "5SL"),¹⁶ and an EC; for reasons of space, I must assume the reader's familiarity with these systems. The goal is to explain the status of EGs by comparison with sentential and diagrammatic representations that are both intuitively genuine and determinately so in terms of direct and indirect interpretations. Specifically, the thought is that two representations can be visually very similar (cf. Fig. 1) and yet one be classified as only partially diagrammatic on other grounds.

¹⁵ cf. the work of Peirce scholars such as Zeman (1964), Roberts (1973), Ketner (1996), and of diagram theorists such as, for example, Hammer (1995) and Shin (1998).
¹⁶ I reserve the word "proposition" to denote the "content" of these logics, not the "form". Strictly speaking, alpha EG is a propositional logic.
Fig. 1. EG and EC representations (panels: Existential Graph; Euler Circle).
Take the following representations, which are both taken by Stenning to express the claim "if P → Q, if Q → R":

The line of thought seems to be this. These representations supposedly express the same claim, and are very similar in appearance. But appearances can be deceiving: the letters in the EG stand for propositions that are negated by being enclosed, while in the EC the letters function as labels for sets demarcated by the circles. Moreover, the graph can be read in a variety of other ways, for example, as (in sentential logic):

¬(P ∧ ¬(Q ∧ ¬R))

or as

P → (Q → R)

whereas it seems that the EC can be read only in one way, as P → (Q → R).¹⁷ So it seems as though the concatenation relation can be understood in different ways¹⁸ in the case of EG, requiring an appeal to syntax to differentiate them. Moreover, the translations in sentential logic suggest that the logical content of the representation is clustered in slightly different ways in each case, as indicated by the relative positions of the parentheses and negation operators. So there is something here akin to what Stenning calls "punctuation".¹⁹

In the case of the EC above, however - the line of thought seems to go - the labels have no stipulated location; they can be placed either inside or outside the circles to which they refer. So it seems that nothing in the way we understand the circles hangs on where the letters are located. Moreover, concatenation is apparently read only in a uniform way (as denoting set inclusion, and so interpretable as material implication). Finally, Stenning notes that the EC is designed to be used in an agglomerative way, progressively representing information until a given conclusion is read off the relevant diagram. By contrast, the EG can be used agglomeratively, but it can also be used discursively, as described in several textbooks.²⁰

So, if we apply the three criteria for diagrammaticity listed earlier, it appears that: (1) there is no interposed abstract syntax between the EC and what it represents; (2) the interpretation of the spatial relations between elements of a diagram is uniform; and (3) the use of ECs is always agglomerative. By these criteria, the EC is diagrammatic, but the graph is not.

How then to accommodate the common intuition that EG is diagrammatic? The very interesting suggestion here is that (as per the third view mentioned above) EG is a hybrid: Stenning says "EG graphs directly represent sentences, but only indirectly represent the propositions those sentences represent".²¹ The sentences in question are those of the sentential version of EG, and the thought seems to be that a reasoner does not need to know any abstract syntax in order to grasp that a given graph represents the counterpart sentence in the sentential version of EG. Why not?

¹⁷ In fact, matters are not quite so simple. As noted, Stenning says that both diagrams express the claim "if P → Q, if Q → R". But this sentence of loglish is ambiguous between (in English) the single proposition "if P, then (if Q then R)" and the two propositions "if P then Q" and "if Q then R". Moreover, the EG given in Stenning (2000) does not correctly formalise either of these alternatives. If the former is meant, then the graph should read, in linear EG, (P((Q(R)))); if the latter is meant, then the graph should read (P(Q)) x (Q(R)). By contrast the quoted graph (fig. 1) is, in linear EG, (P(Q(R))). To be faithful to Stenning's case study, I have not corrected the graph; but this has the effect that in fact the two representations are not logically equivalent, as originally intended.
¹⁸ This is not supposed to suggest that it is clear in general how we should understand the notion of concatenation for diagrams; in many cases what is concatenated will depend on how/from which direction the diagram is read.
Because she can simply read the sentence as a horizontal slice of the graph, as in fig. 2. On this view, then, one does not need to know any abstract syntax to grasp that the graph above represents the counterpart sentence (P(Q(R))). Of course, the view would still hold that the graph did not satisfy criteria (2) and (3) - nothing has been done to change the multiple (i.e. nonuniform) interpretability of graphs or their sentential counterparts, or the possibility of reasoning with them discursively. But it seems the basic criterion (1) would be satisfied. So it seems we have a reason here for regarding EG as a hybrid: as sentential in a primary sense, diagrammatic in a secondary one.

Fig. 2. Reading off linear EG.

However, this view faces an immediate difficulty. For if we apply criterion (1) to the sentential version of EG, "linear EG", it seems we must also accept that a reasoner does not need to know any abstract syntax in order to grasp that a given sentence of linear EG represents its counterpart graph. But if this is true, then the sentence is a direct representation of the graph, i.e. a diagram. This would make the acknowledged sentences of linear EG into diagrams of the relevant graphs. Something seems to have gone wrong. How can a sentence be both a sentence and a diagram, on a test meant to differentiate the two? How can both forms of EG be sentences representing propositions? Note that the problem cannot be evaded by denying that the lines of linear EG are sentences. For the theory takes them to be sentences - they indirectly represent propositions - and this faithfully reflects the deeper commitment to the cognitive accessibility of diagrams in general that underlies the overall approach. I shall return to this point below.

¹⁹ Akin to Stenning's notion of punctuation, because unlike the latter, it does not occur prior to interpretation; in particular, it is not prior to the interpretation of the logical operators.
²⁰ Roberts (1973) and Ketner (1996).
²¹ Stenning (2000, p. 145).
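The claims made in the footnote about how the quoted graph should be formalised can be checked mechanically with truth tables. The Python sketch below is offered only as an illustration: it assumes well-formed linear EG input, reads juxtaposition as conjunction and a cut as negation, and uses hypothetical helper names (`eval_eg`, `equivalent`, `nested`, `chained`).

```python
from itertools import product


def eval_eg(text, env):
    """Evaluate well-formed linear alpha EG: juxtaposition = and, cut = not."""
    pos = 0

    def juxtaposition():
        nonlocal pos
        value = True
        while pos < len(text) and text[pos] != ")":
            ch = text[pos]
            pos += 1
            if ch == "(":
                inner = juxtaposition()
                pos += 1  # step over the ")" that closes this cut
                value = value and not inner
            elif ch.isalpha():
                value = value and env[ch]
        return value

    return juxtaposition()


def implies(a, b):
    return (not a) or b


def nested(p, q, r):
    """The single proposition "if P, then (if Q then R)"."""
    return implies(p, implies(q, r))


def chained(p, q, r):
    """The two propositions "if P then Q" and "if Q then R", taken together."""
    return implies(p, q) and implies(q, r)


def equivalent(eg, formula, letters="PQR"):
    """Compare a graph with a formula on every assignment of truth values."""
    return all(
        eval_eg(eg, dict(zip(letters, row))) == formula(*row)
        for row in product([False, True], repeat=len(letters))
    )


print(equivalent("(P((Q(R))))", nested))    # corrected graph, first reading
print(equivalent("(P(Q))(Q(R))", chained))  # corrected graph, second reading
print(equivalent("(P(Q(R)))", nested))      # the quoted graph
```

On this reading the two corrected graphs match their intended propositions, while the quoted graph (P(Q(R))) matches neither, for instance when P, Q and R are all true.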
5. THE EC/EG COMPARISON RECONSIDERED
The case study above also raises some more general questions, which create further difficulties for the direct/indirect representation approach in differentiating between the supposedly diagrammatic EC and the supposedly hybrid EG. Take the third "agglomerative use" feature: there seems to be no reason why we cannot use ECs for purposes of reasoning in a discursive presentation. Here is a discursively presented proof in EC of the conclusion represented in fig. 1:
Premiss: all P is Q
Premiss: all Q is R
Conclusion: all P is R
Presenting the proof in this way would require the addition of a rule of insertion/substitution to the given syntax, but there is no reason at all to think this improper in general, and it self-evidently does not lead to falsehood in this case. Since information can be erased during the course of agglomerative proofs using ECs, and the discursive presentation does not erase information, there is value in the discursive presentation for EC, just as there is for EG. 22 So if, as the third feature claims, diagrams cannot be used discursively, then ECs are not diagrams. If this is right, then not only are EGs sentential, but ECs are too. Note, moreover, that this point is not confined to EG and EC: arguments in Euclid's geometry can be presented discursively using diagrams, as can arguments using Venn diagrams, for example. But these representations are often regarded as central cases of the diagrammatic in mathematics and logic, if not more generally. So we have reason to doubt the value of the third criterion if it rules against them.

22 On philosophical issues relating to proof procedure in alpha EG, see Norman (1999).
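The discursively presented proof can be checked mechanically under one possible reading, in which each Euler circle is a set and "all P is Q" is set inclusion. The following is only a sketch under that assumption; the function names and the three-element universe are hypothetical, and a finite check of this kind illustrates the inference rather than proving it in general.

```python
from itertools import combinations

def subsets(universe):
    """Every subset of a small finite universe, as Python sets."""
    items = sorted(universe)
    return [set(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def all_are(a, b):
    """'All a is b', read as set inclusion (circle a lies inside circle b)."""
    return a <= b

def syllogism_holds(universe):
    """Whenever both premisses hold for circles P, Q, R, so does the conclusion."""
    subs = subsets(universe)
    return all(all_are(p, r)
               for p in subs for q in subs for r in subs
               if all_are(p, q) and all_are(q, r))
```

Calling `syllogism_holds({1, 2, 3})` runs through every assignment of the three circles to subsets of the universe and confirms that no assignment satisfies both premisses while violating the conclusion.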
Considering the second "uniform treatment" criterion seems to yield the same conclusion. Both EG and EC use marks and spatial relations to symbolise logical (or set-theoretic) operations. If we understand the letters in the EC as the labels of circles, then it may seem as though it is quite arbitrary whether the labels fall inside or outside the relevant circle; and so that the labels are irrelevant to the way in which the diagram is understood, and the "concatenation" relations are always to be interpreted in a uniform way. In fact, we may understand the letters, not as the labels for circles, but - as Euler did himself - as class terms (e.g. P stands for "oak", Q for "tree", R for "wood"). 23 On this view, the letter refers to the members of the set within the relevant circle, and so is properly placed within that circle. Alternatively, we may understand the circles as sets of possibilities in which a given state of affairs holds or a given proposition is true, both of which can be denoted by a propositional letter, and again the propositional letter will be properly placed within the relevant circle. 24 On either of the latter interpretations, the positioning of the letters is not arbitrary, but governed by the same or very similar considerations as those governing the positioning of letters in EG. According to the second criterion, as applied, EG is sentential. So it seems, by parity of reasoning, that EC must be sentential on these interpretations. But the status of the EC as a diagram or sentence cannot plausibly depend on which of these interpretations is adopted, for two reasons: first, if we grant an intuitive link between visual appearance and diagrammatic status, because the EC in question need not change its appearance under each interpretation; secondly, because the decision which interpretation to adopt requires an appeal to abstract syntax, if anything does. So the second criterion too seems very questionable. 
What, then, of the first and most basic criterion? Recall that on this "interposed syntax" criterion, a representation is a diagram if the reasoner does not need to know an abstract syntax in order to grasp its significance correctly. In the case study, it was the multiple readings and the apparently punctuated quality of the representations that pointed to the use of an abstract syntax in the case of EG. Should we think differently about EC? Take the punctuation point first. Both EGs and ECs can form visually distinct clusters on a page, and in both cases one can visually attend to and reason about some sub-element without attending to others, though of course the reasoning will follow different rules in each case. So there is little to differentiate the two here. What about syntax? Say, for purposes of illustration, that we understand the EC in fig. 1 as equivalent to $P \rightarrow (Q \rightarrow R)$. This interprets the set-inclusion relation as material implication. But material implication can be expressed in sentential logic in terms of negation and disjunction, so that $Q \rightarrow R$ is logically equivalent to $\neg Q \vee R$; or, in terms of negation and conjunction, to $\neg (Q \wedge \neg R)$. So we could, if desired, read the EC as $P \rightarrow \neg (Q \wedge \neg R)$. Of course, this is not a standard practice, though it or parallel inferences might become so as the theory of ECs is developed, but this is irrelevant: the point is simply to explore the parallelism in principle with EG. Note that if we do understand ECs in this way, the different readings exhibit the same phenomenon of clustering as was noted for EG. So it does not seem that the phenomenon of punctuation can do the required explanatory work in helping to differentiate between EG and EC according to the third criterion above.

23 Euler (1795, 450ff).
24 Note that this last interpretation, and not the "label" interpretation, seems in fact to be the one Stenning adopts, at least implicitly, in interpreting the EC in fig. 1 above as expressing "if $P \rightarrow Q$, if $Q \rightarrow R$".
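The claim that the alternative readings of the EC are logically equivalent can be verified by a truth table. The sketch below is illustrative only: the helper names are hypothetical, and material implication is given by an explicit table so that the comparison with the negation/disjunction and negation/conjunction forms is not circular.

```python
from itertools import product

# Truth table for material implication, stated explicitly.
IMPLIES = {
    (True, True): True,
    (True, False): False,
    (False, True): True,
    (False, False): True,
}

def readings(p, q, r):
    """Truth values of the three readings of the EC for one valuation."""
    inner_arrow = IMPLIES[(q, r)]      # Q -> R
    inner_or = (not q) or r            # not-Q or R
    inner_and = not (q and not r)      # not (Q and not-R)
    return tuple(IMPLIES[(p, inner)]
                 for inner in (inner_arrow, inner_or, inner_and))

def readings_agree():
    """True if all three readings coincide on every valuation of P, Q, R."""
    return all(len(set(readings(p, q, r))) == 1
               for p, q, r in product([False, True], repeat=3))
```

`readings_agree()` inspects all eight valuations, so the equivalence of the readings is established exhaustively for the sentential case.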
6. TWO TYPES OF ICONICITY
We can now ask the question: is it in fact plausible that, as the direct interpretation theory would require, a reasoner can correctly grasp the significance of ECs without knowing, or at least being able to appeal to, any abstract syntax? Of course, a full answer would require one to be clear about the exact sense of "significance", but it is very hard to see how this could be so. The significance of a graph will, it seems, be a partial function of the significance of its logical operators, and grasping this must require a grasp of at least some syntax. Thus, even if one reads the rules of inference of EC in an entirely un-interpreted way, merely as permissions to transform representations of type X into representations of type Y, it will still be true that a genuine understanding of a given EC will require the reasoner to know, or be able to appeal to, an abstract syntax. On this criterion, again, the EC will be sentential, and the supposed contrast with EG will be lost. The same seems to be true of many if not all diagrams generally, if we understand by the representation relation something more than mere structural correspondence between the diagram and what it represents. But this assumption is commonly made: we do not normally think of an ECG read-out as a diagram representing the performance of a financial index, though it could of course be taken to be a diagram, even though there may be a homomorphic or isomorphic correspondence between the two. It seems, then, that the attempt to distinguish between diagrams and sentences in terms of direct or indirect interpretation is unsuccessful. In the case study, in addition to the problem described in section 4, applying the three criteria leads to the result that EC is sentential, and so the supposed contrast with EG is lost. More generally, the distinction seems to construe
too many representations as sentential where, as with ECs (and EG), our intuition that they are diagrammatic is robust.

However, at this point it may be of value to return to an earlier distinction drawn by Peirce. 25 Peirce famously differentiates between the iconic and the symbolic in terms that are echoed by the attempt to distinguish direct from indirect interpretation. A sign is symbolic, according to Peirce, insofar as it represents merely in virtue of an arbitrary conventional association with its object, while it is iconic insofar as it has a resemblance to or shares a common character with its object. A sign may have both symbolic and iconic aspects: a picture of a beer mug in a sketch may iconically represent a beer mug and symbolically represent a pub. This is, of course, vague. But we can make the notion of iconicity more precise, in terms of how a reasoner grasps the structural relations involved. Let us say that $\Pi$ is a homomorphic or isomorphic relation between diagram A and target D if for any relevant relation R between elements $d_1$, $d_2$ of A there is a relevant relation S between the elements $s_1$, $s_2$ of D to which they are assigned by $\Pi$; and the converse relation is also true. That is,

$d_1 \, R \, d_2$ if and only if $s_1 \, S \, s_2$. (Eq. 1)

If this is the case, and also:

it is possible to tell whether $d_1 \, R \, d_2$ just by observing A, (Eq. 2)
then A is an icon of D. 26 In using the phrase "just by observing", I mean that the observer can grasp the information presented by the representation without a conscious process of inference. 27 We can then distinguish between two types of iconicity. For in one case, there may be a similarity of visual appearance between diagram and target: call this VA-iconicity. In another case, there may be no visual resemblance, but the reasoner may nevertheless grasp and reason about the relevant structural correspondence: call this S-iconicity. The distinction is quite intuitive: a geometrical diagram may bear a visual resemblance to a structurally homomorphic figure in visual imagination or visual memory, and so can be VA-iconic of it. A map typically bears no similarity of visual appearance to its terrain, and so can be S-iconic of it.

25 The underlying distinction here in fact seems to originate with Kant. Specifically, Kant distinguishes in the first Critique (Kant, 1998; A717/B745) between "ostensive" and "symbolic" construction in terms that clearly anticipate the iconic/symbolic distinction in Peirce. Peirce's achievement is to embed this distinction into a tripartite analysis of representational character as icon, index and symbol, and thereby into his broader "semeiotic". The latter theory of signs is complex and extremely subtle, and the icon-index-symbol distinction is a small though well-known part of it; as a starting point, the interested reader should consult Peirce (1992) or Peirce (1977). Liszka (1996) is a useful overview.
26 Similar remarks will obviously apply to properties of elements, but we can ignore this detail here.
27 An important way in which this may occur is by "seeing as" or "multiple readability". Of this, Stenning remarks that "We should be wary of assuming that multiple readability is a feature of diagrammatic representation." I agree: and the analysis given here does not assume this, but provides an argument why it might be so.

Are EG and EC VA- or S-iconic to their targets? It seems hard to deny that EGs bear no visual resemblance to any logical relations that they represent, since logical relations do not have a visual appearance. But we can adapt a suggestion of Stenning's here, for it seems that the graphs can be VA-iconic of the sentences of linear EG: a reasoner can visually isolate the sentence of linear EG within a given graph, as in fig. 2. It is in this sense, then, that EG can be considered a hybrid system, for it is both S- and VA-iconic to different targets. What about EC? Intuitions may differ here. On the one hand, a parallel argument suggests EC is also S-iconic, since set-theoretic relations do not have a visual appearance as such; on the other hand, it might be suggested that the circles could be understood as physically containing or excluding sets of points, and that EC is VA-iconic of these sets of points so understood. Though it faces various difficulties, I do not think that this view, which is quite similar to one advanced in Maddy (1990), can be dismissed. EG and EC are, then, iconic of logical or set-theoretic relations. But they are also symbolic, in that the graphs and circles consist of marks whose meaning is given by the various conventions that constitute the syntax and semantics of the relevant systems. This enables us to offer a diagnosis of where the direct/indirect distinction went wrong, for the distinction we need - at least in these cases - is not one that appeals to the presence or absence of knowledge of syntax, but to the nature of the processing which symbols in the given syntax, once they are understood and given the structural relations they bear to their objects, elicit from the reasoner.
And on this analysis, we can reformulate Stenning's intuition as to the hybrid nature of EG, not in terms of direct and indirect interpretation, but in terms of the different types of iconicity the graphs bear, to logical relations and to sentences of linear EG. At the outset of this discussion, I mentioned the emphasis placed by Stenning's more general theory on the importance of the notion of cognitive accessibility to a satisfactory overall account of diagrammatic vs. sentential representation. Someone with these motivations has reason to accept the iconic/symbolic analysis given above, since it captures a sense in which the accessibility of a given representation affects our intuitive judgement as to whether it is diagrammatic or sentential. In particular, though I do not offer this analysis as a discrimination theory of the diagrammatic/sentential distinction, this account can avoid the problem raised in section 4 above; that is, in construing the graphs as diagrammatic, it is not thereby compelled to claim that sentences of linear EG are also diagrammatic. For it will not generally be true that sentences of linear EG are iconic of the graphs, in
either sense - for example, some process of pairing-off or counting the parentheses will normally be required - and this is so precisely because they do not seem to possess the same cognitive accessibility as their graphical counterparts. How that accessibility is itself to be understood is a further question.
REFERENCES

Barwise, J., Etchemendy, J., 1990. Visual information and valid reasoning. In: Zimmerman, W. (Ed.), Visualization in Mathematics. MAA Press, Washington, DC, pp. 9-24.
Euler, L., 1795. Letters to a German Princess. Thoemmes Press, Bristol.
Gurr, C., Lee, J., Stenning, K., 1998. Theories of diagrammatic reasoning: distinguishing component problems. Minds Mach. 8, 4, 533-557.
Hammer, E., 1995. Logic and Visual Information. CSLI Publications, Stanford, CA.
Kant, I., 1998. Critique of Pure Reason. Cambridge University Press, Cambridge.
Ketner, K., 1996. Elements of Logic. Arisbe Associates, Lubbock, TX.
Liszka, J., 1996. A General Introduction to the Semeiotic of C.S. Peirce. Indiana UP, Bloomington, IN.
Maddy, P., 1990. Realism in Mathematics. OUP, Oxford.
Norman, A.J., 1999. Diagrammatic reasoning and propositional logic. Dissertation, University College, London.
Norman, A.J., 2000. Differentiating diagrams: a new approach. In: Anderson, M., Cheng, P., Haarslev, V. (Eds.), Theory and Application of Diagrams. Springer, Berlin.
Peirce, C.S., 1977. In: Hardwick, C. (Ed.), Semiotic and Significs. Indiana UP, Bloomington, IN.
Peirce, C.S., 1992. In: Ketner, K.L. (Ed.), Reasoning and the Logic of Things. Harvard UP, Cambridge, MA.
Roberts, D., 1973. The Existential Graphs of Charles S. Peirce. Mouton, The Hague.
Shimojima, A., 1996. On the efficacy of representation. Dissertation, Indiana University.
Shimojima, A., 2001. The graphic-linguistic distinction. In: Blackwell, A. (Ed.), Thinking with Diagrams. Kluwer, Dordrecht.
Shin, S.-J., 1998. Multiple readings of Peirce's alpha system. Conference paper, Thinking with Diagrams.
Stenning, K., 2000. Distinctions with differences: comparing criteria for distinguishing diagrammatic from sentential systems. In: Anderson, M., Cheng, P., Haarslev, V. (Eds.), Theory and Application of Diagrams, LNM 1889. Springer, Berlin.
Stenning, K., Lemon, O., 2001. Aligning logical and psychological perspectives of diagrammatic reasoning. In: Blackwell, A. (Ed.), Thinking with Diagrams. Kluwer, Dordrecht.
Stenning, K., Inder, R., Nielsen, I., 1995. Applying semantic concepts to the media assignment problem in multimedia communication. In: Chandrasekaran, B., Glasgow, J. (Eds.), Diagrammatic Reasoning. MIT Press, Cambridge, MA, pp. 303-338.
Zeman, J.J., 1964. The graphical logic of C.S. Peirce. PhD thesis, University of Chicago.
Studies in Multidisciplinarity, Volume 2
Editor: G. Malcolm
© 2004 Elsevier B.V. All rights reserved.
9
Diagrammatic logic and game-playing

Ahti-Veikko Pietarinen
Department of Philosophy, University of Helsinki, P.O. Box 9, FIN-00014 Helsinki, Finland
In this paper, diagrammatic systems for logical concept modelling are investigated. These systems include Charles S. Peirce's existential graphs, conceptual graphs, extensive semantic games, and discourse representation theory (DRT). It is argued that the fundamental difference between these systems is that unlike in the other graphical methods of logic, in the theory of extensive semantic games one is able to model concepts that call for some strategic, i.e. game-theoretic deliberations. They are needed among other things in properly understanding the linguistic concept of anaphora and its interplay with negation. Furthermore, it is shown how Peirce's existential graphs transform into extensive semantic games. This has important consequences to the game-theoretic visualisation of logic and semantics and enables one to represent uncertainty in existential graphs.
1 The work on this paper has been supported by the Ella and Georg Ehrnrooth Foundation and the Academy of Finland (Project: Game-Theoretical Semantics and its Applications). I would like to thank the participants and organisers of VRI'02 for continuing interest in diagrammatic systems of logic, Tapio Janasik for comments on anaphora and The Helsinki Metaphysical Club for supporting Peirce studies.

1. INTRODUCTION

The diagrammatic approach to the representation of logical concepts began seriously around 1900, when Charles S. Peirce invented the influential theory of existential graphs. Peirce wanted his theory to provide a graphic notation and foundation for practically all conceptual representation and
reasoning imaginable. Peirce himself called the logic of existential graphs "the greatest illumination of logic that ever has been made yet" (MS L 387), and "the luckiest find that has been gained in exact logic since Boole" (MS 280:22). 2 The theory consists of three parts: the Alpha part of the existential graphs corresponds to classical sentential logic, the Beta part to classical predicate logic with identity, and the Gamma part, although left somewhat incomplete, to fragments of modal logic, higher-order logic, and reasoning about graphs themselves. 3 Since the proliferation of graph-theoretic and diagrammatic methods, largely due to the expansion of computer science and the related disciplines of computational linguistics, cognitive science and artificial intelligence, the time to understand the insights of existential graphs has finally come. Sowa (1984) showed that conceptual graphs can be mapped to classical predicate calculus or order-sorted logic, and hence can be taken to form a useful and efficient basis for a graphical notation for logic. Meanwhile, Kamp (1981) developed the DRT for the purpose of linguistic representation of natural language expressions and discourse. Its discourse representation structures (DRSs) are diagrammatic images resulting from the interpretation of linguistic utterances, aimed at providing a precise medium for the information possessed by the speakers of language. Besides applications to information theory and artificial intelligence, a mathematical investigation of Peirce's existential graphs is under way (Brady and Trimble, 2000). Other widely used pictorial methods for computational purposes include entity-relationship diagrams, flowcharts, Petri nets, finite-state machines, and semantic nets. Parallel distributed programming and neural networks in general are also diagrammatic in nature. 
In semantic nets, for example, the task of knowledge representation has been to keep the theory close to actual expressions of natural language. In a closer contact with logic, Kripke frames and modal models are labelled graphs in which formulas of modal logic are interpreted. The processes of logical reasoning have in turn been investigated in projects such as Hyperproof 4 and Tarski's World (Barwise and Etchemendy, 1992). To enable complicated calculations, Feynman diagrams and Penrose tilings have proved advantageous in quantum physics and related fields.

2 The references MS and MS L are to Peirce (1971) by manuscript and paragraph number.
3 According to Peirce (CP 4.511): "The gamma part is still in its infancy. It will be many years before my successors will be able to bring it to the perfection to which the alpha and beta parts have been brought. For logical investigation is very slow, involving as it does the taking up of a confused mass of ordinary ideas, embracing we know not what and going through with a great quantity of analyses and generalizations and experiments before one can so much as get a new branch fairly inaugurated". (The references CP are to Peirce, 1931-1966 by volume and paragraph number.) Peirce also envisaged the Delta part, which he thought one still has to "add...in order to deal with modals" (MS 500:3). Regrettably, no document is known to disclose what kind of modal system Peirce really had in mind here.
4 http://www-vil.cs.indiana.edu/Projects/hyperproof.html.

However, many of the previous approaches were preceded by yet another diagrammatic theory, the theory of semantic games. As will be shown below, game-theoretic semantics (GTS) of Hintikka (1973) is a theory that provides an alternative, yet powerful, tool from which formal or linguistic expressions derive their meaning. It is also closely related to Peirce's ideas on logical and semiotic thought. GTS accounts for the same underlying phenomena as DRT, but unlike its DRSs, games additionally draw out the total histories of the evaluation process correlated with linguistic or logical concepts. This will be possible as soon as we take the game-theoretic character of the theory seriously and think of the games as extensive diagrams of what the players do in the sense of the mathematical theory of extensive games. A particular area of importance is the representation and resolution of anaphoric concepts in natural language, which is relatively well studied in DRT, but which nonetheless wavers once one moves from simple sentences to some more complex ones, or when sentences involve negative constructions. The relations between games and conceptual graphs have not yet been investigated in the literature to a sufficient depth. Yet they are both foundationally appealing and desirable for applications because they allow one to exploit powerful pattern-matching abilities to a larger extent than does the classical linear or compositional logical notation. Games and conceptual graphs are both attempts to build a unified language-modelling tool. However, the main difference that needs to be acknowledged is that conceptual graphs aim at incorporating reasoning methods into the theory. On the other side, game-theoretic methods as conceived here are primarily semantic or, more broadly speaking, semiotic in nature.
If proof-theoretic concepts are to be employed, we can still use games to that effect, but we would need to change the class of games from semantic to dialogical or proof-theoretic ones (Lorenzen and Lorenz, 1978; Rahman and Rückert, 2001). Such classes are widely researched in computer science under the headings of game semantics and linear logic (Abramsky and Jagadeesan, 1994), but they will not fall within the scope of this paper. These paradigms often resort to the resources of category theory, itself diagrammatic in nature. Indeed, categorical concepts lurk behind the mathematics of Peirce's existential graphs (Brady and Trimble, 2000). In this paper we confine ourselves to three interrelated issues. We (i) investigate and give a brief comparison of some of the key ideas behind diagrammatic theories for logical systems, (ii) stress the importance of strategic dimensions in such systems, and (iii) outline some further
directions that the research on diagrammatic logic can take, especially within the purview of the effective interplay between existential graphs and semantic games.
2. SYSTEMS OF DIAGRAMMATIC REPRESENTATION
2.1. Existential graphs

Peirce's theory of existential graphs consists of three parts, the Alpha part (propositional logic), the Beta part (predicate logic with identity), and the Gamma part (modalities, collections, abstraction). The starting point of any of these is the surface of a plane, the sheet of assertion, on which various kinds of diagrammatic information are displayed. For instance, the characteristic features of Beta graphs are:

• Conjunction as juxtaposition of other graphs or predicate terms on the sheet, in any order.
• Existentially quantified variables and identity as represented by lines of identity attached to predicate terms. Multiple occurrences of the same variable are represented by extending the lines of identity so as to connect different predicates.
• Negations (cuts) as closed lines around graphs.

The interpretation of these elements is in Peirce's terminology endoporeutic, that is, it proceeds outside in, starting from the outmost occurrence of any mark on the sheet and ending when a terminating (atomic) graph is reached. For example, in predicate logic the outmost existential quantifier in a formula would be denoted by a line of identity on the outmost zone of the graph. Implication can according to these definitions be symbolised by two nested circles (the scroll), the outer one denoting the antecedent and the inner one denoting the consequent. As an example, consider
$\exists x\,(S_1 x \rightarrow S_2 x)$. (Eq. 1)
This is diagrammatised to the existential graph of fig. 1. 5
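The outside-in (endoporeutic) reading of cuts and juxtaposition can be sketched for the propositional Alpha part alone. The encoding below is an assumption made for illustration: an area is a list whose items are either propositional letters or cuts, and lines of identity (the Beta part) are not modelled. The names `cut`, `holds` and `to_formula` are hypothetical.

```python
def cut(*items):
    """A cut (closed line) around the given items; read as negation."""
    return ("cut", list(items))

def holds(area, valuation):
    """Evaluate an area outside-in: juxtaposition is conjunction, a cut negates."""
    for item in area:
        if isinstance(item, str):
            if not valuation[item]:
                return False
        else:
            _, inner = item
            if holds(inner, valuation):
                return False
    return True

def to_formula(area):
    """Linear reading of an area, produced from the outermost marks inwards."""
    parts = []
    for item in area:
        if isinstance(item, str):
            parts.append(item)
        else:
            parts.append("¬(" + to_formula(item[1]) + ")")
    return " ∧ ".join(parts) if parts else "⊤"

# The scroll: two nested cuts, antecedent in the outer area, consequent inside.
scroll = [cut("S1", cut("S2"))]
```

On this encoding the scroll reads as a negated conjunction of the antecedent with the negated consequent, which agrees with material implication on every valuation of the two letters.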
Fig. 1. Existential Beta graph for Equation 1.

5 Together with Oskar H. Mitchell, Peirce had already developed an algebraically motivated predicate logic of quantifiers, with scope conventions and all, at roughly the same time as Gottlob Frege. Their language and notation were deployed alongside those of existential graphs (Peirce, 1983).

2.2. Conceptual graphs

Since Sowa (1984), conceptual graphs have presented themselves as an increasingly important diagrammatic method and a tool in knowledge representation and reasoning in artificial intelligence. We will ignore the reasoning part here, which is a proof method based on graph homomorphisms, and focus on the representational side. The idea is that a graph relation can show the connections between natural language expressions directly without using variables and variable renaming or typing. The characteristic features of conceptual graphs are:

• Concepts, denoted by boxes drawn on a sheet. A concept consists of a type, which is a label of the concept, and a referent, which is a name or a quantifier. Either classical or generalised (plural) quantifiers may be allowed.
• Relations between concepts, denoted by circles.
• Coreference, denoted by a dotted line drawn between concepts.

Various labels can be further specified and analysed, for example according to the genus or differentia of the entity they denote. A label is very much like a restrictor limiting the domain of applicability of a quantifier. A simple conceptual graph representing the sentence 2 is given in fig. 2.

Every man sees a dragon. (Eq. 2)
Fig. 2. Conceptual graph for sentence 2.

We can then add nesting of conceptual graphs and get different kinds of information, especially as regards the context within which any inner layer of a graph resides. It is helpful to take context to mean negation, in which case we get an equivalent visualisation of negative information to that of existential graphs, namely the different areas of enclosures of graphs would depict the scope of negation in the sense of a symbolic formula. For example, the graph in fig. 3 corresponds to the classical first-order formula

$\neg(\exists x\,(S_1 x \wedge S_2 x \wedge \neg(S_1 x \wedge S_3 x)))$, (Eq. 3)

which is equivalent to

$\forall x\,((S_1 x \wedge S_2 x) \rightarrow S_3 x)$. (Eq. 4)
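The equivalence of Equations 3 and 4 can be illustrated by brute force over a small domain. This is a finite sanity check under stated assumptions, not a proof: the two-element domain and the encoding of each predicate as a tuple of truth values indexed by domain element are choices made here for illustration.

```python
from itertools import product

DOMAIN = (0, 1)  # hypothetical two-element domain

def eq3(S1, S2, S3):
    """Negated existential form: no x with S1x, S2x and not (S1x and S3x)."""
    return not any(S1[x] and S2[x] and not (S1[x] and S3[x]) for x in DOMAIN)

def eq4(S1, S2, S3):
    """Universal conditional form: every x with S1x and S2x also has S3x."""
    return all((not (S1[x] and S2[x])) or S3[x] for x in DOMAIN)

def equivalent_on_domain():
    """Compare both forms on every interpretation of S1, S2, S3 over DOMAIN."""
    extensions = list(product([False, True], repeat=len(DOMAIN)))
    return all(eq3(a, b, c) == eq4(a, b, c)
               for a in extensions for b in extensions for c in extensions)
```

The check covers all 64 interpretations of the three predicates on the chosen domain; enlarging `DOMAIN` extends the check at exponential cost.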
Fig. 3. Nested conceptual graph for Equation 3.

In Hendrix (1979), it is shown how graph structures that are nonpartitional can also be used in visual and diagrammatic expression of natural language items. It is interesting to note that this kind of nonpartitionality is no longer needed on the object level of graphs when they are subsumed under a dynamic game-theoretical interpretation that involves imperfect information (see section 5).
2.3. Discourse representation theory
The development of the theory of discourse representation (DRT; Kamp, 1981; Kamp and Reyle, 1993) is mainly motivated by the problem of interpreting pronouns in a sequence of natural language sentences. Its idea is to portray natural language discourse in two steps. First, it constructs a DRS involving a set of discourse referents, and second, it gives a set of conditions (predicates, etc.) on these discourse referents introduced by the natural language expression in question. The result is a graphical notation of a box, consisting of an upper part of the list of discourse referents and a lower part of the conditions imposed on them. Figure 4 illustrates a simple DRS for the sentence 5.

A man sees a dragon. (Eq. 5)

x, y
----------
Man(x)
Dragon(y)
Sees(x, y)

Fig. 4. DRS for sentence 5.

Like conceptual graphs, a DRS can be nested, which is vital in trying to capture the meaning of anaphoric pronouns. By the rule of transitivity, the conditions of a box are inherited by those boxes that occur deeper inside it.
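The two-part structure of a DRS, a list of discourse referents together with conditions on them, can be written down as a small data structure. The sketch below assumes the usual truth condition for a simple, non-nested DRS, namely that some assignment of individuals to the referents satisfies every condition. The model, the individual names and the function name are hypothetical.

```python
from itertools import product

# DRS for "A man sees a dragon": referents above, conditions below.
drs = {
    "referents": ["x", "y"],
    "conditions": [("Man", "x"), ("Dragon", "y"), ("Sees", "x", "y")],
}

# A hypothetical model: a domain plus an extension for each predicate.
model = {
    "domain": ["adam", "smaug"],
    "Man": {("adam",)},
    "Dragon": {("smaug",)},
    "Sees": {("adam", "smaug")},
}

def verifying_embeddings(drs, model):
    """All assignments of individuals to referents that satisfy every condition."""
    found = []
    for values in product(model["domain"], repeat=len(drs["referents"])):
        g = dict(zip(drs["referents"], values))
        if all(tuple(g[v] for v in args) in model[pred]
               for pred, *args in drs["conditions"]):
            found.append(g)
    return found
```

On this reading the DRS counts as true in the model when `verifying_embeddings(drs, model)` is non-empty. Nested boxes, which matter for anaphora, are outside the scope of this sketch.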
2.4. Game-theoretic semantics
GTS is a theory of logical and linguistic semantics (Hintikka, 1973). In GTS, formulas of a given language are evaluated by means of noncooperative zero-sum semantic games played by two players (say P1 and P2) in accordance with the game rules. In essence, disjunction and the existential quantifier prompt a move by the player who is playing the role of the verifier (MYSELF), and conjunction and the universal quantifier prompt a move by the player who is playing the role of the falsifier (NATURE). For quantified variables, this move is a choice of an individual from the domain of the structure. For connectives, the move can likewise be a choice of an element, but this time restricted to a domain of two elements. Negation is game-theoretic, that is, it calls for an exchange of the roles of the players, and the winning conventions change accordingly. These conventions say that if the atomic sentence that has been reached is true then MYSELF wins the play of the game. If the atomic sentence is false then NATURE wins.
In playing a game the players proceed by way of strategies, that is, by methods that tell them how to move, taking the moves of the opponent into account. A strategy is a winning one if the player wins every play using it, no matter how the adversary moves. The truth of a sentence is then defined as the existence of a winning strategy for the player who started the game off as MYSELF, that is, as the verifier of that sentence. Dually, the falsity of a sentence is defined as the existence of a winning strategy for the player who started the game off as NATURE, that is, as the falsifier of that sentence. For example, in a domain of two elements, formula 6 gives rise to the extensive game of fig. 5.
$\forall x\, \exists y\, Sxy$. (Eq. 6)
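The truth definition just given can be sketched in a few lines of Python. This is only an illustrative sketch, not the chapter's own formalism: the tuple encoding of formulas, the function names and the interpretation of S are assumptions made for the example. It follows the standard GTS convention that the verifier moves at disjunctions and existential quantifiers, the falsifier moves at conjunctions and universal quantifiers, and negation exchanges the roles.

```python
# Formulas are nested tuples (a hypothetical encoding chosen for this sketch):
#   ("atom", pred, (var, ...)), ("not", f), ("and", f, g), ("or", f, g),
#   ("exists", var, f), ("forall", var, f)

def other(role):
    return "NATURE" if role == "MYSELF" else "MYSELF"

def can_win(formula, model, domain, g, role):
    """True if the player currently holding `role` has a winning strategy."""
    op = formula[0]
    if op == "atom":
        _, pred, args = formula
        is_true = tuple(g[v] for v in args) in model[pred]
        # MYSELF wins on a true atom, NATURE on a false one.
        return is_true if role == "MYSELF" else not is_true
    if op == "not":
        # Negation exchanges the roles for the rest of the play.
        return can_win(formula[1], model, domain, g, other(role))
    if op in ("or", "and"):
        mover = "MYSELF" if op == "or" else "NATURE"
        outcomes = [can_win(sub, model, domain, g, role) for sub in formula[1:]]
    elif op in ("exists", "forall"):
        mover = "MYSELF" if op == "exists" else "NATURE"
        _, var, body = formula
        outcomes = [can_win(body, model, domain, {**g, var: d}, role)
                    for d in domain]
    else:
        raise ValueError(f"unknown operator: {op}")
    # The mover needs one good choice; the other player must survive them all.
    return any(outcomes) if mover == role else all(outcomes)

domain = ["a", "b"]
model = {"S": {("a", "a"), ("b", "b")}}  # assumed interpretation of S
eq6 = ("forall", "x", ("exists", "y", ("atom", "S", ("x", "y"))))
print(can_win(eq6, model, domain, {}, "MYSELF"))
print(can_win(eq6, model, domain, {}, "NATURE"))
```

Under the assumed interpretation of S, the initial verifier has a winning strategy for formula 6 and the initial falsifier does not, so the sentence comes out true in the sense defined above.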
The details of the theory of extensive semantic games are presented in Sandu and Pietarinen (2003) and Pietarinen (2002b). In brief, these forms are tree structures depicting the choices made by the players as labels on the edges departing from non-terminal histories (decision nodes). The root is labelled with the whole formula, and the emanating non-terminal histories with its proper subformulas. Terminal histories are placeholders for atomic formulas. Terminal histories are adjoined with payoffs that denote the outcomes of the interpretation of atomic formulas. In addition, the non-terminal histories are labelled with the role of the player who is to move at that history. The theory of extensive games is an important explication useful for many applications of GTS.
[Fig. 5. Extensive semantic game for Equation 6 (with payoffs). Root: P2: ∀x∃yPxy, with edges a and b leading to P1: ∃yPay and P1: ∃yPby. Terminal histories: Paa, Pab, Pba, Pbb. Payoffs: (1,-1), (-1,1), (-1,1), (1,-1).]
As far as natural language semantics is concerned, the basic mechanism of the GTS treatment of anaphora can be illustrated by the analysis of a simple conditional $S_1 \rightarrow S_2$. The game $G(S_1)$ on the antecedent is played first with the players' roles reversed. If $S_1$ turns out true, the players move on to play the game $G(S_2)$ on the consequent. The strategy used in $G(S_1)$ by player P1 for verifying $S_1$ is then available for, or remembered by, player P2 in $G(S_2)$, who in turn sets out to verify $S_2$. Falsification strategies are not taken to carry over in this manner. To improve on previous expositions of game-theoretical anaphora found in the literature, we can now capture the notion of P2 remembering the verification strategies used by P1 in $G(S_1)$ in the extensive-form representation of a semantic game. For basic sentences such as sentence 7, this is done in terms of the information retrieved from the histories of the game: subgames and operations on them are defined so that the remembering of a strategy amounts to the inheritance of assignments from the top node downwards. More complex sentences, which involve remembering the strategy functions themselves in addition to the assignments, can also be treated in an extension of the framework that enables one to represent and reason about strategies within the extensive game system.
To attain a proper treatment of simple anaphora, operations on subgames are defined so that a consequent subgame with terminal histories is augmented with an antecedent subgame. The consequent is then played with the assignment inherited from the antecedent. The information about anaphoric relations is thus captured in terms of the histories of the game. Given an input assignment at the start of the game, what the play in effect produces is a sequence of assignments that captures the anaphoric information. As far as the mechanism of anaphora is concerned, this method sets GTS on a par with theories of dynamic semantics (Muskens et al., 1997). Furthermore, no separate notion of a choice set as used in the previous expositions is needed as all actions can be recovered from the game history. We will return to these issues below.
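The inheritance of assignments can be illustrated with a small Python sketch. It is a deliberately simplified reading of the mechanism, not the chapter's definition of subgame operations: the function names, the toy model and the individuals "al" and "smaug" are all hypothetical. The antecedent "a man sees a dragon" is played first, each play won by the verifier yields an output assignment, and the pronoun in the consequent is resolved against that inherited assignment.

```python
from itertools import product

def holds(condition, model, g):
    pred, args = condition
    return tuple(g[v] for v in args) in model[pred]

def verifying_plays(variables, conditions, model, domain, g):
    """Output assignments of the antecedent plays that the verifier wins."""
    plays = []
    for values in product(domain, repeat=len(variables)):
        h = {**g, **dict(zip(variables, values))}
        if all(holds(c, model, h) for c in conditions):
            plays.append(h)
    return plays

def resolve_pronoun(pred, model, inherited):
    """Variables in the inherited assignment whose values satisfy `pred`."""
    return [v for v, val in inherited.items() if (val,) in model[pred]]

# Toy model (hypothetical): one man, one dragon.
domain = ["al", "smaug"]
model = {
    "man": {("al",)},
    "dragon": {("smaug",)},
    "sees": {("al", "smaug")},
    "escapes": {("al",)},
}
antecedent = [("man", ("x",)), ("dragon", ("y",)), ("sees", ("x", "y"))]

# The consequent "he escapes" is played with the inherited assignment.
for h in verifying_plays(["x", "y"], antecedent, model, domain, {}):
    print(h, resolve_pronoun("escapes", model, h))
```

With an empty input assignment there is nothing to inherit, so `resolve_pronoun` returns an empty list; this is the situation of a pronoun with no accessible antecedent.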
3.
SOME COMPARISONS
3.1.
Pronominal anaphora
Because anaphora is coreferential in nature, natural language anaphora provides an illustrative example of the usefulness of diagrammatic representation systems. Consider:
A man sees a dragon. He escapes. (Eq. 7)
We take these two sentences to form an implicative relation from left to right. Diagrammatic representations of this mini-discourse are given in figs. 6-9 for existential graphs, conceptual graphs, the discourse structures of DRT and the extensive games of GTS, respectively.
3.2.
Negation in diagrammatic representations
A particularly instructive concept in graphical modelling is negation. In existential graphs, negation is a cut (a separation line) that severs the enclosed subgraph from the rest of the graph. Alternatively, this incision can be seen as giving rise to a role reversal between those who interpret the graph, which comes out nicely in the semantic game framework. The general idea nonetheless has interesting applications. Because of the idea of a separation, it can be seen why in some cases anaphoric coreference is not possible. Consider a discourse that is ruled out (the illicit part is marked by a star):
It is not the case that a man sees a dragon. * He escapes. (Eq. 8)
[Fig. 6. Existential Beta graph for sentence 7. Graph labels: Man, Sees, Dragon, Escapes.]
[Fig. 7. Conceptual graph for sentence 7. Recoverable graph labels: Neg, Man:*, Dragon:*.]
[Fig. 8. DRS for sentence 7. One box with discourse referents x, y and conditions Man(x), Dragon(y), Sees(x, y); one box with discourse referent z and conditions z = x, Escapes(z).]
(We naturally assume that a man never escapes unless he sees a dragon.) The diagrammatic representations for sentence 8 are given in figs. 10-13. In the diagrams of figs. 10-12, the nesting rules for the cut-like negative operation are seen to be violated, which explains the impossibility of coreference. In the game graph of fig. 13, it is the role reversal between the two interlocutors that is the key explanatory concept.
3.3.
Variables
Since conceptual graphs do not use variables, it is reasonable to take them to be closer to Peirce's original theory of existential graphs than to other diagrammatic logics such as DRT. But what is the role of GTS here? In addition to the kind of semantic games that can be defined for a number of formal languages, an alternative non-representational approach is to associate games directly with natural language expressions, including lexical ones. Hence no variables would be needed in GTS, either (cf. Hintikka and Kulas, 1983). It is of course possible to use a formalised medium into which one first maps the expressions, but that would be an optional extra.
[Fig. 9. Extensive semantic game for sentence 7 (without payoffs). Node labels, from the root downwards: "a man sees a dragon; he escapes, g"; "∃x; man(x); sd(x); he escapes, g"; "P1: man(x); sd(x); he escapes, g[x/a]"; "man(x), g[x/a]"; "P2: sd(x); he escapes, g[x/a]"; "sd(x), g[x/a]"; "P1: he escapes, g[x/a]"; "escapes(x), g[x/a]".]
[Fig. 10. Illicit existential graph for sentence 8, violating the endoporeutic nesting of cuts. Graph labels: Sees, Dragon, Escapes.]
3.4.
Modalities
One item not sufficiently well understood yet is how to incorporate modalities into diagrammatic systems of logic. This is particularly problematic in the predicate modal extensions, which are few and far between. However, diagrammatic methods promise a good deal of fresh insight into old problems of modal predicate logic, including those of cross-world quantification and identification, the de dicto vs. de re distinction, and modal anaphora across attitude contexts. An example of the latter is the notion of intentional identity, a special anaphoric coreference in the context of two or more non-iterated modalities. (For formal treatment, see Pietarinen, 2001.) So in diagrammatic theories of modalities, we can for instance tackle the notion of cross-world identity in novel ways, dispensing with the somewhat dubious existence assumptions in the actual world.
It is interesting to note that Peirce was developing a modal system of graphs in the Gamma part of the existential graphs (and also in what he termed "tinctured" existential graphs, later abandoned as nonsensical). He was trying to represent the concepts of possibility and necessity using a marked relation between "states of information" of different graphs (CP 4.517, MS 467). This of course comes very close to the modern model-theoretic approach to modalities as accessible states or possible worlds, but apart from an isolated description in his writings, Peirce did not go on to exploit this idea further. The construction of Gamma graphs was not comprehensive in the direction of the treatment of modalities, and one can among other things also interpret them as encompassing higher-order (type-theoretic) logic. Some ideas relating quantified modal logics and Gamma graphs are explored in Øhrstrøm (1997). One can nevertheless go further, since a closer look at the proposed connections suggests that novel parallels can be drawn, on the one hand, between Peirce's notion of unbroken cuts and the so-called open domain assumption in modal predicate logics, and on the other hand, between broken cuts and the common domain assumption. As to the former, various additional identification modes to enable cross-world comparisons have to be evoked, while as to the latter, the notion of identification loses its importance.
[Fig. 11. Illicit conceptual graph for sentence 8. Recoverable graph labels: Man:*, Neg, Dragon.]
[Fig. 12. Illicit DRS for sentence 8. One box with discourse referents x, y and conditions Man(x), Neg, Dragon(y), Sees(x, y); one box with discourse referent z and conditions z = x, Escapes(z).]
[Fig. 13. Illicit extensive semantic game for sentence 8, with P2 now choosing for x. Node labels, from the root downwards: "not: a man sees a dragon; he escapes, g"; "¬∃x; man(x); sd(x); he escapes, g"; "P2: man(x); sd(x); he escapes, g[x/a]"; "man(x), g[x/a]"; "P2: sd(x); he escapes, g[x/a]"; "sd(x), g[x/a]"; "P1: he escapes, g[x/a]"; "escapes(z), g[x/a]".]
4.
FROM EXISTENTIAL GRAPHS TO EXTENSIVE GAMES
In Burch (1994), the Alpha part of Peirce's theory of existential graphs and GTS were brought into relation by mapping the conventions for the Alpha fragment (CP 4.394-402) to the game-theoretic rules of action. This is in accordance with Peirce's intentions. He recognised in so many words the importance of dialogue-like interactive settings in the foundations of his semiotic approach to logic (Hilpinen, 1982; Pietarinen, 2002a). In brief, the existential graphs are constructed by the Grapheus, who is the malin genie determining the truth of the irreducible, terminal graphs. The Grapheus is willing to play off against the Graphist, who scribes the molecular graph on the sheet of assertion and begins to examine it in an interactive process with the Grapheus.⁶
The mapping from Alpha graphs to semantic games is straightforward. To recap, the basic logical components of Alpha graphs are the cuts (negation), juxtaposition (conjunction), and the verum (logically true proposition). All these are scribed on the sheet of assertion. Any two graphs scribed on the sheet represent commutative conjunction. A continuous circle around the graph represents negation. An empty graph is the verum and a cut around an empty graph is the falsum (logically false proposition). The Grapheus' universe determines the truth values as well as the falsity values of atomic graphs. In semantic games, the Grapheus and the Graphist are mapped to their roles of NATURE and MYSELF, respectively. The mapping is total, that is, at each non-terminal history of the game a player has one of these roles and the adversary has the other. As noted, the rule of interpretation is endoporeutic, starting from an outermost cut or a graph outside a cut, and proceeding toward an atomic or a blank graph. At each history where a decision is to be made, an erasure is performed, that is, a cut is removed or a player throws away those graphs that were not designated.
When encountering the cut, the roles of the players will change, and the winning conditions will also change throughout the examination. Since the graph is finite, an atomic graph is eventually reached. The winning conditions are given so that when an atomic graph is reached, the player playing the verifying role (i.e. MYSELF) wins if that graph is true or is a blank one, and the player playing the falsifying role (NATURE) wins if that graph is false or a blank one encircled by a cut. The molecular graph itself is true precisely in the case where the player who made the first move as MYSELF is able to win no matter how her adversary moves. Symmetrically, the graph is false precisely in the case where the player who made the first move as NATURE is able to win no matter how his adversary moves. In the terminology of semantic games, we say that in these cases there exists a winning strategy for a player. In Pietarinen (2002a), it is argued that although Peirce did not come to use this game-theoretic terminology of strategies, his widespread notion of a habit can in restricted contexts be viewed as one from which the notion of a strategy can be derived.⁷
⁶ In fact, Peirce describes this interaction as "collaborative" (CP 4.552), which is extremely interesting because in the customary theory of semantic games, the players are taken to draw their actions in a strictly competitive fashion. Yet if there is collaboration, it is not inconceivable that there be some "division of surplus" of the truth values of atomic propositions, which leaves the possibility of atomic contradictions. On the idea of non-strictly competitive semantic games, see Pietarinen, 2000.
In the case of the Beta part of the theory of existential graphs, not covered by Burch (1994), there will be lines of identities that correspond to existentially quantified variables and coreference. Accordingly, their meaning is that individuals have to be picked from a suitable domain by MYSELF and assigned to the lines. For this, we need a domain of individuals and their arrangement in a structure. The role of a model can be played by another (cut-free) graph with interpreted constant symbols, relations and predicates (Sowa, 2001). Similar winning conditions and truth definitions apply here as in the Alpha part, with the addition that the atomic graphs are interpreted by the Grapheus by also checking whether the sequence of individuals chosen along the endoporeutics is to be included in the extensions of the atomic predicates that were reached. If so, the current player who is MYSELF will win. If not, the player who is NATURE wins. Having described the game-theoretic interpretation of Peirce's existential graphs (we leave the Gamma part for future investigation), what in fact is the structure and nature of these games?
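The winning conditions for the Alpha part can be sketched as follows. The encoding of graphs as nested Python lists, the function names and the example valuation are assumptions made for illustration only; they are not Peirce's or Burch's notation. Juxtaposition is treated as conjunction (so the holder of the NATURE role selects which graph to keep), and a cut exchanges the roles.

```python
# An Alpha graph is a list of juxtaposed items (hypothetical encoding):
#   a string is an atomic graph, ("cut", [items]) is a cut around a subgraph,
#   and the empty list is the blank sheet (the verum).

def other(role):
    return "NATURE" if role == "MYSELF" else "MYSELF"

def can_win(graph, valuation, role="MYSELF"):
    """True if the player currently holding `role` can win on `graph`."""
    outcomes = [item_outcome(item, valuation, role) for item in graph]
    # Juxtaposition is conjunction: NATURE designates the item to examine.
    # On the blank sheet, all([]) is True and any([]) is False, so MYSELF wins.
    return any(outcomes) if role == "NATURE" else all(outcomes)

def item_outcome(item, valuation, role):
    if isinstance(item, str):
        # Atomic graph: the Grapheus' universe (here `valuation`) decides.
        return valuation[item] if role == "MYSELF" else not valuation[item]
    _, inner = item
    # Passing through a cut exchanges the roles of the players.
    return can_win(inner, valuation, other(role))

# "If P then Q" scribed as a cut around P and a cut around Q.
implication = [("cut", ["P", ("cut", ["Q"])])]
print(can_win(implication, {"P": True, "Q": False}))
print(can_win(implication, {"P": False, "Q": False}))
```

The blank sheet and the cut around a blank behave as the verum and the falsum respectively: `can_win([], {})` holds for MYSELF, while `can_win([("cut", [])], {})` does not.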
⁷ Peirce was exploring many kinds of games in mathematical contexts such as chess and tic-tac-toe, trying to lay out their winning conditions. These investigations showed little connection with his logical and semiotic theories, however.
It turns out that there is a convenient way of representing these graph games in the game-theoretic format of extensive-form representations briefly described in the previous section. By doing so, another diagrammatic and iconic representation of logic emerges. For any existential graph can be turned into an extensive semantic game, adjoined by the payoff conditions judged by the Grapheus. The tree will in the case of Alpha graphs consist of binary choices between a subgraph and the rest of the graph, together with the labelling of the non-terminal histories by the roles of the players. The edges will be labelled by binary elements from the set {This, Anything_else}. Hence, the extensive form will be a tree in which each non-terminal history has two successors. The payoff function will assign the terminal histories the values in {1, -1}, transforming the extensive form into the game proper. In the case of the Beta graphs, the branching factor of the tree is the size of the domain for levels where the lines of identities are interpreted. Besides binary elements, the edges are labelled by the names of the individuals chosen from the domain. In the case of Gamma graphs, the modalities will have to be taken into account too, and the branching factor may additionally be the cardinality of the different states of information subsisting in the model.
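A minimal sketch of such an extensive form, written for first-order formulas rather than for graphs, is given below. The dictionary-based tree, the edge labels "left" and "right" for binary choices, and the interpretation of S are hypothetical choices for this example. Payoffs are recorded as pairs (P1, P2), following fig. 5, and the branching factor at quantifier levels equals the size of the domain.

```python
def build_tree(formula, model, domain, g=None, myself="P1", nature="P2"):
    """Extensive form of the semantic game, with payoff pairs (P1, P2)."""
    g = g or {}
    op = formula[0]
    if op == "atom":
        _, pred, args = formula
        is_true = tuple(g[v] for v in args) in model[pred]
        # P1 wins if the atom is true and P1 is MYSELF, or false and P1 is NATURE.
        p1_wins = is_true == (myself == "P1")
        return {"payoff": (1, -1) if p1_wins else (-1, 1)}
    if op == "not":
        # Negation exchanges the roles and adds no move of its own.
        return build_tree(formula[1], model, domain, g, nature, myself)
    if op in ("exists", "forall"):
        _, var, body = formula
        mover = myself if op == "exists" else nature
        edges = {d: build_tree(body, model, domain, {**g, var: d}, myself, nature)
                 for d in domain}
    elif op in ("or", "and"):
        mover = myself if op == "or" else nature
        edges = {label: build_tree(sub, model, domain, g, myself, nature)
                 for label, sub in zip(("left", "right"), formula[1:])}
    else:
        raise ValueError(f"unknown operator: {op}")
    return {"player": mover, "edges": edges}

def terminal_histories(tree, history=()):
    """Yield (history, payoff) for every terminal history of the tree."""
    if "payoff" in tree:
        yield history, tree["payoff"]
    else:
        for label, child in tree["edges"].items():
            yield from terminal_histories(child, history + (label,))

domain = ["a", "b"]
model = {"S": {("a", "a"), ("b", "b")}}  # assumed interpretation of S
tree = build_tree(("forall", "x", ("exists", "y", ("atom", "S", ("x", "y")))),
                  model, domain)
for history, payoff in terminal_histories(tree):
    print(history, payoff)
```

For a two-element domain this produces four terminal histories, one for each pair of choices, with P2 moving at the root and P1 at the second level.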
5.
INCORPORATING UNCERTAINTY INTO THE VISUALISATION OF LOGIC
The diagrammatisation of logical propositions by existential graphs is an efficient heterogeneous method for visualising what one's linear, symbolic formulas of logic are meant to express and how they are meant to be interpreted. In a similar vein, extensive games assemble formulas into a tree structure and they also show, endoporeutically, how information transmits from one logically active component to another. But as soon as diagrammatic graphs are associated with endoporeutic interpretation, a striking generalisation follows. For there are no pre-theoretical reasons to assume that the kind of information flow they exhibit has to be an uninterrupted one. As it happens, there are logics, even those that might be among our very elementary ones, where the semantic information flow is not perfect. The family of logics in question is known as the "independence-friendly" (IF) versions of traditional, syntactically linearly notated logics. In IF first-order logics, for instance, the quantifiers and connectives are slashed to denote of which other components they are supposed to be independent (Hintikka, 1996). For instance, if we do not want the choice for x in the first-order formula $\forall x\, \exists y\, Sxy$ to be visible at y, we rewrite it as $\forall x\, (\exists y / x)\, Sxy$. In general, we ought to assume that there are theoretically significant distinctions in symbolisations that are truth-conditionally (weakly) equivalent in their expressive capacity but which come to the fore when theories of diagrammatisation are brought into play. The notion of independence means that one associates such slashed formulas with semantic games of imperfect information. That is, the player who is making a move is not necessarily perfectly informed about what some of the previous moves of the game have been. But as soon as this much is admitted, the diagrammatic approach to logic suggests that one could go all the way through.
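The effect of the slash can be made concrete with a short Python sketch. It assumes a two-element domain and an interpretation of S as identity, both chosen only for illustration, and it models imperfect information by restricting the verifier to strategies that pick the same y whatever x was chosen. The function names are hypothetical.

```python
from itertools import product

def verifier_strategies(domain, sees_x):
    """Strategies for choosing y, as tables mapping each x to a y."""
    if sees_x:
        # Perfect information: y may depend on x in any way.
        for ys in product(domain, repeat=len(domain)):
            yield dict(zip(domain, ys))
    else:
        # Imperfect information: all choices of x lie in one information
        # set, so the same y must be chosen whatever x was.
        for y in domain:
            yield {x: y for x in domain}

def verifier_can_win(S, domain, sees_x):
    return any(all((x, f[x]) in S for x in domain)
               for f in verifier_strategies(domain, sees_x))

def falsifier_can_win(S, domain):
    # NATURE moves first; she wins if some x defeats every possible y.
    return any(all((x, y) not in S for y in domain) for x in domain)

domain = ["a", "b"]
S = {("a", "a"), ("b", "b")}  # assumed interpretation: identity

print(verifier_can_win(S, domain, sees_x=True))
print(verifier_can_win(S, domain, sees_x=False))
print(falsifier_can_win(S, domain))
```

Under these assumptions the verifier has a winning strategy for the ordinary formula but not for the slashed one, and the falsifier has no winning strategy either, so the slashed formula is neither true nor false in this model.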
Hintikka restricts the regulation of semantic flow to the universal-existential types of independence. But in their most general form, the formulas themselves are graphical visualisations of all kinds of dependence and independence relations between quantified variables and
connectives. Therefore, we can represent the formula $\varphi$ of traditional first-order logic L by a tuple (DG, $\varphi$), where DG is a directed graph and $\varphi$ is a formula that carries no presupposition about the a priori ordering of its logical constants. A relation between two nodes of DG means that the information concerning the value of the variable labelling the starting node transmits to the ending node of that relation. In other words, the latter variable can be said to depend on the former. The extremal cases come about when (i) DG is closed under an equivalence relation, in which case all variables and connectives in $\varphi$ depend on all the others, and when (ii) DG is a disjoint graph, in which case no variable and no connective in $\varphi$ depends on anything else, not even on itself. The associated semantic games have to be adjusted to reflect these generalisations; we would need to play concurrent games for those constituents not in any relation. Such games have been developed in computer science (Abramsky and Melliès, 1999; de Alfaro and Henzinger, 2000). In disjoint graphs, reflexive relations are not admitted, and hence the associated extensive games would not comprise even singleton information sets (i.e. partitions of the histories of the extensive game). In a sense, the generalisation described gives rise to a Bayesian belief net or a semantic net. Therefore, some surprising novel dimensions are possible in this direction by generalisations of the graphical representation of logical syntax alone.
The decisive question now is: is the theory of existential graphs, being diagrammatic, able to reflect these new dimensions? What is the existential graph for a sentence that is not first-order representable, such as "For every A, there is a B" (A, B monadic)? (See Boolos, 1981 for the argument that this needs to be symbolised by the 2D Henkin quantifier
$$\begin{pmatrix} \forall x\, \exists y \\ \forall z\, \exists u \end{pmatrix} ((x = y \leftrightarrow z = u) \wedge (Ax \rightarrow By)),$$
which in turn is representable in IF logic as $\forall x\, \exists y\, (\forall z / xy)(\exists u / xy)\, ((x = y \leftrightarrow z = u) \wedge (Ax \rightarrow By))$, but which is not reducible to traditional first-order logic.) It turns out that there is no need to try to build any special notational gimmick into the language of existential graphs. Peirce's endoporeutics, as indeed its modern cousin GTS, need not be a method of perfect information flow. As soon as endoporeutics is cast into the mathematical theory of games, we can take the evaluation of the graphs to exhibit imperfect information. Technically, this imperfectness refers to the communication between the Graphist and the Grapheus, but we already observed how they can be viewed as the (real or imaginary) interlocutors
playing the roles of MYSELF and NATURE according to the given conventions of the semantic game. Consequently, diagrammatic visualisation of logic by existential graphs reveals yet another facet of the commonality of endoporeutics. We can impose various restrictions on its basic, defining characteristics, or relax some of them. It is of interest to note that Peirce by contrast did seem to assume his dialogues to be of perfect information: "whichever of the two makes his choice of the object he is to choose, after the other has made his choice, is supposed to know what that choice was. This is an advantage to the defence or attack, as the case may be" (Hilpinen, 1982; Pietarinen, 2002a). This is consonant with the fact that Peirce took the law of excluded middle to hold in fragments of logic that do not need to deal with the notion of vagueness. Because of the assumption of perfect information, Peirce thought the meaning of an existential graph to be compositional in the sense of being determined by its component graph-instances. This is evidenced in MS 280:35: "The meaning of any graph-instance is the meaning of the composite of all the propositions which that graph-instance would under all circumstances empower the interpreter [the Graphist] to scribe."
6.
CONCLUSIONS
The diagrammatic approach is not only a knowledge representation scheme in artificial intelligence or a pictorial device for writing out discourse structures; it also unifies the outlook on logical systems themselves. The motivation for conceptual graphs comes from computer science, but their proximity to Peirce's existential graphs as well as to GTS makes them all foundationally rich. However, although structurally similar, conceptual graphs and DRT still lack the strategic dimensions of game-theoretic systems. In many knowledge-based systems aimed at understanding natural language expressions one benefits from strategic resources such as world knowledge, collateral information, lexical clues and various cognitive repertoires. Extensive games are diagrammatic systems tailor-made for such strategic tasks. One thus still needs to investigate what kinds of games we have for conceptual graphs, or possibly for other forms of visual representation of logic, including DRT. As hinted above, one of the key distinctions between GTS and DRT is that DRT does not keep any record of the histories of discourse elements to which we could refer and among which we could go on to choose the preferred interpretations of, say, anaphoric constructions.
Accordingly, the interpretational history is missing in the existential graph representation, at least until we take the dialogical character of the graphs seriously and interpret them by the apparatus of extensive forms of games. Yet Peirce's existential graphs do involve at least rudimentary forms of the strategic meaning of utterances, for Peirce often resorted to the notion of a habit in contexts where it might guide us to the right decision through generalisations (Pietarinen, 2002a). In this sense the development of GTS has rewarded Peirce's objective, although we are still far from a complete theory of strategic meaning. For example, various forms of bounded rationality that are currently pursued in game theory and interactive epistemology may turn out to be especially important. Peirce wanted his existential graphs to put before us true "moving pictures of thought" (MS 300:22). This was not achieved in full. His own investigation was conducted on the fairly static level of endoporeutic interpretation. By putting the graphs, so to speak, "on the move" in the sense of the theory of games, we may hope to accomplish a truly pictorial and dynamic representation of the meaning of logical propositions. By thereby having to go further on the path of diagrammaticalisation than Peirce, we also manage to put Peirce's anticipations into a sharper perspective: "A picture is visual representation of the relations between the parts of its objects; a vivid and highly informative representation. Yet...it cannot directly exhibit all the dimensions of its object, be this physical or psychic. It shows this object only under a certain light, and from a single point of view" (MS 300:22-23).
REFERENCES
Abramsky, S., Jagadeesan, R., 1994. Games and full completeness for multiplicative linear logic. J. Symb. Log. 59, 543-574.
Abramsky, S., Melliès, P.-A., 1999. Concurrent games and full completeness, Proceedings of the 14th Annual IEEE Symposium on Logic in Computer Science. IEEE Computer Society Press, pp. 431-442.
Barwise, J., Etchemendy, J., 1992. Tarski's World. CSLI, Stanford.
Boolos, G., 1981. For all A there is a B. Linguist. Inq. 12, 465-467.
Brady, G., Trimble, T.H., 2000. A categorical interpretation of C.S. Peirce's propositional logic Alpha. J. Pure Appl. Algebra 149, 213-239.
Burch, R.W., 1994. Game-theoretical semantics for Peirce's existential graphs. Synthese 99, 361-375.
de Alfaro, L., Henzinger, T.A., 2000. Concurrent Omega-regular games, Proceedings of the 15th Annual IEEE Symposium on Logic in Computer Science. IEEE Computer Society Press, pp. 141-154.
Hendrix, G.G., 1979. Encoding knowledge in partitioned networks. In: Findler, N.V. (Ed.), Associative Networks: Representation and Use of Knowledge by Computers. Academic Press, Orlando, pp. 51-92.
Hilpinen, R., 1982. On C.S. Peirce's theory of the proposition: Peirce as a precursor of game-theoretical semantics. The Monist 62, 182-189.
Hintikka, J., 1973. Logic, Language-Games and Information. Oxford University Press, Oxford.
Hintikka, J., 1996. The Principles of Mathematics Revisited. Cambridge University Press, New York.
Hintikka, J., Kulas, J., 1983. The Game of Language: Studies in Game-Theoretical Semantics and its Applications. Reidel, Dordrecht.
Kamp, H., 1981. A theory of truth and semantic representation. In: Groenendijk, J., Janssen, T., Stokhof, M. (Eds.), Formal Methods in the Study of Language. Mathematical Centre, Amsterdam, pp. 475-484.
Kamp, H., Reyle, U., 1993. From Discourse to Logic. Introduction to Model-Theoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer, Dordrecht.
Lorenzen, P., Lorenz, K., 1978. Dialogische Logik. Wissenschaftliche Buchgesellschaft, Darmstadt.
Muskens, R., van Benthem, J., Visser, A., 1997. Dynamics. In: van Benthem, J., ter Meulen, A. (Eds.), Handbook of Logic and Language. Elsevier, Amsterdam, pp. 587-648.
Øhrstrøm, D., 1997. C.S. Peirce and the Quest for Gamma Graphs, Conceptual Structures: Fulfilling Peirce's Dream, Lecture Notes in Artificial Intelligence 1257. Springer, Berlin, pp. 357-370.
Peirce, C.S., 1931-1966. In: Hartshorne, C., Weiss, P., Burks, A.W. (Eds.), Collected Papers of Charles Sanders Peirce. Harvard University Press, Cambridge, MA, 8 vols.
Peirce, C.S., 1971. Manuscripts in the Houghton Library of Harvard University, as identified by Richard Robin, Annotated Catalogue of the Papers of Charles S. Peirce (Amherst: University of Massachusetts Press, 1967), and in The Peirce Papers: A supplementary catalogue. Trans. C.S. Peirce Soc. 7, 37-57.
Peirce, C.S. (Ed.), 1983. Studies in Logic, by Members of the Johns Hopkins University. John Benjamins, Amsterdam.
Pietarinen, A., 2000. Logic and coherence in the light of competitive games. Logique et Analyse 43, 371-391.
Pietarinen, A., 2001. Intentional identity revisited. Nordic J. Philos. Log. 6, 147-188.
Pietarinen, A., 2002a. Peirce's game-theoretic ideas in logic. Semiotica 144, 33-47.
Pietarinen, A., 2002b. Semantic games in logic and epistemology. In: Gabbay, D., van Bendegem, J.-P., Rahman, S., Symons, J. (Eds.), Logic, Epistemology, and the Unity of Science. Kluwer Academic Press, Dordrecht.
Rahman, S., Rückert, H., 2001. Dialogical connexive logic. Synthese 127, 105-139.
Sandu, G., Pietarinen, A., 2003. Informationally independent connectives. In: van Loon, I., Mints, G., Muskens, R. (Eds.), Logic, Language and Computation 9. CSLI Publications, Stanford.
Sowa, J.F., 1984. Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley, Reading, MA.
Sowa, J.F., 2001. Existential Graphs: MS 514 by Charles Sanders Peirce with commentary by John F. Sowa, http://users.bestweb.net/~sowa/peirce/ms514.htm.
Studies in Multidisciplinarity, Volume 2
Editor: G. Malcolm
© 2004 Elsevier B.V. All rights reserved.
10
Mobilising knowledge models using societies of graphs
R. C. Paton
Department of Computer Science, University of Liverpool, Liverpool L69 3BX, UK
This chapter discusses a way of mobilising knowledge by using models of knowledge based on graphs. These models are informal and geared to ease of use. A key feature of the approach is the idea that as knowledge about a domain unfolds, a society of graphs can be used to seed, generate and elaborate the emerging model. This society can help the exploration of a domain to unfold, and the graphs satisfy a number of roles that we describe in terms of some key metaphors. A simple case study, concerned with the notion of a network, is followed to illustrate the approach.
1.
INTRODUCTION
Diagrams are invaluable for the meaningful articulation of complex knowledge. They can play a central role in the evolution of scientific knowledge and are closely linked with the processes and products of modelling. Models not only include systems, but also models of knowledge of the systems, and models of our knowledge of models of the systems. An idiographic approach to knowledge modelling is taken here, with an emphasis on dialogue and reflection. This chapter places modelling, and particularly modelling with diagrams, within the context of the process as well as the product. Models are representations of one thing in terms of something else. These terms may be physical or they may be verbal, diagrammatic, conceptual or
symbolic. Underlying the production of a model, and reflecting the processes related to its production, is a theoretical framework that is not only partly shared by a community but is also idiosyncratic and individualistic. One class of diagrams that is the focus of the present discussion comprises nodes and arcs, and summarises objects, parts, relations, associations and interactions. (Note: given the multidisciplinary readership of this chapter, "arc" and "node" are used rather than "edge" and "vertex".) These graphs are used across many knowledge domains and occur in a wide variety of forms including ball-and-stick molecular model, circuit diagram, bond graph, food web, lineage tree, block diagram, state transition network, entity-relationship model and semantic network. A large number of diagrammatic approaches to knowledge modelling have been developed. Contemporary methods range from representations for learning and understanding (e.g. Novak, 1998) to formalisms for dealing with logic and ontology (e.g. Sowa, 2000). For example, concept mapping was developed by Novak and collaborators for providing an advance organiser in the form of a diagrammatic summary. An advance organiser was a general, abstract and succinct representation of knowledge in a domain presented to a learner prior to the main corpus of knowledge. It was based on Ausubel's meaningful reception learning paradigm (e.g. Ausubel et al., 1998). Concept maps can also provide a way of negotiating meaning and of making metacognitive knowledge explicit (Novak, 1998). The graphs that are described in the present chapter have been developed from many sources (including Novak and Sowa) and represent part of a collection of types that have been developed to aid modelling, summarisation, communication and memorisation of domain knowledge.
2.
GRAPH HERMENEUTICS: TOWARDS A SOCIETY OF GRAPHS
This chapter will emphasise an eclectic approach to diagrammatic graphs as "informal" models of systems and knowledge. The graphs that are discussed here are not encumbered with complicated formation rules or strict varieties of arcs and nodes. A person using the graphs need not learn formal logic or linguistics in order to apply them. There is a degree of approximation in their production. The advantage is that there is a relaxation of any feeling of "shoe-horning" or applying a "Procrustean axe" to a person's knowledge in order to make it fit. The method encourages experimentation with, and exploration of, the knowledge in a domain. This facilitates an appreciation of knowledge in process (rather than as a fixed entity), and the importance of
a diversity of representations, dialogues and viewpoints. Diagrams provide one means for examining possible semantic and cognitive contexts. Hermeneutics is a method for sharing or facilitating dialogue within these contexts (Meyer and Paton, 2002). Within this hermeneutic approach, the construction of a graph involves a finished production, processes of producing the artefact, and adherence to certain composition rules (guidelines) for production and interpretation. A diagrammatic form can be used to summarise a process (such as a problem-solving process, algorithm, or protocol). We apply hermeneutic thinking to allow the sharing of common perspectives on domain knowledge (Meyer and Paton, 2002). The approach is partly based on the writings of Ricoeur (1981) and incorporates the dynamic between interpreting text in terms of metaphor and explaining metaphor in terms of text. Within the current discussion this dynamic will be exploited in two ways: to associate a number of metaphors with "graphing" and its products, and to use metaphor to help explain and interpret certain graphs (and sub-graphs). We shall describe metaphor as the language used to talk about one thing in terms of something else. To clarify usage, metaphors that are used as general types will be presented in UPPER CASE. A graph is usually presented using the same medium as the written word. As TEXT, it is open to be read, interpreted, edited, informative, drafted, illuminating, illustrative, reviewed, annotated, documented, and so forth. It may also provide a summarising overview and be used for the purposes of clarification and explanation. Rules associated with graph composition and interpretation suggest that grammars can be applied. Any grammar in this context is much more than applying rewriting rules to a mathematical object; it must also relate to the context, container and contents of the graph. TEXT can convey ideas of narrative related to story, history or trace.
One of the verbs used to describe aspects of modelling knowledge deals with mapping (e.g. in the production of Mind maps, Concept maps, etc). Mapping a domain of knowledge can convey ideas about relationship, juxtaposition, space, topographical context, navigation, contour and landscape. There is also the notion of mapping as displacement from a source to a target. Associated with MAP we may think of JOURNEY. Reading or writing a graph can be like making a journey in the sense that there is an impression of moving between the component parts along the arcs, and of having perspectives and viewpoints of the (knowledge) landscape. In some diagrammatic constructions, there is the idea of exploring unknown territory, namely the complexity of unfolding conceptual relations as the graph is formed and evolves. MAP is related to TEXT (e.g. reading, representing, portraying, interpreting and understanding) and to ART (e.g. form, symmetry, pattern, drawing, image, icon, presentation and sketch). Like an artist, a graph
constructor may approach a design by making preliminary sketches and drawings. These intermediate steps can help clarify the form of the finished production. Thompson (1942) proposed that the form of an object is a diagram of forces. WINDOW shares some characteristics with ART, JOURNEY and TEXT. It provides visual access to what is outside a CONTAINER or FRAME. The icon in the Orthodox tradition acts as a window on heaven (the eternal) - a visual grasp of wonder that cannot be contained in words or images. As TEXT, the WINDOW can help in interpretation and explanation: the reader's understanding is deepened or extended. With regard to a hermeneutic dialogue, a graph can provide an interpreter with a WINDOW onto aspects of another person's knowledge models. In a self-reflective mode, the graph WINDOW can also enable an individual to interpret and mobilise their own models. CONTAINER, FRAME and SCHEMA are metaphors that can help to articulate how a graph has components that are related to each other, and to what is not part of the graph, in distinctive ways. The sense of containment and frame (or framework) within a graph occurs with such language as outline, plan, order, script, skeleton, inside-outside, boundary and interface. SCHEMA is a more generic term that includes the other two. Its contemporary intellectual roots are in A.I./Cognitive Science. In many ways, this trio of terms can subsume many features of the other metaphors mentioned so far. The constructions and projections of geometry, sketch theory, Feynman diagrams and Voronoi diagrams are a few of the many examples of diagrammatic INSTRUMENTS for solving problems. A graph can be used as an INSTRUMENT for understanding, summarising and explaining an area of knowledge (i.e. as a visual model). As such there is a wide variety of associated verbs including analyse, construct, count, detect, dissect, examine, magnify, measure, open-up and project. 
The different forms and roles that graphs may take, the interrelations between them in modelling knowledge (in process and as product), and the various metaphors that can be used to assist understanding, lead to the idea of a "society of graphs" (Paton, 2002b). These societies have many internal interactions, a rich internal (organisational) structure, division of labour, and components that may be heterogeneous. In order to illustrate how a society of graphs can emerge and be exploited, we shall examine the mobilisation of reflective knowledge concerned with a commonly used idea related to society, namely "network". Networks are collected into a grouping of terms called "reticulations" that share many general systemic properties with societies (Paton, 2002a,b). Reticulation terms also include lattice, pathway, cycle, rete, grid, mesh, weave, and reticulum. The example now presented is written to explain the processes
of modelling rather than producing a finalised description. In order to illustrate this, we begin with the construction of one type of graph and allow this to bootstrap a reflective knowledge modelling process that will involve several graphing techniques.
3.
MAPPING PROCESS: JOURNEYS WITH C-GRAPHS
A C-graph (Paton, 2002b) comprises a network of associations between verbs. It is usually seeded with a single verb and from this a network emerges (see fig. 1). Other seeding possibilities exist, for example, with a pair of verb nodes and a common arc, or with a noun. In some ways, the growing graph is enabling MAP and JOURNEY to unfold. This can be an individual reflective process for the meaningful construction, reviewing, editing and thinking about a domain. Alternatively, intermediate outputs like C-graphs can be used dialogically to enrich a common descriptive framework between individuals. The form the network takes will vary according to the "seed" verb and the person constructing the graph. As noted elsewhere, the idiographic focus of this work should not be under-estimated: the products of graphing are unlikely to be exactly reproducible (even by the same individual). The trace (history) of the appearance of the verbs is registered in the numerical value given to the arcs, which also indexes a verb label (see table 1). What emerges in the C-graph is a pattern of associations among related verbs. This helps to bootstrap knowledge and reflective
[Figure: C-graph whose verb nodes include network, communicate, transfer, exchange, interact, socialise, mesh, flow, interrelate, cycle, connect, weave, thread, return, lace, combine, spin, stitch, knit and repair, joined by numbered arcs.]
Fig. 1. A C-graph of verbs related to "network".
Table 1
Meaning of arc labels in fig. 1

Arc  Meaning           Arc  Meaning       Arc  Meaning       Arc  Meaning
1    Involves          8    Involves      15   Is like       22   Enables
2    Involves          9    Involves      16   Is like       23   Trace
3    Enables           10   Is like       17   Involved in   24   Involves
4    Constructs        11   Enables       18   Needs to      25   Leads to
5    Forms/produces    12   Needed in     19   In order to   26   Forms/produces
6    In order to       13   Involved in   20   Involved in   27   Leads to/results in
7    In order to       14   Is like       21   Is like       28   Involved in
understanding about the domain. It also provides a focus on processes rather than objects. Verb associations may reflect causal or inferential relations although these will not be pursued in the present discussion. At any stage during the production of the C-graph the current verb, or any of the verbs that have appeared up to that point, could seed other C-graphs. If the growth of the C-graph stops, other graph types could be used to facilitate growth or move to other representations. Inspection of the form of the C-graph shows that some verbs are linked to many others, some are members of longer open paths, and some are part of closed paths or loops. Inspection of the arc meanings in the C-graph reveals a number of patterns of association including (with associated verbs in parentheses):
• Themes: are concerned with sequences of processes (leads to, in order to, needs to),
• Nestings: deal with one process that is a part of another (involves),
• Clusters: often concerned with similar processes (is like).
It should be noted that many of the arcs in the C-graph could have been labelled with more than one associated verb. As well as generalising the verb meanings of the arcs, it is also possible to simplify the graph in terms of the verbs at the nodes. One approach collects verbs together because they are related to a similar process or action. The left-hand graph in fig. 2 gives one summary. This type of graph will be described as a "star" graph. Note that network is no longer the core verb node. The core node now represents what is being shared between all the other nodes and the core in the C-graph is now an arc in the star graph. The status of the core concept in the star graph is related to an emerging concept that is concerned with the collection concept for the whole graph. It also has a connection with the idea of a colimit that is discussed in section 4.
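The grouping of arc meanings into themes, nestings and clusters can be illustrated with a minimal sketch. The arc labels are transcribed from table 1; the function names, the "other" bucket for labels that the text does not assign to a pattern, and the decision to match only the verbs listed in parentheses (so that "involved in" is not counted as a nesting) are assumptions made for illustration, not part of the C-graph method itself.

```python
from collections import defaultdict

# Arc labels transcribed from Table 1 (arc number -> meaning).
ARC_MEANINGS = {
    1: "Involves", 2: "Involves", 3: "Enables", 4: "Constructs",
    5: "Forms/produces", 6: "In order to", 7: "In order to",
    8: "Involves", 9: "Involves", 10: "Is like", 11: "Enables",
    12: "Needed in", 13: "Involved in", 14: "Is like",
    15: "Is like", 16: "Is like", 17: "Involved in", 18: "Needs to",
    19: "In order to", 20: "Involved in", 21: "Is like",
    22: "Enables", 23: "Trace", 24: "Involves", 25: "Leads to",
    26: "Forms/produces", 27: "Leads to/results in", 28: "Involved in",
}

# Pattern names with the associated verbs listed in the text.
PATTERNS = {
    "theme": ("leads to", "in order to", "needs to"),
    "nesting": ("involves",),
    "cluster": ("is like",),
}


def classify(meaning):
    """Return the pattern for an arc label, or 'other' (an assumed bucket)."""
    label = meaning.lower()
    for pattern, verbs in PATTERNS.items():
        if label.startswith(verbs):  # str.startswith accepts a tuple
            return pattern
    return "other"


def group_arcs(arc_meanings):
    """Group arc numbers by pattern, keeping the order of appearance (trace)."""
    groups = defaultdict(list)
    for arc in sorted(arc_meanings):
        groups[classify(arc_meanings[arc])].append(arc)
    return dict(groups)


groups = group_arcs(ARC_MEANINGS)
for pattern, arcs in groups.items():
    print(pattern, arcs)
```

Because many arcs could carry more than one associated verb, a grouping of this kind is only one possible reading of the graph; a different constructor might assign the same arc to a different pattern.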
Fig. 2. Example of star graph and its line graph.
Arcs in this star graph are bi-directional, and this emphasises that the verbs in the star graph have possible meaningful relations with each of the other nodes and with themselves. The latter (reflexive) case is clearly demonstrated with the various interrelations between verbs associated with transforming fabric (see fig. 2). Remember that the verbs in the star graph have been nodes in the C-graph. Given that each process in this arrangement can be related to itself and all the other processes (through the central collecting node), it is possible to construct another type of graph that makes these interconnectivities explicit. This is a line graph, which is constructed by making the arcs of the source (star) graph become the nodes in the target (line) graph. The line graph looks complex and the bi-directionality of the arcs in the star graph has been ignored to keep it readable (even so there are 28 arcs). This complex graph could be used to represent strengths of association (e.g. weighted arcs), or to identify some clusterings and cliques (e.g. between network, mesh and socialise). There is a TEXTUAL dimension to the many subtle connections, paths, threads and themes. In many ways, the line graph provides an INSTRUMENT that can FRAME all possible associations and from which a clearer model can emerge.
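The line graph construction can be sketched as follows. The sketch assumes a star graph with eight arcs, a number inferred from the stated count of 28 arcs in the line graph (8 x 7 / 2 = 28) rather than read from the figure; the node labels and the function name are hypothetical placeholders.

```python
from itertools import combinations


def line_graph(arcs):
    """Build the line graph of an undirected graph.

    Each arc of the source graph becomes a node of the line graph; two such
    nodes are joined when the source arcs share an endpoint.
    """
    return [(a, b) for a, b in combinations(arcs, 2) if set(a) & set(b)]


CORE = "core"  # hypothetical label for the central collecting node
periphery = [f"verb_{i}" for i in range(1, 9)]  # eight placeholder verbs
star_arcs = [(CORE, verb) for verb in periphery]

line_arcs = line_graph(star_arcs)
print(len(star_arcs), "star arcs give", len(line_arcs), "line-graph arcs")
```

Since every arc of a star graph shares the central node, every pair of arcs is joined in the line graph, which is why the result is fully connected and looks complex even for a small star.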
4.
REGIONALISING C-GRAPHS
Using ideas concerned with MAPS and topography, we may say that fig. 3 regionalises the C-graph of fig. 1 with regard to verbs associated with four metaphorical collection constructs: SOCIETY, CONDUIT, WEAVE/FABRIC and GLUE. These appeared through inspection of the verbs and the requirement to generalise, collect or include verbs together. There is also a meaningful association between these metaphors and the verbs labelling the star graph in fig. 2. However, remember that the society of graphs approach is not only idiographic, it also preserves the trace or record of when particular forms appeared. In this case, the analysis resulting in fig. 2 appeared before that in fig. 3.
[Figure: the C-graph of fig. 1 with regions labelled SOCIETY, CONDUIT, WEAVE/FABRIC and GLUE.]
Fig. 3. C-graph and some key metaphors.
The metaphors that are associated with regions of the C-graph have emerged from the "bottom-up". An alternative strategy could have been to produce metaphors by constructing a type of star graph (called a scratch net (SN) - e.g. Paton, 2002b) from the top-down. SNs share some things in common with (so-called) "spider diagrams" and "pattern notes", and fulfil a number of roles including summarisation, abstraction and what could be called "diagrammatic brainstorming". They are distinguished from star graphs in terms of the process by which they are formed, rather than in their appearance on paper. SNs are constructed very quickly (like a sketch) and can summarise a lot of information (as a kind of advance organiser). They can satisfy the role of a memory aid and provide a simple visual FRAME on which to arrange concepts and terms. An SN based on the regionalisation of fig. 1 is redrawn in the top left of fig. 4. We are now combining MAP and TEXT features of the diagramming process to interpret topographical relations and edit the C-graph. The product is an annotation in terms of network associations. It soon becomes clear that the associated metaphors share concepts that may be displaced between them. For example, certain structural and organisational features of a SOCIETY may be described using language associated with FABRIC/WEAVE. FABRIC/WEAVE, GLUE and CONDUIT share, combine or blend ideas concerned with holding something of many parts together. In this case, GLUE relates to adhesion, cohesion and combination (Paton, 1997). An SN that has connections between peripheral nodes is described as a Factor complex (FC). This graph is no longer a tree structure like the star graphs and distributes the focus from the central node to include the other nodes.
[Figure: top left, an SN with central node "network" and peripheral nodes SOCIETY, FABRIC/WEAVE, CONDUIT and GLUE; top right, the corresponding forgetful FC; below, verbs shared by metaphor pairings (e.g. cohere, contain, integrate, combine, pattern, structure, organise) and themes arising from the triples (e.g. "The Fabrication of Conduits needs Gluing").]
Fig. 4. Some graphs associated with the metaphor SN.
Figure 4 (top right) shows an alternative drawing of an FC that "forgets" the central node and explicates the associations (implied in the SN) between the peripheral nodes (i.e. the metaphors). We shall call this a "forgetful FC". From this graph, it is possible to identify six pairs of terms and four triples. The decomposition of the tetrahedron into four triples and thence into six line segments represents a sequence of reductions of simplices of dimension 2 and 1 respectively. Figure 4 shows that the pairs and triples have meaning (sometimes overlapping) in respect to verb associations. The pairings (bottom left of fig. 4) show how a large number of verbs (many more could be added) are displaceable between metaphors. Similarly, a number of themes emerge from interactions in the triples. The displacement and sharing of terms between the different metaphors enables blends and analogies to be formed. Verbs that are cohering the metaphor pairings in fig. 4 can each seed new C-graphs. The construction of the forgetful FC graph has similarities with the line graph construction in fig. 2.
Fig. 5. Graph representing the colimit of the FC pattern.
Using the language of Category Theory it is possible to describe the patterns of interactions between the metaphors in terms of a colimit (Ehresmann and Vanbremeersch, 1987). The pattern is a collection of cooperating objects in which displacements, analogies and blends may be made. A colimit (cohesive binding) glues a pattern into a single unity in which the degrees of freedom of the parts are constrained by the whole. A diagrammatic representation of this process is shown in fig. 5. As is made clear by the description of a colimit, it is important to note that what is portrayed in the colimit is much more than the combining of the four triangular simplices in fig. 4. The colimit is not the same object as the central node of the SN or star graph. It presents a notion of hierarchy with the pattern between the parts at one level being glued to a single unit of meaning at the next level. The colimit models the integration of the pattern into a single unity. From the original regionalisation of the C-graph into an SN we have moved to a semantic and diagrammatic appreciation of the emergence of a collecting concept operating over the contributing metaphorical sources.
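The counts of six pairs and four triples in the forgetful FC follow from simple combinatorics on the four metaphors, as the minimal sketch below illustrates. The variable names are illustrative only, and the sketch enumerates the simplices of the tetrahedron; it does not attempt to model the colimit itself.

```python
from itertools import combinations

# The four metaphorical collection constructs named in the text.
METAPHORS = ("SOCIETY", "CONDUIT", "WEAVE/FABRIC", "GLUE")

# Triples are the triangular faces (simplices of dimension 2) and pairs are
# the line segments (simplices of dimension 1) of the tetrahedron.
triples = list(combinations(METAPHORS, 3))
pairs = list(combinations(METAPHORS, 2))

# Each triple reduces to three pairs; count how many triples contain each pair.
faces_per_pair = {
    pair: sum(1 for triple in triples if set(pair) <= set(triple))
    for pair in pairs
}

print(len(triples), "triples,", len(pairs), "pairs")
for pair, count in faces_per_pair.items():
    print(pair, "lies in", count, "triples")
```

Each pairing lies in two of the four triples, which is one way of seeing why verbs attached to a pairing can also contribute to more than one theme among the triples.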
5.
WIDENING THE WINDOW
So far an emphasis has been placed on verbs and processes. This has been deliberate in that many graph-based approaches to knowledge modelling place an emphasis on nouns (objects) and the present approach seeks to incorporate objects and processes. One method for elaborating the richness (or depth) of the internal structure of a collecting concept, such as network, is with an Expansion graph (E-graph). E-graphs are constructed by associating prepositions with a word and expanding the possible properties that can be associated in this way. As fig. 6 shows, a number of (mainly) prepositions can be associated with the noun network. Many meaningful conceptual links can be made between these terms and some general properties are shown at the right-hand side of the figure.
[Figure: E-graph linking the noun "network" to the prepositions involving, of, for, between, within, concerning, like and through, and these to general properties: flow, transfer; parts; relations; purpose, goal; container; other reticulation terms (e.g., net, web, grid).]
Fig. 6. Preposition expansions on network.
The WINDOW can continue to be widened by constructing further SNs or E-graphs related to the property terms on the right-hand side (such as container or conduit). It would also be possible to go deeper by particularising for specific domains (such as computer networks, social networks, blood systems and so forth). An E-graph has similarities to both SNs and FCs. The preposition layer reflects an SN. The layer between the prepositions and the properties is more network or FC-like. As properties are further expanded in the source graph, so a deeper structure emerges. This gives a sense of enlarging things that seem farther away and relates to the INSTRUMENTALITY of this use of E-graphs.
6.
CONCLUDING REMARK
A society of graphs can be used to facilitate the mobilisation of domain knowledge. This chapter has reported an illustrative example. Beginning with a C-graph, we pursued the development of a society of graphs that included star graph, SN, line graph, FC, forgetful FC, simplices, colimits, E-graph and layered FC. Relations between the graphs have been explored (many more could have been described) and the utility of particular types has been related to a number of descriptive metaphors. Within this society, we have observed division of labour, heterogeneity of components, co-operation of the participating graphs, and nestings and embeddings among graph forms. The societies of graphs can be represented as graphs (networks) of graphs. The language and concepts that have emerged from the previous discussion can be re-applied to anticipate and explore emerging knowledge models.
REFERENCES
Ausubel, D., Novak, J.D., Hanesian, H., 1978. Educational Psychology, 2nd edition. Holt, Rinehart and Winston, New York.
Ehresmann, A.C., Vanbremeersch, J-P., 1987. Hierarchical evolutive systems. Bull. Math. Biol. 49, 1, 13-50.
Meyer, M.A., Paton, R.C., 2002. Interpreting, representing and integrating scientific knowledge from interdisciplinary projects. Theoria Hist. Sci. 6, 2, 323-356.
Novak, J.D., 1998. Learning, Creating and Using Knowledge. Lawrence Erlbaum Associates, New Jersey.
Paton, R.C., 1997. Glue, verb and text metaphors in biology. Acta Biotheor. 45, 1-15.
Paton, R.C., 2002a. Process, structure, and context in relation to integrative biology. BioSystems 64, 63-72.
Paton, R.C., 2002b. Diagrammatic representations for modelling biological knowledge. BioSystems 66, 43-53.
Ricoeur, P., 1981. Hermeneutics and the Human Sciences. Cambridge University Press, Cambridge.
Sowa, J., 2000. Knowledge Representation: Logical, Philosophical, and Computational Foundations. Brooks/Cole, Pacific Grove, CA.
Thompson, D'Arcy W., 1942. On Growth and Form. Cambridge University Press, Cambridge. First edition 1917.
Studies in Multidisciplinarity, Volume 2 Editor: G. Malcolm © 2004 Elsevier B.V. All rights reserved.
11
Verbal and visual cues for navigating mental space: conceptual mappings and discourse processing theory¹
J. Luchjenbroers
Department of Linguistics, University of Wales, Bangor, Gwynedd LL57 2DG, Wales, UK
The primary foci of this chapter concern (a) the range of verbal and gestural cues speakers have available to them to navigate mental spaces during discourse and (b) how gestures iconically refer to the subject matter being discussed. This discussion is embedded within an introduction to cognitive linguistics theory that is relevant to the gestural analyses that follow. The examples provided illustrate varying complexity in how gestures may amplify the verbal component of speaker utterances, as well as give further evidence of the conceptual mappings needed by hearers to comprehend speaker-intended meaning.
1.
INTRODUCTION
This chapter reports on research that explores how multimodal sources of information combine to facilitate the hearer's task in reconstructing speaker-intended meaning. The particular issues considered here include an initial discussion of discourse processing theory (in general) and Mental Spaces theory (in particular). I will then present a range of gesture types found in these data that perform a variety of different discourse functions, as well as discuss the role of iconicity evident in these gestures that relate to the lexical component of talk.
¹ The research drawn upon in this chapter was supported by a postdoctoral and a New Staff grant to the author from the University of Queensland (Australia). Many thanks to Aaron Cicourel, Gilles Fauconnier, and Seana Coulson for their insightful comments on many aspects of my work, including many of the issues dealt with in this chapter. Thanks also to Shannon Dougherty (my research assistant) for the countless hours of transcription, and also to Simon Parker and Pat Carroll for their comments on an earlier version. All oversights are of course my own.
1.1.
Lexical choices: discourse processing theory
Like others in the discourse field, I view discourse as a process of mutual ground construction in which discourse participants appear to achieve a mutual understanding of what they are talking about and also appear to work toward that goal (cf. Grice, 1975). This is the basis of the "cooperative discourse" approach that is inherent in the work of many discourse theorists working in this field (e.g. Tomlin, 1987; Luchjenbroers, 1993, 2000, MS; Chafe, 1994; Lambrecht, 1994; Clark, 1996, 1997). Cooperation thus involves speakers giving addressees adequate cues to derive their speaker-intended meaning, and addressees making a determined search for that meaning. According to Tomlin et al. (MS), virtually all approaches to discourse processing involve the management of a mental model or conceptual representation of discourse information. In effect, cooperative discourse practices require speakers to tailor their outputs to what they assume is needed in the hearer's cognitive models (to make that model resemble the one the speaker is trying to recreate in the hearer's mind); and it also requires the hearer to "unpack" the speaker's conveyed information to reconstruct the model the speaker is trying to convey. From this perspective of discourse processing, it is the speaker's linguistic choice that lays the foundation for new conceptual cognitive models of discourse information, to which subsequently presented information can be mapped.² Many of the approaches to discourse processing referred to by Tomlin are essentially Functionalist (i.e. sentence parts are defined by discourse function - e.g. topic, focus), and within that general approach to discourse processing, the conceptual coordination described above is primarily attributed to surface features of text. In contrast, however, Mental Spaces theory is fundamentally based on the view that linguistic form underspecifies speaker meaning (cf. Fauconnier, 1985; Fauconnier and Sweetser, 1996), and that meaning construction takes place at a conceptual level that is compatible, but not synonymous, with the mental models of discourse referred to above.
² As outlined elsewhere (Luchjenbroers, 2000) functionalist accounts suffer most from a blurring of speaker and hearer roles (i.e. production with comprehension processes). In effect, the single cognitive system account is idealised in that it presumes that both the speaker's and the hearer's cognitive models share the same fundamental properties, and therefore a speaker's choice of functional elements (e.g. topic, focus) will naturally match the needs of the hearer's cognitive model. On solipsistic grounds, this must be impossible.
1.2.
Lexical choices: Mental Spaces theory
Mental Spaces theory (Fauconnier, 1985/1994) was put forward (in part) as an answer to fundamental problems in mainstream Formal Semantics that sees language in terms of truth-functional meaning (cf. Frege, 1970; Montague, 1974). The formal, compositional approach to sentence and lexical semantics holds that to understand the meaning of a word or sentence is to understand each of its component parts (i.e. the whole equals the sum of the parts), on the basis of which that word or sentence can be measured as a true or false description of the world it purports to describe.³ For example, in the sentence, My brother is a bachelor, the word bachelor denotes "unmarried male"; and if the referent were married, then this sentence would justifiably be deemed false. Many such examples have been generated to lend credence to the Objectivist view (i.e. 1-to-1 relations exist between words and the world); however, non-prototypical usages of words like bachelor make evident the fuzzy boundaries that defy truth-function: e.g. ?The Pope is a bachelor. Even though the Pope is unmarried and male, he is unlikely to be described as a bachelor because he is not eligible to get married, making this a difficult case to measure as either true or false. Other examples have also been cited as unclear cases for truth-function, such as ?My gay friend is a bachelor (see Fillmore, 1982; Lakoff, 1987; Sweetser, 1990). Examples like these also illustrate how social expectation (which is subject to change over time) impacts on word meaning and thus the flexibility of the fuzzy boundaries surrounding questions of truth. In contrast to the Objectivist approach to sentence and lexical semantics, Mental Spaces theory measures the truth of a proposition only in terms of the mental space to which it is attributed. For example, in sentence (1) below, the committee's choice is attributed to the temporal space, in 1993, while hosting the games is attributed to the removed temporal space, in 2000 - see fig. 1.
³ See Coulson (2001) for a comprehensive discussion of the formal, compositional approach as well as the Objectivist approach that she advocates in her book.
[Figure: two temporal spaces, in 1993 and in 2000, with the element Sydney mapped across them as host of the Olympics.]
Fig. 1. Mappings across spaces.
In both clauses, the truth of the attributed propositions is relative to the spatial definitions to which they are attributed; therefore, changing the spatial definition changes the criteria by which the embedded proposition can be measured as true or false: cf. "In 1994 the Olympics committee chose Sydney..." is false because the decision was made in 1993.
(1) In 1993 the Olympics committee chose Sydney to host the Olympic games in 2000.
Similarly, when considering counterfactual statements like (2), the sentence conveys a logical reasoning that is necessarily true: if X then Y; if not X then not Y; both not X and not Y are true; therefore this sentence is logically true, even though some may argue that a sentence like (2) is without truth-function (i.e. neither true nor false) because it describes a possible world in the past that cannot obtain. In contrast, from a mental-spaces perspective, truth is measured only by the mental space in which the proposition occurs, making no distinction between real world and hypothetical worlds. In example (2), relevant correlations are mirrored within parallel spaces: one in the Here-and-Now (i.e. the real world in current time and space) and the other in a hypothetical space. Both spaces relate the speaker to their conduct at school and then to their job outcomes. However, the cross-domain mapping - from the hypothetical space to the Here-and-Now - maps the success found in the hypothetical space to the speaker now, bringing with it the inference: "I'm smarter/better than my current state implies" - see fig. 2.⁴
(2) If I hadn't dropped out of school, I'd be a company president by now.
⁴ More current approaches to Mental Spaces (cf. Fauconnier and Turner, 2002) would deal with this as a Conceptual Blend. This is also discussed later in this chapter.
[Figure: parallel Here-and-Now space (dropped out of school, poor job) and Hypothetical space (finished school), linked by projections between the spaces.]
Fig. 2. Mental Space projections.
These examples illustrate the advantages of the Mental Spaces approach to semantic reasoning and also highlight how the formal approach to semantics suffers in that it cannot distinguish between truth and acceptability. The Objectivist account also suffers because it cannot deal with partial structure, as in this case, where both counter-propositions may be true but the causal relation itself is false.

As discussed above, information processing requires the management of mental models (i.e. conceptual representations) of discourse information. The Mental Spaces orientation to discourse processing further requires speakers to make clear what conceptual spaces are needed, as distinct from the propositional information to be processed within those spaces. Spaces are set up with a number of devices, the most prevalent being temporal and locative phrases or clauses, and also tense. Additionally, spaces may be separated from propositions through pauses or hesitations in speech. Consider the examples given in (3).

(3a) is that an acceptable thing ↑ ... in in in Sabah ↑ ... in Borneo ↑ [OF5OF8:15-16]⁵
(3b) is that an acceptable thing in in Queensland?

⁵ Examples are given with the following additional information: bold, primary stress (prosodic pulse); underlined, where verbal and gesture components coincide; arrow ↓ above text, gesture onset; arrows (↑, ↓, → within text examples), intonation contour. In more detailed text examples, an arrow → in front of an example indicates the particular line to which attention is being drawn. Data numbers are coded as: participants, nationality and gender (OF, Ozzie Female), followed by the full data line numbers.

In example (3a) the truth of the queried proposition, is that an acceptable thing, is clearly limited to the locative space, in Sabah, in Borneo. If the speaker had used a different spatial definition (such as in Queensland in (3b)), she would have changed the context in which the proposition can be measured as true or false. Similarly, in (4) the enclosed proposition, It's six notes in a row, is true or false within the space new to this talk, in music. However, that space and proposition are further embedded within
J. Luchjenbroers
Fig. 3. Embedding of spaces (figure labels: Personal Opinion; Music (Field); Plagiarism = 6 notes in a row).
the personal space of the author's opinion, I think, making the truth-function again arguably inappropriate (see Fig. 3).

(4) In music I think it's, six notes or something in a row [OF4OF10:19]

Space building occurs as needed during discourse according to the guidelines provided by the linguistic expressions in discourse (it will be shown later that this can also be established through gesture). It is important to note that neither spaces nor propositions are linguistic phenomena: although propositions are conveyed linguistically, spaces are conceptual phenomena that partition knowledge and provide the specific contexts in which the associated propositions can be measured as true.

In addition to recognising and establishing specific conceptual domains (i.e. mental spaces) in which to process propositional information, hearers also need to map conceptual elements from one domain to another. Once information is offered, it can be drawn upon through mappings to newly created spaces and propositions. Discourse participants perform these required mappings with little effort, though the actual speech stream itself gives little evidence of how and when such movements around cognitive space are needed. Mental spaces rely on two important sub-processes: (i) recognising and establishing specific conceptual domains (i.e. spaces) in which propositional information is to be processed and (ii) mapping conceptual elements from one domain to another (as needed).

Discourse has its own structure, and movement around conceptual space can involve hierarchical structures that also require conceptual navigation. In terms of mental spaces, accessing and re-accessing a higher order space may be an important instruction to the hearer for comprehension. Although relevant to the concepts raised in this chapter, the complexity of discourse levels and the embedding of spaces has been considered in depth in another chapter (Luchjenbroers, MS).
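The embedding of spaces in example (4) can be pictured as a small nested data structure. The following is only an illustrative sketch, not a formalism proposed in this chapter: the class name `MentalSpace`, its fields, and the methods `embed` and `context` are all hypothetical, and the wording of the stored proposition is a paraphrase.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MentalSpace:
    """A conceptual domain in which propositions are evaluated (hypothetical model)."""
    name: str
    builder: str                              # expression that sets the space up
    parent: Optional["MentalSpace"] = None    # the space this one is embedded in
    propositions: List[str] = field(default_factory=list)

    def embed(self, name: str, builder: str) -> "MentalSpace":
        """Open a new space inside this one."""
        return MentalSpace(name, builder, parent=self)

    def context(self) -> List[str]:
        """Names of all spaces from the outermost one down to this one."""
        chain: List[str] = []
        node: Optional["MentalSpace"] = self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain[::-1]


# Example (4): "In music I think it's six notes or something in a row"
opinion = MentalSpace("Personal Opinion", builder="I think")
music = opinion.embed("Music (Field)", builder="In music")
music.propositions.append("plagiarism = six notes in a row")

print(music.context())  # ['Personal Opinion', 'Music (Field)']
```

The point of the sketch is that the proposition is stored in, and can only be assessed relative to, the innermost space, while `context()` recovers the chain of spaces a hearer would need to have established.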
It is the focus of this chapter to discuss how gestural information can provide hearers with appropriate spatial cues to facilitate hearer comprehension of the verbal component as well as navigate the conceptual space they are meant to construct.
Verbal and visual cues for navigating mental space
2. DATA
The data used for this chapter come from a larger video-taped study into negotiated talk involving a total of 36 Australian and non-Australian University students.⁶ The subjects were placed into dyads, bringing one Australian male or female student together with either another Australian or a foreign student. The participants in each dyad were given the task of devising guidelines (to be given to faculty) about how new students should avoid the pitfalls associated with (a) cheating or (b) plagiarism. The subjects were recorded in a sound-proof room, positioned diagonally opposite each other. The purpose of this positioning was to maximise the view for the analyst (sitting in the next room, behind a large, tinted window) and the video-recorder, without drawing undue attention to either. Subjects reported that they found the analyst easy to ignore, but that was less true of the video-recorder. Each dyad lasted roughly 30 min and each participant was recorded twice, making a total of 36 interactional dyads (approximately 18 h of data). Subjects were paid for their participation and anonymity was assured.

This chapter focuses on 12 of these dyads, which is the total number of dyads involving native Australian students only. The Australians-only data is made up of: four Male + Male dyads, four Male + Female dyads, and four Female + Female dyads. The first notable feature of these data is that there are remarkable gender differences in how much subjects make use of gesture during conversation. Australian women gesture far more than Australian men, and noticeably more so when talking to another Australian woman than when talking to an Australian male; Australian men generally gesture very little, and particularly so when talking to another Australian male. Most of the gesture examples used for this chapter were therefore drawn from the Female + Female dyads.
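The bracketed data codes attached to the examples in this chapter (e.g. [OF5OF8:15-16]) identify the two participants of a dyad and the transcript lines cited. A minimal sketch of how such codes might be read mechanically is given below. It rests on assumptions that go beyond what the chapter states: that the digits after each nationality/gender pair are participant numbers, that "M" would mark a male participant, and that a shortened range such as 33-5 abbreviates 33-35. All names in the code are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Two participant tags (nationality letter, gender letter, number), then line range.
CODE = re.compile(r"^([A-Z])([FM])(\d+)([A-Z])([FM])(\d+):(\d+)(?:-(\d+))?$")


@dataclass(frozen=True)
class Participant:
    nationality: str   # e.g. "O" (Ozzie); other letters are not attested here
    gender: str        # "F" attested; "M" is an assumption
    number: int        # assumed to be a participant number


@dataclass(frozen=True)
class DataRef:
    participants: Tuple[Participant, Participant]
    first_line: int
    last_line: int


def expand_end(start: str, end: str) -> int:
    """Expand an abbreviated range end, e.g. start '308', end '10' gives 310."""
    if len(end) < len(start):
        end = start[: len(start) - len(end)] + end
    return int(end)


def parse_code(code: str) -> DataRef:
    """Parse a data code such as '[OF4OF10:33-5]'."""
    match = CODE.match(code.strip("[]"))
    if match is None:
        raise ValueError(f"unrecognised data code: {code!r}")
    n1, g1, i1, n2, g2, i2, start, end = match.groups()
    last = expand_end(start, end) if end else int(start)
    pair = (Participant(n1, g1, int(i1)), Participant(n2, g2, int(i2)))
    return DataRef(pair, int(start), last)


ref = parse_code("[OF4OF10:33-5]")
print(ref.participants[1].number, ref.first_line, ref.last_line)  # 10 33 35
```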
3. GESTURES
The size of a speaker's gesture space is defined by where they make most of their gestures. Among the Australian participants of this study, the general dimension of this "comfort zone" for gestures is roughly the shape of a cube that runs from shoulder to waist in height, from the elbow (at the waist or, in these data, the table) to the hand in depth, and has body width. The actual size of a speaker's gesture space, and similarly the proportion of gesture to speech, varies from speaker to speaker, and very likely from culture to culture. Therefore for some, the gesture space is a much smaller cube, sometimes involving no more than the speaker's hands, and in one case, just movement of the thumbs from a clasped hands position. In general, speakers who are less animated in gesture use a smaller gestural cube, and those who are more animated use a larger cube.

In addition to this comfort zone in which speakers make most of their gestures, however, speakers also make numerous gestures that are clearly outside these general boundaries. I suggest that these general vs. extreme boundaries are consistent with "inside" vs. "outside" their gestural "F-space", and when a gesture is made within (or outside) F-space, the speaker is conveying additional but relevant information about information focus and navigating mental space.⁷ As will become evident in the next section, gestures within F-space are relevant to "Here" or "Me" (i.e. the speaker) and gestures outside F-space are relevant to "Not here" and/or "Not Me". In this sense, gestures can function like contrastive stress, in that pointing to a physical location in front of the speaker amplifies not only "Here" but also "Not there", or "This" and "Not That", while deictic gestures to physical locations outside F-space amplify the opposite.

⁶ The larger study is entitled Gender and cultural representations of "self" in the language of negotiations, and was conducted during a Postdoctoral Fellowship with the University of Queensland, 1997-1998.
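The inside/outside F-space contrast described above can be sketched as a simple containment test on a box-shaped region. This is an illustrative sketch only: the coordinate system, the numeric bounds, and the names `FSpace` and `deictic_reading` are assumptions introduced here, not measurements or terminology from the data.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float, float]  # (sideways, height, forward), in metres


@dataclass(frozen=True)
class FSpace:
    """Box-shaped gestural comfort zone; all bounds are illustrative assumptions."""
    width: Tuple[float, float]    # body width
    height: Tuple[float, float]   # waist (or table) to shoulder
    depth: Tuple[float, float]    # elbow to hand

    def contains(self, point: Point) -> bool:
        x, y, z = point
        return (self.width[0] <= x <= self.width[1]
                and self.height[0] <= y <= self.height[1]
                and self.depth[0] <= z <= self.depth[1])


def deictic_reading(space: FSpace, point: Point) -> str:
    """Gestures inside F-space read as 'Here / Me', outside as the opposite."""
    return "Here / Me" if space.contains(point) else "Not here / Not Me"


speaker = FSpace(width=(-0.25, 0.25), height=(1.0, 1.4), depth=(0.0, 0.45))

print(deictic_reading(speaker, (0.0, 1.0, 0.3)))    # Here / Me
print(deictic_reading(speaker, (-0.7, 1.2, 0.5)))   # Not here / Not Me
```

A speaker who is less animated in gesture would simply be modelled with tighter bounds; the classification logic stays the same.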
3.1. Indexical gestures
Researchers in gesture (see McNeill, 2000) generally recognise at least three types of gesture: (i) Deictic gestures (which relate to "here" vs. "there"), also called Indexical (cf. index finger) gestures;⁸ (ii) Iconic gestures, which iconically (and often metonymically) illustrate features of talk; and (iii) Pantomimes. Indexicals are the most basic form of gesture and refer to gestures that have a specific location: the physical location of the referent has a direct relation to the physical location the speaker is pointing to. Also in terms of F-space, the relation between the physical locations of the referents and the indexical gestures is no coincidence: see (5)-(7).

(5) of course at the university that's...that's not on [OF5OF8:41]
Both hands form a flat cup, palms down and slanting inwards, fingers touch the table in front of Speaker (= F-space)

(6) they say the university policy here...is [OF5OF8:57]
Right hand holding pen, points down, touching the table in front of Speaker (= F-space)

The gestures in examples (5) and (6), referring to "here", are firmly in the centre of the speaker's gestural space, i.e. in F-space; however, example (7), referring to a different university, is clearly outside F-space and the gesture moves the finger, as well as the hearer's attention, away from it.

(7) there's even a special section of legal studies at QUT [OF5OF8:178]
Left arm crosses body (and F-space); left finger points away from Speaker in the direction of QUT (≠ F-space)

The gestural choice in (7) is not arbitrary: it requires greater physical effort for the speaker to produce than a gesture in the same direction made by the right hand would have. Notably, this speaker is right-handed and in (6) made the gesture, here, with her right hand. However, if she had chosen the right hand to point to the right side, to make her there gesture, even though the indexical would still have been outside F-space, it might have been less obviously outside F-space than the gesture she produced. Hence, the speaker's choice to use the left hand to cross the body (and F-space) to a position that is again outside the speaker's gestural F-space is more telling of the speaker's intent and the focal status of that information. The body of talk is about practices at university (in Australia) but the gestures make clear that for these speakers, the specific space that is maximally active (and focal, and possibly more importantly, relevant to them) is what happens "here" (at the University of Queensland) as opposed to "Not here".

⁷ Although it is tempting to refer to this gestural comfort zone as "Focus-space", this would be misleading as a speaker can refer to multiple spaces within that physical space in a single contribution, each of which enjoys "focus" for information processing purposes. Additionally, the focus space can be outside F-space (i.e. Not here). Later examples will elaborate on this point.

⁸ Deixis (sometimes called "shifting" because specific reference shifts from speaker to speaker) refers to lexical and gestural items that depend on the context for meaning, e.g. sitting here at my desk, my here is simultaneously everyone else's there. Hence the words here and there have no objective meaning apart from indicating the speaker's orientation toward phenomena around him/her.
This indexical use of gesture to refer to concrete objects with a specific physical location is already a progression from the most basic sense of here, which bears a 1-to-1 relation between an object and its location - e.g. This pen is mine, or I am here, and You are there. Examples such as these show that the dimension of the object referred to is comparable to the physical location indicated by the indexical: the eye can move from the finger to the object it refers to. In contrast, all three examples given in (5)-(7) show an iconic relationship between the physical location of the referent and the gestural space allocated to it. For example, the use of here can mean this room, this building, this university, this city, this country; each of these locations can be serviced by the indexical here, and in each case the indexical bears an iconic relationship to the full dimension actually being referred to; for each of these possible referents, the gesture, here, would be placed squarely within the speaker's F-space.
In addition to these, a third point on this scale is also possible, where the indexical gesture does not point to an object's physical location at all, e.g. (8).

↓(1) ↓(2)
(8) like, if you know they've sort of taken this out of this book...
↓(3) ↓(4)
because they've referenced this and you've read this book... what do you do? [OF4OF10:33-5]
1. R hand, across L hand but centre field (inside F-space = plagiarised material)
2. R hand, across L hand and further to Left (border of F-space = source text)
3. R hand points again to "source text" space
4. R hand points again to "source text" space

In cases such as this, external phenomena that are related to the subject matter under discussion are attributed to points in gesture space. This example shows how (i) the plagiarised material and (ii) the source from which the plagiarised material was taken are distinguished from each other by being allocated to distinct points in the speaker's gesture space (both referents are focal and within F-space). This disambiguating strategy is clearly used by speakers, and is available to hearers (if they are being attentive to the cognitive model management cues being used by the speaker). In cases such as these, once a speaker has attributed a referent to a particular location in (physical) gesture space, s/he will continue to point to the same locations upon further references to those referents. In this way, gestures serve as a form of reference tracking that is available to all participants in discourse.

Examples such as (8) also demonstrate how, in attributing referential gestures to specific locations in physical space, speakers engage in a form of conceptual management that bears a relationship to formalised practice in sign languages (e.g. Auslan, BSL, ASL). This kind of iconic gesture to navigate the speaker's use of mental and physical spaces is among the simplest kind of gesture-cognitive process correlation to appear in visual discourse data.
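The reference-tracking function of gesture seen in (8) amounts to a lookup from locations in gesture space to referents. The sketch below is a hypothetical illustration of that idea; the class `GestureSpaceTracker` and the location labels are invented for the example and do not describe an annotation tool used in the study.

```python
from typing import Dict, Optional


class GestureSpaceTracker:
    """Maps locations in gesture space to the referents assigned to them."""

    def __init__(self) -> None:
        self._referents: Dict[str, str] = {}

    def assign(self, location: str, referent: str) -> None:
        """First mention: the speaker places a referent at a location."""
        self._referents[location] = referent

    def resolve(self, location: str) -> Optional[str]:
        """Later mention: pointing at a location retrieves its referent."""
        return self._referents.get(location)


tracker = GestureSpaceTracker()
tracker.assign("centre field", "plagiarised material")   # gesture 1 in (8)
tracker.assign("left border", "source text")             # gesture 2 in (8)

# Gestures 3 and 4 in (8) return to the location set up by gesture 2.
print([tracker.resolve(loc) for loc in ("left border", "left border")])
# ['source text', 'source text']
```

A hearer who has not attended to the original assignment is in the position of calling `resolve` on an unknown location, which returns nothing.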
In each of the examples considered in this section, the (mostly indexical) gestures convey a reasonably straightforward semantic relationship between the referent identified by the gesture and the lexical description that accompanies it. The data has also revealed a range of gestures that expand on the information provided by the lexical component. This type of "complementation" can take two forms: (i) gestures that amplify some aspect(s) of the semantic content conveyed by the lexical component and (ii) gestures that convey information additional to what is conveyed lexically. This then presents the possibility of gestural complexity and how it may facilitate the communicative exchange between the speaker and hearer.
3.2. Iconic complexity
Gestural complexity refers to the correlation between the meaning conveyed by a gesture and the lexical component that it complements. For example, the gesture for take is generally illustrated with one hand scooping an unseen substance or object and drawing it to the body (= "make mine"). This gesture was used to complement talk of taking, stealing, plagiarising, and cheating. Therefore, when the speaker gestures take when talking about plagiarism, it is clear that s/he conceptualises the act of plagiarism as a form of theft. Furthermore, the gesture for take is a good example of conceptual metaphor, involving both the event-structure of taking and the metaphor THOUGHTS/IDEAS ARE OBJECTS. In this case, the gesture take illustrates an event-structure that has its experiential basis from infancy where infants grab what they want and bring it closer to themselves.⁹ In this sense, the take gesture is appropriate, even though the stolen phenomenon (thoughts or ideas) does not have mass and therefore cannot be grasped or displaced. Cognitive Semantics literature abounds with examples of the THOUGHTS/IDEAS ARE OBJECTS metaphor (cf. Lakoff, 1987 - IDEAS ARE ENTITIES: putting ideas into words, sending ideas to other people, getting down ideas, ideas get stolen, etc.); therefore the event-structure illustrated by the gesture take is another consistent usage of this metaphor.

In addition to metaphorical examples such as take, other simple iconic examples illustrate an event-structure metonymically, as in (9). The event-structure associated with making a phone-call is more complex than just holding the phone to one's ear. In this way, the illustration of the telephone mouthpiece and earpiece is metonymic for the entire process of dialling a number and talking to someone on the other end.

↓
(9) you know quickly ringing their mates to ask them the answer to a question...
Right fist raised to Right side of the speaker's face: thumb to the R ear and the pinky to the mouth (thumb = earpiece and pinky = mouthpiece).

⁹ For more examples of the "Experiential Basis of Metaphor", see http://www.ac.wwu.edu/~market/semiotic/metl2.html.
Even though the telephone-call event-structure is more economically illustrated by this associated gesture than the take example above, both examples have a very direct relationship between the gesture's meaning and the speaker's choices in the lexical component. In contrast, more complex examples illustrate gestures that provide more information than is given in the lexical component uttered by the speaker.
3.3. Iconic complexity and Conceptual Blending theory
In order to fully explicate the complexity of iconicity in the gestures found in these data, I will first expand on Conceptual Blending theory and how this relates to the data to be described. Conceptual Blending is a form of conceptual integration that is on "a par with analogy, recursion, mental modelling, conceptual categorisation, and framing" (Turner and Fauconnier, 1998). During this process, the comprehender takes semantic components from two (or more) input sources (typically an entrenched frame and a new context) and produces a new interpretation of that entrenched frame: the blend.

The kind of Conceptual Blending referred to is most frequently illustrated with humour (e.g. single-frame cartoons) where blends are a necessary factor in comprehending the joke (cf. Coulson, 2001; Fauconnier and Turner, 2002). For example, in my bathroom I have an illustration of an overweight woman standing on a set of bathroom scales. She is holding a revolver and both her gaze and the barrel of the gun are directed at the "face" of the scales which would show her true weight. In order for this joke to work, the comprehender must blend the frame-relevant knowledge associated with standing on a set of scales to determine one's weight (= entrenched frame), together with the frame-relevant knowledge of a hold-up (= novel context). Both frames share the generic features of a person doing something to obtain a specific result. The relevant feature of standing on a set of scales is to determine one's weight; the person on the scales has no power or control over the result given by the scales and the scales are not subject to intimidation (being an inanimate object). In contrast, the relevant feature of a hold-up is to force someone to do as the person holding the gun wants (and likely not what they would choose to do themselves).
In the latter case, the person holding the gun has the power to not only influence outcome but also the behaviour of another person through the threat of being shot. The blending of these two frames superimposes the scales' inanimate indifference to the weight-watcher's desired outcome, with a possible world in which the scales are personalised into someone who can be intimidated and hence manipulated to fulfil the "bandit's" (here, weight-watcher's) wishes. This blend is not merely a result but is required to process a novel presentation of a well-known frame. The juxtaposed information together creates a new image that would not be generated by just one of these pieces of information alone.

I suggest here that in similar fashion, the complexity of the gestures given below, although not necessarily novel, enriches the conceptual representations of discourse to be created, and therefore increases the information available to the hearer. The following examples show how gestures that convey complementary but different information to the lexical component encourage a blend of two or more sources of information that would result in a more complete representation of the speaker's meaning than would be conveyed by the lexical component alone.¹⁰ For example, in a case such as (10) the speaker's gesture reveals her attitude toward the crime, not immediately apparent from the lexical component alone.

↓
(10) their...outline...and then that will...er it'll prevent the
↓ ↓
holus-bolus...copying [OF5OF8:308-10]
1-3. Left hand flat and Right hand chopping into middle of palm (= cut out; in F-space)

In (10) the speaker's gesture resembles a guillotine that would stamp out this unwanted behaviour. In this sense the gesture meaning goes beyond what is conveyed lexically (prevent), for which one might have expected a barrier gesture. The enriched representation therefore includes features of prevention (conveyed lexically) together with an element of dire consequences, conveyed by the gesture. Similarly, example (11) reveals two complex iconic gestures, the second being more complex than the first.
In this example, the speaker is talking about an example of plagiarism and the reference, inside, is complemented with a flipping pages gesture which conveys that here she is talking about a book; hence the gesture is indicative of the size of the work referred to. In the next clause, she includes the proposition, they cited, which is complemented by a writing gesture.

↓(1) ↓(2)
(11) um...and then inside they they've they cited... [OF5OF8:694]
1. Right hand in the air, flipping pages (temple height)
2. Right hand, writing in the air, from centre forehead to shoulder height (outside F-space).

This writing gesture (i.e. making squiggles in the air as though holding a pen and writing) is complex because the height and directionality of the gesture conveys that it is not just a citation, but a full length document (in this case, a declaration). The lexical component here is enriched because the first gesture helps clarify the Mental Space (i.e. a thesis), while in the second clause the verbal component conveys the proposition to be processed within it (they cited), but the associated gesture gives detail about the quality of that event (i.e. magnitude). Unlike the humour examples where Conceptual Blending is a necessary component to explain how a joke works, here there is little proof that hearers actually blend the multiple sources of information into a singular representation of discourse information. Nevertheless, these data do reveal that such blending of input sources is necessary to derive a comprehensive representation of the information presented to hearers.

In sum, gestural complexity involves additions to the lexical component of discourse that may have a direct bearing on spatial, propositional, and sometimes interactional dimensions of talk. Unlike the lexical component, which can generally be unambiguously assigned one or other mental space role (i.e. space builder or proposition), gestures often contain components with multiple roles. The remaining issue to better clarify is how gestural cues can help both speakers and hearers to navigate conceptual space.

¹⁰ Coulson, in a personal communication on the applicability of blending theory to the coordination of gestures with speech, explains that juxtaposition is not in itself proof of a Conceptual Blend. I see this as the result of Conceptual Blends being the product of the comprehension process, although likely one that is intended by the speaker - particularly in the case of jokes which otherwise would not work. As a function of the comprehension process, blends are thus outside the speaker's control; speakers can juxtapose sources of information without the hearer taking note of it. Only through further research into exactly how much speakers take in (such as the work done by McNeill and associates) can this point be truly addressed.
4. NAVIGATING MENTAL SPACE AND F-SPACE
Examples such as those given above have shown how speakers sometimes attribute objects or arguments to a specific location in physical (gesture) space, and also how they utilise those locations to help disambiguate when multiple referents are simultaneously "on stage". In addition to this, speakers also make productive use of Inside vs. Outside F-space to amplify the relevance of these referents to (primarily) themselves, in that F-space has as its referential centre, the ego. For example in (12), the topic of discussion is plagiarism, and the relevant subject matter for these speakers is copying written materials. The speaker is a Master's student in the humanities, and in this example she refers to additional contexts: computer data, film or audio, which are all accompanied by gestures that occur outside F-space. In this case, outside F-space gives further illustration of the perceived relevance of these points to the speakers and the central theme of their discussion, i.e. outside F-space equals outside the immediate domain of talk.

(12) so find out what is plagiarism how about things like...um...
↓ ↓ ↓
from um computer data an' I don't know...um film or...
audio information... [OF5OF8:70-76]
1-3. open Right hand moves in circular waves from beside Speaker to further outside desk area (= outside F-space and outside domain of talk)

Similarly, example (13) illustrates how speakers make productive use of the contrast between Inside and Outside F-space. Here the speaker's gesture scoops an unseen substance (= information from undisclosed sources) from Outside F-space, and brings that substance to the speaker's chest (Inside F-space). In this sense, the speaker's gesture conveys a process similar to the take gesture discussed above, i.e. make mine.

↓
(13) so...he just couldn't...turn that information around... [OF5OF8:263-4]
1. both hands, palms facing Speaker, rotating from away from Speaker, up and over the other hand to closer to Speaker, several times (= mixing Outside F-space into F-space)

The data has provided a number of such examples where gestures occurring Inside vs. Outside F-space correlate with the gap between (i) the focus of discussion and the ego and (ii) external information or sources to the focus of discussion. In another such example the speaker gesturally takes information from a number of sources (all Outside F-space, made with a full arm stretch around the circumference of the desk before her) and then makes a mixing gesture with her hands landing on her chest, to bring that information to the ego and within F-space. In these cases, the locations of her gestures amplify both subject matter that is Inside vs. Outside F-space, as well as "relevant to me" vs. "not relevant to me".
5. DISCUSSION/CONCLUSIONS
In this chapter I've devoted most energy to illustrating how a speaker's choice of gesture not only serves to amplify the lexicalised information presented to hearers, but also enriches that information by adding dimensions that might otherwise not be conveyed. This extra dimension in some cases illustrates the mental space in which a proposition is to be processed, such as the flipping pages gesture that denotes a book (in which a declaration was made) or by the strategic use of F-space that conveys the relevance of the subject matter or argument to the speaker (or the arguments that they put forward). I've also tried to include how the conceptual integration of these sources of discourse information is important for a hearer to fully comprehend speaker-meaning (as suggested in Blending theory). Although discourse analysis cannot prove that a blending of verbal and gestural information actually occurs, it is entirely clear that without such blends, a hearer's interpretation would fall far short of the information made available to them.
REFERENCES
Chafe, W., 1994. Discourse Consciousness and Time: The Flow and Displacement of Conscious Experience in Speaking and Writing. Chicago University Press, Chicago, IL.
Clark, H.H., 1996. Using Language. Cambridge University Press, Cambridge, MA.
Clark, H.H., 1997. Dogmas of understanding. Discourse Process. 23, 567-598.
Coulson, S., 2001. Semantic Leaps: Frame-Shifting and Conceptual Blending in Meaning Construction. Cambridge University Press, Cambridge, MA.
Fauconnier, G., 1985. Mental Spaces: Aspects of Meaning Construction in Natural Language. MIT Press, Cambridge, MA, rev. ed.: Cambridge University Press, 1994.
Fauconnier, G., Sweetser, E., 1996. Spaces, Worlds, and Grammar. Chicago University Press, Chicago, IL.
Fauconnier, G., Turner, M., 2002. The Way We Think: Conceptual Blending and the Mind's Hidden Complexities. Basic Books, New York, NY.
Fillmore, C.J., 1982. Frame semantics. In: Linguistics Society of Korea (Ed.), Linguistics in the Morning Calm. Hanshin, Seoul, pp. 111-137.
Frege, G., 1970/1892. On sense and reference, Translations from the Philosophical Writings of Gottlob Frege. Blackwell Publishing, Oxford.
Grice, P., 1975. Logic and conversation. In: Cole, P., Morgan, J.L. (Eds.), Syntax & Semantics, Vol. 3, Speech Acts. Academic Press, London.
Lakoff, G., 1987. Women, Fire, and Dangerous Things: What Categories Reveal About the Mind. Chicago University Press, Chicago, IL.
Lambrecht, K., 1994. Information Structure and Sentence Form. Cambridge University Press, Cambridge, MA.
Luchjenbroers, J., 1993. Pragmatic inference in language processing, Unpublished doctoral dissertation, La Trobe University, Melbourne, Australia.
Luchjenbroers, J., 2000. Cognitive strategies for mutual ground construction. In: Verhagen, A., van de Weijer, J. (Eds.), Language & Cognition Conference. Leiden University, The Netherlands, to appear in Levels in Language and Cognition.
Luchjenbroers, J., 2002. Prosodic and gestural cues for navigations around mental space, Proceedings of the 27th BLS Conference. University of California Press, Berkeley.
McNeill, D. (Ed.), 2000. Language and Gesture, Series, Language, Culture & Cognition. Cambridge University Press, Cambridge, MA.
Montague, R., 1974. Formal Philosophy: Selected Papers of Richard Montague. Yale University Press, Cambridge, MA.
Sweetser, E., 1990. From Etymology to Pragmatics: Metaphorical and Cultural Aspects of Semantic Structure. Cambridge University Press, Cambridge, MA.
Tomlin, R. (Ed.), 1987. Coherence and Grounding in Discourse. Benjamins, Amsterdam.
Tomlin, R., Forest, L., Pu, M.-M., Kim, M.H., MS. Knowledge Integration and Information Management in Discourse. Also available at http://logos.uoregon.edu/uoling/faculty/tomlon/KI&IM/KI&IM.html, downloaded November 2001.
Turner, M., Fauconnier, G., 1998. Conceptual Integration Networks. Downloaded from http://www.humaniora.sdu.dk/~thewaywethink/encyclo.htm ("Blending and Conceptual Integration").
Studies in Multidisciplinarity, Volume 2
Editor: G. Malcolm
© 2004 Elsevier B.V. All rights reserved.
12
Sounds, signs, and rapport: on the methodological importance of a multi-modal approach to discourse analysis¹
P. Carroll ~, J. Luchjenbroers a and S. Parker b
a Department of Linguistics, University of Wales, Bangor, Gwynedd LL57 2DG, Wales, UK
b Materials Science LTSN, Liverpool University, Liverpool L69 3GH, UK
This paper reports on continuing research into interpersonal rapport and what features of discourse are predictive thereof. The notion of "rapport" stems back to the work by Bandler and Grinder (1976), who argued that representational predicate matching - i.e. when discourse participants make semantically similar choices of verbal predicate - is a primary factor in establishing rapport. Parker and Carroll (2001) set out to test the hypothesis that speakers' choices of sentential predicate can provide persuasive predictive evidence of interpersonal rapport, using only transcript data. In parallel work on gestural and paralinguistic cues in discourse processing, Luchjenbroers (2004) identified a number of speaker strategies that serve to facilitate discourse comprehension, and thus, at least potentially, interpersonal rapport between discourse participants. The current research agenda was then expanded to include gestural and paralinguistic features of discourse practice to provide a further means of testing the hypothesis that representational predicate choices are predictive of interpersonal rapport. The research question engaged here asks whether observed rapport can be predicted from a textual analysis of predicate terms used in dialogue, and the multimodal approach taken in methodology can help clarify whether any one mode is sufficient in itself to predict rapport.

¹ The research drawn upon in this paper was supported by a postdoctoral and a New Staff grant to June Luchjenbroers from the University of Queensland (Australia).
1. BACKGROUND
1.1. Discourse processing
Interpersonal rapport between discourse participants is likely the result of a range of personality and discourse factors, many of which go beyond the brief of our research. However, one major factor for establishing and revealing rapport of linguistic interest is the extent to which speakers coordinate the lexical semantics of their contributions. This kind of coordination conveys to an interlocutor the speaker's understanding of the other speaker's arguments and position with regard to the subject matter of discourse. This kind of linguistic coordination can thus indicate that both/all speakers' attitudes and understanding of the subject matter being discussed are in accord. This kind of accord is of particular interest to discourse theory as it implies that two or more discourse participants can achieve a mutual understanding of talk. The theoretical construct concerning mutual understanding is commonly referred to in linguistic literature as "mutual ground" or "common ground" (cf. Clark, 1993, 1996, 1997) and is used to refer to conceptual representations of discourse information that each speaker presumes is shared. However, mutual ground as an explanatory, theoretical tool suffers because discourse theorists have not been able to overcome the subjectivist argument: "I cannot feel your pain and you cannot think my thoughts" (cf. Luchjenbroers, ms-a). The profound difficulty in accounting for how speakers manage (or not) to convey their intended meaning, in a way that adequately integrates into a hearer's conceptual representation of discourse, has repeatedly brought discourse analysts back to the theoretical construct of mutual or common ground because no account of discourse can proceed without it, despite the apparent fact that "mutual" conceptual representations are logically impossible. 
It is thus apparent that even though discourse proceeds with the goal of achieving a mutual understanding of discourse information, there is no guarantee that one speaker's conceptual representation of discourse information is consistent, or even compatible with, the representations held by other participants (cf. Luchjenbroers, ms-a). This uncertainty thus requires all participants to utilise as many cues as possible (verbal, auditory and visual) to maximise each speaker's success in being appropriately understood as well as to ensure a hearer's success in appropriately
understanding the speaker's intended meaning. Luchjenbroers (ms-a) offers a range of lexical speaker strategies (e.g. feedback requests, repetitions) that clearly have the intended function of maximising a mutual understanding of discourse information. The current work on semantic predicate choices offers additional overt evidence of how discourse participants seemingly circumvent the logical impossibility of mutual ground, and pursue a criterion of sufficient evidence for believing they have achieved a mutual understanding of discourse. The importance of speakers believing they have mutual ground with their addressees directly affects their ability to provide new information in appropriately sized chunks for their addressees' comprehension: if the chunks are too large, comprehension will break down. Thus each sentence is composed of expected, known or given information (= context) with very little new or newsworthy information, which speakers reliably place in expected syntactic positions to facilitate comprehension. Speakers therefore need to monitor this discourse-building process to be able to estimate their success in providing the right information in the right format. Similarly, a cooperative addressee will help this process by providing as many cues as possible of how and what they understand talk to be about.

Assuming that more information may reduce the number of possible interpretations the participants could make of the presented information, viewing the accompanying gestural movements (e.g. facial expression, body position) occurring in talk may serve to limit or delimit the interpretation of the verbal component. For example, the information contained in video data, while not easily quantifiable, does provide a far greater degree of richness of contextual information. Additionally, non-verbal cues can shed light on the reliability of linguistic cues identified by discourse theorists.
Within the discipline of linguistics, modern discourse analysts are now turning to visual cues to complement their functional analyses of discourse (cf. Tomlin et al., 1997; Van Dijk, 1997). Despite several decades of transcript discourse analysis, van Dijk (1997, pp. 6-7) now argues that an analysis of the visual dimension of discourse is indispensable. Today's rapid technological advances in computer and web-based video software have opened the way for comprehensive qualitative research, encompassing the lexical with the gestural and prosodic features of talk. However, the lack of an accepted convention in analysing visual data (i.e. how to code and define non-verbal units of interaction) inhibits a full use of visual/audio data in qualitative research. Other disciplines, however, have an advantage over linguistics in that their tradition of using visual information is long established. One such field relevant to this paper is Neuro-Linguistic Programming (NLP).
1.2. Rapport and neuro-linguistic programming

The approach termed NLP attempts to link cognitive processing theories with linguistic utterances. It utilises a transformational approach based on Chomsky's (1957, 1965) "Deep Structure of Language" models. Despite its popularity in management training and therapeutic circles, NLP has little academic currency. Much of the content of NLP training and certification programmes has gained an uncritical acceptance without supporting empirical studies. However, some aspects of NLP have attracted attention in some academic disciplines (e.g. education). In particular, the hypothesis that empathy between discourse participants is enhanced through the matching and mirroring of discourse behaviours, which in turn creates rapport, is relevant to the present work.

According to NLP principles, communication can be considered as having two parts or levels, each of which provides information about the dialogue. These parts are called the Content and the Relationship Messages. Bandler et al. (1980, p. 115) describe the content as that part that is conveyed by the verbal (thought to be "digital") portion of communication, while relationship messages are conveyed by the nonverbal (described as "analogue") features of discourse. Analogue features thus include: body posture, motion, tonality, and message tempo (Bandler and Grinder, 1976, pp. 33-36). There is a lot of information about the speaker's relationship to the interlocutor contained in the structuring of the analogical message. These messages inform the interlocutor about how speakers structure their conceptions and representations of the world, and how speakers position themselves in relation to discourse information as well as their interlocutors. NLP also embraces the idea of conceptual "maps", which are representations of each individual's sensory experiences.
According to NLP principles, we each individually construct a "Model of the World", which is utilised to act on and interpret our experiences. By behaving in similar ways and structuring our dialogue to match that of our interlocutors, we present ourselves as sharing a particular model of the world. Rapport is then achieved by matching the structural features of an interlocutor's analogical relationship messages to create a sense of empathy. This shares some features with Giles and Coupland's Accommodation Theory (cf. Giles et al., 1991). Proponents of NLP would thus expect individuals to experience more rapport with those who exhibit the same or a similar "model of the world" to themselves, and this would manifest itself as "pacing": i.e. matching and mirroring of conversational behaviours. These behaviours include nonverbal phenomena, such as: gesture, body posture, and facial expression, as
well as verbal phenomena, such as representational predicate choices. Dilts (1983, p. 7) describes this as an interlocutor becoming synchronised with the speaker's own internal processes. Hence, NLP postulates that rapport in face-to-face communication is established and maintained by a matching of the linguistic and behavioural modes used by discourse participants, and more specifically, that a matching of predicate modes leads to rapport. In this paper we use the following definitions:
Rapport is when all parties make lexical and gestural choices that enable them to infer they have a mutual understanding of the subject matter being discussed.

Representational Predicates include the use of (generally embodied) Conventional Metaphors that take sensory modes as a source domain and map them onto conceptual representations in discourse. In "classic" NLP notation there are four main predicate modes: Visual (e.g. I see...), Audial (e.g. I hear...), Kinaesthetic (e.g. I move...), and Internal Dialogue (e.g. I believe...).

Defining rapport in terms of predicate matching and then using predicate matching to identify rapport is a tautology. We predict that rapport will become more easily identifiable with audio-visual data. Therefore, in addition to the generally accepted criteria for recognising rapport, we will test whether visually identifiable markers of rapport occur concurrently with periods of predicate matching.
2. DATA AND RESEARCH METHODOLOGY
The data used for this research draws on a single case study involving two Australian university students in a mixed-sex dyad. The discourse consists of negotiated talk on the topic of plagiarism. The subjects in this case study, "Harry" and "Lynn" (pseudonyms), are postgraduate students, aged in their late 20s to early 30s. They had not met each other before this occasion, although they had both already participated in a similar recording before, with another participant discussing another topic: cheating.
2.1. Procedure
Subjects were placed in a sound-proof room, positioned diagonally across from each other to encourage them to look at each other but away from the video camera, placed at a substantial distance from the discourse.
The subjects were separated by a desk that had the printed task taped to it, to avoid added noise through paper shuffling. They were asked to devise a set of guidelines, to be given to faculty, about how students should avoid the pitfalls of plagiarism. The data was video taped for later transcription and the analysis was subsequently performed in three stages.
2.1.1. Analysis: Stage One
The first stage involved only the textual transcription of talk. The representational predicates used by each participant were first identified, and then those stretches of discourse in which both speakers matched their choices of representational predicates were identified to predict the occurrence of rapport. All representational predicates were coded into one of the following five predicate categories: (i) Visual, (ii) Audial, (iii) Kinaesthetic, (iv) Discourse and (v) Conceptual Predicates. Examples of these five categories are given in table 1, with text examples below. In this study we decided to split the original NLP category, Internal Dialogue, into two categories: Discourse and Conceptual Predicate. The basis for this decision was our observation that the abstract nature of the subject matter (i.e. plagiarism) frequently leads conversation into Internal Dialogue predicate usage: e.g. I'd say that..., so what you're saying is..., I think that..., that makes sense. Hence, splitting the original NLP category offers a greater degree of discrimination for predicate matching.

Table 1. Predicate types
Predicate mode      Examples
A. Visual           See, Look, Perspective
B. Audial           Sounds like, Rings true, In tune
C. Kinaesthetic     Feels, Get a hold of, Move to
D. Discourse        Said, Talked about, Described
E. Conceptual       Conceive, Think, Identify

Examples of the five predicate types in table 1 are:
(A) (i) They might see that as an attractive way of doing what they were doing anyway...
    (ii) The first thing that occurs to me looking at that again is that um if you're not really clear on English then...
(B) (i) sounds like plagiarism when you put it that way
    (ii) yeah I've heard of yeah it's almost the classic joke isn't it
(C) (i) don't know maybe we I feel like we sort of missed a the first bit
    (ii) I don't know whether lecturers go around checking up on people's...
(D) (i) what about say if you were um doing an assignment I yeah as I said I I used to plagiarise a little
    (ii) if you're talking about assignments and um you know written things
(E) (i) there was this I think there might have been something
    (ii) I believe they design those so that they can actually ha- um apply some sort of statistical test to ah ah

NLP theory suggests that people have a preferred mode, or an orientation toward a particular mode, and that the matching of predicates according to the preferred modes is a major factor in creating and maintaining rapport. For instance, one person may prefer visual structuring, whereas others may prefer auditory or kinaesthetic structuring. There is little in the way of evidence to back the idea of genetic preferences. There is, however, evidence to suggest that in particular contexts and specific situations, people adopt preferential representational strategies that they adapt appropriately to the situated discourse in which they are involved. This may be due in part to the propensity to converge models and linguistic and non-verbal discourse strategies in participants' efforts to create mutual ground between interlocutors.
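The Stage One coding step can be illustrated with a small sketch. It is not the procedure used in this study, which relied on the researchers' judgement; it assumes that a predicate mode can be assigned by looking up the cue phrases listed in table 1, and the names `PREDICATE_CUES`, `code_utterance` and `usage_percentages` are hypothetical. The lexicon holds only the table 1 examples and does no lemmatisation, so "I feel" would not match the cue "feels".

```python
import re
from collections import Counter

# Cue phrases copied from table 1 (lower-cased). A usable lexicon would be
# much larger; this one is only for illustration.
PREDICATE_CUES = {
    "Visual": ["see", "look", "perspective"],
    "Audial": ["sounds like", "rings true", "in tune"],
    "Kinaesthetic": ["feels", "get a hold of", "move to"],
    "Discourse": ["said", "talked about", "described"],
    "Conceptual": ["conceive", "think", "identify"],
}


def code_utterance(utterance):
    """Return the predicate modes whose cue phrases occur as whole words."""
    text = utterance.lower()
    modes = []
    for mode, cues in PREDICATE_CUES.items():
        if any(re.search(r"\b" + re.escape(cue) + r"\b", text) for cue in cues):
            modes.append(mode)
    return modes


def usage_percentages(utterances):
    """Percentage of coded predicates per mode for one speaker's utterances."""
    counts = Counter(mode for u in utterances for mode in code_utterance(u))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {mode: 100.0 * n / total for mode, n in counts.items()}


if __name__ == "__main__":
    # Text examples (A)(i) and (B)(i) from above.
    sample = [
        "They might see that as an attractive way of doing what they were doing anyway",
        "sounds like plagiarism when you put it that way",
    ]
    for line in sample:
        print(code_utterance(line), "<-", line)
    print(usage_percentages(sample))
```

Per-speaker percentages produced this way could then be compared across stretches of talk, which is the kind of comparison reported in the results.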
2.1.2. Analysis: Stage Two
The second stage of analysis focused solely on the Audial and Visual stimuli captured in the video recording. The audio-visual cues taken into account include features such as body positioning, eye contact, and gestural cues, as well as supra-segmental features of discourse such as speech tempo and volume, laughing, and sighing. The visual data for each speaker was then coded according to whether it suggested rapport or non-rapport. For example, rapport was taken to be evident when participants shared eye contact, mirrored body position, utilised similar gesture spaces,² and displayed similar speech tempo and volume, while periods of non-rapport were signalled by a marked change in any of these features and possibly others such as a deep sigh and periods of silence and immobility. Observable behavioural indicators of rapport fall into two types: Body movement and Audial qualities, given in table 2. Many of these features were also put forward by Bretto (1989).

Table 2. Behaviours that discourse participants can "mirror" and "match"
Body mirroring and matching     Vocal/verbal mirroring and matching
Body posture                    Tempo of speech
Hand gestures                   Volume of speech
Facial expressions              Auditory tone
Weight shifts                   Highly valued descriptors
Breathing                       Phatic and back-channelling utterances
Movement of feet
Eye movements/gaze space

² See Luchjenbroers (2004) for a full description of the gesture spaces used in these data.
2.1.3. Analysis: Stage Three
The third stage involved making correlations between the results of the first and second stages of analysis to determine to what extent the audio-visual measures of rapport match those predicted from the textual analysis of representational predicate choices. This process required a re-examination of the video data in terms of personal image. During earlier examinations of the data it emerged that certain utterances and non-verbal behaviours were eliciting adverse interpersonal reactions between the protagonists. The video data was coded into "events" where Personal Image was compromised and these events were compared and contrasted with the video coding for evidence of rapport. Factors emerged from the comparison, which made clear that the interlocutors were positioned differently within the discourse. For example, they held different values about plagiarism and had differing experiences regarding copying.
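One way to picture the Stage Three comparison is as an agreement check between two sets of utterance spans: those where the textual analysis predicts rapport and those where the video coding shows it. The sketch below is only an illustration under that assumption and is not the procedure followed in this study. The function names are hypothetical, and the spans in the usage example are toy values, not data from the recording.

```python
def covered_utterances(spans):
    """Expand inclusive (start, end) utterance spans into a set of numbers."""
    covered = set()
    for start, end in spans:
        covered.update(range(start, end + 1))
    return covered


def agreement(predicted, observed, n_utterances):
    """Share of utterances where prediction and observation coincide.

    An utterance counts as agreement when both codings mark rapport
    or both mark its absence.
    """
    pred = covered_utterances(predicted)
    obs = covered_utterances(observed)
    same = sum(
        1 for u in range(1, n_utterances + 1) if (u in pred) == (u in obs)
    )
    return same / n_utterances


def predicted_not_observed(predicted, observed):
    """Utterances where rapport was predicted from text but not seen on video."""
    return sorted(covered_utterances(predicted) - covered_utterances(observed))


if __name__ == "__main__":
    # Toy spans over a ten-utterance dialogue (illustrative only).
    toy_predicted = [(1, 4), (7, 10)]
    toy_observed = [(1, 4)]
    print(agreement(toy_predicted, toy_observed, 10))
    print(predicted_not_observed(toy_predicted, toy_observed))
```

Listing the utterances where rapport is predicted but not observed points the analyst to the stretches of video that need closer qualitative examination.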
3. RESULTS
3.1. Results of the three stages
Data coding was examined in terms of percentage usage and sequential patterns of usage. Those portions of the transcript that exhibited higher degrees of predicate matching were marked as predictors of rapport. Conversely, portions of the transcript where a mismatching of predicates occurred were marked as predictors of loss or lack of rapport. Idiomatic utterances such as "you know" were excluded from the textual analysis, as were phatic, filler and back-channelling utterances. In addition, Audial predicates were excluded as their usage was minimal (two instances).
Table 3. Representational predicate coding by percentage usage
                  K     C     V     D
No rapport
  Harry          52    32    12     3
  Lynn           35    20     2    43
  Difference     17    12    10    40
Rapport
  Harry          58    22     9    10
  Lynn           49    25    10    16
  Difference      9     3     1     6
K, Kinaesthetic; C, Conceptual; V, Visual; D, Discourse.
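The Difference rows of table 3 are the absolute differences between the two speakers' percentages for each mode, and they can be recomputed from the speaker rows. The sketch below does this; the summed mismatch it also reports is an illustrative summary assumed here for convenience, not a measure used in the original analysis, and the names `TABLE_3`, `mode_differences` and `total_mismatch` are hypothetical.

```python
# Percentages copied from table 3 (K, C, V, D as in the table legend).
TABLE_3 = {
    "No rapport": {
        "Harry": {"K": 52, "C": 32, "V": 12, "D": 3},
        "Lynn": {"K": 35, "C": 20, "V": 2, "D": 43},
    },
    "Rapport": {
        "Harry": {"K": 58, "C": 22, "V": 9, "D": 10},
        "Lynn": {"K": 49, "C": 25, "V": 10, "D": 16},
    },
}


def mode_differences(speaker_a, speaker_b):
    """Absolute percentage-point difference per predicate mode."""
    return {mode: abs(speaker_a[mode] - speaker_b[mode]) for mode in speaker_a}


def total_mismatch(speaker_a, speaker_b):
    """Sum of the per-mode differences (an assumed summary, see lead-in)."""
    return sum(mode_differences(speaker_a, speaker_b).values())


if __name__ == "__main__":
    for condition, rows in TABLE_3.items():
        diffs = mode_differences(rows["Harry"], rows["Lynn"])
        print(condition, diffs, total_mismatch(rows["Harry"], rows["Lynn"]))
```

The per-mode differences are smaller in every mode in the sections coded as "Rapport", which is the pattern the table is meant to show.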
Table 3 shows the comparisons of percentage usage in the sections coded as "No Rapport" and "Rapport". It illustrates that periods occur in these data where speakers match their choices of representational predicates, and other sections where they clearly do not. Interesting to note is that Harry does not appear to alter his choices much throughout the data. There is some evidence of a reduction in Conceptual predicates (e.g. I think/believe...): 32 → 22%, in favour of an increase in primarily Discourse predicates (e.g. I say/tell...): 3 → 10%. However, it is clear from table 3 that far greater accommodation is made by Lynn, who shows a marked increase in her use of Kinaesthetic predicates (e.g. I feel...): 35 → 49% and Visual predicates (e.g. I see...): 2 → 10%, but a highly marked decrease in her use of Discourse predicates (e.g. I say/tell...): 43 → 16%. In effect, Lynn reduces the percentage usage of her preferred Discourse predicates and increases her use of all other predicate modes, particularly the Kinaesthetic mode, which is Harry's preferred mode. It seems that Lynn is trying hard to repair rapport.

On the basis of predicate matching, those sections where rapport was expected to be observed could be mapped. This was subsequently compared to the results of Stage Two analysis: audio-visual cues of rapport (see fig. 1). In fig. 1, the top line indicates the durative aspect of talk; the middle line captures the periods of predicate matching found in the textual analysis; and the bottom line shows sections of talk where Harry and Lynn visually appeared in accord. The results given in fig. 1 show that for the first half of this interaction there is a match between predicted and observed periods of rapport. However, after the second break in rapport (both predicted and observed), rapport is not regained despite predictions based on predicate matching. Video data shows evidence of attempts to re-establish rapport, which are unsuccessful.
Fig. 1. Periods of rapport predicted from the textual analysis as compared with those determined by third party observations from the video data. [Horizontal axis: utterance number, 1 to 739; rows: textual analysis, video analysis; marked spans: rapport.]
Something has happened at the end of the second period of observed rapport that has a stronger effect on re-establishment of rapport than that of predicate matching. To ascertain what factors influenced the loss of rapport during the second half of this interaction, the video data was then further examined using both a situated discourse perspective (cf. Davies and Harre, 1991; Saljo, 1997) and a discourse levels perspective (cf. Luchjenbroers, 2002a). Following a full analysis of the video data, six video clips were selected as showing the factors influencing establishment of rapport. These are described in the following section.
3.1.1. Additional results: video evidence
Clip 1.³ This shows the starting body positions of both participants. Lynn is leaning forward, while Harry is sitting back; both have their ankles crossed. These positions are maintained throughout most of the dialogue. They only changed upper body positions and leg positions at times when significant events occurred, as illustrated by the clips following. Harry begins the dialogue with a disclosure that he used to plagiarise at school. Lynn responds with a "girlie" giggle, and challenges the validity of Harry's statement: "You actually plagiarised...?" [line 7]. They have eye contact, similar tempo and volume, and are both smiling.

³ Ideally we would like to include the video clips in an electronic or web-based version of the publication. This is not possible as permission from the subjects was not granted (nor asked for) for the publication of video footage.

Data excerpt 1 [lines 1-7]:
Harry: plagiarism this is a bit of a sore point for me since ah I kind of ah used it to get by a bit in ah my high school even though I didn't probably didn't need to but I was a bit lazy
[laugh]
Lynn:  you actually plagiarised at school [7]
Evident from the analysis of other Australian participants in the study from which this case study was drawn is that males often proclaim how they are guilty of these misdeeds, whereas all the Australian women only declare how they never would, and maybe how they were almost guilty, but managed in the end to do the right thing (cf. Luchjenbroers, 2002b). However, despite this faux pas of Harry's, visual cues show that the participants came to this task with a general willingness to like the stranger they are talking to.

Clip 2. Very early in talk, Lynn focuses on the discourse task [lines 16-17], whilst looking at the printed task taped to the desk. Harry moves forward, and gives a generalised example of "taking something published and sort of reusing it..." [lines 18-20]. At this point he moves back to his "usual" position to finish his utterance. Lynn's contribution follows on from the theme set up by Harry. The researchers coded these opening sequences as an attempt at establishing rapport.

Data excerpt 2 [lines 16-23]:
Lynn:  soo...plagiarism plagiarism kind of acts be instances of plagiarism? [16-17]
Harry: OK so ah there's one case ah as I said um taking something published and ah sort of re-using it and pretending you came up with ah the the ideas form or whatever [18-20]
Lynn:  what about say if you were um doing an assignment and you were discussing with someone they come up with an idea but you put it in your assignment [21-23]
Harry and Lynn continue to explore different instances of plagiarism for some lines. They are both animated with upbeat tempo, are engaged with the content of their discussion, are listening to each other (i.e. have relevant, speedy responses), have eye contact, and appear amiable and well disposed toward each other.

Clip 3. The third clip begins at line 61. Harry is leaning forward. Lynn describes a continuum of "copying exactly" to "something that's paraphrased" [lines 61-66], with an accompanying gesture; she moves both hands from left to right in front of her, starting and finishing outside F space. Harry indicates he is listening by uttering a back-channel "yep" in line 64. A few lines later Harry utilises the same continuum in gesture space whilst saying: "at one end is plagiarism then it sort of shades into something else" [lines 80-81]. The accompanying gesture mirrored that of Lynn's. He moved his hands across gesture space, from his left to right, corresponding to where Lynn had begun her gesture in her gesture space. These sequences were coded as examples of establishing rapport. Harry is leaning slightly in toward Lynn for much of these sequences.

Data excerpt 3 [lines 61-69 and 77-81]:
Lynn:  so like um if you're talking about plagiarism is there like like there's got to be a line drawn somewhere between you know there's the extreme of copying exactly to [61-63]
Harry: F- yep [64]
Lynn:  something that's paraphrased or something that looks similar you know how would you be able to tell if it was [65-66]
Harry: I was doing ah writing assignments type class with ah student services ah and ah yeah they tried to explain how the various shades OK it had ah um verbatim ah copying and ?? as if it was your own [67-69]
...
Harry: I suppose this was this wasn't only talking about plagiarism specifically but was talking about the various kinds and ways you can treat your source material and at the one end is plagiarism and then it sort of shades into something else [77-81]
Clip 4. This shows the first instance of failure to establish rapport. Harry spends over a minute talking about an article he is writing for a student magazine. He ends talking about how he might reference some of the items he is using in the article. About halfway through his monologue Lynn appears to lose interest. Eye contact between them ceases, Lynn looks at the paper which has the task on it rather than at Harry. She fiddles with her pen and toward the end she touches her hair, which is an idiosyncratic behaviour that the researchers identify as indicative of her discomfort. In the transcription Lynn makes an unintelligible utterance at line 137, accompanied by an iconic gesture of brushing something away. The researchers, after listening many times to the audio tape, assume she is saying "forget it anyway". Whatever she is saying the gesture is certainly dismissive. After this loss of rapport, Lynn brings the talk back to the task level: "but if we were to devise guidelines for academia ..." [line 138]. Later the interlocutors again attempt to establish rapport, and by line 165 have succeeded. They have more or less equal floor space and display all the markers for rapport as already mentioned; they frequently match body position, with Harry leaning in on occasion in a display of attentiveness; they have eye contact, matched tempo, breathing, tone, and volume.
Around line 224 rapport is again lost. There is a brief silence and Harry attempts to review their progress. Lynn brings up the Copyright Act [lines 237-238], which reveals her basic orientation toward the topic. Harry responds with a continuation of his position that there is a difference between copyright and plagiarism. He dominates the discourse space until line 304 when Lynn tries to bring the topic back to the task. They go through an establishment phase, then Lynn focuses once again on the task [line 321]. At this point it is clear that Harry and Lynn are essentially operating at different discourse levels: while Harry operates at the anecdotal level of specific instances of plagiarism, Lynn is clearly more comfortable at the task level. Rapport is re-established by line 325 where Harry asks Lynn for anecdotal examples, when the topic turns to students from other cultures for whom English is a second language. At line 385 rapport is lost. There is a silence, Lynn touches her hair, Harry takes the conversation back to a former topic. Rapport is almost regained but then Lynn asks "Why is plagiarism wrong?" [line 405]. This marks a significant pause and Harry looks like he needs to find the energy to continue. Immediately following this Lynn makes clear that plagiarism is theft. Harry is visibly crestfallen at this point. Harry previously disclosed that he had used plagiarism to get by at high school.

Data excerpt 4 [lines 399-405 and 408-416]:
Speakers, in order (Harry, Lynn, Harry, Lynn, Lynn) [399-405]:
  I mean just the fact that the same articles seem to keep reappearing every year makes you wonder [laugh] unless the same person's putting them in each year yea- yeah ah what else why is plagiarism wrong?
Speakers, in order (Harry, Lynn, Harry, Lynn) [408-416]:
  I'd say you've got two points there you've got plagiarism is wrong because it it's bad f- um you're doing things against that author you know like you you're saying yep you're taking their ideas away from them F-you're Yep and secondly you're doing yourself a disservice
Clip 5. At line 452, Lynn outlines how the penalties for plagiarism need to be included in the guidelines they are devising. At line 457 Lynn's laugh is embarrassed. At this point she has called plagiarisers "small people", which must also apply to Harry. Harry smiles as well but his face then becomes impassive. He appears to stop breathing and to freeze. Lynn realises something has happened and struggles to continue to talk. She moves her feet and kicks the microphone. The silence between line 460 and Harry's sigh lasts 14 s, which is a long silence in this dialogue. It is uncomfortable to watch as the interlocutors are themselves very emotionally involved at that point. Lynn appears embarrassed and Harry appears upset. He sighs heavily and repeatedly over the next few lines. Lynn also sighs. It is impossible to determine whether it is through empathy with Harry or not. Lynn's tempo slows, her tone drops lower; she is conciliatory as she tries to repair. She takes the topic back to the one that was under discussion when they were last in rapport, deflecting the topic away from the personal consequences that face (would-be) plagiarists. Rapport is never re-established in the remainder of the dialogue, although many attempts are made.

Data excerpt 5 [lines 452-465]:
Lynn:  this is something that's able to be penalised, you've also got to give guidelines of how they're going to be penalised and I think when we're saying why it's wrong you know ylisting both the penalties involved and how it um makes? them? for a small person [laugh] no but um...um and the I don't know...the benefits of no you can't let mm ...
Harry: yes hhh (SIGH)
Lynn:  so what are we going to say to people who who do have difficulty with with English?
Harry: hmm um
Clip 6. This shows the last minutes of the dialogue. Harry looks fed up; Lynn is talking and Harry is no longer engaged at all. He is watching for "time's up". The researcher enters and immediately asks for feedback on the interaction. It is only at this point that Lynn finally acknowledges her personal bias: her relatives are currently suing for breach of copyright.

Data excerpt 6:
Researcher: Did you feel you had any common ground on that?
Harry, then Lynn: er no ...I think we've got different backgrounds for it where our ideas are coming from ok for example my family is currently suing for breach of copyright
Harry: I'm surprised you didn't bring that in earlier

Harry points to Lynn as he declares she should have mentioned that sooner. His voice is loud and he moves from leaning on the table to leaning right back in his chair, and looks from the researcher to Lynn. He is obviously surprised and enlightened by this new piece of information. The researchers concluded that the absence of this piece of information was an important factor in Harry and Lynn's failure to achieve integration. Its absence had led to his inability to process the topic focus on copyright that Lynn repeatedly tried to steer toward.
4. DISCUSSION
In order to explain rapport, the discourse analyst needs to include evidence of how interlocutors coordinate their conceptual representations of the subject matter under discussion. For this reason, some researchers have turned to speakers' choices of representational predicates. However, subsequent analyses have illustrated that the linguistic medium is but one predictor of rapport, and not always a reliable one. The full explanation of the ebb and flow of rapport during this interaction given in section 3.1.1 makes it clear that one linguistic feature, such as predicate matching, can be an unreliable measure. In this dialogue it was observed that during the later part of the discourse rapport was predicted but not observed. The full explanation reveals that this is primarily due to Lynn trying to repair the damage of her remarks by going back to safer ground (the task level). Inadvertently, this also has the effect of speakers finding those predicates (i.e. arguments) that worked before, and hence implying rapport where there clearly, and visibly, was none: Harry was no longer in the mood.

The Co-operative Principle put forward by Grice (1975) postulates that interlocutors in a discourse will generally attempt to work co-operatively in exchanging information. The explicit structure of rapport can be deduced using quantitative behavioural and linguistic approaches, as outlined in the original analyses (Stages One to Three). However, the implicit structure of rapport (the cognitive and conceptual components) is inferred from observation and an understanding of the social practices pertaining to discourse, thus making clear the value of a multi-modal approach to this discourse: without it a very different picture of interpersonal rapport would have emerged. Additionally, the NLP position that predicate matching results in rapport could not have been tested, and shown to be in error.
The results gathered in this case study have revealed that periods of representational predicate matching between speakers are concomitant with, but not necessarily indicative of, rapport, nor is such matching the only factor in establishing and maintaining rapport.
The situated discourse analysis of the video suggests positioning (the drive to "save face") is a significant factor in establishing and maintaining rapport, and personal image is significant in the discourse positioning of the interlocutors. Discourse positioning is a dynamic process of the construing of one's "self" in the face of others. If the positioning of the other is unknown then self-construal within the discourse can become problematic.
REFERENCES

Bandler, R., Grinder, J., 1976. The Structure of Magic II. Science and Behaviour Books, California.
Bandler, R., Grinder, J., Delozier, J., Dilts, R., 1980. Neuro-linguistic Programming, Vol. 1. Meta Publications, Cupertino, CA.
Bretto, C., 1989. A Framework for Excellence: A Resource Manual for NLP. The Centre for Professional Development, Santa Cruz, CA.
Chomsky, N., 1957. Syntactic Structures. Mouton, The Hague.
Chomsky, N., 1965. Aspects of the Theory of Syntax. MIT Press, Cambridge, MA.
Clark, H.H., 1993. Arenas of Language Use. Chicago University Press, Chicago.
Clark, H.H., 1996. Using Language. Cambridge University Press, Cambridge.
Clark, H.H., 1997. Dogmas of Understanding. Discourse Process 23, 567-598.
Davies, B., Harre, R., 1991. Positioning: the discursive production of selves. J. Theory Social Behav. 20, 1, 44-63.
Dilts, R., 1983. Applications of Neuro-linguistic Programming. Meta Publications, Cupertino, CA.
Giles, H., Coupland, J., Coupland, N. (Eds.), 1991. Contexts of Accommodation. Cambridge University Press, Cambridge.
Grice, P., 1975. Logic and conversations. In: Cole, P., Morgan, J. (Eds.), Syntax and Semantics, Vol. 3, Speech Acts. Academic Press, New York, NY.
Luchjenbroers, J., 2002a. Prosodic and gestural cues for navigations around mental space. Paper presented at 27th BLS Conference, Special session: Language & Gesture. University of California Press, Berkeley.
Luchjenbroers, J., 2002b. Gendered features of Australian English discourse: discourse strategies in negotiated talk. J. English Linguist. June.
Luchjenbroers, J., 2004. Visual and verbal cues for navigating mental space, this volume.
Luchjenbroers, J., ms-a. Functionalist categories and cognitive strategies for mutual ground construction. In: Verhagen, A., van de Weijer, J. (Eds.), Levels in Language and Cognition, to appear.
Parker, S., Carroll, P., 2001. Rapport and textual analysis. Paper presented at University of Wales, Bangor.
Saljo, R., 1997. Concepts, Learning and the Constitution of Objects and Events in Discursive Practices. Unpublished paper.
Tomlin, R., Forest, L., Pu, M.-M., Kim, M.H., 1997. Discourse semantics. In: Van Dijk, T. (Ed.), Discourse: A Multidisciplinary Introduction. Sage Publications, London.
Van Dijk, T.A., 1997. The study of discourse in discourse as structure and process. In: Van Dijk, T.A. (Ed.), Discourse: A Multidisciplinary Introduction. Sage Publications, London.
Studies in Multidisciplinarity, Volume 2
Editor: G. Malcolm
© 2004 Elsevier B.V. All rights reserved.
13
Visual representation of text in Web documents and its interpretation

D. Karatzas and A. Antonacopoulos

PRIMA Group, Department of Computer Science, University of Liverpool, Peach Street, Liverpool L69 7ZF, UK
This chapter examines the uses of text and its representation on Web documents in terms of the challenges in its interpretation. Particular attention is paid to the significant problem of non-uniform representation of text. This non-uniformity is mainly due to the presence of semantically important text in image form as opposed to the standard encoded text. The issues surrounding text representation in Web documents are discussed in the context of colour perception and spatial representation. The characteristics of the representation of text in image form are examined and research towards interpreting these images of text is briefly described.
1. INTRODUCTION
A Web document, like many other types of documents in electronic form, comprises two components: the code and the view. The code is typically a file containing markup language tags, program instructions and various types of text. To be more precise, text in this instance refers to anything that is not a keyword or part of a program. Some of this text may not actually appear in the browser window, such as attributes to keywords (e.g. textual attributes to META or ALT tags). On the other hand, the bulk of the text will be visible in the browser window as part of the document text. Typically, this text is encoded in ASCII or Unicode and is formatted for display according to the instructions in the code.
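The distinction between the kinds of text held in the code can be illustrated with a short sketch. It is only an illustration under stated assumptions: it uses the HTML parser of the Python standard library, the class and function names (CodeTextCollector, split_code_text) are hypothetical, and it treats every text node outside SCRIPT and STYLE elements as text intended for display.

```python
from html.parser import HTMLParser


class CodeTextCollector(HTMLParser):
    """Split the text found in a document's code into three groups."""

    def __init__(self):
        super().__init__()
        self.displayed_text = []  # text meant to appear in the browser window
        self.attribute_text = []  # META content and ALT descriptions (not displayed)
        self.image_sources = []   # images referenced by the code
        self._skip_depth = 0      # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and attrs.get("content"):
            self.attribute_text.append(attrs["content"])
        elif tag == "img":
            self.image_sources.append(attrs.get("src") or "")
            if attrs.get("alt"):
                self.attribute_text.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.displayed_text.append(text)


def split_code_text(html):
    """Parse a document's code and return the populated collector."""
    collector = CodeTextCollector()
    collector.feed(html)
    collector.close()
    return collector
```

Note that the sketch can only list the images that the code refers to; whatever text is drawn inside those images remains out of reach of any analysis of the code alone.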
The view of the document is what actually appears in the browser window. This is what humans see when they look at the monitor screen and what the creator of the document intended to present. In the authors' opinion, the view is the definitive representation of the document message as it was originally intended to be conveyed to the reader. The reasons for establishing the view as the baseline representation are explained immediately below. In a typical Web document, there are significant discrepancies between the text appearing in the view and that in the code of the document. First, text in the code that is due to appear in the browser window may not be visible. This sounds paradoxical but it is true when Web document designers create text in the same colour as that of the background. The rationale, in this case, is that search engines will use this (often irrelevant but highly topical) text to boost the ranking of the document in the relevant indices. It should be noted that this approach is an attempt to overcome the fact that some search engines do not index text that is not to be displayed (such as META or ALT tag attributes) as this text is often unreliable (see below). A second major (and very frequently occurring) discrepancy is that some of the visible text in the view of the document is actually embedded in images. There is no correspondence between the code (an instruction to display a given image) and the text contained in that image. The human reader, of course, can read all the text on the screen (document view), whether this text is in the code or not. From this point on, visible text that is contained in the code will be referred to as encoded text, while text that is embedded in images will be referred to as image text. The latter discrepancy between the code and the view representations of a Web document is potentially very significant. The origins of the problem are twofold. 
First, Web document designers create image text as a way of overcoming the limitations of the markup language used in the code. Second, due to limitations of current technology, image text is not accessible to any automated process performed on the document. Both of these interrelated issues are examined next in order to achieve a deeper understanding of the problems of the representation and their impact on the automated interpretation of Web documents. Image text is created for two main reasons. The first is one of necessity as the markup language (HTML in this case) cannot adequately display textual entities such as mathematical equations, text in diagrams and charts, etc. The second and main reason is that document creators wish to add impact to certain textual entities such as titles, headings, buttons, etc. The effects applied to the text and its background are such that they cannot be expressed in the markup language. Not having all the visible text in the code of the document means that a proportion of the text seen by the human reader (image text) is not available
for any automated analysis. Such analysis includes essential processes, fundamental to the modus operandi of the Web, such as automated indexing by search engines. In the case of indexing, the problem is compounded by the fact that it is precisely the semantically important text (titles, headings, etc.) that is most often required to make a visual impact and, therefore, represented as image text. The lack of a uniform representation of the text impacts negatively on several other possibilities for exploiting the Web. If all the visible text were available as encoded text, it would be possible to perform accurate voice browsing (Brown et al., 2001), for instance. One could listen to the Web document read to them instead of having to look at a monitor. Such a possibility will enable browsing in the car, via the telephone and also will benefit visually impaired people. Another major application area is the analysis of the content of a Web document for filtering, summarisation and display on small form-factor devices such as PDAs and mobile phones. From the above, it is evident that there is a potentially significant problem of not having a uniform representation of the visible text in a Web document. The remainder of this chapter focuses on the problem of achieving such a uniform representation by extracting and recognising the image text. The characteristics of image text are described in section 2. Image text is usually present in colour (both the foreground and the background). Section 3 briefly discusses the properties of colour and its representation in the context of both the monitor screen and of how humans perceive it (exploited in the authors' approach to extract the image text). Properties of text in terms of its spatial representation in images are presented in section 4. An overview of the challenges faced by current approaches as well as open problems is given in section 5. 
The authors' research towards converting image text into its encoded form is summarised in section 6, while section 7 concludes the chapter.
2. IMAGE TEXT REPRESENTATION
For encoded text, a pure textual representation exists and is directly available by analysing the code of the web document. In the case of image text though, information about the textual content of the image is generally absent. The only HTML provision for an alternative representation of image text is via the ALT tag, by which a textual description can be supplied for each image. Nevertheless, in an average of 56% of image text cases, the ALT tag description is incomplete, totally false or non-existent
(Antonacopoulos et al., 2001). The same study showed that 76% of the image text does not appear within the rest of the encoded text. Some information could potentially be extracted from the filename of the image, which is usually related to the thematic content of images (Munson and Tsymbalenko, 2001). However, it can be appreciated that the filename does not, in most cases, represent an accurate description of the image text. The ALT tag description and the image filename, along with the size and placement of each image inside a Web document, are about all the information that can be obtained by analysing the code of the document alone.

The remaining option is to analyse the images themselves and extract and recognise the image text directly from them. As mentioned earlier, this is the most reliable way of obtaining a uniform representation, as the definitive representation is only what the reader sees. Towards this goal, the key characteristics of image text are examined in the remainder of this section.

Images found in Web documents share some common characteristics that emanate from the specific use of the images on pages. Certain observations can be made for image text:

• Image text is generated using computers, in order to be viewed on computer monitors. Therefore, the choice of resolution and colours and their rendering is affected.
• File-size minimisation is very important when creating image text, as it has to be rapidly transmitted over the Internet. Therefore, the resolution is usually lowered and file compression (often lossy) is applied.
• Image text is created to add impact. Designing eye-catching headers and selection buttons, and enhancing the appearance of a Web document using images for anything that the visitor should pay attention to (e.g. advertisements), is a strong advantage in the continuous effort to attract more visitors.
• There are no strict rules governing the creation of image text, e.g. the use of colours, fonts, provision of alternative representation, etc. Therefore, people exercise their creativity and frequently produce images with complex colour arrangements of text and background.

The nature of image text differs significantly from that of text in images of scanned paper documents that are typically analysed by optical character recognition (OCR) applications. Certain assumptions that are usually made by OCR applications regarding their input images are not applicable to image text in Web documents. Because these assumptions do not hold, OCR applications cannot be applied directly to image text. Differences can be identified with regard to both the structure and the content of images. The most prominent difference is the fact that image text is multicoloured, whereas typical document images are black and white (bi-level). Traditional scanned document analysis methods also require both the text
and the background to be of constant colour. These methods are, therefore, unsuitable for image text in Web documents. The majority of such image text contains gradient-colour or textured characters rendered over textured or photographic background (fig. 1). The number of colours present in the image text dataset used by the authors (comprising approximately 120 images of text collected from various Web documents) ranges from 2 to 66,023, with an average of 4832 colours per image.

Fig. 1. (a) Image containing text over multi-coloured photographic background. (b) Image containing multi-coloured textured text. (Rendered here in greyscale - Ed.)

Image text is designed to be viewed on computer monitors. This entails certain characteristics with regard to the size and the resolution of the images. Contrary to typical document images, which have a minimum spatial resolution of 300 dpi (dots per inch), characters in image text have an average resolution of 72 dpi. The actual size of the characters is the next significant difference between scanned documents and image text in Web documents. An expected character size in scanned documents is 10 pt or larger, whereas in image text, characters can be as small as the equivalent of 5-7 pt. Commercial OCR methods typically fail for characters of such small size.

Furthermore, although image text does not suffer from the typical distortions and noise introduced during document scanning, different types of artefacts are evident in most cases. Anti-aliasing is probably the most common kind of artefact that strongly affects a method's ability to differentiate characters from the background. Anti-aliasing is extensively used when rendering text, especially when it comes to small-sized characters, since it produces an aesthetically better outcome. In general terms, it involves a process of blending a foreground object with the background, creating a smooth transition from the colours of one to the colours of the other.
This produces characters with poorly defined edges, in contrast to the characters in typical document images (fig. 2).
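The blending that underlies anti-aliasing can be sketched in a few lines. This is a minimal illustration rather than the procedure of any particular rendering package: it assumes a simple linear mix of foreground and background, and the coverage values (the fraction of each pixel covered by the character) are hypothetical.

```python
def blend(foreground, background, coverage):
    """Linearly mix two RGB colours.

    coverage is the fraction of the pixel covered by the character:
    1.0 gives the pure foreground colour, 0.0 the pure background colour.
    """
    return tuple(
        round(coverage * f + (1.0 - coverage) * b)
        for f, b in zip(foreground, background)
    )


def render_edge(foreground, background, coverages):
    """Colours of a row of pixels crossing the edge of a character."""
    return [blend(foreground, background, c) for c in coverages]


black = (0, 0, 0)
white = (255, 255, 255)
# Hypothetical coverage values for five pixels crossing a character edge
edge = render_edge(black, white, [0.0, 0.25, 0.5, 0.75, 1.0])
distinct_colours = len(set(edge))
```

Although only two colours were chosen by the designer, the row of pixels in this example carries five distinct colours, so a method that expects one foreground and one background colour has no clear boundary to work with.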
Fig. 2. (a) Original image of a menu item. (b) The word "Search" magnified. The effect of anti-aliasing is severe in this case and, combined with the small size of the characters, makes this text difficult to recognise, even for humans.
Fig. 3. (a) Original GIF compressed image. (b) Magnification of an area containing part of characters and part of the background. Dithering is evident and pixels belonging to the same area would be assigned to different colour clusters in most colour image analysis techniques.
Another artefact is that, due to the sampling grid used by software packages when applying colours to objects, the same character can appear slightly different in different parts of an image. Finally, the fact that image text is created with file-size minimisation in mind suggests that compression is most often applied to the image file. The vast majority of images on the Web use JPEG compression. This type of compression may have no particular effect in areas of almost constant colour, but can introduce significant artefacts to characters. This kind of lossy compression is even more noticeable when colour analysis of the image takes place, as lightness information is mostly preserved, but colour information is to a great extent discarded in the JPEG compression scheme. The next most popular format used for storing image text is GIF. The GIF format preserves much more information than JPEG, but it is limited to representing 256 colours. This vastly reduces the number of colours available to represent the characters, and can introduce significant colour quantisation artefacts. In addition, due to the limited number of colours available, dithering techniques are often employed to render colours that cannot be represented uniquely (fig. 3). Dithered areas are difficult to identify as uniform regions, which poses a further problem in colour image analysis.
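The effect of dithering on a region of uniform colour can be illustrated with a small sketch. It is a simplified stand-in, not the algorithm of any specific GIF encoder: it assumes a greyscale image, a palette of only two levels, and a 2x2 ordered-dither pattern with one common normalisation of the thresholds.

```python
# Classic 2x2 ordered-dither index pattern (an assumption of this sketch)
BAYER_2X2 = [[0, 2], [3, 1]]


def ordered_dither(grey_rows, levels=(0, 255)):
    """Reduce a greyscale image to two levels using 2x2 ordered dithering."""
    low, high = levels
    out = []
    for y, row in enumerate(grey_rows):
        out_row = []
        for x, value in enumerate(row):
            # The threshold depends on the pixel position and takes the
            # values 1/5, 2/5, 3/5 and 4/5 of the available range.
            threshold = low + (BAYER_2X2[y % 2][x % 2] + 1) * (high - low) / 5
            out_row.append(high if value > threshold else low)
        out.append(out_row)
    return out


# A perfectly uniform mid-grey area, 4 x 4 pixels
flat = [[128] * 4 for _ in range(4)]
dithered = ordered_dither(flat)
```

In this example the uniform grey area comes out as a checkerboard of the two palette levels. The average intensity is preserved, but neighbouring pixels of what was a single region now fall into different colour clusters, which is the difficulty noted above.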
3. COLOUR REPRESENTATION
Colour is the perceptual result of light in the visible region of the spectrum (having wavelengths between approximately 400 and 700 nm). A good understanding of how colour is reproduced on computer monitors (and the way it is internally represented within a computer system) is vital to understand the difficulty of analysing colour images (especially image text).
Colour is reproduced in cathode ray tube (CRT) displays in an additive manner by mixing three lights of different colours (red, green and blue) produced by the phosphors of the screen. Thus three components are used, namely R, G and B, which express the participating power of each mixing colour. Each component is quantised in 2^8 = 256 levels; thus a CRT display can produce 256^3 (approximately 16.7 million) colours by mixing different amounts of light of each colour. Depending on the technical and physical characteristics of the CRT display, only a certain gamut (range) of colours can be produced. The largest range of colours will be produced with primaries that appear red, green and blue, and that is the reason why phosphors producing colour stimulus with exactly those primaries are employed. Nevertheless, since there are hardware differences between computer systems, the RGB information alone is not (strictly speaking) adequate to determine the actual colours of an image. A set of primaries that closely represents the primaries used in CRT monitors is the one specified for the HDTV protocol by ITU-R Recommendation BT.709 (1990). The majority of monitors conform to Rec. 709 within some tolerance, so it is a relatively safe assumption that the same RGB code will produce the same colour on different CRT monitors. The most widely used colour system in computer applications is, therefore, RGB (fig. 4). Although RGB is hardware dependent, in the sense that the same RGB colour may be slightly different between different monitors, it is the default choice for most applications because of its simplicity and low computational cost. A number of colour attributes can be calculated from the RGB components. An interesting set of attributes, in the sense that they are representative of human perception, is hue, lightness and saturation. These are the psychological attributes related to human impressions of colour. The use of such perceptually based quantities can prove more
Fig. 4. RGB colour space. (a) Axis of the colour space. (b) Colour gamut.
Fig. 5. HLS colour space. (a) Axis of the colour space. (b) Colour gamut.
suitable for the analysis of images created to be viewed by humans, such as real-life scenes and, for the same reason, image text (Tominaga, 1986; Ledley et al., 1990). HLS, HVC and HSI are colour systems based on these attributes (fig. 5). There exists, however, a totally different approach based directly on human vision characteristics rather than on transformations of the RGB components. A colour stimulus is radiant energy of a given intensity and spectral composition, entering the eye and producing a sensation of colour. This radiant energy can be completely described by its spectral power distribution. This is often expressed in 31 components, each representing power in a 10 nm band from 400 to 700 nm. Using 31 components is a rather impractical and inefficient way to describe a particular colour, especially when a number of colours must be described and communicated, which is the case with computer graphics. A more efficient way to describe a colour would be to determine a number of appropriate spectral weighting functions. It transpires that just three components are adequate for that purpose, based on the trichromatic nature of vision. The Commission Internationale de l'Eclairage or International Commission on Illumination (CIE) standardised, in 1931, a set of spectral weighting functions, called colour matching functions, which model the perception of colour by human beings. These curves are referred to as x̄, ȳ and z̄, and the colour system is consequently defined as CIE XYZ (fig. 6). A significant problem with most colour systems (including XYZ) is that the distance between two colours in the colour space does not correlate with the perceived (by humans) distance of the same colours (how similar or dissimilar they appear to be). For this reason, the CIE proposed certain variants of the XYZ colour system, resulting in systems that exhibit greater
Fig. 6. CIE XYZ colour space. (a) Axis of the colour space. (b) Colour gamut.
perceptual uniformity. The CIE L*a*b* (McLaren, 1976; Robertson, 1977) and CIE L*u*v* (Carter and Carter, 1983) are such colour systems (fig. 7). These are used when a colour distance measure that correlates well to the perceptual colour distance is required.
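As an illustration of how such a distance measure can be obtained from RGB values, the following sketch converts RGB to CIE XYZ and then to CIE L*a*b*, and takes the Euclidean distance between two colours. Several details are assumptions that the chapter does not specify: the sRGB transfer function, the commonly quoted matrix for Rec. 709 primaries, and a D65 reference white. The function names are hypothetical.

```python
import math

# Assumptions of this sketch (not taken from the chapter): sRGB transfer
# function, Rec. 709 primaries and a D65 reference white.
D65_WHITE = (0.95047, 1.0, 1.08883)


def srgb_to_linear(value):
    """Convert one 8-bit channel to linear light in the range 0 to 1."""
    c = value / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def rgb_to_xyz(rgb):
    """Convert an 8-bit RGB triple to CIE XYZ tristimulus values."""
    r, g, b = (srgb_to_linear(v) for v in rgb)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    return x, y, z


def _f(t):
    """Non-linearity of the CIE L*a*b* definition."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3 * delta ** 2) + 4.0 / 29.0


def rgb_to_lab(rgb):
    """Convert an 8-bit RGB triple to CIE L*a*b*."""
    x, y, z = rgb_to_xyz(rgb)
    fx, fy, fz = (_f(v / n) for v, n in zip((x, y, z), D65_WHITE))
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def lab_distance(rgb1, rgb2):
    """Euclidean distance between two RGB colours measured in CIE L*a*b*."""
    return math.dist(rgb_to_lab(rgb1), rgb_to_lab(rgb2))
```

Under these assumptions, lab_distance((0, 0, 0), (255, 255, 255)) is very close to 100, the full span of the lightness axis. Two pairs of colours that are equally far apart in RGB will in general give different values here, which is the point of moving to a more perceptually uniform space.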
4. SPATIAL REPRESENTATION
Although colour information is vital when trying to separate the foreground from the background of an image, there are a number of additional spatial characteristics that enable us to infer whether a region in the image is a character, even if it is a character we have never encountered before. In this section, the spatial features of characters indicative of their hypostasis are briefly summarised in the context of image text.
Fig. 7. CIE L*a*b* colour space. (a) Axis of the colour space. (b) Colour gamut.
Fig. 8. (a) Original character. (b) The character decomposed in a number of strokes: an arc, a straight line and a circle.

A distinctive feature of characters is the fact that they are composed of strokes. A stroke can be thought of as a single movement of the writing tool. In the context of image text, a stroke can be any short line, straight or curved, which is part of a character. All characters can be decomposed into a series of strokes, as can be seen in fig. 8. This is an important observation, since it directly suggests a way to create a comprehensive description of every character. Such a description can be obtained by identifying a character's strokes and the way they are combined, in terms of corners, ends and intersections. Descriptions like the above are invariant in terms of size and most of the time in terms of rotation, and are widely used in character recognition applications. Although such a stroke identification process usually comes after segmentation (after the characters have been separated from the background), the knowledge that characters comprise a number of strokes can provide useful information for the segmentation process as well.

A second key feature of characters is their aspect ratio. This is defined based on the bounding box of the character, as the ratio of the bounding box's width to its height or vice versa, and is a measure of the overall shape of the character (in terms of how elongated its bounding box is). In general, with the exception of characters like "i" or "l", the bounding boxes of characters are closer to square, with an aspect ratio near 1. Other spatial features that can be used towards the identification of characters in images are the percentage of the area of a character's bounding box occupied by character pixels (as opposed to pixels describing the background), and the number of transitions from character pixels to background ones and vice versa within the bounding box of the character.
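The three bounding-box features described above can be computed as in the following sketch. The representation of a segmented component as rows of '#' (character pixel) and '.' (background pixel) is a hypothetical convention chosen for readability, and transitions are counted along horizontal scan lines only, which is one of several possible conventions.

```python
def character_features(bitmap):
    """Bounding-box features of one segmented component.

    bitmap is a list of equal-length strings in which '#' marks a character
    pixel and '.' a background pixel; at least one '#' is assumed.
    """
    rows = [y for y, row in enumerate(bitmap) if "#" in row]
    cols = [x for row in bitmap for x, p in enumerate(row) if p == "#"]
    top, bottom, left, right = min(rows), max(rows), min(cols), max(cols)

    # Crop the bitmap to the bounding box of the character pixels
    box = [row[left:right + 1] for row in bitmap[top:bottom + 1]]
    width, height = right - left + 1, bottom - top + 1

    filled = sum(row.count("#") for row in box)
    # Changes between character and background along each scan line
    transitions = sum(
        1 for row in box for a, b in zip(row, row[1:]) if a != b
    )
    return {
        "aspect_ratio": width / height,
        "fill": filled / (width * height),
        "transitions": transitions,
    }


letter_o = [
    "......",
    ".####.",
    ".#..#.",
    ".#..#.",
    ".####.",
    "......",
]
letter_l = [
    "...",
    ".#.",
    ".#.",
    ".#.",
    ".#.",
    "...",
]
features_o = character_features(letter_o)
features_l = character_features(letter_l)
```

For the ring-shaped example the bounding box is square (aspect ratio 1) and three quarters full, whereas the single vertical stroke gives an aspect ratio of 0.25 and a completely filled box, in line with the exception noted above for narrow characters.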
At a more macroscopic scale, when looking at the whole set of characters in an image or document, we usually expect them to share some
common characteristics. The size of characters is probably the first such characteristic. Indeed, in the majority of cases we expect the characters in a paragraph, or at least within a single line of text, to have similar size. Such assumptions hold true for essentially all paper documents and for the majority of image text. Nevertheless, there are many cases of image text where even characters of the same word are of different font and consequently of different size as well (fig. 9). To make things even worse, there are cases where characters are substituted by other shapes for the sake of visual impact, as can be seen in fig. 10.

Fig. 9. (a) Image with characters of different font in the same line of text. What is also interesting is that part of the first character is missing (placed in a different image in the Web document). This is an example of tightly cropping the images around the characters, even splitting characters among different images. (b) Characters of the same line of text are of different size. Also they are not placed on a straight line as is usually the case in paper documents.

At the image level (in the context of image text), one could study features such as the proportion of foreground pixels to background ones for the whole image. Knowledge of the expected coverage of the image by characters could prove useful in the process of selecting the foreground colour class for the image. In simple cases where both the text and the background are each of constant colour, the selection of the colour corresponding to characters could be initially based on exactly this kind of information. Of course, this simplified case mostly applies to images such as scanned documents and not so much to multi-coloured image text in Web documents. For bi-level images of documents, one could use this information to evaluate the final segmentation produced by other segmentation methods, and subsequently evaluate whether the classes are identified as expected.
Furthermore, characters in image text are often cropped tightly around their outlines (fig. 9a), and they have no equivalent to the white frame, present in document images, thus the proportion of the image area occupied by characters varies significantly.
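The use of expected coverage for selecting the foreground class, and the way tight cropping undermines it, can be sketched as follows. The function name and the expected coverage of 15% are hypothetical values chosen for the illustration; the images are given as rows of single-character colour labels.

```python
from collections import Counter


def pick_foreground(pixels, expected_coverage=0.15):
    """Pick the colour class whose share of pixels is closest to the
    coverage expected for characters (a hypothetical default of 15%).

    pixels is a list of rows, each a sequence of colour labels.
    """
    counts = Counter(p for row in pixels for p in row)
    total = sum(counts.values())
    return min(counts, key=lambda c: abs(counts[c] / total - expected_coverage))


# Scanned-document style: dark text ('k') surrounded by a white frame ('w')
scanned = [
    "wwwww",
    "wkwkw",
    "wkwkw",
    "wwwww",
]
# Image text cropped tightly around bold dark characters
cropped = [
    "kkkw",
    "kkkk",
    "wkkk",
]
scanned_choice = pick_foreground(scanned)
cropped_choice = pick_foreground(cropped)
```

In the first image the dark class covers 20% of the pixels and is selected as foreground, as intended. In the tightly cropped image the dark characters cover most of the area, so the same rule selects the white class, which illustrates why this kind of information is of limited use for image text.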
Fig. 10. Example of an image where characters have been replaced by other shapes.
Fig. 11. (a) Sentences placed on a circle, and straight lines of text placed on an angle. (b) Letters of the same "word" placed with different orientation.

Finally, a common characteristic of text is that characters are usually placed on straight (and horizontal) baselines. While this is true for the majority of paper documents, characters in image text in Web documents may not be on straight or even horizontal baselines (as depicted in fig. 11). Overall, concerning the spatial characteristics of text (as a whole or of single characters), their relevance to image text in Web documents proves limited compared to traditional scanned documents. However, in combination with other features, such as colour similarity, spatial characteristics can provide considerable help in a number of circumstances.
5. CHALLENGES AND APPROACHES
The characteristics of image text have been examined in the previous sections in terms of image, colour and spatial representations. This section examines the problem of extracting the characters from the image text and the subsequent recognition of this text. As character extraction is still an unsolved and difficult problem, and there is only one approach in the literature that has attempted to recognise the characters (Zhou et al., 1997), this section concentrates mostly on character extraction. A small number of approaches have been proposed towards text extraction from image text. Previous attempts mainly assume that the characters are of uniform colour, work with a relatively small number of colours, and restrict their operations to the RGB colour space. One of the most prominent approaches is that of Zhou and Lopresti (Lopresti and Zhou, 1996; Zhou and Lopresti, 1997; Zhou et al., 1997). They proposed methods for both text extraction and recognition. The images used are GIF formatted (256 colours only), and the characters are assumed to be rendered in a homogeneous colour. Their method for text extraction is based on clustering in the RGB colour space, and subsequently identifying connected components in the image according to the clusters located. A detection rate of 47% was initially reported for a data set comprising GIF
images collected from the Web. An optimisation of the algorithm was later proposed (Zhou et al., 1998; Lopresti and Zhou, 2000) which introduced a metric that combines RGB Euclidean distance with the spatial proximity of pixels having the same colour computed in a small neighbourhood. The definition of such a metric is feasible since the images used are GIF formatted, thus they contain a maximum of 256 colours. A layout analysis stage follows connected component identification, which aims to identify the character-like components based on spatial features of text by making certain assumptions for the placement of characters. The authors report an average character detection rate of 68.3% for a set of 482 GIF images collected from the Web, containing homogeneous text. With similar assumptions about the colour of characters, the approach of Antonacopoulos and Delporte (1999) uses two alternative clustering approaches in the RGB space, but works on (bit-reduced) full-colour images (JPEG) as well as GIFs. Jain and Yu (1998) report a method based on decomposing an original image into a small number of foreground images and a single background one. The original number of colours (8-bit or 24-bit images) is dramatically reduced (to between 4 and 8 distinct colours) by bit dropping and colour quantisation in the RGB space.
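The general pattern shared by these approaches, reducing the number of colours in the RGB space and then identifying connected components of uniform colour, can be sketched as follows. This is a generic illustration and not the implementation of any of the cited methods: keeping only the most significant bit of each channel (at most 2^3 = 8 colours) and using 4-connectivity are assumptions of the sketch, and the function names are hypothetical.

```python
from collections import deque


def drop_bits(pixel, keep_bits=1):
    """Keep only the most significant bits of each 8-bit RGB channel."""
    shift = 8 - keep_bits
    return tuple(channel >> shift for channel in pixel)


def connected_components(image, keep_bits=1):
    """Group 4-connected pixels that share the same reduced colour.

    image is a list of rows of (R, G, B) tuples. Returns a list of
    (reduced_colour, [(y, x), ...]) pairs in scan order.
    """
    height, width = len(image), len(image[0])
    reduced = [[drop_bits(p, keep_bits) for p in row] for row in image]
    labels = [[None] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if labels[y][x] is not None:
                continue
            colour = reduced[y][x]
            label = len(components)
            labels[y][x] = label
            queue = deque([(y, x)])
            members = []
            # Breadth-first flood fill over pixels of the same reduced colour
            while queue:
                cy, cx = queue.popleft()
                members.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and labels[ny][nx] is None
                            and reduced[ny][nx] == colour):
                        labels[ny][nx] = label
                        queue.append((ny, nx))
            components.append((colour, members))
    return components


W1, W2 = (250, 250, 250), (255, 255, 255)  # two slightly different whites
D1, D2 = (10, 10, 10), (20, 20, 20)        # two slightly different darks
image = [
    [W1, D1, W2, W1],
    [W2, D2, W1, W2],
    [W1, D1, W2, W1],
]
components = connected_components(image)
```

In this small example the slight colour variations are absorbed by the reduction, leaving three components: the white column on the left, the dark vertical stroke, and the white area on the right. The approach relies on the assumption, discussed above, that each character is rendered in a roughly homogeneous colour.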
6. AN ANTHROPOCENTRIC APPROACH
Towards the extraction of characters from image text, the authors have attempted to identify possible ways to segment and extract character-like components in colour images. Two different methods have been implemented and tested. The innovation of both approaches lies in the fact that they are based on available knowledge of the way humans perceive colour differences. The anthropocentric nature of the two approaches is evident primarily through the way colour is analysed, making use of human perception data and employing colour systems that are efficient approximations of the psychophysical manner in which humans understand colour. The first method proposed by the authors (Antonacopoulos and Karatzas, 2000) is based on a split and merge strategy. It employs the HLS colour space to split the image into layers in a recursive way, by analysing the lightness and the hue histograms. Connected components are then identified, and for each component, the neighbouring pixels are examined for colour similarity. In this way, a visually similar area is identified for each component as a possible extension. Special consideration has been given to the way visual similarity is assessed. Towards that end, the authors used experimental
biological data (Wyszecki and Stiles, 2000) for wavelength and lightness discrimination, according to the layer processed each time. The merging process starts with the bottom layers and proceeds in a bottom-up manner. The merging of components (and their possible extensions) is governed by the extent to which they spatially overlap.
The second method developed by the authors (Antonacopoulos and Karatzas, 2001) is based on the use of a propinquity measure defined in the context of a fuzzy inference system. The method comprises two steps. It starts with the grouping of pixels having similar colours into connected components, and then uses the propinquity measure to combine connected components into progressively larger ones, aiming at constructing a correct segmentation of the characters in the image. This approach makes use of the perceptually uniform CIE L*a*b* colour space in order to assess the colour similarity of the pixels. The propinquity measure used in the second step combines, with the help of a new fuzzy inference system, the colour distance between two components and a metric indicative of their spatial distance. The colour distance metric used is the Euclidean distance in the CIE L*a*b* colour space. Since the CIE L*a*b* space is perceptually uniform, the Euclidean distance is indicative of the perceptual distance between colours. The second input of the fuzzy system is a topological measure defined by the authors, which has to do with the way components are connected in the image.
Both methods were able to correctly segment an average of 60% of the characters in images containing multi-colour text over multi-colour background. In simpler images, where either the text or the background was mostly uniform, both methods correctly segmented approximately 80% of the characters. Figure 12 illustrates in a comparative way the resulting segmentation from both methods. Correctly segmented characters are illustrated in black, while characters that were still correctly separated from the background, albeit not as separate, whole characters (broken into more than one component, or joined together), are shown in light grey.
Fig. 12. (a) Original image. (b) Results obtained with split and merge method. (c) Results obtained with fuzzy segmentation method.
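The colour distance used by the second method, the Euclidean distance in CIE L*a*b*, can be sketched as follows. This is an illustrative sketch rather than the authors' code: it assumes that the input pixels are 8-bit sRGB values and that the reference white is D65, neither of which is stated in the chapter, and the function names are hypothetical.

```python
import math

# Assumed reference white (D65); the chapter does not state the illuminant.
WHITE_D65 = (0.95047, 1.0, 1.08883)


def _srgb_to_linear(channel):
    """Undo the sRGB transfer curve for one 8-bit channel (assumes sRGB input)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t):
    """Piecewise cube-root function from the CIE 1976 L*a*b* definition."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def rgb_to_lab(rgb):
    """Convert an (R, G, B) tuple of 8-bit sRGB values to (L*, a*, b*)."""
    r, g, b = (_srgb_to_linear(c) for c in rgb)
    # Linear sRGB to CIE XYZ (D65 primaries).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = (_f(v / n) for v, n in zip((x, y, z), WHITE_D65))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def colour_distance(rgb1, rgb2):
    """Euclidean distance between two colours in CIE L*a*b*."""
    return math.dist(rgb_to_lab(rgb1), rgb_to_lab(rgb2))
```

Because the space is approximately perceptually uniform, a single threshold on `colour_distance` can stand in for "these two colours look alike", which a threshold on raw RGB distance cannot do reliably.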
7. CONCLUDING REMARKS
From the preceding sections, it can be appreciated that there is a problem of non-uniform representation (in terms of encoding) of text in Web documents. There is a pressing need to obtain a uniform representation to achieve more accurate searching and retrieval of information from the Internet. Moreover, there is an ever-increasing requirement to provide the capability for novel ways of interaction with the Internet (e.g. voice browsing and viewing summarised documents on devices with small bandwidth and form-factor). To obtain a uniform representation for the text in Web documents, the image text must be analysed, and the characters within it extracted and recognised. It is evident that the various characteristics of the representation of the image text (image, colour and spatial representation) make the extraction and recognition of image text a difficult problem. This chapter described a handful of prominent approaches to interpret the image text (mostly to extract the characters from the images at this stage). Among them, the research carried out by the authors attempts to exploit human perception of colour differences along with spatial features of characters. The results obtained so far are promising and further development of the extraction and subsequent recognition methods is taking place.
REFERENCES
Antonacopoulos, A., Delporte, F., 1999. Automated interpretation of visual representations: extracting textual information from www images. In: Paton, R., Nielson, I. (Eds.), Visual Representations and Interpretations. Springer, Berlin.
Antonacopoulos, A., Karatzas, D., 2000. An anthropocentric approach to text extraction from www images. Proceedings of the 4th IAPR Workshop on Document Analysis Systems, Rio de Janeiro, Brazil, pp. 515-526.
Antonacopoulos, A., Karatzas, D., 2001. Text extraction from web images based on human perception and fuzzy inference. Proceedings of 1st International Workshop on Web Document Analysis, The Pattern Recognition and Image Analysis (PRIMA) Group, Seattle, USA, pp. 35-38.
Antonacopoulos, A., Karatzas, D., Lopez, J.O., 2001. Accessing textual information embedded in internet images. Proceedings of SPIE Internet Imaging II, San Jose, USA, pp. 198-205.
Brown, M.K., Glinski, S.C., Schmult, B.C., 2001. Web page analysis for voice browsing. Proceedings of 1st International Workshop on Web Document Analysis, The Pattern Recognition and Image Analysis (PRIMA) Group, Seattle, USA, pp. 59-61.
Carter, R.C., Carter, E.C., 1983. CIE L*u*v* color-difference equations for self-luminous displays. Color Res. Appl. 8, 252-253.
Jain, A.K., Yu, B., 1998. Automatic text location in images and video frames. Pattern Recognition 31, 2055-2076.
Ledley, R.S., Buas, M., Golab, T.J., 1990. Fundamentals of true-color image processing. Proceedings of 10th International Conference on Pattern Recognition, 791-795.
Lopresti, D., Zhou, J., 1996. Document analysis and the world wide web. Proceedings of the Workshop on Document Analysis Systems, Malvern, Pennsylvania, 417-424.
Lopresti, D., Zhou, J., 2000. Locating and recognizing text in www images. Information Retrieval 2, 177-206.
McLaren, K., 1976. The development of the CIE 1976 (L*a*b*) uniform colour space and colour-difference formula. J. Soc. Dyers Colorists 92, 338-341.
Munson, E.V., Tsymbalenko, Y., 2001. To search for images on the web, look at the text, then look at the images. Proceedings of 1st International Workshop on Web Document Analysis, The Pattern Recognition and Image Analysis (PRIMA) Group, Seattle, USA, 39-42.
Robertson, A.L., 1977. The CIE 1976 Color Difference Formulae. Color Res. Appl. 2, 7-11.
Tominaga, S., 1986. Color image segmentation using three perceptual attributes. Proceedings of Conference Computer Vision and Pattern Recognition, 628-630.
Wyszecki, G., Stiles, W.S., 2000. Color Science, Concepts and Methods, Quantitative Data and Formulae. Wiley, New York.
Zhou, J., Lopresti, D., 1997. Extracting text from WWW images. Proceedings of the 4th International Conference on Document Analysis and Recognition, Ulm, Germany.
Zhou, J., Lopresti, D., Lei, Z., 1997. OCR for world wide web images. Proceedings of IS&T/SPIE International Symposium on Electronic Imaging, San Jose, California, 58-66.
Zhou, J., Lopresti, D., Tasdizen, T., 1998. Finding text in color images. Proceedings of IS&T/SPIE Symposium on Electronic Imaging, San Jose, California, 130-140.
Studies in Multidisciplinarity, Volume 2
Editor: G. Malcolm
© 2004 Elsevier B.V. All rights reserved.
14
Component modes of graphical communication
John Lee
Department of Architecture, Human Communication Research Centre, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, UK
The particular characteristics of communication by graphics have always been controversial. It has been argued by some (e.g. Goodman, 1969; Scholtz, 1993) that the realm of graphics is to be distinguished from that of text or other forms of language on syntactic grounds. Another line of thought is that this fails to get us far unless we devote careful attention to semantics at the same time (Lee, 1999). Syntax is indeed in some sense crucial, but can only be understood and clarified by reference to the semantic properties of the expressions, symbols, or whatever we wish to call them, that occur in the graphical medium. One way of developing this line of thought is to consider the uses of graphical expressions. Such uses are of course common. We see pictures, diagrams, symbols of many graphical kinds in use every day. It is instructive to consider these, but often we will find that their uses are somewhat removed from the immediacy of the communicative context from which they are originally derived. Some aspects of their communicative function in those contexts may have ossified, atrophied even, in the course of a process whereby they have become conventionalised in their usage. Perhaps we can better understand these communicative functions if we study a situation which is relatively unconventionalised, but in which conventions can be allowed to emerge. We can then investigate what happens to the roles and uses of graphical expressions during this emergence.
¹ The experiment described here was designed and conducted as part of the project "Multimedia and Graphics in Communication" (MAGIC), funded by the UK research councils' PACCIT programme (L328253003). It is described in some detail in Fay and Garrod (2001), and a substantial report is in preparation including extensive quantitative analyses. The present discussion is relatively impressionistic. The author is very deeply indebted to discussions and conversations with, and the writings of, other members of the project: Pat Healey, Simon Garrod, Jon Oberlander, Nicolas Fay and James King. However, they should not necessarily be held to endorse all the views expressed in this paper.
1. SUGGESTIONS FROM AN EXPERIMENT
Here we discuss an experiment in which such a situation is set up. Pairs of subjects are asked to communicate concepts to each other using only graphical means. This is similar to the well-known game "Pictionary", except that the concepts are drawn from a fixed list which is known to both participants. The subject (the matcher) whose task is to identify the concept drawn on a whiteboard by the other (the director) has 16 possibilities to choose from. These include easily confused groups such as art gallery, museum; parliament, theatre; Robert de Niro, Clint Eastwood, Arnold Schwarzenegger. In this situation, the participants are able to make some minimal assumptions about each other's shared cultural background (e.g. they may expect the other to know that Schwarzenegger has large muscles), but have relatively few shared resources for using the communication system. During the experiment, they carry out the task through a number of "blocks", each consisting of 12 of the same 16 items, chosen arbitrarily. Hence the items regularly recur, though at random intervals. There are various different conditions in the experiment, e.g. that the director and matcher alternate roles with succeeding blocks, or that they maintain the same roles throughout the experiment. One of the most salient findings is that there are marked differences where the roles alternate (the Director-Director, or DD condition), compared with where they do not (the Director-Matcher, or DM condition). The drawings that are used to communicate successive occurrences of the same item tend in the DD condition to change considerably: they become simpler, and the productions of the two participants tend to become more similar. The latter observation remains impressionistic at this stage, but for the former we have an algorithmic objective measure, which correlates well with unbiased judges. We note that these changes are generally much less marked, or entirely absent, in the DM condition. 
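The block structure of the task can be made concrete with a small sketch. It is not the project's actual procedure: the uniform random sampling, the participant labels "A" and "B", the function names and the placeholder concept names are all assumptions, and only the seven concepts named above are taken from the text.

```python
import random

# Seven concepts are named in the text; the remaining nine are hypothetical placeholders.
ITEMS = [
    "art gallery", "museum", "parliament", "theatre",
    "Robert de Niro", "Clint Eastwood", "Arnold Schwarzenegger",
] + [f"placeholder concept {i}" for i in range(8, 17)]


def make_blocks(items, n_blocks, block_size=12, seed=None):
    """Draw `block_size` distinct items for each block (assumed: uniform sampling)."""
    rng = random.Random(seed)
    return [rng.sample(items, block_size) for _ in range(n_blocks)]


def assign_roles(n_blocks, condition):
    """Return a (director, matcher) pair per block for the DD or DM condition."""
    if condition not in ("DD", "DM"):
        raise ValueError("condition must be 'DD' or 'DM'")
    roles = []
    for block in range(n_blocks):
        if condition == "DD" and block % 2 == 1:
            roles.append(("B", "A"))  # roles alternate with succeeding blocks
        else:
            roles.append(("A", "B"))  # roles stay fixed (or an even-numbered DD block)
    return roles


blocks = make_blocks(ITEMS, n_blocks=6, seed=0)
```

With 12 of 16 items drawn per block, any given item is present in three blocks out of four on average, which is why items recur regularly but at irregular intervals.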
In some respects, the experimental situation just described is of course fairly unlike most of real life. There could hence be a charge of artificiality.
Fig. 1. Cartoon.
Fig. 2. Clint Eastwood.
However, we would like to see this as a reflection of a way in which the situation compresses into a very short time processes that in general do occur but over a much longer period. Krampen (1983) discusses the historical development of various systems of symbols that now constitute road signs, and there are striking parallels between this development and the very small-scale phenomena our experiment observes. The analogy of ontogeny recapitulating phylogeny is attractive, even if we cannot at this stage treat it as more than suggestive.
Consider the following example (fig. 1). Here, in the DD condition, the participants have alternately come across "cartoon" through six blocks. The drawings show both obvious simplification and a degree of convergence, in that the difference between the second and fourth drawings (from the second participant) seems to indicate some move towards ideas perhaps suggested by the first and third drawings. The final drawing of course might well fail to communicate cartoon to a third party; it is just in the context of this series of interchanges that continued success with these attenuated resources arises from the shared history of the participants.
Other examples show a similar pattern (figs. 2 and 3). Here, we see simplification and a convergence on a common component (the hat, the crossed-out banknote) that has been present throughout, though used in a slightly different framework by each participant. In these cases, we see an initially predominant use of a mode of expression that might be called depiction, in that elements of the drawings (a man, some money, a cartoon-style rabbit) are apparently meant to "look like", and hence evoke, certain things that might be expected to be common in the participants' experiences. There is also some use, especially in fig. 3, of known conventions to achieve more complex objectives (crossing-out to evoke the notion of negation). This aspect is sometimes more prominent still. In fig. 4, the use of the two diverging lines seems to be as a representation of the idea of containment: the building contains the objects. This consistently survives the simplification process. In other cases, the apparently self-same element can be used to represent quite different things. Consider fig. 5. Here, the diverging lines evidently indicate some notion of sound going outwards. On the other hand, in fig. 6 they seem to mean either containment again, or some rather vaguer relational concept (and are quite distinct from the differently sized and oriented but otherwise very similar television aerial!).²
Fig. 3. Poverty.
Fig. 4. Museum.
Fig. 5. Loud.
Fig. 6. Soap opera.
² The introduction of this aerial in drawing 4 is very odd, since it goes directly against the usual trend towards simplification and elimination of elements. We might conjecture that it is due to the apparent divergence in drawing 3, which could be interpreted as suggesting that this participant's interpretation of the box element as a television is shaky and needs to be shored up somehow. See later discussion.
2. MODES OF REPRESENTATION
Even in these simple cases, then, we see different uses of the graphical medium emerging. These of course have been studied for a long time. C.S. Peirce, in particular, is well known for developing a series of distinctions between, e.g. iconic, indexical and symbolic uses of graphics. The basis of these distinctions, for Peirce, lies in the kind of relationship he conceives of as existing between the graphical element and the thing it represents. This is part of a general semiotic approach which is apparent in the works of Saussure and others, and also in much more recent formal semantics. There are distinct domains consisting of the representing system (in our case, graphical) and that which is represented (perhaps simply "the world", or a part or a formal model thereof). Representation is described in terms of some set of relations between these domains, defined and perhaps mediated in ways that we will consider later. The iconic, Peirce proposes, is based on a relation such as resemblance or similarity: as suggested above, an iconic image represents something because it "looks like" that thing. Symbolic items, by contrast, have an essentially arbitrary relationship to what they represent and are mediated by some form of convention. Words are like this, but perhaps also graphical elements such as the cross that crosses-out the banknote in fig. 3. Indexical elements are an interesting and often controversial category in which, according to Peirce, the image is somehow causally related to something which it thereby "points to" and thus represents. The stock examples of this type are often not graphical (such as pawprints, which are an index of the earlier presence of a bear, or smoke, which is an index of fire) but also include photographs, which are indices of captured scenes, and perhaps³ things like the traces on recording barometers, etc.
A slightly different perspective on these distinctions is offered by Goodman (1969), who focuses on the syntax of symbol systems in general. Goodman concerns himself especially with the idea of a notation, as something very clearly distinct from a graphical representation. The details of this are discussed in many places; for present purposes, the most important aspects are the features of graphics which Goodman identifies as most distinctively non-notational. These are syntactic density and repleteness. A symbol scheme is dense if any two (ordered) characters have another between them. Thus clocks are not notational if we consider that any position of the hand is a slightly different meaningful element from any other; whereas if we say that any position between two adjacent numbers is equivalent as a symbol, then we have partitioned the clock face into a notational scheme. Repleteness is a property that Goodman especially associates with pictures, rather than diagrams, etc.
It has to do with the extent to which all of the properties of the representational object are implicated in the way that it represents. So, in a picture, every nuance of the specific character of the lines, the colours, etc. might if changed potentially change its identity as a symbol and as a picture. One thing one might say, comparing these ideas, is that there is a certain affinity between Peirce's notion of "symbol" and Goodman's of "notation".
³ Sometimes the notion of pointing is exploited to suggest that arrows which occur in diagrams are indexical, since they point to some other element, but this seems rather dubiously to mix the domains conceptually involved in the semiotic relation. An index should point to something in the represented (semantic) domain, not simply to a representation of this thing within the graphical domain of the index itself. Arrows seem much more plausibly treated as symbolic representations of some, perhaps pointing-like, relation in the represented domain (to which extent they might still be regarded as "deictic", or indexical in some sense other than what I take to be Peirce's). It's not clear that anything in the present data should be regarded as indexical in Peirce's sense.
For Goodman, a symbol is not something to be considered in isolation, but is always part of a scheme, or a system; and also he is happy to think of anything that has semantics, including a picture, as a symbol. However, the symbol that occurs in a notational scheme is one that has an arbitrarily chosen identity and an arbitrarily assigned denotation. Peirce's symbol thus appears to be notational, though of course the scheme in which it inheres may usually be entirely implicit. So these treatments seem to agree reasonably about the properties of such symbols. However, it is not apparently possible in the same sort of way to relate the two treatments of non-notational images, because Goodman scornfully rejects the notion that iconic or pictorial representation may be based on resemblance or similarity. Indeed, he argues that their operation is as conventional as that of text, but that they have the somewhat distinctive syntactic properties discussed above. Peirce, it seems, may have been somewhat naive in his idea that iconicity is based on resemblance (Greenlee, 1973, p. 73ff), leaving himself open to some of the charges levelled by Goodman, which would have, for example, a man representing his portrait as much as the other way round (since resemblance is symmetric). Nonetheless, Peirce has a subtle view of representation, holding that an object, to be a representation, must be "interpreted as a sign", and that the relation between representation and represented is mediated by what he calls the interpretant. The latter is a complex notion (see Greenlee, 1973, ch. IV passim), but what it amounts to is an interpretation by (normally at least) an interpreting agent. Hence, it is open to him to argue that resemblance is the basis of iconic representation, but that some form of convention guides people in whether or not to interpret something in this way. Just such a view is argued by Files (1996) (though it is also very clearly proposed by Greenlee, 1973, p. 78), who wants to use it to show that while Goodman was right to highlight the role of convention, he was wrong in rejecting resemblance as he did. However, we should not lose sight of the fact that although Goodman maintained that iconic representation was essentially conventional, he also pointed out very clearly its syntactic, and correlative semantic, differences from notations. It is proposed in Lee (1997, 1999) that his analysis in terms of density can be closely related to an account in which the use of non-notational representations is supported by structure-mapping, and that this is perhaps a way of capturing the respectable aspects of the idea of similarity. Structure-mapping is here the identification, at some level, of structural homomorphism between the representation and that which is represented, a potentially complex relationship (cf. Gurr, 1998; Gurr et al., 1998) which may or may not emerge as something that people in normal circumstances will visually recognise as a resemblance. If this is right, we might say that the main elements of both Peirce's and Goodman's accounts at this point can be captured by noting that there exist at
least the following two forms of representation: (a) a form that depends on arbitrary elements being recognised as ones which are related (by arbitrary stipulative convention) to particular denotata; (b) a form that depends on structure-mappings between representation and denotatum being appropriately identified, recognised and used. We now seek to investigate the communicative uses of these forms of representation, through further study of the experimental data introduced above.
3. MODES OF COMMUNICATION
We will use, for these forms of representation (a) and (b), Peirce's terms "symbolic" and "iconic", in full awareness that we do not necessarily mean quite what Peirce may have meant. If we look again at figs. 1-6, it certainly seems that the initial drawings in each figure have a strong iconic component. It is easy enough to conjecture as to why this might be. The director has at the start to make some connection with the matcher's presumed knowledge, which she/he might exploit to evoke the required concept. Thus in fig. 4, for "museum", a building is drawn; moreover, the building has many windows, perhaps helping to connote size and differentiation in type from an ordinary house. Also there are objects recognisable as a vase, a picture and an animal, such as might be contained in a museum. All these things seem plausibly iconic. But now, there are the two diverging lines that seem to connect the group of objects and the building. These are of a different kind, because there is nothing in the world that looks like them; also there are things in other drawings that do look like them (fig. 5), but represent something rather different. So we might want to say that these are a symbolic element, the director implicitly stipulating for them an arbitrary meaning at which the matcher has to guess from the context established by the recognised icons. Consistent with this is the observation that this element seems to have a clear relational interpretation, suggesting a predicative semantic function, in contrast to the iconic elements, which appear simply to refer to objects. In fact, this limitation on the function of the iconic elements is syntactically important. Approaches to the syntax of graphics frequently assume that there will be some basic elements, which are composed somehow into a more complex whole (cf. the discussion in Gurr et al., 1998). Within the basic elements, spatial relations are not interpreted as such, whereas in the composition they are. 
Consider the museum building in the first drawing of fig. 4: this has five windows, but we doubt that the drawer
intended the number as such to be significant. Also, many details of the three other objects and their spatial arrangement are irrelevant. However, the fact that these two groups are related by the diverging lines (the third significant element) is crucial - the graphical juxtaposition of elements is being used to represent that the building contains the objects. This kind of relationship does not depend on there being an explicit third element, since the objects could in principle have been drawn within the boundary of the museum building, with the same significance. But in that case, one might say, the third element is the spatial relationship holding between the two icons (building and object-group); implicitly, this relationship (actual spatial inclusion in the drawing) would then be interpreted as inclusion between the denotata in "the world", while many other relationships holding within the icons would not be interpreted at all. Notwithstanding this, we naturally want to say that the spatial structure of the icons is important because it supports the structure-mapping that allows them to be recognised. However, that's all it has to do. Anything that does that will suffice for the communicative objectives of the participants. At the outset, in this first drawing, there must be sufficient complexity to ensure, or at least make it likely, that the matcher will recognise the object. A good deal is already known about the matcher's resources in achieving this recognition: she/he is known to have a shared cultural background and hence can be assumed to have relevant background knowledge; also she/he is known to have the list of potential candidate concepts to match against (the experimental task being in that sense simpler than normal Pictionary). As time goes on, more is known, due to the shared experience of this particular interaction.
Accordingly, the icons can be simplified, allowing some economy of effort with fair confidence that recognition will still be achieved. This situation (as brought out by the more detailed discussion in Fay and Garrod, 2001) is quantitatively very parallel to the observations of Clark and colleagues (e.g. Clark and Wilkes-Gibbs, 1986; Clark and Schaefer, 1987; Schober and Clark, 1989) concerning the simplification of linguistic descriptions of items in repeated trials, and even the simplification of gestures in structurally similar experiments (Clark, personal communication). We notice that in examples such as fig. 4, the relational element remains rather constant. One question that arises is why it is there at all: why is the inclusion of the objects in the museum not just drawn implicitly? A possible answer is that it would be somehow more complex to draw things that way, but this seems unconvincing in general. Perhaps a better reason is that the explicit representation of the relation focuses attention on it. This is an aspect of what is elsewhere often known as information packaging, which concerns the way that a medium is used to achieve different cognitive effects (cf. Lee and Stenning, 1998), and is in
this case perhaps analogous to, e.g., the way that syntax is used in language to influence emphasis (compare "John stole the vase" and "The vase was stolen by John"). A museum is a building with objects inside it. This is the basic conceptual structure of the representation that the director has chosen, and this structure remains constant and probably central in differentiating this concept from the various others (say, art gallery) that might occur in the experiment. In this structure, then, the iconic elements can be simplified, because in the context of the ongoing interaction they remain recognisable. But surely this means they become less iconic? This seems an interesting suggestion - but how could we measure how iconic something is? Going back to the discussion of the last section, it does not appear that Peirce, for example, thought of iconicity as something admitting of degrees. Certainly, if we look at the rabbit in fig. 1, it is clear both that the final drawing continues to represent the rabbit's ears and that it is extremely attenuated with respect to the first drawing. The latter, however, represents not only - indeed not primarily - a rabbit, but rather an arbitrary exemplar of a genre, namely cartoons. Perhaps, even, it is not intended to be a rabbit, but rather, say, a mouse. Rabbithood (or mousehood) is in this case at best incidental, yet in the end it is the only apparent feature left represented. This is certainly misleading, since the role of the drawing is simply to recapture the reference to cartoons. So perhaps we should say that it has become entirely symbolic? Or that each drawing is partly symbolic, partly iconic, and the balance has shifted? This latter is especially tempting, but it is unclear what it will gain us to take such a view. We have as yet no way to specify where the balance lies, nor to characterise the significance of its different possible positions. Our attention is hence firmly directed to the nature of "simplification".
It consists here in the removal of graphic complexity while retaining recognisability. The latter should be construed, we have suggested, as the preservation of elements of a particular structure-mapping. Such simplification is common enough, and we know how it happens; we are very familiar with the continuum between photographically realistic painting and outline drawing. The scanty Haro sketch in fig. 7, for example, retains certain things and omits others in a far from random manner. People in general can tell when a less complex graphic remains recognisable, and the less complex is also clearly in some sense more "abstract" - it is more ambiguous; it could be (used as) a representation of more things.⁴ On the other hand, complexity and specificity are not straightforwardly correlated. The Haro sketch, though sparse, retains much that makes it very specific as to the type of woman it depicts.
Fig. 7. A sketch by Haro.
⁴ One should more strictly say here, it could more easily be used as a representation of more things, since, e.g. one could in principle use the Mona Lisa to represent an arbitrary specific female (cf. the discussion in Lee, 1999).
In the cases of the examples from the experiment, the nature of the simplification that occurs is usually quite clear: elements, or parts of elements, are simply omitted, with the effect that the drawing would be, out of context, far more ambiguous. In the context of the ongoing interaction, however, the ambiguity does not have the effect that the risk of confusion is increased. This is a situation that has parallels elsewhere in communication. An obvious place is the use of pronouns and similar devices in language ("anaphora" in general). A pronoun such as he, she or it is extremely ambiguous, of course, but nonetheless is used in context to obtain considerable economy of expression. It does this by acting simply as a marker that something previously mentioned is being referred to again. In itself it contains almost no clues as to the identity of this thing - perhaps, in English, its gender and number, but little else. The rest of the clues come from the context, and the language user's memory of previous related mentions. Sometimes, if more information is needed, a definite description might be used (e.g. the museum), which indicates which type of object the antecedent must be. A good deal of linguistic work has been done on phenomena like these, and a general lesson is that an expression contains only as much information as is absolutely necessary to secure the reference needed. In the work of Clark and colleagues, mentioned above, a type of task often used involves repeated reference to "tangram" arrangements of simple shapes (triangles, lozenges, etc.). Participants may begin with locutions such as "the one that looks like a little old man bent over carrying a large bundle of sticks", but after a very few recurrences this
might become truncated to "the man with the sticks", or conceivably even "him with the sticks". We notice, about this (fictitious) example, two things: simplification occurs radically in the referring expressions, but the relational structure is retained. We conjecture that the basis of this process is the same in language and in graphics, and in an example like this will parallel the outcome in the above discussion of fig. 4, if the task context and the probability of confusion with other items are also sufficiently parallel. Initially, the reference will be complex - verbose, or a complex drawing - in order to secure reference as reliably as possible, but subsequent references will very rapidly decay in complexity to the minimum required in the task context. Moreover, the initial description will highlight distinctive features of the item and make their relationship explicit because this will aid both initial identification and subsequent reidentification, the latter implying that this aspect of the description will tend to be retained through the simplification process. In the light of this conjecture, it emerges that simplification of an iconic symbol is its controlled reduction as the minimal effective way to evoke its denotatum possible in the context. This becomes simpler in a way that depends on parameters of the shared interaction. The final drawing in fig. 1 could not be used at the centre step of the interchange, but the latter can be (and in normal circumstances predictably will be) simpler than the initial drawing. Exactly what parameters govern this reduction process remains a topic for further research. We note, of course, that sometimes the course of the interaction runs less smooth, as in fig. 6. 
Here, it seems likely that after the third drawing there is some reason (of which perhaps the evident lack of convergence is also symptomatic) for the drawer of the fourth to believe that earlier attempts have not secured evocation of the appropriate concept; in which case the highly unusual addition of an element (the TV aerial) can be seen as a repair strategy.[5] Indeed, after this, things seem to work better, and perhaps if the interaction continued further, simplification would occur after all.
[5] The aerial is an element often included by other pairs in this experiment, clearly to help differentiate the drawing from ones used to evoke other confusable concepts on the list to be communicated, such as "microwave" and "computer".

4. CONCLUSION

We conclude this programmatic discussion by reflecting that the dimension of iconic and symbolic representation does not seem a particularly helpful way of thinking about the issues in communication. A drawing can be considered "iconic" if its use is based on a structure-mapping at some level, but the relevant level is a function of what has to be communicated, for what purpose and in what context. Sometimes it will be as complex as the Mona Lisa, sometimes as simple as a Haro sketch. If, in fact, its features are all necessary to the role it plays, then it can be regarded as "replete" in Goodman's sense. In that case, we might defend the claim that all the drawings in the figures in this paper are iconic. But also, something is "symbolic" if it functions as a placeholder to achieve as far as possible acontextual reference to an arbitrarily stipulated item. In this sense, it functions to the same end as the icon in the first drawing, but not in the same way. It seems, then, as if symbolic representation is not, after all, like the extreme case of simplification, because the latter works only in context. This standoff might yet be seen as really a consequence of taking too narrow a view. Actually, no communication occurs out of context - there is always a cultural context, a physical context, and a pragmatic context (as captured in the notion of speech act, for example). While the reduction of an icon occurs and functions within a context for a pair of subjects, when these subjects move on to interaction with others, the reduced icon can take wider hold on a role as a recognised placeholder, and thus increasingly a symbol. The emergence of such group conventions in language has been shown by Garrod and Doherty (1994), and a follow-up experiment to the one described above is already producing evidence of a similar effect in graphics.

It is useful at this point to reflect on the programmes of Peirce and Goodman. Both came from philosophical traditions that emphasise expressions with some degree of permanence: texts in books, and pictures in galleries.
Both are concerned with the different ways these representational systems work, but do not generally address the ephemera of communication in dialogue, whether linguistic or graphical. Such is the case with most philosophical, and much other, theory. Communicative systems are considered that have become part and parcel of a culture through a long process, by the end of which distinctions may hold up that would be difficult to defend at an earlier stage. The usual apparatus addressed towards expressions complete in syntax and semantics, analysed through propositions and truth-functions, is much more difficult to apply in dialogue of any kind. If we have been right here to argue that the distinctions employed in these systems apply, as far as they do, to phenomena that emerge from dialogue but are not so clearly discernible within it, then we should expect that they are of limited help in analysing dialogue itself. In communication, people are profligate in their use of available modalities of expression and will exploit anything to achieve their ends efficiently. If they are restricted - to graphics as in our experiment, to words as on the phone - then they will cope, but if they are less restricted they will
behave differently. In an earlier pilot for the above experiment, pairs were asked to communicate architectural styles by drawing, and although they were not allowed to talk they were able to see each other and could guess freely. Trying to communicate Richard Rogers, one participant frowned and looked blank; quickly the other guessed, and correctly hit on Rogers after one incorrect try. After this, a blank stare was successfully used again by the other participant to communicate Richard Rogers. This story emphasises two points. First, that reduction of graphical expressions is a strategy that is employed because it is available and works in the given situation, but it will be readily circumvented by anything else that works even better; hence, form follows function. Second, that interaction is central, and that simplification, though we have somewhat neglected this point above, is accompanied by convergence. Only if the participants move towards tacit agreement on their communicative strategy will it really succeed. Another way of putting this is that the participants move into alignment. We confidently expect that recent work on a psycholinguistic theory of interactive alignment by Pickering and Garrod (under review) will presently be shown to apply much more generally, both to the above kind of situation and to contexts such as the design dialogues discussed by Neilson and Lee (1994).

The key to this story, then, has been the dynamic of dialogical communication. We observe that this is also characteristic at a higher level and over longer periods of interaction among communities, and we suggested at the outset that perhaps the way ontogeny recapitulates phylogeny is a useful analogy. This is emphasised in Wittgenstein's notion of "language games", and while it may seem rather self-consciously Wittgensteinian to say "the interpretant is use", this nonetheless captures a strong underlying theme in the foregoing discussion.
Somewhat less clearly in that tradition is our suspicion that this dynamic is based on fundamental, perhaps information-theoretic principles, which drive or can be used to describe interactive alignment more generally. But if language games are thought of as a high level of interactive alignment among communities, perhaps the two notions come together after all.
REFERENCES

Clark, H., Schaefer, E.F., 1987. Concealing one's meaning from overhearers. J. Memory Lang. 26, 209-225.
Clark, H., Wilkes-Gibbs, D., 1986. Referring as a collaborative process. Cognition 22, 1-39.
Fay, N., Garrod, S., 2001. The principles of graphical communication: preliminary findings. Unpublished report, Department of Psychology, University of Glasgow.
Files, C., 1996. Goodman's rejection of resemblance. Br. J. Aesthetics 36, 4, 398-412.
Garrod, S., Doherty, G., 1994. Conversation, co-ordination and convention: an empirical investigation of how groups establish linguistic conventions. Cognition 53, 181-215.
Goodman, N., 1969. Languages of Art. Oxford University Press, Oxford.
Greenlee, D., 1973. Peirce's Concept of Sign. Mouton, The Hague.
Gurr, C., 1998. On the isomorphism, or lack of it, of representations. In: Marriot, K., Meyer, B. (Eds.), Theories of Visual Languages. Springer, Berlin, pp. 288-301.
Gurr, C., Lee, J., Stenning, K., 1998. Theories of diagrammatic reasoning: distinguishing component problems. Minds and Machines 8, 4, 533-557.
Krampen, M., 1983. Icons of the road. Semiotica 43, 1/2, 1-204.
Lee, J., 1997. Similarity and depiction. In: Ramscar, M., Hahn, U. (Eds.), Proceedings of the Interdisciplinary Workshop on Similarity and Categorisation (SimCat '97). Department of Artificial Intelligence, University of Edinburgh, Edinburgh.
Lee, J., 1999. Words and pictures - Goodman revisited. In: Paton, R., Neilson, I. (Eds.), Visual Representations and Interpretations. Springer, Berlin, pp. 21-31.
Lee, J., Stenning, K., 1998. Anaphora in multimodal discourse. In: Bunt, H., Beun, R.-J., Borghuis, T. (Eds.), Multimodal Human-Computer Communication. Springer, Berlin, pp. 250-263.
Neilson, I., Lee, J., 1994. Conversations with graphics: implications for the design of natural language/graphics interfaces. Int. J. Hum. Comput. Stud. 40, 509-541.
Schober, M., Clark, H., 1989. Understanding by addressees and over-hearers. Cogn. Psychol. 21, 211-232.
Scholtz, O., 1993. When is a picture? Synthese 95, 1, 95-106.
Studies in Multidisciplinarity, Volume 2
Editor: G. Malcolm
© 2004 Elsevier B.V. All rights reserved.
15

Interlopers, translators, scribes, and seers: anthropology, knowledge representation and Bayesian statistics for predictive modelling in multidisciplinary science and engineering projects

Deborah Leishman and Laura McNamara

Statistical Sciences Group, Los Alamos National Laboratory D-1, Los Alamos, NM, USA
Multidisciplinary projects often lack integrated representations to support a diverse community's problem-solving process. In this chapter, we discuss an interdisciplinary approach to knowledge elicitation, representation and transformation developed in the Statistical Sciences Group at the Los Alamos National Laboratory. This approach is called information integration technology (IIT), and it meshes techniques from cultural anthropology, the AI community, and Bayesian statistics to address the complexities of multidisciplinary research. Specifically, we use elicitation techniques derived from cultural anthropology to elicit tacit problem-solving structures from the "natives" - generally, the scientists and engineers collaborating on difficult R&D problems. The elicited information, in turn, is used to develop ontologies that represent the problem space in the "native language" of the research team but are also more mathematically tractable for the AI and statistical communities. Iterative cycles of representational refinement and quantification lead to the emergence of predictive statistical models that make intuitive sense to all parties: the engineers, elicitation experts, knowledge modellers and statisticians. This method can be used in many types of problems, including reliability quantification as shown here.
1. INTRODUCTION
Statisticians are often asked to provide predictive risk and reliability assessments for a wide range of research and development projects. When these projects are very innovative, however, the statistician may be faced with the dilemma of minimal data for the system under scrutiny. Complicating such situations is the increasing ubiquity of multidisciplinary and multinational research teams: statisticians often find themselves asked to contribute to complex, emergent projects that challenge their ability to build predictive models capable of integrating multiple types of data, information and knowledge from a wide range of sources. In this chapter, we discuss an interdisciplinary approach to knowledge elicitation, representation and transformation developed in the Statistical Sciences Group at the Los Alamos National Laboratory. This approach is called IIT, and it meshes techniques from cultural anthropology, the AI community, and Bayesian statistics to address the complexities of multidisciplinary research. Specifically, we use elicitation techniques derived from cultural anthropology to elicit tacit problem-solving structures from the natives - generally, the scientists and engineers collaborating on difficult R&D problems. The elicited information, in turn, is used to develop ontologies that represent the problem space in the native language of the research team but are also more mathematically tractable for the AI and statistical communities. Iterative cycles of representational refinement and quantification lead to the emergence of predictive statistical models that make intuitive sense to all parties: the engineers, elicitation experts, knowledge modellers and statisticians. In the following pages, we describe the origins and structure of the IIT approach and demonstrate its use in the development of a hierarchical reliability model for a complex rocket system.
The IIT knowledge modelling techniques are of particular interest to Bayesian statisticians, whose problem-solving approach often relies on complex hierarchical networks.
2. A PRIMER ON STATISTICAL CONSULTING AND BAYESIAN HIERARCHICAL MODELLING
Statisticians who work in experimental science and engineering fields become quite adept at consulting with research teams to develop a wide range of probabilistic models for decision-making. Traditionally, statisticians have worked fairly bounded pieces of a larger problem: experimental
design, for example, or failure mode analysis. This trajectory has resulted in a standard model for statistical consulting in which the clients provide the statistical consultant with a problem definition and some data sources that in the statistician's mind lend themselves to a particular class of models. The statistician goes back "over the fence" and works an area of the problem, periodically asking clients to clarify some aspects of the model or to provide additional data. The past 20 years or so, however, have seen a trend towards large-scale, complex, multidisciplinary scientific projects that often incorporate experts from a wide range of disciplines, including engineering, biology, physics, computer science, chemistry and others. The complexity of these problems often requires a greater level of participation from the statistician and also demands a statistical approach capable of combining multiple forms and types of data. Bayesian statistics is one such approach. This relatively new sub-field of statistics was perceived until recently among more traditional "frequentist" statisticians as a radical, controversial, and even untenable approach to estimating probability (Wilson, 2001). Today, Bayesian models are widely used to combine multiple sources of data to estimate the probability of an event in the future, based on relevant information regarding the occurrence of that event in the past. Although Bayesian models are well suited to addressing complex problems, constructing a Bayesian model requires a great deal of time and information about the problem at hand. The IIT approach was designed to address this problem by using conceptual graphs to represent the complex problem space. Because Bayesian models are represented as chain graphs (i.e. nodes connected by arcs), they are remarkably synergistic with conceptual graphs.
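As a rough illustration of how a Bayesian model combines several sources of information about the same event, the sketch below uses a conjugate Beta-Binomial update. This is a standard textbook device and not necessarily the form used in the project described here; the class name `BetaBelief` and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief about a per-trial success probability."""
    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: observed counts are added to the prior pseudo-counts.
        return BetaBelief(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


# Hypothetical inputs, for illustration only.
prior = BetaBelief(alpha=8.0, beta=2.0)                      # elicited expert judgement
after_tests = prior.update(successes=4, failures=1)          # component bench tests
after_flight = after_tests.update(successes=1, failures=0)   # one earlier flight

print(prior.mean, after_tests.mean, after_flight.mean)
```

Each new source simply shifts the pseudo-counts, so expert judgement and sparse test data end up expressed on the same scale.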
3. INFORMATION INTEGRATION TECHNOLOGY FRAMEWORK
The diagram shown in fig. 1 outlines the IIT framework, which we use to derive qualitative knowledge models of a domain of interest, and transform these knowledge models into quantitative mathematical models such as Bayesian networks. The framework specifies the context in which these models are being formulated: for example, a decision-making environment in which they will be used to predict the reliability or performance of a system. IIT methods and the IIT framework are designed to support the emergence of a comprehensive, quantitative decision support model through developing
a set of knowledge representations that serve as a common denominator for all problem owners. In a complex system reliability problem, "problem owners" may include engineers, program managers and sponsors, computer scientists, physicists, technicians, and other experts contributing to the problem. IIT requires the ongoing involvement of a knowledge modeller, who acts as a translator working iteratively among the problem owners, technical experts and consultant statisticians. The resulting graphical models provide a comprehensive, nuanced representation of the problem space. These representations are arranged hierarchically in interlinked levels of abstraction, the highest of which provides problem owners with an overview of the entire problem space. The hierarchy of specification enables project participants to drill more deeply into important areas of the problem while maintaining a consistent logical structure throughout all levels of problem representation. The first stage in the IIT method is elicitation of the foundation elements: identifying the communities of practice and/or stakeholders involved in the problem, defining the problem space and the decisions that are to be made by all stakeholders, and documenting the relationship between the stakeholders' objectives and their decisions. Once the problem space is defined, the knowledge modeller begins to work with experts to elicit the conceptual structures they use to work the problem. Using this elicited information, the knowledge modeller develops graphical representations of the problem space using those elements. The visual representations used in the IIT method are derived from conceptual graph techniques pioneered by John Sowa (1984). As these qualitative representations emerge, the knowledge modeller works iteratively with the problem owners, experts and consultant statisticians to formulate the dependencies between concepts in the knowledge model.

Fig. 1. The information integration technology framework. (Diagram labels: Objectives, Decisions, Problem Definition, Problem Refinement, Data Sources, Communities of Practice, Decision Making.)
Once these qualitative representations have been finalised with the experts, the knowledge modeller and the statistician begin transforming them into mathematical models.
The resulting mathematical framework is an extremely useful structure capable of combining multiple types of quantitative information to support decision-making in a traceable manner. Doing so requires identifying appropriate data sources to populate nodes in the model, transforming these data into joint probability distributions, and propagating these distributions and their associated uncertainties through the model.
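The idea of propagating distributions and their uncertainties through a model can be sketched with a simple Monte Carlo simulation. The example below is only an assumption-laden illustration: it treats three hypothetical nodes as independent and in series (the system works only if every node works), which is far simpler than a full Bayesian network. The function `simulate_system_reliability`, the node names and the Beta parameters are all invented for this sketch.

```python
import random


def simulate_system_reliability(node_params, n_draws=20000, seed=0):
    """Draw each node's reliability from its Beta distribution and multiply
    the draws, treating the nodes as an independent series system."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        system = 1.0
        for alpha, beta in node_params.values():
            system *= rng.betavariate(alpha, beta)
        draws.append(system)
    draws.sort()
    mean = sum(draws) / n_draws
    # Empirical 5th and 95th percentiles summarise the propagated uncertainty.
    return mean, (draws[int(0.05 * n_draws)], draws[int(0.95 * n_draws)])


# Hypothetical nodes and Beta parameters (not taken from the RDP model).
nodes = {"booster": (45.0, 5.0), "payload": (18.0, 2.0), "separation": (28.0, 2.0)}
mean_rel, (low, high) = simulate_system_reliability(nodes)
print(f"mean {mean_rel:.3f}, 90% interval ({low:.3f}, {high:.3f})")
```

The interval, rather than the point estimate alone, is what makes the statement of risk traceable back to how much data each node actually had.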
4. CONCEPTUAL GRAPHS
The conceptual graph model proposed by John Sowa (1984) is a method of representing the mental models that people use to understand the world. This approach combines a mapping to and from natural language with a mapping to logic. A conceptual graph, which consists of concepts and relations connected by arcs, asserts a proposition and takes the form of a finite connected bipartite graph. Concepts represent any entity, attribute, action, state or event that can be described in natural language. Relations detail the roles that each concept plays, and the arcs serve as connectors between the two. These graphs can be written in either a graphical representation or in a linear form to conserve space.
4.1. Simple graphs
This section presents parts of the conceptual graph model that form a central core. This includes concepts, relations and the arcs between them. Central to the model is the ability to map the graphs into first-order predicate calculus. An example of a simple graph is:

[Cat: #123] -> (State) -> [Sit] -> (Location) -> [Mat]    (1)

which represents "A cat named 123 is sitting on a mat".
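One way to hold such a graph in a program is sketched below. It is a minimal illustration and not an implementation of Sowa's full formalism: the class names `Concept` and `Relation` and the helper `linear_form` are hypothetical, and the helper only handles a chain of two-arc relations like the example above. Because relations attach only to concepts, the structure is bipartite by construction.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Concept:
    type_label: str
    referent: Optional[str] = None   # None means no particular individual is named

    def __str__(self) -> str:
        if self.referent is None:
            return f"[{self.type_label}]"
        return f"[{self.type_label}: {self.referent}]"


@dataclass(frozen=True)
class Relation:
    label: str
    args: Tuple[Concept, ...]   # concepts attached to this relation, in arc order

    def __str__(self) -> str:
        return f"({self.label})"


def linear_form(relations):
    """Render a chain of two-arc relations in linear notation."""
    parts = [str(relations[0].args[0])]
    previous = relations[0].args[0]
    for rel in relations:
        source, target = rel.args
        if source != previous:
            raise ValueError("relations do not form a chain")
        parts += [str(rel), str(target)]
        previous = target
    return " -> ".join(parts)


cat, sit, mat = Concept("Cat", "#123"), Concept("Sit"), Concept("Mat")
graph = [Relation("State", (cat, sit)), Relation("Location", (sit, mat))]
rendered = linear_form(graph)
print(rendered)  # [Cat: #123] -> (State) -> [Sit] -> (Location) -> [Mat]
```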
4.1.1. Concepts and relations

Concepts represent the entities, attributes, actions, states or events found in natural language. In conceptual graph notation, they are shown as square boxes. A concept box has a referent field on the right of the colon. In this way, both generic concepts and particular individuals can be referred to. For example, [Person: *] or [Person] both refer to the generic concept, while [Person: #123] or [Person: Sam] refer to particular individuals,
one named Sam and one named 123. Every generic concept in the graph terminology is existentially quantified. Generic concepts act like variables in logic, while individuals are like constants in logic. Relations in the conceptual graph model specify the role a concept plays and define the relationship between concepts. Relations are shown as circles in the graph notation and can have any number of arcs. For example, (Past) is a monadic relation with one arc, (Agent) is a dyadic relation with two arcs and (Between) is a triadic relation requiring three arcs.
4.1.2. A logical mapping
The conceptual graph model defines the operator $\phi$, which maps simple conceptual graphs into formulas in the first-order predicate calculus. For these simple graphs, the only logical operators needed are conjunction and the existential quantifier. For example, the conceptual graph (1) maps into the following formula when the $\phi$ operator is applied:

$$\exists x\, \exists y\, \bigl(\mathrm{Cat}(\#123) \land \mathrm{State}(\#123, x) \land \mathrm{Sit}(x) \land \mathrm{Location}(x, y) \land \mathrm{Mat}(y)\bigr)$$

Conceptual graphs are usually more concise than logical formulas because arcs on the graphs show the connections more directly than variable symbols.
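The mapping can be sketched in a few lines of code. The function `phi` below is a hypothetical, simplified rendering for simple graphs only, under the usual reading of the operator: named individuals become constants, each generic concept receives a fresh existentially quantified variable, every concept box contributes a one-place predicate, and every relation contributes a predicate over the concepts it links. The dictionary layout and variable naming are assumptions made for this example.

```python
from itertools import count


def phi(concepts, relations):
    """Map a simple conceptual graph to a first-order formula (as a string).

    concepts:  dict node_id -> (type_label, referent or None)
    relations: list of (relation_label, [node_id, ...])
    """
    fresh = (f"x{i}" for i in count(1))
    # Individuals become constants; generic concepts get a fresh variable each.
    term = {nid: (ref if ref is not None else next(fresh))
            for nid, (_, ref) in concepts.items()}
    variables = [term[nid] for nid, (_, ref) in concepts.items() if ref is None]
    atoms = [f"{label}({term[nid]})" for nid, (label, _) in concepts.items()]
    atoms += [f"{label}({', '.join(term[n] for n in args)})"
              for label, args in relations]
    prefix = "".join(f"∃{v} " for v in variables)
    return f"{prefix}({' ∧ '.join(atoms)})"


concepts = {"c": ("Cat", "#123"), "s": ("Sit", None), "m": ("Mat", None)}
relations = [("State", ["c", "s"]), ("Location", ["s", "m"])]
formula = phi(concepts, relations)
print(formula)
# ∃x1 ∃x2 (Cat(#123) ∧ Sit(x1) ∧ Mat(x2) ∧ State(#123, x1) ∧ Location(x1, x2))
```

The output differs from the formula in the text only in variable names and the order of the conjuncts.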
4.2. Compound graphs
Compound conceptual graphs allow for the expression of more complex sentences than can be described using simple graphs. The components comprising compound graphs are discussed in this section and include nested propositions and co-reference links. Tense, modality and negation can also be represented in conceptual graphs. Figure 2 shows an example of a graph that contains most of these elements.
4.2.1. Propositions
A proposition is a concept whose referent is a set of conceptual graphs that are being asserted. The graphs being asserted are said to occur in the context of that proposition, thus propositions are also referred to as context boxes. Propositions can be nested inside of one another and proposition is the default label for a box that has no other type label. Conjunction of two or
more graphs is represented by drawing all the graphs inside a proposition. Figure 2 contains three nested propositions.

Fig. 2. A conceptual graph of a complex sentence: "Sam thinks that the house has a kitchen and that Ivan believes that there is a cat in the kitchen". (Concept boxes legible in the figure: Person: Sam, Think, Proposition, House, Kitchen, Person: Ivan, Believe, Cat; relations legible: Expr, Part.)
4.2.2. Co-reference links
Co-reference links in conceptual graphs show which concepts refer to the same entities within a graph. In a sentence, these links are expressed as pronouns or other anaphoric references. Figure 2 shows co-reference links using dashed undirected lines. These co-reference links are also referred to by Sowa (1984) as lines of identity and denote an equality relation between concepts. In fig. 2, for example, the phrase, "Sam thinks that the house has a kitchen", refers to a house that Sam already knows about.
5. THE CONTEXT: THE ROCKET DEVELOPMENT PROGRAM CENTER
To illustrate the application of the methods we have developed, we use examples from a research and development program that gathers data on test rockets to analyse their performance during flight and to make modifications to their design as necessary. Throughout this discussion, all engineers and agencies are aliased to maintain controls over proprietary and sensitive information about the program. We refer to this program as the RDP, or the Rocket Development Program. The overseeing agency for the RDP is a group of engineers located in the south-eastern United States; we refer to them as the Rocket Development Program Center (RDPC). Two other groups of engineers are responsible for building separate sections of the rocket: one group is building a booster to send the rocket into the upper atmosphere, while the other designs a test payload for the rocket to carry. In addition, several other sub-contractors and vendors provide parts and support to each of the two primary engineering agencies. RDPC is primarily responsible for project management, cost controls, and scheduling.

The RDPC program managers came to Los Alamos with a specific problem: how does one develop a predictive reliability model for an engineering system that is still in the design stages? Multiple concerns drove this question. The rocket development program is extremely expensive. Only one or two prototypes are built and flown, and each is usually destroyed in the process; rarely are the engineers able to salvage subsystems for reuse in further iterations of the program. Because each system flown is unique, there is little direct performance or reliability data available for parts or subsystems on the test rocket. Hence the program managers had little idea how to make predictions or assess risk areas for the flights. The goal of the LANL/RDP collaboration was to develop an integrated, full-system, predictive reliability model for an upcoming rocket flight.
In developing the model, Los Alamos developed a model framework that captured the critical interactions among the rocket's subsystems during flight. We also elicited and documented the many sources of data and information that the engineers used to build confidence in their rocket before flight. The resulting model combines multiple sources of information in a rigorous, quantitative framework that can be used to identify and weigh potential risk areas to overall mission "success".
6. BUILDING A MODEL

6.1. Engineering representations
The contracting engineers in charge of developing the rocket are prolific creators of representations: mechanical drawings, electrical layout diagrams, interface control documents, reliability block diagrams, viewgraphs for debating design issues. Not surprisingly, many of these engineers expressed doubt about the utility of creating even more diagrams of their systems. However, while their representations were sufficient for building a test rocket, they were not sufficient for creating a statistical reliability model. For one thing, engineering drawings - like all representations - are locally meaningful mediums of expression that require experiential knowledge to be sensible to the viewer. Hence it can be quite difficult for a non-member to decode the representations created by a community of engineers one has only recently met. The design and development process that the contracting engineers follow compounds this problem. As anthropologist Etienne Wegner (1998) has observed, problem solving is a process of devising representations of knowledge around which parties negotiate meaning. Like many engineering communities, the two primary contractors in the RDPC project each assign bounded teams of engineers to work on separate subsystems of the rocket. Engineering representations are used to communicate design requirements across team boundaries. Each iteration results in new, updated representations that capture the current state of knowledge about each of the subsystems required for a functioning rocket. However, at no point in the engineering problem-solving process does the community develop an integrated representation of the rocket's many subsystems as they are intended to work during flight. Indeed, demonstration of the successful integration of the community's many "ways of knowing" only takes place once the rocket is in flight.
To develop a reliability model as a Bayes net, however, the statistician must understand relationships among different elements of the rocket as it works during flight. This is where knowledge modelling becomes a critical step in creating an integrated model, one that captures subtle dependencies among interrelated parts and uses those dependencies to predict states for the overall mission.
6.2. Defining project goals and identifying adviser-experts
The first step in the IIT knowledge modelling process was to meet with the RDP project leaders to identify specific goals for the rocket system, to get an
overview of how the rocket would function, to find out which contractors were responsible for the major areas of the project, and to determine the metrics that the RDP project leaders would use to assess the project's outcomes. At the same time, we devised a general set of goals for the statistical model: to support the rocket project by identifying risk areas, and to provide a quantifiable, traceable statement of risk to upper-level managers in RDPC. It is impossible to meet project goals without the cooperation of the project's experts, and this requires identifying cooperative insiders who can act as adviser experts to the knowledge modelling team. To ensure the participation of adviser experts throughout the project, RDPC instructed the lead engineer in each contracting organisation to support the model building effort. RDPC also provided funding to these agencies so that they could pay their staff to contribute to the model development. These individuals would serve as adviser experts within the contract organisations: insiders who would willingly partner with the knowledge modeller to identify other experts and to develop sound elicitation protocols and instruments.
6.3. Scratchnets and success and failure for system builders
Once we had met with the adviser experts in each of the contracting agencies and explained the goals of the project, the next step was development of a formal ontology to represent the primary concepts of knowledge in the problem space, and to understand the network of relationships among those concepts. During our first series of meetings with adviser experts from RDP, the booster contractor, and the payload contractors, we elicited information using a scratchnet (Paton et al., 1994). Scratchnets are straightforward, non-hierarchical node-and-arc drawings that simply identify concepts as related to a specified domain. In addition to developing a scratchnet representation of the problem, we also worked with the problem owners to elicit definitions of success and failure for the RDPC program managers. We borrowed a common aerospace terminology for describing mission outcomes: a "stoplight chart", which is perhaps more accurately described as a continuum of failure-to-success, represented by red, yellow and green panels. Equally important in this stage was eliciting how the booster and payload builders defined success and failure, so that we could understand how their goals interlocked with RDPC's goals. We used the same stoplight continuum in elicitation sessions with our adviser experts at each agency. All stoplight charts were ultimately combined into a single chart, with all mission goals and states for mission
outcomes clearly mapped. In addition, we worked with RDPC to elicit metrics that would determine each of the states for mission success and failure, while eliciting metrics for subsystem performance from the adviser experts at each contracting agency. This information provided the statisticians with a means of quantifying a range of potential outcomes for each of the subsystems in the rocket, and a way to quantify overall mission success and failure.
6.4.
The top level ontology
Iteratively refining the scratchnets and the success-failure continua is a learning process for the knowledge modellers and leads to the development of a first-order ontology, one that maps at the most basic level the key concepts for the domain "RDP-2 rocket" and the relationships among those concepts. In the ontology shown in fig. 3, we use a conceptual graph representation, with concepts as rectangular nodes, relations as circular nodes, and arcs indicating directionality among concepts and the relationships that tie them together. The diagram is black and white, but in the actual ontology, the concept boxes were
colour coded to ensure that specifications of concepts in later drawings were linked to the correct conceptual category. This representation is also recognisable to Bayesian statisticians, who use directed acyclic graphs as structures for propagating uncertainties. Note that the ontology differentiates between two stages in the design process: "design time", when the engineers are working to plan and build the rocket; and "run time", which represents the actual functioning of the rocket during flight. Essentially, the knowledge modeller partnered with the engineers in the design time area of the ontology to create a statistical model that would be used to predict the reliability and performance of the rocket system during "run time", the actual flight. Information generated during the design process in the design time area of the ontology was used to create a model structure and to gather data to populate the model. The top-level ontology is a significant point in the IIT method, for it is an elicitation tool that provides a guide for specifying further levels of the domain. In the rocket project, the ontology revealed key focus areas: for example, what functions were required in order for a particular event to occur? What parts were required for that function to occur? How could failures in individual parts contribute to failed events? During the elicitation process, the ontology also guides the development of a hierarchy of representations for the problem, from the most general and abstract representation (the top-level ontology) to the most specific representations (dependency diagrams that detail specific relationships among parts, subsystems and functions). One critical outcome for the representations is traceability from level to level, so that the representations flow in an orderly fashion from the ontology and make intuitive sense to all parties: the knowledge modellers, the statisticians, RDPC, and the builders of the booster and the payload.

Fig. 3. Ontology for RDP-2 rocket model. [Legible node labels: Rocket Development Program, Events, Functions, Space/Time, States, Metrics, Missile, Subsystems, Mechanisms, Parts.]
6.5.
Ontology specification: event dependencies
Once the top-level ontology was completed, we were ready to begin developing specific representations of its concepts. The first level of specification focused on identifying measurable flight-time events that would act as conceptual waypoints, to make the linear flow of the planned rocket trajectory into a discrete series of measurable focus areas for the model. Significantly, the order in this representation of flight events was not a time-ordered linear sequence, but rather a sequence of dependencies as shown in fig. 4. In other words, this level specified the order in which any particular event during the flight could impact, or be impacted by, any other event.
Fig. 4. Specification of inter-event dependencies for rocket flight. [Legible node labels: Ignition; Boosted Flight/Trajectory; Separation Event 1; Separation Event 2; Apogee; Reentry; Experiment Deployment Event; Camera A Deployment; Camera B Deployment; Camera A Data Collection; Camera B Data Collection.]
Using the success and failure chart in combination with the event specification, the RDP staff could heuristically begin to relate overall mission success to states for any single event, by asking how a red, yellow or green state for a particular event might impact subsequent flight events.
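The distinction between a time-ordered sequence and a sequence of dependencies can be sketched in a few lines of Python. The event names below are taken from fig. 4, but the dependency edges are purely illustrative assumptions, since the actual arcs of the diagram are not reproduced here; the function names are likewise hypothetical. The sketch shows how a dependency structure yields both an evaluation order and the set of events that a given event could affect.

```python
from graphlib import TopologicalSorter

# Each event maps to the set of events it depends on.
# ASSUMPTION: these edges are invented for illustration; only the
# event names come from the inter-event dependency diagram.
DEPENDS_ON = {
    "Ignition": set(),
    "Boosted Flight": {"Ignition"},
    "Separation Event 1": {"Boosted Flight"},
    "Camera A Deployment": {"Separation Event 1"},
    "Camera A Data Collection": {"Camera A Deployment"},
    "Apogee": {"Separation Event 1"},
}


def evaluation_order(depends_on):
    """Order events so that each one follows everything it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


def downstream_of(event, depends_on):
    """Collect every event that could be affected by the given event."""
    affected = set()
    frontier = [event]
    while frontier:
        current = frontier.pop()
        for candidate, parents in depends_on.items():
            if current in parents and candidate not in affected:
                affected.add(candidate)
                frontier.append(candidate)
    return affected


order = evaluation_order(DEPENDS_ON)
impacted = downstream_of("Separation Event 1", DEPENDS_ON)
```

Under these assumed edges, asking what a red state at "Separation Event 1" might affect returns the camera events and "Apogee", which is the kind of heuristic question described above.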
6.6.
Functional, subsystem-part, and series-parallel specifications
The next stage in specifying the full ontology was to focus on each flight event and begin identifying key parts, subsystems and functions. Working with the subsystem engineers, we created the next three levels of specification for each event: a functional diagram that detailed only the functions required for an event; a subsystem-part diagram that broke subsystems into collections of parts; and a modified series parallel diagram that specified the order in which parts in a subsystem work together to perform a function. For each event displayed on the inter-event dependency diagram, we created a representation to detail relationships between functions and events. For example, fig. 5 details the functions that the booster must execute in order for the first stage of the flight to occur. Note that the representation says nothing about the state (red, yellow, green) of the functions, or the event itself: the functional drawing simply relates functions to other functions and ultimately to the event, "boosted flight".
[Diagram for fig. 5, titled "Boosted flight: primary functions". Legible node labels: Ignition; TR Boosted Flight/Trajectory; TR Flight; Data Collection/Vehicle Tracking; Data Collection 1; Data Collection 2; Vehicle Tracking; Camera Data Collection; Vehicle Guidance, Navigation, Control; Propulsion; Attitude Control.]
Fig. 5. Functional view of event, "boosted flight".

The representation above identifies two primary functions for "TR Flight". These functions include "Data Collection/Vehicle Tracking", and boosted flight, which are themselves broken into several sub-functions. These sub-functions, in turn, can be further specified by the parts and subsystems involved in their performance. Note that in the drawing, the event TR Flight depends not only on a set of nested functions, but also on a previous event in the trajectory, "Ignition". Given that a rocket flight is an enormously complex set of dependencies, one of the convenient things about this type of representation is that it allows the knowledge modeller to detail only the functions specifically required for the event in question. In other words, while a boosted flight of course depends heavily on what happens during Ignition, those ignition-related functions are detailed in a set of representations for the Ignition event and do not need to be re-drawn for "boosted flight".

Not shown in this chapter are the next two levels of representational abstraction. Subsystem-part representations are graphical inventories of specific parts and the subsystems that house them. It is important to point out that this view provides no information about how any constellation of parts performs a function, but rather identifies how specific parts are grouped into subsystems. This is important because a function is often produced by individual parts in separate subsystems working simultaneously. This diagram is less a representation for the statistical model than it is a "laundry list" that the knowledge modellers and the engineers use to ensure that all parts are properly grouped into their respective subsystems.
Process knowledge is specified in the next stage of abstraction, a series parallel diagram that locates parts within a subsystem and displays the order in which parts function with each other to perform a given function. Most engineering drawings tend to be structural in nature, not functional; in other words, they display connections among parts, rather than describe how parts work together to perform one or more functions. Although we realised that a functional view of the system would be critical for developing any kind of predictive model of rocket performance, that knowledge was not only tacit; it was also distributed across numerous individual engineers. Hence, it was necessary to elicit and represent this information using the functional and structural specifications described above. This stage marked the beginning of the transition from an engineering understanding of the system to a statistical dependency model that could be quantified and populated with available data to make predictions about the rocket in flight. The series parallel diagram was the first step in this transition. This type of drawing is somewhat similar to a series parallel diagram exemplified in a classic reliability block diagram, but with a great deal more descriptive information. Block diagrams simply connect parts to parts in the order that they must perform so that a given phenomenon occurs. The series parallel diagrams we developed followed the structure of a reliability block diagram but contained a great deal more information about the context of a particular part and its functions.
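For readers unfamiliar with the classic reliability block diagram mentioned above, the standard series and parallel calculations can be sketched as follows. This is the textbook calculation under the assumption that blocks fail independently; it is not the project's model. The reliability values and variable names are hypothetical and chosen only for illustration.

```python
from math import prod


def series_reliability(reliabilities):
    """All blocks must work: multiply the individual reliabilities."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """At least one block must work: 1 minus the chance that all fail."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# ASSUMPTION: invented numbers for one battery feeding two redundant
# components; independence of failures is assumed throughout.
battery = 0.99
redundant_pair = parallel_reliability([0.90, 0.90])
system = series_reliability([battery, redundant_pair])
print(f"redundant pair: {redundant_pair:.4f}, system: {system:.4f}")
```

A block diagram of this kind carries only the structure needed for the calculation, which is why the series parallel diagrams described above had to add contextual information about each part and its functions.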
7.
DEPENDENCY DIAGRAMS: FROM KNOWLEDGE MODELLING TO BAYES NETS
7.1.
Dependency diagrams: roll control
Although different kinds of series parallel diagrams provide a wealth of information about how parts and subsystems and functions are linked to events on the rocket trajectory, these diagrams are not sufficient for building a Bayes net. This is because Bayes nets represent dependencies among their elements: given what I know about one node in a model, what might I be able to say about nodes whose states depend on that event? The final stage in the knowledge-modelling process, then, is to transform the series parallel diagrams into dependency diagrams. The difference between the two is subtle, but critical: series parallel diagrams specify the linkages among parts related to a function and imply some order to those parts: for example, a power function might be described as, "Battery A feeds power to a PTS, which sends a current to the following electrical components:...".
A dependency diagram, on the other hand, describes that same power function as dependent on the performance of Battery A and the PTS, and how downstream components' performance is (at least partially) dependent on that power function. The most immediate difference between a basic series parallel diagram and a dependency diagram is that subsystems are not represented in the latter. This is because subsystems simply designate the geographical location of parts within the rocket; dependencies exist between their parts and one or more functions. Strictly speaking, no functions depend on a subsystem; however, many functions may depend on the individual parts within a subsystem. In a dependency diagram, we are concerned with specifying three types of information: how functions depend on one or more necessary parts, how the performance of a particular part depends on a particular function (recursive relationships), and how parts may provide redundancy (part A or part B is necessary for function X) or single points of failure (part A and part B are necessary for function X). These relations among parts and functions specify the dependency structure for a Bayes net.
7.2.
Roll control: an example of a Bayes net
The final transition occurred when the dependency diagram was turned into the Bayes net structure. The diagram shown in fig. 6 is a Bayes net, extracted from the larger rocket model. The statistician built it using the dependency diagram developed in the previous stage. The initial translation can be performed easily from the dependency diagram to the Bayes net, although the knowledge modeller and the statistician do work together to check the Bayes net and ensure that the statistician has specified the right dependencies, labelled the functions and parts correctly, and indicated the proper directionality in the relationship arcs. The Bayes net is a highly distilled version of the dependency diagram: it eliminates all relationship labels and, at the level shown in fig. 6, offers no information about subsystem location for any of the parts. Population of the model occurs in later iterations, using the series parallel diagrams for failure (to designate a range of states for each of the part and function nodes), the stoplight charts (to designate states for the mission events), and the series-parallel data diagrams (to identify sources of data for each part and its associated failure modes). The model generates a probability distribution for each event in the inter-event dependency diagram, as well as a final probability distribution for states red, yellow, and green for the entire mission. In addition, the Bayes net allows the user to trace sample paths for different solutions through the states of each node, so that it is possible to connect a given outcome for the entire system to the state of any particular node.
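To make the idea of propagating and tracing states concrete, the following is a minimal sketch, not the project's model. It assumes three hypothetical parts with invented probabilities, two states per part rather than a range of states, independent part failures, and a deterministic dependency in which function X needs part C and at least one of the redundant parts A or B. A real Bayes net would use conditional probability tables in place of the deterministic rule. All names and numbers are assumptions made for illustration.

```python
from itertools import product

# ASSUMPTION: invented prior probabilities that each part works.
PART_WORKS = {"part_a": 0.90, "part_b": 0.90, "part_c": 0.98}


def function_x_works(state):
    """Assumed dependency: part C AND (part A OR part B)."""
    return state["part_c"] and (state["part_a"] or state["part_b"])


def enumerate_states(part_works):
    """Yield (state, probability) for every joint state of the parts."""
    names = list(part_works)
    for values in product([True, False], repeat=len(names)):
        state = dict(zip(names, values))
        probability = 1.0
        for name in names:
            p = part_works[name]
            probability *= p if state[name] else 1.0 - p
        yield state, probability


def probability_function_works(part_works):
    """Marginal probability that function X works."""
    return sum(p for s, p in enumerate_states(part_works) if function_x_works(s))


def posterior_part_failed(part, part_works):
    """P(part failed | function X failed): trace an outcome back to a node."""
    joint = sum(p for s, p in enumerate_states(part_works)
                if not function_x_works(s) and not s[part])
    evidence = sum(p for s, p in enumerate_states(part_works)
                   if not function_x_works(s))
    return joint / evidence


p_works = probability_function_works(PART_WORKS)
p_c_given_failure = posterior_part_failed("part_c", PART_WORKS)
```

With these invented numbers, the single point of failure (part C) accounts for most of the probability of a failed function, while either redundant part alone accounts for less; this is the kind of connection between a system outcome and the state of a particular node that the text describes.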
Fig. 6. Fragment of a Bayes net representation for roll control. [Legible node labels: RCAS, ECU, Heat Shield.]
8.
CONCLUSION
Multidisciplinary projects often lack integrated representations to support the community's problem-solving process. It is frequently difficult for project insiders to develop these representations: for one thing, they are focused on meeting the project's goals. More subtly, insiders often have a great deal of local knowledge about a specific area within a project, but may have difficulty leveraging that into a global view of the problem. Anthropologists and knowledge modellers, on the other hand, are trained to elicit this information and can draw on a wide range of representation techniques to create useful abstractions of the project area. An interdisciplinary approach to knowledge modelling, one that combines techniques from anthropology, artificial intelligence, and knowledge representation, is particularly helpful in situations where problems are undergoing definition, are emergent, and that involve multiple players from different disciplines and/or geographical locations. When such modelling techniques are paired with quantitative tools from statistics, it becomes
possible to develop complex models that can, among other things, enable the integration of multiple, diverse sources of data to estimate performance without testing. Other research-related applications that we are exploring include development of models to quantify the value of an experiment without testing, to estimate the probability that an invader into a secure facility will be interdicted, and to quantify production requirements as a new consumer product is undergoing design and development.
REFERENCES
Paton, R.C., Lynch, S., Jones, D., Nwana, H.S., Bench-Capon, T.J.M., Shave, M.J.R., 1994. Domain characterisation for knowledge based systems, Vol. 1, Proceedings of A.I. 94 Fourteenth International Avignon Conference, pp. 41-54.
Sowa, J., 1984. Conceptual Structures. Addison Wesley, Reading, MA.
Wenger, E., 1998. Communities of Practice: Learning, Meaning and Identity. Cambridge University Press, Cambridge.
Wilson, G.D., 2001. Articulation theory and disciplinary change: unpacking the Bayesian-frequentist paradigm conflict in statistical science. PhD Thesis, Department of English, New Mexico State University, Ann Arbor.
Studies in Multidisciplinarity, Volume 2 Editor: G. Malcolm © 2004 Elsevier B.V. All rights reserved.
16
Developments in the use of a visual metaphor with reference to clinical problems
C. A. Lund (a) and R. C. Paton (b)
(a) Regional Department of Psychotherapy, Newcastle City Health Trust, Newcastle upon Tyne, UK
(b) Department of Computer Science, The University of Liverpool, Liverpool L69 3BX, UK
A visual metaphor for articulating a number of key psychoanalytic concepts has been discussed in relation to experiences that may be encountered in clinical settings. This model is extended to include a number of specific cases that may be encountered by a clinician. The present discussion examines two examples, containment in a General Practitioner (GP) consultation, and an account of the evolution of phobias. Some limitations and developments to the model are considered in the concluding section.
1.
INTRODUCTION
At the first VRI Conference, we presented a visual metaphor that sought to clarify the relationships between a number of psychoanalytic concepts which had varied greatly in their meanings over the 100 years of their existence (Lund and Paton, 1998). The concepts that we were concerned with were in three linked pairs: transference and counter-transference (Sandler et al., 1979), projection and identification (Ogden, 1982), container and contained (Bion, 1962; Meltzer et al., 1982; Scharff and Scharff, 1998). We described how these were characterised as being linked through their role in what can be described as the "psychic metabolism" of phantasies and affects. Our discussions had led us to move from the narrative textual descriptions of these interactions to a visual representation in the form of a hexagonal tube (fig. 1). The side plates of this tube could be taken to represent the
cardinal relationships of Mother, Father, Siblings, Self, Body and Culture. We indicated also that each plate could be regarded as the site of transmitters and receivers of phantasies and affects. These phantasies and affects not only passed between plates in a cross-sectional contemporaneous sense, but also along the length of the tube in a diachronic sense. This latter mode indicates the influence of past icons of memory and/or feelings upon current phantasies or percepts and feelings (Lund and Paton, 1998).

Fig. 1. The hexagonal tube model. [Legible plate labels: mother, father, culture, siblings, child.]
2.
DEVELOPMENTS OF THE HEXAGONAL TUBE MODEL
Since we first presented and published these ideas, the model has been modified in two respects. The first modification followed the recognition that the plate labelled "Self" could be better labelled as "Child". Certainly in terms of early development, it made more sense of Mother, Father, Sibling and Body interactions with that plate. More importantly, it raised the intriguing insight that the self could better be regarded as the sum of the complex of plates and the inter-weaving of affects and phantasies between them. This avoidance within the visual metaphor of an entity that implied a concrete locus for the self obviated any sense of, to reverse a cliché, "a machine within the ghost". The second change has facilitated the prospect that the visual metaphor of the hexagon can be extended from the psychoanalytic field into the wider domains of clinical work and psychiatry, the focus of this paper. This second
change, if change it be, was the appreciation that each personal hexagon is surrounded, honeycomb-like, by the hexagons of others. This meant that each plate could be viewed as having a corresponding plate in apposition (fig. 2).

Fig. 2. Hexagon with surrounding plates.

The exchanges between these adjacent plates could be visualised as in fig. 3 where we have taken as an example a mother worried by the illness of her child who seeks and receives containment by a cultural representative, such as her GP. Containment takes place when there is a meeting of minds around an emotive issue. In this example, a mother's mind is filled with images
and phantasies of what might happen to her sick child. These images are formed out of memories of her own and others' past experience, media representations (both visual and oral) and other people's concerns about her child. All these phantasies will be accompanied by feelings of fear, hope, anxiety, anger and disbelief. The GP, drawing on his or her own personal and professional experience, is likely to share something of the mother's phantasies and feelings: enough, that is, to identify with them and thereby to empathise. Nonetheless, he or she can also call upon the icons of memories of past similar cases and clinical texts accompanied by reassuring feelings of familiarity with the problem and a sense of knowing what to do. It is this meshing of the varied mental life experience, as expressed in phantasies and feelings, of the mother and the GP working together to combat the terrors of the situation that constitutes containment.

Fig. 3. Illness distress containment. Cross-section. [Legible labels: Culture plate of GP; Mother plate of mother; Child plate of mother; Child plate of child; illness, distress; contained illness or distress.]

The diachronic aspects of the transactions can be better appreciated in longitudinal section (fig. 4). What needs to be stressed here is that the visual metaphor is not attempting to represent the matter-of-fact appraisal of the mother of her sick child and the matter-of-fact things that need to be done. Rather, what it can address are the feelings and feared scenarios that well up in the mind of the mother, side by side with the factual. What the longitudinal section in particular illustrates is how the fears of her 7-year-old child resonate with her own fears when she was a 7-year-old. If these fears were initially more or less contained by her experience of her calming mother at 7 years of age, then that package of contained phantasy and affect can be transmitted forward to resonate with the current mother's Mother plate. Women who have experienced such appropriate maternal containment are likely to approach their GP in anxious, but adult mode. Other women may have
been less fortunate: their mothers may not have contained their anxiety as 7-year-olds, and may indeed have compounded things by panicking and speaking of awful consequences. When they consult their GP, in addition to their expectable anxiety, they may also bring their enacted experience of themselves as an uncontained, panicking 7-year-old. In everyday parlance, they might be described as being childish or hysterical.

Fig. 4. Illness distress containment. Longitudinal section.

From this it can be seen that the task of the GP in each of these two situations is quite different. In the first, the GP has only to contain the anxiety of a woman in her late twenties. That degree of containing function then supports the mother to cope both practically and emotionally in an age-appropriate way. However, in the second case, the GP will have a variety of tasks. First, he or she will have to recognise, control and contain his or her feelings of irritation aroused by the outpourings of the doubly distressed mother. The GP will then have to try to contain the 7-year-old's experience within the panicking mother sufficiently to move her on, if possible, into more adult mode. If that is successful, then he or she can work with her as in the case of the more contained mother. If that is not successful, then the GP will have the delicate task of working with the mother so far as is possible, but arranging for Health Visitors or Nurses to augment the practical and emotional mothering process. We have explicated this extension of our original visual metaphor by means of this clinical instance, both because it is an example which is a common enough scenario for readers to relate to, yet is also a gateway to the understanding of how the use of the metaphor can be extended from the relatively closed world of psychoanalysis to the everyday world of GPs, psychiatrists and their patients. To consider this extension further, we shall think about the evolution of phobias and how the different aspects of them can be depicted by means of the hexagon.
3.
EXTENSIONS TO THE METAPHOR: THE EVOLUTION OF PHOBIAS
Many years ago, a patient who was a retired senior executive consulted the first author. (Please note: for reasons of confidentiality there is much that cannot be communicated. Yet to illustrate the points, we have no alternative other than to pick out the essential elements of the problems.) His problem was an overwhelming fear of bridges and of flying. While he was still working, this set some limits on his ability to travel and thereby advance his career. It was generally agreed that, while he was very able and respected in his field, he had
not achieved the eminence that his talent and industry would have justified. During most of his career he had consulted or had been in treatment with some of the most eminent psychiatrists and psychologists of the day. He could, and did, wittily describe the cavalcade of fashions in the understanding and treatment of his difficulties. Viewed as an illness, his phobias had been medicated with barbiturates, benzodiazepines such as Valium, and latterly anti-depressants. Regarded as the consequence of faulty learning, he and his phobias had been treated first by classical behaviour therapy and later by cognitive therapy. Seen as a psychodynamic issue, he had had individual and conjoint marital therapy sessions for his recurrent domestic difficulties: married twice, he had been divorced twice. As the reader may imagine, the clinical author quailed on hearing the implication that he too was destined to join the list of failures! Resisting any forlorn temptation to sort out a personal issue of at least 40-50 years' duration, in parallel with helping him work through a more immediate bereavement crisis, it was possible, over several weeks, to clarify the following:
1. He had come from humble beginnings through high academic achievement to his position.
2. The family atmosphere was characterised by mother's expectations of high achievement and father's undermining criticism, shades of D.H. Lawrence (Lawrence, 1994).
3. While, to all outward appearance, he was self-assured to, and beyond, the point of arrogance, within himself he was riddled with doubt.
4. His symptoms had meant that he had been in some sort of relationship with the psychiatric services for the majority of the past 40 years.
5. He had been married and divorced twice and was known in the local community as a difficult character.
6. He was well capable of visual imagery.
To understand phobias, it is helpful to differentiate between three categories:
(a) normal, or the readily recognisable exaggeration of normal or innate fears, e.g. a fear of snakes.
(b) phobias generated by traumatic exposure(s) to noxious stimuli, the classic behavioural paradigm.
(c) phobias associated with more generalised anxiety, often of a conflictual nature.
In this patient's case, it could be argued that his was an exaggeration of normal fear. Against that are both the extent of incapacity and its resistance to vigorous treatment, if it were the only factor. There was no evidence of (b) in the onset of his condition, though it could be argued that each time he
subsequently failed to fly or cross a bridge, he accrued a degree of aversive experience. The most convincing explanation of the onset of his symptom was that it was the enacted visual metaphor of his life's dilemma. That is to say, by virtue of his own abilities and his mother's expectations, he was regarded as "a high flier", "highly intelligent", "highly thought of" and "should go far". Yet, in conflict with that, because of his working class origins and his father's critical undermining, he felt "the ground cut from under him", "up in the air" and "out of his depth". The genesis of these attitudes and feelings within the family and their induction in the enduring mental life of the patient can be depicted by means of the hexagon (fig. 5). The experience of being a child subjected to these conflicting attitudes is transmitted from the past into the future. As such, the child within the adult remains vulnerable to the conflicting attitudes, creating a tension with approach/avoidance characteristics. This can be thought of as the basis of his symptomatic difficulties, i.e. his neurosis. But what the visual metaphor also makes plain simultaneously is that the family dynamic has also induced a representation within his psyche of a critical father and a high-achievement demanding mother and that these too can be visualised as being projected forward and influencing his adult behaviour. By grasping this, it is easier to understand his inter-personal difficulties, as manifested in his failed marriages and problematic relationships. That is to say, he brought to every relationship a high expectation of the other's performance, combined with a slashing criticism if these expectations were not met. This area of difficulty would be classified in psychiatry as a mild degree of personality disorder.
Fig. 5. Induction of a critical father/high-achievement mother conflict. [Legible labels: Mother; Father; Mother plate of child: high expectations of achievement; Father plate of child: undermining criticism; Child plate of child.]
To return to the patient's symptomatic neurosis, his phobia, the visual metaphors of bridges or planes immediately summon to the mind concepts of height, "highly intelligent", "a high flier" with distance "should go far". In much the same way as dreams are recognised to summarise ideas in internal mental images (Freud, 1900), so some symptoms can be understood in terms of their symbolic function. Viewed in this way, the difficulties of the patient can be re-framed as a partial solution to a life dilemma. How can he both achieve, yet stay within his own and the family's limits? Quite unconsciously, the internal visual icons of the bridge and the plane were projected out into the cultural sphere. The anxieties that he experienced in venturing forth were summarised by the height icons and were focused there. They became the feared objects. They could not be used. They, not the limits set by his family or himself, were the cause of his problems. Insofar as his psychiatrists and psychologists diagnosed him as suffering from a phobia, they unwittingly confirmed him in his beliefs. In doing so, they, knowingly or not, engaged in the process of containing the patient's feelings of anxiety and his phantasies of the dangers of height, literal and metaphorical, by receiving, accepting and responding to them in terms that he found conducive. When they strayed beyond this frame of reference, e.g. by suggesting that after all their treatment and expertise, he "ought" to be getting "better", then it would be that he would suffer a relapse. The options would then revolve around whether the patient and therapist could re-establish a containing relationship again, on the pretext of an alternative therapeutic enterprise, or whether the therapist would be subject to withering contempt from which the relationship, like his marriages, could not recover. 
If these exchanges between the therapist and patient are examined more closely in relation to the bridge/plane icon and anxiety, a number of unexpected possibilities emerge. In projecting these phantasies and affects into the therapist, not only is the patient ridding himself of the burden of the symptom for someone else to worry about, but also he is projecting the burden of ambition onto the therapist, thereby relieving a little of that load on himself. By doing so, some of the anxiety generated by the conflict could be lessened. That ambition is often identified with by the therapist, usually unconsciously, and emerges as an ambition to cure the patient. There would be the additional bonus for the patient that he would be in the driving seat when it came to undermining the therapist's efforts. That is to say he would be identifying with his critical father, rather than being subject to the internal criticism. It was, however, a position forever poised on the brink of a Pyrrhic victory, since a "successful" attack would destroy his relationship with his therapist, and therefore his containment. These processes can be summarised in fig. 6.
[Figure 6: hexagonal plate diagram. Recoverable labels: Mother plate; Father plate; Child plate; Culture plate; Culture plate (therapist); Expectations of achievement; Displaced expectations of achievement; Undermining criticism; Displaced undermining criticism.]
Fig. 6. The process of containment of the phobic patient.
4. CONCLUDING REMARKS
For the most part, psychiatric and psychological formulations of psychiatric conditions state, or imply, discrete functions in respect of patient, therapist, illness and any antecedent factors. The visual metaphor we have been developing points the way to illustrating the inter-relatedness of each of these entities. It reveals how, in a functional sense, through the medium of phantasies and affects, there can be a diachronic relationship involving not only a doctor and patient, but also the patient's long-dead parents. In this chapter we have sought to demonstrate the use of a visual representation with reference to two clinical vignettes. Some may feel a measure of discomfort at the use of idiographic sources. This issue is well recognised by those working in the field (Malan, 1979; Ward, 1997). But when exploring complex human relationships there is no meaningful alternative. Indeed, the richness of the material requires a more varied array of the means of recording and mapping the findings than is currently in routine use. For others their discomfort may relate to a sense of being left, somehow, in the air in respect of the accounts of the Mother and the Phobic Patient. This discomfort arises out of the absence of the soothing effect of a narrative text, a story well told, with a beginning, a middle and an ending, preferably happy, but at least conclusive. By contrast, the visual metaphor reduces the artifice of narrative (Budd, 1997) by drawing attention to the open-ended nature of human existence. This leads to a potential for more open-ended discussion of the detail of the relationship between the therapist and the patient and of the limitations of the verbal metaphoric assumptions that currently underpin that relationship. The hexagon is itself only a hermeneutic device. It is not a rigid model of the mind and should not be used as such. Nor should it even be regarded as a rigid honeycomb with each tessellated side neatly and forever
fitted against a corresponding side. It has been developed thus far to encourage a clarity of thinking about complex ideas that keep changing their meaning and to facilitate dialogue by finding a verbally neutral ground to share ideas. In terms of future development, we are working toward a pictographic representation of phantasies, possibly using film or video clips, with music possessing strong culturally recognised qualities to represent feelings and affective tones in a rapid sequence montage. This work continues to be developed and within this context it is important to emphasise that the model is a representation and not a resemblance. By way of illustration, one could envisage the sibling plate also in terms of peers, and the child plate as secondary process thinking. The use and application of the hexagonal tube as a hermeneutic device is to mobilise discussion and reflection.
REFERENCES
Bion, W.R., 1962. Learning from Experience. Heinemann, London.
Budd, S., 1997. Ask me no questions and I'll tell you no lies - The social organisation of secrets. In: Ward, I. (Ed.), The Presentation of Case Material in Clinical Discourse. Freud Museum Publications, London.
Freud, S., 1900. The Interpretation of Dreams, Standard Edition, Vols. 4 and 5. Hogarth Press, London, 1953.
Lawrence, D.H., 1994. Sons and Lovers. Penguin, Baltimore, MD.
Lund, C.A., Paton, R.C., 1998. A visual metaphor for psychoanalytic training and supervision. In: Paton, R.C., Neilson, I. (Eds.), Visual Representations and Interpretations. Springer, London, pp. 52-61.
Malan, D.H., 1979. Individual Psychotherapy and the Science of Psychodynamics. Butterworth, London.
Meltzer, D., Milana, G., Maiello, S., Petrielli, D., 1982. The conceptual difference between projective identification (Klein) and container-contained (Bion). J. Child Psychother. 8, 185-202.
Ogden, T., 1982. Projective Identification and Psychotherapeutic Technique. Jason Aronson, New York.
Sandler, J., Dare, C., Holder, A., 1979. The Patient and the Analyst: The Basis of the Psychoanalytic Process. Karnac Books, London.
Scharff, J.S., Scharff, D.E., 1998. Object Relations Individual Therapy. Jason Aronson/Karnac Books, London.
Ward, I. (Ed.), 1997. The Presentation of Case Material in Clinical Discourse. Freud Museum Publications, London.
Studies in Multidisciplinarity, Volume 2. Editor: G. Malcolm. © 2004 Elsevier B.V. All rights reserved.
17
A descriptive framework for designing interaction for visual abstractions
K. Sedig (a) and J. Morey (b)
(a) Cognitive Engineering Laboratory, Department of Computer Science and Faculty of Information and Media Studies, The University of Western Ontario, London, Ont., Canada
(b) Cognitive Engineering Laboratory, Department of Computer Science, The University of Western Ontario, London, Ont., Canada
This chapter proposes a descriptive framework for categorising and characterising the different forms of interaction with visual abstractions (VAs). Abstract visual representations play an important role in assisting human reasoning, thinking, and understanding processes. Interaction with these representations can be designed in different forms. The goal of this chapter is to provide a descriptive framework that guides the designers and evaluators of cognitive tools in determining the appropriate forms of interaction that can facilitate the understanding of abstract concepts, patterns, structures and processes. The framework is described and substantiated using a number of VAs that represent and communicate mathematical ideas.
1. INTRODUCTION AND BACKGROUND
Many concepts, patterns, structures, and processes are too complex to understand without external cognitive aids (Norman, 1993). Visual abstract representations can assist human reasoning and learning (Jonassen et al., 1993; Glasgow et al., 1995; Peterson, 1996). The human visual system has limited channel capacity. Visuals provide high-bandwidth interaction with the mind. VAs can be defined as a set of interconnected symbols that can embody causal, functional, structural, and semantic relations and properties. Examples of VAs include visual mathematical representations, diagrams, maps, graphs, networks, and so on. Explicit, external VAs extend human memory by acting as "knowledge in the world", can stimulate cognitive activity, amplify human cognition, and assist perceptual interpretation (Nardi and Zarmer, 1993; Zhang and Norman, 1994). VAs may be primary (derived from real-world objects) or secondary (derived from representations such as patterns in raw data, textual information, or scientific and mathematical concepts). Much of the knowledge embodied in secondary VAs may not be at the surface level and readily available for reasoning or perceptible to the human mind. Allowing users to interact with VAs as cognitive tools² can enhance this process of reasoning, interpretation, and sense making. However, the form and style of interaction play a crucial role in how well and how much knowledge learners can construct (de Souza and Sedig, 2001; Sedig et al., 2001). Most cognitive tools borrow interaction techniques devised for and used in productivity tools (Sedig et al., 2001). The appropriateness of some interaction techniques for problem solving and learning activities has been questioned (Golightly, 1996; Holst, 1996; Sedig et al., 2001). However, there is no clear understanding of what form of interaction cognitive tools should incorporate. De Souza and Sedig (2001) have suggested that, when designing concept-centred interfaces, a general framework to guide choices among the visual representations is lacking. Additionally, there is a need for a framework to guide choices among forms of interaction with these visuals.
¹ This research is funded by the Natural Sciences and Engineering Research Council of Canada.
Although Shneiderman (1991) has proposed a general taxonomy of interaction styles, this taxonomy is too broad and does not seem suitable for cognitive tools. This chapter is a step towards creating a framework to categorise and characterise different forms of interaction with VAs. Such a framework can give designers of interactive cognitive tools options for thinking systematically about the design of interaction for VAs. In the following sections, we use several systems to develop our framework. All these systems have been developed by our research group and use mathematical concepts as a test-bed to assist us in thinking about the proposed framework. These systems include: Super Tangrams (Sedig and Klawe, 1996), a tool to help children learn 2D geometrical transformations (i.e. translation, rotation, and reflection); Archimedean Kaleidoscope (Morey et al., 2001), a tool to help users visualise and explore polyhedral³ solids; K-Lattice Machine (Sedig et al., 2002), a tool to help users explore sub-patterns in 2D regular lattice structures; Archimedean Confection, a tool to explore relationships among polyhedral solids; Lattice Space, a tool to visualise and explore 3D lattice structures; and Polyvise, an interactive tool to visualise and explore 4D Archimedean polytope structures.
² Cognitive tools refer to computational tools intended to support and extend human mental activities while engaged in perceptual, reasoning, and problem solving processes (Lajoie, 2000).
2. INTERACTION FACTORS
This chapter proposes that the form of interaction with VAs is determined by a set of factors. In this section, 10 factors are discussed: mode, flow, focus, filtering, scoping, recording, scaffolding, content, chunking, and configuration.
2.1. Mode
The mode of interaction refers to the metaphoric bodily organ by which the user interacts with a VA. There are three basic bodily metaphors by which humans interact with entities in their surroundings: hands (handling entities), feet (walking on or through entities), and mouth (conversing with entities). Therefore, there are three modes in which a user can interact with a VA: manipulation, navigation, and conversation. Instances of these modes can be illustrated through an example. Figure 1 shows a VA representing a 3D lattice structure. As manipulation, the user can rotate it and view the whole structure from different angles; as navigation, the user can walk through or on it; and as conversation, the user can type a command to query the lattice about one of its properties or to transform it in some way. This can be done using natural language queries, speech, menus, form-fill-ins, or any type of linguistic command.
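The three modes described above can be illustrated with a minimal Python sketch. It is not taken from Lattice Space or any other system discussed in this chapter: the class `LatticeView`, its fields, and the two text commands (`count points`, `resize N`) are hypothetical names chosen only to show one lattice VA being addressed through manipulation, navigation, and conversation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    MANIPULATION = "hands"   # handling the VA
    NAVIGATION = "feet"      # walking on or through the VA
    CONVERSATION = "mouth"   # conversing with the VA


@dataclass
class LatticeView:
    """Hypothetical state of a cubic 3D lattice VA and of the user's viewpoint."""
    size: int = 3                        # points along each edge of the lattice
    yaw_degrees: float = 0.0             # orientation of the whole structure
    viewer: tuple = (0.0, 0.0, -10.0)    # position of the user's viewpoint
    log: list = field(default_factory=list)

    def rotate(self, degrees):
        # Manipulation: the user turns the structure itself.
        self.yaw_degrees = (self.yaw_degrees + degrees) % 360
        self.log.append(Mode.MANIPULATION)

    def walk(self, dx, dy, dz):
        # Navigation: the structure stays put and the viewpoint moves.
        x, y, z = self.viewer
        self.viewer = (x + dx, y + dy, z + dz)
        self.log.append(Mode.NAVIGATION)

    def ask(self, command):
        # Conversation: a linguistic command queries or transforms the lattice.
        self.log.append(Mode.CONVERSATION)
        words = command.lower().split()
        if words == ["count", "points"]:
            return str(self.size ** 3)
        if len(words) == 2 and words[0] == "resize" and words[1].isdigit():
            self.size = int(words[1])
            return f"resized to {self.size}"
        return "unknown command"
```

In this sketch the same underlying state is reachable through all three modes; which mode a designer exposes is a separate decision from what the VA represents.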
2.2. Flow
The flow of interaction refers to how the interaction affects the user's perception of the relationship between cause and effect in the time-space continuum. Flow of interaction can be continuous or discrete. In continuous interaction, the user observes cause and effect simultaneously. When there is continuous flow to the interaction, a VA fluidly responds to the user's interaction with it. For instance, fig. 2 shows a VA representing the mathematical concept of 2D translation as an interactive vector. The user can click on one of the tips of the vector to change its size and direction. The user's interaction with this VA is continuous because the movement of the mouse cursor is fluidly translated into a change in the size and direction of the vector. In discrete interaction, cause and effect are separated in time. That is, the interaction takes place in a modal fashion. For instance, fig. 3a shows a VA representing a state-transition diagram. In order to cause a transition from one state to another, the user clicks on the end point of one of the transitional links, and the state transition takes place. Although the user may see the effect of the click without any time delay, this is nonetheless a discrete interaction, since the user's interaction with the VA takes place in temporal snapshots.
³ A polyhedron is a geometric solid bounded by polygons.
Fig. 1. 3D lattice in Lattice Space.
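The distinction between continuous and discrete flow in Section 2.2 can be sketched in Python as two event handlers. This is an illustrative sketch under stated assumptions, not the implementation of the systems in the figures: `TranslationVector`, `StateDiagram`, `on_drag`, and `on_click` are hypothetical names. In the first class every pointer movement updates the VA; in the second, the VA changes only at the moment a transition link is clicked.

```python
import math
from dataclasses import dataclass


@dataclass
class TranslationVector:
    """Continuous flow: each pointer-move event immediately updates the VA."""
    tail: tuple = (0.0, 0.0)
    tip: tuple = (1.0, 0.0)

    def on_drag(self, pointer_xy):
        # Called for every pointer movement while the tip is held.
        self.tip = (float(pointer_xy[0]), float(pointer_xy[1]))

    @property
    def length(self):
        return math.hypot(self.tip[0] - self.tail[0], self.tip[1] - self.tail[1])

    @property
    def direction_degrees(self):
        dx = self.tip[0] - self.tail[0]
        dy = self.tip[1] - self.tail[1]
        return math.degrees(math.atan2(dy, dx))


class StateDiagram:
    """Discrete flow: the VA changes only when a transition link is clicked."""

    def __init__(self, links, start):
        self.links = links    # maps (state, link label) to the next state
        self.state = start

    def on_click(self, link_label):
        # One click is one temporal snapshot; unknown links leave the state as is.
        self.state = self.links.get((self.state, link_label), self.state)
        return self.state
```

A drag over several pointer positions produces a stream of intermediate vector states, each visible to the user, whereas a click on the diagram produces a single jump from one state to the next with nothing in between.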
2.3. Focus
The focus of interaction refers to the centre of attention of the user while interacting with an environment. There are two fundamental ways of interacting with VAs: direct and indirect.