ARTIFICIAL INTELLIGENCE IN EDUCATION
Frontiers in Artificial Intelligence and Applications
Series Editors: J. Breuker, R. Lopez de Mantaras, S. Ohsuga and W. Swartout
Volume 50

Previously published in this series:
Vol. 49, P. McNamara and H. Prakken (Eds.), Norms, Logics and Information Systems
Vol. 48, P. Navrat and H. Ueno (Eds.), Knowledge-Based Software Engineering
Vol. 47, M.T. Escrig and F. Toledo, Qualitative Spatial Reasoning: Theory and Practice
Vol. 46, N. Guarino (Ed.), Formal Ontology in Information Systems
Vol. 45, P.-J. Charrel et al. (Eds.), Information Modelling and Knowledge Bases IX
Vol. 44, K. de Koning, Model-Based Reasoning about Learner Behaviour
Vol. 43, M. Gams et al. (Eds.), Mind Versus Computer
Vol. 42, In preparation
Vol. 41, F.C. Morabito (Ed.), Advances in Intelligent Systems
Vol. 40, G. Grahne (Ed.), Sixth Scandinavian Conference on Artificial Intelligence
Vol. 39, B. du Boulay and R. Mizoguchi (Eds.), Artificial Intelligence in Education
Vol. 38, H. Kangassalo et al. (Eds.), Information Modelling and Knowledge Bases VIII
Vol. 37, F.L. Silva et al. (Eds.), Spatiotemporal Models in Biological and Artificial Systems
Vol. 36, S. Albayrak (Ed.), Intelligent Agents for Telecommunications Applications
Vol. 35, A.M. Ramsay (Ed.), Artificial Intelligence: Methodology, Systems, Applications
Vol. 34, Y. Tanaka et al. (Eds.), Information Modelling and Knowledge Bases VII
Vol. 33, P. Pylkkanen et al. (Eds.), Brain, Mind and Physics
Vol. 32, L. de Raedt (Ed.), Advances in Inductive Logic Programming
Vol. 31, M. Ghallab and A. Milani (Eds.), New Directions in AI Planning
Vol. 30, A. Valente, Legal Knowledge Engineering
Vol. 29, A. Albert (Ed.), Chaos and Society
Vol. 28, A. Aamodt and J. Komorowski (Eds.), Fifth Scandinavian Conference on Artificial Intelligence
Vol. 27, J. Hallam (Ed.), Hybrid Problems, Hybrid Solutions
Vol. 26, H. Kangassalo et al. (Eds.), Information Modelling and Knowledge Bases VI
Vol. 25, E. Hillebrand and J. Stender (Eds.), Many-Agent Simulation and Artificial Life
Vol. 24, J. Liebowitz and D.S. Prerau (Eds.), Worldwide Intelligent Systems
Vol. 23, J. Stender et al. (Eds.), Genetic Algorithms in Optimisation, Simulation and Modelling
Vol. 22, S. Schulze-Kremer (Ed.), Advances in Molecular Bioinformatics
Vol. 21, J. Breuker and W. Van de Velde (Eds.), CommonKADS Library for Expertise Modelling
Vol. 20, C. Backstrom and E. Sandewall (Eds.), Current Trends in AI Planning
Vol. 19, H. Jaakkola et al. (Eds.), Information Modelling and Knowledge Bases V
Vol. 18, E. Sandewall and C.G. Jansson (Eds.), Scandinavian Conference on Artificial Intelligence - 93
Vol. 17, A. Sloman et al. (Eds.), Prospects for Artificial Intelligence
Vol. 16, H. Kangassalo et al. (Eds.), Information Modelling and Knowledge Bases IV
Vol. 15, R. Winkels, Explorations in Intelligent Tutoring and Help
Vol. 14, J. Stender (Ed.), Parallel Genetic Algorithms: Theory and Applications
ISSN: 0922-6389
Artificial Intelligence in Education
Open Learning Environments: New Computational Technologies to Support Learning, Exploration and Collaboration

Edited by
Susanne P. Lajoie McGill University, Montreal, Canada and
Martial Vivet Universite du Maine, France
IOS Press
Ohmsha
Amsterdam • Berlin • Oxford • Tokyo • Washington, DC
© 1999, The authors mentioned in the Table of Contents
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior written permission from the publisher.
ISBN 90 5199 452 4 (IOS Press)
ISBN 4 274 90307 9 C3000 (Ohmsha)
Library of Congress Control Number: 99-64149
Second printing, 2002
Publisher
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands
fax: +31 20 620 3419
e-mail: [email protected]

Distributor in the UK and Ireland
IOS Press/Lavis Marketing
73 Lime Walk
Headington
Oxford OX3 7AD
England
fax: +44 1865 75 0079
Distributor in the USA and Canada
IOS Press, Inc.
5795-G Burke Centre Parkway
Burke, VA 22015
USA
fax: +1 703 323 3668
e-mail: [email protected]

Distributor in Germany, Austria and Switzerland
IOS Press/LSL.de
Gerichtsweg 28
D-04103 Leipzig
Germany
fax: +49 341 995 4255
Distributor in Japan
Ohmsha, Ltd.
3-1 Kanda Nishiki-cho
Chiyoda-ku, Tokyo 101-8460
Japan
fax: +81 3 3233 2426
LEGAL NOTICE
The publisher is not responsible for the use which might be made of the following information.
PRINTED IN THE NETHERLANDS
Preface

The 9th International Conference on Artificial Intelligence in Education (AI-ED 99) is one of a series of international conferences in this area, designed to report on state-of-the-art research in the field of AI in Education. This field is interdisciplinary and brings together researchers from the domains of computer science, cognitive science, education, psychology, linguistics, and engineering (to name a few). A conference such as this provides us with the rare opportunity to engage in cross-disciplinary dialogues that help us move the research forward in a more informed manner.

The theme for 1999 is Open Learning Environments: New Computational Technologies to Support Learning, Exploration, and Collaboration. Technologies can support both the individual and small groups of individuals who work collaboratively to solve problems and learn new materials. New methodologies for supporting and documenting the learning process in both these situations are being developed and are described in this volume. We are fortunate to have John Self address this theme in his Keynote Address.

One hundred and seventy-five submissions were received from thirty different countries and were reviewed by seventy-six reviewers worldwide. The review process resulted in the acceptance of sixty-six papers and sixty-one posters. Two tutorials and eight workshops were invited to participate in the conference. The invited speakers, panelists and luncheon speakers highlight important developments in AI-ED. Along the continuum of topics provided in the proceedings are: student modeling, authoring tools, multi-media, virtual reality, agents, learning, metacognition, text and discourse, and collaborative knowledge building.

This conference is the ninth Artificial Intelligence in Education conference, the second since the International Artificial Intelligence in Education Society established itself as an independent society in 1997. The society is now functioning very well, with an active web site (http://cbl.leeds.ac.uk/ijaied/aiedsoc.html), its own electronic journal (the International Journal of Artificial Intelligence in Education), and growing membership numbers. This conference reflects its vibrancy.

We would like to take this opportunity to thank the many people who have assisted us in making this international conference a success. The following pages provide a list of credits to those whose hard work has ensured the scientific integrity of this conference, as well as to those who have worked around the clock to make this conference run smoothly in Le Mans. We would like to thank the Program Committee and the Reviewers, the Local Organizing Committee in France and the many sponsors. The Program Chair would like to thank her graduate student assistants Nancy Lavigne, Steve Wakeham, and Sonia Faremo for their unbelievable organization skills, patience, and willingness to assist in the reviewing, dissemination of proposals, and the production of the proceedings. She would also like to thank Ben Du Boulay and Jim Greer (two previous program chairs) for all their valuable advice. The program chair would also like to thank Gilles Gauthier, Jim Greer, and Riichiro Mizoguchi for participating in the final program review phase.

We look forward to an excellent meeting in Le Mans, July 19-23, 1999.
Susanne P. Lajoie, McGill University
Martial Vivet, Universite du Maine
April 1999
International AI-ED Society Executive Committee
President: Gordon McCalla (University of Saskatchewan, Canada)
President-elect: Lewis Johnson (University of Southern California, USA)
Secretary: Joost Breuker (University of Amsterdam, The Netherlands)
John Self, Editor of IJAIED
Bob Aiken (Temple University, USA)
Michael Baker (CNRS, Universite Lumiere Lyon 2, France)
Nicolas Balacheff (IMAG, Grenoble, France)
Bert Bredeweg (University of Amsterdam, The Netherlands)
Peter Brusilovsky (Carnegie-Mellon University, USA)
Tak-Wai Chan (National Central University, Taiwan)
William Clancey (Institute for Research on Learning, Palo Alto, USA)
Christopher Dede (George Mason University, USA)
Claude Frasson (University of Montreal, Canada)
Monique Grandbastien (Universite de Nancy, France)
Jim Greer (University of Saskatchewan, Canada)
Ulrich Hoppe (University of Duisburg, Germany)
Judy Kay (University of Sydney, Australia)
Chee-Kit Looi (Kent Ridge Digital Labs, Singapore)
Riichiro Mizoguchi (Osaka University, Japan)
Tom Murray (University of Massachusetts at Amherst, USA)
Jeff Rickel (University of Southern California, USA)
Luigi Sarti (Istituto Tecnologie Didattiche, Italy)
John Self (University of Leeds, UK)
Martial Vivet (Universite du Maine, France)
Barbara White (University of California Berkeley, USA)
Philip Winne (Simon Fraser University, Canada)

Conference Organizing Committee
Gordon McCalla (University of Saskatchewan, Canada)
Lewis Johnson (University of Southern California, USA)
Susanne Lajoie (McGill University, Canada)
Benedict Du Boulay (University of Sussex, UK)
Martial Vivet (Universite du Maine, France)
Riichiro Mizoguchi (Osaka University, Japan)
John Self (University of Leeds, UK)

Program Committee
Chair: Susanne P. Lajoie (McGill University, Canada)
Robert Aiken (Temple University, USA)
Leila Alem (CSIRO Mathematical and Information Sciences, Australia)
Roger Azevedo (Carnegie Mellon University, USA)
Michael Baker (CNRS, Universite Lumiere Lyon 2, France)
Nicolas Balacheff (Laboratoire Leibniz, Institut Imag, Grenoble, France)
Ben du Boulay (University of Sussex, UK)
Gilles Gauthier (Universite du Quebec a Montreal, Canada)
Monique Grandbastien (Universite Henri Poincare Nancy 1, France)
Jim Greer (University of Saskatchewan, Canada)
Barry Harper (University of Wollongong, Australia)
Ulrich Hoppe (University of Duisburg, Germany)
Ton de Jong (University of Twente, The Netherlands)
Ken Koedinger (Carnegie Mellon University, USA)
Alan Lesgold (University of Pittsburgh, USA)
Jian-xiang Lin (Peking University, China)
Toshio Okamoto (University of Electro-Communications, Japan)
Riichiro Mizoguchi (Osaka University, Japan)
Ana Paiva (IST-Technical University of Lisbon and INESC, Portugal)
Helen Pain (University of Edinburgh, UK)
Daniel Schwartz (Vanderbilt University, USA)
Valerie Shute (GKIS, Inc., USA)

Local Organizing Committee
Chair: Martial Vivet (Universite du Maine, France)
Jacques Bonet (Universite du Maine, France)
Stephane Bru (Universite du Maine, France)
Paul Delannoy (Universite du Maine, France)
Elisabeth Delozanne (Universite du Maine, France)
Xavier Dubourg (Universite du Maine, France)
Pierre Jacoboni (Universite du Maine, France)
Pascal Leroux (Universite du Maine, France)
Philippe Teutsch (Universite du Maine, France)

Co-Sponsored by
The International Artificial Intelligence in Education (AI-ED) Society

With Financial Support from
Universite du Maine and the LIUM Lab.
The City of Le Mans and the Urban Community (CUM).
The General Council of the Sarthe.
The Council of the Region des Pays de la Loire and ATLANTECH.
The Chamber of Commerce and Industry of Le Mans.
The French Ministry of Education, Research and Technology.
The French Center for Scientific Research (CNRS).

Non-Profit Organizations
AFIA: French association for AI.
ARC: The association for cognitive research.
ATIEF: French organisation for information technologies in education and training.
EPI: French association "Enseignement Public et Informatique".
INRP: The French National Institute for Pedagogical Research.

Private Organizations
France Telecom
ACO: Automobile Club de l'Ouest
Credit Agricole Anjou et Maine
Systel
IBM Agent in Le Mans
La Poste
CEGOS

Reviewers
Robert M. Aiken, Fabio Akhras, Leila Alem, K.S.R. Anjaneyulu, Roger Azevedo, Michael Baker, Nicolas Balacheff, Joseph Beck, Janet Blatter,
Jacqueline Bourdeau, Bert Bredeweg, Sean Brophy, Paul Brna, Peter Brusilovsky, Stefano A. Cerri, Thierry Chanier, Yam San Chee, Tom Cobb, Albert Corbett, Richard Cox, Geoff Cumming, Robert de Hoog, Sharon Derry, Ton de Jong, Kees de Koning, Elisabeth Delozanne, Darina Dicheva, Carolyn Dowling, Aude Dufresne, Ben du Boulay, Sonia Faremo, Isabel Fernandez-Castro, Carl Frederiksen, Patricia Fung, Gilles Gauthier, Kevin Gluck, Vladimir A. Goodkovsky, Monique Grandbastien, Jim Greer, Barry Harper, Lois Wright Hawkes, Daniele Herin, Tsukasa Hirashima, Peter Holt, H. Ulrich Hoppe, Lin Jian-xiang, Michelle Joab, Lewis W. Johnson, Paul Kamsteeg, Sandra Katz, Kenneth R. Koedinger, Vivekanandan Kumar, Patrick Kyllonen, Susanne Lajoie, Nancy Lavigne, Bernard Lefebvre, Pascal Leroux, Alan Lesgold, Svein-Ivar Lillehaug, Chee-Kit Looi, Sandy Marshall, Dave McArthur, Gord McCalla, Riichiro Mizoguchi, Allen Munro,
Tom Murray, Adisack Nhouyvanisvong, Jean-Francois Nicaud, Stellan Ohlsson, Helen Pain, Ana Paiva, Gilbert Paquette, Valery A. Petrushin, Rolf Ploetzner, Jeff Rickel, Luigi Sarti, Eileen Scanlon, Daniel Schwartz, John Self, Julian Serrano, Mike Sharples, Hans Spada, Valerie Shute, Katherine M. Sinitsa, Mia Stern, Daniel D. Suthers, Akira Takeuchi, Judi Thomson, Philippe Teutsch, Jody S. Underwood, Wouter Van Joolingen, Julitta Vassileva, Jesus Vazquez-Abad, Steve Wakeham, Barbara Wasson, Geoff Webb, Radboud Winkels, Phil Winne, Beverly Park Woolf, Yoneo Yano

Young Researchers Track
Chair: Cyrille Desmoulins (LORIA, Universite Nancy, France)
Tutorial 1: The Play's the Thing: Enhancing Learning Design Through Game Elements. Presenter: Clark N. Quinn
Tutorial 2: How Ideas but not Tools of Artificial Intelligence Help to Effectively Develop the Intelligence of Young Children and Teenagers. Presenters: V.A. Fomichov and Dr. O.S. Fomichova

Workshops
Chair: Nicolas Balacheff (Laboratoire Leibniz - Institut Imag, Grenoble, France)
Workshop 1: Instructional Uses of Synthetic Agents. Organizers: W. Lewis Johnson, Elisabeth Andre, Claude Frasson, James Lester, Ana Paiva, and Jeff Rickel
Workshop 2: Ontologies for Intelligent Educational Systems. Organizers: Riichiro Mizoguchi and Tom Murray
Workshop 3: Analysing Educational Dialogue Interaction: Towards Models that Support Learning. Organizers: Rachel Pilkington, Jean McKendree, Helen Pain and Paul Brna
Workshop 4: Educational Robotics. Organizers: Pascal Leroux, Martial Vivet, Brigitte Denis and Pierre Nonnon
Workshop 5: Medical Image Tutoring. Organizer: Ben Du Boulay
Workshop 6: Tutoring Systems that Learn. Organizer: Joseph E. Beck
Demonstrations Chair: Francois Marie Blondel (INRP, Paris, France)
Tutorials Chair: Monique Grandbastien (Universite Henri Poincare, Nancy, France)
Workshop 7: Open, Interactive and Other Overt Approaches to Learner Modelling. Organizer: Rafael Morales
Workshop 8: What do we Know of Open Learning Environments? Organizer: Jari Multisilta
Contents

Preface
Keynote
Open Sesame?: Fifteen Variations on the Theme of Openness in Learning Environments, J. Self
Invited Speakers
Cognitive Applications of New Computational Technologies in Eye Tracking, S.P. Marshall
Collaborative Learning in Open Distributed Environments - Pedagogical Principles and Computational Methods, H.U. Hoppe
An Overview of the State of the Art in ITS Authoring Tools, T. Murray
Trends and Issues in AI and Education: Towards a Common Research Framework, J. Sandberg

Agent Models
Agent Systems for Diversity in Human Learning, J. Les, G. Cumming and S. Finch
Teachable Agents: Combining Insights from Learning Theory and Computer Science, S. Brophy, G. Biswas, T. Katzlberger, J. Bransford and D. Schwartz
Meta-knowledge Representation for Learning Scenarios Engineering, G. Paquette
A Multi-Agent Design of a Peer-Help Environment, J. Vassileva, J. Greer, G. McCalla, R. Deters, D. Zapata, C. Mudgal and S. Grant
A Methodology for Building Intelligent Educational Agents, H.N. Keeling
The Systemion: A New Agent Model to Design Intelligent Tutoring Systems, M.F. Canut, G. Gouarderes and E. Sanchis

Analysis of Collaboration and Group Formation
Learning Goal Ontology Supported by Learning Theories for Opportunistic Group Formation, T. Supnithi, A. Inaba, M. Ikeda, J. Toyoda and R. Mizoguchi
Toward Intelligent Analysis and Support of Collaborative Learning Interaction, A. Soller, F. Linton, B. Goodman and A. Lesgold

Authoring Tools
An Ontology-Aware Authoring Tool: Functional Structure and Guidance Generation, L. Jin, W. Chen, Y. Hayashi, M. Ikeda, R. Mizoguchi, Y. Takaoka and M. Ohta
Formatively Evaluating REDEEM - An Authoring Environment for ITSs, S. Ainsworth, J. Underwood and S. Grimshaw
Intelligent Agent Instructional Design Tool for a Hypermedia Design Course, S. Stoyanov, L. Aroyo and P. Kommers
Design Principles of a New Modelling Environment for Young Students, Supporting Various Types of Reasoning and Interdisciplinary Approaches, A. Dimitracopoulou, V. Komis, P. Apostolopoulos and P. Politis
Collaboration and Argumentation
Representational Bias as Guidance for Learning Interactions: A Research Agenda, D.D. Suthers
Favouring Modellable Computer-Mediated Argumentative Dialogue in Collaborative Problem-Solving Situations, M. Quignard and M. Baker
Collaborative Knowledge Building
Designing Computer-Mediated Epistemic Interactions, M. Baker, E. de Vries and K. Lund
Teachers' Collaborative Interpretations of Students' Computer-Mediated Collaborative Problem Solving Interactions, K. Lund and M. Baker
Learning as Knowledge Refinement: Designing a Dialectical Pedagogy for Conceptual Change, A. Ravenscroft and R. Hartley

Evaluating Adaptive Systems
A Methodology for Developing Affective Skills with Model Based Training, T.M. Khan, K. Brown and R. Leitch
User Controlled Adaptivity versus System Controlled Adaptivity in Intelligent Tutoring Systems, M. Crampes
Dynamic versus Static Hypermedia in Museum Education: An Evaluation of ILEX, the Intelligent Labelling Explorer, R. Cox, M. O'Donnell and J. Oberlander

Evaluating Tutoring Systems
A Multi-Year Large-Scale Field Study of a Learner Controlled Intelligent Tutoring System, T.N. Meyer, T.M. Miller, K. Steuck and M. Kretschmer
Tutoring Answer Explanation Fosters Learning with Understanding, V. Aleven, K.R. Koedinger and K. Cross
Towards a Product for Teaching Formal Algebra, J.-F. Nicaud, D. Bouhineau, C. Varlet and A. Nguyen-Xuan
Learning to Solve Polynomial Factorization Problems: By Solving Problems and by Studying Examples of Problem Solving, with an Intelligent Learning Environment, A. Nguyen-Xuan, A. Bastide and J.-F. Nicaud

Foundational Issues for AI-ED
The Plausibility Problem: Human Teaching Tactics in the "Hands" of a Machine, B. du Boulay, R. Luckin and T. del Soldato
Bringing Back the AI to AI & ED, J.E. Beck and M.K. Stern
IF "What is the Core of AI & Education?" Is the Question THEN "Teaching Knowledge" is the Answer, N. Van Labeke, R. Aiken, J. Morinet-Lambert and M. Grandbastien

Intelligent Multimedia
A Three-Layered Scalable Architecture for Computer Based Instruction, G. Adorni, M.S. Barbieri, D. Bianchi, A. Poggi and A.M. Sugliano
Multiple Representation Approach in Multimedia Based Intelligent Educational Systems, Kinshuk, R. Oppermann, A. Patel and A. Kashihara
Learning Companions
The Missing Peer, Artificial Peers and the Enhancement of Human-Human Collaborative Student Modelling, S. Bull, P. Brna, S. Critchley, K. Davie and C. Holzherr
User Modeling in Simulating Learning Companions, C.-Y. Chou, C.-J. Lin and T.-W. Chan
Teaching Scientific Thinking Skills: Students and Computers Coaching Each Other, L.A. Scott and F. Reif
Metacognition
Teaching Meta-Cognitive Skills: Implementation and Evaluation of a Tutoring System to Guide Self-Explanation while Learning from Examples, C. Conati and K. VanLehn
Metacognition in Epistolary Rhetoric: A Case-Based System for Writing Effective Business Letters in a Foreign Language, P. Boylan, C. Vergaro, A. Micarelli and F. Sciarrone

New Directions
Integrating a Believable Layer into Traditional ITS, S. Abou-Jaoude and C. Frasson
Helping the Peer Helper, V.S. Kumar, G.I. McCalla and J.E. Greer
A Knowledge Extractor Environment for Classroom Teaching, A.I. Cristea and T. Okamoto

Simulation: Systems and Architectures
An Agent-Operated Simulation-Based Training System - Presentation of the CMOS Project, L. Richard and G. Gouarderes
Towards a Unified Specification of Device-Instructor-Learner Interactions, P.-W. Fung and R.H. Kemp
An Open Architecture for Simulation-Centered Tutors, A. Munro, D.S. Surmon, M.C. Johnson, Q.A. Pizzini and J.P. Walker

Skill Acquisition and Assessment
A Combination of Representation Styles for the Acquirement of Speech Abilities, V. Govaere
An Evaluation of the Impact of AI Techniques on Computerised Assessment of Word Processing Skills, R.D. Dowsing and S. Long
Internet Based Evaluation System, A. Rios, E. Millán, M. Trella, J.L. Perez-de-la-Cruz and R. Conejo

Student Modeling
SIPLeS-II: An Automatic Program Diagnosis System for Programming Learning Environments, S. Xu and Y.S. Chee
The Interactive Maintenance of Open Learner Models, V. Dimitrova, J. Self and P. Brna
An Easily Implemented, Linear-Time Algorithm for Bayesian Student Modeling in Multi-Level Trees, W.R. Murray
Error-Visualization by Error-Based Simulation Considering Its Effectiveness: Introducing Two Viewpoints, T. Horiguchi, T. Hirashima, A. Kashihara and J. Toyoda
Supporting Learning Communities
Assessing Knowledge Construction in On-Line Learning Communities, S.J. Derry and L.A. DuRussel
Impact of Shared Applications and Implications for the Design of Adaptive Collaborative Learning Environments, D. Gurer, R. Kozma and E. Millán
Supporting Collaborative Learning
An Approach to Analyse Collaboration when Shared Structured Workspaces are Used for Carrying Out Group Learning Processes, B. Barros and M.F. Verdejo
Supporting Distance Learning from Case Studies, M.C. Rosatelli and J.A. Self
Assistance and Visualization of Discussion for Group Learning, M. Nakamura, K. Hanamoto and S. Otsuki

Supporting Mathematics Learning
A Semi-Empirical Agent for Learning Mathematical Proof, V. Luengo
A Proof Presentation Suitable for Teaching Proofs, E. Melis and U. Leron
A Diagnosis Based on a Qualitative Model of Competence, S. Jean, E. Delozanne, P. Jacoboni and B. Grugeon

Support for Medical Education
Expertise Differences in Radiology: Extending the RadTutor to Foster Medical Students' Diagnostic Skills, R. Azevedo and S.L. Faremo
Building a Case for Agent-Assisted Learning as a Catalyst for Curriculum Reform in Medical Education, E. Shaw, R. Ganeshan, W.L. Johnson and D. Millar
Computer-Based Tutoring of Medical Procedural Knowledge, O. Larichev and Y. Naryzhny

Understanding Texts and Dialogues
Tutoring Systems Based on Latent Semantic Analysis, B. Lemaire
Improving an Intelligent Tutor's Comprehension of Students with Latent Semantic Analysis, P. Wiemer-Hastings, K. Wiemer-Hastings and A.C. Graesser
Modeling Pedagogical Interactions with Machine Learning, S. Katz, J. Aronis and C. Creitz
An Intelligent Agent in Web-Based Argumentation, R. Yu and Y.S. Chee

Virtual Realities and Virtual Campuses
Intelligent Assistance for Web-Based Telelearning, J. Girard, G. Paquette, A. Miara and K. Lundgren
Dialectics for Collective Activities: An Approach to Virtual Campus Design, P. Jermann, P. Dillenbourg and J.-C. Bronze
Virtual Humans for Team Training in Virtual Reality, J. Rickel and W.L. Johnson
Detecting and Correcting Misconceptions with Lifelike Avatars in 3D Learning Environments, J.P. Gregoire, L.S. Zettlemoyer and J.C. Lester

Posters
Learning Effectiveness Assessment: A Principle-Based Framework, L. Alem, C.N. Quinn and J. Eklund
Piagetian Psychology in Intelligent Tutoring Systems, I. Arroyo, J.E. Beck, K. Schultz and B.P. Woolf
Software Agents for Analysis of Collaboration in a Virtual Classroom, P.A. Jaques and P.M. de Oliveira
Intelligent Navigation Support for Lecturing in an Electronic Classroom, N.A. Baloian, J. Pino and U. Hoppe
An Ablative Evaluation, J.E. Beck, I. Arroyo, B.P. Woolf and C. Beal
Distributed User Models for Client/Server Architectures, R. Bedanokova and M. Vivet
Providing Help and Advice in an Open Learning Environment in Chemistry, F.-M. Blondel, M. Schwab and M. Tarizzo
Ontological Engineering of Instruction: A Perspective, J. Bourdeau and R. Mizoguchi
ArgueTrack: Computer Support for Educational Argumentation, A. Bouwer
An Intelligent Learning Environment for Musical Harmony, M. Brandao, H. Pain and G. Wiggins
An Architecture for a Literacy Teaching ITS, M. Carvalho, H. Pain and R. Cox
Investigating Representational Competence in Secondary School Students, T. Conlon, R. Cox, J. Lee, J. McKendree and K. Stenning
A Coached Computer-Mediated Collaborative Learning Environment for Conceptual Database Design, M.A. Constantino-Gonzalez and D.D. Suthers
Using AI Techniques to Educate Novice Designers, M. Danaher
Towards an Authoring Environment for Building and Maintaining a Society of Artificial Tutoring Agents, E. de Barros Costa and J. Costa da Silva
Combining Artificial Intelligence and Human Problem Solving: Proposing a New Architecture for ITS's, A. de Sá Leite and N. Omar
Motivation Self-Report in ITS, A. de Vicente and H. Pain
Eliciting Motivation Diagnosis Knowledge, A. de Vicente and H. Pain
How to Design a Dynamic Adaptive Hypermedia for Teaching, N. Delestre, J.-P. Pecuchet and C. Greboval
Using Design Patterns in ITS Development, V. Devedzic
EXPLORA: An Interface to Support the Learner with Dynamic Graphs and Multimodal Goal Driven Explanations, A. Dufresne, V. Cosmova, T. LeTran and C. Ramstein
Navigational Issues in Interactive Multimedia, S. Fenley
Using DETECTive, a Generic Diagnostic System for Procedural Domains, B. Ferrero, I. Fernandez-Castro and M. Urretavizcaya
Agents for Diversity in Statistics Education, S. Finch, G. Cumming and J. Les
A Generic Graph Model for Case-Based Tutoring, P.-W. Fung and R.H. Kemp
Student Modelling for Multi-Agent Intelligent Tutoring Systems, T. Gavrilova, A. Voinov and T. Chernigovskaya
Computer-Supported Project Pedagogy in a Distributed Collaborative Learning Environment, S. George and P. Leroux
Individualising the Assessment for Low-Attaining Pupils in Word Problem Solving, K. Georgouli, M. Grigoriadou and M. Samarakou
Towards an Analysis of Answers to Open-Ended Questions in Computer-Assisted Language Learning, J. Gerbault
Towards a New Computational Model to Build a Tutor, L.M.M. Giraffa, M. da C. Mora and R.M. Viccari
Knowledge-Based Integration and Adaptation of Presentations in an Intelligent Tutoring System, M. Gonschorek
Design of an Intelligent Tutoring System for Teaching and Learning Hoare Logic, K. Goshi, P. Wray, Y. Sun and M. Owens
Developing Pedagogical Simulations: Generic and Specific Authoring Approaches, V. Gueraud and J.-P. Pernin
Evaluating and Revising Teaching Courses Built from Heterogeneous Distributed Educational Materials, L. Guizzon and D. Herin
The Virtual Campus Prolog Learning Environment, H. Gust, C. Peylo, C. Rollinger and W. Teiken
Group Learning in Hybrid Communities in Intelligent Tutoring Systems, A. Harrer
A Framework for Estimating Usefulness of Educational Hypermedia, S. Hasegawa, A. Kashihara and J. Toyoda
A Knowledge Structure Visualization for Supporting Exploratory Learning in Hyperspace, A. Kashihara, H. Uji'i and J. Toyoda
Awareness of the Other's Metacognition is the Discerning Factor by which Reflection Turns into Monitoring, M. Kayashima and T. Okamoto
Knowledge Representation and Processing in Web Based Intelligent Tutoring, Kinshuk and A. Patel
E-Slate: A Kit of Educational Components, G. Birbilis, M. Dekoli, T. Hadzilacos, M. Koutlis, C. Kynigos, K. Kyrimis, X. Siouti, G. Tsironis and G. Vasiliou
Knowledge Tracing of Cognitive Tasks for Model-Based Diagnosis, J.-C. Le Mentec, W. Zachary and V. Iordanov
Model-Tracing Methodology and Student-Controlled Environment: A Possible Marriage?, R. Lelouche
Modelling Tutoring Knowledge in an ITS with a Team of Interacting Agents, R. Lelouche and J.-F. Morin
Specification of the Educational Mediator Architecture: An Inter-Operation Architecture for Educational Software, M. Macrelle
A Framework for Authoring Tools for Explanation Mechanisms in ITS, C.J. Martincic and D.P. Metzler
Making Ethics Real for Students, G. McCalla and M. Winter
MetaLinks - A Framework and Authoring Tool for Adaptive Hypermedia, T. Murray, C. Condit, J. Piemonte, T. Shen and S. Kahn
Didactic Resource Server for Distributed Intelligent Tutoring Systems, R. Nkambou
Using and Re-Using Agents in Multi-Agent Learning Environments, A. Paiva, I. Machado and A. Martins
Towards a Knowledge Engineering Method for the Construction of Advisor Systems, G. Paquette and P. Tchounikine
Applying Learning Algorithm Techniques in the Design of a Tutor Model for Pascal Programming, R.L. Reyes
A Dialogue Based Tutoring System for Basic Electricity and Electronics, C.P. Rose, B. Di Eugenio and J.D. Moore
Explanation Process in Distance Tutoring Interaction, the Case of TeleCabri, S. Soury-Lavergne
Design of a Learning Environment in Combinatorics: Nondeterministic Machines to Improve Modelling Skills, G. Tisseau, H. Giroire, F. Le Calvez, M. Urtasun and J. Duma
A Domain Independent Authoring Environment for Problem Solving Knowledge, U. Trapp and M. Lusti
The Generic Computer-Assisted Language Learning Environment CALLE, W. Winiwarter
Agents Aggregation Model in Virtual Learning Environments, L.-H. Wong and C.-K. Looi
Panels
The Impact of AIED in the Schools, T. Conlon, K. Koedinger, A. Paiva, S. Goldman, J.-M. Laborde and E. Soloway
Issues Involving Human and Computer Tutoring, B. Du Boulay, J. Greer, M. Lepper, A. Graesser, K. VanLehn and J. Moore
Luncheon Speakers
Voices from the Field: Schools Don't Want Technology, Schools Want Curriculum, C. Norris and E. Soloway
Workshops
Instructional Uses of Synthetic Agents, Organizers: W.L. Johnson, E. Andre, C. Frasson, J. Lester, A. Paiva and J. Rickel
Ontologies for Intelligent Educational Systems, Organizers: R. Mizoguchi and T. Murray
Analysing Educational Dialogue Interaction: Towards Models that Support Learning, Organizers: R. Pilkington, J. McKendree, H. Pain and P. Brna
Educational Robotics, Organizers: P. Leroux, M. Vivet, B. Denis and P. Nonnon
Medical Image Tutoring, Organizer: B. Du Boulay
Tutoring Systems that Learn, Organizer: J.E. Beck
Open, Interactive and Other Overt Approaches to Learner Modelling, Organizers: R. Morales, S. Bull, J. Kay and H. Pain
What do we Know of Open Learning Environments?, Organizer: J. Multisilta
Tutorials
The Play's the Thing: Enhancing Learning Design through Game Elements, C.N. Quinn
How Ideas but not Tools of Artificial Intelligence Help to Effectively Develop the Intelligence of Young Children and Teenagers, V.A. Fomichov and O.S. Fomichova
Author Index
Keynote
Open Sesame?: Fifteen Variations on the Theme of Openness in Learning Environments
John Self
Computer Based Learning Unit, University of Leeds
Abstract. The theme of this conference is "open learning environments". But what sense(s) of 'open' do we mean? Why do we want our learning environments to be 'open'? What educational philosophies underlie this move towards 'open'ness? What technologies exist to enable our learning environments to become more 'open'? What is the social context in which we envisage our 'open' environments will be used? If we are seeking the 'open sesame' to magically fling wide the door to learning, are we Ali Baba or the Forty Thieves? This talk will present a number of variations, illustrated by specific learning environments, on the theme of openness in an attempt to answer these questions.
Invited Speakers
Cognitive Applications of New Computational Technologies in Eye Tracking
Sandra P. Marshall
Cognitive Ergonomics Research Facility, Department of Psychology, College of Sciences, San Diego State University
Abstract. Many instructional activities require an individual to process information displayed on a computer screen. The more we know about how an individual interacts with the display, the better we are able to model the individual's performance, to diagnose cognitive strengths and weaknesses, and to evaluate the effectiveness of the display. Based on my recent research with a decision support system, I will describe how point-of-gaze measures can be used to evaluate screen and instructional designs. I will discuss how pupil-based estimates of cognitive workload can be related to the point-of-gaze information, and I will illustrate this relationship with data from several cognitive tasks. Finally, I will suggest ways in which eye-tracking technologies can contribute to cognitive models, using my schema models of tactical decision-making as examples.
Collaborative Learning in Open Distributed Environments - Pedagogical Principles and Computational Methods
H. Ulrich Hoppe
COLLIDE Research Group, Gerhard Mercator University Duisburg, Germany
Abstract. Research into intelligent learning support systems has recently experienced a shift from focusing on individual learners to focusing on groups of learners. However, group tutoring cannot count on cognitive models of learning in groups in the same sense as ITS counted on individual cognitive models. Considering open task environments and group interaction forces us to redefine intelligent learning support as inherently restricted to partial "insight" on the part of the system. Still, artificial agents based on domain and learner models may in a variety of ways enrich and support human-human interaction in group learning scenarios. In this view, intelligent learning support functions can be conceived as local resources in combination with, e.g., support by peers or human tutors. Learning in a social context also requires the consideration of organizing the learning process in time and space beyond single tasks and single applications. This leads us to conceiving the design of technologically enriched spaces for learning (such as classrooms) under aspects of "roomware" and integrated classroom information management. Recently, the question has been raised whether the computer as an explicit object with a uniform standard interface will be replaced by a variety of more specialized "information appliances". Should educational environments try to follow this trend, or perhaps, should they even try to be on the leading edge of it? However, technology is not enough to make collaborative learning effective and efficient. Practical experience tells us that cooperation itself is a subject of learning which has to be introduced and trained. More generally, we must develop design principles for collaborative learning environments and materials in accordance with or even derived from pedagogical aims and methods. The issues mentioned will be discussed and reflected on the background of concrete experience in several projects.
An Overview of the State of the Art in ITS Authoring Tools
Tom Murray
University of Massachusetts and Hampshire College
Abstract. Intelligent tutors are becoming more common and proving to be increasingly effective, but they are difficult and expensive to build. Authoring systems can decrease the cost of building ITSs and lower the skill threshold so that more people can participate in building them. Authoring systems are commercially available for traditional computer aided instruction and multimedia-based training, but they lack the sophistication required to build intelligent tutors. Researchers have been investigating ITS authoring tools almost since the birth of ITSs, and over two dozen very diverse systems have been built. In the last five years there has been significant progress in the development of these systems and in the understanding of the key issues involved. Still, for the most part they are research vehicles which have demonstrated significant success in limited cases. In this talk I hope to address these questions: "what methods and design principles have come out of this research?" and "what is really available (or soon to be available) to make ITS authoring cost effective?" The talk will give an overview of some existing systems and discuss how various techniques have been used for knowledge management, knowledge elicitation and acquisition, knowledge visualization, and knowledge creation, to simplify the authoring process. Important issues such as "who is the intended author?" and "how general vs. special purpose should authoring tools be?" will be discussed.
Trends and Issues in AI and Education: Towards a Common Research Framework
Jacobijn Sandberg
University of Amsterdam
Abstract. As long as the ITS paradigm dominated the field of AI and Education, the role of AI was clear. Since the eighties new instructional paradigms have emerged, reflecting various shifts of emphasis in education, such as a heavier focus on meta-cognition, more use of open domains and tasks, and a renewed interest in collaborative learning. The role of AI in these paradigms is less clear-cut than it used to be. In the presentation I will address the shifts that accompany the development of new instructional paradigms and I will discuss the support AI can bring to these. The presentation will refer to ongoing research at our department as well as research conducted elsewhere to illustrate the most salient developments in our field and the changing role of AI.
Agent Models
Agent systems for diversity in human learning
Jan Les, Geoff Cumming, Sue Finch
School of Psychological Science, La Trobe University, Australia 3083
{j.les, g.cumming, s.finch}@latrobe.edu.au

Expert human teachers base their comments to a learner on a wide range of types of information fragments about that learner. In stark contrast, tutoring systems usually rely on modelling a single type of learner information—local domain knowledge. How can this insight into human tutoring be exploited? We examine the vogue concept of agents, as used in computer science and AI; in business and public discussion; and in AIED. So many perspectives have been discussed and definitions used that the agent concept is in danger of losing value. Autonomy, reactivity and purposiveness are central features. Agents may be symbolic, or situated, or hybrid. Agent-oriented programming is a valuable software technique, somewhat separated from the presentation of anthropomorphic agents to a user. We discuss learning systems that encompass a diversity of representations and perspectives; using sets of situated agents is a promising approach. Agent-based systems should permit a closer approach to the richness of human teaching-learning interactions.
We consider the vogue concept of 'agents' and discuss how it can contribute to AIED, particularly in the light of research with human teachers that suggests limitations to traditional AIED approaches. In an accompanying paper [9] we discuss agent possibilities specifically for learning statistics.
1 Agents: Definition and metaphor
'Although increasingly popular, the term [agent] has been used in such diverse ways that it has become meaningless without reference to a particular notion of agenthood.' [17, p.52]. Many others have made similar remarks, and the observation applies also within AIED. The Oxford dictionary distinguishes two meanings of 'agent':
1. 'one who or that which exerts power or produces an effect'; [agent as executive, or doer]
2. 'one who acts for another in business, politics, etc.'; [agent as subordinate, or servant]
The agent metaphor is therefore ambiguous between autonomy, initiative, and independence (Oxford 1); and fulfilling the wishes of the user (Oxford 2). This ambiguity may be useful, but allows 'agent' to be applied to an enormous spread of computational entities.

1.1 Popular discussions

The dominant popular metaphor evokes 'the lovely image of a butler performing tasks with a clairvoyant understanding and ability to take care of user needs' [16, p.92]. Tasks may range from simple procedures, as offered by dreadfully mis-named software 'wizards', to retrieval tasks beyond the intelligence of an expert human. Agents can learn a user's characteristics and needs, and discussions of this feat reinvent AIED analyses of learner modelling. The metaphor of the intelligent and ever-learning butler is dominant, but agent hype draws also on technological advances, including ubiquitous computing, improved telecommunications, natural language abilities and a greatly enhanced Internet. The sci-fi image of an agent ranging the world to pander to one consumer's whim thus relies on technical advances as well as the software concept of agent.
Human-like emotions and appearance may lead the user to anthropomorphise much more than the agent's abilities justify. In fact software capabilities dressed up for presentation to a user as an agent may not meet a software designer's criteria for an agent. Interface agents solve any communications problem the user may have and, as the collection of special purpose agents grows, a meta-agent may be introduced as manager so the user never risks information overload.

1.2 The user's perspective

Popular discussions usually take the perspective of the user rather than the software architect. Any entity presented to a user as an agent should hide complexity, but give the user an appropriate level of knowledge of what is going on. The designer should prompt in the user a mental model of the agent that maximises agent effectiveness and leads the user to have an appropriate level of trust in the agent.
2 Computer science views of agents
The agent notion may have been first suggested by McCarthy in 1968 [7]. Shoham [17] presented agent-oriented programming (AOP) as a new computational framework, a specialisation of object-oriented programming. 'An agent is an entity whose state is viewed as consisting of mental components such as beliefs, capabilities, choices, and commitments.' (p.52). The defining property of an agent is 'precisely the fact that one has chosen to analyze and control it [the agent] in these mental terms' (p.52): this recourse to subjectivity removes any hope for a clear definition of agent here. Shoham [17] emphasised a 'societal view of computation', with multiple agents interacting with one another, but did not include this in the definition. Genesereth and Ketchpel [5], however, adopted communication as their definition: 'An entity is a software agent if and only if it communicates correctly in an agent communication language...' (p.50) This served their software engineering purposes but does not sufficiently capture the agent metaphor.
Wooldridge and Jennings [20] gave a more useful, mainstream framework by distinguishing a weak notion of agency requiring autonomy, reactivity to the environment, initiative towards a goal, and ability to interact with other agents; and strong agency incorporating also mental states and other human-like attributes. These views span Oxford 1 and 2, and fit with the everyday agent metaphor. Sometimes the ability to learn is required for agenthood, and sometimes the ability to interact with other agents is not required. There is much discussion of what constitutes autonomy, with an important factor being a large degree of control over the agent's own actions.

2.1 Artificial Life and situatedness

Shoham [17] declared that agents are usually 'high-level', although 'a certain counter-ideology deliberately denies the centrality or even existence of high-level representation in agents' (p.52). This was a reference to the subsymbolic entities of artificial life—the ants in the ant heap, each simple but with the colony showing complex emergent behaviour. There is a research tradition (Castelfranchi & Werner [1]) in which even the 'ants' are referred to as agents. This usage seems to strain the agent metaphor, but some of the most interesting agent developments now involve situated agents, which respond and adapt to environmental complexities rather than incorporating complex symbolic models of their world.
Steels [18] goes so far as to argue that only such a situated agent can show autonomy, in the sense of being able to adapt in a deep sense to its environment. Hybrid architectures offer a promising synthesis of symbolic and sub-symbolic approaches (Muller [12]). For example, lower layers within an agent can give the reactivity afforded by situatedness, while upper levels use a symbolic approach to planning.
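To make the layering idea concrete, the following is a minimal Python sketch of such a hybrid agent, in the spirit of Muller's layered approach [12]. This is our own illustration, not code from any cited system; all class, method and rule names are invented. A reactive lower layer responds immediately to percepts, while a deliberative upper layer falls back on an explicit symbolic plan when no reflex fires.

```python
# A minimal sketch (illustrative only) of a hybrid, layered agent:
# a reactive layer gives situated responsiveness, and a symbolic
# layer steps through an explicit plan when no reflex applies.

class HybridAgent:
    def __init__(self, plan):
        self.reflexes = []       # reactive layer: (condition_fn, action) pairs
        self.plan = list(plan)   # deliberative layer: ordered (goal, action) steps

    def add_reflex(self, condition, action):
        self.reflexes.append((condition, action))

    def act(self, percept):
        # Lower layer: situatedness -- react directly to the environment.
        for condition, action in self.reflexes:
            if condition(percept):
                return action
        # Upper layer: symbolic planning -- pursue the next plan step.
        if self.plan:
            goal, action = self.plan.pop(0)
            return f"{action} (pursuing goal: {goal})"
        return "idle"

# Example: a tutoring agent that reacts to a stuck learner but
# otherwise follows its lesson plan.
agent = HybridAgent(plan=[("introduce fractions", "show example"),
                          ("practice", "pose exercise")])
agent.add_reflex(lambda p: p.get("learner_stuck"), "offer a hint")

print(agent.act({"learner_stuck": True}))   # reactive layer fires
print(agent.act({"learner_stuck": False}))  # deliberative layer steps the plan
```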
3 Agents for Learning - the literature
Kearsley [7] painted a rosy picture in suggesting how agents could be used in education:
• intelligent agents act as coaches or advisors giving individually-tailored explanations
• different agents have different personalities and thus offer different perspectives
• agents assist with navigation and other tasks incidental to the learning goals
• 'build agents into a programming environment so that students can create them directly' (p.297)—this is the constructivist idea that learners build and use their own agents
• agents interact with other users (and agents) giving a strongly collaborative view of learning.
Kearsley required an agent to have common-sense reasoning, and to build a rich model of the user encompassing the entire personality. Dede and Newman [3] gave a cogent response, arguing that Kearsley's view of an intelligent teaching agent 'sounds suspiciously like the Holy Grail of a universal intelligent tutoring system' (p.306). Merely using the term agent does not necessarily add anything. Those two 1993 'viewpoint' papers raised issues but did not give a searching analysis of agents in AIED, nor any startling insight.

3.1 Agents, large or small?

The 'large agent' approach labels whole ITS components as single agents. Thus Zhang, Alem and Yacef [21] described the components of their system as the 'domain expert agent', 'learner agent', 'instructor agent', and so on, to emphasise the system's extensibility. Ritter and Koedinger [15] described the addition of a single 'plug-in tutor' agent to Geometer's Sketchpad, and to Excel; they used the same core tutoring module in each case, with application-specific translators, and justified use of the term agent to emphasise the reuse. Extensibility and reusability are often valuable aspects of agent systems, so describing whole modules as agents can be defended. However these agents are larger in scope than most, and there is little reference to the power of multiple agents that are specialised but broadly comparable. So labelling a whole component as an agent does not help much.
Leman, Giroux and Marcenac [8] went to the other extreme in taking a multi-agent approach to learner modelling. Agents were of 3 types: S-agents 'own tiny pieces of static knowledge on the domain' (p.261); D-agents model dynamic functional aspects of concepts, and R-agents are at a higher level and model reasoning. Modelling information, of the student's knowledge and current reasoning, is distributed, and various possibilities can be considered simultaneously by the agent system. Some agents are at a very low level, comparable with atomic statements or nodes in reasoning graphs; it is hard to see how this use of 'agent' gives us much. The higher level agents, however, support multiple hypotheses about current learner thinking. It is such simultaneous consideration of diversity that is the great potential of agent systems for learning.
3.2 Simultaneous diversity in learning systems

To the software designer, an agent architecture offers an efficient way to provide a wide range of functions simultaneously. To the software user it offers a mental model for comprehending and using such diverse functionality. In education it offers a way to support the diversity of representations and perspectives inherent in human learning and teaching; this is a key reason for the potential of multi-agent systems for learning.
Paquette, Pachet, Giroux, and Girard [13] took further the work on multi-agent learner modelling. A hierarchy of advice agents maps onto a hierarchy of tasks. Advice agents operate at a number of levels and so the user can receive very specific or quite general advice. The agent architecture facilitates reuse in adding advisory capabilities to other learning environments.
Ritter [14] extended the earlier 'large agent' work by building multiple tutoring agents, which have standard interfaces and protocols so they can be developed separately. Tutor agents do not have particular knowledge of the learning environment, because a translator or mediator element is placed between tutor and tool. Agents can be reused in different domains. The agents are all in operation at once, with various cooperative and competitive relations between them. The mediator component copes with contention among tutor agents.

3.3 Social agents

In the systems considered above the emphasis was on agents as a powerful software architecture, easing the task of system design and development. The learner was not necessarily aware of interacting with a system incorporating many agents. A quite different approach is to present agents to the learner, perhaps as friendly experts or fellow learners.
Lester, Converse, Stone, Kahler, and Barlow [10] added an animated pedagogical advisor to a simulation environment. With just one advisor, this is a 'large agent' example. Middle school students received comments and advice on their work from the agent, which was sufficiently animated and expressive to be regarded as a creature having relevant expertise. This representation of the advice-giver was found effective, at least for these learners.
Dillenbourg, Jermann, Schneider, Traum, and Buiu [4] wished to understand how an agent should be designed to participate as advisor or co-learner in collaborative problem solving at a distance. They studied the conversational and information sharing interaction between a pair of human learners collaborating at a distance. Their analysis was in terms of multiple ongoing dialogues, information management and best use of available communication and presentation media. Building fully collaborative agents is a large challenge, but less capable agents that assist with parts of the task or communications are more feasible and still useful.
A similar approach was taken by Hietala and Niemiripo [6], who studied middle school students working on mathematical equations. The students could choose to collaborate with, and obtain advice from, any of several simulated anthropomorphised learning companion agents. The capability of the companion and the student, as well as social and personality factors, influenced companion choice and effectiveness.
Wang and Chan [19] reported a further development in a substantial program of work on social learning systems. Such systems include both human learners and educational agents, possibly distributed at various locations.
Wang and Chan described an AOP language for implementing the agents, which can act as tutors, learning companions or virtual students.
A key design feature was expressed as: 'The magic to achieve autonomy of an agent is letting the agent communicate with others by performing speech act[s], such as inform, query, answer, request, etc.' (p.11)
The above few examples illustrate the great range of approaches and issues in research on agents that are presented to learners as behaving entities. In many but not all cases the underlying software is also agent-based. The last project mentioned combined most explicitly both the software and the social concepts of agents in that the goal was the development of an agent-oriented software tool for constructing entities to be seen by learners as agents.

3.4 Situated agents for learning

Masthoff and Van Hoe [11] described a very interesting attempt to build a domain-independent module comprising a number of agents each specialising in one aspect of teaching. The aspects were: navigation through course material; examples and practice; instruction; feedback; presentation formats; student history. Each agent was based on simple, general rules. Domain-specific teaching knowledge was included in the domain component, to operate alongside the set of general teaching agents. A key feature is that the agents were described as situated, or reactive: 'There is no control component. The behaviour of the system emerges from the interaction between the agents. ...[for example] the navigation agent, practice agent and instruction agent compete sometimes.' (p.87)

3.5 Conclusions about agents for learning

As in the earlier sections on popular discussions and computer science, we found here many conceptions of agents. Also, in relation to learning a wide range of issues are discussed, including:
• autonomy and purposiveness are central features of agents
• reactivity is a further central feature
• agents may be situated, with reactivity to the surroundings fully shaping agent behaviour, or may be symbolic or deliberative, in which case reactivity is governed also by internal knowledge and plans
• the situated-symbolic issue is a major question, with various combinations possible
• an agent-based system can be a good software architecture without the user being presented with any anthropomorphised agents; conversely the user can encounter a human-like entity without the underlying software being agent-based; but most commonly the two are combined
• agents most usefully come in a collection; communication ability between agents is for some a definitional requirement of an agent, but situated agents may have no such developed ability and so we should not require agents to be social.
4 Agents for learning - expert teachers of English as a Second Language
Our main purpose is to examine how AIED might respond to our previous findings of diversity in learner information used by teachers. Cumming, Sussex, Cropp and McDougall [2] studied the three-way teacher-learner-computer (TLC) interactions as an expert teacher advised a learner of English as a Second Language who worked at lexical tasks at the computer. Teachers later gave commentaries as they viewed parts of the video record. They were asked in particular about the learner information they had used in formulating comments.
The aim was to tap teachers' learner modelling, which must be the very basis of individualisation in education.
Analysis of the transcripts and commentaries led to the following conclusions:
• Expert teachers can give effective individual guidance despite having only very limited learner information.
• Before making a substantive comment, a teacher often asks a question or two, seeking particular learner information.
• Teachers use a wide diversity of types of learner information: cognitive, social, affective and others; short-term and enduring; local and general.
• Stereotypes about students, ways of learning, and patterns of errors play a role.
• Teachers make use of fragmentary, sparse and incomplete information.
• Teachers sometimes use incorrect learner information, and later have to recover.
• Teachers sometimes make conjectures and assumptions about the learner, knowing that later corrective action may be needed.
Particularly striking was the wide diversity of types of information about a learner that were used, and the fragmentary nature of most of that information. These findings are not surprising to experienced teachers, but are in striking contrast to the learner modelling typically attempted in AIED. In a traditional ITS the computational learner model typically attempts to capture only the student's current cognitive structure of the learning domain, and that in only one form or grain size, but to represent this as fully and accurately as possible.

4.1 Multiple situated agents to model the learner

The expert human teacher appears to draw on a vast variety of fragments of information about the learner, including personality, personal background, past learning history, current mood and motivation, goals, preferences, recent learning activities, and the current environment. Correspondingly, a collection of student modelling agents, each rudimentary and focussing on one aspect of the learner or the learning activities, may constitute a basis for advice that is more similar to that of the teacher than of the ITS.
A traditional symbolic approach could be taken to managing the coordination of such a collection of agents, preferably based on some model of how human teachers choose amongst the various types of fragmentary learner information currently available. Constructing and validating such a model seems implausible if not impossible, given the very large number of variables involved. The alternative of following Masthoff and Van Hoe [11] is appealing: equip the specialised student modelling agents with no sophisticated inter-agent communication abilities, and provide no sophisticated higher-level manager. Instead, allow the agents to contend, perhaps with some rudimentary strength index attached to each potential signal that any agent generates, perhaps allowing the learner to choose from several preferred comments. We would thus have a set of situated student modelling agents. These could remain hidden in the system, or perhaps more usefully could be presented to the learner as a number of sources of possibly useful comments.
Of course we have not mentioned many challenging issues related to other components of a full intelligent educational system, and how these would integrate with the proposed set of student modelling agents. Some of these have been addressed by Masthoff and Van Hoe [11] in their system including a set of situated teaching agents. Masthoff and Van Hoe built agents to provide diversity in aspects of teaching; our agent proposal here is in response to observed diversity in types of learner information used by teachers. It is likely that other important forms of diversity can be identified in human teaching and learning.
accompanying paper [9] we discuss agents to meet the need in statistics education of other types of diversity, most particularly the importance for good understanding of working with a number of alternative representations of a concept.
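To make this concrete, here is a minimal sketch of such a contention scheme (our illustration, not Masthoff and Van Hoe's implementation; all class and attribute names are assumptions). Each specialised agent watches one kind of fragmentary learner information and may emit a candidate comment with a rudimentary strength index; the strongest few are offered to the learner, with no inter-agent communication and no higher-level manager.

    # Minimal sketch of contending student modelling agents.
    # Each agent watches one aspect of the learner and may emit a
    # (comment, strength) signal; there is no inter-agent communication
    # and no manager -- the strongest signals simply win.
    from dataclasses import dataclass

    @dataclass
    class Signal:
        source: str       # which agent produced the comment
        comment: str      # text offered to the learner
        strength: float   # rudimentary strength index in [0, 1]

    class ErrorPatternAgent:
        """Watches recent answers for a recurring error pattern."""
        def observe(self, learner_state):
            errors = learner_state.get("recent_errors", [])
            if errors.count("sign_error") >= 2:
                return Signal("errors", "Check the sign of each term.", 0.8)
            return None

    class MotivationAgent:
        """Watches pauses as a crude motivation proxy."""
        def observe(self, learner_state):
            if learner_state.get("idle_seconds", 0) > 120:
                return Signal("motivation", "Would a worked example help?", 0.5)
            return None

    def contend(agents, learner_state, k=2):
        """Collect signals and let the k strongest through,
        for the learner to choose among."""
        signals = [s for a in agents for s in [a.observe(learner_state)] if s]
        return sorted(signals, key=lambda s: -s.strength)[:k]

    offers = contend([ErrorPatternAgent(), MotivationAgent()],
                     {"recent_errors": ["sign_error", "sign_error"],
                      "idle_seconds": 150})
    for s in offers:
        print(f"[{s.source}] {s.comment}")

Whether such signals remain hidden in the system or are surfaced as several sources of possibly useful comments then becomes an interface decision rather than an architectural one.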
5. Agents for learning — discussion and conclusions
For the software designer, AOP offers a powerful way of coping with complexity of function, and of software itself. To the public eye, agents may fulfil some sci-fi fantasies, offer a human face for scary technology, and offer promise of sophisticated systems that are more easily understood and used. Agents allow computers to take a further step towards social interaction with people.

All the above agent considerations appear also in systems or discussions addressing learning. In many but far from all cases AOP is combined with presentation of possibly anthropomorphised agents to the learner. Many systems promise exciting and powerful developments to come.

Our motivation for considering agents for learning is different. ITSs have been criticised on many grounds, often because they seem monochrome by comparison with human teaching. Recognising various ways that human learning and teaching involve diversity may be an important step in providing guidance for designers of learning systems that go beyond the monochrome. This diversity may be of types of teacher activity [11], or types of learner information [2], or perhaps of other kinds. The richness of human teaching and learning interactions may be the strongest justification for researching agent-based systems for learning.
6. References
[1] Castelfranchi, C., & Werner, E. The MAAMAW spirit and this book. In C. Castelfranchi & E. Werner (Eds.), Artificial social systems (pp. vii-xvii). Berlin: Springer, 1994.
[2] Cumming, G., Sussex, R., Cropp, S., and McDougall, A. In B. du Boulay & R. Mizoguchi (Eds.), Artificial intelligence in education: Knowledge and media in learning systems (pp. 577-579). Amsterdam: IOS Press, 1997.
[3] Dede, C., and Newman, D. Differentiating between Intelligent Tutoring Systems and Intelligent Agents. Journal of Artificial Intelligence in Education (pp. 305-307), 4(4), 1993.
[4] Dillenbourg, P., Jermann, P., Schneider, D., Traum, D., and Buiu, C. In B. du Boulay & R. Mizoguchi (Eds.), Artificial intelligence in education: Knowledge and media in learning systems (pp. 15-22). Amsterdam: IOS Press, 1997.
[5] Genesereth, M. R., and Ketchpel, S. P. Software Agents. Communications of the ACM (pp. 48-53), 37(7), 1994.
[6] Hietala, P., and Niemirepo, T. The Competence of Learning Companion Agents. International Journal of Artificial Intelligence in Education, 9, 1998 (to appear).
[7] Kearsley, G. Intelligent Agents and Instructional Systems: Implications of a New Paradigm. Journal of Artificial Intelligence in Education (pp. 293-304), 4(4), 1993.
[8] Leman, S., Giroux, S., and Marcenac, P. A multi-agent approach to model student reasoning process. In Greer, J. (Ed.), Artificial intelligence in education: Proceedings of AIED 95, the 7th World Conference on Artificial Intelligence in Education (p. 258), Washington, August. Charlottesville, VA: AACE, 1995.
[9] Finch, S., Cumming, G., and Les, J. Agents for diversity in statistics education. Poster presented at AIED 99, and short paper published in these proceedings.
[10] Lester, J. C., Converse, S. A., Stone, B. A., Kahler, S. E., and Barlow, S. T. In B. du Boulay & R. Mizoguchi (Eds.), Artificial intelligence in education: Knowledge and media in learning systems (pp. 23-30). Amsterdam: IOS Press, 1997.
[11] Masthoff, J., and Van Hoe, R. APPEAL: A Multi-Agent Approach to Interactive Learning Environments. In Perram, J. W., and Muller, J. P. (Eds.), Distributed Software Agents and Applications: 6th European Workshop on Modelling Autonomous Agents in a Multi-Agent World, MAAMAW'94 (pp. 77-89). Springer-Verlag, 1996.
[12] Muller, J. P. The Design of Intelligent Agents: A Layered Approach. Springer-Verlag, 1996.
[13] Paquette, G., Pachet, F., Giroux, S., and Girard, J. EpiTalk: Generating Advisor Agents for Existing Information Systems. Journal of Artificial Intelligence in Education (pp. 349-379), 7(3/4), 1996.
[14] Ritter, S. Communication, Cooperation and Competition among Multiple Tutor Agents. In B. du Boulay & R. Mizoguchi (Eds.), Artificial intelligence in education: Knowledge and media in learning systems (pp. 31-38). Amsterdam: IOS Press, 1997.
[15] Ritter, S., and Koedinger, K. R. An Architecture for Plug-in Tutor Agents. Journal of Artificial Intelligence in Education (pp. 315-347), 7(3/4), 1996.
[16] Selker, T. Coach: A Teaching Agent that Learns. Communications of the ACM (pp. 92-99), 37(7), 1994.
[17] Shoham, Y. Agent-oriented programming. Artificial Intelligence (pp. 51-92), 60, 1993.
[18] Steels, L. Building agents out of autonomous behavior systems. In L. Steels & R. Brooks (Eds.), The artificial life route to artificial intelligence (pp. 83-121). Hillsdale, NJ: Erlbaum, 1995.
[19] Wang, W.-C., and Chan, T.-W. Experience of Designing an Agent-Oriented Programming Language for Developing Social Learning Systems. In B. du Boulay & R. Mizoguchi (Eds.), Artificial intelligence in education: Knowledge and media in learning systems (pp. 7-14). Amsterdam: IOS Press, 1997.
[20] Wooldridge, M., and Jennings, N. Intelligent Agents: Theory and Practice. Knowledge Engineering Review, 10(2), 1995.
[21] Zhang, D. M., Alem, L., and Yacef, K. Using Multi-agent Approach for the Design of an Intelligent Learning Environment. In Wobcke, W., Pagnucco, M., and Zhang, C. (Eds.), Agents and Multi-Agent Systems (pp. 220-230). Berlin: Springer, 1998.
Teachable Agents: Combining Insights from Learning Theory and Computer Science

Sean Brophy, Gautam Biswas, Thomas Katzlberger, John Bransford, and Daniel Schwartz
Learning Technology Center, Box 45-GPC, Vanderbilt University, Nashville, TN

Abstract. We discuss computer environments that invite students to learn by instructing "teachable agents" (TA's) who venture forth and attempt to solve problems that require knowledge of disciplines such as mathematics, science or history. If the agents have been taught properly they solve the problems they confront; otherwise they need to be further educated. The TA's have both a "knowledge dimension" and a "personality dimension" (e.g., some may be impetuous, not listen or collaborate well, need many examples to understand, etc.). This helps students focus on academic content plus the characteristics of "difficult agents" that interfere with learning. The paper briefly discusses learning by teaching, learning by programming, and relevant classroom research. This background helps identify key principles underlying teachable agent learning environments. The rest of the paper discusses a framework for instantiating these principles into a general teachable agent environment.
1. Introduction

In this paper we discuss computer environments that invite students to learn by instructing "teachable agents" (TA's) who venture forth and attempt to solve problems that require knowledge of disciplines such as mathematics, science or history. If the agents have been taught properly they solve the problems they confront; otherwise they need to be further educated. The TA's have both a "knowledge dimension" and a "personality dimension" (e.g., some may be impetuous, not listen or collaborate well, need lots of examples to understand, etc.). This helps students focus on academic content plus the characteristics of "difficult agents" that interfere with learning.

The TA environments that we discuss are designed to facilitate research on human learning (especially research on the potential advantages of "learning by teaching"). For example, in our environments, students do not know exactly what problem an agent will have to solve. Therefore, they need to think about "big ideas" that will prepare an agent to solve a class of problems. Presumably, teaching about big ideas will facilitate student learning and transfer as compared to teaching specific facts for solving a specific problem [1].

Our TA work is also intended to help us discover and refine computer science techniques for designing agents who can be taught by students and then display the effects of this teaching in their behavior. For example, students do not program TA's with procedural steps. Instead, they use representations common to the disciplines of knowledge we want the students to learn. So, rather than teaching an agent with a pseudo-code algorithm for computing X given Y, students can construct a graph that shows the functional relationship.

In the following, we briefly review research on the benefits of learning by teaching and learning by programming. We then describe how classroom research led us to investigate the idea of having students learn by teaching agents. Next, we describe our current, and future, approaches to TA's where a student teaches an agent that is a virtual person. We conclude with a discussion of our current research on the development of TA environments.
2. Learning by Teaching

A belief in the value of learning by teaching is widespread. One example involves graduate students who become teaching assistants in areas like statistics and note that teaching helped them really learn. The literature relevant to learning by teaching includes reciprocal teaching [2], small group interactions, self explanation [3], and peer-assisted tutoring [4]. Much of this literature refers to the fact that tutors often learn as much as, if not more than, their tutees [5]. Nevertheless, strong empirical evidence on this point is difficult to find.

Some benefits of learning by teaching seem to be that teachers have to anticipate what their students need to learn, that teachers are often confronted by "naive questions" that make them rethink their own understanding, that teachers need to organize their knowledge in clear, consistent, and communicable ways, and that teachers have opportunities to notice the importance of different behaviors for people's abilities to learn. Presumably, when students take the role of teacher, they partake of these benefits. Moreover, interacting with another individual has a motivational component that may not be found in interactions with inanimate instructional materials. People, for example, are more motivated to make sure they get it right if they have the responsibility of teaching it to someone else.

3. Learning by Programming

Learning by programming and the metaphor of computer agents provide opportunities that may yield benefits similar to those of learning by teaching. This work may be divided into two broad classes: research on the benefits of learning by programming, and research on techniques that make it possible for agents to learn. There is a substantial history to the question of whether learning to program has general intellectual benefits that extend beyond programming [6]. In the context of agents, the idea of learning by programming was emphasized by Papert [7], who helped students learn by teaching a Logo turtle (cf. [8]). The ability to improve domain-independent thinking skills (e.g., planning) through programming has received mixed support. Consequently, there has been a shift towards domain-specific knowledge that students learn as they program the domain knowledge into agents. This idea includes programming Lego toys and robots to interact with one another explicitly [9], programming computer agents to collaboratively learn from one another [10], creating microworlds, and creating software to help others learn topics such as mathematics [11].

Another instructional method is to place the learner in the role of simulation designer. In this method the goal is to define a model by identifying the major factors in a system and to identify rules that govern those factors. Repenning and Sumner's AgentSheets [12], for example, makes it easy to create SimCity-type simulations. Factors are represented as agents who each have a local set of rules that define their behavior. The design process resembles an informal inquiry process where students generate a hypothesis of how a system works, then translate this hypothesis into an agent's rule base. Students acting as designers learn about the underlying principles of a situation by evaluating how an agent acts with other agents when running the simulation. The design process of hypothesizing, implementing and testing provides an excellent model for learning. Other work relevant to teachable agents comes from research on how to make agents that can learn.
For example, the Persona project at Microsoft [13] has focused on agents that learn sophisticated user interactions, communication and social skills. Recent architectures have focused on agents that can learn from examples, advice and explanations [14][15]. These agents learn new knowledge through a range of techniques including natural language, monitoring users' actions (demonstration), or learning through mistake correction. Our teachable agents could use similar approaches, but our end goal is different. Our teachable agents only need to "appear" to be learning from the user. We are creating teachable agents that support student learning, not learning agents.
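The contrast can be made concrete with a small sketch (our illustration; the class and method names are assumptions, not the authors' implementation). The agent below stores whatever representation the student supplies and simply consults it when asked to perform, so its apparent learning is exactly the student's teaching, with no machine learning involved.

    # Sketch: a teachable agent "learns" only what the student teaches it.
    # There is no machine learning; performance is read directly off the
    # stored representation, so gaps in the agent's behaviour mirror gaps
    # in the student's own understanding.
    class TeachableAgent:
        def __init__(self, name):
            self.name = name
            self.knowledge = {}       # representations taught by the student

        def teach(self, topic, representation):
            self.knowledge[topic] = representation

        def solve(self, topic, question):
            rule = self.knowledge.get(topic)
            if rule is None:
                return f"{self.name} was never taught about {topic}."
            return rule(question)     # apply the taught representation

    billy = TeachableAgent("Billy")
    # The student teaches a rate-time-distance relationship as a function.
    billy.teach("rtd", lambda q: q["distance"] / q["rate"])
    print(billy.solve("rtd", {"distance": 60, "rate": 25}))   # 2.4 hours
    print(billy.solve("water quality", None))                 # never taught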
4. Learning environments that promote action, reflection and refinement

Opportunities to teach and to program have both demonstrated potential for facilitating student learning. Our goal is to bring these ideas together to design TA computer environments. Our entrance into agent technologies followed a different path from that of research into computer applications or research into learning by teaching per se. Instead, we
have come to TAs by way of classroom research where we have found it important for students to have opportunities to develop and assess their understanding and to interact with their peers. The following reviews several key features of our SMART project (SMART stands for "Scientific and Mathematical Arenas for Refining Thinking") that led to the design features and principles of our teachable agent project.

4.1 Complex Learning Activities and Opportunities for Frequent Assessments

One way to facilitate learning is to organize activities around anchoring problems that integrate and motivate multiple domain concepts [16][17][18]. Instruction begins with the presentation of a complex challenge. For example, in the video-based Jasper Adventure, Rescue at Boone's Meadow (RBM), students need to design a plan to rescue an injured eagle found in the mountains. Students and teachers identify subgoals for completing the challenge. Each subgoal creates a need to learn specific domain concepts. For example, students need to learn about rate-time-distance relationships for various vehicles (e.g., an ultralight and a car) to determine the optimal route and vehicle combination. Anchoring instruction in a larger problem-solving context helps students understand the value of new knowledge and how to apply it.

To help students develop and self-assess their understanding of concepts, we often provide simulations. We use the larger context of the challenge to help make the simulation more meaningful to students. For example, AdventurePlayer is a planning simulation that complements RBM. Students experiment with different plans and receive feedback on each plan's success. The larger context of RBM provides an interpretive framework for understanding the feedback and for making decisions about learning goals. This helps students (and teachers) determine whether they need to revise their understanding.

4.2 Introducing a Teachable Agent

Another way that we have helped students develop the abilities to solve a challenge is with a feature called Kids on Line [19]. Kids on Line is a direct precursor of teachable agents. Students watched videotapes of students (actors we hired) as they explained their incomplete ideas on how to solve a challenge. The students critiqued the Kids on Line and provided suggestions for how to improve. Students (and teachers) found this activity extremely motivating, and they had many suggestions for how to improve the Kids on Line. When asked years later, students often spontaneously referred to the value of Kids on Line.

4.3 Representing and Working with Domain Knowledge using Smart Tools

Anchors provide familiar contexts that help students grasp the meaning of new ideas and skills. It is important, however, that student knowledge be given a chance to generalize. We do not want students to learn how to solve a specific rate-time-distance (RTD) problem about the time it takes to fly a plane to a particular location; we want them to be able to solve classes of RTD problems. Thus, we introduce students to the idea of SmartTools. SmartTools refer to the representational methods experts use to organize and make sense of complex information. For example, graphs can quickly illustrate the relationship of a dependent variable and an independent variable. Many of the "anchor" stories we have created attempt to set up situations that motivate the value of representational tools.
For example, in one anchor, students must be prepared to make quick computations in real time to help potential clients of a travel firm. At first, they discover that unaided arithmetic can be too slow. Afterwards, they learn that graphs and tables can be tremendous aids. Students receive opportunities to invent their own SmartTools and to learn about tried and true conventions.

Overall, our work in classrooms indicates that students learn well in anchored problem-solving contexts when 1) they have scaffolds and frequent opportunities for assessment, 2) they interact with models of behavior, and 3) they can create SmartTools to generalize their knowledge. As a group, these ideas have evolved into the concept of helping students learn by teaching agents to solve particular sets of problems. One reason we moved to agent technology is because we found students do not always get sufficient opportunities to exercise and test their understanding when working on large projects. In this light, computer environments make an excellent
complement to project-based instruction because they can provide increased opportunities for individual exploration.

5. Designing Teachable Agents Environments

Our TA environments fold our classroom research into work on programming to learn. For example, we have found that teaching someone else can be very motivating. Therefore, the agents that students teach are virtual humans. A recent classroom study helps to clarify the potential of teachable agents. In this study, students began their inquiry by meeting a cartoon character named "Billy Bashinall" and watching him attempt to perform in an environment that required knowledge of ecosystems and water quality. The challenges for the students consisted of teaching Billy so he could perform correctly when tested. The students eventually taught him about water quality, how to assess it, and how pollution affects dissolved oxygen and hence life. To meet these challenges, they did research, and were able to observe the effects of this teaching on his behavior. Figure 1 helps clarify this process. It shows drawings that represent snippets from an 8-minute video anchor that introduces students to Billy Bashinall.

A particularly interesting aspect of this study was that the students showed great perseverance in their attempts to teach Billy. They used resources and revised their own understanding for several weeks without flagging interest. For example, we gave students repeated multiple choice tests. We told the students they were using these tests to determine whether they were ready to teach Billy. Our measures indicate that students did not view the tests as tests, but instead embraced them as opportunities to assess their own preparedness. And, as expected, they showed strong learning gains.

6. Creating Teachable Agent Environments: Computer Science Issues

In the preceding example, Billy was not computerized. He was simply a scripted cartoon character shown on video. Classrooms of fifth through seventh grade students voted on the knowledge that Billy should receive, we tallied their votes, and then showed one of a few "canned" Billy behaviors based on the majority of votes. The level of interactivity was low, but this study provides important information about the design of TA's. It indicates that students did not need complete realism or interactivity to enjoy and benefit from teaching virtual agents. The students did not feel the need to create Billy from the bottom up, and they were willing to treat a cartoon character with the intent they might bring to teaching a real human.

Our next goal is to create a more interactive TA environment. Adding interactivity to our environment involves designing small simulations where a TA performs. The first step is to identify the knowledge of a domain and construct a working simulation. We then convert the standard simulation into the TA framework. The TA framework moves the student's interaction away from directly manipulating the variables that control the simulation to providing input that instructs the TA on how to decide what adjustments to make to these variables. Figure 2 provides an overview of one method for accomplishing this outcome.

Adding TA's to a simulation does not necessarily represent an application of machine learning. Just as the larger context helped students view the non-interactive Billy as a "real" agent worth teaching, the challenge context reduces the degrees of freedom a TA must have to seem teachable to students.
The challenge context defines the boundaries of the amount of domain knowledge an agent needs to know and the knowledge the students need to convey. TA's can greet students already possessing most of the knowledge they need including domain knowledge, knowledge of how to interact with the simulation, and knowledge of how to plan and solve key aspects of the problem. Students only need to provide a little knowledge to make the TA work.
Billy Bashinall is ready to turn in his group's report on water monitoring. He tells his friend, Sally, that he's confident his group's report is good enough because "five pages is always good for a 'C' in Mr. Hogan's class."
Billy's negative attitude comes to the attention of the Dare Force, a group of individuals who learned the hard way that it pays to work hard in school. They have dedicated their lives to helping (daring) others do well.
The Dare Force interviews Billy about his understanding of water quality, including the use of indicator species such as macroinvertebrates, and relationships between pollution and dissolved oxygen. Many of the interviews take the form of showing Billy visual scenes from actual river monitoring projects and asking him to explain what's going on. Billy starts out over-confident. He answers some questions correctly but also displays a number of preconceptions that need to be repaired (e.g., a healthy stream is clear and bug-free). Billy also chooses various tools to help with water monitoring that will not work well. The Dare Force makes it clear to Billy that he has a lot of learning to do (without specifying his exact strengths and weaknesses). Billy finally agrees and asks for help. The Dare Force asks students in the classroom to teach Billy and help him reinterpret his data. When they feel they are ready, students can have Billy return to the Dare Force context and see how well he fares.
Figure 1. An Introduction to a Teachable Agent and Its Problem Environment
Figure 2. Framework for Teachable Agent (TA)
[Diagram: the user (student) teaches the TA through representations such as SMART tools; the TA combines a Knowledge Base with a Disposition (learning attitude); the TA explains its actions and asks questions, and its performance appears in a visual representation of the simulation.]
Instructing a teachable agent may take several different forms. One method of instruction centers around an agent who doesn't have the tools to make decisions about how to formulate a plan. Students could fill out a Smart Tool that allows the TA to make decisions and calculations. This method of designing SmartTools is one approach we are exploring for converting the AdventurePlayer software to a TA design. The new version of the AdventurePlayer program provides multiple interfaces that a student can use to teach Billy. For example, students must teach Billy how to derive time parameters by constructing a graph that shows different time over distance slopes for various rates of travel. To teach him, they choose the frame of a line graph which presents two axes. It is their task to enter relevant information including axis labels (miles, hours, etc.), unit increments (10's, 100's, etc.), and lines indicating the time-distance relationship (e.g., the distance over time slope for a 25 mph rate). The completed graph becomes a part of Billy's knowledge base.

A complete and correct graph will enable Billy to solve the problem correctly. A graph with incorrect quantities or quantitative relationships will cause the plane to leave too soon or too late in lawful ways. A graph that is missing the labels can lead to idiosyncratic behaviors. One possibility is that the program randomly inserts labels of the same ontology but of different scale (e.g., seconds instead of minutes). The simulation would reflect this insertion, and the graph would highlight the fact that Billy just guessed and put a value onto the axis himself. A second possibility is that the program could insert labels that do not maintain the ontology (e.g., pounds instead of minutes). How this manifests itself in the simulation will depend on the specific simulation (e.g., it may misinterpret pounds as the amount of gas needed for a given distance). Determining the best forms of feedback and instruction is a topic for continued research.

Once students have taught Billy, they may place him back into the simulation environment. The simulation environment provides mini-assessments (specific conditions and/or configurations of the simulation). These assessments offer small, manageable simulations where students can "debug" Billy under relatively controlled circumstances. Billy is introduced to a specific problem to solve. Students can see Billy's behaviors and whether his solution works. This provides feedback that helps students determine what Billy (and they) have yet to learn correctly. The challenge of debugging Billy can be made more or less difficult by varying the number of "free parameters" that he must fix to make the simulation work.

A problem with many attempts to help students learn by programming is the "overhead problem": learning to program often gets in the way of learning important mathematical or scientific content. In the hands of teachers who know the relevant content knowledge and have the programming skills, programming projects involving real and computer agents have resulted in successful content-based learning [20]. However, the burden on teachers to monitor and structure students' learning experiences is very large [20]. The fact that our TA environments are focused on specific goals for content and skills makes the "overhead problem" much less severe. Students can program Billy by using graphs, timelines, and other "SmartTools" that they need to learn to understand the domain.
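As a minimal sketch of the graph interface just described (our reading of the AdventurePlayer account; the data structures and the random-label rule are illustrative assumptions), the student-built graph below is the agent's knowledge: a complete graph yields lawful behaviour, while a missing axis label makes the program guess a label of the same ontology, producing the idiosyncratic behaviour described above.

    # Sketch: a student-taught line graph as Billy's knowledge base.
    # Billy derives travel time from the graph; if an axis label is
    # missing he guesses a unit of the same ontology (e.g. minutes for
    # hours), which shows up as lawful-but-wrong simulation behaviour.
    import random

    class LineGraph:
        def __init__(self, x_label, y_label, slope):
            self.x_label = x_label   # e.g. "hours" (None if student omitted it)
            self.y_label = y_label   # e.g. "miles"
            self.slope = slope       # distance-over-time slope the student drew

    TIME_UNITS = {"hours": 1.0, "minutes": 1 / 60, "seconds": 1 / 3600}

    def travel_time_hours(graph, distance_miles):
        """Billy uses the taught graph to derive a time parameter."""
        label = graph.x_label
        if label is None:
            # Billy just guesses a unit of the same ontology himself.
            label = random.choice(list(TIME_UNITS))
            print(f"(Billy guessed the time axis is in {label})")
        # Billy reads "distance / slope" off the graph in the axis unit,
        # and the simulation converts that reading into hours.
        return (distance_miles / graph.slope) * TIME_UNITS[label]

    complete = LineGraph("hours", "miles", slope=25)
    unlabeled = LineGraph(None, "miles", slope=25)
    print(travel_time_hours(complete, 60))    # 2.4 hours: lawful behaviour
    print(travel_time_hours(unlabeled, 60))   # depends on Billy's guess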
Figure 2 shows that TAs have a dispositional component. This component determines the consistency of the behavior and learning of the agent in plausible ways; for example, the agent may jump to a quick, reasonable solution (a brief search in its knowledge space), or it may strive slowly for a precise solution (an exhaustive search). Students may see short cartoons of Billy solving various simple problems. One cartoon may show Billy responding very quickly and confidently. Another may show him checking his answers five times. They have to choose which cartoons contain the dispositions that they would like Billy to develop. In this case, probably neither, because when they put Billy into a mini-assessment, neither disposition will enable Billy to use his knowledge in an optimal manner. For the former disposition, he may answer fast, but randomly make stupid mistakes. For the latter disposition, he may always get the precise answer but take forever. One can imagine many other dispositions such as simply refusing to do the task, or asking for clarification on a faulty graph before entering the mini-assessment.

We want students to understand that dispositions have a large influence on success, regardless of one's knowledge state. Thus, in our computational representation of Billy we have separated his disposition "module" from his knowledge "module." We are currently exploring different ways to represent dispositional information and have it interact with knowledge to create performances.
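One plausible way to realise the separation of the two modules is sketched below (our illustration; the disposition parameters are assumptions, not the authors' representation). The same taught knowledge is deployed through a disposition that trades care against speed, so an impetuous agent answers quickly but slips, while a meticulous one is precise but slow.

    # Sketch: a disposition module modulating how a knowledge module is
    # deployed, so the same (correct) knowledge can still be used poorly
    # if the disposition is poor.
    import random

    class Disposition:
        def __init__(self, search_depth, error_rate):
            self.search_depth = search_depth  # how hard the agent searches
            self.error_rate = error_rate      # chance of a careless slip

    IMPETUOUS = Disposition(search_depth=1, error_rate=0.3)
    METICULOUS = Disposition(search_depth=50, error_rate=0.0)

    class Agent:
        def __init__(self, knowledge, disposition):
            self.knowledge = knowledge        # e.g. a taught function
            self.disposition = disposition

        def perform(self, problem):
            steps = self.disposition.search_depth  # cost of producing an answer
            answer = self.knowledge(problem)
            if random.random() < self.disposition.error_rate:
                answer = answer * 10               # a careless slip
            return answer, steps

    knowledge = lambda d: d / 25      # correct rate-time-distance knowledge
    fast_billy = Agent(knowledge, IMPETUOUS)
    slow_billy = Agent(knowledge, METICULOUS)
    print(fast_billy.perform(60))     # quick, sometimes wildly wrong
    print(slow_billy.perform(60))     # always 2.4, but costs 50 steps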
This represents something of a switch from the Platonic paradigm that dominates Artificial Intelligence. In that paradigm, correct knowledge leads to correct behavior. In our paradigm, correct knowledge may be deployed poorly if an agent has a poor disposition. Moreover, a poor disposition may make it difficult for the agent to learn. We believe that making disposition variables explicit in TA environments should have interesting consequences on students' ability to reflect on their own attitudes and those of their peers.

7. Conclusion

Teachable agents provide a method to facilitate learning in a motivating and natural way for students. The introduction of a virtual human agent captures students' attention and motivates them to help this virtual agent learn new knowledge to accomplish a goal. Our instructional approach emerges from what we know about classroom learning, including the value of an anchoring context for making learning meaningful, the need for multiple opportunities for assessing and revising one's knowledge, and the importance of promoting generalization. Students interact with teachable agents in the larger context of a meaningful challenge. Teachable agent simulations provide an excellent mechanism for students to apply what they know and receive feedback. Asking students to teach agents with SmartTools promotes generalization.

The focus of our activities with teachable agents relates more to how humans represent knowledge than to how computers represent knowledge. This perspective opens new doors for exploring various computer science techniques for representing, accessing and displaying knowledge to facilitate human learning. We offered several examples of teachable agents ranging from low interactivity to very high levels of interactivity and complex representations of agents and their knowledge bases. We are continuing to explore interesting challenges that capitalize on teaching agents and to explore new methods for virtual characters to interact with their human counterparts.

References

[1] Bransford, J. D., & Schwartz, D. L. (in press). Rethinking transfer: A simple proposal with educational implications. Review of Research in Education. Washington, DC.
[2] Palincsar, A. S., & Brown, A. L. (1984). Reciprocal teaching of comprehension-fostering and comprehension-monitoring activities. Cognition and Instruction, 1, 117-175.
[3] Chi, M. T. H., Bassok, M., Lewis, M., Reimann, P., & Glaser, R. (1989). Self-explanations: How students study and use examples in learning to solve problems. Cognitive Science, 13, 145-182.
[4] Fuchs, L. S., Fuchs, D., Karns, K., & Hamlett, C. L. (1996). The relation between students' ability and the quality and effectiveness of explanations. American Educational Research Journal, 33, 631-644.
[5] Webb, N. M. (1983). Predicting learning from student interaction: Defining the interaction variables. Educational Psychologist, 18, 33-41.
[6] Nickerson, R. S. (1983). Computer programming as a vehicle for teaching thinking skills. Thinking, The Journal of Philosophy for Children, 4, 42-48.
[7] Papert, S. (1980). Mindstorms. New York: Basic Books.
[8] Abelson, H., & diSessa, A. (1980). Turtle geometry: The computer as a medium for exploring mathematics. Cambridge, MA: MIT Press.
[9] Kafai, Y., & Resnick, M. (Eds.). (1996). Constructionism in practice. Mahwah, NJ: Lawrence Erlbaum Associates.
[10] Dillenbourg, P. (Ed.). (in press). Collaborative learning: Cognitive and computational approaches. New York: Elsevier.
[11] Harel, I., & Papert, S. (1991). Constructionism. Norwood, NJ: Ablex.
[12] Repenning, A., & Sumner, T. (1995). AgentSheets: A medium for creating domain-oriented visual languages. Computer, 28, 17-25.
[13] Ball, G., Ling, D., Kurlander, D., Miller, J., Pugh, D., Skelly, T., Stankosky, A., Theil, D., Van Dantzich, M., & Wax, T. (1997). Lifelike computer characters: The Persona project at Microsoft Research. In J. M. Bradshaw (Ed.), Software Agents (pp. 191-222). Menlo Park, CA: AAAI/MIT Press.
[14] Huffman, S. B., & Laird, J. E. (1995). Flexible instructable agents. Journal of Artificial Intelligence Research, 3, 271-324.
[15] Lieberman, H., & Maulsby, D. (1996). Instructible agents: Software that just keeps getting better. IBM Systems Journal, 35(3&4), 539-556.
[16] Cognition and Technology Group at Vanderbilt. (1997). The Jasper Project: Lessons in curriculum, instruction, assessment, and professional development. Mahwah, NJ: Lawrence Erlbaum Associates.
[17] Cognition and Technology Group at Vanderbilt. (1998). Designing environments to reveal, support, and expand our children's potentials. In S. A. Soraci & W. McIlvane (Eds.), Perspectives on fundamental processes in intellectual functioning, Vol. 1 (pp. 313-350). Greenwich, CT: Ablex.
[18] Sherwood, R., Petrosino, A., Lin, X. D., & the Cognition and Technology Group at Vanderbilt. (in press). Problem based macro contexts in science instruction: Design issues and applications. In B. J. Fraser & K. Tobin (Eds.), International handbook of science education. Dordrecht, Netherlands: Kluwer.
[19] Vye, N. J., Schwartz, D. L., Bransford, J. D., Barron, B. J., Zech, L., & Cognition and Technology Group at Vanderbilt. (1998). SMART environments that support monitoring, reflection, and revision. In D. Hacker, J. Dunlosky, & A. Graesser (Eds.), Metacognition in Educational Theory and Practice. Mahwah, NJ: Lawrence Erlbaum Associates.
[20] Littlefield, J., Delclos, V., Lever, S., Clayton, K., Bransford, J., & Franks, J. (1988). Learning Logo: Method of teaching, transfer of general skills, and attitudes toward school and computers. In R. E. Mayer (Ed.), Teaching and learning computer programming (pp. 111-135). Hillsdale, NJ: Lawrence Erlbaum Associates.
Meta-knowledge Representation for Learning Scenarios Engineering

Gilbert Paquette
LICEF Research Centre, Tele-universite, 1001 Sherbrooke St. East, Montreal
[email protected]
http://www.licef.teluq.uquebec.ca

Abstract. Meta-knowledge helps an individual to improve the ways he/she learns, thus facilitating transfer operations from a known application domain to new ones, and finally enabling him/her to learn more autonomously. The importance of meta-knowledge justifies its inclusion within a knowledge model that provides a structured representation of « learning objects », the content of a learning environment. We will outline briefly a knowledge representation technique that was built for the design of learning environments and extend it to display graphical descriptions of meta-knowledge objects and their relationships with domain-specific knowledge. We then use the MOT knowledge editor to represent skills and generic processes at the meta-knowledge level. We finally show how a skill applied to a central knowledge object in an application domain can lead to meaningful learning scenarios intended both for the acquisition/construction of specific knowledge and meta-knowledge.

Key words. Higher-order thinking skills, Metacognition, Knowledge and skill acquisition, Knowledge representation for instruction, Learning environments and microworlds, Principles and tools for instructional design.
1. Introduction

Although many studies involve meta-knowledge, the term is not always explicitly used. These studies can be found in various domains such as mathematical logic [1], scientific methodology [2], problem resolution and its teaching [3], educational technology [4, 5], software and cognitive engineering [6, 7], and artificial intelligence [8]. Jacques Pitrat has produced an important synthesis in which he distinguishes several meta-knowledge categories and proposes the following definition: « meta-knowledge is knowledge about knowledge, rather than knowledge from a specific domain such as mathematics, medicine or geology ». According to this definition, meta-knowledge is at the heart of the learning process, which consists in transforming information into knowledge:
- by attributing values to knowledge from other domains: truth, usefulness, importance, knowledge priority, competence of an individual towards a knowledge object, etc.;
- by describing « intellectual acts », processes that facilitate knowledge processing in other domains: memorisation, understanding, application, analysis, synthesis, evaluation, etc.;
- by representing strategies to acquire, process and use knowledge from other domains: memorisation techniques, heuristic principles for problem solving, project management strategies, etc.
Romiszowski [4] expresses very well the simultaneous phenomenon of knowledge acquisition in a particular domain, and the meta-knowledge building of skills: « The learner follows two kinds of objectives at the same time - learning specific new knowledge and learning to better analyze what he already knows, to restructure knowledge, to validate new ideas and formulate new knowledge », an idea expressed in another way by Pitrat: « meta-knowledge is being created at the same time as knowledge ». In other words, meta-knowledge develops while it is applied on knowledge in a particular field. Anybody learning new knowledge uses meta-knowledge (at least minimally) without necessarily being aware of it. However, using meta-knowledge should really be a learner's conscious act. This is what metacognition [9] is about.

Meta-knowledge is knowledge that eventually leads an individual to improve the ways he learns, thus facilitating transfer operations from a known application domain to new ones, and finally enabling him to learn more autonomously. These objectives justify the inclusion of meta-knowledge within a knowledge model that represents a learning system's contents, providing a structured representation of « learning objects ».

The goal of this article is to show that learning system engineering concepts need to be described precisely at the meta-knowledge level. We hope to show that the MOT graphical representation language is a useful tool to describe such concepts, especially those involved in building learning scenarios from a skill's description as a generic process. In the next two sections, we will summarise the MOT knowledge representation technique and extend it to the meta-knowledge domain, thus providing a graphic way to represent meta-knowledge objects. In the last two sections, we will present an application to the design of learning environments.

2. Summary of the « MOT » representation system
A basic MOT model [10] is composed of six types of knowledge objects and six types of links. Knowledge is represented by geometric figures that identify its type, such as abstract knowledge (concepts, procedures, principles), as well as three types of corresponding facts (examples, traces, statements). Relations between these entities are represented by oriented links whose symbol (C, S, P, I/P, R and I) indicates the type of relation (see examples at the end of section 3).

- Concepts describe a domain's classes of objects (the « what » dimension), by their attributes and possible values.
- Procedures describe sets of operations that may apply to several objects (the « how » dimension).
- Principles are general statements intended to describe properties of objects or concepts, to establish cause-and-effect links between them (the « why » dimension), or properties of procedures, to determine their conditions (the « when ») and the resulting actions.

Facts are data, observations, examples, prototypes, actions or statements to describe a particular object:

- Examples result from specifying values for each concept's attribute, thus gathering a set of facts that describe a very concrete and precise object.
- Traces are obtained by specifying the variables of all the operations that compose a procedure, resulting in a set of particular and precise actions, an execution trace.
- Statements are obtained by specifying a principle's variables, thus resulting in a cause-and-effect link between facts about particular objects, or facts about a particular object (instantiated condition) and the resulting traces of the actions.

Relations let us establish links between knowledge objects.

- The instanciation (I) link relates abstract knowledge to a group of facts obtained by giving values to all the attributes (variables) that define a concept, a procedure or a principle, respectively examples, traces or statements.
- The composition link (C) connects a knowledge unit to one of its components or parts. Any object's attributes may be specified as a knowledge unit's components.
- The specialisation link (S) connects one abstract knowledge object to a second one that is more general than the origin of the S link.
- The precedence link (P) connects two procedures or principles, where the first must be terminated or evaluated before the second one can begin or be applied.
- The input-product link (I/P) connects a concept to a procedure, the concept being the input of the procedure, or a procedure to a concept which is the product of the procedure.
- The regulation link (R) is directed from a principle towards a concept, a procedure or another principle. In the first case, the principle defines the concept by specifying definition or integrity constraints, or it establishes a law or relation between two or several concepts. A regulation link from a principle to a procedure or to another principle means that the principle exerts external control on the execution of a procedure or the selection of other principles.

3. Meta-knowledge and Meta-facts
Every domain such as chemistry, sociology or law is constituted of knowledge objects. The domain that studies knowledge per se is particularly important for learning. Knowledge from this domain will be called meta-knowledge or generic knowledge¹. Since meta-knowledge is also knowledge (about knowledge), we will distinguish between three categories of abstract meta-knowledge and their corresponding meta-facts, as we do for knowledge.

Meta-concepts are knowledge attributes. They are concepts that define value systems to apply to knowledge from various domains.

- For instance, when one claims that some knowledge in physics or economics is a priority, he uses a concept of « priority » that does not belong to physics nor economics, but to the domain that studies knowledge. Instanciating such a meta-concept is like attributing an exact value to a specific knowledge place-holder. As a result, we get meta-examples² in various domains such as: « the concept of atom is essential », « the break-even point calculation procedure is useful ».

¹ In this sense, any meta-knowledge can be described as generic because it applies to several knowledge domains. Literature also includes terms such as « generic concept », « generic process », « generic task », « generic method ». We believe there are many advantages in regrouping these notions and in treating them in the same study domain, the domain studying knowledge.
² Values attributed to knowledge are called « meta » when reified in the application domain. In the domain that studies knowledge, meta-examples or meta-concepts are, of course, simply examples or concepts.
- Another meta-concept involves someone's competence with regard to particular knowledge objects. To keep it simple, we could state that competence can take only four values, depending on the knowledge level a person has reached: awareness, familiarity, mastery or expertise. Exemplifying such a meta-concept is selecting values for its attributes as applied to a specific knowledge object. As a result, we obtain meta-examples in various domains such as "J. B. is aware of Java programming", "H. L. masters Java programming" or "H. L. is familiar with financial analysis methods".
Meta-procedures are « operations on knowledge ». They are actions on knowledge or facts from various domains where they are applied.

- A classification procedure, defined as a set of operations to determine the smallest class of a taxonomy to which a particular object belongs, is an example of a meta-procedure. It is composed of operations intended to determine of what first-level class the object is an example, and to examine the second-level sub-classes, up to the smallest classes of the taxonomy. To instanciate such a meta-procedure consists in choosing the taxonomy and the object to which we want it applied, for example, a taxonomy of vertebrates and a bat, or a taxonomy of professions and a given individual. The result is a meta-trace of specific operations in the application domain: « the bat satisfies the definition of vertebrate, then the definition of mammal and then of cheiropter »; « the individual satisfies the definition of the members of a liberal profession, then that of law professionals and then of the notaries ».
- Another meta-procedure is the induction of laws according to observations of variable attributes of objects. It consists in generating an observation table by giving values to the variables; then the table is analysed so as to find invariant expressions; then, an hypothesis or mathematical expression that explains the data is formulated, and finally, the expression is validated with a set of new data. Instanciating this meta-procedure produces a meta-trace of specific operations in the application domain: "we create an observation table of the variables voltage (V), resistance (R) and current intensity (I), we verify that V/RI is invariant, we state the hypothesis that V=RI, and finally, we verify the hypothesis with this set of observations: V=60, R=2, I=30; V=120, R=4, I=30, etc." (An executable sketch of this meta-procedure is given just below.)
Meta-principles are generic statements that apply to various domains, aiming to regulate the use of other meta-knowledge objects or to establish relations between them. As action or relational principles, they can be described as « knowledge control principles » or « knowledge association principles ».

- One example is the following principle: "to solve a complex problem, one can first solve a particular case of it". Instanciating such a meta-principle is in fact choosing knowledge, here a problem, to which the meta-principle will be applied. Results are meta-statements like « to build a general procedure for compound interest calculation, first solve the problem using an interest rate of 10% », or « to diagnose a car breakdown, first diagnose the state of the electrical system ».
- Another example, « if a knowledge object is more general than another, the second is of the same type as the first », involves the meta-concepts « knowledge type » and « knowledge generality ». Instanciating such a meta-principle is the same as choosing the knowledge to which it is applied. Results are meta-statements such as: « the concept of a vertebrate is more general than the knowledge of a mammal, consequently the latter is also a concept », or « the law of perfect gases is more general than Gay-Lussac's; the first being a principle, the second is of the same type ».
Since meta-knowledge is knowledge, not only can meta-knowledge objects be instanciated using the instance (I) relation to link to meta-facts about particular domains, but they can also be associated with one another through composition (C), specialization (S), precedence (P), input-output (I/P) or regulation (R) links at the meta-knowledge level. Models grouping concepts, procedures, principles and their instances at the knowledge domain level can also be built and represented using the MOT graphical syntax and the MOT editor. For example:

- meta-taxonomies classifying knowledge objects into a « sort of » hierarchy of meta-concepts; for example, the MOT ontology presented in the preceding section;
- meta-processes describing generic tasks, their sub-tasks, their inputs and products, as well as the meta-principles that regulate their execution;
- meta-methods emphasising heuristic meta-principles ruling many processes, as in software or didactic engineering.
Figure 1 - A simulation meta-process applied (link AP) to the Internet domain.
The main knowledge unit on figure 1 is a procedure (oval shape). It defines the main purpose of the learning unit « Search for information on the Internet ». The main procedure is decomposed into sub-procedures (using C links). One of them is "Execute the request": it has a « Request » input concept (rectangle shape) and produces (I/P link) a list of « Interesting Web sites ». These are in turn used as inputs to another sub-procedure, "Identify interesting information", which precedes (P link) the final procedure: "Transfer information in a text editor". The "Refine the request" sub-procedure is regulated (R link) by principles helping a user to refine a request. Finally, an application (AP) link on the main procedure shows that the learner will have to exercise a generic skill: « Simulate a process ». Also, a meta-concept such as "Very important knowledge" is applied using an AP link to some of the knowledge objects to be learned in the Internet domain.

The AP link³ can also be applied to a meta-method for building methods. When instanciated to car driving, such a meta-method will make it possible to build a method in this application domain. Another meta-method for the evaluation of learning in a psycho-motor domain, when instanciated to this same domain, will make it possible to assess the learning of someone in car driving.

³ For an extensive discussion of this extension of the MOT representation system and the semantics assigned to the AP link in different situations, see [11].
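The walkthrough above translates directly into data structures. The sketch below (our encoding, not the MOT editor's internal format; the type and link names simply mirror the ontology of section 2) represents part of the figure 1 model as typed knowledge objects connected by typed links, including the AP link that applies the generic skill to the domain.

    # Sketch: part of the figure 1 MOT model as typed objects and links.
    # Types follow section 2 (concept, procedure, principle); link symbols
    # follow the MOT ontology (C, P, I/P, R) plus the AP extension.
    from dataclasses import dataclass, field

    @dataclass
    class Knowledge:
        name: str
        ktype: str                      # "concept" | "procedure" | "principle"
        links: list = field(default_factory=list)

        def link(self, symbol, target):
            self.links.append((symbol, target))

    search = Knowledge("Search for information on the Internet", "procedure")
    execute = Knowledge("Execute the request", "procedure")
    request = Knowledge("Request", "concept")
    sites = Knowledge("Interesting Web sites", "concept")
    refine = Knowledge("Refine the request", "procedure")
    rules = Knowledge("Request refinement principles", "principle")
    simulate = Knowledge("Simulate a process", "procedure")  # meta-level skill

    search.link("C", execute)      # composition: sub-procedure of the unit
    request.link("I/P", execute)   # input concept feeds the sub-procedure
    execute.link("I/P", sites)     # the sub-procedure produces a concept
    rules.link("R", refine)        # principles regulate a sub-procedure
    simulate.link("AP", search)    # generic skill applied to the domain

    for node in (search, request, execute, rules, simulate):
        for symbol, target in node.links:
            print(f"{node.name} --{symbol}--> {target.name}")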
4. Skills and Generic Tasks: Examples of Generic Processes
Generic processes or meta-processes are described by a main procedure in the domain studying knowledge, which is broken down into sub-procedures until procedures are terminal. For each procedure, there is also a description of input or product concepts that feed them or are generated by them, as well as principles that regulate the transfer of control between various generic procedures. Generic procedures are structured sets of generic actions that can be instanciated so they can be applied to several knowledge domains called application domains. Under various names and serving different goals, several taxonomies have been proposed in the cognitive sciences for meta or generic processes. Summarised here are an artificial intelligence taxonomy [8], a software engineering taxonomy [7] and two educational taxonomies [12, 4]. Although they do not exactly correspond, the following table distributes them into ten levels of generic process.
Table 1 - Comparison of four taxonomies of generic skills
[The table aligns ten levels of generic process (Pay attention; Integrate; Illustrate/Specify; Transpose/Translate; Apply; Analyze; Repair; Synthetise; Evaluate; Self-control) with J. Pitrat's active meta-knowledge categories, the KADS generic problems and inference primitives, Bloom's taxonomy of cognitive objectives, and A.J. Romiszowski's skills cycle.]
Based on this table, a four-level taxonomy of meta-processes has been built for our learning system engineering method MISA [11]. According to this taxonomy, "Simulate a process", a
skill used in the example on figure 2, is a specialisation of the "Apply" skill. Each of the skills in this taxonomy is described very precisely by its inputs and its products and by a detailed generic process showing how the inputs are transformed into specific products. The "Simulate a process" skill is compared below to the "Construct a process" skill, which is a specialisation of the "Synthetise" skill in the taxonomy.

Skill: Simulate a process
Input: A process, its procedures, inputs, products and control principles.
Product: A trace of the procedure: the set of facts obtained through the application of the procedures in a particular case.
Generic process: Choose input objects; select the first procedure to execute; execute it and produce a first result; select the next procedure and execute it; use the control principles to control the flow of execution.

Skill: Construct a process
Input: Definition constraints such as certain inputs, products and/or steps in the process.
Product: A description of the process: its inputs, products, sub-procedures with their input and output, and control principles.
Generic process: Give a name to the procedure to be constructed; relate it to specified input and product; decompose the procedure; continue to a point where well understood steps are attained.

Table 2 - Comparison of two generic skills
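Read as pseudocode, the generic process column of Table 2 already describes an interpreter. The following minimal sketch (our illustration, not part of the MISA method; the data structures and function names are assumptions) executes a process description in this spirit and assembles the trace that the "Simulate a process" skill is defined to produce.

    # Sketch: the "Simulate a process" generic skill as an interpreter.
    # Input: a process description (ordered procedures plus a control
    # principle). Product: a trace -- the facts obtained by applying the
    # procedures in a particular case.
    def simulate_process(procedures, control, inputs):
        state, trace = dict(inputs), []        # choose input objects
        for name, step in procedures:          # select each procedure in turn
            if not control(name, state):       # control principles steer flow
                continue
            state = step(state)                # execute it, produce a result
            trace.append((name, dict(state)))
        return trace                           # the assembled trace

    internet_search = [
        ("Build a first request", lambda s: {**s, "request": s["subject"]}),
        ("Execute the request", lambda s: {**s, "sites": ["site1", "site2"]}),
        ("Refine the request", lambda s: {**s, "request": s["request"] + "+"}),
        ("Transfer information", lambda s: {**s, "report": s["sites"][:1]}),
    ]
    always = lambda name, state: True          # trivial control principle
    for step, facts in simulate_process(internet_search, always,
                                        {"subject": "AIED"}):
        print(step, "->", facts)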
From these descriptions of the two generic skills, we can see that a pedagogical scenario on the same subject of "Information search on the Internet" but with a different skill objective such as "Construct a process" would be very different from the one based on the "Simulate a process" skill. In the first case, a kind of walk-through of the process is sufficient, while in the second case, we could need a project-based scenario where learners are engaged in a more complex problem-solving activity. The description of both processes in the previous table is however just a summary, insufficient to lead to a precise pedagogic scenario. The MOT graph in figure 2 provides a more precise definition of the "Simulate a process" skill.
Figure 2 - Graph of a generic process: "Simulate a process"
5. From generic task to learning scenario
We will now use the preceding generic process to build a learning scenario where learners will simulate the "Search information on the Internet" process. To do this, we lay out a graph corresponding to the generic process, but taking a "learning activity" viewpoint. As shown on figure 3, the graph is instantiated in such a way that the vocabulary of the specific application domain (the Internet) is used. It is also formulated in an "assignment style" displaying seven activities. Note that globally, based on the generic process input and product, the learning scenario starts from a description of the process to simulate and ends with the production of a trace report.
Figure 3 - A learning scenario: simulate the "Search the Internet" process
The following table shows the relation between the learning activities in the scenario and their correspondence in the generic process description. Note also the loop between activities 4 and 5 that can only end when the learner is satisfied according to certain quality principles that would have to be stated.

Activity in the scenario -> Correspondence in the generic process
Activity 1: Choose a subject -> Inputs to the overall process, defining the case to be simulated
Activity 2: Open a browser and a search engine -> Next applicable procedure to execute
Activity 3: Build a first search request -> Next applicable procedure to execute
Activity 4: Execute a search request -> Execute the chosen procedure
Activity 5: Refine the first request -> Next applicable procedure
Activity 6: Transfer interesting information -> Next applicable procedure to execute
Activity 7: Report on your search process -> Assemble the trace

Table 3 - Correspondence between learning activities and tasks in a generic process
Of course the scenario is not yet complete. For example, we could add resources that help learners achieve their tasks, such as a tutorial on the structure of a request or a final report form. Also, we might specify some collaboration assignments and perhaps a description of the evaluation principles that will be used to assess the learner's work.
But the important thing here is that the generic process becomes the backbone of the learner's assignments. In that way, we make sure that the learner exercises the right skill, in this case "simulating a process", while working on the specific knowledge domain, thus building specific domain knowledge and meta-knowledge at the same time.

6. Conclusion
We hope that the graphic language and the few examples presented here have succeeded in demonstrating some of the complex interrelations between meta-knowledge and domain-specific knowledge. Using generic processes describing intellectual skills from the knowledge model as a direct basis for learning activities seems a promising way to facilitate building instructional scenarios based on integrated knowledge and meta-knowledge objects. The most stimulating aspect of this work concerns the opportunity given to research teams to create a relatively complete set of pieces in this huge puzzle of learning systems engineering, hence building more reusable and significant components of knowledge and pedagogical models. In a similar way to the KADS primitive libraries, we have started to build libraries of knowledge models and instructional models serving as templates for the design of learning systems.

References
[1] Thayse, A. Approche logique de l'intelligence artificielle. Dunod, Paris, 1988.
[2] Popper, K.R. The Logic of Scientific Discovery. Science Editions, 1961.
[3] Polya, G. How to Solve It. Princeton University Press, 1957.
[4] Romiszowski, A.J. Designing Instructional Systems. Kogan Page, London / Nichols Publishing, New York, 415 pages, 1981.
[5] Merrill, D. Principles of Instructional Design. Educational Technology Publications, Englewood Cliffs, New Jersey, 465 pages, 1994.
[6] Chandrasekaran, B. Towards a Functional Architecture for Intelligence Based on Generic Information Processing Tasks. Proceedings IJCAI-87, pp. 1183-1192, Milan, Italy, 1987.
[7] Schreiber, G., Wielinga, B., Breuker, J. KADS - A Principled Approach to Knowledge-Based System Development. Academic Press, San Diego, 457 pages, 1993.
[8] Pitrat, J. Métaconnaissance, avenir de l'Intelligence Artificielle. Hermès, Paris, 1991.
[9] Noël, B. La métacognition. De Boeck-Wesmael, Bruxelles, 229 pages, 1991.
[10] Paquette, G. La modélisation par objets typés: une méthode de représentation pour les systèmes d'apprentissage et d'aide à la tâche. Sciences et Techniques Éducatives, France, avril 1996.
[11] Paquette, G. Metaknowledge representation: application to learning systems engineering. TL-NCE technical reports, Vancouver, Canada, 1998.
[12] Bloom, B.S. Taxonomy of Educational Objectives: The Classification of Educational Goals. D. McKay, New York, 1975.
A Multi-Agent Design of a Peer-Help Environment Julita Vassileva, Jim Greer, Gord McCalla, Ralph Deters, Diego Zapata, Chhaya Mudgal, Shawn Grant ARIES Lab, Department of Computer Science, University of Saskatchewan, Canada
[email protected]

Abstract: This paper presents a multi-agent approach to the design of adaptive distributed collaborative and peer-help environments which addresses a number of challenges: locating appropriate human and electronic resources depending on the help request, motivating users to help each other, and keeping the system easily extensible "in depth" and "in breadth". We explore two novel approaches in user modelling: modelling user social characteristics and modelling the relationships between users to support user collaboration and peer help. We show how adaptive behaviour of a heterogeneous distributed system can arise as a result of negotiation among goal-oriented autonomous cognitive agents. We introduce an economic model in order to motivate users to collaborate and provide peer help while protecting helpers from getting overloaded with requests. Finally, we propose using multi-agent simulation tools to test the appropriateness of various economic models for a learning environment.
1. Introduction

I-Help is an integration of previously developed ARIES Lab tools for peer help in university teaching. One of its components, CPR, provides a subject-oriented discussion forum and moderated FAQ-list supporting students with electronic help. Another component, PHelpS, selects an appropriate peer helper who can support the student with direct peer help via a synchronous chat environment. The selection of an appropriate discussion forum, FAQ article or human peer helper is based on modelling learner knowledge in the context of the concept / topic structure of the subject material. See [3] for a more detailed description of the I-Help project. This paper presents our ongoing research on distributing the centralized monolithic architecture of I-Help by using a multi-agent architecture [9]. There are a number of reasons for adopting a multi-agent approach to developing AI-based educational systems. The rapid development of new technologies like telecommunications, networking, and mobility leads to new types of working environments where the borderline between working and learning disappears; just-in-time learning evolves into lifelong learning, and "hybrid societies" emerge, consisting of real persons and electronic agents. An important factor in this type of environment is that everyone and everything is connected, so possibilities emerge for peer help, collaboration, and sharing resources (computational, applications, human advice, etc.). Four major consequences follow. First, environments are needed that reduce complexity for users and allow them to concentrate on their primary goals or tasks, as well as support their learning and collaboration. Second, it becomes important to provide a source of motivation for benevolence and collaboration among users, along with protection mechanisms ensuring security of communications, privacy of personal data, and equal chances for all to participate. Third, hard computational and software engineering challenges arise in large-scale distributed learning environments, which can be overcome by an agent-based approach. Finally, the field of multi-agent systems opens a number of extremely interesting and potentially useful research avenues concerning inter-agent negotiation, persuasion and competition in agent societies. Modern society offers a rich reservoir of paradigms for modelling multi-agent systems, e.g. social roles and cultural values, norms and conventions, social movements and institutions, power and dominance distribution. We can learn from the adaptability, robustness, scalability and reflexivity of social systems and use their building blocks to come up with more powerful multi-agent technologies. I-Help provides an excellent environment for studying these issues. A distributed, agent-based architecture naturally reflects the distributed web-based environment of I-Help. If one views "live" sessions with peer helpers and electronic peer help (discussion forum postings, on-line materials)
as help resources provided by some agents (human in the first case, software in the latter), then there is virtually no difference between humans and software agents. One can draw further conclusions: it is no longer advantageous to have a monolithic ITS, which is like an almighty teacher knowing the answer to any question that may arise in the learner. Networking provides the possibility to find some narrow-focussed learning resource suitable for the domain of the question. This resource might be some courseware or a peer helper. In this way a human enters the teaching loop to compensate for the limitations of educational software. It is no longer necessary to have a powerful diagnostic component in order to perform individualised teaching. A human can "enter this loop" too and help to diagnose the reason for a learner's misunderstanding. This diagnosis can later be used by an ITS or by another human, taking the role of a helper, to provide the learner with advice. In this way a multi-agent architecture provides for a natural synergy between humans and software agents (ITSs, diagnostic components, pedagogical expert systems, on-line help systems, web-based pools of on-line materials, etc.).
Figure 1: Agent-based architecture
Figure 2: A multi-level structure of an agent society.

2. Multi-level multi-agent architecture of I-Help
The adaptation within the multi-agent I-Help system is based on models of human users and models of involved software applications. These are maintained by two classes of agents (see Figure 1): personal agents (of human users) and application agents (of software applications). These agents use a shared taxonomy (ontology) and a communication language. Each agent manages specific resources of the user / application it represents, for example the knowledge resources of the user on certain tasks, topics or concepts or web-based materials belonging to an application. The agents use their resources to achieve the goals of their users, their own goals, and goals of other agents. Thus all the agents are autonomous and goal-driven. The agents are cognitive - they can plan the achievement of goals by means of decomposing them into sub-goals and relating them to resources. In their goal pursuit the agents can also use resources borrowed from other agents, i.e. they are collaborative. For this they have to negotiate and become involved in persuasion and conflict resolution. Each agent possesses a model of its inter-agent relationships, some of which reflect relationships between human users. Finally, the agents are mobile i.e. they can travel from one computer to another, thus optimising resources and bandwidth. In this way, we achieve a complex (multi-user, multi-application) adaptive (self-organised) system that supports users in locating and using resources (other users, applications, and information) to achieve their goals. A relatively detailed description of the multi-agent architecture we are using is presented in [9,10]. The multi-agent architecture of I-Help can be viewed as a society of autonomous intelligent agents. It involves various levels of organization, including intra- and inter-agent organization. A schematic representation of the multi-level organization of the I-Help architecture (and the agent society structure) is shown in Figure 2.
2.1. Basic Agent Infrastructure

The environment in which all personal and application agents "live" guarantees a safe existence and a fair chance of getting computational resources for all agents. It provides resource management, and it detects and limits or eliminates faulty and malicious agents, i.e. agents that consume too many resources. It also takes care of appropriate agent migration for the purpose of optimal use of resources. The agent environment consists of several self-explanatory modules: monitor, resource manager, communication module, transporter (for agent migration) and magistrate (to make decisions in case of resource conflicts). The agents are implemented as a set of tasks, which are mapped onto Java threads. Each agent consists of two or more tasks: a Kernel task containing information about the state of the agent, a Message task that ensures that the agent can communicate with other agents, and one or more Working tasks that allow the agent to pursue specific goals. The anatomy of an agent at Level 1 is shown in Figure 3. An agent is an instantiation of these classes of threads. There can be multiple instantiations of such threads within one machine; therefore multiple agents can co-exist on one machine. Considering a pool of networked machines, a society of agents is virtually independent of the physical location of the threads of the individual agents. There can be an agent with a Message thread running on one machine and Working threads running on other machines.

Figure 3: Anatomy of an agent on Level 1 (each bubble is a thread).
Figure 4: Different abstract views over the threads.
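The task-to-thread mapping can be rendered roughly as follows; this is a minimal sketch under our own naming assumptions, not the actual I-Help code.

```java
// A minimal sketch of the Level 1 agent anatomy described above: a Kernel
// task holding the agent's state, a Message task receiving message objects,
// and Working tasks pursuing specific goals, each mapped onto a Java thread.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class Agent {
    private final BlockingQueue<Object> inbox = new LinkedBlockingQueue<>();
    private volatile boolean alive = true;   // part of the Kernel state

    private final Thread kernelTask = new Thread(() -> {
        while (alive) {
            // periodically save the agent's last state, supervise other tasks
            try { Thread.sleep(1000); } catch (InterruptedException e) { return; }
        }
    });

    private final Thread messageTask = new Thread(() -> {
        while (alive) {
            try {
                Object message = inbox.take();            // receive a message object
                System.out.println("dispatching " + message);
            } catch (InterruptedException e) { return; }
        }
    });

    public void start()               { kernelTask.start(); messageTask.start(); }
    public void deliver(Object m)     { inbox.add(m); }
    public void pursue(Runnable goal) { new Thread(goal).start(); }  // a Working task
}
```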
Agents can be "put to sleep" at moments when they have no active Message or Working threads. In this case the Kernel thread is the only one that remains active, after storing information about the agent's last state in a database. In order to optimize resources, we have to provide a mechanism that allows an agent that has free Working or Message threads to offer them to other agents to use. This can be achieved by introducing an abstract "task level" between the agents and the actual processes running on the Java machine (see Figure 4). The upper two levels in Figure 4 are abstractions; only the lowest level exists in software, consisting of Java objects and threads. However, these objects and threads can be viewed as belonging to one task, or to one agent, or to several cooperating / collaborating agents at the same time, since collaborating agents can be viewed as one agent (sharing Working threads, Message threads and the corresponding tasks). Agents communicate by sending messages (objects) to each other. Communication among agents is implemented by a set of communication primitives, which is a subset of KQML (extended with specific primitives). Upon receiving a speech act primitive from another agent, an agent has to analyze it and respond to it depending on how the speech act corresponds to the agent's goals and negotiation plan (strategy). Two different communication forms have been implemented:
• Peer communication: the agent uses an address book to create a channel to the desired agent. The channel takes care of sending the message to the target agent. The use of a channel enables migration of agents during communication, which may be forced by computational resource limitations. A channel offers a bi-directional communication line, which is useful for private communication between agents.
• Bus communication: this offers communication among several agents. An agent can create, join, leave, or send a message to a bus. Every message is broadcast to all members. A bus can itself be a member of a bus. It is useful for sending requests among agents, but it may lead to a flood of answers that the requesting agent will have to process. A minimal sketch of the bus form follows.
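The sketch below renders the bus form in Java under our own naming assumptions; the actual primitives are KQML-based and are not reproduced here.

```java
// The bus communication form, sketched in Java (names assumed): members join
// or leave, and every message sent to the bus is broadcast to all members;
// since a Bus is itself a BusMember, a bus can be a member of another bus.
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

interface BusMember {
    void receive(String message);
}

class Bus implements BusMember {
    private final List<BusMember> members = new CopyOnWriteArrayList<>();

    void join(BusMember m)  { members.add(m); }
    void leave(BusMember m) { members.remove(m); }

    @Override
    public void receive(String message) {
        for (BusMember m : members) m.receive(message);  // broadcast to all members
    }
}
```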
2.2. Autonomous Cognitive Agents

Users, learners, applications and learning environments are represented by autonomous, goal-based, social agents who communicate, co-operate, and compete in a multi-system and multi-user
distributed environment (see Figure 1). Each agent maintains three distinct models of the user / application that it represents:
• a model of the user's / application's goals (in the case of human users, the goals and the preferences of the user),
• a model of the resources available to the user / application, which in the case of a user can include his / her cognitive resources (i.e. there are several domain-specific models of the user's knowledge, experience, skills, etc.),
• a model of the relationships of the user or application (such as personal friends, acquaintances, agents which have proven to be useful or have offered help, and other agents to whom the user / application "owes a favour").
These models allow the agent to be autonomous: to reason about user goals, to plan with user resources, and to search for agents of other users or applications who have the same goal and/or possess resources the user needs and doesn't have. This level involves the knowledge representation (user and application models) and reasoning mechanisms that make the agent autonomous and cognitive. Each agent has a pool of goals of two types. Intrinsic goals are built into the agent, or pre-programmed, such as "maximize utility", "save state periodically", "search for help-resource", "offer service", "gain more resources", or "refresh a specific resource". Extrinsic goals are adopted from other agents, be it the user of the personal agent, other application agents, or personal agents, like "find me a helper or knowledge resource on topic X", "ask your user to be a helper on topic Y", "provide me with access to your resource Z". Extrinsic goals are delegated explicitly to the agents, i.e. the agents don't infer the goals of other agents. The personal agents have to be told explicitly by the user on what topic help is needed. Inferring user goals would be an attractive application of classical diagnosis for user modelling; however, since we don't want to invest effort in developing a smart but narrowly focussed mechanism for a certain domain, we let all goals be stated explicitly. All goals of an agent co-exist in a goal space. The position of a goal in the goal space depends on the goal's importance. The importance of each goal is a number which is calculated / updated constantly by the reasoning mechanism of the agent. It depends on the inherent characteristics of the goal, such as whether it concerns the existence of the agent or the relationship with the user (if it is a goal of a personal agent), and on what type of goal it is. Extrinsic goals are usually less important, depending on the importance of the relationship with the agent from whom the goal is adopted. Goal importance is a highly dynamic characteristic: with the flow of time or changes in the environment, some goals rise in importance and some drop. The constant analysis of the importance of each goal is one of the central reasoning functions in an agent. This analysis is both plan-based and reactive, i.e. it happens periodically, but it can also be triggered by opportunities, when certain changes in the environment or certain events occur. For example, if a help request arrives (a new goal enters the goal space), the agent has to evaluate the relative importance of the new goal with respect to the other goals, so that it can decide whether to pursue it or not. The evaluation of a goal's importance follows a set of rules that involve the parameters of a goal.
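One plausible shape for such an evaluation, sketched in Java under our own weighting assumptions (the published system uses a rule set, not this formula):

```java
// A sketch of goal-importance evaluation consistent with the description
// above: intrinsic goals carry a built-in base importance, while extrinsic
// (adopted) goals are discounted by the importance of the relationship with
// the agent they were adopted from. The weighting scheme is illustrative.
class Goal {
    enum Kind { INTRINSIC, EXTRINSIC }

    final Kind kind;
    final double baseImportance;      // inherent characteristics of the goal
    final double relationshipWeight;  // in [0,1]; meaningful for extrinsic goals
    double importance;                // recomputed constantly by the reasoner

    Goal(Kind kind, double baseImportance, double relationshipWeight) {
        this.kind = kind;
        this.baseImportance = baseImportance;
        this.relationshipWeight = relationshipWeight;
    }

    /** Called periodically, and opportunistically, e.g. when a help request arrives. */
    void updateImportance(double urgency) {
        double discount = (kind == Kind.EXTRINSIC) ? relationshipWeight : 1.0;
        importance = discount * baseImportance * urgency;
    }
}
```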
We are experimenting with a simple PROLOG rule-based reasoner as the kernel of the goal-evaluation thread of the agent. One of the important intrinsic goals of an agent is the goal to maximize utility. Utility is defined as a function of the number of goals achieved by the agent, their importance, and the amount and cost of the resources expended in the pursuit of the goals. This definition of utility is different from the usual definition for a rational agent, whose utility is usually measured by maximizing a certain resource (e.g. money). We can model the behaviour of such a classical rational agent by defining as an important goal for the agent the acquisition of a certain resource (money). However, the above definition gives us huge flexibility in defining the agent's major drive (motivation). The agent can be motivated, for example, to maximize the number of its relationships, or to maximize some other resource, say, general user knowledge. The major agent motivation can be set to maximize the number of user goals that the agent helps achieve (an altruistic, helpful agent). In this way, by modifying the importance of different agent goals, we can achieve very different agent behaviour. The setting of goal importance should probably be left to the user; in this way the "character" of his/her agent can be "tuned" to be altruistic, greedy, sociable, helpful, hostile, etc. The state of resources of each agent reflects the resources of its user or of the application represented by the agent. The agent manages the resources of the user / application just as a broker manages the money of an investor. The resources of users are represented in user models and the resources of software applications in application models. Both types of models are represented in a database and are created by special applications (diagnostic applications) whose goal is to enter or infer the state of resources from users' input or behaviour. The resources can be characterized along several dimensions, for example, whether they are perishable or not, whether they are rechargeable or not, and whether they are lendable or not. For example, time is perishable, not
rechargeable and not lendable; knowledge is (hopefully) not perishable, rechargeable and lendable (through teaching or peer help). Knowledge resources are organized according to a taxonomy of knowledge topics / concepts / skills, depending on the domain. Each agent manages the set of knowledge resources of the user or application it represents. We say that an application possesses a certain knowledge resource (say, on topic X) if the application provides an electronic source of information about this topic (a web page, an article in a FAQ, a discussion thread on this topic), or if the application is supposed to teach the topic. A user possesses a certain knowledge resource if the user model contains a certain knowledge level on this topic. In the context of I-Help, the key to finding appropriate peer helpers is modelling the users' ability, willingness and readiness to help. Readiness is a function of the inter-agent negotiation, as will be explained in the next section. Ability is a model of the user's cognitive resources, represented as an overlay over the concept (topic, skill, or task) structure. Willingness is a model of the user's social behaviour, including a list of resources characterizing the user's eagerness, helpfulness, and ranking in the group (class), i.e. a user "social model". The initialization and updating of the user model is done by a number of independent diagnostic applications. Each diagnostic application is represented by its own application agent, which is activated at certain times or is called by the personal agent at certain intervals to update the state of resources of the user. A diagnostic application updating the model of the user's knowledge can be a diagnostic component of an ITS in a specific domain. In our case we have two different diagnostic applications: one is an on-line self-evaluation questionnaire for students to evaluate their own knowledge on the topics of the course, and the other is a set of on-line quizzes on the concepts, which students have to complete as they move through the course curriculum. The diagnostic application may request help from a knowledgeable human "diagnoser" or "cognologist" to carry out the diagnosis. In this way a human can enter the diagnosis loop. The parameters in the social model are updated by several diagnostic applications. Diagnosis of the user's eagerness is based on monitoring the user's on-line activity (for example, the votes on this person's postings in the discussion forum, the questions and answers the person posts in the discussion forum, observation of the threads visited during discussion forum browsing, or browsing in the web-based course materials). The user's class ranking is evaluated by taking into account the user's marks on assignments. The corresponding diagnostic application provides an interface for the teacher to input students' grades from a spreadsheet. Finally, the calculation of the user's helpfulness parameter is based on the number of times the person has given help and the feedback on the quality of peer help received from the helpee. A diagnostic application monitors these peer-help activities and the evaluation questionnaires after each session and enters values for certain parameters in the user's social model. Time is another important user resource. The amount of time available to the user can be assigned directly by the user or inferred using a stereotype value; for example, if the user is on-line, the user is available. Currency is a universal exchange resource for the society of agents.
It may be given a real value by exchanging it for goods (marks, candies or lottery chances) in the real world. The user model employed by the agent when acting on behalf of its user or application also contains information about the user's / application's relationships with other users / applications. This representation allows an agent to know about certain interpersonal relationships existing among users and certain dependencies existing among applications (for example, between a diagnostic system and a teaching system for a certain skill). The relationships are represented along several dimensions: symmetry (dominant - peer), sign (positive - negative), and type (between humans, between agents, between applications). These dimensions play a role when selecting a partner for negotiation and in the negotiation itself. It is very important to model interpersonal relationships qualitatively, since they play a major role in human motivation to collaborate. An agent needs at least four inference mechanisms:
• monitoring the environment (i.e. the communication channel) for newly arriving goals and for opportunities (unexpected resource availability; for example, the student has free time, which is an opportunity for the agent to pursue the goal of increasing its currency resource by having the student give help) to achieve persistent, long-term goals that are "sleeping" at the moment, and estimating goal importance periodically and opportunistically;
• calculating the utility function and deciding how to maximize it by selecting a set of active goals to pursue at every moment;
• retrieving plans for goal achievement (eventually a plan generator may be integrated);
• coordinating the execution of the plan by deploying the methods (Java threads) and resources required by the goal-method (task) definitions.
An open question is whether all of these inference mechanisms have to be "on board" the agent. Currently, since the mechanisms are fairly simple, they are incorporated in every agent. However, since they are the same for all agents, the virtual tasks running on the machine can be reused by other agents.
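The utility definition above can be made concrete with a toy calculation; the linear form below is our own assumption, chosen only to show the shape of the trade-off between achieved goals and expended resources.

```java
// A toy rendering of the utility definition: utility grows with the number
// and importance of the goals achieved and shrinks with the cost of the
// resources expended in pursuing them. The linear form is an assumption.
import java.util.List;

class UtilityCalculator {
    record AchievedGoal(double importance, double resourceCost) {}

    static double utility(List<AchievedGoal> achieved) {
        double u = 0.0;
        for (AchievedGoal g : achieved) {
            u += g.importance() - g.resourceCost();  // reward minus expenditure
        }
        return u;  // the agent selects its active goals so as to maximize this value
    }
}
```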
J. Vassileva et al. / Multi-Agent Design of a Peer-Help Environment
43
2.3. Communicative Agents

In pursuing their goals, the agents require resources that they may or may not have. In the latter case, the agents can try to get access to the resources of another agent, if this agent is willing to lend them. For example, an agent with the goal of finding help for its user on topic X will try to find agents of other users who know about this topic, or of applications that possess on-line resources on this topic. Agents can also delegate the execution of a goal to another agent that is better equipped with resources and methods for achieving the goal; for example, it can contact the agent of an application that teaches this topic. In this case the second agent may adopt the goal of the first agent, thus offering a service to the first agent. That means that agents have to offer resources, services (contracted goal adoption) or relationships (e.g. an increase in relationship importance, or a change in the relationship sign or type) in exchange for the resources / services they receive from other agents. The agents engage in negotiation until they agree about the price (in terms of a specific resource, service or relationship change). A negotiation protocol between agents has been introduced, including a set of speech primitives (KQML performatives). Negotiation is a way for the agents to engage in collaboration [7]. There hasn't been as much research on negotiation in multi-agent societies of deliberative and planning agents as in the field of sub-cognitive agents [2]. Most studies of simulated collaboration involve agents that are purely reactive [5, 8].
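To illustrate the flavour of such a price negotiation, here is a deliberately simple sketch; the fixed-step concession strategy is invented for this illustration, and the actual KQML-based protocol is richer.

```java
// An illustrative price negotiation between a helper's agent and a
// requester's agent, loosely in the spirit of the performatives mentioned
// above ("accept" / "reject" are comments, not real KQML primitives here).
class PriceNegotiation {
    /** Returns the agreed price, or -1 if the negotiation fails. */
    static double negotiate(double ask, double offer, double step, int maxRounds) {
        for (int round = 0; round < maxRounds; round++) {
            if (offer >= ask) return offer;  // "accept": agreement on the price
            ask   -= step;                   // helper agent concedes
            offer += step;                   // requesting agent raises its bid
        }
        return -1;                           // "reject": no agreement reached
    }
}
```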
2.4. Economic Model of the Agent Society

The location of knowledgeable helpers and appropriate on-line resources or applications could also have been achieved without a multi-agent architecture. In fact, the first version of I-Help was a centralized component. However, the evaluation of the system showed that it is hard to motivate good students to serve as helpers for long after the initial enthusiasm about the new system has worn off. In every learner community there are students who ask for help often and students who almost never ask for help. The latter are exactly the students who would make good peer helpers. It is obvious that there should be some reward for these students in order to encourage them to participate, i.e. to "pay them" with something that they consider useful. The need for some sort of economic model is evident. However, this raises many questions. Should the economy be based on currency or on barter (exchange of goods), as proposed in [1]? If based on currency exchange, what should be the real-world equivalent of the fictitious currency: marks or goods? Should the economy be based on the zero-sum assumption (when one gains, another loses) or on accumulating some resource, such as the global knowledge resource of all participants in the system? The role of the economic model is to provide a co-ordination mechanism for the agent society and a source of motivation (reward) for the participating agents to offer their resources to other agents who may need them, and thus to cooperate with each other. Without the notion of "currency" as a resource which agents can accumulate and exchange, and an intrinsic goal for the agents to maximize the amount of their currency, it would be hard to create a utility function that encourages agents with excess resources to offer them to other agents (i.e. good students to spend their time helping weaker students). By varying the price with respect to the quality of the resource (helper ranking), the agent economy will regulate the number of requests to the best helpers and will provide motivation for them to participate in the system. By introducing currency without economic control mechanisms (e.g. taxation), we can expect chaotic emergent behaviour from the system of negotiating agents. We need to ensure stable prices and trust in the currency, and to ensure that the society does not become polarized into a few very "rich" agents (which the agents of good students are likely to become) and a lot of very "poor" agents of students who need help but can't afford to buy it.
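One way to picture the price regulation just described is a quoting function that rises with helper quality and load and is adjusted by the interpersonal relationship (see the discussion of friend discounts in section 4). The coefficients below are invented for illustration; the system implements pricing as functions of a number of parameters, not this exact formula.

```java
// A hypothetical pricing function in the spirit of the economic model: the
// quoted price rises with the helper's ranking and current load, and is
// discounted for friends (or marked up for "Bozos"). All coefficients are
// assumptions made for this sketch.
class HelpPricing {
    /** relationshipSign is +1 for a friend, 0 for a stranger, -1 for a "Bozo". */
    static double quote(double basePrice, double helperRanking,
                        int pendingRequests, int relationshipSign) {
        double quality  = 1.0 + helperRanking;           // better helpers cost more
        double load     = 1.0 + 0.1 * pendingRequests;   // dampens demand on busy helpers
        double relation = 1.0 - 0.2 * relationshipSign;  // friend discount / Bozo surcharge
        return basePrice * quality * load * relation;
    }
}
```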
2.5. Control

The purpose of the control level is to ensure fair chances for all agents to pursue their goals, and to protect the society from criminal, misbehaving agents. It involves diagnosing potential troublemakers and punishing them, either by isolating them (damaging their reputation, so that other agents avoid having deals with them) or by taking away their computational resources (putting them in jail). Measures of soft control involve the ability of agents to gossip, i.e. the agents themselves communicate with each other regarding how their help transactions went, and the introduction of a "Review Agent" that collates and organizes preference data based on global agent experiences. Measures taken under a hard social control scheme would include the banning of unethical agents, monetary deductions, reprimands through e-mail, etc. One example of a hard social control measure is a "Refund Agent" which monitors transactions and, if some sort of obvious fraud is detected, revokes the transaction and possibly imposes harsher measures on the agent who did not live up to its contractual obligations. Another example of a hard control measure would be to
introduce an "Immigration/Police Agent". This agent would monitor the environment, looking for agents who do not have permission to perform transaction or participate in other ways.
3. Current state of the project

The multi-agent approach allows our system to be developed simultaneously bottom-up and top-down. Currently we have implemented the basic agent infrastructure, and we are experimenting with agents with a very limited set of goals and resources (i.e. our user models are fairly simple). All resources (user models) are kept in a database, and the personal agents only have access to the user models of their own users. The database is updated by several independent diagnostic applications. One of them (the diagnostic application denoted with "Q" in Figure 1) provides a quiz on the concepts involved in the course. Another one ("M") monitors the user's activities (browsing, postings, voting), checks time-stamps and updates the user's eagerness. A third diagnostic application ("T") allows a teacher to enter marks on assignments, which are used to calculate the ranking of the student in the class (a part of the user model). A fourth diagnostic application ("S") allows the user to fill in a self-evaluation form to initialize the knowledge component of the model. The reasoning of the agents is currently very simple; an agent matches resources to goals in a 1:1 fashion, and decides which goals to pursue by trying to maximize its utility function. Certain parameters of the utility function can be defined by the user, thus defining the "character" of the personal agent (greedy, altruistic, friendly, generous, etc.). Currently the agents have no planning capabilities and no knowledge about goal decompositions. The agents' negotiation capabilities are also limited. The only "reasoning" happening at this level involves determining the price for a resource / service based on the state of resources of the agent, the ranking of the agent in the list of potential helpers, and the relationship with the user asking for help. Both of these are implemented as functions of a number of parameters. At the same time we are working on the upper levels of the multi-agent architecture, in a "top-down" fashion, by using a multi-agent simulation tool, SWARM [6], to predict the global behaviour of the system when multiple users and applications are negotiating and trading resources and services. In this simulation, the real users and applications are replaced with "random" functions producing certain input, and the simulation concentrates only on the multi-agent society and the economic activities in it. We want to predict the behaviour of the system under different distributions of agents with utility functions of various types. Will the economy function better with more "greedy" or with more "friendly" agents? Under what circumstances is it likely that "crooks" will appear (e.g. cliques of agents who exchange fake help with each other in order to maximize their currency resources)? How can we discourage such behaviour with economic measures, and if that is not possible, what types of control / enforcement measures are necessary? Based on SWARM, different kinds of control will be simulated (soft, based on the "reputation" of agents, and hard, based on forceful measures like taking away resources from agents). It would be interesting to see what influence different simulated negotiation strategies have on the flow of currency in the society of agents, and to correlate this with the knowledge gained in the system. Two student projects [4, 11] are underway to investigate these issues.
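The kind of question such simulations probe can be sketched with a toy loop (this is not the SWARM toolkit, whose API we do not reproduce; users, prices and the polarization measure are all stubbed for illustration):

```java
// A toy agent-society simulation loop: random "users" issue help requests,
// currency flows from helpees to helpers, and the spread of balances gives a
// crude measure of polarization. All numbers are invented for illustration.
import java.util.Arrays;
import java.util.Random;

class EconomySimulation {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        double[] currency = new double[100];   // one balance per personal agent
        Arrays.fill(currency, 10.0);
        for (int step = 0; step < 10_000; step++) {
            int helpee = rnd.nextInt(currency.length);
            int helper = rnd.nextInt(currency.length);
            double price = 1.0 + rnd.nextDouble();   // stub for a negotiated price
            if (helper != helpee && currency[helpee] >= price) {
                currency[helpee] -= price;   // helpee pays for the help session
                currency[helper] += price;   // helper earns currency
            }
        }
        System.out.printf("richest=%.1f poorest=%.1f%n",
                Arrays.stream(currency).max().getAsDouble(),
                Arrays.stream(currency).min().getAsDouble());
    }
}
```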
4. Discussion

I-Help is a user-adaptive distributed system deployed to support a collaborative community of learners in locating help resources (electronic resources and human peer helpers) in a university setting, within a large introductory computer science course. Adaptivity of the system is achieved through a multi-agent architecture with autonomous, cognitive, and collaborative agents. Motivation is an important issue in learning environments, and especially so in collaborative and peer-help environments. By introducing an economic model we address this issue, hoping that it will increase motivation for participation and collaboration among the users / learners. A multi-agent architecture provides a natural testbed for implementing different kinds of economic models and experimenting with them to test under which conditions such economic models turn out to be beneficial for the learning of all participants in the system. An important issue in systems for peer help is to protect good helpers from being overused. The multi-agent approach allows supply and demand to be regulated naturally by varying the price for help. The personal agent of the helper can decide, depending on the priorities of the user, what the price will be. This further depends on the load on the helper, the urgency of the request for the helpee, and the assumed quality of help provided (based on the helper's ranking in the list of helpers for the topic). Of course the user (helper) makes the final decision. We believe that it is very important to take into account the interpersonal relationships among users when creating a peer-help environment, and we suspect that this will turn out to be even more important in a collaboration environment. The interpersonal relationships (friends, "favours owed"
or "Bozo list") play a role in decisions to help or to refuse. In our multi-agent I-Help system, the personal relationships of every user are represented as a part of his/her user model and managed as a specific type of user resource by the personal agent. They are taken into account in calculating the price of service or resources that the personal agent offers to other agents, i.e. there is a specific discount for friends and an extra-charge for "Bozos". People who have been helped successfully (positive evaluation by the helpee) can be added to the helper's list of relationships as "friends" or "favours owed". At first glance it may seem that this would prevent people from collaborating, by for example, making people help only their friends and not help people they don't know (which makes the whole idea of I-Help senseless). However, the agents have other drives, such as maximizing their profit, (which makes them help unknown people who they can charge more), as well as maximizing the number of their relationships, which will pay back later since the agents of "friends" will charge less in case the user asks them for help). This makes modelling relationships a powerful motivation catalyst for cooperation and collaboration. To our best knowledge this is a new idea in user modelling which we believe will have impact on the design of collaboration environments. The multi-agent approach which we adopted in the design of a peer-help and collaborative learning environment I-Help offers software engineering advantages, like easy extendibility and inter-operability. We can add new on-line materials, new applications (diagnostic systems, teaching systems, discussion forums, collaboration environments), and add more users on line. Such a system is less vulnerable when certain applications don't work, since it is essentially distributed: every participant can be taken away without causing the overall system to crash. Another software advantage is that the overall system works independently of the intelligence of each individual agent. We start with fairly simple agents, with simple user models (limited resources) and limited reasoning about goals and ways to achieve them, with very limited negotiation strategies and gradually extend the agents capabilities by providing new applications offering "reasoning" services, "negotiation" services etc. Gradually, with the development of more sophisticated reasoning and negotiation mechanisms tuned to fit with functions controlling the simulation so that the system behaves in a desired way we hope to achieve a powerful distributed peer-help and collaboration learning environment. Acknowledgement: We would like to acknowledge the TeleLeaming NCE for financial support and to thank members of the ARIES Lab "Agents" group for their input.
References
1. Boyd, G. (1997) Providing Real Learning with Virtual Currency. Proceedings of the International Conference on Distance Education, Penn State University, June 1997.
2. Castelfranchi, C. and Conte, R. (1992) Emergent functionality among intelligent systems: Cooperation within and without minds. AI & Society, 6, 78-93.
3. Greer, J., McCalla, G., Cook, J., Collins, J., Kumar, V., Bishop, A. and Vassileva, J. (1998) The Intelligent HelpDesk: Supporting Peer Help in a University Course. Proceedings ITS'98, San Antonio, Texas.
4. Kostuik, K. (to appear) Simulating a decentralized economy in a multi-agent learning environment. CMPT 400 Honours Project, Dept. of Computer Science, University of Saskatchewan.
5. Mataric, M. (1992) Designing Emergent Behaviors: From Local Interactions to Collective Intelligence. In Simulation of Adaptive Behavior 2. MIT Press, Cambridge.
6. Minar, M., Burkhart, R., Langton, C. and Askenazy, M. (1996) The Swarm Simulation System: A Toolkit for Building Multi-agent Simulations. Santa Fe Institute.
7. Mudgal, C. (to appear) Negotiation and conflict resolution in multi-agent systems: a game-theoretic approach. MSc Thesis Proposal, Dept. of Computer Science, University of Saskatchewan.
8. Steels, L. (1990) Cooperation between distributed agents through self-organization. In Y. Demazeau and J.-P. Mueller (eds.) Decentralized AI. North-Holland, Elsevier.
9. Vassileva, J. (1998) Goal-Based Autonomous Social Agents Supporting Adaptation and Teaching in a Distributed Environment. Proceedings of ITS'98, San Antonio, Texas.
10. Vassileva, J., Deters, R., Greer, J., McCalla, G., Kumar, V. and Mudgal, C. (1998) A Multi-Agent Architecture for Peer-Help in a University Course. Proc. Workshop on Pedagogical Agents at ITS'98, San Antonio, Texas, 64-68.
11. Winter, M. (to appear) Simulated Social Control in a multi-agent based learning environment. CMPT 405 Project, Dept. of Computer Science, University of Saskatchewan.
A Methodology for Building Intelligent Educational Agents Harry N. Keeling
Department of Systems and Computer Science Howard University 2300 6th Street, NW, Washington, DC 20059 Phone: (806) 806-4830,
[email protected]

Abstract

The acquisition of domain knowledge has traditionally been recognized as a major impediment to the development of knowledge-based systems. In particular, this "knowledge acquisition bottleneck" has significantly slowed the development of intelligent educational software, where the domain experts are teachers with little or no experience with knowledge engineering. However, recently there has been a growing recognition of the potential benefits that can be derived from the application of advances in the AI-related fields of machine learning and knowledge acquisition. In this paper, we discuss a full life cycle methodology for building educational agents that can be used in classrooms. This innovative approach synergistically combines methods from machine learning and knowledge acquisition, concepts from intelligent tutoring research, and advances in skill assessment from educational research. The result is a demonstrated methodology for building educational agents that utilizes a multistrategy apprenticeship learning approach, a multi-dimensional evaluation phase and a customized software toolkit that facilitates its application. This methodology has been successfully applied to develop three educational agents that act as indirect communication channels between the educator and the student. Applied in the areas of American history and statistics, these agents generate test questions and provide tutoring through hints and explanations. One agent has been used to enhance the functionality of an existing educational software package, while the other two have been used in a stand-alone mode. This research demonstrates solutions to the problems involved in building intelligent educational software and prescribes a new approach that draws from the fields of artificial intelligence and educational research.
1 INTRODUCTION
This paper presents a methodology for building intelligent educational agents through the integration of machine learning and knowledge acquisition. Based on a multistrategy machine learning approach called Disciple [9], this methodology reduces the involvement of the knowledge engineer as well as the time required from the domain expert. A claim of this research is that this methodology can be naturally employed to (1) build agents that are capable of adding intelligent features to educational software and (2) build stand-alone agents with their own interfaces that are capable of assisting both teachers and students. The following pages discuss this methodology by first describing the roles of each participant in the effort and then outlining the agent development process.
2 OVERVIEW OF METHODOLOGY
An overview of the Disciple full life cycle agent building methodology is presented in Figure 1. This figure illustrates the stages of the agent as it is developed and the roles of the three "stakeholders" in the agent development process: the domain expert/teacher; the knowledge engineer/developer; and the user(s) of the agents.
Figure 1: Overview of the Disciple agent building methodology
Depending upon the complexity of the application domain, the expert may require the assistance of a separate developer (a software / knowledge engineer). The expert and the developer work closely to determine requirements and to build the agent. Broken lines in the figure indicate where the agent development process may require iterative development efforts, due to changes in the application domain or in the requirements for the agent after the agent has been developed. These iterative development efforts may include the refinement of the agent's knowledge or the further development of the agent's interfaces and/or problem-solving algorithms. To begin the agent building process, the agent developer, in cooperation with the domain expert, must customize the Disciple Learning Agent Shell [10]. This shell contains components for basic knowledge acquisition and learning, problem solving and knowledge base management. The customization of this shell requires the development of several domain-dependent software modules. These modules may include specific interfaces that operate on top of the shell's domain-independent graphical user interface. These domain-specific interfaces allow the domain expert to express his/her knowledge in a manner that is both natural and appropriate for the application domain. Further, these interfaces can be used by the agent to communicate with the expert in a similar manner.
The domain expert and the agent developer also decide on the nature and extent of a domain-specific problem solver, based upon the type and purpose of the agent to be developed. Subsequently, the agent developer continues the customization of the agent shell with the development of the domain-specific problem solver. The software / knowledge engineer may need to develop the agent's domain-specific problem-solving component by extending the existing problem-solving components of the Disciple Learning Agent Shell. Other domain-dependent software modules may include conversion routines used to import existing knowledge bases from other sources. The result of this effort is a customized Learning Agent Shell with learning capabilities and a generic problem-solving component, but without a knowledge base. While the learning agent is under development, it resides in the customized Disciple Learning Agent Shell, which is utilized by the expert during the knowledge elicitation, rule learning, and rule refinement processes. At this stage, the domain expert develops the agent's initial knowledge base and teaches it domain-specific tasks. If the agent is to be his or her assistant, the expert both teaches the agent and then uses it. If the agent is to be used by a non-expert user, the expert initially teaches the agent and then releases the trained agent to the user. The role of the domain expert is thus to train the agent once the customized agent shell has been built. The expert interacts directly with the agent to define an initial knowledge base, engaging in a knowledge elicitation dialog in which the agent guides the expert to provide whatever knowledge he or she can easily express. Figure 2 below presents the Concept Editor and Concept Browser windows along with two domain-specific interfaces. These interfaces are utilized to build the initial knowledge base.
Figure 2: Concept Editor and Concept Browser windows along with two domain specific interfaces
After this initial knowledge base has been developed, the expert teaches the agent how to perform domain-specific tasks using the Disciple approach. Disciple is an apprenticeship, multistrategy approach to developing intelligent agents [1]. Central to this approach is the creation of a man-machine dialog in which an expert teaches the agent how to perform domain-specific tasks. This agent training is performed in a way that resembles the manner in which a teacher would teach an apprentice, by giving the agent examples and explanations as well as by supervising and correcting its behavior [9]. The expert
teaches the agent how to solve certain types of problems by providing a concrete example, helping the agent to understand the solution, supervising the agent as it attempts to solve analogous problems, and correcting its mistakes. In this approach the agent learns its behavior from its teacher by integrating several machine learning and knowledge acquisition techniques and by taking advantage of their complementary strengths to compensate for their individual weaknesses [2, 8, 11]. Through only a few such natural interactions, the agent is guided to learn complex problem-solving rules and to extend and correct its knowledge base. This process is based on cooperation between the agent and the human expert, in which the agent helps the expert to express his/her knowledge using the agent's representation language, and the expert guides the learning actions of the agent. As a consequence, the Disciple approach significantly reduces the involvement of the knowledge engineer as well as the time required from the domain expert in the process of building an intelligent agent. At this point, the initial knowledge base of the agent, built mainly through knowledge elicitation techniques, is assumed to be incomplete and partially incorrect. It is expected that the agent will require maintenance during its lifetime; that is, the agent may need to be retrained and possibly redeveloped. The process of retraining and redevelopment does not differ from the agent's initial training and development. Additions may be required to the agent's knowledge in the form of new concepts, instances, and features. Also, the existing rule base may require refining, and/or new rules may need to be learned. These new customization and knowledge requirements may be detected by the expert during the initial learning sessions or later, by the user, after the agent has been put to use. Additional knowledge requirements may be revealed during the agent testing and evaluation phases. During the agent testing and evaluation phase, the agent developer tests the agent in much the same way that traditional software systems are tested. However, as with other knowledge-based systems, for a Disciple agent the expert and the users must also assess the adequacy of the agent's knowledge and evaluate agent performance. These evaluation activities focus on measuring the completeness and correctness of the agent's knowledge base. Several evaluation experiments are an integral part of this methodology. Two types of experiments are necessary to measure the predictive accuracy of the agent's knowledge base [6]: one with respect to the training expert and another with respect to an independent domain expert. The first measures the ability of the Disciple approach to acquire the expertise of the training expert, while the second seeks to determine how well the agent has acquired general knowledge in its problem domain. In other experimental studies, both the domain expert and the agent's potential users provide subjective survey-based evaluations. These metrics reveal specific performance aspects of the trained agent that can indicate the need for redevelopment and/or the need for additional training, as illustrated in Figure 1. After the agent's knowledge and performance are verified and validated, an agent capable of operating in a given domain is released for use. The next section describes the major phases of the Disciple agent building methodology in more detail.

3 PHASES OF AGENT DEVELOPMENT
The Disciple agent development methodology is an application of software engineering principles to the orderly transformation of the needs of teachers and students into a working educational agent. It also includes an optional phase for the integration of this agent into a target educational software package. Moreover, this methodology includes the verification, validation and maintenance of the agent until the end of its useful life. The methodology can be broadly characterized as an iterative analysis, design and development effort. Its phases are: (1) analyzing the problem domain, defining agent requirements and defining the top-level ontology of the agent's
knowledge base; (2) designing the domain-dependent modules, the agent's task structure and the problem solver; (3) customizing the Disciple shell, building the initial knowledge base and problem solver, teaching the agent how to generate tests, and developing the agent with a problem-solving engine and a graphical user interface; and finally (4) verifying, validating and maintaining the agent. These phases are graphically presented in Figure 3. The process of designing, building and integrating intelligent educational agents thus comprises four phases. The remainder of this section describes these phases within the context of the traditional development phases of software engineering:
Figure 3: Four phases of agent development life cycle
• Requirements Definition: In the first phase, the need for the agent is investigated and its main goal identified. Subsequently, the scope of the development effort is determined. The agent's goal is refined into objectives and its purpose is characterized in terms of the requirements that it must meet. Also, during this phase the role of the agent is determined and what the agent will do in this role is outlined. Outcomes of this phase are the customization requirements for the
Disciple Learning Agent Shell to support the agent's specialized knowledge elicitation and expert communication requirements. Also, the agent's knowledge-related needs are documented by the identification and modeling of the necessary knowledge elements from the agent's problem domain.
• Design: The second phase covers the design of the agent's problem-solving engine and of the domain-dependent modules and interfaces that will facilitate knowledge acquisition. The required customizations of the Disciple shell are designed. Also, the representations of the knowledge elements identified in the top-level ontology [4] are designed and organized into a high-level semantic network.
• Development: During the development phase, the Disciple Learning Agent Shell is customized with domain-dependent interfaces and a specialized problem-solving engine is built. The initial knowledge base of the agent is constructed and the agent is taught how to perform domain-specific tasks. This phase also includes the construction of the tested agent with a problem solver and knowledge base.
• Evaluation: During this phase, the software modules that comprise the agent are tested and the performance of the agent is evaluated. The results of this process can cause the development process to return to previous phases for reformulation of requirements, redesign, or further refinement of the agent's knowledge base. Also, a multi-dimensional process for the verification and validation of the agent is completed during this phase.
In keeping with software engineering principles, each phase in the agent-building methodology reveals new issues and requirements that may cause the project team to return to a previous phase. Experience has shown that it is very likely that each phase will need to be revisited, and that the products of each phase will evolve as new requirements are revealed during this iterative process.
4. EDUCATIONAL AGENTS FOR TEST GENERATION
This methodology has been successfully applied to develop three educational agents that act as indirect communication channels between the educator and the student. Applied in the areas of American history and statistics, these agents generate test questions and provide tutoring through hints and explanations. One agent has been used to enhance the functionality of an existing educational software package, while the other two have been used in stand-alone mode. In the following pages, these agents are illustrated and briefly discussed. The three assessment agents developed represent a new type of educational agent that serves as an asynchronous communication channel between one educator and an unlimited number of students, yet provides one-to-one communication [5]. It has been shown that an educator can teach the agent in much the same way that he/she might teach a human apprentice or student; subsequently, the agent can test and tutor the students the way it was taught by the educator. Since the educator instructs the agent as to what kind of tests to generate, and the agent then generates a wide variety of tests of that kind, the assessment agent provides the educator with a very flexible tool that lifts the burden of generating personalized tests for large classes: tests that do not repeat themselves and that take into account the instruction received by each student. For instance, the assessment agent in history can generate on the order of 100,000 relevancy tests, and the assessment and support agent in statistics can generate over 1,000,000 unique test questions. Moreover, the ability of such agents to generate not only personalized sets of test questions, but also hints for solving them as well as detailed explanations, allows students to be guided by the agent during their practice of higher-order thinking skills, as they would be guided directly by the educator.
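How a taught agent can generate so many distinct questions is easiest to see combinatorially: each kind of test the educator teaches is, in effect, a template that the agent instantiates over its knowledge base. The following Python fragment is only an illustrative sketch of this idea, using an invented three-source miniature knowledge base; it is not the Disciple representation.

    from itertools import product

    # Hypothetical miniature knowledge base: which historical sources an
    # educator has taught the agent to treat as relevant to which concepts.
    SOURCES = {
        "Emancipation Proclamation": {"slavery", "civil war"},
        "Gettysburg Address": {"civil war", "democracy"},
        "Declaration of Independence": {"independence", "democracy"},
    }
    CONCEPTS = {"slavery", "civil war", "democracy", "independence"}

    def relevancy_tests():
        # One question per (source, concept) pair: the test space grows
        # multiplicatively with the knowledge base, which is why even a
        # modest base yields a very large space of non-repeating questions.
        for source, concept in product(SOURCES, CONCEPTS):
            question = "Is '%s' relevant to the concept of %s?" % (source, concept)
            yield question, concept in SOURCES[source]

    for question, answer in relevancy_tests():
        print(question, "->", "yes" if answer else "no")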
These agents are representative of a class of agents that are built by an expert to be used by other users, such as history teachers and history students. One such agent is integrated with the MMTS (Multimedia and Thinking Skills) educational system [3], creating a system with expanded capabilities called IMMTS (Intelligent Multimedia and Thinking Skills). This agent is referred to as the integrated Assessment Agent. It illustrates an application of the Disciple agent-building methodology to developing educational agents that enhance the capabilities, generality and usefulness of non-knowledge-based software. Figure 4 presents the IMMTS agent.
Figure 4: IMMTS agent test question with feedback for a correct response
The second agent is a stand-alone agent that is used independently of the MMTS software. It illustrates the application of this methodology to the development of an intelligent educational tool that directly assists both teachers and students. This agent was experimentally evaluated by domain experts, teachers and students in an actual classroom. The results indicated that this methodology can be employed to build agents that are accurate and that provide significant benefit to their users. The third agent, called the Statistical Analysis Assessment and Support Agent, seeks to support the development of higher-order thinking skills, in particular statistical problem-solving skills. This agent focuses not only on factual information, but also on the analysis of that information. In its assessment mode, the Statistical Analysis Assessment and Support Agent seeks to measure the entire array of higher-order thinking skills required for solving a particular problem: the agent takes one problem and tests all the skills required for that problem, in turn, following closely the intuitive and natural problem-solving process. In this mode, the agent goes over the complete analysis of a data set: first, it tests whether the student has recognized the type of the data set, so as to determine the kind of questions that should be asked about it; second, it tests whether the student can formalize those questions as hypothesis tests or statistical measures; third, it tests whether the student can identify the techniques and tools necessary to complete the analysis successfully. It is envisioned that ultimately the agent will also be able to test whether the student actually performs the analysis correctly, thereby completing the problem-solving methodology.
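The staged assessment just described can be pictured as a fixed pipeline of skill checks applied to a single problem. The sketch below is a simplified reading of that sequence; the stage names and the tiny answer-checking model are invented for illustration and do not reflect the agent's actual knowledge base.

    # Each stage probes one higher-order skill for the same data set,
    # in the order described in the text above.
    STAGES = [
        ("data-type", "What type of data set is this?"),
        ("formalization", "Formalize the question as a hypothesis test or statistical measure."),
        ("technique", "Which techniques and tools complete the analysis?"),
    ]

    def assess(expected, student):
        # Walk one problem through every skill stage, in turn.
        results = []
        for skill, prompt in STAGES:
            correct = student.get(skill) == expected.get(skill)
            results.append((skill, prompt, correct))
        return results

    expected = {"data-type": "two paired samples",
                "formalization": "H0: mean difference = 0",
                "technique": "paired t-test"}
    student = {"data-type": "two paired samples",
               "formalization": "H0: mean difference = 0",
               "technique": "chi-square test"}
    for skill, prompt, correct in assess(expected, student):
        print(skill, "->", "correct" if correct else "needs tutoring")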
5. CONCLUSIONS
The development of each of these educational agents followed the methodology described above. This research demonstrates solutions to the problems involved in building intelligent educational software and prescribes a new approach that draws from the fields of artificial intelligence and educational research. All the agents developed in this research can be used as multipurpose assistants to both the teacher and the student: they can be used both to test the student and to provide the student with expertise and guidance. The explanations given to the agent by the educational expert are similar to the explanations the agent provides to the learners through their interaction with the test questions. The agent acts as a one-on-one teacher/tutor, using the same processes as the ones by which it was taught by the expert, and can replicate some part of the usual teacher/student or mentor/student interactions. It can therefore be concluded that the agents act as an indirect communication medium between the educator and the students. This illustrates a significant benefit to be derived from using the Disciple approach to building educational agents. Additional roles for educational agents built with this methodology will be explored in future research.
REFERENCES
[1] Bradshaw, J.M. (Ed.) (1997). Software Agents. Menlo Park, CA: AAAI Press.
[2] Buchanan, B.G. and Wilkins, D.C. (Eds.) (1993). Readings in Knowledge Acquisition and Learning: Automating the Construction and Improvement of Expert Systems. San Mateo, CA: Morgan Kaufmann.
[3] Fontana, L., Debe, C., White, C. and Gates, W. (1993). Multimedia: Gateway to Higher-Order Thinking Skills, a Work in Progress. In Proceedings of the National Convention of the Association for Educational Communications and Technology, 1993.
[4] Gruber, T.R. (1993). Toward principles for the design of ontologies used for knowledge sharing. In Guarino, N. and Poli, R. (Eds.), Formal Ontology in Conceptual Analysis and Knowledge Representation. Kluwer Academic.
[5] Hamburger, H. and Tecuci, G. (1998). Toward a Unification of Human-Computer Learning and Tutoring. In Proceedings of the 4th International Conference, ITS '98, San Antonio, Texas. Springer Verlag.
[6] Keeling, H. (1998). A Methodology for Building Verified and Validated Intelligent Educational Agents. Ph.D. Thesis, Learning Agent Lab, Department of Computer Science, School of Information Technology and Engineering, George Mason University.
[7] Kibler, D. and Langley, P. (1990). Machine Learning as an Experimental Science. In Readings in Machine Learning (pp. 38–43). San Mateo, CA: Morgan Kaufmann.
[8] Michalski, R.S. and Tecuci, G. (Eds.) (1994). Machine Learning: A Multistrategy Approach, Volume 4. San Mateo, CA: Morgan Kaufmann.
[9] Tecuci, G., with contributions from Dybala, T., Hieb, M., Keeling, H., Wright, K., Loustaunau, P., Hille, D. and Lee, S.W. (1998). Building Intelligent Agents: An Apprenticeship Multistrategy Learning Theory, Methodology, Tool and Case Studies. Academic Press.
[10] Tecuci, G. and Keeling, H. (1998). Teaching an Agent to Teach Students. In Proceedings of the 15th International Conference on Machine Learning, Madison, Wisconsin. Morgan Kaufmann.
[11] Tecuci, G. and Kodratoff, Y. (Eds.) (1995). Machine Learning and Knowledge Acquisition: Integrated Approaches. Academic Press.
Artificial Intelligence in Education, S.P. Lajoie and M. Vivet (Eds.), IOS Press, 1999
The Systemion: A New Agent Model to Design Intelligent Tutoring System
CANUT M. Francoise, Universite Paul Sabatier - IUT A, Labo LGC/Equipe API, 50 ch. des Maraichers, 31077 Toulouse, France. Tel: (33) 0562258886, Fax: (33) 0562258801, E-mail: [email protected]
GOUARDERES Guy, Universite de Pau - IUT de Bayonne, 3 Av. Jean Darrigrand, 64000 Bayonne, France. Tel: (33) 05595289%, Fax: (33) 0559528989, E-mail: [email protected]
SANCHIS Eric, UPS - IUT Rodez, 33 av. du 8 mai 1945, 12000 Rodez, France. E-mail: [email protected]
Abstract: Considering the design step of ITS and their necessarily distributed representation of knowledge, our claim is to make the most of the full potential of new agent architectures, at both the engineering and the theoretical level. The inherent problems of the evolution, the dynamic acquisition and the emergence of knowledge in such systems are underlined and discussed. Thus, in this paper a new conceptual scheme for architecting cognitive agents is proposed, based on determinism, rational actors and emergence. Another strong idea is that a "cognitive agent" can be described as an agent that learns in the same sense in which people learn. Focus is therefore put both on the learning protocols and on the mutation processes of a new model of agent: the Systemion.
Keywords: Cognitive, Learning Agents, ITS, Agent Model.
1. Introduction
Using computer networks to support human learning is a field in major change, which has to cope with recent approaches from Distributed Artificial Intelligence (DAI) and with new concepts for learning environments, Intelligent Tutoring Systems (ITS), etc. A perspective which has arisen from applying DAI to teaching and learning (mostly on the Internet) is the social perspective of what we call user-friendly tele-learning. In the field of education, the Internet can support collaboration between various domain experts and teachers in designing new approaches to teaching and cooperating by sharing instructional material. A large store of information can be accessed or explored in different ways (structured or not), providing opportunities to design ITS with diverse pedagogic strategies (guided, discovery or coaching mode). Studies aimed at developing ITS rely on DAI to study computationally intelligent behavior achieved by the interaction of multiple entities, each one capable of autonomous action to some extent. Such entities are usually referred to as agents [4], [8]. The three major problems in multi-agent-based ITS are: the definition of communication standards for exchanging information between the agents, the indexing of expertise to facilitate 'know-how' tracking within all the relevant domains, and the requirement of guidelines for interface design. To resolve these problems, agents must integrate adaptable strategies to monitor the user's actions and level of attention and to provide adequate help, in the same way as an ITS does. The first aim of this paper is to propose a way to weaken the inherent negative effects (i.e. the "gaps") of merging Internet and ITS technology, using individual or collective learning agents to model a learning session in an Intelligent Distance Learning System (IDLS). The second aim is to find an efficient abstraction for knowledge and
reasoning representation to design mobile, learning and "mutant" software agents (named Systemions) that improve this session modeling.
2. How to architecture knowledge in an ITS?
2.1 Classical, monolithic ITS
Traditional ITS development was mainly based on inference-engine power and monolithic knowledge-base paradigms. Indeed, many studies on the development of ITS have considered the ITS an individual tool in which the different sources of knowledge (teacher, student, media, ...) were predefined. Thus, it is very difficult for pedagogues and teachers to imagine all the strategies needed to adapt to every student. Generic architectures [2] have been developed to cover a large scope of ITS. These architectures are built on hierarchical levels of knowledge, where every level is controlled by the level above. They are not dynamic; their knowledge is very difficult for teachers to define (often, the system supports only one pedagogical strategy!) and very difficult to modify because of the high degree of knowledge granularity. An ITS should be a system that evolves and acquires knowledge over time. The acquisition of knowledge must not be limited to the design step, but must continue during usage, driven not only by the system's own processes but also by its environment. The necessarily distributed representation of knowledge observed in the field of ITS design [4], [8], due to the diversity of the required expertise, is another good reason for adopting a multi-agent architecture in which every agent has its own knowledge and its own behavior (autonomy). The major problem in modeling a learning session is to find an efficient knowledge representation suitable for different learning processes and student models.
2.2 Traditional ITS versus evolving ITS
In previous work [4], we have shown how an ITS architecture can be smartly designed with intelligent agents according to three parameters: mobility, agency (or autonomy) and intelligence. Mobility is the degree to which agents travel through a network. Agency is the degree of autonomy (in terms of interactions) and authority given to an agent (at a minimum, an agent can work asynchronously). Intelligence is the degree to which preferences, reasoning, planning and learning can be gradually deployed. At the limit of these parameters, an agent can learn and adapt to its environment both in terms of objectives and of available resources (these three fundamental properties can be studied as multi-agent components). However, even if designing these properties from building blocks of knowledge is very important for educational purposes, it is still insufficient for ITS requirements. Indeed, we need agents which model human behavior in learning situations. This cognitive aspect of an agent relies upon its capability to learn and discover new facts, or to improve its knowledge for better use. At a first stage, this aspect has been studied as a generic learning process which can be achieved and assessed by a variety of methods (genetic algorithms, machine learning [6]). At a second stage, building on previous multi-agent ITS experiments, we have emphasized the potential of agent architectures to carry these learning mechanisms [8], [9].
We have observed that they can be classified into three levels of abstraction (mechanisms), depending on the functional aspects of learning:
- (a-mechanism): learning as replication, where agents can provide instructional data and a representation of the pedagogical strategy, and one of them, the Tutor, is devoted to mimicking the teacher acting in the classroom (the learning process can be embedded as a reactive mechanism, or a-mechanism);
- (b-mechanism): learning by tautology, where demonstrations can be designed to guide the learner through the learning process with the help of specialized agents (Tutor, Companion, Adviser, ...) (the learning process can be embedded as an adaptive mechanism);
- (c-mechanism): learning by dynamic interactions and shared initiative, where the computer is more active and information is not only provided by the courseware but can be modified and generated by the learner (the learning process can be embedded as a cognitive c-mechanism).
It can be noticed that, when the first a- and b-mechanisms are used in multi-agent architecture paradigms, there is a lack of the logical inferencing mechanisms needed to validate student actions, and an amalgamation with the knowledge acquisition paradigm is required to procure these facilities. Developed from a distributed but also social perspective, the previously mentioned works try to answer this question: how can we develop a user-centered architecture able to produce individual asynchronous learning, but also, when necessary, individual versus collaborative synchronous learning? In such an architecture, the user and the machine must acquire new knowledge and competencies by learning about each other. The adaptive architecture of an agent can provide these different modes of learning, on condition that the knowledge, and the mechanisms to manipulate and adapt it, are defined as multi-agent ITS components.
2.3 Multi-agent ITS components
According to this perspective, the four main components of an ITS (the student model, the knowledge model, the pedagogical model and the interface model) have formerly been built in the form of intelligent agents. However, the evolution of intelligent tutoring systems toward the use of multiple learning strategies calls for reusable components in a multi-agent architecture, as in the Actor agent [4]. An actor is an intelligent agent which is reactive, instructable, adaptive and cognitive. The actor architecture includes four modules (Perception, Action, Control and Cognition) distributed over three layers (Reactive, Control and Cognitive). ITS improvement by actors has progressively highlighted two fundamental characteristics: (1) learning in an ITS is a reactive process (as an a-mechanism component) involving several partners; (2) to improve it, various learning strategies can be used, such as one-on-one tutoring, learning with a co-learner, learning by teaching and learning by disturbing. However, the success of a purely multi-agent-based tutoring system depends on the learners' motivation and self-discipline. Most ITS inherit a 'context gap' at the points of divergence between the purpose of the tasks performed within the ITS and the purpose of the predicted solutions expected by the pedagogue. The difficulty is to know the level of understanding of the learner, in order to give him/her adapted assistance, and how to represent this required knowledge. For this new stage, an intelligent agent needs a b-mechanism component. Related to this first difficulty, another point of view refers to a second problem of agent adaptation in a society of learning agents: the 'dropping gap' (i.e., the rate of renunciation due to lack of motivation and help). To weaken these inherent negative effects (i.e. the "gaps"), we have carried out the LANCA project [7], using individual or collective learning agents to model a user-friendly and collaborative learning session from the merging of Internet and ITS technology, with the Web as a virtual classroom. However, this method for weakening the 'dropping gap' inevitably introduces the 'context gap' constraint, jointly with the shared-initiative problem between the learner and the system.
Then, in a third stage of experiments, we intend to identify aspects of a user's behavior that can be monitored and learned, by evaluating several completed prototypes of cognitive agents working together and integrating the learner "in the loop". An example is the CMOS (Cockpit Maintenance Operation Simulator) project [12], where the instructor assistant's role is to relieve the instructor of repetitive tasks. This artificial agent plays the role of an assistant whose collaboration becomes more and more useful and pertinent because it observes and generalizes decisions taken by the instructor. The machine learning technique used to implement this artificial agent is the Restricted Coulomb Energy (RCE) mechanism. RCE is a specific type of neural network very well suited to classification and generalization tasks (much better adapted to this situation than a multi-layer Perceptron) [13]. These final-stage issues show that a cognitive agent needs a c-mechanism component. From these former series of results, we have pointed out some adequate properties and
principles in a general framework for multi-agent ITS, with the following three levels of needs in requirements.
Cognitive Agents:
1. consider different strategies and learning styles,
2. establish learning objectives,
3. create, locate, track and review learning materials (e.g., diagnostic instruments, scenarios, learning modules, assessment instruments, mastery tests);
Adaptive Agents:
4. register changes, review/track students' progress and manage needed interventions,
5. provide student access to the net, manage student-tutor communications both asynchronously and synchronously;
Reactive Agents:
6. assign appropriate materials to students,
7. manage student-ITS communications synchronously (when desired/required),
8. evaluate student needs and learning.
The remaining problem is how to manage individual versus collective learning strategies through these three levels of abstraction among a society of cognitive agents.
2.4 Different three-level agent architectures
Going beyond the individual "one to one" concept of the client/server model toward a "many to many" one in a collective society of agents, one can choose a two-level architectural model [14]. But this architecture cannot ensure both learning by requirements satisfaction (individual) and learning by constraint satisfaction (collective). We have suggested the use of a three-level architecture (reactive, adaptive and cognitive behaviour) based on the ARC Model [5], which articulates two overlapping processes: collective (global) and individual (local) learning. That is why we first proposed, with C. Frasson (University of Montreal) [4], to use the three-layer Actor model (reactive, tactic and strategic) in the initial version of an ITS demonstrator for the development of agricultural plots in Niger [9].
Fig. 1: Different abstractions of learning in a three-level architecture for cognitive agents. (The figure relates different paradigms: learning agents in unfamiliar situations and smart agents in partially familiar situations [Chaib-draa, 96], [Nwana, 96]; and the agent architecture levels of Actors and Systemions: cognition/decision (RCA model, emergence, cooperation, individual versus collective learning), control/planning (soft determinism, autonomy, adaptive learning) and reactivity/perception/action (feedback, interface, reactive learning) [Frasson et al., 96], [Gouarderes et al., 98].)
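A minimal sketch of the three-level organisation of Fig. 1, in which a stimulus is tried first against reactive rules (a-mechanism), then against adaptive plans (b-mechanism), and finally handed to the cognitive level (c-mechanism). The rule and plan contents below are invented placeholders; this is not the ARC or Actor implementation.

    class ThreeLevelAgent:
        def __init__(self):
            # Reactive level: fixed stimulus-response couplings.
            self.reactive_rules = {"answer-submitted": "give immediate feedback"}
            # Adaptive level: plans revised from experience.
            self.adaptive_plans = {"learner-stuck": "replay a guided demonstration"}

        def handle(self, event):
            if event in self.reactive_rules:        # a-mechanism: replication
                return self.reactive_rules[event]
            if event in self.adaptive_plans:        # b-mechanism: guided adaptation
                return self.adaptive_plans[event]
            return self.cognitive(event)            # c-mechanism: shared initiative

        def cognitive(self, event):
            # Learn from the unfamiliar event so that it becomes familiar:
            # the cognitive level feeds the adaptive level below it.
            plan = "negotiate a new strategy with the learner"
            self.adaptive_plans[event] = plan
            return plan

    agent = ThreeLevelAgent()
    print(agent.handle("answer-submitted"))          # handled reactively
    print(agent.handle("learner-proposes-problem"))  # handled cognitively, then learned
    print(agent.handle("learner-proposes-problem"))  # now handled adaptively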
From all these points of view (cf. Fig. 1), we are now focusing on learning activities for agents in two ways: (1) learning by requirements satisfaction (individual), as with the two-level agents of [14] (with planning of presuppositions or beliefs); (2) learning by constraint satisfaction (collective), through plasticity and functional adaptation of the global system. Moreover, these learning agents have to learn on their three basic levels (cf. Fig. 1), but these learning protocols are not independent. It will also be useful to design ITS with specially tailored agents that learn to adapt to user interests and student style
criteria. Reactive agents in ITS learn to issue outputs (behaviors) in response to inputs (sensors), incrementally in time. However, we are more interested here in agents that try to learn higher-level knowledge and that try to learn for collective use. For these reasons, reactive learning agents must be endowed with capabilities to adapt their behavior to unexpected contexts [8]. In fact, the four expertises (domain, pedagogy, student model, interface) are not independent. For individual learning agents, the task is to learn about and adapt to the others in the group so that overall group performance increases. In a group, each member has limited information about the knowledge and capabilities of the others. An agent can model others by starting off with assumptions about their default behavior and then approximating their actual behaviors by learning deviations from the default. We are interested in endowing autonomous agents with the same capability, so that they can adapt individual behavioral rules to enhance performance. The research questions that we address in a multi-agent ITS are the following:
- How does an agent model others with its own skills and competencies?
- When does an agent learn skills or abilities?
- When and how does an agent use learned skills to interact with others?
In Section 2 we have discussed different experiments to implement cognitive agents as agents that learn (like Actors or LANCA). In the following, we propose a new paradigm for agents that mutate using a learning mechanism: the Systemion model [11]. Such agents must include deterministic and rational mechanisms to support their cognitive and emergent behavior in a learning situation.
3. SYSTEMION, a model of agent which learns and mutates
3.1 Principles of learning mechanisms
3.1.1 Determinism
Determinism assumes that the introduction of new technology into an organization has predictable, unidirectional consequences on human behavior. Our experimental results (Section 2) suggest adopting a "soft" definition of determinism, combining technological advances with issues emerging from social interaction. The production of emergent phenomena is taken to be a characteristic of self-organization.
3.1.2 Rationality
Considering a learning environment as a conceptual high-level structure, we can supply the system knowledge with goals and available actions, and with the ability to predict its own behavior based on the principle of rationality. The individual perspective on rational actors was sketched in the Actor model described above (Section 2) as an approach which accommodates such learning, modeling and analysis.
3.1.3 Emergence
Combining determinism and emergence tends to focus on the concept of an emerging phenomenon viewed through a collective learning environment for agents. The following does not really focus on a mechanistic simulation of emergence by rational agents, but instead on how to model social learning via a new type of agent.
3.2 The Systemion model
3.2.1 What is a Systemion?
A Systemion (a contraction of 'systemic daemon') is a simple and experimental software agent model integrating, on top of mobility, two further properties of living systems: replication and evolution. Replication allows a Systemion to create a clone of itself, locally or remotely. Just as species evolve from generation to generation, adapting themselves to modifications in their local environment, Systemions integrate a mechanism for self-modification of their code.
Three areas are at the origin of Systemions: software agents, server processes (or daemons) and systemic entities. In operating systems and distributed applications, a daemon
is a process that is perpetually active and always provides the same service to client processes. A Systemion is also a permanently executing process, but one which provides a service that can evolve or change during its life. Two important features of systemic entities distinguish them from classic systems: they are open to the external world, exchanging various items with the environment, and they integrate both internal structuring and unstructuring elements, the interactions of these different elements neither killing nor stopping the entity. In the same way, a Systemion has an internal architecture whose elements can, in the course of time, be dynamically subtracted, added, modified or inhibited. A Systemion structure is composed of two subsystems (cf. Fig. 2):
- a functional subsystem,
- a behavioural subsystem.
Fig. 2: Three-level architecture for Systemions
The functional subsystem (cf. Fig. 2a) implements whatever is relative to the achievement of the function assigned to a Systemion. This function can change during the Systemion's life cycle and constitutes the most flexible part of the Systemion. The basic element of the functional subsystem is the code segment.
Fig. 2a: Code segment (genotype) operations: a segment can be missing, deleted, inhibited or active
The behavioural subsystem (cf. the example in Fig. 2b) implements characteristics that are independent of the current function assigned to a Systemion, such as the ability to replicate, mobility, or the maintenance of the Systemion's identity. The basic element of the behavioural subsystem is the attribute.
3.2.2 Definition of attributes
From the previous experiments described in Section 2 (Actors, LANCA), we have selected the following criteria to model cognitive agents:
1 - Autonomy: autonomous agents interact by messages (delegation mechanism);
2 - Delegation: a specified task or sub-task is delegated to a dedicated agent;
3 - Context: the environment in which an agent can operate (implicit or bounded domain);
4 - Predictability (determinism): their behavior is based on the principle of rationality;
5 - Complexity: agents cannot learn by reactivity alone (as deliberative agents);
6 - Specificity: they are specifically tailored for a given task or mission.
The next, additional properties are specific to Systemions:
7 - Evolution;
8 - Learning (with emergence);
9 - Mutation/adaptation in the sense of artificial life.
Thereby, supplied with an evolution mechanism, Systemions are at times:
- functionally well-determined agents (i.e., fulfilling attributes #1 to #6),
- functionally undetermined agents (i.e., soft determinism assumed by attributes #7 to #9).
3.2.3 Learning
The learning mechanism is an inherent characteristic of a Systemion. It is implemented with a genetic algorithm, as in "Anytime Learning". A model identified at a given moment may no longer be accurate after a while, as the main characteristics of the environment and of the agent may change. So a Systemion faced with an unexpected context has no alternative other than to migrate (mobility) to a new site or to adapt itself (mutation).
Fig. 2b: Replication/mutation (behavioural subsystem)
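The two-subsystem structure of Fig. 2 can be summarised in a few lines of code. The sketch below is illustrative only: code segments stand in for the functional subsystem (the genotype), attributes for the behavioural one, and all names are invented rather than taken from the Scheme implementation.

    import copy

    class Systemion:
        def __init__(self, attributes, segments=None):
            self.attributes = set(attributes)      # behavioural subsystem
            self.segments = dict(segments or {})   # functional subsystem:
                                                   #   name -> (state, code)

        def perform(self, task, *args):
            # Only active code segments contribute to the current function;
            # inhibited, deleted or missing segments do not.
            state, code = self.segments.get(task, ("missing", None))
            return code(*args) if state == "active" else None

        def replicate(self):
            # Replication: create a clone of itself (locally here; remote
            # cloning would additionally rely on the mobility attribute).
            return copy.deepcopy(self) if "replication" in self.attributes else None

        def mutate(self, task, new_code):
            # Evolution: self-modification of the functional subsystem.
            if "evolution" in self.attributes:
                self.segments[task] = ("active", new_code)

    s = Systemion({"replication", "evolution", "mobility"})
    s.mutate("explain", lambda topic: "explanation about " + topic)
    print(s.perform("explain", "gantry design"))
    child = s.replicate()   # a new generation, able to evolve independently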
3.3 The Systemion "Life Cycle"
The Systemion "life cycle" is supported by a reproductive system. A father Systemion's lineage has a variable but limited life span: it disappears when a derived child lineage is able to carry out all the current father's tasks with greater efficiency (cf. Fig. 3).
3.3.2 Main features
Mobility allows a Systemion to migrate to other sites (i.e. different URLs). Replication and mutation produce new generations of Systemions (using a genetic algorithm and/or the copying of code segments). Informational elements in Systemion instances describe how knowledge is symbolized or formalized (e.g. as first-class functions in Scheme). When the reproductive system runs, each Systemion can check over time whether its newly adopted behavior does or does not meet the assigned goal, so as to increase efficiency by progressively specializing its own competency through learning. In this simplified proposal, the reproduction process is monitored and updated using just two parameters: replication frequency and learning rate. The complete learning system is more sophisticated: the algorithm ensures that the best lines of children are selected by evolving a population of Systemions. This requires the evaluation of many genotypes, each one representing a given level of competency to be acquired by a Systemion. Thus, each inferred competency level should be continuously assessed (for example, by implicit evaluation in Scheme of the threshold of each affected variable).
3.3.3 General algorithm
In the general algorithm for learning, the two parameters for self-supervised learning are:
- the Adaptive Threshold (A.T.), fixed to s and estimated by the af function;
- the Learning Rate, specif, which classifies the specificity of each selector frame according to a goal.
The basic problem is how to estimate s so as to reach the expected specificity during a mutation. The basic process runs in three steps:
1. Parent-line reproduction for a "lineage" is started for n steps.
2. At each step, crossing over is performed:
2.1. The adaptive function af for step x is returned; if af(x) > s, then a child-line reproduction is generated at step x for n' steps.
2.2. If specif increases (i.e. the number of * decreases), then n is incremented; and if af(x) < s, then the line of parents dies, and so on.
In fact, Systemions use four functional properties for learning and mutating (sketched in code at the end of Section 4):
- Cloning (replication only) => precondition: af(x) = 0 and specif = 0;
- Supervised Cloning => precondition: af(x) > s and specif = 0;
- Mutation (replication + evolution) => precondition: af(x) > s and specif > 0;
- Mobility => as Cloning (af(x) = 0 and specif = 0) but with a changing context (af and specif).
4. Tests & validation
With Systemions, we are trying to improve the performance of pedagogical agents embedded in different ITS shells dedicated to helping planning engineers in computer-assisted design (the Catia Dassault system). In the following "portico" example, the role of the Systemions is to carry behavioral knowledge extracted from the know-how of different users working as a team on gantry projects in the mechanical domain. In the same way, know-how from different tutoring strategies will be exploited at a later stage. Systemions act as help "mining operators" to provide new, improved explanations for a pedagogical agent [6]. The role of this agent is to supervise local help for the learning situation. It provides the learner with a pedagogical strategy to help him/her in a problem-solving situation. It uses:
- a learner model, initially formed by requests addressed to the user and progressively enriched;
- a knowledge base which includes the case-based problem (i.e. the CAD of gantries), the explanation base, the different user profiles, and the learning.
It finds the built-in explanations adapted to the learner in the explanation base and brings them to the user in synchronous or asynchronous mode. It addresses requests to the negotiating agent for additional help from other users who might have solved similar problems (but from different points of view). It asks all the other agents in the complete architecture to elaborate on which part of the explanation was decisive for reaching the solution.
In some parts of the current tutoring process this last point can be done automatically. To assess this capability of generating locally improved explanations, we have experimented with various lineages of Systemions on a concrete case based on the CAD design of a portico with three beams. We applied the learning process of different point-of-view acquisition, as shown in Fig. 4. The results represent the percentage of design successes for a generated population. Three graphs represent three ways of learning to reach the point of view most useful for helping the user. The requested help parameters for these three examples are identical.
Fig. 4: Three ways of learning for a given lineage of systemions depending on different contexts
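As announced in Section 3.3.3, the four functional properties and the lineage process can be condensed into a short sketch. The adaptive function af and the specificity measure are simulated numerically here, as stand-ins for values estimated over a real lineage; this is one reading of the algorithm, not its Scheme implementation.

    import random

    S = 0.6  # adaptive threshold s (an assumed value for the example)

    def operation(af_x, specif, context_changed=False):
        # Preconditions of the four functional properties (Section 3.3.3).
        if af_x == 0 and specif == 0:
            return "mobility" if context_changed else "cloning"
        if af_x > S and specif == 0:
            return "supervised-cloning"
        if af_x > S and specif > 0:
            return "mutation"
        return "continue-parent-line"

    def evolve_lineage(n=10):
        # Parent-line reproduction: at each step, crossing over is tried
        # and a child line is generated whenever af(x) exceeds s.
        children = []
        for x in range(n):
            af_x = random.random()         # stand-in for the estimated af(x)
            specif = random.randint(0, 3)  # stand-in for selector specificity
            op = operation(af_x, specif)
            if op in ("supervised-cloning", "mutation"):
                children.append((x, op))   # child line generated at step x
        return children

    print(evolve_lineage())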
5. Conclusion and Future Works
To sum up, the aim of the Systemion architecture is to provide a conceptual framework and tools for ITS co-design involving pedagogues and computer scientists; within this perspective, it is simultaneously necessary to reduce the complexity of the design and to model reality more accurately. But we are well aware that these first software-engineering results do not prove the pedagogic validity of Systemions as ITS components. The next stage of this work concerns a modular and incremental method to complete the design of ITS with such agents. Moreover, from the concrete example of improving cognitive help in the portico design problem, we have shown that the first step is a matter of working out a set of agents and revisiting the classes of possible helps. In this case, the advantage of agent modeling is to leave to the agents the decision-making power to suggest (or not) different explanations according to the selected point of view. They are ready and able to integrate new knowledge, to communicate among themselves, to solve problems, and to interact with their environment. To be even more effective, the final environment must integrate other types of agents, acting in the same way both on the pedagogical strategy and on the learner model to
select the most adequate one and integrate it in the complete pedagogical environment. This work is under development on the industrial prototype of the Cockpit Maintenance Operation Simulator [12], [13].
References
[1] Baron C., Gouarderes G., 1999, "Mechanical design with Genetic Algorithms", 3rd International Conference in Industrial Engineering, May 99, Montreal, Canada (to appear).
[2] Canut M.F., Eloi M., "A Task-based Environment for ITS: The NGE Kernel", Eighth International Conference on Tools with Artificial Intelligence TAI 96, Toulouse, November 1996, IEEE Computer Society Press.
[3] Chaib-draa B., "Interaction between Agents in Routine, Familiar and Unfamiliar Situations", International Journal of Intelligent & Cooperative Information Systems, 5(1), 1996.
[4] Frasson C., Mengelle T., Aimeur E., Gouarderes G., "An Actor-based Architecture for Intelligent Tutoring Systems", Third International Conference ITS'96, Montreal, June 1996, Lecture Notes in Computer Science, Heidelberg: Springer Verlag.
[5] Gouarderes G., "Les modeles LA de l'apprentissage", 2e colloque Sciences Cognitives, SELF, Editions l'Harmattan, Novembre 1996, Biarritz, France.
[6] Gouarderes G., Canut M.F., Sanchis E., "From Mutant to Learning Agents. Different Agents to Model Learning", symposium at the 14th European Meeting on Cybernetics and Systems Research EMCSR'98, April 1998, Vienna, Austria.
[7] LANCA: Learning Architecture Based on Networked Cognitive Agents, supported by AT&T, is underway at the University of Montreal and involves three countries (Canada, France, Mexico).
[8] Marcenac P., Leman S., Giroux S., "A Generic Architecture for ITS Based on a Multi-agent Approach", Third International Conference ITS'96, Montreal, June 1996, Lecture Notes in Computer Science, Heidelberg: Springer Verlag.
[9] Millet S., "SERAC: Un Systeme d'Evaluation et de Revision Automatise des Connaissances en EIAO", Doctorat de l'Universite Toulouse III, Novembre 97.
[10] Nwana H.S., "Software Agents: An Overview", Knowledge Engineering Review, Vol. 11, No. 3, pp. 1–40, Sept 96, Cambridge University Press, 1996.
[11] Sanchis E., "Systemions: des agents logiciels qui mutent", Rapport interne, Septembre 1996, I.U.T. Rodez (France). http://omega.iut-rodez.fr/teachers/sanchis/htnil.rapp96/systemio.htnil
[12] Richard L., Gouarderes G., "Human Centered designed ITS for Aircraft Maintenance Training System", HCI Aeronautics, ACM SIGCHI, Montreal, May 1998.
[13] Richard L., Gouarderes G., "An Agent-operated Simulation-based Training System", 9th International Conference on Artificial Intelligence in Education AI-ED 99, July 1999, Le Mans, France (to appear).
[14] Vidal J.M., Durfee H.M., "Agents Learning about Agents: A Framework and Analysis", Artificial Intelligence Laboratory, University of Michigan, April 97. http://www-personal.engm.umich.edu/~jmvidal/papers.hnas
[15] Wooldridge M., Jennings N., "Intelligent Agents: Theory and Practice", The Knowledge Engineering Review, Vol. 10, No. 2, pp. 115–152, 1995.
Analysis of Collaboration and Group Formation
Artificial Intelligence in Education, S.P. Lajoie and M. Vivet (Eds.), IOS Press, 1999
Learning Goal Ontology Supported by Learning Theories for Opportunistic Group Formation
Thepchai Supnithi, Akiko Inaba, Mitsuru Ikeda, Jun'ichi Toyoda, Riichiro Mizoguchi
ISIR, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka, 567-0047 Japan
Email: {thep, ikeda, miz}@ei.sanken.osaka-u.ac.jp, {inaba, toyoda}@ai.sanken.osaka-u.ac.jp
Abstract: The concept that plays the central role in the decision-making process of negotiation is the collaborative learning goal. In the negotiation process, each agent considers the personal benefit of its own learner while also considering the social benefit of the overall group. To make a negotiation reach an agreement, compromises between the personal and the social aspects are necessary. In this paper, we focus on the learning goal ontology of collaborative learning and illustrate the basic way of thinking about the negotiation process for opportunistic group formation.
1. Introduction
The expectation of invaluable pedagogical effects induced by mutual interaction among learners has long attracted the attention of researchers in the field of education. In the research field of computer-based education, CSCL (Computer Supported Collaborative Learning) has attracted the interest of many researchers in recent years, due to the rapid development of computer networks, multimedia, and artificial intelligence in education. Our research interest here is to make the educational function of the collaborative learning group clear. In general, each member of a learning group is expected to achieve his/her own personal goal through the interaction while attaining the social goal of the whole group. Based on this principle, we can identify both the right situation in which to shift adaptively from individual to collaborative learning mode and the configuration of the learning group appropriate for that situation. We call the integrated model "Opportunistic Group Formation (OGF)" [16]. It is important to find a good ontology to represent the details of the OGF model, so that we can obtain maximum social educational utility from collaborative learning while each learner is allowed to pursue private benefit. The ontology concerning OGF can be divided into two kinds, that is, Negotiation Ontology and Collaborative Learning Ontology. Combining the two ontologies lets us determine the scope of the shared ontology that is required. To make a negotiation reach an agreement, it is necessary to answer the question: what is the most important concept for the "justification" of negotiation? In the learning environment based on the OGF model, each learner has an agent as his/her tutoring system. The agents sometimes negotiate with each other to form an effective learning group. The learning goal¹ plays an important role in the negotiation process, because the negotiation will not lead the agents to an agreement without sharing the same goal. In the OGF negotiation process, it is important to compromise between the personal goal and the social goal; by compromising them, we obtain the maximum benefit to learners from forming a group for collaborative learning. This paper is organized as follows: first we show the overview of our research idea of OGF; then we describe the system of concepts concerning OGF, that is, the Negotiation Ontology and the Collaborative Learning Ontology; finally, we focus on the Learning Goal Ontology within the Collaborative Learning Ontology for OGF.
¹ In this paper, we use the term "Learning Goal" in two senses: (1) a learner's goal, which represents what the learner acquires, and (2) an agent's goal to create an effective learning setting for a learner.
Figure 1. The overview of the negotiation process: (a) individual learning mode; (b) broadcasting a request (active-helper/passive-helper); (c) negotiation among agents; (d) success in forming a learning group; (e) start of collaborative learning.
2. Opportunistic Group Formation
The idea of "Opportunistic Group Formation" can be expressed as follows: Opportunistic Group Formation is a function that forms a collaborative learning group dynamically. When it detects a situation in which a learner should shift from individual learning mode to collaborative learning mode, it forms a learning group each of whose members is assigned a reasonable learning goal and a social role consistent with the goal for the whole group.
We briefly explain the outline of OGF here; the ontologies are described in the next section. Figure 1 shows the overview of the negotiation process. Basically, a learner is in the individual learning mode and studies under a tutoring function of FTTS/CL (Figure 1(a)). An agent takes charge of monitoring a learner and tries to obtain the benefit for its own learner by considering the personal goal. When the agent detects a desired situation for its own learner to switch into collaborative learning mode, based on the learner model, it initiates the negotiation process in order to form a learning group. At the same time, the agent establishes a learning goal and a desired role for the learner. This information is broadcast to other agents as a request for forming a collaborative learning group (Figure 1(b)); it is based on the personal-aspect goal. Only the agents that can obtain a benefit from collaborative learning for their own learners participate in the negotiation process (Figure 1(c)). Agents are connected to each other in order to carry out the negotiation. Within the negotiation process, opinion exchange, persuasion, compromise and criticism actions are selected in order to overcome conflicts among agents; the outline of the concepts for these actions is shown in Table 1. Each agent considers the personal benefit of its own learner while also considering the social benefit of the overall group. Learning goals are the concepts that play the most important role in executing each action. The key to overcoming conflict and reaching an agreement in a negotiation is to compromise between the personal- and social-aspect goals on behalf of the learners. When the negotiation has completed successfully (Figure 1(d)), each participant in the collaborative learning is informed of the learning goal for the whole group and of the role assigned to him/her. Then a new communication channel is opened for the members of the learning group, and participants can freely communicate with each other through the channel using natural language (Figure 1(e)). The communication is not monitored by the agents in any sense; the agents only send some messages via a dialog box in order to explain how to collaborate in the initial phase, and then wait until the participants achieve the goal. When the achievement of the learning goal is declared by one of the participants, the agents close the channel and ask the participants about the outcome of the collaborative learning in order to evaluate their achievement. Each agent updates the learner model based on the evaluation and encourages the learner under its charge to resume his/her learning task in individual learning mode. The communication among agents follows a protocol based on KQML [14].
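Since the inter-agent protocol is based on KQML, the broadcast that opens a negotiation can be pictured as a KQML-style performative message. The fragment below is illustrative only: call-for-participation is the message name from Table 1, but the parameter values and content syntax are invented for the example rather than taken from the implemented protocol.

    def kqml(performative, sender, receiver, content):
        # KQML-style message: a performative plus :keyword parameters.
        return ("(%s :sender %s :receiver %s "
                ":ontology collaborative-learning :content \"%s\")"
                % (performative, sender, receiver, content))

    # Agent 1 detects a trigger for its learner and broadcasts a request
    # carrying the intended learning goal and the role it wants filled.
    msg = kqml("call-for-participation", "agent-1", "all-agents",
               "(group (goal learning-by-teaching) (role-wanted tutor))")
    print(msg)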
3. Ontology for Opportunistic Group Formation: Negotiation Ontology and Collaborative Learning Ontology
Our approach has two objectives: to build a negotiation mechanism and to identify the concepts that support the negotiation. The former concerns the decision-making process that compromises between the personal and social aspects. The important point to note here is that, to make a negotiation reach an agreement, a process model is necessary. We constructed a Negotiation
Process Model, represented by a transition network [16]. The Negotiation Process Model is a key to making negotiation successful. The latter objective is about how to find a good ontology to represent the details of the OGF model; in this paper, we concentrate on the latter. Negotiation for OGF is based on learning goals, the typical classes of learning group, the roles of members in the learning group, and learning scenarios. The fundamental principles that make the negotiation reach an agreement among the agents lie in a shared ontology [21]. By sharing the ontology, an agent is able to understand other agents' points of view and negotiate with them. What is important here is what scope of shared ontology should be required: sharing an ontology enhances the negotiation's ability to reach an agreement, but at the same time it decreases the independence and generality of each agent's behavior. The ontology concerning OGF is mainly divided into two types, as shown below.
Negotiation Ontology: the system of concepts for modeling the negotiation process, such as opinion exchange, persuasion, compromise and agreement (Table 1).
Collaborative Learning Ontology: the system of concepts for modeling the collaborative learning process, such as learning goal, learning group type and learning scenario (Table 2).
When the ontologies are in use, they are arranged into three layers, as shown in Figure 2. The top layer is the negotiation level, which corresponds to the Negotiation Ontology; it represents the important information for negotiation at an abstract level. The bottom layer is the agent level, which corresponds to the Individual Learning Ontology [21].
Table 1. The outline of Negotiation Ontology
Negotiation message - short communication words for the negotiation process:
- call-for-participation: request to other agents to form a collaborative learning group for their own learners;
- reply: reply an accept/decline answer after receiving a call-for-participation message;
- proposal: submit an (original, opposition, revised, compromise) proposal;
- support: support a proposal submitted by others;
- give-opinion: give a critique/justification on another's proposal;
- interrogate: ask a question on another's proposal;
- open-info: open local information to the public;
- persuade: persuade others to agree to the proposal by giving justifications for it;
- agree: agree with a compromise proposal.
Negotiation events - the activities that happen among agents in the negotiation process:
- send: the activity of sending a negotiation message from an agent to others;
- receive: the activity of receiving a negotiation message from other agents;
- reach-agreement: the activity of reaching an agreement on the same appropriate proposal among all the agents participating in the negotiation process;
- conflict: the activity of identifying a disagreement among proposals.
Negotiation process - the process of negotiating among agents in order to reach an agreement on forming a group for collaborative learning:
- devising: the process of forming a proposal and submitting it to other agents;
- compromise: the process of forming and exchanging opinions in order to reach an agreement;
- persuasion: the process of giving a strongly held opinion of its own learner as a justification;
- observation: the process of observing and understanding the conflict among proposals in order to steer the direction toward an agreement;
- investigation: the process of justifying an accept/decline reply and forming an opinion as an answer to an opposition.
Negotiation objects - the concepts that are necessary for the negotiation process:
- proposal: the design of a collaborative group that an agent offers as a proper plan; it includes learning goal, role, topic and learning scenario;
- justification: the opinion that is used to support a proposal; it shows why the proposal is appropriate for collaborative learning;
- conflict: the state of difference between parts of opinions in proposals;
- criticism: the opposed judgement and detailed evaluation of a proposal.
Collaborative learning task concepts - the agent-level concepts of the collaborative learning task that are used in the negotiation process.
Table 2. The outline of Collaborative Learning Ontology
Trigger - the detection of an opportunity for a learner to shift from individual learning mode to collaborative learning mode:
- Impasse: when a learner has some difficulty in a learning process, the impasse trigger is detected. If the trigger is worth initiating collaborative learning, the agent that detected it submits a request to the public.
- Review: when a learner completes a given task, the review trigger is detected. If a review of the task is worth carrying out as a collaborative activity, the request for group formation is submitted to the public.
- Program: the trigger is prescribed by the authors of the teaching materials in advance. For example, the author may prescribe the necessity of group learning through an experiment for a topic; in such a case, the program trigger will be detected when a certain number of learners successfully acquire the prerequisite knowledge for the experiment.
Learning goal - the goal for a learner from the collaborative learning viewpoint:
- I-goal: the learning goal that represents what a learner acquires.
- Y<=I-goal: the learning goal that represents the means to attain the I-goal.

... motion', which is 'a push equal to friction causes box speed decrease' (i.e. no net force, no motion/speed decrease), in an Assertion x2.
Goal:- ADDRESS-AGENCY-VIOLATION
Tactic:- Persuade
In assertion ASSERTION x2 you have given no causal agent, state change.
In Assertion:- ASSERTION x1 Abstract agent push > Abstract agent friction : causes box has property speed. Change: increase
Isn't it the case that: Abstract agent push = Abstract agent friction : is a condition for trolley has property speed. Change: constant
In this example a Persuade tactic addresses the strategic sub-goal ADDRESS-AGENCY-VIOLATION. It refers to the violating explanation (ASSERTION x2), critiques it in terms of the world model ('you have given no causal agent, state change.'), and poses a consistent 'version' of the offending assertion (that is, a system implication) as a question: 'Isn't it the case that: Abstract agent push = Abstract agent friction : is a condition for trolley has property speed. Change: constant', where this question is prefaced by the previous student assertion (x1) that generated the implication.
1 Note that these qualitative quantity values can have changes attached (e.g. increase, decrease) and relations applied (e.g. greater than, less than, equal to) to them.
Essentially the system adopts three main types of tactic to play its role in the dialogue, namely Challenge, Probe and Persuade. A 'Challenge' may be issued when the laws of agency or effect are violated; the learner is presented with a critique in terms of the system's reasoning, followed by a request for another, hopefully revised, explanation. A 'Probe' questions students about an implication the system has made from their explanation. The 'Persuade' tactic may be presented when a law of agency or effect is violated and there is an implicit contradiction between a system implication and the learner's assertion that caused the violation. In this case the student is presented with their consistent assertion, to establish some common ground, followed by the system's inference that was generated from the assertion (see the example above). The latter is usually posed as a question and, if the learner concurs, they are requested to Resolve the explicit contradiction. Additionally, the system can suggest an 'Assumption', which becomes the basis for further discussion if an impasse is reached. Several modes of expression for the tactics can be realised by the available locutionary acts and predicates.
2.3.1 Tactic Preferences
The preference orderings of tactics for each CR state are specified in Ravenscroft [1], and follow a maxim of 'minimal exposition', so the system favours Challenges, then Probes, then Persuades or Resolves, and finally Assumes. Similarly, heuristics specify preferences for addressing particular CR states relevant to the same strategic goal:
• 'prefer-indirect-resolution' specifies the principle 'it is better to use indirect tactics, such as Probe, Challenge and Persuade, before directly confronting the learner with Resolve' when addressing the goals Address_agency_violations and Address_effect_violations;
• 'prefer-cause-for-effect' specifies the principle 'it is better to critique an effect without a cause (cf. a cause without an effect) and request another cause for the effect' when addressing the goal Address_agency_violations;
• 'prefer-opposite-value' specifies the principle 'if the learner has provided an explanation including a qualitative value (e.g. "speed increase") and quantity implications have been generated (e.g. for "speed constant" and "speed decrease"), then Probe the opposite value next (i.e. "speed decrease" for this case)' when addressing the goal Check_inferred_states;
• 'prefer-net-General-agent' specifies the principle 'it is better to request the effect for a net General-agent (e.g. net friction-force) before requesting the effect for no net General-agent (e.g. net force)' when addressing the goal Check_inferred_net_relations.
These preferences and heuristics are demonstrated during the interchanges in sections 3 and 4.1.
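To make the preference machinery concrete, the following minimal sketch (in Python, not the authors' Lisp/KR implementation) shows how a Dialogue Tactics Agenda might be ordered under the 'minimal exposition' maxim, with 'prefer-indirect-resolution' demoting Resolve for agency and effect violations; all function and field names are illustrative assumptions.

    # Illustrative sketch only: orders candidate tactics for one CR state.
    MINIMAL_EXPOSITION_RANK = {
        "Challenge": 0, "Probe": 1, "Persuade": 2, "Resolve": 2, "Assume": 3,
    }
    AGENCY_EFFECT_GOALS = {"ADDRESS-AGENCY-VIOLATION", "ADDRESS-EFFECT-VIOLATION"}

    def order_agenda(tactics):
        """tactics: list of dicts such as
        {"name": "Probe", "goal": "ADDRESS-AGENCY-VIOLATION"}."""
        def key(tactic):
            # Direct confrontation (Resolve) is deferred for agency/effect
            # goals, per 'prefer-indirect-resolution'.
            deferred = int(tactic["name"] == "Resolve"
                           and tactic["goal"] in AGENCY_EFFECT_GOALS)
            return (deferred, MINIMAL_EXPOSITION_RANK[tactic["name"]])
        return sorted(tactics, key=key)

    agenda = [{"name": "Persuade", "goal": "ADDRESS-AGENCY-VIOLATION"},
              {"name": "Resolve", "goal": "ADDRESS-AGENCY-VIOLATION"},
              {"name": "Challenge", "goal": "ADDRESS-AGENCY-VIOLATION"}]
    print([t["name"] for t in order_agenda(agenda)])
    # -> ['Challenge', 'Persuade', 'Resolve']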
3. The CoLLeGE System: A Dialogue Modelling Work-bench
The design has been implemented in a prototype CoLLeGE (Computer based Lab for Language Games in Education) system, written in Lisp and KR (a frame-based extension [17]), that demonstrates how the specified World Model, Interaction Language, Dialogue Strategy and Tactic Generation are implemented and integrated within a unified system realising the Facilitating Dialogue Game. The current CoLLeGE system is in its first implementation phase and, since a suitable student-system interface has not yet been implemented, it is used as a 'work-bench' for modelling, demonstrating and investigating the underlying dialogue processes necessary for learning as knowledge refinement. Currently the CoLLeGE interface consists of three scrolling windows representing the Dialogue Record, Commonsense Reasoning Agenda and Dialogue Tactics Agenda, along with the World Representation (i.e. the world model classification) of the learner's input and a series of buttons (along the bottom of the interface) for conducting the dialogue. These buttons facilitate the expression of explanatory Locutionary Acts (i.e. Assert, Withdraw, No, Do_Not_Know), the passing of initiative (i.e. Pass) and the selection of Dialogue Tactics from the Agenda (i.e. Choose-Tactic) to perform facilitating Locutionary Acts; the dialogue tactics (e.g. Persuade) are linked to the strategic goals they address (e.g. ADDRESS-AGENCY-VIOLATION). Each Locutionary Act, Commonsense Reasoning State and Dialogue Tactic is assigned an identification number; for example, in the illustration the first Assertion in the dialogue record is numbered 4983. These numbers are referred to in subsequent descriptions.
This example (Figure 1) is taken from a typical dialogue that occurred during a validation study (see section 4). On the first turn of the dialogue the learner Asserted (4983) an explanation for an increase in speed of a box, where the predicates used (i.e. 'acts on' and 'has property') allowed 'box' and 'person' to be classified as Entities, and 'push' to be classified as an Action in the World Representation. After Passing the initiative to CoLLeGE, the system, in applying its law of effect, generated causation implications for quantities (i.e. zero, small, medium, large) of antecedent and consequent states, some of which (e.g. 14704, 14715) are illustrated in the Commonsense Reasoning Agenda. However, in order for CoLLeGE to complete the model, the goal ACQUIRE-MISSING-STATE required the learner to provide an explanation for the box speed to decrease, and so an Acquire-construct tactic addressing this
goal (9272) was performed on the second turn. The learner provided an inconsistent causal explanation (11528) in response, Asserting that a zero push was a condition for the box speed to decrease. Upon receiving the initiative, CoLLeGE reasons that this Assertion (11528) is inconsistent with the Law of Agency and posts a critique (14694) to the Commonsense Reasoning Agenda. Note that the system generates a repertoire of tactics for addressing this agency violation (i.e. goal ADDRESS-AGENCY-VIOLATION), including Persuade (14774), Probe (14772) and Challenge-construct (14769, 14766) tactics, and posts these to the Dialogue Tactics Agenda. Although the deployment of any of these tactics is a legitimate response, following the specified preference orderings and the 'prefer-cause-for-effect' heuristic the Challenge-construct (14769) was chosen, which requested the learner to 'think again' and provide another cause for the given effect (i.e. 'box speed decrease'). The learner was subsequently stimulated to introduce friction into their explanatory model (Assertion-15748), allowing 'friction' to be represented in the World Representation as an Abstract Agent. In this example, the student then Withdrew their inconsistent assertion (11528) in the same turn.

Figure 1: The CoLLeGE Interface
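As a rough illustration of the reasoning just described, the sketch below encodes one simplified reading of the Law of Agency: a net causal agent should be accompanied by a state change, and a state change should have a net causal agent. The dataclass and the string conventions are assumptions standing in for the system's KR frames, not the implemented representation.

    from dataclasses import dataclass

    @dataclass
    class Assertion:
        agents: str   # e.g. "push > friction", "push = friction", "push zero"
        link: str     # "causes" or "is a condition for"
        change: str   # "increase", "decrease" or "constant"

    def has_net_agent(assertion):
        # Balanced ('=') or zero agents yield no net causal agent.
        return "=" not in assertion.agents and "zero" not in assertion.agents

    def agency_critique(assertion):
        """Return a critique string if the Law of Agency is violated, else None."""
        if has_net_agent(assertion) and assertion.change == "constant":
            return "you have given causal agent, no state change"
        if not has_net_agent(assertion) and assertion.change != "constant":
            return "you have given no causal agent, state change"
        return None

    # Assertion 11528 from the dialogue: a zero push given as a condition
    # for the box speed to decrease.
    print(agency_critique(Assertion("push zero", "is a condition for", "decrease")))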
4. A Validation Study
An empirical study was undertaken and has been fully reported by Ravenscroft [1]; in this paper we focus on demonstrating the dialogue interactions that led to belief revision and to the improvements in explanatory performance reported elsewhere [1, 4]. The chosen subject for the study was kinematics, using the context of a person pushing a box/trolley along a (horizontal) floor [5]. Eleven students participated, all of whom showed, in varying degrees, deficient explanations and alternative conceptions in their initial narrative and explanatory accounts of the box/trolley pushing event. The students and tutor-facilitator conducted a dialogue according to the specified facilitating dialogue game framework, the latter documenting the discourse on a flip-chart in an equivalent form of the interaction language, and all sessions were transcribed. The learners' utterances were entered into the CoLLeGE system by the experimenter, so that its tactics could be generated and matched with those selected by the tutor.
4.1 The Dialogue Process and Dialogue Tactics
The CoLLeGE system was able to simulate the dialogues, with all tutor tactics (with one exception) being generated by the system and placed in the Tactics Agenda. The dialogue transcripts are lengthy, so extracts from the CoLLeGE simulations will be used to demonstrate some of the interesting tactics and tactic changes which governed the flow and content of the dialogue. The first extract demonstrates tactics that addressed a student's (S11) alternative conception of a push implying motion, which led to an impasse for the equilibrium condition when the trolley speed is constant. The student expressed this common but inconsistent assertion (in the World Model terms of its law of agency) 'push > friction causes trolley speed constant' (i.e. a net causal agent push resulting in no state change) in turn 19. The tutor (and CoLLeGE), following the maxim of 'minimal exposition', then adopted a series of Probe-construct tactics and applied the 'prefer-opposite-value' heuristic to Probe the less than '<' condition before the equilibrium '=' condition, as shown below.

19. S11. Push greater than friction causes the speed to be constant.
Assertion:- ASSERTION-32718
Abstract agent push > Abstract agent friction : causes trolley has property speed. Change: constant
20. F. What is the consequence when push less than friction?
Goal:- CHECK-INFERRED-EQUILIBRATION-31036
Tactic:- Probe-construct
Construct: Abstract agent push < Abstract agent friction : causes ?
21. S11. Push less than friction causes the speed to decrease.
Assertion:- ASSERTION-36473
Abstract agent push < Abstract agent friction : causes trolley has property speed. Change: decrease
22. F. What is the consequence when push equal to friction?
Goal:- CHECK-INFERRED-EQUILIBRATION-31032
Tactic:- Probe-construct
Construct: Abstract agent push = Abstract agent friction : causes ?
23. S11. Push equal to friction causes the speed to be constant.
Assertion:- ASSERTION-40224
Abstract agent push = Abstract agent friction : causes trolley has property speed. Change: constant

Then, after probing for another explanation for when push is greater than friction and acquiring a consistent but contradictory explanation (ASSERTION-45223 below), the student was asked to Resolve the explicit contradiction between this recent explanation and their previous one (ASSERTION-32718). They subsequently Withdrew the inconsistent assertion (32718), leaving a consistent explanatory model.

26. F. (Resolve Contradiction) Same causes, different effects: push greater than friction causes the speed constant; push greater than friction causes the speed increase.
Goal:- ADDRESS-CAUSATION-VIOLATION-48838
Tactic:- Challenge-resolve
There is an explicit contradiction between ASSERTION-32718 and ASSERTION-45223 (same agent, different effect). Resolve:
Assertion:- ASSERTION-32718
Abstract agent push > Abstract agent friction : causes trolley has property speed. Change: constant
or
Assertion:- ASSERTION-45223
Abstract agent push > Abstract agent friction : causes trolley has property speed. Change: increase
27. S11. Withdraw push greater than friction causes the speed constant.
Statement withdrawn ASSERTION-32718
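The Challenge-resolve tactic above rests on a simple structural test, sketched here under the assumption that assertions reduce to (agents, effect) pairs: 'same agent, different effect' signals an explicit contradiction for the learner to resolve.

    def explicit_contradiction(a, b):
        """a, b: (agents, effect) pairs, e.g. ("push > friction", "constant")."""
        same_agent = a[0] == b[0]
        different_effect = a[1] != b[1]
        return same_agent and different_effect

    assertion_32718 = ("push > friction", "constant")
    assertion_45223 = ("push > friction", "increase")
    if explicit_contradiction(assertion_32718, assertion_45223):
        print("Challenge-resolve: request withdrawal of one assertion")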
If these tactics prove unsuccessful, a Persuade tactic may be adopted; this was successful with another student (S7) who had the same alternative conception, as shown below:
14. F. Push less than friction causes the trolley speed to decrease (a repeat of a student statement). Isn't it the case that a push equal to friction is a condition for trolley speed constant?
Goal:- ADDRESS-AGENCY-VIOLATION-62866
Tactic:- Persuade
In assertion ASSERTION-31238 you have given causal agent, no state change.
In Assertion:- ASSERTION-57992
Abstract agent push < Abstract agent friction : causes trolley has property speed. Change: decrease
Isn't it the case that:
Abstract agent push = Abstract agent friction : is a condition for trolley has property speed. Change: constant

The student agreed, and other inferences were then probed. Further tactics (shown with student S3) that address the goal ACQUIRE-GENERAL-AGENT are Acquire-construct and Probe-construct, which deal with the difficulty of establishing an equilibrium condition once the push and friction effects have been established. This leads, hopefully, to the realisation that their effects can be 'added' to give a resultant equilibrium.

25. F. Is there a word which is a general name for push and friction? Are they examples of the same type of thing?
Goal:- ACQUIRE-GENERAL-AGENT
Tactic:- Acquire-construct
Construct: Can you give a general name for push and friction? Push AND friction ISA?
26. S3. Yes...is it force?
Assertion:- push AND friction ISA force
27. F. So net push-force causes?
Goal:- CHECK-INFERRED-NET-RELATIONS
Tactic:- Probe-construct
Construct: Net push force causes ?
28. S3. Net push-force causes box speed increase.
Assertion:- Net push force causes box speed increase
29. F. Net friction-force causes?
Goal:- CHECK-INFERRED-NET-RELATIONS
Tactic:- Probe-construct
Construct: Net friction force causes ?
30. S3. Net friction-force causes box speed decrease.
Assertion:- Net friction force causes box speed decrease
31. F. Net force zero causes?
Goal:- CHECK-INFERRED-NET-RELATIONS
Tactic:- Probe-construct
Construct: Net force zero causes ?
32. S3. Net force zero causes box speed constant (no change).
Assertion:- Net force zero causes box speed constant

To summarise this interchange, once the Acquire-construct tactic had stimulated the learner to provide the General-agent 'force' as the generalisation of the Abstract-agents 'push' and 'friction', following the 'prefer-net-General-agent' heuristic a series of Probe-construct tactics acquired explanations for the net General-agents (i.e. 'push force' and 'friction force'). After this, a further Probe-construct acquired an explanation for the zero-net General-agent condition (i.e. 'Net force zero'), facilitating the development of a complete, consistent and more general model for the motion context. In all these interactions (except for the occasional use of repetition) the CoLLeGE tactic preferences could be matched against those performed by the tutor, validating the implemented framework.

4.2 Summary and Further Comments
Although this is a small-scale study, of the eleven subjects participating all except two provided incomplete and inconsistent (in World Model terms) explanatory models during the early stages of the dialogues. During the subsequent dialogue games five subjects produced models that were complete, consistent and general, with all except one having to revise and refine their beliefs. Additionally, three subjects constructed complete and consistent models that they were unable to generalise. Only three subjects
could not be stimulated to revise their incomplete or inconsistent explanations. Further, all the refined models, with the exception of one generalisation, were retained in delayed post-tests. Although CoLLeGE generated the tactics that were deployed by the tutor, his tactic selection was sometimes more sophisticated than currently specified, because it emerged and developed reactively during the dialogues in response to unforeseen conceptual difficulties experienced by the students. In other words, the tutor learned about pedagogy within and across dialogues and refined his tactic selection accordingly. When these modifications to tactic selection have been implemented and the interface is made suitable for students, CoLLeGE can be exploited as a 'virtual assistant' as well as a dialogue modelling work-bench. This line of work is interesting because the CoLLeGE approach should have wide applicability in subjects where this style of educational dialectic is considered useful.

Acknowledgements
The authors are grateful to Dr. Rachel Pilkington for her help with the design of the dialogue and Dr. Conroy Mallen for his assistance with the CoLLeGE implementation. The second author acknowledges the support of Leverhulme through the award of an Emeritus Research Fellowship.

References
[1] A. Ravenscroft, Learning as Knowledge Refinement: A Computer Based Approach, unpublished Ph.D. Thesis, Computer Based Learning Unit, University of Leeds, UK, 1997.
[2] J.D. MacKenzie, Question-begging in non-cumulative systems, Journal of Philosophical Logic, 8, 1979, 117-133.
[3] D. Walton, Logical Dialogue-Games and Fallacies, published Ph.D. Thesis, Lanham: University Press of America, 1984.
[4] J.R. Hartley & A. Ravenscroft, Supporting exploratory and expressive learning: a complementary approach. In Dicheva, D. & Kommers, P.A.M. (Guest eds.), International Journal of Continuing Engineering Education and Lifelong Learning, Special Issue: Micro Worlds for Education and Continuous Learning, Vol. 9, No. 2,
1998 (in press).
[5] D. Twigger, M. Byard, S. Draper, R. Driver, R. Hartley, S. Hennessey, C. Mallen, R. Mohamed, C. O'Malley, T. O'Shea & E. Scanlon, The Conceptual Change in Science project, Journal of Computer Assisted Learning, 7, 1991, 144-155.
[6] J.R. Hartley, M. Byard & C. Mallen, Qualitative modelling and conceptual change in science students. In L. Birnbaum (ed.), The International Conference on the Learning Sciences: Proceedings of the 1991 Conference, Northwestern
University, Evanston, Illinois. Charlottesville, Virginia: Association for the Advancement of Computing in Education, 1991.
[7] R. Gunstone & M. Watts, Force and motion. In Driver, R., Guesne, E. & Tiberghien, A. (eds.), Children's Ideas in Science, Open University Press, 1985.
[8] J. Clement, Students' preconceptions in introductory mechanics, American Journal of Physics, 50 (1), 1982, 66-71.
[9] J.R. Hartley, Qualitative reasoning and conceptual change: computer based support in understanding science. In Winkels, R. & Bredeweg, B. (Guest eds.), Interactive Learning Environments, Special Issue: The Use of Qualitative Reasoning Techniques in Interactive Learning Environments, 1996 (in press).
[10] L. Vygotsky, Mind in Society, Cambridge, MA: Harvard University Press, 1985.
[11] R.M. Pilkington & C. Parker-Jones, Interacting with computer-based simulation, Computers & Education, 27 (1), 1996, 1-14.
[12] M.T.H. Chi, M. Bassok, M.W. Lewis, P. Reimann & R. Glaser, Self-explanations: how students study and use examples in learning to solve problems, Cognitive Science, 13, 1989, 145-182.
[13] R.M. Pilkington, J.R. Hartley, D. Hintze & D. Moore, Learning to argue and arguing to learn: an interface for computer-based dialogue games, Journal of Artificial Intelligence in Education, 3 (3), 1992, 275-285.
[14] R. Dieng, O. Corby & S. Lapalut, Acquisition and exploitation of gradual knowledge, International Journal of Human-Computer Studies, 42, 1995, 465-499.
[15] W.R. Van Joolingen, QMaPS: qualitative reasoning for simulation learning environments, Journal of Artificial Intelligence in Education, 6 (1), 1995, 67-89.
[16] R.M. Pilkington, Intelligent Help: Communicating with Knowledge Based Systems, London: Paul Chapman, 1992.
[17] B.A. Myers, D. Guise, R.B. Dannenberg, B. Vander Zanden, D. Kosbie, P. Marchal, E. Pervin & A. Mickish, The Garnet Reference Manuals (2.0), Technical Report CMU-CS-90-117-R2, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1992.
Evaluating Adaptive Systems
Artificial Intelligence in Education S.P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
A methodology for developing affective skills with model based training
Tariq M. Khan, *Keith Brown, *Roy Leitch
School of Informatics and Multimedia Technology, University of North London, London N7 8DB.
[email protected]
*Department of Computing and Electrical Engineering, Heriot-Watt University, Edinburgh EH14 4AS. {keb;leitch}@cee.hw.ac.uk

Abstract: An analysis is presented of how model based training can be used to develop metacognitive skills of modelling. The proposal involves using model based training systems as cognitive tools for learners to develop modelling skills in engineering and non-engineering domains. To achieve this, a rich description of intellectual knowledge is needed which will facilitate well reasoned and productive discussion. The learner and an instructor each hold an initial model of a system, called a viewpoint, which is represented by a distinct model. By negotiating on certain characteristics of both models, the learner and the instructor can engage in instructional discussion. The learning experience is supported by a computer tutor that acts as a shared workspace to aid representation and communication of viewpoints.
1. Introduction
Social interaction in the workplace demands adeptness in communication skills, such as negotiation and persuasion, as well as technical knowledge. These skills are emergent properties of interaction between two or more agents, and are demonstrated while actually discussing or being engaged in similar communicative acts. The key point is that communication knowledge is difficult to model, because many of the characteristics of good communication are tacit and unavailable for inspection or modelling. This goes against one of the most enduring assumptions underlying the design of Intelligent Computer Aided Learning (ICAL) systems: that domain knowledge can be represented in symbolic models and people can develop skills by learning these descriptions of knowledge [1]. To support affective skills, support must be provided so that skills are developed through practice, and to provide this support, technology-based methods such as model-based training, which were designed to teach intellectual skills, must be applied to teaching affective skills. Although it is difficult to model affective knowledge, a well chosen set of domain models describing intellectual knowledge can be used to create a learning environment that supports learning of affective elements of behaviour. The learning environment must be used as a cognitive tool in which the descriptions of intellectual knowledge are manipulated to facilitate access to affective knowledge. Here we present a methodology that promises to support the development of affective skills. Although the work described is mainly conceptual, the approach taken is amenable to practical application and is currently undergoing implementation.

2. Affective Skills
The affective domain [2] is an area that is poorly suited to being modelled explicitly in executable models. Much of the difficulty arises from the ambiguous understanding most people have of concepts such as creativity; consequently, the affective aspects of knowledge cannot be adequately abstracted from their context of use [3]. Careful use of models, however, can encourage development of affective skills. This process requires two
elements: viewpoints, which are descriptions of the situation that enable communication of mental models in a shared workspace; and roles, which are particular manners in which the communication takes place.

3. Viewpoints
Reflective cognition [4][5] involves identifying and articulating the various constituents of a viewpoint, and evaluating them in the context of the situation. This is more demanding than experiential cognition, which is automatic and symptomatic of expert behaviour. Different amounts of reflection can be supported through the use of different kinds of dimensions. A change in critical dimensions, which are the defining characteristics of a viewpoint, represents a switch of viewpoint as a consequence of relatively deep reflection, whereas a change in accidental dimensions, which are non-essential characteristics of a viewpoint, denotes only a shallower form of reflection in which a switch of method takes place within the same viewpoint. Any dimension can be either critical or accidental, depending on the nature of the problem. Students can adopt different values of accidental dimensions based on preference, and still have consistent viewpoints if the critical dimensions coincide with those of other students and instructors. If the critical dimensions do not match, then the viewpoints are said to be inconsistent. Different viewpoints of a situation can be described with different models, which differ in fundamental dimensions. These facilitate communication of viewpoints between students and instructors so that particular aspects of viewpoints can be discussed.

4. Roles
A role is the manner in which communication takes place. Figure 1 shows three examples of affective roles: conciliating, resolute and generative. The important point is that a role requires two participants: a learner and a more experienced master or instructor. Interaction between these two agents allows their different roles to be demonstrated. It is only because a learner, for example, continually generates new hypotheses that an instructor can reject them and remain resolute. Similarly, the instructor's resoluteness is necessary for the learner to be generative (produce more hypotheses). Conciliation is essential for the novice faced with new models and knowledge that seem inconsistent with the old. Conciliation involves identifying the critical dimensions for the situation and ensuring that the synthesised viewpoint maintains their values. This ensures that the synthesised viewpoint is functionally consistent with the two subject viewpoints. Figure 1 illustrates that the learner can learn in different ways by working in different interactive modes with a professional (mentor). The relevance for AI-ED research is that the instructor can be computer generated. In our work the instructor is provided by a model based training system that is capable of retrieving models and explaining them. The architecture is described in [6][7], so here the focus is on the instructional strategy, which creates productive learning scenarios for the learner through dialectic relationships.
Figure 1: Dialectic relationships between affective roles
There are many affective skills that can have a dialectic relationship with other affective skills, which allows learning strategies to be created where learner and instructor adopt different roles and viewpoints. For instance: (1) the instructor can adopt a conciliatory role
and the learner a generative one; (2) the instructor resolute and the learner conciliatory; and (3) the instructor generative and the learner resolute. A dialectic is created when each participant in an interaction has a different viewpoint and role, which produces two conflicting viewpoints of the world that must be synthesised. This can be arranged by associating different models of a situation with each participant, and enabling communication in terms of the models. In the next section we examine how a multiple model architecture can facilitate interaction between a learner and an instructor by providing the vocabulary for expressing each participant's viewpoint.

5. A vocabulary for describing models
A model can be visualised as a geometrical shape with a number of axes or dimensions. It is easiest to think of a cubical form with three dimensions (Figure 2), although there will be more than three dimensions in reality. Each dimension represents a significant decision point that helps to locate the model in the model space, which is a multi-dimensional space containing all possible models of a system or problem. In selecting a model, decisions will have been made about all the relevant dimensions that locate it. Put another way, by choosing values for every dimension (e.g. high, low, medium for resolution, or component, subsystem, system for scope) one can navigate to a specific model in the model space. Neighbouring models will differ slightly, possibly by the value of only one dimension; more distantly related models will differ more. Therefore, the relative co-occurrence of values for dimensions is an indication of the relative similarity of models.

Figure 2: Multiple Model Cube (axes: Generality, Resolution, Scope); each cube comprises a smaller cubical structure
A comprehensive set of properties, called modelling dimensions [8][9], allows a rich representation of a situation and supports many different kinds of knowledge. Each dimension is a fundamental characteristic associated with perception of a physical system. Here only a few dimensions are discussed, those relevant to the examples given later: scope, ontology, generality, and resolution. Decisions about ontology strike at the heart of how a situation is perceived. One student could choose to analyse the chemical and physical processes in a system, whereas another could concentrate on the electrical, mechanical or financial processes. A third student could choose to ignore processes and instead examine the individual components in a system. Choices about ontology have repercussions elsewhere. The value for scope is strongly dependent on ontology, since the bounds of the system to be included in a viewpoint are determined by the choice of process or the number of components identified as relevant. Resolution dictates how much detail a viewpoint has. Several small components, such as water tanks and valves, can be kept separate in a high resolution viewpoint, whereas a low resolution viewpoint would combine them all in a single
unit. Finally, generality affects the efficiency of a viewpoint. A high generality viewpoint is relatively abstract and permits reasoning over a large number of situations, but requires additional reasoning to establish a connection between the abstract representation and the world. In contrast, a low generality viewpoint is limited to a few problems but has a direct mapping to the world. Different positions along each dimension's quantity space produce different viewpoints. These can be communicated and discussed in a shared workspace to reflect on the choices made and the consistencies, or otherwise, between two or more viewpoints on a common situation.

6. Representation of Models
A model is created by specifying values for a set of modelling dimensions. A set of models, each with different values for the dimensions, produces a multiple model architecture, in which navigation through the space of models represents a meta-level strategy for switching viewpoints. Every model is represented in one of three ways: equations for formal knowledge, and production rules and procedures for operational knowledge [10].

7. Summary of Terms
A viewpoint describes the elements of technical domain knowledge. Often these are stated in terms of descriptions of knowledge in various representations, e.g. procedures, rules and equations. A role involves demonstrating affective skills. The combination of skills (roles) and representations (viewpoints) produces a distinct identity for the actor. Viewpoints provide the vocabulary for communication, whereas roles govern the manner of the communication. Learning can be encouraged by designing interaction between students and instructors to produce a dialectic that promotes learning. Thus, for example, an engineer contemplating how two depictions of an industrial boiler plant can be combined to resolve an uncertain system state might adopt a conciliatory role for problem solving, with boiler plant viewpoints. When confronted with a different role and/or viewpoint there is potential for learning to take place. This process can be supported with ICAL by employing descriptive models of the technical domain knowledge (rules, equations, procedures) to serve as the raw material with which communication works. A space of multiple domain models enables several different viewpoints to be held, and switching models is analogous to the metacognitive act of switching viewpoints. Metacognitive reasoning involves thinking about the problem solving process itself, e.g. why one approach to problem solving is preferred over another. It demands self-explanation [11], flexibility and reflection [5], so the modelling dimensions are the substance of reflection.
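The vocabulary of sections 3, 5 and 6 can be pulled together in a small sketch; it is an assumption-laden reconstruction rather than a description of any implemented system. Viewpoints are value assignments over modelling dimensions, similarity is co-occurrence of values, a change in a critical dimension constitutes a viewpoint switch while a change in an accidental one is only a method switch, and each complete assignment indexes one model in the model space.

    DIMENSIONS = ("ontology", "scope", "generality", "resolution")

    def similarity(view_a, view_b):
        # Neighbouring models differ by the value of only one dimension.
        return sum(view_a[d] == view_b[d] for d in DIMENSIONS)

    def switch_kind(old_view, new_view, critical):
        changed = {d for d in DIMENSIONS if old_view[d] != new_view[d]}
        # Deep reflection changes critical dimensions (a viewpoint switch);
        # shallow reflection changes only accidental ones (a method switch).
        return "viewpoint switch" if changed & critical else "method switch"

    # Each complete assignment of dimension values indexes one model,
    # represented as equations, production rules or procedures.
    MODEL_SPACE = {
        ("process", "system", "principles", "medium"): "equations",
        ("component", "system", "associations", "low"): "production rules",
    }

    def retrieve_model(view):
        key = tuple(view[d] for d in DIMENSIONS)
        return MODEL_SPACE.get(key)  # None if no model exists at that point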
Figure 3: Structural representation of Human Heart
Elsewhere we have given an example of the use of ICAL in supporting communication about an industrial process plant [9]. Here we provide an example concerning models of the human heart. The purpose of the interaction between the instructor and the learner is that, by the instructor offering alternative viewpoints, the learner is encouraged to conciliate these with his or her own viewpoint. In synthesising the viewpoints, the learner develops conciliation skills as well as technical knowledge. Resolution skills are developed by justifying why an initial viewpoint should be retained, and generative skills are developed by producing viewpoints consistent with an instructor's.

8. Example of Learning Affective Skills
An example is presented here that demonstrates how a multiple models architecture can be used to provide practice in affective skills. The learning strategy employed is to encourage conflict between viewpoints and roles, which must then be reconciled. The aim is to teach affective skills rather than technical knowledge; therefore, the domain is incidental to the learning objectives. The domain chosen for the examples is the human heart. Affective skills are developed by using descriptive models of the heart, or of any other domain, as tools for exercising conciliation, resolution and generation. The example focuses on conciliation.

8.1 The Structure of the Heart
The circulatory system allows blood to carry oxygen around the body. It includes the heart, lungs, arteries and veins. In the heart there are two upper chambers (right and left atria) and two lower chambers (right and left ventricles). There are four valves that allow blood to pass in one direction through the chambers: the tricuspid, pulmonary, mitral and aortic valves. The heart pumps blood to the lungs to be oxygenated and then to the rest of the body. Through contractions the heart sustains pressure. Figure 3 shows how this arrangement can be described as a process system using the metaphor of an industrial process. The main constituents are tanks, valves and pumps. The atria are represented as tanks, and the ventricles are represented as combinations of tanks and pumps (P1, P2, P3 & P4). Valves are, naturally enough, shown as valves, and arrows indicate blood flow. Blue blood indicates oxygen-depleted blood, whereas red blood is oxygenated blood. This is a very simplified representation of the heart, which is sufficient to demonstrate the idea.

8.2 Selecting Viewpoints
Both the learner and instructor can develop viewpoints of the heart that are particular to them. The characteristics of each viewpoint depend on the purpose for producing viewpoints and on individual preference. By using the vocabulary of modelling dimensions, each viewpoint can be described by a different model. If the task is to explain the flow of blood from the right ventricle into the left ventricle, there are several options for modelling. One approach, say that of the instructor, is to take a process view and examine the components of the heart involved in circulation. Another view, say that of the learner, could take a component approach and focus on the components of the heart located adjacent to the left and right ventricles. The instructor's view could contain all four chambers, pumps and valves, whereas the learner's could be confined to the chambers and valves only (both ignoring the lungs). Another choice is about the type of knowledge used to describe the situation.
Say the instructor decides to use basic principles of fluid dynamics, and the learner is content with a heuristic that associates blood flow out of the right ventricle with blood flow into the left ventricle. The learner also chooses a low resolution viewpoint in which the valves are not included as separate components and the upper and lower chambers are combined. Therefore only two aggregate chambers (left and right) are considered, connected via a simple conduit such as a blood vessel (Figure 4(b)). A medium resolution could be employed by the instructor, in which the 'pump' characteristics of the chambers are not considered (Figure 4(a)), but the upper and lower chambers are kept distinct. Both viewpoints are simplified perceptions of the original situation of Figure 3, and both can be modelled according to their dimensions: instructor {Ontology is process, Scope is entire heart, Generality is principles, and
Resolution is medium} and learner {Ontology is component, Scope is entire heart, Generality is associations, and Resolution is low}. A model may simply be a rule that references appropriate parts of the heart and contains relationships between them; similarly, it may be an equation or procedure. However, to aid visualisation, simulations should be used that reflect the viewpoints of each participant (as in Figure 4). Note that the Generality dimension dictates the kind of knowledge available for reasoning, and the other dimensions constrain what factors must be considered in selecting a model of that knowledge kind. In the case of the instructor, a quantitative simulation should be employed, whereas for the learner a rule-based (causal) model could be employed to simulate the blood flow.

8.3 Selecting Roles
So far, choices about viewpoint (representation) have been made, which allow the training system to retrieve appropriate models (provided they exist). The next part of the training is for each participant to adopt a suitable role. Figure 1 can be used to select beneficial combinations of roles. In the case of the instructor, the training system will act out the role, and will continually generate new consistent viewpoints (by switching to other models), or attempt to justify (through explanations) why the instructor's initial model is most appropriate. The learner could adopt a conciliatory role and attempt to produce a third viewpoint that borrows from both conflicting viewpoints.
Figure 4: Viewpoints developed by instructor (a) and learner (b)
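A small sketch of the two example viewpoints, and of how the Generality value could dictate the kind of simulation each participant is given; the mapping function is our assumption and covers only the two cases named in the text.

    instructor_view = {"ontology": "process", "scope": "entire heart",
                       "generality": "principles", "resolution": "medium"}
    learner_view = {"ontology": "component", "scope": "entire heart",
                    "generality": "associations", "resolution": "low"}

    def simulation_kind(view):
        # Generality dictates the kind of knowledge available for reasoning.
        return {"principles": "quantitative simulation",
                "associations": "rule-based causal model"}[view["generality"]]

    print(simulation_kind(instructor_view))  # quantitative simulation
    print(simulation_kind(learner_view))     # rule-based causal model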
8.4 Identifying Critical Dimensions
To recap, a critical dimension is one that is essential to producing a useful and appropriate viewpoint, whereas accidental dimensions can be substituted according to preference. For two viewpoints to be consistent, they must share the same values in the quantity space for the same critical dimensions, but their values can differ in accidental dimensions. The task, 'explain the flow of blood from the right ventricle into the left ventricle', suggests that the boundary of the viewpoint must be set to include the left ventricle. Therefore Scope is a critical dimension, and both viewpoints must accord with each other on it to be consistent. There is no requirement, however, for agreement on the other dimensions; so although the interest is with 'flows', a process view is not essential but may be preferred.

8.5 Conciliating Viewpoints
The purpose of constructing models is to provide material in a shared workspace (the training environment) with which to practice different affective skills. Discussing, negotiating and adapting viewpoints in terms of values in the quantity space of different modelling dimensions provides valuable opportunities to experience activities that are commonplace in the work setting but normally difficult to support with machine-based training. The key to successful conciliation is to recognise critical dimensions and ensure that they are not subject
to negotiation. Discussion should be focused on the accidental dimensions, which in the case of the example are ontology, generality and resolution. The objective should be to attain a balance between the instructor's and the learner's viewpoints and avoid one-sided negotiations. Figure 5 shows one possible conciliated viewpoint.

Figure 5: Conciliated Viewpoint (blue blood on the right side, red blood on the left side)
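Sections 8.4 and 8.5 can be read as a small procedure, sketched below under the assumption of dictionary viewpoints: consistency requires agreement on the critical dimensions, and conciliation keeps those values fixed while negotiating the accidental ones. The particular merging choices mirror this worked example rather than any general algorithm.

    CRITICAL = {"scope"}  # for the blood-flow task, only Scope is critical

    def consistent(a, b):
        return all(a[d] == b[d] for d in CRITICAL)

    def conciliate(instructor, learner):
        assert consistent(instructor, learner), "critical dimensions must agree"
        return {
            "scope": instructor["scope"],         # critical: not negotiable
            "ontology": instructor["ontology"],   # process view keeps the valves
            "generality": learner["generality"],  # rules, to keep the balance
            "resolution": "low-medium",           # mid-point of low and medium
        }

    instructor_view = {"ontology": "process", "scope": "entire heart",
                       "generality": "principles", "resolution": "medium"}
    learner_view = {"ontology": "component", "scope": "entire heart",
                    "generality": "associations", "resolution": "low"}
    print(conciliate(instructor_view, learner_view))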
The conciliated viewpoint takes account of both initial viewpoints by maintaining the Scope value at the entire heart, and conciliating on the Resolution level by treating each pair of upper and lower chambers as one unit (like the learner's initial viewpoint) but including both the pulmonary and mitral valves (like the instructor's). The result is a Resolution of low-medium. Conciliation is also reflected in the fact that a process Ontology has been adopted (from the instructor), since a component view would have ignored the valves. Finally, to maintain a balance, the learner's Generality value can be adopted, i.e. rules instead of principles, which affords a more qualitative approach to reasoning. Learning occurs when the learner communicates his or her viewpoint, and the instructor generates a consistent viewpoint that the learner must synthesise with his or her original. Synthesis, in this case, involves adopting generality from the learner's viewpoint, ontology from the instructor's, maintaining each one's scope, and finding a mid-point for resolution. The synthesis has elements from both learner and instructor while maintaining consistency with them both.

9. Related Work
A review of the application of multiple viewpoints in ICAL systems can be found in [12]. More recently, the VIPER system helps students develop an understanding of the applicability of different viewpoints in solving diagnosis problems [13]. VIPER relies on predefined models of Prolog debugging and associates these with a bug catalogue, from which the student selects one as the basis of further investigation. The modelling dimensions approach instead relies on the learner and system generating personal viewpoints for the situation. Rather than matching specific strategies and procedures with specific errors, as in VIPER's viewpoints, our approach allows expression of fundamental beliefs and values through situation-specific viewpoints. Other similar work is presented in [14], which describes Brahms, a system that uses simulations to target the interactive elements of work practices rather than the technical descriptions of procedural work flows.

10. Conclusion
We have presented a methodology based on modelling dimensions for supporting discussion and negotiation of viewpoints. Changing viewpoints is valuable practice in avoiding rigidity and seeking alternatives. By practising this on descriptions of technical knowledge, the learner should learn that changes in viewpoints can be beneficial for complex problem solving. The purpose of the methodology is to help develop these affective skills by using the knowledge representations as cognitive tools that can be manipulated in a shared workspace. A set of multiple domain models should be available to the training system, and our current work is developing this facility; the explanation function needed to enable communication between the training system and the learner is described in [6][7]. It operates by trying to
172
T. M Khan et al.. / Developing Affective Skills
identify the knowledge that the user needs in order to understand a particular model. Multiple models are particularly useful for presenting different viewpoints during learning, and this point was illustrated through an example that shows how conciliation skill can be demonstrated by negotiating on modelling dimensions. Training systems are being developed for four industrial applications in the process industries (energy generation and paper production [15][16]). These systems will be used along with existing simulations for training in a range of different skills: from helping experienced operators learn to cope with uncertain events to developing model-switching strategies to handle dynamic and complex tasks.

References
[1] Anderson, J.R., Reder, L.M. & Simon, H.A. (1996). Situated learning and education. Educational Researcher, 25(4), 5-11.
[2] Krathwohl, D.R., Bloom, B.S. & Masia, B.B. (1964). Taxonomy of educational objectives: the classification of educational goals: Handbook 2 - Affective domain. London: Longman Group.
[3] Boden, M. (1977). Artificial intelligence and natural man. London: Harvester.
[4] Norman, D. (1993). Things that make us smart. Addison-Wesley.
[5] Schon, D.A. (1983/1991). The Reflective Practitioner. Maurice Temple Smith Ltd.
[6] Khan, T.M., Brown, K. & Leitch, R.R. (1997). Didactic and informational explanation in simulations with multiple models. In Proceedings of the 8th World Conference on AIED, Kobe, Japan, 1997, 355-362.
[7] Khan, T.M., Paul, S.J., Brown, K.E. & Leitch, R.R. (1998). Model based explanations in simulation based training. In Proceedings of the 4th International Conference on Intelligent Tutoring Systems, ITS'98, San Antonio, Texas.
[8] Leitch, R.R. et al. (1995). Modelling choices in intelligent systems. AISB Quarterly, 93, Autumn, 54-60.
[9] Khan, T.M., Mitchell, J.E.M., Brown, K.E. & Leitch, R.R. (1998). Situated learning using descriptive models. International Journal of Human Computer Studies (special issue on Situated Cognition).
[10] Rasmussen, J., Pejtersen, A.M. & Goodstein, L.P. (1994). Cognitive Systems Engineering. John Wiley.
[11] Chi, M.T.H., Bassok, M., Lewis, M.W., Reimann, P. & Glaser, R. (1989). Self-explanations: how students study and use examples in learning to solve problems. Cognitive Science, 13, 145-182.
[12] Wenger, E. (1987). Artificial Intelligence and Tutoring Systems. Los Altos, California: Morgan Kaufmann.
[13] Moyse, R. (1992). A structure and design method for multiple viewpoints. Journal of Artificial Intelligence in Education, 3, 207-233.
[14] Clancey, W.J., Sachs, P., Sierhuis, M. & van Hoof, R. (1998). Brahms: simulating practice for work systems design. International Journal of Human Computer Studies, 49, 831-866.
[15] Caimi, M. & Lanza, C. (1997). Intelligent Simulation Systems: the EXTRAS approach. Proceedings of the Sixth European Conference on Cognitive Science Approaches to Process Control, CSAPC'97, 103-106. CNR.
[16] Khan, T.M., Laine, S., Brown, K.E., Leitch, R.R. & Ritala, R. (1999). Multiple model-based explanations for industrial decision support. Int. Conf. on Computational Intelligence for Modelling, Control and Automation, CIMCA'99.

Acknowledgements
The work described in this paper has been undertaken in Brite/Euram III Project BE-1245 (EXTRAS: Explanation in On-line Help and Off-line Training Based on Continuous Simulation) by the following partners: Alenia Marconi Systems (formerly GEC Marconi RDS), CISE, ENEL, Heriot-Watt University, Iberdrola, KCL, LABEIN, TECNATOM, United Paper Mills. The authors wish to acknowledge the contribution of all members of the project team to the ideas presented in this paper, whilst taking full responsibility for the way they are expressed.
Artificial Intelligence in Education S.P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
User Controlled Adaptivity versus System Controlled Adaptivity in Intelligent Tutoring Systems
Michel Crampes
Laboratoire de Genie Informatique et d'Ingenierie de Production (LGI2P), EERIE-EMA, Parc Scientifique Georges Besse, 30 000 NIMES (FRANCE)
Tel: (33) 4 66 38 7000. E-mail: [email protected]

Abstract: From a purely educational point of view, adaptivity seems to be an important quality in any tutoring system. Much research has been done in order to develop adaptive computer-based learning environments that are able to adapt their behaviour, i.e. their presentations and their curricula, to the learners' needs and knowledge. We call this type of adaptivity System Controlled Adaptivity (SCA), since it comes under the responsibility of the intelligence of the computer. This paper describes our experience in developing this sort of environment for particular tutorial projects, the difficulties that we encountered in the pedagogical field, and the switch from SCA to User Controlled Adaptivity (UCA) made in order to overcome these difficulties. We present some simple mechanisms that invite learners to be responsible for the adaptive educational process. The advantages of the UCA approach are presented and its limits discussed. The paper concludes with the investigations we are conducting in order to have both approaches benefit from each other in WWW educative applications.
Keywords: Adaptivity, Intelligent Tutoring Systems, Hypermedia, System Controlled Adaptivity, User Controlled Adaptivity.
1. Introduction "An adaptive computer-based learning environment attempts to 'know something' of the learner, and adjust or advise instructional sequences accordingly" [1]. Since the late eighties, our team at Ecole des Mines d'Ales, France, has designed, or participated in the design of educational systems that embed 'intelligence' to produce curricula and interactions adapted to the learners. We call these Intelligent Tutoring Systems (ITS) System Controlled Adaptive (SCA) systems. In the present paper, we consider the other possibility of adaptivity where the learners are responsible for defining their own curricula, switching intelligence from the system to the users. They are User Controlled Adaptive (UCA) systems. If the target of adaptivity is to create a better service for the users, SCA systems are not necessarily the unique solution. Our experience began with the design of SCA ITS systems. Then, after experimentation with learners, it appeared that for a better service, we had to put aside system-oriented intelligence and we started to conceive UCA systems that appeared to be more effective for education. We are now analysing the rewriting of some of our
productions for the World Wide Web, and our reflection leads us to consider that the best solution would lie in the cross-fertilization of the two approaches.
2. Related works
Adaptive hypermedia pedagogical environments are generally presented as adaptive navigational support techniques which take into account a conceptual model of the domain, the current learners' knowledge and the educational goals [2][3]. Numerous techniques for adaptive navigation have been proposed [4], usually based on simple or fuzzy user models. [5] uses a model of the teaching process (teach, test, summarise) where the next topic, the content detail and the teaching action are determined by the meta-strategy (MT). In ELM-ART [6][7], test items are used with the traces of previous learners' pathways to infer advice on what to do next. [8] shows how difficult it is to infer a user model based merely on information about the users' pathways when navigating through a network of educational content. All these techniques try to develop models of the users in order to provide them with advice and adapted curriculum sequencing. They consider that the learners must be helped when they are lost in hyperspace, as in [9]. Few authors apart from [10] consider navigation on the part of the user as a central educative task. We do not know of other works where the problem of navigation is exploited to develop UCA pedagogical strategies in which educational optimization is the cornerstone of the learners' motivations and learning process. With our approach, learning is seen both as an economical activity and a cognitive activity. The learners' performances can afterwards be used to develop a model of the users based not only on what they are supposed to know, but also on what they think they know before taking a test.
3. Our Experience
Between 1986 and 1992 we developed an Intelligent Tutoring System (ITS) named MethodMan under a European project, in order to train software engineers in software project management methodology [11]. Since then, MethodMan has been sold and used in different companies, engineering schools and universities, where it is estimated that about two thousand learners have been trained by means of this tutorial. We plan to rewrite it for the World Wide Web, improving its pedagogical capacities through more embedded intelligent adaptivity. MethodMan teaches a simplified methodology of software project management [12]. It explains the phases of a standard project, the different activities and controls in each phase, and the different tasks that are related to these activities and controls. A large hypertext glossary is also available. Compared to other topics, like geometry or a programming language, project management presents several specificities for teaching:
1) When teaching people in companies, different people already know different things about project management.
2) There is no linear progression in the content; all concepts are interlinked, and the objective is to make each aspect of the content contribute to the whole topic, i.e. project management.
3) The content is made up of numerous abstract concepts which need a lot of text to be explained. Merely reading all these texts is a tedious intellectual process when, in fact, the texts are targeted at developing action in project management.
4) Optimization is an important part of project management. It is through permanent optimization that tasks must be planned, decisions made and resources organized.
5) Common sense is part of the content. Teaching common sense often requires argumentation, which needs a lot of personalization. The difficulties, the motivation and the attention therefore vary a lot according to the learners.
Considering these characteristics of project management, it appears that any tutorial dealing with this topic needs some type of adaptivity, particularly to meet points 1 and 5.

3.1 The three design trials
First design. The first prototype was designed as a highly adaptive interactive system, in order to allow the learners to discover methodology through practice. The learners are given a list, in alphabetical order, of the phases, activities and controls that are the traditional backbone of a well-managed project. They have to choose, step by step, the order of these items in order to reconstruct the classical V-cycle of a project. Adaptive feedback is provided during this formal simulation. It uses two expert systems (ESs) that build up dialogues with the learners according to the difficulties encountered in the exercise. The Method ES (MES) is a planner which knows the formal prerequisites and results of each phase, activity or control; it is able to reconstruct the V-cycle thanks to this knowledge. The Pedagogue ES (PES) asks the MES for a local diagnosis of the learners' interactions and builds up feedback considering the type and number of errors they have made. To do so, it chooses a sentence adapted to the situation from a list of possible advice, and adds some methodological information from the MES, such as the missing methodological prerequisites of an activity proposed by the learner which should not yet be triggered. Learners who have experience of methodology in project management can go through the simulation and check or update their knowledge with the help of the two ESs. For novices, two types of pedagogical material are provided: a sophisticated hypertext glossary through which the learners can navigate freely, and a library of more than 60 textual pedagogical resources that give a deeper description of all concepts and techniques. In order to find the necessary information easily during a dialogue with the PES, the learners can ask for all the resources or only for user-adapted contextual resources. We modelled a simple user profile template (age, professional status, experience in project management, etc.) and classified the pedagogical resources into types according to the interest they present for the different types of learners. The typing includes the role of the resources in a possible argumentation (a definition, an example, an illustration of a concept, a justification, etc.). The resources were also characterized by their prerequisites, i.e. the other resources that are necessary to understand them. When, during the V-cycle, the PES gives feedback to the learners, the latter can interact through a menu in order to ask for a justification. The pedagogical engine, developed on the basis of the MES's engine, is in charge of proposing a pedagogical plan of resources. The idea is that the learners can trigger what can be considered as an argumentative link, and that MethodMan can construct a plan of pedagogical resources to provide the argumentation. When this prototype was tested with a group of learners, it appeared that the use of both the hypertext glossary and this argumentative facility was very limited. Moreover, the learners looked very passive in information gathering and did not take much initiative beyond the immediate interaction with the PES and the MES. This showed that the collaborative actions of the MES and PES were very pertinent, but we were very disappointed because we had expected the learners to go through most of the resources, and
maybe a lot of the glossary, in order to better achieve the formal simulation of the V-cycle. In fact, it was noticed that they preferred to work with the help of the MES and the PES and did not ask much from the library or the glossary.

Second design. In order to correct this pedagogical semi-failure, we invited the learners to explore the library freely after the formal simulation, considering that the close adapted interaction with the PES and the MES could be the main cause of the discarding of the pedagogical material. To provoke more initiative and autonomy on the part of the learners, the resources were reorganized as a hypertext network. This idea was based on point 2, which says that the contents are highly interlinked. A set of Multiple Choice Questionnaires (MCQs) was also provided for the learners to test their knowledge. In this case, there was no adaptivity from the system, but a traditional hypertext navigation through all the material with the possibility of testing the acquired knowledge through the MCQs. We quickly realized that the learners became bored (in accordance with point 4) and lost when navigating through the maze of concepts, and that they preferred to avoid the MCQs rather than go through them. There was a need for a mechanism to bring about interest and excitement. There was also an obvious lack of integration of point 4, i.e. optimization in project management, within the tutorial. We kept the idea of the hypertext for the glossary anyway. Something different had to be designed and added to the original system.

Third design. The final idea was to consider the tutorial itself as a project, structured in three stages. i) The first stage consists of the original simulation using the MES and the PES, with the possibility of using the resource library in a rough manner, without hyperlinks or any adaptive pedagogical engine. The adaptive interactions with the PES and the MES are sufficient to provide interest and pedagogical support. ii) The second stage consists in exploring the pedagogical resources under the learners' responsibility. The objective of the learners in the project is to build up a system of personal knowledge about the topic. In this perspective, consulting a resource is the equivalent of an activity in a project, and going through an MCQ is the equivalent of negotiating and evaluating the result of the activity with the client of the project. The MCQs must be representative of risk management in a project. The tendency to discard the resources observed in the previous solutions must spontaneously become a basis for optimization. iii) The last stage must be the equivalent of system evaluation in a project, with the risk of not satisfying the client if the expected value is not sufficient. In order to provoke motivation in the second stage and to firmly invite the learners to study the pedagogical resources more thoroughly, a subtle mechanism had to be proposed for the latter to be willing to play the game. They had to be responsible for their own pedagogical strategies. This is where we introduced several simple mechanisms that invite the learners to participate and that are based on User Controlled Adaptivity.

User controlled tactical adaptivity. During the second stage, the learners have to go through thirty Multiple Choice Questionnaires (MCQs), and they can prepare for these questionnaires by consulting the pedagogical resources as they wish.
At each MCQ, they can win a few units of a specific MethodMan currency called Elementary Knowledge Units (EKUs). When entering the second stage, they are warned that they will have to obtain enough EKUs in order to attend a validation test that will be proposed in the third stage. This last stage is built on the same basis as the first stage, but the PES transforms its advice into penalties: the learners lose a certain amount of EKUs every time they make a mistake during a general test of considerable size that includes the V-cycle reconstruction and a set of original MCQs at the end of each phase of the V-cycle. To succeed in the third
stage, the balance between the EKUs won during the second stage and the EKUs lost during the third stage must be positive. The students are free at any time during the second stage to leave it and risk taking the test in the third stage, if they believe they have enough knowledge. In order to sustain the overall motivation, a (virtual) positive reward is given at the end of the third stage if it is completed successfully. Thanks to this mechanism, the learners are free to navigate through the pedagogical resources and the MCQs. They must, however, be able to evaluate their capacity to pass the final test and decide upon their level of knowledge.

User controlled local adaptivity. A second level of user controlled adaptivity is provided at the level of each MCQ. There are different types of MCQs according to the constraint they implement. All MCQs can be tried as many times as desired, and all MCQs conclude with a list of the pedagogical resources that contain the information needed to pass them. Depending on the type of MCQ, the learners must adapt their knowledge acquisition methodology to optimise both the knowledge acquisition and the EKU gathering. Nothing is said about the different optimal methodologies; the learners have to discover them. The learners are, however, informed of the content and the type of the tests in their titles, with one exception that we shall see later.

'Simple tests' consist of five independent questions. The learners get 2 EKUs for each correct answer. Whatever their answers, MethodMan then reveals the correct answers, and they can redo the test hoping to gain the maximum amount of EKUs. The students understand very quickly that the optimal methodology is to go through all these simple tests first, without even consulting the pedagogical resources. With these tests, the learners can be lazy, the way they tend to be spontaneously. The objective of these tests is purely to enhance motivation through immediate positive feedback.

'Increasing winning blind tests' are used to make the students evaluate their own knowledge without being penalised. Five questions must be answered. If the learners give fewer than three correct answers, they get nothing. If they give at least three correct answers, they win 5 EKUs. If they give five correct answers, they get 9 EKUs. Nothing is said about which answers are correct and which are wrong. The best strategy is to go through these tests, try to guess which answers are wrong if some are, then go and study the corresponding pedagogical material, and try again until all the answers are correct.

'Decreasing winning blind tests' call for the opposite strategy. A single reward is given if four questions out of five are answered correctly. Again, at the end of the test, MethodMan gives the number of correct answers but does not say which answers are correct and which are wrong. If the learners pass the test on the first attempt, the reward is 10 EKUs; on the second attempt, 6 EKUs; on the third, 4 EKUs; for any further attempt, the reward for success is 3 EKUs. The best strategy for the trainees is to attempt these tests at the very last moment, after having studied the resources that echo the titles of these tests, or those that have been recommended by MethodMan after a first attempt. Attempting such a test means taking a risk on the evaluation of one's acquired knowledge.

'Unexpected tests' are the only tests that are not announced.
Their titles look like those of normal pedagogical resources. When learners encounter such a test, they are informed that it is a 'Decreasing winning blind test'. If they want to carry on, they can expect to win the maximum amount of EKUs as usual. If they think they are not ready, they can postpone the attempt, but the maximum amount of EKUs for the first success will then only be 8 EKUs. The learners face a double dilemma with three possible expected winnings: 10 EKUs in the case of success at the first trial without postponement, 8 EKUs in the case of success at the first trial with postponement, and 6 EKUs or less in the case of failure at the first trial.
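Taken together, these winning rules are simple enough to express as a single scoring function. The following Python sketch is only our reading of the rules as stated above; the function name, signature and representation are ours, not MethodMan's:

def eku_reward(test_type, n_correct, attempt=1, postponed=False):
    """Sketch of the MethodMan winning rules as described in the text.

    All names are illustrative. Each MCQ has five questions; `attempt`
    counts the tries at a blind test, and `postponed` marks an unexpected
    test whose first attempt was deferred.
    """
    if test_type == "simple":
        # 2 EKUs per correct answer; the correct answers are then shown,
        # so redoing the test can bring the total up to 10 EKUs.
        return 2 * n_correct
    if test_type == "increasing_blind":
        # Nothing below three correct answers, 5 EKUs for three or four,
        # 9 EKUs for a perfect score.
        if n_correct == 5:
            return 9
        return 5 if n_correct >= 3 else 0
    if test_type in ("decreasing_blind", "unexpected"):
        # Success requires four correct answers out of five; the reward
        # then shrinks with each attempt: 10, 6, 4, then 3 EKUs.
        if n_correct < 4:
            return 0
        reward = {1: 10, 2: 6, 3: 4}.get(attempt, 3)
        if test_type == "unexpected" and postponed and attempt == 1:
            reward = min(reward, 8)   # postponement caps the first success
        return reward
    raise ValueError("unknown test type: " + test_type)

The third-stage pass condition then reduces to requiring that the sum of these rewards over the second stage exceeds the penalties accumulated during the final test.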
This environment presents the learners with a set of assets that they must manage in order to reach a knowledge objective. The learners must optimise their learning capabilities and develop a meta-knowledge with two aspects. One aspect is meta-knowledge about their knowledge of the domain, which corresponds to the learner model in an ITS. The other concerns their own learning strategies, which corresponds to intelligent planning in an ITS. In fact, they must develop the sort of intelligence designers are trying to formalise in ITSs, where the tutorial intelligence is supposed to lie on the side of the system, which decides what to do next according to the (imagined) user's needs. With a UCA approach, the tutorial intelligence is left to the user, who plays the role the ITS is supposed to play. The result of such an approach is that the learner must develop meta-knowledge and, implicitly, she unveils her meta-knowledge performance through her interactions and requests. This information can be used to develop a possible user model (which has not been done in the system presented here). The user model is then not based on the knowledge the user develops about the domain, as in an overlay model, but is a model of the user's capacity to develop her knowledge. In some ways, it is not far from the "disturbing strategy" proposed by [13], with the difference that, in our model, the disturbance does not come from a co-learning agent but from the user herself.
3.2. Evaluation and learners' behaviour
We tested this last version of MethodMan with different groups of learners. During the first year of its use, the learners were asked to fill in an evaluation form. Although they had spent between ten and fourteen hours studying texts and going through stressful tests, the satisfaction level was very high, and few learners fail the third stage. The time spent on the tutorial depends heavily on the user's profile. Experienced project managers spend only four to five hours on it: they try the most difficult tests, check that they have the overall knowledge, and go to the final evaluation with few EKUs. Novices can spend hours gathering EKUs to limit the risk of failure. We observed that learners understand very quickly that they must adapt their knowledge acquisition methodology to the type of test. Knowledge acquisition is turned into a game and motivation is high. The learners appreciate the navigational freedom and the possibility of taking risks. There were no complaints about possible cognitive overload due to the need to keep track of the rules of the game.
3.3. Interest of User Controlled Adaptivity
The UCA pedagogical strategy answers our research questions in two ways. It forces the learners to work deeply through numerous textual pedagogical resources that they do not explore spontaneously, and it creates excitement and motivation around a basic project management activity, i.e. resource optimisation. The learners have to adapt their curricula to their task in order to optimise their knowledge acquisition. Another interest of this approach is that the learners must work upon their own knowledge in order to manage their learning; this characteristic strongly reinforces the learning process. The result is also obtained with simple and cheap mechanisms, i.e. the winning rules, which require little development: the intelligence is on the side of the learners, not the system. To adapt it to other domains, such a strategy can easily be extended with other sorts of simple winning rules, or with other sorts of constraints, such as time constraints.
3.4. A second example of UCA with time constraints
Between 1992 and 1995, we developed several multimedia tutorials that explored other UCA mechanisms. In one of them, called CD-Quality, the winning strategy is combined with a time-constraint strategy, which puts a double knowledge-management pressure on the learners [14]. CD-Quality teaches quality management in small companies and is used in engineering schools and universities. We created a simulation of a small company in which several (virtual) processes are active (the production process, the commercial process, etc.). The company is supposed to be involved in a certain market, and the aim is to be amongst the best in this market. Without quality management the company loses market share, whilst it wins market share if quality is optimised. The trainees are in charge of this quality management. Pedagogical resources are available as a set of small multimedia lessons. CD-Quality differs from MethodMan in the way the MCQs are accessed and the answers are rewarded. The questions are asked by characters of the company (a worker, the manager of the company, an engineer, etc.) who intervene unexpectedly under different visual and tactical scenarios. The results of the company evolve according to the learners' answers. In order to give good answers, the learners must learn about quality through the pedagogical resources. Since they do not control the characters' interventions, they must access the resources as soon as they can, particularly when nothing happens. Alternatively, they must discard a character's intervention if they think that answering the questions would be a waste of time because they may not know the answers. Without answers, or with wrong answers, the company's quality decays after a certain time. The whole process requires the learners to constantly adapt their pedagogical strategy to the different situations. In CD-Quality, the time constraint puts pressure on the learners to decide which resources should be studied first and to study them with the utmost attention, because no time must be lost. These user controlled adaptive pedagogical tasks require that the system not help the learners; otherwise the game would lose its interest. Consequently, the question is whether User Controlled Adaptivity excludes System Controlled Adaptivity or not.
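This double pressure lends itself to a compact simulation sketch. The following Python fragment is only our illustration of the loop's shape; every name and constant in it is invented, not taken from CD-Quality:

import random

def simulate_cd_quality(topics, ticks=200, seed=0):
    """Toy sketch of a CD-Quality-like pressure loop (all names and
    constants are ours).  The learner studies one topic per quiet tick;
    characters interrupt at random with a question on a random topic;
    the learner answers only when the topic has been studied, and
    discards the intervention otherwise."""
    rng = random.Random(seed)
    studied, to_study = set(), list(topics)
    quality = 50.0
    for _ in range(ticks):
        quality -= 0.2                    # quality decays as time passes
        if rng.random() < 0.1:            # an uncontrolled intervention
            if rng.choice(topics) in studied:
                quality += 4.0            # a good answer wins market share
            # an unknown topic is discarded: no gain, and the decay goes on
        elif to_study:
            studied.add(to_study.pop(0))  # study "when nothing happens"
    return quality

print(simulate_cd_quality(["audits", "process control", "customer feedback"]))

The point of the sketch is that studying and answering compete for the same clock, which is exactly what forces the learners to keep adapting their pedagogical strategy.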
4. Conclusion: User Controlled Adaptivity versus System Controlled Adaptivity

We think that User Controlled Adaptivity has not only qualities but also several limits, of which we mention only a few. It supposes that the learners are capable of developing meta-knowledge to manage their own knowledge. They must also have a certain vision of their universe, because decision-making relies on information. They must not get discouraged when their attempts to control their universe through their knowledge acquisition do not bring success. Finally, the learners must be ready to play a demanding pedagogical game in which no passivity is allowed. The answer to these limits could be found in two ways. On the one hand, the system could adapt the challenge to the difficulties encountered by the learners. For instance, in CD-Quality, the characters' interventions could be slowed down to let the learners consult the resources and get ready for the questions. On the other hand, since the system knows the sort of questions that are going to be asked and the sorts of resources necessary to answer them well, the adaptive capacity of the system could be used to help the learners, in the same way as a new employee is advised by other employees. But in this case, using system adaptivity as an aid does not help the learners develop their meta-knowledge. If the learners can ask for help at any time, their involvement may become limited. The other possibility is to leave the decision regarding this help to the
system. But the drawback here is that the learners are passive: they tend to let the system intervene, which relieves them of the knowledge-management responsibility. We think that the best solution could lie in paying for assistance. Learners who ask for a switch from UCA to SCA should (virtually) pay for it. In this manner, SCA would be part of the UCA and, consequently, part of the game. We have not yet implemented any of these SCA solutions, because the systems presented in this paper work well with the learners, as if it were not necessary to develop more system intelligence for the benefit of the users. Our experience shows that when tutoring systems focus their attention on the learner's intelligence, Artificial Intelligence must redefine its role. This new role could be in UCA/SCA collaboration. Thanks to UCA and SCA mechanisms, the system can see the difference between the strategies it would produce to adapt the knowledge acquisition process to the situations encountered by the learners and the strategies actually employed by the learners. This gives valuable information to the system: it can see the confidence the learners have in their own knowledge, and their willingness to take risks given their amount of knowledge. Therefore, User Controlled Adaptivity is a valuable source of information for user modelling and, as a consequence, for System Adaptivity. Adaptivity is necessarily a partnership between both actors of the training process, the system and the learner.
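The paying-for-assistance idea could rest on a rule as simple as the following sketch (ours; as noted above, none of these SCA mechanisms has been implemented, and the cost constant is an assumption):

HELP_COST = 3  # in EKUs; an assumed price, not taken from MethodMan

def request_system_help(balance, advice):
    """Sketch of the proposed UCA-to-SCA switch: system advice is bought
    with the same currency the learner is accumulating, so asking for
    help remains a strategic decision inside the game."""
    if balance < HELP_COST or not advice:
        return balance, None          # help refused: too poor, or nothing to say
    return balance - HELP_COST, advice[0]

Because the help has a price in EKUs, the decision to invoke the system's adaptivity stays with the learner, keeping SCA inside the UCA game as proposed.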
5. References

[1] Eklund, J. Knowledge-Based Navigation Support in Hypermedia Courseware using WEST, paper proposed for Australian Educational Computing, Vol. 11, No. 2.
[2] Cherkaoui, C., Chambreuil, M., Gaguet, L. "Aspects de la planification didactique: étude dans le cadre d'un environnement d'aide à l'apprentissage de la lecture", Sciences et Techniques Educatives, Vol. 4, No. 3/1997, pp. 257-297, Hermès, 1997.
[3] Brusilovsky, P., Pesin, L. ISIS-Tutor: An Intelligent Learning Environment for CDS/ISIS Users, in Proc. of CLCE'94, Joensuu, Finland, 1994.
[4] Brusilovsky, P., Schwartz, E., Eklund, J. Adaptive Textbooks on the World Wide Web, in Proc. of AusWeb97 - Technical Futures.
[5] Woods, P.J., Warren, J.R. Adapting Teaching Strategies in Intelligent Tutoring Systems.
[6] Brusilovsky, P., Schwartz, E., Weber, G. A Tool for Developing Hypermedia-Based ITS on WWW, Position Paper for the Workshop on Architectures and Methods for Designing Cost-Effective and Reusable ITSs, Montreal, June 10th 1996.
[7] Weber, G., Specht, M. User Modeling and Adaptive Navigation Support in WWW-based Tutoring Systems, paper presented at UM-97, Cagliari, Italy, June 2-5, 1997.
[8] Stern, M.K. The Difficulties in Web-Based Tutoring, and Some Possible Solutions, in Proc. of the Workshop "Intelligent Educational Systems on the World Wide Web", 8th World Conference of the AIED Society, Kobe, Japan, 18-22 August 1997.
[9] Greer, J.E., Philip, T. Guided Navigation Through Hyperspace, in Proc. of the Workshop "Intelligent Educational Systems on the World Wide Web", 8th World Conference of the AIED Society, Kobe, Japan, 18-22 August 1997.
[10] Linard, M., Zeiliger, R. Designing a Navigational Support for an Educational Software, ARTMOSC'95, available at http://www.irpeacs.fr/papers/rz/artmosc.html
[11] COMETT (1992): COMETT/VOLET Cb: Final report Volet Cb - Contractual year 1991/1992. Reference number 90/1/5081/Cb, Brussels.
[12] Kameas, A., Crampes, M., Pintelas, P. "Computer based tools for methodology teaching", ADCIS Conference, Norfolk, Virginia, November 1992.
[13] Piché, P., Frasson, C., Aïmeur, E. "Amélioration de la formation au moyen d'un agent perturbateur dans un système tutoriel intelligent", NTICF'98, INSA Rouen, November 1998.
[14] Crampes, M., Saussac, G. L'acte d'apprentissage au cœur de la simulation, NTICF'98, INSA de Rouen, France, November 1998.
Artificial Intelligence in Education S. P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
Dynamic versus static hypermedia in museum education: an evaluation of ILEX, the intelligent labelling explorer

Richard Cox
School of Cognitive & Computing Sciences, University of Sussex, Falmer, BN1 9QH, UK
richc@cogs.susx.ac.uk

Mick O'Donnell
Department of Artificial Intelligence, University of Edinburgh, 80 South Bridge, Edinburgh, EH1 1HN, UK
[email protected]

Jon Oberlander
Human Communication Research Centre, University of Edinburgh, 2 Buccleuch Place, Edinburgh, EH8 9LW, UK
[email protected]

Abstract

This paper describes an evaluation of an intelligent labelling explorer (ILEX), a system that dynamically generates text labels for exhibits in a museum jewellery gallery. In the evaluation, learning outcomes of subjects who used the dynamic ILEX system were compared with those of subjects who used a static-hypermedia version of the system (i.e. a system more typical of current hypermedia systems). The aim was to isolate learning effects specifically due to dynamic hypertext generation. Several types of data were collected: user-system interaction logs of navigational and browser use, recordings of the type of information to which each subject was exposed, post-session tests of factual recall, and a special 'curator' task in which the subjects were required to classify novel jewellery items. Results showed that performance measures (post-session tests) did not differ between subjects in the two conditions. However, the interaction-log data revealed that the two groups differed in their navigational behaviour and in the type and amount of information to which they were exposed. These results are discussed in terms of the 'learning-performance' distinction often drawn in psychological accounts of learning. The paper concludes with an outline of further planned work.
1 Introduction
The term hypermedia refers to hypertext systems that include graphics, diagrams, photographs, movies, animations, etc. They are systems that allow non-linear access to multimedia resources. The intelligent labelling explorer (ILEX) system produces descriptions of objects encountered during a guided tour of a museum gallery. ILEX seeks to automatically generate labels for items in an electronic catalogue (or museum gallery) in such a way as to reflect the interest of the user and also opportunistically to further certain educational (or other) aims. The ILEX domain is that of a 20th Century Jewellery Exhibit in the Royal Museum of Scotland.
At the top level, the virtual gallery consists of a page (a 'virtual glass case') of 30 thumbnail images of jewels from the National Museum of Scotland collection. The jewels are quite varied: the exhibit features works by many designers (e.g. Gerda Flockinger, Jessie M. King), in around six styles (art deco, arts & crafts, ...), from several periods. The exhibits are made from a wide variety of materials — from 'ephemeral' items in plastic, such as a 'Beatles brooch', to items made of precious metal and precious stones. The user explores via a Web browser, beginning by selecting an item from the jewellery 'case': a larger image of the item is then presented and some explanatory text is generated by the system. An example of a generated label can be seen in Figure 1.
Figure 1: An example of a dynamically labelled exhibit. Note the referring expressions ('As already mentioned...') and the user-modelling based on previous items seen ('for instance the previous item has floral motifs').

The pages produced by the system differ from conventional hypermedia pages in that they are generated dynamically — in other words, they are tailored to a particular user in a particular communicative situation. This flexibility has a number of advantages. For one thing, the discourse history of the user — the objects which the visitor has already seen — can be taken into account, so that information the visitor has already assimilated can be exploited in the current description. For instance, the description of the object currently being viewed can make use of comparisons and contrasts with previously viewed objects, while omitting any background information that the visitor has already been told [6, 7, 8]. Dynamic hypertext makes it possible for the generation system to pursue its own agenda of educational and communicative goals, while allowing the user the freedom to browse the collection of objects in any order, as in a normal hypermedia system. The aim is to reproduce the kind of descriptions that a real curator might give, were the visitor to have one at their elbow. Opportunistic text tailoring is achieved in ILEX via the use of referring expressions, comparison expressions, nominal anaphora and approaches derived from rhetorical structure theory [4, 6, 7]. (These papers and others are available from the project Web site: http://cirrus.dai.ed.ac.uk:8000/ilex/.)

The aim of the evaluation was to assess the effect of intelligent label generation upon several types of learning outcome. Dynamic and static versions of the intelligent labelling explorer (ILEX) system were compared. The goal was to 'pin down', or isolate to some degree, the specific effects of text which is tailored to the user and which takes his/her browse history into account. Unlike typical hypermedia evaluation studies, the aim was not to compare hypermedia with traditional media, or to investigate aspects of hypermedia such as configurations of page links, but rather to compare two versions of hypermedia: a traditionally configured version (static pages, no user modelling) and the intelligent system (dynamically generated text containing referring expressions and comparisons based on a user model).
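The tailoring just described can be pictured as a discourse history consulted at generation time. The sketch below is only our schematic reading of the idea (ILEX's actual generator is a full natural-language generation system built on rhetorical structure theory), with invented names throughout:

def generate_label(item_id, facts, history):
    """Minimal sketch of discourse-history-sensitive labelling (ours,
    not ILEX's generator).  `facts` maps attribute -> value for the
    current item; `history` maps attribute -> list of (item, value)
    pairs already conveyed to this visitor."""
    sentences = []
    for attr, value in facts.items():
        earlier = history.setdefault(attr, [])
        if (item_id, value) in earlier:
            # Revisited object: remind rather than repeat.
            sentences.append(f"As already mentioned, its {attr} is {value}.")
        elif any(v == value for _, v in earlier):
            # Shared with a previously viewed object: draw a comparison.
            sentences.append(f"Like a previously viewed item, its {attr} is {value}.")
        else:
            sentences.append(f"Its {attr} is {value}.")
        earlier.append((item_id, value))
    return " ".join(sentences)

history = {}
print(generate_label("necklace-1", {"style": "Arts and Crafts"}, history))
print(generate_label("buckle-4", {"style": "Arts and Crafts"}, history))
print(generate_label("necklace-1", {"style": "Arts and Crafts"}, history))

Erasing the history before each new object would yield precisely the behaviour of the static version described in section 2.2 below.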
ILEX provides succinct and coherent information to the learner by relating information about a currently viewed object to previously viewed objects and thus, to some degree, organising, structuring and contextualising the material in a semantically coherent way. One prediction was therefore that learning outcomes in terms of factual recall would be greater from the dynamic system than from the static system. The ability of dynamic-ILEX to draw out comparisons and make generalisations was also predicted to produce better outcomes on subjects' learning to classify novel, unseen artifacts. On the other hand, there is some evidence that unpredictable, varying, dynamic hypermedia do not always facilitate performance [5]. Hence there was an alternative hypothesis: that the dynamic system would produce poorer learning outcomes than the static system, due to the variability and the less standardised, less predictably formatted nature of its output. The aim, therefore, was to test these competing hypotheses by comparing learning outcomes from the two versions of ILEX.

2 The systems - two versions of ILEX

Two versions of the system were developed: 'dynamic ILEX' (the ILEX project system) and a comparison 'static ILEX' (described below). The two systems differed in the format of the descriptions that they generated. Also, the response time of dynamic ILEX was somewhat slower than that of static ILEX, due to the computational overhead of dynamic label generation.
2.1 Dynamic ILEX
This system has been described in the introduction; more information can be obtained from the project Web site cited above.

2.2 Static ILEX

The static version of the ILEX system was prepared by generating all the available pages for each jewellery object (usually 4 or 5 per object) from dynamic-ILEX. The discourse history was maintained over the pages of an object, so that each successive page contained new information and might refer to entities introduced on prior pages. However, the discourse history was erased before starting on the next jewel, so that the pages of the new object might repeat what was said for a previous object, and indefinite reference was used for first mentions of some objects. The generated system was 'frozen' and became the static ILEX system used in the study. Static-ILEX is more typical of current Web-based hypermedia systems than dynamic-ILEX.

2.3 Sample output from the two versions of ILEX

To illustrate how output from the two versions of the system differed, the same browsing sequence was conducted with each system in order to generate comparative output. The first jewel in the gallery 'case' was accessed, then a further five jewels in the gallery were viewed (the first row of jewels in the case); the first jewel was then revisited. The jewel browse sequence was: 1) a gold, enamel and sapphire pendant-necklace by Jessie M. King; 2) a gold, enamel and moonstone pendant-necklace by Jessie M. King; 3) a silver and enamel pendant-necklace by Jessie M. King; 4) a waist buckle by Jessie M. King; 5) a gold, moonstone and opal necklace by Edward Spencer; and 6) a silver metal, beryl and tourmalines necklace by Arthur and Georgie Gaskin. The first three pages of output are shown below, for both versions of the system.
2.3.1 Output from dynamic ILEX
On the first viewing of the first jewel, dynamic ILEX produced the following labels (NB 'Say more' shows the output following a click of the 'say more' button; see Figure 1):

This jewel is a necklace and was made by a British designer called Jessie M. King. It is one of the four items in this case which were made for Liberty & Co. It is comprised of openwork gold floral sprays. It is set with jewels. It is in the Arts and Crafts style but it uses faceted stones. It has an elaborate design (specifically it has floral motifs). King was British. Other jewels designed by King include... (list of 3)

'Say more'

This necklace was made in 1905. It is made from sapphire, enamel and gold. It was made in London. As already mentioned, it was made by Jessie M. King. King was not just a jewellery designer, she was an illustrator too. In fact, she did quite a lot of different types of creative work. Jewellery is just part of it. We have also mentioned that it is in the Arts and Crafts style. Arts
and Crafts style jewels usually use oval-shaped stones and usually feature rounded stones. Other jewels in the style include... (list of 7)

'Say more'

This necklace has festoons and was produced in limited quantity. As already mentioned, it was made by Jessie M. King. King was important and lived in London. We have also mentioned that this necklace is in the Arts and Crafts style. Arts and Crafts style jewels usually have an elaborate design and are usually flexible. They usually demonstrate the artistic sensibilities of the wearer and were usually produced by single craftsmen. Other jewels in the style include: ... (list of 7)

After visiting the next five jewels in the virtual 'case', the same jewel was revisited. The first page of the label now read:

Arts and Crafts style jewels usually feature rounded stones; for instance the previous item uses oval-shaped stones (in other words it features rounded stones). However this necklace does not feature rounded stones; rather it uses faceted stones.

Note that the label now contains a comparison between this jewel and the previous jewel viewed (the silver, beryl and tourmalines necklace by Arthur and Georgie Gaskin, also in the Arts & Crafts style). The phrase 'previous jewel viewed' was a hyperlink back to that item.
2.3.2 Output from static ILEX
On the first visit the label read:

Page 1: This jewel is a necklace and is made by a designer called Jessie M. King. It is one of the four items in this case which were made for Liberty & Co. It is comprised of openwork gold floral sprays. It is in the Arts and Crafts style and is made in 1905. It is set with jewels, in that although Arts and Crafts style jewels usually use oval-shaped stones this jewel uses faceted stones. Other jewels designed by King include ... (list of three displayed). Other jewels in the style include: ... (list of 4 displayed)

Page 2: This necklace is made from sapphire, enamel and gold. It is made in London and has festoons. It is produced in limited quantity. It has an elaborate design (indeed Arts and Crafts style jewels usually have an elaborate design); specifically this necklace has floral motifs. Liberty and Co were at the interface between mass-produced jewellery and 'craft' jewellery — one-offs. They used the very best designers to design jewels for them, which were then produced in fairly limited quantity, but in quantity nevertheless. Not quite mass-produced, and not quite 'craft'. Liberty's designers were never credited, so a piece like this wasn't sold as a 'Jessie M. King (for Liberty)' necklace - it was simply sold as a Liberty necklace. But the design books survive, and we have a pretty good idea who designed what.

Page 3: As already mentioned, this necklace is in the Arts and Crafts style. Arts and Crafts style jewels usually feature rounded stones and are usually flexible. They usually demonstrate the artistic sensibilities of the wearer. This necklace is made for Liberty and Co.

Since this version was a static hypermedia system, the output was identical when this item was revisited after the intervening five jewels.

3 Development of the outcome measures

Three instruments were devised for use in the evaluation: 1. a recall test of factual knowledge about jewels in the exhibition; 2. a 'curator' task (described below); and 3. a usability questionnaire. The tests were administered to subjects on-line, as Web forms linked to ILEX. Multiple-choice check boxes and radio buttons were provided for subjects to indicate their response choices.
3.1 Factual recall test
This was a multiple-choice test which was introduced to the subjects with the heading "What did you learn from the virtual exhibition?" Two examples (of 15 items in total) are shown below:

1. Clean lines and geometric forms are characteristics of which style(s)? [sixties, organic, Scandinavian, machine age, art deco, arts & crafts]

15. Wendy Ramshaw and David Watkins made jewels in the 60s. They typically used which materials (choose up to 5 materials)? [paper, steel, perspex, wood, ... (31 materials listed in total)]
3.2 Curator task
An educational aim valued by museum curators is the inculcation in visitors of a notion of artifacts as evidence — evidence of a particular time, place and set of beliefs [3]. Visitors are taught to look at, describe, record and classify objects and artifacts, and perhaps take away skills that may be useful for assessing the significance of (novel) artifacts outside the museum context. The second instrument therefore attempted to assess these kinds of skills. A typical item consisted of the presentation of a jewel not seen in the exhibition, with subjects instructed to 'Examine the photograph and then classify the jewel in terms of its Style'. The multiple-choice options were [sixties, organic, Scandinavian, machine-age, art deco, arts & crafts]. The complete test consisted of 15 items.
4 Evaluation study
4.1 Subjects
The subjects were University of Edinburgh students recruited via notices in academic departments and halls of residence. Thirty subjects were randomly assigned to one of two conditions — static ILEX or dynamic ILEX — with the constraint that more subjects were allocated to the dynamic ILEX condition (20) than to the static ILEX (comparison) condition (10). Gender representation was as follows: static-ILEX 7 male, 3 female; dynamic-ILEX 11 male, 7 female. Data were lost for two of the subjects in the dynamic-ILEX condition due to equipment failure.
4.2 Methodology and procedure
The subjects were run in three consecutive 90-minute sessions. Each session consisted of exhibition browsing (45 minutes) followed by the post-tests and a questionnaire (45 minutes). The experiment was conducted in one section of a computer laboratory. Subjects were seated at a row of Pentium PCs (one row on each side of a central divide). In order to ensure adequate levels of performance, multiple servers were used to serve the ILEX Web pages.

4.2.1 Subjects' task

Subjects were instructed to log on to the system and explore the ILEX virtual museum gallery. They were told that they would be required to answer quiz questions following the gallery browsing session. Subjects were not told which version of ILEX they were using, or that there were two versions of ILEX in use. Since static-ILEX subjects were seated on one side of the central divide and dynamic-ILEX users on the other, they were unaware of any differences between the systems.
4.2.2 User-system interaction logging
In order to track the domain content viewed by the users, and to relate hypermedia browsing and navigational behaviour to learning outcomes, a logging system was implemented on both the static and dynamic ILEX systems. All subjects' button presses were recorded in time-stamped logs, together with records of the pages visited and the information content of the pages.
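The records involved need only pair a timestamp with the event and the content served. The sketch below shows the kind of record we mean; the field names and format are ours, not the actual ILEX logger's:

import json, time

def log_event(logfile, subject_id, event, page=None, facts=None):
    """Append one time-stamped user-system interaction record: the
    browser event, the page visited and the jewellery facts shown on
    that page.  Field names are illustrative only."""
    record = {
        "t": time.time(),      # timestamp
        "subject": subject_id,
        "event": event,        # e.g. "back", "case-visit", "say-more"
        "page": page,
        "facts": facts or [],  # information content presented
    }
    logfile.write(json.dumps(record) + "\n")

Replaying such a file per subject yields both the navigational measures and the content-exposure measures analysed in section 5.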
5 Results
The recall test and curator task responses were forwarded electronically to the experimenter for marking. User-system interaction logs were automatically saved into separate directories for each subject.

5.1 Post-test outcomes

5.1.1 Factual recall test

Test 1 consisted of 15 items; the maximum possible score was 31. The test means for subjects in both groups are shown in Table 1.
5.1.2 Curator task
The second ('curator') task consisted of 15 items; the maximum possible score was 23. The test means for subjects on this test are also shown in Table 1. As shown in Table 1, mean group scores were similar on both tests. However, the dynamic subjects showed considerably less variation in score than static subjects on both tests. Although performance scores were similar in both groups, it was of interest to discover whether the underlying behaviour or learning processes by which they were achieved differed.
Group          Factual recall test   Curator task
Static ILEX    18 (4.2)              9 (2.9)
Dynamic ILEX   18 (3.0)              10 (1.9)

Table 1: Mean scores (standard deviations) for subjects in the static and dynamic ILEX conditions on the two tests.
To pursue this question, the log data were analysed in order to determine whether the two systems produced differential effects in terms of (a) browser/navigational behaviour and (b) the amount and type of information browsed.
5.2 Analysis of log data
Two kinds of data were extracted from the user-system interaction logs: 1. the user's browser behaviour (navigational manoeuvres), and 2. page content — the jewellery facts presented to particular users in the course of their session.

5.2.1 Browser/navigation events measures

Several indexes of navigational behaviour were extracted from the logs. These consisted of: visits to the case of jewels (CASES); button clicks in the ILEX navigation bar provided at the top of each page (back, forward, go to jewellery case) (NAV, PNTR); requests for more information or another page of information about a jewel (NAV2); and total events (CASES, NAV, NAV2 and PNTR, plus other miscellaneous browser events). Subjects using static-ILEX produced approximately 60% more navigation-related button clicks than their dynamic-ILEX counterparts. However, this was an artifact of the way the static pages were generated from dynamic-ILEX: information about a particular jewel was distributed over several pages and contained more repetition than in the dynamic-ILEX system. In order to standardise the comparison, therefore, the CASES, NAV, PNTR and NAV2 measures were expressed as proportions of total events, to yield PCASES, PNAV, PPNTR and PNAV2 variables for each subject.
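The standardisation itself is a simple normalisation by each subject's total event count, along these lines (a sketch; the variable names follow the text):

def normalise_navigation_counts(counts):
    """Turn one subject's raw event counts, e.g.
    {"CASES": 12, "NAV": 30, "NAV2": 45, "PNTR": 8, "OTHER": 5},
    into proportions of total events: {"PCASES": 0.12, ...}."""
    total = sum(counts.values())          # total events, including misc
    return {"P" + name: n / total
            for name, n in counts.items() if name != "OTHER"}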
5.2.2 Page content measures
The following parameters were derived from the log data: the number of different jewels viewed (NODIFF); the mean number of pages browsed per jewel (MNPPO); the mean number of visits per page per jewel (MNVPPO), which provided an index of the extent to which subjects revisited pages of information about a particular jewel; and the mean display time per page (MNDPP).
5.3 Log analysis results
5.3.1 Browser/navigation events measures

Statistically significant differences were found between the groups in the proportion of visits to the case of jewels (static subjects PCASES = 0.075, dynamic subjects PCASES = 0.169, t = -5.26, p

A^2-B^2 -> (A+B)(A-B) becomes search a common factor.
7) Some transformation rules are based on a process, e.g., the discriminant.
8) Rule-concepts (concepts from transformation rules) are used. For example, the transformations of (x-2)[2(x+1)]+(x-2)[3(2x-1)] into (x-2)[2(x+1)+3(2x-1)] and of 4x^2-1 into (2x+1)(2x-1) are both called factorisations.
9) There is an important concept of reduction. Reduction rules generally have the highest priority for any problem type.
10) There are well-identified problem types (e.g., factorisation, equation solving); however, the characterisation of solved forms is not precise and is sometimes teacher-dependent.
11) Reduction and cleaning up take an important place in solved forms.

The observation of students and teachers shows that some problem types have no strategic difficulties: at any given time, there is only one thing to do. In fact, there are always many applicable transformations; the "only thing to do" is the result of specialised knowledge combining transformation and strategic features, which is another aspect of compiled knowledge [1]. Sometimes this lack of strategic difficulty is the result of a strong limitation of the domain, as when, at some level, factorisation problems are limited to expressions that match an identity, so that they are solved just by applying the identity and reducing the result. For domains with strategic difficulties, it is sometimes necessary to backtrack to a previous step in order to try another path (see Figure 1). This means that the solving process must be considered in a heuristic search framework [11], and that knowledge is necessary for deciding when to backtrack, where to backtrack and what rule to choose.

2.2. Modelling human reasoning

We have built a general model of human reasoning in formal algebra for building ILEs, the MCA model (Model based on Compiled knowledge for Algebra), and a particular model of the factorisation domain for building the APLUSIX system. These models implement the features of the above analysis. In the general model, there are a heuristic search framework, a theory of expressions, expression-concepts, matching-concepts, rule-concepts, plans and sub-problems. For details, see [9]. In the particular model for factoring polynomials:
1) There is no plan and no sub-problem, because the domain is not complex enough (plans and sub-problems are implemented for another domain in another prototype).
2) Three expression-concepts have been defined: monomial, factor and square. For example, the concept of monomial has three slots: coefficient, variable and degree. Each of the following expressions: 4, -4, x, -x, 4x, -4x, x^n, -x^n, 4x^n, -4x^n is a monomial, and a unique transformation rule is able to calculate the sum of two monomials of the same degree.
3) The matching-concept modulo a constant factor is used, in particular for matching factors.
4) Four main rule-concepts have been defined: strong-factoring, strong-expansion, strong-reduction and elementary. For example, factoring out a monomial is strong-factoring, and factoring out a constant is elementary.
We have stated the following strategic principle [5]: for factoring polynomials, use strong-factoring and strong-reduction in priority. This principle is important for choosing the rule to apply, for deciding when to backtrack, and for providing help to the student.
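To make the flavour of these concepts concrete, here is a sketch in Python (ours, not the authors' framework) of the monomial expression-concept with its three slots, together with a toy matcher for the matching-concept "modulo a constant factor", restricted here to linear binomials:

from dataclasses import dataclass
from fractions import Fraction
from typing import Optional

@dataclass
class Monomial:
    """The monomial expression-concept: three slots, as in the text.
    4, -4, x, -x, 4x, -4x, x^n, 4x^n, ... are all instances, so a single
    rule can sum any two monomials of the same degree."""
    coefficient: Fraction
    variable: Optional[str]   # None for constants
    degree: int               # 0 for constants

def add_monomials(a: Monomial, b: Monomial) -> Monomial:
    """The unique transformation rule summing two same-degree monomials."""
    assert a.degree == b.degree and a.variable == b.variable
    return Monomial(a.coefficient + b.coefficient, a.variable, a.degree)

def match_modulo_constant(candidate, target):
    """Matching modulo a constant factor, on binomials represented as
    (a, b) for a*x+b: return the constant k with candidate = k*target,
    or None.  E.g. 2x+4 matches x+2 modulo the constant 2."""
    (a1, b1), (a2, b2) = candidate, target
    if a2 == 0 or b2 == 0:
        return None
    k1, k2 = Fraction(a1, a2), Fraction(b1, b2)
    return k1 if k1 == k2 else None

print(match_modulo_constant((2, 4), (1, 2)))   # 2, since 2x+4 = 2*(x+2)

It is this kind of matching that lets a single factoring rule recognise, for instance, that x+2 is a common factor of a sum whose terms only exhibit 2x+4 explicitly.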
A few models have been built by authors of ILEs for algebra. PRESS [3] has rule-concepts and uses them for choosing the rule to solve equations, but PRESS has no expression-concept. In MATHPERT [2], the essential object is the operator; operators are chosen with the help of indicators. There is no expression-concept and no real rule-concept.
DISSOLVE [10] has expression-concepts, in particular the factor concept, but no rule-concept. None of these three systems uses a heuristic search framework.

3. The APLUSIX prototype

From 1987 to 1997, the APLUSIX system evolved: some functionalities were modified or added, others were removed when they seemed useless. The last version is described here. The system was mainly used at an intermediate level, and sometimes at a low level, and some of its features were particularised for these levels. At both levels, the use of the discriminant was not known by the students. The system provides the learner with two interaction modes: an observation mode, in which the student observes the system solving a problem, and an action mode, in which the student solves a problem. Moreover, the system may be used in a free way, the student choosing the parameters and the problems, or in a directed way, the parameters and problems being described in automata.
Figure 1: A two-branch tree generated by the system
3.1. The observation mode

In the observation mode, APLUSIX uses reference knowledge to solve a problem step by step, generating and displaying a search tree. For many problems at the considered level, the search tree consists of a single branch; however, a two-branch tree is sometimes obtained (Figure 1). Each step is described using the rule or the rule-concept applied: the rule itself is used for factorisations (the rules under study), otherwise the rule-concept is used. At each step, the learner can ask for explanations about the matching. In this case, the system provides details in a temporary window, for example:

3x-1 is a common factor in (4-12x)(2x-7)-x(9x-3) because (4-12x)(2x-7) = -4(3x-1)(2x-7) and -x(9x-3) = -3x(3x-1)

In previous prototypes, explanations about the strategies were available to the students. They have been abandoned, because experiments showed that they were useless at this level; they will be made available for other levels in the future.

3.2. The action mode

In the action mode, the learner solves problems. At each step, (s)he chooses an action from a menu. The action is either a rule (for factorisations) or a rule-concept (Figure 2). After that, an action window is opened for selecting the relevant sub-expression. For factorisations, the student has in addition to indicate the matching (Figure 3). The student's request is then analysed by the system: when it is correct, the system calculates and displays the result; when it is not, a message is displayed. At any time, it is possible to go back to a previous step by clicking on that step. In this mode, the student can ask for help using the menu (Figure 2).
Figure 3: Input of an action

In the action mode, the student is freed from calculations and calculation errors. In the action window (Figure 3), the input of expressions is controlled and syntax errors are indicated as soon as they occur. The selection of sub-expressions is simplified: the first click selects a monomial; with each further click, the minimal sub-expression including all the clicks is selected. Experiments have demonstrated the value of this mode: it allows the student to focus on the matching and on the strategy, so that (s)he can solve more difficult problems. Of course, we do not consider this mode to be the only one to use; it is a complement to the traditional paper-and-pencil mode.

The analysis function. This function analyses the student's request. Its role is to decide whether the request is correct or not. When the request is correct, it is applied; otherwise feedback is provided. The analysis function does not use the reference solver, because there are correct transformations that are not envisaged by the reference solver. For example, factor out x+2 from (4x+8)(x-1)-(2x+4)(3x-2) is a correct request that is not generated by the reference solver, which generates factor out 2x+4 from (4x+8)(x-1)-(2x+4)(3x-2). The analysis is performed by production rules.

The help function. In the last version, the help function suggests one or two actions among the most promising ones that have not yet been applied. Sometimes it suggests backtracking. In the first version, the help function was different (we did not want to tell the student what to do): we suggested the possible transformations at the selected step, so that the student had to think in order to decide which one to choose. This method did not work: some students followed the first suggestion every time; others did not use the help because there was too much to read.
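The core check behind such an analysis can be stated compactly. The sketch below is ours (APLUSIX performs the analysis with production rules, not with this code): it tests a factor-out request by polynomial division of each top-level term, which also yields the offending terms needed for a precise error message (see section 4.3):

from sympy import symbols, div, expand

x = symbols("x")

def check_factor_out(candidate, terms):
    """Sketch of the analysis of a 'factor out' request: the request is
    correct when every top-level term of the sum is divisible by the
    candidate.  Returns (cofactors, offending_terms)."""
    cofactors, offending = [], []
    for term in terms:
        quotient, remainder = div(expand(term), candidate, x)
        if remainder == 0:
            cofactors.append(quotient)
        else:
            offending.append(term)   # basis for a precise error message
    return cofactors, offending

# "factor out x+2 from (4x+8)(x-1)-(2x+4)(3x-2)" is accepted even though
# the reference solver itself would propose factoring out 2x+4:
terms = [(4*x + 8)*(x - 1), -(2*x + 4)*(3*x - 2)]
print(check_factor_out(x + 2, terms))   # two cofactors, no offending term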
3.3. Parameters

Parameters determine the behaviour of the system. In the free mode, the student can change their values; in the directed mode, they are set by an automaton. Some of these parameters are:
- expand-and-reduce: when true, the menu includes an expand-and-reduce action allowing the expansion and reduction, in one step, of first-order polynomials like 2(x-3)-3(x+1).
- a2+2ab+b2-U: when true, the identity a^2+2ab+b^2 can be used by the student with a value of b such as -2; otherwise, the a^2-2ab+b^2 identity is required, with b equal to 2.
- help: when true, the action mode allows the student to ask for help.
- delay-help: indicates, when help is true, the delay before giving the student access to help.
- step-by-step: when true, the student is not allowed to perform complex matching for factoring out a polynomial (e.g., to factor out x-4 from (x-4)(x+1)-(2x-8)(3x-1), (s)he first has to transform 2x-8 into 2(x-4)).
3.4. Automata

An automaton is a list of linked problems in a file. Each problem is described by a name, an expression, an optional modification of the parameters and optional links. For example:

(tbcd1 "9-4x^2+(2x-1)(2x+3)"
  (context (mode action) (expand-and-reduce yes) (help yes) (delay-help 5)
           (commute no) (a2+2ab+b2-U no) (do-one-step no) (step-by-step no))
  ((or (> nb-aide 0) (eq erreur factor-out-error) abandon) 1d1)
  (t 1b))
says to factor the expression 9-4x^2+(2x-1)(2x+3) in the given context. When the problem is finished, the links indicate the next problem.
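Operationally, each link pairs a condition over session variables (nb-aide, the help counter; erreur, the last error; abandon, whether the student gave up) with the name of the next problem, the final (t ...) clause being the default. A sketch of the dispatch (ours, not the actual interpreter):

def next_problem(links, state):
    """Return the first problem whose link condition holds; `links` is an
    ordered list of (predicate, problem-name) pairs, ending with a
    catch-all that plays the role of the (t ...) clause."""
    for condition, problem in links:
        if condition(state):
            return problem
    raise ValueError("automaton has no applicable link")

# The links of problem tbcd1 above, transcribed:
links = [
    (lambda s: s["nb-aide"] > 0 or s["erreur"] == "factor-out-error"
               or s["abandon"], "1d1"),
    (lambda s: True, "1b"),   # the (t 1b) default clause
]
print(next_problem(links, {"nb-aide": 0, "erreur": None, "abandon": False}))  # 1b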
4. Experiments with the APLUSIX prototype

From 1987 to 1997, we carried out several experiments. The first showed that a wide majority of the students liked working with APLUSIX (as they told us) and learned with the system (as attested by a pre-test and a post-test). In this section, we summarise the major experiments carried out during this period.

4.1. Two experiments done in 1990 and 1991

Two pilot experiments were carried out in 1990 and 1991, whose objectives were, first, to evaluate the user-friendliness of the interface and, second, to determine the degree of difficulty of the different types of matching modulo a numerical factor and the way perceptual factors may interfere with the recognition of an expression. The data gathered from these pilot studies led to an important improvement of the interface, particularly in the way the student tells the system how to match an expression to a transformation rule. For instance, in the first prototype, when the student wanted to apply the rule A^2-B^2 -> (A+B)(A-B) to the sub-expression 4x^2-9, (s)he had only to designate the sub-expression and the rule. In the improved interface, the student must designate the sub-expression, the rule and, in addition, the values of A and B. In this way, the student is more active, and it is possible to evaluate the degree of the student's matching knowledge. For instance, a student may recognise that the expression 4x^2-9 matches the rule A^2-B^2, and that the values of A and B are, respectively, 2x and 3; but (s)he may not recognise that the expression 4(3x-2)^2-9 matches the same rule (with the correct values for A and B). Hand analysis of the protocols showed that the matching of an expression to a rule can be more or less difficult, depending on the need to implicitly factor out numerical factors in order to obtain an expression that can be directly matched to the rule. This result led us to define the notion of "visibility": an expression is totally "visible" when it matches a rule directly.

4.2. An experiment carried out in 1992

A controlled experiment on learning to solve factorisation problems with the action mode was carried out in 1992 with a new version, APLUSIX/V0-M2. The student can ask for advice when (s)he does not know how to proceed; when (s)he makes a mistake, the system gives the student a comment about the mistake, an "error message". This experiment comprised three phases. The first two phases concerned the acquisition of matching skills based on the factor-out rule and second-order identities. In the third phase, the problems were more complex: they could be solved in different ways, and for some of them the students had to choose between different promising rules, where a wrong choice might lead to an impasse. Forty-six secondary school students (14-16 years old) participated in the experiment. Each student spent four forty-five-minute sessions with the system. A program was developed to diagnose the student's state of matching knowledge during the learning process [6]. The program distinguishes between 17 categories of matching, the categories being defined by five types of basic visibility of matching. To evaluate the student's matching knowledge, the method used consists of sequentially increasing or decreasing the student's matching score for a category of matching knowledge. The score for a category is increased every time (s)he correctly performs, without help, a transformation involving this category.
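The bookkeeping behind this diagnosis can be summarised as follows (a sketch of our reading of the method; the actual program of [6] distinguishes 17 categories and five types of basic visibility):

def update_matching_scores(scores, protocol):
    """Replay a student's protocol, raising the score of a matching
    category on each correct, unaided transformation in that category
    and lowering it otherwise.  The exact decrement rule is our
    assumption; the text only specifies the increment."""
    for event in protocol:   # e.g. {"category": 3, "correct": True, "helped": False}
        delta = 1 if event["correct"] and not event["helped"] else -1
        scores[event["category"]] = scores.get(event["category"], 0) + delta
    return scores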
The individual protocols were given to the program to calculate the evolution of the students' scores during the learning experiment. The results showed that the notion of visibility we defined was relevant: the less visible an expression is with regard to a given transformation rule, the more difficult it is to master. A hand analysis of the protocols showed three important points. Firstly, the majority of students learned to backtrack, something they did not do in a paper-and-pencil situation. The search tree presented in the display may have played the role of a reification [4, 12] that helped foster the concept of a search space. Secondly, observing the way in which matching knowledge was acquired led us to differentiate between two styles of learning, which we have termed the conceptually-based learning style and the experientially-based learning style. We hypothesise that these styles of learning are related to the depth of the student's conceptual knowledge of algebra. The student who learned in an experiential way progressed slowly and did not immediately generalise his (her) matching ability from one expression to another: (s)he may have acquired his
(her) matching skill solely by establishing a relation of surface similarity between the structure of symbols representing an abstract rule and the structure of symbols representing an algebraic expression. Conversely, students whose progress was based on a deeper understanding of the concepts of square and of factor had less difficulty in quickly generalising their matching ability over expressions that had different surface features. Thirdly, we observed that the matching knowledge about the identities was more stable than the matching knowledge about the factor-out rule. We hypothesised that this result was related to the way the error messages and the prompts were formulated by the system. In the case of identities, the prompts were general: the system told the student to apply an identity to an expression, but it did not give the values of A and B to type in, so that the student still had something to do by him(her)self: find the correct values of A and B. The error messages were, on the contrary, very precise. In the case where the student wanted to match an expression E to an identity and the expression could not match, the error message was "this expression is not a difference of two squares". When the student gave an erroneous value of A and/or B for an identity, the system's error message indicated clearly in which way it was an error, by saying, for instance, "With the values of A and B you gave, A^2-2AB+B^2 = x^2-8x+16 and not x^2-4x+4", so that the student could understand why his (her) action was erroneous. In contrast to the case of identities, in the case of factoring out a (numerical, monomial or binomial) factor from an expression, the error messages were laconic and the prompts too precise. For any error in applying the factor-out rule, the error message was always the same: "it is not possible to factor out [expression designated as A] from [selected expression]", giving no hint about the reason for the error. The prompt concerning the application of the factor-out rule was, on the contrary, too precise. Consider the student who asks for help at the problem state (4-12x)(2x-7)-(-2x-7)(9x-3). Among the possible actions is the application of the factor-out rule. The prompt was: "factor out 3x-1 from (4-12x)(2x-7)-(-2x-7)(9x-3)", so that the student had nothing to do except copy the prompt exactly.

4.3. An experiment carried out in 1994

An experiment was conducted in May 1994 [7], with 12 students of the same age. The objective of this experiment was to test the hypothesis that less detailed prompts about the application of the factor-out rule, and more detailed comments about the errors in applying this rule, would result in improved learning of this rule and would also provide more useful data for the analysis of the learning process. The learning situation was identical to that of the preceding experiment. The prompts relating to the identities were not modified. The prompts for the factor-out rule were more general. Consider the student who asks for help at the problem state (3x+6)(3-2x)-(2x+3)(-8-4x). The prompt was formulated more laconically: "factor out a common factor from (3x+6)(3-2x)-(2x+3)(-8-4x)", so that the student still had to find the common factor of the expression by him(her)self.
The comments on errors differentiated two cases, and their formulation was more precise about the origin of the error: (i) if the student gives a value of A (a numeral, a monomial or a binomial) that is not a common factor, the system picks out the terms that do not have A as a factor; (ii) if the student wants to factor out a numeral, a monomial or a binomial from a product (for example "factor out 3 from (3x+6)(3-3x)"), the system says: "I cannot do what you asked, because common factor applies only to sums." The results showed that the improvement in matching knowledge about the factor-out rule was statistically significantly better for the subjects in this learning experiment than in the previous one.

4.4. An experiment carried out in 1997

The previous experiments used only the action mode. A learning experiment was then conducted in 1997 in which we compared four groups of 20 students. The 80 students were selected by a pre-test in order to obtain four groups that did not differ in their factorisation problem-solving ability: low-level and high-level students were excluded from the experiment. The 80 students were randomly assigned to one of the four groups in a 2 by 2 factorial design: two learning contexts (the learning-with-examples context and the learning-just-by-solving context), and presence or absence of help/explanation from the system. The results suggest that, concerning matching knowledge, learning by action is more effective than learning by observation [8].
5. Conclusion, current state and future work

5.1. Conclusion

A priori, scientific research does not have the role of realising products, in the industrial and commercial sense of the term: the development of prototypes is sufficient and satisfactory enough to validate a theory or to allow experiments. This is the way we worked during the last decade. From 1987 to 1997, the various prototypes built for APLUSIX fulfilled two objectives: (1) the approach adopted and the problem-solving model (the MCA model cited in section 2.2) indeed made it possible to analyse, model and assist the algebraic manipulations performed by students during problem-solving activities in pre-algebra; (2) the experiments led to the refinement of the models and to attempts to answer general questions about learning theories. But since the beginning of 1998, the APLUSIX team has decided to design a new version of APLUSIX which will not be a prototype but a product. The work required for this purpose must tackle constraints of several kinds, which explain our motivation:
1) a constraint of openness: we want to continue the research on the modelling of the cognitive processes of problem-solving and on the modelling of the learning processes;
2) constraints of availability: we want to give a general tool (APLUSIX) to teachers and to researchers in psychology and didactics so that they can take part in the research (this introduces one of the major characteristics of a product: being available to all);
3) a constraint of quality (design): we want to allow the use and the realisation of experiments on a large scale (feedback is expected), which introduces other characteristics of a product: robustness and ergonomics;
4) constraints of adaptability and parametrisation: we want to widen the scope of APLUSIX, in terms of the variety of possible activities (factorisation, equation solving, calculus of primitives, etc.) and in terms of the school levels addressed. The parametrisation is intended to allow the product to adapt its functions according to the desired use.
The terms openness, availability, quality and parametrisation summarise our objectives. Some of them are clearly the attributes one can hope for in a product; the others relate to the research dimension that APLUSIX will continue to have.
5.2. From prototype to product: rewriting rules for the knowledge base

The flexibility offered by a prototype is not conceivable for a product under the same conditions. One can imagine modifying a prototype to carry out a particular experiment or adjustment; that is not easily done with a product, where each modification would mean a new version to maintain and commercialise. Yet we want the future product to have the same flexibility as the preceding prototypes. To obtain such flexibility, we have changed the framework. One major change is that the product we are building rests on a knowledge base which is partly external. This external part of the knowledge base is represented by a set of rewriting rules written in a natural form, and it will be accessible to an advanced user (teacher or experimenter). Setting aside the aspects related to robustness, availability and maintenance, the choice of an independent and accessible knowledge base is a major difference between the preceding prototypes and the product to come. We count on this characteristic of the next APLUSIX to ensure the openness and the parametrisation of the product we want to design.

The design of the external knowledge base of APLUSIX was based on the preceding experiments and on a study of the preceding prototypes. In particular, this study brought out a set of parameters and a body of algebraic knowledge used, implicitly or not:
- parameters of the environment (visibility of rules, availability of help and diagnosis);
- parameters for the application of rewriting rules (search for sub-expressions, conceptual matching, matching modulo a coefficient);
- grouping of rewriting rules into semantic groups (factorisation, reduction, expansion, etc.);
- a hierarchy governing the application of rewriting rules.
This set of parameters will be defined in the knowledge base.

The algebraic knowledge contained in the preceding prototypes was expressed in compiled form. In the same way, the matching mechanisms specific to algebra were compiled (matching of sub-expressions, conceptual matching, matching modulo a factor); sometimes these mechanisms were poorly distinguished from the rest of the code. Generally speaking, this knowledge was static: it evolved only when the prototype changed, and it was not accessible in extenso to non-specialists. A mathematical form, however, was published [9], and it is this form which served as the starting point for writing the knowledge base of the future product. For example, in [9] factorisation by a common factor was expressed by three rules like the following:

    If E is of the form A1 + ... + An with n >= 2,
    and each Ai admits the factor U with degree pi,
    and q is an integer ranging between 1 and the minimum of the pi,
    and, for each i, U to the power q is a factor of Ai with cofactor Ci,
    then replace E by U^q (C1 + ... + Cn).
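As an aside (our illustration, not part of the original text), the core of such a common-factor rule can be realised with a polynomial gcd. The minimal sketch below is written in Python with sympy, whereas the actual product is written in Object Pascal; the function name is ours:

    from sympy import symbols, Add, gcd, cancel

    x = symbols('x')

    def common_factor_rule(e):
        """Sketch of the rule above: if E = A1 + ... + An (n >= 2) and each
        Ai admits a common factor U with cofactor Ci, replace E by
        U*(C1 + ... + Cn). Here U is simply the polynomial gcd of the terms;
        the real APLUSIX matcher additionally works modulo the concepts
        of section 2.1."""
        terms = Add.make_args(e)
        if len(terms) < 2:
            return e                          # the rule applies only to sums
        u = terms[0]
        for t in terms[1:]:
            u = gcd(u, t)
        if u == 1:
            return e                          # no non-trivial common factor
        return u * Add(*[cancel(t / u) for t in terms])

    print(common_factor_rule(3*x**2 + 6*x))   # 3*x*(x + 2)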
The equivalent formulation that we wish to obtain is the much more natural one: a*b + a*c -> a*(b+c). The use of this rule is not straightforward: it relies on a complex matching mechanism which takes into account the concepts seen in section 2.1. Note that while the definition of the rules is open, the matching mechanism is closed: it is compiled, and it embodies non-trivial mathematical knowledge (associativity, commutativity, identity elements, etc.). This mechanism nevertheless depends on parameters, which can be changed programmatically and defined a priori in the knowledge base for each occurrence of a rewriting rule in a group.

5.3. A product soon

The current state of the work is as follows: (1) the strategic aspects of problem solving have been designed; (2) the display and selection of algebraic expressions have been implemented; (3) the creation ex nihilo of algebraic expressions and the strategic elements have been initialised; (4) a basic algorithm for matching terms with rewriting rules (with matching of sub-expressions modulo associativity and commutativity) has been obtained; (5) an advanced matching algorithm (with conceptual matching and matching modulo a factor) is nearly complete. A first version of the final product, with more or less the same functionality as APLUSIX-M0-V2, is planned for the end of 1999. Other versions, with a broader scope, are planned for 2000. Our current platform is Windows 95 with Object Pascal (Delphi).

6. References
[1] Anderson, J.R. (1983). The Architecture of Cognition. Harvard University Press.
[2] Beeson, M. (1996). Design Principles of Mathpert: Software to Support Education in Algebra and Calculus. In: Kajler, N. (Ed.), Human Interfaces to Symbolic Computation. Springer-Verlag.
[3] Bundy, A. & Welham, B. (1981). Using Meta-level Inference for Selective Application of Multiple Rewriting Rule Sets in Algebraic Manipulation. Artificial Intelligence, 16 (2).
[4] Conlon, T. (1993). PathFinder: A Programming Tool for Search Based Problem Solving. Proceedings of the Seventh International PEG Conference, Edinburgh, 74-82.
[5] Gelis, J.M. (1994). Elements of a Theory of Algebra for ILEs. Proceedings of the International Symposium on Mathematics/Science Education and Technology, San Diego. AACE.
[6] Nguyen-Xuan, A., Joly, F., Nicaud, J.F. & Gelis, J.M. (1993). Automatic Diagnosis of the Student's Knowledge State in the Learning of Algebraic Problem Solving. Proceedings of AI-ED'93.
[7] Nguyen-Xuan, A., Nicaud, J.F. & Gelis, J.M. (1997). Effects of Feedback on Learning to Match Algebraic Rules to Expressions with an Intelligent Learning Environment. Journal of Computers in Mathematics and Science Teaching, 16, 291-321.
[8] Nguyen-Xuan, A., Bastide, A. & Nicaud, J.F. (1999). Learning to Match Algebraic Rules by Solving Problems and by Studying Examples with an ILE. Proceedings of AI-ED'99.
[9] Nicaud, J.F. (1994). Modélisation en EIAO, les modèles d'APLUSIX. Didactique et acquisition des connaissances scientifiques, 14 (1-2), La Pensée Sauvage, Grenoble.
[10] Oliver, J. & Zukerman, I. (1990). DISSOLVE: An Algebra Expert for an ITS. Proceedings of ARCE, Tokyo.
[11] Pearl, J. (1984). Heuristics. Addison-Wesley, London.
[12] Singley, M.K. (1990). The Reification of Goal Structure in a Calculus Tutor: Effects on Problem-Solving Performance. Interactive Learning Environments, 1 (2), 102-123.
Artificial Intelligence in Education S.P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
Learning to Solve Polynomial Factorization Problems: By Solving Problems and by Studying Examples of Problem Solving, with an Intelligent Learning Environment

Anh Nguyen-Xuan, Anne Bastide
ESA-CNRS "Cognition et Activites Finalisees", Universite de Paris 8, 2, rue de la Liberte, 93526 Saint-Denis Cedex, France
[email protected] Jean-Francois Nicaud IRIN, 2 rue de la Houssiniere, BP 92208, 44322 Nantes cedex 3, France Jean-Francois.Nicaud@ irin.univ-nantes.fr
Abstract. Solving factorization problems is an important skill in algebra. It implies, in particular, the sub-skill of recognizing how to match an algebraic sub-expression with a transformation rule, and strategic knowledge for choosing the transformation rules to be applied. The relative merits of learning by solving problems and learning by studying examples of problem solving, as two modes of cognitive skill acquisition, are still a matter of much debate. An intelligent learning environment (ILE) was developed which can either solve problems and progressively exhibit the search tree it generates, or let the student solve the problem while alerting him(her) to his(her) mistakes. We conducted an experiment in which we compared four groups of students who learnt to solve difficult factorization problems. The first two groups learnt by studying examples; the third and fourth groups learnt by solving problems. The results suggested that learning by doing favored the acquisition of factorization problem-solving skills.
1. The theoretical framework

Several studies in cognitive psychology have demonstrated that the problem-solving process differs depending on whether the problem solver is an expert or a novice [1,2,3]. The two main differences concern the use of search strategies and the degree of operator automatisation. Experts favor forward searching, whereas novices are more prone to use a backward means-ends search strategy. Experts are able to use forward searching because they already possess a set of schemata for different categories of problems; a problem schema is a goal structure for a class of problems. As for operators, experts apply them automatically, whereas novices make frequent errors: calculation errors, for example, or the application of inappropriate operators.

In several studies, John Sweller and his colleagues have shown that when novices learn by solving problems, means-ends analysis is the only search strategy they can use, and that this type of search is very costly in terms of cognitive load [4,5,6,7]. On the other hand, when novices learn by studying examples of problem solving they do not have to search: they just have to observe the sequence of operator applications and the successive states that are generated. With this approach, the cognitive load is not very high. In the studies conducted by Sweller and his colleagues, the comparison between the two modes of learning demonstrated the superiority of learning by studying examples [6,8].

A study of the publications in this field prompts two questions:

1/ In all of the experiments conducted by these researchers, the students were given only a few operators, together with sample problems and their solutions, in a subject matter which was in fact, up until that moment, quite new to them. A short training phase (of about one hour) would follow: one group would learn by studying examples of problem solving, while the other group would have to solve these same problems by themselves. During the post-test phase, both groups would have to solve problems having the same degree of difficulty as those encountered in the training phase. The authors referred to the subjects prior to the training phase as "novices"; after training, those subjects who scored high marks in the post-test phase were considered to be "experts".
We do not agree with this way of categorizing knowledge states. In our opinion, all the subjects studied in these experiments are in fact beginners who, after undergoing a short training program, can be considered to be no more than "novices"; under no circumstances can they be called "experts". Common sense alone dictates that if a student is a complete beginner, the problems given to him(her) can do no more than give him(her) a better grasp of what is meant by a "problem" or a "solution" in a given subject matter. The first question that arises is: is learning by studying examples always the best way for a "novice" (i.e. someone who knows what is meant by a problem and by a solution in a given subject matter, and who is able to solve "easy" problems) to become an "expert" (i.e. someone who is able to solve more "difficult" problems)?

2/ In the experiments referred to above, the students who learnt by doing (that is, by solving problems) had no feedback when they made mistakes during the problem-solving exercise. Because of this, and irrespective of whether or not they found the right answer, they were unable to reach any meaningful conclusions about their own working methods. Our second question concerns this very issue of feedback: if the learner gets feedback as soon as he(she) makes an error, does this enable him(her) (i) to improve his(her) ability to recognize situations in which the use of given operators is inappropriate, and then perhaps (ii) to prune from his(her) search tree the branches which lead only to dead ends, and so clear the way to finding the correct solution path? In a previous study [9], we showed that providing feedback when errors were made had a significant effect on the way learning progressed.

The learning experiment that we describe here is an attempt to answer these two questions. We compare two modes of learning: learning by studying examples of problem solving, and learning by problem solving. Our students were novices and not beginners: they were able to solve simple problems in the chosen subject matter. These learners underwent training which led them towards becoming "experts": the learning experiment consisted of several training sessions, and the problems became progressively more difficult as the experiment proceeded. Finally, unlike classical learning experiments, the post-test phase consisted of a group of problems which were much more difficult than those given in the pre-test phase.

The area we chose to study was the solving of polynomial factorization problems. The learning experiment used an intelligent learning environment (ILE), the APLUSIX system. Our hypothesis was that learning by studying examples of problem solving probably enables the student to recognize the solution schemata more clearly, because the solved problem presented to the student contains no redundant "states". On the other hand, as far as operator automatization is concerned, learning by problem solving with an ILE is more effective, because whenever the learner makes a mistake he/she immediately gets a message explaining what is wrong with the operation he/she wishes to perform.

Sections 2 and 3 present the polynomial factorization domain and the APLUSIX system. The experiment and the results are described in section 4, and the results are discussed in section 5.
2. Definition of a polynomial factorization problem and the notion of visibility

Solving a factorization problem consists of applying a sequence of rewriting rules to a given expression; the problem is solved when a product of prime polynomials is obtained. Example:
given: (7x+2)^2 - (5x+2)(7x+2) + 2(13x^2-6)
solution: (10x+6)(4x-2)
To solve this kind of problem, four different types of knowledge are required: (1) knowledge of the rewriting rules; (2) matching knowledge; (3) calculation skills; and (4) heuristics for choosing the most promising rule to apply. Heuristics are necessary when the problem is complex. But for complex problems, one must also be able to identify the operators that may be applied to a problem state before considering which operator may be the most promising. That is, to become a skilled factorization problem solver, knowledge of rewriting rules and matching knowledge must come before heuristics.
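As a quick check (ours, not part of the original paper), the worked example above can be verified with a computer algebra system such as sympy:

    from sympy import symbols, expand, factor

    x = symbols('x')
    given = (7*x + 2)**2 - (5*x + 2)*(7*x + 2) + 2*(13*x**2 - 6)

    print(expand(given))   # 40*x**2 + 4*x - 12
    print(factor(given))   # 4*(2*x - 1)*(5*x + 3), equal to (10x+6)(4x-2)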
Knowing the rewriting rules means knowing the operators. Knowing how to match them with algebraic expressions means knowing the conditions in which the operators can be applied. In their fourth or fifth year at secondary school (14-15 years old), students learning to solve factorization problems are taught how to use four principal types of rewriting rules:
- factoring out an expression A from an expression B: AB+AC ==> A(B+C);
- applying a special identity to an expression (we distinguish between two types of special identity): Square of a sum: A^2+2AB+B^2 ==> (A+B)^2, and Difference of two squares: A^2-B^2 ==> (A+B)(A-B);
- expanding an expression;
- reducing an expression.

For the rule "Factor out A from B" and the special identities, we show below why matching knowledge can be difficult to acquire. To that end, we define the notion of "visibility". Matching of the rewriting rules can be either direct or extended. A direct matching is a one-to-one mapping between the elements of the left-hand side of the rewriting rule and an expression. Matching is said to be extended when an expression can be mapped onto the left-hand side of a rewriting rule provided that one or more intermediate calculations are performed beforehand. We use the concept of matching visibility to characterize an expression to which a rule can be applied. An expression that can be directly matched to a rule is characterized as evident; it has maximum visibility. For instance, x^2-3^2 directly matches A^2-B^2 (Difference of two squares); its visibility is evident.

We distinguish four basic types of algebraic extended matching: commuted (the order of the terms differs from the one given in the formal rule), opposite (the implicit intermediate rewriting involves factoring out -1), one-factor (the implicit intermediate rewriting involves factoring out a constant k), and two-factor (the implicit intermediate transformations involve factoring out a constant k for one factor and a constant k' for the other factor). In addition to these four basic types, we can identify one more type of extended matching, which arises when rewriting rules are applied to sub-expressions. We call it "dispersion": it happens when the members of the sub-expression to which a rule can be applied are not assembled together; for instance, the sub-expression 9x^2-4 in 9x^2+2(8+18x^2-24x)-4. Two or more of these five types of extended matching may be combined, and such combinations define the degree of visibility of an expression with regard to a rewriting rule. It is sometimes difficult to recognize an expression or a sub-expression to which a particular rule may apply; therefore, different levels of matching skill are required. Before presenting the experiment, we briefly describe our ILE.

3. The ILE "APLUSIX"

The APLUSIX learning environment uses a knowledge base, called the Reference Knowledge State [10], which encompasses human knowledge in heuristics for searching and rule matching. Several experimental studies [9,11,12] were conducted in order to: (i) develop an automatic program for diagnosing student knowledge states during the learning process; and (ii) improve the way the system interacts with the student. APLUSIX provides the student with two learning modes: learning-by-example (i.e., learning by observation) and learning-by-doing (i.e., learning by problem solving).
- In the learning-by-example mode, the system reveals, in a step-by-step display, a search tree generated with the current Reference Knowledge State [cf. 13]. In this mode, the learner can obtain an explanation of each step taken by the system. For example: "3x-1 is a common factor in (4-12x)(x-7)-x(9x-3) because (4-12x)(x-7) = -4(3x-1)(x-7) and -x(9x-3) = -3x(3x-1)".
- In the learning-by-doing mode, the learner expands his(her) own search tree by choosing, at each step, an expression and an applicable rewriting rule. In this mode, the learner can choose from a menu of possible formal rewriting rules, or ask for help in order to obtain suggestions about the options open to him.
There are four types of rules: factor, expand, reduce, and expand-and-reduce. The rule "factor" must be specified using one of four sub-categories, as shown in the menu. APLUSIX uses its knowledge base to test the validity of the transformation the student wants to perform [cf. 10 for more details]. If the transformation is valid, the system performs it; that is, the system generates the new expression, so that the student is freed from the task of calculation (cf. note 1). If the transformation is not valid, the system tells the student why. Figure 1 presents an example of an error message given by the system.
Figure 1. An example of an error message.
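To illustrate the kind of validity test involved (a sketch under our own assumptions, not the APLUSIX code), checking a student's "factor out A" step amounts to polynomial division of each term, with the failing terms available for an error message like the one in Figure 1:

    from sympy import symbols, Add, Mul, div, expand

    x = symbols('x')

    def check_factor_out(expr, a):
        """Validate a student's 'factor out a from expr' step on a sum.
        Returns (True, factored_form) if every term admits a as a polynomial
        factor, otherwise (False, offending_terms), from which a diagnostic
        message can be built. (Numeral factors such as 'factor out 3' would
        additionally need integer-content handling.)"""
        offending, cofactors = [], []
        for term in Add.make_args(expr):
            q, r = div(expand(term), a, x)   # term == q*a + r
            if r == 0:
                cofactors.append(q)
            else:
                offending.append(term)
        if offending:
            return False, offending          # terms lacking a as a factor
        return True, Mul(a, Add(*cofactors), evaluate=False)

    # The explanation quoted earlier: 3x-1 is a common factor of
    # (4-12x)(x-7) - x(9x-3).
    print(check_factor_out((4 - 12*x)*(x - 7) - x*(9*x - 3), 3*x - 1))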
When the student asks for help, the options generated by the system are based on a strategic classification of the possible actions (for more details, cf. [11]). For example: "go to step #2, expand the sub-expression 3x(3x+2)+1." When the student thinks that he/she has solved the problem, he/she can click on the "solved" button of the menu: the system then either displays "OK" or "the problem is not yet solved."

4. The experiment and the results

4.1. The experiment

The participants were 14-15 year-old students in their fifth year of secondary school; all belonged to the same school. The experiment comprised three phases: pre-test, learning, and post-test.

1/ In the pre-test phase, 102 students were given 21 simple factorization problems to solve by themselves. The matching was either evident or had been made "opaque" by a single extended matching. The pre-test was used to select students with regard to their initial knowledge. Students who were either too weak or too expert (i.e., those who made, respectively, more than 30 or fewer than 6 errors) were eliminated from the experiment. Eighty students were thus selected for the learning and post-test phases. They were randomly assigned to one of four groups in a 2 by 2 factorial design: two learning modes (by studying examples or by solving problems), and the presence or absence of help from the system. In the learning-by-studying-examples mode, help consisted in allowing the students to ask for explanations about an action the system had just performed; help in the problem-solving mode consisted in allowing the students to ask the system for advice.

2/ In the learning phase, the problem-solving groups (G3 and G4) solved 28 problems, with or without the help option. The 28 problems were made up of 14 pairs of progressively more difficult isomorphic problems. The students in the two learning-by-studying-examples groups (G1 and G2) did not simply study the examples: they had to study both problems from each pair given to the problem-solving groups before being given a third isomorphic problem to solve by themselves, without help. Consequently, the learning-by-observation groups not only had the opportunity to study examples at their own speed, but also the opportunity to solve extra problems by themselves.
3/ In the test phase, all participants solved 24 problems without the help option. The 24 problems were of two types: 12 were isomorphic to a set of the most difficult problems that had been studied in the learning phase, and 12 were new problems, more difficult than the problems already encountered in the learning phase. The difficulty did not concern the minimum number of steps necessary to solve the problem; it concerned the degree of visibility of the rule "Factor out A from B".

The participants worked in groups of four or five with the same supervisor (Anne Bastide). An automaton was implemented in the system to monitor the experiment (for more details, cf. [9]). Each individual, though, worked at his(her) own personal computer. Working at their own pace, the students took between four and six 55-minute sessions to go through all of the problems in the three phases of the experiment. The students were not allowed to take notes.

4.2. Results

The dependent variables we considered were the individual number of matching errors and the individual number of steps taken to solve a problem. The number of matching errors was a measure of matching skills, and the number of steps taken to solve a problem was a measure of the adequacy of the schema used to reach the solution. We did not consider all possible matching errors; only the most significant ones were taken into account:
- false recognition of a factor A as a common factor of B;
- false recognition of an expression as being matchable to a special identity rule;
- giving a false value for A and/or for B when applying a special identity rule to an expression;
- trying to reduce an expression that was already reduced;
- trying to expand an expression that was already expanded.

4.2.1. The pre-test phase

The sample we considered in our statistical analysis comprised four groups of 20 participants each. There were 21 problems in the pre-test. Table 1 presents the means and standard deviations for the four groups.

                                                 Number of errors    Number of steps
                                                 Mean    S.D.        Mean    S.D.
    Group 1 (studying examples, no explanation)  12.15   4.69        100.1    9.9
    Group 2 (studying examples, explanation)     13.20   6.27        110.1   16.8
    Group 3 (problem-solving, no help)           12.75   5.89        102.5   14.9
    Group 4 (problem-solving, help)              12.85   5.89        103.5   14.1

Table 1. Means and standard deviations for the number of errors and the number of steps in the pre-test phase (20 participants per group; 21 problems).
ANOVA results showed that, at the p=.05 level, there was no statistically significant difference among the four groups in the pre-test phase, either for the number of errors or for the number of steps taken to solve the given problems. There was no significant interaction between "learning mode" and "help".

4.2.2. The learning phase

In the learning phase, the two groups who learned by studying examples had to study two examples before solving a third isomorphic problem by themselves; the total number of problems they solved during this phase was 14. The two groups who learned by problem solving had 14 pairs of isomorphic problems; for these two groups, we counted the number of errors and the number of steps to solution for the second problem of each pair. Table 2 presents these data.

The principal results of 2 by 2 factorial ANOVAs (two learning modes, each with a help and a no-help option) were as follows:
- In the learning phase there was a significant effect (at the p=.05 level) of the learning mode on the number of errors made: the learning-by-problem-solving mode produced significantly fewer errors. Whether the student could ask for explanation or advice did not significantly differentiate the groups, and there was no interaction effect between "learning mode" and "help".
- However, with regard to the dependent measure based on the number of steps taken to solve the problems, there was no significant difference between either the "learning mode"
factors or between the "help" factors. Nor indeed was there any interaction between these two independent factors.

                                                 Number of errors    Number of steps
                                                 Mean    S.D.        Mean    S.D.
    Group 1 (studying examples, no explanation)   8.15   5.34         81.8    9.0
    Group 2 (studying examples, explanation)      8.45   4.42         83.6   10.9
    Group 3 (problem-solving, no help)            5.95   3.65         82.8   10.2
    Group 4 (problem-solving, help)               4.75   3.19         77.9    6.05

Table 2. Means and standard deviations for the number of errors and the number of steps in the learning phase (20 participants per group; 14 problems: the second problem of each pair for groups 3 and 4, the third problem for groups 1 and 2).
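For readers who wish to reproduce this style of analysis, a 2 by 2 factorial ANOVA can be run as follows. This is a sketch using synthetic data whose cell means loosely echo Table 2; it is not the authors' data or code:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    rng = np.random.default_rng(0)
    rows = []
    # Hypothetical cell means (illustration only), 20 students per cell.
    cell_means = {("examples", "no"): 8.15, ("examples", "yes"): 8.45,
                  ("solving", "no"): 5.95, ("solving", "yes"): 4.75}
    for (mode, help_), m in cell_means.items():
        for _ in range(20):
            rows.append({"mode": mode, "help": help_,
                         "errors": max(0.0, rng.normal(m, 4.0))})
    df = pd.DataFrame(rows)

    # Two-way ANOVA: main effects of learning mode and help, plus their
    # interaction, on the number of errors.
    model = ols("errors ~ C(mode) * C(help)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))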
As far as matching skills are concerned, the results suggest that studying two examples of problem solving prior to solving a third isomorphic problem oneself is not as useful as solving one problem first and an isomorphic one second.

4.3. The post-test phase

The post-test phase comprised 12 problems isomorphic to the most difficult problems encountered by the students during the learning phase, plus 12 new problems that were more difficult than the problems encountered in the learning phase. However, the new problems' greater degree of difficulty concerned matching knowledge and not problem schemata. We separated these two groups of problems and performed 2 by 2 factorial ANOVAs on the raw error counts and the problem-solving path lengths. Table 3 presents the data for this phase.

First, concerning the errors, there was a significant difference (at the p=.05 level) between the modes of learning: for both the isomorphic problems and the new problems, the problem-solving learning mode produced significantly fewer errors. There was no significant effect for the "help" factor, nor was there any significant interaction between the two independent factors "learning mode" and "help".

                                              Number of errors             Number of steps
                                              Isom.         New           Isom.           New
                                              Mean   S.D.   Mean   S.D.   Mean    S.D.    Mean     S.D.
    Group 1 (studying ex., no explanation)    5.69   4.06   9.65   6.20   77.65    8.77   103.00   21.95
    Group 2 (studying ex., explanation)       8.45   4.59   8.90   5.55   76.65   11.10   100.95   22.08
    Group 3 (problem-solving, no help)        2.59   3.64   4.90   4.20   72.10    5.21    94.85   11.70
    Group 4 (problem-solving, help)           3.43   4.65   6.70   4.75   72.30    7.96    94.80   16.51

Table 3. Means and standard deviations (S.D.) for the number of errors and the number of steps in the post-test phase, for the 12 isomorphic (Isom.) and the 12 new problems (20 participants per group).
Second, concerning the problem path length, the mode of learning had a significant effect with regard to the 12 isomorphic post-test problems: the mean path length was significantly shorter for groups G3 and G4 than for groups G1 and G2. Although the same tendency was observed for the new problems, the difference was not statistically significant at the .05 level. The effect of the "help" factor was not significant and, once again, there was no interaction between the two independent factors.

Nevertheless, with the exception of Group 2, the new problems gave rise to significantly more errors and longer problem-solving paths than the isomorphic problems. This was not an unexpected result, but simply a demonstration that a problem which had already been encountered was easier to solve than a brand new one.

5. Discussion

Our study shows that, in the case of acquiring expertise in polynomial factorization problem solving, learning by problem solving is more effective than learning by studying examples. This result contradicts the results of several studies conducted by Sweller and his colleagues. Nevertheless, we must point out at least four differences between Sweller's research work and our study.

Firstly, as we argued in the introduction, in the domains studied by Sweller, expertise consists in acquiring the relevant schemata for conducting a search in the problem space, and in operator automatisation. We consider that operator automatisation comprises two aspects:
knowledge of what operators can be applied to a current state, and knowledge of how to apply them. The first aspect consists in recognizing the application conditions of the operator; the second consists in applying the operators correctly. In the domains of simple geometry and physics problem solving studied by Sweller, the operator application conditions do not increase in difficulty, so only the second aspect of operator automatisation has to be acquired. In factorization problem solving, expertise also comprises both the acquisition of appropriate schemata for guiding the search and operator automatisation. However, in our experiment, the application conditions of the operators become less and less visible as the problems become more difficult, and the application of the operators also becomes more and more complex: with complex expressions, the calculations become more difficult. By using our ILE, which takes care of calculations, we were able to concentrate solely on the first of the two aspects of operator automatisation: the acquisition of expertise in recognizing operator application conditions. We are therefore led to conclude that Sweller's studies on learning concerned the second aspect of operator automatisation, whereas our study concentrated on the first.

The second difference is related to the issue discussed above; it concerns the effect of feedback in learning by problem solving. In a classical (classroom) situation, students are given exercises to solve by themselves before receiving feedback from the teacher. The feedback is either the correct solution displayed on the blackboard, or comments and corrections marked on the students' scripts. In both cases the feedback is not immediate and, because the work he/she had to do is over, the student may or may not pay attention to the teacher's corrections and comments. In problem solving with an ILE, on the other hand, feedback is immediate at each step of the solution path. In order to continue, the student has to take account of the errors highlighted by the system and think about how to correct them. The good results obtained with the learning-by-problem-solving mode may be due, at least in part, to this immediate feedback.

The third difference we wish to highlight concerns the degree of expertise acquired in the learning experiments. The learning experiments conducted by Sweller and his colleagues were short experiments in which the learners were real beginners who learned to solve problems they had never solved before. The students studied examples or solved problems during a short time (one session), so it cannot be said that they acquired any real expertise in the domain; they simply learned to solve simple problems that they were not able to solve before. In fact, the problems did not greatly increase in difficulty between the pre-test and the post-test. Exactly the opposite applied to our experiment: the participants already had some knowledge of factorization problem solving, knew the operators, and knew how to apply them in simple problems. The learning experiment resembled a training program designed to help the students acquire a certain level of expertise, since the problems given during the learning phase and in the post-test phase were much more complex than those given in the pre-test phase.
The fourth difference to be discussed concerns the issue of "cognitive load", which is the core notion of Sweller's theory. This theory claims that in learning by solving problems, learners cannot but devote a large part of their cognitive resources to searching, and cannot but use the very costly (from a cognitive-load point of view) means-ends search strategy. It follows that learning by studying examples relieves learners from expending most of their cognitive resources on such searching, so that full attention can be devoted to the problem-solving method displayed by the examples, and to the study of the choice of operators at each problem state. This analysis is relevant in the case of the simple problems in geometry, physics and algebraic equations studied by Sweller, in which the goal state was precisely defined, so enabling a means-ends backward search. In the case of factorization problems, however, the solution is defined only in a very general way, as "a product of prime polynomials". A backward search therefore cannot really be used: the problem solver can only search forward. If the problem solver is an expert, s/he may possess heuristics for choosing the most promising operators; for instance, one heuristic for choosing between rewriting rules is: prefer the rule "Factor out A from B" to the rule "Expansion". On the other hand, a heuristic approach does not always lead to the best choice. With regard to this fourth difference between Sweller's studies and our own, we can therefore conclude that our study introduces a nuance to the claim that novices cannot but use means-ends searching for solving problems, because there are problem domains where a
backward search strategy cannot be used. In such domains, the most important knowledge to acquire in the first place is that concerning operator application conditions.

Finally, we would like to point out two results that are coherent with Sweller's theory. Firstly, we have seen that, within the context of our ILE, the learning-by-problem-solving mode is the more effective. This advantage extends to the use of matching knowledge in both the isomorphic and the new problems of the post-test phase. As for the acquisition of problem schemata, the learning-by-problem-solving mode proved to have no advantage over the learning-by-observation mode when the students were faced with new problems. Indeed, we believe that, with regard to the acquisition of problem schemata for very complex problems, learning by studying examples is more helpful than learning by doing: when learning by studying examples, there is no risk of losing sight of the problem schema amidst the entangled undergrowth of fruitless solution paths.

Secondly, by freeing the students from the task of calculation, which is cognitively very demanding, both learning modes succeeded in guiding the students to a level of expertise in factorization problem solving that cannot be attained at school, where students learn to solve factorization problems with paper and pencil. Indeed, at the end of the learning experiment, most of our students were able to solve, with minimal path lengths and without matching errors, very difficult problems not usually given in classroom paper-and-pencil situations.

Note 1: Since Sweller and his colleagues had shown that learning by studying examples is more efficient only when the student does not have to integrate too many sources of information [6], we had to take this aspect into account in our own experiment. Our ILE does so by relieving the student of the task of calculation, which could hamper the learning of matching skills in either of two ways: first, errors of calculation necessarily lead to wrong solutions; second, calculation is a cognitively costly task that may distract the learner's attention from the important task of analyzing the steps towards the solution.
References
[1] J.H. Larkin, J. McDermott, D. Simon and H.A. Simon, Models of Competence in Solving Physics Problems. Cognitive Science, 4 (1981), 317-345.
[2] J.H. Larkin, Enriching Formal Knowledge: A Model for Learning to Solve Textbook Physics Problems. In J.R. Anderson (Ed.), Cognitive Skills and their Acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates, 1981.
[3] J. Sweller, R.F. Mawer and M.R. Ward, Development of Expertise in Mathematical Problem Solving. Journal of Experimental Psychology: General, 112 (1983), 639-661.
[4] G. Cooper and J. Sweller, Effects of Schema Acquisition and Rule Automation on Mathematical Problem-Solving Transfer. Journal of Educational Psychology, 79 (1987), 347-362.
[5] J. Sweller, Cognitive Load During Problem Solving: Effects on Learning. Cognitive Science, 12 (1988), 257-285.
[6] M. Ward and J. Sweller, Structuring Effective Worked Examples. Cognition and Instruction, 7 (1990), 1-39.
[7] P. Ayres and J. Sweller, Locus of Difficulty in Multistage Mathematics Problems. American Journal of Psychology, 103 (1990), 167-193.
[8] R.A. Tarmizi and J. Sweller, Guidance During Mathematical Problem Solving. Journal of Educational Psychology, 80 (1988), 424-436.
[9] A. Nguyen-Xuan, J.F. Nicaud and J.M. Gelis, Effect of Feedback on Learning to Match Algebraic Rules to Expressions with an Intelligent Learning Environment. Journal of Computers in Mathematics and Science Teaching, 16 (1997), 291-321.
[10] J.F. Nicaud, Reference Network: A Genetic Model for Intelligent Tutoring Systems. Proceedings of ITS'92, Montreal (1992), 351-359.
[11] J.F. Nicaud, Modélisation en EIAO, les modèles d'APLUSIX. In N. Balacheff and M. Vivet (Eds.), Didactique et Intelligence Artificielle. Grenoble: La Pensée Sauvage (1994).
[12] A. Nguyen-Xuan, J.F. Nicaud, J.M. Gelis and F. Joly, Automatic Diagnosis of the Student's Knowledge State in the Learning of Algebraic Problem Solving. Proceedings of AI-ED'93, Edinburgh (1993), 489-496.
[13] J.F. Nicaud, D. Bouhineau, C. Varlet and A. Nguyen-Xuan, Toward a Product for Teaching Algebra. Proceedings of AI-ED'99, Le Mans, France (1999).
Foundational Issues for AI-ED
Artificial Intelligence in Education S.P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
The Plausibility Problem: Human Teaching Tactics in the 'Hands' of a Machine

Benedict du Boulay, Rosemary Luckin & Teresa del Soldato+
School of Cognitive and Computing Sciences, University of Sussex, UK
Email: [email protected]
+ Agora Professional Services, Kingston upon Thames, UK
Abstract: This paper explores the Plausibility Problem identified by Lepper and his colleagues, namely whether teaching tactics and strategies which work well for expert human teachers can also work for machine teachers. We offer examples of the plausibility problem and argue that the issue arises partly from students' impoverished expectations of what intelligent learning and teaching systems can achieve, and partly from expectations about who should play what role in a learning interaction with a machine. We believe the problem will be reduced, though not necessarily eradicated, if systems are properly integrated within the overall educational context in which they are to be used, though further work is needed to establish this.

1 Introduction

There are three principled methodologies for developing the teaching expertise in AIED systems. The first is the empirical observation of human learners and human teachers, followed by an encoding of effective examples of the teacher's expertise, typically in the form of rules. An influential early example of this methodology was "Socratic Tutoring" [6], which provides a number of detailed teaching tactics for eliciting a learner's misconceptions in some domain and then confronting her with them. A more recent example is provided by Lepper et al. [14], who analysed the methods that human tutors use to maintain students in a positive motivational state with respect to their learning. Ohlsson [18] provides an analysis of the great variety of teaching actions in a versatile teacher's repertoire, and berates AIED for implementing only a tiny proportion of this versatility. Bloom [2] compared the effectiveness of a number of general teaching strategies in terms of learning outcomes, and set adaptive systems the goal of increasing mean learning gains by two standard deviations compared to conventional classroom teaching.

The second methodology starts from a learning theory and derives appropriate teaching tactics and strategies from that theory. Conversation Theory [19] and its reification in various teaching systems is an example of this approach. As with Socratic Tutoring, Conversation Theory is concerned essentially with epistemology rather than with the affective aspects of teaching and learning. It seeks to ensure that the learner constructs a multifaceted understanding of a domain that allows her to describe (to herself or to others) the inter-relationships
between concepts. In some ways it is echoed by the "self-explanation" view of effective learning [4]. An example of the second methodology that partially addresses some of the affective issues is Contingent Teaching [23]. Here the idea is to maintain the learner's agency in a learning interaction by providing only sufficient assistance at any point to enable her to make progress on the task. The evaluation of this strategy in the hands of non-teachers who had been deliberately taught it shows that it is effective, but it sometimes goes against the grain for experienced teachers, who often wish to provide more help at various points than the theory permits [24].

The third methodology is an amalgam of the above two. This builds a computational model of the learner or of the learning process and derives a teaching strategy, or constraints on teaching behaviour, by observing the model's response to different teaching actions. For example, VanLehn et al. [22] compared two strategies for teaching subtraction to a production-rule model of a subtraction learner and concluded, on the basis of the amount of processing engaged in by the model, that the "equal additions" strategy was more effective than the more widely taught "decomposition" strategy. With a similar general methodology VanLehn [21] derived "felicity conditions" on the structure of tutorial examples, for instance that they should contain only one new subprocedure.

In addition to any problems of educational effectiveness in practice, all three of these methodologies are vulnerable to what Lepper et al. [14] call the "Plausibility Problem":

"Even if the computer could accurately diagnose the student's affective state and even if the computer could respond to that state (in combination with its diagnosis of the learner's cognitive state) exactly as a human tutor would, there remains one final potential difficulty: the plausibility, or perhaps the acceptability, problem. The issue here is whether the same actions and the same statements that human tutors use will have the same effect if delivered instead by a computer, even a computer with a virtually human voice." [14] (page 102)

In other words, will human teaching tactics and strategies, or tactics derived from learning theories or learning systems, work effectively in an intelligent learning or tutoring environment?

It is important to stress that this paper is not arguing for an "anti" Artificial Intelligence in Education (AIED) stance. Indeed, although there have not been very many evaluations of the educational utility of the individual adaptiveness implemented in AIED systems, those that are reported offer grounds for optimism, see e.g. [11]. For brief surveys of AIED evaluations see [9,20].

One response to the plausibility problem has been an increasing interest in the development of learner companion systems of various kinds, see e.g. [3]. Here the idea is that the human learner has access to a (more or less) experienced fellow learner who can provide help, act as a learning role model, or, through its mistakes, act as a reflective device for the human learner. Most of these systems start from the premise that learners need interactions with more than just teachers, and that certain sorts of interaction are better conducted with a peer than with a teacher. Of course, computer-based companions raise their own version of the "plausibility problem" compared to their human counterparts.

This paper explores the plausibility issue by reference to two examples from the work of the authors.
The first example offers an account of students finding certain human teacher-like behaviours unacceptable when exhibited by a machine. The second example is used to argue that this sense of what is and is not acceptable is strongly conditioned by the rather narrow range of machine behaviours that students have actually experienced. Finally it suggests that as intelligent learning environments and teaching systems find their
way into the mainstream of education, the plausibility problem is likely to diminish, especially if the systems are properly embedded in the educational context in which they are to operate. However, even careful embedding may not be enough to undermine deep preconceptions about what a "reasonable" role for a machine teacher should be.

2 Denial of Help by the System

Students are used to being observed by a teacher while they struggle with some problem, for example a mathematics problem, and yet do not receive help. It may be frustrating for them to think that if only the teacher proffered some assistance the intellectual struggle could be ended sooner, but most will accept that in many circumstances there is value in trying to solve a problem for themselves.

Del Soldato [7,8] implemented several of the motivational tactics derived by, e.g., [10,12,13,14] in a prototype tutor to teach rudimentary debugging of Prolog programs. Her system had three sets of teaching rules. The first set of (problem-domain) rules was concerned with helping the student move through the curriculum of debugging problems, from the easy to the more difficult, respecting prerequisite links. A second set of (motivational) rules was concerned with maintaining the students' sense of confidence and control. Sometimes these two sets of rules would make similar suggestions to the tutoring system about the difficulty of the next problem to be given to the student, or about the level of specificity of help that should be provided in response to a request from the student. But there were situations where the problem-domain and the motivational rules offered opposite advice. To reconcile such occasional conflicts, a third set of rules tried to meld the suggestions from the other two sets into a single coherent strategy, in practice giving priority to the motivational rules if there was an irreconcilable clash.

The system (MORE) was evaluated by comparing a version with the motivational rules switched on against one in which they were disabled. The version using motivational rules was generally liked by students, but two negative reactions are noteworthy. One of the rules in the system was designed to prevent the student prematurely abandoning a problem and moving on to the next one if the system believed that the student was not exhibiting enough "effort", as measured by the number of actions the student had taken in the partial solution.

"One subject was showing signs of boredom from the start of the interaction. ... After a little effort trying to solve a problem, the subject gave up and the tutor encouraged him to continue and offered help. The subject kept working, grumbling that the tutor was not letting him leave. When comparing the two versions of the tutor he recalled precisely this event, complaining that he had not been allowed to quit the interaction." [7] (page 77)

Further rules were concerned with deciding how specific a help message should be in response to a help request, not dissimilar to the rules in Sherlock, see e.g. [15], or indeed to the Contingent Teaching strategy [23]. However, in some circumstances the help system refused to offer any help at all in response to a request from the student, in the belief that such students needed to build up their sense of control and were becoming too dependent on the system.
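Before turning to the students' reactions to these refusals, here is a minimal sketch of how conflicting advice of this kind might be reconciled. This is our own reconstruction, with hypothetical names, scales and melding policy; del Soldato's actual rules were richer:

    from dataclasses import dataclass

    @dataclass
    class Suggestion:
        difficulty: int    # proposed difficulty of the next problem (1 = easiest)
        help_level: int    # proposed specificity of help (0 = none)

    def reconcile(val_domain: int, val_motiv: int) -> int:
        """Average when the two rule sets roughly agree; on an
        irreconcilable clash, the motivational suggestion wins."""
        if abs(val_domain - val_motiv) <= 1:
            return round((val_domain + val_motiv) / 2)
        return val_motiv

    def meld(domain: Suggestion, motivational: Suggestion) -> Suggestion:
        return Suggestion(
            difficulty=reconcile(domain.difficulty, motivational.difficulty),
            help_level=reconcile(domain.help_level, motivational.help_level),
        )

    # Domain rules want a harder problem with terse help; motivational
    # rules, protecting confidence, want an easier problem and fuller help.
    print(meld(Suggestion(difficulty=4, help_level=0),
               Suggestion(difficulty=2, help_level=2)))
    # -> Suggestion(difficulty=2, help_level=2)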
"The subjects who were refused a requested hint, on the contrary, reacted strongly against the tutor's decision to skip helping (ironically exclaiming "Thank
you" was a common reaction). Two subjects tried the giving-up option immediately after having had their help requests not satisfied. One case resulted in the desired help delivery (the confidence model value was low), but the other subject, who happened to be very confident and skilled, was offered another problem to solve, and later commented that he was actually seeking help." "One of the subjects annoyed by having his help request rejected by the tutor commented: "I want to feel I am in control of the machine, and if I ask for help I want the machine to give me help". When asked whether human teachers can skip help, the answer was: "But a human teacher knows when to skip help. I interact with the human teacher but I want to be in control of the machine". It is interesting to note that the subject used to work as a system manager." [7](pages 76–77) In both these cases the student was surprised that the system behaved in the way that it did — not we believe because the system's response was thought to be educationally unwarranted, but because it was "merely" a machine and it was not for it, as a machine, to frustrate the human learner's wishes. 3 Refusal of Help by the Users Learner's expectations are an important factor of the plausibility problem. Increasingly learners are exposed to computers in their learning and in other aspects of their lives. They absorb the cultural computation conventions and facilities for giving help. These build up expectations of the degree of focussed assistance that they might reasonably expect. In the second example the plausibility problem may be responsible for results which confounded expectations. There are a number of differences between this system and that of del Soldato, described above. It was aimed at school children, specifically designed to be similar to other educational systems they had used and was evaluated in the children's everyday class. It also explored a topic — simple ecology — that the children were learning at school and, in the versions that decided how helpful to be, was designed to ensure that the child succeeded as far as possible, even if this meant that the system did most of the work. Three versions of a tutorial assistant which aimed to help learners aged 10 – 11 years explore food webs and chains were implemented within a simulated microworld called the Ecolab [16]. The system was developed to explore the way in which Vygotsky's Zone of Proximal Development might be used to inform software design. The child can add different organisms to her simulated Ecolab world and the complexity of the feeding relationships and the abstractness of the terminology presented to the learner can be varied. The simulated Ecolab world can be viewed differently, for example in the style of a food web diagram, as a bar chart of each organism's energy level or as a picture of the organisms in their simulated habitat. The activities the learner was required to complete could be "differentiated" (i.e. made easier) if necessary and different levels (i.e. qualities) of help were available. One version of the system: VIS maintained a sophisticated learner model and took control of almost all decisions for the learner. It selected the nature and content of the activity, the level of complexity, level of terminology abstraction, differentiation of the activity and the level of help. The only option left within the learner's control was the choice of which view to use to look at her Ecolab. 
A second version of the assistant, WIS, offered learners suggestions about activities and differentiation levels. They were offered help, the level of which was decided on a contingently calculated basis [23]. They could choose to reject the help offered or select the "more help" option. The third system variation was called NIS. It offered two levels of help
to learners as they tried to complete a particular task. The first level consisted of feedback and an offer of further help. The second level, which was made available if the child accepted this offer, involved the assisting computer completing the task in which the child was currently embroiled. Of the three systems, NIS offered the smallest number of different levels of help and allowed the child the greatest freedom of choice. She could select what she wanted to learn about, what sort of activity she wanted to try and how difficult she wanted it to be, and then accept help if she wanted it. The choices were completely up to the individual child, with not even a suggestion of what might be tried being offered by the system.

Three groups of 10 children (matched for ability) worked with the three systems. Outcomes were evaluated both through pre/post-test scores on a test of understanding of various aspects of food webs and chains, and via an analysis of what activities the children engaged in and how much help they sought and received. Pre/post-test comparisons showed that VIS produced greater learning gains than WIS and NIS; see [16,17] for details. Our focus here is not on the learning gains but on the help-seeking behaviour of the students.
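The "contingently calculated" help level in WIS follows Wood's contingency principle [23]. The sketch below is our own minimal rendering; the level bounds and step size are assumptions, not the Ecolab implementation:

    def next_help_level(current: int, succeeded: bool,
                        lowest: int = 0, highest: int = 4) -> int:
        """Contingency rule in the spirit of [23]: offer more specific help
        after a failure, less after a success, within fixed bounds.
        (A sketch, not the actual Ecolab code.)"""
        if succeeded:
            return max(lowest, current - 1)
        return min(highest, current + 1)

    # Two failures push the offered help up; a success backs it off again.
    level = 1
    for succeeded in (False, False, True):
        level = next_help_level(level, succeeded)
        print(level)   # 2, then 3, then 2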
3.1 Help Seeking
It is clear from the records logged by the systems of each child's interactions that none of the NIS users accepted the option of seeking more help when offered feedback. There is a clear and typical pattern within the interactions of NIS users: actions are attempted, feedback is given with the offer of help, and the help is not accepted. The action is re-attempted and, once completed successfully, it is repeated, interspersed with view changes and further organism additions at differing rates of frequency. Only one of the NIS users asked for a differentiated activity, and only two attempted to interact at anything other than the simplest level of complexity or terminology abstraction. The child who tried the differentiated activities chose the highest level of differentiation and, once the activities were done, he returned to the typical NIS pattern.

The lack of help seeking is particularly marked in the two children who opted to try the most advanced level of interaction. Both made errors in their initial attempts at completing the food-web building action selected, but neither opted to take more help when offered. Few activities were attempted, and those that were chosen were accessed with the lowest level of differentiation. The same food-web building activity was repeated in both sessions of computer use, and in both sessions errors were made. The presence of these errors and the apparent desire to tackle more complex concepts suggest that the children were willing to move beyond what they already understood. However, the lack of collaborative support restricted their opportunities for success and their progress was limited. What could have been a challenging interaction became a repetitive experience of limited scope.

Unlike the NIS users, all the WIS users accepted help above the basic level, and the majority used help of the highest level and then remained at this level. A typical WIS approach would be to try an action, take as much help as needed to succeed with it, and then repeat it before trying another, different action. Activities were requested with differentiation, in the majority of cases at the highest level. Without question the WIS users were more willing to attempt actions with which they were going to need help. There were members of this group who progressed through the curriculum both in terms of complexity and in terms of terminology abstraction. This is a direct contrast to the NIS user group.

The clear difference between one group's willingness to use help over and above simple feedback (WIS) and the other group's complete lack of help seeking is interesting. The help instances for the NIS users were either simple feedback or a demonstration of the particular action being attempted: equivalent to the highest level of help in WIS or VIS. All but one of the NIS users made mistakes and were given feedback, but none of them accepted the offer
of further help. It is difficult to explain this startling lack of help-seeking behaviour, and any attempt to do so is necessarily speculative.
4 Educational Context
The only difference between the WIS and NIS systems with regard to differentiation or the presentation of help lies in the way that WIS suggests that the user try a particular level of differentiation for an activity or ask for help. This policy of offering suggestions was not universally successful. WIS users received suggestions about which activities they should try; these were, however, accepted less often than the suggestions about the differentiation of an activity. If a suggestion was enough to allow a child to accept an easier activity, then it seems reasonable to consider the possibility that, without such suggestions, the NIS users viewed choosing a more difficult activity as somehow better and therefore what they should be attempting. As part of the design of the experiment, note was taken of the computer programs the children had experienced previously. One tentative explanation of the different behaviours is that the children did not believe that either asking for more help or asking for an easier activity would be successful. The WIS users received suggestions, and once the higher levels of help were experienced they were taken up and used prolifically. In this sense the WIS system demonstrated its plausibility as a useful source of assistance in a way that the children never gave the NIS system a chance to show.
A further factor consistent with this help-seeking behaviour is found in the observation that none of the children accessed the system help menu or system help buttons. These were available to explain the purpose of the various interface buttons and the way that action command dialogues could be completed. The children had all used a demo of the system, which allowed them to determine the nature of the interface, and none reported problems at the post-test interview. However, when observing the children using the system it was clear that there were occasions when they were unsure about a button or a box and yet did not use the help button provided. This may well be an interface issue which needs attention in any further implementations of VIS. However, it may also be part of the same plausibility problem.
There is another facet to the Plausibility Problem, besides the violation of expectations about what a machine may do (MORE) or what a machine can do (Ecolab). This is related to the way that the intelligent system is used within the overall educational context. The following example from Anderson's work illustrates the issue. One of Anderson's most recent evaluations concerns a system designed to be used in Pittsburgh high schools [11]. The Practical Algebra Tutor (PAT) is designed to teach a novel applications-orientated mathematics curriculum (PUMP, the Pittsburgh Urban Mathematics Project) through a series of realistic problems. The system provides support for problem-solving and for the use of a number of tools such as a spreadsheet, grapher and symbolic calculator. Of special note here, apart from the positive evaluation of the system, is the attention that was paid to the use of the Tutor within the classroom. The system was used not on a one-to-one basis but by teams of students, who were also expected to carry out activities related to the use of PAT, but not involving PAT, such as making presentations to their peers.
In this situation the educational interactions involved the system almost as a third party, or even as a "conversation piece", so students were not so starkly faced with the problem of dealing with the machine as the sole provider or withholder of help.
5 Conclusions
We have argued that Lepper and his colleagues were correct to raise the issue of the Plausibility Problem and that in our work we have encountered examples of it. However, we believe that one aspect of the plausibility problem derives from students' expectations of what intelligent teaching systems can actually achieve. Such a conclusion must necessarily be very tentative, as few examples of such systems have found their way into the classroom, and so most students' beliefs about their degree of insight and adaptability will be based on some mixture of computer games and CAL programs as well as on science fiction. Such mixed models of what computer teachers might and can be like can only be confusing. Once students have experienced a number of adaptive systems, their surprise that such systems exercise a degree of agency similar to that of human teachers should diminish, so long as the use of those systems in the classroom is properly thought through. Even then there may still be some resentment if the machine is seen to be overstepping its authority. Further work is needed to establish whether this will be the case.
But there is a further issue to be wary of, and that concerns students' expectations of themselves when working with an intelligent system. Barnard and Sandberg [1] built a learning environment for the domain of tides to help students understand why, where and how tides occur in relation to the movement of the earth, moon and sun. Despite being encouraged to engage in self-explanation so as to reveal areas of the tidal process which they did not understand, students were loath to do this and in general had little insight into how partial their knowledge of these processes actually was. It may be that this problem can be reduced by providing a more effective interface, rather than encouragement, to make reflective insight more likely [5]. Barnard and Sandberg thus describe another facet of the Plausibility Problem, namely that strategies a human teacher can adopt to provoke reflection and self-explanation may not work when the teacher is known to be a machine. In other words, is the very methodology so carefully nurtured by AI-ED systems to track the learning of the student effectively a message to students that they do not need to do this for themselves? To the extent that the system can track students at all, the student will reasonably believe in the high-quality, patient record-keeping of the machine, as a machine. While a human teacher may or may not build a detailed model of a student she is interacting with, it will be clear to the student, especially as one among a group of students, that a human teacher really will not be able to track their work in detail, and that if anyone is to do it, it will have to be the student herself.
References
[1] Y. F. Barnard and J. A. C. Sandberg. Self-explanations, do we get them from our students? In P. Brna, A. Paiva, and J. Self, editors, EuroAIED: European Conference on Artificial Intelligence in Education, pages 115–121, Lisbon, 1996. Edicoes Colibri.
[2] B. S. Bloom. The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring. Educational Researcher, 13(6):4–16, 1984.
[3] T.-W. Chan. Learning companion systems, social learning systems, and the global learning club. Journal of Artificial Intelligence in Education, 7(2):125–159, 1996.
[4] M. Chi, M. Bassok, M. Lewis, P. Reimann, and R. Glaser. Self-explanations: How students study and use examples in learning to solve problems. Cognitive Science, 13:145–182, 1989.
[5] A. Collins and J. S. Brown. The computer as a tool for learning through reflection. In H. Mandl and A. Lesgold, editors, Learning Issues for Intelligent Tutoring Systems, pages 1–18. Springer-Verlag, New York, 1988.
[6] A. Collins, E. H. Warnock, N. Aiello, and M. L. Miller. Reasoning from incomplete knowledge. In D. G. Bobrow and A. Collins, editors, Representation and Understanding, pages 383–415. Academic Press, New York, 1975.
[7] T. del Soldato. Motivation in tutoring systems. Technical Report CSRP 303, School of Cognitive and Computing Sciences, University of Sussex, 1994.
[8] T. del Soldato and B. du Boulay. Implementation of motivational tactics in tutoring systems. Journal of Artificial Intelligence in Education, 6(4):337–378, 1995.
[9] B. du Boulay. What does the AI in AIED buy? In Colloquium on Artificial Intelligence in Educational Software. IEE Digest No: 98/313, 1998.
[10] J. M. Keller. Motivational design of instruction. In C. M. Reigeluth, editor, Instructional-design Theories and Models: An Overview of their Current Status. Lawrence Erlbaum, 1983.
[11] K. R. Koedinger and J. R. Anderson. Intelligent tutoring goes to school in the big city. International Journal of Artificial Intelligence in Education, 8(1):30–43, 1997.
[12] M. R. Lepper. Motivational considerations in the study of instruction. Cognition and Instruction, 5(4):289–309, 1988.
[13] M. R. Lepper and R. Chabay. Socializing the intelligent tutor: Bringing empathy to computer tutors. In H. Mandl and A. Lesgold, editors, Learning Issues for Intelligent Tutoring Systems, pages 242–257. Springer-Verlag, New York, 1988.
[14] M. R. Lepper, M. Woolverton, D. L. Mumme, and J.-L. Gurtner. Motivational techniques of expert human tutors: Lessons for the design of computer-based tutors. In S. P. Lajoie and S. J. Derry, editors, Computers as Cognitive Tools, pages 75–105. Lawrence Erlbaum, Hillsdale, New Jersey, 1993.
[15] A. Lesgold, S. Lajoie, M. Bunzo, and G. Eggan. Sherlock: A coached practice environment for an electronics troubleshooting job. In J. H. Larkin and R. W. Chabay, editors, Computer-Assisted Instruction and Intelligent Tutoring Systems, pages 289–317. Lawrence Erlbaum Associates, Hillsdale, New Jersey, 1992.
[16] R. Luckin. 'Ecolab': Explorations in the Zone of Proximal Development. Technical Report CSRP 386, School of Cognitive and Computing Sciences, University of Sussex, 1998.
[17] R. Luckin and B. du Boulay. Scaffolding learner extension in the zone of proximal development. International Journal of Artificial Intelligence in Education, 10, forthcoming.
[18] S. Ohlsson. Some principles of intelligent tutoring. In R. W. Lawler and M. Yazdani, editors, Artificial Intelligence and Education: Learning Environments and Tutoring Systems, volume 1, pages 203–237. Ablex Publishing, Norwood, New Jersey, 1987.
[19] G. Pask and B. Scott. CASTE: A system for exhibiting learning strategies and regulating uncertainties. International Journal of Man-Machine Studies, 5:17–52, 1975.
[20] V. J. Shute. Rose garden promises of intelligent tutoring systems: Blossom or thorn? In Space Operations, Applications and Research (SOAR) Symposium, Albuquerque, New Mexico, 1990.
[21] K. VanLehn. Learning one subprocedure per lesson. Artificial Intelligence, 31(1):1–40, 1987.
[22] K. VanLehn, S. Ohlsson, and R. Nason. Applications of simulated students. Journal of Artificial Intelligence in Education, 5(2):135–175, 1994.
[23] D. J. Wood and D. J. Middleton. A study of assisted problem solving. British Journal of Psychology, 66:181–191, 1975.
[24] D. J. Wood, H. A. Wood, and D. Middleton. An experimental evaluation of four face-to-face teaching strategies. International Journal of Behavioural Development, 1:131–147, 1978.
Bringing Back the AI to AI & ED
Joseph E. Beck and Mia K. Stern
Computer Science Department, University of Massachusetts, Amherst, MA 01003, U.S.A.
Phone: (413) 545-0582; Fax: (413) 545-1249
email: {beck,stern}@cs.umass.edu
Abstract
The field of artificial intelligence & education has been distancing itself from artificial intelligence. This is unfortunate, as AI can contribute greatly to the design of intelligent learning environments (ILEs). AI allows the use of more complex models of student behavior, and has the potential to decrease the cost and complexity of building instructional systems. Furthermore, ILEs can be made more adaptive in their interactions with students through the use of AI. We discuss reasons why the field may be undergoing this shift in emphasis away from AI. We provide an overview of two promising techniques: machine learning and Bayesian networks. We then describe how these techniques have been applied to existing systems, and how they provide an advantage over traditional methods. We also discuss possible future extensions of these techniques. Finally, we conclude by examining how the use of these techniques changes the knowledge engineering requirements of building an ILE.
1 Introduction and motivation
The field of artificial intelligence and education is a curious one. The collection of researchers spans computer science, psychology, education, and cognitive science (at a minimum). Each of these camps has different ideas about what the field of AI & ED encompasses, and very different ideas about where the field should be headed. At various times, certain disciplines have largely controlled the field. Initially, the field was started by AI researchers [4], with the research focusing on knowledge representation and grain size for instruction [19]. However, the recent trend has been away from AI. In 1987, Wenger wrote a summary article which stated that the traditional basis of AI was shifting towards cognitive science. Anderson [1] (p. 242) commented that this represented a natural development, as a principled framework for human skill acquisition was critical to improving a system's performance. The attitude seemed to be that AI had "done enough" and now it was the cognitive scientists' turn. Recently, however, the field has moved away from both traditional AI and cognitive modeling. A summary article by Sandberg and Andriessen [14] describes this trend of extending the computer into new teaching domains and new teaching techniques. Thus the focus has shifted from finding AI techniques to make computers better teachers to, instead, teaching with a computer. Sandberg and Andriessen interpret this as a movement away from AI. In their view, as well as in Anderson's, AI has been relegated to a support role, rather than being a research focus. However, their arguments may be interpreted as a shift
away from "closed systems," which runs counter to much of the prior work in cognitive psychology such as representing a student's knowledge, task decomposition, etc. Like Anderson, Sandberg and Andriessen consider this shift a natural evolution. It seems that in the development of today's intelligent learning environments (ILEs), the goal is to determine if the designer's favorite learning theory and teaching strategies are correct. The "tutor" is restricted by the designer's ideas, and cannot adapt when these ideas fail. The question we must ask is: why are these systems handicapped by being designed to contain little AI? We argue in this paper that the removal of AI from AI & ED is premature, and that AI still has much to contribute to this field. 1.1 Reasons for shift away from AI There are several possible reasons for this distancing from AI research. For starters, the field of AI has been plagued with hype, which may have caused disillusionment with system designers. It is one thing to have a toy prototype working in the laboratory; it is significantly harder to make this prototype robust enough to survive in a classroom. Another reason for this shift is that the field of AI & ED has moved away from the traditional "drill and test" systems in favor of more open ended exploration. For example, collaborative learning is a hot topic in both the AI & ED and the straight education circles. "Meta-learning" or "learning to learn" are also popular, so systems are being built to embody those principles[14]. Perhaps classic AI technologies are not well-suited to such domains? Another possible reason for this shift is the difficulty in finding AI contributions. Sandberg and Andriessen [14] point out that in a recent call for papers for the AI & ED conference, only two topics pertained to ITS. The question is whether this represents an actual decline in utility of AI techniques, or a bias within the field. In recent conference proceedings (AIED95, ITS96, AIED97, and ITS98), "Case-based reasoning" (in AIED97) and "Instructional Strategies/Planning" (AIED95) are the only session titles that explicitly refer to AI techniques. It is a valid question of whether there is a lack of papers submitted by AI researchers because of the way the conferences are structured or whether the conferences are structured this way because there is a lack of AI submissions. An interesting comparison can be found in the User Modeling conferences. This community has a large overlap with AIED/TTS, but the proceedings (and call for papers) are structured very differently. There are sessions for AI techniques such as probabilistic models, machine learning, as well as sessions devoted to collaborative user modeling. Unsurprisingly, AI has had a strong presence at these conferences. In contrast, at AIED/ITS conferences, AI papers are spread across several sessions and intermixed with a wide variety of topics. This leaves the impression that AI methods are not valid for consideration in their own right. 1.2 Goals The purpose of this paper is not to supplant traditional techniques. For example, cognitive task analysis has certainly been shown to be effective, as many of the successful tutors have used this technique to some degree. This is not a minor accomplishment, and before another methodology can replace it, it would need a track record of successes. The trend towards system having more emphasis on pedagogical theory is also good. Having AI researchers decide what is effective teaching is probably not the best solution. 
Instead, our goal is to present some of the more recent innovations in artificial intelligence. We also show how these techniques can be used to improve the reasoning of ILEs. To illustrate this, we use several existing and proposed systems as examples. There are situations where using AI instead of other methods has strong advantages, and we hope that these methods will be used more often. We also discuss how AI can be used to help
cognitive science, psychology, and education research. The goal, therefore, is to present ways in which all camps of AI & ED research can work together.
2 What is AI?
Schank [15] wrote an intriguing article on what constitutes artificial intelligence. A particularly interesting point is his discussion of which parts of expert systems "contain the AI." His argument is that the process of firing the rules is not critical; rather, it is the knowledge engineering required to construct the rules. This fits with the AI trend in the 1960s and 1970s of "knowledge is power." In other words, to have a system perform capably it is necessary to give it specific knowledge of the domain in which it will work. Unfortunately, in the field of AI & ED, the lesson has been "knowledge is expensive." An ILE requires knowledge specific to each domain being taught: domain structure, a set of problems to present, hints to provide, teaching strategies, a technique to interpret the student's actions, and possibly a database of misconceptions. Encoding all of this knowledge is expensive, especially since new systems must be built for each domain to be taught. Schank further proposes that to be considered intelligent, the system itself should adapt its behavior to better accomplish its task. This is a lofty goal, but it has several advantages. First, an adaptive system can better tailor itself to a unique environment. For an ILE, since students vary so much, this is tremendously useful. A second advantage is that it shifts some of the complexity of reasoning onto the machine. The more the machine is able to deduce on its own, the less explicit instruction it needs. This can potentially reduce the knowledge engineering costs of system construction. There are a variety of artificial intelligence techniques that are applicable to ILEs. We will concentrate on two of the more recent developments in the field: machine learning and Bayesian networks. Machine learning definitely addresses Schank's proposal for adaptivity. While Bayesian networks (in most forms) do not, they do have the potential to simplify ITS construction by concentrating reasoning in one central mechanism.
3 Machine learning
Machine learning allows computers to reason and make predictions about situations they have not encountered previously. One such application is as a pattern matcher or classifier. The machine learner takes a set of inputs describing an object, and tries to determine to which category the object belongs. This can be applied, for example, to stereotypical student models [10]. A slightly more general view is to think of a machine learner as an automatic model generator. A linear regression is an example of this concept: the goal is to determine a function that best predicts the environment. A possible use of such a machine learner would be to automatically derive the equations used by Shute [17] for updating a student's knowledge. A common objection to machine learning techniques is that they need significant training, and require fast processors to run. With the advent of web-based and networked ILEs, it is possible to gather training data from a large pool of users. And with falling prices, the objection that a fast computer is required is quickly becoming moot. Given that it is possible to deploy ILEs that use machine learning, a reasonable question is "why?" As in the regression example, learning permits computers to automatically construct a model of a phenomenon.
In addition, ML techniques can consider a broad class of models, and do so automatically. The question now becomes, "to what types of problems is automatic model construction useful?"
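To make the "automatic model generator" idea concrete, the following is a minimal sketch in Python of a learner that fits a predictive model of student behavior from logged data. It is our illustration, not code from any system discussed here; the features (prior skill estimate, item difficulty, hints used) and the data are hypothetical.

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def train(examples, labels, lr=0.1, epochs=2000):
        """Fit weights w and bias b so that sigmoid(w.x + b) approximates
        P(correct answer); plain stochastic gradient ascent on log-likelihood."""
        n = len(examples[0])
        w, b = [0.0] * n, 0.0
        for _ in range(epochs):
            for x, y in zip(examples, labels):
                p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                err = y - p                     # prediction error drives the update
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
        return w, b

    # Hypothetical log: (prior skill estimate, item difficulty, hints used) -> correct?
    data = [(0.9, 0.2, 0), (0.4, 0.8, 3), (0.7, 0.5, 1),
            (0.2, 0.9, 4), (0.8, 0.3, 0), (0.3, 0.7, 2)]
    outcomes = [1, 0, 1, 0, 1, 0]

    w, b = train(data, outcomes)
    x_new = (0.6, 0.4, 1)
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x_new)) + b)
    print("Predicted P(correct) for a new student/item: %.2f" % p)

A model fitted in this way could stand in for hand-tuned update equations: the designer chooses the features, and the learner chooses the weights from the logged data.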
3.1 Student modeling
Machine learning is applicable to a wide variety of tasks within student modeling [18], including handling misconceptions and remediation. The original work on BUGGY [3] demonstrated that students display a wide variety of misconceptions, even in a relatively narrow domain such as multi-column subtraction. The knowledge engineering requirements to collect such a database were substantial, as hundreds of bugs were found. Furthermore, the oft-cited paper by Self [16] says that a system should not diagnose what it cannot treat. Given the difficulties in compiling a list of bugs, the task of constructing an intervention to deal with them is even more daunting. This calls into question the usefulness of creating bug libraries. However, work by Baffes and Mooney with the ASSERT system [2] has bypassed this problem. They have constructed a learning system that uses a technique called theory refinement to deduce the student's current misconceptions. The system alters a correct expert module until it generates a set of production rules consistent with the student's behavior, i.e. it learns about the student's mistakes. However, the harder problem is to give students some instruction that is relevant to their misconceptions. ASSERT is able to modify an example that teaches the correct skill in order to demonstrate to a student that his misconception is false. It knows how the correct rule was transformed, and is able to modify its explanations accordingly. By doing this, no explicit remediation for misconceptions must be created, which reduces system construction costs.
3.2 Learning pedagogical rules
Most intelligent teaching systems have their teaching rules predefined; their adaptivity and intelligence come from changes in the student model. However, there are benefits to altering the teaching rules themselves. For example, if a student has not learned very much from examples presented to him, perhaps the system should stop presenting examples. Few systems offer even this small degree of flexibility. This task is distinct from student modeling, since the tutor is learning about its own functioning, not just about the student. NeTutor [11] is a prototype designed to learn when a particular teaching strategy is most appropriate for a particular student. The system gathers information describing the context in which a particular interaction style was presented to the student. At a fine-grained level, the learning agent considers specific information about the domain and problem type. At a coarse-grained level, it considers the amount of interactivity in its teaching, how much guidance the student was given, and the style of information presentation. The tutor uses rough set theory to derive a set of teaching rules from this collection of data.
4 Bayesian networks
Bayesian networks are used to reason in a principled manner about multiple pieces of evidence. A Bayesian network can be thought of as a directed acyclic graph, where each vertex represents the probability of a certain piece of evidence being true, and the edges indicate relationships between these pieces of evidence. One of the more common uses of a Bayesian network (BN) is to represent causal relations. However, nearly anything that can be represented as a directed acyclic graph can be mapped to a BN. For example, an ILE representing its domain as a topic network could use a BN to allow the propagation of information about the student's knowledge (see section 4.3).
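The kind of evidence propagation such a network performs can be illustrated with a deliberately tiny sketch: one hidden "knows topic" node linked to observed answers. This is our own illustration, not a network from any system cited here, and the slip and guess probabilities are assumed values.

    # One hidden node ("student knows the topic") with a noisy link to answers.
    P_SLIP = 0.1    # assumed P(wrong answer | knows topic)
    P_GUESS = 0.2   # assumed P(right answer | does not know topic)

    def update(p_knows, answered_correctly):
        """Posterior P(knows topic) after one observed answer (Bayes' rule)."""
        if answered_correctly:
            num = p_knows * (1 - P_SLIP)
            den = num + (1 - p_knows) * P_GUESS
        else:
            num = p_knows * P_SLIP
            den = num + (1 - p_knows) * (1 - P_GUESS)
        return num / den

    p = 0.5                              # prior belief in mastery
    for correct in [True, True, False]:  # a hypothetical answer sequence
        p = update(p, correct)
        print("P(knows topic) = %.3f" % p)

A full topic network simply chains many such links, so that evidence from one question can raise or lower belief in several related topics at once.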
Traditionally, the largest problems in using Bayesian networks have been
implementation difficulties, determining how evidence is propagated (what the edges of the network are), and estimating the edge probabilities. There are now free software packages for Bayesian nets, in several programming languages, so the extra programming required is greatly lessened. For an ITS, the designer is already required to determine what knowledge is being represented, so it is relatively easy to specify what the vertices of the network should be. Similarly, the edges represent whether objects are related, and are usually not difficult to determine. However, the probability tables each vertex needs to combine evidence can be expensive to create. Several tables are needed, although many fewer than for a completely connected graph. Bayesian networks have been shown to be highly effective in modeling student behavior. Reye [12] demonstrates how the ACT Programming Languages Tutor's equations for the student model [7] can be rederived using dynamic belief networks. And in [13], he shows how Shute's work on StatLady [17] can also be rederived using a two-phase version of dynamic belief networks. Since these other models are special cases of Reye's framework, it is possible for a designer to create broader models that incorporate more information.
4.1 Determining student goals
The ANDES system [6, 8] teaches physics problem solving to college students. ANDES's architecture uses Bayesian nets for many of its pedagogical decisions, and it is a good case study for using BNs. One such use is to determine what problem-solving strategy the student is currently using. In many physics problems there are multiple valid solution techniques. Traditionally, rule-based systems have had difficulties with multiple solution paths: the search problem of determining which production rules should fire can be expensive. Plan recognition strategies have also been used, but there are difficulties with integrating the student's prior knowledge [6]. In fact, Self [16] recommends making the user interface transparent to allow the ILE to easily determine the user's actions. There are drawbacks to this, including complicating the user interface, and the unnatural mapping for some domains of requiring the student to show the intermediate steps. ANDES solves this difficulty by constructing a BN whose vertices represent problem-solving actions, facts that are relevant to the problem, and strategies the student may apply [6]. When the tutor observes the student perform an action, the probability representing that problem-solving action is set to 1.0. In addition, the tutor uses the contents of the student model to set the probability values for whether the student knows how to apply certain rules or knows certain facts. Combining this evidence greatly extends the tutor's reasoning abilities. If a possible solution path involves knowledge the student does not have, the tutor can probably disregard it. This flexibility in allowing the student a variety of problem-solving approaches fits well with the current trend of open-ended problems.
4.2 Determining feedback
ANDES also uses its BN to determine what hints to provide students. ANDES's designers determined that the most frequent hint request from students was "what do I do next?" However, to provide this feedback it is necessary to know both how the student is solving the problem and how far he has progressed down the solution path.
Once ANDES determines the solution path a student is on, it examines the vertices on this path and calculates the probability that the student can use each of the required facts/procedures. That is, if the student is unlikely to be able to apply the second step in solving the problem, he is even less likely to be able to apply the third. The system has a cutoff
probability, and if the student's likelihood of understanding is less than this, the tutor demonstrates how to perform that step. The student does not need to inform the tutor how he is solving the problem or where he thinks he is stuck. This is beneficial, as asking a confused student to specify where he needs help (to the degree the computer can understand) may not be optimal.
4.3 Curriculum sequencing
In addition to reasoning at a micro-level about student actions, it is possible to use Bayesian networks to reason at a coarser grain size. One such use of Bayesian nets is determining a student's level of competence within a domain [5]. To do this, Collins et al. constructed a BN that represents a hierarchy of skills for arithmetic. The lowest level of the network contains the actual test questions, and the next level arithmetic theory and arithmetic skill. This hierarchy can continue for a number of levels. The semantics implied by the links are "is a part of the knowledge for." So if a question points to an arithmetic skill, the implication is that the question tests the student's skill at arithmetic. This hierarchy maps easily to a classical domain network, which makes design simpler. It also reduces the computational complexity of performing calculations with the Bayesian network. The system uses this network by considering the current estimate of the student's level of ability to compute the probability that he will respond correctly to each question in the database. An adaptive testing approach would select an item the student has a 50% chance of answering correctly. An ILE would probably be more conservative, as maintaining student confidence and motivation needs to be considered. If the system is unsure about the student's knowledge level, it can behave more like an adaptive test and quickly arrive at the student's approximate knowledge state. As the system's model of the student becomes more accurate, it can select problems the student is more likely to be able to answer. Rather than coding two or more heuristics to accomplish this task, the designer specifies a parameter that determines the likelihood of the student answering the question correctly.
5 Other techniques
There are several other artificial intelligence techniques that deserve a brief mention. The first is reasoning under uncertainty. This refers to combining data that may be unreliable or inaccurate, and drawing a conclusion. Bayesian reasoning (e.g. Bayesian networks) is an example of this. Another technique is Dempster-Shafer theory, which is a generalization of Bayesian reasoning. For representing categories which are ill-defined, fuzzy logic is useful. Fuzzy logic provides a means of mapping information to vague categories. An excellent introduction to these techniques, as well as a summary of systems that use formal methods for reasoning under uncertainty, can be found in [9]. That some AI techniques can take an extremely large amount of time is a known problem. However, since students expect interactions to take place in real time, there is often a (low) upper bound on how much time the system can spend performing a computation. Anytime algorithms [20] are a solution to this problem. There are two general approaches: one is to provide the algorithm with an amount of time, after which the algorithm is guaranteed to produce a solution. The other method leaves the amount of time unbounded, but requires the algorithm to be capable of providing a solution at any instant.
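The second, interruptible, style is easy to sketch. The following Python fragment is our illustration, not code from any cited system; the idea of scoring candidate hints, and the function names, are hypothetical.

    import time

    def choose_hint_anytime(candidate_hints, score, deadline_seconds):
        """Always hold a valid best-so-far answer, so the tutor can respond
        the instant its time budget runs out."""
        best = candidate_hints[0]
        best_score = score(best)
        start = time.monotonic()
        for hint in candidate_hints[1:]:
            if time.monotonic() - start > deadline_seconds:
                break                    # interrupted: fall back on best so far
            s = score(hint)              # the potentially expensive evaluation
            if s > best_score:
                best, best_score = hint, s
        return best

The guarantee is weak (only that some valid answer exists at every instant), but that is exactly what a real-time tutorial interaction needs.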
6 The future
We are not proposing that we go back to the days when AI researchers were solely responsible for determining the instructional content. However, we feel that the current
shift away from AI is detrimental to the field. We recommend that the many camps work together to produce a better end result, each working within their strengths. We will now provide a few examples of how such cooperation can be accomplished. One use of AI for ILEs we see is the construction of self-improving tutors. By using the data from every student, and techniques similar to those of NeTutor, it is possible to draw conclusions about the effectiveness of particular parts of the ILE. Currently, researchers analyze these data by hand for use in designing improved versions of a system. It is possible to automate this process. If the system can observe that certain techniques, say presenting a particular example, fail to help the student, it can stop presenting that example. The next step in self-improving tutors is systems that learn about teaching strategies. As we have mentioned, the current trend is to hard-code a specific teaching strategy into a system, with that theory applying to all students under all circumstances. But what happens if the theory is not the right one for all students under all circumstances? We propose almost a backwards approach: code many teaching strategies into a system, and then let the system figure out which ones work under which circumstances. As a result, instruction will improve, since the system can adapt to situations not thought of ahead of time. Furthermore, the AI & ED community will develop a more comprehensive and correct learning theory, since the system can learn what does, and what does not, work. One of the biggest open research questions in the AI & ED field is how to get different teaching systems to work together, with the goal of reducing the amount of work required to create an instructional tool. We see machine learning as a very useful technique to aid in this research. An agent would be able to learn which systems work for which kinds of students (or even which systems work at all), and then be able to direct students to those systems most appropriate for them.
7 Conclusions
We have described several promising technologies that have the potential to greatly impact the AI & ED community. We have also provided a summary of how these techniques can be used to solve a variety of problems relevant to the community. Given the relative tradeoffs of using these techniques, it is unfortunate that relatively few systems use these methods. A theme throughout many of these systems is that the reasoning is "principled." One benefit of this is that it is easier to justify a decision based on a known technique than one based on a home-grown heuristic. This could be particularly relevant if an ILE is used for mission-critical training, and one of its graduates makes a costly mistake. Even for more prosaic tasks, it is easier to convince others (perhaps a school board) that your system is behaving properly if it is based on solid mathematical ground. The AI techniques listed above complement many currently used design processes for ITSs. Experts already provide much of the knowledge required to construct a Bayesian network. Machine learning agents perform better if domain experts determine their inputs. Rather than a radical shift in design philosophy, we propose a change in how experts are used. Rather than trying to predetermine how the ILE should act, experts' knowledge is coded and more decisions are left to the computer.
This is not a major change, but it has the potential to let us create systems more cheaply, and to have systems that adapt to a wider variety of conditions.
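As a closing illustration of the self-improving tutor proposed in section 6, a minimal sketch follows. It is ours, not any cited system's; the strategy names and the "helped" signal are hypothetical, and a real system would need a sounder measure of learning than a single boolean.

    import random

    class StrategySelector:
        """Track how often each hard-coded teaching strategy helps, and
        increasingly favour the ones that work (epsilon-greedy selection)."""
        def __init__(self, strategies, epsilon=0.1):
            self.stats = {s: [0, 0] for s in strategies}  # [successes, trials]
            self.epsilon = epsilon

        def choose(self):
            if random.random() < self.epsilon:            # keep exploring
                return random.choice(list(self.stats))
            # Laplace-smoothed success rate, so untried strategies are not ruled out
            return max(self.stats,
                       key=lambda s: (self.stats[s][0] + 1) / (self.stats[s][1] + 2))

        def record(self, strategy, helped):
            self.stats[strategy][1] += 1
            if helped:
                self.stats[strategy][0] += 1

    selector = StrategySelector(["worked example", "hint", "practice problem"])
    s = selector.choose()
    selector.record(s, helped=True)   # e.g. the student's next attempt succeeded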
References
[1] J. Anderson. Rules of the Mind. Lawrence Erlbaum Associates, Hillsdale, NJ, 1993.
[2] P. Baffes and R. Mooney. A novel application of theory refinement to student modeling. In American Association for Artificial Intelligence, pages 403–408, 1996.
[3] R. Burton. Diagnosing bugs in a simple procedural skill. In D. Sleeman and J. S. Brown, editors, Intelligent Tutoring Systems, pages 157–182. Academic Press, 1982.
[4] J. R. Carbonell. AI in CAI: An artificial intelligence approach to computer-assisted instruction. IEEE Transactions on Man-Machine Systems, 11:190–202, 1970.
[5] J. Collins, J. Greer, and S. Huang. Adaptive assessment using granularity hierarchies and Bayesian nets. In Proceedings of Intelligent Tutoring Systems, pages 569–577, 1996.
[6] C. Conati, A. Gertner, K. VanLehn, and M. Druzdzel. On-line student modeling for coached problem solving using Bayesian networks. In Proceedings of the Seventh International Conference on User Modeling, pages 231–242, 1997.
[7] A. Corbett and J. Anderson. Student modeling and mastery learning in a computer-based programming tutor. In C. Frasson, G. Gauthier, and G. I. McCalla, editors, Proceedings of the Second International Conference on Intelligent Tutoring Systems, pages 413–420, Berlin, 1992. Springer-Verlag.
[8] A. S. Gertner, C. Conati, and K. VanLehn. Procedural help in ANDES: Generating hints using a Bayesian network student model. In Fifteenth National Conference on Artificial Intelligence, pages 106–111, 1998.
[9] A. Jameson. Numerical uncertainty management in user and student modeling: An overview of systems and issues. User Modeling and User-Adapted Interaction, 5:193–251, 1996.
[10] J. Kay. Lies, damned lies and stereotypes: Pragmatic approximation of users. In Proceedings of the Fourth International Conference on User Modeling, pages 175–184, 1994.
[11] M. Quafafou, A. Mekaouche, and H. S. Nwana. Multiviews learning and intelligent tutoring systems. In Proceedings of the Seventh World Conference on Artificial Intelligence in Education, 1995.
[12] J. Reye. A belief net backbone for student modelling. In C. Frasson, G. Gauthier, and A. Lesgold, editors, Intelligent Tutoring Systems, pages 596–604. Springer, 1996.
[13] J. Reye. Two-phase updating of student models based on dynamic belief networks. In B. Goettl, H. Halff, C. Redfield, and V. Shute, editors, Intelligent Tutoring Systems. Springer, 1998.
[14] J. Sandberg and J. Andriessen. Where is AI and how about education? In B. du Boulay and R. Mizoguchi, editors, Artificial Intelligence in Education. IOS Press, 1997.
[15] R. C. Schank. Where's the AI? Technical Report 16, Northwestern University, The Institute for the Learning Sciences, 1991.
[16] J. A. Self. Bypassing the intractable problem of student modelling. In C. Frasson and G. Gauthier, editors, Intelligent Tutoring Systems: At the Crossroads of Artificial Intelligence and Education, pages 107–123, Norwood, NJ, 1990.
[17] V. Shute. SMART evaluation: Cognitive diagnosis, mastery learning and remediation. In Proceedings of Artificial Intelligence in Education, pages 123–130, 1995.
[18] R. Sison and M. Shimura. Student modeling and machine learning. International Journal of Artificial Intelligence in Education, 9:128–158, 1998.
[19] B. P. Woolf. AI in education. In S. C. Shapiro, editor, Encyclopedia of Artificial Intelligence, pages 434–444, 1992.
[20] S. Zilberstein. Using anytime algorithms in intelligent systems. AI Magazine, 17(3):73–83, 1996.
IF "What is the Core of AI & Education ?" is the Question THEN "Teaching Knowledge" is the Answer N. Van Labeke1, R. Aiken2, J. Morinet-Lambert1, M. Grandbastien1 ' LORIA - UHP/Nancy I, Campus Scientifique, BP 239, F-54506 Vandoeuvre-les-Nancy Cedex, France 2 Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA Abstract. This paper emphasizes the importance of capturing, i.e. making explicit, the knowledge that teachers implicitly use in teaching - the content as well as pedagogy. We describe a process for obtaining information from geometry teachers that will help us to better understand how they teach the spatial properties inherent in 3D geometry. This in turn enables us to improve the design of special software we have built (Caiques 3D) based on their requests to have software that will help them bridge the gap between their current ways of teaching and the objective of having their students better prepared for the world of Computer Assisted Design. We describe the tools (forms) we use to capture this information and the results we experience - both the advantages and problems. Keywords. Teaching knowledge; Knowledge acquisition; Pedagogical expertise; Interactive Learning Environment.
1. Introduction
Acquiring knowledge from people (usually experts) is well known to be a difficult problem (see [1] and [2]). In order to extract this knowledge one often utilizes the assistance of a knowledge engineer (see, for example, [3] for a cogent overview of this process). However, a knowledge engineer is often not available, or too expensive, especially when one is working in education. Our goal was to develop materials and techniques that would allow us, without the assistance of knowledge engineers, to extract the implicit knowledge that teachers have about "how and why they teach concepts in particular ways". This was important because we were trying to assist geometry teachers to most effectively and seamlessly integrate a software tool, Calques 3D, that had been developed in our lab into their way of teaching the spatial properties of geometry. Thus, we needed to find ways in which we could help them make explicit not only what steps they used in building their lessons (extracting concepts) but also how they wanted to present their ideas (extracting pedagogy). This paper describes how we attempted to solve this problem, the results we have achieved, and the problems we experienced. The key element in our work has been the design of several forms that teachers are requested to use in describing how they present the topics they will cover. Previous work done by several of the authors ([4, 5, 6]) as well as others (see [7] for example) has shown the importance of integrating teachers into the design process of developing and using educational software. In our experience it is necessary to extend this participation from assisting in the design of the software to the way it will actually be used in the classroom. In order to do that effectively we needed to better understand how the teachers taught different concepts so that they could help us prepare links between how they taught and how they could use Calques 3D to help them achieve their teaching objectives.
We begin with a brief description of the Interactive Learning Environment we have built to assist geometry teachers in their visual presentation of spatial geometry concepts. A discussion of the rationale and design of the forms follows this overview. The remaining sections of the paper discuss our experiences with the teachers using the forms, an assessment of their usefulness, and ideas we have for extending this work. The cornerstone of our work is that Teaching Knowledge, though elusive, needs to be well understood if we are to achieve maximum success with the use of educational software. The utilization of forms such as those discussed in this paper is one way to help extract this information.
2. An Interactive Learning Environment for spatial geometry
Calques 3D [8] is a microworld, i.e. a type of Interactive Learning Environment (ILE), designed for constructing, observing and manipulating geometrical figures. It provides students with intuitive and adaptable access to environment features: intuitive because it can be used by students who have had no preparation, and adaptable because it allows teachers to decide, with respect to their own pedagogy, which primitives and operations will be made available to students. The aims of Calques 3D are threefold:
• Observation: allowing one to see and understand the third dimension by changing the spatial system of reference (e.g. axes, floor, etc.), choosing the perspective (e.g. cavalier, vanishing point, etc.), modifying the observer's point of view (e.g. a frontal point of view for 'real-sized' observation), displaying visual feedback on objects (e.g. projection of points' co-ordinates on the horizontal plane), etc.
• Construction: allowing a student to dynamically construct geometrical figures from elementary objects (points, lines, planes, etc.) and construction primitives (intersection, parallel, perpendicular, etc.).
• Exploration: allowing one to explore and discover geometrical properties of the figure (deforming it by directly dragging base-points, extracting geometrical objects in separate synchronized tracings, etc.).
The plane representation of a spatial figure (i.e. the drawing of the figure) does not provide enough visual information to allow students to understand all its geometrical properties, even if they can modify the observer's point of view. However, the dynamic deformation of such a figure does provide the user with another way of exploring it. This exploration of figures is realized through an interface based on an extension of the direct manipulation concept [9] to spatial environments. As a dynamic geometry environment, our project builds on similar research, both in plane geometry, e.g. Geometer's Sketchpad [10], Cabri-geometry [11], Calques 2 [12], and in spatial geometry, e.g. Cabri 3D [13].
3. A framework for extracting implicit teacher knowledge
Figure 1 provides an overview of a pedagogical sequence that describes the different components of teaching activities [14]. This model proposes that teachers view their presentation from three perspectives: contents (domain knowledge), student learning goals (as seen by the teachers) and their own teaching process for achieving these goals. The most important parts of this model for us are the boxes representing the activities "Description of teaching", "Practice of learning" and "Remediation". We have tried to capture this information, which we call "Teaching Knowledge", by asking teachers to fill out forms which break these general activities into finer-grained pieces. For example, we replace the less precise idea of "Description of teaching activities" with the more specific request for information on "Objectives of the Sequence" and "Activities of the Sequence". Figure 2 and Figure 3 are the specific forms we used to extract detailed information about "teaching knowledge" - both HOW teachers present the information and WHY they have chosen this method. In the work we report here we have collaborated with a group of teachers who teach dynamic geometry as part of a course they give in technical schools to students in a general curriculum as well as to students in a "technicians" curriculum (i.e.
technical draftsman, TV repairman, etc.). Each of them has been teaching this material for more than 20 years and it is their wisdom and expertise that we are trying to capture.
Figure 1: Overview of a pedagogical sequence
Our goal was to design these forms, based on the model shown in Figure 1, in order to extract the teaching knowledge that teachers implicitly use. We realized that the forms had to be close enough to the teachers' way of thinking about and describing pedagogical sequences BUT also needed to provide the type of information that we could incorporate in order to improve Calques 3D and its application in the classroom. That is why the forms are based on descriptions of activities rather than on a description of the general pedagogical sequence. Moreover, in keeping with the objectives of Calques 3D, we focus on activities that involve visualization, construction and manipulation skills. Thus, at this stage of the project, we do not try to obtain data on every part of the pedagogical sequence (e.g. the final summative evaluation). We only ask the teachers to describe the most relevant activities (i.e. activities related to information and operation sequence objectives, see section D of Figure 2). Figure 2 provides an annotated version of how we obtain information from the teachers with respect to the sequence of steps they use to help students understand the concepts they are trying to convey (in this case in dynamic geometry). It is important to note two aspects of this form: 1) it is designed to assist us in understanding the way they teach, NOT necessarily to help them improve their teaching by making their steps more explicit (even though they can draw benefits from describing those steps carefully); and 2) the form describes the general attributes most teachers take into account in developing a lesson, i.e. the form is NOT designed specifically to be used with geometry teachers. Many aspects of the form are self-explanatory. However, it is important to draw the reader's attention to the following points.
1. Section D, the "Objectives of the sequence", is presented from the viewpoint of the student. As we will discuss later, this has been one of the most difficult aspects for teachers to describe explicitly. While they are able to provide these descriptions in general, it is difficult for them to illustrate these items with concrete examples without first observing students using this specific software.
2. In section E, "Activities of the sequence", we found that the teachers concentrated almost exclusively on the computer-based activities they wanted to incorporate into their lessons. However, our objective here was to have them explicate their overall approach: the lecture/non-computer-based activities (demonstrations with physical objects, drawings on the blackboard, etc.) as well as the use of Calques 3D for various purposes. This will be discussed in more detail in section 4.
ID (name): regular polyhedron: octahedron
Key words: octahedron included in a cube, identification of a triangle or a square, orthogonality of two planes, Pythagoras' relation and volume calculus
B. Student (level ...): Professional High School. Background: initial professional training, continuing education.
Prerequisites - Concepts: ordinary figures (equilateral triangles, squares, ...), polygons, isometry; Methods: construction of a cube, construction of square diagonals.
Teaching constraints - Link with other sequences: requires the following activities: nature of a triangle, nature of a quadrilateral, orthogonality of two planes, Pythagoras' relation.
D. Objectives of the sequence (classified into four categories, based on the student's viewpoint):
1. Information (observation, analysis, ...): initiation to the software, interpretation of a problem's text, observation of 2D and 3D geometrical figures
2. Operation (practice, self-initiation, ...): construction of a cube, construction of the centre of a square (using its diagonals), construction of an octahedron; extraction of ordinary figures, orthogonality of two planes
3. Mastering (criticize, validate, ...): lengths (Pythagoras' relation) and volume calculus
4. Expertise (synthesis, ...): conclusions on the properties of a regular polyhedron
E. Activities of the sequence (not necessarily computer-based):
Preparatory activities: reminders on elementary geometrical notions (regular polygons, diagonals, orthogonality of two planes, Pythagoras' relation)
Teaching activities: reminders on the software's use
Learning activities: construction of a cube, construction of a regular polyhedron (cf. next form)
Remediation activities: reminders on regular polygons' properties
Synthesis activities: construction of other polyhedrons (another activity description), reflection on a way of constructing them
Figure 2: Description of a pedagogical sequence (annotations are in italics).
Figure 3 is the annotated description of the computer-based activities that the teachers propose to incorporate into their lessons. It is the last two sections of this form that provide particularly useful information. For example, in Section B, if the teachers can identify factual mistakes that students often make, then we can concentrate on using Calques 3D to provide support that helps students avoid these mistakes (preemptive mode). Moreover, once the teachers have explicitly catalogued certain student mistakes, we can work with them to evaluate how best to rectify the errors when they are made (remediation mode). A key decision is to ascertain whether it is better to provide a computer-based solution or another alternative. Section C in Figure 3 is perhaps the most informative. This table summarizes some of the causal errors that teachers have identified and ways that we can address them. For example, if the student does not understand a term, then using a dictionary (off-line or computer-based) is a good solution. Or if the student is lacking some "know-how", for example how to construct a parallelogram, then a demonstration might be most useful. Note that we have used the term "know-how" to indicate knowledge of the system, as opposed to knowledge as applied to concepts. So, continuing with this example, if the student does not know what a parallelogram is then we would provide a definition and an example. We experienced a problem in having the teachers provide information in this section. Instead of describing a set of problems they had encountered in teaching this material, they thought we wanted them to identify errors students made in using Calques 3D. Given their unfamiliarity with this software, it was difficult for them to imagine what problems students might encounter. Once we explained that the form had a more global purpose, the gathering of typical student mistakes in learning dynamic geometry, they were able to provide us with the information we were seeking. This discussion was also important in helping the teachers comprehend the "larger picture", which was our effort to better understand their overall teaching process (knowledge) so that we could improve the design of the software and accompanying documentation for use by a broad class of teachers.
Characteristics of the activity
Activity name: regular polyhedron: construction, observation and some calculus
Type (teaching, learning, other...): learning and thorough examination
Description (text):
A) Construction of the polyhedron. 1) Construct a cube ABCDEFGH. 2) Construct the centres I, J, K, L, M and N of each of the 6 faces ADHE, ABFE, BCFG, DCGH, ABCD and FGHE. 3) Hide the intermediary elements needed for the construction of the centres. 4) Join by a segment line the centres of non-opposite faces.
B) Observation of the polyhedron. 1) Give the number of faces of the obtained polyhedron. 2) Determine the nature of the faces: extract two of them in tracings 1 and 2 and compare them (by using frontal projection). 3) Determine the nature of the quadrilaterals INKM, MJNL and IJKL: extract them in tracings 1, 2 and 3 and compare them. 4) Determine the position of the planes that contain the quadrilaterals INKM, MJNL and IJKL. 5) Give a name to the polyhedron.
C) Some calculus. 1) Let a be the length of the cube's edge. Calculate the length of the side of the square IJKL and express its area. 2) Express the volume V' of the polyhedron according to a. 3) Calculate the ratio V'/V, i.e. the ratio of the polyhedron volume to the cube volume.
Geometrical objects (presentation of the object, properties, ...): cube, line, plane, point, segment line, triangle, pyramid, octahedron
Construction functions (allowed, obligatory, not required, ...): construction of a cube: allowed; construction of a segment line: allowed; construction of a point on a segment: allowed; construction of a midpoint: not allowed; construction of a plane: allowed
Manipulation functions (allowed, obligatory, not required, ...): extraction of figures in a tracing; hide objects
Visualisation functions (allowed, obligatory, not required, ...): change space system of reference; change observer's point of view; frontal projection
Typical mistakes identified in the activity: incorrect identification ...; misunderstanding of geometrical terms/properties; misuse of computer-based dynamic aspects of geometry
Difficulties of the activity and available help for overcoming them:
Difficulties | Type of help
know-how (manipulation, visualisation) | show a demonstration; give a typical example
knowledge | give the appropriate theorem ...
Title, description | propose a remediation activity, cf. activity 2.1
Figure 3: Description of computer-based activities in the sequence (annotations are in italics).
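As an illustration of how the information captured by such a form might be organized for the software design team, the following sketch in Java records the main fields of Figure 3. All identifiers are hypothetical; the actual forms were paper documents, and this is not part of the authors' system.

    import java.util.List;
    import java.util.Map;

    // Illustrative only: one possible in-memory form of the activity
    // description of Figure 3. All identifiers are invented.
    class ActivityDescription {
        enum Status { ALLOWED, OBLIGATORY, NOT_REQUIRED, NOT_ALLOWED }

        String name;                                  // e.g. "regular polyhedron: ..."
        String type;                                  // teaching, learning, other
        String description;                           // parts A), B) and C) of the form
        List<String> geometricalObjects;              // cube, line, plane, point, ...
        Map<String, Status> constructionFunctions;    // e.g. "midpoint" -> NOT_ALLOWED
        List<String> typicalMistakes;                 // Section B of the form
        Map<String, List<String>> helpForDifficulty;  // Section C: difficulty -> help
    }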
Based on this experience we realized that we need teachers for their teaching knowledge, but that they need some preliminary training in order to become effective and efficient partners in this software design task; that step was missing here. Even with more support for these teachers (i.e. some release time from teaching), the task remains quite difficult because of their lack of experience in teaching in a computer-based environment. We are, in effect, confounding the teachers' existing expertise by asking them to invent a computer-based one, including fundamental changes in the relative importance and difficulty of teaching each concept or piece of know-how.
4. Evaluation of using these forms
The teachers have been very conscientious about using the forms and providing us with feedback. As noted in the previous section, we soon found that there were some difficulties, understandable on the one hand but difficult to predict on the other. Following are some of the observations we made about the problems we encountered in using this method to
extract "Teaching Knowledge" ; both with respect to the design of these forms as well as the use of such forms for more general knowledge acquisition. Among the advantages of using forms like these are: 1. Provides us (and the teachers) with a better understanding of why they present geometry concepts as a series of connected activities; this seems especially important as they investigate ways to seamlessly integrate a new piece of software (Caiques 3D) into their current way of presenting material or, alternatively, deciding whether this software actually allows them to teach in a different and better way. 2. A major reason that teachers are interested in using educational software is to motivate students. They realize this intuitively. But the key is to get them to translate this intuition into concrete uses of the software. In our case this problem was greatly simplified by having the teachers explicitly record the steps in their teaching process and provide us (and them) with a clear picture of how Caiques 3D could be used to convey key dynamic geometrical properties that could not be effectively presented otherwise, e.g. for a correct visualization of the poles of a sphere , which students often misplace on its apparent contour when asked to draw them. 3. These teachers taught students in training schools who did not have strong academic backgrounds. Thus, they were concentrating on teaching "skills". However, they realized that there needed to be a better way to bridge the gap between their normal lecture/demonstration format and the computer-assisted-design environment (CAD) for which they were trying to prepare their students. The opportunity to use software like Caiques 3D gave them the possibility to bridge this gap and provide their students a more powerful way of visualizing and manipulating 3D objects in 2D, thus better preparing them for a CAD environment. Using the forms helped all of us to see how they could re-structure their "modus operandi" to incorporate this new tool. 4. Discussions of concepts related to understanding various parts of the form helped all of us clarify the way that students could view objects in different ways (for example as translucent, transparent, etc.). Thus, we could come to an agreement on vocabulary as well as identifying the advantages of presenting objects in different ways. 5. This, in turn, allowed the designers to re-design the software in order to take into account the ways that the teachers intended to use the system. Without the use of the forms we would not have been able to obtain from the teachers the specifics of how they intended to use Caiques 3D and in what ways they found it particularly attractive. 6. The description of the "Activities using software" provided a means for these teachers to formulate activities that were applicable to a wide range of teaching strategies. Moreover, this allowed us to impress upon them the need to concentrate on representing activities from the student's perspective. Otherwise we found that they became enamored with the software and tended to imagine a number of quite inventive uses that had very little pedagogical relevance. Moreover, most of these ideas were not grounded within the capabilities of Caiques 3D and would not have been possible to implement. Among the problems we experienced were the following: 1. As noted previously there was a misunderstanding of the purpose of the forms. 
This was an easy problem to correct, but it points to a more general problem observed in many knowledge acquisition scenarios: it is necessary to reach a consensus on the objectives so that the teachers and software designers better understand each other. Moreover, it is difficult for teachers with little or no experience in using this specific type of software to predict what types of problems students will most likely encounter.
2. In addition to the general misunderstanding about the use of the form, there were several instances of different interpretations of specific words. For example, the teachers thought the word "manipulation" signified any movement of an object on the screen (interface level), whereas for the software designers "manipulation" meant the re-orientation of an object on the screen with its coordinates actually changed. From the designers' point of view, extracting a sub-figure of a construction or changing the observer's point of view, even if these operations require user manipulation, are not manipulation operations, whereas translating an object in the geometrical universe is.
3. These teachers were used to focusing on skills acquisition. Thus, it was difficult at times for them to describe their teaching on the more general level we were asking them to
do in completing these forms. They had a certain number of topics to cover and a very strict progression of items to present for each topic. Getting them to visualize how they would use Calques 3D to provide a more powerful way for students to work with objects in space was a real challenge.
4. The teachers found it hard to forecast the difficulties that students might have in using the software without actually seeing students work with it. Our goal was to have them envision this based on their previous teaching experience, but this proved very difficult. This provided additional validation of our thesis that educational software design is meaningless without incremental construction around a workable prototype.
5. Since we were interested in non-computer-based as well as computer-based activities, we tried to design the forms to capture both types of information. However, we underestimated the impact that experimenting with the prototype would have on the teachers. They immediately saw a number of intriguing possibilities for using it (a good point!) but then found it extremely hard to describe how they would "mix and match" their non-computer-based activities (lectures, demonstrations, etc.) with all the possibilities they perceived in the software. And it was this mix that we were interested in capturing, so that we could find ways to improve the design as well as suggest more effective ways to integrate Calques 3D in their lessons.
Table 1 summarizes the benefits gained from using these forms to capture "Teaching Knowledge" as well as the problems we encountered.

Table 1: Summary of advantages and problems experienced with the forms

Advantages:
- better understanding of presentation choices for the geometry concepts
- effective visualization of key (dynamic) geometry properties
- identification of key design parameters with respect to teachers' intended use
- support for teachers and software engineers in agreeing on geometrical vocabulary and object presentation

Problems:
- initial misunderstanding of the purpose of the forms
- different interpretations of specific vocabulary, terms and field contents
- requesting descriptions of specific students' difficulties without observing students' use of the software
- few descriptions of mixed computer-based (demonstration) and classical (lecture) activities
5. Conclusions and future work
1. The teachers believe that this software will provide them a way to bridge the gap between their traditional teaching methods and the Computer Aided Design world for which they are preparing the students. Visualization and manipulation of 3D objects allow students to experiment with spatial properties (different views, translation, rotation, etc.) in ways that correlate most directly with the concepts they need for understanding CAD. We have noted this already and re-emphasize it here, along with a mention of how the feedback from the forms can help the software designers improve the software for the teachers' use.
2. Teachers need the opportunity to experiment with this software in order to grasp its potential. As a part of that process it is important to have them explicitly describe how they will and will not use it. The forms allow us to capture their precise ideas and to re-structure the software so that it better suits their teaching needs. For example, specifying the types of student difficulties pointed out the need to include an on-line dictionary, since students often do not know the meaning of terms.
3. Focusing the teachers' attention on their teaching intentions and pedagogical objectives helped us to obtain specific suggestions for improving the software. Before developing and using the forms, there was a tendency for the teachers to request quite impractical and unrealistic software environments. Not only would these ideas have been almost impossible to implement but, more importantly, they were not grounded in a pedagogical process of how they would be used.
4. Using the forms allowed us to discuss with the teachers their ideas about computer-based versus non-computer-based activities. In this way we obtained a much better idea of which features of the software would be particularly useful versus which might be "nice to have" but not as pedagogically important, and we could concentrate on improving the features that the teachers deemed most important for their teaching goals.
5. As a side benefit, we can use the information collected from these teachers to train other teachers in how best to integrate Calques 3D with their usual approach to teaching this material.
Future work:
1. We plan on testing the prototype in the classroom, with the concomitant possibility of getting more feedback for improving the "ease-of-use" features of Calques 3D.
2. A project is planned to use the forms with geometry teachers in Scotland to observe whether teachers describe their teaching knowledge in the same way. In particular we hope to ascertain where the forms are "culturally biased" versus which aspects seem more generally applicable to teaching dynamic geometry in different countries (or contexts). Another goal is to determine whether the software is used in the same or in different ways, and whether teachers have different thoughts about how to integrate the software into their teaching approach.
Acknowledgements
This work was done while the second author was an invited professor at the University of Nancy and LORIA in Spring 1998. Part of his support came from a Research Leave grant from Temple University. The dynamic geometry software design was supported by a grant from the "Région Lorraine".
Intelligent Multimedia
Artificial Intelligence in Education, S. P. Lajoie and M. Vivet (Eds.), IOS Press, 1999
A three-layered scalable architecture for Computer Based Instruction
G. Adorni(1), M. S. Barbieri(2), D. Bianchi(1), A. Poggi(1), A. M. Sugliano(3)
(1) Dipartimento di Ingegneria dell'Informazione, Università di Parma, Parco Area delle Scienze 181A, 43100 Parma, Italy
(2) Dipartimento di Psicologia, Università di Trieste, Via Università 7, 34123 Trieste, Italy
(3) Dipartimento di Scienze Antropologiche, Università di Genova, Vico S. Antonio 5/7, 16126 Genova, Italy

Abstract. In this work we discuss an architecture for collaborative instruction able to support video conferencing, video on demand and collaborative work. The architecture is "scalable" with respect to the kind of instructional service required and the bandwidth of the available network interconnection. It is composed of three levels. The first level, called the basic level, allows individual use of the system with an internet connection whose limited bandwidth supports only the exchange of e-mail, or with no internet connection at all. The second level, called the collaborative level, allows remote students to take part in a virtual classroom where they can interact with each other and with the instructor via a chat line, by sharing software packages that simulate instruments and tools for laboratory activities, and by sharing a whiteboard for writing, drawing and processing images. The third level, called the multimedia level, allows multimedia interaction between the members of the virtual classroom and the integration of the course material with videos; each user must own a microphone, a CCD camera and an internet connection with a bandwidth supporting the exchange of audio and video data. This architecture has been used to realize an on-line course on Mathematical Logic, Logic Programming and Prolog. Up to now, the system does not completely support the three levels of the proposed architecture. The basic level, which concerns individual usage of the system, was extensively tried by students attending the third year of a Computer Science program. We plan to test the collaborative level with a group of students during the present academic year. The multimedia level is in a tuning stage and only the video on demand service is currently available.
1. Introduction
With the increased availability of computational power and of different kinds of network interconnections characterized by different bandwidths, there has been a growing interest in applying Human Computer Interaction techniques to high-level cognitive tasks such as Collaborative Multimedia Computer-Based Instruction. When preparing a computerized lesson, the instructor (i.e., the author) must write the program which controls the interaction between the student and the computer. Over the years the instructional techniques have not changed much, but the efficiency of the methods for interacting with a computer has changed significantly. Several kinds of Computer-Based Instruction techniques can be used to assist instruction at different levels [1].
In Computer-Managed Instruction, the computer performs many of the administrative tasks of instruction, such as record keeping, updating of grades, and grading of tests in which computer forms were used. Computer-managed instruction can range from an individual instructor using a spreadsheet to record and tally grades, to a more complete computerized system which does all the book-keeping tasks and specifies an individual program of study on the basis of the student's performance.
In Computer-Aided Instruction, the computer takes the role of the traditional instructor. Whole sections of the material can be presented on the computer while other sections of the course are still presented by an instructor. The amount of computer control over the complete instructional process varies with the application. Computer-aided instruction can be used to support an instructor's lesson, providing instruction on a small subset of the total material. Intelligent Computer-Aided Instruction can be used to control all aspects of the pedagogical interactions with a student, by means of artificial intelligence techniques.
In the early years of computer-based instruction, the trend was toward large integrated hardware and software systems which could not be decoupled. The trend then moved toward designing software authoring systems supported by operating systems independent of the machine manufacturer. In recent years we have seen a marriage between computer-based instruction and software packages, that is, software packages that incorporate computer-based instruction lessons on how to use the package itself. Typically, through textual instruction, the user is asked to perform the button presses or commands needed to use the software. These tutorials can incorporate animations (e.g., the cursor is seen moving to the appropriate command) and/or simulations (e.g., tasks are simulated during the lessons).
In Distance Learning the instructor is in a different physical location from the students. Students can be distributed at different sites (e.g., companies, universities). Historically, distance learning occurs in three general ways: (i) videotaped lectures delivered to students at remote sites; (ii) live video and audio lectures sent over satellite systems to sites which have a satellite hook-up; (iii) instruction through the World Wide Web (WWW) [2,3]. In the first two cases, interaction with computer systems is not required; however, computer programs can be used for some demonstrations, and e-mail is used for communication between instructor and student.
In the latter case, most interaction with the WWW occurs through hypertext transfer protocol (http) sites using the Hypertext Mark-up Language (HTML). HTML uses hypertext to interact with the user: hypertext allows students to click on keywords during a lesson to reference past material, elaborate on a concept, or look up the definition of a term. Hypertext is essentially an approach to information management in which data is stored in a network of nodes connected by links. When the nodes are not limited to textual information, but can include graphics, sound, animation, and video, we use the term Hypermedia.
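To make the node-and-link structure concrete, here is a minimal sketch in Java (the class names are illustrative, not part of any cited system):

    import java.util.ArrayList;
    import java.util.List;

    // Hypertext as a network of nodes connected by links; when the content of
    // a node may also be graphics, sound, animation or video rather than text
    // only, the same structure describes hypermedia.
    class HypermediaNode {
        enum Media { TEXT, GRAPHICS, SOUND, ANIMATION, VIDEO }

        Media media;
        String resource;                              // content, or a reference to it
        final List<HypermediaNode> links = new ArrayList<>();

        void linkTo(HypermediaNode target) {          // e.g. a clickable keyword
            links.add(target);
        }
    }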
The main advantage of distance learning using the WWW is its accessibility: anybody with a computer, a modem, and access to an Internet browser is a potential student. Another advantage is the potential for easier interactivity between the instructor and the student; e-mail access, chat rooms, and FAQs can make interaction easy for the students. The main disadvantage of WWW distance learning is the relatively long time needed to display some kinds of information; the time to display depends on the complexity of the images, the speed of the computer, the speed of the modem, and the bandwidth of the network.
During the last few years we have witnessed the spread of low-cost desktop video conferencing equipment and tools. Such tools, together with tools for collaborative work (i.e., collaborative editors), have begun to have an interesting impact on distance learning. In this paper, we present a system suitable for Collaborative Multimedia Computer-Based Instruction tasks.
2. System architecture
The system architecture we defined allows three different levels of use, which differ both in the available bandwidth of the computer's internet connection and in whether the computer is used by an individual user or by a member of a "virtual classroom" (see Figure 1). The first level, called the basic level, allows individual use of the system with an internet connection whose limited bandwidth supports only the exchange of e-mail, or with no internet connection at all.
Figure 1. Architecture of the system.
At the basic level, the system presents to the student a course module and its related tutorials through a WWW browser. The theoretical part of the subject matter is presented through HTML pages. Linked to the main topics of the key chapters is a series of tutorials (guided training exercises), presenting questions and problems that the students are invited to solve and offering software packages simulating instruments and tools for laboratory activities. At the end of each tutorial there is a self-assessment test composed of multiple-choice, true-or-false, fill-the-blank and essay questions. While the multiple-choice, true-or-false and fill-the-blank questions are corrected automatically, the essay questions need to be graded by the tutor; therefore, if the student has an internet connection available, the system automatically sends an e-mail to the tutor with all the information needed to evaluate the results of the test. At the same time, the student can take
advantage of the e-mail connection to write her/his comments and to send questions to the tutor.
The second level, called the collaborative level, allows remote students to take part in a virtual classroom where they can interact with each other and with the instructor via a chat line, by sharing software packages that simulate instruments and tools for laboratory activities, and by sharing a whiteboard for writing, drawing and processing images. At the collaborative level, users (i.e., students and the instructor) can easily cooperate to solve problems and to perform laboratory activities. The different shared tools are controlled by a sharing management system that serializes the input of the different users to the tools, while the output is made available to all the users [4]. This input serialization is guaranteed by assigning a sharing token to the user who will use the tools. Until the user releases the token, or the system removes it from the user, the other users cannot use the tools; however, the work of each user with the shared tools can be voted on by the other users. The assignment of the sharing token can be done with different policies, varying from a simple FIFO policy to complex policies that, for example, choose the next user on the basis of the positive/negative votes received when she/he last worked with the shared tools (a minimal code sketch of the simple case is given below). The release of the sharing token can occur either on a time-out or through a vote of the other users who do not agree with the holder's work.
The third level, called the multimedia level, allows multimedia interaction between the members of the virtual classroom and the integration of the course material with videos. Each user must own a microphone, a CCD camera and an internet connection with a bandwidth supporting the exchange of audio and video data. At the multimedia level, user collaboration is enhanced because this level supports audio and video interaction. During collaborative work, the images acquired from the camera of the user owning the sharing token are sent to the other users. The other users can interact with the sharing-token owner via a chatting token, which allows a user to speak in alternation with the sharing-token owner: when a user gets the chatting token, her/his audio and video data are sent to the other users.
3. An example of courseware: HyperProlog
Basic level. The basic level of the architecture of Figure 1 has been implemented as a system devised to allow the on-line use of a course module and its related tutorials. The subject matter of the module is "Mathematical Logic, Logic Programming and Prolog". It is used in courses for the diploma degree in Computer Engineering at the University of Parma. The theoretical part of the subject matter is presented through hypermedia (made of hypertext and other kinds of material such as video clips, animations, etc.). Linked to the main topics of the key chapters is a series of tutorials (guided training exercises) with questions and problems that the students are invited to solve. Students can actually try out their answers and solutions by using, within the browser, a Prolog interpreter available on the server, together with a number of files related to the examples presented in the tutorials. These sample files can be directly loaded and tried out in this environment, which we called "PrologLab".
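Returning to the floor control described in Section 2, the following is a purely illustrative Java sketch of a sharing-token manager with the simple FIFO policy (Java matching the system's applet-based interface); the class and its methods are invented for illustration, not taken from the authors' implementation.

    import java.util.ArrayDeque;
    import java.util.Queue;

    // Sketch of the sharing-token idea: one holder at a time, FIFO queueing.
    public class SharingTokenManager {
        private final Queue<String> waiting = new ArrayDeque<>();
        private String holder;                    // user currently owning the token

        public synchronized void request(String user) {
            if (holder == null) holder = user;    // token is free: grant at once
            else waiting.add(user);               // otherwise queue the request (FIFO)
        }

        public synchronized void release(String user) {
            // voluntary release; a time-out or a negative vote by the other
            // users (as described above) would invoke this as well
            if (user.equals(holder)) holder = waiting.poll();
        }

        public synchronized boolean mayUseSharedTools(String user) {
            return user.equals(holder);           // all other users' input is blocked
        }
    }

The vote-based policies mentioned above would replace the FIFO queue with a ranking of the waiting users.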
Besides logic programming proper, the course also covers a number of topics related to Artificial Intelligence: natural language processing, knowledge representation, fuzzy logic, learning, temporal logic. For these topics too there are a number of working examples that the students can try in the PrologLab environment. The students can easily switch from the hypertext to the
PrologLab or use both concurrently. At the end of each tutorial there is a self-assessment test which the students can take. The system can be used over an internet/intranet network or as a stand-alone resource. In the former case we have a client-server architecture based on a WWW browser and server. A platform-independent user interface has been developed using Java applets. The students can write their comments and send questions to the tutor, or can communicate among themselves using the mail and newsgroup facilities integrated in the interface. The student can send the tutor his program, the output of a query, or an error report with a simple cut-and-paste operation from the editor or from the Prolog interface text area to the e-mail or news window.
Collaborative level. In order to allow direct distance interaction between tutor and students, a collaborative environment has been developed. The aim is to realize a virtual classroom in which the tutor can demonstrate the use of the PrologLab, develop programs interacting with the students, and test the programs with the Prolog interpreter. All participants in the classroom have to see the same information on their screens. Moreover, each participant can, in an ordered fashion, gain control of the collaborative resources and use them: for example, edit a file, consult a program in the Prolog database, execute a Prolog query and so on. As explained in the previous section, different floor management policies can be adopted. Because our aim with the collaborative tools is to reproduce a lesson, we can assume that the teacher decides, on the basis of a request list, which student has the floor and thus exclusive control of the application resources. At any time, the teacher can also regain control of the floor previously given to a student. On the other hand, if the collaborative tools are used by a group of students working on a common project, a more democratic policy of floor management should be adopted: a voter list is maintained to allow collective decisions, and the members of the group can vote to accept or reject a proposed change to the current program.
Multimedia level. Preliminary work has been done on the use of video material integrated in the hypertext. Some hyperlinks point to MPEG tracks stored in a Video Server [5]. This video-on-demand service allows different students to watch different videos at the same time. It is possible to access this service only on campus over our University intranet. We are presently testing this level with a few hours of video lessons integrated in 4 different tracks of the HyperProlog courseware. Due to the actual bandwidth of the University intranet, up to now this kind of service is reasonable with no more than 10 users. We plan to expand this service in the next academic year, after the intranet is upgraded with ATM technology.
4. Evaluation
In the last two years the system was extensively used by two groups of students attending the third year of a Computer Science program. These students have a good knowledge of procedural programming, operating systems, computer networks and hardware design, but no knowledge of logics, declarative and logic programming, and no AI background. The experimentation concerned only the basic level of the system. The results of the two years were similar. Data relative to the first year of testing were reported in [6]. In the second year the system was tried out by a group of 34 students.
They worked with the system for about 30 hours. In the evaluation of this system we relied on three types of information.
1) Analysis of the students' patterns of activity by means of system logs. This analysis gives information about students' use of the system: which pages of the text they looked
at most, and which facilities they used the most (tutorials, self-tests, PrologLab, e-mail, conference area) or overlooked.
2) Effectiveness of the system in terms of the students' learning outcomes. This evaluation is based on the results of the final test.
3) Students' attitudes toward the resource. A questionnaire was administered to the students to assess how much they liked this resource in comparison with more traditional courses. They were asked which parts of the system they used most and which least, and which ones they felt were difficult to use and why. The results of the questionnaires were integrated with the information gathered during focused group discussions on the topic.
Patterns of usage can be obtained from two sources. The first source is automatically provided by the WWW server, which saves each transaction in a log file. In the log file there is a record for each page accessed and for each form submitted; in the latter case, the associated query string is also registered. By looking at these data we can obtain statistics about page usage, i.e. about the topics the student was reading, or about the operations the user was performing, such as loading a file in the edit area, consulting the file, submitting queries to the interpreter, etc. (a sketch of such log processing is given after Table 1). Other data about user activity can be directly recorded by the system: for example, every time a student starts a session, our system saves his user name, date and time, host name, etc.
The data obtained from the system logs show that students did not access the general presentation of mathematical logics very frequently. The four related pages (propositional logics, introduction to mathematical logics, logics of first-order predicates, logics for problem solving) show an average of 100 accesses each, that is, about three times per student. On the contrary, the Prolog page registers over 700 accesses, that is, over 20 times per student. The object of these four parts is to present how Prolog derives from the formalism of logic. Students said they accessed these parts infrequently because they preferred to concentrate on the Prolog page, which was the subject of the final examination. The tutorials were also read a lot. Working on the tutorials includes checking the solutions of the exercises and loading the example files. Table 1 shows the number of accesses to the tutorials and the related files, and demonstrates the interest shown by the students in the possibility of practicing what they learned in the Prolog presentation. We can see that tutorials III and IV are the most accessed; this can be explained by the difficulty of their topics. Tutorials V and VI, instead, were not accessed very much because time was lacking at the end of the course.

Table 1 - Number of accesses to tutorials, solutions of exercises and example files for laboratory use. Tutorial contents: I - facts, rules and queries; II - syntax and structures; III - lists; IV - backtracking and cut; V - arithmetics; VI - built-in procedures.
                 I    II   III   IV    V    VI   Total
tutorials       210   172   237   211   152   104   1086
solutions        26    50    50    31    31    17    205
example files    90    51   322   268    88    51    870
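To illustrate how counts such as those in Table 1 can be derived from the server log, the following sketch tallies page accesses. It assumes the common log format, in which the requested URL is the seventh whitespace-separated field; the paper does not specify the actual log layout, so this is an assumption.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    // Sketch only: counts how often each page appears in a WWW server log.
    public class AccessCounter {
        public static void main(String[] args) throws IOException {
            Map<String, Integer> hits = new HashMap<>();
            try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] f = line.split(" +");
                    if (f.length > 6) hits.merge(f[6], 1, Integer::sum);
                }
            }
            hits.forEach((page, count) -> System.out.println(count + " " + page));
        }
    }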
Self-assessment tests were devised to give the students feedback on their understanding and mastery of the subject matter. There were 6 self-assessment tests. Each one included a number of multiple-choice and true/false questions and one or two fill-the-blank questions. Answers were checked automatically and, immediately after having sent the test to the tutor, the students could see a screen showing how many answers were
correct, wrong and missing. They could also reread the questions and, knowing that they had made a mistake, try to understand the reason for their error. Students complained that the feedback said only how many answers were correct and not which ones; they would have preferred to know which answers were correct and which wrong in order to "learn from their errors". But the tutors' reason for this design was to prevent the students from becoming "mechanical experts" of the solution through practice, and to stimulate them to verify their knowledge more deeply. Self-assessment tests were randomly generated from an available data base; therefore each student received a different version of the test. This allowed the students to repeat the self-assessment tests as many times as they thought necessary (a sketch of this generation scheme is given below). Students used the self-assessment tests in a focused way to assess their learning, presumably at the end of the related tutorial; the majority of them tried each test only once or twice.
At the end of the course students were graded by means of a test given, again, through the Web. This test was the same for all the students. It included 14 questions and covered all the topics of the course. Although two of the students did not take it at all, all the others performed well: the mean percentage of correct answers is 62 (range 37% to 100%) and the mode 72%.
Lastly, we collected some subjective data by means of questionnaires and focused interviews. The questionnaire included ten questions. Students were asked whether they were familiar with the hypertext format and how much they liked it. Then they were asked whether they had ever used an integrated tool such as HyperProlog, whether they liked it better than a traditional book plus a separate Prolog interpreter, and why. Lastly, they were asked whether they had had any difficulty with HyperProlog and, if so, of what kind, and what parts of HyperProlog they used most and least, and why.
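Returning to the self-assessment tests, the generation and feedback scheme described above can be sketched as follows; the question type and the scoring encoding are assumptions made for illustration.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Sketch: each test is a random draw from the question data base (so the
    // test can usefully be repeated), and feedback reports only how many
    // answers were correct, wrong or missing, never which ones.
    class SelfAssessment {
        static <Q> List<Q> generateTest(List<Q> pool, int size) {
            List<Q> copy = new ArrayList<>(pool);
            Collections.shuffle(copy);              // a different version per student
            return copy.subList(0, Math.min(size, copy.size()));
        }

        // encoding assumed: +1 correct, -1 wrong, 0 missing
        static String feedback(int[] answers) {
            int correct = 0, wrong = 0, missing = 0;
            for (int a : answers) {
                if (a > 0) correct++;
                else if (a < 0) wrong++;
                else missing++;
            }
            return correct + " correct, " + wrong + " wrong, " + missing + " missing";
        }
    }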
5. Concluding Remarks
Recent trends in higher education emphasize the relevance of what are called dual-mode universities, i.e. those institutions where CMC and high-level technological resources are used to support both on- and off-campus teaching and learning. Authors [7,8] point out the reciprocal benefits: on-campus teaching gains from the use of well-thought-out materials devised for distance learning, while distance learning gains credibility from the awareness that course materials are the same as those for residential students. The cost-saving advantages of this policy, together with the flexibility of use of these courses, strongly favour a wide spread of the dual-mode approach. Flexible use requires modular systems with different functionalities that users can choose according to their needs, but these needs change according to the context: off-campus use makes asynchronous communication tools more relevant, while on-campus students may be more focused on tutorials, courseware and presentations.
In this paper we presented a three-level architecture for Collaborative Multimedia Computer-Based Instruction. The architecture is scalable depending on the service required and on the network technology available. The system can be used at home, in a stand-alone configuration or using a telephone line to access a WWW server, e-mail and news service, or on campus using a local network like Ethernet or ATM for full operation. The basic level, which regards individual usage of the system, was extensively tried out. We plan to test the collaborative level with a group of students during the present academic year. The multimedia level is in a tuning stage and only the video on demand service is now available.
From the data collected in the last two years of experiments with students attending the third year of a Computer Engineering program, it seems quite evident that this system has effectively supported their learning. Interviews and questionnaires have offered some suggestions for the future development and improvement of HyperProlog, as well as helping us to define and "tune" the final specifications of the presented architecture. Mainly, the effort of screen reading suggests that the theoretical presentation might benefit from some changes devised to make the text shorter. This could be done by a hierarchical subdivision of the text, in such a way that the students might choose their preferred level of difficulty and detail; some parts might also be made available by means of video on demand. Another development is to make HyperProlog accessible both through the net and in a stand-alone version, which would make it available to students who wish to use it at home and do not have Internet access. The Internet version is however important in allowing collaborative use of the tool, a necessity strongly felt by the students and usually practiced in the labs, where they check their doubts with each other. Therefore, to fully exploit the possibilities of this system it will be necessary to have a larger diffusion of technological resources (such as individual access to the Internet), together with changes both in the students' study habits (i.e. greater familiarity with hypermedia tools, less reliance on immediate feedback from the tutor in face-to-face interaction) and in the features offered by the system (i.e. more differentiated types of activity, and the possibility of personalizing study strategies). A significant step in the direction of making the system more flexible will consist of a detailed analysis of the students' sequences of activity, i.e. reading the text, trying the tutorial, using the lab, etc., in order to support and facilitate tutoring, no matter whether tutoring is performed by a human or a virtual assistant.
This work is partially supported by "Progetto di ricerca applicata 5% del CNR Multimedialità".
References
[1] E. Eberts and J. F. Broock, Computer-Based Instruction. In: M. Helander (ed.), Handbook of Human-Computer Interaction. North-Holland, Amsterdam, 1988, 599-627.
[2] C. Bouras, D. Fotakis, V. Kapoulas, S. Kontogiannis, P. Lampsas, P. Spirakis and A. Tatakis, Using Multimedia/Hypermedia Tools over Networks for Distance Education and Training, Educational Technology Review 7 (Summer 1997), 20-26.
[3] A. D. Marshall, Developing Hypertext Courseware on the World Wide Web. Proc. of ED-MEDIA 95 World Conference on Educational Multimedia and Hypermedia, Graz, Austria (June 1995), 418-423.
[4] J. Begole, C. A. Struble and C. A. Shaffer, Leveraging Java Applets: Toward Collaboration Transparency in Java, IEEE Internet Computing 1/2 (1997), 57-64.
[5] Silicon Graphics, WebFORCE MediaBase overview, 1998. Available at http://www.sgi.com/software/mediabase/product.html
[6] G. Adorni, M. S. Barbieri, D. Bianchi, E. Calabrese and A. M. Sugliano, How to distribute learning facilities by means of a network: some issues and a case study. In: F. Verdejo and G. Davies (eds.), The Virtual Campus: Trends for higher education and training. Chapman & Hall, 1998, 211-224.
[7] E. A. Stacey, Learning at a virtual campus: Deakin University's experiences as a dual mode university. In: F. Verdejo and G. Davies (eds.), The Virtual Campus: Trends for higher education and training. Chapman & Hall, 1998, 39-49.
[8] F. Jevons, Dual mode institutions - The way forward. Open Campus 12, Deakin University, Geelong, 1986.
Artificial Intelligence in Education, S. P. Lajoie and M. Vivet (Eds.), IOS Press, 1999
Multiple Representation Approach in Multimedia based Intelligent Educational Systems
Kinshuk(2), Reinhard Oppermann(1), Ashok Patel(2) and Akihiro Kashihara(3)
(1) GMD-FIT, Sankt Augustin, Germany
(2) De Montfort University, Leicester, United Kingdom
(3) I.S.I.R., Osaka University, Japan

Abstract: The paper describes the Multiple Representation approach for presenting multimedia technology within intelligent educational systems. The implementation of the approach depends on the adopted educational framework. In this paper it is discussed for systems using the cognitive apprenticeship framework for task-oriented disciplines, where the major focus remains on cognitive skills acquisition. The paper describes the application of the approach in the design of the InterSim system, which supports learning of the structure and functionality of the human ear.
1. Introduction
This paper demonstrates the Multiple Representation (MR) approach for presenting multimedia objects (such as audio, pictures and animations) in a multimedia interface world where the relationships of the objects to the world are governed by the educational framework. Learners are provided with various forms of interactivity to suit the pedagogical goals of intelligent educational systems. The approach ensures suitable domain content presentation by guiding multimedia object selection, navigational object selection, and the integration of multimedia objects to suit different learner needs. The next section of the paper discusses the application of multimedia technology to the acquisition of cognitive skills under the cognitive apprenticeship framework. The paper then presents the Multiple Representation approach for the cognitive apprenticeship framework. The rest of the paper discusses the implementation of the Multiple Representation approach in the InterSim system, which aims to provide adequate cognitive skills in the human ear domain while adapting content/information selection and presentation to learner needs.
2. Multimedia and cognitive skills
The use of multimedia objects in educational systems can greatly enhance their efficacy in facilitating cognitive skills, besides other components of domain competence. However, a mere collection of multimedia objects does not guarantee proper learning [15]. Another important aspect is the proper interaction of the learner with the interface components, especially when learning is recognised as a complex activity (or process) combining various factors such as information retrieval, navigation, and memorisation [5]. In the area of cognitive skills, the use of various multimedia objects in a suitable educational framework may satisfy the different learning needs which arise at different stages of cognitive skills acquisition. The Cognitive Apprenticeship framework [4] provides one such effective path [13].
The first step of cognitive apprenticeship is the observation phase, where the learner receptively explores the task pattern of an expert. Within a system, receptive exploration is possible through reading text, watching a picture, listening to audio and observing a video or animation [12]. After basic understanding, the system can provide advanced observation through image maps, interactive videos and pictorial virtual reality. After the observation phase, the learner is required to imitate the observed tasks to acquire skills. Simulations and interactive flowcharts can provide an adaptive environment where the learner can imitate the tasks under system guidance. The progress in skill development and retention can then be measured in problem-solving and assessment scenarios, where different multimedia objects play different roles. After acquiring basic skills, the learner can achieve competence through repetitive training using practice simulations and flowcharts in different contextual scenarios. Table 1 gives a few examples of multimedia objects suitable for different tasks under the cognitive apprenticeship framework.

Table 1. Tasks in cognitive skills acquisition and related multimedia objects

Requirement | Examples of suitable multimedia objects
Observation (receptive) | Text, static pictures, animations, video, audio
Observation (active) | Image maps, textual links, interactive videos, pictorial VR
Exploration (imitation) | Simulations, flowcharts
Feedback (immediate) | All above components in problem solving
Evaluation (delayed feedback) | All above components in assessment mode
Practice (repetition) | Practice simulations and flowcharts for different scenarios
Transfer in real life | Authoring tools using various multimedia objects
Co-operation in work context | Authoring and communication tools using various multimedia objects
It is not an easy task to select adequate multimedia objects in a particular context, especially when there is a need to integrate various objects, or when the objects need to act as navigational aids. The proposed Multiple Representation approach provides guidelines for multimedia object manipulation according to the adopted educational framework.
3. The Multiple Representation approach
Learners with different domain competence levels require different explanations and representations of domain content. The Multiple Representation (MR) approach tackles domain content presentation in three ways: multimedia object selection, navigational object selection, and integration of multimedia objects.
3.1. Multimedia object selection
Various recommendations for domain content presentation according to the MR approach are described below.
3.1.1. Task specificity and learner's competence
The MR approach suggests that the selection of multimedia objects be based on their suitability for the tasks to be carried out; for example, [2] suggested that audio is good to stimulate imagination, video clips for action information, text to convey details, whereas diagrams are good to convey ideas. The selection of objects should also consider the learner's domain competence, and consequently the curriculum should follow a granular structure to allow its measurement at the level of individual units [1]. Table 2 shows the selection of multimedia objects for the cognitive apprenticeship framework.
Table 2. Multimedia objects selection for the cognitive apprenticeship framework

Domain competence level | Task | Examples of multimedia objects
Novice in both knowledge and skills | Direct instruction for knowledge | Text, pictures, audio, animations
Intermediate in knowledge, novice in skills | Direct instruction for skills with little exploration possibilities | Animations, videos, textual links, sensitive parts in static pictures
Intermediate in both knowledge and skills (ready for problem solving) | Learning by problem solving for both skills and knowledge | Pictorial VR (e.g. asking the correct position of a part in a structure), flowcharts (e.g. asking a decision point)
Expert in knowledge, intermediate in skills | Advanced exploration possibilities | Flowcharts, user-controlled animations, simulations
Intermediate in knowledge, expert in skills | Advanced active observations | User-controlled animations, advanced pictorial VR
Expert in both knowledge and skills | Practice required for achieving mastery | Advanced user-controlled animations, advanced simulations
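The selection rule underlying Table 2 can be condensed into code. The following sketch simply paraphrases the table; the enum and the method are invented for illustration and are not part of any described system.

    // Sketch: the learner's competence in knowledge and in skills indexes
    // the set of multimedia objects offered, as in Table 2.
    class ObjectSelector {
        enum Level { NOVICE, INTERMEDIATE, EXPERT }

        static String objectsFor(Level knowledge, Level skills) {
            if (knowledge == Level.NOVICE)
                return "text, pictures, audio, animations";
            if (skills == Level.NOVICE)
                return "animations, videos, textual links, sensitive picture parts";
            if (knowledge == Level.EXPERT && skills == Level.EXPERT)
                return "advanced user-controlled animations, advanced simulations";
            if (knowledge == Level.EXPERT)
                return "flowcharts, user-controlled animations, simulations";
            if (skills == Level.EXPERT)
                return "user-controlled animations, advanced pictorial VR";
            // intermediate in both: ready for problem solving
            return "pictorial VR and flowcharts for problem solving";
        }
    }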
3.1.2. Reference & revisits of already learned domain content In learning process it is sometimes necessary to refer already learned domain content in different contexts [17]. The MR approach favours these revisits in different contexts as it enforces links between concepts, enhances the mental model of previously learnt concept, helps in generalising its applicability in multiple situated scenarios and provides ease in learning current concept by making familiarisation with past learning experiences. 3.1.3. Use of multi-sensory channels The selection of objects should adequately use the visual, aural and tactile senses of the learner. The reception by the learner enhances if the representation of domain content involves all relevant sensory channels (chances of getting distraction due to an unused channel should be minimised). 3.1.4. Context based selection of multimedia objects When there are more than one multimedia objects available for representation of same task or concept, the domain presentation should use the most suitable object in that particular context. 3.2. Navigational object selection The navigation in educational systems takes place through various links provided in the system. The learner's expectations of outcome while activating a link should be properly matched with the presentation of actual resulting interface. The MR approach therefore examines the suitability of various types of links and favours both interaction objects (e. g. push buttons, radio buttons, check boxes) and interactive objects (e.g. text, pictures) [3] to provide navigation. Interaction objects provide transition from one part of the system to another on learner's explicit initiative, whereas interactive objects facilitate a system recommended contextual transfer. Six types of navigational links are identified. a) Direct successor link leading to the successive domain unit in knowledge hierarchy. Such transfer arises from current context such as link in text or message after fulfilling learning criteria of current domain unit. b) Parallel concept link, leading to the analogous domain unit for comparative learning or to the unit related to another aspect of the currently being learnt domain content.
c) Fine-grained unit link, leading to the fine details of the domain content after the identification of some missing conception or misconception [11]. These transfers are very contextual, and therefore interactive objects such as image maps are suitable for them: the fine-grained unit would be the explanatory unit of the object clicked in the picture.
d) Glossary link, leading to a pop-up "spring-loaded" module [10] available only as long as the learner is interested in it and is explicitly doing something to keep it active (such as pressing the mouse button).
e) Excursion link, leading to a learning unit outside the current context, to learn about an external concept in view of the current conceptual unit [7]. Excursion links provide learning related to the current context; the trigger would generally be a description or a phrase which links the current unit to the excursion unit.
f) Problem link, leading to problems related to the current learning unit. Transfer to problems results from the system's suggestion to do so once the learning criteria of a learning unit have been fulfilled.
Table 3 presents examples of multimedia objects used as navigational links.

Table 3. Types of multimedia objects as navigational links and recommended uses

Examples | Recommended uses
Textual links from main text (interactive object) | transfer to successor unit; transfer to excursion; transfer to glossary pop-up
Textual links from messages (interactive object) | transfer to successor unit; transfer to excursion; transfer to problems
Sensitive parts of static pictures (image maps) (interactive object) | transfer to fine-grained unit; transfer to successor unit
Push buttons (interaction objects) | transfer to another learning unit on learner's explicit request; transfer to another aspect of the same learning unit; transfer from message (e.g., arrow button in message)
Pop-up menu items (interaction objects) | transfer to another learning unit on learner's explicit request
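Similarly, the pairing of the link types a) to f) with the objects recommended to trigger them in Table 3 can be restated compactly; the following is purely illustrative, with invented names.

    // Illustrative restatement of Table 3: each link type paired with a
    // recommended triggering object.
    class Navigation {
        enum LinkType {
            DIRECT_SUCCESSOR, PARALLEL_CONCEPT, FINE_GRAINED_UNIT,
            GLOSSARY, EXCURSION, PROBLEM
        }

        static String recommendedTrigger(LinkType type) {
            switch (type) {
                case FINE_GRAINED_UNIT:
                    return "sensitive part of a static picture (image map)";
                case GLOSSARY:
                    return "textual link from main text, shown only while kept active";
                case PROBLEM:
                    return "textual link from a system message";
                case PARALLEL_CONCEPT:
                    return "push button or pop-up menu item, on explicit request";
                default: // DIRECT_SUCCESSOR, EXCURSION
                    return "textual link from main text or from a message";
            }
        }
    }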
3.3. Integration of multimedia objects
In many situations, domain content presentation requires the use of more than one multimedia object. Learning improves when the complementary stimuli and cognitive resources used to present the learning content include relevant coding (text, graphics, tables, etc.) and relevant modalities (visual and auditory senses) [14]. Following are some recommendations on how best to combine multimedia objects.
- There should be no more than one active multimedia object at a time on the screen. For example, a screen with two animations showing two different aspects of the same domain content would place a high cognitive load on the user (with the exception of a comparative study of two actions).
- The integration of multimedia objects should be complementary and synchronised. For example, audio narration along with a diagram should direct the learner towards the salient parts of the diagram [16]. Care should also be taken not to present the same material with more than one multimedia object (such as an audio rendition of the text presented on the screen).
- Decision-intensive objects such as flowcharts demand high cognitive load; therefore the integration of such objects with any other multimedia object is not recommended.
- To avoid confusion, different multimedia objects that are not initially distinguishable should not be put together. For example, pictures and image maps initially look static; similarly, user-controlled animations and automatic animations initially have a similar dynamic look.
- The integration of dynamic observation objects (e.g. animations) with static observation objects (e.g. text) should not use the same sensory channel at the same time. For example, the learner should not be forced to read text while watching an animation.
4. The InterSim System
The InterSim system facilitates conceptual knowledge and cognitive skills in the human ear domain. A detailed description of the system architecture is available in [8]. The system has three main functional states: learning, assessment and case authoring. The learning and assessment states are for the learner, whereas case authoring is for teachers, to add real cases of the domain to the system. The learning state is further sub-divided into: (a) coarse-grained instruction-dominated learning; (b) fine-grained knowledge construction; (c) cognitive skills development; and (d) application of the acquired knowledge and skills.
4.1. The Multiple Representation approach in the InterSim system
4.1.1. Multimedia object selection
Table 4 examines the educational objectives of the various parts of the InterSim system under the cognitive apprenticeship framework. Various multimedia objects are then selected on the basis of Tables 1 and 4. The following paragraphs describe the rationale for using these objects.
a) The receptive and active observation of the subject domain starts with the help of static pictures along with corresponding text. Three types of static pictures are used: normal static pictures, static pictures with sensitive parts (similar to image maps), and static pictures with semi-sensitive parts.

Table 4. Educational objectives of various states and sub-processes in the InterSim system

States and sub-processes | Educational objectives
Learning state: coarse-grained instruction-dominated learning | receptive and active observation of healthy ear structure and functionality; observation of simple physics related to the auditory system; observation of graphs and diagrams related to the auditory system
Learning state: fine-grained knowledge construction | exploration of structure and functionality of the healthy ear; excursions to auditory-system-related topics in physics of sound and audiometric measurements
Learning state: cognitive skills development | observation of diseases of the ear; exploration and diagnosis of diseases of the ear; interpretation of graphs and diagrams related to the auditory system
Learning state: application of the acquired knowledge and skills | learning by problem solving on the healthy ear and diseases of the ear; repetitive training by practice in multiple contexts
Assessment state | analysis of retention of concepts and skills acquired in the learning state
Case authoring state | addition of various real cases of the domain, to be used in advanced exploration of diseases of the ear for acquiring context-based cognitive skills
Normal static pictures are used for receptive observations, whereas static pictures with sensitive and semi-sensitive parts are used for active observations. Sensitive parts in the pictures represent domain objects in the current domain hierarchy. On such objects, mouse-over shows the boundary and name of the part; a single click highlights the whole part and gives a short textual/audio description; and a double click transfers the learner to the fine grained learning unit. Semi-sensitive parts do not belong to the current domain hierarchy and react only to double click mouse actions, providing information about how to change the current domain hierarchy.
b) The next stage in receptive observation deals with the dynamic/functional behaviour of ear parts, and animations are found suitable for this purpose [18]. Three types of animations are used: automatic animations run in a continuous loop without any learner intervention; user controlled animations allow the learners to see a continuous action, and a particular event can be generated by some explicit action (e.g., pressing a button); and user initiated animations are used where learners can explicitly run the initially stopped animation to see the complete process.
c) After the observation phase, simulations are used for acquiring skills. Simulations are important because the learner's first active participation in the learning process starts with simulations. They help learners achieve mastery by providing virtually unlimited practice situations without incurring the costs of a real work environment [18].
d) To provide a more realistic learning environment, pictorial virtual reality (VR) is used, which allows manipulation of three dimensional objects and scenes.
e) Even more realistic cases are provided by videos to show actual world phenomena.
f) Decision making skills are supported by flowcharts, as they graphically represent the sequencing, options and conditions affecting the domain content representation [9].
Table 5 shows how the system supports the student by providing timely recommendations and updating the student model, while not hindering the student from accessing any part of the domain that would not cause unnecessary cognitive overload. The table shows a typical learner-system interaction sequence for learning of the acute otitis media disease when the learner tries to access the disease from the Eustachian Tube learning unit in the healthy ear part of the system. A minimal sketch of the underlying update-and-recommend loop is given after the table.

Table 5. Typical system-learner interaction with adaptive system behaviour
(columns: domain content representation on screen; student model, major changes; system recommendation; actual student action)

1. Eustachian tube closure - Animation (Simple observation). Student model: update - eustachian tube closure knowledge exposed. Recommendation: go to observation of diagnosis (animations). Student action: rejects system recommendation, goes to exploration of diagnosis.
2. Diagnosis - User controlled animation (Exploration). Student model: update - diagnosis knowledge exposed. Recommendation: go to advanced exploration of diagnosis (simulations). Student action: rejects system recommendation, goes to problem solving.
3. Problem - Simulation for diagnosis (covering observation and exploration of diagnosis). Student model: partial success update - eustachian tube closure knowledge grasped, diagnosis tried but not grasped. Recommendation: need more learning, go to advanced exploration of diagnosis (simulations). Student action: rejects system recommendation, goes to development of acute otitis media.
4. Development of acute otitis media - Animation (Observation). Student model: update - development of acute otitis media knowledge exposed. Recommendation: problem solving. Student action: accepts system recommendation (some action from user in continuing the learning process).
5. Problem - Flowchart related to diagnosis of eustachian tube closure and initial development of acute otitis media. Student model: full success update - eustachian tube closure fully grasped, initial acute otitis media knowledge grasped. Recommendation: go for advanced exploration of acute otitis media development (simulations).
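The adaptive behaviour shown in Table 5 can be sketched as a small loop in which every activity updates the student model, a recommendation is derived from it, and the learner remains free to reject the suggestion. The Python below is illustrative only, with assumed names and states, not the InterSim implementation.

```python
# Minimal sketch (assumed names/states) of the Table 5 adaptive loop.
student_model = {}  # concept -> "exposed" | "grasped" | "fully grasped"

def update(concept: str, status: str) -> None:
    """Record the learner's current standing on a concept."""
    student_model[concept] = status

def recommend(concept: str) -> str:
    """Suggest a next activity and media type; the suggestion is never enforced."""
    status = student_model.get(concept)
    if status is None:
        return f"observe {concept} (animation)"
    if status == "exposed":
        return f"advanced exploration of {concept} (simulation)"
    return f"problem solving on {concept} (simulation/flowchart)"

# Mirroring row 1 of Table 5: exposure is recorded and a recommendation is
# offered, but the learner may still navigate elsewhere.
update("eustachian tube closure", "exposed")
print(recommend("eustachian tube closure"))
```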
4.1.2. Navigational object selection

In the InterSim system the navigation methods are selected following the MR approach. For example, in the partial screen of the Ossicle Chain learning unit in figure 1, the navigation panel on the left side provides various combo boxes for explicit navigation among learning units. The ossicle chain picture on the right behaves as an image map to allow navigation to successor units. The textual links pop up a glossary window explaining the terms.
Figure 1. Screen-shot of simulation for Acute Otitis Media in the InterSim ear system

4.1.3. Integration of multimedia objects

The InterSim system follows the MR approach in integrating multimedia objects for domain content representation. For example, the concept of "appropriate sound energy routing" is presented by two comparative animations. On another occasion, the structure of the ossicles required representation both as a static picture and as pictorial VR. Since both multimedia objects have similar initial visual states, which the MR approach does not recommend for simultaneous use, they are used as alternatives to each other, and the learner can explicitly switch between the two without being confused by their similar initial states (a minimal sketch of such a compatibility check is given after section 5).

5. Discussion and future plans

The use of multimedia technology in educational systems has rarely been considered from the viewpoint of educational theories. This paper proposed such a consideration in the form of the Multiple Representation (MR) approach. The approach has been implemented in the InterSim system using the cognitive apprenticeship framework. There are many areas in which the research demands further consideration. Currently the MR approach is applied only to disciplines with a focus on cognitive skills. The requirements of other types of disciplines and educational scenarios demand different frameworks (for example, Socratic dialogue is one such framework, guided discovery another), and the implementation of the Multiple Representation approach would also need to adapt to such requirements.
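The integration rules of sections 3.3 and 4.1.3 can be expressed as a compatibility check between two multimedia objects. The Python sketch below uses assumed object names and encodes only three of the rules; it is illustrative, not InterSim's implementation.

```python
# Minimal sketch, under assumed names, of the MR integration rules as a
# pairwise compatibility check between multimedia objects.
DYNAMIC = {"animation", "user_controlled_animation", "video", "simulation"}
SIMILAR_INITIAL_STATE = [
    {"picture", "image_map"},             # both initially look static
    {"animation", "user_controlled_animation"},
    {"picture", "pictorial_vr"},          # the ossicles case in section 4.1.3
]

def can_integrate(obj_a: str, obj_b: str) -> bool:
    """Return True if two objects may be shown together under the MR rules."""
    # Rule: at most one active (dynamic) object on screen at a time.
    if obj_a in DYNAMIC and obj_b in DYNAMIC:
        return False
    # Rule: flowcharts are decision intensive; do not pair them with anything.
    if "flowchart" in (obj_a, obj_b):
        return False
    # Rule: objects with indistinguishable initial states must not co-occur.
    if any({obj_a, obj_b} == pair for pair in SIMILAR_INITIAL_STATE):
        return False
    return True

print(can_integrate("picture", "pictorial_vr"))  # False: alternate them instead
```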
References

[1] Adams E. S., Carswell L., Ellis A., Hall P., Kumar A., Meyer J. & Motil J. (1996). Interactive multimedia pedagogies: Report of the working group on interactive multimedia pedagogy. SIGCUE Outlook, 24(1-3), 182-191.
[2] Alty J. L. (1991). Multimedia - What is it and how do we exploit it? People and Computers IV (Eds. D. Diaper & N. Hammond), CUP: Cambridge.
[3] Bodart F. & Vanderdonckt J. (1994). Visual layout techniques in multimedia applications. Conference Companion, CHI'94, Boston, Mass., USA, 121-122.
[4] Collins A., Brown J. S. & Newman S. E. (1989). Cognitive Apprenticeship: Teaching the crafts of reading, writing and mathematics. Knowing, Learning and Instruction (Ed. L. B. Resnick), Hillsdale, NJ: Lawrence Erlbaum Associates, 453-494.
[5] Dillon A. (1996). Myths, misconceptions, and an alternative perspective on information usage and the electronic medium. Hypertext and Cognition (Eds. J. Rouet, J. J. Levonen, A. Dillon & R. J. Spiro), New Jersey: Lawrence Erlbaum Associates, 25-42.
[6] Jones, M. K. (1989). Human-Computer Interaction: A design guide. Englewood Cliffs, NJ: Educational Technology Publications.
[7] Kashihara A., Kinshuk, Oppermann R., Rashev R. & Simm H. (1997). An Exploration Space Control as Intelligent Assistance in Enabling Systems. International Conference on Computers in Education Proceedings (Eds. Z. Halim, T. Ottmann & Z. Razak), AACE, VA, 114-121.
[8] Kinshuk, Oppermann R., Rashev R. & Simm H. (1998). Interactive simulation based tutoring system with intelligent assistance for medical education. World Conference on Educational Multimedia and Hypermedia, June 20-25, 1998, Freiburg, Germany.
[9] Lara S. & Perez-Luque M. J. (1996). Designing educational multimedia. Lecture Notes in Computer Science, 1108, 288-297.
[10] Nielsen J. (1996). Features missing in current web browsers. SUN Microsystems - What's Happening, Columns and Commentary. http://www.sun.com/950701/columns/alertbox/newfeatures.html
[11] Patel A. & Kinshuk (1997). Granular Interface Design: Decomposing Learning Tasks and Enhancing Tutoring Interaction. Advances in Human Factors/Ergonomics - 21B - Design of Computing Systems: Social and Ergonomic Considerations (Eds. M. J. Smith, G. Salvendy & R. J. Koubek), Amsterdam: Elsevier, 161-164.
[12] Payne S. J., Chesworth L. & Hill E. (1992). Animated demonstrations for exploratory learners. Interacting with Computers, 4(1), 3-22.
[13] Quinn C. N. (1997). Engaging learning. ITForum Paper 18. http://itechl.coe.uga.edu/itforum/paper18/paper18.html
[14] Reimann, P. & Schult T. (1996). Schneller schlauer: Bildung im Multimedia-Zeitalter. c't, 9, 178-186.
[15] Rogers E., Kennedy Y., Walton T., Nelms P. & Sherry I. (1995). Intelligent multimedia tutoring for manufacturing education. Frontiers in Education Conference, November 2-4, 1995, Atlanta, Georgia, USA.
[16] Rogers Y. & Scaife M. (1997). How can interactive multimedia facilitate learning? Intelligence and Multimodality in Multimedia Interfaces: Research and Applications (Ed. J. Lee), CA: AAAI Press.
[17] Spiro R. J., Feltovitch P. J., Jacobson M. J. & Coulson R. J. (1991). Cognitive flexibility, constructivism and hypertext: Random access instruction for advanced knowledge acquisition in ill-structured domains. Educational Technology, 31(5), 24-33.
[18] Towne D. M. (1995). Learning and Instruction in Simulation Environments. Englewood Cliffs, New Jersey: Educational Technology Publications.
Learning Companions
The Missing Peer, Artificial Peers and the Enhancement of Human-Human Collaborative Student Modelling

Susan Bull*, Paul Brna†, Sonia Critchley*, Koula Davie* & Corina Holzherr*

* School of Languages, University of Brighton, Falmer, E Sussex, BN1 9PH, UK.
† Computer Based Learning Unit, University of Leeds, Leeds, LS2 9JT, UK.
email:
[email protected];
[email protected].

Abstract: We present peerISM, a domain-independent collaborative student modelling system to support two students in critiquing each other's work in distributed and face-to-face mode. It has an artificial peer to provide additional support, and to enable interaction to continue with a single learner when one partner 'goes missing'.
1 Introduction

Much ongoing research focuses on the value of human-human collaboration, in which one human peer interacts with another. The value of synchronous and asynchronous forms of collaboration has been examined, with results that need careful interpretation [1]. Another strand of research applies the notion of collaboration to the issue of building a shared model of a learner [2,3,4]. Such collaborative student modelling minimally requires that an individual interacts with a computer system to negotiate the contents of an 'open' student model. While there is a plethora of environments notionally designed to support collaborative work, there are relatively few environments in which the system both supports and enhances the activities of two human students in critiquing each other's work in a detailed manner. We describe such a system: peerISM. PeerISM extends the notion of collaborative student modelling by seeking to utilise the strengths of human-human collaboration together with a system designed to support and enhance the quality of the interaction. This paper explains why peerISM was extended to include an artificial peer. The work is distinct from many other approaches to incorporating an artificial learning companion into an environment: for example, research on learning companions sometimes stresses the scarcity of suitable human resources requested by an individual [5], or assumes that benefits of face-to-face communication will automatically transfer to human/artificial learner pairs. Work with peerISM emphasises human-human interaction, with additional support from an artificial peer. This artificial peer assumes a more central role when a human peer 'goes missing'.

2 Rationale
Artificial Learners (ALs) have been investigated as a way of helping students by providing a computational peer with whom a human student may learn. The aim of integrating an AL into an intelligent learning environment (ILE) is to take advantage of the benefits of face-to-face collaboration settings, in particular the potential for encouraging learner reflection, and to combine these with the benefits that an ILE can offer. Early implementations examining the potential of ALs were described by Chan & Baskin [6], and Dillenbourg & Self [7]. Interest in ALs continues, as current systems build on previous findings to try to define the most efficient kinds of AL to interact with a student. For example, Dillenbourg & Self found that human learners (HLs) became disinterested when their AL could not offer sufficiently useful
suggestions. This problem is now being addressed in systems focussing on ALs with knowledge levels aimed at maximising the effectiveness of interactions for the human student [8,9]. Inevitably there will be differences between AL-HL and HL-HL interactions, and some systems are designed for use purely with collaborating HL pairs [10,11]. Another possibility is distributed HL pairs [12]. The choice of approach is likely to depend on the particular learning context. For example, it has been suggested that face-to-face collaboration might be useful in situations where HLs have already developed mature social skills and also possess similar knowledge levels, and that distributed HL pairs might be more appropriate for tasks which require time to reflect [13]. Another issue to consider is the collaboration skills of HLs interacting in some way via computers [14]. A potential problem for HL-HL pairs is coordinating the presence of two learners—an issue which is irrelevant to AL-HL pairs [5]. It would be useful to develop an approach to computer based, or computer mediated, pair interaction which could be more widely applicable than most implemented to date. MIST [11] has the advantage that it is independent of content knowledge, within the task of learning from texts. This is beneficial not only because it can be used across courses by different students, but also because individuals may become accustomed to using the system repeatedly, in different situations, reducing the need for them to learn a new program. This is particularly important where students are unaccustomed to computer use. It is important to maintain the aim of promoting reflection in a domain independent system, and the literature on self and peer assessment offers a useful direction. Both types of assessment can benefit learning, and can be usefully employed formatively in a range of subjects [15,16,17]. In the case of peer assessment, a time delay is implied between the initial completion of a task and self evaluation by one student, and the evaluation by the other, allowing the giver of feedback time to reflect. If this feedback is on the same task that the student performed themselves, they will be interacting with the material for a second time, and will be likely to obtain greater benefit since they have already considered their own self evaluation before being confronted with another student's responses to the task. This might help combat the problem noted by Dillenbourg, that "the availability of reflection tools does not guarantee that users do indeed reflect on their learning experiences" [18]. Another way to promote learner reflection is to encourage learner/system collaborative student modelling [2,3,4]. This involves student/system interaction about student model contents. An accessible student model is in some sense similar to a learning companion, as expressed by Chan of the AL in Integration-Kid: "...the companion's behaviour can be viewed as a form of active student model... it interacts explicitly with the student and reflects to the student an image close to him" [19]. An additional advantage of collaborative student modelling is that the image reflected to the student is intended to be of him, rather than close to him. Thus, if this image does not correspond to how the student evaluates their beliefs, they should recognise the differences between their beliefs and the contents of their student model, focusing reflection more specifically onto their own knowledge and misconceptions.
Combining self and peer assessment with collaborative student modelling suggests a student model with contributions from the student modelled and from a peer, in addition to representations of the student's beliefs inferred by the system. Thus the notion of collaborative student modelling is extended beyond student/system negotiation of model contents, to include information from peers: human (HP) and/or artificial (AP)¹. PeerISM is an Inspectable Student Model comprising representations from the learner modelled, an HP and/or AP, and the system itself. The aim of promoting learner reflection is maintained. PeerISM is flexible with regard to its context of use, as it is domain independent. As student contributions are in the form of 'assessment', different skills are required than for HP-HP collaboration, where collaboration is often expected to occur from scratch. For example, if assessment criteria are provided, learners have a focus from which they may develop their individual and joint interaction, making the question of the knowledge level of each member of a pair less crucial. Face-to-face interaction can then occur when partners already have a focus for their discussion [see 10]. The problem of peer availability can be overcome by interacting only with the AP.
3 Promoting reflection with peerISM

PeerISM is designed to help students learn by promoting reflection in the following manner:
- by requiring users to provide a self-assessment;
- by exposing students to the work of others;
- by requiring students to give peer feedback, and evaluate the feedback that they give;
- by providing an inspectable student model derived from self evaluation, peer evaluation (HP and/or AP), and system inference;
- by admitting students into their model to offer amendments to representations of their beliefs, to comment to a peer about their evaluation, or to comment on the system's remarks;
- by giving a starting point for human-human interaction (at or away from the computer).
Three versions of peerISM have been implemented: one which allows short one-word answers that can be evaluated by the system [20]; one which permits only text as input; and a third, a variant of the second, which includes an AP. The second and third versions are the focus of this paper, and are domain independent. The system is used as follows:
1. Students individually input their textual answers to questions provided by their tutor into the edit fields shown in the answers and self evaluation section of Fig. 1. They then click on a button to give a quantitative self evaluation for each answer, on a four point scale (very good; good; variable; problematic). Qualitative evaluations can also be noted.
2. Students view their partner's work in a separate section (not shown), or on a printout.
3. Peer assessment is given in the peer feedback section. Qualitative feedback is given in edit fields, allowing commentary on any aspects of the work the evaluator wishes. A quantitative evaluation is also provided for each answer, on the same four point scale. Evaluators also indicate their confidence in their feedback and evaluation (sure or unsure).
4. Students view their respective student models, created from self, HP and/or AP, and system comments (see Fig. 2). The student's own answers are also available. The self and HP evaluations are taken from the assessments provided by each partner, in the case of the HP also taking into account their confidence in their evaluation. AP evaluations are inferred from the system's checking of the answer edit fields for keywords provided previously by the tutor. This may not lead to an accurate assessment: e.g. a student might use an alternative acceptable word. However, since the AP is giving a peer evaluation, the recipient of inaccurate comments will be aware of the possibility of inaccuracy. The model of the learner inferred by the AP is presented as textual feedback. The system evaluation is inferred from the combination of quantitative self evaluations and all available quantitative peer evaluations [see also 21,22] (a minimal sketch of such a combination is given after this list). Again, none of the information sources is guaranteed accurate. If all evaluations are similar, peerISM assumes these to be probably true. If there are differences, these are noted. The system's model is also presented textually.
5. When viewing their student model, students may make comments for themselves, to their HP and to peerISM. Comments to HPs are designed to remain as HP-HP interactions unless a problem cannot be resolved. Such interactions may be on- or off-line. Comments may also be sent to the tutor. This facility is intended for difficulties that are not resolved to the learner's satisfaction. Encouraging students to write explanations as questions to their tutor, in cases of disagreement, is designed to promote self explanation of beliefs and domain knowledge, which may result in students resolving difficulties for themselves [23].
6. Viewing paired student models has been suggested to lead to intense face-to-face peer (HP-HP) interaction, including self- and other-explanation, and spontaneous peer tutoring [10]. The inspectable student models of peerISM are intended to fulfil a similar function: learners will have completed the task and reflected on their views before meeting face-to-face, where intensive interaction may then develop, based around the contents of their respective models. This should help address the observation of Teasley and Roschelle that "collaboration does not just happen because individuals are co-present" [24].
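The combination described in step 4 can be pictured with a small Python sketch. The scale encoding, keyword matching, and agreement threshold below are illustrative assumptions rather than the authors' implementation, and the HP's sure/unsure confidence weighting is omitted for brevity.

```python
# Minimal sketch, with assumed details, of combining peerISM-style
# evaluations on the paper's four point scale; AP evaluation is inferred
# from tutor-supplied keywords.
SCALE = {"problematic": 1, "variable": 2, "good": 3, "very good": 4}

def ap_evaluate(answer: str, keywords: list[str]) -> int:
    """Grade an answer by the fraction of tutor keywords it contains."""
    hits = sum(1 for k in keywords if k.lower() in answer.lower())
    ratio = hits / len(keywords) if keywords else 0
    return 1 + round(3 * ratio)  # map [0,1] onto the 1-4 scale

def system_model(self_eval: int, peer_evals: list[int]) -> str:
    """Combine self and peer evaluations; flag disagreement for discussion."""
    evals = [self_eval] + peer_evals
    if max(evals) - min(evals) <= 1:
        return f"agreed level: {round(sum(evals) / len(evals))}"
    return "evaluations differ - noted for learner reflection"

ap = ap_evaluate("The summary covers tense and lexis",
                 ["tense", "lexis", "connectors"])
print(system_model(SCALE["good"], [SCALE["variable"], ap]))
```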
Fig. 1: the peerISM screens
Fig. 2: the student model showing self, HP, AP and system contributions
This human-human collaborative student modelling may result 'only' in modified mental models, or may also lead to explicit changes in the peerISM models, if learners choose to interact with the system again. In the latter case, students may find further changes occurring in their mental models. The kinds of interaction between students at this stage should be determined by the students themselves, according to the approaches that they find useful. Because the multiple agents involved in peerISM interact about the evaluation of a piece of work, initially in asynchronous mode, selection amongst available peers based on student models is not such an issue as in other contexts [e.g. 21,25]. However, if more than two HPs become involved in the peer feedback context, this must be considered [see 26].

4 An example context of peerISM
4.1 The domain

Halliday states "Talking and writing... are different ways of saying. They are different modes for expressing linguistic meanings" [27]. Although some of the claims for differences between oral and written language may be oversimplified [see 28], students do experience difficulties in writing which arise to some extent from gaps in their understanding of the function of genres. Writers must also appreciate the perspectives of both author and audience as, in contrast to conversation, the target readership is usually absent during composition [29]. It is suggested that there can be a relationship between the development of effective reading and writing skills [30], and the analysis of texts by learners can be a useful way of facilitating their appreciation of what constitutes a good text in a certain genre [31]. In accordance with these views, undergraduate students taking a French Language module as part of an Applied Language degree were encouraged to analyse texts, to empower them to attempt similar documents themselves. Texts were studied as follows: a general overview was gained from the title and the first and last paragraphs. Students then independently summarised the main ideas in each paragraph, and highlighted key structuring features, e.g. time phrases and connectors. The use of tense and lexis was considered, with reference to the part played in the cohesive development of the content. Finally, a summary of the text was produced in pairs.
PeerISM was introduced to promote interaction between pairs of students to help them develop familiarity with text structure and organisation of content, by having to discuss their accounts of these aspects of the text with a partner. Such deliberation aims to raise awareness of document structure, to then serve as an example for the organisation and structuring of writing. Self and peer evaluations focussed on the independently written summaries, with the aim of this leading to subsequent face-to-face interaction about the learners' respective student models. The final co-written summary was a means to link discussion about the student models with the task of understanding the structure of the text. A second reason to use peerISM in this context is the feeling of isolation faced by many when writing. Introducing peer interaction in reading and summarising was designed to make it easier to overcome this remoteness amongst students, by getting them accustomed to working together, and then continuing peer feedback and interaction into the writing process. A further benefit is that peer discussion of writing can lead to the development of more effective writing techniques, resulting ultimately in the production of better texts [32].

4.2 PeerISM in use

The use of peerISM described below was undertaken before the AP was available; hence all peers were HPs. The group comprised 12 students. All pairs were self-selecting. Students were assigned the task described above, and shown how to use peerISM. Written instructions were also distributed, and help is available from within the peerISM program. Only 6 of the 12 students attempted to use peerISM. Of those learners who used it, only one pair completed all steps as a pair, because 4 of the users were partnered with people who did not attempt to use the system. From their class participation, it appeared that the non-users did little preparation in any form (i.e. their lack of participation seemed to be general, rather than directed specifically at the peerISM component of the task). The pair who did work together suggested that they would like to use the system again, and thus completed their assignment for the following week in the same manner. They provided useful information about their experiences with peerISM, and on the basis of this were invited to be co-authors on this paper. Some of their comments are reproduced in Fig. 3. As was the case for most of the group, neither KD nor CH were habitual computer users. Nevertheless, they found peerISM useful for learning. It enabled them to work at a pace which suited them, a feature considered important by each of them. Both students also described the benefits of reflection facilitated by peerISM. Peer interaction had a positive effect for both: each made changes after viewing their partner's work, and each reflected further on returning to the work a second time. In addition, both felt the need to discuss their work and the feedback. CH also felt that although conventional (face-to-face) peer work can be very helpful, with peerISM this can be taken further by providing a focal point for reflection. KD felt that peerISM might not be so effective for pairs who are at different levels, an issue raised also by Jehng et al [13]. It has been suggested that if there are differences between the cognitive abilities of individuals, a collaborative interaction will change into one of tutor-tutee [33].
If this occurs with peerISM, this kind of interaction may also be beneficial: the tutee will receive explanations, and the tutor will be reinforcing their own knowledge by making it explicit [see 10]. KD was subsequently asked to comment on how she felt about the potential of peer tutoring in such a situation (see Fig. 3). KD clearly felt that had she found herself in either role of a peer tutoring situation, it would have been useful for her own learning. However, she was very sensitive to the possibility that this may not apply to other learners—some may not wish to assume the role of tutor only. Further investigation is required to determine the relative benefits of peer tutoring and peer collaboration in the context of peerISM, and how much these may vary amongst individuals. As well as its role of enhancing reflection on the domain, for CH inspection of her student model served as a trigger to realising how she might further improve, by more detailed self evaluation. However, she also raised the question of whether more comprehensive self evaluation might result in less peer interaction. This possibility merits further consideration. If it is found to be the case for some partners, the benefits gained from additional self evaluation need to be weighed against effects of reduced peer interaction, for those pairs. CH felt that in their case, increased self assessment would not have led to reduced peer discussion.
KD: I enjoyed doing the exercise and found it useful. Many times I have done some work and I have wondered how good my answers were, and how someone else would have tackled the questions. I could have asked, of course, but it is not always easy, as some people lose interest once they have done the work. Sometimes I felt I would be taking someone's valuable time, and also it is at times embarrassing to ask 'how did you write...'. Comparing my own work with someone else's can help me learn new ways of building constructions, how to summarise, etc., especially as I cannot ask the tutor every time I have a question, otherwise he would not have time left to teach! Learning through a system like peerISM means that I can learn at my pace without my partner being present. I can read her answers as many times as I like. Assessing her work makes me contribute to her work as well as making me think. In order to give feedback, I read my partner's answers, which I found impressive: a very good summary, well chosen words, no unnecessary details. I then went back to my answers and compared them with C's and saw how I could improve my work, but I felt I had to ask my partner a few questions first. Our feedback led to a discussion between us; for example, C had used 'par rapport à' and we talked about when to use the expression. We also talked about how to find the main idea in a paragraph, etc. I feel that peerISM has enhanced my learning. For example, my summaries were too long, too detailed, and looking at C's was a good model to follow. If I had spent the same total time working alone, I would not have benefitted as much because I could only have compared my work to what I already know instead of learning new skills. My only reservation is that students assessing each other should be of roughly similar ability.
CH: I think that pair-work can be very fruitful, yet often the problem seems to be that the partners do not get enough time for reflection. There is usually time pressure. Working with somebody face-to-face one is also more hesitant with criticism perhaps: whilst one would not dare to 'offend' somebody face-to-face, one can write a more critical comment on the computer. In this way peerISM does in my opinion bring down unhelpful barriers on the one hand (i.e. being too polite), and contributes towards constructive criticism on the other. PeerISM made me type my answers, gave me time to view my partner's answers when it was convenient for me, and took away immediate time and peer pressure. This proved positive. Giving feedback meant that I generally reconsidered the questions, which consolidated my learning. I also had to evaluate my own answers at least semi-consciously, compare them with my partner's and make a judgement accordingly. I made amendments to at least one of my answers on coming back to the work. To look at somebody else's work in this formal way was a good experience. One can, of course, talk to people about the work, and this is usually done in conventional pair work. PeerISM seems to give the starting point of taking an informal discussion a little further: one has the work on paper first, i.e. everything is very conscious and ordered, but perhaps not detailed enough. One can then discuss each other's work in a more informed, and at the same time, informal, way. It really makes one work that bit harder than in conventional pair work. I am a strong believer in 'real communication between humans', and I realise that peerISM in no way excludes this. K and I both felt the need to discuss our input, especially where discrepancies arose. We did this partly formally in front of the computer, and partly informally over a cup of coffee. I think it was in part writing the feedback, and partly receiving it, which triggered our questions. Working alone would not have been as useful: one would not spend so much time on the work, i.e. one would answer the questions once and not go back and reconsider them. I think having a second opinion is always helpful, even if only to reconfirm our own. When I looked at my student model I realised I could perhaps have used more self evaluation. When I write an answer which is not a straightforward yes or no response, and which is intended for someone else's consumption, I often want to explain the 'hows' and the 'whys'. I think this could be contained in a more detailed self evaluation and might help subsequent communication between partners. On the other hand, it might make that very communication superfluous. In our case, I think more detailed self evaluation would have helped us make our argument more explicit, and assisted our partner in giving feedback.

Response when asked about tutoring:
KD: If I had tutored a partner, it would have been useful to me in more than one way. It would have allowed me to reflect on my work more than I usually do so that I could explain it to my partner. It would encourage communication between us and I would enjoy helping her. If my partner was the stronger, I would welcome tutoring, as for me the best way to learn is by example and comparison. PeerISM allows me to do just that. But I still feel I should also be able to help her, as she might otherwise lose interest.

Fig. 3: comments from an HP-HP pair
5 The artificial peer
Chan [34] discusses roles for APs and HPs in combination, in Social Learning Systems. Because of the situation which occurred in the above student group, where four individuals used peerISM but their partners did not, an AP as an additional peer evaluator might be useful to help overcome the peer availability problem. This could, in fact, be described here as an uncooperative peer problem, since the non-users avoided the work in general: they did not complete the initial individual stage (even on paper), thus there was no work for their partners to give feedback on; neither did they provide feedback to their partners, who had done the initial work. With an AP, the four students with non-user partners may have been able to gain some benefit from peerISM without the need for a reliable HP. The problem of non-users remains. It may be the case that some students gain less from peer interaction, working better independently. The question then arises as to whether there is any reason to try to enforce HP-HP interaction for such individuals [33]. However, the 6 non-users in this group did not perform the final task at an adequate level. For such students it might be helpful to encourage AP-HP interaction, as their motivation may increase if they are able to benefit from giving and receiving feedback without having to interact with an HP. The AP, although less central in this context, may also be useful for HP-HP pairs as an additional resource. Fig. 2 shows how differences between HP and AP feedback to an individual might prompt learners to think about different issues².

6 Discussion and further work
This paper described peerISM, a domain independent system to encourage learner reflection by collaborative peer modelling and self and peer evaluation. PeerISM may be used with a human and/or computational peer. The AP differs from APs in most other systems in the same way that the HP contributions to the student model differ from traditional HP-HP collaboration: the HP and AP are not trying to collaboratively learn domain content provided through the system, but are carrying out peer evaluations. In the empirical work reported here, the student population split into those who participated as HP-HP pairs and found the experience rewarding; those who were let down because their partner did not participate; and those who were not highly motivated. From these observations, we propose further research to investigate the ways in which the use of a simple AP affects HP-HP interaction; the ways in which the AP influences an HP in a situation where another HP cannot, or will not, participate; and the possible motivational value of an AP for those who are poorly motivated to do even the off-line work. An important issue to consider is the minimum necessary functionality of the AP. The functionality could reasonably include: performing the tasks; commenting on HP performance, including generation of confidence values; negotiating a new assessment based on HP response; and changing the AP's own solution. The present AP can comment based on keywords the teacher believes necessary for a successful answer, but this will not permit much negotiation to occur, and such an AP cannot do the task. Current work is investigating the potential for this additional role, to enable an HP to offer feedback when no HP partner is available.
Notes
1. AP is an artificial peer: it may or may not itself be capable of learning.
2. Fig. 2 shows the model comprising self, HP, AP and system contributions. It was amended from one of the models of the successful pair, with AP evaluations added, and system comments accordingly altered.
References
[1] Crook, C. (1994). Computers and the Collaborative Experience of Learning. London: Routledge.
[2] Bull, S. & Pain, H. (1995) "Did I say what I think I said, and do you agree with me?", in J. Greer (ed), Proceedings of World Conference on Artificial Intelligence in Education, AACE, Washington DC, 501-508.
[3] Dimitrova, V., Dicheva, D., Brna, P. & Self, J. (1998) A Knowledge Based Approach to Support Learning Technical Terminology, Report 98/9, Computer Based Learning Unit, University of Leeds.
[4] Morales, R., Ramscar, M. & Pain, H. (1998) Modelling the Learner's Awareness and Reflection in a Collaborative Learner Modelling Setting, Workshop on Current Trends and Applications of AIED, Fourth World Conference on Expert Systems, Monterrey.
[5] Goodman, B., Soller, A., Linton, F. & Gaimari, R. (1997) Encouraging Student Reflection and Articulation using a Learning Companion, in B. du Boulay & R. Mizoguchi (eds), Artificial Intelligence in Education, IOS Press, Amsterdam, 151-158.
[6] Chan, T-W. & Baskin, A.B. (1988) Studying with the Prince: The Computer as a Learning Companion, Proceedings of International Conference on Intelligent Tutoring Systems, Montreal, 194-200.
[7] Dillenbourg, P. & Self, J.A. (1992) A Computational Approach to Socially Distributed Cognition, European Journal of Psychology of Education 7(4), 353-372.
[8] Hietala, P. & Niemirepo, T. (1997) Collaboration with Software Agents: What if the Learning Companion Makes Errors?, in B. du Boulay & R. Mizoguchi (eds), Artificial Intelligence in Education, IOS Press, Amsterdam.
[9] Ramirez Uresti, J.A. (1998) Teaching a Learning Companion, in G. Ayala (ed), Workshop on Current Trends and Applications of AIED, World Conference on Expert Systems, Monterrey.
[10] Bull, S. & Broady, E. (1997) Spontaneous Peer Tutoring from Sharing Student Models, in B. du Boulay & R. Mizoguchi (eds), Artificial Intelligence in Education, IOS Press, Amsterdam, 143-150.
[11] Puntambekar, S. (1995) Investigating the Effect of a Computer Tool on Students' Metacognitive Processes, Unpublished PhD Thesis, University of Sussex.
[12] Baker, M. & Lund, K. (1997) Promoting Reflective Interactions in a CSCL Environment, Journal of Computer Assisted Learning 13(3), 175-192.
[13] Jehng, J-C.J., Liang, S. & Chen, C-W. (1995) Examining Sociocognitive Effects of Peer Learning in a Distributed Learning Environment, Proceedings of ICCE, Singapore, 510-518.
[14] Burton, M., Brna, P. & Treasure-Jones, T. (1997) Splitting the Collaborative Atom, in B. du Boulay & R. Mizoguchi (eds), Artificial Intelligence in Education, IOS Press, Amsterdam, 135-142.
[15] Mowl, G. & Pain, R. (1995) Using Self and Peer Assessment to Improve Students' Essay Writing: a case study from geography, Innovations in Education and Training International 32(4), 324-335.
[16] Somervell, H. (1993) Issues in Assessment, Enterprise and Higher Education: the case for self-, peer and collaborative assessment, Assessment and Evaluation in Higher Education 18(3), 221-233.
[17] Stefani, L.A.J. (1994) Peer, Self and Tutor Assessment, Studies in Higher Education 19(1), 69-75.
[18] Dillenbourg, P. (1992) The Computer as Constructorium: Tools for Observing One's Own Learning, in R. Moyse & M.T. Elsom-Cook (eds), Knowledge Negotiation, Academic Press Ltd, London, 185-198.
[19] Chan, T-W. (1991) Integration-Kid: a Learning Companion System, Proceedings of IJCAI, 1094-1099.
[20] Bull, S. & Brna, P. (1997) What does Susan know that Paul doesn't? (and vice versa): Contributing to Each Other's Student Model, in B. du Boulay & R. Mizoguchi (eds), Artificial Intelligence in Education, IOS Press, Amsterdam, 568-570.
[21] Greer, J., McCalla, G., Cooke, J., Collins, J., Kumar, V., Bishop, A. & Vassileva, J. (1998) The Intelligent Helpdesk: Supporting Peer-Help in a University Course, in B.P. Goettl, H.M. Halff, C.L. Redfield & V.J. Shute (eds), Intelligent Tutoring Systems, Springer, Berlin Heidelberg, 490-503.
[22] Soller, A., Goodman, B., Linton, F. & Gaimari, R. (1998) Promoting Effective Peer Interaction in an Intelligent Collaborative Learning System, in B.P. Goettl, H.M. Halff, C.L. Redfield & V.J. Shute (eds), Intelligent Tutoring Systems, Springer, Berlin Heidelberg, 186-195.
[23] Bull, S. (1997) 'See Yourself Write': A Simple Student Model to Make Students Think, in A. Jameson, C. Paris & C. Tasso (eds), User Modeling, Springer Wien New York, 315-326.
[24] Teasley, S.D. & Roschelle, J. (1993) Constructing a Joint Problem Space: The Computer as a Tool for Sharing Knowledge, in S.P. Lajoie & S.J. Derry (eds), Computers as Cognitive Tools, Lawrence Erlbaum Associates, Hillsdale, New Jersey, 229-258.
[25] Hoppe, H.U. (1995) The Use of Multiple Student Modeling to Parameterize Group Learning, in J. Greer (ed), Proceedings of World Conference on Artificial Intelligence in Education, AACE, 234-241.
[26] Bull, S. (1997) A Multiple Student and User Modelling System for Peer Interaction, in R. Schäfer & M. Bauer (eds), Adaptivität und Benutzermodellierung in interaktiven Softwaresystemen, Universität des Saarlandes, 61-71.
[27] Halliday, M.A.K. (1989) Spoken and written language, Oxford University Press.
[28] Grabe, W. & Kaplan, R.B. (1996) Theory and Practice of Writing, Longman, New York.
[29] Widdowson, H.G. (1983) New Starts and Different Kinds of Failure, in A. Freedman, I. Pringle & J. Yalden (eds), Learning to Write: First Language/Second Language, Longman Inc., New York, 34-47.
[30] Carson Eisterhold, J. (1990) Reading-Writing Connections: Toward a Description for Second Language Learners, in B. Kroll (ed), Second Language Writing, Cambridge University Press, 88-101.
[31] Rubin, B. (1996) The Writing of Research Texts, in G. Rijlaarsdam, H. van den Bergh & M. Couzijn (eds), Effective Teaching and Learning of Writing, Amsterdam University Press, Amsterdam, 37-50.
[32] Kiefer, K. & Palmquist, M. (1996) How does Access to a Computer Network Affect Writing Students' Interactions with Peers and Teachers?, in G. Rijlaarsdam, H. van den Bergh & M. Couzijn (eds), Effective Teaching and Learning of Writing, Amsterdam University Press, Amsterdam, 358-371.
[33] Issroff, K. & del Soldato, T. Incorporating Motivation into Computer-Supported Collaborative Learning, in P. Brna, A. Paiva & J. Self (eds), Proceedings of Euro-AIED, Edições Colibri, Lisbon, 284-290.
[34] Chan, T-W. (1995) A Tutorial on Social Learning Systems, in J. Self & T-W. Chan (eds), Emerging Computer Technologies in Education, AACE.
User Modeling in Simulating Learning Companions

Chih-Yueh Chou, Chi-Jen Lin & Tak-Wai Chan
Institute of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan 32054, R.O.C. {yueh, zen, chan}@src.ncu.edu.tw

Abstract
Other than just being an intelligent tutor, a learning companion may take various roles when interacting with a user. Depending on the social learning models adopted in the learning process, a learning companion can be a collaborator, competitor, peer tutor, peer tutee, etc., and there can be one or multiple learning companions. This paper clarifies some issues of user modeling in simulating such learning companions. First, user modeling plays the same crucial role in simulating learning companions as in simulating an intelligent tutor in traditional intelligent tutoring systems (ITSs). Second, the system architecture of learning companions is a substantial extension of that of ITSs; in particular, each learning companion demands its own user model with respect to the role it takes. Third, to behave in a 'psychologically credible' way to the user when interacting, the learning companion, besides a user model, also needs a behavior module that consists of some assumptions and heuristic rules. General Companion Modeling (GCM), described in this paper, is a method as well as an instance that takes the above architectural view of learning companions in modeling the beliefs, capabilities, and behaviors of a learning companion in a general problem solving domain. How learning companions construct their own user models is also discussed in this paper.

Keywords: learning companion, educational agent, social learning, collaborative learning

1 Why a learning companion needs to construct its user model

A learning companion, a computer simulated agent, may play different roles, for example a collaborator or a competitor to the user [1]. As an agent, a learning companion should have its own beliefs, decisions, and capabilities [2]. In collaborative situations, the user's beliefs involve not only a domain but also other users [3]. Similarly, a learning companion's beliefs should concern a domain and other users. A learning companion's beliefs about a user can be regarded as a user model constructed from the perspective of the learning companion. Problem solving and user modeling are two main capabilities of a learning companion. A learning companion constructs its beliefs about other users according to its capabilities to solve problems and model the user. Beliefs and capabilities differ from one learning companion to another, and the interactions between a learning companion and a user may likewise differ. Therefore, the user models constructed by learning companions (i.e. the learning companions' beliefs about the user) differ from each other. Shoham stated the following: "Decisions are logically constrained, though not determined, by the agent's beliefs; these beliefs refer to the state of the world (in the past, present, or future), to the mental state of other agents, and to the capabilities of this and other agents." [2] A learning companion may decide to adopt different behaviors towards different users, and different learning companions may also adopt different behaviors towards the same user (i.e. user-adapted interaction of learning companions). In traditional intelligent tutoring systems (ITSs), the user model is constructed by a system simulated expert. The model is the system's best understanding of the student. The typical method of constructing the user model is to compare the behavior of the expert and the user.
The model assists the system in understanding the current status of a student so that the system can play the role of a tutor to give appropriate instructions. In a learning companion
system (LCS), the system simulates an expert in the role of a tutor; in addition, the system simulates non-expert agents in the role of learning companions. A learning companion should also construct its user model, from the view of the learning companion, to store its understanding of the user and of the interactions between itself and the user. The model can help the system to play the role of the learning companion and make appropriate responses. Therefore, there are several user models for one user in an LCS. The simulated expert constructs a user model by overseeing all the actions of a user and the interactions among the user and learning companions. Each learning companion constructs its own user model through its interactions with the user and its observations of the user. A learning companion may not be an expert and may not be involved in all activities, which accounts for why the contents of the user model constructed by one learning companion differ from the user models constructed by other learning companions and by the simulated expert. Each learning companion interacts with a user according to its own user model. Such interaction can make learning companions 'psychologically credible' to the user [4]; that is, the user can make sense of the behaviors or responses of a learning companion. For instance, suppose a learning companion x reminds the user that he/she has made the same mistake again while he/she is interacting with another learning companion y, even though the learning companion x was not present when the user interacted with the learning companion y. Although such an intervention may be effective for instruction, the user may find it peculiar. The user model constructed by the simulated expert can help to control the behaviors of learning companions for pedagogic tactics. For example, if the simulated expert detects that a user is not sufficiently independent, it may suggest that no learning companion should give advice to the user at present. However, the learning companions make the final decision, to remain psychologically credible. Several researchers have demonstrated that the user model can be constructed in diverse ways, such as by a system simulated expert, by a human peer, by a human teacher, or by the user himself/herself [5, 6, 7]. Several user models addressing one user may thus be available in the system. Communication, cooperation, and competition among multiple user models are required, and the interaction between these user models can enrich social learning activities. For instance, such interaction can be used in collaborative student modeling, peer interaction, and reflection [5]. In addition, learning companions can be another means of constructing the user model. This approach makes more compositions possible and enriches the social learning activities. Another merit is the ability to control the learning companions' capabilities, beliefs, and behaviors for particular pedagogic strategies. For example, Hietala and Niemirepo stated that "a group of heterogeneous companion agents at the learner's disposal will increase his/her motivation to collaborate" [8]. Another possible strategy is learning by disturbing [9], in which a learning companion can be designed as a troublemaker whose role is to deliberately disturb the user. Furthermore, Goodman and his colleagues designed a learning companion, Lucy, whose purpose is to encourage user reflection and articulation [10]. In light of the above discussion, this work simulates a learning companion's capabilities to solve problems and model the user.
The rest of this paper is organized as follows: Section 2 presents general companion modeling (GCM), which simulates multiple and different kinds of learning companions. Sections 3 and 4 describe how to simulate the problem solving and user modeling capabilities of a learning companion in GCM. A brief summary is given at the end.

2 General companion modeling: a method to simulate learning companions

GCM consists of two parts: a method and an architecture for simulating multiple and different kinds of learning companions. The learning companions constructed by the GCM method have their own capabilities, beliefs, and behaviors, which are regarded as the characteristics of the learning companions.

2.1 GCM method

GCM uses six steps in simulating a learning companion's characteristic data:
1. Collect all of the required characteristic data of a set of students in their learning process. The behavior data include problem solving states, paths, responses, and interactions;
2. Select an appropriate data representation to represent the user, that is, a particular student;
3. Design a set of heuristic rules to simulate an expert;
4. Modify the designed heuristic rules into a set of parameter-based heuristic rules to simulate different levels of capabilities and different kinds of beliefs and behaviors of learning companions;
5. Give initial attribute values (parameters for the heuristic rules) to each learning companion to determine its characteristics and roles; and
6. Modify the attribute values of learning companions to change their characteristics at appropriate times.
The techniques in steps 1 to 3 resemble those used to construct a user model in traditional ITSs. We shall focus on steps 4 and 5, that is, how GCM simulates the problem-solving capabilities of learning companions. Step 6 concerns how companions construct their own user models. Simulations of other capabilities of learning companions, such as giving hints, and of other behaviors, such as playing different roles at different times, are beyond the scope of this paper.

2.2 GCM architecture

The GCM architecture is an extension of the traditional ITS architecture. This architectural view also provides a way to compare the basic differences between ITSs and LCSs. The three main components of the traditional ITS architecture are the tutoring module, the expert module, and the student model, whereas the GCM architecture consists of four main components: the behavior module, the learning task simulation module, the user model, and the learning companion model. The behavior module and the learning task simulation module are substantial generalizations of the tutoring module and the expert module in ITSs, respectively. The content of the user model in LCSs is different from that of the student model in ITSs because they are constructed from different points of view: one from a learning companion, the other from a tutor. Notice that, in traditional ITSs, the student model is the best representation the system can obtain of the student. This happens to be the same model the system needs if it claims to be a tutor. For example, if the system induces that the student understands a concept, then, being a tutor, the system will assume that the student understands the concept. In contrast, a learning companion might think that the student does not understand the concept even if the system has induced that the student does, a situation that may occur if the learning companion is a troublemaker. The learning companion model stores the attribute values of learning companions to represent learning companion agents.

Behavior module (vs. tutoring module in ITSs)
In an ITS, the tutoring module must enable the system to play the role of a tutor. A learning companion, however, may function in a role such as peer tutor, peer tutee, collaborator, competitor, or troublemaker. The behavior module is responsible for all these possible behavior simulations. The contents of a behavior module are a set of behavior heuristic rules. The behavior module receives data from the interface, the user model, the learning task simulation module, and the learning companion model as parameters to simulate appropriate behaviors or responses. This module may then send an instruction to the interface, pass data to the learning task simulation module for further processing and analysis, and modify data in the user model and the learning companion model.
Figure 1. GCM Architecture
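To make the parameter-based design concrete, the sketch below shows one way the learning companion model's attributes and a single behavior heuristic might be coded. It is a minimal illustration in Python, not the authors' implementation: all field names, the value ranges, and the rule shown are assumptions based on the attribute lists given later in Sections 3.2 and 4.

```python
# Minimal sketch (not the authors' code): one plausible encoding of a GCM
# learning companion model and a parameter-based behavior heuristic.
# Field names, value ranges, and the rule itself are assumptions.
from dataclasses import dataclass, field

@dataclass
class CompanionModel:
    # Domain dependent contents
    proficiencies: dict = field(default_factory=dict)  # proficiency name -> level in [0, 1]
    learning_speed: float = 0.1          # 0 means this companion never learns
    # Domain independent contents
    self_confidence: float = 0.5
    confidence_threshold: float = 0.5    # boundary between "low" and "high"
    independence: float = 0.5
    independence_threshold: float = 0.5
    conf_in_user: float = 0.5            # the companion's confidence in the user

def respond_to_help_request(companion: CompanionModel, user_independence: float) -> str:
    """One behavior heuristic from the paper: a companion that judges the
    user's independence low but has high confidence in the user may reject
    a help request, to push the user to work alone."""
    if (user_independence < companion.independence_threshold
            and companion.conf_in_user > companion.confidence_threshold):
        return "reject-help-request"
    return "provide-help"
```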
Learning task simulation module (vs. domain expert module in ITSs). The system plays the role of an expert in traditional ITSs. An LCS, however, must simulate not only an expert but also a non-expert, such as a novice or an average student. A non-expert may misunderstand a concept, make mistakes, and learn. Therefore, the system must represent what the learning companion knows and what it does not know, as well as simulate what mistakes it makes and what it can learn. The learning task simulation module is responsible for these tasks. The expert module in traditional ITSs, which is responsible for simulating a domain expert, theoretically knows all correct knowledge, incorrect knowledge, potential mistakes, and problem solving processes. This simulation can be modified so that a learning companion possesses some correct and some incorrect knowledge, carries out some correct solving processes, and makes some mistakes. The learning companion's correct and incorrect knowledge can also be modified, or its mistakes removed, to simulate its learning behavior. The learning task simulation module may store a set of simulation heuristic rules. These rules receive the contents of the learning companion model as parameters to run the simulation. If a learning companion is set up as an expert, the learning task simulation module simulates an expert's behavior; if it is set up as a novice, the module simulates a novice's behavior.

User model. The user model stores the user's status as observed by the learning companion. The format of the user model in LCSs is the same as that in a traditional ITS; the difference is that the user model in LCSs is constructed from the perspective of the learning companions. If an LCS contains two or more agents, the system must record the interaction and dialogue between the user and each agent. This can help each agent remain psychologically credible to the user. Three approaches are used to construct the user model of an LCS:
(1) Only one user model in the LCS provides all agents with information on which to act. The model is the system's best understanding of the user. The learning companion and the other agents (for example, a tutor) share the same user model.
(2) The system has one user model, but each agent has its own interpretation of it. For example, if the independence level of the user in the user model is four, the tutor may regard the user's independence as high while the learning companion regards it as low.
(3) Each agent in the LCS has its own user model, constructed from the perspective of that agent, i.e. the agent's understanding of and beliefs about the user. The agent may misunderstand the user. Using this approach, each agent has its own beliefs, modifies those beliefs, and acts according to them; but controlling the behaviour of the agents is complex. One solution is to enable appropriate communication among the user models. In this approach, even if the system contains no tutor, a user model behind the system must be constructed from the perspective of an expert. The model can then provide information to control an agent if that is necessary to satisfy the system's pedagogical strategy.

The contents of the user model are divided into domain dependent contents and domain independent contents. The domain dependent contents cover areas such as the user's understanding, misunderstanding, and levels of proficiencies.
These data are the result of an analysis of the user's domain dependent status conducted by the learning task simulation module, and they can help the learning companion give instruction to the user. The domain independent contents are the user's motivational status, such as self-confidence, independence, and effort [11]. These data are provided to the behavior module to determine what behavior the learning companion should adopt. For example, a learning companion that detects that the independence of the user is low and whose confidence in the user is high may reject the user's help request.

Learning companion model. The learning companion model stores the attribute values and status of the learning companion, including domain dependent contents and domain independent contents. The domain dependent contents comprise what the learning companion knows, its misunderstandings, and its levels of proficiencies. These data support the simulation of the learning companion's problem solving in the learning task simulation module. For example, the learning companion's knowledge proficiencies, problem solving, and learning capability are determined by the settings of the learning companion model. The
domain independent contents are the learning companion's motivational settings and status. These data support the simulation of the learning companion's behavior in the behavior module. For example, a learning companion with low independence may ask the user for help even if it knows what to do next. Multiple learning companion models can simulate multiple learning companions in a system. With different settings of the learning companion models, the system can simulate different learning companions. The system can also modify the attribute values of a learning companion model to change the characteristics of the learning companion.

3 Use of the GCM method to simulate the problem-solving capability of a learning companion

A learning companion should have its own problem solving capability in order to compare itself with the user and construct its user model. Many approaches and possible heuristics are available to implement the problem-solving capability of a learning companion. One approach is to make some assumptions and design heuristic rules based on them. The following discusses in detail steps 4 and 5 of the GCM method, which require making assumptions and designing heuristic rules based on those assumptions.

3.1 Design of a set of parameter-based heuristic rules

The learning task, i.e. what the learning system wants to teach the user, may be a problem solving skill or conceptual knowledge. The system must simulate how each learning companion solves the problem or which knowledge it knows. Conceptual knowledge is stored in the learning companion model to represent what the learning companion knows. Herein, we adopt the breadth-first search of the General Problem Solver [12] and modify it to simulate problem solving in the learning task simulation module. The three procedures for this task are: 1) identify all neighbor states that can be reached from the current state through the operators; 2) evaluate all these neighbor states; and 3) select the operator that transforms the current state into the state with the best evaluated value. To simulate different problem solving processes for various learning companions, two assumptions are made in the implementation.

Assumption 1: A student may forget or neglect some possible operators of the current state owing to the student's lack of proficiency with respect to those operators. If the student neglects some operators, some neighbor states will not be considered. There can be one or more proficiencies for mastering an operator. The probability that the student neglects an operator depends on the student's levels of the corresponding proficiencies.

Heuristic: A filter is added to every operator to determine whether or not the learning companion considers the operator. The probability of passing the filter is calculated from the learning companion's levels of the corresponding proficiencies. An operator that does not pass the filter is an operator that the learning companion neglects. In Figure 2, the learning companion neglects problem states S2 and S3.
Figure 2. Addition of a filter to neglect some states
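The filter heuristic can be sketched as follows. The paper says only that the probability of passing the filter is calculated from the levels of the corresponding proficiencies; the averaging rule, the `required_proficiencies` attribute, and the `apply`/`operators` methods below are illustrative assumptions.

```python
import random

def passes_filter(companion, operator):
    """Assumption 1 heuristic: the chance that the companion even considers
    an operator grows with its related proficiency levels. Averaging the
    levels is an assumed combination rule."""
    levels = [companion.proficiencies[p] for p in operator.required_proficiencies]
    p_consider = sum(levels) / len(levels)
    return random.random() < p_consider   # failing the filter = neglecting the operator

def candidate_states(companion, state):
    """Neighbor states reachable through operators that pass the filter.
    In Figure 2, the filtered-out operators would hide states S2 and S3."""
    return [op.apply(state) for op in state.operators() if passes_filter(companion, op)]
```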
Assumption 2: A student may incorrectly evaluate the problem states and therefore choose a sub-optimal or wrong state. The probability of inaccurately evaluating a state depends on the student's related proficiencies with respect to evaluating that state.
Heuristic: Every neighbor state is given an evaluated value with respect to the difference between the state and the goal state. Next, the operator is selected that reaches the state with the best evaluated value among all the neighbor states. Each state may have several possible evaluated values: the value an expert would assign and some inaccurately evaluated values. The inaccurate values arise because some related proficiencies of the learning companion are inadequate. Which evaluated value is selected for a state is calculated from the learning companion's related proficiencies with respect to evaluating the state. The problem state S1 in Figure 3 has two possible evaluated values, EV1 and EV2. The learning companion selects EV1 as the evaluated value of problem state S1.
Figure 3. Different evaluation values in a state
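A matching sketch of the evaluation heuristic, again under assumed interfaces (`related_proficiencies`, `expert_value`, `inaccurate_values`) and an assumed rule for combining proficiency levels into a probability:

```python
import random

def evaluate_state(companion, state):
    """Assumption 2 heuristic: with probability tied to the related
    proficiencies, return the value an expert would assign; otherwise
    return one of the inaccurate values (e.g. EV2 instead of EV1)."""
    levels = [companion.proficiencies[p] for p in state.related_proficiencies]
    p_accurate = sum(levels) / len(levels)
    if random.random() < p_accurate:
        return state.expert_value
    return random.choice(state.inaccurate_values)

def choose_next_state(companion, current):
    """The three modified GPS procedures: enumerate the non-neglected
    neighbors, evaluate each, and move to the best-evaluated one."""
    neighbors = candidate_states(companion, current)  # from the filter sketch above
    return max(neighbors, key=lambda s: evaluate_state(companion, s))
```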
The merit of this approach to simulating the learning task is that the learning companion model supplies the parameters of the problem solving procedure. Different learning companion models therefore produce different problem solving results, so several learning companions can be simulated in one system.

3.2 Giving initial attribute values to each learning companion

The learning companion model determines the characteristics of each learning companion. This model stores the settings and data of each learning companion, divided into domain dependent and domain independent contents. The domain dependent contents support the learning task simulation module in simulating each learning companion's problem solving. With different value settings, the learning companion will have different levels of problem solving capability and analysis skill. These values can be set by the system to serve a particular pedagogical plan or can be determined by the user. The domain dependent contents include:
(i) Initial levels of domain proficiencies: the initial levels of all domain proficiencies for the learning task simulation. With different levels of proficiencies, the learning companion can be an expert, a novice, or an average student.
(ii) Proficiency-adjustment: an adjustment for increasing or decreasing the levels of proficiencies in its user model. This value is used for user modeling.
(iii) Learning speed: the amount by which the levels of proficiencies in the learning companion model increase when the learning companion learns something. If the learning speed is set to 0, the learning companion does not learn anything.

The domain independent contents support the behavior simulation of the learning companion in the behavior module. With different value settings, the learning companion has different characteristics. For example, a learning companion with high independence may prefer exploring the problem on its own to asking the user for help. The domain independent contents include:
(i) Self-confidence value: a value representing the learning companion's self-confidence status.
(ii) Confidence-threshold: a threshold value defined to distinguish between low and high confidence. The learning companion uses this value to assess whether the confidence of the user, and of itself, is high or low. If necessary, separate thresholds can be used for the user and for the learning companion.
(iii) Confidence-adjustment: an adjustment to increase or decrease the confidence value
of the user and its own self-confidence value. If necessary, separate adjustments can be used for the user and for the learning companion. Similarly, the independence value, independence-threshold, and independence-adjustment are the corresponding contents with respect to independence.

4 User modeling by learning companions

User modeling collects data about the user, analyzes the user's status, and stores the results in the user model. In the GCM architecture, the learning task simulation module is responsible for modeling domain proficiencies and confidence status. When the user solves a problem, the learning companion also simulates how it would solve the problem if it were in the user's position. Next, the learning companion compares its state with the user's state. After the learning task simulation module analyzes the user's problem solving, the learning companion modifies the levels of proficiencies and the confidence value for the user in its user model. Table 1 lists the heuristic rules by which a learning companion c models a user u. Herein, we denote the next problem state the learning companion c selects as Sc; the next problem state the user u selects as Su; the evaluated value of Sc by the learning companion c as EVc(Sc); the evaluated value of Su by the learning companion c as EVc(Su); the learning companion c's confidence in the user u as Confc,u; and the levels of related proficiencies of the user u, as observed by the learning companion c, as Pc,u. The learning companion's confidence in the user is the degree to which the learning companion believes the user can solve the problem; it also represents whether the learning companion believes the user is better than itself. This value affects the interaction between the learning companion and the user. For example, the learning companion may choose a confident style of dialog when it has high confidence in the user. In general, the average of all Pc,u could be regarded as Confc,u. Confc,u is kept separate so that we can change this value to control the behavior of the learning companion c toward the user u without affecting the observed levels of proficiencies.

Table 1. Heuristic rules of user modeling by learning companions

Condition               Confc,u               Pc,u
EVc(Sc) > EVc(Su)       Significant decrease  Decrease
EVc(Sc) = EVc(Su)       Increase              Increase
EVc(Sc) < EVc(Su)       Significant increase  Increase
EVc(Sc) > EVc(Su) implies that the learning companion c assesses its state as better than the user's. The learning companion c then significantly decreases its confidence in the user u and decreases the levels of the related proficiencies of the user u in its user model. EVc(Sc) = EVc(Su) implies that the learning companion c assesses its state as equal to the user's. The learning companion c increases its confidence in the user u and the levels of the related proficiencies of the user u in its user model. EVc(Sc) < EVc(Su) implies that the learning companion c assesses its state as worse than the user's. The learning companion c then significantly increases its confidence in the user u and increases the levels of the related proficiencies of the user u in its user model. These heuristic rules can be extended to include simulation of how the learning companion adjusts its self-confidence and how it learns from the process.
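Table 1 translates directly into an update rule. In the sketch below the numeric step sizes are invented; the paper specifies only the direction and relative magnitude ("significant" vs. ordinary) of each adjustment, and `user_proficiency` is a stand-in for the per-proficiency levels Pc,u.

```python
def update_user_model(companion, ev_c_sc, ev_c_su, step=0.05, big_step=0.15):
    """Encodes Table 1: compare the companion's evaluation of its own next
    state (ev_c_sc) with its evaluation of the user's (ev_c_su), then
    adjust Confc,u and Pc,u. Numeric step sizes are assumptions."""
    if ev_c_sc > ev_c_su:                    # companion judges itself ahead
        companion.conf_in_user -= big_step   # significant decrease
        companion.user_proficiency -= step   # decrease
    elif ev_c_sc == ev_c_su:                 # states judged equal
        companion.conf_in_user += step       # increase
        companion.user_proficiency += step   # increase
    else:                                    # user judged ahead
        companion.conf_in_user += big_step   # significant increase
        companion.user_proficiency += step   # increase
```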
Independence and self-confidence modeling. The behavior module models the independence and self-confidence status of the user. Herein, we adopt del Soldato and du Boulay's method for modeling the user's independence and self-confidence in ITSs [11]. The independence status concerns the user's perceived need, or lack of need, for the tutor's assistance in accomplishing the learning task. The self-confidence status concerns whether the user accomplishes the task and the degree of confidence expressed in dialog. A student with high self-confidence works harder and persists longer. The pedagogic strategy is to increase the student's independence and self-confidence. The modeling in ITSs can be regarded as being done from the tutor's point of view. In GCM, a learning companion constructs the independence and self-confidence model of the user according to the interaction
between itself and the user. Moreover, each learning companion has its own confidence and independence thresholds: while one learning companion regards a given independence value as low, another may regard it as high. A learning companion represents the user's independence status according to the help state and the help detail. A set of heuristic rules for modeling independence is listed in Table 2.

Table 2. Heuristic rules of independence modeling

Help state    Help detail    Independence degree
Providing     General        Decrease
Providing     Specific       Significant decrease
No            (none)         Increase
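Table 2 admits the same treatment. The string encoding of help states and details, and the step sizes, are assumptions; only the directions of the adjustments come from the table.

```python
def update_independence(user_model, help_state, help_detail=None,
                        step=0.05, big_step=0.15):
    """Encodes Table 2: general help requests lower the modeled independence
    a little, specific ones lower it a lot, and working without help raises it."""
    if help_state == "providing":
        if help_detail == "general":
            user_model.independence -= step        # decrease
        elif help_detail == "specific":
            user_model.independence -= big_step    # significant decrease
    else:                                          # no help requested
        user_model.independence += step            # increase
```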
5 Summary

This work has presented a method for constructing the user model from the viewpoint of the learning companion agent, in order to make the learning companion human-like and psychologically credible to the user. The method simulates the learning companion's capabilities of problem solving and user modeling. Each learning companion can construct its own user model and interact with the user according to its attribute values and its own user model. Future work should examine how the user models constructed by learning companions can be applied to enrich learning activities. Further study is also needed to compare the different ways of constructing user models: by a system-simulated expert as in ITSs, by the users themselves, by other users, by a human teacher, and by learning companions.
References
1. Chan, T.W. & Baskin, A.B. (1990). Learning companion systems. In C. Frasson & G. Gauthier (Eds.), Intelligent Tutoring Systems: At the Crossroads of Artificial Intelligence and Education, Chapter 1. New Jersey: Ablex Publishing Corporation.
2. Shoham, Y. (1993). Agent-oriented programming. Artificial Intelligence, Vol. 60 No. 1, 51-92.
3. Paiva, A. (1997). Learner modelling for collaborative learning environments. In Proceedings of AI-ED 97 World Conference on Artificial Intelligence in Education, Kobe, Japan, 215-222.
4. Gilmore, D., & Self, J. (1988). The application of machine learning to intelligent tutoring systems. In J. Self (Ed.), Artificial Intelligence and Human Learning: Intelligent Computer-Aided Instruction. New York: Chapman and Hall, 179-196.
5. Bull, S. (1998). 'Do It Yourself' student models for collaborative student modelling and peer interaction. In Proceedings of the 4th International Conference, ITS'98, San Antonio, Texas, USA, 177-185.
6. Beck, J., Stern, M., & Woolf, B.P. (1997). Cooperative student models. In Proceedings of AI-ED 97 World Conference on Artificial Intelligence in Education, Kobe, Japan, 127-134.
7. Collins, J., Greer, J., Kumar, V., McCalla, G., Meagher, P., & Tkach, R. (1997). Inspectable user models for just-in-time workplace training. In A. Jameson, C. Paris, & C. Tasso (Eds.), User Modeling: Proceedings of the UM97 Conference. Springer Wien New York, 327-337.
8. Hietala, P., & Niemirepo, T. (1997). Collaboration with software agents: What if the learning companion agent makes errors? In Proceedings of AI-ED 97 World Conference on Artificial Intelligence in Education, Kobe, Japan, 159-166.
9. Aimeur, E., Dufort, H., Leibu, D., & Frasson, C. (1997). Some justifications for the learning by disturbing strategy. In Proceedings of AI-ED 97 World Conference on Artificial Intelligence in Education, Kobe, Japan, 119-126.
10. Goodman, B., Soller, A., Linton, F., & Gaimari, R. (1998). Encouraging student reflection and articulation using a learning companion. International Journal of Artificial Intelligence in Education, 9, 237-255.
11. del Soldato, T. & du Boulay, B. (1995). Implementation of motivational tactics in tutoring systems. Journal of Artificial Intelligence in Education, Vol. 6 No. 4, 337-378.
12. Newell, A., & Simon, H.A. (1963). GPS, a program that simulates human thought. In E.A. Feigenbaum & J. Feldman (Eds.), Computers and Thought. New York: McGraw-Hill, 279-293.
Teaching Scientific Thinking Skills: Students and Computers Coaching Each Other

Lisa Ann Scott, Frederick Reif
Center for Innovation in Learning, Carnegie Mellon University, Pittsburgh, PA 15213, USA
lscott+@andrew.cmu.edu, freif+@andrew.cmu.edu

Our attempts to improve science education have led us to analyze the thought processes needed to apply scientific principles to problems, and to recognize that reliable performance requires the basic cognitive functions of deciding, implementing, and assessing. Using a reciprocal-teaching strategy to teach such thought processes explicitly, we have developed computer programs called PALs (Personal Assistants for Learning) in which computers and students alternately coach each other. These computer-implemented tutorials make it practically feasible to provide students with individual guidance and feedback ordinarily unavailable in most courses. We constructed PALs specifically designed to teach the application of Newton's laws in basic physics. In a comparative experimental study these computer tutorials were found to be nearly as effective as individual tutoring by expert teachers, and considerably more effective than the instruction provided in a well-taught physics class. Furthermore, almost all of the students using the PALs perceived them as very helpful to their learning. These results suggest that the proposed instructional approach could fruitfully be extended to improve instruction in various practically realistic contexts.
1. Introduction

Many students emerge from science courses with significant misconceptions, poor problem-solving abilities, and an inability to apply the scientific concepts or principles that they ostensibly learned. This paper describes work aimed at improving instruction so as to overcome these deficiencies. In particular, we have done the following:
• To address students' difficulties in applying scientific principles, we analyzed the required thinking skills, including the basic ones needed to make decisions and assess their implementations.
• Using instructional strategies for teaching these thinking skills explicitly, and using computers as a practical means of providing students with individual guidance and feedback, we devised highly interactive instruction in which the computer and student take turns coaching each other.
• We carried out an experimental study which indicated that the resulting computer tutorials were nearly as effective as individual tutoring by expert teachers, and considerably more effective than the instruction in a well-taught physics class.

2. Analysis: Applying principles to problems

Instruction should enable students to apply basic scientific principles flexibly, to explain or predict diverse phenomena, and to become good problem solvers and independent learners. These are ambitious instructional goals involving complex intellectual activities.

2.1 Methods for applying principles to problems

Dealing with physics requires one to properly interpret and apply basic concepts and principles. Teaching can be more effective if the required thought processes are explicitly understood so that effective learning processes can be deliberately designed. Our past work provides quite a few examples where elucidation of such thought processes has allowed the design of effective instruction [1, 2, 3].
A central instructional need is to analyze (and teach explicitly) the thought processes required to interpret scientific concepts and principles. Such an analysis should not merely identify the required knowledge, but must specify methods for using such knowledge effectively. However, even if the thought processes for scientific tasks are well understood, effective teaching of these processes requires proper attention to several prerequisite instructional needs. The following subsections identify three universally important needs and suggest specific ways of addressing them.

2.2 Needed cognitive functions: Deciding, implementing, assessing

The performance of any task requires one repeatedly to decide what to do, implement the chosen action, and assess whether performance has been satisfactory. When tasks are highly familiar, these basic functions are often carried out without conscious awareness. But in the case of complex or unfamiliar tasks these basic functions may need to be carried out deliberately. In scientific domains, students are mostly focused on implementing actions and much less concerned with making decisions or assessing their performance. Decisions are often made without much thought, causing students to invoke inappropriate knowledge or fail to retrieve knowledge that they do possess. Similarly, many students do not perceive the need to assess their performance, or do not know how to assess it effectively. As a result, students often fail to learn adequately from their mistakes. One cannot expect students to learn to properly apply scientific principles, or to effectively solve scientific problems, unless their basic thought processes are sufficiently systematic. Thus students need to make appropriate decisions, to implement them properly, and to assess their implementations. These basic cognitive functions are essential prerequisites for the performance of scientific tasks and must be made explicit in instruction.

2.3 Instructional strategies for effective learning

One needs instructional strategies designed to teach the above thought processes and to provide students with sufficient guidance and feedback. The following strategies address these goals.

Acquiring needed thinking skills: "Reciprocal teaching". To help students learn to carry out the basic processes of deciding, implementing, and assessing (and thereby also to acquire more complex scientific abilities), we have employed a modified form of "reciprocal teaching" [4]. It involves the following two alternating modes of interaction between a student and a tutor:
(1) Tutor coaching student. The tutor decides what to do and gives corresponding directions, the student implements these, and the tutor assesses and corrects. (The student practices implementing while the tutor acts as a coach.)
(2) Student coaching tutor. The student and tutor reverse roles. The student decides what to do and gives directions, the tutor implements these (but may deliberately make mistakes similar to those common among students), and the student assesses and corrects. (The student practices decision making and assessing, i.e., the student acts somewhat like a coach.)

Cognitive considerations suggest that this strategy should be effective for the following reasons: (a) The instruction is highly interactive and keeps the student constantly engaged in active thinking.
(b) The basic processes of deciding, implementing, and assessing are made highly explicit. (c) The student practices these processes separately, but in the context of an entire task. (Separate practice allows focused attention without excessive cognitive demands, while the complete task context provides all the cues needed for realistic work.) (d) As the tutor and student alternate in their coaching roles, the tutor repeatedly models good performance that the student can then emulate. (e) The tutor constantly monitors student performance, providing feedback and instruction designed to remedy the student's mistakes. (f) All the preceding features provide the student with good individual guidance and feedback, ensuring that the student engages in effective learning activities.

In pilot experiments one of us (LAS), playing the role of tutor, applied the reciprocal-teaching strategy to teach college students several physics concepts and principles. The students did, in fact,
learn these effectively. Furthermore, they came to make more articulate decisions, to assess their work more carefully, and to do these things more spontaneously.

Developing independence: "Learning from well-studied examples". The reciprocal-teaching strategy provides students with good individual guidance and feedback. But repeated guided practice alone is not sufficient to guarantee that students will ultimately perform well independently, without external assistance. A strategy is therefore needed to help students attain such independence. We use a strategy in which guided-practice sessions, using reciprocal teaching, are frequently followed by independent-practice sessions. In these the student is asked to work independently on a somewhat similar task, while getting only the minimal feedback necessary to complete the task successfully. The student uses the example, well studied during the preceding guided practice, to help develop the ability to work without outside assistance. This strategy has the following advantages: (a) The transition to independent performance does not just occur at the end of a period of instruction, but is integrated throughout in a more flexibly adaptive way. Indeed, whenever students have been guided in acquiring some new performance skills, they are immediately asked to demonstrate their ability to perform more independently. (b) Students are more motivated to engage in careful learning during a guided-practice session, because they are explicitly told that they will shortly afterwards need to demonstrate their acquired competence by working independently. (c) This strategy combines the reciprocal-teaching strategy with the well-known instructional strategy of "learning from examples", whose efficacy has been well established by cognitive research [5, 6]. One would expect its efficacy to increase when it is used in the proposed manner, because the preceding reciprocal-teaching session then guarantees that the prior examples have been well studied.

2.4 Individual guidance and feedback provided by computers

The preceding instructional strategies, used by a tutor interacting with a student, can provide the student with good individual guidance and feedback, and ensure that the student engages in effective learning. But it would be practically impossible to provide every student with an individual tutor. This difficulty might be partly overcome if good instructional strategies are implemented by a computer playing the role of tutor. Properly designed computer programs could help ensure that every student receives good individual guidance and feedback. The practical feasibility of such computer programs is enhanced if they can be readily produced without excessive technological sophistication. Educational efficacy is then achieved primarily by careful design and good instructional strategies, without necessarily resorting to the complexities of artificial intelligence [7, 8]. We call such a program a PAL (Personal Assistant for Learning), indicating that it plays a tutoring role without much human-like intelligence. The guidance and feedback provided by such a computer program may be inferior to that available from a human tutor, but can still be much better than what is currently available to most students. (When a PAL is designed to play the role of the tutor in the reciprocal-teaching strategy, the result is computers and students coaching each other.)
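The two alternating modes of the reciprocal-teaching strategy, with a PAL in the tutor's role, can be summarized as a short control loop. This is a structural sketch only; the `decide`/`implement`/`assess_and_correct` interfaces are hypothetical names for the three cognitive functions, not part of the PAL software described in this paper.

```python
def reciprocal_teaching(task, pal, student):
    """Structural sketch of the two alternating coaching modes."""
    # Mode 1: PAL coaches the student. PAL decides, the student implements,
    # and PAL assesses and corrects (the student practices implementing).
    for step in task.steps():
        direction = pal.decide(step)
        result = student.implement(direction)
        pal.assess_and_correct(result)
    # Mode 2: the student coaches PAL, with roles reversed. PAL implements,
    # sometimes with deliberate student-like mistakes, and the student must
    # assess and correct (practicing deciding and assessing).
    for step in task.steps():
        direction = student.decide(step)
        result = pal.implement(direction, may_err=True)
        student.assess_and_correct(result)
```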
3. PAL Tutorials for Newton's law

The preceding suggests an instructional approach in which the thought processes required for scientific tasks are analyzed, and systematic instructional strategies are implemented by computers (Personal Assistants for Learning) that help students learn these thought processes and cognitive functions while providing every student with individual guidance and feedback. The following describes efforts to explore the feasibility of implementing this approach in a prototypical domain.

3.1 Methods for applying Newton's law

The proper interpretation and application of fundamental principles is centrally important in any science, yet a demanding task difficult for many students. We explored how our approach might be
used to teach such principles more effectively, focusing our attention on Newton's second law ma = Ftot (a fundamental physics principle that typically causes students much difficulty). Any principle expresses a relationship among concepts. Properly applying a principle requires the following kinds of procedural knowledge: a method for specifying these concepts in any particular instance, and a method for specifying the relationship among them. By analyzing what ingredients are needed to express Newton's law and how they can usefully be combined, we were able to specify two corresponding methods for applying Newton's law (already largely formulated and experimentally studied in previous work) [9]. The first of these methods can be seen in the upper right corner of Fig. 1.

Fig. 1. PAL acts as coach. The screen display shows a partially completed system diagram. PAL, deciding to invoke a step of the method, asks the student to specify the force on the car by the road. The student implements this incorrectly by specifying that this force is upward. PAL assesses this response and provides corrective feedback.

3.2 PALs teaching application of Newton's law

Our PAL tutorials were designed to teach the application of Newton's law to solve mechanics problems. The tutorials help students learn the above-mentioned methods by using the reciprocal-teaching strategy. The instructional strategies are implemented by PAL tutorials of the following three types: (a) PAL coaching the student; (b) the student coaching PAL; (c) PAL providing the student with independent practice.

Guided practice: PAL coaching student. PAL (the computer acting as Personal Assistant for Learning) plays the role of coach, deciding which actions the student should implement. PAL then assesses the student's implementations by detecting errors, helping the student to diagnose the reasons for incorrectness, and guiding the student to correct his or her work. Each PAL tutorial deals with a mechanics problem to be solved using the methods for applying Newton's law. A tutorial includes the following three successive parts: specifying the relevant concepts by drawing a system diagram, specifying the relationship expressing Newton's law by its component equations, and exploring some qualitative implications of these equations. In the first two parts of the tutorial, PAL displays the appropriate method and follows its steps to decide on successive directions to the student. The student implements each of these steps. PAL then assesses the student's implementation. If it is correct, PAL completes the step and proceeds according
to the method. Otherwise, PAL provides hints or other feedback to help the student diagnose and correct his or her errors. (Fig. 1 shows a screen display illustrating such a PAL-student interaction.) The results of the first two parts of the tutorial are a correct system diagram and the corresponding equations. In the final part, PAL asks questions about some qualitative implications of the equations.

Guided practice: Student coaching PAL. In this tutorial, the student now acts like a coach, deciding on actions and assessing the implementations done by PAL (which may make mistakes). Each PAL tutorial again applies the methods and includes three successive parts. In the first two parts of the tutorial, PAL displays, in randomized order, the steps of the appropriate method. PAL then repeatedly asks the student what it (PAL) should do next. To decide what should be done and direct PAL to do so, the student must select an appropriate step in the proper order. PAL implements this step, but may make mistakes. (PAL deliberately makes mistakes reflecting common student misconceptions or errors.) After implementing any step, PAL asks to be warned if the student detects any mistakes. (If the student fails to detect such a mistake and merely asks PAL to proceed, PAL expresses misgivings and asks the student to check more carefully.) Whenever the student detects a mistake, PAL asks the student to diagnose the nature of the mistake and to correct it. (Fig. 2 shows a screen display illustrating such a PAL-student interaction.)

Fig. 2. Student acts as coach. Following the student's instruction to draw the contact forces on the person in the diagram, PAL drew the forces with incorrect directions. When the student asked PAL to proceed without noting PAL's incorrect implementation, PAL asked the student to check again.

The results of the first two parts of the tutorial are a correct system diagram and the corresponding equations. In the final part of the tutorial, PAL states some qualitative conclusions which the student may need to correct.

Independent practice. Students must also learn to perform well without the guidance provided by PAL, under ordinary conditions where they may need to work independently on paper rather than on a computer. Accordingly, a PAL "guided-practice tutorial" is frequently followed by a PAL "independent-practice tutorial" which asks the student to work independently on a similar problem. (The preceding guided-practice tutorial thus serves as the example in the instructional strategy of learning from well-studied examples.) In an independent-practice tutorial PAL presents the student with an entire problem. It then asks the student to work through the problem independently (on paper) and then to check the solution by answering questions on the computer. If the student's answers are correct, PAL congratulates the student and considers the tutorial satisfactorily completed.
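The two guided-practice modes just described amount to the following loops. As before, this is a hedged sketch: the method/step objects and the feedback calls are assumed names for behavior the text describes, not the actual Authorware implementation.

```python
def pal_coaches_student(pal, student, problem):
    """PAL as coach: follow the method's steps in order; the student
    implements each one and gets feedback until it is correct."""
    for step in pal.method_for(problem).steps:
        while True:
            answer = student.implement(step)     # e.g. add a force to the diagram
            if pal.is_correct(step, answer):
                pal.complete_step(step)          # record it and move on
                break
            pal.give_feedback(step, answer)      # hints toward a diagnosis

def student_coaches_pal(pal, student, problem):
    """Student as coach: the student picks the next step from a randomized
    menu; PAL implements it, sometimes with a deliberate, typical mistake
    that the student is expected to catch and correct."""
    steps = pal.shuffled_steps(problem)
    while steps:
        step = student.choose_next(steps)        # decision-making practice
        result = pal.implement(step, may_err=True)
        if result.has_error and not student.flags_error(result):
            pal.express_misgivings()             # "please check more carefully"
        if result.has_error:
            student.diagnose_and_correct(result) # assessing practice
        steps.remove(step)
```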
Otherwise, PAL gives the student suggestions and asks the student to try again. This cycle is repeated, with progressively more detailed suggestions, until the student manages to complete the problem. The goal of such an independent-practice tutorial is to encourage the student to complete the problem with minimal assistance.

Central issues in PAL production. To achieve programming simplicity, all PAL tutorials were written in Authorware, a programming language designed for instructional applications (distributed by Macromedia, Inc.). Authoring was done on Macintosh computers, but (with some modifications) the resulting tutorials can run on both Macintosh and Windows platforms. Attention was paid to the human-computer interface, since it can help or hinder instructional efficacy. The screen layout included limited text with many graphic elements. Color and motion were used to direct the student's attention to appropriate parts of the screen. The mouse (rather than typing) provided the primary means whereby students interacted with the computer. All tutorials were tested and revised using detailed observations of individual students working with them. Carefully designed multiple-choice questions were used frequently. This facilitated the programming task; more importantly, it also helped further the pedagogical aims of teaching decision making and assessing, since the tutorials could thereby make explicit the options that need to be considered in decision making, and the likely errors that need to be heeded in assessing performance.

4. Instructional efficacy of PAL tutorials

We constructed, and tested with individual students, about a dozen PAL tutorials dealing with the application of Newton's law. These tutorials (somewhat more primitive versions of the ones described in the preceding paragraphs) were then used in an experimental study to assess their efficacy in a classroom setting. This study and its results are described in the following subsections.

4.1 Description of the experimental assessment study

This study was carried out in an introductory physics course for science majors at Carnegie Mellon University. Students were offered special help during the week of the course covering Newton's laws. Approximately 75 students (about 40% of the students in the class) volunteered. Among these equally motivated students we selected 45, whom we divided into the following three groups of about 15 students. (These groups were carefully chosen to be of equivalent ability as judged by their SAT scores and their scores on the two previous tests given in the course.)
(1) PAL group. This group was given a homework assignment consisting mostly of PAL tutorials dealing with problems either identical, or very similar, to those in the regular course assignment. A separate room was set up with computers to accommodate about six of these students at a time.
(2) Tutoring group. This group worked on the regular course assignment under conditions where they could receive individual help from experienced human tutors (one of the authors or Professor Jill Larkin) who emphasized methods similar to those used in the PALs. A separate room was set up to accommodate approximately six of these students at a time.
(3) Class group. This group worked on the course assignment under normal conditions.
They received no assistance from us, but could get help from the various teaching assistants in the course.

4.2 Performance results

Comparison of test scores. The test administered to the whole class at the end of the week served as a measure of how well students had learned to apply Newton's laws. The Newton test scores for each group are summarized by the box plots in Fig. 3a. Only about ten percent of the students in the PAL and Tutoring groups received scores below 65%. By contrast, fully half of the students in the Class group received scores below this level. The mean scores listed in Fig. 3a also reflect this performance difference: the mean scores of the PAL and Tutoring groups (78.5% and 84.0%) are both significantly greater than the mean score of the Class group (62.5%).*
[Fig. 3 panels: (a) Newton test scores (%); (b) previous average test scores (%). Box plots for the Class, PAL, and Tutoring groups, with mean scores in the last column.]
Fig. 3. Test scores of the three groups. Each box plot indicates the median, quartiles, and total range of the score distribution (i.e., the line inside the box indicates the median score and the box includes half the scores). Mean scores and standard errors are displayed in the last column.
The relatively good performance of the Tutoring group is not surprising, since it is well known that individual tutoring by experienced human tutors is a very effective instructional method [10]. Our study allowed us to devote large amounts of time and teaching talent to the small number of students in the Tutoring group. However, it is noteworthy that the instructional efficacy of the PAL tutorials was almost as large as that of individual tutoring. This finding has a potentially important practical implication, since it would be much more feasible to make PALs available to large numbers of students than to provide each of them with access to an individual human tutor.

It is also interesting to compare students' Newton test scores with their average previous test scores. As indicated in Fig. 3b, these previous test scores are essentially the same for all three groups (since the groups were initially selected to be equivalent in these respects).** Fig. 3 indicates that the subsequent Newton test scores of the students in the PAL and Tutoring groups are not very different from those on their prior tests. On the other hand, the subsequent Newton test scores of the students in the Class group are much worse.*** While only a quarter of the Class students received scores below 65% on the prior tests, nearly half of these students received scores below this level on the Newton test. This suggests a plausible interpretation: the special help provided to students in the PAL and Tutoring groups apparently allowed them to face the complex task of applying Newton's law without undue difficulty. Without such help, many students in the Class group could not cope, with the result that their performance substantially deteriorated on a crucially important part of the course.

Comparison of errors. We examined the tests to identify the types of errors committed by the students in the three groups. In particular, we distinguished between serious errors (e.g., extraneous or missing forces in the system diagram, or misapplications of Newton's law) and minor errors (e.g., mistakes in algebra). The percentage of students making serious errors was much higher in the Class group than in the PAL or Tutoring groups. Serious errors can be classified into two main categories: errors in constructing a system diagram and errors in expressing Newton's law. (These categories correspond to the two methods taught in the PAL tutorials and also emphasized by the tutors in the Tutoring group.) Students in the Class group committed many more diagram errors (on both problems) than students in the PAL or Tutoring groups. Errors in the application of Newton's second law were also more prevalent among students in the Class group than among students in the PAL or Tutoring groups.

5. Summary

Our goal has been to improve science instruction so as to reduce commonly observed knowledge deficiencies. This goal has led us to analyze the thought processes required to apply scientific concepts or principles. It has also led us to recognize that reliable performance of all such tasks requires the basic cognitive functions of making appropriate decisions, implementing these, and assessing the results. We have developed a reciprocal-teaching strategy explicitly designed to teach these basic cognitive functions as well as the more complex thought processes required for applying scientific concepts or
principles. This strategy provides a student with effective guidance and feedback. Guided-practice sessions using this strategy can also be used in conjunction with interspersed independent-practice sessions in which students are given minimal outside assistance. This allows students to learn from well-studied examples so as to develop their ability to work independently.

Adequate individual guidance and feedback, although lacking in most courses, is essential to ensure effective learning. Computers employing the preceding instructional strategies can act as PALs (Personal Assistants for Learning), providing a practical means of giving individual guidance and feedback to every student. Even with little human-like intelligence, such PALs can play the role of tutors in a reciprocal-teaching strategy and provide students with better individual guidance and feedback than they currently receive. They can also help guide them toward independent performance.

To demonstrate the feasibility of this instructional approach, we constructed a set of PAL computer tutorials designed to teach the application of Newton's laws. We then carried out a study to assess their efficacy in the context of a physics course. This study showed the following: (a) The PAL tutorials were nearly as effective as individual tutoring by experienced tutors, but required much less instructor time. (b) The PAL tutorials prevented nearly all the students from failing the subsequent test (i.e., getting scores below 65%). By contrast, about half of the equally able and motivated students failed this test when they had received only the instruction provided in the course. (c) Students liked the PALs, found them helpful to their learning, and perceived that they were learning useful ways to think about physics.

The preceding results suggest that an instructional approach using cognitively based strategies incorporated in PAL tutorials may fruitfully be continued and extended.

Footnotes
* An analysis of variance, with Scheffé post hoc tests, showed that these differences are highly significant. (The observed difference between the PAL group and the Class group would occur by chance with a probability p = 0.03, and that between the Tutoring group and the Class group with a probability p = 0.003.)
** An analysis of variance confirms that the means of the previous average test scores of the three groups are essentially equivalent (i.e., no significant difference, p = 0.94).
*** Paired t-tests show that the Newton test score, compared to the previous average test score, is not significantly different for the PAL or Tutoring groups, and significantly lower (p = 0.04) for the Class group.

References
[1] Reif, F. (1987). Interpretation of scientific or mathematical concepts: Cognitive issues and instructional implications, Cognitive Science 11, 395-416.
[2] Labudde, P., Reif, F. & Quinn, L. (1988). Facilitation of scientific concept learning by interpretation procedures and diagnosis, International Journal of Science Education 10, 81-98.
[3] Reif, F. & Allen, S. (1992). Cognition for interpreting scientific concepts: A study of acceleration, Cognition and Instruction 9, 1-44.
[4] Palincsar, A. S. & Brown, A. L. (1984). Reciprocal teaching of comprehension-fostering and comprehension-monitoring activities, Cognition and Instruction 1, 117-175.
[5] Sweller, J. & Cooper, G. A. (1985). The use of worked examples as a substitute for problem solving in learning algebra, Cognition and Instruction 2, 59-89.
[6] Zhu, X. & Simon, H. (1987). Learning mathematics from examples and by doing, Cognition and Instruction 4, 137-166.
[7] Anderson, J. R., Boyle, C. F. & Reiser, B. J. (1985). Intelligent tutoring systems, Science 228, 456-462.
[8] Anderson, J. R., Corbett, A. T., Koedinger, K. R. & Pelletier, R. (1995). Cognitive tutors: Lessons learned, Journal of the Learning Sciences 4, 167-207.
[9] Reif, F. (1995). Understanding Basic Mechanics, New York: Wiley.
[10] Bloom, B. S. (1984). The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring, Educational Researcher 13, 3-16.
Metacognition
Teaching meta-cognitive skills: implementation and evaluation of a tutoring system to guide self-explanation while learning from examples

Cristina Conati¹ and Kurt VanLehn¹,²*
¹ Intelligent Systems Program, University of Pittsburgh, U.S.A.
² Department of Computer Science, University of Pittsburgh, U.S.A.
The SE-Coach is a tutoring module designed to help students learn effectively from examples by guiding self-explanation, a meta-cognitive skill that involves clarifying and explaining to oneself the worked-out solution of a problem. The SE-Coach provides this guidance through (a) an interface that allows the student to interactively build self-explanations based on the domain theory, and (b) a student model that assesses the quality of the student's explanations and the student's understanding of the example. The SE-Coach uses the assessment in the student model to elicit further self-explanation and thereby improve example understanding. In this paper we describe how the SE-Coach evolved from its original design to the current implementation through an extensive and thorough process of iterative design, based on continuous evaluations with real students. We also present the results of the final laboratory experiment that we performed with 56 college students, and discuss some hypotheses to explain the obtained results, based on the analysis of the data collected during the experiment.
1 Introduction
Computer-based tutors generally focus on teaching domain-specific cognitive skills, such as performing subtractions in algebra or finding the forces on a body in Newtonian physics. However, a key factor that influences the quality of learning is which cognitive processes are triggered when the student learns. Tutoring is more effective when it encourages cognitive processes that stimulate learning and discourages counterproductive ones. We have developed a tutoring module, the SE-Coach, that instead of directly teaching the knowledge necessary to master a target domain, stimulates and guides the application of self-explanation, a learning process that allows the effective acquisition of knowledge in the many domains where it is possible to learn from examples. Self-explanation is the process of generating explanations and justifications to oneself when studying an example. Many studies show that students who self-explain learn more [1-3]. When students are either explicitly taught [4] or even just prompted [5] to self-explain, most students will do so and thus increase their learning.

The SE-Coach provides tutoring for self-explanation within Andes, a tutoring system designed to teach Newtonian physics to students at the US Naval Academy [6]. Within Andes, the SE-Coach makes sure that students thoroughly self-explain the available examples, especially those parts that may be challenging and novel to them. A first prototype of the SE-Coach was described in [7]. It included: (a) a Workbench, which interactively presents examples and provides tools to construct theory-based self-explanations; (b) a probabilistic student model, which uses both the students' Workbench actions and estimates of their prior knowledge to assess the students' understanding of an
298
C. Conati and K VanLehn / Teaching Meta-Cognitive Skills
example, and (c) a Coach, that uses the assessment from the student model to identify deficits in the students' understanding and elicits self-explanations to remedy them. In this paper we describe how the initial prototype evolved into the current implementation through successive evaluations with real students. We focus in particular on the changes to the Workbench and to the Coach. Details on the implementation and performance of the SECoach student model can be found in [8]. In Section 2 we outline the features of the selfexplanation process that influenced the design of the SE-Coach. In Section 3 we give an overview of the SE-Coach's architecture. In Section 4 and 5 we describe the development of the Workbench and the SE-Coach respectively. In Section 6 we discuss a laboratory experiment that we performed with 56 college students to formally evaluate the effectiveness of the SE-Coach. Although the subjects that used the SE-Coach performed better than the control group, the difference did not reach statistical significance. However, the analysis of the log data files generated during the experiment provides interesting insights on how the students perceived and used the systems. In the last section of the paper we discuss these insights and further changes that could help improve the effectiveness of the tutor. 2
Self-explanation with the SE-Coach
A distinguishing characteristic of the SE-Coach is that it focuses on correct self-explanations. In all the previous studies, even incorrect statements were classified as self-explanations. When human tutors guided self-explanation [4, 5], the experimenters did not give feedback on the self-explanations' content or correctness. In all these experiments, students' problem solving improved, leading some researchers to argue that it is the self-explanation process per se, and not the correctness of its outcome, that elicits learning [2]. Although we agree that even incorrect and incomplete self-explanations can improve learning, we also believe that correct self-explanation can extend these benefits. Therefore, the SE-Coach is designed to verify the validity of students' explanations and to provide feedback on their correctness.

A second characteristic of the SE-Coach is that it focuses on two specific kinds of self-explanation: (a) justifying a solution step in terms of the instructional domain theory, and (b) relating solution steps to goals and sub-goals in the underlying solution plan. While students generally produce a high percentage of theory-based self-explanations, they tend not to generate goal-related explanations spontaneously [3], although these self-explanations can help acquire highly transferable knowledge [10]. We designed the SE-Coach to specifically target these useful but uncommon self-explanations, hoping to further improve the benefits for learning.

Another kind of quite frequent self-explanation involves knowledge outside the instructional domain. Unfortunately, the SE-Coach cannot monitor and guide the generation of these explanations: the system would require a natural-language interface, and a much more complex knowledge base and student model, to process and evaluate them. However, even if the SE-Coach cannot explicitly guide self-explanations based on background knowledge, it hopefully does not prevent the students from generating them spontaneously.
3 The SE-Coach architecture

The SE-Coach has a modular architecture, as shown in Figure 1. The left side of the figure shows the authoring environment. Prior to run time, an author creates both the graphical description of an example and the corresponding coded example definition. A problem solver uses this definition and the set of rules representing Andes' physics knowledge to automatically generate a model of the example solution called the solution graph. The solution graph is a dependency network that encodes how physics rules generate intermediate goals and facts in the example solution to derive the example's desired quantities [11].

Figure 1: SE-Coach's architecture

The right side of the figure shows the run-time student environment. Students use the Workbench to study examples and to generate self-explanations. The Workbench sends the students' explanations to the SE-Coach, which tries to match them with rules in the solution graph and provides immediate feedback on their correctness [7]. The student's Workbench actions are also sent to the student model, which uses them to assess the quality of the student's explanations and example understanding [8]. The SE-Coach refers to the student model to make decisions about what further self-explanations to elicit from the student.
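To make the solution graph concrete, here is a minimal sketch of such a dependency network, under a much-simplified representation: the Node class, the derive helper and the rule names are invented for illustration and are not Andes' actual data structures.

```python
# Hypothetical sketch of a solution graph: a dependency network in which
# physics rules derive new facts and goals from prior facts and goals.
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str            # e.g. "goal: find the net force on the block"
    kind: str             # "fact" or "goal"
    derived_by: str = ""  # name of the physics rule that produced it
    parents: list = field(default_factory=list)   # nodes the rule used

def derive(rule_name, parents, label, kind):
    """Apply a rule to known facts/goals, yielding a new graph node."""
    return Node(label=label, kind=kind, derived_by=rule_name, parents=parents)

# Building a tiny fragment of a solution graph:
g0 = Node("goal: find the acceleration of the block", "goal")
g1 = derive("try-newtons-second-law", [g0],
            "goal: apply Newton's second law to the block", "goal")
f1 = derive("body-choice", [g1], "fact: the block is the body", "fact")
```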
4 The Workbench for self-explanation
When the student selects an example to study, the Workbench presents it with all the text and graphics covered with gray boxes, each corresponding to a single "unit" of information. When the student moves the mouse pointer over a box, the box disappears, revealing the text or graphics under it. This allows the SE-Coach to track what the student is looking at, and for how long. Whenever the student unmasks a piece of the example that contains an idea worthy of explanation, the Workbench appends to it a button labeled "self-explain". Pressing the button gives the student a choice between "This fact is true because..." and "This fact's role in the solution plan is...". If the student selects the first choice, a rule browser is displayed in the right half of the window (see Figure 1), whereas if the student selects the second, the right part of the window displays a plan browser. The next sections describe how the interaction proceeds in the two cases.
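As a rough illustration of the reading-time tracking that this masking interface makes possible (a sketch under assumed names; the class and the 5-second cutoff are invented, not the actual Workbench code):

```python
# Hypothetical sketch of logging unmasking events so the student model
# can reason about reading times; not the actual Workbench code.
import time
from collections import defaultdict

class MaskingTracker:
    def __init__(self):
        self.opened_at = {}                    # unit id -> time it was uncovered
        self.total_view_time = defaultdict(float)

    def on_uncover(self, unit_id):
        self.opened_at[unit_id] = time.time()

    def on_cover(self, unit_id):
        started = self.opened_at.pop(unit_id, None)
        if started is not None:
            self.total_view_time[unit_id] += time.time() - started

    def read_long_enough(self, unit_id, min_seconds=5.0):
        # The student model flags items whose reading time was too short
        # for self-explanation; 5 seconds is an arbitrary illustrative cutoff.
        return self.total_view_time[unit_id] >= min_seconds
```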
4.1 The rule browser
The rule browser (Figure 2) contains all the system's physics rules, organized in a tree structure so that clicking on the + and - buttons reveals and hides subtrees of the hierarchy. Using this browser, the student finds and selects a rule that justifies the uncovered fact. If the student then selects "submit," the SE-Coach will use red/green feedback to indicate whether the selected rule is the one that explains the uncovered information. The SE-Coach does not provide additional help besides red/green feedback, since one feature that makes self-explanation effective for learning is that students elaborate the available material and knowledge by themselves. Thus, when a wrong rule is selected, the only way for the student to correct the mistake is to keep browsing the hierarchy until the correct rule is found. For this reason, the organization of the rule names in the browser is crucial to make the search for the correct rule a thought-provoking activity, instead of a frustrating one that may result in the student clicking exhaustively on all the rule names.

Figure 2: the rule browser
The current organization of the rule hierarchy is the result of successive evaluations with pilot subjects, which helped reduce the amount of floundering observed in the first versions of the browser. An interesting behavior that surfaced during these evaluations is that most students did not click on rule names randomly when they got stuck. Rather, when they could not find plausible candidates in the category that they had expanded, they would stop instead of browsing other parts of the hierarchy. We repeatedly changed the category names and arrangement to maximize the chance that students immediately enter the right part of the hierarchy. We also provided cross-references for rules that could plausibly belong to different categories, such as the rule encoding the definition of Net Force, which rightfully belongs to the category Newton's Second Law but which students often tried to find in the category Forces.

4.2 The rule templates

The rule browser lists only the names of the rules, and most students will need to know more about a rule before they can be sure that it is the explanation they want. To learn more about a rule, the student can click on the "template" button in the rule browser (Figure 2). A dialog box comes up (see Figure 3) with a partial definition of the rule that has blanks for the student to fill in. Clicking on a blank brings up a menu of possible fillers. After completing a template, the student can select "submit," which causes the SE-Coach to give immediate feedback. By filling in a rule template, students can explain what a rule says in a much more active way than by simply reading and selecting the rules from menus.

Figure 3: rule template

Again, pilot evaluations were fundamental to assess and improve the clarity and meaningfulness of the template fillers in the pull-down menus. For example, we discovered that students tended to ignore fillers that were too verbose, even when they were the only obviously correct choices. Another relevant insight that we gained from pilot evaluations was that, if students are given too much freedom as to whether to access a template, they tend not to do it. In the first version of the system, once a correct rule was selected the student could either click on the Template button at the bottom of the browser or click Done and quit. Most students never accessed templates. When asked why, they said that they did not remember what a template was, although the experimenter had extensively explained the interface at the beginning of the evaluation session. The simple change of giving only the Template choice after rule selection greatly increased the percentage of students who filled in templates, although students could still close a template without filling it by clicking on the Cancel button at the bottom (Figure 3).
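To make the template mechanism concrete, the following sketch shows a fill-in-the-blank template with menus of fillers and red/green checking; the rule text, blanks and fillers are invented for illustration:

```python
# Hypothetical rule template: a partial rule definition with blanks,
# each blank offering a menu of fillers. All names are illustrative only.
TEMPLATE = {
    "rule": "Newton's second law",
    "text": "If a body has a net force on it, then its {0} equals the "
            "net force divided by its {1}.",
    "fillers": [["acceleration", "velocity", "displacement"],
                ["mass", "weight", "volume"]],
    "answers": ["acceleration", "mass"],
}

def check_template(choices):
    """Red/green feedback: True only if every blank is filled correctly."""
    return choices == TEMPLATE["answers"]

print(TEMPLATE["text"].format("acceleration", "mass"))
print(check_template(["acceleration", "mass"]))   # True  (green)
print(check_template(["velocity", "mass"]))       # False (red)
```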
4.3 Plan browser
If the student selects "This fact's role in the solution plan is..." after pushing the self-explain button, the right part of the window displays a plan browser instead of a rule browser. The plan browser displays a hierarchical tree representing the solution plan for the particular example. The student explains the role of the uncovered fact in the solution plan by navigating through the goal hierarchy and selecting the plan step that most closely motivates the fact. The "submit" button causes the SE-Coach to give immediate feedback. There are no templates associated with the plan browser, since they would simply spell out explicitly information on the plan structure that is already encoded in the browser hierarchy (e.g., if the goal is to apply Newton's law and we have selected a body, then the next subgoal is to describe the properties of this body).

5 SE-Coach's advice
Initially, self-explanation is voluntary. The SE-Coach keeps track of the students' progress through the example, including how long they looked at each solution item and what they chose to self-explain via the rule and plan browsers. This information is passed to the probabilistic student model, which integrates it with estimates of the student's current knowledge of physics rules to assess which solution items need more self-explanation. In particular, when a student fills in a template or selects a plan step correctly, the probability of the corresponding rule is updated by taking into consideration the prior probability of the rule and how many attempts the student made to find the correct selection [8]. If a student tries to close an example, the SE-Coach consults the student model to see if there are solution items that require further explanations. The student model returns solution items that correspond to facts or goals derived from rules with a low probability of being known, or items with a reading time not sufficient for self-explanation [8]. If the student model indicates that there are lines that need further explanation, the SE-Coach tells the student "You may learn more by self-explaining further items. These items are indicated by pink covers", and colors some of the boxes pink instead of gray. It also attaches to each item a more specific hint such as "Please self-explain by using the Rule browser" or "Please read more carefully". The color of the boxes and the related messages change dynamically as the student performs more reading and self-explanation actions.
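The following sketch illustrates this advice cycle, assuming the 0.75 knowledge-probability threshold reported in Section 6; the toy probability update is invented for illustration and merely stands in for the Bayesian student model of [8]:

```python
# Hypothetical sketch of the SE-Coach's hint selection. The real system
# uses a Bayesian network [8]; this simple update is illustrative only.
KNOWN_THRESHOLD = 0.75   # threshold reported in Section 6

def update_rule_probability(p_rule, attempts):
    """Crude stand-in for the Bayesian update after a correct explanation:
    fewer attempts -> larger increase in the probability the rule is known."""
    boost = 0.3 / attempts          # invented heuristic, not from the paper
    return min(1.0, p_rule + boost)

def items_to_hint(items, rule_probs, read_long_enough):
    """Return (item, hint) pairs for the lines whose covers should turn pink."""
    hints = []
    for item in items:
        if not read_long_enough(item["id"]):
            hints.append((item, "Please read more carefully"))
        elif rule_probs[item["rule"]] < KNOWN_THRESHOLD:
            browser = "Plan" if item["kind"] == "goal" else "Rule"
            hints.append((item, f"Please self-explain by using the {browser} browser"))
    return hints
```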
If the student tries to close the example when there are still some pink covers left, the SE-Coach generates a warning such as "There are still some items that you could self-explain. Are you sure you want to exit?", but it lets the student quit if the student wants to.

The SE-Coach's advice is probably the feature that was most affected by the feedback from pilot evaluations. In the original version of the system, the Coach would point out lines that required self-explanations one at a time, instead of indicating them all at once by changing their color. When the student tried to close the example, the SE-Coach would generate a first, generic warning such as "There are still some items that you could self-explain. Do you want to try?" The student could either (a) reject the advice, (b) accept it and go back to study the example without any further indication of what to self-explain, or (c) ask for more specific hints. If the student chose the latter, the SE-Coach would say, for example, "Why don't you try to use the rule browser to explain this line?", and it would uncover the line. At this point the student would go back to the example, and possibly explain the line, but the only way to get additional suggestions from the Coach was to try to close the example again.

The rationale behind this design was to stimulate as much spontaneous self-explanation as possible. We thought that directing the student to a particular example line could be enough to also trigger explanations of other lines. This did not happen. Either students were natural self-explainers and explained most of the example the first time through, or they strictly followed individual SE-Coach hints but rarely initiated any additional self-explanation. For non-spontaneous self-explainers, the interaction with the Coach would quickly become quite uninspiring: after doing what the Coach had suggested (e.g., finding a rule name in the rule browser), they would try to close the example and would get another hint ("there is something else that you could self-explain, do you want me to show you?"), suggesting further explanation either on the current line via template/plan browser or on a different line. A student would have to repeat this cycle to access each new piece of advice, and most students lost interest and chose to close the example after the first couple of hints.

The current design, based on the coloring of example lines, allows the students to see at once all the parts that they should self-explain, and which Workbench tool they should use for the explanations. It also gives the students better feedback on the progress that they are making, since line colors and hints change dynamically as students generate more self-explanations.

6 Empirical evaluation of the SE-Coach

Once we had iteratively improved the system design through pilot evaluations, we performed an empirical evaluation to test its effectiveness.

6.1 Experiment design
We conducted a laboratory experiment with 56 college students who were taking introductory physics classes at the University of Pittsburgh, Carnegie Mellon University and the U.S. Naval Academy. The design had two conditions:

Control: 27 students studied examples with the masking interface only.
Experimental: 29 students studied examples with the SE-Coach.

The evaluation consisted of one session in which students 1) took a paper-and-pencil physics test, 2) studied examples on Newton's second law with the system, 3) took a paper-and-pencil post-test with questions equivalent but not identical to the ones in the pre-test, and 4) filled out a questionnaire designed to assess the students' impressions of the system.

Timing was a heavy constraint in the experiment. The sessions needed to be held when students already had enough theoretical knowledge to understand the examples and generate self-explanations, but were not so far along in the curriculum that our examples would be too trivial for them. To satisfy this constraint, we ran subjects in parallel in one of the University of Pittsburgh computer labs. Another constraint was that we had to concentrate the evaluation in one session, to prevent the post-test performance from being influenced by knowledge that students were gaining from their physics class. This, and the fact that the computer lab was available in 3-hour slots, obliged us to limit the length of the pre-test and post-test. Thus, we could not insert any items to specifically test knowledge gained from goal-based explanations built with the plan browser, and we had to rely on the possibility that students would show such knowledge in solving the problem-solving questions in the test.

In order to roughly equate time on task, students in the control condition studied 6 examples and students in the experimental condition studied 3 examples. Despite this, there is a statistically significant difference between the average time on task of the experimental group (52') and the control group (42'32"). However, we found no significant correlation of time on task with post-test scores.

6.2 Results
Two different grading criteria were used for the pre- and post-test. The first criterion, called objective grading, comprised only those questions in the test that required a numeric answer or a selection from a set of choices, and looked only at the result.

Table 1: (a) objective-based gain scores; (b) feature-based gain scores

(a) Group     N    Mean   StdDev
    control   27   2.30   2.38
    se-group  29   2.38   1.76

(b) Group     N    Mean   StdDev
    control   27   5.04   4.35
    se-group  29   6.04   4.49
The second criterion, called feature-based grading, also included those items in the test that required more qualitative definitions, and took into account how students got their answers. For both grading systems, there were no significant differences between conditions on the pre-test scores. Unfortunately, the gain scores were also not significantly different, although the trend was in the right direction and the gain score difference was higher for feature-based grading (Table 1), which was more apt to capture knowledge gains due to self-explanation.

A possible explanation for the non-significant result is that students in the experimental condition did not generate sufficient self-explanations with the Workbench tools, because they had problems using them and/or because they did not follow the SE-Coach's advice. To test this explanation, we extracted from the experimental group's log data files information on Workbench tool usage and SE-Coach performance: (a) how many times students initiated self-explanations with the rule browser, templates and plan browser, (b) how many of these explanations were successful, (c) how many attempts it took the students on average to find a correct answer in each of the self-explanation tools, or to decide to quit the explanation, and (d) how often students followed the SE-Coach's advice.

Rule browser usage. On average, students initiated 28.8 rule browser explanations, which represents 62% of the total rule browser explanations that can be generated in the available examples. Of the initiated rule browser explanations, 87% ended successfully with the selection of the correct rule. On average it took the students 1.27 attempts to get the correct answer, with an average maximum of 9.2 attempts. Although on average students did not flounder much to find a correct rule, for almost all of them there was at least one rule that was very hard to find. The rule browser accesses that failed to find the correct rule took an average of 4 attempts, and students spent an average of 4 minutes on failed rule browser explorations, a minor fraction of the average total time on task (52 minutes). This data shows that, although the rule browser did not seem to cause many problems for students, it could have generated some distraction and frustration in the few situations in which a student took a long time to find the correct rule or could not find it at all. The system may benefit from an additional form of help that leads the student to the right browser category when the student is floundering too much. This was in fact the main suggestion that students wrote in the questionnaire that they filled out after the post-test.

Template usage. On average students accessed 23.8 templates, 55.5% of the available template explanations. This figure is not indicative of how effectively templates stimulate self-explanation since, as we described in Section 4.2, template access is mandatory when a correct rule is selected in the rule browser. More indicative is the fact that, although it is not mandatory to fill in a template after opening it, 97% of the accessed templates were filled in correctly, with an average of only 0.5 attempts and an average maximum of 2.5 attempts. On average students spent only 59 seconds trying to fill in templates for which they could not find the correct answer. This data allows us to discard user interface problems with templates as a cause of the non-significant results.

Plan browser usage. Students initiated only 38% of the possible plan browser explanations, but they did not have many problems using the plan browser. Of the initiated explanations, 85% resulted in the selection of the correct plan step, with an average of 1 attempt. Students spent on average only 29 seconds on plan browser accesses that did not lead to a correct explanation. Despite good plan browser performance, we could not detect any gain in the students' planning knowledge because the post-test did not have any question that
specifically tapped it. Furthermore, many students wrote in the questionnaire that they did not find the plan browser very useful. This outcome is not surprising. As we mentioned in Section 2, goal-related explanations are quite unnatural for students, especially if students do not have any theoretical knowledge of the notion of solution planning. The plan browser was designed with the idea of evaluating the system at the Naval Academy, with students who had been introduced to the idea of abstract planning by the physics professors participating in the Andes project. We hope to be able to perform this evaluation in the near future, to verify the effectiveness of the plan browser when used in the optimal instructional context.

SE-Coach results. As described in Section 5, the SE-Coach gives its suggestions by changing the color of the lines to self-explain and by attaching to each line a specific hint indicating whether the line should be explained with the rule browser/template or with the plan browser, or whether it should simply be read more carefully. In the three evaluation examples, the SE-Coach can generate a maximum of 43 rule browser hints, 34 plan browser hints and 43 hints to read more carefully. The Coach gave an average of 22.6 rule browser hints, 22.4 plan browser hints and 7 reading hints. Each student followed an average of 38.6% of the rule browser hints, 42% of the plan browser hints and 34% of the hints suggesting to read more carefully. As we explained in Section 5, the SE-Coach's hints are given based on the student model's assessment of which solution items correspond to rules that have a low probability (< 0.75) of being known by the student. As students correctly explain the suggested solution items, the probabilities of the corresponding rules are increased in the student model. So an indicator of the effectiveness of the SE-Coach is the percentage of physics and planning rules that, at the end of the evaluation session, have changed their probability from less to more than 0.75. On average, 79.3% of the physics rules used in the three examples and 77% of the plan rules reached the 0.75 threshold.

6.3 Results discussion

The results on Workbench usage suggest that user interface problems are not likely to be a primary cause of the non-significant difference in gain scores, although changes that reduce floundering in the rule browser could help improve the effectiveness of the system. On the other hand, the results on the effectiveness of the SE-Coach's advice show that, although the current design works much better than the original one described in Section 5, better learning for the experimental group could be obtained with a stronger form of coaching that leads students to self-explain more exhaustively. As a matter of fact, in all the experiments in which human tutors elicited self-explanation, the tutor made sure that students self-explained every item in the target examples. We did not want to make the SE-Coach's suggestions mandatory because they are based on a probabilistic student model whose accuracy had not been tested at the time of the evaluation. In particular, the student model's predictions strongly depend on estimates of the student's initial physics knowledge [8]. At the time of the evaluation we had no way of obtaining these estimates for every student, so we assigned every rule a prior probability of 0.5.
Given the possible inaccuracy of the model, we did not want to risk frustrating the students by forcing them to explain example lines that they may have already understood. We may obtain better results from the SE-Coach with an evaluation in which we set the initial probabilities of the student model using the results of the student's pre-test, and make the SE-Coach's hints mandatory.

Three more hypotheses for the lack of significant gain scores should be considered. The first hypothesis is that students in the control group self-explained as much as students in the experimental group. This hypothesis is not easy to test, since we have no simple way to ascertain whether control students self-explained or not. We are currently analyzing the control group's log data files to see if we can identify any correlation between how students read the examples and their post-test results. The second hypothesis is that the self-explanations generated with the Workbench did not stimulate as much learning as verbal self-explanations do. Moreover, the fact that students must concentrate on the self-explanations allowed by the Workbench may actually inhibit the generation of self-explanations based on knowledge outside the physics domain which, as we discussed in Section 2, appeared quite frequently in experiments on verbal self-explanation. A possible way to test this second hypothesis is to compare the SE-Coach interface with an interface that allows students to express their self-explanations in writing.
Lastly, given that the experimental group's post-test scores were higher than the control group's, but the difference was not large compared to the standard deviation, it may be that the SE-Coach works fine but students did not use it long enough. If students studied twice as many examples, perhaps the difference in learning between the two groups would be large enough to be statistically significant.

7 Conclusions
The SE-Coach is a tutoring module that focuses on teaching the meta-cognitive skill known as self-explanation, instead of directly teaching cognitive skills related to a particular instructional domain. Many studies show that self-explanation, the process of clarifying and making more complete to oneself the solution of an example, can improve problem solving performance, and that guiding self-explanation can extend these benefits.

We believe that empirical evaluations are fundamental for the development of instructional systems of real effectiveness. This is especially true for the SE-Coach, since it focuses on a learning process whose underlying mechanisms are still unclear and under investigation. In this paper, we described how the system evolved through pilot evaluations from the original design proposed in [7] to its current version. In particular, we illustrated how these evaluations shaped two fundamental elements of the system: (a) the SE-Coach interface, known as the Workbench, which provides specific tools for constructing self-explanations, and (b) the SE-Coach's advice, which uses the assessment of a probabilistic student model to elicit self-explanations that can improve the students' understanding of the example. We also presented the results of a formal evaluation that we performed with 56 college students to test the effectiveness of the system. Although the learning trend was in the right direction, the results did not reach statistical significance. However, the analysis of the log data files collected during the evaluation allowed us to understand how students used the system, and to generate hypotheses to explain the lack of statistically significant results.

We plan to start testing with formal evaluations those hypotheses that involve minor changes to the system (adding help for using the Workbench tools, making the SE-Coach's advice mandatory) and minor changes to the experiment design (adding more specific test questions to tap all the knowledge addressed by the SE-Coach, increasing the time on task by having students study more examples). The insights provided by these new evaluations could be used in the future to develop and study alternative self-explanation interfaces and coaches, in order to see which ones encourage the most learning.

8 References
[1] Chi, M.T.H., et al., Self-explanations: How students study and use examples in learning to solve problems. Cognitive Science, 1989. 13: p. 145–182.
[2] Chi, M.T.H., Self-explaining: A domain-general learning activity, in Advances in Instructional Psychology, R. Glaser, Editor. In press, Erlbaum: Hillsdale, NJ.
[3] Renkl, A., Learning from worked-out examples: A study on individual differences. Cognitive Science, 1997. 21(1): p. 1–30.
[4] Bielaczyc, K., P. Pirolli, and A.L. Brown, Training in self-explanation and self-regulation strategies: Investigating the effects of knowledge acquisition activities on problem-solving. Cognition and Instruction, 1995. 13(2): p. 221–252.
[5] Chi, M.T.H., et al., Eliciting self-explanations improves understanding. Cognitive Science, 1994. 18(3): p. 439–477.
[6] VanLehn, K., Conceptual and meta learning during coached problem solving, in ITS'96: Proceedings of the Third International Conference on Intelligent Tutoring Systems, C. Frasson, G. Gauthier, and A. Lesgold, Editors. 1996, Springer-Verlag: New York.
[7] Conati, C., J. Larkin, and K. VanLehn, A computer framework to support self-explanation, in Proceedings of the Eighth World Conference on Artificial Intelligence in Education. 1997.
[8] Conati, C., A student model to assess self-explanation while learning from examples. To appear in Proc. of UM'99, 7th International Conference on User Modeling, Banff, Canada.
[9] Catrambone, R., Aiding subgoal learning: Effects on transfer. Journal of Educational Psychology, 1995. 87.
[10] Conati, C., et al., On-line student modeling for coached problem solving using Bayesian networks, in User Modeling: Proceedings of the Sixth International Conference, UM97. 1997, Springer Wien: New York.
Artificial Intelligence in Education S.P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
Metacognition in Epistolary Rhetoric: A Case-Based System for Writing Effective Business Letters in a Foreign Language

Patrick Boylan, Carla Vergaro
Dipartimento di Linguistica, Università di Roma Tre
Via del Castro Pretorio, 20, I-00185 Roma, Italia

Alessandro Micarelli, Filippo Sciarrone
Dipartimento di Informatica e Automazione, Università di Roma Tre
Via della Vasca Navale, 79, I-00146 Roma, Italia
Contact e-mail: micarel@dia.uniroma3.it
Abstract: The Business Letter Tutor discussed here, and described more fully in [10], helps office staff learn to correspond more effectively (in their native language but also in any foreign language) by getting them to define their goals and then retrieving and displaying appropriate excerpts from a database of letters of proven value, appropriately tagged paragraph by paragraph. The present paper describes two problems encountered while developing the prototype of the system, as well as the solutions devised, which are now being implemented. The lessons learned are: (i) the sophisticated meta-language used to tag the paragraphs of the letters stored in the database cannot be used to furnish the user with categories of letter writing styles and strategies from which to choose; another "meta-meta-language" must be created, couched in everyday language and based not on conceptual categories but on pragmatic goals; (ii) the user will, with time, tend to get into a rut and choose the same rhetorical strategies and even the very same letters over and over, simply because they work at least minimally; thus, to guarantee the educational value of the Tutor and to get the user to explore new ways of writing effectively, the System must incorporate an additional component in the form of an Overseer with a "personality" of its own, capable of suggesting (but not imposing) different approaches to a specific writing task.
1. Introduction

The Business Letter Tutor discussed here, and described more fully in [10], helps office staff learn to compose effective business letters in English by getting them to define their goals and then retrieving and displaying appropriate excerpts from a database of letters of proven value, appropriately tagged paragraph by paragraph. In cutting and pasting the excerpts together, users learn through example what effective letter writing means. Instruction is "self-directed" in that it is up to the users to judge how suitable the retrieved excerpts are and how they should be pieced together. The assumption is that, in a business environment, users will want to produce (and have a way of recognising) satisfactory end products. Their final choices may therefore be considered "expertise" which the Tutor can use to furnish ever more useful excerpts to study in similar circumstances in the future.
2. The Educational Philosophy behind the Project

The project is based on the dictum according to which teaching less favours learning more [12]. Conventional wisdom has it that training is only as effective as the trainer is competent; hence the attempt to create tutoring systems able to formalise and control the entire learning process, systems which can then (it is hoped) be fine-tuned to perfection [3]. Implicit in this Faustian attitude is its Pygmalion counterpart: learners are seen as empty vases to be filled with "knowledge" by a deft trainer/system, i.e. one capable of pouring information into their heads without spilling a drop. Or, to use an image more consonant with computer science, learners are seen as inert silicon chips to be programmed by a skilled human/machine tutor so that, for a given input, a specified output is regularly obtained (learners are said to "know" a "subject" when they regularly output expected answers to test questions). This philosophy has such ancient roots that it has withstood decades of research clearly demonstrating the fundamental creativity, and thus uncontrollability, of the learning process [13]. Cognitive scientists have in fact shown that students are neither empty vases nor inert chips: they are, indeed, the engineers of their own learning process [2]. Teachers or trainers, like books or audio-visual aids or computer programs, are simply one of the tools offered to students by the educational environment; they can help or hinder learning but cannot cause it to happen (nor prevent it from happening). The first premise of this paper, therefore, is that learners are experimenters who want (or who can be led to want) to investigate a domain and who, in forging tools for this purpose, end up (re-)creating a "subject". This view of learning radically changes the role of the tutor (human or electronic) who, from "depository of knowledge", becomes the agent responsible for creating a stimulating environment in which the learner can come to grips with a given domain by conducting successful experiments on it.

3. The First Implementation and the Educational Benefits

The Tutoring System presented here reflects this philosophy. It aims at harnessing the power of an AI-based engine to give users control over the System (specifically, over a data base of business letters) instead of giving the System control over them. The program first gets the user to define the circumstances of the letter to be written; then it searches a Case Library of past (successful) company correspondence for excerpts that match as closely as possible both the present circumstances and the user profile. Hits appear in a Model Letters Window. To produce an appropriate letter, users cut and paste excerpts into an Edit Window and then adjust the collage to fit the current situation.

How are these processes handled by the System? In the original design (see Figure 1), letter excerpt retrieval is handled by three components:

1a) a User Model, i.e., the set of attributes describing the person writing the letter or the person for whom the letter is being written (e.g., in the case of a secretary using the Tutor, the boss). In an office-pool situation, there would be many "bosses" and therefore the system would contain many User Models, each with an idiosyncratic way of handling such writing tasks as ADDRESSING_COMPLAINTS, REQUESTING_INFORMATION or DISPUTING_FINDINGS.
The User Model is built up initially by selecting a User_Stereotype [11] (from among a set number of Stereotypes programmed into the system) on the basis of the answers which a new user gives to a brief questionnaire. In other words, the User defines himself (or is defined by a secretary) in terms of ATTITUDES and EXPRESSIVE STYLE, and the system picks the closest match from among the already existing Stereotypes. A Stereotype is a set of attributes/values, i.e. weighted ATTITUDES and EXPRESSIVE STYLES typically associated with letter-writing MOVES and STRATEGIES, i.e. with an inventory of the thematic development devices (or "rhetoric") contained in the various letters stored in the Business Letter-Writing Component.

1b) a Recipient Model, i.e., the person to whom the letter is addressed. As in the case of the creation of the User Model, the recipient is initially defined by means of a short questionnaire; the system then enhances the definition by associating it with one of several Recipient_Stereotypes programmed into the system.
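As a rough sketch of this stereotype-based initialization (the stereotype names, attributes and distance measure are invented for illustration; the actual system's representation is richer):

```python
# Hypothetical sketch of picking the closest User_Stereotype from
# questionnaire answers; attribute names and weights are invented.
STEREOTYPES = {
    "diplomatic": {"directness": 0.2, "formality": 0.9, "warmth": 0.7},
    "assertive":  {"directness": 0.9, "formality": 0.6, "warmth": 0.3},
    "colloquial": {"directness": 0.6, "formality": 0.2, "warmth": 0.8},
}

def closest_stereotype(answers):
    """answers: attribute -> value in [0, 1], derived from the questionnaire.
    Returns the stereotype minimizing the summed absolute attribute distance."""
    def distance(profile):
        return sum(abs(profile[a] - answers.get(a, 0.5)) for a in profile)
    return min(STEREOTYPES, key=lambda name: distance(STEREOTYPES[name]))

print(closest_stereotype({"directness": 0.8, "formality": 0.5, "warmth": 0.4}))
# -> "assertive"
```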
Figure 1. The Architecture of the System.
Both the Recipient and the User Models are then refined by the system as time goes by. That is to say, the attributes and values associated with successfully retrieved letters, in relation to domain knowledge and goals, are incorporated into the Stereotype to form an original Recipient Model (or User Model, as the case may be). In other words, the attributes and values are not stored as such but are linked to data in the other system components to form an associative network.

2) a Domain Model, i.e. the savvy that a writer of good business letters has, and which can be formalised as the set of possible links among the MOVES and STRATEGIES, associated with particular goals, characterising the business letters in the data base.

In line with the educational philosophy mentioned above, the System does not attempt to tell users whether their selections are "right" or "wrong". The burden of judging the suitability of excerpts and collages is left entirely to the users themselves. In a business environment, it can be assumed that they will want to produce, and have some means of recognising, satisfactory end products; their choices may therefore be used as "expertise" from which the System can learn to furnish increasingly appropriate excerpts to cut and paste, customised according to each user's history of choices.

"But isn't such a system simply a data base?" one might object at this point; "After all, it doesn't teach". The reply is simple: true, our System doesn't "teach" in the traditional sense. But then, systems don't necessarily have to in order to get users to learn. Indeed, self-directed learning presupposes non-directive teaching. Research shows that educational practices based on this philosophy can be extremely effective. "Learning by example", provided the examples are self-explanatory, has shown its value in acquiring procedural knowledge (like writing skills, for instance [1]), while "learning-by-doing", provided the tools truly permit users to get a hold on their object of study, has proven to be highly effective in getting learners to internalise procedural knowledge and make it automatic (id.). Even a seemingly mechanical activity such as "recopying" a model letter retrieved by the System, provided the recopying is done with intent, i.e. to achieve a specific communicative aim, is an extremely effective way for non-native speakers of English to acquire (and to get practice in using) the "hard-to-teach" lexical-grammatical and pragmatic subtleties that characterise well-written English and that cognitive methods tend to sweep under the carpet (id.). The Tutoring System we built fosters all three of these activities.
Of course, since non-directive "teaching" is by definition unobtrusive, most lay users may have the impression that our computer program is simply a "word-processing aid", not an educational product. But, in point of fact, such an impression is simply the proof that the learning going on is genuinely "contextual" (i.e., that these users perceive a connection between the texts to be produced and some real-life function). It also proves that user motivation is "intrinsic", i.e., that these users feel they are accomplishing something and not simply "doing exercises".
4. Lessons Learned

As mentioned, two problems were encountered in developing the prototype of the system:

(a) The sophisticated meta-language used to tag the paragraphs of the letters stored in the database cannot be utilised to furnish the user with categories of letter writing styles and strategies from which to choose; this is because users simply balk at having to choose from a myriad of menu selections. Another "meta-meta-language" must be created, couched in everyday language and based not on conceptual categories but on pragmatic goals. That language will be discussed in Section 5 of this paper and illustrated in the Appendix.

(b) The user will, with time, tend to get into a rut and choose the same rhetorical strategies and even the same letters, simply because they work, however minimally; thus, to guarantee the educational value of the Tutor and to get the user to explore new ways of writing effectively, the System must incorporate an additional component in the form of an Overseer with a "personality" of its own, capable of suggesting different approaches to a specific writing task.

The Overseer Model is a new component consisting of a set of goal-setting and goal-attaining heuristics which enable the system to improve its performance creatively. In the original specifications, system performance improved over time through the continual refinement of the USER, RECIPIENT and DOMAIN models on the basis of successful hits (i.e. on the basis of which bits of stored correspondence are actually chosen and used by the secretary day by day). The addition of an Overseer Model, now underway, will give the system an "independent personality" in overseeing the operations and suggesting improvements in writing style and effectiveness.

The Overseer Model intervenes with "helpful" suggestions as users go about composing their letters. The purpose of these suggestions is to get users to consider MOVES and STRATEGIES not necessarily consonant with their past choices but perhaps better suited to obtaining the desired effect. In fact, the MOVES and STRATEGIES suggested correspond to the Overseer's developing "personality", not the User's attitudes and style. In a word, it is as though the user receives, in addition to the help furnished by the retrieval system, the advice of an outside agent (perceived as something like an office colleague trying to lend a hand). The intrusions may at times appear meddlesome (since the "colleague's" way of doing things is not necessarily consonant with the user's way of handling correspondence); at other times they may be truly illuminating. This is because suggestions made by an "outsider" may help the user get out of the rut her/his past choices have placed her/him in. The Overseer Model is still in the project stage and will not be discussed further in this paper.
5. A Simplified Meta-Language

As discussed previously [8, 9], what makes our letter-writing tutor efficient in retrieving the "right" bits and pieces of letters is the use of a User Model built dynamically by means of a hybrid architecture in which an artificial neural network is embedded in a case-based reasoner (each combination of USER_STEREOTYPE + RECIPIENT_STEREOTYPE + LETTER_BITS_SELECTED_FROM_DOMAIN constitutes a case). This solves the indexing problem created by the continual updating of the system every time the user pastes together a letter and thereby "approves" a certain set of letter fragments and a certain order of presenting them, which is linked to certain goals.

Where our system proved most lacking, however, was in the user interface. Our original design failed to take sufficiently into account the negative effect on the menu displays caused by the "explosion" of attribute types associated with new letters added to the data base as time goes by. To keep things simple, our previous work was based on a handful of letters. As soon as we extended the number beyond 50, however, it became impossible to incorporate new categories of MOVES and STRATEGIES (as well as CONTENT, LETTER TYPE and KEYWORDS) into easily legible screen menus. Internally, the system had no problem indexing the new attribute types. But the overburdened menus made the system decidedly unfriendly for an unsophisticated user and therefore, with run-of-the-mill office personnel, practically unusable. It became increasingly clear that only a very highly motivated and extremely intelligent student (such as we had employed for testing) could possibly handle the conceptual challenge of deciding between such subtle choices in writing a letter as, for instance, ESTABLISHING CREDENTIALS in order to DEFEND THE FIRM'S IMAGE, instead of DESCRIBING CAPABILITIES in order to REASSURE THE ADDRESSEE [5, 6]. In almost any office, harried personnel would balk at having to choose from menus filled with such intricate and apparently arcane options [7].

What we needed, then, was a way to simplify the secretary's task by automating the choices she had to make and yet, at the same time, maintain the complexity of the tagging system used to characterise letter fragments, in order to enable the Tutor to deliver a small number of extremely appropriate choices. Our solution has been to display only summary menus, get the user to choose from them, and then let the program guess what specific choices the secretary would have made had the full set of options been presented. In other words, the same mechanism used to choose the letters (the combined heuristics provided by the USER MODEL, the RECIPIENT MODEL and the DOMAIN MODEL in specifying the letter selections to be displayed) is now being implemented to permit the system to choose from a complex list of attributes on the basis of a few keywords. The simple keywords are in effect translated into a complex series of attributes on the basis of past performance. A non sequitur is avoided by having the user approve (thus, "choose") the options selected automatically by the system before they become the basis for selecting the bits and pieces of correspondence: the System learns to guess what THIS particular user means by OBTAIN_ACTION on the basis of what OBTAIN_ACTION has usually meant for the user in the past.

In practice, the system offers the user a short-list of letter goals. Then the system relates the goals chosen by the user to the characteristics of the parties involved in the correspondence (USER, RECIPIENT, DOMAIN EXPERT, OVERSEER), coming up with a second short-list of very specific, highly articulated goals. The user may approve the list without even reading it, choose from among the tags, or reject the entire list and call for another. In that case, a second short-list is displayed. If the AI retrieval mechanism is efficient, no more than one or two further attempts should be necessary.
Thus, the "overburdened menus" problem has been eliminated by making the choice of letter description options a problem of intelligent data retrieval, just as the choice of letters was. This is not a particularly new idea: intelligent Help Messages in a word processor operate on a similar principle (albeit without a User Model). What makes the present implementation interesting is the scale of the matching process: the list of letter tags can run to one or two hundred items, compared with the dozen or so items that an intelligent Help Message algorithm must handle in a given context.

It is clear, however, that for this solution to work the system requires a carefully worked out architecture of METAKNOWLEDGE. It must, in other words, map its internal resources and translate them into even more general categories (a "meta-meta-language") which capture given sets of pragmatic relationships (USER - RECIPIENT - DOMAIN EXPERTISE - CONTROL FUNCTION) and which, at the same time, can be expressed in everyday language any office staff member can understand. The system's META-META-KNOWLEDGE may therefore be described as a simplified network within the network represented physically by the links between GOALS (contained in the Domain Model) and BEHAVIOUR (as expressed by the
USER and RECIPIENT MODELS). It is represented communicatively in "everyday language" based on "actions to get things done". Examples are given in the Appendix.

As in knowledge-based systems in general, this metacognitive apparatus does not have to be entirely spelled out. The associations established between the various program components and the tagged letters stored in the data base are due to weightings assigned on the basis of past user choices. In a word, the system builds up its expertise through the sedimentation of historical events (acts of will). This is indeed, it may be argued, what the intelligence of human beings comes down to. A general METACOGNITIVE framework is still necessary, of course, in order to provide the system with a basis for its initial operations. In the case at hand, that framework defines specific epistolary communicative acts and relates them to a network of GOALS and BEHAVIOUR typical of the business world [4]. This enables the system to second-guess the user's intents, thus saving her/him the tedium of making her/his will fully explicit through introspection. The METAKNOWLEDGE thus represented is linked to the META-META-TERMINOLOGY displayed in the menus dynamically, on the basis of who is writing to whom, for what purpose and using what kind of strategy. In other words, using the metacognitive framework, the system is able to choose the most likely candidates from the host of MOVES and STRATEGIES, related to specific ATTITUDES and EXPRESSIVE STYLES aimed at obtaining specific GOALS and producing specific BEHAVIOUR, and then present them to the user with a simplified terminology that corresponds, if the neural network has been sufficiently trained, to what the user desires.
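A toy sketch of this idea, in which a summary goal keyword is expanded into the specific tags this user has tended to approve in the past; in the real system a neural network embedded in a case-based reasoner does the learning, and a simple frequency count stands in for it here:

```python
# Hypothetical sketch: translating a summary goal ("meta-meta-language")
# into the specific MOVE/STRATEGY tags this user has previously approved.
from collections import Counter

class GoalTranslator:
    def __init__(self):
        self.history = {}   # summary goal -> Counter of approved tags

    def record_approval(self, goal, tags):
        """Called whenever the user approves a short-list built from `tags`."""
        self.history.setdefault(goal, Counter()).update(tags)

    def expand(self, goal, top_n=3):
        """Guess the specific tags this user probably means by `goal`."""
        counts = self.history.get(goal, Counter())
        return [tag for tag, _ in counts.most_common(top_n)]

t = GoalTranslator()
t.record_approval("OBTAIN_ACTION", ["solicit compliance", "warn (discreetly)"])
t.record_approval("OBTAIN_ACTION", ["solicit compliance", "support the request"])
print(t.expand("OBTAIN_ACTION"))
# -> ['solicit compliance', 'warn (discreetly)', 'support the request']
```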
6. Future Developments

We are currently examining the prospect of making our letter-writing tutor a more sophisticated tool by enabling it to learn from other machines. "Borrowing" tagged letters from other data bases can help reduce the considerable costs of extending the system's scope, which otherwise requires tagging new letters by hand and introducing them manually into the system. It would be much cheaper for a firm to take advantage of the work already done by tagging experts working for firms in the same domain. (The correspondence exchanged among firms would, of course, have to be manually purged of specific references to clients, etc.)

It is clear, however, that no system can be updated simply by copying the data base of tagged letters stored in another system. This is because the tags (MOVES, STRATEGIES) applied to the paragraphs of letters in other data bases must be "translated" into the ATTITUDES and EXPRESSIVE STYLES, corresponding to specific GOALS and BEHAVIOUR, stored in the various components of the target system. It would, however, be possible to interconnect computers running the program and let the Overseer of one unit take turns interrogating the other units, as a user would. The bits and pieces of letters furnished by the other computers at the request of the Overseer, instead of being cut and pasted into a letter to send, would be stored as additional correspondence in the data base. The unit doing the interrogating would establish, within its own system, the associations and weightings relative to the letters called up on the host computers' screens; the host computers would furnish the letters and the tags associated with them. All this could be done at night, while office staff are not using the machines, and would eliminate the need for high-priced consultants to update and enlarge a given system.
References

[1] Boylan, P. (1995). "What does it mean to 'learn a language' in today's world; what role can present-day computer technology play?". In Proceedings of the Symposium on Language and Technology, Florence: Editrice CUSL, pp. 92–114.
[2] Dalin, A. (1975). Towards self-management of learning processes? Strasbourg, Council of Europe: CCC/EES 75(9).
[3] Dickinson, L. (1978). "Autonomy, self-directed learning and individualization". In Self-directed learning and autonomy, Cambridge: University of Cambridge.
[4] Jenkins, S. and Hinds, J. (1987). "Business Letter Writing: English, French and Japanese". TESOL Quarterly, 21(2), pp. 327–349.
[5] Kong, K.C.C. (1998). "Are Simple Business Letters Really Simple? A Comparison of Chinese and English Business Request Letters". Text, 18(1), pp. 103–141.
[6] Maier, P. (1992). "Politeness Strategies in Business Letters by Native and Non-Native English Speakers". English for Specific Purposes, 11, pp. 189–205.
[7] Mauranen, P. (1993). Cultural Differences in Academic Rhetoric. Peter Lang Verlag, Frankfurt am Main.
[8] Micarelli, A., Sciarrone, F., Ambrosini, L. and Cirillo, V. (1998). "A Case-Based Approach to User Modeling". In: B. Smyth and P. Cunningham (eds.), Advances in Case-Based Reasoning, Lecture Notes in Artificial Intelligence, 1488, Springer-Verlag, pp. 310–321.
[9] Papagni, M., Cirillo, V. and Micarelli, A. (1997). "Ocram-CBR: A Shell for Case-Based Educational Systems". In: D.B. Leake and E. Plaza (eds.), Case-Based Reasoning - Research and Development, Lecture Notes in Artificial Intelligence, 1266, Springer-Verlag, pp. 104–113.
[10] Papagni, M., Cirillo, V., Micarelli, A. and Boylan, P. (1997). "Teaching through Case-Based Reasoning: An ITS Engine Applied to Business Communication". In Proceedings of the 8th World Conference on Artificial Intelligence in Education, AI-ED 97, Kobe, Japan, pp. 111–118.
[11] Rich, E. (1983). "Users are individuals: individualizing user models". International Journal of Man-Machine Studies, 18, pp. 199–214.
[12] Schank, R. (1998). "Horses for Courses". Communications of the ACM, 41(7), pp. 23–25.
[13] Stevick, E.W. (1976). Memory, Meaning and Method. Rowley, Mass.: Newbury House Publishers.
[14] Toulmin, S.E. (1958). The Uses of Argument. Cambridge: Cambridge University Press.
[15] Wainright, G. (1993). Tricky Business Letters. Pitman Publishing, London.
APPENDIX: Metacognitive System Applied to a Random Sample of Business Letters

5 EXAMPLES OF LABELS FROM EACH CATEGORY, given in the form of Value+[attribute]

CONTENT: dealership / 1, 4, [new] / 4, payment / 2, 8, [overdue] / 2, [impossibility to pay [unexpected emergency]] / 8,

MOVE: specify problem / 12, 15, [insufficient overdraft] / 12, [errors in press report] / 15, solicit compliance / 6, 8, 13, 14, 12, 15, [retraction] / 15,

LETTER TYPE: request / 8, 11, 12, 13, [extension for payment] / 8, [explanation] / 11, [increase in overdraft facility] / 12, [rent review] / 13,

STRATEGY: warn / 6, 7, 13, [discreetly] / 6, [order placement with another supplier] / 7, [move elsewhere] / 13, support the request / 8, 12,

KEYWORDS: (level(s) of) service / 3, account(s) / 12, 11, aircraft / 5, amount(s) / 11, apolog(y)(ies) / 3.
SAMPLE META-META-CATEGORIES
Example of Meta-Language: Argumentative Discourse Flow (based on Toulmin 1958:104)
[D] You have not taken up your option yet, [W] although not taking up an option means relinquishing it, [B] as is stated in our signed Agreement; thus [Q] it is reasonable to infer that [C] you accept that we are free to act now, and therefore [R] while you may be just buying time, [C'] we shall consider the Agreement null.

Schematically (Toulmin's layout):

    D ---- Since W ----> So, Q, C
              |              |
       On account of B    Unless R

(D = Datum, Q = Qualifier, C = Claim, C' = corollary to claim, W = Warrant, B = Backing, R = Rebuttal)
Transformation into Meta-Meta-Language: same Argumentative Discourse Flow
[D] No reply (option)   [C'] Invalidate (agreement)   [W] "As per" (agreement)

    D ---- Since W ----> So, C

(Qualifier, Claim, Backing, Rebuttal reconstructed by the system from previous similar correspondence)
Other examples of Meta-Meta-Language:
• Apologise (failure to comply)
• Buy time (decision)
• Get clarification (shipping)
• Complain (quality)
SAMPLE "TAGGED" LETTER (letter taken from [15]) "A special promotion" 6 August 1994 Dear Mr Baker [Title of special product or service]
SUBJECT
If you could increase your efficiency and productivity and at the same time lower your costs, would you be interested? It is because we think you would that we want you to know about [product or service]. * * strategy: grab attention At the present time, all businesses are looking for methods of reducing costs without harming profitability. [Product or service] helps you to do this. STATE ADVANTAGE (CUT COSTS) [Briefly describe how product or service will benefit the customer] DETAIL OFFER (BENEFITS) In addition to all this, [product or service] possesses a unique additional benefit for your business. It is available for a limited period at a special discount price of [state price], ENHANCE OFFER (DISCOUNT)
If this were not enough, we further guarantee that if, after trying [product or service] for seven days you are in any way dissatisfied with its performance, you can return it undamaged in its original packaging for a full, no-questions-asked refund. ENHANCE OFFER (RISK-FREE TRAIL)
Please return the enclosed post paid card today and you will receive [product or service] by return. SOLICIT (COMPLIANCE)
Yours sincerely
New Directions
Integrating a Believable Layer into Traditional ITS

Sassine Abou-Jaoude and Claude Frasson
Computer Science Department, Universite de Montreal, C.P. 6128, Succ. Centre-Ville, Montreal, Quebec, Canada. Email: {jaoude, frasson}@iro.umontreal.ca
Abstract. Adding believability and humanism to software systems is a subject that many researchers in Artificial Intelligence are working on. Given our interest in Intelligent Tutoring Systems (ITS), we propose adding a believable layer to traditional ITS. This layer would act as a user interface, mediating between the human user and the system. In order to achieve acceptable believability levels, we base our work on an emotional agent platform introduced in previous work. Our ultimate aim is to study the effect, from a user's perspective, of adding humanism to tutoring software.
1. Introduction

Our research and development revolves around Intelligent Tutoring Systems and how to make these systems more efficient. Work aimed at increasing the efficiency of ITS is mainly conducted along the following lines:
Student Model. Building a more representative student model [1] that projects a thorough image of the user, thus permitting the system's adaptability to better reflect the user's actual status.
Curriculum. Constructing a more efficient curriculum module [2] that works as a skeleton for teaching subjects.
Tutorial strategies. Working on different cooperative tutorial strategies [3], through the improvement of existing ones, the introduction of new ones such as the troublemaker [4], and the possibility of switching strategies in real-time interactions [5].
It was a natural next step for us to explore the integration of humanism and believability in ITS, since we strongly believe it will increase the system's performance. To this end, we propose the addition of a believable layer to traditional systems in order to create what we call a believable ITS (see fig. 1).
Figure 1: Introduction of the believable layer
This layer will play the mediating role between the user, an entity very much influenced by humanism in its normal behavior, and the system, which is normally not capable of interpreting humanism on its own. Ultimately, we would say that our believable agent platform is complete if a user being tested by a system, of which he has no direct sight, could not tell whether the latter is a human or a machine (i.e., passing a form of the Turing test). Adding believability means introducing human aspects into the software agent, such as personality, emotions, and randomness in behavior. The nature of randomness itself makes it a relatively achievable goal: one way to achieve it is through well-known software procedures that produce random output mapped onto the agent's behavioral sample space. Moreover, since no rules or standards govern randomness, any proposal suggested at different levels may be considered an additional source of randomness in the system. For character and emotions, the integration is not as straightforward. They have been the subject of much research, and the line of division between these two entities is not clear either; therefore, most researchers have approached emotional aspects and personality traits as one entity. Among these research groups we would like to mention the contributions of the following: C. Elliot and colleagues, for their work on the Affective Reasoner [6], a platform for emotions in multimedia; J. Bates and colleagues at Carnegie Mellon University, for their work on agents with personalities and emotions [7] and the role of the latter in assuring believability; Barbara Hayes-Roth and her team at Stanford, for their work on the virtual theater [8] and agent improvisation; and Pattie Maes and her team at MIT [9], for their work on virtual worlds and the emotional experience in simulated worlds. As for us, we believe, for reasons presented in the next section (section 2), that emotions may be the major entity in creating believability, and that integrating a believable layer in ITS can largely be achieved by adding emotional platforms to the main agents of the ITS. Section 3 introduces our emotional agent platform, the emotional status E, and the computational model that allows the agent to react emotionally in real time to the different events that may take place in its micro-world (in our case, the ITS). Section 4 is a case study in which the theory proposed in section 3 is applied to a student model in a competitive learning environment (CLE). Finally, in section 5, we conclude by presenting our future work and the main questions we plan to treat in our future experiments.
2. Believable layer vs. Emotions

We use the term "emotions" in its widest sense. An emotion is not simply a feeling that an agent X might have towards an agent Y (i.e., hate, like, etc.). It is, rather, a state in which an entity (human or agent, in our case) is immersed, and which influences its behavior. Instead of limiting our choice to the classical definition of emotions, we widened it to include many types, based mainly on previous work by Ortony and Elliot [10, 11], such as well-being (i.e., joy, distress, etc.), prospect-based (i.e., hope, fear, etc.), and attraction (i.e., liking, disliking, etc.). In previous work [12] we also introduced the notion of stable emotions as a means of defining personality traits, and hence personality. Although the interpretation of emotions differs among cultures, backgrounds, and even individuals, what is certain is that humans are the most reliable source we know for this interpretation. This human ability has always given human teachers an advantage over their machine counterparts [12]. In ITS, emotions enhance believability because they permit the system to simulate different aspects that the user had previously encountered only in a human teacher. Among these aspects we mention the following: 1) the engagement of the user, which is achieved when the system wears different personalities that interest the user to discover; 2) the fostering of enthusiasm for the domain, achieved by the capacity of the system to show its own interest in the subject through variations of its emotions; and 3) the capacity to show positive emotions in response to the user's positive performance. We believe that the wide definition and flexible classification we gave to emotions allow us to treat "believable" as analogous to "emotional"; to us, creating a believable layer is in fact building an emotional agent platform. Yet we are fully aware that the ultimate measure of the soundness of this choice will be provided by the final results of the system. In ITS in particular, adding a believable layer (as fig. 1 shows) narrows down to adding emotional layers to the agents that exist in the simulated micro-world of the ITS (i.e., a tutor, a troublemaker, a companion, a student model, etc.). The existence of these actors depends mainly on the tutorial strategy of the system.
Figure 2: Student model with layers added chronologically
The student model is one of the entities that exists in many strategies; therefore, for illustration purposes, figure 2 shows a diagram of our student model and its chronological evolution. As figure 2 shows, the final layer is the emotional layer. This layer is the means by which believability is produced in the system as a whole.

3. The emotional agent platform

Lately, our research has concentrated on creating a computational system for emotions: a system used to create, calculate, and constantly update the emotional status of the agent while it interacts with its environment. This section (section 3) presents the general computational model for an emotional platform, and the next section (section 4) presents a case study in which the theory is implemented in a particular scenario where an emotional agent reacts within a competitive learning environment.

3.1 The emotional couple

The basic entity in our system is the emotional couple $e_i$. An emotional couple is a duo of two emotions that belong to the same group [11] but contradict each other. If $E_{i1}$ and $E_{i2}$ are two emotions satisfying this condition, then we can write $e_i = [E_{i1}/E_{i2}]$, the emotional couple made of these two emotions. As an example, the emotions Joy and Distress belong to the same group (appraisal of a situation) and have contradictory, mutually exclusive interpretations (while Joy is being pleased about an event, Distress is being displeased). Joy and Distress thus make the emotional couple $e_1 = [Joy/Distress]$. The value of an emotional couple is a real number that varies between $-1$ and $+1$ inclusive:

$e_i = [E_{i1}/E_{i2}] \in [-1, +1]$

When this value equals $+1$, the left emotion of the couple is being experienced to a maximum; when it equals $-1$, the right emotion of the couple is being experienced to a maximum. A zero means that, concerning this group of emotions, the agent is indifferent. Most of the time, the value of the couple floats between these limit values.
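As a concrete data-structure sketch (the paper gives only the mathematics; class and method names here are ours), an emotional couple maps naturally onto a named pair whose value is clamped to $[-1, +1]$:

    // Minimal sketch of an emotional couple; names are illustrative,
    // not taken from the paper.
    public class EmotionalCouple {
        private final String left;   // e.g. "Joy"      (value -> +1)
        private final String right;  // e.g. "Distress" (value -> -1)
        private double value;        // in [-1, +1]; 0 = indifferent

        public EmotionalCouple(String left, String right, double value) {
            this.left = left;
            this.right = right;
            this.value = clamp(value);
        }

        public double value() { return value; }
        public void set(double v) { value = clamp(v); }

        private static double clamp(double v) {
            return Math.max(-1.0, Math.min(1.0, v));
        }

        @Override public String toString() {
            return "[" + left + "/" + right + "] = " + value;
        }
    }

For instance, new EmotionalCouple("Joy", "Distress", 0.4) would represent an agent moderately experiencing Joy.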
3.2 The emotional status

The emotional status E is the set of all the emotional couples of an agent:

$E = \{e_1, e_2, \ldots, e_i, \ldots, e_n\}$, where $i, n \in \mathbb{N}$

The emotional status is a function of time t; however, this variation with t is not continuous but discrete, since it awaits an event.
The new emotional status E' is computed every time an event takes place. It is a function of the previous status E and of the set of particularities P(s) that the context possesses (see section 4). These particularities are external factors that influence the emotional status but belong to external entities other than the emotional layer. For example, the performance of the user in an ITS is an external factor that influences the emotional status without being part of it. We define P(s) as the set of these factors $p_i$:

$P(s) = \{p_1, p_2, \ldots, p_i, \ldots, p_m\}$, where $i, m \in \mathbb{N}$

Finally, we can write:

$E' = f\{E, P(s)\}$

3.3 The computational matrix

In order to compute every element $e_i'$ of E', we create a computational matrix M. M has elements $a_{ij}$ that determine the weight with which each constituent affects the emotional couple $e_i'$. The choice of the values of $a_{ij}$ is a very critical issue, since it is basically the core of the whole emotional paradigm. Section 3.4 presents the experimental way in which the weights in the matrix M are determined. For the moment, we know that M has the following dimensions:

$|M| = |E| \times |E + P(s)| = n \times (n + m)$

M is applied to the concatenation of E and P(s), so that E' is obtained as the set of the $e_i'$ given by:

$e_i' = \sum_{j=1}^{n} a_{ij} \, e_j + \sum_{k=1}^{m} a_{i,n+k} \, p_k$
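To make the update rule concrete, here is a minimal sketch of the matrix computation of E' from E and P(s) (hypothetical names; the paper specifies only the mathematics, and the final clamping to $[-1, +1]$ is our addition, implied by the definition of an emotional couple):

    // Sketch of E' = f{E, P(s)} as a linear map through the matrix M.
    // n emotional couples, m particularities; M is n x (n + m).
    public final class EmotionalUpdate {
        public static double[] update(double[][] M, double[] E, double[] P) {
            int n = E.length, m = P.length;
            double[] Eprime = new double[n];
            for (int i = 0; i < n; i++) {
                double sum = 0.0;
                for (int j = 0; j < n; j++) sum += M[i][j] * E[j];      // emotional part
                for (int k = 0; k < m; k++) sum += M[i][n + k] * P[k];  // particularities
                // keep each couple inside [-1, +1]
                Eprime[i] = Math.max(-1.0, Math.min(1.0, sum));
            }
            return Eprime;
        }
    }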
Figure 3: Block diagram of the backtrack test that allows the determination of the weights $a_{ij}$ in M
3.4 The weight factor and the procedure

Having determined P(s), we proceed to determine the weights $a_{ij}$ experimentally. The experiment is a backtrack procedure (see fig. 3 for details). Normally, E and M are known and we proceed to calculate E'; at this stage, however, M is the target, and human users help determine it. In detail, users are given explanations about the system's entities ($e_i$, $p_i$, and their permissible values) and are then tested to see whether they have assimilated their meanings. In this test, as figure 3 suggests, we tolerate a certain level of misunderstanding in order to simulate the randomness and indecision of humans. Starting with E, the user is asked to provide the new E' following a certain event, according to the user's best judgement. Note that the idea of humans making choices is exactly what we want in our system, since we aim at creating systems that imitate humans. Repeating this procedure (see fig. 3) allows us to create enough equations to solve for the $a_{ij}$ in a system of $n \times (n + m)$ equations with $n \times (n + m)$ variables.
4. Case Study: An emotional student model in a competitive learning environment (CLE)

To test all this, we proceeded to add an emotional layer to the traditional student model (see figure 2) in a simulated learning environment. There are three main actors in the system: the emotional agent in question, the troublemaker (a special actor who sometimes misleads the student for pedagogical purposes [5]) in the role of a classmate, and the tutor. In the scenario, the tutor asks a question of value V to both students. The troublemaker provides an answer $R_{pt}$ to the tutor, of which the emotional agent has no knowledge. The troublemaker then proposes an answer R to the emotional agent. At this point, the emotional agent has to provide his own answer $R_{et}$ to the tutor. Once this exchange is finished, the emotional agent gains access to $R_{pt}$, and his emotional status is then recalculated. Figure 4 shows the environment of the emotional agent, with the different variables that enter into the calculation of E.
Figure 4: Simulation of an emotional layer added to the user model in a CLE
The particularities $p_i$ of the system that affect the calculation of the emotional status should be determined experimentally, based on human judgements, the way M was. In our system we identified three main factors that might affect E: the performance of the emotional agent, $P_e$; the performance of the troublemaker as perceived by the user, $(P_p)_e$; and the level of disappointment, also known as the deception degree, $d_d$.

4.1 Value of question vs. Value of answer

To simulate the fact that different goals of the agent might have different priorities, we propose to add a weight V to each question asked by the tutor. The answers have a value of $-1$ when wrong and $+1$ when correct; therefore:

$V \in \{1, 2, 3\}$ and $R \in \{-1, +1\}$, where $+1$ = correct answer and $-1$ = wrong answer
322
5. Abou-Jaoude and C. Frasson / Believable Layer into Traditional ITS
4.2 The performance of the agent and the perceived performance

$P_e$ is the performance of the believable agent himself. This performance is computed in real time, taking into consideration its previous value, the value of the question, and the value of the answer:

$P_e' = (1 - V/8) \times P_e + (V/8) \times R_{et}$

The factor 8 is an experimental value, determined from a human user sample and rounded, in the formula of $P_e'$. $(P_p)_e$ is the perceived performance of the troublemaker as seen by the believable agent. It is natural that the emotions of the agent play a significant role in the calculation of this entity. We propose the following:

$(P_p)_e' = f\{(P_p)_e, P_p, E(t)\}$

This function, too, was approximated experimentally (see table 2).
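As a one-line sketch (method and class names are ours; only the formula above comes from the paper), the performance update is:

    // P'_e = (1 - V/8) * P_e + (V/8) * R_et, with V in {1,2,3}
    // and R_et in {-1,+1}; 8 is the experimental factor.
    final class Performance {
        static double update(double pe, int v, int ret) {
            double w = v / 8.0;              // question weight
            return (1.0 - w) * pe + w * ret;
        }
    }

For example, with V = 3 and a correct answer ($R_{et} = +1$), a previous $P_e$ of 0.2 becomes 0.5.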
4.3 The deception degree

We define the deception degree $d_d$ as the degree of disappointment that the agent experiences in his interaction with the troublemaker. The value of $d_d$ also varies between $-1$ and $+1$. After explaining the factor and its extreme values (i.e., $-1$ and $+1$) to the users, we provided them with different combinations of $R_{pt}$, $R$, $R_{et}$, and $C$, and asked them to provide values for $d_d$ based on their own judgement. $C$ is a factor that tells whether the agent chose the troublemaker's answer ($C = 1$) or not ($C = 0$).

Table 1: Experimental values of the deception factor
4.4 The emotional status

In our model we have defined 13 emotional couples, shown in table 2. These emotional couples are based on the work of Ortony and Elliot [10, 11]. The emotional status E is therefore the set of values of those 13 couples:

$E = \{e_1, e_2, \ldots, e_{13}\}$

Table 2: Emotional couples used in our case study
4.5 Solving for M

Knowing E and P(s), we proceeded experimentally to solve for M. In this case we had $13 \times (13 + 3) = 208$ variables, requiring a minimum of 208 equations of the form:

$e_i' = \sum_{j=1}^{13} a_{ij} \times e_j + a_{i,14} \times P_e + a_{i,15} \times (P_p)_e + a_{i,16} \times d_d$
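As an illustration of the final solving step (our sketch; the paper does not say which numerical method was used), the 208 collected equations can be treated as a dense linear system A·x = b over the unknown weights and solved, for instance, by Gaussian elimination:

    // Naive Gaussian elimination with partial pivoting; adequate for a
    // 208 x 208 system built from the user experiments.
    // A and b are modified in place; the system is assumed well-posed.
    static double[] solve(double[][] A, double[] b) {
        int n = b.length;
        for (int col = 0; col < n; col++) {
            // pivot: bring the largest remaining entry to the diagonal
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(A[r][col]) > Math.abs(A[pivot][col])) pivot = r;
            double[] tmp = A[col]; A[col] = A[pivot]; A[pivot] = tmp;
            double tb = b[col]; b[col] = b[pivot]; b[pivot] = tb;
            // eliminate below the pivot
            for (int r = col + 1; r < n; r++) {
                double f = A[r][col] / A[col][col];
                for (int c = col; c < n; c++) A[r][c] -= f * A[col][c];
                b[r] -= f * b[col];
            }
        }
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) {       // back-substitution
            double s = b[r];
            for (int c = r + 1; c < n; c++) s -= A[r][c] * x[c];
            x[r] = s / A[r][r];
        }
        return x;
    }

Since human answers are noisy, collecting more than 208 equations and solving in the least-squares sense would be a natural refinement.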
4.6 Results

The results for the matrix M and its final shape were still under development when we first produced this article; they are now ready and will be presented at the conference and in future work.
5. Conclusion (The believable tutor)

Another major experiment we are working on is the creation of a mini-ITS based on the classical tutor-tutee strategy. Two models of the tutor will be explored: one in which the tutor is an agent with no emotional or believability aspects, and another in which a believable layer is added to the tutor. Our aim is to compare the performance of the ITS under both models. We have already started working on this application, and preliminary results lead us to believe that believable tutors influence users' performance depending on the latter's knowledge levels.
Figure 5: Traditional vs. believable ITS performance, and the pre-estimated crossing point between the two
In detail (see figure 5), a believable tutor would be very successful with users who have a low to average knowledge level, but less appealing to experts.
Still, many questions will be answered only when the system is implemented. We are interested in questions such as: 1) Early applications tend to suggest that the believable layer in an ITS reduces the performance of an expert, who is more interested in direct application than in emotional systems. Does a crossing point between the two ITS (believable and regular) exist? 2) If it does, where does it occur? In other words, at which level of knowledge is it better to switch to the regular ITS? 3) The knowledge level affects the crossing point of the two systems. Do the learning preferences or learning profiles of the user affect it too? If so, how? 4) Does the choice of curriculum affect the crossing point?
Acknowledgments
We thank the TLNCE (TeleLearning Network of Centres of Excellence in Canada), which has supported this project.
References
[1] Lefebvre, B., Nkambou, R., Gauthier, G. and Lajoie, S. "The Student Model in the SAFARI Environment for the Development of Intelligent Tutoring Systems", Premier Congres d'Electromecanique et Ingenierie des Systemes, Mexique, 1996.
[2] Rouane, K. and Nkambou, R. "La nouvelle structure du curriculum" [The new curriculum structure], Rapport semestriel-008, Research Group SAFARI, DIRO, University of Montreal, pp. 25–35, 1997.
[3] Aimeur, E., Alexe, C., and Frasson, C. "Tutoring Strategies in SAFARI Project", Departmental Publication #975, Department of Computer Science, University of Montreal, 1995.
[4] Frasson, C. and Aimeur, E. "A Comparison of Three Learning Strategies in Intelligent Tutoring Systems", Journal of Educational Computing Research, vol. 14, 1996.
[5] Abou-Jaoude, S. and Frasson, C. "An Agent for Selecting a Learning Strategy", Nouvelles Technologies de l'Information et de la Communication dans les Formations d'Ingenieurs et dans l'Industrie, NTICF'98, Rouen, France, 1998.
[6] Elliot, C. "I Picked Up Catapia and Other Stories: A Multimedia Approach to Expressivity for 'Emotionally Intelligent' Agents". In Proceedings of the First International Conference on Autonomous Agents, Marina del Rey, 1997.
[7] Bates, J. "The Role of Emotions in Believable Agents". Technical Report CMU-CS-94-136, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 1994.
[8] Rousseau, D. and Hayes-Roth, B. "Interacting with Personality-Rich Characters". Stanford Knowledge Systems Laboratory Report KSL-97-06, 1997.
[9] Maes, P. "Artificial Life Meets Entertainment: Lifelike Autonomous Agents". Communications of the ACM, 38(11), 108–114, 1995.
[10] Ortony, A., Clore, G. and Collins, A. The Cognitive Structure of Emotions. Cambridge University Press, 1988.
[11] Elliot, C. "Affective Reasoner Personality Models for Automated Tutoring Systems". In Proceedings of the 8th World Conference on Artificial Intelligence in Education, AI-ED 97, Kobe, Japan, 1997.
[12] Abou-Jaoude, S. and Frasson, C. "Emotion Computing in Competitive Learning Environments". Proceedings of the Workshop on Pedagogical Agents, Fourth International Conference on Intelligent Tutoring Systems, ITS'98, Texas, 1998.
Helping the Peer Helper

Vive S. Kumar, Gordon I. McCalla, Jim E. Greer
ARIES Laboratory, Department of Computer Science, University of Saskatchewan, Canada

Help systems address the task of providing assistance in learning, tutoring, training, and workplace performance support. It is a multifaceted and complex undertaking to customise the contents of help and to present it in an individualised fashion. This paper describes the framework of a customised and individualised help system that is being built on the notion of human-in-the-loop. The human-in-the-loop technique introduces peer helpers into the help system and equips them with minimalist-AI, collaborative, and multi-agent help tools. The framework focuses on helping the peer helper rather than directly helping the helpee. The paper elaborates on the work in progress on a help system based on this framework and highlights the research impact of such a system.
1 Introduction
Computer-based tools are becoming prevalent in workplaces. They often demand different degrees of adaptation from users with different amounts of computer expertise, resulting in divergent types of impasses. This requires software tools to place increasing emphasis on built-in help facilities to handle such impasses. Most software tools have generic help facilities, including metaphoric help (user-friendly interfaces) and online help (WWW manuals). A few of them now even have context-specific help, as in Microsoft's IntelliSense™ technology [9]. There have been attempts to further personalise context-specific help using AI techniques [1, 12, 14]. This paper describes a framework for such a context-specific and personalised help system, where the focus of help is on the helper rather than on the helpee (the person being helped).

The effectiveness of a help system is limited by the shortcomings of the context information, the inability to match a help request to an appropriate help response, and the inadequacy in meeting time limitations. Most help systems compromise on these three issues, which adversely affects the quality of help being provided. To tackle this problem, we introduce the notion of bringing human helpers into the system and supporting them, aptly named human-in-the-loop [11]. The human-in-the-loop approach presents an optimal trade-off between human help and machine help: instead of providing machine help, the machine assists with the selection of the human helpers and supports human help.

Simply put, help systems offer help. Breuker (1990) defines help systems as systems that aid users in performing a specific task. Help can be offered in tutoring, training, and other workplace settings. Help systems are similar to tutoring [2], training [7], and workplace performance-support [10] systems in the sense that all of them are help-oriented. Different help systems store different aspects of user information for different purposes. Sherlock [7] models store the performance of users in terms of fuzzy domain variables. EUROHELP [1, 15] models represent the user's learning and forgetting of tasks. Information obtained from user models and other system resources often assists in making useful knowledge-based decisions using a variety of inference techniques. However, inferencing is a time-costly operation: it requires an enormous quantity of computation to make even simple inferences. This is one of the reasons that motivated the introduction of collaborating human helpers into the system, and eventually the human-in-the-loop approach. In PHelpS [3, 6, 8] and I-Help [4, 5, 13], a peer helper is introduced as a collaborating partner to help the helpee.

Users of a collaborative help system can be geographically separated and connected through the Internet. This introduces the need for simple, powerful, and maintainable distributed-processing techniques. The WWW is an appropriate platform on which to deploy distributed help systems, as adopted in AutoHelp [12], PHelpS, and I-Help. In addition, WWW-based agent-oriented technologies, as implemented in OFFICE [10] and I-Help, could also provide value-added distributed
help services. While the degree of agenthood varies from system to system, it is a fair contention that agenthood is one of the fundamental components of help systems. The availability and widespread use of distributed-processing resources such as Java, Distributed Component Object Model™ (DCOM), Common Object Request Broker Architecture™ (CORBA), Remote Procedure Calls (RPC), Message-Oriented Middleware (MOM), database stored procedures, and peer-to-peer services further promotes the personalisation and customisation of distributed help systems [16]. The techniques outlined so far are essential for the success of a help system. The ideology proposed in this paper employs these techniques and merges them in a coherent framework based on pedagogical plans.
2 Helping the helper
The framework we have developed for helping the peer helper is based on observations from three different studies. Each of the studies exercised various implementation aspects of a helping-the-helper environment, and the subsequent helper-helpee proceedings were recorded. The analysis of the observed data helped derive and validate this framework. Brief descriptions of these studies and a summary of the findings are given below.

The first study is an informal usability study of PHelpS [6], a just-in-time peer-help system developed at the University of Saskatchewan and field-tested at the Regional Psychiatric Centre of Correctional Services of Canada in one of its training courses. The goal of this study was to provide hands-on training with OMS (a prisoner information system) and PHelpS together, and to test the effectiveness of PHelpS. The experimenters recorded the proceedings of the worker-helper interactions using audio (telephone conversations and debriefing), video (of the entire proceedings), trickle files (of the keystrokes), and observation notes.

The second study was conducted using an intelligent helpdesk called I-Help [13]. I-Help is a WWW-based peer-help tool that enables students in an introductory computer science course to interact with each other. Importantly, I-Help suggests appropriate peer helpers based on user models, paving the way for appropriate peer help. An informal usability study was conducted using 40 students and 5 "expert" helpers. The goal of this study was to verify the performance and interface hypotheses related to I-Help. The resulting interactions between the helpers and the helpees were captured in trickle files.

The third study was conducted more recently using I-Help, but with a different set-up. About 150 students taking an introductory computer science course took part in this experiment. As part of the course, they were required to solve a lab-based programming assignment using Java. Whenever they compiled their solution programs, the source code was automatically captured and appended to a repository, along with a time-stamp. All the students were given access to I-Help throughout the term. While working on their lab assignments, students invoked I-Help many times, seeking assignment-related help from expert helpers or from peers. This enabled us to analyse the source code developed at a given point in time, the type of help requested at that time, and the type of help provided by the helpers using I-Help.

Preliminary analysis of the data collected from all three studies revealed that an ideal peer-help session follows a three-stage help process: help-context, help-plan, and help-delivery. The data also showed that in all the successful help sessions, the peer helper provided either help resources or useful hints to the helpee. Using the comparative analysis of the third experiment, we were able to identify the types of programming difficulty that compelled the students to seek help. Importantly, we were able to gather information on what type of help peer helpers provided and what they should be providing. Some excerpts from a help session illustrate this point.

Helpee: Hello, I can't seem to get the indexOf to work properly.
Helper: How are you trying to use it?
Helpee: I typed in "variable.substring(indexOf("string")-1, number);" but it says it's not found.
Helper: That's because indexOf needs an object attached to it, in the form variable.indexOf("string").
(At this point, the confused Helpee quits the help session.)
Helper: Hello? Is anyone there? Well... bye then...
First of all, we should keep in mind that this peer helper was another student taking the same course. He/she could have been selected by the system for various reasons: for example, the helper might have already submitted the solution for this assignment or a related assignment, or might have been the only one available at that time. Being a fellow student, a peer helper may not be correct in his/her analysis and/or solution of the problem. The following can be observed from the excerpts:
• The helpee was not specific enough about the problem description. The Java code variable.substring(indexOf("string")-1, number) and the comment "but it says it's not found" are quite unrelated.
• The helper tried to provide help without establishing sufficient context. It is quite possible for the help system to encourage the helpee to provide some additional context information.
• The helper's plan of providing direct help may be useful but may not be appropriate. Instead, the peer helper could have provided the WWW link that explains the instance methods of the String class. It is quite possible for the help system to advocate such pedagogically value-added suggestions to the peer helper.
• It may be difficult for the peer helper to locate the correct WWW link amidst his/her current work commitments. It is quite possible for the system to automatically list a set of relevant resources from which the peer helper can choose appropriate ones and send them to the helpee with the click of a mouse button.
These are some of the major observations made in the third study, and they form the foundations of the "helping the peer helper" framework. The framework follows a three-stage sequential process similar to the one identified in these studies: it provides the peer helper with a suite of appropriate help resources, guides the peer helper through useful pedagogical plans, and delivers help with minimal effort from the peer helper.

2.1 Help Context
A help-context is simply an expanded form of the help request. The helpee can decide the extent to which he/she wishes to provide help-context information; conceivably, the more help-context given, the better the quality of help received from the help system. A help-context typically contains the topic of help, the helpee's query on the topic, and other information related to the topic. It can be represented as a set of slot-value pairs. The helpee, the helper, or the system itself can fill in the values of the help-context slots, individually or in consultation with each other. Thus, the help-context acts as the object of reference for help between the helpee, the helper, and the system. Table 1 lists the slots of a generic help-context in our framework.

• Topic
• Help-request question
• Time of help delivery
• Duration of expected help
• Keywords from the helpee
• Concept
• Plan-net
• Query overlay on topic
• Query overlay on concept
• Helpee-preferred type, mode, and form of help response
• Helper-preferred type, mode, and form of help response
• System-preferred type, mode, and form of help response
• Type of help query
• Pointers to the helpee and helper user models

Table 1: Help-context slots
A topic can be chosen from a topic hierarchy on which a helpee can seek help. A help-request question is a structured question formed by the helpee that can include other material related to the question; for example, the helpee can attach source code on which he/she wants help. The helpee can also specify when and for how long help is needed. In addition, the helpee can enter keywords, or select from a list of keywords from the database corresponding to the selected topic. A concept can be a node from a network of concepts associated with the selected topic. Depending on the type of topic, a plan-net represents the procedures that correspond to the topic. For instance, PHelpS handles different types of topics, categorised at the top level as administrative tasks or OMS tasks; based on this categorisation, topics in PHelpS can have different sequences of procedures. A plan-net captures how to perform the task associated with a topic. Depending on the chosen topic, adjacent and related topics from the topic hierarchy and nodes from the concept network can be identified and assigned weights. The weights represent the influence of a topic over other topics and concepts, with respect to the helpee's user model, resulting in topic and concept overlays.

There are many help-response types from which the helpee, the helper, and the help system can choose. Some example help-response types are pointer, short answer, discussion, explanation, analogy, rebuke, delayed response, and clues. Help can be delivered in three pre-defined modes: offline, online, and just-in-time. Offline help involves asynchronous communication between the helpee and the helper using email. In the online mode, a dedicated helper shares the helpee's workspace and steps the helpee through the task. In the just-in-time mode, the help provided is highly context-specific and is delivered at the time help is requested; it involves a short amount of help, provided in real time, by an appropriate helper. Workplace scenarios typically use the just-in-time mode of help, with support from fellow workers. The form of the help response can be either manual or automated. In the manual form, communication tools and interfaces are established between the helpee and the helper so that the helper can deliver help in person. In the automated form, the help system provides the necessary help documents and help procedures to the helper, who verifies the help material and then lets the system deliver it.

The help system can also store a structure called the type of query, which is a summarisation of the help-context information compiled from the slot values. The type of query is a vector of seven elements: the helpee id, the time at which the help request was initiated, the topic node, the concept node, the help-response type, the mode of help, and the form of help. The type of query can be used as an index of help-contexts: the first two elements uniquely define the help-context, and the rest help categorise it. Importantly, all the values of the type of query pertain only to the helpee. In addition, a help-context contains pointers to the helpee and helper user models, through which other useful information can be retrieved. The helpee provides the initial values of the help-context; some of the slots can be filled in by the system using knowledge-based inferences, and the chosen helper can provide values for the rest.
Thus, the first task of a help response is the co-construction of a help context.
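As a sketch (our names, not the paper's), the seven-element type-of-query vector maps naturally onto a small value type, with the first two fields forming the unique key:

    // Hypothetical encoding of the "type of query" vector described above.
    public record TypeOfQuery(
            String helpeeId,        // 1. uniquely identifies, with time...
            long   requestTime,     // 2. ...the help-context it indexes
            String topicNode,       // 3. node in the topic hierarchy
            String conceptNode,     // 4. node in the concept network
            String responseType,    // 5. e.g. "pointer", "explanation"
            String mode,            // 6. "offline" | "online" | "just-in-time"
            String form) {          // 7. "manual" | "automated"

        // Key used to index help-contexts (first two elements).
        public String key() { return helpeeId + "@" + requestTime; }
    }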
2.2 Help-plan

A help-plan provides a guideline for the help response. It can be initiated based on the information available in the help-context and other information derived by the system from other system resources. It can be either selected from a library of plans or generated afresh; here we consider only the selection of a help-plan from a library. The framework described in this paper adheres to a 3-step process to select an appropriate help-plan. First, the help-context maps onto a help-principle. Second, the help-principle maps onto a help-plan-network. Third, the help-plan-network is traversed to select a help-plan.

The first step is to associate the help-context with a help-principle. A canned, domain-dependent set of help-principles can be compiled for this purpose, and a set of domain-dependent "pedagogical rules" can be used to make the association. For instance, consider the case in which the help-context reflects the fact that the helpee is a novice and expects syntax-level help in the domain of introductory Java programming. In this scenario, the system can choose the help-principle "degree of help from minimum to maximum", based on the pedagogical rule that novice helpees benefit more if the help provided is minimal at the beginning and increases
towards maximal help as the help-session progresses. The set of pedagogical rules should guarantee the selection of a help-principle for every possible instance of help-context. Table 2 presents a sample set of help-principles.

• Degree of help from minimum to maximum
• Degree of use of domain-related material from maximum to minimum
• Degree of use of domain-related material from minimum to maximum
• Transition from traditional apprenticeship to cognitive apprenticeship
• Use of analogy from maximum to minimum
• Constant use of analogy
• Modelling (helper performs and helpee observes)
• Decremental scaffolding
• Reflection (helpee compares performances)
• Helping in increasing diversity (provide help from different tools)

Table 2: Help principles

The second step results in the selection of an appropriate help-plan-network. Each help-principle is directly associated with a help-plan-network. A sample help-plan-network for "minimum to maximum degree of help" is given in Figure 1. As depicted in this figure, a help-plan-network is a directed network of nodes. Each node in the network represents a unique help-action. There is a start node and an end node in the network, and there can be many paths from the start node to the end node through a number of help-action nodes. Based on the "min-to-max degree of help" principle, the nodes earlier in a path represent less direct help than the successive nodes. Each arc connecting two help-action nodes is associated with a number of "pedagogical constraints".
Figure 1: Help-plan-network representing the "minimum to maximum degree of help" principle
The traversal of the help-plan-network results in a help-plan. The traversal is governed by the pedagogical constraints, which can be attached between any two nodes. They ensure the availability of the help-tools required for the execution of the help-plan, enforce prerequisite relations between nodes in the network, and capture other preferences of the helpee and the helper. They limit how far a path in a help-plan-network can be traversed, depending on the current help-context. Traversal starts with the start node, and can continue through a path of help-actions by solving the pedagogical constraints attached to each arc. Depending on the availability of the tools and other pedagogical requirements, one of the successor nodes is chosen to be included in the traversal path. Likewise, the network is traversed until either the terminal node is reached or the constraints block the traversal to any other node. Once a node is reached beyond which traversal is prohibited, the path traced by that traversal results in a candidate help-plan. The candidate help-plans are presented to the peer helper, who can select an appropriate one for execution (a sketch of this traversal follows at the end of this subsection).

A help-plan-network is a directed graph. It can be cyclic, thus allowing a possibly infinite execution of a set of help-actions; it is the responsibility of the developer to make sure that the necessary pedagogical constraints are imposed to ensure flow of control in the network. For example, in a cyclic segment of a path, the developer can bound the number of times that a node can be executed. By default, all the predecessor nodes are prerequisites of the successor nodes; these prerequisite constraints can be relaxed. For example, consider the path START-1-2-3-4b-5b-6a-7a-8a-9-STOP. If a "complete solution" is not available and a constraint prevents the transition from 3 to 4b, the node 4b cannot be executed and the help-plan would have to be START-1-2-3-STOP. Suppose the prerequisite constraints at arcs 3-4b, 4b-5b, and 5b-6a are relaxed; the system can then consider START-1-2-3-6a-7a-8a-9-STOP as a valid path. The availability of the help resources required for the execution of a help-action can also be imposed as a pedagogical constraint; for instance, the arc 5b-6a can carry a constraint that ensures the availability of the resource "complete example solution". Pedagogical constraints can also include preferences of the helpees and the peer helpers, derived from the user models.
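The following sketch (hypothetical types; the paper describes the traversal only in prose) enumerates candidate help-plans by walking the network from the start node and ending a path wherever a constraint blocks further progress:

    // Depth-first enumeration of candidate help-plans.
    // A pedagogical constraint is modelled as a predicate over the
    // current help-context; help-action names are assumed unique.
    import java.util.*;
    import java.util.function.Predicate;

    interface HelpContext {}   // slots of Table 1, omitted here

    class Node {
        final String helpAction;
        final Map<Node, Predicate<HelpContext>> successors = new LinkedHashMap<>();
        Node(String helpAction) { this.helpAction = helpAction; }
    }

    class PlanTraversal {
        List<List<String>> candidatePlans(Node start, HelpContext ctx) {
            List<List<String>> plans = new ArrayList<>();
            walk(start, ctx, new ArrayList<>(List.of(start.helpAction)), plans);
            return plans;
        }

        private void walk(Node node, HelpContext ctx, List<String> path,
                          List<List<String>> plans) {
            boolean extended = false;
            for (var arc : node.successors.entrySet()) {
                Node next = arc.getKey();
                // follow the arc only if its pedagogical constraint holds,
                // with a simple guard against re-executing a node (cycles)
                if (arc.getValue().test(ctx) && !path.contains(next.helpAction)) {
                    extended = true;
                    path.add(next.helpAction);
                    walk(next, ctx, path, plans);
                    path.remove(path.size() - 1);
                }
            }
            // a blocked or terminal node ends a candidate plan
            if (!extended) plans.add(new ArrayList<>(path));
        }
    }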
2.3 Help delivery
Help-delivery is the process of executing a help-plan in a controlled manner. A help-plan is executed when its help-actions are executed in sequence. As mentioned earlier, a help-plan is made of help-actions, and each help-action is based on an instance of a help-strategy. The framework identifies over 100 generic help-strategies that can be of use in a help system. Each generic strategy can yield a number of strategy instances; for example, "browse" is a generic help-strategy that yields instances including "browse documents", "browse models", and "browse available tools". Help-strategies form the skeleton of a help-plan. A help-action is a triplet consisting of an instance of a help-strategy, the participants of the help-action, and the help-tools that assist in the implementation of the help-action. Sample help-actions based on these help-strategies are listed in Table 3. "Provide evidence, Helpee, Case-library browser" is an example help-action in which "provide evidence" is the task to be performed by the "helpee" using the "case-library browser".

• Provide evidence, Helpee, Case-library browser
• Use, Helpee, Relative bugger-debugger
• Browse models, Helper, Model browser
• Observe help-session, Helper, Session observer
• Rephrase diagnosis, Helper, Text area
• Explain code, Helper, {Text area, FAQ, Case library of explanations}
• Hint example-part-of-solution, Helper, {Text area, Case library of hints}
• Show example-part-of-solution, Helper, Text area
• Trace example solution, {Helpee, Helper}, Code stepper
• Animate code, Helper, Code animator

Table 3: Sample help-actions
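The triplet structure again suggests a simple value type (illustrative names only, not from the paper):

    // A help-action = (strategy instance, participants, tools).
    import java.util.Set;

    public record HelpAction(
            String strategy,            // e.g. "Explain code"
            Set<String> participants,   // e.g. {"Helper"}
            Set<String> tools) {}       // e.g. {"Text area", "FAQ"}

For instance, new HelpAction("Explain code", Set.of("Helper"), Set.of("Text area", "FAQ", "Case library of explanations")) encodes the sixth row of Table 3.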
3 Conclusion
"Helping the peer helper" framework is a natural extension to any peer-supported help system. It can guide the peer helpers to use a disciplined approach of providing help through the stages of helpcontext, help-plan, and help-delivery. These three stages are identified and adopted with a view to judiciously distribute machine-help and human-help across the help process. An obvious point of contention is the utility of such an elaborate help framework. From the three studies reported in this paper, we conclude that the range of requests originating in a higher education environment like university courses or in workplaces will require help responses ranging from very simple to highly complex. We claim that this framework can handle different types of help requests requiring different degrees of help, in a variety of applications, under different domains. The framework inherently reduces the cognitive load of the helper in terms of search efforts for appropriate, pedagogically sound, and planned help material. In that sense, this approach is most beneficial when the peer helper is neither a trained teacher nor an expert helper. However, the system allows the helper to choose alternative help material or modify the help material suggested by the system. Peer help networks can be deployed in a variety of domains including education, training, and workplace environments. Each of these domains places different types of help service requirements on their respective peer help networks. One such service requirement is the ratio of context-specific and generic help resources used in peer help networks. This ratio can be different for different peer help networks across different domains. For instance, in the domain of corporate training, it is quite possible to deploy a rich set of context-specific help resources and only a few generic help resources. On the other hand, peer help networks in the domain of education may require a considerable number of generic help resources since the courseware being taught can vary from time to time. Another major difference between peer help networks across different domains is the availability of quality peer helpers. In the domains of corporate training and workplace environments one can assume the availability of a considerable number of experienced helpers. On the other hand, in the domain of education, peer help networks may have to be satisfied with other students and possibly a small number of expert helpers such as "lab assistants", "tutorial leaders", "dedicated peer helpers", etc. A prototype of a help system for a peer help network in the domain of higher education is being built. The system is designed based on Java's mobile code foundation and CORBA's distributed object infrastructure. It is a 3-tier JAVA 2 - JDBC - CORBA system that includes a "thick" CORBAcompliant client (tier 1), a CORBA-compliant application server (tier 2), and a JDBC bridge connecting to a DBMS server (tier 3). Presently, we use Java EDL™ (Interface Definition Language) as the CORBA/Java ORB. The server objects in turn talk to one or more DBMSs via JDBC. A binary constraint solver is also being built as part of the inference engine. Helper/helpee services are mostly offered via Java applications with Swing™ interfaces. The prototype is expected to be ready for demonstration by July 1999. 
Conceptually, the help system being built on this framework is similar to CSCL, CSCW, or SMC systems, except that the focus is on helping the peer helper rather than helping the helpee directly. In that sense, the approach adopted in this framework is unique. Moreover, the framework prescribes guidelines for the implementation of a practical help system that fuses traditional pedagogy with modern instructional methods. It is extendable and can be employed in a variety of workplace and higher-education domains, in applications such as help-desks, corporate training, university curricula, and distance learning, wherever peer help is available and untapped. The framework also supports knowledge-based inferences to select appropriate links and resources that the peer helper can put to use. It is a robust technology that can overcome traditionally AI-complete problems through the use of collaborative, distributed-processing, and human-in-the-loop techniques, where the peer helper and the help system complement each other to provide value-added help.
Acknowledgements
We would like to thank the University of Saskatchewan graduate scholarship programme and the Canadian TeleLearning Network of Centres of Excellence for financial support of the project.

References
[1] Breuker J.A. (Ed.). (1990). EUROHELP: Developing Intelligent Help Systems. Copenhagen, EC.
[2] Chan T.W. (1991). Integration-Kid: A Learning Companion System. 12th International Joint Conference on Artificial Intelligence. Sydney, Australia, 1094–1099.
[3] Collins J.A., Greer J.E., Kumar V.S., McCalla G.I., Meagher P., & Tkatch R. (1997). Inspectable User Models for Just-In-Time Workplace Training. The Sixth International Conference on User Modeling. Chia Laguna, Sardinia, Italy, 327–338.
[4] Greer J.E., McCalla G.I., Cooke J., Collins J., Kumar V.S., Bishop A., & Vassileva J.I. (1998). The Intelligent Helpdesk: Supporting Peer-Help in a University Course. The International Conference on Intelligent Tutoring Systems. San Antonio, TX, USA, 494–503.
[5] Greer J.E., McCalla G.I., Cooke J.E., Collins J.A., Kumar V.S., Bishop A.S., & Vassileva J.I. (1998). Integrating Cognitive Tools for Peer Help: the Intelligent IntraNet Peer Help-Desk Project. In Lajoie S. (Ed.), Computers as Cognitive Tools: The Next Generation (to appear). Lawrence Erlbaum.
[6] Greer J.E., McCalla G.I., Kumar V.S., Collins J.A., & Meagher P. (1997). Facilitating Collaborative Learning in Distributed Organizations. Available: http://www.oise.utoronto.ca/CSCL/papers/greer.pdf (Accessed 09 Dec 1998).
[7] Katz S., Lesgold A., Eggan G., & Gordin M. (1992). Approaches to student modeling in the Sherlock tutors. The Third International Workshop on User Modeling. Dagstuhl Castle, Germany.
[8] McCalla G.I., Greer J.E., Kumar V.S., Meagher P., Collins J.A., Tkatch R., & Parkinson B. (1997). A Peer Help System for Workplace Training. In du Boulay B. and Mizoguchi R. (Eds.), 8th World Conference on Artificial Intelligence in Education. IOS Press: Amsterdam. Kobe, Japan, 183–190.
[9] Microsoft™ (1998). Microsoft Office IntelliSense White Paper. Available: http://www.microsoft.com/macoffice/prodinfo/office/intel.htm (Accessed 21 Oct 1998).
[10] Nirenburg S., & Lesser V. (1988). Providing intelligent assistance in distributed office environments. In Bond A.H. & Gasser L. (Eds.), Readings in Distributed Artificial Intelligence (pp. 590–598). Morgan Kaufmann.
[11] Palthepu S. (1998). Scalable Program Recognition for Knowledge-Based Reverse Engineering [PhD Dissertation] (pp. 93–95). Canada: University of Saskatchewan.
[12] Thurman D.A., Tracy J.S., & Mitchell C.M. (1997). Design of an intelligent web-based helpdesk system. Available: http://www.isye.gatech.edu/chmsr/publications/smc97/dat/HelpDesk.smc97.pdf (Accessed 09 Dec 1998).
[13] Vassileva J.I., Deters R., Greer J.E., McCalla G.I., Kumar V.S., & Mudgal C. (1998). A Multi-Agent Architecture for Peer-Help in a University Course. Workshop on Pedagogical Agents, The International Conference on Intelligent Tutoring Systems. San Antonio, TX, USA.
[14] Waters R.C. (1986). The Programmer's Apprentice: A session with KBEmacs. In Rich C. & Waters R.C. (Eds.), Artificial Intelligence and Software Engineering (pp. 351–376). Morgan Kaufmann.
[15] Winkels R. (1998). EUROHELP. Available: http://www.lri.jur.uva.nl/~winkels/eurohelp.html (Accessed 21 Oct 1998).
[16] Orfali R., & Harkey D. (1998). Client/Server Programming with Java and CORBA. John Wiley & Sons, Inc., Toronto.
A Knowledge Extractor Environment for Classroom Teaching

Alexandra I. CRISTEA and Toshio OKAMOTO
AI Lab., The Graduate School of Information Systems, University of Electro-Communications, Chofu, Chofugaoka 1-5-1, Tokyo 182, Japan. Email: {alex, okamoto}@ai.is.uec.ac.jp

This paper presents a knowledge extractor environment for classroom teaching. Knowledge is extracted from neural networks and added to the domain knowledge possessed by the teacher, who can in his/her turn pass it on to the students. As application field we aim at the educational process; in particular, we discuss an economy class for high-school level education. The presented environment will be used to extract current knowledge from the stock exchange market, by training a neural network on stock exchange events, and to convey it to a classroom of students in the form of symbolic knowledge (as opposed to the sub-symbolic knowledge embedded in a neural network) and, ultimately, natural language. We believe that this type of environment opens new possibilities for the educational field, and is one more step towards breaking open the black-box neural net to read its information.
1 Introduction
In this paper we propose a knowledge extractor environment to serve as an assistant in the educational process. This is the broad frame of our purpose, and it includes many modules and subgoals. Getting useful information for the teaching process is very difficult when we deal with unstructured knowledge. Neural networks (NN) can store sub-symbolic knowledge, but until recently it was believed that they could do so only in a "black-box" format. Knowledge extraction (REX) from NNs tries to reduce these disadvantages and build a bridge between sub-symbolic and symbolic knowledge. As the teaching process requires only symbolic knowledge, we believe this to be a chance for teachers to significantly improve their teaching materials and/or style, by combining the symbolic knowledge of the domain theory with the rules extracted from the empirical sub-symbolic knowledge stored in NNs trained on examples [1]. For the current study, we set our goal as developing a Neural Network Sub-Symbolic Knowledge Extraction Environment (NNKEE) for teaching-process assistance. For testing, we built a case study of teaching stock exchange price evolution [3], [4].
2 The Knowledge Extraction Environment
The NNKEE has 3 modules: the NN Engine module, the Rule Extraction module, and the User Interface module (Fig. 1). The NN Engine module contains a NN in which the sub-symbolic knowledge is stored in the form of weights and biases. In our working example, this NN is trained on stock exchange events, having as inputs past prices on the stock exchange market and as outputs the predicted future prices. After training, the stored data of the NN Engine module are handed over to the Rule Extraction module. The Rule Extraction module processes these data in order to obtain symbolic knowledge out of the sub-symbolic representation. This module is based on knowledge extraction mechanisms, which we discuss later on. The extracted knowledge is forwarded to the User Interface module. The User Interface module receives the extracted rule-base knowledge and arranges it systematically for display. This module also obtains data from the NN Engine module, which it can likewise display. The interface is mainly dedicated to the teacher's usage, serving as an assistant during the preparation of the teaching material or during the teaching process itself; but, if the teacher chooses, he/she can allow the students to study directly the outputs displayed on the interface. As can be seen in Fig. 1, the teacher is not replaced by the NNKEE; the NNKEE only provides more educational material and structures unstructured data, in order to help the teacher explain his/her domain theory knowledge. In other words, it acts only as a translator from a strange language, called sub-symbolic language, into a more comprehensible one, so that its results can be integrated in the teaching process.
Figure 1: The Neural Network Knowledge Extraction Environment and its usage
3 Knowledge Extraction
In previous work ([4]) we discussed how rule implementation can easily be done if the size of the NN is no issue, as well as some possible optimizations for smaller net sizes. Here we concentrate mainly on rule extraction from NNs.
3.1 A REX example: the SUBSET Algorithm

This algorithm is based on the assumption that the levels of the input/output signals of neurons in a trained network correspond to either T or F, so that the activation potential is determined by the values of the weights. By finding subsets of the incoming links for which the sum of the weights, plus the bias, makes the internal activation potential high enough to bring the output variable to the T level, it is possible to formulate rules. The steps of the SUBSET algorithm are:
1. For each output and hidden neuron i, find up to g_p subsets P of positive-weight links incoming to neuron i, so that Σ_{j∈P} w_ij + b_i > 0 (where b_i is the bias for neuron i).
2. For each P, find up to g_n subsets N of negative-weight links incoming to neuron i, so that Σ_{j∈P} w_ij + Σ_{j∈N} w_ij + b_i > 0.
3. For each N, state the rule: "if P and N then (statement attached to unit i)".
4. Prune duplicate rules.
The main difficulty of the SUBSET algorithm is its combinatorial complexity: it produces a very large number of subsets and rules, many of them redundant (equivalent to the implementation of NNs with logical functions, section 3.1). The problem can be alleviated, e.g., by setting an upper limit on the number of subsets considered (g_p, g_n in the algorithm). This algorithm is used by many rule-eliciting systems.
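To make these steps concrete, the following minimal Python sketch enumerates the rule antecedents for a single unit. It is our own illustration, not the NNKEE code: the names weights, bias, g_p and g_n are assumptions, inputs are taken to be Boolean (T = 1, F = 0) as in the algorithm's premise, and smaller subsets are tried first so that the caps keep the simplest rules.

from itertools import combinations

def capped_subsets(links, keep, cap):
    # Enumerate subsets of `links`, smallest first, returning at most
    # `cap` of those that satisfy the predicate `keep`.
    found = []
    for k in range(len(links) + 1):
        for s in combinations(links, k):
            if keep(s):
                found.append(s)
                if len(found) >= cap:
                    return found
    return found

def subset_rules(weights, bias, g_p=10, g_n=10):
    # weights: dict mapping an input name to the weight of its link into
    # the unit; bias: the unit's bias b_i. Returns rules as (P, N) pairs
    # of antecedent names, read "if P and N then (unit's statement)".
    pos = [(n, w) for n, w in weights.items() if w > 0]
    neg = [(n, w) for n, w in weights.items() if w < 0]
    total = lambda s: sum(w for _, w in s)

    rules = set()
    # Step 1: up to g_p subsets P of positive links with sum(P) + b_i > 0.
    for P in capped_subsets(pos, lambda s: len(s) > 0 and total(s) + bias > 0, g_p):
        # Step 2: up to g_n subsets N of negative links that still leave
        # the activation high: sum(P) + sum(N) + b_i > 0.
        for N in capped_subsets(neg, lambda s: total(P) + total(s) + bias > 0, g_n):
            # Step 3: record the rule's antecedents.
            rules.add((tuple(sorted(n for n, _ in P)),
                       tuple(sorted(n for n, _ in N))))
    return sorted(rules)   # Step 4: the set has already pruned duplicates

Even on this toy scale, the caps g_p and g_n are what keeps the enumeration from exploding combinatorially, which is exactly the difficulty noted above.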
3.2 REX Classification: Advantages, Disadvantages

There are two main approaches to knowledge extraction: decompositional methods (structural analysis, which assigns to each unit of the NN a propositional variable and establishes the logical links between these variables) and pedagogical methods (input-output mapping, which treats the network as a black box, without analysing its internal structure). Of the algorithms in the literature ([1], [5]-[11]), KBANN, KT, the Connectionist Scientist Game, RULENEG and RULE-OUT are based on the decompositional approach, while BRAINNE and VIA illustrate the pedagogical methods. Many of them are based on a pre-structuring of the NN. This paper is focussed on the decompositional approach, which puts the NN sub-symbolic structure in correspondence with the theoretical symbolic structure. Advantages and disadvantages can be noticed in the two example algorithms given: precision in rule-building can lead to an enormous quantity of rules, with high redundancy and an unclear high-level meaning of the equivalent symbolic representation, whereas reducing the number of rules can cost us precision. In our NNKEE, the teacher can choose between the various rule extraction mechanisms, according to his/her goals and application field. In this way, we generalize the NNKEE for further applications. Another topic of interest for rule extraction is the rule quality criteria. Towell and Shavlik [12] enumerate four: accuracy, fidelity, consistency and comprehensibility; other authors have up to 9,
etc. The introduction and usage of quality criteria depend strictly on the application domain. For teaching, it is sufficient to extract any information that is complementary to the domain knowledge of the teacher. Therefore we aim at transparency, but not necessarily at fidelity. Next to these criteria from the literature, we added the extra criteria of meaningfulness, as the existence of a meaning that can be attributed to the rules, and translatability, as the property of a rule to be translatable into natural language.
4 Knowledge Extraction Tool Integration
As different knowledge extraction tools present different advantages, it is difficult to settle on one single method as the Knowledge Extraction Tool to be integrated into our system. Constraint backpropagation methods seem to give better overall results, but a method like RULEX relies highly on its partner, the CEBP (Constraint Error Backpropagation) NN (see [1]). On the other hand, in order to translate rules stored in a general NN, general algorithms, such as the one described in the previous section, seem to be good. Therefore, we decided to integrate into our NNKEE not one but several knowledge extraction mechanisms. By doing this, we gain two advantages: first, the teacher can choose the rule extraction algorithm that seems most appropriate to him/her; second, the teacher can choose several algorithms sequentially, compare results, and gain more information than from a single rule extraction engine. As a first selection, we settled on the SUBSET algorithm. We did this because, as we started from a given NN engine for stock exchange prediction, built and described in detail in previous work (see [2]), we needed a general algorithm to work with, fit for any type of NN. We also had in mind further extensions of the application field. When the application field changes, the NN at the base of the NN engine changes not only in size, but possibly also in type. To build a flexible environment, we have to use general rule extraction methods. Still, if there is a specific rule extraction algorithm that is optimal for a specific problem, our system, with several algorithms enlisted, has a good chance of being able to provide it. This is another reason why it is very important to have diversity in the rule extraction alternatives. It can provide not only a globally optimized solution, by combining the results of two or more algorithms, but also problem-specific handling, by selecting the appropriate algorithm for each particular NN. By allowing a teacher to choose freely among many mechanisms, the flexibility of the NNKEE environment is therefore enhanced. The rule extraction algorithm selected by the teacher is the one used by the Rule Extraction module (illustrated in Fig. 1) for computation. The inputs of the algorithm are all the data of the final state of the trained NN; therefore, the rule extraction engine can only run after the training of the NN is completely finished. These inputs are the weights of the NN, the biases, and the inputs and outputs of the NN (the training sets). They represent the sub-symbolic data stored in the NN through the training procedure. The rule extraction module uses the selected rule extraction algorithm(s) to transform these sub-symbolic data into symbolic data (rules). The knowledge extraction tool also has to be tuned to the interface tool, in order to present the extracted rules in a way comprehensible to the teacher, or even to the students.
The general order of the system activities is the following:
1) the teacher selects the application field (here, the stock exchange), and the system offers the appropriate NN;
2) the teacher selects the dimensions of the NN and of the input window (what past data and prediction data are interesting for his/her class), and the system builds the NN and starts the training procedure;
3) the system asks the teacher to select a rule extraction algorithm (default: the SUBSET method);
4) the system uses the teacher's answer to start the rule extraction with the selected algorithm (after the NN training has ended);
5) the system displays the results of the rule extraction and, if requested, of the NN training.
5 Implementation and usage of NNKEE
We show in Fig. 2 an example of usage of the NNKEE tool for the stock exchange case study. When explaining stock exchange events, a teacher can refer to domain knowledge, such as stock exchange definitions like "The stock exchange is a market where stocks are sold and bought, and prices are set by the bids and offers of the market members.", and so on, in the standard way of lecturing. But with the help of our system, the teacher can also use historical data of stock exchange events, both displaying them directly on the screen to show them to the students, and analysing them.
Figure 2: System display: rules and graph

Analysing is done in the following steps: first, a NN is trained on the historical data; then, the knowledge is extracted from this NN by the previously described knowledge extraction module. By combining the purely descriptive domain knowledge with the more precise rules extracted by the system, the teacher can provide a better overview of the taught material, allowing the students to gain a deeper understanding of the presented knowledge. At this stage, it is presumed that the teacher possesses the required domain knowledge him/herself. For this knowledge to be present in our system, a database of domain knowledge would be necessary, with a hierarchical handling tool for easy access to the data. We considered this item less important at this stage of the research: firstly, because teachers are usually supposed to master the domain knowledge of their subject rather well, and secondly, because this system is designed primarily to allow teachers access to information which is otherwise not accessible to them. The example in Fig. 2 analyses stock exchange prices for a period of 20 days, as can be seen in the 'neural net training' window of Fig. 2. These settings correspond to the settings for the NN engine and the rule extractor (pointed to by 'rules deduced' in Fig. 2). The network is trained until the error becomes zero (the 'Error dispersion' window inside the 'neural net training' window represents a zero error). The prices are normalized between an upper and a lower price limit; therefore, the output values have a range of [0,1]. The rule extraction, in the main window, is done with the SUBSET algorithm. This algorithm generates first simple rules, such as the displayed rules 2-4, then composite rules, such as rule no. 1. A teacher could deduce from rule number 1, transforming it into natural language, the following statement: "In the studied time period, the stock exchange data tended to be rather cyclical and had a tendency to oscillate around an average value." Rules like 2-4 are too specific and have little educational use. We displayed them in order to show problems which can appear with rule extraction. Rule 3 is interesting in the sense that it enlarges the prediction window from 1-step-ahead prediction to two-step or, generally speaking, n-step-ahead prediction. Other rules and their respective graphs can be viewed in the following figures: 3, 4, 5, 6 and 7. The teacher can define the neural net by choosing sub-items of the 'Neural Net' button, or just apply training and rule extraction on predefined NNs. The teacher can also choose the most appropriate rule extraction mechanism from a 'Rule Extraction' button, and has additional setting possibilities provided by an 'Option' button, such as choosing whether both rules and NN should be displayed (as in Fig. 2) or not, and other graphical display settings. The window interface is realized with MOTIF tools; the engine computations are implemented in C++.
5.1 Validity Testing
The proposed system is difficult to evaluate, for many reasons. First of all, there are no benchmarks in this specific domain, as the proposed system is a pioneer in this line of research. Secondly, the previously defined quality criteria are qualitative rather than quantitative, and also highly subjective, so their measurement is difficult. With these problems in mind, we tried to combine qualitative rating with quantitative statistics, and developed a method to evaluate the system in an easy and rapid way. The method used is the questionnaire method, so often used in educational software design. As the prospective users are educators and teachers from everywhere, this questionnaire was made available on the internet, at the address: http://www.ai.is.uec.ac.jp/u/alex/TESTS/questtonnaire.html Teachers, instructors and people involved in the educational process in many countries were invited to fill in this questionnaire. The contents were as follows. The first page to access is an explanatory page, which states the problem and the desired goals. The content of this page is listed in the following.
Figure 3: XOR graphical chart
Figure 4: XOR hidden rules
Figure 5: XOR rules before pruning
Figure 6: OR AND NOT graphical chart
Figure 7: OR AND NOT rules

Explanation: You are going to be asked to fill in a questionnaire, after looking first at some time-series graphs and then at some rules extracted from these time-series. Both graphs and rules are generated by a program that is intended to be used by a teacher in a high-school level classroom. The application class is Economy, in particular the subject of the stock exchange. The presented program is for analysing stock-exchange time-series data and for extracting rules from it. These rules can be used to explain the occurrences on the stock-exchange market in a certain time period, and to analyse it. The program is intended for the teacher of Economy, and of the stock exchange in particular, for adding new, interesting information to his/her class. Still, it is the authors' belief that this type of program can be used wherever there is time-series analysis to be done by a teacher. The series in the examples are synthetic series, whose relationship is mostly given as val(t) = f(val(t-1), val(t-2), ..., val(t-p)) (e.g., val(t) = val(t-1) XOR val(t-2), etc.). Some noise is included in the series. Please look at both the graphs and the rules for each relationship and try to recognise it. The answer to each question has to be one of: ( ) Yes ( ) A little ( ) Almost not at all ( ) No ( ) Not very clear. 'Yes' refers to complete agreement with the statement questioned; 'No' is complete disagreement. In between there are the states 'A little' (close to 'Yes') and 'Almost not at all' (close to 'No'). 'Not very clear' should be used if the question is not clear, or if it is impossible to decide on an answer to the question. Please comment on your 'Not very clear' decisions at the end of the questionnaire. Now please go on to the examples, one by one. The next page to be accessed was the first example case, consisting of a pair of a graphical display of a time series and the rules that were extracted from it. The following four pages were similar, only consisting of different time series. The last page was the questionnaire, as can be seen in Figure 8.
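A synthetic series of the kind used in these example pages can be reproduced with a short sketch. This is our reconstruction for illustration only; the series length, noise amplitude and random seed are assumptions.

import random

def xor_series(length=50, noise=0.05, seed=0):
    # Binary series with val(t) = val(t-1) XOR val(t-2), plus a little
    # additive noise, as in the questionnaire examples.
    rng = random.Random(seed)
    vals = [rng.randint(0, 1), rng.randint(0, 1)]   # starting values
    for t in range(2, length):
        vals.append(vals[t - 1] ^ vals[t - 2])
    # the noise keeps the extracted rules from being perfectly clean
    return [v + rng.uniform(-noise, noise) for v in vals]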
Figure 8: The questionnaire: first part

Pointers back to the explanation page and the example page were also provided. Experts in education from many countries replied (Japan, Romania, Denmark, Holland, Greece, USA, China, among others). In order to obtain a large participation in these tests, the internet address of the questionnaire was published in many newsgroups debating educational subjects and problems. An example scoring of the replies of one person can be viewed in figure 9. The overall reply was very much (86%) in favour of the system, and of such systems generally speaking (75%). The information gathering system was made to automatically reject subjects who presented a low personal adequacy to judge the system (based on their computer skills and other factors). The replies of such subjects were not recorded by the system. The accepted specialists showed an overall personal adequacy of 87%. Weighted by the personal adequacy of the questioned subjects, the system was evaluated at a percentage of 70% for overall performance, specialist acceptance and usefulness. All these data can be seen in figure 10, where the thicker line represents the movement of the average and the thinner lines the individual ones. There is an ongoing refinement of the system according to the suggestions of specialists. Therefore, the presented results are preliminary, and interested specialists are invited to check our www address presented above.
Figure 9: Scoring example
Figure 10: Preliminary results (11 specialists so far)
6 Conclusions
We developed a knowledge extractor environment for Neural Networks, to serve as an assistant in the educational process. We showed how a rule extraction module can transform sub-symbolic knowledge into symbolic knowledge, in order to provide useful information and assistance during the teaching process, which can deal only with symbolic knowledge. As application, we described a case example for an Economy class for high-school level education, and several time-series. We also presented an evaluation of our system by domain specialists, commented upon the results, and showed how we determined the direction of future research. We believe these knowledge extractor environments to be a chance for teachers to significantly improve their teaching materials and/or style, by combining the symbolic knowledge of the domain theory with the rules extracted from the empirical sub-symbolic knowledge stored in NNs trained on examples, and to add practical knowledge, structured in a rigorous way, to the purely theoretical knowledge.
References
[1] Andrews, R., et al.: "A survey and critique of techniques for extracting rules from trained ANNs", on www.qut.edu.au (1995).
[2] Cristea, A., Cristea, P., Okamoto, T.: "Neural Network Knowledge Extraction", Rev. Roumaine des Sciences Techniques, Serie EE, Romania, vol. 42, no. 4, pp. 477-491 (1997).
[3] Cristea, A., Okamoto, T.: "Energy Function based on Restrictions for Supervised Learning on Feedforward Networks", Journal IPSJ (Information Processing Society of Japan), SIGMPS Trans., vol. 1, no. 1 (to appear 1999).
[4] Cristea, A., Okamoto, T.: "The development of a neural network knowledge extraction environment for teaching process assistance", ED-MEDIA/ED-TELECOM'98, Eds. Thomas Ottmann and Ivan Tomek, AACE, vol. 1, pp. 227-232 (1998).
[5] Fu, L.M.: "Rule generation from neural networks", IEEE Trans. on Systems, Man and Cybernetics, vol. 24, no. 8, pp. 1114-1124 (1994).
[6] Fu, L.M.: "Knowledge-based connectionism for revising domain theories", IEEE Trans. on Systems, Man and Cybernetics, vol. 23, no. 1, pp. 173-182 (1993).
[7] Geczy, P., Usui, S.: "Rule Extraction from Trained ANNs", ICONIP'97, vol. 2, pp. 835-838 (1997).
[8] Giles, L., Lawrence, S., Tsoi, A.C.: "Rule Inference for Financial Prediction using Recurrent Neural Networks", IEEE/IAFE Conference on Computational Intelligence for Financial Engineering, Proceedings, IEEE Press, pp. 255-259 (1997).
[9] Hayashi, Y.: "A neural expert system with automated extraction of fuzzy if-then rules", Advances in Neural Information Processing Systems, vol. 3, M. Kaufmann, pp. 578-584 (1990).
[10] Healy, M.: "Acquiring Rule Sets as a Product of Learning in a Logical Neural Architecture", IEEE Transactions on Neural Networks, vol. 8, no. 3 (1997).
[11] Kanoh, S., et al.: "Rule Extraction in Temporal Sequence Generation from Spatially-Encoded Information by Recurrent Neural Networks", ICONIP'97, vol. 2, Eds. Nikola Kasabov et al., Springer, pp. 873-876 (1997).
[12] Towell, G., Shavlik, J.: "Extracting Refined Rules From Knowledge-Based Neural Networks", Machine Learning, 13(1), pp. 71-101 (1993).
Simulation: Systems and Architecture
An Agent-Operated Simulation-Based Training System - Presentation of the CMOS Project -
Luc Richard
Laboratoire de Gestion et Cognition
50, chemin des Maraichers
31077 Toulouse Cedex 4 - FRANCE
+33 (0) 5 62 25 88 85
[email protected]

Guy Gouarderes
Universite de Pau, IUT Informatique
64100 Bayonne - FRANCE
+33 (0) 5 59 46 32 18
[email protected]

Abstract
This paper presents the results of the development of an interactive learning environment. The system aims at training maintenance operators in aeronautics. We present the two main points of the learning environment: it is based on a real-time simulation of the domain, and it works with a multi-agent system.
Keywords: Interactive Learning Environments, Simulation, Multi-Agent Systems, Graphical Interface
1. Introduction

The state of the system presented in this paper is the result of the second working phase of a large project. The main objective of the whole project is to develop a modern training environment dedicated to maintenance staff operating on recent-technology aircraft. The first phase (1991 - November 1996) consisted of the development of an aircraft maintenance training device. The device, called CMOS (Cockpit Maintenance Operations Simulator), was prototyped to prove both the feasibility and the advantages of a full simulation-based approach. In the second phase (May 1995 - today), we design and implement the multi-agent architecture of the learning environment. This ITS (Intelligent Tutoring System) has both to drive many graphical components in a flexible simulation process and to support learning activities by dynamically supervising timely interactions with the user. The CMOS project is developed in the industrial framework of Airbus Training, and research issues are supported by the LGC laboratory (Management and Cognition Laboratory) and by STAR (Specialized Training in Aeronautics and Research). The CMOS prototype hardware form is a computer with multiple display units. It provides the full functionality of an industrial flight simulator (excluding the flight loop) and all possible interactions within the flight deck. Depending on their needs, maintenance technicians may acquire initial training, skills updating or just-in-time training by practicing diagnostic and repair tasks in a simulated environment close to their real working conditions. The airliner's flight deck is the main working place for maintenance operations; it
sums up the information sources required for aircraft maintenance. By querying onboard computers and by reading flight-related documents, maintenance technicians have to elicit relevant information, detect failures and take a decision to fix a possibly defective system. This paper is composed of four parts. First, we describe the hardware form of the CMOS prototype. The current state of the CMOS project is an advanced learning environment prototype dedicated to aeronautical maintenance operators. The constitutive characteristic of this learning environment - the "full-simulation" principle - is discussed in the second part. Then, we present learner/system interactions. Finally, we propose examples of artificial agents. Machine learning mechanisms provide them with basic cognitive capabilities.

2. Hardware Form

The current CMOS prototype simulates an Airbus A340. This simulation of most of the aircraft systems is coupled with a graphical interface which represents the inside of the flight deck in a very realistic manner. The hardware form of the prototype is composed of two workstations (Silicon Graphics), a PC, and their wide-screen monitors. We call this form a "desktop simulator" in order to distinguish it from a "cabin simulator" (cabin simulators are huge devices inside which sits a full-scale real flight deck; instances of such cabin simulators are FFSs (Full Flight Simulators) or FBSs (Fixed Base Simulators)). The common objective of FBSs and our prototype is to provide a realistic interface between the learner and the simulated aircraft systems. Since maintenance operations are done on parked aircraft, there is no need for a visualization of the outside of the cockpit (landscape, clouds...). Due to the large number of flight deck elements and the diversity of possible interactions, ergonomic surveys are intrinsic issues in cockpit design. Modern flight decks summarize the current state of all components of the aircraft, providing pilots with synthetic views. Each flight deck panel regroups all push-button switches, lights, dials, displays, knobs and levers related to a given aircraft system (Engines, Electricity, Hydraulics, Conditioned Air, APU...). The visualization of full flight deck details requires a wide screen surface. The chosen solution for the prototype interface is a combination of large screen monitors (21") and windowing techniques.
• The first two monitors display simulated aircraft panels.
• The third monitor displays a pedagogical window in an HTML browser: helps, advice and hints to the learner.
A snapshot of a simple panel (Figure 1) shows what the graphical interface looks like. On each of the first two monitors, numerous panels are displayed in interactive graphical windows the learner can move or resize on the screen (there are up to 30 similar panels on a flight deck).
Figure 1: Example of an interactive window: the Electricity panel (235 VU)
3. Main characteristic of the prototype: the "full-simulation" principle

The use of simulators for training is not a new idea; many projects have been based on a simulator or on a simulation of the domain: the well-known Steamer system-coach [6][1] provides graphical display and control of a simulation of a steam plant. More recently, Sherlock II [7] is a learning-by-doing system for electric fault diagnosis (the domain of Sherlock II is avionics for the F15 fighter aircraft). The need for such training tools is high. There is global market competition in industry, skills are rapidly evolving (sometimes in complexity) with new technical environments and specialized knowledge, and employees must acquire new skills and know-how in limited time. Consequently, much research aims at the design of environments for creating tutoring systems in industrial training (e.g. the SAFARI project [4][5]) and in academic domains (e.g. NGE [2]). The distinguishing feature of the CMOS prototype is its full-simulation-based scaffolding. In learning-by-doing systems, training efficiency depends on the level of realism reached by the simulation. Indeed, "full" simulation does not mean that we have attempted to simulate all of the parts of the aircraft. It targets the quality gained by a very fine grain of represented knowledge. We dropped the idea of using the black-box diagram (reverse engineering principle) and preferred to agglomerate previously implemented simulation modules, each devoted to a specific aircraft system (Hydraulics, Engines, Electricity...). These simulation modules embody the expertise of designers and specialist engineers. The quality of the simulation depends tightly on the methodology selected to assemble these "building blocks". A strict overlapping is realized by a multiplexing algorithm (Figure 2) which guarantees an efficient dynamic running. These software simulation modules have been implemented by multi-disciplinary teams of aeronautics experts and computer scientists. The code corresponds to the optimized translation of the physical equations that model the aircraft system behavior. Therefore, the deepest knowledge represented in the code is of a continuous nature (physical phenomenon equations), in contrast to the knowledge built by reverse engineering (the black-box principle constructs a model using case-based analysis and then generalization; it links observed inputs and outputs; knowledge is fragmented and recorded as thresholds, thus it is of a discontinuous nature). Since the beginning of the project, this quest for quality has brought facility of maintenance and evolution in the software life cycle (during incremental prototyping). Consequently, it has facilitated the reuse of the produced knowledge base to architect the pedagogical units and strategies of the ITS.
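The paper does not spell the multiplexing algorithm out, so the following sketch should be read only as one possible shape of such a loop, under our own assumptions about the module interface (names, time-slice API): each previously implemented "building block" is granted a task-specific time slice, in round-robin fashion, over a shared simulation state.

import time

class SimModule:
    # One simulation "building block" (e.g. Hydraulics, Engines...).
    def __init__(self, name, step_fn, slice_ms):
        self.name = name          # the aircraft system simulated
        self.step_fn = step_fn    # advances this module's physical model
        self.slice_ms = slice_ms  # time slice granted per cycle

    def run_slice(self, shared_state):
        # Run the module's model until its time slice is used up.
        deadline = time.monotonic() + self.slice_ms / 1000.0
        while time.monotonic() < deadline:
            self.step_fn(shared_state)

def multiplex(modules, shared_state, cycles):
    # Round-robin multiplexing: each module is allowed a time slice
    # specific to its task, so the assembled building blocks overlap
    # into one dynamically running full simulation.
    for _ in range(cycles):
        for m in modules:
            m.run_slice(shared_state)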
Figure 2: Structure of the environment embedding the ITS (each module is allowed a time-slice specific to the task to perform)
Whatever the type of a classical ITS (system coach, micro-world, just-in-time training system...) and whatever its guiding strategies, the main functioning mode divides the session time into knowledge presentations to the learner, followed by testing of the learner. Moreover, even if the pedagogical planning is dynamic, the "knowledge blocks" taught to the learner are necessarily static (since they are designed beforehand). The important feature in most classical ITSs resides in the "intelligent" algorithm for dynamic pedagogical re-planning: knowing how to chain these blocks in order to answer the pedagogical needs of the learner within the shortest possible prescribed time. In this paper, the point is neither to underestimate the difficulties of dynamic pedagogical planning, nor to minimize results from research in this field. Nevertheless, we would like to put forward the inadequacy of the "building block approach" as a design methodology for learning environments with neither coarse nor fine knowledge granularity. This is the reason why interactive learning environments based on full simulation cannot be structured through the traditional narrow filter (decomposition into four distinct modules: interface, learner profile, domain expert, pedagogical expert). In our learning environment, simulation is not a modeling choice; it is the only knowledge representation mode that can cope with the high complexity of a real-time evolving situation. The very consequence is that cognitive situations are equally dynamic. The two following examples contrast the static or dynamic cognitive situations of, respectively, a traditional learning environment and a learning environment based on full simulation:
• First example: in a tutoring environment on mathematics, a student is learning how to solve quadratic equations. The cognitive situation is static, since the equation is not going to modify its parameters, self-complexify or become impossible to solve as time passes.
• Opposite example: when a learner is immersed in a flight simulation - let's say s/he must pilot an aircraft with one defective engine in the landing phase - such a cognitive situation is highly dynamic.
Consequently, coupling an ITS with a simulation-based training device is a real challenge. It raises various difficult problems; not only technical ones, but also conceptual ones. The machine-centered design of a training device is confronted with the human-centered design of an interactive learning environment. Many fields of cognitive science are involved in such issues, especially ergonomics, which surveys multi-agent (human or artificial) communication [3]. As a first approach, and with their own difficulties, two information sources are exploitable for knowledge acquisition. On the one hand, the technical documentation can be used to build an automated documentary chain that allows a simplified and continuous update of the knowledge base of tasks. AMM (Aircraft Maintenance Manual) tasks and TSM (Trouble Shooting Manual) tasks are revised as soon as adjustments or modifications occur in airliner production. The crucial concept is to elaborate an automated entry-point
according to the dynamic planning of tasks. On the other hand, the expertise of maintenance instructors is the second source of knowledge, which "humanizes" and introduces flexibility into the task follow-up.
4. Learner-System Interactions

Learner interactions with the cockpit panels were presented in part 2 of this paper: most of these interactions are mediated by interactive windows that look and feel as in Figure 1. Let us now present interactions with the "meta-help" window (Figure 3). This window is called the "meta-help" window because it gives the learner neither "what to do next" nor "how to do it"; instead, it dynamically sums up the achievements of the learner concerning the global objectives of the task to complete. First of all, the learning environment is a trainer, a device for operational practice. It aims at training technicians in aeronautics to master maintenance procedures (AMM tasks) together with the practice of tracking and troubleshooting procedures (TSM tasks). AMM and TSM are exhaustive manuals. They regroup the totality of these procedures, classified by aircraft system and specific to a given aircraft. Such complexity can be approached with an adequate cognitive task analysis [8]. A relevant selection of significant procedures is based on their frequency of use or on their specific structure. Consequently, defining the curriculum consists mainly in choosing a precise set of "key-tasks" in order to give the learner a general knowledge of the structure of the complete set of tasks, together with documentation-handling experience. The content of the meta-help window (Figure 3) is a hypertext document. The text of the task displayed is the same as in the AMM or TSM paper form. The content is enhanced with graphical icons, and underlined texts are links to other hypertext documents (the static parts of the documents are implemented in HTML, the dynamic ones in Java). Example of a static entry-point: a TIP (Training Information Point) graphical icon (situated just after "BUS TIE" in fig. 4) means that instructor's recommendations or a hyper-media document is available for the specific check "Make sure that BUS TIE pushbutton switch is pushed in". Example of a dynamic icon: a "V"-shape icon (Figure 3):
V NOTE: You must ground the aircraft...
V(1) On the panel 235VU, make sure...
V(2) On the panel 212VU, make sure...
automatically switches its color from red to green when the corresponding step is correctly carried out. The structure of an AMM task or a TSM task is hierarchical (one possible representation is sketched after the listing):
Task 24-24-00-710-801 "Operational Test of the Emergency Generation System":
  1. Reason for the job
  3. Job Set-up
    Energize the aircraft electrical circuits (TASK 24-41-00-861-801)
      1. Reason for the job
      3. Job Set-up
        Subtask 24-41-00-480-050
          1. Reason for the job
          Do the EIS start procedure (TASK 31-60-00-860-801)
            1. Reason for the job
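The sketch below shows one way this hierarchical task structure, together with the red/green step icons, could be represented; the class, field and method names are hypothetical, not the CMOS data model.

from dataclasses import dataclass, field

@dataclass
class TaskNode:
    ref: str                        # e.g. "24-24-00-710-801"
    title: str
    done: bool = False              # drives the red/green "V" icon
    children: list = field(default_factory=list)

    def complete(self, ref):
        # Mark the step `ref` as done; a compound task turns green
        # (done) only once all of its children are done.
        if self.ref == ref:
            self.done = True
        for c in self.children:
            c.complete(ref)
        if self.children:
            self.done = all(c.done for c in self.children)

    def icon(self):
        return "green" if self.done else "red"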
Figure 3: In this example, the learner works on an aircraft electrical energizing task
5. Examples of Artificial Agents: the Learner's Assistant and the Instructor's Assistant

The objective of the deliberative coaching system is to give timely advice, aid and suggested help adapted to the exercise, the context and the understanding level of the learner. All interventions respect the criteria of opportunism, discretion, learning flexibility and precision of messages (defined in [9]), in order to better manage the learner's cognitive workload. The training units are the sessions, composed of one or more "scenarios". A scenario is composed of two phases. First, the learner discovers the problem in real working situations through video sequences, audio and animated graphics, and the usual documents present the work to complete. Then the solving phase begins, and the learner is totally free to interact with the simulation. The solving phase is mainly composed of hierarchically interrelated AMM and TSM tasks. The direct link between the learner and the learning environment is an artificial agent: the learner assistant. The running basis of this artificial agent is a diagnostic-reasoning background process that observes every action of the learner and immediately tries to insert this action into a meaningful plan. In real time, the follow-up system builds and updates the remaining paths still available to achieve the goal (the learner is expected to succeed in the task). An "estimated willing" summarizes various data that come from the learner's basic information (static data) and from the extrapolation of the latest interactions with
the system. This dynamic estimated willing (a data system update occurs every 800 ms) allows the preparation of advice or help messages. The learner assistant then sends a message to another agent, the curriculum agent, whose job is to deliver pedagogical content depending on the specific difficulties of the step reached inside the task. For instance, when the learner assistant detects that the learner has already taken his/her error into account and is beginning to correct it, the curriculum agent does not disturb the learner. Advice, help messages and information messages are displayed in different forms. In the case of a fully automatically elaborated message, the use of a simple grammar limits the syntax of the message (pseudo-natural language). On the other hand, well-known errors allow instructors to prepare much more personalized messages in hyper-media documents. To sum up, the learner assistant can be described as a highly dynamic system that coaches the learner by maximizing the use of graphical hints located on panels and documents well known to the user (interactive graphical windows of the cockpit panels and documents such as PFRs), and by minimizing the use of unusual forms of advice (texts in windows are restricted to safety warnings and hard-coded instructor's recommendations), in order not to distract the learner from his/her operational practice tasks. The learning environment can be run under two modes (the idea being to use the same software environment for training the learners and for eliciting expert knowledge): the learner's mode (the main mode) and the instructor's mode. In the latter, the instructor plays the role of a learner in order to validate the didactical resources automatically elaborated by the system, before they can be used in real learning situations. This work consists in repeating phases of 1) exploration of each of the tasks and scenarios for all learning sessions, and 2) verification of the appropriateness between the potential error and the didactical resource proposed as a correction. Experts of the domain (aeronautical maintenance instructors) considered this work quite long and boring. Consequently, it was of high interest to design and implement an instructor's agent. The starting point of this implementation is the observation of experts' behavior: depending on their experience, maintenance instructors rate the importance of each step in a task, not only on a continuum from low-importance step to very-high-importance step, but also on the strictness of the order between the steps. The main objective of the smoothing mechanisms is to add, when needed, adequate parallel structures to the model. Running with the "instructor's mode" selected, the prototype displays the same graphical interface as in the "learner's mode", but since the user is an instructor, the system does not tutor him/her; it learns from his/her interactions. The instructor's assistant's role is to relieve the instructor of repetitive tasks. This artificial agent plays the role of an assistant whose collaboration becomes more and more useful and pertinent as it observes and generalizes the decisions taken by the instructor. The machine learning technique used to implement this artificial agent is the Restricted Coulomb Energy (RCE) mechanism [10]. RCE is a specific type of neural network very well fitted to classification and generalization tasks (much better adapted to this situation than multilayer Perceptrons).
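For reference, the RCE network of [10] stores labelled prototype points with adjustable influence radii: a new example falling inside a wrong-class hypersphere shrinks that radius, and an uncovered example commits a new prototype. The sketch below is a minimal illustration of that mechanism; the class and parameter names are ours, and it is not the CMOS implementation.

import numpy as np

class RCEClassifier:
    # Prototype-based RCE network: each prototype has a centre, a class
    # label and an influence radius (a hypersphere in input space).
    def __init__(self, r_max=1.0):
        self.protos = []          # list of [centre, label, radius]
        self.r_max = r_max        # initial radius of a new prototype

    def train(self, x, label):
        x = np.asarray(x, dtype=float)
        covered = False
        for p in self.protos:
            c, lab, r = p
            d = np.linalg.norm(x - c)
            if d < r:
                if lab == label:
                    covered = True
                else:
                    p[2] = d      # conflict: shrink the wrong prototype
        if not covered:
            # commit a new prototype centred on the example
            self.protos.append([x, label, self.r_max])

    def classify(self, x):
        x = np.asarray(x, dtype=float)
        labels = {lab for c, lab, r in self.protos
                  if np.linalg.norm(x - c) < r}
        # a single covering class wins; otherwise ambiguous or unknown
        return labels.pop() if len(labels) == 1 else None

An ask-or-act policy, such as the 90% confidence threshold described in the next paragraphs, could then be placed on top of such a classifier's output.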
Figure 4: The instructor's agent asks the instructor to confirm its initiative
Content of the previous window (Figure 4): the instructor is working with the learning environment in the "instructor's mode" (testing whether the didactical resources of the system are correct). Meanwhile, the instructor's assistant is learning from its observations of the instructor's actions on the learning environment. The instructor's assistant decides to open a dialogue window with the instructor (Figure 4). This artificial agent points out that "the system has detected an error". Then it proposes what the standard message to the learner will be if the instructor agrees (the content in the frame is the standard message). Finally, the instructor may disagree (checking the NO box), agree (checking the YES box), or give the name of a file that contains a more detailed message. The percentage written near the YES box means that the instructor's assistant estimates that the YES answer is the right one with a 78% probability. If this probability had been greater than 90% (the default value), the instructor's assistant would not have opened a dialogue window, since it would have taken the decision authoritatively (without referring to the instructor). In this part, we have put forward that artificial agents based on an accurate observation of instructors' behaviors can integrate better human-adapted structures into the system at run-time.

6. Conclusion

The results of a system development are discussed in this paper. We have designed and implemented an interactive learning environment dedicated to maintenance operators in aeronautics. We support the ability of multi-agent architectures to bridge the gap between complex dynamic systems (especially those based on real-time simulation) and interactive learning environments. The short-run perspective of this work is to analyze all feedback for the evaluation of this interactive learning environment as an effective training tool. The validation of such a long-haul development effort (a many-year collaboration between numerous researchers and engineers) currently consists in collecting appreciation feedback from the targeted users (trainees and instructors).
References
[1] Burton, R.R.: "The environment module of ITSs". In Foundations of Intelligent Tutoring Systems, M.C. Polson, J.J. Richardson, E. Soloway, Lawrence Erlbaum Associates Publishers, pp. 117-119 (1988).
[2] Eloi, M.: "Mise en oeuvre d'un Noyau de Generateur d'Environnements Interactifs d'Apprentissage avec Ordinateur pour l'Atelier de Genie Didacticiel Integre". Doctorat de l'Universite Toulouse III, June 1996.
[3] Falzon, P.: "Human-computer interaction: Lessons from human-human communication". In P. Falzon (Ed.), Cognitive Ergonomics. London: Academic Press (1990).
[4] Frasson, C.: "Some characteristics of instructional design for industrial training", invited paper, CALISCE'96, 3rd International Conference on Computer Aided Learning and Instruction in Science Engineering, pp. 1-8, Donostia, San Sebastian, Spain, July 1996.
[5] Gecsei, J., Frasson, C.: "SAFARI: an Environment for Creating Tutoring Systems in Industrial Training". Ed-Media, World Conference on Educational Multimedia and Hypermedia, Vancouver, June 1994.
[6] Hollan, J.D., Hutchins, E.L., Weitzman, L.: "Steamer: An interactive inspectable simulation-based training system". AI Magazine, 5, pp. 15-28 (1984).
[7] Katz, S., Lesgold, A., Eggan, G., Greenberg, L.: "Towards the Design of More Effective Advisors for Learning-by-Doing Systems". Lecture Notes in Computer Science, Springer, Intelligent Tutoring Systems, Third International Conference, ITS'96, pp. 641-649, Montreal, Canada, 1996.
[8] Lajoie, S., Derry, S.J. (editors): "Computers As Cognitive Tools - Technology in Education" (1993).
[9] Mengelle, T.: "Etude d'une architecture d'Environnements d'Apprentissage bases sur le concept de preceptorat avise". Doctorat de l'Universite Toulouse III, July 1995, pp. 102-105.
[10] Reilly, D.L., Cooper, L.N., Elbaum, C.: "A neural model for category learning". Biological Cybernetics, no. 45, pp. 35-41 (1982).
Towards a Unified Specification of Device-Instructor-Learner Interactions
Pak-Wah Fung and Ray H. Kemp
Institute of Information Sciences and Technology, Massey University, Palmerston North, New Zealand
{P.W. Fung;R.Kemp}@massey.ac.nz
Abstract. Human computer interaction is an important issue in developing intelligent tutoring systems. In this paper, the authors propose a visual formalism which aims to unify the task of modelling interactivity between the instructor, the learner and the device simulation into a single framework in order to ease system development. The proposal is based on the statechart formalism [6] and it serves to provide a powerful basis for a generic authoring system. With this system, the developers only need to focus their effort on the domain model and appropriate training materials will be generated automatically. However, the developers still have the freedom to incorporate their own desired approach to tutoring by modifying the metamodel which is transparent to them. To illustrate the idea, a video cassette recorder training scenario is used.
1. Introduction Human-computer interaction is a complicated process. In device training scenarios, developers have to contend with an extra level of complexity because the interactions often involve more than one autonomous agent. Take training a novice in the use of an interactive device like the video cassette recorder (VCR) as an example. Here, there are two intelligent agents: the instructor and the learner; and a 'dumb' device. The situation is even more complex if instruction is being delivered to more than one learner in a cooperative learning environment. Not only do the persons involved have to reason about the device behaviour, the responses of other parties are also their area of concern and they have to react accordingly to ensure the learning process is successful. To develop robust tutoring systems, a global design methodology (i.e. coordinating activities amongst all the agents) is of paramount importance to help the developer handle the intricate interactive structure. We propose the use of the statechart [6] as a unified formalism tool for modelling the interaction between the device, the instructor and the learner(s). Our philosophy is to create abstract representations of the interaction which can be converted automatically into code. Our goal is to provide a powerful basis for building a generic authoring system which allows domain experts to develop training material automatically without needing to be concerned about pedagogical issues. Using this technique, we believe the process of authoring interactive learning systems will be improved and the specification will allow the software designer to reason about the model. One underlying motivation for our research is to unify the specification of system behaviour and the dialogue between a system and its user into a single representation. From the device simulator to the interface, from customizing feedback to teaching approaches, all are expected to
be dealt with using one notation. There are many advantages to a single representation, but we argue that the most important one is that authors need only familiarize themselves with one notation, in contrast with heterogeneous representations. Also of particular value is modularity, i.e. the ease of combining different components together. In fact, the idea of a single representation is not new; most notable is Anderson's work on programming tutors (see, for example, [1]). Spada's DiBi system [17] is another example. Work on the use of heterogeneous representations includes [12] and [10]. In assessing the formalism, we have adapted criteria from [18]: formality, completeness, comprehensibility, flexibility and executability, and we claim that the approach proposed in this paper satisfies all of these. First, the statechart (SC) is formally defined ([6], [7]) and is amenable to rigorous analysis. For completeness, the notation is self-contained and can encompass abstract descriptions like "tape playing" as well as very low-level operations such as "square button no. 3 pressed". Thirdly, statecharts are visually oriented and are comprehensible to both developers and users. The statechart is also very flexible, and its state-event-driven ontology allows a broad variety of dialogue styles to be accommodated. The most appealing feature of the statechart is its direct executability in support of rapid prototyping, and its suitability for the development and testing of interactive learning systems [7]. In our past work on TANDEM [9], [10] we reported our progress in using plan nets [3] as a tool for authoring interactive device training. This approach avoids the combinatorial explosion experienced when using finite state machines (FSMs) for domain representation and is also useful for providing appropriate feedback. However, it is limited in its descriptive power for representing other parts of the system, such as the interface or task specification, or for modelling the teaching method.

2. Characteristics of Interactive Learning Systems

One powerful method for handling complex systems is the so-called divide-and-conquer approach: the system is partitioned into a collection of smaller sub-systems which work together to achieve a common goal. Each subsystem could be isolated and studied in its own right but, being part of the complete system, the inter-linking relationships among subsystems must also be studied in order to gain an accurate picture of the whole. Such a system characteristic demands a notation that can capture both part-part and part-whole relationships. Interactive systems are normally dynamic, i.e. one or more of their features can have different states over a period of time. The state of the whole system is defined by the states of all of its features. The system moves from one state to another by changes in the states of its components. But above all, the most important feature of an interactive system is its "interactivity", i.e. the system state is directly influenced by the person using the system. Such a characteristic of interactive systems renders the state-event paradigm a particularly appropriate modeling methodology. States and events are adopted a priori for describing the system's dynamic behaviour. In other words, the system is modelled as a space of 'snapshots', where each snapshot represents a particular state. The system is continuously subjected to external or internal stimuli and responds accordingly to whatever event has occurred.
This view not only applies to the device model; it can be extended to denote the instructor-learner interaction. In providing tutoring support on using a VCR, for example, a basic fragment can be of the form: when event(press play) occurs while the VCR is in state(stop), and if condition tape(in) is true at the time, the system transfers to state(tape(playing)). This is a typical example of a device system description. Consider the tutor-student system: when the user attempts to press "eject" while the system is in state(play), an intelligent tutor should react to the event by providing feedback like "you must stop the tape before taking that action". This perspective naturally points to finite state machines (FSMs) and their corresponding
state-transition diagrams, and indeed it has been adopted in specifying human-computer interaction [18] and in computer-assisted instruction [4]. The problem is that we are dealing with two interactive systems: the device and the tutor-student system. [6] argues from four perspectives that the SC is superior to the FSM, and we have adapted some of his arguments in the context of developing tutorial systems for interactive devices:
1. The notion of depth or hierarchy is lacking in the FSM, but this way of organizing state information is of paramount importance in large systems. Not only is this approach helpful in structuring the device information, it is also a good pedagogical technique. One can help learners focus on a higher level of abstraction during an initial exposition and move on to finer details at a later stage.
2. When it comes to a common transition in several states, a single arrow in the SC represents a multitude of arrows in the FSM. This provides an excellent way of unifying feedback to students, as one single action may cause the system in different states to respond in a certain way.
3. The FSM requires an explicit listing of all possible state combinations, which is a daunting task for the author, but in the SC different state combinations are implicitly represented. Depending upon the learner's committed actions, the tutor may have different ways of providing comments. This feature releases the author from the need to explicitly list all possible tutor-student dialogue scenarios.
4. Inherently, the FSM is sequential in nature, requiring the user to carry out some actions in a fixed sequence. However, concurrency is commonplace in system behaviour and many subtasks can be executed in any order or simultaneously. In the SC, concurrent processes can be represented using what [6] calls orthogonal components. Indeed, a learning situation is composed of three orthogonal components: the device, the tutor and the student. Suppose the student is inserting a tape; the device is receiving the tape and the tutor is watching. All three scenarios are happening concurrently.

3. Basic Components of the Statechart

Statecharts are a higraph-based extension of the FSM, and the basic components of SCs are blobs (denoting states) and arrows (representing transitions). [6] summarizes the SC in the following form: Statechart = state diagram + depth + orthogonality + broadcast communication. In the example SC shown in Fig. 1 (Sample Statechart with two orthogonal components), the system is modelled by two orthogonal blobs A and B, which have states A1, A2, A3 and B1, B2, B3 respectively. Once the whole system is activated, it must be in state A1 (the default state, represented by a single arrow) and B1 (the default state of B). The H stands for 'enter-by-history', i.e. when returning to the compound state, the system will enter the state most recently visited. The interstate transitions are represented by labelled event arrows, as in a traditional FSM, but of particular importance is the common transition Z, which applies to both B2 and B3. Whether the system is in B2 or B3, the occurrence of Z will trigger a transition to B1. The
contour surrounding B2 and B3 reveals the notion of depth in the SC, and can be perceived as a superstate of B2 and B3. The special condition [A2] and associated action /g attached to arrow y illustrate the broadcast mechanism across the whole chart, i.e. what happens within one blob is broadcast to the other blobs, which in turn affects those blobs' responses. Suppose B is now in state B1 and event y occurs: A must be in state A2 prior to the transition from B1 to B3. The action attached to the arrow indicates the associated action to be carried out, and its effect will be broadcast to the whole system. Consider an initial system state, say (A1, B1), and suppose event x occurs. Since the precondition [A1] is satisfied, the transition from B1 to B2 can take place, and at the same time it triggers the action e. Action e is then broadcast to the other blobs, including A, so A immediately transits from A1 to A2. This chain-reaction-like mechanism is a powerful feature for capturing system behaviour due to internal stimuli.
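The chain-reaction mechanism just described can be made concrete with a small sketch. The toy engine below and its encoding of the Fig. 1 fragment are our own illustration of the broadcast semantics, not an existing statechart tool.

class MiniStatechart:
    # Orthogonal components hold one state each; a transition may carry a
    # guard on another component's current state and an action that is
    # broadcast to the whole chart as a new event.
    def __init__(self, states, transitions):
        self.state = dict(states)        # e.g. {"A": "A1", "B": "B1"}
        self.transitions = transitions   # (component, src, event, guard, dst, action)

    def fire(self, event):
        queue = [event]
        while queue:                     # chain reactions via broadcast
            ev = queue.pop(0)
            for comp, src, e, guard, dst, action in self.transitions:
                if e == ev and self.state[comp] == src and \
                        (guard is None or guard in self.state.values()):
                    self.state[comp] = dst
                    if action is not None:
                        queue.append(action)   # broadcast the action

# The Fig. 1 fragment: x moves B1 to B2 when A is in A1, broadcasting e,
# which in turn moves A from A1 to A2; y moves B1 to B3 when A is in A2,
# broadcasting g; Z returns B2 or B3 to B1.
sc = MiniStatechart(
    {"A": "A1", "B": "B1"},
    [("B", "B1", "x", "A1", "B2", "e"),
     ("A", "A1", "e", None, "A2", None),
     ("B", "B1", "y", "A2", "B3", "g"),
     ("B", "B2", "Z", None, "B1", None),
     ("B", "B3", "Z", None, "B1", None)])
sc.fire("x")
print(sc.state)                          # {'A': 'A2', 'B': 'B2'}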
4. Modelling an Interactive Device with the Statechart

In this section, we illustrate a highly simplified version of a VCR and demonstrate how the SC model may be used to represent its behaviour. Figures 2, 3 and 4 show the various levels of the VCR, and one can easily see that the notion of stepwise refinement has already been incorporated into the chart. At the highest level, the VCR can be seen as a combination of two systems, the power and tape systems, but the functioning of each is not completely independent of the other. One advantage of this 'underspecification' is to allow the user to complete tasks in an arbitrary order. For example, if the user wants to view a video programme, s/he can insert the tape first before switching on the power. The other way round is also permitted, i.e. switching on the power prior to inserting the tape. Each user action (e.g. press play) is modelled as an external event represented by an arrow.

Fig. 2: Power and Tape (Level 1)
Fig. 3: Tape In (Level 2) (states Stop and Tape in Motion, with transitions such as press play[power(on)], press record[power(on)], press rewind[power(on)], press forward[power(on)] and press stop)

Fig. 4: Tape in Motion (Level 3) (substates Recording, Fast Rewind and Fast Forward, with an enter-by-history (H) marker)
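As a rough sketch of how the levels of Figs. 2-4 might be encoded (our own illustrative encoding; the state and event names are assumptions, and the hierarchy and history marker are flattened for brevity), each user action is an event whose effect is guarded by conditions such as [power(on)]:

    # Illustrative, flattened encoding of the simplified VCR (not the authors' spec).
    def step(state, event):
        """state = {'power': 'ON'|'OFF', 'tape': ...}; applies one user event."""
        power, tape = state["power"], state["tape"]
        if event == "press power":
            state["power"] = "OFF" if power == "ON" else "ON"
        elif event == "insert tape" and tape == "NO TAPE":
            state["tape"] = "STOP"                 # default substate of 'Tape In'
        elif event == "press eject" and tape == "STOP":
            state["tape"] = "NO TAPE"
        elif power == "ON" and tape != "NO TAPE":  # the guard [power(on)]
            moves = {"press play": "PLAYING", "press record": "RECORDING",
                     "press rewind": "FAST REWIND", "press forward": "FAST FORWARD",
                     "press stop": "STOP"}
            if event in moves:
                state["tape"] = moves[event]
        return state                  # unknown or unsatisfied events change nothing

    s = {"power": "OFF", "tape": "NO TAPE"}
    step(s, "insert tape")     # allowed with the power still off ('underspecification')
    step(s, "press power")
    step(s, "press play")      # guard [power(on)] now satisfied
    print(s)                   # {'power': 'ON', 'tape': 'PLAYING'}

Both call orders (tape first or power first) succeed, which is exactly the arbitrary task ordering the underspecification is meant to allow.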
5. Metamodel of the Device: Instructor-Learner Dialogue Modelling

Among the models of interaction proposed, the number of layers involved depends strongly on which model is adopted [5], [15], [16]. However, the most widespread approach is still the linguistic model, i.e. the human-computer dialogue is defined at three levels: lexical, syntactic and semantic. Within this model, the interface contains only two languages: the user input and the system response. Since this model fits most tutorial systems (students input what they want to do or ask, and the system responds accordingly), we adopt it to develop our idea. The lexical layer covers the basic primitives of the user input, like key strokes, button pushes or the like. What constitutes a valid expression in the user language is defined in the syntactic layer, whereas the semantic layer defines the effects of such expressions, i.e. the functionality of the interface. The SC is capable of modelling user actions at all three levels because the author can begin the specification at the highest layer (the semantic layer) and then keep refining the individual semantic blobs until the lowest level of user input primitives is reached at the lexical layer. In fact, the boundaries separating the three layers are quite blurred, because different blobs may require different levels of detail in their description. This property gives the author the highest degree of flexibility in specifying the instructor-learner dialogue. Shown in Fig. 5 is a sample metamodel of the device, i.e. how the instructor and the learner respond to the device model. Note that the specification shown is at the highest, semantic layer only, and that an individual blob can be refined until the final interactive primitives are reached. The user operates the interactive device freely (as in an exploratory learning environment). For example, in the blob "Doing", the learner can do anything, such as "press play", "press eject" etc. However, what s/he is doing is closely monitored by the tutor. This model can be pre-defined by the system, but again its structure can easily be modified to suit the individual teaching style of the author. Space limitations do not allow us to provide a complete specification of a tutoring system for a VCR, so the following figures show only the semantic features, but this is what we expect the author to focus on.
Fig. 5: Tutor-Student Dialogue

At this level, each agent (tutor and student) has only two possible states: either the student is acting while the tutor monitors, or the student is listening while the tutor comments. "Acting", "Listening", "Monitoring" and "Commenting" are only high-level descriptions, but they can be refined further in a stepwise manner. In the actual authoring process, this is implemented with the
metaphor of "point-zoom-edit". The author points the mouse at a blob, say "Asking"; the blob is then zoomed so that further details can be added (shown in Fig. 6), listing the types of questions the student may ask. Similarly, how the tutor responds to the student's questions is specified in Fig. 7. Note that the events linking individual blobs are omitted to make the diagrams easier to read.
Fig. 6: Possible questions to be asked (refining "Asking": ask for all actions currently possible; ask why an operation is currently not possible; ask about the current situation; ask how an operation can be made possible; ask how an 'impossible' operation can be made possible)

Fig. 7: Tutor responds to student's questions (refining "Answering": list all currently legal actions with satisfied preconditions; explain why the preconditions cannot be fulfilled; list current system states; describe the preconditions to be satisfied; list all actions that would lead to the satisfaction of the preconditions)
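One way to picture the data behind 'point-zoom-edit' (a sketch of ours with hypothetical names, not the actual authoring tool) is a blob whose list of sub-blobs stays empty until the author zooms in and edits:

    # Sketch of a refinable blob structure for 'point-zoom-edit' authoring
    # (illustrative data model only; not the authors' implementation).
    class Blob:
        def __init__(self, name):
            self.name = name
            self.children = []        # empty until the author zooms in to refine

        def refine(self, *names):     # 'zoom' on a blob, then 'edit' sub-blobs
            self.children = [Blob(n) for n in names]
            return self.children

    asking = Blob("Asking")
    asking.refine("Ask for all actions currently possible",
                  "Ask why operation currently not possible",
                  "Ask about current situation",
                  "Ask how operation can be made possible")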
Question-answering is just one possible tutorial scenario in this model, and most of the student's time should be spent interacting with the device to discover its behaviour. The model is by no means restricted to this and, in fact, either agent can initiate a dialogue. For the purpose of illustration, here we delineate only three types of action the student can take when s/he is doing something: carrying out a legal operation, attempting an illegal operation, and attempting a currently impossible operation. These are identified in [8] (Fig. 8).
Fig. 8: Actions carried out by the student (refining "Doing": carrying out a legal operation; attempting an illegal operation; attempting a currently impossible operation)

Fig. 9: Tutor responds (praising; indicating an illegal operation; explaining that the action is currently impossible because of unsatisfied preconditions; giving feedback on what has been done)
"Attempting illegal operation" is different from "attempting currently impossible operation". For instance, inserting a CD into the tape slot is an illegal operation while attempting to press play before turning on the power is a currently impossible operation. The tutor would categorize the types of mistaken steps taken and provide comments accordingly (see Fig. 9).
Once the author has specified the dialogue structure of the tutor-student interaction, the system will provide an inventory of syntactic and lexical primitives for the author to choose from. A typical syntactic item could be "turning a knob", "pressing a button" or "flicking an on/off switch", while typical lexical primitives are "pressing the key A", "positioning the mouse pointer at (x,y)", "selecting from a menu" etc. This methodology encourages the author to concentrate only on the semantic issues and leave the others for the system to deal with. For instance, the author may decide to incorporate the action "inserting a tape" as an allowable operation, but s/he is not concerned with how it is implemented. The operation could be achieved semantically through a syntactically valid diagram sequence (simulation), by selecting from a menu of operations, or by asking the user to type in a natural language expression.

6. Sample Dialogue

As the SC is directly executable, it can be test run immediately once specified. In conducting a tutorial session, the learner is placed in a simulated environment (the device model), but his/her individual actions are monitored by the tutor. Since the device model is accessible to the tutor, one can assume that the tutor 'knows' the device model and can comment on individual student actions. Table 1 shows an excerpt from a possible dialogue, along with remarks that describe why the system responds in that way according to the specification.

Table 1: Sample dialogue of a tutorial session

Agent    Actions/Response                        Remarks
Student  Insert tape                             Carrying out legal operation
Tutor    Good, you've executed a legal step      Praising
Student  Press "Play"                            Attempting currently impossible operation
Tutor    Sorry, the power is not on              Explaining that action requires satisfied preconditions
Student  Press "On"                              Carrying out legal operation
Tutor    Good, you've executed a legal step      Praising
Student  Press "Eject"                           Attempting illegal operation
Tutor    Sorry, you are not allowed to do that   Indicating illegal operation
Student  How can I achieve that?                 Asking how impossible operation can be made possible
Tutor    Press "Stop" before pressing "Eject"    Describing preconditions to be satisfied
7. Concluding Discussion

As an improvement on the FSM and on plan nets, we believe that the SC is a versatile tool for providing a unified interface for modelling both the behaviour of an interactive device and the human/system dialogue for tutoring its usage. With just one interface, the author can specify both the device and the dialogue structure between the tutor and the student. The built-in hierarchical feature and the broadcast mechanism of the SC make it amenable to modelling large and complex devices as well as sophisticated instructional strategies. However, the focus so far has been solely on providing feedback on domain behaviour. As [10] observe, it is pedagogically more desirable to guide the student to complete a specific task rather than merely explore the domain. The next stage of our work is to define a task structure to be overlaid on the domain SC. For instance, the sample dialogue shown in Table 1 is taken from a free domain exploration session, but in a task-oriented dialogue the student would not be allowed to press "Eject" because it would not help to achieve the current goal (playing the tape). Another goal is to investigate the formal representation of these systems. [6] demonstrates the formal basis of SCs, showing that they are amenable to rigorous analysis. We are currently
looking into analytic techniques to deal with the issues of model consistency, ambiguity and completeness. It is also planned to test the model using the Wizard of Oz method [14]. As shown by [19] and by [11], for example, this approach is particularly suitable for testing tutorial systems.

References
[1] Anderson, J. R., Corbett, A. T., Fincham, J., Hoffman, D. and Pelletier, R. (1992). General Principles for an Intelligent Tutoring Architecture. In J. W. Regian and V. Shute (Eds.), Cognitive Approaches to Automated Instruction (pp. 81-106). Hillsdale, NJ: Lawrence Erlbaum.
[2] Dix, A. and Runciman, C. (1985). Abstract Models of Interactive Systems. In P. Johnson and S. Cook (Eds.), People and Computers: Designing the Interface (Proceedings HCI '85). Cambridge: Cambridge University Press.
[3] Drummond, M. (1989). Situated Control Rules. In R. J. Brachman, H. J. Levesque and R. Reiter (Eds.), Principles of Knowledge Representation and Reasoning (pp. 103-113). San Mateo, CA: Morgan Kaufmann.
[4] Feyock, S. (1977). Transition Diagram-based CAI/HELP Systems. International Journal of Man-Machine Studies, Vol. 9, pp. 399-413.
[5] Foley, J. D. and van Dam, A. (1982). Fundamentals of Interactive Computer Graphics. Massachusetts: Addison-Wesley.
[6] Harel, D. (1987). Statecharts: A Visual Formalism for Complex Systems. Science of Computer Programming, Vol. 8, pp. 231-274.
[7] Horrocks, I. (1998). Constructing the User Interface with Statecharts. Massachusetts: Addison-Wesley.
[8] Kemp, R. H. (1995). Designing Interactive Learning Environments. Ph.D. Thesis, Massey University, Palmerston North, New Zealand.
[9] Kemp, R. H. and Smith, S. P. (1994). Domain and Task Representation for Tutorial Process Models. International Journal of Human-Computer Studies, Vol. 41, pp. 363-383.
[10] Kemp, R. H. and Smith, S. P. (1996). A Visual Approach to Procedural Tutor Specification. In M. Apperley (Ed.), Proceedings of OzCHI 96 (pp. 190-196). Hamilton, New Zealand: IEEE Computer Society Press.
[11] Kemp, R. H. (1997). Using the Wizard of Oz Technique to Prototype a Scenario Based Simulation Tutor. In B. du Boulay and R. Mizoguchi (Eds.), Artificial Intelligence in Education: Knowledge and Media in Learning Systems (pp. 458-465). Amsterdam: IOS Press.
[12] Khan, T. M., Brown, K. E. and Leitch, R. R. (1997). Didactic and Informational Explanation in Simulations with Multiple Models. In B. du Boulay and R. Mizoguchi (Eds.), Artificial Intelligence in Education: Knowledge and Media in Learning Systems (pp. 355-362). Amsterdam: IOS Press.
[13] Mark, M. and Greer, J. (1995). The VCR Tutor: Effective Instruction for Device Operation. The Journal of the Learning Sciences, Vol. 4(2), pp. 209-246.
[14] Maulsby, D., Greenberg, S. and Mander, R. (1993). Prototyping an Intelligent Agent through Wizard of Oz Techniques. In S. Ashlund, K. Mullet, A. Henderson, E. Hollnagel and T. White (Eds.), Proceedings of INTERCHI '93 (pp. 277-284). Amsterdam: ACM.
[15] Moran, T. P. (1981). The Command Language Grammar: A Representation for the User Interface of Interactive Computer Systems. International Journal of Man-Machine Studies, Vol. 15(1), pp. 3-50.
[16] Nielsen, J. (1986). A Virtual Protocol Model for Computer-Human Interaction. International Journal of Man-Machine Studies, Vol. 24(3), pp. 301-312.
[17] Spada, H., Stumpf, M. and Opwis, K. (1989). The Constructive Process of Knowledge Acquisition: Student Modeling. In H. Maurer (Ed.), Computer Assisted Learning (pp. 486-499). New York: Springer-Verlag.
[18] Wasserman, A. (1985). Extending State-Transition Diagrams for the Specification of Human-Computer Interaction. IEEE Transactions on Software Engineering, Vol. 11, pp. 699-713.
[19] Winkels, R. and Breuker, J. (1992). Modelling Expertise for Educational Purposes. In C. Frasson, G. Gauthier and G. I. McCalla (Eds.), Intelligent Tutoring Systems: Second International Conference (pp. 633-641). Berlin: Springer-Verlag.
Artificial Intelligence in Education
S.P. Lajoie and M. Vivet (Eds.)
IOS Press, 1999

An Open Architecture for Simulation-Centered Tutors 1

Allen Munro, David S. Surmon, Mark C. Johnson, Quentin A. Pizzini, and Josh P. Walker
Behavioral Technology Laboratories, University of Southern California
250 North Harbor Drive, M/S 309, Redondo Beach, CA 90266 USA
email contact: [email protected]

Abstract

Although simulation-centered tutorial development systems such as RIDES and VIVIDS have been productively applied to tutoring research and advanced development, the monolithic nature of these systems has been an obstacle to integration with other advanced simulation-centered tutoring components. Marrying the VIVIDS technology to virtual environment viewers and to other technologies, such as autonomous agents, has been difficult and has limited some of the productivity and maintainability benefits of the authoring system approach for those tutorial applications. A new, open architecture approach to simulation-centered tutors offers advantages such as lightweight tutors, cost savings due to component reuse and improved maintainability, more rapid adaptation to new presentation technologies, and support for a wider range of tutorial applications than would be possible with a monolithic system. The major components of the architecture include a tutorial controller; non-simulation user interface elements such as presentation channels, commands, and answer entry interfaces; and behavior model and model view components that embody an interactive simulation. The open architecture's components can be abstractly defined in terms of the sets of services that these components provide. This abstract approach encourages the extension of the simulation-centered tutoring component model to tutoring in the context of real devices or systems, in addition to tutoring in the context of simulations.
Background

For a number of years, together with colleagues at the University of Southern California, we have developed a series of systems for authoring and delivering simulation-centered tutors. The most recent such systems are RIDES [1,2] and its immediate successor VIVIDS [3]. Using these systems, researchers and developers at a number of institutions have developed a variety of interactive graphical simulations and tutorial courses on a variety of topics, including the characteristics of satellite orbits, the maintenance of aircraft landing gear systems, the operation of antenna control systems, radar operations, the structure and functions of the human circulatory system, and many others. While these systems achieved some practical success and met a number of research objectives, it was not long before we found that ambitious developers were pushing our authoring systems in ways we had never anticipated.
1 We acknowledge gratefully the research support of the Air Force Research Laboratory under contract no. F33615-90-C-0001 and of the Office of Naval Research under grant no. N00014–98–1–0510.
Figure 1. Scenes from Two VIVIDS Simulations
These developers found ways to couple RIDES and VIVIDS to new technologies, such as speech input systems and virtual environment viewers. The marriage of our monolithic authoring and delivery system to unanticipated technologies was sometimes awkward, and failed to provide completely satisfactory results. Often only a subset of the features of VIVIDS and of the newly coupled technologies could be practically applied in tutorials, and the design, development, and maintenance of complex tutorial systems was more difficult than it seemed it should be. This was painfully evident by contrast with the natural interactive approach used for the development and delivery of simulations and simulation-centered tutorials when only the native capabilities of the monolithic VIVIDS system were employed. These issues were forcefully demonstrated to us when we ourselves participated in a large project designed to couple three major technologies in a complex simulation-centered tutorial project called Virtual Environments for Training (VET), as described in [4] and [5]. This project combined these three large components:
• Vista is a virtual environment viewer capable of displaying 3D models in a variety of file formats, including VRML, and designed to handle student movement and manipulation events in virtual space. Vista [6] was developed by Randy Stiles and his associates at Lockheed Martin Co. Further information on Vista is available from the VET project web site: http://vet.parl.com/-vet/
• Steve is a Soar-based autonomous agent developed by Lewis Johnson and Jeff Rickel and their colleagues at the Information Sciences Institute, University of Southern California [7]. Steve agents employ models of tasks to monitor, evaluate, demonstrate, and explain procedures to students. Information on Steve is available from the CARTE web site at: http://www.isi.edu/isd/VET/vet-body.html
• Our VIVIDS was used to define the interactive and non-interactive behavior of complex systems. It responds to user manipulations of simulated objects by propagating the effects of those manipulations through a series of automatic evaluations of easily authored and maintained behavioral rules. In addition, VIVIDS was used to author and control the course structures that determined the long-term progress of instruction, and to rapidly develop simple tutorials.
Detailed information on VIVIDS, including complete authoring documentation, is available at our web site: http://btl.usc.edu/VIVIDS/
These three large software applications were required to be in constant communication with each other during tutorial interactions. The process of developing subject-matter materials (primarily 3D models for Vista, task structures for Steve, and behavioral data and rules for VIVIDS) was difficult and awkward at times, because of many special-purpose additions that had to be made to these components in order to support their working together. The ad hoc character of some of these integrations raised issues as to the authorability and maintainability of tutors developed using these three systems in combination. Figure 2 displays a scene from a VET tutorial. We believe firmly that the answer to these concerns is not to develop an even larger monolithic system that combines all the capabilities of the three VET components in one tightly integrated and closed application. Instead, the VET experience and reports from other researchers using VIVIDS have encouraged us to define a new open architecture for simulation-centered tutors.
Figure 2. VET tutorial scene
Open Architecture Concepts

We have five major goals for this open architecture:

1. Avoid unnecessary baggage. While the VET system can be viewed as a system of components, they are very heavyweight components. In addition to increasing the memory footprint and costs, the legacy features of heavyweight components bring their own maintenance requirements without offering utility in many application environments.

2. Achieve software cost savings through component reuse. It is easy to justify the development costs for a very well-crafted lightweight component that can be reused in many different types of simulation-centered tutors. If similar functionality must be recoded for different tutorial applications, then development resources are being spread thinly over and over to achieve very similar results.

3. Be better positioned to meet unanticipated needs. A lightweight component with well-defined services may have applications not envisioned when it was first developed. A cleaner, more maintainable design is likely to result when newly required components interact with previous components through a well-tested service interface, rather than modifying a large monolithic application to meet the new requirements.

4. Be better able to exploit new technologies as they emerge. An architecture with an appropriate level of abstraction in the definitions of its components has the
potential for more naturally incorporating new technologies into future simulation-centered tutoring systems. In particular, there should be a natural path for the incorporation of unanticipated new presentation technologies.

5. Be able to support tutorial applications with widely differing requirements. One example of such differing requirements is being able to support 2D graphical simulations that can be quickly downloaded over the Internet, but also being able to support tutors that work with virtual environment systems that make use of very specialized, complex, and expensive computational, display, and input devices. Another example is being able to provide components for a tutor that requires a simple single-student tutorial, but also being able to support another tutor that is used for networked team training.

What form should these components take? We propose that each component of our proposed architecture should have a core abstract form as an API, an application programming interface. This permits a variety of implementations of the abstract component. In our own current projects, we are developing a number of concrete components based on the abstractions of the open architecture APIs. Our core concrete components are Java classes that meet the specifications of the abstract components specified by our architecture. In addition, such core components can be coupled with proxy objects that support the use of the core component in networked applications.

The Major Abstract Components of the Open Architecture

Figure 3 schematically presents the major components of the open architecture for simulation-centered tutoring systems.

Figure 3. Major Components for Simulation-Centered Tutors
One way of viewing simulation-centered tutoring systems is in terms of three major components, each of which has its own components. These three highest-level components are:
• a tutor, or tutorial control system
• a set of instructional presentation and related interfaces
• a simulation
The tutorial control system itself is likely to contain some type of tutorial engine, a model of the student or students, and a model of the domain knowledge or skills that are to be taught. Each of these may have a wide variety of forms. In actual implementations, it can happen that two or more abstract components are implemented as a single software component. For example, in some tutors, the tutorial engine and the domain knowledge may be tightly intertwined. This fact does not invalidate the architectural model, so long as the implemented component provides the services that other components may require of it. To those client components, the compound component may appear as two independent components.
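The compound-component case can be sketched in a few lines. The authors' concrete components are Java classes; the following Python rendering, with invented service names, is ours and only illustrates the idea of one object serving two abstract APIs:

    # Sketch of the abstract-API idea (illustrative names; not the published APIs).
    from abc import ABC, abstractmethod

    class TutorialEngine(ABC):
        @abstractmethod
        def next_instruction(self, student_state): ...

    class DomainKnowledge(ABC):
        @abstractmethod
        def lookup(self, topic): ...

    # A compound component: one object implementing both abstract components.
    # To its client components it still appears as two independent providers.
    class ScriptedTutor(TutorialEngine, DomainKnowledge):
        def __init__(self, script):
            self.script, self.step = script, 0

        def next_instruction(self, student_state):
            item = self.script[self.step % len(self.script)]
            self.step += 1
            return item

        def lookup(self, topic):
            return [item for item in self.script if topic in item]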
The set of presentation and related instructional interfaces will include such subcomponents as text-presentation windows, HTML viewers, text-to-speech components, and digital audio and video presenters. Such components are called presentation channels or, simply, channels, in our architecture. Another class of user interface elements are those that are used for entering answers. These can include popup menus and keypads, text entry dialogs, speech recognition, and so on. Any question posed by the tutor that does not require an interaction with the simulation is likely to require the use of one of these components, which we call entries. A third type of instructional interface element is a command. These components present interfaces such as buttons and menus that are used, not to answer questions (which may be judged), but rather to issue directives to the tutoring system. Examples include Quit ("I'm going to leave now; archive my data"), Help ("Give me a hint or even the answer to that question"), and Continue ("OK, I've read that bit, what's next?").
In our architecture, there is in fact no unitary simulation component. Instead, we propose that simulation-centered tutoring systems should have a behavior model and one or more model views. Of course, in some tutors a single software component may implement the roles of both a behavior model and a model view. Figure 4 shows a simple two-dimensional graphic simulation tutor with some of the visible components labeled.
Figure 4. Graphical Simulation with Some Components Labeled
Figure 5 shows a virtual environment simulation tutorial system with some of its visible components labeled.
Figure 5. Virtual Environment Simulation with Some Components Labeled
The reason that model view and behavior model components are shown separately from each other is that the architecture supports team training applications. Each student can have his or her own computer with its own view of the simulation. These model views need not be the same, and for many applications they will not be. Team members often work with different parts of a complex interconnected system, and are able to see only one part of it at a time. The single behavior model determines the values of simulation attributes, and some of these value changes cause changes in what is displayed in the model views. Figure 6 displays a simplified schematic of a two-member team training environment.
Figure 6. Multiple Model Views—Networked Team Training
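The separation of one behavior model from several model views is essentially a publish/subscribe arrangement. A minimal sketch (ours; the attribute and class names are invented) of attribute changes propagating to the registered views of two student stations:

    # Sketch: one behavior model serving several model views (illustrative only).
    class BehaviorModel:
        def __init__(self):
            self.attributes = {}
            self.views = []                      # one model view per student station

        def register_view(self, view):
            self.views.append(view)

        def set_attribute(self, name, value):    # called by the simulation engine
            self.attributes[name] = value        # (or by the tutor component)
            for view in self.views:
                view.attribute_changed(name, value)

    class ModelView:
        def __init__(self, visible_attributes):
            self.visible = visible_attributes    # team members see different parts

        def attribute_changed(self, name, value):
            if name in self.visible:
                print(f"redraw {name} = {value}")

    model = BehaviorModel()
    model.register_view(ModelView({"pump_pressure"}))    # station #1
    model.register_view(ModelView({"valve_position"}))   # station #2
    model.set_attribute("pump_pressure", 80)             # only station #1 redraws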
A possible future application of the architecture will be the integration of tutoring systems with real equipment systems that have sufficient computational power to support some of the service requirements of behavior models and model views.
Figure 7. Real Devices in 'Simulation-Centered' Tutoring
It is possible to implement concrete components that do not provide every service specified by the abstract components of the architecture. Doing so simply restricts the range of instructional interactions that a particular simulation-centered tutor is able to provide.
Component Services

Most of the components of this architecture are defined in terms of the services that they offer and that they may require of others. Too many services have been defined to be explicated within the page constraints of this paper. The participants in AIED are likely to be quite familiar already with the sorts of instructional services that may be offered by presentation channels, command objects, and entry objects. We therefore concentrate here on briefly describing some of the instructional services of behavior models and of model views.

Selected Behavior Model Services. One type of behavior model service asks the behavior model to note when a student manipulates some element of the simulation. The tutorial component can specify whether it is interested in manipulations caused by discrete actions, such as mouse clicks, and whether it is interested in continuous actions, such as dragging. It can also tell the behavior model that it is no longer interested in such student actions. The tutor component can also ask a behavior model to close one simulation and open another. It can ask a behavior model to evaluate a previously defined expression, or a new expression stated in the simulator's own behavior language. Not every simulation will implement the expression construct, but those that do can provide a powerful facility for the tutor to check on relevant aspects of the state of the simulation. A tutor component can ask the simulator for time in the simulator's time frame. It can make the simulation pause and restart, and it can ask whether the simulation is currently paused. Tutors can also make use of a behavior model service that will set a simulation attribute to a specified value. Naturally, such an attribute value change can result in propagated simulation events. The tutor can also pretend to be a human user. It can ask the behavior model to simulate human interactions by specifying user events that should be 'played' by the simulation. Both discrete events, such as mouse clicks or pinch glove 'picks', and continuous events, such as mouse drags, can be supported.

Selected Model View Services. Most model view services are used either to ask a view to make some appearance change, or to express an interest in or otherwise deal with low-level student user actions in the view. Examples of appearance-related services include asking a model view to open or to close, asking it to highlight a graphical object or to stop highlighting an object, and asking it whether a particular object is currently being highlighted. A tutor can also direct a model view to stop updating in response to changes in the behavior model. The view must accumulate graphical change data so that it can correctly portray the simulated world when the tutor directs the view to update itself. Naturally, the tutor can also direct a view to resume auto-refreshing in response to simulation changes. The model view provides a number of services related to user actions. The model view can be asked not to attend to student actions. When this happens, the view does not pass events on to the behavior model, so no simulation consequences can result. A tutor can register an interest in user actions (both continuous and discrete), so that it will be informed when the user takes actions in the view. This makes it possible to respond instructionally to student actions, even when the behavior model has been prevented from responding to such actions.
A tutor can ask a model view what input devices it has, such as mouse, keyboard, pinch glove, and so on. It can direct the view to disable a particular input device, or to disable all of its input devices. It can also query the current status of input devices. In order to simulate user actions, a tutor can direct a model view to carry out user events. This makes it possible for tutorials to demonstrate how procedures should be carried out in the simulated environment. For a complete specification of the services of the major components of this new proposed architecture, see our web page at http://btl.usc.edu/VIVIDS/newVivide/APIs/
The specification presented there includes the services of presentation channels, commands, and answer entry components of the user view, in addition to expanded sets of services for the model view and for the behavior model. This site also presents a discussion of an example tutorial engine that makes use of the services specified by these APIs.
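To summarise the services described above in one place, here is a sketch of plausible method signatures (the names are ours and purely indicative; the authoritative definitions are the published APIs):

    # Sketch of selected service signatures (hypothetical names, no implementation).
    class BehaviorModelServices:
        def watch_manipulations(self, discrete=True, continuous=False): ...
        def unwatch_manipulations(self): ...
        def open_simulation(self, name): ...
        def close_simulation(self): ...
        def evaluate(self, expression): ...       # expression in the simulator's language
        def simulation_time(self): ...
        def pause(self): ...
        def restart(self): ...
        def is_paused(self): ...
        def set_attribute(self, name, value): ... # may propagate simulation events
        def play_user_event(self, event): ...     # the tutor pretends to be the user

    class ModelViewServices:
        def open(self): ...
        def close(self): ...
        def highlight(self, obj, on=True): ...
        def is_highlighted(self, obj): ...
        def suspend_refresh(self): ...            # view accumulates changes meanwhile
        def resume_refresh(self): ...
        def attend_to_user_actions(self, attend=True): ...
        def register_action_interest(self, callback): ...
        def input_devices(self): ...
        def disable_input(self, device=None): ... # None disables all input devices
        def play_user_event(self, event): ...     # demonstrate a procedure in the view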
Conclusions

A component-based architecture for simulation-centered tutoring systems offers the potential for collaboration among a variety of components that can be selected for optimal fit to particular tutoring requirements. In one application, a data-driven authored simulation engine can work with a scripted tutorial engine. In another, a hard-coded simulation, or even an actual device, appropriately interfaced, could be used with the same tutorial engine. In yet another application, an authored simulation could work with a pedagogical agent. Features of the architecture support both implementations that are lightweight web-delivered single-student systems, and large implementations that support distributed team training applications. The open architecture is intended to encourage the rapid introduction of novel presentation, simulation, and tutoring technologies as they emerge. It is our hope that our colleagues in the tutoring research community will examine the details of our proposed component interfaces and that they will offer improvements and extensions.
References
[1] Munro, A. Authoring Interactive Graphical Models. In T. de Jong, D. M. Towne, and H. Spada (Eds.), The Use of Computer Models for Explication, Analysis and Experiential Learning. Springer-Verlag, 1994.
[2] Munro, A., Johnson, M. C., Pizzini, Q. A., Surmon, D. S., Towne, D. M. and Wogulis, J. L. Authoring Simulation-Centered Tutors with RIDES. International Journal of Artificial Intelligence in Education, 1997, 8, 284-316.
[3] Munro, A. and Pizzini, Q. A. VIVIDS Reference Manual. Los Angeles: Behavioral Technology Laboratories, University of Southern California, 1998.
[4] Stiles, R., McCarthy, L., Munro, A., Pizzini, Q., Johnson, L., and Rickel, J. Virtual Environments for Shipboard Training. Intelligent Ship Symposium, American Society of Naval Engineers, Pittsburgh, PA, Nov. 1996.
[5] Johnson, W. L., Rickel, J., Stiles, R. and Munro, A. Integrating Pedagogical Agents into Virtual Environments. Presence, in press.
[6] Stiles, R., McCarthy, L., and Pontecorvo, M. "Training Studio: A Virtual Environment for Training," 1995 Workshop on Simulation and Interaction in Virtual Environments (SIVE 95), Iowa City, IA: ACM Press, July 1995.
[7] Johnson, W. L. and Rickel, J. "Intelligent Tutoring in Virtual Environment Simulations," ITS '96 Workshop on Simulation-Based Training Technology, June 1996.
Skill Acquisition and Assessment
Artificial Intelligence in Education
S.P. Lajoie and M. Vivet (Eds.)
IOS Press, 1999
A Combination of Representation Styles for the Acquirement of Speech Abilities
Virginie Govaere
LORIA & Universite Henri Poincare
B.P. 239, F-54506 Vandoeuvre-les-Nancy, France
Virginie.Govaere@loria.fr
Tel. [33] (0) 383 59 20 74
Abstract

In this article we present GEREV, a guidance and assessment system for voice rehabilitation, and in particular its system for representing and structuring the manipulated knowledge. The first point we develop concerns the details of the domain knowledge. Contrary to many Intelligent Tutoring Systems (ITS), which support the acquisition of reasoning (for example in geometry) or of declarative knowledge, we work on the acquisition of a norm for acoustic parameters; these can be defined as a set of physical quantities, some of which constitute the standard production intervals for categories such as male or female voices. In rehabilitation, the norms must be adapted to the student's possibilities. We have therefore centred part of our work on the style and structuring of knowledge that will permit us to adapt the norm. Next, we present our choice of representations and their implications. We qualify the quantitative values with the help of fuzzy sets, which allow us, on the one hand, to pass from the quantitative to the qualitative and, on the other hand, to obtain as exhaustive and precise a representation as possible. At the level of the knowledge unit itself - the ability, defined as the entire body of information that defines the representation of an acoustic parameter - we use frames, which precisely describe prototypes of knowledge in the domain. As for the global organisation of the expertise, we chose to describe it with a network of frames expressing the relations between abilities. This combination of representation styles is necessary in order to express all the details of the knowledge. However, it entails modifications within the expert and student modules, which we describe before presenting, in conclusion, an outline of thoughts on the portability of our work to other systems, and finally the perspectives raised by the problems of implementing such a system, of student assessment, and of correction and guidance in GEREV.
1 Introduction
The theoretical objective of this work is to model a student, with the aim of better adapting the interactive training environment placed at the learner's disposal. In order to individualise a system, it must notably maintain information on the user [4]. With this in view, we examined the different types of knowledge used by the training system: their form and their organisation, as well as their use in guidance [5]. The originality of our contribution lies, on the one hand, in the type of knowledge manipulated and, on the other hand, in its organisation. In this article, we first present GEREV, a system for the guidance and assessment of learning within the speech rehabilitation system SIRENE; we then discuss the features of the representations, which leads us to an exposition of our representational choices. Then we consider propositions for GEREV's architecture, and we conclude with reflections on the portability of the theoretical and practical contributions of this work. Finally, in the last section, we evoke perspectives for GEREV bearing on the problems of establishing such a system, and of assessing, correcting, and guiding students in GEREV.
2 Objectives and presentation of the GEREV system
GEREV is software under development within an existing environment, SIRENE [3]. Its function is to propose assessment and guidance adapted to the student. Distinctive features of GEREV include, on the one hand, the aim of training towards a norm rather than a set of knowledge and, on the other hand, the possibility of modifying this norm according to the evolving abilities of the student. Our objective is for the learner to acquire intelligible speech, which can be defined with the help of a particular set of abilities. What we refer to as an ability throughout this article is the entire set of data on an acoustic parameter such as intensity or pitch. As intelligible speech is not defined by minimal necessary and sufficient data, we do not ask the student to produce a value of an acoustic parameter within a given range, but rather to tend towards a norm while accounting for his or her deficits. On this last point, our approach distinguishes itself from the many ITS which apply to knowledge training, such as the training of languages or of geometry. SIRENE is a software system intended for deaf adults. It presents a graphic visualisation of several parameters of speech, such as sound intensity, fundamental frequency, the rhythm of utterances, and the components of articulation. Normally, speech is controlled via auditory feedback; for deaf people, this feedback cannot occur. The principle of our software is relatively simple: it affords a visualisation of the acoustic parameters necessary for establishing and mastering intelligible speech. These parameters are regrouped into three large categories: voice, articulation, and prosody. In the present state of the system, SIRENE relies on the intervention of the speech therapist, who calibrates the exercises according to his or her level of requirement and the abilities of the subject. The assessment and advice in SIRENE are general and, in fact, must be adapted to the student. Therefore, if one wants to permit regular training by an autonomous system, it is necessary to attack the problem (currently not treated in the software) of generating advice and guidance adapted to the student. From this perspective, we propose an autonomous system. It will contain, notably, knowledge about the student and about the reference domain. The speech therapist will parameterise the system while accounting for data on the student, such as sex and the degree of deficiency in the different groups of parameters. According to these data and the current observations, the software will propose certain exercises and advice, and will be able to propose a different level of requirement according to the abilities of the subject. This operates at the level of domain knowledge, through an adaptation of the norm.
3 Features of manipulated knowledge
As we have seen, the manipulated knowledge concerns speech. The specialist expresses domain knowledge with the help of qualitative statements of the form "the intensity is weak". The student's speech is controlled and measured objectively, with a
certain number of acoustic parameters (pitch, voicing, intensity...). The quantitative measures constitute only a first stage of treatment, since these raw data are not directly usable: each numeric value of a parameter does not in itself suggest a qualitative assessment. It is therefore necessary to transform these quantitative data into qualitative data. For that, it is necessary to form groups of numeric values which correspond to the qualitative data. Nevertheless, it is not suitable to use strict ranges of parameter values. Indeed, we must be able to take account of the proximity of values: it is quite inconceivable that two near values, such as 40 dB and 41 dB, should be categorised as belonging to two qualitatively different groups of data. It is thus necessary to have recourse to a style of knowledge representation which permits the transformation from numeric to qualitative while accounting for the proximity of the numeric values, so as to be able to use a faithful picture of the subject's production and to acquire sufficient information to account for the evolution of his or her abilities. However, our objective is not to collect the maximum amount of data on the student, but rather to build up sufficient information to which the system can then refer for individualising the guidance of the learning and for updating the norm initially aimed at. Effectively, these data must be reliable and pertinent for the system [10]. The representation of an ability does not contain only numeric values. Other information is necessary to represent an ability, such as the designation of the ability, the unit of measurement, the target knowledge, and the advice. It is necessary to structure this information, and the structure is common to all abilities. It is therefore pertinent to use prototypes that can be instantiated, on the one hand for every ability, by filling in the data specific to that ability, and on the other hand for every student, by filling in the values of his or her productions. The last characteristic of this knowledge is that the abilities are not independent of one another. For example, if one wants to pronounce a vowel in a steady and intelligible manner, one must have a steady fundamental frequency. Therefore, it is not only necessary to organise the knowledge of each parameter with regard to the others, but also to propose a representation that expresses relations of dependence and proximity between abilities. The chosen representation system must preserve the relations of mutual dependence and the proximity of mechanisms between abilities. We have thus far presented the characteristics and the essential requirements of the knowledge manipulated in this system. In the following section, we expose the representation styles retained.
4 Representation mechanisms in GEREV

4.1 From the quantitative to the qualitative
For the speech therapist, domain knowledge is expressed in the following form: "the intensity must remain at a medium level". In GEREV, we have chosen to represent speech abilities in a form as close as possible (identical, ideally) to that manipulated by the specialist. This choice must permit us to transform the quantitative into the qualitative while maintaining the distances between numeric values. It therefore appeared that fuzzy qualification would be the most suitable for this type of representation. Thus, for each acoustic parameter, one defines fuzzy labels such as "weak", "normal", "strong". The parameter categories constructed with the help of these fuzzy sets constitute the reference knowledge with which the system works. The use of fuzzy sets makes it possible to represent the speech abilities manipulated by the learner in the same way as the specialist does, and to benefit from fuzzy-set categorisation. This last point is particularly important for the reliability of the data representation. With this type of qualification, a precise picture is obtained by giving a degree of membership of the learner's production in each of the fuzzy sets defined for the ability. This information is stored in vectors. Each vector contains as many components as there are categories for the parameter. The value of each component, between 0 and 1, indicates the degree of membership in one of the parameter's categories.
Intensity frame: an ability measured in dB, with category values between 10 and 150; membership functions for the categories (weak, medium, strong, very strong); value of production (the membership vector, here (0 0 .7 .45 0)); target category: medium intensity; if the value of production differs from the target category: re-education as a function of the value of production, plus adapted advice.
Figure 1: an outline of the intensity frame
For example, for the parameter intensity, several fuzzy labels are defined: "very weak intensity", "weak intensity", "medium intensity", "strong intensity" and "very strong intensity". The intensity vector therefore has five components: (0 0 0 0 0). In Figure 1, we propose an example of what the fuzzy sets and their distributions could be for the intensity parameter. An intensity of 68 dB would be categorised with a degree of membership of .70 in "medium intensity" and .45 in "strong intensity"; its vector will be (0 0 .7 .45 0). Here, .7 and .45 do not express the probability that the student's production belongs to a category [1], but rather its membership in the parameter's categories. One therefore has, in a way, a picture of the same production from different perspectives, which this type of representation facilitates: no information is lost. Moreover, the use of fuzzy sets makes it possible to account for the evolution of an ability over all the student's answers: by comparing two vectors, the system can say whether there has been an evolution of the ability concerning this acoustic parameter.
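The categorisation can be sketched with simple trapezoidal membership functions. The boundaries below are invented so as to reproduce the example vector; in GEREV they would be fixed with the speech therapist.

    # Sketch: fuzzy categorisation of one intensity value (boundaries are assumed).
    def trapezoid(x, a, b, c, d):
        """Membership rises on [a,b], is 1 on [b,c], falls on [c,d]."""
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)

    LABELS = [("very weak",   (10, 10, 20, 30)),
              ("weak",        (20, 30, 40, 50)),
              ("medium",      (40, 50, 65, 75)),
              ("strong",      (63.5, 73.5, 85, 95)),
              ("very strong", (85, 95, 150, 150))]

    def membership_vector(x):
        return [round(trapezoid(x, *params), 2) for _, params in LABELS]

    print(membership_vector(68))   # [0.0, 0.0, 0.7, 0.45, 0.0], cf. (0 0 .7 .45 0)

Comparing two such vectors taken at different moments is then enough to detect an evolution of the ability.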
4.2 Representation and organisation of an ability
We have chosen to represent the student's productions in a qualitative way, thanks to fuzzy sets. Nevertheless, this represents only one part of the data necessary for modelling an ability. Indeed, the representation of an ability contains information on the production's value, its designation and its type, as well as an organisation common to all abilities. These two notions (the structuring of different information and the identical structuring of all abilities) made us choose frames. The frames formalism can be compared to forms which include slots to fill in and slots filled by default; it represents prototypes of objects [6]. A frame possesses several essential properties, notably default values associated with an attribute when the real value is unknown, constraints that the attributes must satisfy, and procedures that are triggered when the value of an attribute is required or newly supplied [2]. The idea of a prototype is interesting in this system. It intervenes, on the one hand, for each ability, by bringing in the data specific to each parameter (its unit of measure, the advice, the target categories), and on the other hand, for each learner, when the system is instantiated with a numeric value transformed thereafter into qualitative terms. On this last point, frames allow a system to be used despite incomplete or evolving knowledge of the learner (values by default or awaited values [6]). Indeed, our student profiles are data structures that are prototypical in their form and in their content, in which the entry of data is awaited (either the learner's production or manual entry by the specialist) in order to
particularise the model and thus the system. We present in Figure 1 an outline of the minimum information which a single frame will contain: information pertaining to the designation of the frame; the reference knowledge, represented by a distribution of fuzzy sets (described in section 4.1); the target knowledge, which is the category towards which the learner's productions must tend; the advice attached to the reference categories; and the procedure to follow in case of deviation from the target category. Other information can also be represented, such as the exercises in which the ability is treated.
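A minimal sketch of such a frame as a prototype with default and awaited slots (the slot names below echo Figure 1 but are our own invention, not the system's actual slot names):

    # Sketch of the ability frame as an instantiable prototype (illustrative).
    import copy

    ABILITY_FRAME = {                  # the prototype: slots with default values
        "designation": None,
        "unit": None,
        "reference_sets": None,        # the fuzzy sets of section 4.1
        "target_category": None,
        "advice": {},                  # advice attached to each reference category
        "production": "awaited",       # filled in when the learner produces a value
    }

    def instantiate(prototype, **slots):
        frame = copy.deepcopy(prototype)
        frame.update(slots)
        return frame

    # Per-ability instantiation: fill in the data specific to the parameter.
    intensity = instantiate(ABILITY_FRAME, designation="intensity", unit="dB",
                            target_category="medium",
                            advice={"weak": "take a deep breath before beginning..."})
    # Per-student instantiation: the production slot is filled, the rest inherited.
    student_intensity = instantiate(intensity, production=[0, 0, .7, .45, 0])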
4.3 Organisation of abilities
Having chosen the type of representation of an ability, we are now interested in the type of structuring between abilities. Taken together, the abilities form a knowledge system that constitutes the reference domain, considering only the abilities and not their particularisation by learners. To form this structured system, it is necessary to call upon a type of representation which makes it possible to organise each piece of knowledge with regard to the others, and to express relations of dependence and proximity between them. A network of frames is the most adequate means to fulfil these requirements. We have thus seen how to represent all the knowledge manipulated, from the numeric input to the organisation of the knowledge system. We are now going to consider the integration of this representation system into the different components of GEREV: the expert and student modules.
5 Proposition of a software architecture for GEREV
An ITS typically has four components [7]:
- the expert module, which contains the reference knowledge and has an active role because it generally contains the strategies used by the expert;
- the student model, called the student module in what follows, which represents the student's knowledge, possibly together with other data such as age, sex, motivation, and the learner's level;
- a pedagogical module, which contains strategies bound to the aims of the training, such as training by trial and error, etc.;
- communication interfaces.
During the last ten years, the most studied of these components has been the student module. Used appropriately by the pedagogical module [9; 10], the student module should allow individualisation of the teaching. In the remainder of this exposition, we consider mainly the expert and student modules, as they are essentially the ones to which the problem of representing knowledge, or abilities in our case, is bound.
5.1 The expert module
In this section, we consider in turn the following two elements:
- the data attached to the reference knowledge;
- the reference knowledge itself.
With regard to the expert module, we propose a singular conception. First, there is no place for the equivalent of problem-solving strategies because, in the process of speech production, there exist no conscious strategies on the part of the speaker, nor of the expert. Second, we attach the advice to the parameter categories in this module, and not in the pedagogical module. The explanation of this last point is quite simple: the advice depends on the type of answer produced, and not on the teaching strategy used or on the subject. Indeed, if the intensity of the sound production is too weak, it is necessary to give advice that will enable the subject to compensate for this deficiency, such as "take a deep breath before beginning..."; the advice does not depend on which subject we are dealing with or which pedagogical strategy we are using. In GEREV, what varies from one subject to another is, on the one hand, the choice of the type of help presentation bound to the rehabilitation, or the order in which the specialist decides to illustrate the rehabilitation (for example, for the exercise on vowel articulation, the system can choose to propose a cross-section of the vocal tract
schematising the articulation of the target vowels, or a representation of the gap between the target production and that of the subject); and on the other hand, the degree of requirement selected to judge the membership of the production in a fuzzy set defined for the ability. Therefore, all learner answers categorised as belonging to the same knowledge receive the same advice. The knowledge manipulated by the expert takes two forms: target knowledge, i.e. the performance levels to be ideally attained (the norm), and reference abilities, which are the basis from which the system treats the student's productions (both forms are represented with fuzzy sets, organised in parameter frames). This translates, in the expert module, into a marking of the fuzzy set which must be reached (e.g. medium intensity) as the target, together with the other sets bound to the ability in question. The learner succeeds when he or she produces an answer which is categorised in the target fuzzy set with a high enough degree of membership. Thanks to the representation of abilities with fuzzy sets, we obtain a representation of the whole of the knowledge domain. Consequently, all the potential answers are modelled in this module, correct answers as well as divergent ones, and this quickly and in an easily implementable way.
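The success test can be sketched as follows (our illustration; the threshold standing for the 'degree of requirement' and the slot names are assumptions):

    # Sketch: judging a production vector against the marked target category.
    LABEL_INDEX = {"very weak": 0, "weak": 1, "medium": 2,
                   "strong": 3, "very strong": 4}

    def judge(frame, vector, requirement=0.6):
        target = frame["target_category"]
        if vector[LABEL_INDEX[target]] >= requirement:
            return "success"
        # Otherwise: the same advice for every answer falling in the same category.
        best = max(LABEL_INDEX, key=lambda label: vector[LABEL_INDEX[label]])
        return frame["advice"].get(best, "try again")

    frame = {"target_category": "medium",
             "advice": {"weak": "take a deep breath before beginning..."}}
    print(judge(frame, [0, 0.8, 0.3, 0, 0]))   # -> the advice attached to 'weak'

Lowering the requirement threshold is one way of expressing different levels of requirement for subjects with an important deficit in the given parameter.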
5.2 The student module
This module is, in a sense, an instance of the expert module. Indeed, the ideal objective for the learner is to reach the expert's performance. The initial network of frames corresponds to a theoretical student model. The entry of individual data such as the learner's identity, sex, and expertise in the various abilities leads to the initialisation of this user profile. Theoretically, our conception of the system approaches an overlay model [7; 11]. Effectively, the system's objective is to bring students to the norm, i.e. the abilities conceptualised by the expert. Nevertheless, this module is not a subset of the expert model, since what is at stake is the acquisition of a norm and not of a set of knowledge or an ability. The expert is therefore not considered here as a "super-user" who masters a vast domain of knowledge and who has one particular type of usage (at the level of strategy, for example); he is rather a reference towards which the level of performance must be aimed. The main difficulty encountered in overlay models is the problem of covering the knowledge domain. Classically, the expert's knowledge is represented as a set against which the student's knowledge is compared; any student answer not present in the expert's knowledge is considered false. In this conception, the learner's acquisitions and gaps are represented. In order to take into account the learner's deviations, and therefore to allow the system to treat this type of information, system designers build up catalogues of mistakes. This construction is expensive in time and does not necessarily foresee the entirety of potential answers. In our case, the use of fuzzy sets allows us to cover the whole of the learner's answers without previously having to build up catalogues of mistakes, and without the risk of forgetting a case or a combination of cases.
6 Conclusion
After a summary of the choices made in representing abilities in this system and their principal advantages, we propose an outline of reflections on the potential generalisation and the limits of our work. Our choices of representation are as follows:
- an ability is qualified by fuzzy sets (the qualification of data preserves the same type of knowledge representation as that manipulated by the specialist, qualifies a production progressively thanks to degrees of membership in several fuzzy sets at the same time, and takes into account the evolution of abilities);
- an ability is an elementary data set organised in a frame (proposing a data organisation in which there is a skeleton of information, with default data and data to be entered into the system);
- the set of abilities, i.e. of elementary frames, whether in the expert module or in the student module, is organised in a network of frames (informing on the proximity and the
relations amongst the different abilities or knowledge, and helps in structuring the knowledge of the system). The problem now is to know the extent to which this combination of types of representation can be generalised to other systems. For clarity of exposition, we consider the different types of representation in turn. The use of fuzzy qualification permits us to represent the entirety of potential student answers easily. This is no insignificant contribution, as it theoretically suppresses the major shortcoming of overlay models. Nevertheless, it is not fuzzy logic itself that solves this problem. Indeed, this is possible to the extent that the domain knowledge is given by imprecise data (measures of parameters) that can be categorised in qualitative terms (covering all values consistent with the abilities). The knowledge used must therefore be in concordance with the fuzzy representation type in order to take full advantage of the latter. It may nevertheless be judicious to qualify student answers by fuzzy sets of the type "very close to the correct answer". This would allow the construction of a student model that takes into account not only the student's answer, but also its gap with respect to the target answers. However, the conception of these sets would be relatively expensive, as it would first be necessary to conduct a study to foresee all cases and to appraise their gaps with the target. The organisation of data with frames is very interesting in EIAO, whatever the nature of the knowledge representation. Indeed, it allows the student model to be constructed with default values. On the one hand, this permits the system to function despite the incompleteness of the model; on the other hand, those values are subsequently re-estimated in order to work out the student's profile, and this organisation also proposes a knowledge structure that anticipates future data. The last type of organisation, the network of frames, is quite applicable when the information that the system manipulates embodies notions of dependence or proximity between concepts or knowledge. These notions are independent of the type of information itself, but allow the addition and organisation of data relative either to the domain of application or to the student. They therefore appear fundamental in the development of the system.
7. Perspectives
To conclude, we succinctly present the problems of implementing GEREV, and of student assessment, re-education and learner guidance in exercises (the specifications are finished, the interface is realised, and the GEREV system is currently being coded). The difficulty encountered in implementation lies, on the one hand, in the information exchanges between GEREV and SIRENE and, on the other hand, in the conception of the reference sets (fuzzy sets). For the latter, it is necessary to establish the limits of each set with the help of the speech therapist. The exactness of the categorisation of the subjects' answers will partly depend on this setting of limits. Nevertheless, while this stage is important, it does not appear insoluble or particularly expensive in the development process. As for the exchanges of information between GEREV and SIRENE, we foresee the use of an architecture of the "epiphyte" type [8], which should allow the two systems to function independently. This type of architecture allows GEREV to constantly take information on the subject's productions from SIRENE without interrupting the latter, and also to play the role of the speech therapist by sending back data resulting from the assessment, which will guide the subject through the different exercises. Several times in this article, we have evoked the adaptation of the reference knowledge to the learner. We foresee that this can be achieved by displacing the limits of the fuzzy sets. If a student presents a very weak intensity, these fuzzy sets will be displaced so that "normal intensity" lies between 40 and 60 dB rather than between 50 and 70 dB. This example is theoretical, since no experimentation or study has yet been carried out that would allow the reference knowledge to be adjusted to a given subject, especially at a given moment. There are different levels of requirement, since an answer can be accepted as normal, for subjects with an important deficit in the given parameter, even when the production is relatively distant from the "medium parameter" sets. As for re-education and student guidance, two types of information are taken into
account: data on the subject's performance on a parameter, and the prerequisites to the rehabilitation of parameters. Work to rehabilitate a parameter is undertaken when the subject's reference category is distinct from the target category. The guidance of the student through the exercises making up the rehabilitative progression is based on prerequisites for each exercise. To perform an exercise, all parameters relevant to its realisation must either be categorised in the "medium parameter" sets or be chosen deliberately by the learner or the specialist. Finally, we foresee two types of assessment of the subject: assessment in context, which corresponds to assessing an ability in an exercise specifically designed to assess it; and assessment out of context, which corresponds to assessing an ability in exercises not directly aimed at its rehabilitation, but in which it plays a part.
References
[1] J. Beck, M. Stern and B. P. Woolf, Using the Student Model to Control Problem Difficulty. User Modeling (1997) 277-288.
[2] J.-P. Haton, N. Bouzid, F. Charpillet, M.-C. Haton, B. Laasri, P. Marquis, T. Mondot and A. Napoli, Le raisonnement en intelligence artificielle. InterEditions, 1991.
[3] M.-C. Haton, Issues in self evaluation and correction of speech. In K. Ponting (ed.), Computational Models of Speech Pattern Processing. Springer-Verlag, 1998.
[4] R. Lelouche, The successive contributions of computers to education: a survey. European Journal of Engineering Education, 1998, 23(3), 297-308.
[5] S. Leman, P. Marcenac and S. Giroux, Reconnaissance et modélisation du raisonnement d'un apprenant : une approche multi-agent. RFIA : reconnaissance des formes et intelligence artificielle, 1996, 1, 367-376.
[6] M. Minsky, La société de l'esprit. InterEditions, Paris, 1988.
[7] J.-F. Nicaud and M. Vivet, Les tuteurs intelligents : réalisations et tendances de recherches. Techniques et Sciences Informatiques, 1988, 7(1), 21-45.
[8] F. Pachet, J.-Y. Djamen, C. Frasson and M. Kaltenbach, Un mécanisme de production de conseils exploitant les relations de composition et de précédence dans un arbre de tâches. Sciences et techniques éducatives, 3(1) (1996) 43-73.
[9] J. A. Self, Bypassing the intractable problem of student modelling. In C. Frasson and G. Gauthier (eds.), Intelligent Tutoring Systems: At the Crossroads of AI and Education, 1988.
[10] J. A. Self, Formal approaches to student modelling. In J. E. Greer and G. I. McCalla (eds.), Student Modelling: The Key to Individualized Knowledge-Based Instruction. Springer-Verlag, 1994, 295-354.
[11] K. VanLehn, Foundations of Intelligent Tutoring Systems. Lawrence Erlbaum Associates (1988), 55-76.
Artificial Intelligence in Education S.P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
An Evaluation of the Impact of AI Techniques on Computerised Assessment of Word Processing Skills
R. D. Dowsing and S. Long
School of Information Systems, University of East Anglia, Norwich NR4 7TJ, UK
Email: {rdd,sl}@sys.uea.ac.uk
Abstract: Despite the early promise of Artificial Intelligence (AI) techniques in educational systems, such techniques are still rarely used outside the laboratory. This paper presents evaluation results for a computerised word processing assessment system which was conceived to address a practical educational problem; the AI component was added in response to system performance requirements, rather than being the focus of the system. The automated assessor is intended to reproduce the performance of human examiners in professional word processing examinations. Different, increasingly "intelligent", versions of the assessor are compared in an empirical study on large sets of authentic test data. The evaluation strategy used follows the Expert System model of comparing system performance to human expert/correct performance. The results show a direct correlation between the use of explicit knowledge representation, that is, increased system intelligence, and improvements in the assessor's performance. By using AI techniques the system performs authentic examination assessments with accuracy, consistency and speed.
1. Introduction
The great promise of the Artificial Intelligence in Education (AIED) work carried out in the 1970s and 80s, typified by Intelligent Tutoring Systems (ITS) such as Clancey's GUIDON [4] and Anderson's LISP Tutor [18], has so far not led to the widespread use of AI techniques in "real world" educational systems. In fact, the number of AIED systems currently in use outside the laboratory is very small. Several reasons for this poor take-up have been proposed. At a recent forum [8] Self noted that AIED system development tends to be top down, in that a theory is first developed and a suitable domain attached afterwards. This often leads to theoretically interesting systems of little practical use [11]. At the same forum Breuker highlighted the fact that the long-term industrial investment needed to develop useful systems has so far not been forthcoming [9]. Self identified several real world educational applications where Artificial Intelligence (AI) techniques are being used [11]. These systems were developed bottom up, to solve particular educational problems, and AI techniques were introduced only as system behaviour required them. Despite the current lack of examples of AIED applications in real world use, evaluation results are available [10] which demonstrate the benefits of AI in educational software. This paper is intended to add to that body of evidence by comparing the performance of different versions of the same practical educational system incorporating different levels of "intelligence". The experimental systems compared in this paper constitute various stages in the development of an automated assessor of word processing skills. Increasingly large numbers of candidates, both in academic and vocational settings, are being required to demonstrate skills in word processing. Human-based assessment of such skills is time-consuming, expensive, and error prone, and so the idea of computerising the role of the human examiner in the assessment process was conceived. It was recognised at an
early stage that automated assessment might not completely replace human examiners. Computerised assessment can deal with the majority of solutions, which are relatively easy to assess, and human examiners can deal with solutions which are too difficult to assess by computer. Increased system intelligence represents additional development time, and therefore cost. The experiments described here were employed to establish when a suitable level of performance, measured by the accuracy of assessment results, was achieved, and thereby identify when development of the assessor could stop. The computer-based assessor described in this paper is different from traditional AIED systems, for example ITSs, in that it was designed solely to automate a summative assessment process, presently carried out by human examiners, and not to provide any tutorial interaction, although the technology can be applied to formative assessment or tuition. The expert system evaluation model is employed, where system performance is compared to that of a human expert, or some other benchmark of appropriate performance [1]. Only those issues relating to the system's ability to reproduce the appropriate assessment results are discussed in this paper; other descriptions of the system can be found in [5, 6].
2. Background
2.1. Word processing skill
The use of word processors to create and edit documents has, until recently, been the domain of the secretary. However, with the proliferation of the personal computer, a large proportion of people in work, academia or home environments now have occasion to produce documents using word processors. Traditionally, documents were distributed and read as hard copy, but with the growing power of the internet and electronic mail, they are increasingly being distributed and viewed electronically. As with any skill that involves the creation of some artefact, two components of the skill are available for analysis, what Rowntree calls the product and the process [19]. For word processing, the product is the completed document which constitutes the final output of the task, be it a business letter, a memo, a report, etc., either printed out on paper or as a machine-readable file. The completed document has two principal features: the textual content, and the style information referring to how the text is formatted and set out on the page. The document is readily available simply by saving and/or printing from the word processing application. The process is the solution path or method and consists of the sequence of edits, operations or actions carried out during the course of a word processing session. The solution path can be gathered either by using dedicated software to log the events in the word processing session, or by direct observation.
2.2. Assessment
Assessment, in this paper, is taken to be the process of establishing the levels of skill and/or knowledge of the candidate by means of the collection and analysis of evidence. The process of assessment can be divided into the following generic stages [13]:
1. Define assessment objectives and requirements.
2. Collect evidence.
3. Match evidence to objectives and requirements.
4. Make judgements based on match results.
Most professional vocational awards in word processing follow this model and incorporate the principles of competence-based assessment [13] and Mager's Criterion-Referenced model [16], which stress the importance of well-defined performance criteria and valid evidence of skilled performance. Detailed criteria are prepared outlining performance objectives and error tolerances. These are used to guide exam design and course design, and to aid examiners in marking. The evidence for the assessment is provided by the final paper documents produced during an authentic word processing session, rather than the process or
method employed. This is partly because, in terms of business and vocational requirements, the final product is more important than the method, and partly because it is much easier to organise the assessment of paper-based documents by human examiners. It also means that any word processing application can be used (provided it is connected to a printer) to produce artefacts which can be assessed using the same criteria. The assessment process performed by human examiners relates mainly to stage 3 above. The evidence of performance, that is the completed document, is matched with the pre-set assessment criteria to check that performance criteria are met. Essentially, this involves detecting and classifying errors. The final stage of professional word processing assessment involves comparing the number of errors detected with predefined tolerance criteria in order to establish an overall classification for the candidate (corresponding to stage 4 above).
2.3. Systems Evaluation
Evaluations of ITSs and other AIED systems generally concentrate on demonstrating the effects of the ITS on student learning [10]. Littman and Soloway [12] call this external evaluation and sum it up with the question "what is the educational impact of an ITS on students?" External evaluations have been carried out which compare the effects on student learning of different "smarter or dumber" versions of the same system [10]. Littman and Soloway also identify the need for "internal" evaluation, which addresses the question "what is the relationship between the architecture of an ITS and its behaviour?", and report results of both kinds of evaluation for their PROUST Pascal ITS. Systems which are not designed to teach, but rather to reproduce the performance of a human expert, cannot be evaluated by studying teaching outcomes. The system described in this paper is designed to emulate a human examiner performing a summative assessment for accreditation purposes. The evaluation process described here is therefore more akin to the evaluation of Expert Systems [1], involving the empirical "testing of the knowledge base against the judgmental accuracy of experts and ground-truth measures of accuracy" [1, p. 6]. Littman and Soloway's internal evaluation is also employed to explain system performance in terms of system architecture. Thus the evaluation process is concerned with the performance and knowledge structures of the system, rather than learning improvements.
3. Why automate assessment?
Professional word processing examinations, marked by human examiners, are well established. Experienced examiners are very good at classifying errors in accordance with complex assessment criteria, and mark many thousands of exam solutions each year. However, traditional methods are not without problems. Human examiners make mistakes, such as failing to detect potential errors, because they become tired or because their concentration lapses. In addition, there is a risk that different human examiners may interpret complex errors differently, leading to inconsistency. This is one reason why Examinations Boards hold standardisation meetings. Paper-based examinations are also time-consuming to mark, involve large amounts of paper pushing, and are very difficult to audit. Computer-based assessment, assuming sufficient levels of accuracy can be achieved, promises not only faster assessment, but also more consistency and the ability to audit, and statistically analyse, all results.
4. Overview of basic automated assessor
The basic automated assessor, which was developed to assess word processing examinations, follows the professional assessment model by assessing the products of authentic word processing tasks; the fundamental difference being that solutions are presented for assessment as machine-readable files rather than in hard copy. The current system inputs Rich Text Format (RTF) files [21], which can be produced by most modern word processors.
The basic automated assessment process relies on two assumptions for each document to be produced in an examination: 1) there exists a document, called the goal document, which is the result of correctly carrying out the instructions in the exam paper, and 2) evidence for the assessment can be provided by comparing the candidate's final solution, the candidate document, to the goal document. The basis of this document comparison process is a published algorithm [15] which can compute the minimum number of differences between two strings of tokens and produce a minimum-length list of deletes and inserts necessary to convert one string into the other. The assessment process in the earliest systems comprises the following stages:
1. Tokenise text strings from goal and candidate documents into lists of words
2. Compare word lists to produce a list of differences
3. Count differences as errors and compare to pass/fail criteria
This flat comparison approach is suitable for simple assessment criteria based on raw differences. However, early testing showed that treating all differences as errors and using fixed counting strategies is too simple to support the more complex assessment criteria of the kind used by professional examinations boards. The difference-based approach provides evidence for assessment, but more sophisticated evidence and criteria matching procedures are necessary for complex assessment criteria.
5. Target word processing examination
The evaluation results in this paper refer to versions of the system whose knowledge resources have been targeted at a particular examination scheme, the RSA Examinations Board "word processing stage 1 part 2" [20]. Following this scheme, three short documents are produced/edited during an examination of one and a half hours. Instructions are presented on paper showing what alterations to make to the original documents. The skills required to successfully complete the instructions include inserting, deleting, replacing and moving text; formatting characters, paragraphs and pages; altering the layout of the document; and creating or editing tables. Examinations can be taken, using any word processor, at one of the hundreds of examination centres across the UK. Final document solutions are printed out and sent to human examiners for marking. The human examiner studies the final solution for errors, and assessment criteria are applied which dictate how to count those errors, leading to a final classification of distinction, pass or fail. Errors can be counted once for each word in error, once per instance (contiguous group of words in error), once per examination, or simply ignored. Assessment criteria map errors of different types, in different contexts, to these error counting methods. For example, superfluous additions of words are counted once per word, while failure to delete text present in the initial document is only counted once per instance. Similarly, failure to emphasise text as instructed is counted once per examination, but additional emphasis or underlining in headings is ignored. Criteria also dictate how the total error count relates to the overall classification of the candidate, that is, whether a distinction, pass or fail has been achieved. In the target examination 3 or fewer errors achieve a distinction, 4 to 7 errors a pass, and more than 7 errors indicate a fail.
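To tie the two preceding sections together, here is a minimal Python sketch of the flat, difference-based assessment. It is ours, not the authors' implementation: we use the standard difflib module in place of the Miller and Myers algorithm [15] that the system actually uses, we apply only the simplest once-per-word counting strategy, and the sample texts and thresholds are illustrative (the thresholds follow the target examination's criteria).

    import difflib

    def tokenise(text):
        # Stage 1: split a document's text into a list of word tokens.
        return text.split()

    def differences(goal_text, candidate_text):
        # Stage 2: compute the insert/delete/replace differences between the
        # goal and candidate word lists (difflib stands in for the minimal
        # edit-script algorithm cited in the paper).
        sm = difflib.SequenceMatcher(None, tokenise(goal_text), tokenise(candidate_text))
        return [op for op in sm.get_opcodes() if op[0] != "equal"]

    def classify(goal_text, candidate_text, distinction_max=3, pass_max=7):
        # Stage 3: count differences as errors, once per word in error,
        # and map the total onto the distinction/pass/fail criteria.
        errors = 0
        for tag, i1, i2, j1, j2 in differences(goal_text, candidate_text):
            errors += max(i2 - i1, j2 - j1)   # words deleted, inserted or replaced
        if errors <= distinction_max:
            return errors, "distinction"
        if errors <= pass_max:
            return errors, "pass"
        return errors, "fail"

    goal = "Please send the report to head office by Friday"
    candidate = "Please send the reprot to office by Friday"
    print(classify(goal, candidate))   # (2, 'distinction')

The versions described next refine exactly the step this sketch oversimplifies: how raw differences are classified and counted.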
6. Experimental versions of the Assessor
Evaluations of early versions of the basic system showed that, while it could identify potential errors, it failed to reproduce the subtle error classification of the human examiners, and therefore very often misclassified candidates. A knowledge acquisition exercise [7, 14] was carried out involving a study of the target assessment criteria and a protocol analysis [7, 14] for expert human examiners assessing exam solutions. This led to the development of more intelligent versions of the system capable of carrying out more complex matches of assessment evidence to criteria.
6.1. Test Version 1: Structured Comparison
The version of the system used as the baseline performance comparison in this study is an improved version of the flat comparison model. The assessment process has various stages: 1) a pre-processing stage deals with special cases which can cause problems for the comparison engine, such as missing spaces between words; 2) the main comparison stage employs algorithms to detect all the differences between the goal and candidate files. This comparison is structured, grouping information into difference scripts according to broad type: simple text differences, text misplacement, white space and punctuation differences, case differences and format differences. 3) Given this structured difference information, independent error counting strategies are declared for each difference type, allowing some coarse-grain adaptability between counting strategies. The counting strategies used are as follows: simple textual errors, once per word; text misplacements, once per instance; white space/punctuation errors, once per examination; case errors, once per instance; format errors, once per examination. 4) Finally, error counts are related to distinction/pass/fail criteria to produce the candidate's overall classification.
6.2. Test Version 2: Context-free difference interpretation
This version employs the pre-processing and structured comparison stages described above to provide evidence for its assessment. However, a new post-processing stage is added after the main comparison, in which differences identified in the comparison stage are matched against higher-level error and edit patterns stored in a structured knowledge source. For example, the white space difference pattern replace "word.<space>" with "word. " would be reclassified as a missing paragraph error. A more complex rule base for linking these interpreted differences with error counting strategies is employed. For example, the following rule is used to count errors of the above type, where the priority field is included in rules for conflict resolution purposes [7, 14]:

if difference type is MISSING PARAGRAPH, error type is PARAGRAPH ERROR, count type is ONE PER INSTANCE with priority 2
6.3. Test Version 3: Context-sensitive error classification
In the target examination criteria, error counting strategies are often modified according to the context in which an error occurs. Accordingly, in this version, information concerning the active task and document region is attached to the goal document and, in turn, to the candidate document during the comparison stage. Error counting rules are augmented to include fields relating to task and document component context, and the rule base is expanded to cater for all relevant cases. An example of this type of rule is:

if difference type is MISSING PARAGRAPH and region is TABLE, error type is PARAGRAPH ERROR, count type is ONE PER EXAMINATION with priority 1
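The two quoted rules can be read as records in a small rule base. The Python sketch below is our reconstruction, not the system's code: we assume that a lower priority number means higher precedence, so the context-specific TABLE rule overrides the general rule inside tables, and we include the certainty flag that version 4 (described next) attaches to each rule.

    from dataclasses import dataclass

    @dataclass
    class CountRule:
        difference_type: str
        error_type: str
        count_type: str        # e.g. ONE_PER_WORD, ONE_PER_INSTANCE, ONE_PER_EXAMINATION
        priority: int          # assumed: lower number = higher precedence
        region: str = None     # context field added in version 3; None matches any region
        certain: bool = True   # certainty measure added in version 4

    RULES = [
        # The two rules quoted in sections 6.2 and 6.3:
        CountRule("MISSING_PARAGRAPH", "PARAGRAPH_ERROR", "ONE_PER_INSTANCE", 2),
        CountRule("MISSING_PARAGRAPH", "PARAGRAPH_ERROR", "ONE_PER_EXAMINATION", 1, region="TABLE"),
    ]

    def select_rule(difference_type, region):
        # Keep the rules whose type matches and whose region is unspecified or
        # equal to the current region; resolve conflicts by priority.
        candidates = [r for r in RULES
                      if r.difference_type == difference_type
                      and r.region in (None, region)]
        return min(candidates, key=lambda r: r.priority, default=None)

    print(select_rule("MISSING_PARAGRAPH", "BODY").count_type)   # ONE_PER_INSTANCE
    print(select_rule("MISSING_PARAGRAPH", "TABLE").count_type)  # ONE_PER_EXAMINATION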
6.4. Test Version 4: Context-sensitive error classification with uncertainty
An important outcome of the Knowledge Acquisition exercise has been an improved understanding of the differences between the human and the automated system, in particular with regard to the identification of errors which are difficult for the automated system to detect or classify correctly. For example, the human examiner has access to visual data, whereas the computer must infer visual information from the machine-readable document. It is therefore possible that errors (differences between candidate and goal document) detected by the automated system may actually be invisible on the page. An example is the addition of a superfluous new-line. If this occurs at the end of a line it is indistinguishable, on paper, from
a naturally occurring soft new-line. A list of machine-classified errors which may be prone to this uncertainty has been developed. The notion of uncertainty is incorporated into the system by means of a simple certainty measure [7], attached to every error counting rule, indicating whether the rule is certain or uncertain. This essentially allows a maximum and minimum error count to be maintained. If these counts fall into different final categories (distinction, pass, fail) then the final result is uncertain and the candidate solution is marked for reassessment by a human examiner. The size of the rule base is also increased, as some error counting rules can have different certainty values in different contexts. This use of uncertainty allows the system to qualify its final judgement about the examination candidate.
7. Evaluation
The performances of the various versions of the automated assessment system were compared for a sample of candidate solutions obtained from two authentic examination scripts. The benchmark for comparison of all systems is the correct assessment of candidate solutions, applying the target assessment criteria, following the Expert System evaluation model. The two examinations are identified below as Exam A and Exam B. Much of the system development was performed using test data for Exam A. Evaluation results for Exam B are also included here in order to demonstrate the transferability of the knowledge base. The same error counting rule base was used for each examination, although information about the specific location of document components was different for each goal document. 44 candidates' solutions were available for testing for Exam A, while 41 sets of solutions were tested from Exam B. These solutions were produced by candidates sitting these examinations at more than a dozen separate examination centres.
7.1. Dimensions and methodology of empirical study
The ability of the different system versions to produce the correct assessment results for all the candidates is compared by counting the number of candidates who receive correct or incorrect final classifications (distinction, pass or fail). An additional category is used for version 4 of the system to count the number of candidates' solutions which are marked for reassessment by a human examiner due to uncertainty.
7.2. Results
Figures 1 and 2 below show the performance of the different versions of the system for Exams A and B respectively. The Y axes show the percentage of candidates in each category, while the X axes refer to the system version (v1 - Structured Comparison, v2 - Context-free difference interpretation, v3 - Context-sensitive error classification, v4 - Context-sensitive error classification with uncertainty).
Figure 1 Exam A
Figure 2 Exam B
7.3. Observations
The performances of versions 3 and 4 are very similar for both sample examinations. For both sample examinations the number of erroneous classifications decreases as system versions become more intelligent. Version 3 of the system correctly classifies at least 90% of the sample candidates (93% for Exam A and 90% for Exam B), i.e. erroneous classifications do not exceed 10% of the sample. In version 4 erroneous classifications have almost disappeared (only 2% of the samples), although the overall number of correctly assessed solutions is slightly less than that produced by version 3, as some are now marked for reassessment. The performances of versions 1 and 2 for Exam B are significantly worse than the performances of the same versions for Exam A.
7.4. Discussion
The observations about system performance described above are now explained in terms of system architecture, following Littman and Soloway's internal evaluation model. Version 2 outperforms version 1 of the system for both examinations. This is because it is capable of finer-grain modification of its counting procedures in comparison to the more basic version. For example, it can distinguish between a "missing paragraph division" error and a "too many spaces" error, whereas version 1 would treat them both as white space errors. Versions 1 and 2 of the system perform significantly worse for Exam B compared to Exam A. The reason for this clearly demonstrates the impact of context sensitivity on the system. Close inspection of both examinations, and of the automated error scripts produced for each candidate, shows that Exam B contains a special table which has no equivalent in Exam A. Certain error types are treated differently when they occur in such an area of the document. Neither version 1 nor version 2 of the system employs knowledge of context, and therefore they cannot deal with errors occurring in that table in the correct way. Once context is added, the system can adapt its error counting appropriately to deal with this area, and no substantial difference between the two sets of data is noticed. Hence versions 3 and 4 achieve similar results for both sample examinations. The performance of version 3 of the system already compares favourably with that of expert human examiners, achieving a high level of accuracy and consistency. The impact of the addition of uncertainty into the system, and the facility to mark solutions for reassessment (version 4), is to reduce the number of erroneous classifications still further. This improvement in the reliability of results comes with the additional cost of reassessment of scripts which cannot be confidently assessed automatically. There is clearly a trade-off between accuracy and reassessment costs. The level of uncertainty employed in the knowledge base tested here achieves a high level of accuracy (only 2% errors), while keeping the number of scripts marked for reassessment to around 10% of the sample. It is possible to take a more conservative approach to uncertainty, by flagging more error counting rules as uncertain, and reduce the risk of system error to close to zero. However, this leads to a larger number of candidate solutions being reassessed, thereby defeating the object of automating assessment. Similarly, if fewer error counting rules are marked as uncertain, a larger proportion of the sample would be assessed with confidence, but the reliability of results might worsen.
Ultimately the body responsible for an examination has to trade off an acceptable level of error against the cost of human assessment and the error associated with it.
8. Future Work and Conclusions
The results reported above show that versions 3 and 4 of the system can achieve very high levels of accuracy and consistency by employing relatively straightforward AI techniques. While it would not be cost-effective to continue increasing system intelligence much further, one additional level of knowledge is being considered. The system currently treats errors as independent entities, in much the same way the GOMS model of skilled performance treats text processing tasks as independent unit tasks [2, 3]. However, on occasion, apparently separate errors can interact and modify the ways in which either, or both,
should be interpreted and counted. For example, an unsolicited paragraph error occurring next to a manual page number error should, in fact, be ignored. In order to deal with this in the architecture of the assessment system, a knowledge component of meta-rules is necessary to define how to deal with special combinations of error counting and classifying rules. This paper has described the impact of AI techniques on a system designed to tackle a complex educational problem. The AI methods employed convert it from being a very coarse-grain indicator of word processing skills to being able to carry out a sophisticated professional-level assessment with the adaptability of an expert human examiner. By showing that relatively straightforward AI techniques can be used to solve a practical educational problem, these results support Self's argument that such practically conceived systems represent a promising way forward towards mainstream usage for AIED in general.
9. Acknowledgements
We would like to thank the four UK Higher Education Funding Councils for funding part of the work on which this paper is based via a Teaching and Learning Technology Programme (TLTP) grant. We would also like to acknowledge the support of the OCR (formerly RSA) Examination Board for funding the work on professional examinations.
10. References
[1] Adelman, Leonard (1992). Evaluating Decision Support and Expert Systems. John Wiley and Sons, Inc.
[2] S. K. Card, T. P. Moran, and A. Newell (1983). The Psychology of Human-Computer Interaction. Hillsdale: Lawrence Erlbaum.
[3] S. K. Card, T. P. Moran, and A. Newell (1980). Computer text-editing: an information-processing analysis of a routine cognitive skill. Cognitive Psychology, 12:32-74.
[4] W. J. Clancey (1982). Tutoring rules for guiding a case method dialogue. In D. Sleeman and J. S. Brown (eds.), Intelligent Tutoring Systems (pp. 79-98). New York: Academic Press.
[5] R. D. Dowsing and S. Long (1997). The Do's and Don'ts of Computerising IT Skills Assessment. Proc. 14th ASCILITE Conference, Perth, R. Kevill, R. Oliver and R. Phillips (eds.), ASCILITE, CEDIR, University of Wollongong, NSW 2522, Australia.
[6] R. D. Dowsing, S. Long and M. R. Sleep (1996). The CATS word processing skills assessor. Active Learning, n. 4, July 1996. CTISS Publications, University of Oxford, ISSN 1357-1125.
[7] S. Dutta (1993). Knowledge Processing and Applied Artificial Intelligence. Butterworth-Heinemann.
[8] From IEE Colloquium on Artificial Intelligence in Educational Software, organised by Professional Group A4 (Artificial Intelligence), held at Savoy Place, London on Friday 12 June 1998. Digest No: 98/313.
[9] Breuker, Joost (1998). University of Amsterdam. What are intelligent coaching systems and why are they (in)evitable?
[10] du Boulay, Benedict (1998). University of Sussex. What does the "AI" in AIED buy?
[11] Self, John (1998). University of Leeds. Grounded in reality: The infiltration of AI into practical educational systems.
[12] David Littman and Elliot Soloway (1988). Chapter 8: Evaluating ITSs: The Cognitive Science Perspective. In Martha C. Polson and Jeffrey J. Richardson (eds.), Intelligent Tutoring Systems, pages 21-53. LEA.
[13] Fletcher, Shirley (1992). Competence-based Assessment Techniques. Kogan Page.
[14] G. F. Luger and W. A. Stubblefield (eds.) (1989). Artificial Intelligence: The Design of Expert Systems. Benjamin/Cummings.
[15] Webb Miller and Eugene W. Myers (1985). A file comparison program. Software - Practice and Experience, 15:1025-1040.
[16] R. F. Mager (1990). Making Instruction Work. Kogan Page.
[17] Martha C. Polson and Jeffrey J. Richardson (eds.) (1988). Intelligent Tutoring Systems. LEA.
[18] B. J. Reiser, J. R. Anderson, and R. G. Farrell (1985). Dynamic student modelling in an intelligent tutor for LISP programming. In Proceedings of the Ninth International Joint Conference on Artificial Intelligence. CA: Morgan Kaufmann.
[19] D. Rowntree (1987). Assessing Students: How Shall We Know Them? Kogan Page.
[20] RSA Series Examinations Handbook 1996-1997. RSA Word processing 1 part 2 (A.3.a.28).
[21] RTF Specification 1.5. http://www.primate.wisc.edu/software/RTF/
Artificial Intelligence in Education S.P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
Internet Based Evaluation System
A. Rios, E. Millan, M. Trella, J. L. Perez-de-la-Cruz and R. Conejo
Departamento de Lenguajes y Ciencias de la Computación
Facultad de Informática, Campus de Teatinos, 29071 Málaga, Spain
{rios, eva, trella, cruz, conejo}@iaia.lcc.uma.es
Abstract. In this paper, we describe the design and development of a web-based Computerized Adaptive Testing (CAT) system that is still under development and will be one of the main components of the TREE project. The TREE project consists of the development of several web-based tools for the classification and identification of different European vegetable species (an expert system, interfaces for creating and updating databases, and an intelligent tutoring system). The test generation system will be used by the ITS diagnostic module, and has a complete set of tools that not only assists teachers in test development and design, but also supports student evaluations. Adaptive capabilities are provided by an IRT model. While the student is taking the test, the system creates (and updates) his/her temporary student model. In this way, the system can be used in two different ways: as an independent evaluation tool over the WWW (the SIETTE system, already finished), or as a component of the diagnostic module in any ITS with a curriculum-structured knowledge base, such as the TREE ITS. Keywords: evaluation system, authoring tools, adaptive testing, student model, diagnosis.
1 Introduction
In this paper, we will describe one of the main components of the TREE project (TRaining of European Environmental trainers and technicians in order to disseminate multinational skills between European countries). The TREE project is included in the EU Leonardo da Vinci Program, and its main goal is the development of an ITS for the classification and identification of different European vegetable species. The main modules of the tool being developed in the TREE project are the Expert System (ES), the Intelligent Tutoring System (ITS) and the Test Generation System (TGS), as shown in Figure 1.
Figure 1. Structure of TREE
All these tools make use of the Knowledge Base (KB) that contains all the information about the botanical domain and is being incrementally built by different users in different
locations. Each one of the components, including the knowledge base, has an independent web-based interface that allows the whole system to be used as a learning tool or as an independent consultation tool. The test generation system used in TREE has been developed and implemented as an independent and reusable system for the design and generation of adaptive tests over the WWW. Also, this system can interact with any ITS that has:
• a curriculum-structured knowledge base, and
• a student model defined as a semantic network, where each node is a curriculum component with an associated knowledge level.
The system described in this paper tries to join the dynamic nature of computer adaptive tests with the advantages that the WWW offers as a learning environment (multimedia content, hypertext capabilities, client/server architecture). In this way, evaluation methods can be included in distance learning systems to provide them with adaptive capabilities.
2 Adaptive Testing
In traditional paper-and-pencil test evaluation methods, the storage and analysis of the information is static. The use of computers in testing processes opened up the possibility of making these processes dynamic. A computer adaptive test is a computer-administered test where the presentation of each item and the decision to finish the test are dynamically adapted based on the student's answers, and therefore based on his/her proficiency. By using a temporary student model, an adaptive test can also generate descriptive information about the student's learning style and problems. The theory used to provide computerized adaptive tests with adaptive capabilities is Item Response Theory (IRT). IRT is a statistical framework in which examinees can be described by a set of ability scores that are predictive, linking actual performance on test items, item statistics and examinee abilities. The two central issues in adaptive testing are: a) Calibration of items. In order to provide computerized testing systems with adaptive capabilities, questions have to be calibrated with some parameters. These parameters will guide the question selection strategy. b) Content balance. The test has to cover all the important content areas in the subject. These important issues are discussed in detail in [1]. The main advantages of adaptive testing over traditional testing are:
• A significant decrease in the test length (and consequently in the testing time), as shown in the comparative study reported in [2].
• More accurate estimations of the student's knowledge level. Besides, this information is usually more detailed, and therefore better use of it can be made when trying to provide feedback or take instructional decisions.
Our web-based tool to assist in the evaluation process is very simple to use, and makes these advantages accessible to educators all over the world. Moreover, this tool can help in the development of complete educational web-based systems that not only deliver the instructional material, but also provide detailed performance measurements relative to the learning process. This information can be used by instructors (evaluation), intelligent tutoring systems (instructional decisions) or by students (self-evaluation).
3 The SIETTE System
The web-based tool we have developed is called SIETTE (Intelligent Evaluation System using Tests for Teleeducation) [3]. SIETTE can be used in two different ways:
• Instructors and domain experts can use SIETTE to define tests,
• Students can use SIETTE to take the tests that are automatically generated according to the specifications provided by instructors and the information stored in the temporary student model.
Once the students have taken the tests, the system presents detailed information about the performance of each student. The architecture of the SIETTE system is shown in Figure 2. In order to give a general overview of the system, we will briefly describe all the components and then discuss the most important ones in more detail.
Figure 2. SIETTE Architecture
• The question knowledge base is a collection of possible questions to pose in a test. All these questions are calibrated with some parameters.
• The test edition module is the tool used by instructors or domain experts to define the tests. This module allows the structure of the domain to be defined: topics, questions, relations between them, and the relative weights of topics in the test. This information is stored in the question knowledge base. Moreover, in this module the test developer can define test specifications that will guide the question selection process and the finalization criteria, such as the maximum number of questions to be posed, the minimum number of questions on each topic, the degree of confidence in the estimated knowledge level, etc.
• Once the tests have been defined, there is a module that validates their elements (topics, questions, specifications) and activates the tests so they can be used by the adaptive test generation system. This test validation and activation module is an off-line process that is run on the server side, where the database server is.
• A temporary student model is created and updated by SIETTE for each student that takes the test. Basically, it consists of a vector of eleven probabilities (p0, p1, ..., p10), where pi represents the probability that the student has reached knowledge level i, together with information about which questions have been asked by the system.
• The test generator is the main module of SIETTE. It is responsible for selecting the questions that will be posed to each student. The generation process is guided by the specifications defined by the test developers and by the temporary model of the student taking the test.
Specific interfaces have been implemented to make the test edition and test generator modules accessible via the WWW. Using these interfaces, it is possible to add questions, answers and test specifications to the knowledge base, and also to modify them. The knowledge base has been implemented using a relational database that can be accessed via the WWW with scripts. We now describe the test editor, the temporary student model and the test generator in more detail.
TEST EDITOR
The test editor is a tool that uses HTML forms to extract knowledge from domain experts or instructors. The information supplied is saved in a relational database so it can be used by the test generator module.
In Figure 3 we can see the interface for the test editor:
Figure 3. Test editor interface
As we can see, in SIETTE tests have a curriculum-based structure. Each test for a particular subject is structured in topics and questions. The editor also allows the definition of relationships between tests, topics and questions. The curriculum defined in this way will be used by the question selection algorithm to generate content-balanced tests adjusted to the specifications previously defined, as described in [4], [5] and [1]. Besides, test designers can define parameters associated to topics and questions: a weight for each topic that represents how important the topic is in the test, and a degree of difficulty for each question. This degree of difficulty is then combined with the guessing factor (1/number of possible answers) and the discrimination index to construct the item characteristic curve (ICC) associated to each question. In IRT, the ICC gives the probability that a student answers the question correctly given his/her current knowledge level. This procedure eliminates the need for the previous empirical studies used by IRT to calibrate questions [6]. Finally, using the test editor, it is possible to add multimedia content (graphics, video, sound) to questions and answers. By including multimedia content, a greater number of subjects can be evaluated using SIETTE. The mechanism for storing multimedia content via the WWW is based on RFC 1867 (Form-based file upload in HTML) [7]. Another important feature developed in the SIETTE system is the possibility of defining question and answer templates. These templates are dynamically instantiated if the item (question or answer) is selected, and the system randomly chooses one of such instantiations. The main advantages that this edition module offers are:
• component reusability: tests for the same subject can share topics, and these can share questions;
• test components are written using HTML, with the flexibility that this language offers;
• multimedia content in questions and possible answers;
• by using templates, a great number of different questions (or answers) can be automatically generated by the system.
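The ICC construction described above (a difficulty degree, a guessing factor of 1/number of answers, and a discrimination index) suggests a three-parameter logistic model. The following Python sketch is our reading of that description, not SIETTE's code; in particular, the direct use of the editor's difficulty degree as the IRT difficulty parameter on the 0-10 knowledge scale is an assumption.

    import math

    def icc(theta, difficulty, discrimination=1.0, n_choices=4):
        # Three-parameter logistic item characteristic curve: the probability
        # that a student of knowledge level theta answers the item correctly.
        # The guessing factor c is 1/number of possible answers.
        c = 1.0 / n_choices
        return c + (1.0 - c) / (1.0 + math.exp(-discrimination * (theta - difficulty)))

    # A question of difficulty 5 on the 0..10 scale, with 4 possible answers:
    for theta in (2, 5, 8):
        print(theta, round(icc(theta, difficulty=5.0), 3))   # 0.286, 0.625, 0.964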
THE TEMPORARY STUDENT MODEL
The temporary student model is created and updated by the system for each student that takes the test. This information will be used to provide the test generator module with adaptive capabilities.
Currently, the student can be classified into 11 knowledge levels, ranging from Level 0 (novice) to Level 10 (expert). Initially, the probability is uniformly distributed over the 11 levels. As the student takes the test, these probabilities are updated with a Bayesian procedure. Figure 4 shows an example of a temporary student model and how it is structured:

STUDENTS
StudentID    TestID   Date       Level of Proficiency   Lower Confidence Level   Upper Confidence Level
John Smith   TREE     04/08/98   Level 1                0.9                      1.1

KNOWLEDGE DISTRIBUTION
StudentID    TestID   Level 0   Level 1   ...   Level 10
John Smith   TREE     0.001     0.9       ...   0.001

QUESTIONS POSED
StudentID    QuestionID   AnswerID
John Smith   Q1           A1.1
John Smith   Q3           A3.2
John Smith   Q5           A5.1
John Smith   Q6           A6.3
John Smith   Q10          A10.5
John Smith   Q12          A12.3
John Smith   Q14          A14.1
John Smith   Q17          A17.2
John Smith   Q20          A20.1

TOPIC DISTRIBUTION
StudentID    TestID   TopicID   % Questions
John Smith   TREE     PINUS     40%
John Smith   TREE     ABIES     40%
John Smith   TREE     CEDRUS    20%

Figure 4. An example of a temporary student model

TEST GENERATOR AND EVALUATION ALGORITHM
To generate the test we use Owen's Bayesian approach, as described in [8]. The procedure works mainly by calculating the posterior probabilities of a student having a certain knowledge level after he/she gives an answer a to question n. However, instead of considering the student's knowledge as a continuous random variable, we consider it as a discrete random variable whose possible values are {0, ..., 10}. This assumption simplifies the computations needed for the estimation of the new knowledge level and its confidence interval. This module has been implemented as a CGI application. The test generator algorithm consists of three procedures:
1 Question selection
Test developers can choose between three different question selection procedures:
• Bayesian procedure: selecting the item that minimizes the posterior standard deviation;
• Adaptive procedure: selecting the item which gives the minimum distance between the mean of the ICC and the mean of the current student model;
• Random procedure: the item is selected randomly.
Whatever procedure is used, the system extends Owen's approach with these features:
• Random item selection. If the selection criterion does not allow the system to differentiate between two questions, a random selection is made. This usually happens when using templates, because ICCs are assigned to templates, so every instance of a template has the same ICC.
• Content balancing. In SIETTE, the student's knowledge is represented by a single variable θ; that is, SIETTE uses a unidimensional model. However, to assure content-balanced tests, SIETTE uses the weights specified by the test designer for each topic included in the matter being evaluated. These weights determine the desired percentage of questions about each topic. SIETTE compares the empirical percentages of the questions that have already been posed with the desired percentages, and selects the topic with the biggest difference as the target topic. Then, SIETTE selects the best next question belonging to that topic using the ICC associated to each question.
• Longitudinal testing. The item selection strategy in SIETTE avoids posing the same items to a student who takes the test more than once. The selection strategy uses the information stored in the student model about items posed in earlier tests.
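A minimal Python sketch of the content-balancing step described above (our reconstruction, not SIETTE's code; the topic weights reuse the percentages from Figure 4, while the counts of posed questions are invented for illustration):

    def pick_topic(weights, posed_counts):
        # Select the target topic: compare the desired percentage of questions
        # per topic (the designer's weights) with the empirical percentage
        # posed so far, and take the topic with the biggest shortfall.
        total_posed = sum(posed_counts.values())
        def shortfall(topic):
            desired = weights[topic] / sum(weights.values())
            empirical = posed_counts.get(topic, 0) / total_posed if total_posed else 0.0
            return desired - empirical
        return max(weights, key=shortfall)

    weights = {"PINUS": 40, "ABIES": 40, "CEDRUS": 20}   # from Figure 4
    posed = {"PINUS": 4, "ABIES": 2, "CEDRUS": 1}        # illustrative counts
    print(pick_topic(weights, posed))   # ABIES: furthest below its 40% share

The best question within the chosen topic is then selected using the ICCs, as stated above.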
2 Updating the temporary student model
Once the best question has been chosen, the system poses it to the student and waits for an answer. When the student answers the question, SIETTE computes his/her new proficiency level and its confidence interval. With the new proficiency level, the confidence interval, and information about the questions posed and the coverage of the test, the system updates the temporary student model.
3 Termination criterion
The termination criterion can also be determined by test developers, and it can be any valid combination (using OR) of the cases listed below:
(1) The standard deviation of the distribution of the student's knowledge is smaller than a fixed value (the estimation is accurate enough).
(2) The probability that the student's knowledge is greater than or equal to a fixed proficiency level is greater than a fixed number.
(3) The system has already posed all the questions in a test.
(4) The system has posed at least the minimum number of questions of each topic specified by the test designer.
Once the test finishes, the temporary model of each student becomes the student model of the examinee. One of the characteristics of the system is its capability to offer immediate feedback. While a student is taking a test, the system can show him/her the solution to the question he/she has just answered. At this moment, the student could try to cheat the system by pressing the browser's BACK button. To avoid this type of behavior, we have implemented the simple state machine shown in Figure 5:
Figure 5. State machine for avoiding cheating
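Before turning to the example, the discrete Bayesian update of procedure 2 can be sketched as follows. This is our reconstruction: the 3PL ICC (as in the earlier sketch), the item difficulties and the sequence of answers are illustrative assumptions, not SIETTE's actual parameters.

    import math

    def icc(theta, b, a=1.0, c=0.25):
        # 3PL item characteristic curve, as sketched earlier.
        return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

    def update(prior, correct, item_b):
        # Posterior over the 11 discrete knowledge levels after one answer:
        # multiply the prior by P(answer | level) from the ICC and renormalise.
        like = [icc(level, item_b) if correct else 1.0 - icc(level, item_b)
                for level in range(11)]
        post = [p * l for p, l in zip(prior, like)]
        s = sum(post)
        return [p / s for p in post]

    model = [1.0 / 11] * 11          # uniform initial temporary student model
    for b, correct in [(3, True), (5, True), (7, False)]:   # illustrative answers
        model = update(model, correct, b)
    print(max(range(11), key=lambda i: model[i]))            # most probable level
    print(sum(p for i, p in enumerate(model) if i >= 5))     # P(level >= 5), cf. termination case (2)

The second printed quantity corresponds directly to termination criterion (2): stop when the probability of being at or above the required proficiency level exceeds the fixed confidence.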
4 An example
In this section, we present an example. Let us suppose that a new student is going to take a TREE test (about a botanical domain). In the TREE test specifications, the minimum level of knowledge required in order to pass the test is 5, and the level of confidence is 75%.
Initialization of the temporary student model
Initially, and in the absence of any information, the knowledge level of the student is considered to follow a uniform distribution; that is, the probability that the knowledge level of the student is i (for i = 0, 1, ..., 10) is 1/11, as shown in the first window in Figure 6.
Selection of the first question
First the algorithm selects the target topic, that is, the one with the biggest weight in the test (the most important topic). Then it selects one of the questions belonging to that topic, using the ICC for that question. The question selected is the one that minimizes the a posteriori variance. In Figure 6 we can see the first question selected, and its associated ICC.
Figure 6. Initial state in a test session, first question presented and its ICC
Now the student will answer the question, and the system will update the temporary student model and use it to select the next question. In Figure 7, we show an intermediate state after the student has already answered seven questions:
Figure 7. Question 8 and knowledge distribution after seven questions
Now the probability that the student's knowledge level is 8 is 0.49. The test goes on, and after 11 questions it finishes. The final result is shown in Figure 8: The student knowledge level is estimated to be level 9 with a probability of 0.823070, so the test finishes and the student's final estimated knowledge level is 9. We can also see the statistics that SIETTE presents when the test has finished: number of posed questions, number of correct answers, estimated knowledge level and confidence interval for this estimation. As the level of knowledge reached by the student is 9, according to the test specifications the student has passed the test.
Figure 8. Final test session view
5 Conclusions and future work
The system we have implemented combines the dynamic nature of computerized adaptive testing systems with the advantages that the WWW offers as a learning environment. Using the WWW we can help teachers all over the world in the difficult task of evaluation. With our system, evaluation is impartial and the results are more consistent and accurate than with traditional paper-and-pencil tests. All the tools that compose the SIETTE system can be accessed simultaneously, so many different persons can use the system for different purposes at the same time. SIETTE uses an HTML-like language for editing questions, so both the format and the appearance of questions are fully adaptable to teachers' preferences. Generated tests can contain multimedia objects, so teachers can compose more attractive test interfaces and evaluate subjects that cannot be evaluated using only text (for example, subjects that involve recognition of objects from photographs).
Finally, we would like to remark on the importance of using efficient algorithms to select the best question to ask and also to store and recover the information in the student model. Delays due to these processes, and to downloading, can keep students waiting for too long when using the system. To improve the average performance, it may be interesting to exploit the Internet time delay by running the next-question selection algorithms on the server side while the student is still reading or thinking about the last question on the client side. In further improvements of the system we also want to include polytomous items and multidimensional models, so that we can have more detailed information in the student model to be used by the ITS component of the TREE Project.
REFERENCES
[1] Huang, Sherman X. (1996). On Content-Balanced Adaptive Testing. LNCS 1108, 569–577.
[2] Collins, J.A., Greer, J.E. and Huang, S.H. (1996). Adaptive Assessment Using Granularity Hierarchies and Bayesian Nets. LNCS 1086, 569–577.
[3] Rios, A., Perez de la Cruz, J.L. and Conejo, R. (1998). SIETTE: Intelligent Evaluation System using Tests for TeleEducation. Workshop "Intelligent Tutoring Systems on the Web" at ITS'98. http://www-aml.cs.umass.edu/~stern/its98/
[4] Kingsbury and Zara (1989). Procedures for Selecting Items for Computerized Adaptive Tests. Applied Measurement in Education, 2(4), 359–375.
[5] Welch, R. and Frick, T.W. (1993). Computerized adaptive testing in instructional settings. Educational Technology Research and Development, 41, 47–62.
[6] Weiss, D.J. and Kingsbury, G. (1979). An Adaptive Testing Strategy for Mastery Decision. Journal of Educational Measurement, 21(4), 361–375.
[7] Nebel, E. and Masinter, L. (1995). RFC 1867: Form-based File Upload in HTML. http://sunsite.auc.dk/RFC/rfc/rfc1867.html
[8] Owen, R.J. (1975). A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. Journal of the American Statistical Association, 70, 351–356.
Student Modeling
SIPLeS-II: An Automatic Program Diagnosis System for Programming Learning Environments
Songwen Xu and Yam San Chee
School of Computing, National University of Singapore
Lower Kent Ridge Road, Singapore 119260
Abstract: Substantially new computational technologies are required to support the development of learning environments under more recent learning paradigms such as agent-based, constructivist, and problem-based learning. In this context, the automatic diagnosis of student programs is a prerequisite for a programming learning environment or an agent to carry out various pedagogical strategies. To our knowledge, no existing system today performs the automatic diagnosis of student programming errors entirely satisfactorily. In this paper, we propose a new approach to automate the diagnosis of student programming errors. We use a series of new methods including appended object-oriented program dependence graphs (AOPDG), transformation-based program standardizations, semantic-level program comparisons, and maximal-likelihood-based error detection. Automatic diagnosis of students' programming errors is achieved by comparing the student's program with a specimen program after both are standardized by program transformations. This approach is implemented using Smalltalk in SIPLeS-II, an automatic program diagnosis system for Smalltalk programming learning environments. The system has been tested on approximately 100 student programs. Experimental results show that, using our approach, semantic errors in a student program can be identified rigorously and safely, and semantics-preserving variations in a student program can be handled and accommodated for the very first time. Our tests also show that the system can not only identify a wide range of errors, but can also produce indications of the corrections needed in various programs.
1. Introduction
Substantially new computational technologies are required to support the development of learning environments under more recent learning paradigms such as agent-based learning [12], constructivist learning [4, 7, 8], and problem-based learning. Based on our experience in developing several Smalltalk programming learning environments using cognitive apprenticeship [5] and goal-based scenarios [6], we find that in order to support interactions with various pedagogical strategies in a programming learning environment, the automatic diagnosis of student programs is a prerequisite. Hence, the purpose of the research reported in this paper is to investigate a method to automate the diagnosis of student programming errors.
There is much research related to programming learning environments [19, 23], automatic program assessment [21], and program analysis and understanding [20]. However, there are few prototype systems focusing on the problem of automatic diagnosis of programming errors. Existing work includes Adam & Laurent [2], Johnson & Soloway [12], and Murray [17]. To our knowledge, no existing system performs the automatic diagnosis of programming errors entirely satisfactorily today.
Three main approaches to the automatic diagnosis of programming errors have been used to date. The first approach attempts to match a student's program against a specification that is a higher-level description of the program's goals [12, 13, 22]. This approach can be called a source-to-specification approach. The second approach attempts to extract the specification from a student's program and compare it with the task specification [17].
This is a specification-to-specification approach. These two approaches are consistent with a focus on program understanding [20]. However, the immaturity of program understanding technology [1] prevents these two approaches from being widely used as practical technologies in intelligent tutoring systems for learning programming. The third approach attempts to match a student's program against a specimen program (also called a model program) stored in the system. This approach can be called a source-to-source approach. An early piece of work using this approach is the LAURA system for learning FORTRAN [2]. LAURA uses very primitive program representations and transformations that are not applicable to structured languages and object-oriented languages. The match between the student program and the model program is not a semantic-level match. With advances in the fields of program analysis and compiler design, however, technologies such as
Object-oriented Program Dependence Graph (OPDG) representation [15], program transformation [16], and program comparison [10, 25] have matured. The third approach offers promise in dealing with the problem of automatic diagnosis of programming errors.
In this paper, we propose a new method, in line with the third main approach, for automating the diagnosis of student programming errors. We use a series of new techniques including appended object-oriented program dependence graphs (AOPDG), transformation-based program standardizations, semantic-level program comparisons, and maximal-likelihood-based error detection. The automatic diagnosis of students' programming errors is achieved by comparing the student program with a specimen program after both have been standardized by program transformations. Our approach has been implemented using Smalltalk in SIPLeS-II, an automatic program diagnosis system for programming learning environments. The system has been tested on approximately 100 student programs. SIPLeS-II represents the continuation of earlier research on SIPLeS, our previous Smalltalk programming learning environment [6]. Experimental results show that, using our approach, semantic errors in a student program can be identified rigorously and safely, and semantics-preserving variations in a student program can be handled and accommodated for the very first time. Our tests also show that the system can not only identify a wide range of errors, but can also produce indications of the corrections needed in various programs. The generality of our approach makes it useful for the development of programming learning environments. In particular, it is applicable to other object-oriented programming languages such as C++ and Java because the AOPDG representation is applicable to general object-oriented programming languages. It is also applicable to non-object-oriented programming languages where programs can be represented in ordinary dependence graphs.
In the following sections, we present the steps in our automatic diagnosis procedure first. Then, we explain each step in the procedure using a running example. Finally, we present our test results and discuss them.
2. Automatic diagnosis procedure
The automatic diagnosis procedure in SIPLeS-II is shown in Figure 1. Its steps are: (1) get a student program (SP) and a model program (MP); (2) represent SP and MP in ASTs, called SPTree and MPTree; (3) perform basic transformations to standardize SPTree and MPTree; (4) produce flow-graphs for SP and MP and calculate definition-use information; (5) perform advanced transformations to standardize SPTree and MPTree; (6) represent SP and MP in AOPDGs, called SPGraph and MPGraph; (7) compare SPGraph and MPGraph; (8) detect errors and produce a diagnosis report.
Figure 1. Automatic diagnosis procedure in SIPLeS-II
The student program and the model program are represented in abstract syntax trees
(AST) first. Basic transformations that do not require definition-use information are performed to standardize SPTree and MPTree. After that, the flow-graphs of the student program and the model program are produced and used in the calculation of the definition-use information (DU information) of both programs. The DU information is used in advanced transformations to standardize SPTree and MPTree further in order to facilitate program comparison. The DU information is also used in the calculation of data dependence for producing the AOPDGs of SP and MP. Dead code is removed based on the dependence information calculated. Next, SPGraph is compared with MPGraph. The comparison produces the textual differences and semantic differences between SP and MP. Finally, the comparison results are processed in the error-detection step and a diagnosis report is produced. The details of every part are addressed in the following sections.
The main technologies in the approach are program representation, program transformation, and program comparison. In SIPLeS-II, the system is provided with model programs to enable it to judge the semantic correctness of a student program. The differences between the student program and the model program, which are revealed by the semantic-level program comparison, are regarded as semantic errors in the student program.
The program representations used in SIPLeS-II include the AST, the flow-graph, and the AOPDG. An AOPDG consists of an appended control dependence subgraph (ACDS) and an appended data dependence subgraph (ADDS). It is produced by appending an OPDG [15] with several types of auxiliary nodes (called sigma nodes) so that the program representation is in SSA (static single assignment) form, where every use of a variable is defined by only one definition. It is important to note that (1) many variations existing in source code are eliminated in ASTs, and many variations existing in ASTs are eliminated in AOPDGs; and (2) an AOPDG represents the semantic information of a program.
Program transformation has been extensively studied in the fields of compiler design [16], automatic logic and functional program generation [18], and parallel program optimization. However, the transformations used in SIPLeS-II are only those that are applicable at the source code level [14], not at the intermediate code level, and that are necessary for handling the variations that appear.
There are three levels of program comparison: simple text-level comparison, syntax-level comparison [9], and semantic-level comparison [10, 11, 25]. The comparison algorithm used in our approach is a semantic-level comparison algorithm, which extends Yang's partitioning algorithm to the AOPDG representation and combines Horwitz's algorithm for identifying textual differences.
There can be various kinds of variations in a student program compared to a model program. Some variations are legal variations whereas others are errors. The biggest challenge in the automatic diagnosis of student programming errors is to handle the various legal variations with proper strategies. In our approach, we use the following strategies to handle legal variations. (1) Represent programs in ASTs: eliminate differences that exist at the source code level. (2) Basic transformations: eliminate differences of temporary declaration, cascaded messages, algebraic variations, and control structure variations. (3) Represent programs in AOPDGs: eliminate differences that exist at the syntax level, such as different statement orders. (4) Advanced transformations: eliminate differences in the number of temporary variables used. (5) Comparison algorithm: handle the use of different variable names. Variations of semantics-preserving changes of control structure are accommodated in the comparison. (6) Dead code removal: handle the differences caused by dead code. (7) Equivalent-expression learning: handle equivalent expressions by a self-learning mechanism. (8) Model-program learning: handle the variations caused by student programs using different algorithms by adding new model programs corresponding to those algorithms.
3. Step 1: Get a student program (SP) and a model program (MP)
We use a running example to explain the approach. The task description is given below. A student program and a model program for the task, with statement names, are given below.
Define a method called taxiFeeWith: mile isBookingCase: bookingCase.
If "bookingCase" is true, a booking fee of $2.00 should be charged, and the price per mile is $3.00, otherwise no booking fee is charged, and the price per mile is $2.50. The total taxi fee is calculated by mile*price + bookingFee.
Student program

SEntry     taxiFeeWith: mile isBookingCase: booking
           | bookingFee price payment |
SS0011     (booking)
SS1121       ifTrue: [price := 3.0.
SS1122         bookingFee := 2.0.]
SS1123       ifFalse: [price := 2.5.
SS1124         bookingFee := 0.0.].
SS0012     payment := (price + bookingFee) * mile.
SS0013End  ^payment

Model program

MEntry     taxiFeeWith: mile isBookingCase: bookingCase
           | bookingFee price |
MS0011     bookingFee := 0.0.
MS0012     price := 2.5.
MS0013     (bookingCase)
MS1321       ifTrue: [bookingFee := 2.0.
MS1322         price := 3.0].
MS0014End  ^(price * mile + bookingFee)

Different people view the differences between two programs differently. To identify exactly the differences between two different programs is, in general, undecidable. From one point of view, a human tutor may identify the differences between the student program and the model program as follows:
(1) Difference 1: Different parameter names are used in SEntry and SS0011. This is a textual variation.
(2) Difference 2: A different number of temporary variables is used in SS0012 and MS0014End. This difference changes the way computations are performed without changing the values computed. We refer to this as a semantics-preserving variation.
(3) Difference 3: The statement order of SS1121 and SS1122 differs from that of MS1321 and MS1322. This difference is also a semantics-preserving variation.
(4) Difference 4: Different source-code format in SS1122 compared to that in MS1322. There is an extra "." before a "]" in SS1122. This difference is a source-code level variation.
(5) Difference 5: Different values are produced at the point of SS0011. price and bookingFee are calculated outside the ifTrue:ifFalse: control structure in MP, whereas they are calculated inside it in SP. This is also a semantics-preserving variation, because at the point of SS0012 the values produced are the same as the values produced in MP at the point of MS0014End.
(6) Difference 6: A different computation is carried out in SS0012End compared to MS0014End. This is a semantic variation. It is an error in the student program.
4. Step 2: Represent SP and MP in ASTs, called SPTree and MPTree
The AST representation of a program is constructed from the parse tree of the program by adding information such as statement level [24]. The statement level provides structural information to support program analysis and program transformation. Some variations that exist only at the source code level are eliminated in the AST representation. In the example, Difference 4, the "." before the "]", is eliminated in SPTree and MPTree.
5. Step 3: Perform basic transformations to standardize SPTree and MPTree
Basic semantics-preserving transformations, which do not require DU information, are performed on both the model program and the student program to standardize them where applicable. There are 4 basic transformations. (1) Statement separation: standardize cascaded messages into a message sequence. (2) Temporary declaration standardization: standardize all temporary variables to be defined only in the method temporary variable declaration rather than in block temporary variable declarations. (3) Algebraic standardization: standardize arithmetic expressions by applying rules of commutativity, associativity, and distributivity, such as t + c = c + t, (t1*t2)*t3 = t1*(t2*t3), and t1*(t2-t3) = t1*t2 - t1*t3, where c is a constant and t1, t2, t3 are non-constant expressions. A weight value is calculated and used when applying rules containing more than one non-constant expression, to decide whether a standardization rule is applicable. (4) Control structure standardization: standardize all control structures to one of the three structures ifTrue:ifFalse:, whileTrue:, and to:by:do:. For example, the control structure of "receiver or: b1" is standardized with the rule "receiver or: b1 -> receiver ifTrue: [true] ifFalse: b1".
In the example, control structure standardization is applied on MPTree and algebraic standardization is applied on SPTree. The programs are changed to the following after standardization.
Student program

SEntry     taxiFeeWith: mile isBookingCase: booking
           | bookingFee price payment |
SS0011     (booking)
SS1121       ifTrue: [price := 3.0.
SS1122         bookingFee := 2.0]
SS1123       ifFalse: [price := 2.5.
SS1124         bookingFee := 0.0].
SS0012     payment := price * mile + (bookingFee * mile).
SS0013End  ^payment

Model program

MEntry     taxiFeeWith: mile isBookingCase: bookingCase
           | bookingFee price |
MS0011     bookingFee := 0.0.
MS0012     price := 2.5.
MS0013     (bookingCase)
MS1321       ifTrue: [bookingFee := 2.0.
MS1322         price := 3.0]
             ifFalse: [].
MS0014End  ^(price * mile + bookingFee)
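To illustrate the flavour of these rewrites, the fragment below is a toy re-implementation of algebraic standardization (ours, in Python rather than Smalltalk, and omitting the weight heuristic) on expression trees; applied to the student's SS0012 expression it yields the distributed form shown in the listing above.

# Toy algebraic standardization (ours, not the SIPLeS-II implementation).
# Expressions are atoms or (op, left, right) tuples.
def is_const(e):
    return isinstance(e, (int, float))

def standardize(e):
    if not isinstance(e, tuple):
        return e
    op, a, b = e[0], standardize(e[1]), standardize(e[2])
    if op == '+' and is_const(b) and not is_const(a):
        a, b = b, a                                    # rule: t + c -> c + t
    if op == '*' and isinstance(b, tuple) and b[0] == '+':
        return standardize(('+', ('*', a, b[1]), ('*', a, b[2])))  # distribute
    if op == '*' and isinstance(a, tuple) and a[0] == '+':
        return standardize(('+', ('*', a[1], b), ('*', a[2], b)))  # distribute
    return (op, a, b)

# The student's (price + bookingFee) * mile becomes the distributed form
# price*mile + bookingFee*mile, as in the standardized listing above.
print(standardize(('*', ('+', 'price', 'bookingFee'), 'mile')))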
6. Step 4: Produce flow-graphs for SP and MP and calculate definition-use information for SP and MP
The flow-graphs for SP and MP are produced from SPTree and MPTree and appended with sigma nodes. Based on the appended flow-graphs, the DU information is calculated. In the example, the DU information for SS0012 and MS0014End is the following, where rCHin denotes the definitions reaching the incoming flow of a statement and rCHout denotes the definitions available at the outgoing flow of a statement.
SS0012
rCHin: Set('EXIT(bookingFee)'->'bookingFee' 'EXIT(price)'->'price' 'INI(payment)'->'payment' 'INI(mile)'->'mile' 'INI(booking)'->'booking')
rCHout: Set('EXIT(bookingFee)'->'bookingFee' 'EXIT(price)'->'price' 'INI(mile)'->'mile' 'SS0012'->'payment' 'INI(booking)'->'booking')
MS0014End
rCHin: Set('EXIT(bookingFee)'->'bookingFee' 'EXIT(price)'->'price' 'INI(bookingCase)'->'bookingCase' 'INI(mile)'->'mile')
rCHout: Set('EXIT(bookingFee)'->'bookingFee' 'EXIT(price)'->'price' 'INI(bookingCase)'->'bookingCase' 'INI(mile)'->'mile')
7. Step 5: Perform advanced transformations to standardize SPTree and MPTree
With the DU information available, advanced standardization transformations are performed on SPTree and MPTree. There is one advanced standardization transformation.
Forward substitution: If a variable v is defined as the value of an expression ex in S1, and the operands of the expression are not modified between S1 and S2, where v is used and not defined, then all uses of v in S2 can be replaced by the expression ex [16].
In the example, no advanced standardization transformation is applicable to MP. Forward substitution is applied on SP, with the result that "SS0013End ^payment" is changed into "SS0013End ^price * mile + (bookingFee * mile).". Difference 2 is eliminated in this step.
8. Step 6: Represent SP and MP in AOPDGs, called SPGraph and MPGraph
With the DU information and flow-graphs available, control dependence and data dependence are calculated for SP and MP, and the appended object-oriented dependence graphs of SP and MP, called SPGraph and MPGraph, are produced. Difference 3 is eliminated in the AOPDG representation. Moreover, dead code in MP and SP is removed. Dead code is identified as those statements that have no outgoing dependence edges and are not return statements. In the example, SS0012 is removed as dead code and the last statement in SP is renamed to SS0012End. SPGraph in the example is shown in Figure 2. The solid lines with T/F labels are control dependence edges and the dotted lines are data dependence edges.
Figure 2. AOPDG for the student program, where INI(booking): booking := INI(booking), ENTER(price): price := ENTER(price), and EXIT(price): price := EXIT(price)
9. Step 7: Compare SPGraph and MPGraph
After the student program and the model program have been standardized by the transformations described above, the system compares MPGraph with SPGraph to identify errors in the student program. The main idea of the algorithm is based on global value numbering [16, 25]. The vertices in SPGraph (i.e., statements in SP) and the vertices in MPGraph are classified into parts of a partition based on the following semantic information of a statement: (1) operators in the statement; (2) operands in the statement; (3) incoming and outgoing control dependence edges; (4) incoming and outgoing data dependence edges. Specifically, the system creates an initial partition based on operators and operands. Then, the initial partition is refined by data dependence edges and control dependence edges. If a statement in SP and a statement in MP are in the same partition part after refinement, these two statements are certainly semantically equivalent. Furthermore, with only one pass of refinement over the data dependence edges, the system is able to accommodate semantics-preserving differences caused by different control structures. In this case, the information about control dependence has already been reflected in the different types of data dependence edges.
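The partitioning idea can be sketched as follows. This toy Python fragment is our own illustration, not the SIPLeS-II code: parts start out keyed by an operator/operand signature and are split until statements agree on the parts of their dependence predecessors; statements left without a partner are the unmatched ones.

# Toy partition refinement in the style of global value numbering (ours).
def partition(stmts, deps):
    """stmts: {name: signature}; deps: {name: [predecessor names]}."""
    groups = {}
    for name, sig in stmts.items():        # initial partition: signatures
        groups.setdefault(sig, []).append(name)
    parts = list(groups.values())
    while True:
        label = {n: i for i, p in enumerate(parts) for n in p}
        refined = []
        for p in parts:
            split = {}
            for n in p:                    # split by predecessors' parts
                key = tuple(sorted(label.get(d, -1) for d in deps.get(n, [])))
                split.setdefault(key, []).append(n)
            refined.extend(split.values())
        if len(refined) == len(parts):     # no part was split: fixed point
            return refined
        parts = refined

stmts = {"MS0011": "bookingFee := 0.0", "SS1124": "bookingFee := 0.0",
         "MS0014End": "^ +", "SS0012End": "^ +"}
deps = {"MS0014End": ["MS0011"], "SS0012End": []}
print(partition(stmts, deps))
# -> the two assignments stay matched; the two return statements split apart,
#    flagging MS0014End / SS0012End as unmatched (a semantic difference)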
In the running example, the comparison result, a partition refined by the data dependence, is given below.
{0.0->(MS0011 SS1124) 2.0->(MS1321 SS1122) 3.0->(MS1322 SS1121) 2.5->(MS0012 SS1123) ifTrue:ifFalse:->(MS0013 SS0011) taxiFeeWith:isBookingCase:->(MEntry SEntry) +->(nil->SS0012End) NEW12->(MS0014End->nil)}
By comparing the text of the matched statements, the system identifies textual differences between two matched statements in a certain partition part. In the example, these are ifTrue:ifFalse:->(MS0013 SS0011) and taxiFeeWith:isBookingCase:->(MEntry SEntry). Semantic differences are indicated by unmatched statements. There are two unmatched statements in the example, +->(nil->SS0012End) and NEW12->(MS0014End->nil). Difference 1 is correctly identified as a textual difference in this step. Difference 5 is accommodated in the comparison and Difference 6 is correctly identified as a semantic difference.
10. Step 8: Detect errors and produce diagnosis report
In order to diagnose errors in the student program, the comparison result is processed further in this step. First, the exact textual differences between the matched pairs of SP and MP statements are extracted. Second, for every unmatched SP statement, a similar MP statement is found based on a similarity measure. The definition of the similarity is based on the same factors used in the comparison for judging the congruence of statements. This technique of maximal-likelihood-based error detection enables the system to pinpoint errors and give the corrections needed. Third, the differences between the similar unmatched statement pair, excluding exact textual differences, are identified. Fourth, a self-learning mechanism is employed to handle variations caused by equivalent expressions. Finally, the remaining differences between the similar unmatched statement pair are used to report programming errors in the diagnosis report. If two statements are unmatched without any explicit difference indicated, this reveals that there is an incorrect control structure affecting the correctness of the unmatched statement in the student program.
In the running example, the system produces the following diagnosis report:
==========Diagnosis Report==========
The student program is INCORRECT.
Detailed diagnosis of the student program
SENTRY taxiFeeWith:isBookingCase:#({mile} {booking})
MENTRY taxiFeeWith:isBookingCase:#({mile} {bookingCase})
The above statement in student program has textual change(s). 'booking' -> 'bookingCase'
SS0011 {booking ifTrue: aBlock ifFalse: aBlock}
MS0013 {bookingCase ifTrue: aBlock ifFalse: aBlock}
The above statement in student program has textual change(s). 'booking' -> 'bookingCase'
SS1121 {price := 3.0}
MS1322 {price := 3.0}
The above statement in student program is correct.
SS1122 {bookingFee := 2.0}
MS1321 {bookingFee := 2.0}
The above statement in student program is correct.
SS1123 {price := 2.5}
MS0012 {price := 2.5}
The above statement in student program is correct.
SS1124 {bookingFee := 0.0}
MS0011 {bookingFee := 0.0}
The above statement in student program is correct.
SS0012END {^mile * price + (mile * bookingFee)}
MS0014END {^bookingFee + (price * mile)}
The above statement in student program has understandable error(s).
{mile * price} -> 'bookingFee'
'mile' -> 'price'
'bookingFee' -> 'mile'
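The matching of unmatched statements can be sketched as below (our own illustration; the feature names and scoring are invented for the example): each unmatched SP statement is paired with the unmatched MP statement that scores highest on the same features the comparison uses, and the residual differences become the reported error.

# Sketch (ours) of maximal-likelihood matching for unmatched statements.
def similarity(s, m):
    score = len(s["operators"] & m["operators"])
    score += len(s["operands"] & m["operands"])
    score += int(s["ctrl"] == m["ctrl"])   # same control-dependence context
    return score

def best_match(sp_stmt, mp_unmatched):
    """Pick the most similar unmatched MP statement for an SP statement."""
    return max(mp_unmatched, key=lambda m: similarity(sp_stmt, m))

ss = {"name": "SS0012End", "operators": {"^", "+", "*"},
      "operands": {"mile", "price", "bookingFee"}, "ctrl": "entry"}
mp = [{"name": "MS0014End", "operators": {"^", "+", "*"},
       "operands": {"price", "mile", "bookingFee"}, "ctrl": "entry"}]
print(best_match(ss, mp)["name"])   # MS0014End; the remaining differences
                                    # are then reported as the student's error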
After the system carries out all the strategies for handling legal variations described in section 2, the remaining differences are reported in the diagnosis report as student programming errors. A programming learning environment can then carry out suitable pedagogical strategies based on the diagnosis report. It is true that a reported error may actually be a legal variation, because the system cannot handle every kind of legal variation correctly. However, in our approach, it is impossible to miss an error when one actually exists. The approach thus emphasizes safety; it is a conservative approach.
11. Results and discussion
In this paper, we have proposed a new method for the automatic diagnosis of student programs in programming learning environments. In this approach, programs are represented as Appended Object-oriented Program Dependence Graphs (AOPDG). Both the student program and the model program are standardized by basic transformations as well as advanced transformations. Then, the two AOPDGs, which reflect the semantic information of the student program and the model program, are compared at the semantic level. Textual differences and semantic differences between the student program and the model program are obtained and processed in the step of maximal-likelihood-based error detection. Finally, a diagnosis report is produced.
The approach is implemented in a system called SIPLeS-II using Smalltalk/VisualWorks 2.5. It was tested by using it to diagnose approximately 100 student programs belonging to 3 different programming tasks. The size of the student programs is about 10 to 20 statements. The test results are shown in Table 1.

Table 1. Test results of SIPLeS-II

Task name        Student   Model     Student programs      Rate of correct     Rate of correct
                 programs  programs  diagnosed correctly   program-diagnosis   statement-diagnosis
TaxiFee          37        5         33                    86%                 97%
IsPerfectNumber  40        5         36                    90%                 98%
PickPalindrome   32        4         28                    87%                 97%
From Table 1, we see that the rate of correct program-diagnosis and the rate of correct statement-diagnosis are very high, about 88% and 97% respectively. A new model program has to be input by the teacher when a student program uses a different algorithm from those used by the existing model programs. From the figures shown above, the number of model programs needed for each of the three tasks is around 5. Although, strictly speaking, the number of model programs needed is undecidable, it is reasonable to believe that in practice the number of model programs is small. In our tests, we observed that the number of model programs needed became stable after 20 to 25 student programs were processed.
Limitations of the approach: (1) Some cases of incorrect diagnosis arise when pairs such as the following appear in the student program and the model program respectively: (0.5 + price) and 3.0, or (-1*0.5 + price) and 2.5. The system diagnoses incorrectly because it does not understand that in the specific contexts the value of price is 2.5 and 3.0 respectively. General mathematical knowledge and mathematical knowledge within a specific context are required to solve this problem. (2) System-level variations, such as different class hierarchies, are not considered.
The new features of the approach are as follows. (1) The program representation reflects the semantic information of the program and eliminates many non-semantic variations. (2) Programs are analyzed, transformed, and compared rigorously at the semantic level. By "rigorously", we mean that the results of the analysis, transformation, and comparison are guaranteed to be correct. (3) Student programming errors are identified safely. By "safely", we mean that the approach may regard an actually correct statement as incorrect, but it will never regard an actually incorrect statement as correct. It is a conservative approach. (4) Semantics-preserving variations in student programs, such as variations of different control structures, are accommodated and handled.
In summary, with our approach, for the very first time, semantic errors in a student program can be identified rigorously and safely, and semantics-preserving variations in a student program can be accommodated and handled. The tests also show that the system is able to identify a wide range of errors and to produce indications of the corrections needed. The approach is very useful for the development of programming learning environments. It can be applied to other object-oriented programming languages such as C++ and Java because the AOPDG representation is applicable to general object-oriented programming languages. It is equally applicable to non-object-oriented programming languages where programs can be represented in ordinary dependence graphs.
References
[1] Abd-El-Hafiz, S.K., & Basili, V.R. (1995). A Knowledge-based Approach to Program Understanding. Norwell, Massachusetts: Kluwer Academic Publishers.
[2] Adam, A. & Laurent, J. (1980). A system to debug student programs. Artificial Intelligence, 15(1), 75–122.
[3] Bredo, E. (1994). Reconstructing educational psychology: situated cognition and Deweyian pragmatism. Educational Psychologist, 29(1), 23–35.
[4] Chan, T-W. (1996). Learning companion systems, social learning systems, and the global social learning club. Journal of Artificial Intelligence in Education, 7(2), 125–159.
[5] Chee, Y.S. (1995). Cognitive apprenticeship and its application to the teaching of Smalltalk in a multimedia interactive learning environment. Instructional Science, 23, 133–161.
[6] Chee, Y.S. & Xu, S. (1997). SIPLeS: supporting intermediate Smalltalk programming through goal-based learning scenarios. In B. du Boulay & R. Mizoguchi (Eds.), Artificial Intelligence in Education: Knowledge and Media in Learning Systems, IOS Press, 95–102.
[7] Clancey, W.J. (1991). The frame of reference problem in the design of intelligent machines. In K. VanLehn (Ed.), Architectures for Intelligence, Hillsdale, NJ: Lawrence Erlbaum, 357–423.
[8] Collins, A., Brown, J.S., & Newman, S.E. (1989). Cognitive apprenticeship: teaching the crafts of reading, writing, and mathematics. In L.B. Resnick (Ed.), Knowing, Learning, and Instruction: Essays in Honor of Robert Glaser, Hillsdale, NJ: Lawrence Erlbaum, 453–494.
[9] Elsom-Cook, M. & du Boulay, B. (1988). A Pascal program checker. In J. Self (Ed.), Artificial Intelligence and Human Learning, Chapman and Hall, 361–373.
[10] Horwitz, S. (1990). Identifying the semantic and textual differences between two versions of a program. ACM SIGPLAN Notices, 25(6), 234–245.
[11] Jackson, D. & Ladd, D.A. (1994). Semantic Diff: a tool for summarizing the effects of modifications. Proceedings of the International Conference on Software Maintenance, pp. 243–252.
[12] Johnson, W.L., & Soloway, E. (1985). Proust: Knowledge-based program understanding. IEEE Transactions on Software Engineering, SE-11(3), 11–19.
[13] Looi, C. (1991). Automatic debugging of Prolog programs in a Prolog intelligent teaching system. Instructional Science, 20, 215–263.
[14] Loveman, D. (1977). Program improvement by source to source transformation. Journal of the ACM, 24(1), 121–145.
[15] McGregor, J.D., Malloy, B.A. & Siegmund, R.L. (1996). A comprehensive program representation of object-oriented software. Annals of Software Engineering, 2, 51–91.
[16] Muchnick, S.S. (1997). Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers.
[17] Murray, W.R. (1988). Automatic Program Debugging for Intelligent Tutoring Systems, Morgan Kaufmann Publishers.
[18] Pettorossi, A., & Proietti, M. (1994). Transformation of logic programs: foundations and techniques. Journal of Logic Programming, 19-20, 261–320.
[19] Ramadhan, H. & du Boulay, B. (1992). Programming environment for novices. In Lemut, E., du Boulay, B., & Dettori, G. (Eds.), Cognitive Models and Intelligent Environments for Learning Programming, Springer-Verlag.
[20] Rich, C. & Wills, L.M. (1990). Recognizing a program's design: a graph-parsing approach. IEEE Software, 1, 82–89.
[21] Thorburn, G. & Rowe, G. (1997). PASS: An automated system for program assessment. Computers and Education, 29(4), 195–206.
[22] Ueno, H. (1987). Knowledge based intelligent programming environments—from the point of view of program comprehension. Information Processing, 28(10), 1280–1296.
[23] Ueno, H. (1995). Concepts and methodologies for knowledge-based program understanding: the ALPUS approach. IEICE Transactions on Information and Systems, E78-D(9), 1108–1117.
[24] Xu, S. & Chee, Y.S. (1998). Transformation-based Diagnosis of Student Programming Errors. Proceedings of ICCE'98, Springer, 405–414.
[25] Yang, W., Horwitz, S., & Reps, T. (1992). A program integration algorithm that accommodates semantics-preserving transformations. ACM Transactions on Software Engineering and Methodology, 1(3), 310–354.
The Interactive Maintenance of Open Learner Models
Vania Dimitrova, John Self and Paul Brna
Computer Based Learning Unit, University of Leeds, Leeds LS2 9JT, UK
E-mail: {V.Dimitrova, J.A.Self, P.Brna}@cbl.leeds.ac.uk
Abstract. Opening the learner model is a process involving the learner as a collaborator in building a model of his beliefs. The interaction plays a crucial role here since it provides both the system and the learner with a medium to reflect on the learner's beliefs. In this paper we describe a computational framework for the interactive maintenance of open learner models. It adopts some approaches from human-human interaction and considers the two main issues of the interaction - the language for communication and the underlying dialogue model. We describe a communication language that graphically externalises the learner's beliefs. An approach based on dialogue games is adapted to manage the moves in an interactive diagnostic dialogue. The framework is exemplified in STyLE-OLM, an open learner modelling component in a scientific terminology learning environment.
1 Introduction
In traditional student diagnosis, the system seeks to infer the reasons for the learner's behaviour without direct help from the learner. A potentially more accurate alternative is to involve the learner in the process of building the learner model (LM), with both the learner and the system sharing in the activity of inspecting and changing the LM. The interaction plays a crucial role in open learner modelling environments since it provides both the system and the learner with a medium to reflect on the learner's beliefs. Moreover, through the interaction, information about the LM can be provided by the learner to the system, assisting the system in learner modelling. Therefore, how to maintain the system-learner interaction is a crucial issue in open learner modelling systems. TAGUS [11] and UM [8] use viewers that present the LM on the screen and supply command options so that the learner can submit his changes. The learner's suggestions are accepted or not by the system according to its priority mechanism. Mr Collins [3] negotiates with the learner when there is a disagreement in the confidence measures of the student's beliefs given by the two sides. The interaction occurs only if there is a conflict in the belief measures. Examples of interactions in peer diagnosing systems, however, show a diversity of dialogue moves that occur when two people discuss their belief models (cf. [4]). People are motivated to reflect on their LM and to discuss it with human peers ([4], [7]). With a computer system, the learners show lower motivation, e.g. more than half of the students using UM have browsed their LMs mainly passively, without any significant interest [8]. If we compare human-computer and human-human interactive diagnosis, it is plain that the communicative abilities of the computer systems and those given to the learner are severely limited. The dialogue is very restricted - it usually comprises a single exchange in which the learner
must respond to a system's question. Then, the learner's response is strictly delimited by the system's choices and often he cannot ask for another option. Adopting models from human-human interaction is a favourable approach to designing interactive open learner modelling environments that can lead to improved reflection and motivation.
In this paper, we describe a computational framework for the interactive maintenance of open learner models. Unlike the systems discussed above, we have adapted models from human-human communication to maintain the interaction between the system and the learner when discussing the learner model. The main issues of our approach are exemplified in STyLE-OLM, an open learner modelling component in a scientific terminology learning environment. The learner model and the role of interaction in STyLE-OLM are described in §2 and §3, respectively. We present a conceptual graph (CG) [15] interactive environment that provides a diagrammatic communication language and a graphical externalisation of the learner's beliefs (see §4.1). An approach based on dialogue games (DGs) is adapted to specify the dialogue moves that both participants can use to discuss the learner model (see §4.2). A dialogue example is given in §4.3.
2 The Learner Model in STyLE-OLM
A learner model represents the system's beliefs about the learner's beliefs accumulated during the diagnostic process. The LM represents those aspects of the learner's knowledge that are regarded as significant for determining the actions of the intelligent learning environment (ILE). STyLE-OLM is intended for use as an open learner modelling component in a scientific terminology ILE. In STyLE-OLM, the system and the learner discuss the learner's conceptual knowledge. The jointly constructed LM can be used as a source for adaptive feedback and content planning in a terminological ILE. It has been shown that conceptual graphs can be used in a terminological ILE to represent the subject area knowledge [6]. STyLE-OLM uses the same formalism to represent the learner's beliefs about concepts and their relations. We consider two belief stores representing what the system thinks about the learner's domain expertise. The first store consists of CGs that show what the system believes that the learner believes (SbLb). SbLb includes correct or erroneous learner beliefs. Comparing the learner's behaviour with its domain expertise, the system may conclude that the learner has not created certain relations present in its domain model. These refer to the learner's incomplete knowledge, namely what the system believes that the learner does not believe (SbL¬b). Figure 1 shows part of the LM in a Computer Science domain (CGs are presented in linear form [15]).
[Figure 1 content: the SbLb store holds CG sets such as [OBJECT] -> (contain) -> [DATA], (contain) -> [OPERATION: {*}]; [OBJECT-ORIENTED PROGRAM] -> (falsefriend) -> [OBJECT PROGRAM]; [OBJECT-ORIENTED LANGUAGE: VISUAL BASIC]; [OBJECT-ORIENTED LANGUAGE: C++]; a graph about [ACTION: TRANSLATE] with (instrument), (object) and (agent) relations to [OBJECT LANGUAGE], [OBJECT PROGRAM], [COMPILER] and [SOURCE PROGRAM]; and a graph in which [OBJECT-ORIENTED LANGUAGE] has (contain) relations to [OBJECT: {*}] and [CLASS: {*}] and (characteristic) relations to [ENCAPSULATION] and [INHERITANCE]. The SbL¬b store holds C3_¬c: [OBJECT-ORIENTED PROGRAM] and C4_¬c: [OBJECT PROGRAM].]
Figure 1. Part of the LM in STyLE-OLM: belief stores are represented as CG sets
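As a rough illustration of the two stores, the Python fragment below is our own rendering; real CGs carry more structure than the (concept, relation, concept) triples used here, and all names are taken from the example above.

# Toy rendering (ours) of the two belief stores as triple sets.
SbLb = {      # what the system believes the learner believes
    ("OBJECT", "contain", "DATA"),
    ("OBJECT-ORIENTED PROGRAM", "falsefriend", "OBJECT PROGRAM"),
}
SbLnotb = {   # what the system believes the learner does NOT believe
    ("OBJECT-ORIENTED LANGUAGE", "characteristic", "INHERITANCE"),
}

def learner_believes(triple):
    """Query the open learner model for a single belief."""
    return triple in SbLb

print(learner_believes(("OBJECT", "contain", "DATA")))                            # True
print(("OBJECT-ORIENTED LANGUAGE", "characteristic", "INHERITANCE") in SbLnotb)   # True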
When the learner's beliefs differ from the system's domain expertise, the student's problems need some kind of explanation for the ILE to decide what to do. Thus, in addition to the learner's beliefs, the LM includes the learner's misunderstandings that explain his erroneous and incomplete knowledge. We consider three groups of misunderstandings: (1) language similarities that cause conceptual problems (e.g. OBJECT-ORIENTED PROGRAM and OBJECT PROGRAM can be confused because they contain the term OBJECT); (2) communication and representational problems (the first concerns misunderstanding dialogue moves and the second concerns problems with the communication environment); and (3) misconceptions, those misunderstandings that are hard to correct and are at heart conceptually wrong. They are crucial for the learner's conceptual modelling. Misconceptions in STyLE-OLM are derived from concept learning theories [17] and formalised with rules based on CGs. Three main classes of errors are considered:
• misclassification - an individual I is wrongly considered as a member of a class C;
• class misattribution - an attribute A is wrongly attached to a class C;
• individual misattribution - an attribute A is wrongly attached to an individual J.
To illustrate, Table 1 shows two misconception rules to explain misclassification errors.
Table 1. Misclassification rules in STyLE-OLM in a Prolog notation: find_gr/2 searches the LM for graphs containing a class or an individual; find_indiv/2 searches for an individual from a given class; generalisation/3 finds the generalisation [15] of two graphs. The examples depict reasons for a student
to believe wrongly that VISUAL BASIC is an OBJECT-ORIENTED LANGUAGE.
misclassification_1(I,C,A) :- find_gr(I,G1), find_gr(C,G2), generalisation(G1,G2,A).
The individual has features that are part of the class features, e.g. VISUAL BASIC [I] is an OBJECT-ORIENTED LANGUAGE [C] because it "contains OBJECTS" [A].
misclassification_2(I,C,J,A) :- find_indiv(C,J), find_gr(I,G1), find_gr(J,G2), generalisation(G1,G2,A).
The individual has common features with an individual that belongs to the class, e.g. VISUAL BASIC [I] is an OBJECT-ORIENTED LANGUAGE [C] because both it and VISUAL C++ [J] allow "programming in a VISUAL ENVIRONMENT" [A].
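The first rule can also be re-expressed imperatively over the simplified triple representation used earlier (our own sketch; the CG generalisation operation degenerates here to intersecting attribute sets):

# misclassification_1 re-expressed in Python over triple-based beliefs (ours).
def attributes(entity, beliefs):
    """All (relation, target) pairs the store attaches to an entity."""
    return {(rel, tgt) for (src, rel, tgt) in beliefs if src == entity}

def misclassification_1(individual, cls, beliefs):
    """Return a shared attribute explaining the wrong classification, if any."""
    shared = attributes(individual, beliefs) & attributes(cls, beliefs)
    return next(iter(shared), None)

beliefs = {
    ("VISUAL BASIC", "contain", "OBJECT"),
    ("OBJECT-ORIENTED LANGUAGE", "contain", "OBJECT"),
    ("OBJECT-ORIENTED LANGUAGE", "characteristic", "INHERITANCE"),
}
print(misclassification_1("VISUAL BASIC", "OBJECT-ORIENTED LANGUAGE", beliefs))
# -> ('contain', 'OBJECT'): the shared feature behind the learner's error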
3 The Role of STyLE-OLM in the Learner Modelling Process
Learner modelling combines both observation and interaction. We assume that the ILE calling STyLE-OLM has an observational component that builds a preliminary LM (such as the one shown in Figure 1). STyLE-OLM is an interactive diagnosis tool where the rough LM is refined through a discussion with the learner. The ILE opens a discussion about the LM, i.e. it enters STyLE-OLM, when there are problems with the observational diagnosis:
• The learner makes the same errors several times although the system always gives him proper feedback. The ILE needs either to refine the information about this error in the LM or to find a sophisticated explanation, i.e. the misconception behind it.
• The learner shows a belief that is very wrong. For example, he thinks that OBJECT PROGRAM and OBJECT-ORIENTED PROGRAM are interchangeable but according to the system's knowledge they are not conceptually related. In this case, the ILE may go to an interaction about the LM in order to add, refine and delete the learner's beliefs.
• The system does not have enough information to decide what the student knows about a term and faces problems about what the next action should be. To decide how to react when the learner uses the term erroneously, the system has to fill the gap in the LM, i.e. to add more information about the learner's beliefs.
The student himself has the most to gain from a successful diagnosis. He can therefore be expected to be motivated to engage fully in the diagnostic process. For example, he may discover that the system is not very adaptive when it gives him explanations for terms
that he knows very well and misses others he feels not very aware of. Then the learner can enter STyLE-OLM and initiate a discussion about the LM. To summarise, we consider both the learner and the ILE entering the interactive diagnosis component. The main actions on the learner model in this component are: (1) add new beliefs; (2) refine existing beliefs; (3) delete beliefs; and (4) explain beliefs (find out the misunderstandings that cause erroneous or incomplete beliefs). 4 The Interaction Model in STyLE-OLM Any interaction involves a set of conventions for communicating (a communicative language) and a model that monitors the components of the communication (a dialogue model). Next in this section we discuss these issues presenting the interaction model in STyLE-OLM which is based on the following assumptions: • The system and the learner are provided with a common language that allows them to express and understand their actions. • In the utterance of a "sentence" in this language, the speaker is able to indicate both the propositional content and the illocutionary force [14] in a way that allows the hearer to understand them clearly. • The system-learner interaction is rule governed. This decreases the ambiguity of utterances and reduces the computational complexity of the dialogue model. 4.1 Communication language Researchers from different areas have formalised human-machine interaction as a set of communicative acts (CAs) and rules for monitoring these acts in order to achieve communicative goals. For example, Baker [ 1 ] defines the main negotiative CA and presents a model of negotiation in teaching-learning dialogues. Maybury [10] describes a plan-based communicative act approach for generating multimedia explanations. In STyLE-OLM the communicative acts are the minimal units of interaction. We consider a communication language based on a graphical representation of conceptual graphs that allows the system and the learner to create propositions when discussing the content of the LM. Some advantages of such an approach are: (1) Natural mapping. The graphical representation externalises student's beliefs represented with CGs (see §2) and provides a direct mapping between the external and internal system representation. We can hypothesise that being a kind of semantic network, CGs externalise the learner's internal mental representation which is built as a network connecting concepts with corresponding relations [5]; (2) Expressiveness. CGs have been used for translating computer-oriented formalisms to and from natural languages. CGs also inherit the high logical expressiveness of semantic networks [16]; (3) Cognitive effect. Communication based on diagrams can operate as a medium for thinking and understanding [2]. Our graphical communication approach is based also on the results of a study carried out to investigate whether people can read, build, manipulate, and communicate with CGs. The subjects, 29 secondary school Bulgarian students (17-18 yrs), studied in a specialised mathematical school where technical subjects (Computer Science in particular) were taught in English. Only three students had a little experience with graphs, nobody knew CGs. The study had three phases: training (introduction of the main CGs notations); test (6 questions to check the use of CGs for communication); discussion (post-experiment commentaries and suggestions from the participants). 
The study showed that the students understood information presented with CGs and, to a certain degree, expressed their knowledge by using CGs. The subjects extracted the relationships between concepts and understood
questions expressed with CGs. They created answers easily by changing CGs to represent the necessary meanings and by adding new concepts/relations. The study also showed some negative aspects of communicating with CGs, mainly concerning the ambiguity of relations and the directions of the arrows. Students also failed to build a new conceptual graph when they were given an English sentence. Most of them described this as a difficulty in distinguishing the main concepts and the relations that held between them in the text. Some students made useful comments such as "CGs helped me to distinguish clearly the concepts and the relationships between them" or "CGs keep language ambiguity and I am still confused with the meaning." The study showed that CGs could be used as part of the communication medium in a terminological learning environment. Some of the problems it indicated were taken into account in the design of the CG communication language.
STyLE-OLM provides a multimodal communication environment combining graphics, text, and some interface widgets such as menus and buttons. Following Maybury's taxonomy [10] we distinguish between graphical and linguistic acts in multimodal communication. The former concern manipulation in a graphical medium (pointing, adding, moving, deleting, etc.) and the latter are the main dialogue moves and relate to speech acts in linguistic communication [14]. The dialogue moves in STyLE-OLM are adapted from Pilkington's DISCOUNT scheme¹ for analysing educational dialogues [12]. The preliminarily selected dialogue moves were refined with a Wizard of Oz study in which the designer played the role of STyLE-OLM and interacted with a colleague who played the role of a learner (an example of this interaction is given in §4.3). The dialogue moves in STyLE-OLM are: Inform (the speaker believes a proposition and informs the hearer of it); Inquire (the speaker asks about a proposition); Challenge (the speaker doubts a proposition); Withdraw (the speaker disagrees with a proposition); Justify (the speaker explains why a proposition is correct); Agree (the speaker agrees with a proposition); Suggest (the speaker suggests a new focus for the discussion); Deny (the speaker denies the suggested focus shift).
Figure 2. Dialogue move 9 from the example in §4.3 in the STyLE-OLM environment. The system (1) "points" to the terms OBJECT-ORIENTED LANGUAGE and INHERITANCE (graphical act - blinking); (2) creates a new ellipse for a relation and puts a question mark inside; (3) selects "Inquire" from the illocutionary area; (4) "submits" its move by "pressing" the button SUBMIT (blinking); (5) finally, the text explaining the dialogue move appears in the text area below. A learner's turn follows.
¹ DISCOUNT considers many dialogue moves. We have taken those we intuitively thought related to a diagnostic dialogue.
In the performance of a speech act, Searle [14] distinguishes "the proposition indicating element and the function indicating device". When analysing utterances, one can separate the analysis of these two. In human-human conversations, the boundaries of the propositional and illocutionary parts are often interdependent. In our artificial communication environment they are shown explicitly. By combining different graphical acts the speaker can express the propositional content of a CA. This is done in a diagrammatic propositional area (see Figure 2) where CGs represent the current propositions. The illocutionary force is explicitly indicated by selecting a dialogue move (see the illocutionary area in Figure 2). This resembles the use of performative verbs to indicate the illocutionary force of a speech act [14]. For example, selecting the Inform verb and drawing a graph G means "Inform that PG", where PG is the proposition presented by the graph G.
4.2 Dialogue model
The dialogue structure in STyLE-OLM is based on dialogue games. DG theory is a formal device for generating well-formed sequences of dialogue moves and is based on studies of naturally occurring human dialogues [9]. Following the main structure of a DG and the extensions described in [13] needed for a computer system to participate in an argumentative dialogue, we have designed the communication model in STyLE-OLM. The main DG components are adapted for interactive diagnosis and are described below.
• Dialogue moves are described in §4.1.
• Commitment stores (relating to the belief sets discussed in §2) represent each participant's beliefs about the learner's subject area knowledge.
• Commitment rules define the effects of moves upon the players' commitment stores (see Table 2).
• Game rules define when moves are allowed (i.e., which moves may precede a move) and thus describe the pragmatic structure of the dialogue (see Table 2).
Table 2. DG commitment rules and game rules. P and Q indicate propositions represented with CGs. Propositions indicated with the same letters in different rows are not connected.
Preceding moves                                           Dialogue move   Commitment store rules
                                                                          Speaker               Hearer
{Inform(Q), Inquire(Q), Challenge(Q), Withdraw(Q),        Inform(P)       add P                 add P
 Justify(Q), Agree(Q), Suggest(Q), Deny(Q)}
{Inform(Q), Inquire(Q), Suggest(Q), Deny(Q)}              Inquire(P)      add not P
{Inform(P)}                                               Challenge(P)    delete P
{Inform(P)}                                               Withdraw(P)     delete P; add not P
{Challenge(Q), Withdraw(Q)}                               Justify(P)      add (P->Q)
{Inform(P), Justify(P)}                                   Agree(P)        add P
{Inform(Q), Inquire(Q), Justify(Q), Agree(Q),             Suggest(P)
 Suggest(Q), Deny(Q)}
{Suggest(P)}                                              Deny(P)
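A minimal sketch of how such a table could drive the interaction is given below. It is our own Python rendering, encoding only a few rows; the assignment of some commitment effects to the speaker's store follows our reading of the reconstructed table, and the proposition strings are invented.

# Sketch (ours) of dialogue-game enforcement: game rules gate the moves,
# commitment rules update the belief stores.
GAME_RULES = {   # move -> moves allowed to precede it (a few rows of Table 2)
    "Challenge": {"Inform"},
    "Withdraw": {"Inform"},
    "Agree": {"Inform", "Justify"},
    "Deny": {"Suggest"},
}
ANY = {"Inform", "Inquire", "Challenge", "Withdraw",
       "Justify", "Agree", "Suggest", "Deny"}

def legal(move, previous):
    return previous in GAME_RULES.get(move, ANY)

def commit(move, prop, speaker, hearer):
    """Commitment rules: update belief stores as a side effect of a move."""
    if move == "Inform":
        speaker.add(prop); hearer.add(prop)
    elif move == "Agree":
        speaker.add(prop)
    elif move == "Challenge":
        speaker.discard(prop)
    elif move == "Withdraw":
        speaker.discard(prop); speaker.add(("not", prop))

system_store, learner_store = set(), set()
commit("Inform", "VISUAL BASIC is an OO LANGUAGE", system_store, learner_store)
assert legal("Withdraw", "Inform")
commit("Withdraw", "VISUAL BASIC is an OO LANGUAGE", learner_store, system_store)
print(learner_store)   # {('not', 'VISUAL BASIC is an OO LANGUAGE')}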
In order to take part in a dialogue, the system needs a strategy that defines the meta-level of the dialogue [13]. The strategy rules in STyLE-OLM fulfil diagnostic goals. So far
we have considered three sets of strategy rules that allow the system to play three kinds of DGs. Explain Learner Errors aims at finding the misconceptions causing the learner's errors. STyLE-OLM will follow a Collect More Information strategy when the LM does not contain enough beliefs for a particular term. The main aim of this game is to add more beliefs and it follows the principle to discover as much information as possible concerning the last concept discussed. When a conflict arises between the system and the learner's beliefs about the learner's knowledge, then a Resolve Conflicts game starts. DG rules allow most of the conflicts to be solved during the interaction in the previous two games. However, after withdrawing the other side's statement instead of accepting it and changing the beliefs in their commitment stores, the speaker may initiate a negotiative game. Examples of strategy rules for the Explain Learner Errors game that the system plays in the dialogue in 4.3 are: ER1 Ask the learner to confirm all the conditions before assigning a misconception. ER2 IF there is a misconception which has only one missing condition THEN inquire about this condition OR make a statement with this condition. ER3 IF there is a condition connected with the term under discussion and this condition can lead to an explanation of a misconception THEN inquire about this condition OR make a statement with this condition. ER4 IF there is more than one possibility to an explain an error THEN consider those conditions that have been discussed most recently. ER5 IF the suggested focus graph has a concept that appears in a condition of a misconception THEN accept the focus shift AND inquire about this condition or make a statement with it. 4.3 An Example dialogue With the model discussed above the system is able to participate in a dialogue about student's beliefs. The example below is from the Wizard of Oz study described in $4.1. STyLE-OLM plays an Explain Learner Errors DG trying to find out why the student wrongly thinks that VISUAL BASIC is an OBJECT-ORIENTED LANGUAGE. For the sake of space we present the dialogue in a text form. In this dialogue, the learner suggests a discussion topic (move 2), states a proposition to start an exchange (move 4), and requests more information (move 6). 1•
1. S[Inform, ER1] You think that VISUAL BASIC is an OBJECT-ORIENTED LANGUAGE.
2. L[Suggest] Let us talk about OBJECT-ORIENTED LANGUAGE.
3. L[Inform] I know that an OBJECT-ORIENTED LANGUAGE contains OBJECTS.
4. S[Inquire, ER3] Can you tell me a characteristic of OBJECT-ORIENTED LANGUAGE.
5. L[Inform] OBJECT.
6. L[Inquire] Can you tell me a characteristic of OBJECT-ORIENTED LANGUAGE.
7. S[Inform, ER5] INHERITANCE - you do not know that.
8. L[Withdraw] I know it.
9. S[Inquire, ER5] Do you know a relation between VISUAL BASIC and INHERITANCE.
10. L[Inform] I do not know.
11. L[Inform] I know that VISUAL BASIC contains OBJECTS.
Since all the conditions for misclassification_1 (VISUAL BASIC, OBJECT-ORIENTED LANGUAGE, "contains OBJECTS") are TRUE, it will be inferred from the learner model.
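The final inference step can be pictured as a simple set test. In the sketch below (a hypothetical representation of the learner model, not the system's internals), a misconception fires once all of its conditions are in the learner's commitment store:

learner_commitments = {
    ("VISUAL BASIC", "is-a", "OBJECT-ORIENTED LANGUAGE"),
    ("OBJECT-ORIENTED LANGUAGE", "contains", "OBJECTS"),   # move 3
    ("VISUAL BASIC", "contains", "OBJECTS"),               # move 11
}
misconceptions = {
    "misclassification_1": {
        ("VISUAL BASIC", "is-a", "OBJECT-ORIENTED LANGUAGE"),
        ("OBJECT-ORIENTED LANGUAGE", "contains", "OBJECTS"),
        ("VISUAL BASIC", "contains", "OBJECTS"),
    },
}
inferred = [name for name, conditions in misconceptions.items()
            if conditions <= learner_commitments]          # all conditions committed
print(inferred)                                            # ['misclassification_1']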
5 Conclusion
Opening the learner model is a process involving the learner as a collaborator in building a model of his or her beliefs. Apart from producing a more accurate learner model, this approach has many positive effects, promoting the learner's reflection and knowledge awareness [3]. These effects can be undermined rather than enhanced if we ignore the communicative side of diagnosis and build systems where the responsibility for the interaction is given mainly to the computer. We have presented a computational architecture to maintain the interaction in open learner modelling. Two main issues - communicative language and dialogue model - have been discussed. The framework is based on communication theories, an empirical study, and some intuitive decisions. It is exemplified in STyLE-OLM - an open learner modelling component in a scientific terminology learning environment.
Acknowledgements
We are grateful to the members of the Vision, Cognition & Learning and Dialogue for Knowledge Interchange groups in CBLU for commenting on this research. The work is part of the EU-funded LARFLAST project aimed at developing intelligent tools for learning scientific terminology. The first author is supported by the British ORS programme and the Open Society Foundation.
References
[1] M. Baker, A model for negotiation in teaching-learning dialogues. Journal of Artificial Intelligence in Education 5 (1994) 199-254.
[2] P. Brna, R. Cox & J. Good, Learning to think and communicate with diagrams. Artificial Intelligence Review, to appear.
[3] S. Bull, H. Pain & P. Brna, Mr Collins: student modelling in intelligent computer assisted language learning. Instructional Science 2 (1995) 65-87.
[4] S. Bull & P. Brna, What does Susan know that Paul doesn't (and vice versa): Contributing to each other's student model. In: B. du Boulay & R. Mizoguchi (eds.) Artificial Intelligence in Education: Knowledge and Media in Learning Systems, IOS Press, Amsterdam, 1997, pp. 568-570.
[5] A. Collins & M. Quillian, Semantic hierarchies and cognitive economy. Journal of Verbal Learning and Verbal Behaviour 8 (1969) 7-24.
[6] V. Dimitrova, D. Dicheva, P. Brna & J.A. Self, A knowledge based approach to support learning technical terminology. In: P. Navrat & H. Ueno (eds.) Knowledge-Based Software Engineering, Proc. of the 3rd Joint Conference on Knowledge-Based Software Engineering, IOS Press, 1998, pp. 270-277.
[7] J. Greer et al., Supporting Peer Help and Collaboration in Distributed Workplace Environments. International Journal of Artificial Intelligence in Education 9 (1998) 159-177.
[8] J. Kay, The UM toolkit for cooperative user modelling. User Modeling and User-Adapted Interaction 4 (1995) 149-196.
[9] J. Levin & J. Moore, Dialogue games: meta-communication structures for natural language interaction. Cognitive Science 1 (1977) 395-420.
[10] M. Maybury, Planning Multimedia Explanations Using Communicative Acts. In: M. Maybury (ed.) Intelligent Multimedia Interfaces, AAAI Press / The MIT Press, California, 1993.
[11] A. Paiva & J.A. Self, TAGUS - a user and learner modelling workbench. User Modeling and User-Adapted Interaction 4 (1995) 197-226.
[12] R. Pilkington, Analysing educational discourse: the DISCOUNT scheme. Technical Report #99/2, Computer Based Learning Unit, University of Leeds, 1999.
[13] R. Pilkington, R. Hartley, D. Hintze & D. Moore, Learning to Argue and Arguing to Learn: An interface for computer-based dialogue games. Journal of Artificial Intelligence in Education 3 (1992) 275-285.
[14] J. Searle, What is a Speech Act. In: P. Giglioli (ed.) Language and Social Context, Penguin Books, England, 1972.
[15] J. Sowa, Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley, MA, 1984.
[16] K. Stenning & R. Inder, Applying Semantic Concepts to Analyzing Media and Modalities. In: J. Glasgow, H. Narayanan & B. Chandrasekaran (eds.) Diagrammatic Reasoning: Cognitive and Computational Perspectives, AAAI Press / The MIT Press, California, 1995.
[17] R. Stevenson, Language, Thought, and Representation. John Wiley & Sons, 1993.
Artificial Intelligence in Education S. P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
An Easily Implemented, Linear-time Algorithm for Bayesian Student Modeling in Multi-level Trees
William R. Murray
Teknowledge Corporation, Embarcadero Road, Palo Alto, CA 94303
wmurray@teknowledge.com

Abstract
A key part of an intelligent tutoring system is its student model. Difficulties in developing robust and sound student models have led to the use of ad hoc approaches: certainty factor approaches with unclear semantics and updating procedures that can double-count evidence; and, more generally, to disenchantment with the intelligent tutoring approach in favor of microworlds and learning environments that do not require student models. The recent use of Bayesian student models places student models and intelligent tutoring systems back on firm ground, where the laws of probability provide a well-defined semantics and a sound means of propagating changes to the model. But Bayesian student models have not been widely embraced, due to the difficulty of understanding the mathematics behind Bayesian belief networks, the difficulty of implementing the algorithms, the high costs of commercial packages, and the sometimes lengthy delays in updating, as updating is in general NP-hard. The intention of this paper is to make a practical and useful form of Bayesian student modeling readily accessible to any ITS developer, regardless of their understanding of Bayesian belief networks, by providing an algorithm that can be easily implemented in less than a week in any object-oriented language. The algorithm only addresses student models represented as trees, of any height and width, but in return guarantees linear-time updating. In addition, we show how the algorithm can be used to handle different kinds of evidence: query evidence, task evidence, and subjective assessments. The algorithm allows any number of these kinds of assessments. It allows any number of skill levels, and questions and tasks of similarly varying degrees of difficulty. Evidence can also be retracted, for example, to simulate the discarding of data that is less relevant as time passes. This algorithm, called BSMA, for Bayesian Student Modeling Algorithm, is derived from the tree propagation algorithms of Pearl and Neapolitan. It specializes Pearl's parallel algorithm to run on one processor and restricts new evidence to nodes that are added to the tree, rather than allowing arbitrary instantiation of tree nodes to values. It also extends the algorithm to allow this evidence, previously introduced as new nodes, to be dropped later. Additional simplifications in notation have been made to facilitate an intuitive understanding of the algorithm's operation.
1. Introduction
In this paper we present an algorithm, called BSMA, for Bayesian Student Modeling Algorithm, that provides dynamic updating for a Bayesian student model where the model is restricted to a multi-level tree. The algorithm is derived from Bayesian updating algorithms by Pearl [5] and Neapolitan [4]. BSMA is a synthesis of these two algorithms, with additional simplifications and extensions appropriate to student modeling. This algorithm can be easily implemented within a few days in any object-oriented language.
2. The Bayesian Student Modeling Algorithm BSMA
In this section we provide an intuitive understanding of how Bayesian propagation in trees works and provide pseudo-code for the implementation of BSMA in any object-oriented language.
2.1. Representation
First we consider how the student model and performance data can be represented in Bayesian belief networks restricted to multi-level trees.
2.1.1. The student model
We will use a simple example adapted from the Desktop Associate [2], an experimental knowledge-based performance support system to teach common tasks performed in desktop applications. Consider a representation of Word skills (Word is a trademark of Microsoft Corporation), simplified to illustrate BSMA's representations and operation. In this
example, Word skills are decomposed into formatting skills and the ability to use Word styles. Formatting skills in turn are broken down into the ability to format paragraphs (e.g., indentation, line spacing, justification) and the ability to format tables (e.g., headers, borders, shading). The Word styles skill is broken down into the ability to use Word styles (e.g., to apply styles, to use the format painter tool, to copy styles) and the ability to modify styles (e.g., change header styles, change the font of footnote text). Each skill is measured as a probability distribution over possible skill levels. BSMA handles any number of skill levels. In this example we use only three levels: novice, beginner, and advanced, to simplify the exposition.
[Figure: the network after initialization; the legend identifies, for each node, its causal_support, diagnostic_support, probability_distribution, and message_to_parent.]
The figure shows the network after initialization. Numbers are rounded to two decimals. The parameters below each node and the messages passing up and down the links are used to compute probability distributions and to cache intermediate results. For example, Prob = [.34, .42, .24] for the node Tables means that the probability that the student is at a novice, beginner, or advanced level of skill in formatting tables is 0.34, 0.42, and 0.24, respectively, given that no evidence is available yet.
2.1.2. Uncertainty about the student's knowledge
Note that skill level is represented by a probability distribution. The more uniform the distribution, the more uncertain we are about what the student's skill actually is. The initial belief about the student's knowledge of Word skills is taken from its prior probabilities: we expect there is an
80% chance the student is a novice, a 15% chance the student is a beginner, and a 5% chance the student is advanced. A subject matter expert or the results of population testing could be used to obtain the priors; these figures are just used for illustration.
2.1.3. Conditional dependencies
The conditional dependencies P(x|u) between a parent u and a child x are represented by a matrix. There are two kinds of dependencies: from parent to child skills and from skills to performance data.
2.1.3.1. From parent to child skills
The first kind of conditional dependency represents the likelihood that a component skill will be known given how well its parent skill is known:

P(x|u)    u=nov   u=beg   u=adv
x=nov      .60     .10     .05
x=beg      .30     .70     .15
x=adv      .10     .20     .80

For this example, we assume that an advanced student will know most subordinate skills quite well; a beginner might know some skills at an advanced level and still be a novice at others; and a novice probably knows very few skills but may have some beginner-level knowledge in one or two areas. In an actual application, these numbers could be obtained from data or a subject matter expert.
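As a quick numeric check (NumPy; a sketch, not the paper's code), pushing the root priors through this matrix reproduces the initial distributions quoted above:

import numpy as np

M = np.array([[.60, .10, .05],     # P(x=nov | u=nov), P(x=nov | u=beg), P(x=nov | u=adv)
              [.30, .70, .15],     # P(x=beg | ...)
              [.10, .20, .80]])    # P(x=adv | ...)
word = np.array([.80, .15, .05])   # priors for the root Word skill
formatting = M @ word              # causal support for Formatting
tables = M @ formatting            # and, one level down, for Tables
print(formatting.round(2))         # [0.5  0.35 0.15]
print(tables.round(2))             # [0.34 0.42 0.24], the Prob quoted for Tables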
2.1.3.2. From skills to performance data
Similarly, the conditional dependency between a skill and performance data is expressed by a matrix. In this case, the rows correspond to the different values that can be observed in the performance data, which depend on the kind of assessment. For example, a question (true/false, multiple-choice, short-answer, etc.) is either correct or incorrect. The results of a self-assessment can be any one of the skill levels. Task performance can be measured by the number of hints, assuming there is a maximum number of hints beyond which the solution is given. Queries and tasks can vary in difficulty. By aligning the measure of their difficulty with the skill levels, we can simplify the determination of parameters as shown in [3]. Examples of matrices for assessments are shown later.
2.1.4. Causal versus diagnostic support
The key insight of Pearl's algorithm is its separation of evidential support into two kinds of support: causal support and diagnostic support. Causal support flows from the top down: an event happens that increases the probability of other events that depend on it. Diagnostic support flows from the bottom up: evidence predicted by a hypothesis has happened, making that hypothesis more likely. The conditional probability of a variable can be calculated from just two parameters, one representing evidential support from above, and one representing evidential support from below. Two parameters, rather than just one, are needed to keep these different kinds of support from being confused and to ensure evidence is not counted more than once, as can happen with certainty factors [5]. To simplify calculations, messages that cache measures of causal and diagnostic support are passed between nodes. The class representation for nodes in the BSMA algorithm is shown below. The causal and diagnostic parameters and the messages to and from the node's parent are included as slot values. Additional slots represent the possible values of the variable, any parent and child nodes, and the matrix representing the conditional dependencies of each of this node's variable values on each of the parent node's variable values.

class node
  name:                      "Formatting"
  values:                    [novice, beginner, advanced]
  parent:                    Word
  children:                  [Paragraphs, Tables]
  causal_support:            [0.50, 0.35, 0.15]   -- pi in Pearl's notation
  diagnostic_support:        [1, 1, 1]            -- lambda in Pearl's notation
  message_from_parent:       [0.80, 0.15, 0.05]   -- pi message in Pearl's notation
  message_to_parent:         [1, 1, 1]            -- lambda message in Pearl's notation
  probability_distribution:  [.50, .35, .15]
  matrix:                    [[.60, .10, .05], [.30, .70, .15], [.10, .20, .80]]
The root node does not have a parent, and its matrix slot represents the prior probabilities of that skill. In this example these are the prior probabilities that the student is a novice, beginner, or advanced, given earlier as [0.8, 0.15, 0.05]. Now we can consider how BSMA is implemented in an object-oriented language. We illustrate the belief updating algorithms for nodes, and then the belief propagation mechanisms for an entire network.
2.2. Belief updating in nodes
First we consider updating the probability distribution, the parameters for causal and diagnostic support, and the messages sent between nodes representing each node's contribution of causal and diagnostic support.
2.2.1. Updating the probability distribution
The probability distribution vector for a variable is computed by multiplying the corresponding components of the diagnostic support and causal support vectors, and then normalizing the results:

method update_probability_distribution
for i = 1 to |values| do                 -- for each possible skill value...
{ product <- 1;
  product <- product * diagnostic_support[i] * causal_support[i];
  probability_distribution[i] <- product; }
probability_distribution <- normalize(probability_distribution);
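In NumPy terms (a sketch of the same step, not the paper's code), the update is a component-wise product followed by normalization:

import numpy as np

causal = np.array([.50, .35, .15])       # pi: support from above
diagnostic = np.array([1.0, 1.0, 1.0])   # lambda: all 1's before any evidence
belief = causal * diagnostic
belief /= belief.sum()                   # normalize
print(belief)                            # [0.5 0.35 0.15]: unchanged until evidence arrives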
2.2.2. Causal evidence from above
The causal_support parameter of a node depends only on its parent node, if any, and the conditional dependency of the node on its parent, represented by the node's matrix parameter.

method update_causal_support
for i = 1 to |values| do                 -- for each possible skill value...
{ sum <- 0;
  for j = 1 to |parent.values| do        -- and for each parent skill value...
  { sum <- sum + (message_from_parent[j] * matrix[i,j]); }
  causal_support[i] <- sum; }
causal_support <- normalize(causal_support);
2.2.3. Diagnostic evidence from below
Calculating the diagnostic support is more complicated, since it must incorporate the support from multiple children, not just a single parent node, as is the case with handling causal support.

method update_diagnostic_support
for i = 1 to |values| do                 -- for each possible skill value...
{ product <- 1;
  for z = 1 to |children| do             -- for each possible child...
  { child_node <- children[z];
    product <- product * child_node.message_to_parent[i]; }
  diagnostic_support[i] <- product; }
diagnostic_support <- normalize(diagnostic_support);
2.2.4. Messages sent
Once the causal and diagnostic parameters for a node have been determined, the messages to send to nodes above and below the updated node can be determined. The message sent to a node's parent, message_to_parent, is a measure of the node's diagnostic support for that parent. The messages sent to each child are a measure of the parent node's causal support for that child, and are stored in the child node's message_from_parent slot.
2.2.4.1. Message to the parent: diagnostic support from below
The diagnostic support message to send to the parent node is computed from the node's current measure of diagnostic support, the diagnostic_support parameter, and the matrix representing the conditional dependencies between the node and its parent:

method update_message_to_parent
if parent = empty then return;
for j = 1 to |parent.values| do          -- for each value of the parent...
{ sum <- 0;
  for i = 1 to |values| do               -- for each value of the node...
  { sum <- sum + (diagnostic_support[i] * matrix[i,j]); }
  message_to_parent[j] <- sum; }
message_to_parent <- normalize(message_to_parent);
2.2.4.2. Messages from the parent: causal support from above
Determining the message sent to each child requires taking into account the child's siblings' diagnostic support for the parent node, in addition to the direct causal support the parent node receives from above:

method update_message_from_parent
if children = empty then return;
for i = 1 to |children| do               -- for each child of this node...
{ child <- children[i];
  for j = 1 to |values| do               -- for each value of this node...
  { product <- causal_support[j];        -- fold in causal support
    for z = 1 to |children| do           -- now for each sibling...
    { if z /= i then                     -- add in diagnostic support
      { sibling <- children[z];
        product <- product * sibling.message_to_parent[j]; } }
    child.message_from_parent[j] <- product; }
  child.message_from_parent <- normalize(child.message_from_parent); }
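The sibling product is easy to see in vectorized form. The sketch below (with hypothetical message values) computes the causal message sent to each child by folding the node's causal support together with the diagnostic messages of the other children:

import numpy as np

causal_support = np.array([.50, .35, .15])
lambda_msgs = [np.array([1.0, 1.0, 1.0]),    # message_to_parent from child 1
               np.array([0.2, 0.2, 0.9])]    # from child 2 (hypothetical evidence)

def message_to_child(k):
    msg = causal_support.copy()
    for z, lam in enumerate(lambda_msgs):
        if z != k:                           # siblings only, never child k itself
            msg = msg * lam
    return msg / msg.sum()

print(message_to_child(0).round(2))          # child 1 sees child 2's evidence
print(message_to_child(1).round(2))          # child 2 sees only the causal support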
2.3. Belief updating in networks
The belief updating methods for the network orchestrate the calling of the node update methods.
2.3.1. Networks
The network class represents a complete student model tree. Additional slots keep track of which nodes need to have their causal support, diagnostic support, or probability distributions updated. Here is the representation, with values shown for our example. The last three slots are queues, which are empty once all nodes in the network are initialized.

class network
  name:                       "Word Skills"
  root:                       Word
  nodes:                      [Word, Formatting, Styles, Paragraphs, Tables, Using, Modifying]
  pending_diagnostic_updates: []
  pending_causal_updates:     []
  pending_belief_updates:     []
The method PROPAGATE_BELIEFS_IN_TREE, shown below, will first ripple all diagnostic support upwards, thus emptying the first queue and possibly placing new additions onto the other queues in the process. Next, causal support will ripple downwards, possibly placing new additions onto the last queue, but not onto the now empty first queue. Finally, once both the first and second queues are emptied, the last queue is processed. It does not cause any new nodes to be placed on any queues. Once all queues are drained, the updating process is complete and all conditional probabilities have been revised to incorporate any new evidence.

method propagate_beliefs_in_tree
loop do                                  -- continue until all queues are empty
{ if pending_diagnostic_updates is not empty then { propagate_diagnostic_support; continue; }
  if pending_causal_updates is not empty then { propagate_causal_support; continue; }
  if pending_belief_updates is not empty then { update_beliefs; continue; }
  exit loop; }
2.3.2. Updating queues
The methods for updating the queues are shown below.
2.3.2.1. Handling pending updates for diagnostic support
To update a node's diagnostic support, we first call UPDATE_DIAGNOSTIC_SUPPORT if the node has children; otherwise there is no need to. Next the diagnostic support message to the parent can be calculated. To continue the propagation up towards the root node, the parent node is added to the end of the diagnostic support updates queue, pending_diagnostic_updates.
We also need to queue the node for causal support updating and probability distribution updating. Even though the node's causal support is not altered by changing its diagnostic support, the causal support messages it sends down to its children need to be updated.

method propagate_diagnostic_support
update_node <- pop(pending_diagnostic_updates);
if update_node is a non-terminal node then update_node.update_diagnostic_support;
update_node.update_message_to_parent;
pending_causal_updates.add_if_absent(update_node);     -- propagate down
pending_belief_updates.add_if_absent(update_node);     -- update belief
if update_node is not the root node then               -- propagate up, if possible
{ parent_node <- update_node.parent;
  pending_diagnostic_updates.add_if_absent(parent_node); }
2.3.2.2. Handling pending updates for causal support
To update the first node on the causal support queue we call the node's method UPDATE_CAUSAL_SUPPORT and then queue its children for causal updates. The node is also queued to have its belief updated to reflect the current evidence.

method propagate_causal_support
update_node <- pop(pending_causal_updates);
if update_node is an evidence node then return;
if update_node /= root then update_node.update_causal_support;
update_node.update_message_from_parent;
pending_causal_updates.add_if_absent(children(update_node));   -- propagate down
pending_belief_updates.add_if_absent(update_node);             -- update belief
2.3.2.3. Handling pending updates for probability distributions
To revise the beliefs of the nodes in the queue pending_belief_updates we just call each node's UPDATE_PROBABILITY_DISTRIBUTION method.
method update_beliefs
update_node <- pop(pending_belief_updates);
update_node.update_probability_distribution;

2.3.3. Initializing the network
The network is initialized by setting the causal support parameter for the root node equal to the priors for that skill. The root node is added to the queue pending_causal_updates, which will be used to propagate its influence down throughout the remaining nodes once belief propagation starts. The diagnostic support parameters and messages to parent nodes are set to vectors of all 1's. The non-root nodes do not need to be queued for updating. As the root node's causal support propagates downwards, it sets the messages from parent nodes and causes all descendant nodes to be updated for both causal support and probability distribution. At the end of this process all nodes are initialized correctly.

method initialize_network
for i = 1 to |nodes| do
{ update_node <- nodes[i];
  for j = 1 to |update_node.values| do
  { update_node.diagnostic_support[j] <- 1;
    if update_node /= root then update_node.message_to_parent[j] <- 1; } }
for j = 1 to |root.values| do { root.causal_support[j] <- root.matrix[j]; }
pending_causal_updates.add_to_end(root);
propagate_beliefs_in_tree;
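To make the whole scheme concrete, here is a compact, runnable NumPy sketch of this initialization for the Word-skills example. The class and helper names are ours, not the paper's, and the propagation is simplified to the no-evidence case, where every diagnostic message is a vector of 1's:

import numpy as np

M = np.array([[.60, .10, .05],
              [.30, .70, .15],
              [.10, .20, .80]])              # P(child skill | parent skill)

class Node:
    def __init__(self, name, matrix, parent=None):
        self.name, self.matrix, self.parent = name, matrix, parent
        self.children = []
        if parent is not None:
            parent.children.append(self)
        n = matrix.shape[0]
        self.causal = np.ones(n)             # pi: support from above
        self.diagnostic = np.ones(n)         # lambda: all 1's until evidence arrives

    def belief(self):
        b = self.causal * self.diagnostic
        return b / b.sum()

def initialize(root):
    root.causal = root.matrix.copy()         # the root's matrix slot holds the priors
    queue = [root]                           # plays the role of pending_causal_updates
    while queue:                             # ripple causal support down the tree
        node = queue.pop(0)
        for child in node.children:
            # with no evidence every sibling lambda is all 1's, so the message
            # from the parent reduces to the parent's causal support
            child.causal = child.matrix @ node.causal
            child.causal /= child.causal.sum()
            queue.append(child)

word = Node("Word", np.array([.80, .15, .05]))
formatting, styles = Node("Formatting", M, word), Node("Styles", M, word)
paragraphs, tables = Node("Paragraphs", M, formatting), Node("Tables", M, formatting)
using, modifying = Node("Using", M, styles), Node("Modifying", M, styles)

initialize(word)
print(tables.name, tables.belief().round(2))  # Tables [0.34 0.42 0.24]

Running the sketch reproduces the Prob = [.34, .42, .24] distribution quoted for the Tables node in Section 2.1.1.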
3. Adding and retracting different kinds of evidence
In this section we consider how to add and retract evidence once the network is initialized.
3.1. Adding new evidence
We illustrate how to add three different kinds of assessments. One is a question, one is a task, and one is a student self-assessment.
3.1.1. Adding query data
Assume we ask a question about how to format paragraphs. The question is advanced level: we only expect advanced students to answer this correctly, barring slips and lucky guesses. The conditional dependency matrix for this question is shown below. It assumes slip and guess parameters of 0.1 and 0.2, respectively (see [3]).

P(x|u)     u=nov   u=beg   u=adv
x=right     .2      .2      .9
x=wrong     .8      .8      .1

A second conditional dependency matrix (not shown) is also required for beginner-level questions. In general, if there are n levels of difficulty then n-1 matrices are required, since at the lowest level of skill, in this case the novice level, most questions are missed. Evidence is added as shown below. In this case the skill is Paragraphs and the evidence node has a generated name such as ADVANCED_LEVEL_QUERY_1. The method first links the evidence node to the skill node as its child. Next, it sets the diagnostic support and probability distribution of the evidence node to 0's for all elements except the value observed, which is set to 1. Since the new node changes the diagnostic support for its parent, that node is queued for updating. After all changes are propagated, the student model reflects higher probabilities for beginner and advanced knowledge of all skills. The probability distributions change more for nodes closer to the evidence added.

method add_evidence (datum [an evidence node], skill [a node representing a skill])
datum.parent <- skill;                   -- link child to parent
push datum, skill.children;              -- link parent to new child
for i = 1 to |datum.values| do           -- j is the evidence observed, E = values[j]
{ datum.diagnostic_support[i] <- if i = j then 1 else 0;
  datum.probability_distribution[i] <- if i = j then 1 else 0; }
datum.update_message_to_parent;
push skill, pending_diagnostic_updates;
propagate_beliefs_in_tree;
3.1.2. Adding subjective assessments
The next kind of assessment we will add is a subjective assessment. This could be an assessment by a teacher, peer, or the student. In each case the assessment will be more or less credible, depending on the source. For this scenario we assume a student self-assessment, and we bias the conditional probability matrix to be more skeptical of claims of greater ability than of claims of lesser ability. The matrix used to represent the conditional dependency is shown below. The column headers are the student's actual skill level and the row headers are the student's self-assessment.

P(x|u)    u=nov   u=beg   u=adv
x=nov      .6      .3      .2
x=beg      .3      .6      .4
x=adv      .1      .1      .4

Assume in our example that the student reports an advanced level of skill with Word styles. This new evidence strengthens belief that the student is beginning or advanced in the ability to use Word styles.
3.1.3. Adding task performance data
Finally, let us add the third kind of assessment: the results of task performance. As we are suspicious of the student's claim to have advanced knowledge of Word styles, we give them a beginner-level task in applying Word styles. For example, we ask them to change all citations in a text to a citation style that we provide. Assume they require two hints before they can succeed in performing this task. The third hint would have given the task away. An advanced user would not need any hints. We expect the student's skill level in Word styles to shift from advanced to novice, and this is indeed what happens. The final results are shown on the next page.
The conditional matrix used in the task performance assessment is shown below. In this case, the task provided is a beginner-level task.

P(x|u)        u=nov   u=beg   u=adv
x=0 hints      .10     .60     .80
x=1 hint       .15     .30     .15
x=2 hints      .60     .05     .03
x=3 hints      .15     .05     .02

Thus, a beginner should be able to perform the task with no hints, barring slips. An advanced user is even more likely to be able to perform the task with no hints, again barring slips. A novice, on the other hand, is not expected to be able to perform this kind of task without additional help. In this case, we assume that the novice is most likely to need two extra hints to solve the task.
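For illustration, the query-evidence matrices above can be generated from the slip and guess parameters. In the sketch below, the function name and the convention that a question at difficulty level d is answered correctly by students at level d or above are our own, loosely following [3]:

import numpy as np

def query_matrix(level, n_levels=3, slip=0.1, guess=0.2):
    """Row 0: P(right | skill level); row 1: P(wrong | skill level)."""
    right = np.array([1 - slip if u >= level else guess for u in range(n_levels)])
    return np.vstack([right, 1 - right])

print(query_matrix(level=2))   # the advanced question above: [[.2 .2 .9], [.8 .8 .1]]
print(query_matrix(level=1))   # the beginner-level matrix the text mentions but omits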
3.2. Retracting evidence
Evidence is retracted by removing a previously added evidence node and then updating its parent. One reason we might want to retract evidence is that it is too old. The method for retracting evidence is shown below:

method retract_evidence (node [a previously added evidence node])
skill <- node.parent;
node.parent <- empty;                          -- unlink parent and child nodes
skill.children <- skill.children.delete(node);
push skill, pending_diagnostic_updates;
propagate_beliefs_in_tree;
4. Related work
In this section we consider related work in Bayesian networks and in intelligent tutoring systems.
4.1. Bayesian belief networks and propagation algorithms
First we consider work in Bayesian belief networks and propagation algorithms.
4.1.1. Pearl and Neapolitan
The algorithm presented here is derived primarily from Pearl's algorithm [5] by instantiating it for one processor and choosing a particular order for updating nodes. It also incorporates Neapolitan's network initializations [4]. BSMA extends both algorithms to handle new evidence represented by nodes
that are dynamically added or removed. BSMA is much easier to understand than these algorithms and updating formulas: compare, for example, [5], pages 162–174, to the code presented in this paper.
[Figure: the final state of the Word skills network after the evidence above has been added and retracted, showing the causal support, diagnostic support, and probability distribution at each node.]
4.2. Applications of Bayes to intelligent tutoring systems
Murray [3] provides a more extensive survey of Bayesian student modeling than space permits here. That paper presents an approach to updating using the Simple Bayes model, a one-level tree. This paper presents a more general algorithm capable of handling multi-level trees. It does not address dynamic belief networks, as Reyes does [6], or handle directed cycles in belief networks, as can arise in the Andes Tutor's student modeling [1].
5. Conclusions
A key stumbling block for intelligent tutoring systems has been the difficulty of student modeling. Bayesian student modeling provides a formally correct way of modeling uncertainty, but has been difficult to apply, requiring either expensive commercial packages or, alternatively, wading through pages of complex mathematics, choosing appropriate algorithms, deciding how to apply them, interpreting the formulas, and finally deriving and debugging the code. In general Bayesian updating is NP-hard. BSMA chooses a middle ground, using tree representations only, to allow linear-time updating and an easily implemented object-oriented algorithm. The key contribution is to make fast Bayesian updating readily available to ITS developers, without requiring a detailed knowledge of Bayesian networks or their propagation formulas.
References
[1] Conati, C., and VanLehn, K. POLA: A student modeling framework for probabilistic on-line assessment of problem solving performance. Proceedings of UM-96, Fifth International Conference on User Modeling.
[2] Murray, W.R. 1997. Intelligent Tools and Instructional Simulations - the Desktop Associate. Final Report, Teknowledge Corporation, submitted to Armstrong Laboratory, Aircrew Training Research Division, October 1997.
[3] Murray, W.R. 1998. A Practical Approach to Bayesian Student Modeling. Lecture Notes in Computer Science 1452, Proceedings, 4th International Conference, ITS '98, Goettl, Halff, Redfield, and Shute (eds.), Springer, 1998, pp. 425–433.
[4] Neapolitan, R. 1990. Probabilistic Reasoning in Expert Systems. Wiley InterScience.
[5] Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
[6] Reyes, J. 1998. Two-Phase Updating of Student Models Based on Dynamic Belief Networks. Lecture Notes in Computer Science 1452, Proceedings, 4th International Conference, ITS '98, Springer, 1998, pp. 274–283.
Artificial Intelligence in Education S.P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
Error-Visualization by Error-Based Simulation Considering Its Effectiveness - Introducing Two Viewpoints -
Tomoya Horiguchi*, Tsukasa Hirashima**, Akihiro Kashihara***, Jun'ichi Toyoda***
*Faculty Chair of Information Engineering, Kobe University of Mercantile Marine
5-7-7, Fukaeminami, Higashinada, Kobe 658 JAPAN
Phone: +81-78-431-6263, E-mail:
[email protected]
**Department of Artificial Intelligence, Kyushu Institute of Technology
680-4, Kawatsu, Iizuka, Fukuoka 820 JAPAN
Phone: +81-948-29-7618, E-mail:
[email protected]
***The Institute of Scientific and Industrial Research, Osaka University
8-1, Mihogaoka, Ibaraki, Osaka 567 JAPAN
Phone: +81-6-879-[8426/8425], E-mail: [kasihara/toyoda]@ai.sanken.osaka-u.ac.jp

Abstract: Behavior simulation is an effective method for visualizing the meanings of mechanical concepts. In order to visualize a learner's error, we previously proposed Error-Based Simulation (EBS), which reflects an erroneous equation a learner made in solving a mechanical problem. The differences between EBS and a normal simulation illustrate the error. However, EBS isn't always effective, especially when the error isn't visualized clearly. In order for EBS to be effective, it should have qualitative differences from a normal simulation. Therefore, we also proposed a framework for managing EBS with qualitative reasoning techniques. It included two fundamental mechanisms: (1) to judge whether or not the EBS has qualitative differences, and (2) if not, to find the parameter whose change causes qualitative differences. However, the framework focused only on how clearly the error appears in EBS. In order to cause qualitative differences, it is often necessary to change parameters in the mechanical system. In such a case, the EBS often becomes factitious and a learner feels it is unreliable. Therefore, it is also necessary to consider how largely the parameter is changed. The effectiveness of EBS must be estimated from these two viewpoints. The former is called "the appearance of EBS," and the latter "the reliability of EBS." For this purpose, in this paper, we analyze the effectiveness of EBS through introducing these two viewpoints, and clarify the effectiveness structure of EBS. Based on the result, we also propose an advanced framework for managing EBS. The usefulness of the framework is evaluated through experiments using a prototype system.

1. INTRODUCTION
Behavior simulation is an effective method for visualizing the meanings of mechanical concepts and has been applied in many computer-aided education systems [12, 13, 18]. These simulations, however, aren't useful for a learner who has already formulated an erroneous equation. In order to help the learner recognize and correct the error, simulating the erroneous equation is a promising method. We previously proposed Error-Based Simulation (EBS): this reflects an erroneous equation a learner made in solving a mechanical problem [4]. In the EBS, the mechanical object behaves in a strange manner ruled by an erroneous equation. The differences between EBS and a normal simulation illustrate the error. However, EBS isn't always effective, especially when the error isn't visualized clearly. We have confirmed this fact through experiments [5]. In the experiments, when differences between EBS and a normal simulation weren't qualitative ones, learners often couldn't be convinced that the EBS did reflect the error. In order for EBS to be effective, it should have qualitative differences from a normal simulation. Therefore, it is very important to diagnose the differences between EBS and a normal simulation. We also proposed a framework for managing EBS with qualitative reasoning techniques. It included two fundamental mechanisms [6]: (1) to judge whether or not the EBS has qualitative differences, and (2) if not, to find the parameter whose change causes qualitative differences. The former is realized with qualitative simulation [8, 9], and the latter with comparative analysis [14, 16].
However, the framework focused only on how clearly the error appears in EBS. In order to cause qualitative differences, it is often necessary to change parameters in the mechanical system. The more largely the parameter is changed, the more different the mechanical system becomes from the original system. In such a case, the EBS often becomes factitious and a learner feels it unreliable. Therefore, it is also necessary to consider
how largely the parameter is changed in the mechanical system [7]. For managing EBS, it is necessary to estimate the effectiveness of EBS from these two viewpoints. The former is called "the appearance of EBS," and the latter "the reliability of EBS." For this purpose, in this paper, we analyze the effectiveness of EBS through introducing these two viewpoints, and clarify the effectiveness structure of EBS. Based on the result, we also propose an advanced framework for managing EBS. The usefulness of the framework is evaluated through experiments using a prototype system.
2. ERROR-BASED SIMULATION AND ITS MANAGEMENT
2.1 Error-Based Simulation
EBS is generated by mapping an erroneous equation in formula-world to simulation-world. It shows unnatural behavior in contrast with a correct simulation. The differences arouse cognitive conflict, which motivates a learner to recognize and correct her/his error. The following is a brief summary of generating EBS (for more detail, see [6]). The procedure to generate EBS is divided as follows: (1) specifying a mechanical object (e.g. block, slope and platform-car) and its physical attribute (e.g. "velocity," "acceleration" and applied "force") to reflect the erroneous equation. The object is called EV-object, and the attribute EV-attribute. (2) calculating the value of EV-attribute with the erroneous equation. The value is called EV-value. As for the other objects and their attributes, the correct values are calculated with correct equations. As a result, the EV-object behaves erroneously, while the others behave correctly.
As for EV-attribute, "velocity" and "acceleration" are essential in behavior simulation. These attributes are visible as they are. However, there are not a few cases in which they aren't sufficient to reflect the error. In such cases, it is necessary to select other physical attributes, such as "force," "energy," etc. Note that these attributes need some kind of metaphor to become visible. As to "force," for example, an arrow is a useful metaphor (its length indicates the force's magnitude and its direction the force's direction). We illustrate an example of EBS by using a simple mechanical problem as seen in Figure 1. When a learner sets up Equation-B as the equation of the Block, EBS based on Equation-B shows the Block ascending the Slope, while a correct simulation based on Equation-A shows it descending the Slope. Here, the Block is selected as EV-object, its "velocity" is selected as EV-attribute, and the EV-value is calculated with Equation-B. In this example, the error is so clearly reflected onto EBS that a learner can easily recognize it. However, for other cases, EBSs can't always visualize errors so clearly. Therefore, a management function for EBS is required.
2.2 Requirements for EBS Management
The procedure above to generate EBS pays no attention to what kind of differences EBS has from a correct simulation. Occasionally, EBSs are generated which aren't useful in visualizing errors. In Figure 1, for example, when a learner sets up Equation-C, EBS based on Equation-C only shows the Block moving in the same direction along the Slope as a correct simulation, at a slightly different velocity. In this case, it is difficult for a learner to judge which behavior is correct, and she/he often becomes confused. Therefore, for Equation-C, EBS shouldn't be used as it stands. The same is true for Equation-D. However, by applying some kind of "parameter change" to the mechanical system, it often becomes possible to make the differences between EBS and a correct simulation much clearer. In Figure 1, in the case of Equation-C, when the angle of the Slope θ increases, the velocity of the Block in EBS decreases, while the one in a correct simulation increases. Such an unnatural change in behavior enables a learner to recognize the error. That is, in this case, applying the "parameter change" that perturbs the parameter θ makes EBS effective.
Moreover, in the case of Equation-D, when θ falls to zero (meaning a flat floor), the Block still moves at gravity acceleration in EBS, while it no longer moves in a correct simulation. When θ stands at 90 degrees (meaning a vertical wall), the Block falls down at infinite acceleration in EBS, while it falls down at gravity acceleration in a correct simulation. These are both unnatural behaviors. That is, in this case, applying the "parameter change" that changes the parameter θ to its boundary value makes EBS effective. We think such modifications are useful and allow various kinds of "parameter change" methods. They are called PC-methods, and the parameter to be changed is called PC-parameter.

(Question) Set up the equation for the Block on the Slope.
Equation-A: ma = mg sin θ
Equation-B: ma = -mg sin θ
Equation-C: ma = mg cos θ
Equation-D: ma cos θ = mg
Figure 1. An Example of a Mechanics Problem-1.

Through these examples, in order to visualize an error clearly with EBS, it can be assumed that EBS should have
qualitative differences in EV-attribute and/or its derivative from a correct simulation. Therefore, the conditions for EBS to be effective are formulated as follows:
Condition for Error-Visualization-1 (CEV-1): There is a qualitative difference between the EV-object's EV-attribute in EBS and the one in a correct simulation; that is, the qualitative values (e.g. "plus," "zero" and "minus") of their EV-attribute are different.
Condition for Error-Visualization-2 (CEV-2): There is a qualitative difference between the EV-object's behavioral change in its EV-attribute in EBS and the one in a correct simulation; that is, the qualitative values (e.g. "increasing," "steady" and "decreasing") of the ratio of their EV-attribute's change to the PC-parameter's change are different.
EBS must be adequately managed based on the conditions above. The module for checking CEV-1 can be realized with qualitative simulation. First, by QSIM [8, 9], the qualitative behavior of EBS is predicted. Then, it is compared with the qualitative behavior of a correct simulation predicted by the same means. If an interval is found in which these behaviors are qualitatively different as to EV-attribute, CEV-1 is judged to be satisfied. Note that, in a few cases, HR-QSIM [15, 16] should be used instead of QSIM, because changing a parameter to its boundary sometimes causes other parameters to become infinite or infinitesimal. The module for checking CEV-2 can be realized with comparative analysis. First, by DQ-analysis [14, 16], the qualitative behavioral change of EBS to the PC-parameter's change is predicted. Then, it is compared with the qualitative behavioral change of a correct simulation predicted by the same means. If an interval is found in which these behavioral changes are qualitatively different as to EV-attribute, CEV-2 is judged to be satisfied. For more detail about these modules, see [6].
In Figure 1, EBS based on Equation-B (with no "parameter change") satisfies both CEV-1 and CEV-2. Both the Block's velocity and acceleration (i.e. EV-object's EV-attribute and its derivative) in EBS are qualitatively different from the ones in a correct simulation. Note that acceleration is the ratio of the velocity's change to the specific parameter "time." EBS based on Equation-C (with the "parameter change" that perturbs the parameter θ) satisfies only CEV-2. The Block's velocity in EBS is qualitatively the same as the one in a correct simulation, while the ratio of the Block's velocity's change to θ's change is qualitatively different between them. EBS based on Equation-D (with the "parameter change" that changes the parameter θ to zero) satisfies both CEV-1 and CEV-2. Both the Block's velocity and acceleration in EBS are qualitatively different from the ones in a correct simulation.
As described above, for generating EBS, various factors must be considered: (1) what kind of EV-attribute is selected, (2) how well the CEVs are satisfied, and (3) what kind of PC-parameter/PC-method is selected/applied. They have significant influence on the effectiveness of EBS, and often conflict with each other. Such an example will be illustrated in the next section.
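As a simplified stand-in for these modules, the Python sketch below applies sign-based versions of CEV-1 and CEV-2 to the Figure 1 equations; the actual framework uses QSIM and DQ-analysis rather than numeric differencing:

import numpy as np

g = 9.8
correct = lambda th: g * np.sin(th)      # from Equation-A: a = g sin(theta)
eqn_c   = lambda th: g * np.cos(th)      # from the erroneous Equation-C

def cev1(a, b, th):                      # qualitative values (signs) differ?
    return np.sign(a(th)) != np.sign(b(th))

def cev2(a, b, th, d=1e-4):              # signs of d(accel)/d(theta) differ?
    return np.sign(a(th + d) - a(th)) != np.sign(b(th + d) - b(th))

th = np.pi / 6
print(cev1(eqn_c, correct, th))          # False: same direction of motion, CEV-1 fails
print(cev2(eqn_c, correct, th))          # True: perturbing theta exposes the error (CEV-2)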
3. EFFECTIVENESS OF ERROR-BASED SIMULATION
3.1 Two Points of View
In Figure 2a, when a learner sets up Equation-B, two kinds of EBS can be generated, shown in Figure 2b (EBS-b) and Figure 2c (EBS-c). In these EBSs, the "force" applied to the pulley is selected as EV-attribute and visualized with an arrow metaphor. In EBS-b, a "parameter change" is applied, which changes the parameter "m" (the mass of the left Block) to its boundary value zero. In EBS-c, on the other hand, a "parameter change" is applied, which changes the multiple parameters "m" and "M" (the mass of the right Block) simultaneously to their boundary values, conserving the total magnitude of mass. This method is also useful, as will be shown below.

(Question) Set up the equation for the force F applied to the Pulley in Figure 2a.
Equation-A: F = 4mMg/(m + M)
Equation-B: F = (m + M)g

Qualitative values in Figure 2:
              EBS-b: Force   Force's Change    EBS-c: Force   Force's Change
Correct       zero           minus             zero           minus
EBS           plus           minus             plus           zero
CEV-1         satisfied                        satisfied
CEV-2         not satisfied                    satisfied

Figure 2. An Example of a Mechanics Problem-2.
EBS-c satisfies both CEV-1 and CEV-2, while EBS-b satisfies only CEV-1. Therefore, the error appears much more clearly in EBS-c than in EBS-b. However, it isn't necessarily concluded that EBS-c is more effective than EBS-b. In fact, a simple experiment revealed that not a few learners thought EBS-b more effective than EBS-c. They remarked that the mechanical system in EBS-c is too different from the original system, while the one in EBS-b isn't. They felt EBS-c is much more factitious and unreliable than EBS-b, because the applied "parameter change" in EBS-c is larger than the one in EBS-b. That is, even though EBS-c satisfies more CEVs than EBS-b, they felt EBS-c is less effective than EBS-b. Therefore, in order to diagnose the effectiveness of EBS, it is necessary to consider not only how well the EBS satisfies CEVs, but also what kind of "parameter change" is applied to the mechanical system. The former is concerned with "the appearance of EBS," and the latter with "the reliability of EBS." For managing EBS, we need to analyze the effectiveness of EBS from these two viewpoints. As described above, the effectiveness estimated from the former viewpoint and that from the latter often conflict with each other. Therefore, we must design the framework for managing EBS so that it uses EBSs from both viewpoints on a case-by-case basis. For this purpose, we analyze the effectiveness of EBS from these two viewpoints separately.
3.2 The Appearance of EBS
"The appearance of EBS" is the viewpoint which focuses on how clearly the EBS visualizes the error. Based on the preceding discussion, from this viewpoint, the following two factors are concerned with the effectiveness of EBS: (1) What kind of EV-attribute is selected in order to reflect the error onto EBS? (2) What kind of CEVs are formulated in order to diagnose the effectiveness of EBS, and how well are they satisfied?
As for EV-attributes, there are various physical attributes, such as "velocity," "force," "energy," etc. However, in behavior simulation like EBS, these attributes other than "velocity" aren't visible and need some kind of metaphor. We call the EV-attribute which needs a metaphor "attribute-with-metaphor," and the one which doesn't "attribute-without-metaphor." Here, we assume the following preference: [attribute-without-metaphor] > [attribute-with-metaphor] ([A] > [B] means that [A] is more effective than [B] for visualizing errors). This assumption contends that, because of the cognitive load, a human can more easily recognize a physical attribute which doesn't need a metaphor than one which does. We have verified this assumption through cognitive experiments [7].
As for CEVs, we have formulated CEV-1 and CEV-2 in the preceding section. Here, we assume the following preference: [Both CEV-1 and CEV-2 are satisfied] > [Only CEV-1 is satisfied] > [Only CEV-2 is satisfied]. This assumption contends that a human can more easily recognize a physical attribute than its derivative. We have also verified this assumption through cognitive experiments [7].
3.3 The Reliability of EBS
"The reliability of EBS" is the viewpoint which focuses on how natural a learner feels the EBS is. Based on the preceding discussion, from this viewpoint, the following two factors are concerned with the effectiveness of EBS: (1) What kind of PC-parameter is selected as the parameter to be changed? (2) What kind of PC-method is applied to the mechanical system in order to cause a qualitative difference?
As for PC-parameters, there are various physical attributes, such as "angle," "force," "mass," etc. However, in behavior simulation like EBS, only a few attributes which describe the object's figure (e.g. "angle") are visible. The others need some kind of metaphor. For example, a Block's "mass" becomes visible with a size metaphor. We call the PC-parameter which needs a metaphor "parameter-with-metaphor," and the one which doesn't "parameter-without-metaphor." Here, for the same reason as described for EV-attributes, we assume the following preference: [parameter-without-metaphor] > [parameter-with-metaphor]. We have verified this assumption through cognitive experiments [7].
As for PC-methods, the following three methods are essential: (1) applying no "parameter change," (2) perturbing the PC-parameter, and (3) changing the PC-parameter to its boundary value. These are called the "parameter-original method," "parameter-perturbation method," and "parameter-boundary method," respectively. In addition, as was indicated in the preceding section, the method that changes multiple PC-parameters simultaneously while conserving the total magnitude is also useful. Therefore, we add the following two methods: (4) perturbing multiple PC-parameters simultaneously while conserving the total magnitude, and (5) changing multiple PC-parameters simultaneously to their boundary values while conserving the total magnitude. They are called the "parameter-conserving-perturbation method" and "parameter-conserving-boundary method." Here, we assume the following preference: [parameter-original method] > [parameter-perturbation method] > [parameter-conserving-perturbation method], [parameter-boundary method] > [parameter-conserving-boundary method]. This assumption contends that a human is more sensitive to discontinuous change than to continuous change. We have also verified this assumption through cognitive experiments [7]. Based on the discussion above, we can illustrate the relations between the factors which are concerned with the
effectiveness of EBS in Figure 3.

[Figure 3: The Effectiveness of EBS divides into two viewpoints, "the Appearance of EBS" (comprising the factors EV-attribute and CEVs) and "the Reliability of EBS" (comprising the factors PC-parameter and PC-method); each factor carries its own local preference.]
Figure 3. The Effectiveness Structure of Error-Based Simulation.

In this figure, each factor has its own locally defined preference. We call it "local preference." We think that this result gives a useful guideline for diagnosing the effectiveness of EBS. According to this figure, the two viewpoints conflict with each other. Moreover, even when the viewpoint "the appearance of EBS" is adopted, EV-attribute and CEVs conflict with each other (e.g. what kind of EV-attribute is selected often affects how well CEVs are satisfied). Similarly, even when the viewpoint "the reliability of EBS" is adopted, PC-parameter and PC-method conflict with each other (e.g. what kind of PC-parameter is selected often affects what kind of PC-method can be applied). For managing EBS, these conflicts need to be adequately mediated. One simple solution to this problem is to give priority to one of the factors concerned with the conflict.
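One way to picture the local preferences in Figure 3 is as ranked lists with a selection helper. The encoding below is our own sketch, not the authors' implementation; in particular, the relative order of the perturbation and boundary families is only partially specified in the text and is linearized here:

PC_METHOD_PREF = ["original", "perturbation", "conserving-perturbation",
                  "boundary", "conserving-boundary"]
EV_ATTR_PREF   = ["attribute-without-metaphor", "attribute-with-metaphor"]
CEV_PREF       = ["CEV-1 and CEV-2", "CEV-1 only", "CEV-2 only"]

def most_preferred(options, acceptable):
    """Return the locally most preferred acceptable option, or None."""
    return next((o for o in options if acceptable(o)), None)

# e.g. pick the least intrusive parameter change that still visualizes the error,
# where the acceptable set is assumed to come from the CEV-checking modules:
acceptable = {"perturbation", "boundary"}
print(most_preferred(PC_METHOD_PREF, acceptable.__contains__))   # 'perturbation'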
4. FRAMEWORK FOR MANAGING ERROR-BASED SIMULATION
Based on the result in the preceding section, we propose an advanced framework for managing EBS. It is divided into two procedures: the managing procedure giving priority to "the appearance of EBS," and the one giving priority to "the reliability of EBS." Which viewpoint is preferred should be determined adequately, considering the properties of the learner, the problem, and the error.
4.1 Managing Procedure Giving Priority to "the Appearance of EBS"
According to "the appearance of EBS," EV-attribute and CEVs are the essential factors. Therefore, there are two kinds of managing procedures from this viewpoint: the procedure giving priority to selecting EV-attribute, and the one giving priority to satisfying CEVs. The former is shown in Figure 4, for example, and is also sketched in code below.
Giving priority to selecting EV-attribute: Firstly, a physical attribute is selected as EV-attribute considering its local preference. Secondly, it is checked whether or not the selected EV-attribute can satisfy any CEVs. If not, another physical attribute is selected and the same procedure is repeated until an EV-attribute which can satisfy some CEV is found.
Giving priority to satisfying CEVs: Even when a physical attribute selected as EV-attribute satisfies CEVs, it can be replaced by another physical attribute which satisfies the CEVs better. In this case, the physical attribute that satisfies the CEVs the most is searched for and selected as EV-attribute, considering the local preference of CEVs.
In both procedures, in order to make the selected physical attribute satisfy the CEVs, any PC-parameter or any PC-method may be selected, because how clearly the EBS visualizes the error is preferred to how natural a learner feels the EBS is. These two procedures should be used adequately according to the properties of the learner, the problem, and the error.
4.2 Managing Procedure Giving Priority to "the Reliability of EBS"
According to "the reliability of EBS," PC-parameter and PC-method are the essential factors. Therefore, there are two kinds of managing procedures from this viewpoint: the procedure giving priority to selecting PC-parameter, and the one giving priority to selecting PC-method. The former is shown in Figure 5, for example.
Giving priority to selecting PC-parameter: Firstly, a parameter is selected as PC-parameter considering its local preference. Secondly, a "parameter change" method is selected as PC-method considering its local preference. Thirdly, it is checked whether or not there is any physical attribute which can satisfy any CEVs. If not, another parameter and another "parameter change" method are selected and the same procedure is repeated until a physical attribute which can satisfy some CEV is found.
Giving priority to selecting PC-method: Even when a parameter selected as PC-parameter, with a "parameter change" selected as PC-method, causes a physical attribute to satisfy CEVs, it can be replaced by another parameter which causes a physical attribute to satisfy CEVs with a much more natural "parameter change." In this case, the parameter that causes a physical attribute to satisfy CEVs with the most natural "parameter change" is searched for and selected as PC-parameter, considering the local preference of PC-method.
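A procedural sketch of the first of these procedures, giving priority to selecting EV-attribute (check_cevs is assumed to wrap the qualitative-reasoning modules of Section 2.2):

def select_ev_attribute(attributes_by_preference, check_cevs):
    """Walk the local preference order; keep the first attribute satisfying some CEV."""
    for attribute in attributes_by_preference:
        if check_cevs(attribute):        # any PC-parameter / PC-method may be tried here
            return attribute
    return None                          # no attribute can visualize the error

attrs = ["velocity", "acceleration", "force", "energy"]    # without-metaphor ones first
print(select_ev_attribute(attrs, lambda a: a == "force"))  # 'force'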
Figure 4. Managing Procedure Giving Priority to Selecting EV-attribute. ("the Appearance of EBS" is preferred.)
Figure 6. The Architecture of the EBS Management System.
Table 1. The Results of Experiment-1.
Table 1a. The Usefulness of EBSs in Mechanics Problem-1 (in all EBSs, no conflict occurs).
Let's be more specific about that
AL I notice that you really are organized, can you elaborate?
(to BL) NO, I'm at home..."
(to BL) Remember to whisper if you're off topic, ok?
I try very hard not to waste steps or precious moments. I go through my mail over the waste basket. I dump what is not needed so I don't have these huge piles. I keep files with things organized. I know that I forget anything I don't write down... I even have to e-mail myself sometimes.
(to NL) You mean AOL isn't a legitimate use of time management? <w>
(to BL) Did we get the situation with J worked out?
:nods
SO AL, it sounds like those little time savers you do add up to big time savings overall...
OK--How about saving time with this: <error>
NR, you're an old timer at this school teaching business... what ideas do you have?
(projects prepared text)
(to NL) He looks pouty but he hasn't complained :)
(to NL) "Didn't mean to step on your toes..."
(to CS) have you asked about loading aol at school?
(to BL) I think it's totally your call unless he wants to complain to higher authorities
The differences between Group A and Group B were quantified using the "Deficits" portion of the probability network, as shown in Figure 2, which is a sub-branch under the Group Information Processing Attention branch of the network. The group attention deficit index (Deficits) was defined as a combination of tool-related deficits (Tool_Deficit), group-level distractions (Grp_Distract), and multiple threads in the conversation (AttnToMT). (Note that for display purposes, the tool deficit and group distraction sub-branches are shown separately in the diagram.) For each 10-minute interval, we computed per-person averages for involuntary disconnects, early exits, etc., and entered the averages as input to the network.
Figure 2. Probability Networks for the Deficits sub-branch for Groups A and B.

Each node in the network shows the probability that the node is in a particular state (high/medium/low). The input nodes determined the states of their parent nodes, which in turn propagated up to determine the state of the overall attention deficit index. As the Deficits nodes indicate, group A had a low degree of attention deficits (64% probability of low deficits; 31% of average deficits; 5% of high deficits). In contrast, group B had higher attention deficits (59% probability of high deficits; 32% probability of average deficits; and 9% probability of low deficits). In the larger group attention network (not displayed here), these probabilities resulted in group A having a 59% probability of being a high-attention group and group B having only a 19% chance of being a high-attention group. These results support the differences shown qualitatively in the transcript excerpts.

We are in the process of refining the probability network based on the analysis of additional group meetings. Refinements are being made both in the network probabilities (e.g., what combination of values in input nodes results in a "high", "medium", or "low" value for a parent node) and in the thresholds for determining input values (e.g., what input qualifies as a "high", "medium", or "low" percentage of whispers per total turns). At this point, many indices (such as percentage of errors) can be calculated based on information captured online, while others, such as thread count, currently require some human
processing. However, the amount of human pre-processing will gradually be reduced as coding systems and algorithms for machine processing are developed.
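The propagation just described (per-interval averages discretized into high/medium/low states that determine parent states up to the Deficits index) can be illustrated with a toy stand-in. This sketch is not the authors' network: the real conditional probabilities are replaced here by invented thresholds and a simple majority-vote combination.

from collections import Counter

def discretize(value, low_cut, high_cut):
    """Map a per-person interval average onto the three node states.
    The cut-points are invented for illustration."""
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "medium"
    return "high"

def parent_state(child_states):
    """Combine child states by majority vote; ties resolve to 'medium'."""
    state, n = Counter(child_states).most_common(1)[0]
    return state if n > len(child_states) / 2 else "medium"

# Example: tool deficits from two inputs in one 10-minute interval.
tool_deficit = parent_state([
    discretize(0.8, 0.5, 1.5),  # involuntary disconnects per person -> "medium"
    discretize(0.2, 0.5, 1.5),  # early exits per person -> "low"
])
# Deficits combines Tool_Deficit with Grp_Distract and AttnToMT states.
deficits = parent_state([tool_deficit, "low", "medium"])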
Footnote

This research was supported by NSF REPP Grant #CDA-9725528 (Mark Schlager, PI) and a cooperative agreement between the National Science Foundation and the University of Wisconsin-Madison, USA (Cooperative Agreement No. RED-9452971), which established the National Institute for Science Education. At UW-Madison, the National Institute for Science Education is housed in the Wisconsin Center for Education Research and is a collaborative effort of the College of Agricultural and Life Sciences, the School of Education, the College of Engineering, and the College of Letters and Science. The collaborative effort is also joined by the National Center for Improving Science Education, Washington, D.C. Any opinions, findings or conclusions are those of the authors and do not necessarily reflect the view of the supporting agencies.

References
[1] M. Schlager and P. Schank, TAPPED IN: A New On-line Teacher Community Concept for the Next Generation of Internet Technology. Paper presented at CSCL '97, Toronto, 1997.
[2] J. Greeno, The Situativity of Knowing, Learning, and Research, American Psychologist 53(1), 1998, pp. 5-26.
[3] J. Wertsch, Voices of the Mind: A Sociocultural Approach to Mediated Action. Harvard University Press, Cambridge, MA, 1991.
[4] E. Wenger, Toward a Theory of Cultural Transparency. Doctoral dissertation, University of California-Irvine, 1990.
[5] J. Orasanu and E. Salas, Team Decision Making in Complex Environments. In G.A. Klein, J. Orasanu, R. Calderwood, and C. Zsambok (Eds.), Decision Making in Action: Models and Methods. Ablex, Norwood, NJ, 1991.
[6] A. O'Donnell, S. Derry, and L. DuRussel, Cognitive Processes in Interdisciplinary Groups: Problems and Possibilities. NISE Research Monograph Number 5, University of Wisconsin-Madison, National Institute for Science Education, 1997.
[7] D. Halpern, Thought and Knowledge, 3rd Ed. Erlbaum, Mahwah, NJ, 1996.
[8] L. Resnick, M. Salmon, C. Zeitz, S. Wathen, and S. Holowchak, Reasoning in Conversation, Cognition and Instruction, 11(3&4), 1996, pp. 347-364.
[9] S. Derry, S. Gance, L. Gance, and M. Schlager, Toward Assessment of Knowledge Building Practices in Technology-Mediated Work Group Interactions. To appear in S. Lajoie (Ed.), Computers as Cognitive Tools. Erlbaum, Mahwah, NJ, in press.
[10] V. Hinsz, R. Tindale, and D. Vollrath, The Emerging Conceptualization of Groups as Information Processors. Psychological Bulletin, 121, 1997, pp. 43-64.
[11] J. Smith, Collective Intelligence in Computer-based Collaborations. Erlbaum, Mahwah, NJ, 1994.
[12] R. Mislevy, Evidence and Inference in Educational Assessment. Psychometrika, 59(4), pp. 439-483.
Artificial Intelligence in Education S.P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
Impact of Shared Applications and Implications for the Design of Adaptive Collaborative Learning Environments*

Denise Gurer and Robert Kozma
SRI International, 333 Ravenswood Avenue, Menlo Park, California, USA
E-mail:
[email protected],
[email protected]

Eva Millán
Dept. Lenguajes y Ciencias de la Computación, Complejo Politécnico, Campus de Teatinos, E.T.S.I. en Informática, Universidad de Málaga, 29080 Málaga, Spain
E-mail:
[email protected]

ABSTRACT

Studies have shown that the success of collaborative learning depends in large part on the quality of discourse between the collaborating students. This has important implications for distance learning environments. Not only should these environments provide students with a capability to share applications and communicate either verbally or textually, but they should also provide a place, or social context, within which the students can interact. This paper discusses the results of an initial empirical study whose goal was to guide our design of a learning environment that can proactively adapt to the needs of a pair of collaborating students, taking into account their understanding of the material and their interactions with the learning environment, the shared application, and each other. In this study, a pair of students collaborated to solve chemistry problems within the collaborative environment, in three configurations: (1) side by side, sharing the application on the same computer, keyboard, and mouse; (2) in separate rooms, using the same application with a sharing capability turned on and with an audio link; (3) in separate rooms, using the same application with sharing turned off, but with the audio link on. In this paper we discuss the students' actions and coordination activities and analyze the quality of their discussions. The results suggest that a collaborative system is needed that can take into consideration the cognitive overhead induced by collaborative activities.

1. INTRODUCTION

Studies have shown that students working together can greatly improve their learning; however, collaboration is not always effective. Much depends on the quality and focus of discourse between the collaborating students [1,2]. Ideally, students who engage in explanations and discuss domain concepts have more successful learning experiences. In addition, social context plays an important role in collaborative learning, where knowledge is socially constructed through the use of collaborative tools, modes of representation, applications, and interaction with a community of on-line users [3,4]. Thus, an effective collaborative system for distance learning should facilitate and encourage discourse and interaction among the collaborators and should provide an environment and tools that enable the learners to build a shared understanding of the domain.

*This research was funded partly by the Office of Naval Research, Contract Number N00014-96-C-0316, and the National Science Foundation, Grant Number CDA-9616611.
We are designing an environment that proactively enhances collaboration and learning by adapting the choice of tools, representations, and other media to facilitate both collaborative activities and the social construction of knowledge. The aim of our design is to provide an adaptive, computer-based multimedia social and task environment that can foster and sustain a range of enduring, highly interpersonal relationships and activities that motivate and support continuous learning. As a first step toward designing such an environment, we performed studies of university students who collaborated in the domain of chemistry in a variety of configurations. In this paper we report on our initial findings and their implications for incorporating adaptability into the design of our distance collaborative learning system, called Virtual Places (VP). First we briefly describe the overall view of the student and group modeling necessary to provide adaptability. We then describe the experiments, the results, and their implications for design.

2. ADAPTIVE COLLABORATIVE ENVIRONMENTS
We are interested in going beyond the traditional intelligent tutoring system approach of modeling the individual learner [5] to an approach where we model and adapt to the broader social context within which learning occurs. We are designing our system to proactively facilitate and encourage collaborative interaction between the learners and other members of the virtual community. To enable this interaction, we are designing a group model that constructs an understanding of pairs of learners as they work together towards a common goal. The group model constructs an understanding of each participant's information needs, level of expertise, and knowledge state, while at the same time modeling the social interaction of the group, such as social context, social roles, group strategies, and goals. We are basing our modeling on an environment where students are logged into a virtual place in addition to sharing applications at a distance. The group model can use the context, learner interaction with a shared application, learner discourse, and domain knowledge to update its understanding of the learners and their social interaction. The context includes virtual environment information (e.g., the location, interaction with objects, number of commands issued, and amount of text and audio dialog initiated) and curriculum information (e.g., the course the learners are enrolled in, what study unit they are working on, and the unit's goals). Interaction with the simulation includes mouse clicks, points, and cursor gestures, and simulation functional features (e.g., the experiment that is currently displayed, the order in which the learners perform the experiments, the values of variables, and the media the learners are using). In order for the environment to adapt and make informed decisions, the learners' behavior must be correlated with domain concepts, skills, and problem-solving strategies.

3. STUDY DESCRIPTION

To assist us in designing the system described above, we performed a study of a pair of collaborating students who had been identified as good collaborators in a previous study [6]. These University of California, Los Angeles (UCLA) sophomore students were asked to collaborate to solve Ideal Gas Law problems with an application in three different configurations: (1) side by side, sharing the application on the same computer, keyboard, and mouse; (2) in separate rooms, using the same application with sharing turned on, and with an audio link; (3) in separate rooms, using the same application with sharing turned off, and with an audio link. The simulation used by the students is based on the Ideal Gas Law and consists of three interactive graphs (pressure, temperature, and volume) and
a graphic animation. The students were given three sets of similar problems to solve, one for each experiment, in which they needed to use the simulation. They were provided with the same textbooks off-line and were given a 10-minute tutorial on the use of the application beforehand. When sharing is turned on, the students in a collaborating pair can see what each other is doing by observing the response of the application; however, they cannot see each other's cursors. Each session lasted approximately 25 minutes.
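Although the paper does not describe an implementation, the inputs to the group model sketched in section 2 (virtual-environment context, simulation interaction, and discourse) suggest a simple event log. The schema below is purely illustrative; every field name is an assumption.

from dataclasses import dataclass

@dataclass
class InteractionEvent:
    student: str        # which member of the pair acted
    channel: str        # "application", "audio", or "environment"
    kind: str           # e.g. "mouse_click", "run_simulation", "utterance"
    configuration: str  # "same_computer", "sharing_on", "sharing_off"
    timestamp: float    # seconds from session start

A group model would consume a stream of such events, together with curriculum context (course, study unit, unit goals), to update its estimate of each learner's knowledge state and of the pair's social interaction.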
4. ANALYSES

First we identified seven actions that the students performed during their collaborative problem-solving episodes: (1) Reference—consulting a reference external to the application, such as a textbook or their notebooks; (2) Problem Orientation—familiarizing themselves with the problem by reading out loud and writing notes; (3) Discussion—discussing the problem, the results of the simulation or calculations, reference material, the use of the application, and chemistry concepts; (4) Help—receiving help (solicited or unsolicited) for their efforts; (5) Application Use—making simulation settings and running the Ideal Gas Law simulation; (6) Notetaking—writing observations and answers in their notebooks; and (7) Application Orientation—familiarizing themselves with the application in order to proceed with their problem solving.

Table 1. Action Incidents Distributed Between Student A and Student B for Each Configuration. Role actions are actions that one student actively performs while the other student only observes; joint actions are those in which both students are actively engaged.
| Actions                 | Same Computer (30 actions total) A / B | Sharing ON (19 actions total) A / B | Sharing OFF (28 actions total) A / B |
| Reference               | 0 / 1   | 1 / 1   | 1 / 1   |
| Problem Orientation     | 2 / 0   | 1 / 1   | 1 / 1   |
| Discussion              | 10 / 10 | 5 / 5   | 7 / 7   |
| Help                    | 0 / 0   | 0 / 0   | 1 / 1   |
| Application Use         | 0 / 14  | 4 / 4   | 12 / 14 |
| Notetaking              | 2 / 2   | 2 / 3   | 3 / 3   |
| Application Orientation | 0 / 0   | 1 / 1   | 0 / 0   |
| Total Actions           | 14 / 29 | 13 / 15 | 25 / 26 |
| Actions (%)             | 47 / 97 | 68 / 79 | 89 / 93 |
| Role Actions            | 17      | 10      | 5       |
| Role Actions (%)        | 57      | 53      | 18      |
| Joint Actions           | 13      | 9       | 23      |
| Joint Actions (%)       | 43      | 47      | 82      |
Table 1 displays the action incidents for students A and B in each configuration. Because the problems were different for each configuration, the numbers of actions are not directly comparable; in our analyses we therefore focus on percentages, that is, on the distribution of actions. Note that the majority of the actions involved application use and discussion. In order to understand the action distribution over the three configurations, we classified actions as role actions, defined as actions that only one of the students actively performed, and joint actions, defined as actions in which both students
were engaged. For example, if both students took notes while discussing what they were writing, that note-taking action was a joint action, not a role action. On the other hand, if only one student took notes, the action was a role action. Another example of a role action is one in which one student reads a question out loud for problem orientation, while the other student just listens. Granted, the listening student is involved in the process, but the involvement is passive. On the other hand, some actions, such as discussion, are obviously joint in nature.

We found that a large number of joint actions took place in the Sharing OFF configuration, while the Same Computer and Sharing ON configurations had similar and lower numbers of joint actions. This is not surprising, since the Sharing OFF configuration forces the students to perform the same actions—application use and application orientation—independently and jointly, rather than performing different roles. The students appeared to perform role actions more often in the Same Computer and Sharing ON configurations. Interestingly, when the students did take on roles in the Same Computer configuration, they continued to perform distinct roles during the entire exercise. For example, one student would point to the screen and dictate what to do next, and the other would work the mouse and keyboard throughout the exercise. When the same students were in the Sharing ON configuration, they shared actions without taking on distinct roles for the duration of the exercise. For example, in one simulation run one student would make a simulation setting and the other student would run the simulation; in another simulation run the other student might make the settings. In this case, one student did not dominate control of the application, as happened in the Same Computer configuration. One possible explanation is the spatial orientation of each type of action. When the students took on specific roles in the Same Computer configuration, they were seated next to each other and one could reach the mouse more easily. When they were in separate rooms in the Sharing ON configuration, they took turns manipulating the simulation, since each had equal access to his or her own mouse and keyboard. On the other hand, in the Sharing OFF configuration each student had to manipulate the simulation separately, thus eliminating the possibility of taking distinct roles.

Since a majority of the actions previously identified involved application use and discussion, we wanted to consider the interrelationship between these actions by examining how students coordinated their interaction. We identified four coordination actions that took the form of verbal cues in each configuration: (1) Location Verification, where the students identified their location in a document (e.g., "What page are you on?"); (2) Results Verification, where the students tried to verify that they had the same result (e.g., "I got two blue lines. What did you get?"); (3) Application Manipulation, where the students coordinated their actions with the application (e.g., "One more to go. We don't know volume but pressure goes down. Put your pressure as going down."); and (4) Action Determination, where the students decide what to do next (e.g., "I'll check on my book. How we solved a problem where those two have been involved."). These coordination actions add to the cognitive load in a collaboration session, where some actions enhance the collaboration and others just add to the cognitive overhead.
For example, action determination can be considered a useful coordination action, since it involves understanding and discussing domain concepts. On the other hand, application manipulation is a non-useful coordination action, since it does not involve any conceptual understanding, but simply an understanding of how to manipulate the application, and in many cases it can be avoided. Location verification and results verification are neutral, in that they may or may not involve concepts, and in many cases they are unavoidable.
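Reading the four coordination categories as useful, neutral, or non-useful suggests a simple overhead measure. The sketch below is our own illustration, not the authors' coding scheme; in the study the tagging itself was done by hand.

# Usefulness reading of the four coordination-action categories.
USEFULNESS = {
    "action_determination": "useful",
    "location_verification": "neutral",
    "results_verification": "neutral",
    "application_manipulation": "non_useful",
}

def coordination_overhead(tags):
    """Fraction of coordination actions that add overhead without
    conceptual content."""
    coord = [t for t in tags if t in USEFULNESS]
    if not coord:
        return 0.0
    return sum(USEFULNESS[t] == "non_useful" for t in coord) / len(coord)

# Sharing OFF, from Table 2: 16 of 20 coordination actions were
# application manipulation, giving an overhead of 0.8.
tags = (["application_manipulation"] * 16 + ["location_verification"]
        + ["results_verification"] * 3)
print(coordination_overhead(tags))  # 0.8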
Table 2. Students' Actions to Coordinate Problem-Solving Activities for Each Configuration

| Coordination Actions       | Same Computer: Incidents / % of Total | Sharing ON: Incidents / % | Sharing OFF: Incidents / % |
| Location Verification      | 0 / 0   | 1 / 5   | 1 / 5   |
| Results Verification       | 0 / 0   | 3 / 16  | 3 / 15  |
| Application Manipulation   | 7 / 100 | 10 / 53 | 16 / 80 |
| Action Determination       | 0 / 0   | 5 / 26  | 0 / 0   |
| Total Coordination Actions | 7       | 19      | 20      |
The number of coordination incidents is recorded in Table 2. These results suggest that more coordination actions are needed when students collaborate at a distance, as is the case in the Sharing ON and Sharing OFF configurations. This makes sense, since the students cannot see each other or each other's screens. However, the coordination actions needed with Sharing OFF are primarily for application manipulation, while with Sharing ON they are more evenly distributed over other actions such as results verification and action determination. With Sharing OFF, most of the efforts of the student pair consisted of making sure that both students were doing and seeing the same thing; with Sharing ON, each student knew that the other saw the same thing, so both could free some of their cognitive load for discussion of the problem and chemistry concepts.

Table 3. Focus of Discussions of Collaborating Students, in Each Configuration. Discussions are defined as verbal discourse without use of the application.
| Focus of Discussion        | Same Computer: Incidents / % of Total | Sharing ON: Incidents / % | Sharing OFF: Incidents / % |
| Domain Concepts            | 4 / 40 | 4 / 80 | 3 / 43 |
| Action to Be Taken         | 6 / 60 | 1 / 20 | 4 / 57 |
| Total Discussion Incidents | 10     | 5      | 7      |
A key objective of this study was to examine the effect of each configuration on the amount of conceptual discussion that occurred between students. Table 3 classifies discussions that occurred when the students were not actively using the application as either (1) focusing on domain concepts or (2) determining the next action to take. The number of incidents of each type of discussion demonstrates that the Sharing ON configuration was more effective in evoking conceptual talk than either the Sharing OFF or the Same Computer configuration. However, a discussion identified in this way can include many dialog lines grouped together into one discussion event. Table 4 therefore gives a more detailed analysis of the same discussions: it displays the number of dialog turns (over the entire interaction) that were tagged as corresponding to either domain concepts or actions to be taken. In this analysis, one turn of dialog is defined as an utterance, by one person, that may include more than one sentence. Interjections such as "um" or "uh" were not counted. Table 4 illustrates the difference between the number of actual dialog turns devoted to discussing domain concepts as opposed to actions to be taken. The discussion of actions
to be taken in the Sharing OFF configuration was much more extended than the equivalent discussions in the other two configurations. In addition, comparing Table 3 with Table 4 shows that the relative imbalance between the percentages of conceptual talk in the Same Computer and Sharing ON configurations is due primarily to the extended discussions of actions to be taken in the Sharing ON configuration. When only the number of dialog turns is considered, the amount of discussion of domain concepts is similar in the Sharing ON and Same Computer configurations. Table 4 also confirms the lower incidence of focus on domain concepts in the Sharing OFF configuration, due mainly to the coordination actions required to keep the collaboration going.

Table 4. Focus of the Dialog in the Entire Interaction Between Two Students in Each Configuration

| Focus of Dialog    | Same Computer: Turns / % of Total | Sharing ON: Turns / % | Sharing OFF: Turns / % |
| Domain Concepts    | 18 / 42 | 38 / 47 | 32 / 25 |
| Action to Be Taken | 25 / 58 | 43 / 53 | 98 / 75 |
| Total Dialog Turns | 43      | 81      | 130     |

5. DESIGN IMPLICATIONS
While Virtual Places has provided learners with a social and task environment to support distant collaborative learning, much of the collaborative activity is left to the discretion of the students. The results of the study discussed above suggest that not all forms of collaboration are productive or lead to conceptual discourse. Collaborative knowledge building is more likely to occur when students use shared applications, particularly if these shared applications and joint actions are accompanied by representations that support conceptual thinking. Indeed, other work has shown that working at a distance in a Sharing OFF configuration causes students to be significantly more tired and less effective than working side by side [7]. However, even in the Sharing ON configuration, a large cognitive overhead can be devoted to collaboration coordination activities: mental processes, consisting mainly of the coordination actions in Table 2, that constitute a cognitive load in addition to the usual load found in learning environments (i.e., learning the application, interacting with the environment, and understanding the problem). The goal of our research is to extend the capabilities of Virtual Places-like immersive environments so as to adapt them to the needs of individuals and groups of learners. Ideally, students should focus as much as possible on the domain concepts they are learning and less on off-target collaboration activities such as the coordination of applications. Thus, we want to build capabilities that take proactive measures to facilitate collaboration and learning and lessen the cognitive overhead. In particular, the results shown in Table 1 suggest that we should focus mainly on enabling the use of shared applications and discussion between the students. This can be accomplished by providing mechanisms that assist in coordination actions, including coordinating the application, providing connections to other people in the virtual environment for assistance, providing awareness of tools and on-line objects that could aid in the problem-solving task, and providing on-line resource information.
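One speculative form such a proactive measure could take is a monitor that watches the share of application-manipulation coordination in recent dialog and offers help when it climbs. The threshold and the system action below are invented for illustration and are not part of the study.

def maybe_intervene(recent_tags, threshold=0.5):
    """Suggest a coordination aid when manipulation talk dominates
    the recently tagged turns (a hypothetical adaptive policy)."""
    if not recent_tags:
        return None
    manip = sum(t == "application_manipulation" for t in recent_tags)
    if manip / len(recent_tags) > threshold:
        return "offer_shared_view"  # hypothetical system action
    return None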
6. SUMMARY

These findings demonstrate that collaboration at a distance can, under certain conditions, support conceptual discussions as effectively as side-by-side collaboration. Our results suggest that the use of shared applications and audio connections results in patterns of joint activity that are similar to those observed when students work side by side. First, the amount of joint activity was similar under both conditions. However, in the Sharing ON distance configuration, the joint work was actually spread over a broader range of activities, since both students had their own keyboards and therefore equal access to the operation of the simulation. Joint activities were much lower in the Sharing OFF distance configuration, even though both students were working on the same task; in this configuration it was more likely that an activity would be performed separately by each of the students.

These findings also show that the use of shared simulations and representations at a distance can support more conceptual talk than occurs without those shared applications. When the pair of students knew that they were both manipulating and observing the same representations, they could focus their discussion on the concepts represented more effectively than when they were manipulating and observing the same kinds of representations independently. Without a shared view of the simulation, they were more likely to spend more of their time discussing the operation and manipulation of the application.

Future studies are needed with a range of students collaborating in the fully functional VP environment, which consists of a social environment, audio channels, shared applications, problem-solving tools, on-line textbooks, text-based chat, and shared gesturing (i.e., the gesturers can see each other's cursors). Through these future studies we intend to identify the specific features of tools, representations, social context, and activities in each situation, as well as the relationship between these features and the behavior of the students and instructors. We wish to identify the most effective components and tools that can be integrated into our design for collaborative learning environments.

7. REFERENCES

[1] N.M. Webb, Peer Interaction and Learning in Small Groups, International Journal of Educational Research, 13 (1989), pp. 21-39.
[2] M.J. Baker and K. Lund, Flexibly Structuring the Interaction in a CSCL Environment. Proceedings of EURO Artificial Intelligence in Education, 1996: see http://www.cbl.leeds.ac.uk/~euroaied/papers/Baker-Lund.html.
[3] M. Cole, On Socially Shared Cognitions. In: L. Resnick, J. Levine, and S. Behrend (Eds.), Socially Shared Cognition. Erlbaum, Hillsdale, New Jersey, 1991.
[4] J. Greer, G. McCalla, J. Collins, V. Kumar, P. Meagher, and J. Vassileva, Supporting Peer Help and Collaboration in Distributed Workplace Environments, International Journal of Artificial Intelligence in Education, 9 (1998), pp. 159-177.
[5] D.W. Gurer, M. desJardins, and M. Schlager, Representing a Student's Learning States and Transitions, Proceedings of the AAAI Spring Symposium on Representing Mental States and Mechanisms. AAAI Press, Stanford, California, 1995, pp. 51-59.
[6] R. Kozma and D. Gurer, A Comparison of Students' Collaboration While Conducting Chemistry Wet Lab Experiments and While Using Molecular Modeling Software, presented at the Conference on Computer Support for Collaborative Learning, 1997.
[7] D. Whitelock and E.
Scanlon, Motivation, Media and Motion: Reviewing a Computer Supported Collaborative Learning Experience, Proceedings of the Euro Artificial Intelligence and Education Conference, 1996: see http://www.cbl.leeds.ac.uk/~euroaied/papers/Whitelock-Scanlon.html.
Supportive Collaborative Learning
Artificial Intelligence in Education S.P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
An approach to analyse collaboration when shared structured workspaces are used for carrying out group learning processes

Barros, B. & Verdejo, M.F.
Departamento de Ingeniería Eléctrica, Electrónica y Control, Escuela Técnica Superior de Ingenieros Industriales (U.N.E.D.), Ciudad Universitaria s/n, 28040 Madrid, Spain
Email: {bbarros, felisa}@ieec.uned.es; phone: 34-91-398 64 84; fax: 34-91-398 60 28

Abstract

In this paper we present an approach to characterize group and individual behaviour in computer-supported collaborative work in terms of a set of attributes. In this way a process-oriented qualitative description of a mediated group activity is given from three perspectives: (i) the group's performance in reference to other groups, (ii) each member in reference to the other members of the group, and (iii) the group by itself. In our approach collaboration is conversation-based. We then propose a method to automatically compute these attributes for processes where joint activity and interactions are carried out by means of semi-structured messages. The final set of attributes has been fixed through an extensive period of iterative design and experimentation. Our design approach allows extracting relevant information at different levels of abstraction. Visualization and global behavior analysis tools are discussed. Shallow analyses, as presented in this paper, are needed and useful to tackle a large amount of information, in order to enhance computer-mediated support.
1. Introduction

Collaborative learning research has paid close attention to studying pupils' interactions during peer-based work, in order to analyze and identify the cognitive advantages of joint activity [6]. As [5] points out, the benefit of the collaborative approach for learning lies in the processes of articulation, conflict and co-construction of ideas that occur when working closely with a peer. Participants in a problem-solving situation have to make their ideas (assertions, hypotheses, denials...) explicit to other collaborators; disagreements prompt justifications and negotiations, helping students to converge to a common object of shared understanding. The computer provides opportunities to support and enhance this approach in a number of ways, for instance by offering computer-based problem spaces for jointly creating and exploiting structures of common knowledge and shared reference. Moreover, networks make it possible to open the collaborative framework to distributed communities, providing remote access to these spaces as well as computer-mediated communication to support interpersonal exchange and debate. An increasing number of collaborative learning environments for open and closed virtual groups have been built for a range of learning tasks [8][13][11][12], and experiences of use are reported from school to university level [2][4]. In this paper, we focus on the analysis of computer-mediated interaction in
collaborative learning processes. In our approach collaboration is conversation-based. Conversation consists of turn taking, where each contribution both specifies and grounds some content [3]. The types of contribution and their constraints can be defined to establish an explicit structure for a conversation. For learners the benefit is twofold: (1) they receive some support for the process of categorising and organising their ideas when contributing to the debate, and (2) further inspection of the process is facilitated, because the system can take the type units into account. Most automatic analyses of computer-mediated interactions have so far been based on quantitative terms; a variety of qualitative analyses have been performed, mainly manually (see for instance [6]), with a few attempts to carry out automatic natural language processing [9]. Our goal is to design a tool that facilitates the handling of a large amount of data, in order to provide a view at an abstract, qualitative level. Our proposal aims to describe individual and group behaviour in terms of a set of features that characterize collaboration when performing a task. This is a first step towards the complex goal of understanding how interactions are related to learning processes and outcomes. The following section gives an overview of our system. A process-oriented visualization tool is presented in section 3. The qualitative analysis is discussed in section 4, and elements for further research are pointed out in the conclusions.

2. The collaborative environment

The main metaphor in our system for sustaining a learning activity is the concept of a space, a virtual structured place with resources and tools to perform a task. Three types of spaces are available: an individual workspace, private for each user, and two types of shared spaces, one workspace for debate and joint construction, the other for coordination purposes. The information handled is mainly textual, so a variety of editing tools and file management facilities are available. Links to other relevant electronic sources of further information for the task at hand can also be included. A shared workspace provides support for conversation in the form of semi-structured typed messages: when learners express their contributions, they have to select a type from a predefined set. Workspaces are defined in the configuration mode; an authoring tool is provided to define the workspaces associated to a learning task. The architecture of the system is organized into four levels: the configuration level, the performance level, the reuse level and the reflection level [1].

For example, let us consider an experience comprising two collaborative activities, writing an essay and designing a case, where the first one, in terms of Activity Theory [10], is described as follows. The object of the learning activity is to make a survey of a focused research topic on Educational Technology. The outcome of the activity is an essay pointing out the key ideas. The schema for the essay has been previously defined by the tutor and included in the shared workspace. The community involved in the activity is made up of pairs of graduate students in a Ph.D. distance teaching programme. Students are part time, geographically distributed, and usually interact in disjointed time slots. The subject, each individual student, has either a technical or a non-technical profile, but groups are formed by people with the same background.
Mediational tools include humans, a tutor and a technical assistant, and artifacts: the phone, a set of documents, and the system (a networked computer environment including the script of the activity, private and shared structured task spaces for peer argumentative elaboration and coordination, as well as access to other electronically available sources of information). Rules for the activity are stated explicitly and have to be accepted before starting. They include the commitment to finish the work, the script for the activity, and the protocol for the collaborative debate. Some of these rules are embedded in the system; for example, the conversational graph defines the way one can respond to another peer's contribution, and agreement has to be explicit and reached by consensus. Other rules are the full responsibility of the learners, such as the way the discussion is organized, as well as aspects of time management and deadlines. The only guideline given to the groups is to use the coordination
space to deal with all these matters, using the elaboration workspace only for the topic discussion. Students in each group have the same responsibility; there are no predefined roles or division of labour.
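One of the rules embedded in the system is the conversational graph, which constrains how one may respond to a peer's contribution. Below is a minimal sketch of how such a protocol might be enforced; the transition table is an assumption consistent with the contribution types used in this system, not the authors' exact graph.

# Hypothetical reply-type constraints for the semi-structured conversation.
ALLOWED_REPLIES = {
    "proposal":       {"contraproposal", "question", "comment", "agreement"},
    "contraproposal": {"contraproposal", "question", "comment", "agreement"},
    "question":       {"clarification", "comment"},
    "clarification":  {"comment", "agreement"},
    "comment":        {"comment", "question"},
    # "agreement" is terminal in this sketch: no replies allowed.
}

def can_reply(parent_type, reply_type):
    """True if the conversational graph permits this reply."""
    return reply_type in ALLOWED_REPLIES.get(parent_type, set())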
Figure 1. Interfaces of the Elaboration Space and the Result Space (final version)
The learner interface provided by the system for this experience is shown in figure 1. The upper window shows the interface for a group's work. The bar shows the number of activities and the associated workspaces. To access the elaboration space of activity 1, the user clicks on the space name (labeled 1 in the figure) and the elaboration space window appears. This interface is organized in three areas. The upper area is a menu for accessing activity workspaces. In the left area we can see the task schema, which is automatically updated by the system, and on the right side, after clicking on one of the contribution names, the whole content of the contribution appears. Labels on the window explain each element of the interface. To move to the result space, the user clicks the checkbox with the space name (labeled 2 in the figure), and the result space appears, showing the state of the votes for each section on the right. The final document can be seen, on the right of the window, by pressing the option labeled 3 in the figure.

3. Visualizing the process

The system records all the accesses and the actions performed by each user. The information automatically registered includes user identification, time and date, host computer, learning experience, group, activity, task, workspace, and type of action. A relational database stores
this information, so a variety of queries combining a selection of criteria can easily be formulated, and the results are displayed either in textual or graphical format. A web-based interface with buttons is provided for selecting the options for a query. Examples of queries follow:
• Evolution of the number of users' contributions in an experience along a period, using a graphic display.
• Plot of the number of hourly accesses for a group in an activity.
• Number of contributions by user, for all the group members in an activity.
• Number of contributions by user, by type, for each subtask of a task.
• Contributions of a group, related to a workspace task, by type of contribution, displayed as a bar chart. The discussion process is summarised in terms of the type units; in this case, proposal, contraproposal and agreement have been the backbone of the process, and only a few comments, questions and clarifications have been made. This graphic, together with the previous query, provides a good overview of the dynamics of a group's work.
• Number of contributions, by member, for an activity (all the workspaces supporting the activity), as a graphic display.
• Evolution of a discussion for a subtask, as a graphic display. We can observe here a first proposal followed by a contraproposal and a comment from the same author. A turn taking happens with a peer contraproposal, and then two contraproposals plus a comment from the first author; a question from his peer follows, and after some time without receiving a response this student takes the initiative, making a new contraproposal. In this way they finally reach an agreement.

The question of who may request this data is a matter to be decided at the setting-up of the collaborative learning experience, when rules for the group and the division of labour are established. The system, in configuration mode, allows us to define restrictions and permissions related to roles. Roles can be assigned to users for each activity. The tool for visualizing group processes becomes available to authorized users by selecting the reflection mode. This process-oriented data analysis can be used for a variety of purposes. For example, during the activity, an insightful presentation of individual contributions to the group task can increase presence awareness for the rest of the group and stimulate peers to contribute just in time. Furthermore, it may support pedagogical decisions, for instance to carry out an external intervention, either by artificial or human agents, which suggests relating contributions to each other.
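For illustration, one of the queries above ("number of contributions by user, by type") could be expressed against the action log roughly as follows. The table layout is an assumption; the paper lists the logged fields but not the schema.

import sqlite3

# Hypothetical schema: actions(user, group_id, activity, task, workspace,
# type, time) -- one row per registered action.
conn = sqlite3.connect("collaboration_log.db")
rows = conn.execute(
    """SELECT user, type, COUNT(*) AS n
       FROM actions
       WHERE group_id = ? AND activity = ?
       GROUP BY user, type
       ORDER BY user, n DESC""",
    ("group7", "activity1"),  # hypothetical identifiers
).fetchall()
for user, kind, n in rows:
    print(user, kind, n)  # contributions by user, by type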
4. Qualitative Analysis

Criteria and methods to evaluate whether a collaborative learning process has been carried out are a controversial and open question in the field. But at least, from a practical point of view, we need to identify, even roughly, if and when students have been addressing each other and working together. For this purpose we can exploit some of the evidence provided by the student contributions while performing a task. We propose three kinds of analysis, characterizing: group behavior compared to other groups performing the same task; group behavior in itself; and individual student behavior compared to the rest of the group members. We express the result of the analysis in terms of a set of features (attribute-value pairs), subjectively established but tested and refined through an extensive experimental period of system use. Figure 2 shows the attributes considered for group comparative behavior. Values for these attributes are either (i) calculated from data of the task definition and the process performed, or (ii) concluded by fuzzy inference using a set of evaluation rules, for those attributes appearing with a dark background in the figure. These rules, which are obtained from
relationships among attributes, as shown in the figure, can be applied to the calculated variables to infer new ones. Those variables, in turn, can be the input for other rules, giving rise to a chained inference process. Next we describe these features in detail.
Figure 2. Relationship among attributes
4.1 Analysing group behavior

Data for the process analysis of a learning experience comes from two sources:
• The definition of the experience, embedded in the system configuration, i.e., the task workspaces associated to each activity; in particular, their conversational graphs and the group definition structure. The conversational graph specification includes the definition of values for a set of attributes for each type of contribution. We propose four attributes taking values in the range (-10, 10). Initiative indicates the degree of involvement and responsibility required to perform a contribution. Creativity relates to the degree of originality required to produce this type of contribution. Elaboration qualifies the workload needed for making a contribution. Conformity establishes the degree of agreement of a contribution in relation to another selected and linked contribution; for instance, making a contraproposal to a proposal indicates a low degree of conformity to the proposal. Figure 3 shows the feature structure associated to the conversational graph of the elaboration workspace presented in figure 1.

| Contribution type   | Initiative | Creativity | Conformity |
| Proposal (P)        | 10         | 10         | -10        |
| Contraproposal (CN) | 10         | 9          | -10        |
| Question (Q)        | 2          | 1          | 3          |
| Comment (CO)        | 4          | 3          | 0          |
| Clarification (CL)  | 4          | 2          | 0          |
| Agreement (A)       | 0          | 0          | 10         |

Figure 3. Conversational graph and a table with the attributes and values for each contribution type
• The learners' contributions, organised as tree-like structures for each task workspace, and the set of messages interchanged through the coordination space associated to each activity.
From this data we consider and compute the following attributes for each elaboration workspace:
Mean contributions number: the number of contributions from the group divided by the number of group members.
Mean contribution size: the mean size of contribution contents (in characters).
Depth of the contributions tree: the maximum depth of the trees related to the workspace.
Interactivity: the percentage of contributions responded to or linked to other contributions made by a student other than the contributor.

The values Va_i for each attribute i (Initiative, Creativity, Elaboration and Conformity) are computed by the formula

Va_i = k * Σ_j (N_j * V_ji)

where N_j is the number of contributions of each type j, and V_ji is the value of attribute i for that type of contribution, as defined in figure 3. For the coordination space, we compute the total number of messages, by type.

Conclusions are established by a fuzzy inference process. The logical product of each rule is inferred to arrive at a combined magnitude for each output membership function, by the MAX-MIN method. Then a defuzzification process is carried out for the output generation. The defuzzification function is performed by mapping the magnitudes to their respective output trapezoidal membership functions and computing the fuzzy centroid of the composite area of the membership functions. In order to perform fuzzy inference, the numerical values of the computed attributes first have to be mapped from their numerical scale to a linguistic label expressing their degree of membership. This is performed by a fuzzification function, as the scales have to be interpreted relative to a mean reference point (MRP). For instance, for a normal task 50 contributions is a typical number of contributions, so 55 is the top value for the "adequate" linguistic label, while for a long task this could be in the range of "low". Mean reference points can be dynamically calculated taking into account all the similar tasks performed in all the learning experiences carried out with the system.

Figure 4. Results of group behavior for an experience, comparing each task with other groups performing the same task

Figure 4 shows the results of the global analysis for a particular group learning experience.
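The attribute computation and the fuzzification step can be illustrated with a small sketch. The attribute values come from the rows of Figure 3 that are recoverable here (the Elaboration values are omitted); k and the fuzzification cut-points are invented for illustration.

# V_ji per contribution type j: (Initiative, Creativity, Conformity),
# taken from Figure 3; the Elaboration values are not reproduced here.
VALUES = {
    "proposal":       (10, 10, -10),
    "contraproposal": (10,  9, -10),
    "question":       ( 2,  1,   3),
    "comment":        ( 4,  3,   0),
    "clarification":  ( 4,  2,   0),
    "agreement":      ( 0,  0,  10),
}

def attribute_values(counts, k=1.0):
    """counts maps contribution type -> N_j; returns Va_i per attribute."""
    totals = [0.0, 0.0, 0.0]
    for ctype, n in counts.items():
        for i, v in enumerate(VALUES[ctype]):
            totals[i] += n * v
    return [k * t for t in totals]

def fuzzify(value, mrp):
    """Map a raw value to a linguistic label relative to the MRP.
    The 0.8 / 1.2 cut-points are assumptions, not the system's."""
    ratio = value / mrp if mrp else 0.0
    if ratio < 0.8:
        return "low"
    return "adequate" if ratio < 1.2 else "high"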
4.2 Individual behavior

This analysis is done for each student in a group for a learning task. The method is the same as for the group behavior, but in this case we consider a different set of attributes and rules, as illustrated in figure 5. For instance, the attribute promote discussion is inferred from the attributes: (i) Answered, contributions responding to others; (ii) Was Answered, own contributions responded to by peers; (iii) Continued, contributions answered by the same user; and (iv) proposal moves made by this person along the process. For more details see [1].
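As a toy version of that inference, the four inputs could be combined with weights into a single label. The weights and cut-offs below are invented; the real system uses fuzzy rules rather than this crisp stand-in.

def promotes_discussion(answered, was_answered, continued, proposals):
    """Hypothetical crisp stand-in for the fuzzy 'promote discussion' rule."""
    score = 2 * answered + 2 * was_answered - continued + proposals
    if score > 10:
        return "high"
    return "medium" if score > 4 else "low"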
4.3 Group task behavior summary

This analysis focuses on two aspects: the distribution of work between group members and the evolution of group activity over a period of time. We use the same data as before (i.e., the definition of the task and the process contribution trees), but in this case we consider a clustering of attributes, as indicated in table 1, for the definition of a task.

Figure 5. Relationship among the attributes in individual analysis

Table 1. Category attributes considered for group task behaviour

| Contribution type | Category |
| Proposal          | propose  |
| Contraproposal    | propose  |
| Question          | argue    |
| Comment           | argue    |
| Clarification     | argue    |
| Agreement         | agree    |
Conclusions are reached in this case by comparing computed data from each user with the rest of the group. Figure 6 shows an example of results obtained for a particular task and group of students. On the left we can observe the evolution of the task performed in three periods of activity, the first being where more contributions occurred. On the right, for each subtask, a conclusion about each student's participation is given. The result in this case may suggest that collaboration, but also some division of labour, has really happened.

Figure 6. Intensity and evolution of work on the left, and participation by subtask on the right
5. Summary and conclusions

Computer-mediated collaborative learning allows the recording of a large amount of data about the interaction processes and the task performance of a group of students. This empirical data is a very rich source to mine for a variety of purposes: some of a practical nature, such as improving peer awareness of the on-going work; others of a more long-term and fundamental scope, such as understanding socio-cognitive correlations between collaboration and learning. Manual approaches to fully monitor and exploit these data are out of the question. A mixture of computable methods to organize and extract information from all this rough material, together with partial and focused in-depth manual analysis, seems a more feasible and scalable framework. In this paper we have presented, first of all, an approach to characterize group and individual behaviour in computer-supported collaborative work in terms of a set of attributes. In this way a process-oriented qualitative description of a mediated group activity is given from three perspectives: (i) the group's performance in reference to other groups, (ii) each member in reference to the other members of the group, and (iii) the group by itself. Then we have proposed a method to automatically compute these attributes for processes where joint activity and interactions are carried out by means of semi-structured messages. The final set of attributes has been fixed through an extensive period of iterative design and experimentation. We do not make theoretical claims about this particular set; the worth of this proposal is its practical value, and it is therefore open to further refinement. The method uses the feature structures associated to the conversational structure of shared workspaces as
data; therefore the attributes considered can be easily redefined. Moreover, in our system these are specified in a declarative way when configuring the computer environment for a learning experience. The results about the processes can be used in many ways for pedagogical purposes. Promoting reflection is an interesting one: for instance, one could extend the experience with a peer reflection phase, making this information visible either within or among the groups of students performing similar tasks. Furthermore, it is quite useful input for teachers and designers, to evaluate whether the definition of, and the support provided for, the collaborative task has been adequate. Combining process-oriented evaluation with manual evaluation of the final product has proved for us a reasonable approach to dealing with assessment matters in our distance learning courses. A comprehensive theoretical perspective for analysing collaborative learning has been, and will remain, a goal for the research community, partly depending on the development of computable methods for analysing process-oriented data. Natural language processing would allow a categorization of contributions from a content analysis, but current NLP techniques require expensive resources and processing. Semi-structured interactions are a non-intrusive way to break this complexity, at least for a broad range of tasks. Our design approach allows extracting relevant information at different levels of abstraction; visualization and global behavior analysis tools are just two of them. A fine-grained study of collaborative interactions would require more extensive modeling, dealing not only with task and communication aspects but also with learners' beliefs. Nevertheless, shallow analyses as presented in this paper are needed and useful to tackle a large amount of information, in order to enhance computer-mediated support.

Acknowledgements

The work presented here has been partially funded by CICYT, the Spanish Research Agency, project TEL97-0328-C02-01. We would like to thank our UNED, UCM and UPM students for their participation in the experiences carried out while designing the system prototypes.

References

[1] Barros, B. (1998) Aprendizaje Colaborativo en Enseñanza a Distancia: Entorno genérico para configurar, realizar y analizar actividades en grupo. Technical Report, STEED Project.
[2] Bell, P. & Davies, E. A. (1996) "Designing an Activity in the Knowledge Integration Environment". 1996 Annual Meeting of the American Educational Research Association, New York. (http://www.kie.berkey.edu/KIE/info/publications/AERA96/KlE lnstruction.html)
[3] Bobrow, D. (1991) "Dimensions of Interaction: AAAI-90 Presidential Address", AI Magazine, 12(3), pp. 64-80.
[4] Collis, B. (1998) "Building evaluation of collaborative learning into a WWW-based course: Pedagogical and technical experiences", Invited paper for the Special Issue on Networked and Collaborative Learning of the Indian Journal of Open Learning, New Delhi, India. (http://utto237.edte.utwente.nl/isnVisml-97/papers/col-eval.doc)
[5] Crook, C. (1994) Computers and the Collaborative Experience of Learning, Routledge, International Library of Psychology.
[6] Dillenbourg, P. & Baker, M. (1996) Negotiation Spaces in Human-Computer Collaborative Learning. Proceedings of COOP'96. http://tecfa.unige.cn/tecfa/publicat/dil-papers/LastJuan fm.ps
[7] Dillenbourg, P., Baker, M., Blaye, A.
& O'Malley, C. (1996) "The evolution of research on collaborative learning", Learning in Humans and Machines, Spada & Reimann (editors). http://tecfa.unige.ch/tecfa/research/lhm/ESF-Chap5.ps
[8] Edelson, D.C., Pea, R.D. & Gomez, L.M. (1996) "The Collaboratory Notebook", Communications of the ACM 39(4), pp. 32-33.
[9] Henri, F. (1992) "Computer Conferencing and Content Analysis", Collaborative learning through computer conferencing: the Najaden papers, Kaye, A.R. (Editor), Springer-Verlag, pp. 117-136.
[10] Nardi, B.A. (1996) Context and Consciousness: Activity Theory and Human-Computer Interaction, MIT Press.
[11] Scardamalia, M. & Bereiter, C. (1994) "Computer Support for Knowledge-Building Communities", The Journal of the Learning Sciences, 3(3), pp. 265-283.
[12] Suthers, D. & Jones, D. (1997) "An architecture for Intelligent Collaborative Educational Systems", Proc. AI-ED'97, B. du Boulay and R. Mizoguchi (editors), pp. 55-62.
[13] Wan, D. & Johnson, P. (1994) "Experiences with CLARE: a computer-supported collaborative learning environment", Int. J. Human-Computer Studies, vol. 41, pp. 851-859.
Supporting Distance Learning from Case Studies

Marta C. Rosatelli 1,2 and John A. Self 1
1 Computer Based Learning Unit, University of Leeds, Leeds LS2 9JT, UK
2 Graduate Program in Production Engineering, Federal University of Santa Catarina, Florianopolis, SC, 88040-900, Brazil
{marta, jas}@cbl.leeds.ac.uk
Abstract: Distance Learning from Case Studies involves enabling collaboration between two or more learners at a distance on a case study activity. In this paper we present an empirical qualitative study that simulates a learning scenario where a pair of subjects at a distance are provided with a collaborative learning environment and required to collaborate in order to solve a case study. The results of the empirical qualitative study have implications that inform the design of a system to support group activity in Distance Learning from Case Studies. Keywords: distance learning, WWW, collaboration, learning from case studies.
1 Introduction

Learning from Case Studies is well established as an educational method [1]. It has been widely used for years in disciplines such as law, psychology, psychiatry, architecture, education, engineering, business, and management. The common characteristic among these disciplines is that they introduce the kinds of problem that no analytical technique or approach is suited to solve: open-ended problems, with no "correct" or clear-cut solution. Hence, the case method is used where the skills of solving complex and unstructured problems are required [2]. In addition, the method is considered a way to bridge the gap between theory and practice, creating opportunities for the learner to face the complexity of real problems and to deal with the day-to-day ambiguities of professional life [3]. The case method is potentially well suited to distance learning, provided that fundamental issues in the design of distance education [4] are addressed. Rich [5] used the World Wide Web as a distribution medium for the case study contents, and a mailing list and e-mail to carry out the case discussion. But the Web provides much more scope for support than this. As Oram [6] suggests, it is also essential to assist the different stages of the learning process. The case study activity might be supported over the Web with: (1) the use of text-based/graphical hypermedia resources to present a case study and to lead the learner through the system's use; (2) the use of tools to carry out discussions and group activities; (3) the combination of both off-line and on-line case study activities, as they usually take longer than other learning activities. The characteristics of case study activities have led to their relative neglect in AIED research. The projects which have sought to adapt this research for the Web (e.g., [7]) have typically focussed on problem-solving monitoring, rather than on case study activities.
The context for this research is a distance course on production engineering that includes disciplines containing open-ended problems about design, analysis, selection, planning, and/or business decision situations. Usually, such problems are derived from actual experience, reflecting the "real world" concerns of engineers and managers, and are used to train learners for professional practice. In this paper we present an empirical qualitative study that simulates a learning scenario where a pair of subjects at a distance are provided with a collaborative learning environment and required to collaborate in order to solve a case study. The results of the empirical qualitative study have implications that inform the design of a system to support group activity in Distance Learning from Case Studies. Section 2 presents an overview of the Learning from Case Studies method, focussing on its most important features and their significance for an intelligent system to support the case method. Section 3 describes the empirical qualitative study that was carried out. Section 4 discusses the experimental results and their implications. Section 5 presents the conclusions and recommendations for further research.

2 Learning from Case Studies

A case study can take diverse forms such as a story, an event or a text. It is an "instance of a larger theoretical class" and as such it represents certain features of that class. In addition to the narrative, cases are situated or contextualised in application, place, time, etc. Their specificity and localism mean that their use as teaching materials takes into account the situated nature of cognition in the learning process. Also, cases are suitable for presenting complex situations that demand cognitive flexibility to cope with them. As a result, the case method is often applied to learning in ill-structured domains [3]. The case study basically furnishes the raw material for the case discussion, which is a central issue in Learning from Case Studies. It is so important that the case method is often referred to as the process of teaching by holding discussions, as opposed to lectures or labs. The case discussion process is often described as fluid and collaborative. On the other hand, although it might seem at first to be freewheeling and unstructured, the discussion process has a kind of structure that usually emerges as it progresses [8]. Easton [2] provides a comprehensive framework for the case discussion, in which the case solution is developed step by step: the Seven Steps approach. The value of this approach for a computer-based system is that the case study solution is developed through a sequentially structured process, split into parts that have a manageable grain size of information. The outcome of each step may be represented by the system so that it can interact with the learners, providing support and feedback during the case solution process. Each step has its own goal and suggests a range of activities to be carried out by the learners in order to achieve that goal. Table 1 presents an overview of the step sequence, stating each step's goal and exemplifying the kind of activities associated with it. The case discussion is intrinsically related to the instructor's role in the case method.
The leadership of the case discussion process is a critical responsibility of the instructor who, rather than having a substantive knowledge of the field or case problem, must lead the process by which individuals and the group explore the complexity of a case. The effective case instructor maximises the opportunities for learning by asking the appropriate questions during the discussion. The skills and techniques of a discussion leader are subtle but can be observed, abstracted, and taught. The attempt to identify question-and-response patterns typically leads to about eight basic types of questions (e.g., questions of fact, of interpretation, hypothetical questions, etc.) and dozens of subsets available for the instructor's use [1].
Table 1. Adapted from the Seven Steps Approach [2]

The Seven Steps                               Activities
Step 1. Understanding the situation           Relate, summarise
Step 2. Diagnosing problem areas              List problems
Step 3. Generating alternative solutions      List solutions
Step 4. Predicting outcomes                   List outcomes
Step 5. Evaluating alternatives               List pros and cons
Step 6. Rounding out the analysis             Detail, choose
Step 7. Communicating the results             Present a solution to the case
3 The Empirical Study The application of the Learning from Case Studies method to a computer-based learning environment according to the Seven Steps approach takes into account how the case solution process occurs in the traditional classroom. The main issues considered are the guidance provided by the case instructor, the case discussion based on the confrontation of individual ideas, and the need to reach an agreement about the solution. 3.1 Purpose The main objective of the empirical qualitative study was to observe how the process of discussion between distant learners collaborating on the solution of a case study occurs. Although the discussion is a central issue in Learning from Case Studies in the traditional classroom, little or no information is available concerning it in the application of the case method to distance learning, using networked computers and a collaborative learning environment. Therefore, the study aimed to further investigate the issues that arise from the case discussion in this kind of medium and environment. The experimental observations and results are then used to derive implications that inform the design of a system to support group activity in distance learning. 3.2 Framework/Model The collaborative learning environment for the experiment was based on NCSA Habanero v2.0 Beta 2, which is a collaborative framework written in Java. It includes a Client, a Server and a set of applications. Among the available set of applications in the current version, the learning environment made use of (1) Savina, a collaborative Web browser; (2) Chat, a text based chat environment with logging capability; and (3) mpEDIT, a collaborative text editor (See Fig. 1). The collaborative aspect of the Web browser means that if one of the peers accesses a particular Web page, the other participants in the same session would view the same page on their screens. Concerning the text editor, the participants can edit a text together, typing one at a time in the same text area. The use of these three particular tools was determined by the requirements imposed by the use of the method/approach in the traditional classroom. The collaborative Web browser was a default assumption, as the Web is the medium used to deliver the teaching and learning materials. The Web pages' design was based on the Seven Steps approach. The first set of pages presented, besides the case study text, a brief explanation about the case method emphasising the importance of both the case discussion and the reaching of an agreement. It also included instructions to the learners on how to proceed to solve the case according to the Seven Steps approach in this environment (i.e., which tool to use, how, and when). The second set referred to the Seven Steps. Each page included a description, the goal, the question, and the demanded activity concerning each step.
Figure 1. Habanero Collaborative Environment screen showing on the upper left the framework main window, on the upper right the browser window, on the bottom left the text editor window, and on the bottom right the chat window.
The hyperlinks were designed to be followed sequentially, according to the nature of the approach. In each step the learners were supposed to answer the posed question individually and then, based on their individual answers, to have an on-line discussion in order to reach an agreement on a joint (group) answer. Proceeding through the sequence of steps led the learners to develop a solution to the case study. The questions associated with each of the Seven Steps were chosen from a set of examples modelled on approaches used by case study instructors in the traditional classroom. Those questions are classified according to the instructor's purpose in categories such as discussion starters, challenging questions, analytical and evaluative questions, summary questions, etc. [9]. In order to choose the appropriate questions for each step, the categories were matched to each step's goal and requested activity. The application of the case method to distance learning implies collaboration [10]. Hence, this issue turned out to be a main concern. The collaborative text editor was used to present and record both the individual and the agreed joint answers. The text-based chat environment was where the discussion to reach an agreement, about each step answer and the case solution, took place. The starting point for the discussion was the differences and/or similarities between the individual answers, visualised with the collaborative text editor. The case study used in the experiment is entitled "A Dilemma Case on 'Animal Rights'" [11]. In short, the case asks what a Professor in charge of a General Biology course should do when confronted with a student who is philosophically opposed to dissection labs. The case does not require any specialised knowledge in order to be solved, and the controversial nature of the subject is believed to motivate and encourage the discussion. The assumption behind the choice of this case was that the subjects would not be attending a course or learning a particular domain, which is the standard situation in the application of the case method.
3.3 Procedure

The subjects were 5 pairs of postgraduate students, who volunteered to take part in the experiment. Although they were from different backgrounds, they were familiar with the applications used in the experiment: browser, text editor, and chat. On the other hand, they were used neither to the collaborative aspect of the browser and text editor nor to the case study method. After being randomly paired, each pair of subjects received a brief explanation about the experiment. The explanation emphasised collaborative work on the solution of the case study, as well as the reaching of an agreement at every intermediate decision point (i.e., the answer to each step question) and in the final case solution. The subjects were also given a 15-minute explanation of the Habanero Collaborative Framework. Two connected computers placed side by side were used in this explanation to demonstrate how the applications operated under collaboration, and how to use them with the case method and the Seven Steps approach. Next, the subjects were located in separate rooms, using network-connected computers (PCs under Windows NT) to carry out the experiment. Occasionally during the experiment some assistance concerning the operation of the applications was provided.
4 Results Analysis and Recommendations

The results analysis presented below is derived from (1) the observations made by the experimenter; (2) the review of the log files containing the dialogue contributions; and (3) the review of the edited texts concerning the individual and joint answers to each step question. Moreover, in order to follow each subject pair's reasoning and the solution path taken, the case solution was represented in detail in a tree data structure [12]. The experiment showed that the application of the Learning from Case Studies method to distance education is feasible. We believe this is due to: (1) the Web pages' design, which was based on the Seven Steps approach, stating each step's goal and posing a question; and (2) the collaborative learning environment, which provided appropriate tools for the different kinds of communication demanded by the method/approach. The experiment generated the kinds of dialogue that were expected concerning the case discussion, therefore validating the previous ideas about the development of the discussion process and the case solution. But it also showed that there are issues that should be addressed in order to support group activity in Distance Learning from Case Studies. Below, we outline in our recommendations an agent-based [13] approach to tackle these issues. It includes agents that can be classified as interface, reactive, and/or hybrid agents [14].

4.1 Solution Development

Although the subjects seemed to have a clear understanding of their task, the observations showed that they did not fully realise that the case solution is built up step by step, each step adding to what was done in the previous one. If the steps are correctly followed, there will be a list of alternative solutions, an outcome prediction for each alternative solution, and a list of pros and cons for each outcome. This can become quite overwhelming and confusing, especially when it gets to the point of choosing a solution. A common behaviour was to enumerate the components of a step answer, which worked well for a single step. But when moving to the next step, the subjects often missed the interconnections between the answer components, thereby losing track of their solution paths. This can be illustrated by the non-consideration, in the subsequent step, of a certain topic raised in one step,
demonstrating the abandonment of a possible solution path formerly envisioned¹. For instance, from Step 3 (listing of alternative solutions) to Step 4 (listing of predicted outcomes) they might miss one of the foreseen alternative solutions when listing the possible outcomes. Consequently, one possible case solution is not examined. Conversely, the opposite can happen: the inclusion of topics that were not anticipated in an earlier step.

Recommendations. The system should provide the learners with a way to map or visualise the building up of their solution. The development of the case solution according to the Seven Steps is well represented as a tree [15]. The input is the outcome of each step; that is, the solution tree is generated from the topics raised by the learners in each step answer. The root of the solution tree corresponds to the question posed by the case. The levels of the tree represent each of the Seven Steps. Each node in a given level refers to a component of the answer to that step's question. At a given step, the system adds one level to the tree, generating it from the expansion of the nodes in the previous level. Thus, the tree corresponds to the representation of the case solution developed so far. A graphical representation of this tree should be presented to the learners in order to point out any incoherence concerning the steps' expected outcomes.

¹ If the case solution is represented as a tree [12], pruning is a valid technique. What is reported here is not pruning, but failing to consider a branch of the tree through negligence, without purpose or further analysis.
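As a concrete illustration of this recommendation, the following minimal sketch shows one way such a solution tree could be maintained and checked; the class names, the simplified step numbering, and the coherence check are our own illustrative assumptions, not part of the authors' system.

```python
# Illustrative sketch (not the authors' implementation): a solution tree
# whose levels correspond to the Seven Steps. Each node is one component
# of a step answer; a node left unexpanded at the next step signals a
# possibly abandoned solution path.

class SolutionNode:
    def __init__(self, text, step):
        self.text = text        # answer component raised by the learners
        self.step = step        # which step produced this component
        self.children = []      # expansions added at the following step

    def add_child(self, text):
        child = SolutionNode(text, self.step + 1)
        self.children.append(child)
        return child

def unexpanded_paths(root, current_step):
    """Return nodes from earlier steps that were never expanded,
    i.e. candidate abandoned solution paths to flag to the learners."""
    flagged = []
    def visit(node):
        if node.step < current_step and not node.children:
            flagged.append(node)
        for child in node.children:
            visit(child)
    visit(root)
    return flagged

# Example (steps collapsed for brevity): an alternative solution listed at
# one step but never expanded with a predicted outcome at the next step.
root = SolutionNode("Case question", step=0)
alt_a = root.add_child("Alternative A")
alt_b = root.add_child("Alternative B")
alt_a.add_child("Predicted outcome of A")
print([n.text for n in unexpanded_paths(root, current_step=2)])
# -> ['Alternative B']
```

In line with the footnote above, flagged nodes would simply be highlighted in the graphical representation rather than pruned, leaving the decision to the learners.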
4.2 Timing

The subject pairs showed a tendency to spend more time on the earlier steps and to go through the last ones more quickly, contrary to what was expected, as the last steps demand more complex activities. This can be due to various factors. The learners might (1) get overloaded with the level of detail demanded by the Seven Steps approach; (2) have a (normal) decrease in their productivity, so that they carry out the earlier steps' activities better; (3) have just a rough idea of how the solution process will develop over the Seven Steps, concentrating too much effort in the first steps; and/or (4) take more time in the first steps to get familiar with the situation presented by the case. In any circumstance, the result is that when they realise the solution development is taking a long time, they speed up and the quality of their solutions decreases.

Recommendations. The Seven Steps approach allows step-by-step time control of on-line collaborative work. The system should intervene if the time spent by the learners discussing and answering a certain step question exceeds the time limits. This time limit is a function of (1) an estimated time to answer each step question, and (2) an estimated time to solve the specific case study. In order to provide this kind of support, the system should perform the timing and warn the learners whenever they come close to the time limits.

4.3 Participation

In a significant number of situations, the dialogues and edited texts revealed a higher degree of participation concentrated in one of the peers, across the different phases of the case solution. The partner offered only a small number of contributions and an almost invariable set of acquiescences with the peer's ideas.

Recommendations. The system does not aim to assess the learners' contributions individually. However, it should encourage participation. If one of the learners remains silent or has a non-significant degree of participation, not contributing to the solution (e.g., a small number of utterances mainly restricted to "I agree", "yes"), the system should be able to identify this behaviour and provide an additional stimulus to improve it. This means encouraging that specific learner's participation by prompting an invitation to participate.
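To illustrate how the timing and participation recommendations could be operationalised together, here is a rough sketch; the thresholds, the warning ratio, and the list of acquiescence tokens are invented for the example and would need calibration against the estimates for each step and case study.

```python
# Illustrative sketch (our assumptions, not the authors' system): monitor
# step timing and per-learner participation during the case discussion.
import time

ACQUIESCENCE = {"i agree", "yes", "ok", "fine"}   # hypothetical token list

class StepMonitor:
    def __init__(self, step_time_limit, warn_ratio=0.8, min_substantive=3):
        self.step_time_limit = step_time_limit    # estimated seconds per step
        self.warn_ratio = warn_ratio
        self.min_substantive = min_substantive
        self.start = time.time()
        self.substantive = {}                      # learner -> utterance count

    def record_utterance(self, learner, text):
        # Count only contributions that go beyond mere acquiescence.
        if text.strip().lower() not in ACQUIESCENCE:
            self.substantive[learner] = self.substantive.get(learner, 0) + 1

    def timing_warning(self):
        elapsed = time.time() - self.start
        return elapsed > self.warn_ratio * self.step_time_limit

    def passive_learners(self, learners):
        return [l for l in learners
                if self.substantive.get(l, 0) < self.min_substantive]

monitor = StepMonitor(step_time_limit=900)        # e.g. 15 minutes per step
monitor.record_utterance("ana", "The professor could offer an alternative lab")
monitor.record_utterance("bea", "yes")
if monitor.timing_warning():
    print("Warning: approaching the time limit for this step")
for learner in monitor.passive_learners(["ana", "bea"]):
    print(f"Prompting {learner} to contribute")
```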
4.4 Case-Specific Utterance Patterns

The dialogues presented some relevant utterance patterns, either discussion-specific or case-specific. The former referred to the process of negotiation to reach an agreement. The latter were related to the step questions and to what was included in the previous answers, i.e., related to the domain. Among these, we may focus on the case-specific utterance patterns that characterise a misunderstanding about the case solution, denoting difficulties in answering a particular step question.

Recommendations. The case-specific utterance patterns may represent points at which the system should provide support concerning the case study solution process, intervening in order to draw the learners' attention to a potential misunderstanding. To accomplish this function, the expert knowledge does not need to be fully represented in the system. A case study may be represented in a script, a "standardised general episode" [16]. Such a knowledge structure abstracts the sequence of events present in the case and can be used to support learner/system interactions [17]. Similarly, the system should use the script-based representation to initiate interventions, generated on the basis of an identified misunderstanding about the case study or the solution process.

4.5 Tools Co-ordination

Each tool used in the experiment, namely the browser, text editor, and chat, serves a different purpose at different points during the case solution. Although the actions initiated by one of the subjects in a particular tool window were visible to his/her distant peer, the pairs did not seem to be co-ordinated in switching between tools. Each peer has his or her own pace to read, type, and reason.

Recommendations. The lack of co-ordination pointed out above suggests that the capability of visualising the actions performed by the peer in a particular window is not enough to keep the group together during the case solution process. As a result, the system should be able to identify this lack of co-ordination and notify the learners about a peer's actions, as sketched below. Every event detected on the screen of one of the learners (e.g., a particular window is activated) should be checked against his/her peers' screens (e.g., whether the peers have this same window active). If not, the learner should be notified that his/her peers are working on something else.
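A minimal sketch of this notification mechanism follows; the event model and function names are our own assumptions about how such a check could be wired into the collaborative framework's tools.

```python
# Illustrative sketch (our assumptions): check window-activation events
# against the peers' active windows and notify learners who drift apart.

active_window = {}   # learner -> currently active tool window

def on_window_activated(learner, window, peers, notify):
    """Record the event and warn the learner if peers are elsewhere."""
    active_window[learner] = window
    elsewhere = [p for p in peers
                 if active_window.get(p) not in (None, window)]
    if elsewhere:
        notify(learner, f"Your partner is working in the "
                        f"{active_window[elsewhere[0]]} window")

def notify(learner, message):
    print(f"[to {learner}] {message}")

on_window_activated("ana", "chat", peers=["bea"], notify=notify)
on_window_activated("bea", "editor", peers=["ana"], notify=notify)
# -> [to bea] Your partner is working in the chat window
```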
5 Conclusions and Future Research

The results of the empirical qualitative study indicate that Distance Learning from Case Studies can be effective to the extent that the relevant issues in the application of the method in the traditional classroom are taken into account and properly implemented in a computational system. That is, the pedagogy of the case study activity needs to be translated into a different medium. In such a system, the case method's functioning principles are the premises for the design and should be accomplished on two levels: by providing the appropriate tools to make the case method operational in a networked environment, and by adapting to this learning scenario the kind of support provided by the case instructor. As in the traditional classroom, guiding, promoting, and monitoring the case discussion is essential. The Seven Steps guide the case solution process. The step questions are used to promote the discussion, although this can be further improved according to the recommendations above. Concerning monitoring, the challenge is to use agent-based techniques within the customised tools of the collaborative framework to meet the recommendations derived from the experiment.
In summary, supporting Distance Learning from Case Studies concerns mainly the discussion management. The Seven Steps approach structures the group activity, providing the system with a manageable grain size of information. The system is then able to handle a group of distant learners, taking account of the learning processes going on in each individual learner.
Acknowledgements We thank the experiment volunteers. The first author is a scholar of CNPq - Brazil.
References

[1] C. R. Christensen and A. J. Hansen, Teaching with Cases at the Harvard Business School. In: C. R. Christensen with A. J. Hansen (eds.), Teaching and the Case Method: Text, Cases, and Readings. Harvard Business School, Boston, MA, 1987, pp. 16-49.
[2] G. Easton, Learning from Case Studies. Prentice Hall, London, 1982.
[3] L. S. Shulman, Toward a Pedagogy of Cases. In: J. H. Shulman (ed.), Case Methods in Teacher Education. Teachers College Press, Teachers College, Columbia University, New York, 1992, pp. 1-30.
[4] G. Burt, Face to Face with Distance Education. Open and Distance Education Statistics, Milton Keynes, UK, 1997.
[5] M. Rich, Supporting a Case Study Exercise on the World Wide Web. In: D. Jonassen and G. McCalla (eds.), Proceedings of the International Conference on Computers in Education. AACE, Charlottesville, VA, 1995, pp. 222-228.
[6] I. Oram, Computer Support of Learning from Cases in Management Education, Innovations in Education and Training International 33 (1) (1996) 70-73.
[7] P. Brusilovsky et al., ELM-ART: An Intelligent Tutoring System on World Wide Web. In: C. Frasson, G. Gauthier and A. Lesgold (eds.), Intelligent Tutoring Systems, Lecture Notes in Computer Science Volume 1086. Springer-Verlag, Berlin, 1996, pp. 261-269.
[8] A. J. Hansen, Suggestions for Seminar Participants. In: C. R. Christensen and A. J. Hansen (eds.), Teaching and the Case Method: Text, Cases, and Readings. Harvard Business School, Boston, MA, 1987, pp. 54-59.
[9] C. Meyers and T. B. Jones, Promoting Active Learning: Strategies for the College Classroom. Jossey-Bass Publishers, San Francisco, CA, 1993.
[10] J. Roschelle and S. Teasley, The Construction of Shared Knowledge in Collaborative Problem Solving. In: C. O'Malley (ed.), Computer Supported Collaborative Learning. Springer-Verlag, Heidelberg, 1995, pp. 69-100.
[11] C. F. Herreid, A Dilemma Case on "Animal Rights", Journal of College Science Teaching 25 (1996) 413-418.
[12] M. C. Rosatelli and J. A. Self, An Empirical Qualitative Study on Collaborating at a Distance to Solve a Case Study. Technical Report 98/27, Computer Based Learning Unit, University of Leeds, UK, 1998.
[13] S. Franklin and A. Graesser, Is it an Agent or Just a Program? A Taxonomy for Autonomous Agents. In: J. Mueller, M. J. Wooldridge and N. R. Jennings (eds.), Intelligent Agents III, Lecture Notes in Artificial Intelligence Volume 1193. Springer-Verlag, Berlin, 1997, pp. 21-35.
[14] H. S. Nwana, Software Agents: An Overview, The Knowledge Engineering Review 11 (3) (1996) 205-244.
[15] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs, NJ, 1995.
[16] R. C. Schank and R. P. Abelson, Scripts, Plans, Goals and Understanding. Lawrence Erlbaum, Hillsdale, NJ, 1977.
[17] A. P. Parkes and J. A. Self, Towards "Interactive Video": A Video-Based Intelligent Tutoring Environment. In: C. Frasson and G. Gauthier (eds.), Intelligent Tutoring Systems: At the Crossroads of Artificial Intelligence and Education. Ablex Publishing Corporation, Norwood, NJ, 1990, pp. 56-82.
Assistance and Visualization of Discussion for Group Learning

Manabu Nakamura, Kazuhiro Hanamoto, and Setsuko Otsuki
Hiroshima City University, Hiroshima, 731-3194 Japan

Abstract. This paper proposes a multi-agent environment which assists students in discussing a topic in a logically distinct field such as physics or mathematics. A student writes his/her opinion on a private window which no other student can read. The opinion can be transferred, if the student wishes, to a common window which all participants can read. A student's opinion is displayed on the windows as a node of a graph; an arc of the graph is an arrow expressing a relation between two opinions. An agent is assigned to each student at a client site and plays the role of the intelligent tutor of an ITS. The ITS agent generates a question on the common window if the pertinent student has lost the way because of missing or incorrect knowledge, or if the student has written nothing on the window for more than a predefined interval. If several ITS agents put questions simultaneously, they select the best one by negotiation from the instructional viewpoint. Another agent, a moderator agent, is introduced at the server site. The moderator agent generates its opinion on the common window only if the discussion comes to a deadlock because of missing and/or erroneous knowledge of the group.
1. Introduction

In the field of cognitive science, it has been pointed out that group learning has a better influence on students than individual learning [1], because group learning is superior to individual learning on the following points:
• Because a group as a whole has richer knowledge than each student in the group, the students may meet with various opinions and examine them.
• Because the students share roles among themselves, their learning loads are decreased compared to those of individual learning.
• They are motivated by the presence of other students.
• Explaining their own opinions to other students increases their understanding.
Focusing on these strong points, various group learning environments have been proposed [2, 3, 4, 5, 6, 7]. However, even if students are provided with environments to learn cooperatively, these strong points do not necessarily always work well. In order to keep them effective, we introduce Hypothesis-Experiment-Instruction [1, 8] into our environment. We limit our investigations to learning domains in which this cycle can be used; we describe it more precisely in section 2. This paper proposes a multi-agent environment which assists students in discussing a topic in a logically distinct field such as physics or mathematics. The special features of the environment are as follows. (1) All students join the discussion via individual client computers. Each student writes his/her opinion on a private window which no other student can read. The opinion can be transferred, if the student wishes, to a common window which all participants can read.
(2) A student's opinion is displayed on the windows as a node of a graph with attributes such as proposition, agreement, etc. An arc of the graph is an arrow expressing a relation between the two opinions at its ends, e.g. causality, discrepancy, instantiation, generalization, question, refutation, and so on. (3) An agent is assigned to each student at a client site and plays the role of the intelligent tutor of an ITS, i.e. the ITS agent constructs a student model by using domain knowledge and teaching knowledge. (4) The ITS agent generates a question on the common window if the pertinent student has lost the way because of missing or incorrect knowledge, or if the student has written nothing on the window for more than a predefined interval [9]. If an ITS agent creates several questions, it selects the best one from the tutoring viewpoint. If several ITS agents put questions simultaneously, they select the best one by negotiation from the instructional viewpoint. Thus the ITS agents have two functions for assisting the pertinent students: one is to correct misunderstandings, and the other is to activate the discussion by inviting inactive students into it. (5) Another agent, a moderator agent, is introduced at the server site. The moderator agent constructs a "group model" composed of the following three constituents. The first is the logical sum of all correct knowledge in all student models; the sum expresses the scope of topics which can be correctly inferred by the students themselves. The second is the logical product of all erroneous knowledge in all student models; the product expresses the scope of topics which all students infer incorrectly. The last is the logical difference of the sum from the domain knowledge, which expresses the missing knowledge of the group. (6) The moderator agent generates its opinion on the common window only if the discussion comes to a deadlock because of the missing knowledge and/or the erroneous knowledge.

Figure 1: Outline of Group Learning Environment

Figure 1 shows the outline of the environment. The learning phases, the details of the private windows and the common window, and the domain knowledge and the models are described in sections 2, 3, and 4, respectively. In section 5, a method of assistance for group learning is proposed. Finally, discussion and conclusions are given in section 6.

2. Learning Phases

According to Hypothesis-Experiment-Instruction, the learning process is divided into three phases. Students in a group are expected to achieve the following goals in order to make out the learning domain: (1) Hypothesis Generation Phase: Each student generates his/her hypothesis about a given problem by selecting one out of several choices. He/she is expected to make his/her opinion about the given problem clear in order to explain it to the other students in the next phase. (2) Hypothesis Reasoning Phase: The students discuss the given problem together. They are
expected to make their opinions clear, and to make the opposed points between their different opinions clear, by discussing the given problem. A student who does not have a clear opinion is expected to examine the other students' opinions. (3) Hypothesis Verification Phase: The students carry out experiments to confirm the result. They are expected to make out the learning domain by learning the result and ruminating over their inference processes and the contents of the discussion.
3. Structure Visualization of Students' Inference and Discussion

By visualizing the structures of the students' inference and of the discussion in a suitable representation, group learning environments can assist the students in achieving the goals of the learning phases. Thus the environment provides them with the following two tools. One is the private window, a tool for each student to make his/her opinion clear. He/she can express an opinion by nodes and arrows: a node represents a phenomenon and an arrow represents a relation between two phenomena. It visualizes the structure of his/her inference so that he/she can make his/her opinion clear. The following seven relations between phenomena are provided: (1) cause → result: an arrow is drawn from a cause node to a result node when he/she states a cause or a result. (2) cause (question) → result: an arrow is drawn from a question node to a result node when he/she asks the other students for the cause from which the result originates. (3) cause → result (question): an arrow is drawn from a cause node to a question node when he/she asks the other students for the result the cause produces. (4) cause — question → result: a question node is attached to an arrow from a cause node to a result node when he/she asks the other students why the result originates from the cause. (5) negation: a negation node is attached to a node or an arrow that he/she thinks wrong. (6) opposition: a bidirectional red arrow is drawn between opposed nodes or arrows. (7) no-relation: an isolated node that has no relation to any other node. He/she can set and modify a relation between two phenomena by selecting one of these relations. Furthermore, the private window provides the following functions to make the structure of his/her inference easy to understand: (1) When he/she sets relations from one node to opposed nodes using "cause → result," the area that includes these nodes is highlighted in order to caution him/her about the contradiction between the relations. (2) Using a menu of common terminology, he/she writes words in a node. The use of common terminology lightens the cognitive load originating from synonyms. (3) He/she can use a keyboard for supplementary information. The other tool is the common window, a common tool for all students to discuss the given problem. An opinion on a private window can be transferred to it. It visualizes the structure of the discussion to make the opposed points between different opinions clear and to make each opinion clearer. Figure 2 shows an example of the structure visualization of a discussion. The following functions are characteristic of the common window: (1) A student can put a question by designating one of the students or ITS agents from a list of participants' names. (2) An added new node or arrow is highlighted for a predefined interval in order to make the modification easy to recognize. A deleted old node or arrow is shown in gray for a predefined interval, too. (3) When a student repeats an opinion represented by an existing node or arrow, a new node or arrow is not added; instead the old one is highlighted for a predefined interval.
Figure 2: Example of Structure Visualization of Discussion, showing nodes such as "cause 1 of result 1 (result 3)", "cause 2 of result 1", and "cause of result 3", connected by cause → result arrows, together with a node giving the reason for the negation of cause 2 of result 1.
(4) When a student agrees with an opinion represented by an existing node or arrow, a new node or arrow is not added; instead the old one is highlighted for a predefined interval. Furthermore, a node agreed to by several students is shown as a pile of nodes, and an arrow agreed to by several students is shown as a thick arrow. The thickness of the pile or the arrow is proportional to the number of students who agree with it. (5) The whole inference process of any student can be confirmed by highlighting all the nodes and arrows supported by that student. Using this function, the students can grasp another student's opinion at a glance. (6) If necessary, they can view detailed information (e.g. the names of the students who agreed to a designated node or arrow, the deleted nodes or arrows, the chronological order of the designated node or arrow, etc.) in another window. (7) The contents of the discussion can be replayed in chronological order. Using this function, students can ruminate over the contents of the discussion in chronological order.

4. Domain Knowledge and Models

The domain knowledge is used when the ITS agents recognize what the pertinent students write on their private windows and when they and the moderator agent assist the students. Figure 3 is an example of a given problem. Figure 4 shows a part of the causality chains in this problem, visualized by nodes and arrows. The symbols '↑', '→', '↓', and '∝' in the figure mean 'increase,' 'be unchanged,' 'decrease,' and 'be proportional to,' respectively.

Figure 3: Example of Given Problem. Q: When a person standing on a weighing machine starts to sit down, how does the measurement of the weighing machine change? The measurement (1) becomes bigger. (2) becomes smaller. (3) is unchanged.

Knowledge concerning the time sequence is represented by discrete time intervals. The domain knowledge is written in Prolog. Thus the ITS agent matches what the pertinent student writes on his/her private window against the domain knowledge and constructs his/her student model. The moderator agent gathers all student models and constructs the group model as described in item (5) of section 1. The group model is updated whenever a student model is updated.
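Since the group model is essentially set algebra over the student models, it can be sketched compactly; the use of Python sets (where the authors use Prolog) and the sample fact identifiers are our own assumptions.

```python
# Illustrative sketch (our assumptions; the original is written in Prolog):
# the group model as set algebra over per-student correct/erroneous facts.

domain_knowledge = {"f1", "f2", "f3", "f4"}          # hypothetical fact ids

student_models = {
    "s1": {"correct": {"f1", "f2"}, "erroneous": {"f4"}},
    "s2": {"correct": {"f2"},       "erroneous": {"f3", "f4"}},
}

def group_model(models, domain):
    # Logical sum of all correct knowledge: inferable by the group itself.
    correct_sum = set().union(*(m["correct"] for m in models.values()))
    # Logical product of all erroneous knowledge: shared by every student.
    shared_errors = set.intersection(*(m["erroneous"] for m in models.values()))
    # Logical difference of the sum from the domain: missing knowledge.
    missing = domain - correct_sum
    return {
        "inferable_by_group": correct_sum,
        "shared_errors": shared_errors,
        "missing_knowledge": missing,
    }

print(group_model(student_models, domain_knowledge))
# -> inferable_by_group {'f1','f2'}, shared_errors {'f4'},
#    missing_knowledge {'f3','f4'} (set printing order may vary)
```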
Figure 4: Domain Knowledge (part of the causality chains for the example problem).
5. Assistance for Group Learning

Although group learning environments provide students with some tools for communication, the strong points of group learning do not necessarily always work well. The discussion may not go well, for example, if no student has a clear opinion, if all students have the same opinion, or if no student speaks about his/her opinion. The ITS agents and the moderator agent are expected to assist the students autonomously in these cases. They participate in the discussion in the same position as the students and speak about the given problem by referring to the domain knowledge and the models. However, a case may occur in which several students need to be assisted at the same time. If each ITS agent speaks freely, the discussion may become confused; it is desirable that the ITS agents assist the students cooperatively. Thus the ITS agents negotiate among themselves in order to resolve such situations and to select the ITS agents in charge of the assistance from the instructional viewpoint. The function of assistance for group learning consists of the following four processes: (1) An ITS agent forms opinions concerning the differences between the common window and the pertinent student's private window; these opinions are candidates for assistance. (2) It evaluates the candidates and proposes the most suitable one from the tutoring viewpoint. (3) It negotiates with the other ITS agents to select one out of all the proposed candidates from the instructional viewpoint. (4) The ITS agents in charge of the assistance speak about the selected candidate through the common window. The following subsections describe these four processes in detail, together with the assistance by the moderator agent.

5.1 Candidates for Assistance

An ITS agent speaks in the same position as the pertinent student, using the pertinent student model, in order to assist the students. In other words, if it puts a question to a student about a phenomenon that he/she has not spoken about but knows, it can prompt him/her to join the discussion. If it puts a question about a phenomenon that a student has not asked about but has doubts about, he/she can learn it from another student or another agent. If it speaks about a phenomenon that the pertinent student has not spoken about but knows, the other students can come to know it. When the pertinent student has not spoken for a predefined interval, the ITS agent creates the following candidates concerning a statement spoken by a student or another ITS agent, by referring to the pertinent student model and the domain knowledge: (1) If the cause in the causality has not been spoken, the ITS agent with the same opinion as the current topic creates a request to explain the following candidate:
• an opinion that shows the cause in the causality
while the ITS agent with the opinion opposed to the current topic creates the following candidates:
• a question that asks for the cause in the causality
• an opinion that points out the opposed points in the causality
(2) If the result in the causality has not been spoken, the ITS agent with the same opinion as the current topic creates a request to explain the following candidate:
• an opinion that shows the result in the causality
while the ITS agent with the opinion opposed to the current topic creates the following candidates:
• a question that asks for the result in the causality
• an opinion that points out the opposed points in the causality
(3) If the cause and the result in the causality have been spoken, the ITS agent with the same opinion as the current topic creates a request to explain the following candidate:
• an additional opinion, if only a part of the causality has been spoken
while the ITS agent with the opinion opposed to the current topic creates the following candidates:
• a question that asks about the reasoning process from the cause to the result
• a question that asks for the cause producing the causality
• a question that asks for the result originating from the causality
• an opinion that points out the opposed points in the causality
(4) If a student or an ITS agent has asked the others a question, an ITS agent creates a request to explain the following candidate:
• an opinion that shows a cause or a result in the causality according to the question
Regardless of whether the ITS agents' opinions are correct or not, they can prompt the students to make the opposed points and their own opinions clear.

5.2 Evaluation

The ITS agent evaluates the candidates in the following way in order to propose the best one from the tutoring viewpoint: (1) The longer its student has not spoken, the higher it evaluates the candidate, in order to prompt him/her to join the discussion. (2) It evaluates newer candidates highly, in order to give priority to continuity between the topics of the discussion.
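A minimal sketch of this evaluation follows; the linear weighting of silence time against candidate recency is our own guess at one plausible scoring function, not the authors' formula.

```python
# Illustrative sketch (our assumptions): score assistance candidates by how
# long the pertinent student has been silent and how recent the candidate is.

def evaluate(candidate, silence_seconds, newest_turn,
             w_silence=1.0, w_recency=5.0):
    """Higher score = more suitable from the tutoring viewpoint."""
    recency_gap = newest_turn - candidate["turn"]   # 0 for the newest topic
    return w_silence * silence_seconds - w_recency * recency_gap

candidates = [
    {"kind": "ask_cause",  "turn": 7},   # about the current topic
    {"kind": "ask_result", "turn": 3},   # about an older topic
]
best = max(candidates,
           key=lambda c: evaluate(c, silence_seconds=120, newest_turn=7))
print(best["kind"])   # -> ask_cause
```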
5.3 Negotiation

If several ITS agents offer their candidates simultaneously, all the ITS agents negotiate among themselves in the following way in order to select the most suitable one out of all the candidates: (1) Each ITS agent offers the best-evaluated of its own candidates. (2) The agents compare the offered candidates by the sum of their evaluations, in the following way, and select the best-evaluated one out of all the candidates:
• If an ITS agent has the same candidate as the offered one, the ITS agent agrees with it by telling its evaluation to the offerer.
• If an ITS agent has a candidate that answers the offered question, the ITS agent agrees with it by telling its evaluation to the offerer.
• If an ITS agent has a questioning candidate about the offered one, the ITS agent agrees with it by telling its evaluation to the offerer.
• Even if an ITS agent does not have a questioning candidate about the offered one, if the pertinent student does not know the offered one, the ITS agent agrees with it by telling its evaluation to the offerer, because it is meaningful for the student to come to know the phenomena concerning the offered candidate.
(3) In general, there exist several ITS agents that agree with a question about the selected candidate; we call these ITS agents the "questioner set". On the other hand, there exist several students whose ITS agents agree with speaking about the selected candidate; we call these students the "explainer set".

5.4 Assistance

The assistance by the ITS agents is limited to putting questions only. The ITS agents in charge of the assistance are selected from the ITS agents that agree with the selected candidate in the following ways, and they speak about the selected candidate through the common window:
• One of the questioner set puts a question to the student in the explainer set who has not spoken for the longest interval. This assistance prompts that student to join the discussion.
• When the questioner set is empty, the moderator agent puts a question to the student in the explainer set who has not spoken for the longest interval. This assistance prompts the students to share their opinions.
• When the explainer set is empty, one of the questioner set puts a question about the selected candidate. If there is no student to answer, the moderator agent replies. This assistance prompts the students to acquire new knowledge.
• One more role is required to fulfill the objectives of the group discussion. In order to assist the students in the case that no student has a required part of the knowledge to infer a correct answer, the moderator agent supplements this deficient role in the discussion by using the missing knowledge in the group model.
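The selection rules of sections 5.3 and 5.4 can be read as a short dispatch procedure; the sketch below is our own summary of them, with invented data structures.

```python
# Illustrative sketch (our reading of sections 5.3-5.4, not the authors'
# code): choose who puts the question about the selected candidate.

def select_assistance(questioner_set, explainer_set, silence):
    """questioner_set/explainer_set: lists of agent/student ids;
    silence: student id -> seconds since the student last spoke."""
    def most_silent(students):
        return max(students, key=lambda s: silence.get(s, 0))
    if questioner_set and explainer_set:
        return (questioner_set[0], most_silent(explainer_set))  # agent asks student
    if explainer_set:                      # questioner set empty: moderator asks
        return ("moderator", most_silent(explainer_set))
    if questioner_set:                     # explainer set empty: open question;
        return (questioner_set[0], None)   # moderator replies if nobody answers
    return ("moderator", None)             # group lacks the knowledge entirely

print(select_assistance(["its_s1"], ["s2", "s3"], {"s2": 40, "s3": 90}))
# -> ('its_s1', 's3')
```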
6. Discussion and Conclusions

Suthers et al. proposed an intelligent collaborative educational system [7]. It provides students with shared workspaces for constructing inquiry diagrams. Furthermore, intelligent coaches provide the student with methods of inquiry, feedback concerning correctness, and new information if he/she wants it. As described in section 1, the objective of our method is to make the strong points of group learning effective. Therefore we focused not on supplementing an individual student's knowledge directly, but on supplementing it indirectly through the group discussion, in the following ways:
• We employed the most effective assistance from the instructional viewpoint.
• We employed multiple agents to assist the students in the same positions as the students.
The ITS agents assist the students for the following purposes: (1) they prompt a student to join the discussion, (2) they prompt a student to supplement a deficient part of the discussion, (3) they supplement a deficient part of knowledge through the discussion. One ITS agent is assigned to each student and speaks or puts a question about the given problem by referring to his/her student model. These purposes are achieved respectively as follows:
• If an ITS agent puts a question to a student who has not spoken but knows the answer, the ITS agent can prompt him/her to join the discussion, and the other students can come to know it.
• If an ITS agent puts a question that a student has not put but has doubts about, he/she can be shown the answer by another student or another ITS agent.
In general, the subjects of discussions can be divided into two categories: one is the subject which can be perfectly represented and explained by logical inference, like mathematics and physics; the other is the subject which is described using preferences or beliefs, like deciding the destination of an excursion. The method of structure visualization of the discussion is necessarily different between these two categories. In this paper, we have treated only the former.

References

[1] Hatano, G. and Inagaki, K., "A Two-level Analysis of Collective Comprehension Activity," paper presented as part of the symposium "Integrating the cognitive and social in the construction of mathematical and scientific knowledge," AERA, 1994.
[2] Ikeda, M., Go, S., and Mizoguchi, R., "Opportunistic Group Formation - A Theory for Intelligent Support in Collaborative Learning," Proc. of AI-ED 97, pp. 167-174, 1997.
[3] Inaba, A. and Okamoto, T., "Negotiation Process Model for Intelligent Discussion Coordinating System on CSCL Environment," Proc. of AI-ED 97, pp. 175-182, 1997.
[4] Chan, T. W., Chou, C. Y., Lee, M. F., and Chang, M. H., "Reciprocal-Tutoring-Kids: Tutor-Tutee Role Playing Systems," Proc. of AI-ED 95, pp. 226-233, 1995.
[5] Hoppe, H. U., "The Use of Multiple Student Modeling to Parameterize Group Learning," Proc. of AI-ED 95, pp. 234-241, 1995.
[6] McCalla, G. I., Greer, J. E., Kumar, V. S., Meagher, P., Collins, J. A., Tkatch, R., and Parkinson, B., "A Peer Help System for Workplace Training," Proc. of AI-ED 97, pp. 183-190, 1997.
[7] Suthers, D. and Jones, D., "An Architecture for Intelligent Collaborative Educational Systems," Proc. of AI-ED 97, pp. 55-62, 1997.
[8] Itakura, K., "A New Method of Science Teaching," Kokudo Sha, 1966 (in Japanese).
[9] Nakamura, M. and Otsuki, S., "Group Learning Environment Based on Hypothesis Generation and Inference Externalization," Proc. of ICCE'98, Vol. 2, pp. 535-538, 1998.
Supporting Mathematics Learning
A Semi-Empirical Agent for Learning Mathematical Proof

Vanda Luengo
Equipe EIAH, Laboratoire Leibniz, 46 avenue Felix Viallet, 38041 Grenoble, France
email: [email protected]

Abstract. The paper is organized as follows. First, it presents some concepts of semi-empirical theory and the notion of agent in this theory. Then we explain why we use this theory in our model and we present two characteristics of the model: the role of the object of knowledge and the interaction between agents. In the last part, we present some elements of Cabri-Euclide to illustrate how we implemented our model and to illustrate Cabri-Euclide's capacities as a semi-empirical agent.
1. Introduction

Research results in the didactics of mathematics show that the resolution of problems is essentially dialectic, based on the interaction between proof and refutation ([1] p. 4). This characteristic is particularly true in geometry, where the graphic register (in which experiments are carried out) and the linguistic register (in which conjectures are expressed) interact. Research on computer environments for human learning in this area has proposed only environments that strongly limit the freedom of the user in problem solving ([2] p. 27). Regarding the design of the learner model, all approaches ([2] p. 18) have in common the use of automatic demonstration capabilities to build reference solutions. The learner's understanding is verified by comparing the user's demonstration to one of the demonstrations produced by the software, or by making automatic deductions from the user's productions. These approaches share the idea of linking what the user produces to a model built before the learning process begins. These systems work with correct reference knowledge. Even if some software ([3], [4]) allows the status of a statement to evolve (it is possible to have conjectures), the statement has to be true in the problem and in the knowledge recognized by the software. Such approaches leave little place for empirical proof, whose role is essential in the process of discovery and therefore in the resolution of problems. In the analyses that we carried out ([5] and [2]), we show the limits of software having these characteristics and we suggest some design ideas not limited by these constraints. In particular, for learning mathematical proof, the system has to allow the construction of learning from knowledge that is not assumed to be exact, but that is able to evolve during the interaction. In order to develop a system with this feature, we used semi-empirical theory [6] and the theory of didactical situations ([7] and [8]). The central idea is therefore to have a rational agent that allows an evolution of knowledge during the interaction of proof and refutation ([1] and [2]).
2. Semi-empirical Theory

The purpose of semi-empirical theory [6] is the construction of reasoning from knowledge that is not assumed to be exact and that is capable of progressing during interaction with a user trying to understand an
area of application (ibid. p. 9). Semi-empirical theory is defined in opposition to forms of knowledge representation that stipulate reasoning only from stable and exact knowledge. This semi-empirical theory defines the construction of concepts as the study of forms of knowledge that condition the elaboration of agent models computable by the machine. These models have to satisfy two conditions: they must remain relevant for the user, and they must be valid during experimental encounters, without reproducing the reasoning of the user or simulating the behavior of the phenomenon studied (ibid. p. 10). A system based on semi-empirical theory has to allow the user to conjecture, with the greatest degree of certainty possible, the behavior of a universe that is too complex for a computable model. The system functions from partial data, and the solutions it proposes cannot always be assumed true (ibid. p. 10). Consequently, the problem of designing such systems broadens to include the control of phenomena that do not have foreseeable or known computable models. A postulate of semi-empirical theory is that it is possible to learn knowledge by an experimental process. We can therefore envisage that the knowledge can be reconsidered and that it can evolve during interactions with the user. In Sallantin's proposal [6], dialogue with the user is a process whose function is to improve a conjecture by producing examples and by examining negative feedback in the form of counterexamples. Even if in our case the interaction is not made exclusively through examples (or counterexamples), we retain the idea of an interaction process around a conjecture that, as its name indicates, has no definitively established truth.
2.1. The notion of agent in Semi-empirical theory

For Sallantin, the agent progresses to the point of considering the validity and the relevance of its empirical theories by interacting with other agents (ibid. p. 45). A world is a group of agents communicating among themselves; two agents communicate by exchanging examples and objections (ibid. p. 46). The term "rational" is used by Sallantin to mean controllable by recognized formal methods and possible to transmit and argue. The author uses the term "agent" in the sense of a unit supplied with these mechanisms of argument. Formal constraints confer autonomy on the agent, in the sense that it controls the laws governing the development of the system (ibid. p. 65). Thus, Sallantin defines an agent as a system capable of deciding with incomplete knowledge. The capacities attributed to rational semi-empirical agents are: 1. to produce dialogues; 2. to have the capacity to argue and to communicate; 3. to allow the construction of reasoning from re-examined knowledge, likely to evolve in the course of the interaction.

3. Semi-empirical theory in our model

The object of knowledge in our software is mathematical proof in geometry. We analyzed didactical, epistemological and cognitive work on the characteristics of proof, and we designed the software from this analysis [2].
We chose to work in a problem-solving situation and to allow the proof/refutation dialectic. We also chose to limit ourselves to deductive proof and to use the analysis of Duval [9] for the representation and manipulation of this type of proof. These choices are explained in Luengo [2] and Balacheff [1]. Thus semi-empirical theory proposes a model of reasoning that, in our opinion, is relevant to a process of problem solving and mathematical proof construction. It is relevant because of its relation to the validity of intervening statements and because of the manner in which one constructs this reasoning. A hypothesis at the basis of our work is that the process of problem solving is semi-empirical: "The objective of Lakatos' model is to prove a conjecture by determining its proof and the area of validity of this proof. This step produces a semi-empirical theory that expresses the conjecture with the help of relationships between statements of the language" ([6], p. 37). For us, reasoning and types of interaction depend on the object of knowledge. The main characteristic of our model is that all interactions are conceived according to the object to be known, the object that gives the reason for the interaction. Thus, the focus in the interactions is on characteristics and properties of the objects of knowledge, not on how the problem is solved, the type of reasoning or the strategy that has been implemented. Furthermore, it is equally necessary to take account of exchanges that allow the user to overcome constraints of interaction, because problems are not necessarily relative to knowledge, but may be relative to constraints of communication (for example, the user does not know how to use a particular functionality). In the model presented here, which is a model of interaction for learning, we consider human rational agents (HRA) and artificial rational agents (ARA) as having the interactions considered in the theory of didactical situations ([7] and [8]).
3.1 The object of knowledge
As said above, the agent has a specific relation to knowledge. Therefore, in the case of problem solving, each agent has a particular task to fulfill as a function of the role played by the object of knowledge associated with it. It is the interaction between all agents that allows the resolution of the problem. In our case, the figure allows the heuristic work, the text environment allows the formulation of statements and deductive proof, and the graph of the proof allows the user to represent the deductive organization. An important aspect of our model is the fact that each agent has a module of knowledge that interacts with the user's knowledge. Each ARA whose knowledge is independent of the other ARAs has to be able to work independently. In other words, we consider autonomous rational agents. In the case of proof, for example, the agent that makes the graph of a proof can be independent from the agent that allows the construction of the figure. The didactic analysis of the objects of knowledge is the key to the success of the design of our model, because the choices suitable in relation to knowledge determine the main characteristics of the model. These choices concern the representation of the objects and their role in the different forms of interaction. The representation of the objects of knowledge is based on cognitive analyses of mathematical proof [9]. The deductive proof in our model is a set of elementary steps. Each step is composed of three types of statements: the given statements (or hypotheses), one deductive rule, and one exit statement, or conclusion of the step.
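As a reading aid, the elementary step just described can be rendered as a small data structure. The sketch below is our own illustration, not Cabri-Euclide's actual data model; all field names are hypothetical.

from dataclasses import dataclass

@dataclass
class Statement:
    text: str                    # e.g. "[AB] is parallel to [CD]"
    status: str = "conjecture"   # may evolve: conjecture, true, false

@dataclass
class DeductiveStep:
    hypotheses: list             # the given statements of this step
    rule: str                    # one deductive rule (theorem or definition)
    conclusion: Statement        # the exit statement of this step

# A deductive proof is then a set of such steps, whose conclusions may
# serve as hypotheses of later steps.
Proof = list  # list[DeductiveStep]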
3.2 Interactions In this section, we present the conditions (or rules of the game) for interactions that we take into account in order to allow an exchange concerning the object of learning at stake. These conditions are analyzed in connection with the different types of interactions in the theory of didactical situations ([8], chap. 3 and 4): action, formulation and validation.
Interaction for Action
This refers to actions and decisions that act directly on the other protagonist. The construction of an object, for example, is an interaction for action. The ARA must offer ways for the HRA to manipulate abstract objects in a broad sense, i.e. it has to allow the creation of an object as well as the determination of its mathematical characteristics. This implies that, in the conception of the ARA, we have to define a formal system [10], i.e. a set of primitive objects, a set of elementary operations, and a set of rules expressing how operations can be applied and combined (a schematic illustration is given below). The ARA has to allow the manipulation of these objects. Specifically, the ARA has to allow the creation, elimination, manipulation and linking of objects, and the change of their characteristics. For the manipulation of objects, we use the idea of direct manipulation [11]: "Continuous representation of the object of interest, physical actions (movements and selections or labeled button presses) instead of complex syntax, and rapid, incremental, reversible operations whose impact on the object of interest is immediately visible". The ARA also has to determine the reactions to the actions produced by an ARA (or HRA). For us, these types of feedback have to be defined in relation to the object of knowledge that intervenes in the action. In our case, for example, if the statement "segment AB" is proposed, the text agent can say that this segment does not exist in the figure.

Interaction for Formulation
Here the purpose is to design an ARA that allows the HRA to make formulations in the linguistic register concerning the object of knowledge. The key to the communication itself is expressed by the feedback. As Brousseau indicates [7], the demands on the communication concern conformity to the code (minimal for the intelligibility of the message), ambiguity, redundancy, lack of relevance (superfluous information) and the efficiency of the message (which must be optimal). Here we must consider the representation of the knowledge, in the formal system and at the interface, because this determines the possible formulations that can be realized between agents. Tools of communication have to be accessible all along the resolution. Thus, if the object of knowledge is mathematical proof in Euclidean geometry, it is necessary to define the formal objects in a manner that allows the HRA to construct formulations relative to this type of geometry (parallel, perpendicular, hypotheses, theorem, etc.) at the interface.
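To illustrate the formal system mentioned under interaction for action, here is a minimal rendering in Python. The object kinds, operations and rules are invented for the example and are not those of any of the cited systems.

PRIMITIVE_OBJECTS = {"point", "line", "segment", "circle"}

ELEMENTARY_OPERATIONS = {"create", "eliminate", "move", "link", "change_property"}

# Rules expressing how operations may be applied: here, simply which
# operations each kind of object accepts (an assumption for illustration).
OPERATION_RULES = {
    "point":   {"create", "eliminate", "move"},
    "line":    {"create", "eliminate", "move", "change_property"},
    "segment": {"create", "eliminate", "move", "link", "change_property"},
    "circle":  {"create", "eliminate", "move", "change_property"},
}

def allowed(operation: str, obj_kind: str) -> bool:
    # Check an action against the formal system before executing it.
    return operation in OPERATION_RULES.get(obj_kind, set())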
Interaction for Validation
This concerns the production of judgments about the abstract objects that intervene in the process of learning. The ARA has to provide ways of validating the knowledge implied in the process of resolution. It also has to be capable of making judgments about these abstract objects. In the case of Cabri-geometre, a microworld for the dynamic construction of geometric figures ([12] and [13]), the software allows the analysis of properties of the figure by the displacement of objects and it is equally capable of producing a judgment concerning a property.
As for the other interactions, the ways of validation are relative to the object of knowledge. Here, the challenge is the one proposed by Brousseau ([7], p. 108): to allow the user to use mathematical knowledge as a way of convincing (or of convincing him/herself) while encouraging the HRA to reject rhetorical refutations or rhetorical arguments that do not constitute good evidence. This does not mean that it is unnecessary to consider how to validate and produce pragmatic refutations; however, the final agreement has to be reached by intellectual validations and refutations (ibid.). We also have to consider the possible communication constraints on the three types of interaction. That gives us a way to anticipate and overcome these constraints. Communication constraints exist, in our opinion, because the ARA imposes a more or less flexible set of rules. Thus, in the current version of Cabri-Euclide, to construct a statement the HRA must type a string of characters respecting a set of rules in order for the software to understand the formulation. In the design of Cabri-Euclide (not yet fully implemented), the HRA produces the statement with the figure: to produce the statement "A parallel B", we choose the parallel property and we show the lines A and B. This can reduce the constraint on producing the statement and consequently the communication constraints.
4. A Semi-empirical agent: Cabri-Euclide
In this section, we show some aspects of Cabri-Euclide that illustrate its capacities as a semi-empirical agent.
4.1 The object of knowledge
All interactions have to be made in relation to the objects of knowledge that we took into account during the analysis. Thus there are three types of objects in Cabri-Euclide with which the user can interact: the figure, the text and the graph. It is therefore possible to interact with three pieces of software in constructing the proof: Cabri-geometre I ([12] and [13]) for the figure, Cabri-Euclide [2] for the text, and Cabri-Graphe [14] for the graph. In the next figure, we see the three contexts:
Figure 1. Cabri-Euclide software
4.2 Interactions
The Cabri-Euclide agent can interact with the Cabri-geometre software [15] in constructing the figure. The user can draw a figure in order to do the heuristic analysis of the problem. S/he can use the dynamic features of this software to make conjectures. There is a relationship between the text and the figure: all objects in a statement have to exist in the figure. One characteristic of this relationship is the potential, when possible, for proposing counter-examples to the user. When the user proposes a statement, and if this statement is a property (parallel, perpendicular, etc.), Cabri-Euclide asks the oracle of Cabri-geometre whether there is a counter-example, and if there is, the software proposes the counter-example. For example ([2], p. 203):
[Example layout: the user's figure, the user's statement, and the counter-example figure.]
Statement of the user: [C,I] a la même longueur que [F,G].
Cabri-Euclide feedback: "Cette propriété est apparemment vraie sur votre figure, mais elle ne l'est pas dans le cas général : voulez-vous un contre-exemple ?" (Translation: this property is apparently true in your figure, but it is not true in the general case; do you want a counter-example?)
The refuted proof step: [C,I] same length as [F,G] (CF) because [A,C] same length as [B,D] (CF), [A,C] = 2[C,I] (PV), [B,D] = 2[F,G] (PV), TRANSITIVITE EGALITE (TH); (CF) = false conjecture.
Example 1: figural refutation.
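How such an oracle might answer is suggested by the following sketch. This is only a guess at a plausible mechanism (numerical re-testing of the property under random displacement of free points), not a description of Cabri-geometre's actual implementation; all names are ours.

import random

def oracle_counterexample(build_figure, property_holds, trials=100):
    # build_figure(params) -> figure; property_holds(figure) -> bool.
    # Returns a refuting figure if the property fails in some position.
    for _ in range(trials):
        params = [random.uniform(-10, 10) for _ in range(6)]  # free points
        figure = build_figure(params)
        if not property_holds(figure):
            return figure   # true on the user's figure only, not in
                            # general: offer this counter-example
    return None             # apparently true (still only empirical)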
Example 1 shows the capacity of the agent to manipulate knowledge whose degree of truth is not completely established. It also shows the capacities of communication that exist with another HRA or ARA. Cabri-Euclide's agent is responsible for formulating and verifying the statements. The user can construct and manipulate the statements and propose relationships between the statements in order to construct a proof. The kind of proof the user can construct is the deductive proof: there are theorems and definitions, and the user can associate these statements with the statements he or she has constructed. The text agent verifies the consistency of the proof construction and is able to make refutations. There are three types of verification concerning the consistency of the proof ([2] p. 94), sketched after this list:
• the relation between the figure and the statements (for example, a property statement has to be true in the figure);
• the deductive organization (for example, the relation between hypothesis, theorem and conclusion);
• the status of the statements (for example, a true statement cannot have a false statement in its proof).
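The following placeholder predicates sketch the three verifications; the helper calls (figure.satisfies, rule_applies, proof_statements) are hypothetical names, not Cabri-Euclide's API.

def check_figure_relation(statement, figure) -> bool:
    # A property statement must be true in the figure (e.g. via the oracle).
    return figure.satisfies(statement)

def check_deductive_organization(step) -> bool:
    # The theorem must actually deduce the conclusion from the hypotheses.
    return step.rule_applies(step.hypotheses, step.conclusion)

def check_statuses(statement) -> bool:
    # A true statement cannot rest on a false statement in its proof.
    return all(s.status != "false" for s in statement.proof_statements())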
The text agent does not automatically verify all operations performed by the user, because the agent does not know whether the user has finished until the user informs it. The
verifications produced when the user asks for them concern the deductive organization and the status of the statements. In the first case, if the theorem cannot deduce the conclusion or if there are not enough hypotheses, Cabri-Euclide gives negative feedback and, if the user asks for it, the agent shows the theorem and why the organization is not correct. The design of this agent includes the possibility of transferring a statement to other problems or transferring a constructed mathematical proof to the toolbox of theorems. This possibility allows the teacher to introduce a mathematical proof as a conjecture and to build its evolution toward a theorem. In the same way, it is possible for the teacher to construct new theorems with Cabri-Euclide's interface. We can also construct new definitions: macro-definitions. To do this, the user proposes the basic objects of the definition and, if these new objects (or the definition) have properties, it is possible to associate these properties with the macro-definition. The definition plays two roles: the first is to produce a deductive proof in the same way as a theorem (ABCD is a parallelogram by definition because [AB] is parallel to [CD] and [AD] is parallel to [BC]). The second role is to formulate statements (for example, for the formulation of the parallelogram or the square, we use the macro-definition because such objects are not basic objects). The last two paragraphs show the capacity of Cabri-Euclide, as a semi-empirical agent, to enhance the object knowledge of the text agent in interaction with a human agent. Cabri-Euclide can also interact with the Cabri-Graph software [14] for the construction of the graph. But in this case, in the current version, the human agent cannot construct graph objects. The HRA can only manipulate the objects of the graph by direct manipulation. Cabri-Euclide constructs the graph from the set of statements constructed by the HRA. We can distinguish in the following table the capacities of these three different registers in connection with the possible interactions presented in our model:
Object of knowledge: Geometric figure (Cabri-geometre I)
• Action: construction; macro-construction; save; open; exit; change the properties of objects; displacement of objects
• Formulation: Euclidean geometry objects (points, lines, etc.)
• Validation: displacement of objects; verification of figure properties

Object of knowledge: Deductive proof in Euclidean geometry (Cabri-Euclide)
• Action: construction; macro-construction; association of statements; save; open; exit; change the properties of objects
• Formulation: statements (properties, definitions, theorems); deductive inference: Hypotheses => Conclusion
• Validation: verification of coherency; counter-example; deductive organization; status

Object of knowledge: Graph of proof (Cabri-graph)
• Action: construction; displacement of objects; save; open; exit
• Formulation: couple (vertex or statement, arcs or inferential links)
• Validation: —

Table 1. Kinds of interactions with the Human Rational Agent.
5. Conclusion
Using the theory of semi-empirical agents has allowed us to bring together the capacities of systems with empirical interactions and the evolution of learning in a pragmatic context, thanks to systems that take account of the validity of formulations. This theory is appropriate for us because it provides a conceptual way of producing a system composed of several agents processing different objects of knowledge, but able to be brought together around a new object of knowledge. From the point of view of the implementation, it is interesting to see the integration of several applications that were conceived with different capacities, i.e. Cabri-geometre and Cabri-graph. Furthermore, the feasibility of associating several applications depends on characteristics of this software: among other things, these applications need ways of communicating with each other. Interaction with semi-empirical agents will be more or less successful depending on the capacity of these agents to negotiate and to convince. This relates to the fact that agents work with knowledge that can be re-examined and, in the case of an interaction (for example, a refutation), the system must be capable of arguing based on the truth of its formulation. We saw during the experimentation how the knowledge of a user who was in error persisted and resisted the counter-example proposed by the system ([2] p. 173). We can thus conclude that the success of the model depends strongly on its capacity to argue and negotiate, i.e. on its capacity to communicate.
6. References
[1] Balacheff N., Apprendre la preuve. In: Sallantin J. (ed.), La preuve a la lumiere de l'intelligence artificielle. Paris: PUF (a paraitre), 1997.
[2] Luengo V., Cabri-Euclide : un micromonde de preuve integrant la refutation. Principes didactiques et informatiques. Realisation. These de doctorat. Grenoble: Universite Joseph Fourier, 1997.
[3] Bernat P., CHYPRE : un logiciel d'aide au raisonnement. Reperes - IREM, n° 10, 1993.
[4] Py D., Aide a la demonstration en geometrie : le projet MENTONIEZH. Sciences et Techniques Educatives, Vol. 3, n° 2, 1996, pp. 256-277.
[5] Luengo V. and Balacheff N., Contraintes informatiques et environnements d'apprentissage de la demonstration en mathematiques. Sciences et Techniques Educatives, Vol. 3, n° 2, 1996, pp. 256-277.
[6] Sallantin J., Theories semi-empiriques : conceptualisation et illustrations. Revue d'intelligence artificielle, Vol. 5, n° 1, 1996, pp. 9-67.
[7] Brousseau G., Fondements et methodes de la didactique des mathematiques. Recherches en Didactique des Mathematiques, Vol. 7, n° 2, 1986, pp. 33-115.
[8] Brousseau G., Theory of Didactical Situations. Dordrecht: Kluwer Academic Publishers; edition and translation by Balacheff N., Cooper M., Sutherland R. and Warfield V., 1997.
[9] Duval R., Structure du raisonnement deductif et apprentissage de la demonstration. Educational Studies in Mathematics, 22, 1991, pp. 233-261.
[10] Balacheff N., Didactique et Intelligence Artificielle. Recherches en Didactique des Mathematiques, Vol. 14, n° 1.2, 1994, pp. 9-42.
[11] Laborde J., Intelligent Microworlds and Learning Environments. NATO ASI Series. Berlin: Springer Verlag, 1995, pp. 113-132.
[12] Laborde J., Projet d'un Cahier Brouillon Informatique de Geometrie. Rapport interne (IMAG). Grenoble, 1985.
[13] Bellemain F., Conception, realisation et experimentation d'un logiciel d'aide a l'enseignement de la geometrie : Cabri-geometre. These de doctorat, Universite Joseph Fourier, Grenoble, 1992.
[14] Carbonneaux Y. et al., Cabri-graph: a tool for research and teaching in graph theory. In: Brandenburg F.J. (ed.), Graph Drawing '95, Lecture Notes in Computer Science n° 1027, Springer Verlag, 1995, pp. 123-126.
[15] Tessier S., Laborde J.M., Description des evenements Apple acceptes par Cabri-geometre. Rapport technique, IMAG, Universite Joseph Fourier, Grenoble, 1994.
A Proof Presentation Suitable for Teaching Proofs
Erica Melis*
Universitat des Saarlandes, Fachbereich Informatik, 66041 Saarbrucken
[email protected]
Uri Leron
Dept. of Science Education, Technion Inst. of Technology, Haifa, Israel 32000
[email protected]
Abstract. The paper addresses comprehensible proof presentation for teaching and learning that can be provided by an automated proof planner that is a component of the proof development environment OMEGA. Starting from empirically discovered requirements for the comprehensibility of proofs, we show that, given a proof plan representation of a proof, the problem of automatically presenting a mathematical proof in a comprehensible manner becomes feasible in a way that was impossible before. We show how, based on a proof plan, a structured presentation at the level of proof methods can be automatically constructed.
1 Introduction
Computer algebra systems have been successfully used in teaching mathematics (see, e.g., a summary of Dana Scott's courses [12]). Can automated theorem proving systems be useful for teaching mathematical proofs, and under which conditions? The traditional automated theorem provers' output is at best a presentation of steps representing logic calculus rules.(1) Therefore, this output is hardly readable for an untrained user, let alone understandable as a mathematical proof, even for mathematicians. For instance, for the theorem
LIM+: The limit of the sum of two real-valued functions is the sum of their limits,
a so-called ε-δ-proof would construct a real number δ dependent on ε such that
0 < ε → ∃δ(0 < δ ∧ ∀x(|x − a| < δ → |(f(x) + g(x)) − (L1 + L2)| < ε))
holds, provided lim f(x) = L1 and lim g(x) = L2, whereas its proof as produced by the automated theorem prover OTTER [10] is the following:

20 [binary,12.2,10.3] -LE(0,x) | -LE(abs(F(XS(x))+ -F(A)),HALF(E0)) | -LE(abs(G(XS(x))+ -G(A)),HALF(E0)).
126 [binary,20.2,3.3] -LE(0,x) | -LE(abs(G(XS(x))+ -G(A)),HALF(E0)) | -LE(0,HALF(E0)) | -LE(abs(XS(x)+ -A),D1(HALF(E0))).
699 [binary,126.3,11.2,unit_del,5] -LE(0,x) | -LE(abs(G(XS(x))+ -G(A)),HALF(E0)) | -LE(abs(XS(x)+ -A),D1(HALF(E0))).
991 [binary,699.3,8.2] -LE(0,x) | -LE(abs(G(XS(x))+ -G(A)),HALF(E0)) | -LE(abs(XS(x)+ -A),MIN(y,D1(HALF(E0)))).
1194 [binary,991.3,6.2,factor_simp] -LE(0,MIN(x,D1(HALF(E0)))) | -LE(abs(G(XS(MIN(x,D1(HALF(E0)))))+ -G(A)),HALF(E0)).
1196 [binary,1194.1,9.3] -LE(abs(G(XS(MIN(x,D1(HALF(E0)))))+ -G(A)),HALF(E0)) | -LE(0,x) | -LE(0,D1(HALF(E0))).
1231 [binary,1196.3,1.2] -LE(abs(G(XS(MIN(x,D1(HALF(E0)))))+ -G(A)),HALF(E0)) | -LE(0,x) | -LE(0,HALF(E0)).
1276 [binary,1231.3,11.2,unit_del,5] -LE(abs(G(XS(MIN(x,D1(HALF(E0)))))+ -G(A)),HALF(E0)) | -LE(0,x).
1284 [binary,1276.2,2.2] -LE(abs(G(XS(MIN(D2(x),D1(HALF(E0)))))+ -G(A)),HALF(E0)) | -LE(0,x).
1297 [binary,1284.1,4.3] -LE(0,x) | -LE(0,HALF(E0)) | -LE(abs(XS(MIN(D2(x),D1(HALF(E0))))+ -A),D2(HALF(E0))).
1303 [factor,1297.1,2] -LE(0,HALF(E0)) | -LE(abs(XS(MIN(D2(HALF(E0)),D1(HALF(E0))))+ -A),D2(HALF(E0))).
1413 [binary,1303.1,11.2,unit_del,5] -LE(abs(XS(MIN(D2(HALF(E0)),D1(HALF(E0))))+ -A),D2(HALF(E0))).
* This author was supported by the Deutsche Forschungsgemeinschaft, SFB 378.
(1) Such as resolution.
1417 [binary,1413.1,7.2] -LE(abs(XS(MIN(D2(HALF(E0)),D1(HALF(E0))))+ -A),MIN(D2(HALF(E0)),x)).
1418 [binary,1417.1,6.2] -LE(0,MIN(D2(HALF(E0)),D1(HALF(E0)))).
1419 [binary,1418.1,9.3] -LE(0,D2(HALF(E0))) | -LE(0,D1(HALF(E0))).
1422 [binary,1419.2,2.2] -LE(0,D1(HALF(E0))) | -LE(0,HALF(E0)).
1428 [binary,1422.1,1.2,factor_simp] -LE(0,HALF(E0)).
1430 [binary,1428.1,11.2] -LE(0,E0).
1431 [binary,1430.1,5.1] False.
The need for better outputs was recognized long ago and several attempts have been made to produce readable proofs. Proof presentation in natural language has recently been realized in ILF [6] and PROVERB [7], which slightly abstract proofs before the presentation. PROVERB returns a proof presentation at the so-called assertion level.(2) However, a proof verbalization at this level is not necessarily the most natural and best way to communicate a proof to mathematicians or to students. In particular, the proof is often not abstract enough, and the user cannot go from an abstract level to a more detailed level because a (hierarchical) structure is missing. What is an appropriate presentation of proofs for teaching and learning? What kind of system can support the student (and the teacher) in understanding (and presenting) a proof properly? This paper gives a partial answer to these questions. We show how automatically generated proof plans can serve as a basis for an appropriate proof presentation that resembles as much as possible 'good' proofs generated by humans. Although the idea of using proof plans as a basis for a structured proof presentation might seem obvious, it is new, and our system is the first to implement it.
2 Empirical Results on Proof Presentation
Empirical classroom studies have provided useful insights about the presentation of proofs for students. In particular, (hierarchically) structured presentations help students to understand and memorize proofs. Catrambone [4] showed empirically that people often memorize a sequence of problem-solving steps when studying examples in mathematics or physics without learning what domain-dependent subgoals or subtasks these steps achieve. As a result, they have trouble solving novel problems that contain the same structural elements. Consequently, emphasizing the subgoal structure yields a more successful transfer to novel problems, and teaching this structure helps in solving problems that require novel techniques. In [5] he showed that even meaningless labels that structure a proof support people's problem-solving capabilities. Even more impressive is a comparison of a hierarchical presentation of a proof with the standard linear presentation. Linear proofs, which proceed unidirectionally from hypotheses to conclusion, may be suitable for mechanically checking the validity of the proof, but they leave the student mystified as to what the main ideas of the proof are, why those steps are taken and why in that order. On the other hand, structured proofs consist of independent modules which are arranged hierarchically. First the top level is presented, giving the main idea of the proof, and then the next levels, which supply successively finer levels of detail, until we get to the bottom where all the 'leaks' left at higher levels are plugged and the proof is 'waterproof'. The top level gives a global view of the proof at the cost of leaving some gaps, which are then filled at the lower levels. For a comparative analysis of many proofs, elementary as well as advanced, in the two styles, see [8]. Leron has shown that, as opposed to the linear presentation
(2) An assertion is an axiom or definition the proof refers to.
of proofs, a hierarchically structured presentation makes explicit the (intuitive and) global ideas behind the proof, and hence helps students to understand and to be able to reproduce a proof rather than just check its correctness [8]. Leron demonstrates, among other things, one particular principle for structuring proofs that involve the construction of mathematical objects: at the abstract level the existence of a mathematical object that satisfies certain properties P is postulated, and with this assumption the proof (idea) is completed. Only at a hierarchically lower level is an object that actually satisfies P constructed. For teachers and textbook authors, who are aware of the deep ideas behind a proof, presenting the proof in a structured format is an easy task. For students or readers, structuring a linear proof amounts to re-discovering those deep ideas from scratch and may be a very difficult, even impossible task. Since our system OMEGA [2] is designed to assist students in proving mathematical theorems, in understanding the proofs produced (semi-)automatically, and in improving their capabilities, its proof planner should produce an output comparable to an expert teacher's proof presentation.
3 Proof Planning
Proof planning is a relatively new paradigm in automated theorem proving [3, 11]. As opposed to traditional automated theorem proving, which searches for a proof at the level of a logic calculus, proof planning automatically constructs a plan by searching at a higher level of abstraction and by using explicit control knowledge. The resulting plan represents a proof at a more abstract level. A plan consists of methods (which are plan operators). These methods represent (i) common patterns in proofs, e.g. the common structure of an induction proof, or (ii) common (procedural) proof techniques such as term simplification or the computation of a derivative. Examples of methods are Diagonalization, Induction, estimation methods, lifting methods, and the application of an important theorem such as the Hauptsatz of Number Theory (each natural number can be uniquely represented as the product of prime numbers). Proof planning chains abstract methods together and then, if desired by the user, recursively expands abstract methods into less abstract subplans. During the expansion of methods, previously hidden subgoals may become visible and subject to proof planning. Hiding of subgoals is one form of hierarchical planning (precondition abstraction), and using more abstract methods is another (operator abstraction). Both yield a hierarchically structured proof plan. The proof planner of OMEGA automatically produces the plan for LIM+ and the other theorems mentioned below. The graphical user interface of OMEGA, LOUI, makes accessible the hierarchical structure of the proof plan data structure (PDS) by first showing the most abstract proof plan in a tree-like graph (top left window in Figure 1). Then the nodes of this abstract proof plan can be expanded, and this yields a (hierarchically lower) subplan.

ε-δ-Proofs
We shall illustrate the use of proof plans with an example from the class of limit theorems that can be automatically planned by OMEGA. The experiments for planning and presenting proofs were mainly conducted with examples from this class. The class of limit theorems includes the well-known theorem LIM+ from calculus, which states that the limit of the sum of two functions in the real numbers, IR, is the sum of their limits:
lim f(x) = L1 ∧ lim g(x) = L2 → lim (f(x) + g(x)) = L1 + L2,
which, after expanding the definition of lim, becomes
∀ε1(0 < ε1 → ∃δ1(0 < δ1 ∧ ∀x1(x1 ≠ a ∧ |x1 − a| < δ1 → |f(x1) − L1| < ε1))) ∧
∀ε2(0 < ε2 → ∃δ2(0 < δ2 ∧ ∀x2(x2 ≠ a ∧ |x2 − a| < δ2 → |g(x2) − L2| < ε2))) →
∀ε(0 < ε → ∃δ(0 < δ ∧ ∀x(x ≠ a ∧ |x − a| < δ → |(f(x) + g(x)) − (L1 + L2)| < ε))).
Other class members are, e.g., similar theorems about differences (LIM-) and products (LIM*); Composite, which states that the composition of two continuous functions is continuous; ContIfDeriv, which states that a function having a derivative at a point is continuous there; Cont+, which states that the sum of two continuous functions is continuous, and a similar theorem (Cont*) about products; UNIFcont, which says that a uniformly continuous function is continuous; and theorems like LIMsquare: lim x^2 = a^2, and other theorems about continuity. The typical way a mathematician proves such a theorem is to (incrementally) construct a real number δ that depends on ε. In the remainder, /, *, +, −, | | denote the division, multiplication, addition, subtraction, and absolute value functions in IR, respectively. Fσ denotes the result of applying a substitution σ to an expression F.

3.1 Proof Plans for ε-δ-Proofs
In proof planning limit theorems, some of the methods to be used are the estimation methods ComplexEstimate and Solve and other methods such as UnwrapHypothesis, Skolemize, and IntroduceCS. ComplexEstimate proves the estimation of a complicated term |b| by representing b as a linear combination k * aσ + l of a term a whose estimation is already known and by reducing the goal |b| < ε to three simpler subgoals that contain M, a real number whose existence is postulated by ComplexEstimate:
1. |k| < M,
2. |aσ| < ε/(2 * M),
3. |l| < ε/2.
For instance, in planning LIM+, at some point the goal |f(x) + g(x) − (l1 + l2)| < ε is reduced by ComplexEstimate to
(1) |1| < M,
(2) |f(x1) − l1| < ε/(2 * M),
(3) |g(x) − l2| < ε/2.
Another estimation method, Solve, handles simple inequality goals such as x < a. Solve does not produce any subgoals but simply removes the particular goal.
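Read operationally, ComplexEstimate and Solve behave like plan operators mapping a goal to subgoals. The sketch below is our own reading of the text, not OMEGA's code; the symbolic arguments are passed as strings for simplicity.

from dataclasses import dataclass

@dataclass
class Goal:
    description: str

def complex_estimate(b, k, a, l, eps):
    # Reduce |b| < eps, where b = k*a + l, to the three subgoals above.
    return [
        Goal(f"there exists M with |{k}| < M"),
        Goal(f"|{a}| < {eps}/(2*M)"),
        Goal(f"|{l}| < {eps}/2"),
    ]

def solve(goal):
    # Solve removes simple inequality goals such as x < a (no subgoals).
    return []

# For LIM+: b = f(x)+g(x)-(l1+l2), k = 1, a = f(x1)-l1, l = g(x)-l2.
subgoals = complex_estimate("f(x)+g(x)-(l1+l2)", 1, "f(x1)-l1", "g(x)-l2", "eps")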
4 The Use of Proof Plans for Proof Presentation
Hierarchically structured proof plans and the representation of common proof procedures and common structures by planning methods provide the information needed for a comprehensible proof presentation and for proof explanation.
Verbalization of Methods
The verbalization of the methods belongs to the domain expertise and therefore has to be designed by a maths knowledge engineer. As experience shows, some methods in proof plans are heavily verbalized, whereas others, such as QuantifierElimination and AndElimination, are hardly verbalized at all in standard mathematical texts. An example of an always-verbalized method in plans of limit theorems is ComplexEstimate. ComplexEstimate delivers the main idea of the LIM+ proof, whereas other methods will be mentioned only in a more detailed description of the proof. In the textbook [1] the counterpart of ComplexEstimate is, with slight linguistic variations, verbalized as follows:
In order to estimate the magnitude of |b|, we rewrite the term as |k * aσ + l| and we use the Triangle Inequality |b| ≤ |k * aσ| + |l|. The goal can be shown in three steps:
• there exists an M such that |k| < M,
• |aσ| < ε/(2 * M), and
• |l| < ε/2.
It follows that |b| ≤ |k| * |aσ| + |l| < M * ε/(2 * M) + ε/2 = ε, and therefore |b| < ε.
In the instantiations of this schematic verbalization, the terms (in math font) are instantiated with the actual instantiation of the parameters a, b, k, l in the particular ComplexEstimate step of the proof plan. (For LIM+, a is f(x1) − l1, b is f(x) + g(x) − (l1 + l2), k is 1, l is g(x) − l2.) This schematic verbalization is stored in a slot of the method in order to employ it for the presentation of the proof. Only because the methods represent relatively abstract content is a schematic verbalization quite appropriate. Different modes of a method's verbalization are possible and desired, e.g., for different styles of presentation of a construction proof. All these verbalizations are prefabricated, though.

4.1 Automatic Generation of Proof Verbalization
The plan presentation routine LocalPresent is integrated into the user interface LOUI [13] and uses the hypertext facility based on the LOUI markup language LML. For the presentation, a presentation window is created, shown in the bottom right window of Figure 1. The actual input of LocalPresent is the proof plan that is graphically presented by LOUI. When the user clicks on a node of the graphical presentation of the proof plan, LocalPresent searches backwards for the next node whose method can be verbalized (remember that not every method actually is verbalized). If such a method is found, then its instantiated verbalization is shown in the presentation window. If no such method exists, then obviously no verbalization is produced. The instantiated verbalization of a method M may contain a subgoal of M. In this case, a dynamically produced hypertext link points to the proof of this subgoal. If the
user clicks on a marked subgoal g, LocalPresent goes to the node in the proof plan that corresponds to g and again searches backwards for the next method that can be verbalized in order to present the proof of g. In case the user wants to see more details of a method M that is currently verbalized, (s)he can click on an expansion link that causes LocalPresent to dynamically expand M into a subplan and to show a verbalization of this subplan. Typically, this expansion is the instantiation of a schema that is represented in the proof schema slot of the method. Since only the verbalization of one method is shown in the presentation window, this presentation is a local one, as opposed to a global verbalization that would present a sequence of methods. This local presentation has the advantage that the user can more easily see the correspondences between the (graphical) plan tree and the verbalization. If (s)he wants to see more of the verbal presentation of the proof, (s)he has to click on another node or on a subgoal in a verbal presentation.
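The backward search that LocalPresent performs can be summarized as follows; the node and method attributes (parent, verbalization, bindings, subgoals) and the window API are hypothetical names standing in for LOUI's internals.

def local_present(node, window):
    # Show the verbalization of the nearest verbalizable method.
    while node is not None and node.method.verbalization is None:
        node = node.parent          # search backwards in the proof plan
    if node is None:
        return                      # obviously, nothing to verbalize
    text = node.method.verbalization.instantiate(node.bindings)
    # Subgoals in the text become hypertext links; clicking one recurses.
    for g in node.subgoals:
        text = window.link(text, g, lambda g=g: local_present(g.node, window))
    window.show(text)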
4.2 Example: LIM+
An understandable presentation of LIM+'s plan can be automatically extracted from the complete proof plan, obtained by first clicking on the node that represents the theorem in the graph. The presentation starts with postulating the answer constraint formula, which results from collecting all constraints on δ, δ1, and δ2. This answer constraint formula δ ≤ δ1 ∧ δ ≤ δ2 ∧ ... is the presentation of the IntroduceCS method. The next method verbalized is ComplexEstimate. Its verbalization contains subgoals that can be clicked on further to obtain the other parts of the complete verbalization. Figure 1 shows how LocalPresent verbalizes this method in the presentation window. The white node in the graph is the one that is currently verbalized. Three subproofs follow.
[Figure 1: the LOUI interface with the graphical proof plan and the presentation window; only toolbar residue of the screenshot survived extraction.]
The verbalization shown in the presentation window reads, in part: by hypothesis, if ε/(2 * M) > 0, there exists δ1 such that for all x, if |x − a| < δ1 then |f(x) − L1| < ε/(2 * M) (3), and by hypothesis, if ε/2 > 0, there exists δ2 such that for all x, if |x − a| < δ2, then |g(x) − L2| < ε/2. Because of (*) it follows that |f(x) + g(x) − (L1 + L2)| ≤ M * |f(x) − L1| + |g(x) − L2| < M * ε/(2 * M) + ε/2 = ε, and therefore |f(x) + g(x) − (L1 + L2)| < ε. This proves the theorem.
5 Benefits for Teaching
We have not yet conducted classroom experiments that show the superiority of the plan presentation of OMEGA. However, if one compares the above verbalization of the LIM+ proof with the output of the automated theorem prover OTTER as given in the introduction, the proof plan based presentation is clearly more human-oriented. In particular, the hierarchical structure of a proof can easily be taught to or discovered by a student. Since LocalPresent allows the user to decide when to look into details, i.e., when to expand a presentation, the presentation becomes rather individualized. OMEGA can be used to verbalize not only the proof of LIM+ but also the other ε-δ-proofs that we have checked in experiments. More experiments, e.g., with diagonalization proofs, will follow shortly. More features will then be integrated into LocalPresent, e.g., different presentation styles and dependency on the user model. A GlobalPresent will use linguistic knowledge to chain the method verbalizations together and will use structuring features to organize a sequential presentation that is presented independently of the graphical proof tree.
(3) The occurrence of the auxiliary variable M is due to the more general presentation needed for other limit theorems.
6 Conclusion
We have addressed requirements for comprehensible proof presentations, in particular the hierarchically structured presentation at the level of proof methods. Proof planning techniques and the proof plan representation are the basis for automatically generating proof presentations satisfying these requirements in the OMEGA system. Sometimes, a change of the underlying knowledge representation totally changes the problem-solving horizon. This situation applies to the change from calculus-level proofs of traditional automated theorem proving to proof plans. Suddenly, problems become simple which were hardly solvable in traditional automated theorem proving, since proof plans are hierarchically structured and can be employed for the comprehensible presentation of proofs. Related work has been done on XBarnacle [9], a proof planner used for proofs by mathematical induction. XBarnacle uses a flat proof plan to communicate with the user. It refers to a method by its name only, rather than by a natural language description, so the user needs to know what the method is about.

References
[1] R.G. Bartle and D.R. Sherbert. Introduction to Real Analysis. John Wiley & Sons, New York, 1982.
[2] C. Benzmueller, L. Cheikhrouhou, D. Fehrer, A. Fiedler, X. Huang, M. Kerber, M. Kohlhase, K. Konrad, A. Meier, E. Melis, W. Schaarschmidt, J. Siekmann, and V. Sorge. OMEGA: Towards a mathematical assistant. In W. McCune, editor, Proc. 14th International Conference on Automated Deduction, pages 252-255. Springer, 1997.
[3] A. Bundy. The use of explicit plans to guide inductive proofs. In E. Lusk and R. Overbeek, editors, Proc. 9th International Conference on Automated Deduction, pages 111-120, 1988.
[4] R. Catrambone. Improving examples to improve transfer to novel problems. Memory & Cognition, 22(5):606-615, 1994.
[5] R. Catrambone. Generalizing solution procedures learned from examples. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22(4):1020-1031, 1996.
[6] B.I. Dahn and A. Wolf. Natural language presentation and combination of automatically generated proofs. In Proc. FroCoS'96, pages 175-192. Kluwer, 1996.
[7] X. Huang and A. Fiedler. Proof verbalization as an application of NLG. In Proc. of the 15th International Joint Conference on Artificial Intelligence, pages 965-970. Morgan Kaufmann, 1997.
[8] U. Leron. Structuring mathematical proofs. The American Mathematical Monthly, 90:174-185, 1983.
[9] H. Lowe and D. Duncan. XBarnacle: Making theorem provers more accessible. In W. McCune, editor, Proc. 14th International Conference on Automated Deduction, pages 404-408, 1997.
[10] W.W. McCune. OTTER 2.0 users guide. Technical Report ANL-90/9, Argonne National Laboratory, Maths and CS Division, Argonne, Illinois, 1990.
[11] E. Melis. AI-techniques in proof planning. In European Conference on Artificial Intelligence (ECAI-98), pages 494-498, Brighton, 1998. Kluwer.
[12] D. Scott. Symbolic computation and teaching. In J. Calmet, J.A. Campbell, and J. Pfalzgraf, editors, Artificial Intelligence and Symbolic Mathematical Computation, AISMC-3. Springer, 1996.
[13] J.H. Siekmann, S.M. Hess, C. Benzmuller, L. Cheikhrouhou, D. Fehrer, A. Fiedler, M. Kohlhase, K. Konrad, E. Melis, A. Meier, and V. Sorge. LOUI: A distributed graphical user interface for the interactive proof system OMEGA. In Proc. of the International Workshop on User Interfaces for Theorem Provers, 1998.
A DIAGNOSIS BASED ON A QUALITATIVE MODEL OF COMPETENCE
Stephanie JEAN (1), Elisabeth DELOZANNE (1), Pierre JACOBONI (1), Brigitte GRUGEON (2)
(1) LIUM - Universite du Maine, 72 085 LE MANS Cedex 9 (France)
(2) DIDIREM - Paris VII, 2 place Jussieu, 75 251 PARIS Cedex 05 (France)
Abstract: The main focus of the PEPITE project is how to build a system that helps teachers to assess students in elementary algebra. We describe here prototypes already tested in actual classrooms and we share our experience of designing practically useful classroom tools in a participatory way. The methodology adopted is a combination of work in mathematics education and user-centred design derived from Human-Computer Interaction research. Starting from a multidimensional model of competence in algebra that includes quantitative and qualitative descriptions, we first show how it is applied to analyse both paper-and-pencil tasks and computerised tasks, and secondly how it is applied to analyse the students' productions when performing those tasks. Then we present the prototypes developed to automatically build students' profiles. Finally we discuss the validation process of such an assessment system and our research results.
Keywords: Interface design, evaluation of instructional systems, assessment of students' competence, elementary algebra.
1. Introduction
The aim of the PEPITE project is to develop a tool to help teachers in assessing students' competence in elementary algebra. Fifteen-year-old students enter French general high schools coming from French colleges or vocational schools. Most of them encounter strong difficulties, and the educational system fails to help them overcome those difficulties. As we started this research, our aim was to understand the reasons for such dysfunction, to identify the conditions necessary for a positive evolution, and to create appropriate learning situations likely to help the evolution of students' knowledge. The idea is to seek out, in the student's way of functioning, the nuggets of knowledge (in French, "pepites") to use as a basis on which to build new knowledge. One result of this study is a set of tools enabling teachers to interpret students' productions in order to find starting points from which to modify students' knowledge. The PEPITE project is the first part of a larger project (not presented here) which aims to assist teachers in choosing activities for students or groups of students corresponding to the starting points highlighted by PEPITE. As pointed out by Conlon and Pain [3], applied AIED needs "a research methodology that gives a central place to collaboration among teachers, researchers and technologists". HCI research proposes such methods (user-centred design, participatory design, usability engineering) [13] [11]. These methods suggest that users (actual students and teachers) must participate in the design process from the very beginning. We present here our experience with such a multidisciplinary approach and discuss the validation process of an assessment system.
In such an approach, the focus is on how to collect relevant and reliable data with a computer in order to make sense of students' behaviour according to teachers' needs. In this paper we assume that teachers' needs are expressed in a model of competence derived from the educational research presented in section 2. This model specifies the kind of results the diagnosis system has to produce (the students' profiles). Relevance of the collected data refers to the model. Reliability refers to the biases introduced by using a computer and thus to interface design problems. Making sense usually refers, in the AI community, to diagnosis techniques. It also refers to cognitive and epistemological assumptions about the nature of competence in the domain. So far in our work we have dealt more with the difficulty of clarifying the model of competence and with interface design problems than with diagnosis techniques. So we focus in this text on how to ensure the quality of incoming data, which is, in our opinion, of the utmost importance in relation to our objectives. We first present the educational basis of our work. We then introduce the research objectives of the PEPITE project and the general architecture of the system. We describe each prototype we have implemented and its validation. We point out that the difficulties in designing and implementing such software are not only a diagnosis problem, as is well known in the AIED community, but first of all an interface design problem. Finally, we discuss the methodology of validation of PEPITE and our research results.
2. Educational basis
We based our work on a qualitative model of competence in algebra. We begin by presenting what teachers want to know about students; we then present our theoretical framework about mathematical learning and our model of competence. This section ends with the presentation of the paper-and-pencil diagnosis tool we built.

2.1. What do teachers want to know?
Assessment systems are very often short-item tests consisting of questions that can each be answered in less than one minute. Such systems give a description of a student's state of knowledge in terms of rates of success/failure. A more popular approach in the AIED community bases assessment on student modelling [17]. In those systems the representation of the student's knowledge consists of a set of rules, each expressing some small aspect of the domain. This set includes rules for the most common misconceptions. A student model is a fine-grained report on the student's skills. For example, in OLAE [15], the student model reports the probability of mastery of around 290 rules. Teachers and mathematics education researchers in our project found the level of rule mastery inadequate for making decisions about elementary algebra teaching; it is not the only dimension of algebraic competence. Let us take an example. Figure 1 shows a student's solution to a classical problem. In terms of rules we could say that Karine uses famous incorrect rules:
x + a → xa
ax ± b → (a ± b)x
ax − x → a − 1
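These malrules can be encoded as term rewrites. The following sketch is ours (the tuple representation of expressions is invented for the example, and "xa" is read as the product a·x); it applies one of Karine's rules when its pattern matches.

def rewrite_malrule(expr):
    # Apply one malrule if its pattern matches, else return expr unchanged.
    match expr:
        case ("+", "x", int(a)):                 # x + a  -> xa
            return ("*", a, "x")
        case ("+", ("*", int(a), "x"), int(b)):  # ax + b -> (a + b)x
            return ("*", a + b, "x")
        case ("-", ("*", int(a), "x"), "x"):     # ax - x -> a - 1
            return a - 1
        case _:
            return expr

# Example: 3x + 8 is incorrectly reduced to 11x.
assert rewrite_malrule(("+", ("*", 3, "x"), 8)) == ("*", 11, "x")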
A prestidigitator is self-confident while carrying out the following trick. He says to a player: "Think of a number, add 8, multiply by 3, subtract 4, add the number you thought of, divide by 4, add 2 and subtract the number you first thought of: you have found 7. " Is this affirmation true? Justify your answer. Figure la: The prestidigitator problem
Figure 1b: Karine's paper-and-pencil answer to the prestidigitator problem
Teachers in the PEPITE project observe three points and then give an interpretation:
- Karine reduces algebraic expressions in order to obtain a result without an operator symbol in the right member of each equality. This difficulty is reported by Davis [4] as the process-product dilemma. Nonetheless, Karine's algebraic formulae retain meaning in relation to the problem and let her use incorrect rules but also correct ones: 3(x + 8) → 3×8 + 3x → 24 + 3x.
- Karine translates each sentence of the statement into one symbolic expression. Teachers interpret this translation as an algebraic strategy that is still very close to an arithmetic one.
- It is possible that knowing the result stirs Karine into using incorrect rules in order to obtain 7.
Karine has constructed malrules coherent with her conception of algebra as a formal tool to compute a result. In order to help her, it is not efficient enough to show her the right rules. Teachers have proposed to her problem situations involving algebra as a proving tool and emphasising the equivalence meaning of the equal sign. To adapt mathematical activity to a student's state of knowledge, teachers need more than a quantitative description of the student's behaviour. Thus we intended to define a qualitative description in order to help teachers choose adequate student activities.

2.2. Assumptions about mathematical learning
Making sense of a learner's behaviour is closely linked to a theoretical framework about mathematical learning. In this section we present the assumptions on which our research is founded. In order to analyse the dysfunction mentioned above, we felt it necessary to define a kind of reference for algebraic competence at this level. We made a synthesis of mathematical, epistemological, didactical and cognitive research works on algebra learning. According to Douady [5], mathematical concepts have two non-independent dimensions: a tool status and an object status. As far as the tool dimension is concerned, competence is expressed first in terms of the ability to build algebraic expressions and relationships in order to translate (for instance, a verbal description of a problem) and to interpret them. Secondly, it addresses the ability to choose adequate algebraic tools to solve problems. Different kinds of problems are involved with this tool dimension, such as translating problem situations into equations. As far as the object dimension is concerned, we take into account the duality of algebraic expressions when manipulating them formally: they are both semantic and syntactic objects. Competence is then expressed in terms of the status of algebraic objects, manipulative ability, and the articulation between their semantic and syntactic attributes linked with other semiotic frames (algebraic, numerical, graphical and geometrical frames and natural language). At this level, we need to consider that algebraic thinking requires a rupture with arithmetic thinking, as well as the ability to interpret algebraic expressions both at a procedural and at a structural level and to develop the necessary flexibility between the two kinds of interpretation [8] [12] [16].
Translation: "the solution is actually 7".
2.3. The multidimensional model of competence in algebra
Based on this theoretical framework, we observed students' behaviour in mathematical classroom activities over a long period (the whole school year) and linked those observations with an analysis of their exercise books from the previous year [6]. This study highlights that students' productions present coherences and regularities that correspond to their personal knowledge. From this study we kept four dimensions to obtain a qualitative description of students' algebraic behaviour (cf. figure 2). This model is used first to analyse the tasks on which students are supposed to learn algebra and second to analyse students' productions on those tasks.
From arithmetic to algebra:
- using the equal sign: announces a result / expresses a symmetric and transitive relation
- calculating with arithmetic numbers: correctly / incorrectly
- using letters: correctly (as unknown to write an equation, as variable to express a relationship or to prove a numerical property) / incorrectly (as generalised number to substitute numerical values, as unspecified symbol to manipulate formulae with incorrect rules, as label or shorthand for a concrete object) / never using

Manipulating algebraic formulae:
- good technical mastery
- weak technical mastery (e.g. not recognising remarkable identities)
- incorrect technique: bad use of brackets (leading to a good / bad result), using identified malrules, sign errors while transforming, confusing + and ×

Translating from one frame to another:
- correctly
- correctly but unexpected
- incorrectly (e.g. square of a sum → x² + y²)
- abbreviating

Justifying:
- using algebra: using legal rules / using formal rules
- arguing in natural language
- using a numerical example
- no explanation
Figure 2: Qualitative model of student's algebraic behaviour.

2.4. The paper-and-pencil diagnosis tool
Combining this multidimensional model with an ad hoc set of paper-and-pencil tasks, we designed a tool enabling teachers to interpret students' productions in order to establish their profiles. The set of tasks was carefully chosen by researchers and teachers to cover each dimension of the model. Three types of tasks are proposed to students during a test. Technical exercises aim to determine the level of mastery of formal manipulations. Recognition exercises aim to determine how students identify and interpret algebraic expressions. Modelling exercises aim to identify whether students use the expected algebraic type of treatment, how they translate problems into the algebraic frame and how they use adapted tools to solve problems. Matching students' answers to the model provides a diagnosis matrix of values (40×60) linking questions and dimensions of analysis. This very fine description of behaviour is too detailed to be used by teachers. It is necessary to establish a higher-level description: students' cognitive profiles. These profiles have three levels of description: a quantitative description of algebraic skills in terms of success rates for each type of task, a description of flexibility between frames (between the algebraic frame and other frames) represented by a diagram, and a qualitative description of functioning coherence. This paper-and-pencil diagnosis tool has been tested several times. In particular, it was tested in June 1996 on 600 students (21 classes and 7 teachers) of a third form class. This experiment pointed out that it was difficult and tedious for teachers to fill in the diagnosis matrix for all their students because encoding the students' productions is a
very difficult diagnosis task that requires substantial didactical expertise. Moreover, when several teachers encode the same students' tests, the diagnosis matrices may be slightly different, but the cognitive profiles are in the end identical. This seems to indicate the soundness of the diagnosis tool with respect to teachers' expertise and their acceptance of the description of algebraic competence. Furthermore, the teachers involved in the experiment were enthusiastic about our project to computerise the tedious part of the diagnosis. Finally, the students' paper answers coming from this experiment were used as a corpus for the conception of the PEPITE project described in the next section.
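The step from the diagnosis matrix to the quantitative level of a profile can be pictured as a simple aggregation. This sketch is hypothetical (the real matrix is 40×60 and the coding scheme is richer): rows stand for coded answers, and a function assigns each question its task type (technical, recognition or modelling).

def success_rates(diagnosis_matrix, task_type_of):
    # Quantitative level of the profile: success rate per task type.
    totals, correct = {}, {}
    for q, row in enumerate(diagnosis_matrix):
        t = task_type_of(q)
        totals[t] = totals.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (row.get("result") == "correct")
    return {t: correct[t] / totals[t] for t in totals}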
3. The PEPITE project
The PEPITE project intends to demonstrate that it is possible to collect with a computer data on students' competence from which experts can build students' profiles, that it is possible to automate this diagnosis, and that it is possible for teachers to use these automatically built profiles to make decisions in their classrooms. Thus, the PEPITE software contains three modules: PEPITEST collects students' answers to problems adapted from the paper-and-pencil tasks; PEPIDIAG automatically fills in the diagnosis matrix from the data collected by PEPITEST; PEPIPROFIL computes the students' profiles from the diagnosis matrix and presents them to the users (teachers or researchers).
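The three-module architecture amounts to a pipeline. The sketch below (ours, with placeholder bodies and invented helper names) only fixes the data flow between the modules.

def pepitest(problems, answer_fn):
    # PEPITEST: present problems and gather the student's answers.
    return [(p, answer_fn(p)) for p in problems]

def pepidiag(answers, code_fn):
    # PEPIDIAG: code each answer along the model's dimensions
    # (one row of the diagnosis matrix per answer).
    return [code_fn(p, a) for p, a in answers]

def pepiprofil(matrix):
    # PEPIPROFIL: aggregate the matrix into a three-level profile.
    return {"success_rates": ..., "flexibility": ..., "coherence": ...}

def pepite(problems, answer_fn, code_fn):
    return pepiprofil(pepidiag(pepitest(problems, answer_fn), code_fn))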
3.1. PEPITEST
PEPITEST is the student interface: it presents problems and gathers students' answers. In designing PEPITEST, we first paid close attention to usability. This is crucial in an assessment environment, where the collected data must be interpretable as indicators of competence and must not be biased by interface manipulation problems. Ease of learning and short learning times are paramount because students take the test only once. Iterative design is strongly recommended throughout the HCI literature for ensuring interface usability [10] [13]. In [7] we discussed means of evaluating usability: ergonomic criteria, guidelines, expert walkthrough and pilot tests with users [2]. Secondly, we had to create PEPITEST tasks as close as possible to the paper-and-pencil tasks, in order to obtain answers equivalent to the paper-and-pencil ones. Equivalent means that an expert, or PEPIDIAG, could interpret them to fill the diagnosis matrix. Note that the multidimensional model of competence is used both to diagnose students' productions and to evaluate the PEPITE tasks. Transferring pencil-and-paper exercises and tools to a computational environment is not straightforward: it changes the tasks and has consequences for students' productions. Balacheff [1] calls this computational transposition. The main problem in PEPITEST is that writing an algebraic expression with a pen is very different from typing it on a keyboard. From the student's point of view, without a specific editor, they have to translate a spatial representation of the expression (e.g. a fraction) into a linear one. From the assessment point of view, this translation introduces a difficulty that can disturb the diagnosis, acting as a distorting mirror (introducing bias), or that can make normally invisible indicators visible, acting as a cognitive microscope [14]. We could provide an algebraic expression editor, but such an editor is not yet easy for students to use, and no PEPITEST version presented here integrates one. Thirdly, we bore in mind the difficulties of interpreting students' open answers. We could have used a form-based user interface allowing students to express their approach without using natural language or typing algebraic formulae. But it is necessary to let students express themselves without constraining their answers, in order to capture, for instance,
their kind of justification or the way they write algebraic formulae. We therefore limited open questions, but not too much, in order to preserve the completeness of the test [7]. Presently PEPITEST runs with 22 problems comprising 32 closed questions, 26 answers requiring algebraic expressions and 31 answers using both algebraic expressions and natural language. As a formative evaluation we first set up a pilot test in October 1996 with 25 students in a high-school classroom. As far as usability was concerned, some minor changes to the test became evident: for instance, basic manipulations (such as carriage return, drag and drop, etc.) had to be taught to some students. Nonetheless, students found PEPITEST 1 easy to use. As designers, we were pleased that, in spite of the difficulties of writing algebraic expressions, students produced such expressions and, moreover, that educational researchers succeeded in interpreting them. PEPITEST 2 was tested in June 1997 with 43 students in two classes in order to validate PEPITEST as a data collector for diagnosis. The educational researchers in our team were enthusiastic: they had suspected that PEPITEST would reduce the range of students' productions, yet for each question we found every kind of expected answer proposed in our model of competence in algebra. This demonstrates PEPITEST's completeness with respect to the model of competence. Regarding algebraic formulae, students had difficulties producing them, as we expected; but those difficulties did not prevent them from answering with algebraic formulae. According to the teachers, only one student out of 43 seemed to modify her answers. This shows that the expression editor would be welcome and useful but may be temporarily bypassed. Finally, the educational researchers could fill the diagnosis matrix from students' answers to the PEPITEST problems, and the class teacher could confirm the profiles thus obtained. This establishes the validity of PEPITEST with respect to the paper-and-pencil diagnosis tool.
3.2. PEPIDIAG
PEPIDIAG is the diagnosis module: it analyses answers to PEPITEST and fills the diagnosis matrix. Closed questions are easy to analyse because we designed the interface so that each choice matches expected skills in the competence model. Exercises that require entering algebraic formulae are more difficult to deal with: before linking them to the model's skills, it is necessary to transform students' productions in order to normalise them (commutativity, associativity, etc.). In the remaining exercises, in addition to the well-known difficulties of processing natural-language answers, we face a segmentation problem when algebraic formulae occur mixed with natural language. For this module we presently have two main results. Firstly, PEPIDIAG is able to automatically analyse every closed answer and every simple algebraic expression answer, which covers 75 percent of students' answers to the PEPITEST problems. Secondly, we ran PEPIDIAG on every student production in our corpus: the system fills the diagnosis matrices. In order to correlate this partial diagnosis with human assessment, we chose 5 students with different levels of competence and asked an expert to fill the diagnosis matrix manually. PEPIDIAG and the human assessor were in agreement. This means that we can already partially automate the diagnosis, but analysing the remaining 25 percent of answers is still worthwhile in order to obtain complete profiles.
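The normalisation step can be illustrated with a small sketch. The example below uses sympy's canonical forms to absorb commutativity and associativity when comparing a typed answer with an expected expression; it illustrates the idea rather than the project's actual analyser, and the helper name is invented.

    import sympy as sp

    x, y = sp.symbols("x y")

    def matches(student_text, expected):
        """True if the typed expression is algebraically equal to the expected one."""
        student = sp.sympify(student_text)
        return sp.simplify(student - expected) == 0

    print(matches("y*x + x*x", x**2 + x*y))    # True: term and factor order are absorbed
    print(matches("x**2 + y**2", (x + y)**2))  # False: the square-of-sum malrule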
3.3. PEPIPROFIL
PEPIPROFIL is the teacher interface: it computes students' profiles and presents them to the teacher. As explained in section 2.4, a student's profile has three levels of description: success
rates, flexibility between frames, and functioning coherence. These result from algorithmic processes that merge similar answers and apply thresholds. For the computational part of PEPIPROFIL we already have two results. With a manually filled matrix, PEPIPROFIL computes the same profile as the teacher does. And from the partial matrix already filled by the system, PEPIPROFIL builds partial profiles that are confirmed by the teachers. We still have to test PEPIPROFIL with teachers, in particular its presentation component.
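As an illustration of the "merge similar answers, then apply thresholds" step, the sketch below turns success rates into qualitative labels drawn from the competence model of Figure 2; the threshold values are invented for the example and are not the system's actual parameters.

    def mastery_label(success_rate):
        """Map a success rate to a qualitative label from the competence model."""
        if success_rate >= 0.75:            # thresholds are illustrative only
            return "good technical mastery"
        if success_rate >= 0.40:
            return "weak technical mastery"
        return "incorrect technique"

    rates = {"technical": 0.81, "recognition": 0.52, "modelling": 0.30}
    print({task: mastery_label(r) for task, r in rates.items()})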
4. Discussion and conclusion
Integrating real teachers and educational researchers in the design team made us focus on students' mathematical activities through early field studies. Our approach is very close to the PCM methodology for applied AIED proposed by Conlon and Pain [3], where the development cycle is "driven by practice but informed by, and contributing to, theory". This design process requires an early examination of criteria and evaluation methods. The Human-Computer Interaction literature recommends that evaluation be considered a state of mind that must express itself throughout the design of a system. Our validation of PEPITEST consists of verifying that we obtain equivalent answers in the paper-and-pencil test and with PEPITEST, and that the data obtained from the software allow experts to build profiles equivalent to the paper-and-pencil ones. We evaluate PEPIDIAG and PEPIPROFIL by comparing how well the automatic profiles fit those of human assessors; this is called horse-race evaluation in [18] and criterion-related validity in [15]. With regard to student modelling, referring to the model proposed by Balacheff [1], PEPITEST provides a set of data that constitutes a behavioural model. PEPIDIAG interprets these data to fill the diagnosis matrix, which is the procedural part of the epistemic model: indeed, PEPIDIAG interprets students' productions by correlating them with the algebraic skills described in the model of competence. The profile computed by PEPIPROFIL is a conceptual model. Figure 3 shows the matching between PEPITE and Balacheff's model. Our validation of PEPITEST corresponds to a behavioural morphism between the behavioural models obtained on paper and on the machine. Agreement between human assessors and PEPITE demonstrates an "epistemological morphism between the epistemic model and the students' conception as elaborated by research, that is a mapping which preserves the epistemological structures" [1].
[Figure 3 appears here. It maps PEPITEST (students' answers) to the set of data / behavioural model, PEPIDIAG (diagnosis matrix) to the interpretation of data / epistemic model, and PEPIPROFIL (profile) to the transversal analysis / epistemic model.]
Figure 3: PEPITE diagnosis tool and Balacheff's model.
We intend to develop a tool to help teachers assess students' competence in elementary algebra. This tool is not yet complete, but we have demonstrated its feasibility. Firstly, until the algebraic expression editor is integrated, it is not easy for students to enter such expressions. Secondly, until we make progress in interpreting natural-language answers, the diagnosis will remain partial. Thirdly, until we have carried out large-scale experiments with
teachers, we cannot be quite sure that the system is acceptable to teachers. In spite of these present weaknesses, we have made substantial progress toward our objectives and have implemented prototypes that already give results teachers can use. Teachers already recognise their students in the partial profiles produced by PEPITE. Two reasons explain this: the diagram is very informative and, except for the use of letters, the diagnosis gives information for each dimension. Even though important indicators are still missing, the partial profiles already offer a good overall view of students' competence. In spite of its still simple diagnosis module, PEPITE already performs well thanks to the quality of the data gathered by its interface. This allows us to develop the diagnosis module incrementally, working with the corpus obtained from real students' interactions with PEPITEST. In our opinion, this first success is due to the involvement of teachers and educational researchers from the very beginning of the project.
5. References
[1] N. Balacheff, Advanced Educational Technology: Knowledge Revisited, in Liao ed., proceedings of the NATO ASI Series F, Berlin, Springer Verlag, 1994.
[2] C. Bastien, D. Scapin, Ergonomic Criteria for the Evaluation of HCI, INRIA (RT 156), 1993.
[3] T. Conlon, H. Pain, Persistent Collaboration: A Methodology for Applied AIED, IJAIED (7-3/4), 219-259, 1996.
[4] R. B. Davis, Cognitive Processes Involved in Solving Simple Algebraic Equations, Journal of Children's Mathematical Behavior (1-3), 7-35, 1975.
[5] R. Douady, The Interplay between Different Settings: Tool-Object Dialectic in the Extension of Mathematical Ability: Examples from Elementary School Teaching, in Streefland ed., proceedings of the ninth International Conference for the Psychology of Mathematics Education (2), 33-53, 1985.
[6] B. Grugeon, Design and Use of a Multidimensional Structure of Analysis in Elementary Algebra, RDM (17-2), 167-210, 1997. (In French)
[7] S. Jean, E. Delozanne, P. Jacoboni, B. Grugeon, Cognitive Profile in Elementary Algebra: the PEPITE Test Interface, IFIP-TC3 Official Journal "Education and Information Technology" (3), Chapman & Hall Ltd, 1-15, 1998.
[8] C. Kieran, The Learning and Teaching of School Algebra, Handbook of Research on Mathematics Teaching and Learning, Douglas Grouws ed., Macmillan Publishing Company, 1992.
[9] K. R. Koedinger, J. R. Anderson, Intelligent Tutoring Goes to School in the Big City, IJAIED (8), 30-43, 1997.
[10] W. Mackay, A.-L. Fayard, HCI, Natural Science and Design: a Framework for Triangulation across Disciplines, Designing Interactive Systems, Amsterdam, Holland, 1997.
[11] J. Preece, Y. Rogers, H. Sharp, D. Benyon, S. Holland, T. Carey, Human-Computer Interaction, Addison-Wesley, 1994.
[12] A. Sfard, On the Dual Nature of Mathematical Conceptions: Reflections on Processes and Objects as Different Sides of the Same Coin, Educational Studies in Mathematics (22), 1-36, 1991.
[13] B. Shneiderman, Designing the User Interface, Addison-Wesley, 1992.
[14] M. Twidale, Redressing the Balance: the Advantages of Informal Evaluation Techniques for Intelligent Learning Environments, IJAIED (4-2/3), 155-178, 1993.
[15] K. VanLehn, J. Martin, Evaluation of an Assessment System Based on Bayesian Student Modeling, IJAIED (8-3/4), 179-221, 1997.
[16] G. Vergnaud, Conceptual Fields, Problem-Solving and Intelligent Computer Tools, in De Corte, Linn, Mandl, Verschaffel eds., Computer-Based Learning Environments and Problem-Solving, NATO ASI Series F.40, 287-308, Berlin, Springer-Verlag, 1992.
[17] E. Wenger, Artificial Intelligence and Tutoring Systems: Computational and Cognitive Approaches to the Communication of Knowledge, Morgan Kaufmann Publishers, Los Altos, CA, 1987.
[18] P.H. Winne, A Landscape of Issues in Evaluating Adaptive Learning Systems, IJAIED (4-4), 309-332, 1993.
Support for Medical Education
Artificial Intelligence in Education S.P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
Expertise Differences in Radiology: Extending the RadTutor to Foster Medical Students' Diagnostic Skills
Roger Azevedo¹, Sonia L. Faremo²
¹ Carnegie Mellon University, Department of Psychology, Pittsburgh, PA 15213, USA, [email protected]
² McGill University, Applied Cognitive Science Research Group, Montreal, Quebec, Canada H3A 1Y2, [email protected]
This paper presents a re-analysis of a subset of Azevedo's (1997) and Faremo's (1997) original data on the problem solving strategies used by medical professionals (medical students, surgical residents, radiology residents, and staff radiologists) with varying levels of training in the areas of breast disease and mammogram interpretation. The results of our re-analysis indicated that novices (i.e., medical students) tended to have difficulty identifying mammographic observations and findings, provided fewer accurate diagnoses, used mainly hypothetico-deductive problem solving, committed more multiple errors, often made requests for additional medical information, seldom engaged in diagnostic planning, typically constructed multiple goals when solving breast disease cases, and made extensive use of data exploration, data acquisition, data examination, data classification, and hypothesis generation problem solving operators. Subsequently, a description of how the results will be used to extend the RadTutor's framework to accommodate the training of novices is provided. In the new version of the RadTutor, novices will engage in several activities which are aimed at enhancing their training, including understanding the diagnostic process, identifying observations and findings, integrating multiple knowledge representations, acquiring variability by extending multiple representations, and solving breast disease cases within a problem solving context.
1. Introduction
Cognitive research in the area of diagnostic radiology [1, 2, 3, 4] is still in its infancy compared to the corpus of research in other, non-visual medical domains. This research has begun to provide evidence of novice-expert differences, a characterization of the diagnostic process, an understanding of the reasoning strategies used during diagnostic problem solving, an understanding of how certain factors (e.g., case typicality) influence diagnostic behavior, an understanding of several performance measures, information about the role of perceptual and cognitive processes in diagnosis, and rich empirical data for the design of computer-based training systems. As such, several cognitive psychologists and AI researchers have begun to develop computer-based environments to train radiology professionals. The theoretical and empirical foundations of these systems lie in extensive empirical evidence in medical cognition, results from a few studies of radiology expertise, and diverse learning paradigms and instructional models [5, 6, 7, 8, 9, 10, 11, 12]. These and other systems for radiology training have focused specifically on the training of more experienced radiology professionals such as residents and staff radiologists. One of the goals of our project is to extend the RadTutor's existing framework to train novices (i.e., medical students) to interpret mammograms. This paper presents a re-analysis of a subset of Azevedo's [1] and Faremo's [2] original data on the problem solving strategies used by medical professionals (i.e., medical students, surgical residents, radiology residents, and staff radiologists) with varying levels of training in the areas of breast disease and mammogram interpretation. This paper has two purposes: (1) to present the results of our re-analysis of the novice-intermediate-subexpert-expert differences, and (2) to illustrate how the
results will be used in extending the RadTutor's framework to accommodate the training of novices (i.e., medical students) to interpret mammograms.
2. Method
Study 1 [1] examined the cognitive processing of more experienced medical personnel, namely staff radiologists and radiology residents. Study 2 [2] was a parallel study of two less-experienced groups, medical students and surgical residents. The studies were conducted at several teaching hospitals affiliated with McGill University, with the assistance of the same two medical experts (one surgeon specializing in breast disease and one senior radiologist). They are also parallel in terms of the design of the experimental procedures, the selection and diagnosis of cases, the development of a model of breast disease diagnosis, and the analyses of the verbal protocols.
2.1 Participants
A total of 20 participants (5 staff radiologists and 5 radiology residents from Azevedo's [1] study, and 5 surgical residents and 5 medical students from Faremo's [2] study) from several large metropolitan university teaching hospitals provided the verbal protocol data used in this re-analysis. The radiologists had MD degrees, Board Certification, affiliation with a teaching hospital, an average of 24 years of post-residency training and 18 years of mammography training, and estimated having analyzed an average of 42,000 mammograms during their careers. The radiology residents had MD degrees, were on rotation at a teaching hospital, had completed one mammography rotation, and had an average of 5 months of mammography training. The surgical residents had completed their undergraduate medical training, had an MD degree and several years of surgical training, had an average of 16 hours of classroom training on breast disease and 19 hours in a breast clinic, had seen an average of 40 breast cases, and had diagnosed an average of 21 breast disease cases. Lastly, the medical students had very limited knowledge of breast disease, an average of X hours of classroom training, an average of 5 hours in a breast clinic, had seen an average of 4 breast cases, and had never attempted to diagnose a breast disease case.
2.2 Breast Disease Cases
Ten breast disease cases were used in each of the original studies. Cases were selected by both medical experts from their teaching files. Each case comprised a brief clinical history and at least four mammograms. Different cases were employed in the two studies; however, this re-analysis is based on a subset of the cases and participants used in the two studies. In order to conduct a meaningful re-analysis, it was necessary to match the cases drawn from one study with those from the other. This matching was done by the two researchers and the two medical experts, who carefully selected the cases based on level of difficulty, finding type and mammographic appearance, and disease typicality.
2.3 Procedure
The following description of the experimental procedure refers to both studies [1, 2]. Participants were tested individually. The experimenter provided the participant with a 1-page handout of instructions for the diagnostic task, and then presented each participant with a practice case. Each case was presented in a manila envelope containing a type-written clinical history and a set of mammograms.
For each case, the experimental procedure involved having the participant: (1) read the clinical history, (2) display the mammogram set on a view-box, (3) point to the mammographic findings and/or observations, (4) provide a diagnosis (or a set of differential diagnoses), and (5) discuss subsequent investigations (if necessary). The participant was instructed to "think out loud" [13] throughout the entire diagnostic process. The experimental procedure was repeated for each subject. No time constraints were imposed.
2.4 Analysis of the Verbal Protocols
The goal of analyzing the verbal protocol data was to identify the problem solving strategies, operators, and control processes used by participants during mammogram interpretation, and to extract several other performance measures (e.g., number of findings). The coding scheme developed by Azevedo [1] was used in this re-analysis of the 100 protocols (i.e., 20 participants each solving five cases). It was based on a content analysis of breast disease and mammography, and on theoretical and methodological articles in cognitive science and medical cognition. The coding scheme consisted of three major categories: knowledge states, problem solving operators, and control processes [14, 15]. Knowledge states in this domain were coded as radiological observations (information recognized as potentially relevant but not clinically useful), radiological findings (critical cues with clinical significance), and diagnoses (at different levels of abstraction). Problem solving operators were divided into 11 classes (e.g., hypothesis generation) comprising a total of 30 operators (see [1]). Control processes comprised diagnostic planning, goal verbalizations, and meta-reasoning. The verbal protocol data were transcribed and segmented according to standard conventions. The coding scheme was applied to each of the 100 segmented protocols in order to identify the knowledge states, problem solving operators, control processes, problem solving strategies, and domain-dependent and domain-independent semantic relations. The numbers of radiological findings, observations, and diagnoses were also tabulated based on a thorough examination of each protocol. Lastly, each protocol was categorized based on its diagnostic accuracy, the type(s) of error(s) committed, and whether any requests for additional medical information were made. Inter-rater reliability was established by recruiting two graduate students with experience in the area of breast disease and mammography and training them to use the coding scheme. They independently coded 50 randomly selected protocols, yielding a reliability coefficient of .91.
3. Results
Inferential analyses were conducted to verify whether there were any significant differences in the mean number of radiological findings, observations, and diagnoses across levels of expertise. In addition, non-parametric statistical analyses were conducted on the frequencies of ratings of diagnostic accuracy, reasoning strategies, error types, requests for additional medical information, and control processes by level of expertise.
3.1 Number of Radiological Findings, Observations, and Diagnoses
Three one-way ANOVAs were performed on the mean number of radiological findings, observations and diagnoses across the four levels of expertise. The analyses revealed significant differences between the groups in the mean number of radiological findings (F[3, 16] = 6.81, p < .05) and observations (F[3, 16] = 9.98, p < .05). The third analysis failed to reveal a significant difference in the mean number of diagnoses between the groups (F[3, 16] = 2.54, p > .05). On average, radiology residents and staff radiologists identified approximately three observations per case while medical students and surgical residents failed to identify any. As for radiological findings, medical students failed to identify any, but the other three groups identified at least one finding. All participants tended to provide approximately one diagnosis per case.
The means and standard deviations for radiological findings, observations and diagnoses by level of expertise are presented in Table 1.
3.2 Diagnostic Accuracy
The final diagnosis provided in each case was categorized as correct, indeterminate (e.g., a diagnosis of benign in a case where the characteristics of the mass indicate that it might be malignant) or wrong. A 3×4 Chi-square analysis revealed a significant difference in the distribution of the number of cases across levels of expertise and diagnostic accuracy (Chi-square [6, N = 100] = 43.4, p < .05) (see Table 2). Overall, radiology professionals provided significantly more correct diagnoses (76% and 80%) than non-radiology professionals (12% and 44%). In contrast, non-radiology professionals
provided significantly more incorrect diagnoses (20% and 72%) than radiology professionals (12% and 24%). Non-radiology professionals also provided disproportionately more indeterminate diagnoses (16% and 36%) than the radiology professionals (0% and 8%).

Table 1. Mean Radiological Observations, Findings, and Diagnoses by Level of Expertise

Performance Measure         Medical Students  Surgical Residents  Radiology Residents  Staff Radiologists
                            Mean (SD)         Mean (SD)           Mean (SD)            Mean (SD)
Radiological Observations   0.16 (0.2)        0.44 (0.3)          2.56 (1.1)           3.32 (1.9)
Radiological Findings       0.60 (0.2)        1.36 (0.4)          1.04 (0.3)           1.16 (0.3)
Diagnoses                   0.92 (0.4)        0.92 (0.2)          1.36 (0.4)           1.12 (0.1)

Table 2. Frequencies and Percentages of Diagnostic Accuracy Ratings, Reasoning Strategy, Error Types and Requests for Additional Medical Information by Level of Expertise

                                  Medical    Surgical   Radiology  Staff
                                  Students   Residents  Residents  Radiologists
                                  N (%)      N (%)      N (%)      N (%)
Diagnostic Accuracy
  Correct Diagnosis               3 (12)     11 (44)    19 (76)    20 (80)
  Indeterminate Diagnosis         4 (16)     9 (36)     0 (0)      2 (8)
  Wrong Diagnosis                 18 (72)    5 (20)     6 (24)     3 (12)
Reasoning Strategy
  Hypothetico-Deductive           17 (68)    8 (32)     0 (0)      0 (0)
  Data-Driven                     8 (32)     17 (68)    20 (80)    23 (92)
  Mixed                           0 (0)      0 (0)      5 (20)     2 (8)
Error Types
  Perceptual Detection            0 (0)      0 (0)      5 (83)     3 (60)
  Wrong Recommendation            0 (0)      0 (0)      1 (17)     2 (40)
  Multiple Errors                 22 (100)   13 (100)   0 (0)      0 (0)
Request for Additional Medical Information
  No                              14 (56)    24 (96)    25 (100)   25 (100)
  Yes                             11 (44)    1 (4)      0 (0)      0 (0)
3.3 Reasoning Strategy
Each of the 100 protocols was categorized as hypothetico-deductive (a form of backward reasoning involving hypothesis generation, information search, data interpretation and hypothesis evaluation), data-driven (forward reasoning whereby one proceeds from reading the clinical history to specifying subsequent examinations), or mixed-strategy (a combination of data-driven and goal-driven reasoning strategies). A 3×4 Chi-square analysis revealed a significant difference in the distribution of strategies used across levels of expertise (Chi-square [6, N = 100] = 48.1, p < .05) (see Table 2). Overall, the medical students diagnosed the breast disease cases using mainly hypothetico-deductive reasoning (68%) but sometimes used a data-driven strategy (32%). In contrast, the surgical residents mainly used data-driven reasoning (68%) and rarely used hypothetico-deductive reasoning (32%). The two more-experienced groups both tended to use the data-driven strategy (80% and 92%) and sometimes used a mixed strategy (20% and 8%).
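This analysis can be reproduced directly from the Table 2 counts. The sketch below uses scipy (an assumption on our part, not the authors' tooling) and recovers a statistic close to the published chi-square of 48.1; any small discrepancy is attributable to rounding in the table.

    from scipy.stats import chi2_contingency

    # Rows: medical students, surgical residents, radiology residents, staff radiologists.
    # Columns: hypothetico-deductive, data-driven, mixed (counts from Table 2).
    counts = [[17,  8, 0],
              [ 8, 17, 0],
              [ 0, 20, 5],
              [ 0, 23, 2]]
    chi2, p, dof, _ = chi2_contingency(counts)
    print(f"chi-square({dof}) = {chi2:.1f}, p = {p:.2g}")  # ~ chi-square(6) = 48.5, p < .05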
3.4 Error Types
An in-depth analysis of the 46 errors committed by the participants revealed three major types: (1) perceptual detection errors, (2) wrong recommendation errors (an inappropriate subsequent investigation is proposed), and (3) multiple errors (a combination of perceptual detection, finding mischaracterization, no diagnosis, wrong diagnosis or wrong recommendation). A 3×4 Chi-square analysis revealed a significant difference in the distribution of error types across levels of expertise (Chi-square [3, N = 46] = 34, p < .05) (see Table 2). Overall, the results indicate that medical students and surgical residents committed more errors (22 and 13, respectively) than the radiology residents and staff radiologists (6 and 5, respectively). Also, the two less-experienced groups committed multiple errors while the two more-experienced groups committed single errors only.
3.5 Requests for Additional Medical Information
The protocols were also analyzed in order to determine whether or not participants requested medical information beyond the clinical history and set of mammograms. A 2×4 Chi-square analysis revealed a significant difference in the distribution of requests for additional medical information across levels of expertise (Chi-square [3, N = 100] = 21.5, p < .05) (see Table 2). Overall, medical students requested additional medical information more frequently (on 44% of the cases) than the surgical residents, radiology residents and staff radiologists (on 4%, 0%, and 0% of the cases, respectively).
3.6 Frequency of Problem Solving Operator Use
The frequency of use of each of the 11 types of problem solving operators was calculated by level of expertise (see Table 3). The results revealed a predominant use of the following operators (listed in descending order of frequency): (a) data exploration (e.g., characterizing a mammographic cue), (b) data acquisition (e.g., inspecting a set of mammograms), (c) data examination (e.g., identifying a mammographic cue), (d) hypothesis generation, (e) hypothesis evaluation, and (f) data classification. These 6 operators account for 70% to 77% of all operators used by all participants, regardless of level of expertise. Table 3. Frequency of Operator Use by Level of Expertise.
Operators               Medical    Surgical   Radiology  Staff
                        Students   Residents  Residents  Radiologists
                        N (%)      N (%)      N (%)      N (%)
Data Acquisition        50 (19)    50 (18)    50 (15)    50 (18)
Data Identification     0 (0)      14 (4)     0 (0)      6 (2)
Data Assessment         0 (0)      0 (0)      5 (1)      5 (2)
Data Examination        44 (16)    48 (17)    52 (16)    46 (16)
Data Exploration        68 (25)    68 (24)    109 (33)   74 (26)
Data Comparison         15 (5)     18 (5)     11 (4)     28 (10)
Data Classification     24 (9)     37 (13)    20 (6)     19 (7)
Data Explanation        15 (5)     14 (4)     13 (5)     10 (4)
Hypothesis Generation   25 (9)     28 (10)    44 (13)    35 (12)
Hypothesis Evaluation   27 (10)    23 (8)     5 (1)      5 (2)
Summarization           0 (0)      4 (1)      4 (1)      4 (1)
3.7 Frequency of Control Processes Used
Each protocol was analyzed for the frequency of control process use. The 100 protocols revealed that all participants, regardless of level of expertise, used two main control processes: diagnostic plans and
goals. A 2×4 Chi-square analysis revealed a significant difference in the distribution of control processes used across levels of expertise (Chi-square [3, N = 138] = 29.1, p < .05) (see Table 4). Overall, more-experienced participants tended to use more diagnostic plans (87% and 96% versus 28% and 75%). In contrast, less-experienced participants tended to use more goals (72% and 25% versus 13% and 4%). Table 4. Frequency of Control Processes Used by Level of Expertise.
Control Processes   Medical    Surgical   Radiology  Staff
                    Students   Residents  Residents  Radiologists
                    N (%)      N (%)      N (%)      N (%)
Diagnostic Plans    8 (28)     30 (75)    39 (87)    23 (96)
Goals               21 (72)    10 (25)    6 (13)     1 (4)
3.8 Summary
This re-analysis has revealed many ways in which radiology professionals were more effective and accurate than novices. The discussion to follow illustrates how the results of this re-analysis have been used in the design of an instructional component for the RadTutor, a computer-based instructional system for training medical personnel to diagnose breast disease from mammograms.
4. Discussion
4.1 The RadTutor
The RadTutor [5, 6] is a computer-based environment designed to train radiology residents to interpret mammograms. The system's conceptual framework is based on information processing (IP) theory and a cognitive theory of skill acquisition [14, 15]. Azevedo's [1] original results were used as the basis for the RadTutor: the content and task analyses, model of mammogram interpretation, problem solving operators and control processes, and error analyses from the original study formed the basis for the system's domain knowledge module, instructional sequencing, instructional scaffolding, and interface design. Its pedagogical strategies (modeling, scaffolding, coaching, and articulation) are based on analyses of staff-resident interaction during mammography rounds. The purpose of this discussion, however, is to illustrate how the results of our re-analysis will be used to make certain modifications to the RadTutor's existing framework in order to accommodate the training of medical students (i.e., novices). The RadTutor will be extended to provide an environment in which this population can practice encoding different forms of information, building domain knowledge, mapping new knowledge to previous knowledge, and applying what is learned within a problem solving context.
4.2 The Steps in Diagnosing Breast Disease in the RadTutor
Briefly, solving a RadTutor case begins as the tutor provides a clinical history and displays the corresponding mammograms in random order. The user can re-position the images in the mammogram display area. Subsequently, the user is prompted to circle and identify the mammographic observations, and then has the option either to select critical observations from a list of features or to go directly to the images and highlight them. The user must identify and label all observations and then do the same for findings. All findings must then be characterized. Finally, the user is asked to provide a set of differential diagnoses and subsequent examinations. At each phase, the tutor responds with one of its instructional scaffolding strategies (e.g., from simple confirmative feedback to more specific directives).
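The phase ordering and graduated feedback just described can be summarized schematically. The sketch below is our own abstraction of that loop, not the RadTutor source: the phase names follow the steps above, while the escalation policy and function names are illustrative.

    PHASES = ["read history", "arrange mammograms", "identify observations",
              "identify findings", "characterize findings",
              "differential diagnosis", "subsequent examinations"]

    SCAFFOLDS = ["confirmative feedback", "specific hint", "explicit directive"]

    def tutor_response(phase, correct, failures):
        """Escalate from confirmation toward directives as failures accumulate."""
        if correct:
            return f"{phase}: confirmed, advance to next phase"
        level = SCAFFOLDS[min(failures, len(SCAFFOLDS) - 1)]
        return f"{phase}: {level}"

    print(tutor_response("identify findings", False, 2))  # -> explicit directive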
4.3 Extending the RadTutor to Foster Diagnostic Skill Development in Novices
The preceding re-analysis demonstrated several areas in which novices had difficulties performing in this domain. Five instructional modules, presented below, are designed to assist novices in overcoming these problems. These instructional prescriptions are consistent with robust empirical findings in cognitive and perceptual psychology [16] and educational psychology [17].
1. Assessing and developing foundational knowledge. The nature of the errors found in the re-analysis, together with medical students' limited experience, suggests that developing basic knowledge of the domain (e.g., breast anatomy) should be a first priority. Accordingly, the RadTutor's first curriculum module will test novices' knowledge in these areas and provide an appropriate level of instruction based on the results.
2. Understanding the diagnostic process. The second module was designed in response to other errors, unnecessary requests for clinical information, the use of hypothesis-driven reasoning, and the lack of planning by novices. Together these suggest the importance of training novices to follow an appropriate diagnostic process systematically. Accordingly, module 2 will teach Azevedo's [1] 7-step diagnostic model. It is hypothesized that with explicit training on the diagnostic process, novices will adopt more systematic and expert-like diagnostic procedures.
3. Identifying and characterizing observations and findings. Novices demonstrated various difficulties with the critical tasks of identifying and characterizing observations and findings. Accordingly, module 3 will provide instruction and experience with this aspect of the task. A large selection of mammogram images with different types and manifestations of observations and findings will be accessible, and the tutor will scaffold users as they practice identifying, localizing, and labeling observations and findings.
4. Integrating multiple knowledge representations. Novices' difficulties with identifying and characterizing observations and findings are also the basis for module 4, which draws on instructional principles of knowledge acquisition [16, 17]. The tutor will support selective encoding and selective combination by focusing learner attention on relevant case information and connecting it with visual representations. The tutor will also provide examples for users to work with, scaffold them as they do so, and provide appropriate feedback.
5. Acquiring variability by extending multiple representations. The fifth module will present the novice with multiple single mammograms that depict findings belonging to particular categories (e.g., ill-defined masses). This aspect of training is critical because the novice needs to go beyond "prototypical" representations and understand that there is variability in mammographic findings. This training will extend the novice's visual knowledge base to include a better understanding of the variability in findings, and the strategy will be maintained for all categories of mammographic findings.
4.4 The New RadTutor
As described, the RadTutor currently supports users in solving breast disease cases. In the new version of the system, novice users will be able to work with the instructional modules described above and will also have access to mini-tutorial and help functions as they attempt to solve cases.
This part of the training will allow novices to apply what was learned through interaction with the instructional modules to solving real cases. Finally, we plan to integrate several "cognitive tools" [18] to further support the development of the novices' diagnostic skills.
5. Acknowledgments
This research is partially funded by a postdoctoral fellowship awarded to the first author by the Social Sciences and Humanities Research Council of Canada (SSHRC). The initial design and development of the RadTutor was funded by Dr. Fleiszer and McGill University's Medical Informatics Committee. The authors would like to thank Xiaoyan Zhao for programming the various versions of the RadTutor, and Susanne Lajoie, David Fleiszer, Monique Desaulniers, and the medical students, surgical residents, radiology residents and staff radiologists for their participation in the original studies.
6. Conclusion
In conclusion, this paper presented a re-analysis of a subset of Azevedo's [1] and Faremo's [2] original data on the problem solving strategies used by medical professionals with varying levels of training in the areas of breast disease and mammogram interpretation. The analysis revealed that medical students tended to have difficulty with many aspects of the diagnostic process. We then illustrated how we plan to use the results to extend the RadTutor's framework to accommodate the training of novices (i.e., medical students). In the new version of the RadTutor, novices will engage in several activities to develop their background knowledge and skills and to practice applying them. These activities include understanding the diagnostic process, identifying observations and findings, integrating multiple knowledge representations, acquiring variability by extending multiple representations, and solving breast disease cases within a problem solving context.
References
[1] Azevedo, R. (1997). Expert problem solving in mammogram interpretation: A visual cognitive task. Unpublished doctoral dissertation, McGill University, Montreal, Quebec, Canada.
[2] Faremo, S. (1997). Novice diagnostic reasoning in a visual medical domain: Implications for the design of a computer-based instructional system for undergraduate medical education. Unpublished master's thesis, Concordia University, Montreal, Quebec, Canada.
[3] Lesgold, A., Rubinson, H., Feltovich, P., Glaser, R., Klopfer, D., & Wang, Y. (1988). Expertise in a complex skill: Diagnosing X-ray pictures. In M. Chi, R. Glaser, & M.J. Farr (Eds.), The nature of expertise (pp. 311-342). Hillsdale, NJ: Erlbaum.
[4] Rogers, E. (1996). A study of visual reasoning in medical diagnosis. In G.W. Cottrell (Ed.), Proceedings of the Eighteenth Annual Conference of the Cognitive Science Society (pp. 213-218). Mahwah, NJ: Erlbaum.
[5] Azevedo, R., Lajoie, S.P., Desaulniers, M., Fleiszer, D.M., & Bret, P.M. (1997). RadTutor: The theoretical and empirical basis for the design of a mammography interpretation tutor. In B. du Boulay & R. Mizoguchi (Eds.), Frontiers in artificial intelligence and applications (pp. 386-393). Amsterdam: IOS Press.
[6] Azevedo, R., & Lajoie, S.P. (1998). The cognitive basis for the design of a mammography interpretation tutor. International Journal of Artificial Intelligence in Education, 9, 32-44.
[7] Sharples, M., du Boulay, B., Teather, B.A., Teather, D., Jeffery, N., & du Boulay, G.H. (1995). The MR tutor: Computer-based training and professional practice. In J. Greer (Ed.), Proceedings of AI-ED'95 (pp. 429-436). Charlottesville, VA: AACE.
[8] Sharples, M., Jeffery, N., Teather, D., Teather, B., & du Boulay, G. (1997). A socio-cognitive engineering approach to the development of a knowledge-based training system for neuroradiology (CSRP 463). Sussex, UK: University of Sussex.
[9] Rogers, E. (1995). VIA-RAD: A blackboard-based system for diagnostic radiology. Visual interaction assistant for radiology. Artificial Intelligence in Medicine, 7(4), 343-360.
[10] Macura, R.T., & Macura, K.J. (1995). MacRad: Radiology image resource with a case-based retrieval system. In M. Veloso & A. Aamodt (Eds.), Proceedings of the International Conference on Case-Based Reasoning (pp. 43-54). Berlin: Springer Verlag.
[11] Direne, A.I. (1997). Designing intelligent systems for teaching visual concepts. International Journal of Artificial Intelligence in Education, 8, 44-70.
[12] Taylor, P., Fox, J., & Todd-Pokropek, A. (1997). A model for integrating image processing into decision aids for diagnostic radiology. Artificial Intelligence in Medicine, 9, 205-225.
[13] Ericsson, K.A., & Simon, H.A. (1993). Protocol analysis: Verbal reports as data. Cambridge, MA: MIT Press.
[14] Anderson, J.R. (1993). Problem solving and learning. American Psychologist, 48(1), 35-44.
[15] Newell, A., & Simon, H.A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice Hall.
[16] Klatzky, R.L. (1980). Human memory: Structures and processes. San Francisco, CA: Freeman.
[17] Sternberg, R.J. (1998). Principles for teaching for successful intelligence. Educational Psychologist, 33(2/3), 65-72.
[18] Lajoie, S.P., & Azevedo, R. (in prep.). Cognitive tools for medical informatics. In S.P. Lajoie (Ed.), Computers as cognitive tools II: No more walls: Theory change, paradigm shifts and their influence on the use of computers for instructional purposes. Mahwah, NJ: Erlbaum.
Artificial Intelligence in Education S.P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
Building a Case for Agent-Assisted Learning as a Catalyst for Curriculum Reform in Medical Education
Erin Shaw, Rajaram Ganeshan, and W. Lewis Johnson
Center for Advanced Research in Technology for Education (CARTE)
Information Sciences Institute, University of Southern California
4676 Admiralty Way, Marina del Rey, CA 90292-6695 USA
{shaw, rajaram, Johnson}@isi.edu, http://www.isi.edu/isd/carte/
Douglas Millar
Graduate School of Education and Psychology, Pepperdine University
400 Corporate Pointe Drive, Culver City, CA 90230 USA
dougnhelen@moonlink.net
Abstract: Animated pedagogical agents offer promise as a means of making computer-aided learning more engaging and effective. Realizing their full potential will involve integrating agents into curricula on a massive scale. This paper describes progress toward this end: the development of an agent-assisted learning environment designed for widespread use in health science curricula. The system features Adele, an animated pedagogical agent who guides and assesses students as they work through clinical cases. We present the results of an initial evaluation of Adele by twenty-five medical students, and draw several conclusions about computer-based tutoring in a clinical domain. We also describe our experience working with the medical faculty to design and build the clinical case for the evaluation, and propose a process for large-scale case authoring.
Keywords: Animated pedagogical agent, intelligent tutoring, real-world evaluation, simulation-based training
1 Introduction Animated pedagogical agent technology has been proposed as a new approach for making computer-based learning more engaging and effective [Johnson 1998]. It builds upon previous work on intelligent tutoring systems [Wenger 1987] and extends it in several important respects. As we view the concept, animated pedagogical agents are a type of autonomous agent [Johnson and Hayes-Roth 1998]: they are capable of pursuing goals in complex environments, adapting their behavior as needed in response to unforeseen events. Their environment is typically an educational simulation, together with the learners and other agents that interact with the simulation. A pedagogical agent may seek to achieve pedagogical goals (e.g., to help a learner to learn about a topic), communicative goals (e.g., to acknowledge a learner's action), and task goals (e.g., to help the learner solve some particular problem). Animated pedagogical agents also have life-like, animated personas. They can respond to learners with a combination of verbal communication and non-verbal gestures such as gaze, pointing, body stance, and head nods. They can convey emotions such as surprise, approval, or disappointment. Taken together these capabilities allow animated pedagogical agents to interact with learners in a manner that is closer to face-to-face collaborative learning.
The technical sophistication of animated pedagogical agents has progressed rapidly. Steve, a 3D animated agent, can interact with learners in individual and team scenarios [Rickel and Johnson 1997, 1998]. PPP Persona is able to generate tutorial presentations of Web-based learning materials [Andre et al. 1998]. Cosmo is able to generate critiques and explanations using a combination of speech and emotive gestures [Towns et al. 1998]. Early empirical results show that these agents can enhance the learning experience and improve its effectiveness [Lester et al. 1997]. In this paper we describe our experience employing Adele, an Agent for Distance Education, who is designed to help realize the potential of animated pedagogical agent technology in on-line education [Shaw et al. 1999]. Adele operates in a Web-based, distributed simulation environment, where she monitors students as they solve problems, gives feedback, points them to relevant on-line reference materials, and evaluates their performance. She is designed to be used in a wide range of health science courses; here, we focus on our work to support clinical skills learning in medical education. This paper presents important progress toward widespread adoption of agent-assisted learning techniques. Empirical studies of animated pedagogical agents have shown promising results with school children, but it was not known whether university students and practicing professionals would respond positively as well. We present the results of a formal evaluation of Adele by twenty-five medical students, and draw several conclusions about computer-based tutoring in a clinical domain. We also describe our experience working with the faculty and staff at the Medical School to design and build the clinical case for the evaluation, and propose a process for large-scale case authoring.
2 Adele system overview
Adele's system, shown in Figure 1, consists of four main components: the pedagogical agent, the simulation, the client and server, and the server store. The pedagogical agent itself consists of two sub-components, the animated persona and the reasoning engine. A fifth component, the session manager, is employed when the system is run in multiple-user mode. The central server implements gate-keeping services, maintains a database of student progress, and, when appropriate, provides synchronization for collaborative exercises carried out by multiple students on multiple computers. The reasoning engine performs all monitoring and decision making. Its decisions are based on the case task plan and an initial state, which are downloaded from the server when a case is chosen, and the agent's current mental state, including the student model, which is updated as a student works through a case.
Figure 1. Architectural overview of Adele's system.
The simulation can be authored using the language or authoring tool of one's choice. All simulations communicate with the agent via a common application programming interface (API) that supports event and state-change notifications as defined by the simulation logic; a sketch of this contract appears below. The two-dimensional animated persona is simply a Java applet that can be used alone with a JavaScript interface or incorporated into a larger application such as the one we describe here.
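The following is a minimal sketch of such an event and state-change notification contract; the class, method and event names are invented for illustration, since the paper does not publish the API itself.

    from dataclasses import dataclass, field
    from typing import Callable, List

    @dataclass
    class SimEvent:
        kind: str      # "event" or "state_change", per the notification types above
        name: str      # e.g. "palpate_abdomen" (hypothetical action name)
        payload: dict

    @dataclass
    class SimulationBus:
        listeners: List[Callable[[SimEvent], None]] = field(default_factory=list)

        def subscribe(self, listener):
            self.listeners.append(listener)        # the agent registers here

        def notify(self, event):
            for listener in self.listeners:        # the simulation pushes notifications
                listener(event)

    bus = SimulationBus()
    bus.subscribe(lambda e: print("agent notified:", e.kind, e.name))
    bus.notify(SimEvent("event", "palpate_abdomen", {"result": "tender"}))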
2.1 Task representation and reasoning
Previous efforts in medical intelligent tutoring (e.g., [Clancey 1983] and [Azevedo et al. 1997]) have either been large expert systems incorporating sizeable medical knowledge bases, or prototypes focusing on narrow sub-fields of medicine. Neither approach is suitable for a system like Adele, which is designed to apply to a variety of health science courses, yet must be downloaded and run on client computers. Adele employs a different approach: she encodes for each case only the knowledge needed to tutor that particular case. Decisions about what knowledge to include are made when the case is authored. Adele's formal knowledge consists mainly of the procedural knowledge necessary to work through a case, and is represented as a hierarchical plan [Russell and Norvig 1995]. Related medical knowledge, such as background information about disease etiologies, is incorporated into textual hints and Web-based reference materials, and can thus be presented to the learner as needed, but is not represented formally within Adele's knowledge base. This simplifies the run-time knowledge base and reasoning engine.
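To make the representation concrete, here is a hypothetical encoding of a fragment of a case task plan as a hierarchical plan; the step names, the ordering constraint and the hint are invented for the example and are not Adele's actual case data.

    # Composite tasks expand into sub-tasks; anything not listed is a primitive step.
    plan = {
        "diagnose case": ["take history", "physical exam", "order tests", "diagnose"],
        "physical exam": ["inspect", "auscultate", "palpate abdomen"],
    }
    ordering = [("take history", "physical exam")]   # illustrative ordering constraint
    hints = {"palpate abdomen": "Palpation can reveal masses not visible on inspection."}

    def primitive_steps(task):
        """Depth-first expansion of a task into its primitive steps."""
        if task not in plan:
            return [task]
        return [leaf for sub in plan[task] for leaf in primitive_steps(sub)]

    print(primitive_steps("diagnose case"))
    # ['take history', 'inspect', 'auscultate', 'palpate abdomen', 'order tests', 'diagnose']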
2.2 Opportunistic learning
Adele supports two kinds of opportunistic learning: (1) quizzes to assess a student's understanding of the material, and (2) pointers to Web-based reference material on the current subject. These pedagogical opportunities are represented as situations in a situation space [Marsella and Schmidt 1990]. A situation space provides a means of structuring the space of states associated with a domain so that they can be used to guide dynamic behavior. By maintaining an awareness of the current situation, the agent can undertake contextually appropriate interactions with the student.
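A situation space can be sketched as a set of predicates over the current tutoring state, each paired with an opportunistic action. The predicates, state keys and actions below are invented to illustrate the mechanism described above, not taken from Adele's implementation.

    situations = [
        (lambda s: s["findings identified"] and not s["quizzed"],
         "offer a quiz on the identified findings"),
        (lambda s: s["last response"] == "throat not inflamed",
         "point to reference images comparing normal and inflamed throats"),
    ]

    state = {"findings identified": True, "quizzed": False,
             "last response": "throat not inflamed"}
    for holds, action in situations:
        if holds(state):              # every situation that matches yields an opportunity
            print("opportunity:", action)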
3 Case-based medical education
The School of Medicine at the University of Southern California has begun to implement a first- and second-year curriculum that emphasizes the analysis of clinical cases. 'Clinical case'-based education is, more generally speaking, a case-based approach to instruction [Gragg 1940], one in which patient-physician cases create a familiar context for learning. The move to a case-based approach in medical instruction is a response to the problem of knowledge application, whereby a student may recall learned knowledge but cannot apply it in an operational setting [Whitehead 1929]. While not particular to the medical domain, the problem is pervasive within it. A clinical case typically consists of a patient's chief complaint, the clinical findings, the differential diagnoses with an emphasis on determining a final diagnosis, and the treatment and management of the disease. Although created for presentation in a classroom setting, the cases are inherently situated and interactive and, with effort, can be reconstructed as simulation-based diagnosis exercises. In this section we explain how Adele guides students through one of these interactive exercises and discuss the benefits and drawbacks of computer-based tutoring in the medical domain.
Figure 2. Adele explains the importance of palpating the patient's abdomen
3.1 Multiple-level learning design
Figure 2 shows a typical case-based diagnosis exercise in which students are presented with a simulated patient in a clinical setting. In the role of physicians, students are able to perform a variety of actions on the simulated patient; they may ask questions about medical history, perform a physical examination, order diagnostic tests, and make diagnoses. Adele monitors the student's actions and provides feedback accordingly. Depending upon the instructional goals, Adele may highlight aspects of the case, suggest correct actions, provide hints and rationales for particular actions, reference relevant background material, and provide contextual assessment. A clinical case presents a rich context for learning at many levels. Adele can emphasize the procedure, namely the best-practice approach to a particular diagnosis, or she can concentrate on the related learning materials. Exercises can focus on different aspects of a case: in the medical domain, we may have pre-clinical exercises, which stress fundamental facts; clinical exercises, which stress clinical diagnosis; or resident-level exercises, which stress treatment and management. One case can support all three levels, although the pedagogical emphasis of the learning task changes for each level.
3.2 Computer-based learning design
The pedagogical emphasis also changes with respect to the learning content, in order to exploit the benefits, or mitigate the drawbacks, of computer-based learning. For example, discovery by palpation and percussion, two forms of touch, cannot be simulated on a computer. Teaching listening skills is problematic, too, because it is difficult to record high-fidelity body sounds without special equipment. The best an application can do is to provide a textual description of what something feels or sounds like. On the other hand, the visual bias of the computer makes it especially good for teaching observation skills, something that is often under-emphasized in clinical settings, where students are apt to miss important visual cues in their rush to complete an examination. We can exploit visual opportunities further: for example, when examining a patient's throat, after the simulation response "The throat is not inflamed.", Adele will display and compare images that show both normal and inflamed throats.
4 Case Development
If Adele is to be adopted as part of a system-wide curriculum for case-based medical education, the issue of authoring the task representation and simulation logic for each case becomes very important. This section discusses issues in large-scale case authoring and tools to assist in the process. Adele's interactive cases are developed by layering domain, simulation, and pedagogical data. Domain knowledge, such as reusable case and disease data, is combined to produce a best-practice task model. This task model specifies an ordering of the steps to be taken, places constraints on the steps, and relates the steps to the diagnosis; rationales and hints for the steps are added here. The amount of case material used to build the task model depends on the instructional level and objective. Once the task model is completed, the author supplies the media assets needed for the simulation. Media include text for Adele's responses, and text, image, video and audio files for the patient and the clinical findings. The pedagogical model, which consists of contextual comments, reference materials, and quizzes for assessment, is then layered over the resulting task model and simulation.
4.1 Building an authoring tool
Currently, case authors enter data for new cases via a form that sits on top of a database. A translator takes the output and parses it into an object-oriented case representation scheme from which both the task representation and the simulation logic for the case are generated automatically. The translation process performs a variety of syntactic consistency checks on the input provided by the author and also saves the author the burden of matching simulation events with the actions in the plan. The current tool is not ideal: authoring a non-sequential path through a case can be confusing, and the tool does not yet provide a persistent repository so that data can be easily reused across cases. We are actively designing an authoring tool with these qualities that will meet the needs of both instructors and developers. In their paper on authoring via task-specific tools, Bell and Zirkel [1997] argue that many ITS authoring tools sacrifice pedagogical power for content flexibility, but that developing an authoring tool for a specialized category of instructional applications has advantages even so. We agree that while targeted authoring tools do indeed sacrifice flexibility, it need not be at the cost of pedagogical power. Power is afforded to authors who understand the types of tasks they want to teach and the constraints that are inherent within those tasks. From our experiences with one-on-one authoring between physicians and developers, we propose that a successful authoring tool must be:
1. Domain-intuitive: Medical professionals understand how clinical cases are constructed and presented, and their knowledge of the domain can be taken into account to tailor the system to them, rather than to the programmer or the architecture;
2. Reusable: Productivity and ease of use are increased for authors who can select from a repository of previously authored steps, rationales, etc., instead of supplying new information each time. Standard ontologies [19] can also serve as reusable knowledge bases from which knowledge for cases is drawn as needed;
3. Testable: Case authors need feedback on the cases they author, and familiarization with the system they are authoring for will make them better authors.
Adele's system may be a good candidate for an authoring tool that utilizes Programming by Demonstration (PBD). Diligent [Angros et al. 1997] is a good example of a PBD system in which users author tasks within the tutoring environment; it has been shown that instructors demonstrate tasks more accurately in context than they can explain them out of context.
5 Evaluation
In this section we describe the initial evaluation of Adele. The evaluation was given to a class of second-year medical students during the third week of November 1998 and was based on a new case on Lung Cancer that the students had studied in class. The students were unmonitored and worked through the case on their own time. Over one hundred students used the system, although only twenty-five of them completed the final questionnaire. Two face-to-face evaluations were also conducted. Our goals for the evaluation were twofold: to discover how the students would react to Adele, and to confirm that the system could support the students. Did Adele and the concept of Web-based exercises have potential in a medical school environment? Was the Adele system easy and robust enough to be used in such an asynchronous setting? The final questionnaire contained thirty questions in six categories, including system use, system components, rationales, Adele, and learning. They addressed both specific elements of the tutoring system, such as the interface and rationales, and the general reaction of the students to Adele and the concept of the system. Answers were scored on a Likert scale; the results are tabulated below.

Evaluation question                                SD%   D%   N%   A%  SA%
The system was easy to use (combined results)        2    8   15   56   19
It was easy to figure out what to do                 4   24   20   44    8
It was easy to figure out how to do it              12   16   24   40    8
Adele is a good distance education tool              4    8   16   44   28
Adele is useful as a classroom preparatory tool      4   17    8   38   33
Adele would be helpful as a class supplement         4   20    4   42   29
Adele is a good substitute for a class lecture      29   33   21    9    9
Adele provided most info of a lecturer              12    0   64   24    0
Adele provided most info of an attending phys        8    8   56   26    0
Adele is believable as an attending physician        8   24   44   24    0
I would like to have more cases available            0    8   13   42   38
Adele's hints are helpful                            0    4   33   21   42
Adele's rationales were useful                       0    4   29   34   34
I prefer Adele give rationales before a step         4   20   40   28    8
I prefer Adele give rationales after a step          0   17   39   35    9
I prefer Adele let me ask for the rationales         4    4   46   38    8
Adele's images and actions were motivating           4   17   42   29    8
Adele is preferable to a text-only tutor            12   32   24   16   16
I prefer a real voice to a synthesized one           0   12   32   32   24
Adele's unsynchronized lips-voice bother me          4   22   48   22    4

(SD = strongly disagree, D = disagree, N = neutral, A = agree, SA = strongly agree; row percentages.)
5.1 Analysis of the results
Overwhelmingly, students thought the system was easy to use, yet some found it difficult to figure out what to do next and how to do it. The latter finding indicates a need for more guidance than the system currently provides. We are looking to Adele to fill this role: to suggest actions at a system level, as opposed to a task level, when a user becomes confused, and to point out the importance of interface elements if they are not utilized. Individual comments ran the gamut, from frustrated students who were unable to use the system to enthusiastic users whose comments sound like testimonials. As researchers and initial users of the systems we develop, we are inclined to bias the usability of the system in favor of users like ourselves, in other words, sophisticated users. Before any system can be successfully deployed on a large scale, this bias must be remedied. If there was a consensus among the students, it was that they wanted more cases to work on. In the words of one participant, "the more practice, the better."
The majority of students thought Adele would be a good distance education tool and useful as a classroom preparation tool, but would not suffice as a replacement for a class lecture. They did not find Adele believable as an attending physician, a feeling that conflicted with a statement made by a fourth-year student who had actually worked with attending physicians. It is also not clear whether students prefer the persona to a text-only tutor. Though the numbers would indicate otherwise, the students provided favorable impressions of Adele in their general textual comments. Not surprisingly, students would have preferred a real voice to a synthesized one. Surprisingly, however, they did not mind that Adele's lips and voice were not synchronized. Adele's lips and voice move at the same time but are not yet phonetically synchronized, and some people find this phenomenon disturbing; it may not be as important as we once thought. Almost all students agreed that Adele's hints were helpful and her rationales useful. They had mixed feelings, though, about when they wanted to hear the rationales. We noticed during a one-on-one session that the student was not asking "Why?", and was therefore missing much of the authored knowledge, so we decided to have Adele give some of the rationales automatically, whether a student asks for them or not. We implemented three variations: 1) give a rationale only when asked, 2) give it automatically after a hint, and 3) give it automatically after a user takes a step, and then asked the students which variation they preferred. Most students answered that they prefer to hear a rationale only when they ask for it, although our experience suggests that they would not ask at all if given a choice. We continue to explore these issues.
6 Conclusion
Adele and other pedagogical agents like her have lessons to offer regarding autonomous agent design. Researchers in believable agents argue that the audience's perception of an agent's competence is more important than the competence itself [Sengers 1998], and that competence is but one of many factors that agent authors should take into account [Reilly 1997]. A key question is to what extent these claims are true for pedagogical agents. User feedback from Adele suggests that while students find her advice and feedback useful, and think she is a good learning supplement, they do not think her knowledge is equal to that of a lecturer or an attending physician. Adele's task representation is being enhanced to explicitly reason with hypotheses and their likelihoods, as in GUIDON [Clancey 1983]. This will allow Adele to present rationales in a more flexible manner and also to automatically generate quizzes to verify the student's knowledge of the hypotheses underlying their actions. From an authoring perspective, the explicit representation of hypotheses and their relationships to findings could allow for automated generation of rationales without requiring the author to provide the rationale for every diagnostic step in a case. These limitations notwithstanding, the approach of orienting Adele's run-time knowledge base to individual cases appears to be workable, and will continue to be followed as the library of cases is developed. We are encouraged by the acceptance of Adele by the medical students and by the positive impression she made on the medical faculty and staff, whose cooperation and support were crucial. We will continue our efforts in curriculum innovation in the medical domain with a larger and more formal effort beginning in the summer of 1999, which will include the development and evaluation of complete course modules aimed at multiple organ systems and student expertise levels. Meanwhile, we continue to make steps toward the adoption of the technology in other domains. In collaboration with the USC School of Dentistry, we have implemented a case for a field trial in April 1999 and are planning to author more cases for a course in the fall. We are also developing a new non-clinical simulation that will allow Adele to be employed in more general settings.
7 Acknowledgments
CARTE staff members Kate Labore and Dr. Jeff Rickel, 'A' Team members Ami Adler, Andrew Marshal, Anna Romero, and graduate student Chon Yi all contributed to the work presented here. Dr. Allan Abbott led the evaluation at the medical school, and he and medical student Michael Hasler authored the case. Our collaborators, Drs. Demetrio Demetriatis, William La, Wesley Naritoku, Sidney Ontai, and Beverly Wood, and Angela Atencio and Leah Flodin at the USC School of Medicine, provided indispensable assistance. This work was supported by an internal research and development grant from the USC Information Sciences Institute.
8 References
1. Andre, E., Rist, T., and Muller, J. (1998). Integrating reactive and scripted behaviors in life-like presentation agents. In K.P. Sycara and M. Wooldridge (Eds.), Proc. of the Second Int'l Conf. on Autonomous Agents, pp. 261-268. ACM Press, New York.
2. Angros, R., Johnson, W.L., and Rickel, J. (1997). Agents that learn to instruct. In Proc. of the 1997 AAAI Fall Symposium on Intelligent Tutoring System Authoring Tools, TR FS-97-01.
3. Azevedo, R., Lajoie, S.P., Desaulniers, M., Fleiszer, D.M., and Bret, P.M. (1997). RadTutor: The theoretical and empirical basis for the design of a mammography interpretation tutor. In Proc. of the 8th World Conf. on Artificial Intelligence and Education. IOS Press, Amsterdam.
4. Bell, B., and Zirkel, J. (1997). Authoring virtual gallery environments via task-specific tools. In Proc. of the 1997 AAAI Fall Symposium on Intelligent Tutoring Systems Authoring Tools, TR FS-97-01.
5. Clancey, W.J. (1983). The epistemology of a rule-based expert system: A framework for explanation. Artificial Intelligence, vol. 20, pp. 215-251.
6. Clancey, W.J. (1987). Knowledge-Based Tutoring: The GUIDON Program. MIT Press, Cambridge, MA.
7. Gragg, C.I. (1940). Because wisdom can't be told. Harvard Alumni Bulletin, October 19, pp. 78-85.
8. Johnson, W.L. (1998). Pedagogical agents. In Proc. of ICCE '98, vol. I, pp. 13-22. China Higher Education Press and Springer-Verlag, Beijing.
9. Johnson, W.L. and Hayes-Roth, B. (1998). The First Autonomous Agents Conference. The Knowledge Engineering Review 13(2), pp. 1-6.
10. Lester, J.C., Converse, S., Stone, B., Kahler, S., and Barlow, T. (1997). Animated pedagogical agents and problem-solving effectiveness: A large-scale empirical evaluation. In Proc. of the Eighth World Conference on Artificial Intelligence in Education, pp. 23-30. IOS Press, Amsterdam.
11. Marsella, S.C. and Schmidt, C.F. (1990). Reactive planning using a situation space. In Proc. of the 1990 AAAI Spring Symposium Workshop on Planning.
12. Reilly, W.S.N. (1997). A methodology for building believable social agents. In W.L. Johnson and B. Hayes-Roth (Eds.), Proc. of the First Int'l Conference on Autonomous Agents, pp. 114-121. ACM Press, New York.
13. Rickel, J. and Johnson, W.L. (1998). Animated agents for procedural training in virtual reality: Perception, cognition, and motor control. To appear in Applied Artificial Intelligence Journal.
14. Rickel, J. and Johnson, W.L. (1997). Intelligent tutoring in virtual reality: A preliminary report. In Proc. of the Int'l Conf. on Artificial Intelligence in Education, pp. 294-301. IOS Press, Amsterdam.
15. Russell, S. and Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs, NJ.
16. Sengers, P. (1998). Do the Thing Right: An architecture for action-expression. In K.P. Sycara and M. Wooldridge (Eds.), Proc. of the Second Int'l Conf. on Autonomous Agents, pp. 24-31. ACM Press, New York.
17. Shaw, E., Johnson, W.L., and Ganeshan, R. (1999). Pedagogical agents on the web. To appear in Proc. of the Third Int'l Conference on Autonomous Agents. ACM Press, New York.
18. Towns, S.G., Callaway, C.B., Voerman, J.L., and Lester, J.C. (1998). Coherent gestures, locomotion, and speech in life-like pedagogical agents. In IUI '98: 1998 International Conference on Intelligent User Interfaces, pp. 13-20. ACM Press, New York.
19. U.S. Department of Health and Human Services, National Institutes of Health and National Library of Medicine (1998). Unified Medical Language System Knowledge Sources.
20. Wenger, E. (1987). Artificial Intelligence and Tutoring Systems: Computational and Cognitive Approaches to the Communication of Knowledge. Morgan Kaufmann, Los Altos, CA.
21. Whitehead, A.N. (1929). The Aims of Education. Cambridge University Press.
Computer-based Tutoring of Medical Procedural Knowledge
Oleg Larichev and Yevgeny Naryzhny
Institute for Systems Analysis (ISA), Prospekt 60 let Oktyabrya, 9, Moscow, Russia
[email protected], [email protected]
Abstract. There are many examples of efficient intelligent tutoring systems for teaching procedural knowledge (the ability to solve practical problems) in formal fields such as geometry, algebra, and programming. However, considerable difficulties are encountered in developing systems aimed at conveying procedural knowledge in a number of important but imprecisely defined fields such as medical diagnosis. In this paper, we consider a new approach to the development of intelligent tutoring systems for problems in diagnosis; this approach makes it possible for a student to acquire diagnostic experience whose accuracy approaches that of the expert.
1. Introduction

Development of intelligent tutoring systems aimed at the computer-based transfer of knowledge from an expert in a certain domain to a newcomer to the field remains one of the most important directions of research in artificial intelligence. Some progress has been achieved in the development of systems that help in the acquisition of declarative knowledge. There are also examples of efficient systems for tutoring in procedural knowledge (the ability to solve practical problems) in formal fields such as geometry, algebra, and programming [1]. However, considerable difficulties are encountered in developing systems aimed at conveying procedural knowledge in a number of important but imprecisely defined fields such as medical diagnosis. The first serious attempt in such a field was the famous tutoring system GUIDON [2]. A characteristic feature of these fields of human activity is the lack of realistic models that support decision making and cover the entire space of possible states. Therefore, as has always been the case, tutoring in the art of solving problems is accomplished by the tutor demonstrating his expertise, by the student imitating the expert's behavior, and by the student analyzing his or her mistakes. It takes no less than ten years of active and deliberate practice for a person to progress from beginner's level to the heights of professional expertise [3]. During this span of time, not only does the body of knowledge possessed by a person expand considerably, but, as studies have shown, the structure of the knowledge and the very way of thinking also change [4]. A person who has become an expert is not only unerring in the majority of his or her decisions but also uses so-called forward reasoning, i.e., a direct transition from the description of the problem to its solution. Furthermore, in the majority of cases, the expert cannot explain how he or she came to the decision made; i.e., we may assume that at least part of an expert's knowledge resides at the subconscious level. Moreover, it appears that procedural knowledge is not available to introspection under any circumstances [5]. Thus, procedural knowledge appears to be unconscious in the strict
sense of the term. Investigation into the mechanisms of gaining subconscious experience indicates that the acquisition and refinement of this experience occur in the course of active practice and depend on its duration [6]. Therefore, the problem of searching for more efficient ways of tutoring with the use of computer systems is important. In this paper, we consider a new approach to the development of intelligent tutoring systems for problems in diagnosis; this approach makes it possible for a student to acquire diagnostic experience whose accuracy approaches that of the expert.
2. Diagnostics as a Problem of Classification

A problem in diagnosis may be treated as a problem in classification in which an expert assigns an object to one or several classes of solutions. We now give a formal statement of the problem. Let the objects be described by M attributes (criteria). A discrete range of possible values is specified for each attribute; i.e., for the i-th attribute there is a set Pi = {pi1, pi2, ..., piki}, where ki is the number of values of the i-th attribute. Let N be the number of classes of solutions Cn (properties of the objects). A verbal description corresponds to each class of solutions, each attribute, and each value within the range of each attribute. The complete set of possible objects for a given classification problem can be represented as the Cartesian product of the ranges of the attributes, A = P1 × P2 × ... × PM; in this product, each object appears as a vector ai = (ai1, ai2, ..., aiM), where aim ∈ Pm. Typically, in actual problems, not all objects of the space A are realizable, which, in certain cases, considerably reduces the space of possible solutions. An expert successfully solves the problem of assigning vectors ai ∈ A to one or several solution classes Cn. It is required to transfer this skill to a newcomer who possesses declarative knowledge in the field but has no experience in solving practical problems. We suggest solving this problem in the following two stages: (a) formation of an expert knowledge base that imitates precisely the expert's decisions for any ai ∈ A; and (b) development of a computer environment for tutoring based on the formed knowledge base; this environment allows a newcomer to the field to approach the expert level in solving problems in the given field and to secure the knowledge obtained. The aim of the first stage can be accomplished by developing a complete knowledge base through the procedure of expert classification [7]. Completeness here means that the knowledge base can correctly classify any ai ∈ A. To build such a complete knowledge base in the general case, it is necessary to have the expert answer exactly |A| questions. However, it is possible to significantly reduce the number of questions by making use of the fact that attribute values have different degrees of typicality for different decision classes. For example, "complaints about pain in thorax" is more typical than "no complaints about pain" for pulmonary-artery thromboembolism. So, the values within the range of each attribute can be ordered by degree of typicality with respect to each class Cn; i.e., these values can be arranged in a dominance relation in which more typical values dominate less typical ones. In the same fashion we can build a partial order covering the whole problem space A: a vector ai ∈ A dominates a vector aj ∈ A if aim dominates or equals ajm for each 1 ≤ m ≤ M; otherwise, ai and aj are incomparable. Using this partial order of objects within the problem space makes it possible to reduce the
number of questions to the expert and to build the complete knowledge base in a short time with our knowledge acquisition software [7].
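As an illustration of how the dominance relation cuts down the number of expert questions, consider the small sketch below (Python; the integer encoding of typicality and the propagation rule are our own reading of the text, not the authors' published algorithm):

```python
# Hypothetical sketch of the dominance relation used to cut down expert questions.
# Attribute values are integers ordered by typicality for class Cn (larger = more typical).
from itertools import product

def dominates(a, b):
    """a dominates b if a is at least as typical as b on every attribute."""
    return all(am >= bm for am, bm in zip(a, b))

def propagate(obj, is_Cn, labels):
    """Spread one expert answer over all comparable objects in the space."""
    for other in labels:
        if is_Cn and dominates(other, obj):
            labels[other] = True          # even more typical -> also Cn
        elif not is_Cn and dominates(obj, other):
            labels[other] = False         # even less typical -> also "not Cn"

# A toy space: 3 attributes with 2, 3 and 2 values each (12 objects in A)
space = list(product(range(2), range(3), range(2)))
labels = {o: None for o in space}
labels[(1, 1, 0)] = True                  # expert says: this object is in Cn
propagate((1, 1, 0), True, labels)
labels[(0, 1, 1)] = False                 # expert says: this one is not
propagate((0, 1, 1), False, labels)
print(sum(v is not None for v in labels.values()), "of", len(space), "objects labelled")
```

Each expert answer labels not just one object but every object comparable to it under the partial order, which is why far fewer than |A| questions are needed in practice.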
3. Subconscious Decision Rules

The formed complete knowledge bases were used in studies of the models for the internal organization of an expert's knowledge [8]. It was found that this knowledge can be represented in a compact way. An object belonging to a certain decision class is referred to as a boundary object if it does not dominate any other object of the same class. The boundary objects of each class completely define the class and cannot dominate one another. In actual classification problems, so-called unstable objects can be encountered both among the set of boundary objects belonging to the class Cn and among the objects closest to this set, albeit belonging to the class "not Cn"; these unstable objects can be assigned to the opposite class when presented to the expert a second time. The analysis of the boundaries defining the demarcation lines between the solution classes of various classifications developed by different experts has shown that one can invariably devise a family of decision rules of a definite kind that define a certain boundary Qn separating the stable objects of the class Cn from the stable objects belonging to the class "not Cn".
This rule can be represented as "pik, ..., pik+l, and no less than t values from the remaining M − l attributes characteristic of the class Cn". Here, pik, ..., pik+l signifies a conjunction of the values of the most important attributes, which should be supplemented by no less than t characteristic values of the less important attributes. Typically, the number of such rules approximating the boundary of a class of solutions does not exceed 2-3 [8]. The additive property of the rules appears to be important because subconscious estimation of typical values of insignificant attributes is a widely encountered procedure performed by the human information-processing system [9]. The study has shown that the behavior of an expert dealing with problems of classification can be simulated to good accuracy with the use of a small number of decision rules having a fairly simple structure. This made it possible to advance the hypothesis that, as a result of long-standing practice, an expert forms subconscious rules for the recognition of similar structures; these rules, although impossible to verbalize, are used by the expert in solving diagnostic problems. With the knowledge of an expert represented compactly, a solution of the problem of efficient tutoring in the art of diagnosis becomes feasible. The aim of tutoring becomes the development of subconscious decision rules in the long-term memory of the newcomer to the field, so that his or her decisions are no different from those made by the expert.
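A rule of this kind is straightforward to evaluate mechanically. The sketch below (Python; the rule encoding is our own illustration, not the authors' format) checks the conjunction of the most important attribute values and then counts characteristic values among the remaining attributes:

```python
# Hypothetical encoding of a boundary-approximating decision rule:
# "attributes in `required` must take the given values, and at least
#  `threshold` of the remaining attributes must take a value that is
#  characteristic (typical) of class Cn".

def rule_matches(obj, required, characteristic, threshold):
    """obj: dict attribute -> value; required: dict of mandatory values;
    characteristic: dict attribute -> set of values typical of Cn."""
    if any(obj[a] != v for a, v in required.items()):
        return False                      # the conjunction fails
    rest = [a for a in obj if a not in required]
    typical = sum(obj[a] in characteristic.get(a, set()) for a in rest)
    return typical >= threshold

# Toy example with three attributes
rule = dict(required={"ecg": "right-branch-blockade"},
            characteristic={"dyspnea": {"sudden"}, "pain": {"thorax"}},
            threshold=1)
patient = {"ecg": "right-branch-blockade", "dyspnea": "sudden", "pain": "none"}
print(rule_matches(patient, **rule))   # True: conjunction holds, 1 typical value
```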
4. Method of Learning

Let a be a certain object belonging to the class Cn, and let the boundary Qn of class Cn consist of boundary objects qi = (qi1, qi2, ..., qiM). We define the distance between a and the boundary Qn as d(a, Qn) = min over qi ∈ Qn of the sum Σ (m = 1..M) Dm, where Dm = 1 if am ≠ qim and Dm = 0 otherwise.
The set A can be divided into subsets (layers) that lie at various distances from the boundary Qn and, consequently, are of different degrees of complexity for decision making. We define the complexity of an object as a linear function of the layer number, with the values of this function ranging from 1% (for the layer farthest from the boundary layer) to 100% (for the boundary layer). We can associate each layer of objects belonging to the class Cn and having a given complexity with a symmetric (about the boundary layer) layer of objects of the class "not Cn" and assign the same level of complexity to the latter layer (Fig. 1).
Fig. 1. Layers of objects of various complexity in the space of possible states.
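A minimal sketch of this layering, under our reading of the distance just defined (Python; the linear mapping of the layer number to a 1-100% complexity score is paraphrased from the text, not taken from published code):

```python
# Distance from an object to the class boundary: number of differing attribute
# values with respect to the *nearest* boundary object (see the definition above).
def distance_to_boundary(a, boundary):
    return min(sum(am != qm for am, qm in zip(a, q)) for q in boundary)

def complexity(a, boundary, max_layer):
    """Linear map: farthest layer -> 1%, boundary layer (distance 0) -> 100%."""
    d = distance_to_boundary(a, boundary)            # d acts as the layer number
    return 100 - (100 - 1) * d / max_layer

boundary = [(1, 2, 0), (0, 2, 1)]                    # toy boundary objects of Cn
print(distance_to_boundary((1, 0, 0), boundary))     # 1: differs from (1,2,0) in one place
print(complexity((1, 0, 0), boundary, max_layer=3))  # 67.0: a fairly hard problem
```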
In training in the solution of problems of a given complexity, the student is confronted with randomly chosen objects to be classified, belonging to the complexity-c layer of either the class Cn or the class "not Cn". Training begins with problems of 1% complexity and consists in the self-reliant solution of a large number of problems by trial and error. The decision rules used as standards in the classification are not made explicitly known to the student. Instead, in the case of a wrong decision, explanations similar to those used by the expert in substantiating his or her decisions are given. In the real world, nearly all complex skills, like the art of medical diagnostics in our case, are acquired through a blend of the explicit and the implicit [11]. We consider explanations in terms of typical and untypical diagnostic signs to be one of the critical points of our learning procedure, because they help a learner focus his or her attention on the important aspects of the case to be diagnosed. The primary goal of such explanations is to teach the learner to think on his or her own, comparing his or her diagnosis with the diagnosis of the experienced physician. These explanations cannot be treated as decision rules; they are comments helping the learner understand what he or she has missed in making the decision. The learner should read the explanation and think about the case, applying his or her clinical knowledge and experience, including experience with the tutoring system. If the student solves a fairly long sequence of problems correctly, the system raises the complexity of the problems and presents objects from the next layer. If the student then makes too many errors, the system reduces the problem complexity by returning to the previous layer. The system predicts the decision of the student for each of the problems within the layer by calculating an index of conformity with the standard
decision. The method of calculating the conformity index is based upon a learner model, which is generated dynamically with an induction algorithm. The objects with the lowest value of this index are presented to the student first. The training is completed when the student becomes capable of solving with confidence the problems of the highest complexity. The method of learning suggested above can be characterized as a method of implicit learning. The term "implicit learning" was used more than three decades ago by Reber to characterize the way subjects acquire the underlying structure of a complex stimulus environment [10]. This process is characterized by two critical features: (a) it is an unconscious process, and (b) it yields abstract knowledge. Reber's first results concerned implicit learning of artificial grammars, but he later extended the notion by providing insight into a variety of related processes, such as arriving at intuitive judgements, complex decision making, and, in a broad sense, learning about the complex covariations among events that characterize the environment [11]. Similar results on implicit learning were obtained by other researchers. Broadbent and colleagues showed that knowledge of complex rule systems governing simulated economic/production systems is also acquired and used in an implicit fashion [12]: subjects induced the rule systems implicitly and made appropriate adjustments in the relevant variables in the absence of conscious knowledge of the rules themselves. In this work we show that it is possible to build a formal model and an effective tutoring system for medical diagnostics based upon the principles of implicit learning.
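The adaptive loop described at the start of this section can be summarized in Python (a sketch under our own assumptions: the paper does not specify the promotion and demotion thresholds, and the conformity-index ordering is replaced here by random choice within a layer):

```python
import random

# Hypothetical thresholds: the paper does not give the exact criteria.
PROMOTE_STREAK = 10      # consecutive correct answers needed to raise complexity
DEMOTE_ERROR_RATE = 0.5  # recent error rate that sends the student back a layer

def training_session(layers, ask, explain):
    """layers: list of problem pools ordered by complexity (1% ... 100%).
    ask(problem) -> student's answer; explain(problem) -> expert-style comments."""
    level, streak, recent = 0, 0, []
    while True:
        problem, answer = random.choice(layers[level])
        correct = ask(problem) == answer
        recent = (recent + [correct])[-10:]
        if not correct:
            explain(problem)                   # typical/untypical signs, not rules
            streak = 0
            if recent.count(False) / len(recent) >= DEMOTE_ERROR_RATE and level > 0:
                level, recent = level - 1, []  # step back to an easier layer
        else:
            streak += 1
            if streak >= PROMOTE_STREAK:
                if level == len(layers) - 1:
                    return                     # confident at highest complexity: done
                level, streak, recent = level + 1, 0, []
```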
5. Application

The principles for the transfer of procedural knowledge stated above were tested on one of the most complex problems in medical diagnosis: the differential diagnosis of pulmonary-artery thromboembolism and acute infarct of myocardium. The authors developed an intelligent tutoring system, OSTELA, for this problem on the basis of nine clinically and instrumentally identified symptoms, such as anamnesis and risk factors, respiration rate and its dynamics, electrocardiogram pattern, echocardiogram, and so on [13]. Each of the symptoms (attributes) has a range of several values of different degrees of typicality for the diseases under consideration. Various combinations of the values of the attributes form the set of descriptions of hypothetical patients in need of urgent diagnosis (Fig. 2). Post-graduate students of the Russian Academy of Medical Sciences and young, inexperienced physicians from the Botkin Clinical Hospital took part in the experiments with OSTELA training. Prior to and after the course of training, all the participants were given a test of 20 problems of the highest level of complexity. The training course lasted two days, with 4 hours of training each day. During this time, each of the tested persons solved, on average, about 500 problems. While the percentage of correct decisions made in the preliminary test coincided, on average, with that of random choice, the persons given the post-course control test demonstrated a 90-100% level of coincidence with the expert's decisions, although they were unable to formulate the rules of decision making. Some participants were given a repeat test a week later and gave 85-95% correct answers, which indicated that the experience was retained. Following an additional training session, which lasted less than an hour, these persons demonstrated complete conformity with the expert's decisions.
Recently endured perivisceral operation in anamnesis
Complaints about pain in thorax
A pronounced cyanosis of face, neck, and upper part of the body at the instant of examination
A suddenly occurred acute dyspnea
Low arterial pressure; weakness increased gradually
According to the electrocardiogram, there are an incomplete blockade of the right-hand branch of His's fascicle and the deep S V5 - V6; ST III is slightly raised and bow-shaped; and the T III, AVF, and V1 - V4 spikes are negative
In the X-ray pattern of the thorax, expansion of the principal branches of the pulmonary artery and a shortened lung radix are evident
In the echocardiogram, a considerable increase in the pressure in the pulmonary artery and dilatation of the right-hand cardiac ventricles; there are akinesia zones in the left-hand ventricle
Appreciably increased concentrations of AST, ALT, CPK, MB CPK, and LDH in the plasma
Choose one of the following alternative decisions: 1. Preliminary diagnosis: simultaneous pulmonary-artery thromboembolism and acute infarct of myocardium; 2. Preliminary diagnosis: an acute infarct of myocardium, whereas an additional examination is required to ascertain the presence of pulmonary-artery thromboembolism; 3. Preliminary diagnosis is positive for an acute infarct of myocardium; there is no pulmonary-artery thromboembolism.
Fig. 2: A simplified screenshot of an OSTELA tutoring session (graphical ECG pictures omitted). This is an example of a diagnostic problem in the detection of pulmonary-artery thromboembolism in a patient suffering from an acute infarct of myocardium. Selecting decision 1 means choosing class Cn; selecting decision 2 or 3 means choosing class "not Cn".
6. Conclusion

In the majority of cases the expert cannot explain how he or she came to the diagnosis. During years of deliberate practice the expert seems to form subconscious rules for the recognition of similar structures. These rules cannot be correctly verbalized, and we can only guess at their real structure. Nevertheless, in this article we demonstrated that it is possible to build a precise copy of an expert's intuitive judgements in such a complex domain as medical diagnostics. Moreover, we found that it is possible to build an effective real-world tutoring system based on these judgements. Thus, the studies we performed lend credence to the possibility of developing efficient tutoring systems that can considerably shorten the time needed to convert a newcomer to the field into an expert able to make correct, self-reliant decisions.
References
[1] Anderson, J.R., Corbett, A.T., Koedinger, K.R., and Pelletier, R., Cognitive Tutors: Lessons Learned. The Journal of the Learning Sciences, 1995, 4(2), 167-207.
[2] Clancey, W.J., GUIDON. In The Handbook of Artificial Intelligence, Los Altos, California: William Kaufmann, 1982.
[3] Simon, H.A., Reason in Human Affairs, Stanford: Stanford Univ. Press, 1983.
[4] Ericsson, K.A. and Lehmann, A.C., Expert and Exceptional Performance: Evidence of Maximal Adaptation to Task Constraints, Annual Review of Psychology, 1996, 47, 273-305.
[5] Kihlstrom, J.F., The Cognitive Unconscious, Science, 1987, Vol. 237, 1445-1452.
[6] Ericsson, K.A., The Acquisition of Expert Performance: An Introduction to Some of the Issues. In Ericsson, K.A. (Ed.), The Road to Excellence: The Acquisition of Expert Performance in the Arts and Sciences, Sports and Games, Hillsdale, NJ: Lawrence Erlbaum Associates, 1996, 1-51.
[7] Larichev, O.I., Moshkovich, H.M., Furems, E.M., Mechitov, A.I., and Morgoev, V.K., Knowledge Acquisition for the Construction of Full and Contradiction-Free Knowledge Bases, iec ProGAMMA, Groningen, The Netherlands, 1991.
[8] Larichev, O.I., A Study on the Internal Organization of Expert Knowledge, Pattern Recognition and Image Analysis, 1994, Vol. 5, No. 1, 57-63.
[9] Ling, C. and Marinov, M., A Symbolic Model of the Nonconscious Acquisition of Information, Cognitive Science, 1994, 18, 595-621.
[10] Reber, A.S., Implicit Learning of Artificial Grammars. Journal of Verbal Learning and Verbal Behavior, 1967, 6, 855-863.
[11] Reber, A.S., Implicit Learning and Tacit Knowledge. Journal of Experimental Psychology: General, 1989, Vol. 118, No. 3, 219-235.
[12] Broadbent, D.E., FitzGerald, P., and Broadbent, M.H.P., Implicit and Explicit Knowledge in the Control of Complex Systems, British Journal of Psychology, 1986, 77, 33-50.
[13] Kuznetsova, V.P. and Bruk, E.I., Thromboembolism of Pulmonary Artery, Moscow: Ross. Gos. Akad. Med. Nauk (in Russian), 1997.
Understanding Texts and Dialogues
Tutoring Systems based on Latent Semantic Analysis
Benoit Lemaire
L.S.E., University of Grenoble II, BP 47, 38040 Grenoble Cedex 9, France
[email protected]

Abstract. Latent Semantic Analysis is a model of language learning based on exposure to texts. It predicts to what extent semantic similarities between words are learned from reading texts. We designed the framework of a tutoring system based on this model. Domain examples and student productions are represented in a high-dimensional semantic space, automatically built from a statistical analysis of the co-occurrences of their lexemes. We also designed tutoring strategies to select, among the examples of a domain, the one to which it is best to expose the student. Two systems are presented: the first successively presents texts to be read by the student, selecting the next one according to the student's comprehension of the previous ones. The second plays kalah with the student in such a way that the next configuration of the board is supposed to be the most appropriate with respect to the semantic structure of the domain and the student's previous actions.
1 Introduction
This paper relies on a new model of human learning called Latent Semantic Analysis (LSA) [8]. Although its authors sketched the outline of a general model of learning, they mainly focused on language learning. The model predicts to what extent semantic relations are drawn by a subject reading a set of texts. It does so by automatically extracting semantic relations between words by the method presented in the next section. Our contribution is twofold. First, we designed a framework for representing both the domain and student knowledge of a tutoring system by means of LSA. We also designed tutoring strategies for selecting, among the various stimuli the student can be exposed to, the one which is supposed to be the best for improving learning. Second, we built two systems on top of this framework. In the first one, the stimuli are texts, whereas in the second they are boards of a strategy game.

2 The Latent Semantic Analysis Model of Learning

LSA was primarily designed as a tool for text retrieval [2] but, because of its good performance, its scope was extended to information filtering [5], cross-language information retrieval [4], automatic grading of essays [6], measuring of text coherence [8], assessment of knowledge [11], machine learning [9] and then modelling human learning [8]. Before presenting LSA as a model of learning and knowledge representation, we present its principles.
2.1 LSA: a Tool for Text Retrieval
One of the problems in the field of text retrieval is to retrieve pieces of text given a list of keywords. However, because of polysemy, synonymy and inflection, retrieving only the texts that contain one or more of the keywords does not work well. For instance, Steinbeck's book Of Mice and Men should be retrieved given the keywords mouse and man although neither of these words appears in the title. Therefore, retrieval should also be based on semantic information. In order to perform such semantic matching, LSA relies on large corpora of texts to build a semantic high-dimensional space containing all words and texts, by means of a statistical analysis. This semantic space is built by considering the number of occurrences of each word in each piece of text (basically paragraphs). For instance, with 300 paragraphs and a total of 2000 words, we get a 300x2000 matrix. Each word is then represented by a 300-dimensional vector and each paragraph by a 2000-dimensional vector. Nothing new so far, since this is just occurrence counting. The power of LSA lies rather in the reduction of these dimensions; it is this process that induces semantic similarities between words. All vectors are reduced, by a method close to eigenvector decomposition, to, for instance, 100 dimensions. The matrix X is decomposed as a unique product of three matrices, X = T0 S0 D0', such that T0 and D0 have orthonormal columns and S0 is diagonal. This is called singular value decomposition. Then only the 100 columns of T0 and D0 corresponding to the 100 largest values of S0 are kept, to obtain T, S and D. The reduced matrix X̂ such that X̂ = T S D' permits all words and pieces of text to be represented as 100-dimensional vectors. This reduction is the heart of the method because it extracts semantic relations: if a word (e.g., bike) statistically co-occurs with words (e.g., handlebars, pedal, ride) that statistically co-occur with a second word (e.g., bicycle), and the first word does not co-occur with words (e.g., flower, sleep) that do not co-occur with the second one, then the two words are considered quite similar. If the number of dimensions is too small, too much information is lost; if it is too big, not enough dependencies are drawn between vectors. A size of 100 to 300 gives the best results in the domain of language [8]. The method is quite robust: a word can be considered semantically close to another one although they never co-occur in texts. In the same way, two documents can be considered similar although they share no words. An interesting feature of the method is that the semantic information is derived from the lexical level only; there is no need to represent a domain theory. Several experiments have shown that LSA works quite well. One experiment [8] consisted in building a general semantic space from a large corpus of English texts, then testing it with the synonymy tests of the TOEFL (Test Of English as a Foreign Language). Given a word, the problem is to identify, among 4 other words, the one that is semantically closest. LSA performed the test by choosing the word with the highest similarity between its vector and the vector of the given word. LSA's results compare with the level of foreign students admitted to American universities.
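As an illustration of the decomposition just described, here is a minimal numpy sketch (our own toy example, not the authors' code): build a small word-by-paragraph count matrix, truncate its SVD to k dimensions, and compare words by cosine similarity in the reduced space.

```python
import numpy as np

# Toy word-by-paragraph count matrix (rows: words, columns: paragraphs).
words = ["bike", "handlebars", "bicycle", "flower"]
X = np.array([[2, 0, 0, 0],    # bike: paragraph 1 only
              [1, 2, 0, 0],    # handlebars: paragraphs 1 and 2
              [0, 1, 2, 0],    # bicycle: paragraphs 2 and 3
              [0, 0, 0, 3]],   # flower: paragraph 4 only
             dtype=float)

# Singular value decomposition X = T0 S0 D0', then keep the k largest values.
T0, s0, D0t = np.linalg.svd(X, full_matrices=False)
k = 2
T, S, Dt = T0[:, :k], np.diag(s0[:k]), D0t[:k, :]   # X_hat = T @ S @ Dt

word_vecs = T @ S          # each word as a k-dimensional vector

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# bike and bicycle never share a paragraph, yet come out similar after reduction:
print(cosine(word_vecs[0], word_vecs[2]))   # close to 1.0
print(cosine(word_vecs[0], word_vecs[3]))   # close to 0.0
```

The reduction collapses bike, handlebars and bicycle onto a shared dimension because handlebars bridges them, while flower, which shares no context, stays orthogonal; this is the induction of similarity the text describes.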
2.2 LSA: a Model of Learning
Apart from interesting results in the field of text retrieval, LSA can also be viewed as a general model of human learning. We first describe LSA as a model of language learning, but we will see later that it can be extended to other kinds of knowledge. LSA is a model of the way humans learn from texts since, given a set of texts read by a subject, it predicts to what extent semantic relations between words are learned. The more
texts LSA gets, the more accurate the semantic proximities are. The model has been tested by simulating word learning between the ages of 2 and 20 [8]. The authors estimated that human beings read about 3500 words a day and learn an average of 7 to 15 words per day during that period. If provided with a similar amount of text, LSA learns 10 words a day to reach, by the age of 20, a performance similar to the humans' (defined as the performance on the TOEFL test described earlier). This result is coherent with the human rate of learning. A similar method based on a high-dimensional representation has shown its ability to model human semantic memory [10]. We formalized the model in the following way:
• A domain D is composed of lexemes. In the domain of language, lexemes are words. In the domain of problem solving, lexemes are facts and conclusions (high-fever, prescribe-penicillin, meningitis, etc. in the domain of medicine). In the domain of game playing, lexemes are positions of pieces (pawn-in-E2, queen-in-F4, etc. in chess).
• A student learns the domain by being exposed to sequences of lexemes (sequences of words, sequences of facts and conclusions, sequences of piece positions, etc.).
• What is learned is semantic similarities between lexemes or sequences of lexemes (for instance, high-fever and meningitis). In chess, two boards can be semantically similar although their pieces are not in the same positions; it is a characteristic of chess masters to be able to recognize two boards as being similar.
• LSA predicts the semantic similarity between two lexemes or sequences of lexemes given the sequences of lexemes the student has been exposed to.
Some of the sequences of lexemes that are presented to the student will greatly improve learning because their structure maps the semantic structure of the domain. Other sequences will be of little interest to the student because they are either too close to or too far from the student's knowledge. Knowing which sequence is best for the student depends on the semantic structure of the domain and on the student's knowledge. Therefore, we need to represent both in the LSA formalism of knowledge representation. Afterwards, we will be able to design tutoring strategies for selecting the sequences of lexemes that are best to present to a given student. All this is akin to the well-known structure of tutoring systems [13]: expert module, user model and pedagogical module.

3 High-Dimensional Representation of Domain Knowledge

One of the main interests of our approach is that the representation of domain knowledge is automatically built from examples. These examples should be semantically valid and therefore given by experts (verbally or from books). For instance, in the domain of language learning, examples are just well-formed texts. In the domain of game playing, examples are boards together with an indication of whether each is a winning or a losing board (this information is obtained at the end of the game). From these examples, LSA builds a semantic space in which all lexemes and examples are represented. There is no need to build semantic networks or logic formulas by hand to represent the domain. Everything is done automatically by LSA. The only requirement is to find a good formalism to represent the examples. We do not need to represent all possible examples of the domain, but only enough of them to statistically capture a significant part of the latent semantics of the domain.
4 High-Dimensional Representation of Student Knowledge

In the same way, we represent the student's knowledge, that is, the student's meaning of entities (we call an entity an element of the semantic space, either a lexeme or a sequence of lexemes). Lexemes are described twice:
• as a domain entity, to represent the correct meaning of the term, constructed as shown previously from word usage in the language;
• as a student entity, to represent the student's meaning of the term.
For instance, the semantic space may contain one instance of pneumonia as a domain entity and one instance as a student entity. In the same way, sequences of lexemes are represented in the semantic space. Before being represented, that knowledge needs to be extracted. There are several ways to do this:
• the student freely produces one or several texts: student entities are then new entities;
• all the sequences of lexemes the student was exposed to and tested on are represented in the space. In that case, student entities are domain entities weighted by a score corresponding to the student's comprehension of the domain entity.
The goal is for the student entities to cover all the domain entities. If the student entities are all in the same part of the space, it probably means that the student has a gap in his knowledge. In that case, the goal of the system is to provide him with appropriate sequences of lexemes so that his knowledge covers a larger part of the space.

5 Tutoring strategies

Now that we have a representation of both domain and student knowledge, we need to design tutoring strategies. As we said before, in the LSA model, learning results from exposure to sequences of lexemes. By being exposed to sequences of lexemes in a random fashion, the student would certainly learn some lexemes, in the same way a child learns new words by reading various books. However, the process of learning can be sped up by selecting the right sequence of lexemes given the current state of the student entities. Therefore, the problem is to know which text (for language learning) or which move (for game learning) has the highest chance of enlarging the part of the semantic space covered by the student entities.
5.1 Selecting the closest sequence
Suppose we decide to select the sequence which is closest to the student sequences. If {s1, s2, ..., sn} are the student sequences and {d1, d2, ..., dp} the domain sequences, we select the dj whose distance to the student sequences, min over i of dist(si, dj), is minimal. Figure 1 shows this selection in a 2-dimensional representation (remember that LSA works because it uses many dimensions). Let us illustrate this by means of an example. Suppose the domain is composed of 82 sequences of lexemes, each corresponding to one of Aesop's fables. Then suppose that a beginner student was asked to provide an English text in order to initiate the process. The user model is composed of only this sequence of lexemes:
Figure 1: Various ways of selecting the next sequence: the closest, the closest from those out of the zones, and the farthest (domain entities and user entities sketched in two dimensions).

My English is very basic. I know only a few verbs and a few nouns. I live in a small village in the mountain. I have a beautiful brown cat whose name is Felix. Last week, my cat caught a small bird and I was very sorry for the bird. He was injured. I tried to save it but I could not. The cat did not understand why I was unhappy. I like walking in the forest and in the mountain. I also like skiing in the winter. I would like to improve my English to be able to work abroad. I have a brother and a sister. My brother is young.
Running LSA, the closest domain sequence is the following:

Long ago, the mice had a general council to consider what measures they could take to outwit their common enemy, the Cat. Some said this, and some said that; but at last a young mouse got up and said he had a proposal to make, which he thought would meet the case. "You will all agree," said he, "that our chief danger consists in the sly and treacherous manner in which the enemy approaches us. Now, if we could receive some signal of her approach, we could easily escape from her. I venture, therefore, to propose that a small bell be procured, and attached by a ribbon round the neck of the Cat. By this means we should always know when she was about, and could easily retire while she was in the neighborhood." This proposal met with general applause, until an old mouse got up and said: "That is all very well, but who is to bell the Cat?" The mice looked at one another and nobody spoke. Then the old mouse said: "It is easy to propose impossible remedies."
It is hard to tell why this text ought to be the easiest for the student. A first answer would be to observe that several words of the fable already occurred in the student's text (like cat, young, small, know, etc.). However, LSA is not limited to occurrence recognition: the mapping between the domain and the student's knowledge is more complex. A second answer is that the writer of the first text actually found that fable the easiest among a set of 10 randomly selected ones. A third answer is that LSA has been validated several times as a model of knowledge representation; however, experiments with many subjects need to be performed to validate this particular use of LSA. Although the closest sequence could be considered the easiest by the student, it is probably not suited for learning because it is too close to the student's knowledge.
5.2 Selecting the farthest sequence
Another solution would then be to choose the farthest sequence (Figure 1):

A Horse and an Ass were travelling together, the Horse prancing along in its fine trappings, the Ass carrying with difficulty the heavy weight in its panniers. "I wish I were you," sighed the Ass; "nothing to do and well fed, and all that fine harness upon you." Next day, however, there was a great battle, and the Horse was wounded to death in the final charge of the day. His friend, the Ass, happened to pass by shortly afterwards and found him on the point of death. "I was wrong," said the Ass. Better humble security than gilded danger.
That sequence was found quite hard to understand by our writer. Choosing the farthest sequence is probably not appropriate for learning either, because it is too far from the student's knowledge.
5.3 Selecting the closest sequence among those that are far enough
Neither of the previous solutions being satisfactory, a solution is to ignore the domain sequences that are too close to any of the student sequences. A zone is therefore defined around each student sequence, and domain sequences inside these zones are not considered (we present in the next section a way of implementing that procedure). Then, using the same process described in section 5.1, we select the closest sequence from the remaining ones. Figure 1 illustrates this selection. The idea that learning is optimal when the stimulus is neither too close nor too far from the student's knowledge was theorized by Vygotsky [12] with the notion of the zone of proximal development. He influenced Krashen [7], who defined the input hypothesis as an explanation of how a second language is acquired: the learner improves his linguistic competence when he receives second-language 'input' that is one step beyond his current stage of linguistic competence.
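A minimal sketch of this strategy follows (our own illustration in Python; the vectors, the cosine measure and the zone threshold are assumptions, since the paper gives no parameter values):

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def select_next(domain_vecs, student_vecs, zone=0.8):
    """Pick the domain sequence closest to the student's knowledge among
    those outside every zone (similarity above `zone` means too close)."""
    best, best_sim = None, -1.0
    for j, d in enumerate(domain_vecs):
        sims = [cosine(d, s) for s in student_vecs]
        if max(sims) >= zone:          # inside some zone: too close, skip
            continue
        if max(sims) > best_sim:       # otherwise keep the closest candidate
            best, best_sim = j, max(sims)
    return best                        # None if everything is too close

# Toy 3-dimensional stand-ins for LSA vectors
domain = np.array([[1.0, 0.1, 0.0], [0.6, 0.6, 0.1], [0.0, 0.1, 1.0]])
student = [np.array([1.0, 0.0, 0.0])]
print(select_next(domain, student))    # 1: near the student, but outside the zone
```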
5.4 Integrating the strategy into a language learning program
We designed a program in C based on the previous procedures. It relies on the result that most of the words we know were learned from reading (evidence for this assertion is provided in [8]). Therefore, the goal is, at each step, to find the most appropriate English text for French students to read in order to stretch the student subspace. First, LSA is run to place all domain texts in a semantic space. The system works in the following way: it selects a text dynamically, presents it to the student, tests his comprehension, then selects another text, and so on. The process is initialized with a text the student provides (it also works if the student provides no text; it just takes more time for the system to reach correct behavior). At the beginning, the system does not know much about the student, and therefore the selection of the next text might not be optimal. However, after a while the user model becomes more and more precise and the choice of the system more accurate. After each text is provided, the student is required to rate his comprehension on a 1 to 5 scale. The text is then added to the student model along with its weight. This weight is used to compute the proximity between a domain text and a student text: the similarity provided by LSA is multiplied by the weight. Therefore, texts that were well understood by the student play a more important role in the selection of the next text. Improvements could be made by allowing the student to select words or parts of the texts that were not understood. This is going to be part of our future work.
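The comprehension weighting can be grafted onto the previous sketch in a few lines (again our own illustration; the 1-5 rating is from the text, its normalization to 0..1 is an assumption):

```python
# Weighted variant: each student text carries a comprehension rating (1-5).
# Well-understood texts count more when measuring closeness to a candidate.
# Assumes the cosine() helper from the previous sketch.
def weighted_similarity(domain_vec, student_entries):
    """student_entries: list of (vector, rating) pairs, rating in 1..5."""
    return max(cosine(domain_vec, v) * (rating / 5.0)   # normalize to 0..1
               for v, rating in student_entries)
```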
6 A system to help learning kalah
In the same way, we designed a program to help a student learn an African game called kalah. This program, written in C, can be viewed as a tutor: it plays in such a way that the student is driven towards a state which should be optimal for his learning. Kalah is played on a board composed of two rows of 6 pits. Each player owns a row of 6 pits as well as a special pit called the kalah. The pits initially contain 6 stones each, and both kalahs are empty. Each player takes all the stones in any of his 6 pits, then spreads them over the pits anticlockwise, one stone per pit, including his kalah (but not the opponent's kalah).
Figure 2: A state of the game of kalah.
If the last stone lands in the kalah, the player plays again. If the last stone lands in an empty pit and the opponent has stones in the opposite pit, then all these stones go into the kalah. The goal is to get as many stones as possible into one's kalah. Lexemes are elements for describing a state. For instance, the lexeme a3 indicates that there are 3 stones in pit a (pits, including kalahs, are labelled from a to n). A sequence of lexemes represents a state of the game. For instance, the state shown in Figure 2 is represented by the following sequence of lexemes: a1 b3 c0 d11 e2 f12 g9 h11 i2 j11 k1 l3 m2 n4. A semantic space was built from 50,000 states automatically generated by two C programs using a traditional minimax algorithm at a depth of 3. This semantic space therefore covers a large part of the kalah semantics. Each time the student encounters a new configuration of the board, it is recorded into the space as a student entity. The system plays against the student. At each turn, it looks at the different possible moves. Each one results in a new state, that is, a new sequence of lexemes. The system looks for the new sequence of lexemes which is close enough to the student model but not too close (we rely on the same procedure described earlier). Then it plays the corresponding move. For instance, in the previous figure, there are 6 possible moves, therefore 6 possible new states (the system plays the upper row):
1. a2 b3 c0 d11 e2 f12 g9 h11 i2 j11 k1 l3 m0 n5
2. a2 b3 c0 d11 e2 f12 g9 h11 i2 j11 k1 l0 m3 n5
3. a1 b3 c0 d11 e2 f12 g9 h11 i2 j11 k0 l4 m2 n4
4. a2 b4 c1 d12 e3 f13 g9 h12 i2 j0 k2 l4 m3 n5
5. a1 b3 c0 d11 e2 f12 g9 h11 i0 j12 k2 l3 m2 n4
6. a2 b4 c1 d12 e3 f12 g9 h0 i3 j12 k2 l4 m3 n5
States 1 and 2 give the student the opportunity to play one more move and to put the last stone in an empty hole (which would allow him to capture the opposite stones). State 4 shows the student 13 stones in a hole, which is a special number because after going round the board the last stone necessarily falls in an empty hole. State 5 gives the student the possibility to play again. States 3 and 6 have nothing special. According to the student model, state 5 is considered the most appropriate. Therefore move 5 will be played by the machine. This system does not play optimally; indeed, it sometimes plays badly, since it is only concerned with driving the student towards an appropriate part of the semantics of the domain. As long as the student is aware that he is playing against a player with an apparently random behavior, we believe that this is not a problem.
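The same selection machinery applies here, with board states in place of texts. The sketch below is illustrative rather than the authors' implementation: lsa_vector is an assumed function mapping a lexeme string into the semantic space, and the 0.6 target is a placeholder for the "close but not too close" criterion.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def state_lexemes(pits):
    # Encode a board as the lexeme sequence used above, e.g.
    # {'a': 1, 'b': 3, ..., 'n': 4} -> "a1 b3 ... n4".
    return " ".join(f"{p}{pits[p]}" for p in "abcdefghijklmn")

def choose_move(moves, lsa_vector, student_states, target=0.6):
    # moves: list of (move, resulting_pits) pairs for the legal moves.
    # student_states: LSA vectors of board states the student has seen.
    # Keep the move whose resulting state is nearest the target
    # similarity to the student model.
    def badness(item):
        _move, pits = item
        vec = lsa_vector(state_lexemes(pits))
        sims = [cosine(vec, s) for s in student_states]
        return abs(max(sims, default=0.0) - target)
    return min(moves, key=badness)[0]
```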
7 Conclusion
In this paper, we relied on a high-dimensional representation of the lexemes of a domain to build the framework of a tutoring system. The goal of this tutoring system is to select the next stimuli to expose the student to, in order for his learning to be optimal. Our approach is based on a model that has been validated in the field of cognitive psychology. We believe that many domains can be addressed by our approach, as long as their examples can be expressed as sequences of lexemes. Our next task will be to design experiments with human subjects in order to develop more elaborate ways of selecting stimuli.

I thank Susan Dumais and Bellcore Labs for allowing me to use and adapt the code of the basic LSA programs. I am also grateful to Erica de Vries and Philippe Dessus for their comments on previous versions of this article.

References

[1] C. Burgess and K. Lund, Modelling Parsing Constraints with High-dimensional Context Space, Language and Cognitive Processes 12(2/3) (1997) 177–210.
[2] S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer and R. Harshman, Indexing by Latent Semantic Analysis, Journal of the American Society for Information Science 41 (1990) 391–407.
[3] S.T. Dumais, Improving the retrieval of information from external sources, Behavior Research Methods, Instruments, & Computers 23(2) (1991) 229–236.
[4] S.T. Dumais, T.A. Letsche, M.L. Littman and T.K. Landauer, Automatic cross-language retrieval using Latent Semantic Indexing, In AAAI Spring Symposium on Cross-Language Retrieval using Latent Semantic Indexing, 1997.
[5] P.W. Foltz and S.T. Dumais, Personalized Information Delivery: An Analysis of Information Filtering Methods, Communications of the ACM 35(12) (1992) 51–60.
[6] P.W. Foltz, Latent Semantic Analysis for text-based research, Behavior Research Methods, Instruments, & Computers 28(2) (1996) 197–202.
[7] S.D. Krashen, Second Language Acquisition and Second Language Learning, Prentice-Hall International, 1988.
[8] T.K. Landauer and S.T. Dumais, A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction and Representation of Knowledge, Psychological Review 104(2) (1997) 211–240.
[9] B. Lemaire, Models of High-dimensional Semantic Spaces, In Proceedings of the 4th International Workshop on Multistrategy Learning (MSL'98), June 1998.
[10] K. Lund and C. Burgess, Producing high-dimensional semantic spaces from lexical co-occurrence, Behavior Research Methods, Instruments, & Computers 28(2) (1996) 203–208.
[11] B. Rehder, M.E. Schreiner, M.B. Wolfe, D. Laham, T.K. Landauer and W. Kintsch, Using Latent Semantic Analysis to assess knowledge: Some technical considerations, Discourse Processes 25 (1998) 337–354.
[12] L.S. Vygotsky, Thought and Language, M.I.T. Press, Cambridge, 1962.
[13] E. Wenger, Artificial Intelligence and Tutoring Systems, Morgan Kaufmann, 1987.
Artificial Intelligence in Education S.P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
Improving an intelligent tutor's comprehension of students with Latent Semantic Analysis*

Peter Wiemer-Hastings, Katja Wiemer-Hastings and Arthur C. Graesser
Department of Psychology, The University of Memphis, Memphis TN 38152-6400
Abstract

AutoTutor is an intelligent tutor that interacts smoothly with the student using natural language dialogue. This type of interaction allows us to extend the domains of tutoring. We are no longer restricted to areas like mathematics and science where interaction with the student can be limited to typing in numbers or selecting possibilities with a button. Others have tried to implement tutors that interact via natural language in the past, but because of the difficulty of understanding language in a wide domain, their best results came when they limited student answers to single words. Our research directly addresses the problem of understanding what the student naturally says. One solution to this problem that has recently emerged is Latent Semantic Analysis (LSA). LSA is a statistical, corpus-based natural language understanding technique that supports similarity comparisons between texts. The success of this technique has been described elsewhere [3, 5, for example]. In this paper, we give an overview of LSA and how it is used in our tutoring system. Then we focus on an important issue for this type of corpus-based natural language analysis, namely: how large must the training corpus be to achieve sufficient performance? This paper describes two studies which address this question and systematically test the kinds of texts needed in the corpus. We discuss the implications of these results for tutoring systems in general.

*This project is supported by grant number SBR 9720314 from the National Science Foundation's Learning and Intelligent Systems Unit, and by the work of the other members of the Tutoring Research Group at the University of Memphis: Ashraf Anwar, Myles Bogner, Patrick Chipman, Scotty Craig, Rachel DiPaolo, Stan Franklin, Max Garzon, Barry Gholson, Doug Hacker, Xiangen Hu, Derek Harter, Jim Hoeffner, Jeff Janover, Bianca Klettke, Roger Kreuz, Kristen Link, Johanna Marineau, Bill Marks, Lee McCaulley, Michael Muellenmeister, Fergus Nolan, Brent Olde, Natalie Person, Victoria Pomeroy, Melissa Ring, Holly Yetman, and Zhaohua Zhang. We wish to thank Peter Foltz and Walter Kintsch for their comments on our use of LSA, and Priti Shah for suggestions on presentation of the results. We also wish to acknowledge very helpful comments on a previous draft by three anonymous reviewers.
1 Introduction
In the past, many intelligent tutoring systems have been developed in scientific or mathematical tutoring domains. Topics in such domains can be relatively cleanly defined, with a set of problem-solving exercises and expected answers. This scientific bent also fits in well with the interests of many AI researchers. However, the advantages of this approach come at a cost. First, it confines tutoring domains to a narrow range of topics. Second, such tutoring systems are inflexible in accommodating different and perhaps more efficient modes of learning. Entering numerical answers into computers is just one way of interacting with a tutor. Some education researchers have argued that students learn better when they verbally process the learning material in a tutoring situation [1, for example]. We are using analyses of human-human tutoring situations and a set of new technologies to develop an intelligent tutor that interacts with students through such natural tutorial dialogue. The primary goal of the project is to produce natural interaction, not to increase student learning; following the educational results cited above, we assume that a cooperative, constructive dialogue will increase learning. A key technological requirement for this project is a tool that robustly understands the students' natural language contributions. A corpus-based, statistical technique called Latent Semantic Analysis (LSA) has recently been used in other text analysis tasks. Its comprehension performance correlates well with human experts. We use LSA to evaluate student contributions and help the tutor decide what dialogue move to perform next. This paper gives a broad overview of LSA and how it is used in our tutoring system, AutoTutor. We present findings which show that LSA performs comparably with human raters in evaluating the quality of student answers. Our discussion focuses on a key issue for such a corpus-based natural language mechanism: the amount of corpus material that is needed to provide adequate performance. Then we address a follow-up issue of how closely that corpus should be related to the tutoring topic.
2 Overview of AutoTutor
To facilitate our description of the language understanding module, we give a general overview of the AutoTutor architecture here. For a more detailed description, see [9]. The basic "protocol" of a tutoring session with AutoTutor is modeled on human tutoring sessions [4, 6]. The tutor asks a question or poses a problem, and collaborates with the student to construct what the tutor judges to be a fairly complete answer to the question. Then the process repeats. Most human tutors are not highly trained, but are instead peers of the students. Tutors often use simple props or drawings to help their students learn. Tutors do not get very far "into the heads" of students [6]; they typically have only a shallow understanding of what the students say, but can determine whether a response is in the general vicinity of the expected answer. Despite the lack of complete understanding, survey studies have shown a huge advantage for face-to-face tutoring sessions over classroom situations [2]. The user interface to AutoTutor consists of two windows: one for displaying animated or static graphics, and one for the student to type in her replies (a speech understanding mechanism will be integrated in a later stage of the project). There is also a
talking head on the screen which speaks AutoTutor's contributions (with moving lips) and gestures to appropriate parts of the graphical display. AutoTutor's knowledge of its tutoring domain resides in a curriculum script. This is not a script like the proverbial restaurant script or a script in a play, but a static representation of the questions or problems that the tutor is prepared to handle in a tutoring situation [7]. AutoTutor's current curriculum script contains three different topics within our tutoring domain. For each topic, there are 12 different questions or problem-solving exercises which are graded from easy to hard, based on theoretical analyses of what it will take to completely solve them. For each question or problem there is also: (a) an optional textual or animated information delivery item, (b) a relatively lengthy complete and correct "ideal" answer, (c) that ideal answer broken down into a set of specific good answers which each address one aspect of the ideal answer, (d) a set of additional good answers, (e) a set of bad answers, (f) a set of questions that the student would be likely to ask, with appropriate answers, and (g) a succinct summary. For each aspect of the ideal answer there are three additional items to help the student construct that aspect: a hint, a prompt, and an elaboration. The current tutoring domain for AutoTutor is computer literacy. This is a required class at the University of Memphis, so we have easy access to students on whom we can test the system. Several members of the project have experience teaching this class. Although it may seem to be a contradiction of our stated desire of steering away from a more formal or scientific domain, the class is full of issues like the relative merits of the Macintosh and Windows operating systems, or different approaches to promoting computer security. AutoTutor's curriculum script focuses on such issues and deep reasoning questions.
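The shape of a curriculum script item can be summarized with a small data structure. The sketch below is ours, not AutoTutor's code; the type and field names are hypothetical labels for items (a) through (g) and the per-aspect help items.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Aspect:
    """One specific good answer covering an aspect of the ideal answer,
    plus the three items used to help the student construct it."""
    good_answer: str
    hint: str
    prompt: str
    elaboration: str

@dataclass
class ScriptItem:
    """One question or problem-solving exercise, items (a)-(g) above."""
    question: str
    info_delivery: Optional[str]                  # (a) text/animation, optional
    ideal_answer: str                             # (b)
    aspects: List[Aspect]                         # (c) with per-aspect help
    additional_good_answers: List[str]            # (d)
    bad_answers: List[str]                        # (e)
    anticipated_questions: List[Tuple[str, str]]  # (f) question/answer pairs
    summary: str                                  # (g)
    difficulty: int = 1                           # graded from easy to hard
```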
3 Assessing student answers with LSA
As previously mentioned, LSA is a corpus-based, statistical mechanism. It was originally developed for the task of information retrieval: searching a large database of texts for a small number of texts which satisfy a query. A number of researchers have recently evaluated LSA on other tasks, from taking the TOEFL synonym test to grading student papers [5, 3]. We give a broad overview of LSA here, and concentrate on its use in AutoTutor. (For more details about LSA, see the recent special issue of Discourse Processes, volume 25, numbers 2 & 3, 1998, on quantitative approaches to semantic knowledge representations.)

The training of LSA starts with a corpus separated into units which we will call texts here. For the AutoTutor corpus, we used the curriculum script, with each item as a separate text for training purposes. The corpus also included a large amount of additional information from textbooks and articles about computer literacy. Each paragraph of this additional information constituted a text. The paragraph is said to be, in general, a good level of granularity for LSA analysis because a paragraph tends to hold a well-developed, coherent idea (Peter Foltz, personal communication, October 1997). LSA computes a co-occurrence matrix of terms and texts. A "term" for LSA is any word that occurs in more than one text. The cells in this matrix are the number of times a particular term occurs in a particular text. A log-entropy weighting is applied to this matrix to emphasize the difference between the frequency of occurrence of a term in a particular text and its frequency of occurrence across texts. Then the matrix is
reduced to an arbitrary number of dimensions, K, by a type of principal components analysis called singular value decomposition (SVD). The result is a set of weightings (the singular values, or eigenvalues) and a set of K-long vectors: one for each term, and one for each text. The normalized sum of the vectors of the terms in any text equals the vector for the text. The distance between any two vectors is conveniently calculated as the cosine between them. This distance is interpreted as the semantic distance, or similarity, between the terms or texts. A cosine close to 1 indicates high similarity. A cosine of 0 (for an orthogonal vector in the K-dimensional space) indicates low similarity or complete unrelatedness. It appears that the data compression of the SVD forces terms that occur in similar contexts to have similar representations; it is claimed that this contextual co-occurrence carries semantic information.

The training is done in advance of the AutoTutor tutoring sessions. AutoTutor uses the results of the training to evaluate student responses in the following way: a vector for the student contribution is calculated by summing the vectors of the terms included in the contribution. This vector is compared with the text vectors of some of the curriculum script items for the current topic. In particular, AutoTutor calculates a general goodness and badness rating by comparing the student contribution with the set of good and bad answers in the curriculum script for the current topic. More importantly, it compares the student response to the particular good answers that cover the aspects of the ideal answer. We calculate two measures with this comparison:

• Completeness: the percentage of the aspects of the ideal answer for the current topic which "match" the student response

• Compatibility: the percentage of the student response (broken down into speech acts) that "matches" some aspect of the ideal answer

A "match" is defined as a cosine between the response vector and the text vector above a critical threshold. By comparing human ratings of these same measures with the LSA ratings computed with a variety of thresholds and dimensionalities (Ks), we can empirically determine which settings work best for a given task and corpus. As described in [10], such an evaluation showed that a threshold of .55 with a 200-dimensional space correlated highest with the average ratings of four human raters (r = 0.49). Two human raters with intermediate knowledge of computer literacy correlated with each other at r = 0.51 (these correlations are for the Compatibility metric defined above; two domain experts correlated on this metric at r = 0.78; the correlations between LSA and humans were lower for the Completeness score because of differences in the way the non-LSA portion of the measure was computed). Because we are starting the project by attempting to model untrained human tutors (who produce excellent learning gains), we are quite happy with this level of performance. A third variable that affects the performance of LSA in such a task is the size of the training corpus. This issue is of great practical significance for others wishing to create such a corpus-based natural language understanding mechanism. The remainder of this paper describes our exploration of this issue.
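The pipeline just described (raw counts, log-entropy weighting, truncated SVD, cosine matching against a threshold) can be sketched compactly. This is an approximation under stated assumptions, not AutoTutor's implementation: exact weighting and vector-scaling conventions vary across LSA systems, and the helper names are ours.

```python
import numpy as np

def train_lsa(counts, k=200):
    # counts: (n_terms, n_texts) matrix of raw frequencies for terms
    # that occur in more than one text.
    n_texts = counts.shape[1]
    gf = np.maximum(counts.sum(axis=1, keepdims=True), 1)
    p = counts / gf
    # Global entropy weight per term: near 1 for concentrated terms,
    # near 0 for terms spread evenly across texts.
    ent = np.where(counts > 0, p * np.log(np.where(p > 0, p, 1)), 0.0).sum(axis=1)
    g = 1.0 + ent / np.log(n_texts)
    weighted = g[:, None] * np.log(counts + 1.0)   # log-entropy weighting
    U, S, Vt = np.linalg.svd(weighted, full_matrices=False)
    term_vecs = U[:, :k] * S[:k]                   # one K-vector per term
    text_vecs = Vt[:k, :].T * S[:k]                # one K-vector per text
    return term_vecs, text_vecs

def contribution_vector(tokens, term_index, term_vecs):
    # A student contribution is the normalised sum of its term vectors
    # (assumes at least one token is a known term).
    v = np.sum([term_vecs[term_index[t]] for t in tokens if t in term_index],
               axis=0)
    return v / np.linalg.norm(v)

def completeness(response_vec, aspect_vecs, threshold=0.55):
    # Fraction of ideal-answer aspects that the response "matches".
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return sum(cos(response_vec, a) >= threshold
               for a in aspect_vecs) / len(aspect_vecs)
```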
Figure 1: Evaluation performance by different sized corpora (plotted against threshold)
4 How much corpus is enough?
In order to evaluate the contribution of the size of the corpus to LSA's performance, we randomly excised items from the supplemental corpus (i.e., the textbook material). It was necessary to keep the curriculum script items in the corpus in order to evaluate the metrics, but they account for only 15% of the entire 2.3 MB corpus. The supplementary corpus was split into two parts: the "specific" subcorpus deals with the tutoring topics (computer hardware, software, and the internet); the "general" subcorpus covers other areas of computer literacy. The specific and general subcorpora accounted for 47% and 38% of the total corpus, respectively. We tested 4 different amounts of corpus and maintained the balance between specific and general text by randomly removing none, 1/3, 2/3, or all of each of the specific and general subcorpora. The ideal balance between specific and general text is discussed below. Because the size of the corpus could affect the dimensionality and threshold, we tested the performance with a 4x3x19 design: four levels of corpus size, 3 different dimensionalities (200, 300, and 400) that had previously performed well, and 19 critical threshold values, from 0.05 to 0.95 in 0.05 increments. For each combination of these factors, we tested LSA's correlation with the ratings of the human raters.

We performed a multivariate analysis of variance (MANOVA) on these data, with correlation between the LSA rating and average human rating as the dependent variable and corpus size, dimensionality, and threshold value as predictors. We obtained main effects for amount of text (significant at the 0.01 level), number of dimensions (significant at the 0.05 level), and threshold value (significant at the 0.01 level). There were also significant interactions between the size of the corpus and thresholds, and between dimensions and thresholds. Figure 1 plots performance for each level of corpus size by threshold, averaged across the different levels of dimensions. As expected, LSA's performance with the entire corpus was best, both in terms of the maximum correlation with the human raters and in terms of the width of the threshold value range in which it performs well. One surprising result is the negligible difference between the 1/3 and 2/3 corpora (the two lines with intermediate performance in the middle thresholds). Clearly there is not a linear relation between the amount of text and the performance of LSA. Another surprise was the relatively high performance of the corpus without
any of the supplemental items, that is, with the curriculum script items alone. This demonstrates that a relatively small set of items that are closely relevant to the given task can produce acceptable performance with LSA.

Figure 2: Performance with no additional corpus materials (plotted against threshold)

Figure 2 shows the performance of the curriculum-script-only corpus for the three different dimensionalities. There is an interaction between the number of dimensions and the threshold values: at lower dimensionalities, LSA performs better with higher thresholds (at the highest threshold values, no student contribution ever exceeded the threshold, so no correlations with human ratings could be computed). In addition, the figure shows that LSA achieves its best performance with this corpus in the 200- and 300-dimensional spaces, reaching a maximum correlation of r = 0.43 with the human raters. This is almost as high as the maximum correlation we obtained for the entire corpus (r = .49).
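The evaluation design amounts to a grid search over corpus size, dimensionality, and threshold. A schematic version, with build_space and rate_with_lsa as hypothetical stand-ins for the training and rating machinery, might look like this:

```python
import numpy as np
from itertools import product

def grid_search(build_space, rate_with_lsa, rated_items, corpora):
    # rated_items: list of (student_contribution, mean_human_rating).
    # For each (corpus, K, threshold) cell of the 4x3x19 design, train
    # an LSA space and correlate its ratings with the human means.
    results = {}
    for corpus, k in product(corpora, (200, 300, 400)):
        space = build_space(corpus, k)        # train once per (corpus, K)
        for thr in np.arange(0.05, 1.00, 0.05):
            lsa = [rate_with_lsa(space, item, thr) for item, _ in rated_items]
            human = [r for _, r in rated_items]
            results[(corpus, k, round(float(thr), 2))] = \
                np.corrcoef(lsa, human)[0, 1]
    return results
```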
5 Does the LSA corpus need specific or general text?
In our first experiment, we kept the original balance between the amounts of domain-specific and domain-general text. We originally arrived at this balance in an effort to give LSA a "well-rounded education". We were advised against using a very general corpus like an encyclopedia, which others have used for other tasks [5], because it would dilute the knowledge base by flooding it with terms that the system would never encounter in the tutoring domain (Peter Foltz, personal communication, October 1997). We did want to include a range of texts from within computer literacy so that the student could bring in other technical terms that were not strictly within the confines of the tutoring topics (hardware, software, and the internet). We collected all the text from two computer literacy textbooks, and supplemented the chapters on the tutoring topics with 10 additional articles or book chapters about each of those three topics.

We did a further manipulation of the training corpus to address the question of which ratio between domain-specific and domain-general texts is best. We used the same 4 partitions of each subcorpus as in the previous experiment, but this time combined the parts in each of the 16 possible ways. We again compared the performance of each resulting LSA space with three different dimensionalities and 19 different threshold
values. A MANOVA of these data showed main effects of the size of the specific corpus, the size of the general corpus, and the threshold values (all significant at the 0.01 level). There were interaction effects between specific and general corpus size, specific corpus and thresholds, general corpus and dimensions, general corpus and thresholds, and dimensions and thresholds (the interaction between the general corpus and the dimensions was significant at the 0.05 level, all others at the 0.01 level).

Figure 3: Effects of varying ratios of specific and general corpora (grouped by percent of general corpus)

Figure 3 shows the relationships obtained between the specific corpus size and the general corpus size, with the performance averaged across the number of dimensions. Each group of bars represents one level of general corpus. Within the groups, the specific corpus varies. A full line graph of all of these data is similar to that shown in figure 1. The best and worst performance were again produced by the full corpus and the curriculum-script-only corpus. All of the other corpora were crowded in between. Figure 3 again shows the non-linearity of performance that was apparent in the first experiment. The largest and smallest corpora perform significantly better and worse, respectively, than the others. But the other levels of corpus size and balance are almost indistinguishable. These findings support the general notion that more of the right kind of text is better for LSA. But they also suggest that empirical testing of the corpora is still necessary. A smaller corpus takes less time to train, less storage space, and less processing time for comparisons. Thus, if a larger corpus provides no significant performance advantage, it can be avoided.
6 Discussion
An intelligent tutoring system which can interact with a student using natural language promises many advantages over traditional systems, both in the range of tutoring domains and tutoring styles available, and it may lead to better learning. A critical technology for such a system is a natural language processing mechanism that can robustly understand student input, but this goal has been elusive for decades. LSA provides such a mechanism, performing at the level of human raters with intermediate domain knowledge. We think it will allow us to create an intelligent tutor that simulates a human tutor. Our analyses of different sizes of training material showed monotonic but not linear increases in LSA performance. This supports the general benefit of increasing the size of
the training corpus with relevant text. Our manipulation of the balance between general and specific texts supports our initial hypothesis that an approximately equal balance, or one slightly favoring the specific texts, is advantageous to LSA. In regard to the generality of the findings reported here, it has been noted in previous work that there is a positive correlation between the length of a text and the ability of LSA to accurately judge its quality [8]. For grading essays, LSA produced the most reliable grades when the length of the (manipulated) text was above 200 words. In the tutoring task, the length of our student contributions is significantly smaller, averaging 16 words. It is likely that this limits the level of performance of the LSA mechanism. However, it is also likely that short texts are more difficult for humans to assess, as evidenced by the correspondingly low correlations between our intermediate human raters. Fill-in-the-blank interfaces can lead students to a guess-and-test approach to a tutoring situation. We hope that by pushing students to construct full natural language answers to questions, they will learn better. Our analyses of the performance of LSA suggest that it can provide the same level of distinction as that shown by untrained human tutors, and thereby better support the learning process.
References

[1] M.T.H. Chi, N. de Leeuw, M. Chiu, and C. LaVancher. Eliciting self-explanations improves understanding. Cognitive Science, 18:439–477, 1994.
[2] P.A. Cohen, J.A. Kulik, and C.C. Kulik. Educational outcomes of tutoring: A meta-analysis of findings. American Educational Research Journal, 19:237–248, 1982.
[3] P.W. Foltz. Latent Semantic Analysis for text-based research. Behavior Research Methods, Instruments, and Computers, 28:197–202, 1996.
[4] A.C. Graesser, N.K. Person, and J.P. Magliano. Collaborative dialogue patterns in naturalistic one-to-one tutoring. Applied Cognitive Psychology, 9:359–387, 1995.
[5] T.K. Landauer and S.T. Dumais. A solution to Plato's problem: The Latent Semantic Analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104:211–240, 1997.
[6] N.K. Person. An analysis of the examples that tutors generate during naturalistic one-to-one tutoring sessions. PhD thesis, University of Memphis, Memphis, TN, 1994.
[7] R.T. Putnam. Structuring and adjusting content for students: A study of live and simulated tutoring of addition. American Educational Research Journal, 24:13–48, 1987.
[8] B. Rehder, M. Schreiner, D. Laham, M. Wolfe, T. Landauer, and W. Kintsch. Using Latent Semantic Analysis to assess knowledge: Some technical considerations. Discourse Processes, 25:337–354, 1998.
[9] P. Wiemer-Hastings, A. Graesser, D. Harter, and the TRG. The foundations and architecture of AutoTutor. In B. Goettl, H. Halff, C. Redfield, and V. Shute, editors, Intelligent Tutoring Systems: Proceedings of the 4th International Conference, pages 334–343, Berlin, 1998. Springer.
[10] P. Wiemer-Hastings, K. Wiemer-Hastings, and A. Graesser. Evaluating student answers in a tutoring session with Latent Semantic Analysis. In preparation.
Artificial Intelligence in Education S.P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
Modeling Pedagogical Interactions with Machine Learning

Sandra Katz
Learning Research and Development Center
University of Pittsburgh, Pittsburgh PA 15260
katz+@pitt.edu

John Aronis, Colin Creitz
Department of Computer Science
University of Pittsburgh, Pittsburgh PA 15260

Abstract: This paper describes our approach to coding pedagogical interactions that took place between avionics students and domain experts in an ITS for electronic troubleshooting called SHERLOCK 2. We also describe prototype machine learning systems that we developed to learn grammars of discourse structure. The grammars revealed instructional functions of particular speech acts that we had not been aware of, and provided concise models of explanations common in diagnostic tasks.
1. Introduction

Models of classroom-based teaching and one-on-one or small-group tutoring can guide teacher training, the development of automated tutors, and research on the features of instructional dialogue that correlate with learning (e.g., [12]). We describe our approach to coding pedagogical interactions that took place between avionics students and domain experts in an ITS for electronic troubleshooting called SHERLOCK 2 (e.g., [8]). We then describe prototype machine learning systems that we developed to learn grammars of discourse structure. We discuss the results of our first attempt to use these systems to model the coded SHERLOCK 2 dialogues. In addition to modeling discourse structure, these techniques led to discoveries about the connections between form and function in tutorial dialogue.

2. The Corpus of Tutorial Dialogue and How We Coded It

We obtained a substantial corpus of problem-solving and reflective dialogue (approximately 35 one-hour sessions). The data were collected to identify the kinds of support students need to collaborate effectively in SHERLOCK 2 [6]. We observed eight pairs of avionics students (dyads) who were coached by eight experienced avionics technicians (mentors) from local Air National Guard units. Students in each dyad took turns in the roles of problem solver and peer coach. In apprenticeship fashion, the mentors' role was to guide the student coach when he could not help his peer, to intervene when they deemed it necessary, and to comment on the students' performance. Interaction during problem solving was typed via a "chat" window and was spoken during the post-performance review (PPR).
Advise
Disclose
Question
    Request
    Disjunctive-Request
    Diagnostic-Query
Appraise
    Confirm
    Refute
    Correct
Direct
Make-Sense
    Interpret
    Convert
    Predict
    Hypothesize
Instruct
    Define
    Identify
    Compare
    Inform
Explain
    Knowledge-Explanation
    Step-Explanation
    Causal-Process-Description
    Strategic-Goal-Explanation
Justify
    Step-Justification
    Strategic-Goal-Justification
    Knowledge-Justification
Table 1: Speech Acts During Learning Interactions

Our approach to conversational analysis is hierarchical. We describe speaker intent using a taxonomy of speech acts [10] tailored to instructional dialogue (see Table 1). These speech acts map onto the domain-neutral descriptors of speaker intent in other systems—e.g., Instruct maps to Enable in Rhetorical Structure Theory [9]. Similarly, the taxonomy of topics we use maps onto general semantic categories—e.g., actions, states, and events. Our approach thus supports discourse modeling from general and instruction-specific perspectives. Some of the speech acts are "atomic", because they are not subdivided further—i.e., Advise, Disclose, Question, Appraise, Direct, Make-Sense, Instruct, and their members. Other speech acts—Explain, Justify, and their members—are "complex" because they consist of atomic speech acts and, recursively, other complex speech acts. Following Chandrasekaran et al.'s classification of explanations in diagnostic domains [2], we distinguish between explanations about how a system works (Knowledge-Explanations), explanations about system status according to the results of diagnostic actions (Step-Explanations), descriptions of causal mechanisms (Causal-Process-Descriptions), and explanations about how and why to achieve particular problem-solving goals (Strategic-Explanations). We specify speech acts further by describing their relationships with other speech acts—for example, a Knowledge-Explanation of an Advise act, which we write as "Knowledge-Explanation of advice". We parse each speaker's turn (contribution) into its constituent speech acts. The intermediate nodes of the parse tree consist of complex speech acts, while the leaves are atomic speech acts. To represent dialogue interactions, we group speaker turns into exchanges. Finally, at the highest level of analysis, we group exchanges into conversations. A conversation contains a series of exchanges about a particular topic. Figure 1 illustrates a short conversation coded using this scheme. See [7] for details.
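The hierarchical scheme (segments within contributions, contributions within exchanges, exchanges within conversations, with complex acts nesting recursively) maps naturally onto a small set of types. The sketch below is our illustration; the class and field names are not from the coding manual [7].

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Segment:
    """A leaf of the parse tree: one atomic speech act."""
    text: str
    speech_act: str              # e.g. "Diagnostic-Query", "Disclose"
    topic: str

@dataclass
class ComplexAct:
    """An intermediate node: Explain, Justify, and their members,
    possibly specified relative to another act."""
    label: str                   # e.g. "Knowledge-Explanation of advice"
    children: List[Union[Segment, "ComplexAct"]]

@dataclass
class Contribution:
    """One speaker turn, parsed into constituent speech acts."""
    speaker: str                 # "Tutor" or "Student"
    acts: List[Union[Segment, ComplexAct]]

@dataclass
class Exchange:
    kind: str                    # e.g. "Instruction in procedural knowledge"
    initiating: Contribution
    response: Contribution

@dataclass
class Conversation:
    """A series of exchanges about a particular topic."""
    topic: str
    exchanges: List[Exchange]
```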
CONVERSATION
  Exchange 1 — Type: Instruction in procedural knowledge
    Initiating Contribution
      Speaker: Tutor
      Text: "Do you know how to write 9 in binary?"
      Segment 1 — Text: "Do you know how to write 9 in binary?"; Speech Act: Diagnostic query; Topic: How to convert numbers
    Response Contribution
      Speaker: Student
      Text: "Hold on... Yes, it would be 0100, right?"
      Segment 1 — Text: "Hold on... Yes"; Speech Act: Disclose; Topic: Mental/emotional state
      Segment 2 — [...]

Figure 1: A Hierarchical Approach to Dialogue Analysis

3. Methodology

Traditionally, machine learning has been used to analyze scientific, medical, and business data. Since text and discourse lack the rigid structure of formatted databases, they have been largely neglected by the machine learning community (but see [5]). In this section we give an overview of three approaches we used to analyze the data from tutorial sessions in SHERLOCK 2.

The simplest way to compare two sets of contributions is to compute aggregate properties of each set, then compare these properties. For instance, we can distinguish Knowledge-Explanations from Step-Explanations by comparing the fraction of speech acts of type Inform and Interpret using the following mixture table:
                Knowledge-Explanation    Step-Explanation
Inform                  .60                    .01
Interpret               .20                    .65
It is apparent from this table that Knowledge-Explanations are distinguished from Step-Explanations by a greater fraction of Inform acts and a lower fraction of Interpret acts. Although this kind of analysis is often useful, it has serious shortcomings, since it removes most of the structure from the original data.

Because order plays a crucial role in discourse, we built a system called String Rule Learner (SRL) that can learn patterns of speech acts. These patterns specify which speech acts are present in the contributions being analyzed and in what order. Patterns are similar to regular expressions and are matched against contributions: a speech act in the pattern must match that speech act in the contribution, a speech act with a Kleene-star must match zero or more occurrences of that speech act in the contribution, and a speech act with a Kleene-plus must match one or more occurrences of that speech act in the contribution. Patterns may also contain the token Any, which can match any speech act in the contribution. SRL attempts to characterize a set of contributions, C, by performing a heuristic search to find patterns that cover a large proportion of C but little of its complement. For instance, SRL learned the following rules to characterize Knowledge-Explanations of corrections during problem solving and the same speech act during PPR, respectively:

    Problem-Solving:  Identify Any*
    PPR:              Inform Any*
That is, Knowledge-Explanations of corrections during problem solving often begin with an Identify, but during PPR they often begin with an Inform. Although these rules show a structural difference between explanations during problem solving and PPR while highlighting an interesting symmetry, they remain silent about what occurs after the initial speech act.
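The pattern semantics just described (literal acts matched in order, Any as a wildcard, Kleene star and plus for repetition) can be emulated with ordinary regular expressions over sequences of act labels. The following sketch shows only the matching step; SRL's heuristic search for high-coverage patterns is not reproduced here, and the function names are ours.

```python
import re

def compile_pattern(pattern):
    # Compile a speech-act pattern such as "Identify Any*" into a
    # regular expression over a ';'-separated sequence of act labels.
    parts = []
    for tok in pattern.split():
        suffix = "*" if tok.endswith("*") else "+" if tok.endswith("+") else ""
        name = tok.rstrip("*+")
        atom = r"[^;]+" if name == "Any" else re.escape(name)
        parts.append(f"(?:{atom};){suffix}")
    return re.compile("^" + "".join(parts) + "$")

def covers(pattern, acts):
    # acts: the ordered list of atomic speech acts in one contribution.
    return bool(compile_pattern(pattern).match(";".join(acts) + ";"))

# e.g. covers("Identify Any*", ["Identify", "Inform"]) -> True
#      covers("Inform Any*", ["Identify", "Inform"])  -> False
```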
Also, these rules say nothing of any possible phrase or recursive structure of contributions. Furthermore, rules found by SRL are expressed solely in terms of the original vocabulary of atoms provided to the system. These shortcomings stem from several sources. First, supervised learning methods, such as SRL, simply learn to distinguish different classes of objects. This is often the right thing to do when the researcher approaches the data with a question of the form, "how are X's different from Y's?" However, it might focus learning on distinctions that successfully discriminate between classes but have no meaning in any larger sense. Second, while SRL's use of patterns is more expressive than approaches based on a fixed set of features, regular expressions are inadequate to capture the full structure of discourse. Finally, like most machine learning systems, SRL cannot invent new terms or extend its basic representation. In the remainder of this section, we describe the Grammar Learner (GL) system and our efforts to overcome some of these limitations.

In a probabilistic context-free grammar (PCFG), each rule is augmented with the probability that the category on the left-hand side of the rule will be rewritten according to that rule [3]. For instance:

    Knowledge-Explanation  →  X (.3) | X Causal-Process-Description (.7)
    X                      →  Identify-Group X (.6) | Identify-Group (.4)
    Identify-Group         →  Inform (.3) | Identify (.4) | Interpret (.3)
Like a context-free grammar, a PCFG determines a language. It also determines the distribution of sentences in that language. For instance, Knowledge-Explanations of corrections are more likely to end with a Causal-Process-Description than not. We introduce some conventions to make PCFGs easier to read. Since we are usually not interested in the exact probabilities of rewrite rules but only in their relative frequency, we may omit the numbers and simply order rewrite rules according to their probabilities. Also, since many recursive rules, such as the second rule in the above grammar, specify simple iteration, we may rewrite them using the Kleene-plus notation of regular expressions. Finally, we indicate optional elements of a rule by enclosing them in curly brackets. With these conventions, our grammar can be written:

    Knowledge-Explanation  →  Identify-Group+ {Causal-Process-Description}
    Identify-Group         →  Identify | Inform | Interpret
Although some information is lost, we find this format to be more useful than the original notation. In any case, the original PCFG is available to make finer distinctions. We can interpret a set of contributions to be sentences in a language. To find the structure of this language we assume it is generated by a hidden process, then search for the best hypothesis G for this generator, which must both account for the data, D, and also be highly likely. These goals are generally at odds with each other: more complex processes can better account for the data, but simpler processes are more likely. Bayesian statistics provides a partial answer to this dilemma [4, 11]. Given a set of contributions D, the probability that G is the generator of D is:

    P(G|D) = P(D|G) P(G) / P(D)
Since we want to find a PCFG with the maximum posterior probability given a set of contributions D, we search for a G that maximizes this equation. That is, we seek a G that both accounts for the data (makes P(D|G) large) and is likely (makes P(G) large).
Finding a satisfactory hypothesis G requires the solution of three problems: 1) we must compute the probability of the data D given a grammar G, 2) we must compute the prior probability of a grammar G, and 3) we must search the space of grammars for one that maximizes P(G|D). Several authors (e.g., [3]) describe how to compute P(D|G). This is generally tractable, although if G is highly ambiguous we must resort to approximate methods. If we assume all grammars are equally likely a priori, then computing P(G) can be done combinatorially based on the size of G. See [1] for details; [11] describes a similar approach to learning PCFGs.
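In schematic form, the scoring that GL's search must perform looks like the following. This is a simplified sketch: derivations are assumed to be given and unambiguous (ambiguous grammars need the inside algorithm or approximation, as noted above), and the size-based prior is a crude stand-in for the combinatorial prior of [1].

```python
import math

def log_prior(grammar):
    # grammar: list of (lhs, rhs_symbols, probability) rules. Simpler
    # grammars get higher prior: penalise total grammar size.
    size = sum(1 + len(rhs) for _lhs, rhs, _p in grammar)
    return -size * math.log(2)

def log_likelihood(derivations, grammar):
    # derivations: for each contribution, the list of (lhs, rhs) rules
    # used to generate it, assumed given and unique here.
    prob = {(lhs, tuple(rhs)): p for lhs, rhs, p in grammar}
    return sum(math.log(prob[(lhs, tuple(rhs))])
               for deriv in derivations for lhs, rhs in deriv)

def log_posterior(derivations, grammar):
    # log P(G|D) = log P(D|G) + log P(G) - log P(D); the P(D) term is
    # the same for every candidate grammar, so search can ignore it.
    return log_likelihood(derivations, grammar) + log_prior(grammar)
```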
4. Results

The techniques described in the previous section enabled us to address several questions about the structure, composition, and function of various components of tutorial dialogue. As researchers who have used machine learning in other domains have noticed, the rules with highest precision and the features with highest proportions in the mixture tables tend to tell you what you already know. It is the next tier of rules that often focuses new forays into the data that lead, in turn, to new discoveries. This was our experience too. For example, we expected that system Knowledge-Explanations would be rife with Inform acts, and that Step-Explanations would center around Interpret acts. Although finding these patterns increased our confidence that our coding of the data was reasonable, they weren't very interesting. But the rules on the second rung hinted, "something is going on here..."—for example, with Identify acts in Knowledge-Explanations and Predict acts in Causal-Process-Descriptions. Given the large number of speaker contributions in our corpus (over 3000), it would have been impossible to discover by hand what instructional role particular speech acts, in particular positions, were playing. We describe some of our findings below.

Why are Identify acts a prominent component of system Knowledge-Explanations? The mixture tables told us that approximately 15% of all Knowledge-Explanations contain Identify speech acts, and the proportions are even higher for particular types of Knowledge-Explanations. For example, 45% of Knowledge-Explanations of advice and corrections that took place during problem solving contained Identify acts. At first, we assumed that Identify was simply performing a referential function: the mentor was naming domain objects discussed in the explanation in order to establish a common ground of reference with the student. Our analysis of the data confirmed that Identify does indeed perform this function. However, identification was also playing more subtle instructional roles. In the example Knowledge-Explanation of a correction below, which occurred during problem solving, note how the mentor integrates the schematic label for "pin 48" (3.1) with instruction about its function (3.2):

Student:
  1. Hypothesize: I think pin 48 is the messed up one.
  2. Step-Justification: because all the others give a 5 volts reading...
Mentor:
  3. Knowledge-Explanation of correction:
     3.1. Identify: pin 48 is a Reset pin.
     3.2. Causal-Process-Description: If it goes to 5 volts, it resets the test station.

When beginning students look at schematic diagrams they are often completely confused. They tend to see a mass of lines and arrows and strange symbols with meaningless labels. Sometimes they invent their own descriptive names—e.g., one student referred
to a resistor as "that funny little squiggly line." The student in the example above saw the label "Reset" but failed to connect it with the function of the component in the diagram. Experts, however, see "meaning behind the madness" in these diagrams. For example, they often use labels to figure out what components do. In the example above, the expert's Knowledge-Explanation guides the student through the process of deciphering labels to help make sense of the components in the diagram. Several students, including the one in this example, learned to use component names and other cues for diagnostic reasoning.

Why do Identify acts tend to occur in initial position? The mixture tables pointed out the high proportion of Identify acts in Knowledge-Explanations. The patterns learned by SRL elaborated this—Knowledge-Explanations often start with an Identify. Why? Experts often use a category label of a simple, familiar object to set up an explanation about the function of a complex, unfamiliar object. Note, for example, how the mentor in the explanation below explains the function of a complex logic card that selects a relay to be activated by relating it to other components with the same basic function:

Mentor:
  1. Identify: Uhh, basically all it (BOO) is, is a relay, though.
  2. Causal-Process-Description: All that card's [the logic card] doing is setting your ground. All it's doing is switching a ground to certain relays.

The sense-making "mini-lesson" here is: when you see an unfamiliar component in the schematics, ask, "What does it do? How might its function relate to objects it communicates with?" In this case, the expert is trying to get the student to see that a component that provides input to a card with relays must play a role in determining which relay gets activated. He identifies the familiar object (a relay), whose operation most avionics students understand, to simplify the function of a complex, unfamiliar object (a logic card).

Why are Predictions the backbone of causal explanations? SRL told us that Causal-Process-Descriptions of an interpretation of system status typically consist of a single speech act of any type, then one or more Predictions, followed by a sequence of zero or more speech acts of any type. When we examined the mixture tables for this rule, we saw that Predict played a more salient role in causal explanations than is suggested by this pattern: 100% of causal explanations of an interpretation contained at least one Predict act. Some causal explanations, in fact, consist of a string of Predict acts. For example:

Student:
  1. Request for interpretation: Is that [reading] okay?
Mentor:
  2. Interpret: Yea, 5 volts on pin 12 is fine.
  3. Causal-Process-Description of interpretation:
     3.1. Predict: relay B24 is going to be high
     3.2. Predict: which means you should have a high on 4
     3.3. Predict: which puts a high on [pin] 12 there
     3.4. Predict: and then 43, 40, 9 will all be low.

Students often take measurements somewhere in the middle of the circuit, get a reading, and then ask, "Is that ok?" The problem is that some students don't understand that by tracing a signal forward from a switch to a component with a known value, they can
figure out the expected value of intermediate readings. In this example, the expert is explaining why there should be 5 volts on pin 12 (3), a signal between the switch and relay B24. He models how to trace the signal from the switch to relay B24, whose value is expected to be "high" because it is activated (3.1). He breaks the path into small chunks and predicts values each step of the way. Prediction thus plays an important role in showing students how to break complex interpretive tasks into smaller ones.

What do Knowledge-Explanations of corrections during problem solving look like, and how do they compare to those during PPR? We used GL to form the following grammar of Knowledge-Explanations of corrections during problem solving:

    Knowledge-Explanation  →  Identify-Group+ {Causal-Process-Description}
    Identify-Group         →  Identify | Inform | Interpret
Here, GL makes two claims. First, by forming the category Identify-Group it says that Identify, Inform, and Interpret acts are syntactically interchangeable. Second, it claims that the most common structure of Knowledge-Explanations of corrections is one or more acts of type Identify-Group followed by an optional Causal-Process-Description. Knowledge-Explanations of corrections during PPR were more complicated, as reflected in the following GL grammar:

    Knowledge-Exp  →  Inform-Group+ {Causal-Process-Description}
                      | Inform-Group+ {Predict-Group} Knowledge-Exp
    Inform-Group   →  Inform | Identify | Interpret
    Predict-Group  →  Predict | Refute | Direct
That is, a Knowledge-Explanation (abbreviated to "Knowledge-Exp" above) of corrections during PPR consists either of one or more acts from Inform-Group followed by an optional Causal-Process-Description, or of one or more acts from Inform-Group followed by an optional act from Predict-Group and a (recursive) Knowledge-Explanation. The important thing to notice is that this grammar is an expansion of the first. The second rule and the category Predict-Group are new. Inform-Group is the same as Identify-Group in the first grammar, except that Inform is now the dominant act in this category. We do not know why explanations during problem solving tended to be less elaborated and varied than those during debrief: was it the difference in communication medium (typing during problem solving versus speaking during debrief), the difference in instructional activity itself (problem solving versus PPR), or both? Our future work will examine why the differences we observed between problem solving and PPR occurred and whether elaborated explanations should be the province of PPR.
5. Conclusions

We have discussed two prototype systems that we developed to model discourse and the results of our first attempt to apply them to a substantial corpus of instructional dialogue. One system, GL, generates several candidate grammars and uses a Bayesian evaluation function to select the most plausible grammar. In addition to modeling the structure of tutors' explanations, GL and SRL enhanced our understanding of the role that particular speech acts play in tutorial dialogue. Further work will address the limitations of our approach and extend it to new problems, including analysis of larger units of discourse, namely, dialogue exchanges and conversations. Our systems are limited to learning syntactic, context-free grammars of
discourse. Clearly an adequate theory of discourse needs to incorporate semantics and pragmatics into a richer grammatical structure. Furthermore, it is not enough to know what human tutors do. We also need to know what they should do (e.g., [12]), and when—e.g., during problem solving, as opposed to during a post-practice, reflective discussion. We have begun to examine these issues, and have found GL and SRL to be useful tools to support this effort.
Acknowledgements

The research reported in this paper was made possible by grant number N00014-97-10848 from the Office of Naval Research, Cognitive Science Division. We are grateful to Bruce Buchanan for helpful comments on the work, Gabriel O'Donnell for coding the data, and Linda Greenberg for technical assistance.
References

[1] J. Aronis. A Bayesian approach to learning probabilistic context-free grammars. Technical Report, Intelligent Systems Laboratory, University of Pittsburgh, 1999.
[2] B. Chandrasekaran, M. Tanner, and J. Josephson. Explaining control strategies in problem solving. IEEE Expert, 4(1), 1988.
[3] E. Charniak. Statistical Language Learning. MIT Press, 1993.
[4] P. Cheeseman, J. Kelly, M. Self, J. Stutz, W. Taylor, and D. Freeman. AutoClass: A Bayesian classification system. In: Proceedings of the Fifth International Conference on Machine Learning, 1988.
[5] J. Chu-Carroll and N. Green, editors. Applying Machine Learning to Discourse Processing. AAAI Press, 1998.
[6] S. Katz. Identifying the support needed in computer-supported collaborative learning systems. In: Proceedings of CSCL '95: The First International Conference on Computer Support for Collaborative Learning, 1995.
[7] S. Katz. Peer and student interactions in a computer-based training environment for electronic fault diagnosis. Technical Report, Learning Research and Development Center, University of Pittsburgh, 1997.
[8] S. Katz, A. Lesgold, E. Hughes, D. Peters, G. Eggan, M. Gordin, and L. Greenberg. SHERLOCK 2: An intelligent tutoring system built on the LRDC tutor framework. In: C. Bloom and R. Loftin (Eds.), Facilitating the Development and Use of Interactive Learning Environments. Lawrence Erlbaum Associates, 1998.
[9] W. Mann and S. Thompson. Rhetorical structure theory: Towards a functional theory of text organization. TEXT, 8(3), 1988.
[10] J. Searle. Speech Acts. Cambridge University Press, 1969.
[11] A. Stolcke and S. Omohundro. Inducing probabilistic grammars by Bayesian model merging. In: Proceedings of the International Conference on Grammar Induction, 1994.
[12] K. VanLehn, S. Siler, C. Murray, and W. Bagget. What makes a tutorial event effective? In: Proceedings of the Twenty-first Annual Conference of the Cognitive Science Society, 1998.
Artificial Intelligence in Education S.P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
An Intelligent Agent in Web-based Argumentation

Ronggang Yu and Yam San Chee
School of Computing, National University of Singapore
Lower Kent Ridge Road, Singapore 119260

Abstract: The emergence of the Internet and World Wide Web (WWW) has triggered the development of computer-supported collaborative argumentation (CSCA) tools. However, because of the difficulty of dealing with semantic issues, understanding arguments remains a major difficulty in designing and implementing these tools. In this paper, we propose a new approach to alleviate this problem by using the technology of a linguistic corpus in natural language understanding. From our survey and research, we find that computer-based arguments contain many regular patterns that occur with high frequency. Furthermore, many argumentation strategies and rhetorical methods are related to these regular patterns. Arguments can be partially or locally understood through these regular patterns, and a system can provide students with advice by interpreting these patterns. We implemented this approach on the platform of HyperNews. All technical details are hidden from users. Users experience an intelligent agent providing support for argumentation strategies and rhetorical methods during the process of web-based argumentation.
1. Introduction

With the advance of the Internet and communication technology over the last several years, Computer-Supported Collaborative Argumentation (CSCA) tools such as bulletin board systems, mailing lists, USENET, and HyperNews have become more and more pervasive. Many instructors and researchers are trying to employ these tools to support collaborative learning activities. They either embed CSCA tools into their learning and collaborative working environments [1, 7, 15] or build CSCA tools with new features for the convenience of arguers [6, 12, 17]. However, there has been little emphasis placed on improving students' argumentation skills using these CSCA tools, especially on the platform of the World Wide Web.

There is much research related to the training of students' face-to-face argumentation skills [5, 13, 22]. The structure of formal arguments is analyzed, and ways to make propositions more persuasive and refutations more compelling are provided. However, there are some differences between face-to-face argumentation and computer-supported argumentation. First, students don't need to take account of their opponent's age, sex, and location in computer-supported collaborative argumentation; the relationship among participants is peer to peer. Second, computer-supported collaborative argumentation is an open medium: anyone interested can join an argument without considering date, location, time, etc. Third, most computer-supported collaborative argumentation platforms are asynchronous in nature, so participants can consider problems almost without time limitation. Fourth, almost all computer-supported collaborative arguments take the form of written language, which is a more formal medium of communication than oral argument. Because of the above four differences, some ways of improving students' face-to-face argumentation skills need to be modified, and other new methods adopted, when making use of the Internet and communication technology.

Prior to our work, there have been many systems that focus on structuring discourse with the use of graphical interfaces to support computer-based argumentation. Euclid [19] provides a graphical representation language for generic argumentation; gIBIS [3] and JANUS-Argumentation [4] record the process of design in order to support and critique it in accordance with established methods in the design community. Suthers developed Belvedere [18], a prototype for teaching high school students collaborative reasoning and argumentation. Belvedere provides a diagramming tool for constructing representations of the logical and rhetorical relations within a debate. Buckingham Shum's research focuses on notation and tools to support argumentation as a means of capturing organizational knowledge and design rationale [1]. The general intention of prior work has been to help students analyze the relationships among the components of arguments and then organize them into high-quality arguments.
The purpose of the research reported in this paper is to design and implement an intelligent agent that helps students improve their computer-based collaborative argumentation skills, i.e., the quality of their arguments. As HyperNews [9] is one of the most popular CSCA tools on the World Wide Web and its source code is available, we implemented the intelligent agent on this platform. In the following sections, we first report the results of a survey showing that regular patterns occur with high frequency in computer-based collaborative arguments. Then, we illustrate how an agent can provide students with argumentation strategies and rhetorical methods during the process of web-based argumentation. Third, we give the design and implementation details of our system; the architecture of the intelligent agent is also described. Finally, we state our conclusions and comment on future work to further improve the system.
2. Survey

Regular patterns in arguments are collections of words that reflect the relationships within or among sentences. They are the skeleton of arguments, and they appear with high frequency. If we can accumulate sufficient regular patterns with appropriate interpretations in a database, then an intelligent agent can be programmed to search for the regular patterns in related arguments and provide advice to students. We provide an example below to illustrate a regular pattern. The following extract is a section of an argument against the proposition that the death penalty can deter crime.

After 35 years in the Prison Medical Service, a British doctor found that "Deterrence is by no means the simple affair that some people think. A high proportion of murderers are so tensed up at the time of their crimes as to be impervious to the consequences to themselves; others manage to persuade themselves that they can get away with it." This last point underlines another weakness in the deterrence argument. Those offenders who plan serious crimes in a calculated manner may decide to proceed despite the risks in the belief that they will not be caught. The key to deterrence in such cases is to increase the likelihood of detection, arrest and conviction. The death penalty may even be counter-productive in that it diverts official and public attention from efforts needed to bring about real improvements in combating crime.

The deterrence argument is not borne out by the facts. If (1) the death penalty deterred crime more effectively than other punishments, one would expect (2) to find that, in analyses of comparable jurisdictions, those which have the death penalty for a particular crime would have a lower rate of that crime than those which do not, and that a decline in crime rates would follow among states which introduce it for those crimes. Yet (3) study after study has failed to establish any such link between the death penalty and crime rates.
In the third paragraph, "if (1), one would expect (2), yet (3)" can be regarded as a regular pattern for the following two reasons. First, it is a generic word pattern without specific knowledge, and this pattern can occur in other arguments with high frequency. Second, this pattern can be interpreted, or can be used as an argumentation strategy. Consider the following strategy: "Is there any situation where (2) can be reached not because of (1) but because of other reasons?" This strategy can be used by students: they can propose other reasons that lead to (2) to show that the above refutation is weak and ineloquent. However, words like "if", "yet", etc. alone cannot be regarded as regular patterns because they lack the essential structure of an argument. All regular patterns with their interpretations in a database are referred to as a regular pattern corpus. Currently, we have 67 regular patterns in our corpus, and the number is still increasing. We have used this corpus to conduct a survey showing the status of regular patterns in computer-based arguments. The survey result is shown in Table 1. In this table, CEDA-L [16] is a college debate mailing list archive, EDEBATE [14] is the successor of CEDA-L, and alt.philosophy.debate is a newsgroup containing arguments on all kinds of topics. We collected all the arguments that were posted on the three sites in the months shown against each site in Table 1. From the table, we find that on average there is one regular pattern in every four or five sentences. This result was obtained with a regular pattern corpus of size 67; increasing the size of the corpus should yield even better statistics.
Table 1. Survey results for three computer-based argument archives (corpus size: 67)

Category of statistics                 alt.philosophy.debate   CEDA-L         EDEBATE
                                       November 1998           January 1996   January 1996
No. of articles                        177                     296            371
No. of sentences                       1757                    4305           4939
No. of sentences per article           9.92                    14.54          13.31
No. of pattern hits in all arguments   514                     972            1117
No. of patterns per article            2.90                    3.28           3.01
Average pattern recurrence rate        7.67                    14.51          16.67
In conducting the survey, we were very concerned with the quality of our corpus. There is a tradeoff between the hit rate of regular patterns and their quality. For example, the word pattern "of...of..." occurs in arguments with high frequency, but it cannot be used as a regular pattern because there are no viable argumentation strategies related to it. On the other hand, although the word pattern "if...has the property of...one would expect...yet...thus..." has potentially high value as an argumentation strategy, it cannot practically be regarded as a regular pattern either, because it does not occur regularly. Thus, a regular pattern must both be interpretable and occur with high frequency in arguments. Our survey shows that regular patterns can indeed be both interpretable and frequent, and this result can be exploited by an intelligent agent to provide students with immediate online advice.
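As a concrete, simplified illustration of how such a corpus entry might be represented and searched, consider the following sketch. It is written in Python, whereas the actual system uses PERL and W3-mSQL; the entry layout, the regex encoding of the cue words, and the function name are our assumptions, not the authors' implementation.

```python
import re

# A minimal sketch, assuming a corpus entry pairs a cue-word pattern (with
# named slots between the cue words) with a strategy template. The layout is
# illustrative; the actual system stores entries in an mSQL corpus.
PATTERN_IF_EXPECT_YET = {
    "name": "if-expect-yet",
    "regex": re.compile(
        r"\bif\b(?P<s1>.+?)\bone would expect\b(?P<s2>.+?)\byet\b(?P<s3>.+)",
        re.IGNORECASE | re.DOTALL),
    "strategy": ("Is there any situation where {s2} can be reached "
                 "not because of {s1} but because of other reasons?"),
}

def find_strategy(sentence, entry):
    """Return the instantiated strategy if the regular pattern occurs."""
    match = entry["regex"].search(sentence)
    if match is None:
        return None
    slots = {k: v.strip(" ,.") for k, v in match.groupdict().items()}
    return entry["strategy"].format(**slots)
```

Run against the third paragraph of the extract above, the slots (1), (2) and (3) are captured between the cue words and spliced into the stored strategy, which is exactly the "partial or local understanding" the corpus approach provides.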
3. Illustration

An agent is a system that tries to fulfill its goals in a complex, dynamic environment. It is situated in the environment and interacts through sensors and actuators. Autonomous adaptive agents operate totally autonomously and become better over time at achieving their goals. Agents that are situated in the "cyberspace" environment are known as "software agents" or "interface agents" [10]. Interface agents are computer programs that automate repetitive tasks to provide assistance to a user dealing with a particular computer application. The idea of employing agents to delegate computer-based tasks goes back to research by Negroponte [11] and Kay [8]. Research in this field is directed toward the ideal of agents that have high-level, human-like communication skills and can accept high-level goals and reliably translate them into low-level tasks. A software agent in computer-supported collaborative argumentation environments can provide students with immediate online assistance or advice. The intelligent agent identifies and selects regular patterns, and then stores these patterns, with their associated argumentation strategies and rhetorical methods, in the corpus. When the agent finds a sentence containing the same regular pattern again, it can provide advice back to the users of the computer-supported collaborative argumentation environment. Below, we list five ways in which students can make use of the system to improve their web-based argumentation skills.
• If students make a proposition either supporting or arguing against a viewpoint, but have no idea how to expand their arguments, they can press the button named "Agent advice" for immediate assistance. The intelligent agent will provide argumentation strategies and rhetorical methods suitable for proposition or refutation, with associated context information, to the students. The agent interface is shown in Figure 1.
• Online students can also chat with other online students by pressing the button named "Chat". In a chat room, they can discuss any topic with each other synchronously.
• Users can also establish virtual teacher-student relationships in this computer-supported collaborative argumentation environment; they can find their "virtual" teachers or "virtual" students in HyperNews. The relationship can only be established if each recognizes the other. When students make arguments, their "virtual" teachers will receive those arguments. "Virtual" teachers can then give comments to their students. Students improve their web-based argumentation skills by reflecting on these comments, while teachers can also improve their own argumentation skills through scaffolding [21].
• Students can search for suitable arguments on the platform by pressing the icon named "Search". They can search by title, author, date, keywords, etc. The "Search" component helps them retrieve the arguments they are interested in.
• When students use HyperNews, they can press the icon named "Statistics" to calculate the number of articles they have posted and the current article's sentence and word counts. This component helps students to be aware of the status of their contributions to web-based argumentation.
We now illustrate how students can obtain advice from the intelligent agent in web-based argumentation. Suppose one student makes an argument supporting the use of animals in medical research. Other students can add new articles either supporting or arguing against this viewpoint by pressing the icon named "Add message". An editing frame will be opened, and they can edit their articles to explain the reasons why they support or oppose this viewpoint.
Figure 1. Agent providing advice in web-based argumentation

In Figure 1, the student tries to argue for the proposition that using animals in medical research should be encouraged. If she has no idea how to expand her article, she can press the button named "Agent Advice" for immediate online assistance. The agent analyzes related arguments and tries to find regular patterns by searching for them in the corpus. In this example, the intelligent agent finds the regular pattern "(1) has (2) benefits to (3)" in the first sentence of the referenced article. It retrieves the regular pattern's argumentation strategy, "you can enumerate more benefits to show that (1) can help to (3)", from the regular pattern corpus. This strategy is paraphrased with specific context information into "you can enumerate more benefits to show that the use of animals in medical research can help to make or test new drugs". The intelligent agent presents this strategy to the student together with related rhetorical methods: using facts and using examples. Now she can begin to think about how to expand her argument with the argumentation strategy and rhetorical methods provided by the intelligent agent. She can continue to click the button named "Agent Advice" for further advice until no more regular patterns are found. If the regular patterns found in a referenced article are scarce, the intelligent agent can also retrieve general argumentation strategies from the database.
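For illustration, the following sketch reproduces this paraphrasing step in Python. The exact sentence wording, the pattern encoding, and all names are our assumptions, not the system's actual data.

```python
import re

# Illustrative reconstruction of the advice produced in Figure 1: the pattern
# "(1) has (2) benefits to (3)" is matched against the referenced sentence and
# its slots are spliced into the stored strategy template.
BENEFITS_PATTERN = re.compile(
    r"(?P<s1>.+?)\bhas\b(?P<s2>.+?)\bbenefits to\b(?P<s3>.+)", re.IGNORECASE)
STRATEGY = "you can enumerate more benefits to show that {s1} can help to {s3}"
RHETORICAL_METHODS = ["using facts", "using examples"]

sentence = ("The use of animals in medical research has significant "
            "benefits to make or test new drugs.")  # assumed wording

match = BENEFITS_PATTERN.search(sentence)
if match:
    advice = STRATEGY.format(s1=match.group("s1").strip().lower(),
                             s3=match.group("s3").strip(" ."))
    print(advice)
    print("Rhetorical methods:", ", ".join(RHETORICAL_METHODS))
# -> you can enumerate more benefits to show that the use of animals in
#    medical research can help to make or test new drugs
```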
4. Implementation

We implemented our intelligent agent using CGI/PERL programming, W3-mSQL script, and JavaScript. The interface was implemented using JavaScript embedded in HTML files. The connection to the database was implemented using W3-mSQL script, which is also embedded in HTML files. The other components of the system were implemented as CGI/PERL scripts. Figure 2 presents the components of the system and their relationships.

Figure 2. System architecture (components: Interface Manager, Core Manager, Query Manager, Pattern Miner, Pattern Adder, and the Mini-SQL database)

In Figure 2, the Pattern Miner is the component that finds regular patterns in the computer-supported collaborative argumentation environment. It uses HTML files as input. HTML tags and other markers are removed from these files. The Pattern Miner then gets each sentence from the plain files without HTML tags and tries to find two-word patterns, three-word patterns, four-word patterns, etc., in each sentence. Figure 3 illustrates the flow from HTML files to word patterns.
Figure 3. Flow from HTML files to word patterns (pruning steps: prune the words or patterns that occur only once; prune unwanted words or patterns; prune sparse words or patterns)

The words are marked with file number, sentence number, and word number. In Figure 1, if the referenced article number is 7, the word "benefits" can be marked as (benefits, 7, 0, 10). The two-word pattern "benefits to" can be marked as (benefits:to, 7, 0, 10), while the two-word pattern "has benefits" can be marked as (has:benefits, 7, 0, 7). The three-word pattern "has benefits to" can be marked as (has:benefits:to, 7, 0, 7). Some unwanted words or patterns should be removed, as they contribute nothing to the regular pattern corpus; for example, the words "a", "an", "the", "of", etc.
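As an illustration of this marking and combination scheme, here is a minimal Python sketch. It is our own reconstruction (the actual Pattern Miner is a PERL script); the tuple layout follows the (word, file, sentence, position) marks above, while the stop-word list and the pruning threshold are assumptions.

```python
from collections import Counter

STOP_WORDS = {"a", "an", "the", "of"}  # unwanted words (assumed list)

def mark_words(files):
    """files: dict of file number -> list of sentences (HTML tags removed).
    Returns marks like (benefits, 7, 0, 10)."""
    marks = []
    for file_no, sentences in files.items():
        for sent_no, sentence in enumerate(sentences):
            for word_no, word in enumerate(sentence.lower().split()):
                if word not in STOP_WORDS:
                    marks.append((word, file_no, sent_no, word_no))
    return marks

def two_word_patterns(marks):
    """All ordered two-word combinations within one sentence; the pattern
    inherits its first word's mark, e.g. (has:benefits, 7, 0, 7)."""
    out = []
    for i, (w1, f1, s1, p1) in enumerate(marks):
        for w2, f2, s2, p2 in marks[i + 1:]:
            if (f1, s1) == (f2, s2) and p2 > p1:
                out.append((w1 + ":" + w2, f1, s1, p1))
    return out

def three_word_patterns(two_word):
    """Join two-word patterns that share their middle word and lie in the
    same article and sentence, e.g. has:benefits + benefits:to gives
    (has:benefits:to, 7, 0, 7)."""
    out = []
    for name1, f1, s1, p1 in two_word:
        a, b = name1.split(":")
        for name2, f2, s2, p2 in two_word:
            c, d = name2.split(":")
            if (f1, s1) == (f2, s2) and b == c and p2 > p1:
                out.append((a + ":" + b + ":" + d, f1, s1, p1))
    return out

def prune_rare(patterns, min_count=2):
    """Drop patterns that occur only once (the threshold is an assumption)."""
    counts = Counter(name for name, *_ in patterns)
    return [p for p in patterns if counts[p[0]] >= min_count]
```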
It is easy for the Pattern Miner to get each word's file number, sentence number, and word number. The Pattern Miner can also get the marks of each two-word pattern by trying all two-word combinations in a sentence. It prunes the two-word patterns that occur infrequently. Three-word patterns get their file number, sentence number, and word number by parsing two-word patterns twice to determine whether those two-word patterns are in the same article and sentence. With the same method, the Pattern Miner can get four-word patterns, five-word patterns, etc.

The system administrator can add, delete, or update the regular patterns in the database using the Pattern Miner and the Regular Pattern Adder. The Regular Pattern Adder is a password-protected HTML/W3-mSQL program which only the system administrator can access. The quality and size of the corpus are maintained by the system administrator. With the help of the Pattern Miner, the system administrator can collect word patterns. Some of the word patterns may prove useless, while others can be used as regular patterns if they occur with high frequency and have an appropriate argumentation strategy interpretation. If the system administrator finds a regular pattern and decides to add it to the database, she can access the Pattern Adder and provide the information for this regular pattern, including pattern name, pattern format, strategy format, rhetorical methods, etc. The Pattern Adder will automatically check the syntax of the strategies and rhetorical methods and then save the regular pattern into the corpus. This component is programmed with W3-mSQL script embedded in HTML files.

The regular pattern corpus is stored in the database on the server side. The system administrator can access this database through the web-based Regular Pattern Adder. Every regular pattern can have four argumentation strategies, and every argumentation strategy has its own properties, such as the syntactical functions and morphological features of the regular pattern and its surrounding context. The properties of the strategy are used by the Core Manager to decide whether word patterns in arguments and regular patterns in the corpus really match.

The Interface Manager receives the user's arguments and requests using CGI scripts. It then sends all these arguments and requests to the Core Manager. The Interface Manager also presents the strategies and rhetorical methods back to the users via a small window. This component is programmed with JavaScript.

The Query Manager connects to the database. It can only query the database, without any modification of the stored data. The Query Manager is called by the Core Manager to search for regular patterns in the corpus. Once a regular pattern is found, the Query Manager sends the related argumentation strategies and rhetorical methods back to the Core Manager.

The Core Manager is the most important component of the system. It receives the arguments from the Interface Manager. It then calls the Query Manager to check whether the word patterns in arguments match the regular patterns in the corpus. Figure 4 depicts the algorithm flow for checking whether there is a match. This component is programmed in PERL.
Figure 4. Algorithm flow for searching for regular patterns in arguments (get a sentence from the Interface Manager; collect the first word of every regular pattern in the corpus; for each word of the sentence, retrieve the corpus patterns beginning with that word; if a pattern matches, paraphrase it with the specific context and return the result to the Interface Manager)
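A compact Python rendering of this flow might look as follows. This is our sketch, not the original PERL: the in-memory corpus index keyed by each pattern's first cue word stands in for the first-word lookup of Figure 4, and in the real system the corpus is queried through the Query Manager.

```python
def advise(sentence, corpus):
    """corpus: dict mapping a pattern's first cue word to a list of
    (compiled_regex, strategy_template) pairs -- an assumed layout."""
    words = (w.strip(".,;:!?\"'") for w in sentence.lower().split())
    for word in words:
        for regex, template in corpus.get(word, []):
            match = regex.search(sentence)
            if match:
                # paraphrase the strategy with the specific context and
                # return it to the Interface Manager
                return template.format(**{k: v.strip(" ,.")
                                          for k, v in match.groupdict().items()})
    return None  # no regular pattern found; general strategies may be used
```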
5. Conclusion and Future Work

In this paper, we reported a method to automatically provide students with argumentation strategies and rhetorical methods during web-based argumentation. A regular pattern corpus has been set up, and the interface with the W3-mSQL database and the web server has been built. The intelligence of the agent is achieved by finding the regular patterns in arguments, retrieving the corresponding strategies and rhetorical methods from the corpus, combining them with the specific context, and providing immediate online assistance to students. The results of this study indicate that the intelligent agent helps students improve their web-based argumentation skills. We are currently enlarging the regular pattern corpus while maintaining the quality of newly added regular patterns. To collect more regular patterns, we are continuing to analyze high-quality arguments and to mine regular patterns for storage in our corpus.
6. References
[1] Buckingham Shum, S. & Hammond, H. (1994). Argumentation-based design rationale: What use at what cost? International Journal of Human-Computer Studies, 40(4), 603-652.
[2] Buckingham Shum, S., Tamara, S. and Diana, L. (1996). On the future of journals: Digital publishing and argumentation. In the proceedings of HCI'96.
[3] Conklin, J. & Begeman, M. L. (1987). gIBIS: A hypertext tool for team design deliberation. In the proceedings of Hypertext'87, pp. 247-252.
[4] Fischer, G., McCall, R., and Morch, A. (1989). JANUS: Integrating hypertext with a knowledge-based design environment. In the proceedings of Hypertext'89, pp. 105-117.
[5] Freeley, A. J. (1996). Argumentation and Debate: Critical Thinking for Reasoned Decision Making (ninth edition). Wadsworth Pub. Co.
[6] Gaspar, R. F. & Thompson, T. D. (1995). Current trends in distance education. Journal of Interactive Instruction Development, pp. 21-27.
[7] Hurwitz, R. & Mallery, J. C. (1995). The open meeting: A web-based system for conferencing and collaboration. In the proceedings of the 4th International Conference on the World Wide Web.
[8] Kay, A. (1984). Computer software. Scientific American, 251(3), pp. 53-59.
[9] Laliberte, D. & Wooley, D. (1997). Presentation features of text-based conferencing systems. WWW Computer-Mediated Communication Magazine, May 1997, http://www.hypernews.org.
[10] Maes, P. (1994). Modeling adaptive autonomous agents. Artificial Life Journal, Vol. 1, MIT Press.
[11] Negroponte, N. (1997). The Architecture Machine: Towards a More Human Environment. MIT Press.
[12] Ou, K. L., Chang, C. K., and Chen, G. D. (1998). Web-based asynchronous discussion system. In the proceedings of the 8th International Conference on Computers in Education, Vol. 1, 108-117.
[13] Branham, R. J. (1991). Debate and Critical Analysis: The Harmony of Conflict. Hillsdale, N.J.: L. Erlbaum Associates.
[14] Ryder, J. (1997). Team topic debating in America. http://list.uvm.edu/archives/edebate.html.
[15] Schlageter, G., Buhrmann, P., Laskowski, F. and Mittrach, S. (1997). Virtual university: A new generation of net-based educational systems. In the proceedings of the 7th International Conference on Computers in Education.
[16] Stanton, J. (1997). CEDA mailing list archive. http://www.cs.jhu.edu/~jonathan/debate/ceda-l/archive.
[17] Suthers, D., Toth, E. E. and Weiner, A. (1997). An integrated approach to implementing collaborative inquiry in the classroom. In the proceedings of Computer Supported Collaborative Learning'97.
[18] Suthers, D., Weiner, A., Connelly, J. and Paolucci, M. (1995). Belvedere: Engaging students in critical discussion of science and public policy issues. In the proceedings of the 7th World Conference on Artificial Intelligence in Education.
[19] Smolensky, P., Fox, B., King, R., and Lewis, C. (1987). Computer-aided reasoned discourse, or how to argue with a computer. In R. Guindon (Ed.), Cognitive Science and Its Applications for Human-Computer Interaction, pp. 109-162.
[20] Saltzberg, S. & Polyson, S. (1996). Distributed learning on the World Wide Web. International University Consortium Computer-based Conference, http://www.umuc.edu/iuc/cmc96/cmc-basl.html.
[21] Chan, T.-W. (1996). Learning companion systems, social learning systems, and the global social learning club. Journal of Artificial Intelligence in Education, 7(2), 125-129.
[22] Ziegelmueller, G. W. (1997). Argumentation: Inquiry and Advocacy (third edition). Boston: Allyn and Bacon.
Virtual Realities and Virtual Campuses
Intelligent Assistance for Web-based TeleLearning
Jean Girard, Gilbert Paquette, Alexis Miara, Karen Lundgren
LICEF Research Centre, Télé-université
1001 Sherbrooke St. East, Montreal
gpaquett@licef.teluq.uquebec.ca
http://www.licef.teluq.uquebec.ca

Abstract. Intelligent assistance in telelearning environments is even more important than in individual tutoring systems because of the inherent complexity of distance education. But the problem here is quite different, and it opens large areas of unexplored territory, especially in the full exploitation of the multiple data sources captured from the interaction of the different actors involved in a telelearning event. We address some of these questions by presenting first a model of a Virtual Learning Centre (VLC) and an implementation for Web-based training called EXPLORA. A VLC focuses on the interaction spaces between five theoretical actors: the learner, the informer (content expert), the trainer, the manager and the designer. We give an example of such an environment and show how the VLC supports learners and the other actors in such a case. Then we focus on a three-dimensional assistance space in a VLC based on a typology of assistance resources. Finally, methods and tools to build an intelligent advisor for a web-based telelearning environment are discussed using an operational JAVA implementation on the Internet.

Key words. Learning environments and micro-worlds, non-standard and innovative interfaces, student modelling
1. The Case for Advisor Agents in a Virtual Learning Centre
We live in societies coping with an exponential growth of information. In the knowledge society, new competencies and higher-level skills are required. The rapidly evolving availability of multimedia telecommunication is a challenging answer to this knowledge acquisition and knowledge building gap. But we have to integrate many types of resources to really enhance learning. We see a telelearning system as a society of agents, to use Marvin Minsky's term: some of them providing information, others constructing new information, still others supporting collaboration between agents or providing assistance to the other agents on content, pedagogical process or the organisation of activities. Behind terms like "distance education", "on-line learning", "telelearning" and "multimedia training" lies a multi-faceted reality in which we can identify six main paradigms: the open classroom, integrating technologies in traditional classrooms; the virtual classroom [1,2]; teaching media, focused on multimedia courses on CD-ROM; Web-based training [3]; on-line learning communities [4,5]; and electronic performance support systems (EPSS) [6]. Intelligent assistance, based on knowledge of the learner's activities and productions, is even more important in telelearning systems because of their complexity. But the
problem is quite different from that of individualised ITS research. It opens large areas of unexplored territory, especially in the full exploitation of the multiple data sources captured from the interaction of the different actors involved in a telelearning event.
2. EXPLORA, a Web-Based Virtual Learning Centre
Our Virtual Learning Centre model [10] emphasises the concept of a process-based learning scenario coupled with assistance resources. Basically, the learner proceeds through a scenario, a network of learning activities, using different kinds of information resources to help her achieve the tasks and produce some outcome: a problem solution or new information that can be used in other activities. The assistance resources for each task are also planned at design time. The assistance can be distributed among many agents: trainers interacting through email or teleconferencing, other learners, contextual help or intelligent advisors.

2.1 Actors, roles and agents

We have described elsewhere [7,8,9] how we built an object-oriented model of a Virtual Campus using software engineering methodology. In our Virtual Learning Centre architecture, we identify five actors. The Learner transforms information into personal knowledge. "Information" here is any data, concrete or abstract, perceptible by the senses and susceptible of being transformed into knowledge. "Knowledge" means information that has been integrated by a cognitive entity into its own cognitive system, in a situated context and use. The Informer (the content expert) makes information available to the learner. It may be a person or a group of persons presenting information to the learners, but also a book, a video, a piece of software or any other material or media. The Designer is the actor building, adapting and sustaining a learning system (LS) that integrates information sources (human informers or learning materials), as well as self-management, assistance and collaboration tools for the other actors. The Trainer facilitates learning by giving advice to the learner about his individual process and the interactions that may be useful to him, based on the learning scenarios defined by the designer. Finally, the Manager facilitates learning by managing actors and events, for example creating groups or making teleservices available in order to ensure the success of the process, based on the scenarios defined by the designer.
Figure 1 - Actors and interaction spaces
2.2 Interaction spaces and resources

Figure 1 shows the five theoretical actors and their interactions. We limit ourselves to the interactions in which the learner is involved while learning, at delivery time. Interactions between learner and designer. These are the interactions where the learner interacts with the learning environment into which the designer has, in a way, "mediated" himself by creating it. Here, the learner is concerned with the self-management of the learning activities, of their input resources and of the products he has to build. Interactions between learner and informer. These are the interactions where the learner, individually, consults the information made available by the informer and processes it in the production space to produce certain results while building personal knowledge. Interactions between learners. These are interactions using different forms of collaboration or cooperation between learners for team work, group discussion, collaborative problem solving, etc. Interactions between learner and facilitators. These interactions concern the assistance that the system can provide to the learner on both the pedagogic (from the trainer) and the management (from the manager) dimensions of telelearning. We study these interactions in the next section.

2.3 Host systems and HyperGuides

At delivery time, the learner and the other actors interact within a computerised learning environment. This host system presents the content of a learning event, proposes activities, identifies resources to achieve them and describes the productions to make.
Figure 2 - A host telelearning environment and a trainer's HyperGuide within EXPLORA
Figure 2 presents such a web-based host system, intended to help learners build knowledge about the job market, their job objectives, and methods to apply while searching for a job. Actors need different points of view on the host system in each interaction space: self-management, information-production, collaboration or assistance. For example, in the information space, a learner will need different input resources such as a list of related web sites, descriptions of job categories, questionnaires to identify his skills or qualities, etc. In the same information space, a trainer needs other information resources: traces of the learners' activities for diagnosis, information on the group of learners, information on learner productions, and annotation tools to identify and organise information for assistance. These resources are made available through an external palette, as shown in the floating window on the right of Figure 2. This is what we call a HyperGuide. It is an actor's environment for a course or program supported by the Virtual Learning Centre. It groups resources into five interaction spaces (self-management, information, production, collaboration and assistance) according to an actor's role and course specificity. The resources can be generic tools developed specifically for the Virtual Learning Centre, generic commercial tools, or web resources (used in many courses) such as a technical FAQ or a virtual library. Table 1 shows a possible distribution of resources into the production and assistance spaces, for the job search example, for the learner, the trainer and the designer.

Table 1 - Example of the distribution of resources in the Virtual Learning Centre

Learner
  PRODUCTION: Text Editor; Spreadsheet; Presentation Software; Model Editor
  ASSISTANCE: Access to trainer/manager; Guided tour; Technical FAQ; Help desk; Course advisor

Trainer
  PRODUCTION: Diagnosis Tool; Evaluation Forms; Feedback Questionnaire
  ASSISTANCE: Access to content experts; Access to course designers; FAQ on tutoring; Tutoring guide

Designer
  PRODUCTION: ISA Design Workbench; Bank of design objects; WebCT Authoring Tool; Learning System Editor; Advisor Definition Tools
  ASSISTANCE: Instructional design advisor; FAQ on ISD methodology; Help resource persons
These resources are external to the web course, but some of them (for example the learner's trace, the content navigator or the intelligent advisors) are linked to a course by the designer using the learning system editor and the advisor definition tools. There are many advantages to such an architecture. The Virtual Learning Centre sits at the learning organisation's level, thus avoiding duplication and facilitating the evolution and reuse of resources from one course to another. It also speeds up the design process, because each individual web course is freed from implementing all the generic resources and from managing the circulation of information between the different actors.
3. Assistance in a Virtual Learning Centre
Whatever the agents, human or computer-based, the assistance must be «intelligent», that is, informed about the user, the kind of tasks he is involved in, the information he has consulted or produced, his interactions and collaboration with others, and finally his use of the assistance resources. In other words, the central question of ITS research, the user model, reappears in telelearning systems, but in a different way. We first present a typology of possible assistance resources, including advisor agents. Then we proceed with the preliminary step of course modelling and integration between the HyperGuide and the Web course. We finally present the way the user model is constructed and updated at delivery time and give some examples of the pieces of advice for a particular telelearning environment.

3.1 Types of assistance resources

When the designer plans the telelearning system, he must select or build different kinds of resources for the assistance space. Assistance resources can be addressed to different actors, for different purposes and by different means, yielding a three-dimensional assistance space (a small sketch of this typology follows the list below). The preceding discussion gives a framework for these decisions.
— The theoretical actors to which assistance is addressed form the first dimension of the typology. Often an organisation will distribute roles differently. For example, the trainer and informer roles can be merged; as a consequence, there might be only four assistance spaces to design. On the other hand, the designer's role can be split into more specialised roles such as content modelling, pedagogical design and learning material production, each of these actors having different assistance needs.
— The second dimension of assistance, its object, can be related to each of the interaction spaces: self-management, information, production, collaboration, or instructional and organisational assistance. The assistance can be given to help users progress through the activities, acquire consistent information and knowledge, engage in peer collaboration or use the available assistance resources efficiently.
— Finally, the third dimension identifies in what way the assistance will be provided: by a human facilitator, by access to help desks, using FAQs and help files, or by providing contextual on-line help or intelligent advisor agents.
Because of its complexity, designing assistance in a telelearning system is a huge task. To reduce this complexity, we propose to use a design method that helps identify the appropriate resources based on a good understanding of their possibilities and limitations. Also, the integration of multiple assistance resources and agents, especially human facilitators, reduces the load on heavily computerised resources such as an ITS. Finally, the development of generic assistance resources at the VLC level, with the help of advisor building tools, should help tackle the heaviest components of the assistance space.
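As a sketch of this three-dimensional space (our own encoding, not part of the VLC implementation), every assistance resource can be positioned by a triple (actor, object, means):

```python
from dataclasses import dataclass

# Hypothetical encoding of the typology: each assistance resource is placed
# by actor (to whom), object (about which interaction space) and means
# (in what way the assistance is provided).
ACTORS = ("learner", "informer", "trainer", "manager", "designer")
OBJECTS = ("self-management", "information", "production",
           "collaboration", "assistance")
MEANS = ("human facilitator", "help desk", "FAQ / help files",
         "contextual on-line help", "intelligent advisor agent")

@dataclass
class AssistanceResource:
    name: str
    actor: str   # first dimension: one of ACTORS
    about: str   # second dimension (the "object"): one of OBJECTS
    means: str   # third dimension: one of MEANS

course_advisor = AssistanceResource(
    "Course advisor", "learner", "self-management",
    "intelligent advisor agent")
```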
3.2 Modelling a course

With these kinds of problems in mind, we have developed a method for learning systems engineering called MISA [11]. The MISA method presents the instructional design (ID) processes and tasks from an engineering perspective analogous to software engineering. The method is a complex process decomposed, at several levels, into sub-processes. Each sub-process has well-defined inputs and outputs; overall, the method defines some 30 main products and 150 sub-tasks, the whole process generating a learning system as its final output. This method innovates by using cognitive modelling techniques to represent knowledge, as well as pedagogical models, learning material models and delivery models. These four aspects of a learning system are clearly differentiated, but they are also interrelated through specific associations in each of six main phases, making the engineering process visible and structured, thus facilitating quality control of the processes and their products. For the assistance space, we need to focus here on the design of the knowledge model representing the content to be learned and the instructional model representing the learning events (program, course or activities). The terminal learning events are called learning units, for which we design an instructional scenario corresponding to a target population of learners. These learning events form a network through which the learner will navigate. Similarly, the knowledge model is a network of facts, concepts, procedures and principles that the learner must acquire or build. In the implementation presented here, we have not yet exploited the full potential of such MOT models for advising. We simplify the problem by reducing both models to hierarchical trees. The first one is the instructional structure (IS) representing a curriculum and its main subdivisions, down to terminal learning events, that is, the learning activities which are the main components of a learning scenario. The second one is the knowledge structure (KS), composed of a main knowledge object, decomposed into related parts and ending with simple objects. These models can be displayed with generic VLC tools. In Figure 3, such a tool is presented: the instructional model navigator.
Figure 3 - VLC tools: an instructional model viewer and a collaborative path finder
To integrate a course like the one in Figure 2 into the VLC, we need to link each unit of the IS and the KS, and the progress levels within them, to actual web pages or sections in the host system. This enables the system to capture the user's actions and build a user model. The user model then offers an alternate way for any user to navigate in the course: by selecting a learning event in an instructional model viewer (Figure 3, left) and asking to display the corresponding pages or sections. Another VLC tool (Figure 3, right) enables a user to see another's progress in the same web course, supporting collaboration as well as tutoring.

3.3 The user model

The integration of a course into the VLC is done through a simple interface where the designer of the advisor describes the IS and the KS. Another design tool helps define the conditional principles that will update the user model, display a piece of advice or engage in a dialogue with the user. Finally, a management tool identifies the learners and facilitators assigned to the course, making it possible, through JAVA scripting, to navigate into pages visited by a trainee or a co-learner. With the help of these tools, the designer of the advisor goes through the following steps.

1. Define the IS as a hierarchical list of instructional units {IU1, IU2, ..., IUn} and the KS as a hierarchical list of knowledge units {KU1, KU2, ..., KUm}.

2. Define for each IU or KU a set of ordered symbols called progress levels, {Ip1, Ip2, ..., Ipk} and {Kp1, Kp2, ..., KpL}, and assign user events in the host system to each couple formed of an IU or KU and one of its progress levels.

3. Define how the acquired or desired progress levels are set at the beginning and how they are updated according to the user's actions.

4. Define the conditions that fire each advice or action, and state the piece of advice or describe the action using a symbolic language.

At any time, the system evaluates the user's actions in the host system and assigns, for each IU or KU, an estimated progress level considered to be acquired by the user, called his belief, and a targeted progress level, called his goal. The user model at time t is simply the set of all beliefs and goals assigned at time t to each IU or KU:

UserModel(t) = {BIU(I,t), GIU(I,t)}_{I=1..n} ∪ {BKU(J,t), GKU(J,t)}_{J=1..m}

where BIU(I,t) and GIU(I,t) are the acquired and desired progress levels for IU number I at time t, and BKU(J,t) and GKU(J,t) are the acquired and desired progress levels for KU number J at time t.

The user model is updated essentially in three ways: by the designer's predefined rules, by querying the user at run time, and by actions the user can take to modify the model. First, the designer predefines basic actions on the model, that is, principles stating that if certain conditions are met, the model will be updated to some belief or goal level for a
certain unit (IU or KU). Currently, the following actions can be taken: add or suppress a progress level in a unit; move the progress level in a unit up or down; or query the user with a message and offer a set of possible answers. A second way to update the model is when the user is queried: depending on his answers, other questions can be asked, until the end of a predefined decision tree is reached; then, depending on the path of the user's answers, the system updates the model according to the designer's definition for that case. Finally, the user can see at any time the different belief and goal levels assigned to him for any IU or KU. The VLC tool shown in Figure 4 is one way of doing this. According to some evaluation of the distance between one progress level and the next, and of the number of progress levels for an IU or a KU, different weights can be assigned to progress levels. In this way, the system can display a bar diagram showing the proportion of progress in each IU or KU according to the belief level. Such a display can help the learner orient his actions. He can also disagree with the system: the tool of Figure 4 enables him to change the values of any acquired or goal progress level.
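The following minimal Python sketch (ours; the operational advisor is a JAVA implementation) illustrates the beliefs-and-goals structure and the kinds of update and display computations described above, assuming equal weights between consecutive progress levels and at least two levels per unit:

```python
# A minimal sketch of the user model: for each IU or KU, a belief (acquired
# progress level) and a goal (desired level), updated by designer-defined
# principles. Unit names and level labels here are invented examples.
class UserModel:
    def __init__(self, units):
        """units: dict mapping an IU/KU identifier to its ordered list of
        progress levels."""
        self.levels = units
        self.belief = {u: 0 for u in units}  # acquired level (index)
        self.goal = {u: 0 for u in units}    # desired level (index)

    def move_up(self, unit, kind="belief"):
        """One of the designer-defined basic update actions."""
        table = self.belief if kind == "belief" else self.goal
        table[unit] = min(table[unit] + 1, len(self.levels[unit]) - 1)

    def progress(self, unit):
        """Proportion shown in the bar diagram for this unit's belief."""
        return self.belief[unit] / (len(self.levels[unit]) - 1)

model = UserModel({"IU1": ["not started", "visited", "completed"],
                   "KU1": ["unknown", "notions", "mastered"]})
model.move_up("IU1")          # fired when a rule's conditions are met
print(model.progress("IU1"))  # -> 0.5
```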
4. Extending the advisory system
We now conclude this paper by identifying extensions to the implementation we have outlined here. There are three directions in which we want to move. The first one is to extend the current advisor to support collaboration. Right now, the JAVA implementation enables a user to see the other users' learning paths, look at their progress in the host environment and communicate with them accordingly. An extension of the user model has been designed by the first author by adding another couple of progress levels to the model for each instructional unit (IU) or knowledge unit (KU), called social belief and social desire [12]. These values identify the believed capability of a user to interact with others at a certain level for a given IU or KU, and also his intention to do so (is it a goal?). Based on this extension of the user model, a syntax has been defined to update this model, making it possible to advise on collaborative issues such as the selection of peer learners, the selection of tasks on which to collaborate, or the identification of knowledge on which to exchange. The second set of tasks we now face is to design advisors for actors other than the learner, particularly the trainer and the designer. The current learner's advisor is based on the use of VLC viewers that help the learner use his user model and act accordingly. Such tools will have to be tailored to the trainer's role; for example, to help him make an accurate diagnosis of a learner or group of learners, to provide meaningful pieces of advice, and to evaluate the learner's achievements. Also, we will work on an advisor for the
designer, to help him adapt his scenario according to the characteristics of a design project, to navigate efficiently within the learning systems engineering method (MISA) and to assess the consistency and quality of a learning system design. We will here extend previous work on the Epitalk architecture [13]. Finally, we need to improve the design tools that help build such advisors. Defining the actions and pieces of advice is a time-consuming task. We believe that the approach presented here can help automate and systematise a good part of the job. For example, once the design of the scenarios is done using a method like MISA, the IU and KU hierarchical lists can be produced automatically, and default progress levels, updating actions, and action and advice frames can be proposed to the designer. When this is done, we would also like to replace the hierarchical lists for the IS and the KS with richer instructional or knowledge models.

Acknowledgements

The authors wish to underline the contribution of Claude Ricciardi-Rigault, Chantal Paquin, Ileana de la Teja, Frederic Bergeron and all the other researchers who participated in the various Virtual Campus projects at LICEF and have helped these ideas to mature. Also, special thanks to the Quebec Information Highway Fund, the TeleLearning Network of Centres of Excellence and the Office for Learning Technology (OLT), which have contributed to the funding of these projects.

References
[1] Wilson, J. & Mosher, D. The prototype of the virtual classroom. Journal of Instructional Delivery Systems, Summer 1994, 28-33.
[2] Hiltz, S. Evaluating the virtual classroom. In L. Harasim (Ed.), Online Education: Perspectives on a New Environment. New York: Praeger Publishers, 133-184, 1990.
[3] Hall, B. Web-Based Training, 1998.
[4] Harasim, L. Online education: An environment for collaboration and intellectual amplification. In L. Harasim (Ed.), Online Education: Perspectives on a New Environment. New York: Praeger Publishers, 1990.
[5] Ricciardi-Rigault, C., Henri, F. Developing tools for optimizing the collaborative learning process. Proceedings of the International Distance Education Conference, June, Penn State, 1994.
[6] Gery, G. Granting three wishes through performance-centered design. Communications of the ACM, 40(7), pp. 54-59, July 1997.
[7] Paquette, G., Bergeron, G., Bourdeau, J. The virtual classroom revisited. TeleTeaching'93 Proceedings, Trondheim, Norway, August 1993.
[8] Paquette, G. Modeling the virtual campus. In B. Collis and G. Davies (Eds.), Innovating Adult Learning with Innovative Technologies. Elsevier Science B.V., Amsterdam, 1995.
[9] Paquette, G., Ricciardi-Rigault, C., Paquin, C., Liegeois, S. and Bleicher, E. Developing the virtual campus environment. ED-Media International Conference, Boston, USA, June 1996.
[10] Paquette, G. Virtual learning centers for XXIst century organizations. In F. Verdejo and G. Davies (Eds.), The Virtual Campus (pp. 18-34). Chapman & Hall, London, 1997.
[11] Paquette, G., Aubin, C. and Crevier, F. Design and implementation of interactive telelearning scenarios. Proceedings of ICDE'97 (International Council for Distance Education), Penn State University, USA, June 1997.
[12] Girard, J., Paquette, G., Giroux, S. Architecture de système conseiller multiagent sur la collaboration dans un système d'apprentissage. LICEF Research Reports, Télé-université, 1997.
[13] Paquette, G. and Girard, J. AGD: A course engineering support system. ITS-96, Montreal, June 1996.
Dialectics for collective activities: an approach to virtual campus design Patrick Jermann, Pierre Dillenbourg, Jean-Christophe Brouze TECFA (unit of Educational Technology), School of Psychology and Education Sciences University of Geneva, Switzerland
Abstract This contribution is an attempt to systematise our approach to the design of a virtual campus. Activities in the recently started TECFA virtual campus rely on Internet tools, but they concern both distance teaching and presential interactions. Designing learning activities is relatively easy when the learning goal is an activity in itself. In this contribution, we explain how we extend the "learning by doing" approach to the acquisition of declarative knowledge. After fulfilling a pseudo-task, the activities we describe lead students to interpret a graphical representation of their performance. This happens during a debriefing session where the teacher turns students' experiential knowledge into academic knowledge. The target knowledge, i.e. the concept or theory we want them to understand, is acquired through the effort students make to understand differences that appear in the representation of pseudo-task performances. The representations produced by the system were successful in triggering intense discussions. Keywords: Declarative knowledge acquisition, virtual campus, collaboration and collaborative tools, collective representations.
1. Introduction: What's in a virtual campus?

What is the difference between a virtual campus and any genuine Web site for training? The most salient difference is that the information space is rendered by a spatial metaphor. A second difference is that a campus is supposed to be broader than a course, i.e. a virtual campus should cover rather broad curricula. This requires some homogeneity across courses, both at the pedagogical level and at the management level (e.g. student tracking tools). In this contribution, we suggest a third difference: to restrict the label 'virtual campus' to environments which offer learning activities, rather than simply provide information. The evolution of the Web itself shows the limits of the "teaching as transmitting information" paradigm: we have the technology to provide learners with a huge amount of information, but learners cannot turn it into personal knowledge simply by reading Web pages. We see the virtual campus paradigm as a move away from information sites towards learning sites, from providing (multimedia) documents towards involving the user in learning activities. This additional item in the definition of a virtual campus does not reflect its current use in the community; we propose it here for adoption. However, when training university teachers to use the Internet, we observed that, although they mostly accept this constructivist approach at the discourse level, they do complain about the difficulty of applying it to their own teaching ("I like what you do but it would not work for my course!"). Designing learning activities is relatively easy when the learning goal is an activity in itself. For instance, learning a programming language naturally relies on programming
exercises. In a similar way, when teaching software design, the most natural method is project-based, the learners being asked to design a piece of software. In other words, the design of activities is rather natural for procedural and heuristic knowledge. It is harder for declarative knowledge: teachers object that activities are not useful for teaching theories. Hence, this contribution addresses the design of Web-based activities for learning declarative knowledge. Since the goal is not a task in itself, the learning activity is hereafter referred to as a pseudo-task, i.e. a task which is artificially introduced to make learners aware of some features. The TECFA virtual campus includes a variety of learning activities. We present three of them (section 3) because they share some design features (section 4) which can be generalised to other learning objectives. We do not, however, propose any recipe; the design of learning activities remains a complex engineering process, based on a deep analysis of the knowledge or skills to be acquired. Our contribution suggests some solutions which can be chosen during this design process. These solutions will have to be adapted to contexts other than ours (section 2), according to the number and level of students, the balance of presential versus distance learning activities, and so forth. This contribution does not include AI components, as the scope of this conference has been broadened to interactive learning systems. It is based on standard tools for dynamically generating Web pages from a database. However, the activities illustrated in this paper and other activities in the virtual campus share many technical aspects. We hence suggest developing higher-order tools which produce this kind of environment at reduced cost.

2. Teaching context

The TECFA virtual campus relies on design principles drawn from our five years' experience in teaching with the Internet. We used it for a master's degree in educational technology. Its organisation combines distance with presential periods: students attend courses for one week at the University and then work remotely for 5 to 7 weeks using a combination of available media (e-mail, discussion groups, MOOs, phone, ...). Combining presence and distance has proved to be a very robust formula. Some weaknesses of electronic communication can be repaired through face-to-face meetings (and vice versa). For instance, e-mail is not appropriate for the initial phase of a project, when goals have to be defined (Hansen et al., 1998): brainstorming and negotiation are more efficiently conducted through face-to-face meetings. Paradoxically, we observed a bias in our evolution: since students spend a short time in presence, we tended to teach as much theory as possible and neglected interactive sessions. The current state of the TECFA virtual campus results from the solutions we found to these various problems.

2.1. Designing principles

The design of the activities illustrated in section 3 reflects general principles.

Principle #1: Less text, more activity. We do not claim that a web site should not include any documents, but that resources and activities must be differentiated at the technical and conceptual levels. The resources are the documents, images, movies, ... The activities are the tasks that students are asked to perform in order to process the encountered information. In general, Web sites include an implicit activity: please consult the information (read/watch/listen).
We suggest using activities in which the learner has a more active role to play, where he is in charge of constructing his own knowledge.

Principle #2: Design activities for presence. The use of Internet tools is not restricted to distance interactions; they can also be used in presential teaching. One could even argue that
the real challenge is to use the Web in presential teaching: when students are far away, providing information is per se a good thing, whatever the pedagogical quality; conversely, if students are present, one has to justify in what way Web-based activities offer extra added value compared to other ways of teaching. Of course, we have the same interest in distance activities, but those presented here are used with students present in the room.

Principle #3: Integrate communication into activities. Providing communication tools (e-mail, forums, ...) does not imply that students actually use them for content-rich interactions ("conversational learning"). Our experience in Internet-based teaching is that e-mail is mostly used for management issues (appointments, assignments, ...), while discussion groups are more frequently used to handle technical problems than for knowledge-intensive argumentation. This statement is rather trivial: students do not communicate just for the sake of communication, but because they need to communicate with respect to some task. Communication should not be an extra task; it should be part of the task. Namely, the main vector for communication in our examples is shared spaces, i.e. interfaces by which learners act on the same objects.

Principle #5: Structure groupwork with scenarios. Beyond collaborative learning, we observed that involving the whole class of students at some stages produced very dynamic interactions. We later refer to this phase as being 'collective'. Most of our learning activities rely on the group of learners. However, as we know from research on collaboration (Dillenbourg et al., 1995), group interactions are not guaranteed to produce learning. To increase the probability of productive interactions, we design scenarios which specify which learner must do what and at which time. The scenario is encompassed in the interface: the scenario phases are represented as virtual rooms, and the environment's functionalities determine how learners interact.

3. Three examples of activities

The activities we present in this section are represented as buildings inside the TECFA virtual campus. A building corresponds to a pedagogical scenario, and each phase is matched with a room. The examples below share two aspects: they all include a pseudo-task, i.e. a task which is apparently not related to the pedagogical goals, and this pseudo-task is followed by debriefing activities during which emergent knowledge is structured by the teacher.

3.1. Ergonomics laboratory

This activity belongs to a course on human-computer interaction. Its objective is to introduce the SSOA model (Schneiderman, 1992). Students compare the effectiveness of different interaction styles and relate these styles to the different types of users' knowledge in the SSOA model. The difference between styles is not taught but experienced by the students themselves as they use different versions of the same application in a pseudo-task.

Pseudo-task. We ask students to use six versions of an application in order to produce railway tickets. Each student composes 4 tickets with each version. Students have to fulfil the wishes of an imaginary passenger who chooses the destination, the type of seat, the class, return or not, etc. They get a verbal description of the tickets to produce. Each version of the application is based on a different interaction style: command language, pull-down or pop-up menus, direct manipulation (drag & drop or click) and forms.
Once the student has completed the 24 tickets, log data is collected by the application and sent to a central device.
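Concretely, the per-style summary behind the histogram of Figure 1 amounts to a simple aggregation over the collected logs. The following sketch is ours, not the system's actual code; the record fields and function name are assumptions:

```python
# Illustrative aggregation for the Figure 1 histogram: group ticket-task log
# records by interaction style and compute the mean response time and the
# total error count per style. The record fields are assumed, not taken from
# the actual TECFA implementation.
from collections import defaultdict
from statistics import mean

def summarise_logs(records):
    """records: iterable of dicts with 'style', 'response_time', 'errors' keys."""
    times = defaultdict(list)
    errors = defaultdict(int)
    for rec in records:
        times[rec["style"]].append(rec["response_time"])
        errors[rec["style"]] += rec["errors"]
    return {style: {"mean_time": mean(ts), "errors": errors[style]}
            for style, ts in times.items()}
```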
Figure 1: Histogram of mean response time in seconds (bars a1..f1) and number of errors (bars a2..f2) across the different interaction styles: (a) command language, (b) pop-up menus, (c) pull-down menus, (d) forms, (e) direct manipulation by click, (f) direct manipulation by drag & drop
Representation and debriefing. Several parameters are stored during task completion: the response time, the number of errors and the number of help requests. On demand, by moving to the next phase, the system generates representations of the parameters measured for each interaction style. A general histogram (Figure 1) shows the mean of each parameter across the interaction styles. The students comment on these data by explaining why they were slow or fast with different versions and why they made mistakes, and they clarify which system features are responsible for their behaviour. The role of the teacher is to synthesise both these numeric data and the experiential comments and to articulate them with a theoretical framework, in this case Shneiderman's SSOA model.

Experiment. This activity took two hours to complete. It was effective in the sense that students produced a very rich list of pros and cons with respect to each interaction style and related them to the target type of user (Shneiderman). Students tend to use their experiential knowledge more than the statistical summary during the debriefing. This summary is more useful for the teacher in drawing a synthesis.

3.2. Defects of multiple-choice questionnaires

This activity is intended to teach the common defects of multiple-choice questionnaires, i.e. to provide students with an operational understanding of the validity of this educational measurement tool.

Pseudo-task. Students answer two questionnaires in pairs: the first about the performances of Belgian athletes, the second about capitals in western Europe. Before they answer the questionnaires, they are asked to evaluate their level of competence in each of the domains on a 5-point scale. The questionnaires are built in such a way that the questions related to the domain more familiar to the students (capitals of western Europe) are hard to answer. Conversely, the questions for the less familiar domain (performances of Belgian athletes) are built in such a manner that one can guess the correct answer. Completing each questionnaire leads to a score to be compared with the self-evaluation the students gave at the beginning of the activity.
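As a rough sketch of the data behind Figure 2, each group contributes one point pairing its self-evaluation (rescaled from the 5-point scale) with its actual questionnaire score; the names and rescaling below are our assumptions, not the system's code:

```python
# Hypothetical assembly of the predicted-vs-real score points plotted in
# Figure 2; the 5-point self-evaluation is rescaled to a percentage so that
# both coordinates share the same range.
def score_points(results):
    """results: list of (pair_name, self_eval_1_to_5, real_score_percent)."""
    points = []
    for name, self_eval, real in results:
        predicted = (self_eval - 1) / 4 * 100   # map 1..5 onto 0..100
        points.append({"name": name, "real": real, "predicted": predicted,
                       "over_estimated": predicted > real})
    return points
```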
Figure 2: Graph of evaluated and real scores for the two questionnaires (real scores on the X-axis and predicted scores on the Y-axis). Because some pairs had the same scores, several squares and circles are superimposed. The squares above the diagonal are responses to the questionnaire on capitals of western Europe (familiar topic): students over-evaluated their competence. Conversely, the circles below the diagonal represent scores on the questionnaire on performances of Belgian athletes (unfamiliar topic): students under-estimated their competence.
Representation and debriefing. In the next phase, the system plots a graph (Figure 2) with each student's position indicated by their name. In addition to the scores graph, the system provides several statistics: the distribution of answers to each question, a list of defects and ways to avoid them. During the debriefing, the teacher reviews the questions one by one, the students explain how they produced their answers, and they identify defects of the questionnaires. Finally, the teacher generalises the observed defects of the questionnaires from a docimological viewpoint. During the debriefing, the negative correlation between predicted and real scores is interpreted as reflecting not poor metacognitive skills but the low validity of the questionnaire.

Experiment. This activity lasted two hours. It appeared to be effective for inducing the defects of multiple-choice questionnaires, especially the second questionnaire: since students were upset by their bad scores, they eagerly tried to attribute their errors to defects of the questionnaires.

3.3. Argue Graph

The goal of this activity is to make students aware of the learning theories (e.g. constructivism, behaviourism, ...) underlying design choices in courseware development. In the past, this course was given as a standard lecture with less convincing outcomes.

Pseudo-task. The scenario includes the following steps: students twice fill in the same online questionnaire about design principles in courseware development. Here is an example of the questions they had to answer: What is the best way to motivate students? (a) show them what the learning objective is; (b) show them a funny animation at each correct response; (c) include the score they get in the course evaluation. The first time, they answer the questionnaire alone; the second time, in pairs. When answering, they are invited to give a written argument to support each of their choices. The choices the students make are transformed into two scores reflecting whether they privilege system- vs. user-driven interactions and a discovery- vs. teaching-based pedagogy. A scatterplot is created on the basis of these scores, representing each student's position along the two dimensions of courseware design. In a second phase, we let students work in pairs. These pairs are formed so as to maximise the differences between students, based on their answers to the individual questionnaire; one plausible pairing heuristic is sketched below.
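The paper does not spell out the pairing algorithm, so the following greedy matching over the solo-answer coordinates is only one plausible reading of "maximise the differences"; all names are our own:

```python
# A possible pairing heuristic (our assumption): repeatedly pair the two
# remaining students whose solo positions in the opinion space are farthest
# apart, so that each pair is maximally heterogeneous.
import itertools

def form_pairs(scores):
    """scores: dict mapping student name -> (x, y) opinion coordinates."""
    remaining = set(scores)
    pairs = []
    while len(remaining) >= 2:
        a, b = max(itertools.combinations(remaining, 2),
                   key=lambda ab: (scores[ab[0]][0] - scores[ab[1]][0]) ** 2
                                  + (scores[ab[0]][1] - scores[ab[1]][1]) ** 2)
        pairs.append((a, b))
        remaining -= {a, b}
    return pairs
```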
When working in pairs, the students see the arguments they gave to support their answers in the individual phase. They have to agree on a common answer and provide a common argument.
Figure 3: Scatterplot of solo answers. Each square corresponds to a student's opinion. The horizontal axis opposes system- vs. learner-driven interaction. The vertical axis opposes discovery-based learning vs. teaching.
Representation and debriefing. The system draws two scatterplots. After the students have answered alone, they can see their position along the dimensions we just described (Figure 3). Then, after the students have answered in pairs, the scatterplot represents the "migration" of each student from his or her initial position to the common position. In addition to the scatterplots, the system lists all the arguments given for each question and draws a pie chart with the distribution of answers. Finally, a brief statement presents the theories underlying the options the student can select in a question. The teacher debriefs the class on the basis of this information.

Experiments. This activity took four hours to complete. We ran two pre-experiments, with 15 students in 1997 and 17 students in 1998 respectively, after which the system was improved with regard to various functionalities. The experiment reported here was run on October 22nd, 1998. Most students were located in the same quarter of the graph (Figure 3). This phenomenon is probably due to the fact that the questions reflected too clearly the pedagogical values promoted at TECFA and did not take into account the technical or financial dimensions. A preliminary data analysis shows that the pairing method is efficient for triggering argumentation. In 49% of the questions, the members of the pair had to answer a question for which the individuals had previously given different answers. There is some relation between the distance in the graph and the frequency of conflict (the five pairs with a distance of 1 have a disagreement rate of 38%, while the pairs with a larger distance have a disagreement rate of 52.5%), but the sample is too small to compute a correlation coefficient. For each pair AB, we counted the number of times that AB's joint answer corresponded to A's previous solo answer versus the number of times it corresponded to B's. We thereby observed that pairs were rather symmetrical, the difference ranging between 0 and 2, with the exception of two pairs.

4. Generalizing the approach: dialectic collective activities

The learning activities we described in section 3 follow a similar scenario: students first complete a task not directly related to the target knowledge to be learned. The system then uses
the students' answers to produce a synthetic representation of their performance. Finally, the teacher uses this representation to debrief the class, i.e. to turn their experiential knowledge into academic knowledge, to put labels on new concepts, and to structure the outcome.

As stated in the introduction, designing learning activities is relatively easy when the learning goal is an activity in itself. When acquiring declarative knowledge (principles, theories, laws, concepts, ...), the learning task cannot simply be derived from the target task. Therefore, we introduce the notion of pseudo-task to refer to a task which is not the skill to be mastered, but which produces learning by experience. The target knowledge results from the effort to understand differences that appear in the representation of pseudo-task performances. These representations always present an aspect of differentiation, whether cognitive, metacognitive or social.

While the students have accomplished the pseudo-task either alone or in pairs, the synthesis produced by the system represents the students as a collectivity. "Collective" differs from "collaborative" in that it does not necessarily imply rich interactions among students. Simply, the system collects individual productions or data and makes them available to the whole group. Collective representations bring the social plane into the frame of reference used during the debriefing session. In other words, the underlying principle is to play with various types of differences in such a way that the explanation of observed differences produces the information which is then structured by the teacher during the debriefing.

Table 1 summarises the design parameters we used to set up the activities. The question which remains open is whether particular types of competence, difference and representation match better with one another.

Table 1: Design parameters
Ergonomics laboratory
  Target knowledge: SSOA model (understanding the usability of different interaction styles)
  Pseudo-task: produce railway tickets with six different interfaces
  Type of competence in the pseudo-task: tool-specific knowledge
  One correct response: yes (ticket matches the needs of the passenger)
  Debriefing: clarify differences between interaction styles with respect to a theoretical model
  Representation used in debriefing: graph represents mean performance for each interface
  Differentiation level: cognitive (performance: time, errors)
  Mode of representation: anonymous

MCQuestionnaires
  Target knowledge: validity criteria for multiple-choice questionnaires
  Pseudo-task: answer two badly designed questionnaires
  Type of competence in the pseudo-task: domain-specific knowledge
  One correct response: yes (only one answer is correct)
  Debriefing: clarify the concept of validity
  Representation used in debriefing: graph represents self-evaluation and score obtained
  Differentiation level: metacognitive (mismatch between self-evaluation and effective score)
  Mode of representation: individual compared to an anonymous group

Argue Graph
  Target knowledge: learning theories underlying design choices in courseware
  Pseudo-task: answer a questionnaire about design choices
  Type of competence in the pseudo-task: opinions
  One correct response: no (opinions)
  Debriefing: link opinions about courseware development with learning theories
  Representation used in debriefing: graph represents people's opinions
  Differentiation level: social (difference between opinion scores)
  Mode of representation: individual compared to individuals
5. How to get a demo

The TECFA virtual campus is located at http://tecfa.unige.ch/campus/infospace/index.php. In order to have full access to the facilities of the campus, users have to log in. Ten guest accounts are available for testing purposes: use "guest1" (or "guest2" ... "guest10") as both login and password. The "ergonomics laboratory" is located in building 1201 of zone 12. The "defects of multiple-choice questionnaires" activity is in building 1603 of zone 16. Finally, the "argue graph" can be found in building 1601 of zone 16.

6. Conclusion

This contribution is an attempt to systematise our approach to the design of a virtual campus. We have presented three learning activities for declarative knowledge acquisition. After fulfilling a pseudo-task, these activities lead students to interpret a graphical representation of their performance. This happens during a debriefing session in which the teacher turns students' experiential knowledge into academic knowledge. The target knowledge, i.e. the concept or theory we want them to understand, is acquired through the effort students make to understand differences that appear in the representation of pseudo-task performances. The representations produced by the system were successful in triggering intense discussions.

We have just started the TECFA virtual campus and it remains under construction. New developments are planned in the near future. For example, the extension of the existing navigation tool with 2D maps should allow us to use distances between buildings to represent thematic relationships. We also intend to build personal maps which can then be used by students as spatially organised bookmarks.

7. Acknowledgments

The TECFA Virtual Campus has been developed with the help of Cyril Roiron for the activities mentioned here, and more generally with the collaboration of David Ott, Daniel Schneider, Daniel Peraya, Patrick Mendelsohn, Philippe Lemay and Didier Strasser.

8. References

Dillenbourg, P., Jermann, P., Schneider, D., Traum, D. and Buiu, C. (1997). The design of MOO agents: Implications from a study on multi-modal collaborative problem solving. In B. du Boulay and R. Mizoguchi (Eds.), Proceedings of the AI&ED'97 Conference. IOS Press.
Dillenbourg, P., Baker, M., Blaye, A. and O'Malley, C. (1995). The evolution of research on collaborative learning. In H. Spada and P. Reimann (Eds.), Learning in Humans and Machines: Towards an Interdisciplinary Learning Science (pp. 189-211). Oxford: Elsevier.
Hansen, D., Dirckinck-Holmfeld, L., Lewis, R. and Rugelj, J. (1998). Using telematics for collaborative knowledge construction. In Collaborative Learning: Cognitive and Computational Approaches. Amsterdam: Elsevier Science Publishers.
Rouet, J.-F. (1994). Question answering and learning with hypertext. In R. Lewis and P. Mendelsohn (Eds.), Lessons from Learning. IFIP Transactions A-46.
Shneiderman, B. (1992). Designing the User Interface. Reading, MA: Addison-Wesley.
Artificial Intelligence in Education, S.P. Lajoie and M. Vivet (Eds.), IOS Press, 1999
Virtual Humans for Team Training in Virtual Reality

Jeff Rickel and W. Lewis Johnson
USC Information Sciences Institute & Computer Science Department
4676 Admiralty Way, Marina del Rey, CA 90292-6695
[email protected], [email protected], http://www.isi.edu/isd/VET/vet.html
Abstract

This paper describes the use of virtual humans and distributed virtual reality to support team training, where students must learn their individual role in the team as well as how to coordinate their actions with their teammates. Students, instructors, and virtual humans cohabit a 3D, simulated mock-up of their work environment, where they can practice together in realistic situations. The virtual humans can serve as instructors for individual students, and they can substitute for missing team members. The paper describes our learning environment, the issues that arise in developing virtual humans for team training, and our design for the virtual humans, which is an extension of our Steve agent previously used for one-on-one tutoring.
1 Introduction

Complex tasks often require the coordinated actions of multiple team members. Team tasks are ubiquitous in today's society; for example, teamwork is critical in manufacturing, in an emergency room, and on a battlefield. To perform effectively in a team, each member must master their individual role and learn to coordinate their actions with their teammates. There is no substitute for hands-on experience under a wide range of situations, yet such experience is often difficult to acquire; required equipment may be unavailable for training, important training situations may be difficult to re-create, and mistakes in the real world may be expensive or hazardous. In such cases, distributed virtual reality provides a promising alternative to real-world training; students, possibly at different locations, cohabit a 3D, interactive, simulated mock-up of their work environment, where they can practice together in realistic situations.

However, the availability of a realistic virtual environment is not sufficient to ensure effective learning. Instructors are needed to demonstrate correct performance, guide students past impasses, and point out errors that students might miss. Yet requiring instructors to continually monitor student activities places a heavy burden on their time, and may severely limit students' training time. In addition, team training requires the availability of all appropriate team members, and may require adversaries as well. Thus, while virtual environments allow students to practice scenarios anywhere and anytime, the need for instructors and a full set of teammates and adversaries can create a serious training bottleneck.

One solution to this problem is to complement the use of human instructors and teammates with intelligent agents that can take their place when they are unavailable. The intelligent agents cohabit the virtual world with human students and collaborate (or compete) with them on training scenarios. Intelligent agents have already proven valuable in this role as fighter pilots in large battlefield simulations [7, 10], but such agents have a limited ability to interact with students. Our work focuses on a different sort of agent: a virtual human that interacts with students through face-to-face collaboration in the virtual world, either as an instructor or teammate. We call our agent Steve (Soar Training Expert for Virtual Environments).

Our prior work focused on Steve's ability to provide one-on-one tutoring to students for individual tasks [14, 15]. Steve has a variety of pedagogical capabilities one would expect
of an intelligent tutoring system. However, because he has an animated body and cohabits the virtual world with his student, he can provide more human-like assistance than previous disembodied tutors. For example, he can demonstrate actions, use gaze and gestures to direct the student's attention, and guide the student around the virtual world. This makes Steve particularly valuable for teaching tasks that require interaction with the physical world.

This paper describes our extensions to Steve to support team training. Steve's prior skills provided a solid foundation for his roles in team training, but several new issues had to be addressed. Each agent must be able to track the actions of multiple other agents and people, understand the role of each team member as well as the interdependencies, and communicate with both human and agent teammates for task coordination. In the remainder of this paper, we describe our learning environment for team training (Section 2), our solutions to these issues (Section 3), and related and future work (Section 4).
2 The Learning Environment
Our learning environment is designed to mimic the approach used at the naval training facility in Great Lakes, Illinois, where we observed team training exercises. The team to be trained is presented with a scenario, such as a loss of fuel oil pressure in one of the gas turbine engines that propels the ship. The team must work together, guided by standard procedures, to handle the casualty. At Great Lakes, the team trains on real, operational equipment. Because the equipment is in operation, the trainers have limited ability to simulate ship casualties; for example, they must mark gauges with grease pencils to indicate hypothetical readings.

In our learning environment, the team, consisting of any combination of Steve agents and human students, is immersed in a simulated mock-up of the ship; the simulator creates the scenario conditions. As at Great Lakes, each student is accompanied by an instructor (human or agent) that coaches them on their role. Each student gets a 3D, immersive view of the virtual world through a head-mounted display (HMD) and interacts with the world via data gloves. Lockheed Martin's Vista Viewer software [17] uses data from a position and orientation sensor on the HMD to update the student's view as she moves around. Additional sensors on the gloves keep track of the student's hands, and Vista sends out messages when the student touches virtual objects. These messages are received and handled by the simulator, which controls the behavior of the virtual world. Our current implementation uses VIVIDS [13] for simulation authoring and execution. Separate audio software broadcasts environmental noises through headphones on the HMD based on the student's proximity to their source in the virtual world. Our current training environment simulates the interior of a ship, complete with gas turbine engines, a variety of consoles, and their surrounding pipes, platforms, stairs and walls. A course author can create a new environment by creating new graphical models, a simulation model, and the audio files for environmental sounds.

Our architecture for creating virtual worlds [9] allows any number of humans and agents to cohabit the virtual world. While the behavior of the world is controlled by a single simulator, each person interacts with the world through their own copy of Vista and the audio software, and each agent runs as a separate process. The separate software components communicate by passing messages via a central message dispatcher; our current implementation uses Sun's ToolTalk as the message dispatcher. This distributed architecture is modular and extensible, and it allows the various processes to run on different machines, possibly at different locations. It greatly facilitates team training, where arbitrary combinations of people and agents must cohabit the virtual world; our extension to team training would have been more difficult had we originally designed a more monolithic system geared towards a single student and tutor.

Humans and agents communicate through spoken dialogue. An agent speaks to a person (teammate or student) by sending a message to the person's text-to-speech software (Entropic's TrueTalk), which broadcasts the utterance through the person's headphones. When a person speaks, a microphone on their HMD sends their utterance to speech recognition software
(Entropic's GrapHvite), which broadcasts a semantic representation of the utterance to all the agents. Currently, Vista provides no direct support for human-to-human conversation; if the humans are not located in the same room, they must use telephone or radio to converse.

For team training, teammates and instructors must track each other's activities. Each person sees each other person in the virtual world as a head and two hands. The head is simply a graphical model, so each person can have a distinct appearance. Each Vista tracks the position and orientation of its person's head and hands via the sensors, and it broadcasts the information to agents and the other Vistas. Each agent appears as a human upper body (see Figure 1). To distinguish different agents, each agent can be configured with its own shirt, hair, eye, and skin color, and its voice can be made distinct by setting its speech rate, baseline pitch, and vocal tract size (these parameters are supported by the TrueTalk software).

Figure 1: (a) Steve describing an indicator light; (b) one Steve agent speaking to another; (c) one Steve agent watching another
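The dispatcher-based architecture described above can be pictured with a minimal publish-subscribe sketch; the actual system used Sun's ToolTalk, and the message shape and component handlers below are invented for illustration:

```python
# Minimal sketch of the central message dispatcher pattern (the real system
# used Sun's ToolTalk): every component registers a handler, and any message
# is broadcast to all of them.
class Dispatcher:
    def __init__(self):
        self.handlers = []

    def register(self, handler):
        self.handlers.append(handler)

    def broadcast(self, message):
        for handler in self.handlers:
            handler(message)

dispatcher = Dispatcher()
dispatcher.register(lambda m: print("Steve agent saw:", m))   # an agent process
dispatcher.register(lambda m: print("Vista Viewer saw:", m))  # a viewer process
dispatcher.broadcast({"type": "object-touched", "object": "pacc-ccs"})
```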
3 Agent Design
3.1 Architecture

Each Steve agent consists of three main modules: perception, cognition, and motor control [15]. The perception module monitors messages from other software components, identifies relevant events, and maintains a snapshot of the state of the world. It tracks the following information: the simulation state (in terms of objects and their attributes), actions taken by students and other agents, the location of each student and agent, and human and agent speech (separate messages indicate the beginning of speech, the end, and a semantic representation of its content). In addition, if the agent is tutoring a student, it keeps track of the student's field of view; messages from the student's Vista indicate when objects enter or leave the field of view.

The cognition module, implemented in Soar [11], interprets the input it receives from the perception module, chooses appropriate goals, constructs and executes plans to achieve those goals, and sends out motor commands to the motor control module. The motor control module accepts the following types of commands: move to an object, point at an object, manipulate an object (about ten types of manipulation are currently supported), look at someone or something, change facial expression, nod or shake the head, and speak. The motor control module decomposes these motor commands into a sequence of lower-level messages that are sent to the other software components (simulator, Vista Viewers, speech synthesizers, and other agents) to realize the desired effects. (See [15] for more details on this architecture.)

To allow Steve to operate in a variety of domains, his architecture has a clean separation between domain-independent capabilities and domain-specific knowledge. The code in the perception, cognition, and motor control modules provides a general set of capabilities that are independent of any particular domain. These capabilities include planning, replanning, and plan execution; mixed-initiative dialogue; assessment of student actions; question answering ("What should I do next?" and "Why?"); episodic memory; communication with teammates; and control of a human figure [15]. To allow Steve to operate in a new domain,
a course author simply specifies the appropriate domain knowledge in a declarative language. (Recent work has focused on acquiring the knowledge from an author's demonstrations and the agent's experimentation [2].) The knowledge falls into two categories: perceptual knowledge (knowledge about objects in the virtual world, their relevant simulator attributes, and their spatial properties) and task knowledge (procedures for accomplishing domain tasks and text fragments for talking about them). For details about Steve's perceptual knowledge, see [15]; the remainder of the paper will focus on Steve's representation and use of task knowledge.

Task transfer-thrust-control-ccs
  Steps: press-pacc-ccs, press-scu-ccs
  Causal links: press-pacc-ccs achieves ccs-blinking for press-scu-ccs; press-scu-ccs achieves thrust-at-ccs for end-task
  Ordering: press-pacc-ccs before press-scu-ccs
  Roles: pacc: press-pacc-ccs; scu: press-scu-ccs

Task loss-of-fuel-oil-pressure
  Steps: transfer-thrust-control-ccs, ...
  Causal links: ...
  Ordering: ...
  Roles: eoow: (transfer-thrust-control-ccs pacc), ...; engrm: (transfer-thrust-control-ccs scu), ...

Figure 2: Example team task descriptions
3.2 Representing Task Knowledge
Most of Steve's abilities to collaborate with students on tasks, either as a teammate or tutor, stem from his understanding of those tasks. As the scenario unfolds, Steve must always know which steps are required, how they contribute to the task goals, and who is responsible for their execution. In order to handle dynamic environments containing other people and agents, he must understand the tasks well enough to adapt them to unexpected events; he cannot assume that the task will follow a pre-specified sequence of steps. Moreover, our goal was to support a declarative representation that would allow course authors to easily specify task knowledge and update it when necessary [14].

Our representation for individual tasks, used in our previous work for one-on-one tutoring, satisfies these design criteria. The course author describes each task using a standard plan representation [16]. First, each task consists of a set of steps, each of which is either a primitive action (e.g., press a button) or a composite action (i.e., itself a task). Composite actions give tasks a hierarchical structure. Second, there may be ordering constraints among the steps; these constraints define a partial order over the steps. Finally, the role of the steps in the task is represented by a set of causal links; each causal link specifies that one step in the plan achieves a goal that is a precondition for another step in the plan or for termination of the task. For example, pulling out a dipstick achieves the goal of exposing the level indicator, which is a precondition for checking the oil level.

This task representation is suitable for structured tasks based on standard procedures. It would not be suitable for tasks that require creative problem solving, such as design tasks. Fortunately, many tasks in industry and the military have this type of structure, including operation and maintenance of equipment, trauma care, and surgical procedures. Moreover, this representation need not be viewed as a fixed sequence of steps; rather, it is a general causal network of steps and goals, and can be used by a planning algorithm to dynamically order the steps even in the face of unexpected events, as described in Section 3.3.

To extend Steve to team training, we had to decide how to assign team members to task steps. Research on multi-agent teams has addressed methods by which teammates dynamically negotiate responsibility for task steps. However, supporting such negotiation among a team of agents and people would require more sophisticated natural language dialogue capabilities than Steve currently has. Fortunately, many team tasks have well-defined roles that are maintained throughout task execution, so we focus on this class of tasks.

Extending Steve to support such team tasks required one simple addition to each task description: a mapping of task steps to team roles. For example, Figure 2 shows a simplified task model for transferring thrust control to the central control station of a ship. Two roles
must be filled: one operator mans the propulsion and auxiliary control console (PACC) in the central control station (CCS), and another operator mans the shaft control unit console (SCU) in the engine room. The PACC operator requests the transfer by pressing the CCS button on her console, which results in the CCS button blinking on both consoles. When the CCS button is blinking on the SCU, the SCU operator presses it to finalize the transfer. This last action achieves the end goal of the task, which is indicated in the task description by specifying its effect as a precondition of the dummy step "end-task" (this representation for end goals is standard in AI planners [16]).

If a step in the task is itself a team task, it will have its own roles to be filled, and these may differ from the roles in the parent task. Therefore, the parent task specifies which of its roles plays each role in the subtask. For example, the first task in Figure 2 is a subtask of the second (loss-of-fuel-oil-pressure). The task description calls for the engineering officer of the watch (EOOW) to play the role of the PACC operator and for the engine room officer (ENGRM) to play the role of the SCU operator for the transfer of thrust control.

Task descriptions (e.g., Figure 2) specify the structure of tasks, but they leave the goals (e.g., ccs-blinking) and primitive steps (e.g., press-pacc-ccs) undefined. The course author defines each primitive step as an instance of some action in Steve's extensible action library. For example, the step press-pacc-ccs would be defined as an instance of press-button in which the particular button to be pressed is pacc-ccs, the name of an object in the virtual world. The course author defines each goal by the conditions in the simulated world under which it is satisfied. For example, ccs-blinking is satisfied when the simulator attribute scu-ccs-state has the value "blinking." Thus, Steve is able to relate his task knowledge to objects and attributes in the virtual world.
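To make the declarative flavour of these descriptions concrete, the task of Figure 2 can be written down as plain data; this encoding is our illustration, not Steve's actual authoring syntax:

```python
# The transfer-thrust-control-ccs task of Figure 2 as a plain data structure
# (our encoding, not Steve's authoring language). Causal links are triples:
# (producing step, goal achieved, consuming step).
TRANSFER_THRUST = {
    "name": "transfer-thrust-control-ccs",
    "steps": ["press-pacc-ccs", "press-scu-ccs"],
    "causal_links": [
        ("press-pacc-ccs", "ccs-blinking", "press-scu-ccs"),
        ("press-scu-ccs", "thrust-at-ccs", "end-task"),
    ],
    "ordering": [("press-pacc-ccs", "press-scu-ccs")],
    "roles": {"pacc": ["press-pacc-ccs"], "scu": ["press-scu-ccs"]},
}
```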
3.3 Using Task Knowledge
When someone (e.g., a human or agent instructor) requests a team task to be performed, each Steve agent involved in the task as a team member or instructor uses his task knowledge to construct a complete task model. The request specifies the name of the task to be performed and assigns a person or agent to each role in that task. Starting with the task description for the specified task, each agent recursively expands any composite step with its task description, until the agent has a fully-decomposed, hierarchical task model. Role assignments in the request are propagated down to subtasks until the task model specifies which team member is responsible for each step. For example, if the task is loss-of-fuel-oil-pressure, with Joe as the EOOW, then Joe will play the role of the PACC operator for the subtask transfer-thrust-control-ccs, and hence he is responsible for the step press-pacc-ccs. Since all agents have the same task knowledge, each agent will construct the same hierarchical task model with the same assignment of responsibilities.

For simulation-based training, especially for team tasks, agents must be able to robustly handle unexpected events. Scripting an agent's behavior for all possible contingencies in a dynamic virtual world is difficult enough, but the problem is compounded when each agent must be scripted to handle unexpected actions by any human team member. One option is to simply prevent human students from deviating from standard procedures, but this robs the team of any ability to learn about the consequences of mistakes and how to recover from them. Instead, we have designed Steve to use his task knowledge to adapt task execution to the unfolding scenario.

To do this, each agent maintains a plan for how to complete the task from the current state of the world. The task model specifies all steps that might be required to complete the task; it can be viewed as a worst-case plan. Agents continually monitor the state of the virtual world, identify which goals in the task model are already satisfied, and use a partial-order planning algorithm to construct a plan for completing the task [15]. This plan is a subset of the task model, consisting of the steps relevant to completing the task, the ordering constraints among them, and the causal links that indicate the role of each step in achieving the end goals. In
our prior work, this plan would specify how an agent intended to complete a task; for team training, the plan specifies how the agent intends for the team to collectively complete the task, with some causal links specifying the interdependencies among team members (i.e., how one team member's action depends on a precondition that must be achieved by a teammate). Thus, agents dynamically interleave construction, revision, and execution of plans to adapt to the unfolding scenario.

If an agent is serving only as a missing team member, it simply performs its role in the task, waiting when appropriate for the actions of its teammates, and communicating with them when necessary. In contrast, an agent serving as an instructor for a human student interacts with that student in a manner similar to one-on-one tutoring. The agent can demonstrate the student's role in the task, explaining each action it takes, or it can monitor the student as she performs the task, answering questions when the student needs help. Moreover, the agent instructor can easily shift between these two modes as the task proceeds; the student can always interrupt the agent's demonstration and ask to finish the task herself, and she can always ask the agent to demonstrate a step when she gets stuck.
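As a crude approximation of the replanning step (Steve actually uses a partial-order planner [15]), one can walk the causal links backward from the end goal and keep only the steps whose effects are not yet satisfied; the encoding follows the data-structure sketch in Section 3.2:

```python
# Simplified replanning sketch: starting from the dummy end-task step, retain
# only those steps whose achieved goals are still unsatisfied in the current
# world state. Much cruder than Steve's partial-order planner [15].
def remaining_plan(task, satisfied_goals):
    needed, frontier = set(), {"end-task"}
    while frontier:
        consumer = frontier.pop()
        for producer, goal, target in task["causal_links"]:
            if target == consumer and goal not in satisfied_goals:
                if producer not in needed:
                    needed.add(producer)
                    frontier.add(producer)
    return [step for step in task["steps"] if step in needed]

# With ccs-blinking already achieved, only the SCU operator's step remains:
# remaining_plan(TRANSFER_THRUST, {"ccs-blinking"}) == ["press-scu-ccs"]
```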
3.4 Team Communication
In team tasks, coordination among team members is critical. Although team members can sometimes coordinate their actions by simply observing the actions of their teammates, spoken communication is typically required. Team leaders need to issue commands. Team members often need to inform their teammates when a goal has been achieved, when they are starting an activity, and when they detect an abnormal condition. Because team communication is so important, it must be taught and practiced in team training.

We model team communication as explicit speech acts in the task descriptions. For the sort of structured tasks we have studied, this is natural; all the documented team procedures given to us specified when one team member should say something to another and how it should be said. To support this, we extended Steve's action library to include a new type of action: a speech act from one team member to another. The specification of the act requires four components: (1) the name of the task role to which the speech act is directed (e.g., SCU), (2) the name of the attribute being communicated (e.g., thrust-location), (3) the value being communicated for that attribute (e.g., ccs), and (4) the appropriate text string (e.g., "Thrust control is now at the central control station"). Each speech act appears as a primitive action in the task description, allowing us to explicitly model its relationship to the task, including the role responsible for performing it, ordering constraints on when it should be said, and causal links that specify how its effect contributes to completing the task (i.e., which other steps depend on that result).

Given this representation for team communication, Steve agents can both generate and comprehend such utterances during task execution. When an agent's plan calls for it to execute one of these speech acts, it sends the text string to the appropriate speech synthesizers for its human teammates to hear, and it broadcasts a semantic representation of the speech act for its agent teammates to "hear." When a human says the appropriate utterance, her speech recognizer identifies it as a path through its domain-specific grammar, maps it to an appropriate semantic representation, and broadcasts it to the agents. Each agent checks its plan to see if it expects such a speech act from that person at that time. If so, it updates the specified attribute in its mental state with the specified value, and it nods to the student in acknowledgment. If the speech recognizer fails to understand the student's utterance, or the utterance is not appropriate at the current time, the student's instructor agent is responsible for giving the student appropriate feedback.

There are several important points about this approach. First, it only applies to structured tasks for which the required team communications can be specified in the task description; it will not suffice for tasks that require more arbitrary communication. Fortunately, many well-structured team tasks, particularly in the military, include such a prescribed set of utterances.
Second, since Steve does not include any natural language understanding abilities, all valid variations of the utterances must be added to the grammar for the speech recognizer. Again, this is reasonable for tasks with prescribed utterances. Third, note the difference between our approach and communication messages in a purely multi-agent system; a speech recognizer cannot tell to whom an utterance is directed, so agents use the task model to determine whether the speaker is addressing them. Finally, each agent must treat a human student and their instructor as jointly performing a role; if either of them generates the speech act, it must be treated as coming from that role.

Although spoken communication is typically required for team tasks, nonverbal communication is also important. Human students can observe the actions of their nearby agent and human teammates, which is often required for proper team coordination. Agents look at a teammate when expecting them to do something, which can cue a student that she is responsible for the next step. Agents look at the teammate to whom they are speaking (Figure 1b), allowing students to follow the flow of communication and recognize when they are being addressed. Finally, agents react to their teammates' actions; they look at objects being manipulated by teammates (Figure 1c), and they nod in acknowledgment when they understand something a teammate says to them. For tasks that require face-to-face collaboration among team members, such nonverbal communication is critical.
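The four-part speech-act specification described in Section 3.4 can be captured directly as a record; the class and dispatch function below are our sketch of the behaviour described above, not the system's code:

```python
# Sketch of a speech act as a primitive action (fields follow the four-part
# specification in the text); perform() mirrors the described behaviour:
# human teammates hear synthesized speech, agent teammates receive the
# semantic form. All names here are illustrative.
from dataclasses import dataclass

@dataclass
class SpeechAct:
    to_role: str     # task role addressed, e.g. "scu"
    attribute: str   # attribute being communicated, e.g. "thrust-location"
    value: str       # value being communicated, e.g. "ccs"
    text: str        # utterance for human teammates

def perform(act, human_roles, synthesizers, agent_bus):
    for role in human_roles:
        synthesizers[role].say(act.text)             # spoken for humans
    agent_bus.broadcast({"role": act.to_role,        # semantic form for agents
                         "attribute": act.attribute,
                         "value": act.value})

example = SpeechAct("scu", "thrust-location", "ccs",
                    "Thrust control is now at the central control station")
```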
4 Discussion
Steve has been tested on a variety of naval operating procedures. In our most complicated team scenario, five team members must work together to handle a loss of fuel oil pressure in one of the gas turbine engines. This task involves a number of subtasks, some of which are individual tasks while others involve subteams. All together, the task consists of about three dozen actions by the various team members. Steve agents can perform this task by themselves as well as in concert with human team members.

Several other recent systems have applied intelligent tutoring methods to team training, although none provides virtual humans like Steve. The PuppetMaster [12] serves as an automated assistant to a human instructor for large-scale simulation-based training, providing high-level interpretation and assessment to guide the instructor's interventions. It models team tasks at a coarser level than Steve, and is particularly suited to tracking large teams in very dynamic situations. AETS [19] uses detailed cognitive models to monitor a team of human students as they run through a mission simulation using the actual tactical workstations aboard a ship, rather than a virtual mock-up. However, the system does not use these models to provide surrogate team members, and the automated tutor provides feedback to students only through a limited display window and highlighting of console display elements. AVATAR [5] supports simulation-based training for air traffic controllers and provides feedback on their verbal commands to simulated pilots and their console panel actions. However, the automated tutor and pilots have no planning capabilities, so scenarios must be more tightly scripted than in our approach. The Cardiac Tutor [6] trains a medical student to lead cardiac resuscitation teams. The system includes simulated doctors, nurses and technicians that play designated team roles, but they do not appear as virtual humans; they are heard but not seen. The representation of task knowledge appears less general than ours, since it is tailored particularly to trauma care. Unlike our system, where any team member could be a student or agent, their system is limited to a single student playing the role of the team leader.

Some limitations in our system could be alleviated by recent research results from related areas. To go beyond tasks with prescribed utterances, we could leverage ongoing research on robust spoken dialogue [1]. To handle tasks with shifting roles and unstructured communication among teammates, we could incorporate a more general theory of teamwork [8, 18]. To handle tasks that involve simultaneous physical collaboration (e.g., two people jointly lifting a heavy object), we will need a tighter coupling of Steve's perception and body control [3]. Although research in these areas is still incomplete, many useful methods have been developed.
There is a growing understanding of the principles behind effective team training [4]. Empirical experiments are beginning to tease out the skills that make teams effective (e.g., task skills vs. team skills), the basis for team cohesion (e.g., shared mental models), the best types of feedback (e.g., outcome vs. process), and the best sources of feedback (e.g., instructor vs. teammate). Because our approach allows us to model face-to-face interaction among human instructors and students and their agent counterparts, we are now in an excellent position to incorporate and experiment with a variety of these new ideas in team training.
5 Acknowledgments
This work was funded by the Office of Naval Research, grant N00014-95-C-0179. We are grateful for the contributions of our collaborators: R. Stiles and colleagues at Lockheed Martin; A. Munro and colleagues at Behavioral Technologies Laboratory; and R. Angros, B. Moore, and M. Thiebaux at ISI.
References

[1] J.F. Allen, B.W. Miller, E.K. Ringger, and T. Sikorski. Robust understanding in a dialogue system. In Proc. 34th Annual Meeting of the Assoc. for Comp. Ling., pp. 62-70, 1996.
[2] R. Angros, Jr., W.L. Johnson, and J. Rickel. Agents that learn to instruct. In AAAI Fall Symposium on ITS Authoring Tools, 1997. AAAI Press. AAAI Tech. Report FS-97-01.
[3] N.I. Badler, C.B. Phillips, and B.L. Webber. Simulating Humans. Oxford University Press, 1993.
[4] E. Blickensderfer, J.A. Cannon-Bowers, and E. Salas. Theoretical bases for team self-correction: Fostering shared mental models. Adv. in Interdiscipl. Studies of Work Teams, 4:249-279, 1997.
[5] C.A. Connolly, J. Johnson, and C. Lexa. AVATAR: An intelligent air traffic control simulator and trainer. In Proc. ITS '98, pp. 534-543. Springer, 1998.
[6] C. Eliot and B.P. Woolf. An adaptive student centered curriculum for an intelligent training system. User Modeling and User-Adapted Instruction, 5:67-86, 1995.
[7] R.W. Hill, Jr., J. Chen, J. Gratch, P. Rosenbloom, and M. Tambe. Intelligent agents for the synthetic battlefield: A company of rotary wing aircraft. In Proc. IAAI-97, pp. 1006-1012, 1997. AAAI Press.
[8] N. Jennings. Controlling cooperative problem solving in industrial multi-agent systems using joint intentions. Artificial Intelligence, 75, 1995.
[9] W.L. Johnson, J. Rickel, R. Stiles, and A. Munro. Integrating pedagogical agents into virtual environments. Presence: Teleoperators and Virtual Environments, 7(6):523-546, December 1998.
[10] R.M. Jones, J.E. Laird, and P.E. Nielsen. Automated intelligent pilots for combat flight simulation. In Proc. IAAI-98, pp. 1047-1054, 1998. AAAI Press.
[11] J.E. Laird, A. Newell, and P.S. Rosenbloom. Soar: An architecture for general intelligence. Artificial Intelligence, 33(1):1-64, 1987.
[12] S.C. Marsella and W.L. Johnson. An instructor's assistant for team-training in dynamic multi-agent virtual worlds. In Proc. ITS '98, pp. 464-473. Springer, 1998.
[13] A. Munro and D.S. Surmon. Primitive simulation-centered tutor services. In Proc. AI-ED Workshop on Architectures for Intelligent Simulation-Based Learning Environments, Kobe, Japan, 1997.
[14] J. Rickel and W.L. Johnson. Intelligent tutoring in virtual reality: A preliminary report. In Proc. Eighth World Conference on Artificial Intelligence in Education, pp. 294-301. IOS Press, 1997.
[15] J. Rickel and W.L. Johnson. Animated agents for procedural training in virtual reality: Perception, cognition, and motor control. Applied Artificial Intelligence, 1999. Forthcoming.
[16] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 1995.
[17] R. Stiles, L. McCarthy, and M. Pontecorvo. Training studio: A virtual environment for training. In Workshop on Simulation and Interaction in Virtual Environments (SIVE-95), 1995. ACM Press.
[18] M. Tambe. Towards flexible teamwork. Journal of Artificial Intelligence Research, 7:83-124, 1997.
[19] W. Zachary, J. Cannon-Bowers, J. Burns, P. Bilazarian, and D. Krecker. An advanced embedded training system (AETS) for tactical team training. In Proc. ITS '98, pp. 544-553. Springer, 1998.
Artificial Intelligence in Education, S.P. Lajoie and M. Vivet (Eds.), IOS Press, 1999
Detecting and Correcting Misconceptions with Lifelike Avatars in 3D Learning Environments

Joel P. Gregoire, Luke S. Zettlemoyer and James C. Lester
Multimedia Laboratory, Department of Computer Science
North Carolina State University, Raleigh, NC 27695-8206
{jgregoi,lszettle,lester}@eos.ncsu.edu
Abstract: Lifelike pedagogical agents offer significant promise for addressing a central problem posed by learning environments: detecting and correcting students' misconceptions. By designing engaging lifelike avatars and introducing them into task-oriented 3D learning environments, we can enable them to serve a dual communication role. They can serve as students' representatives in learning environments and simultaneously provide realtime advice to address their misconceptions. We describe a three-phase avatar-based misconception correction framework in which (1) as students navigate their avatars through 3D learning environments, a misconception detector tracks their problem-solving activities by inspecting task networks, (2) when they take sub-optimal or incorrect actions, a misconception classifier examines misconception trees to identify the most salient misconceptions, and (3) the avatars employ a misconception corrector to intervene with tailored advice. The framework has been implemented in a misconception system for the lifelike avatar of CPU CITY, a 3D learning environment testbed for the domain of fundamentals of computer architecture. In CPU CITY, students direct their lifelike avatar to navigate a 3D world consisting of a virtual computer, to transport data and address packets, and to manipulate computational devices. Informal studies with students interacting with CPU CITY suggest that the framework can be an effective tool for addressing students' misconceptions.

1. Introduction

Research on lifelike pedagogical agents has been the subject of increasing attention in the AI & Education community [1,12,14,19,21]. Because of the immediate and deep affinity that children seem to develop for interactive lifelike characters, their potential benefits to learning effectiveness are substantial. By creating the illusion of life, animated agents offer much promise for increasing both the quality of learning experiences and the time that children seek to spend with learning environments. Moreover, recent advances in affordable graphics hardware are beginning to make the widespread distribution of realtime 3D animation software a reality.

Lifelike pedagogical agents have significant potential for addressing a central problem posed by learning environments: detecting and correcting misconceptions. From the classic work on student modeling [6,7], plan recognition [9], and plan attribution [13], researchers have endeavored to unobtrusively track learners' problem-solving activities and dialogues and correct their misconceptions [17]. Lifelike pedagogical agents can enable us to recast these problems into a solvable form. By designing task-oriented 3D learning environments, representing detailed knowledge of their 3D models and layout, and introducing lifelike avatars into these worlds, we can craft engaging educational software that effectively addresses learners' misconceptions.
For the past several years our laboratory has been investigating these issues by iteratively designing, implementing, and evaluating avatar-based 3D learning environments. Previously, we have reported results on task-sensitive cinematography planning for 3D learning environments [2], the habitable 3D learning environments framework [3], and performing student-designed tasks [15]. In this paper, we describe a three-phase avatar-based misconception correction framework:

1. Misconception Detection: As the learner solves a problem in a 3D learning environment by directing her avatar to navigate through the world and to manipulate objects within it, a misconception detector tracks her problem-solving actions by inspecting a task network.
2. Misconception Classification: When she takes sub-optimal actions, a misconception classifier examines a misconception tree to identify the most salient misconception.
3. Misconception Correction: Finally, a misconception corrector directs the avatar to address conceptual problems by examining a curriculum information network [24], intervening with verbal advice, and providing tailored responses to follow-up questions she poses.

By enabling the avatar to serve in the dual capacity of student representative and advice-giving agent, this framework tightly couples problem solving to misconception correction. In short, avatar-based misconception detection and correction provide a critical link between task-oriented conceptual development and addressing learners' misconceptions directly in problem-solving contexts. This framework has been implemented in a lifelike avatar, WHIZLOW, who inhabits the CPU CITY 3D learning environment testbed (Figure 1) for the domain of computer architecture for novices. Given "programming assignments," learners direct WHIZLOW to pick up, transport, and insert data packets into registers to execute their programs. A formative study with students interacting with WHIZLOW suggests that the framework can be an effective tool for addressing misconceptions.

2. Lifelike Avatars in 3D Environments

A particularly promising line of work underway is that of lifelike animated intelligent agents. Because of these agents' compelling visual presence and their high degree of interactivity, there has been a surge of interest in believable intelligent characters [1,4,5,8]. As a result of these developments, the AI & Education community is now presented with opportunities for exploring new technologies for lifelike pedagogical agents. Work in this direction is still in its infancy, but progress is being made on two fronts. First, research has begun on a variety of pedagogical agents that can facilitate the construction of component-based tutoring system architectures and communication between their modules [22,23], provide multiple context-sensitive pedagogical strategies [18], reason about multiple agents in learning environments [11], provide assistance to trainers in virtual worlds [16], and act as co-learners [10]. Second, projects have begun to investigate techniques by which animated pedagogical agents can behave in a lifelike manner to communicate effectively with learners both visually and verbally [1,14,19,21]. It is this second category, lifelike animated pedagogical agents, that is the focus of the work described here. In particular, we investigate the marriage of avatar-style problem-solving functionalities with misconception correction functionalities.
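The three phases enumerated in the introduction can be read as a single handler pipeline; the class and method names below are our own schematic of the architecture in Figure 2, not the CPU CITY source:

```python
# Schematic of the three-phase framework as one handler loop: detect a
# problem against the task network, classify it against the misconception
# tree, then have the avatar intervene. All names are illustrative.
class AvatarMisconceptionHandler:
    def __init__(self, detector, classifier, corrector):
        self.detector = detector      # tracks the hierarchical task network
        self.classifier = classifier  # searches the misconception tree
        self.corrector = corrector    # plans the avatar's verbal advice

    def on_student_action(self, action, world_state):
        problem = self.detector.check(action, world_state)
        if problem is None:
            return                    # action was correct or non-critical
        misconception = self.classifier.most_salient(problem)
        self.corrector.intervene(misconception)
```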
Over the course of the past few decades, educators and learning theorists have shifted from an emphasis on rote learning to constructivist learning. The latter emphasizes learning as knowledge construction, which is reflected in the design requirements we impose on avatar-based misconception detection and correction:

• Situated problem-solving: Because of constructivism's focus on the learner's active role in acquiring new concepts and procedures [20], problem solving should play a central role in learning. For example, to learn procedural tasks, a 3D learning environment could enable students to perform the task directly in the environment. Hence, rather than memorizing an abstract procedure, students should be able to actively solve problems, perhaps by immersively interacting with rich 3D models representing the subject matter.

• Non-invasive task monitoring: Although continuously querying the student about her moment-to-moment activities would provide up-to-date information about her knowledge and intentions (and would significantly simplify the problems posed by student modeling [6,7], plan recognition [9], and plan attribution [13]), such an invasive technique would distract the learner from her activities and likely result in substantially reduced learning effectiveness.
Figure 1
• Embodied advice: The traditional approach to introducing explanation facilities into learning environments has been with text-based dialogues. However, given the motivational benefits of lifelike agents [1,12,14], "embodying" explanations in onscreen personae would enable learning environments to provide avatars that were (1) visually present in the world, e.g., immersed in a 3D learning environment, (2) able to exhibit navigational and manipulative behaviors in the world to clearly correct misconceptions, and (3) adept at coordinating their physical behaviors with a running verbal explanation that is tightly coupled to the task the learner was performing.
3. Detecting and Correcting Misconceptions in 3D Environments
The learner's avatar serves two key roles in addressing misconceptions. First, the learner solves problems by manipulating a joystick to direct her avatar's behaviors. In response to these directives, the avatar performs actions such as picking up objects, carrying them, manipulating devices, and traveling from one location to another. Second, her avatar serves in an advisory capacity by providing explanations. When the student takes actions that indicate she harbors misconceptions about the domain, her avatar intervenes and corrects her misconceptions by providing appropriate advice. Hence, both problem solving and advice play out immersively in the 3D world. All of the avatar's misconception detection and correction functionalities are provided by the architecture shown in Figure 2. To begin, the avatar verbally presents a problem for the student to solve. As the student begins to perform the task to solve the problem, her position and the changes she enacts to the objects she manipulates are reflected in continuous updates to the 3D world model, which represents the coordinates and state of all objects in the learning environment. All of her problem-solving activities are tracked by an avatar misconception handler, which observes and addresses the misconceptions she exhibits while interacting with the world.
3.1 Misconception Detection
Problem solving begins when the avatar poses a problem for the learner to solve in the 3D world. To take advantage of the immersive properties of 3D worlds, the avatar-based framework described here focuses on procedural tasks in which learners acquire concepts relating to sequences of steps in a coached-practice setting. For example, in the CPU CITY learning environment testbed, students are posed problems about how to perform the fetch-execute cycle in the virtual computer and the avatar coaches their problem-solving activities. Given a particular "programming assignment," their job is to
Figure 2. The avatar-based misconception detection and correction architecture
manipulate the avatar through the world to transport data packets and manipulate hardware components such as the ALU, decoder, and registers. After the avatar describes the problem, the misconception detector employs a goal-decomposition planner to create a hierarchical task network representing the potentially correct problem-solving actions to be taken by the learner. Task networks are graph structures whose nodes represent actions at varying levels of detail, organized in a hierarchy of time-ordered sequences. At the bottom of the hierarchy are primitive tasks whose actions require no further task decomposition. Each action specification encodes information about the type of action, the actors involved, the objects affected, and potential subsequent actions. As the student performs actions through the avatar in the environment, the misconception detector tracks her behaviors by traversing the leaves of the task network it generated. By checking the action types of active action specification nodes against the student-driven actions performed by the avatar in the world, the misconception detector classifies each action she performs as either critical or non-critical. Actions are considered critical if they enact significant changes to the problem state. For example, in CPU CITY, critical actions include attempts to pick up and deposit objects, e.g., data packets, to pass through a portal from one architectural component to another, and to manipulate a device, e.g., pulling a lever of the ALU. Periods of latency that exceed a task-specific time limit are also considered critical actions. These are used to detect misconceptions in which the student does not know what course of action to follow next and "freezes." Example non-critical actions in CPU CITY include turning the avatar slightly to the left or walking forward (but not through a portal). Efficiently managing the classification of actions is essential to the intention-monitoring enterprise. First, to avoid the problem of enumerating the full set of non-critical actions, the misconception detector polls the world model for critical actions and considers all other actions to be non-critical. Second, in learning environments that provide instruction about tasks of any complexity, multiple solution paths are possible. The misconception detector must therefore exploit a representation that is sufficiently flexible to accommodate more than one solution. Hence, the misconception detector generates task networks that are lattices of branching, partially ordered task
specifications. When the student embarks on a solution that consists of a particular order of atomic actions, the misconception detector tracks her activities by traversing the task network via the particular solution path she adopts. It produces one of three outcomes: (1) if the current action is deemed non-critical, intention monitoring continues; (2) if the current action is deemed critical and correct, the misconception detector advances the active task specification node to the appropriate successors; (3) if the current action is critical but incorrect, the misconception detector invokes the misconception classifier to classify a potential misconception.
3.2 Classifying Misconceptions
When the student performs a critical action that is sub-optimal, the misconception classifier determines the type of misconception the student may have about the subject matter. Rather than invasively probing the student with questions, the misconception classifier exploits the knowledge about the student's problem-solving activities to make its diagnosis. To do so, it first inspects (1) the active task specification nodes in the task network (multiple nodes may be active because of multiple, alternate paths through the task network to achieve a particular goal) and (2) the specific action performed by the student. Next, it searches through a misconception tree to determine the most specific misconception that is relevant to the student's sub-optimal actions. The misconception tree represents the most common misconceptions about procedural knowledge and the most common misconceptions about conceptual knowledge that may induce procedural missteps. It encodes a decision tree, where the children of each node represent specialized categories of misconceptions. At each level, the most salient problem-solving actions performed by the student and key environmental features of the 3D world are used to distinguish different categories of misconceptions. Beginning at the root, the classifier traverses the tree as deeply as it can to determine the most specific misconception class applicable to the current situation. To illustrate, suppose a student interacting with the avatar of CPU CITY has been given a "programming exercise" in which she must retrieve the next microcode instruction. To do so, she must pull the Load lever in the RAM. She directs the avatar to pull the lever but has neglected to put the address in the RAM input register. Because pulling a lever is a critical action but does not successfully advance the current goal to a legitimate subsequent goal in the task network, the misconception detector signals a misconception and invokes the misconception classifier. The classifier searches through a misconception tree (Figure 3). The first decision in the tree is to determine whether the physical location of the avatar is correct with respect to the current goal. Since she is currently in the RAM, a misconception about location (incorrect environment) is ruled out and the classifier turns to potentially inappropriate actions taken in the correct location (rationale absent). Because her actions have been tracked by the misconception detector, it is known that the offending behavior involved manipulating devices in the RAM, in this case, the Load lever (incorrect lever). Finally, by inspecting the task tree, the classifier determines that a lever pull action is currently inappropriate but will be appropriate soon; the student has skipped over an intermediate prerequisite step, a common problem in learning procedural tasks (step skip). Because a leaf in the tree has now been reached, the most specific category has been identified.
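The traversal just described amounts to a walk down a decision tree. The sketch below, which uses hypothetical names and predicates rather than the system's actual data structures, descends as deeply as the situation allows and returns the deepest, i.e. most specific, applicable category.

    import java.util.List;
    import java.util.function.Predicate;

    // Sketch of misconception-tree traversal (hypothetical types, not the system's own).
    class MisconceptionNode {
        final String category;               // e.g. "incorrect-environment", "step-skip"
        final Predicate<Situation> applies;  // does this specialization fit the situation?
        final List<MisconceptionNode> children;
        MisconceptionNode(String category, Predicate<Situation> applies, List<MisconceptionNode> children) {
            this.category = category; this.applies = applies; this.children = children;
        }

        // Descend as deeply as possible; the deepest applicable node is the most specific category.
        String classify(Situation s) {
            for (MisconceptionNode child : children) {
                if (child.applies.test(s)) return child.classify(s);
            }
            return category;
        }
    }

    // Snapshot of the learner's situation consulted by the node predicates.
    class Situation {
        String avatarLocation;      // e.g. "RAM"
        String goalLocation;        // location required by the current goal
        String offendingAction;     // e.g. "pull-load-lever"
        boolean actionIsLaterStep;  // true when the action matches a later step in the task network
    }

In the RAM example above, the root's location test rules out incorrect environment, the device test selects incorrect lever, and the timing test (actionIsLaterStep) lands on step skip.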
3.3 Correcting Misconceptions
After the student's most likely misconception has been identified, the misconception classifier invokes the misconception corrector. Given the specific category of misconception and its contextually instantiated arguments, the corrector indexes into a curriculum information network (CIN) [24] that encodes misconception correction topics and the prerequisite relations that hold between them. For example, given the step skip category identified above and the specific arguments (RAM-load-attempt), the corrector for the CPU CITY avatar indexes into the CIN and directs the avatar to provide verbal advice on a particular topic. A template associated with the selected topic is instantiated with lexicalizations of arguments from the current situation and the avatar is directed to provide verbal advice.
Figure 3. Example traversal of the misconception tree for CPU CITY
In this case, the avatar informs the student, "You're skipping a step. You forgot to X," where X is instantiated here as "put the address in the RAM input register." The strings are annotated with prosodic markups, passed to the speech synthesizer, and then spoken by the avatar. Misconceptions are further corrected with two student-initiated question-asking techniques. If the student asks for further assistance by pressing a "help" button (Figure 1), the corrector executes the following three-step algorithm. (1) The corrector examines the student's recent actions, the active task specification nodes in the task network, and the world model to index into the CIN. (2) It inspects an overlay student model [7] associated with the CIN to assess the student's prior exposure to the concept(s) discussed in the selected CIN node. (3) If the prior exposure is limited, it directs the avatar to provide a general (abstract) explanation of the relevant concepts; if there has been some degree of prior exposure, the avatar is instructed to provide more specific assistance; if the student has been exposed to the current material multiple times and is still experiencing difficulty, the avatar offers to perform the task for the student and explain it using pedagogical agent demonstration techniques [1,15,21]. Students may also request additional assistance by asking specific questions via a pop-up question-asking interface. If they request information about a particular topic, the corrector performs a topological sort of overlay CIN nodes to determine the prerequisites of the selected concept. It then directs the avatar to provide the necessary background information and addresses the question.
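A minimal sketch of the escalation logic in step (3), under the assumption that the overlay student model simply counts a student's prior exposures to each CIN topic (the thresholds below are invented for illustration):

    import java.util.HashMap;
    import java.util.Map;

    class HelpCorrector {
        // Overlay student model: exposure counts keyed by CIN topic (an assumed representation).
        private final Map<String, Integer> exposures = new HashMap<>();

        // Choose an advice level for the CIN topic indexed from the current context.
        String adviceLevel(String cinTopic) {
            int seen = exposures.merge(cinTopic, 1, Integer::sum);
            if (seen <= 1) return "general";   // limited prior exposure: abstract explanation
            if (seen <= 3) return "specific";  // some prior exposure: more specific assistance
            return "demonstrate";              // persistent difficulty: avatar performs and explains the task
        }
    }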
4. A Lifelike Avatar for the CPU CITY Learning Environment
The misconception framework has been implemented in WHIZLOW, a lifelike avatar who inhabits the CPU CITY 3D learning environment testbed for the domain of computer architecture for novices.2 CPU CITY's 3D world represents a motherboard housing nearly 100 3D models that represent the principal architectural components, including the RAM, CPU, and hard drive. Students are given "programming" tasks in which they direct the avatar through the virtual computer. The avatar's operators used to generate task networks to track students' problem-solving activities range from operators for picking up and depositing data, instruction, and address packets to operators for interacting with devices that cause arithmetic and comparison operations. Its misconception classifier handles a broad range of misconceptions, including incorrect locations for operation attempts, procedural sub-task repetitions and step skips, inappropriate device manipulations where preconditions have not been satisfied, and confusions between addresses and data and between input and output. The avatar's misconception corrector addresses these misconceptions by employing a CIN with more than 60 nodes. Altogether, the misconception handler, avatar behavior generator, and the CPU CITY learning environment consist of approximately 60,000 lines of C++ and employ the OpenGL graphics library for real-time 3D rendering. The avatar has been subjected to a number of formative studies, with more than 40 subjects interacting with WHIZLOW in CPU CITY.
2 The current implementation runs on Pentium II 300 MHz machines with 64 MB of memory and an 8 MB SGRAM Permedia2 OpenGL accelerator, at frame rates of approximately 10-15 frames per second. WHIZLOW's speech is synthesized with the Microsoft Speech SDK 3.0. Generating speech for a typical sentence requires approximately 1/8 of a second, which includes the time to process prosodic directives.
Most recently, the misconception framework has been investigated in a focus group study with 7 students, each of whom interacted with WHIZLOW for an hour to work on 5-7 programming exercises. To help the students feel comfortable, the experimenter encouraged them to pose questions to the avatar frequently. Subjects' experience interacting with WHIZLOW suggests that the task network representation is sufficiently expressive to enable the avatar to comment effectively on their activities in the world. In general, subjects were very pleased with his responses to their questions. Even though the avatar exhibited awkward movements at times, they found him extremely friendly and likeable. Because task network nodes encode preconditions on actions and their relationships with devices, they enabled the pedagogical planner to note students' problem-solving difficulties. The granularity of the task networks appeared to be at approximately the appropriate level. If it were any finer, the agent would have made comments about low-level details such as subjects' micro-manipulation of the joystick, an activity with which they experienced no problems. In contrast, if the granularity were any coarser, the misconception detector would be unable to know where the student should be directing her avatar, what devices she should be manipulating, or how she should be performing the task. Perhaps most critically, students interacting with avatars driven by the misconception framework learn why actions should be performed. First, they learn how their actions relate to constraints on the operation of the devices they operate. For example, subjects interacting with CPU CITY learned that RAM must be accessed with a specific address by being required to obtain a value from memory, but being unable to do so without specifying a particular address for it. Second, students learn the consequences of their actions through explanations provided by the agent. For example, when one subject was attempting to obtain a data packet from the RAM and a page fault occurred, WHIZLOW explained to him that he needed to go instead to the hard drive, because the data the student sought had in fact been stored there rather than in the RAM.
5. Conclusions and Future Work
The avatar-based misconception framework offers much promise for coupling 3D learning environments with procedurally oriented tasks. By addressing misconceptions in the context of problem solving, corrections offered by the avatar at these junctures, particularly by an engaging lifelike avatar, can be readily assimilated. By detecting students' misconceptions by tracking task networks, classifying misconceptions by traversing misconception trees based on features of the task and physical characteristics of the 3D learning environment, and correcting misconceptions by directing the avatar to deliver topical advice based on prerequisites in a CIN, avatars can serve as effective tools for addressing misconceptions. Three key directions for future work are particularly intriguing. First, the misconception framework currently makes a single-fault assumption. If a student in fact has multiple misconceptions about the domain (frequently the case, particularly for novices), the framework can detect and correct only one misconception at a time. Addressing multiple misconceptions is critical for deploying these technologies. Second, the knowledge engineering involved in designing task tree operators, misconception trees, and CINs is substantial. Accelerating the creation of these knowledge structures and ensuring they remain mutually consistent presents non-trivial challenges.
Third, conducting large-scale empirical studies in even more complex 3D worlds will shed considerable light on the most effective misconception detection and correction techniques. We will be exploring these issues in our future research.
Acknowledgements
The authors gratefully acknowledge William Bares for his work on the cinematography planner, Dennis Rodriguez for his work on the CPU CITY interface, and the members of the 3D modeling and animation team (Tim Buie, Mike Cuales, and Rob Gray), which was led by Patrick FitzGerald. Thanks also to Francis Fong for his work on the animation tools and to Charles Callaway, Brent Daniel, and Gary Stelling for comments on an earlier draft of this paper. Support for this project is provided by the National Science Foundation under grants CDA-9720395 (Learning and Intelligent Systems Initiative) and IRI-9701503 (CAREER Award Program), the North Carolina State University IntelliMedia Initiative, and Novell, Inc.
References
[1] André, E., Rist, T., & Müller, J. (1998). Integrating reactive and scripted behaviors in a life-like presentation agent. In Proceedings of the Second International Conference on Autonomous Agents (pp. 261-268). Minneapolis: ACM Press.
[2] Bares, W., Zettlemoyer, L., Rodriguez, D., & Lester, J. (1998). Task-sensitive cinematography interfaces for interactive 3D learning environments. In Proceedings of the Fourth International Conference on Intelligent User Interfaces (pp. 81-88). San Francisco: IUI '98.
[3] Bares, W., Zettlemoyer, L., & Lester, J. (1998). Habitable 3D learning environments for situated learning. In Proceedings of the Fourth Int'l Conference on Intelligent Tutoring Systems (pp. 76-85). Berlin: Springer-Verlag.
[4] Bates, J. (1994). The role of emotion in believable agents. Communications of the ACM, 37, 122-125.
[5] Blumberg, B., & Galyean, T. (1995). Multi-level direction of autonomous creatures for real-time virtual environments. In Proceedings of SIGGRAPH '95 (pp. 47-54). New York: ACM Press.
[6] Brown, J., & Burton, R. (1978). Diagnostic models for procedural bugs in basic mathematical skills. Cognitive Science, 2, 155-191.
[7] Carr, B., & Goldstein, I. (1977). Overlays: A theory of modeling for computer-aided instruction. Massachusetts Institute of Technology, Artificial Intelligence Laboratory, AI Memo 406.
[8] Cassell, J. (in press). Embodied conversation: Integrating face and gesture into automatic spoken dialogue systems. In S. Luperfoy (Ed.), Automatic Spoken Dialogue Systems. Cambridge, MA: MIT Press.
[9] Chu-Carroll, J., & Carberry, S. (1994). A plan-based model for response generation in collaborative task-oriented dialogues. In Proceedings of the Twelfth National Conference on Artificial Intelligence (pp. 799-805). Seattle, Washington: American Association for Artificial Intelligence.
[10] Dillenbourg, P., Jermann, P., Schneider, D., Traum, D., & Buiu, C. (1997). The design of MOO agents: Implications from an empirical CSCW study. In Proceedings of the Eighth World Conference on Artificial Intelligence in Education (pp. 15-22). Kobe, Japan: IOS Press.
[11] Eliot, C., & Woolf, B. (1996). A simulation-based tutor that reasons about multiple agents. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (pp. 409-415). Portland, Oregon: American Association for Artificial Intelligence.
[12] Hietala, P., & Niemirepo, T. (1998). The competence of learning companion agents. International Journal of Artificial Intelligence in Education, 9(3-4), 178-192.
[13] Hill, R., & Johnson, W.L. (1995). Situated plan attribution. Journal of AI in Education, 6, 35-66.
[14] Lester, J.C., Converse, S.A., Kahler, S.E., Barlow, T., Stone, B.A., & Bhogal, R. (1997). The persona effect: Affective impact of animated pedagogical agents. In Proceedings of CHI '97 (Human Factors in Computing Systems) (pp. 359-366). Atlanta: ACM Press.
[15] Lester, J., Zettlemoyer, L., Gregoire, J., & Bares, W. (in press). Explanatory lifelike avatars: Performing user-designed tasks in 3D learning environments. To appear in Proceedings of the Third International Conference on Autonomous Agents, Seattle, Washington.
[16] Marsella, S., & Johnson, W.L. (1998). An instructor's assistant for team-training in dynamic multi-agent virtual worlds. In Proceedings of the Fourth Int'l Conference on Intelligent Tutoring Systems (pp. 464-473). Berlin: Springer-Verlag.
[17] McCoy, K. (1989-90). Generating context-sensitive responses to object-related misconceptions. Artificial Intelligence, 41, 157-195.
[18] Mengelle, T., De Lean, C., & Frasson, C. (1998). Teaching and learning with intelligent agents: Actors. In Proceedings of the Fourth Int'l Conference on Intelligent Tutoring Systems (pp. 284-293). Berlin: Springer-Verlag.
[19] Paiva, A., & Machado, I. (1998). Vincent, an autonomous pedagogical agent for on-the-job training. In Proceedings of the Fourth Int'l Conference on Intelligent Tutoring Systems (pp. 584-593). Berlin: Springer-Verlag.
[20] Piaget, J. (1954). The construction of reality in the child. New York: Basic Books.
[21] Rickel, J., & Johnson, W.L. (1997). Intelligent tutoring in virtual reality: A preliminary report. In Proceedings of the Eighth World Conference on AI in Education (pp. 294-301). Kobe, Japan: IOS Press.
[22] Ritter, S. (1997). Communication, cooperation, and competition among multiple tutor agents. In Proceedings of the Eighth World Conference on AI in Education (pp. 31-38). Kobe, Japan: IOS Press.
[23] Wang, W., & Chan, T. (1997). Experience of designing an agent-oriented programming language for developing social learning systems. In Proceedings of the Eighth World Conference on AI in Education (pp. 7-14). Kobe, Japan: IOS Press.
[24] Wescourt, K., Beard, M., & Barr, A. (1981). Curriculum information networks for CAI: Research on testing and evaluation by simulation. In P. Suppes (Ed.), University-level Computer-Assisted Instruction at Stanford: 1968-1980 (pp. 817-839). Stanford, CA: Stanford University Press.
Posters
Learning Effectiveness Assessment: A principle-based framework
Leila Alem, CSIRO Mathematical and Information Sciences, leila.alem@cmis.csiro.au
Clark N. Quinn, Knowledge Universe Interactive Studio, cnquinn@knowledgeu.com
John Eklund, Access Australia Co-operative Multimedia Centre, j.eklund@accesscmc.com
Abstract: With the rapid growth in the use of technology to deliver and enhance teaching and learning, it is becoming increasingly important to develop a reliable and efficient means of assessing the learning effectiveness of computer-based instructional systems. A sound understanding of the theory and background of learning effectiveness evaluation is necessary in order to better plan, design and conduct an evaluation program. While a purely theoretical rationale would support one approach, the pragmatics of the real world indicate the need for an alternative. This paper covers the key issues related to the evaluation of the learning effectiveness of computer-based instructional systems. The resulting framework has been designed as the stepping stone for the development and operationalisation of a number of evaluation instruments that form the basis for conducting the Learning Effectiveness Assessment (LEA) service. This paper focuses on the design process of the learning effectiveness framework, which comprised three processes: developing a set of key learning effectiveness principles according to a set of instructional knowledge types, defining key learning effectiveness criteria and their classification, and implementing the criteria in the evaluation instruments.
1. Introduction
With the rapid growth in the use of technology to deliver and enhance teaching and learning, it is becoming increasingly important to develop reliable and efficient means for
assessing the learning effectiveness of computer-based instructional systems. This paper outlines a principle-based learning effectiveness evaluation framework, focusing on the design process of the framework. We followed three processes to develop the Learning Effectiveness Assessment (LEA) framework: developing a set of key learning effectiveness principles according to a range of instructional knowledge types, defining key learning effectiveness criteria and their classification, and implementing the criteria in the evaluation instruments. We followed a pragmatic design process consisting of a number of steps, including a literature survey, defining the objectives of the evaluation, trialling existing methods, developing a set of learning effectiveness principles that span a variety of learning and instruction theories, instrument development and refinement, and operationalising the method.
2. Defining Learning Effectiveness
In defining learning effectiveness there are a number of critical issues to consider. Do learning conditions meet pedagogical objectives? Do instructional events match student competencies? Do instructional events incorporate the particular conditions of learning? Are instructional media and materials coherent with learning activities? Here the instructional events are examined in terms of their relation to the pragmatic process of learning: gaining attention, informing about objectives, guiding learning (stimulating recall of prerequisite learning, presenting the stimulus material, semantic encoding), providing feedback (reinforcement), enhancing retention and transfer, and assessing performance. We consider that the basis for evaluating the effectiveness of a learning environment includes how well a number of key instructional features are supported by such an environment (design factors), and how much the learning environment has been accepted by the learner (acceptance factors). The design factors include: instructional goals, instructional content, learning tasks, learning aids (support and feedback), and assessment (evaluation of learner success). The acceptance factors include: level of motivation to use the product, level of active participation involved, quality of learning support, and level of user satisfaction. We do not evaluate the attainment of learning objectives or the quality of the domain content, as such evaluation requires content expertise.
3. Developing the method
Evaluation methods can be classified into user/learner review methods and expert review methods. User/learner review methods are methods in which the evaluation data is collected from the users/learners directly (questionnaire, confidence log, interview, focus group) or by observing the user using the system (observation, code sheet, log, test data analysis). Expert review methods involve experts going through the system to be evaluated (checklist, questionnaire). As noted above, we followed a pragmatic design process for the learning effectiveness evaluation method. It consisted of a number of steps:
• Literature survey: The initial step was a literature survey to determine potential evaluation method candidates, and the extent to which these methods had been evaluated. We also made a preliminary assessment of the likely costs of the use of these methods based upon personnel, resources, and time.
• Defining the objectives of the evaluation: We organised a focus group with a select number of existing customers with learning products in order to define with them the objectives of the evaluation, and to establish and maintain a customer focus during the method design.
• Existing method trial: On the basis of this data, we selected candidate instruments and methods for evaluation through trials on real products. Our experiences led to some preliminary product evaluations and a determination to develop our own instruments that reflected our understanding of fundamental learning principles.
• Learning effectiveness evaluation framework: As an initial step, we developed a set of learning effectiveness principles that span a variety of learning and instruction theories. This set of principles has been refined throughout the testing process, and serves as the basis for our evaluation framework.
• Evaluation instruments specification: The elements of the method centre on an initial product briefing to be conducted with the client, an expert review method and instrument, a user/learner review method and instrument, and report generation guidelines and a template.
• Instrument development: The expert review instrument has been prototyped and trialled with real customer products, which has led to an iterative process of review and refinement. The user/learner review instrument followed a similar cycle, as has the project brief.
Using an analogy to Nielsen's [1] heuristic evaluation [2], we designed a converging information evaluation mechanism. The process considers all steps from defining the evaluation objectives to writing the evaluation report, via determining the criteria for evaluation and choosing the right evaluation instrument. Similar to Nielsen, we have an expert review guided by principles, in this case principles for learning [3]. We are also similar in our use of user participation. This analogy is deliberate in two ways. First, our interest is in creating a pragmatic model, which Nielsen's model accomplishes successfully. Second, the existing usability testing methods used at the Australian Multimedia Testing Centre (where the LEA service will be offered) are similarly modeled on Nielsen's heuristics. The resulting LEA service provides a well-articulated set of instruments and methods for formally assessing learning technology. The service is aligned with learning principles, strikes a balance between rigor and efficiency, and integrates into a successful testing enterprise.
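As an illustration of how the design and acceptance factors of Section 2 might be operationalised in an instrument, the following sketch records ratings per criterion and aggregates them per factor group. The rating scale and the criterion strings are assumptions of this sketch, not the LEA instruments themselves.

    import java.util.LinkedHashMap;
    import java.util.Map;

    class LeaInstrument {
        private final Map<String, Integer> ratings = new LinkedHashMap<>();

        void rate(String criterion, int score) { ratings.put(criterion, score); } // e.g. a 1-5 scale

        double average(String... criteria) {
            double sum = 0;
            for (String c : criteria) sum += ratings.getOrDefault(c, 0);
            return sum / criteria.length;
        }

        public static void main(String[] args) {
            LeaInstrument review = new LeaInstrument();
            review.rate("instructional goals", 4);  // design factor
            review.rate("learning tasks", 3);       // design factor
            review.rate("motivation to use", 5);    // acceptance factor
            review.rate("user satisfaction", 4);    // acceptance factor
            System.out.printf("design=%.1f acceptance=%.1f%n",
                review.average("instructional goals", "learning tasks"),
                review.average("motivation to use", "user satisfaction"));
        }
    }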
4. References
[1] Nielsen, J. (1994). Heuristic evaluation. In J. Nielsen & R.L. Mack (Eds.), Usability Inspection Methods. New York: John Wiley & Sons.
[2] Quinn, C. (1996). Pragmatic evaluation: Lessons from usability. In A. Christie, P. James & B. Vaughan (Eds.), Proceedings of ASCILITE 96 (pp. 437-444). Adelaide: Uni SA.
[3] Eklund, J., Quinn, C., & Alem, L. (1999, in press). CBT: Is it effective? Technology Business Review, April-May 1999.
Piagetian Psychology in Intelligent Tutoring Systems
Ivon Arroyo, Joseph E. Beck, Klaus Schultz, Beverly Park Woolf
{ivon,beck,schultz,bev}@cs.umass.edu
School of Education & Computer Science Department, University of Massachusetts, Amherst
We describe in this paper the use of Piaget's notion of cognitive development [4,5] in the building of pre-tests that can then be used to improve a tutor's reasoning ability. We are interested in developing low-cost computer-based instruments that detect individual differences which not only predict a student's overall performance, but can also be readily applied to actual tutoring decisions. Our hypothesis was that students with different levels of cognitive development should behave differently in the context of our math tutoring system; if so, this is sufficient reason for children to be taught with different strategies. We thought it very likely that our population of elementary school students had different cognitive levels, although the range of ages of the students was very small (10-11 years old). If this were the case, then cognitive development level would be an important aspect to consider when adapting the behavior of our tutoring system. We have adapted classic Piagetian tasks [8] used to measure cognitive development levels for use on computer. We found that this measure predicts student performance in the amount of time taken to solve problems and in the number of problems students need to go through to achieve mastery of a topic. We are interested in how this measure of cognitive development could be usefully applied to enhance the behavior of the tutor for students at different cognitive levels. Further research will address which strategies are appropriate for students at different cognitive levels. We have some ideas with respect to this issue: this paper presents evidence that students at a particular cognitive level showed improvements in performance with the aid of certain hints, while students at other cognitive levels showed no improvement in their performance after seeing the same hints.
1. Introduction
MFD (Mixed numbers, Fractions and Decimals) is an intelligent tutoring system (ITS) aimed at teaching fractions, decimals and whole numbers to elementary school students [2]. A version of MFD was evaluated in May 1998. It tutored operations with whole numbers and fractions. This version was tested with 60 sixth grade elementary school students (due to absentees among the students, we only have complete data for 46 of them) during three days, for a total of three hours using the system. Students were randomly divided into an experimental and a control group. The experimental group used a version with intelligent hint selection and problem selection. Intelligent problem selection consisted of giving the student a problem with an appropriate difficulty level, depending on the level of mastery of different skills. Intelligent hint selection consisted of determining the most appropriate amount of information to provide in a hint. The control group also used a version with intelligent problem selection, but received no feedback other than a prompt to try again after an incorrect response. We wanted to investigate the benefits of the help component for students at particular cognitive levels.
2. The experiments
We gave students a computer-based pre-test that measured their level of cognitive development. Ten computer-based Piagetian tasks measured different cognitive abilities. Some of them were easier to implement on computer than others.
We intended to determine with these tasks whether the students were at one of the last two stages of cognitive development proposed by Piaget (the concrete
operational stage and the formal operational stage). Seven tasks were given to the students to verify dominance of concrete operations and three tasks checked for formal operations. All these experiments were based on those that Jean Piaget proposed [3,4,5,8]. These tasks involved a high level of interactivity. The ones for concrete operations tested number conservation, seriation, substance conservation, area conservation, class inclusion, functionality and reversibility. Three more tasks were administered that should be correctly solved when a person reaches the formal operations stage; with these we measured establishment of hypotheses, control of variables in experimental design, drawing of conclusions, proportionality and combinatorial analysis.
3. Description of results
The number of Piagetian tasks that the student correctly accomplished was used as a measure of cognitive development, with zero as the minimum possible level and 10 as the highest possible level. The mean number of correct answers for the sixth grade pupils in the study was 5.7, with a standard deviation of 2.1. When a session in MFD starts, the student first goes through a section of problems about whole numbers (addition, subtraction, multiplication and division). There is a significant negative correlation (Pearson R=-0.391, two-tailed p=0.006) showing that children with lower cognitive levels spend more time solving whole number problems. This suggests that students with higher cognitive levels are faster solvers of whole number problems, for both the experimental group (students who received help) and the control group (students who did not receive help). We also investigated an alternative measure of performance: how many problems students at different cognitive levels needed to go through to reach mastery of whole numbers. Mastery of whole numbers is considered to be reached when the student solves a certain number of problems for each whole number operation (+, -, x, /) with little or no help at all. The result was a significant negative correlation between the number of problems seen and cognitive level (Pearson R=-0.39, two-tailed p=0.007). This implies that the higher the child's cognitive level, the fewer problems she needs to go through to master the topic. It also implies that students at higher levels of development are succeeding at more problems, or at problems that are more difficult. Students with low cognitive levels needed more problems on average to reach mastery of whole numbers. To verify this, we performed an independent samples t-test to compare the number of problems required by students above and below the median cognitive level. The means of these two groups were significantly different (two-tailed t-test, p=0.004).
Figure 1: Relationship between cognitive level and performance for the fraction problems (A: experimental group; B: control group).
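The shape of these analyses can be reproduced with standard statistics routines. The sketch below uses the Apache Commons Math library as one possible tool (an assumption; the paper does not say which statistics software was used), computing the correlation between cognitive level and problems-to-mastery and the median-split t-test.

    import org.apache.commons.math3.stat.correlation.PearsonsCorrelation;
    import org.apache.commons.math3.stat.inference.TTest;

    class CognitiveLevelAnalysis {
        // cognitiveLevel[i]: Piagetian tasks solved by student i (0-10);
        // problemsToMastery[i]: problems student i needed to master whole numbers.
        static void analyze(double[] cognitiveLevel, double[] problemsToMastery,
                            double[] belowMedianGroup, double[] aboveMedianGroup) {
            double r = new PearsonsCorrelation().correlation(cognitiveLevel, problemsToMastery);
            double p = new TTest().tTest(belowMedianGroup, aboveMedianGroup); // two-tailed p-value
            System.out.printf("Pearson R=%.3f, median-split t-test p=%.3f%n", r, p);
        }
    }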
The tutor moves students on to the fractions section only when they have shown mastery of whole numbers. We wanted to know whether the hints for the fractions section had been appropriate for students at any cognitive level, or at some level in particular. We verified this by comparing the performance of those students who did not receive any help other than a "try again" message against the performance of those students who were provided with the tutor's help. We measured performance in this case as the number of problems solved, weighted by the difficulty of those problems. We found a significant positive correlation between cognitive development level and performance for those students who had not received intelligent help (Pearson R=0.584, two-tailed p=0.007). This effect can be explained by the fact that when there is no intelligent help in the tutor, performance depends on the capabilities of the student. This high correlation is not seen for the experimental group, who received intelligent help. Figure 1 shows how performance in the experimental group is especially high for a group of students of average cognitive ability (cognitive levels 4 to 6, which could be considered a late concrete operational stage). It seems that the current hints and the process of intelligent hint selection that the tutor provides are best designed for a group of students with a middle level of cognitive development.
4. Conclusions and future work
We have constructed a test to measure elementary school students' level of cognitive development according to Piaget's theory of developmental stages. We have adapted classic tasks used to measure these levels for use on computer. The test requires approximately 10 to 15 minutes for students to complete. This measure predicts student performance at a variety of grain sizes: effectiveness of hints received, rate of failure, amount of time to solve problems and the number of problems students need to attempt to master a topic. The data we have obtained from 46 sixth grade students strongly suggest that cognitive level is a useful variable to add to an intelligent tutoring system when the population of students is around 10 years old. These results are similar to prior predictive work in the field [1,6]. However, our measure takes little time to administer, which is an advantage given the relatively brief period of time most tutors are used. We plan to pursue this research along several independent paths. First, we are interested in validating the instrument. Another path is augmenting the tutoring knowledge by including Piagetian information about each hint. The tutor can use this knowledge to avoid presenting hints that are beyond the student's understanding. Finally, we are determining how to add cognitive development to the tutor's teaching and update rules. This is difficult, as most teachers/tutors do not think about this information when instructing. Therefore, we are considering using machine learning techniques [7] to allow the tutor to determine for itself how best to use this information.
Acknowledgements: We acknowledge support for this work from the National Science Foundation, HRD-9714757. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the granting agency.
5. References
[1] Anderson, J. (1993). Rules of the Mind. Hillsdale, NJ: Lawrence Erlbaum Associates.
[2] Beck, J., Stern, M., & Woolf, B. (1997). Using the student model to control problem difficulty. In Proceedings of the Sixth International Conference on User Modeling.
[3] Ginsburg, H., & Opper, S. (1988). Piaget's Theory of Intellectual Development.
[4] Piaget, J. (1953). How children form mathematical concepts. Scientific American.
[5] Piaget, J. (1964). The Child's Conception of Number. Routledge & Kegan Paul.
[6] Shute, V. (1995). Smart evaluation: Cognitive diagnosis, mastery learning and remediation. In Proceedings of Artificial Intelligence in Education (pp. 123-130).
[7] Stern, M., Beck, J., & Woolf, B. (1999). Naive Bayes classifiers for user modeling. Submitted to the Seventh International Conference on User Modeling.
[8] Voyat, G.E. (1982). Piaget Systematized.
Software Agents for Analysis of Collaboration in a Virtual Classroom
Patricia Augustin Jaques - paugustin@inf.pucrs.br
Flavio Moreira de Oliveira - [email protected]
Pontifícia Universidade Católica do Rio Grande do Sul - Mestrado em Informática
Av. Ipiranga, 6681 - Prédio 16 - Porto Alegre - RS - Brazil - CEP 90619-900 - Fone/Fax: 55-51-320-3621
Trends in distance education show a growing emphasis on collaborative learning, stimulating students to exchange ideas and information. When students interact among themselves, they feel more motivated and engaged and get better results in their studies. A collaborative environment, however, demands a greater effort from the teacher, who has to supervise all the discussions among the learners so that they do not deviate from the intended topic of the lesson. Moreover, the information arising from the interactions among the students provides the teacher with insights useful for an individual evaluation of the students and of the course. This paper therefore presents a multi-agent architecture able to monitor the communication tools in a distance learning group. The system analyzes the discussions taking place in these tools (discussion list, chat and newsgroups), showing the teacher statistical information (percentage of participation and number of messages) and identifying possible associations in the interactions: topics and subtopics that interest the students, groups of learners that interact intensively, etc.
Currently we observe a great improvement in the quality of distance education, due to the use of Internet-related tools with great potential. Such new technologies offer two-way communication between students and teacher. An increasing emphasis on collaboration among students has been observed in distance learning, because educators believe that this exchange of communication among students improves interest in study, leading to better results. In distance education, collaboration can be supported by communication tools such as discussion lists, newsgroups and chat. The teacher should observe and analyze all the communication exchanged by the students, so that he/she can follow the evolution of each student and verify whether the group is focusing on the topic proposed for study. This information is very important to the teacher, because it allows him/her to evaluate the students and the course, verifying whether learning is occurring at the expected level. However, this emphasis on collaboration results in more interaction among the students and a higher number of exchanged messages, thus making it more difficult for the teacher to monitor the messages exchanged by the students. Although the information contained in the messages is very important to the teacher, the current tools do not aid the teacher in this
task. The distance education literature recognizes the existence of the problem, but does not present effective solutions [7]. Thus, we propose to insert into this learning environment a set of collaboration monitors able to observe all the interactions occurring in a distance learning environment, to extract information from these interactions, to perform analyses and to transmit the results to the teacher. Since both the problem and the environment have distributed and interaction-based characteristics, we chose a multi-agent architecture [6] as a natural approach for development. The proposed multi-agent system has four agents. Three agents, which we call tool agents, are responsible for collecting information from the messages of the Internet communication mechanisms: discussion list, newsgroup and chat. The fourth is the teacher agent which, when requested, shows the analyses performed by the other agents, as well as the global analysis that it carries out itself. Beyond these, there is a name server agent, necessary due to the framework adopted for the implementation of the system, the Java Agent Template [3] version 0.3. This agent has the name and address of all the agents belonging to the society, and its task is to supply the address of an unknown agent to the other agents of the society. The teacher agent is located in the machine chosen by the teacher. The system has, for each of the communication mechanisms of the Internet (discussion list, chat and newsgroup), an agent which periodically collects the information in that system. For example, the agent in charge of the discussion list is activated by the system from time to time and retrieves all the new messages that have arrived on the list. In the same way, there is an agent responsible for the newsgroup, as well as an agent responsible for the log archive of the chat. When the teacher agent sends an analysis request to the tool agents, they send in reply a message with the name and address of the file containing the analysis. While they are reading the new messages, the tool agents collect data that will be used for analysis. This information is stored in a log file with the following fields: ID (an identification for each e-mail, newsgroup and chat message), From, Reply, Subject, Sub-subject, Date, Hour, and Communication tool (chat, newsgroup or e-mail). The subjects and sub-subjects of the messages are identified through the subject field of the message (in the case of e-mail or newsgroup messages) or by keywords in the content of the messages. In order to verify the syntactic and morphological meanings of these words, we use the Lexis dictionary [5] and a thesaurus provided by the teacher. The use of these dictionaries allows the agents to identify different words that relate to the same subject, and specific topics on a subject (sub-subjects). Each agent proactively performs an analysis of the data contained in its database. Thus, when the teacher agent requests it, the agent sends the name and address of the file that contains the respective analyses. The teacher can also request the analysis of the interactions that occurred in an intermediate period of time. There are essentially three types of associations that can be identified by the agents in the interactions (one of them is sketched at the end of this section):
1. Student-student: Identifies students that interact intensively among themselves.
2. Student-subject: Information regarding the subjects most addressed by each student. This type of analysis allows the teacher to identify which topics interest each of the students.
3. Student-student-subject: In this type of association the agent identifies the subjects that interest a specific group of students.
In addition to the associations above, the agents provide general statistical analyses of the data extracted from the messages, including the number of exchanged messages and the percentage of participation, which are presented to the teacher in order to assist in the process of evaluating the students and the course. For the implementation of the multi-agent society we are using the Java Agent Template (JAT) framework, version 0.3 [3]. JAT supplies a set of classes, written in the Java language, that allow the construction of software agents which communicate in a community of agents distributed over the Internet. In JAT, all the messages of the agents use KQML as the communication protocol. All the functionalities of the agents are being implemented in Java [1]. In order to validate this system, we will use it in a distance learning course of the Campus Global1 at the Pontifícia Universidade Católica do Rio Grande do Sul - PUCRS [2].
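As a concrete illustration, a tool agent's log record and the student-subject association count might look as follows. The field types and method names are assumptions of this sketch, not the implemented classes.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // One entry of a tool agent's log file, with the fields listed above.
    class LoggedMessage {
        String id, from, reply, subject, subSubject, date, hour, tool;
    }

    class AssociationAnalyzer {
        // Student-subject association: how often each student addresses each subject.
        Map<String, Integer> studentSubjectCounts(List<LoggedMessage> log) {
            Map<String, Integer> counts = new HashMap<>();
            for (LoggedMessage m : log) {
                counts.merge(m.from + " -> " + m.subject, 1, Integer::sum);
            }
            return counts;
        }
    }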
References
[1] Cornell, G., & Horstmann, C. Core Java 1.1, Volume II: Advanced Features. Prentice Hall.
[2] Ferreira, S.N., & Campos, M.B. (1998). CBP 2001: Uma experiência prática de sala de aula virtual nos cursos de graduação da PUCRS. In IX CLAIO, Latino-Iberoamerican Congress on Operations Research, Buenos Aires, September 1998.
[3] Frost, R. (1998). Java Agent Template (JAT). URL: http://java.stanford.edu/java-agent/html
[4] Johnson, W., & Shaw, E. (1997). Using agents to overcome deficiencies in Web-based courseware. In Proceedings of the Workshop on Intelligent Educational Systems on the World Wide Web, 8th World Conference of the AIED Society, Kobe, Japan.
[5] Lima, V.L.S., Abrahão, P.R., & Paraboni, I. Approaching the dictionary in the implementation of a natural language processing system: Toward a distributed structure. In IV South American Workshop on String Processing, Valparaíso, Chile, November.
[6] Moulin, B., & Chaib-Draa, B. (1996). An overview of distributed artificial intelligence. In G. O'Hare & N. Jennings (Eds.), Foundations of Distributed Artificial Intelligence. John Wiley & Sons.
[7] Palme, J. (1997). Groupware tools to support distance education. URL: http://www.dsv.su.se/~jpalme/distance-education/mmm-tools.htm
[8] Sherry, L. (1997). Issues in Distance Learning. URL: gopher://oasis.Denver.Colorado.EDU/hO/UCD/depVedu/rT/sheny/Ut.html
[9] Sichman, J., Demazeau, Y., & Boissier, O. (1995). How can knowledge-based systems be called agents? In IX Simpósio Brasileiro de Inteligência Artificial, Canela, Rio Grande do Sul, Brazil.
1 http://www.cglobal.pucrs.br/
Intelligent Navigation Support for Lecturing in an Electronic Classroom
Nelson A. Baloian1, Jose Pino1, Ulrich Hoppe2
1 Universidad de Chile, Dept. of Computer Science, Santiago, Chile, {nbaloian,jpino}@dcc.uchile.cl
2 University of Duisburg, Dept. of Mathematics/Computer Science, Duisburg, Germany, hoppe@informatik.uni-duisburg.de
The implementation of electronic classrooms did not come together with the development of software systems to support teaching/learning in this situation. In fact, there are still very few applications dealing with the problem of curriculum or lesson planning for the collaborative teaching/learning case, and most of them implement a rather linear structure for the curriculum, allowing very few variants in the order in which the learning material is used during the lesson. There is a large number of existing software systems which, despite not being designed for the electronic classroom environment, can be and are being used to support teaching/learning in the classroom. This fact also encourages the development of a method for supporting the planning and authoring of a lesson that takes place in the electronic classroom incorporating these individual systems. This planning can help the teacher to present ideas during the development of the lesson in a coherent way. It can also help the teacher to recall the computer learning material at the right moment. Finally, it can support the students in reusing the learning material after the lesson. This paper presents a model for structuring computer-based learning material in an appropriate way to support lecturing inside and outside an electronic classroom environment. It is a non-trivial problem to design a curriculum structure which fully meets the requirements of being as general as possible while providing a sufficient amount of information to give real support to its users. Furthermore, the same structure must be usable in different situations and according to different profiles and teaching/learning situations. In traditional intelligent tutoring systems (ITS), where flexibility of the learning path is required, the curriculum structure is typically represented as a graph. In these graphs each node represents a concept or a subject, and the links represent how these concepts are related to each other. The relationships often impose a partial order among the nodes, defining the sequence in which concepts can be learnt. The different ways the graph can be traversed represent different learning paths. A certain learning path may suit one student better than another. Systems based on a graph-like curriculum structure can use the information contained in the nodes and links to assist the learner with some intelligence. In this work, a model of a curriculum representation based on a graph is introduced. This has been called the "didactic network" of a lesson, or simply the lesson graph, and the path followed to perform a lesson is called the "learning path". In the next two paragraphs the characteristics of the nodes and links of a didactic network are described.
The Nodes: In a didactic network, a node represents an abstraction of a subject that is to be taught and learnt. Nodes are labeled with a name which is given by the author of the lesson graph (the curriculum designer). This name should be a mnemonic describing the subject represented by that node. Nodes are also typed; this means a node can only be of a type chosen from a pre-defined set of types. The type of the node represents the learning activity which should take place when visiting the node. For performing the learning activity, teacher and students (also) use the computerized learning material which is associated with the node. The node types are the following:

node type name          activity involved while visiting
Graphic presentation    presentation of mostly graphic information to the students
Animation presentation  presentation of animated & interactive simulation programs
Audio presentation      presentation of mostly audio type information to the students
Video presentation      presentation of a video clip to the students
Text presentation       presentation of mostly textual information to the students
Discussion              a group discussion
Individual work         the main activity is the individual work of the students
Group work              a structured collaborative work (e.g. collaborative problem solving)
By typing each node according to a learning activity, the author of the lesson graph must first think about the subject and the concrete activity through which the subject will be taught/learnt, and only afterwards about the computerized learning material required. Moreover, the computer material can be associated with a node long after its creation. This also helps the author to clarify his ideas about the dynamics of a lesson based on the graph.

The links: The links in a didactic network are directed and labeled. The label establishes a relationship between the subjects represented by the nodes. The labels are also typed, in order to make intelligent guidance of the users possible, especially during the development of the lesson. The typing also allows a user of such a graph during the lesson to understand unequivocally what the author had in mind. The link types to be provided depend on the intended system support, i.e., the guidance functions the system will offer to its users. Because the aim of our system is to support the authoring and presentation of learning information, and not to give help based on the knowledge acquired by the learning group, we use rhetorical links for constructing didactic networks. The set of rhetorical links presented by Trigg is modified and complemented in order to give the curriculum designer a set of relations for constructing the lesson graph according to a preferred presentation strategy. The following table describes the relationship established by a link pointing from a node X to a node Y.

link type name       represented relationship                                  intended usage
X introduces to Y    X introduces the subject Y                                recommended for the beginning of a lesson and the introduction of new topics
X refined by Y       the subject Y is a part or a detail of the subject X      may be used to split a topic into various subtopics
X explained by Y     the subject X is explained more deeply by the subject Y   may be used to justify or support the idea of the predecessor
X exemplified by Y   Y is an example of the subject X                          may be used to illustrate the idea of the predecessor's subject
X summarized by Y    Y is a summary of all the nodes linked by this type of    there is normally more than one predecessor linked with this type of link
                     link, having Y as successor
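To make the model concrete, the following sketch shows one possible in-memory representation of a didactic network with typed nodes and typed, directed links. This is a minimal illustration in Python; all names, and the two-node example in the spirit of Figure 1, are our own assumptions rather than part of the system described here.

    NODE_TYPES = {"graphic presentation", "animation presentation",
                  "audio presentation", "video presentation",
                  "text presentation", "discussion",
                  "individual work", "group work"}
    LINK_TYPES = {"introduces_to", "refined_by", "explained_by",
                  "exemplified_by", "summarized_by"}

    class Node:
        def __init__(self, name, node_type, material=None):
            assert node_type in NODE_TYPES
            self.name = name            # mnemonic given by the curriculum designer
            self.type = node_type       # learning activity performed when visiting
            self.material = material    # computerized material; may be attached later
            self.visited = False

    class DidacticNetwork:
        def __init__(self):
            self.nodes = {}             # node name -> Node
            self.links = {}             # node name -> list of (link type, successor)

        def add_node(self, name, node_type, material=None):
            self.nodes[name] = Node(name, node_type, material)
            self.links[name] = []

        def add_link(self, x, link_type, y):
            assert link_type in LINK_TYPES and x in self.nodes and y in self.nodes
            self.links[x].append((link_type, y))

    # A two-node fragment of a lesson graph on data structures and algorithms:
    net = DidacticNetwork()
    net.add_node("Sorting", "text presentation")
    net.add_node("Quicksort", "animation presentation")
    net.add_link("Sorting", "refined_by", "Quicksort")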
Figure 1: An example of a didactic network for a lesson on Data Structures and Algorithms.
1. Using the graph for teaching/learning support

The lesson graph contains important information about the lesson or learning unit being modeled, and it can be used as a basis for supporting the teaching/learning process. Two basic functions based on the information contained in this graph can give good advice in many situations where guidance is required:

a) displaying the graph in a way that highlights the possible lecturing threads;
b) answering the question "which is the best next node to visit?"

Both functions should take the user's profile into account.

A good alternative to displaying the whole graph is to show a spanning tree of it. This has several advantages over the graph itself:

A) A tree is easier to draw than a general graph.
B) The learning material is presented in an "extended" form, where paths and endpoints of the lesson are easy to distinguish.
C) All incoming and outgoing links of the current node can still be shown in a separate window, giving a better assessment of the situation.
D) We can expect the didactic network representing a lesson or learning unit to have a structure similar to a tree. In fact, most printed learning material comes in the form of a tree, with title, subtitles, chapters, etc., and people tend to adopt this structure when organizing digitized information; cross-references from one part to another are added afterwards.

Since there are various ways to construct a spanning tree of the lesson graph, the question arises of how to generate it automatically. The best-known algorithms for constructing spanning trees of a graph are Kruskal's and Prim's. We consider at least two meaningful criteria for constructing the spanning tree in this case:

A) "Good-looking" tree: a good-looking tree is a well-balanced tree of relatively short depth. If we follow the assumptions made by Wong & Chan about the structure of a lesson, we conclude that a spanning tree with the shortest depth, which avoids the cycles introduced by cross-references, is best for surveying the entire lesson structure. Such a spanning tree can easily be generated by a breadth-first traversal of the graph, in the manner of Dijkstra's algorithm for finding the shortest paths from one node to all others.
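As a sketch of this first criterion, a breadth-first traversal over the illustrative DidacticNetwork representation above suffices, since with unit link weights a breadth-first traversal computes the same shortest paths as Dijkstra's algorithm:

    from collections import deque

    def shortest_depth_spanning_tree(net, root):
        """Breadth-first traversal of the lesson graph: each node is attached
        via a path with the fewest links, yielding a shallow, well-balanced
        tree (equivalent to Dijkstra with unit link weights)."""
        parent = {root: None}
        queue = deque([root])
        while queue:
            x = queue.popleft()
            for link_type, y in net.links[x]:
                if y not in parent:     # first reach is the shallowest one
                    parent[y] = x
                    queue.append(y)
        return parent                   # node -> its father in the spanning tree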
B) "Most related concepts": the objective here is to obtain a better close-up view of the current situation. For the implementation of this approach, the distances defined over the label types of the links are crucial, because they define how closely related two concepts are to each other: a link type which connects two closely related concepts must be given a shorter distance. Prim's algorithm for constructing a spanning tree is the most suitable here, because it starts from a given point, which will be the starting point of the lesson graph.
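A sketch of this second criterion follows, again over the illustrative representation above. It adapts Prim's algorithm to the directed lesson graph by following outgoing links only, and assumes a table distance mapping each link type to a weight; both the adaptation and the naming are our own.

    import heapq

    def most_related_spanning_tree(net, root, distance):
        """Prim's algorithm starting from the root of the lesson graph.
        'distance' maps each link type to a weight; the shorter the
        distance, the more closely related the two linked concepts."""
        INF = float("inf")
        parent, in_tree = {root: None}, {root}
        frontier = []

        def push_links(x):
            for t, y in net.links[x]:
                if y not in in_tree and distance[t] != INF:
                    # links marked with an infinite distance are never selected
                    heapq.heappush(frontier, (distance[t], x, y, t))

        push_links(root)
        while frontier:
            d, x, y, t = heapq.heappop(frontier)
            if y in in_tree:
                continue
            parent[y] = x
            in_tree.add(y)
            push_links(y)
        return parent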
The NEXT-NODE function: a very easy to implement but powerful function for suggesting navigation routes through the lesson graph is the automatic positioning of a "current node" pointer at a node which has at least one non-visited son. The teacher may then continue the lesson by choosing any of the non-visited sons, or by backtracking up the tree, eventually reaching the root if desired. This is very helpful when a leaf is visited, or when a node is reached whose sons were already visited by following another path, and it spares the user the need to press a "go back" button while navigating through the network. It is easy to see that this function leaves no node unvisited, unless the teacher explicitly wants it so.

Introducing teaching/learning preferences: by combining the ideas of the most-related-concepts spanning tree and the positioning of the current-node pointer at a node with at least one non-visited son, it is possible to implement some interesting algorithms which support traversing the graph under different profiles and/or requirements arising from the teaching/learning context. The approach is the following: a spanning tree of the lesson graph is generated according to the "most related concepts" principle, but the table with the weight function is given as a parameter, reflecting the teacher's and/or students' preferences. When the NEXT-NODE help function is called, the following algorithm is performed: if the current node has non-visited sons, select and return the one connected by the link with the shortest distance; otherwise, go up to the father of this node and apply the algorithm recursively. This indeed finds the not-yet-visited node whose subject is most closely related to the current one. The algorithm can easily be extended to construct a list of the possible next nodes, ordered by closeness according to the distances.

Let us now see how support for different learning strategies can be implemented by changing the distance function. We analyze three examples, which implement support for inductive learning, for deductive learning, and for a "short version" of the lesson. By inductive learning we mean that the learning process follows an inductive strategy: individual examples and facts induce the understanding of general rules and laws. In such a learning mode the examples therefore have higher priority than the explanations; this can be implemented by assigning a small distance to the links labeled exemplified-by. Table 1 is an example implementation of this learning mode. In a deductive learning mode, the rules and laws are explained first, and individual facts and examples are then deduced from them; this can be implemented by assigning a short distance to the links labeled explained-by, as in Table 2. In a "short version" of the lesson, only some of the nodes are visited, in order to give an overview of the whole lesson without going into details. To leave some nodes out of the traversal path, we can mark some links with an infinite distance, which means these links will never be selected during the construction of the minimal spanning tree. The nodes we can leave out of the lesson for a short version are the ones linked by an explained-by and/or exemplified-by link. Table 3 is an example of this.
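The following sketch shows one possible implementation of the NEXT-NODE function as just described. It is illustrative only: parent is a spanning tree as produced by the sketches above, and marking the returned node as visited inside the function is a design choice of this sketch, not of the system.

    def next_node(net, parent, distance, current):
        """If the current node has non-visited sons in the spanning tree,
        return the one reached by the link with the shortest distance;
        otherwise climb to the father and try again. Returns None once
        every reachable node has been visited."""
        while current is not None:
            candidates = [(distance[t], y)
                          for t, y in net.links[current]
                          if parent.get(y) == current and not net.nodes[y].visited]
            if candidates:
                d, y = min(candidates)
                net.nodes[y].visited = True
                return y
            current = parent[current]   # backtrack towards the root
        return None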
Table 1 (inductive learning mode).
Type of link     Value
introduces to    5
explained by     3
exemplified by   1
refined by       4
summarized by    2*

Table 2 (deductive learning mode).
Type of link     Value
introduces to    5
explained by     1
exemplified by   4
refined by       2
summarized by    3*

Table 3 ("short version" of the lesson).
Type of link     Value
introduces to    1
explained by     ∞
exemplified by   ∞
refined by       2
summarized by    3*

* this link is considered only if all the predecessors of the pointed node were already visited
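Expressed in terms of the sketches above, the three modes differ only in the distance table passed as a parameter. This is again an illustration: float("inf") plays the role of the infinite distance, and the asterisked condition on summarized-by links is not implemented here.

    INF = float("inf")

    # Table 1 - inductive mode: examples come first.
    INDUCTIVE = {"introduces_to": 5, "explained_by": 3, "exemplified_by": 1,
                 "refined_by": 4, "summarized_by": 2}

    # Table 2 - deductive mode: explanations come first.
    DEDUCTIVE = {"introduces_to": 5, "explained_by": 1, "exemplified_by": 4,
                 "refined_by": 2, "summarized_by": 3}

    # Table 3 - "short version": explanations and examples are skipped.
    SHORT_VERSION = {"introduces_to": 1, "explained_by": INF, "exemplified_by": INF,
                     "refined_by": 2, "summarized_by": 3}

    # Same lesson graph, different learning paths:
    # tree = most_related_spanning_tree(net, "Sorting", DEDUCTIVE)
    # step = next_node(net, tree, DEDUCTIVE, "Sorting")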
The three examples shown here are by no means the best, nor the only possible, implementations of learning modes. They are presented only to give a better understanding of the underlying idea: the key role of the distance function in supporting different learning paths.
2. Conclusions and outlook

This paper presented a model for structuring a curriculum for a lesson or course with computer-based learning material. By structuring the computer-based learning material in a curriculum as a graph, we wanted to achieve at least the following goals:

A) Make the curriculum designer aware of the decision to use certain computer-based material and/or programs during a lesson, and of how the computer-supported learning activity fits into the overall curriculum of a lesson or course.
B) Help the curriculum designer to develop a discourse strategy for presenting the planned lesson and performing the planned learning activities.
C) Help the curriculum designer to structure a lesson independently of the learning material that will be used.
D) Assist teacher and students in the selection of a "convenient learning path", given some parameters describing the profile of a teaching/learning strategy and the needs of the learning group.

A preliminary version of a system that uses the didactic network model to support the authoring and presentation of computer-supported collaborative lectures in the in-classroom, face-to-face situation was implemented at the University of Duisburg by the COLLIDE group. The authoring component was based on a modified version of the SEPIA authoring tool for hypermedia.
References

Baloian, N., Hoppe, H.U. and Kling, U. (1995) Structural authoring and cooperative use of instructional multimedia material for the computer-integrated classroom. Proceedings of the ED-MEDIA 95 Conference, Graz, Austria, June 1995, pp. 81–86.
McCalla, G. (1992) The search for adaptability, flexibility and individualization: approaches to curriculum in intelligent tutoring systems. In: Jones, M. and Winne, P. (Eds.) Adaptive Learning Environments, Springer-Verlag, NATO ASI Series, pp. 123–143.
Trigg, R. (1983) A Network-Based Approach to Text Handling for the Online Scientific Community. PhD Dissertation, Department of Computer Science, University of Maryland.
Wong, W.K. and Chan, T.W. (1997) A multimedia system for crafting topic hierarchy, learning strategies and intelligent models. International Journal of Artificial Intelligence in Education, 8.
Artificial Intelligence in Education S.P. Lajoie and M. Vivet (Eds.) IOS Press, 1999
An Ablative Evaluation

Joseph E. Beck¹,², Ivon Arroyo¹,², Beverly Park Woolf¹,² and Carole Beal³
¹Computer Science Department, ²School of Education, ³Psychology Department
University of Massachusetts, Amherst, MA 01003, USA
Phone: (413) 545-0582; Fax: (413) 545-1249
email: {beck, ivon, bev}@cs.umass.edu; cbeal@psych.umass.edu
Abstract. The goal of this project is to use an intelligent, multimedia computer tutor to increase students' interest in mathematics and their confidence in their ability to learn mathematics. Based on its student model, WhaleWatch selects problems of appropriate difficulty and provides help and instruction as needed. We present the results of two studies. The first study showed that after using the tutor, girls' self-confidence at solving mathematics problems increased. Our second study was ablative, with the tutor's hinting mechanism disabled; we found that some students respond better (higher motivation and performance) when given less help. Our current goal is to integrate this knowledge into our next design.
1. Introduction and motivation

In the present project, we have developed an intelligent tutor called WhaleWatch to teach fraction concepts. The domain of fractions was selected for three reasons. First, it represents an introduction to more abstract mathematics material, and students often regard it as considerably more difficult than basic arithmetic. Second, fractions are taught to 11-year-olds in the United States, which is the point at which students, particularly girls, begin to express a dislike of mathematics. Third, there are well-documented and systematic error patterns associated with specific types of fraction problems.

WhaleWatch was specifically designed to support learning styles that appeal to girls. Problems involving fractions are presented in the context of environmental biology, which, of the sciences, is particularly appealing to both girls and boys. The value of learning mathematics is conveyed through the overall goal of assisting the animals, rather than through competition, which girls tend to dislike as the basis for a learning activity. WhaleWatch also provides individualized instruction and supportive feedback to each student. This individualized instruction is provided by adjusting the level of feedback, controlling the progression through the curriculum, and constructing problems at the "appropriate" level of difficulty; a description of this process can be found in [2], and a hypothetical sketch is given below. By making sure the instruction is appropriate for each student, we ensure that students do not become overly discouraged. We hypothesized that this feature should be particularly effective with female students, who are more easily discouraged about their progress in mathematics.
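As a purely hypothetical illustration of difficulty-matched problem selection (the tutor's actual mechanism is the one described in [2]; every name and the matching rule below are our assumptions):

    def select_problem(problems, proficiency):
        """Return the problem whose difficulty is closest to the student's
        estimated proficiency (both on a 0..1 scale), so that the student
        is neither bored nor overly discouraged. Illustrative sketch only."""
        return min(problems, key=lambda p: abs(p["difficulty"] - proficiency))

    problems = [
        {"topic": "equivalent fractions", "difficulty": 0.2},
        {"topic": "unlike denominators",  "difficulty": 0.6},
        {"topic": "mixed numbers",        "difficulty": 0.9},
    ]
    print(select_problem(problems, proficiency=0.55))  # -> the 0.6 problem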
2. Experiments

In Spring 1997, we conducted an evaluation study of WhaleWatch. The participants were 50 students attending two sixth-grade classes at an elementary school in a rural/suburban setting. To assess the students' changing beliefs about their mathematics ability, a questionnaire was administered in a pre- and post-test design. The questionnaire was drawn from work by Eccles et al. [4]. It includes items that tap two dimensions of beliefs about mathematics: the belief that mathematics is useful, and a student's self-confidence in her ability in mathematics. Students rate their response to each item on a seven-point scale. The questionnaire has been shown to be highly reliable and to have good psychometric properties [4], and it has been frequently used in prior research on mathematics achievement. This test was given before the students first used the tutor and after they completed their final session with WhaleWatch. Girls who used WhaleWatch increased in self-confidence from 4.9 to 5.3 (significant at p

, 1 ≤ j ≤ m, different lateralities (L1j, ..., Ltj), with t > 0, are associated. For each pair (Cj, Dj) a sub-domain of DK is defined, abbreviated dk. Once DK is defined, the society of agents can be defined following the scheme of DK. To each one of the sub-domains (pair