ARTIFICIAL INTELLIGENCE IN EDUCATION
Frontiers in Artificial Intelligence and Applications (FAIA) covers all aspects of theoretical and applied artificial intelligence research in the form of monographs, doctoral dissertations, textbooks, handbooks and proceedings volumes. The FAIA series contains several sub-series, including “Information Modelling and Knowledge Bases” and “Knowledge-Based Intelligent Engineering Systems”. It also includes the biennial ECAI (European Conference on Artificial Intelligence) proceedings volumes and other publications sponsored by ECCAI, the European Coordinating Committee on Artificial Intelligence. An editorial panel of internationally well-known scholars is appointed to provide a high-quality selection.
Series Editors: J. Breuker, R. Dieng, N. Guarino, R. López de Mántaras, R. Mizoguchi, M. Musen
Volume 125

Recently published in this series
Vol. 124. T. Washio et al. (Eds.), Advances in Mining Graphs, Trees and Sequences
Vol. 123. P. Buitelaar et al. (Eds.), Ontology Learning from Text: Methods, Evaluation and Applications
Vol. 122. C. Mancini, Cinematic Hypertext – Investigating a New Paradigm
Vol. 121. Y. Kiyoki et al. (Eds.), Information Modelling and Knowledge Bases XVI
Vol. 120. T.F. Gordon (Ed.), Legal Knowledge and Information Systems – JURIX 2004: The Seventeenth Annual Conference
Vol. 119. S. Nascimento, Fuzzy Clustering via Proportional Membership Model
Vol. 118. J. Barzdins and A. Caplinskas (Eds.), Databases and Information Systems – Selected Papers from the Sixth International Baltic Conference DB&IS’2004
Vol. 117. L. Castillo et al. (Eds.), Planning, Scheduling and Constraint Satisfaction: From Theory to Practice
Vol. 116. O. Corcho, A Layered Declarative Approach to Ontology Translation with Knowledge Preservation
Vol. 115. G.E. Phillips-Wren and L.C. Jain (Eds.), Intelligent Decision Support Systems in Agent-Mediated Environments
Vol. 114. A.C. Varzi and L. Vieu (Eds.), Formal Ontology in Information Systems – Proceedings of the Third International Conference (FOIS-2004)
Vol. 113. J. Vitrià et al. (Eds.), Recent Advances in Artificial Intelligence Research and Development
Vol. 112. W. Zhang and V. Sorge (Eds.), Distributed Constraint Problem Solving and Reasoning in Multi-Agent Systems
Vol. 111. H. Fujita and V. Gruhn (Eds.), New Trends in Software Methodologies, Tools and Techniques – Proceedings of the Third SoMeT_W04
ISSN 0922-6389
Artificial Intelligence in Education Supporting Learning through Intelligent and Socially Informed Technology
Edited by
Chee-Kit Looi National Institute of Education, Nanyang Technological University, Singapore
Gord McCalla Department of Computer Science, University of Saskatchewan, Canada
Bert Bredeweg Human Computer Studies, Informatics Institute, Faculty of Science, University of Amsterdam, The Netherlands
and
Joost Breuker Human Computer Studies, Informatics Institute, Faculty of Science, University of Amsterdam, The Netherlands
Amsterdam • Berlin • Oxford • Tokyo • Washington, DC
© 2005 The authors. All rights reserved.
No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior written permission from the publisher.

ISBN 1-58603-530-4
Library of Congress Control Number: 2005928505

Publisher
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
Netherlands
fax: +31 20 620 3419
e-mail: [email protected]

Distributor in the UK and Ireland
IOS Press/Lavis Marketing
73 Lime Walk
Headington
Oxford OX3 7AD
England
fax: +44 1865 750079

Distributor in the USA and Canada
IOS Press, Inc.
4502 Rachael Manor Drive
Fairfax, VA 22032
USA
fax: +1 703 323 3668
e-mail: [email protected]

LEGAL NOTICE
The publisher is not responsible for the use which might be made of the following information.

PRINTED IN THE NETHERLANDS
Preface

The 12th International Conference on Artificial Intelligence in Education (AIED-2005) is being held July 18–22, 2005, in Amsterdam, the beautiful Dutch city near the sea. AIED-2005 is the latest in an on-going series of biennial conferences in AIED dating back to the mid-1980s, when the field emerged from a synthesis of artificial intelligence and education research. Since then, the field has continued to broaden and now includes research and researchers from many areas of technology and social science. The conference thus provides opportunities for the cross-fertilization of information and ideas from researchers in the many fields that make up this interdisciplinary research area, including artificial intelligence, other areas of computer science, cognitive science, education, learning sciences, educational technology, psychology, philosophy, sociology, anthropology, linguistics, and the many domain-specific areas for which AIED systems have been designed and built.

An explicit goal of this conference was to appeal to those researchers who share the AIED perspective that true progress in learning technology requires both deep insight into technology and also deep insight into learners, learning, and the context of learning. The 2005 theme, “Supporting Learning through Intelligent and Socially Informed Technology”, reflects this basic duality. Clearly, this theme has resonated with e-learning researchers throughout the world, since we received a record number of submissions from researchers with a wide variety of backgrounds, but a common purpose in exploring these deep issues.

Here are some statistics. Overall, we received 289 submissions for full papers and posters. 89 of these (31%) were accepted and published as full papers, and a further 72 (25%) as posters. Full papers have each been allotted 8 pages in the Proceedings; posters have been allotted 3 pages. The conference also includes 11 interactive events, 2 panels, 12 workshops, 5 tutorials, and 28 papers in the Young Researcher’s Track. Each of these has been allotted a one-page abstract in the Proceedings; the workshops, tutorials, and YRT papers also have their own Proceedings, provided at the conference itself. Also in the Proceedings are brief abstracts of the talks of the four invited speakers: Daniel Schwartz of Stanford University in the U.S.A., Antonija Mitrovic of the University of Canterbury in New Zealand, Justine Cassell of Northwestern University in the U.S.A., and Ton de Jong of the University of Twente in the Netherlands.

The work to put on a conference of this size is immense. We would like to thank the many, many people who have helped to make it possible. In particular we thank the members of the Local Organizing Committee, who have strived to make sure nothing is left to chance, and to keep stressing to everybody else, especially the program co-chairs, the importance of keeping on schedule! Without their concerted efforts AIED-2005 would probably have been held in 2007! As with any quality conference, the Program Committee is critical to having a strong program. Our Program Committee was under much more stress than normal, with far more papers than expected and a shorter time than we had originally planned for reviewing. Thanks to all of the Program Committee members for doing constructive reviews under conditions of extreme pressure, and doing so more or less on time. Thanks, too, to the reviewers who were recruited by Program Committee members to help out in this critical task. The committees organizing the other events at the conference have also helped to make the conference richer and broader: the Young Researcher’s Track, chaired by Monique Grandbastien; Tutorials, chaired by Jacqueline Bourdeau and Peter Wiemer-Hastings; Workshops, chaired by Joe Beck and Neil Heffernan; and Interactive Events, chaired by Lora Aroyo. Antoinette Muntjewerff chaired the conference Publicity committee, and the widespread interest in the 2005 conference is in no small measure due to her and her committee’s activities. We also thank an advisory group of senior AIED researchers, an informal conference executive committee, who were a useful sounding board on many occasions during the conference planning. Each of the individuals serving in these various roles is acknowledged in the next few pages. Quite literally, without them this conference could not have happened. Finally, we would like to thank Thomas Preuss, who helped the program co-chairs through the mysteries of the Conference Master reviewing software.

For those who enjoy the contributions in these Proceedings, we recommend considering joining the International Society for Artificial Intelligence in Education, an active scientific community that helps to forge on-going interactions among AIED researchers in between conferences. The Society not only sponsors the biennial conferences and the occasional smaller meetings, but also has a quality journal, the AIED Journal, and an informative web site: http://aied.inf.ed.ac.uk/aiedsoc.html.

We certainly hope that you all enjoy the AIED-2005 conference, and that you find it illuminating, entertaining, and stimulating. And, please also take some time to enjoy cosmopolitan Amsterdam.

Chee-Kit Looi, Program Co-Chair, Nanyang Technological University, Singapore
Gord McCalla, Program Co-Chair, University of Saskatchewan, Canada
Bert Bredeweg, LOC-Chair, University of Amsterdam, The Netherlands
Joost Breuker, LOC-Chair, University of Amsterdam, The Netherlands
Helen Pain, Conference Chair, University of Edinburgh, United Kingdom
International AIED Society Management Board
Paul Brna, University of Glasgow, UK – journal editor
Jim Greer, University of Saskatchewan, Canada – president elect
Riichiro Mizoguchi, Osaka University, Japan – secretary
Helen Pain, University of Edinburgh, UK – president
Executive Committee Members
Joost Breuker, University of Amsterdam, The Netherlands
Paul Brna, University of Glasgow, UK
Jim Greer, University of Saskatchewan, Canada
Susanne Lajoie, McGill University, Canada
Ana Paiva, Technical University of Lisbon, Portugal
Dan Suthers, University of Hawaii, USA
Gerardo Ayala, Puebla University, Mexico
Michael Baker, University of Lyon, France
Tak-Wai Chan, National Central University, Taiwan
Claude Frasson, University of Montreal, Canada
Ulrich Hoppe, University of Duisburg, Germany
Ken Koedinger, Carnegie Mellon University, USA
Helen Pain, University of Edinburgh, UK
Wouter van Joolingen, University of Amsterdam, Netherlands
Ivon Arroyo, University of Massachusetts, USA
Bert Bredeweg, University of Amsterdam, The Netherlands
Art Graesser, University of Memphis, USA
Lewis Johnson, University of Southern California, USA
Judy Kay, University of Sydney, Australia
Chee Kit Looi, Nanyang Technological University, Singapore
Rose Luckin, University of Sussex, UK
Tanja Mitrovic, University of Canterbury, New Zealand
Pierre Tchounikine, University of Le Mans, France
Conference Chair Helen Pain, University of Edinburgh, United Kingdom
Program Chairs
Chee-Kit Looi, Nanyang Technological University, Singapore
Gord McCalla, University of Saskatchewan, Canada
Organising Chairs
Bert Bredeweg, University of Amsterdam, The Netherlands
Joost Breuker, University of Amsterdam, The Netherlands
Conference Executive Committee
Paul Brna, University of Glasgow, UK
Jim Greer, University of Saskatchewan, Canada
Lewis Johnson, University of Southern California, USA
Riichiro Mizoguchi, Osaka University, Japan
Helen Pain, University of Edinburgh, UK
Young Researcher’s Track Chair Monique Grandbastien, Université Henri Poincaré, France
Tutorials Chairs
Jacqueline Bourdeau, Université du Québec, Canada
Peter Wiemer-Hastings, DePaul University, United States of America
Workshops Chairs
Joe Beck, Carnegie-Mellon University, United States of America
Neil Heffernan, Worcester Polytechnic Institute, United States of America
Interactive Events Chair Lora Aroyo, Eindhoven University of Technology, The Netherlands
Publicity Chair Antoinette Muntjewerff, University of Amsterdam, The Netherlands
Program Committee Esma Aimeur, Université de Montréal, Canada Shaaron Ainsworth, University of Nottingham, United Kingdom Fabio Akhras, Renato Archer Research Center, Brazil Vincent Aleven, Carnegie-Mellon University, United States of America Terry Anderson, Athabasca University, Canada Roger Azevedo, University of Maryland, United States of America Mike Baker, Centre National de la Recherche Scientifique, France Nicolas Balacheff, Centre National de la Recherche Scientifique, France Gautam Biswas, Vanderbilt University, United States of America Bert Bredeweg, University of Amsterdam, Netherlands Joost Breuker, University of Amsterdam, Netherlands Peter Brusilovsky, University of Pittsburgh, United States of America Susan Bull, University of Birmingham, United Kingdom Isabel Fernández de Castro, University of the Basque Country UPV/EHU, Spain Tak-Wai Chan, National Central University, Taiwan Yam-San Chee, Nanyang Technological University, Singapore Weiqin Chen, University of Bergen, Norway Cristina Conati, University of British Columbia, Canada Albert Corbett, Carnegie-Mellon University, United States of America Vladan Devedzic, University of Belgrade, Yugoslavia Vania Dimitrova, University of Leeds, United Kingdom Aude Dufresne, Université de Montréal, Canada Marc Eisenstadt, Open University,United Kingdom Jon A. Elorriaga, University of the Basque Country, Spain Gerhard Fischer, University of Colorado, United States of America Elena Gaudioso, Universidad Nacional de Educacion a Distancia, Spain Peter Goodyear, University of Sydney, Australia Art Graesser, University of Memphis, United States of America Barry Harper, University of Wollongong, Australia Neil Heffernan, Worcester Polytechnic Institute, United States of America Pentti Hietala, University of Tampere, Finland Tsukasa Hirashima, Hiroshima University, Japan Ulrich Hoppe, University of Duisburg, Germany RongHuai Huang, Beijing Normal University, China Chih-Wei Hue, National Taiwan University, Taiwan Mitsuru Ikeda, Japan Advanced Institute of Science and Technology, Japan Akiko Inaba, Osaka University, Japan Lewis Johnson, University of Southern California, United States of America David Jonassen, University of Missouri, United States of America Wouter van Joolingen, University of Twente, Netherlands Akihiro Kashihara, University of Electro-Communications, Japan Judy Kay, University of Sydney, Australia Ray Kemp, Massey University, New Zealand Ken Koedinger, Carnegie-Mellon University, United States of America Janet Kolodner, Georgia Institute of Technology, United States of America Rob Koper, Open University of the Netherlands, Netherlands Lam-For Kwok, City University of Hong Kong, Hong Kong
Susanne Lajoie, McGill University, Canada Fong-lok Lee, Chinese University of Hong Kong, Hong Kong Ok Hwa Lee, Chungbuk National University, Korea James Lester, North Carolina State University, United States of America Rose Luckin, University of Sussex, United Kingdom Tanja Mitrovic, University of Canterbury, New Zealand Permanand Mohan, University of the West Indies, Trinidad and Tobago Rafael Morales, University of Northumbria at Newcastle, United Kingdom Jack Mostow, Carnegie-Mellon University, United States of America Tom Murray, University of New Hampshire, United States of America Toshio Okamoto, University of Electro-Communications, Japan Rachel Or-Bach, Emek Yezreel College, Israel Ana Paiva, INESC-ID and Instituto Superior Técnico, Portugal Cecile Paris, CSIRO, Australia Peter Reimann, University of Sydney, Australia Marta Rosatelli, Universidade Católica de Santos, Brazil Jeremy Roschelle, SRI, United States of America Carolyn Rosé, Carnegie-Mellon University, United States of America Fiorella de Rosis, University of Bari, Italy Jacobijn Sandberg, University of Amsterdam, Netherlands Mike Sharples, University of Birmingham, United Kingdom Raymund Sison, De La Salle University, Philippines Amy Soller, Institute for Scientific and Technological Research, Italy Elliot Soloway, University of Michigan, United States of America Dan Suthers, University of Hawaii, United States of America Erkki Suttinen, University of Joensuu, Finland Akira Takeuchi, Kyushu Institute of Technology, Japan Liane Tarouco, Universidade Federal do Rio Grande do Su, Brazil Carlo Tasso, University of Udine, Italy Pierre Tchounikine, Université du Maine, France Kurt VanLehn, University of Pittsburgh, United States of America Julita Vassileva, University of Saskatchewan, Canada Felisa Verdejo, Universidad Nacional de Educacion a Distancia, Spain Gerhard Weber, University of Trier, Germany Barbara White, University of California at Berkeley, United States of America Lung-Hsiang Wong, National University of Singapore, Singapore Jin-Tan David Yang, National Kaohsiung Normal University, Taiwan Diego Zapata-Rivera, Educational Testing Service, United States of America Zhiting Zhu, East China Normal University, China
Reviewers Esma Aimeur Ainhoa Alvarez Shaaron Ainsworth Fabio Akhras Vincent Aleven Terry Anderson Stamatina Anstopoulou Ana Arruarte Roger Azevedo Mike Baker Nicolas Balacheff Beatriz Barros Gautam Biswas Bert Bredeweg Joost Breuker Chris Brooks Francis Brouns Jan van Bruggen Peter Brusilovsky Stefan Carmien Valeria Carofiglio Berardina De Carolis Rosa Maria Carro Isabel Fernández de Castro Tak-Wai Chan Ben Chang Sung-Bin Chang Yam-San Chee Weiqin Chen Yen-Hua Chen Yu-Fen Chen Zhi-Hong Chen Hercy Cheng Andrew Chiarella Cristina Conati Ricardo Conejo Albert Corbett Ben Daniel Melissa Dawe Yi-Chan Deng Vladan Devedzic Vania Dimitrova Aude Dufresne Hal Eden Marc Eisenstadt Jon A. Elorriaga
Rene van Es Jennifer Falcone Sonia Faremo Bego Ferrero Gerhard Fischer Isaac Fung Dragan Gasevic Elena Gaudioso Elisa Giaccardi Peter Goodyear Andrew Gorman Art Graesser Jim Greer Barry Harper Pentti Hietala Tsukasa Hirashima Ulrich Hoppe Tomoya Horiguchi RongHuai Huang Chih-Wei Hue Mitsuru Ikeda Akiko Inaba Lewis Johnson Russell Johnson David Jonassen Wouter van Joolingen Akihiro Kashihara Judy Kay Elizabeth Kemp Ray Kemp Liesbeth Kester Ken Koedinger Shin’ichi Konomi Rob Koper Yang-Ming Ku Hidenobu Kunichika Lam-For Kwok Chih Hung Lai Susanne Lajoie Mikel Larrañaga Fong-lok Lee Seung Lee Sunyoung Lee James Lester Chuo-Bin Lin Fuhua Oscar Lin
Chee-Kit Looi Susan Lu Rose Luckin Heather Maclaren Montse Maritxalar Brent Martin Liz Masterman Noriyuki Matsuda Jose Ignacio Mayorga Gord McCalla Scott McQuiggan Tanja Mitrovic Frans Mofers Permanand Mohan Rafael Morales Jack Mostow Bradford Mott Kasia Muldner Tom Murray Tomohiro Oda Masaya Okada Toshio Okamoto Olayide Olorunleke Ernie Ong Rachel Or-Bach Mourad Oussalah Ana Paiva Cecile Paris Harrie Passier Tom Patrick Peter Reimann Marta Rosatelli Jeremy Roschelle Carolyn Rosé Fiorella de Rosis Peter van Rosmalen Jacobijn Sandberg Mike Sharples Raymund Sison Peter Sloep Amy Soller Elliot Soloway Slavi Stoyanov Jim Sullivan Dan Suthers Erkki Suttinen
Akira Takeuchi Tiffany Tang Colin Tattersall Pierre Tchounikine Takanobu Umetsu Maite Urretavizcaya Kurt VanLehn
Julita Vassileva Felisa Verdejo Fred de Vries Gerhard Weber Barbara White Mike Winter Lung-Hsiang Wong
Jin-Tan David Yang Yunwen Ye Gee-Kin Yeo Diego Zapata-Rivera Zhiting Zhu
YRT Committee
Monique Baron, France
Joseph Beck, USA
Jim Greer, Canada
Erica Melis, Germany
Alessandro Micarelli, Italy
Riichiro Mizoguchi, Japan
Roger Nkambou, Canada
Jean-François Nicaud, France
Kalina Yacef, Australia
Additional YRT Reviewers
John Lee, UK
Judy Kay, Australia
Cristina Conati, Canada
Shaaron Ainsworth, UK
Peter Brusilovsky, USA
Michael Baker, France
Phil Winne, Canada
Aude Dufresne, Canada
Tom Murray, USA
Catherine Pelachaud, France
Organising Committee
Lora Aroyo, Eindhoven University of Technology, Netherlands
Anders Bouwer, University of Amsterdam, The Netherlands
Bert Bredeweg, University of Amsterdam, The Netherlands
Joost Breuker, University of Amsterdam, The Netherlands
Antoinette Muntjewerff, University of Amsterdam, The Netherlands
Radboud Winkels, University of Amsterdam, The Netherlands
Sponsors
Contents
Preface  v
International AIED Society Management Board  vii
Executive Committee Members  vii
Conference Organization  viii
Sponsors  xiv
Invited Talks Learning with Virtual Peers Justine Cassell
3
Scaffolding Inquiry Learning: How Much Intelligence is Needed and by Whom? Ton de Jong
4
Constraint-Based Tutors: A Success Story Tanja Mitrovic
5
Interactivity and Learning Dan Schwartz
6
Full Papers Evaluating a Mixed-Initiative Authoring Environment: Is REDEEM for Real? Shaaron Ainsworth and Piers Fleming An Architecture to Combine Meta-Cognitive and Cognitive Tutoring: Pilot Testing the Help Tutor Vincent Aleven, Ido Roll, Bruce McLaren, Eun Jeong Ryu and Kenneth Koedinger
9
17
“À la” in Education: Keywords Linking Method for Selecting Web Resources Mirjana Andric, Vladan Devedzic, Wendy Hall and Leslie Carr
25
Inferring Learning and Attitudes from a Bayesian Network of Log File Data Ivon Arroyo and Beverly Park Woolf
33
Why Is Externally-Regulated Learning More Effective Than Self-Regulated Learning with Hypermedia? Roger Azevedo, Daniel Moos, Fielding Winters, Jeffrey Greene, Jennifer Cromley, Evan Olson and Pragati Godbole Chaudhuri
41
Motivating Appropriate Challenges in a Reciprocal Tutoring System Ari Bader-Natal and Jordan Pollack
49
Do Performance Goals Lead Students to Game the System? Ryan Shaun Baker, Ido Roll, Albert T. Corbett and Kenneth R. Koedinger
57
Pedagogical Agents as Social Models for Engineering: The Influence of Agent Appearance on Female Choice Amy L. Baylor and E. Ashby Plant The Impact of Frustration-Mitigating Messages Delivered by an Interface Agent Amy L. Baylor, Daniel Warren, Sanghoon Park, E. Shen and Roberto Perez Computational Methods for Evaluating Student and Group Learning Histories in Intelligent Tutoring Systems Carole Beal and Paul Cohen
65 73
80
Engagement Tracing: Using Response Times to Model Student Disengagement Joseph E. Beck
88
Interactive Authoring Support for Adaptive Educational Systems Peter Brusilovsky, Sergey Sosnovsky, Michael Yudelson and Girish Chavan
96
Some Unusual Open Learner Models Susan Bull, Abdallatif S. Abu-Issa, Harpreet Ghag and Tim Lloyd Advanced Capabilities for Evaluating Student Writing: Detecting Off-Topic Essays Without Topic-Specific Training Jill Burstein and Derrick Higgins Thread-Based Analysis of Patterns of Collaborative Interaction in Chat Murat Cakir, Fatos Xhafa, Nan Zhou and Gerry Stahl Conceptual Conflict by Design: Dealing with Students’ Learning Impasses in Multi-User Multi-Agent Virtual Worlds Yam San Chee and Yi Liu
104
112 120
128
Motivating Learners by Nurturing Animal Companions: My-Pet and Our-Pet Zhi-Hong Chen, Yi-Chan Deng, Chih-Yueh Chou and Tak-Wai Chan
136
ArithmeticDesk: Computer Embedded Manipulatives for Learning Arithmetic Hercy N.H. Cheng, Ben Chang, Yi-Chan Deng and Tak-Wai Chan
144
Adaptive Reward Mechanism for Sustainable Online Learning Community Ran Cheng and Julita Vassileva
152
What Is The Student Referring To? Mapping Properties and Concepts in Students’ Systems of Physics Equations C.W. Liew, Joel A. Shapiro and D.E. Smith The Effects of a Pedagogical Agent in an Open Learning Environment Geraldine Clarebout and Jan Elen Using Discussion Prompts to Scaffold Parent-Child Collaboration Around a Computer-Based Activity Jeanette O’Connor, Lucinda Kerawalla and Rosemary Luckin Self-Regulation of Learning with Multiple Representations in Hypermedia Jennifer Cromley, Roger Azevedo and Evan Olson
160 168
176 184
An ITS for Medical Classification Problem-Solving: Effects of Tutoring and Representations Rebecca Crowley, Elizabeth Legowski, Olga Medvedeva, Eugene Tseytlin, Ellen Roh and Drazen Jukic
192
Mining Data and Modelling Social Capital in Virtual Learning Communities Ben K. Daniel, Gordon I. McCalla and Richard A. Schwier
200
Tradeoff Analysis Between Knowledge Assessment Approaches Michel C. Desmarais, Shunkai Fu and Xiaoming Pu
209
Natural Language Generation for Intelligent Tutoring Systems: A Case Study Barbara di Eugenio, Davide Fossati, Dan Yu, Susan Haller and Michael Glass
217
Dialogue-Learning Correlations in Spoken Dialogue Tutoring Kate Forbes-Riley, Diane Litman, Alison Huettner and Arthur Ward
225
Adolescents’ Use of SRL Behaviors and Their Relation to Qualitative Mental Model Shifts While Using Hypermedia Jeffrey A. Greene and Roger Azevedo
233
Teaching about Dynamic Processes A Teachable Agents Approach Ruchie Gupta, Yanna Wu and Gautam Biswas
241
Exam Question Recommender System Hicham Hage and Esma Aïmeur
249
DIANE, a Diagnosis System for Arithmetical Problem Solving Khider Hakem, Emmanuel Sander, Jean-Marc Labat and Jean-François Richard
258
Collaboration and Cognitive Tutoring: Integration, Empirical Results, and Future Directions Andreas Harrer, Bruce M. McLaren, Erin Walker, Lars Bollen and Jonathan Sewall Personal Readers: Personalized Learning Object Readers for the Semantic Web Nicola Henze Making an Unintelligent Checker Smarter: Creating Semantic Illusions from Syntactic Analyses Kai Herrmann and Ulrich Hoppe
266
274
282
Iterative Evaluation of a Large-Scale, Intelligent Game for Language Learning W. Lewis Johnson and Carole Beal
290
Cross-Cultural Evaluation of Politeness in Tactics for Pedagogical Agents W. Lewis Johnson, Richard E. Mayer, Elisabeth André and Matthias Rehm
298
Serious Games for Language Learning: How Much Game, How Much AI? W. Lewis Johnson, Hannes Vilhjalmsson and Stacy Marsella
306
Taking Control of Redundancy in Scripted Tutorial Dialogue Pamela W. Jordan, Patricia Albacete and Kurt VanLehn
314
Ontology of Learning Object Content Structure Jelena Jovanović, Dragan Gašević, Katrien Verbert and Erik Duval Goal Transition Model and Its Application for Supporting Teachers Based on Ontologies Toshinobu Kasai, Haruhisa Yamaguchi, Kazuo Nagano and Riichiro Mizoguchi
322
330
Exploiting Readily Available Web Data for Scrutable Student Models Judy Kay and Andrew Lum
338
What Do You Mean by to Help Learning of Metacognition? Michiko Kayashima, Akiko Inaba and Riichiro Mizoguchi
346
Matching and Mismatching Learning Characteristics with Multiple Intelligence Based Content Declan Kelly and Brendan Tangney
354
Pedagogical Agents as Learning Companions: Building Social Relations with Learners Yanghee Kim
362
The Evaluation of an Intelligent Teacher Advisor for Web Distance Environments Essam Kosba, Vania Dimitrova and Roger Boyle
370
A Video Retrieval System for Computer Assisted Language Learning Chin-Hwa Kuo, Nai-Lung Tsao, Chen-Fu Chang and David Wible
378
The Activity at the Center of the Global Open and Distance Learning Process Lahcen Oubahssi, Monique Grandbastien, Macaire Ngomo and Gérard Claës
386
Towards Support in Building Qualitative Knowledge Models Vania Bessa Machado, Roland Groen and Bert Bredeweg
395
Analyzing Completeness and Correctness of Utterances Using an ATMS Maxim Makatchev and Kurt VanLehn
403
Modelling Learning in an Educational Game Micheline Manske and Cristina Conati
411
On Using Learning Curves to Evaluate ITS Brent Martin, Kenneth R. Koedinger, Antonija Mitrovic and Santosh Mathan
419
The Role of Learning Goals in the Design of ILEs: Some Issues to Consider Erika Martínez-Mirón, Amanda Harris, Benedict du Boulay, Rosemary Luckin and Nicola Yuill
427
A Knowledge-Based Coach for Reasoning about Historical Causation Liz Masterman
435
Advanced Geometry Tutor: An intelligent Tutor that Teaches Proof-Writing with Construction Noboru Matsuda and Kurt VanLehn
443
Design of Erroneous Examples for ACTIVEMATH Erica Melis “Be Bold and Take a Challenge”: Could Motivational Strategies Improve Help-Seeking? Genaro Rebolledo Mendez, Benedict du Boulay and Rosemary Luckin
451
459
Educational Data Mining: A Case Study Agathe Merceron and Kalina Yacef
467
Adapting Process-Oriented Learning Design to Group Characteristics Yongwu Miao and Ulrich Hoppe
475
On the Prospects of Intelligent Collaborative E-Learning Systems Miikka Miettinen, Jaakko Kurhila and Henry Tirri
483
COFALE: An Adaptive Learning Environment Supporting Cognitive Flexibility Vu Minh Chieu and Elie Milgrom
491
The Effect of Explaining on Learning: A Case Study with a Data Normalization Tutor Antonija Mitrovic
499
Formation of Learning Groups by Using Learner Profiles and Context Information Martin Muehlenbrock
507
Evaluating Inquiry Learning Through Recognition-Based Tasks Tom Murray, Kenneth Rath, Beverly Woolf, David Marshall, Merle Bruno, Toby Dragon, Kevin Kohler and Matthew Mattingly
515
Personalising Information Assets in Collaborative Learning Environments Ernest Ong, Ai-Hwa Tay, Chin-Kok Ong and Siong-Kong Chan
523
Qualitative and Quantitative Student Models Jose-Luis Perez-de-la-Cruz, Ricardo Conejo and Eduardo Guzmán
531
Making Learning Design Standards Work with an Ontology of Educational Theories Valéry Psyché, Jacqueline Bourdeau, Roger Nkambou and Riichiro Mizoguchi Detecting the Learner’s Motivational States in an Interactive Learning Environment Lei Qu and W. Lewis Johnson Blending Assessment and Instructional Assisting Leena Razzaq, Mingyu Feng, Goss Nuzzo-Jones, Neil T. Heffernan, Kenneth Koedinger, Brian Junker, Steven Ritter, Andrea Knight, Edwin Mercado, Terrence E. Turner, Ruta Upalekar, Jason A. Walonoski, Michael A. Macasek, Christopher Aniszczyk, Sanket Choksey, Tom Livak and Kai Rasmussen
539
547 555
A First Evaluation of the Instructional Value of Negotiable Problem Solving Goals on the Exploratory Learning Continuum Carolyn Rosé, Vincent Aleven, Regan Carey and Allen Robinson Automatic and Semi-Automatic Skill Coding with a View Towards Supporting On-Line Assessment Carolyn Rosé, Pinar Donmez, Gahgene Gweon, Andrea Knight, Brian Junker, William Cohen, Kenneth Koedinger and Neil Heffernan The Use of Qualitative Reasoning Models of Interactions between Populations to Support Causal Reasoning of Deaf Students Paulo Salles, Heloisa Lima-Salles and Bert Bredeweg
563
571
579
Assessing and Scaffolding Collaborative Learning in Online Discussions Erin Shaw
587
THESPIAN: An Architecture for Interactive Pedagogical Drama Mei Si, Stacy C. Marsella and David V. Pynadath
595
Technology at Work to Mediate Collaborative Scientific Enquiry in the Field Hilary Smith, Rose Luckin, Geraldine Fitzpatrick, Katerina Avramides and Joshua Underwood
603
Implementing a Layered Analytic Approach for Real-Time Modeling of Students’ Scientific Understanding Ron Stevens and Amy Soller
611
Long-Term Human-Robot Interaction: The Personal Exploration Rover and Museum Docents Kristen Stubbs, Debra Bernstein, Kevin Crowley and Illah Nourbakhsh
621
Information Extraction and Machine Learning: Auto-Marking Short Free Text Responses to Science Questions Jana Z. Sukkarieh and Stephen G. Pulman
629
A Knowledge Acquisition System for Constraint-Based Intelligent Tutoring Systems Pramuditha Suraweera, Antonija Mitrovic and Brent Martin
638
Computer Games as Intelligent Learning Environments: A River Ecosystem Adventure Jason Tan, Chris Beers, Ruchi Gupta and Gautam Biswas
646
Paper Annotation with Learner Models Tiffany Y. Tang and Gordon McCalla
654
Automatic Textual Feedback for Guided Inquiry Learning Steven Tanimoto, Susan Hubbard and William Winn
662
Graph of Microworlds: A Framework for Assisting Progressive Knowledge Acquisition in Simulation-Based Learning Environments Tomoya Horiguchi and Tsukasa Hirashima The Andes Physics Tutoring System: Five Years of Evaluations Kurt VanLehn, Collin Lynch, Kay Schulze, Joel A. Shapiro, Robert Shelby, Linwood Taylor, Don Treacy, Anders Weinstein and Mary Wintersgill
670 678
The Politeness Effect: Pedagogical Agents and Learning Gains Ning Wang, W. Lewis Johnson, Richard E. Mayer, Paola Rizzo, Erin Shaw and Heather Collins
686
Towards Best Practices for Semantic Web Student Modelling Mike Winter, Christopher Brooks and Jim Greer
694
Critical Thinking Environments for Science Education Beverly Park Woolf, Tom Murray, David Marshall, Toby Dragon, Kevin Kohler, Matt Mattingly, Merle Bruno, Dan Murray and Jim Sammons
702
NavEx: Providing Navigation Support for Adaptive Browsing of Annotated Code Examples Michael Yudelson and Peter Brusilovsky Feedback Micro-engineering in EER-Tutor Konstantin Zakharov, Antonija Mitrovic and Stellan Ohlsson
710 718
Posters An Ontology of Situations, Interactions, Processes and Affordances to Support the Design of Intelligent Learning Environments Fabio N. Akhras
729
Toward Supporting Hypothesis Formation and Testing in an Interpretive Domain Vincent Aleven and Kevin Ashley
732
Authoring Plug-In Tutor Agents by Demonstration: Rapid, Rapid Tutor Development Vincent Aleven and Carolyn Rosé
735
Evaluating Scientific Abstracts with a Genre-Specific Rubric Sandra Aluísio, Ethel Schuster, Valéria Feltrim, Adalberto Pessoa Jr. and Osvaldo Oliveira Jr.
738
Dynamic Authoring in On-Line Adaptive Learning Environments A. Alvarez, I. Fernández-Castro and M. Urretavizcaya
741
Designing Effective Nonverbal Communication for Pedagogical Agents Amy L. Baylor, Soyoung Kim, Chanhee Son and Miyoung Lee
744
Individualized Feedback and Simulation-Based Practice in the Tactical Language Training System: An Experimental Evaluation Carole R. Beal, W. Lewis Johnson, Richard Dabrowski and Shumin Wu Enhancing ITS Instruction with Integrated Assessments of Learner Mood, Motivation and Gender Carole R. Beal, Erin Shaw, Yuan-Chun Chiu, Hyokyeong Lee, Hannes Vilhjalmsson and Lei Qu Exploring Simulations in Science Through the Virtual Lab Research Study: From NASA Kennedy Space Center to High School Classrooms Laura Blasi
747
750
753
Generating Structured Explanations of System Behaviour Using Qualitative Simulations Anders Bouwer and Bert Bredeweg
756
The Bricoles Project: Support Socially Informed Design of Learning Environment Pierre-André Caron, Alain Derycke and Xavier Le Pallec
759
Explainable Artificial Intelligence for Training and Tutoring H. Chad Lane, Mark G. Core, Michael van Lent, Steve Solomon and Dave Gomboc An Agent-Based Framework for Enhancing Helping Behaviors in Human Teamwork Cong Chen, John Yen, Michael Miller, Richard Volz and Wayne Shebilske P3T: A System to Support Preparing and Performing Peer Tutoring Emily Ching, Chih-Ti Chen, Chih-Yueh Chou, Yi-Chan Deng and Tak-Wai Chan Cognitive and Motivational Effects of Animated Pedagogical Agent for Learning English as a Second Language Sunhee Choi and Hyokyeong Lee
762
765 768
771
Added Value of a Task Model and Role of Metacognition in Learning Noor Christoph, Jacobijn Sandberg and Bob Wielinga
774
Introducing Adaptive Assistance in Adaptive Testing Ricardo Conejo, Eduardo Guzmán, José-Luis Pérez-de-la-Cruz and Eva Millán
777
Student Questions in a Classroom Evaluation of the ALPS Learning Environment Albert Corbett, Angela Wagner, Chih-yu Chao, Sharon Lesgold, Scott Stevens and Harry Ulrich
780
Scrutability as a Core Interface Element Marek Czarkowski, Judy Kay and Serena Potts
783
DCE: A One-on-One Digital Classroom Environment Yi-Chan Deng, Sung-Bin Chang, Ben Chang and Tak-Wai Chan
786
Contexts in Educational Topic Maps Christo Dichev and Darina Dicheva
789
Analyzing Computer Mediated and Face-to-Face Interactions: Implications for Active Support Wouter van Diggelen, Maarten Overdijk and Jerry Andriessen
792
Adding a Reflective Layer to a Simulation-Based Learning Environment Douglas Chesher, Judy Kay and Nicholas J.C. King
795
Positive and Negative Verbal Feedback for Intelligent Tutoring Systems Barbara di Eugenio, Xin Lu, Trina C. Kershaw, Andrew Corrigan-Halpern and Stellan Ohlsson
798
Domain-Knowledge Manipulation for Dialogue-Adaptive Hinting Armin Fiedler and Dimitra Tsovaltzi
801
How to Qualitatively + Quantitatively Assess Concepts Maps: The Case of COMPASS Evangelia Gouli, Agoritsa Gogoulou, Kyparisia Papanikolaoy and Maria Grigoriadou
804
Describing Learner Support: An Adaptation of IMS-LD Educational Modelling Language Patricia Gounon, Pascal Leroux and Xavier Dubourg
807
Developing a Bayes-Net Based Student Model for an External Representation Selection Tutor Beate Grawemeyer and Richard Cox
810
Towards Data-Driven Design of a Peer Collaborative Agent Gahgene Gweon, Carolyn Rosé, Regan Carey and Zachary Zaiss
813
Discovery of Patterns in Learner Actions Andreas Harrer, Michael Vetter, Stefan Thür and Jens Brauckmann
816
When do Students Interrupt Help? Effects of Time, Help Type, and Individual Differences Cecily Heiner, Joseph Beck and Jack Mostow
819
Fault-Tolerant Interpretation of Mathematical Formulas in Context Helmut Horacek and Magdalena Wolska
827
Help in Modelling with Visual Languages Kai Herrmann, Ulrich Hoppe and Markus Kuhn
830
Knowledge Extraction and Analysis on Collaborative Interaction Ronghuai Huang and Huanglingzi Liu
833
Enriching Classroom Scenarios with Tagged Objects Marc Jansen, Björn Eisen and Ulrich Hoppe
836
Testing the Effectiveness of the Leopard Tutor Under Experimental Conditions Ray Kemp, Elisabeth Todd and Rosemary Krsinich Setting the Stage for Collaborative Interactions: Exploration of Separate Control of Shared Space Lucinda Kerawalla, Darren Pearce, Jeanette O’Connor, Rosemary Luckin, Nicola Yuill and Amanda Harris
839
842
Computer Simulation as an Instructional Technology in AutoTutor Hyun-Jeong Joyce Kim, Art Graesser, Tanner Jackson, Andrew Olney and Patrick Chipman
845
Developing Teaching Aids for Distance Education Jihie Kim, Carole Beal and Zeeshan Maqbool
848
Who Helps the Helper? A Situated Scaffolding System for Supporting Less Experienced Feedback Givers Duenpen Kochakornjarupong, Paul Brna and Paul Vickers
851
Realizing Adaptive Questions and Answers for ICALL Systems Hidenobu Kunichika, Minoru Urushima, Tsukasa Hirashima and Akira Takeuchi
854
CM-DOM: A Concept Map Based Tool for Supervising Domain Acquisition M. Larrañaga, U. Rueda, M. Kerejeta, J.A. Elorriaga and A. Arruarte
857
Using FAQ as a Case Base for Intelligent Tutoring Demetrius Ribeiro Lima and Marta Costa Rosatelli
860
Alignment-Based Tools for Translation Self-Learning J. Gabriel Pereira Lopes, Tiago Ildefonso and Marcelo S. Pimenta
863
Implementing Analogies Using APE Rules in an Electronic Tutoring System Evelyn Lulis, Reva Freedman and Martha Evens
866
Interoperability Issues in Authoring Interactive Activities Manolis Mavrikis and Charles Hunn
869
An Ontology-Driven Portal for a Collaborative Learning Community J.I. Mayorga, B. Barros, C. Celorrio and M.F. Verdejo
872
A Greedy Knowledge Acquisition Method for the Rapid Prototyping of Bayesian Belief Networks Claus Möbus and Heiko Seebold
875
Automatic Analysis of Questions in e-Learning Environment Mohamed Jemni and Issam Ben Ali
878
Intelligent Pedagogical Action Selection Under Uncertainty Selvarajah Mohanarajah, Ray Kemp and Elizabeth Kemp
881
A Generic Tool to Browse Tutor-Student Interactions: Time Will Tell! Jack Mostow, Joseph Beck, Andrew Cuneo, Evandro Gouvea and Cecily Heiner
884
Effects of Dissuading Unnecessary Help Requests While Providing Proactive Help R. Charles Murray and Kurt VanLehn
887
Breaking the ITS Monolith: A Hybrid Simulation and Tutoring Architecture for ITS William R. Murray
890
A Study on Effective Comprehension Support by Assortment of Multiple Comprehension Support Methods Manabu Nakamura, Yoshiki Kawaguchi, Noriyuki Iwane, Setsuko Otsuki and Yukihiro Matsubara
893
Applications of Data Mining in Constraint-Based Intelligent Tutoring Systems Karthik Nilakant and Antonija Mitrovic
896
Supporting Training on a Robotic Simulator Using a Flexible Path Planner Roger Nkambou, Khaled Belghith, Froduald Kabanza and Mahie Khan
899
The eXtensible Tutor Architecture: A New Foundation for ITS Goss Nuzzo-Jones, Jason A. Walonoski, Neil T. Heffernan and Tom Livak
902
An Agent-Based Approach to Assisting Learners to Dynamically Adjust Learning Processes Weidong Pan EarthTutor: A Multi-Layered Approach to ITS Authoring Kristen Parton, Aaron Bell and Sowmya Ramachandran Using Schema Analysis for Feedback in Authoring Tools for Learning Environments Harrie Passier and Johan Jeuring
905 908
911
The Task Sharing Framework for Collaboration and Meta-Collaboration Darren Pearce, Lucinda Kerawalla, Rose Luckin, Nicola Yuill and Amanda Harris
914
Fostering Learning Communities Based on Task Context Niels Pinkwart
917
MALT - A Multi-Lingual Adaptive Language Tutor Matthias Scheutz, Michael Heilman, Aaron Wenger and Colleen Ryan-Scheutz
920
Teaching the Evolution of Behavior with SuperDuperWalker Lee Spector, Jon Klein, Kyle Harrington and Raymond Coppinger
923
Distributed Intelligent Learning Environment for Screening Mammography Paul Taylor, Joao Campos, Rob Procter, Mark Hartswood, Louise Wilkinson, Elaine Anderson and Lesley Smart
926
The Assistment Builder: A Rapid Development Tool for ITS Terrence E. Turner, Michael A. Macasek, Goss Nuzzo-Jones, Neil T. Heffernan and Ken Koedinger
929
What Did You Do At School Today? Using Tablet Technology to Link Parents to Their Children and Teachers Joshua Underwood, Rosemary Luckin, Lucinda Kerawalla, Benedict du Boulay, Joe Holmberg, Hilary Tunley and Jeanette O’Connor Semantic Description of Collaboration Scripts for Service Oriented CSCL Systems Guillermo Vega-Gorgojo, Miguel L. Bote-Lorenzo, Eduardo Gómez-Sánchez, Yannis A. Dimitriadis and Juan I. Asensio-Pérez What’s in a Rectangle? An Issue for AIED in the Design of Semiotic Learning Tools Erica de Vries A User Modeling Framework for Exploring Creative Problem-Solving Ability Hao-Chuan Wang, Tsai-Yen Li and Chun-Yen Chang Adult Learner Perceptions of Affective Agents: Experimental Data and Phenomenological Observations Daniel Warren, E. Shen, Sanghoon Park, Amy L. Baylor and Roberto Perez
932
935
938 941
944
Factors Influencing Effectiveness in Automated Essay Scoring with LSA Fridolin Wild, Christina Stahl, Gerald Stermsek, Yoseba Penya and Gustaf Neumann
947
Young Researchers Track Argumentation-Based CSCL: How Students Solve Controversy and Relate Argumentative Knowledge Marije van Amelsvoort and Lisette Munneke
953
Generating Reports of Graphical Modelling Processes for Authoring and Presentation Lars Bollen
954
Towards an Intelligent Tool to Foster Collaboration in Distributed Pair Programming Edgar Acosta Chaparro
955
Online Discussion Processes: How Do Earlier Messages Affect Evaluations, Knowledge Contents, Social Cues and Responsiveness of Current Message? Gaowei Chen
956
PECA: Pedagogical Embodied Conversational Agents in Mixed Reality Learning Environments Jayfus T. Doswell
957
Observational Learning from Social Model Agents: Examining the Inherent Processes Suzanne J. Ebbers and Amy L. Baylor
958
An Exploration of a Visual Representation for Interactive Narrative in an Adventure Authoring Tool Seth Goolnik
959
Affective Behavior in Intelligent Tutoring Systems for Virtual Laboratories Yasmín Hernández and Julieta Noguez Taking into Account the Variability of the Knowledge Structure in Bayesian Student Models Mathieu Hibou Subsymbolic User Modeling in Adaptive Hypermedia Katja Hofmann The Effect of Multimedia Design Elements on Learning Outcomes in Pedagogical Agent Research: A Meta-Analysis Soyoung Kim
960
961 962
963
An ITS That Provides Positive Feedback for Beginning Violin Students Orla Lahart
964
A Proposal of Evaluation Framework for Higher Education Xizhi Li and Hao Lin
965
Supporting Collaborative Medical Decision-Making in a Computer-Based Learning Environment Jingyan Lu
966
Logging, Replaying and Analysing Students’ Interactions in a Web-Based ILE to Improve Student Modelling Manolis Mavrikis
967
How do Features of an Intelligent Learning Environment Influence Motivation? A Qualitative Modelling Approach Jutima Methaneethorn
968
Integrating an Affective Framework into Intelligent Tutoring Systems Mohd Zaliman Yusoff
969
Relation-Based Heuristic Diffusion Framework for LOM Generation Olivier Motelet
970
From Representing the Knowledge to Offering Appropriate Remediation – a Road Map for Virtual Learning Process Mehdi Najjar
971
Authoring Ideas for Developing Structural Communication Exercises Robinson V. Noronha
972
An Orientation towards Social Interaction: Implications for Active Support Maarten Overdijk and Wouter van Diggelen
973
Designing Culturally Authentic Pedagogical Agents Yolanda Rankin
975
Incorporation of Learning Objects and Learning Style - Metadata Support for Adaptive Pedagogical Agent Systems Shanghua Sun
976
Enhancing Collaborative Learning Through the Use of a Group Model Based on the Zone of Proximal Development Nilubon Tongchai
977
Tutorial Planning: Adapting Course Generation to Today’s Needs Carsten Ullrich Mutual Peer Tutoring: A Collaborative Addition to the Cognitive Tutor Algebra-1 Erin Walker Enhancing Learning Through a Model of Affect Amali Weerasinghe Understanding the Locus of Modality Effects and How to Effectively Design Multimedia Instructional Materials Jesse S. Zolna
978
979 980
981
Panels Pedagogical Agent Research and Development: Next Steps and Future Possibilities Amy L. Baylor, Ron Cole, Arthur Graesser and W. Lewis Johnson
985
Tutorials Evaluation Methods for Learning Environments Shaaron Ainsworth Rapid Development of Computer-Based Tutors with the Cognitive Tutor Authoring Tools (CTAT) Vincent Aleven, Bruce McLaren and Ken Koedinger
989
990
Some New Perspectives on Learning Companion Research Tak-Wai Chan
991
Education and the Semantic Web Vladan Devedžić
992
Building Intelligent Learning Environments: Bridging Research and Practice Beverly Park Woolf
993
Workshops Student Modeling for Language Tutors Sherman Alpert and Joseph E. Beck International Workshop on Applications of Semantic Web Technologies for E-Learning (SW-EL’05) Lora Aroyo and Darina Dicheva Adaptive Systems for Web-Based Education: Tools and Reusability Peter Brusilovsky, Ricardo Conejo and Eva Millán
997
998 999
Usage Analysis in Learning Systems
1000
Workshop on Educational Games as Intelligent Learning Environments Cristina Conati and Sowmya Ramachandran
1001
Motivation and Affect in Educational Software Cristina Conati, Benedict du Boulay, Claude Frasson, Lewis Johnson, Rosemary Luckin, Erika A. Martinez-Miron, Helen Pain, Kaska Porayska-Pomsta and Genaro Rebolledo-Mendez
1002
Third International Workshop on Authoring of Adaptive and Adaptable Educational Hypermedia Alexandra Cristea, Rosa M. Carro and Franca Garzotto
1004
Learner Modelling for Reflection, to Support Learner Control, Metacognition and Improved Communication Between Teachers and Learners Judy Kay, Andrew Lum and Diego Zapata-Rivera
1005
Author Index
1007
Invited Talks
Learning with Virtual Peers
Justine Cassell
Northwestern University, U.S.A.
Abstract
Schools aren’t the only places people learn, and in the field of educational technology, informal learning is receiving increasing attention. In informal learning, peers are of primary importance. But how do you discover what works in peer learning? If you want to discover what peers do for one another so that you can then set up situations and technologies that maximize peer learning, where do you get your data from? You can study groups of children and hope that informal learning will happen, and hope that you have a large enough sample to witness examples of each kind of peer teaching that you hope to study. Or you can make a peer. Unfortunately, the biological approach takes years, care and feeding is expensive, diary studies are out of fashion, and in any case the human subjects review board frowns on the kind of mind control that would allow one to manipulate the peer so as to provoke different learning reactions. And so, in my own research, I chose to make a bionic peer. In this talk I describe the results from a series of studies where we manipulate a bionic peer to see the effects of various kinds of peer behavior on learning. The peer is sometimes older and sometimes younger than the learners, sometimes the same race and sometimes a different race, sometimes speaking at the same developmental level – and in the same dialect – as the learners, and sometimes differently. In each case we are struck by how much learning occurs when peers play, how learning appears to be potentiated by the rapport between the real and virtual child, and how many lessons we learn about the more general nature of informal learning mediated by technology.
Scaffolding inquiry learning: How much intelligence is needed and by whom?
Ton de Jong
University of Twente, The Netherlands
Abstract
Inquiry learning is a way of learning in which learners act like scientists and discover a domain by employing processes such as hypothesis generation, experiment design, and data interpretation. The sequence of these learning processes and the choice of specific actions (e.g., what experiment to perform) are determined by the learners themselves. This student-centeredness means that inquiry learning calls heavily upon metacognitive processes such as planning and monitoring. Together, these inquiry and metacognitive processes make inquiry learning a demanding task. When inquiry is combined with modelling and collaboration facilities, the complexity of the learning process increases even further. To make inquiry learning successful, the inquiry (and modelling and collaborative) activities need to be scaffolded. Scaffolding can mean that the learning environment is structured or that learners are provided with cognitive tools for specific activities. AI techniques can be used to make scaffolds more adaptive to the learner or to developments in the learning process. In this presentation an overview of (adaptive and non-adaptive) scaffolds for inquiry learning in simulation-based learning environments will be discussed.
Constraint-based tutors: a success story
Tanja Mitrovic
University of Canterbury, Christchurch, New Zealand
Abstract
Constraint-based modelling (CBM) was proposed in 1992 as a way of overcoming the intractable nature of student modelling. Originally, Ohlsson viewed CBM as an approach to developing short-term student models. In this talk, I will illustrate how we have extended CBM to support both short- and long-term models, and developed a methodology for using such models to make various pedagogical decisions. In particular, I will present several successful constraint-based tutors built for various procedural and non-procedural domains. I will illustrate how constraint-based modelling supports learning and metacognitive skills, and present current projects within the Intelligent Computer Tutoring Group.
Interactivity and Learning
Dan Schwartz
Stanford University, U.S.A.
Abstract Two claims for artificial intelligence techniques in education are that they can increase positive interactive experiences for students, and they can enhance learning. Depending on one’s preferences, the critical question might be “how do we configure interactive opportunities to optimize learning?” Alternatively, the question might be, “how do we configure learning opportunities to optimize positive interactions?” Ideally, the answers to these two questions are compatible so that desirable interactions and learning outcomes are positively correlated. But, this does not have to be the case – interactions that people deem negative might lead to learning that people deem positive, or vice versa. The question for this talk is whether there is a “sweet spot” where interactions and learning complement one another and the values we hold most important. I will offer a pair of frameworks to address this question: one for characterizing learning by the dimensions of innovation and efficiency; and one for characterizing interactivity by the dimensions of initiative and idea incorporation. I will provide empirical examples of students working with intelligent computer technologies to show how desirable outcomes in both frameworks can be correlated.
Full Papers
Evaluating a Mixed-Initiative Authoring Environment: Is REDEEM for Real?
Shaaron AINSWORTH and Piers FLEMING
School of Psychology and Learning Sciences Research Institute, University of Nottingham
Email: {sea/pff}@psychology.nottingham.ac.uk

Abstract. The REDEEM authoring tool allows teachers to create adapted learning environments for their students from existing material. Previous evaluations have shown that under experimental conditions REDEEM can significantly improve learning. The goals of this study were twofold: to explore whether REDEEM could improve students’ learning in real-world situations, and to examine whether learners can share in the authoring decisions. REDEEM was used to create 10 courses from existing lectures that taught undergraduate statistics. An experimenter performed the content authoring and then created student categories and tutorial strategies that learners chose for themselves. All first-year psychology students were offered the opportunity to learn with REDEEM: 90 used REDEEM at least once but 77 did not. Students also completed a pre-test and 3 attitude questionnaires, and their final exam was used as a post-test. Learning with REDEEM was associated with significantly better exam scores, and this remains true even when attempting to control for increased effort or ability of REDEEM users. Students explored a variety of categories and strategies, rating their option to choose these as moderately important. Consequently, whilst there is no direct evidence that allowing students this control enhanced performance, it seems likely that it increased uptake of the system.
1. Introduction
The REDEEM authoring tool was designed to allow teachers significant control over the learning environments with which their students learn. To achieve this goal, the authoring process and the resulting learning environments have both been simplified when compared to more conventional authoring tools. REDEEM uses canned content but delivers it in ways that teachers feel are appropriate to their learners. Specifically, material can be selected for different learners and presented in alternative sequences with different exercises and problems, and authors can create tutorial strategies that vary such factors as help, frequency and position of tests, and degree of student control. This approach, focussing on adapted learning environments rather than adaptive learning environments, has been evaluated with respect to both the authors’ and learners’ experiences (see [1] for a review). Overall, REDEEM was found to be usable by authors with little technological experience and time-efficient for the conversion of existing computer-based training (CBT) into REDEEM learning environments (around 5 hours per hour of instruction). Five experimental studies have contrasted learning with REDEEM to learning with the original CBT in a variety of domains (e.g. Genetics, Computing, Radio Communication) and with a wide range of learners (schoolchildren, adults, students). REDEEM led to an average 30% improvement from pre-test to post-test, whereas CBT increased scores by 23%. This advantage for REDEEM translates into an average effect size of .51, which compares well to non-expert human individual tutors and is around .5 below full-blown ITSs (e.g. [2,3]).
To perform three of these experiments, teachers were recruited who had in-depth knowledge of the topic and the students in this class. They used this knowledge to assign different student categories which resulted in different content and tutorial strategies. In the other two experiments, this was not possible and all the participants were assigned to one category and strategy. But, it may have been more appropriate to let students choose their own approach to studying the material. This question can be set in the wider context of authoring tools research, namely for any given aspect of the learning environment, who should be making these decisions – should it be a teacher, should it be the system or can some of the authoring decisions be presented to learners in such a way that they can make these decisions for themselves. Whilst, there has been some debate in the literature about how much control to give the author versus the system [4], the issue of how much of the authoring could be performed by learners themselves has received little direct attention. Of course, the general issue of how much control to give students over aspects of their learning has been part of a long and often contentious debate (e.g. [5, 6]). There are claims for enhanced motivation [7] but mixed evidence for the effectiveness of learner control. However, in the context under consideration (1st year University students), there was no teacher available who could make these decisions based upon personal knowledge of the student. Consequently, to take advantage of REDEEM’s ability to offer adapted learning environments, the only sensible route was to allow learners to make these decisions for themselves. As a result, a mixed initiative version of REDEEM was designed that kept the same model of content and interactivity authoring as before, but now gave students the choice of learner category (from predefined categories) and teaching strategy (also predefined). Thus the aim of this approach is not to turn learners into authors as [8] but instead to renegotiate the roles of learners and authors. A second goal for this research was to explore the effectiveness of REDEEM over extended periods, outside the context of an experiment. One positive aspect of AIED in recent years has been the increase in number of evaluations conducted in realistic contexts (e.g. [3, 9]). However, given the complex issues involved in running an experiment, the norm for evaluation (including the previous REDEEM studies) is that they are conducted in experimental situations with limited curriculum over a short duration and post-tests tend to be on the specific content of the tutor. To show that interacting with a learning environment improves performance when used as part of everyday experience is still far from common (another exception is ANDES [10] whose research goal is to explore if minimally invasive tutoring can improve learning in real world situations). Yet, it is this test that may convince sceptics about the value of ITSs and interactive learning environments. However, assessing if REDEEM improves learning ‘for real’ is far from easy as it was difficult to predict how many students would chose to use REDEEM or whether we would be able to account for explanations based upon differential use of REDEEM by different types of learners. 2.
Brief System Description
REDEEM consists of three components: a courseware catalogue of material created externally to REDEEM, an ITS Shell and a set of authoring tools (please see [1] for a fuller description of the components and the authoring process). REDEEM’s authoring tools decompose the teaching process into a number of separate components. Essentially, authors are asked to add interactivity to the underlying courseware (by adding questions, hints, answer feedback and reflection points); they then describe the structure of the material, create student categories and create teaching strategies. This information is then combined by assigning particular teaching strategies and types of material to different learner groups. The difference with this latest version is that the students themselves select one of the learner categories, and this now results in a default teaching strategy, which they can change
to any other strategies that are available. This design is a trade-off between giving students significant choice yet only requiring a minimum of interaction to utilise this functionality. The courseware consisted of ten PowerPoint lectures saved as html. These were then imported into REDEEM by an experimenter, who, in addition to describing the structure of the material, added approximately one question per page with an average of three hints per question, an explanation of the correct answer, and reflection points. Four learner categories were created: non-confident learner (NCL), confident learner (CL), non-confident reviser (NCR) and confident reviser (CR). Four default teaching strategies were created (Table 1) based upon ones teachers had authored in previous studies [11]. In addition, four optional strategies were devised that provided contrasting experiences, such as using it in ‘exam style’ or in ‘pre-test’ mode (test me after the course, before each section or before the course).

Table 1. Teaching Strategies
• Simple Introduction (default: NCL) – No student control of material or questions; easy/medium questions (max one per page), 2 attempts per question, help available. Questions after page.
• Guided Practice (default: CL) – No student control of material/questions; easy/medium questions (max one per page), 5 attempts per question, help is available. Questions after section.
• Guided Discovery (default: NCR) – Choice of order of sections but not questions. 5 attempts per question, help only on error. Questions after section.
• Free Discovery (default: CR) – Choice of order of sections and questions. 5 attempts per question, help available.
• Just Browsing – Complete student control of material. No questions.
• Test me after the course – No student control of material or questions. All questions at the end, 1 attempt per question, no help.
• Test me before each section – Choose order of sections. Questions are given before each section. 5 attempts per question and help available on error.
• Test me before the course – Student control of sections. All questions at the start. 5 attempts per question. Help is available.

3. Method
3.1. Design and Participants This study employed a quasi-experimental design as students decided for themselves whether to learn with the REDEEMed lectures. All 215 first-year Psychology students (33 males and 182 females) had previously studied a prerequisite statistics course, which was assessed in the same exam as this course, but for which no REDEEM support had been available. 167 students completed both the pre-test and post-test. 3.2. Materials Pre and post-tests were multiple-choice, in which each question had one correct and three incorrect answers. A pre-test was created which consisted of 12 multi-choice questions addressing material taught only in the first semester. Questions were selected from an existing pool of exam questions but were not completely representative as they required no calculation (the pre-test was carried out without guaranteed access to calculators). The 100 question multi-choice two hour exam was used as a post-test. These questions were a mix of factual and calculation questions. All students are required to pass this exam before continuing their studies. The experimenters were blind to this exam. A number of questionnaires were given over the course of the semester to assess students’ attitudes to studying, computers, statistics and the perceived value of REDEEM.
• A general questionnaire asked students to report on their computer use and confidence, the amount of time spent studying statistics and the desire for further support.
• An attitude to statistics questionnaire assessed statistics confidence, motivation, knowledge, skill and perceived difficulty on a five-point Likert scale.
• A REDEEM usage questionnaire asked students to report on how much they used REDEEM, to compare it to other study techniques and to rank the importance of various system features (e.g. questions, having a choice of teaching strategy).
3.3. Procedure
• All first year students received traditional statistics teaching for Semester One (ten lectures) from September to December 2003.
• Early in the second semester, during their laboratory classes, students were introduced to REDEEM and instructed in its use. They were informed that data files logging their interactions with the system would be generated and related to their exam performance, but that data would not be passed to statistics lecturers in a way that could identify individuals. During these lessons, students were also given the pre-test and a questionnaire about their use of computers and perceptions of statistics.
• As the second semester progressed, REDEEMed lectures were made available on the School of Psychology intranet after the relevant lecture was given.
• Students logged into REDEEM, chose a lecture and a learner category. Students were free to override the default strategy and change to one of seven others at any time.
• At the end of the lecture course (the tenth lecture) another questionnaire was given to reassess the students’ perceptions of statistics and REDEEM.
• Finally, two and a half weeks after the last lecture, all of the students had to complete a statistics exam as part of their course requirements.

4. Results
This study generated a vast amount of data and this paper focuses on a fundamental question, namely whether using REDEEM could be shown to impact upon learning. In order to answer this question a number of preliminary analyses needed to be carried out and criteria set, the most important being what counted as using REDEEM to study a lecture. After examining the raw data, it was concluded that a fair criterion was to say that students were considered to have studied a lecture with REDEEM if they had completed 70% of the questions for that lecture. The range of strategies allowed very different patterns of interactions, so questions answered was chosen because many students only accessed the practice questions without choosing to review the material and only one student looked at more than three pages without answering a question. Note, this criterion excludes the just browsing strategy, but this was almost never used and was no one’s preferred strategy. A second important preliminary analysis was to relate the 100 item exam to individual lectures. This was relatively simple given the relationship between the exam structure and learning objectives set by the lecturers. 42 questions were judged as assessing Semester 1 performance and so these questions provided a score on the exam that was unaffected by REDEEM. The Semester 2 questions were categorised according to the lecture in which the correct answer was covered. The 12 questions that addressed material taught in both semesters were not analysed further.
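As a rough illustration (not taken from the paper), the 70% usage criterion described above can be written as a simple check; the function and argument names here are hypothetical.

```python
# Hypothetical check of the 'studied this lecture with REDEEM' criterion described above.
def studied_with_redeem(questions_answered, questions_in_lecture, threshold=0.70):
    """True if the student completed at least 70% of the questions for a lecture."""
    return questions_answered >= threshold * questions_in_lecture

print(studied_with_redeem(questions_answered=15, questions_in_lecture=20))  # True
print(studied_with_redeem(questions_answered=9,  questions_in_lecture=20))  # False
```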
4.1. Relationship between REDEEM Use and Learning Outcomes

Table 2. Scores of REDEEM v non-REDEEM users (mean %, SD in parentheses)
                                     Pre-test        Semester 1 Post-test   Semester 2 Post-test
REDEEM at least once (N = 90)        50.64 (15.96)   69.00 (12.08)          58.09 (13.03)
Never used REDEEM (N = 77)           49.24 (14.06)   67.32 (10.35)          53.44 (14.43)
The first analysis compared the scores of students who had never used REDEEM to those who had studied at least one lesson with REDEEM (Table 2). A [2 by 1] MANOVA on the pre-test, Semester 1 and Semester 2 scores revealed no difference for pre-test and Semester 1, but found that the REDEEM users scored higher on Semester 2 (F(1,167) = 4.78, p < .05).
[Table fragment from another chapter (surrounding pages not included here): learners’ selections among pedagogical agents, by question (“Who would you most want to be like?”, “Who is similar to who you see yourself as?”, “Who most looks like an engineer?”, “Who looks least like an engineer?”, “Who would you like to learn from about engineering?”) and by agent attribute (Male/Female, Young/Old, Attractive/Unattractive, Cool/Uncool, and % selecting a representative agent); *, ** and *** indicate statistical significance.]
SRL variable      Low/Low  Low/Medium  Low/High  Medium/Medium  Medium/High  High/High  Negative Shift
FOK > Median        26        12          25           6             —            6           16
FOK < Median        45        11           9           7             —            2           25
INF > Median        30        13          24           8             —            3           13
INF < Median        41        10          10           6             —            5           28
KE > Median         15        11          16           3             —            3            9
KE < Median         56        12          18          11             —            5           32
PKA > Median        35        12          19           5             —            6           13
PKA < Median        36        11          15           9             —            2           28
RR > Median         43        12          11           6             —            4           24
RR < Median         28        11          23           8             —            4           17
SUM > Median        33        10          20           8             —            2           16
SUM < Median        38        13          14           6             —            6           25
4. Implications for the Design of Hypermedia This research can inform the design and use of HLEs with complex science topics such as the circulatory system. The tremendous opportunities afforded to educators through the use of HLEs will only come to fruition if these learning environments are built to scaffold higher-order learning behaviors. This study points to the importance of creating HLEs that are clear in their presentation, lest unnecessary re-reading of the material take time away from higher order student cognition and learning. On the other hand, it would seem that higher order cognitive strategies, such as summarization and knowledge elaboration, are more likely to lead to the types of qualitative mental model shifts that are essential for true understanding. HLEs could scaffold these strategies by providing prompts and examples of these behaviors. Likewise, students should be encouraged to monitor their understanding both through the activation of prior knowledge and through checking their learning through FOK. HLEs should also prompt such behaviors, perhaps through asking thought questions at the beginning of the section, and presenting mini-quizzes when students proceed to the end of that section. Truly adaptive HLEs would use student trace logs to adaptively approximate students’ dynamically changing mental models, providing the necessary feedback to put struggling students back on track while helping successful students achieve new heights [18, 19]. The practical applications of this research lie in the design of HLEs that both decrease the need for lower-level SRL behaviors such as re-reading and increase
the use of higher-order ones such as FOK. More research remains to be done, however, on how these higher-order SRL behaviors can be prompted and taught during student use of HLEs. Future research should focus on the best means of inculcating effective SRL behaviors through on-line methods, so that HLEs can teach both content and the actual process of learning.
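As a rough, hypothetical sketch of the kind of trace-log-driven scaffolding suggested above (the thresholds, variable names, and prompt texts are assumptions, not a design from this study), an HLE might trigger prompts like this:

```python
# Illustrative prompting rule based on trace logs: discourage excessive re-reading
# and encourage higher-order SRL behaviors such as summarization and FOK checks.
def choose_prompt(rereads, summaries, fok_checks):
    """Return a scaffold prompt given simple per-section trace counts (all hypothetical)."""
    if rereads >= 3 and summaries == 0:
        return "Try writing a one-sentence summary of this section before re-reading it."
    if fok_checks == 0:
        return "Before moving on, rate how well you could explain this section to a friend."
    return None

print(choose_prompt(rereads=4, summaries=0, fok_checks=1))
print(choose_prompt(rereads=1, summaries=1, fok_checks=0))
```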
5. Acknowledgements This research was supported by funding from the National Science Foundation (REC#0133346) awarded to the second author. The authors would like to thank Jennifer Cromley, Fielding Winters, Daniel Moos, and Jessica Vick for assistance with data collection.
6. References [1] Azevedo, R., & Cromley, J.G. (2004). Does training on self-regulated learning facilitate students' learning with hypermedia? Journal of Educational Psychology, 96(3), 523-535. [2] Azevedo, R., Cromley, J.G., & Seibert, D. (2004). Does adaptive scaffolding facilitate students’ ability to regulate their learning with hypermedia? Contemporary Educational Psychology, 29, 344-370. [3] Shapiro, A., & Niederhauser, D. (2004). Learning from hypertext: Research issues and findings. In D. H. Jonassen (Ed.). Handbook of Research for Education Communications and Technology (2nd ed). Mahwah, NJ: Lawrence Erlbaum Associates. [4] Azevedo, R., Winters, F.I., & Moos, D.C. (in press). Can students collaboratively use hypermedia to learn about science? The dynamics of self- and other-regulatory processes in an ecology classroom. Journal of Educational Computing Research. [5] Chi, M. T.H., Siler, S., Jeong, H., Yamauchi, T., & Hausmann, R. (2001). Learning from human tutoring. Cognitive Science, 25, 471-534. [6] Jacobson, M., & Kozma, R. (2000). Innovations in science and mathematics education: Adavnced designs for technologies of learning. Mawah, NJ: Erlbaum. [7] Lajoie, S.P., & Azevedo, R. (in press). Teaching and learning in technology-rich environments. In P. Alexander, P. Winne, & G. Phye (Eds.), Handbook of educational psychology (2nd ed.). Mahwah, NJ: Erlbaum. [8] Chi, M. T.H., de Leeuw, N., Chiu, M.-H., & LaVancher, C. (1994). Eliciting self-explanations improves understanding. Cognitive Science, 18, 439-477. [9] Chi, M. T.H., Siler, S., & Jeong, H. (2004). Can tutors monitor students’ understanding accurately? Cognition and Instruction, 22, 363-387. [10] Kozma, R., Chin, E., Russell, J., & Marx, N. (2000). The roles of representations and tools in the chemistry laboratory and their implications for chemistry learning. Journal of the Learning Sciences, 9(2), 105-144. [11] Jacobson, M., & Archodidou, A. (2000). The design of hypermedia tools for learning: Fostering conceptual change and transfer of complex scientific knowledge. Journal of the Learning Sciences, 9(2), 149-199. [12] Shapiro, A. (2000). The effect of interactive overviews on the development of conceptual structure in novices learning from hypermedia. Journal of Interactive Multimedia and Hypermedia, 9(1), 57-78. [13] Azevedo, R., Guthrie, J.T., & Seibert, D. (2004). The role of self-regulated learning in fostering students’ conceptual understanding of complex systems with hypermedia. Journal of Educational Computing Research, 30(1), 87-111. [14] Pintrich, P.R. (2000). The role of goal orientation in self-regulated learning. In M. Boekaerts, P. Pintrich, & M. Zeidner (Eds.), Handbook of self-regulation (pp. 451-502). San Diego, CA: Academic Press. [15] Winne, P.H., & Perry, N.E. (2000). Measuring self-regulated learning. In M. Boekaerts, P. Pintrich, & M. Zeidner (Eds.), Handbook of self-regulation (pp. 531-566). San Diego, CA: Academic Press. [16] Winne, P.H. (2001). Self-regulated learning viewed from models of information processing. In B. Zimmerman & D. Schunk (Eds.), Self-regulated learning and academic achievement: Theoretical perspectives (pp. 153-189). Mawah, NJ: Erlbaum. [17] Zimmerman, B. (2000). Attaining self-regulation: A social cognitive perspective. In M. Boekaerts, P. Pintrich, & M. Zeidner (Eds.), Handbook of self-regulation (pp. 13-39). San Diego, CA: Academic Press. [18] Brusilovsky, P. (2001). Adaptive hypermedia. User Modeling and User-Adapted Interaction, 11, 87-110. [19] Brusilovsky, P. (2004). 
Adaptive navigation support in educational hypermedia: The role of student knowledge level and the case for meta-adaptation. British Journal of Educational Technology, 34(4), 487-497.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Teaching about Dynamic Processes: A Teachable Agents Approach Ruchi Gupta, Yanna Wu, and Gautam Biswas Dept. of EECS and ISIS, Vanderbilt University, Nashville, TN 37235, USA ruchi.gupta, yanna.wu,
[email protected] Abstract. This paper discusses the extensions that we have made to Betty’s Brain teachable agent system to help students learn about dynamic processes in a river ecosystem. Students first learn about dynamic behavior in a simulation environment, and then teach Betty by introducing cycles into the concept map representation. Betty’s qualitative reasoning mechanisms have been extended so that she can reason about cycles and determine how entities change over time. Preliminary experiments were conducted to study and analyze the usefulness of the simulation. Analysis of the students’ protocols was very revealing, and the lessons learnt have led to redesigned simulation interfaces. A new study with the system will be conducted in a fifth grade science classroom in May, 2005. . 1. Introduction
Modern society is engulfed by technology artifacts that impact every aspect of daily life. This makes learning and problem solving with advanced math and science concepts important components of the K-12 curriculum. Many of the teaching and testing methods in present day school systems focus on memorization and not on true understanding of domain material [1]. Lack of systematic efforts to demonstrate the students’ problem solving skills hamper the students’ abilities to retain what they have learned, and to develop the competencies required for advanced science and technology training in the future. A solution proposed by researchers is to introduce constructivist and exploratory learning methods to help students take control of their own learning and overcome the problems of inert learning and learning without understanding [1]. The cognitive science and education literature has shown that teaching others is a powerful way to learn [23]. Preparing to teach others helps one gain a deeper understanding of the subject matter. While teaching, feedback from students provides the teacher with an opportunity to reflect on his or her own understanding of the material [4]. We have adopted the learning by teaching paradigm to develop an intelligent learning environment called Betty’s Brain, where students teach Betty, a software agent, using a concept map representation [5]. Experiments conducted with Betty’s Brain in fifth grade science classrooms demonstrated that the system is successful in helping students learn about river ecosystem entities and their relationships [6]. Students showed improved motivation and put in extra effort to understand the domain material. Transfer tests showed that they were better prepared for “future learning” [7]. In the current version of the system, Betty’s representation and reasoning mechanisms are geared towards teaching and learning about interdependence in river ecosystems. However, analysis of student answers to post-test questions on balance (equilibrium) made it clear that students did not quite grasp this concept and how it applied to river ecosystems.
We realized that to understand balance, students had to be introduced to the dynamic behavior of river ecosystems. This brought up two challenges. First, how do we extend students’ understanding of interdependence to the notion of balance, and second, how should we extend the representation and reasoning mechanisms in Betty’s Brain to help middle school students learn about and understand the behavior of dynamic processes? Analyzing dynamic systems behavior can be very challenging for middle school students who do not have the relevant mathematical background or maturity. To overcome this, we introduced the notion of cycles in the concept map representation to model changes that happen over time. To scaffold the process of learning about temporal effects, we designed a simulation that provides a virtual window into a river ecosystem in an engaging and easy to grasp manner. This brings up another challenge, i.e., how do we get students to transfer their understanding of the dynamics observed in the simulation to the concept map representation, where changes over time are captured as cyclic structures. This paper discusses the extensions made to the concept map representation and the reasoning mechanisms that allow Betty to reason with time. A protocol analysis study with high school students pointed out a number of features that we needed to add to the simulation interfaces to help students understand dynamic behaviors. The redesigned simulation interfaces will be used for a study in a middle school science classroom in May 2005.

2. Betty’s Brain: Implementation of the Learning by Teaching Paradigm

Betty’s Brain is based on the learning by teaching paradigm. Students explicitly teach and receive feedback about how well they have taught Betty. Betty uses a combination of text, speech, and animation to communicate with her student teachers. The teaching process is implemented through three primary modes of interaction between the student and Betty: teach, quiz, query. Fig. 1 illustrates the Betty’s Brain system interface. In the teach mode, students teach Betty by constructing a concept map using an intuitive graphical point and click interface. In the query mode, students use a template to generate questions about the concepts they have taught her. Betty uses a qualitative reasoning mechanism to reason with the concept map, and, when asked, she provides a detailed explanation of her answers [5]. In the quiz phase, students can observe how Betty performs on a pre-scripted set of questions. This feedback helps the students estimate how well they have taught Betty, which in turn helps them reflect on how well they have learnt the information themselves. Details of the system architecture and its implementation are discussed elsewhere [5,8,9]. The system, implemented as a multi-agent architecture, includes a number of scaffolds to help fifth grade students in science classrooms. These include extensive searchable online resources on river ecosystems and a mentor agent, Mr. Davis, who not only provides feedback to Betty and the student but also provides advice, when asked, on how to be better learners and better teachers (Figure 1: Betty’s Brain interfaces). Experimental studies in fifth grade classrooms have demonstrated the success of Betty’s Brain in students’ preparation for future learning, in general, and learning about river ecosystems, in particular [5,6].
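To make the qualitative reasoning idea concrete, here is a minimal sketch of sign propagation over a concept map. It is an illustration under assumed link signs and a two-valued change scale, not the actual Betty’s Brain mechanism (which also distinguishes magnitudes such as “a lot” and combines multiple paths).

```python
# Hypothetical sketch: propagate a qualitative change through a concept map.
# A link (source, target, sign) means the target moves with ('+') or against ('-') the source.
from collections import deque

LINKS = [
    ("fish", "dissolved oxygen", "-"),              # illustrative links, not the expert map
    ("dissolved oxygen", "macroinvertebrates", "+"),
    ("macroinvertebrates", "algae", "-"),
    ("algae", "dissolved oxygen", "+"),
]

def propagate(source, direction):
    """Return the qualitative change ('increase'/'decrease') of each reachable concept."""
    flip = {"increase": "decrease", "decrease": "increase"}
    result = {source: direction}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for s, t, sign in LINKS:
            if s == node and t not in result:           # visit each concept once
                result[t] = result[node] if sign == "+" else flip[result[node]]
                queue.append(t)
    return result

print(propagate("fish", "increase"))
```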
3. Extending Betty’s Brain: Introducing the temporal reasoning framework One of our primary goals is to help students extend their understanding of interdependence among entities in an ecosystem to the dynamic nature of the interactions between these entities, so that they may reason about and solve problems in real world processes. Middle school students lack the knowledge and maturity to learn about mathematical modeling and analysis approaches for dynamic systems using differential equations. As an alternative, we have to develop intuitive approaches based on simplified, qualitative representations [10, 11] that capture the notion of change over time, hide complex details, but are still accurate enough to replicate the behavior of a real world ecosystem. Even experts use qualitative representations to develop quick, coarse-grained solutions to problems, and explanations for how these solutions are derived [14]. Researchers have used this approach to help students develop high level reasoning skills that are linked to mathematical methods [11]. In this work, we build on the existing qualitative reasoning mechanisms in Betty’s Brain to incorporate temporal representations and reasoning. To avoid confusion and cognitive overload, these new additions have to be seamless extensions of the previous representation and reasoning mechanisms. Also, to accommodate our novice student learners, it is important to provide them with scaffolds to aid their understanding of dynamic system behavior. In the learning by teaching framework, the student teachers are given opportunities to learn and understand the material to be taught before they proceed to teach Betty. To help students in their preparations to teach, we have designed and implemented a simulation of a river ecosystem. In the rest of this section, we describe the simulation system, and the extensions to Betty’s qualitative reasoning mechanism.

3.1. The Simulation In constructivist approaches to learning, students are encouraged to direct and structure their own learning activities to pursue their knowledge building goals [12]. To facilitate this approach to learning, we provide the students with an exploratory simulation environment, where they are exposed to a number of situations that make them aware of the dynamic phenomena that occur in river ecosystems. The simulation includes a variety of visual tools that the students can use to observe how entities change over time, and how these changes interact to produce cycles of behavior in the ecosystem.

3.1.1 The mathematical model and simulator The interacting entities in a river ecosystem are typically modeled as differential equation or discrete-time state space models. Our river ecosystem simulation is based on a discrete-time model that takes the form x(t+1) = f(x(t), u(t)), where x(t+1), the state vector at time step t+1, is defined as a function of the state of the system, x(t), and the input to the system, u(t), at time step t. We create a one-to-one mapping between the state variables in the simulation and the entities in the river ecosystem expert concept map that are created by the fifth grade science teachers. This concept map includes the following entities: fish, algae, plants, macroinvertebrates, bacteria, oxygen, waste, dead organisms, and nutrients. The quantity of each of these entities is represented by a state variable, and a typical state equation takes on the following form:

O2(t+1) = O2(t) + 0.001125·P(t) − 0.006·F(t) − 0.001·M(t) + 0.00075·A(t) − 0.0004·B(t)

This linear equation describes the change in the amount of dissolved oxygen, O2, from one time step to the next for the ecosystem in balance. O2(t), P(t), F(t), M(t), A(t), and B(t) represent the quantity of dissolved oxygen, plants, fish, macroinvertebrates, algae, and bacteria, respectively, in appropriate units at time step t. The coefficients in the equation represent the strength of interactions between pairs of entities. For example, the coefficient for F(t) is greater than the coefficient for M(t) because fish consume more oxygen than macroinvertebrates.
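A minimal sketch of how one such state equation could be stepped in code, using the coefficients quoted above; the starting values, units, and the decision to hold the other entities fixed are illustrative assumptions rather than the authors’ simulator.

```python
# Sketch of one step of the dissolved-oxygen state equation quoted in the text.
# Only O2 is updated here; the other state variables are held fixed for brevity.
def step_oxygen(state):
    """Return the next dissolved-oxygen value given the current state dict."""
    return (state["O2"]
            + 0.001125 * state["plants"]
            - 0.006    * state["fish"]
            - 0.001    * state["macroinvertebrates"]
            + 0.00075  * state["algae"]
            - 0.0004   * state["bacteria"])

state = {"O2": 6.0, "plants": 4000, "fish": 50,              # illustrative starting values
         "macroinvertebrates": 200, "algae": 3000, "bacteria": 500}
for day in range(5):
    state["O2"] = step_oxygen(state)
print(round(state["O2"], 3))
```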
Producers of oxygen, plants and algae, have positive coefficients and consumers, fish, macroinvertebrates, and bacteria, have negative coefficients in the above equation. The state equations would have been much more complex, with steep nonlinearities, if we had included phenomena where the river did not remain in balance. Instead, we employ a hybrid modeling approach, and switch the equations when the entities exceed predefined critical values. For example, if the amounts of dissolved oxygen and plants fall below a certain value, they have a strong negative effect on the quantity of fish in the river. This phenomenon is captured by the following equation:

If O2(t) < 3 (ppm) and P(t) < 3500 (micro-mg/L): F(t+1) = F(t) − ((6 − O2(t)) / 300)·F(t) − ((4000 − P(t)) / 50000)·F(t)

Therefore, our state equation-based simulation model captures the behavior of a river ecosystem under different operating conditions that include the behavior of the ecosystem in balance and out of balance. The simulation model was implemented using AgentSheets [13], which is a software tool designed to facilitate the creation of interactive simulations using a multi agent framework. This tool was chosen primarily because it provides an easy way to construct appealing visual interfaces. Its user friendly drag and drop interface made it easy to implement the simulation model. Each entity was modeled as an agent with the appropriate set of equations describing its behavior at every time step. 3.1.2 The visual interface Fig. 2 illustrates the visual interface of the simulation system. It has two components. The first uses an animation to provide a virtual window into the ecosystem. Its purpose is to give the student an easy to understand global view of the state of the system. The second component uses graphs to give a more precise look at the amount of the different entities and how these amounts change with time (Figure 2: The simulation interface). The student can use these graphs to not only determine the amounts, but also study patterns of change. Further, since the cyclic behavior of the variables was clearly visible in these plots, we believed that students could use these graphs to learn about cycle times, and teach Betty this information in the concept map representation. 3.1.3 Ranger Joe Ranger Joe plays the role of the mentor in the simulation environment. He provides help on a variety of topics that range from textual descriptions of the simulation scenarios, to telling students how to run the simulation, and how to read the graphs. When asked, he makes students aware of the features available in the simulation environment, and how students may use them to learn more about dynamic changes in the river. The current version of Ranger Joe provides responses in text form only. 3.2. Extending Betty’s reasoning mechanisms to incorporate temporal reasoning As discussed earlier, we have extended the concept map representation in Betty’s Brain to include cyclic structures. Any path (chain of events) that begins on a concept and comes back to the same concept can be called a cycle. For example, the concepts macroinvertebrates, fish, and dissolved oxygen form a cycle in the concept map illustrated in Fig. 3. Unlike the previous version of Betty’s Brain, where the reasoning process only occurred along the paths from the source to the destination concept (identified in the query), e.g., “If
fish increase what happens to bacteria?”, the new system also takes into account the changes that occur along feedback paths from the destination to the source concept. For example, a change in the amount of bacteria above may cause a change in the amount of fish along the feedback path, which would further cause a change in bacteria along the forward path and so on. This creates a cycle of change and the time it takes to complete an iteration of the cycle is called the cycle time. The query mechanism had to be extended so Betty could answer questions that involved change over time, e.g., “If algae decrease a lot, what will happen to bacteria after one month?” Last, Betty’s reasoning and explanation mechanisms were extended. Each of these is described below. 3.2.1. Concept Map Building and Query Interfaces We extended the concept map interface to allow students to teach Betty about dynamic processes by constructing a concept map with cycles (see Fig. 3). To help Betty identify a cycle in the concept map, students click on the “Teach Cycle” button, which brings up a pop up window with the same name. Students identify the cycle, using any one of the nodes as the starting point, e.g., crowded algae in cycle 2 (Fig. 3) then identify the other concepts in the cycle in sequence, e.g., dead algae, then bacteria, and then nutrients. Along with each cycle, the student also has to teach Betty the time (in days) it takes to complete an iteration of the cycle. Betty responds by identifying the cycle with a number. Fig. 3 shows the concept map after the student has built two cycles identified by Betty as cycles 1 and 2 with cycle times of 5 and 10 days, respectively. Like before, students can query Betty. The original query templates were extended as shown in Fig. 3 to include a time component. 3.2.2. Temporal Reasoning Algorithm and Explanation Process The extended temporal reasoning algorithm that Betty uses has four primary steps. In step 1, Betty identifies all the forward and feedback paths between the source and destination concepts in the query. For the query, “If algae decrease a lot, what will happen to bacteria after one month?” Betty identifies algae as the source concept and bacteria as the destination concept. A forward path is a path from the source to the destination concept (e.g., algae Æ crowded algae Æ dead algae Æ bacteria) and the feedback path traces back from
Figure 3: Betty’s Brain: Temporal Reasoning Interface; (top-right): temporal question template; (bottom-right): interface for teaching Betty about cycles
the destination to the source concept (e.g., bacteria → dissolved oxygen → macroinvertebrates → algae). In step 2, using the original reasoning process [5], all the concepts on these paths are given an initial value. In step 3, Betty orders the cycles from slowest to fastest, and executes the propagation of the chain of events for each cycle. When a path includes more than one cycle, the faster cycle is run multiple times, and then its effects are integrated with the chain of events propagation in the slower cycle. This method incorporates the time-scale abstraction process developed by Kuipers [14]. This process is repeated for the feedback path, and the result gives the updated values for the source and destination concepts after one full cycle. In step 4, this process is repeated multiple times until the value of the destination concept has been derived for the time period stated in the query. For example, when asked the query about algae and bacteria, Betty first identifies the forward and feedback paths shown earlier, and propagates the change of algae to the concepts on the forward path and then to the concepts on the feedback path using the original reasoning mechanism. She determines that crowded algae, dead algae and bacteria decrease a lot on the forward path, and dissolved oxygen and macroinvertebrates increase a lot. In step 2, she identifies two cycles (cycles 1 and 2 in Fig. 3), one on the forward path, and the second on the feedback path. Since cycle 2 has the larger cycle time, she assigns the main cycle a period of 10 days. After that, she runs the reasoning process twice (10/5) for cycle 1 and determines that macroinvertebrates and fish increase a lot and dissolved oxygen decreases a lot. Cycle 2 is run once (10/10) to derive that crowded algae, dead algae, and nutrients decrease a lot. Betty then combines the effects of cycles 1 and 2 to determine the value for algae after 10 days (feedback effect), i.e., algae decrease a lot, and, as a result, bacteria decrease a lot (this completes one cycle, i.e., a 10 day period of behavior). Since the student wanted to know what happens to bacteria after one month, this process has to be repeated three times, and Betty arrives at the answer that bacteria decrease a lot. To facilitate students’ understanding of the temporal reasoning mechanisms, Betty uses a top-down explanation process, if asked to explain her answer. First, Betty explicates her final answer, and states how many full cycles she had to run to get this answer. Then Betty breaks down the rest of the explanation cycle by cycle, and then combines the results. Students can control what parts of the explanation and how much detail they want, by simply clicking on “Continue Explanation,” “Repeat,” and “Skip” buttons at the bottom left of the interface. 4.0 Protocol Analysis Studies with the Temporal Betty We conducted a preliminary protocol analysis study with 10 high school students. None of these students knew or remembered much about the river ecosystems unit they had covered in middle school. The overall goal for each student was to teach Betty about the dynamic processes in river ecosystems by first teaching her about general concepts of the ecosystem by drawing a concept map and then refining the map by identifying cycles and teaching her timing information. One of our goals was to see how they would use the simulation tool to derive information about the structure and time period of cycles.
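As a rough sketch of the time-scale handling outlined in Section 3.2.2 (not the authors’ implementation), the arithmetic behind “run the faster cycle 10/5 times, the slower once, and repeat for the queried period” can be written as follows; cycle names and times are taken from the worked example, everything else is illustrative.

```python
# Hypothetical time-scale abstraction arithmetic for cycles (cf. Section 3.2.2).
def runs_per_period(cycle_times_in_days):
    """Within one period of the slowest cycle, how many iterations each cycle gets."""
    period = max(cycle_times_in_days.values())
    return period, {name: period // t for name, t in cycle_times_in_days.items()}

def periods_needed(query_days, period):
    """How many full periods are needed to answer 'what happens after N days?'."""
    return -(-query_days // period)          # ceiling division

period, runs = runs_per_period({"cycle 1": 5, "cycle 2": 10})
print(period, runs)                # 10  {'cycle 1': 2, 'cycle 2': 1}
print(periods_needed(30, period))  # 3 repetitions for a one-month query
```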
Each student worked with a research assistant (who conducted the study) on the Betty’s Brain system for two one hour sessions. As students worked, the research assistants involved them in a dialog, in which they asked the students to interpret what they saw in the simulation, and how that information may be used to teach Betty using the concept map structure. All verbal interactions between the student and the researcher was taped, and later transcribed and analyzed. An overview of the results is presented next. Overall, all students liked the simulation and felt that it was a good tool for learning about river ecosystems. Also, they thought that the river animation was engaging and served the purpose of holding the student’s attention. The researchers asked specific ques-
tions that focused on students’ understanding of graphs, cycles and cycle times. An example dialog that was quite revealing is presented below. Researcher: So do you think the graphs were helpful in helping you think about the temporal cycles? Student: They were critical because that’s where I got my initial impression because ordinarily when someone gives you something to read, it’s really a small amount of text and doesn’t clarify much. So the graphs are the main source of information.
Also, some of the dialogues indicated that the graphs were put to good use in learning about cycle times. For example, a student, who was trying to find the cycle time involving fish and macro invertebrates said: Researcher: Are you trying to assign the time period of the cycle? Student: Yeah, see how the cycle kind of completes the whole graph in about 2 days.
A second example: Researcher: What is hard about using the graphs? Student: Well, I see the graph; I see the sine wave and the period of the sine wave, right here, right? Researcher: Right. Student: So I would think of that as completing the cycle.
Students also made some important suggestions about the graphs. Many of them mentioned that it would be better to have multiple quantities plotted on the same graph. Some of them said that it would be useful to have quantities plotted against each other rather than plotted against time so that relationships between such quantities could be observed directly. Others said that simply showing numbers of changing quantities over time would be useful too. We also had some feedback about the resources and feedback that Ranger Joe provided. The students found the text resources to be useful but thought there was too much to read, so it would be a good idea to reorganize the text into sections and make it searchable. They also thought that Ranger Joe was passive, and that he should be an active participant in the learning process. Most students stressed the importance of being able to easily navigate between different graphs and see them side by side for easy comparisons. These protocols provided valuable feedback on the effectiveness of the different features of the simulation. We realized some of the features would have to be modified, and extra features had to be implemented. These changes could not be implemented in AgentSheets. This motivated us to redesign and reimplement the simulation in a flexible programming environment like Java to facilitate the addition of new tools and easy integration of the simulation and Ranger Joe with the temporal Betty system. 5.0 The Redesigned Simulation System Different representations enhance different aspects of thinking and problem solving skills. In the new simulation, we present the state of the river ecosystem using a number of different representations that are more relevant to their problem-solving tasks. In this version of the simulation, we provide the students with a new set of tools which exploits the use of representations as a critical tool of thought. We also hope that this will help students develop representational fluency, which is an important attribute to have while attempting to solve complex real world problems. The tools for the presentation and analysis of the information in the graphs have been revamped. Students can now choose the graphs they want to view from a pull-down menu. They can choose between line graphs and bar graphs. The unit of time for the bar graph plots can be a day (unit of time in the simulation), or a week (typically the frequency with which experimental data is collected in rivers). A second feature introduced is a compare graph tool that allows the student to plot multiple quantities in the same graph to get a better idea of the interrelationships between the entities. The students can also view the simulation data in tabular form. A third tool will help students analyze the temporal change in the quantities in a more abstract qualitative way. Changing trends are depicted by upward facing arrows (increase in the quantity) and downward facing arrows (decrease in the quan-
tity). This representation provides information that is closer to what students need to generate the concept map. The text resources have been restructured and reorganized in a hypertext form. They contain a detailed description of how to use the different tools in the simulation and how to use and interpret graphs. A keyword search features helps students to easily find the specific information they are looking for. The mentor agent, Ranger Joe, plays a more active role in this new environment. He can address specific questions that the student might have, and gives feedback that is tailored to the students’ current activities. 5.0 Discussion and Future Work Our upcoming study with middle school students starting in May, 2005 will focus on evaluating the usefulness of the system (temporal Betty + the simulation) in teaching about dynamic processes in a river ecosystem. In particular, we want to find how easy it is for students to understand the notion of timing and cycles and also how well they can learn to translate timing information in the simulation into the concept map framework. Also, we want to study the various graph representations in terms of their general usefulness, their frequency of use, and their success in helping students learn about the dynamic nature of ecosystem processes. Acknowledgements: This project is supported by NSF REC grant # 0231771. References 1. Bransford, J.D., A.L. Brown, and R. R. Cocking (2001). How People Learn: Brain, Mind,
Experience and School. 2. Palinscar, A. S. & Brown, A. L. (1984). Reciprocal teaching of comprehension-fostering
and comprehension -monitoring activities. Cognition and instruction, 1: 117-175. 3. Bargh, J. A., & Schul, Y. (1980). On the cognitive benefits of teaching. Journal of Educa-
tional Psychology, 72(5), 593-604 4. Webb, N. M. (1983). Predicting learning from student interaction: Defining the interaction
variables. Educational Psychologist, 18, 33-41. 5. Biswas, G., D. Schwartz, K. Leelawong, N. Vye, and TAG-V (2005). “Learning by Teach-
ing: A New Agent Paradigm for Educational Software,” Applied Artificial Intelligence, special issue on Educational Agents, 19(3): 363-392. 6. Biswas, G., Leelawong, K., Belynne, K., et al. (2004). Incorporating Self Regulated Learning Techniques into Learning by Teaching Environments. 26th Annual Meeting of the Cognitive Science Society, (Chicago, Illinois, 120-125. 7. Schwartz, D. L. and Martin, T. (2004). Inventing to prepare for learning: The hidden efficiency of original student production in statistics instruction. Cognition & Instruction, 22: 129-184. 8. Biswas, G., Leelawong, K., Belynne, K., et al. (2004). Developing Learning by Teaching Environments that support Self-Regulated Learning. in The seventh International Conference on Intelligent Tutoring Systems, Maceió, Brazil, 730-740. 9. Leelawong, K., K. Viswanath, J. Davis, G. Biswas, N. J. Vye, K. Belynne and J. B. Bransford (2003). Teachable Agents: Learning by Teaching Environments for Science Domains. The Fifteenth Annual Conference on Innovative Applications of Artificial Intelligence, Acapulco, Mexico, 109-116. 10. Bredeweg, B., Struss, P. (2003). Current Topics in Qualitative Reasoning (editorial introduction). AI Magazine, 24( 4), 13-16. 11. Bredeweg, B., Forbus, K. (2003). Qualitative Modeling in Education. AI Magazine, 24(4). 35-46. 12. Harel, I., and Papert, S. (1991). Constructionism. Norwood, NJ: Ablex. 13. Repenning, A. and Ioannidou (2004). Agent-Based End-User Development. Communications of the ACM, 47(9), 43-46. 14. Kuipers, B. (1986). Qualitative Simulation, Artificial Intelligence, 29: 289-388.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Exam Question Recommender System
Hicham HAGE, Esma AÏMEUR Department of Computer Science and Operational Research, University of Montreal {hagehich, aimeur}@iro.umontreal.ca
Abstract. Although E-learning has advanced considerably in the last decade, some of its aspects, such as E-testing, are still in the development phase. Authoring tools and test banks for E-tests are becoming an integral and indispensable part of E-learning platforms, and with the implementation of E-learning standards such as IMS QTI, E-testing material can be easily shared and reused across various platforms. With this extensive E-testing material and knowledge comes a new challenge: searching for and selecting the most adequate information. In this paper we propose using recommendation techniques to help a teacher search for and select questions from a shared and centralized IMS QTI-compliant question bank. Our solution, the Exam Question Recommender System, uses a hybrid feature-augmentation recommendation approach. The recommender system uses Content-Based and Knowledge-Based recommendation techniques, resorting to the use of a new heuristic function. The system also engages in collecting both implicit and explicit feedback from the user in order to improve on future recommendations.
Keywords. E-learning, E-testing, Assessment tool, E-learning Standards, IMS QTI, Hybrid Recommendation
1 Introduction

E-learning has advanced considerably in the last [?] years. Today, there exist many E-learning platforms, commercial (WebCT [?], Blackboard [?]) or open source (ATutor [?]), which offer many tools and functionalities [?]. Some of these tools are aimed towards teachers and developers, and other tools are aimed towards students and learners [?]. Although E-learning has come a long way, some of its aspects are still in their early stages. One such aspect is E-testing. While existing E-learning platforms do offer E-testing authoring tools, most offer only basic E-testing functionalities [?], [?], which are limited to the platform itself. With the emergence of E-learning standards and specifications such as the IMS QTI [?] (IMS Question and Test Interoperability), E-learning material can be reusable, accessible, interoperable and durable. With E-learning standards, E-testing material can be transferred from one platform to another. Furthermore, some E-learning platforms are starting to offer the functionality of Test Banks. This feature allows teachers and developers to save their questions and exams in the Test Bank for future access and use. To the best of our knowledge, E-learning platforms’ Test Banks are limited to the teacher’s private use, where each teacher can only access his personal private questions and tests. Therefore, in order to share available E-testing knowledge, teachers must do so explicitly by using import/export functionalities offered only by some platforms. Consequently, due to the limitations in knowledge sharing, the size of the Test Banks remains relatively small; thus E-learning platforms only offer basic filters to search for information within the Test Bank. In order to encourage knowledge sharing and reuse, we are currently in the works of implementing a web-based assessment authoring tool called Cadmus. Cadmus offers an IMS QTI-compliant centralized questions-and-exams repository for teachers to store and share E-testing knowledge and resources. For such a repository to be beneficial, it must contain extensive information on questions and exams. The bigger and more useful the repository becomes, the more dreadful is the task to search for and retrieve necessary information and material. Although there exist tools to help teachers locate learning material [?], [?], to our knowledge there aren’t personalized tools to help the teacher select exam material from a shared data bank. What we propose is to incorporate into Cadmus an Exam Question Recommender System to help teachers find and select questions for exams. The recommender uses a hybrid feature-augmentation recommendation approach. The first level is a Content-Based filter and the second level is a Knowledge-Based filter [?], [?]. In order to recommend questions, the Knowledge-Based filter resorts to a heuristic function. Furthermore, the Exam Question Recommender System gathers implicit and explicit feedback [?] from the user in order to improve future recommendations. The paper is organized as follows: section 2 introduces E-learning, E-testing and offers an overview of E-learning standards, in particular IMS QTI; section 3 presents current recommendation techniques; section 4 describes the architecture and approach of the Exam Question Recommender System; section 5 highlights the testing procedure and the results; and section 6 concludes the paper and presents the future works.

2 E-learning

E-learning can be defined with the following statement: the delivery and support of educational and training material using computers. E-learning is an aspect of distant learning where teaching material is accessed through electronic media (internet, intranet, CD-ROM, ...) and where teachers and students can communicate electronically (e-mail, chat rooms). E-learning is very convenient and portable. Furthermore, E-learning involves great collaboration and interaction between students and tutors or specialists. Such collaboration is made easier by the online environment. For example, a student in Canada can have access to a specialist in Europe or Asia through e-mail, or can assist in the specialist’s lecture through a web conference. There are four parts in the life cycle of E-learning [?]: Skill Analysis, Material Development, Learning Activity and Evaluation/Assessment.

2.1 E-testing

There exist many E-learning platforms, such as Blackboard, WebCT and ATutor, that offer different functionalities [?]. Although Evaluation and Assessment is an important part of the E-learning life cycle, E-testing remains in its early development stages. Most E-learning platforms do offer E-testing authoring tools, most of which offer only basic testing functionalities and are limited to the platform itself. For instance, most E-learning platforms offer support for basic question types such as Multiple Choice, True/False and Open Ended Questions, but do not offer the possibility of adding multimedia content (images, sounds, ...), of setting a time frame for the exam, or even include import functionalities to add questions from external sources [?]. In order to deliver E-learning material, each E-learning platform chooses different delivery media, a different platform/operating system and its own unique authoring tools, and stores the information in its own format. Therefore, in order to reuse E-learning material developed on a specific platform, one must change that material considerably or recreate it using the target platform authoring tools, hence increasing the cost of development of E-learning material. Standards and specifications help simplify the development, use and reuse of E-learning material.

2.2 IMS Question and Test Interoperability

As stated in the ADL (Advanced Distributed Learning) goals [?], standards and specifications ensure that E-learning material is Reusable (modified easily and usable on different development tools), Accessible (available as needed by learners or course developers), Interoperable (functional across different hardware or software platforms), and Durable (easy to modify and update for new software versions). Currently, there are many organizations developing different standards for E-learning [?], each promoting its own standards. Some of the leading organizations with the most widely accepted standards are the IEEE Learning Technology Standards Committee [?], the ADL Initiative (Advanced Distributed Learning) [?] and the IMS Project (Instructional Management System) [?]. IMS QTI sets a list of specifications used to exchange assessment information such as questions, tests and results. QTI allows assessment systems to store their data in their own format, and provides a means to import and export that data in the QTI format between various assessment systems. With the emergence and use of E-learning standards, learning and testing material can be reused and shared among various E-learning platforms [?]. Knowledge sharing would lead to a quick increase in the available information and material, leading to the need for recommendation systems to help filter the required data.

3 Recommender System

Recommender systems offer the user an automated recommendation from a large information space [?]. There exist many recommendation techniques, differentiated upon the basis of the knowledge sources used to make a recommendation. Several recommendation techniques are identified in [?], including Collaborative Recommendation (the recommender system accumulates user ratings of items, identifies users with common ratings and offers recommendations based on inter-user comparison), Content-Based Recommendation (the recommender system uses the features of the items and the user’s interest in these features to make a recommendation), and Knowledge-Based Recommendation (the recommender system bases the recommendation of items on
inferences about the user’s preferences and needs). Each recommendation technique has its advantages and limitations, thus the use of hybrid systems that combine multiple techniques to produce the recommendation. There exist several techniques of hybridization [?], [?], such as Switching (the recommender system switches between several techniques, depending on the situation, to produce the recommendation), Cascade (the recommender system uses one technique to generate a recommendation and a second technique to break any ties), and Feature Augmentation (the recommender system uses one technique to generate an output which in turn is used as input to a second recommendation technique). Our Exam Question Recommendation System uses a hybrid feature-augmentation approach using Content-Based and Knowledge-Based recommendation.

4 Exam Questions Recommendation System Architecture

Cadmus is an E-testing platform that offers teachers an extensive question library. The more comprehensive Cadmus’s question library is, the harder the task to search for and select questions. The first suggestion that comes to mind is to filter questions according to their content and the needs of the teacher. A Content-Based filter will help, but might not be enough. For instance, there might be between [?] and [?] questions in the library that satisfy the content requirement, but not all will be rated the same by different teachers with different preferences: a teacher might prefer “multiple choice” to “true and false”, or might prefer questions with a certain level of difficulty. What we propose is a feature augmentation hybrid recommendation approach, where the first level is a Content-Based filter and the second level a Knowledge-Based filter. The Content-Based filter will reduce the search to questions with content pertinent to the teacher’s needs, and the Knowledge-Based filter will sort these questions with regards to the teacher’s preferences, such that the higher ranking questions are the most likely to be chosen by the teacher. Figure 1 illustrates the architecture of the recommender system. We can distinguish two different types of components: Storage components (Question Base and User Profile) and Process components (Content-Based Filter, Knowledge-Based Filter and Feedback).

4.1 Question Base

The Question Base stores all the questions created by the teachers. The actual question is stored in an external XML file following the IMS QTI specifications, and the database contains the following information about the question:
• Ident: unique question identifier
• Title: contains the title of the question
• Language: corresponds to the language of the question, i.e. English, French, ...
• Topic: denotes the topic of the question, i.e. Computer Science, History, ...
• Subject: specifies the subject within the topic, i.e. Databases, Data Structures, ...
• Type: denotes the type of question, i.e. multiple choice, true/false, ...
• Difficulty: specifies the difficulty level of the question according to five possible values: Very Easy, Easy, Intermediate, Difficult and Very Difficult
• Keywords: contains keywords relevant to the question’s content
• Objective: corresponds to the pedagogical objective of the question (Concept Definition, Concept Application, Concept Generalization and Concept Mastery)
• Occurrence: a counter of the number of exams this question appears in
• Author: the author of the question
• Availability: designates whether the question is available only to the author, to other teachers, or to anyone
• QTI Question: handle to the IMS QTI-compliant XML file where the question and all of the relevant information, such as answers, comments and hints, are stored
4.2 User Profile

The User Profile stores information and data about the teacher that are used by the Knowledge-Based filter. The user profile contains the following:
- Login: unique identifier of the user.
- Type Weight: selected by the user for the type criteria.
- Occurrence Weight: specified by the user for the occurrence criteria.
- Difficulty Weight: chosen by the user for the difficulty criteria.
- Author Weight: specified by the user for the author criteria.
- Individual Type Weights: system-calculated weight for each different question type (i.e. a weight for True/False, for Multiple Selection, ...).
- Individual Occurrences Weights: system-calculated weight for each different question occurrence (i.e. Very Low, Average, High, ...).
- Individual Difficulties Weights: system-calculated weight for each different question difficulty (i.e. a weight for Easy, for Difficult, ...).
- Individual Authors Weights: system-calculated weight for each author.
Figure [?]: System Architecture. (Components shown in the figure: the User Interface with Search Criteria & Criteria Weights, the Content-Based Filter, the Question Base, the Knowledge-Based Filter, the User Profile and the Feedback component; labelled flows: Add/Edit, Retrieve, Recommend, Gather Feedback, Update Profile, Retrieve User Profile, Candidate Questions.)
The teacher-specified Type, Occurrence, Difficulty and Author weights are set manually by the teacher. These weights represent his criteria preference, i.e. which of the four independent criteria is more important for him. The teacher can select one out of five different values, each assigned a numerical value (Table [?]) that is used in the distance function explained in Section 4.4.1. The system-calculated weights infer the teacher's preferences for the various values each criterion might have. For example, the Type criterion might have one of three different values: True/False (TF), Multiple Choice (MC) or Multiple Selection (MS); thus the system will calculate three different weights, wTF, wMC and wMS. The system keeps track of a counter for each individual weight (i.e. a counter for True/False, a counter for Multiple Selection, ...) and a counter for the total number of questions selected thus far by the teacher. Each time the teacher selects a new question, the counter for the total number of questions is incremented and the corresponding individual weight is incremented accordingly, i.e. if the question is a True/False, then the True/False counter is incremented and wTF = Counter(True/False) / Total number of questions. The value of the individual weights is the percentage of usage, so that if the user selected [?] questions, out of which [?] were TF, [?] were MC and [?] were MS, then wTF = [?], wMC = [?], wMS = [?], and wTF > wMC > wMS.

Table [?]: Weights Values
Weight: Lowest | Low | Normal | High | Highest
Value:  [?]    | [?] | [?]    | [?]  | [?]
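The update of the system-calculated weights just described can be illustrated with a small sketch; the class and method names are ours, not Cadmus's.

from collections import defaultdict

class CriterionProfile:
    # Tracks the system-calculated individual weights for one criterion (e.g. Type).
    def __init__(self):
        self.counters = defaultdict(int)   # e.g. {"True/False": 6, "Multiple Choice": 3}
        self.total = 0                     # questions selected by this teacher so far

    def record_selection(self, value):
        # Implicit feedback: the teacher added a question whose criterion value is `value`.
        self.counters[value] += 1
        self.total += 1

    def weight(self, value):
        # Individual weight = percentage of usage, e.g. wTF = Counter(True/False) / total.
        return self.counters[value] / self.total if self.total else 0.0

With 10 selected questions of which 6 are True/False, weight("True/False") returns 0.6, in line with the percentage-of-usage rule stated above (the illustrative numbers are ours).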
4.3 Content-Based Filter

When, for the purpose of creating a new exam, the teacher wants to search for questions, he must specify the search criteria for the questions (Figure [?]). The search criteria are used by the Content-Based Filter and consist of the following: Language; Topic; Subject; the option of whether or not to include questions that are publicly available to students; Objective; Type; Type Weight (used by the teacher to specify how important this criterion is to him compared with the other criteria); Difficulty; Difficulty Weight; Occurrence; Occurrence Weight; Keywords (only the questions with one or more of the specified keywords are retrieved; if left blank, the questions' keywords are ignored in the search); Author (only the questions of the specified authors are retrieved); and Author Weight.
Figure [?]: Question Search
The teacher must first select the language and the topic for the question, and has the option to restrict the search to a specific subject within the selected topic. Since some questions may be available to students, the teacher has the option to include or omit these questions from the search. Furthermore, the teacher may restrict the search to a certain question objective, question type, question occurrence and question difficulty. Moreover, the teacher can narrow the search to questions from one or more authors, and can refine his search further by specifying one or more keywords that are relevant to the question's content. Finally, the teacher can specify the weight, or the importance, of specific criteria; this weight is used by the Knowledge-Based filter. When the user initiates the search, the recommender system will start by collecting the search criteria and weights. Then the search criteria are constructed into an SQL query that is passed to the database. The result of the query is a collection of candidate questions whose content is relevant to the teacher's search. The candidate questions and the criteria weights are then used as the input to the Knowledge-Based filter.

4.4 Knowledge-Based Filter

The Knowledge-Based Filter takes as input the candidate questions and the criteria weights. The criteria weight is specified by the teacher and represents the importance of this specific criterion to the user compared to the other criteria. Table [?] presents the possible values of the criteria weight and the respective numerical values. The Knowledge-Based filter
retrieves the teacher's profile from the User Profile repository, and uses the distance function to calculate the distance between each of the candidate questions and the teacher's preferences.

4.4.1 Distance Function

In order to decide which question the teacher will prefer the most, we need to compare several criteria that are unrelated. For instance, how can someone compare the Type of a question with the number of times it appears in exams (the Occurrence)? Since we cannot correlate the different criteria, we left this decision to the teacher: he must select the criteria weight. This weight must either reinforce or undermine the value of the criterion. The Knowledge-Based recommender uses a heuristic Distance Function (Equation [?]) to calculate the distance between a question and the teacher's preferences.
s = Σi (Wi × wi)

Equation [?]: Distance Function
The distance function is the sum of the products of two weights, W and w, where W is the weight specified by the teacher for the criterion and w is the weight calculated by the recommender system. The multiplication by W will either reinforce or undermine the weight of the criterion. Consider the following example to illustrate the distance function: in the search performed in Figure [?], the teacher set WType = High, WDifficulty = Low, WOccurrence = Lowest and WAuthor = Highest (values illustrated in Table [?]). Table [?] illustrates the values of two different questions, and Table [?] illustrates the individual weights retrieved from the teacher's profile (the table contains only a part of the actual profile, reflecting the data pertinent to the example).

Table [?]: Question Values
              Type            | Difficulty | Occurrence | Author
Question Q1:  True/False      | Easy       | High       | Brazchri
Question Q2:  Multiple Choice | Easy       | Low        | Brazchri

Table [?]: Teacher's Profile Values
Criteria:  Type                        | Difficulty | Occurrence | Author
Value:     True/False, Multiple Choice | Easy       | High, Low  | Brazchri
Weight:    [?], [?]                    | [?]        | [?], [?]   | [?]
Calculating the distance function for both questions gives:
s(Q1) = WType × wTrue/False + WDifficulty × wEasy + WOccurrence × wHigh + WAuthor × wHala
s(Q2) = WType × wMultipleChoice + WDifficulty × wEasy + WOccurrence × wLow + WAuthor × wHala
s(Q1) = [?] and s(Q2) = [?]
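A minimal sketch of this ranking step is given below; only the W × w structure comes from the equation above, while the numerical weight values and helper names are assumptions on our part.

CRITERIA = ("Type", "Difficulty", "Occurrence", "Author")

# teacher_weights: numeric value per criterion taken from the Weights Values table,
# e.g. {"Type": 4, "Difficulty": 2, "Occurrence": 1, "Author": 5} (numbers assumed).
# profile: system-calculated individual weights, e.g. profile["Type"]["True/False"] = 0.6.

def distance(question, teacher_weights, profile):
    # s = sum over the criteria of W_criterion * w_(question's value on that criterion)
    return sum(teacher_weights[c] * profile[c].get(question[c], 0.0) for c in CRITERIA)

def rank(candidates, teacher_weights, profile):
    # Candidates come from the Content-Based filter; a larger s means the question is
    # closer to the teacher's preferences, so it is shown first.
    return sorted(candidates, key=lambda q: distance(q, teacher_weights, profile), reverse=True)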
Although there exists a big difference between the Occurrence weights in favor of Q[?], Q[?] will rank higher because the teacher deemed the Type criterion more important than the Occurrence criterion.

4.5 Feedback

The Exam Question Recommender System first retrieves candidate questions using the Content-Based filter, then ranks the candidate questions using the Knowledge-Based filter, and finally displays the questions for the teacher to select from. The teacher can then select and add the desired questions to the exam. At this stage, the exam creation and its effect on
the questions and teacher's profile is only simulated: no actual exam is created. The Exam Question Recommender System gathers the feedback from the teacher in two manners: Explicit and Implicit. Explicit feedback is gathered when the teacher manually changes the criteria weights, and his profile is updated with the newly selected weight. Implicit feedback is gathered when the teacher selects and adds questions to the exam. Information such as the question type, difficulty, occurrence and author is gathered to update the system-calculated individual weights in the teacher's profile, as highlighted in Section 4.2.

5 Testing and Results

The purpose of the Exam Question Recommender System is to simplify the task of searching for and selecting questions for exams. The aim of the testing is to determine the performance of the recommendation in helping the teacher select questions. To test the recommender system, we used a database containing about [?] Java questions. The system has a total of [?] different authors/users. For each recommendation and selection, the system recorded the following: Teacher's Name, Date, Search Number, Questions Recommended, Questions Selected and Rank. The date and the search number enable us to track the performance and quality of the recommendation as the user makes more choices and his profile develops. The rank of the selected questions is an indication of the accuracy of the Knowledge-Based Filter: the higher the rank of the selected questions, the more accurate the recommendation of the Knowledge-Based filter.

5.1 Results

The preliminary results are very encouraging, and we are still undergoing further testing. There were [?] registered users (teachers, teacher's assistants and graduate students) testing the system, for a total of [?] recommendations and [?] questions selected and added to exams (some questions were selected more than once). On average, [?] questions were recommended after each search. Figure [?] illustrates the Ranking Partition of the selected questions. Almost [?] of the selected questions were among the top ten recommended questions. Figure [?] illustrates the rank partitioning of the questions selected among the top [?]. We notice that the first-ranking question is the most selected, while the top five ranked questions constitute about [?] of the selected questions within the top ten ranked by the recommender system. On an average of [?] questions proposed with each search, almost [?] of the selected questions were within the first ten questions recommended by the Exam Question Recommender System, and almost [?] were within the first [?] recommended questions. Thus far, we can conclude that in [?] of the cases the teacher did not need to browse farther than [?] questions, thereby making it easier for the teacher to search for the required questions for his exam.
References
[?] Burke, R., "Hybrid Recommender Systems: Survey and Experiments", User Modeling and User-Adapted Interaction, Vol. [?], No. [?], pp. [?].
[?] Breadely, K. and Smyth, B., "An Architecture for Case-Based Personalized Search", Advances in Case-Based Reasoning, [?]th European Conference (ECCBR [?]), pp. [?], Madrid.
[?] Cabena, P., Hadjinian, P., Stadler, R., Verhees, J. and Znasi, A., "Discovering Data Mining: From Concept To Implementation", Upper Saddle River, NJ: Prentice Hall.
[?] Gaudiosi, E., Boticario, J., "Towards web-based adaptive learning community", International Conference on Artificial Intelligence in Education (AIED), pp. [?], Sydney.
[?] Miller, B., Konstan, J. and Riedl, J., "PocketLens: Toward a Personal Recommender System", ACM Transactions on Information Systems, Vol. [?], No. [?], pp. [?].
[?] Mohan, P., Greer, J., "E-learning Specification in the context of Instructional Planning", International Conference on Artificial Intelligence in Education (AIED), pp. [?], Sydney.
[?] Tang, T., McCalla, G., "Smart Recommendation for an Evolving E-Learning System", International Conference on Artificial Intelligence in Education (AIED), pp. [?], Sydney.
[?] Walker, A., Recker, M., Lawless, K. & Wiley, D., "Collaborative information filtering: A review and an educational application", International Journal of Artificial Intelligence and Education, Vol. [?], pp. [?].
[?] http://blackboard.com
[?] http://www.webct.com
[?] http://ltsc.ieee.org
[?] http://www.adlnet.org
[?] http://www.imsproject.org
[?] http://workshops.eduworks.com/standards
[?] http://www.edutools.info/index.jsp
[?] http://www.asia-elearning.net/content/aboutEL/index.html
[?] http://www.marshall.edu/it/cit/webct/compare/comparison.html#develop
[?] http://www.atutor.ca
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
DIANE, a diagnosis system for arithmetical problem solving Khider Hakem*, Emmanuel Sander*, Jean-Marc Labat**, Jean-François Richard* * Cognition & Usages, 2 Rue de la Liberté, 93526 Saint Denis Cedex 02, France
[email protected],
[email protected],
[email protected] ** UTES , Université Pierre et Marie Curie, 12, rue Cuvier 75270 Paris cedex 05, France
[email protected] Abstract. We hereby describe DIANE, an environment that aims at performing an automatic diagnosis on arithmetic problems based on the productions of the learners. This work relies on results from cognitive psychology studies that insist on the fact that problem solving depends to a great extent on the construction of an adequate representation of the problem, which is highly constrained. DIANE allows large-scale experimentations and has the specificity of providing diagnosis at a very detailed level of precision, whether it concerns adequate or erroneous strategies, allowing one to analyze the cognitive mechanisms involved in the solving process. The quality of the diagnosis module has been assessed and, concerning non-verbal cues, 93.4% of the protocols were diagnosed in the same way as with manual analysis. Key Words: cognitive diagnosis, arithmetical problem solving, models of learners.
Introduction DIANE (French acronym for Computerized Diagnosis on Arithmetic at Elementary School) is part of a project named « conceptualization and semantic properties of situations in arithmetical problem solving » [12]; it is articulated around the idea that traditional approaches in terms of typologies, schemas or situation models, the relevance of which remains undisputable, do not account for some of the determinants of problem difficulties: transverse semantic dimensions, which rely on the nature of the variables or the entities involved independently of an actual problem schema, influence problem interpretation, and consequently, influence also solving strategies, learning and transfer between problems. The identification of these dimensions relies on studying isomorphic problems as well as on an accurate analysis of the strategies used by the pupils, whether they lead to a correct result or not. We believe that fundamental insight in understanding learning processes and modeling learners may be gained through studying a “relevant” micro domain in a detailed manner. Thus, even if our target is to enlarge in the long run the scope of exercises treated by DIANE, the range covered is not so crucial for us compared to the choice of the micro domain and the precision of the analysis. We consider as well that a data analysis at a procedural level is a prerequisite to more epistemic analyses: the automatic generation of a protocol analysis is a level of diagnostic that seems crucial to us and which is the one implemented in DIANE right now. It makes possible to test at a fine level hypotheses regarding problem solving and learning mechanisms with straightforward educational implications. Having introduced our theoretical background that stresses the importance of interpretive aspects and transverse semantic dimensions in arithmetical problem solving, we will then present the kind of problems we are working with, describe DIANE in more details and provide some results of experiments of cognitive psychology that we conducted.
1. Toward a semantic account of arithmetical problem solving 1.1 From schemas to mental models The 80’s were the golden age for the experimental works and the theories concerning arithmetical problem solving. The previously prevalent conception was that solving a story problem consisted mainly in identifying the accurate procedure and applying it to the accurate data from the problem. This conception evolved towards stressing the importance of the conceptual dimensions involved. Riley, Greeno, & Heller [10] established a typology of one-step additive problems, differentiating combination problems, comparison problems and transformation problems. Kinstch & Greeno [7] have developed a formal model for solving transformation problems relying on problem schemas. Later on, the emphasis on interpretive aspects in problem solving has led to the notion of the mental model of the problem introduced by Reusser [9], which is an intermediate step between reading the text of the problem and searching for a solution. This view made it possible to explain the role of some semantic aspects which were out of the scope of Kinstch & Greeno’s [7] model; for instance, Hudson [6] showed that in a comparison problem, where a set of birds and a set of worms are presented together, the question How many birds will not get a worm ? is easier to answer than the more traditional form How many more birds are there than worms ?, and many studies have shown that a lot of mistakes are due to misinterpretations [4]. Thus, these researches emphasized the importance of two aspects: conceptual structure and interpretive aspects, which have to be described more precisely. Informative results come from works on analogical transfer. 1.2 Influence of semantic dimensions More recently, work on analogical transfer showed that semantic features have a major role in problem solving process. Positive spontaneous transfer is usually observed when both semantic and structural features are common [1]. When the problems are similar in their surface features but dissimilar in their structure, the transfer is equally high but negative [11], [8]. Some studies have explicitly studied the role of semantic aspects and attributed the differences between some isomorphic problem solving strategies to the way the situations are encoded [2]. Several possibilities exist for coding the objects of the situation and a source of error is the use of an inappropriate coding, partially compatible with the relevant one [13]. Within the framework of arithmetic problems, our claim is that the variables involved in the problem are an essential factor that is transverse to problem schemas or problem types. We propose that the different types of quantities used in arithmetic problems do not behave in a similar way. Certain variables call for some specific operations. Quantities such as weights, prices, and numbers of elements may be easily added, because we are used to situations where these quantities are accumulated to give a unique quantity. In this kind of situations, the salient dimension of these variables is the cardinal one. Conversely, dates, ages, durations are not so easy to add: although a given value of age may be added to a duration to provide a new value of age; in this case, the quantities which are added are not of the same type. On the other hand, temporal or spatial quantities are more suited to comparison and call for the operation of subtraction, which measures the
difference in a comparison. In this kind of situations, the salient dimension of these variables is the ordinal one. We want to describe in a more precise way the semantic differences between isomorphic problems by characterizing their influence. For this purpose, it seems necessary to study problem solving mechanism at a detailed level which makes it possible to identify not only the performance but the solving process itself and to characterize the effect of the interpretive aspects induced by the semantic dimensions. Thus, we constructed a structure of problems from which we manipulated the semantic features.
2. A set of structured exercises and their solving models Several constraints were applied in order to choose the exercises. (i) Concerning the conceptual structure, the part-whole dimension is a fundamental issue in additive problem solving; it appears as being a prerequisite in order for children to solve additive word problems efficiently [14]; thus our problems are focused on a part-whole structure. (ii) We looked for problems that could be described in an isomorphic manner through a change of some semantic dimensions. We decided to manipulate the variables involved. (iii) We looked for a variety of problems, more precisely problems that would allow the measure of the influence of the variable on the combination/comparison dimension. Hence, we built combination problems as well as comparison problems. (iv) In order to focus on the role of transverse semantic dimensions, we looked for problems that did not involve either procedural or calculation difficulties. Therefore, we chose problems involving small numbers. (v) We looked for problems allowing several ways to reach the solution so as to study not only the rate of success but the mechanisms involved in the choice of a strategy, whether it is adequate or not, and to assess the quality of DIANE's diagnosis in non-trivial situations. As a result, we built problems that might require several steps to solve. The following problems illustrate how those constraints were embedded: John bought an 8-Euro pen and an exercise book. He paid 14 Euros. Followed by one of these four wordings: - Paul bought an exercise book and 5-Euro scissors. How much did he pay? - Paul bought an exercise book and scissors that cost 3 Euros less than the exercise book. How much did he pay? - Paul bought an exercise book and scissors. He paid 10 Euros. How much are the scissors? - Paul bought an exercise book and scissors. He paid 3 Euros less than John. How much are the scissors? Those problems have the following structure: all problems involve two wholes (Whole1 and Whole2) and three parts (Part1, Part2, Part3); Part2 is common to Whole1 and Whole2. The values of a part (Part1) and of a whole (Whole1) are given first (John bought an 8-Euro pen and an exercise book. He paid 14 Euros). Then, a new set is introduced, sharing the second part (Part2) with the first set. In the condition in which the final question concerns the second whole (Whole2) a piece of information is stated concerning the non-common part (Part3), this information being either explicit (combination problems: Paul bought an exercise book and a 5-Euro pair of scissors) or defined by comparison with Part1 (comparison problems: Paul bought an exercise book and scissors that cost 3 Euros less than the exercise book). In the condition in which the final question concerns the third
part (Part3) a piece of information is stated concerning the second whole (Whole2), this information being either explicit (combination problems: Paul bought an exercise book and scissors. He paid 10 Euros) or defined by comparison with Whole1 (comparison problems: Paul bought an exercise book and scissors. He paid 3 Euros less than John). Then a question concerns the missing entity: Part3 (How much are the scissors?) or Whole2 (How much did Paul pay?). In fact, three factors were manipulated in a systematic manner for constructing the problems presented here: - The nature of the variable involved. - The kind of problem (2 modalities: complementation or comparison): if the problem can be solved by a double complementation, we call it a complementation problem; if it can be solved by a complementation followed by a comparison, we call it a comparison problem. - The nature of the question (2 modalities: part or whole): if the question concerns Whole2, we call it a whole problem, and if the question concerns Part3, we call it a part problem. The last two factors define four families of problems that share some structural dimensions (two wholes, a common part and the explicit statement of Whole1 and Part1) but differ in others (the 2x2 previous modalities). Among each family, we built isomorphic problems through the use of several variables that we will describe more precisely later on. One major interest of those problems is that they can all be solved by two alternative strategies that we named the step by step strategy and the difference strategy. The step by step strategy requires calculating Part2 before determining Part3 or Whole2 (calculating that the price of the exercise book is 6 Euros in the previous example). The difference strategy does not require calculating the common part and is based on the fact that if two sets share a common part, then their wholes differ by the same value as do the specific parts (the price of the pen and the price of the scissors differ by the same value as the total prices paid). It has to be noted that, while in complementation problems both strategies are in two steps, in the case of the comparison problem the step by step strategy requires three steps whereas the difference strategy requires only one. There exists as well a mixed strategy that leads to the correct result even though it involves an unnecessary calculation; it starts with the calculation of Part2 and ends with the difference strategy. The solving model used for DIANE is composed of the following triple RM=(T, S, H). T refers to the problem Type and depends on the three parameters defined above (kind of problem, nature of the question, nature of the variable). S refers to the Strategy at hand (step by step, difference or mixed strategy). H refers to the Heuristics used and is mostly used to model the erroneous resolution; for instance applying an arithmetic operator to the last data of the problem and the result of the intermediate calculation.
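To make the two correct strategies concrete, the sketch below solves the complementation wordings of the example problem both ways; the function names and argument convention are ours, and for the comparison wordings the compared value would first be resolved.

def step_by_step(whole1, part1, known_value, question):
    # Compute the common part first (Part2 = Whole1 - Part1), then the asked entity.
    part2 = whole1 - part1                      # 14 - 8 = 6: price of the exercise book
    if question == "whole2":                    # known_value is Part3
        return part2 + known_value
    return known_value - part2                  # question is "part3"; known_value is Whole2

def difference(whole1, part1, known_value, question):
    # Shared common part: the wholes differ by the same value as the specific parts.
    if question == "whole2":                    # known_value is Part3
        return whole1 - (part1 - known_value)   # e.g. 14 - (8 - 5) = 11
    return part1 - (whole1 - known_value)       # known_value is Whole2: 8 - (14 - 10) = 4

# John: 8-Euro pen, paid 14 Euros. Paul: exercise book and scissors, paid 10 Euros.
print(step_by_step(14, 8, 10, "part3"))   # 4 (two steps: 14 - 8 = 6, then 10 - 6 = 4)
print(difference(14, 8, 10, "part3"))     # 4 (single combined step: 8 - (14 - 10) = 4)

A mixed resolution, as described above, would compute Part2 first and then still finish with the difference step.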
3. Description of DIANE DIANE is a web based application relying on open source technologies. DIANE is composed of an administrator interface dedicated to the researcher or the teacher and of a problem solving interface dedicated to the pupil. The administrator interface allows the user to add problems, according to the factors defined above, to create series of exercises, to look
for the protocol of a student, or to download the results of a diagnosis. The role of the problem solving interface is to enable the pupil to solve a series of problems that will be analyzed later on and will be the basis for the diagnosis. This interface (Figure 1) provides some functions aimed at facilitating the calculation and writing parts of the process in order to let the pupil concentrate on the problem solving. The use of the keyboard is optional: all the problems can be solved by using the mouse only. The answers of the pupils are a mix of algebraic expressions and natural language. All the words which are necessary to write an answer are present in the text; the words were made clickable for this purpose. Using only the words of the problem for writing the solution helps to work with a restrained lexicon and avoids typing and spelling mistakes; it allows us to analyze a constrained natural language.
Figure 1. The pupil interface
4. Diagnosis with DIANE Diagnosis with DIANE is a tool for analyzing and understanding the behavior of the learners at a detailed level when they solve arithmetic problems. The diagnosis is generic in that it might be applied to all the classes of problems that are defined and is not influenced by the surface features of the exercises. Diagnosis concerns not only success or failure or the different kinds of successful strategies, but erroneous results are coded at the same detailed level as the successful strategies. As we have already mentioned, our main rationale is that understanding the influence of representation on problem solving requires the analysis of behavior at a very detailed level. Note that more than half of the modalities of the table of analysis are used for encoding errors. Diagnosis is reported in an 18-column table. Depending on the strategies and the nature of the problem, up to 14 columns are effectively used for analyzing one particular resolution. The first column encodes the strategy. It is followed by several groups of four columns. The first column of each group encodes the procedure (addition, subtraction, etc.), the second one indicates whether the data are relevant, the third one indicates whether the result is correct and the fourth one indicates whether a sentence is formulated and evaluates the sentence (this column is not yet encoded automatically). Another column, the 14th, is
used to identify the nature of what is calculated in the last step of the resolution (a part, a whole, the result of a comparison, an operation involving the intermediary result and the last item of data, etc.). The answer of the pupil, a string of characters, is processed using regular expression patterns. This treatment turns the answer of the pupil into four tables, which are used for the analysis. The first table contains all the numbers included in the answer, the second one contains all the operations, the third one all numbers that are not operands and the fourth one contains all the words separated by spaces. The data extracted or inferred from the problem (Whole1, Part1, Part3 …) are stored in a database. The automatic diagnosis is based on comparisons between the data extracted and inferred from the text and the tables, using heuristics derived from the table of analysis. The following table (Table 1) provides two examples of diagnosis for the problem: John bought an 8-Euro pen and an exercise book. He paid 14 Euros. Paul bought an exercise book and scissors. He paid 3 Euros less than John. How much are the scissors?

Pupil 1 - Response: 14 - 8 = 7; 14 - 3 = 11; 11 - 7 = 4; The scissors cost 4 Euros.
Pupil 1 - Diagnosis by DIANE: Col 1: step by step strategy. Col 2-4: subtraction, relevant data, calculation error. Col 6-8: subtraction, relevant data, exact result. Col 14: calculation of a part. Col 15-17: subtraction, relevant data (the calculation error is taken into account), exact result.

Pupil 2 - Response: 14 - 8 = 6; 14 - 3 = 11; The scissors cost 11 Euros.
Pupil 2 - Diagnosis by DIANE: Col 1: Erroneous comparison strategy. Col 2-4: subtraction, relevant data, exact result. Col 14: calculation of comparison. Col 15-17: subtraction, data correct for the comparison but not for the solution, exact result.

Table 1: An example of Diagnosis with DIANE
DIANE provides a fine grained diagnosis that identifies the errors made by the pupils. For instance, pupil 1 (Table 1) made a calculation mistake when calculating Part 2 (14-8=7), which implies an erroneous value for the solution (11-7=4). DIANE indicates that an item of data is incorrect in the last calculation due to a calculation error at the first step. The same holds true for erroneous strategies. Pupil 2 (Table 1), after having performed a correct first step ends his/her resolution with the calculation of the comparison (14-3=11). In this situation, DIANE diagnosis indicates that the pupil used an erroneous strategy that provided a result which is correct for the calculation of the comparison but not for the solution. This situation is a case of use of the heuristic previously described (using the last data and the result of the intermediate calculation).
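The pre-processing step described in this section can be illustrated roughly as follows; the regular expressions and the four-table structure below are our own simplification, not DIANE's actual implementation.

import re

def preprocess(answer):
    # 1) every number appearing in the answer
    numbers = re.findall(r"\d+(?:[.,]\d+)?", answer)
    # 2) every written operation of the form "a <op> b = c"
    operations = re.findall(r"(\d+)\s*([+\-*/])\s*(\d+)\s*=\s*(\d+)", answer)
    # 3) numbers that do not appear inside any operation
    in_ops = {x for op in operations for x in op if x not in "+-*/"}
    isolated = [n for n in numbers if n not in in_ops]
    # 4) the words, separated by spaces
    words = re.findall(r"[A-Za-zÀ-ÿ]+", answer)
    return numbers, operations, isolated, words

nums, ops, isolated, words = preprocess("14 - 8 = 7  14 - 3 = 11  11 - 7 = 4  The scissors cost 4 Euros")
# ops == [('14', '-', '8', '7'), ('14', '-', '3', '11'), ('11', '-', '7', '4')]

The diagnosis heuristics would then compare such tables against the values extracted or inferred from the problem (Whole1, Part1, Part3, ...), as described above.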
5. Results from experimental psychology Experimentation has been conducted on a large scale [12]; 402 pupils (168 5th graders, 234 6th graders) from 15 schools in Paris and the Toulouse area participated. The experimental design was the following: each child solved, within two sessions, complementation and comparison problems for three kinds of variables and the two kinds of questions, that is twelve problems. Even if the experimental results are not the main scope of this paper, let us mention that the main hypotheses were confirmed (for each of the four families of problems, we found a main effect of the kind of variable on the score of success (17,79 […]

[…] move -> creation/deletion -> simulation. Further, the good groups had a less balanced work distribution than the mediocre and poor groups. The ordered (and therefore less successful) groups split their time between having one person perform the whole phase (M = 37%), the other person perform the whole phase (M = 34%), or both people taking action in the phase (M = 28%). The scrambled groups had fewer phases where both people took action (M = 15%), and a less balanced distribution of individual phases (Ms = 53% and 32%). These results were surprisingly congruent with the task coordination results for Experiment 1, as reported in detail in [13]. Although task coherence varied between conditions in Experiment 1, there were few differences on this dimension between groups in Experiment 2. Groups referred to an average of 1.8 objects per phase in move phases, creation/deletion phases, and simulation phases. All groups tended to refer to the same objects across multiple phases. Task selection also did not differ between groups in this experiment, but commonalities between groups provided insight into the collaborative process. Groups structured their actions based on the transitions from one state of traffic lights to the next. Creation/deletion actions were linear 79% of the time, in that the current edge being drawn involved an object used in the previous creation/deletion action. Groups tended to focus on either the pedestrian or the car lights at a given time; the current creation/deletion action tended to involve the same light class as the previous creation/deletion action 75% of the time. In addition to the analysis of Experiment 2 based on the five dimensions, we explored how the BR could be used to analyze and tutor collaboration. For example, we used the BR to capture individual creation actions, and discovered that two groups (1 and 3) used the same correct strategy in creating the links necessary to have the traffic lights turn from green to yellow to red. This path in the graph demonstrated a conceptual understanding of how Petri Nets can be used to effect transitions. We will ultimately be able to add hints that encourage students to take this path, leveraging the behavior graph as a means for tutoring. Likewise, the BR can also be used to identify common bugs in participants' action-by-action problem solving. For instance, the BR captured a common error in groups 1 and 2 of Experiment 2: each group built a Petri Net, in almost identical fashion, in which the traffic-red and pedestrian-green lights would not occur together. In situations like this, the behavior graph could be annotated to mark this sequence as buggy, thus allowing the tutor to provide feedback should a future student take the same steps. On the other hand, it is clear that the level of individual actions is not sufficient for representing all of the dimensions.
For instance, evaluating whether students are chatting "too much" or alternating phases in an "optimal" way is not easily detected at the lowest level of abstraction. To explore how we might do more abstract analysis, we wrote code to pre-process and cluster the Cool Modes logs at a higher level of abstraction and sent them to the BR. Figure 3 shows an example of this level of analysis from Experiment 2. Instead of individual actions, edges in the graph represent phases of actions (see the "CHAT", "MOVE", and "OBJEC" designations on the edges). The number to the right of each phase in the figure specifies how many instances of that particular action type occurred during consecutive steps, e.g., the first CHAT phase, starting to the left from the root node, represents 2 individual chat actions. The graph shows the first 5 phases of groups 2, 3, 5, and 8. Because the type of phase, the number of actions within each phase, and who participates are all recorded (the last is recorded but not shown in the figure), we can analyze the data and, ultimately, may be able to provide tutor feedback at this level. For instance, notice that the scrambled groups (2 and 3) incorporated move phases into their process, while at the same point, the organized groups (5 and 8) only used CHAT and OBJEC (i.e., creation/deletion)
phases. Additionally, groups 5 and 8 began their collaboration with a lengthy chat phase, and group 5 continued to chat excessively (23 chat actions by group 5 leading to state22!). This level of data provided to the BR could help us better understand the task coordination dimension. In addition, if provided at student time, the BR could also provide feedback to groups with "buggy" behavior; for instance, a tutor might have been able to intervene during group 5's long chat phase. In future work, we intend to further explore how this and other levels of abstraction can help us address not only the task coordination dimension but also the task coherence and task selection dimensions.
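A small sketch of this phase clustering (grouping consecutive log actions of the same type before sending them to the BR) is given below; the log format and field names are illustrative, not Cool Modes' actual schema.

from itertools import groupby

def to_phases(log):
    # log: sequence of (action_type, user) entries in chronological order.
    phases = []
    for action_type, group in groupby(log, key=lambda entry: entry[0]):
        entries = list(group)
        participants = sorted({user for _, user in entries})
        phases.append((action_type, len(entries), participants))
    return phases

log = [("CHAT", "A"), ("CHAT", "B"), ("OBJEC", "A"), ("OBJEC", "A"), ("MOVE", "B")]
print(to_phases(log))
# [('CHAT', 2, ['A', 'B']), ('OBJEC', 2, ['A']), ('MOVE', 1, ['B'])]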
4.3 Discussion
There are two questions to answer with respect to these empirical results: Were the five dimensions valid units of analysis across the experiments? Can the BR analyze the dimensions and, if not, can the dimensions be used to guide extensions to it?

Figure 3. An Abstracted Behavior Graph

The dimensions did indeed provide a useful analysis framework. The conceptual understanding dimension was helpful in evaluating problem solutions; in both experiments we were able to identify and rate the dyads based on salient (but different) conceptual features. Visual organization was important in both tasks, and appeared to inform problem solutions. The task coordination dimension provided valuable data, and the clearest tutoring guidelines of all the dimensions. The task coherence dimension provided information about object references in Experiment 1, but was not as clear an aid in the analysis of Experiment 2. Finally, the task selection dimension was a useful measure in both experiments, but was more valuable in Experiment 1 due to the greater number of possible strategies. With the introduction of abstraction levels, the effort to provide hints and messages to links will be greatly reduced because of the aggregation of actions to phases and sequences of phases. Even with abstraction, larger collaboration groups would naturally lead to greater difficulty in providing hints and messages, but our intention is to focus on small groups, such as the dyads of the experiments described in this paper.

5. Conclusion

Tackling the problem of tutoring a collaborative process is non-trivial. Others have taken steps in this direction (e.g., [14, 15]), but there are still challenges ahead. We have been working on capturing and analyzing collaborative activity in the Behavior Recorder, a tool for building Pseudo Tutors, a special type of cognitive tutor that is based on the idea of recording problem solving behavior by demonstration and then tutoring students using the captured model as a basis. The work and empirical results we have presented in this paper
has led us to the conclusion that BR analysis needs to take place at multiple levels of abstraction to support tutoring of collaboration. Using the five dimensions of analysis as a framework, we intend to continue to explore ways to analyze and ultimately tutor collaborative behavior. We briefly demonstrated one approach we are exploring: clustering of actions to analyze phases (of actions) and sequences of phases. Since task coordination appears to be an interesting and fruitful analysis dimension, we will initially focus on that level of abstraction. Previously, in other work, we investigated the problem of automatically identifying phases by aggregating similar types of actions [16] and hope to leverage those efforts in our present work. An architectural issue will be determining when to analyze (and tutor) at these various levels of abstraction. Another direction we have considered is extending the BR so that it can do “fuzzy” classifications of actions (e.g., dynamically adjusting parameters to allow behavior graph paths to converge more frequently). We are in the early stages of our work but are encouraged by the preliminary results. We plan both to perform more studies to verify the generality of our framework and to implement and experiment with extensions to the Behavior Recorder.
References
[1] Anderson, J. R., Corbett, A. T., Koedinger, K. R., & Pelletier, R. (1995). Cognitive tutors: Lessons learned. Journal of the Learning Sciences, 4, 167-207.
[2] Koedinger, K. R., Anderson, J. R., Hadley, W. H., & Mark, M. A. (1997). Intelligent tutoring goes to school in the big city. International Journal of Artificial Intelligence in Education, 8, 30-43.
[3] Bransford, J. D., Brown, A. L., & Cocking, R. R. (Eds.). (2000). How people learn: Brain, mind, experience, and school. Washington, DC: National Academy Press.
[4] Slavin, R. E. (1992). When and why does cooperative learning increase achievement? Theoretical and empirical perspectives. In R. Hertz-Lazarowitz & N. Miller (Eds.), Interaction in cooperative groups: The theoretical anatomy of group learning (pp. 145-173). New York: Cambridge University Press.
[5] Johnson, D. W. and Johnson, R. T. (1990). Cooperative learning and achievement. In S. Sharan (Ed.), Cooperative learning: Theory and research (pp. 23-37). New York: Praeger.
[6] McLaren, B. M., Koedinger, K. R., Schneider, M., Harrer, A., & Bollen, L. (2004b). Toward Cognitive Tutoring in a Collaborative, Web-Based Environment. Proceedings of the Workshop of AHCW 04, Munich, Germany, July 2004.
[7] Pinkwart, N. (2003). A Plug-In Architecture for Graph Based Collaborative Modeling Systems. In U. Hoppe, F. Verdejo & J. Kay (eds.): Proceedings of the 11th Conference on Artificial Intelligence in Education, 535-536.
[8] Koedinger, K. R., Aleven, V., Heffernan, N., McLaren, B. M., & Hockenberry, M. (2004). Opening the Door to Non-Programmers: Authoring Intelligent Tutor Behavior by Demonstration. In Proceedings of ITS, Maceio, Brazil, 2004.
[9] Nathan, M., Koedinger, K., and Alibali, M. (2001). Expert blind spot: When content knowledge eclipses pedagogical content knowledge. Paper presented at the Annual Meeting of the AERA, Seattle.
[10] Jansen, M. (2003). Matchmaker - a framework to support collaborative java applications. In the Proceedings of Artificial Intelligence in Education (AIED-03), IOS Press, Amsterdam.
[11] Koedinger, K. R. & Terao, A. (2002). A cognitive task analysis of using pictures to support pre-algebraic reasoning. In C. D. Schunn & W. Gray (Eds.), Proceedings of the 24th Annual Conference of the Cognitive Science Society, 542-547.
[12] Corbett, A., McLaughlin, M., and Scarpinatto, K.C. (2000). Modeling Student Knowledge: Cognitive Tutors in High School and College. User Modeling and User-Adapted Interaction, 10, 81-108.
[13] McLaren, B. M., Walker, E., Sewall, J., Harrer, A., and Bollen, L. (2005). Cognitive Tutoring of Collaboration: Developmental and Empirical Steps Toward Realization. Proceedings of the Conference on Computer Supported Collaborative Learning, Taipei, Taiwan, May/June 2005.
[14] Goodman, B., Hitzeman, J., Linton, F., and Ross, H. (2003). Towards Intelligent Agents for Collaborative Learning: Recognizing the Role of Dialogue Participants. In the Proceedings of Artificial Intelligence in Education (AIED-03), IOS Press, Amsterdam.
[15] Suthers, D. D. (2003). Representational Guidance for Collaborative Learning. In the Proceedings of Artificial Intelligence in Education (AIED-03), IOS Press, Amsterdam.
[16] Harrer, A. & Bollen, L. (2004). Klassifizierung und Analyse von Aktionen in Modellierungswerkzeugen zur Lernerunterstützung. In Workshop-Proc. Modellierung 2004. Marburg, 2004.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Personal Readers: Personalized Learning Object Readers for the Semantic Web 1 Nicola Henze a,2 a ISI – Semantic Web Group, University of Hannover & Research Center L3S Abstract. This paper describes our idea for personalized e-Learning in the Semantic Web which is based on configurable, re-usable personalization services. To realize our ideas, we have developed a framework for designing, implementing and maintaining personal learning object readers, which enable the learners to study learning objects in an embedding, personalized context. We describe the architecture of our Personal Reader framework, and discuss the implementation of personalization services in the Semantic Web. We have realized two Personal Readers for e-Learning: one for learning Java programming, and another for learning about the Semantic Web. Keywords. web-based learning platforms & architectures adaptive web-based environments, metadata, personalization, semantic web, authoring
1. Introduction The amount of available electronic information increases from day to day. The usefulness of information for a person depends on various factors, among them are the timely presentation of information, the preciseness of presented information, the information content, and the prospective context of use. Clearly, we can not provide a measurement for the expected utility of a piece of information for all persons which access it, nor can we give such an estimation for a single person: the expected utility varies over time: what might be relevant at some point might be useless in the near future, e.g. the information about train departure times becomes completely irrelevant for planning a trip if the departure time lies in the past. With the idea of a Semantic Web [2] in which machines can understand, process and reason about resources to provide better and more comfortable support for humans in interacting with the World Wide Web, the question of personalizing the interaction with web content is at hand: Estimating the individual requirements of the user for accessing the information, learning about a user’s needs from previous interactions, recognizing the actual access context, in order to support the user to retrieve and access the part of information from the World Wide Web which fits best to his or her current, individual needs. 1 This work has partially been supported by the European Network of Excellence REWERSE - Reasoning on the Web with Rules and Semantics (www.rewerse.net). 2 Correspondence to: Nicola Henze, ISI - Semantic Web Group, University of Hannover & Research Center L3S, Appelstr.4, D-30167 Hannover Tel.: +49 511 762 19716; Fax: +49 511 762 19712; E-mail:
[email protected]
The development of a Semantic Web will, as we believe, also have a great impact on the future of e-Learning. In the past few years, achievements in creating standards for learning objects (for example the initiatives from LOM (Learning Objects Metadata [13]) or IMS [12]) have been carried out, and large learning object repositories like Ariadne [1], Edutella [7] and others have been built. This shifts the focus from the more or less closed e-Learning environments forward to open e-Learning environments, in which learning objects from multiple sources (e.g. from different courses, multiple learning object providers, etc.) could be integrated into the learning process. This is particularly interesting for university education and life-long learning where experienced learners can profit from self-directed learning, exploratory learning, and similar learning scenarios. This paper describes our approach to realize personalized e-Learning in the Semantic Web. The following section discusses the theoretical background of our approach and motivates the development of our Personal Reader framework. The architecture of the Personal Reader framework is described in Section 3; here we also discuss authoring of such Personal Learning Object Readers as well as the required annotations of learning objects with standard metadata for these Readers. Section 4 shows the implementation of some example personalization services for e-Learning. Section 4.4 finally provides information about realized Personal Learning Object Readers for Java programming and the Semantic Web.
2. Towards personalized e-Learning in the Semantic Web Our approach towards personalized e-Learning in the Semantic Web is guided by the question of how we can adapt personalization algorithms (especially from the field of adaptive educational hypermedia) in a way that they can be (1) re-used, and (2) plugged together by the learners as they like - thus enabling learners to choose which kind of personalized guidance, and in what combination, they appreciate for personalized e-Learning. In a theoretical analysis and comparison of existing adaptive educational hypermedia systems that we have done in earlier work [10], we found that it is indeed possible to describe personalization functionality in a manner required for re-use, i.e. to describe such personalization functionality in encapsulated, independent modules. Brusilovsky has argued in [5] that current adaptive educational hypermedia systems suffer from the so-called open corpus problem. By this is meant that these systems work on a fixed set of documents/resources which are normally known to the system developers at design time. Alterations in the set of documents, like modifying a document's content, adding new documents, etc., are nearly impossible because they require substantial alterations of the document descriptions, and normally affect relations in the complete corpus. To analyze the open-corpus problem in more detail, we started in [10] an analysis of existing adaptive educational hypermedia systems and proposed a logic-based definition of adaptive educational hypermedia with a process-oriented focus. We provided a logic-based characterization of some well-known adaptive educational hypermedia systems: ELM-Art, Interbook, NetCoach, AHA!, and KBS Hyperbook, and were able to describe them by means of (meta-)data about the document space, observation data (required at runtime:
data about user interaction, user feedback, etc.), output data, and the processing data - the adaptation algorithms. As a result, we were able to formulate a catalogue of adaptation algorithms in which the adaptation result could be judged in comparison to the overhead required for providing the input data (comprising data about the document space and observation data at runtime). This catalogue provides a basis set for re-usable adaptation algorithms. Our second goal, designing and realizing personalized e-Learning in the Semantic Web which allows the learners to customize the degree, method and coverage of personalization, is the subject matter of the present paper. Our first step towards achieving this goal was to develop a generic architecture and framework which makes use of Semantic Web technologies in order to realize Personal Learning Object Readers. These Personal Learning Object Readers are on the one hand Readers, which means that they display learning objects, and on the other hand Personal Readers, thus they provide personalized contextual information on the currently considered learning object, like recommendations about additional readings, exercises, more detailed information, alternative views, the learning objectives, the application where this learning content is relevant, etc. We have developed a framework for creating and maintaining such Personal Learning Object Readers. The driving principle of this framework is to expose all the different personalization functionalities as services which are orchestrated by some mediator service. The resulting personalized view on the learning object and its context is finally determined by another group of services which take care of visualization and device-adaptation aspects. The next step to achieve our second goal is to create an interface component which enables the learners to select and customize personalization services. This is the object of investigation in our ongoing work. Other approaches to personalized e-learning in the Semantic Web can be taken, e.g. focusing on reuse of content or courses (e.g. [11]), or focusing on metadata-based personalization (e.g. [6,3]). Also, portal strategies have been applied for personalized e-Learning (see [4]). Our approach differs from the above-mentioned approaches as we encapsulate personalization functionality into specific services, which can be plugged together by the learner.
3. The Personal Reader Framework: Service-based Personalization Functionality for the Semantic Web The Personal Reader framework [9] provides an environment for designing, maintaining and running personalization services in the Semantic Web. The goal of the framework is to establish personalization functionality as services in a semantic web. In the run-time component of the framework, Personal Reader instances are generated by plugging one or several of these personalization services together. Each developed Reader consists of a browser for learning resources (the reader part), and a side-bar or remote which displays the results of the personalization services, e.g. individual recommendations for learning resources, contextual information, pointers to further learning resources, quizzes, examples, etc. (the personal part; see Figure 2). This section describes the architecture of the Personal Reader framework, and discusses authoring of Personal Readers within our framework.
3.1. Architecture The architecture of the Personal Reader framework (PRF) makes use of recent Semantic Web technologies for realizing a service-based environment for implementing and accessing personalization services. The core component of the PRF is the so-called connector service, whose task is to pass requests and processing results between the user interface component and the available personalization services, and to supply user profile information and available metadata descriptions on learning objects, courses, etc. In this way, the connector service is the mediator between all services in the PRF. Two different kinds of services - apart from the connector service - are used in the PRF: personalization services and visualization services. Each personalization service offers some adaptive functionality, e.g. recommends learning objects, points to more detailed information, quizzes, exercises, etc. Personalization services are available to the PRF via a service registry using WSDL (Web Service Description Language, [15]). Thus, service detection and invocation take place via the connector service, which asks the web service registry for available personalization services and selects appropriate services based on the service descriptions available via the registry. The task of the visualization services is to provide the user interface for the Personal Readers: interpret the results of the personalization services to the user, and create the actual interface with reader-part and personalization-part. The basic implementation guideline in the Personal Reader framework is the following: Whenever a service has to communicate with other services, we use RDF (Resource Description Framework, [14]) for describing requests, processing results, and answers. This has the immediate advantage that all components of the Personal Reader framework (visualization services or personalization services) can be independently developed, changed or substituted, as long as the interface protocol given in the RDF descriptions is respected. To make these RDF descriptions "understandable" for all services, they all externalize their meaning by referring to (one or several) ontologies. We have developed an ontology for describing adaptive functionality, the l3s-ontology (1). Whenever a personalization service is implemented, the provided adaptation of this service is described with respect to this adaptation ontology, such that each visualization service can interpret the meaning of the adaptation, and can decide which presentation of the results should be used in accordance with the device that the user currently has, or the available bandwidth. This has the consequence that local context adaptation (e.g. adaptation based on the capabilities of the user's device, bandwidth, environment, etc.) is not done by the personalization services, but by the visualization services. Figure 1 depicts the data flow in the PRF. 3.2 Authoring Authoring is a very critical issue for successfully realizing adaptive educational hypermedia systems. As our aim in the Personal Reader framework is to support re-usability of personalization functionality, this is an especially important issue here. Recently, standards for annotating learning objects have been developed (cf. LOM [13] or IMS [12]). As a guideline for our work, we established the following rule: (1) http://www.personal-reader.de/rdf/l3s.rdf
[Figure 1 depicts the following flow: the user clicks on a link; the Visualization Service passes the request (RDF) to the Connector; the Connector Service searches for meta-information about the user, course, currently visited page, etc.; each registered Personalization Service answers the request; the Connector provides all results (RDF) to the Visualization Service; the Visualization Service determines the presentation according to the context (device, bandwidth, settings, etc.). An Ontology of Adaptive Functionality underlies the exchanged descriptions.]
Figure 1. The communication flow in the Personal Reader framework: All communication is done via RDF-descriptions for requests and answers. The RDF descriptions are understood by the components via the ontology of adaptive functionality
3.2. Authoring

Authoring is a very critical issue for successfully realizing adaptive educational hypermedia systems. As our aim in the Personal Reader framework is to support the re-usability of personalization functionality, this is an especially important issue here. Recently, standards for annotating learning objects have been developed (cf. LOM [13] or IMS [12]). As a guideline for our work, we established the following rule: learning objects, course descriptions, domain ontologies, and user profiles must be annotated according to existing standards (for details please refer to [8]). The flexibility must come from the personalization services, which must be able to reason about these standard-annotated learning objects, course descriptions, etc. This has an immediate consequence: we can implement personalization services which fulfill the same goal (e.g. providing personal recommendations for some learning object), but which consider different aspects of the metadata. For example, one personalization service can calculate recommendations based on the structure of the learning materials in some course and the user’s navigation history, while another checks the keywords which describe the learning objectives of a learning object and calculates recommendations based on relations in the corresponding domain ontology. Examples of such personalization services are given in Section 4. The administration component of the Personal Reader framework provides an author interface for easily creating new instances of course-Readers: course materials which are annotated according to LOM (or some subset of it), and which might in addition refer to some domain ontology, can immediately be used to create a new Personal Reader instance which offers all the personalization functionality that is available, at runtime, in the personalization services.
4. Realizing Personalization Services for e-Learning

This section describes in more detail the realization of some selected personalization services: a service for recommending learning resources, and a service for enriching learning objects with the context in which they appear in some course.

4.1. Calculating Recommendations

Individual recommendations for learning resources are calculated according to the current learning progress of the user, e.g. with respect to the current set of course materials. As described in Section 3.2, it is the task of the personalization services to realize
strategies and algorithms which make use of standardized metadata annotations of learning objects, course descriptions, etc. The first solution for realizing a recommendation service determines that a learning resource LO is recommended if the learner has studied at least one more general learning resource (UpperLevelLO), where “more general” is determined according to the course descriptions:

FORALL LO, U   learning_state(LO, U, recommended) <-
    EXISTS UpperLevelLO (upperlevel(LO, UpperLevelLO)
        AND p_obs(UpperLevelLO, U, Learned)).
Further personalization services can derive stronger recommendations than the previous one (e.g., if the user has studied all of the more general learning resources), or weaker recommendations (e.g., if one or two of these haven’t been studied so far), etc. A different realization of a recommendation service can calculate its results with respect to the keywords describing the objectives of the learning object in some domain ontology. In particular, this is an appropriate strategy if a user is viewing course materials from different courses at the same time.

FORALL LO, U   learning_state(LO, U, recommended) <-
    EXISTS C, C_DETAIL (concepts_of_LO(LO, C_DETAIL)
        AND detail_concepts(C, C_DETAIL)
        AND p_obs(C, U, Learned)).
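Read operationally, the first of these rules can be paraphrased in ordinary code. The following Java sketch is only an illustration of the rule’s logic under the assumption of simple in-memory metadata; the class name and data structures are hypothetical and do not reflect the actual rule-engine implementation.

import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative re-implementation of the first recommendation rule:
// a learning object LO is recommended for user U if at least one of its
// more general learning objects (upperlevel) has already been learned.
class RecommendationExample {
    // upperLevel.get(lo): the more general learning objects of lo, taken from the course description.
    private final Map<String, List<String>> upperLevel;
    // learned: learning objects for which the user model reports p_obs(lo, user, Learned).
    private final Set<String> learned;

    RecommendationExample(Map<String, List<String>> upperLevel, Set<String> learned) {
        this.upperLevel = upperLevel;
        this.learned = learned;
    }

    boolean isRecommended(String lo) {
        for (String upper : upperLevel.getOrDefault(lo, List.of())) {
            if (learned.contains(upper)) {
                return true;   // EXISTS UpperLevelLO ... AND p_obs(UpperLevelLO, U, Learned)
            }
        }
        return false;
    }
}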
Comparing the above strategies for recommendation services, we see that some of the recommendation services might provide better results than others, depending on the situation in which they are used. For example, a recommendation service which reasons about the course structure will be more accurate than others, because it has more fine-grained information about the course and therefore about the learning process of a learner who is taking part in this course. But if the learner switches between several courses, recommendations based solely on the content of learning objects might provide better results. Overall, this leads to a configuration problem, in which we have to rate the different services which provide the same personalization functionality according to which data they use for processing, and in which situations they should be employed. We are currently exploring how we can solve this configuration problem with defeasible logics.

4.2. Course Viewer

For viewing learning objects which belong to some lecture, it is essential to show the learner the context of the learning objects: what is the general learning goal, what is this learning object about, and what are the details that are related to this specific learning object. For example, a personalization service can follow the strategy of determining such details by following the course structure (if a hierarchical structure like sections, subsections, etc. is given). Or it can use the key concepts of the learning object and determine details with respect to the domain ontology. The following rule applies the latter approach: details for the currently viewed learning resource are determined by detail_learningobject(LO, LO_DETAIL), where LO and LO_DETAIL are learning resources, and where LO_DETAIL covers more specialized learning concepts which are determined with the help of the domain ontology.

FORALL LO, LO_DETAIL   detail_learningobject(LO, LO_DETAIL) <-
    EXISTS C, C_DETAIL (detail_concepts(C, C_DETAIL)
        AND concepts_of_LO(LO, C)
        AND concepts_of_LO(LO_DETAIL, C_DETAIL))
    AND learning_resource(LO_DETAIL)
    AND NOT unify(LO, LO_DETAIL).
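The same kind of paraphrase works for the course-viewer rule. The Java sketch below uses hypothetical names and in-memory maps purely to illustrate the logic of detail_learningobject; it is not the implementation used in the Personal Reader.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative re-implementation of the detail_learningobject rule:
// LO_DETAIL is a detail of LO if LO_DETAIL is a different learning resource
// and covers a concept that the domain ontology lists as a detail concept
// of one of LO's concepts.
class CourseViewerExample {
    private final Map<String, Set<String>> conceptsOfLO;    // learning object -> its key concepts
    private final Map<String, Set<String>> detailConcepts;  // concept -> more specialized concepts

    CourseViewerExample(Map<String, Set<String>> conceptsOfLO,
                        Map<String, Set<String>> detailConcepts) {
        this.conceptsOfLO = conceptsOfLO;
        this.detailConcepts = detailConcepts;
    }

    List<String> detailLearningObjects(String lo) {
        List<String> details = new ArrayList<>();
        Set<String> concepts = conceptsOfLO.getOrDefault(lo, Set.of());
        for (String candidate : conceptsOfLO.keySet()) {
            if (candidate.equals(lo)) {
                continue;                                   // NOT unify(LO, LO_DETAIL)
            }
            Set<String> candidateConcepts = conceptsOfLO.get(candidate);
            boolean isDetail = false;
            for (String concept : concepts) {
                for (String specialized : detailConcepts.getOrDefault(concept, Set.of())) {
                    if (candidateConcepts.contains(specialized)) {
                        isDetail = true;                    // EXISTS C, C_DETAIL ...
                        break;
                    }
                }
                if (isDetail) {
                    break;
                }
            }
            if (isDetail) {
                details.add(candidate);
            }
        }
        return details;
    }
}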
Figure 2. Screenshot of a Personal Reader for an e-Learning course on “Java Programming”. The Personal Readers implemented so far are freely available at www.personal-reader.de.
4.3. Basic User Modeling

At its current state, the Personal Reader requires only little information about the user’s characteristics. Thus, for our example we employed a very simple user model: this user model traces the user’s path in the learning environment and registers whenever the user has visited some learning resource. This simple user model is queried by all personalization services; updating the user model is the task of the visualization services, which provide the user interface and monitor user interactions.

4.4. Examples of Personal Learning Object Readers

Up to now, we have developed two Personal Learning Object Readers with our environment: a Personal Reader for learning the Java programming language (see the screenshot in Figure 2), and a Personal Reader for learning about the Semantic Web. The Personal Reader for Java uses materials from the online version of the Sun Java Tutorial (http://java.sun.com/docs/books/tutorial/), while the one for learning about the Semantic Web uses materials of a course given at the University of Hannover in summer 2004 (http://www.kbs.uni-hannover.de/henze/semweb04/skript/inhalt.xml).
5. Conclusion and Future Work

This paper describes our approach for realizing personalized e-Learning in the Semantic Web. Our approach is driven by the goal of realizing a Plug & Play architecture for
personalized e-Learning which allows a learner to select, customize and combine personalization functionality. To achieve this goal, we have developed a framework for creating and maintaining personalization services, the Personal Reader framework. This framework provides an environment for accessing, invoking and combining personalization services, and contains a flexible, service-based infrastructure for visualizing adaptation outcomes and for creating the user interface. Up to now, we have realized two Personal Readers (for the domains of Java programming and the Semantic Web). Currently, we are implementing further personalization services, and are extending the user modeling component of the Personal Reader framework. Future work will include an improved way of combining personalization services, and of detecting and solving potential conflicts between the recommendations of these services.
References
[1] Ariadne: Alliance of remote instructional authoring and distribution networks for Europe, 2001. http://ariadne.unil.ch/.
[2] Tim Berners-Lee, Jim Hendler, and Ora Lassila. The semantic web. Scientific American, May 2001.
[3] P. De Bra, A. Aerts, D. Smits, and N. Stash. AHA! version 2.0: More adaptation flexibility for authors. In Proceedings of the AACE ELearn’2002 conference, October 2002.
[4] P. Brusilovsky and H. Nijhawan. A framework for adaptive e-learning based on distributed re-usable learning activities. In Proceedings of World Conference on E-Learning, E-Learn 2002, Montreal, Canada, 2002.
[5] Peter Brusilovsky. Adaptive hypermedia. User Modeling and User-Adapted Interaction, 11:87–110, 2001.
[6] Owen Conlan, Cord Hockemeyer, Vincent Wade, and Dietrich Albert. Metadata driven approaches to facilitate adaptivity in personalized elearning systems. Journal of the Japanese Society for Information and Systems in Education, 42:393–405, 2003.
[7] Edutella, 2001. http://edutella.jxta.org/.
[8] Nicola Henze, Peter Dolog, and Wolfgang Nejdl. Reasoning and ontologies for personalized e-learning. ETS Journal Special Issue on Ontologies and Semantic Web for eLearning, 2004. To appear.
[9] Nicola Henze and Matthias Kriesell. Personalization Functionality for the Semantic Web: Architectural Outline and First Sample Implementation. In Proceedings of the 1st International Workshop on Engineering the Adaptive Web (EAW 2004), Eindhoven, The Netherlands, 2004.
[10] Nicola Henze and Wolfgang Nejdl. A logical characterization of adaptive educational hypermedia. New Review of Hypermedia, 10(1), 2004.
[11] Sebastien Iksal and Serge Garlatti. Adaptive web information systems: Architecture and methodology for reusing content. In Proceedings of the 1st International Workshop on Engineering the Adaptive Web (EAW 2004), Eindhoven, The Netherlands, 2004.
[12] IMS: Standard for Learning Objects, 2002. http://www.imsglobal.org/.
[13] LOM: Draft Standard for Learning Object Metadata, 2002. http://ltsc.ieee.org/wg12/index.html.
[14] Resource Description Framework (RDF) Schema Specification 1.0, 2002. http://www.w3.org/TR/rdf-schema.
[15] WSDL: Web Services Description Language, version 2.0, August 2004. http://www.w3.org/TR/2004/WD-wsdl20-20040803/.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Making an Unintelligent Checker Smarter: Creating Semantic Illusions from Syntactic Analyses
Kai Herrmann and Ulrich Hoppe
University of Duisburg-Essen, Germany. {herrmann, hoppe}@collide.info
Abstract. In interactive learning environments based on visual representations, the problem of checking the correctness of student solutions is much harder than with linear textual or numerical input. This paper presents a generic approach to providing learner feedback in such environments. The underlying checking mechanism analyzes system states against target constellations defined as sets of constraints about locations or connections of objects. A newly introduced analysis graph also allows the definition of procedural dependencies. An implementation of a feedback mechanism for traffic education in secondary schools is presented as an example.
Keywords. Pseudo Tutoring, Intelligent Tutoring, Error Analysis
1. Introduction: Modular Constraint Checking

Many interactive learning environments are based on visual representations, which makes it much more difficult to check the correctness of student solutions than is the case with linear textual or numerical input. In previous articles [1,2], we introduced a general approach to implementing a checking mechanism for configuration problems of such visual languages, based essentially on syntactic features of the underlying representation. Although this mechanism (called “modular constraint checking”: MCC) is “unintelligent” in that it does not rely on semantic knowledge, it is capable of creating the illusion of having deep domain knowledge by providing highly contextual feedback to the learner. First, we will briefly recapitulate the basic ideas behind the MCC: it is a state-based (1.2) checking mechanism for visual languages (1.1), and capable of producing the impression or “illusion” of deep domain knowledge (1.3). Then, we describe new challenges for this approach (sec. 2) and how they are implemented (sec. 5). An example use case shows the advantages of the relatively light-weight form of checking in this approach as compared to more sophisticated (and more complicated!) systems (sec. 3).

1.1. Checking Visual Languages

Although there are some checking systems for dialog-driven interfaces [3,4,5], there is a lack of systems which are able to check (and provide feedback for) visual languages in a more specific sense: a visual language consists of graphical objects the user can arbitrarily arrange on the screen. Values and positioning of the objects together form expressions in visual languages. There are two main problems in checking visual languages (compared with checking “regular”, fixed interfaces):
• A checking mechanism for visual languages must be aware of the absolute and relative positions of objects on the screen and the connections between them. These facts are often as important as the values of the components themselves.
• While it is simple to identify objects in fixed interfaces, where users only change the value, but not the position of objects, there is a problem in identifying objects in a visual language: suppose there are two objects (x and y) that represent the same concept in a visual language and differ only by their value. A user can then simply switch the values (and the positions) of x and y, so that x is now at the position y was before and has the value y had. For the user, the objects have now changed their identity. But for the system they are still the old objects, only with changed values and positions. In such cases, the system must be able to handle objects according to the understanding of the user.

So, the MCC checking system uses especially information about the location and connections of objects to identify them. Often a non-ambiguous identification is impossible, so the MCC checker has to deal with this fact. (See [2] for technical details.)

1.2. Checking States

When working with visual languages, users typically modify objects following the direct manipulation paradigm. That means, e.g., moving an object to a certain position may include placing the object somewhere near the designated position in a first step, and, in further steps, refining the position until the object is placed exactly where it belongs. Each single action of this sequence is not very expressive, nor is the sequence as a whole. Another user may use a completely different sequence of actions to move the object to the designated location, because there are literally thousands of ways to do so. Because (sequences of) single actions in the manipulation of visual languages are often not important, we do not observe actions of users, but states of the system. When observing a move operation, our system only recognizes the two states “object has reached the destination” and “object has not yet reached the destination”. This approach differs from the approach of [6], which also describes a tutoring system that uses “weak” AI methods and programming by example. But Koedinger et al. examine user actions instead of states and build up a graph of (correct and faulty) user behavior, the behavior graph (BG). Each edge in this graph represents (one or more) user actions. Solving a task means performing actions which are marked as correct in the BG. If a user leaves these paths of correct actions, he or she gets error messages. The disadvantage of that approach is the fact that all possible actions users can execute while fulfilling a task must be inserted into the BG beforehand. For visual languages, this is difficult, as pointed out before. Even integrating the logged actions of a large number of users into the graph (behavior recording, [7]) cannot solve this problem, because the number of possible sequences to solve a task is too large.

1.3. Semantic Illusion

To avoid the costs and complex problems of building checking systems which work with domain models [3], we focus on checking relatively low-level generic attributes of objects. We do not try to interpret these attributes on a domain level, but confine ourselves to the analysis of connections between objects and their locations on the screen.
Nevertheless, remaining on this lower level of interpretation, we create feedback that appears to the user as if the checking system possessed deep domain knowledge. We call this “as if” behavior “semantic illusion”. For each single case the system is prepared for, it is impossible to distinguish between such a “pseudo tutor” and a full-sized intelligent tutoring system [4]. This approach frees us from building a domain model and thereby makes our system easily portable to new domains. This is an advantage especially for the interaction with our learning environment Cool Modes [8], which is able to work with many visual languages from different domains: the MCC checking system is able to work with all these languages with no or only very little porting effort.
2. New Challenges

Based on the concepts described in the last section, we are developing the following enhancements to the MCC system, which will be explained in detail in Section 5.

2.1. Supporting Non-Experts in Extending the System

When developing new tutoring systems, a problem often mentioned is the fact that domain experts for learning scenarios (e.g. teachers) are normally not experts in AI programming. Thus, teachers are not able to build their own tutoring/checking system, because such computer-related skills are necessary to build one. With the enhancements of the MCC system we overcome the barrier between author and system designer [9], because what a system designer normally has to do on an implementation level at design time (writing code in a programming language) is now broken down to a configuration level and can be done at use time by a domain expert. In this way, we enable a flexible transition from example authoring to system extension.

2.2. Aspect Handling

So far, the MCC system analyzes objects on the level of single attributes (e.g. the color of an object, or its x- and y-position on the screen). To make it easier for users to work with the MCC system, we have now added the concept of aspects. Aspects represent another type of constraint that implements higher-level concepts. Examples of aspects are:
• absolute position of an object on the screen,
• relative position of two objects to each other,
• unique identification of objects,
• connections of an object to other objects.
If a user wants to observe one (or more) of these facets of an object, he or she does not have to deal with low-level parameters, but can simply select the suitable aspect(s), leaving the details to the system. It is easy to combine different aspects (e.g., unique identification and absolute position), and even mixing aspect constraints and other (lower-level) attributes is possible. Additionally, users can make “snapshots” of given object constellations that hold information about one or more aspects of this constellation. This is like using a camera to take a photo of these objects. Such a “snapshot” can then be used as a target constellation for the checking system.
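As a rough illustration of the aspect idea, the following Java sketch shows how an “absolute position” aspect could bundle the low-level parameters x, y, width, and height, and how a snapshot of a live object could serve as a target constellation. All names and the tolerance parameter are hypothetical; this is only a sketch of the concept, not the actual MCC implementation.

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an aspect bundles several low-level attribute constraints
// (here: the "absolute position" aspect built from x, y, width, and height),
// and a snapshot records the current attribute values of an object as a
// target constellation for later checks.
class AbsolutePositionAspect {
    private final Map<String, Integer> target = new HashMap<>();

    // "Snapshot": copy the position-related attributes of a live object.
    static AbsolutePositionAspect snapshotOf(Map<String, Integer> objectAttributes) {
        AbsolutePositionAspect aspect = new AbsolutePositionAspect();
        for (String attribute : new String[] {"x", "y", "width", "height"}) {
            Integer value = objectAttributes.get(attribute);
            if (value != null) {
                aspect.target.put(attribute, value);
            }
        }
        return aspect;
    }

    // Check: the aspect is satisfied if the object is (roughly) where the snapshot put it.
    boolean isSatisfiedBy(Map<String, Integer> objectAttributes, int tolerance) {
        for (Map.Entry<String, Integer> entry : target.entrySet()) {
            Integer current = objectAttributes.get(entry.getKey());
            if (current == null || Math.abs(current - entry.getValue()) > tolerance) {
                return false;
            }
        }
        return true;
    }
}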
Figure 1. Left side: Picture of a complex traffic situation with message boxes generated by the MCC checking system. They describe four of about 20 situations that are observed by the MCC checking system in this scenario. In situations a and b a traffic participant is at a place where he is not allowed to be. In both cases a message box is shown that calls attention to that fact. The text box in situation c appears after the user has moved the car on the big horizontal street from the left of the crossing to the right. The message tells the user that the car in the one-way street that touches the crossing from the top would have had the right of way, although the horizontal street is bigger. Situation d shows a combination of streets nearly identical to situation c. But now there are traffic signs that annul the right-before-left rule. Here, the car on the horizontal road would have had the right of way. Right side: At the top, a condition tree that implements the situation shown at the bottom.
2.3. Process Specification

As mentioned above, we do not observe user actions, but system states. Nonetheless, these states often have to be reached in a certain chronological order. To be able to define such sequences of system states, we have now added a process model that represents sequential dependencies that are to be controlled at a given time. The use case described in Section 3 uses an advanced feature of this process model that allows the definition of rules about system states that are active if given preconditions are fulfilled.
3. An Example Scenario: Traffic Education with Cool Modes

In the following, we describe an example use case for the MCC checking system from the domain of traffic education at primary schools. The scenario is realized by implementing an interactive traffic board, on which users can arrange streets, traffic signs and different kinds of vehicles. This interactive board is realized as a plug-in for the Cool Modes learning environment [8]. The MCC checking system is already integrated into Cool Modes. So, we can use it together with the traffic plug-in (as with any other plug-in) instantly, without further porting effort. The left side of Fig. 1 shows a scenario for the traffic plug-in for Cool Modes. It shows five streets with six crossings in total. Four cars, a truck and a bicycle drive through the streets. Various traffic signs regulate the right of way between the traffic participants. The very existence of this setup makes teaching traffic education easier than with a blackboard. But the plug-in does not only show pictures; it also “knows” the traffic rules that apply to the different locations of the map. The four text boxes in Fig. 1 (left side) show messages that appear when the user violates a traffic rule while he or she
moves a vehicle across the screen. So, the user can move vehicles across the streets of this example and explore all the things that might go wrong when driving a car. Every time he or she violates a rule, the system reports the error to the user. In addition, in many cases the user gets suggestions on how he or she can avoid errors in the future.
4. Imagine Doing this with an Intelligent Checker...

In Section 5 we will see how the MCC checking system implements checking the situations in the example scenario. But first let us consider the problems a checking system would face if it tried to solve these situations based on domain knowledge:
• All relevant traffic rules must be formalized and modelled.
• The system must be able to deal with inaccuracies, e.g. when the user places a car slightly beside a lane. So it must implement some kind of “fuzzy” recognition of situations.
• In the example in Figure 1 the system seems to make guesses about the reasons for errors. So, an intelligent system must add heuristics to generate such tips for the user.
On the other hand, the big advantage of a knowledge-based implementation of the traffic rules (and an important limitation of the MCC system) is that it would work with other street configurations as well, while the approach presented here restricts checking to one single given configuration. Using the “stupid” MCC approach, an author must build a new configuration file for each new street setup. But it is very questionable whether it would be worth the great effort of implementing an intelligent checker with a domain model for this traffic scenario, because scenarios like the one in Figure 1 are complicated (and thus expensive) to model with a rule-driven system. The implementation only pays off if the resulting checker is used for many scenarios, and thus the cost is shared between the different applications. An ad-hoc implementation by a teacher for use at school the next day can be done better and faster using the approach presented in this paper.
5. Solutions

In this section we will see how the MCC checking system produces the “as if” illusion that it knows something about traffic rules. Also, we will explain the new features aspect handling and process specification (cf. Section 2).

5.1. How to Specify a Target Constellation

Although a good part of the highway code is involved in the traffic example presented in the last section, nothing of this code is modelled for the checking facilities of the MCC system. Instead, just the parameters for the location and size of the objects are needed. The right side of Fig. 1 shows how the (semantic, domain-specific) traffic rules are broken down to the level of checking locations: the crossing in the figure involves concepts like STOP and right-of-way signs, in competition with the right-before-left rule. But the concrete situation can also be described with two simple sentences:
• If there is a car at v or w, the car at u is not allowed to drive. (This sentence is modelled by the condition tree at the top of Figure 1, right side.)
• If there is a car at v, the car at w is not allowed to make a left turn.

There is no need to explain the highway code to the system, as long as it can check these two simple sentences. The system does this by simply checking whether there is an (or better: any) object at a specified screen location, highlighted in Fig. 1 (right side). In this way, the system can “recognize” domain-specific situations like right-of-way conditions without knowing anything about cars and streets. Because the analysis only uses methods that are inherited from the superclass of all visual components in Cool Modes (x and y coordinate), there is no need to adjust the checking system for dealing with traffic education. The support of this domain comes for free. For other Cool Modes plug-ins it may be necessary to provide specialized forms of analysis. But even these specialized analysis methods can be added at runtime by configuration, not by implementation.

5.2. Aspect Handling

Using an older version of the MCC checking system [2], a user had to implement an examination of object locations by using low-level parameters like x, y, width, and height. He or she can still do so with the new system, but in most cases this is unnecessary. To provide a more practical, user-oriented way of specifying target constellations, we added aspects to the MCC. An aspect is a new type of constraint that can be used instead of a bundle of (low-level) attributes to realize a higher-level concept. E.g., the concept “absolute position on the screen” is implemented by combining the parameters x, y, width, and height. If a user wants to check the position of an object, he or she does not have to deal with low-level parameters, but can simply select the suitable aspect from a list, even without knowing which parameters in detail are substituted by this aspect. The attributes forming the aspect “absolute position” are quite obvious. Less obvious are the attributes defining the aspect “identification”, which is a collection of attributes that addresses the problem of defining identity in visual languages mentioned in Section 1.1. This aspect does not comprise a fixed set of attributes, but different attributes, depending on the object that is to be identified. To instantly produce a target constellation for a check, users can make snapshots of a given group of objects. While doing so, the system adapts (one or more) aspects to each member of the group of objects and adds the result of this operation to a constraint tree.

5.3. Sequences of Target Constellations

The MCC checking system has the ability to survey not only single target constellations, but also sequences of these. Going back to the traffic example in Figure 1 (right side), we see that the correct handling of the right-of-way situations needs the analysis of two different situations:
• First, the system has to recognize that there is a situation that may cause a violation of the right-of-way rule. When the system recognizes such a situation, the rule is switched on.
• Second, the system must survey whether, with his or her next step, the user actually breaks the rule. Only in this case will feedback be provided. If, on the other hand, the user resolves the situation correctly, the rule is switched off silently.
Figure 2. The right side of this figure shows a graph, in which the nodes on the right side (“Cars at...”) represent a target constellation. Also, each of these nodes can have an output associated with it. The graph realizes a simplified version of the right-of-way rule for the crossing at the left of this figure. At the beginning, the “Start” node is active and surveys the first target constellation (cars at x and y). The target constellation is not fulfilled, and so nothing happens. After moving the car from the top to area x (1), the target constellation is fulfilled. The processor now follows the edge to the next node, which says "Wait for next action". Now the processor surveys the second target constellation (cars at y and z). If the user makes an error now and moves the car from area x to area z (2b) the second target constellation is fulfilled. There is an output connected with this configuration (not shown here) and the user will be informed that he or she made an error concerning the stop sign. Otherwise, if the user (correctly) moves the car from area y (2a), there will be no message. Neither the first, nor the second target constellation is fulfilled any longer (there is just a car left in area x), and so the processor starts again surveying the first target constellation only.
Fig. 2 shows in detail how this sequencing process works. In the use case described in Section 3, about 20 rules like this are active simultaneously. Of course, the “chains” built by surveyed target constellations can be longer than shown in Fig. 2. Here, there is just a precondition and a postcondition. As long as the precondition is fulfilled, the system surveys the postcondition. The sequencing mechanism in the MCC checking system has the same function as the behavior graph in the CTAT environment of [4]. It connects points of interest throughout the checking process and gives them a consistent order. But while the behavior graph is restricted in the way that it only works with sequences of user actions that are defined beforehand, the processor graph is more flexible and provides more freedom to the user: in the example in Fig. 2 the user can do arbitrary actions; but every time he or she produces a situation matching the first target condition, the rule will switch to the active state. Now, again, the user can do arbitrary actions, maybe in other areas of the map, with other cars at other crossings; the rule waits until a relevant state is reached and then reports an error or switches off silently. Compared with this, in any given situation, the behavior graph can only handle user actions which are provided for this concrete situation. Parallelism (the user does something different before continuing his or her actual task) and unexpected user behavior are much more complicated to handle with the behavior graph.
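The following Java sketch illustrates, under simplifying assumptions, how one such rule could be realized: a target constellation is reduced to “some object intersects each required region”, and the rule switches between inactive and active states as described above. All class names, the region representation, and the single-message output are hypothetical and only mirror the mechanism of Fig. 2, not the MCC code.

import java.awt.Rectangle;
import java.util.List;

// Hypothetical sketch of one sequencing rule: while the precondition
// constellation (e.g. "cars at x and y") holds, the rule is active and
// surveys the postcondition constellation (e.g. "cars at y and z");
// if the postcondition is reached, the configured message is reported.
class SequencedRule {
    private final List<Rectangle> precondition;   // regions that must all be occupied to activate the rule
    private final List<Rectangle> postcondition;  // regions that, once all occupied, signal the error
    private final String message;
    private boolean active = false;

    SequencedRule(List<Rectangle> precondition, List<Rectangle> postcondition, String message) {
        this.precondition = precondition;
        this.postcondition = postcondition;
        this.message = message;
    }

    // Target constellation check: is there any object in every required region?
    private static boolean allOccupied(List<Rectangle> regions, List<Rectangle> objectBounds) {
        for (Rectangle region : regions) {
            boolean occupied = false;
            for (Rectangle bounds : objectBounds) {
                if (region.intersects(bounds)) {
                    occupied = true;
                    break;
                }
            }
            if (!occupied) {
                return false;
            }
        }
        return true;
    }

    // Called after every state change; returns a feedback message or null.
    String check(List<Rectangle> objectBounds) {
        if (!active) {
            active = allOccupied(precondition, objectBounds);   // switch the rule on
            return null;
        }
        if (allOccupied(postcondition, objectBounds)) {
            active = false;
            return message;                                     // the user broke the rule
        }
        if (!allOccupied(precondition, objectBounds)) {
            active = false;                                     // resolved correctly: switch off silently
        }
        return null;
    }
}

With about 20 such rules registered for the scenario of Section 3, the checker would simply re-evaluate each rule after every state change.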
6. Conclusion

In this paper we presented the MCC system, a method to check visual language configuration problems without the use of deep domain knowledge and without “strong” AI methods. The MCC system is effective when feedback should be provided for a smaller number of problems in a given domain. Additionally, the system can be customized for new domains by domain (not AI) specialists. The MCC checker has been tested with configuration problems from various domains, e.g. models of mathematical proofs, Petri nets, and context-sensitive help systems. Although constraint satisfaction problems (CSPs) in general can have exponential complexity, the complexity of average MCC configuration files is usually more like O(n²), because most of the constraints are local. So, the system can also handle more complex cases without runtime problems. A limitation of the system is that an author has to create a new configuration file for each new case. The larger the number of cases from a single domain, the more worthwhile it is to invest in the work of building a real ITS based on strong AI methods. But for a teacher who just wants to set up one or two situations for the next day's use, the MCC system is much better suited. Currently, we are building an MCC checker to provide context-sensitive help for a complex visual language concerning stochastic experiments. Another idea (not put into practice yet) is to use an MCC checker as an agent to move the cars in the use case described in this paper. The cars would then move across the traffic setting automatically, behaving in accordance with the highway code but without having any idea of it.
References
[1] K. Gassner, M. Jansen, A. Harrer, K. Herrmann, and U. Hoppe. Analysis methods for collaborative models and activities. In Proceedings of the CSCL 2003, pp. 369–377.
[2] K. Herrmann, U. Hoppe, and N. Pinkwart. A checking mechanism for visual language environments. In Proceedings of the AIED 2003, pp. 97–104.
[3] T. Murray. Authoring intelligent tutoring systems. Int. Journal of AIEd, 10:98–129, 1999.
[4] K. Koedinger, V. Aleven, and N. Heffernan. Essentials of cognitive modeling for instructional design: Rapid development of pseudo tutors. In Proceedings of the ICLS, 2004.
[5] Hot Potatoes. http://web.uvic.ca/hrd/halfbaked/, 2004.
[6] K. Koedinger, V. Aleven, N. Heffernan, B. McLaren, and M. Hockenberry. Opening the door to non-programmers: Authoring intelligent tutor behavior by demonstration. In Proceedings of the ITS, 2004.
[7] B. McLaren, K. Koedinger, M. Schneider, A. Harrer, and L. Bollen. Towards cognitive tutoring in a collaborative web-based environment. In Maristella Matera and Sara Comai, editors, Engineering Advanced Web Applications, Paramus, USA, 2004. Rinton Press.
[8] N. Pinkwart. A plug-in architecture for graph based collaborative modeling systems. In Proceedings of the AIED 2003, pp. 535–536.
[9] G. Fischer and E. Giaccardi. End User Development, chapter Meta-Design: A Framework for the Future of End-User Development. Kluwer Academic Publishers, 2004.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Iterative Evaluation of a Large-Scale, Intelligent Game for Language Learning W. Lewis Johnson, Carole Beal Center for Advanced Research in Technology for Education USC / Information Sciences Institute, 4676 Admiralty Way, Marina del Rey, CA 90292 Abstract. Educational content developers, including AIED developers, traditionally make a distinction between formative evaluation and summative evaluation of learning materials. Although the distinction is valid, it is inadequate for many AIED systems because they require multiple types of evaluation and multiple stages of evaluation. Developers of interactive intelligent learning environments must evaluate the effectiveness of the component technologies, the quality of the user interaction, and the potential of the program to achieve learning outcomes, in order to uncover problems prior to summative evaluation. Often these intermediate evaluations go unreported, so other developers cannot benefit from the lessons learned. This paper documents the iterative evaluation of the Tactical Language Training System, an interactive game for learning foreign language and culture. This project employs a highly iterative development and evaluation cycle. The courseware and software have already undergone six discrete stages of formative evaluation, and further formative evaluations are planned. The paper documents the evaluations that were taken at each stage, as well as the information obtained, and draws lessons that may be applicable to other AIED systems.
Introduction Educational content developers conventionally draw a distinction between formative and summative evaluation of educational materials. Formative evaluation takes place during development; it seeks to understand strengths and amplify them, and understand weaknesses and mend them, before the educational materials are deployed. Summative evaluation is retrospective, to document concrete achievement [5]. Many view formative evaluation as something that should be kept internal to a project, and not published. This is due in part to the belief that formative evaluations need not involve learners. For example, Scriven [6] is frequently quoted as having said: “When the cook tastes the soup, that’s formative; when the guests taste the soup, that’s summative.” Although the formative vs. summative distinction is useful, it does not provide much guidance to AIED developers. AIED systems frequently incorporate novel computational methods, realized in systems that must be usable by the target learners, and which are designed to achieve learning outcomes. These issues all warrant evaluation, and the “cooks” cannot answer the evaluation questions simply by “tasting the soup.” Yet one cannot use summative evaluation methods for this purpose either. Multiple evaluation questions need to be answered, which can involve multiple experiments, large numbers of subjects and large amounts of data. Meanwhile the system continues to be developed, so by the time the evaluation studies are complete they are no longer relevant to the system in its current form. This paper documents the formative evaluation process to date for the Tactical Language Training System (TLTS). This project aims to create computer-based games, incorporating artificial intelligence technology, and each supporting approximately 80 hours of learning.
Given the effort required to create this much content, evaluation with learners could not wait until the summative stage. Instead, a highly iterative formative evaluation process was adopted, involving six discrete evaluation stages so far. Representative users were involved in nearly all stages. Each individual evaluation was small scale, but together they provide an accumulating body of evidence from which to predict that the completed system will meet its design objectives. The evaluation process has enabled the TLTS to evolve from an exploratory prototype to a practical training tool that is about to be deployed on a wide scale. These evaluation techniques may be relevant to other AIED projects that wish to make a smooth transition from the research laboratory to broad-based educational use.

1. Overview of the Tactical Language Training System

The Tactical Language Training System is designed to help people rapidly acquire basic spoken conversation skills, particularly in languages that few foreigners learn because they are considered to be very difficult. Each language training package is designed to give people enough knowledge of language and culture to carry out specific tasks in a foreign country, such as introducing yourself, obtaining directions, and arranging meetings with local officials. The curriculum and software design are focused on the necessary skills for the target tasks, i.e., it has a strong task-based focus [3]. The current curricula focus on the needs of military personnel engaged in civil affairs missions; however, the same method could be applied to any language course that focuses on communication skills for specific situations. Two training courses are being developed so far: Tactical Levantine Arabic, for the Arabic dialect spoken in Lebanon and surrounding countries, and Tactical Iraqi, for the Iraqi Arabic dialect. The TLTS includes the following main components [8]. The Mission Game (Figure 1, left side) is an interactive story-based 3D game where learners practice carrying out the mission. Here the player’s character, at middle left, is introducing himself to a Lebanese man in a café. The player is accompanied by an aide character (far left), who can offer suggestions if the player gets stuck. The Skill Builder (Figure 1, right) is a set of interactive exercises focused on the target skills, in which learners practice saying words and phrases, listening to and responding to sample utterances. A virtual tutor evaluates the learner’s speech and gives feedback that provides encouragement and attempts to overcome learner negative affectivity [7]. A speech-enabled Arcade Game gives learners further practice opportunities. Finally, there is a hypertext glossary that can show the vocabulary in each lesson and the grammatical structure of the phrases being learned, and that explains the rules of grammar that apply to each utterance.
Figure 1. Views of the Tactical Language Training System

2. Evaluation Issues for the TLTS
The stated goal of the TLTS project is to enable learners with a wide range of aptitudes to acquire basic conversational proficiency in the target tasks, in a difficult language such as Arabic, in as little as eighty hours of time on the computer. We believe that achieving this goal requires a combination of curriculum innovations and new and previously untested technologies. This raises a host of evaluation issues and difficulties. It is hard to find existing courses into which TLTS can be inserted for evaluation purposes, because the TLTS curriculum and target population differ greatly from those of a typical Arabic language course. Most Arabic courses place heavy emphasis on reading and writing Modern Standard Arabic, and are designed for high-aptitude learners. The TLTS Arabic courseware focuses on spoken Arabic dialects, and is designed to cater to a wide range of learners with limited aptitude or motivational difficulties. The TLTS employs an innovative combination of gaming and intelligent tutoring technologies; this method needed to be evaluated for effectiveness. It incorporates novel speech recognition [11], pedagogical agent [7] and autonomous agent technologies [14], whose performance must be tested. Because of the large content development commitment, content must be evaluated as it is developed in order to correct design problems as early as possible. It is not even obvious how much content is needed for 80 hours of interaction. Then once the content is developed, additional evaluation questions come up. Standard language proficiency assessments are not well suited for evaluating TLTS learning outcomes. The most relevant assessment is the Oral Proficiency Interview (OPI), in which a trained interviewer engages the learner in progressively more complex dialog in the foreign language. Since TLTS learners apply language to specific tasks, their score on an OPI may depend on the topic that is the focus of the conversation. So-called task-based approaches to assessment [3] may be relevant, but as Bachman [1] notes, it is difficult to draw reliable conclusions about learner proficiency solely on the basis of task-based assessments. Thus TLTS faces a similar problem to other intelligent tutoring systems such as the PUMP Algebra Tutor [9]: new assessment instruments must be developed in order to evaluate skills that the learning environment focuses on. Finally, we need to know what components of the TLTS contribute to learning effectiveness; there are multiple components which may have synergistic effects.

3. Evaluating the Initial Concept

The project began in April of 2003, and focused initially on Levantine Arabic, mainly because Lebanese speakers and data sets are readily available in the United States. Very early on, an interactive PowerPoint mockup of the intended user interaction was developed and presented to prospective stakeholders. This was followed by simple prototypes of the Mission Game and Skill Builder. The Mission Game prototype was created as a “mod” of the Unreal Tournament 2003 game, using the GameBots extension for artificially intelligent characters (http://www.planetunreal.com/gamebots/). It allowed a learner to enter the virtual café shown in Figure 1, engage in a conversation with a character to get directions to the local leader’s house, and then follow those directions toward that house.
The Skill Builder prototype was implemented in ToolBook, with enough lessons to cover the vocabulary needed for the first scene of the Mission Game, although not all lessons were integrated with the speech recognizer. This prototype then was delivered to the Department of Foreign Languages at the US Military Academy (USMA) for formative evaluation. The USMA was a good choice for assisting the evaluation because they are interested in new technologies for language learning, and they have an extensive Arabic language program that provides strong training in spoken Arabic. They assigned an experienced Arabic student (Cadet Ian Strand) to go through the lesson materials, try to carry out the mission in the MPE, and report on the potential value of the software for learning. CDT Strand was not a truly representative user, since he already
knew Arabic and had a high language aptitude. However, he proved to be an ideal evaluator at this stage—he was able to complete the lessons and mission even though the lessons were incomplete, and was able to evaluate the courseware from a hypothetical novice’s perspective. An alternative approach at this stage could have been to test the system in a Wizard-of-Oz experiment. Although Wizard-of-Oz experiments can be valuable for early evaluation [13], they have one clear disadvantage—they keep the prototype in the laboratory, under the control of an experimenter. By instead creating a self-contained prototype with limited functionality, we obtained early external validation of our approach.

4. Adding Functionality, Testing Usability

Several months of further development and internal testing followed. The decentralized architecture of the initial prototypes was replaced with an integrated multi-process architecture [8]. Further improvements were made to the speech recognizer, and the lesson and game content were progressively extended. Then in April 2004 we conducted the next formative evaluation with non-project members. Seven learners participated in this study. Most were people in our laboratory who had some awareness of the project; however, none of them had been involved in the development of the TLTS. Although all had some foreign language training, none of them knew any Arabic. All were experienced computer game players. They were thus examples of people who ultimately should benefit from TLTS, although not truly representative of the diversity of learners that TLTS was designed to support. The purpose of this test was to evaluate the usability and functionality of the system from a user’s perspective. Each subject was introduced to the system by an experimenter, and was videotaped as they spent a one-hour session with the software, using a simplified thinking aloud protocol [13]. Afterwards the experimenter carried out a semi-structured interview, asking the subject about their impressions of different parts of the system. No major usability problems were reported, and none appeared on the videotape. The subjects asserted that they felt the approach was much better than classroom instruction. Subjects who had failed to learn very much in their previous foreign language classes were convinced that they would be able to learn successfully using this approach. The subjects also felt that the game and lesson components supported each other, and that if they had spent more time in the lessons it would have helped their performance in the game. At the same time, a number of problems emerged, both in the instructional design and in the use of the underlying technology. The pronunciation evaluation in the Skill Builder was too stringent for beginners; this created the impression that the primary learning objective was pronunciation instead of communication. The feedback of the pedagogical agent was repetitive and sometimes incorrect. Because we had designed the pedagogical agent to act human-like, instances of repetitive, non-human-like behaviour were especially glaring. Some subjects were unsure of where to go in the game and what to do. There was also a general reluctance to play the game, for fear that it would be too difficult. Once they got to the game, they had difficulty applying the knowledge that they had acquired in the Skill Builder. These evaluations led to system improvements.
The library of tactics employed by the pedagogical agent was greatly extended, the pronunciation accuracy threshold was lowered, and speech recognition performance was improved. More simulated conversation exercises were added to the Skill Builder, to facilitate transfer of knowledge to the Mission Game. An introductory tutorial was created for the Mission Game, in order to help learners get started.

5. A Comparative Test with Representative Users
A more extensive test was then conducted in July of 2004 with representative users. It was structured to provide preliminary evidence as to whether the software design promotes learning, and to identify what parts of the software are most important in promoting learning. The following is a brief overview of this study, which is described in more detail in [2]. Twenty-one soldiers at Ft. Bragg, North Carolina, were recruited for the study. The subjects were divided into four groups, in a 2x2 design. Two groups got both the Skill Builder and Mission Game, two got just the Skill Builder. Two groups got a version of the Skill Builder with pronunciation feedback, two groups got no pronunciation feedback. This enabled us to start to assess the role that tutorial feedback and gameplay might have on learning outcomes. Due to the limited availability of test computers each group only had six hours to work with the software over the course of a week, so learning gains were expected to be limited. The group that worked with the complete system rated it as most helpful, considered it to be superior to classroom instruction, and in fact considered it to be comparable to one-on-one tutoring. On the other hand, the group that got tutorial feedback without the Mission Game scored highest on the post-test. It appeared that the combination of performance feedback and motivational feedback provided by the virtual tutor helped to keep the learners engaged and focused on learning. Some reported that they found the human-like responses to be enjoyable and “cool”. Apparently the shortcomings that the earlier study had identified in the tutorial feedback had been corrected. Another important lesson from this study was how to overcome learners’ reluctance to enter the Mission Game. We found that if the experimenter introduced them directly to the game and encouraged them to try saying hello to one of the characters there, they got engaged, and were more confident to try it. With the assistance of the virtual tutor, many were able to complete the initial scenario in the first session. Improvement was found to be needed in the Mission Game and the post-test. The Mission Game was not able to recognize the full range of relevant utterances that subjects were learning in the Skill Builder. This, and the fact that there is only a limited range of possible outcomes of the game when played in beginner mode, gave learners the impression that they simply needed to memorize certain phrases to get through the game. After the first day the subjects showed up with printed cheat-sheets that they had created, so they could even avoid memorization. We concluded that the game would need to support more variability in order to be effective. On the evaluation side, we were concerned that the post-test that we used was based on the Skill Builder content, so that it did not really test the skills that learners should be acquiring in the game, namely to carry on a conversation. We subsequently made improvements to the Mission Game language model and interaction so that there was more variability in game play. We also came up with a way to make the post-test focus more on conversational proficiency: to use the Mission Game as an assessment vehicle. If the virtual tutor in the game is replaced by another character who knows no Arabic, the learner is then forced to perform the task unassisted. If they can do this, it demonstrates that they have mastered the necessary skills, at least in that context.
To make this approach viable, it would be necessary to log the user’s interaction with the software. Therefore logging capabilities were added to enable further analysis of learner performance. 6. A Longer-Term Test with Representative Users Once these and other improvements were made to the system, and more content was added, another test was scheduled at Ft. Bragg, in October, 2004. This time the focus was on the following questions. (1) How quickly do learners go through the material? (2) How proficient are they when they complete the material? (3) How do the subjects’ attitudes and motivation affect performance, and vice versa? Question 1 was posed to extrapolate from the work completed so far and estimate how much additional content would be required to complete an
80-hour course. Question 2 was posed to assess progress toward achieving task-based conversational proficiency. In particular, we wanted to assess whether our proposed approach of using the Mission Game as an assessment tool was workable. Question 3 was of interest because we hypothesized that the benefits of TLTS result in part from improved learner motivation, both from the game play and from the tutorial feedback. For this study, rather than break the subjects into groups, we assembled just one group of six subjects, and monitored them through three solid days of work with the program followed by a post-test. They were also soldiers, with a range of different aptitudes and proficiencies, although being members of the US Army Special Forces their intelligence was greater than that of the average soldier. Their ages ranged from 20 to 46 years, and all had some foreign language background; one even had some basic training in Modern Standard Arabic. Not surprisingly, all subjects in this study performed better than in the previous study, and performance was particularly good on vocabulary recognition and recall, understanding conversations, and simulated participation in conversations. They were also able to perform well in the Mission Game when employed as an assessment tool. They made better use of the Mission Game, and did not rely on cheat sheets this time. Overall, the utility of the Mission Game was much more apparent this time. Although most of the subjects did well, two had particular difficulties. One was the oldest subject, who repeatedly indicated that he was preparing to retire from the military soon and had little interest in learning a difficult language that he would never use. The other subject expressed a high degree of anxiety about language learning, and that anxiety did not significantly abate over the course of the study. Meanwhile, other problems surfaced. The new content that had been introduced in time for this evaluation still had some errors, and the underlying software had some bugs that impeded usability. The basic problem was that once the evaluation was scheduled, and subjects were accrued, it was impossible to postpone the test to perform further debugging. Given the choice between carrying out the test with a buggy version of the program and cancelling it altogether, the better choice was to go ahead with the evaluation and make the best of it. Another problem came up during analysis of the results: the log files that were collected proved to be very difficult to use. Questions that were easy to pose, e.g., “How long did each subject take on average per utterance in the final MPE test scene?” in fact proved to be difficult to answer. The log files that the TLTS generated had not been constructed in such a way as to facilitate the kinds of analyses that we subsequently wanted to perform. In a sense we relearned the lesson that other researchers have identified regarding interaction logs [10]: that log analysis is more than data collection, and attention must be paid both to the design of the logging facility and to the tools that operate on the resulting logs. Fortunately our iterative evaluation approach enabled us to learn this lesson quickly and correct the situation before subsequent evaluations.
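As an illustration of that lesson (and not of the actual TLTS log format, which is not documented here), a per-utterance log record designed around the intended analyses might look like the following Java sketch; all field names are hypothetical.

// Hypothetical per-utterance log record: storing the scene, the utterance index
// and explicit timestamps makes questions such as "how long did each subject
// take on average per utterance in the final test scene?" a simple aggregation.
class UtteranceLogRecord {
    final String subjectId;
    final String sceneId;
    final int utteranceIndex;
    final long startTimeMillis;
    final long endTimeMillis;
    final String recognizedText;

    UtteranceLogRecord(String subjectId, String sceneId, int utteranceIndex,
                       long startTimeMillis, long endTimeMillis, String recognizedText) {
        this.subjectId = subjectId;
        this.sceneId = sceneId;
        this.utteranceIndex = utteranceIndex;
        this.startTimeMillis = startTimeMillis;
        this.endTimeMillis = endTimeMillis;
        this.recognizedText = recognizedText;
    }

    long durationMillis() {
        return endTimeMillis - startTimeMillis;
    }
}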
7. Formative Evaluation of Tactical Iraqi

After having responded to the lessons learned from the previous test and corrected some of the errors in the Levantine Arabic content, we temporarily put Levantine Arabic aside and focused on developing new content for Iraqi Arabic. There was a political reason for this (the desire to do something to improve the political situation in Iraq), a technical reason (to see if the TLTS was generalizable to new languages), and a pedagogical reason (to see if our understanding of how to develop content for the TLTS had progressed to the point where we could develop new courses quickly). Iraqi Arabic is substantially different from Levantine Arabic, and Iraqi cultural norms differ from Lebanese cultural norms. Nevertheless our technical and pedagogical progress was such that by January 2005 we had a version of
Tactical Iraqi ready for external formative evaluation that was already better developed than any of the versions of Tactical Levantine Arabic that had been developed to date. During January we sent out invitations to US military units to send personnel to our laboratory to attend a seminar on the installation and use of Tactical Iraqi, and to take the software back with them to let other members of their units use it. Three units sent representatives. It was made clear to them that Tactical Iraqi was still undergoing formative evaluation, and that they had critical roles to play in support of the formative evaluation. During the seminar the participants spent substantial amounts of time using the software and gave us their feedback; meanwhile their interaction logs and speech recordings were collected and used to further train the speech recognizer and to identify and correct program errors. All participants were enthusiastic about the program, and two of the three installed it at their home sites and solicited the assistance of other members of their unit in beta testing. Logs from these interactions were sent back to CARTE for further analysis.
Periodic testing continued through the spring of 2005, and two more training seminars were held. A US Air Force officer stationed in Los Angeles volunteered to pilot test the entire course developed to date in May. This will be followed in late May by a complete learning evaluation of the content developed to date, at Camp Pendleton, California. Fifty US Marines will complete the Tactical Iraqi course over a two-week period, and then complete a post-test. All interaction data will be logged and analyzed. Camp Pendleton staff will informally compare the learning gains from this course against learning from their existing classroom-based four-week Arabic course.
During this test we will employ new and improved assessment instruments. Participants will complete a pre-test, a periodic instrument to assess their attitudes toward learning, and a post-test questionnaire. The previous learning assessment post-test has been integrated into the TLTS, so that the same mechanism for collecting log files can also be used to collect post-test results. We have created a new test scene in the Mission Game in which the learner must perform a similar task, but in a slightly different context. This will help determine whether the skills learned in the game are transferable. We will also employ trained oral proficiency interviewers to assess the learning gains, so that we can compare these results against the ones obtained within the program.
Although this upcoming evaluation is for higher stakes, it is still formative. The content for Tactical Iraqi is not yet complete. Nevertheless, it is expected that the Marines will make decisions about whether to incorporate Tactical Iraqi into their language program. Content development for the current version of Tactical Iraqi will end in June 2005, and summative evaluations at West Point and elsewhere are planned for the fall of 2005.

8. Summary

This article has described the formative evaluation process that was applied in the development of the Tactical Language Training System. The following is a summary of some of the key lessons learned that may apply to other AIED systems of similar scale and complexity:
• Interactive mock-ups and working prototypes should be developed as early as possible.
• Initial evaluations should, if possible, involve selected individuals who are not themselves target users but can offer a target user's perspective and are able to tolerate gaps in the prototype.
• Preliminary assessments of usability and user impressions should be conducted early if possible, and repeated if necessary, in order to identify problems before they have an impact on learning outcomes.
• In a complex learning environment with multiple components, multiple small-scale evaluations may be required until all components prove to be ready for use.
• Design requirements are likely to change based on lessons learned from earlier formative evaluations, which in turn calls for further formative evaluation to validate them.
Mostow [10] has observed that careful evaluation can be onerous, and for this reason researchers tend to avoid it or delay it until the end of a project. An iterative evaluation method is infeasible if it involves a series of onerous evaluation steps. Instead, this paper illustrates an approach where each evaluation is kept small, in terms of numbers of subjects, time on task, and/or depth of evaluation. The individual studies may yield less in the way of statistically significant results than large-scale evaluations do, but the benefit is that evaluation can be tightly coupled into the development process, yielding a system that is more likely to achieve the desired learning outcomes when it is complete. The experience gained in the formative pilot evaluations will moreover make it easier to measure those outcomes.

Acknowledgments

This project is part of the DARWARS Training Superiority Program of the Defense Advanced Research Projects Agency. The authors wish to acknowledge the contributions of the members of the Tactical Language Team. They also wish to thank the people at the US Military Academy, Special Operations Foreign Language Office, 4th Psychological Operations Group, Joint Readiness Training Center, 3rd Armored Cavalry Division, 7th Army Training Command, and Marine Corps Expeditionary Warfare School for their assistance in the evaluations described here.

References
[1] Bachman, L.F. (2002). Some reflections on task-based language performance assessment. Language Testing 19(3), 461-484.
[2] Beal, C., Johnson, W.L., Dabrowski, R., & Wu, S. (2005). Individualized feedback and simulation-based practice in the Tactical Language Training System: An experimental evaluation. AIED 2005. IOS Press.
[3] Bygate, M., Skehan, P., & Swain, M. (2001). Researching pedagogic tasks: Second language learning, teaching, and testing. Harlow, England: Longman.
[4] Corbett, A.T., Koedinger, K.R., & Hadley, W.S. (2002). Cognitive Tutors: From research classroom to all classrooms. In P. Goodman (Ed.), Technology enhanced learning: Opportunities for change. Mahwah, NJ: Lawrence Erlbaum Associates.
[5] The Center for Effective Teaching and Learning, University of Texas at El Paso. Formative and summative evaluation. http://cetal.utep.edu/resources/portfolios/form-sum.htm.
[6] Scriven, 1991, cited in "Summative vs. Formative Evaluation", http://jan.ucc.nau.edu/edtech/etc667/proposal/evaluation/summative_vs._formative.htm
[7] Johnson, W.L., Wu, S., & Nouhi, Y. (2004). Socially intelligent pronunciation feedback for second language learning. ITS '04 Workshop on Social and Emotional Intelligence in Learning Environments.
[8] Johnson, W.L., Vilhjálmsson, H., & Marsella, S. (2004). The DARWARS Tactical Language Training System. Proceedings of I/ITSEC 2004.
[9] Koedinger, K.R., Anderson, J.R., Hadley, W.M., & Mark, M.A. (1997). Intelligent tutoring goes to school in the big city. IJAIED, 8, 30-43.
[10] Mostow, J. (2004). Evaluation purposes, excuses, and methods: Experience from a Reading Tutor that listens. In C.K. Kinzer & L. Verhoeven (Eds.), Interactive Literacy Education: Facilitating Literacy Environments Through Technology. Mahwah, NJ: Erlbaum.
[11] Mote, N., Johnson, W.L., Sethy, A., Silva, J., & Narayanan, S. (2004). Tactical language detection and modeling of learning speech errors: The case of Arabic tactical language training for American English speakers. InSTIL/ICALL Symposium, Venice, Italy.
[12] Nielsen, J. (1994). Guerrilla HCI: Using discount usability engineering to penetrate the intimidation barrier. http://www.useit.com/papers/guerrilla_hci.html
[13] Rizzo, P., Lee, H., Shaw, E., Johnson, W.L., Wang, N., & Mayer, R. (2005). A semi-automated Wizard of Oz interface for modeling tutorial strategies. UM'05.
[14] Si, M. & Marsella, S. (2005). Thespian: Using multiagent fitting to craft interactive drama. AAMAS 2005.
Cross-Cultural Evaluation of Politeness in Tactics for Pedagogical Agents

W. Lewis Johnson (1), Richard E. Mayer (2), Elisabeth André (3), Matthias Rehm (3)
(1) USC / Information Sciences Institute, 4676 Admiralty Way, Marina del Rey, CA 90292
(2) University of California, Santa Barbara
(3) University of Augsburg, Institute for Computer Science, Eichleitnerstr. 30, Germany
Abstract. Politeness may play a role in tutorial interaction, including promoting learner motivation and avoiding negative affect. Politeness theory can account for this as a means of mitigating the face threats arising in tutorial situations. It further provides a way of accounting for differences in politeness across cultures. Research in social aspects of human-computer interaction predicts that similar phenomena will arise when a computer tutor interacts with learners, i.e., computer tutors should exhibit politeness, and the degree of politeness may be culturally dependent. To test this hypothesis, a series of experiments was conducted. First, American students were asked to rate the politeness of possible messages delivered by a computer tutor. The ratings were consistent with the conversational politeness hypothesis, although they depended upon the level of computer literacy of the subjects. Then, the materials were translated into German, in two versions: a polite version, using the formal pronoun Sie, and a familiar version, using the informal pronoun Du. German students were asked to rate these messages. The ratings of the German students were highly consistent with those given by the American subjects, and the same pattern was found across both pronoun forms.
1. Introduction

Animated pedagogical agents are capable of rich multimodal interactions with learners [6, 14]. They exploit people's natural tendency to relate to interactive computer systems as social actors [16], responding to them as if they had human qualities such as personality and empathy. In particular, pedagogical agents are able to perform affective and motivational scaffolding [2, 4, 9]. Educational researchers have increasingly called attention to the role of affect and motivation in learning [13, 17] and the role of expert tutoring in promoting affective and motivational states that are conducive to learning [11, 12]. Pedagogical agents are being developed that emulate motivational tutoring tactics, and they can positively affect learner attitudes, motivational state, and learning gains [18].
We use the politeness theory of Brown and Levinson [3] as a starting point for modelling motivational tactics. Politeness theory provides a general framework for analyzing dialog in social situations, and in particular the ways in which speakers mitigate face threats. When human tutors interact with learners they constantly risk threatening the learner's face, by showing disapproval or taking control away from the learner. They can also enhance learner face by showing approval and respect for the learner's choices. This in turn can have an impact on the learner's attitude and motivation. Johnson et al. [10] have developed a model for characterizing tutorial dialog moves in terms of the amount of face threat redress they exhibit, and implemented it in a tutorial tactic generator that can vary the manner in
which a tutorial dialog move is realized depending upon the degree of attention paid to the learner's face and motivational state.
An interesting aspect of Brown and Levinson's theory is that it applies to all languages and cultures. Every language has a similar set of methods for mitigating face threat; however, not all cultures ascribe equal importance to each type of face threat. Using politeness theory as a framework, it is possible to create tutorial tactics in multiple languages and compare them to assess their impact in different cultures. This paper presents a study that performs just such a comparison. German subjects evaluated the degree of face threat mitigation implied by a range of tutorial tactics for a pedagogical agent. These ratings were compared against similar ratings by American subjects of pedagogical agent tactics in English. The ratings by the two groups of subjects were in very close agreement. Use of formal vs. informal pronouns, a cardinal indicator of formality in German, did not have a significant effect on ratings of face threat mitigation. These results have implications for efforts to adapt pedagogical agents for other languages and cultures, or to create multilingual pedagogical agents (e.g., [8]).

2. Background: Politeness Theory and Tutorial Dialog

An earlier study analyzed the dialog moves made by a human tutor working with learners on a computer-based learning environment for industrial engineering [7]. It was found that the tutor very rarely gave the learners direct instructions as to what to do. Instead, advice was phrased indirectly in the form of questions, suggestions, hints, and proposals. Often the advice was phrased as a proposal of what the learner and tutor could do jointly (e.g., "So why don't we go back to the tutorial factory?"), when in reality the learner was carrying out all of the actions. Overall, tutorial advice was found to fall into one of eight categories: (1) direct commands (e.g., "Click the ENTER button"), (2) indirect suggestions (e.g., "They are asking you to go back and maybe change it"), (3) requests, (4) actions expressed as the tutor's goals (e.g., "Run your factory, that's what I'd do"), (5) actions as shared goals, (6) questions, (7) suggestions of student goals (e.g., "You will probably want to look at the work centres"), and (8) Socratic hints (e.g., "Well, think about what you did.").
Brown and Levinson's politeness theory provides a way to account for these indirect tutorial dialog moves. According to politeness theory, all social actors have face wants: the desire for positive face (being approved of by others) and the desire for negative face (being unimpeded by others). Many conversational exchanges between people (e.g., offers, requests, commands) potentially threaten positive face, negative face, or both. To avoid this, speakers employ various types of face threat mitigation strategies to reduce the impact on face. Strategies identified by Brown and Levinson include positive politeness (emphasizing approval of the hearer), negative politeness (emphasizing the hearer's freedom of action, e.g., via a suggestion) and off-record statements (indirect statements that imply that an action is needed). The eight categories listed above fit naturally as subcategories of Brown and Levinson's taxonomy, and can be understood as addressing the learner's positive face, negative face, or both.
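The generator described later in this section relies on tactics annotated with how much redress they give to positive and negative face. As a rough, illustrative sketch only (made-up ratings and phrasings, not the paper's calibrated values), such annotations and a simple selection rule might look like this:

```python
# Illustrative sketch: tutorial tactics annotated with how much redress they
# give to the learner's positive and negative face (scale 1-7, invented values),
# plus a toy selector that picks the tactic closest to a requested redress level.
TACTICS = [
    {"text": "Click the ENTER button.",              "pos": 2.5, "neg": 1.8},
    {"text": "You could click the ENTER button.",    "pos": 3.5, "neg": 4.5},
    {"text": "Why don't we click the ENTER button?", "pos": 5.2, "neg": 3.3},
    {"text": "What do you think the next step is?",  "pos": 4.8, "neg": 5.9},
]

def choose_tactic(target_pos: float, target_neg: float) -> dict:
    """Return the tactic whose (pos, neg) redress is closest to the target."""
    return min(TACTICS, key=lambda t: (t["pos"] - target_pos) ** 2 +
                                      (t["neg"] - target_neg) ** 2)

print(choose_tactic(target_pos=5, target_neg=3)["text"])
# -> "Why don't we click the ENTER button?"
```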
In this corpus positive face is most often manifested by shared goals (the willingness to engage in shared activity with someone implies respect for that person's contributions). We hypothesize that tutors adjust their modes of address with learners not just to mitigate face threat, but also to enhance the learners' sense of being approved of and free to make their own choices. These in turn can influence the learners' self-confidence, factors that researchers on motivation (e.g. [12]) have found to have an impact on learner motivation. Based on this analysis, Johnson and colleagues [11] developed a tutorial dialog generator that automatically selects an appropriate form for a tutorial dialog move, based on
the social distance between the tutor and the learner, the social power of the tutor over the learner, the degree of influence the tutor wishes to have on the learner's motivational state, the type of face threatening action, and the degree of face threat mitigation afforded by each type of tutorial dialog move. The dialog generator utilizes a library of tutorial tactics, each of which is annotated according to the amount of redress that tactic gives to the learner's positive face and negative face. Once each tactic is annotated in terms of negative and positive face, the generator can choose appropriate tactics automatically.
To make this scheme work, it is necessary to obtain appropriate positive politeness and negative politeness ratings for each tactic. These ratings were obtained using an experimental method described in [13]. Two groups of instances of each of the eight tactic categories were constructed (see appendix). One set, the A group, consisted of recommendations to click the ENTER button on a keyboard. The B group consisted of suggestions to employ the quadratic formula to solve an equation. Two different types of advice were given in case the task context influences the degree of face threat implied by a particular suggestion. These advice messages were then presented to 47 experimental subjects at the University of California, Santa Barbara (UCSB), who were told to evaluate them as possible messages given by a computer tutor. Each message was rated according to the degree to which it expressed respect for the user's choices (negative politeness) and a feeling of working with the user (positive politeness). The main findings were as follows:
• With this experimental instrument, subjects ascribed degrees of positive and negative politeness with a high degree of consistency;
• The rankings of the ratings were consistent with the rankings proposed by Brown and Levinson, suggesting that the subjects ascribed politeness to the computer tutor as if it were a social actor;
• The task context did not have a significant effect on the politeness ratings;
• Ratings of politeness did depend upon the amount of computer experience of the subjects: experienced computer users were more tolerant of impolite tutor messages than novice computer users were.
Based upon these findings, it was concluded that politeness theory could be validly applied to dialog with a pedagogical agent, and that the average ratings for each type of tactic obtained from the study could be used to calibrate the tutorial tactic generator, possibly adjusting for the level of computer experience of the user.

3. Experimental Evaluation of Politeness in German

Having successfully applied politeness theory to the choice of tutorial tactics in English, we then considered the question of whether it might equally apply to tutorial tactics in German. Politeness theory is claimed by Brown and Levinson to apply to dialog in all languages and cultures; however, not all cultures attribute the same degree of face threat to a given face threatening act. We therefore attempted to replicate the UCSB study in German.
We anticipated that the ratings given by German subjects might differ from the American ratings for any of the following reasons:
• Politeness theory might not apply cross-culturally to human-computer interaction as it does to human-human interaction;
• Certain face threats might be regarded as more serious in one culture than in the other;
• Human tutors in Germany might have different power or social distance relationships with their students, affecting the amount of face threat that learners tolerate;
• Translating the messages into German might introduce cultural issues that are absent in English and yet have an impact on perceived politeness.
The participants for the German experiments were 83 students from Augsburg University. Thirty-nine students were recruited from the Philosophy department and 44 from the Computer Science department. One subject indicated using a computer 1 to 5 hours per week, 11 indicated 5 to 10 hours per week, 26 indicated 10 to 20 hours per week, and 45 indicated more than 20 hours per week. The mean age of the subjects was 22.8 years (SD = 1.997). There were 37 women and 46 men. Seventy-eight of the 83 students reported German as their native language.
For the German experiment, we devised a German version of the original English questionnaire. We tried to find translations that closely matched the original English documents but nevertheless sounded natural to native speakers of German. During the translation, the question arose of how to translate the English "you". There are different ways of saying "you" in German depending on the degree of formality: the more familiar "Du" is used when talking to close friends, relatives or children, while people tend to use the more formal "Sie" when talking to adults they do not know very well or to people of high status. Whether to use "Sie" or "Du" can be a difficult problem for both native speakers of German and foreigners. On the one hand, the "Du" address form might be considered impolite or even abusive. On the other hand, switching to the "Sie" address form may be interpreted as a sign that the interlocutor wishes to maintain distance. A German waiter in a pub mostly frequented by young people faces a dilemma when serving an older customer: some customers might consider the "Du" disrespectful, while others might be irritated by the "Sie", since it makes them aware of the fact that they belong to an older age group. Similar dilemmas may occur in the academic context. Most German professors would use "Sie" when addressing undergraduates, but "Du" is common as well.
Since address forms are an important means of conveying in-group membership (see also [3]), we expected that the use of "Sie" or "Du" might have an impact on the students' perception of politeness. In particular, we assumed that the students might perceive an utterance as more cooperative if "Du" is used (positive politeness). Furthermore, the students might feel under higher pressure to perform a task if the teacher conveys more authority (negative politeness). To investigate these questions, we divided the subjects into two groups: 37 students were presented with the more formal "Sie" version and 46 students with the more familiar "Du" version of the questionnaire. That is, the variable "address form" was manipulated between subjects, while comparisons concerning types of statements were within-subject comparisons.
Do the two kinds of politeness rating correspond for the English and the German version? Table 1 gives the mean ratings for each of the 16 sentences for the English and the German experiment on the rating scales for negative and positive politeness. Items were rated on a scale from 1 (least polite) to 7 (most polite). The items are listed in order of negative/positive politeness for the US condition. As in the US experiment, the most impolite statements are direct commands and commands attributed to the machine, whereas the most polite statements are guarded suggestions and "we" constructions that indicate a common goal.
For set B, there are just two permutations between neighbouring positions (B1 ↔ B2, B6 ↔ B7) in the case of positive politeness; in the case of negative politeness the order of the set B statements coincides completely. For set A, the order of the statements differs to a greater degree. In particular, item A5 received a much lower rating for negative politeness in Germany than in the US. We suggest the reason is that the utterance "Drücken wir die ENTER Taste" ("Let's click the ENTER button.") sounds rather patronizing in German, which might
have evoked in the students the feeling that the agent does not respect their freedom. This patronizing impression engendered by the first person plural is not unique to German; for example, in English adults sometimes use this form when giving commands to children (e.g., "OK, Johnnie, let's go to bed now"). Nevertheless, the effect was obviously stronger for the German version, but interestingly it only occurred for negative politeness. Both the American and the German subjects gave A5 the highest rating in terms of positive politeness.

Mean ratings for negative politeness (US and Germany):
Item:  A1    A2    A3    A4    A5    A6    A7    A8    B1    B2    B5    B4    B3    B8    B6    B7
US:    1.75  2.72  2.89  3.17  3.34  4.28  4.51  5.85  1.79  2.75  3.26  3.32  3.79  4.11  4.70  4.83
D:     1.42  2.70  2.65  3.70  1.93  4.35  4.06  5.49  1.43  2.10  3.31  3.76  4.08  4.17  4.60  5.39

Mean ratings for positive politeness (US and Germany):
Item:  A1    A2    A4    A3    A6    A8    A7    A5    B2    B1    B4    B8    B3    B6    B7    B5
US:    2.53  2.94  3.32  3.85  4.09  4.11  4.83  5.17  3.06  3.09  4.04  4.43  4.79  4.89  4.95  5.26
D:     3.04  2.87  3.98  3.28  4.72  4.83  4.48  4.87  2.45  2.41  4.27  4.27  5.04  5.23  5.20  5.66

Table 1: Comparison of the Experimental Results Obtained in the US and in Germany
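The cross-language agreement reported next can be re-derived from the Table 1 means. A minimal Python sketch (values transcribed from the table above; assuming the reported correlations were computed over these 16 item means, the output should come out close to the figures quoted below):

```python
# Pearson correlation between US and German mean ratings (Table 1 values).
us_neg = [1.75, 2.72, 2.89, 3.17, 3.34, 4.28, 4.51, 5.85,
          1.79, 2.75, 3.26, 3.32, 3.79, 4.11, 4.70, 4.83]
de_neg = [1.42, 2.70, 2.65, 3.70, 1.93, 4.35, 4.06, 5.49,
          1.43, 2.10, 3.31, 3.76, 4.08, 4.17, 4.60, 5.39]
us_pos = [2.53, 2.94, 3.32, 3.85, 4.09, 4.11, 4.83, 5.17,
          3.06, 3.09, 4.04, 4.43, 4.79, 4.89, 4.95, 5.26]
de_pos = [3.04, 2.87, 3.98, 3.28, 4.72, 4.83, 4.48, 4.87,
          2.45, 2.41, 4.27, 4.27, 5.04, 5.23, 5.20, 5.66]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

print("negative politeness r =", round(pearson(us_neg, de_neg), 3))
print("positive politeness r =", round(pearson(us_pos, de_pos), 3))
```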
Overall, the Pearson correlation between the US and German ratings of positive politeness for the 16 statements is r = .926, which is highly significant (p < .001). The correlation between the US and German ratings of negative politeness for the 16 statements is r = .893, which is also highly significant (p < .001). We can therefore conclude that German and American users responded to the politeness level of our statements in the same way. An analysis of variance conducted on the 8 items revealed that the ratings differed significantly from each other for negative politeness (F(7,574) = 100.6022, p < .001).
J.Z. Sukkarieh and S.G. Pulman / Information Extraction and Machine Learning

<fertilised_egg> + {<split> ; <divide>} + {in, into} + <two>, where
  <fertilised_egg> = NP with the content of 'fertilised egg'
  <singular_det> = {the, one, 1, a, an}
  <split> = {split, splits, splitting, has split, etc.}
  <divide> = {divides, which divide, has gone, being broken, ...}
  <two> = {two, 2, half, halves}, etc.
The pattern basically is all the paraphrases collapsed into one. It is essential that the patterns use the linguistic knowledge we have at the moment, namely the part-of-speech tags, the noun phrases and the verb groups. In our previous example, the requirement that <fertilised_egg> is an NP will exclude something like 'one sperm has fertilized an egg' while accepting something like 'an egg which is fertilized ...'. The patterns or templates (we use the terms interchangeably here, although in some applications it makes sense to distinguish them), i.e. the rules that select from each text the information relevant to the task, are built from training data in one of the following ways. In each case we need to devise a language or a grammar to represent these rules.
Before describing the methods and the results, we need to state which shallow linguistic properties we are considering and how we 'extract' them. We have relied on part-of-speech tagging and information on noun phrases and verb groups in the data. We used a Hidden Markov Model part-of-speech (HMM POS) tagger trained on the Penn Treebank corpus, and a Noun Phrase (NP) and Verb Group (VG) finite state machine (FSM) chunker to provide the input to the information extraction pattern matching phase. The NP network was induced from the Penn Treebank, and then tuned by hand. The Verb Group FSM (i.e. the Hallidayean constituent consisting of the verbal cluster without its complements) was written by hand. Shallow analysis makes mistakes, but multiple sources help fill gaps, and in IE this is adequate most of the time. The general-purpose lexicon contains words with corresponding tags from the British National Corpus and the Wall Street Journal corpus. Building the domain-specific lexicon is obviously an on-going process.

1.1 Manually-Engineered Patterns

A person writes the knowledge needed to extract information, in the form of grammars and rules. The three crucial steps to take in writing extraction rules by hand can be found, among other references on information extraction, in Appelt and Israel (1999). First, all the ways in which the target information is expressed in a given corpus are determined. Second, all the plausible variants of these ways are considered and then written in appropriate patterns. We first describe the grammatical formalism with which we wrote the patterns. A pattern takes the form Id :: LHS ==> RHS, where Id can be a complex term to categorise patterns into groups and subgroups. LHS is a Cat, where Cat is a (linguistic) category like NP, VG, Det, etc., or one that is user-defined. RHS is a list of Elements, where each element may be followed by a condition, and Elements are defined as follows:
Element ==> Variable
          | Word/Cat
          | c(Cat)
          | ?(Element)              (optional element)
          | (Element ; Element)     (disjunction)
          | W(Word)
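To make the formalism concrete, the following is a toy sketch (a hypothetical representation, not the authors' implementation) of how such pattern elements might be encoded and matched against a tagged, chunked answer:

```python
# Minimal sketch: pattern elements as predicates over (category, text) chunks.
from typing import List, Tuple

Chunk = Tuple[str, str]           # (category, text), e.g. ("NP", "an egg")

def word(w):                      # W(Word): match a literal word
    return lambda chunk: chunk[1].lower() == w

def cat(c):                       # c(Cat): match any chunk of category Cat
    return lambda chunk: chunk[0] == c

def disj(*elems):                 # (Element ; Element): any alternative matches
    return lambda chunk: any(e(chunk) for e in elems)

def match(pattern: List, chunks: List[Chunk], optional=()) -> bool:
    """Match pattern elements against chunks in order; indices in `optional`
    may be skipped (a crude stand-in for ?(Element))."""
    i = 0
    for j, elem in enumerate(pattern):
        while i < len(chunks) and not elem(chunks[i]):
            i += 1                # allow intervening material between elements
        if i == len(chunks):
            return j in optional  # an unmatched element is only OK if optional
        i += 1
    return True

# Example: "an egg which is fertilised ... splits into two"
answer = [("NP", "an egg"), ("VG", "is fertilised"), ("VG", "splits"),
          ("Prep", "into"), ("NP", "two")]
pattern = [cat("NP"), cat("VG"), disj(word("into"), word("in")), cat("NP")]
print(match(pattern, answer))     # True
```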
The first step in the pattern matching algorithm is that all patterns are compiled. Afterwards, when an answer arrives for pattern matching, it is first tagged and all phrases (i.e. verb groups (VG) and noun phrases (NP)) are found. These are then compared with each element of each compiled pattern in turn, until either a complete match is found or all patterns have been tried and no match has been found. The grammar went through stages of improvement ([13], [14]), starting from words, disjunctions of words, sequences of words, etc., up to the version described above. We also experimented with different numbers of answers used as training data for
different questions and, on average, we have achieved 84.5% agreement with the examiners' scores. Note that the full mark for each question ranges between 1 and 4.

Table 1. Results using the manually-written approach

Question    Full Mark    Agreement (%)
1           2            89.4
2           2            91.8
3           2            84
4           1            91.3
5           2            76.4
6           3            75
7           1            95.6
8           4            75.3
9           2            86.6
Average     --           84
Table 1 shows the results using the last version of the grammar/system on 9 questions in the GCSE biology exams (a demo of the system is available). For each question, we trained on 80% of the positive instances, i.e. answers where the mark was > 0 (as should be done), and tested on the positive and negative instances. In total, we had around 200 instances for each question. The results reported are the ones we obtained before we incorporated the spelling corrector into the system and before including rules to avoid some over-generation. Also, we are in the process of fixing a few NP and VG formations and negations of verbs, and all this should make the percentages higher. Given some inconsistency in the marking, examiners' mistakes, and the decisions that we had to make on what we should consider correct or not, independently of a domain expert, an 84% average is a good result. Hence, though some of the results look disappointing, the discrepancy between the system and the examiners is not very significant. Furthermore, this agreement is calculated on the whole mark and not on individual sub-marks. This obviously makes the results look worse than the system's actual performance (for more details on the issues the system faces, the mistakes it makes and their implications, please consult the authors). In the following section, we describe another approach we used for our auto-marking problem.

1.2 Automatic Pattern Learning

The approach just described requires skill, much labour, and familiarity with both the domain and the tools. To save time and labour, various researchers have investigated machine-learning approaches to learning IE patterns. This requires many examples with the data to be extracted, and then the use of a suitable learning algorithm to generate candidate IE patterns. One family of methods for learning patterns requires a corpus to be annotated, at least to the extent of indicating which sentences in a text contain the relevant information for particular templates (e.g. [11]). Once annotated, groups of similar sentences can be grouped together, and patterns abstracted from them. This can be done by taking a partial syntactic analysis, and then combining phrases that partially overlap in content and deriving a more general pattern from them. All that is needed is people familiar with the domain to annotate the text. However, it is still a laborious task. Another family of methods, more often employed for the named entity recognition stage, tries to exploit redundancy in un-annotated data (e.g. [5]). Previously, in [14], we said that we did not want to manually categorise answers into positive or negative instances, since this is a laborious task, and that we would only consider the
sample of human marked answers that have effectively been classified into different groups by the mark awarded. However, in practice the noise in these answers was not trivial and, judging from our experience with the manually-written method, this noise can be minimized by annotating the data. After all, if the training data consists of a few hundred answers then it is not such a laborious task, especially if done by a domain expert.

A Supervised Learning or Semi-Automatic Algorithm

The following algorithm omits the first 3 steps of the learn-test-modify algorithm previously described in [14]. In those 3 steps we were trying to automate the annotation task. Annotation here is a lightweight activity: annotating, highlighting or labelling, in our case, simply means going through each student's answer and highlighting the parts of the answer that deserve 1 mark. Categories or classes of 1 mark are chosen as this is mainly the guideline in the marking scheme and this is how examiners are advised to mark. There is a one-to-one correspondence between one part of the marking scheme, one mark, and one equivalence class (in our terms); these are separated by semi-colons (;) in the marking scheme. We can replace the omitted steps with a (hopefully more reliable) annotation done by a domain expert (this does not mean we will not investigate building an annotation tool, since, as shown in Section 2, annotating the answers has a significant impact on the results), and start with the learning process directly. We keep the rest of the steps in the algorithm as they are, namely:
1. The learning step (generalisation or abstracting over windows). The patterns produced so far are the most specific ones, i.e. windows of keywords only. We need some generalisation rules that will help us make a transition from a specific to a more general pattern. Starting from what we call a triggering window, the aim is to learn a general pattern that covers or abstracts over several windows. These windows will be marked as 'seen windows'. Once no more generalisation of the pattern at hand can be made to cover any new windows, a new triggering window is considered. The first unseen window is used as a new triggering window and the process is repeated until all windows are covered (the reader can ask the authors for more details; these are left for a paper of a more technical nature). A toy sketch of this generalisation step follows the list.
2. Translate the patterns (or rudimentary patterns) learned in step 1 into the syntax required for the marking process (if a different syntax is used).
3. Expert filtering again for possible patterns.
4. Testing on training data. Make additional heuristics on width. Also, add or get rid of some initial keywords.
5. Testing on testing data.
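As referenced in step 1, here is a toy sketch of window generalisation. It is a deliberately simplified stand-in for the authors' procedure, assuming equal-length keyword windows:

```python
# Toy generalisation over keyword windows: tokens shared by all windows stay
# literal; positions that differ become alternation sets, yielding one pattern
# that covers several paraphrases.
def generalise(windows):
    """windows: list of equal-length token lists (a simplifying assumption)."""
    pattern = []
    for position in zip(*windows):
        alternatives = set(position)
        pattern.append(position[0] if len(alternatives) == 1 else alternatives)
    return pattern

def covers(pattern, window):
    return len(window) == len(pattern) and all(
        tok == slot if isinstance(slot, str) else tok in slot
        for tok, slot in zip(window, pattern))

windows = [["egg", "splits", "into", "two"],
           ["egg", "divides", "into", "two"],
           ["egg", "splits", "into", "halves"]]
p = generalise(windows)
print(p)   # e.g. ['egg', {'splits', 'divides'}, 'into', {'two', 'halves'}]
print(covers(p, ["egg", "divides", "into", "halves"]))   # True
```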
We continue to believe that the best place to look for alternatives, synonyms or similarities is in the students' answers (i.e. the training data). We are continuing with implementation and testing: a domain expert (someone other than us) is annotating some new training data, and we expect to report on these results very soon.

2. Machine-Learning Approach

In the previous section, we described how machine-learning techniques can be used in information extraction to learn the patterns. Here, we use machine-learning algorithms to learn the mark. Given a set of training data consisting of positive and negative instances, that is, answers where the marks are 1 or 0 respectively, the algorithm abstracts a model that represents the training data, i.e. a description of when to give a mark and when not to. When faced with a new answer, the model is used to give a mark. Previously, in [13], we reported the results we obtained using Nearest Neighbour Classification techniques. In the following, we report our results using two algorithms, namely decision tree learning and Bayesian learning, on the questions shown in the previous section. The first experiments show the results with non-annotated data; we then repeat the experiments with annotated data. As we mentioned earlier, the annotation is very simple: we highlight the part of the answer that deserves 1 mark, meaning that irrelevant material can be ignored. Unfortunately, this does not mean that the training data is noiseless, since sometimes annotating the data is less than straightforward
and it can get tricky. However, we try to minimize inconsistency. We used the existing Weka system [15] to conduct our experiments. For lack of space, we omit the description of the decision tree and Bayesian algorithms and only report their results. The results reported are based on 10-fold cross-validation. For our marking problem, the outcome attribute is well-defined: it is the mark for each question, and its values are {0, 1, ..., full_mark}. The input attributes could vary from considering each word to be an attribute to considering deeper linguistic features, like the head of a noun phrase or the head of a verb group, to be attributes. In the following experiments, each word in the answer was considered to be an attribute. Furthermore, Rennie et al. [10] propose simple heuristic solutions to some problems with naïve Bayes classifiers. In Weka, Complement Naïve Bayes is supposed to be a refinement of the selection process that Naïve Bayes makes when faced with instances where one outcome value has more training data than another. This is true in our case. Hence, we also ran our experiments using this algorithm to see if there was any difference.

Results on Non-Annotated Data

We first considered the non-annotated data, that is, the answers given by students as they are. The first experiment considered the values of the marks to be {0, 1, ..., full_mark} for each question. The results of decision tree learning and Bayesian learning are reported in the columns titled DTL1 and NBayes/CNBayes1. The second experiment considered the values of the marks to be either 0 or > 0, i.e. we considered two values only. The results are reported in columns DTL2 and NBayes2/CNBayes2. The baseline is the number of answers with the most common mark over the total number of answers, multiplied by 100. Obviously, the result of the baseline differs in each experiment only when the number of answers with marks greater than 0 exceeds the number with mark 0; this affected questions 8 and 9 in Table 2 below, and hence we took the average of both results. It was no surprise that the results of the second experiment were better than those of the first on questions with a full mark > 1. After all, in the second experiment the algorithm is learning a 0 mark and a symbol for any mark > 0, as opposed to an exact mark in the first. In both experiments, the Naïve Bayes learning algorithm did better than the decision tree learning algorithm, and the Complement of Naïve Bayes did slightly better or equally well on questions with a full mark of 1, like questions 4 and 7 in the table, while it resulted in worse performance on questions with full marks > 1.

Table 2. Results for Bayesian learning and decision tree learning on non-annotated data
Question   Baseline   DTL1    NBayes/CNBayes1   DTL2    NBayes/CNBayes2   Stem_DTL2   Stem_NBayes2
1          69         73.52   73.52 / 66.47     76.47   81.17 / 73.52     --          --
2          54         62.01   65.92 / 61.45     62.56   73.18 / 68.15     --          --
3          46         68.68   72.52 / 61.53     93.4    93.95 / 92.85     --          --
4          58         69.71   75.42 / 76        69.71   75.42 / 76        --          --
5          54         60.81   66.66 / 53.21     67.25   73.09 / 73.09     --          --
6          51         47.95   59.18 / 52.04     67.34   81.63 / 77.55     73.98       80.10
7          73         88.05   88.05 / 88.05     88.05   88.05 / 88.05     93.03       87.56
8          42 / 57    41.75   43.29 / 37.62     72.68   70.10 / 69.07     81.44       71.65
9          60 / 70    61.82   67.20 / 62.36     76.34   79.03 / 76.88     71.51       77.42
Average    60.05      63.81   67.97 / 62.1      74.86   79.51 / 77.3      --          --
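The word-attribute experiments above were run in Weka. Purely as an illustration (not the authors' setup), an equivalent pipeline can be sketched in Python with scikit-learn, using hypothetical toy answers and marks, treating each word as an attribute and scoring by cross-validation:

```python
# Rough scikit-learn analogue of the word-attribute experiments:
# label 0 vs. label >0, decision tree and (Complement) Naive Bayes, k-fold CV.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, ComplementNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Hypothetical training data: student answers and their awarded marks.
answers = ["the fertilised egg splits into two",
           "the egg divides in half",
           "the sperm swims to the egg",
           "cells are produced by division"]
marks = [2, 1, 0, 0]
labels = [1 if m > 0 else 0 for m in marks]   # second experiment: 0 vs. >0

for clf in (DecisionTreeClassifier(), MultinomialNB(), ComplementNB()):
    model = make_pipeline(CountVectorizer(), clf)
    scores = cross_val_score(model, answers, labels, cv=2)  # cv=10 with real data
    print(type(clf).__name__, scores.mean())
```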
Since we were using the words as attributes, we expected that in some cases stemming the words in the answers would improve the results. Hence, we experimented with the answers to questions 6, 7, 8 and 9 from the list above, and the results after stemming
are reported in the last two columns of Table 2 (our thanks to Leonie Ijzereef for these results). We notice that whenever there is an improvement, as for question 8, the difference is very small. Stemming does not necessarily make a difference if the attributes/words that could affect the results already appear in root form. The lack of any difference, or the worse performance, may also be due to the error rate of the stemmer.

Results on Annotated Data

We repeated the second experiment with the annotated answers. As we said earlier, annotation means highlighting the part of the answer that deserves 1 mark (if the answer has a mark >= 1), so, for example, if an answer was given 2 marks then at least two pieces of information should be highlighted; answers with a mark of 0 stay the same. Obviously, the first experiment could not be conducted, since with the annotated answers the mark is either 0 or 1. The baseline for the new data differs, and the results are shown in Table 3 below. Again, Naïve Bayes does better than the decision tree algorithm. It is worth noting that, in the annotated data, the number of answers whose mark is 0 is smaller than the number whose mark is 1, except for questions 1 and 2. This may have an effect on the results. From having the worst NBayes2 performance before annotation, question 8 jumps to seventh place. The rest maintained the same position more or less, with question 3 always nearest to the top. Count(Q,1) - Count(Q,0) is highest for questions 8 and 3, where Count(Q,N) is the number of answers whose mark is N. The improvement in performance for question 8 in relation to Count(8,1) was not surprising, since question 8 has a full mark of 4 and the annotation's role was an attempt at a one-to-one correspondence between an answer and 1 mark. On the other hand, question 1, which was in seventh place in DTL2 before annotation, drops to the worst place after annotation. In both cases, namely NBayes2 and DTL2 after annotation, it seems reasonable to hypothesize that P(Q1) is better than P(Q2) if Count(Q1,1) - Count(Q1,0) >> Count(Q2,1) - Count(Q2,0), where P(Q) is the percentage of agreement for question Q. Furthermore, according to the results of CNBayes in Table 2, we expected that CNBayes would do better on questions 4 and 7; however, it did better on questions 3, 4, 6 and 9. Unfortunately, we cannot see a pattern or a reason.

Table 3. Results for Bayesian learning and decision tree learning on annotated data
Question   Baseline   DTL     NBayes/CNBayes
1          58         74.87   86.69 / 81.28
2          56         75.89   77.43 / 73.33
3          86         90.68   95.69 / 96.77
4          62         79.08   79.59 / 82.65
5          59         81.54   86.26 / 81.97
6          69         85.88   92.19 / 93.99
7          79         88.51   91.06 / 89.78
8          78         94.47   96.31 / 93.94
9          79         85.6    87.12 / 87.87
Average    69.56      84.05   88.03 / 86.85
As they stand, the results of agreement with the given marks are encouraging. However, the models that the algorithms are learning are very naïve in the sense that they depend on words only, and providing a justification to the student will not be possible. The next step is to try the algorithms on annotated data that has been corrected for spelling, and to investigate deeper features or attributes other than words, like the head of a noun phrase or of a verb group, or a modifier of the head.
3. Conclusion

In this paper, we have described the latest refinements and results of our auto-marking system described in [13] and [14], using information extraction techniques where patterns were hand-crafted or semi-automatically learned. We have also described experiments where the problem is reduced to learning a model that describes the training data and using it to mark new answers. At the moment, we are focusing on information-extraction techniques. The results we obtained are encouraging enough to pursue these techniques with deeper linguistic features, especially to be able to associate a confidence measure and some feedback to the student with each answer marked by the system. We are using machine-learning techniques to learn the patterns, or at least some rudimentary ones that the knowledge engineer can complete; as mentioned in section 1.2, this is what we are in the process of doing. Once this is achieved, the next step is to try to build a tool for annotation and also to use some deeper linguistic features or properties, or even to (partially) parse the students' answers. We have noticed that these answers vary dramatically in their written quality from one group of students to another. For the advanced group, many answers are more grammatical, more complete and have fewer spelling errors. Hence, we may be able to extract linguistic features deeper than a verb group and a noun group.

Bibliography
[1] Appelt, D. & Israel, D. (1999). Introduction to Information Extraction Technology. IJCAI 99.
[2] Burstein, J., Kukich, K., Wolff, S., Chi Lu, Chodorow, M., Braden-Harder, L. and Harris, M.D. (1998). Automated scoring using a hybrid feature identification technique.
[3] Burstein, J., Kukich, K., Wolff, S., Chi Lu, Chodorow, M., Braden-Harder, L. and Harris, M.D. (1998). Computer analysis of essays. In NCME Symposium on Automated Scoring.
[4] Burstein, J., Leacock, C. and Swartz, R. (2001). Automated evaluation of essays and short answers. In 5th International Computer Assisted Assessment Conference.
[5] Collins, M. and Singer, Y. (1999). Unsupervised models for named entity classification. Proceedings of the Joint SIGDAT Conference on Empirical Methods in NLP and Very Large Corpora.
[6] Foltz, P.W., Laham, D. and Landauer, T.K. (2003). Automated essay scoring: Applications to educational technology. http://www-psych.nmsu.edu/~pfoltz/reprints/Edmedia99.html. Reprint.
[7] Leacock, C. and Chodorow, M. (2003). C-rater: Automated Scoring of Short-Answer Questions. Computers and the Humanities 37:4.
[8] Mitchell, T., Russel, T., Broomhead, P. and Aldridge, N. (2002). Towards robust computerized marking of free-text responses. In 6th International Computer Aided Assessment Conference.
[9] Mitchell, T., Russel, T., Broomhead, P. and Aldridge, N. (2003). Computerized marking of short-answer free-text responses. In 29th annual conference of the International Association for Educational Assessment (IAEA), Manchester, UK.
[10] Rennie, J.D.M., Shih, L., Teevan, J. and Karger, D. (2003). Tackling the Poor Assumptions of Naïve Bayes Text Classifiers. http://haystack.lcs.mit.edu/papers/rennie.icml03.pdf.
[11] Riloff, E. (1993). Automatically constructing a dictionary for information extraction tasks. Proceedings of the 11th National Conference on Artificial Intelligence, pp. 811-816.
[12] Rose, C.P., Roque, A., Bhembe, D. and VanLehn, K. (2003). A hybrid text classification approach for analysis of student essays. In Building Educational Applications Using NLP.
[13] Sukkarieh, J.Z., Pulman, S.G. and Raikes, N. (2003). Auto-marking: using computational linguistics to score short, free text responses. In the 29th annual conference of the International Association for Educational Assessment (IAEA), Manchester, UK.
[14] Sukkarieh, J.Z., Pulman, S.G. and Raikes, N. (2004). Auto-marking 2: An update on the UCLES-Oxford University research into using computational linguistics to score short, free text responses. In the 30th annual conference of the International Association for Educational Assessment (IAEA), Philadelphia, USA.
[15] Witten, I.H. and Frank, E. (2000). Data Mining. Academic Press.
A Knowledge Acquisition System for Constraint-based Intelligent Tutoring Systems

Pramuditha Suraweera, Antonija Mitrovic and Brent Martin
Intelligent Computer Tutoring Group, Department of Computer Science, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
{psu16, tanja, brent}@cosc.canterbury.ac.nz

Abstract. Building a domain model consumes a major portion of the time and effort required for building an Intelligent Tutoring System. Past attempts at reducing the knowledge acquisition bottleneck by automating the knowledge acquisition process have focused on procedural tasks. We present CAS (Constraint Acquisition System), an authoring system for automatically acquiring the domain model for non-procedural as well as procedural constraint-based tutoring systems. CAS follows a four-phase approach: building a domain ontology, acquiring syntax constraints directly from it, generating semantic constraints by learning from examples, and validating the generated constraints. This paper describes the knowledge acquisition process and reports on the results of a preliminary evaluation. The results have been encouraging and further evaluations are planned.
1 Introduction

Numerous empirical studies have shown that Intelligent Tutoring Systems (ITS) are effective tools for education. However, developing an ITS is a labour-intensive and time-consuming process. A major portion of the development effort is spent on acquiring the domain knowledge that accounts for the intelligence of the system. Our goal is to significantly reduce the time and effort required for building a knowledge base by automating the process. This paper details the Constraint Acquisition System (CAS), which automatically acquires the required knowledge for ITSs by learning from examples. The knowledge acquisition process consists of four phases, initiated by a domain expert describing the domain in terms of an ontology. Secondly, syntax constraints are automatically generated by analysing the ontology. Semantic constraints are generated in the third phase from problems and solutions provided by the author. Finally, the generated constraints are validated with the assistance of the author.
The remainder of the paper begins with a brief introduction to constraint-based modelling, the student modelling technique this research focuses on, and a brief overview of related research. We then present a detailed description of CAS, including its architecture and a description of the knowledge acquisition process. Finally, conclusions and future work are outlined.

2 Related work

Constraint-based modelling (CBM) [6] is a student modelling approach that somewhat eases the knowledge acquisition bottleneck by using a more abstract representation of the domain compared to other commonly used approaches [5]. However, building constraint sets still remains a major challenge. Our goal is to significantly reduce the time and effort required for acquiring the domain knowledge for CBM tutors by automating the knowledge acquisition process. Unlike other automated knowledge acquisition systems, we aim to produce a system that has the ability to acquire knowledge for non-procedural, as well as procedural, domains.
Existing systems for automated knowledge acquisition have focused on acquiring procedural knowledge in simulated or highly restrictive environments. KnoMic [10] is a learning-by-observation system for acquiring procedural knowledge in a simulated environment. It generates the domain model by generalising recorded traces of domain experts. Koedinger et al. have constructed a set of authoring tools that enable non-AI experts to develop cognitive tutors. They allow domain experts to create "Pseudo tutors", which contain a hard-coded domain model specific to the problems demonstrated by the expert [3]. Research has also been conducted into generalising the domain model of "Pseudo tutors" using machine learning techniques [2]. Most existing systems focus on acquiring procedural knowledge by recording the domain expert's actions and generalising the recorded traces using machine learning algorithms. Although these systems appear well suited to tasks where goals are achieved by performing a set of steps in a specific order, they fail to acquire knowledge for non-procedural domains, i.e. domains where problem solving requires complex, non-deterministic actions in no particular order. Our goal is to develop an authoring system that can acquire procedural as well as declarative knowledge.
The domain model for CBM tutors [7] consists of a set of constraints, which are used to identify errors in student solutions. In CBM, knowledge is modelled by a set of constraints that identify the set of correct solutions from the set of all possible student inputs. CBM represents knowledge as a set of ordered pairs of relevance and satisfaction conditions. The relevance condition identifies the states in which the represented concept is relevant, while the satisfaction condition identifies the subset of the relevant states in which the concept has been successfully applied.
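To make the relevance/satisfaction formulation concrete, here is a schematic example (not CAS's internal representation) of a constraint and how it can be checked against a student solution:

```python
# Schematic CBM constraint: a relevance condition picks out the solution states
# where the constraint applies, and a satisfaction condition must then hold.
equation_constraint = {
    "id": 1,
    "relevance": lambda sol: "(" in sol,                    # uses an opening parenthesis
    "satisfaction": lambda sol: sol.count("(") == sol.count(")"),
    "feedback": "Every opening parenthesis needs a matching closing parenthesis.",
}

def violated(constraint, solution: str) -> bool:
    return constraint["relevance"](solution) and not constraint["satisfaction"](solution)

print(violated(equation_constraint, "2*(x + 1 = 4"))   # True  -> give feedback
print(violated(equation_constraint, "2*(x + 1) = 4"))  # False -> constraint satisfied
```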
3 Constraint Authoring System
The proposed system is an extension of WETAS [4], a web-based tutoring shell that facilitates building constraint-based tutors. WETAS provides all the domain-independent components for a text-based ITS, including the user interface, pedagogical module and student modeller. The pedagogical module makes decisions based on the student model regarding problem and feedback generation, and the student modeller evaluates student solutions by comparing them to the domain model and updates the student model. The main limitation of WETAS is its lack of support for authoring the domain model. As WETAS does not provide any assistance for developing the knowledge base, a knowledge base is typically composed using a text editor. Although the flexibility of a text editor may be adequate for knowledge engineers, novices tend to be overwhelmed by the task. The goal of CAS (Constraint Authoring System) is to reduce the complexity of the task by automating the constraint acquisition process. As a consequence, the time and effort required for building constraint bases should be reduced dramatically.
CAS consists of an ontology workspace, an ontology checker, a problem/solution manager, syntax and semantic constraint generators, and a constraint validation component, as depicted in Figure 1. During the initial phase, the domain expert develops an ontology of the domain in the ontology workspace. This is then evaluated by the ontology checker, and the result is stored in the ontology repository. The syntax constraints generator analyses the completed ontology and generates syntax constraints directly from it. These constraints are generated from the restrictions on attributes and relationships specified in the ontology. The resulting constraints are stored in the syntax constraints repository. CAS induces semantic constraints during the third phase by learning from sample problems and their solutions. Prior to entering problems and sample solutions, the domain expert specifies the representation for solutions. This is a decomposition of the solution into
components consisting of a list of instances of concepts. For example, an algebraic equation consists of a list of terms on the left-hand side and a list of terms on the right-hand side.
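For illustration only (a hypothetical layout, not CAS's actual format), such a decomposition might be written down as:

```python
# Hypothetical solution decomposition: each component is a list of instances
# of ontology concepts.
solution_representation = {
    "problem": "solve x + 3 = 7",
    "components": {
        "left_hand_side":  [{"concept": "Term", "value": "x"},
                            {"concept": "Term", "value": "3"}],
        "right_hand_side": [{"concept": "Term", "value": "7"}],
    },
}
```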
Figure 1: Architecture of the constraint-acquisition system
The final phase involves ensuring the validity of the generated constraints. During this phase, the system generates examples to be validated by the author. In situations where the author’s validation conflicts with the system’s evaluation according to the domain model, the author is requested to provide further examples to illustrate the rationale behind the conflict. The new examples are then used to resolve the conflicts, and may also lead to the generation of new constraints.
3.1 Modelling the domain’s ontology
Domain ontologies play a central role in the knowledge acquisition process of the constraint authoring system [9]. A preliminary study conducted to evaluate the role of ontologies in manually composing a constraint base showed that constructing a domain ontology assisted the composition of the constraints [8]. The study showed that ontologies help organise constraints into meaningful categories. This enables the author to visualise the constraint set and to reflect on the domain, helping them to create more complete constraint bases.
Figure 2: Ontology for ER modelling domain
An ontology describes the domain by identifying important concepts and the relationships between them. It outlines the hierarchical structure of the domain in terms of sub- and super-concepts. CAS contains an ontology workspace for modelling an ontology of the domain. An example ontology for Entity Relationship (ER) modelling is depicted in Figure 2. The root node, Construct, is the most general concept, of which Relationship, Entity and Attribute are sub-concepts. Relationship is further specialised into Regular and Identifying, which are the two possible types of relationships, and so on.
As syntax constraints are generated directly from the ontology, it is imperative that all relationships are correct. The ontology checker verifies that the relationships between
concepts are correct by engaging the user in a dialog. The author is presented with lists of specialisations of the concepts involved in a relationship and is asked to label the specialisations that are incorrect. For example, consider a relationship between Binary identifying relationship and Attribute. CAS asks whether all of the specialisations of Attribute (Key, Partial key, Single-valued, etc.) can participate in this relationship. The user indicates that Key and Partial key attributes cannot be used in this relationship. CAS therefore replaces the original relationship with specialised relationships between Binary identifying relationship and the nodes Single-valued, Multi-valued and Derived.
Ontologies are internally represented in XML. We have defined a set of XML tags specifically for this project, which can easily be transformed to a standard ontology representation form such as DAML [1]. The XML representation also includes positional and dimensional details of each concept, for regenerating the layout of concepts in the ontology.
3.2 Syntax Constraint Generation
An ontology contains much information about the syntax of the domain: information about domain concepts, the domains (i.e. possible values) of their properties, and restrictions on how concepts participate in relationships. Restrictions on a property can be specified in terms of whether its value has to be unique or whether it has to contain a certain value. Similarly, restrictions on participation in relationships can be specified in terms of minimum and maximum cardinality. The syntax constraints generator analyses the ontology and generates constraints from all the restrictions specified on properties and relationships. For example, consider the owner relationship between Binary identifying relationship and Regular entity in the ontology in Figure 2, which has a minimum cardinality of 1. This restriction specifies that each Binary identifying relationship has to have at least one Regular entity participating as the owner, and can be translated to a constraint asserting that each Identifying relationship found in a solution has to have at least one Regular entity as its owner.
To evaluate the syntax constraints generator, we ran it over the ER ontology in Figure 2. It produced a total of 49 syntax constraints, covering all the syntax constraints that were manually developed for KERMIT [7], an existing constraint-based tutor for ER modelling. The generated constraint set was more specific than the constraints found in KERMIT, i.e. in some cases several constraints generated by CAS would be required to identify the problem states identified by a single constraint in KERMIT. This may mean that the set of generated constraints would be more effective for an ITS, since they would provide feedback that is more specific to a single problem state. However, it is also possible that they would be overly specific.
We also experimented with basic algebraic equations, a domain significantly different from ER modelling. The ontology for algebraic equations included only four basic operations: addition, subtraction, multiplication and division. The syntax constraints generator produced three constraints from an ontology composed for this domain: constraints ensuring that whenever an opening parenthesis is used there is a corresponding closing parenthesis, that a constant contains a plus or minus symbol as its sign, and that a constant’s value is greater than or equal to 0.
Because basic algebraic expressions have very few syntax restrictions, these three constraints are sufficient to impose the basic syntax rules.
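To make the mapping from ontology restrictions to syntax constraints more concrete, the following sketch is a rough illustration under our own assumptions; the ontology encoding shown here is not the CAS format, and the relationship names are taken from the cardinality example above.

# Illustrative sketch (not the CAS implementation): compiling relationship
# restrictions recorded in a domain ontology into natural-language syntax constraints.
ontology = {
    "relationships": [
        # Each restriction: relationship name, source and target concepts,
        # and minimum/maximum number of participating target instances.
        {"name": "owner", "source": "Binary identifying relationship",
         "target": "Regular entity", "min": 1, "max": None},
    ]
}

def generate_syntax_constraints(onto):
    constraints = []
    for rel in onto["relationships"]:
        if rel["min"] and rel["min"] > 0:
            constraints.append(
                f"Each {rel['source']} in the solution must have at least "
                f"{rel['min']} {rel['target']}(s) participating as its {rel['name']}.")
        if rel["max"] is not None:
            constraints.append(
                f"Each {rel['source']} in the solution may have at most "
                f"{rel['max']} {rel['target']}(s) participating as its {rel['name']}.")
    return constraints

for c in generate_syntax_constraints(ontology):
    print(c)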
3.3 Semantic Constraint Generation
Semantic constraints are generated by a machine learning algorithm that learns from examples. The author is required to provide several problems, with a set of correct solutions for each depicting different ways of solving it. A solution is composed by populating each of its components with instances of concepts, which ensures that the solution strictly adheres to the domain ontology. Alternative solutions, which depict alternative ways of solving the problem, are composed by modifying the first solution. The author can transform the first solution into the desired alternative by adding, editing or dropping elements. This reduces the effort required for composing alternative solutions, as most alternatives are similar. It also enables the system to correctly identify matching elements in two alternative solutions.
The algorithm generates semantic constraints by analysing pairs of solutions to identify similarities and differences between them. The constraints generated from a pair of solutions contribute towards either generalising or specialising constraints in the main constraint base. The detailed algorithm is given in Figure 3.
a. For each problem Pi
b. For each pair of solutions Si & Sj:
   a. Generate a set of new constraints N
   b. Evaluate each constraint CBi in the main constraint base CB against Si & Sj; if CBi is violated, generalise or specialise CBi to satisfy Si & Sj
   c. Evaluate each constraint Ni in the set N against each previously analysed pair of solutions Sx & Sy for each previously analysed problem Pz; if Ni is violated, generalise or specialise Ni to satisfy Sx & Sy
   d. Add the constraints in N that were not involved in generalisation or specialisation to CB
Figure 3: Semantic constraint generation algorithm
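A schematic rendering (ours) of the outer loop of Figure 3 is given below; the generate, violated and repair helpers stand in for the operations described in the text and are placeholders, not the actual CAS procedures.

def learn_constraints(problems, generate, violated, repair):
    """problems: list of (problem, [solutions]); returns the main constraint base."""
    constraint_base = []
    seen_pairs = []                       # previously analysed solution pairs
    for problem, solutions in problems:
        for s_i in solutions:
            for s_j in solutions:         # all pairs, including a solution with itself
                new = generate(s_i, s_j)
                # Repair existing constraints violated by this pair.
                for c in list(constraint_base):
                    if violated(c, s_i, s_j):
                        repair(c, s_i, s_j)
                # Check the new constraints against all earlier pairs.
                for c in new:
                    for (x, y) in seen_pairs:
                        if violated(c, x, y):
                            repair(c, x, y)
                # Keep new constraints not already in the main base.
                constraint_base.extend(c for c in new if c not in constraint_base)
                seen_pairs.append((s_i, s_j))
    return constraint_base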
The constraint learning algorithm focuses on a single problem at a time. Constraints are generated by comparing one solution to another solution of the same problem, where all permutations of solution pairs, including solutions compared to themselves, are analysed. Each solution pair is evaluated against all constraints in the main constraint base. Any that are violated are either specialised to be irrelevant for the particular pair of solutions, or generalised to satisfy that pair of solutions. Once no constraint in the main constraint base is violated by the solution pair, the newly generated set of constraints is evaluated against all previously analysed pairs of solutions. The violated constraints from this new set are also either specialised or generalised in order to be satisfied. Finally, constraints in the new set that are not found in the main constraint base are added to the constraint base.
1. Treat Si as the ideal solution (IS) and Sj as the student solution (SS)
2. For each element A in the IS
   a. Generate a constraint that asserts that if IS contains the element A, SS should contain a matching element
   b. For each relationship that the element is involved with, generate constraints that ensure that the relationship holds between the corresponding elements of the SS
3. Generalise the properties of similar constraints by introducing variables or wild cards
Figure 4: Algorithm for generating constraints from a pair of solutions
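The following sketch illustrates step 2a of Figure 4 under a simplified, hypothetical solution encoding of our own; the relationship constraints of step 2b and the generalisation of step 3 are omitted for brevity.

def constraints_from_pair(ideal, student):
    """Generate candidate constraints asserting that every element of the ideal
    solution (IS) has a counterpart in the student solution (SS). Only the IS is
    inspected here; the SS is consulted later, when constraints are evaluated."""
    constraints = []
    for component, elements in ideal.items():
        for concept, _instance in elements:
            constraints.append({
                "relevance": f"IS.{component} has a {concept}",
                "satisfaction": f"SS.{component} has a {concept}",
            })
    return constraints

# Hypothetical ER-style solution: component name -> list of (concept, instance) pairs.
ideal = {"Entities": [("Regular entity", "STUDENT")],
         "Attributes": [("Key", "student-id")]}
for c in constraints_from_pair(ideal, student={}):
    print(c["relevance"], "->", c["satisfaction"])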
New constraints are generated from a pair of solutions following the algorithm outlined in Figure 4. It treats one solution as the ideal solution and the other as the student solution. A constraint is generated for each element in the ideal solution, asserting that if the ideal solution contains the particular element, the student solution should also contain a matching element. For example:
Relevance: IS.Entities has a Regular entity
Satisfaction: SS.Entities has a Regular entity
In addition, three constraints are generated for each relationship that an element participates in. Two constraints ensure that a matching element exists in SS for each of the two
elements of IS participating in the relationship. The third constraint ensures that the relationship holds between the two corresponding elements of SS. For example:
1. Relevance: IS.Entities has a Regular entity AND IS.Attributes has a Key AND SS.Entities has a Regular entity AND IS Regular entity is in key-attribute with Key AND IS Key is in belong to with Regular entity
   Satisfaction: SS.Attributes has a Key
2. Relevance: IS.Entities has a Regular entity AND IS.Attributes has a Key AND SS.Attributes has a Key AND IS Regular entity is in key-attribute with Key AND IS Key is in belong to with Regular entity
   Satisfaction: SS.Entities has a Regular entity
3. Relevance: IS.Entities has a Regular entity AND IS.Attributes has a Key AND SS.Entities has a Regular entity AND SS.Attributes has a Key AND IS Regular entity is in key-attribute with Key AND IS Key is in belong to with Regular entity
   Satisfaction: SS Regular entity is in key-attribute with Key AND SS Key is in belong to with Regular entity
a. If the constraint set (C-set) that does not contain the violated constraint V has a similar but more restrictive constraint C, then replace V with C and exit.
b. If C-set has a constraint C that has the same relevance condition but a different satisfaction condition to V, add the satisfaction condition of C as a disjunctive test to the satisfaction condition of V, remove C from C-set and exit.
c. Find a solution Sk that satisfies constraint V.
d. If a matching element can be found in Sj for each element in Sk that appears in the satisfaction condition, generalise the satisfaction condition of V to include the matching elements as a new test with a disjunction and exit.
e. Restrict the relevance condition of V to be irrelevant for the solution pair Si & Sj, by adding a new test to the relevance condition signifying the difference, and exit.
f. Drop the constraint.
Figure 5: Algorithm for generalising or specialising violated constraints
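As a rough illustration of two of the repair moves in Figure 5 (ours, with a deliberately simplified constraint representation), generalisation can be seen as adding a disjunct to the satisfaction condition, and specialisation as strengthening the relevance condition so the constraint no longer applies to the offending solution pair.

# Conditions are lists of test strings; satisfaction is a list of disjuncts,
# each disjunct being a conjunction of tests. Purely illustrative.
def generalise(violated, other):
    """Step b/d: fold another constraint with the same relevance condition into
    `violated` as an additional (disjunctive) way of being satisfied."""
    assert violated["relevance"] == other["relevance"]
    violated["satisfaction_disjuncts"].extend(other["satisfaction_disjuncts"])
    return violated

def specialise(violated, extra_relevance_test):
    """Step e: make the constraint irrelevant for the offending pair by adding
    a further test to its relevance condition."""
    violated["relevance"] = violated["relevance"] + [extra_relevance_test]
    return violated

c1 = {"relevance": ["IS.RHS has a +"], "satisfaction_disjuncts": [["SS.RHS has a +"]]}
c2 = {"relevance": ["IS.RHS has a +"], "satisfaction_disjuncts": [["SS.LHS has a -"]]}
print(generalise(c1, c2))                      # satisfaction now has two disjuncts
print(specialise(c1, "IS.LHS has a Constant"))  # relevance strengthened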
The constraints that get violated during the evaluation stage are either specialised or generalised according to the algorithm outlined in Figure 5. The algorithm deals with two sets of constraints (C-set): the new set of constraints generated from a pair of solutions and the main constraint base. It remedies each violated constraint individually by either specialising or generalising it. If the constraint cannot be resolved, it is labelled as an incorrect constraint and the system ensures that it does not get generated in the future.
The semantic constraints generator of CAS produced a total of 135 constraints for the domain of ER modelling, using the ontology in Figure 2 and six problems. The problems supplied to the system were simple and similar to the basic problems offered by KERMIT. Each problem focused on a set of ER modelling constructs and contained at least two solutions that exemplified alternative ways of solving the problem. The solutions were selected to maximise the differences between them. The differences between most solutions were small, because ER modelling is a domain that does not have vastly different solutions; however, problems that can be solved in different ways did have significantly different solutions.
The generated constraints covered 85% of the 125 constraints found in KERMIT’s constraint base, which was built entirely manually and has proven to be effective. Further analysis of the generated constraints showed that most of the missing constraints were not generated because of a lack of examples. Coverage of 85% is very encouraging, considering the small set of sample problems and solutions. It is likely that providing further sample problems and solutions to CAS would increase the completeness of the generated domain model. Although the problems and solutions were specifically chosen to improve the system’s effectiveness in producing semantic constraints, we assume that a domain expert would also have the ability to select good problems and provide solutions that show different ways of solving a problem. Moreover, the validation phase, which is yet to be completed, would also produce constraints with the assistance of the domain expert.
CAS also produced some modifications to existing constraints found in KERMIT, which improved the system’s ability to handle alternative solutions. For example, although the constraints in KERMIT allowed weak entities to be modelled as composite multivalued attributes, KERMIT required the attributes of weak entities to be of the same type as in the ideal solution. CAS, in contrast, correctly identified that when a weak entity is represented as a composite multivalued attribute, the partial key of the weak entity has to be modelled as simple attributes of the composite attribute. Furthermore, the identifying relationship essential for the weak entity becomes obsolete. These two examples illustrate how CAS improved upon the original domain model of KERMIT.
We also evaluated the algorithm in the domain of algebraic equations. The task involved specifying an equation for a given textual description. As an example, consider the problem “Tom went to the shop to buy two loaves of bread. He gave the shopkeeper a $5 note and was given $1 as change. Write an expression to find the price of a loaf of bread, using x to represent the price.” It can be represented as 2x + 1 = 5 or 2x = 5 – 1. In order to avoid the need for a problem solver, the answers were restricted to exclude simplified equations. For example, the solution “x = 2” would not be accepted because it is simplified.
a) Relevance: IS LHS has a Constant (?Var1)
   Satisfaction: SS LHS has a Constant (?Var1) or SS RHS has a Constant (?Var1)
b) Relevance: IS RHS has a +
   Satisfaction: SS LHS has a – or SS RHS has a +
c) Relevance: IS RHS has a Constant (?Var1) and IS RHS has a – and SS LHS has a Constant (?Var1) and SS LHS has a + and IS Constant (?Var1) is in Associated-operator with –
   Satisfaction: SS Constant (?Var1) is in Associated-operator with +
Figure 6: Sample constraints generated for Algebra
The system was given five problems and their solutions, involving addition, subtraction, multiplication and division, for learning semantic constraints. Each problem contained three or four alternative solutions. CAS produced a total of 80 constraints. Although the completeness of the generated constraints is yet to be formally evaluated, a preliminary assessment revealed that the generated constraints are able to identify correct solutions and point out many errors. Some generated constraints are shown in Figure 6. An algebraic equation consists of two parts: a left-hand side (LHS) and a right-hand side (RHS). Constraint a in Figure 6 specifies that for each constant found in the LHS of the ideal solution (IS), there has to be an equal constant in either the LHS or the RHS of the student solution (SS).
Similarly, constraint b specifies that an addition symbol found in the RHS of the IS should exist in the SS, either as an addition symbol on the same side or as a subtraction on the opposite side. Constraint c ensures the existence of the relationship between the operators and the constants. Thus, a constant in the RHS of the IS with a subtraction attached to it can appear as a constant with an addition attached to it in the LHS of the SS.
4. Conclusions and Future Work
We have provided an overview of CAS, an authoring system that automatically acquires the constraints required for building constraint-based Intelligent Tutoring Systems. It follows a four-stage process: modelling a domain ontology, extracting syntax constraints from the ontology, generating semantic constraints and, finally, validating the generated constraints.
We undertook a preliminary evaluation in two domains: ER modelling and algebra word problems. The domain model generated by CAS for ER modelling covered all syntax constraints and 85% of the semantic constraints found in KERMIT [7], and unearthed some discrepancies in KERMIT’s constraint base. The results are encouraging, since the constraints were produced by analysing only six problems. CAS was also used to produce constraints for the domain of algebraic word problems. Although the generated constraints have not been formally analysed for their completeness, it is encouraging that CAS is able to handle two vastly different domains.
The first three phases of the constraint acquisition process have been completed. We are currently developing the constraint validation component, which would also contribute towards increasing the quality of the generated constraint base. We will also be enhancing the ontology workspace of CAS to handle procedural domains. Finally, the effectiveness of CAS and its ability to scale to domains with large constraint bases has to be empirically evaluated in a wide range of domains.
References
[1] DAML. DARPA Agent Markup Language, http://www.daml.org.
[2] Jarvis, M., Nuzzo-Jones, G. and Heffernan, N., Applying Machine Learning Techniques to Rule Generation in Intelligent Tutoring Systems. In: Lester, J., et al. (eds.) Proc. ITS 2004, Maceio, Brazil, Springer, pp. 541-553, 2004.
[3] Koedinger, K., et al., Opening the Door to Non-programmers: Authoring Intelligent Tutor Behavior by Demonstration. In: Lester, J., et al. (eds.) Proc. ITS 2004, Maceio, Brazil, Springer, pp. 162-174, 2004.
[4] Martin, B. and Mitrovic, A., WETAS: a Web-Based Authoring System for Constraint-Based ITS. Proc. 2nd Int. Conf. on Adaptive Hypermedia and Adaptive Web-based Systems AH 2002, Malaga, Spain, LNCS, pp. 543-546, 2002.
[5] Mitrovic, A., Koedinger, K. and Martin, B., A comparative analysis of cognitive tutoring and constraint-based modeling. In: Brusilovsky, P., et al. (eds.) Proc. 9th International Conference on User Modelling UM2003, Pittsburgh, USA, Springer-Verlag, pp. 313-322, 2003.
[6] Ohlsson, S., Constraint-based Student Modelling. Proc. Student Modelling: the Key to Individualized Knowledge-based Instruction, Berlin, Springer-Verlag, pp. 167-189, 1994.
[7] Suraweera, P. and Mitrovic, A., An Intelligent Tutoring System for Entity Relationship Modelling. Int. J. Artificial Intelligence in Education, vol. 14 (3,4), 2004, pp. 375-417.
[8] Suraweera, P., Mitrovic, A. and Martin, B., The role of domain ontology in knowledge acquisition for ITSs. In: Lester, J., et al. (eds.) Proc. Intelligent Tutoring Systems 2004, Maceio, Brazil, Springer, pp. 207-216, 2004.
[9] Suraweera, P., Mitrovic, A. and Martin, B., The use of ontologies in ITS domain knowledge authoring. In: Mostow, J. and Tedesco, P. (eds.) Proc. 2nd International Workshop on Applications of Semantic Web for E-learning SWEL'04, ITS 2004, Maceio, Brazil, pp. 41-49, 2004.
[10] van Lent, M. and Laird, J.E., Learning Procedural Knowledge through Observation. Proc. International Conference on Knowledge Capture, pp. 179-186, 2001.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Computer Games as Intelligent Learning Environments: A River Ecosystem Adventure
Jason TAN, Chris BEERS, Ruchi GUPTA, and Gautam BISWAS
Dept. of EECS and ISIS, Vanderbilt University, Nashville, TN, 37235, USA
{jason.tan, chris.beers, ruchi.gupta, gautam.biswas}@vanderbilt.edu
Abstract. Our goal in this work has been to bring together the entertaining and flow characteristics of video game environments with proven learning theories to advance the state of the art in intelligent learning environments. We have designed and implemented an educational game, a river adventure. The adventure game design integrates the Neverwinter Nights game engine with our teachable agents system, Betty’s Brain. The implementation links the game interface and the game engine with the existing Betty’s Brain system and the river ecosystem simulation using a controller written in Java. After preliminary testing, we will run a complete study with the system in a middle school classroom in Fall 2005.
Keywords: educational games, video game engines, teachable agents, intelligent learning environments
Introduction
Historically, video and computer games have been deemed counterproductive to education [1]. Some educators, parents, and researchers believe that video games take away focus from classroom lessons and homework, stifle creative thinking, and promote unhealthy individualistic attitudes [1,2]. But many children find these games so entertaining that they seem to play them nonstop until they are forced to do something else. As a result, computer and video games have become a huge industry, with 2001 sales exceeding $6 billion in the United States alone [3].
Research into the effects of video games on behavior has shown that not all of the criticism is justified [3]. State-of-the-art video games provide immersive and exciting virtual worlds for players. They use challenge, fantasy, and curiosity to engage attention. Interactive stories provide context, motivation, and clear goal structures for problem solving in the game environment. Researchers who study game behavior have determined that these games place users in flow states, i.e., “state[s] of optimal experience, whereby a person is so engaged in activity that self-consciousness disappears, sense of time is lost, and the person engages in complex, goal-directed activity not for external rewards, but simply for the exhilaration of doing.” [4] The Sims (SimCity, SimEarth, etc.), Carmen Sandiego, Pirates, and Civilization are examples of popular games with useful educational content [3].
However, the negative baggage that has accompanied video games has curtailed the use of advanced game platforms in learning environments. Traditional educational games tend to be mediocre drill-and-practice environments (e.g., MathBlaster, Reader Rabbit, and Knowledge Munchers) [5]. In a recent attempt to harness the advantages of a video game framework for learning 3D mathematical functions, a group of researchers concluded that doing so was a mistake: “By telling the students beforehand that they were going to be using software that was
game-like in nature, we set the [computer learning environment] up to compete against commercial video games. As can be seen by the intense competition present in the commercial video game market, the students’ high expectations are difficult to meet.” [6] What would we gain by stepping up and facing the challenge of meeting these high expectations? Integrating the “flow” feature of video games with proven learning theories to design learning environments has tremendous potential. Our goal is to develop learning environments that combine the best features of game environments and learning theories. The idea is to motivate students to learn by challenging them to solve realistic problems, and to exploit the animation and immersive characteristics of game environments to create the “flow” needed to keep students engaged in solving progressively more complex learning tasks.
In previous work, we have developed Betty’s Brain, a teachable agent that combines learning by teaching with self-regulated mentoring to promote deep learning and understanding [7]. Experiments in fifth grade science classrooms demonstrated that students who taught Betty showed deep understanding of the content material and developed far transfer capabilities [8]. Students also showed a lot of enthusiasm by teaching Betty beyond the time allocated, and by putting greater effort into reading resources so that they could teach Betty better.
A study of game genres [9] has led us to adopt an adventure game framework for extending the Betty’s Brain system. We have designed a game environment where Betty and the student team up and embark on a river adventure to solve a number of river ecosystem problems. Their progress in the game is a function of how well Betty has been taught about the domain, and how proficient they are in implementing an inquiry process that includes collecting relevant evidence, forming hypotheses, and then carrying out further investigations to support and refine the hypotheses. This paper discusses the interactive story that describes the game structure and the problem episodes.
1. Learning by Teaching: The Betty’s Brain System
Our work is based on the intuitively compelling paradigm of learning by teaching, which states that the process of teaching helps one learn with deeper understanding [7]. The teacher’s conceptual organization of domain concepts becomes more refined while communicating ideas, reflecting on feedback, and observing and analyzing the students’ performance. We have designed a computer-based system, Betty’s Brain, shown in Fig. 1, where students explicitly teach a computer agent named Betty [10]. The system has been used to teach middle school students about interdependence and balance in river ecosystems.
Figure 1. Betty’s Brain Interface
Three activities, teach, query, and quiz, model student-teacher interactions. In the teach mode, students teach Betty by constructing a concept map using a graphical drag-and-drop interface. In the query mode, students can query Betty about the concepts they have taught her. Betty uses qualitative reasoning mechanisms to reason with the concept map. When asked, she uses a combination of text, speech, and animation to provide a detailed explanation of how she derived her answer. In the quiz mode, students observe
how Betty performs on pre-scripted questions. This feedback tells students how well they have taught Betty, which in turn helps them to reflect on how well they have learned the information themselves.
To extend students’ understanding of interdependence to balance in river ecosystems, we introduced temporal structures and corresponding reasoning mechanisms into Betty’s concept map representation. In the extended framework, students teach Betty to identify cycles (these correspond to feedback loops in dynamic processes) in the concept map and assign time information to each cycle. Betty can now answer questions like, “If macroinvertebrates increase, what happens to waste in two weeks?” A number of experimental studies in fifth grade science classrooms have demonstrated the effectiveness of the system [8].
The river ecosystem simulation, with its visual interface, provides students with a window to real-world ecosystems, and helps them learn about dynamic processes. Different scenarios that include the river ecosystem in balance and out of balance illustrate cyclic processes and their periods, and show that large changes (such as the dumping of waste) can cause large fluctuations in entities, which leads to eventual collapse of the ecosystem. The simulation interface uses animation, graphs, and qualitative representations to show the dynamic relations between entities in an easy-to-understand format. Studies with high school students have shown that the simulation helps them gain a better understanding of the dynamics of river ecosystems [11]. This has motivated us to extend the system further and build a simulation-based game environment to create an entertaining exploratory environment for learning.
2. Game Environment Design
Good learning environments must help students develop life-long learning and problem-solving skills [12]. Betty’s Brain, through the Mentor feedback and Betty’s interactions with the student-teacher, incorporates metacognitive strategies that focus on self-regulated learning [8]. In extending the system to the game environment, we hope to teach general strategies that help students apply what they have learnt to problem-solving tasks. The River Ecosystem Adventure, through cycles of problem presentation, learning, teaching, and problem solving, is designed to provide a continual flow of events that should engage students and richly enhance their learning experience (see Fig. 2).
Figure 2. Abstract view of the river
Students are given opportunities to question, hypothesize, investigate, analyze, model, and evaluate; the six phases of the scientific inquiry cycle not only help students acquire new knowledge, but also develop metacognitive strategies that lead to generalized problem-solving skills and transfer [13]. The game environment is set in a world where students interact with and solve problems for communities that live along a river. The teachable agent architecture is incorporated into the game environment. The student player has a
primary “directorial” role in all phases of game play: learning and teaching, experimenting, and problem solving. In the prelude, students are introduced to the game, made familiar with the training academy and the experimental pond, and given information about the ecosystem problems they are likely to encounter on the river adventure.
The learning and teaching phase mirrors the Betty’s Brain environment. The student and Betty come together to prepare for the river adventure in a training academy. Like before, there is an interactive space (the concept map editor) that allows the player to teach Betty using a concept map representation, ask her questions, and get her to take quizzes. Betty presents herself to the student as a disciplined and enthusiastic learner, often egging the student on to teach her more, while suggesting that students follow good self-regulation strategies to become better learners themselves. Betty must pass a set of quizzes to demonstrate that she has sufficient knowledge of the domain before the two can access the next phase of the game. Help is provided in terms of library resources and online documents available in the training academy, and Betty and the student have opportunities to consult a variety of mentor agents who visit the academy.
In the experiment phase, Betty and the player accompany a river ranger to a small pond outside of the academy to conduct experiments that are geared toward applying their learnt knowledge to problem-solving tasks. The simulation engine drives the pond environment. The ranger suggests problems to solve, and provides help when asked questions. Betty uses her concept map to derive causes for observed outcomes. The ranger analyzes her solutions and provides feedback. If the results are unsatisfactory, the student may return with Betty to the academy for further study and teaching. After they have successfully solved a set of experimental problems, the ranger gives them permission to move on to the adventure phase of the game.
In the problem-solving phase, the player and Betty travel to the problem location, where the mayor explains the problem that this part of the river has been experiencing. From this point on, the game enters a real-time simulation as Betty and the student attempt to find a solution to the problem before it is too late. The student gets Betty to approach characters present in the environment, query them, analyze the information provided, and reason with relevant data to formulate problem hypotheses and find possible causes for these hypotheses. The student’s responsibility is to determine which pieces of information are relevant to the problem and communicate this information to Betty using a menu-driven interface. Betty reasons with this information to formulate and refine hypotheses using the concept map. If the concept map is correct and sufficient evidence has been collected, Betty generates the correct answer. Otherwise, she may suggest an incorrect cause, or fail to find a solution. An important facet of this process involves Betty explaining to the player why she has selected her solution. Ranger agents appear in the current river location at periodic intervals. They answer queries and provide clues, if asked. If Betty is far from discovering the correct solution, the student can take Betty back to the academy for further learning and teaching. The simulation engine, outlined in section 2, controls the state of the river and the data generated in the environment.
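The simulation model itself is described in [11]; purely as an illustration of what a state-based ecosystem update can look like, the following toy sketch (with made-up entities and coefficients of our own, not the actual model) advances a river state one step at a time.

def step(state, dt=1.0):
    """Advance a toy river-ecosystem state by one time step.
    All interaction coefficients below are invented for illustration only."""
    s = dict(state)
    s["algae"] += dt * (0.10 * state["waste"] - 0.05 * state["macroinvertebrates"])
    s["dissolved_oxygen"] += dt * (0.08 * state["algae"] - 0.06 * state["waste"]
                                   - 0.02 * state["fish"])
    s["macroinvertebrates"] += dt * (0.04 * state["algae"]
                                     + 0.03 * state["dissolved_oxygen"]
                                     - 0.05 * state["fish"])
    s["fish"] += dt * (0.03 * state["macroinvertebrates"]
                       + 0.02 * state["dissolved_oxygen"] - 0.04 * state["waste"])
    s["waste"] += dt * (0.02 * state["fish"] - 0.06 * state["macroinvertebrates"])
    return {k: max(v, 0.0) for k, v in s.items()}   # quantities cannot go negative

state = {"fish": 10.0, "macroinvertebrates": 20.0, "algae": 15.0,
         "dissolved_oxygen": 8.0, "waste": 5.0}
for _ in range(3):
    state = step(state)
print(state)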
A screenshot of the game scenario is shown in Fig. 3.
Figure 3. Screenshot of the game
As the simulation clock advances, the problem may get worse and it becomes increasingly urgent for Betty and the student to find a solution. A proposed solution is presented to the mayor, who implements the recommendation. Upon successfully solving and fixing the problem, the team is given a reward. The reward can be used to buy additional learning resources, or to conduct more advanced experiments in the pond in preparation for future challenges. The challenges that the students face become progressively more complex.
2.1. Game Engine Selection
In order to accomplish our goal of combining the advantages of current video game technology and an intelligent learning-by-teaching environment, we looked at several adventure/RPG game engines. Most of these game engines provide a variety of scripting tools to control the characters, the dialog structures, and the flow of events in the game. In our work, we felt that a game engine that provides an overhead view of the environment would be most suitable for the student to direct Betty’s movements and actions in the world, rather than a game engine that provides a first-person point of view. This led us to select the Neverwinter Nights game engine from BioWare Corp. [14] as the development environment for this project. The game environment, originally based on the popular game Dungeons and Dragons, includes the Aurora Toolset, a sophisticated content development toolkit that allows users to create new weapons and monsters, as well as new scenarios and characters using scripted dialogue mechanisms. The toolset has been very successful and has spawned many free user-created expansions.
2.2. Development Process
The Aurora Toolset uses a unique vocabulary for content creation. The adventure is created as a module containing all the locations, areas, and characters that make up the game. The module is divided up into regions or areas of interest. Each area can take on unique characteristics that contribute to different aspects of the game. The primary character in the game (the student) is the Player Character (PC). A number of other characters not directly under the control of the PC can be included in the adventure. They are called Non-Playing Characters (NPCs). In the River Adventure, Betty has the unusual role of being an NPC who is often controlled by the PC. Each individual problem scenario, the training academy, and the pond define individual areas, and the mentor agents, the rangers, and all other characters in the game environment are NPCs placed in the appropriate areas. Some NPCs can migrate from one area to another.
3. Implementation of the Game Environment
One of the benefits of the Neverwinter Nights game engine is that it can be implemented using a client-server approach. This allows us to separate the simulation engine, Betty’s AI-based reasoners, and the other educational aspects of the game from the Neverwinter Nights interface. The underlying system, based on the Betty’s Brain system with the added functionality (described in Section 3), can then be implemented on the server side, as illustrated in Fig. 4.
Figure 4. The game environment architecture
A representation of the world is presented to the player by the game engine through the game interface on the client system. The player interacts with the system using a mouse and keyboard to control the movements of his or her own character and Betty (they move together), to click on items of interest (to perform experiments, collect data, check on the concept map, etc.), and to initiate dialog with other NPCs. These define the set of actions that are programmed into the game engine. When students perform an action, it is communicated to the game engine. The game engine controls the visual representation of the world, renders the necessary graphics, and maintains the basic state of the environment and all the characters.
On the server side, the River Adventure module describes the location and appearance of each NPC, the details of each area (what buildings and items are present in each scene), how each area connects to other areas, and the overall flow of the game from one level to the next. The Aurora toolset provides a powerful scripting engine used to control the NPCs’ actions and other aspects of the module. However, to fully implement the Betty’s Brain agent architecture, the river ecosystem simulation, and other more complicated aspects of the system, we utilize the Neverwinter Nights Extender (NWNX) [15]. NWNX allows for extensions to the Neverwinter Nights server. In our case, we use the nwnx_java extension, which implements an interface to Java classes and libraries. This allows us to incorporate aspects already implemented in the Betty’s Brain system with less effort. The controller and the simulation, implemented in Java, can now be integrated into the River Adventure module.
As described in Section 2, the simulation engine uses a state-based mathematical model to keep track of the state of the river system as time progresses. Details of this component are presented elsewhere [11], so we do not repeat them here. The rest of this section focuses on the design of the controller, and the updates we made to Betty’s reasoning mechanisms to enable her to perform diagnosis tasks.
3.1. The Controller
The controller, made up of the agent architecture and the evaluator, is the core of the intelligent aspects of the game implementation. Additionally, the controller maintains the current state of the game and determines what aspects of the world are accessible to the player. The evaluator assesses the performance of Betty and the student and is used to determine what scaffolding is necessary, as well as maintaining the player’s score.
The controller leverages our previous work on a multi-agent architecture for learning-by-teaching systems [8]. Each agent has three primary components: (i) the pattern tracker, (ii) the decision maker, and (iii) the executive. Betty, the mentors and rangers, and all of the significant NPCs in the game world have a corresponding agent within the controller. The pattern tracker monitors the environment, and initiates the decision maker when relevant observable patterns occur. The decision maker takes the input from the pattern tracker and determines what actions the agent should take. Finally, the executive executes these actions, and makes the necessary changes to the environment. Depending on the agent, this could include movement, dialog generation, or a specialized activity, such as making inferences from a concept map or generating help messages. NPC dialogues are generated by retrieving the correct dialog template and modifying it based on the decision maker’s output. The controller relays new information resulting from the agents’ actions through the nwnx_java plugin to the game module, and also updates the simulation as necessary.
Separate from the agent architecture, the evaluator is the part of the controller that assesses the student’s performance and adjusts the game accordingly. The evaluator analyzes the results of the simulation as well as the student’s past actions to determine how the game will progress. It takes into account what aspects of the problem the student has yet to complete and sends this information to the game module. The decision makers associated with the mentor agents use this information to determine what level of help the mentors should give the student. If certain aspects of the problem remain unsolved for an extended period of time, the mentors can give additional help.
3.2. Betty’s extended reasoning mechanisms
Problem solving in the game hinges upon Betty’s ability to determine the root cause of a problem given the symptoms and current conditions. Betty’s concept map has to be correct and sufficiently complete for her to generate a correct answer. The reasoning mechanism in the existing Betty agent focuses on forward reasoning. It allows Betty to hypothesize the outcome of various changes to the environment. For example, she may reason that if the number of plants in the river increases, then the amount of dissolved oxygen will increase. In the game environment, Betty needs to reason from given symptoms and problems, and hypothesize possible causes. To achieve this, the reasoning mechanism had to be extended to allow Betty to reason backward in the concept map structure. The combination of the forward and backward reasoners defines a diagnosis process [16] that was added to Betty’s decision maker. The diagnosis component also gives Betty the capability of choosing the most probable cause when there are multiple possibilities for what is causing the problem in the river.
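A minimal sketch of the idea (ours, not the Betty’s Brain implementation, which is written in Java; the concept map content is made up) is shown below: forward propagation of a qualitative change through a concept map, and a naive backward step that enumerates single-entity changes able to explain an observed symptom.

# Concept-map links are (source, effect, target) triples, where effect is +1
# ("an increase causes an increase") or -1 ("an increase causes a decrease").
concept_map = [
    ("waste", +1, "algae"),
    ("algae", +1, "dissolved oxygen"),
    ("waste", -1, "dissolved oxygen"),
    ("dissolved oxygen", +1, "fish"),
]

def forward(entity, change, links):
    """Propagate a qualitative change (+1/-1) forward through the map."""
    effects = {entity: change}
    frontier = [entity]
    while frontier:
        src = frontier.pop()
        for s, sign, t in links:
            if s == src and t not in effects:   # first derivation wins; conflicts ignored
                effects[t] = effects[src] * sign
                frontier.append(t)
    return effects

def diagnose(symptom, observed_change, links):
    """Backward step: which single-entity changes could explain the symptom?"""
    sources = {s for s, _, _ in links}
    return [(entity, change)
            for entity in sources
            for change in (+1, -1)
            if forward(entity, change, links).get(symptom) == observed_change]

print(forward("waste", +1, concept_map))   # forward reasoning about a change
print(diagnose("fish", -1, concept_map))   # hypothesised causes for a fish decline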
Betty and the student can reflect on this information to decide what additional information they need to determine the true cause of the problem they are working on.
4. Discussion and Future Work
In this paper, we have designed a game environment that combines the entertainment and flow provided by present-day video games with innovative learning environments that
support deep understanding of domain concepts, the ability to work with complex problems, and the development of metacognitive strategies that apply across domains. The Neverwinter Nights game interface and game engine are combined with the river ecosystem simulation to create a river adventure, where students solve a series of river ecosystem problems as they travel down a river. The learning-by-teaching component is retained, and incorporated into the game story by creating an initial phase where the student learns domain concepts and teaches Betty in a training academy. Components of the river adventure have been successfully tested, and preliminary experiments are being run on the integrated system. Our goal is to complete the preliminary studies this summer, and to run a larger study in a middle school classroom in Fall 2005.
Acknowledgements: This project is supported by NSF REC grant # 0231771.
References
[1] Provenzo, E.F. (1992). What do video games teach? Education Digest, 58(4), 56-58.
[2] Lin, S. & Lepper, M.R. (1987). Correlates of children's usage of video games and computers. Journal of Applied Social Psychology, 17, 72-93.
[3] Squire, K. (2003). Video Games in Education. International Journal of Intelligent Simulations and Gaming, vol. 2, 49-62.
[4] Csikszentmihalyi, M. (1990). Flow: The Psychology of Optimal Experience. New York: Harper Perennial.
[5] Jonassen, D.H. (1988). Voices from the combat zone: Game grrlz talk back. In Cassell, J. & Jenkins, H. (Eds.), From Barbie to Mortal Kombat: Gender and Computer Games. Cambridge, MA: MIT Press.
[6] Elliot, J., Adams, L., & Bruckman, A. (2002). No Magic Bullet: 3D Video Games in Education. Proceedings of ICLS 2002, Seattle, WA.
[7] Biswas, G., Schwartz, D., Bransford, J., & The Teachable Agents Group at Vanderbilt University. (2001). Technology Support for Complex Problem Solving: From SAD Environments to AI. In Forbus & Feltovich (eds.), Smart Machines in Education. Menlo Park, CA: AAAI Press, 71-98.
[8] Biswas, G., Leelawong, K., Belynne, K., et al. (2004). Incorporating Self Regulated Learning Techniques into Learning by Teaching Environments. In The 26th Annual Meeting of the Cognitive Science Society, Chicago, Illinois, 120-125.
[9] Laird, J. & van Lent, M. The Role of AI in Computer Game Genres. http://ai.eecs.umich.edu/people/laird/papers/book-chapter.htm
[10] Leelawong, K., Wang, Y., Biswas, G., Vye, N., Bransford, J., & Schwartz, D. (2001). Qualitative reasoning techniques to support learning by teaching: The teachable agents project. Proceedings of the Fifteenth International Workshop on Qualitative Reasoning, San Antonio, 73-80.
[11] Gupta, R., Wu, Y., & Biswas, G. (2005). Teaching About Dynamic Processes: A Teachable Agents Approach. Intl. Conf. on AI in Education, Amsterdam, The Netherlands, in review.
[12] Schwartz, D. & Martin, T. (2004). Inventing to Prepare for Future Learning: The Hidden Efficiency of Encouraging Original Student Production in Statistics Instruction. Cognition and Instruction, Vol. 22 (2), 129-184.
[13] White, B., Shimoda, T., & Frederiksen, J. (1999). Enabling Students to Construct Theories of Collaborative Inquiry and Reflective Learning: Computer Support for Metacognitive Development. International Journal of Artificial Intelligence in Education, vol. 10, 151-182.
[14] BioWare Corp. (2002). Neverwinter Nights and BioWare Aurora Engine.
[15] Stieger Hardware and Softwareentwicklung. (2005). Neverwinter Nights Extender 2.
[16] Mosterman, P. & Biswas, G. (1999). Diagnosis of Continuous Valued Systems in Transient Operating Regions. IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans, Vol. 29(6), 554-565.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Paper Annotation with Learner Models
Tiffany Y. Tang 1,2 and Gordon McCalla 2
1 Dept. of Computing, Hong Kong Polytechnic University, Hong Kong
[email protected]
2 Dept. of Computer Science, University of Saskatchewan, Canada
{yat751, mccalla}@cs.usask.ca
Abstract. In this paper, we study some learner modelling issues underlying the construction of an e-learning system that recommends research papers to graduate students wanting to learn a new research area. In particular, we are interested in learner-centric and paper-centric attributes that can be extracted from learner profiles and learner ratings of papers and then used to inform the recommender system. We have carried out a study of students in a large graduate course in software engineering, looking for patterns in such “pedagogical attributes”. Using mean-variance and correlation analysis of the data collected in the study, four types of attributes have been found that could be usefully annotated to a paper. This is one step towards the ultimate goal of annotating learning content with full instances of learner models that can then be mined for various pedagogical purposes.
1. Introduction
When readers make annotations while reading documents, multiple purposes can be served: supporting information sharing [1], facilitating online discussions [2], encouraging critical thinking and learning [3], and supporting collaborative interpretation [4]. Annotations can be regarded as notes or highlights attached by the reader(s) to the article; since they are either privately used or publicly shared by humans, they should ideally be in a human-understandable format.
Another line of research on annotations focuses more on the properties (metadata) of the document as attached by editors (such as teachers or tutors in an e-learning context), e.g. using the Dublin Core metadata. Common metadata include Title, Creator, Subject, Publisher, References, etc. [5]. These metadata (sometimes referred to as item-level annotations) are mainly used to facilitate information retrieval and interoperability of distributed databases, and hence need only be in a machine-understandable format. Some researchers have studied automatic metadata extraction, where parsing and machine learning techniques are adapted to automatically extract and classify information from an article [6, 7]. Others have utilized the metadata for recommending a research paper [8], or for providing its detailed bibliographic information to the user, e.g. in the ACM DL or CiteSeer [7]. Since those metadata are not designed for pedagogical purposes, they are sometimes not informative enough to help a teacher in selecting learning materials [9].
Our domain in this paper is automated paper recommendation in an e-learning context, with the focus on recommending technical articles or research papers with pedagogical value to learners such as students who are trying to learn a research area. In [10], we studied several filtering techniques and utilized artificial learners in recommending a paper to human learners. In that study, papers were annotated manually. The annotations included the covered topics, the relative difficulty for a specific group of learners (senior undergraduate students), the value-added (the amount of information that can be transferred to a student), and the authoritative level of the paper (e.g. whether the paper is well known in the relevant area). The empirical results showed that learners’ overall
rating of a paper is affected by the helpfulness of the paper in achieving their goal, the topics covered by the paper, and the amount of knowledge gained after reading it. The study indicated that it is useful for a paper to be annotated with pedagogical attributes, such as what kinds of learners will like or dislike the paper, or what aspects of the paper are useful for a group of learners. In this paper, we describe a more extensive empirical analysis in pursuit of an effective paper annotation for pedagogical recommendations.
In section 2, we briefly describe the issues related to pedagogical paper recommendation and paper annotation; more information can be found in [10]. In section 3, we describe the data used in our analysis, and in section 4 we provide and discuss the results of our analysis. We make suggestions for further research in section 5.
2. Making Pedagogically-Oriented Paper Recommendations
A paper recommendation system for learners differs from other recommendation systems in at least three ways. The first is that in an e-learning context, there is a course curriculum that helps to inform the system. Since pure collaborative filtering may not be appropriate, because it needs a large number of ratings (the sparsity issue), the availability of a curriculum allows the deployment of a hybrid technique, partly relying on curriculum-based paper annotations. In addition, instead of relying on user feedback, we can also keep track of actual learner interactions with the system to obtain implicit user models [11].
The second difference is the pedagogical issue. Beyond learner interests, there are multiple dimensions of learner characteristics that should be considered in recommending learning material. For example, if a learner states that his or her interest is in Internet Computing, then recommending only the highly cited or highly rated papers in this area is not sufficient, because the learner may not be able to understand such papers. Thus, the annotations must include a wider range of learner characteristics.
The third difference comes from the rapid growth in the number of papers published in an area. New and interesting papers related to a course are published every year, which makes it almost impossible for a tutor to read all the papers and find the most suitable ones for his or her learners. A bias in the annotations may also be introduced if the paper is explicitly annotated by a teacher or tutor. Hence, an automated annotation technique is desirable. The benefit is not only to avoid bias through the use of ratings by many readers, but also to reduce the workload of the human tutor.
For the purpose of automatic annotation, the source of information could come either from the content of the paper itself (intrinsic properties) or from the usage of the paper by the readers (extrinsic properties). Usually, the intrinsic properties can be determined by using text processing or text mining techniques, e.g. the topics or subjects discussed in the paper, the difficulty level of the paper, or its authoritative level. But the extrinsic properties cannot be determined so readily, e.g. whether the paper is useful to learners, or contains value-added relative to a learner’s knowledge. In this paper, we will not focus on harvesting metadata for intrinsic properties from an existing paper library. Rather, we will focus on studying the collection of both intrinsic and extrinsic properties from learner experiences and feedback.
What we are seeking are the pedagogical attributes that cannot be recognized easily. We argue here that relying on explicit metadata added to a digital library is not enough, for the following reasons:
- The authoritative level of a paper is commonly determined by the number of citations of the paper or by the journal in which the paper is published. However, these are measures most useful for experienced researchers, whereas value to learners is determined by more diverse factors.
- Most learners have difficulty in specifying their interests, because they only have a superficial knowledge of the topics and may gain or lose interest in a topic after reading relevant or irrelevant papers. Additionally, the keywords or subjects provided by the metadata in a digital library usually represent a coarser-grained description of the topics, which may not match the details of a learner’s interests.
In the next section we describe a study in which papers were annotated with pedagogical attributes extracted from learner feedback and learner profiles, to see if learner-centered patterns of paper use can be found. This is another step in a research program aimed at annotating research papers with learner models, and mining these models to allow intelligent recommendations of these papers to students.
3. Data Collection
The study was carried out with students enrolled in a masters program in Information Technology at the Hong Kong Polytechnic University. In total, 40 part-time students were registered in a course on Software Engineering (SE) in the fall of 2004, with a curriculum designed primarily for mature students with various backgrounds. During the class, 22 papers were selected and assigned to students as reading assignments for 9 consecutive weeks, starting from the 3rd week until the 11th week. After reading them, students were required to hand in a feedback form along with their comments for each paper. In the middle of the semester, students were also asked to voluntarily fill in a questionnaire (see Figure 1). 35 students returned the questionnaire and their data are analyzed here.
Figure 1. Questionnaire for obtaining learner profile (the numbers shown in the figure are the frequencies of student answers to each question)
3.1 Learners
Figure 1 shows the questionnaire and the frequencies of the answers given by the students (the numbers inside the boxes on each question). The questionnaire has four basic categories: interest, background knowledge, job nature, and learning expectation. In each category we collected data about various features related to the subject of the course. We believe that these features constitute important dimensions of learners' pedagogical characteristics. As shown in Figure 1, the population of learners has diverse interests, backgrounds, and expectations. As for their learning goals, most of the students expect to gain general knowledge about SE. But not all of them are familiar with programming (7 out of 35 say 'not familiar'). Hence, the students represent a pool of learners with working experience related to information technology, but who do not necessarily have a background in computer science.
3.2 Papers
The 22 papers given to the students were selected according to the curriculum of the course, without considering the implications for our research (in fact, they were selected before the class began). All are mandatory reading materials for enhancing student knowledge. Table 1 gives a short description of some of the papers: the topics covered, the publication year, and the journal/magazine in which each was published.

Table 1. Short description of papers
Paper  Topics                                            Year  Journal/magazine name
#1     Requirements Eng.                                 2003  IEEE Software
#2     Project Mgmt.; Soft. Quality Mgmt.                2001  Comm. of the ACM
#3     Requirements Eng.                                 2003  IEEE Software
#6     Requirements Eng.; Agile Prog.; Project Mgmt.     2004  IEEE Software
#10    Web Eng.; UI Design                               2001  IEEE Software
#11    Web Eng.; UI Design; Software Testing             2004  ACM CHI
#15    Web Eng.; UI Design; Soft. Testing; Case Study    1996  ACM CHI
#16    UI Design; SE in General                          2003  ACM Interactions
#17    Web Eng.; Software Testing                        1992  IEEE Computer
#20    Software Testing and Quality Mgmt.; Agile Prog.   2003  IEEE Software
#22    Project Mgmt.; Quality Mgmt.; Case Study          2004  IEEE Software

Figure 2. Learner feedback form
3.3 Feedback
After reading each paper, students were asked to fill in a paper feedback form (Figure 2). Several features of each paper were to be evaluated by each student, including its degree of difficulty to understand, its degree of job-relatedness for the user, its interestingness, its degree of usefulness, its ability to expand the user's knowledge (value-added), and its overall rating. A 4-point Likert scale was used for the answers.
4. Data Analysis and Discussion
Among the 35 students who answered the questionnaire, the vast majority read and rated all assigned papers. Table 2 shows the number who answered for each paper, along with the average overall ratings (Q.6 of Figure 2) and their standard deviations. From the table we can see that the average ratings range from 2.3 (paper #5) to 3.1 (paper #15), which means some papers are preferred over others, on average. Certainly, the mean and standard deviation of a paper's overall ratings must be annotated to each paper and updated periodically, because they indicate the general quality of the paper (A1).

Table 2. Average overall ratings and number of observations
Paper  1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20   21   22
Mean   2.8  2.9  2.4  2.5  2.3  2.9  3.0  2.8  3.0  2.9  2.6  2.8  2.7  2.9  3.1  2.4  3.0  2.8  2.9  2.6  2.8  2.9
StdD.  .5   .6   .6   .7   .5   .8   .5   .5   .5   .5   .6   .5   .4   .8   .8   .6   .8   .6   .6   .7   .6   .5
N      35   35   35   32   32   32   31   32   32   35   34   33   34   35   35   34   35   35   35   35   35   35
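Since the A1 annotation must be updated periodically as new ratings arrive, an incremental computation is convenient. The following is a minimal sketch (ours, not the authors' implementation; class and variable names are illustrative) of maintaining a paper's running mean and standard deviation with Welford's online method.

```python
import math

class OverallRatingAnnotation:
    """A1 annotation: running mean and standard deviation of a paper's
    overall ratings, updated as new Likert ratings (1-4) arrive."""

    def __init__(self):
        self.n = 0          # number of ratings observed so far
        self.mean = 0.0     # running mean
        self._m2 = 0.0      # sum of squared deviations (Welford's method)

    def add_rating(self, rating):
        self.n += 1
        delta = rating - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (rating - self.mean)

    @property
    def std(self):
        # population standard deviation, as reported in tables such as Table 2
        return math.sqrt(self._m2 / self.n) if self.n > 0 else 0.0

# Example: annotate a paper with the ratings of five learners (invented values).
a1 = OverallRatingAnnotation()
for r in [3, 2, 4, 3, 3]:
    a1.add_rating(r)
print(round(a1.mean, 2), round(a1.std, 2), a1.n)
```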
As shown in Table 1, some papers are on related topics, e.g. Web Engineering and UI design. Intuitively, if a learner likes/dislikes a paper on one topic, then s/he may like/dislike papers on similar topics. But this may not always be correct, because the ratings may not depend exclusively on the topic of the paper. To check this, we ran a correlation analysis over the ratings of each pair of papers. The results show correlations between -0.471 and 0.596, with 14 of them greater than or equal to 0.5 and only one less than -0.4. This suggests that some pairs of papers have moderately similar rating patterns, while others show an inverse pattern. The results can be used to generate recommendation rules across papers, such as:
• "If a learner likes paper #20 then s/he may like paper #21" (correlation 0.596)
• "If a learner likes paper #8 then s/he may dislike paper #13" (correlation -0.471)
Unsurprisingly, most high correlations are attained from the ratings of papers on different topics. If we pick the ten most highly correlated pairs, only three pairs of papers belong to the same topics, i.e. (#14, #15), (#14, #17) and (#20, #21). Given this information, we propose to annotate a paper with both positively and negatively correlated papers (A2). To extract more information, a further analysis was performed by looking for patterns in student feedback on each paper, in particular looking for correlations between answers Q.1 to Q.5 on the feedback form (Figure 2) and Q.6, in order to determine the factors that affect a student's overall rating. Our conjecture is that the overall ratings given to each paper may uniquely be affected by those factors or a combination of them. For instance, some papers may get higher ratings due to having richer information about topics that match the interests of the majority of students, while others may get higher ratings due to good writing or their helpfulness to the student in understanding the concept being learned. If such patterns can be discovered, then we should be able to determine whether a particular paper is suitable for a particular learner based on the paper's and the learner's attributes. For instance, if the overall ratings of a paper have a strong correlation
to learner interest, then we can recommend it to learners whose interests match the topic of the paper. Alternatively, if the ratings are strongly correlated to the learner's goal, then it will be recommended to learners with similar goals. Figure 3 illustrates the correlations of the different factors, i.e. between Q.6 in Figure 2 and Q.1 to Q.5, for the 22 papers. The Y-axis is the correlation coefficient, with range [-1, 1].
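The pairwise rating correlations and the cut-offs described above (>= 0.5 and <= -0.4) suggest a straightforward way to compute the A2 annotation. The sketch below is ours, not the authors' implementation; the ratings matrix is randomly generated purely to make it runnable, and pandas is assumed only for its pairwise-complete correlation handling.

```python
import pandas as pd
import numpy as np

# ratings: rows = learners, columns = papers (4-point Likert scale).
# The values below are invented purely to make the sketch executable.
rng = np.random.default_rng(0)
ratings = pd.DataFrame(
    rng.integers(1, 5, size=(35, 22)).astype(float),
    columns=[f"paper_{i}" for i in range(1, 23)],
)

# Pearson correlations between the overall ratings of every pair of papers.
corr = ratings.corr(method="pearson")

def a2_annotation(paper, corr_matrix, pos_cut=0.5, neg_cut=-0.4):
    """A2 annotation: the papers whose ratings are positively or negatively
    correlated with this paper's ratings, using the cut-offs from the text."""
    row = corr_matrix[paper].drop(paper)
    return {
        "positively_correlated": row[row >= pos_cut].round(3).to_dict(),
        "negatively_correlated": row[row <= neg_cut].round(3).to_dict(),
    }

# Each entry corresponds to a rule such as
# "if a learner likes paper_20 then s/he may like paper_21".
print(a2_annotation("paper_20", corr))
```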
Figure 3. Factors that affect overall ratings
As shown in Figure 3, the learners' overall ratings of a paper are affected mostly by the interestingness of the paper, followed by the value-added gained after reading it and its usefulness in understanding the concept being learned. This result is slightly different from the result obtained in our prior study [10], where the usefulness slightly exceeded the interestingness. The reason is that in the current study we used a larger group of students and, more importantly, used different papers. As shown in Figure 3, the correlations vary across papers, which means that individual differences between the papers matter here. Therefore, we also propose annotating a paper with the correlations of the factors that affect learners' overall ratings (A3). Finally, we can also determine the features of the learner (as determined by his or her questionnaire answers) that affect the learner's overall ratings. In other words, we analyze the correlations between the overall ratings and each feature in the learner's profile (Figure 1). Two methods, differing in how the Likert-scale values are converted, are used to extract the correlations. The first method is to convert the user interest and background knowledge into binary values (3 to 5 into '1', and 1 and 2 into '0'), and to assign '1' if the user ticks any feature under 'job nature' and 'expectation' (see Figure 1). For the overall rating (Q.6 in Figure 2), '3' and '4' are interpreted as 'high' ratings and assigned a '1', while '1' and '2' are interpreted as 'low' ratings and assigned a '0'. After all values are converted into binary, we run the correlation analysis. The second method does not convert any value to binary. We use both methods for the purpose of extracting more information. Figure 4 shows the combined results of both methods. There are 22 rows, where each row represents a paper, and each column represents a feature of the learner profile shown in Figure 1 (taken top-down, e.g. 'job nature = software development' is the fourth column under JOB in Figure 4). If the correlation obtained from either method is greater than or equal to 0.4, the relevant cell is highlighted with a light color, and if it is smaller than or equal to -0.4, it is filled with black. If the correlation is in between (no significant correlation), the cell is left blank.
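The two correlation methods can be illustrated with a small sketch. The code below is ours and the data are invented; it shows Method 1 (binarizing profile features and ratings before correlating) alongside Method 2 (correlating the raw values), with the ±0.4 cut-off used for the A4 annotation.

```python
import numpy as np

def binarize_profile(value):
    """Method 1 pre-processing: Likert interest/background 3-5 -> 1, 1-2 -> 0."""
    return 1 if value >= 3 else 0

def binarize_rating(value):
    """Overall rating (1-4): 3-4 -> 'high' (1), 1-2 -> 'low' (0)."""
    return 1 if value >= 3 else 0

def pearson(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    if x.std() == 0 or y.std() == 0:
        return 0.0
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical data: one learner-profile feature and one paper's overall ratings,
# one value per learner (values are illustrative only).
interest_in_ui_design = [5, 2, 4, 1, 3, 2, 5, 4]
overall_ratings       = [4, 2, 3, 2, 3, 1, 4, 3]

# Method 1: correlate after converting both sides to binary.
r_binary = pearson([binarize_profile(v) for v in interest_in_ui_design],
                   [binarize_rating(v) for v in overall_ratings])

# Method 2: correlate the raw values directly.
r_raw = pearson(interest_in_ui_design, overall_ratings)

# A4 annotation: record the feature if either method reaches the 0.4 cut-off.
if r_binary >= 0.4 or r_raw >= 0.4:
    print("annotate: positive correlation with 'interest: UI design'",
          round(r_binary, 2), round(r_raw, 2))
```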
Figure 4. The correlation matrix between overall ratings and learner models
Figure 4 shows that only 16 of the 22 papers have positive correlations with attributes of the learner profile. Some correlations can be verified easily, while others cannot. For instance, the ratings of the third paper are positively correlated with the first feature of learner interest (Q.1 in Figure 1: "software requirement engineering"), and indeed the content of the third paper is about "requirements engineering" (cf. Table 1). Similarly, the ratings of the tenth paper (about "web engineering and UI design") are correlated with the third feature (also about "UI design"). Thus, by checking the positive correlations between learner ratings and their interests, we can infer the topics covered by a paper. However, this method also produces some unexplainable results, such as the positive correlation between the ratings of paper #1 ("requirements engineering") and learners' expectations of learning UI design (the top-rightmost cell). It also shows a negative correlation between the ratings of paper #3 and learner interest in "trust and reputation systems on the Internet", which cannot be explained even after checking the individual learner profiles. We think there are two possibilities here. The first is that the correlation is a coincidence, which may happen when the amount of data is small. The second is that the correlation represents hidden characteristics that have not been explained, something of interest discovered by the data mining. Due to the limited data at the present time, we cannot derive any conclusion here. Nevertheless, we suggest annotating a paper with the significant correlations of the overall ratings with each feature of the learner profile (A4). Given the pedagogical attributes (A1 – A4), we expect that recommendations can be more accurate and useful for learners. However, as in many recommendation systems, sparsity and scalability are two critical issues that may constrain a large-scale implementation. As the number of articles increases, the system may need to compute the correlations among thousands of documents, which in many cases cannot be completed in real time. Meanwhile, we can seldom get enough learners to reach a critical mass of ratings. Fortunately, both issues may not be so serious in e-learning systems. As pointed out earlier, the course curriculum may restrict the number of candidate papers within a subject, and we can also utilize intrinsic properties to filter out irrelevant papers. In addition, low-rated and old papers will be discarded periodically, which eventually will increase the efficiency of the system. Another concern comes from the reliability of the feedback, because learners' interests and knowledge may change over time. Intuitively, the extensive interaction between learners and the system can be used to track these changing behaviours, since mandatory assessments are commonly used in any learning system. Instead of making an effort to solve this problem, we can trace these changes to provide us with a refined understanding of the usage of the paper and the learning curve of learners interacting with it.
5. Conclusions and Future Work
Several factors can affect the value of the annotations, including the properties of the paper and the learner characteristics. The combination of these properties then affects the learners' ratings of the paper. Through empirical analysis we have shown that we can use these correlations to extract paper properties from the learner profiles and their paper ratings. Our data has also shown that the ratings of some papers have a significant correlation with the ratings of others and also with attributes of learners. So far, we have extracted four sets of pedagogical attributes (A1 – A4) that can be annotated to a paper and used for recommendation. However, more information may still exist. For example, it may happen that combinations of several learner attributes could better explain the learner ratings. In the future, we will use other data mining techniques to try to dig out such information, if it exists. In the longer term this research supports the promise of annotating learning objects with data about learners and data extracted from learners' interactions with these learning objects. Such metadata may prove to be more useful, and perhaps easier to obtain, than metadata explicitly provided by a human tutor or teacher. This supports the arguments in [12] for essentially attaching instances of learner models to learning objects and mining these learner models to find patterns of end use for various purposes (e.g. recommending a learning object to a particular learner). This "ecological approach" allows a natural evolution of an e-learning system's understanding of a learning object, and allows the e-learning system to use this understanding for a wide variety of learner-centered purposes.
Acknowledgements
We would like to thank the Canadian Natural Sciences and Engineering Research Council for its financial support of this research.
6. References
[1] Marshall, C. Annotation: from paper books to the digital library. JCDL'97, 1997.
[2] Cadiz, J., Gupta, A., and Grudin, J. Using web annotations for asynchronous collaboration around documents. CSCW'00, 2000, 309-318.
[3] Davis, J. and Huttenlocher, D. Shared annotation for cooperative learning. CSCL'95.
[4] Cox, D. and Greenberg, S. Supporting collaborative interpretation in distributed groupware. CSCW'00, 2000, 289-298.
[5] Weibel, S. The Dublin Core: a simple content description format for electronic resources. NFAIS Newsletter, 40(7):117-119, 1999.
[6] Han, H., Giles, C.L., Manavoglu, E. and Zha, H. Automatic document metadata extraction using support vector machines. JCDL'03, 2003, 37-48.
[7] Lawrence, S., Giles, C.L., and Bollacker, K. Digital libraries and autonomous citation indexing. IEEE Computer, 32(6):67-71, 1999.
[8] Torres, R., McNee, S., Abel, M., Konstan, J.A. and Riedl, J. Enhancing digital libraries with TechLens. JCDL'04, 2004.
[9] Sumner, T., Khoo, M., Recker, M. and Marlino, M. Understanding educator perceptions of "quality" in digital libraries. JCDL'03, 2003, 269-279.
[10] Tang, T.Y., and McCalla, G.I. Utilizing artificial learners to help overcome the cold-start problem in a pedagogically-oriented paper recommendation system. AH'04, Amsterdam, 2004.
[11] Brooks, C., Winter, M., Greer, J. and McCalla, G.I. The massive user modeling system (MUMS). ITS'04, 635-645.
[12] McCalla, G.I. The ecological approach to the design of e-learning environments: purpose-based capture and use of information about learners. J. of Interactive Media in Education (JIME), Special issue on the educational semantic web, T. Anderson and D. Whitelock (guest eds.), 1, 2004, 18p. [http://wwwjime.open.ac.uk/2004/1]
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Automatic Textual Feedback for Guided Inquiry Learning Steven TANIMOTO, Susan HUBBARD, and William WINN Online Learning Environments Laboratory Box 352350, Dept. of Computer Science and Engineering University of Washington, Seattle, WA, 98195, USA
Abstract. We briefly introduce the online learning environment INFACT, and then we describe its textual feedback system. The system automatically provides written comments to students as they work through scripted activities related to image processing. The commenting takes place in the context of an online discussion group, to which students are posting answers to questions associated with the activities. Then we describe our experience using the system with a class of university freshmen and sophomores. Automatic feedback was compared with human feedback, and the results indicated that in spite of advantages in promptness and thoroughness of the automatically delivered comments, students preferred human feedback, because of its better match to their needs and the human's ability to suggest consulting another student who had just faced a similar problem.
1. Introduction Timely feedback has been found in the past to improve learning [1]. However, it can be a challenge to provide such feedback in large classes or online environments where the ratio of users to teachers and administrators is high. We report here on an experimental system that provides automated feedback to students as they work on activities involving elementary image processing concepts.
1.1 Project on Intensive, Unobtrusive Assessment The motivation for our project is to improve the quality of learning through better use of computer technology in teaching. We have focused on methods of assessment that use as their evidence not answers to multiple-choice tests but the more natural by-products of online learning such as students’ user-interface event logs, newsgroup-like postings and transcripts of online dialogs. By using such evidence, students may spend more of their time engaged in the pursuit of objectives other than assessment ones: completing creative works such as computer programs and electronic art, or performing experiments using simulators in subject areas such as kinematics, chemical reactions, or electric circuits. (We currently support programming in Scheme and Python, and performing mathematical operations on digital images.) Various artificial intelligence technologies have the potential to help us realise the goal of automatic, unobtrusive diagnostic educational assessment from evidence naturally available through online learning activities. These technologies include textual pattern matching, Bayesian inference,
and Latent Semantic Indexing [4]. In this paper, we focus on our experience to date using textual pattern matching in this regard.
1.2 Facet-Based Pedagogy
Our project is studying automatic methods for educational assessment in a context in which multiple-choice tests are usually to be avoided. This means that other kinds of evidence must be available for analysis, and that such evidence must be sufficiently rich in information that useful diagnoses of learning impediments can be made. In order to obtain this quality of evidence, the learning activities in which our assessments are performed are structured according to a "facet-based pedagogy." A facet is an aspect, conception, approximate state of understanding, or state of skill with regard to some concept, phenomenon, or skill. Minstrell [5] uses the term "facet" to refer to a variation and elaboration of diSessa's phenomenological primitive ("p-prim") [3]. We use the term "facet" in a more general sense, so as to be able to apply a general pedagogical approach to the learning not only of conceptual material such as Newton's laws of motion but also of languages and skills. The facet-based pedagogical structure we use posits that instruction take place in units in which a cycle of teaching and learning steps proceeds. The cycle normally lasts one week. It begins with the posing of a problem (or several problems) by the instructor. Students then have one day to work on the problem individually and submit individual written analyses of the problem. Once these have been collected, students work in groups to compare and critique answers, keeping a record of their proceedings. By the end of the week, the students have to have submitted a group answer that incorporates the best of their ideas. It also must deal with any discrepancies among their individual analyses. Students work in groups for several reasons. One is essentially social, allowing students to feel involved in a process of give-and-take and to help each other. Another is that the likely differences in students' thinking (assuming the problems are sufficiently challenging) will help them to broaden their perspectives on the issues and focus their attention on the most challenging or thought-provoking parts of the problem. And the most important reason, from the assessment point of view, to have the students work in groups is to help them communicate (to each other, primarily, as they see it, but also to us, indirectly) so as to create evidence of their cognition that we can analyze for misconceptions. During the cycle, we expect some of the students' facets to change. The facets they have at the beginning of the unit, prior to the group discussion, are their preconceptions. Those they have at the end of the unit are their postconceptions. We want their postconceptions to be better than their preconceptions, and we want the postconceptions to be as expert-like as possible. In order to facilitate teaching and learning with this facet-based pedagogy, we have developed a software system known as INFACT. We describe it in the next section.
2. The INFACT Online Learning Environment Our system, called INFACT, stands for Integrated, Networked, Facet-based Assessment Capture Tool [6, 7]. INFACT catalyzes facet-based teaching and learning by (a) hosting online activities, (b) providing tools for defining specific facets and organising them, (c) providing simple
tools for manual facet-oriented mark-up of text and sketches, (d) providing tools for displaying evidence in multiple contexts, including threads of online discussion and timeline sequence, and (e) providing facilities for automatic analysis and automatic feedback to students. INFACT also includes several class management facilities such as automatic assignment of students to groups based on the students' privately entered preferences (using the Squeaky-Wheel algorithm), automatic account creation from class lists, and online standardized testing (for purposes such as comparison to the alternative means of assessment that we are exploring). The primary source of evidence used by INFACT is a repository of evolving discussion threads called the forum. Most of the data in the forum is textual. However, sketches can be attached to textual postings, and user-interface log files for sessions with tools such as an image processing system known as PixelMath [8] are also linked to textual postings. The forum serves the facet-based pedagogical cycle by mediating the instructor's challenge problem, collecting students' individual responses and hiding them until the posting deadline, at which time the "curtain" is lifted and each student can see the posts of all members of his or her group. The forum hosts the ensuing group discussions, and provides a record of them for both the students and the instructor. Any facet-oriented mark-up of the students' messages made by the instructor or teaching assistants is also stored in the forum database. In the experiments we performed with manual and automated feedback to students, we used a combination of the forum and email for the feedback. The facet-based pedagogy described above, as adapted for INFACT, is illustrated in Figure 1. A serious practical problem with this method of teaching is that the fourth box, "Teacher's facet diagnoses," is a bottleneck. When one teacher has to read all the discussions and interact with a majority of the students in a real class, most teachers find it impossible to keep up; there may be 25 or more students in a class, and teachers have other responsibilities than simply doing facet diagnoses. This strongly suggests that automation of this function be attempted.
[Figure 1 boxes: Teacher's challenge question; Individual posts; Group discussion; Teacher's facet diagnoses; Questions to students via email; Students' responses; Corrections to diagnoses; Intervention; Visualization]
Figure 1. The INFACT pedagogical cycle. The period of the cycle is normally 1 week.
INFACT provides an interface for teachers to analyze student messages and student drawings, and to create assessment records for the database and feedback for the students. Figure 2 illustrates this interface, selected for sketch-assessment mode. The teacher expresses an assessment for a piece of evidence by highlighting the most salient parts of the evidence for the diagnosis, and then selecting from the facet catalog the facet that best describes the student's apparent state of learning with regard to the current concept or capability. In order to provide a user-customizable text-analysis facility for automatic diagnosis and feedback, we designed and implemented a software component that we call the INFACT rule system. It consists of a rule language, a rule editor, and a rule applier.
Figure 2. The manual mark-up tool for facet-based instruction. It is shown here in sketch-assessment mode, rather than text-assessment mode.
The rule language is based on regular expressions, with an additional construct to make it work in INFACT. The rule editor is a Java applet that helps assure that rules entered into the rule system are properly structured and written. The rule applier comprises a back-end Perl script and a Java graphical user interface. The regular expressions in a rule are applied by the rule applier to particular components of text messages stored in INFACT-Forum. In addition to the regular expressions, rule patterns contain "field specifiers." A field specifier identifies a particular component of a message: sender name, date and time, subject heading, body. Each instance of a field specifier has its own regular expression. Someone creating a rule (e.g., a teacher or educational technology specialist) composes a rule pattern by creating any number of field specifier instances and supplying a regular expression for each one. Each field specifier instance and its regular expression represent a subcondition for the rule, all of which must match for the rule to fire. Multiple instances of the same field specifier are allowed in a pattern; INFACT rules therefore generalize standard regular expressions by allowing conjunction. The rule applier can be controlled from a graphical user interface, which is particularly useful when developing an assessment rule base. While regular expressions are a fundamental concept in computer science and are considered conceptually elementary, designing regular expressions to analyze text is a difficult and error-prone task, because of the complexity of natural language, particularly in the possibly broken forms typically used by students in online writing. Therefore we designed the rule applier to make it as easy as possible to test new rules. Although a complete rule specifies not only a condition but also an action, the rule applier can be used in a way that safely tests conditions only. Without this facility, a teacher testing rules in a live forum might create confusion when the rules being debugged cause email or INFACT postings to be sent to students inappropriately.
Figure 3. The "hit list" returned by the rule applier in testing mode.
When applying rules in this safe testing mode, the rule actions are not performed, and the results of condition matching are displayed in a "hit list", much like the results from a search engine such as Google. This is illustrated in Figure 3. It is also possible to learn rules automatically [2], but this study did not use that facility.
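To make the rule mechanism concrete, here is a minimal sketch of a conjunctive, field-specifier rule and a safe testing mode, written in Python rather than the Perl/Java used by INFACT; the field names, message format and example rule are illustrative and do not reproduce INFACT's actual rule syntax.

```python
import re

# A rule is a list of (field, regex) subconditions plus a feedback action.
# All subconditions must match for the rule to fire (conjunction); the same
# field may appear more than once. Field names and message format are
# illustrative, not INFACT's actual schema.
RULE = {
    "pattern": [
        ("subject", re.compile(r"pixelmath", re.I)),
        ("body", re.compile(r"\bbright(er|ness)?\b", re.I)),
        ("body", re.compile(r"\b(add|plus|\+)\b", re.I)),
    ],
    "feedback": "Adding a constant brightens every pixel equally - what happens "
                "at pixels that are already near the maximum value?",
}

def rule_fires(rule, message):
    """True only if every (field, regex) subcondition matches the message."""
    return all(regex.search(message.get(field, "")) for field, regex in rule["pattern"])

def apply_rule(rule, messages, test_only=True):
    hits = [m for m in messages if rule_fires(rule, m)]
    if test_only:
        # Safe testing mode: report the 'hit list' instead of mailing students.
        for m in hits:
            print("HIT:", m["sender"], "-", m["subject"])
    else:
        for m in hits:
            send_feedback(m["sender"], rule["feedback"])  # hypothetical mailer, not defined here
    return hits

messages = [
    {"sender": "student1", "subject": "PixelMath exercise",
     "body": "I used add 40 to every pixel to make it brighter."},
    {"sender": "student2", "subject": "question", "body": "Which formula should I use?"},
]
apply_rule(RULE, messages)  # testing mode: prints one hit, sends nothing
```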
3. The Study The automated feedback system was tested in a freshman class for six weeks out of a ten-week quarter. The class was given in a small computer lab where each student had their own machine. Eighteen students completed the course and provided usable data. They were randomly divided into three groups, Arp, Botero and Calder. Almost all of the work discussed here was done collaboratively within these groups. In addition to testing the usability and reliability of the automatic feedback system for instruction, the class was used to conduct a simple study in which the effectiveness of the automatic system was compared with the effectiveness of feedback provided by an instructor. A “no-feedback” condition served as a control. The three feedback conditions were rotated through the three groups using a within-subjects design so that every student had each kind of feedback for two weeks over the six-week period. The feedback study began with the fourth week of class. The order of the types of feedback was different for each group. Each two-week period required the students to complete exercises in class and as homework. Every week, activities were
assigned requiring each student to find the solution to a problem set by the instructor (a PixelMath formula, a strategy, some lines of Scheme code) and to post that solution to INFACT-Forum by mid-week. The group then had the rest of the week to come to a consensus on the solution and to post it. At the end of the two weeks, before the groups rotated to the next type of feedback, students took a short on-line post-test over the content covered in the preceding two weeks.
Figure 4. Feedback to the teacher/administrator from the action subsystem of the rule system.
The automatic feedback was provided in the manner described above. The human feedback was provided by an instructor ("Alan"). During the class, Alan sat at one of the lab computers watching posts come into INFACT-Forum from the group whose turn it was to receive human feedback. As each post arrived, he responded. Out of class, Alan checked the forum every day and responded to every post from the designated group. Students in the no-feedback group were left to their own devices. Several data sources were available, including scores on the post-tests, the students' posts and the feedback provided automatically and by Alan, interviews with selected students at the end of each two-week period conducted by a research assistant, questionnaires, and observations of the class by three research assistants. The class instructor and Alan were also interviewed.
4. Findings Analysis of the post-test scores showed no statistically reliable differences among the groups as a function of the type of feedback they received, nor significant interactions among group, feedback, or the order in which the groups received feedback. There are two explanations for this finding, aside from taking it as evidence that the automatically-provided feedback was neither more nor less effective than that provided by Alan, and that neither was better than no feedback. First, the small number of students in each group reduced the statistical power of the analysis to the point where type-two errors were a real possibility. Second, the first no -feedback group was quick to organize itself and to provide mutually-supporting feedback within its members. This proved to be extremely effective for this group (Arp) and subsequently also for Botero and Calder when it was their turn not to receive feedback. However, examination of other data sources showed some differences between the automatic and Alan's feedback, as well as some similarities. First, both encountered technical problems. For the first few sessions, the automatic feedback system was not working properly.
This made it necessary for a research assistant to monitor the posts from the automatic feedback group and to decide from the rules which prepared feedback statement to send. Fortunately, the bug was fixed and the Wizard-of-Oz strategy was quickly set aside. Also, Alan soon discovered that posting his feedback to INFACT-Forum took too long as the system acted sluggishly. It was therefore decided to send the "human" feedback to the students' personal email accounts. This was much quicker. However, it required the students to have their email programs open at the same time as INFACT-Forum and PixelMath. With so many windows open, some students did not notice Alan's feedback until some time after it had been sent. Some even minimized their email windows to make their screens more manageable and did not read the feedback until some time after it was sent, if at all. The most obvious difference between the automatic and the human feedback was that the automatic feedback was very quick, while it took Alan time to read students' posts, consider what to reply, type it and send it. This delay caused some minor frustration. One observer reported students posting to INFACT and then waiting for Alan's response before doing anything else. Several students were even seen to turn in their seats and watch Alan from behind while they were waiting for feedback. Also, out of class, Alan's feedback was not immediate, as he only checked the forum once a day. Automatic feedback was provided whenever a student posted something, whether during class or out of class. Next, the automatic feedback responses were longer and more detailed than Alan's. This was because they had been generated, with careful thought, ahead of time, while Alan responded on the fly. Alan also mentioned that he often had difficulty keeping up with the student posts during class and that he had to be brief in order to reply to them all. Over the six weeks Alan posted close to 300 messages. The automatic system sent fewer than 200. The main reason for this difference seems to be Alan's tendency to respond in a manner that encouraged the development of discussion threads. While both types of feedback asked questions of students and asked them to post another message as a matter of course ("Why do you think that is?", "Try again and post your response."), this tactic produced only one follow-on post to an automatic feedback message during the six weeks of the study. Though posting shorter messages, Alan was better than the automatic system at deciding what a student's particular difficulty might be, and at responding more flexibly and specifically to individual students' posts. Some of the students said they preferred Alan's feedback for this reason, finding the automatic feedback too general or less directly relevant to their particular difficulties or successes. Moreover, Alan could sometimes determine more precisely than the automatic system what was causing a student to have a problem. In such cases, he would often suggest a strategy for the student to try, rather than giving direct feedback about the student's post. Alan also referred students to other students' posts as part of his feedback. Because he was monitoring all of the posts from the group, while the students themselves might not be, he knew if another student had solved a problem or come up with a suggestion that would be useful to the student to whom he was currently responding, and did not hesitate to have the student look at the other's post.
This also speeded up the feedback process somewhat. On two occasions, Alan was able to spot common problems that were then addressed for everyone in the next class session. The students found Alan's feedback more personal. He made typos and used incomplete sentences. The automatic system did not. He used more vernacular and his posts reflected a friendlier tone. Alan also made occasional mistakes in the information he provided through feedback, though, fortunately, these were quickly identified and put right. In spite of this, most students preferred interacting with a human rather than with the automatic system.
Finally, as we mentioned above, the first group to receive no feedback, Arp, compensated for this by providing feedback and other support to each other. By coincidence, students in Arp, more than in Botero and Calder, had, by the fourth week, developed the habit of helping each other through the forum. It turns out that Arp also contained the strongest students in the class, who, collectively, had strength in all the skills required in the course. As a result, requests for help from one group member were answered without fail, in one case by ten responses from the other group members. One result of this was that, when it was Arp's turn to receive the system's feedback and then Alan's, they had come to rely on it. (The students who stopped work until Alan replied to their posts, whom we mentioned above, were all from Arp.) To summarize, the automatic feedback system delivered feedback and showed potential. Initial technical problems were quickly solved and the students received detailed and mostly relevant feedback on their posts to INFACT-Forum. The comparison to human feedback points to improvements that should be considered. First, it would be useful if the system could cross-reference student posts so that students could be referred to each other's contributions, in a way that proved effective in Alan's feedback. More generally, the ability of feedback from the automatic system to generate more collaboration among the students would be an important improvement. Second, the ability of the system to better diagnose from posts the reasons students were having problems would be useful. This would allow the system to sustain inquiry learning for more "turns" in the forum, rather than giving the answer or suggesting a particular strategy to try. Third, any changes that made the automatic system appear to be more human would make it better received by students. Finally, it would be nice to create a computer-assisted feedback system in which the best of automated and human faculties can complement one another.
Acknowledgments
The authors wish to thank E. Hunt, R. Adams, C. Atman, A. Carlson, A. Thissen, N. Benson, S. Batura, J. Husted, J. Larsson, and D. Akers for their contributions to the project, the National Science Foundation for its support under grant EIA-0121345, and the referees for helpful comments.
References
[1] Black, P., and Wiliam, D. 2001. Inside the black box: Raising standards through classroom assessment. Kings College London Schl. of Educ. http://www.kcl.ac.uk/depsta/education/publications/Black%20Box.pdf.
[2] Carlson, A., and Tanimoto, S. 2003. Learning to identify student preconceptions from text. Proc. HLT/NAACL 2003 Workshop: Building Educational Applications Using Natural Language Processing.
[3] diSessa, A. 1993. Toward an epistemology of physics. Cognition & Instruction, 10, 2&3, pp. 105-225.
[4] Graesser, A.C., Person, N., Harter, D., and The Tutoring Research Group. 2001. Teaching tactics and dialog in AutoTutor. International Journal of Artificial Intelligence in Education.
[5] Minstrell, J. 1992. Facets of students' knowledge and relevant instruction. In Duit, R., Goldberg, F., and Niedderer, H. (eds.), Research in Physics Learning: Theoretical Issues and Empirical Studies. Kiel, Germany: Kiel University, Institute for Science Education.
[6] Tanimoto, S.L., Carlson, A., Hunt, E., Madigan, D., and Minstrell, J. 2000. Computer support for unobtrusive assessment of conceptual knowledge as evidenced by newsgroup postings. Proc. ED-MEDIA 2000, Montreal, Canada, June.
[7] Tanimoto, S., Carlson, A., Husted, J., Hunt, E., Larsson, J., Madigan, D., and Minstrell, J. 2002. Text forum features for small group discussions with facet-based pedagogy. Proc. CSCL 2002, Boulder, CO.
[8] Winn, W., and Tanimoto, S. 2003. On-going unobtrusive assessment of students learning in complex computer-supported environments. Presented at Amer. Educ. Res. Assoc. Annual Meeting, Chicago, IL.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Graph of Microworlds: A Framework for Assisting Progressive Knowledge Acquisition in Simulation-based Learning Environments
Tomoya Horiguchi* and Tsukasa Hirashima**
*Faculty of Maritime Sciences, Kobe University
**Department of Information Engineering, Hiroshima University
Abstract: A framework for assisting a learner's progressive knowledge acquisition in simulation-based learning environments (SLEs) is proposed. In an SLE, a learner is usually first given a simple situation to acquire basic knowledge, and then a more complicated situation to refine it. Such a change of situation often causes a change of the model to be used. Our GMW (graph of microworlds) framework efficiently assists a learner in such 'progressive' knowledge acquisition by adaptively giving her/him microworlds. A node of the GMW has the description of a microworld, which includes the model, its modeling assumptions (which can explain why the model is valid in the situation) and the tasks through which one can understand the model. The GMW, therefore, can adaptively provide a learner with a microworld and the tasks relevant to understanding it. An edge has the description of the difference/change between microworlds. The GMW, therefore, can provide the relevant tasks which encourage a learner to transfer to the next microworld, and can explain in a model-based way how/why the behavioral change of the model is caused by the change of the situation. This capability of the GMW greatly helps a learner progressively refine, that is, reconstruct, her/his knowledge in a concrete context.
1. Introduction
Simulation-based learning environments (SLEs) have great potential for facilitating exploratory learning: a learner can act on various objects in the environment and acquire knowledge in a concrete manner. However, it is difficult for most learners to engage in such learning activities by themselves. Assistance is necessary, at least by providing the relevant tasks and settings through which a learner encounters new facts and applies them. The tasks, in addition, should always be challenging yet accomplishable for a learner. With this view, a popular approach is to provide a series of increasingly complex tasks through the progression of learning. Typically, in SLEs, a learner is first provided with a simple example and some similar exercises to learn some specialized knowledge, and then provided with more complex exercises to refine that knowledge. This 'genetic' [11] approach has been generally used in SLEs for designing instruction [13][16][17]. The exercises for learning the specialized knowledge in SLEs correspond to situations in which a learner has to consider only a few conditions about the phenomena. The exercises for refining the knowledge correspond to situations in which she/he has to consider many conditions. In other words, different models are necessary for thinking about the phenomena in SLEs. Therefore, it is reasonable to segment the domain knowledge into multiple models of different complexity, which is the basic idea of the 'ICM (increasingly complex microworlds)' approach [3][7]. In ICM, a learner is introduced to a series of increasingly complex microworlds step by step, each of which has a domain model simplified/focused to its degree. This makes it easier to prevent a learner from encountering too-difficult situations during exploration and to isolate an error about one segment of knowledge from the others, which greatly helps debug a learner's misunderstandings. Several systems have been developed according to the ICM approach and their usefulness has been verified [7][18][19][20][21]. The limitations of these systems are that they have little adaptability, and that they can hardly explain the differences between the models. It is important to adaptively change the situation according to each learner's knowledge state, her/his preferences, the learning context, etc. It is also important to explain why the new or more refined knowledge is necessary in the new situation. Though the existing ICM-based systems are carefully designed for progressive knowledge acquisition, the target knowledge of each microworld and the tasks for acquiring it aren't necessarily explicitly represented in the system (the target knowledge of a microworld means its model; we say 'a learner has understood the model' with the same meaning as 'she/he has acquired the target knowledge'). This makes it difficult to customize the series of microworlds for each learner, and to explain the necessity of microworld-transitions. In order to address these problems, the following have to be explicitly represented: (1) the target knowledge of each microworld and the tasks for acquiring it, and (2) the difference of the target knowledge between the microworlds and the tasks for understanding it. In this paper, we propose a framework for describing such target knowledge and tasks for a series of microworlds to assist progressive knowledge acquisition.
It is called the 'graph of microworlds (GMW)': a graph structure whose nodes stand for the knowledge about microworlds and whose edges stand for the knowledge of the relations between them. By using item (1), the GMW-based system can identify the microworlds for a learner to work on next, and provide the relevant tasks for her/him to acquire the target knowledge in each microworld. By using item (2) (especially because it is described in a model-based way), the system can provide the relevant tasks for encouraging a learner to transfer to the next microworld, and explain the necessity of the transition in a model-based way. For example, a task can be provided in which the previous model isn't applicable but the new or more refined model is necessary. If a learner produces a wrong solution by using the previous model, the system explains why her/his solution is wrong by relating it to the difference between the previous and new models, that is, the difference between the models in the two microworlds. This capability of the system would greatly help a learner progressively reconstruct her/his knowledge in a concrete context.
In fact, several SLEs with multiple domain models have been developed. Such systems embody the ICM principle to some extent, whether they refer to it or not. In QUEST [21], ThinkerTools [18][19][20] and DiBi [14], for example, a series of microworlds is designed to provide a learner with increasingly complex situations and tasks which help her/him acquire the domain knowledge progressively (e.g., from qualitative to quantitative behavior, from voltage value to its change, from uniform (frictionless) to decelerated (with friction) motion). In 'intermediate model' [9][10] and WHY [5][15], on the other hand, a set of models is designed from multiple viewpoints to explain the knowledge of one model in terms of another model which is easier to understand (e.g., to explain the macroscopic model's behavior as emerging from the microscopic model's behavior). These systems, however, have the limitations described above. They usually have only a fixedly ordered series of microworlds. To use them adaptively, human instructors are necessary who can determine which microworld a learner should learn next and when she/he should transfer to it. Even though it is possible to describe a set of rules for adaptively choosing the microworlds, rules which aren't based on the differences between models cannot explain the 'intrinsic' necessity of a transition. This is also the case for the recent non-ICM-based SLEs with sophisticatedly designed instruction [13][16][17]. Their frame-based way of organizing the domain and instructional knowledge often makes the change of tasks or situations in instruction 'extrinsic.' The GMW framework addresses these problems by explicitly representing the knowledge about the microworlds and the differences between them in terms of their models, situations, viewpoints, applicable knowledge and the tasks for acquiring it.
2. GMW: The Graph of Microworlds
2.1 Specification for the Description of Microworlds
In microworlds, a learner is required not only (t1) to predict the behavior of the physical system in a situation, but also (t2) to predict the change of behavior of the system given a change of the situation. That is, there are two types of tasks, requiring (t1) and (t2) respectively. The latter is essential for a learner to refine her/his knowledge, because the change of the situation might change the model itself to be used for prediction. A learner should become able not only to accomplish the task by using a model, but also to do so by choosing the model relevant to the given situation. Our research goal is, therefore, (1) to propose a framework for describing a set of models and the differences/changes between them and, based on this description, (2) to design functions which adaptively provide a learner with microworlds (i.e., situations and tasks) and explain how/why the models change according to the changes of situations. The model of a physical system changes when the situation goes out of the range within which the model is valid. The range can be described by the modeling assumptions, which are the assumptions necessary for the model to be valid.
In this research, we consider the following*1:
(a1) the physical objects and processes considered in a model
(a2) the physical situation of the system (e.g., a constraint on the parameters' domains/values, the structural conditions of the system)
(a3) the behavioral range of the system to be considered (e.g., the interval between boundary conditions, the mode of operation)
(a4) the viewpoint for modeling the system (e.g., qualitative/quantitative, static/dynamic)
A change of modeling assumptions causes the model of the physical system to change. From the educational viewpoint, it is important to causally understand a behavioral change of a physical system in relation to its corresponding change of modeling assumptions. Therefore, our framework should include not only the description of (the change of) models but also the description of (the change of) modeling assumptions. In addition, it should also include the description of the tasks which trigger the change of models, that is, which encourage a learner to think about the differences between models. Based on the discussion above, we propose our framework for describing and organizing microworlds in section 2.2.
2.2 Description and Organization of Microworlds
2.2.1 Description of a Microworld
The following information is described in each microworld:
(m1) the target physical system and a model of it
(m2) the physical objects and processes to be considered in the model (a1)
(m3) the physical situation of the system (a2)
(m4) the behavioral range of the system (a3) and the viewpoint for the modeling (a4)
(m5) the skills necessary for the model-based inference
(m6) the tasks and the knowledge necessary for accomplishing them
Items (m2), (m3) and (m4) stand for the valid combination of modeling assumptions which corresponds to a (valid) model of the physical system (m1). Item (m5) stands for the skills used with the model for accomplishing tasks (e.g., numerical calculation for a quantitative model). Item (m6) stands for the tasks to be provided to a learner, to each of which the knowledge necessary for accomplishing it (a subset of (m1)-(m5)) is attached.
*1 We reclassified the modeling assumptions discussed in [6].
From the viewpoint of model-based inference, there are two types of tasks: tasks which can be accomplished by using the model of the microworld they belong to, and tasks which need a transition to another microworld (that is, which need another model) to be accomplished. All tasks of type (t1) are of the former type. The tasks of type (t2) which don't need a change of the model (i.e., the given change of conditions doesn't cause a change of modeling assumptions) are also of the former type. They are called 'intra-mw-tasks.' The knowledge necessary for accomplishing an intra-mw-task can be described by using (m1)-(m5) of the microworld it belongs to. The tasks of type (t2) which need a change of the model (i.e., the given change of conditions causes a change of modeling assumptions) are of the latter type. They are called 'inter-mw-tasks.' The knowledge necessary for accomplishing an inter-mw-task is described by using (m1)-(m5) of the microworld it belongs to and (m1)-(m5) of the microworld to be transferred to. The description of an inter-mw-task includes a pointer to the microworld to be transferred to.
2.2.2 Organization of Microworlds
In order to organize a set of microworlds as described above, we propose the 'Graph of Microworlds (GMW).' The GMW makes it possible to adaptively generate the series of microworlds for each learner. It is an extension of the 'Graph of Models (GoM)' [1][2], which is a framework for describing how the model of a physical system can change with the change of its constraints. The nodes of a GoM stand for the possible models of the system and its edges stand for the changes of modeling assumptions (which are called 'model-transitions'). The GoM has been applied to model identification from observational data, fault diagnosis, etc. We extend the GoM into the GMW, the nodes of which stand for the microworlds and the edges of which stand for the possible transitions between them. Two educational concepts are introduced into the GMW: the knowledge which a learner could acquire by understanding the model of a microworld, and the task by accomplishing which she/he could understand the model. The target knowledge of a microworld is its model, its modeling assumptions and the skills used with the model (i.e., (m1)-(m5)). In order to encourage a learner to acquire it, the system provides her/him with the intra-mw-tasks of the microworld. In order to encourage a learner to transfer to another microworld, on the other hand, the system provides her/him with an inter-mw-task, the target knowledge of which is the difference between the knowledge about the two models. In the GMW, two nodes have an edge between them if the difference between their target knowledge is sufficiently small (i.e., the transition between two microworlds is possible if it is educationally meaningful as an evolution of models). In the neighborhood of a microworld, therefore, there are a few microworlds which are similar to it in terms of the target knowledge. This makes it possible for the system to adaptively choose the next microworld according to the learning context.
(Example-1) Curling-like Problem (1)
Figure 1a shows a 'curling-like' situation. At the position x0, a stone M1 is thrown by a player with the initial velocity v0, then slides on the ice rightward until it collides with another stone M2 at the position x1. If the friction on the ice isn't negligible and the initial velocity is small, it may stop between x0 and x1 (described as 'the interval [x0, x1]') without collision.
By the player’s decision, the interval [x0, x1] may be swept with brooms (uniformly) before the start of M1. When modeling the behavior of this physical system, there can be various physical situations (e.g., the initial velocity is small/large, the friction is/isn’t negligible, the ice is/isn’t swept), behavioral ranges (e.g., the interval before/after the collision, the instant of collision) and viewpoints (e.g., qualitative/quantitative). Therefore, several models are constructed corresponding to them. These models are, with the tasks for understanding them, then organized into the GMW (as shown in Figure 1b). Some of the modeling assumptions and tasks in the microworlds are described as follows: MW-1: (m1) (m2) (m3) (m4) (m5) (m6)
v1(t) = v0, x1(t) = x0 + v0t uniform motion (no force works on M1) 0 < v0 < v01, μ1 < epsilon, not sweep([x0, x1]) position(M1) is in [x0, x1] numerical calculation (1) derive the velocity of M1 at the position x (x0 < x < x1). (2*) derive the velocity of M1 at the position x (x0 < x < x1) when it becomes μ1 > epsilon. [-> MW-2:(m6)-(1)] (3*) derive the velocity of M1 after the collision with M2 when it becomes v0 > v01 (assume the coefficient of restitution e = 1). [-> MW-4:(m6)-(1)]
MW-2:
(m1) a1(t) = -μ1M1g, v1(t) = v0 - μ1M1gt, x1(t) = x0 + v0t - μ1M1gt^2/2
(m2) uniformly decelerated motion, frictional force from the ice
(m3) 0 < v0 < v01, μ1 > epsilon, not sweep([x0, x1])
(m4) position(M1) is in [x0, x1]
(m5) numerical calculation
(m6) (1) derive the velocity of M1 at the position x (x0 < x < x1).
     (2) derive the position x (x0 < x < x1) at which M1 stops.
     (3*) derive the position x (x0 < x < x1) at which M1 stops when the interval [x0, x1] is swept. [-> MW-3:(m6)-(1)]
     (4*) derive the velocity of M1 after the collision with M2 when it becomes v0 > v01 (assume the coefficient of restitution e = 1). [-> MW-4:(m6)-(1)]
MW-3:
(m1) a1(t) = -μ2M1g, v1(t) = v0 - μ2M1gt, x1(t) = x0 + v0t - μ2M1gt^2/2
(m2) uniformly decelerated motion, frictional force from the ice, heat generation by sweeping, melt of the surface of the ice by the heat (which makes the coefficient of friction decrease to μ2 and the temperature of the surface of the ice increase to zero degrees centigrade)
(m3) 0 < v0 < v02, μ1 > μ2 > epsilon, sweep([x0, x1])
(m4) position(M1) is in [x0, x1]
(m5) numerical calculation
(m6) (1) derive the position x (x0 < x < x1) at which M1 stops.
MW-4:
(m1) M1v1 = M1v1' + M2v2', -(v1' - v2')/(v1 - v2) = e
(m2) elastic collision, the total kinetic energy is conserved
(m3) v1 > 0, v2 = 0, e = 1
(m4) velocity(M1, x1) = v1
(m5) numerical calculation
(m6) (1) derive the velocity of M1 after the collision with M2.
     (2*) derive the velocity of M1 after the collision with M2 when it becomes 0 < e < 1. [-> MW-5:(m6)-(1)]
MW-5:
(m1) M1v1 = M1v1' + M2v2', -(v1' - v2')/(v1 - v2) = e
(m2) inelastic collision, deformation of the stones by collision (which makes the total kinetic energy decrease)
(m3) v1 > 0, v2 = 0, 0 < e < 1
(m4) velocity(M1, x1) = v1
(m5) numerical calculation
(m6) (1) derive the velocity of M1 after the collision with M2.
where:
1. v01 and v02 are the minimal initial velocities of M1 for the collision to occur when the coefficients of friction are μ1 and μ2, respectively.
2. If the coefficient of friction in [x0, x1] is smaller/larger than epsilon, the frictional force is/isn't negligible.
3. The asterisked tasks are the inter-mw-tasks, which have the pointers to the microworlds to be transferred to.
4. In the MWs, the causal relations between (m2), (m3) and (m4) are explicitly described.
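To make this organization concrete, the sketch below shows one possible encoding of the GMW fragment above as a graph data structure: microworld nodes, and edges labeled with the inter-mw-task that motivates the transition and the target-knowledge difference. The names and fields are illustrative assumptions for exposition, not the authors' implementation.

```python
# Illustrative sketch only (not the authors' implementation): a GMW fragment
# encoded as a graph whose nodes are microworlds and whose edges are the
# educationally meaningful transitions triggered by inter-mw-tasks.
microworlds = {
    "MW-1": {"model": "uniform motion",
             "tasks": {"(1)": "intra", "(2*)": "inter", "(3*)": "inter"}},
    "MW-2": {"model": "uniformly decelerated motion (friction)",
             "tasks": {"(1)": "intra", "(2)": "intra", "(3*)": "inter", "(4*)": "inter"}},
    "MW-3": {"model": "friction with sweeping (heat, melting)",
             "tasks": {"(1)": "intra"}},
    "MW-4": {"model": "elastic collision", "tasks": {"(1)": "intra", "(2*)": "inter"}},
    "MW-5": {"model": "inelastic collision", "tasks": {"(1)": "intra"}},
}
# edge: (source MW, inter-mw-task) -> (target MW, target-knowledge difference)
transitions = {
    ("MW-1", "(2*)"): ("MW-2", "decelerated motion, frictional force"),
    ("MW-1", "(3*)"): ("MW-4", "elastic collision"),
    ("MW-2", "(3*)"): ("MW-3", "heat generation, melt of the ice"),
    ("MW-2", "(4*)"): ("MW-4", "elastic collision"),
    ("MW-4", "(2*)"): ("MW-5", "inelastic collision"),
}

def next_microworlds(current):
    """Microworlds reachable from `current` via one inter-mw-task."""
    return [(task, dst, diff) for (src, task), (dst, diff) in transitions.items()
            if src == current]

print(next_microworlds("MW-1"))   # the system can pick either MW-2 or MW-4
```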
Suppose a learner who has learned 'uniform motion' by the intra-mw-task (1) in MW-1 is provided with the inter-mw-task (2*) of MW-1. She/he would be encouraged to transfer to MW-2 because the friction becomes not negligible by the change of physical situation in the task (by accomplishing this task, she/he would learn 'decelerated motion' and 'frictional force,' which is the difference between MW-1 and MW-2). Suppose, on the other hand, she/he is provided with the inter-mw-task (3*) of MW-1. She/he would be encouraged to transfer to MW-4 because, in order to accomplish the task, it is necessary to consider the behavioral range (after collision) which is out of consideration in MW-1 (she/he would learn the 'elastic collision,' which is the difference between MW-1 and MW-4). In addition, suppose a learner is provided with the inter-mw-task (3*) in MW-2. If she/he uses only the knowledge/skills she/he has acquired in MW-2, she/he would get a wrong solution. This error encourages her/him to learn 'heat generation' and 'melt of the ice,' that is, to transfer to MW-3. In a similar way, the inter-mw-task (2*) in MW-4 encourages a learner to learn the 'inelastic collision,' that is, to transfer to MW-5.

3. Assistance in Microworld-Transition by Parameter Change Rules

There are two types of microworld-transitions: one which changes the behavioral range of the system to be considered or the viewpoint for the modeling (m4), and one which (slightly) changes the physical situation of the system (m3). In the former, a learner usually can't execute the procedure she/he previously learned for getting a solution, because a different type of knowledge/skills (model) is required in the new microworld (consider, for example, the transition from MW-1 to MW-4 in Figure 1b). This would sufficiently motivate her/him to transfer to the new microworld. In the latter, on the other hand, a learner often could execute the previous procedure as it is. She/he, therefore, might get a wrong solution because the previous knowledge/skill (model) by itself is irrelevant to the new microworld (consider, for example, the transition from MW-1 to MW-2 in Figure 1b), and she/he might not be aware of the error. This makes it difficult for her/him to transfer to the new microworld. In such a case, it is necessary to explain why the learner's solution is wrong compared with the correct solution, in other words, how/why her/his previous model, irrelevant to the new situation, differs from the
'right' model in the situation. Therefore, a model-based explanation is required which relates the difference between the behavior of the wrong and right models with the difference between their modeling assumptions (that is, it relates the observable effect of the error with its cause). In this chapter, we show the method for generating such explanations by using a set of 'parameter change rules.' The framework of GoM has a set of 'parameter change rules,' each of which describes how a model-transition (i.e., the change of modeling assumptions) qualitatively affects the values of parameters calculated by the models. By using them, it becomes possible to infer the relevant model-transition when the values of parameters calculated by the current model (prediction) are different from the ones measured in the real system (observation). In the framework of GMW, such rules can be used for assisting a learner in microworld-transitions. They are described in the following form:

If   the modeling assumptions (m2) change to (m2'), and
     the modeling assumptions (m3) change to (m3')
     (and the other modeling assumptions (m4) don't change)
Then the values of some parameters qualitatively change (increase/steady/decrease)

This rule means that if the model of the physical system (i.e., the physical objects and processes to be considered) changes by the change of physical situation, the values of some parameters of the system increase/stay steady/decrease. Consider the assistance in transferring from one microworld to another. First, the parameter change rule which matches them is searched for. By using it, the inter-mw-task is identified/generated which asks for the (change of) values of those parameters when the physical situation changes. If a learner has difficulty with the task, an explanation is generated which relates the difference between the values calculated by the two models with the difference between their modeling assumptions (i.e., the physical objects and processes to be considered). Thus, the necessity of microworld-transitions can be explained based on the difference between the phenomena she/he wrongly predicted and the ones she/he experienced in the microworld.

(Example-2) Curling-like Problem (2)

We illustrate two parameter change rules of the GMW in Figure 1b: one is for the transition from MW-1 to MW-2 and the other is for the transition from MW-2 to MW-3. They are described as follows:

PC-Rule-1:
If   sliding(M1, ice), friction(M1, ice) = μ1, 0 < v0 < v01, not sweep([x0, x1]), and
     changed(μ1 < epsilon => μ1 > epsilon), and
     changed(consider(uniform motion) => consider(uniformly decelerated motion)), and
     considered(frictional force)
Then decrease(velocity(M1, x))

PC-Rule-2:
If   sliding(M1, ice), and
     changed(not sweep([x0, x1]) => sweep([x0, x1])), and
     considered(heat generation, melt of the ice)
Then change(friction(M1, ice) = μ1 => friction(M1, ice) = μ2 ; epsilon < μ2 < μ1),
     increase(velocity(M1, x), position(M1, v1 = 0))
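As a rough illustration of how such rules could be operationalized, the sketch below encodes a parameter change rule as a condition/effect pair and matches it against a candidate microworld-transition. The representation and function names are assumptions made for illustration; they are not taken from the paper's implementation.

```python
# Illustrative sketch (assumed representation, not the authors' code):
# a parameter change rule pairs a change of modeling assumptions with the
# qualitative effect on parameter values; matching a rule against a
# microworld-transition identifies which inter-mw-task to pose.
pc_rule_1 = {
    "requires": {"sliding(M1, ice)", "0 < v0 < v01", "not sweep([x0, x1])"},
    "changed":  {"μ1 < epsilon => μ1 > epsilon",
                 "consider(uniform motion) => consider(uniformly decelerated motion)"},
    "effects":  [("velocity(M1, x)", "decrease")],
}

def rule_matches(rule, situation, assumption_changes):
    """True if the rule's preconditions hold and its assumption changes occur."""
    return rule["requires"] <= situation and rule["changed"] <= assumption_changes

mw1_to_mw2 = {
    "situation": {"sliding(M1, ice)", "0 < v0 < v01", "not sweep([x0, x1])"},
    "changes": {"μ1 < epsilon => μ1 > epsilon",
                "consider(uniform motion) => consider(uniformly decelerated motion)"},
}
if rule_matches(pc_rule_1, mw1_to_mw2["situation"], mw1_to_mw2["changes"]):
    # ask a task about the affected parameters, e.g. MW-1:(m6)-(2*)
    print("relevant effects:", pc_rule_1["effects"])
```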
By using PC-Rule-1, it is inferred that the inter-mw-task (m6)-(2*) of MW-1 is relevant to the transition from MW-1 to MW-2, because it asks for the (change of) velocity of M1 when the coefficient of friction μ1 increases. By using PC-Rule-2, on the other hand, it is inferred that the inter-mw-task (m6)-(3*) of MW-2 is relevant to the transition from MW-2 to MW-3, because it asks for the (change of) position at which M1 stops when the surface of the ice is swept. If a learner has difficulty with these tasks, model-based explanations are generated by using the information in these rules and microworlds.

4. Assistance in Microworld-Transition by Qualitative Difference Rules

The assistance by parameter change rules is based on the quantitative difference of the behavior of physical systems. That is, what motivates a learner to change the model she/he constructed is the fact that the values of parameters calculated by her/his model differ from the ones observed in the microworld (which are calculated by the 'right' model). A learner, however, generally tends to maintain her/his current model (hypothesis). Even when the prediction by her/his model contradicts the observation, she/he often tries to dissolve the contradiction by slightly modifying the model (instead of changing the modeling assumptions) [4]. In addition, quantitative differences sometimes provide insufficient information for the change of modeling assumptions. It would therefore often be more effective to use qualitative/intuitive differences for explaining the necessity of microworld-transitions. In this chapter, we show the method for generating such explanations by using a set of 'qualitative difference rules' (which are used complementarily to parameter change rules). Each qualitative difference rule describes how a model-transition affects the qualitative states/behavior of physical systems calculated by the models (e.g., in Figure 1, the existence of the water (the melted ice made by the frictional heat) in MW-3 differs qualitatively from its absence in MW-2, which is out of the scope of parameter change rules). They are described in the following form:

If   the modeling assumptions (m2) change to (m2'), and
     the modeling assumptions (m3) change to (m3')
     (and the other modeling assumptions (m4) don't change)
Then the qualitative differences of the states/behavior of the systems occur

In order to describe these rules, we first classify the differences of the states/behavior between two physical systems from some qualitative viewpoints. We then relate such differences to the differences of modeling assumptions by which they could be caused. In order to derive a set of qualitative difference rules systematically, we execute this procedure based on the qualitative process model [8]. The procedure is described in the following two sections.

4.1 Concepts of Difference [12]

The purpose of classifying the behavioral 'differences' of physical systems is to provide a guideline for describing 'educationally useful' qualitative difference rules, which enable the assistance to motivate a learner as much as possible. When a learner can't explain an observed phenomenon by her/his model, she/he is motivated to modify/change it. The strength of motivation and the relevancy of the modification/change would depend much on what kind of difference she/he saw between the observation and her/his prediction. In Figure 1, for example, when a learner sees the water in MW-3, she/he who still uses the model of MW-2 would be surprised because it can't exist according to her/his prediction. In addition, the deformation of the stones in MW-5 (by the inelastic collision) would surprise a learner who still uses the model of MW-4 because they never deform according to her/his prediction. Such differences would motivate a learner much more than the (slight) difference of the velocity of M1 or the (slight) difference of the energy of the stones, which might be neglected by her/him. Therefore, the difference in physical objects' existence/absence and the difference in physical objects' intrinsic properties (i.e., the classes they belong to) are supposed to be more effective for motivating a learner because of their concreteness, while differences in the values of physical quantities are supposed to be less effective because of their abstractness. Several different 'differences' can appear when a physical system behaves contrary to a learner's prediction. Though all of them suggest her/his error more or less, it would be better to choose the 'most effective difference' to be pointed out to her/him*2. Therefore, the possible 'differences' and their 'effectiveness' in the behavior of physical systems should be systematically identified and classified. This, in addition, needs to be done in a model-based way because the qualitative difference rules will be described based on this identification/classification. With this view, we use the qualitative process model [8] because of its reasonable granularity and generality. That is, we regard a physical system and its behavior as a set of physical objects which interact with each other through physical processes. The objects are directly/indirectly influenced by the processes and are constrained/changed/generated/consumed. The processes are activated/inactivated when their conditions become true/false. In order to observe the objects in such a system, we introduce the following viewpoints, each of which focuses on: (v1) how an object exists, (v2) how a relation between objects is, (v3) how an object changes through time, and (v4) how a relation between objects changes through time. If these are different between the prediction and the observation, a learner is supposed to recognize the difference of the behavior.
Based on the viewpoints above, the differences are identified/classified as shown in Figure 2 (they are called 'concepts of difference'). We illustrate some of them (see [12] for more detail):

(d1) the difference about the existence of an object: If an object exists (or doesn't exist) in the observation which doesn't exist (or exists) in the prediction, it is the difference. In Figure 1, suppose the behavior of the model in MW-2 is the prediction and the one in MW-3 is the observation; the existence of water (the ice melted by the frictional heat) in the latter is recognized as the difference because it can't exist in the former.

(d2) the difference about the attribute(s) an object has (the object class): If an object has (or doesn't have) the attribute(s) in the observation which the corresponding object doesn't have (or has) in the prediction, it is the difference. In other words, the corresponding objects in the observation and prediction belong to different object classes. In Figure 1, suppose the behavior of the model in MW-2 is the prediction and the one in MW-3 is the observation; the ice in the former belongs to the '(purely) mechanical object class' because it doesn't have the attribute 'specific heat,' while the one in the latter belongs to the 'mechanical and thermotic object class' because it has the attribute 'specific heat.' Therefore, the ice increasing its temperature or melting in the observation is the difference. In addition, suppose the model in MW-4 is the prediction and the one in MW-5 is the observation; the stones in the former belong to the 'rigid object class' (the deformation after collision can be ignored), while the ones in the latter belong to the 'elastic object class' (the deformation after collision can't be ignored). Therefore, the deformed stones in the observation are the differences.
*2 The 'most effective difference' here means that it is the most motivating one. Of course, the difference should also be 'suggestive,' which means it suggests the way to modify/change a learner's model. This issue is discussed in section 4.2. At present, we are giving priority to motivation in choosing the 'most effective difference,' which could be complemented by other 'more suggestive (but less motivating)' differences.
Figure 2. Concepts of Differences
(v1: how an object exists)
  An object exists/doesn't exist (d1)
  An object has/doesn't have an attribute (d2)
  An object has/doesn't have a combination of attributes (d3)
  A constraint on an attribute's value (d4)
  A constraint on a combination of attributes' values (d5)
(v2: how a relation between objects is)
  A combination of objects' existence/absence (d6)
  A combination of objects' attributes' existence/absence (d7)
  A constraint on a combination of objects' attributes' values (d8)
(v3: how an object changes along time)
  An object appears/disappears (d9)
  An object's attribute appears/disappears (d10)
  A combination of an object's attributes' appearance/disappearance (d11)
  A change of an object's attribute's value (or constraint) (d12)
  A change of a combination of an object's attributes' values (or constraints) (d13)
(v4: how a relation between objects changes along time)
  A combination of objects' appearance/disappearance (d14)
  A combination of objects' attributes' appearance/disappearance (d15)
  A change of a combination of objects' attributes' values (or constraints) (d16)
In both cases, the objects in the observation show 'impossible' natures to a learner. In general, it would be reasonable to assume that the effectiveness of these differences descends from (d1) to (d16) because of their concreteness/abstractness and simplicity/complexity. It is of course necessary to identify which of these differences are educationally important and how their effectiveness is ordered, depending on each learning domain. The concepts of difference, however, at least provide a useful guideline for describing such knowledge.

4.2 Describing Qualitative Difference Rules

Since the concepts of difference above are identified/classified in a model-based way, they can easily be related to the differences of modeling assumptions of the models. That is, each of them can suggest what kind of physical processes, which influence the objects and the constraints on them, are/aren't considered in the models and in what kind of physical situations these processes are/aren't to be considered. This information can be formulated into qualitative difference rules. The qualitative difference rules are described based on a set of guidelines which are systematically derived from the concepts of difference. We illustrate an example (see [12] for more detail):

(p1) Rules for the differences of the processes which influence (or are influenced by) an object's (dis)appearance: If an object exists (or doesn't exist) in the observation which doesn't exist (or exists) in the prediction (d1), the following can be the causes or effects:
1) The process which generates the object is working (or not working) in the former, and is not working (or working) in the latter.
2) The process which consumes the object is not working (or working) in the former, and is working (or not working) in the latter.
3) The influence of the process which generates the object is stronger (or weaker) than the one which consumes the object in the former, and is weaker (or stronger) in the latter.
4) By the existence (or absence) of the object, some process is working (or not working).

Therefore, the following guideline is reasonable:

(Guideline-1) As for the change of a physical process in (m2) (and the accompanying physical situation in (m3)), the difference about the existence of an object can be its effect, if the object is generated/consumed by the process, or can be its cause, if the object's existence/absence influences the activity of the process.

The qualitative difference rules are used both for identifying/generating inter-mw-tasks and for generating model-based explanations, as are the parameter change rules. In particular, when a learner doesn't understand the necessity of a microworld-transition, it becomes possible by using them to indicate the qualitative differences of objects which are too surprising to neglect. Since there are usually several qualitative difference rules which match the microworld-transition under consideration, several qualitative differences will be listed. Their effectiveness can be estimated based on the concepts of difference, and the most effective differences will be chosen.

(Example-3) Curling-like Problem (3)

We illustrate two qualitative difference rules of the GMW in Figure 1b: one is for the transition from MW-2 to MW-3 and the other is for the transition from MW-4 to MW-5.
They are described as follows:

QD-Rule-1:
If   sliding(M1, ice), and
     changed(not sweep([x0, x1]) => sweep([x0, x1])), and
     considered(heat generation, melt of the ice)
Then appears(water): existence-diff.(d1),
     has-attribute(M1, specific-heat): class-diff.(d2)

QD-Rule-2:
If   collides(M1, M2), coefficient-of-restitution(M1, M2) = e, v1 > 0, v2 > 0, and
     changed(e = 1 => 0 < e < 1), and
     changed(consider(elastic collision) => consider(inelastic collision))
Then deforms(M1), deforms(M2): class-diff.(d2)
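To illustrate how the concepts of difference could drive the choice of which difference to point out, the sketch below ranks the differences produced by matching QD-rules using their (d1)-(d16) index as a proxy for effectiveness, following the paper's heuristic that effectiveness roughly descends from (d1) to (d16). The data structures and ranking function are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (assumed encoding, not the authors' code): qualitative
# difference rules yield (difference description, concept index) pairs; the
# lower the concept index, the more concrete/motivating the difference is
# assumed to be, so it is preferred when generating the explanation.
qd_rule_1_effects = [("appears(water)", 1),                      # existence-diff. (d1)
                     ("has-attribute(M1, specific-heat)", 2)]    # class-diff. (d2)
qd_rule_2_effects = [("deforms(M1)", 2), ("deforms(M2)", 2)]     # class-diff. (d2)

def most_effective(differences):
    """Pick the difference with the smallest (d)-index, i.e. the most concrete one."""
    return min(differences, key=lambda d: d[1])

print(most_effective(qd_rule_1_effects))   # -> ('appears(water)', 1)
```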
By using QD-Rule-1, it is inferred that the inter-mw-tasks which focus on the water on the surface of the ice or on the increasing temperature of the ice, that is, the difference about the existence of an object or the difference about the object class, are relevant to the transition from MW-2 to MW-3. By using QD-Rule-2, on the other hand, it is inferred that the inter-mw-tasks which focus on the deformation of the stones after the collision, that is, the differences about the object class, are relevant to the transition from MW-4 to MW-5. If a learner has difficulty with these tasks, an explanation is generated which relates these differences to the melt process, the heat generation process or the inelastic collision process. These rules are, from the viewpoint of motivation, preferred to the parameter change rules matched to these microworld-transitions (the latter identify the tasks which ask for the quantitative differences of parameters). Since there is no qualitative difference rule that matches the transition from MW-1 to MW-2, PC-Rule-1 (which matches it) is used and the inter-mw-task (m6)-(2*) of MW-1 (which asks for the quantitative change of the velocity of M1) is identified as the relevant task.

Concluding Remarks

In this paper, we proposed the GMW framework for assisting a learner's progressive knowledge acquisition in SLEs. Because of its explicit description of microworlds and their differences, the GMW can adaptively navigate a learner through the domain models and generate model-based explanations to assist her/him. Though the implementation is still ongoing, we believe the GMW greatly helps a learner progressively reconstruct her/his knowledge in a concrete context.

References

[1] Addanki, S., Cremonini, R. and Penberthy, J.S.: Graphs of models, Artificial Intelligence, 51, pp.145-177 (1991).
[2] Addanki, S., Cremonini, R. and Penberthy, J.S.: Reasoning about assumptions in graphs of models, Proc. of IJCAI-89, pp.1432-1438 (1989).
[3] Burton, R.R., Brown, J.S. & Fischer, G.: Skiing as a model of instruction, In Rogoff, B. & Lave, J. (Eds.), Everyday Cognition: its development in social context, Harvard Univ. Press (1984).
[4] Chinn, C.A. & Brewer, W.F.: Factors that Influence How People Respond to Anomalous Data, Proc. of 15th Ann. Conf. of the Cognitive Science Society, pp.318-323 (1993).
[5] Collins, A. & Gentner, D.: Multiple models of evaporation processes, Proc. of the Fifth Cognitive Science Society Conference (1983).
[6] Falkenhainer, B. and Forbus, K.D.: Compositional Modeling: Finding the Right Model for the Job, Artificial Intelligence, 51, pp.95-143 (1991).
[7] Fischer, G.: Enhancing incremental learning processes with knowledge-based systems, In Mandl, H. & Lesgold, A. (Eds.), Learning Issues for Intelligent Tutoring Systems, Springer-Verlag (1988).
[8] Forbus, K.D.: Qualitative Process Theory, Artificial Intelligence, 24, pp.85-168 (1984).
[9] Frederiksen, J. & White, B.: Conceptualizing and constructing linked models: creating coherence in complex knowledge systems, In Brna, P., Baker, M., Stenning, K. & Tiberghien, A. (Eds.), The Role of Communication in Learning to Model, pp.69-96, Mahwah, NJ: Erlbaum (2002).
[10] Frederiksen, J., White, B. & Gutwill, J.: Dynamic mental models in learning science: the importance of constructing derivational linkages among models, J. of Research in Science Teaching, 36(7), pp.806-836 (1999).
[11] Goldstein, I.P.: The Genetic Graph: A Representation for the Evolution of Procedural Knowledge, Int. J. of Man-Machine Studies, 11, pp.51-77 (1979).
[12] Horiguchi, T. & Hirashima, T.: A simulation-based learning environment assisting scientific activities based on the classification of 'surprisingness', Proc. of ED-MEDIA2004, pp.497-504 (2004).
[13] Merrill, M.D.: Instructional Transaction Theory (ITT): Instructional Design Based on Knowledge Objects, In Reigeluth, C.M. (Ed.), Instructional-Design Theories and Models Vol.II: A New Paradigm of Instructional Theory, pp.397-424 (Chap. 17), Hillsdale, NJ: Lawrence Erlbaum Associates (1999).
[14] Opwis, K.: The flexible use of multiple mental domain representations, In D. Towne, T. de Jong & H. Spada (Eds.), Simulation-based experiential learning, pp.77-90, Berlin/New York: Springer (1993).
[15] Stevens, A.L. & Collins, A.: Multiple models of a complex system, In Snow, R., Frederico, P. & Montague, W. (Eds.), Aptitude, Learning, and Instruction (vol. II), Lawrence Erlbaum Associates, Hillsdale, New Jersey (1980).
[16] Towne, D.M.: Learning and Instruction in Simulation Environments, Educational Technology Publications, Englewood Cliffs, New Jersey (1995).
[17] Towne, D.M., de Jong, T. and Spada, H. (Eds.): Simulation-Based Experiential Learning, Springer-Verlag, Berlin, Heidelberg (1993).
[18] White, B., Shimoda, T.A. & Frederiksen, J.: Enabling students to construct theories of collaborative inquiry and reflective learning: computer support for metacognitive development, Int. J. of Artificial Intelligence in Education, 10(2), pp.151-182 (1999).
[19] White, B. & Frederiksen, J.: Inquiry, modeling, and metacognition: making science accessible to all students, Cognition and Instruction, 16(1), pp.3-118 (1998).
[20] White, B. & Frederiksen, J.: ThinkerTools: Causal models, conceptual change, and science education, Cognition and Instruction, 10, pp.1-100 (1993).
[21] White, B. & Frederiksen, J.: Causal model progressions as a foundation for intelligent learning environments, Artificial Intelligence, 42, pp.99-157 (1990).
The Andes Physics Tutoring System: Five Years of Evaluations Kurt VANLEHN1, Collin Lynch1, Kay Schulze2, Joel A. Shapiro3, Robert Shelby4, Linwood Taylor1, Don Treacy4, Anders Weinstein1, and Mary Wintersgill4 1 LRDC, University of Pittsburgh, Pittsburgh, PA, USA 2 Computer Science Dept., US Naval Academy, Annapolis, MD, USA 3 Dept. of Physics and Astronomy, Rutgers University, Piscataway, NJ, USA 4 Physics Department, US Naval Academy, Annapolis, MD, USA Abstract. Andes is a mature intelligent tutoring system that has helped hundreds of students improve their learning of university physics. It replaces pencil and paper problem solving homework. Students continue to attend the same lectures, labs and recitations. Five years of experimentation at the United States Naval Academy indicates that it significantly improves student learning. This report describes the evaluations and what was learned from them.
1 Introduction
Although many students have personal computers now and many effective tutoring systems have been developed, few academic courses include tutoring systems. A major point of resistance seems to be that instructors care deeply about the content of their courses, even down to the finest details. Most instructors are not completely happy with their textbooks; adopting a tutoring system means accommodating even more details that they cannot change. Three solutions to this problem have been pursued. One is to include instructors in the development process. This lets them get the details exactly how they want them, but this solution does not scale well. A second solution is to include the tutoring system as part of a broader reform with significant appeal to instructors. For instance, the well-known Cognitive Tutors (www.carnegielearning.com) are packaged with an empirically grounded, NCTM-compliant mathematics curriculum, textbook and professional development program. A third solution is to replace grading, a task that many instructors would rather delegate anyway. This is the solution discussed here. The rapid growth in web-based homework (WBH) grading services, especially for college courses, indicates that instructors are quite willing to delegate grading to technology. In physics, the task domain discussed here, popular WBH services include WebAssign (www.webassign.com), CAPA (www.lon-capa.org/index.html) and Mastering Physics (www.masteringphysics.com). Ideally, instructors still choose their favorite problems from their favorite textbooks, and they may still use innovative interactive instruction during classes and labs. [1] All that changes is that students enter their homework answers on-line, and the system provides immediate feedback on the answer. If the answer is incorrect, the student may receive a hint and may get another chance to derive the answer. Student homework scores are reported electronically to the instructor.
Although WBH saves instructors time, the impact on student learning is unclear. WBH's immediate feedback might increase learning relative to paper-and-pencil homework, or it might increase guessing and thus hurt learning. Although most studies merely report correlations between WBH usage and learning gains, 3 studies of physics instruction have compared learning gains of WBH to those of paper-and-pencil homework (PPH). In the first study, [2] one of 3 classes showed more learning with WBH than PPH. Unfortunately, PPH homework was not collected and graded, but WBH was. It could be that the WBH students did more homework, which in turn caused more learning. In the other studies, [3, 4] PPH problem solutions were submitted and graded, so students in the two conditions solved roughly the same problems for the same stakes. Despite a large number of students and an impressive battery of assessments, none of the measures showed a difference between PPH students and WBH students. In short, WBH appears to neither benefit nor harm students' learning compared to PPH. The main goal of the Andes project is to develop a system that, like WBH, replaces only the PPH of a course, and yet increases student learning. Given the null results of the WBH studies, this appears to be a tall order. This paper discusses Andes only briefly—see [5] for details. It focuses on the evaluations that test whether Andes increases student learning compared to PPH.
2 The function and behavior of Andes
In order to make Andes' user interface easy to learn, it is as much like pencil and paper as possible. A typical physics problem and its solution on the Andes screen are shown in Figure 1. Students read the problem (top of the upper left window), draw vectors and coordinate axes (bottom of the upper left window), define variables (upper right window) and enter equations (lower right window). These are actions that they do when solving physics problems with pencil and paper. Unlike PPH, as soon as an action is done, Andes gives immediate feedback. Entries are colored green if they are correct and red if they are incorrect. In Figure 1, all the entries are green except for equation 3, which is red. Also unlike PPH, variables and vectors must be defined before being used. Vectors and other graphical objects are first drawn by clicking on the tool bar on the left edge of Figure 1, then drawing the object using the mouse, then filling out a dialogue box. Filling out these dialogue boxes forces students to precisely define the semantics of variables and vectors. For instance, when defining a force, the student uses menus to select two objects: the object that the force acts on and the object the force is due to. Andes includes a mathematics package. When students click on the button labeled "x=?", Andes asks them what variable they want to solve for, then it tries to solve the system of equations that the student has entered. If it succeeds, it enters an equation of the form <variable> = <value>. Although physics students routinely use powerful hand calculators, Andes' built-in solver is more convenient and avoids calculator typos. Andes provides three kinds of help:
- Andes pops up an error message whenever the error is probably due to lack of attention rather than lack of knowledge. Typical slips are leaving a blank entry in a dialogue box, using an undefined variable in an equation (which is usually caused by a typo), or leaving off the units of a dimensional number. When an error is not recognized as a slip, Andes merely colors the entry red.
- Students can request help on a red entry by selecting it and clicking on a help button. Since the student is essentially asking, "what's wrong with that?" we call this What's Wrong Help.
Figure 1: The Andes screen (truncated on the right)
- If students are not sure what to do next, they can click on a button that will give them a hint. This is called Next Step Help.

What's Wrong Help and Next Step Help generate a hint sequence that usually has three hints: a pointing hint, a teaching hint and a bottom-out hint. As an illustration, suppose a student who is solving Figure 1 has asked for What's Wrong Help on the incorrect equation Fw_x = -Fw*cos(20 deg). The first hint, which is a pointing hint, is "Check your trigonometry." It directs the students' attention to the location of the error, facilitating self-repair and learning. [6, 7] If the student clicks on "Explain more", Andes gives a teaching hint, namely: If you are trying to calculate the component of a vector along an axis, here is a general formula that will always work: Let θV be the angle as you move counterclockwise from the horizontal to the vector. Let θx be the rotation of the x-axis from the horizontal. (θV and θx appear in the Variables window.) Then: V_x = V*cos(θV - θx) and V_y = V*sin(θV - θx). We try to keep teaching hints as short as possible, because students tend not to read long hints. [8, 9] In other work, we have tried replacing the teaching hints with either multimedia [10, 11] or natural language dialogues. [12] These more elaborate teaching hints significantly increased learning, albeit in laboratory settings. If the student again clicks on "Explain more," Andes gives the bottom-out hint, "Replace cos(20 deg) with sin(20 deg)." This tells the student exactly what to do.
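As a quick check of the general component formula in this hint, the snippet below plugs in hypothetical numbers consistent with the example (a weight vector pointing straight down and an x-axis tilted 20° from the horizontal); the magnitude and angles are assumed values for illustration, not taken from the paper.

```python
import math

# Hypothetical values for the hint's general formula V_x = V*cos(theta_V - theta_x):
# a weight vector pointing straight down (theta_V = 270 deg) and an x-axis
# rotated 20 deg from the horizontal (theta_x = 20 deg); Fw = 10 N is assumed.
Fw, theta_V, theta_x = 10.0, 270.0, 20.0
Fw_x = Fw * math.cos(math.radians(theta_V - theta_x))
print(round(Fw_x, 3))          # -3.42, i.e. -Fw*sin(20 deg), matching the bottom-out hint
```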
Andes sometimes cannot infer what the student is trying to do, so it must ask before it can give help. An example is shown in Figure 1. The student has just asked for Next Step Help and Andes has asked, "What quantity is the problem seeking?" Andes pops up a menu or a dialogue box for students to supply answers to such questions. The students' answer is echoed in the lower left window. In most other respects, Andes is like WBH. Instructors assign problems via email. Students submit their solutions via the web. Instructors access student solutions via a spreadsheet-like gradebook. They can accept Andes' scores for the problems or do their own scoring, and so on.
3 Evaluations
Andes was evaluated in the U.S. Naval Academy's introductory physics class every fall semester from 1999 to 2003. This section describes the 5 evaluations and their results. Andes was used as part of the normal Academy physics course. The course has multiple sections, each taught by a different instructor. Students in all sections take the same final exam and use the same textbook, but different instructors assign different homework problems and give different hour exams, where hour exams are in-class exams given approximately monthly. In sections taught by the authors (Shelby, Treacy and Wintersgill), students were encouraged to do their homework on Andes. Each year, the Andes instructors recruited some of their colleagues' sections as Controls. Students in the Control sections did the same hour exams as students in the Andes sections. Control sections did homework problems that were similar but not identical to the ones solved by Andes students. The Control instructors reported that they required students to hand in their homework, and credit was given based on effort displayed. Early in the semester, instructors marked the homework carefully in order to stress that the students should write proper derivations, including drawing coordinate systems, vectors, etc. Later in the semester, homework was graded lightly, but instructors' marks continued the emphasis on proper derivations. In some classes, instructors gave a weekly quiz consisting of one of the problems from the preceding homework assignment. All these practices encouraged Control students both to do the assignments carefully and to study the solutions that the instructor handed out. The same final exams were given to all students in all sections. The final exams comprised approximately 50 multiple choice problems to be solved in 3 hours. The hour exams had approximately 4 problems to be solved in 1 hour. Thus, the final exam questions tended to be less complex (3 or 4 minutes each) than the hour exam questions (15 minutes each). On the final exam, students just entered the answer, while on the hour exams, students showed all their work to derive an answer. The hour exam results will be reported first.
3.1 Hour exam results
Table 1 shows the hour exam results for all 5 years. It presents the mean score (out of 100) over all problems on one or more exams per year. In all years, the Andes students scored reliably higher than the Control students with moderately high effect sizes, where effect size is defined as (Andes_mean – Control_mean)/Control_standard_deviation.

Table 1: Hour exam results
                     1999          2000          2001          2002          2003          Overall
Andes students       173           140           129           93            93            455
Control students     162           135           44            53            44            276
Andes mean (SD)      73.7 (13.0)   70.0 (13.6)   71.8 (14.3)   68.2 (13.4)   71.5 (14.2)   0.22 (0.95)
Control mean (SD)    70.4 (15.6)   57.1 (19.0)   64.4 (13.1)   62.1 (13.7)   61.7 (16.3)   -0.37 (0.96)
P(Andes = Control)   0.036         < .0001       .003          0.005         0.0005
Effect size          0.21          0.92          0.52          0.44          0.60
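To make the effect-size definition concrete, the snippet below recomputes it for two of the reported rows; the numbers are copied from Table 1 and the calculation simply applies the formula given above.

```python
# Effect size = (Andes_mean - Control_mean) / Control_SD, applied to two rows of Table 1.
def effect_size(andes_mean, control_mean, control_sd):
    return (andes_mean - control_mean) / control_sd

print(f"1999: {effect_size(73.7, 70.4, 15.6):.2f}")   # 1999: 0.21
print(f"2003: {effect_size(71.5, 61.7, 16.3):.2f}")   # 2003: 0.60
```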
Introduction

Much recent research points to the important role of student motivation in learning. Students who are highly motivated set goals, monitor their progress, evaluate their understanding, use strategies to enhance learning, and have higher grades and test scores than less-motivated students. In fact, behaviors associated with high motivation are a stronger predictor of academic learning outcomes in some studies than measures of general intelligence [1]. Thus, adding a model of learner motivation should increase the pedagogical effectiveness of ITS instruction.

1. Project objectives

In this poster, we present our initial efforts to assess students' motivation and mood while working with an ITS. Self reports provided a reliable, non-intrusive and inexpensive source of motivation and mood data that could be easily collected in public school classrooms. Our initial target domain is high school mathematics, specifically, instruction in problem solving for high stakes achievement tests in the Wayang-West ITS. We were especially interested in the interaction of student gender and motivation. Much prior research indicates that females and males have different emotional reactions in mathematics, and that females have higher levels of test anxiety. Although females receive higher grades on average than males in math classes, females tend to score lower on high stakes achievement tests such as the SAT-M [2].

2. Study methodology
2.1 Participants. The study included students (N = 47) in two high school geometry classes in a large high school in urban Southern California serving a diverse student population. Students worked with the Wayang-West ITS during their mathematics class each day over a one-week period, under the supervision of their classroom teacher.
2.2 Instruments. The Wayang-West ITS included integrated web pages at which students completed their daily "Math Personality Profile" (MPP). The MPP included instruments to assess a) beliefs about intelligence (fixed or possible to enhance), b) mood (anxious or relaxed/confident about the activity), c) mathematics motivation (self efficacy, liking of math, value of math), d) expected performance (predicted score on a real exam), e) intention to learn from the activity (attention, effort) and f) attribution for the quality of the day's math work.

3. Results and discussion

Student motivation. Not surprisingly, students who had high self efficacy in math before the ITS intervention had higher expectations for success, felt that math was less difficult to learn, and predicted that they would get higher test scores than students with lower self efficacy (all correlations p < .05). The classroom mathematics teacher provided grade information and independent ratings of the students' observed mathematics motivation, which were highly correlated with students' self reports (correlations p < .01). Although most students were quite motivated and thought they were doing well, nearly half were performing below grade expectations. Thus, there were many students who wanted to do well and seemed to be trying, but who were not actually mastering the class material. This presents a pedagogical challenge: the ITS must be designed to raise students' objective skills while sustaining motivation (e.g., having high hopes and trying hard is not enough; acquiring specific strategies and skills is also critical).

Motivation and mood. Self efficacy was strongly correlated with the students' daily mood reports, and mood predicted students' specific estimates of their likely SAT-M scores, and the perceived difficulty of the task (i.e., more positive mood was associated with lower perceived difficulty).

Gender comparisons. There were no gender differences in students' mathematics motivation or mood before the ITS activity started. However, by the end of the final session, males' mood had increased, whereas females' mood had significantly declined, even though there was no objective difference in ITS problem solving performance for males and females. Mood reports were also linked to male and female students' estimates of their test score on a real exam. Initially, students estimated that their score would be about at the national average, but males increased their score estimates as they worked with the ITS, whereas female students' estimates declined. Thus, student gender is a potentially important factor to be considered in a pedagogical model: male and female students were performing similarly, and both felt that the material was becoming easier to learn, yet their emotional responses diverged over time, as did their expectations for successful outcomes.

Use of multimedia help. Students' perception that math is difficult was a significant predictor of their use of help resources in the tutoring system, as was self-reported intention to learn. Not surprisingly, students' use of help resources (viewing multimedia hints) was negatively correlated with the number of incorrect answers entered (guessing or "bottoming out"). Mood did not predict guessing behavior, but students who believed that people are born with a certain innate ability in math were more likely to enter multiple wrong answers per problem (i.e., to guess) than students with "incremental" beliefs about intelligence, F(1,39) = 5.487, p < .05.
Providing praise for student performance can actually undermine math motivation, and increase beliefs that native talent is most important [3]. Thus, students may benefit most from ITS feedback focusing on their effort and use of the help resources as contributors to positive learning outcomes, rather than on feedback that emphasizes performance (number correct, scores relative to other students, etc.).
In a regression analysis with students' estimated test score as the outcome variable, mood and learning intention were both significant predictors, and mood accounted for a higher proportion of the variance. Thus, for example, students who started the final session feeling relaxed, confident and at ease expected to do better on the real exam, relative to other students who had similar intentions to work hard at learning but who felt more anxious, tense and worried. This suggests that daily mood assessments will be important to include in the enhanced pedagogical model of the ITS. For example, a student who is anxious might benefit from an initial review of problems that have already been tackled, along with feedback emphasizing incremental beliefs (e.g., "small steps add up"). A second regression analysis focused on students' estimates of their likely test performance after the ITS activity was over. Gender and mood at the start of the last session were both significant predictors, whereas factors such as the number of problems completed and the use of help resources did not account for significant variance. Again, it appears that students' affective state influenced their response to the ITS activity, with females showing less positive mood than males.

4. Conclusions and next steps

In this initial study, we established that students were able to report their motivational beliefs and affective states, using real-time self-reporting tools integrated into the ITS. Self reports were validated by ratings and grade information provided by their classroom mathematics teacher. The next step in the project is to implement our pedagogical model and select strategies appropriate for students who show high or low motivation, positive or negative affect, and so on. The pedagogical strategies that we are implementing are based on studies of how expert human instructors help students learn difficult material while sustaining motivation.

Acknowledgements

The Wayang Outpost ITS was originally developed with support from United States National Science Foundation HRD 0120809 [4]. The present project is supported by National Science Foundation grants HRD 0411532 and REC 0411886. The views represented in this paper do not necessarily reflect the position and policies of the funding agency. We would like to thank Ms. Devi Mattai, Ms. Jeanine Foote, Mr. Kenneth Banks, and Dr. Derick Evans of the Pasadena Unified School District for their support and assistance with this project. We would also like to thank Lewis Johnson and Mark Lepper for their assistance with the design of the motivational model.

References
[1] Zimmerman, B. J., & Schunk, D. H. (Eds.). (2001). Self regulated learning and academic achievement: Theoretical perspectives (2nd ed.). Mahwah, NJ: Erlbaum.
[2] Beal, C. R. (1994). Boys and girls: The development of gender roles. New York: McGraw Hill.
[3] Mueller, C. M., & Dweck, C. S. (1998). Praise for intelligence can undermine children's motivation and performance. Journal of Personality and Social Psychology, 75, 33-52.
[4] Beal, C. R., Woolf, B. P., & Royer, J. M. (2001-2004). AnimalWorld: Enhancing high school women's mathematical competence. National Science Foundation HRD 0120809.
Exploring Simulations in Science through the Virtual Lab Research Study: From NASA Kennedy Space Center to High School Classrooms Laura BLASI Department of Educational Research, Technology, and Leadership (ERTL) The College of Education, The University of Central Florida (UCF) PO Box 161250 Orlando, FL 32816-1250, USA
[email protected]

Abstract: Documenting the use of the Virtual Lab within nine high school biology classrooms in an urban school district in Florida, this study focuses on general science classes (i.e. not advanced or honors-level) within underserved populations over the 2004-2005 school year (n=225). The baseline data is presented from an administration of the Test of Science Related Attitudes (TOSRA) (Fraser, 1981) as the author shares preliminary analysis of the usability data in summer of 2005. With funding from the BellSouth Foundation, the researchers are contributing to the development of the 3d environment and the scanning electron microscope (SEM) simulation, in conjunction with the efforts of educational technology specialists at the Kennedy Space Center (KSC), NASA.
1. Introduction

Students make meaning and learn through the stories that contextualize the tools, skills, and facts that we give them. As Schank (1995) has observed, "Our machines do not solve puzzles, nor do they do mathematics. Rather, our aim is to make them interesting to talk to, an aspect of intelligence often ignored by computer professionals and intelligence assessors." Simulations provide access to the tools needed in high school classrooms for hands-on science exploration (c.f. Gordon and Pea, 1995), and this study aimed to document a baseline of data regarding the high school student population as well as the responses of students interacting with the simulation in a usability study contributing to the further development of the Virtual Lab. The Virtual Lab provides a navigable 3d lab environment run from the computer (http://education.ksc.nasa.gov/edtech/vl.htm). This project was funded through the NASA Learning Technologies Project and targets high school and entry-level college students. The software provides an environment with enough information and realism to give students the experience of operating the actual SEM instrument. Prior studies have explored the impact of the use of simulations in science on students' achievement and attitudes (Huppert, Lomask, and Lazarowitz, 2002; Geban, Askar, and Ozkan, 1992) with positive results, as the simulations allow students repeated practice with access to sophisticated equipment.

2. The Methods

This study focused on tenth graders, students approximately 16 years old, at a time when neuroscience has shown an increased capacity for scientific reasoning, at the stage of development students enter
after their middle school years (Kwon and Lawson, 2000). The teachers who participated, leading the nine high school general science (not advanced) biology classrooms, had been teaching in their schools for a minimum of one year prior to the study. At each of the three schools there was one control group (C) and two treatment groups: one with the technology and training (B) and another with technology, training, and assistance (A). The sample (N=225) was randomized at the classroom level. An experimental design was used to document the impact – if any – of the researcher interactions within the classrooms (A), measuring change by a pre- and post-test administration of the Test of Science Related Attitudes (TOSRA) (Fraser, 1981). Standardized achievement test items in science from the fifth grade and the tenth grade tests were used to document student performance in biology across the three schools. Grounded in this baseline of data regarding attitudes toward science, test item performance, and demographic data, the usability study focused on Group A. Researchers spent time in classrooms documenting students' on-screen use of the technology (video) as well as their voices (audio), as they narrated their reactions and experience navigating the software program without interruption (Dumas & Redish, 1993). Cognitive interviewing techniques were used to document student perspectives in response to specific questions during their software use (Ericsson & Simon, 1993).

3. Findings and Further Research

While the usability data is being analyzed in the summer of 2005, the preliminary findings provide some insight into the conditions for developing and implementing simulations in general science classrooms at the high school level in the US. This information is especially important for developers who may not have classroom teaching experience and may not be able to interact with students in the content areas. The demographics reveal that of the three schools, school 2 had the highest population of students who were not white (78%), and the highest number of students who were eligible for federal assistance through the "free or reduced lunch" program (53%). The interviews with the teachers documented their interest in using tools in the science curriculum alongside observation in order to document student achievement. As one explained: "The best way I can measure some type of gain in learning it's usually with some type of hands-on activity, which requires them to do some type of performance tasks in front of me…you can cover microscope use...preparing the slides…knowing how to manipulate the microscope to see a better picture...." (Sarah, 10th grade biology teacher). However, a consistent theme across interviews was confirmed by the research team observations in all three schools: science classes in this district have little or no access to computer labs due to the emphasis on preparation for standardized tests in other subjects, while at the same time classrooms are rarely equipped with computers beyond the teacher's desk and the hardware donated for participation in this research study. A small sample of achievement test items in biology was administered from fifth grade and tenth grade exams at the beginning of the study. Analyzing only the performance on fifth grade items in fall 2004, the tenth grade students in schools 1 and 3 answered 60% of the items correctly, while students in school 2 answered 48% of the items correctly.
This performance matches the ratings by the state based on overall standardized test results, which rank school 2 the lowest of the three. Using the TOSRA (Fraser, 1981), a 70-item self-report instrument with seven dimensions, researchers documented: a) attitudes towards careers in science; b) evidence of scientific attitudes; c) evidence of application of scientific inquiry as measured by a pre-post administration of the TOSRA. A higher mean score (mean scores can range between 1 and 5) is indicative of a more positive view of science and the pretest documented that student scores were not significantly different across schools, despite differences in demographics and performance on state standardized achievement tests (mean=3.1 out of a possible 5). Overall, observations and interviews across all three schools revealed that the conditions for implementing classroom simulations are not optimal. The schools varied in terms of
demographics and achievement performance; however, the student scores on the TOSRA were not significantly different, with an average score of 3 out of 5 on the 70-item instrument. While conducting an analysis of the usability data, the author will work at NASA KSC in the summer of 2005. The long term goal of the study is to contribute to the further development of educational software and simulations at NASA KSC, with high school students who are in general science tracks as the primary audience. This report was written as a Fellow in the Academy for Teaching and Learning at UCF.

References

Dumas, J.S., & Redish, J.C. (1993). A practical guide to usability testing. Norwood, NJ: Ablex.
Ericsson, K. A. & Simon, H. A. (1993). Protocol analysis: Verbal reports as data. Revised edition. Cambridge, MA: The MIT Press.
Fraser, B. J. (1981). TOSRA. Test of Science-Related Attitudes handbook. The Australian Council for Educational Research Limited.
Geban, O., Askar, P., & Ozkan, I. (1992). Effects of computer simulations and problem-solving approaches on high school students. Journal of Educational Research, 86(1), 5.
Gordon, D.N. and Pea, R.D. (1995). Prospects for scientific visualization as an educational technology. Journal of Learning Science 4(3), 249–279.
Huppert, J., Lomask, S. M., & Lazarowitz, R. (2002). Computer simulations in the high school: Students' cognitive stages, science process skills and academic achievement in microbiology. International Journal of Science Education, 24(8), 803-821.
Kwon, Y. J., & Lawson, A. E. (2000). Linking brain growth with the development of scientific reasoning ability and conceptual change during adolescence. Journal of Research in Science Teaching, 37(1), 44-62.
Schank, R. C. (1995). Tell me a story: Narrative and intelligence. Evanston, Ill.: Northwestern University Press.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Generating Structured Explanations of System Behaviour Using Qualitative Simulations Anders BOUWER and Bert BREDEWEG University of Amsterdam, Human-Computer Studies Lab The Netherlands Email: {bouwer,bredeweg}@science.uva.nl Abstract. This paper presents an approach to generate structured explanations of system behaviour based on qualitative simulations. This has been implemented in WiziGarp, a domain-independent learning environment. The main research question addressed here is how to manage the complexity of the simulations in order to generate adequate explanations.
1. Introduction
Qualitative simulations explicitly represent the kinds of knowledge that can support learners in building their own conceptual model of dynamic phenomena. This knowledge is used to generate a state-transition graph of all possible behaviours. The main problem in using qualitative simulations in education is that due to the amount of detail included, a simulation can be very complex, i.e., containing a large number of states and transitions, and a large amount of information within each state.
2. Aggregation of Qualitative Simulations
In order to organize the information to be communicated, five levels of aggregation are introduced, which vary from the individual system state level via longer time-frames to the global level, at which alternative possibilities occur. For each of these levels, aggregation techniques have been implemented which reduce the amount of information to be communicated by the WiziGarp system. On the system state level, the status of causal dependencies is analyzed to arrive at a classification of causal effects, as inactive, submissive, balanced, or effective. This allows grouping and selection of those dependencies which have an actual effect, discarding dependencies whose effect does not contribute to the outcome of the simulation. The classification refines work by De Koning et al. [2], who distinguish only between submissive and effective dependencies, and work by Mallory [5], who attributes similar labels directly to quantities instead of to their effects. Figure 1 shows an example screenshot from WiziGarp with a subset of dependencies for a grass population in Brazilian Cerrado ecology [6], including submissive dependencies (the positive influences), as well as effective ones (the negative influences and the positive proportionalities).
Figure 1. A WiziGarp screenshot: dependencies for the grass population and an exercise question
On the local event level, recognition of events is performed. Information from adjacent states is selected and combined to form larger chunks of information, specifying meaningful events of various types, such as value and derivative events (e.g., Qx starts to increase), causal effect events (e.g., the influence from Qx on Qy becomes inactive), and model fragment events (e.g., the model fragment for a particular process becomes active). On the path (segment) level, additional value and derivative events are recognized by selection and chunking of lower level events (e.g., the highest value of Qx that is reached in the path (segment) P is V, or Qx fluctuates between V1 and V2, respectively). On the global level, transitive reduction and aggregation of alternative orderings are used to simplify the state-transition graph. Transitive reduction abstracts from all transitions T (= Sx → Sy) for which there exists a path P from Sx to Sy that does not contain transition T, with the extra condition that P contains the same events as T. Aggregation of alternative orderings abstracts paths which diverge and reunite, if they include the same events, albeit in a different order. The algorithms for these techniques can be found in [1]. In the figure, the state-transition graph shown is the result of performing aggregation of alternative orderings. It contains 6 states and only 2 paths, whereas the original state-transition graph contained 19 states and 869 possible paths. This reduction is possible because most paths contain the same events and only differ in the order in which the events occur.
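To make the global-level technique concrete, the following Python sketch shows one way event-preserving transitive reduction could be implemented for a small state-transition graph. This is not the WiziGarp implementation (the actual algorithms are described in [1]); the graph representation, event labels, and function names are illustrative assumptions only.

from itertools import chain

def paths(graph, src, dst, banned, path=None):
    # Enumerate paths from src to dst that avoid the banned transition and
    # never reuse a transition (so the search terminates even with cycles).
    path = path or []
    if src == dst and path:
        yield path
        return
    for (a, b) in graph:
        if a == src and (a, b) != banned and (a, b) not in path:
            yield from paths(graph, b, dst, banned, path + [(a, b)])

def transitive_reduction(graph):
    # Drop a transition T = (Sx, Sy) if some alternative path from Sx to Sy
    # covers exactly the same events as T (the condition stated in the text).
    reduced = dict(graph)
    for t, t_events in graph.items():
        src, dst = t
        for p in paths(reduced, src, dst, banned=t):
            p_events = set(chain.from_iterable(reduced[e] for e in p))
            if p_events == set(t_events):
                del reduced[t]
                break
    return reduced

# Tiny example: the direct transition S1 -> S3 is redundant because the path
# S1 -> S2 -> S3 contains exactly the same two events.
g = {("S1", "S2"): {"grass starts to increase"},
     ("S2", "S3"): {"trees reach their maximum"},
     ("S1", "S3"): {"grass starts to increase", "trees reach their maximum"}}
print(transitive_reduction(g))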
3. Generating Interactive Explanations
WiziGarp can present topics using various didactic means: diagrams, textual descriptions, causal explanations, contrastive explanations, queries or exercise questions. Which ones are used in what order, as well as the desired level of aggregation for each topic, can be specified in a didactic plan (these are currently handcrafted). WiziGarp can take the initiative by asking exercise questions that are automatically generated, as described in detail in [4]. Figure 1 includes a question generated about the grass population. The learner can also take the initiative and ask for specific information to be answered by WiziGarp. To this end, events on the local level and path (segment) level are determined for the quantities of interest. The learner can ask queries about a particular event by selecting it and choosing a query from a popup menu that appears. For example, when the learner asks why the number of trees has started to increase, this is causally explained by the introduction of the effect of immigration. In addition, contrastive explanations can be generated which highlight the differences between states or paths.
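As an illustration of what such a handcrafted plan could look like, the sketch below encodes a didactic plan as a plain Python data structure. The field names, topics, and ordering are assumptions made for this example, not WiziGarp's actual plan format.

# Hypothetical didactic plan: for each topic, which didactic means to use,
# in what order, and at which aggregation level to present the material.
didactic_plan = [
    {"topic": "grass population",
     "aggregation_level": "local event",
     "means": ["diagram", "causal explanation", "exercise question"]},
    {"topic": "tree population",
     "aggregation_level": "path segment",
     "means": ["textual description", "contrastive explanation", "query"]},
]

def presentation_order(plan):
    # Walk the plan in order, yielding one presentation step at a time.
    for step in plan:
        for means in step["means"]:
            yield step["topic"], means, step["aggregation_level"]

for topic, means, level in presentation_order(didactic_plan):
    print(f"Present '{topic}' as a {means} at the {level} level")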
4. Discussion
Several parts of the WiziGarp architecture have been evaluated by potential users and domain experts, such as the question generation module [4] and the diagrammatic representations [1]. The results of these evaluation studies are encouraging and have informed the design of the other modules. The approach is generic and has also been tested on other domains, such as physics and biology. Compared to related work that addresses explanations for simulations [3, 5], WiziGarp encompasses a more extensive taxonomy of events, more flexible aggregation mechanisms, and a richer set of didactic means. Future work will address reactive curriculum planning based on learner answers.
References [1] A. Bouwer. Explaining Behaviour: Using Qualitative Simulation in Interactive Learning Environments. PhD thesis, University of Amsterdam, to appear. [2] K. de Koning, B. Bredeweg, J. Breuker, and B. Wielinga. Model-based reasoning about learner behaviour. Artificial Intelligence, (117):173–229, 2000. [3] K. D. Forbus, P. B. Whalley, J. O. Everett, L. Ureel, M. Brokowski, J. Baher, and S. E. Kuehne. Cyclepad: An articulate virtual laboratory for engineering thermodynamics. Artificial Intelligence, 114(1/2):297–347, 1999. [4] F. Goddijn, A. Bouwer, and B. Bredeweg. Automatically generating tutoring questions for qualitative simulations. In P. Salles and B. Bredeweg, editors, Proceedings of QR’03, the 17th Int. workshop on Qualitative Reasoning, pages 87–94, Brasilia, Brazil, 20-22 Aug. 2003. [5] R. S. Mallory. Tools for Explaining Complex Qualitative Simulations. PhD thesis, Department of Computer Sciences, University of Texas at Austin, 1998. [6] P. Salles and B. Bredeweg. Constructing progressive learning routes through qualitative simulation models in ecology. In G. Biswas, editor, Proceedings of QR’01, the 15th Int. workshop on Qualitative Reasoning, pages 82–89, San Antonio, Texas, 17-19 May 2001.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
The Bricoles project: support socially informed design of learning environment Pierre-André Caron, Alain Derycke, Xavier Le Pallec Laboratoire Trigone, équipe NOCE Université des Sciences et Technologies de Lille 59655 Villeneuve d'Ascq cedex - France
[email protected]
Abstract: In this paper we describe our current work on the BRICOLES project. Its objective is to provide an environment which helps teachers to elaborate e-learning courses. We show how a Model Driven Development (MDD) approach can be applied to e-learning creation. We use RAM3, a meta-modeling tool developed in our laboratory. With RAM3, we express pedagogical scenarios and study models defined on different e-learning platform metamodels. Then we export the models onto the targeted platform with scripts.
Introduction
The main objective of the BRICOLES [Bring off Reflexive, Intuitive and Conceptual Open LEarning System] project is to suggest solutions for reintroducing teachers into the design of e-learning courses.
The design of the Bricoles project with a Model Driven Approach
Using artifacts [1] and "bricolage" [2] are two natural ways to help teachers design distance courses. We chose to adopt ideas from model driven engineering to "materialize" this concept/method. Broadly speaking, using model driven tools begins by defining a logical model, without technical/implementation details, and ends with the automatic generation of the corresponding application (after selecting the implementation platform). The graphical models that are manipulated provide better boundary objects, and model transformation allows "bricolage" by reusing the experience of others and by adapting models to the target platform [3]. Supporting different modeling formalisms is complex to implement, and model transformation generally requires defining transformation rules, which is not easy. In our context, fortunately, these rules are not defined by teachers. We use a Model Driven Approach by defining the corresponding metamodel for each e-learning platform and implementing a deployment facility which is fed by instances of these metamodels. Then, we put IMS-LD [4] forward for modeling. This pedagogical metamodel represents a
standardization effort from the educational community; it allows teachers to easily express dependencies between pedagogical intention and platform functionality.
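The Python fragment below is only a toy illustration of this idea of projecting a pedagogical model onto a platform-specific metamodel. The real metamodels and transformation rules in Bricoles are defined with RAM3; the attribute names used here (roles, activities, groups, modules) are invented for the example.

# Invented, simplified IMS-LD-style scenario for a Java course.
imsld_scenario = {
    "roles": ["student", "author", "teacher"],
    "activities": [
        {"title": "study documents", "performed_by": "student"},
        {"title": "do exercises", "performed_by": "student"},
        {"title": "realize a small project", "performed_by": "student"},
    ],
}

def to_platform_model(scenario):
    # Toy transformation rules: roles become platform groups,
    # activities become platform modules visible to the performing role.
    return {
        "groups": [{"name": role} for role in scenario["roles"]],
        "modules": [{"name": act["title"], "visible_to": act["performed_by"]}
                    for act in scenario["activities"]],
    }

ganesha_like_model = to_platform_model(imsld_scenario)
print(ganesha_like_model)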
Life cycle
More than a single tool, our proposition consists of a design environment which is composed of two tools: RAM3 [5], to support different modeling formalisms, and GenDep, to deploy models on e-learning platforms.
Fig 1. Life cycle
Figure 1 represents the roles of each hypothetical user of our environment. Computer scientists and pedagogical engineers define, with RAM3, modeling formalisms like IMS-LD or Ganesha's metamodel. Teachers or pedagogical engineers define, with RAM3, models of pedagogical scenarios according to the metamodels defined before. With GenDep the teacher deploys an instance of a course (playing the scenario) on an e-learning platform (Ganesha [6]).
Fig 2. Java course
We illustrate the use of these tools for a teacher who wants to teach a Java course. The Java teacher defines the scenario. He begins by loading IMS-LD into RAM3. Then, he defines the IMS-LD model corresponding to the Java course (Fig 2). He describes the different roles
(students, author, and teacher) and the different phases (studying documents, doing exercises, realizing a small project), which may run in parallel. Next, he chooses to project his model onto the Ganesha platform (he can do this because the transformation rules have been defined beforehand). The transformation engine creates the Ganesha model corresponding to the previous IMS-LD model. The teacher may use RAM3 to edit the resulting Ganesha model in order to improve or refine it. This transformation/exportation reveals differences between the IMS-LD scenario and the Ganesha one. Finally, the teacher uses GenDep to deploy his Ganesha model on the platform (Ganesha) where he has to run his course. GenDep asks him for the web address of the platform, then simulates a web user filling in the web forms presented by the platform, in order to create the corresponding groups, assign students to groups, allocate resources to students, and so on. The simulation is done by sending HTTP requests (the protocol used on the Internet). We are studying the same process to export our scenarios to the Claroline, Moodle and PostNuke platforms [7].
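A minimal sketch of the kind of deployment GenDep performs by simulating a web user is given below, using Python's requests library. The login flow, URLs, and form field names are hypothetical; the actual forms and addresses depend on the target platform (Ganesha, Claroline, Moodle, ...).

import requests

def deploy_course(base_url, model, login, password):
    # Act like a browser: keep a session and submit the platform's web forms.
    session = requests.Session()
    session.post(f"{base_url}/login.php", data={"login": login, "password": password})
    # Create one group per group in the model and enrol its students.
    for group in model.get("groups", []):
        session.post(f"{base_url}/admin/group_create.php", data={"name": group["name"]})
        for student in group.get("students", []):
            session.post(f"{base_url}/admin/group_enrol.php",
                         data={"group": group["name"], "user": student})
    # Allocate resources (modules) to the relevant roles or groups.
    for module in model.get("modules", []):
        session.post(f"{base_url}/admin/module_add.php",
                     data={"name": module["name"], "visible_to": module["visible_to"]})

# Example call (hypothetical address and credentials):
# deploy_course("http://example.org/ganesha", ganesha_like_model, "admin", "secret")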
Conclusion
Modeling is the main principle of our proposition. It softens the transition between the pedagogical bricolage that teachers need and the structured data that computers need. For different projects (like the European Kaleidoscope project [8]) we have written several scenarios and modeled them. We now want to specify the metamodels of other platforms and their associated "deployment protocols". We are developing a graphical editor to define transformation rules and we are studying other artifacts.
References
[1] Wartofsky, Marx W. (1973). Perception, representation, and the forms of action: toward an historical epistemology. In Wartofsky, M. W., Models. Dordrecht: D. Reidel Publishing Company, 1979.
[2] Lévi-Strauss, C. (1966). The Savage Mind (2nd ed.). Chicago: University of Chicago Press, p. 17.
[3] Büscher, Monika, Gill, Satinder, Mogensen, Preben and Shapiro, Dan. Bricolage as a Method for Situated Design. Landscapes of Practice.
[4] Koper, R. Modeling Units of Study from a Pedagogical Perspective – The Pedagogical Meta-Model behind EML. Open University of the Netherlands. First Draft, Version 2, 2001.
[5] Le Pallec, Xavier, Derycke, Alain. RAM3: towards a more intuitive MOF metamodelling process. SCI 2003, Systemics, Cybernetics and Informatics.
[6] http://www.anemalab.org/ganesha/
[7] http://www.claroline.net/, http://moodle.org/, http://www.postnuke.com
[8] http://www.noe-kaleidoscope.org/
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Explainable Artificial Intelligence for Training and Tutoring H. Chad LANE, Mark G. CORE, Michael VAN LENT, Steve SOLOMON, Dave GOMBOC University of Southern California / Institute for Creative Technologies 13274 Fiji Way, Marina del Rey, CA 90292 USA {lane, core, vanlent, solomon, gomboc}@ict.usc.edu Abstract This paper describes an Explainable Artificial Intelligence (XAI) tool that allows entities to answer questions about their activities within a tactical simulation. We show how XAI can be used to provide more meaningful after-action reviews and discuss ongoing work to integrate an intelligent tutor into the XAI framework.
Introduction
Military training aids typically provide an after-action review (AAR) tool to allow students to review their exercises and ideally learn from them. Common features of these tools include mission statistics, a list of accomplished and failed objectives, and sometimes a mission replay feature. Because of the increasingly complex artificial intelligence (AI) in these training aids, it has become difficult for users of such limited AAR tools to understand how their orders translate into the activities of computer-controlled entities. Student users have the additional disadvantage of possessing fragmented and possibly flawed domain knowledge: they are faced not only with learning new tactical knowledge (i.e., how units perform their tasks) and new skills (i.e., constructing plans and updating them on the fly), but also with comprehending emergent behaviors and their triggers. To provide a better AAR tool and to help users better understand entities' actions in military simulations, we have developed a portable Explainable AI (XAI) module that allows a user to question entities directly about their actions, status, and goals. For live training exercises, the U.S. Army defines AARs as "a review of training that allows soldiers, leaders, and units to discover for themselves what happened during the training and why. It is also used to solicit ideas on how the training could have been performed better" [2, App. G]. During such AARs, participating soldiers are usually available for questioning. The goal of our research is to use XAI to provide this feature for computer-controlled soldiers.
1. Explainable Artificial Intelligence for Training
Although our current XAI system was not designed exclusively for pedagogical purposes, it nonetheless addresses the first part of the AAR definition given above: it allows for a deeper investigation of what happened during an exercise. To interact with entities, the student may select from time points of focus, a particular entity to question, and finally a question to ask. Some questions are specific to a particular entity (e.g., what is your health?) while others concern a larger group (e.g., what is your unit's task?). Time points and available questions are dependent on the entity in focus and the discourse context, and exchanges with the system are displayed as question-and-answer dialogues. More details about the interface and system implementation can be found in [3].
A sample interaction with our XAI system appears in Figure 1. The student begins by asking questions of Gligoric, an opposing force entity (OPFOR). Later, the student changes to Evans, a blue force entity (BLUFOR). This dialogue is intended to show how a student might investigate an event using the XAI tool. The student here probably believes Evans' unit was somehow involved in the death of Gligoric and is examining the events leading up to it. Although some of this information could potentially be derived from a graphical replay (we have provided this capability in previous work), there is a chance it will not be obvious to a student from the replay (e.g., line 9). Such questioning can also therefore be used by the student to confirm inferences made from visual evidence alone.
1. User: What is your health/damage status?
2. Gligoric [2:26]: I am unhurt.
3. User: Let's fast forward to simulation time 4:16.
4. Gligoric [4:16]: I am currently withdrawing.
5. User: What is your health/damage status?
6. Gligoric [4:16]: I have been killed in action.
7. User: Let's talk to Private Evans.
8. Evans [4:16]: I am the fire team leader of Blue Fireteam 2. My unit's task is to clear the W room of the post office.
9. User: Let's rewind to simulation time 2:32.
10. Evans [2:32]: I am currently transiting to the right stack position outside of the NW door of the W room of the post office.
11. User: What are you shooting at?
12. Evans [2:32]: I am shooting at an entity located at 125 m NNE of the NW door of the W room of the post office.
Figure 1. Sample XAI dialogue
In its current state, XAI is best viewed as a discovery environment. That is, the student is free to use the tool as desired to explore the simulation results and achieve the goals of the AAR. Having just completed the exercise for "real," interactions, such as the one in the figure, give the student a chance to learn more about the entities and what they experienced. It is up to the student, however, to ask the right questions of the right entities and understand the responses. Focusing more specifically on our system's dialogue manager and natural language generator, we see that pedagogical support is built into these components. Currently we maintain a simple dialogue state consisting of all the entities and units that the user has talked with. In the dialogue in Figure 1, Evans introduces himself as fire team leader and describes his unit's task because the student has not talked with either Evans or anyone else in that unit. This feature is a placeholder for more powerful reasoning about how to adapt the system's output to the student (e.g., it should not use undefined technical terms, it may need to explicitly state knowledge implied by its explanations). Although it is currently simulation-dependent, our system also maintains specific points of reference to refer to when responding to questions that require some location-oriented answer (e.g., line 12 in Figure 1).
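A toy Python sketch of the dialogue-state bookkeeping described above follows; the class, method, and entity names are illustrative assumptions, not the system's actual code.

class DialogueState:
    # Remember which entities and which units the student has already talked with.
    def __init__(self):
        self.entities_talked_to = set()
        self.units_talked_to = set()

    def should_introduce(self, entity, unit):
        # An entity introduces itself only if neither it nor anyone else
        # in its unit has spoken with the student yet.
        return entity not in self.entities_talked_to and unit not in self.units_talked_to

    def record_turn(self, entity, unit):
        self.entities_talked_to.add(entity)
        self.units_talked_to.add(unit)

state = DialogueState()
if state.should_introduce("Evans", "Blue Fireteam 2"):
    print("I am the fire team leader of Blue Fireteam 2. My unit's task is ...")
state.record_turn("Evans", "Blue Fireteam 2")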
2. Related Work
The motivation for and technical challenges of explaining the internal processing of AI systems have been explored at length in the context of medical diagnosis systems. One prominent example, MYCIN, used a complex set of rules to diagnose illness and suggest treatments based on patient statistics and test results [6]. The developers of these systems were quick to realize that doctors were not going to accept the expert system's diagnoses on faith. Consequently, these systems were augmented with the ability to provide explanations to justify their diagnoses. Education becomes a natural extension as well, since explanation is often an important component of remedial interventions with students. Three notable efforts falling into this category are the Program Enhancement Advisor (PEA) for teaching LISP programmers to improve their code [7], the family of successors to MYCIN [1], and another entity-driven explanation system, Debrief [4].
3. XAI for Tutoring Evidence for learning in pure discovery environments is marginal [5], and so we are in the early stages of designing an intelligent tutoring module with the goal of providing a more guided discovery experience for students. We adopt the general goals of an AAR: review what happened, investigate how and why these events occurred, and discuss how to improve future performance. Answering why questions is a significant technological challenge, but also highly relevant to good tutoring. For example, discovering why a unit has paused in the middle of executing a task has the potential to help a student who gave the order to proceed. This may require reasoning about previous or concurrent events in the simulation. If a unit is currently under fire, for example, it is critical that the student understand what has caused the delay. It could very well involve an earlier mistake, such as failing to provide cover. The student could be asked to analyze the situation and suggest ways to allow the unit in question to proceed. One such question would be “Now that you have learned why this unit is delayed, what was missing from your plan?” If the student cannot generate any ideas, hints such as “Can you think of a way to conceal the unit for safe movement?” or “Do you see any other nearby units that could provide cover fire?” would be appropriate. We hypothesize that questions such as these, and more dynamic AARs, will improve students’ self-evaluation skills and problem solving abilities within the simulation. In addition to working with tactical behaviors, we are also in the early phases of targeting non-physical behaviors, such as emotional response, for explanation. This has advantages for systems that aim to teach subjects such as negotiation skills, cultural awareness or sensitivity. Explaining why an utterance (by a user) has offended an automated entity is, for example, similar to explaining emergent tactical behaviors. Tutoring in situations like this would, we believe, also be similar (e.g., “Could you have phrased that differently?”).
Acknowledgements The project or effort described here has been sponsored by the U.S. Army Research, Development, and Engineering Command (RDECOM). Statements and opinions expressed do not necessarily reflect the position or the policy of the United States Government, and no official endorsement should be inferred.
References [1] Clancey, W. J. (1986) From GUIDON to NEOMYCIN and HERACLES in twenty short lessons, AI Magazine, Volume 7, Number 3, pages 40-60. [2] FM 25-101. (1990) Battle Focused Training. Headquarters, US Dept. of the Army. Washington D.C. [3] Gomboc, D., Solomon, S., Core, M. G., Lane, H. C., van Lent, M. (2005) Design Recommendations to Support Automated Explanation and Tutoring. To appear in Proceedings of the 2005 Conference on Behavior Representation in Modeling and Simulation (BRIMS), Universal City, CA. May 2005. [4] Johnson, W. L. (1994) Agents that learn to explain themselves. In Proceedings of the Twelfth National Conference on Artificial Intelligence, pages 1257-1263. [5] Mayer, R. M. (2004) Should There Be a Three-Strikes Rule Against Pure Discovery Learning? American Psychologist, Volume 59, Number 1, pages 14-19. [6] Shortliffe, E. H. (1976) Computer-based Medical Consultations: MYCIN. Elsevier, New York. [7] Swartout, W. R., Paris, C. L., and Moore, J. D. (1994) Design For Explainable Expert Systems. IEEE Expert, Volume 6, Number 3, pages 58-64.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
An Agent-based Framework for Enhancing Helping Behaviors in Human Teamwork Cong CHEN, John YEN The School of Information Sciences and Technology, the Pennsylvania State University University Park, PA 16802, U.S.A. Michael MILLER and Richard VOLZ Department of Computer Science, Texas A&M University College Station, TX 77843, U.S.A. Wayne SHEBILSKE The Department of Psychology, Wright State University Dayton, OH 45435, U.S.A Abstract. This paper proposes an intelligent training framework where agents are used with explicit teamwork models for desired teamwork behaviors. In the framework, we divide coaching process into two manageable sub-phases and model trainees regarding teamwork dynamics and team performance. We have implemented the framework on a team-based agent architecture (CAST) and applied it to train helping behaviors for a simulated command and control (C2) task. The framework and its implementation enable us to design experiments for studying the effectiveness of agent-based coaching for helping behaviors among team members.
Introduction
The objective of this research is to build an agent-based team training system to coach human trainees, specifically to facilitate trainees' learning of how to help each other when collaboratively achieving a mission. In section 1, we propose an agent-based intelligent team training (AITT) framework that supports coaching of helping behaviors. In section 2, we describe the design of the coaching agents to be used in the framework, focusing on the two-phase training protocol. Discussions and conclusions are given in section 3.
1. An Agent-based Intelligent Training Framework
The CAST-ITT (CAST Intelligent Team Training) system is a team training system extended from the multi-agent infrastructure CAST [1]. A generic Agent-based Intelligent Team Training (AITT) framework has been developed to monitor the interactions among intelligent agents, human trainees, and the simulation system. Agents serve multiple roles in AITT: one role is to act as virtual partners who perform similar tasks (components of a partner agent are shown on the left side of Figure 1); the other role is to support user modeling of teamwork and provide coaching feedback to the team regarding team members' helping behaviors (components of a coaching agent are shown on the right side of Figure 1). To build the coaching agents in the framework, we utilize user modeling components and reinforce a set of human training strategies. Both an overlay model and an error taxonomy are used to diagnose trainee errors.
Figure 1. Agent-based Intelligent Team Training Framework
The individual teamwork module captures information relevant to teamwork-related behaviors for individual human trainees. The expert model represents both the expert strategies acquired from domain experts and the qualified trainee knowledge after planning-phase evaluation. The team performance assessment module generates team performance measures and the assessment results. A team performance model of the trainees is maintained to facilitate reasoning about team performance history, which distinguishes our work from the traditional assessment module in a user modeling framework.
2. Two-phase Coaching
We choose the Distributed Dynamic Decision-making (DDD) simulation as the domain to be used with the CAST-ITT system. The team mission in DDD is for a team to collectively protect restricted zones [2]. The experiment is set up so that particular team members are overloaded and the mission can only be achieved when they assist each other. We coach team collaboration via two sub-phases: in the planning phase, agents coach the trainees' planning of team collaboration, and in the execution phase, agents coach the team's collaboration process. In the planning phase, we provide trainees with additional information through the use of an intelligence report and allow them to plan the allocation of team resources. The intelligence report gives an overview of the track arrival information. Via our graphic planning tool, trainees communicate, make decisions and come up with a team placement plan about how to allocate their vehicles. A scoring algorithm has been developed to evaluate the trainees' placement plan against a list of prioritized expert strategies, with respect to the trainees' resources and workloads. A higher score indicates a better helping pattern in the planning phase (a toy sketch of this scoring, together with the execution-phase check described below, follows Figure 2). Feedback is categorized into the major domain tasks, such as identification or attack of tracks. Possible errors include planning too much help without covering one's own zone, or planning no help when extra vehicles could be sent to particular teammates. In the execution phase, the trainees' online performance can be measured based on the dynamic sensing information, the trainees' collaborative actions and the qualified placement plan generated during the planning-phase coaching. When a helping event is triggered, there is no single precise "desired behavior" for the team; how the team performs depends on many domain factors, and good strategies have to be adaptive to execution contingencies. Without a comprehensive expert model, we use a hybrid approach of overlay model and error taxonomy, modeling only the trainee's high-level goals and diagnosing only the trainee errors related to several critical factors of a helping event.
Figure 2 shows the generation of the event-based target goals in the planning phase, the execution of these desired goals in the coaching agent's expert model, and the evaluation of the trainee's performance by comparing trainee actions with the target behaviors in the execution phase. Each trainee in the domain acts as an individual Decision Maker (DM).
Figure 2. Two-Phase coaching: Planning and Execution of Team Collaboration
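The following Python sketch illustrates the general shape of the two evaluation steps described in this section: scoring a placement plan against prioritized expert strategies in the planning phase, and reporting deviations from the event-based desired goals in the execution phase. The strategies, weights, and goal representations are invented for illustration; the actual CAST-ITT algorithms are not given in this paper.

# Hypothetical prioritized expert strategies: earlier (higher-priority) entries
# carry more weight in the planning-phase score.
EXPERT_STRATEGIES = [
    ("own zone is covered", 3),
    ("overloaded teammate receives an extra vehicle", 2),
    ("no more help is planned than spare resources allow", 1),
]

def score_placement_plan(plan):
    # Planning phase: a higher score indicates a better helping pattern.
    return sum(weight for rule, weight in EXPERT_STRATEGIES if plan.get(rule, False))

def diagnose_execution(desired_goals, trainee_actions):
    # Execution phase: report only deviations from the event-based desired goals.
    return [goal for goal in desired_goals if goal not in trainee_actions]

plan = {"own zone is covered": True,
        "overloaded teammate receives an extra vehicle": False,
        "no more help is planned than spare resources allow": True}
print(score_placement_plan(plan))  # -> 4
print(diagnose_execution({"move Resource3 to Quadrant3"}, {"attack track in Quadrant1"}))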
3. Discussions and Conclusions
In this paper, we report our ongoing research on using intelligent agents to coach helping behaviors among team members. To avoid the costly monitoring of trainee behaviors throughout a task, we propose an efficient user modeling system that only captures the high-level desired goals for the target knowledge and skills. Feedback is provided to the trainee at the end of each session, when an overall comprehensive assessment can be made. The information we collect from the trainee might be domain dependent, yet the domain-independent nature of helping behaviors enables the AITT framework to be used by other researchers to design and test coaching applications within a teamwork-based domain: the framework can fit in other agent architectures or apply to multiple domains with the inclusion of our coaching agents. It is not always an advantage when humans have to make decisions within an automated interactive system [3]. Agents can be very concise and accurate about a specific set of rules, while humans are good at adapting to domain contingencies. We have observed that trainees and agents can take advantage of their unique strengths by acting in a complementary manner. To allow positive interactions outside the automated system, human trainees are explicitly encouraged to discuss their own helping strategies, using the necessary information and feedback that the intelligent coaching agents have provided.
References
[1] Yen, J., Yin, J., Ioerger, T.R., Miller, M., Xu, D., and Volz, R.A. CAST: Collaborative Agents for Simulating Teamwork. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01). 2001.
[2] Kleinman, D.L., Young, P.W., and Higgins, G. The DDD-III: A Tool for Empirical Research in Adaptive Organizations. In Proceedings of the Command and Control Research. 1996.
[3] Woods, D.D. Cognitive Technologies: The Design of Joint Human-Machine Cognitive Systems. AI Magazine, 1986, p. 86-92.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
P3T: A System to Support Preparing and Performing Peer Tutoring Emily Ching1, Chih-Ti Chen2, Chih-Yueh Chou3, Yi-Chan Deng2, Tak-Wai Chan1 Graduate Institute of Network Learning Technology1 Department of Computer Science & Information Engineering2 National Central University, Taiwan Department of Computer Science and Engineering, Yuan Ze University, Taiwan3
[email protected],
[email protected],
[email protected],
[email protected],
[email protected] Abstract. Peer tutoring has been proven an effective way to engage students in active learning. This paper presents a system called P3T that supports peer tutoring in a classroom where every student is equipped with TabletPC or laptop with wireless capability. P3T structures peer tutoring by scaffolding both tutors and tutees to prepare for the tutorial session and facilitating their elaborations during their face-to-face tutoring. The rationales of the evolving design of P3T prototype are given and discussed in this paper.
1. Introduction
Students teaching students, or peer tutoring, is a pedagogical strategy that has been studied extensively in education research. It has been found that having students teach each other increases their achievement at various educational levels [1][2]. However, classroom peer tutoring programs without technology support are usually limited to simple learning tasks. For example, the peer tutor in a class-wide peer tutoring program reads the questions and answers on a set of flashcards when interacting with the tutee [3][4]. We assert that in a classroom where every student can interact with her classmates unobtrusively via her own computing device with wireless capability, it is probable that we can design more sophisticated support for peer tutoring, given the computing affordances offered by technology. This paper gives an account of the design rationales of our proposed system, P3T, standing for computer-supported Preparing and Performing Peer Tutoring, which supports a complex peer tutoring model through the combination of Web and wireless technologies.
2. Background
Evidence from empirical research has indicated that students achieve greater academic performance when they are studying in order to teach others than when studying for a test [5]. In a more in-depth study, Coleman et al. [6] found that when carrying out far-transfer tasks (e.g., inference and application), students who were told to teach others by explanation outperformed those who were told to teach by summarization. This finding suggests that tutors' learning outcomes are sensitive to the type of teaching task; in other words, complex teaching tasks will involve student tutors in deeper thinking processes during the preparation for teaching. Besides preparing for teaching, performing teaching also brings tutors some cognitive benefits. This phase includes verbally presenting instructional materials and responding to
tutee's questions. Webb [7] noticed that the former makes learning occur when one becomes aware of inadequacies while giving an elaboration, that the latter makes one learn because of the further clarification it requires, and that students who gained the most from cooperative activities were those who provided elaborated explanations to others. When comparing tutors' learning outcomes with tutees', however, many studies demonstrate that peer tutoring benefits tutors in their cognitive gains more than it benefits tutees [5]. According to Webb's finding [7] as summarized previously, this implies that if a peer tutoring model has tutors cover most of the elaboration activities while tutees act as passive listeners, it will only favour tutors' learning.
3. Design Rationales
The conclusions drawn from the empirical research are that a peer tutoring program should take advantage of learning effects during preparation for teaching, especially for teaching tasks involving deep-processing thinking, as well as during performing teaching. Tutees should also be involved in active learning to increase their cognitive gains in a peer tutoring setting. Based on these suggestions, P3T features three designs, of which we give an account in this section.
3.1 Computer-Scaffolded Tutorial Notes Composition
One main feature of P3T is having each student tutor compose her own teaching material. By composing the tutoring notes, students concretely shape and present their thoughts. And, while developing and revising the tutorial notes, the composer is actually reflecting on her own thoughts, which involves her in deeper thinking. By putting what is to be taught, and how, into digital text or figures, tutors make their comprehension of the target material "visible" and hence able to be monitored by the class teacher with the support of the P3T system. However, without guidance, students may not know how to compose a good tutorial note [8]. The P3T system scaffolds tutors to self-test their understanding, make a lesson plan, identify keywords and their connectedness with prior knowledge, structure the tutorial notes in a general format, and self-assess their tutorial notes with rubrics which assess the quality of tutorial notes from five perspectives: completeness, correctness, reorganization, presentation, and the tutor's personal analysis.
3.2 Computer-Supported Collaborative Tutoring
After guiding tutors in individually composing tutorial notes, P3T further enhances the quality of the tutorial notes and of tutoring performance by supporting collaboration among tutors. The mechanism consists of three sequential stages: anonymous peer assessment of tutorial notes, pairing tutors and having each pair generate a common tutorial note, and tutors in a pair helping and consulting each other during the tutorial session. Tutors assess each other's work according to the same rubrics as the self-assessment. Besides providing the anonymous assessment function, the system distributes the tutorial note which earns the highest score to all tutors for their reference. After peer assessment, tutors are paired to integrate their tutorial notes into a common version. Tutors in a pair further clarify, reflect on, and merge their thoughts while confronting alternative points of view in this integration process.
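A minimal Python sketch of the rubric-based peer assessment and the distribution of the highest-scoring note, with invented scores, might look as follows; the data layout and scoring rule are assumptions for illustration only.

RUBRIC = ("completeness", "correctness", "reorganization", "presentation", "personal analysis")

def total_score(assessments):
    # Sum the rubric scores a tutorial note received from its anonymous assessors.
    return sum(sum(a[dim] for dim in RUBRIC) for a in assessments)

notes = {
    "note_by_tutor_A": [{"completeness": 4, "correctness": 5, "reorganization": 3,
                         "presentation": 4, "personal analysis": 2}],
    "note_by_tutor_B": [{"completeness": 3, "correctness": 4, "reorganization": 4,
                         "presentation": 3, "personal analysis": 4}],
}
best = max(notes, key=lambda name: total_score(notes[name]))
print(f"Distribute {best} to all tutors for reference")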
In the tutorial session, each student tutor has her own peer learner to teach, and the P3T system presents the common tutorial note for the tutoring dyad. Tutors with a common tutorial note help and consult each other when confronting questions from their tutees, and then the four students—two tutoring dyads—discuss together as a 'small learning group'.
3.3 Computer-Facilitated Tutor-Tutee Face-to-Face Elaboration
In the model of P3T, tutoring is held in the form of class-wide one-on-one tutoring in a classroom where each student has a TabletPC or laptop equipped with wireless connection capability. Besides displaying tutor-designed tutorial notes and questions for the tutees, the P3T system also prompts questions which tutees posed in their pre-class reading. In other words, tutees have to come to the tutorial session with ideas in mind. Using the P3T system, tutees point out the parts they find difficult and pose questions about the original learning materials during the pre-class reading phase, and the system prompts those difficulties and questions for tutors to respond to during the tutorial session. Another of P3T's designs to facilitate the elaboration and interaction between tutors and tutees is distributing instructor-designed questions for all students—tutors and tutees—to answer individually first, and then having the four members in a small learning group share their answers and discuss to reach the correct one.
4. Summary and Current Work
The current work on P3T is an ongoing research effort to construct a model of learning by peer tutoring in a 1:1 classroom setting. The underlying framework for the research is aimed at integrating active learning processes (e.g., interpretation, elaboration, organization, etc.) with theories of learning by teaching and cognitive skill development. The educational context involves tutors' learning by composing tutorial notes individually and then collaboratively, and tutees' learning by pre-class study, followed by face-to-face elaboration. The technology plays the roles of tutoring scaffolder, collaborative learning supporter, and elaboration facilitator. Currently we are implementing this system in a graduate course and will report our findings in the near future.
References
[1] Cohen, P.A., Kulik, J.A., & Kulik, C.L.C. (1982). Educational outcomes of tutoring: a meta-analysis of findings. American Educational Research Journal, 19, 237-248.
[2] Falchikov, N. (2001). Learning together: Peer tutoring in higher education. London: RoutledgeFalmer.
[3] Greenwood, C.R., Delquardri, J.C., and Hall, V. (1989). Longitudinal effects of classwide peer tutoring. Journal of Educational Psychology, 81, 371-383.
[4] Fantuzzo, J. W., King, J. A., & Heller, L. R. (1992). Effects of reciprocal peer tutoring on mathematics and school adjustment: A component analysis. Journal of Educational Psychology, 84, 331-339.
[5] Bargh, J. A., & Schul, Y. (1980). On the cognitive benefits of teaching. Journal of Educational Psychology, 72, 593-604.
[6] Coleman, E. B., Brown, A. L., & Rivkin, I. D. (1997). The effect of instructional explanations on learning from scientific texts. Journal of the Learning Sciences, 6, 347-365.
[7] Webb, N. M. (1989). Peer interaction and learning in small groups. International Journal of Educational Research, 13, 21-39.
[8] Ching, E., Chen, C. T., Chou, C. Y., & Deng, Y. C. (2005). A pilot study of computer supported learning by constructing instruction notes and peer expository instruction. Short paper to be presented at the 10th Conference of Computer Supported Collaborative Learning (CSCL 2005).
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Cognitive and Motivational Effects of Animated Pedagogical Agent for Learning English as a Second Language Sunhee CHOI and Hyokyeong LEE University of Southern California/Information Sciences Institute Los Angeles, CA 90089, U.S.A Abstract. This paper discusses the results of a pilot study that explores the cognitive and motivational effects of an animated pedagogical agent as well as an alternative delivery system (a simple flashing arrow with audio) in a multimedia environment in which college level ESL students learn English relative clauses. The study also examines the cognitive efficiency of these two media systems used to deliver the same instructional method for teaching English grammar.
Introduction
The primary purpose of this pilot study is to examine the claims that Animated Pedagogical Agents (referred to as pedagogical agents hereafter) increase learning scores over instructional treatments that do not employ agents [1][3][4]. The present study explored the cognitive and motivational effects of an agent as well as an alternative multimedia system (i.e., a simple flashing arrow with audio) in a computer-based learning environment in which college-level ESL (English as a Second Language) learners learn English relative clauses. The study also measured the cognitive efficiency of the two multimedia systems. Cognitive efficiency refers to "one medium being more or less effortful than another, more or less likely to succeed with a particular learner, or interacting more or less usefully with a particular prior knowledge set" (p. 25) [2], leading to faster learning, or requiring less conscious effort from learners in processing learning material.
1. Study Design and Method
The design of the present study is a true experimental design with pre- and posttest and involves two treatment groups. An explicit presentation of the rules of English relative clauses (e.g., how the target grammar works and what strategies can be used to process it) and a reading comprehension task were adopted as the instructional treatment in the study. Both tasks were computerized and delivered online. An animated pedagogical agent, 'Genie', was embedded in the agent-based learning environment (Agent Group) to deliver the explicit rule presentation, while a flashing arrow with audio was implemented to do the same thing in the non-agent-based environment (Arrow Group) [Figure 1]. 19 students who speak English as a second language were recruited from two local universities (8 Korean, 6 Chinese, 3 Thai, & 2 Japanese speakers). Participants were randomly assigned to one of the two treatment groups, Agent Group (9 participants) and Arrow Group (9 participants). The pretest consisted of three testing measures: a sentence combination test (12 questions), an interpretation test (9 questions), and a grammaticality judgment test (12 questions). The posttest was essentially the same as the pretest except that
some lexical items were replaced with other equally difficult items and all the items were presented in a different order. Participants learned English relative clauses using the ESL learning program 'Reading Wizard', which consisted of the explicit explanation of the target grammar and the reading task. Five dependent variables were measured: mental effort, self-efficacy, time, learner interest, and performance measures.
Figure 1. Agent-based (Agent Group) and non-agent-based (Arrow Group) learning environments
2. Results & Discussion
2.1 Does the explicit rule presentation have a positive effect on learners' learning of English relative clauses?
The results of the paired-samples t test show that there was no difference between the pretest and the posttest scores [t (17) = -1.021, p = 0.321]. However, since half of the participants had high prior knowledge, another t-test was conducted after dropping those who answered correctly more than 80% of the pretest questions. The results show that the students with low prior knowledge made significant improvements after learning from the program [t (6) = -3.315, p = 0.016], which indicates that the explicit rule presentation and reading comprehension task did have a positive effect on learners' learning of English relative clauses when the learners did not have much prior knowledge of the target grammar.
2.2 Does the type of medium delivering the same instructional method have a differential effect on learners' learning of English relative clauses?
No significant difference was found between the agent and the arrow group on the pretest [t (16) = 0.383, p = 0.707], which means that any difference obtained on the posttest is attributable to the instructional treatment implemented in the study. There was no meaningful difference between the two groups on the posttest either [t (16) = 0.118, p = 0.907]. The results of the Mann-Whitney test (a non-parametric test), however, showed that the delivery medium did make a difference for the low prior knowledge participants – the Agent group performed better (z = -1.101, p = 0.400) – which suggests that a pedagogical agent might work better with low prior knowledge learners.
2.3 Does the type of medium delivering the same instructional method have a differential
effect on the time and mental effort spent to achieve the same level of performance?
No difference was found between the two groups in terms of the amount of time spent by the learners to acquire the target grammar (Agent group: M = 12.70 minutes, Arrow group: M = 12.52 minutes, t = .111, p = .913). However, the learners in the Agent group exerted significantly less mental effort than those in the Arrow group (Agent group: M = 2.17, Arrow group: M = 3.44, t = -2.682, p = .016). Given that learners' performances in the two groups were not significantly different, it is reasonable to say that the Agent group achieved the same level of performance with less mental effort. In other words, the cognitive efficiency of the animated pedagogical agent was higher than that of the arrow with audio when they were used to deliver the explicit rule presentation.
2.4 Does the type of medium delivering the same instructional method have a differential effect on the levels of learner interest and motivation?
The results of the paired-samples t test show that the overall increase of learner self-efficacy beliefs from pre-treatment to post-treatment was not significant [t (6) = -2.202, p = 0.070; z = -1.787, p = 0.074]. Yet, the Arrow group's scores were higher and increased more (almost to the significance level, t (5) = -2.465, p = 0.057; z = -2.141, p = 0.032) than the Agent group's. On the contrary, the Agent group (M = 5.25) perceived the learning environment as more interesting than its counterpart did (M = 4.5) [t (5) = 0.624, p = 0.560; z = -1.061, p = 0.289]. In other words, the increased interest did not have a cause-and-effect relationship with students' actual learning outcomes.
3. Conclusion and Future Research
The present study sheds light on the issue of the cognitive efficiency of multimedia, which has not received much attention in the field of instructional technology. As discussed above, the delivery medium did not have a significant impact on the learning product, but it did on the learning process, especially the amount of mental effort exerted by learners. The study also demonstrated that animated pedagogical agents work better with learners with low prior knowledge than with those with high prior knowledge. The study also showed that the agent group reported a higher interest level than the arrow group, but the higher interest did not lead to better performance. Yet, it should be noted that there was no clear pattern observed due to the small sample size; a larger scale study is therefore required and is in fact being planned by the researchers.
References
[1] Atkinson, R. K. (2002). Optimizing learning from examples using animated pedagogical agents. Journal of Educational Psychology, 94(2), 416-427.
[2] Cobb, T. (1997). Cognitive efficiency: Toward a revised theory of media. Educational Technology Research & Development, 45(4), 21-35.
[3] Johnson, W. L., Rickel, J. W., & Lester, J. C. (2000). Animated pedagogical agents: Face-to-face interaction in interactive learning environments. International Journal of Artificial Intelligence in Education, 11, 47-78.
[4] Lester, J. C., Converse, S. A., Kahler, S. E., Barlow, S. T., Stone, B. A., & Bhogal, R. S. (1997). The persona effect: Affective impact of animated pedagogical agents. In Proceedings of CHI '97 (Computer-Human Interaction), Atlanta, March 1997.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Added value of a task model and role of metacognition in learning Noor Christoph1, Jacobijn Sandberg and Bob Wielinga University of Amsterdam Keywords. Constructivism, Knowledge Management, Knowledge retention, Metacognition, Problem solving, Task models
1. Introduction
Constructivist learning environments generally advocate the active acquisition of knowledge and skills, collaboration, and the use of authentic and realistic case material. Constructivists also point out the importance of metacognitive skills in order to monitor and control the learning process. In general, the use of metacognitive skills is positively related to learning success [5]. Games and simulations fit rather well in this paradigm since learners can experiment in a highly realistic environment. Empirical results, however, show that learning is problematic in these environments [2]. One of the problems concerns the difficulties learners have in regulating their learning behaviour. In this paper the assumption is that constructivist learning environments can be beneficial for learning provided that they contain a task model. A task model is a model that prescribes how to solve a particular problem. The task model contains all elementary executable activities stemming from the general phases of problem solving, decomposed to the level of cognitive activities. Mettes & Pilot [3], for example, developed a task model for problem solving in the field of thermodynamics. It appeared that students working with this model outperformed students that did not have this model available. From the field of instructional technology, the notion of including a task model in a learning environment is seen as a form of instructional support, especially for regulating learning behaviour. An assumption is that the metacognitive skills of novice learners are mostly domain independent in nature. In order to become an expert, these skills should be instantiated for the domain at hand. Since in the task model the general phases of problem solving are applied to a particular domain, the task of students is made easier. They do not have to use their general metacognitive skills and apply these to the domain at hand, since this is done for them. This should lead to improved learning compared to a situation in which no model is available. When no task model is available in the learning environment, students have to fall back on their framework of metacognitive skills. They have to apply these general skills to the domain themselves. If these metacognitive skills are not readily available, learning will be problematic.
1 Correspondence to: Noor Christoph, University of Amsterdam, Matrix I, Kruislaan 419 1098 VA Amsterdam. Tel.: +31 020 888 4671; Fax: +31 020 888 0000; E-mail:
[email protected].
Thus the research question in this paper is: What is the added value of a task model for learning, and what role do metacognitive skills play in this context? The learning environment suitable for this research is called KM Quest. KM Quest1 is a constructivist gaming-simulation environment for the domain of Knowledge Management (KM). The task model in KM Quest is the Knowledge Management model (KM model); it prescribes how students should solve knowledge management problems. KM Quest is played in two conditions: one without the task model (no-model condition) versus the standard environment (model condition). Students are assigned to conditions by randomization. In a pre-test post-test design, measures of learning and of the self-reported use of metacognitive skills are employed. Hypothesis 1 covers the main effect of condition: students in the model condition outperform students in the no-model condition with regard to the acquisition of declarative and procedural knowledge. Hypothesis 2 concerns an interaction effect of condition and metacognition: players in the no-model condition that score high on the use of metacognitive skills reach scores on the knowledge tests comparable to players in the model condition, whereas players in the no-model condition that score low on the use of metacognitive skills perform worse on the knowledge tests than students in the model condition.
2. Methods
The electronic questionnaire KMQUESTions was used in order to measure the acquisition of declarative and procedural knowledge. KMQUESTions was developed in a previous study and appeared to be sufficiently reliable and valid [10]. The scale Metacognitive Self-regulation of the Motivated Strategies for Learning Questionnaire [11] was used in order to measure the self-reported use of metacognitive skills. The reliability of this scale is sufficient [10, 11]. Scores on metacognition were divided into two groups based on the median, in order to discriminate between students that scored high and low on this variable.
3. Results Concerning within-subject effects, a main effect for learning was found (F = 72.13, p < 0.01). No interaction effects were found; in particular, the hypothesised interaction effect of condition and metacognition was not found (F = 0.22, p = 0.64). This indicates that students acquired declarative knowledge regardless of condition and metacognition. A main effect for acquiring procedural knowledge was also found (F = 38.56, p < 0.01), as was a main effect of condition (F = 9.26, p < 0.01): students in the model condition outperformed students in the no-model condition. One interaction effect was found, namely between learning success and metacognition (F = 4.66, p < 0.05): students who scored low on metacognition showed more learning success than students who scored high on metacognition. No interaction effect between condition and metacognition could be reported (F = 0.10, p = 0.75). A complication was that students in the no-model condition did not have the KM model at their disposal and therefore could not answer several items for procedural knowledge correctly. After excluding these items, the main effect of learning still held (F = 15.69, p < 0.01), but the main effect of condition disappeared. The interaction effect between learning success and metacognition remained (F = 4.55, p < 0.05).
1 KM Quest was developed in the EC project KITS (IST-1999-13078), which consisted of the following partners: University of Twente, The Netherlands; TECNOPOLIS CSATA novus, Italy; Cibit, The Netherlands; EADS, France; ECLO, Belgium; and the University of Amsterdam, The Netherlands.
4. Discussion and conclusions In this study, the objective was to find out what the effect of a task model is on learning, and how the use of metacognitive skills fits in. The results reveal that students acquire declarative and procedural knowledge about the domain of Knowledge Management; this replicates the findings of an earlier study [10]. The first hypothesis cannot be confirmed: students in the model condition do not outperform their peers in the no-model condition. The second hypothesis, about the interaction effect of condition and metacognition, cannot be confirmed either: students in the no-model condition who score high on metacognition do not exceed students in the same condition who score low on metacognition. However, an interaction effect was found between learning success in terms of (general) procedural knowledge and metacognition: students who scored low on metacognition obtained significantly more learning gain in (general) procedural knowledge than students who scored high on metacognition, regardless of condition. In conclusion, the main finding of this study is that especially students who are weaker in terms of metacognitive skills appear to benefit from KM Quest, regardless of whether a model is present; their learning gain is highest compared to students who are stronger in using metacognitive skills. It is, however, not the KM model that they benefit most from, since the addition of this model to the environment does not lead to better learning results. Perhaps the fact that KM Quest is in essence a constructivist learning environment is the reason why weaker students achieve more learning success. For this learning environment, the theoretical principles underpinning constructivism may have been successfully translated into specific didactical and pedagogical teaching strategies that lead to advanced self-regulatory behaviour and therefore better learning, especially for those who need it.
References
[1] T. de Jong, W.R. van Joolingen (1998). Scientific discovery learning with computer simulations of conceptual domains. Review of Educational Research, 68, 179-202.
[2] C.T.C.W. Mettes, A. Pilot (1980). Over het leren oplossen van natuurwetenschappelijke problemen. Enschede: Onderwijskundig Centrum CDO/AVC TH-Twente, The Netherlands.
[3] N. Christoph, J.A.C. Sandberg, B.J. Wielinga (2004). Measuring learning effects of a gaming-simulation environment for the domain of knowledge management. Proceedings of the IADIS CELDA conference, Lisbon, Portugal.
[4] P.R. Pintrich, D.A.F. Smith (1993). Reliability and predictive validity of the Motivated Strategies for Learning Questionnaire (MSLQ). Journal of Educational Psychology, 82, 33-40.
Introducing adaptive assistance in adaptive testing Ricardo CONEJO, Eduardo GUZMÁN, José-Luis PÉREZ-DE-LA-CRUZ and Eva MILLÁN Departamento de Lenguajes y Ciencias de la Computación Universidad de Málaga, Spain. {conejo, guzman, perez, eva}@lcc.uma.es Abstract. In this paper, we discuss the development of a theoretical framework for introducing adaptive presentation in adaptive testing. To this end, a discussion of some aspects concerning the adaptive selection mechanism for hints is presented. Some axioms that hints must fulfil are also determined, providing a hint validation procedure.
1. Introduction
Testing is commonly used in many educational contexts for different purposes: grading, self-assessment, diagnostic assessment, etc. In order to improve the efficiency of the diagnosis process, adaptive testing systems select the best question to ask next according to relevant characteristics of the examinee. In this way, higher accuracy can be reached with a significant reduction in test length. One of the most commonly used approaches for adaptive testing is Item Response Theory (IRT) [1], which assumes that the answer to a question depends on an unknown latent numerical trait θ, which in educational environments corresponds to the knowledge of the subject being tested. In any adaptive educational system, it is necessary to have accurate estimations of the student's knowledge level in order to take the most suitable instructional action. In this sense, Computerized Adaptive Tests (CATs) [2] based on IRT provide a powerful and efficient diagnosis tool. In our group we have used this framework to design and implement SIETTE1 [3], [4], a web-based assessment system that implements CATs based on a discretization of IRT. There can be little doubt that one of the main contributions to educational psychology in the 20th century is Vygotsky's Zone of Proximal Development (ZPD) [5]. A short operational definition useful for our purposes is given in [6]: the zone defined by the difference between a person's test performance under two conditions, with and without assistance. Soon after the definition of the ZPD, attempts were made to apply this concept in the context of test administration, typically with the aim of classifying students in order to allocate them to the most appropriate educational program. The main goal of the work presented here is different: to build a model that allows the integration of adaptive assistance into the adaptive testing procedure within the SIETTE system. It is widely accepted that hinting is a general and effective teaching tactic. In [7] it is shown that human tutors maintain a rough assessment of the student's performance (the trait θ in our approach) in order to select a suitable hint. Many Intelligent Tutoring Systems also give hints to the student, for example ANDES [8] and Animalwatch [9].
1 http://www.lcc.uma.es/SIETTE
In our framework, assistance will be represented by hints, h1, …, hn, that provide different levels of support for each test item. By adaptive assistance we mean that the hint to be presented will be selected by the system depending on how far into the ZPD the item lies, in such a way that the hint provides the minimal amount of information needed for the student to be able to answer the item correctly. The work presented here aims to extend our previous research [10] on the introduction of hints and feedback in adaptive testing. The main goal is now the definition and evaluation of a theoretical framework for adaptive hinting. This paper addresses the definition of such a framework, and is structured as follows: next, we discuss several aspects concerning the introduction of hints in adaptive testing environments, and then we present some conclusions and future lines of research.
2. Introducing hints in an Adaptive Testing environment
As aforementioned, SIETTE implements CATs and IRT in a web-based assessment tool. In contrast with traditional IRT, θ is defined as a discrete variable. To introduce hints in this model, let us first define some terms:
• Item. We use this term to denote a question or exercise posed to a student. The solution of such a task is provided by answering a multiple choice question, that is, the conjunction of a stem and a set of possible answers, of which only one is correct.
• Test. A test is a sequence of items.
• Hint. A hint is an additional piece of information that is presented to the student after posing a question and before he or she answers it. Hints may provide an explanation of the stem, clues for rejecting one or more answers, indications on how to proceed, etc. Hints can be invoked in two different ways: a) active (the examinee asks for it) or b) passive (the system decides when to present it).
As an example, consider the following test item and some possible hints:
What is the result of the expression: 1/8 + 1/4?
a) 3/4   b) 3/8   c) 2/4   d) 2/8
Hint 1: 1/4 can also be represented as 2/8
Hint 2: First, find equivalent fractions so they have the same denominator
Hint 3: d is incorrect
For our purposes, a simplifying assumption is that hints do not modify the student's knowledge. This assumption (that the trait θ remains constant during the test) is usual in adaptive testing, and in this case it means that hints do not cause a change in the examinee's knowledge but a change in the shape of the item characteristic curve (ICC). In this way, the hint brings the question from the ZPD down to the student's knowledge level. In this sense, the combination of the item plus the hint can be considered a new item. This new (virtual) item is represented by a new ICC whose parameters can be estimated using the traditional techniques. However, the two ICCs are not independent. First, the use of a hint should make the question easier, which can be stated as: Axiom 1. Given a question q and a hint h, let ICCq and ICCq+h be the ICCs associated to the question and to the combination question+hint, respectively. Then, for all knowledge levels k, ICCq(k) ≤ ICCq+h(k). If the examinee uses a combination of hints, the question should become even easier: Axiom 2. Given a question q, a set of hints H and a hint h ∉ H, for all knowledge levels k, ICCq+H(k) ≤ ICCq+H+{h}(k). If the parameters for such ICCs have been estimated and the axioms above are not satisfied, it means that the "hint" misleads the student and should be rejected. This simple approach provides a useful empirical method for validating hints.
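As an illustration of this validation procedure, the sketch below checks the two axioms on ICCs defined over discrete knowledge levels, as in SIETTE's discretized IRT. The logistic curves used here are invented examples, not calibrated parameters from the system.

```python
import numpy as np

def satisfies_axiom1(icc_q, icc_qh, tol=1e-9):
    """Axiom 1: the hint may not make the question harder at any knowledge level k."""
    return bool(np.all(icc_qh >= icc_q - tol))

def satisfies_axiom2(icc_qH, icc_qHh, tol=1e-9):
    """Axiom 2: adding a further hint may not make the question harder."""
    return bool(np.all(icc_qHh >= icc_qH - tol))

# Discrete knowledge levels k = 0..10 and illustrative (made-up) ICCs.
k = np.arange(11)
icc_q = 1 / (1 + np.exp(-(k - 6)))          # plain item
icc_q_h1 = 1 / (1 + np.exp(-(k - 4)))        # item + hint 1 (easier)
icc_q_h1_h2 = 1 / (1 + np.exp(-(k - 3)))     # item + hints 1 and 2 (easier still)

print("hint 1 valid:", satisfies_axiom1(icc_q, icc_q_h1))
print("hint 2 valid given hint 1:", satisfies_axiom2(icc_q_h1, icc_q_h1_h2))
```

A hint whose estimated virtual-item ICC falls below the plain item's ICC at some level would fail these checks and, following the argument above, would be rejected as misleading.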
In adaptive environments, it makes sense to look for a criterion for adaptively selecting the best hint to be presented. Under the ZPD framework, if the student is not able to solve the item but the item is in his/her ZPD, the best hint to present is the one that brings the item I from the ZPD to the zone of the student's knowledge. So, for example, if an item I has three associated hints h1, h2 and h3 at different levels of detail, each hint is suitable for a different part of the ZPD, as represented in Figure 1.
Fig. 1. Student knowledge, ZPDs and hints (the student's knowledge level is followed by successive zones ZPD1, ZPD2 and ZPD3, addressed by hints h1, h2 and h3 respectively, with item I beyond them)
A possibility for adaptive selection of hints is to use classical adaptive mechanisms: given the knowledge estimation θ(k) for a student, and given two hints h1 and h2, the best hint is the one that minimizes the expected variance of the posterior probability distribution. This mechanism is simple to implement and does not require substantial modifications to the adaptive testing procedure, because the test is still used for evaluation and not for learning. However, the use of hints can provide positive stimuli and increase student self-confidence.
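The following sketch illustrates this selection criterion under a discrete-θ model: for each candidate virtual item (item plus hint) it computes the expected variance of the posterior over knowledge levels and picks the hint that minimizes it. The prior and the candidate ICCs are invented for the example; this is a sketch of the criterion, not the SIETTE implementation.

```python
import numpy as np

def expected_posterior_variance(prior, icc):
    """Expected variance of the posterior over discrete knowledge levels
    after observing a correct or incorrect answer to the item+hint."""
    levels = np.arange(len(prior))
    p_correct = np.sum(prior * icc)
    exp_var = 0.0
    for outcome_prob, likelihood in ((p_correct, icc), (1 - p_correct, 1 - icc)):
        if outcome_prob <= 0:
            continue
        posterior = prior * likelihood
        posterior /= posterior.sum()
        mean = np.sum(levels * posterior)
        exp_var += outcome_prob * np.sum(posterior * (levels - mean) ** 2)
    return exp_var

def select_hint(prior, candidate_iccs):
    """Pick the hint whose virtual item minimises the expected posterior variance."""
    scores = {h: expected_posterior_variance(prior, icc)
              for h, icc in candidate_iccs.items()}
    return min(scores, key=scores.get), scores

# Illustrative uniform prior over 11 knowledge levels and two virtual-item ICCs.
prior = np.full(11, 1 / 11)
k = np.arange(11)
candidates = {
    "h1": 1 / (1 + np.exp(-(k - 5))),
    "h2": 1 / (1 + np.exp(-(k - 3))),
}
best, scores = select_hint(prior, candidates)
print("selected hint:", best, scores)
```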
3. Conclusions and future work
This paper has presented some ideas about introducing adaptive hints in an adaptive testing environment, based upon IRT constructs. Hints are considered not as knowledge modifiers, but as modifiers of the ICC of a question. Some formal axioms that every model of hints must satisfy have been stated and informally justified. A preliminary evaluation study (not reported here due to lack of space) suggests that the use of adaptive hints in such environments is adequate and feasible. The next step is the calibration of ICCs for each item-hint pair using empirical data. The obtained ICCs will allow validating such hints and will serve as a basis for the integration and implementation of this model in SIETTE, to allow for adaptive selection of items and hints in our testing system.
References
[1] Hambleton, R.K., Swaminathan, H., and Rogers, H.J. (1991). Fundamentals of Item Response Theory. California, USA: Sage Publications.
[2] Wainer, H. (ed.) (1990). Computerized Adaptive Testing: A Primer. Lawrence Erlbaum Associates, Hillsdale.
[3] Conejo, R., Guzmán, E., Millán, E., Trella, M., Pérez-de-la-Cruz, J.L., and Ríos, A. (2004). SIETTE: A Web-Based Tool for Adaptive Testing. International Journal of Artificial Intelligence in Education, 14, pp. 29-61. IOS Press.
[4] Guzmán, E. and Conejo, R. (2004). A brief introduction to the new architecture of SIETTE. LNCS 3137, Springer, pp. 405-408.
[5] Vygotskii, L.S. (1994). The Vygotskii Reader. Blackwell, Cambridge, Massachusetts.
[6] Wells, G. (1999). Dialogic Inquiry: Towards a Socio-cultural Practice and Theory of Education, chapter 10. New York: Cambridge University Press. Online: www.oise.utoronto.ca/~gwells/resources/ZPD.html.
[7] Gertner, A.S., Conati, C., VanLehn, K. (1998). Procedural Help in ANDES: Generating Hints Using a Bayesian Network Student Model. Proc. 15th National Conference on Artificial Intelligence, pp. 106-111.
[8] Arroyo, I. (2002). Animalwatch: an arithmetic ITS for elementary and middle school students. Workshop on Learning Algebra with the Computer at Intelligent Tutoring Systems 2002, Montreal, Canada.
[9] Hume, G.D., Michael, J., Rovick, A., Evens, M.W. (1996). Hinting as a tactic in one-on-one tutoring. Journal of the Learning Sciences, 5(1), pp. 23-47.
[10] Conejo, R., Guzmán, E. and Pérez-de-la-Cruz, J.L. (2003). Towards a computational theory of learning in an adaptive testing environment. Proceedings of AIED'03, IOS Press, pp. 398-400.
Student Questions in a Classroom Evaluation of the ALPS Learning Environment Albert CORBETT, Angela WAGNER, Chih-yu CHAO, Sharon LESGOLD, Scott STEVENS, Harry ULRICH Human-Computer Interaction Institute Carnegie Mellon University, Pittsburgh, PA 15213 USA
Abstract. Intelligent tutors for problem solving are successful environments, but have limited capabilities to provide help when learning opportunities arise. This paper reports a classroom pilot study of a new ALPS learning environment that is designed to engage students in more active learning by enabling students to ask questions at any time during problem solving. This ALPS environment integrates a Cognitive Tutor for math problem solving with Synthetic Interview technology which allows students to type natural language questions at any time during problem solving and receive pre-recorded videos of a human tutor replying. In this study we examine the rate of question-asking, the content of student questions, and student attitudes about ALPS. The student interactions with ALPS and attitudes toward the environment are promising and provide guidance for future development.
Introduction Many successful intelligent tutors for problem solving have been developed that lead to demonstrably large achievement gains [1, 5]. These tutors can identify learning opportunities during problem solving by detecting student errors, but have a limited capability to help students construct a deep understanding. With rare exceptions [3], students can only press a help key, corresponding to the generic question, "what should I do next?". This paper reports a classroom pilot of a new ALPS (Active Learning in Problem Solving) environment that opens an additional communication channel during problem solving. This environment integrates Cognitive Tutors, a successful problem-solving ITS, with an off-the-shelf technology called Synthetic Interviews to let students ask questions during problem solving. The Synthetic Interview environment [4] permits students to type questions and provides the videotaped replies of a human tutor. A principal goal of providing this virtual human tutor is to engage students more actively in learning, by letting them generate questions at any point during problem solving. The purposes of this classroom study are to observe students' question-asking behaviors and attitudes toward the new environment. 1. The ALPS Environment This ALPS environment employs Cognitive Tutor algebra problems. Each problem describes a situation that can be modeled with a linear function. Students solve numerical questions, generate an algebraic expression for the function and graph the function. In this pilot, the ALPS virtual tutor was grafted onto the usual Cognitive Tutor help facilities. Students were
advised to start by typing a question in the ALPS window when they needed help. If the video answer to a question (or sequence of questions) was insufficient, students were advised to use the tutor’s help button as usual. This ALPS release was seeded with video answers to 70 problem-related questions and 30 responses to social conversational moves (e.g., saying “thanks”) or to off-task social questions (e.g., “how old are you”).
1.1. A Wizard of Oz Prototype Study Relatively little is known about student question-asking behavior in problem-solving ITSs. We began studying student question-asking behavior with a Wizard of Oz laboratory pilot of a prototype ALPS environment in which a human tutor played the role of the Synthetic Interview [2]. In this study, students averaged 38.0 utterances per hour. Of these utterances, 14.4 were unprompted questions, 3.5 were questions prompted by the human tutor, 11.8 were other responses to tutor questions or answers, and 8.2 were other comments. We found that more than half the questions were about the interface and a few were requests for definitions. Of the remaining problem-solving questions, shallow answer-oriented questions were far more frequent than deeper process-oriented questions or principle-oriented questions.
2. The Study One hundred students enrolled in Cognitive Tutor 7th-grade math or pre-algebra courses in two Pittsburgh-area schools participated in the study. Each student used the ALPS Algebra tutor in one Cognitive Tutor class period and completed a short attitude questionnaire.
2.1 Student Utterances The 100 students generated a total of 548 utterances and averaged 11.1 utterances per hour. This overall rate of utterances is substantially lower than in the Wizard of Oz pilot study, as might be expected in this early pilot, since the ALPS tutor was less proficient at answering questions than the human tutor and the ALPS students had other sources of answers, notably the tutor help button. All student utterances were independently coded by two authors into seven task-related categories and seven off-task social utterance categories. Fifty-three percent of the utterances were task-related and the remainder were social interactions. The distributions of utterances across the task-related categories in this study and in the Wizard of Oz prototype study are essentially the same in their most important feature: among the answer-, process- and principle-oriented mathematics questions, the shallow answer-oriented questions far outnumber those in the deeper categories, and principle-oriented questions are virtually non-existent. This similarity confirms that the scaffolding of deeper knowledge-constructing questions will be a key design goal in an environment that fosters student-initiated questions. Almost half the student utterances were off-task social utterances. Most of these student-initiated utterances were appropriate to and characteristic of interactions with another human, and among these off-task social utterances only 10% were inappropriate, either overly personal or intentionally offensive.
2.2 Questionnaires Students answered two Likert-scale questions that compared the ALPS and standard cognitive tutors (Which do you like more? Which is more helpful?). Overall, students were neutral between the two tutors. On the 5-point Likert scale, the ratings for the two questions averaged 3.0 and 2.9 respectively. Two other questions asked “What did you like most about the ALPS tutor?” and “What did you dislike most about the ALPS Tutor?”. Their answers suggest that students like the ALPS concept, but were dissatisfied with this initial implementation. Students’ most frequent answers to the first question were that they liked asking questions and liked talking to a “person.” The most frequent answers to the second question were that the answers didn’t help and the tutor was slow.
3. Conclusion The results of this ALPS pilot study are promising. Students interact with the ALPS virtual tutor much as they do with human tutors, in good ways and bad. Students interacted socially with the tutor much as they would with a familiar human tutor, including observing social conventions. If anything, students spent too much time interacting socially with the tutor. The distribution of answer-, process- and principle-oriented task-related questions is also virtually identical in the ALPS and Wizard of Oz studies. This both validates the ALPS environment and poses the greatest design challenge: scaffolding deeper knowledge-building questions. The apparent validity of the ALPS environment implies that it can be a useful research tool to address these issues. Finally, students' principal complaints, that answers were insufficiently helpful and took too long to start up, represent implementation difficulties to be overcome, but students report that they like being able to ask questions and like "talking" to the virtual tutor.
Acknowledgements This research was supported by National Science Foundation Grant EIA0205301 “ITR: Collaborative Research: Putting a Face on Cognitive Tutors: Bringing Active Inquiry into Active Problem Solving.”
References
[1] Anderson, J.R., Corbett, A.T., Koedinger, K.R. and Pelletier, R. (1995). Cognitive tutors: Lessons learned. The Journal of the Learning Sciences, 4, 167-207.
[2] Anthony, A., Corbett, A., Wagner, A., Stevens, S. and Koedinger, K. (2004). Student question-asking patterns in an Intelligent Algebra Tutor. In J.C. Lester, R.M. Vicari & F. Paraguacu (Eds.) Intelligent Tutoring Systems: 7th International Conference, ITS 2004, 455-467.
[3] Rosé, C.P., Jordan, P., Ringenberg, M., Siler, S., VanLehn, K. and Weinstein, A. (2001). Interactive Conceptual Tutoring in Atlas-Andes. Proc. Artificial Intelligence in Education 2001, 256-266.
[4] Stevens, S.M. and Marinelli, D. (1998). Synthetic Interviews: The Art of Creating a 'Dyad' Between Humans and Machine-Based Characters. Proc. IEEE Workshop on Interactive Voice Technology for Telecommunications Applications.
[5] VanLehn, K., Lynch, C., Taylor, L., Weinstein, A., Shelby, R., Schulze, K., Treacy, D. and Wintersgill, M. (2002). Minimally invasive tutoring of complex physics problem solving. In S. Cerri, G. Gouarderes & F. Paraguacu (Eds.) Intelligent Tutoring Systems: Sixth International Conference, ITS 2002, 367-376.
Scrutability as a core interface element Marek CZARKOWSKI, Judy KAY, Serena POTTS School of Information Technologies University of Sydney Sydney, NSW 2006, Australia Abstract
This paper describes a new approach to supporting scrutability of adaptation. Because our previous work indicated that people are unfamiliar with the whole notion that they might control the personalisation of an adaptive learning interface, we have experimented with a new approach: we make the scrutability support blatant and presented by default. We report a user trial indicating that although the users were unaccustomed to the notion that they might understand and control personalization, they did succeed in scrutinizing the adaptation. 1. Introduction We believe that it is important to be able to build personalized systems in such a way that learners can scrutinize the adaptation of a hypertext system to answer these questions: What has been adapted to me? What caused the adaptation I saw compared with that seen by a peer? How can I control or alter the adaptation? We have several motivations for this, as argued in detail elsewhere [1, 5, 6]. In previous work [1, 5, 6], we have been quite surprised at the difficulty of providing an adaptive hypertext interface that learners are able to scrutinize to answer the questions above. Part of the problem appears to be that people tend to be unaware of the fact that material has been personalized. Even if they realize this, they have difficulty appreciating that the personalization is driven by their student model. And even then, they have difficulty realizing that they can simply change their user model to effect changes in the personalization. Section 2 presents the user's view of the delivery interface. Section 3 describes the evaluation and Section 4 draws conclusions. 2. System Overview The system will be described based on a tutorial on UNIX file permissions. Figure 1 shows an example of the interface. Note that the right-hand part of the screen is devoted to the personalisation cells. This has the authentication details at the top, then the summary of the user model in text form and finally a cell linked to the student's model. In the figure, the details of the adaptations are displayed. If the user clicks hide adaptation, the screen changes to omit the background colouring and all the content that should be excluded disappears.
Figure 1: Page that has been adapted according to the user’s user model showing sections included highlighted in yellow and sections excluded in green
To see why an individual section has been included or excluded, the user holds their mouse over the section in question and a caption pops up to indicate the reason. The system has been built using a local, lightweight but highly adaptable web framework called Cellerator [2] in conjunction with Personislite [7]. Following the terminology of Brusilovsky [4], the system provides adaptive presentation and adaptive navigation. We have also drawn on many elements of the previous implementation of a scrutably adaptive hypertext [1, 5, 6]. The important difference is that this time we have experimented with making the user model present at all times. This should make the possibility of scrutiny of the personalisation more obvious to users. 3. Evaluation The system was used to assess whether participants could: appreciate that their profile caused the adaptation; determine what had been adapted to them; understand why it had been adapted; and change their user model to control the adaptation. The evaluation was undertaken in two stages. In the first stage, participants 1-5 were asked to answer the initial questionnaire as if they were a single fictitious user, Fred, as in [1]. By contrast, participants 6-9, in the second stage, answered for themselves. Users had to perform tasks that required them to use the system to scrutinise the adaptation, determine which attributes in their user model caused the adaptation, and finally change the values in their user model to change the adaptation.
All first-stage participants understood that their user profile would cause the adaptation of content within the system and were able to effectively change their user profile. Most were able to view the adaptation, though of those who could, the majority experienced difficulties using the mouseover function provided to see the reason for the adaptation. This is a significant improvement over the previous study [1]. The second stage indicated that in general participants found the system easy to use and were able to use all the functions provided. Only one appeared to experience trouble using the system. This appeared to be due to the fact that they chose to see the adaptation on every page when they filled in their user profile initially. 4. Discussion and Conclusions The purpose of a scrutable adaptive hypertext system is to give the user control, allowing them to understand and control the adaptation. The majority of participants could identify that their profile caused the adaptation, were able to see what had been adapted to them, understood why it had been adapted, and could change their user model, hence controlling the adaptation. This is a real step forward compared with our previous studies [1] with a more subtle interface. This study seems to indicate that if we want learners to scrutinize an adaptive hypertext learning environment, we may need the blatant, always-present reminder that adaptation is being performed, as we have done in the Personalisation cell at the right of the interface. 5. References
[1] Kay, J. and Czarkowski, M. (2003). How to give the user a sense of control over the personalization of AH? Workshop on Adaptive Hypermedia and Adaptive Web-Based Systems, User Modeling 2003 Session: p. 121-132.
[2] Kummerfeld, B. and Lauder, P., Cellerator, http://www.it.usyd.edu.au/~bob, accessed August 2003.
[3] Bicking, I., The Web Framework Shootout, http://www.python.org/pycon/papers/framework/web.html, accessed August 2003.
[4] Brusilovsky, P., Adaptive Hypermedia: From Intelligent Tutoring Systems to Web-Based Education. Intelligent Tutoring Systems 2000, 2000: p. 1-7.
[5] Czarkowski, M., An adaptive hypertext teaching system. Honours Thesis, Basser Dept of Computer Science, University of Sydney, 1998.
[6] Czarkowski, M. and Kay, J., Challenges of Scrutable Adaptivity. Proceedings of AIED Conference, 11th International Conference on Artificial Intelligence in Education, IOS Press, 2003: p. 404-407.
[7] Kay, J., Kummerfeld, B., and Lauder, P., Personis: A Server for User Models. Adaptive Hypermedia and Adaptive Web-Based Systems, AH 2002 Proceedings, 2002: p. 203-212.
DCE: A One-on-One Digital Classroom Environment Yi-Chan Deng, Sung-Bin Chang, Ben Chang*, Tak-Wai Chan* Dept. of Computer Science and Information Engineering, National Central University, Taiwan *Research Center for Science and Technology of Learning, National Central University, Taiwan
[email protected]
Abstract. This paper describes a platform that supports one-on-one digital learning. The platform is named "DCE", which stands for digital classroom environment. It consists of four major modules: a curriculum (including activities) module, a user data module, a communication module, and an external interface module. An application example of DCE is described and demonstrated.
1. Background A one-on-one (1:1) digital classroom environment or, simply, a 1:1 classroom refers to a classroom setting where every student is equipped with at least one computing device with wireless communication capability for individual or group learning, either inside or outside the classroom. A group of researchers envision that handy and portable computing devices with wireless support will be accessible to a significant proportion of K-12 learners in the forthcoming decade [1]. Researchers have been exploring how to use wireless and mobile devices to enhance the physical classroom. For example, the eClass system, formerly known as Classroom 2000, is one of the earliest digital classroom projects [2]. ClassTalk [3] focuses on improving the teacher's ability to pose questions to students' graphing calculators and to conduct class-wide discussion. Gay et al. [4] report on the impact of basic wireless networking in the classroom environment. Roschelle et al. [5] survey research evidence on classroom response and communication systems. DCE 3.0 inherits from two previous systems with different names, EduClick and WiTEC. EduClick is a simple wireless response system that can gather collective instant response data from individual students in the class. In experiments conducted with EduClick [6], it was found that there was an increase in student interest and attention, as well as in interactions between students and teachers in the classroom. Instead of using EduClick's simple remote response emitters, WiTEC, which was built on a revised architecture of EduClick, adopted the WebPad, the only computing device that provided a handwriting screen of considerable size before the TabletPC was first launched in 2002. Small group collaborative learning has been designed in WiTEC and trial tested in elementary classrooms [7]. A particular feature of WiTEC is that it provides a three-layer framework to support teachers in designing learning flows [8]. There has been some description of the architectures of EduClick [6] and WiTEC [7]. However, being an inherited system, DCE 3.0 is in an advantageous position for improving, extending and generalizing such a 1:1 digital classroom environment in this series of implementation efforts, and it will be used as a basis for some subsequent experiments [9]. DCE may also have the potential to impact web-based learning research and development. Currently, the hybrid model, that is, distance learning combined with the face-to-face classroom instructional model, may become the de facto model for network learning. This means that a system such as DCE will possibly be an inevitable component of most future learning management systems that have to incorporate both face-to-face and distance learning models. In this paper, we describe the design of the general architecture framework of DCE 3.0, in particular, the four major modules,
curriculum (including activities), user data, communication and external interface. An example application of DCE is described. 2. One-on-one Digital Classroom Environment The general framework of the 1:1 digital classroom environment is represented as a communication triangle that plays the orchestrating role (see Figure 1). There are three circles at the three points of the communication triangle. The circle at the upper point is the user circle, which includes the teacher and students. The outer ring of the user circle is the external interface. Teacher and students perform a learning task through the external interface and the communication triangle. When students are performing a learning task, the external interface collects user data logs. The circle at the lower left point is the user data circle. The user data repository is the source for building a student model if the system intends to provide intelligent support for both the students and the teacher. The right-hand circle is the curriculum circle, which coordinates tasks and materials to build a curriculum tree [10] for learning.
Figure 1 General architecture framework of digital classroom environment
Figure 2 Five levels of curriculum tree
Curriculum module: The curriculum module has five levels, namely, the curriculum level, unit level, task level, episode level, and element level. Figure 2 shows the relation between the levels. Communication module: Students in the 1:1 classroom interact via computer-mediated communication. The communication module manages intra-classroom communication and inter-classroom coordination. The inter-classroom coordination of DCE is prepared for supporting inter-classroom learning activities in the future. User data module: The user data module is the data centre for student learning records. In the 1:1 classroom, each student is equipped with at least one device, so the learning process can be recorded in student logs. For example, these logs include the access of the curriculum tree, units, tasks, episodes and elements, the outcomes of tasks, and so on. External interface: The external interface handles the user interface and the peripherals of the learning devices. For example, a peripheral can be a digital camera, a digital microscope, probes, or other computer-embedded tangible objects. This allows the digital learning environment to be extended easily and naturally. The learning devices also provide the computing affordances to enhance interactions.
3. Application examples This section describes an example of a graduate-level course on "The Trend of Digital Learning Development". Figure 3 shows the unit-level curriculum tree of this course (Figure 3: Curriculum tree of the trend of digital learning development). There are 6 units in the tree. The task-level description of the asking a good question (AGQ) model [9] is presented in the left column of Table 1. First is the task name, second is the abstract of this task, third are the pedagogy rationales, and last is a set of episode lists. In this case, there are eight episodes in the AGQ task. The first argument of the Composing episode is Q&A, which stands for the composing outcome, and the second argument stands for the interactive mode (Ind in this case, i.e. an individual activity); the episode-level description of the Composing episode is presented in the right column of Table 1.
second argument stands for interactive-mode which, in this case, is an individual action. The episode level description of composing episode is presented in the table 1 right column. This episode includes episode name, general description, pedagogy rationales, and a set of element lists. Table 1 Levels descriptions of AGQ model taskName: AGQ episodeName: Composing description: description: z Teacher-led presentation of learning material z Self-study of learning material and individual Q&A generation z Q&A assessment z Small group formation and conflict resolution z Teacher-led class-wide discussion.
pedagogyRationales: AGQ is a model of student question generation that engaging students in a challenging learning activity that potentially involves higher-level cognitive processing operations.
episodeList: 1. 2. 3. 4. 5. 6. 7. 8.
Reading(Null, Ind, PaperNo) Composing(Q&A, Ind) PeerAssessment(AssSheet, PTraG) Composing(Q&A, TraG) PeerAssessment (AssSheet, G2G) Composing(Q&A, S2S) Quiz(Grade, Ind) Summary(Null, Cls)
z Each student composes a question and a corresponding answer with guided stems and a specific question category z The question can be multiple choice or short answer open question z The student assesses his/her own question with a set of rubrics
pedagogyRationales: z Questioning can help students find out what the important part in the reading materials is z The questions will be higher level with the guided stems and the specific question category z The process of self-assessment can help students reflect on their own questions
elementList: 1. 2. 3.
guidedStem(); composingProduct(); selfAssessment();
4. Summary and future extension In this paper, we describe the design and implementation of the architecture framework of DCE 3.0, in particular, the four major modules: curriculum module, user data module, communication modules, and external interface module. An example application of DCE is described. The future extension of DCE includes student model of user data module, script engine of communication module, and multi-sensor of external interface. More subsequent experiments using DCE 3.0 platform will be conducted in this coming year. References [1] Global network of collaborative researchers on 1:1 educational computing. http://www.g1on1.org, March 2004. [2] Abowd, G.D. (1999). Classroom 2000: an experiment with the instrumentation of a living educational environment. IBM Systems Journal, 38(4), 508-530. [3] Dufresne, R.J., Gerace, W.J., Leonard, W.J., Mestre, J.P. & Wenk, L. (1996). Classtalk: A classroom communication system for active learning. Journal of Computing in Higher Education, 7, 3-47. [4] Gay, G., Stefanone, M., Grace-Martin, M. & Hembrooke, H. (2001). The effects of wireless computing in collaborative learning environments. International Journal of Human-Computer Interaction, 13(2), 257-276. [5] Roschelle, J., Penuel, W.R. & Abrahamson, L.A. (2004). The networked classroom. Educational Leadership, 61(5), 50-54. [6] Huang, C.W., Liang, J.K. & Wang, H.Y. (2001). EduClick: A Computer-supported Formative Evaluation System with Wireless Devices in Ordinary Classroom. In Proceedings of ICCE 2001, 1462-1469. [7] Liu, T.C., Wang, H.Y., Liang, J.K., Chan, T.W., Ko H.W. & Yang, J.C. (2003). Wireless and mobile technologies to enhance teaching and learning. Journal of Computer Assisted Learning, 19(3), 371-382. [8] Wang, H.Y., Liu, T.C., Chou, C.Y., Liang, J.K., Chan, T.W. & Stephen Yang. (2004). A Framework of Three Learning Activity Levels for Enhancing the Usability and Feasibility of Wireless Learning Environments. Journal of Educational Computing Research, 30 (4), 309-329. [9] Chang, S.B., Tung, K.J., Huang, H.M. & Chan, T.W. (2005). AGQ: A Model of Student Question Generation Supported by One-on-One Educational Computing, In the Proceedings of CSCL2005. (Accepted) [10] Chen, Y.H., Wu, Y.T., Ku Y.M., Wu J.F. & Chou Y.C. (2005). Curriculum tree: A management system to support curriculum and learning activities, Unpublished paper submitted to the 5th IEEE International Conference on Advanced Learning Technologies (ICALT 2005).
Contexts in Educational Topic Maps Christo DICHEV and Darina DICHEVA Winston-Salem State University, 601 M.L.K. Jr. Dr., Winston Salem, N.C. 27110, USA Abstract. This paper explores the idea of using contexts to support more efficient information search in Topic Maps-based digital libraries. The notion of context is perceived as an abstraction of a grouping of domain concepts and resources based on the existing semantic relationships between them. The proposed model of context is used for context representation in the TM4L environment.
Introduction There is already a large amount of high-quality learning resources on the web, and they should be made more accessible to users. In this paper we explore the idea of using contexts to support more efficient information search. We propose to define contexts as abstractions of clusters of domain concepts and resources based on the existing relationships between them. This is related to our previous work on contexts as well as the development of a framework of concept-based digital course libraries [1]. The framework is based on using the new Semantic Web technology Topic Maps (TM) [2], which enables users to navigate and access documents in an organized manner. In the topic map paradigm the scope feature defines the extent of validity of an assertion: the context in which a topic name or an occurrence is assigned to a given topic, or in which topics are related through associations. Thus, when thinking of representing contexts in TM, a quick, straightforward answer would be to use topic map scoping. In the TM standard a scope is a set of themes (of validity). Themes can be defined and applied to objects (topic names, resources, and associations). Obviously a scope can be used to represent a context; however, this would be a rather static view. Independently of the standard, we propose using TM associations to represent context as grouping. Topic map associations can be interpreted as statements relating topics. For instance, in the case of educational applications, it is possible to express the statement that a given concept is represented in a particular learning object (e.g. tutorial, definition, etc.) in the form: topic X is represented by tutorial Y (in a particular syntactic form). Similarly, associations such as Prolog is based on Resolution, Prolog refers to Horn-Clause Logic, and Prolog applies Backtracking make the topic Prolog pertinent to the related topics. Obviously, association types combined with role types enable a meaningful grouping of topics that we call context. Formally, context can be defined as a collection of statements that are true in a model. From a less formal perspective, context can be interpreted as the things which surround, and give meaning to, something else. The statement "Snow is white" is meaningful if we talk about New Year in Alaska, but has no meaning in terms of CPU scheduling. We can view contexts as a means of grouping facts relevant to a particular situation. Grouping and classification of objects is a human invention to simplify communication. For our purpose we take a restricted model of this view of context, namely, as a grouping of topics based on their relations to a given topic. Translated into TM terminology, a context can be defined as a collection of associations related to a common topic selected to represent and name the context. Technically, this is a nested TM drawn around a topic chosen to name the context.
1. Context as grouping Most works related to formalizing context are centered around the so-called "box model", where "Each box has its own laws and draws a sort of boundary between what is in and out" [3], [4]. The problem with this approach is that we have to predefine all potentially needed "boxes" in order to use them. The world is too unpredictable to foresee the complete set of contexts that might be needed. Rather than preparing a set of static boxes, we suggest using a TM model that allows shifting the boundaries of the context dynamically based on the current topic. The proposed interpretation of context as a collection of topics surrounding a given topic (denoting the context) is intended to localize the search and the inference within an area of relevant topics. It allows us to introduce a measure of relevancy. The interpretation of what the surrounding topics are is relative: at one point a topic can be part of the surrounding collection and at another point it can be viewed as surrounded by some other topics giving meaning to it. The relationships are at the heart of semantics, lending meaning to concepts and resources linked to them. The basic assumptions underlying the proposed contextual framework include:
• Each context is a collection of topics related to a certain topic of the topic map that plays the role of a focus or center of the context.
• The central topic is unique and can be used to name the context.
• All semantically related topics identify regions formed by the topics directly or indirectly related to the center of the context.
• The relevance of a topic to the current context is inversely proportional to its distance to the focus of the context.
According to the last assumption, the topics of a collection forming a context do not have equal status with respect to that context. Their role in the context depends on the distance to the central topic. For each topic, the context maps that topic to a collection of topics whose degree of membership in the context depends on their level of relevancy. Among the valuable features of this context model is that it provides a mechanism to refer to the current context and use it to identify an area of interest within the TM. This implies that searching for relevant information can be localized to a specified area of interest.
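A minimal sketch of this notion of context: starting from a central topic, associated topics are collected breadth-first and weighted by a relevance that decreases with association distance. The tiny topic map (based on the Prolog example in the Introduction, with two extra topics added) and the 1/(1+distance) weighting are illustrative choices of ours, not the TM4L implementation.

```python
from collections import deque

def context(assoc, center, max_distance=2):
    """Return {topic: relevance} for topics reachable from the central topic.
    Relevance is inversely proportional to association distance."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        t = queue.popleft()
        if dist[t] == max_distance:
            continue
        for neighbour in assoc.get(t, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[t] + 1
                queue.append(neighbour)
    return {t: 1 / (1 + d) for t, d in dist.items()}

# Tiny illustrative topic map: associations as an adjacency list.
assoc = {
    "Prolog": ["Resolution", "Horn-Clause Logic", "Backtracking"],
    "Resolution": ["Prolog", "Unification"],
    "Horn-Clause Logic": ["Prolog"],
    "Backtracking": ["Prolog", "Search"],
}
ctx = context(assoc, "Prolog")
# Rank topics (e.g. search results) by decreasing relevance to the context.
for topic, relevance in sorted(ctx.items(), key=lambda item: -item[1]):
    print(f"{topic}: {relevance:.2f}")
```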
2. Context: a minimal set of generic relations Learning content typically embodies related topics that are hard to present in conventional hierarchical structures. Thus we focus on a model for expressing a broader class of relationships on contextual structures. Our idea was to define a minimal set of generic relations which covers the needs of the intended applications. The advantage of such an approach is that generic relations subsume particular instances that might be impossible to articulate in specific terms. Our proposed minimal set of generic relations appropriate for e-learning applications is based on guiding principles including: (1) Simplicity: simpler is better, other things being equal, and (2) Scope: a broader scope is better because it subsumes the narrower ones. We propose the following relations:
• Part-whole – a transitive relation that characterizes the compositional structure of an object. It is intended to capture in a generic sense structural information that subsumes transitive relations of the type X is part-of Y, X is member-of Y, X is portion-of Y, X is area-of Y, X is feature-of Y, etc.
• Relevant-to – represents a family of asymmetric, not necessarily transitive relations. It is intended to capture in a generic sense asymmetric relations of the type X is related-to Y, X is used-by Y, X refers-to Y, X points-out-to Y, etc.
• Similar-to – describes relations with symmetric roles assigned to the two role players. It is intended to capture in a generic sense symmetric relations of the "co-refers" type (X is analogous-to Y, X co-mentions Y, X is-of-the-complexity-level-of Y, X is compatible-with Y, X is-matching Y).
We extend this set with the conventional superclass-subclass and class-instance relations. The basic intuition is that the five relations superclass-subclass, class-instance, part-whole, relevant-to and similar-to represent a sufficient basis of generic relations for e-learning applications. They can be used as a generic grouping of concepts and resources that might otherwise be difficult to articulate. The proposed set of relations also provides a strategy for organizing the information. It supports a shared way of grouping topics by standardizing the set of relations used. The intended application of context in our framework includes the following aspects:
• Identifying an area of interest for more reliable and accurate interpretation of search requests.
• Providing a method for ranking the search results by relevance.
• Providing a framework for topic map visualization.
Context has the potential for enhancing the focus and precision of the search. Situating topics contextually provides additional information derivable from the distance between topics. Thus, search results can be listed with decreasing relevance to the search topics.
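To make the formal properties of these generic relations concrete, the following sketch encodes whether each relation is transitive or symmetric and expands a small set of asserted associations with the pairs those properties imply. The property table and the example statements are our own illustration, not part of the Topic Maps standard or of TM4L.

```python
# Declared properties of the five generic relations (illustrative encoding).
RELATIONS = {
    "superclass-subclass": {"transitive": True,  "symmetric": False},
    "class-instance":      {"transitive": False, "symmetric": False},
    "part-whole":          {"transitive": True,  "symmetric": False},
    "relevant-to":         {"transitive": False, "symmetric": False},
    "similar-to":          {"transitive": False, "symmetric": True},
}

def expand(statements):
    """Add the associations implied by symmetry and transitivity of each relation."""
    facts = set(statements)
    changed = True
    while changed:
        changed = False
        for x, rel, y in list(facts):
            if RELATIONS[rel]["symmetric"] and (y, rel, x) not in facts:
                facts.add((y, rel, x)); changed = True
            if RELATIONS[rel]["transitive"]:
                for a, r2, b in list(facts):
                    if r2 == rel and a == y and (x, rel, b) not in facts:
                        facts.add((x, rel, b)); changed = True
    return facts

# Hypothetical asserted associations (x, relation, y), meaning "x relation y".
statements = [("unification", "part-whole", "resolution"),
              ("resolution", "part-whole", "prolog-engine"),
              ("prolog", "similar-to", "datalog")]
for fact in sorted(expand(statements)):
    print(fact)
```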
4. Conclusion Efficient information retrieval requires information filtering and search adaptation to the user's current needs, interests, knowledge level, etc. The notion of context is very relevant to this issue. In this paper we propose an approach to context modeling and its use in Topic Maps-based educational applications. It is based on the standard Topic Maps support for associations and defines the context as an abstraction of a grouping of related information. This context model provides a mechanism for referring to the current context and using it to identify a current area of interest within the topic map. The latter is useful for localizing a search for relevant information within the current area of interest. We have used the proposed model of context in the design of TM4L, an e-learning environment aimed at supporting the development of efficiently searchable, reusable, and interchangeable discipline-specific repositories of learning objects on the Web [5].
Acknowledgement This material is based upon work supported partly by the National Science Foundation under Grant No. DUE-0333069 "NSDL: Towards Reusable and Shareable Courseware: Topic Maps-Based Digital Libraries" and DUE-0442702 "CCLI-EMD: Topic Maps-based Courseware to Support Undergraduate Computer Science Courses."
References
1. Dicheva, D., Dichev, C.: A Framework for Concept-Based Digital Course Libraries. Journal of Interactive Learning Research, 15(4) (2004) 347-364.
2. Biezunski, M., Bryan, M., Newcomb, S.: ISO/IEC 13250:2000 Topic Maps: Information Technology, www.y12.doe.gov/sgml/sc34/document/0129.pdf. [Last viewed December 5, 2004].
3. Giunchiglia, F.: Contextual reasoning. Epistemologia, Special issue on I Linguaggi e le Macchine, XVI (1993) 345-364.
4. McCarthy, J.: Generality in Artificial Intelligence. Communications of the ACM 30(12) (1987) 1030-1035.
5. The TM4L Project, http://www.wssu.edu/iis/NSDL/index.html
Analyzing Computer Mediated and Face-to-Face Interactions: Implications for Active Support Wouter van DIGGELEN, Maarten OVERDIJK and Jerry ANDRIESSEN Department of Educational Sciences, Utrecht University Heidelberglaan 1, 3584 CS Utrecht, The Netherlands Abstract: In this paper, we argue that co-constructive activities of learners are not solely confined to the problem at hand or to the process of collaboration, but that they are also directed at the pre-defined structural features of the tool. Although these features are defined in advance, they become meaningful in interaction. Learners re-construct these structures in activity through a process of appropriation during which 'new' structures emerge that guide the collaborative learning activities. It is hypothesized that the learner-external artefact interactions may be an additional source for modelling problem-solving discussions. We will support our line of reasoning with observations of computer-mediated interactions from a study that we carried out in a real-life classroom setting.
1. Introduction
A wide variety of computer tools has been developed to support problem-solving discussions inside and outside the classroom. These tools mediate the interactions between learners and may provide them with active – 'intelligent' – support. This support may take two forms [1, 3, 5]. First, the computer tools that mediate communication provide the users with predefined structures that direct their actions and interactions. This pre-structured support concerns, for example, the organisation of the discussion (e.g. turn-taking or simultaneous access), communicative acts (e.g. a notation system) or the accessibility of information sources (e.g. anonymity of users). A second kind of – 'intelligent' – support concentrates on the management of collaborative activities. This active, 'on the spot' support provides users with current information about their – individual or group – performance, which they may use to adjust subsequent actions. This kind of support is based on a system that collects interaction data and transforms that data into interaction models that are presented to the learner, directly or after a comparison with a desired state of interaction [3]. A situated perspective on cognition seems promising for developing valid interaction models for active support. The situated approach implies that one should focus on significant features of learner-environment interaction that are turned and oriented towards adaptive action [6]. Most interaction models for active support focus on just one part of the learner-environment relation, i.e. the interactions with other learners. We hypothesize that the interaction between the learner and external artefacts – e.g. the computer tool that mediates their interactions – could be another indicator of effective collaborative problem solving. At least, this form of interaction is not as static as is often assumed.
In line with structuration theory [2] we state that the structural features of the tool are reconstructed in activity through a process of appropriation [4]. In this process of appropriation ‘new’ structures emerge that guide the collaborative learning activities. 2. Research We studied the process of appropriation in a real-life classroom experiment where groups of three students argued about a claim. Students could communicate face-to-face and with the support of the Digalo tool1. The Digalo tool provides its users with a shared workspace based on a concept-mapping interface (figure 1). Users can put forward contributions simultaneously into a shared workspace, using a predefined notation system. Users can also relate associated contributions by using links. When users discuss in the shared workspace they collaboratively construct an – argumentative – diagram of their discussion.
Figure 1. User interface of the Digalo tool
3. Interactions between the learner and the external artefact
The interactions of learners that are directed towards the external artefact may provide significant input for modelling problem-solving discussions. The learner-external artefact relation can be characterised – just as the learner-learner relation – as mutually constitutive. When learners interact, they appropriate the structural features of the tool. The structures that emerge during their interaction influence subsequent interactions. We studied this process of appropriation as it took place in the Digalo environment. We focused our analysis on the structures that users apply to relate their contributions. The structures that emerge when students interact with the tool seem crucial, because they enable the students to construct a visual representation of their discussion. We distinguish three principles for relating and organising contributions in the shared workspace of the Digalo:
1. Users can relate contributions by selecting the same contribution type (e.g. 'argument in favour'). These contributions are recognised visually by their shape and colour. This organizing principle is the most compelling because all statements in the Digalo are associated with a contribution type.
2. Users can draw a line – i.e. a link – between two associated contributions.
3. Users can spatially group associated contributions.
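Digalo's internal representation is not described in this paper; purely as an illustration, the sketch below shows one way the three organizing principles could be captured in a simple data structure (all class and field names are hypothetical).

```python
from dataclasses import dataclass, field

@dataclass
class Contribution:
    # Every statement carries a contribution type (principle 1),
    # e.g. "claim", "argument in favour", "argument against".
    cid: int
    text: str
    ctype: str
    position: tuple = (0.0, 0.0)   # x, y coordinates in the shared workspace

@dataclass
class DiscussionMap:
    contributions: dict = field(default_factory=dict)   # cid -> Contribution
    links: set = field(default_factory=set)             # (cid, cid) pairs (principle 2)
    groups: list = field(default_factory=list)          # spatially grouped cids (principle 3)

    def add(self, contribution):
        self.contributions[contribution.cid] = contribution

    def link(self, a, b):
        # Principle 2: an explicit line between two associated contributions.
        self.links.add((a, b))

    def by_type(self, ctype):
        # Principle 1: contributions related implicitly by sharing a type.
        return [c for c in self.contributions.values() if c.ctype == ctype]

# Example: a tiny map with one claim and one supporting argument.
m = DiscussionMap()
m.add(Contribution(1, "School uniforms should be mandatory", "claim"))
m.add(Contribution(2, "They reduce peer pressure", "argument in favour"))
m.link(2, 1)
print(len(m.by_type("argument in favour")))  # -> 1
```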
Students could freely apply the three organizing principles. This led to a diversity of argumentative representations. Some groups came up with a rigid, structured representation while other groups constructed a more unstructured, complex map that may even be judged as chaotic by outsiders. The more rigid, structured maps can be characterised by one leading pattern. However, the leading pattern differed between groups. We observed, for example, maps reflecting the temporal order of the discussion and maps that emphasize the opposing standpoints with regard to the claim. These structured maps are characterized by the fact that all organizing principles strengthen the representation of one leading pattern. The more unstructured, complex maps lack a clear leading pattern. The students that constructed those maps mainly used the two organizing principles 'same contribution type' and 'link between two associated contributions'. They did not use the third organizing principle – the spatial grouping of contributions – which made these maps more complex. This became even more apparent when the number of contributions increased. Students' autonomy or freedom of action leads to a diversity of constructed diagrams. Organizing a diagram seems to be a process of – implicit – negotiation that can have multiple outcomes. The time students spent organizing their discussion indicates that the activity of making sense of a discussion is as important as expressing the ideas in words. A crucial step towards future development of active support lies in examining how individual actions performed in a system relate to social activities, such as argumentation, negotiation and problem solving. Our research implies that the development of interaction models requires a broad analysis of learner-environment interactions in order to understand how individual actions and group interactions constitute the process of argumentation, negotiation or problem solving in computer mediated groups.
The tool described in this paper has been developed in the DUNES project. The DUNES project (IST-2001-34153) is partially funded by the European Commission under the IST Programme of Framework V.
References
[1] Barros, B. and Verdejo, M.F. (2000). Analysing student interaction processes in order to improve collaboration: The DEGREE approach. International Journal of Artificial Intelligence in Education, 11: 221-241.
[2] Giddens, A. (1986). The Constitution of Society: Outline of the Theory of Structuration. Polity Press, Cambridge.
[3] Jermann, P., Soller, A. and Muehlenbrock, M. (2001). From mirroring to guiding: A review of state of the art technology for supporting collaborative learning. Proceedings of the First European Conference on Computer-Supported Collaborative Learning, Maastricht, The Netherlands.
[4] Poole, M.S., Seibold, D.R. and McPhee, R.D. (1996). The Structuration of Group Decision. In R.Y. Hirokawa and M.S. Poole (Eds), Communication and Group Decision Making. London: Sage.
[5] Reimann, P. (2003). How to support groups in learning: More than problem solving. Proceedings of Artificial Intelligence in Education 2003, Sydney, Australia.
[6] Semin, G.R. and Smith, E.R. (2002). Interfaces of social psychology with situated and embodied cognition. Cognitive Systems Research, 3: 385-396.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Adding a Reflective Layer to a Simulation-Based Learning Environment
D. Chesher et al.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Positive and negative verbal feedback for Intelligent Tutoring Systems
Barbara Di Eugenio, Xin Lu, Trina C. Kershaw, Andrew Corrigan-Halpern and Stellan Ohlsson
University of Illinois, Chicago, USA
Abstract. We built three different versions of an ITS on a letter pattern extrapolation task: in one version, students only receive color-coded feedback; in the second, they receive verbal feedback messages when they perform correct actions, and in the third, when they make a mistake. We found that time on task and number of errors are predictive of performance on the post-test rather than the type of feedback. Keywords. Intelligent Tutoring Systems. Natural Language feedback.
1. Introduction and motivation Research on the next generation of Intelligent Tutoring Systems (ITSs) [2,3,4] explores Natural Language (NL) as one of the keys to bridge the gap between current ITSs and human tutors. In this paper, we describe an experiment that explores the effect of simple verbal feedback that students receive either when they perform a correct step or when they make a mistake. We built three different versions of an ITS that tutors students on extrapolating a complex letter pattern [7], such as inferring MEFMGHM from MABMCDM. In the neutral version of the ITS the only feedback students receive is via color coding, green for correct, red for incorrect; in the positive version, they receive feedback via the same color coding, and verbal feedback on correct responses only; in the negative version, they receive feedback via the same color coding, and verbal feedback on incorrect responses only. In a between-subject experiment we found that, even if students in the verbal conditions do perform slightly better and make fewer mistakes, these differences are not significant. Rather, it is time on task and number of errors that are predictive of performance on the post-test. This work is motivated by two lines of theoretical inquiry, one on the role of feedback in learning [1], the other, on what distinguishes expert from novice tutors [8]. In another experiment in the letter pattern domain, subjects were individually tutored by three different tutors, one of which had years of experience as a professional tutor. Subjects who were tutored by the expert tutor did significantly better on one of the two problems in the post-test, the more complex one. The content of the verbal messages in our ITSs is based on a preliminary analysis of the language used by the expert tutor. 1 Correspondence to: B. Di Eugenio, Computer Science (M/C 152), University of Illinois, 851 S. Morgan St., Chicago, IL, 60607, USA. Email:
[email protected].
Figure 1. The negative ITS, which provides verbal feedback on mistakes
2. Method and Results Our three ITSs are model-tracing tutors, built by means of the Tutoring Development Kit [6]. Fig. 1 shows the interface common to all three ITSs. The Example Pattern row presents the pattern that needs to be extrapolated; the A New Pattern row is used to enter the answer – the first cell of this row is filled automatically with the letter the extrapolation must start from; the Identify Chunks row can be used to identify chunks, as a way of parsing the pattern. If seen in color, Fig. 1 also shows that when the subject inputs a correct letter, it turns green (H, F), and when the subject makes a mistake, the letter turns red (C). We ran a between-subjects study in which each group of subjects (positive [N = 33], negative [N = 36], and neutral [N = 37]) interacts with one version of the system. All subjects first received instructions about how to interact with the ITS. The positive and negative groups were not informed of the feedback messages they would receive. All subjects trained on the same 13, progressively more difficult, problems, and then received the same post-test consisting of 2 patterns, each 15 letters long. Subjects see the same pattern for 10 trials, but must continue the pattern starting with a different letter each time. Post-test performance is the total number of letters that subjects enter correctly across the 20 trials (a perfect score is 300).
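The tutors' internal task model is not given here; as an illustration of the letter-pattern domain only, the sketch below represents a pattern of the MABMCDM family as a periodic template and continues it from a given start letter. The representation and function names are ours, not the system's; the start letter parameter mirrors the requirement that subjects continue the same pattern from a different letter on each trial.

```python
import string

def extrapolate(period, start_letter, length):
    """Continue a periodic letter pattern.

    period: list of operations, e.g. ["M", "next", "next"] meaning
            a constant 'M' followed by two successive alphabet letters.
    start_letter: letter from which the continuation starts counting.
    length: number of letters to produce.
    """
    alphabet = string.ascii_uppercase
    pointer = alphabet.index(start_letter)
    out = []
    i = 0
    while len(out) < length:
        op = period[i % len(period)]
        if op == "next":
            out.append(alphabet[pointer % 26])
            pointer += 1
        else:                       # a constant letter such as 'M'
            out.append(op)
        i += 1
    return "".join(out)

# MABMCDM ends with the pair C, D; continuing from 'E' reproduces MEFMGHM.
print(extrapolate(["M", "next", "next"], "E", 7))  # -> MEFMGHM
```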
            Post-test score   Time    Errors
Positive         154.06       42.68   18.91
Negative         141.83       45.52   14.69
Neutral          134.62       42.02   21.89
Table 1. Means for the three groups
Means for each condition on post-test scores, time spent in training, and number of errors are shown in Table 1. Subjects in the two verbal conditions did slightly better on the post-test than subjects that did not receive any verbal feedback, and they made fewer mistakes. Further, subjects in the positive condition did slightly better than subjects in the negative condition on the post-test, although subjects in the negative condition made fewer mistakes. However, none of these differences is significant.
A linear regression analysis was performed with post-test scores as the dependent variable and condition, time spent in training, and number of errors as the predictors. The overall model was significant, R2 = .16, F (3, 102) = 6.52, p < .05. Time spent in training (β = −.24, t(104) = −2.51, p < .05) and number of errors (β = −.24, t(104) = −2.53, p < .05) were significant predictors, but condition was not a significant predictor (β = −.12, t(104) = −2.53, p > .05). Hence, we can explain variation in the post-test scores via individual factors rather than by feedback condition. The more time spent on training and the higher number of errors, the worse the performance. However, it would be premature to conclude that verbal feedback does not help, since there may be various reasons why it was not effective in our case. First, students may have not really read the feedback, especially in the positive condition in which it may sound repetitive after some training [5]. Second, the feedback may not be sophisticated enough. In the project DIAG-NLP [2] we compared three different versions of an ITS that teaches troubleshooting skills, and found that the version that produces the best language significantly improves learning. The next step in the letter pattern project is indeed to produce more sophisticated language, that will be based on a formal analysis of the dialogues by the expert tutor. On the other hand, it may well be the case that individual differences among subjects are more predictive of performance on this task than type of feedback. We will therefore also explore how to link the student model with the feedback generation module. Acknowledgments. This work is supported by awards CRB S02 and CRB S03 from UIC, and by grant N00014-00-1-0640 from the Office of Naval Research.
References [1] A. Corrigan-Halpern and S. Ohlsson. Feedback effects in the acquisition of a hierarchical skill. In Proceedings of the 24th Annual Conference of the Cognitive Science Society, 2002. [2] B. Di Eugenio, D. Fossati, D. Yu, S. Haller, and M. Glass. Natural language generation for intelligent tutoring systems: a case study. In AIED 2005, the 12th International Conference on Artificial Intelligence in Education, 2005. [3] M. W. Evens, J. Spitkovsky, P. Boyle, J. A. Michael, and A. A. Rovick. Synthesizing tutorial dialogues. In Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society, pages 137–140, 1993. [4] A.C. Graesser, N. Person, Z. Lu, M.G. Jeon, and B. McDaniel. Learning while holding a conversation with a computer. In L. PytlikZillig, M. Bodvarsson, and R. Brunin, editors, Technology-based education: Bringing researchers and practitioners together. Information Age Publishing, 2005. [5] Trude Heift. Error-specific and individualized feedback in a web-based language tutoring system: Do they read it? ReCALL Journal, 13(2):129–142, 2001. [6] Kenneth R. Koedinger, Vincent Aleven, and Neil T. Heffernan. Toward a rapid development environment for cognitive tutors. In 12th Annual Conference on Behavior Representation in Modeling and Simulation, 2003. [7] K. Kotovsky and H. Simon. Empirical tests of a theory of human acquisition of informationprocessing analysis. British Journal of Psychology, 61:243–257, 1973. [8] S. Ohlsson, B. Di Eugenio, A. Corrigan-Halpern, X. Lu, and M. Glass. Explanatory content and multi-turn dialogues in tutoring. In 25th Annual Conference of the Cognitive Science Society, 2003.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Domain-Knowledge Manipulation for Dialogue-Adaptive Hinting
Armin Fiedler and Dimitra Tsovaltzi
Department of Computer Science, Saarland University, P.O. Box 15 11 50, D-66041 Saarbrücken, Germany
1. Introduction
Empirical evidence has shown that natural language (NL) dialogue capabilities are a crucial factor in making human explanations effective [6]. Moreover, the use of teaching strategies is an important ingredient of intelligent tutoring systems. Such strategies, normally called dialectic or socratic, have been demonstrated to be superior to pure explanations, especially regarding their long-term effects [8]. Consequently, an increasing though still limited number of state-of-the-art tutoring systems use NL interaction and automatic teaching strategies, including some notion of hints (e.g., [3,7,5]). On the whole, these models of hints are somewhat limited in capturing their various underlying functions explicitly and relating them to the domain knowledge dynamically. Our approach is oriented towards integrating hinting in NL dialogue systems [11]. We investigate tutoring proofs in mathematics in a system where domain knowledge, dialogue capabilities, and tutorial phenomena can be clearly identified and intertwined for the automation of tutoring [1]. We aim at modelling a socratic teaching strategy, which allows us to manipulate aspects of learning within NL dialogue interaction, such as helping the student build a deeper understanding of the domain, eliminating cognitive load, promoting schema acquisition, and manipulating motivation levels [13,4,12]. In contrast to most existing tutorial systems, we make use of a specialised domain reasoner [9]. This design enables detailed reasoning about the student's action and elaborate system feedback [2].
Our aim is to dynamically produce hints that fit the needs of the student with regard to the particular proof. Thus, we cannot restrict ourselves to a repertoire of static hints that associates a student answer with a particular response by the system. We developed a multi-dimensional hint taxonomy where each dimension defines a decision point for the associated cognitive function [10]. The domain knowledge can be structured and manipulated for tutoring decision purposes and generation considerations within a tutorial manager. Hint categories abstract from the specific domain information and the way it is used in the tutoring, so that the domain can be replaced by other domains. Thus, the teaching strategy and the pedagogical core of the tutorial manager can be retained for different domains. More importantly, the discourse management aspects of the dialogue manager can be independently manipulated.
2. Hint Dimensions
Our hint taxonomy [10] was derived with regard to the underlying function of a hint, which can be common to different NL realisations. This function is mainly responsible for the educational effect of hints. To capture all the functions of a hint, which ultimately aim at eliciting the relevant inference step in a given situation, we define four dimensions of hints: The domain knowledge dimension captures the needs of the domain, distinguishing different anchoring points for skill acquisition in problem solving. The inferential
role dimension captures whether the anchoring points are addressed from the inference per se, or through some control on top of it for conceptual hints. The elicitation status dimension distinguishes between information being elicited and degrees to which information is provided. The problem referential perspective dimension distinguishes between views on discovering an inference (i.e., conceptual, functional and pragmatic).
In our domain, we defined the inter-relations between mathematical concepts as well as between concepts and inference rules, which are used in proving [2]. These concepts and relations can be used in tutoring by making the relation of the used concept to the required concept obvious. The student benefits in two ways. First, she obtains a better grasp of the domain for making future reference (implicitly or explicitly) on her own. Second, she is pointed to the correct answer, which she can then derive herself. This derivation process, which we do not track but reinforce, is a strong point of implicit learning, with the main characteristic of being learner-specific by its nature. We call the central concepts which facilitate such learning and the building of schemata around them anchoring points. The anchoring points aim at promoting the acquisition of some basic structure, called schema, which can be applied to different problem situations [13]. We define the following anchoring points: a domain relation, that is, a relation between mathematical concepts; a domain object, that is, a mathematical entity, which is in the focus of the current proof step; the inference rule that justifies the current proof step; the substitution needed to apply the inference rule; the proof step as a whole, that is, the premises, the conclusion and the applied inference rule.
3. Structuring the Domain
Our general evaluation of the student input relevant to the task, the domain contribution, is defined based on the concept of expected proof steps, that is, valid proof steps according to some formal proof. In order to avoid imposing a particular solution and to allow the student to follow her preferred line of reasoning, we use the theorem prover ΩMEGA [9] to test whether the student's contribution matches an expected proof step. Thus, we try to allow for otherwise intractable ways of learning. By comparing the domain contribution with the expected proof step we first obtain an overall assessment of the student input in terms of generic evaluation categories, such as correct, wrong, and partially correct answers. Second, for the partially correct answers, we track abstractly defined domain knowledge that is useful for tutoring in general and applied in this domain. To this end, we defined a domain ontology of concepts, which can serve as anchoring points for learning proving, or which reinforce the defined anchoring points. Example concepts are the most relevant concept for an inference step, that is, the major concept being manipulated, and its subordinate concept, that is, the second most relevant concept. Both the domain contribution category and the domain ontology constitute a basis for the choice of the hint category that assists the student at the particular state in the proof and in the tutoring session according to a socratic teaching model [10].
4. Using the Domain Ontology
Structured domain knowledge is crucial for the adaptivity of hinting. The role it plays is twofold. First, it influences the choice of the appropriate hint category by a socratic tutoring strategy [2].
Second, it determines the content of the hint to be generated. The input to the socratic algorithm, which chooses the appropriate hint category to be produced, is given by the so-called hinting session status (HSS), a collection of parameters that cover the student modelling necessary for our purposes. The HSS is only concerned with the current hinting session and not with inter-session modelling, and thus does not represent whether the student recalls any domain knowledge between sessions. Special fields are defined for representing the domain knowledge which is pedagogically useful for inferences on what the domain-related feedback to the student must be. These fields help specify hinting situations, which are used by the socratic algorithm for choosing the appropriate hint category to be produced.
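Neither the HSS fields nor the socratic algorithm are spelled out in this short paper, so the following sketch is only a schematic reading of the approach; the field names and the selection rule are invented for illustration.

```python
# Hypothetical sketch of a hinting session status (HSS) and a much-simplified
# category choice; names and rules are illustrative, not taken from the paper.
hss = {
    "domain_contribution": "partially_correct",   # generic evaluation category
    "missing_anchoring_point": "inference_rule",  # e.g. domain relation, domain object,
                                                  # inference rule, substitution, proof step
    "hints_already_given": 1,
}

def choose_hint_category(hss):
    """Map a hinting situation to a hint category (illustrative rule only)."""
    if hss["domain_contribution"] == "correct":
        return None                                  # no hint needed
    if hss["domain_contribution"] == "wrong":
        return ("domain_relation", "elicit")         # start from a general anchoring point
    # Partially correct: address the missing anchoring point directly, giving
    # away more information the more hints have already been produced.
    status = "elicit" if hss["hints_already_given"] < 2 else "provide"
    return (hss["missing_anchoring_point"], status)

category = choose_hint_category(hss)
# The chosen category would then be instantiated against the domain ontology,
# e.g. looking up the actual inference rule used in the current proof step.
print(category)   # -> ('inference_rule', 'elicit')
```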
Once the hint category has been chosen, the domain knowledge is used again to instantiate the category, yielding a hint specification. Each hint category is defined based on generic descriptions of domain objects or relations, that is, the anchoring points. The role of the ontology is to assist the domain knowledge module (where the proof is represented) with the mapping of the generic descriptions onto the actual objects or relations that are used in the particular context, that is, in the particular proof and proof step. For example, to realise a hint that gives away the subordinate concept, the generator needs to know what the subordinate concept for the proof step and the inference rule at hand is. This mapping is the first step towards the necessary hint specifications. The second step is to specify for every hint category the exact domain information that it needs to mention. This is done by the further inclusion of information that is not the central point of the particular hint, but is needed for its realisation in NL. Such information may be, for instance, the inference rule, its NL name and the formula which represents it, or a new hypothesis needed for the proof step. These are not themselves anchoring points, but specify the anchoring point for the particular domain and the hint category. They thus make a rounded hint realisation possible by adding information on the other aspects of a hint, captured in the other dimensions of the hint taxonomy. The final addition of the pedagogically motivated feedback chosen by the tutorial manager via discourse structure and dialogue modelling aspects completes the information needed by the generator.
References
[1] C. Benzmüller et al. Tutorial dialogs on mathematical proofs. In Proceedings IJCAI Workshop on Knowledge Representation and Automated Reasoning for E-Learning Systems, pp. 12–22, Acapulco, 2003.
[2] A. Fiedler and D. Tsovaltzi. Automating hinting in an intelligent tutorial system. In Proceedings IJCAI Workshop on Knowledge Representation and Automated Reasoning for E-Learning Systems, pp. 23–35, Acapulco, 2003.
[3] G. Hume et al. Student responses and follow up tutorial tactics in an ITS. In Proceedings 9th Florida Artificial Intelligence Research Symposium, pp. 168–172, Key West, FL, 1996.
[4] E. Lim and D. Moore. Problem solving in geometry: Comparing the effects of non-goal specific instruction and conventional worked examples. Journal of Educational Psychology, 22(5):591–612, 2002.
[5] N. Matsuda and K. VanLehn. Modelling hinting strategies for geometry theorem proving. In Proceedings 9th International Conference on User Modeling, Pittsburgh, PA, 2003.
[6] J. Moore. What makes human explanations effective? In Proceedings 15th Annual Meeting of the Cognitive Science Society, Hillsdale, NJ, 1993.
[7] N. Person et al. Dialog move generation and conversation management in AutoTutor. In C. Rosé and R. Freedman, eds., Building Dialog Systems for Tutorial Applications—Papers from the AAAI Fall Symposium, pp. 45–51, North Falmouth, MA, 2000. AAAI Press.
[8] C. Rosé et al. A comparative evaluation of socratic versus didactic tutoring. In J. Moore and K. Stenning, eds., Proceedings 23rd Annual Conference of the Cognitive Science Society, University of Edinburgh, Scotland, UK, 2001.
[9] J. Siekmann et al. Proof development with ΩMEGA. In A. Voronkov, ed., Automated Deduction — CADE-18, number 2392 in LNAI, pp. 144–149. Springer, 2002.
[10] D. Tsovaltzi et al. A Multi-Dimensional Taxonomy for Automating Hinting. In Intelligent Tutoring Systems — 6th International Conference, ITS 2004, LNCS. Springer, 2004.
[11] D. Tsovaltzi and E. Karagjosova. A dialogue move taxonomy for tutorial dialogues. In Proceedings 5th SIGdial Workshop on Discourse and Dialogue, Boston, USA, 2004.
[12] B. Weiner. Human Motivation: Metaphors, Theories, and Research. Sage Publications, 1992.
[13] B. Wilson and P. Cole. Cognitive teaching models. In D. Jonassen, ed., Handbook of Research for Educational Communications and Technology. MacMillan, 1996.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
How to Qualitatively + Quantitatively Assess Concept Maps: the case of COMPASS
Evangelia GOULI, Agoritsa GOGOULOU, Kyparisia PAPANIKOLAOU and Maria GRIGORIADOU
Department of Informatics & Telecommunications, University of Athens, Panepistimiopolis, Ilissia, Athens 15784, Greece
[email protected], [email protected], [email protected], [email protected]
Abstract. This paper presents a scheme for the quantitative and qualitative assessment of concept maps in the context of a web-based adaptive concept map assessment tool, referred to as COMPASS. The propositions are characterized qualitatively based on specific criteria and on the error(s) that may be identified. The quantitative assessment depends on the weights assigned to the concepts/propositions and the error categories.
Introduction
In educational settings, where assessment is aligned with instruction, concept maps are considered to be a valuable tool of an assessment toolbox, as they provide an explicit and overt representation of learners' knowledge structure and promote meaningful learning [6]. A concept map is composed of nodes, which represent concepts, and links, annotated with labels, which represent relationships between concepts. The triple Concept-Relationship-Concept constitutes a proposition, which is the fundamental unit of the map. The assessment of a concept map is usually accomplished by comparing the learner's map with the expert one [7]. The two most commonly investigated assessment methods are the structural method [6], which provides a quantitative assessment of the map, taking into account only the valid components, and the relational method, which focuses on the accuracy of each proposition. Most of the assessment schemes proposed in the literature either have been applied to studies where the evaluation of concept maps is human-based [7], [5] or constitute a theoretical framework [4], while the number of systems that have embedded a scheme for automated assessment and for feedback provision is minimal [1]. In this context, we propose an assessment scheme for both the qualitative and quantitative assessment of concept maps and subsequently for the qualitative and quantitative estimation of the learner's knowledge. The assessment scheme has been embedded in COMPASS (COncept MaP ASSessment tool) (http://hermes.di.uoa.gr:8080/compass), an adaptive web-based concept map assessment tool [3], which serves the assessment and the learning processes by employing a variety of activities and providing different informative, tutoring and reflective feedback components, tailored to learners' individual characteristics and needs.
1. The Assessment Scheme embedded in COMPASS
The proposed scheme is based on the relational method and takes into account both the concepts presented on the learner's map and their corresponding relationship(s), as well as the missing ones, with respect to the expected propositions presented on the expert map. The
propositions are assessed according to specific criteria concerning completeness, accuracy, superfluity, missing out and non-recognizability. More specifically, a proposition is qualitatively characterized [3] as:
(i) complete-accurate: when it is the expected one;
(ii) incomplete: when at least one of the expected components (i.e. the involved concepts and their relationship(s)) is incomplete or missing; the error categories that may be identified are incomplete relationship (IR), missing relationship (MR), missing concept and its relationship(s) (MCR) and missing concept belonging to a group and its relationship(s) (MCGR);
(iii) inaccurate: when at least one component/characteristic of the proposition is inaccurate; the error categories that may be identified are incorrect concept (IC), incorrect relationship (INR), concept at different place (CDP) and difference in arrow's direction (DAD);
(iv) inaccurate-superfluous: when at least one component of the proposition is characterized as superfluous; the error categories that may be identified are superfluous relationship (SR) and superfluous concept and its relationship(s) (SCR);
(v) missing: when the expected proposition is missing (i.e. the missing proposition (MP) error); and
(vi) non-recognizable: when it is not possible to assess the proposition, due to a non-recognizable concept (NRC) and/or a non-recognizable relationship (NRR).
The qualitative assessment is based on the aforementioned qualitative analysis of the errors and aims to contribute to the qualitative diagnosis of the learner's knowledge, identifying the learner's incomplete understanding/beliefs (the errors "MCR", "IR", "MR", "CDP", "MCGR", and "MP" are identified) and false beliefs (the errors "SCR", "INR", "IC", "SR", "DAD" are identified). The quantitative analysis is based on the weights assigned to each error category as well as to each concept and proposition that appears on the expert map. The weights are assigned by the teacher and reflect the degree of importance of the concepts and propositions as well as of the error categories, with respect to the learning outcomes addressed by the activity. The assessment process consists of the following steps (a detailed description is given in [3]):
- at first, the weights of the concepts that exist in both maps (learner's and expert's) and are at the correct position, as well as the weights of the propositions on the learner's map which are characterized as complete-accurate, are added to the total score;
- for all the propositions/concepts which are partially correct (i.e. errors "IR", "IC", "INR", "CDP", and "DAD"), their weights are partially added to the total score: they are adjusted according to the weights of the corresponding error categories and added to the total score;
- for all the propositions/concepts which are superfluous or missing (i.e. errors "SCR", "SR", "MR", "MCR", and "MCGR"), their weights are ignored and the weights of the related concepts, which have been fully added to the score at the first step, are adjusted according to the weights of the corresponding error categories and subtracted from the total score;
- the total learner's score is divided by the expert's score (the weights of all the concepts and propositions presented on the expert map are added) to produce a ratio as a similarity index.
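A minimal sketch of these four steps, assuming simplified data structures (weights kept in plain dictionaries, error categories and their weights supplied per identified error); the actual COMPASS implementation is not available here and may adjust weights differently.

```python
def compass_score(expert_concepts, expert_props, learner):
    """Weighted scoring sketch following the four steps described above.

    expert_concepts / expert_props: {name: weight} on the expert map.
    learner: {"correct_concepts": [...],
              "complete_props": [...],
              "partial": [(name, error, error_weight), ...],            # IR, IC, INR, CDP, DAD
              "missing_or_superfluous": [(concept, error_weight), ...]} # SCR, SR, MR, MCR, MCGR
    """
    # Step 1: fully credit correct concepts and complete-accurate propositions.
    score = sum(expert_concepts[c] for c in learner["correct_concepts"])
    score += sum(expert_props[p] for p in learner["complete_props"])

    # Step 2: partially credit propositions/concepts with "partial" errors,
    # adjusting their weight by the weight of the error category.
    for name, _error, err_w in learner["partial"]:
        weight = expert_props.get(name, expert_concepts.get(name, 0.0))
        score += weight * (1.0 - err_w)

    # Step 3: for superfluous or missing components, subtract the (adjusted)
    # weight of the related concepts that were credited in step 1.
    for concept, err_w in learner["missing_or_superfluous"]:
        score -= expert_concepts.get(concept, 0.0) * err_w

    # Step 4: normalise by the expert score to obtain a similarity index.
    expert_score = sum(expert_concepts.values()) + sum(expert_props.values())
    return score / expert_score

# Tiny illustrative call (weights and names invented).
expert_concepts = {"sequence": 2.0, "selection": 2.0, "iteration": 2.0}
expert_props = {"selection-uses-condition": 3.0, "iteration-repeats-block": 3.0}
learner = {
    "correct_concepts": ["sequence", "selection"],
    "complete_props": ["selection-uses-condition"],
    "partial": [("iteration-repeats-block", "IR", 0.5)],
    "missing_or_superfluous": [("selection", 0.4)],
}
print(compass_score(expert_concepts, expert_props, learner))
```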
The results of the quantitative and the qualitative assessment are exploited for the provision of adequate personalised feedback according to the underlying error(s) identified, aiming to stimulate learners to reflect on their beliefs.
2. Empirical Evaluation During the formative evaluation of COMPASS, an empirical study was conducted, aiming to investigate the validity of the proposed scheme, as far as the quantitative estimation of learners’ knowledge is concerned. In particular, we investigated the correlation of the quantitative results obtained from COMPASS with the results derived from two other approaches: (i) the holistic assessment of concept maps by a teacher who assigned a score on a scale from 1 to 10, and (ii) the assessment of maps based on the similarity index algorithm of Goldsmith et al. [2]. The study took place during the school year 2004-2005, in the context of a
course on Informatics at a high school. Sixteen students participated in the study. The students were asked to use COMPASS and work on a "concept-relationship list construction" task, concerning the central concept of "Control Structures". The results from the assessment of students' concept maps, according to the three different approaches, are presented in Figure 1. The reader may notice that the quantitative scores obtained from COMPASS converge to a high degree with the scores obtained from the other two assessment approaches.
Figure 1. The results of the quantitative assessment of students' concept maps: estimation of the sixteen students' knowledge level according to COMPASS, the teacher, and the similarity index.
3. Conclusions
The discriminative characteristics of the proposed scheme are: (i) the qualitative characterization of the propositions, (ii) the assessment process followed, which takes into account not only the complete-accurate propositions but also the identified errors, (iii) the qualitative diagnosis of the learner's knowledge, based on the qualitative analysis of the errors identified, (iv) the quantitative estimation of the learner's knowledge level, based on the complete-accurate propositions and on the weights assigned to the concepts, the propositions and the error categories, and (v) the flexibility provided to the teacher to experiment with different weights and to personalize the assessment process. The validity of the proposed assessment scheme can be characterized as satisfactory, as the quantitative estimations of the learner's knowledge obtained from COMPASS are close to the estimations obtained from the human-based assessment and the similarity index algorithm.
References
[1] Conlon, T. (2004). "Please argue, I could be wrong": A Reasonable Fallible Analyser for Student Concept Maps. Proceedings of ED-MEDIA 2004, World Conference on Educational Multimedia, Hypermedia and Telecommunications, Volume 2004, Issue 1, 1299-1306.
[2] Goldsmith, T., Johnson, P. & Acton, W. (1991). Assessing structural knowledge. Journal of Educational Psychology, 83, 88-96.
[3] Gouli, E., Gogoulou, A., Papanikolaou, K., & Grigoriadou, M. (2005). Evaluating Learner's Knowledge level on Concept Mapping Tasks. In Proceedings of the 5th IEEE International Conference on Advanced Learning Technologies (ICALT 2005) (to appear).
[4] Lin, S-C., Chang, K-E., Sung, Y-T., & Chen, G-D. (2002). A new structural knowledge assessment based on weighted concept maps. Proceedings of the International Conference on Computers in Education (ICCE'02), 1, 679-680.
[5] Nicoll, G., Francisco, J., & Nakhleh, M. (2001). A three-tier system for assessing concept map links: a methodological study. International Journal of Science Education, 23, 8, 863-875.
[6] Novak, J., & Gowin, D. (1984). Learning How to Learn. New York: Cambridge University Press.
[7] Ruiz-Primo, M., & Shavelson, R. (1996). Problems and issues in the use of concept maps in science assessment. Journal of Research in Science Teaching, 33 (6), 569-600.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Describing Learner Support: An adaptation of IMS-LD Educational Modelling Language Patricia GOUNON*, Pascal LEROUX** and Xavier DUBOURG* Laboratoire d'Informatique de l'Université du Maine - CNRS FRE 2730 * I.U.T. de Laval – Département Services et Réseaux de Communication 52, rue des docteurs Calmette et Guérin 53020 LAVAL Cedex 9, France {patricia.gounon; xavier.dubourg}@univ-lemans.fr phone: (33) 2 43 59 49 23 ** Institut d’Informatique Claude Chappe Avenue René Laennec 72085 Le Mans Cedex 9, France
[email protected] phone: (33) 2 43 83 38 53
Abstract. In this paper, we propose an adaptation of the educational modelling language IMS-Learning Design in terms of the description of support activities and the specification of the actors' roles in these activities. The propositions are based on a tutoring organization model that we have defined. This model has three goals: (1) to organize tasks between the actors (tutor and learners) during a learning session, (2) to allow support activity that adapts to learners according to the learning situation, and (3) to specify the support activity tools of the learning environment.
Keywords: tutoring model, educational modelling language, IMS-Learning Design, learner support.
1. Introduction
Different learner support problems are observed in distance learning environments, from both the learner's and the human tutor's point of view. A learner may have difficulties knowing when, and about what, he could contact the tutor during a learning session. What is more, the learner is not always aware of the mistakes he makes. Therefore, he does not necessarily take the initiative to ask for help. The human tutor may find it difficult to follow the development of the learning activity. These obstacles affect the human tutor's capacity to react in time and with a suitable, learner-adapted activity. These observations give rise to the question: how can we facilitate the design of environments that accompany learners in the case of distance learning? One response is to guide the designer in the description of the pedagogical scenario of a study unit, integrating the learners' planned support into the design process. Presently, pedagogical scenario descriptions use an Educational Modelling Language (EML). An EML is a semantic model describing the content and the process of a study unit whilst allowing reuse and interoperability [4]. The learner support notion is not often taken into account. This is the reason why we propose an adaptation of the EML IMS-LD. The proposition is based on the tutoring organization model that we describe in the next part. We conclude by giving some perspectives for our research.
2. Model to Organize Tutoring for Learning Activities Our tutoring model [2] is organized around three components: the tutor, the tutored person and the tutoring style. The tutor component identifies which actor should intervene during the learning activity. The tutored person component defines the beneficiaries of tutor interventions during the learning session. The tutoring style component clarifies the tutoring strategy and the associated tools for actors of learning sessions. To describe the tutoring style, we have to determine (1) the intervention content brought to one or several learners (2) the intervention mode, and (3) actions scheduling. We define four tutoring contents including motivation, which corresponds to a social aspect of tutoring. From this model, the designer describes tutor tasks during the session. Each task identifies the tutor, the beneficiary and the task style. Then, we use each described task to specify tools to support the proposed tutor actions during a learning activity. The tutoring model is used during the four phases of the life-cycle courseware: design, production, development and evaluation (see Figure 1). The tutoring model application in the life-cycle courseware aims, both to define and understand the tutoring activity better and to facilitate the analysis of the observed tutoring at the end of the learning activity.
Figure 1. Life-cycle Courseware
2. Describing Support Actors Using Norms
There has been real interest over recent years in the use and application of standards to encourage the exchange and reuse of learning objects. [3] defines a learning object as 'any digital resource, used to create learning activities or to support learning, and that could be used, re-used or referenced during a learning activity'. Different approaches exist to describe learning objects: the documentation approach (LOM) [1], the engineering of software components (SCORM) [5] and pedagogical engineering (EML). Our goal is to define what exactly concerns learner support in pedagogical scenarios. Consequently, it is important to examine how support is dealt with in EMLs. We work with, and make propositions for, the language IMS-LD (Open University of the Netherlands) [3] in particular. We chose this language because it allows all pedagogical situations to be modelled and it is open to modifications. This is important if we want to integrate our tutoring model elements. The language allows us to describe the development of a study unit using a wide diversity of existing pedagogical approaches (constructivism, socio-constructivism, …). Its use permits us to consider the association of the different contents (pedagogical resources, tools) of a learning design. It also aims to describe the support activity for a unit of study. However, the description of support activity with IMS-LD is limited and does not allow tutor tasks to be described precisely (tutoring mode, tutoring style, content). The contribution of our work is to add information allowing a better description of tutoring for a study unit. To do that, we use the characteristics of the tutoring model.
3. IMS-LD Adaptation Proposal integrating the Tutoring Organization Model
First, the modifications brought to the role component concern the learner and staff components. We add categories (sub-group, co-learner, …) identified in the proposed tutoring model. Thus the granularity of the description of the actors of a given study unit is increased. Second, we have added further modifications to the service description. Various pieces of information are inserted in this part to establish the references of the actor using the tool and the intervention mode used. The aim of the extension proposition is to facilitate analysis of the use of the different support tools in a study unit. It is also a way to give better access to tools during the design of the learning activity. Third, with the tutoring model, we define a unit of tutor tasks that can be carried out during a learning activity. These tasks help to identify the characteristics and to specify the tool management of the learners' support activity. The tool choice is expressed with IMS-LD. A tutor action is described by using the tag . The staff references are modified by specifying the characteristics of each actor (tutor and tutored person) and of the exchange style. This description corresponds to the tutor task transcription described in the tag . Then, the task application is defined by one of the following tags:
- the task is universal to the study unit (),
- the task is specific to a structure activity (), or
- the task described is specific to a learning activity ().
Finally, the tool satisfying the described task is referenced in the tag environment.
4. Conclusion
We proposed, in this paper, an extension to the EML IMS-LD integrating a tutoring organization model that we use to guide the design of support environments. Our proposition aims to add a level of detail to the description of the participating tutor and tutored person. This adaptation also brings the same degree of precision to the tool description. Our proposition is used in the environment to guide the designer in the description of the study unit and the specification of the learner support. The application helps the designer to specify the tool choice for the support activity by proposing a uniform range of tools according to the defined tasks. We also wish to enable the integration of tools and pedagogical scenarios into existing platforms described with IMS-LD.
References
[1] Forte, E., Haenni, F., Warkentyne, K., Duval, E., Cardinaels, K., Vervaet, E., Hendrikx, K., Wentland Forte, M., Simillion, F. « Semantic and Pedagogic Interoperability Mechanisms in the ARIADNE Educational Repository », in ACM SIGMOD, Vol. 28, No. 1, March 1999.
[2] Gounon, P., Leroux, P. & Dubourg, X., « Proposition d’un modèle de tutorat pour la conception de dispositifs d’accompagnement en formation en ligne » (à paraître), In: Revue internationale des technologies en pédagogie universitaire, numéro spécial: L'ingénierie pédagogique à l'heure des TIC, printemps 2005. [3] Koper, R., Olivier, B. & Anderson T., eds., IMS Learning Design Information Model, IMS Global Learning Consortium, Inc., version 1.0, 20/01/2003. [4] Rawlings, A ; Rosmalen, P., Koper, R., (OUNL), Rodríguez-Artacho, M., (UNED), Lefrere, P., (UKOU), « Survey of Educational Modelling Languages (EMLs) », 2002. [5] ADL/SCORM, ADL Sharable Content Object Reference Model Version 1.3, Working draft 0.9, 2002.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Developing a Bayes-net based student model for an External Representation Selection Tutor Beate Grawemeyer and Richard Cox Representation & Cognition Group Department of Informatics, University of Sussex, Falmer, Brighton BN1 9QH, UK Abstract. This paper describes the process by which we are constructing an intelligent tutoring system (ERST) designed to improve learners’ external representation (ER) selection accuracy on a range of database query tasks. This paper describes how ERST’s student model is being constructed - it is a Bayesian network seeded with data from experimental studies. The studies examined the effects of students’ background knowledge-of-external representations (KER) upon performance and their preferences for particular information display forms across a range of database query types. Keywords. Student modeling, External representations, Bayesian networks
1. Introduction
Successful use of external representations (ERs) depends upon the skillful matching of a particular representation with the demands of the task. Good ER selection requires, inter alia, knowledge of a range of ERs in terms of a) their semantic properties (e.g. expressiveness), b) their functional roles (e.g. [4],[1]), together with information about the 'applicability conditions' under which a representation is suitable for use [7]. Our aim is to build ERST - an ER selection tutor. We conducted a series of empirical studies (e.g. [6]) that have provided data for ERST's student model and its adaptation mechanism. This paper extends the work by investigating the effect of learners' background knowledge of ERs (KER) upon information display selection across a range of tasks that differ in their representation-specificity. In the experiments, a prototype automatic information visualization engine (AIVE) was used to present a series of questions about information in a database. Participants were asked to make judgments and comparisons between cars and car features. Each participant responded to 30 questions, of which there were 6 types, e.g. identify; correlate; quantifier-set; locate; cluster; compare negative. Participants were informed that to help them answer the questions, the system would supply the needed data from the database. AIVE then offered participants a choice of representations of the data. They could choose between various types of ERs, e.g. set diagram, scatter plot, bar chart, sector graph, pie chart and table. The ER options were presented as an array of buttons each with an icon depicting, in stylized form, an ER type (bar chart, scatter plot, pie chart, etc). When the participant made his or her choice,
AIVE then instantiated the chosen representational form with the data needed to answer the task and displayed a well-formed, full-screen ER from which the participant could read-off the information needed to answer the question. Having read-off the information, subjects indicated their response via on-screen button selections (i.e. selecting one option out of a set of possible options). Note that each of the 30 questions could (potentially) be answered with any of the ER display types offered. However, each question type had an ’optimal’ ER. Following a completed response, the participant was presented with the next question in the series of 30 and the sequence was repeated. The data recorded were: the randomized position of each representation icon from trial to trial; user’s representation choices (DSA); time to read question and select representation (DSL); time to answer the question (DBQL); responses to questions (DBQA). Further details about the experimental procedure are provided in [6]. Prior to the database query tasks, participants were provided with 4 different types of KER pre-tests [5]. These tests consisted of a series of cognitive tasks designed to assess ER knowledge representation at the perceptual, semantic and output levels of the cognitive system. A large corpus of external representations (ERs) was used as stimuli. The corpus contains 112 ER examples. The decision task (ERD) was a visual recognition task requiring real/fake decisions1 . The categorisation task (ERC) assessed semantic knowledge of ERs - subjects categorised each representation as ‘graph or chart’, or ‘icon/logo’, ‘map’, etc. In the functional knowledge task (ERF), subjects were asked ‘What is this ER’s function’?. In the naming task (ERN), for each ER, subjects chose a name from a list. E.g.: ‘venn diagram’, ‘timetable’, ‘scatterplot’, ‘Gantt chart’, ‘entity relation (ER) diagram’, etc [5].
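The structure and parameters of ERST's Bayesian network are not given in this section; the toy example below only illustrates the general idea of seeding a network with conditional probabilities estimated from study data (all numbers are invented) and updating the belief about a student's ER knowledge from an observed display selection.

```python
# Toy two-node Bayesian network: KER level -> display selection accuracy (DSA).
# All probabilities are invented for illustration; ERST's real network is larger
# and is seeded from the experimental data described in the paper.
p_ker = {"high": 0.5, "low": 0.5}                    # prior over the KER level
p_correct_given_ker = {"high": 0.8, "low": 0.45}     # P(correct selection | KER)

def posterior_ker(selection_correct):
    """P(KER | one observed display selection), by Bayes' rule."""
    likelihood = {
        k: (p_correct_given_ker[k] if selection_correct else 1 - p_correct_given_ker[k])
        for k in p_ker
    }
    unnorm = {k: likelihood[k] * p_ker[k] for k in p_ker}
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}

print(posterior_ker(True))    # belief in "high" KER rises after a correct selection
print(posterior_ker(False))   # and falls after an incorrect one
```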
2. Results and Discussion
The simple bivariate correlations between KER and AIVE tasks for display selection accuracy (DSA), database query answering accuracy (DBQA), display selection latency (DSL) and database query answering latency (DBQL) were: Three of the 4 KER tasks correlated significantly and positively with DBQA (ERD r=.46, p

Computer Simulation as an Instructional Technology in AutoTutor
H.-J. Joyce Kim et al.

AT > AT-MC.
3.1 Methods
The participants were 132 students from Rhodes College and University of Memphis who were paid for their participation. The experiment consisted of three phases: a pre-test phase, a learning phase, and a post-test phase. During the pre-test phase, all participants were administered 26 multiple choice questions (pulled from the Force Concept Inventory). During the learning phase, participants answered four physics problems while interacting with one of the AutoTutors. The post-test phase consisted of a different set of 26 multiple choice questions (counterbalanced with the pre-test), and a user perception survey. The experiment took approximately two hours to complete.
3.2 Results and Discussion
We compared the three different tutors using four outcome indices: pre-test, post-test, simple learning gains (post-test – pre-test), and proportional learning gains [(post-test proportion – pre-test proportion) / (1 – pre-test proportion)]. The pre- and post-tests were converted to proportion correct scores. There were no significant differences in pre-test scores between the tutoring conditions. Overall, we found that all the versions of AutoTutor produced significant learning gains; post-test scores (M = .60, SD = .18) were significantly better than pre-test
scores (M = .46, SD = .19), F(1, 129) = 141.17, p < .001, effect-size = 0.74. Although no other effects were significant, the data trend supported the predictions: AT-Sim > AT > AT-MC. Table 1 shows the cell means and SDs in the analyses.

Tutor condition     Pretest      Posttest     Simple Learning Gains   Proportional Learning Gains
AT-Simulation       .459 (.18)   .633 (.17)   .174 (.14)              .309 (.25)
AT                  .442 (.20)   .589 (.19)   .147 (.15)              .271 (.25)
AT-Monte Carlo      .464 (.20)   .582 (.18)   .118 (.13)              .237 (.27)
Table 1. Means and Standard Deviations for Learning Measures

A 2 (pre vs. post-test scores) x 3 (three tutor conditions) x 2 (low vs. high knowledge) ANOVA showed a significant interaction between test scores and domain knowledge, F(1, 90) = 23.44, p < .01. The difference between pre and post scores was significantly greater for students with low domain knowledge than for those with high domain knowledge. Thus, students with low knowledge benefited more from AutoTutor than those with high knowledge. More interestingly, when we used participants whose pre-test scores were greater than .5, we found a significant difference in the simple learning gains between AT-Sim and AT-MC, F(1, 28) = 4.19, p < .05. AT-Sim produced significantly higher learning gains than AT-MC. This indicates that the Monte Carlo tutor might inhibit learning for high knowledge participants and that learning gains might suffer without adaptive dialogues. We are currently in the process of revising the simulation dialogues and improving the simulation environments. Improved simulation dialogues, faster display of simulations, and modeling of effective learning with simulations might ultimately help students to learn deeply about abstract physics concepts. Interactive simulations will hopefully show some promise as a new medium for dialogue scaffolding, creating an immersive environment in which the learner and tutor can interact.
Acknowledgements
The research on AutoTutor was supported by the National Science Foundation (SBR 9720314, REC 0106965, REC 0126265, ITR 0325428) and the DoD Multidisciplinary University Research Initiative (MURI) administered by ONR under grant N00014-00-1-0600 (visit http://www.autotutor.org). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DoD, ONR, or NSF.
References
[1] Choi, B., & Gennaro, E. (1987). The effectiveness of using computer simulated experiments on junior high students' understanding of the volume displacement concept. Journal of Research in Science Teaching, 24(6), 539-552.
[2] Dekker, J., & Donatti, S. (1981). The integration of research studies on the use of simulation as an instructional strategy. Journal of Educational Research, 74(6), 424-427.
[3] Graesser, A.C., Chipman, P., Haynes, B., & Olney, A. (in press). AutoTutor: An intelligent-tutoring system with mixed-initiative dialogue. IEEE Transactions in Education.
[4] Graesser, A.C., Lu, S., Jackson, G.T., Mitchell, H., Ventura, M., Olney, A., & Louwerse, M.M. (2004). AutoTutor: A tutor with dialogue in natural language. Behavioral Research Methods, Instruments, and Computers, 36, 180-193.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Developing Teaching Aids for Distance Education
Jihie Kim, Carole Beal, and Zeeshan Maqbool
USC/Information Sciences Institute, Marina del Rey, CA 90292, USA
Abstract. As web-enhanced courses become more successful, they put considerable burdens on instructors and teaching assistants. We present our work on developing software tools to support instructors by (A) semi-automatically grading discussions and (B) creating instructional tools that handle many student requests. We use knowledge-based techniques to model course components, student queries, and the relations between them. The results from our initial analysis in developing such tools are also presented.
Introduction
Web-enhanced courses and distance education courses are becoming increasingly popular. Such courses make class materials easily accessible to remote students, and increase the availability of instructors to students beyond the traditional classroom. However, as such courses become more successful, their enrollments increase, and the heavier on-line interaction places considerable burdens on instructors and teaching assistants. Thus, the ultimate success of web-based education is constrained by limited instructor time and availability. At the same time, many routine student queries and on-line activities do not necessarily require instructor or TA intervention. Software tools that can handle some student activities would allow instructors to focus on queries and activities that truly require their attention.
1. Turning quantity into quality: Development and validation of a measure to support automatic assessment of on-line discussion contributions
Engagement in on-line discussions is an important part of student activities in distance education, and instructors often use it to measure each student's contribution to the class. Although it is probably not feasible or pedagogically appropriate to completely automate the grading process, we are developing approaches to semi-automate some of the work. There have been some prototype measures of discussion quality that rely on the quantity of discussion contributions [2], which include the number of posted comments and the number of responses that a post elicits from classmates and/or the TA or instructor. We are extending the framework to accommodate various factors. Posts that engage many different students might have a higher probability of being high quality than a post that does not elicit interest from anyone else. If a student was involved in various discussions on different topics, we may infer that he/she has broader interests than a student who contributes to only a small number of topics. We are currently collecting course data from various fields including Psychology, Mechanical Engineering and Computer Science. Here we report an initial analysis of a graduate-level Computer Science class on Advanced Operating Systems held in Fall 2003. The course had over 80 graduate students enrolled. Students were encouraged to participate in an on-line forum to discuss general issues as well as course topics. Their participation was reflected in their grades as class participation scores, constituting up to
10% of the final grade. Table 1 presents a part of our results, showing ranks from three different groups: 5 students with the highest ranks, 5 students with middle ranks, and 5 students with the lowest ranks. The ranks are computed based on the following factors: A) total number of messages sent, B) average length of the threads where the student participated, C) total number of threads initiated by the student, D) average number of other students involved in the threads that the student initiated, E) total number of different threads where the student participated. The last column shows the qualitative assessment of student participation by the instructor.

Student    A (rank)   B (rank)     C (rank)   D (rank)    E (rank)   Avg Rank   Instructor's assessment
S-high1    25(4)      7.41(31)     8(3)       3.57(16)    23(3)      11.4       strong
S-high2    23(6)      9(23)        5(5)       3.5(17)     16(7)      11.6       strong
S-high3    28(3)      7(34)        4(7)       3.75(13)    18(4)      12.2       strong
S-high4    8(14)      10.25(18)    2(15)      6(3)        7(12)      12.4       relatively strong
S-high5    104(1)     6.21(41)     16(1)      3.21(19)    37(1)      12.6       strong
S-mid1     7(17)      6(42)        1(20)      3(20)       6(14)      22.6       not strong
S-mid2     4(29)      6.26(40)     4(7)       3.8(12)     3(29)      23.4       not strong
S-mid3     4(29)      8.5(25)      2(15)      5(5)        1(43)      23.4       not strong
S-mid4     6(22)      13(8)        0(34)      0(33)       4(24)      24.2       not strong
S-mid5     7(17)      7.17(33)     0(34)      0(33)       8(10)      25.4       not strong
S-low1     1(46)      8(23)        0(34)      0(33)       1(43)      36.4       not strong
S-low2     2(40)      3.5(53)      0(34)      0(33)       2(38)      39.6       not strong
S-low3     1(46)      7(34)        0(34)      0(33)       0(54)      40.2       not strong
S-low4     1(46)      2(57)        1(20)      2(26)       0(54)      40.6       not strong
S-low5     1(46)      5(45)        0(34)      0(33)       0(54)      42.4       not strong
Table 1: Student participation in discussions.
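To make the ranking behind the Avg Rank column concrete, the sketch below re-implements the scheme in outline: each factor is ranked in descending order across students, and the per-factor ranks are averaged. This is an illustration only, not the authors' code; the sample values are hypothetical, ranks computed over only three students naturally differ from the class-wide ranks in Table 1, and tie handling is simplified.

```python
# Illustrative sketch (not the authors' implementation) of the
# rank-averaging participation score used in Table 1.

def average_ranks(students):
    """students: dict name -> tuple of factor values (A, B, C, D, E).
    Returns dict name -> average rank (1 = best), ranking each factor
    in descending order; ties are broken arbitrarily in this sketch."""
    names = list(students)
    factor_count = len(next(iter(students.values())))
    ranks = {name: [] for name in names}
    for i in range(factor_count):
        # Larger factor values receive better (smaller) ranks.
        ordered = sorted(names, key=lambda n: students[n][i], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            ranks[name].append(rank)
    return {name: sum(r) / len(r) for name, r in ranks.items()}

if __name__ == "__main__":
    sample = {                      # hypothetical values for three students
        "S-high1": (25, 7.41, 8, 3.57, 23),
        "S-mid1": (7, 6.0, 1, 3.0, 6),
        "S-low1": (1, 8.0, 0, 0.0, 1),
    }
    for name, avg in sorted(average_ranks(sample).items(), key=lambda x: x[1]):
        print(f"{name}: average rank {avg:.1f}")
```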
As shown in the table, the instructor agreed that the top 5 students had in fact provided strong contributions to the discussions and that the other students' contributions were less strong. Also, we found that there are some correlations between factors A, C, and E. We are currently validating the actual correlations between these factors and analyzing other factors that can be potentially useful. 2. Developing an instructional tool that semi-automatically answers student queries The goal of this part of the work is to develop a tool that can handle many student requests semi-automatically. The tool will seek the instructor's help only when the student needs additional help. As an initial step, we are focusing on routine queries on general course information, administrative issues on assignments and exams, and other frequently asked questions. Instructors agree that they often spend a significant amount of time on such queries although many of them do not actually require their intervention. We are developing 1) a course ontology that represents generic components of distance education courses, 2) a query ontology that describes types of student queries and requests, and 3) general mappings between the two ontologies, i.e., how a type of query can be addressed by some course components. They are being built as general background knowledge which can support various reasoning capabilities such as classification, verification and knowledge authoring across different courses [1]. Note that these ontologies can include dependencies between different components. For example, participation in the discussion forum is enabled when the student knows how to access the class forum. The attendance policy is part of the grading policy if the class participation grade takes the attendance rate into account. Figure 1 shows the current ontology we are developing based on the Operating Systems course described above. The left hand side shows the concepts representing the course components. The right hand side shows types of student requests. The actual class structure and its materials are being represented in terms of these concepts and their relations. For example, the course is represented in terms of its syllabus, general information (office hours, exam dates, etc.), distance education network (DEN) relevant information, etc. Each student query is mapped to query types based on the
keywords in the message. In the figure, the numbers next to query types indicate the numbers of actual queries in the class. The lines in the figure highlight mappings between query types and course components. [Table 2: ontology of general information about a class and its mapping to student queries. Course components (syllabus, general information, exam/homework/research paper details, grading, attendance policy, handouts, FAQs, academic policies, DEN information, announcements, student discussions) are linked to student query types together with their observed frequencies, e.g. forum account (18), research paper due date (7), exam date (2), cheating (2).] By making these relations explicit, the system can map student queries to relevant course materials efficiently and the results can be sent to the students. When the system cannot find appropriate mappings or the student is not satisfied with the materials sent, the system may bring the case to the instructor's attention. All the interactions between the system and the student will be available to the instructor. The ontologies enable the system to find answers when simple keyword-based search fails. The following shows an example of such a case. Although the student is asking about message posting and registration, the actual information he needs is how to access his forum account, shown below. Student: I am unable to post message in the Class Discussion. In fact I didn't receive any activation key in email upon registration. Could you please suggest me a way out? Course info: Your forum accounts have been created. Your username is the username part of your school e-mail account, e.g., if your e-mail address is
[email protected], then your username for the forum is gbush. Your temporary password is ....
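A minimal sketch of the kind of keyword-based routing described in this section is given below; the paragraph that follows explains why such matching alone is not always enough. All concept names, keyword sets, canned answers and the escalation rule here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (illustrative, not the authors' system) of routing a
# student query to a course-information component via keyword mappings,
# escalating to the instructor when no mapping is found.

COURSE_INFO = {
    "forum_account": "Your forum accounts have been created. Your username is ...",
    "exam_date": "The midterm exam is held on ...",
    "research_paper_due_date": "The research paper is due on ...",
}

# Query-type -> keyword sets (standing in for the mapping lines in the figure).
QUERY_KEYWORDS = {
    "forum_account": {"forum", "post", "registration", "activation", "account"},
    "exam_date": {"exam", "midterm", "date"},
    "research_paper_due_date": {"paper", "due", "deadline"},
}

def route_query(message):
    """Return (component, answer); fall back to the instructor on no match."""
    words = set(message.lower().split())
    best, overlap = None, 0
    for component, keywords in QUERY_KEYWORDS.items():
        score = len(words & keywords)
        if score > overlap:
            best, overlap = component, score
    if best is None:
        return ("instructor", "No matching course component; escalate to instructor.")
    return (best, COURSE_INFO[best])

if __name__ == "__main__":
    target, answer = route_query(
        "I am unable to post a message in the class discussion; "
        "I didn't receive any activation key upon registration.")
    print(target, "->", answer)
```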
If the system simply uses the content of the message, it may retrieve other instructional components, such as how to register for DEN to access DEN materials, which does not help the student in this case. In order to provide an appropriate answer, the system needs to know what information will help the student in the given situation, such as that a discussion forum account enables the student's participation in the discussion forum. The ontology can also be used in assisting the instructor. The system can show how certain answers were derived by tracing the concepts and their relations used during answer generation. The instructor can use the ontology in organizing their instructional materials, and the system can check whether there is any missing or duplicate information by checking dependent components. We are planning to extend our ontology to take into account the history of student activities, making the context of the queries more explicit. Acknowledgement We thank Dongho Kim for providing discussion data.
References [1] Kim, J. and Gil, Y., Knowledge Analysis on Process Models, Proceedings of IJCAI-2001. [2] Shaw, E. Assessing and Scaffolding Collaborative Learning in Online Discussions, Proceedings of AIEd-2005.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Who Helps the Helper? A Situated Scaffolding System for Supporting Less Experienced Feedback Givers 1
Duenpen KOCHAKORNJARUPONG, 1Paul BRNA, and 2Paul VICKERS 1
The SCRE Centre, University of Glasgow, St Andrews Building 11 Eldon Street, Glasgow, G3 6NH, Scotland, UK 2 School of IET, Northumbria University Newcastle Upon Tyne, NE2 1XE, England, UK
[email protected],
[email protected],
[email protected] Abstract: This research emphasizes the construction of feedback patterns. A system called McFeSPA is designed to help inexperienced teaching assistants (TAs)1 who lack training in how to provide quality feedback. The system employs scaffolding to help TAs improve their feedback-giving skills while marking assignments. We are currently implementing the system with techniques drawn from Artificial Intelligence, cognitive psychology and theories of education. Our next step will entail an examination of the system in which two TAs use it with scaffolding turned off to give feedback to a group of students and two TAs use the full system with scaffolding.
1. Introduction The aim of our research is to give intelligent support to feedback givers with the help of feedback patterns [1], situated within the context of marking programming assignments. Although “feedback patterns” have been proposed in the pedagogical patterns project [2], they have not been implemented in the ITS & AIED communities [3] to assist novice TAs in becoming experienced teachers. McFeSPA employs techniques based on the feedback giving of experienced teachers to support teaching and learning. Although automated/semi-automated assignment marking systems can help teachers mark programming assignments (e.g. CourseMaster [4]), they do not explicitly scaffold novice TAs learning to give feedback, their main aim being to make marking assignments easier. In order to carry out this research, there are many questions that need to be asked, including “How do people learn to give quality feedback?” (and what is quality feedback?) and “What does the feedback giver need to learn in order to help the learner?” 2. Scaffolding Framework The scaffolding approach has been selected as appropriate for TAs who, like adults, have little time to learn anything while engaged in marking students' scripts. Although the implementation of scaffolding is difficult, scaffolding techniques have been deployed effectively in a number of systems (e.g. Ecolab [5]). We have chosen to work on the problem faced by TAs in the realistic situation of marking programming assignments
1
Inexperienced TAs mean novice TAs including novice teachers, novice tutors, and novice lecturers
for large classes and providing feedback on the students' errors. The TAs are likely to be inexperienced in giving feedback even if they have excellent programming skills. Helping TAs learn to mark programming assignments is close to the method of providing cognitive apprenticeship [6], and consists of content, methods, sequencing, and aspects of social learning. We include this in the framework for designing McFeSPA2. McFeSPA's architecture is presented in Figure 1. In this paper, we summarize the approach in Table 1.
Content: Two kinds of domain knowledge: about feedback (knowledge of feedback patterns, knowledge of scaffolding, knowledge of quality feedback) and the programming domain (knowledge of errors/weaknesses), plus heuristic knowledge (rules for feedback patterns, rules for providing quality feedback, rules for tutor's hints, and rules for dialogue response)
Methods: An integrated set of cognitive and metacognitive skills through the process of observation and guided and supported practice as well as implementing fading within McFeSPA
Sequencing: Applying the approach/skill of giving quality feedback to any course of assignment marking based on the users' experience
Social Learning: Situated learning (learning to give quality feedback in the situation of marking real assignments) and learning within a culture focused on and defined by expert practices
Table 1: Elements of the McFeSPA framework.
[Figure 1 appears here: an architecture diagram linking knowledge of the individual TA, knowledge of error/weakness messages and types, the TA module and TA model, an analysis-of-solution module, the knowledge bases for feedback patterns, quality feedback and scaffolding, the rules for dialogue response, feedback patterns, quality feedback and tutor's hints, the programming-domain expert model, the feedback-domain expert knowledge, the pedagogical and communication models, the students' solution files, and the system interface for semi-automated assignment marking.]
Figure 1: Architecture of the scaffolding framework for the provision of feedback on students' assignments.
As can be seen in Figure 1, the TA receives the student's solution from the interface of the system. Then the system analyses the student's solution based on the error or weakness patterns detected. Thereafter, the system annotates the error or weakness patterns and sends this to the TA module. At this stage the system allows the TA to add/update/delete further weakness messages, extending the system. This module compares each student's current weakness with his/her previous weaknesses in order to help the TA provide appropriate feedback to the student. The TA module stores some information about what the TA does, and this module will hold the information which helps TAs to reflect on their work – for example “doesn't do very much” or “doesn't spend a lot of time on reworking the Analysis of Solution”, and so on. This module depends on monitoring the time taken by the TA, and also employs the knowledge of feedback patterns and the knowledge of quality feedback to help the TA organise the feedback for the student before generating the feedback report to the student. During this process, some information is passed between the Communication module, which uses the rules for Dialogue Response, and the Pedagogical module, which uses the rules for hints, the rules
McFeSPA will run in two modes - scaffolding on or off - this is done for experimental reasons - see later.
for quality feedback, and the rules for feedback patterns. The Pedagogical module utilises three knowledge bases, for scaffolding, quality feedback, and feedback patterns, in order to scaffold the TA to provide quality feedback. In addition, the design and implementation of McFeSPA includes several forms of scaffold: Functional: the explanation of any components in McFeSPA; Content: five levels of contingent help; and Metacognitive: hints designed to help the TA rethink his/her decision, a form of scaffolding for reflection [7]. This latter type of scaffold can help the learner to be aware of his/her own learning through reflection, monitoring, etc., for example, the assessment of understanding: “Do I know more/understand better now?” 3. Conclusion and Future Work From our combination of feedback patterns and our conceptualisation of quality feedback in terms of the five levels of contingent help in McFeSPA, we have hypothesised that McFeSPA could help TAs learn to give feedback – and could also help TAs improve their practical feedback skills, while fading could promote better help-seeking activity. The system is not designed to be a complete solution for supporting the TA; e.g., it does not support any interaction between the TA and the student receiving the feedback, nor does it directly support marking assignments even though some error detection is available. Thus, we believe that this research makes a novel contribution to the field of AIED by focusing on how to train people to give quality feedback while marking assignments, in our case in the context of teaching programming. McFeSPA helps TAs directly, and students indirectly. However, we cannot guarantee that TAs will be happy with the way McFeSPA works given that this depends on its usability, which has yet to be determined. After further improvements, our long-term aims include the development of McFeSPA to provide scaffolding for a range of ILEs and also to provide services for web-based systems. References
[1] Kochakornjarupong, D. and Brna, P. (2003). Towards Scaffolding Feedback Construction: Improving Learning for Student and Marker. In Y.S. Chee, et al. (Eds.), Proceedings of the 11th International Conference on Computers in Education (ICCE 2003) (pp. 599-600). 2-5 December 2003, Hong Kong.
[2] Eckstein, J., Bergin, J., and Sharp, H. (2002). Patterns for Feedback. In Proceedings of EuroPLoP 2002, Seventh European Conference on Pattern Languages of Programs, Irsee, Germany.
[3] Kumar, V., McCalla, G., and Greer, J. (1999). Helping the Peer Helper. In Proceedings of the International Conference on Artificial Intelligence in Education, Le Mans, France.
[4] Higgins, C., Symeonidis, P., and Tsintsifas, A. (2002). The Marking System for CourseMaster. In Proceedings of the 7th Annual Conference on Innovation and Technology in Computer Science Education.
[5] Luckin, R. and du Boulay, B. (1999). Ecolab: The Development and Evaluation of a Vygotskian Design Framework. International Journal of Artificial Intelligence in Education, 10, 198-220.
[6] Collins, A., Brown, J.S., and Newman, S.E. (1989). Cognitive apprenticeship: Teaching the crafts of reading, writing, and mathematics. In L.B. Resnick (Ed.), Knowing, learning, and instruction: Essays in honor of Robert Glaser (pp. 453-494). Hillsdale, NJ: Erlbaum.
[7] Jackson, S.L., Krajcik, J., and Soloway, E. (1998). The design of guided learner-adaptable scaffolding in interactive learning environments. In Proceedings of CHI 98, Los Angeles, CA.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Realizing Adaptive Questions and Answers for ICALL Systems Hidenobu Kunichika*1, Minoru Urushima*2, Tsukasa Hirashima*3 and Akira Takeuchi*2 *1 Dept. of Creation Informatics, Kyushu Institute of Technology, Japan *2 Dept. of Artificial Intelligence, Kyushu Institute of Technology, Japan *3 Dept. of Information Engineering, Hiroshima University, Japan Abstract. Language training systems that provide learners adaptive questions on the contents of stories require several capabilities such as semantic analysis, automated question generation and diagnosis of learners' answer sentences. This paper presents a method of selecting questions from a generated list to realize adaptive questions and answers. Our method filters out similar questions, and then selects questions by considering the difficulty, types and order. This paper also describes an evaluation of our method. As the result of our experiment demonstrates, we have found that our method generates a viable series of questions.
1. Introduction It is common in second language learning to answer questions on the contents of passages after listening to and/or reading them. Such questions and answers (QA) in the target language are effective for acquiring practical skills because multiple language skills are required to answer the questions, in particular to grasp the contents of the story and the questions, as well as to compose answers. Many computer assisted language learning systems have been developed [1]. Some are equipped with exercise functions which ask about the contents of sentences. Most, however, use questions prepared beforehand [5, 6, 7]. Thus these systems have the problem that they present questions without considering the learner's level of understanding because the number of prepared questions is limited. In order to solve these problems, we are aiming to realize a QA function which provides adaptive questions on the surface semantics of English stories prepared by authors or learners. To realize the QA function, the following sub-functions are necessary: (1) to understand English sentences and to extract syntactic and semantic information, (2) to generate automatically various kinds of question sentences for presentation to learners who have varying degrees of comprehension, (3) to select adaptive questions from a set of generated question sentences, (4) to analyze learners' answer sentences and to diagnose errors and (5) to offer intelligent help for the correction of errors and the acquisition of correct knowledge by referring to the student models. In earlier studies, we have already implemented the submodules for functions (1), (2), (4) and (5). This paper proposes an adaptive method of selecting questions for function (3). 2. The outline of the QA function Our QA function gives learners questions about the contents of a story. After studying the contents of the story by reading and/or listening, they answer the questions. The aims of our QA are to train for conversation by using multiple skills (reading a story, listening to or reading questions, and composing answers), to give learners a chance to realize their own state of understanding of, for example, vocabulary and grammar, and to practice usage through QA. To reduce the burden of memorizing the content of the story and to allow learners to concentrate on composing sentences from memory, the length of any one passage in a single presentation is set at about 5 or 6 sentences, and QA on the surface meaning of the story is sufficient.
The QA function generates as many questions about the story as possible [3], and then selects suitable and purposive questions. 3. Selecting Adaptive Questions In order to achieve the aims mentioned in the previous chapter, the QA function needs to generate a series of questions according to the following principles instead of blindly giving questions. (1) to give a tailored series of questions to each learner: questions that are too easy/difficult will reduce learner motivation. It is, therefore, desirable to give each learner questions of suitable difficulty. (2) to make learners use as many skills as possible: The question generation module [3] generates four types of questions: a general question generated from one sentence, a special question generated from one sentence, a general question generated from more than one sentence and a special question generated from more than one sentence. Because these types of questions require different language skills on the part of the learner, it is necessary to give learners various types of questions. (3) to give questions following the flow of the story: The QA function gives learners series of questions. It is desirable that each series covers the entire contents of a story instead of asking about the same part of the story. When the QA function gives such a series of questions, it is necessary to consider the order of the questions because a series of questions without consideration of the flow of the story will confuse learners. (4) to avoid similar questions: Giving questions similar to already answered questions will reduce learner motivation. The QA function, therefore, needs to generate a series of questions after considering the history of QA. The principles are classified into two groups: (4) is for avoiding undesirable questions and (1) - (3) are for selecting desirable questions. Our method first filters out undesirable questions in a series by using the following restrictions. • Not selecting questions that have already been displayed. • Selecting no more than two questions pertaining to a sentence when the learner's answer was correct. • Not selecting questions with the same case and object as the previous question. Next, our method tries to select desirable questions by referring to three factors: the difficulty of the questions, the types of the questions and the order of the questions. There is a best value for each selection factor and there is an acceptable range around the best value. It is necessary to select desirable questions by synthetically considering the three selection factors. Therefore, our method assigns a probability of selection to each question according to its desirability and selects one question at random. A way of assigning probability is as follows. (a) the difficulty of questions: Our method tries to select questions with values of difficulty 5 points1 higher/lower than that of the previous question if a learner correctly/incorrectly answered the question. In order to realize such a selection, our method gives high probabilities to questions whose difficulty is around the standard value by referring to the normal distribution. (b) the types of questions: In order to make learners use various skills, our method gives a probability of selection to each question type with the intention of avoiding the same question type as the previous one and selecting all types in a series.
(c) the order of questions: In order to select a series of questions that represent different sections of the story in the same order as that of the narration, our method defines the
1
We have implemented a mechanism to calculate the difficulty of questions which reflects the learner’s state of understanding [4]. In the previous study, we found that the threshold value used for the judgment of significant difference between two questions is 5.
areas of original sentences2 according to a specified number of questions in a series and gives a probability of selection to each sentence in an area. 4. Evaluation In order to confirm whether or not our method generates a good series, we compared a series of questions generated by our method with a series generated by a non-adaptive method. The non-adaptive method selects questions at random under the following three restrictions which can be implemented easily. • to select questions following the flow of the story, • not to select the same question twice, but • to allow up to two questions from the same sentence in a story to be selected when a learner correctly answered such questions. We used three stories from textbooks for Japanese junior high school students. Each story consists of six sentences. We set the number of questions for each series of questions to 4 and generated four series of questions about the contents of each story. We gave the stories and pairs of series generated by these two methods to subjects and asked them which they preferred. The number of subjects was 15. They were graduate and undergraduate students. Each subject compared 12 pairs. Thus there were 180 pairs (12 pairs * 15 subjects) in total. The subjects judged that our method was superior in 119 pairs. Therefore, we have found that our method is significantly better by using the binomial test (p
What's in a Rectangle? (E. de Vries)

Introduction
Nowadays computers are thought of as cognitive tools for learners to construct knowledge by actively building and manipulating, interpreting and negotiating. Three working assumptions found their design []: learning involves construction of one's own representations rather than merely interpreting prefabricated representations, translation and manipulation of multiple representations, and interaction with others to negotiate meaning. In a nutshell, the computer is put forward as a semiotic tool with representational affordances for construction, manipulation and collaboration as key learning activities. In this poster I take the notion of semiotic learning tools to the letter. Semiosis refers to the process of making meaning out of objects, symbols, or anything else perceived in the environment. Semioticians conceive of humans as evolving in many interrelated sign systems, i.e. in a large complex system of sign systems []. Semiotics, as the study of signs, investigates the cultural conventions that shape the relation between signifiers (representing) and what is signified (represented). Learning tools, when qualified as semiotic, supposedly help learners in meaning making by depending on existing, evolving or emerging conventions regarding graphical and linguistic elements on computer screens. The poster presents a tentative analysis for a particularly pervasive graphical element: a rectangle.

About representation
A representation is often defined as something that stands for something else, i.e. internal representations in the human mind stand for objects and relations in the real world []. Learning can be conceptualised as the construction of internal representations on the basis of external ones, and recent research focuses more explicitly on the interplay of internal and external representations []. Semiotics, in addition, takes into account both the nature of the situation and of the entity that is interpreting, cf. one of Peirce's definitions of a sign: something which stands to somebody for something in some respect or capacity []. In this view, the meaning of a representation depends on cultural conventions and on person, task and situation characteristics. The case for semiotics is threefold: meaning making is an important educational goal, domain conventions are often part of learning objectives, and learning entails becoming independent of a particular representation, which entails manipulation of several different representations. Relevant dimensions in education therefore appear to be the degree of domain-specificity and familiarity of representations and the amount of required translation between representations.

Rectangles in computer-based learning tools
Rectangles are everywhere on computer screens, and experienced users easily interpret a rectangle as a label, a cell, a widget, a button or anything else, except maybe when using an unfamiliar program. According to general conventions, a rectangle signifies a label when it encloses other symbols (a road sign with the name of a city), a container when it is supposed to hold something (a 2D projection of a box), a building block when it is a 2D projection of a solid, a location when spatial configuration matters, or a process when it incarnates some transformation with an input and output. The aim of the analysis presented here is to identify emerging conventions in computer tools for learning.

Concept maps and hypermedia construction tools
Concept maps, like semantic networks, depict knowledge in terms of concepts (the nodes) and relations (labelled lines or arrows). In order to construct one, learners need to identify and make explicit concepts and their interrelationships. The same kind of maps, either learner- or teacher-constructed, are also used as organizers of hypermedia material, cf. a web view as a dynamic clickable map. In such semantic organization tools, rectangles signify labels of concepts or content. Although the tools generally use a code for distinguishing concepts from relations, the use of a different graphical element for labelling, e.g. ellipses instead of rectangles, would not change the intended interpretation or meaning of a map in most cases. Thus, the conventions regarding the use of rectangles are not domain-specific and largely follow general cultural conventions.

Modelling and simulation tools
A second category concerns modelling and simulation tools for learning in biology, physics, etc. E.g. Stella uses a highly specific code: a rectangle signifies a stock of something, clouds are resources, and circles are flows or constants. In most modelling tools, however, rectangles signify variables and relations, much like in concept maps. The difference is that modelling tools allow entering a system of equations to represent the underlying mathematical model, an activity that requires translation between representations. Boxer uses rectangles to signify procedures (e.g. "moves") of a certain type (e.g. "Doit") that have a rectangle attached as a label with their name (e.g. "tick"). Boxer also has rounded rectangles for data, but no lines or arrows to represent relations or flows. These examples show highly specialized, yet different, local and relatively unfamiliar conventions. Despite the fact that they concern mathematical models of dynamic systems, these tools do not seem to conform to a common established domain-specific convention.

Collaboration and discussion tools
In Belvedere, rectangles are data (facts, observations) and rounded rectangles are hypotheses; arrows are relations with a colour code, red for supporting (in favour of) and green for invalidating (against) relations. In Drew, rectangles are non-conflictual propositions and squeezed rectangles are conflictual ones; circles are non-conflictual relations and diamonds are conflictual ones; a colour code signals contributions of different participants. Whereas the Belvedere format stresses epistemological stance (hypothesis, data), the Drew format stresses individual contributions and (dis)agreement. Moreover, squeezed rectangles in Drew seem to hinge on the connotation of being wedged or wicked to signify disagreement. Both examples show local conventions, independent of a particular domain and relatively unfamiliar to the learners.

Conclusions
External representations that conform to a formal code in the eyes of the constructor may not necessarily be univocal to the individual that reads and manipulates them in a particular situation. The tentative analysis on rectangles shows that we cannot yet conceptualize semiotic learning tools as being part of a unique semiological system []. Although the mode of operation (visual, graphical) and the domain of validity (learning situations) are similar, both the nature and number of signs (rectangles, arrows, circles, texts) and the type of functioning (presence, absence, simultaneity, location) vary considerably from tool to tool. The presented tools use conventions for rectangles that are little domain-specific, mostly unfamiliar to the learner, and, with the exception of modelling and simulation tools, do not require much translation between representations. Unspecific conventions, i.e. conventions that are not grounded in a domain of expertise, may constitute an advantage, but an issue is whether learning will be robust when the learners' interpretation of graphical elements deviates from the intended meaning. Moreover, although they somewhat conform to general cultural conventions, the tools also introduce highly specialized, unfamiliar representations. In fact, they suggest a strong modelling perspective of knowledge construction as individual mathematical or social activity. A second issue is therefore whether learners will easily adopt them. Finally, the tools require learners to adapt to a given representation and do not particularly invite learners to translate between them (except for modelling tools). The question here is whether learners will be able to effortlessly switch from one representation to another in using more than one tool and, more importantly, whether their knowledge construction will be independent of the particular representation used. An implication of the analysis is that we should be reluctant to qualify newly developed learning tools as semiotic. On the one hand, multiplying representational formats might be a source of confusion, given that users are learners of representational formats as much as they are learners of content. A set of local, unfamiliar, unsanctioned and incoherent representational formats would, in this view, be semiotic obstacles rather than semiotic tools. On the other hand, representational diversity could also remain unnoticed, precisely because humans are thought to evolve in a complex system of multiple sign systems anyway. In this perspective, humans are trained interpreters of and adapters to all sorts of external representations, even in a learning situation. In the latter case, speaking of semiotic learning tools implicitly carries a denial of the pertinence of particular representational formats.

References
[1] Ainsworth, S. The functions of multiple representations. Computers and Education.
[2] Duval, R. Sémiosis et pensée humaine [Semiosis and human thought]. Bern: Peter Lang.
[3] Eco, U. Le signe: histoire et analyse d'un concept [The sign: history and analysis of a concept]. Translated from Italian by J.M. Klinkenberg. Bruxelles: Editions Labor.
[4] Palmer, S.E. Fundamental aspects of cognitive representation. In E. Rosch & B.B. Lloyd (Eds.), Cognition and categorization. Hillsdale, NJ: Lawrence Erlbaum Associates.
[5] Zhang, J. & Norman, D.A. Representations in distributed cognitive tasks. Cognitive Science.
[6] Peirce, C.S. Collected Papers. Cambridge: Harvard University Press. Partially translated by G. Deledalle (Ed.), Charles S. Peirce: Ecrits sur le signe. Paris: Editions du Seuil.
[7] Barthes, R. Eléments de sémiologie [Elements of semiology]. Communications.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
A User Modeling Framework for Exploring Creative Problem-Solving Ability Hao-Chuan WANG1, Tsai-Yen LI2, and Chun-Yen CHANG3 Institute of Information Science, Academia Sinica, Taiwan1 Department of Computer Science, National Chengchi University, Taiwan2 Department of Earth Sciences, National Taiwan Normal University, Taiwan3
[email protected],
[email protected],
[email protected] Abstract. This research proposes a user modeling framework which aims to assess and model users’ creative problem-solving ability from their self-explained ideas for a specific scenario of problem-solving. The proposed framework, User Problem-Solving Ability Modeler (UPSAM), is mainly designed to accommodate to the needs of studying students’ Creative Problem-Solving (CPS) abilities in the area of science education. The use of open-ended essay-question-type instrument and bipartite graph-based modeling technique together provides a potential solution of user model elicitation for CPS. The computational model has several potential applications in educational research and practice, including automated scoring, buggy concepts diagnosis, novel ideas detection, and supporting advanced studies of human creativity.
1. Introduction Problem-solving has consistently been an attractive topic in psychological and educational research for years. It is still a vital research field nowadays, and its role is believed to be much more important than it used to be, in alignment with the trends of putting stronger emphasis on students’ problem-solving process in educational practices. User Modeling (UM) for problem-solving ability is an alluring and long-going research topic. Previous works in the area of Intelligent Tutoring Systems (ITS) have endeavoured substantially to model problem-solving process for well defined problem contexts, such as planning a solution path in proving mathematical theorems or practicing Newtonian physics exercises [3]. However, we think the classical ITS paradigm cannot well describe the process of divergent and convergent thinking in the human Creative Problem-Solving (CPS) tasks [1][5]. In other words, the classical approach lacks the functionality to support advanced educational research on the topic of CPS. In this paper, we propose a user modeling framework, named UPSAM (User Problem Solving Ability Modeler), by exploiting open-ended essay-question-type instrument and bipartite graph-based representation to capture and model the creative perspective of human problem-solving. UPSAM is designed to be flexible and can have several potential advantageous applications, including: 1) offering functionalities to support educational studies on human creativity, such as automated scoring of open-ended instruments for CPS, and 2) detecting students’ alternative conception on a particular problem-solving task for enabling meta-cognitive concerns in building adaptive educational systems. 2. UPSAM: User Problem Solving Ability Modeler
A bird’s eye view of the UPSAM framework is abstractly depicted in Figure 1. The grey box labelled Agent refers to the core software module that implements several functionalities to perform each process of user modeling as described in [4], including: 1) Perceiving the raw data from the user (the process of eliciting user information), 2) Summarizing the raw data as the structured user model (the process of monitoring/modeling), and 3) Making decisions based on the summarized user model (the process of reasoning). Note that the source data for UPSAM are users' free-text responses in natural language toward an open-ended essay-question-type instrument. However, although users' responses are open-ended, they are not entirely unstructured. With the help of a controlled domain vocabulary, which increases the consistency between the users' and the expert's wording, as well as the pair-wise semi-structured nature of the instrument, which helps identify the context of users' answers, it becomes much more tractable to perform the operation of user model summarization from such open-ended answers. [Figure 2: A snapshot of the answer sheet showing the pair-wise relation among ideas and reasons.] Figure 2 depicts the format of the instrument for eliciting user information, which is based on the structure of the CPS test proposed by Wu et al. in [5]. Users are required to express their ideas (cf. the production of divergent thinking in CPS) in the problem-solving context described by the instrument, and then explain/validate each idea with reasons (cf. convergent thinking in CPS). 3. Bipartite Graph-based Model In UPSAM, an important feature to capture users' CPS ability is to structure the domain and user models (see Figure 1) as bipartite graphs. Actually, a domain model is simply a special case of user model summarized from domain experts with a different building process. Domain models are now authored by human experts manually, while user models are built by UPSAM automatically. Therefore, the fundamental formalism of the domain and user models is identical. One of the most important features in CPS is the relation between divergent thinking and convergent thinking. The bipartite graph in graph theory is considered appropriate to represent this feature. A bipartite graph is one whose vertex set can be partitioned into two disjoint subsets such that the two ends of each edge are from different subsets [2]. In this case, given a set of ideas A = {a1, a2, …, an} and a set of reasons B = {b1, b2, …, bm}, the domain model can be represented as an undirected bipartite graph G = (V, E) where V = A ∪ B and
A ∩ B = ∅. The connections between ideas and reasons are represented as E = {eij}, and each single edge eij represents a link between idea ai and reason bj. Different ideas, reasons, and combinations of the (idea, reason) pairs should be given different scores indicating the quality of answers. The scoring functions are assigned to A, B, and E, respectively:
Sc = {good answer, regular, no credit}, fA : A → Sc, fB : B → Sc, and fE : E → Sc,
where Sc denotes the range of these scoring functions, and each ordinal value (e.g. “regular”) is connected to a corresponding numeric value. Then the total score of a model G = (A ∪ B, E) can be computed as the weighted sum of the individual scores:
ftotal(G) = (wA·fA(A) + wB·fB(B) + wE·fE(E)) / (wA + wB + wE),
where wA, wB, and wE are weighting coefficients that can be tuned according to the needs of each application. Therefore, the score for a user U can be reasonably defined as the ratio of the user model's (GU) total score to the domain model's (GD) total score. That is, Score(U) = ftotal(GU) / ftotal(GD). An automated scorer for grading semi-structured responses can then be realized accordingly. Moreover, a fine-grained analysis of users' cognitive status is possible by considering the difference between the domain and user models. The Diff Model representing the difference is defined as Gdiff = (GU ∪ GD) − (GU ∩ GD). Its properties and applications deserve further exploration. The process of building the bipartite graph-based user models from users' answers is computationally tenable. The kernel idea is to employ techniques of Information Retrieval (IR) to identify the similarity between users' open-ended entries and the descriptions associated with each vertex in the domain model. As mentioned in Section 2, the incorporation of a controlled vocabulary and the structure of the instrument are considered helpful to the process. A prototypical automated user modeling and scoring system has been implemented, and more details will be reported soon.
4. Conclusion
In this paper, we have briefly described a user modeling framework for CPS ability, UPSAM. Empirical evaluations, full-fledged details, and applications of the framework are our current and future works. We also expect that the computational model can be of contribution to the study of human creativity in the long run.
References
[1] Basadur, M. (1995) Optimal Ideation-Evaluation Ratios. Creativity Research Journal, Vol. 8, No. 1, pp. 63-75.
[2] Bondy, J.A., Murty, U.S.R. (1976) Graph Theory with Applications, American Elsevier, New York.
[3] Conati, C., Gertner, A.S., VanLehn, K., and Druzdzel, M.J. (1997) On-Line Student Modeling for Coached Problem Solving Using Bayesian Networks. Proceedings of the 6th International Conference on User Modeling, Italy.
[4] Kay, J. (2001) User Modeling for Adaptation. User Interfaces for All: Concepts, Methods, and Tools, Lawrence Erlbaum Associates, pp. 271-294.
[5] Wu, C-L., Chang, C-Y. (2002) Exploring the Interrelationship Between Tenth-Graders' Problem-Solving Abilities and Their Prior Knowledge and Reasoning Skills in Earth Science. Chinese Journal of Science Education, Vol. 10, No. 2, pp. 135-156.
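To make the Section 3 scoring model concrete, the sketch below computes ftotal and Score(U) for toy graphs. It is not the authors' implementation: the numeric values for the ordinal scale (good answer = 2, regular = 1, no credit = 0), the equal weights, and the reading of fA(A) as the sum of per-vertex scores are all assumptions made only for this example.

```python
# Sketch of the bipartite scoring model (illustrative assumptions only).

SCORE_VALUES = {"good answer": 2, "regular": 1, "no credit": 0}

def f_total(graph, w_a=1.0, w_b=1.0, w_e=1.0):
    """graph: dict with 'ideas', 'reasons' (vertex -> ordinal label) and
    'edges' ((idea, reason) -> ordinal label); returns the weighted total."""
    f_a = sum(SCORE_VALUES[v] for v in graph["ideas"].values())
    f_b = sum(SCORE_VALUES[v] for v in graph["reasons"].values())
    f_e = sum(SCORE_VALUES[v] for v in graph["edges"].values())
    return (w_a * f_a + w_b * f_b + w_e * f_e) / (w_a + w_b + w_e)

def score_user(user_graph, domain_graph):
    """Score(U) = f_total(G_U) / f_total(G_D)."""
    return f_total(user_graph) / f_total(domain_graph)

if __name__ == "__main__":
    domain = {   # hypothetical expert (domain) model
        "ideas": {"a1": "good answer", "a2": "good answer"},
        "reasons": {"b1": "good answer", "b2": "regular"},
        "edges": {("a1", "b1"): "good answer", ("a2", "b2"): "regular"},
    }
    user = {     # hypothetical user model extracted from one answer sheet
        "ideas": {"a1": "good answer"},
        "reasons": {"b1": "regular"},
        "edges": {("a1", "b1"): "regular"},
    }
    print(f"Score(U) = {score_user(user, domain):.2f}")
```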
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Adult Learner Perceptions of Affective Agents: Experimental Data and Phenomenological Observations
Daniel WARREN, E SHEN, Sanghoon PARK, Amy L. BAYLOR (Director, RITL, http://ritl.fsu.edu), and Roberto PEREZ
Instructional Systems Program, RITL – Affective Computing, Florida State University
Abstract. This paper describes a two-part study of animated affective agents that varied by affective state (positive or evasive) and motivational support (present or absent). In the first study, all four conditions significantly improved learning; however, only three conditions significantly improved math self-efficacy, the exception being the animated agent with evasive emotion and no motivational support. To help in interpreting these unexpected results, the second study used a phenomenological approach to gain an understanding of learner perceptions, emotions, interaction patterns, and expectations regarding the roles of agent affective state and motivational support during the learning process. From the qualitative data emerged three overall themes important to learners during the learning process: learner perceptions of the agent, learner perceptions of self, and learner-agent social interaction. This paper describes the results of the phenomenological study and discusses the findings with recommendations for future research.
1. Introduction Animated agents are graphical interfaces that are capable of using verbal and non-verbal modes of communication to interact with users in computer-based environments. These agents generally present themselves to users as believable characters, who implement a primitive or aggregate cognitive function by acting as mediators among people and programs, or by performing the role of an intelligent assistant [1]. In other words, they simulate a human relationship by doing something that another person could otherwise do for that user [2]. There has been extensive research showing that learners in agent-based environments show deeper learning and higher motivation [3]. A recent study [4], in which agents monitored and evaluated the timing and implementation of teaching interventions, indicated that agent role and agent voice and animation had a positive effect on learning, motivation, and self-efficacy. Yet, there are few studies which focus on the cognitive function of the agent in the learning environment [5], or which implement a systematic examination of learner motivation, perceived agent values, and self-efficacy. The focus of this study is to explore how users perceive emotionally evasive and unmotivated agents, and to try to uncover what perceptions and alternative strategies users may develop to deal with this kind of agent. 2. Experimental Method Sixty-seven General Education Development students in a community college in the southeastern United States participated in this study. Students were 52% male, with 17.9% Caucasians, 71.6% African-Americans, and 13.5% of other ethnicities, with an average age of 22.3 years (SD = 8.75). There were four agent conditions: 1) Positive affective state + motivational support; 2) Evasive affective state + motivational support; 3) Positive affective state only; 4) Evasive affective state only. Students were randomly assigned to one of the agent conditions, and they learned to solve percentage word problems. Before and after the task, students' math anxiety level and math
self-efficacy were measured. The post-test also measured perceived agent value, instructional support, and learning. 3. Findings Results indicated that students who worked with the positive + motivational support agent significantly enhanced their self-efficacy from prior (M = 2.43, SD = 1.22) to following the intervention (M = 3.79, SD = 1.37, p < .001). [Fig. 1: the animated agent used in the study.] Similar improvement was found for the agent with positive affective state only (M = 2.42, SD = .96 vs. M = 3.84, SD = 1.43, p < .001) and for the agent with evasive affective state + motivational support (M = 3.06, SD = 1.53 vs. M = 4.13, SD = 1.03, p < .001). Additionally, students perceived the agent with motivational support as significantly more human-like (M = 3.83, SD = 1.02) and engaging (M = 4.03, SD = 1.09) than the agent without motivational support (M = 3.33, SD = 1.02; M = 3.65, SD = .92). As expected, the agent with evasive affective state and no motivational support did not lead to an improvement of student self-efficacy or to a perception of the agent as offering good instructional support. However, across all conditions, students performed significantly better on the learning measure than prior to using the program. In other words, students who interacted with an emotionally evasive, un-motivational agent still improved their learning (i.e., “in spite of” this agent). This result was intriguing enough to motivate the second part of the study, where students were observed and interviewed about their interactions with an agent that displayed evasive emotions and provided no motivational support. The focus of this part, then, was on understanding those interactions better, as well as getting students' feedback to improve the agent. 4. Observational Method The phenomenological follow-up study included six students enrolled in an Adult Education program at the same southeastern United States community college. Participants were selected using intensity sampling to identify individuals willing to express opinions and describe their experiences. Data were collected using direct observations and interviews. During the initial observation phase, participants navigated through a computer-based math learning module and interacted with a pedagogical agent that displayed evasive emotion without motivational support. Participants were asked at specific times to describe their perception of the agent's emotional expressions. Researchers observed participants from a control booth through one-way windows and took field notes on participants' emotional expressions. During the follow-up interview, participants viewed digitally cued segments of their interactions with the agent, and were asked to describe their emotional expressions, feelings, and reactions at the specific time in the video recording. 4.3 Coding the Data Coding the data involved looking for meaningful patterns and themes that aligned with the purposes and the focus of the study. Interview data were digitized and transcribed, then imported into NVivo™ software for subsequent data coding and analysis. 4.4 Validation and Triangulation Process Triangulation of findings involved comparing field notes from observations, interviews, and survey responses; using different data collection methods; using different sources; and using perspectives from different analysts to review the data, which together lent further credibility to the findings. 5. Findings From iterative and immersive data analyses emerged themes, each of which is discussed below.
Learner Perception of the Agent. This theme refers to learners’ reaction toward the agent’s: emotion, facial expression, gaze, image, voice, and initial reaction. Responses such as “it was strange,” “what’s going on,” and “funny looking” characterize the initial reactions that students had toward the agent. Categories within this theme contained two sub-categories: “learner’s assessment” (of the agent) and “learner’s recommendation” (to improve the agent), both in regard to the agent’s emotional expressions, facial expressions, and tone of voice. Learner Perception of Self. This theme refers to learner: nervousness, anxiety, confusion, frustration, and confidence while interacting with the agent. Two categories not related to agent interactions but included in this theme were participants’ emotional experience when exposed to timed questions, and learners’ assessment of their prior content knowledge. Learner-Agent Social Interaction. This theme refers to the agent’s: feedback, overall nature and manner, and support and encouragement. Other emergent categories include: descriptions of possible agent social interaction interface options, favorite teacher characteristics, and descriptive comparisons of the agent versus a face-to-face teacher, and the agent’s voice versus the screen text. 7. Conclusions Participant responses imply that benefits of the agent depended on the learner and context characteristics. Participants seemed to perceive that having the agent present and interacting with them could have afforded the possibility for providing support for their learning, but that the specific instructional and support strategies with this particular agent did not always do so. Participant suggestions in terms of agent voice quality, facial expressions, eye contact, gestures, and emotional responses can be used to improve the interface. These improvements also apply to learner’s expectations for social interactions that do not distract from the learning task. Participant responses also suggest that a more responsive agent in terms of the variety of learners’ instructional needs would facilitate better learning experiences, and lead to less frustration and greater satisfaction. Participants expressed similar sentiments in terms of the agent’s ability to provide more positive and reinforcing feedback and support, rather than simply saying “correct” or “incorrect,” saying instead “good job” or “good try, but next time try better.” Although these results did not provide enough data to account for student gains in learning under unfavorable conditions (e.g., an agent with evasive emotional states), the study provided an insight into how students’ emotions and perceptions developed in their interaction with an agent. At the same time, the experimental part of the study confirmed previous findings as to the benefits of motivational support and positive emotion displayed by an animated agent. Future research can be carried out on affect and how different aspects of the agent interact to affect the user. 8. Acknowledgements This work was supported by the National Science Foundation, Grant IIS-0218692. References 1.Bradshaw, J.M. Software agents. in Bradshaw, J.M. ed. An introduction to intelligent agents, MIT Press, Menlo Park, CA, 1997, 3-46. 2.Seiker, T. Coach: A teaching agent that learns. Communication of the ACM, 37 (7). 92-99. 3.Moreno, R., Mayer, R.E. and Lester, J.C., Life-Like Pedagogical Agents in Constructivist Multimedia Environments: Cognitive Consequences of their Interaction. 
in World Conference on Educational Multimedia, Hypermedia, and Telecommunication (ED-MEDIA), (Montreal, 2000). 4.Baylor, A.L. Permutations of control: cognitive considerations for agent-based learning environments. Journal of interactive learning research, 12 (4). 403-425. 5.Baylor, A.L. The effect of agent role on learning, motivation, and perceived agent value. Journal of Educational Computing Research.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Factors Influencing Effectiveness in Automated Essay Scoring with LSA
Fridolin Wild, Christina Stahl, Gerald Stermsek, Yoseba Penya, Gustaf Neumann
Department of Information Systems and New Media, Vienna University of Economics and Business Administration (WU Wien), Augasse 2-6, A-1090 Vienna, Austria
{firstname.lastname}@wu-wien.ac.at
Abstract. Automated essay scoring by means of latent semantic analysis (LSA) has recently been subject to increasing interest. Although previous authors have achieved grade ranges similar to those awarded by humans, it is still not clear which parameters improve or decrease the effectiveness of LSA, and how. This paper presents an analysis of the effects of these parameters, such as text pre-processing, weighting, singular value dimensionality and type of similarity measure, and benchmarks this effectiveness by comparing machine-assigned with human-assigned scores in a real-world case. We show that each of the identified factors significantly influences the quality of automated essay scoring and that the factors are not independent of each other.
Introduction
Computer-assisted assessment in education has a long tradition. While early experiments on grading free-text responses were mostly syntactical in nature, research today focuses on emulating a human, semantic understanding (cf. [12]). In this respect, Landauer et al. [1] found evidence that a method they named 'latent semantic analysis' (LSA) produces grade ranges similar to those awarded by human graders. Several stages in the process leading from raw input documents to the machine-assigned scores allow for improvement. Contradictory claims, however, have been made about how to optimise these influencing factors (e.g. [2] vs. [9]). In this contribution we describe an experiment on the optimisation of the influencing factors driving the automated scoring of free-text answers with LSA. By testing automated essay scoring for the German language and through the use of a small text corpus we extend previous work in this field (e.g. by [2, 3]). Whereas a detailed description of LSA in general can be found elsewhere (e.g. [1]), the following sections give an overview of the methodology, hypotheses and results of our experiments.
1. Methodology
Formally, an experiment tries to explore a cause-and-effect relationship in which causes can be manipulated to produce different effects [4]. Accordingly, we developed a software application that allowed us to alter the settings of the influencing factors, and adopted an experimental approach. This enabled us to compare machine-assigned scores (our dependent variables) to the human-assessed scores by measuring their correlation, a testing procedure commonly used in the essay-scoring literature (e.g. in [5], [6], [7]). By changing the influencing factors (our independent variables) one at a time, ceteris paribus, we investigated their influence on the score correlation. The corpus of the experiment consisted of students' free-text answers to the same marketing exam question. The 43 responses were pre-graded by a human assessor (a teacher) with points from 0 to 5, assuming that every point was of the same value and thus, the scale was
equidistant in its value representation. The average length of the essays was 56.4 words, a value at the bottom of the recommended essay length [8]. From the essays that received the highest scores from the human evaluator, we chose three so-called 'golden essays'. These golden essays were used to compute the correlation for the remaining essays, assuming that a high correlation between a test essay and the mean of the golden essays entails a high score for the test essay [1]. The SVD co-occurrence matrix was built from the three golden essays and a marketing glossary consisting of 302 definitions from the domain of the exam. Every glossary entry was a single file with an average length of 56.1 words, and the glossary was part of the preparation materials for the exam.
2. Hypothesis and Test Design
We conducted several tests addressing four aspects that have proven to show great influence on the functionality and effectiveness of LSA [1, 2]:
1. Document pre-processing: With the elimination of stop-words and stemming in mind, we used a stop-word list with 373 German terms and Porter's Snowball stemmer [11]. We assessed the effects of pre-processing by testing the corpus with and without stemming, with and without stop-word removal, and with the combination of stemming and stop-word removal. For the succeeding tests, we used the raw matrix as default.
2. Weighting schemes: Several weighting schemes have been tested in the past (e.g. in [3, 9]), yielding best results for the logarithm (local weighting) and the entropy (global weighting). Assuming that these results also apply to the German language and to the automated scoring of essays, we combined three local (raw term frequency, logarithm, and binary) and four global (none, normalization, inverse document frequency, and entropy) weightings. As default we used the raw term frequency and no global weighting.
3. Choice of dimensionality: The purpose of reducing the original term-document matrix is to minimize noise and variability in word usage [10]. In order to determine the number of factors for the reduced matrix, we considered the following alternatives:
a. Percentage of cumulated singular values: Using the vector of singular values, we can sum up singular values until we reach a specific value; we suggest using 50%, 40% and 30% of the cumulated singular values.
b. Absolute value of cumulated singular values equals number of documents: Here the sum of the first k singular values equals the number of documents in the corpus.
c. Percentage of the number of terms: Alternatively, the number of factors can be determined as a fraction of the number of terms. Typical fractions are 1/30 or 1/50.
d. Fixed number of dimensions: A less sophisticated but common approach is to use a fixed number of singular values, for instance 10. For testing the other influencing factors, we chose 10 as the default value.
4. Similarity measures: Finally, we tested three similarity measures: the Pearson correlation, Spearman's rho and the cosine. As default we used Spearman's rho.
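A minimal sketch, not the authors' implementation, of the scoring pipeline described in sections 1 and 2: it assumes NumPy, SciPy and scikit-learn are available, uses the defaults named above (raw term frequencies, no global weighting, 10 singular values, Spearman's rho), and the corpus arguments are placeholders for the glossary entries, the golden essays and a test essay.

import numpy as np
from scipy.stats import spearmanr
from sklearn.feature_extraction.text import CountVectorizer

def lsa_score(glossary_texts, golden_texts, test_text, k=10):
    """Spearman correlation of a test essay with the mean of the golden essays
    in the k-dimensional latent space."""
    corpus = list(glossary_texts) + list(golden_texts)
    vectorizer = CountVectorizer()                    # raw term frequency, no global weighting
    X = vectorizer.fit_transform(corpus).T.toarray()  # term-by-document matrix
    U, s, _ = np.linalg.svd(X, full_matrices=False)   # singular value decomposition
    Uk, sk = U[:, :k], s[:k]                          # keep the first k singular dimensions

    def fold_in(text):                                # project a document into the latent space
        d = vectorizer.transform([text]).toarray().ravel()
        return d @ Uk / sk

    golden_mean = np.mean([fold_in(t) for t in golden_texts], axis=0)
    return spearmanr(fold_in(test_text), golden_mean).correlation

Each influencing factor studied here corresponds to one knob in such a sketch: the vectorizer (pre-processing and weighting), k (dimensionality), and the final similarity function.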
3. Reporting Results
In the pre-processing stage, stop-word removal alone (Spearman's rho = .282) and the combination of stopping and stemming (r = .304) correlated significantly with the human scores (p < .05). Stemming alone, however, reduced the scoring correlations. For the weighting schemes, the raw term frequency (tf) combined with the inverse document frequency (idf) (r = .474), as well as the logarithm (log) combined with idf (r = .392), proved best (p < .01). Similarly, the binary term frequency (bintf) in combination with idf (r = .360) showed significant results at a level of p < .05. Looking at the local schemes separately, we found that none of the schemes alone improved results significantly. For the global schemes, idf yielded outstanding results. Surprisingly, neither of the two schemes proposed in other literature (i.e. the logarithm as the local scheme and the entropy as the global one) returned the expected sound results. In fact, in our case they both reduced the performance of LSA. In our dimensionality tests, the only procedure yielding significant results was the use of a certain percentage of the cumulated singular values. At a level of p < .01 we obtained a correlation with the human grades of r = .436 for a share of 50%, r = .448 for 40% and r = .407 for 30%. The other methods failed to show significant influence. Finally, Spearman's rho obtained the best results when comparing the influence of different similarity measures on the effectiveness of LSA. It was the only measure producing a correlation with the human scores at a level of p < .01.
4. Conclusions and Future Work
Our results give evidence that, for the real-world case we tested, the identified parameters influence the correlation of the machine-assigned with the human scores. However, several recommendations on the adjustment of these parameters proposed in the literature do not apply in our case. We suspect that their adjustment strongly depends on the document corpus used as text base and on the essays to be assessed. Nevertheless, significant correlations between machine and human scores were found, which indicates that LSA can be used to automatically create valuable feedback on learning success and knowledge acquisition. Based on these first results, we intend to test the dependency of the parameter settings on each other for all possible combinations. Additionally, the stability of the results within the same discipline and in different contexts needs to be examined further. Moreover, we intend to investigate scoring of essays not against best-practice texts, but against single aspects, as this would allow us to generate more detailed feedback on the content of essays.
References
[1] Landauer, T., Foltz, P., Laham, D. (1998): Introduction to Latent Semantic Analysis. In: Discourse Processes, 25, pp. 259-284.
[2] Nakov, P., Valchanova, E., Angelova, G. (2003): Towards Deeper Understanding of the LSA Performance. In: Recent Advances in Natural Language Processing – RANLP'2003, pp. 311-318.
[3] Nakov, P., Popova, A., Mateev, P. (2001): Weight functions impact on LSA performance. In: Recent Advances in Natural Language Processing – RANLP'2001, Tzigov Chark, Bulgaria, pp. 187-193.
[4] Picciano, A. (2004): Educational Research Primer. Continuum, London.
[5] Foltz, P. (1996): Latent semantic analysis for text-based research. In: Behavior Research Methods, Instruments, and Computers, 28 (2), pp. 197-202.
[6] Foltz, P., Laham, D., Landauer, T. (1999): Automated Essay Scoring: Applications to Educational Technology. In: Proceedings of EdMedia 1999.
[7] Lemaire, B., Dessus, P. (2001): A system to assess the semantic content of student essays. In: Journal of Educational Computing Research, 24 (3), pp. 303-320.
[8] Rehder, B., Schreiner, M., Laham, D., Wolfe, M., Landauer, T., Kintsch, W. (1998): Using Latent Semantic Analysis to assess knowledge: Some technical considerations. In: Discourse Processes, 25, pp. 337-354.
[9] Dumais, S. (1990): Enhancing Performance in Latent Semantic Indexing (LSI) Retrieval. Technical Report, Bellcore.
[10] Berry, M., Dumais, S., O'Brien, G. (1995): Using Linear Algebra for Intelligent Information Retrieval. In: SIAM Review, Vol. 37 (4), pp. 573-595.
[11] Porter, M.F. (1980): An algorithm for suffix stripping. In: Program, 14 (3), pp. 130-137.
[12] Hearst, M. (2000): The debate on automated essay grading. In: IEEE Intelligent Systems, 15 (5), pp. 22-37.
Young Researchers Track
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Argumentation-Based CSCL: How students solve controversy and relate argumentative knowledge Marije VAN AMELSVOORT & Lisette MUNNEKE Utrecht University, Heidelberglaan 2, 3584CS Utrecht, The Netherlands e-mail:
[email protected];
[email protected]
Our study focuses on argumentative diagrams in computer-supported collaborative argumentation-based learning. Collaboration and argumentation are crucial factors in a learning process, since they force learners to make their thoughts explicit, and to listen and react to the other person's ideas. Since most people only have knowledge about part of a certain domain, argumentative interaction can help them to collaboratively acquire, refine, and restructure knowledge in order to get a broader and deeper understanding of that domain. However, argumentative interaction is not easy. People especially have difficulties with handling controversy in arguments, and with exploring their (counter-)partner's ideas. An argumentative diagram might solve the above-mentioned problems by making controversy explicit, or by focusing on relations between arguments. Thirty pairs of students discussed two cases on the topic of Genetically Modified Organisms via the computer. They communicated via chat. One third of the pairs constructed a diagram using argumentative labels to describe the boxes in the diagram. One third of the pairs constructed a diagram using argumentative labels to describe the arrows between the boxes in the diagram. The third group was asked to collaboratively write a text without using labels. We hypothesized that students who have to explicitly label arguments in a diagram will have a deeper discussion than students who do not use labels, because it helps them to focus on the deepening activities of counter-argumentation and rebuttal, and to realize what kind of argumentation they haven't used yet. Students who have to label relations will address controversy more than students in the other two groups, because the labeling is a visual display of the controversy and might 'force' students to solve these kinds of contradictions in collaboration. At this moment, eight pairs have been analyzed on exploration of the space of debate and labeling of their diagrams. These preliminary results show that students hardly ever discuss controversy and relations in chat, nor talk about the labeling of the diagram. They are mainly focused on finishing the diagram or text, without explicitly exploring the space of debate together. They seem to avoid controversy, probably because they value their social relation, and because they want to finish the task quickly and easily. Students mainly explore the space of debate in the diagrams. The diagrams in the label-arrow condition are bigger than the diagrams in the label-box condition. There was no difference between conditions in the amount of counter-arguing or rebutting arguments in the diagram. Most students indicated there was no controversy in their discussion with their partner. However, when looking at the diagrams, many controversies can be found that are not related or discussed. We wonder whether students do not see controversy or whether they do not feel the need to solve it. Further results will be discussed at our presentation.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Generating Reports of Graphical Modelling Processes for Authoring and Presentation
Lars BOLLEN
University of Duisburg-Essen, Faculty of Engineering, Institute for Computer Science and Interactive Systems, 47048 Duisburg, Germany
In the process of computer-supported modelling, the learner interacts with computational objects, manipulates them and thereby makes his thoughts explicit. In this context, the phrase "objects to think/work with" has been introduced in [1], meaning that the exploration, manipulation and creation of artefacts help in establishing understanding. Nevertheless, when a learner finishes a modelling task within a modelling environment like Cool Modes [2], usually only a result is stored. The process of creating and exploring a model is compressed into a single artefact. Information about the process of the learner's work, about different phases, about the design rationale, about alternative solutions and about collaboration gets lost when only a single artefact is kept as the output of a modelling process. Knowledge about these issues is helpful for various target groups and purposes: e.g., the learner could use this information for self-reflection, peer authoring and for presenting own results. Teachers could be supported in assessment, authoring and in finding typical problems in students' solutions. Researchers in the fields of AIED and CSCL could use the additional information for interpreting and understanding learners' actions. Approaches that take into account processual aspects of learning and modelling can be found in [3, 4]. The problem described above can be addressed and solved by generating reports. Reports, in the sense of this approach, are summaries of states and action traces from modelling processes. A prototypical implementation of a report generation tool is already available. In this implementation, information about states and action traces from modelling processes is collected, analysed (using domain knowledge) and represented automatically in a graph-based visualisation, in which different nodes represent different states of the modelling process. Edges represent the actions that led to these states, providing information for analysing and interpreting modelling processes. Combining these automatically generated, graph-based representations with a mechanism for feeding states back into the learning support environment provides for authoring and presentation (playing back previously recorded material), monitoring and assessment (observing collected material) and research (using advanced analysis methods to inspect specific features of modelling and collaboration).
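The following Python fragment is a rough sketch of the data structure behind such reports; the action strings and state snapshots are hypothetical and do not reflect the Cool Modes log format. Each modelling action is recorded together with the state it produces, and the trace can be replayed.

from dataclasses import dataclass, field

@dataclass
class ProcessReport:
    states: list = field(default_factory=list)   # snapshots of the model, one per node
    edges: list = field(default_factory=list)    # (from_state, action, to_state) triples

    def record(self, action, new_state):
        """Append the state reached by `action`; states are sets of model elements."""
        prev = len(self.states) - 1
        self.states.append(new_state)
        if prev >= 0:
            self.edges.append((prev, action, prev + 1))

    def replay(self):
        """Yield the recorded actions in order, e.g. for playing back a session."""
        for _, action, dst in self.edges:
            yield action, self.states[dst]

report = ProcessReport()
report.record("create node A", frozenset({"A"}))
report.record("create node B", frozenset({"A", "B"}))
report.record("link A and B", frozenset({"A", "B", "A-B"}))
for action, state in report.replay():
    print(action, sorted(state))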
References [1] Harel, I. and Papert, S. (eds.) (1991): Constructionism. Ablex Publishing. Norwood, NJ. [2] Pinkwart, N. (2003). A Plug-In Architecture for Graph Based Collaborative Modelling Systems. In Proc. of the 11th Conference on Artificial Intelligence in Education (AIED 2003), Amsterdam, IOS Press. [3] Müller, R., Ottmann, T. (2000). The "Authoring on the Fly" System for Automated Recording and Replay of (Tele)presentations. Special Issue of Multimedia Systems Journal, Vol. 8, No. 3, ACM/Springer. [4] Koedinger, K. R., Aleven, V., Heffernan, N., McLaren, B. M., and Hockenberry, M. (2004). Opening the Door to Non-Programmers: Authoring Intelligent Tutor Behavior by Demonstration. In Proceedings of 7th International Conference on Intelligent Tutoring Systems, ITS 2004, Maceio, Brazil.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Towards An Intelligent Tool To Foster Collaboration In Distributed Pair Programming Edgar ACOSTA CHAPARRO IDEAS Lab, Dept. Informatics, University of Sussex BN1 9QH, UK
[email protected]
Pair programming is a novel, well-accredited approach to teaching programming. In pair programming (as in any other collaborative learning situation) there is a need for tools that support peer collaboration. Moreover, we must bear in mind the strong movement towards distributed learning technologies and how this movement could influence the design of such tools [1]. Indeed, there have been some attempts to implement tools to support distributed pair programming [2]. However, none of them has been informed by pedagogical theories. To support the design and implementation of an intelligent tool in this work, the Task Sharing Framework (TSF) developed by Pearce et al. [3] is being explored. The aim of this doctoral research is to investigate the suitability of the TSF [3] for the design and implementation of a prototype of an intelligent tool that monitors and enhances the collaboration between distributed pair programmers, facilitating their efforts at learning programming. In particular, the tool will search for signs of collaboration difficulties and breakdowns as pair programmers solve exercises in object-oriented programming. The TSF will support the sharing of collaborative tasks between users. Each peer will have their own identical yet independent copy of the task that, by default, only they themselves can manipulate. The visual representation of agreement and disagreement has the potential to constructively mediate the resolution of collaborative disputes [3]. Programming is a heavy cognitive task, and with the TSF each student will have two representations to look at; this might impact students' cognitive effort. The author is interested in exploring the learning gains and the peer collaboration with different versions of the intelligent tool using the TSF. Each participant will do a pre-test to evaluate her level of expertise in object-oriented programming. The learning gain and the collaboration will be measured by comparing the results from pre- and post-tests, and by analysing verbalizations and performance on the task. If the intelligent tool can be established and the TSF proves to be effective, it will support the implementation of intelligent tools that extend the benefits of pair programming to a large population. Progress in this would also be of major significance in the area of intelligent learning systems used for teaching programming.
References
1. Fjuk, A., Computer Support for Distributed Collaborative Learning: Exploring a Complex Problem Area. Department of Informatics, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, 1998, p. 256.
2. Stotts, P.D. and Williams, L., A Video-Enhanced Environment for Distributed Extreme Programming. Department of Computer Science, University of North Carolina, 2002.
3. Pearce, D., et al., The task sharing framework for collaboration and meta-collaboration. In (in press) Proceedings of the 12th International Conference on Artificial Intelligence in Education, Amsterdam, Netherlands, 2005.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Online Discussion Processes: How do earlier messages affect evaluations, knowledge contents, social cues and responsiveness of current message? Gaowei Chen Department of Educational Psychology, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
This study examined how earlier messages affected four properties of the current message: evaluations, knowledge contents, social cues and responsiveness. If earlier messages help to explain these features in the current one, we can better understand the interrelationship of online messages, and thereby take measures to improve online discussion processes. Most current studies have focused on dependent forums, which are related to specific courses, to do content analysis of online discussion. This study extended this line of research by examining how online discussion messages affect one another in an independent academic discussion forum. I selected 7 hot topics from the math board, an academic discussion forum of the Bulletin Board System (BBS) website of Peking University (http://bbs.pku.edu.cn). This independent forum is free to enter or leave, with few requirements or limitations on participants' activities. In total there were 131 messages from 47 participants responding to the 7 topics. After coding the data, I ran regressions at the message level. Structural equation modeling (SEM) was also used to test direct and indirect effects in the analyses. Results showed that disagreement and contribution in the previous message positively predicted disagreement and personal feeling in the current message. The visit number of the previous poster was likely to increase contribution in the current message, while personal feeling in the message two turns prior tended to weaken it. Disagreement in the current message raised the likelihood of it getting a future response. Moreover, replying to a previous on-topic message also helped the current message to draw a later response. Together, these results suggest that evaluations, knowledge contents, social cues and person status in earlier messages may influence the properties of the current message during online discussion processes. Further studies are necessary before making firm recommendations. However, the results of this study suggest that designers and teachers may improve the quality of online academic discussion by taking the following advice. Attach more earlier messages to the current message. The branch structure of online discussion makes it difficult for the current poster to track earlier messages. As shown in the results and discussion, only lag 1 and lag 2 messages, which were displayed together, can affect the current message. To help participants understand the discussion thread more easily, designers can attach more earlier messages to the current post, e.g., adding lag 3 and lag 4 messages. Some BBS websites have adopted this kind of discussion style, e.g., the "unknown space" BBS website (http://www.mitbbs.com). Carry on controversial discussion in the online forum. As shown in this study, participants were likely to perform and continue controversial interactions in online discussion. This implies that teachers can move some controversial topics, e.g., new theories or problems without certain answers, to the online forum for discussion. Under such topics, participants can easily take different sides and argue by posting their personal ideas.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
PECA: Pedagogical Embodied Conversational Agents in Mixed Reality Learning Environments Jayfus T. Doswell George Mason University
The Pedagogical Embodied Conversational Agent (PECA) is an "artificially intelligent", animated 3D computer-graphics character that teaches from computer-simulated environments and naturally interacts with human end-users. What distinguishes a PECA from the traditional virtual instructor or pedagogical agent is the PECA's ability to intelligently use its 3D graphical form and multimodal perceptual ability. In so doing, the PECA has capabilities to communicate with human end-users and demonstrate a wide variety of concepts from within interactive mixed reality environments. More importantly, the PECA uses this intuitive form of communication to deliver personalized instruction for enhancing human learning performance by applying its underlying knowledge of empirically evaluated pedagogical techniques and learning theories. A PECA combines this "art and science" of instruction with knowledge of domain-based facts, culture, and an individual's learning strengths in order to facilitate a more personal human learning experience and to improve its own instructional capabilities. The challenge, however, is engineering a realistically behaving 3D character for human interaction in computer-simulated environments, with capabilities to provide tailored instruction based on well-defined pedagogical rules and knowledge of human learning capabilities across cultures. Neither the PECA's advanced human-computer interface capabilities nor its ability to interact within mixed reality environments is useful without its knowledge of the best instructional methods for improving human learning. A formal instructional method is called pedagogy, defined as the art and science of teaching. PECA pedagogy may include scaffolding techniques to guide learners when necessary; multi-sensory techniques so students use more than one sense while learning; and multi-cultural awareness, where awareness of the individual's social norms potentially influences learning outcomes, among other instructional techniques. The PECA also tailors a particular instructional method to, at minimum, weighted learning strengths, including: visual learning (seeing what you learn); auditory learning (hearing spoken messages or sounds to facilitate learning); kinesthetic learning (sensing the position and movement of what is being learned); and tactile learning (where learning involves touch). These pedagogical and learning styles may be structured and decomposed, without losing their inherent value, into a 'codified' set of computational rules expressed, naturally, by the PECA. This paper presents a novel approach to building PECAs for use in mixed reality environments and addresses key challenges researchers face in integrating pedagogy and learning-theory knowledge in PECA systems.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Observational Learning from Social Model Agents: Examining the Inherent Processes
Suzanne J. EBBERS and Amy L. BAYLOR
Centre for Research in Interactive Technologies for Learning (RITL), Learning Systems Institute, Florida State University, Tallahassee, FL 32306
Using computers as social information conveyors has drawn widespread attention from the research world. Recently, the use of pedagogical agents has come to the forefront of research in the educational community. Already they are termed "social interfaces". Yet for them to be fully useful, we must delineate how similarly to humans they socially function. Researchers are looking at them as social models. It would be useful to examine human-human modeling studies and replicate them using agents. Schunk and colleagues [1-2] studied Mastery and Coping models in a social learning situation and their impact on self-efficacy, skill, and persistence. These model types have not been researched using agents. Social interaction with agents is another activity whose social impact has not been much examined. In human-human social learning situations, interaction with a model is more intensely experienced than a vicarious experience. No study has compared the impact of directly versus vicariously experienced social interaction by humans with pedagogical agents. Threat creates dissonance. We affiliate to reduce dissonance. Under threat one would seek to affiliate with a similar other. If the only "other" available is an agent, learners should seek to affiliate depending on agent similarity features. If the "similar" Mastery model demonstrates non-threatened learning through cheerful self-efficacy while the "similar" Coping agent demonstrates a threatened experience through initial self-doubt and apprehension, then learners should disaffiliate from the Mastery agent and affiliate with the Coping model. Direct social interaction will intensify learning efforts. The primary purpose of the 2x2 factorial design research is to examine the impact of social model agent type (Mastery, Coping) and social interaction type (Vicarious or Direct) on participant motivation (self-efficacy, satisfaction), skill, evaluations, frustration, similarity perceptions, attitude and feelings about the experience. Secondarily, the study will use descriptive statistics to describe how social processes manifest in affiliation activities. The computerized instructional module teaches learners to create an E-Learning-based instruction plan. A "teacher" agent provides information. The agent "listens" to the "teacher" except when self-expressing to a "classmate" agent or the learner, who then responds. Participants will be about 100 university pre-service teachers in an introductory technology class. The experiment will occur during a 1.5-hour class session. The participants will be randomly assigned to one of five conditions (including a control with no agent present). Analysis will consist of two-way ANOVAs on most variables. For motivation a two-way MANOVA will be used. "Feelings" will be qualitatively analyzed.
References
[1] D. H. Schunk and A. R. Hanson, "Peer-Models: Influence on Children's Self-Efficacy and Achievement," Journal of Educational Psychology, vol. 77, pp. 313-322, 1985.
[2] D. H. Schunk, A. R. Hanson, and P. D. Cox, "Peer-Model Attributes and Children's Achievement Behaviors," Journal of Educational Psychology, vol. 79, pp. 54-61, 1987.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
An Exploration of a Visual Representation for Interactive Narrative in an Adventure Authoring Tool
Seth GOOLNIK
The University of Edinburgh
Research Summary
The earlier Ghostwriter project attempted to address weaknesses in children's writing skills through the development of a virtual learning environment targeted at improving them. Ghostwriter is a 3D interactive audio-visual adventure game; results showed that children found using it highly motivating, and stories written after use of the software displayed significantly better characterisation than those written in typical classroom conditions. The work of Yasmin Kafai suggests that improved learning can be obtained by allowing children to create learning environments themselves. Motivated by this, the Adventure Author project aims to explore whether, by developing an authoring tool that allows children not only to participate in interactive narrative environments à la Ghostwriter but also to create these narratives themselves, it is possible to capitalise on the benefits of Ghostwriter. As a continuation of Adventure Author, this project attempted to formalise a system for visually representing interactive narrative as the next logical step in the development of a 3D virtual environment authoring tool. It then investigated whether children of the target age range for the authoring tool could understand and generate interactive narratives using this representation, attempting to provide a solid foundation for the ultimate development of the authoring tool. The visual system was developed using the example interactive narrative of adventure game books, which was found to be formalisable within the representational structure of an Augmented Transition Network. This system was first presented to the children via a specially designed interactive narrative, structurally contained on a paper chart. After participating in the interactive story, the children were able to understand as a group that the chart represented it; furthermore, they were able to fully generate their own interactive narrative using the same paper-based representation. Following the success of the paper-based medium in conveying the visual system, the computer-based medium of AA2D was developed. In individually using AA2D to both understand and generate the representation of interactive narrative, all participants were successful: all understood the formal system AA2D conveyed, and all were able to use AA2D to generate their own valid interactive narratives. Participants also all explicitly commented that they had enjoyed using AA2D for these purposes and would be happy to do so again. This project thus provides clear support for continuing the potentially valuable Adventure Author project. By developing a visual formalisation of interactive narrative and then demonstrating that children of the target age range can both understand and generate it, an ultimate 3D interactive narrative environment authoring tool can now be seen to be viable. Furthermore, given that all experimental participants reported being engaged by their experiences and that the surveyed literature suggests educational benefits of such production, this project has shown that further exploration of interactive narrative through virtual environments has real educational potential.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Affective Behavior in Intelligent Tutoring Systems for Virtual Laboratories
Yasmín HERNÁNDEZ (1), Julieta NOGUEZ (2)
(1) Gerencia de Sistemas Informáticos, Instituto de Investigaciones Eléctricas, México
[email protected]
(2) Instituto Tecnológico y de Estudios Superiores de Monterrey, Campus Cd. de México, México
[email protected]
We are developing an intelligent tutoring system coupled to a virtual laboratory for teaching mobile robotics. Our main hypothesis is that if the tutor recognizes the student's affective state and responds accordingly, it may be able to motivate the student and improve the learning process. Therefore, we include in the ITS architecture an affective student model and an affective behavior model for the tutor. The student model contains knowledge about the affective state of the student. Based on the OCC model [1], we establish the affective state as an appraisal of the situation with respect to goals. To determine the student's affective state we use the following factors: student personality traits, student knowledge state, mood, goals and tutorial situation (i.e. the outcome of the student's actions). According to the OCC model, the goals are fundamental to determining the affective state; we infer them by means of personality traits and the cognitive student state. For the personality traits we use the Five Factor Model [2], which considers five dimensions of personality. We use three of them to establish goals, because these are the ones that have the most influence on learning. We represent the affective student model by a Bayesian network, since this formalism provides an effective way to represent and manage the uncertainty inherent in student modeling [3]. Once the affective student model has been obtained, the tutor has to respond accordingly and provide the student with a pedagogical response that fits his affective and cognitive state. The affective behavior model (ABM) receives information from the affective student model, the cognitive student model and the tutorial situation, and translates it into affective actions for the tutor and interface modules. The affective action includes knowledge about the overall situation that will help the tutor module to determine the best pedagogical response to the student, and it will also advise the interface module to express the response in a suitable way. We represent the ABM by means of a decision network, where the affective action considers utilities in learning and motivation. Currently, we are implementing the affective student model and integrating it with the cognitive student model. We are preparing experiments and seeking pedagogical and psychological support for the formalization of the affective behavior model.
References
[1] Ortony, A., Clore, G.L., and Collins, A., The Cognitive Structure of Emotions, Cambridge University Press, 1988.
[2] Costa, P.T. and McCrae, R.R., Four Ways Five Factors are Basic, Personality and Individual Differences, 1992, 13 (1), pp. 653-665.
[3] Conati, C., and Zhou, X., Modeling students' emotions from Cognitive Appraisal in Educational Games, 6th International Conference on Intelligent Tutoring Systems, ITS 2002, Biarritz, France, pp. 944-954.
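As a rough illustration of the decision-network step just described, the Python fragment below selects a tutorial action by its expected utility over an inferred affective state. The states, actions and utility values are invented for illustration and are not the authors' model.

import numpy as np

affective_states = ["frustrated", "neutral", "engaged"]
actions = ["give hint", "offer encouragement", "pose harder challenge"]

# P(affective state | evidence), e.g. the output of the Bayesian affective student model
p_affect = np.array([0.6, 0.3, 0.1])

# utility[action, state], combining assumed gains in learning and motivation
utility = np.array([
    [0.7, 0.5, 0.2],   # give hint
    [0.8, 0.4, 0.3],   # offer encouragement
    [0.1, 0.5, 0.9],   # pose harder challenge
])

expected_utility = utility @ p_affect            # one value per candidate action
best_action = actions[int(np.argmax(expected_utility))]
print(dict(zip(actions, expected_utility.round(2))), "->", best_action)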
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Taking into account the variability of the knowledge structure in Bayesian student models. Mathieu HIBOU Crip5 Université René Descartes – Paris 5 45 rue des Saints-Pères 75270 Paris Cedex 06 France
[email protected]
Abstract. Bayesian belief networks have been widely used in student and user modelling. Their construction is the main difficulty for their use in student modelling. The choices made about their structure (especially the orientation of the arcs) have consequences in terms of information circulation. The analysis we present here is that the network structure depends on the expertise level of the student. Consequently, the evolution of the network should not only be numerical (updating of the probabilities) but also structural. Hence, we propose a model constituted of different networks in order to take these evolutions into account.
Bayesian networks (BNs) have been successfully used for student modelling in many different systems [1], [4], [5]. We propose to extend their use in order to take into account changes in the student's knowledge structure. The existence of structural differences between experts' and novices' knowledge and problem representations has been studied and highlighted in cognitive psychology [3]. Consequently, there should be an evolution not only of the network parameters but also of its structure, to reflect the changes in the student's knowledge structure. The solution we propose to take these changes into account, inspired by the Bayesian learning approach [2], is to consider that the model is constituted of different sub-models, each of them being a Bayesian network. The selection of the most appropriate sub-model is made using abductive inference. After observation, the most probable explanation is computed for each network: $v_i^{abd} = \arg\max_{v \in V \setminus e} P(V = v \mid e)$, where $i$ denotes the network and $e$ the evidence observed. Each of these explanations has a probability $P(V = v_i^{abd} \mid e)$, and this probability is the criterion used to determine the sub-model that fits best. This idea is currently being tested in order to determine whether or not we can detect different sub-models.
References
[1] A. Bunt, C. Conati. Probabilistic student modelling to improve exploratory behaviour. User Modeling and User-Adapted Interaction, 13 (3), pages 269-309, 2003.
[2] W. L. Buntine. Operations for learning with graphical models. Journal of Artificial Intelligence Research, volume 2, pages 159-225, 1994.
[3] Chi, M.T.H., Feltovich, P.J., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5, 121-152.
[4] C. Conati, A. Gertner, K. VanLehn. Using Bayesian networks to manage uncertainty in student modeling. User Modeling and User-Adapted Interaction, 12 (4), pages 371-417, 2002.
[5] A. Jameson. Numerical uncertainty management in user and student modeling: an overview of systems and issues. User Modeling and User-Adapted Interaction, 5 (3-4), pages 193-251, 1996.
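To make the selection criterion concrete, the following toy Python example (hypothetical structures and probabilities, not drawn from the paper) compares two candidate sub-models over the same binary variables by the probability of their most probable explanation given the same evidence, and keeps the one that fits best.

from itertools import product

# Two sub-models over variables X and Y, given directly as joint distributions built
# from their factorizations: sub-model 1 assumes X -> Y, sub-model 2 assumes Y -> X.
def joint_1(x, y):                                  # P(X) * P(Y | X)
    p_x = [0.7, 0.3]
    p_y_given_x = [[0.9, 0.1], [0.2, 0.8]]
    return p_x[x] * p_y_given_x[x][y]

def joint_2(x, y):                                  # P(Y) * P(X | Y)
    p_y = [0.5, 0.5]
    p_x_given_y = [[0.6, 0.4], [0.5, 0.5]]
    return p_y[y] * p_x_given_y[y][x]

def mpe_score(joint, evidence):
    """Return (most probable explanation, P(explanation | evidence)) by enumeration."""
    names = ["X", "Y"]
    free = [n for n in names if n not in evidence]
    consistent = []
    for values in product([0, 1], repeat=len(free)):
        full = dict(evidence)
        full.update(zip(free, values))
        consistent.append(full)
    p_e = sum(joint(a["X"], a["Y"]) for a in consistent)        # P(e)
    best = max(consistent, key=lambda a: joint(a["X"], a["Y"]))
    return {n: best[n] for n in free}, joint(best["X"], best["Y"]) / p_e

evidence = {"Y": 1}                                 # e.g. an observed answer
scores = {name: mpe_score(j, evidence)
          for name, j in [("sub-model 1", joint_1), ("sub-model 2", joint_2)]}
chosen = max(scores, key=lambda name: scores[name][1])
print(scores, "->", chosen)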
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Subsymbolic User Modeling in Adaptive Hypermedia Katja HOFMANN California State University, East Bay, 25800 Carlos Bee Blvd., Hayward, CA 94542, USA, Phone +1(510) 885-7559, E-mail
[email protected]
The most frequently used approach to user modeling in adaptive hypermedia is the use of symbolic machine learning techniques. Sison and Shimura, and Weber and Brusilovsky, describe a number of current systems which use, for example, decision trees, probabilistic learning, or case-based reasoning to infer information about the student. However, many researchers have come to the conclusion that the applicability of symbolic machine learning to user modeling in adaptive learning systems is inherently limited, because these techniques do not perform well on noisy, incomplete, or ambiguous data. It is very hard to infer information about the user based on the observation of single actions. Neural networks and fuzzy systems are subsymbolic machine learning techniques and are a very promising approach for dealing with the characteristics of data obtained from observing user behavior. The two techniques complement each other and have characteristics that make them suitable for the incomplete and noisy data inherent to user behavior in hypermedia systems. Most importantly, this approach can identify similarities in underlying patterns of complex, high-dimensional data. I want to find out how subsymbolic machine learning can be used to adapt the navigation of web-based tutorials to the goals, knowledge level, and learning style of the student. The students' interaction with the tutorial will be recorded and will form the input to a neuro-fuzzy clustering mechanism. The resulting clustering will group similar student behavior into clusters, which is a representation of the patterns underlying the user behavior. My hypothesis is that students with similar goals, background knowledge, and learning style will show similar user behavior and will thus be grouped in the same or adjacent clusters. Based on the clustering, the online tutorial will adapt the navigation by placing the documents that similar students found helpful in the most prominent position. My work is based on the existing ACUT tutorial. ACUT uses collaborative learning and social navigation and aims at increasing retention of Computer Science students without extensive knowledge of UNIX, especially women and minority students. After implementing the clustering mechanism I will use empirical evaluation to test my hypothesis. Focused interviews will be used to obtain very detailed qualitative and quantitative data. The results will give information about the effectiveness and applicability of the adaptation mechanism, and about the evaluation method. The presented research is work in progress, and future research will be needed to carefully evaluate and compare the efficiency of current technologies and subsymbolic clustering for user modeling in adaptive hypermedia systems. After evaluating the first results I will be able to analyze the resulting clusterings and recommendations and refine the algorithm to make more informed decisions about navigational adaptation. The results of this research will be applicable to user modeling, navigation design, and the development of collaborative computer-based learning systems and recommender systems.
Acknowledgements. I want to thank my advisor Dr. Hilary J. Holz for her invaluable help, feedback and motivation. I also thank Dr. Catherine Reed for her help with the educational side of my research. This work is partially sponsored by an ASI fellowship of CSU EB.
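By way of illustration only (this is not the ACUT implementation, and the interaction features and cluster count are assumptions), the following Python fragment applies fuzzy c-means, the fuzzy-clustering half of a neuro-fuzzy approach, to per-student interaction vectors and returns soft cluster memberships.

import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Return (cluster centers, memberships U) with U[i, k] the degree to which
    student i belongs to cluster k."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)                # memberships sum to 1 per student
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U

# Hypothetical interaction features: [pages visited, mean dwell time (s), help requests]
students = np.array([[12, 30.0, 1], [14, 28.0, 0], [3, 90.0, 7], [4, 85.0, 6], [8, 50.0, 3]])
centers, memberships = fuzzy_c_means(students, c=2)
print(memberships.round(2))                          # soft assignment to behaviour clusters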
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
The Effect of Multimedia Design Elements on Learning Outcomes in Pedagogical Agent Research: A Meta-Analysis Soyoung Kim Instructional Systems Program RITL – PALS http://ritl.fsu.edu Florida State University
[email protected]
This study aimed at synthesizing the results of experimental research on the effect of multimedia elements in pedagogical agents on learning outcomes by using a meta-analysis technique. This pilot study targeted the overall effects of treatments that varied according to design elements and learning outcomes. Furthermore, the results of this meta-analysis were expected to provide an in-depth understanding of pedagogical agent research in a more systematic way. Previous research suggests that lifelike agents have a strong motivational effect, promote learners' cognitive engagement, and arouse various affective responses. However, the results of research on pedagogical agents are somewhat varied across studies, owing to the field's embryonic stage. This study intended to explain the overall effect of multimedia elements across studies on pedagogical agents and to try to find a consensus regarding the role of multimedia elements in the effectiveness of pedagogical agents. Twelve different experimental studies of pedagogical agents by five different authors were included in this meta-analysis through a process of inclusion and exclusion. Unpublished manuscripts as well as published articles were incorporated in this analysis to avoid publication bias. Non-significant results as well as significant results were incorporated, as long as appropriate descriptive data were reported, to avoid selection bias. Through the coding process, the four main elements of multimedia design were identified as the 'treatment' variable; the three main learning outcomes were identified as the 'outcome' variable. The treatment variable was classified into four different levels: (1) auditory, (2) visual image, (3) visual image plus animation, (4) visual image plus social meaning (role, gender, ethnicity, etc.). The outcome variable was categorized as (1) affective outcome, (2) cognitive outcome and (3) motivational outcome. The key to meta-analysis is defining an effect size statistic capable of representing the quantitative findings of a set of research studies in a standardized form. A total of 28 different effect sizes from the 12 studies were obtained and incorporated in this data set. A categorical fixed model, which is analogous to an ANOVA model, was applied, and a total of five different predictors including moderator variables (author group, duration and subject matter) as well as main variables (treatment, outcome) were investigated. Results in this study indicated that the presence of a pedagogical agent transmitted the effect of the multimedia design elements created through technological support on learning outcomes consistently across the studies (Q total), even though the effect of each variable (Q between) could not be verified. Discussion focused on pedagogical agents in the context of the reciprocal relationship between learning theory and multimedia design and its impact on learning outcomes. Results suggested possible factors and, most of all, improved the understanding of pedagogical agent research. Furthermore, a larger sample would be required for a better meta-analysis. In addition, more studies about affective domains should be incorporated.
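A hedged sketch of the fixed-effect computations named above (the inverse-variance weighted mean effect size and the Q homogeneity statistic); the effect sizes and variances below are invented numbers, not the twelve studies analysed in this paper.

import numpy as np
from scipy.stats import chi2

def fixed_effect_summary(d, v):
    """d: per-study standardized effect sizes; v: their sampling variances."""
    d, v = np.asarray(d, float), np.asarray(v, float)
    w = 1.0 / v                            # inverse-variance weights
    d_bar = np.sum(w * d) / np.sum(w)      # weighted mean effect size
    se = np.sqrt(1.0 / np.sum(w))
    Q = np.sum(w * (d - d_bar) ** 2)       # homogeneity statistic, df = k - 1
    p = chi2.sf(Q, df=len(d) - 1)
    return d_bar, se, Q, p

d_bar, se, Q, p = fixed_effect_summary([0.4, 0.1, 0.6, 0.3], [0.05, 0.04, 0.08, 0.06])
print(f"mean d = {d_bar:.2f} (SE {se:.2f}), Q = {Q:.2f}, p = {p:.3f}")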
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Enhancing Learning through a Model of Affect Amali WEERASINGHE Intelligent Computer Tutoring Group Department of Computer Science, University of Canterbury Private Bag 4800, Christchurch, New Zealand
[email protected] The effectiveness of human one-to-one tutoring is largely due to the tutor’s ability to adapt the tutorial strategy to the students’ emotional and cognitive states. Even though tutoring systems were developed with the aim of providing the experience of human one-to-one tutoring to masses of students in an economical way, using learners’ emotional states to adapt tutorial strategies have been ignored until very recently. As a result, researchers still focus on generating affective models and evaluating them. To the best of our knowledge, a model of affect is yet to be used to improve the objective performance of learners. This paper proposes an initial study to understand how human tutors adapt their teaching strategies based on the affective needs of students. The findings of the study will be used to investigate how these strategies could be incorporated into an existing tutoring system which can then adapt to the learner’s affect and cognitive models. Several researchers have pointed out that it is more important to focus on using the student model to enhance the effectiveness of the pedagogical process, than building a highly accurate student model that models everything about the student. Therefore, we are interested in investigating how a model of affect can be used to improve learning. We choose to focus on using the affective model to develop an effective problem selection strategy because most ITSs employ adaptive problem selection based only on the cognitive model, which may result in problems being too easy or too hard for students. This may occur due to factors like how much guessing was involved in generating the solution, how confident she was about the solution, how motivated she was etc., which are not captured in the student’s cognitive model. Therefore, using both cognitive and affective models can potentially increase the effectiveness of a problem selection strategy, which in turn can improve the learners’ motivation to interact with the system. As we want to explore how emotional states could be used to adapt the tutoring strategies, we propose to conduct a study to understand how human tutors respond to learners’ affective states. The objectives of the study are to understand how human tutors identify the emotional states of students during learning and how they adapt tutoring strategies in each situation. Participants will be students enrolled in an introductory database course at the University of Canterbury. As we want to explore general tutoring strategies, we plan to use four existing tutoring systems developed by our research group. Several tutors will observe students’ interactions. All sessions will be videotaped. Based on the study, we want to explore how this adaptation of tutorial strategies can be incorporated into an intelligent tutoring system.
Artificial Intelligence in Education C.-K. Looi et al. (Eds.) IOS Press, 2005 © 2005 The authors. All rights reserved.
Understanding the Locus of Modality Effects and How to Effectively Design Multimedia Instructional Materials
Jesse S. Zolna
Department of Psychology, Georgia Institute of Technology
Abstract
AIED learning systems sometimes employ multimedia instructional materials that leverage technology to replace instructional text with narrations. This can provide cognitive advantages and disadvantages to learners. The goal of this study is to improve principles of information design that cater to human information processing. Prior research in educational psychology has focused on facilitating learning by presenting information in two modalities (auditory and visual) to increase perceptual information flow. It is hypothesized that similar effects might also occur during cognitive manipulations (e.g., extended storage and fact association). The described study separates perceptual information effects from those of cognitive operations by presenting auditory and visual information separately. The typical multimedia effect was not found, but other influences on learning were observed. An understanding of these other causes will help us create a more complete picture of what producers of multimedia learning materials should consider during design.
Summary
Contemporary technology is increasingly employed to improve the efficiency of educational instruction. Educational psychologists have been trying to understand how multimedia instructional materials, that is, presenting to-be-learned information in more than one modality, can improve learning [1;2]. The goal of this study is to advance the limited knowledge associated with mixing media ingredients that best cater to the strengths and limitations of human information processing. Research related to instructional design has proposed that controlling the processing demand in multimedia learning environments might be achieved by spreading information among working memory stores [1;2]. The focus of these explanations has been on perceptual-level encoding (i.e., the transition from the sensory store), creating information design recommendations that center on the presentation of multimodal information. They have de-emphasized how the two streams of information influence the active processing of new information. The two influences, that is, on perceptual encoding and on active processing, may be separable, each influential for learning. If so, designing multimedia interfaces with consideration only for perceptual effects, as has been common in the past, may be incomplete. Non-verbal (or visual-spatial) and verbal (or auditory) internal representations often correspond to diagrammatic and descriptive external representations, respectively. However, visually and auditorily presented information included in multimedia learning environments corresponds imperfectly to this division of internal representations. Research investigating multimedia instructional materials in light of psychological models [3;4;5] will define internal representations by more than just the materials' external representations. In an experiment, typical multimedia learning effects were not found. The next steps are to understand human information processing based on the effects of modality for both internal and external representations of information, and consequently to make suggestions to designers of multimedia information.
References
[1] Mayer, R. (2001). Multimedia Learning. Boston: Cambridge University Press.
[2] Sweller, J. (1999). Instructional Design. Melbourne: ACER Press.
[3] Baddeley, A., & Hitch, G.J. (1994). Developments in the concept of working memory. Neuropsychology, 8(4), 485-493.
[4] Paivio, A. (1986). Mental representations: A dual coding approach. New York: Oxford University Press.
[5] Wickens, C. D. (2002). Multiple resources and performance prediction. Theoretical Issues in Ergonomics Science, 3(2), 159-177.
Panels
Pedagogical agent research and development: Next steps and future possibilities
Amy L. BAYLOR, Director, Center for Research of Innovative Technologies for Learning (RITL), Florida State University, http://ritl.fsu.edu
Ron COLE, Director, Center for Spoken Language Research (CSLR), University of Colorado at Boulder
Arthur GRAESSER, Co-Director, Institute for Intelligent Systems (IIS), University of Memphis
W. Lewis JOHNSON, Director, Center for Advanced Research in Technology for Education (CARTE), University of Southern California
Abstract. The purpose of this interdisciplinary panel of leading pedagogical agent researchers is to discuss issues regarding implementation of agents as “simulated humans,” pedagogical agent affordances/constraints, and future research and development possibilities.
Introduction
Pedagogical agent research and development has made significant strides over the past few years, incorporating animated computer characters that are increasingly realistic and human-like with respect to their dialogue, appearance, animation and the instructional outcomes they produce. Given the rapid growth and convergence of knowledge and technologies in the areas of cognitive science (how people learn, how effective teachers teach), computing/networking and human communication technologies, the vision of accessible and affordable intelligent tutoring systems that use virtual teachers to help students achieve deep and useful knowledge has moved from fantasy to emerging reality. This panel will build on other recent discussions (including an NSF-supported “Virtual Humans Workshop”) to assess the current state of knowledge of pedagogical agents, and discuss the science and technologies required to accelerate progress in this field.
1. Organization of Panel
A brief overview of the construct of “pedagogical agent” will be presented together with a review of pedagogical agent effectiveness for different learning outcomes (e.g., content acquisition, metacognition, motivation). The panel discussion will focus on four key sets of questions (listed below), for which each panellist will present a brief prepared response. Following each of the four panellists’ responses, there will be time for broader discussion of the question among the panellists.
1. Definitions:
• What constitutes a pedagogical agent (e.g., message, voice, image, animation, intelligence, interactivity)?
• Is the agent interface enough to constitute a pedagogical agent?
• How intelligent (cognitively, affectively, and/or socially) should pedagogical agents be?
2. Human-likeness:
• How human-like should agents be with respect to the different modalities? What new technologies and knowledge (e.g., the social dynamics of face-to-face tutoring) are required to make pedagogical agents look and act like human teachers?
• How can we best exploit the human-like benefits (e.g., affective responses) of pedagogical agents together with their benefits as a technology (e.g., control, adaptivity)?
3. Instructional affordances (and constraints):
• What new possibilities can pedagogical agents provide (e.g., unique instructional strategies, providing a social presence when an online instructor is absent, employing multiple agents to represent different perspectives)?
• What constraints exist (e.g., user expectations and stereotypes)?
4. The future:
• What are the main technological challenges and research breakthroughs required to invent virtual humans, and when can we expect these challenges to be met?
• What multidisciplinary research is required to invent pedagogical agents that behave like sensitive and effective human teachers? When might we expect a virtual teacher to pass a Turing test, e.g., teach a student to read or solve a physics problem as well as an expert human tutor would? What would this test look like?
• What are some new possibilities for agents (e.g., in different artefacts and settings, in different roles/functions, serving as simulated instructors and test-beds for controlled research)?
Tutorials
Evaluation methods for learning environments
Shaaron Ainsworth
School of Psychology and Learning Sciences Research Institute, University of Nottingham, Nottingham, UK
This tutorial explores the issue of evaluation in AIED. The importance of evaluating AIED systems is increasingly recognised, yet there is no single right way to evaluate a complex learning environment. This tutorial will emphasize how to develop a practical toolkit of evaluation methodologies by examining classic case studies of evaluations, showing how techniques from other areas can be applied in AIED, and examining common mistakes. Key issues include:
• the goals of evaluation (e.g. usability, learning outcomes, learning efficiency, informing theory),
• choosing methods for data capture and analysis,
• appropriate designs,
• what is an appropriate form of comparison,
• and the costs and benefits of evaluating “in the wild.”
Audience: This is an introductory tutorial intended for researchers with a variety of backgrounds.
Presentation: Slides interspersed with demonstrations and discussions. Working in groups, participants will design their own evaluation plans for a system during the course of the session.
Rapid development of computer-based tutors with the Cognitive Tutor Authoring Tools (CTAT)
Vincent Aleven, Bruce McLaren and Ken Koedinger
Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
The use of authoring tools to make the development of intelligent tutors easier and more efficient is an on-going and important topic within the AI & Ed community. This tutorial provides hands-on experience with one particular tool suite, the Cognitive Tutor Authoring Tools (CTAT). These tools support the development and delivery (including web delivery) of two types of tutors: problem-specific Pseudo Tutors, which are very easy to build, and Cognitive Tutors, which are harder to build but more general, having a cognitive model of a competent student’s skills. Cognitive Tutors have a long and successful track record: they are currently in use in over 2000 US high schools. The CTAT tools are based on techniques of programming by demonstration and machine learning. The tutorial will provide a combination of lectures, demonstrations, and a good amount of hands-on work with the CTAT tool suite. CTAT is available for free for research and educational purposes (see http://ctat.pact.cs.cmu.edu). The target audience includes:
• ITS researchers and developers looking for better authoring tools
• Educators (e.g. college-level professors) with some technical background interested in developing on-line exercises for their courses
• Researchers in education or educational technology interested in using tutoring systems as a research platform to explore hypotheses about learning and/or instruction.
Some New Perspectives on Learning Companion Research
Tak-Wai Chan
National Central University, Taiwan
Learning companions, a concept proposed in 1988, were originally intended to be an alternative model of intelligent tutoring systems. Interest in the concept has recently grown rapidly, and the research has proceeded under a variety of names, such as virtual character, virtual peer, pedagogical agent, trouble maker, teachable agent, animal companion, and so forth. A number of research and technological advancements, including affective learning, social learning, human-media interaction, new views on student modeling, increased storage capacity, the Internet, wireless and mobile technologies, ubiquitous computing, digital tangibles, and so forth, are driving learning companion research to a new plateau. This tutorial intends to give an account of these new perspectives and to shed light on a possible research agenda for the ultimate goal of learning companion research: building a lifelong learning companion.
Education and the Semantic Web
Vladan Devedžić
University of Belgrade, Serbia and Montenegro
The goals of this tutorial are to present important theoretical and practical advances of Semantic Web technology and to show its effects on education and educational applications. More specifically, important objectives of the tutorial are to explain the benefits the Semantic Web brings to Web-based education, and to survey current efforts in the AIED community related to applying Semantic Web technology in education. Some of the topics to be covered during the tutorial include: ontologies, Semantic Web languages, services and tools, educational servers, architectural aspects of Semantic Web AIED applications, learner modeling and the Semantic Web, instructional design and the Semantic Web, and semantic annotation of learning objects.
Building Intelligent Learning Environments: Bridging Research and Practice
Beverly Park Woolf
University of Massachusetts, Amherst, Massachusetts, USA
This tutorial will bring together theory and practice about technology and learning science and take the next step toward developing intelligent learning environments. We will discuss dozens of example tutors and present a wealth of tools and methodologies, many taken from mathematics and science education, to help participants design and build their own intelligent learning environments. Discussions will focus on linking theory in learning systems, artificial intelligence, cognitive science and education with practice in writing specifications for an intelligent tutor. Participants are encouraged to select an academic domain in which they want to build an intelligent learning environment, and the group will break into teams several times during the tutorial to solve design and specification problems. The tutorial will provide a suite of tools and a toolkit for general work productivity and will emphasize a team-oriented, project-based approach. We will share tutor techniques and identify some invariant principles behind successful approaches, while formalizing design knowledge within a class of exemplary environments in reusable form.
Workshops
Student Modeling for Language Tutors
Sherman ALPERT 1 and Joseph E. BECK 2
1 IBM T.J. Watson Research Center
2 Center for Automated Learning and Discovery, Carnegie Mellon University
Abstract. Student modeling is of great importance in intelligent tutoring and intelligent educational assessment applications. However, student modeling for computer-assisted language learning (CALL) applications differs from classic student modeling in several key ways, including the lack of observable intermediate steps (behavioral or cognitive) involved in successful performance. This workshop will focus on student modeling for intelligent CALL applications, addressing such domains as reading decoding and reading and spoken language comprehension. Domains of interest include both primary (L1) and second language (L2) learning. Hence, the workshop will address questions related to student modeling for CALL, including what types of knowledge such a model ought to contain, with what design rationale, and how information about the user’s knowledge might be obtained and/or inferred in a CALL context.
Topics and goals
Student modeling is of great importance in intelligent tutoring and intelligent educational diagnostic and assessment applications. Modeling and dynamically tracking a student's knowledge state are fundamental to the performance of such applications. However, student modeling in CALL applications differs from more "classic" student modeling in other domains in three key ways:
1. It is difficult to determine the reasons for successes and errors in student responses. In classic ITS domains (e.g., math and physics), the interaction with the tutor may require students to demonstrate intermediate steps. For performance in language domains, much more learner behavior and knowledge is hidden, and having learners demonstrate intermediate steps is difficult or perhaps impossible, and at any rate may not be natural behavior. (How) Can a language tutor reason about the cause of a student mistake? (How) Can a language tutor make attributions regarding a student's knowledge state based on overt behavior?
2. Cognitive modeling is harder in language tutors. A standard approach for building a cognitive task model is to use think-aloud protocols. Asking novices to verbalize their problem solving processes while trying to read and comprehend text is not a fruitful endeavor. How then can we construct problem solving models? Can existing psychological models of reading be adapted and used by computer tutors?
3. It may be difficult to accurately score student responses. For example, in tutors that use automated speech recognition (ASR), whether the student’s response is correct cannot be determined with certainty. In contrast, in classic tutoring systems scoring the student’s response is relatively easy. How can scoring inaccuracies be overcome to reason about the students’ proficiencies?
This workshop discusses attempts at solutions to these and related problems in student modeling for language tutors.
International Workshop on Applications of Semantic Web Technologies for E-Learning (SW-EL’05)
Lora AROYO 1 and Darina DICHEVA 2
1 Department of Computing Science, Eindhoven University of Technology, PO Box 513, 5600 MD Eindhoven, The Netherlands
2 Department of Computer Science, Winston-Salem State University, 601 Martin Luther King, Jr. Drive, Winston-Salem, N.C. 27110, USA
Abstract. The SW-EL'05 workshop at AIED’05 covers topics related to the use of ontologies for knowledge representation in intelligent educational systems, modularised and standardised architectures, the achievement of interoperability between intelligent learning applications, sharable user models and knowledge components, and support for authoring of intelligent educational systems. Two focus sessions are included in the workshop: 1) Application of Semantic Web technologies for Adaptive Learning Systems, which focuses on personalisation and adaptation in educational systems (flexible user models), ontology-based reasoning for personalising the educational Semantic Web, and techniques and methods to capture and employ learner semantics; 2) Application of Semantic Web technologies for Educational Information Systems, which focuses on Semantic Web-based indexing/annotation of educational content (incl. individual and community based), ontology-based information browsing and retrieval, and Semantic Web/ontology-based recommender systems. Papers presented in the workshop illustrate Semantic Web-based methods, techniques, and tools for building and sharing educational content, models of users, and personalisation components; services in the context of intelligent educational systems (i.e. authoring services, user modelling services, etc.); and ontology evolution, versioning and consistency. A key part of the reported results relates to empirical research on Intelligent Educational Systems, presenting real-world systems and case studies and providing community and individual support by using Semantic Web technologies and ontologies. The workshop is also a forum for presenting research performed within the context of the KALEIDOSCOPE and PROLEARN networks of excellence. Other editions of the SW-EL workshop include:
• SW-EL'05 at ICALT'05, Kaohsiung, Taiwan
• SW-EL'05 at AIED'05, Amsterdam, The Netherlands
• SW-EL'05 at K-CAP'05, Banff, Canada
• SW-EL'04 at AH'04, Eindhoven, The Netherlands
• SW-EL'04 at ITS'04, Maceio, Brazil
• SW-EL'04 at ISWC'04, Hiroshima, Japan
General workshop web site: http://www.win.tue.nl/SW-EL/index.html
Adaptive Systems for Web-Based Education: Tools and reusability
Peter Brusilovsky, University of Pittsburgh
Ricardo Conejo, University of Málaga
Eva Millán, University of Málaga
Motivation
Web-based education is currently a hot research and development area. The benefits of Web-based education are clear: learners from all over the world can enroll in learning activities, communicate with other students or teachers, and discuss and control their learning progress, all from an Internet-capable computer. A challenging research goal is to tailor access to web-based education systems to individual learners' needs, as determined by such factors as their previous knowledge of the subject, their learning style, their general attitude and/or their cultural or linguistic background. A number of Web-based adaptive and intelligent systems have been developed over the last 5 years. However, a larger variety of innovative systems can still be created and evaluated to make a real difference in E-Learning. The goal of this workshop is to provide a forum for the discussion of recent trends and perspectives in adaptive systems for web-based education, and thus to continue the series of workshops on this topic held at past conferences.
Topics
The list of topics includes, but is not limited to:
• Adaptive and intelligent web-based collaborative learning systems
• Web-based adaptive educational hypermedia
• Web-based intelligent tutoring systems
• Adaptive Web-based testing
• Web-based intelligent class monitoring systems
• Adaptive and intelligent information retrieval systems for web-based educational materials
• Personalization in educational digital libraries
• Architectures for adaptive web-based educational systems
• Using machine learning techniques to improve the outcomes of Web-based educational processes
• Using Semantic Web technologies for adaptive e-learning
• Reusability and self-organisation techniques for educational material
• Interoperability between tools and systems for adaptive e-learning
• Pedagogical approaches in web-based educational systems
Usage analysis in learning systems
AIED2005 Workshop (http://lium-dpuls.iut-laval.univ-lemans.fr/aied-ws/)
The topic of analyzing learning activities has attracted a lot of attention in recent years. In particular, a number of techniques have been proposed by the AIED community to collect and analyze data in technology-supported learning activities. Understanding and taking into account the usage of learning systems is now a growing topic in the AIED community, as recent events (the ITS2004 workshop) and projects (the "Design Patterns for Recording and Analyzing Usage in Learning Systems" work package of the European Kaleidoscope Network) have shown. Learning systems need to track student usage and analyze student activity in order to dynamically adapt the teaching strategy during a session and/or to modify contents, resources and scenarios after the session in preparation for the next one. These large amounts of student data can also offer material for further analysis using statistical, data mining or other techniques. The aims of this workshop are (1) to facilitate the sharing of approaches, problems and solutions adopted for usage analysis of learning systems and (2) to create a forum for collaboration and to develop an international community around this field of study. The workshop will consist of presentations of refereed papers and posters and discussions, and will end with a forum led by a panel (Nicolas Balacheff, Ulrich Hoppe and Judy Kay) aimed at synthesizing workshop contributions and at identifying promising directions for future work.
Program Committee
Christophe Choquet, LIUM, University of Maine, France (co-chair)
Vanda Luengo, IMAG, University of Grenoble, France (co-chair)
Kalina Yacef, SIT, University of Sydney, Australia (co-chair)
Nicolas Balacheff, IMAG, University of Grenoble, France
Joseph Beck, Carnegie Mellon University, USA
Peter Brusilovsky, School of Information Sciences, University of Pittsburgh, USA
Elisabeth Delozanne, CRIP5, University of Paris 5, France
Angelique Dimitrakopoulou, Aegean University, Greece
Ulrich Hoppe, COLLIDE, University Duisburg-Essen, Germany
Judy Kay, SIT, University of Sydney, Australia
Jean-Marc Labat, AIDA, Paris 6 University, France
Frank Linton, The Mitre Corporation, MA, USA
Agathe Merceron, Leonard de Vinci University, Paris, France
Tanja Mitrovic, University of Canterbury, Christchurch, New Zealand
Jack Mostow, School of Computer Science, Carnegie Mellon University, USA
Ana Paiva, INESC, Lisboa, Portugal
Richard Thomas, University of Western Australia, WA, Australia
Pierre Tchounikine, LIUM, University of Maine, France
Felisa Verdejo, UNED, Madrid, Spain
Workshop on Educational Games as Intelligent Learning Environments
Cristina Conati
Department of Computer Science, University of British Columbia, 2366 Main Mall, Vancouver, BC, V6T 1Z4, Canada
{manske, conati}@cs.ubc.ca
Sowmya Ramachandran
Stottler Henke Associates, Inc., 951 Mariner's Island Blvd., Ste 360, San Mateo, CA 94404
Over the past decade there has been an increasing interest in electronic games as educational tools. Educational games are known to be very motivating, and they can naturally embody important learning design principles such as exploration, immersion, feedback, and increasingly difficult challenges to master. However, there are mixed results on the actual pedagogical effectiveness of educational games, indicating that this effectiveness strongly depends upon students’ preexisting traits such as metacognitive skills and learning attitudes. These results are consistent with the mixed results on the effectiveness of exploratory learning environments, not surprisingly, since most educational games are exploratory learning environments with a stronger focus on entertainment. Artificial Intelligence is already playing an increasingly integral part in both non-educational game design and the design of more effective exploratory learning environments. This workshop aims to explore if and how AI techniques can also help improve the scope and value of educational games. The overall goal of the workshop is to bring together people who are interested in exploring how to integrate games with intelligent educational technology, to review the state of the art, and to formulate directions for further exploration. Some of the questions that the workshop aims to address include: (1) Are some genres of games more effective at producing learning outcomes? (2) How do learners’ individual differences (cognitive, meta-cognitive and affective) influence the genres of games they prefer or benefit from? (3) How can intelligent tutoring technologies augment the gaming experience, with particular consideration of both motivational and learning outcomes? (4) How can we incorporate tutoring without interfering with game playing? (5) What role can intelligent educational games play in collaborative and social learning experiences? (6) The cost of developing games is very high, and adding AI techniques to the picture is likely to make the cost even higher. What tools exist or need to be developed to manage the development cost? (7) Should the gaming industry be involved, and how? By addressing these issues in a mixed-mode, informal set of interactions, this workshop will explore the feasibility and utility of Intelligent Educational Games, identify key problems to address, and contribute to advancing the state of the art of this emerging area of research.
Motivation and Affect in Educational Software
Cristina Conati, University of British Columbia, Canada
Benedict du Boulay, University of Sussex, UK
Claude Frasson, University of Montreal, Canada
Lewis Johnson, USC Information Sciences Institute, USA
Rosemary Luckin, University of Sussex, UK
Erika A. Martinez-Miron, University of Sussex, UK
Helen Pain, University of Edinburgh, UK
Kaska Porayska-Pomsta, University of Edinburgh, UK
Genaro Rebolledo-Mendez, University of Sussex, UK
Motivation and affect (e.g., basic affective reactions such as like/dislike; specific emotions such as frustration, happiness, anger; moods; attitudes) often play an important role in learning situations. There have been various attempts to take them into account both at design time and at run time in AIED systems, though the evidence for the consequential impact on learning is not yet strong. Much research needs to be carried out in order to better understand this area. In particular, we need to deepen our knowledge of how affect and motivation relate to each other and to cognition, meta-cognition, learning context and teaching strategies/tactics. This workshop is intended to bridge the gap between previous AIED research, particularly in motivation and meta-cognition, and the ever-increasing research in emotions and other affective components. By bringing together researchers in the area, the workshop will be a forum to discuss different approaches with the aim of enriching our knowledge about how to create effective and affective learning environments. It is also expected to be a forum in which to address the appropriateness of defining bridges that could bring about new ways of relating cognitive and affective aspects of learning. At the end of the workshop we expect to reach agreement on which are the relevant emotions in learning contexts, as well as on the terminology being used so far (e.g. affect, emotion, motivation). We invited papers presenting finished work, work in progress, or theoretical positions in the following areas:
• Affective/motivational modelling
• Affective/motivational diagnosis
• Relevant aspects of motivation and affect in learning
• Strategies for motivational and affective reaction
• Integrative models of cognition, motivation, and affect
• Personal traits, motivation, and affect
• Learning styles, learning domains and learning contexts
• Learning goals, motivation, and affect
• Influences of dialogues in affective computing
• Use of agents as affective companions
• Interface design for affective interactions
The workshop is focused on exploring the following questions:
• Which emotions might be useful to model (e.g. basic affective reactions such as like/dislike; specific emotions such as frustration, happiness, anger; moods)?
• How do individual traits influence the learner’s motivational state?
• How are motivation and emotional intelligence related?
Third International Workshop on Authoring of Adaptive and Adaptable Educational Hypermedia
Dr. Alexandra Cristea - Eindhoven University of Technology, The Netherlands
Dr. Rosa M. Carro - Universidad Autónoma de Madrid, Spain
Prof. Dr. Franca Garzotto - Politecnico di Milano, Italy
This workshop follows a successful series of workshops on the same topic. The current workshop focuses on the issues of design, implementation and evaluation of general Adaptive and Adaptable (Educational) Hypermedia, with special emphasis on the connection to user modelling and pedagogy. Authoring of Adaptive Hypermedia has long been considered secondary to adaptive hypermedia delivery, yet this task is not trivial at all. There exist some approaches to help authors build adaptive-hypermedia-based systems, but there is a strong need for high-level approaches, formalisms and tools that support and facilitate the description of reusable adaptive websites. Only recently have we noticed a shift in interest, as it became clearer that the implementation-oriented approach would forever keep adaptive hypermedia away from the ‘layman’ author. The creator of adaptive hypermedia cannot be expected to know all facets of this process, but can reasonably be trusted to be an expert in one of them. It is therefore necessary to research and establish the components of an adaptive hypermedia system from an authoring perspective, catering for the different author personas that are required. This type of research has proven to lead to a modular view of adaptive hypermedia. One of these modules, the most frequently used, is the User Model, also called the Learner Model in the educational field (or Student Model in ITS). Less frequent, but also emerging as an important module, is the Pedagogical Model (this model also has different names in different implementations, too various to name here). It is becoming more and more clear that for Adaptive Educational Hypermedia it is necessary to consider not only the learner’s characteristics, but also the pedagogical knowledge needed to deal with these characteristics. This workshop will cover all aspects of the authoring process of adaptive educational hypermedia, from design to evaluation, with special attention to Learner and Pedagogical models. Issues to discuss are therefore:
• What are the main characteristics of learners that are (or should be) modelled?
• How can pedagogical knowledge be formulated in a reusable manner?
• How can we consider user cognitive styles in adaptive hypermedia?
• How can we consider user learning styles in adaptive hypermedia?
• Are there any recurring patterns that can be detected in the authoring process generally speaking, and in the authoring of the user or pedagogic model in particular?
The workshop will also lead to a better understanding and cross-dissemination of user-specific patterns extracted from existing design and authoring processes in AH, especially focused around user modelling and pedagogic modelling. The workshop aims to attract the interest of the related research communities to the important issues of design and authoring, with special focus on user and pedagogic models in adaptive hypermedia; to discuss the current state of the art in this field; and to identify new challenges in the field. Moreover, the workshop should be seen as a platform that enables the cooperation and exchange of information between European and non-European projects. Major themes of the workshop include:
• Design patterns for adaptive educational hypermedia
• Authoring user models for adaptive/adaptable educational hypermedia
• Authoring pedagogic models for adaptive/adaptable educational hypermedia
Learner Modelling for Reflection, to Support Learner Control, Metacognition and Improved Communication between Teachers and Learners
Judy KAY 1, Andrew LUM 1 and Diego ZAPATA-RIVERA 2
1 School of Information Technologies, University of Sydney, Australia, {judy, alum}@it.usyd.edu.au
2 Educational Testing Service, Rosedale Road, Princeton, NJ 08541, USA
Learner modelling is at the core of AIED research: the learner model is the foundation of ‘systems that care’ because it gives them the potential to treat learners as individuals. This workshop will bring together researchers working towards the many important, emerging roles for learner models. Personalising teaching is their core task. It is becoming increasingly clear that learner models are first-class objects which can be made open to learners and teachers as a basis for improving learning outcomes. Essentially, open learner models offer the potential to help learners reflect on their own knowledge, misconceptions and learning processes. A particularly important new direction is to incorporate open learner models into conventional learning systems. The challenge is to make this data more useful as detailed models of learner development, with modelling of competence, knowledge and other aspects. A closely related area of importance is how best to collect, analyse and externalise data from learner interactions, and how to represent this for the most effective support of reflection. Another important new direction for open learner models is in the support of learner control over learning. At quite a different level, we are seeing the emergence of systems that model affective aspects such as emotion. We need to link this with the potential role of open learner models. Finally, there is considerable work in machine learning in conjunction with learner modelling; this is often predicated on the assumption that a machine learning system can access collections of student models.
Program committee: Susan Bull, University of Birmingham, UK; Paul Brna, Northumbria University, UK; Peter Brusilovsky, University of Pittsburgh, USA; Al Corbett, Carnegie Mellon University, USA; Vania Dimitrova, University of Leeds, UK; Jim Greer, University of Saskatchewan, Canada; Gord McCalla, University of Saskatchewan, Canada; Rafael Morales, Northumbria University, UK; Kyparisia Papanikolaou, University of Athens, Greece; Nicolas Van Labeke, University of Nottingham, UK.
Workshop Chairs: Judy Kay, University of Sydney, Australia; Andrew Lum, University of Sydney, Australia; Diego Zapata, Educational Testing Service, USA
Author Index Abu-Issa, A.S. Acosta Chaparro, E. Aïmeur, E. Ainsworth, S. Akhras, F.N. Albacete, P. Aleven, V. Alpert, S. Aluísio, S. Alvarez, A. Anderson, E. André, E. Andric, M. Andriessen, J. Aniszczyk, C. Aroyo, L. Arroyo, I. Arruarte, A. Asensio-Pérez, J.I. Ashley, K. Avramides, K. Azevedo, R. Bader-Natal, A. Baker, R.S. Barros, B. Baylor, A.L. Beal, C. Beal, C.R. Beck, J. Beck, J.E. Beers, C. Belghith, K. Bell, A. Ben Ali, I. Bernstein, D. Bessa Machado, V. Biswas, G. Blasi, L. Bollen, L. Bote-Lorenzo, M.L. Bourdeau, J.
104 955 249 9, 989 729 314 17, 563, 732 735, 990 997 738 741 926 298 25 792 555 998 33 857 935 732 603 41, 184, 233 49 57 872 65, 73, 744 944, 958, 985 80, 290, 848 747, 750 819, 884 88, 997 646 899 908 878 621 395 241, 646 753 266, 954 935 539
Bouwer, A. Boyle, R. Brauckmann, J. Bredeweg, B. Brna, P. Brooks, C. Bruno, M. Brusilovsky, P. Bull, S. Burstein, J. Cakir, M. Campos, J. Carey, R. Caron, P.-A. Carr, L. Carro, R.M. Cassell, J. Celorrio, C. Chan, S.-K. Chan, T.-W. Chang, B. Chang, C.-F. Chang, C.-Y. Chang, S.-B. Chao, C.-y. Chavan, G. Chee, Y.S. Chen, C.-T. Chen, C. Chen, G. Chen, Z.-H. Cheng, H.N.H. Cheng, R. Chesher, D. Chieu, V.M. Ching, E. Chipman, P. Chiu, Y.-C. Choi, S. Choksey, S. Chou, C.-Y. Christoph, N.
756 370 816 395, 579, 756 851 694 515, 702 96, 710, 999 104 112 120 926 563, 813 759 25 1004 3 872 523 136, 144, 768 786, 991 144, 786 378 941 786 780 96 128 768 765 956 136 144 152 795 491 768 845 750 771 555 136, 768 774
Claës, G. 386 Clarebout, G. 168 Cohen, P. 80 Cohen, W. 571 Cole, R. 985 Collins, H. 686 Conati, C. 411, 1001, 1002 Conejo, R. 531, 777, 999 Coppinger, R. 923 Corbett, A. 780 Corbett, A.T. 57 Core, M.G. 762 Corrigan-Halpern, A. 798 Cox, R. 810 Cristea, A. 1004 Cromley, J. 41, 184 Crowley, K. 621 Crowley, R. 192 Cuneo, A. 884 Czarkowski, M. 783 Dabrowski, R. 747 Daniel, B.K. 200 de Jong, T. 4 Deng, Y.-C. 136, 144, 768, 786 Derycke, A. 759 Desmarais, M.C. 209 Devedžić, V. 25, 992 de Vries, E. 938 Dichev, C. 789 Dicheva, D. 789, 998 di Eugenio, B. 217, 798 Dimitriadis, Y.A. 935 Dimitrova, V. 370 Donmez, P. 571 Doswell, J.T. 957 Dragon, T. 515, 702 du Boulay, B. 427, 459, 932, 1002 Dubourg, X. 807 Duval, E. 322 Ebbers, S.J. 958 Eisen, B. 836 Elen, J. 168 Elorriaga, J.A. 857 Evens, M. 866 Feltrim, V. 738 Feng, M. 555 Fernández-Castro, I. 741 Fiedler, A. 801 Fitzpatrick, G. 603
Fleming, P. Forbes-Riley, K. Fossati, D. Frasson, C. Freedman, R. Fu, S. Garzotto, F. Gašević, D. Ghag, H. Glass, M. Godbole Chaudhuri, P. Gogoulou, A. Gomboc, D. Gómez-Sánchez, E. Goolnik, S. Gouli, E. Gounon, P. Gouvea, E. Graesser, A. Grandbastien, M. Grawemeyer, B. Greene, J. Greene, J.A. Greer, J. Grigoriadou, M. Groen, R. Gupta, R. Guzmán, E. Gweon, G. Hage, H. Hakem, K. Hall, W. Haller, S. Harrer, A. Harrington, K. Harris, A. Hartswood, M. Heffernan, N. Heffernan, N.T. Heilman, M. Heiner, C. Henze, N. Hernández, Y. Herrmann, K. Hibou, M. Higgins, D. Hirashima, T. Hofmann, K. Holmberg, J.
9 225 217 1002 866 209 1004 322 104 217 41 804 762 935 959 804 807 884 845, 985 386 810 41 233 694 804 395 241, 646 531, 777 571, 813 249 258 25 217 266, 816 923 427, 842, 914 926 571 555, 902, 929 920 819, 884 274 960 282, 830 961 112 670, 854 962 932
Hoppe, U. Horacek, H. Horiguchi, T. Huang, R. Hubbard, S. Huettner, A. Hunn, C. Ildefonso, T. Inaba, A. Iwane, N. Jackson, T. Jansen, M. Jemni, M. Jeuring, J. Johnson, L. Johnson, W.L.
282, 475, 830, 836 827 670 833 662 225 869 863 346 893 845 836 878 911 1002 290, 298, 306 547, 686, 747, 985 Jordan, P.W. 314 Jovanović, J. 322 Joyce Kim, H.-J. 845 Jukic, D. 192 Junker, B. 555, 571 Kabanza, F. 899 Kasai, T. 330 Kawaguchi, Y. 893 Kay, J. 338, 783, 795, 1005 Kayashima, M. 346 Kelly, D. 354 Kemp, E. 881 Kemp, R. 839, 881 Kerawalla, L. 176, 842, 914, 932 Kerejeta, M. 857 Kershaw, T.C. 798 Khan, M. 899 Kim, J. 848 Kim, S. 744, 963 Kim, Y. 362 King, N.J.C. 795 Klein, J. 923 Knight, A. 555, 571 Kochakornjarupong, D. 851 Koedinger, K. 17, 555, 571 929, 990 Koedinger, K.R. 57, 419 Kohler, K. 515, 702 Kosba, E. 370 Krsinich, R. 839 Kuhn, M. 830
Kunichika, H. Kuo, C.-H. Kurhila, J. Labat, J.-M. Lahart, O. Lane, H.C. Larrañaga, M. Lee, H. Lee, M. Legowski, E. Le Pallec, X. Leroux, P. Lesgold, S. Li, T.-Y. Li, X. Liew, C.W. Lima, D.R. Lima-Salles, H. Lin, H. Litman, D. Liu, H. Liu, Y. Livak, T. Lloyd, T. Lopes, J.G.P. Lu, J. Lu, X. Luckin, R.
854 378 483 258 964 762 857 750, 771 744 192 759 807 780 941 965 160 860 579 965 225 833 128 555, 902 104 863 966 798 176, 427, 459, 603 842, 914, 932, 1002 Lulis, E. 866 Lum, A. 338, 1005 Lynch, C. 678 Macasek, M.A. 555, 929 Makatchev, M. 403 Manske, M. 411 Maqbool, Z. 848 Marsella, S. 306 Marsella, S.C. 595 Marshall, D. 515, 702 Martin, B. 419, 638 Martínez-Mirón, E. 427 Martinez-Miron, E.A. 1002 Masterman, L. 435 Mathan, S. 419 Matsubara, Y. 893 Matsuda, N. 443 Mattingly, M. 515, 702 Mavrikis, M. 869, 967
1010
Mayer, R.E. Mayorga, J.I. McCalla, G. McCalla, G.I. McLaren, B. McLaren, B.M. Medvedeva, O. Melis, E. Mercado, E. Merceron, A. Methaneethorn, J. Miao, Y. Miettinen, M. Milgrom, E. Millán, E. Miller, M. Mitrovic, A. Mitrovic, T. Mizoguchi, R. Möbus, C. Mohanarajah, S. Moos, D. Mostow, J. Motelet, O. Muehlenbrock, M. Munneke, L. Murray, D. Murray, R.C. Murray, T. Murray, W.R. Nagano, K. Najjar, M. Nakamura, M. Neumann, G. Ngomo, M. Nilakant, K. Nkambou, R. Noguez, J. Noronha, R.V. Nourbakhsh, I. Nuzzo-Jones, G. O’Connor, J. Ohlsson, S. Oliveira, O. Olney, A. Olson, E. Ong, C.-K. Ong, E.
298, 686 872 654 200 17, 990 266 192 451 555 467 968 475 483 491 777, 999 765 419, 499, 638 718, 896 5 330, 346, 539 875 881 41 819, 884 970 507 953 702 887 515, 702 890 330 971 893 947 386 896 539, 899 960 972 621 555, 902, 929 176, 842, 932 718, 798 738 845 41, 184 523 523
Otsuki, S. 893 Oubahssi, L. 386 Overdijk, M. 792, 973 Pain, H. 1002 Pan, W. 905 Papanikolaoy, K. 804 Park, S. 73, 944 Parton, K. 908 Passier, H. 911 Pearce, D. 842, 914 Penya, Y. 947 Perez, R. 73, 944 Pérez-de-la-Cruz, J.-L. 531, 777 Pessoa, A. 738 Pimenta, M.S. 863 Pinkwart, N. 917 Plant, E.A. 65 Pollack, J. 49 Porayska-Pomsta, K. 1002 Potts, S. 783 Procter, R. 926 Psyché, V. 539 Pu, X. 209 Pulman, S.G. 629 Pynadath, D.V. 595 Qu, L. 547, 750 Ramachandran, S. 908, 1001 Rankin, Y. 975 Rasmussen, K. 555 Rath, K. 515 Razzaq, L. 555 Rebolledo-Mendez, G. 459, 1002 Rehm, M. 298 Richard, J.-F. 258 Ritter, S. 555 Rizzo, P. 686 Robinson, A. 563 Roh, E. 192 Roll, I. 17, 57 Rosatelli, M.C. 860 Rosé, C. 563, 571, 735, 813 Rueda, U. 857 Ryan-Scheutz, C. 920 Ryu, E.J. 17 Salles, P. 579 Sammons, J. 702 Sandberg, J. 774 Sander, E. 258 Scheutz, M. 920
1011
Schulze, K. Schuster, E. Schwartz, D. Schwier, R.A. Seebold, H. Sewall, J. Shapiro, J.A. Shaw, E. Shebilske, W. Shelby, R. Shen, E. Si, M. Smart, L. Smith, D.E. Smith, H. Soller, A. Solomon, S. Son, C. Sosnovsky, S. Spector, L. Stahl, C. Stahl, G. Stermsek, G. Stevens, R. Stevens, S. Stubbs, K. Sukkarieh, J.Z. Sun, S. Suraweera, P. Takeuchi, A. Tan, J. Tang, T.Y. Tangney, B. Tanimoto, S. Tay, A.-H. Taylor, L. Taylor, P. Thür, S. Tirri, H. Todd, E. Tongchai, N. Treacy, D. Tsao, N.-L. Tseytlin, E. Tsovaltzi, D. Tunley, H. Turner, T.E. Ullrich, C.
678 738 6 200 875 266 160, 678 587, 686, 750 765 678 73, 944 595 926 160 603 611 762 744 96 923 947 120 947 611 780 621 629 976 638 854 646 654 354 662 523 678 926 816 483 839 977 678 378 192 801 932 555, 929 978
Ulrich, H. Underwood, J. Upalekar, R. Urretavizcaya, M. Urushima, M. van Amelsvoort, M. van Diggelen, W. VanLehn, K. van Lent, M. Vassileva, J. Vega-Gorgojo, G. Verbert, K. Verdejo, M.F. Vetter, M. Vickers, P. Vilhjalmsson, H. Volz, R. Wagner, A. Walker, E. Walonoski, J.A. Wang, H.-C. Wang, N. Ward, A. Warren, D. Weerasinghe, A. Weinstein, A. Wenger, A. Wible, D. Wielinga, B. Wild, F. Wilkinson, L. Winn, W. Winter, M. Winters, F. Wintersgill, M. Wolska, M. Woolf, B. Woolf, B.P. Wu, S. Wu, Y. Xhafa, F. Yacef, K. Yamaguchi, H. Yen, J. Yu, D. Yudelson, M. Yuill, N.
780 603, 932 555 741 854 953 792, 973 314, 403, 443 678, 887 762 152 935 322 872 816 851 306, 750 765 780 266, 979 555, 902 941 686 225 73, 944 980 678 920 378 774 947 926 662 694 41 678 827 515 33, 702, 993 747 241 120 467 330 765 217 96, 710 427, 842, 914
Yusoff, M.Z. Zaiss, Z. Zakharov, K.
969 813 718
Zapata-Rivera, D. Zhou, N. Zolna, J.S.
1005 120 981